SGA 2011: Deliverable 6.1
Version 1.0
ESSNET
USE OF ADMINISTRATIVE AND ACCOUNTS DATA
IN BUSINESS STATISTICS
WP6 Quality Indicators when using Administrative Data
in Statistical Outputs
List of Quality Indicators
May, 2012
ESSnet – Admin data: Quality Indicators 2
Contents
1. Executive summary
2. Introduction
3. Glossary
4. List of basic quality indicators
   4.1. Background Information indicators
   4.2. Quality Indicators
5. Examples of calculating the indicators
1. Executive Summary: A Guide to the Quality Indicators
What are the quality indicators?
The European Statistical System Network project on administrative data (ESSnet
AdminData) has developed a list of quantitative quality indicators, for use with business
statistics involving admin data. The indicators provide a measure of quality of the statistical
output, taking input and process into account. They are based on the ESS dimensions of
statistical output quality.
Who are they for?
The list of quality indicators has been developed primarily for producers of statistics, within
the ESS and more widely. The indicators can also be used for quality reporting, thus
benefiting users of the statistical outputs. They provide the user with an indication of the
quality of the output, and an awareness of how the admin data have been used in the
production of the output.
When can they be used?
The list of quality indicators is particularly useful for two broad situations:
1. When planning to start using admin data as a replacement for, or to supplement,
survey data. In this scenario, the indicators can be used to assess the feasibility of
moving to admin data, and the impact on output quality.
2. When admin data are already being used to produce statistical outputs. In this
scenario, the indicators can be used to gauge and report on the quality of the output,
and to monitor it over time. Certain indicators will be suitable to report to users, whilst
others will be most useful for the producers of the statistics only.
How should they be used?
There are 23 basic quality indicators in total, but a statistical producer need only use the
indicators relevant to their output. The table below shows which of the indicators relates to
which dimension or ‘theme’ of quality, which may be useful in identifying which indicators to
use. For more information about each of the themes, please refer to the ‘Introduction’
section.
Indicators 1 to 8 are background indicators, which provide general information on the use of
administrative data in the statistical output in question but do not, directly, relate to the
quality of the statistical output.
Indicators 9 to 23 provide information directly addressing the quality of the statistical output.
Quality theme                  Indicators relevant to that theme
Accuracy                       9, 10, 11, 12, 13, 14, 15, 16, 17
Timeliness and punctuality     4, 18
Comparability                  19
Coherence                      5, 6, 20, 21
Cost and efficiency            7, 8, 22, 23
Use of administrative data     1, 2, 3
Further information
For more detailed information about the indicators and how to use them, please consult the
‘Introduction’ section.
Quality Indicators when using Administrative Data in Statistical Outputs
2. Introduction
One of the aims of the European Statistical System Network project on administrative data (ESSnet AdminData) is the development of quality indicators for business statistics involving administrative data, with a particular focus on developing quantitative quality indicators.
Some work has already been done in the area of quality of business statistics involving administrative data and some indicators have been produced, namely during the preparation of the Quality Report Framework for Business Statistics under Regulation (EC) No 295/2008. However, the work conducted thus far refers to qualitative indicators or is based more on a descriptive analysis of administrative data (see Eurostat, 2003). The quantitative indicators that have been produced either concern the quality of the administrative sources (Daas, Ossen & Tennekes, 2010) or form part of a quality framework for the evaluation of administrative data (Ossen, Daas & Tennekes, 2011). However, these do not address the quality of the production of the statistical output. In fact, almost no work has been done on quantitative indicators of business statistics involving administrative data, which is the main focus of this project (for further discussion on this topic see Frost, Green, Pereira, Rodrigues, Chumbau & Mendes, 2010).
The ESSnet aims to develop quality indicators of statistical outputs that involve administrative data. These indicators are for the use of members of the European Statistical System, that is, the producers of statistics. Therefore, the list contains indicators on input and process, because these are critical to the work of the National Statistical Institutes and it is the input and process in particular that differ when using administrative data. Moreover, the list of indicators has been developed specifically for business statistics involving administrative data. Indicators (e.g. on accessibility) that do not differ between administrative and survey-based statistics are not included in this work because they fall outside the remit of this section of the ESSnet AdminData project.
To address some issues of terminology, a few definitions are provided below to clarify how these terms are used in this document and throughout the ESSnet AdminData. Further information on terminology is included in the glossary in Section 3.
What is administrative data? Administrative data are data derived from an administrative source, before any processing or validation by the NSI.
What is an administrative source? A data holding containing information collected and maintained for the purpose of implementing one or more administrative regulations.
The list of quality indicators
A list of quantitative quality indicators has been developed on the basis of research which took stock of work being conducted in this field across Europe. This list was then user tested within five European National Statistical Institutes (NSIs), before testing across Europe. Feedback from user testing was used to improve the list of quality indicators.
The current list of indicators has been grouped into two main areas:
o Background Information – these are ‘indicators’ in the loosest sense. They provide general information on the use of administrative data in the statistical output in question but do not, directly, relate to the quality of the statistical output. This information is often crucial in understanding better those indicators that measure quality more directly.
o Quality Indicators – these provide information directly addressing the quality of the statistical output.
The background information indicators and the quality indicators are further grouped by quality ‘theme’. These quality themes are based on the ESS dimensions of output quality, with some additional themes which relate specifically to administrative data. These themes also appear in the composite quality indicators that are being developed by WP6 (see ‘Future work’). The quality themes are:
Quality theme                  Description
Accuracy                       The closeness between an estimated result and the unknown true value.
Timeliness and punctuality     The lapse of time between publication and the period to which the data refer, and the time lag between actual and planned publication dates.
Comparability                  The degree to which data can be compared over time and domain.
Coherence                      The degree to which data that are derived from different sources or methods, but which refer to the same phenomenon, are similar.
Cost and efficiency            The cost of incorporating admin data into statistical systems, and the efficiency savings possible when using admin data in place of survey data.
Use of administrative data     Background information relating to admin data inputs.
A short description of each indicator is included in the attached list along with a formula on how to calculate the indicator (if applicable), and example calculations (see Section 5).
The indicators have been developed so that a low indicator score denotes high quality, and a high indicator score denotes low quality.
This is consistent with the concept of error, where high errors signify low quality. The exceptions to this rule are the background indicators (1 to 8), where the score provides
information rather than a quality ‘rating’; and indicators 20 and 23, where a high indicator score denotes high quality, and a low indicator score denotes low quality.
Using the list of quality indicators
Throughout the list, there are words in blue which take you to the definition of that particular word in the glossary. There are also links in the ‘How to calculate’ section, which take you to examples of how to calculate each indicator.
A framework for the basic quantitative quality indicator examples
The calculation of an indicator needs some preliminary steps. Some or all of these steps will be used for each example of the indicators to ensure consistency of the examples, and to aid understanding of the indicators themselves (see Section 5 for a list of examples).
A. Define the statistical output
B. Define the relevant units
C. Define the relevant variables
D. Adopt a schema for calculation
E. Declare the tolerance method for quantitative and qualitative variables
Links between this and other work on Quality
The work being carried out under this project should not be seen as independent of other work already in place. When analysing the list of indicators, one can conclude that some other information is useful in regard to the quality of administrative data. However, some of that very useful information cannot be (or has not been) translated into quantitative indicators. The main aim of the current project is not to discuss all the issues related to quality when using administrative data. The aim, at this stage, is to discuss basic quantitative quality indicators.
In addition, these indicators are for the benefit of the members of the European Statistical System (ESS), that is, the producers of statistics. Consequently, the end result of the ESSnet AdminData work in this area should be integrated with the work already in place on the production of Eurostat Quality Reports.
Future work
In addition to the list of basic indicators, the ESSnet also aims to develop composite quality indicators, which draw together certain basic quality indicators into ‘themes’ in line with the ESS dimensions of output quality, to provide a more holistic view of the quality of a statistical output. In addition to this, investigative work is underway to develop quality guidance for
situations where survey and administrative data are combined. Work is also planned to develop qualitative quality indicators to complement the list of basic quantitative quality indicators.
References
Daas, P.J.H., Ossen, S.J.L. & Tennekes, M. (2010). Determination of administrative data quality: recent results and new developments. Paper and presentation for the European Conference on Quality in Official Statistics 2010. Helsinki, Finland.
Eurostat (2003). Item 6: Quality assessment of administrative data for statistical purposes. Luxembourg, Working group on assessment of quality in statistics, Eurostat.
Frost, J.M., Green, S., Pereira, H., Rodrigues, S., Chumbau, A. & Mendes, J. (2010). Development of quality indicators for business statistics involving administrative data. Paper presented at the Q2010 European Conference on Quality in Official Statistics. Helsinki, Finland.
Ossen, S.J.L., Daas, P.J.H. & Tennekes, M. (2011). Overall Assessment of the Quality of Administrative Data Sources. Paper accompanying the poster at the 58th Session of the International Statistical Institute. Dublin, Ireland.
3. Glossary¹
Term Definition
1. administrative data Data derived from an administrative source, before any processing or validation by the NSI.
2. administrative source A data holding containing information collected and maintained for the purpose of implementing one or more administrative regulations.
3. common units Units that are included in more than one source.
4. consistent items Values for a variable in a specific unit that are the same across different sources, within a certain tolerance.
5. item A ‘value’ for a variable for a specific unit.
6. key variables Variables that are the most important and have the largest impact on the statistical output (e.g. turnover, number of employees, wages and salaries, etc.)
7. reference population The set of units about which information is wanted and estimates are required. This might be the entire Business Register (BR) or some part of the BR, e.g. manufacturing sector.
8. relevant units Businesses that are within the scope of the statistical output (e.g. units from the services sector should be excluded from manufacturing statistics).
9. relevant items ‘Values’ for units on relevant variables that should be included in calculating the statistical output.
10. required period The reporting period used within the statistical output.
1 Work Package 1 (WP1) of the ESSnet AdminData has developed an ‘Admin Data Glossary’. To access the glossary, please follow this link: http://essnet.admindata.eu/WorkPackage?objectId=4251
11. required variables Variables necessary to calculate the statistical output.
12. statistical output A statistic produced by the NSI – whether based on a specific variable (e.g. no. of employees) or a set of related variables (e.g. total turnover; domestic market turnover; external market turnover). In the broadest sense, statistical output would also apply to the whole STS or SBS output.
13. unit Refers to statistical units – enterprise, legal unit, local unit, etc.
14. weighted A number of the quality indicators described in this document can be calculated in unweighted or weighted versions. Formulae are given for the unweighted versions of the indicators. Weighting can be beneficial as the weighted indicator will often better describe the quality of the statistical output. For example, the unweighted item non-response will inform users what proportion of valid units did not respond for a particular variable, whereas the weighted item non-response will estimate the proportion of the output variable affected by non-response. A non-response rate of 30% is of less concern if those 30% of units only cover 1% of the output variable. In practice, we do not know the values of the output variable for non-responders, so we use a related variable instead. Business register variables such as Turnover or Employment are often used as proxies.
The weighted indicators are calculated as follows:
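As a minimal sketch of one common reading of weighting (an assumption made here for illustration, not a formula taken from this list), each unit's contribution to the numerator and the denominator can be multiplied by a proxy weight such as Business Register turnover. The function name `indicator_rate` and the data below are hypothetical; the figures are chosen to mirror the 30% versus 1% example in the glossary entry above.

```python
def indicator_rate(flags, weights=None):
    """Return the percentage of units flagged, optionally weighted.

    flags   -- list of booleans, True where the unit has the property
               (e.g. a missing value for the variable of interest)
    weights -- optional list of non-negative proxy values (e.g. BR turnover);
               assumption: the weighted rate is sum(flagged weights) / sum(all weights)
    """
    if weights is None:
        weights = [1.0] * len(flags)  # unweighted: every unit counts equally
    if len(weights) != len(flags):
        raise ValueError("flags and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("total weight is zero")
    flagged = sum(w for f, w in zip(flags, weights) if f)
    return 100.0 * flagged / total


# Hypothetical data: 10 units, 3 of them with a missing value.
missing = [True, True, True] + [False] * 7
# Hypothetical proxy turnover: the 3 non-responding units are very small.
turnover = [1, 1, 1, 42, 42, 42, 42, 43, 43, 43]

print(indicator_rate(missing))            # unweighted rate: 30.0
print(indicator_rate(missing, turnover))  # weighted rate: 1.0
```

The same helper can be reused for any of the ratio indicators that are defined as a count of flagged units over a count of relevant units.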
4. List of Basic Quality Indicators
4.1 Background Information
Use of administrative data:
Indicator Description How to calculate
1
Number of admin sources used
This indicator provides information on the number of administrative sources used in each statistical output. The number of sources should include all those used in the statistical output whether the admin data are used as raw data, in imputation or to produce estimations.
Note. Where relevant, a list of the admin sources may also be helpful for users, along with a list of the variables included in each source. Alternatively, the number of admin sources used can be specified by variable.
Examples of indicator
2
% of items obtained exclusively from admin data
This indicator provides information on the proportion of items only obtained from admin data, whether directly or indirectly, and where survey data are not collected. This includes where admin data are used as raw data, as proxy data, in calculations, etc. This indicator should be calculated on the basis of the statistical output – the number of items obtained exclusively from admin data (not by survey) should be considered.
\[ \frac{\text{No. of items obtained exclusively from admin data}}{\text{Total no. of items}} \times 100\% \]
This indicator could also be weighted in terms of whether or not the variables are key to the statistical output. Examples of indicator
Indicator Description How to calculate
3
% of required variables derived from admin data that are used as a proxy
This indicator provides information on the extent that admin data are used in the statistical output as a proxy or are used in calculations rather than as raw data. This indicator should be calculated on the basis of the statistical output – the number of required variables derived indirectly from admin data (because not available directly from admin or survey data) should be considered.
\[ \frac{\text{No. of required variables derived from admin data used as a proxy}}{\text{No. of required variables}} \times 100\% \]
Note. If a combination of survey and admin data is used, this indicator would need to be weighted (by number of units). If double collection is necessary (e.g. to check quality of admin data), some explanation should be provided. This indicator could also be weighted in terms of whether or not the variables are key to the statistical output. Examples of indicator
Timeliness and punctuality:
Indicator Description How to calculate
4
Periodicity (frequency of arrival of the admin data)
This indicator provides information about how often the admin data are received by the NSI. This indicator should be provided for each admin source.
Note. If data are provided via continuous feed from the admin source, this should be stated in answer to this indicator. Only data you receive for statistical purposes should be considered. Examples of indicator
Coherence:
Indicator Description How to calculate
5
% of common units across two or more admin sources
This indicator relates to the combination of two or more admin sources. This indicator provides information on the proportion of common units across two or more admin sources. Only units relevant to the statistical output should be considered. This indicator should be calculated pairwise for each pair of admin sources and then averaged. If only one admin source is available, this indicator is not relevant.
\[ \frac{\text{No. of relevant common units in the admin sources}}{\text{No. of relevant unique units}} \times 100\% \]
Note. The “unique units” in the denominator means that units should only be counted once, even if they appear in multiple sources. This indicator should be calculated separately for each variable. This indicator could also be weighted (e.g. by turnover or number of employees) in terms of the % contribution of these units to the statistical output. Examples of indicator
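The pairwise calculation described for Indicator 5 can be sketched as follows. This is an illustrative sketch only: the function names, the source names and the unit identifiers are hypothetical, and it assumes that units have already been linked so that the same identifier denotes the same unit in every source.

```python
from itertools import combinations


def common_units_pct(source_a, source_b):
    """% of common units for one pair of sources.

    Denominator: relevant unique units, i.e. each unit counted once
    even if it appears in both sources.
    """
    a, b = set(source_a), set(source_b)
    unique = a | b
    if not unique:
        raise ValueError("both sources are empty")
    return 100.0 * len(a & b) / len(unique)


def mean_pairwise_common_units(sources):
    """Average the pairwise percentages over every pair of admin sources."""
    pairs = list(combinations(sources.values(), 2))
    if not pairs:
        raise ValueError("at least two admin sources are needed")
    return sum(common_units_pct(a, b) for a, b in pairs) / len(pairs)


# Hypothetical relevant units found in three admin sources.
sources = {
    "vat": {"u1", "u2", "u3", "u4"},
    "social_security": {"u3", "u4", "u5"},
    "annual_accounts": {"u1", "u4"},
}

# Pairwise values are 40%, 50% and 25%, so the average is about 38.3%.
print(round(mean_pairwise_common_units(sources), 1))
```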
6
% of common units when combining admin and survey data
This indicator relates to the combination of admin and survey data. This indicator provides information on the proportion of common units across admin and survey data. Linking errors should be detected and resolved before this indicator is calculated. This indicator should be calculated for each admin source and then aggregated based on the number of common units (weighted by turnover) in each source.
\[ \frac{\text{No. of common units in admin and survey data}}{\text{No. of units in survey}} \times 100\% \]
Note. If there are few common units due to the design of the statistical output (e.g. a combination of survey and admin data), this should be explained. This indicator could also be weighted (e.g. by turnover or number of employees) in terms of the % contribution of these units to the statistical output. Examples of indicator
Cost and efficiency:
Indicator Description How to calculate
7
% of items obtained from admin source and also collected by survey
This indicator relates to the combination of admin and survey data. This indicator provides information on the double collection of data, both admin source and surveys. Thus, it provides an idea of redundancy as the same data items are being obtained more than once. This indicator should be calculated for each admin source and then aggregated.
\[ \frac{\text{No. of relevant common items obtained by admin and survey data}}{\text{No. of relevant items in survey}} \times 100\% \]
Note. Double collection is sometimes conducted for specific reasons, e.g. to measure quality. If this is the case, this should be explained. Only admin data which meet the definitions and timeliness requirements of the output should be included. Examples of indicator
8
% reduction of survey sample size when moving from survey to admin data
This indicator relates to the combination of admin and survey data. This indicator provides information on the reduction in survey sample size because of an increased use of admin data. Only changes to the sample size due to using admin data should be included in this calculation. The indicator should be calculated for each survey and then aggregated (if applicable).
\[ \frac{\text{Sample size before increase in use of admin data} - \text{sample size after}}{\text{Sample size before increase in use of admin data}} \times 100\% \]
Note. This indicator is likely to be calculated once, when making the change from survey to admin data.
Examples of indicator
4.2 Quality Indicators
Accuracy:
Indicator Description How to calculate
9
Item non-response (% of units with missing values for key variables)
Although there are technically no ‘responses’ when using admin data, non-response (missing values at item or unit level) is an issue in the same way as with survey data. This indicator provides information on the extent of missing values for the key variables. This indicator should be calculated for each of the key variables and for each admin source and then aggregated based on the contributions of the variables to the overall output.
\[ \frac{\text{No. of relevant units in the admin data with missing value for variable } X}{\text{No. of units relevant for variable } X} \times 100\% \]
This indicator could also be weighted (e.g. by turnover or number of employees) in terms of the % contribution to the output. Examples of indicator
10 Misclassification rate
This indicator provides information on the proportion of units in the admin data which are incorrectly coded. For simplicity and clarity, activity coding as recorded on the Business Register (BR) is considered to be correct. The level of coding used should be at a level consistent with the level used in the statistical output (e.g. if the statistical output is produced at the 3-digit level, then the accuracy of the coding should be measured at this level). This indicator should be calculated for each admin source and then aggregated based on the number of relevant units (weighted by turnover) in each source.
\[ \frac{\text{No. of relevant units in admin data with different NACE code to BR}}{\text{No. of relevant units in admin data}} \times 100\% \]
Note. If the activity code from the admin data is not used by the NSI (e.g. if coding from BR is used), this indicator is not relevant.
If a survey is conducted to check the rate of misclassification, the rate from this survey should be provided and a note added to the indicator. This indicator could also be weighted (e.g. by turnover or number of employees) in terms of the % contribution to the output. Examples of indicator
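The requirement to compare codes at the level used in the statistical output can be sketched as below. This is a hedged illustration: the function name and data are hypothetical, NACE codes are assumed to be stored as digit strings without separators (e.g. "1071"), and only units that carry a code in both the admin data and the BR are compared.

```python
def misclassification_rate(admin_codes, br_codes, digits=3):
    """% of relevant admin units whose activity code differs from the BR.

    admin_codes, br_codes -- dicts mapping unit id -> NACE code as a digit
                             string without dots (assumption)
    digits                -- coding level used in the statistical output
    """
    common = admin_codes.keys() & br_codes.keys()
    if not common:
        raise ValueError("no units with a code in both admin data and BR")
    differing = sum(
        1 for unit in common
        if admin_codes[unit][:digits] != br_codes[unit][:digits]
    )
    return 100.0 * differing / len(common)


# Hypothetical codes for four relevant units.
admin = {"a": "1071", "b": "1072", "c": "4711", "d": "2511"}
br = {"a": "1071", "b": "1073", "c": "4719", "d": "2599"}

print(misclassification_rate(admin, br, digits=3))  # 25.0: only unit d differs
print(misclassification_rate(admin, br, digits=4))  # 75.0: units b, c and d differ
```

The example shows why the coding level matters: the same data give a much higher rate when compared at a finer level than the output requires.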
Indicator Description How to calculate
11 Undercoverage
This indicator provides information on the undercoverage of the admin data. That is, units in the reference population that should be included in the admin data but are not (for whatever reason). This indicator should be calculated for each admin source and then aggregated based on the number of relevant units (weighted by turnover) in each source.
\[ \frac{\text{No. of relevant units in reference population but NOT in admin data}}{\text{No. of relevant units in reference population}} \times 100\% \]
Note. This could be calculated for each relevant publication of the statistical output, e.g. first and final publication. This indicator could also be weighted (e.g. by turnover or number of employees) in terms of the % contribution to the output. Examples of indicator
12 Overcoverage
This indicator provides information on the overcoverage of the admin data. That is, units that are included in the admin data but should not be (e.g. are out-of-scope, outside the reference population). This indicator should be calculated for each admin source and then aggregated based on the number of relevant units (weighted by turnover) in each source.
\[ \frac{\text{No. of units in admin data but NOT in reference population}}{\text{No. of units in reference population}} \times 100\% \]
This indicator could also be weighted (e.g. by turnover or number of employees) in terms of the % contribution to the output. Examples of indicator
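Indicators 11 and 12 both compare the admin data with the reference population, so they can be sketched together using set differences. The function names and unit identifiers below are hypothetical, and the sketch covers the unweighted versions for a single admin source only.

```python
def undercoverage_pct(reference_units, admin_units):
    """Indicator 11: units in the reference population but NOT in admin data."""
    reference, admin = set(reference_units), set(admin_units)
    if not reference:
        raise ValueError("reference population is empty")
    return 100.0 * len(reference - admin) / len(reference)


def overcoverage_pct(reference_units, admin_units):
    """Indicator 12: units in admin data but NOT in the reference population."""
    reference, admin = set(reference_units), set(admin_units)
    if not reference:
        raise ValueError("reference population is empty")
    return 100.0 * len(admin - reference) / len(reference)


# Hypothetical reference population of 8 units; the admin source misses
# two of them (u7, u8) and contains one out-of-scope unit (x1).
reference = {"u1", "u2", "u3", "u4", "u5", "u6", "u7", "u8"}
admin = {"u1", "u2", "u3", "u4", "u5", "u6", "x1"}

print(undercoverage_pct(reference, admin))  # 25.0
print(overcoverage_pct(reference, admin))   # 12.5
```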
13
% of units in the admin source for which reference period differs from the required reference period
This indicator provides information on the proportion of units that provide data for a different reporting period than the required period for the statistical output. If the periods are not those required, then some imputation is necessary, which may impact quality. This indicator should be calculated for each admin source and then aggregated based on the number of relevant units (weighted by turnover) in each source.
\[ \frac{\text{No. of relevant units in Admin data with reporting period different from required period}}{\text{No. of relevant units in Admin data}} \times 100\% \]
This indicator could also be weighted (e.g. by turnover or number of employees) in terms of the % contribution to the output.
Examples of indicator
Indicator Description How to calculate
14
Size of revisions from the different versions of the admin data: RAR (Relative Absolute Revisions)
This indicator assesses the size of revisions from different versions of the admin data, providing information on the reliability of the data received. With this indicator it is possible to understand the impact of the different versions of admin data on the results for a certain reference period. When data is revised based on other information (e.g. survey data) this should not be included in this indicator. The indicator should be calculated for each admin source and then aggregated. If only one version of the admin data is received, this indicator is not relevant.
\[ \mathrm{RAR} = \frac{\sum_{t=1}^{T} \left| X_{Lt} - X_{Pt} \right|}{\sum_{t=1}^{T} \left| X_{Pt} \right|} \times 100\% \]
where \(X_{Lt}\) = Latest data for X variable and \(X_{Pt}\) = First data for X variable.
Note. This indicator should only be calculated for estimates based on the same units (not including any additional units added in a later draft). This indicator could also be weighted (e.g. by turnover or number of employees) in terms of the % contribution to the output.
Examples of indicator
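A minimal sketch of the Relative Absolute Revisions calculation is given below. It assumes that RAR is the sum over periods of the absolute difference between the latest and the first data, divided by the sum of the absolute first data, expressed as a percentage. The function name and figures are hypothetical.

```python
def relative_absolute_revisions(first, latest):
    """RAR in % for one variable, given first and latest values per period.

    first, latest -- equally long lists, one value per reference period,
                     based on the same set of units (assumption)
    """
    if len(first) != len(latest):
        raise ValueError("first and latest must cover the same periods")
    denominator = sum(abs(p) for p in first)
    if denominator == 0:
        raise ValueError("sum of first values is zero")
    revisions = sum(abs(l - p) for p, l in zip(first, latest))
    return 100.0 * revisions / denominator


# Hypothetical turnover totals for three periods in two deliveries.
first_version = [100.0, 120.0, 80.0]
latest_version = [110.0, 114.0, 80.0]

# Absolute revisions are 10 + 6 + 0 = 16 against a base of 300.
print(round(relative_absolute_revisions(first_version, latest_version), 2))  # 5.33
```

Because absolute values are used, upward and downward revisions do not cancel out, which is what makes the measure informative about reliability.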
15
% of units in admin data which fail checks
This indicator provides information on the extent to which data fail some elements of the checks (automatic or manual) and are flagged by the NSI as suspect. This does not mean that the data are necessarily adjusted (see Indicator 16), simply that they fail one or more check(s). This checking can either be based on a model, checking against other data sources (admin or survey), internet research or through direct contact with the businesses. This indicator should be calculated for each of the key variables and aggregated based on the number of relevant units (weighted by turnover) in each source.
\[ \frac{\text{No. of relevant units in admin data checked and failed}}{\text{Total no. of relevant units checked}} \times 100\% \]
Note. If the validation is done automatically and the system does not flag or record this in some way, this should be noted. This indicator could also be weighted (e.g. by turnover or number of employees) in terms of the % contribution to the output. Users should state the number of checks done, and the proportion of data covered by these checks. Examples of indicator
Indicator Description How to calculate
16
% of units for which data have been adjusted
This indicator provides information about the proportion of units for which the data have been adjusted (a subset of the units included in Indicator 15). These are units that are considered to be erroneous and are therefore adjusted in some way (missing data should not be included in this indicator – see Indicator 9). Any changes to the admin data before arrival with the NSI should not be considered in this indicator. This indicator should be calculated for each of the key variables and aggregated based on the number of relevant units (weighted by turnover) in each source.
\[ \frac{\text{No. of relevant units in the Admin data with adjusted data}}{\text{No. of relevant units in Admin Data}} \times 100\% \]
This indicator could also be weighted (e.g. by turnover or number of employees) in terms of the % contribution to the output. Examples of indicator
17
% of imputed values (items) in the admin data
This indicator provides information on the impact of the values imputed by the NSI. These values are imputed because data are missing (see Indicator 9) or data items are unreliable (see Indicator 16). This indicator should be calculated by variable for each admin source and then aggregated based on the contributions of the variables to the overall output.
\[ \frac{\text{No. of imputed items in the relevant admin data}}{\text{No. of relevant items in admin data}} \times 100\% \]
This indicator should be weighted (e.g. by turnover or number of employees) in terms of the % contribution of the imputed values to the statistical output. Examples of indicator
Timeliness and punctuality:
Indicator Description How to calculate
18
Delay to accessing / receiving data from Admin Source
This indicator provides information on the proportion of the time from the end of the reference period to the publication date that is taken up waiting to receive the admin data. This is calculated as a proportion of the overall time between reference period and publication date to provide comparability across statistical outputs. This indicator should be calculated for each admin source and then aggregated.
\[ \frac{\text{Time from the end of reference period to receiving Admin data}}{\text{Time from the end of reference period to publication date}} \times 100\% \]
Note. Include only the final dataset used for the statistical output. If a continuous feed of data is received, the ‘last’ dataset used to calculate the statistical output should be used in this indicator. If more than one source is used, an average should be calculated, weighted by the sources’ contributions to the final estimate. If the admin data are received before the end of the reference period, this indicator would be 0. This indicator applies to the first publication only, not to revisions.
Examples of indicator
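The delay calculation for Indicator 18, including the rule that data received before the end of the reference period give a value of 0, can be sketched as below for a single admin source. The function name and dates are hypothetical, and time is measured in calendar days as an assumption.

```python
from datetime import date


def admin_delay_pct(period_end, data_received, publication):
    """Share of the period-end-to-publication time spent waiting for admin data."""
    total_days = (publication - period_end).days
    if total_days <= 0:
        raise ValueError("publication must follow the end of the reference period")
    # Data received before the end of the reference period count as no delay.
    waited_days = max((data_received - period_end).days, 0)
    return 100.0 * waited_days / total_days


# Hypothetical first publication for reference year 2011.
period_end = date(2011, 12, 31)
publication = date(2012, 3, 30)  # 90 days after the period end

print(admin_delay_pct(period_end, date(2012, 2, 14), publication))   # 50.0
print(admin_delay_pct(period_end, date(2011, 12, 15), publication))  # 0.0
```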
Comparability:
Indicator Description How to calculate
19
Discontinuity in estimate when moving from a survey-based output to an admin-based output
This indicator measures the impact on the level of the estimate when changing from a survey-based output to an admin-based output. This indicator is likely to be calculated once, when making the change from survey to admin data. This indicator should be calculated separately for each key estimate included in the output.
Note. This indicator should be calculated using survey and admin data which refer to the same period. Examples of indicator
Coherence:
Indicator Description How to calculate
20
% of consistent items for common variables in more than one source²
This indicator provides information on consistent items for any common variables across sources (either admin or survey). Only variables directly required for the statistical output should be considered – basic information (e.g. business name and address) should be excluded. Values within a tolerance should be considered consistent – the width of this tolerance (1%, 5%, 10%, etc.) would depend on the variables and methods used in calculating the statistical output. This indicator should be calculated for each of the key variables and aggregated based on the contributions of the variables to the overall output.
(No. of consistent items (within tolerance) for variable X / Total no. of items required for variable X) * 100%
Note. If only one source is available or there are no common variables, this indicator is not relevant. Please state the tolerance used. This indicator could also be weighted (e.g. by turnover or number of employees) in terms of the % contribution to the output. Examples of indicator
2Indicators 20 and 23 are the only indicators in Section 4.2 for which a high indicator score denotes high quality and a low indicator score denotes low quality.
21
% of relevant units in admin data which have to be adjusted to create statistical units
This indicator provides information on the proportion of units that have to be adjusted in order to create statistical units. For example, the proportion of data at enterprise group level which therefore need to be split to provide reporting unit data.
[Adjusted units / (Adjusted units + Corresponding units)] * 100%, where:
Adjusted units = Relevant units in the reference population that are adjusted to the statistical concepts by the use of statistical methods
Corresponding units = Relevant units in the reference population that correspond to the statistical concepts
This indicator should be weighted (e.g. by turnover or number of employees) in terms of the % contribution of these units to the statistical output. Examples of indicator
Cost and efficiency:
Indicator Description How to calculate
22
Cost of converting admin data to statistical data
This indicator provides information on the estimated cost (in person hours) of converting admin data to statistical data. The indicator should be calculated for each admin source and then aggregated based on the contribution of the admin source to the statistical output.
(Estimated) Cost of conversion in person hours
Note. This should only be calculated for parts of the admin data relevant to the statistical output. Examples of indicator
23
Efficiency gain in using admin data3
This indicator provides information on the efficiency gain in using admin data rather than simply using survey data. For example, collecting admin data is usually cheaper than collecting data through a survey but this benefit might be offset by higher processing costs. Production cost should include all costs the NSI is able to attribute to the production of the statistical output.
[(Production cost of survey based statistic - Production cost of admin based statistic) / Production cost of survey based statistic] * 100%
Note. Estimated costs are acceptable.
This indicator is likely to be calculated once, when making the change from survey to admin data. Examples of indicator
3 Indicators 20 and 23 are the only indicators in Section 4.2 for which a high indicator score denotes high quality and a low indicator score denotes low quality.
5. Examples of calculating the quality indicators
A framework for the basic quantitative quality indicator examples
The calculation of an indicator needs some preliminary steps. Some or all of these steps will be used for each example of the indicators to ensure consistency of the examples, and to aid understanding of the indicators themselves.
A. Define the statistical output
B. Define the relevant units
C. Define the relevant variables
D. Adopt a schema for calculation
E. Declare the tolerance method for quantitative and qualitative variables
QI n. 1 – Number of administrative sources
Back to indicator 1
Example 1
A. Statistical output: The BR Enterprise units updating/identification
B. Relevant units: 10+ employees enterprises (relevant for a specific survey or as
base for the HG firms)
D. Steps for calculation: Identify the relevant admin sources.
Let S1 be the Fiscal Register source
Let S2 be the Chamber of Commerce source
Let S3 be the Social Security source
Let S4 be the Yellow Pages source
I(1) = 4 sources.
Example 2
A. Statistical output: The BR Local units updating/identification
B. Relevant units: The local units of enterprises with more than one local unit
D. Steps for calculation: Identify the relevant admin sources.
Let S1 be the Chamber of Commerce source
Let S2 be the Social Security source
I’(1) = 2 sources.
QI n. 2 – Percentage of items obtained exclusively from admin data
Back to indicator 2
Example 1
A. Statistical output: The BR Enterprise units updating/identification
B. Relevant units: Enterprises with 10 or fewer employees (relevant for a specific
survey)
C. Relevant variables: Date of commencement of activities; Date of final cessation of
activities; Principal activity code at NACE 4-digit level; Number of
persons employed; Number of employees; Turnover.
D. Steps for calculation:
D1. From BR take the population of units with 10 or fewer employees;
D2. For each relevant variable, calculate the proportion of items for which the variable is
obtained exclusively from admin data (items with non missing variable);
D3. Divide the sum of numbers of items for which the variables are obtained exclusively
from admin data by the sum of numbers of items for which the variable is not missing
D4. Calculate the indicator as follows:
Let INIT be the date of commencement of activities;
Let END be the date of final cessation of activities;
Let NACE be the Principal Activity Code;
Let PER be the number of persons employed;
Let EMP be the number of employees;
Let TUR be the Turnover, obtained as Proxy and included in this indicator;
| Variables | (1) Number of items for which the variable is not missing in the relevant items | (2) Number of employees of (1) | (3) Number of items for which the variable is obtained exclusively from admin data | (4) Number of employees of (3) | (5)=[(3)/(1)]*100 Proportion of items for which variables are obtained exclusively from admin data | (6)=[(4)/(2)]*100 Proportion of items for which variables are obtained exclusively from admin data weighted by employees |
| INIT | 4,360,685 | 3,501,511 | 4,349,379 | 3,485,361 | 99.7 | 99.5 |
| END | 232,594 | 89,359 | 232,015 | 88,633 | 99.8 | 99.2 |
| NACE | 4,360,685 | 3,501,511 | 4,331,316 | 3,460,597 | 99.3 | 98.8 |
| PER | 4,360,685 | 3,501,511 | 4,281,650 | 3,294,864 | 98.2 | 94.1 |
| EMP | 1,405,754 | 3,501,511 | 1,329,071 | 3,299,878 | 94.5 | 94.2 |
| TUR | 4,282,711 | 3,367,764 | 4,282,711 | 3,367,764 | 100.0 | 100.0 |
| Total | 19,003,114 | 17,463,167 | 18,806,142 | 16,997,097 | 99.0 | 97.3 |
I(2) = (# items obtained exclusively from admin data / Total # of items) * 100 = (18,806,142 / 19,003,114) * 100 = 99.0%
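The calculation of I(2) can be sketched in a few lines of Python. This is only an illustration using the counts from the table for QI n. 2; the dictionary layout and the function name `pct_exclusively_admin` are assumptions made for this sketch, not part of the ESSnet method.

```python
# Counts taken from the QI n. 2 table. For each variable:
# (items not missing, employees of those items,
#  items obtained exclusively from admin data, employees of those items)
counts = {
    "INIT": (4_360_685, 3_501_511, 4_349_379, 3_485_361),
    "END": (232_594, 89_359, 232_015, 88_633),
    "NACE": (4_360_685, 3_501_511, 4_331_316, 3_460_597),
    "PER": (4_360_685, 3_501_511, 4_281_650, 3_294_864),
    "EMP": (1_405_754, 3_501_511, 1_329_071, 3_299_878),
    "TUR": (4_282_711, 3_367_764, 4_282_711, 3_367_764),
}


def pct_exclusively_admin(counts, weighted=False):
    """Return I(2): share of items obtained exclusively from admin data (%).

    With weighted=True the employee totals (columns 2 and 4) are used
    instead of the item counts (columns 1 and 3).
    """
    den_col, num_col = (1, 3) if weighted else (0, 2)
    den = sum(row[den_col] for row in counts.values())
    num = sum(row[num_col] for row in counts.values())
    return 100 * num / den


i2 = pct_exclusively_admin(counts)
i2_weighted = pct_exclusively_admin(counts, weighted=True)
print(round(i2, 1), round(i2_weighted, 1))
```

The unweighted value reproduces column (5) of the Total row, and the weighted value reproduces column (6).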
QI n. 3 – % of required variables derived from admin data that are used as a proxy
Back to indicator 3
Example 1
A. Statistical output: The BR Enterprise units
B. Relevant unit: Enterprise with turnover
C. Let the list of relevant variables4 be as follows:
1) Date of commencement of activities;
2) Date of final cessation of activities;
3) Principal activity code at NACE 4-digit level;
4) Number of persons employed;
5) Number of employees;
6) Turnover;
7) Identification number of the resident/truncated enterprise group, to which the enterprise
belongs
The relevant variable obtained from the Fiscal source is the VAT turnover, proxy of the Turnover
D: Steps for calculation:
D1: Number of required variables derived from admin data
D2: Number of variables of D1 used as a proxy (i.e. the variable is derived indirectly from
admin data) (num)
D3: Number of required variables by the BR Regulation (denom)
Formula:
I(3)= (Num/Den)*100=(1/7)*100=14%
QI n. 4 – Periodicity (frequency of arrival of admin data)
Back to indicator 4
Example 1
A. Statistical output: The BR Enterprise units
4 The variables are required by the Regulation (EC) No 177/2008 of the European Parliament and of the Council, but each country will decide which of the required variables will be relevant for this indicator.
D: Steps for calculation: Record periodicity for each source
Let S1 be the Fiscal Register source
Let S2 be the Chamber of Commerce source
Let S3 be the Social Security source
Let S4 be the Yellow Pages source
IS1(4) = 1; IS2(4) = 2; IS3(4) = 2; IS4(4) = 1
Example 2
A. Statistical output: OROS Survey (Employment, earnings and social security
contributions) based on the Social Security administrative data.
B. Relevant units: small enterprises with employees
D: Steps for calculation: Record periodicity for each source
Let S1 be the Fiscal Register source
Let S2 be the Social Security source
I’S1(4) = 4; I’S2(4) = 4
QI n. 5 – % of common units across two or more administrative sources
Back to indicator 5
Example 1
A. Statistical output: The BR Enterprise units updating/identification
B. Relevant units: The NACE sector = Construction
Table for QI n. 4, Example 1:
| Type of admin data | Frequency of arrival of the admin data (with respect to BR reference year), per year |
| Fiscal Register data | 1 |
| Chamber of Commerce data | 2 |
| Social Security data | 2 |
| Yellow Pages data | 1 |
Table for QI n. 4, Example 2:
| Type of admin data | Frequency of arrival of the admin data, per year |
| Fiscal Register data | 4 |
| Social Security data | 4 |
D. Steps for calculation:
D1. Identify the statistical unit (enterprise) for each source (i.e. group the administrative
records in one source at id code level)
D2. Match all sources with each other by id code
D3. Attribute a “presence(1) / absence(0)” indicator to the unit with regard to the specific
source
D4. Calculate the number of possible pairings between sources (i.e. when there are n
sources, it is the number of combinations of n sources taken k=2 at a time), Cn,k = n!/[(n-k)! * k!]
Suppose there are 4 sources; the number of possible combinations is then C4,2 = 24/(2*2) = 6
D5. Multiply the “presence(1) / absence(0)” indicator to obtain the presence (1) /absence
(0) indicator for each pairwise
D6. Sum up the “presence(1) / absence(0)” indicator at pair level and divide by
Cn,k*#relevant units
Let A be the Social Security source
Let B be the Chamber of Commerce source
Let C be the Yellow Pages source
Let D be the Fiscal Register source
Nace Sector = Construction
Presence(1)/Absence(0) of the unit in the source
Ind(XiA)=1 if Xi is present in the source A; Ind(XiA)=0 if Xi is absent in the source A
Num=∑ijind(Xij)=34
Denom=m*Cn,k=10*6=60
I(5)=(Num/Denom)*100=(34/60)*100=57%
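| UNIT | A | B | C | D | AB | AC | AD | BC | BD | CD | Sum |
| X1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| X2 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| X3 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 3 |
| X4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 6 |
| X5 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| X6 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 6 |
| X7 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 6 |
| X8 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| X9 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 3 |
| X10 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 6 |
| Sum | | | | | 5 | 4 | 5 | 5 | 9 | 6 | 34 |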
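Steps D1 to D6 for QI n. 5 can be sketched as follows. The presence flags and turnover values are copied from the Construction example; the function name `common_units_indicator` and the data layout are assumptions made for this illustration only.

```python
from itertools import combinations

# Presence (1) / absence (0) of each unit in sources A, B, C, D
presence = {
    "X1": (0, 0, 1, 1), "X2": (0, 1, 0, 1), "X3": (0, 1, 1, 1),
    "X4": (1, 1, 1, 1), "X5": (0, 1, 0, 1), "X6": (1, 1, 1, 1),
    "X7": (1, 1, 1, 1), "X8": (0, 1, 0, 1), "X9": (1, 1, 0, 1),
    "X10": (1, 1, 1, 1),
}

# Turnover of each unit, used for the weighted version of the indicator
turnover = {
    "X1": 15_020, "X2": 28_340, "X3": 57_812, "X4": 1_167_584,
    "X5": 21_333, "X6": 5_767_853, "X7": 153_000, "X8": 63_021,
    "X9": 184_818, "X10": 56_023,
}


def common_units_indicator(presence, weights=None):
    """Return I(5): % of common units over all pairs of sources."""
    n_sources = len(next(iter(presence.values())))
    pairs = list(combinations(range(n_sources), 2))  # C(n, 2) pairings
    num = 0
    total_weight = 0
    for unit, flags in presence.items():
        w = 1 if weights is None else weights[unit]
        # A unit counts for a pair only if it is present in both sources
        num += w * sum(flags[i] * flags[j] for i, j in pairs)
        total_weight += w
    return 100 * num / (len(pairs) * total_weight)


i5 = common_units_indicator(presence)
i5_weighted = common_units_indicator(presence, weights=turnover)
print(round(i5), round(i5_weighted))
```

With the example data the unweighted numerator is 34 and the denominator is 10*6=60; weighting by turnover gives 43,722,364 over 45,088,824.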
The following picture illustrates the meaning of the result:
And, weighting by Turnover
Num=∑ijind(Xij)*wi=43,722,364
Den= Cn,k*∑wi=6*7,514,804=45,088,824
I(5)=(num/den)*100=97%
QI n. 6 – % of common units when combining admin and survey data
Back to indicator 6
Example 1
A. Statistical output: A sectoral survey
B. Relevant units: The units in the survey(s)
Table for QI n. 5, Example 1 (weighting by turnover):
| UNIT | (2) Turnover | AB*(2) | AC*(2) | AD*(2) | BC*(2) | BD*(2) | CD*(2) | Sum |
| X1 | 15,020 | 0 | 0 | 0 | 0 | 0 | 15,020 | 15,020 |
| X2 | 28,340 | 0 | 0 | 0 | 0 | 28,340 | 0 | 28,340 |
| X3 | 57,812 | 0 | 0 | 0 | 57,812 | 57,812 | 57,812 | 173,436 |
| X4 | 1,167,584 | 1,167,584 | 1,167,584 | 1,167,584 | 1,167,584 | 1,167,584 | 1,167,584 | 7,005,504 |
| X5 | 21,333 | 0 | 0 | 0 | 0 | 21,333 | 0 | 21,333 |
| X6 | 5,767,853 | 5,767,853 | 5,767,853 | 5,767,853 | 5,767,853 | 5,767,853 | 5,767,853 | 34,607,118 |
| X7 | 153,000 | 153,000 | 153,000 | 153,000 | 153,000 | 153,000 | 153,000 | 918,000 |
| X8 | 63,021 | 0 | 0 | 0 | 0 | 63,021 | 0 | 63,021 |
| X9 | 184,818 | 184,818 | 0 | 184,818 | 0 | 184,818 | 0 | 554,454 |
| X10 | 56,023 | 56,023 | 56,023 | 56,023 | 56,023 | 56,023 | 56,023 | 336,138 |
| Sum | 7,514,804 | 7,329,278 | 7,144,460 | 7,329,278 | 7,202,272 | 7,499,784 | 7,217,292 | 43,722,364 |
C. Relevant variables: NACE activity code
D. Steps for calculation:
D1. Match each source with survey(s) by the common id code
D2. Attribute a “presence(1) / absence(0)” indicator to the unit if it belongs at least to a
survey (sum up for obtaining denominator)
D3. Attribute a “presence(1) / absence(0)” indicator to the unit if it belongs both to the
survey and to each source (sum up by source for obtaining numerator)
D4. Calculate the aggregate indicator as follows:
Let A be the Chamber of commerce Source
Let B be the Social security source:
I(6) = [#units(A∩Survey)/#units(Survey) + #units(B∩Survey)/#units(Survey) - #units(A∩B∩Survey)/#units(Survey)] * 100
And, if we have three sources:
Let C be the Yellow pages
I(6) = [#units(A∩Survey)/#units(Survey) + #units(B∩Survey)/#units(Survey) + #units(C∩Survey)/#units(Survey) - #units(A∩B∩Survey)/#units(Survey) - #units(A∩C∩Survey)/#units(Survey) - #units(B∩C∩Survey)/#units(Survey) + #units(A∩B∩C∩Survey)/#units(Survey)] * 100
and so on
I(6)=[(Σind_survey&A+ΣInd_Survey&B+ΣInd_Survey&C)/Σind_survey-(Common A&B&Survey+Common A&C&Survey+Common B&C&Survey)/Σind_survey+Common A&B&C&Survey/Σind_survey]*100=[(3+2+2)/5-(1+1+1)/5+0/5]*100=(4/5)*100=80%
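| Unit | Source A | Source B | Source C | Survey 1 | Survey 2 | (0) Ind_Survey | (1) Ind_survey ∩ A | (2) Ind_survey ∩ B | (3) Ind_survey ∩ C | (4) Common A∩B∩Survey | (5) Common A∩C∩Survey | (6) Common B∩C∩Survey | (7) Common A∩B∩C∩Survey | Turnover |
| X1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 35,147 |
| X2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1,507,231 |
| X3 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 627,432 |
| X4 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 18,150 |
| X5 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 57,442 |
| X6 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 159,630 |
| X7 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 68,000 |
| X8 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 34,123 |
| X9 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 22,365 |
| X10 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 18,130 |
| X11 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 59,458 |
| X12 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 39,658 |
| Sum | | | | | | 5 | 3 | 2 | 2 | 1 | 1 | 1 | 0 | 2,646,766 |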
And, weighting by turnover:
I(6)=[(35,147+627,432+57,442+57,442+159,630+35,147+159,630)/(35,147+627,432+57,442+159,630+18,130)-(57,442+35,147+159,630)/(35,147+627,432+57,442+159,630+18,130)]*100=98%
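The inclusion-exclusion sum used for I(6) can be sketched in Python as below. The flags and turnover values come from the example table (only the three source flags, the survey indicator and the turnover are needed); the function name `survey_coverage` and the tuple layout are assumptions made for this sketch.

```python
from itertools import combinations

# unit: (in source A, in source B, in source C, in at least one survey, turnover)
units = {
    "X1": (1, 0, 1, 1, 35_147), "X2": (1, 1, 0, 0, 1_507_231),
    "X3": (1, 0, 0, 1, 627_432), "X4": (1, 1, 1, 0, 18_150),
    "X5": (1, 1, 0, 1, 57_442), "X6": (0, 1, 1, 1, 159_630),
    "X7": (1, 0, 0, 0, 68_000), "X8": (0, 1, 0, 0, 34_123),
    "X9": (1, 0, 1, 0, 22_365), "X10": (0, 0, 0, 1, 18_130),
    "X11": (1, 0, 0, 0, 59_458), "X12": (1, 0, 0, 0, 39_658),
}


def survey_coverage(units, weighted=False, n_sources=3):
    """Return I(6) by inclusion-exclusion over the admin sources."""
    survey_units = [u for u in units.values() if u[3] == 1]

    def size(rows):
        # Number of units, or their total turnover when weighting
        return sum(r[4] for r in rows) if weighted else len(rows)

    total = 0
    for k in range(1, n_sources + 1):
        sign = 1 if k % 2 == 1 else -1  # add singles, subtract pairs, add triples
        for combo in combinations(range(n_sources), k):
            common = [r for r in survey_units if all(r[i] for i in combo)]
            total += sign * size(common)
    return 100 * total / size(survey_units)


i6 = survey_coverage(units)
i6_weighted = survey_coverage(units, weighted=True)
print(round(i6), round(i6_weighted))
```

The alternating sum equals the share of survey units found in at least one admin source, which is why the example gives 4 out of 5 units.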
QI n. 7 – % of items obtained from admin source and also collected by survey
Back to indicator 7
Example 1
A. Statistical output: A survey on the commerce sector
B. Relevant units: Units in the survey
C. Relevant variables: Economic activity code (NACE) (var1) and legal status (var2)
D. Steps for calculation:
D1. Match each source with survey(s) by the common id code
D2. Attribute a “presence(1) / absence(0)” indicator to items of var1 and var2 in survey
(sum up for obtaining denominator)
D3. Attribute a value of 1 to each item common to the survey and the source, and 0 otherwise (sum up for
obtaining numerator)
D4. Calculate the indicator as follows:
Let CC be the Chamber of Commerce source
Let SBS and GI be two Surveys
Let Var1 be the ATECO (5-digit Italian version of NACE)
Let Var2 be the Legal Status.
I(7) = (# of relevant common items obtained by admin and survey data / # of relevant items in survey(s)) * 100
QI n.8 – % reduction of survey sample size when moving from survey to admin data
Back to indicator 8
A. Statistical output: A sectoral survey
B. Relevant units: Enterprises with commercial area greater than 400 m2
D. Steps for calculation:
D1. Identify sample size before use of admin data
D2. Identify sample size after use of admin data
D3. Calculate the indicator as follows:
Table for QI n. 7, Example 1. Columns: Units; Number of employees; Variable 1 (ATECO): ATECO codes from CC, SBS and GI, # items in surveys - ATECO, Presence/Absence (1/0) of items in source and survey; Variable 2 (Legal status): legal status codes from CC and SBS, # items in surveys - Legal status, Presence/Absence (1/0) of items in source and survey. Codes that are not available for a unit do not appear in its row.
X1 15 46520 46510 46520 1 1 1320 1320 1 1
X2 10 10840 0 0 1220 0 0
X3 150 47112 47111 1 1 1310 1310 1 1
X4 0 68200 47112 1 1 1310 0 0
X5 28 47113 0 0 1320 0 0
X6 237 47112 47112 1 1 1320 1330 1 1
X7 58 10120 46321 47112 1 1 1320 missing 1 0
X8 76 47112 47111 1 1 1320 1320 1 1
X9 199 47111 47111 47111 1 1 missing 1310 0 0
X10 15 46411 0 0 1330 0 0
X11 0 47114 1 0 0 0
X12 11 47112 0 0 1310 0 0
Sum 799 8 7 5 4
Sample size before increase in use of admin data 1482
Sample size after increase in use of admin data 950
I7(ATECO) = 7/8 * 100% = 87%
I7w(ATECO) = (15 + 150 + 0 + 237 + 58 + 76 + 199)/( 15 + 150 + 0 + 237 + 58 + 76 +
199 + 0)*100% = 100%
I7(Legal status) = 4/5 * 100% = 80%
I7w(Legal status) = (15 + 150 + 237 + 76)/(15 + 150 + 237 + 58 + 76) * 100% = 89%
I(8) = [(Sample size before increase in use of admin data - Sample size after) / Sample size before increase in use of admin data] * 100 = [(1482 - 950) / 1482] * 100 = 35.9%
QI n. 9 – Item non-response (% of units with missing values for key variables)
Back to indicator 9
Example 1
A. Statistical output: BR for SBS
B. Relevant units: Units with 100+ employees
C. Relevant variables: number of employees
D. Steps for calculation:
D1. from BR take the population of units with 100+ employees
D2. Match source A with BR100+ by the common id code
D3. Calculate number of common units in A with missing value
D4. Calculate the indicator as follows:
Let A be the Social Security source:
I(9) = (# units in source A with missing value for employees / # relevant units) * 100
QI n. 10 – Misclassification rate
Back to indicator 10
Example 1
A. Statistical output: BR unit
B. Relevant units: Units in Construction sector
C. Relevant variables: Economic Activity code (NACE, 4 digits or 3 digits)
E. Tolerance: “consistency” means equal NACE at 4 digits
D. Steps for calculation:
Table for QI n. 9, Example 1:
| | n. units | n. employees |
| (num) n. relevant units in A source with missing data | 10 | 19,691 |
| (den) n. relevant units "enterprises with 100+ employees" | 12,277 | 5,394,055 |
I(9)= num/den%=(10/12,277)*100= 0.08%
weighted for employment
I(9)w=num/den%=(19,691/5,394,055)*100= 0.37%
D1. Match each source (VAT (file of “Value – Added Taxes” model) and/or CCIAA (file of
declaration to ‘Chambers of Commerce’)) with BR by the common id code
D2. Attribute a “presence(1) / absence(0)” indicator to items of variable in each admin data
(sum up for obtaining denominator)
D3. Attribute a value of 1 to each item that is inconsistent between BR and the source, and 0 to each consistent item
(sum up for obtaining numerator)
D4. Calculate the indicator (simple or aggregated, weighted or not weighted) as follows:
I(10) = [# inconsistencies of variable (NACE 4 or 3 digit) between BR and source (VAT or CCIAA) / # items in VAT or CCIAA source] * 100
I(10)VAT=(2/3)*100=67%
I(10)VATw=(2+5.42)/(2+16+5.42)*100=32%
I(10)CCIAA=(3/5)*100=60%
I(10)CCIAA w=(2+1+5.42)/(2+2.25+1+16+5.42)*100=32%
I(10)aggregated VAT – CCIAA =(67*3+60*5)/(3+5)=63%
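The misclassification rates above can be reproduced with a short Python sketch. The codes and persons employed are taken from the example table for QI n. 10. The comparison rule in `consistent` (compare the leading digits available in both codes, up to 4) is an assumption chosen here because it reproduces the flags in the table; it is not stated as such in the text. All names are illustrative.

```python
# unit: (NACE in BR, NACE in VAT source, NACE in CCIAA source, persons employed)
# None means the unit has no item in that source.
units = {
    "X1": ("43910", "41200", "412", 2.0),
    "X2": ("41200", None, None, 1.0),
    "X3": ("41200", None, "412", 2.25),
    "X4": ("41200", None, None, 2.0),
    "X5": ("43390", None, "41", 1.0),
    "X6": ("43290", "43290", "43290", 16.0),
    "X7": ("43220", None, None, 1.0),
    "X8": ("43120", "41200", "412", 5.42),
    "X9": ("432", None, None, 1.0),
    "X10": ("43290", None, None, 1.0),
}


def consistent(br_code, source_code, max_digits=4):
    """Assumed rule: codes agree on the digits both provide (at most 4)."""
    n = min(len(br_code), len(source_code), max_digits)
    return br_code[:n] == source_code[:n]


def misclassification_rate(units, source, weighted=False):
    """Return I(10) in % for source 'VAT' or 'CCIAA'."""
    idx = {"VAT": 1, "CCIAA": 2}[source]
    num = 0.0
    den = 0.0
    for row in units.values():
        code = row[idx]
        if code is None:
            continue  # no item for this unit in the source
        w = row[3] if weighted else 1.0
        den += w
        if not consistent(row[0], code):
            num += w
    return 100 * num / den


for src in ("VAT", "CCIAA"):
    print(src, round(misclassification_rate(units, src)),
          round(misclassification_rate(units, src, weighted=True)))
```

The aggregated indicator in the text then averages the source-level rates, weighted by the number of items in each source (3 for VAT, 5 for CCIAA).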
QI n. 11 – Undercoverage
Back to indicator 11
Example 1
A. Statistical output: BR for SBS
B. Relevant units: Units with 100+ employees
C. Relevant variables: number of employees
D. Steps for calculation:
D1. From BR take the population of units with 100+ employees
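Table for QI n. 10, Example 1:
| Unit | NACE-BR | NACE-Source VAT | # items in VAT Source | Inconsistency VAT-BR (4 digits) | NACE-Source CCIAA | # items in CCIAA Source | Inconsistency CCIAA-BR (4 digits) | Persons employed |
| X1 | 43910 | 41200 | 1 | 1 | 412 | 1 | 1 | 2 |
| X2 | 41200 | | 0 | | | 0 | | 1 |
| X3 | 41200 | | 0 | | 412 | 1 | 0 | 2.25 |
| X4 | 41200 | | 0 | | | 0 | | 2 |
| X5 | 43390 | | 0 | | 41 | 1 | 1 | 1 |
| X6 | 43290 | 43290 | 1 | 0 | 43290 | 1 | 0 | 16 |
| X7 | 43220 | | 0 | | | 0 | | 1 |
| X8 | 43120 | 41200 | 1 | 1 | 412 | 1 | 1 | 5.42 |
| X9 | 432 | | 0 | | | 0 | | 1 |
| X10 | 43290 | | 0 | | | 0 | | 1 |
| Sum | | | 3 | 2 | | 5 | 3 | 32.67 |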
D2. Match source A with BR100+ by the common id code
D3. Calculate number of units in relevant population BUT not present in A
D4. Calculate the indicator as follows:
Let A be the Social Security source:
I(11) = (# relevant units NOT in source A / # relevant units) * 100
QI n. 12 – Overcoverage
Back to indicator 12
Example 1
A. Statistical output: BR enterprises with turnover less than 7,500,000 Euro
B. Relevant units: Units with turnover less than 7,500,000 Euro
C. Relevant variables: Any variable of interest
D. Steps for calculation: Source: Statistics – Based tax assessment (SBTASS), a survey
managed by the Italian Tax Authority
D1. Match each source (SBTASS) with BR by the common id code
D2. Identify relevant units
D3. Calculate number of units BR∩SBTASS out of scope, that is units with turnover greater
than 7,500,000 Euro
D4. Calculate the indicator as follows:
I(12) = (N. of relevant units in admin data but not in reference population / N. of relevant units in reference population) * 100
Table for QI n. 11, Example 1:
| | n. units | n. employees |
| (num) n. relevant units not included in A source | 29 | 44,173 |
| (den) n. relevant units "enterprises with 100+ employees" | 12,277 | 5,394,055 |
I(11)= num/den%=(29/12,277)*100= 0.24%
weighted for employment
I(11)w= num/den%=(44,173/5,394,055)*100= 0.82%
QI n.13 - % of units in the admin source for which reference period differs from the required
reference period
Back to indicator 13
Example 1
A. Statistical output: The BR Enterprise units
B. Relevant units: Corporations in Admin data with different reporting period from
required BR period
D: Steps for calculation:
D1. From Balance Sheet source take all corporations with different reporting period with
respect to the required BR period. Required BR period is 01.01.2009-31.12.2009 while a
different reporting period, for example, is 30.06.2008-30.06.2009
D2. Match all BR corporations with Balance Sheet by the common id code
D3. Calculate the indicator as follows:
Let A be the Balance Sheet source
I(13)=(Num/Den)*100=(11,341/607,899)*100=1.87%
Num: No of relevant corporations with a reporting period different from the required BR period;
Den: No of relevant corporations in BR.
I(13)w(E)=[Employees(2)/Employees(1)]*100=(410,777/7,980,361)*100=5.15%
I(13) w (T)=[Turnover(2)/Turnover(1)]*100=(140,954,727,363/2,088,997,442,877)*100=6.75%
Table for QI n. 12, Example 1:
| | Units | n. pers. empl. |
| (num) - BR∩SBTASS with turnover greater than 7,500,000 Euro | 2,074 | 63,279.05 |
| (den) - BR with turnover less than 7,500,000 Euro (excluding missing value) | 3,305,497 | 9,076,661.77 |
I(12) = 2,074 / 3,305,497*100=0.06%
I(12)w (for pers. empl) =( 63,279.05 / 9,076,661.77)*100=0.70%
Table for QI n. 13, Example 1:
| | Units | Employees | Turnover |
| No of BR corporations present in Balance Sheet source (1) | 607,899 | 7,980,361 | 2,088,997,442,877 |
| No of BR corporations present in Balance Sheet for which reference period is different from BR required reference period (2) | 11,341 | 410,777 | 140,954,727,363 |
QI n.14 – Size of revisions from the different versions of the admin data RAR – Relative
Absolute Revisions
Back to indicator 14
Example 1
A. Statistical output: The BR Enterprise units updating/identification.
B. Relevant units: Units with 100 or more employees in a ATECO (5-digits, Italian
version of NACE) activity code.
C. Relevant variables: Number of employees.
D. Steps for calculation:
D1. Identify the statistical unit (enterprise) in the first and in the second version of data
coming from the same source
D2. Take the units with 100 or more employees which are included in the ATECO activity
code.
D3. Take the non missing values (XPt) from the first data version;
D4. Take the non missing values (XLt) from the second data version for the same units
received in the first data version
D5. Calculate the difference (absolute value) between the latest data and the first data
version for each unit;
D6. sum up the differences and divide it by the sum of the absolute values of the first data.
D7. Calculate the indicator as follows:
I(14) = [Σt=1..T |XLt - XPt| / Σt=1..T |XPt|] * 100
| Unit | Number of employees in the first data (1) | Number of employees in the second data (2) | Absolute values of (2)-(1) | Turnover |
| X1 | 150 | 150 | 0 | 150,322 |
| X2 | 227 | 227 | 0 | 273,200 |
| X3 | 125 | 127 | 2 | 100,233 |
| X4 | 8,023 | 8,218 | 195 | 11,027,323 |
| X5 | 1,312 | 1,315 | 3 | 7,182,325 |
| X6 | Absent | Absent | | |
| X7 | 58 | 123 | 65 | 78,000 |
| X8 | 887 | 887 | 0 | 532,233 |
| X9 | 24 | 118 | 94 | 21,452 |
| X10 | 533 | 533 | 0 | 1,125,328 |
| Total | 11,339 | 11,698 | 359 | 20,490,416 |
I(14)=[(|150-150|+|227-227|+|127-125|+|8,218-8,023|+|1,315-1,312|+|123-58|+|887-887|+|118-24|+|533-533|)/(|150+227+125+8,023+1,312+58+887+24+533|)]*100=(359/11,339)*100=3%
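A minimal Python sketch of the Relative Absolute Revisions calculation follows, using the employee counts from the two data versions in the example. Unit X6 is absent from both versions and is therefore left out. The function name `relative_absolute_revision` is an assumption made for this illustration.

```python
# Number of employees per unit in the first and in the second data version
first_version = {
    "X1": 150, "X2": 227, "X3": 125, "X4": 8_023, "X5": 1_312,
    "X7": 58, "X8": 887, "X9": 24, "X10": 533,
}
second_version = {
    "X1": 150, "X2": 227, "X3": 127, "X4": 8_218, "X5": 1_315,
    "X7": 123, "X8": 887, "X9": 118, "X10": 533,
}


def relative_absolute_revision(first, second):
    """Return I(14): sum of |latest - first| over sum of |first|, in %."""
    # Only units received in both versions are compared (steps D3 and D4)
    common = [u for u in first if u in second]
    num = sum(abs(second[u] - first[u]) for u in common)
    den = sum(abs(first[u]) for u in common)
    return 100 * num / den


i14 = relative_absolute_revision(first_version, second_version)
print(round(i14, 1))
```

With the example data the numerator is 359 and the denominator is 11,339, which the text rounds to 3%.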
QI n.15 – % of units in admin data which fail checks
Back to indicator 15
Example 1
A. Statistical output: The BR Enterprise updating/identification.
B. Relevant units: all the units in the register
C. Relevant variables: The key variables: NACE activity Code; state of activity; number of
employees.
D. Steps for calculation:
D1: calculate for each key variable the number of units that come from admin data;
D2: Identify for each key variable the number of units that fail checks and come from admin
data;
D3. Average the proportions of units that fail checks by weighting by the numbers of units.
Let A be the NACE activity code;
Let B be the state of activity;
Let C be the number of employees.
I(15)=(190,080/10,589,670)*100=1.8%
I(15)w=(346,657,095,921/4,182,029,063,563)*100=8.3%
QI n.16 - % of units for which data have been adjusted
Back to indicator 16
Example 1
A. Statistical output: The BR Enterprise updating/identification.
B. Relevant units: all the units in the register
Table for QI n. 15, Example 1:
| Variables | (1) Number of units with admin data which fail checks | (2) Turnover of (1) | (3) Number of units with admin data | (4) Turnover of (3) | [(1)/(3)]*100 % of units in admin data which fail checks | [(2)/(4)]*100 Weighted by turnover |
| A | 60,277 | 71,131,871,029 | 4,486,410 | 1,494,887,680,425 | 1.3 | 4.8 |
| B | 48,505 | 15,367,168,813 | 4,497,993 | 1,518,016,883,187 | 1.1 | 1.0 |
| C | 81,298 | 260,158,056,079 | 1,605,267 | 1,169,124,499,951 | 5.1 | 22.3 |
| Total | 190,080 | 346,657,095,921 | 10,589,670 | 4,182,029,063,563 | 1.8 | 8.3 |
C. Relevant variables: The key variables NACE activity Code; state of activity;
number of employees.
D. Steps for calculation:
D1: calculate for each key variable the number of units that come from admin data;
D2: Identify for each key variable the number of units for which data have been
adjusted;
D3: For each variable, divide the number of units of D2 by the number of units of D1;
D4: Average the simple indexes weighting by the units of D1.
Let A be the NACE activity code;
Let B be the state of activity;
Let C be the number of employees.
I(16) =(148,539/10,589,670)*100=1.4%
I(16)w=(322,369,980,306/4,182,029,063,563)*100=7.7%
QI n.17 – % of imputed values (items) in the admin data
Back to indicator 17
Example 1
A. Statistical output: The results of a sectoral survey.
B. Relevant units: all the units in a specific NACE activity code.
C. Relevant variables: The variables NACE activity Code; number of employees;
turnover.
D. Steps for calculation:
D1: For each source identify the variables which are used for the statistical output.
D2. For each variable in the source calculate the number of items in admin data.
D3. For each variable in the source identify all the units with items present in admin data
which are afterwards imputed;
D4: For each variable in the source calculate the non missing items in the statistical output.
D5. For each variable calculate the proportion of D3 on D2;
Table for QI n. 16, Example 1:
| Variables | (1) Number of units with admin data for which data have been adjusted | (2) Turnover of (1) | (3) Number of units with admin data | (4) Turnover of (3) | [(1)/(3)]*100 % of units in admin data for which data have been adjusted | [(2)/(4)]*100 % weighted by turnover |
| A | 18,736 | 46,844,755,414 | 4,486,410 | 1,494,887,680,425 | 0.4 | 3.1 |
| B | 48,505 | 15,367,168,813 | 4,497,993 | 1,518,016,883,187 | 1.1 | 1.0 |
| C | 81,298 | 260,158,056,079 | 1,605,267 | 1,169,124,499,951 | 5.1 | 22.3 |
| Total | 148,539 | 322,369,980,306 | 10,589,670 | 4,182,029,063,563 | 1.4 | 7.7 |
D6. Calculate the indicator for each source weighting the proportions with the items of D4.
D7. Calculate the general indicator weighting the indicators of D6 for the data.
Percentage of units in Source A with Nace activity codes imputed: (3/10)*100=30%
Percentage of units in Source A with number of employees imputed: (1/10)*100=10%
Percentage of units in Source B with Nace activity codes imputed: (2/8)*100=25%
Percentage of units in Source B with Turnover imputed: (1/8)*100=12.5%
I(17)Source A=[(3+1)/(10+10)]*100=20%
I(17)Source B=[(2+1)/(8+8)]*100=18.75%
I(17)Sources A and B=(20*10+18.75*8)/(10+8)=19.4%
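The two-level aggregation for I(17) (first within each source, then across sources) can be sketched as follows. The counts are those of the example; the dictionary layout and the function names are assumptions made for this sketch.

```python
# For each source and variable: (number of imputed items, number of items)
imputed_counts = {
    "A": {"NACE": (3, 10), "employees": (1, 10)},
    "B": {"NACE": (2, 8), "turnover": (1, 8)},
}
# Number of units in each source, used as weights in the overall indicator
units_per_source = {"A": 10, "B": 8}


def source_indicator(variables):
    """Return the % of imputed items for one source (step D6)."""
    imputed = sum(i for i, _ in variables.values())
    items = sum(n for _, n in variables.values())
    return 100 * imputed / items


def overall_indicator(counts, weights):
    """Return the weighted average of the source indicators (step D7)."""
    total = sum(source_indicator(counts[s]) * weights[s] for s in counts)
    return total / sum(weights[s] for s in counts)


i17_a = source_indicator(imputed_counts["A"])
i17_b = source_indicator(imputed_counts["B"])
i17 = overall_indicator(imputed_counts, units_per_source)
print(i17_a, i17_b, round(i17, 1))
```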
QI n.18 – Delay to accessing/receiving data from admin source
Back to indicator 18
Example 1
A. Statistical output: The BR Enterprise units updating/identification
B. Relevant units: 10+ employees enterprises (relevant for a specific survey or as
base for the HG firms)
D: Steps for calculation:
D1. from BR take the population of units with 10+ employees
D2. Match each source with BR10+ by the common id code obtaining the number of
common units;
D3. Calculate for each source the number of months from the end of the reference period
to the arrival of Admin data;
Table for QI n. 17, Example 1:
| Units | Nace activity code source A | Nace activity code source A afterwards imputed | Number of employees source A | Number of employees source A afterwards imputed | Nace activity code source B | Nace activity code source B afterwards imputed | Turnover Source B | Turnover Source B afterwards imputed |
| X1 | 16231 | | 2 | | Absent in the source | | Absent in the source | |
| X2 | 16232 | | 0 | | 16232 | 16231 | 80,305 | |
| X3 | missing | 16231 | 3 | | 16231 | | 127,118 | |
| X4 | 16231 | | 0 | | Absent in the source | | Absent in the source | |
| X5 | 17110 | 16231 | 10 | | 16231 | | 335,550 | |
| X6 | 16231 | | 15 | | missing | | 25,332 | |
| X7 | 16231 | | 1 | | 47112 | 16231 | 118,125 | |
| X8 | missing | 16231 | 0 | 5 | 16231 | | 63,212 | |
| X9 | 16231 | | 0 | | 16231 | | missing | 7550 |
| X10 | 16291 | | 0 | | 16291 | | 18,123 | |
| Number of units | 10 | 3 | 10 | 1 | 8 | 2 | 8 | 1 |
D4. Calculate the number of months from the end of the reference period to the
dissemination date
D5. Calculate the indicator as follows
Let A be the Fiscal Register source;
Let B be the Archive of the Chamber of Commerce;
Let C be the Social Security source;
Let D be the Yellow Pages
Numerator:
For each source divide the number of months from the end of the reference period to the arrival
of Admin Data by the number of months from the end of reference period to publication date;
then sum up the source indicators.
Source A: (6/15)*100=40%;
Source B: (8/15)*100=53.3%
Source C: (6/15)*100=40%
Source D: (0/15)*100=0%
Denominator: the number of sources;
I(18)=num/den=(40%+53.3%+40%+0%)/4=33.3%
Weighted indicator: (weighted by the contribution of each source to the final result)
I(18)=(40%*186,605+53.3%*184,356+40%*186,549+0%*121,759)/(186,605+184,356+186,549+121,759)=36.4%
Weighting by turnover:
I(18)=(40%*2,081,725,436,855+53.3%*2,076,137,208,748+40%*2,081,236,000,674+0%*1,596,790,158,244)/(2,081,725,436,855+2,076,137,208,748+2,081,236,000,674+1,596,790,158,244)=35.37%
| Source | Common units in source and relevant units | Months from the end of reference period to receiving admin data | Months from the end of reference period to publication date | Turnover |
| A | 186,605 | 6 | 15 | 2,081,725,436,855 |
| B | 184,356 | 8 | 15 | 2,076,137,208,748 |
| C | 186,549 | 6 | 15 | 2,081,236,000,674 |
| D | 121,759 | 0 | 15 | 1,596,790,158,244 |
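The simple and weighted versions of I(18) can be sketched in Python from the figures in the example. The function name `delay_indicator` and the tuple layout are assumptions made for this illustration. The sketch uses the exact ratio 8/15 for source B rather than the rounded 53.3%, so the last digit of the turnover-weighted result may differ slightly from the figure in the text.

```python
# source: (months to receiving admin data, months to publication,
#          common units, turnover)
sources = {
    "A": (6, 15, 186_605, 2_081_725_436_855),
    "B": (8, 15, 184_356, 2_076_137_208_748),
    "C": (6, 15, 186_549, 2_081_236_000_674),
    "D": (0, 15, 121_759, 1_596_790_158_244),
}


def delay_indicator(sources, weight_index=None):
    """Return I(18) in %.

    weight_index=None gives the simple average over sources;
    weight_index=2 weights by common units, weight_index=3 by turnover.
    """
    ratios = {s: 100 * v[0] / v[1] for s, v in sources.items()}
    if weight_index is None:
        return sum(ratios.values()) / len(ratios)
    total_weight = sum(v[weight_index] for v in sources.values())
    return sum(ratios[s] * sources[s][weight_index] for s in sources) / total_weight


i18 = delay_indicator(sources)
i18_units = delay_indicator(sources, weight_index=2)
i18_turnover = delay_indicator(sources, weight_index=3)
print(round(i18, 1), round(i18_units, 1), round(i18_turnover, 1))
```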
QI n.19 – Discontinuity in estimate when moving from a survey-based output to an admin-based output
Back to indicator 19
A. Statistical output: A sectoral survey
B. Relevant units: Enterprises of construction section (NACE Rev2, Section F)
C. Relevant variables: Number of employees.
D. Steps for calculations:
D1. Compute the estimate of the variable(s) for the survey-based
output
D2. Compute the estimate of the variable(s) for the admin-based
output
D3. Calculate the indicator as follows:
Estimate of total number of employees for the enterprises of Section F using admin data: 1,158,542
Estimate of total number of employees for the enterprises of Section F using survey data: 1,152,487
I(19) = [(Estimate from admin data - Estimate from survey) / Estimate from survey] * 100 = [(1,158,542 - 1,152,487) / 1,152,487] * 100 = 0.5%
This indicates that the admin-based output will be 0.5% higher than the survey-based output.
QI n. 20 – % of consistent items for common variables in more than one source
Back to indicator 20
Example 1
A. Statistical output: A sectoral survey
B. Relevant units: Units in the survey(s)
C. Relevant variables: ATECO, 5-Digits, Italian version of NACE (var1) and legal status
(var2)
E. Tolerance: ATECO (var1) equal at 4 digits; and legal status (var2) equal at 4
digits
D. Steps for calculation:
D1. Match each source with survey(s) by the common id code
D2. Attribute a “presence(1) / absence(0)” indicator to items of var1 and var2 in survey
(sum up for obtaining denominator)
D3. Attribute a value of 1 to each item that is consistent between the survey and the source, and 0 otherwise (an item is
considered consistent if var1=var(survey)) (sum up for obtaining numerator)
D4. Calculate the indicator as follows
$$I(20) = \frac{\#\ \text{consistent items in A and Survey}}{\#\ \text{items in survey}} \times 100 \quad \text{(calculated for var1 and for var2)}$$
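Steps D1 to D4 can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `pct_consistent` and the dictionary layout are hypothetical, the codes are transcribed from the table of this example, unit X11 is taken to appear in the surveys only, and treating an item as consistent when the source code agrees with at least one survey code at 4 digits is an assumption inferred from the rows of that table.

```python
def pct_consistent(source: dict, survey: dict, digits: int = 4) -> float:
    """I(20): share (in %) of survey items whose code agrees with the admin source."""
    n_items = 0        # D2: items present in the survey(s) -> denominator
    n_consistent = 0   # D3: items consistent with the source -> numerator
    for unit, survey_codes in survey.items():   # D1: match on the common id code
        n_items += 1
        source_code = source.get(unit)
        if source_code is None:
            continue  # no source record to compare against
        # Assumption: agreement with at least one survey code counts as consistent
        if any(code[:digits] == source_code[:digits] for code in survey_codes):
            n_consistent += 1
    if n_items == 0:
        raise ValueError("no items in the survey")
    return n_consistent / n_items * 100          # D4


# Variable 1: ATECO (Source A versus the survey values, SBS and/or GI)
source_ateco = {"X1": "27200", "X2": "10840", "X3": "68200", "X4": "68100",
                "X5": "27330", "X6": "47112", "X7": "47113", "X8": "68200",
                "X9": "47114", "X10": "47113"}
survey_ateco = {"X1": ["27200", "27320"], "X3": ["47112"], "X5": ["28111"],
                "X6": ["47112", "47112"], "X10": ["68200", "47113"],
                "X11": ["47114"]}

# Variable 2: legal status (Source A versus SBS; X10 is missing in SBS)
source_legal = {"X1": "1320", "X2": "1120", "X3": "1440", "X4": "1120",
                "X5": "1120", "X6": "1330", "X7": "1320", "X8": "1120",
                "X9": "1210", "X10": "1220"}
survey_legal = {"X1": ["1320"], "X3": ["1440"], "X6": ["1320"]}

i20_ateco = pct_consistent(source_ateco, survey_ateco)
i20_legal = pct_consistent(source_legal, survey_legal)
print(f"I(20) ATECO = {i20_ateco:.1f}%, legal status = {i20_legal:.1f}%")
```

With these inputs the sketch gives 3 consistent items out of 6 for ATECO and 2 out of 3 for legal status, which matches the totals of the example table.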
QI n. 21 – % of relevant units in admin data which have to be adjusted to create statistical units
Example 1
A. Statistical output: A sectoral survey
B. Relevant units: Enterprises with commercial area greater than 400 m2
C. Relevant variables: Retail area of the enterprise.
D. Steps for calculation:
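Table for QI n. 20, Example 1. Variable 1: ATECO; Variable 2: Legal status.

| Unit | Source A ATECO (national classification for NACE) | ATECO in surveys (SBS; GI) | # items in surveys (ATECO) | Consistent (ATECO) | Source A legal status | SBS legal status | # items in surveys (legal status) | Consistent (legal status) |
|---|---|---|---|---|---|---|---|---|
| X1 | 27200 | 27200; 27320 | 1 | 1 | 1320 | 1320 | 1 | 1 |
| X2 | 10840 | | 0 | | 1120 | | 0 | |
| X3 | 68200 | 47112 | 1 | 0 | 1440 | 1440 | 1 | 1 |
| X4 | 68100 | | 0 | | 1120 | | 0 | |
| X5 | 27330 | 28111 | 1 | 0 | 1120 | | 0 | |
| X6 | 47112 | 47112; 47112 | 1 | 1 | 1330 | 1320 | 1 | 0 |
| X7 | 47113 | | 0 | | 1320 | | 0 | |
| X8 | 68200 | | 0 | | 1120 | | 0 | |
| X9 | 47114 | | 0 | | 1210 | | 0 | |
| X10 | 47113 | 68200; 47113 | 1 | 1 | 1220 | missing | 0 | |
| X11 | | 47114 | 1 | | | | | |
| Total | | | 6 | 3 | | | 3 | 2 |

I(20) for ATECO = (n. of consistent items in admin and survey data for ATECO / n. of items in survey for ATECO) × 100 = 3/6 × 100 = 50.0%
I(20) for legal status = (n. of consistent items in admin and survey data for legal status / n. of items in survey for legal status) × 100 = 2/3 × 100 = 66.7%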
D1. Identify the units in the admin data which need to be adjusted in order to obtain the relevant statistical units
D2. Identify the relevant units in admin data that correspond to the statistical concepts.
D3. Divide #D1 by (#D1+#D2)
Let S1 be an admin database on commerce (e.g. Nielsen data) in which the admin units do not correspond to the statistical concept (the enterprise), i.e. n admin units (n ≥ 1) correspond to one enterprise.
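$$I(21) = \frac{\text{Relevant units that have been adjusted to the statistical concepts}}{\text{Relevant units that have been adjusted} + \text{relevant units corresponding to the statistical units}} \times 100 = \frac{400}{400 + 3{,}523} \times 100 = 10.2\%$$

Weighted by retail area (m2):
$$I(21)_w = \frac{257{,}428}{257{,}428 + 1{,}755{,}450} \times 100 = 12.8\%$$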
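Step D3 applies the same ratio to the unit counts and to the retail area in m2, so one small function covers both the unweighted and the weighted indicator. The sketch below is illustrative only: the function name `pct_adjusted` is an assumption, and the inputs are the figures of this example (enterprises with more than one local unit are the ones that need adjusting).

```python
def pct_adjusted(adjusted: float, corresponding: float) -> float:
    """I(21): share (in %) of relevant units that had to be adjusted (#D1 / (#D1 + #D2))."""
    total = adjusted + corresponding
    if total == 0:
        raise ValueError("no relevant units")
    return adjusted / total * 100


# Unweighted: number of enterprises (more than one local unit vs. only one)
i21 = pct_adjusted(400, 3_523)

# Weighted: the same split measured in m2 of retail area
i21_w = pct_adjusted(257_428, 1_755_450)

print(f"I(21) = {i21:.1f}%, I(21)w = {i21_w:.1f}%")
```

Comparing the two values shows whether the adjusted units carry more or less weight in the variable of interest than their share of the unit count suggests.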
QI n. 22 – Cost of converting admin data to statistical data
Example 1
A. Statistical output: A sectoral survey
B. Relevant units: Enterprises with commercial area greater than 400 m2
D. Steps for calculation:
D1. Identify the time in person hours necessary to convert the admin data in order to obtain statistical data, as a function of admin source size and complexity in the treatment of admin data.
Let c1 = number of records in admin data
Let c2 = number of records processed per hour = complexity coefficient
Data for QI n. 21:

| | Units | m2 |
|---|---|---|
| Number of enterprises with more than one local unit in the admin data | 400 | 257,428 |
| Number of enterprises with only one local unit in the admin data | 3,523 | 1,755,450 |
I(22) = Cost of conversion in person hours = f(# of records in admin data, # of records processed per hour)
$$I(22) = \frac{c_1}{c_2} = \text{Number of person hours} = \frac{3{,}000}{83} \approx 36\ \text{hours}$$
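The example takes the cost function f to be the simple ratio c1/c2. A minimal sketch of that special case follows; the function name `conversion_cost_hours` is an assumption, and other forms of f are possible when the complexity of the treatment is not captured by a single records-per-hour coefficient.

```python
def conversion_cost_hours(n_records: float, records_per_hour: float) -> float:
    """I(22): person hours needed to convert the admin data, taken here as c1 / c2."""
    if records_per_hour <= 0:
        raise ValueError("records processed per hour must be positive")
    return n_records / records_per_hour


c1 = 3_000  # number of records in the admin data (from the example)
c2 = 83     # records processed per hour, i.e. the complexity coefficient

i22 = conversion_cost_hours(c1, c2)
print(f"I(22) = {i22:.0f} person hours")
```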
QI n. 23 – Efficiency gain in using admin data
Example 1
A. Statistical output: A survey on small enterprises
B. Relevant units: Corporations with 10 or fewer employees.
D. Steps for calculation:
D1. Quantify the cost of the survey-based statistic (total cost of the survey, including questionnaires, mailing, re-contacting, staff, etc.)
D2. Quantify the cost of the statistic when based on admin data (balance sheets): cost of admin source acquisition, processing costs, staff, etc.
Production cost of survey-based statistic: 36,150
Production cost of admin-based statistic: 22,500
$$I(23) = \frac{\text{Production cost of survey-based statistic} - \text{Production cost of admin-based statistic}}{\text{Production cost of survey-based statistic}} \times 100 = \frac{36{,}150 - 22{,}500}{36{,}150} \times 100 = 37.8\%$$
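The efficiency gain can be checked with a short calculation. This is an illustrative sketch: the function name `efficiency_gain_pct` is an assumption, and the two costs are the production costs quoted in this example, in whatever currency unit the costs were recorded.

```python
def efficiency_gain_pct(survey_cost: float, admin_cost: float) -> float:
    """I(23): cost saving (in %) of the admin-based statistic relative to the survey-based one."""
    if survey_cost <= 0:
        raise ValueError("survey cost must be positive")
    return (survey_cost - admin_cost) / survey_cost * 100


survey_cost = 36_150  # D1: production cost of the survey-based statistic
admin_cost = 22_500   # D2: production cost of the admin-based statistic

i23 = efficiency_gain_pct(survey_cost, admin_cost)
print(f"I(23) = {i23:.1f}%")
```

A negative result would indicate that the admin-based statistic is more expensive to produce than the survey-based one.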