Q2010 course on “Quality Reporting and Metadata”, May 2010, Helsinki. August Götzfried and Eva...
TRANSCRIPT
SESSION 2 09:45 – 10:15
THE ESS FRAMEWORK FOR QUALITY REPORTING
(Code of Practice, Statistical Law, ESQR, DatQAM, etc.)
Eva Elvers
2
European Statistics Code of Practice
June 2004: Council invited COM to make a proposal by June 2005 to develop minimum European standards on the independence, integrity and accountability of the European Statistical System
February 2005: adoption of the Code by the SPC based on the proposal of its Task Force
25 May 2005: COM adoption of a Communication and a Recommendation on the independence, integrity and accountability of the national and Community statistical authorities incl. the Code of Practice (CoP)
3
European Statistics Code of Practice
CoP has two aims:
Improving trust and confidence in the independence, integrity and accountability of both National Statistical Authorities and Eurostat, and in the credibility and quality of the statistics they produce and disseminate (i.e. an external focus);
Promoting the application of best international statistical principles, methods and practices by all producers of European Statistics to enhance their quality (i.e. an internal focus).
4
European Statistics Code of Practice
15 Principles addressing the institutional environment, the statistical processes and their outputs (inspired by existing international standards and the ESS quality definition)
self-regulation of NSIs and Eurostat
indicators to provide a reference for periodical reviews of the implementation of the Code
5
European Statistics Code of Practice
Institutional environment
• Principle 1: Professional Independence
• Principle 2: Mandate for data collection
• Principle 3: Adequacy of Resources
• Principle 4: Quality Commitment
• Principle 5: Statistical Confidentiality
• Principle 6: Impartiality and Objectivity
Example of indicators:
I. The mandate to collect information for the production and dissemination of official statistics is specified in law.
II. The statistical authority is allowed by national legislation to use administrative records for statistical purposes.
III. On the basis of a legal act, the statistical authority may compel response to statistical surveys.
6
European Statistics Code of Practice
Statistical Processes
• Principle 7: Sound Methodology
• Principle 8: Appropriate Statistical Procedures
• Principle 9: Non-Excessive Burden on Respondents
• Principle 10: Cost Effectiveness
Example of indicators:
I. Where European Statistics are based on administrative data, the definitions and concepts used for the administrative purpose must be a good approximation to those required for statistical purposes.
II. In case of statistical surveys, questionnaires are systematically tested prior to the data collection.
III. Survey designs, sample selections, and sample weights are well based and regularly reviewed, revised or updated as required.
IV. Field operations, data entry, and coding are routinely monitored and revised as required.
V. Appropriate editing and imputation computer systems are used and regularly reviewed, revised or updated as required.
VI. Revisions follow standard, well-established and transparent procedures.
7
European Statistics Code of Practice
Statistical Output
• Principle 11: Relevance
• Principle 12: Accuracy and Reliability
• Principle 13: Timeliness and Punctuality
• Principle 14: Coherence and Comparability
• Principle 15: Accessibility and Clarity
Example of indicators:
I. Source data, intermediate results and statistical outputs are assessed and validated.
II. Sampling errors and non-sampling errors are measured and systematically documented according to the framework of the ESS quality components.
III. Studies and analysis of revisions are carried out routinely and used internally to inform statistical processes.
8
UN Frameworks
Fundamental Principles of Official Statistics (1994)
- Sets the basic rules for the statistics producers
- 10 Fundamental Principles to establish a quality management system
Principles Governing International Statistical Activities (2005)
- Further recalls the UN Fundamental Principles and the Declaration of Good Practices in Technical Cooperation in Statistics (1999)
- 10 Principles and 40 Good practices
9
Total Quality Management
LEG on Quality outputs as well as CoP pinpoint two quality aspects:
- Total quality management as a basic quality framework.
- Promoting CBMs in processes and outputs.
Note the connections to ethical issues:
- UN Fundamental Principles of Official Statistics (1994).
- ISI Declaration on Professional Ethics (1985, upd.).
10
Product/ output quality components
OECD: relevance, accuracy, credibility, timeliness (and punctuality), accessibility, interpretability, coherence (within dataset, across datasets, over time, across countries)
Eurostat: relevance, accuracy, timeliness and punctuality, accessibility and clarity, coherence (within dataset, across dataset), comparability (over time, across countries)
ECB: accuracy/reliability, methodological soundness, timeliness, consistency
IMF: prerequisites of quality, accuracy and reliability, assurances of integrity, methodological soundness, serviceability (timeliness and periodicity), accessibility, serviceability (within dataset, across dataset, over time, across countries)
FAO: relevance (completeness), accuracy, timeliness, punctuality, accessibility, clarity (sound metadata), coherence, comparability
UNESCO: relevance, accuracy, interpretability, coherence
UNECE: relevance, accuracy (credibility), timeliness, punctuality, accessibility, clarity, comparability (across datasets, over time, across countries)
11
Product/output quality components – a possible summary
Relevance
Accuracy (and reliability)
Timeliness
Punctuality
Accessibility
Clarity/interpretability
Coherence/consistency
Comparability
12
EU Legislation on Quality Reporting
Council/EP Regulations
- Revision of the «Statistical Law»
- Sectoral Regulations
Commission Regulations
Gentlemen’s agreements
13
EU Legislation on Quality Reporting
The new Regulation on European Statistics (signed by the Parliament and Council on 11.03.09)
- References to Code of Practice (Whereas, art. 1, art. 2, art. 7, art. 11)
- Article 12 – Statistical quality
14
EU Legislation on Quality Reporting
Article 12(1) specifies the quality criteria (relevance, accuracy, timeliness, punctuality, accessibility and clarity, comparability, and coherence) that should be applied.
Article 12(2) specifies that the modalities, structure and periodicity of quality reports provided for in sectoral legislation shall be defined by the Commission in accordance with (simple) regulatory procedure.
15
EU Legislation on Quality Reporting
Article 12(2) the specific quality requirements, such as target values and minimum standards for the statistical production, may also be laid down in sectoral legislation.
Article 12(3) Member States shall provide the Commission (Eurostat) with reports on the quality of the data transmitted. The Commission (Eurostat) shall assess the quality of data transmitted and shall prepare and publish reports on the quality of European Statistics.
16
EU Legislation on Quality Reporting
Proposal for a Regulation of the Parliament and of the Council on population and housing censuses. Brussels, 11 January 2008
“1. For the purpose of this Regulation, the following quality assessment
dimensions shall apply to the data transmitted:”
.....
.....
“4. The Commission (Eurostat), in cooperation with the competent authorities of
the Member States, shall provide methodological recommendations
designed to ensure the quality of the data and metadata produced,
acknowledging, in particular, the Conference of European Statisticians
Recommendations for the Censuses of population and Housing”.
17
EU Legislation on Quality Reporting
Commission Regulations of quality evaluation
Standard articles:
– Structure and evaluation criteria (details specified in annex)
– Variables included (and breakdowns)
– Schedule (first quality report and subsequent reports)
– Review (if optional items included)
– Assessment of quality (by Eurostat of MS’s statistics)
– Entry into force
18
Self-Assessment Checklist for Survey Managers
The DESAP project (“Development of a Self-Assessment Programme for Surveys”), co-ordinated by DESTATIS (Germany), with Statistics Austria, Statistics Finland, ISTAT (Italy), Statistics Sweden and the ONS (UK) as partners, was carried out during the period October 2002-October 2003, in response to LEG on Quality Recommendation nr. 15.
DESAP
is tailored for statistics production and it aims to help survey managers to develop the survey that is under their responsibility
is fully compliant with the ESS quality criteria
applies to individual statistics collecting micro-data
has questions with numerous response categories, assessment questions, and open questions
19
Objectives of DESAP
Objectives of DESAP:
– to raise awareness of the quality components and survey quality concepts
– to provide a tool for a systematic, even though subjective, assessment of statistical products and processes
– to provide helpful guidance in the consideration of improvement measures
Additional potential applications:
– assistance for a basic appraisal of the risk of potential quality problems
– to provide a means for simple comparisons of the level of quality over time
– to provide support for resource allocation within statistical offices or for the training of new staff
20
Handbook on Improving Quality by Analysis of Process Variables
The project was coordinated by ONS (UK), with INE Portugal, NSS Greece and Statistics Sweden as partners, and carried out June 2002-June 2004, in response to LEG on Quality Recommendation nr. 3.
General approach to and useful tools for the task of identifying, measuring and analysing key process variables.
Explains how process quality is improved by identifying key process variables (i.e. those variables with the greatest effect on product quality), measuring these variables, adjusting the process based on these measurements, and checking what happens to product quality.
Includes many practical examples of the application of the approach to various statistical processes.
The handbook does not aim to provide a list of recommended key process variables across all statistical processes.
21
Handbook on Data Quality Assessment Methods and Tools
[Figure 1. Quality Assessment Methods in Context. Diagram labels: Preconditions; User Requirements; Standards; Guidelines; External Environment; Processes and Products; Quality Assessment Methods; Documentation; Measurements; Process Performance Indicators; Process Quality Indicators; Output Quality Indicators; User Survey Results; Quality Profile/Report; Self Assessment; Audit/Peer Review; Comprehensive Quality Report; Labelling; Certification.]
22
Methods and Tools for Quality Assessment
[Diagram labels: Institutional/legal environment; User requirements; Standards; I. Documentation, Measurement; II. Evaluation; III. Conformity; Self assessments; Quality reviews; Labelling; Improvement actions.]
Related projects and tools shown in the diagram:
Handbook on process-variables (ONS)
Customer/ user satisfaction surveys (SCB)
Auditing activities in NSI’s (INE-PT)
DESAP Checklist (DESTATIS, Lithuania)
Handbook Questionnaire development (ISTAT)
ESS Standard Quality Indicators
ESS Quality Reports
Editing and Imputation in Business Surveys (ISTAT)
Guidelines on accuracy and delays (INSEE)
Handbook on seasonal adjustment (HCSO)
Methods for evaluating response burden (SSB)
DatQAM (DESTATIS)
23
Content
a. To present ESS quality models in detail (Eva Elvers)
b. GSBPM follows (August Götzfried)
Process Quality and Output Quality
25
Content of Module
General definition of quality
Output quality components
Process quality components
– Institutional environment
– Individual statistical processes
26
General Definition of Quality
“Quality” is not well defined
– in the sense that there are many definitions
– the most general and succinct: fitness for use
Start with international standards
ISO 9000 definition:
– degree to which a set of inherent characteristics fulfils requirements
ISO 8402:1986 gives a more comprehensible definition:
– totality of features and characteristics of a product or service that bear on its ability to satisfy stated or implied needs
27
General Definition of Quality
These definitions provide a basic notion of product quality
– Need to be supplemented by a more precise interpretation of quality in the ESS context
ESS Quality Definition
– Presented to the October 2003 meeting of the ESS Working Group “Assessment of Quality in Statistics”
– Basis for defining output quality components in all subsequent quality-related documents, including
  • Code of Practice (CoP) and
  • forthcoming basic legal framework on European Statistics
28
Output Quality Components
Relevance
Accuracy and Reliability
Timeliness and Punctuality
Accessibility and Clarity
Coherence and Comparability
29
Output Quality Components
Relevance: – outputs meet current and potential users’ needs
Accuracy and Reliability: – outputs accurately and reliably portray reality
Timeliness and Punctuality: – outputs are disseminated in timely, punctual manner
30
Output Quality Components
Accessibility and Clarity:
– outputs are presented in clear, understandable form
– disseminated in a suitable and convenient manner
– made available and accessible on impartial basis
– accompanied by supporting metadata and guidance
31
Output Quality Components
Coherence and Comparability:
– coherence means that outputs are mutually consistent and can be used in combination
– comparability is an aspect of coherence and means that outputs referring to same data items are mutually consistent and can be used for comparisons across time, region, or any other relevant domain.
32
Process Quality Components
Output quality is achieved through process quality
Process quality has two broad aspects:
– Effectiveness: which leads to outputs of good quality; and
– Efficiency: which leads to production of outputs at minimum cost to statistical office and to respondents that provide the original data
33
Process Quality Components
Guidance on formulation of process quality components provided by first 10 principles in ESS Code of Practice (as previously described)
Principles formulated in two groups:
– institutional environment, within which the programme of statistical processes is conducted
– individual statistical processes
34
Process Quality Components Based on ESS Code of Practice
Institutional Environment
– Professional independence
– Mandate for data collection
– Adequacy of resources
– Quality commitment
– Statistical confidentiality
Individual Statistical Process
– Sound methodology
– Appropriate statistical procedures
– Non-excessive burden on respondents
– Cost effectiveness: resources are effectively used
35
Process Quality Components: Institutional Environment
Professional independence
– professional independence of staff from other policy, regulatory or administrative departments and from private sector operators
– required to support credibility of outputs
Mandate for data collection
– organisation has a clear legal mandate to collect the particular information required
– for surveys conducted under a statistics act, providers are compelled by law to provide or allow access to data
36
Process Quality Components: Institutional Environment
Adequacy of resources
– resources available are sufficient to meet systems and processing requirements
Quality commitment
– staff commit themselves to work and cooperate according to principles in ESS Quality Declaration
37
Process Quality Components: Institutional Environment
Statistical confidentiality
– guarantees of privacy of data providers, confidentiality of information they provide, and use only for statistical purposes
Impartiality and objectivity:
– production and dissemination of statistics respect scientific independence
– conducted in an objective, professional and transparent manner
– in which all users are treated equitably
38
Contents
Background
Modelling statistical business processes
Applicability
Structure and key features
Relevance to SDMX
Next steps
40
Background
Defining and mapping business processes in statistical organisations started at least 10 years ago
– “Statistical value chain”
– “Survey life-cycle”
– “Statistical process cycle”
– “Business process model”
41
Background
Defining and mapping business processes in statistical organisations started at least 10 years ago
– “Statistical value chain” X
– “Survey life-cycle” X
– “Statistical process cycle” X
– “Business process model” X
Generic Statistical Business Process Model
42
Modelling Statistical Business Processes
Reached a stage of maturity where a generic international standard could be drawn up
Many drivers for a generic model:
– “End-to-end” metadata systems development
– Harmonization of terminology
– Software sharing
– Process-based organization structures
– Process quality management requirements
– The Eurostat vision ...
43
Why do we need a model?
To define, describe and map statistical processes in a coherent way
To standardize process terminology
To compare / benchmark processes within and between organisations
To identify synergies between processes
To inform decisions on systems architectures and organisation of resources
44
History of the GSBPM
Based on the business process model developed by Statistics New Zealand
Added phases for:
– Archive (inspired by Statistics Canada)
– Evaluate (Australia and others)
Three rounds of comments; now quite accepted
Terminology and descriptions made more generic
Wider applicability?
45
Applicability of the model
All activities undertaken by producers of official statistics which result in data outputs
National and international statistical organisations
Independent of data source, can be used for:
– Surveys / censuses
– Administrative sources / register-based statistics
– Mixed sources
46
Applicability of the model
Producing statistics from end to end (micro or macro-data)
Revision of existing data / re-calculation of time-series
Development and maintenance of statistical and administrative registers
47
Structure of the Model (2)
National implementations may need additional levels
Over-arching processes
– Quality management
– Metadata management
– Statistical framework management
– Statistical programme management
– ........ (8 more – see paper)
49
Key features (1)
Not a linear model
Sub-processes do not have to be followed in a strict order
It is a matrix, through which there are many possible paths, including iterative loops within and between phases
Some iterations of a regular process may skip certain sub-processes
50
Key Features (2)
In theory the model is circular:
– Evaluation can lead to modified needs and design
In practice it is more like a multiple helix:
– There may be several iterations of a process underway at any point in time
52
Relevance to SDMX
Process modelling already mentioned in:
– SDMX User Guide
– SDMX Technical Standards (version 2.0)
– Euro SDMX Metadata Structure
Common terminology
If inputs and outputs use SDMX formats, why not the intermediate processes?
54
Standardized process descriptions
Harmonised processes
Rationalization of software
Use of open source and shared components
SDMX between components
Convergence of business architectures
55
Next steps
The model is more and more commonly accepted
Several statistical organisations are implementing this model or similar ones
Gather implementation experiences and other comments as input for Part C of the “Common Metadata Framework”
Present to the Bureau of the Conference of European Statisticians
Role in SDMX?
56
Questions and Comments?
For more information see the METIS wiki:
www1.unece.org/stat/platform/display/metis
57
Types of Quality Report and Statistical Process
Purpose
– To describe various aspects of a quality report
  • types of quality report
  • types of statistical process for which report prepared
– To describe structure of ESQR
59
Content
Types of quality report
– Scope/level of quality report
– User/producer orientation
– Process/output orientation
Types of statistical process
Level of detail and role
Quality reporting structure used in ESQR and EHQR
60
Types of Statistical Process – six in all
Sample Survey
– usually based on a probabilistic sampling procedure
– involving direct collection of data from respondents (mostly)
Census
– survey where all frame units are covered
Statistical Process Using Administrative Source(s)
– process making use of data collected for administrative purposes, i.e. purposes other than direct production of statistics
– example: statistical tabulations produced from a database maintained by a Department of Education
61
Types of Statistical Process (cont.)
Statistical Process Involving Multiple Data Sources
– Survey with different questionnaire designs, sampling procedures for different segments of population
– Mixture of direct data collection & administrative data
Price or Other Economic Index Process
– Involving complex sample surveys, often with non-probabilistic designs
– Targets complex and model-based
Statistical Compilation
– Economic aggregates like National Accounts and Balance of Payments
62
Types of Statistical Process – how many?
The Generic Statistical Business Process Model
– ONE Statistical Process
– Six types currently in the ESS
– Similarities rather than differences
63
Types of Quality Report: by Scope and Level
Scope of Quality Report
– Institution
– Broad statistical domain
– Statistical process
– Sub domain within statistical process
– Individual statistical indicator(s)
Level of Quality Report
– National level
– European level
64
European Level Quality Report
European level statistics may include
– aggregations of national estimates for a European entity (EU-27, EEA, Euro area)
– comparisons and contrasts of national estimates
Possible objectives of an ESS quality report
– quality of European aggregate statistics
– quality of comparisons of national statistics
– comparisons of qualities of estimates
65
Types of QR by Producer/User Orientation
Quality report may be user-oriented, producer-oriented or both
– May aim to communicate quality between producers
Producer of statistics may also be user of other statistics
Users may be sophisticated or not
– advanced analysts/researchers, or public at large
66
Types of QR by Producer/User Orientation
ESQR is producer-oriented, with a focus on what is needed to ensure quality of ESS
User-oriented quality reporting requires its own standard
Producer-oriented report according to ESQR will include all information for user-oriented reports
67
Types of QR by Process/Output Orientation
Quality report may focus on processes or outputs or both
ESQR has output orientation even though aimed at producers
68
Types of QR by Level of Detail
Quality report can vary from brief to detailed
– Quality profile covers only a few specific attributes and indicators
– DESAP checklist covers all aspects but not in detail
ESQR is for the most comprehensive form of quality report commonly prepared
– dealing with all important aspects of output and process quality including
– descriptions of processes and quality measurements
– quantitative quality measures and
– discussions of how to deal with deficiencies
69
Types of QR by Reporting Frequency
Quality reports may be prepared for every cycle, annually, or periodically
– the more frequent the report, the less detail
ESQR is aimed at comprehensive document produced periodically
– say every five years, or after major changes
In between less detailed reports envisaged
– for example, quality and performance indicators for every survey occasion
– checklist completed annually
70
Role of Quality Reporting
Quality report is a means to an end, not an end in itself
Should provide
– factual account of quality
– recommendations for quality improvements, and
– justification for their implementation
71
ESQR: Reporting Structure
Process quality leads to product quality
– if quality report contains an explicit assessment of quality in terms of each process and each output quality component
– there must be considerable duplication
Reporting structure in ESQR
– based on output quality components
– supplemented by headings covering aspects of process quality not readily reported under any output components
72
ESQR: Reporting Structure – 11 parts
1. Introduction to statistical process and its outputs
– overview required to provide context for report
2. Relevance
3. Accuracy
4. Timeliness and punctuality
5. Accessibility and clarity
6. Coherence and comparability
73
ESQR: Reporting Structure – 11 parts
7. Trade-offs between output quality components
– Output quality components not mutually exclusive
– Many cases where improvements with respect to one component may lead to deterioration with respect to another
– Example: accuracy versus timeliness
– Trade-offs that have to be made should be described
74
ESQR: Reporting Structure – 11 parts
8. Assessment of user needs and perceptions
– Users are starting point for quality considerations
– Information regarding their needs and perceptions
  • obtained for all output components at the same time
  • not just each one individually
  • Need for a separate section
9. Cost, performance and respondent burden
– important process quality components
– not readily covered under output quality components
– trade-offs versus output quality components
75
ESQR: Reporting Structure – 11 parts
10. Confidentiality, transparency and security
– also important process quality components
– not readily covered under output quality components
11. Conclusion
– summary of principal quality problems
– improvements proposed to deal with them
76
ESQR: Reporting Structure - Note
Aim of quality report is for producer to describe all aspects of statistical process and its outputs that influence the usefulness of the outputs
The key is to make use of the ESQR structure
– but to be flexible in its application
– to focus effort on the strengths and weaknesses likely to be of most importance
– and on known issues and problem areas
77
Quality Reporting Standards and Guidelines
In accordance with ESQR reporting structure
Introduction
Relevance, Accuracy, Coherence and Comparability, Timeliness and Punctuality, Accessibility and Clarity
Trade-offs
Performance, Cost and Respondent Burden
Assessment of User Needs and Perceptions
Confidentiality, Transparency and Security
79
Quality Reporting Standards and Guidelines
Important:
What to include
How to measure, estimate, or assess
Evaluation, possibly later
QPIs: Quality and Performance Indicators
EG on Quality Barometer; no details here
80
Introduction to the Statistical Process
(To provide context for report)
Historical background to process, objectives and outputs
Domain – broad – to which outputs belong
The quality report at hand; the boundary and references to related quality reports
Outputs produced – overview in general terms
References to other related reports
81
Relevance
Relevance is the degree to which statistical outputs meet current and potential user needs. It depends on whether all the statistics that are needed are produced and the extent to which concepts used (definitions, classifications, etc.) reflect user needs.
Relevance depends on the use, and relevance may depend on user
So, not a single simple description, but a broad perspective
82
Relevance Reporting
Content-oriented description of all outputs
– key indicators, reference period(s)
Definitions of statistical target concepts
– population, units
– relation to target definitions that would be ideal from a user perspective
– discrepancies between definitions used and accepted ESS or international definitions
– trade-off between relevance and accuracy
Assessment of key outputs
– Unmet user needs and reasons
– Completeness relative to regulations
83
Relevance Reporting (cont.)
For administrative statistics
– Definitions fixed, or influenced, by primary purpose of administrative regulation
– Possible problems
For price indexes
– (Discuss important issues)
For statistical compilations
– Comparison of target concepts with definitions and concepts in international standards
84
Accuracy
The accuracy of statistical outputs in the general statistical sense is the degree of closeness of estimates to the true values
Overall and error sources
Sampling errors and non-sampling errors
Process type?
85
Overall accuracy
A presentation of the methodology sufficient for (i) judging whether it lives up to internationally accepted standards and best practice and (ii) enabling the reader to understand specific error assessments.
Identification of the main sources of error for the main variables.
A summary assessment of all sources of error with special focus on the key estimates.
An assessment of the potential for bias (sign and order of magnitude) for each key indicator in quantitative or qualitative terms.
86
Sampling Errors: Always; Probability and Non-Probability Sampling
Presentation, formulas, ...
Presentation device: CV, confidence interval
Sampling error cannot be estimated without reference to a model
– a model implying that the sample is “effectively random” can sometimes be used
– for example, for price indices
For cut-off random sampling
– error for sampled portion should be reported
– for non-sampled portion discuss sampling bias
Sampling biases may be significant
– need to be assessed as well
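As an illustration of the presentation devices named on this slide (CV and confidence interval), the following is a minimal sketch for one specific case: a mean estimated from a simple random sample drawn without replacement. The function name, the example figures and the normal-approximation multiplier z = 1.96 are assumptions made for the example and are not taken from the ESQR.

```python
import math
import statistics


def srs_mean_summary(sample, population_size, z=1.96):
    """Mean, standard error, CV and confidence interval for a simple
    random sample without replacement (illustrative assumption)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    s2 = statistics.variance(sample)      # sample variance, n - 1 denominator
    fpc = 1 - n / population_size         # finite population correction
    se = math.sqrt(fpc * s2 / n)          # standard error of the mean
    cv = se / mean                        # coefficient of variation
    return {
        "mean": mean,
        "se": se,
        "cv": cv,
        "ci": (mean - z * se, mean + z * se),
    }


# Hypothetical sample of 5 units from a population of 500
summary = srs_mean_summary([12.0, 15.0, 11.0, 14.0, 13.0], population_size=500)
print(summary)
```

Other sample designs (stratified, cluster, non-probability) need other variance formulas; the sketch only covers the simplest design.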
87
/Sample Surveys/ Coverage Errors
Information on survey frame
– reference period, updating actions
– references to other documents on frame quality
Quantitative information on overcoverage and multiple listing
Assessment (preferably quantitative) on
– extent of undercoverage
– associated bias risks
Actions taken to reduce undercoverage and bias risks
88
/Sample Surveys/ Measurement Errors
Why?
Data editing identifies inconsistencies due to
– errors in the original data
– processing errors due to coding or data entry
Inconsistencies removed by clerical correction and/or automatic imputation
Edit rule failure rates are indicative of
– quality of data collection and processing
– not of quality of final data
Attention paid to data editing should reflect significance of such errors
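The edit rule failure rates mentioned on this slide can be sketched as follows. The records, variable names and the two edit rules are hypothetical and only serve to show the calculation: the share of records failing each rule before any correction or imputation.

```python
def edit_failure_rates(records, edit_rules):
    """Share of records failing each edit rule.

    edit_rules maps a rule name to a function that returns True
    when a record passes the rule.
    """
    if not records:
        raise ValueError("no records to check")
    rates = {}
    for name, rule in edit_rules.items():
        failures = sum(1 for record in records if not rule(record))
        rates[name] = failures / len(records)
    return rates


# Hypothetical raw (unedited) business survey records
raw_records = [
    {"turnover": 100, "wages": 40},
    {"turnover": -5, "wages": 2},
    {"turnover": 50, "wages": 80},
    {"turnover": 200, "wages": 90},
]

# Hypothetical edit rules
rules = {
    "turnover_non_negative": lambda r: r["turnover"] >= 0,
    "wages_not_above_turnover": lambda r: r["wages"] <= r["turnover"],
}

failure_rates = edit_failure_rates(raw_records, rules)
print(failure_rates)
```

As the slide notes, these rates describe the quality of data collection and processing, not the quality of the final, edited data.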
89
Measurement Errors (cont.)
Methods of error evaluation
Comparisons with other data at unit level
– requires common unit identification scheme
– accounting for conceptual or timing differences
Re-interview with superior method
– preferably for random sample of units
90
Measurement Errors (cont.)
Methods of error evaluation (continued)
Replication
– Differences between replicates indicate stability of measurement process
– Analyses often assume replication errors are independent – rarely fully justified
Effects of data editing
– Comparisons of original and edited data give a minimum estimate of error levels
91
/Sample Surveys/ Non-Response Errors
Non-response error
– difference between statistics computed from collected data and those that would be computed if there were no missing values
Types of non-response:
– unit non-response - no data are collected from unit
– item non-response - some missing values in data collected from a unit
92
/Sample Surveys/ Non-Response Errors
Impact of nonresponse
Introduction of bias
– nonrespondents not similar to respondents for all variables in all strata
– whereas standard methods for handling nonresponse assume they are
Increase in sampling error
– as available number of responses is reduced
Many definitions of response rates
– slightly different numerators and denominators
93
Non-Response Errors Reporting
Definitions of response rates
Unit non-response rates for whole survey and important sub-domains
Item non-response rates for key variables
Breakdown of non-respondents by cause
Qualitative statement on risk of bias
Measures to reduce non-response
Treatment of non-response in estimation
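Since many definitions of response rates exist, a report should state which one it uses. The sketch below assumes the simplest unweighted definitions: unit response rate as responding units over eligible units, and item non-response rate as the share of responding units with a missing value for a variable. The function names and data are hypothetical.

```python
def unit_response_rate(n_responding, n_eligible):
    """Unweighted unit response rate: responding units / eligible units."""
    if n_eligible <= 0:
        raise ValueError("n_eligible must be positive")
    return n_responding / n_eligible


def item_nonresponse_rate(records, variable):
    """Share of responding units whose value for `variable` is missing (None)."""
    if not records:
        raise ValueError("no responding units")
    missing = sum(1 for record in records if record.get(variable) is None)
    return missing / len(records)


# Hypothetical data: 4 responding units out of 5 eligible units
responses = [
    {"turnover": 120.0, "employees": 4},
    {"turnover": None, "employees": 10},
    {"turnover": 75.5, "employees": None},
    {"turnover": 300.0, "employees": 25},
]

unit_rate = unit_response_rate(len(responses), n_eligible=5)
unit_nonresponse = 1 - unit_rate
turnover_item_nr = item_nonresponse_rate(responses, "turnover")
print(unit_rate, unit_nonresponse, turnover_item_nr)
```

Weighted variants (by design weight or by a size variable) change the numerators and denominators, which is the kind of difference the reported definition should make explicit.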
94
/Sample Surveys/ Processing Errors
Identification of main issues
Include manual coding of response data that are in free format
– Quality control procedures
Analysis of processing errors (where available), otherwise qualitative assessment
95
Accuracy: Censuses
Report non-sampling errors as for sample surveys
Include
– Assessments of measurement, classification errors
– Assessment of processing errors, especially where manual coding of data in free text format is used
For censuses based on extensive field work:
– Assessment of undercoverage and overcoverage (undercount and over- or double count)
– Description of methods used to correct
96
Accuracy: Statistics from Administrative Sources
Report non-sampling errors as for sample surveys
Include assessment of over- and under-coverage due to lags in register updating
Include assessment of errors in classification variables
For statistics based on event reporting include assessment of rate of unreported events
97
Accuracy: Price and Other Economic Indices
Information on all sampling dimensions
– for weights, products, outlets/companies etc.
Attempts at assessing sampling error
– in all or some dimensions
Quality adjustment methods
– including replacement and re-sampling rules
– for at least major product groups
Assessment of other types of error
– where they could have a significant influence
98
Accuracy: Statistical Compilations
Information and indicators relating to accuracy required by IMF Data Quality Assessment Framework (DQAF) or equivalent.
Analysis of revisions between successively published estimates
For National Accounts
– Analysis of causes for statistical discrepancy
– Assessment of non-observed economy
99
Other Issues Concerning Accuracy
Model Assumptions and Associated Errors
Seasonal Adjustment
Imputation
Mistakes
100
Special Issues Concerning Accuracy: Revisions
Planned revisions should follow standard, well-established and transparent procedures
– pre-announcements are desirable
– reasons for revision and nature of the revision should be made clear
– for example, new source data available, new methods, etc.
101
Coherence and Comparability: Short Definitions
Coherence: capacity of outputs to be combined and reliably used in combination
Comparability: special case of coherence for outputs involving the same data items
102
Coherence and Comparability: Explanation
The coherence of two or more statistical outputs refers to the degree to which the statistical processes by which they were generated used the same concepts - classifications, definitions, and target populations – and harmonised methods.
Coherent statistical outputs have the potential to be validly combined and used jointly.
103
Coherence and Comparability: Explanation (cont.)
Examples of joint use are where the statistical outputs refer to the same population, reference period and region but comprise different sets of data items (say, employment data and production data) or where they comprise the same data items (say, employment data) but for different reference periods, regions, or other domains.
Comparability is a special case of coherence and refers to the latter example above where statistical outputs refer to the same data items and the aim of combining them is to make comparisons over time, or across regions, or across other domains.
104
Coherence and Comparability: Notes
Distinction between coherence and accuracy
– Coherence measured in terms of design metadata
– Accuracy depends upon operational metadata
– Differences between preliminary, revised and final estimates are an accuracy issue
Reasons for lack of coherence/comparability
– Concepts: target population (units and coverage), reference period, data item definitions, classifications
– Methods: frame construction, sources of data and sample design, data collection, capture, editing, imputation, estimation
105
Coherence and Comparability: Reporting
General
Descriptions of conceptual and methodological metadata elements that could affect coherence/ comparability
Assessment (preferably quantitative) of possible effect of each reported difference on outputs
Differences between statistical process and applicable European regulations/standards and/or international standards
106
Timeliness and Punctuality
Definition/Description
Timeliness: length of time between the event or phenomenon and the availability of the statistics.
Punctuality: time lag between the release date of data and the scheduled date for release.
107
Timeliness and punctuality profile for each version (preliminary, revised, final) whenever statistics are released in multiple versions.
Reasons for possible long production times and non-punctual releases and description of the efforts made to improve situation.
Timeliness : for annual or more frequent releases the average production time for each release of data; maximum production time to provide worst recorded case.
Punctuality : the percentage of releases delivered on time (based on scheduled release dates) .
Reporting
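The timeliness and punctuality measures above can be sketched as follows. Production time is taken here as the number of days from the end of the reference period to the actual release, and a release counts as on time when it is delivered no later than the scheduled date. Both readings, and all dates in the example, are assumptions for illustration.

```python
from datetime import date

# Invented quarterly releases:
# (end of reference period, scheduled release date, actual release date)
releases = [
    (date(2009, 3, 31), date(2009, 5, 15), date(2009, 5, 15)),
    (date(2009, 6, 30), date(2009, 8, 14), date(2009, 8, 20)),
    (date(2009, 9, 30), date(2009, 11, 13), date(2009, 11, 12)),
    (date(2009, 12, 31), date(2010, 2, 15), date(2010, 2, 15)),
]


def production_days(releases):
    """Days between the end of the reference period and the actual release."""
    return [(actual - ref_end).days for ref_end, _, actual in releases]


def punctuality_rate(releases):
    """Percentage of releases delivered on or before the scheduled date."""
    if not releases:
        raise ValueError("no releases to evaluate")
    on_time = sum(1 for _, scheduled, actual in releases if actual <= scheduled)
    return 100.0 * on_time / len(releases)


days = production_days(releases)
print("average production time (days):", sum(days) / len(days))
print("maximum production time (days):", max(days))
print("punctuality (% on time):", punctuality_rate(releases))
```

Where statistics are released in several versions (preliminary, revised, final), the same calculation would be repeated per version.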
108
Accessibility and Clarity
Definition/Description
Accessibility : measure of ease with which users can obtain the data (where to go, how to order, delivery time, pricing policy, marketing conditions, availability of micro data etc).
Clarity : measure of the ease with which users can understand the data (depends upon the quality of metadata).
Summary: both refer to the simplicity and ease with which users can access statistics with appropriate supporting information.
109
Description of the conditions of access to the data: media, support, pricing policies, possible restrictions, etc.
Summary description of the metadata accompanying the statistics (documentation, explanation, quality limitations, etc.)
Description of how well the needs of both less sophisticated and advanced users have been addressed.
Summary of user feedback on accessibility and clarity.
Recent and planned improvements to accessibility and clarity.
Reporting
110
Trade-offs between Output Quality Components
Definition/Description
Quality components are not mutually exclusive: there are relationships between the factors that contribute to them.
In some cases, factors that lead to improvements with respect to one component result in deterioration with respect to another.
Decisions for trade-offs have to be made in such circumstances.
111
Types (most significant ones):
Trade-off between Relevance and Accuracy
Trade-off between Relevance and Timeliness
Trade-off between Relevance and Coherence
Trade-off between Relevance and Comparability over Time
Trade-off between Comparability over Region and Comparability across Time
Trade-off between Accuracy and Timeliness
112
Relationship between process and output quality components
[Figure: Conceptual framework (2). Process elements: User needs (3); Data collection (4); Validation at country level (5) and at international level (6); Confidentiality (7); Documentation (8); Dissemination (9); Follow-up (10); IT conditions (11); Management, planning and legislation (12); Staff, work conditions and competence (13). Output quality components: Relevance, Accuracy, Timeliness/Punctuality, Accessibility/Clarity, Comparability, Coherence.]
113
Performance, Cost, and Respondent Burden
Definition/Description
Cost benefit analyses are required to determine the appropriate trade-off between costs and benefits of the output quality components.
Respondent participation must be viewed as a cost (to respondents) that has to be balanced against the benefits of the data provided.
114
Imposed on individuals, household members or businesses
The overall cost of delivering the information requested by a particular questionnaire depends on 3 components:
I. Number of respondents (R).
II. Average time (T) required to provide the information (includes multiple procedures).
III. Average hourly cost of a respondent’s time (C).
Total respondent burden for a questionnaire: R*T*C
Respondent Burden
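A small worked example of the R*T*C formula. The function name and the figures (2 000 respondents, half an hour each, 30 currency units per hour) are invented for illustration.

```python
def respondent_burden(respondents, avg_hours, hourly_cost):
    """Total respondent burden for one questionnaire: R * T * C."""
    if respondents < 0 or avg_hours < 0 or hourly_cost < 0:
        raise ValueError("R, T and C must be non-negative")
    return respondents * avg_hours * hourly_cost


# R = 2000 respondents, T = 0.5 hours, C = 30.0 per hour
burden = respondent_burden(2000, 0.5, 30.0)
print(burden)  # 30000.0, in the same currency unit as C
```

Because the three factors multiply, halving the average completion time reduces the burden as much as halving the number of respondents.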
115
Assessment of User Needs and Perceptions
We provide our users with products and services that meet their needs. The articulated and non-articulated needs, demands and expectations of external and internal users will guide the ESS, its members, their employees and operations (ESS Quality Declaration - User Focus)
116
Confidentiality, Transparency and Security
The privacy of data providers (households, enterprises, administrations and other respondents), the confidentiality of the information they provide and its use only for statistical purposes must be absolutely guaranteed.
Statistical authorities must produce and disseminate European statistics respecting scientific independence and in an objective, professional and transparent manner in which all users are treated equitably.
(European Code of Practice, Principles 5 and 6).
117
ESS QPI’s – prerequisites for Quality and Performance Indicators
Quality indicators: summary measures for certain key elements (process variables or output characteristics)
Representative of the main quality criteria
Applicable to most statistical processes
Well defined and standardised methodology for the calculation
Easy to interpret and understand
No additional burden for the Eurostat production units
118
Users and uses of the ESS QPI’s – e.g. a Quality Barometer (QB)
For production managers to evaluate their specific production process.
For domain managers to compare the quality indicators with average values for benchmarking across processes.
For top-management to have highly synthesised quantitative information for strategic decision making.
For users to analyse characteristics of the statistics and to compare the quality of different sets of statistics.
119
QPI – characteristics
Indicator for quality component
Definition
Levels of aggregation
Formulae
Interpretation
120
QPI – some examples
Relevance
– Rate of available statistics
Accuracy
– Coefficient of variation
– Rate of over-coverage
– Average size of revisions
Comparability and coherence
– Length of comparable time series
– ...
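The three accuracy indicators named above can be sketched as below. The formulas are common textbook forms and are assumptions here, not the official ESS QPI definitions: coefficient of variation as standard error divided by the estimate, over-coverage rate as out-of-scope units divided by frame units, and average size of revisions as the mean absolute difference between first and later estimates. All numbers are invented.

```python
def coefficient_of_variation(estimate, standard_error):
    """Relative standard error of an estimate (assumed form: SE / |estimate|)."""
    if estimate == 0:
        raise ValueError("coefficient of variation is undefined for a zero estimate")
    return standard_error / abs(estimate)


def over_coverage_rate(out_of_scope_units, frame_units):
    """Share of frame units that do not belong to the target population."""
    if frame_units <= 0:
        raise ValueError("frame must contain at least one unit")
    return out_of_scope_units / frame_units


def mean_absolute_revision(first_estimates, later_estimates):
    """Average absolute difference between first and later published estimates."""
    if not first_estimates or len(first_estimates) != len(later_estimates):
        raise ValueError("need two non-empty series of equal length")
    pairs = zip(first_estimates, later_estimates)
    return sum(abs(later - first) for first, later in pairs) / len(first_estimates)


print(coefficient_of_variation(200.0, 5.0))
print(over_coverage_rate(30, 1200))
print(mean_absolute_revision([100.0, 102.0, 98.0], [101.0, 101.5, 98.0]))
```

Each function returns a single number per domain, which fits the requirement that indicators be easy to interpret and comparable across processes.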
121
What is SDMX and why is it so important?
Current situation :
Lacking harmonisation causes extra costs and inconvenience !!
123
What is SDMX and why is it so important?
Solution: A common and harmonised phone charger
Benefits :
For producers
– One charger → cheaper production costs
– One package → cheaper transport costs
For clients
– One charger serves all → convenience
– Lower production costs → lower phone prices
– Possible charging from computers
For everyone
– Fewer chargers needed → environmental benefits
124
What is SDMX and why is it so important?
Current situation in the ESS :
Different types of data and metadata files exchanged → conversion of formats needed
No standardisation of the statistical contents → renaming, mappings needed
Problem of correspondence between variables, codes, etc.
The way forward: creation of technical, IT and statistical standards to be used for official data and metadata
125
Seven international organizations (BIS, ECB, Eurostat, IMF, OECD, UN, World Bank) have joined forces, with the result of creating the SDMX technical and statistical standards and guidelines together with an IT service architecture and IT tools to be used for the efficient exchange and sharing of statistical data and metadata.
These technical and statistical standards and guidelines are sufficiently mature now. They are used and implemented more and more by statistical organizations around the world.
What is SDMX and why is it so important?
126
In 03/2009, the Eurostat top management confirmed SDMX and its implementation within the ESS in stating:
“As a first step before making SDMX compulsory for all domains in Eurostat, the use of SDMX would be made compulsory for all new or considerably changed datasets and reference metadata sets.”
The SDMX technical and statistical standards and guidelines can be regarded as one of the main enablers for implementing the new Eurostat vision based on the Commission Communication (COM 404/2009).
SDMX within the European Statistical System
127
SDMX – the main components
1. The SDMX technical standards (Version 2.0)
– SDMX information model
– Data and metadata messages and query formats
– Registry services
2. The SDMX guidelines to harmonise the statistical contents
– the SDMX Content-oriented Guidelines
3. The SDMX IT service architecture
– the “push”, “pull” and “hub” approaches
4. The SDMX IT tools
– IT tools produced by sponsoring organisations and openly available
– The SDMX IT tools can be used across the whole data life cycle and across statistical domains
SDMX is not just a data transmission format
See also: www.sdmx.org;
128
SDMX components
1. SDMX technical standards (v. 2.0)
SDMX information model
Data and metadata message and query formats
Registry service definitions
[Figure labels: Sending Organisation, Message, Receiving Organisation, Registry]
129
SDMX components
2. The SDMX guidelines to harmonise contents: the SDMX Content-oriented Guidelines
Annex 1: The SDMX cross-domain concepts
– List of statistical concepts relevant to statistical domains to be used within the SDMX technical standards
Annex 2: The SDMX cross-domain code lists
– Statistical code lists relevant to statistical domains to be used within the SDMX technical standards
Annex 3: The statistical subject matter domains
– List of subject matter domains (e.g. demography statistics, national accounts…)
Annex 4: The Metadata Common Vocabulary
– Metadata cross-domain statistical terminology used above
The SDMX COG were released in 01/2009.
130
SDMX components
3. SDMX IT service architecture
– “push” mode
– “pull” mode (see example)
– “hub” mode
[Figure: example of the “pull” mode between a sending organisation and a receiving organisation. Labels: Database, Web Service, SDMX-ML file, RSS, PULL, Pull Requestor, Received data in SDMX-ML, XSLT for SDMX-ML, Loader, Input environment, Processing environment, Warehouse storage, Dissemination.]
131
SDMX components
4. The SDMX IT tools
IT tools normally openly available via www.SDMX.org;
The SDMX IT tools can be used across the whole data life cycle (e.g. for the creation of data structure definitions, database loading, visualisation, metadata production, statistical registries etc.)
132
SDMX is the crucial instrument for rendering the production method of ESS statistics more efficient → new Eurostat vision
After a phase of low investment costs, the use of SDMX should reduce the burden on national and international statistical organizations.
Data and metadata messages produced by national and international organizations become more comparable and consistent.
National and international statistical processes become more harmonized and offer new ways of data and metadata exchange (such as data hubs).
Web-based dissemination formats are provided that are computer “readable” and easier to update.
Benefits of SDMX
133
Costs of SDMX
Development/maintenance of the SDMX standards and guidelines done by the international sponsoring institutions (supported by NSIs)
Standards are public and open source
IT tools are created by sponsoring or other organizations and made freely available
Capacity building by sponsoring or other institutions
Input to the SDMX standards from the user community through open process
No need to radically change the IT and statistical systems: gradual SDMX implementation possible with low investment costs
134
More emphasis on the implementation of SDMX within the ESS (asked by the Eurostat senior management)
Accelerated implementation in statistical domains (new Data Structure Definitions created)
Harmonisation of structural and reference metadata (e.g. the ESMS and the harmonised code lists)
SDMX is also implemented into the Eurostat IT applications used within the Eurostat CVD (in the single entry point, reference database, metadata handler etc.)
Many training and other capacity building actions organised (for IT staff, statisticians…)
Use of the hub/pull architecture (e.g. census hub)
SDMX – latest progress
135
SDMX is global
Good progress reached in creating the SDMX technical and statistical standards and guidelines
SDMX at the core of the harmonisation of the statistical business process, as outlined in the new Eurostat strategy
The implementation of SDMX in the different statistical domains requires the close involvement of Member states
For more information please see under
http://www.sdmx.org
Summarising
136
1. The ESS Standards for Reference Metadata
1.1 The Euro SDMX Metadata Structure (ESMS)
1.2 The Quality Reporting within Eurostat and the ESS (ESQRS)
1.3 Relation between ESMS, ESQR and ESQRS
2. The ESS Standards for Structural Metadata
2.1 Harmonisation within Eurostat and the ESS
2.2 Harmonisation within SDMX
138
Reference and Structural Metadata
Reference Metadata:– describe the contents and the quality of statistical data
• conceptual metadata describing the concepts used• methodological metadata describing methods• quality metadata describing the data quality
– are often linked to the data, but this is not mandatory
Structural Metadata:– identify and describe the data
• Name of the variables• Dimensions used in statistical cubes
– must be associated to the data otherwise data are meaningless
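A minimal sketch of why data are meaningless without structural metadata. The dimension identifiers GEO and SEX follow the style of the Eurostat standard code lists. The individual codes, their labels and the observation values are invented for illustration.

```python
# Structural metadata: the dimensions of the cube and the code list for each.
dimensions = ("GEO", "SEX")
code_lists = {
    "GEO": {"SE": "Sweden", "FI": "Finland"},
    "SEX": {"F": "Females", "M": "Males"},
}

# Data: observation values keyed only by codes (hypothetical figures).
observations = {("SE", "F"): 4.7, ("FI", "M"): 2.6}


def describe(key, value):
    """Attach dimension names and code labels to a bare observation."""
    parts = [
        f"{dim}={code_lists[dim][code]}"
        for dim, code in zip(dimensions, key)
    ]
    return ", ".join(parts) + f": {value}"


for key, value in observations.items():
    print(key, value, "->", describe(key, value))
```

Without `dimensions` and `code_lists`, the pair ("SE", "F") and the value 4.7 cannot be interpreted. Reference metadata (concepts, methods, quality) could be attached as well, but the data remain readable without them.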
139
1.1 The Euro SDMX Metadata Structure (ESMS)
is the standard format for reference metadata in the ESS
replaces the former SDDS format since December 2008
All SDDS files disseminated on Eurostat’s website are being converted into ESMS format (ongoing process, soon finalised)
is the format to be used for the reporting of national reference metadata files to Eurostat (Commission Recommendation 2009/498/EC of June 2009)
covers 21 concepts selected from the 62 SDMX cross-domain concepts (also the main quality related concepts)
is fully SDMX compliant.
1. The ESS standards for Reference Metadata
140
1. The ESS standards for Reference Metadata
1.1 The Euro SDMX Metadata Structure (ESMS)
1. Contact
2. Metadata update
3. Statistical presentation
4. Unit of measure
5. Reference period
6. Institutional mandate
7. Confidentiality
8. Release policy
9. Frequency of dissemination
10. Dissemination format
11. Accessibility of documentation
12. Quality management
13. Relevance
14. Accuracy and reliability
15. Timeliness and punctuality
16. Comparability
17. Coherence
18. Cost and burden
19. Data revision
20. Statistical processing
21. Comment
141
1.2 The Quality reporting within Eurostat and the ESS
Within the European Statistical System (ESS) reporting on statistical data quality exists in many statistical domains….
1. The ESS standards for Reference Metadata
143
1.2 The Quality reporting within Eurostat and the ESS
1. The ESS standards for Reference Metadata
… BUT :
– Quality reports do not exist for all statistical processes within the ESS;
– No homogeneity between the different report structures used for data quality reporting;
– Not all the quality related information is made publicly available;
– No common and standard IT infrastructure is used within the ESS;
The new Eurostat vision: “Improving the production method of EU statistics” requires an improvement action.
144
1.2 The Quality reporting within Eurostat and the ESS
1. The ESS standards for Reference Metadata
Progress made since 2008 :
01/2009: release of the new version of the ESS quality reporting documents:
• ESS Standard for Quality Reports (ESQR)
• ESS Handbook for Quality Reports (EHQR)
Detailed requirements following the European Statistics Code of Practice
ESS Quality and Performance Indicators (QPI’s) defined
03/09: EP/Council Regulation 223/2009 Article 12 defining the quality criteria to be reported
145
1.2 The Quality reporting within Eurostat and the ESS
The ESQR (ESS Standard for Quality Reports)
– aims at providing recommendations for the preparation of comprehensive quality reports for a full range of statistical processes and their outputs.
– is organised along the lines of the quality principles in the ESS Code of Practice
1. The ESS standards for Reference Metadata
I. Introduction to the Statistical Process
II. RELEVANCE
III. ACCURACY
IV. TIMELINESS and PUNCTUALITY
V. ACCESSIBILITY and CLARITY
VI. COMPARABILITY and COHERENCE
VII. Trade-offs between Output Quality Components
VIII. Assessment of User Needs and Perceptions
IX. Performance, Cost and Respondent Burden
X. Confidentiality, Transparency and Security
XI. Conclusions
146
1.2 The Quality reporting within Eurostat and the ESS
The ESQRS (ESS Standard for Quality Reports Structure)
– Based on the ESQR, a new report structure - the ESQRS - was created for harmonising the reporting on statistical data quality within the ESS.
– The ESQRS is using the main statistical data quality criteria as listed in EP/Council Regulation 223/2009 and as being part of the ESMS and details them further :
• Relevance
• Accuracy
• Timeliness and Punctuality
• Accessibility and Clarity
• Comparability
• Coherence
– A subset of the Quality Performance Indicators (QPI’s) is also covered in the new ESQRS.
1. The ESS standards for Reference Metadata
147
ESS Guidelines
The guidelines for quality reporting from ESS Handbook for Quality Reports (EHQR) are already used in the “ESS Guidelines” for ESMS.
These guidelines will be further used in the ESQRS in order to provide detailed guidelines for 6 different statistical processes:
• Sample survey
• Census
• Statistical Process using Administrative Sources
• Statistical Process involving Multiple Data Sources
• Price or other Economic Index Process
• Statistical Compilation
148
1.3 Relation between ESMS, ESQR and ESQRS
ESMS and ESQR
– ESMS is more oriented to the USERS of statistics
• to understand the statistical data released
• there is no need for too detailed information on data quality
• 21 SDMX cross-domain concepts used
– ESQR is more oriented to the PRODUCERS of statistics
• to monitor the quality of the statistics produced in detail
• concentrating on the main quality concepts (being also part of the ESS Statistics Regulation No 223/2009)
However, there is information on quality criteria which is common to both ESMS and ESQR.
1. The ESS standards for Reference Metadata
149
ESMS and ESQR
[Figure: the 21 ESMS concepts (1. Contact to 21. Comment) shown side by side with the 11 ESQR chapters (I. Introduction to the Statistical Process and Its Outputs to XI. Conclusions). A legend marks the quality criteria defined in EP/Council Regulation 223/2009: relevance, accuracy, timeliness, punctuality, accessibility, clarity, comparability and coherence.]
150
ESQR and ESQRS
[Figure: the 11 ESQR chapters (I. Introduction to the Statistical Process and Its Outputs; II. Relevance; III. Accuracy; IV. Timeliness and Punctuality; V. Accessibility and Clarity; VI. Comparability and Coherence; VII. Trade-offs between Output Quality Components; VIII. Assessment of User Needs and Perceptions; IX. Performance, Cost and Respondent Burden; X. Confidentiality, Transparency and Security; XI. Conclusions) shown side by side with the ESQRS structure below.]
ESQRS structure:
I. Contact
II. Introduction
III. Relevance (user needs and perceptions)
IV. Accuracy
V. Timeliness and punctuality
VI. Accessibility and clarity
VII. Comparability
VIII. Coherence
IX. Comment
151
ESMS and ESQRS
– The metadata produced in the ESMS and ESQRS need to be kept consistent. The ESQRS is based on the ESQR, but not taking up all the chapters contained in the latter one.
– The information in the ESQRS is more detailed compared to the information on statistical data quality contained in the ESMS.
ESQRS reports deeper in terms of data quality compared to the ESMS
152
ESMS and ESQRS
ESMS
Accuracy and reliability
– Description: Accuracy: closeness of computations or estimates to the exact or true values that the statistics were intended to measure. Reliability: closeness of the initial estimated value to the subsequent estimated value.
Non-sampling error
– Description: Error in survey estimates which cannot be attributed to sampling fluctuations.
ESQRS
Accuracy
– Description: Accuracy: closeness of computations or estimates to the exact or true values that the statistics were intended to measure.
Non-sampling error
– Description: Error in survey estimates which cannot be attributed to sampling fluctuations.
Non-response error
– Description: The difference between the statistics computed from the collected data and those that would be computed if there were no missing values.
Unit response rate
– Description: The ratio of the number of units for which data for at least some variables have been collected to the total number of units designated for data collection.
Formulae unit resp. rate
– Description: Example calculation formulae for the un-weighted unit response rate.
153
Why is the harmonisation necessary?
• To facilitate the exchange of data and metadata in Eurostat, within the ESS and beyond (e.g. within the SDMX sponsoring organisations);
• To support the further implementation of SDMX in statistical domains (data structure definitions or metadata structure definitions);
• To enable and facilitate the metadata driven statistical business process as a response to the new vision for the ESS.
2. The ESS standards for Structural Metadata
154
155
Two processes for harmonising structural metadata:
Harmonisation within the ESS
Eurostat will produce and release harmonised structural metadata covering all the statistical domains. The main code lists will be proposed for inclusion into SDMX at the appropriate moment.
Harmonisation within SDMX
The SDMX Content-oriented Guidelines (version 2009) also deal with the harmonisation of structural metadata. Annex 2 recommends some cross-domain code lists to institutions using SDMX.
The two processes run in parallel with a “mutual” impact. The results of both processes need to be fully consistent.
155
2.1 The harmonisation in Eurostat and in the ESS
The need of faster progress on the harmonization of structural metadata (=harmonization of code lists) has become more and more evident.
Unit B6 (Reference databases and metadata) is working on the harmonisation of structural metadata to cover progressively all the statistical concepts used in Eurostat.
Standard Code Lists are defined on the basis of official classifications or widely used standards, in cooperation with the users and taking into account the needs of all the statistical domains.
2. The ESS standards for Structural Metadata
156
2.1 The harmonisation in Eurostat and in the ESS
More than 30 Eurostat Standard Code Lists released for the moment :
2. The ESS standards for Structural Metadata
157
2.1 The harmonisation in Eurostat and in the ESS
2. The ESS standards for Structural Metadata
Harmonisation of the code lists related to the 2011 Census:
AGE Age
AMENITY Amenities
AREA Area
AREA_OCC Floor area per occupant
BUILDING Type of building
C_BIRTH Country/region of birth
C_WORK Country/region of work
CITIZEN Citizenship
GEO Geopolitical entity (declaring)
HHCOMP Composition of households
HHSTATUS Individuals by household status
HOUSING Housing
ISCED97 International Standard Classification of Education (1997 version)
159
2.1 The harmonisation in Eurostat and in the ESS
2. The ESS standards for Structural Metadata
ISCO08 International Standard Classification of Occupations 2008 (ISCO-08)
MARSTA Marital status
N_PERSON Number of persons
N_ROOM Number of rooms
NACE_R2 Statistical Classification of Economic Activities in the European Community (NACE Rev. 2)
RESID Residence
ROOM_OCC Number of rooms per occupant
SEX Sex
SIZE_HAB Size classes in square meters (m²)
TENURE Housing tenure status
WSTATUS Activity and employment status
Y_ARRIV Year of arrival
Y_CONST Year of construction
160
2.1 The harmonisation in Eurostat and in the ESS
2. The ESS standards for Structural Metadata
The production of additional harmonised structural metadata will also lead to an overall reduction of the code lists in use in the Eurostat Reference database and then within the whole CVD (down from 500 lists at the beginning of the process).
When harmonised structural metadata are published, changes can only be done by unit B6 (on request of the domain managers or following new upcoming needs e.g. new EU aggregates).
These standard code lists will be gradually included into the domain specific SDMX Data Structure Definitions and they will then also impact countries for the data transmission.
161
2.2 The harmonisation in SDMX
2. The ESS standards for Structural Metadata
The SDMX harmonized structural metadata have to be seen as a first package, on which the seven international organizations sponsoring SDMX agreed in 2009.
Further work by the SDMX sponsors and the SDMX Secretariat is necessary to enlarge the list of harmonized structural metadata already published.
The SDMX lists already published are to be consistent with the lists used by Eurostat.
162
2.2 The harmonisation in SDMX
2. The ESS standards for Structural Metadata
Annex 2 of the SDMX Content-oriented Guidelines recommends the following 9 harmonized code lists to institutions applying SDMX:
163
2.2 The harmonisation in SDMX
2. The ESS standards for Structural Metadata
Example: the SDMX code list on frequency
164
Conclusions
Harmonisation of Metadata is crucial to facilitate the exchange of Data and Metadata between Institutions and National Statistical Authorities within the ESS and beyond
The standards for Reference Metadata and Quality Reporting are defined (ESMS, ESQRS…) and more and more implemented within the ESS
The harmonisation of Structural Metadata (standard code lists) is progressing and needs further implementation
165
The Eurostat Metadata Handler
is the backbone of a metadata-driven harmonised statistical business process at Eurostat and in the European Statistical System;
provides one central interface for accessing different types of metadata by Eurostat staff or by members of the European Statistical System;
provides services to other IT applications within the ESS;
is the primary source for different types of harmonised metadata to be used within Eurostat and the ESS.
167
The components of the Eurostat Metadata Handler (Eurostat-MH)
[Figure: the Eurostat Metadata Handler with a common user interface. Components: National Metadata Editor, EMIS, RAMON, CODED, Euro SDMX Registry. Inputs: metadata from the Eurostat domain manager; metadata from National Statistical Administrations. Outputs: output for the Eurostat web; output for Eurostat or external users. Eurostat acts as main administrator.]
168
The Euro SDMX Registry
The Euro SDMX Registry currently contains:
• the Eurostat and ECB data structure definitions in use for GESMES-based data transmission;
• the code lists currently used in the different GESMES-based data structure definitions;
• the list of data and metadata flows (from countries to Eurostat);
• the list of statistical concepts used in the abovementioned data structure definitions;
• some additional information such as provision agreements, etc.
The Euro SDMX Registry will contain in addition:
• the agreed SDMX data and metadata structure definitions used in different statistical domains for data and metadata exchange;
• the harmonized code lists used in SDMX data structure definitions;
• national data structure definitions and code lists used in ESS countries (if uploaded by countries).
169
National Statistical Authorities using the Euro SDMX registry
The National Statistical Authorities will have access to the Euro SDMX Registry:
• for storing the data and metadata structure definitions they are using at national level;
• for storing the code lists they are using at national level;
• for retrieving the agreed SDMX data and metadata structure definitions used within the ESS and beyond;
• for retrieving the harmonised structural metadata (code lists) used within the ESS and beyond;
• for retrieving information related to data and metadata flows (e.g. the concepts used, the provision agreements, etc.).
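The storing and retrieving roles listed above can be sketched with a toy in-memory registry. This is not the Euro SDMX Registry interface: the class `ToyRegistry`, its methods, the owner labels and the artefact identifiers are all hypothetical and only illustrate how national and agreed artefacts could sit side by side.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRegistry:
    """Hypothetical in-memory stand-in for a structural metadata registry."""

    # Maps (owner, kind, artefact_id) -> artefact content.
    artefacts: dict = field(default_factory=dict)

    def store(self, owner: str, kind: str, artefact_id: str, content: dict) -> None:
        """Store an artefact (e.g. a national code list) under its owner."""
        self.artefacts[(owner, kind, artefact_id)] = content

    def retrieve(self, owner: str, kind: str, artefact_id: str) -> dict:
        """Retrieve a stored artefact; raises KeyError if it is absent."""
        return self.artefacts[(owner, kind, artefact_id)]

    def list_ids(self, kind: str) -> list:
        """List the identifiers of all artefacts of one kind, sorted."""
        return sorted(aid for (_, k, aid) in self.artefacts if k == kind)


if __name__ == "__main__":
    registry = ToyRegistry()
    # An agreed code list maintained centrally (hypothetical content).
    registry.store("ESS", "codelist", "CL_EXAMPLE_AGREED", {"Q": "Quarterly"})
    # A code list uploaded by a national authority (hypothetical content).
    registry.store("NSA_XX", "codelist", "CL_EXAMPLE_NATIONAL", {"01": "Region 1"})
    print(registry.list_ids("codelist"))
    print(registry.retrieve("ESS", "codelist", "CL_EXAMPLE_AGREED"))
```

Keying each artefact by owner keeps nationally uploaded structures distinct from the agreed ones while letting both be retrieved through the same call.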
170
The National Reference Metadata Editor
• is the component dealing with national reference metadata;
• will accommodate the current ESS standards for reference metadata such as the ESMS or the upcoming ESQRS (for quality reporting);
• is addressed to the national producers of reference metadata, who will be able to compile their domain-specific metadata on-line and transmit them to Eurostat;
• is the IT tool stimulating the harmonization of reference metadata within the ESS;
• will be made available to the ESS later in 2010.
171
EMIS
The Explanatory Metadata Information System (EMIS) is the component dealing with the production and dissemination of reference metadata files at Eurostat;
The Eurostat domain managers create their metadata files in EMIS by using the ESMS structure;
Technical functionalities in EMIS enable flexible extractions of information stored in the metadata files;
EMIS will also accommodate national metadata files using the ESMS structure;
At a later stage the ESQRS will also be incorporated.
173
Accessibility of the Eurostat-MH
SDMX Registry
• Eurostat Domain Managers: access rights for reading and downloading; central maintenance by ESTAT DB Admin.
• National Statistical Authorities: access rights for reading and downloading; uploading of national DSDs, MSDs also possible.
• Any public user: access rights for reading and downloading.
National Reference Metadata Editor
• Eurostat Domain Managers: access rights for production and downloading.
• National Statistical Authorities: specific production rights for the files related to the NSA and the specific statistical domains (central national administrator).
• Any public user: no access to the application.
EMIS
• Eurostat Domain Managers: access rights for production and downloading; final dissemination of the files centralised.
• National Statistical Authorities: no access to the application.
• Any public user: no access to the application.
CODED/RAMON
• Eurostat Domain Managers: access rights for production and downloading; final dissemination centralised.
• National Statistical Authorities: no access to the application.
• Any public user: no access to the application.
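The access rights above amount to a lookup by component, user group and action. A minimal sketch follows, assuming hypothetical role keys (`domain_manager`, `nsa`, `public`) and action names (`read`, `download`, `upload`, `produce`); central maintenance and final dissemination are left out because the slide assigns them to central administration rather than to these three user groups.

```python
# Access matrix transcribed from the slide. Role keys and action names are
# hypothetical labels chosen for this sketch, not identifiers of the real system.
NO_ACCESS = frozenset()

ACCESS = {
    "SDMX Registry": {
        "domain_manager": frozenset({"read", "download"}),
        # National DSDs and MSDs may also be uploaded.
        "nsa": frozenset({"read", "download", "upload"}),
        "public": frozenset({"read", "download"}),
    },
    "National Reference Metadata Editor": {
        "domain_manager": frozenset({"produce", "download"}),
        # Production rights are limited to the NSA's own files and domains.
        "nsa": frozenset({"produce"}),
        "public": NO_ACCESS,
    },
    "EMIS": {
        "domain_manager": frozenset({"produce", "download"}),
        "nsa": NO_ACCESS,
        "public": NO_ACCESS,
    },
    "CODED/RAMON": {
        "domain_manager": frozenset({"produce", "download"}),
        "nsa": NO_ACCESS,
        "public": NO_ACCESS,
    },
}


def is_allowed(role: str, component: str, action: str) -> bool:
    """Return True if the role may perform the action on the component."""
    return action in ACCESS.get(component, {}).get(role, NO_ACCESS)


if __name__ == "__main__":
    print(is_allowed("nsa", "SDMX Registry", "upload"))
    print(is_allowed("public", "EMIS", "read"))
```

Unknown components or roles fall back to no access, so the sketch denies by default.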
174
The Eurostat-MH and the harmonisation of the ESS statistical business processes
The statistical standards and guidelines contained in the Euro SDMX Registry and in the National Reference Metadata Editor contribute strongly to the harmonisation and rationalisation of the statistical business processes used for data and metadata at national and international level.
Examples
• The use of the NACE Rev. 2 code lists in SDMX-based data structure definitions from end to end of the statistical business process.
• The use of the ESMS for national reference metadata production and dissemination often leads to an integration of the national business processes used for producing this metadata.
175
Conclusions
The Eurostat Metadata Handler is more and more in the centre of harmonising statistical business processes and metadata within the European Statistical System.
National Statistical Authorities should increasingly use the contents and functionalities of the Eurostat Metadata Handler.
The Eurostat Metadata Handler is one of the main responses to the new Eurostat vision dealing with improvements of the production methods of EU statistics.
176