07-Apr-08
Metadata use in the Statistical Value Chain
UNECE-Eurostat-OECD Meeting on
Management of Statistical Information Systems (MSIS 2008)
Luxembourg, 7-9 April 2008
Georges Pongas, Adam Wroński
7-Apr-08 Metadata use in the Statistical Value Chain 2
Content
1. Introduction
2. Operational Characteristics of Metadata
3. Technical Characteristics of the Metadata
4. Metadata types needed in the various steps of the SVC (statistical value chain)
5. Conclusion
Seven SVC steps
1. Expression of the need
2. Data collection design
3. Specification and development of the tools needed for the data collection
4. Data collection
5. Data editing and imputation
6. Data processing
7. Data dissemination
Basics
Keep statistical notions out of the technical (implementation-oriented) characteristics of the metadata.
Design the technical characteristics of the metadata so that the same metadata structures can cover both statistical and non-statistical requirements.
Operational Characteristics of Metadata
Static nature
Long production process
Located in various places (resources)
Critical link with statistical data
– depends on statistical data changes
Strong coupling of structural metadata with the statistical data
Large number of metadata entities needed in the SVC
Technical Characteristics of Metadata
Terminology is often complex
Technical characteristics and statistical notions are frequently mixed
Statistical Notions and Metadata Examples
– A classification, a keyword list and a set of information related to the SDDS standard (all simple metadata entities)
– A correspondence table between two classifications, and a table of access rights linking user names to the statistical datasets of a database (both binary relationships)
Within each group, the only difference is the context, i.e. the user interface.
Thus, develop separately:
– a common set of functionalities, and
– the interface layer for each application.
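A minimal Python sketch of this separation, under stated assumptions: the hypothetical class `LinkStore` plays the role of the common set of functionalities, while the two small functions `corresponding_codes` and `can_access` are the context-specific interface layers for a correspondence table and for access rights. None of these names come from the presentation, and the codes and user/dataset identifiers are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A generic link between two metadata items (hypothetical structure)."""
    first_id: str
    second_id: str


class LinkStore:
    """Common functionality: stores links and answers lookups.

    It knows nothing about classifications, users or datasets;
    that meaning is supplied by the interface layer.
    """

    def __init__(self) -> None:
        self._links: list[Link] = []

    def add(self, first_id: str, second_id: str) -> None:
        self._links.append(Link(first_id, second_id))

    def targets(self, first_id: str) -> list[str]:
        """Return every second_id linked to first_id, in insertion order."""
        return [link.second_id for link in self._links if link.first_id == first_id]


# Interface layer 1 (hypothetical): correspondence table between two classifications.
def corresponding_codes(table: LinkStore, source_code: str) -> list[str]:
    return table.targets(source_code)


# Interface layer 2 (hypothetical): access rights linking user names to datasets.
def can_access(rights: LinkStore, user: str, dataset: str) -> bool:
    return dataset in rights.targets(user)


if __name__ == "__main__":
    correspondence = LinkStore()
    correspondence.add("A01", "X01")  # illustrative codes only
    correspondence.add("A01", "X02")
    print(corresponding_codes(correspondence, "A01"))  # ['X01', 'X02']

    rights = LinkStore()
    rights.add("user1", "dataset_A")
    print(can_access(rights, "user1", "dataset_A"))  # True
    print(can_access(rights, "user1", "dataset_B"))  # False
```

In this sketch, reusing the store for another context (for example, links between regulations) would only require a new wrapper function; the storage and lookup code stays unchanged.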
Metadata Technical Structure Categories
Three categories proposed:
1. Simple Metadata Entities (SME)
2. Binary Relationships (BR)
3. Clustered Metadata Entities (CME)
Simple Metadata Entities (SME)
Simple key
Variable number of attributes
Appropriate for vertical-type storage

                  Example 1        Example 2
Entity            NACE             user name
Entity element    2122             gpongas
Attribute name    English label    phone no
Attribute value   “Mining”         430139
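One reading of "vertical type storage", assumed here rather than stated in the slides, is an entity-attribute-value layout: one row per (entity, entity element, attribute name, attribute value). The Python sketch below stores the two examples that way; the helper `attributes` and the extra `department` attribute are hypothetical.

```python
# Vertical (entity-attribute-value) storage: one row per attribute value.
# Row layout: (entity, entity element, attribute name, attribute value)
Row = tuple[str, str, str, str]

rows: list[Row] = [
    ("NACE", "2122", "English label", "Mining"),
    ("user name", "gpongas", "phone no", "430139"),
]


def attributes(table: list[Row], entity: str, element: str) -> dict[str, str]:
    """Gather all attributes of one entity element into a dictionary."""
    return {
        name: value
        for ent, elem, name, value in table
        if ent == entity and elem == element
    }


# Adding an attribute is just another row; the table structure does not change.
# "department" and its value are hypothetical.
rows.append(("user name", "gpongas", "department", "hypothetical"))

print(attributes(rows, "NACE", "2122"))
# {'English label': 'Mining'}
print(attributes(rows, "user name", "gpongas"))
# {'phone no': '430139', 'department': 'hypothetical'}
```

This layout is what makes a variable number of attributes cheap: each entity element can carry as many or as few attribute rows as it needs.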
Examples of SMEs
SDDS documents
Dublin Core
Classifications
Keywords
Administrative entities
Programs
Publications
Binary Relationships (BR)
Two types:
Between two different entities
– correspondence tables, access rights definitions
Inside the same entity
– thesauri, classification hierarchies, links between regulations, statistical documents

Example
Relationship id      UN thesaurus
First entity id      EUROPE
First entity role    Parent
Second entity id     FR
Second entity role   Child
Reason of link       Broader term
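The example record can be held as a single generic row type, whatever pair of entities it links. A minimal Python sketch, assuming the six fields of the example are the whole record; the helper `broader_terms` is hypothetical and simply follows the Parent/Child roles within one relationship.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryRelationship:
    """One link, using the field names of the UN thesaurus example."""
    relationship_id: str
    first_entity_id: str
    first_entity_role: str
    second_entity_id: str
    second_entity_role: str
    reason_of_link: str


links = [
    BinaryRelationship("UN thesaurus", "EUROPE", "Parent", "FR", "Child", "Broader term"),
]


def broader_terms(links: list[BinaryRelationship], relationship_id: str, term: str) -> list[str]:
    """Return the parents of `term` within one relationship (e.g. a thesaurus)."""
    return [
        link.first_entity_id
        for link in links
        if link.relationship_id == relationship_id
        and link.second_entity_id == term
        and link.first_entity_role == "Parent"
        and link.second_entity_role == "Child"
    ]


print(broader_terms(links, "UN thesaurus", "FR"))  # ['EUROPE']
```

Because each record carries its own roles and reason of link, the same structure could in principle hold a correspondence table between two classifications by changing only the field values.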
Clustered Metadata Entities (CME)
Complex entities characterised by a variable key cardinality and by references to other entities of type CME, SME and BR
Description technique: XML Schema is appropriate
Examples
SDMX and GESMES definitions
Dataset definitions
Annotations to dataset cells
Confidentiality definitions linked to datasets
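The slides recommend XML Schema for describing CMEs. As a hedged illustration only, the Python sketch below builds an XML instance (not a schema) for a dataset definition whose key has a variable number of dimensions, each pointing to an SME such as a classification, plus optional references to BRs and to a confidentiality definition. All element and attribute names (`DatasetDefinition`, `Key`, `Dimension`, `Relationship`, `Confidentiality`, `smeRef`, `brRef`, `cmeRef`) and all identifiers are hypothetical; they are not taken from SDMX, GESMES or any published schema.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def dataset_definition(
    dataset_id: str,
    dimensions: list[tuple[str, str]],
    br_refs: tuple[str, ...] = (),
    confidentiality_ref: Optional[str] = None,
) -> ET.Element:
    """Build a hypothetical CME: a dataset definition with a variable-length key."""
    root = ET.Element("DatasetDefinition", id=dataset_id)
    # The key cardinality varies from one dataset to another.
    key = ET.SubElement(root, "Key", cardinality=str(len(dimensions)))
    for position, (name, sme_ref) in enumerate(dimensions, start=1):
        # Each dimension refers to an SME, e.g. a classification.
        ET.SubElement(key, "Dimension", position=str(position), name=name, smeRef=sme_ref)
    for br_ref in br_refs:
        # References to binary relationships, e.g. a correspondence table.
        ET.SubElement(root, "Relationship", brRef=br_ref)
    if confidentiality_ref is not None:
        # Reference to another CME: confidentiality definitions linked to the dataset.
        ET.SubElement(root, "Confidentiality", cmeRef=confidentiality_ref)
    return root


definition = dataset_definition(
    "dataset_A",
    [("activity", "NACE"), ("country", "GEO")],
    br_refs=("NACE_to_CPA",),
    confidentiality_ref="conf_dataset_A",
)
print(ET.tostring(definition, encoding="unicode"))
```

A real system would validate such instances against an XML Schema; the sketch only shows how one CME can aggregate references to SMEs, BRs and other CMEs.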
Metadata in the various steps of the SVC
Collection Metadata
Mostly of type BR and SME
Among others, they contain:
– source agencies
– data file descriptions
– codelists
– validation rules linked to initial data checks
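A minimal Python sketch of how codelists and validation rules held as collection metadata might drive initial data checks. The codelist contents, the rule names `in_codelist` and `non_negative`, the record layout and the function `check_record` are all hypothetical; the slides do not specify how rules are expressed.

```python
from typing import Any, Callable, Optional

# Hypothetical codelist (collection metadata).
CODELISTS: dict[str, set[str]] = {"COUNTRY": {"FR", "DE", "LU"}}

# Hypothetical rule catalogue: rule name -> predicate(value, parameter).
CHECKS: dict[str, Callable[[Any, Optional[str]], bool]] = {
    "in_codelist": lambda value, param: value in CODELISTS[param],
    "non_negative": lambda value, param: isinstance(value, (int, float)) and value >= 0,
}

# Validation rules linked to the fields of an incoming data file:
# (field, rule name, rule parameter)
RULES = [
    ("country", "in_codelist", "COUNTRY"),
    ("value", "non_negative", None),
]


def check_record(record: dict[str, Any]) -> list[str]:
    """Apply every rule to one record and return readable error messages."""
    errors = []
    for field, rule, param in RULES:
        value = record.get(field)
        if not CHECKS[rule](value, param):
            errors.append(f"{field}={value!r} fails {rule}")
    return errors


print(check_record({"country": "FR", "value": 12.5}))  # []
print(check_record({"country": "XX", "value": -1}))
# ["country='XX' fails in_codelist", 'value=-1 fails non_negative']
```

Keeping the rules as data rather than code means a new check on a data file can be added by editing metadata, provided its rule type already exists in the catalogue.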
Editing, Imputation and Processing Metadata
More complex than the collection metadata (more CMEs needed)
Among others, they contain:
– Dataset definitions
– Formulas, programs, scripts
– Conditional and ordinary annotations
– Dissemination feeding information
Dissemination Metadata
The most complex metadata types are located here. They contain almost all the previously described metadata plus their own.
Reasons for this complexity:
• Dissemination contains all the statistical domains
• It must cover all user types
• It has tight delivery deadlines
• It must offer highly user-friendly navigation, presentation and extraction facilities
Among others, dissemination metadata contain:
– Sitemap description
– Release calendars
– Dataset links to publication tables
– Questionnaire definitions linked to datasets
– Units of measurement
– Ready-made queries
Conclusion
Separating the statistical notions (context) from the structure (functionality) of metadata minimises the number of structural metadata types.
Consequently, it becomes easier to build and implement a complex statistical (metadata and data) system.