Eurostat Unit B5 – Statistical Information Technologies, SDMX Basics – October 2011
SDMX Basics
– Core Elements
– Information Model
– Data Structure Definition (DSD)
– SDMX-ML Messages
– Major changes in SDMX v2.1
THE SDMX COMPONENTS
Technical Specifications: The SDMX Information Model
Guidelines to Harmonise Content: The Content Oriented Guidelines (COG)
Tools:
– IT architectures for data exchange
– SDMX-compliant tools
The SDMX Information Model is a meta-model describing the objects involved in the collection, dissemination and publication of aggregated statistics and related metadata.
The abstract model works like a structured set of containers.
Everything in SDMX is model-driven: all messages and interfaces are implementations of the information model.
THE SDMX INFORMATION MODEL
SDMX INFORMATION MODEL – SCOPE
DATA & METADATA FLOWS
Structure Definition
Category Scheme
Category
Constraint
Provision Agreement
Data Provider
Data & Metadata set
SDMX Information Model
STATISTICAL DATA & METADATA
Time series data representation
Cross-sectional data representation
Statistical Data (Figures)
Statistical Metadata (Identifiers, Descriptors)
Structural metadata
Reference metadata
Statistical Metadata (Methodology, Quality)
Statistical data - Cube
Diagram: a cube with three dimensions: Time (2005, 2006, 2007), Country (FR, IT, ES, AT) and Tourism activity (A100, B010, B020).
Time series: Number of tourist campsites - France - annual data
time/activity | B010
2005 | 8174
2006 | 8138
2007 | 8052
Cross-section for 2006: Number of tourist campsites - national - 2006
geo/activity | B010
AT | 542
ES | 1216
FR | 8138
IT | 2510
STATISTICAL DATA & METADATA – Two different ways to represent data
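To make the two representations concrete, here is a minimal Python sketch. It stores the campsite figures from this slide in a dict keyed by (country, activity, year), then extracts a time series (every dimension fixed except time) and a cross-section (time fixed). The dict layout and the function names are illustrative assumptions, not SDMX structures.

```python
# Campsite (B010) figures from the slide, keyed by (country, activity, year).
# The dict-of-tuples layout is an illustrative choice, not an SDMX structure.
cube = {
    ("FR", "B010", "2005"): 8174,
    ("FR", "B010", "2006"): 8138,
    ("FR", "B010", "2007"): 8052,
    ("AT", "B010", "2006"): 542,
    ("ES", "B010", "2006"): 1216,
    ("IT", "B010", "2006"): 2510,
}


def time_series(cube, country, activity):
    """Fix every dimension except time: one value per year."""
    return {year: value
            for (c, a, year), value in sorted(cube.items())
            if c == country and a == activity}


def cross_section(cube, activity, year):
    """Fix time (and activity): one value per country."""
    return {c: value
            for (c, a, y), value in sorted(cube.items())
            if a == activity and y == year}


print(time_series(cube, "FR", "B010"))
# {'2005': 8174, '2006': 8138, '2007': 8052}
print(cross_section(cube, "B010", "2006"))
# {'AT': 542, 'ES': 1216, 'FR': 8138, 'IT': 2510}
```

Both views read the same cube. Only the choice of which dimensions stay fixed differs.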
STATISTICAL DATA - TIME SERIES REPRESENTATION
STATISTICAL DATA - CROSS-SECTIONAL REPRESENTATION
From a number to statistical data
STRUCTURAL METADATA – Introduction
STRUCTURAL METADATA – CONCEPTS
Concepts identify and describe data. They are used as:
– Dimensions, Attributes or Measures in a DSD, to define the structure of a data set
– Attributes in a Metadata Structure Definition (MSD), to define the structure of a metadata set
Number of tourist establishments in Italy, annual data
Time/indicator | A100 | B010 | B020
2002 | 33411 | 2374 | 61479
2003 | 33480 | 2530 | 58526
2004 | 33518 | 2529 | 56586
2005 | 33527 | 2411 | 68385
2006 | 33768 | 2510 | 68376
2007 | 34058 | 2587 | 61810
Indicator codes: A100 Hotels and similar; B010 Tourist campsites; B020 Holiday dwellings
STRUCTURAL METADATA – From a statistical table to its descriptor concepts
STRUCTURAL METADATA – CONCEPTS AND ROLES
STRUCTURAL METADATA: DATA STRUCTURE DEFINITION
To exchange and process data easily, we first define a standard container based on the structure of the real statistical table: the Data Structure Definition (DSD).
Diagram: a DSD links Concepts (e.g. COUNTRY, UNIT, TIME_PERIOD, OBSERVATIONS) to their roles (Dimensions, Attributes, Measures) and to Code lists.
The DSD can be seen as a "logical container" for a specific set of data that we want to exchange. It lists the concepts that represent the data, gives each of them a role (Dimension, Measure or Attribute) and links them to code lists.
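The "logical container" idea can be sketched in Python. Each concept gets a role and, optionally, a code list, and the DSD is then used to check an incoming record. The class names, the DSD id "TOURISM_EXAMPLE" and the unit code "NR" are hypothetical. The sketch also treats attributes as optional, which is a simplification of SDMX assignment statuses.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    concept_id: str                        # e.g. "COUNTRY"
    role: str                              # "dimension", "attribute" or "measure"
    codelist: frozenset = frozenset()      # allowed codes; empty means uncoded


@dataclass
class DataStructureDefinition:
    dsd_id: str
    components: list = field(default_factory=list)

    def dimensions(self):
        return [c for c in self.components if c.role == "dimension"]

    def validate(self, record):
        """Return the problems found in one record (a dict concept_id -> value)."""
        problems = []
        for comp in self.components:
            if comp.concept_id not in record:
                # Simplification: attributes are treated as optional here.
                if comp.role != "attribute":
                    problems.append(f"missing {comp.role} {comp.concept_id}")
                continue
            value = record[comp.concept_id]
            if comp.codelist and value not in comp.codelist:
                problems.append(f"{comp.concept_id}={value!r} not in code list")
        return problems


# Hypothetical DSD built from the concepts in the diagram.
dsd = DataStructureDefinition("TOURISM_EXAMPLE", [
    Component("COUNTRY", "dimension", frozenset({"AT", "ES", "FR", "IT"})),
    Component("TIME_PERIOD", "dimension"),
    Component("OBS_VALUE", "measure"),
    Component("UNIT", "attribute", frozenset({"NR"})),
])

print(dsd.validate({"COUNTRY": "FR", "TIME_PERIOD": "2006",
                    "OBS_VALUE": 8138, "UNIT": "NR"}))   # []
print(dsd.validate({"COUNTRY": "DE", "TIME_PERIOD": "2006"}))
# ["COUNTRY='DE' not in code list", 'missing measure OBS_VALUE']
```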
ELEMENTS OF A DATA STRUCTURE DEFINITION
Eurostat Unit B5 – Statistical Information Technologies, SDMX Basics – 10-11 and 14-15 March 2011
Diagram: DSD (table structure, code lists) and Dataset (observations)
SDMX does not introduce any new concepts for statisticians; it simply provides a framework for what statisticians already know.
The SDMX dataset is a standard container in which statistical data are represented together with their structural metadata, according to the DSD.
SDMX INFORMATION MODEL - DATA SET
Now you have an easy way to exchange and process data and metadata automatically.
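DATA SET
Diagram: a data set contains groups (identified by a GROUP KEY) and series or cross-sections (identified by a KEY made of KEY VALUES). Observations carry a TIME PERIOD and an OBSERVATION VALUE, and ATTRIBUTE VALUES can be attached at each level (attribute attachment). The same structure covers both the time series and the cross-section representations.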
SDMX INFORMATION MODEL - DATA SET
SDMX INFORMATION MODEL - DATA SET
SDMX INFORMATION MODEL - DATA SET
REFERENCE METADATA
Reference Metadata Set
SDMX INFORMATION MODEL - METADATA SET
Concepts
SDMX INFORMATION MODEL – DATA & METADATA FLOW
DATA & METADATA FLOWS
Structure Definition
Category Scheme
Category
Constraint
Provision Agreement
Data Provider
Data & Metadata set
SDMX INFORMATION MODEL – CATEGORIES
SDMX IM – DATA PROVIDERS & PROVISION AGREEMENT
Production and dissemination of Statistical data
Production and dissemination of Reference Metadata
DATA & METADATA FLOWS
Constraint
Provision Agreement
SDMX IM - CONSTRAINTS
SDMX IM - CONSTRAINTS
Example: a data provider can restrict its reporting of monthly data to only some months.
Example: a data provider can restrict its reporting of data to subsets of statistical cubes.
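Both kinds of restriction can be pictured as a filter over candidate observations. The Python sketch below assumes a constraint is a mapping from dimension to allowed codes (a "cube region"), plus an optional set of reported months. CUBE_REGION, REPORTED_MONTHS and within_constraint are hypothetical names, not the SDMX constraint syntax.

```python
# Hypothetical content constraint: for each constrained dimension, the codes
# the provider actually reports. Dimensions not listed are unconstrained.
CUBE_REGION = {"REF_AREA": {"GR"}, "STS_ACTIVITY": {"NS5201", "N15220"}}
# Hypothetical time restriction: only quarter-end months are reported.
REPORTED_MONTHS = {"03", "06", "09", "12"}


def within_constraint(obs, cube_region, months=None):
    """True if obs falls inside the allowed cube region and, when a month
    restriction is given, its TIME_PERIOD (CCYYMM) is in one of those months."""
    for dim, allowed_codes in cube_region.items():
        if obs.get(dim) not in allowed_codes:
            return False
    if months is not None and obs["TIME_PERIOD"][4:6] not in months:
        return False
    return True


obs = {"REF_AREA": "GR", "STS_ACTIVITY": "NS5201", "TIME_PERIOD": "200203"}
print(within_constraint(obs, CUBE_REGION, REPORTED_MONTHS))                  # True
print(within_constraint({**obs, "TIME_PERIOD": "200201"}, CUBE_REGION,
                        REPORTED_MONTHS))                                    # False
print(within_constraint({**obs, "REF_AREA": "IT"}, CUBE_REGION))             # False
```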
SDMX IM - SUMMARY
THE SDMX COMPONENTS
Technical Specifications: The SDMX Information Model
Guidelines to Harmonise Content: The Content Oriented Guidelines (COG)
Tools:
– IT architectures for data exchange
– SDMX-compliant tools
IT ARCHITECTURES FOR DATA EXCHANGE
SDMX REGISTRY
SDMX REGISTRY DEMONSTRATION
SDMX Data Structure Definition (DSD)
COMPLIANCE & IMPLEMENTATION
Generally, the following four steps are needed:
1. Preparation: statisticians from the organisations involved in the data exchange describe the data and the different dataflows, datasets and provision agreements.
2. Compliance: all the necessary objects are created according to the SDMX Technical Specifications.
3. Implementation: the objects are put into practice. Standard software is installed and configured to use the DSDs, and the exchange process is set up and tested.
4. Production: the objects are used in the production process. SDMX implementation is achieved when the data and metadata exchanges within the domain are carried out according to SDMX-compliant specifications.
Define the DSD:
– List of concepts (concept scheme)
– Roles of concepts (Dimension, Attribute, Measure)
– Code lists
Provide the related dataflows (e.g. STSRTD_TURN_M, DEMOGRAPHY_RQ)
CREATE ALL THE NECESSARY OBJECTS
THE STEPS TO BUILD A DATA STRUCTURE DEFINITION
– Identify the descriptor concepts for the data.
– Choose the type of data representation (time series and cross-sectional).
– Choose cross-domain code lists, or define specific code lists, for coded concepts; define the text format for non-coded concepts.
– Define the role of each concept (Dimension, Attribute or Measure):
  – define Dimensions for the time series and cross-sectional data representations;
  – define Attributes with their attachment levels for the time series and cross-sectional data representations;
  – define the time series primary measure and/or the cross-sectional measures with their measure concepts.
– Create the defined artefacts in an SDMX Data Structure Definition tool (e.g. DSW).
1- IDENTIFICATION OF THE DESCRIPTOR CONCEPTS
2 – DEFINE THE CODE LISTS
Statistical data - Cube
Diagram: a cube with dimensions Country (ES, IT, FR, AT), Tourism activity (A100, B010, B020) and Time (2005, 2006, 2007), showing a time series slice and a cross-sectional slice.
Cross-section for 2006: Number of tourist campsites - national - 2006
geo/activity | B010
AT | 542
ES | 1216
FR | 8138
IT | 2510
3- CHOOSE THE TYPE OF DATA REPRESENTATION TIME SERIES (TS) / CROSS-SECTIONAL (CS)
DATA REPRESENTATION – TIME SERIES
DATA REPRESENTATION – CROSS-SECTIONAL
4- DEFINE ROLES OF CONCEPTS AND LIST OF CONCEPTS
5 – DEFINE GROUPS AND ATTRIBUTE ATTACHMENTS
Eurostat Unit B5 – Statistical Information Technologies, SDMX Training for Statisticians – March 2010
6 – DEFINE THE VIEW OF THE DATA STRUCTURE
Table 1. Deflated turnover index (on volume of sales) for retail trade for Greece (no adjustment). Reference period: January 2002 to March 2003 (monthly data, base year 2000).
Year | Month | Turnover index | Status | Confidentiality
2002 | January | 84.5 | actual | free
2002 | February | 85.6 | actual | free
2002 | March | 95.4 | actual | free
2002 | April | 106.2 | actual | free
2002 | May | 98.0 | actual | free
2002 | June | 95.3 | actual | free
2002 | July | 105.4 | actual | free
2002 | August | 107.1 | actual | free
2002 | September | 105.2 | actual | free
2002 | October | 109.4 | actual | free
2002 | November | 104.5 | actual | free
2002 | December | 111.9 | actual | free
2003 | January | 89.1 | provisional | free
2003 | February | 88.3 | provisional | free
2003 | March | 96.1 | provisional | free
Source: National Statistical Service of Greece. Data prepared to be transmitted to the European Commission (including Eurostat).
EXAMPLE: STS SAMPLE DATASET
Figure labels: Dimensions, Attributes, Primary Measure
EXAMPLE: STS SAMPLE DATASET
Concepts identified in the table: STS_INDICATOR, TITLE, STS_ACTIVITY, REFERENCE_AREA, FREQ, STS_BASE_YEAR, ADJT
Further concepts: OBS_STATUS, OBS_VALUE, REFERENCE_PERIOD, OBS_CONF, STS_INSTITUTION
EXAMPLE: STS SAMPLE DATASET
CSV records (one observation per line):
M;GR;N;TOTV;NS5201;1;2000;200201;88.8;A;F
M;GR;N;TOTV;NS5201;1;2000;200202;84.7;A;F
M;GR;N;TOTV;NS5201;1;2000;200203;88.8;A;F
M;GR;N;TOTV;NS5201;1;2000;200204;93.0;A;F
M;GR;N;TOTV;NS5201;1;2000;200205;60.8;A;F
M;GR;N;TOTV;NS5201;1;2000;200206;78.2;A;F
M;GR;N;TOTV;NS5201;1;2000;200207;89.9;A;F
Anatomy of one record, M;GR;N;TOTV;NS5201;1;2000;200201;88.8;A;F:
– M;GR;N;TOTV;NS5201;1;2000 = dimensions (group)
– 200201 = reference period
– 88.8 = primary measure
– A;F = attributes
EXAMPLE: STS SAMPLE DATASET – IDENTIFYING CONCEPTS AND GROUPING SERIES IN CSV FILES
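Reading such a file is mostly a matter of knowing the column order. The Python sketch below assumes the order given by the DSD of dataflow STSRTD_IND_M: seven dimensions, then time period, primary measure and two observation attributes. It rejects lines with the wrong field count, so a stray comma such as "A,F" is reported instead of being silently misread. The function names are illustrative.

```python
import csv
import io

# Column order assumed from the DSD of dataflow STSRTD_IND_M:
# seven dimensions, then time period, primary measure and two obs attributes.
DIMENSIONS = ["FREQ", "REF_AREA", "ADJUSTMENT", "STS_INDICATOR",
              "STS_ACTIVITY", "STS_INSTITUTION", "STS_BASE_YEAR"]
COLUMNS = DIMENSIONS + ["TIME_PERIOD", "OBS_VALUE", "OBS_STATUS", "OBS_CONF"]


def parse_records(text):
    """Yield one dict per semicolon-separated CSV line."""
    for fields in csv.reader(io.StringIO(text), delimiter=";"):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
        record = dict(zip(COLUMNS, fields))
        record["OBS_VALUE"] = float(record["OBS_VALUE"])
        yield record


def series_key(record):
    """The dimension values that identify the series an observation belongs to."""
    return tuple(record[d] for d in DIMENSIONS)


text = ("M;GR;N;TOTV;NS5201;1;2000;200201;88.8;A;F\n"
        "M;GR;N;TOTV;NS5201;1;2000;200202;84.7;A;F\n")
records = list(parse_records(text))
print(series_key(records[0]))  # ('M', 'GR', 'N', 'TOTV', 'NS5201', '1', '2000')
print([r["TIME_PERIOD"] for r in records])  # ['200201', '200202']
```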
DSD OF DATAFLOW STSRTD_IND_M
List of variables, with values, codes, roles and attachment levels:
Concept | Concept ID | Example of value | Remark | Code list | Role | Attachment level
frequency | FREQ | M | Monthly | CL_FREQ | Dimension |
reference area | REF_AREA | GR | Greece | CL_AREA_EE | Dimension |
adjustment | ADJUSTMENT | N | No | CL_ADJUSTMENT | Dimension |
type of index | STS_INDICATOR | TOTV | Turnover deflated (volume of sales) | CL_STS_INDICATOR | Dimension |
activity | STS_ACTIVITY | NS5201 | Retail trade | CL_STS_ACTIVITY | Dimension |
type of institution | STS_INSTITUTION | 1 | 1=NSI or 2=National Bank | CL_STS_INSTITUTION | Dimension |
base year | STS_BASE_YEAR | 2000 | | CL_STS_BASE_YEAR | Dimension |
reference period | TIME_PERIOD | 200201 | CCYYMM | | Dimension (time) |
turnover index | OBS_VALUE | 108.6 | observation | | Measure |
status | OBS_STATUS | A | actual data | CL_OBS_STATUS | Attribute | Observation
confidentiality | OBS_CONF | F | Free of publication | CL_OBS_CONF | Attribute | Observation
time duration set | TIME_FORMAT | P1M | ISO 8601 | CL_TIME_FORMAT | Attribute | Series
title | TITLE | | | | Attribute | Group
decimals | DECIMALS | 1 | One | CL_DECIMALS | Attribute | Group
STRUCTURE OF THE DATASET FOR TIME SERIES
Group of series 1: REF_AREA="GR" ADJUSTMENT="N" STS_INDICATOR="TOTV" STS_ACTIVITY="NS5201" STS_INSTITUTION="1" STS_BASE_YEAR="2000" DECIMALS="1" TITLE="Retail trade"
M;GR;N;TOTV;NS5201;1;2000;200201;88.8;A;F
M;GR;N;TOTV;NS5201;1;2000;200202;84.7;A;F
M;GR;N;TOTV;NS5201;1;2000;200203;88.8;A;F
M;GR;N;TOTV;NS5201;1;2000;200204;93.0;A;F
Group of series 2: REF_AREA="GR" ADJUSTMENT="N" STS_INDICATOR="TOTV" STS_ACTIVITY="N15220" STS_INSTITUTION="1" STS_BASE_YEAR="2000" DECIMALS="1" TITLE="Retail sale of food"
M;GR;N;TOTV;N15220;1;2000;200201;60.8;A;F
M;GR;N;TOTV;N15220;1;2000;200202;78.2;A;F
M;GR;N;TOTV;N15220;1;2000;200203;89.9;A;F
Attributes and attachment level: group. Attributes can be attached to groups.
Definition of Series 1: FREQ="M" REF_AREA="GR" ADJUSTMENT="N" STS_INDICATOR="TOTV" STS_ACTIVITY="NS0006" STS_INSTITUTION="1" STS_BASE_YEAR="2000" TIME_FORMAT="P1M"
M;GR;N;TOTV;NS0006;1;2000;200201;88.8;A;F
M;GR;N;TOTV;NS0006;1;2000;200202;84.7;A;F
M;GR;N;TOTV;NS0006;1;2000;200203;88.8;A;F
Definition of Series 2: FREQ="M" REF_AREA="GR" ADJUSTMENT="N" STS_INDICATOR="TOTV" STS_ACTIVITY="N14500" STS_INSTITUTION="1" STS_BASE_YEAR="2000" TIME_FORMAT="P1M"
M;GR;N;TOTV;N14500;1;2000;200201;60.8;A;F
M;GR;N;TOTV;N14500;1;2000;200202;78.2;A;F
M;GR;N;TOTV;N14500;1;2000;200203;89.9;A;F
Attributes and attachment level: series. Attributes can be attached to series.
STRUCTURE OF THE DATASET FOR TIME SERIES
Definition of Series 1: FREQ="M" REF_AREA="GR" ADJUSTMENT="N" STS_INDICATOR="TOTV" STS_ACTIVITY="NS0006" STS_INSTITUTION="1" STS_BASE_YEAR="2000" TIME_FORMAT="P1M" (attributes and attachment level: series)
Attributes can be attached to observations:
Observation 1: TIME_PERIOD="200201" OBS_VALUE="88.8" OBS_STATUS="A" OBS_CONF="F" (CSV: M;GR;N;TOTV;NS0006;1;2000;200201;88.8;A;F)
Observation 2: TIME_PERIOD="200202" OBS_VALUE="84.7" OBS_STATUS="A" OBS_CONF="F" (CSV: M;GR;N;TOTV;NS0006;1;2000;200202;84.7;A;F)
Observation 3: TIME_PERIOD="200203" OBS_VALUE="88.8" OBS_STATUS="A" OBS_CONF="F" (CSV: M;GR;N;TOTV;NS0006;1;2000;200203;88.8;A;F)
STRUCTURE OF THE DATASET FOR TIME SERIES
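The three attachment levels shown above can be imitated by nesting flat CSV records as dataset → groups → series → observations. Each attribute is then stored once, at its own level. In this Python sketch the group key is every dimension except FREQ, as on the slides. The function name and the dict layout are illustrative assumptions, not an SDMX-ML schema.

```python
# Column layout as in the STS CSV examples above.
DIMENSIONS = ["FREQ", "REF_AREA", "ADJUSTMENT", "STS_INDICATOR",
              "STS_ACTIVITY", "STS_INSTITUTION", "STS_BASE_YEAR"]
GROUP_DIMENSIONS = DIMENSIONS[1:]   # as on the slides: every dimension except FREQ
FIELDS = DIMENSIONS + ["TIME_PERIOD", "OBS_VALUE", "OBS_STATUS", "OBS_CONF"]


def build_dataset(lines, group_attributes, series_attributes):
    """Nest flat CSV records as dataset -> groups -> series -> observations,
    storing each attribute once at its attachment level."""
    dataset = {}
    for line in lines:
        rec = dict(zip(FIELDS, line.split(";")))
        gkey = tuple(rec[d] for d in GROUP_DIMENSIONS)
        skey = tuple(rec[d] for d in DIMENSIONS)
        group = dataset.setdefault(
            gkey, {"attributes": group_attributes.get(gkey, {}), "series": {}})
        series = group["series"].setdefault(
            skey, {"attributes": series_attributes.get(skey, {}), "obs": []})
        # Observation-level attributes (OBS_STATUS, OBS_CONF) stay on each obs.
        series["obs"].append({
            "TIME_PERIOD": rec["TIME_PERIOD"],
            "OBS_VALUE": float(rec["OBS_VALUE"]),
            "OBS_STATUS": rec["OBS_STATUS"],
            "OBS_CONF": rec["OBS_CONF"],
        })
    return dataset


lines = ["M;GR;N;TOTV;NS5201;1;2000;200201;88.8;A;F",
         "M;GR;N;TOTV;NS5201;1;2000;200202;84.7;A;F",
         "M;GR;N;TOTV;N15220;1;2000;200201;60.8;A;F"]
group_attributes = {
    ("GR", "N", "TOTV", "NS5201", "1", "2000"): {"TITLE": "Retail trade", "DECIMALS": "1"},
    ("GR", "N", "TOTV", "N15220", "1", "2000"): {"TITLE": "Retail sale of food", "DECIMALS": "1"},
}
# Series key = FREQ followed by the group dimensions.
series_attributes = {("M",) + gkey: {"TIME_FORMAT": "P1M"} for gkey in group_attributes}

ds = build_dataset(lines, group_attributes, series_attributes)
for gkey, group in ds.items():
    print(group["attributes"]["TITLE"], len(group["series"]))
# Retail trade 1
# Retail sale of food 1
```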
EXAMPLE 2: DEMOGRAPHY SAMPLE DATASET
Figure labels: Measures, Attributes, Dimensions
Concepts identified: TITLE, TIME_PERIOD, TAB_NUM, REV_NUM, OBS_STATUS, FREQ, COUNTRY
Dimensions attached to the dataset level
Dimensions attached to the group level
EXAMPLE 2: DEMOGRAPHY SAMPLE DATASET
Concepts identified: OBS_VALUE, DEMO, SEX, UNIT
Cross-sectional measures (measure dimension): MALE, FEMALE, TOTAL
Dimensions attached to the observation level
EXAMPLE 2: DEMOGRAPHY SAMPLE DATASET
DSD FOR DATAFLOW: DEMOGRAPHY_RQ
Role | Attachment level | Concept | Concept ID | Code list | Values
Dimension | | reference period | TIME_PERIOD | | 2005
Dimension | | reporting country | COUNTRY | CL_COUNTRY | FI (for Finland)
Dimension (measure dimension) | | sex | SEX | CL_SEX | M (male), F (female)
Dimension | | demographic characteristic | DEMO | CL_DEMO | number of births, etc.
Dimension | | frequency | FREQ | CL_FREQ | A (for annual)
Cross-sectional measure | | Male | MALE | | number of persons
Cross-sectional measure | | Female | FEMALE | | number of persons
Cross-sectional measure | | Total | TOTAL | | number of persons
Attribute | Dataset | dataset title | TITLE | | Title of the exchanged dataset
Attribute | Dataset | dataset version | REV_NUM | | 1st revision
Attribute | Dataset | dataset reference table | TAB_NUM | | RQFI05V1
Attribute | Section (Series) | unit of value | UNIT | CL_UNIT | PERS (for persons)
Attribute | Observation | observation status | OBS_STATUS | CL_OBS_STATUS | provisional data
Attribute | | time duration set | TIME_FORMAT | CL_TIME_FORMAT | P1M
Dataset: COUNTRY="FI" (dimension attached to the dataset)
Group: REF_PERIOD="2005" FREQ="A" (dimensions attached to the group); TIME_FORMAT="P1Y" (attribute attached to the group)
Section: DECI="0" UNIT="PERS" UNIT_MULT="0" (attributes attached to sections)
Observations (each carries a cross-sectional measure; DEMO is a dimension attached to the observation, OBS_STATUS an attribute attached to the observation):
– FEMALE OBS_VALUE="35" DEMO="ADJT" OBS_STATUS="P"
– MALE OBS_VALUE="29400" DEMO="LBIRTHST" OBS_STATUS="P"
– TOTAL OBS_VALUE="8986" DEMO="NETMT" OBS_STATUS="P"
STRUCTURE OF THE DATASET FOR CROSS SECTIONAL
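A cross-sectional message spreads the description of each observation over several levels. This Python sketch rebuilds the example above as nested dicts and copies the dataset-, group- and section-level values down to each observation, giving one self-describing row per observation. The "values" and "MEASURE" keys are hypothetical labels chosen for the sketch.

```python
# The cross-sectional example above as nested dicts (level -> values).
# "values" mixes dimensions and attributes, as the slide does per level;
# "MEASURE" (hypothetical key) records which cross-sectional measure is reported.
dataset = {
    "values": {"COUNTRY": "FI"},
    "groups": [{
        "values": {"REF_PERIOD": "2005", "FREQ": "A", "TIME_FORMAT": "P1Y"},
        "sections": [{
            "values": {"DECI": "0", "UNIT": "PERS", "UNIT_MULT": "0"},
            "obs": [
                {"MEASURE": "FEMALE", "OBS_VALUE": "35", "DEMO": "ADJT", "OBS_STATUS": "P"},
                {"MEASURE": "MALE", "OBS_VALUE": "29400", "DEMO": "LBIRTHST", "OBS_STATUS": "P"},
                {"MEASURE": "TOTAL", "OBS_VALUE": "8986", "DEMO": "NETMT", "OBS_STATUS": "P"},
            ],
        }],
    }],
}


def flatten(dataset):
    """Copy the values attached at dataset, group and section level down to
    every observation, giving one self-describing row per observation."""
    rows = []
    for group in dataset["groups"]:
        for section in group["sections"]:
            for obs in section["obs"]:
                rows.append({**dataset["values"], **group["values"],
                             **section["values"], **obs})
    return rows


for row in flatten(dataset):
    print(row["COUNTRY"], row["REF_PERIOD"], row["UNIT"], row["MEASURE"], row["OBS_VALUE"])
# FI 2005 PERS FEMALE 35
# FI 2005 PERS MALE 29400
# FI 2005 PERS TOTAL 8986
```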
Organisation Schemes
DSDs
Concept Schemes
Category Schemes
DataFlows
Code lists
CREATION OF THE DSD – THE SDMX OBJECTS RELATED TO THE DATA STRUCTURE
DSW – “standalone” desktop application
– Replaced the KeyFamily AccessDB tool
– Offline version of Eurostat’s SDMX Registry
– Maintenance of SDMX v2.0 data and metadata structures (create, modify, delete, query)
– Import/export SDMX-ML structures (validate structure messages)
– Import/export GESMES/TS structure files
– Reporting of structures
– Advanced search features
– Export metadata for use with the GENEDI tool
– Data authoring (building SDMX-ML sample datasets)
– Interaction with any SDMX v2.0 compliant Registry: query the Registry; submit data structures to the Registry
Diagram labels: SDMX Registry; Import/Export SDMX-ML messages
CREATION OF THE DSD: DATA STRUCTURE WIZARD
Example – DSD import / creation using the DSW
LIVE DEMONSTRATION - DSD IMPORT / CREATION USING THE DATA STRUCTURE WIZARD
DATA STRUCTURE DEFINITION
ID FISH_CATCH_A
Name Catches for all fishing areas
Version 1.0
AgencyID ESTAT
Valid From
Valid To
EXERCISE: CREATION OF THE DSD: FISH_CATCH_A
DIMENSIONS
Position in key | ID | Name | Concept scheme (ID, version, agency) | Code list (ID, version, agency) or text format | Dimension type
1 | FREQ | Frequency | CS_FISHERIES 1.0 ESTAT | CL_FREQ 1.1 ESTAT | Frequency
2 | REPORTING_AREA | Country ISO3 codes (extended) | CS_FISHERIES 1.0 ESTAT | CL_REPORTING_AREA 1.0 ESTAT |
3 | PRODUCTION_AREA | Production Area (from major area to sub-unit) | CS_FISHSTAT 1.0 FAO | CL_PRODUCTION_AREA 1.0 FAO |
4 | SPECIES | ASFIS Species Alpha 3 Code | CS_FISHSTAT 1.0 FAO | CL_SPECIES 1.0 FAO |
– | TIME_PERIOD | Reference year | CS_FISHERIES 1.0 ESTAT | | Time
EXERCISE: CREATION OF THE DSD: FISH_CATCH_A
MEASURES
Type | ID | Name | Concept scheme (ID, version, agency) | Measure dimension code | Code list or text format
Primary | OBS_VALUE | Value of the measure | CS_FISHERIES 1.0 ESTAT | N/A | N/A
ATTRIBUTES
Attachment level | ID | Name | Concept scheme (ID, version, agency) | Code list (ID, version, agency) | Assignment status
Observation | UNIT | unit | CS_FISHERIES 1.0 ESTAT | CL_UNIT 1.1 ESTAT | C (conditional)
EXERCISE: CREATION OF THE DSD: FISH_CATCH_A
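The "Position in Key" column fixes the order in which dimension values form a series key. This Python sketch builds such a key for FISH_CATCH_A. It assumes a "." separator (a common convention, used here only for illustration), and the example codes are illustrative values that have not been checked against CL_REPORTING_AREA, CL_PRODUCTION_AREA or CL_SPECIES.

```python
# Dimensions of FISH_CATCH_A by "Position in Key" (table above); the time
# dimension TIME_PERIOD is not part of the series key.
KEY_POSITIONS = {1: "FREQ", 2: "REPORTING_AREA", 3: "PRODUCTION_AREA", 4: "SPECIES"}


def series_key(values, sep="."):
    """Join dimension values in key-position order; the separator is an assumption."""
    ordered = [KEY_POSITIONS[pos] for pos in sorted(KEY_POSITIONS)]
    missing = [dim for dim in ordered if dim not in values]
    if missing:
        raise KeyError("missing dimension(s): " + ", ".join(missing))
    return sep.join(values[dim] for dim in ordered)


# Illustrative codes only; not checked against the FAO/ESTAT code lists.
obs = {"SPECIES": "COD", "FREQ": "A", "PRODUCTION_AREA": "27",
       "REPORTING_AREA": "NOR", "TIME_PERIOD": "2008"}
print(series_key(obs))  # A.NOR.27.COD
```

However the record's fields are ordered, the key follows the DSD, so two providers reporting the same series produce the same key.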
USEFUL LINKS
– SDMX Converter, Data Structure Wizard
– SDMX Technical Standard v2.0 (http://www.sdmx.org/index.php?page_id=16)
– Help-desk: [email protected]
SDMX-ML Messages
Based on a common Information Model:
– SDMX-EDI (GESMES/TS): EDIFACT syntax; time-series oriented; one format for data sets
– SDMX-ML: XML syntax; four different formats for data sets; easier validation (XML based)
SYNTAXES FOR SDMX MESSAGES
Element | Example
id | TEST0000
test | true
truncated | false
name | FISH_AQ_TEST
prepared | 2010-01-30T09:30:47+01:00
senderid | ESTAT
sendername | Eurostat
sendercontactname | G. Smith
sendercontactdepartment | Statistics
sendercontactrole | Response
sendercontacttelephone | 0210 2222222
sendercontactfax | 0210 00010999
sendercontactx400 |
sendercontacturi | www.sdmx.org
sendercontactemail | [email protected]
receiverid | NSI_GB
receivername | CSO
receivercontactname | P. Mustermann
receivercontactdepartment | Statistics
receivercontactrole | Statistician
receivercontacttelephone | 02101234567
receivercontactfax | 02103810999
receivercontactx400 |
receivercontacturi | www.sdmx.org
receivercontactemail | [email protected]
datasetagency | ESTAT
datasetid | FISH_AQX
datasetaction | Append
extracted | 2010-01-30T09:30:47+01:00
reportingbegin | 2008-01-01T00:00:00
reportingend | 2008-12-31T00:00:00
source | DH
lang | en
SDMX DATA COMMON HEADERS
Equivalent representations for reporting Datasets
SDMX DATA MESSAGES
Version 2.0 (phased out): 4 data messages, each with a distinct format:
– GenericData
– CrossSectionalData
– CompactData
– UtilityData
Version 2.1: there are now 4 data messages, based on two general formats:
– GenericData and GenericTimeSeriesData
– StructureSpecificData and StructureSpecificTimeSeriesData
EXAMPLE OF GENERIC SDMX-ML MESSAGE
EXAMPLE OF COMPACT SDMX-ML MESSAGE
EXAMPLE OF CROSS-SECTIONAL SDMX-ML MESSAGE
Equivalent formats: Generic SDMX-ML, Cross-sectional SDMX-ML and Compact SDMX-ML are based on the same IM. They can also be expanded to other formats (e.g. CSV, GESMES).
Exception: a Cross-Sectional DSD that does NOT contain a time dimension cannot be converted to the time-series formats.
CONVERSIONS SDMX V2.0
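Because both formats come from the same information model, converting between them is a change of shape with no loss of content. The Python sketch below uses xml.etree.ElementTree to write one series in a compact style (dimension values as XML attributes) and in a generic style (concept/value pairs as elements), then converts compact to generic. The element names are only loosely modelled on SDMX-ML v2.0. Real messages also carry a Header, XML namespaces and, in the compact format, DSD-specific element names, none of which the sketch includes.

```python
import xml.etree.ElementTree as ET

# Simplified element names, loosely modelled on SDMX-ML v2.0 (no Header,
# no namespaces, no DSD-specific compact schema).


def compact(key, observations):
    """Compact style: dimension values become XML attributes of <Series>."""
    ds = ET.Element("DataSet")
    series = ET.SubElement(ds, "Series", key)
    for period, value in observations:
        ET.SubElement(series, "Obs", TIME_PERIOD=period, OBS_VALUE=value)
    return ds


def generic(key, observations):
    """Generic style: concept/value pairs spelled out as child elements."""
    ds = ET.Element("DataSet")
    series = ET.SubElement(ds, "Series")
    series_key = ET.SubElement(series, "SeriesKey")
    for concept, value in key.items():
        ET.SubElement(series_key, "Value", concept=concept, value=value)
    for period, value in observations:
        obs = ET.SubElement(series, "Obs")
        ET.SubElement(obs, "Time").text = period
        ET.SubElement(obs, "ObsValue", value=value)
    return ds


def compact_to_generic(ds):
    """Read the compact tree back into (key, observations) and rewrite it."""
    series = ds.find("Series")
    key = dict(series.attrib)
    observations = [(o.get("TIME_PERIOD"), o.get("OBS_VALUE"))
                    for o in series.findall("Obs")]
    return generic(key, observations)


key = {"FREQ": "M", "REF_AREA": "GR", "STS_ACTIVITY": "NS5201"}
observations = [("200201", "88.8"), ("200202", "84.7")]
print(ET.tostring(compact(key, observations), encoding="unicode"))
print(ET.tostring(compact_to_generic(compact(key, observations)), encoding="unicode"))
```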
– Read the input message (parsing): populate the data model of the tool (based on the SDMX v2.0 information model).
– Write the converted message: use the data model to write the output message in the required target format.
– Information retrieved from the Registry: the dataflow ID is used to retrieve the dataflow definition from the Registry. The DSD ID, version and agency ID are taken from the dataflow definition and used to acquire the DSD.
SDMX CONVERTER
Possible conversions
Input formats: CSV, Compact SDMX-ML, Generic SDMX-ML, Utility SDMX-ML, Cross-sectional SDMX-ML *, SDMX-EDI (GESMES/TS)
Output formats: CSV, Compact SDMX-ML, Generic SDMX-ML, Utility SDMX-ML, Cross-sectional SDMX-ML, SDMX-EDI (GESMES/TS)
Main use: conversion from CSV to Compact SDMX-ML
SDMX CONVERTER MAIN FUNCTIONALITY
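The converter's main use, CSV to Compact SDMX-ML, follows the pipeline described above: resolve the dataflow to its DSD, parse the input into an internal model, then write the target format. This is a minimal Python sketch of that pipeline and not the SDMX Converter's code. The DATAFLOWS and DSDS dicts stand in for Registry lookups, the DSD ID "STSRTD_IND_M_DSD" is hypothetical, and the output uses simplified element names without namespaces or a header.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the Registry: dataflow ID -> DSD reference
# (agency, ID, version), and DSD reference -> column layout.
DATAFLOWS = {"STSRTD_IND_M": ("ESTAT", "STSRTD_IND_M_DSD", "1.0")}
DSDS = {("ESTAT", "STSRTD_IND_M_DSD", "1.0"): {
    "dimensions": ["FREQ", "REF_AREA", "ADJUSTMENT", "STS_INDICATOR",
                   "STS_ACTIVITY", "STS_INSTITUTION", "STS_BASE_YEAR"],
    "obs_columns": ["TIME_PERIOD", "OBS_VALUE", "OBS_STATUS", "OBS_CONF"],
}}


def convert_csv_to_compact(csv_text, dataflow_id):
    dsd = DSDS[DATAFLOWS[dataflow_id]]            # 1. dataflow -> DSD
    dims, obs_cols = dsd["dimensions"], dsd["obs_columns"]
    series = {}                                    # 2. parse into an internal model
    for row in csv.reader(io.StringIO(csv_text), delimiter=";"):
        if not row:
            continue
        key = tuple(row[:len(dims)])
        series.setdefault(key, []).append(dict(zip(obs_cols, row[len(dims):])))
    root = ET.Element("DataSet")                   # 3. write the target format
    for key, observations in series.items():
        s = ET.SubElement(root, "Series", dict(zip(dims, key)))
        for obs in observations:
            ET.SubElement(s, "Obs", obs)
    return ET.tostring(root, encoding="unicode")


csv_text = ("M;GR;N;TOTV;NS5201;1;2000;200201;88.8;A;F\n"
            "M;GR;N;TOTV;NS5201;1;2000;200202;84.7;A;F\n"
            "M;GR;N;TOTV;N15220;1;2000;200201;60.8;A;F\n")
print(convert_csv_to_compact(csv_text, "STSRTD_IND_M"))
```

Keeping the three steps separate mirrors the design on the slide: a new input or output format only needs a new parser or writer around the same internal model.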
SDMX training session on basic principles, Major Changes in version 2.1
Fabien JACQUET
SDMX Basics
MMMM 2011
– Select the input file and the output file
– Select the input and output formats
– Select the DSD on the local drive, or identify a DSD to download from the SDMX Registry
– Identify a dataflow linked to the DSD to download from the SDMX Registry
– Select / manage headers for CSV input formats
– Select mapping / transcoding tables
– CSV parameters
– GESMES representation for GESMES output formats
– XML parameters for SDMX output formats
– Load / save the current settings
Conversion Example
Major changes in SDMX v 2.1
Overview of the changes
Structural metadata:
– Data Structure Definition (DSD)
– Metadata Structure Definition (MSD)
– Constraint
– Code List
– Organisation Scheme
– Categorising Structures
– Process
– Provision Agreement
– Transformations and Expressions
Data set:
– Message changes
– Structured data mechanism revised
Metadata set:
– Message changes
– Alignment of formats
– Structured metadata mechanism revised
Data Structure Definition (DSD): support for non-time-series data structures through the Measure Dimension
Version 2.0: a DSD gives Concepts the roles of Dimensions, Attributes and Measures, and links them to Code lists.
Version 2.1: a DSD gives Concepts the roles of Dimensions and a Measure Dimension, Attributes and a Primary Measure, and links them to Code lists; the Measure Dimension is represented by a Concept Scheme.
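The move from several cross-sectional measures to a single primary measure plus a measure dimension can be seen as reshaping one observation into several. This Python sketch splits an observation that carries MALE, FEMALE and TOTAL measures into one observation per measure. The measure dimension ID "MEASURE" is a hypothetical choice, and the numbers are made up for illustration.

```python
def to_measure_dimension(obs, measures, measure_dim="MEASURE"):
    """Split one observation that carries several cross-sectional measures
    (v2.0 style) into one observation per measure, identified by the value
    of a measure dimension and holding a single OBS_VALUE (v2.1 style)."""
    common = {k: v for k, v in obs.items() if k not in measures}
    return [{**common, measure_dim: m, "OBS_VALUE": obs[m]}
            for m in measures if m in obs]


# Hypothetical observation; the numbers are made up for illustration.
obs_v20 = {"COUNTRY": "FI", "TIME_PERIOD": "2005", "DEMO": "LBIRTHST",
           "MALE": 10, "FEMALE": 12, "TOTAL": 22}
for row in to_measure_dimension(obs_v20, ["MALE", "FEMALE", "TOTAL"]):
    print(row)
```

Each resulting row has the same structure as a time-series observation, which is why one primary measure suffices in version 2.1.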
Constraint
Version 2.0: Constraints are linked to the Dataflow and the Provision agreement.
Version 2.1: the Constraint becomes a maintainable artefact (Registry Constraint). Diagram labels: Dataflow, Code list, Provision agreement, DSD.
Code List (version 2.1): a common code list can be restricted to partial code lists by constraints (Constraint 1, Constraint 2), each used by a different DSD.
Categorising Structures
Version 2.0: a Category in a Category Scheme references Data/Metadata flows only.
Version 2.1: a Categorisation, which is a maintainable artefact, links a Category to other structures such as Data/Metadataflows, Code lists, Provision agreements and DSDs.
Message Changes – Data Set
Version 2.0 (phased out): 4 data messages, each with a distinct format:
– GenericData
– CrossSectionalData
– CompactData
– UtilityData
Version 2.1: there are now 4 data messages, based on two general formats:
– GenericData and GenericTimeSeriesData
– StructureSpecificData and StructureSpecificTimeSeriesData