Eurostat
7th Meeting of the Expert Group on SDMX
SDMX-RI and SDMX Converter
Eurostat Unit B3: “IT for statistical production”
Seoul, 27 October 2014
Introduction
1. Data exchange process
A. Role of the SDMX-RI and SDMX Converter in the data exchange process
B. SDMX compliance and data preparation
2. SDMX Converter
3. SDMX Reference Infrastructure (SDMX-RI)
A. Architecture overview
B. Modules and functionalities
C. Statistical domains using SDMX-RI
4. SDMX-RI - prerequisites, identified advantages and shortcomings
1.B. SDMX compliance*
Agreed standard/format for data exchange
Defined definitions, concepts, codelists, DSDs, transmission model and agreements
Requires NSIs to create and transmit data according to:
agreed format
mode of data exchange
defined time period
* Prerequisite for SDMX tools and SDMX exchange
1.B. Data preparation
[Diagram: Statistical variables (topics); Microdata database (individual records); Macrodata database (aggregated datasets 1 to n); Non-SDMX local dissemination data]
SISAI-Metadata Working Group meeting 01-07-2014
1.A. Role of the SDMX-RI in the data exchange process
[Diagram: NSI process workflow. Steps shown: non-SDMX local data, SDMX codes, extract files, transform file, SDMX file, dissemination/transmission. Tools shown: NSI software; SDMX Converter; SDMX-RI (Mapping Assistant, Test Client, Web Client, NSI Web service); processing for sending; EDAMIS; Hub. Legend: NSI-developed software, Eurostat tools.]
Requires: aggregated data and SDMX DSDs
2. SDMX Converter
Originally developed to convert from/to SDMX
Continuously extended to offer new functionality, conversion capabilities and supported formats
Currently supported formats:
SDMX-ML 2.1 and 2.0
GESMES/TS, GESMES/2.1, GESMES/DSIS
CSV, FLR, DSPL, Excel
2. SDMX Converter
Reading input messages: parsing and populating the internal SDMX data model
Writing output messages: writing them in the target data format
Importing the Data Structure Definition (DSD): provided locally or retrieved from a Registry
Four modes of operation:
Graphical User Interface
Command line
Application Programming Interface
Web Service
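The read, populate and write steps described above can be sketched as follows. This is a minimal illustration and not the SDMX Converter's actual code or API: the `Observation` class, the function names and the XML layout are assumptions made for the example, and the XML written here is a simplified stand-in rather than schema-valid SDMX-ML.

```python
import csv
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Observation:
    """One observation in the (hypothetical) internal model."""
    dimensions: dict  # dimension id -> code, e.g. {"AGE": "Y_LT15"}
    value: str        # the observed value, kept as text


def read_csv(text, dimension_ids, measure_id):
    """Reading step: parse CSV input and populate the internal model."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        Observation({d: row[d] for d in dimension_ids}, row[measure_id])
        for row in reader
    ]


def write_xml(observations):
    """Writing step, target 1: an illustrative XML layout (not real SDMX-ML)."""
    root = ET.Element("DataSet")
    for obs in observations:
        attrs = dict(obs.dimensions)
        attrs["OBS_VALUE"] = obs.value
        ET.SubElement(root, "Obs", attrs)
    return ET.tostring(root, encoding="unicode")


def write_flr(observations, dimension_ids, width=10):
    """Writing step, target 2: fixed-length records, each field padded to `width`."""
    lines = []
    for obs in observations:
        fields = [obs.dimensions[d] for d in dimension_ids] + [obs.value]
        lines.append("".join(f.ljust(width) for f in fields))
    return "\n".join(lines)
```

Because both writers consume the same internal model, supporting a new output format only requires a new writer; the list of dimension ids plays the role that the imported DSD plays in the slide.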
3. SDMX Reference Infrastructure (SDMX-RI)
Set of IT modules allowing a statistical office to transform data into SDMX format and to expose data in SDMX format to the external world
Modular architecture, developed in both Java and .NET
Supports different database vendors
Supports SDMX 2.0 and in the future SDMX 2.1
Allows data collector organisation to access and retrieve data on demand (pull approach)
Open Source Software – free of charge
In use: National Statistical Offices, Eurostat dissemination chain, UN, etc.
3.B. SDMX-RI modules, functionalities and workflow (2/6)
Phase I. Preparation (mapping)
Mapping Assistant (translates the internal data model into an SDMX-compliant one)
3.B. SDMX-RI modules and functionalities - Mapping Assistant (3/6)
Stores the SDMX structures agreed for the data exchange process
Allows users to define subsets of data to be disseminated
Creates and stores mappings between the internal data structure and SDMX concepts
(e.g. My_column_A = AGE)
Creates and stores mappings between the internal classifications and SDMX codelists
(e.g. My_code_AB = Y_LT15)
Result:
Control the exposed data
Preview the data in SDMX format
Identify errors
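The two kinds of mapping above (column to concept, local code to SDMX code) can be illustrated with a short sketch. It reuses the slide's own examples, `My_column_A = AGE` and `My_code_AB = Y_LT15`; the dictionary layout, the `My_value` column and the `to_sdmx` function are hypothetical and do not reflect how the Mapping Assistant stores or applies its mappings.

```python
# Hypothetical mapping store content, following the examples on the slide.
COLUMN_TO_CONCEPT = {
    "My_column_A": "AGE",
    "My_value": "OBS_VALUE",  # assumed column holding the observed value
}
CODE_MAPS = {
    "AGE": {"My_code_AB": "Y_LT15"},  # internal classification -> SDMX codelist
}


def to_sdmx(row):
    """Translate one internal record into SDMX concept/code pairs."""
    out = {}
    for column, local_value in row.items():
        if column not in COLUMN_TO_CONCEPT:
            continue  # unmapped columns are not exposed
        concept = COLUMN_TO_CONCEPT[column]
        codes = CODE_MAPS.get(concept)
        if codes is None:
            out[concept] = local_value  # uncoded concept: value passes through
        elif local_value in codes:
            out[concept] = codes[local_value]
        else:
            # Surfacing unmapped codes is one way to "identify errors" early.
            raise ValueError(f"no SDMX code mapped for {column}={local_value!r}")
    return out
```

For example, `to_sdmx({"My_column_A": "My_code_AB", "My_value": "1200"})` yields a record keyed by `AGE` and `OBS_VALUE`, while a local code with no mapping raises an error instead of being exposed silently.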
3.B. SDMX-RI modules, functionalities and workflow (2/6)
Phase I. Preparation (mapping)
Mapping Assistant (translates the internal data model into an SDMX-compliant one)
Phase II. Testing
Test Client (direct connection to the dissemination database)
Web Client (using web service address)
3.B. SDMX-RI modules and functionalities - Test Client (4/6)
Allows users to view and extract data in SDMX format, using the mappings defined in the Mapping Assistant tool
Can extract data directly from the dissemination database
Can extract data using a web address (web service)
Result:
Allows testing of the data dissemination process
Test for SDMX compliance
Create custom extraction
Identify errors
Extract data in different formats
3.B. SDMX-RI modules and functionalities - Web Client (5/6)
Allows users to view and extract data in SDMX format, using the mappings defined in the Mapping Assistant tool
Provides a user-friendly interface, even for inexperienced users
Can extract data using a web address (web service)
Result:
Allows testing of the data dissemination process
Test for SDMX compliance
Create custom extraction
Identify errors
Extract data in different formats
3.B. SDMX-RI modules, functionalities and workflow (2/6)
Phase I. Preparation (mapping)
Mapping Assistant (translates the internal data model into an SDMX-compliant one)
Phase II. Testing
Test Client (direct connection to the dissemination database)
Web (NSI) Client (using web service address)
Phase III. Production
NSI web service (understands and responds to SDMX queries)
Web (NSI) Client (using web service address)
Test Client (using web service address)
3.B. SDMX-RI modules and functionalities - (NSI) Web service (6/6)
No graphical user interface
Modules invisible to the user, responsible for:
Controlling the incoming data requests
Retrieving SDMX structure and mappings
Retrieving data from the dissemination database
Generating data response messages
Sending data in SDMX format
Result:
Data are made available to different data consumers via internet
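The request-handling steps listed above (take a request, apply the mappings, read the dissemination database, build a response) can be sketched with an in-memory SQLite database. Everything named here is an assumption for illustration: the mapping dictionaries, the local column names and codes, and the `answer_query` function. The real NSI web service accepts SDMX query messages and returns SDMX data messages, which this sketch replaces with plain dictionaries.

```python
import sqlite3

# Hypothetical mappings, as they might be read from a mapping store.
CONCEPT_TO_COLUMN = {"AGE": "age_group", "OBS_VALUE": "persons"}
SDMX_TO_LOCAL_CODE = {"AGE": {"Y_LT15": "A1", "Y15-29": "A2"}}
LOCAL_TO_SDMX_CODE = {
    concept: {local: sdmx for sdmx, local in codes.items()}
    for concept, codes in SDMX_TO_LOCAL_CODE.items()
}


def answer_query(conn, table, filters):
    """Answer a query given as {SDMX concept: SDMX code}; return SDMX-coded rows.

    `table` is assumed to come from trusted configuration, not from the request.
    """
    # 1. Translate the SDMX filters into local columns and local codes.
    where, params = [], []
    for concept, code in filters.items():
        where.append(f"{CONCEPT_TO_COLUMN[concept]} = ?")
        params.append(SDMX_TO_LOCAL_CODE[concept][code])
    columns = list(CONCEPT_TO_COLUMN.values())
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if where:
        sql += " WHERE " + " AND ".join(where)

    # 2. Retrieve the data from the dissemination database.
    rows = conn.execute(sql, params).fetchall()

    # 3. Generate the response using SDMX concepts and codes.
    concepts = list(CONCEPT_TO_COLUMN)
    response = []
    for row in rows:
        record = {}
        for concept, value in zip(concepts, row):
            codes = LOCAL_TO_SDMX_CODE.get(concept)
            record[concept] = codes[value] if codes else value
        response.append(record)
    return response
```

The point of the design is that the local database keeps its own column names and codes; only the mappings know how they relate to the agreed SDMX structure.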
3.C. Statistical domains using SDMX-RI and SDMX Converter
Users by software:
SDMX Converter
National Statistical Institutes: most SDMX-compliant domains
EUROSTAT: most SDMX-compliant domains
SDMX-RI
National Statistical Institutes: Census 2011 dissemination (production); ICT dissemination (pilot)
EUROSTAT: Census HUB* (production); ICT HUB* (pilot)
Also listed: National Accounts, Balance of Payments; Dissemination web service (production); ISTAT SDMX dissemination (production); Job vacancies (pilot); Labour cost index (pilot); STS VELA; Hub architecture for statistics exchange
* Customers of SDMX-RI
3.C. Census Hub - Standardisation of the approach
[Diagram: Data Provider (= NSI) and Data Collector. Provider side: non-SDMX local database, Mapping Assistant, mapping store, Test Client, web services. Collector side: Census Hub, Web Client, metadata repository with the DSD. SDMX queries and SDMX responses pass between the Census Hub and the web services. Label: SDMX (standard, exchange, metadata, repository).]
3.C. Reusability – ESS.VIP.BUS ICT Project
In 2013, a project based on reusing the Census Hub architecture and SDMX-RI (ESS.VIP.BUS ICT) was approved by the ESSC
Results (April 2014)
The national installation can be used for other Data Hubs with limited effort (for data mapping mainly)
4. Prerequisites for SDMX-RI
Technical:
Supported databases: Oracle DB, SQL Server, MySQL*
Required development framework: Java and .NET
Deployment requirements: Tomcat, WebLogic, IIS
(Minimal) hardware and software requirements
Statistical:
SDMX compliance
Perform mapping activities
Update of SDMX structures and mappings (if necessary)
Aggregated data are produced
NSIs should maintain the system
4. SDMX-RI - identified advantages
Facilitates the SDMX compliance phase
Supports all modes of data transmission
Limited/no change of the existing environment, production or dissemination systems
Limited/no need for adaptation
Open source
Reusable modules/installations through well-defined APIs
High performance
Minimal hardware and software requirements
SDMX-RI in the Hub concept (pull mode)
No data transmission
No replication of the data
In case of update, the new data are available immediately
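The pull-mode properties listed above (no transmission, no replication, updates visible immediately) follow from the hub holding no data of its own. The sketch below is a toy model under that assumption; the `Hub` class and its methods are hypothetical and are not part of SDMX-RI or the Census Hub, and each provider endpoint is reduced to a plain Python callable.

```python
class Hub:
    """Forwards each query to registered providers; stores no data itself."""

    def __init__(self):
        self._providers = {}  # country code -> callable that answers a query

    def register(self, country, endpoint):
        """Register a provider endpoint (here: any callable taking the filters)."""
        self._providers[country] = endpoint

    def query(self, filters, countries=None):
        """Ask each selected provider at request time and collect the answers."""
        targets = countries or list(self._providers)
        # Nothing is cached: every call goes back to the providers, so an
        # update on the provider side is visible on the very next query.
        return {c: self._providers[c](filters) for c in targets}
```

A provider that changes its local data between two calls to `query` is reflected in the second result without any transmission step, which is the behaviour the slide attributes to the Hub concept.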
4. SDMX-RI – shortcomings/challenges (based on the Census Hub project)
No automatic way to know that the data have been updated (possible future solution: implement an RSS feed)
No validation (possible future solution: include a validation module)
No confidentiality treatment (possible future solution: to be analysed)
No computation or aggregation of data (possible future solution: include an aggregation module)
Data must be cross-tabulated, aggregated and loaded into the database by the NSIs (possible future solution: include a loading mechanism)