
EDDI: Introduction to SDMX Arofan Gregory Open Data Foundation

Upload: beverley-kelley

Post on 11-Jan-2016


TRANSCRIPT

Page 1: EDDI: Introduction to SDMX Arofan Gregory Open Data Foundation

EDDI: Introduction to SDMX

Arofan Gregory

Open Data Foundation

Page 2

What is SDMX?

• The problem space:
– Statistical collection, processing, and exchange are time-consuming and resource-intensive
– Various international and national organisations have individual approaches for their constituencies
– Uncertainties about how to proceed with new technologies (XML, web services …)

Page 3

[Diagram: flows of transactions, accounts, and statistics between banks, corporates, and individual households; National Statistical Organisations (180+ countries); and regional and international organisations. Users find the data through Internet search and navigation across many separate sites (www.x.org, www.y.org, www.z.org, www.hub.org).]

Page 4

What is SDMX?

The Statistical Data and Metadata Exchange (SDMX) initiative is taking steps to address the challenges and opportunities just described:
– By focusing on business practices in the field of statistical information
– By identifying more efficient processes for exchange and sharing of data and metadata using modern technology

Page 5

Historical Note

• SDMX uses an approach based on the 10-year-long success of an earlier standard, GESMES/TS

• GESMES/TS is a standard used today in many countries for collecting, exchanging, and updating statistical databases
– GESMES/TS is now SDMX-EDI

• It focuses on time series and is mostly used by central banks

Page 6

Who is SDMX?

• SDMX is an initiative made up of seven international organizations:
– Bank for International Settlements
– European Central Bank
– Eurostat
– International Monetary Fund
– Organisation for Economic Co-operation and Development
– United Nations
– World Bank

• The initiative was launched in 2002

Page 7

SDMX Products

• Technical standards for the formatting and exchange of aggregate statistics:
– SDMX Technical Specifications version 1.0 (now ISO/TS 17369 SDMX)
– SDMX Technical Specifications version 2.0 (submitted to ISO)
– SDMX Technical Specifications version 2.1 under review (will be forwarded to ISO)

• Content-Oriented Guidelines:
– Metadata Common Vocabulary
– Cross-Domain Statistical Concepts
– Statistical Subject-Matter Domains

Page 8

Detailed SDMX Goals

• Reducing the national reporting burden to international institutions

• Fostering consistency, accuracy, and timeliness between data and metadata disseminated by national and international institutions, relying on what is released in a decentralised way via national websites

• Enhancing national statistical processing efficiency, especially through internationally recognised standard formats for exchanges between statistical silos within institutions and with other national statistical agencies

• Providing standards for web-based dissemination formats that are computer-readable and facilitate updating of databases

• Enhancing the comparability of data and metadata analysis through standard formats and content-oriented guidelines

Page 9

Official Recommendations

• SDMX has been officially recommended:
– February 2007: SDMX endorsed by the European Union’s Statistical Programme Committee
– March 2008: UN Statistical Commission declares SDMX to be the preferred standard for the exchange and sharing of data and metadata

Page 10

Exchange Patterns

• Bilateral: Institutions exchange data according to bilateral agreements regarding format, timing, protocols, etc.

• Gateway: Institutions share the data they collect with their peers, in agreed formats among counterparty communities

• Data-sharing: standard exchange of data using standard formats and protocols

Page 11

Bilateral Exchange

Page 12

Gateway Exchange

Page 13

Data-Sharing Exchange

Page 14

Notes About Data-Sharing

• Data-sharing works only if there are standard formats

• Data-sharing works only if the data themselves are decentralized
– One big database doesn’t work!

• Like the Web itself, a data-sharing model relies on pull exchanges, not push exchanges
– Data consumers discover the data they need and its location, then go and get it
– Data producers don’t have to send data
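The pull pattern can be made concrete with a minimal Python sketch. Everything in it is hypothetical: the `Registry` class, the provider URLs, the `LFS` flow identifier, and the numeric values are invented for illustration. The HTTP request is replaced by a dictionary lookup so the example runs offline. It is not the SDMX registry interface.

```python
"""Minimal sketch of pull-based data sharing (all names and values hypothetical)."""
from dataclasses import dataclass, field


@dataclass
class Registry:
    # Maps a dataflow id to the places where providers publish it.
    # The registry records locations only; it never stores the data.
    locations: dict[str, list[str]] = field(default_factory=dict)

    def register(self, flow_id: str, url: str) -> None:
        self.locations.setdefault(flow_id, []).append(url)

    def discover(self, flow_id: str) -> list[str]:
        return list(self.locations.get(flow_id, []))


# Stand-in for each provider's own web server: the data stays decentralized.
PROVIDER_SITES = {
    "https://nso-a.example/data/LFS": [("A", "2007-Q1", 101.5)],  # dummy values
    "https://nso-b.example/data/LFS": [("B", "2007-Q1", 98.2)],   # dummy values
}


def fetch(url: str) -> list[tuple[str, str, float]]:
    # A real consumer would issue an HTTP GET here.
    return PROVIDER_SITES[url]


def pull(registry: Registry, flow_id: str) -> list[tuple[str, str, float]]:
    # The consumer drives the exchange: discover locations, then go and get the data.
    rows = []
    for url in registry.discover(flow_id):
        rows.extend(fetch(url))
    return rows


registry = Registry()
for site in PROVIDER_SITES:
    registry.register("LFS", site)

print(pull(registry, "LFS"))
```

Producers only announce where their data lives. Nothing is sent until a consumer asks for it.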

Page 15

SDMX View

• SDMX products support all types of exchange

• One major requirement is to work well with existing systems, to protect technology investments

• SDMX promotes an incremental movement toward the data-sharing model

Page 16

Exchange with Peer Organizations

• SDMX-EDI and SDMX-ML are both able to exchange databases between peer organizations

• Structural metadata is also exchanged and can be read by counterparty systems

• Incremental updating is possible

• Increases the degree of automation for exchange, reducing reliance on bilateral, verbal agreements

• Can use “pull” instead of “push” if a registry is deployed

Page 17

Integration within an Organization

• SDMX standard formats are also useful within an organization
– Many organizations have several disparate databases
– Differences in database structure and content can make it difficult to use other systems’ data
– SDMX-ML provides a way to loosely couple such databases, while facilitating exchange
– An SDMX registry can allow visibility into other databases, without affecting control or ownership of data

Page 18

Data Collection and Warehousing

• When data is collected from many different sources, it can be in a wide variety of formats
– Typically metadata-poor

• SDMX allows for a single, metadata-rich reporting format for each type of data

• Existing counterparty systems can be “wrapped” to support SDMX for exchange only

Page 19

Adoption of SDMX

• SDMX has been adopted more aggressively than other international technology standards
– Many important data sets are available in SDMX-ML today
– There are many prototypes and planned projects at the national and international level
– Increasing numbers of tools are available which support SDMX

Page 20

Adopters/Interest

• The following are known adopters (or are planning to adopt):
– US Federal Reserve Board and Federal Reserve Bank of New York
– European Central Bank
– Joint External Debt Hub (WB, IMF, OECD, BIS)
– UN/TRADECOM at UN Statistical Division
– NAAWE (National Accounts from OECD/Eurostat)
– European Statistical System (Eurostat and National Statistical Institutes)
– Mexican Federal System
– Vietnamese Ministry of Planning and Investment
– Qatar Information Exchange
– IMF (BOP, SNA, SDDS/GDDS)
– Food and Agriculture Organization
– Millennium Development Goals (UN System, others)
– International Labor Organization
– Bank for International Settlements
– OECD
– World Bank World Development Indicators (WDI)
– Marchioness Islands (Spanish/Portuguese Statistical Region)
– UNESCO (Education)
– Australian Bureau of Statistics
– WHO (SDMX-HD)
– Statistics Canada
– There are many others!

Page 21

SDMX and Domains

• SDMX is organized as a central standard, created and supported by the SDMX Initiative
– Each statistical domain creates its own domain standard
– Example: WHO has created SDMX-HD (“Health Domain”) for monitoring disease outbreaks/epidemiology
– Example: UNESCO and Eurostat have developed standard SDMX applications for Education Statistics

• You should look at the work in the different domains when applying SDMX to different national-level statistics collections

Page 22

US Federal Reserve Board

• Several important data sets are available, and searchable at a granular level, using SDMX

• SDMX-ML is both a web-delivery format and an internal exchange format for production of data

http://www.federalreserve.gov/datadownload/default.htm

Page 23

Federal Reserve Bank of New York

• Historical data, once stored in huge CSV files, is now available as SDMX-ML

• This has increased use of the site

• The “typical user” is now a machine

http://www.newyorkfed.org/xml/index.html

Page 24

European Central Bank

• ECB uses SDMX-EDI to exchange data with other European central banks

• SDMX-ML is used for web dissemination
– Simultaneous release on many central bank sites
– Each site can use its own language and look & feel
– Data warehouse now available in SDMX-ML

• Built and maintained using SDMX standards:
http://www.ecb.int/stats/exchange/eurofxref/html/index.en.html
http://stats.ecb.europa.eu/stats/sdmx/visualisation/icp/dashboard/rc1/

• ECB’s Statistical Data Warehouse/web service

Page 25

OECD

• Data structures are specified using SDMX standards

• Data sets are held in SDMX-ML format and navigated “on the fly” (OECD.Stat)

• http://stats.oecd.org/WBOS/index.aspx

• Experimenting with graphical presentation of data

• Serves all OECD data as SDMX through the OECD.Stat web service

Page 26

Eurostat

• Builds on long experience of using GESMES for data transmission (GESMES is main format for transmission of data in several important domains e.g. national accounts, balance of payments, short-term statistics)

• More than 50 Data Structure Definitions for GESMES developed and maintained (in partnership with ECB)

• Software components developed and made available as open-source software (see Tools page of SDMX website)

• Now creating a portal for all European Census data, collected as SDMX

Page 27

SDMX Specifications and Products

Page 28

SDMX Information Model: High-level Schematic

[Schematic. Relationships recoverable from its labels:
– A Data Provider can provide data/metadata for many data/metadata flows using the agreed data/metadata structure; provider and flow are linked through a Provision Agreement
– A Data or Metadata Flow can get data/metadata from multiple data/metadata providers
– Providers publish/report Data or Metadata Sets, which conform to the business rules of the flow and use a specific Data or Metadata Structure Definition
– A Registered Data or Metadata Set is registered for a provision agreement; the registry registers the existence of data and metadata
– A Category Scheme comprises subject or reporting categories; a Category can have child categories, and flows can be linked to categories in multiple category schemes]
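A few of these relationships can be sketched as Python data classes. This is an illustration under simplifying assumptions, not the Information Model itself: the class names echo the schematic, but the fields, the `DEBT`/`EXT_DEBT` identifiers, the provider ids, and the URL are hypothetical. Categories and metadata flows are omitted.

```python
"""Hypothetical sketch of a few SDMX Information Model relationships."""
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStructureDefinition:
    id: str
    dimensions: tuple[str, ...]


@dataclass(frozen=True)
class Dataflow:
    id: str
    structure: DataStructureDefinition  # a flow uses one specific structure


@dataclass(frozen=True)
class DataProvider:
    id: str


@dataclass(frozen=True)
class ProvisionAgreement:
    provider: DataProvider
    flow: Dataflow


class Registry:
    """Records the existence of data sets; the data itself stays with providers."""

    def __init__(self, agreements):
        self.agreements = set(agreements)
        self.registrations = []  # (agreement, location) pairs

    def register_data_set(self, provider, flow, location):
        agreement = ProvisionAgreement(provider, flow)
        # A provider may only register data for flows it has agreed to supply.
        if agreement not in self.agreements:
            raise ValueError(f"{provider.id} has no provision agreement for {flow.id}")
        self.registrations.append((agreement, location))

    def providers_of(self, flow):
        return sorted({a.provider.id for a, _ in self.registrations if a.flow == flow})


dsd = DataStructureDefinition("DEBT", ("FREQ", "COUNTRY", "TOPIC", "STOCK_FLOW", "TIME"))
flow = Dataflow("EXT_DEBT", dsd)
ar, mx = DataProvider("AR_CB"), DataProvider("MX_CB")
reg = Registry([ProvisionAgreement(ar, flow), ProvisionAgreement(mx, flow)])

reg.register_data_set(ar, flow, "https://ar.example/debt.xml")
print(reg.providers_of(flow))  # ['AR_CB']

try:
    reg.register_data_set(DataProvider("ZZ_CB"), flow, "https://zz.example/debt.xml")
except ValueError as err:
    print(err)  # ZZ_CB has no provision agreement for EXT_DEBT
```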

Page 29

SDMX Technical Specs v 1.0

• Information Model (data structure definitions and data formats)

• SDMX-ML: XML formats for data structure definitions and data

• SDMX-EDI: EDI formats for data structure definitions and data

• Web-Services Guidelines

• User Guide

Page 30

Technical Notes on Version 1.0

• Only numeric observations were supported

• Only coded key values were supported

• Intended to provide an XML version of the existing GESMES/TS data model
– GESMES/TS became SDMX-EDI
– XML extended the data model to provide for more types of groups and cross-sectional data

• Hierarchical codelists not supported

Page 31

SDMX Technical Spec v. 2.0

• Expanded data model includes:
– Registry interfaces
– Metadata structures and formats
– Data and metadata provisioning
– Other advanced features (process flow, reporting taxonomy, structure mapping, etc.)

• Data formats now include uncoded dimensions, hierarchical codelists, and non-numeric observations

Page 32

Technical Notes on Version 2.0

• A very large expansion of scope
– Model covers the process of statistical exchange, not just the data formats
– Many cases which version 1.0 could not support were included in version 2.0 as a result of implementations

• Full support for the “data sharing” pattern of exchange
– Resulting from the inclusion of the registry

Page 33

Changes for Version 2.1

• Expanded Web Services Guidelines
– Standard WSDL Functions
– Standard RESTful syntax (URL-based API)
– Standard Error Codes
– Will allow for interoperable web services for SDMX, so generic clients can use multiple sources

• Simplified Data Formats
– All data formats will be more consistent
– Cross-sectional and time-series formats are more similar

• SDMX Query has been improved

• Note: SDMX 2.1 is available for public review now!
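The URL-based API can be illustrated with a small helper that builds a data query. This sketch assumes the key conventions of the SDMX 2.1 RESTful specification as later published: a `data/{flow}/{key}` path, dimension values separated by `.`, alternatives joined with `+`, an empty position as a wildcard, and time given through `startPeriod`. The entry point, the `EXT_DEBT` flow id, and the dimension order are hypothetical.

```python
"""Sketch of building an SDMX RESTful data query URL (entry point is hypothetical)."""
from urllib.parse import urlencode


def series_key(values):
    """Join per-dimension selections into a key.

    Each element is None (wildcard: any value), a single code,
    or a list of codes (alternatives, joined with '+').
    """
    parts = []
    for v in values:
        if v is None:
            parts.append("")
        elif isinstance(v, str):
            parts.append(v)
        else:
            parts.append("+".join(v))
    return ".".join(parts)


def data_url(entry_point, flow_ref, key_values, **params):
    url = f"{entry_point.rstrip('/')}/data/{flow_ref}/{series_key(key_values)}"
    if params:
        url += "?" + urlencode(params)
    return url


url = data_url(
    "https://sdmx.example.org/rest",  # hypothetical web-service entry point
    "EXT_DEBT",                       # hypothetical dataflow id
    ["Q", ["AR", "MX"], None, "1"],   # FREQ, COUNTRY, TOPIC (any), STOCK_FLOW
    startPeriod="1999-Q1",
)
print(url)
# https://sdmx.example.org/rest/data/EXT_DEBT/Q.AR+MX..1?startPeriod=1999-Q1
```

Standardizing the URL syntax is what lets one generic client query many sources without source-specific code.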

Page 34

SDMX Content-Oriented Guidelines

• Four documents:
– Overview
– Metadata Common Vocabulary
– Cross-Domain Concepts
– Statistical Subject-Matter Domains

• These will not become ISO specifications, but will evolve as publications of the SDMX Initiative

Page 35

Metadata Common Vocabulary

• A set of terms and definitions for the different parts of the SDMX technical standards, and many common concepts used in data and metadata structures

• Does not replace other major vocabularies in this space (such as the OECD glossary) but references these other works

Page 36

Cross-Domain Concepts

• Includes concepts which are common across many statistical domains
– Names & Definitions
– Representations

• These are concepts which support both data and metadata structures

Page 37

Statistical Subject-Matter Domains

• Based on the UN/ECE classification of statistical activities

• Provides a classification system for use in exchanging statistics across domain boundaries

• Provides a breakdown of the various domains within official statistics

Page 38

SDMX and Data Formats

Page 39

Data Set

Page 40

We have a data set: what do we need to know?

• Version 1.0
– What it is and how it is structured

• Version 2.0
– Who reports/disseminates it
– How a specific data set fits into the overall collection framework, and which organisation is responsible for reporting which parts
– The reporting/publication schedule
– That it has been reported/published

Page 41

Data Set: Structure

Page 42

First: Identify the Concepts

• A concept is a unit of knowledge created by a unique combination of characteristics (SDMX Information Model)

Page 43

Data Set Structure: Concepts

Computers need the structure of the data:

• Concepts

• Code lists

• Data values

• How these fit together

Concepts in the example: Unit Multiplier, Unit, Topic, Time/Frequency, Country, Stock/Flow

Page 44

Data Set Structure: Code Lists

Concepts: Topic, Country, Stock/Flow

Code Lists

TOPIC
A Brady Bonds
B Bank Loans
C Debt Securities

COUNTRY
AR Argentina
MX Mexico
ZA South Africa

STOCK/FLOW
1 Stock
2 Flow

Page 45

16457

Q,ZA,B,1,1999-06-30=16547

Data Makes Sense

Page 46

Data Set Structure: Defining Multi-dimensional Structures

• Comprises:
– Concepts that identify the observation value (Dimensions)
– Concepts that add additional metadata about the observation value (Attributes)
– Concept that is the observation value (Measure)
– Any of these may be coded, text, date/time, number, etc. (Representation)

Page 47

Data Set Structure: Concept Usage

Dimensions: Frequency, Time, Country, Topic, Stock/Flow

Attributes: Unit, Unit Multiplier

Measure: Observation

Page 48

[Schematic of a Data Structure Definition. Recoverable labels:
– Dimensions: concepts that identify the observation; they make up the Key, and Group Keys are concepts that identify groups of keys
– Attributes: concepts that add metadata
– Measures: concepts that are the observed phenomenon
– Each component takes its semantics from a Concept (e.g. Topic, Country, Stock/Flow) and has a format (Representation). The format is either coded, with a code list (e.g. TOPIC: A Brady Bonds, B Bank Loans, C Debt Securities), or non-coded]
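To show how a DSD lets software check incoming data, here is a hypothetical in-memory version of the debt example. The upper-case component ids (`FREQ`, `STOCK_FLOW`, `OBS_VALUE`, and so on) are assumptions, as is leaving `FREQ` and `TIME` unconstrained, because the slides show no code lists for them. The code lists for the other dimensions come from the example above.

```python
"""Hypothetical in-memory DSD for the debt example, with key validation."""
from dataclasses import dataclass, field


@dataclass
class DSD:
    dimensions: list[str]                 # order defines the series key
    attributes: list[str]
    measure: str
    codelists: dict[str, dict[str, str]] = field(default_factory=dict)

    def validate(self, key: dict[str, str]) -> list[str]:
        """Return the problems found in a key (an empty list means valid)."""
        problems = []
        for dim in self.dimensions:
            if dim not in key:
                problems.append(f"missing dimension {dim}")
            elif dim in self.codelists and key[dim] not in self.codelists[dim]:
                problems.append(f"{key[dim]!r} is not a valid {dim} code")
        return problems


debt_dsd = DSD(
    dimensions=["FREQ", "COUNTRY", "TOPIC", "STOCK_FLOW", "TIME"],
    attributes=["UNIT", "UNIT_MULTIPLIER"],
    measure="OBS_VALUE",
    codelists={
        "COUNTRY": {"AR": "Argentina", "MX": "Mexico", "ZA": "South Africa"},
        "TOPIC": {"A": "Brady Bonds", "B": "Bank Loans", "C": "Debt Securities"},
        "STOCK_FLOW": {"1": "Stock", "2": "Flow"},
    },
)

print(debt_dsd.validate({"FREQ": "Q", "COUNTRY": "ZA", "TOPIC": "B",
                         "STOCK_FLOW": "1", "TIME": "1999-06-30"}))  # []
print(debt_dsd.validate({"FREQ": "Q", "COUNTRY": "BR", "TOPIC": "B",
                         "STOCK_FLOW": "1"}))
# ["'BR' is not a valid COUNTRY code", 'missing dimension TIME']
```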

Page 49

16457

Q,ZA,B,1,1999-06-30=16547

Data Makes Sense

Frequency, Country, Topic, Stock/Flow, Time = Observation

Quarterly, South Africa, Bank Loans, Stocks, 2nd quarter 1999
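The decoding shown above can be written as a short Python sketch. It assumes the key order given on the slide (Frequency, Country, Topic, Stock/Flow, Time) and reuses the slides' code lists. The `FREQ` list is an assumption, because only `Q` appears. The quarter arithmetic applies to quarterly data only.

```python
"""Decode the example observation from the slides into readable labels."""
from datetime import date

# Code lists from the slides; the FREQ list is an assumption (only "Q" appears).
FREQ = {"Q": "Quarterly"}
COUNTRY = {"AR": "Argentina", "MX": "Mexico", "ZA": "South Africa"}
TOPIC = {"A": "Brady Bonds", "B": "Bank Loans", "C": "Debt Securities"}
STOCK_FLOW = {"1": "Stock", "2": "Flow"}
ORDINAL = {1: "1st", 2: "2nd", 3: "3rd", 4: "4th"}


def decode(record: str) -> tuple[list[str], float]:
    key, value = record.split("=")
    freq, country, topic, stock_flow, period = key.split(",")
    end = date.fromisoformat(period)
    quarter = (end.month - 1) // 3 + 1  # June falls in the 2nd quarter
    labels = [
        FREQ[freq],
        COUNTRY[country],
        TOPIC[topic],
        STOCK_FLOW[stock_flow],
        f"{ORDINAL[quarter]} quarter {end.year}",
    ]
    return labels, float(value)


labels, obs = decode("Q,ZA,B,1,1999-06-30=16547")
print(", ".join(labels), "=", obs)
# Quarterly, South Africa, Bank Loans, Stock, 2nd quarter 1999 = 16547.0
```

Without the structure, the string is just a row of codes. With the DSD's key order and code lists, every position has a meaning.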

Page 50

Identifying Concepts

• Identifying Concepts – Sources
– Existing data set tables
   • From website
   • From applications
– Data Collection Instruments
   • Questionnaires
   • Excel spreadsheets
– Regulations, Handbooks, User Guides
   • Labour Statistics Convention, 1985 (No. 160), Recommendation, 1985 (No. 170)
   • Council Regulation No: 311/76/EEC of 09/02/1976; OJ: L039 of 14/02/1976; Compilation of statistics on foreign workers
– Database Tables
– Existing Data Structure Definitions
   • From other organisations

Page 51

Identify Concepts – from website

Source: FAO proof of concept project

Measurement = 1,000 Kg

Page 52

Concepts

Reference Region

Commodity

Frequency and Time

Observation Value

Measure Type

Unit and Unit Multiplier

Measurement = 1,000 Kg

Page 53

Concept Role: Reminder

• Dimensions
– Are the concepts that identify the observation value

• Attributes
– Are the concepts that add additional metadata about the observation value

• Measure
– Is the concept that is the observation value

Page 54

Exercise: Concept Role

Reference Region (Dimension)

Commodity (Dimension)

Frequency and Time (Dimensions)

Observation Value (Measure)

Measure Type (Dimension)

Unit and Unit Multiplier (Attributes)

Measurement = 1,000 Kg

Page 55

Data Set and Structure

Dimension Concepts: FREQ, REF_AREA_REG, COMMODITY, MEASURE_TYPE, TIME

Measure Concept: OBS_VALUE

Attribute Concepts: OBS_STATUS, OBS_CONF, UNIT, UNIT_MULTIPLIER

Page 56

Identify/Define Code Lists

• Purpose of a Code List
– Constrains the value domain of concepts when used in a structure such as a data structure definition
– Defines a shortened, language-independent representation of the values
– Gives semantic meaning to the values, possibly in multiple languages

• Agreeing on harmonised code lists is the most difficult aspect of defining a data structure definition
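The three purposes of a code list can be illustrated together: it constrains values, it keeps codes language-independent, and it attaches names in several languages. The sketch below is hypothetical. The `CL_TOPIC` id, the French names, and the English fallback rule are illustrative assumptions, not part of any SDMX artefact.

```python
"""Hypothetical code list: language-independent codes with multilingual names."""
from dataclasses import dataclass


@dataclass
class CodeList:
    id: str
    names: dict[str, dict[str, str]]  # code -> {language: name}

    def is_valid(self, code: str) -> bool:
        # The code list constrains the value domain of the concept.
        return code in self.names

    def label(self, code: str, lang: str, fallback: str = "en") -> str:
        # The code stays the same in every language; only the name changes.
        translations = self.names[code]
        return translations.get(lang, translations[fallback])


topic = CodeList("CL_TOPIC", {
    "A": {"en": "Brady Bonds", "fr": "Obligations Brady"},
    "B": {"en": "Bank Loans", "fr": "Prêts bancaires"},
    "C": {"en": "Debt Securities"},
})

print(topic.label("B", "fr"))  # Prêts bancaires
print(topic.label("C", "fr"))  # Debt Securities (no French name, falls back)
print(topic.is_valid("D"))     # False
```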

Page 57

Data Structure Definition - Reminder

[Schematic repeated from page 48. Dimensions (concepts that identify the observation, forming the Key and Group Keys), Attributes (concepts that add metadata), and Measures (concepts that are the observed phenomenon) each take their semantics from a Concept. Each also has a Representation, which is either coded (with a code list) or non-coded.]

Page 58

SDMX and Data Formats

Session: SDMX Syntax Implementations for Data

Page 59

SDMX Data Syntax Implementations

• SDMX provides for two main syntaxes:
– UN/EDIFACT (for SDMX-EDI)
– XML (for SDMX-ML)

• Each syntax provides a format for describing data structure definitions

• Each syntax provides at least one format for data
– There are 4 different XML syntaxes for data

Page 60

SDMX-EDI

• EDI (“electronic data interchange”) is an older, flat-file syntax used primarily to conduct e-commerce
– A few statistical messages have been defined
– GESMES is the “generic statistical message”

• EDI messages are difficult to read unless you know EDI very well…

Page 61

Page 62

Benefits of SDMX-EDI

• As a data format, it is very compact
– Good for very large data sets

• Permits incremental updating of data sets

• Permits attributes and observations to be sent separately

• Has a very large installed base within the European community and the central banks (used by 180 countries)

• It is not very Web-oriented, however

Page 63

SDMX-ML Document Types (Data)

• Structure Message: holds the agencies, concepts, codelists, and data structure definitions (DSDs)

• Generic Format: a single XML schema for all different types of data, regardless of data structure definition

• Utility Format: specific to a DSD; provides the strongest validation

• Compact Format: like the EDI message, compact, but with less validation than Utility

• Cross-Sectional Format: similar to Compact, but holds cross-sectional data

• Data Query Message: allows for querying of online databases and similar applications which are SDMX-aware; supports web services

Page 64

The SDMX-ML Data Formats

• In designing the XML formats for SDMX, several different needs were identified:
– An XML format for describing data structure definitions
– An XML version of the EDIFACT messages for transmitting large databases
– An XML format which would help validate statistical data sets
– An XML format which could be used generically for any statistical data set
– An XML format for transmitting cross-sectional data
– A message to query for data

• Because SDMX-ML is based on the SDMX Information Model, it was decided to create several equivalent XML data formats to satisfy each of these cases
– Requirements were mutually exclusive for these cases

Page 65

Generic Data Message

• No validation against the DSD

• Carries data for any data structure definition

• Verbose: files are very large

• Can perform incremental updates and carry partial data sets

• Useful for applications which need to carry potentially incorrect data for processing and cleaning

• Useful for generic applications which handle data for more than one DSD

• Serves as a “pivot format” between other SDMX-ML format types

Page 66

Utility Data Message

• Provides the strongest validation: all business rules in the DSD are enforced by a generic XML parser (schemas are specific to particular DSDs)

• Less verbose than Generic; more verbose than Compact and Cross-Sectional

• Incremental updates not supported

• For XML tools, this is the most “normal” type of XML schema, and it performs best

Page 67

Compact Data Message

• Equivalent of the SDMX-EDI data format, but schemas are specific to a particular DSD

• Good for exchanging partial data sets and incremental updates

• Very compact (for XML) in terms of file size

• Very simple, but performs limited validation
– Will validate codelists, but not all of the other business rules in the DSD

Page 68

Cross-Sectional Data Message

• Similar to Compact format, but allows for many observations for a single point in time (not time-series oriented like the other formats)

• Very compact

• Supports incremental updates

• Provides limited validation; schemas are specific to a particular DSD

Page 69

Selecting the Right SDMX-ML Format

• Free tools allow transformation between data formats without any loss, so each application can use one or more formats for specific tasks

• Depending on the application, one format may be preferable to another:
– How large are the data files?
– How much validation needs to be performed?
– How many DSDs are supported by the application?
– Will all data be correct (according to the DSD) when received?
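A lossless transformation between a generic and a DSD-specific layout can be sketched with the standard library. This is a deliberately simplified illustration. The element and attribute names loosely follow SDMX-ML 2.0 Generic and Compact messages, but namespaces, headers, and data set wrappers are omitted, so the output is not schema-valid SDMX-ML. The component ids are the same hypothetical ones used for the debt example.

```python
"""Simplified sketch contrasting 'generic' and 'compact' series layouts.

Names loosely follow SDMX-ML 2.0, but namespaces and message headers are
omitted, so the output is NOT schema-valid SDMX-ML.
"""
import xml.etree.ElementTree as ET


def to_generic(key, observations):
    # One fixed vocabulary for every DSD: each key value is its own element.
    series = ET.Element("Series")
    series_key = ET.SubElement(series, "SeriesKey")
    for concept, value in key.items():
        ET.SubElement(series_key, "Value", concept=concept, value=value)
    for time, value in observations:
        obs = ET.SubElement(series, "Obs")
        ET.SubElement(obs, "Time").text = time
        ET.SubElement(obs, "ObsValue", value=value)
    return series


def to_compact(key, observations):
    # Key values become attributes named after the DSD's concepts, which is
    # why a compact schema has to be generated for each DSD.
    series = ET.Element("Series", key)
    for time, value in observations:
        ET.SubElement(series, "Obs", TIME_PERIOD=time, OBS_VALUE=value)
    return series


def compact_to_generic(series):
    # Nothing is lost: the key and every observation can be recovered.
    key = dict(series.attrib)
    observations = [(o.get("TIME_PERIOD"), o.get("OBS_VALUE")) for o in series.iter("Obs")]
    return to_generic(key, observations)


key = {"FREQ": "Q", "COUNTRY": "ZA", "TOPIC": "B", "STOCK_FLOW": "1"}
observations = [("1999-06-30", "16547")]

generic = ET.tostring(to_generic(key, observations), encoding="unicode")
compact = ET.tostring(to_compact(key, observations), encoding="unicode")
round_trip = ET.tostring(compact_to_generic(to_compact(key, observations)), encoding="unicode")

print(len(generic), len(compact))  # the generic layout is noticeably longer
print(round_trip == generic)       # True
```

The generic layout trades file size for a single schema that fits any DSD. The compact layout is smaller but needs a schema generated for its DSD.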

Page 70

SDMX-ML “Model-Driven” XML Approach

DSD

Page 71

Additional SDMX Features

• Hierarchical Code List

• Structure Set (mappings)

• Reporting Taxonomy

Page 72

Hierarchical Code Lists – Example Scenario

• France is a country

• France is part of the continent of Europe

• France is a member of NATO

• France is a member of the EU

• France is a member of the G10

• When I analyse statistics I might want to see totals by:
– continent
– trading bloc
– military alliance
– financial grouping

• France will be grouped with different sets of countries depending on the “view” required

• How do we express these groupings?

Page 73

Reference Area code list
6B NATO
B0 EU
B1 NAFTA
BE Belgium
BG Bulgaria
CA Canada
CH Switzerland
CZ Czech Republic
DE Germany
DK Denmark
E1 Europe
E8 North America
EE Estonia
ES Spain
FI Finland
FR France
GB United Kingdom
GR Greece
HU Hungary
JP Japan
I2 Euro 12
IT Italy
NE Netherlands
US United States

Hierarchies over these codes, as Code/Parent pairs:

Europe: BE E1, BG E1, CH E1, CZ E1, DE E1, DK E1, EE E1, ES E1, FI E1, FR E1, GB E1, etc.

EU countries: BE E0, CZ E0, DE E0, DK E0, EE E0, ES E0, FI E0, FR E0, GB E0, etc.

NATO countries: BE 6B, BG 6B, CA 6B, CZ 6B, DE 6B, DK 6B, EE 6B, ES 6B, FR 6B, GB 6B, etc.

NAFTA countries: CA B1, US B1, MX B1

North America: CA B1, US B1

G10 countries: BE G0, CA G0, CH G0, DE G0, FR G0, GB G0, JP G0, IT G0, NL G0, SE G0, US G0

[Diagram: the Code List and its Codes are linked through Code Associations and Code Compositions to four hierarchies (Hierarchy-1 to Hierarchy-4).]

Page 74

[Schematic. Recoverable labels:
– A Hierarchical Code Scheme comprises Hierarchies and Code Compositions (code groups)
– A Code Association relates a code to a parent code and can carry Properties of the association
– A Code Composition groups codes with the same parent
– A value-based hierarchy has code groups; a level-based hierarchy has formal Levels
– The codes may be in a variety of code lists]

Schematic of the Hierarchical Code Scheme
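The point of several hierarchies over one code list is that the same country rolls up into different totals depending on the view. The sketch below is a hypothetical single-level illustration. Each hierarchy is reduced to a code-to-parent mapping, using a subset of the Code/Parent pairs from the Europe, NATO, and G10 examples. The values are dummy numbers.

```python
"""Hypothetical sketch: one code list, several hierarchies, totals per view."""
from collections import defaultdict

# Code -> parent mappings, one per view (subsets of the slides' pairs).
HIERARCHIES = {
    "continent": {"BE": "E1", "DE": "E1", "FR": "E1"},
    "military_alliance": {"BE": "6B", "DE": "6B", "FR": "6B", "CA": "6B"},
    "financial_grouping": {"BE": "G0", "DE": "G0", "FR": "G0", "CA": "G0", "JP": "G0"},
}


def totals(values, hierarchy):
    """Aggregate country values up to their parent in one hierarchy."""
    out = defaultdict(int)
    for code, value in values.items():
        parent = hierarchy.get(code)
        if parent is not None:  # codes outside this view are skipped
            out[parent] += value
    return dict(out)


values = {"BE": 1, "DE": 2, "FR": 3, "CA": 4, "JP": 5}  # dummy numbers
for view, hierarchy in HIERARCHIES.items():
    print(view, totals(values, hierarchy))
# continent {'E1': 6}
# military_alliance {'6B': 10}
# financial_grouping {'G0': 15}
```

FR is counted in all three totals without being duplicated in the code list. Only the hierarchy changes.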

Page 75

Item Scheme Maps

• Many types of “item scheme” use the same fundamental structure
– Code list
– Category scheme
– Concept scheme

• Two item schemes can be mapped

Page 76

[Schematic. Recoverable labels: an Item Scheme Association links a source item scheme to a target item scheme and has Item Associations. Each Item Association links a source item to a target item with an Association Role. The same pattern is specialised as a Code List Map (Codes), a Category Scheme Map (Categories), and a Concept Scheme Map (Concepts).]

Schematic of the “Code” Mapping
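Applying a code list map to a series key can be sketched as follows. Both code lists here are illustrative assumptions: the source uses the two-letter country codes from the slides, and the target is an imaginary internal list that uses ISO 3166-1 alpha-3 codes. The helper `map_key` is hypothetical, not an SDMX function.

```python
"""Hypothetical code list map between two country code lists."""
# Source codes from the slides -> an imaginary target list using ISO alpha-3 codes.
COUNTRY_MAP = {"AR": "ARG", "MX": "MEX", "ZA": "ZAF"}


def map_key(key, code_maps):
    """Translate a series key dimension by dimension using item scheme maps.

    Dimensions without a map pass through unchanged; an unmapped code
    raises KeyError so gaps in the mapping are not silently hidden.
    """
    return {dim: code_maps[dim][code] if dim in code_maps else code
            for dim, code in key.items()}


source_key = {"FREQ": "Q", "COUNTRY": "ZA", "TOPIC": "B"}
print(map_key(source_key, {"COUNTRY": COUNTRY_MAP}))
# {'FREQ': 'Q', 'COUNTRY': 'ZAF', 'TOPIC': 'B'}
```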

Page 77

Structure Maps

• Structures can also be mapped
– Data structures
– Metadata structures

Page 78

[Schematic: the high-level SDMX Information Model (Data Provider, Provision Agreement, Data or Metadata Flow, Data or Metadata Set, Data or Metadata Structure Definition, Registered Data Set or Metadata Set, Category Scheme and Category), extended with Structure & Item Scheme Maps, Content Constraints, and Attachment Constraints to support data/metadata reporting, query, analysis, and mapping.]

Page 79

Reporting Taxonomy

• An SDMX Reporting Taxonomy is a group of data flows and/or metadata flows which form the basis of a single real-world document or report

• They can be organized into groups and sub-groups as needed

• They can be named and identified

• Useful for managing various types of reports over time
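A reporting taxonomy is essentially a named tree whose leaves are flows. The minimal Python sketch below makes that concrete. The `Taxonomy` class and all identifiers (`ANNUAL_REPORT`, `NA_MAIN`, and so on) are hypothetical. Metadata flows are treated the same way as data flows for simplicity.

```python
"""Hypothetical reporting taxonomy: a named tree of flows for one report."""
from dataclasses import dataclass, field


@dataclass
class Taxonomy:
    id: str
    flows: list[str] = field(default_factory=list)  # dataflow ids
    children: list["Taxonomy"] = field(default_factory=list)

    def all_flows(self) -> list[str]:
        """Every flow needed to assemble the full report, depth-first."""
        result = list(self.flows)
        for child in self.children:
            result.extend(child.all_flows())
        return result


report = Taxonomy("ANNUAL_REPORT", children=[
    Taxonomy("PART_A", flows=["NA_MAIN", "NA_SECTOR"]),
    Taxonomy("PART_B", flows=["BOP"], children=[Taxonomy("ANNEX", flows=["DEBT"])]),
])
print(report.all_flows())  # ['NA_MAIN', 'NA_SECTOR', 'BOP', 'DEBT']
```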

Page 80

Processes

• SDMX 2.0 provides the ability to document the steps and logic of a process flow

• This is not executable, but serves as documentation to describe the processes which produce data and metadata

• It is useful as a target for the attachment of reference metadata describing processing

Page 81

SDMX and Metadata Formats

Page 82

Reference Metadata

• We have seen that data values belong in only one place:
  – the series key (usually qualified by time)

• Data attribute values are limited in where they belong:
  – Observation value
  – Series key
  – Group key
  – Data set

• Metadata is everywhere, but:
  – it must be metadata about “something”
    • what is the “something”?
    • how is it identified?
  – it comprises concepts, and how they are structured

• The Metadata Structure Definition answers these questions

• The advance release calendar is only one possible example
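To make the attachment levels concrete, here is a sketch built on a hypothetical in-memory layout. This is not an SDMX-ML format, and the attribute names and values are illustrative. Each data attribute is attached at exactly one level, so the attributes that apply to a single observation can be gathered by merging the data-set, group, series and observation levels:

```python
def attributes_for_observation(data_set: dict, series_key: str, period: str) -> dict:
    """Gather every attribute value that applies to one observation.

    Each attribute is expected at exactly one attachment level, so a value
    appearing at two levels is treated as an error.
    """
    series = data_set["series"][series_key]
    levels = [
        data_set["attributes"],                           # data set
        data_set["groups"].get(series.get("group"), {}),  # group key
        series["attributes"],                             # series key
        series["observations"][period]["attributes"],     # observation
    ]
    collected = {}
    for level in levels:
        clash = collected.keys() & level.keys()
        if clash:
            raise ValueError(f"attribute(s) at more than one level: {sorted(clash)}")
        collected.update(level)
    return collected


# Hypothetical example data
data_set = {
    "attributes": {"UNIT_MULT": "3"},
    "groups": {"LF": {"TITLE": "Labour force"}},
    "series": {
        "Q.ES.LF-H": {
            "group": "LF",
            "attributes": {"UNIT": "HOURS"},
            "observations": {
                "2007-Q1": {"value": 100.0, "attributes": {"OBS_STATUS": "A"}},
            },
        },
    },
}
print(attributes_for_observation(data_set, "Q.ES.LF-H", "2007-Q1"))
```

Reference metadata needs a different mechanism because it can describe many other kinds of objects, which is the question the Metadata Structure Definition answers.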

Page 83

Metadata Example: Advance Release Calendar (ARC)

• What is the release calendar for?
  – Informs when data will be published/made available
• Who publishes the data set?
• What type of data is it (data flow)?
• What metadata is in the release calendar (i.e. its structure)?
• Who publishes the release calendar?
• When is it published?

[Figure: Labour Force Statistics release calendar]

Page 84

Structure

• Concepts

• Hierarchies

• Representation (e.g. code list)

[Figure: Metadata Structure Definition (MSD) for the release calendar]

Page 85

[Figure: Metadata Structure Definition (MSD). An MSD can comprise the specification of one or more Report Structures. Its Metadata Attributes (which can have a hierarchy) take their semantics and context from Concepts defined in a Concept Scheme. A Format and Permitted Value List gives the definition of format and permitted values. Metadata Reports are made according to the Report Structure.]

Page 86

Example ARC Metadata

Day         Ref Area     Indicator  Ref Period     Time   Tolerance  Status
30-04-2007  INE, Spain   LF-H       Q: 31-03-2007  09:00  +24 Hr.    Final
30-04-2007  INE, Spain   LF-E       Q: 31-03-2007  09:00  +24 Hr.    Final
30-04-2007  ONS, UK      LF-H       Q: 31-03-2007  09:00  +48 Hr.    Final
30-04-2007  ONS, UK      LF-E       Q: 31-03-2007  09:00  +48 Hr.    Draft

(Callout: the Ref Area and Indicator columns are the identifiers)

Page 87

MSD Metadata Concepts: Advance Release Calendar

Concepts

Concept Id         Description
REFERENCE_PERIOD   The time period to which a variable refers
RELEASE_DATE_TIME  The specific point in time that data or metadata are made available
DATE_TOLERANCE     The possible or permissible variance of a time period relative to a known point in time
RELEASE_STATUS     The state of preparedness of a statement on the availability of data or metadata
ANNOTATION         Additional metadata

Page 88

MSD: Report Structure for ARC

[Figure: Metadata Structure Definition ARC_METADATA contains the Report Structure ARC. Its Metadata Attributes REFERENCE_PERIOD, RELEASE_DATE_TIME, RELEASE_STATUS, DATE_TOLERANCE and ANNOTATION take their concepts from the Concept Scheme MY_AGENCY:METADATA_CONCEPTS.]

Page 89

MSD: Metadata Report Structure

[Figure: Metadata Report = ARC (Target Id not yet filled in), with these Metadata Attributes:
  Reference_Period: Representation = Date/Time
  Release_Date_Time: Representation = Date/Time
  Date_Tolerance: Representation = Time Value
  Release_Status: Representation = CL_Status (F = Final, P = Provisional)
  Annotation: Representation = Text]
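The report structure pairs each metadata attribute with a representation. The sketch below shows how a metadata report might be checked against it. It makes two assumptions that the slides do not fix: Date/Time values are ISO 8601 strings, and Time Value uses a hypothetical "+24Hr" style.

```python
import re
from datetime import datetime

# CL_Status code list from the report structure
CL_STATUS = {"F": "Final", "P": "Provisional"}


def is_date_time(value: str) -> bool:
    """Accept an ISO 8601 date (2007-03-31) or date-time to the minute (2007-04-30T09:00)."""
    for fmt in ("%Y-%m-%d", "%Y-%m-%dT%H:%M"):
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def is_time_value(value: str) -> bool:
    """Assumed tolerance format: a sign, a number of hours and 'Hr', e.g. '+24Hr'."""
    return re.fullmatch(r"[+-]\d+Hr", value) is not None


# Report structure ARC: metadata attribute -> representation check
ARC_REPORT_STRUCTURE = {
    "REFERENCE_PERIOD": is_date_time,
    "RELEASE_DATE_TIME": is_date_time,
    "DATE_TOLERANCE": is_time_value,
    "RELEASE_STATUS": lambda value: value in CL_STATUS,
    "ANNOTATION": lambda value: isinstance(value, str),
}


def validate_report(values: dict, structure: dict = ARC_REPORT_STRUCTURE) -> list:
    """Return a list of problems; an empty list means the report conforms.

    All attributes are treated as optional in this sketch.
    """
    problems = [f"unknown attribute {name}" for name in values if name not in structure]
    for name, check in structure.items():
        if name in values and not check(values[name]):
            problems.append(f"bad value for {name}: {values[name]!r}")
    return problems


report = {
    "REFERENCE_PERIOD": "2007-03-31",
    "RELEASE_DATE_TIME": "2007-04-30T09:00",
    "DATE_TOLERANCE": "+24Hr",
    "RELEASE_STATUS": "F",
    "ANNOTATION": "simultaneous release by ECB",
}
print(validate_report(report))  # []
```

A check like this would reject a day and month written in the wrong order (2007-31-03). It would also reject a status such as "Draft", because CL_Status only defines F and P.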

Page 90

Metadata Set: ARC Report Example

Metadata Set
  Metadata Structure = ARC_METADATA
  Metadata Report = ARC
  Identifiers: (not yet specified)
  Metadata Attributes:
    Reference_Period  = 2007-03-31
    Release_Date_Time = 2007-04-30T09:00
    Release_Status    = F
    Date_Tolerance    = +24Hr
    Annotation        = simultaneous release by ECB

Page 91

Metadata Example: Advance Release Calendar (ARC)

• What is the release calendar for?
  – Informs when data will be published/made available
• Who publishes the data set?
• What type of data is it (data flow)?
• What metadata is in the release calendar (i.e. its structure)?
• Who publishes the release calendar?
• When is it published?

[Figure: release calendar]

Page 92

To which object is the metadata attached?

[Figure: Metadata Structure Definition (MSD). An MSD can comprise the specification of one or more Report Structures and specifies a Target Identifier, which the report structure links to. Metadata Attributes (which can have a hierarchy) take their semantics and context from Concepts defined in a Concept Scheme. A Format and Permitted Value List gives the definition of format and permitted values. Metadata Reports are made according to the Report Structure.]

Page 93

Data Flows: Controlling Reporting and Publishing

[Figure: a Data Provider publishes/reports Data Sets and, through a Provision Agreement, can provide data for many Data Flows using the agreed data structure. A Data Flow can get data from multiple Data Providers and uses a specific Data Structure Definition. Reported data sets conform to the business rules of the data flow. The release calendar is shown alongside.]

Page 94

Controlling Data Reporting

[Figure: Data Provider, Provision Agreement, Data Flow, Data Set and Structure Definition, related as in the data-flow diagram, next to the release calendar. Example: Data Provider 1A = INE Spain; Data Flow LF-H = labour force hours.]
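Here is a sketch of the control that provision agreements give a data collector. The agreement set and the structure id LFS_DSD are hypothetical; only the provider codes 1A/2A and the flows LF-H/LF-E come from the example. A data set is accepted only if its provider has an agreement for the flow and the data set uses that flow's structure:

```python
# Hypothetical provision agreements: (data provider, data flow) pairs
PROVISION_AGREEMENTS = {("1A", "LF-H"), ("1A", "LF-E"), ("2A", "LF-H"), ("2A", "LF-E")}

# Hypothetical data flows and the data structure each one uses
FLOW_STRUCTURE = {"LF-H": "LFS_DSD", "LF-E": "LFS_DSD"}


def check_submission(provider: str, flow: str, structure: str) -> str:
    """Return 'accepted' or the reason a reported data set is rejected."""
    if flow not in FLOW_STRUCTURE:
        return "unknown data flow"
    if (provider, flow) not in PROVISION_AGREEMENTS:
        return "no provision agreement for this provider and flow"
    if FLOW_STRUCTURE[flow] != structure:
        return "data set does not use the structure of the data flow"
    return "accepted"


print(check_submission("1A", "LF-H", "LFS_DSD"))  # accepted
```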

Page 95

Identify Structure

• Concepts

• Hierarchies

• Representation (e.g. code list)

[Figure: Provision Agreement; Metadata Structure Definition (MSD) for the release calendar]

Page 96

MSD: Identifying the “Target”

[Figure: a Metadata Structure Definition contains a Full Target Identifier and Partial Target Identifiers, built from Identifier Components.
• The target identifiers define the “keys” of the object types to which metadata can be “attached”, specifying the identifier components (“key”) of the target object
• Each Identifier Component has a Target Object Type, which identifies the target object type of the component
• Each Identifier Component references an Item Scheme, which identifies the code list or other type of list (e.g. a Category Scheme) that defines the valid values that can be used when metadata are reported in a metadata set]

Page 97

MSD: Object Identification for ARC

[Figure: Metadata Structure Definition ARC_METADATA with Metadata Report ARC. Its Full Target Identifier, Data_Flow_Provider, has two Identifier Components:
  Target Object Type = Data Flow, Item Scheme = CL_DATA_FLOW (LF-H Labour Force, Hours Worked; LF-E Labour Force, Employment)
  Target Object Type = Data Provider, Item Scheme = OS_DATA_PROVIDER (1A INE, Spain; 2A ONS, UK)]

Page 98

MSD: Identifiers for ARC

Metadata Structure Definition = ARC_METADATA
Target = Data_Flow_Provider
  Identifier Component: Target Object Type = Data Flow; Item Scheme = CL_DATA_FLOW
    LF-H  Labour Force, Hours Worked
    LF-E  Labour Force, Employment
  Identifier Component: Target Object Type = Data Provider; Item Scheme = OS_DATA_PROVIDER
    1A  INE, Spain
    2A  ONS, UK
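The target identifier above can be read as a key definition. Here is a minimal sketch of checking a full target key against it. The component ids DATA_FLOW and DATA_PROVIDER are hypothetical labels. The codes come from CL_DATA_FLOW and OS_DATA_PROVIDER.

```python
from dataclasses import dataclass


@dataclass
class IdentifierComponent:
    target_object_type: str  # the kind of object this part of the key identifies
    item_scheme: dict        # valid codes -> names


# Full target identifier Data_Flow_Provider of ARC_METADATA
DATA_FLOW_PROVIDER = {
    "DATA_FLOW": IdentifierComponent(
        "Data Flow",
        {"LF-H": "Labour Force, Hours Worked", "LF-E": "Labour Force, Employment"},  # CL_DATA_FLOW
    ),
    "DATA_PROVIDER": IdentifierComponent(
        "Data Provider",
        {"1A": "INE, Spain", "2A": "ONS, UK"},  # OS_DATA_PROVIDER
    ),
}


def is_valid_target(key: dict, target: dict = DATA_FLOW_PROVIDER) -> bool:
    """A full target key needs exactly one valid code per identifier component."""
    if key.keys() != target.keys():
        return False
    return all(key[name] in component.item_scheme for name, component in target.items())


print(is_valid_target({"DATA_FLOW": "LF-H", "DATA_PROVIDER": "1A"}))  # True
```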

Page 99

MSD: Metadata Report Structure

[Figure: Metadata Report = ARC, Target Id = Data_Flow_Provider, with these Metadata Attributes:
  Reference_Period: Representation = Date/Time
  Release_Date_Time: Representation = Date/Time
  Date_Tolerance: Representation = Time Value
  Release_Status: Representation = CL_Status (F = Final, P = Provisional)
  Annotation: Representation = Text]

Page 100

Metadata Set: ARC Report Example

Metadata Set
  Metadata Structure = ARC_METADATA
  Metadata Report = ARC
  Identifiers:
    Data Provider = 1A
    Data Flow = LF-H
  Metadata Attributes:
    Reference_Period  = 2007-03-31
    Release_Date_Time = 2007-04-30T09:00
    Release_Status    = F
    Date_Tolerance    = +24Hr
    Annotation        = simultaneous release by ECB

[Figure: Data Provider, Provision Agreement, Data Flow]
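Attaching each report to a target key lets a release calendar answer questions such as "who publishes this data set?" and "what type of data is it?" by looking up the key. This is a sketch over three rows of the example table. The component ids are hypothetical labels. The selection matches any subset of the key components.

```python
# Three ARC metadata reports from the example table, keyed by their target identifiers
ARC_REPORTS = [
    {"target": {"DATA_PROVIDER": "1A", "DATA_FLOW": "LF-H"},
     "RELEASE_DATE_TIME": "2007-04-30T09:00", "DATE_TOLERANCE": "+24Hr", "RELEASE_STATUS": "F"},
    {"target": {"DATA_PROVIDER": "1A", "DATA_FLOW": "LF-E"},
     "RELEASE_DATE_TIME": "2007-04-30T09:00", "DATE_TOLERANCE": "+24Hr", "RELEASE_STATUS": "F"},
    {"target": {"DATA_PROVIDER": "2A", "DATA_FLOW": "LF-H"},
     "RELEASE_DATE_TIME": "2007-04-30T09:00", "DATE_TOLERANCE": "+48Hr", "RELEASE_STATUS": "F"},
]


def reports_for(reports: list, **wanted: str) -> list:
    """Select reports whose target matches every given identifier component."""
    return [r for r in reports
            if all(r["target"].get(name) == code for name, code in wanted.items())]


for r in reports_for(ARC_REPORTS, DATA_PROVIDER="1A"):
    print(r["target"]["DATA_FLOW"], r["RELEASE_DATE_TIME"])
```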

Page 101

Metadata: Advance Release Calendar (ARC)

• What is the release calendar for?
  – Informs when data will be published/made available
• Who publishes the data?
• What type of data is it (data flow)?
• What metadata is in the release calendar (i.e. its structure)?
• Who publishes the release calendar?
• When is it published?

[Figure: release calendar]

Page 102

Controlling Metadata Reporting

Metadata collectors can set up control metadata for the collection process.

[Figure: a (Meta)Data Provider publishes/reports Metadata Sets and, through a Provision Agreement, can provide metadata for many Metadata Flows using the agreed metadata structure. A Metadata Flow can get metadata from multiple metadata providers and uses a specific Metadata Structure Definition. Reported metadata sets conform to the business rules of the metadata flow. Example: Metadata Structure Definition ARC_METADATA, report ARC, provider 1A.]

Page 103

Metadata: Advance Release Calendar (ARC)

• What is the release calendar for?
  – Informs when data will be published/made available
• Who publishes the data?
• What type of data is it (data flow)?
• What metadata is in the release calendar (i.e. its structure)?
• Who publishes the release calendar?
• When is it published?

[Figure: release calendar]

Page 104

Reference Metadata

• Metadata is everywhere, but:
  – it must be metadata about “something”
    • what is the “something”?
    • how is it identified?
  – it comprises concepts, and how they are structured

• The Metadata Structure Definition answers these questions

• The advance release calendar is only one possible example
  – attached to the Provision Agreement

To which (other) things can metadata be attached?

Page 105

MSD: Some Object Types

[Figure: object types to which metadata can be attached, including Category Scheme, Category, Data or Metadata Flow, Data Provider, Provision Agreement, Structure Definition, Data Set or Metadata Set, Content Constraint, Structure and Item Scheme Maps, Registered Data Set or Metadata Set, and Attachment Constraint]

Page 106

MSD: List of Object Types to Which Metadata can be Attached

Agency, ConceptScheme, Concept, Codelist, Code, KeyFamily, Component, KeyDescriptor, MeasureDescriptor, AttributeDescriptor, GroupKeyDescriptor, Dimension, Measure, Attribute, CategoryScheme, ReportingTaxonomy, Category, OrganisationScheme, DataProvider, MetadataStructure, FullTargetIdentifier, PartialTargetIdentifier, MetadataAttribute, DataFlow, ProvisionAgreement, MetadataFlow, ContentConstraint, AttachmentConstraint, DataSet, XSDataSet, MetadataSet, HierarchicalCodelist, Hierarchy, StructureSet, StructureMap, ComponentMap, CodelistMap, CodeMap, CategorySchemeMap, CategoryMap, OrganisationSchemeMap, OrganisationRoleMap, ConceptSchemeMap, ConceptMap, Process, ProcessStep

Page 107

[Figure: Metadata Structure Definition (MSD). An MSD can comprise the specification of one or more Report Structures and specifies a Target Identifier, which the report structure links to. Metadata Attributes (which can have a hierarchy) take their semantics and context from Concepts defined in a Concept Scheme. A Format and Permitted Value List gives the definition of format and permitted values. Metadata Reports are made according to the Report Structure.]

Page 108

SDMX and Metadata Formats

Session: SDMX-ML Formats for Metadata Sets

Page 109

Metadata Formats Syntax Implementation

• There are three relevant constructs in SDMX-ML for handling metadata sets:
  – Metadata Structure Definitions
  – Metadata Reports (specific to an MSD)
  – Generic Metadata Sets (for any MSD)

• This is similar to the data formats in SDMX-ML, except that there are fewer use cases

• There is no corresponding format implementation in SDMX-EDI for reference metadata

Page 110

Comparing Formats for Metadata Sets

• Generic Metadata performs no validation, but can hold any type of metadata report

• MSD-specific Metadata Reports can perform more validation, and are less verbose:
  – Because metadata reports tend to contain few codelists or numeric types, this extra validation may not be very useful

Page 111

Metadata: Quality Frameworks

• The SDMX cross-domain concepts for reference metadata are concerned with data quality assessment framework (DQAF) metadata

• These frameworks are used to improve the quality, comparability, transparency, etc. of published data

Page 112

Metadata – Reported according to a Quality Framework

Page 113

Example Metadata: Content

[Figure: Metadata Structure Definition QUALITY_METADATA with the Report Structure CATEGORY_CONTENT_REPORT. Its Metadata Attributes COVERAGE, COVERAGE_SECTOR, REF_AREA, REF_PERIOD, ACCOUNTING_CONV and BASE_PER take their concepts from the Concept Scheme MY_CONCEPTS.]

Page 114

SDMX Registry Overview

Page 115

SDMX Registry Interfaces

[Figure: the SDMX Registry/Repository comprises:
  – a Structural Metadata repository, which describes data and metadata structures
  – a Provisioning Metadata repository, which describes data and metadata sources and reporting processes
  – a Registry of Data Sets/Metadata Sets, which indexes data and metadata
The repositories offer Submit and Query interfaces; the registry offers Register and Query interfaces.]

Page 116

SDMX Registry Interfaces

[Figure: the SDMX Registry/Repository comprises a Structural Metadata repository (describes data and metadata structures), a Provisioning Metadata repository and a Registry of Data Sets/Metadata Sets (indexes data and metadata), with Submit, Register and Query interfaces. Subscription/Notification: applications can subscribe to notification of new or changed objects.]
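Here is a toy model of the registry roles on this slide: registering data held elsewhere, querying for it, and notifying subscribers. The class and method names are hypothetical, and this is not the API of any real SDMX registry. The URLs use the reserved example.org domain.

```python
from collections import defaultdict


class ToyRegistry:
    """Hypothetical in-memory registry mimicking register, query and
    subscribe/notify; not any real SDMX registry interface."""

    def __init__(self):
        self._registrations = []
        self._subscribers = defaultdict(list)  # flow id -> callbacks

    def subscribe(self, flow: str, callback) -> None:
        """Ask to be notified when data for `flow` is registered."""
        self._subscribers[flow].append(callback)

    def register(self, provider: str, flow: str, url: str) -> dict:
        """Index a data set held elsewhere and notify subscribers."""
        entry = {"provider": provider, "flow": flow, "url": url}
        self._registrations.append(entry)
        for callback in self._subscribers[flow]:
            callback(entry)
        return entry

    def query(self, flow: str) -> list:
        """Find where the data sets for a flow can be retrieved."""
        return [e for e in self._registrations if e["flow"] == flow]


registry = ToyRegistry()
notifications = []
registry.subscribe("LF-H", notifications.append)
registry.register("1A", "LF-H", "https://example.org/ine/lf-h.xml")
registry.register("2A", "LF-E", "https://example.org/ons/lf-e.xml")
print(len(notifications), [e["url"] for e in registry.query("LF-E")])
```

The registry stores only the registration (who, which flow, where). The data set stays at its URL, which matches the slide's description of the registry as an index.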

Page 117

Information Model: High level Schematic

[Figure: a Data Provider publishes/reports data/metadata sets and, through a Provision Agreement, can provide data/metadata for many data/metadata flows using the agreed data/metadata structure. A Data or Metadata Flow can get data/metadata from multiple providers and uses a specific Data or Metadata Structure Definition. Reported sets conform to the business rules of the data/metadata flow. A Category Scheme comprises subject or reporting categories, which can have child categories, and flows can be linked to categories in multiple category schemes. The registry registers the existence of data and metadata sets (URL, registration date, etc.). Structure Maps hold structure and code list maps.]

Page 118

[Figure: SDMX Registry Interfaces, a repeat of the Page 116 diagram]

Page 119

SDMX Artefacts: Registry Contents

[Figure: registry contents, grouped as Structural Metadata, Provisioning Metadata, and Registered Data and Metadata. Objects shown: Data Flow, Data Provider, Provision Agreement, Structure Definition, Category and Category Scheme, Data Set, and Structure Maps (structure and code list maps). The registry registers the existence of data and metadata sets (URL, registration date, etc.).]

Page 120

The Old JEDH (Joint External Debt Hub) Site

[Figure: BIS, IMF, OECD and World Bank supply data in various formats to the website (3-month production cycle)]

Page 121

JEDH with SDMX

[Figure: BIS, IMF, OECD and the World Bank each publish SDMX-ML, as does the debtor database. Information about the data is registered in an SDMX Registry. An SDMX “Agent” discovers data and URLs in the registry and retrieves the data from the sites. The SDMX-ML is loaded into the JEDH database, and data are provided in real time to the JEDH site.]

Page 122

SDMX in Action: Prototype System

FOOD AND AGRICULTURE ORGANIZATION OF THE UNITED NATIONS

[Figure: Flow of FAO CountrySTAT-RegionSTAT implementation, in steps 1, 2, 3a, 3b and 4, between the CountrySTAT National Publication Server(s), the FAO SDMX Registry and the RegionSTAT Regional Publication Server]

Slide courtesy of the FAO

Page 123

Prototype System: Explanation

1. CountryStat National Publication Server
• The web site is published from the files in CountryStat

2. SDMX Publication
• The new CountryStat files are converted to SDMX-ML data sets and made web accessible on the CountryStat web site
• These files are registered in the FAO SDMX Registry

RegionStat Regional Publication Server
3a. Queries the registry for new registrations; the registry responds with registration details, including the URL of the new data sets
3b. Retrieves the new data sets from the CountryStat web site
4. Converts the SDMX-ML files to an internal format, integrates the new data sets with existing RegionStat data sets, and re-publishes the RegionStat web site

Slide courtesy of the FAO
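The RegionStat side of this flow amounts to a polling loop: find registrations not yet seen, retrieve each data set, and integrate it. The sketch below makes several assumptions. Registrations are plain dicts with a hypothetical sequence number, and `fetch` stands in for retrieval plus conversion to the internal format. All names and data are illustrative, not the FAO implementation.

```python
def harvest(registrations, last_seen, fetch, store):
    """Run one RegionStat-style update cycle and return the new high-water mark.

    registrations: registry entries, each with a sequence number, country and URL
    fetch: callable that retrieves and converts the data set at a URL
    store: dict mapping country -> list of integrated rows
    """
    new = sorted((r for r in registrations if r["seq"] > last_seen), key=lambda r: r["seq"])
    for reg in new:
        rows = fetch(reg["url"])                           # retrieve (and convert)
        store.setdefault(reg["country"], []).extend(rows)  # integrate
    return new[-1]["seq"] if new else last_seen


# Hypothetical registrations and "web sites"
FAKE_SITES = {
    "https://example.org/a.xml": [("2008", 10)],
    "https://example.org/b.xml": [("2008", 20)],
}
registrations = [
    {"seq": 1, "country": "COUNTRY_A", "url": "https://example.org/a.xml"},
    {"seq": 2, "country": "COUNTRY_B", "url": "https://example.org/b.xml"},
]
store = {}
mark = harvest(registrations, 0, FAKE_SITES.__getitem__, store)
mark = harvest(registrations, mark, FAKE_SITES.__getitem__, store)  # nothing new this time
print(mark, store)
```

Keeping a high-water mark means a second cycle with no new registrations leaves the integrated data unchanged.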

Page 124

SDMX Implementation

Page 125

Developing SDMX Applications

• General Design Approaches

• Publications and Dissemination

• Data Warehousing/Integration of Data Sources

• Other Topics

Page 126

SDMX Publication and Dissemination

• SDMX can be used to drive Web dissemination and print publication:
  – It is a useful format for distribution from websites
  – It can be used by websites to improve delivery of content
  – It can be used to provide content to print applications, for tabular data

• All of these outputs can be produced from a single system

Page 127

[Figure: Data Storage (SDMX) is accessed through an SDMX Query Engine that returns SDMX-ML. A Print Publication Engine uses XSL-FO, together with templates, boilerplate text and analysis, to produce PDF, etc. A Website answers canned queries and on-the-fly queries through ASP/JSP and XSLT, delivering HTML, SDMX-ML and CSV. Note: the data store can be a virtual data store fed by the SDMX registry.]

Page 128

Notes on Publication/Dissemination

• Current practice is often to focus on the delivery of tables:
  – This is often not what users ideally want
  – Tables can be viewed as “canned queries”

• Better websites can be created which support granular user queries backed by rich metadata:
  – See the ECB data warehouse and the Federal Reserve Board site as examples
  – See the “Data on the Web” presentation for more details
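Viewing a table as a canned query suggests a simple design: a table is a stored set of dimension filters, and an on-the-fly query is the same filter supplied by the user. The sketch below works on hypothetical observations and illustrative values, and renders the result as CSV with the standard `csv` module.

```python
import csv
import io

# Hypothetical observations, one dict per observation
OBSERVATIONS = [
    {"REF_AREA": "ES", "INDICATOR": "LF-H", "TIME_PERIOD": "2007-Q1", "OBS_VALUE": "101.5"},
    {"REF_AREA": "ES", "INDICATOR": "LF-E", "TIME_PERIOD": "2007-Q1", "OBS_VALUE": "98.2"},
    {"REF_AREA": "GB", "INDICATOR": "LF-H", "TIME_PERIOD": "2007-Q1", "OBS_VALUE": "103.0"},
]

# A "canned query" is just a stored set of dimension filters
CANNED_QUERIES = {"spain_hours": {"REF_AREA": "ES", "INDICATOR": "LF-H"}}


def run_query(observations, filters):
    """Return the observations matching every dimension filter."""
    return [o for o in observations if all(o.get(k) == v for k, v in filters.items())]


def to_csv(rows, columns):
    """Render the selected columns as CSV text, ignoring other fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(run_query(OBSERVATIONS, CANNED_QUERIES["spain_hours"]), ["TIME_PERIOD", "OBS_VALUE"]))
```

An on-the-fly query such as `run_query(OBSERVATIONS, {"INDICATOR": "LF-H"})` uses the same code path, which is why richer query support costs little once the data are held by dimension.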

Page 129

Data Warehousing/Integration of Data Sources

• SDMX is also designed to support the collection and processing of data:
  – In most organizations, this is seen as a data warehousing activity

• SDMX provides tools for integrating data from a variety of sources:
  – These can be among a set of organizations or within an organization

Page 130

[Figure: Data Sources (static files, databases, etc.) register their data in the SDMX Registry (Data Registration). On Notification, the data are pulled into Data Loading, then Harmonization/Processing in the Data Warehouse, then Data Dissemination to internal applications, print publication and the website. The registry also supplies metadata and registration information to these stages.]

Note: All types of dissemination applications may use the registry for various purposes. The registry may even be made publicly available to users who want SDMX-ML data and metadata.

Page 131

Notes on Data Warehousing

• Each stage is loosely coupled with its associated applications, using XML interfaces:
  – Data sources
  – Data processing
  – Data dissemination applications

• The SDMX Registry functions throughout as a metadata repository, providing structural and provisioning information, as well as the location of data, as needed

• Internal database structures are based on the SDMX information model:
  – They are predictable and regular
  – They can be auto-generated
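Because a data structure definition lists its dimensions and attributes explicitly, a storage table can be derived from it mechanically. This is one possible sketch, not a schema prescribed by SDMX. It generates a flat observation table whose primary key is the series key plus the time period, then runs the statement in an in-memory SQLite database. The table and component names are hypothetical.

```python
import sqlite3


def observation_table_ddl(table, dimensions, attributes):
    """Derive a CREATE TABLE statement for observations from a data structure.

    Identifiers are assumed to be safe SQL names; they are not quoted here.
    """
    columns = [f"{d} TEXT NOT NULL" for d in dimensions]
    columns += ["TIME_PERIOD TEXT NOT NULL", "OBS_VALUE REAL"]
    columns += [f"{a} TEXT" for a in attributes]
    key = ", ".join([*dimensions, "TIME_PERIOD"])
    return f"CREATE TABLE {table} ({', '.join(columns)}, PRIMARY KEY ({key}))"


# Hypothetical data structure: three dimensions plus one observation-level attribute
ddl = observation_table_ddl("LFS_OBS", ["FREQ", "REF_AREA", "INDICATOR"], ["OBS_STATUS"])
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
conn.execute("INSERT INTO LFS_OBS VALUES (?, ?, ?, ?, ?, ?)", ("Q", "ES", "LF-H", "2007-Q1", 101.5, "A"))
print([row[1] for row in conn.execute("PRAGMA table_info(LFS_OBS)")])
```

The composite primary key enforces one value per series key and period, so a duplicate load of the same observation is rejected by the database.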

Page 132

SDMX Tools and Resources

Page 133

SDMX Tools (Partial List)

• Metadata Technology has a set of free tools for working with data and metadata, and a free registry implementation:
  – Mostly Java and XSLT

• Eurostat has a set of free tools for working with data and metadata, and has a registry implementation

• OECD and IMF have a web-services-based package for dissemination: .STAT (available through an MOU)

• ECB visualization tools, written in Flex, are available on Google Code

• Some other tools exist, including those from commercial vendors (STR Supercross 2, etc.)

Page 134

Other Resources

• www.sdmx.org has a blog, makes many different presentations and papers available, and distributes copies of the standards:
  – An SDMX User’s Guide is currently being developed (beyond the material contained in the SDMX v2.0 specification)

• The Open Data Foundation promotes SDMX (among other standards):
  – Check www.opendatafoundation.org
  – They host the SDMX Users Forum: www.sdmxusers.org

Page 135

SDMX and Other Standards

Page 136

Other Important Standards

• Data Documentation Initiative (DDI) – describes the micro-data inputs to aggregate (SDMX) data

• ISO/IEC 11179 Metadata Registries – describes terminological/semantic and conceptual models, and the metadata lifecycle

• eXtensible Business Reporting Language (XBRL) – describes financial microdata for economic statistics

Page 137

SDMX and XBRL

• These standards can be mapped to each other successfully

• However, the mapping depends on the specific SDMX Data Structure Definition and the specific XBRL “Taxonomy”:
  – There is no single, standard mapping
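Because each mapping belongs to one particular taxonomy and data structure, it is essentially a lookup table maintained for that pair. The sketch below does not parse XBRL. It assumes facts have already been extracted as (element, country, period, value), and the element names, dimensions and codes are all hypothetical.

```python
# Hypothetical mapping for ONE taxonomy / data structure pair:
# taxonomy element name -> fixed dimension values in the SDMX data structure
ELEMENT_TO_DIMENSIONS = {
    "TotalAssets": {"INDICATOR": "ASSETS"},
    "TotalLiabilities": {"INDICATOR": "LIABILITIES"},
}
KEY_ORDER = ["FREQ", "REF_AREA", "INDICATOR"]  # dimension order of the hypothetical structure


def fact_to_observation(element, country, period, value, freq="A"):
    """Turn one extracted fact into (series key, period, value); None if unmapped."""
    dims = ELEMENT_TO_DIMENSIONS.get(element)
    if dims is None:
        return None  # this taxonomy element has no counterpart in the structure
    key_values = {"FREQ": freq, "REF_AREA": country, **dims}
    series_key = ".".join(key_values[d] for d in KEY_ORDER)
    return series_key, period, value


print(fact_to_observation("TotalAssets", "ES", "2007", 1000.0))  # ('A.ES.ASSETS', '2007', 1000.0)
```

A different taxonomy or data structure would need its own table, which is the practical meaning of "no single, standard mapping".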

Page 138

DDI and SDMX Combined Data Model

• DDI 3 focuses on:
  – collection and production of microdata
  – reuse and sharing of common data structures
  – conversion to statistical tables (matrices)
  – preservation and multiple storage options

• SDMX focuses on:
  – statistical tables
  – reuse and sharing of common data structures
  – consistent data transfer structure

• Together they form a coherent data management model for data capture, storage and interchange, with a wide area of overlap

Page 139

Generic Process Example

[Figure: Survey/Register → Raw Data Set → (anonymization, cleaning, recoding, etc.) → Micro-Data Set/Public Use Files → (tabulation, processing, case selection, etc.) → Aggregate Data Set (Lower Level) → (aggregation, harmonization) → Aggregate Data Set (Higher Level) → (aggregation, harmonization) → Aggregate Data Set (Highest Level). The diagram marks the parts of the process covered by DDI and by SDMX.]
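The tabulation and successive aggregation steps can be sketched on toy data. This is neither DDI nor SDMX code. The records and dimensions are hypothetical, and "aggregation" here just means summing over the dimensions that are dropped.

```python
from collections import Counter

# Hypothetical anonymized micro-data: one record per respondent
MICRO_DATA = [
    {"region": "North", "sex": "F", "employed": True},
    {"region": "North", "sex": "M", "employed": False},
    {"region": "North", "sex": "M", "employed": True},
    {"region": "South", "sex": "F", "employed": True},
]


def tabulate(records):
    """Tabulation: count employed persons by (region, sex)."""
    return Counter((r["region"], r["sex"]) for r in records if r["employed"])


def aggregate(table, keep):
    """Aggregation: sum a table over every key position not listed in `keep`."""
    result = Counter()
    for key, count in table.items():
        result[tuple(key[i] for i in keep)] += count
    return result


lower = tabulate(MICRO_DATA)          # lower level: by region and sex
higher = aggregate(lower, keep=[0])   # higher level: by region
highest = aggregate(higher, keep=[])  # highest level: overall total
print(dict(lower), dict(higher), dict(highest))
```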

Page 140

The Generic Statistical Business Process Model (GSBPM)

• The METIS group is a part of UN/ECE which addresses metadata issues for national statistical agencies (and other producers of official statistics):
  – This community uses both SDMX and DDI

• They have produced a reference model of the statistical production process:
  – The DDI 3 Lifecycle Model was a major input
  – GSBPM has a much greater level of detail

Page 142

The Generic Statistical Information Model (GSIM)

• Early work on an information model to accompany the GSBPM is starting:
  – Still informal, very early
  – Involves some of the statistical agencies which lead the work on GSBPM

• GSIM will take as a major input both the DDI and SDMX information models:
  – Will also cover other metadata
  – Will also draw on other standards (Neuchatel Model for Classifications, etc.)

• Goal is to publish GSIM through METIS alongside the GSBPM

Page 143

Questions?