Metadata driven application for data processing – from local toward global solution Rudi Seljak Statistical Office of the Republic of Slovenia


Post on 03-Jan-2016


Page 1:

Metadata driven application for data processing – from local toward global solution

Rudi Seljak
Statistical Office of the Republic of Slovenia

Page 2:

Summary of presentation

• Introduction
• Current generic application – main characteristics
• Development of global solution
• Changes in the statistical process
• Conclusions

Page 3:

Introduction

• Statistical data processing:
  – Demanding, time-consuming and very expensive task
  – Constant pressure for budget cuts
• Rationalisation of the statistical process:
  – Take advantage of the rapid IT development
  – Movement from domain-oriented to process-oriented production
  – Stove-pipe IT solutions replaced by general applications
• Statistical Office of the Republic of Slovenia (SURS):
  – SURS began systematic development of generic solutions 6 years ago
  – Prototype solutions for several parts of the process were developed
  – These solutions have already been used for several large surveys (e.g. the 2010 Agriculture Census and the 2011 Population Census)
  – The prototype generic solutions are now being upgraded to more global solutions

Page 4:

Generalised solutions – main characteristics

• Small, generic solutions for small parts of the statistical process, called the building blocks:
  – Enable easy and flexible linking of the inputs and outputs of the individual components into the whole statistical process
  – Can be plugged into different databases in different environments (e.g. ORACLE, SAS) if the input database meets a few basic conditions
  – Designed as fully metadata driven (MDD) systems: one program code → the parameters for processing a concrete survey are provided through special metadata tables
  – The process metadata can be provided in different environments (SAS, MS Access, ORACLE) → the metadata organisation must follow strict rules for its structure (tables and variables)
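A minimal sketch of the "one program code, parameters from metadata" idea, written in Python rather than SAS. The metadata layout (`variable`, `min_value`, `max_value`), the function names and the survey variables are hypothetical and not taken from the SURS system; the point is only that the same code serves any survey whose metadata follows the agreed structure.

```python
# Hypothetical structure every metadata table must follow (assumption).
REQUIRED_COLUMNS = {"variable", "min_value", "max_value"}


def check_metadata_structure(metadata):
    """Reject metadata rows that do not follow the agreed structure."""
    for i, row in enumerate(metadata):
        missing = REQUIRED_COLUMNS - row.keys()
        if missing:
            raise ValueError(f"metadata row {i} is missing columns: {sorted(missing)}")


def run_validation(records, metadata):
    """Generic range check: the code is fixed, the rules come from metadata.

    Returns a list of (record index, variable, offending value).
    """
    check_metadata_structure(metadata)
    failures = []
    for idx, record in enumerate(records):
        for rule in metadata:
            value = record[rule["variable"]]
            if not (rule["min_value"] <= value <= rule["max_value"]):
                failures.append((idx, rule["variable"], value))
    return failures


# Two surveys, one program: only the metadata differs (illustrative values).
agriculture_metadata = [{"variable": "area_ha", "min_value": 0, "max_value": 10000}]
population_metadata = [{"variable": "age", "min_value": 0, "max_value": 120}]

print(run_validation([{"area_ha": 12.5}, {"area_ha": -3}], agriculture_metadata))
print(run_validation([{"age": 34}, {"age": 150}], population_metadata))
```

Checking the metadata structure before processing mirrors the requirement that the metadata organisation follow strict rules, whatever environment it is stored in.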

Page 5:

Building blocks - functioning

[Diagram. Labels: Different microdata databases; General SAS program; Ad-hoc program (×2); Building block; Different databases of process metadata]

Page 6:

Linking building blocks into the process

[Diagram. Labels: Microdata; Building block 1; Building block 2; Building block n; Ad-hoc program (×3); Transformed data (×3)]
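The chaining shown on this slide can be sketched as a list of steps, each taking the data produced by the previous one. This is an illustrative Python sketch, not the SURS implementation: the step functions, the record layout and the crude zero imputation are placeholders invented for the example.

```python
def run_process(microdata, steps):
    """Run building blocks and ad-hoc programs in order.

    The output of each step (the "transformed data") is the input of the next.
    """
    data = list(microdata)
    for step in steps:
        data = step(data)
    return data


def impute_missing_turnover(data):
    # Stand-in for an imputation building block (placeholder rule: use 0).
    return [
        {**r, "turnover": 0 if r["turnover"] is None else r["turnover"]}
        for r in data
    ]


def adhoc_convert_to_thousands(data):
    # Stand-in for a survey-specific ad-hoc program between two blocks.
    return [{**r, "turnover": r["turnover"] / 1000} for r in data]


def aggregate_by_region(data):
    # Stand-in for an aggregation building block.
    totals = {}
    for r in data:
        totals[r["region"]] = totals.get(r["region"], 0) + r["turnover"]
    return [{"region": region, "turnover": total} for region, total in totals.items()]


microdata = [
    {"region": "A", "turnover": 2000},
    {"region": "A", "turnover": None},
    {"region": "B", "turnover": 500},
]
steps = [impute_missing_turnover, adhoc_convert_to_thousands, aggregate_by_region]
result = run_process(microdata, steps)
print(result)
```

Because every step has the same shape (data in, data out), blocks can be reordered, dropped or interleaved with ad-hoc programs without changing `run_process`.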

Page 7:

Process metadata

• The system is to a very large extent based on the process metadata:
  – Processing rules which enable adjustment of the general program for different surveys
• The process metadata are currently entered directly into an MS Access database:
  – High probability of syntax errors
  – Users must be thoroughly instructed in order to fill in the metadata correctly

Table    Variable   Condition    Corr_rule      Step
TABLE1   X          X/Y > 1000   Round(X/100)   1
TABLE1   Z          Z NE X       X              2
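A hedged sketch of how a generic program might apply the two example rows above. It assumes that a rule means "if Condition holds, set Variable to Corr_rule" and that rules run in Step order; the slides do not state this explicitly. The translation of SAS-style operators (`NE`) and functions (`Round`) to Python, and all function names, are assumptions made for the illustration.

```python
import re

# The two rows of the example table, as plain dictionaries.
RULES = [
    {"table": "TABLE1", "variable": "X", "condition": "X/Y > 1000",
     "corr_rule": "Round(X/100)", "step": 1},
    {"table": "TABLE1", "variable": "Z", "condition": "Z NE X",
     "corr_rule": "X", "step": 2},
]

# Assumed mapping from SAS-style comparison mnemonics to Python operators.
_OPERATORS = {"NE": "!=", "EQ": "==", "GT": ">", "LT": "<", "GE": ">=", "LE": "<="}
# Functions the rules may call. Python's round() rounds exact halves to even,
# which may differ from SAS ROUND.
_FUNCTIONS = {"round": round}


def to_python(expression):
    """Rewrite operator mnemonics and function names; leave variables alone."""
    def swap(match):
        word = match.group(0)
        if word.upper() in _OPERATORS:
            return _OPERATORS[word.upper()]
        if word.lower() in _FUNCTIONS:
            return word.lower()
        return word
    return re.sub(r"[A-Za-z_]\w*", swap, expression)


def evaluate(expression, record):
    # eval() with builtins disabled; acceptable for a sketch, not for
    # untrusted metadata. Zero or missing denominators are not handled.
    return eval(to_python(expression), {"__builtins__": {}, **_FUNCTIONS}, dict(record))


def apply_corrections(record, rules, table):
    """Apply the rules for one table to one record, in step order."""
    corrected = dict(record)
    for rule in sorted(rules, key=lambda r: r["step"]):
        if rule["table"] != table:
            continue
        if evaluate(rule["condition"], corrected):
            corrected[rule["variable"]] = evaluate(rule["corr_rule"], corrected)
    return corrected


print(apply_corrections({"X": 250000, "Y": 2, "Z": 5}, RULES, "TABLE1"))
```

In this example step 1 rescales X (250000 / 2 exceeds 1000), and step 2 then sees the corrected X when it compares it with Z, which is why the step order matters.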

Page 8:

Building blocks

• The basic tools of the whole system are the building blocks, each of which covers a particular processing phase.
• The building blocks are SAS macros that operate on the basis of the process metadata.
• So far, building blocks have been created for the following phases:
  – Data validation (logical controls)
  – Deterministic corrections
  – Data imputations
  – Standard error estimation
  – Aggregation
  – Tabulation
  – Calculation of quality indicators
  – Disclosure control (testing phase)

Page 9:

Building a global solution

• The developed system is a very open and flexible tool.
• However, certain re-integration would be needed to increase its functionality:
  – To move the process metadata into the ORACLE environment
  – To create a single, unique database of process metadata where process metadata for all the surveys are stored and maintained
  – To develop graphical interfaces for user-friendly management of process metadata
  – To link the system with the metadata repository
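The single process-metadata database for all surveys could look roughly like the following sketch. It uses Python's built-in `sqlite3` as a stand-in for ORACLE; the table name, column names and survey identifiers are hypothetical. The constraints illustrate how a central store can reject some malformed entries that free-form entry would let through.

```python
import sqlite3


def create_store():
    """Create an in-memory store holding process metadata for all surveys."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE process_metadata (
            survey         TEXT    NOT NULL,
            table_name     TEXT    NOT NULL,
            variable       TEXT    NOT NULL,
            rule_condition TEXT    NOT NULL,
            corr_rule      TEXT    NOT NULL,
            step           INTEGER NOT NULL CHECK (step > 0),
            PRIMARY KEY (survey, table_name, step)
        )
        """
    )
    return conn


def add_rule(conn, survey, table_name, variable, rule_condition, corr_rule, step):
    # "with conn" commits on success and rolls back on error.
    with conn:
        conn.execute(
            "INSERT INTO process_metadata VALUES (?, ?, ?, ?, ?, ?)",
            (survey, table_name, variable, rule_condition, corr_rule, step),
        )


def rules_for(conn, survey):
    """Return one survey's rules in execution order."""
    cursor = conn.execute(
        "SELECT table_name, variable, rule_condition, corr_rule, step "
        "FROM process_metadata WHERE survey = ? ORDER BY table_name, step",
        (survey,),
    )
    return cursor.fetchall()


conn = create_store()
add_rule(conn, "SURVEY_A", "TABLE1", "Z", "Z NE X", "X", 2)
add_rule(conn, "SURVEY_A", "TABLE1", "X", "X/Y > 1000", "Round(X/100)", 1)
add_rule(conn, "SURVEY_B", "TABLE1", "W", "W LT 0", "0", 1)
print(rules_for(conn, "SURVEY_A"))
```

A generic program would then only need the survey identifier to fetch its parameters, which is the practical benefit of keeping the metadata for all surveys in one place.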

Page 10:

The new system

[Diagram. Labels: Different microdata databases; General SAS program; Ad-hoc program (×2); Database of processing metadata; Metadata repository; Application for metadata management; Data on tables and variables]

Page 11:

Application for metadata management: Deterministic corrections

Page 12:

Application for metadata management: Execution of the particular process step

Page 13:

New application and statistical process

• The generic MDD application introduces changes in the implementation of data processing at a general level:
  – Essentially different distribution of work between IT specialists, general methodologists and IT experts
  – Change in the role of subject-matter statisticians → changed expectations of their skills and capabilities
  – The work organisation of the IT Department and the General Methodology Department will have to change from domain-oriented to process-oriented
  – A different approach from IT and methodology experts will be needed:
    • Experts capable of thinking and operating at a much more general level
    • A survey is just one of the realisations of the general statistical process

Page 14:

Conclusions

• SURS developments in recent years: flexible, metadata driven generic solutions for different phases of data processing.
• The very open system will be replaced with a more integrated and centralised system.
• Main goal: transition from stove-pipe oriented production to more integrated processing systems.
• Two main challenges:
  – To build generic IT solutions which would "cover" the wide diversity of statistical surveys
  – To change the very "domain-oriented state of mind" among the employees

Page 15:

Thank you for your attention