Francesco Rizzo (ISTAT - Italy)
Stefano De Francisci (ISTAT – Italy)
An integration approach for the Statistical Information System of Istat using SDMX standards
Geneva, 08-10 May 2007, Meeting on the Management of Statistical Information Systems
Summary
Istat Information System (current situation)
The Integrated Output Management System
Planning constraints; strategic plan
Standardizing new sub-systems through the toolkit
Integrating existing sub-systems through SDMX
Conclusion
Istat Information System
Current situation:
The statistical production activities of Istat are supported by a distributed architecture. Several production Directorates operate through local sub-systems that independently cover the full life cycle of statistical data, from collection to dissemination.
The mission is: to improve and standardize the processes in the part of the statistical data life cycle that runs from validated data to dissemination, through the integration and management of the data and metadata supplied by the production Directorates.
[Figure: The simplified current ISTAT scenario. Each Production Directorate performs Data Collection, Data Editing and Data Aggregation and has its own thematic DB accessed through a web navigator; metadata and validated microdata are also shown.]
Istat Information System
Some numbers:
• 7 production Directorates
• 3 horizontal-competence Directorates
• 18 dissemination databases accessible via the Internet
• 2 centralized metadata systems
Software and tools in use:
• Unix, Windows
• Tomcat, Apache, IIS
• VB, Java, SAS, Excel
• .NET, JSP, PHP, ASP
• Oracle, Access, PostgreSQL, MySQL
The Integrated Output Management System
The Integrated Output Management System is a project oriented towards the standardization and integration of a part of the life cycle of statistical data, in particular all the steps needed to produce purposeful statistical outputs for end users.
The high degree of heterogeneity among the existing applications and technologies of the systems involved has precluded full integration.
Consequently, the Integrated Output Management System has been configured as a multi-level, multi-service integration environment.
The Integrated Output Management System
The project’s guidelines:
• identify the right position inside the Istat Information System
• find the right mediation among the points of view of the different Directorates
• choose the right compromise among standardization, reengineering and integration
[Figure: Position of the Integrated Output Management System in the Istat scenario. Same elements as the current scenario (Production Directorates with Data Collection, Data Editing and Data Aggregation; thematic DBs; metadata; validated microdata; web navigators), with the Integrated Output Management System added.]
The Integrated Output Management System
Planning constraints:
• use the expertise acquired in projects developed over the last few years: the Metadata Information System, generalized environments performing OLAP functions, thematic databases;
• use new technologies such as XML and Web Services as an alternative to “proprietary solutions”;
• develop a new “integration culture” that is applied from the planning stages of new sub-systems;
• minimize the impact on the existing sub-systems;
• minimize costs and risks through a “gradual strategy” of development.
The Integrated Output Management System
Minimize costs and risks: the two different architectural approaches
• develop a complete framework that standardizes all the processes from validated data to dissemination, by means of a toolkit for building new sub-systems or reengineering existing ones
• build an SDMX architecture, using a Registry and Web Services, that allows the integration of existing thematic databases
The Integrated Output Management System
Minimize costs and risks: the two main phases of the project
• short-term
  • feasibility analysis
  • planning
  • prototyping
  • training on integration technologies and standards
  • building the frameworks
• medium-term
  • stimulate the use of the frameworks as a means to standardize and integrate sub-systems
  • guide the planning of new dissemination systems
  • build the SDMX integration infrastructure
The Integrated Output Management System (1/2)
Top management strategic plan
• establish a new Directorate for “Information needs, Integration and Territory” (DCET), whose main objective is to guide integration processes inside the Institute
• two Units of the DCET Directorate are directly involved in the “Integrated Output Management System” project:
• “Unit B” has the task of developing a toolkit that enables production Directorates to integrate the new dissemination sub-systems that are going to be planned. This Unit will supply expertise, particularly on cross-sectional data and OLAP applications
The Integrated Output Management System (2/2)
• “Unit A” has the task of testing technologies that are fairly new in Istat, such as XML and Web Services, and of studying SDMX standards. This Unit will supply the expertise to integrate existing thematic databases, particularly for short-term statistics and time series
• set up an internal inter-Directorate working group to support the Eurostat SODI task force
• set up an internal inter-Directorate working group whose main objective is to analyze and verify the use of SDMX standards in the Istat Information System architecture
[Figure: Integrated Output Management System zooming. Two panels: “Standardization and reengineering through the toolkit” and “SDMX integration of existing thematic databases”. Component labels: validated microdata, metadata, ETL, DW microdata, DW macrodata, macrodata, TS loader, CS loader, multidimensional, web portal(s), time series web portal, Registry, thematic database, aggregator, WS, SDMX web service, SDMX web portal. Legend: WS = web service; TS = time series; CS = cross-sectional; DW = data warehouse.]
Standardizing new sub-systems through the toolkit
The toolkit allows the production Directorates to be self-sufficient in building statistical Data Marts as part of the Institute’s more comprehensive Corporate Data Warehouse.
The functions available are:
• integrate, through a specialized layer, with the centralized metadata systems
• build Statistical Data Marts of validated microdata oriented to a specific subject-matter domain
• build a primary Data Warehouse of validated microdata
• build a Web Warehouse of aggregated data
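The step from a Data Warehouse of validated microdata to a Web Warehouse of aggregated data can be illustrated with a minimal sketch. The records, the dimension names (region, sector) and the aggregate function below are hypothetical and are not part of the Istat toolkit; the sketch only shows the general idea of summing a measure over a chosen set of dimensions.

```python
from collections import defaultdict

# Hypothetical validated microdata: one record per statistical unit.
MICRODATA = [
    {"region": "Lazio", "sector": "Industry", "employees": 12},
    {"region": "Lazio", "sector": "Services", "employees": 5},
    {"region": "Lazio", "sector": "Industry", "employees": 8},
    {"region": "Sicilia", "sector": "Services", "employees": 3},
]


def aggregate(records, dimensions, measure):
    """Sum `measure` over every combination of values of `dimensions`."""
    cube = defaultdict(int)
    for record in records:
        key = tuple(record[d] for d in dimensions)
        cube[key] += record[measure]
    return dict(cube)


# Two aggregated views built from the same microdata.
by_region_sector = aggregate(MICRODATA, ("region", "sector"), "employees")
by_region = aggregate(MICRODATA, ("region",), "employees")
```

Keeping the microdata as the single source and deriving each aggregated view from it is one way to let several Data Marts stay consistent with each other.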
Integrating existing sub-systems through SDMX
To provide the necessary support to the SODI task force and to develop best practices on SDMX, we are developing several software modules organized in a framework.
The framework can be used as a whole, from reporting to dissemination, or its modules can be used separately and integrated into individual sub-systems.
SDMX Istat Framework 1/2
SDMX Istat Framework version 1.0 is composed of the following modules:
• Check and Loader:
  • collects and loads aggregated data into the database;
  • publishes an RSS file that announces when new data are loaded or updated;
  • publishes one or more SDMX Query files;
  • publishes one or more SDMX Compact files;
• SDMX data Web Service:
  • allows the use of the Pull exchange method to request data;
  • accepts an SDMX Query;
  • responds with an SDMX Compact file.
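The Pull exchange described above (a query goes in, a Compact-style data set comes out) can be sketched as follows. This is only an illustration under stated assumptions: the XML is a simplified stand-in that is not valid against the SDMX-ML schemas, and STORE, DIMENSIONS, the dimension names and handle_query are hypothetical names, not part of the Istat framework.

```python
import xml.etree.ElementTree as ET

# Hypothetical store of loaded aggregated data: series key -> observations.
STORE = {
    ("M", "IT", "PROD"): [("2007-01", 101.2), ("2007-02", 102.0)],
    ("M", "FR", "PROD"): [("2007-01", 99.5)],
}
# Hypothetical dimension names, in the same order as the series keys.
DIMENSIONS = ("FREQ", "REF_AREA", "INDICATOR")


def handle_query(query_xml):
    """Answer a simplified query with a simplified Compact-style data set."""
    query = ET.fromstring(query_xml)
    # Each <Dimension id="...">value</Dimension> constrains one dimension.
    wanted = {d.get("id"): d.text for d in query.iter("Dimension")}
    dataset = ET.Element("DataSet")
    for key, observations in sorted(STORE.items()):
        series_key = dict(zip(DIMENSIONS, key))
        if any(series_key.get(dim) != value for dim, value in wanted.items()):
            continue  # this series does not match the request
        # Dimensions become attributes of the series, as in a compact layout.
        series = ET.SubElement(dataset, "Series", series_key)
        for period, value in observations:
            ET.SubElement(series, "Obs", TIME_PERIOD=period, OBS_VALUE=str(value))
    return ET.tostring(dataset, encoding="unicode")


response = handle_query('<Query><Dimension id="REF_AREA">IT</Dimension></Query>')
```

In a real deployment the function would sit behind a web service endpoint and read from the database filled by the Check and Loader module; here both are replaced by in-memory stand-ins.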
SDMX Istat Framework 2/2
• SDMX Web Navigator:
  • is a web application that acts as a client of the web service;
  • allows the database to be queried, using the DSDs to provide the analysis dimensions;
  • allows SDMX Queries to be built through a graphical interface;
  • allows SDMX Queries to be tested;
• Reference Metadata Manager and Web Navigator:
  • allows production Directorates to produce Reference Metadata in SDDS format without changing their current working practices
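How a navigator might turn a user's selections into a query can be sketched as below. The DSD contents, the XML layout and the build_query function are hypothetical and greatly simplified with respect to real SDMX-ML; the point is only that the DSD supplies the dimensions and the codes against which the selections are checked before a query document is emitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical DSD: each dimension with its allowed codes.
DSD = {
    "FREQ": {"A", "Q", "M"},
    "REF_AREA": {"IT", "FR", "DE"},
}


def build_query(selections):
    """Validate selections against the DSD and emit a simplified query."""
    query = ET.Element("Query")
    for dim, code in sorted(selections.items()):
        if dim not in DSD:
            raise ValueError(f"unknown dimension: {dim}")
        if code not in DSD[dim]:
            raise ValueError(f"code {code!r} is not valid for dimension {dim}")
        ET.SubElement(query, "Dimension", id=dim).text = code
    return ET.tostring(query, encoding="unicode")


query_xml = build_query({"REF_AREA": "IT", "FREQ": "M"})
```

Rejecting invalid selections on the client side means that a malformed request never reaches the web service, which is one plausible reason to drive the interface from the DSD.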
Conclusions
• standardization and integration can now be carried out more easily thanks to technologies such as XML and Web Services
• the full success of the project will depend on:
  • the top management strategic plan
  • the right position inside the Istat Information System
  • the right compromise among standardization, reengineering and integration
  • introducing the resulting systems without a traumatic change to current working practices