PublishMyData for publishing governmental statistical data: Webinar 3
TRANSCRIPT
PublishMyData for publishing governmental statistical data
OpenCube: Webinar 3
Overview of OpenCube project and summary of overall outputs (5 min)
Objectives (2 min)
Background and context: the DCLG data collection and DCLG requirements (10 min)
Demonstration of tools developed (15 min)
Ongoing work and planned improvements (5 min)
Questions and answers (10 min)
Agenda
19 October 2014, Riva del Garda, Italy ISWC 2014 – SemStats 2014
Open Statistical Data are very important for the EU
Users frequently want to blend & combine statistical data from multiple sources
But these data usually reside in files and databases (data silos) that are hard to combine
Problem definition
Linked Data (LD) technology has the potential to enable combining and performing analytics on top of disparate and previously isolated statistical data
However, relevant tools are few, scattered and untested under real-life conditions
The potential of using LD in statistical data analysis remains unexploited
12 March 2015 NTTS 2015, Brussels, 10-12 March 2015
The OpenCube project
OpenCube is a 2-year project funded by the EU within FP7
The project aims to develop and test processes and tools for managing statistical linked open data.
The results will:
Help data publishers create linked data cubes from legacy formats
Empower data users to browse, visualise, link, expand and analyse data cubes
Enable analysis not possible before (merging data cubes at Web scale)
We propose a lifecycle for statistical LD
The lifecycle is divided into two phases: publish and reuse (or consume)
The lifecycle prescribes the steps that raw data cubes* should go through in order to create value.
OpenCube also develops tools to support the whole lifecycle of linked statistical data.
Linked Statistical Data Lifecycle
* We assume statistical data is organized as data cubes, where each cell contains a measure described by a number of dimensions.
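As a rough illustration of the data cube assumption above, the following Python sketch models a cube as a mapping from one value per dimension to a single measure. The dimension names, area labels and figures are invented for illustration and are not taken from any OpenCube or DCLG dataset.

```python
from typing import Dict, Tuple

# Hypothetical dimension names; real cubes define their own.
DIMENSIONS = ("area", "year", "tenure")

# One cell per combination of dimension values, holding a single measure.
# All figures below are invented for illustration.
cube: Dict[Tuple[str, ...], float] = {
    ("Area A", "2013", "owned"): 120.0,
    ("Area A", "2013", "rented"): 60.0,
    ("Area B", "2013", "owned"): 90.0,
    ("Area B", "2013", "rented"): 80.0,
}


def lookup(cube, **coords):
    """Return the measure for the cell identified by one value per dimension."""
    key = tuple(coords[name] for name in DIMENSIONS)
    return cube[key]


def dimension_values(cube, name):
    """List the distinct values that a dimension takes across the cube."""
    index = DIMENSIONS.index(name)
    return sorted({key[index] for key in cube})


if __name__ == "__main__":
    print(lookup(cube, area="Area B", year="2013", tenure="rented"))
    print(dimension_values(cube, "tenure"))
```

In linked data form, each cell would typically be an observation resource with one property per dimension; the dictionary here only mirrors that shape.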
Publishing components:
TARQL extension
D2RQ/R2RML-QB extension
JSON-stat
Grafter
Consuming components:
OpenCube Browser
OpenCube MapView
R Analysis Chart
Linking components
OpenCube Toolkit
Developed using the open source Information Workbench as the underlying linked data management platform
License scheme: OpenCube components are provided under open source licenses
See http://opencube-toolkit.eu
Commercial solutions are also offered by consortium members
Linked open data publishing platform created by Swirrl
In use by:
DCLG
Scottish Government
Hampshire Hub
Surrey County Council
Manchester local authorities
Glasgow City Council
Tools for managing data
Tools for exploring, selecting and downloading data
PublishMyData
UK Government department, responsible for:
Funding, monitoring and supporting local government
Increasing local decision making
Housing and planning policy
DCLG objectives for open data:
Up-to-date and trusted
Relevant and useful
Useable and used
9 December 2014 OpenCube First Review
UK Department for Communities and Local Government
Large and varied collection of datasets
Would like to improve the flexibility of the Data Cube creation process and reduce the effort of processing new data sources
Discovery, publishing, sharing, re-use of codelists and vocabularies
Aiming to enhance use of the data, so looking for improved tools for Data Cube users that reduce any need to know about Linked Data
Provide data primarily for researchers and analysts, both inside and outside the organisation
Enable creation of rich interactive visualisations
DCLG requirements
Grafter: data transformation
http://grafter.org
PublishMyData: grid view
▪ Displays a 2D slice of a data cube as a table
▪ Pick which dimension to use for rows and columns
▪ Fix values of any other dimensions
▪ Order the table according to the values in any column
▪ Download the result as CSV
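The grid view steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PublishMyData's implementation: the observation records, the dimension names and the helper functions `grid_view`, `sort_by_column` and `to_csv` are all hypothetical, and the figures are invented.

```python
import csv
import io

# Hypothetical observations: one record per cube cell. All values are invented.
OBSERVATIONS = [
    {"area": "Area A", "year": "2013", "tenure": "owned", "value": 120},
    {"area": "Area A", "year": "2014", "tenure": "owned", "value": 130},
    {"area": "Area B", "year": "2013", "tenure": "owned", "value": 90},
    {"area": "Area B", "year": "2014", "tenure": "owned", "value": 95},
    {"area": "Area A", "year": "2013", "tenure": "rented", "value": 60},
    {"area": "Area B", "year": "2013", "tenure": "rented", "value": 80},
]


def grid_view(observations, row_dim, col_dim, fixed, measure="value"):
    """Build a 2D slice: fix some dimensions, spread two others over rows/columns."""
    cells = {}
    for obs in observations:
        if all(obs[dim] == val for dim, val in fixed.items()):
            cells[(obs[row_dim], obs[col_dim])] = obs[measure]
    row_labels = sorted({r for r, _ in cells})
    col_labels = sorted({c for _, c in cells})
    rows = [[r] + [cells.get((r, c)) for c in col_labels] for r in row_labels]
    return col_labels, rows


def sort_by_column(col_labels, rows, column, descending=False):
    """Order rows by the values in one column; rows with no value go last."""
    idx = col_labels.index(column) + 1  # +1 skips the row label
    present = [r for r in rows if r[idx] is not None]
    missing = [r for r in rows if r[idx] is None]
    return sorted(present, key=lambda r: r[idx], reverse=descending) + missing


def to_csv(row_dim, col_labels, rows):
    """Serialise the table as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([row_dim] + col_labels)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    cols, rows = grid_view(OBSERVATIONS, "area", "year", {"tenure": "owned"})
    print(to_csv("area", cols, sort_by_column(cols, rows, "2013")))
```

Here `area` is used for rows, `year` for columns, and `tenure` is fixed to a single value, matching the pick, fix, order and download steps of the slide.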
Grid View Overview
PublishMyData: map view
18-19 November 2013 OpenCube kick-off meeting
PublishMyData: spreadsheet builder
▪ Choose some geographic areas on a map
▪ Choose 'slices' of data from multiple datasets: each will become a column in the table
▪ Build up a table of data
▪ Download as CSV
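The spreadsheet builder steps above amount to joining slices from several datasets on a shared set of geographic areas. The Python sketch below illustrates that idea only; the dataset names, dimension names, figures and the helpers `select_slice` and `build_table` are assumptions made for this example and do not describe the actual tool.

```python
import csv
import io

# Hypothetical datasets keyed by name; every figure is invented.
DATASETS = {
    "households": [
        {"area": "Area A", "year": "2013", "value": 1200},
        {"area": "Area B", "year": "2013", "value": 900},
        {"area": "Area C", "year": "2013", "value": 700},
    ],
    "population": [
        {"area": "Area A", "year": "2013", "value": 3100},
        {"area": "Area B", "year": "2013", "value": 2500},
    ],
}


def select_slice(dataset, fixed):
    """Return {area: value} for observations matching the fixed dimension values."""
    return {
        obs["area"]: obs["value"]
        for obs in dataset
        if all(obs[dim] == val for dim, val in fixed.items())
    }


def build_table(areas, slices):
    """One row per chosen area, one column per (label, slice) pair."""
    header = ["area"] + [label for label, _ in slices]
    rows = [[area] + [data.get(area) for _, data in slices] for area in areas]
    return header, rows


def to_csv(header, rows):
    """Serialise the table as CSV text; missing values become empty fields."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    chosen_areas = ["Area A", "Area C"]  # stands in for a selection made on the map
    slices = [
        ("households 2013", select_slice(DATASETS["households"], {"year": "2013"})),
        ("population 2013", select_slice(DATASETS["population"], {"year": "2013"})),
    ]
    print(to_csv(*build_table(chosen_areas, slices)))
```

An area with no value in a given slice yields an empty cell rather than an error, which seems a reasonable default when datasets cover different sets of areas.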
Spreadsheet Builder Overview
'Second generation' of data selection tools
Slice-based data navigation and selection
Data shopping cart
Broader discussions on re-use of code lists and vocabularies for linked statistical data
What's next
http://www.opencube-project.eu
More information