Google Public Data Explorer

Copyright 2010 Digital Enterprise Research Institute. All rights reserved. Digital Enterprise Research Institute, www.deri.ie. Google Public Data Explorer. Aftab Iqbal, [email protected], http://www.StefanDecker.org/

Upload: aftab-iqbal

Post on 30-Oct-2014


TRANSCRIPT

Page 1: Google Public Data Explorer

Copyright 2010 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

Google Public Data Explorer

Aftab Iqbal

[email protected]

http://www.StefanDecker.org/

Page 2: Google Public Data Explorer

Introduction

DSPL (Dataset Publishing Language) consists of:
  XML (metadata describing the dataset)
  CSV files (the data itself)

Page 3: Google Public Data Explorer

DSPL Dataset

General information: about the dataset

Concepts: definitions of "things" that appear in the dataset (e.g., counties, unemployment rate, gender, etc.)

Slices: combinations of concepts for which there are data

Tables: data for concepts and slices. Concept tables hold enumerations and slice tables hold statistical data

Topics: organize the concepts of the dataset in a meaningful hierarchy through labeling

Page 4: Google Public Data Explorer

School Enrollment 2009_2010 *

School_Roll_No  Short_Name      Level    Male  Female
00697S          ST BRIDGIDS NS  Primary  377   447
01170G          NAUL NS         Primary  40    61
09492W          BALSCADDEN NS   Primary  98    133
…               …               …        …     …

* Snapshot taken from http://data.fingal.ie/ViewDataSets/Details/default.aspx?datasetID=385

Page 5: Google Public Data Explorer

DSPL – Contd.

General Information: general information about the dataset and its provider

<info>
  <name><value>School</value></name>
  <description><value>Statistics about Fingal County Schools</value></description>
  <url><value></value></url>
</info>

<provider>
  <name><value>County Fingal School Enrollment Statistics</value></name>
  <url><value>http://data.fingal.ie/ViewDataSets/Details/default.aspx?datasetID=385</value></url>
</provider>

Page 6: Google Public Data Explorer

DSPL – Contd.

Concepts: the types of data that appear in a dataset

<concept id="Schools" extends="geo:location">
  <info>
    <name><value>Schools</value></name>
    <description><value>List of schools for Co. Fingal</value></description>
  </info>
  <type ref="string"/>
  <table ref="schools_table"/>
</concept>

<table id="schools_table">
  <column id="School" type="string"/>
  <column id="School_Roll_No" type="string"/>
  <column id="latitude" type="float"/>
  <column id="longitude" type="float"/>
  <data>
    <file format="csv" encoding="utf-8">schools.csv</file>
  </data>
</table>

school  name                                     latitude  longitude
00697S  Saint Bridgids National School           53.37514  -6.36221
01170G  S N Na H Aille Naul National School      53.57887  -6.28564
09492W  Balscadden National School               53.61528  -6.23218
09642P  Burrow National School                   53.39129  -6.10028
…       …                                        …         …

Page 7: Google Public Data Explorer

DSPL – Contd.

Slices: a slice is a combination of concepts for which data exists. It contains two kinds of concept references: dimensions and metrics.

<slice id="enrolment_slice">
  <dimension concept="school"/>
  <dimension concept="time:year"/>
  <metric concept="M"/>
  <metric concept="F"/>
  <table ref="enrolment_slice_table"/>
</slice>

<table id="enrolment_slice_table">
  <column id="school" type="string"/>
  <column id="M" type="integer"/>
  <column id="F" type="integer"/>
  <column id="year" type="date" format="yyyy"/>
  <data>
    <file format="csv" encoding="utf-8">school_enrolment_slice.csv</file>
  </data>
</table>
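Because every dimension and metric of a slice has to be backed by a column in the slice table, a quick consistency check can catch mismatched ids before upload. The following is only a sketch in Python: the `<dspl>` wrapper without a namespace, the helper names `local_name` and `missing_columns`, and the rule that `time:year` maps to a column called `year` are assumptions made for illustration, not part of the slides.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the slide. The bare <dspl> wrapper is an assumption;
# a real DSPL file uses a namespaced root element, which is left out here.
DSPL_FRAGMENT = """
<dspl>
  <slice id="enrolment_slice">
    <dimension concept="school"/>
    <dimension concept="time:year"/>
    <metric concept="M"/>
    <metric concept="F"/>
    <table ref="enrolment_slice_table"/>
  </slice>
  <table id="enrolment_slice_table">
    <column id="school" type="string"/>
    <column id="M" type="integer"/>
    <column id="F" type="integer"/>
    <column id="year" type="date" format="yyyy"/>
  </table>
</dspl>
"""


def local_name(concept_ref):
    """Drop an import prefix such as 'time:' (assumed column-naming rule)."""
    return concept_ref.split(":")[-1]


def missing_columns(xml_text):
    """Return {slice id: [concept refs that have no matching table column]}."""
    root = ET.fromstring(xml_text)
    # Map each table id to the set of column ids it declares.
    tables = {
        table.get("id"): {col.get("id") for col in table.findall("column")}
        for table in root.findall("table")
    }
    problems = {}
    for slice_el in root.findall("slice"):
        columns = tables.get(slice_el.find("table").get("ref"), set())
        refs = [
            child.get("concept")
            for child in slice_el
            if child.tag in ("dimension", "metric")
        ]
        missing = [ref for ref in refs if local_name(ref) not in columns]
        if missing:
            problems[slice_el.get("id")] = missing
    return problems


print(missing_columns(DSPL_FRAGMENT))  # prints {} because every concept has a column
```

If a column were renamed, for example `M` to `Male`, without updating the metric, the function would report `{'enrolment_slice': ['M']}`.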

Page 8: Google Public Data Explorer

School Enrollment Slice

School                          Male  Female  Year
Saint Bridgids National School  377   447     2009
Saint Bridgids National School  475   392     2010
Balscadden National School      98    133     2009
Balscadden National School      126   102     2010
…                               …     …       …

Dimensions: School, Year. Metrics: Male, Female.

Page 9: Google Public Data Explorer

DSPL – Contd.

Topics: classify concepts hierarchically and are used by applications to help users navigate to your data.

<topic id="Male_indicators">
  <info>
    <name><value>Male Students Enrollment</value></name>
  </info>
</topic>
<topic id="Female_indicators">
  <info>
    <name><value>Female Students Enrollment</value></name>
  </info>
</topic>

Page 10: Google Public Data Explorer

Data Cleansing

School Enrollment 2009

School_Roll_No  Short_Name      Level    Male  Female
00697S          ST BRIDGIDS NS  Primary  377   447
01170G          NAUL NS         Primary  40    61
…               …               …        …     …

School Enrollment 2010

School_Roll_No  Short_Name      Level    Male  Female
00697S          ST BRIDGIDS NS  Primary  475   392
01170G          NAUL NS         Primary  58    40
…               …               …        …     …

Schools.csv

School  Name                                 Latitude  Longitude
00697S  Saint Bridgids National School       53.37514  -6.36221
01170G  S N Na H Aille Naul National School  53.57887  -6.28564
…       …                                    …         …

School_Enrollment_Slice.csv

School  Male  Female  Year
00697S  377   447     2009
00697S  475   392     2010
01170G  40    61      2009
01170G  58    40      2010
…       …     …       …
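The cleansing step above, which turns the two yearly snapshots into one long table keyed by school and year, could be scripted roughly as follows. This is a sketch only: it assumes the snapshots are available as comma-separated text with the column names shown on the slide, and the function names `build_slice_rows` and `to_csv` are made up for illustration.

```python
import csv
import io

# Sample rows copied from the slide; the real snapshots contain more schools.
# Comma-separated input is an assumption about the source files.
ENROLLMENT_2009 = """School_Roll_No,Short_Name,Level,Male,Female
00697S,ST BRIDGIDS NS,Primary,377,447
01170G,NAUL NS,Primary,40,61
"""
ENROLLMENT_2010 = """School_Roll_No,Short_Name,Level,Male,Female
00697S,ST BRIDGIDS NS,Primary,475,392
01170G,NAUL NS,Primary,58,40
"""


def build_slice_rows(snapshots):
    """Turn {year: csv_text} into rows of School, Male, Female, Year."""
    rows = []
    for year, text in snapshots.items():
        for record in csv.DictReader(io.StringIO(text)):
            rows.append({
                "School": record["School_Roll_No"],  # roll number identifies the school
                "Male": int(record["Male"]),
                "Female": int(record["Female"]),
                "Year": year,
            })
    # Order by school, then year, as in School_Enrollment_Slice.csv on the slide.
    rows.sort(key=lambda row: (row["School"], row["Year"]))
    return rows


def to_csv(rows):
    """Serialize slice rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["School", "Male", "Female", "Year"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


slice_rows = build_slice_rows({2009: ENROLLMENT_2009, 2010: ENROLLMENT_2010})
print(to_csv(slice_rows))
```

Dropping Short_Name and Level from the slice keeps only the dimension key and the metrics; the descriptive attributes of each school stay in Schools.csv.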

Page 11: Google Public Data Explorer

<slice id="enrolment_slice">
  <dimension concept="school"/>
  <dimension concept="time:year"/>
  <metric concept="Male"/>
  <metric concept="Female"/>
  <table ref="enrolment_slice_table"/>
</slice>

<table id="enrolment_slice_table">
  <column id="school" type="string"/>
  <column id="Male" type="integer"/>
  <column id="Female" type="integer"/>
  <column id="year" type="date" format="yyyy"/>
  <data>
    <file format="csv" encoding="utf-8">School_Enrollment_Slice.csv</file>
  </data>
</table>

Deployment

CSV files and XML metadata, compressed into a single archive
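Bundling the metadata and CSV files for upload can be done with any zip tool. As one possible sketch in Python, the helper below collects the `.xml` and `.csv` files of a directory into a flat zip archive; the function name `bundle_dataset` and the flat layout are assumptions for illustration, not requirements stated on the slides.

```python
import zipfile
from pathlib import Path


def bundle_dataset(source_dir, archive_path):
    """Zip every .xml and .csv file directly under source_dir.

    Returns the list of file names that were added to the archive.
    """
    source = Path(source_dir)
    # Collect the members first so the archive itself is never picked up.
    members = sorted(
        path for path in source.iterdir()
        if path.is_file() and path.suffix.lower() in (".xml", ".csv")
    )
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in members:
            # arcname keeps the archive flat: no directory prefix on members.
            archive.write(path, arcname=path.name)
    return [path.name for path in members]
```

A call such as `bundle_dataset("fingal_schools", "fingal_schools.zip")` (both names hypothetical) would then produce the archive to upload.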