Toward a Frictionless Data Future
PRESENTED BY
Jo [email protected] (@jobarratt/@okfnlabs)
AT
3rd Research Data Network - St Andrews University - 30 November 2016
Licensed under cc-by v3.0 (any jurisdiction)
International non-profit founded in 2004
Who we are
● Vision
o A world where open knowledge is ubiquitous, enabling citizens and organizations to create insights that drive change on global and local challenges, combat injustice and inequality and hold governments and corporations to account.
● Mission
o Open up all essential, public interest information and see it used to create insight that drives positive change
o Build communities, tools and skills to empower individuals and organizations to use open information to create insights that drive change.
Widely adopted - over 20 national governments
and 60+ local governments & cities
ckan.org/instances
£4m in 15mins
Frictionless Data is…
● Lightweight specifications for “packaging” datasets
● Integrations for loading datasets into tools and platforms relevant to researchers
The Goals...
● Introduce a significant, measurable improvement in how research data is shared, consumed, and analyzed.
● Make it easier to maintain and improve data quality.
The Problem
Treemap of issues …
Legal barriers
(open data, sharing agreements etc)
Data Quality
Hard to find
Interoperability
No tool integration
Cargo loading ~1955
Manual, Slow, Costly
(and Dangerous)
Data is Shipping Pre-Containerization
Containerization
Standards
- Standards (a few, simple ones)
- Tools (primarily for integration)
- Documentation
- Datasets
Data Containerization
http://www.flickr.com/photos/photohome_uk/1494590209/
Key Principles
1. Simplicity
2. Web Oriented
3. Existing Tools
4. Open
5. Distributed
Tabular Data Package
Data Package JSON Table Schema CSV
http://frictionlessdata.io/guides/tabular-data-package/
http://frictionlessdata.io/guides/data-package/
http://frictionlessdata.io/guides/json-table-schema/
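To make the three parts concrete, here is a minimal sketch of a Tabular Data Package descriptor: a Data Package (datapackage.json) whose resource points at a CSV file and carries a JSON Table Schema. The dataset name, file name and fields are hypothetical and chosen only for illustration; the guides above remain the reference for the full set of properties.

```python
import json

# Hypothetical dataset: the name, path and fields are illustrative only.
descriptor = {
    "name": "example-measurements",
    "resources": [
        {
            # The CSV file that holds the data.
            "path": "measurements.csv",
            # JSON Table Schema: one entry per CSV column.
            "schema": {
                "fields": [
                    {"name": "site", "type": "string"},
                    {"name": "date", "type": "date"},
                    {"name": "reading", "type": "number"},
                ]
            },
        }
    ],
}


def field_names(descriptor, resource_index=0):
    """Return the column names declared in a resource's schema."""
    schema = descriptor["resources"][resource_index]["schema"]
    return [field["name"] for field in schema["fields"]]


# This string is what would be saved as datapackage.json next to the CSV.
descriptor_json = json.dumps(descriptor, indent=2)
```

The descriptor is plain JSON, so any tool that can read JSON can discover the columns and their types without opening the CSV.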
Tabular Data Package
Tooling …
Data Package
Tabular Data Package
JSON Table Schema CSV
Tool
e.g. import to R, SQL etc
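As a rough illustration of why a schema helps with import, the sketch below uses only the Python standard library to create a SQLite table from a JSON Table Schema and load CSV rows into it. It is not one of the project's own integrations; the function name, the type mapping and the sample data are assumptions made for this example.

```python
import csv
import io
import sqlite3

# Assumed mapping from JSON Table Schema types to SQLite column types.
SQL_TYPES = {"string": "TEXT", "integer": "INTEGER", "number": "REAL", "date": "TEXT"}


def load_into_sqlite(conn, table, schema, csv_text):
    """Create `table` from the schema, insert the CSV rows, return the row count."""
    fields = schema["fields"]
    columns = ", ".join(
        '"{}" {}'.format(f["name"], SQL_TYPES.get(f.get("type", "string"), "TEXT"))
        for f in fields
    )
    conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))

    # Read rows by header name so the column order follows the schema.
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [tuple(row[f["name"]] for f in fields) for row in reader]
    placeholders = ", ".join("?" for _ in fields)
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), rows
    )
    return len(rows)


# Hypothetical schema and data, for illustration only.
schema = {"fields": [{"name": "site", "type": "string"},
                     {"name": "reading", "type": "number"}]}
csv_text = "site,reading\nA,1.5\nB,2.25\n"

conn = sqlite3.connect(":memory:")
count = load_into_sqlite(conn, "measurements", schema, csv_text)
total = conn.execute("SELECT SUM(reading) FROM measurements").fetchone()[0]
```

Because the column types come from the schema, the numeric column can be summed in SQL straight after loading.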
Tool
e.g. data checking à la GoodTables
Validation
http://goodtables.okfnlabs.org
Validation
Continuous Validation
● If you’re working in a group, you need continuous validation… for data!
● In < 1 hour, we integrated elements (datapackage.json + Python libraries + GoodTables API) to support continuous data validation
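The idea of continuous validation can be sketched as a check that runs on every change and fails the build when the data does not match its schema. The example below is a hand-rolled, standard-library stand-in, not the GoodTables API or the integration described above; the function names, the supported types and the sample data are assumptions for illustration.

```python
import csv
import io


def _is_integer(value):
    try:
        int(value)
        return True
    except ValueError:
        return False


def _is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


# Only a few JSON Table Schema types are handled in this sketch.
CHECKS = {"string": lambda value: True, "integer": _is_integer, "number": _is_number}


def validate_csv(csv_text, schema):
    """Return a list of (row_number, column, message) problems."""
    problems = []
    expected = [f["name"] for f in schema["fields"]]
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header != expected:
        problems.append((1, None, "header does not match schema"))
        return problems
    for row_number, row in enumerate(reader, start=2):
        if len(row) != len(expected):
            problems.append((row_number, None, "wrong number of cells"))
            continue
        for field, value in zip(schema["fields"], row):
            check = CHECKS.get(field.get("type", "string"), CHECKS["string"])
            if not check(value):
                problems.append((row_number, field["name"], "not a " + field["type"]))
    return problems


def exit_code(problems):
    """Non-zero exit status makes a continuous integration job fail."""
    return 1 if problems else 0


# Hypothetical schema and data, for illustration only.
schema = {"fields": [{"name": "site", "type": "string"},
                     {"name": "reading", "type": "number"}]}
good_csv = "site,reading\nA,1.5\n"
bad_csv = "site,reading\nA,1.5\nB,n/a\n"
```

In a shared repository, a script like this would be run by the continuous integration service on each commit, passing the result of `exit_code(...)` to `sys.exit` so that invalid data blocks the change.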
http://uk-25k.datadashboards.io/
Platform Integrations
Partners
View more: http://frictionlessdata.io/partners/
Dataship
YOUR RESEARCH ORG
● Project website: http://frictionlessdata.io/
● Specifications: http://specs.frictionlessdata.io/
● GitHub: https://github.com/frictionlessdata/
● User Stories: http://frictionlessdata.io/user-stories/
● Newsletter: http://frictionlessdata.io/get-involved/#newsletter
● Follow @okfnlabs on Twitter (#frictionlessdata)