Basics of open data: what you need to know, by Wouter Degadt & Pieter Colpaert

Data → open and linked

Wouter Degadt & Pieter Colpaert

Upload: opening-upeu

Post on 13-Jul-2015



Programme

1. The basics: Data → Open Data → Linked Data

2. Linked Open Data: how to publish data?

Data

Wikipedia says:

English (disambiguation): data is uninterpreted information

English (computing): data is any sequence of symbols given meaning by specific acts of interpretation.

Dutch: data is the plural of datum, which is an observation of a fact

What’s data quality?

What’s interoperability?

Interoperability levels: querying, syntactic, object, semantic, technical, legal, process

Could the governance processes behind the datasets be merged?

Are you legally allowed to merge two datasets?

Can you connect the communication channels? E.g., merge a dataset published on a CD with a dataset published on a floppy disk.

How easy is it to ask questions across the borders of the datasets?

How interoperable are the serialisation formats? E.g., JSON vs. PDF.

What can you request from the server?

Do the words in one dataset mean the same as the words in the other?

Open Data

Because non-personal data increases in value when others reuse it

Data on the web: reuse is allowed, reuse in a gray zone, unauthorised reuse

OpenDefinition.org

How can we find open data?

It’s made available through open data portals

http://data.gov.uk,

http://datahub.io,

http://open-data.europa.eu,

http://data.gent.be,

Via links in existing datasets

e.g., http://dbpedia.org/resource/Ghent

Linked Data

Because it is impossible to store all the world’s knowledge on one machine

Table / CSV / Spreadsheet

    name    type     same as  location
    iMinds  company  IBBT     Gaston Crommenlaan 8

JSON

    {
      "iMinds": {
        "type": "company",
        "same as": "IBBT",
        "location": "Gaston Crommenlaan 8"
      }
    }

XML

    <iMinds>
      <type>company</type>
      <sameas>IBBT</sameas>
      <location>Gaston Crommenlaan 8</location>
    </iMinds>

The same data as triples:

    <iMinds> <type> <company> .
    <iMinds> <sameas> <IBBT> .
    <iMinds> <location> "Gaston Crommenlaan 8" .
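The step from a table row to triples can be sketched in a few lines of Python. This is only an illustration, assuming the `name` column holds the subject and every other column is a predicate; the function name `row_to_triples` is hypothetical and not part of the slides.

```python
import csv
import io

# The table from the slide, written as CSV text.
CSV_TEXT = "name,type,same as,location\niMinds,company,IBBT,Gaston Crommenlaan 8\n"


def row_to_triples(row, subject_column="name"):
    """Turn one table row into (subject, predicate, object) tuples."""
    subject = row[subject_column]
    return [
        (subject, column, value)
        for column, value in row.items()
        if column != subject_column
    ]


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
triples = [triple for row in rows for triple in row_to_triples(row)]

for subject, predicate, obj in triples:
    print(subject, "|", predicate, "|", obj)
```

Each cell becomes one self-contained statement, which is what makes it possible to spread the statements over several machines.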

World Wide Web

Machine 1: iMinds → same as → IBBT

Machine 2: iMinds → is a → company

Machine 3: IBBT → located at → Gaston Crommenlaan 8
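A minimal sketch of why triples spread over several machines can still be combined: merging them is a plain set union, after which a question can be answered across the original dataset borders. The helper names `aliases` and `locations` are hypothetical, and the "same as" handling follows a single hop only.

```python
# Each machine publishes its own small set of triples (taken from the slide).
machine_1 = {("iMinds", "same as", "IBBT")}
machine_2 = {("iMinds", "is a", "company")}
machine_3 = {("IBBT", "located at", "Gaston Crommenlaan 8")}

# Triples are self-contained facts, so merging datasets is a set union.
graph = machine_1 | machine_2 | machine_3


def aliases(name, graph):
    """Return the name plus everything linked to it by 'same as' (one hop, both directions)."""
    found = {name}
    for s, p, o in graph:
        if p == "same as" and s == name:
            found.add(o)
        elif p == "same as" and o == name:
            found.add(s)
    return found


def locations(name, graph):
    """Look up 'located at' for the name and for its aliases."""
    names = aliases(name, graph)
    return {o for s, p, o in graph if p == "located at" and s in names}


print(locations("iMinds", graph))
```

No single machine knows where iMinds is located; the answer only appears once the three sets are merged.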

Problem

The word “company” is ambiguous. How can we make sure that machines understand each other? → semantic interoperability

What about “is a”? And what about “iMinds”?

Solution

iMinds → http://data.kbodata.be/organisation/0866_386_380#id

is a → http://www.w3.org/1999/02/22-rdf-syntax-ns#type

Company → http://www.w3.org/ns/regorg#RegisteredOrganization

Uniform Resource Identifiers (URIs)

A triple is an atomic piece of data (a datum or a fact) that cannot be misunderstood at machine level in a Web context
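As an illustration, the "iMinds is a company" triple can be written out with the three URIs from the Solution slide. The sketch below serialises it in the N-Triples line format; the helper name `to_ntriples_line` is hypothetical, and it only handles triples whose three terms are all URIs.

```python
# URIs taken from the Solution slide.
IMINDS = "http://data.kbodata.be/organisation/0866_386_380#id"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
REGISTERED_ORG = "http://www.w3.org/ns/regorg#RegisteredOrganization"


def to_ntriples_line(subject, predicate, obj):
    """Serialise one triple of three URIs as a single N-Triples line."""
    return "<{}> <{}> <{}> .".format(subject, predicate, obj)


line = to_ntriples_line(IMINDS, RDF_TYPE, REGISTERED_ORG)
print(line)
```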

iMinds → is a → company


Company register: iMinds → is a → company; other entries: Open Knowledge Belgium, TVH, Maes

Datasets: company register, address database, …, Government Service X

Linked Open Data cloud: the collection of billions of triples published on the Web

Summary

New terms: data quality, data interoperability, triples, open data, linked open data cloud

Linked Open Data means: making your data more interoperable with other datasets on the web by using URIs as identifiers and triples as atomic building blocks

Data publishing

iMinds → http://data.kbodata.be/organisation/0866_386_380#id

is a → http://www.w3.org/1999/02/22-rdf-syntax-ns#type

Company → http://www.w3.org/ns/regorg#RegisteredOrganization

Linked Data principles

1. Use a URI for every term

2. Dereference these URIs over HTTP

3. Return useful information

4. Add links towards useful sources
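Principles 2 and 3 can be sketched with Python's standard library. The example prepares an HTTP request for the iMinds URI without sending it; asking for Turtle through the `Accept` header is an assumption about what a server might offer, not something the slides state.

```python
from urllib.parse import urldefrag
from urllib.request import Request

uri = "http://data.kbodata.be/organisation/0866_386_380#id"

# The fragment (#id) identifies the organisation itself; only the part
# before it is requested from the server.
document_url, fragment = urldefrag(uri)

# Assumption: the server can return an RDF serialisation such as Turtle.
request = Request(document_url, headers={"Accept": "text/turtle"})

print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))

# To actually dereference the URI (needs network access):
# from urllib.request import urlopen
# with urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

A useful response would contain triples about the organisation, including links to other sources (principle 4).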

E.g., I’m launching a new company

{mynewcompany} → http://{mynewcompany}.be/#org

is a → http://www.w3.org/1999/02/22-rdf-syntax-ns#type

Company → http://www.w3.org/ns/regorg#RegisteredOrganization

An identifier for your company, and you are in charge of its meaning.

Mind the ambiguity

E.g., I’m launching a new company

{mynewcompany} → http://{mynewcompany}.be/#org

is a → http://www.w3.org/1999/02/22-rdf-syntax-ns#type

Company → http://www.w3.org/ns/regorg#RegisteredOrganization

{mynewcompany} → http://{mynewcompany}.be/#org

has a home page → http://xmlns.com/foaf/0.1/homepage

http://{mynewcompany}.be/

What URIs should I use?

Publishing methods

1. Data dumps

2. Triples within HTML pages

3. JSON → JSON-LD web services

4. Triple pattern fragments

Data dumps

http://wiki.dbpedia.org/Downloads2014

→ all facts in 1 file

Triples within HTML


JSON API

http://{address to API document on Empire State}

JSON-LD API
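A rough sketch of the step from a JSON API to a JSON-LD API: the payload stays ordinary JSON, and an added `@context` maps each key to a URI. The keys and the schema.org URIs below are assumptions chosen for illustration, and `flat_triples` is a hypothetical helper that only handles flat documents; a real JSON-LD processor covers many more cases.

```python
import json

# Plain JSON, as an ordinary web service might return it (illustrative keys).
plain = {"name": "iMinds", "location": "Gaston Crommenlaan 8"}

# JSON-LD adds a context mapping keys to URIs, plus an @id and an @type.
# The schema.org URIs are an assumption, not taken from the slides.
json_ld = {
    "@context": {
        "name": "http://schema.org/name",
        "location": "http://schema.org/address",
    },
    "@id": "http://data.kbodata.be/organisation/0866_386_380#id",
    "@type": "http://www.w3.org/ns/regorg#RegisteredOrganization",
}
json_ld.update(plain)

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def flat_triples(doc):
    """Read triples from a flat JSON-LD document (no nesting, no prefixes)."""
    context = doc["@context"]
    subject = doc["@id"]
    triples = [(subject, RDF_TYPE, doc["@type"])]
    for key, value in doc.items():
        if not key.startswith("@"):
            triples.append((subject, context[key], value))
    return triples


print(json.dumps(json_ld, indent=2))
for triple in flat_triples(json_ld):
    print(triple)
```

Existing JSON clients can keep reading `name` and `location` as before, while Linked Data clients can use the context to interpret the same document as triples.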

Triple Pattern Fragments server

?subject → ?predicate → ?object

iMinds → is a → company
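The idea of a triple pattern can be sketched as a small in-memory filter: each position is either a fixed term or a variable starting with `?`. This is only an illustration of the matching rule; the function names are hypothetical, and an actual Triple Pattern Fragments server additionally pages its results and returns metadata and controls alongside the data.

```python
def matches(pattern, triple):
    """A pattern term matches if it is a variable ('?...') or equals the triple's term."""
    return all(p.startswith("?") or p == t for p, t in zip(pattern, triple))


def select_fragment(pattern, triples):
    """Return all triples that match the given triple pattern."""
    return [t for t in triples if matches(pattern, t)]


triples = [
    ("iMinds", "is a", "company"),
    ("iMinds", "same as", "IBBT"),
    ("IBBT", "located at", "Gaston Crommenlaan 8"),
]

# All three positions are variables: every triple matches.
print(select_fragment(("?subject", "?predicate", "?object"), triples))

# Fixed subject: everything known about iMinds.
print(select_fragment(("iMinds", "?predicate", "?object"), triples))

# Fixed predicate and object: everything that is a company.
print(select_fragment(("?subject", "is a", "company"), triples))
```

Because the server only answers these simple patterns, more complex questions are left to the client, which combines several pattern requests.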

Triple Pattern Fragments clients

Questions?