Basics of open data: what you need to know, by Wouter Degadt & Pieter Colpaert
TRANSCRIPT
Data → open and linked
Wouter Degadt & Pieter Colpaert
Data
Wikipedia says:
English (disambiguation): data is uninterpreted information
English (computing): data is any sequence of symbols given meaning by specific acts of interpretation
Dutch: data is the plural of datum, which is an observation of a fact
↓
Querying
syntactic
object
semantic
technical
legal
process
Could the data governance of both datasets be merged?
Are you legally allowed to merge 2 datasets?
Can you connect the communication channels? E.g., merge a dataset published on a CD with a dataset published on a floppy disk.
How easy is it to ask certain questions across the borders of the dataset?
How interoperable are the serialisation formats? E.g., JSON vs. PDF?
What can you request from the server?
Do the words in one dataset mean the same as the words in the other?
How can we find open data?
It’s made available through open data portals
http://data.gov.uk,
http://datahub.io,
http://open-data.europa.eu,
http://data.gent.be,
…
Via links in existing datasets
e.g., http://dbpedia.org/resource/Ghent
Table / CSV / Spreadsheet
name    type     same as  location
iMinds  company  IBBT     Gaston Crommenlaan 8
JSON
{
  "iMinds": {
    "type": "company",
    "same as": "IBBT",
    "location": "Gaston Crommenlaan 8"
  }
}
XML
<iMinds>
  <type>company</type>
  <sameas>IBBT</sameas>
  <location>Gaston Crommenlaan 8</location>
</iMinds>
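The three serialisations above carry the same record. A minimal sketch of that idea, using only the Python standard library: one in-memory dictionary is written out as CSV, JSON and XML. The function names are illustrative and not part of the slides; dropping the space from "same as" for the XML tag follows the XML example above.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The single record from the table above, held as a plain dictionary.
record = {
    "name": "iMinds",
    "type": "company",
    "same as": "IBBT",
    "location": "Gaston Crommenlaan 8",
}


def to_csv(rec):
    """Serialise the record as a header row plus one data row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rec), lineterminator="\n")
    writer.writeheader()
    writer.writerow(rec)
    return buffer.getvalue()


def to_json(rec):
    """Nest every field except the name under the name, as in the JSON example."""
    fields = {key: value for key, value in rec.items() if key != "name"}
    return json.dumps({rec["name"]: fields}, indent=2)


def to_xml(rec):
    """Use the name as root element; XML tag names cannot contain spaces."""
    root = ET.Element(rec["name"])
    for key, value in rec.items():
        if key != "name":
            ET.SubElement(root, key.replace(" ", "")).text = value
    return ET.tostring(root, encoding="unicode")


print(to_csv(record))
print(to_json(record))
print(to_xml(record))
```

The data is identical in all three outputs; only the syntax differs, which is the syntactic side of interoperability.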
Triples
<iMinds> <type> <company> .
<iMinds> <sameas> <IBBT> .
<iMinds> <location> "Gaston Crommenlaan 8" .
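Going from a table row to triples can be sketched as follows: the name becomes the subject, each remaining column name a predicate, and each cell value an object. This is an illustrative helper, not something the slides prescribe.

```python
# The table row from the slides as a dictionary.
record = {
    "name": "iMinds",
    "type": "company",
    "same as": "IBBT",
    "location": "Gaston Crommenlaan 8",
}


def record_to_triples(rec, subject_key="name"):
    """Turn each remaining field into one (subject, predicate, object) triple."""
    subject = rec[subject_key]
    return [
        (subject, predicate, obj)
        for predicate, obj in rec.items()
        if predicate != subject_key
    ]


triples = record_to_triples(record)
for subject, predicate, obj in triples:
    print(subject, "->", predicate, "->", obj)
```

Each triple stands on its own, so triples coming from different sources can be put together in one list without first agreeing on a table layout.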
World Wide Web
iMinds → same as → IBBT
iMinds → is a → company
IBBT → located at → Gaston Crommenlaan 8
Machine 1 Machine 2 Machine 3
Problem
The word company is ambiguous. How can we make
sure that machines understand each other?
semantic interoperability
What about “is a”?
and what about “iMinds”?
Solution
iMinds → http://data.kbodata.be/organisation/0866_386_380#id
is a → http://www.w3.org/1999/02/22-rdf-syntax-ns#type
Company → http://www.w3.org/ns/regorg#RegisteredOrganization
Uniform Resource Identifiers (URIs)
A triple is an atomic piece of data (a datum or a fact) that cannot be misunderstood at machine level in a Web context
iMinds → is a → company
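With the three URIs from the slides filled in, the statement "iMinds is a company" can be written as one line of N-Triples. The sketch below only handles the case where subject, predicate and object are all URIs; literals such as an address would need quoting and escaping that is left out here.

```python
# URIs taken from the slides.
IMINDS = "http://data.kbodata.be/organisation/0866_386_380#id"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
REGISTERED_ORGANIZATION = "http://www.w3.org/ns/regorg#RegisteredOrganization"


def to_ntriples_line(subject, predicate, obj):
    """Format one triple whose three terms are all URIs (no literals)."""
    return f"<{subject}> <{predicate}> <{obj}> ."


line = to_ntriples_line(IMINDS, RDF_TYPE, REGISTERED_ORGANIZATION)
print(line)
```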
Summary
New terms: data quality, data interoperability, triples, open data, linked open data cloud
Linked Open Data means: making your data more interoperable with other datasets on the Web by using URIs as identifiers and triples as atomic building blocks
Data publishing
iMinds → http://data.kbodata.be/organisation/0866_386_380#id
is a → http://www.w3.org/1999/02/22-rdf-syntax-ns#type
Company → http://www.w3.org/ns/regorg#RegisteredOrganization
Linked Data principles
1. Use a URI for every term
2. Dereference these URIs over HTTP
3. Return useful information
4. Add links towards useful sources
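Principle 2 can be illustrated with a small sketch of what a client prepares before looking up a URI. Two assumptions go beyond the slides: the client asks for Turtle through an Accept header, and the server is able to answer in that format. The part after "#" is never sent over HTTP, so the request goes to the document URL. The request is only built here, not sent.

```python
from urllib.parse import urldefrag
from urllib.request import Request


def build_dereference_request(uri, accept="text/turtle"):
    """Build (but do not send) the HTTP request for looking up a URI.

    The Accept value is an assumption; the server decides what it returns.
    """
    # Split off the fragment: HTTP clients request the document without it.
    document_url, fragment = urldefrag(uri)
    request = Request(document_url, headers={"Accept": accept})
    return request, fragment


request, fragment = build_dereference_request(
    "http://data.kbodata.be/organisation/0866_386_380#id"
)
print(request.full_url)
print(request.get_header("Accept"))
print(fragment)
```

Passing the request to urllib.request.urlopen would perform the actual lookup; whether useful information comes back (principle 3) depends on the publisher.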
E.g., I’m launching a new company
{mynewcompany} → http://{mynewcompany}.be/#org
is a → http://www.w3.org/1999/02/22-rdf-syntax-ns#type
Company → http://www.w3.org/ns/regorg#RegisteredOrganization
An identifier for your company, and you are in control of its meaning.
{mynewcompany} → http://{mynewcompany}.be/#org
has a home page → http://xmlns.com/foaf/0.1/homepage
http://{mynewcompany}.be/
Publishing methods
1. Data dumps
2. Triples within HTML pages
3. JSON → JSON-LD web services
4. Triple pattern fragments
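Method 3 can be sketched by taking the new-company example and writing it as a JSON-LD document: ordinary JSON plus a context that maps the key "homepage" to its URI. The domain mynewcompany.be is a hypothetical stand-in for the {mynewcompany} placeholder, and the function name is illustrative.

```python
import json


def company_as_json_ld(org_uri, homepage_url):
    """Describe a company as a JSON-LD document (a plain dictionary)."""
    return {
        "@context": {
            # Map the short key to its URI and mark its value as a link.
            "homepage": {
                "@id": "http://xmlns.com/foaf/0.1/homepage",
                "@type": "@id",
            },
        },
        "@id": org_uri,
        "@type": "http://www.w3.org/ns/regorg#RegisteredOrganization",
        "homepage": homepage_url,
    }


# Hypothetical company domain, standing in for {mynewcompany}.
document = company_as_json_ld(
    "http://mynewcompany.be/#org", "http://mynewcompany.be/"
)
print(json.dumps(document, indent=2))
```

A client that ignores the context still sees usable JSON, while a client that understands JSON-LD can read the same document as triples.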
Data dumps
http://wiki.dbpedia.org/Downloads2014 → all facts in 1 file
Triple Pattern Fragments server
?subject → ?predicate → ?object
iMinds → is a → company
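The question a client asks such a server is a triple pattern: any of subject, predicate and object may be left open. A minimal in-memory sketch of that matching, using the three example statements from earlier slides; a real server answers over HTTP and pages its results, which is left out here.

```python
# Example statements from the slides.
TRIPLES = [
    ("iMinds", "same as", "IBBT"),
    ("iMinds", "is a", "company"),
    ("IBBT", "located at", "Gaston Crommenlaan 8"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj)
    return [
        triple
        for triple in triples
        if all(p is None or p == value for p, value in zip(pattern, triple))
    ]


# ?subject -> "is a" -> ?object
print(match(TRIPLES, predicate="is a"))
# iMinds -> ?predicate -> ?object
print(match(TRIPLES, subject="iMinds"))
```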