Bluffers Guide to The Semantic Web. Frank van Harmelen, CS Department, Vrije Universiteit Amsterdam. Data wants to be free

Posted on 20-Dec-2015


Page 1

Bluffers Guide to The Semantic Web

Frank van Harmelen, CS Department

Vrije Universiteit Amsterdam

Data wants to be free

Page 2

Semantics as your saviour?

Page 3

Page 4

Outline
• The general idea: a Web of Data
• What must be done to realise this
• How far away is this
• Next steps, do’s, don’ts

Page 5

The Scientist’s Problem

Too much unintegrated data:
• from a variety of incompatible sources
• no standard naming convention
• each with a custom browsing and querying mechanism (no common interface)
• and poor interaction with other data sources

Everybody’s

Page 6

What are the Data Sources?

• Flat files
• URLs
• Proprietary databases
• Public databases
• Spreadsheets
• Emails
• …

Maps

Page 7

In which disciplines?
• Archeology
• Chemistry
• Genomics, proteomics, ... (bio/life-sciences)
• Communication science
• Social history
• Linguistics
• Biodiversity
• Environmental sciences (climate studies)
• ...
• libraries (KB), archives (Sound & Vision)

One dataset per site; a new database each month

historical data; laymen data

international data (for the first time)

Geo?

Page 8

Outline
• The general idea: a Web of Data
• What must be done to realise this
• How far away is this
• Next steps, do’s, don’ts

Page 9

The Current Web of text and pictures

linked web pages, written by people, written for people, used only by people...

Many of these pages already come from data that is usable by computers! But we can’t link the data...

The Future Web of Data

linked data, usable by computers! useful for people!

Page 10

Which Semantic Web? Version 1:

“Enrichment of the current Web”

recipe: annotate and classify web content

enables better search & browse, ...

Page 11

Which Semantic Web? Version 2:

“Semantic Web as Web of Data” (TBL, Tim Berners-Lee)

recipe: expose databases on the web, use RDF, integrate

meta-data from: expressing DB schema semantics in machine-interpretable ways

enables integration and unexpected re-use
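A minimal sketch of the Version 2 recipe ("expose databases on the web, use RDF"), assuming a toy SQLite table and a hypothetical base URI `http://example.org/`: each row becomes a resource named by its key, each other column becomes a property, and the table name becomes a class. The URI scheme and the helper `table_to_triples` are illustrative, not a standard mapping.

```python
import sqlite3

# Hypothetical base URI for the published data; not from the original slides.
BASE = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def table_to_triples(conn, table, key):
    """Map each row of `table` to (subject, predicate, object) triples.

    The key column names the subject, every other column becomes a
    property, and each row is typed with a class named after the table.
    `table` is interpolated into SQL, so it must come from trusted code.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}"
        triples.append((subject, RDF_TYPE, f"{BASE}schema/{table}"))
        for col, value in record.items():
            if col != key:
                triples.append((subject, f"{BASE}schema/{col}", value))
    return triples


# A toy "legacy" database, held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (id TEXT PRIMARY KEY, name TEXT, country TEXT)")
conn.execute("INSERT INTO city VALUES ('ams', 'Amsterdam', 'NL')")

for triple in table_to_triples(conn, "city", "id"):
    print(triple)
```

Once rows are triples with URIs, data from other databases can be merged with them without agreeing on a shared table layout first.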

Page 12

Outline
• The general idea: a Web of Data
• What must be done to realise this
• How far away is this
• Next steps, do’s, don’ts

Page 13

Machine-accessible meaning (What it’s like to be a machine)

[Figure: content marked up with the tags <name>, <symptoms>, <drug>, <drug administration>, <disease> and <treatment>, connected by IS-A and "alleviates" links; labelled META-DATA]

Page 14

What is meta-data?
• it’s just data
• it’s data describing other data
• it’s meant for machine consumption

[Figure: labels disease, name, symptoms, drug, administration]

Page 15

Required are:

1. a standard syntax, so meta-data can be recognised as such

2. one or more shared vocabularies, so data producers and data consumers all speak the same language

3. lots of resources with meta-data attached

mechanisms for attribution and trust

Page 16

1. A standard syntax

things & relations between things

Semantic Web data model: RDF
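RDF reduces "things & relations between things" to subject-predicate-object triples. A minimal sketch, assuming a graph is just a set of triples and using hypothetical `ex:` identifiers written as prefixed strings; `match` is an illustrative helper, not an RDF library API.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str    # the thing being described
    predicate: str  # the relation between the two things
    obj: str        # the other thing, or a literal value


def match(graph, s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None):
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)]


# Hypothetical identifiers, for illustration only.
graph = {
    Triple("ex:Amsterdam", "ex:capitalOf", "ex:Netherlands"),
    Triple("ex:Amsterdam", "rdf:type", "ex:City"),
    Triple("ex:Amstel", "ex:flowsThrough", "ex:Amsterdam"),
}

print(match(graph, s="ex:Amsterdam"))  # everything said about Amsterdam
print(match(graph, o="ex:Amsterdam"))  # everything that points at Amsterdam
```

Pattern matching over triples with wildcards is, in spirit, what RDF query languages generalise.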

Page 17

RDF Triples in Life Sciences

Page 18

RDF Triples in Geo

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <!-- geo: assumed to be the W3C Basic Geo (WGS84) vocabulary -->
  <geo:Point>
    <geo:lat>55.701</geo:lat>
    <geo:long>12.552</geo:long>
  </geo:Point>
</rdf:RDF>

[Figure: the same data as a graph: a blank node of type geo:Point with a geo:lat arc to 55.701 and a geo:long arc to 12.552]

Remember: RDF = simple model for data
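The RDF/XML and the graph drawing say the same thing. A minimal sketch using Python's standard `xml.etree.ElementTree` to turn that snippet into triples, assuming `geo:` is the W3C Basic Geo (WGS84) namespace (the slide does not declare it). It handles only this simple pattern (typed node elements without `rdf:about`, with literal-valued children) and is not a general RDF/XML parser; the blank-node label `_:b0` is arbitrary.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: geo: is the W3C Basic Geo (WGS84) vocabulary.
GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#"

DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:geo="{GEO_NS}">
  <geo:Point>
    <geo:lat>55.701</geo:lat>
    <geo:long>12.552</geo:long>
  </geo:Point>
</rdf:RDF>"""


def expand(tag):
    """Turn ElementTree's '{namespace}local' tag form into a full URI."""
    ns, local = tag[1:].split("}")
    return ns + local


def typed_nodes_to_triples(xml_text):
    """Each child of rdf:RDF is a typed node with no rdf:about, so it becomes
    a blank node; its children become literal-valued properties."""
    root = ET.fromstring(xml_text)
    triples = []
    for i, node in enumerate(root):
        bnode = f"_:b{i}"  # arbitrary blank-node label
        triples.append((bnode, RDF_NS + "type", expand(node.tag)))
        for prop in node:
            triples.append((bnode, expand(prop.tag), prop.text.strip()))
    return triples


for t in typed_nodes_to_triples(DOC):
    print(t)
```

The three printed triples are exactly the three arcs in the figure: a type, a latitude and a longitude hanging off one anonymous point.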

Page 19

RDF Schema: vocabulary for data types
• Classes + subclass hierarchy: rivers are waterways
• Properties + subproperty hierarchy: father-of implies parent-of
• Domain of properties: X capital-of Y ⇒ X has-type city
• Range of properties: X capital-of Y ⇒ Y has-type country

Simple standardised inferences
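The four RDF Schema inferences above can be applied mechanically. A minimal, deliberately naive forward-chaining sketch (quadratic per pass), assuming triples are plain tuples and using hypothetical `ex:` names; real RDFS reasoners implement more rules and far more efficient strategies.

```python
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"
TYPE = "rdf:type"


def rdfs_closure(triples):
    """Apply four rules until nothing new is derived:
    subclass:    x type C,  C subClassOf D    => x type D
    subproperty: x p y,     p subPropertyOf q => x q y
    domain:      x p y,     p domain C        => x type C
    range:       x p y,     p range C         => y type C
    """
    facts = set(triples)
    while True:
        new = set()
        for (s, p, o) in facts:
            for (s2, p2, o2) in facts:
                if p == TYPE and p2 == SUBCLASS and s2 == o:
                    new.add((s, TYPE, o2))
                if p2 == SUBPROP and s2 == p:
                    new.add((s, o2, o))
                if p2 == DOMAIN and s2 == p:
                    new.add((s, TYPE, o2))
                if p2 == RANGE and s2 == p:
                    new.add((o, TYPE, o2))
        if new <= facts:
            return facts
        facts |= new


# Hypothetical schema and data mirroring the slide's examples.
data = {
    ("ex:River", SUBCLASS, "ex:Waterway"),     # rivers are waterways
    ("ex:fatherOf", SUBPROP, "ex:parentOf"),   # father-of implies parent-of
    ("ex:capitalOf", DOMAIN, "ex:City"),
    ("ex:capitalOf", RANGE, "ex:Country"),
    ("ex:Rhine", TYPE, "ex:River"),
    ("ex:Amsterdam", "ex:capitalOf", "ex:Netherlands"),
    ("ex:John", "ex:fatherOf", "ex:Mary"),
}

for fact in sorted(rdfs_closure(data) - data):
    print(fact)
```

Because the rules are fixed and standardised, any consumer of the data derives the same extra facts from the same schema.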

Page 20

OWL: richer vocabulary for data types

Things RDF Schema cannot express (Description Logic SHOIN(D)):
• equality, disjunction, negation
• min/max number restrictions
• inverse, symmetric, transitive properties
• and much more…

Example: every country has precisely one capital.
• Inference: TheHague ≠ A’dam & A’dam = capital ⇒ TheHague ≠ capital
• Integrity checks after data merging

Complex standardised inferences
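The capital example can be imitated by hand to show what such an inference and an integrity check do. This is a sketch, not an OWL reasoner: it hard-codes the single rule "a country has exactly one capital" over a hypothetical `ex:hasCapital` property and reads `owl:differentFrom` statements as explicit "≠" facts. OWL makes no unique-name assumption, so two capital names only conflict once they are declared different.

```python
from collections import defaultdict

HAS_CAPITAL = "ex:hasCapital"      # hypothetical property
DIFFERENT = "owl:differentFrom"


def check_single_capital(triples):
    """Integrity check: countries with two capitals declared to be different."""
    capitals = defaultdict(set)
    different = set()
    for s, p, o in triples:
        if p == HAS_CAPITAL:
            capitals[s].add(o)
        elif p == DIFFERENT:
            different.add(frozenset((s, o)))  # differentFrom is symmetric
    return [(country, a, b)
            for country, caps in capitals.items()
            for a in caps for b in caps
            if a < b and frozenset((a, b)) in different]


def known_non_capitals(triples, country):
    """Inference: anything declared different from a known capital is not it."""
    caps = {o for s, p, o in triples if s == country and p == HAS_CAPITAL}
    result = set()
    for s, p, o in triples:
        if p == DIFFERENT:
            if s in caps:
                result.add(o)
            if o in caps:
                result.add(s)
    return result


source_a = {("ex:Netherlands", HAS_CAPITAL, "ex:Amsterdam")}
source_b = {("ex:Netherlands", HAS_CAPITAL, "ex:TheHague"),
            ("ex:TheHague", DIFFERENT, "ex:Amsterdam")}

# TheHague is different from A'dam and A'dam is the capital, so TheHague is not.
print(known_non_capitals(source_a | {("ex:TheHague", DIFFERENT, "ex:Amsterdam")},
                         "ex:Netherlands"))
# Merging both sources violates "exactly one capital".
print(check_single_capital(source_a | source_b))
```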

Page 21

Web of Data: anybody can say anything about anything
• All identifiers are URLs (= on the Web)
• Allows total decoupling of
  • data
  • vocabulary
  • meta-data

[Figure: a resource x (e.g. <prince>) and a type T, linked by the statement [<x> IsOfType <T>]; x, T and the statement can have different owners & locations]
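Because every identifier is a URL, graphs published by different parties merge by simple set union: the same URL means the same thing wherever it appears. A minimal sketch with hypothetical hosts (`data.example.org`, `vocab.example.net`, `notes.example.com`) playing the roles of the different owners and locations; only the `rdf:type` and `rdfs:label` URIs are real W3C terms.

```python
from urllib.parse import urlparse

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical owners: the data and the vocabulary live on different hosts.
DATA = "http://data.example.org/"
VOCAB = "http://vocab.example.net/"

x = DATA + "prince"   # the thing being described
T = VOCAB + "Person"  # a type defined by somebody else

# Three graphs, each published at a different location by a different party.
published = {
    "http://data.example.org/graph": {(x, DATA + "name", "Prince")},
    "http://vocab.example.net/graph": {(T, RDFS_LABEL, "Person")},
    # A third party states [<x> IsOfType <T>] without owning x or T.
    "http://notes.example.com/graph": {(x, RDF_TYPE, T)},
}

# Merging is plain set union: identical URLs denote identical things.
merged = set().union(*published.values())

about_x = {(p, o) for s, p, o in merged if s == x}
print(sorted(about_x))
print(urlparse(x).netloc, urlparse(T).netloc)
```

Nobody had to coordinate a schema up front; the shared URLs are the only agreement needed for the merge.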

Page 22

2. Shared vocabularies
• MeSH: Medical Subject Headings, National Library of Medicine; 22.000 descriptors
• EMTREE: commercial (Elsevier), drugs and diseases; 45.000 terms, 190.000 synonyms
• UMLS: integrates 100 different vocabularies
• SNOMED: 200.000 concepts, College of American Pathologists
• Gene Ontology: 15.000 terms in molecular biology
• NCBI Cancer Ontology: 17.000 classes (about 1M definitions)

BioMed

Geo?

Page 23

Outline
• The general idea: a Web of Data
• What must be done to realise this
• How far away is this
• Next steps, do’s, don’ts

Page 24

How far away is this?
• Stable data formats & standardised inferences
• Lots of shared vocabularies (+ ways to convert them)
• Lots of data sources (+ ways to convert them)
• Lots of tools: convert, construct, edit (data, vocabularies); store, search, query, reason; interlink; visualise ...

Page 25

How far away is this? Not very far away!

Already many billions of facts & rules in the rapidly growing Linked Open Data cloud:
• Encyclopedia
• Geographic names (millions)
• Names of artists & art works (10.000’s)
• Scientific bibliographies
• Hierarchical dictionaries (UK, FR, NL)
• Life-science databases
• Any CD ever recorded (almost)
• Every book sold by Amazon
• Basic facts on every country on the planet
• Common sense rules & facts (100.000’s)

It gets bigger every month

Page 26

Example use-case: bbc.co.uk/music/artists

• Content is BBC + LOD
• Use an ontology as basis for the site
• Serve data back out as RDF

“The Web is becoming our content management platform”
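One simple way to "serve data back out as RDF" is the line-based N-Triples format. A minimal serializer sketch; the artist URI is hypothetical (not the site's actual scheme), and the FOAF and Music Ontology terms are used only for illustration. It treats any string starting with `http://` or `https://` as an IRI and escapes only backslash, double quote, newline and carriage return in literals.

```python
def term(value):
    """Render a term: IRIs in angle brackets, everything else as a plain literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples):
    """One 'subject predicate object .' line per triple, sorted for stable output."""
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n"
                   for s, p, o in sorted(triples))


artist = "http://example.org/music/artists/1234"  # hypothetical identifier
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
MO_ARTIST = "http://purl.org/ontology/mo/MusicArtist"

print(to_ntriples({(artist, FOAF_NAME, "The Beatles"),
                   (artist, RDF_TYPE, MO_ARTIST)}), end="")
```

A production site would more likely use an RDF library and content negotiation; the point here is only that the page's data can be emitted as plain triples alongside the HTML.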

Page 27

Outline
• The general idea: a Web of Data
• What must be done to realise this
• How far away is this
• Next steps, do’s, don’ts

Page 28

Next steps
1. hunt for shared vocabularies (try to avoid building them)
2. wrap legacy data sources: your own, from others
3. link wrapped sources
4. publish linked data on the web (make noise)
5. reconstruct some old results
6. produce new results
7. get famous
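Step 3, linking wrapped sources, often comes down to finding resources in two datasets that denote the same thing and recording that with `owl:sameAs`. A deliberately naive sketch, assuming both wrapped sources expose an `rdfs:label` and that an exact, case-insensitive label match is good enough; the source URIs are hypothetical, and real linking usually needs fuzzier matching plus manual checks.

```python
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def labels(triples):
    """Map normalised label -> subject URI (assumes one subject per label)."""
    return {o.strip().lower(): s for s, p, o in triples if p == LABEL}


def link(source_a, source_b):
    """Emit owl:sameAs triples for subjects whose normalised labels match."""
    la, lb = labels(source_a), labels(source_b)
    return {(la[name], SAME_AS, lb[name]) for name in la.keys() & lb.keys()}


# Two hypothetical wrapped legacy sources.
museum = {("http://museum.example.org/artist/17", LABEL, "Rembrandt van Rijn")}
archive = {("http://archive.example.net/person/rvr", LABEL, "rembrandt van rijn "),
           ("http://archive.example.net/person/jv", LABEL, "Johannes Vermeer")}

for t in sorted(link(museum, archive)):
    print(t)
```

Publishing these link triples alongside the wrapped data (step 4) is what lets others hop from one source to the other.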

Can you get famous by sharing data?
• papers in oncology, in communication science
• dedicated conferences in chemistry, earth sciences, life sciences, humanities
• funding opportunities in humanities, social sciences, life sciences
• learn / get access to some basic technology
• in-use systems in communication science, KB, Beeld & Geluid, Europeana

A little semantics goes a long way

Page 29

Questions & discussion

[email protected]
http://www.cs.vu.nl/~frankh/popularising.html