Post on 20-Dec-2015
Bluffers Guide to The Semantic Web
Frank van Harmelen
CS Department
Vrije Universiteit Amsterdam
Data wants to be free
Semantics as your saviour?
Outline
• The general idea: a Web of Data
• What must be done to realise this
• How far away is this
• Next steps, do’s, don’ts
The Scientist’s Problem
Too much unintegrated data:
• from a variety of incompatible sources
• no standard naming convention
• each with a custom browsing and querying mechanism (no common interface)
• and poor interaction with other data sources
Everybody’s
What are the Data Sources?
• Flat Files
• URLs
• Proprietary Databases
• Public Databases
• Spreadsheets
• Emails
• …
Data wants to be free
Maps
In which disciplines?
• Archeology
• Chemistry
• Genomics, proteomics, ... (bio/life-sciences)
• Communication science
• Social history
• Linguistics
• Bio-diversity
• Environmental sciences (climate studies)
• ....
• libraries (KB), archives (sound & vision)
One dataset per site
a new database each month
historical data
laymen data
international data (for the first time)
Geo?
Outline
• The general idea: a Web of Data
• What must be done to realise this
• How far away is this
• Next steps, do’s, don’ts
The Current Web of text and pictures
linked web-pages, written by people, written for people, used only by people...
Many of these pages already come from data that is usable by computers!
But we can’t link the data....
The Future Web of Data
linked data, usable by computers! useful for people!
Data wants to be free
Which Semantic Web?
Version 1: “Enrichment of the current Web”
recipe:
• Annotate and classify web-content
• enable better search & browse, ...
Which Semantic Web?
Version 2: "Semantic Web as Web of Data" (TBL)
recipe:
• expose databases on the web, use RDF, integrate
• meta-data from: expressing DB schema semantics in machine interpretable ways
• enable integration and unexpected re-use
Outline
• The general idea: a Web of Data
• What must be done to realise this
• How far away is this
• Next steps, do’s, don’ts
machine accessible meaning (What it’s like to be a machine)
[Figure: a record with the tags <name>, <symptoms>, <drug>, <drugadministration>, <disease>, <treatment>, annotated with the relations IS-A and alleviates, labelled META-DATA]
What is meta-data?
• it's just data
• it's data describing other data
• it's meant for machine consumption
[Figure labels: disease, name, symptoms, drug, administration]
Required are:
1. a standard syntax
   so meta-data can be recognised as such
2. one or more shared vocabularies
   so data producers and data consumers all speak the same language
3. lots of resources with meta-data attached
   mechanisms for attribution and trust
1. A standard syntax
things & relations between things
Semantic Web data model: RDF
RDF Triples in Life Sciences
RDF Triples in Geo
<rdf:RDF>
  <geo:Point>
    <geo:lat>55.701</geo:lat>
    <geo:long>12.552</geo:long>
  </geo:Point>
</rdf:RDF>
[Figure: a geo:Point node with an edge geo:lat to 55.701 and an edge geo:long to 12.552]
Remember: RDF = simple model for data
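As a rough illustration of the triple model behind the geo example, the sketch below stores the same point as (subject, predicate, object) tuples in plain Python. The blank-node label `_:p1` and the helper `match` are assumptions made for illustration; they are not part of any RDF library.

```python
# A minimal sketch: RDF-style triples as plain Python tuples.
# "_:p1" is a made-up blank-node label; match() is a hypothetical helper.
triples = {
    ("_:p1", "rdf:type", "geo:Point"),
    ("_:p1", "geo:lat", "55.701"),
    ("_:p1", "geo:long", "12.552"),
}


def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything that is known about the point:
for s, p, o in match(triples, s="_:p1"):
    print(s, p, o)
```

Because every fact has the same three-part shape, adding a new kind of statement about the point needs no schema change: it is just one more tuple in the set.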
RDF Schema: vocabulary for data types
• Classes + subclass hierarchy
  rivers are waterways
• Properties + subproperty hierarchy
  father-of implies parent-of
• Domain of properties
  X capital-of Y implies X has-type city
• Range of properties
  X capital-of Y implies Y has-type country
Simple standardised inferences
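The four kinds of inference listed above can be sketched as a small forward-chaining loop. This is an illustrative toy under stated assumptions, not a standards-compliant RDFS reasoner: the short predicate names, the individuals (`Rhine`, `Jan`, `Piet`) and the function `rdfs_closure` are all invented for the example.

```python
# A minimal sketch of RDFS-style forward chaining over plain tuples.
# All names below are illustrative, not taken from a real vocabulary.
SCHEMA = {
    ("River", "subClassOf", "Waterway"),
    ("father-of", "subPropertyOf", "parent-of"),
    ("capital-of", "domain", "City"),
    ("capital-of", "range", "Country"),
}
DATA = {
    ("Rhine", "type", "River"),
    ("Jan", "father-of", "Piet"),
    ("Amsterdam", "capital-of", "Netherlands"),
}


def rdfs_closure(schema, data):
    """Apply the four rules repeatedly until no new triples appear."""
    facts = set(data)
    while True:
        new = set()
        for s, p, o in facts:
            for a, rel, b in schema:
                if rel == "subClassOf" and p == "type" and o == a:
                    new.add((s, "type", b))      # rivers are waterways
                elif rel == "subPropertyOf" and p == a:
                    new.add((s, b, o))           # father-of implies parent-of
                elif rel == "domain" and p == a:
                    new.add((s, "type", b))      # subject gets the domain type
                elif rel == "range" and p == a:
                    new.add((o, "type", b))      # object gets the range type
        if new <= facts:
            return facts
        facts |= new


inferred = rdfs_closure(SCHEMA, DATA) - DATA
for triple in sorted(inferred):
    print(triple)
```

Note that domain and range statements here add type information rather than reject data, which matches how the slide phrases them as implications.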
OWL: richer vocabulary for data types
Things RDF Schema cannot express: Description Logic SHOIN(D)
• equality, disjunction, negation,
• min/max number restrictions
• inverse, symmetric, transitive properties
• and much more…
Example: Every country has precisely one capital:
• Inference: TheHague ≠ A’dam & A’dam = capital implies TheHague ≠ capital
• Integrity checks after data-merging
Complex standardised inferences
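The integrity check in the capital example can be sketched in a few lines of Python. This only imitates the "at most one capital" half of the constraint on hand-written data; the two sources, the `DIFFERENT` set and the function `capital_conflicts` are assumptions for illustration, and a real OWL reasoner does far more.

```python
# A minimal sketch of an integrity check for "every country has
# precisely one capital" after merging two sources.
# Source contents and the DIFFERENT set are invented for illustration.
source_a = {("Netherlands", "has-capital", "Amsterdam")}
source_b = {("Netherlands", "has-capital", "TheHague")}
DIFFERENT = {frozenset({"Amsterdam", "TheHague"})}  # TheHague ≠ A'dam


def capital_conflicts(triples, different):
    """Return (country, capital1, capital2) for capitals known to differ."""
    capitals = {}
    for s, p, o in triples:
        if p == "has-capital":
            capitals.setdefault(s, set()).add(o)
    conflicts = []
    for country, caps in sorted(capitals.items()):
        ordered = sorted(caps)
        for i, c1 in enumerate(ordered):
            for c2 in ordered[i + 1:]:
                if frozenset({c1, c2}) in different:
                    conflicts.append((country, c1, c2))
    return conflicts


merged = source_a | source_b
print(capital_conflicts(merged, DIFFERENT))
```

The explicit `DIFFERENT` set matters: OWL makes no unique-name assumption, so without a statement that the two cities differ, a reasoner would conclude that the two names denote the same thing rather than report an error.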
Web of Data: anybody can say anything about anything
• All identifiers are URLs (= on the Web)
• Allows total decoupling of
  • data
  • vocabulary
  • meta-data
[Figure: x, T and the statement [<x> IsOfType <T>], with different owners & locations]
<prince>
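A small sketch of what this decoupling could look like in practice: three parties each publish their own triples, and a consumer merges them simply because the identifiers are shared URLs. Every URL below is hypothetical, as is the helper `describe`.

```python
# A minimal sketch: three "owners" publish separately, and a consumer
# merges their triples because all identifiers are shared URLs.
# Every example.org / example.com URL below is hypothetical.
X = "http://example.org/data/x"
T = "http://example.com/vocab/T"
IS_OF_TYPE = "http://example.com/vocab/IsOfType"
LABEL = "http://example.com/vocab/label"

data_owner = {(X, LABEL, "some resource")}
vocabulary_owner = {(T, LABEL, "some type")}
metadata_owner = {(X, IS_OF_TYPE, T)}

# Merging is just set union: no coordination between owners is needed.
web_of_data = data_owner | vocabulary_owner | metadata_owner


def describe(graph, subject):
    """Collect predicate -> list of objects for one subject."""
    out = {}
    for s, p, o in graph:
        if s == subject:
            out.setdefault(p, []).append(o)
    return out


print(describe(web_of_data, X))
```

The statement that x is of type T lives with a third party, yet after the union it sits next to the data owner's own facts about x.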
Data wants to be free
2. Shared vocabularies
• Mesh: Medical Subject Headings, National Library of Medicine, 22.000 descriptions
• EMTREE: Commercial Elsevier, Drugs and diseases, 45.000 terms, 190.000 synonyms
• UMLS: Integrates 100 different vocabularies
• SNOMED: 200.000 concepts, College of American Pathologists
• Gene Ontology: 15.000 terms in molecular biology
• NCBI Cancer Ontology: 17,000 classes (about 1M definitions)
BioMed
Geo?
Outline
• The general idea: a Web of Data
• What must be done to realise this
• How far away is this
• Next steps, do’s, don’ts
How far away is this?
• Stable data formats & standardised inferences
• Lots of shared vocabularies (+ ways to convert them)
• Lots of data sources (+ ways to convert them)
• Lots of tools: convert, construct, edit (data, vocabularies); store, search, query, reason; interlink; visualise; ...
already many billions of facts & rules
How far away is this? Not very far away!
rapidly growing Linked Open Data cloud:
• Encyclopedia
• Geographic names (millions)
• names of artists & art works (10.000’s)
• scientific bibliographies
• hierarchical dictionaries (UK, FR, NL)
• life-science databases
• any CD ever recorded (almost)
• every book sold by Amazon
• basic facts on every country on the planet
• common sense rules & facts (100.000’s)
It gets bigger every month
Example use-case: bbc.co.uk/music/artists
• Content is BBC + LOD
• Use an ontology as basis for the site
• Serve data back out as RDF
“The Web is becoming our content management platform”
Outline
• The general idea: a Web of Data
• What must be done to realise this
• How far away is this
• Next steps, do’s, don’ts
Next steps
1. hunt for shared vocabularies
   try to avoid building them
2. wrap legacy data sources
   your own
   from others
3. link wrapped sources
4. publish linked data on the web
   make noise
5. reconstruct some old results
6. produce new results
7. get famous
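Steps 2 to 4 above can be sketched roughly as follows: wrap a legacy table as triples, link one row to an identifier owned by someone else, and serialise the result in an N-Triples-like form. The CSV content, the example.org and example.net URLs and the `SAME_AS` link table are invented for illustration; only the `owl:sameAs` URL is a real identifier. Literal escaping is deliberately left out.

```python
import csv
import io

# A minimal sketch of steps 2-4: wrap a legacy table as triples, link it
# to another source, and serialise the result as N-Triples-style lines.
# The CSV content, the example.* URLs and the link table are invented.
LEGACY_CSV = "id,name,country\n1,Amsterdam,NL\n2,Utrecht,NL\n"
BASE = "http://example.org/city/"
VOCAB = "http://example.org/vocab/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
# Hypothetical links to identifiers owned by someone else.
SAME_AS = {"Amsterdam": "http://example.net/places/amsterdam"}


def wrap(csv_text):
    """Turn each CSV row into (subject, predicate, object) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = BASE + row["id"]
        triples.append((subject, VOCAB + "name", row["name"]))
        triples.append((subject, VOCAB + "country", row["country"]))
        if row["name"] in SAME_AS:
            triples.append((subject, OWL_SAME_AS, SAME_AS[row["name"]]))
    return triples


def to_ntriples(triples):
    """Serialise: URLs in <...>, everything else as a quoted literal."""
    def term(value):
        return f"<{value}>" if value.startswith("http://") else f'"{value}"'
    return "\n".join(f"<{s}> <{p}> {term(o)} ." for s, p, o in triples)


print(to_ntriples(wrap(LEGACY_CSV)))
```

The legacy table itself is left untouched; the wrapper only adds a view on top of it, which is why wrapping tends to be cheaper than migrating.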
Can you get famous by sharing data?
• papers in oncology, in communication science
• dedicated conferences in chemistry, earth-sciences, life-sciences, humanities
• funding opportunities in humanities, social sciences, life sciences
• learn / get access to some basic technology
• in-use systems in communication science, KB, Beeld & Geluid, Europeana
A little semantics goes a long way