
Upload: darius-rumble

Post on 31-Mar-2015


Page 1: Introduction to Semantic Web What? Why? How? So far? Next? Frank van Harmelen AI Department Vrije Universiteit Amsterdam Creative Commons License: allowed

Introduction to Semantic Web

What? Why? How? So far? Next?

Frank van Harmelen
AI Department
Vrije Universiteit Amsterdam

Creative Commons License: allowed to share & remix, but must attribute & non-commercial

Page 2:

Who am I

• Frank van Harmelen
• Prof in AI at Vrije Universiteit Amsterdam
• Knowledge Representation
• Early Semantic Web Projects (> 1999)
• Co-designed OWL
• Tech advisor of Aduna (Sesame)
• Scientific Director of LarKC (Large Knowledge Collider)
• I know nothing about image analysis…

Page 3:

Who are you?

• who knows roughly what Semantic Web is?
• who has heard of RDF & OWL?
• who has studied RDF & OWL?
• who has used RDF & OWL?
• who expects ever to use RDF & OWL?

• who is a logician
• who is a KR researcher
• who is a Web researcher
• who is an image researcher

Page 4:

General idea of the Semantic Web

Page 5:

General idea of Semantic Web

Make current web more machine accessible
(currently all the intelligence is in the user)

Motivating use-cases
• search
• personalisation
• semantic linking
• data integration
• web services
• ...

Page 6:

General idea of Semantic Web

Make current web more machine accessible
(currently all the intelligence is in the user)

Do this by:
1. Making data and meta-data available on the Web in machine-understandable form (formalised)
2. Structuring the data and meta-data in ontologies

These are non-trivial design decisions. Alternative would be:

Page 7:

What’s wrong with the Web?

linked web-pages, written by people, written for people, used only by people...

Many of these pages already come from data, usable by computers!
But we can’t link the data....

linked data, usable by computers! useful for people!

Page 8:

"Web of Data" (TBL, Tim Berners-Lee)

1. Expose data on the web (“facts”) in interoperable form (RDF)
2. Expose knowledge on the web with interoperable semantics (ontologies, RDF Schema, OWL)
3. Apply lightweight inference for
   • Interoperability
   • Query answering
   • Search
   • Unexpected reuse
   • …

Semantic Web

Page 9:

Not just data, also knowledge

All of this:
• Low expressivity logic (RDF)
• That allows some inference: property inheritance, domain/range inference

Some of this:
• Medium expressive logic (OWL)
• That allows more inference: (in)equality, number restrictions, datatypes

Page 10:

Desideratum: On the Web of Data, anyone can say anything about anything

• Need for total decoupling of
   • data
   • vocabulary
   • meta-data

[Figure: x, T and the statement [<x> IsOfType <T>] have different owners & locations; label: <village>]

Page 11:

Two versions of Semantic Web story:

V1: Semantic Web = annotated Web; 1 & 2 are embedded in text & images on the Web

V2: Semantic Web = Web of Data; 1 & 2 live in dedicated repositories (triple stores)

[Figure: x, T and the statement [<x> IsOfType <T>] have different owners & locations; label: <village>]

Page 12:

Why is this hard?

Page 13:

machine accessible meaning (What it’s like to be a machine)

[Figure: tags <name>, <symptoms>, <drug>, <drugadministration>, <disease>, <treatment>; relations IS-A, alleviates; label: META-DATA]

Page 14:

What is meta-data?

• it's just data
• it's data describing other data
• it's meant for machine consumption

[Figure: labels disease, name, symptoms, drug, administration]

Page 15:

What is required?

Page 16:

Required are:
1. one or more standard vocabularies, so search engines, producers and consumers all speak the same language
2. a standard syntax, so meta-data can be recognised as such
3. lots of resources with meta-data attached

Page 17:

Bluffer’s Guide to RDF & RDF Schema

Page 18:

Bluffer’s Guide to RDF

• Express relations between things: Subject, Predicate, Object
• Results in labelled network (“graph”)
• All labels are actually web-addresses (URIs)
• You can “ping” any label and find out more
• Bits of the graph can live at physically different locations & have different owners

[Figure: graph with nodes Frank, x, y, MIT and edges AuthorOf, AuthorOf, publishedBy]
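The points above can be sketched in a few lines of Python, with no RDF library involved. This is only an illustration: the `http://example.org/` namespace, the helper name `publishers_of_works_by`, and the choice of which work is published by MIT are assumptions, not taken from the slide.

```python
# Hypothetical namespace, used only so that every label is a URI.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple.
# Two "sites" with different owners each hold part of the graph.
frank_site = {
    (EX + "Frank", EX + "AuthorOf", EX + "x"),
    (EX + "Frank", EX + "AuthorOf", EX + "y"),
}
publisher_site = {
    # Assumption: x is the work published by MIT.
    (EX + "x", EX + "publishedBy", EX + "MIT"),
}

# Because labels are global URIs, merging the parts is plain set union.
graph = frank_site | publisher_site


def publishers_of_works_by(graph, author):
    """Follow AuthorOf edges, then publishedBy edges, from `author`."""
    works = {o for s, p, o in graph if s == author and p == EX + "AuthorOf"}
    return {o for s, p, o in graph if s in works and p == EX + "publishedBy"}


print(publishers_of_works_by(graph, EX + "Frank"))
```

Asking the same question of `frank_site` alone yields an empty set: the answer only exists once the two independently owned parts of the graph are combined.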

Page 19:

Bluffer’s Guide to RDF Schema

• Types for subjects & objects & predicates
• Types organised in a hierarchy
• Inheritance of properties

[Figure: the graph of Frank, x, y, MIT (edges AuthorOf, AuthorOf, publishedBy) annotated with the types author, book, publisher, person, artifact, man]

Page 20:

So what’s special about RDF(S)?

• statements about an identifier can be distributed

  <owl:Individual ID="CENTRAL-COAST" />

  <owl:Individual rdf:about="CENTRAL-COAST">
    <type rdf:resource="#CALIFORNIA-REGION"/>
  </owl:Individual>

• no unique name assumption
• no closed world assumption

Remember web-style decoupling
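A minimal sketch of what "no closed world assumption" means for answering questions, in plain Python. The function names and the `WINE-REGION` identifier are hypothetical and only serve the illustration.

```python
facts = {("CENTRAL-COAST", "type", "CALIFORNIA-REGION")}


def closed_world_ask(facts, triple):
    """Database-style answer: anything not stated is taken to be false."""
    return triple in facts


def open_world_ask(facts, triple):
    """Web-style answer: anything not stated is unknown (None), not false."""
    return True if triple in facts else None


# Hypothetical statement that nobody has made (yet).
question = ("CENTRAL-COAST", "type", "WINE-REGION")
print(closed_world_ask(facts, question))  # False
print(open_world_ask(facts, question))    # None
```

Under the open-world reading, another source elsewhere on the Web may still add the missing statement later, which is why absence of a statement is not treated as its negation.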

Page 21:

Remember:

• Need for total decoupling of
   • data
   • vocabulary
   • meta-data

[Figure: x, T and the statement [<x> IsOfType <T>] have different owners & locations; label: <village>]

Page 22:

RDF(S) has a (very small) formal semantics

• Defines what other statements are implied by a given set of RDF(S) statements
• Ensures mutual agreement on minimal content between parties without further contact
• In the form of “entailment rules”
• Very simple to compute (and not explosive in practice)

Page 23:

RDF(S) semantics: examples

Aspirin isOfType Painkiller + Painkiller subClassOf Drug ⇒ Aspirin isOfType Drug

Aspirin alleviates headache + alleviates range symptom ⇒ headache isOfType symptom

Page 24:

RDF(S) semantics: examples

[Figure: graph with edges labelled isOfType, subClassOf, isOfType and range, isOfType]

Page 25:

RDF(S) semantics

X R Y + R domain T ⇒ X IsOfType T
X R Y + R range T ⇒ Y IsOfType T
T1 SubClassOf T2 + T2 SubClassOf T3 ⇒ T1 SubClassOf T3
X IsOfType T1 + T1 SubClassOf T2 ⇒ X IsOfType T2

Semantics = predictable inference
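The four entailment rules can be sketched as a naive forward-chaining loop in plain Python. This is an illustration under simplifying assumptions, not a complete RDFS reasoner: labels are short strings instead of URIs, and the function name `rdfs_closure` is made up for this example.

```python
def rdfs_closure(triples):
    """Apply the four rules repeatedly until no new triple appears."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                # X R Y + R domain T => X IsOfType T
                if p2 == "domain" and s2 == p:
                    new.add((s, "IsOfType", o2))
                # X R Y + R range T => Y IsOfType T
                if p2 == "range" and s2 == p:
                    new.add((o, "IsOfType", o2))
                # T1 SubClassOf T2 + T2 SubClassOf T3 => T1 SubClassOf T3
                if p == "SubClassOf" and p2 == "SubClassOf" and o == s2:
                    new.add((s, "SubClassOf", o2))
                # X IsOfType T1 + T1 SubClassOf T2 => X IsOfType T2
                if p == "IsOfType" and p2 == "SubClassOf" and o == s2:
                    new.add((s, "IsOfType", o2))
        if new <= facts:  # fixpoint reached: nothing new was derived
            return facts
        facts |= new


stated = {
    ("Aspirin", "IsOfType", "Painkiller"),
    ("Painkiller", "SubClassOf", "Drug"),
    ("Aspirin", "alleviates", "headache"),
    ("alleviates", "range", "symptom"),
}
closure = rdfs_closure(stated)
for triple in sorted(closure - stated):
    print(triple)
```

The loop is quadratic per pass, which is fine for a handful of statements; the point is only that every party applying the same rules to the same statements derives the same extra triples.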

Page 26:

Bluffer’s Guide to OWL

Page 27:

OWL: things RDF Schema can’t do

• equality
• enumeration
• number restrictions
   • Single-valued/multi-valued
   • Optional/required values
• inverse, symmetric, transitive
• boolean algebra
   • Union, complement…

Page 28:

Layered language

OWL Lite:
• Classification hierarchy
• Simple constraints

OWL DL:
• Maximal expressiveness
• While maintaining tractability
• Standard formalisation

OWL Full:
• Very high expressiveness
• Losing tractability
• Non-standard formalisation
• All syntactic freedom of RDF (self-modifying)

Syntactic layering
Semantic layering

[Figure: layers Full, DL, Lite]

Page 29:

Language Layers

[Figure: layers Full, DL, Lite]

OWL Full
• Allow meta-classes etc

OWL DL
• Negation
• Disjunction
• Full Cardinality
• Enumerated types

OWL Lite
• (sub)classes, individuals
• (sub)properties, domain, range
• conjunction
• (in)equality
• cardinality 0/1
• datatypes
• inverse, transitive, symmetric
• hasValue
• someValuesFrom
• allValuesFrom

RDF Schema

Page 30:

Backward compatibility with RDF

  <owl:Class rdf:ID="City">
    <rdfs:subClassOf rdf:resource="#GeographicEntity"/>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#ruler"/>
        <owl:allValuesFrom rdf:resource="#Mayor"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>

OWL agents understand everything…

Page 31:

Backward compatibility with RDF

  <owl:Class rdf:ID="City">
    <rdfs:subClassOf rdf:resource="#GeographicEntity"/>
    <daml:subClassOf>
      <daml:Restriction>
        <daml:onProperty rdf:resource="#ruler"/>
        <daml:toClass rdf:resource="#Mayor"/>
      </daml:Restriction>
    </daml:subClassOf>
  </owl:Class>

OWL agents understand everything…
… others still the most important aspects

Page 32:

OWL also has a formal semantics

• Defines what other statements are implied by a given set of statements
• Ensures mutual agreement on content (both minimal and maximal) between parties without further contact
• Can be used for integrity/consistency checking
• Hard to compute (and rarely/sometimes/always explosive in practice)

Page 33:

OWL semantics: minimal

vanGogh isOfType Impressionist + Impressionist subClassOf Painter ⇒ vanGogh isOfType Painter

vanGogh painter-of sunflowers + painter-of domain painter ⇒ vanGogh isOfType painter

Page 34:

OWL semantics: maximal

vanGogh isOfType Impressionist + Impressionist disjointFrom Cubist ⇒ NOT: vanGogh isOfType Cubist

painted-by has-cardinality 1 + sun-flowers painted-by vanGogh + Picasso different-individual-from vanGogh ⇒ NOT: sun-flowers painted-by Picasso
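The "maximal" reading can be sketched as a consistency check: given the constraints, some statements can no longer be added without contradiction. The Python below is a toy illustration, not an OWL reasoner; the function `violations` and its argument layout are assumptions made for this example. Note that two values for a cardinality-1 property only count as a conflict when the two individuals are declared different, since there is no unique name assumption.

```python
def violations(facts, disjoint, single_valued, different):
    """List conflicts between stated facts and the given constraints."""
    found = []
    # Disjointness: no individual may belong to both classes of a pair.
    types = {}
    for s, p, o in facts:
        if p == "isOfType":
            types.setdefault(s, set()).add(o)
    for individual, classes in sorted(types.items()):
        for pair in sorted(disjoint, key=sorted):
            if pair <= classes:
                found.append(("disjoint", individual, tuple(sorted(pair))))
    # Cardinality 1: two values conflict only if declared different.
    values = {}
    for s, p, o in facts:
        if p in single_valued:
            values.setdefault((s, p), set()).add(o)
    for (s, p), objs in sorted(values.items()):
        for a in sorted(objs):
            for b in sorted(objs):
                if a < b and frozenset((a, b)) in different:
                    found.append(("cardinality", s, p, (a, b)))
    return found


facts = {
    ("vanGogh", "isOfType", "Impressionist"),
    ("sun-flowers", "painted-by", "vanGogh"),
}
disjoint = {frozenset(("Impressionist", "Cubist"))}
single_valued = {"painted-by"}
different = {frozenset(("Picasso", "vanGogh"))}

print(violations(facts, disjoint, single_valued, different))
print(violations(facts | {("vanGogh", "isOfType", "Cubist")},
                 disjoint, single_valued, different))
print(violations(facts | {("sun-flowers", "painted-by", "Picasso")},
                 disjoint, single_valued, different))
```

The stated facts alone produce no conflict; adding either of the two "NOT" statements produces one, which is the sense in which the semantics also fixes what may not be said.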

Page 35:

Remember: required are
1. standard vocabularies
2. a standard syntax
3. lots of resources with meta-data attached

Page 36:

Ontologies: real life examples

• handcrafted
   • music: CDnow (2410/5), MusicMoz (1073/7)
   • biomedical: SNOMED (200k), GO (15k), Emtree (45k+190k)
   • Systems biology
• ranging from lightweight (Yahoo, UNSPSC, Open directory (400k)) to heavyweight (Cyc (300k))
• ranging from small (METAR) to large (UNSPSC)

Page 37:

Biomedical ontologies (a few..)

MeSH
• Medical Subject Headings, National Library of Medicine
• 22,000 descriptions

EMTREE
• Commercial, Elsevier, drugs and diseases
• 45,000 terms, 190,000 synonyms

UMLS
• Integrates 100 different vocabularies

SNOMED
• 200,000 concepts, College of American Pathologists

Gene Ontology
• 15,000 terms in molecular biology

NCBI Cancer Ontology
• 17,000 classes (about 1M definitions)

Page 38:

Remember: required are
1. standard vocabularies
2. a standard syntax
3. lots of resources with meta-data attached

Page 39:

Who makes the meta-data?

Don’t throw away what we already have:
• Databases (Amazon.com)
• Navigation structures
• meta-data in documents (Office, Acrobat, MP3, jpg)

As spin-off of what we already do
• MIT Media Lab photo annotator

Automated analysis
• Text, Images, Video

Page 40:

Summary so far

Page 41:

Linked Data/Semantic Web

Identification
• Uniform Resource Identifier (URI)
• Global identifier (NB: persistent!)
• Looks like a URL, is often an Internationalized Resource Identifier (IRI)

Description
• Resource Description Framework (RDF)
• RDF Schema (RDFS)
• Simple Knowledge Organization System (SKOS)
• Web Ontology Language (OWL)

Querying
• RDF Triple stores
• SPARQL Query Language

Page 42:

What does RDF look like?

• The data model is a (directed) graph
• Every data item is a ‘resource’ with a URI as identifier
• Every property is a binary relation: a ‘triple’
   • Between resources: <subjectURI, predicateURI, objectURI>
   • Between a resource and a ‘literal’: <subjectURI, predicateURI, “literal value”>
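The two kinds of triple, resource-to-resource and resource-to-literal, can be sketched with two small Python classes and a wildcard lookup that loosely resembles a single SPARQL triple pattern. The class names, the `match` helper, the `http://example.org/` namespace and the example statements are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


EX = "http://example.org/"  # hypothetical namespace

triples = {
    # Between resources: <subjectURI, predicateURI, objectURI>
    (URI(EX + "Amsterdam"), URI(EX + "locatedIn"), URI(EX + "Netherlands")),
    # Between a resource and a literal: <subjectURI, predicateURI, "literal value">
    (URI(EX + "Amsterdam"), URI(EX + "name"), Literal("Amsterdam")),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Everything said about the resource Amsterdam:
for t in match(triples, s=URI(EX + "Amsterdam")):
    print(t)
```

Keeping `URI` and `Literal` as distinct types means the string "Amsterdam" used as a literal value never matches a resource by accident.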

Page 43:

Why is this a Web of data?

• Globally unique identifiers
• Reuse of identifiers in other datasets
   • For data: (two sources say something about ‘Amsterdam’)
   • For schema: (two sources each use the same concept ‘City’)

This reuse builds “links” between datasets
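How identifier reuse creates links can be sketched with two tiny datasets in plain Python. The prefixed names (`ex:Amsterdam`, `ex:City`, `ex:locatedIn`, and so on), the dataset contents and the helper names are hypothetical and chosen only for illustration.

```python
# Two independently published datasets (contents are made up).
geo = {
    ("ex:Amsterdam", "rdf:type", "ex:City"),
}
museums = {
    ("ex:Rijksmuseum", "ex:locatedIn", "ex:Amsterdam"),
    ("ex:Rotterdam", "rdf:type", "ex:City"),
}


def identifiers(dataset):
    """All identifiers used anywhere in a dataset's triples."""
    return {term for triple in dataset for term in triple}


def shared_identifiers(a, b):
    """Identifiers reused by both datasets: the links between them."""
    return identifiers(a) & identifiers(b)


def located_in_a_city(dataset):
    """Subjects located in something that is typed as a City."""
    cities = {s for s, p, o in dataset if p == "rdf:type" and o == "ex:City"}
    return {s for s, p, o in dataset if p == "ex:locatedIn" and o in cities}


links = shared_identifiers(geo, museums)
print(sorted(links))
print(located_in_a_city(museums))        # no answer from one source alone
print(located_in_a_city(geo | museums))  # answer appears after merging
```

Both kinds of reuse show up in `links`: `ex:Amsterdam` is shared data, while `ex:City` and `rdf:type` are shared schema.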

Page 44:

Does this work in practice?

Page 45:

Linked Open Data cloud

already many billions of facts & rules

[Figure: Linked Open Data cloud diagram, with regions labelled:]
• Encyclopedia
• Geographic names (millions)
• names of artists & art works (10,000s)
• scientific bibliographies
• hierarchical dictionaries (UK, FR, NL)
• life-science databases
• any CD ever recorded (almost)
• basic facts on every country on the planet
• common sense rules & facts (100,000s)

May ‘09 estimate > 4.2 billion triples + 140 million interlinks

It gets bigger every month

Page 46:

It gets bigger every month

Page 47:

And remember: not just data

All of this:
• Low expressivity logic (RDF/RDFS)
• That allows some inference: property inheritance, domain/range inference

Some of this:
• Medium expressive logic (OWL)
• That allows more inference: (in)equality, number restrictions, datatypes

Page 48:

Nice in the lab, but are you getting anywhere in practice?

Page 49:

Semantic Web

News Quiz
• Google
• Reuters
• New York Times
• Microsoft
• Zemanta
• Obama Government
• BBC (music, worldcup, wildlife)
• BestBuy.com
• Facebook

Page 50:

Challenges

Page 51:

What to do when success is becoming a problem?

• Heterogeneity: ontology mapping, instance identification
• Scale (10^10 statements)
• Dynamics, versioning (Flickr: 3000 pictures/minute, Wikipedia: 100 edits/minute)
• Trust, attribution, provenance
• Multimedia

In both directions