informazione, metadati & conoscenza (pt1: rdf,...

28
Informazione, Metadati & Conoscenza (Pt1: RDF, RDFS) R. Basili a.a. 2012-13 (contributi da John Davies (BT),

Upload: volien

Post on 16-Feb-2019

215 views

Category:

Documents


0 download

TRANSCRIPT

Informazione, Metadati & Conoscenza (Pt1: RDF, RDFS)

R. Basili a.a. 2012-13

(contributi da John Davies (BT),

Overview

•  Dai dati strutturati ai dati non strutturati •  Metadati

–  Esempi –  Metadati nel Web: il web semantico

•  Linguaggi Standard per i metadati –  Da HTML a XML –  RDF ed RDFS

•  Terminologie Standard: Dublin Core •  OWL & SKOS

Dati strutturati e dati non strutturati

10000

100000

1000000

10000000

100000000D

ez 9

4

Jun

95

Dez

95

Jun

96

Dez

96

Jun

97

Dez

97

Jun

98

Dez

98

Jun

99

Dez

99

Jun

00

Dez

00

Jun

01

Dez

01

Jun

02

Dez

02

Jun

03

Dez

03

Time

Web

Ser

ver N

umbe

r

[ Source: http://www.zakon.org/robert/internet/timeline/ ]

„Web data transfer larger than FTP data transfer“ „Kifer, Lausen, Woo, Logical foundations of object-oriented and frame-based languages“ „A. Borgida, On the relative expressiveness of description Logics and predicate logic“

... Semantic Web HISTORY

„W3C Semantic Web Standardization: Work on Web Ontology Language (OWL)“

„W3C standardization of Semantic Web starts Work on Resource Description Framework (RDF) Work on RDF Schema (RDFS)“

10.2.2004: Resource Description Framework (RDF) Web Ontology Language (OWL) become W3C recommendations

Semantic Web = Web + Data base technology + Knowledge Representation

„W3C Standardization of XML starts“

„Research projects on Web Ontologies start EU : On-To-Knowledge (01/00) and US (DARPA): DAML (07/00)“

Semantic Web

„The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in co-operation.“

[Berners-Lee et al., 2001]

Semantic Web Vision

Where we are Today: the Syntactic Web

[Hendler & Miller 02]

The Syntactic Web is… •  A hypermedia, a digital library

–  A library of documents called (web pages) interconnected by a hypermedia of links

•  A database, an application platform –  A common portal to applications accessible

through web pages, and presenting their results as web pages

•  A platform for multimedia –  BBC Radio 4 anywhere in the world! Terminator 3

trailers! •  A naming scheme

–  Unique identity for those documents [Goble 03]

Hard Work using the Syntactic Web…

•  Complex queries involving background knowledge –  Find information about “animals that use sonar but are

not either bats, dolphins or whales”

•  Locating information in data repositories –  Travel enquiries –  Prices of goods and services –  Results of human genome experiments

•  Delegating complex tasks to web “agents” –  Book me a holiday next weekend somewhere warm,

not too far away, and where they speak French or English

, e.g., Barn Owl

What is the Problem? •  Consider a typical web page:

•  Markup consists of: –  rendering

information (e.g., font size and colour)

–  Hyper-links to related content

•  Semantic content is accessible to humans but not (easily) to computers…

What information can we see… WWW2002 The eleventh international world wide web conference Sheraton waikiki hotel, Honolulu, hawaii, USA 7-11 may 2002, 1 location 5 days learn interact Registered participants coming from australia, canada, chile denmark, france, germany, ghana, hong kong,,

norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire

Register now On the 7th May Honolulu will provide the backdrop of the eleventh

international world wide web conference. This prestigious event.. Speakers confirmed Tim berners-lee Tim is the well known inventor of the Web, … Ian Foster Ian is the pioneer of the Grid, the next generation internet …

What information can a machine see…

!!!"##"$%&' '(')'*+& ,*+'-*.+,/*.( 0/-(1

0,1' 0'2 3/*4'-'*3'$5&'-.+/* 0.,6,6, &/+'($7/*/(8(89 &.0.,,9 :5;$<=>> ?.@ "##"$> (/3.+,/* A 1.@B ('.-* ,*+'-.3+$C'D,B+'-'1 E.-+,3,E.*+B 3/?,*D

4-/?$.8B+-.(,.9 3.*.1.9 3&,('

1'*?.-69 4-.*3'9 D'-?.*@9 D&.*.9 &/*D 6/*D9 ,*1,.9 ,-'(.*19 ,+.(@9 F.E.*9 ?.(+.9 *'0 G'.(.*19 +&' *'+&'-(.*1B9 */-0.@9 B,*D.E/-'9 B0,+G'-(.*19 +&' 8*,+'1 6,*D1/?9 +&' 8*,+'1 B+.+'B9 ),'+*.?9 G.,-'$

XML User definable and domain specific markup

<H1>Knowledge Management</H1> <UL> <LI>Manager: John Davies <LI>Project: SEKT </UL>

HTML:

<research-topic> <title>Knowledge Management</title> <manager>John Davies</manager> <project>SEKT</project>

</research-topic>

XML:

XML: Document = labelled tree

course

teacher title students

name http

<course date=“...”> <title>...</title> <teacher>...</teacher> <name>...</name> <http>...</http> <students>...</students>

</course>

=

•  DTD: simple grammars to describe legal trees

•  node = label + contents

XML example <play> <title>The Life and Death of King John</title> <Dramatis Personae> <persona>The Earl of PEMBROKE</persona> <persona>The Earl of ESSEX</persona> …… </Dramatis Personae> <Stagedir>SCENE England, the Court.</Stagedir> <act>Act 1 <scene>Scene I. <speech> <speaker>JOHN</speaker> <line>Now, Chatillon, what would France with us?</line> </speech>

Solution: XML markup with “meaningful” tags? <name>!!!"##"$%&' '(')'*+& ,*+'-*.+,/*.( 0/-(1 0,1'

0'23/*</name> <location>5&'-.+/* 0.,6,6, &/+'($

7/*/(8(89 &.0.,,9 :5;</location> <date><=>> ?.@ "##"</date> <slogan>> (/3.+,/* A 1.@B ('.-*

,*+'-.3+</slogan> <participants>C'D,B+'-'1 E.-+,3,E.*+B

3/?,*D 4-/?$.8B+-.(,.9 3.*.1.9 3&,(' 1'*?.-69

4-.*3'9 D'-?.*@9 D&.*.9 &/*D 6/*D9 ,*1,.9 ,-'(.*19 ,+.(@9 F.E.*9 ?.(+.9 *'0 G'.(.*19 +&' *'+&'-(.*1B9 */-0.@9 B,*D.E/-'9 B0,+G'-(.*19 +&' 8*,+'1 6,*D1/?9 +&' 8*,+'1 B+.+'B9 ),'+*.?9 G.,-'</participants>

But What About… <conf>!!!"##"$

%&' '(')'*+& ,*+'-*.+,/*.( 0/-(1 0,1' 0'23/*</conf>

<place>5&'-.+/* 0.,6,6, &/+'($

7/*/(8(89 &.0.,,9 :5;</place> <date><=>> ?.@ "##"</date> <strapline>> (/3.+,/* A 1.@B ('.-*

,*+'-.3+</strapline> <participants>C'D,B+'-'1 E.-+,3,E.*+B

3/?,*D 4-/?$.8B+-.(,.9 3.*.1.9 3&,(' 1'*?.-69

4-.*3'9 D'-?.*@9 D&.*.9 &/*D 6/*D9 ,*1,.9 ,-'(.*19 ,+.(@9 F.E.*9 ?.(+.9 *'0 G'.(.*19 +&' *'+&'-(.*1B9 */-0.@9 B,*D.E/-'9 B0,+G'-(.*19 +&' 8*,+'1 6,*D1/?9 +&' 8*,+'1 B+.+'B9 ),'+*.?9 G.,-'</participants>

XML: limitations for semantic markup XML per se makes no commitment on: •  Domain specific ontological vocabulary

•  Which words shall we use to describe a given set of concepts?

•  Ontological modelling primitives •  How can we combine these concepts, e.g. “car is a-kind-of

(subclass-of) vehicle”

ð requires pre-arranged agreement on vocab and primitives

Only feasible for closed collaboration –  agents in a small & stable community –  pages on a small & stable intranet

.. not for sharable Web-resources

Limitations of the Web today

Machine-to-human, not machine-to-machine

XML is a first step

• Semantic markup – HTML ð layout – XML ð meaning

• Metadata – within documents, not across documents – prescriptive, not descriptive – No commitment on vocabulary and modelling

primitives • RDF is the next step

Resource Description Framework (RDF) •  A standard of W3C •  Relationships between documents •  Consisting of triples or sentences:

–  <subject, property, verb> –  <Tolkien, wrote, The Lord of the Rings>

•  RDFS extends RDF with standard “ontology vocabulary”: –  Class, Property –  Type, subClassOf –  domain, range

An example “Tolkein wrote ISBN00001047582” hasWritten (‘http://www.famouswriters.org/

tolkein/’, http://www.books.org/ISBN00001047582’)

RDF and RDFS •  RDFS defines the ontology

–  classes and their properties and relationships –  what concepts do we want to reason about and how are

they related –  there are authors, and authors write books

•  RDF defines the instances of these classes and their properties –  Mark Twain is an author –  Mark Twain wrote “Adventures of Tom Sawyer” –  “Adventures of Tom Sawyer” is a book

•  Notation: RDF(S) = RDF + RDFS

hasName (‘http://www.famouswriters.org/twain/mark’, “Mark Twain”)

hasWritten (‘http://www.famouswriters.org/twain/mark’, ‘http://www.books.org/ISBN00001047582’)

title (‘http://www.books.org/ISBN00001047582’, “The Adventures of Tom Sawyer”)

XML version: <rdf:Description rdf:about=http://www.famouswriters.org/twain/mark>

<s:hasName>Mark Twain</s:hasName> <s:hasWritten rdf:resource=http://www.books.org/ISBN0001047/>

</rdf:Description>

RDF

twain/mark /ISBN000010475 hasWritten

“Mark Twain”

“The Adventures of Tom Sawyer”

hasName title

An example RDF data graph

RDF(S) definitions

subclassof(FamousWriter, Writer) type(‘http://www.books.org/ISBN00001047582’,

‘http://www.description.org/schema#Book’)

An example RDF Schema

Writer hasWritten Book

FamousWriter

/twain/mark ../ISBN00010475

Schema(RDFS) Data(RDF)

hasWritten type

subClassOf

domain range

type

Annotation of WWW resources and semantic links

Conclusions about RDF(S)

• Next step up from plain XML: –  (small) ontological commitment to

modeling primitives – possible to define vocabulary

• However: – no precisely described meaning – no inference model