
Standards for the Representation of Knowledge on the Semantic Web

Antoine ISAAC, STITCH Project

Offene Archivierbare Formate, Oct. 25th, 2007

Agenda

• Interoperability problems in Cultural Heritage

• An introduction to the Semantic Web
  • The problem
  • RDF
  • RDFS/OWL
  • Why is it interesting?

• Porting existing metadata to the Semantic Web
  • SKOS

• Conclusion: SW and semantic alignment


The Interoperability Problem in Cultural Heritage

• STITCH
  • SemanTic Interoperability To access Cultural Heritage

• Here, CH at large (incl. Digital Libraries)

• Trend: simultaneous access to different collections
  • The European Library, Memory of the Netherlands

• Problem: how to access different collections seamlessly?

• Traditional solution: using object metadata
  • But…

KB Illustrated Manuscripts

Mandragore

The Interoperability Problems

From syntactic to semantic

• Different formats
  • “We have a solution”
  • XML as a standard for data exchange

• Different metadata schemes
  • “Something is coming”
  • Dublin Core for MD exchange

The Interoperability Problems

From syntactic to semantic (continued)

• Different conceptual vocabularies for description
  • “Do you really want to discuss it now?”

• No standard vocabulary
  • DDC, UDC, SWD, LCSH, AAT, Iconclass and myriads of others…

• Not even a common model for these Knowledge Organization Schemes (KOSs)
  • thesauri, classification schemes, subject heading lists…

• Even worse: there are reasons for this!

The result

MDS 1
- Field 1
  - Field 1.1
- Field 2
  - Field 2.1
  - Field 2.2
- …

MDS 2
- Field 1
  - Field 1.1
  - Field 1.2
    - Field 1.2.1
  - Field 1.3
- Field 2
- …

An Ideal Situation

MDS 1
- Field 1
  - Field 1.1
- Field 2
  - Field 2.1
  - Field 2.2
- …

MDS 2
- Field 1
  - Field 1.1
  - Field 1.2
    - Field 1.2.1
  - Field 1.3
- Field 2
- …


Why think of the Semantic Web?

• Cf. the Semantic Web Activity page at W3C
  • http://www.w3.org/2001/sw/

• “The Semantic Web provides a common framework that allows data to be shared and reused”

• “The Semantic Web is a web of data”

• “It is about common formats for integration and combination of data drawn from diverse sources”

SW Problem: The Web for Humans

• A city

• A flag

• The city’s location

Meaning

SW Problem: The Web for Computers?

• Characters

• Images

Black boxes

• Markup

Layout/Display

The Interoperability Problems in CH (reminder)

MDS 1
- Field 1
  - Field 1.1
- Field 2
  - Field 2.1
  - Field 2.2
- …

MDS 2
- Field 1
  - Field 1.1
  - Field 1.2
    - Field 1.2.1
  - Field 1.3
- Field 2
- …

The Semantic Web Approach: A Web of (Meta)data

[Figure: graph connecting par3, file1, Article, Document, Amsterdam, The_Netherlands and City through the links subject, partOf, type, subClassOf and hasCapital]


A footnote

• Why “(meta)data”?

• Because what is metadata for certain applications can indeed be the data for the Semantic Web

• Boundary is blurred


The Semantic Web (1/4)

• Pointing at resources
  • What? Knowledge objects, everything that we may want to refer to (including documents)
  • How? Uniform Resource Identifiers (incl. URLs)
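The slides that follow write some resources as full URIs and others as prefixed names such as myVoc2:Amsterdam. A minimal sketch of how a prefixed name could be expanded into a full URI is given here; the namespace URIs bound to myVoc1 and myVoc2 are assumptions, since the slides only show the prefixes.

```python
# Hypothetical namespace URIs: the slides only use the prefixes myVoc1 and myVoc2.
PREFIXES = {
    "myVoc1": "http://example.org/myVoc1#",
    "myVoc2": "http://example.org/myVoc2#",
}


def expand(name: str) -> str:
    """Turn a prefixed name such as 'myVoc2:Amsterdam' into a full URI."""
    if name.startswith(("http://", "https://")):
        return name  # already a full URI, nothing to expand
    prefix, _, local = name.partition(":")
    return PREFIXES[prefix] + local


print(expand("myVoc2:Amsterdam"))
print(expand("http://ex.org/files/file1#par3"))
```

Whatever the notation, each name resolves to one global identifier, which is what lets different collections point at the same resource.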


A Web of Resources

myVoc2:Amsterdam

http://ex.org/files/file1#par3

http://ex.org/files/file1

myVoc1:Article

http://www.ned.nl/rep321

The Semantic Web (2/4)

• Pointing at resources: URIs

• Creating structured assertions involving resources
  • What? Structured assertions with typed links
  • How? RDF (Resource Description Framework)

Factual knowledge encoded as “triples”: subject – predicate (property) – object

Example: http://ex.org/files/file1#par3 – myVoc1:subject – myVoc2:Amsterdam

Data in an RDF “graph”

[Figure: RDF graph. Nodes: http://ex.org/files/file1#par3, http://ex.org/files/file1, myVoc1:Article, myVoc2:Amsterdam. Links: myVoc1:subject, myVoc1:partOf, rdf:type. Also shown: http://www.ned.nl/rep321]
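To make the graph idea concrete, here is a minimal sketch that stores the figure's statements as Python tuples and queries them with a wildcard pattern. The three edges are one plausible reading of the figure (an assumption), and `match` is an illustrative helper, not part of any RDF library.

```python
# The edges below are one reading of the slide's figure (an assumption).
PAR3 = "http://ex.org/files/file1#par3"
FILE1 = "http://ex.org/files/file1"

graph = {
    (PAR3, "myVoc1:subject", "myVoc2:Amsterdam"),
    (PAR3, "myVoc1:partOf", FILE1),
    (FILE1, "rdf:type", "myVoc1:Article"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything stated about paragraph 3 of the file
for triple in match(graph, s=PAR3):
    print(triple)
```

Because every statement has the same three-part shape, merging two graphs is just a set union of their triples.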


The Semantic Web (3/4)

• Pointing at resources: URIs

• Enabling structured assertions: RDF

• Giving machine-understandable semantics to “building blocks”
  • What? Ontologies
    • “Formal definitions of shared conceptual vocabularies”
    • Giving semantics for properties and classes
  • How? RDFS/OWL (Web Ontology Language)

RDF Schema (RDFS)

• Meta-language to create vocabularies
  • “Article” is an (RDFS) Class
    • Denotes a type, a collection of resources (individuals)
  • “subject” is an (RDFS) Property

• Giving semantics to vocabulary elements
  • My “Article” has the literal “article” as a label for display
    • myVoc1:Article rdfs:label “article”
  • “Article” is a subclass of the class “Document”
    • myVoc1:Article rdfs:subClassOf myVoc1:Document
  • “subject” is applied to resources of type “Document”
    • myVoc1:subject rdfs:domain myVoc1:Document

RDF Schema (RDFS)

• Different kinds of constructs
  • Assigning domains and ranges of properties
  • Creating hierarchies of classes and properties
  • Labels and informal specifications

• (Some) equipped with formal semantics
  • R rdf:type C1, C1 rdfs:subClassOf C2 -> R rdf:type C2
  • P rdfs:domain C, R1 P R2 -> R1 rdf:type C
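The two entailment rules can be sketched as a small forward-chaining loop over a set of triples. This is an illustrative toy, not a complete RDFS reasoner: it implements only the subclass and domain rules named on the slide, and the example facts reuse the myVoc1 vocabulary from the earlier slides.

```python
PAR3 = "http://ex.org/files/file1#par3"
FILE1 = "http://ex.org/files/file1"

facts = {
    (FILE1, "rdf:type", "myVoc1:Article"),
    ("myVoc1:Article", "rdfs:subClassOf", "myVoc1:Document"),
    ("myVoc1:subject", "rdfs:domain", "myVoc1:Document"),
    (PAR3, "myVoc1:subject", "myVoc2:Amsterdam"),
}


def rdfs_closure(triples):
    """Apply the two rules repeatedly until no new triple appears."""
    known = set(triples)
    while True:
        new = set()
        for s, p, o in known:
            if p == "rdf:type":
                # R rdf:type C1, C1 rdfs:subClassOf C2 -> R rdf:type C2
                for c1, q, c2 in known:
                    if q == "rdfs:subClassOf" and c1 == o:
                        new.add((s, "rdf:type", c2))
            # P rdfs:domain C, R1 P R2 -> R1 rdf:type C
            for prop, q, c in known:
                if q == "rdfs:domain" and prop == p:
                    new.add((s, "rdf:type", c))
        if new <= known:
            return known
        known |= new


inferred = rdfs_closure(facts) - facts
for triple in sorted(inferred):
    print(triple)
```

With these facts, the loop derives that both the file and its paragraph are of type myVoc1:Document, although neither statement was asserted.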

Web Ontology Language (OWL)

• Same function as RDFS, but more possibilities, e.g.
  • Characteristics of properties
    • Inverse(hasAuthor, authorOf)
  • Restriction on property usage
    • SubClassOf(Books, restriction(hasISBN minCardinality(1)))
  • Combination and exclusion of classes and properties
    • DisjointClasses(Persons, Books)

• Inherits from AI research and Description Logics

• Comes in different levels of complexity
  • Lite, DL, Full
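As a rough illustration of what two of these constructs allow a machine to do, the sketch below derives inverse statements and detects a disjointness violation over plain tuples. It is a hand-written toy under simplifying assumptions, not an OWL reasoner, and the resources ex:book1 and ex:isaac are hypothetical.

```python
def apply_inverse(triples, prop, inverse):
    """Inverse(prop, inverse): derive the statement in the other direction."""
    out = set(triples)
    for s, p, o in triples:
        if p == prop:
            out.add((o, inverse, s))
        elif p == inverse:
            out.add((o, prop, s))
    return out


def instances_of(triples, cls):
    """Resources explicitly typed with the given class."""
    return {s for s, p, o in triples if p == "rdf:type" and o == cls}


def disjoint_violations(triples, class_a, class_b):
    """DisjointClasses(a, b): resources typed with both classes are inconsistent."""
    return instances_of(triples, class_a) & instances_of(triples, class_b)


# Hypothetical data for illustration
data = {
    ("ex:book1", "hasAuthor", "ex:isaac"),
    ("ex:book1", "rdf:type", "Books"),
    ("ex:isaac", "rdf:type", "Persons"),
}

enriched = apply_inverse(data, "hasAuthor", "authorOf")
print(("ex:isaac", "authorOf", "ex:book1") in enriched)
print(disjoint_violations(enriched, "Persons", "Books"))
```

The first function infers new facts, the second controls existing ones, which are the two roles of formal semantics mentioned later in the talk.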


Tools to build RDFS/OWL ontologies

Ontological information

[Figure: the RDF graph extended with ontological information. Nodes: http://ex.org/files/file1#par3, http://ex.org/files/file1, myVoc1:Article, myVoc1:Document, myVoc2:Amsterdam. Links: myVoc1:subject, myVoc1:partOf, rdf:type, rdfs:subClassOf. Also shown: http://www.ned.nl/rep321]

The Semantic Web (4/4)

• Pointing at resources: documents, knowledge objects

• Enabling structured assertions

• Using “building blocks” with precise semantics

• Controlling existing facts, inferring new ones
  • Part of the tasks is delegated from the user to inference engines that use the formal semantics of ontologies

Ontological information

[Figure: the RDF graph with ontological information, now showing a second rdf:type link. Nodes: http://ex.org/files/file1#par3, http://ex.org/files/file1, myVoc1:Article, myVoc1:Document, myVoc2:Amsterdam. Links: myVoc1:subject, myVoc1:partOf, rdf:type (twice), rdfs:subClassOf. Also shown: http://www.ned.nl/rep321]


RDFS/OWL and Semantic Interoperability


Why is it interesting?

• RDF model is simple
  • Just triples

• There is meaning exploitable by computers

• Resources are universal, hence shareable
  • One resource for one object, used in different places

• Vocabularies for (meta)data are made of resources
  • They can be re-used in different applications
  • RDF does not enforce the use of a specific ontology
  • Their meaning (incl. formal semantics) is shareable

Building on top of the Web

• Web-based resources allow distribution/sharing of
  • document
  • vocabulary
  • (meta)data

[Figure: the triple (par3, subject, Amsterdam) with different owners & locations: http://www.kb.nl/eDepot, http://www.geo.org/voc/, http://www.ned.nl/rep321]

Why is it interesting?

• Using open standards
  • W3C’s URI, XML, RDF, RDFS, OWL

Footnote: Building on top of XML

<rdf:Description rdf:about="http://www.ned.nl/doc321">
  <myVoc1:subject rdf:resource="http://www.geo.org/Amsterdam"/>
</rdf:Description>
<rdf:Description rdf:about="http://www.geo.org/The_Netherlands">
  <myVoc2:hasCapital rdf:resource="http://www.geo.org/Amsterdam"/>
</rdf:Description>

• RDF can be encoded as XML data
  • RDF/XML is the reference syntax, but others are possible
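To show that such XML really carries triples, the sketch below reads the fragment with Python's standard XML parser and prints one triple per property element. Two things are assumptions added to make the fragment parseable: the enclosing rdf:RDF element, and the namespace URIs bound to myVoc1 and myVoc2. The function handles only this simple rdf:Description pattern; full RDF/XML needs a dedicated parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The rdf:RDF wrapper and the myVoc1/myVoc2 namespace URIs are assumptions.
DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:myVoc1="http://example.org/myVoc1#"
    xmlns:myVoc2="http://example.org/myVoc2#">
  <rdf:Description rdf:about="http://www.ned.nl/doc321">
    <myVoc1:subject rdf:resource="http://www.geo.org/Amsterdam"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.geo.org/The_Netherlands">
    <myVoc2:hasCapital rdf:resource="http://www.geo.org/Amsterdam"/>
  </rdf:Description>
</rdf:RDF>"""


def triples_from_rdfxml(text):
    """Extract triples from the simple rdf:Description pattern only."""
    root = ET.fromstring(text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree writes qualified names as '{namespace}local'
            namespace, local = prop.tag[1:].split("}")
            obj = prop.get(f"{{{RDF_NS}}}resource")
            triples.append((subject, namespace + local, obj))
    return triples


for triple in triples_from_rdfxml(DOC):
    print(triple)
```

Both descriptions point at the same Amsterdam URI, so the two statements join into one graph once parsed.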


Problem: Data Population

• How will Semantic Web data be created?
  • Creation of “born-semantic” data?
    • Automatic or manual (tagging)
  • Converting existing databases to SW format
    • Fits the vision of the SW as a place to exchange data

• In the CH situation: porting legacy metadata is fundamental

Porting CH Metadata to the Semantic Web

• Requirement: an ontology to create SW-enabled representations for metadata
  • “Ontologized” metadata schema

• A first candidate: Dublin Core for metadata schema
  • Well-established set of metadata elements
  • Already coming in RDFS!
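A minimal sketch of such a port, assuming a legacy record held as a Python dictionary: each local field is mapped to a Dublin Core element and emitted as a triple. The record, its field names and its identifier are hypothetical; only the Dublin Core element namespace is real.

```python
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical legacy record; identifier and field names are illustrative only.
record = {
    "id": "http://example.org/manuscripts/ms42",
    "titel": "Book of Hours",
    "auteur": "Unknown",
    "onderwerp": "birds",
}

# Hypothetical mapping from local field names to Dublin Core elements
FIELD_TO_DC = {"titel": "title", "auteur": "creator", "onderwerp": "subject"}


def to_dc_triples(record):
    """Emit one (subject, property, value) triple per mapped field."""
    subject = record["id"]
    return [
        (subject, DC + FIELD_TO_DC[field], value)
        for field, value in record.items()
        if field in FIELD_TO_DC
    ]


for triple in to_dc_triples(record):
    print(triple)
```

The mapping table is where the schema-level interoperability work sits; the subject value "birds" is still a plain string here, which is the gap the next slides address.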

Porting KOSs to the Semantic Web

• How about metadata values from Knowledge Organization Schemes?
  • E.g. dc:subject values (terms, keywords, classes…)

• DC does not address the problem of KOS representation

• Why is it important?
  • Their heterogeneity is a primary source of interoperability problems

• They are provided with (informal) semantics
  • Taxonomies, associative networks can be exploited in many applications

Porting KOSs to the Semantic Web

• A first solution: converting KOSs to formal ontologies
  • Ontologization of terms/concepts into classes

• Problem: KOSs are generally not full-fledged ontologies
  • Iconclass: “Group of Birds” rdfs:subClassOf “Birds”?

• Much work is needed to make the semantics fit!

• The concept of a car (reference = a subject in a KOS) vs. the class of cars (reference = a set of objects in the world)

• Things in ontologies and KOSs don’t have the same epistemological status

• We need a model for elements of the realm of subjects

Representing KOSs – Requirements

Many different models and formats to represent vocabularies

• Need for standard formats to develop standardized tools and methods
  • Semantic correspondences
  • Browsing/information retrieval tools using vocabularies

• Need to represent features commonly used by these tools
  • Especially lexical information and semantic links

SKOS (Simple Knowledge Organization System)

• Model to represent KOSs (thesauri, classification schemes) on the Semantic Web in a simple way
  • Comparable to Dublin Core, for conceptual vocabularies

• SKOS offers building blocks to create XML/RDF data
  • Concepts and ConceptSchemes
  • Lexical properties (prefLabel, altLabel)
  • Semantic relations (broader, related)
  • Notes (scopeNote, definition)
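The building blocks listed above can be sketched as triples that describe a tiny vocabulary, with a helper that walks skos:broader links upwards. The concept URIs and labels are hypothetical (loosely echoing the birds example from the previous slides); the SKOS and RDF namespace URIs are the real ones. Walking the chain is an application-level choice made here for browsing, not something the sketch claims SKOS itself infers.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/voc/"  # hypothetical vocabulary namespace

triples = {
    (EX + "animals", RDF_TYPE, SKOS + "Concept"),
    (EX + "birds", RDF_TYPE, SKOS + "Concept"),
    (EX + "groupOfBirds", RDF_TYPE, SKOS + "Concept"),
    (EX + "birds", SKOS + "prefLabel", "birds"),
    (EX + "birds", SKOS + "altLabel", "fowl"),
    (EX + "birds", SKOS + "broader", EX + "animals"),
    (EX + "groupOfBirds", SKOS + "broader", EX + "birds"),
}


def broader_chain(concept, triples):
    """Follow skos:broader links from a concept up to the top of its hierarchy."""
    chain = []
    seen = {concept}
    current = concept
    while True:
        parents = sorted(
            o for s, p, o in triples
            if s == current and p == SKOS + "broader"
        )
        # Stop at a top concept, or if a cycle would bring us back
        if not parents or parents[0] in seen:
            return chain
        current = parents[0]
        seen.add(current)
        chain.append(current)


print(broader_chain(EX + "groupOfBirds", triples))
```

Note that "group of birds" is linked to "birds" with skos:broader rather than rdfs:subClassOf, which avoids claiming that every group of birds is a bird.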


SKOS: Iconclass Example

SKOS: Limitations

• SKOS is a standard
  • Simple
  • Meant for information exchange and re-use

• Not everything can be represented!
  • E.g. for Iconclass, it is difficult to represent all types of auxiliaries
    • Keys, structural digits…

• It is still work in progress
  • W3C Semantic Web Deployment Working Group



What have we seen?

• TODO

Back to the Problem: Semantic Alignment

• Different ontologies/individuals should be aligned at the semantic level
  • Using the same resources to join SW graphs together
  • Using the same vocabularies and semantics

• But: it is difficult to recognize equivalent resources at data creation time
  • There is (and will be) no such thing as one single ontology!

• A posteriori semantic alignment is needed

Back to the Problem: Semantic Alignment

• Fortunately, SW languages give appropriate means
  • Equivalence/specialization links for properties and classes
    • myVoc:auteur rdfs:subPropertyOf dc:creator
    • myVoc:Article owl:equivalentClass yourVoc:Artikel
  • Identity link for individuals
    • vu:aisaac owl:sameAs kb:AntoineIsaac
  • (As yet unstable) SKOS mapping links for subjects
    • iconclass:birds exactMatch swd:vogel

• But they don’t do the job for us!
  • The links have to be created somehow
  • This is another story…
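Once such links exist, using them is mechanical. The sketch below applies the rdfs:subPropertyOf and owl:sameAs examples from this slide to a single statement, so that a query phrased with dc:creator and the KB identifier finds data created with myVoc:auteur and the VU identifier. It is a simplification under stated assumptions: owl:sameAs is treated as a one-way rewrite rather than a full equivalence, and the resource ex:doc1 is hypothetical.

```python
links = {
    ("myVoc:auteur", "rdfs:subPropertyOf", "dc:creator"),
    ("vu:aisaac", "owl:sameAs", "kb:AntoineIsaac"),
}

# Hypothetical statement made with the local vocabulary and identifier
data = {("ex:doc1", "myVoc:auteur", "vu:aisaac")}


def apply_alignment(data, links):
    """Derive extra triples from rdfs:subPropertyOf and owl:sameAs links."""
    super_props = {s: o for s, p, o in links if p == "rdfs:subPropertyOf"}
    # Simplification: rewrite each owl:sameAs subject to its object
    same = {s: o for s, p, o in links if p == "owl:sameAs"}
    out = set()
    for s, p, o in data:
        s, o = same.get(s, s), same.get(o, o)
        out.add((s, p, o))
        if p in super_props:
            out.add((s, super_props[p], o))
    return out


for triple in sorted(apply_alignment(data, links)):
    print(triple)
```

The hard part, as the slide says, is producing the contents of `links` in the first place.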


Thank you!

Vocabulary alignment

• Find correspondences between vocabulary elements
  • “klassieke ruïnes” ≈ “landschap met ruïnes”
  • “maagd Maria” = “Heilige Moeder”

• STITCH aim: doing it (semi-)automatically
  • Vocabularies are big
  • They evolve over time

• Using techniques from the Semantic Web research domain
  • Problem comparable to ontology alignment
  • Techniques already investigated there
    • Linguistics, statistics

Automatic alignment techniques

• Lexical

• Structural

• Statistical

• Background knowledge

Lexical alignment

• Labels of entities, textual definitions

[Figure labels: “tumor”, “brain”, “Long tumor”, “Long”, “More specific than”]
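A naive lexical matcher can be sketched by comparing the word sets of two labels: identical sets suggest equivalence, and a label that contains all the words of another plus some more is proposed as more specific. This heuristic is an illustration chosen here, not the technique used in STITCH, and its proposals would need checking.

```python
def tokens(label):
    """Lower-case a label and split it into a set of words."""
    return frozenset(label.lower().split())


def lexical_match(label_a, label_b):
    """Propose a relation between two labels from their word sets alone."""
    a, b = tokens(label_a), tokens(label_b)
    if a == b:
        return "equivalent"
    if a > b:  # a has every word of b, plus more
        return "more specific than"
    if a < b:
        return "more general than"
    return None  # no lexical evidence either way


print(lexical_match("Long tumor", "tumor"))
print(lexical_match("Tumor", "tumor"))
print(lexical_match("brain", "tumor"))
```

The approach depends entirely on lexical coverage: synonyms and translations with no shared word produce no match at all.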


Statistical alignment

• Object information (e.g. book indexing)

[Figure: Thesaurus 1, Thesaurus 2 and a collection of books; concept labels “Dutch Literature” and “Dutch”]
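One common way to exploit object information is to compare the sets of books indexed with each concept. The sketch below scores concept pairs with the Jaccard coefficient; the measure, the threshold and all book identifiers are assumptions made for illustration, not the measure reported in the talk.

```python
def jaccard(books_a, books_b):
    """Overlap of two sets of books: |A and B| / |A or B|."""
    union = books_a | books_b
    return len(books_a & books_b) / len(union) if union else 0.0


# Hypothetical indexing data: concept -> identifiers of books indexed with it
index_1 = {
    "Dutch Literature": {"b1", "b2", "b3"},
    "Poetry": {"b5"},
}
index_2 = {
    "Dutch": {"b1", "b2", "b3", "b4"},
    "Calendars": {"b9"},
}


def candidate_matches(index_1, index_2, threshold=0.5):
    """Return (score, concept_1, concept_2) for pairs above the threshold."""
    return sorted(
        (
            (jaccard(books_1, books_2), c1, c2)
            for c1, books_1 in index_1.items()
            for c2, books_2 in index_2.items()
            if jaccard(books_1, books_2) >= threshold
        ),
        reverse=True,
    )


print(candidate_matches(index_1, index_2))
```

This only works where the same objects are described with both vocabularies, which is one of the application conditions discussed at the end of the talk.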


Alignment using shared background knowledge

• Using a shared conceptual reference to find links

[Figure: Thesaurus 1 and Thesaurus 2 connected through background knowledge; concept labels “Calendar” and “Publication”]
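The idea can be sketched with a small shared reference hierarchy: if both labels are found in it, the path between them suggests a relation between the two thesaurus concepts. The hierarchy below is entirely hypothetical, including the assumption that “calendar” sits under “publication”.

```python
# Hypothetical background hierarchy: term -> broader term
BACKGROUND = {
    "calendar": "publication",
    "publication": "document",
}


def ancestors(term):
    """All broader terms of a term in the background hierarchy."""
    out = []
    while term in BACKGROUND:
        term = BACKGROUND[term]
        out.append(term)
    return out


def relate(label_1, label_2):
    """Propose a relation between two labels via the shared reference."""
    a, b = label_1.lower(), label_2.lower()
    if a == b:
        return "equivalent"
    if b in ancestors(a):
        return "more specific than"
    if a in ancestors(b):
        return "more general than"
    return None


print(relate("Calendar", "Publication"))
```

Unlike the lexical approach, this finds a link between labels that share no word, provided the background source covers both.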

Alignment: no universal solution

• No single technique gives an ideal solution

• Different techniques have to be selected/combined, depending on the application case
  • Poor vs. rich semantic structure
  • Extensive vs. limited lexical coverage
  • Existence of collections described by several vocabularies

• Alignment is a difficult research problem

Conclusions: Alignment

• Simple techniques give quick results
  • 12300 Mandragore concepts “accessible” from Iconclass

• Their reliability is not sufficient for them to be used as the sole source
  • Combination with manual work (checking, completion)

• Semantic alignment is still a difficult research problem
  • No technique is perfect
  • Techniques must be selected/combined depending on the application case


Demo

• http://prauw.cs.vu.nl/rp33333/MANDRA-SV-ICE-mandraNewNONE , amphibiens

• Blé

Conclusions: Representation

• It is possible to produce standardized SW representations (SKOS) of conceptual vocabularies
  • And of the metadata that use them

• Existing techniques for accessing metadata and vocabularies (OAI-PMH, XML) make the work easier

• It is useful
  • Re-use/interoperability of application components that use the vocabularies
  • Ease of representing links to elements outside the represented vocabulary

Links

• STITCH http://stitch.cs.vu.nl

• Demo collections
  • BNF Mandragore http://mandragore.bnf.fr
  • KB illuminated manuscripts http://www.kb.nl/manuscripts/

• Library-originated integration projects:
  • MSAC search interface http://sigma.nkp.cz
  • MACS project http://macs.cenl.org

• Semantic web links
  • Semantic Web at W3C http://www.w3.org/2001/sw/
  • SKOS http://www.w3.org/2004/02/skos/

• Semantic Web projects dealing with Cultural Heritage
  • MuseumFinland http://www.museosuomi.fi/
  • eCulture http://e-culture.multimedian.nl/

Demo (1)

Subject vocabulary, collection 1

Subjects

Demo (2)

Hierarchical path from root to selected subject

Possible specialization for selected subject

Demo (3)

Document from Collection 2

Semantic alignment of subjects activated

Demo (4)

Subject from voc2 aligned to “voc1:amphibians”

Back
