eswc2012 presentation: supporting linked data production for cultural heritage institutes: the...

33
Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study Victor de Boer, Jan Wielemaker, Judith van Gent, Michiel Hildebrand, Antoine Isaac, Jacco van Ossenbruggen, Guus Schreiber EuropeanaConne ct

Upload: victor-de-boer

Post on 06-May-2015

738 views

Category:

Education


0 download

DESCRIPTION

Within the cultural heritage field, proprietary metadata and vocabularies are being transformed into public Linked Data. These efforts have mostly been at the level of large-scale aggregators such as Europeana where the original data is abstracted to a common format and schema. Although this approach ensures a level of consistency and interoperability, the richness of the original data is lost in the process. In this paper, we present a transparent and interactive methodology for ingesting, converting and linking cultural heritage metadata into Linked Data. The methodology is designed to maintain the richness and detail of the original metadata. We introduce the XMLRDF conversion tool and describe how it is integrated in the ClioPatria semantic web toolkit. The methodology and the tools have been validated by converting the Amsterdam Museum metadata to a Linked Data version. In this way, the Amsterdam Museum became the first `small' cultural heritage institution with a node in the Linked Data cloud.

TRANSCRIPT

Page 1: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

Supporting Linked Data Production for Cultural Heritage institutes:

The Amsterdam Museum Case Study

Victor de Boer, Jan Wielemaker, Judith van Gent, Michiel Hildebrand, Antoine Isaac, Jacco van Ossenbruggen, Guus Schreiber

EuropeanaConnect

Page 2: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

2

Aggregator

Page 3: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

3

“Europeana enables people to explore the digital resources of Europe's museums, libraries, archives and audio-visual collections.’’

www.europeana.eu

From portal… …to data aggregator.

Europeana

Page 4: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

4

data.europeana.eu

2.4 Million objects exposed as Linked Data.

8 aggregators, 200 institutions, 15 countries

Europeana Semantic Elements converted to RDF Europeana Data Model (EDM)

Page 5: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

5

Aggregate and convert

Linked data-ify

Page 6: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

6

Aggregate and convert

Linked data-ify

Convert to Linked dataMapped to EDM

Page 7: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

7

Methodology and tool stack

• Focus on transparency and interactivity– Reproducability – Both in conversion and alignment

• Maintain detail and complexity of original data

• Interoperability through schema mapping

Page 8: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

8

Methods

ClioPatria

XMLRDF

1. XML ingestion (OAI)

2. Direct transformation to ‘crude’ RDF

3. Interactive RDF restructuring

4. Create a metadata mapping schema

5. Align vocabularies with external sources

6. Publish as Linked Data

Amalgame

Tools

cliopatria.swi-prolog.org powered by

Page 9: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

9

Case study:Amsterdam Museum

• Formerly Amsterdam Historic Museum– “The rich collection of works of art,

objects and archaeological finds brings to life the fortunes of Amsterdammers of days gone by and today.”

• In March 2010 published their whole collection online– 73.000 objects– CC license

Page 10: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

10

Methods

ClioPatria

XMLRDF

1. XML ingestion

2. Direct transformation to ‘crude’ RDF

3. Interactive RDF restructuring

4. Create a metadata mapping schema

5. Align vocabularies with external sources

6. Publish as Linked Data

Amalgame

Tools

Page 11: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

11

Ingested AM metadata

• Adlib database XML API

• Object metadata • 73.000 objects, 256MB • Nested XML

• Concept Thesaurus• 27.000, 9MB• Different types (geo,motif, event)

• Person Authority File• 67.000 persons, 10MB• Consolidated from object metadata fields• Creators, annotators, reproduction creators,

institutions,

<record priref="10541“ > <acquisition.date>1997</acquisition.date> <dimension> <dimension.type>hoogte</dimension.type> <dimension.unit>cm</dimension.unit> <dimension.value>6</dimension.value> </dimension> …</record>

<record priref="28024“ > <term>Kalverstraat 124</term> <broader_term>Kalverstraat</broader_term> <term.type>GEOKEYW </term.type> </record>

<record priref="6" > <biography>boekverkoper en uitgever van cartografie</biography> <birth.date.start>1659</birth.date.start> <death.date.start>1733</death.date.start> <name>Aa, Pieter van der</name> <nationality>Nederlands</nationality> <use>Aa, Pieter van der (I)</use> </record>

Page 12: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

12

XMLRDF (1)Syntactic RDF conversion

<record priref="19319 “ > <date>1651</date> <maker>Rembrandt (1606-1669)</maker> <object.type>etsplaat</object.type> …</record>

am:Record_:bn1

“19319 ”

“1651”

priref

date

maker

object.type

“Rembrandt (1606-1669)”

“etsplaat”

XML-Element is attributes + content Map to RDF blank-node + attributes

Attributes → Literals (+xml:lang) Content

If plain → Literal (+xml:lang) Otherwise → RDF blank node (recursive)

Page 13: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

13

ClioPatria:Intermediate Statistics

Page 14: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

14

Methods

ClioPatria

XMLRDF

1. XML ingestion

2. Direct transformation to ‘crude’ RDF

3. Interactive RDF restructuring

4. Create a metadata mapping schema

5. Align vocabularies with external sources

6. Publish as Linked Data

Amalgame

Tools

Page 15: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

15

XMLRDF Graph rewrite rule language

Declarative committed-choice language based on CHR (Constraint Handling Rules) Triples <=> Guard, NewTriples Keep \ Triples <=> Guard, NewTriples

Page 16: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

Example

Page 17: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

17

AM rewriting rules examples

Page 18: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

18

RDF rewriting conversion<record priref="19319 “ > <date>1651</date> <maker>Rembrandt (1606-1669)</maker> <object.type>etsplaat</object.type> …</record>

am:Record_:bn1

“19319 ”

“1651”

priref

date

am:Personam:p-1234

skos:Conceptam:etsplaat

“1234”

“1606”

am:prirefam:birthdate

“etsplaat”

maker

object.type

“Rembrandt (1606-1669)”

“etsplaat”

am:Recordam:proxy-19319

“19319 ”

“1651”am:priref

am:date

am:maker

am:object.type

“Rembrandt”rda:name

skos:prefLabel

Page 19: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

19

Some statisticsAmsterdam Museum

Rules Resources Predicaes used

Triples

Object metadata 58 73,447 (Proxies)

100 5,700,371

Thesaurus 23 28,000 (Concepts)

13 601,819

Person Auth List 2 66,966 (Persons)

21 301,143

558,161 Proxy-Concept relations 80,432 Proxy-Person relations 243,532 Proxy-Proxy relations

Page 20: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

20

Methods

ClioPatria

XMLRDF

1. XML ingestion

2. Direct transformation to ‘crude’ RDF

3. Interactive RDF restructuring

4. Create a metadata mapping schema

5. Align vocabularies with external sources

6. Publish as Linked Data

Amalgame

Tools

Page 21: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

21

Mapping to EDM

am:proxy_22093 “Job Cohen”am:contentPersonName

rdfs:subPropertyOf

dcterms:subject

Page 22: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

22

Europeana Data Model (EDM)

• Dublin Core for metadata representation– creator, date, title etc.

• SKOS for vocabularies– preferredLabel, hasBroader, etc.

• RDA Group 2 elements for persons– dateOfBirth, name etc.

• OAI-ORE to allow for aggregations etc.

• Some EDM-specific properties– edm:wasPresentAt, …

110 rdfs:subPropertyOf

triples relating AM predicates

to EDM predicates

Page 23: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

23

Methods

ClioPatria

XMLRDF

1. XML ingestion

2. Direct transformation to ‘crude’ RDF

3. Interactive RDF restructuring

4. Create a metadata mapping schema

5. Align vocabularies with external sources

6. Publish as Linked Data

Amalgame

Tools

Page 24: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

24

Amalgame Alignment Platform

semanticweb.cs.vu.nl/amalgame

Page 25: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

25

AM Alignments

• 3500+ links put in RDF– 143 places linked to

GeoNames– 1076 persons linked to

ULAN (VIAF)– 34 persons linked to

DBPedia– 2498 concepts

AATNed.

Page 26: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

26

Methods

ClioPatria

XMLRDF

1. XML ingestion

2. Direct transformation to ‘crude’ RDF

3. Interactive RDF restructuring

4. Create a metadata mapping schema

5. Align vocabularies with external sources

6. Publish as Linked Data

Amalgame

Tools

Page 27: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

27

Architecture

RDF(s) storage

HTTP server

SPARQL

Prolog

Web interface

SPARQL-app Browser

Logic

Purl.org redirect

http://semanticweb.cs.vu.nl/europeana/

cliopatria

Page 28: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

Content negotiation@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .@prefix ore: <http://www.openarchives.org/ore/terms/> .@prefix ens: <http://www.europeana.eu/schemas/edm/> .@prefix ahm: <http://purl.org/collections/nl/am/>

ahm:proxy-66970a ore:Proxy ;ahm:title "Zegelstempel Felix Meritis"@nl ;ahm:material ahm:t-12463 ,

ahm:t-5447 ;ahm:objectCategory ahm:t-5504 ;ahm:objectName ahm:t-13817 ,

ahm:t-8489 ;ahm:objectNumber "KA 7653.1" ;ahm:priref "66970" .

ahm:proxy-66972a ore:Proxy ;ahm:acquisitionDate "0000" ;ahm:title "Zegelstempel mogelijk van

familiewapen"@nl .

28

Page 29: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

Page 30: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study
Page 31: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

31

Wrapping up

Methodology• Stay as close as possible to original (XML) metadata• Separate syntactic transformation, semantic interpretations• Interactive workflow, simple steps• Use rdf schema to map to interoperability layer• Keep provenance, reproducability

Tools• XMLRDF Realised clean workflow for RDF production.• Amalgame: Interactive and transparent vocablary alignment• ClioPatria Semantic server: statistics at any moment + Full

expressivity of Some Prolog

Page 32: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

32

Issues

• Validate with real collection managers– Making good rules is sometimes hard– Graphical tools can help

• Integrate in normal collection workflow (tools)– LD as another view on the data– Live updates

• RDFS reasoning needed to have interoperability

Page 33: Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

[email protected]://semanticweb.cs.vu.nl/lod/am/

ClioPatria: the SWI-Prolog RDF toolkit (includes XMLRDF and Amalgame packages)

http://cliopatria.swi-prolog.org

amsterdammuseum.nl

?