eswc2012 presentation: supporting linked data production for cultural heritage institutes: the...
DESCRIPTION
Within the cultural heritage field, proprietary metadata and vocabularies are being transformed into public Linked Data. These efforts have mostly been at the level of large-scale aggregators such as Europeana where the original data is abstracted to a common format and schema. Although this approach ensures a level of consistency and interoperability, the richness of the original data is lost in the process. In this paper, we present a transparent and interactive methodology for ingesting, converting and linking cultural heritage metadata into Linked Data. The methodology is designed to maintain the richness and detail of the original metadata. We introduce the XMLRDF conversion tool and describe how it is integrated in the ClioPatria semantic web toolkit. The methodology and the tools have been validated by converting the Amsterdam Museum metadata to a Linked Data version. In this way, the Amsterdam Museum became the first `small' cultural heritage institution with a node in the Linked Data cloud.TRANSCRIPT
Supporting Linked Data Production for Cultural Heritage institutes:
The Amsterdam Museum Case Study
Victor de Boer, Jan Wielemaker, Judith van Gent, Michiel Hildebrand, Antoine Isaac, Jacco van Ossenbruggen, Guus Schreiber
EuropeanaConnect
2
Aggregator
3
“Europeana enables people to explore the digital resources of Europe's museums, libraries, archives and audio-visual collections.’’
www.europeana.eu
From portal… …to data aggregator.
Europeana
4
data.europeana.eu
2.4 Million objects exposed as Linked Data.
8 aggregators, 200 institutions, 15 countries
Europeana Semantic Elements converted to RDF Europeana Data Model (EDM)
5
Aggregate and convert
Linked data-ify
6
Aggregate and convert
Linked data-ify
Convert to Linked dataMapped to EDM
7
Methodology and tool stack
• Focus on transparency and interactivity– Reproducability – Both in conversion and alignment
• Maintain detail and complexity of original data
• Interoperability through schema mapping
8
Methods
ClioPatria
XMLRDF
1. XML ingestion (OAI)
2. Direct transformation to ‘crude’ RDF
3. Interactive RDF restructuring
4. Create a metadata mapping schema
5. Align vocabularies with external sources
6. Publish as Linked Data
Amalgame
Tools
cliopatria.swi-prolog.org powered by
9
Case study:Amsterdam Museum
• Formerly Amsterdam Historic Museum– “The rich collection of works of art,
objects and archaeological finds brings to life the fortunes of Amsterdammers of days gone by and today.”
• In March 2010 published their whole collection online– 73.000 objects– CC license
10
Methods
ClioPatria
XMLRDF
1. XML ingestion
2. Direct transformation to ‘crude’ RDF
3. Interactive RDF restructuring
4. Create a metadata mapping schema
5. Align vocabularies with external sources
6. Publish as Linked Data
Amalgame
Tools
11
Ingested AM metadata
• Adlib database XML API
• Object metadata • 73.000 objects, 256MB • Nested XML
• Concept Thesaurus• 27.000, 9MB• Different types (geo,motif, event)
• Person Authority File• 67.000 persons, 10MB• Consolidated from object metadata fields• Creators, annotators, reproduction creators,
institutions,
<record priref="10541“ > <acquisition.date>1997</acquisition.date> <dimension> <dimension.type>hoogte</dimension.type> <dimension.unit>cm</dimension.unit> <dimension.value>6</dimension.value> </dimension> …</record>
<record priref="28024“ > <term>Kalverstraat 124</term> <broader_term>Kalverstraat</broader_term> <term.type>GEOKEYW </term.type> </record>
<record priref="6" > <biography>boekverkoper en uitgever van cartografie</biography> <birth.date.start>1659</birth.date.start> <death.date.start>1733</death.date.start> <name>Aa, Pieter van der</name> <nationality>Nederlands</nationality> <use>Aa, Pieter van der (I)</use> </record>
12
XMLRDF (1)Syntactic RDF conversion
<record priref="19319 “ > <date>1651</date> <maker>Rembrandt (1606-1669)</maker> <object.type>etsplaat</object.type> …</record>
am:Record_:bn1
“19319 ”
“1651”
priref
date
maker
object.type
“Rembrandt (1606-1669)”
“etsplaat”
XML-Element is attributes + content Map to RDF blank-node + attributes
Attributes → Literals (+xml:lang) Content
If plain → Literal (+xml:lang) Otherwise → RDF blank node (recursive)
13
ClioPatria:Intermediate Statistics
14
Methods
ClioPatria
XMLRDF
1. XML ingestion
2. Direct transformation to ‘crude’ RDF
3. Interactive RDF restructuring
4. Create a metadata mapping schema
5. Align vocabularies with external sources
6. Publish as Linked Data
Amalgame
Tools
15
XMLRDF Graph rewrite rule language
Declarative committed-choice language based on CHR (Constraint Handling Rules) Triples <=> Guard, NewTriples Keep \ Triples <=> Guard, NewTriples
Example
17
AM rewriting rules examples
18
RDF rewriting conversion<record priref="19319 “ > <date>1651</date> <maker>Rembrandt (1606-1669)</maker> <object.type>etsplaat</object.type> …</record>
am:Record_:bn1
“19319 ”
“1651”
priref
date
am:Personam:p-1234
skos:Conceptam:etsplaat
“1234”
“1606”
am:prirefam:birthdate
“etsplaat”
maker
object.type
“Rembrandt (1606-1669)”
“etsplaat”
am:Recordam:proxy-19319
“19319 ”
“1651”am:priref
am:date
am:maker
am:object.type
“Rembrandt”rda:name
skos:prefLabel
19
Some statisticsAmsterdam Museum
Rules Resources Predicaes used
Triples
Object metadata 58 73,447 (Proxies)
100 5,700,371
Thesaurus 23 28,000 (Concepts)
13 601,819
Person Auth List 2 66,966 (Persons)
21 301,143
558,161 Proxy-Concept relations 80,432 Proxy-Person relations 243,532 Proxy-Proxy relations
20
Methods
ClioPatria
XMLRDF
1. XML ingestion
2. Direct transformation to ‘crude’ RDF
3. Interactive RDF restructuring
4. Create a metadata mapping schema
5. Align vocabularies with external sources
6. Publish as Linked Data
Amalgame
Tools
21
Mapping to EDM
am:proxy_22093 “Job Cohen”am:contentPersonName
rdfs:subPropertyOf
dcterms:subject
22
Europeana Data Model (EDM)
• Dublin Core for metadata representation– creator, date, title etc.
• SKOS for vocabularies– preferredLabel, hasBroader, etc.
• RDA Group 2 elements for persons– dateOfBirth, name etc.
• OAI-ORE to allow for aggregations etc.
• Some EDM-specific properties– edm:wasPresentAt, …
110 rdfs:subPropertyOf
triples relating AM predicates
to EDM predicates
23
Methods
ClioPatria
XMLRDF
1. XML ingestion
2. Direct transformation to ‘crude’ RDF
3. Interactive RDF restructuring
4. Create a metadata mapping schema
5. Align vocabularies with external sources
6. Publish as Linked Data
Amalgame
Tools
24
Amalgame Alignment Platform
semanticweb.cs.vu.nl/amalgame
25
AM Alignments
• 3500+ links put in RDF– 143 places linked to
GeoNames– 1076 persons linked to
ULAN (VIAF)– 34 persons linked to
DBPedia– 2498 concepts
AATNed.
26
Methods
ClioPatria
XMLRDF
1. XML ingestion
2. Direct transformation to ‘crude’ RDF
3. Interactive RDF restructuring
4. Create a metadata mapping schema
5. Align vocabularies with external sources
6. Publish as Linked Data
Amalgame
Tools
27
Architecture
RDF(s) storage
HTTP server
SPARQL
Prolog
Web interface
SPARQL-app Browser
Logic
Purl.org redirect
http://semanticweb.cs.vu.nl/europeana/
cliopatria
Content negotiation@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .@prefix ore: <http://www.openarchives.org/ore/terms/> .@prefix ens: <http://www.europeana.eu/schemas/edm/> .@prefix ahm: <http://purl.org/collections/nl/am/>
ahm:proxy-66970a ore:Proxy ;ahm:title "Zegelstempel Felix Meritis"@nl ;ahm:material ahm:t-12463 ,
ahm:t-5447 ;ahm:objectCategory ahm:t-5504 ;ahm:objectName ahm:t-13817 ,
ahm:t-8489 ;ahm:objectNumber "KA 7653.1" ;ahm:priref "66970" .
ahm:proxy-66972a ore:Proxy ;ahm:acquisitionDate "0000" ;ahm:title "Zegelstempel mogelijk van
familiewapen"@nl .
28
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
31
Wrapping up
Methodology• Stay as close as possible to original (XML) metadata• Separate syntactic transformation, semantic interpretations• Interactive workflow, simple steps• Use rdf schema to map to interoperability layer• Keep provenance, reproducability
Tools• XMLRDF Realised clean workflow for RDF production.• Amalgame: Interactive and transparent vocablary alignment• ClioPatria Semantic server: statistics at any moment + Full
expressivity of Some Prolog
32
Issues
• Validate with real collection managers– Making good rules is sometimes hard– Graphical tools can help
• Integrate in normal collection workflow (tools)– LD as another view on the data– Live updates
• RDFS reasoning needed to have interoperability
[email protected]://semanticweb.cs.vu.nl/lod/am/
ClioPatria: the SWI-Prolog RDF toolkit (includes XMLRDF and Amalgame packages)
http://cliopatria.swi-prolog.org
amsterdammuseum.nl
?