Linking cultural heritage with KOS: the Europeana example
Valentine Charles
Evolution and variation of classification systems – KnoweScape, Amsterdam, 05.03.2015
Context
→ Aggregates metadata from the cultural heritage sector in Europe
• Libraries, museums, archives and audio-visual archives
• Metadata in 33 languages
→ Provides a portal for users to access data and objects
• http://www.europeana.eu/ in 31 languages
• Metadata under Creative Commons Zero - public domain
• Previews and links to source
→ Data distributed via
• API http://labs.europeana.eu/api/
• Linked Data (currently being updated) http://data.europeana.eu/
Europeana.eu, Europe’s cultural heritage portal
40M objects from 2,200 galleries, museums, archives and libraries
Create a new data framework for richer metadata
→ Europeana Data Model (EDM)
• Re-uses several existing Semantic Web-based models: Dublin Core, OAI-ORE, SKOS, CIDOC-CRM…
• More granular metadata
• Links, e.g. between objects and context entities (persons, places)
• Multilingual & semantic linked data for contextual resources (e.g. Concepts)
→ EDM gives support for contextual resources (semantic layer)
Rely on KOS to solve a problem of data integration
→ Create a “semantic layer” on top of connected cultural heritage objects
• Include multilingual “value vocabularies”
• From Europeana’s providers or from third-party data sources
Contextual entities
Representing (real-world) entities related to a provided object as fully fledged resources, not just strings
• edm:Agent: foaf:name, skos:altLabel, rdaGr2:biographicalInformation, rdaGr2:dateOfBirth
• skos:Concept: skos:prefLabel, skos:altLabel, skos:broader, skos:related, skos:definition…
• edm:TimeSpan: skos:prefLabel, dcterms:isPartOf, edm:begin, edm:end…
• edm:Place: wgs84_pos:lat, wgs84_pos:long, skos:prefLabel, skos:note, dcterms:isPartOf…
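The difference between a string value and a fully fledged contextual resource can be sketched as follows. This is a minimal illustration in plain Python, not Europeana code: the class names, the `labels_for` helper and the `example.org` / `urn:example` identifiers are all hypothetical, and only a few of the properties listed above are modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Concept:
    """A skos:Concept modelled as a resource with its own URI and properties."""
    uri: str
    pref_label: Dict[str, str] = field(default_factory=dict)  # language tag -> skos:prefLabel
    broader: Optional[str] = None  # URI of the skos:broader concept, if any


@dataclass
class ProvidedCHO:
    """A provided object whose dc:type is either a plain string or a concept URI."""
    uri: str
    title: str
    dc_type: str


# Hypothetical concept and objects, for illustration only.
concept = Concept(
    uri="http://example.org/concept/hourglass",
    pref_label={"en": "hourglasses", "nl": "uurglazen"},
)
string_only = ProvidedCHO("urn:example:object:1", "Hourglass", dc_type="hourglass")
linked = ProvidedCHO("urn:example:object:2", "Hourglass", dc_type=concept.uri)


def labels_for(cho: ProvidedCHO, concepts: Dict[str, Concept]) -> list:
    """Return every label reachable from the object's dc:type value."""
    resource = concepts.get(cho.dc_type)
    if resource is None:
        # A bare string carries no further context.
        return [cho.dc_type]
    return sorted(resource.pref_label.values())


known = {concept.uri: concept}
print(labels_for(string_only, known))
print(labels_for(linked, known))
```

The object that points to a URI inherits every label of the concept, while the string-only object is limited to the one value the provider typed in.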
Encourage data providers to contribute their own vocabularies
→ Benefit from data links made at data providers’ level
→ Ingestion of vocabularies is possible if the vocabularies use the data structures EDM expects
• For instance SKOS for concepts
→ For other vocabularies, Europeana does custom mappings
An example: the integration of AAT URIs in EDM
[Diagram: an edm:ProvidedCHO (urn:imss:instrument:401058, "Hourglass") linked via dc:type to an AAT skos:Concept carrying the skos:prefLabel values hourglasses@en, uurglazen@nl and reloj de las horas@es, itself linked via skos:broader to a second AAT concept. AAT URIs shown: http://vocab.getty.edu/aat/300206197 and http://vocab.getty.edu/aat/300198626]
Demo with AAT and PartagePlus vocabularies
→ http://www.europeana.eu/portal/search.html?query=sabliers&rows=24&qf=PROVIDER%3A%22Museo+Galileo+-+Istituto+e+Museo+di+Storia+della+Scienza%22&qt=false
→ http://www.europeana.eu/portal/search.html?query=Brooch&rows=24&qf=PROVIDER%3A%22Partage+Plus%22&qt=false
Challenge #1
→ Europeana needs to regularly check that vocabularies have not changed at source:
• Changes in concepts’ identifiers
• Changes in the description of concepts (which would require a new mapping)
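One way to picture such a check is a comparison between a cached snapshot of a vocabulary and a fresh copy from the source. The sketch below is only an assumption about how this could be done, not a description of Europeana's tooling: the snapshot structure, the `diff_vocabulary` function and the `example.org` URIs are hypothetical.

```python
from typing import Dict, List

# A snapshot maps each concept URI to its labels by language tag.
Snapshot = Dict[str, Dict[str, str]]


def diff_vocabulary(cached: Snapshot, current: Snapshot) -> Dict[str, List[str]]:
    """Report concepts whose identifier disappeared, appeared, or whose description changed."""
    cached_uris, current_uris = set(cached), set(current)
    return {
        # Identifiers no longer present at source (possibly renamed or withdrawn).
        "removed": sorted(cached_uris - current_uris),
        # Identifiers that are new at source.
        "added": sorted(current_uris - cached_uris),
        # Same identifier, different description: the mapping may need review.
        "changed": sorted(
            uri for uri in cached_uris & current_uris if cached[uri] != current[uri]
        ),
    }


# Hypothetical data, for illustration only.
cached = {
    "http://example.org/c/1": {"en": "hourglasses"},
    "http://example.org/c/2": {"en": "brooches"},
}
current = {
    "http://example.org/c/1": {"en": "hourglasses", "nl": "uurglazen"},
    "http://example.org/c/3": {"en": "brooches"},
}

report = diff_vocabulary(cached, current)
print(report)
```

In this toy data the second concept changed identifier, which shows up as one removed and one added URI, and the first concept gained a label, which shows up as a changed description.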
Challenge #2
→ Some of the vocabularies supported by Europeana have been developed by projects
• Issue of sustainability: who maintains the vocabulary when the project ends? What happens to the data?
Europeana also manages its own vocabulary: WWI example
→ Europeana developed a series of domain-specific “sub-sites”
→ Europeana 1914-1918 (http://www.europeana1914-1918.eu/) developed its own vocabulary based on a subset of LCSH
• Terms translated into 10 languages and linked to id.loc.gov
• Published in SKOS via the OpenSkos vocabulary service
Challenge #3
→ Creation of caches of existing LOD vocabularies
• Europeana needs to keep track of the updates at the vocabulary provider side.
→ The enrichment done on the Europeana side lives separately from the source vocabulary.
Multilingual Access to Subjects (MACS)
→ MACS project has produced manual and semi-automatic alignments between:
• Library of Congress Subject Headings (LCSH)
• RAMEAU
• Schlagwortnormdatei (SWD)
⇒ 120,000 links created
→ MACS is integrated in The European Library as links included in all bibliographic data.
An example of a MACS record before and after additions by The European Library:
- ARK identifiers
- LOD URIs
Automatic enrichment based on KOS
Goal: Contextualization which goes beyond the scope of a particular platform
[Diagram: Object, External Dataset and Vocabulary]
Automatic enrichment process in Europeana
Analysis
• Metadata fields in resource descriptions
• Selection of potential rules to match
Linking
• Matching the values of the metadata fields to values of the contextual resources
• Adding contextual links
Augmentation
• Selecting the values from the contextual resource
• Augmentation of the index with the labels picked from the vocabulary
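The three phases can be sketched as a small pipeline. This is a simplified illustration under stated assumptions, not the Europeana enrichment framework: the `RULES` and `VOCAB` structures, the function names, the exact-match strategy and the `example.org` concept are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical rules: which metadata fields are matched against which vocabulary.
RULES: Dict[str, str] = {"dc:type": "concepts", "dc:subject": "concepts"}

# Hypothetical target vocabulary: vocabulary name -> concept URI -> labels by language.
VOCAB: Dict[str, Dict[str, Dict[str, str]]] = {
    "concepts": {
        "http://example.org/concept/hourglass": {
            "en": "hourglasses",
            "nl": "uurglazen",
            "fr": "sabliers",
        },
    }
}

Record = Dict[str, List[str]]
Link = Tuple[str, str, str]  # (field, concept URI, vocabulary name)


def analyse(record: Record) -> List[Tuple[str, str, str]]:
    """Analysis: keep only the field values for which a matching rule exists."""
    return [
        (name, value, RULES[name])
        for name, values in record.items()
        if name in RULES
        for value in values
    ]


def link(candidates: List[Tuple[str, str, str]]) -> List[Link]:
    """Linking: match field values to vocabulary labels (case-insensitive exact match)."""
    links = []
    for name, value, vocab_name in candidates:
        wanted = value.strip().lower()
        for uri, labels in VOCAB[vocab_name].items():
            if wanted in {label.lower() for label in labels.values()}:
                links.append((name, uri, vocab_name))
    return links


def augment(record: Record, links: List[Link]) -> Set[str]:
    """Augmentation: add the labels of every linked resource to the index terms."""
    terms = {value.lower() for values in record.values() for value in values}
    for _, uri, vocab_name in links:
        terms.update(label.lower() for label in VOCAB[vocab_name][uri].values())
    return terms


def enrich(record: Record) -> dict:
    links = link(analyse(record))
    return {
        "links": [(name, uri) for name, uri, _ in links],
        "index_terms": sorted(augment(record, links)),
    }


result = enrich({"dc:title": ["Clessidra"], "dc:type": ["Hourglasses"]})
print(result)
```

After augmentation the record can be found with a French or Dutch query even though the provider only supplied an English type value.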
Vocabularies selection requirements
In the context of Europeana a target vocabulary should be:
→ Technically available (through Linked Data or in dedicated repositories), properly documented, and in open access;
→ Well connected to other vocabularies, e.g. equivalent elements in other vocabularies are indicated;
• Key to avoid duplication and redundancy
→ Multilingual
Enrichment Types and Vocabularies
Enrichment Type    Target vocabulary    Source metadata fields
Places             GeoNames             dcterms:spatial, dc:coverage
Concepts           GEMET, DBpedia       dc:subject, dc:type
Agents             DBpedia              dc:creator, dc:contributor
Time               Semium Time          dc:date, dc:coverage, dcterms:temporal, edm:year
Challenge #4
→ A significant change in the target vocabulary implies
• An update of the retrieved RDF files and a new deployment of the enrichment framework (and/or)
• An update of the enrichment rules
Challenge #5
→ Europeana data providers might also perform enrichment on their side
→ Europeana currently has no mechanism to separate the (curated) links to contextual resources by data providers from (automatic) enrichments by providers.
Challenge #6
→ Automatic enrichment has flaws and problems
• For instance linking any print to the physical “pressure” concept because of its German “Druck” alternative label.
→ Incorrect enrichments lead to
• Devaluation of curated metadata
• Loss of trust from providers
• Irrelevant search results
• Bad user experiences
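The “Druck” case can be reproduced with a few lines of naive label matching. The sketch below only illustrates how such a false link can arise; the concept structure, the `naive_match` function, the sample record and the `example.org` URI are hypothetical and do not reflect how Europeana actually matches values.

```python
# Hypothetical "pressure" concept with a German alternative label.
PRESSURE = {
    "uri": "http://example.org/concept/pressure",
    "prefLabel": {"en": "pressure"},
    "altLabel": {"de": ["Druck"]},
}


def naive_match(value: str, concept: dict) -> bool:
    """Match a field value against every preferred and alternative label, in any language."""
    labels = list(concept["prefLabel"].values())
    for alternatives in concept["altLabel"].values():
        labels.extend(alternatives)
    return value.strip().lower() in {label.lower() for label in labels}


# A print described in German: "Druck" here means a printed work, not physical pressure.
print_record = {"dc:title": "Ansicht von Amsterdam", "dc:type": "Druck"}

wrongly_linked = naive_match(print_record["dc:type"], PRESSURE)
print(wrongly_linked)
```

String equality alone cannot tell the two senses of the word apart, so the print ends up linked to the physics concept.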
To conclude
→ Europeana continues to focus on pivot vocabularies such as Wikidata and Agrovoc to improve its search and retrieval services.
→ We are now investigating how to use more domain-specific vocabularies for dedicated services.
→ We also work on the definition of best practices and evaluation methods for enrichment
• http://pro.europeana.eu/get-involved/europeana-tech/europeanatech-task-forces/evaluation-and-enrichments