Europeana: Update on Metadata Mapping and Normalisation, Content Ingestion and Aggregation Activities Robina Clayphan Interoperability Manager, EDLF ECDL Workshop – Harvesting Metadata: Practices and Challenges September 30 2009

Introduction

• A look at the metadata schema we use and the elements that must be in a standard form

• The whole ingestion process

• Summary of the aspects of and approach to aggregation

Europeana

Europeana brings together and makes available digital content from:

• Four cultural heritage sectors
  • Museums, archives, libraries, audio-visual archives

• Twenty-nine countries
  • EU plus Norway and Switzerland

• Twenty-six languages

• Four types of material
  • Image, sound, video, text

….need for a metadata lingua franca…

ESE V3.2

Europeana Semantic Elements (ESE) V3.2 developed for the prototype

• A Dublin Core-based application profile
  • Cross-domain schema for heterogeneous data
  • Not intended to capture the full semantics of providers' data

•37 Dublin Core terms – used principally to describe the objects

•12 Europeana coined terms - used to support portal functionality

• Needed to have consistent data for the portal to work

The Dublin Core elements

Title: Alternative
Creator
Subject
Description: TableOfContents
Publisher
Contributor
Date: Created; Issued
Type
Format: Extent; Medium
Identifier
Source
Language
Relation: isVersionOf; hasVersion; isReplacedBy; replaces; isRequiredBy; requires; isPartOf; hasPart; isReferencedBy; references; isFormatOf; hasFormat; conformsTo
Coverage: Spatial; Temporal
Rights
Provenance

Europeana elements

Element | Who is responsible | Function

europeana:isShownAt or europeana:isShownBy | Provider must provide at least one of these elements, both if applicable. URL. | Links to object

europeana:object | Provider, if appropriate to the data. URL. | Source of thumbnail

europeana:provider | Provider must provide this element. Controlled list. | Facet

europeana:type | Provider must provide this element. Controlled list. | Facet

europeana:unstored | Provider, only if appropriate to the data. Text string. | Container element

europeana:country | Europeana | Facet

europeana:hasObject | Europeana | System use

europeana:language | Europeana | Facet

europeana:uri | Europeana | System identifier

europeana:usertag | Europeana | User-provided tags (future)

europeana:year | Europeana | Facet, timeline
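The provider-side rules in this table can be sketched as a small pre-submission check. This is only an illustration, not the Content Checker itself: the record layout (a dict mapping element names to lists of values) and the function name `missing_provider_elements` are assumptions made for the example.

```python
from typing import Dict, List

# Assumed layout for this sketch: element name -> list of string values.
Record = Dict[str, List[str]]


def missing_provider_elements(record: Record) -> List[str]:
    """Return the mandatory provider-supplied elements that are absent."""

    def has(name: str) -> bool:
        # An element counts as present only if it has a non-blank value.
        return any(value.strip() for value in record.get(name, []))

    problems: List[str] = []

    # At least one of the two link elements must be supplied.
    if not (has("europeana:isShownAt") or has("europeana:isShownBy")):
        problems.append("europeana:isShownAt or europeana:isShownBy")

    # These two elements are always required from the provider.
    for name in ("europeana:provider", "europeana:type"):
        if not has(name):
            problems.append(name)

    return problems
```

An empty result means the record carries every element the provider is obliged to supply; it says nothing about whether the values come from the controlled lists.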

Normalised elements

• Language
  • ISO 639-1 standard two-character code

• Country
  • ISO 3166 standard

• Year
  • Four-digit year from the Gregorian calendar (YYYY)
  • Generated where possible from the date supplied in <dc:date>

• Provider
  • Controlled list of names, in the language of the provider

• Type
  • Controlled list (in English) of four types: Text, Image, Sound, Video
  • Mapped by the provider from the diverse types used in the source data
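Two of these normalisations, deriving a year from a supplied date and mapping local type names onto the four Europeana types, can be sketched as follows. This is a minimal illustration under stated assumptions: the rule "take the first standalone four-digit number" and the contents of `PROVIDER_TYPE_MAP` are hypothetical and are not taken from the Normalisation Guidelines.

```python
import re
from typing import Optional

# Matches a run of exactly four digits that is not part of a longer number.
_YEAR = re.compile(r"(?<!\d)(\d{4})(?!\d)")

# The four controlled values named on the slide.
EUROPEANA_TYPES = ("Text", "Image", "Sound", "Video")

# Hypothetical provider vocabulary; each provider would supply its own.
PROVIDER_TYPE_MAP = {
    "book": "Text",
    "photograph": "Image",
    "oral history recording": "Sound",
    "film": "Video",
}


def year_from_date(dc_date: str) -> Optional[str]:
    """Return the first standalone four-digit year in a date string, if any."""
    match = _YEAR.search(dc_date)
    return match.group(1) if match else None


def map_type(provider_type: str) -> Optional[str]:
    """Map a provider's own type label to one of the four Europeana types."""
    return PROVIDER_TYPE_MAP.get(provider_type.strip().lower())
```

Returning `None` rather than guessing reflects the "where possible" wording: a date such as "19th century" yields no year, and an unmapped type is left for the provider to resolve.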

Mapping and Normalisation

Three key reference documents for providers:

•ESE Specification V3.2

•Normalisation Guidelines V1.2

•ESE V3.2 XML schema + explanatory text

All available from the “Provide Content” section of the Europeana Group pages:

http://group.europeana.eu/web/guest/provide_content

Content Ingestion

……starting right from the beginning

Global Europeana ingestion workflow

Activity diagram: Steps I5 to I8

Content Ingestion

• Europeana has provided a Content Checker tool which has two parts:

• The Content Ingestor
  • Allows uploading of a data set
  • Validation against the ESE V3.2 XML schema
  • Importing the data into the database
  • Indexing of data
  • Caching of thumbnails

• The Test Portal
  • Separate from the operational portal
  • Allows the provider to search for uploaded data
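The Content Ingestor steps listed above form an ordered sequence, which can be sketched as a simple staged pipeline. The stage names follow the slides; everything else is an assumption for illustration, including the `run_ingestion` function, the handler callables, and the choice to stop at the first failing stage.

```python
from typing import Callable, Dict, List, Tuple

# Stage order as presented on the Content Ingestor slides.
STAGES = ["upload", "validate", "import", "index", "cache"]

# Hypothetical handler type: takes a data set ID, returns True on success.
Handler = Callable[[str], bool]


def run_ingestion(data_set_id: str, handlers: Dict[str, Handler]) -> List[Tuple[str, str]]:
    """Run each stage in order and record its outcome.

    Assumption of this sketch: a failed stage halts the run, so that
    data failing validation is never imported, indexed, or cached.
    """
    log: List[Tuple[str, str]] = []
    for stage in STAGES:
        succeeded = handlers[stage](data_set_id)
        log.append((stage, "ok" if succeeded else "failed"))
        if not succeeded:
            break
    return log
```

A provider-facing tool built this way would show which step a data set reached, which is the information the ingestor screens present one step at a time.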

Content Ingestor

Select “new data set” - the ingestor automatically creates a new ID – “null05” in this example

Content Ingestor - upload

Content Ingestor - validate

Import

Index

Cache

Test Portal - search

Aggregation and the Content Strategy

Move on to a look at various aspects of aggregation in Europeana – the need for it, the approach to it.

Aggregation - terminology

• A Content Provider
  • an organization that provides metadata that enables access to its digital objects

• An Aggregator
  • collects metadata from a group of content providers
  • transmits them to Europeana
  • helps content providers with guidance on conformance with Europeana norms
  • transforms metadata if necessary
  • supports the content providers with administration, operations and training
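The collect, transform, and transmit roles of an aggregator can be sketched in a few lines. This is an illustration only: the record layout (a flat dict of field name to value), the function names `aggregate` and `rename_fields`, and the example field names are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Assumed layout for this sketch: a flat mapping of field name -> value.
Record = Dict[str, str]
Transform = Callable[[Record], Record]


def rename_fields(field_map: Dict[str, str]) -> Transform:
    """Build a transform that renames local field names to target names."""

    def transform(record: Record) -> Record:
        # Fields without a mapping keep their original name.
        return {field_map.get(key, key): value for key, value in record.items()}

    return transform


def aggregate(
    batches: Dict[str, Iterable[Record]],
    transforms: Optional[Dict[str, Transform]] = None,
) -> List[Record]:
    """Collect records from several providers into one batch.

    A provider's records are transformed only if a transform is
    registered for that provider ("transforms metadata if necessary").
    """
    transforms = transforms or {}
    combined: List[Record] = []
    for provider, records in batches.items():
        transform = transforms.get(provider)
        for record in records:
            combined.append(transform(record) if transform else dict(record))
    return combined
```

The returned list stands in for the single delivery an aggregator transmits on behalf of its providers.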

Roles and benefits

• Content providers
  • Know their content and data best, so make fewer mapping errors
  • Can look at the results before they are ingested into the operational system

• Aggregators
  • Know the needs of the providers (domain, level)
  • Play a bridging role between providers and Europeana: single point of contact, conduit for information in both directions

• Europeana
  • Supporting role for consultation, co-ordination, standardisation
  • Management of the 10 million objects
  • Offers the cross-domain and multi-lingual service

Organisational Model

[Organisational diagram: Europeana, aggregators, and many institutes]

Types of aggregator

Matrix of aggregators:

• cross-domain, single domain, thematic

• level of operation – regional, national, European, global

Domain / geographic coverage: Regional, National, European, Worldwide

• Cross-domain (horizontal): Thuis in Brabant; CulturaItalia; Europeana

• Single-domain (vertical): MovE (museums in East Flanders); Direcção-Geral de Arquivos (Portuguese archives); Dismarc (music); TEL (books); EFG (movies); World Digital library; WorldCat

• Thematic, cross-domain: Judaica; ArXiv.org

• Thematic, single-domain: Great War Archive

Why aggregation?

• November 2008 – 5 million items in Europeana

• July 2009 - content from over 1000 providers

• July 2010 – target of 10 million items

• Many individual organisations asking to contribute

• Currently there are six projects that aggregate content for Europeana (amongst other objectives)

• another three projects starting later this year

• Europeana Group site at: http://group.europeana.eu/web/guest/home

Why aggregation?

• Labour-intensive administration and ingestion processes
  • Not due to the amount of data, but to the number of organisations

• Aggregation provides economies of scale allowing Europeana Office to remain relatively small

Promoting aggregation and providing services and expertise to aggregators will be key to Europeana’s Content Strategy

• Europeana is a small organisation!

Aggregation activities

• Aggregators survey
  • Establish shared issues and need for support

• Formation of Aggregators group
  • Council of Content Providers and Aggregators is now part of Europeana Governance structure

• Training for aggregators
  • Generic and bespoke training days as the need arises

• Identifying potential aggregators

• “EuropeanaLabs” for Aggregators

• Test environment for content delivery and/or software development

Aggregation activities

• Handbook for aggregators. Content to be decided as part of the survey but likely to cover:
  • Europeana source code, APIs, content checker etc.
  • Technical documentation for participating in Europeana
  • Templates and documentation for budget planning, fundraising, revenue generation, sustainability
  • Templates and documentation for administrative and organisational aspects of running an aggregator
  • Templates and documentation on IPR and European Licensing framework
  • Documentation for establishing political and network support
  • Templates and documentation for dissemination activities
  • Wiki for aggregator issues

Thank you!

[email protected]