metadata ingestion plan presentation
europeanasounds.eu
Metadata Ingestion Training 23-24 October 2014 NTUA, Athens
Metadata Ingestion Plan
Targets
Reporting progress
Andra Patterson, Metadata Manager, Europeana Sounds
Metadata Ingestion Plan
Takes into account:
• 4 main stages of aggregation
• Needs of data providers for scheduling
• Info from Rights and metadata ingestion survey
• Info from emails, phone calls, etc.
• Targets from the Description of Work (DoW)
Flexible - may need to take into account:
• Changing needs of data providers during project
• Needs of Europeana Ingestion Team
Aggregation – 4 main stages
Content selection
Metadata preparation
Metadata ingestion
Metadata curation
Aggregation – Stage 1
Content selection
Select the objects for which you will provide metadata to Europeana Sounds
• According to selection guidelines in D1.1 Content Selection Policy
• According to figures in Table 0, DoW (part B, p.22-27)
Establish the correct rights statements for the objects
• Use Europeana Available Rights Statements
Aggregation – Stage 2
Metadata preparation
Prepare your metadata and export in .xml or .csv
• Check that mandatory elements are included or can be added
• Check that source metadata is well-formed
• Ensure that digital objects are accessible via links in metadata
• Ensure that objects that can be made available for re-use fit the criteria in the Europeana Content Re-use Framework
– File quality
– Rights
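The first two checks above can be sketched in Python using only the standard library. This is a minimal illustration, not project tooling: the sample records and the MANDATORY list are hypothetical, and the authoritative list of mandatory elements is the one defined in the project's EDM profile.

```python
import xml.etree.ElementTree as ET

# Hypothetical mandatory elements, for illustration only.
MANDATORY = ["title", "rights"]

def is_well_formed(xml_text):
    """Return True if the text parses as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def missing_elements(xml_text, mandatory=MANDATORY):
    """Return mandatory element names that are absent or empty (namespaces ignored)."""
    root = ET.fromstring(xml_text)
    present = set()
    for el in root.iter():
        local = el.tag.rsplit("}", 1)[-1]  # strip any "{namespace}" prefix
        # An element counts as present if it has text or at least one attribute.
        if (el.text or "").strip() or el.attrib:
            present.add(local)
    return [name for name in mandatory if name not in present]

# Hypothetical source records.
good = "<record><title>Dawn chorus</title><rights>CC0</rights></record>"
broken = "<record><title>Dawn chorus</record>"        # unclosed <title>
incomplete = "<record><title>Dawn chorus</title></record>"
```

Running is_well_formed on each exported file before upload catches parsing problems early; missing_elements then shows which records still need fields added.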
Aggregation – Stage 3
Metadata ingestion
Ingest your metadata records using the MINT tool
• MINT
– Web-based tool
– Developed by NTUA
– Used to map, ingest and deliver metadata to Europeana
• Map metadata to schema defined in D1.4 EDM Profile for Sound
Aggregation – Stage 4
Metadata curation
Enrich your metadata records using the MINT tool
• Normalise metadata
• Enrich metadata
• Add controlled vocabulary terms
Targets
Table 0 Underlying Content (Part B, p.22-27) = what we are contracted to achieve
Targets
Progress measured against Performance Monitoring Table (Part B, p.91)
“Available for re-use” Europeana definition:
PDM, CC0, CC-BY, CC-BY-SA
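As a rough sketch, the re-use subset can be identified from the rights statement URI. The prefixes below are the standard Creative Commons URIs for the four statements listed; the function name is hypothetical, and version or jurisdiction suffixes are deliberately ignored.

```python
# URI prefixes for the four statements that count as "available for re-use".
REUSE_PREFIXES = (
    "http://creativecommons.org/publicdomain/mark/",  # PDM
    "http://creativecommons.org/publicdomain/zero/",  # CC0
    "http://creativecommons.org/licenses/by/",        # CC-BY
    "http://creativecommons.org/licenses/by-sa/",     # CC-BY-SA
)

def is_available_for_reuse(rights_uri):
    """Return True if the rights URI is one of PDM, CC0, CC-BY or CC-BY-SA."""
    uri = rights_uri.strip().lower().replace("https://", "http://", 1)
    return uri.startswith(REUSE_PREFIXES)
```

Because each prefix ends with a slash, more restrictive licences such as CC-BY-NC or CC-BY-ND do not match the CC-BY prefix.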
Targets
Targets for each “metadata set”
Set 1: October 2014-January 2015 (Milestone 5)
Set 2: February 2015-January 2016 (no formal Milestone)
Set 3: February 2016-July 2016 (Milestone 6)
Milestones say: “Content and metadata ready for ingestion”
Targets
[Chart: required (minimum) metadata ingestion progress. Vertical axis: 0 to 800,000. Series: Audio, Audio-related, Re-use subset.]
Reporting progress – what to count
• DoW requires us to count digital objects
– Digital objects must be counted the same way as in the DoW
• Audio objects
• Audio-related objects
• Objects “Freely available for re-use”
– These are a subset of the total, not additional items
• Also count metadata records
– Useful to compare what you have prepared for publication with what is actually published on Europeana
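The counting rules above can be illustrated with a small Python sketch. The record structure, field names and figures are hypothetical; the point is that digital objects and metadata records are counted separately, and that the re-use figure is a subset of the object total rather than an addition to it.

```python
from collections import Counter

# Hypothetical records: field names and figures are illustrative only.
records = [
    {"id": "rec-1", "kind": "audio", "objects": 1, "reuse": True},
    {"id": "rec-2", "kind": "audio", "objects": 1, "reuse": False},
    {"id": "rec-3", "kind": "audio-related", "objects": 12, "reuse": True},
]

def summarise(records):
    """Count metadata records, digital objects per kind, and the re-use subset."""
    totals = Counter()
    for rec in records:
        totals["records"] += 1
        totals[rec["kind"]] += rec["objects"]
        totals["objects"] += rec["objects"]
        if rec["reuse"]:
            # Already included in "objects": a subset, not additional items.
            totals["reuse_subset"] += rec["objects"]
    return dict(totals)

summary = summarise(records)
```

Here three metadata records stand for fourteen digital objects, because one record (for example a digitised score) can represent many objects.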
Counting BL digitised sound
• Each line is a metadata record
• One metadata record usually represents one digital object
No duplicates, please!
Keep track internally of what you have supplied to Europeana already for this project and for other Europeana projects – no duplicates!
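One simple way to keep track, sketched in Python: hold the identifiers already supplied (to this or any other Europeana project) in a set and filter each new batch against it. The function and identifier values are hypothetical.

```python
def split_new_and_duplicates(candidate_ids, already_supplied):
    """Separate identifiers not yet supplied from those that would be duplicates."""
    seen = set(already_supplied)
    new, duplicates = [], []
    for identifier in candidate_ids:
        if identifier in seen:
            duplicates.append(identifier)
        else:
            seen.add(identifier)  # also catches repeats within the same batch
            new.append(identifier)
    return new, duplicates

# "c" was supplied earlier; "a" appears twice in the new batch.
new, duplicates = split_new_and_duplicates(["a", "b", "a", "c"], {"c"})
```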
Counting BL digitised printed scores
• Each line is a metadata record
• Number of digital objects counted for DoW Table 0
• One metadata record often represents many digital objects
Reporting progress – how to record
• Record statistics in your Google or Excel spreadsheet
– See Europeana Sounds Manual for Data Providers section 3.3.3 for links to Google spreadsheets (will be active next week!)
• Update your spreadsheet by the third Friday of each month
• Targets
– are based on Table 0, the Metadata Ingestion Survey and emails
– are distributed across the 3 metadata sets
– are the minimum required - feel free to do more!
Sample Google spreadsheet showing targets for BL – edit the orange cells!
Metadata Ingestion Training 23-24 October 2014 NTUA, Athens
Metadata Quality
Meaningful metadata
Rights
Controlled vocabularies
Andra Patterson, Metadata Manager, Europeana Sounds
Metadata Quality
• The richer the metadata, the better for discovery by users
• Europeana Sounds provides an opportunity for us to enhance our metadata and check quality
• EDM mandatory elements ensure a minimum metadata standard
• Metadata Quality Task Force (end 2013-mid 2014)
– Quality of metadata varies between institutions
– Need meaningful information in fields
Metadata Quality – Main Issues
• To aid discovery, metadata needs to provide context to the CHO
– Include a meaningful title and/or description
• Metadata needs to be understandable to
– Humans (e.g. rich descriptions, rights information)
– Machines (e.g. UTF-8 encoding, xml:lang)
• Metadata needs to be standardised
– EDM-compliant
– Controlled vocabularies (edm:type, ebucore:hasGenre)
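Two of these issues lend themselves to automatic checking. The Python sketch below flags records with no meaningful dc:title or dc:description, and text fields with no xml:lang attribute. The placeholder list and the sample records are assumptions for illustration; what counts as "meaningful" is ultimately an editorial judgement.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
PLACEHOLDERS = {"", "untitled", "unknown", "n/a"}  # hypothetical list

def quality_issues(record_xml):
    """Return a list of human-readable quality problems for one record."""
    root = ET.fromstring(record_xml)
    issues = []
    fields = [el for el in root.iter()
              if el.tag in (DC + "title", DC + "description")]
    meaningful = [el for el in fields
                  if (el.text or "").strip().lower() not in PLACEHOLDERS]
    if not meaningful:
        issues.append("no meaningful dc:title or dc:description")
    for el in meaningful:
        if XML_LANG not in el.attrib:
            issues.append("missing xml:lang on " + el.tag.replace(DC, "dc:"))
    return issues

# Hypothetical records.
sample = (
    '<record xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dc:title xml:lang="en">Nightingale song, Surrey</dc:title>'
    '<dc:description>Field recording</dc:description>'
    '</record>'
)
placeholder_only = (
    '<record xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dc:title>Untitled</dc:title>'
    '</record>'
)
```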
Rights
• Establish the rights of your web resources
– May need to discuss with colleagues
– Use information & resources from WP3
• Important to use the most appropriate rights statement for your web resources
– Tells users what they can or can’t do with an object
– Web resources of Public Domain CHOs should be labelled as Public Domain – discuss any issues about this with Andra Patterson or Lisette Kalshoven
Getting Rights Right!
Rights – Public Domain Works
• Europeana Public Domain Charter
– “Digitisation of Public Domain content does not create new rights over it”
• Europeana Sounds Consortium Agreement
– “… where possible … content which is in the Public Domain … will be made available without any access restriction and will be labelled as being in the Public Domain …”
• Some data providers may encounter issues with this, e.g.
– Commercial re-use considered inappropriate
• Academic, artistic, private OK; some commercial re-use considered inappropriate; sponsorship funds provided according to this (ONB)
– Desire to refinance digitisation activities
• Government funding is basic – charging fees for high quality images contributes to refinancing digitisation (ONB)
• However, non-profit institutions run risk of losing non-profit status by earning too much from commercial users! (ONB)
– Legal
• Case law in UK is inconclusive so far (BL)
Rights - EDM
edm:ProvidedCHO dc:rights
– Name of rights holder of CHO, or more general rights information
edm:WebResource dc:rights
– Name of rights holder of a particular web resource, or more general rights information
edm:WebResource edm:rights (Strongly recommended)
– Formal rights statement for a particular web resource
– Overrides statement in ore:Aggregation edm:rights (see below)
– Choose from http://pro.europeana.eu/available-rights-statements
ore:Aggregation edm:rights (Mandatory)
– Formal rights statement for a particular web resource without edm:rights (see above)
– Formal rights statement for a group of web resources without their own edm:rights, when these are attached to one CHO
– Choose with care from http://pro.europeana.eu/available-rights-statements
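The override rule between the two edm:rights properties can be sketched as a small Python function. The function name is hypothetical and the URIs are example values only; it simply returns the statement that applies to a given web resource.

```python
def effective_rights(web_resource_rights, aggregation_rights):
    """Return the rights statement that applies to one web resource.

    edm:rights on the edm:WebResource wins when present; otherwise the
    mandatory edm:rights on the ore:Aggregation applies.
    """
    if not aggregation_rights:
        raise ValueError("ore:Aggregation edm:rights is mandatory")
    return web_resource_rights or aggregation_rights

PDM = "http://creativecommons.org/publicdomain/mark/1.0/"
BY_NC = "http://creativecommons.org/licenses/by-nc/4.0/"

own = effective_rights(PDM, BY_NC)         # web resource has its own statement
inherited = effective_rights(None, BY_NC)  # falls back to the aggregation
```

This is why the aggregation-level statement should be chosen with care: it silently applies to every web resource that lacks its own edm:rights.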
What is this?
Danish pastry
Wieneråtta
Wienerbrød
Kopenhagener Plunder
Dänischer Plunder
Danish
Controlled Vocabularies
• Enable users to search and navigate across different metadata sets
• Important in Europeana Portal, where different data providers use different vocabularies
• Bring together using linked data where possible
– LC Linked Data Service
– VIAF (Virtual International Authority File)
Controlled Vocabularies – Linked Data
VIAF Virtual International Authority File
Shared, Controlled Vocabularies
• EDM vocabularies
– edm:rights: http://pro.europeana.eu/available-rights-statements
– edm:type: TEXT, VIDEO, SOUND, IMAGE, 3D
• Europeana Sounds new vocabularies
– dcterms:medium: Europeana Carrier Types Vocabulary
– ebucore:hasGenre: Europeana Music Genre/Form Vocabulary, Europeana Non-Music Genre/Form Vocabulary
Europeana Vocabularies – Carrier Types
Europeana Carrier Types Vocabulary
DISMARC dmFormats
RDA Carrier Types
dcterms:medium
New Europeana Vocabularies – Genre/Form
Europeana Music Genre/Form Vocabulary
Europeana Non-Music (Generic) Genre/Form Vocabulary
ebucore:hasGenre
DISMARC dmGenre
DBpedia
D1.1 Content Selection Policy broad categories
Freebase
Broad Genre/Form Concepts (Mandatory)
Europeana Music Genre/Form Vocabulary
Europeana Non-Music (Generic) Genre/Form Vocabulary
Broad Genre (Mandatory)
• Music
• Spoken word
• Radio
• Environment
ebucore:hasGenre
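A minimal Python sketch of checking a record against the two closed lists mentioned in these slides: the edm:type values and the four mandatory broad genres. It compares plain labels as a simplifying assumption; in real records ebucore:hasGenre may carry vocabulary concept identifiers rather than labels, and the function name is hypothetical.

```python
EDM_TYPES = {"TEXT", "VIDEO", "SOUND", "IMAGE", "3D"}
BROAD_GENRES = {"music", "spoken word", "radio", "environment"}

def vocabulary_errors(edm_type, genres):
    """Return problems with edm:type and the broad ebucore:hasGenre value."""
    errors = []
    if edm_type not in EDM_TYPES:
        errors.append("edm:type must be one of " + ", ".join(sorted(EDM_TYPES)))
    # At least one genre value must be one of the four broad concepts.
    if not any(g.strip().lower() in BROAD_GENRES for g in genres):
        errors.append("no broad genre in ebucore:hasGenre")
    return errors
```

For example, vocabulary_errors("SOUND", ["Music", "Folk song"]) returns an empty list, while a record typed "Audio" with only a narrow genre term fails both checks.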
More About Controlled Vocabularies
• Europeana Sounds Manual for Data Providers section 4.5 has links to recommended vocabularies
– Genre/Form
– Subjects
– Places
– Carrier types
– Digital formats
– Medium of performance
– Names
– Roles
– Works