sshelco 2016 metadata workshop
Plays Well with Others:
Getting Your Digital Collections Metadata Ready
for the World
2016 SSHELCO Annual
March 17, 2016
Linda Ballinger, Penn State
Doreva Belfiore, Temple University
Bill Fee, State Library of Pennsylvania
Leanne Finnigan, Temple University
Kristen Yarmey, University of Scranton
Elise Warshavsky, Temple University
The Pennsylvania Digital Collections Project (PDCP) Metadata team!
On the agenda
PDCP/DPLA Overview
Meet the aggregator
Why metadata matters
Field by Field metadata madness!
  Derived Fields
  Required Fields
  Highly Recommended Fields
  Recommended Fields
  Optional Fields
Before we start
Q&A throughout
Fun breaks at panelists’ discretion!
Slides and guidelines will be available.
Most of all: Don’t panic. We’re all in this together.
PDCP Project Overview
Toward a PA DPLA Hub
August 2014: meeting at the Free Library of Philadelphia
Initiated by Joe Lucia and Stacey Aldrich, former PA State Librarian
Including representatives from a number of institutions across the state
Founders
Why get involved?
DPLA as major discoverability conduit:
  Worldwide exposure for PA content
DPLA as a means of working efficiently:
  Collaboration at the cross-institutional level
  Taking advantage of economy of scale
  DPLA portal / API vs. customized silos
DPLA Hub and Spoke Model
Content Hubs:
  Single institutions, 200K+ objects, e.g. NARA, HathiTrust, NYPL
Service Hubs:
  Content aggregation for many institutions
  State/regional level; ideally 1:1 ratio
  Digital Commonwealth (MA), Mountain West Digital Library, Empire State Digital Network (NY)
Digitization and Repository Support Activities
Digitization:
  For organizations that have not started digitizing materials, or have not done much
  Potential for remote, local and mobile digitization options (a.k.a. “scannebagos”)
  Provided by the State Library of Pennsylvania
Content Hosting:
  For organizations that already have digital files but no current digital repository capabilities
  Provided by POWER Library (HSLC)
  Free for Pennsylvania institutions
SUCCESS!
PDCP Announced as DPLA Pennsylvania Service Hub, August 28, 2015!
Estimated Timeline:
  September, 2015 - Orientation
  October-November, 2015 - Signing Legal Agreements, Metadata Revision
  October-December, 2015 - Metadata normalization, harvesting tests and QA
  Early 2016 - Planned live ingest of records into the DPLA!
PA-DPLA Aggregator
Proof-of-concept prototype
  Development: Pennsylvania State University / Temple University partnership
  Dec. 2014 - Mar. 2015
  Hydra (Fedora) - Open Source Platform
  Harvesting & exposing metadata via OAI-PMH
  https://github.com/tulibraries/dplah
Released production version
  Summer 2015
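For readers new to OAI-PMH, here is a minimal Python sketch of what harvesting involves: building a ListRecords request and reading Dublin Core fields out of the response. It is not the aggregator's code (the aggregator is a Hydra application); the base URL, set name, identifier, and sample record below are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, set_spec=None):
    """Build a ListRecords request URL asking for unqualified Dublin Core."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return one dict per record: the OAI identifier plus DC fields as lists."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        fields = {}
        for el in rec.iterfind(".//oai_dc:dc/*", NS):
            name = el.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
            if el.text and el.text.strip():
                fields.setdefault(name, []).append(el.text.strip())
        records.append({"identifier": ident, "fields": fields})
    return records


# Hypothetical response body, trimmed to a single record.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:demo/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Early Library Staff</dc:title>
          <dc:language>eng</dc:language>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai", "demo"))
    print(parse_records(SAMPLE))
```

A real harvester would also follow resumption tokens and handle deleted records; both are left out here to keep the sketch short.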
PA-DPLA Aggregator
OAI-PMH Metadata - Human readable with faceted browsing and searching
http://libcollab.temple.edu/aggregator
PA-DPLA Aggregator
Kitchen Memories, Scranton Public Library, http://content.lackawannadigitalarchives.org/cdm/ref/collection/cookbooks/id/1657
Testing Data http://libcollab.temple.edu/aggregator-testing/
Prototype Harvested Content
“Lowest hanging fruit”:
○ OAI-PMH harvestable data
■ 29 institutions, 147K+ harvested records
○ Primarily targeting collections from PDCP Steering and Planning Committee institutions
○ Keep numbers manageable for testing purposes
○ Scalable to full production mode for the future
Why do we need good metadata?
http://iknowwhereyourcatlives.com/
http://libcollab.temple.edu/aggregator/catalog/dplapa:TEMPLE_p15037coll3_31399
http://dp.la/map?utf8=%E2%9C%93&q=textiles
http://dp.la/map?place[]=Pennsylvania&q=textiles&utf8=%E2%9C%93
http://dp.la/timeline?utf8=%E2%9C%93&q=textiles#/1723
http://dp.la/apps?page=2
Collection Policies
CC0 Metadata
Contributing institutions are required to share their metadata and thumbnails under a CC0 license (full access - no rights reserved).
The digital objects themselves retain any original specified rights.
Collecting Scope
The following types of collections are NOT currently accepted by the DPLA:
  Scholarly materials: ETDs, Journal Articles
  Finding Aids: EADs, Collection Guides
  Aggregate Description: Objects described at the folder, series, or collection level instead of the item level
  Items that don’t resolve to a publicly-accessible URL
  Individual page-level objects instead of compound ones
Restricted Items
If your institution needs to restrict any digital objects at the item level, for copyright or other reasons:
Enter the string pdcp_noharvest in fields that map to either of these Dublin Core values:
  dcterms:accessRights
  dc:rights
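As a rough illustration of how a harvester could honor the pdcp_noharvest flag, the Python sketch below drops any record whose rights fields contain the string. The record layout (a dict of field name to list of values) and the choice of substring matching are assumptions, not a description of the PDCP aggregator.

```python
NOHARVEST = "pdcp_noharvest"

# Assumed local names for dc:rights and dcterms:accessRights.
RIGHTS_FIELDS = ("rights", "accessRights")


def is_restricted(record):
    """True if any rights-type value contains the no-harvest string."""
    for field in RIGHTS_FIELDS:
        for value in record.get(field, []):
            if NOHARVEST in value:
                return True
    return False


def harvestable(records):
    """Keep only the records that are not flagged as restricted."""
    return [r for r in records if not is_restricted(r)]


if __name__ == "__main__":
    sample = [
        {"title": ["Open item"], "rights": ["In the public domain."]},
        {"title": ["Held back"], "accessRights": ["pdcp_noharvest"]},
    ]
    print([r["title"][0] for r in harvestable(sample)])
```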
Restricted Area for Humans Only, Ronyasoft, http://www.ronyasoft.com/products/poster-forge/templates/funny-signs/restricted-area-sign-template/images/restricted-area-sign-template.jpg
Derived Fields
Derived fields are those metadata fields that the PDCP aggregator creates automatically from the OAI-PMH feed.
Happy Face, Temple University, http://digital.library.temple.edu/cdm/ref/collection/p15037coll3/id/6541
Derived = “Don’t worry, be happy!”
Thumbnail
Thumbnails are the small preview versions of your digital object that are shown both in your repository and in the DPLA.
They are important because they let viewers confirm that they have found (or not found) what they are looking for.
Thumbnails can be derived by our aggregator from these common repository systems:
  CONTENTdm
  Bepress
  Omeka
  VUDL
  … and more to be added
Thumbnail
Portrait of Zapata, Kutztown University, http://digital.klnpa.org/cdm/ref/collection/asaro/id/21
Thumbnail
For other systems, we need a consistent path where the thumbnail is housed, e.g.:
http://www.server.org/repo/thumbs/$identifier/
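A consistent path matters because it lets a thumbnail URL be computed from the record identifier alone. The Python sketch below fills in the example pattern from the slide using string.Template; percent-encoding the identifier is an added assumption, and the identifiers shown are hypothetical.

```python
from string import Template
from urllib.parse import quote

# Pattern taken from the slide; $identifier is the only placeholder.
THUMB_PATTERN = Template("http://www.server.org/repo/thumbs/$identifier/")


def thumbnail_url(identifier):
    """Fill the pattern, percent-encoding characters that are unsafe in a path."""
    return THUMB_PATTERN.substitute(identifier=quote(identifier, safe=""))


if __name__ == "__main__":
    print(thumbnail_url("item42"))
    print(thumbnail_url("box 7"))
```

If thumbnails do not live at a predictable path like this, each one has to be mapped by hand.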
Collection
The collection name is set up by the team before harvesting. It generally matches the digital collection name found online.
Contributing Institution
The contributing institution name refers to YOUR ORGANIZATION and is set up by the team before harvesting.
Are YOU in This?, Temple University, http://digital.library.temple.edu/cdm/ref/collection/p16002coll9/id/2952
Contributing Institution
The Contributing Institution name can also be pulled automatically from the following DC fields:
  Contributor
  Creator
  Publisher
  Source
Intermediate Provider
If your data is hosted by an aggregator or common repository, then we list that entity as an Intermediate Provider, e.g.:
  Keystone Library Network (KLN)
  Lackawanna Valley Digital Archives (LVDA)
  POWER Library (via HSLC)
Resource Location
The Resource Location is a link back to the original collection URL for a digital object.
Example: http://content.lackawannadigitalarchives.org/cdm/ref/collection/SPL/id/36
Resource Location
Early Library Staff, Scranton Public Library, http://content.lackawannadigitalarchives.org/cdm/ref/collection/SPL/id/36
Resource Location
Required by DPLA to present your original data record.
Can be derived from the OAI-PMH data feed for typical systems:
  CONTENTdm
  Bepress
  Omeka
  VUDL
Can be custom mapped if needed for other systems, e.g.:
  http://www.server.org/repo/$identifier/
Required fields
Title
Other than the thumbnail, the title is often the first piece of information a user sees on a results list
Should be the name by which an object is known, not a file name
Language
Required if appropriate
3-letter ISO 639-2 language codes are preferred
Aggregator normalizes these codes to full language names for display
Examples:
  eng ---> English    lat ---> Latin
  ita ---> Italian    san ---> Sanskrit
  spa ---> Spanish    vie ---> Vietnamese
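The code-to-name normalization can be pictured as a simple lookup. This Python sketch covers only the six codes shown on the slide; a real table would need the full ISO 639-2 list, and passing unrecognized values through unchanged is an assumption about how unknown values might be handled.

```python
# Only the six examples from the slide; not a complete ISO 639-2 table.
LANGUAGE_NAMES = {
    "eng": "English",
    "ita": "Italian",
    "spa": "Spanish",
    "lat": "Latin",
    "san": "Sanskrit",
    "vie": "Vietnamese",
}


def normalize_language(value):
    """Map a 3-letter code to its display name; leave other values as entered."""
    cleaned = value.strip()
    return LANGUAGE_NAMES.get(cleaned.lower(), cleaned)


if __name__ == "__main__":
    for raw in ("eng", " LAT ", "Pennsylvania German"):
        print(repr(raw), "->", normalize_language(raw))
```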
Language
Rights
Contains information about rights associated with the resource
“In the public domain and may be used without copyright restriction.”
“Content is under copyright of the University of Scranton.”
http://creativecommons.org/licenses/by-sa/3.0/
REMINDER: DPLA will only accept objects that are available and viewable to the general public
pdcp_noharvest
Rights
‘Getting it Right on Rights’
Working group (DPLA, Europeana, etc.)
Released white paper May 2015 and opened it up for comments
Standardized rights statements
Coming soon!
Rights
The aggregator can add rights statements at the collection level
Type
The nature or genre of the resource
DCMI Type Vocabulary recommended
Assign ‘Text’ type to images of texts
Think of the user
Types used by DPLA:
text, image, sound, moving image, physical object
The aggregator can map your local types to these at the collection/seed level
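Mapping local types to the five DPLA types amounts to a per-collection lookup table. In the Python sketch below, the five target types come from the slide, but the local vocabulary in LOCAL_TO_DPLA is invented for illustration; each collection would supply its own.

```python
# The five types listed on the slide.
DPLA_TYPES = {"text", "image", "sound", "moving image", "physical object"}

# Hypothetical local vocabulary for one collection.
LOCAL_TO_DPLA = {
    "photograph": "image",
    "still image": "image",
    "newspaper": "text",
    "oral history": "sound",
    "film": "moving image",
}


def map_type(local_value):
    """Return the DPLA type for a local value, or None if it needs review."""
    key = local_value.strip().lower()
    if key in DPLA_TYPES:
        return key
    return LOCAL_TO_DPLA.get(key)


if __name__ == "__main__":
    for raw in ("Photograph", " Text ", "Scrapbook"):
        print(repr(raw), "->", map_type(raw))
```

Returning None for unmapped values, rather than guessing, makes the gaps in a collection's mapping easy to spot during QA.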
Type
Highly Recommended Fields
Date Created
Date of creation of the original resource.
(Not date digitized.)
(Not date range or time period.)
Map to dcterms:created (preferred) or dc:date
Use ISO 8601 (W3CDTF) format, which looks like
  YYYY-MM-DD
  YYYY-MM
  YYYY
http://xkcd.com/1179/
Date Created
Watch out for:
Extra spaces or symbols
  _1943
Missing digits
  1943-1-5
Placeholders and qualifying terms
  Unknown
  n.d.
  ca. 1950
  1950s
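The problems listed above are easy to catch with a quick script before harvest. This Python sketch accepts only the three forms shown on the slide (YYYY, YYYY-MM, YYYY-MM-DD) and reports a short reason for anything else; W3CDTF also allows time-of-day forms, which are deliberately not handled. The function name and messages are illustrative.

```python
import re
from datetime import date

# YYYY, YYYY-MM, or YYYY-MM-DD, with zero-padded month and day.
DATE_FORM = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def check_date(value):
    """Return None if the value is an acceptable date, else a short reason."""
    if value != value.strip():
        return "extra spaces"
    match = DATE_FORM.fullmatch(value)
    if not match:
        return "not YYYY, YYYY-MM, or YYYY-MM-DD"
    year, month, day = (int(g) if g else None for g in match.groups())
    try:
        date(year, month or 1, day or 1)  # rejects month 13, Feb 30, etc.
    except ValueError:
        return "impossible calendar date"
    return None


if __name__ == "__main__":
    for raw in ("1943-01-05", "1943", "_1943", "1943-1-5", "ca. 1950", "1950s", "n.d."):
        print(repr(raw), "->", check_date(raw) or "ok")
```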
Place
Spatial characteristics of the resource. Geographic location relevant to the original item.
Map to dcterms:spatial (preferred) or dc:coverage
Where is this thing?
Multiple choice:
Philadelphia
Philadelphia; Pennsylvania
Philadelphia (Pa.)
Philadelphia, Pennsylvania, United States
Seventh and Sansom Streets (Philadelphia, Pa.)
Franklin Institute (Philadelphia, Pa.)
Facade of the original Franklin Institute building
prior to moving to the Parkway in 1934.
Place
Use LCNAF (preferred) or TGN, FAST, ...
Addresses, lat/long, or other location markers can also be mapped to dcterms:spatial.
Examples:
  Pittsburgh (Pa.)
  Allegheny County (Pa.)
  Harrison (Allegheny County, Pa. : Township)
Watch out for:
  Place ≠ Time Period
Subject
The topic of the resource.
Map to dc:subject
Many variations on a theme:
  Newspapers
  Student newspaper
  New Holland (Pa.) Newspapers
  Scranton (Pa.) -- Newspapers
  West Chester University Student Newspapers
  College student newspapers and periodicals -- Pennsylvania -- Scranton
  University of Scranton -- Students -- Newspapers
  Lock Haven University of Pennsylvania Student Newspaper Archive
Use LC Authorities (preferred) or DDC, MeSH, UDC, AAT, TGN…
For LCSH, use space and double hyphen: term -- term
Examples:
  Harlem Renaissance -- Maps
  Coal miners -- Pennsylvania -- Social conditions
  Cassatt, Mary, 1844-1926
Subject
Watch out for:
Quotation marks
  "D.O.R.A at Westminster"
Separate terms with semi-colons
  , Holt, Colbin
  32 Carat Club,anniversary ,charitable organizations,social services
Odd symbols or characters
  2nd &
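Some of this cleanup can be scripted. The Python sketch below splits a subject string on semicolons, strips stray quotation marks, and normalizes LCSH subdivisions to the "term -- term" form. It deliberately does not split on commas, because headings such as personal names contain commas; comma-separated lists like the ones above still need fixing at the source. The function name is illustrative.

```python
import re


def clean_subjects(raw):
    """Split a subject string on semicolons and tidy each heading."""
    terms = []
    for part in raw.split(";"):
        term = part.strip().strip('"').strip()
        # Normalize LCSH subdivisions to "term -- term".
        term = re.sub(r"\s*--\s*", " -- ", term)
        # Collapse any remaining runs of whitespace.
        term = re.sub(r"\s+", " ", term)
        if term:
            terms.append(term)
    return terms


if __name__ == "__main__":
    raw = '"D.O.R.A at Westminster"; Coal miners--Pennsylvania -- Social conditions ;'
    print(clean_subjects(raw))
    print(clean_subjects("Cassatt, Mary, 1844-1926"))
```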
Subject
Watch out for:
Standardization
Local terms with limited global meaning
  KSOMWML20
Recommended Fields
Creator - Old way(DON’T DO THIS)
Creator – Proper way
Description
• The 500 field of the Dublin Core world
Format
From New York Heritagehttp://cdm16694.contentdm.oclc.org/cdm/ref/collection/p15109coll6/id/2083
From African Americans Seen Through the Eyes of the Newsreel Cameramanhttp://collections.contentdm.oclc.org/cdm/singleitem/collection/p9002coll1/id/277/rec/1
Identifiers
• OCLC # 12177842
• Call # PT 1.1
• FRBR linking code
• OCLC number (of object)
• CONTENTdm number
• CONTENTdm file name
• Identifier
• Generated identifier
• HPHWPZ201404000165 (unique, assigned)
Publisher
Publisher State Library of Pennsylvania NO!
Publisher Mount Pleasant, Pa.: Mount Pleasant Press, 1906
Optional Fields
Alternate Title
Contributor
NO!
Wrap-Up
What’s next?
See PDCP Metadata Guidelines (still in draft)
Let us know if you have feedback!
We plan to finalize v.1 in April
Living document
Would your institution like to contribute to the DPLA?
Email: [email protected]
Institutions will be forwarded to different organizations based upon needs and readiness for data harvest (harvesting and metadata support, repository support, digitization support)
Checklist to Contribute Data
Permission letter agreeing to share metadata and thumbnails with DPLA under a CC0 license
Data available on a publicly accessible website
Ability to share metadata via OAI-PMH or CSV file
Staff available to work with PDCP on metadata issues
Stay in touch:
Come to PA Backwards session tomorrow morning (9am)!
Email the PDCP team: [email protected]
Twitter: Follow us at @pdcp_pa
PADIGITAL Listserv - general information about statewide digital
[email protected]
Send a message to listserv@albright org with the text “subscribe padigital” in the body
Resources and Support
Documentation: https://drive.google.com/folderview?id=0B-icpMLW3cRXQmVQMnJoMkZJUDg&usp=sharing
Onboarding: Leanne Finnigan ([email protected]) or Elise Warshavsky ([email protected])
Online office hours
Webinar workshops
METADATA
Original: The Doctor Is In, Peanuts Worldwide, LLC.
0 ¢
Thank you!
http://xkcd.com/1543/
PDCP Metadata Team
Linda Ballinger
Doreva Belfiore
Bill Fee
Leanne Finnigan
Kristen Yarmey