Metadata for Digitization and Preservation
Introduction
What is metadata and why it matters
The key elements
How metadata is created
Where metadata is stored
Metadata standards
How much will it cost?
What is metadata?
Tony Gill (ARTstor): Metadata refers to structured descriptions, stored as computer data, that attempt to describe the essential properties of other discrete computer data objects.
Big picture definition: the sum total of what can be said about any information object at any level of aggregation
What is metadata for?
The World Wide Web Consortium says the purpose of metadata is:
to provide a means to discover that the data set exists and how it might be obtained or accessed
to document the content, quality, and features of a data set, indicating its fitness for use
Therefore we need to think about content, context and structure
Why Does Metadata Matter?
“Doing research on the Web is like using a library assembled piecemeal by packrats and vandalized nightly.” – R. Ebert, Internet Life
Finding the needle in the haystack
Managing thousands of identical-looking needles
Finding visual materials without viewing them
Expanding use
Preserving content and context
Key Elements
Administrative Metadata – used in managing and administering information resources
Descriptive Metadata – used to describe or identify information resources
Preservation Metadata – related to the preservation management of information resources
Technical Metadata – related to how a system functions or how metadata behaves
Use Metadata – related to the level and type of use of information resources
Structure of metadata
[Diagram: three-level hierarchy of Collection, Work, and Item]
How metadata is created
By software tools:
From resource content, e.g. catalogues or databases
From creation tool, e.g. digital camera or file header
By human intervention:
Description by resource creator/owner
Description by third party provider, e.g. technical metadata
Creating and maintaining good metadata is time-consuming and costly
Where metadata is stored
Embedded in the resource:
EXIF information with TIFF images – viewable in Photoshop
File headers or invisible copyright watermarking
Linked to resource:
Created as record in database format
Metadata Standards
Dublin Core: http://vads.ahds.ac.uk/guides/creating_guide/sect43.html
DIG35 – for technical metadata: www.i3a.org/I_dig35.html
Categories for the Description of Works of Art (CDWA): www.getty.edu/research/institute/standards/cdwa/
Visual Resources Association Core Categories: www.vraweb.org/
SEPIA working group: www.knaw.nl/ecpa/sepia/workinggroups/wp5/cataloguing.html
Resource Description Framework (RDF)
Encoded Archival Description (EAD)
How much will it cost?
How long is a piece of string?
Depends upon the stop points
There is no one-size-fits-all or one-cost framework
Depends upon the description already in place and how well the collection is currently indexed
In-house measurement
Balance skill, time, and automation
Photographs – descriptive metadata will take at least 5 minutes per photograph and usually no more than 30 minutes
Traditional Functions
Traditionally we applied these functions to:
Paper based and microform based information resources
Monographs, serials, photographs, etc.
Access provided through local library services
Including inter-library loan
New Functions
Apply these functions to:
Web documents, online serials, digital images, digital collections, web sites, digital audio and video, born digital material, etc.
Access provided via the web and email
Why are these digital objects different?
Information explosion
Multiple versions
Instant access
Less physical control over collection
Some are surrogates
Increased user expectations
Preservation is more complex
Why do we need metadata to do these things?
Provides the necessary tools to manage, preserve and provide access to information in the digital environment
Our jobs have not fundamentally changed, but our collections and our users have
What is metadata?
Metadata is data that facilitates the management, description, and preservation of a digital object or aggregation of digital objects.
The creation of metadata is governed by a body of standards, best practices and schemas that, when appropriately applied, work together to facilitate the management, description, and preservation of digital objects.
Types of metadata
Descriptive
Technical
Structural
Administrative
Preservation
About Metadata
Sets
Encoding standards/schema
Metadata set = rules
Encoding schema = representation
Metadata Sets
AACR2
Dublin Core
Visual Resources Association
Metadata Object Description Schema (MODS)
Text Encoding Initiative
Encoded Archival Description
Encoding Standards/Schema
HTML
MARC
Metadata Encoding and Transmission Standard (METS)
Resource Description Framework (RDF)
XML
Z39.50
Choosing Sets and Schema: Interoperability
Why is interoperability important?
How is it achieved?
Crosswalks/mapping
Standardization:
Schema
Controlled vocabulary
Open Archives Initiative (OAI):
Common elements harvested and made searchable from one interface
Very basic level of description; work is under way to improve it
Choosing an Encoding Schema
The more digitized objects you have, the more complex they are, and the more data sharing you do, the more important it will be to use an encoding schema
XML is the most prevalent encoding schema
All of these metadata schemas already have XML-based encodings available
Factors in Metadata Decisions for Digitization Projects
Audience
Workflow and timelines
Preservation
Interoperability
Number and complexity of digitized objects
What Do You Want To Do?
Digitize for access only?
Descriptive
Some administrative
Digitize for preservation?
Descriptive
Administrative
Technical
Eventually preservation
What Materials Are You Digitizing?
The more complex the material, the more complex your metadata
Structural metadata becomes vital
For example….
Complex Digital Objects
Original = 150 page book with 7 chapters
Digitization results in 4 versions of the same content:
150 master TIFF images
150 JPEG access images
150 JPEG thumbnail images
7 ASCII text transcripts (one per chapter)
Files to manage = 457
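The total is simple arithmetic over the four digital versions. A minimal Python sketch of the count follows; the dictionary keys are illustrative labels only, not names from any standard.

```python
# Files produced by digitizing one 150-page, 7-chapter book.
# Keys are illustrative labels only.
files_per_version = {
    "master TIFF images": 150,
    "JPEG access images": 150,
    "JPEG thumbnail images": 150,
    "ASCII text transcripts": 7,
}

total_files = sum(files_per_version.values())
print(total_files)  # 457
```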
Complex Digital Objects and Structure
Which images belong in which chapter?
Which digital version is which?
Where is chapter 3 in each version?
There is technical metadata for each digital version AND each digital file. How do we relate the correct metadata to the correct version/file?
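One way to picture what structural metadata has to record is a small lookup from chapters to files. The Python sketch below is only an illustration: the chapter start pages, the file naming pattern, and the function names are all invented here, since the slides give only "150 pages, 7 chapters".

```python
# Hypothetical chapter boundaries (first page of each chapter); these
# numbers are invented for illustration.
CHAPTER_STARTS = [1, 20, 45, 60, 85, 110, 130]
TOTAL_PAGES = 150
# Assumed version labels and file extensions.
VERSIONS = {"master": "tif", "access": "jpg", "thumb": "jpg"}


def chapter_pages(chapter: int) -> range:
    """Return the page numbers belonging to a 1-based chapter number."""
    start = CHAPTER_STARTS[chapter - 1]
    if chapter < len(CHAPTER_STARTS):
        end = CHAPTER_STARTS[chapter] - 1
    else:
        end = TOTAL_PAGES
    return range(start, end + 1)


def files_for_chapter(chapter: int, version: str) -> list[str]:
    """List the image files for one chapter in one digital version."""
    ext = VERSIONS[version]
    return [f"book_{version}_{page:03d}.{ext}" for page in chapter_pages(chapter)]


chapter3_masters = files_for_chapter(3, "master")
print(chapter3_masters[0], chapter3_masters[-1], len(chapter3_masters))
```

With this kind of mapping recorded, "Where is chapter 3 in each version?" becomes a lookup rather than a manual search. A METS structural map serves the same purpose in a standard, shareable form.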
Digitization and Metadata
Descriptive metadata for access and administration
Technical metadata for preservation
Structural metadata for control over complex digitized objects
Preservation metadata for management within a digital archive
Descriptive Metadata
Information users will need in order to gain access to the digitized material
Should facilitate access to the original source material whenever possible
Access via a web interface search engine
User friendly
Standardized
Well written
Common Descriptive Metadata Sets for Digitization Projects
Visual Resources Association
Metadata Object Description Schema (MODS)
Encoded Archival Description
Text Encoding Initiative
Dublin Core
MARC
Choosing a Set
Should we use MARC?
Integrated into existing work
Rules for creation already exist
Less technical infrastructure necessary
Complex – more training
Time consuming
Should we use something else?
Collaborating? Interoperability concerns?
Staff expertise
Size of project
Exhibit and web access
Choosing a Schema
Can we use both?
MARC for collection level
Metadata for item level
MARC for all:
Crosswalked to web accessible database
Database for all:
Crosswalked to MARC
Implementation
What informational elements do you need?
List them, making sure to think through web design, audience and access issues
What descriptive schema will you use?
MARC
Dublin Core
VRA
MODS
Implementation
Build database or implement content management system for metadata storage
Map the fields to the schema you have chosen
Document the mapping
Create Style Guide for your project
Staff creates the metadata manually according to the Style Guide and established work processes
Metadata is reviewed for quality
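Mapping database fields to a chosen schema can be documented as a simple lookup table. The Python sketch below assumes Dublin Core as the target; the local field names on the left are hypothetical, invented for illustration, while the names on the right are Dublin Core elements.

```python
# Hypothetical local database field names mapped to Dublin Core elements.
# The left-hand names are invented; the right-hand names are Dublin Core.
FIELD_TO_DC = {
    "photo_title": "title",
    "photographer": "creator",
    "keywords": "subject",
    "caption": "description",
    "date_taken": "date",
    "copyright_note": "rights",
}


def crosswalk(record: dict) -> dict:
    """Convert a local record to Dublin Core, skipping unmapped or empty fields."""
    dc_record = {}
    for field, value in record.items():
        element = FIELD_TO_DC.get(field)
        if element and value:
            # Dublin Core elements are repeatable, so collect values in lists.
            dc_record.setdefault(element, []).append(value)
    return dc_record


local = {
    "photo_title": "Main Street, looking north",
    "photographer": "Unknown",
    "shelf_location": "Box 12",  # no Dublin Core mapping, so it is dropped
    "caption": "",               # empty, so it is dropped
}
dc = crosswalk(local)
print(dc)
```

Keeping the table in one place doubles as the documentation of the mapping, and fields with no target element (such as the shelf location here) are easy to spot during review.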
Implementation
Metadata is stored and made web accessible
XML (if supported)
Back-ups, “master” metadata record, and/or web access
Dublin Core
Title
Creator
Subject/Keywords
Description
Publisher
Contributor
Date
Audience
Resource Type
Format
Resource Identifier
Source
Language
Relation
Coverage
Rights Management
Characteristics of the Dublin Core
All elements optional
All elements repeatable
All elements displayable in any order
Extensible (a starting place for richer description)
International
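The "optional" and "repeatable" characteristics can be shown with a small record. This Python sketch uses the standard library's ElementTree and the Dublin Core element namespace; the `record` wrapper element, the function name, and the subject values are assumptions made for illustration. The title and creator come from the METS example later in this presentation.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def dc_to_xml(record: dict) -> str:
    """Serialize {element: [values]} as XML under an assumed 'record' wrapper."""
    root = ET.Element("record")
    for element, values in record.items():
        for value in values:  # repeatable: one XML element per value
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


record = {
    "title": ["Alice's Adventures in Wonderland"],
    "creator": ["Lewis Carroll"],
    # Subject is repeated; these two values are invented for illustration.
    "subject": ["Children's stories", "Fantasy"],
    # Publisher, date, and the rest are simply omitted: every element is optional.
}
xml_text = dc_to_xml(record)
print(xml_text)
```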
Extensibility
Refining mechanism for elements: improve sharpness of description with qualifiers
Means for extending the element set: complementary packages of other types of metadata (administrative, rights management, discipline-specific, etc.)
Technical Metadata
Information about a file that facilitates management and preservation of the file
Technical information about:
Master file (TIFF)
Scanning specifications (resolution, bit depth, etc.)
Derivative
Storage – compression
NISO Metadata
Purpose: To define a standard set of metadata elements for digital images
Facilitate interoperability
Support long term management of and continuing access to digital images
Tagged Image File Format – Background and Metadata
TIFF is a specification for a file format
Spec includes a “directory” or “header” section which consists of several metadata fields
A TIFF can consist of several images
Directory/Header information is unique for each image
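To make the "directory" idea concrete, the Python sketch below reads the tag entries of the first image directory from TIFF bytes using only the standard library. The function name is hypothetical and the sample bytes are a hand-built fragment with two tags, not a real scan. Values are returned raw, without interpreting offsets or short values packed into the 4-byte field.

```python
import struct


def read_tiff_tags(data: bytes) -> dict:
    """Return {tag_number: (field_type, count, raw_value_or_offset)} for the first directory."""
    byte_order = data[:2]
    if byte_order == b"II":
        endian = "<"  # little-endian
    elif byte_order == b"MM":
        endian = ">"  # big-endian
    else:
        raise ValueError("not a TIFF file")
    # Header: 2-byte magic number (42) and 4-byte offset of the first directory.
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(entry_count):
        start = ifd_offset + 2 + i * 12  # each directory entry is 12 bytes
        tag, field_type, count, value = struct.unpack(
            endian + "HHII", data[start:start + 12]
        )
        tags[tag] = (field_type, count, value)
    return tags


# Hand-built fragment: header plus one directory holding two LONG (type 4) tags.
sample = (
    b"II" + struct.pack("<HI", 42, 8)
    + struct.pack("<H", 2)
    + struct.pack("<HHII", 256, 4, 1, 2400)  # tag 256 = ImageWidth
    + struct.pack("<HHII", 257, 4, 1, 3000)  # tag 257 = ImageLength
    + struct.pack("<I", 0)                   # offset 0 = no further directories
)
tags = read_tiff_tags(sample)
print(tags[256], tags[257])
```

A multi-image TIFF chains further directories through the final 4-byte offset, which is why the header information is separate for each image.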
Tagged Image File Format – Background and Metadata
The TIFF spec is implemented differently by different applications
Scanning software
Usually “bundled” with your scanner
Controls the scanner or camera and passes information to computer storage or image editing software
Outputs image files in specific image file formats
Determines what “flavor” TIFF is produced
Determines what metadata fields are utilized and how they are utilized
Tagged Image File Format – Background and Metadata
Other software, such as Photoshop, may add to the TIFF metadata
Tags can be added, using particular software
TiffKit (no longer supported)
Black Ice Software Development Kit
Captiva’s Input Accel
Others
Technical Metadata -- Options
Options?
Use as much as you can; create manually using database and/or XML based on the NISO draft and the LC encoding schema
or
Use DC: Format element
Using DC Format for Technical Metadata Elements
File size
Quality (bit depth, resolution)
Extent (pixel dimensions, play time, pagination)
Compression
Checksum value (error detection)
Object producer (name of scanning technician, vendor who scanned)
Creation Hardware (digital camera, flatbed scanner, etc.)
Creation Software (name and version)
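Two of these values, file size and checksum, can be gathered automatically. The Python sketch below uses only the standard library; the function name and dictionary keys are hypothetical, and MD5 is an assumption, since the slides do not name a checksum algorithm (`hashlib.sha256` could be substituted the same way).

```python
import hashlib
import os
import tempfile


def technical_metadata(path: str) -> dict:
    """Collect the file size and an MD5 checksum for one file."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large master images need not fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return {
        "file_size_bytes": os.path.getsize(path),
        "checksum_md5": digest.hexdigest(),
    }


# Demonstration on a throwaway file standing in for a scanned image.
with tempfile.NamedTemporaryFile(delete=False, suffix=".tif") as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name
info = technical_metadata(demo_path)
os.remove(demo_path)
print(info)
```

Recomputing the checksum later and comparing it with the stored value is the error-detection step the list refers to.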
Encoding: METS
Metadata Encoding and Transmission Standard
Product of Making of America project
Digital Library Federation Initiative
Provides an XML schema for encoding metadata necessary for:
management of digital library objects
exchange of those objects (OAIS)
Brings all the metadata together
Encoding: METS
Five Sections of a METS document
Descriptive
Administrative
File Group
Structural Map
Behavior
Encoding: METS Five Sections
Descriptive Metadata
may point to descriptive metadata external to the METS document
MARC
may embed the descriptive metadata within the METS document
Encoding: METS
<dmdSec ID="dmd002">
  <mdWrap MIMETYPE="text/xml" MDTYPE="DC" LABEL="Dublin Core Metadata">
    <dc:title>Alice's Adventures in Wonderland</dc:title>
    <dc:creator>Lewis Carroll</dc:creator>
    <dc:date>between 1872 and 1890</dc:date>
    <dc:publisher>McCloughlin Brothers</dc:publisher>
    <dc:type>text</dc:type>
  </mdWrap>
</dmdSec>
<dmdSec ID="dmd003">
  <mdWrap MIMETYPE="application/marc" MDTYPE="MARC" LABEL="OPAC Record">
    <binData>MDI0ODdjam0gIDIyMDA1ODkgYSA0NU0wMDAxMDA...(etc.) </binData>
  </mdWrap>
</dmdSec>
Encoding: METS Five Sections
Administrative Metadata
information regarding how files were created and stored
intellectual property
metadata regarding the original
information regarding provenance of the digital object (technical metadata)
may be external or internally encoded
Encoding: METS
<amdSec ID="AMD001">
  <mdWrap MIMETYPE="text/xml" MDTYPE="NISOIMG" LABEL="NISO Img. Data">
    <niso:MIMEtype>image/tiff</niso:MIMEtype>
    <niso:Compression>LZW</niso:Compression>
    <niso:PhotometricInterpretation>8</niso:PhotometricInterpretation>
    <niso:Orientation>1</niso:Orientation>
    <niso:ScanningAgency>NYU Press</niso:ScanningAgency>
  </mdWrap>
</amdSec>
<file ID="FILE001" ADMID="AMD001">
  <FLocat LOCTYPE="URL">http://dlib.nyu.edu/press/testimg.tif</FLocat>
</file>
Encoding: METS Five Sections
File Groups
used to group together related files
One file group lists all of the files which comprise a single electronic version of the digital library object
Master document (TIFF)
Access copy or copies
Perhaps a transcript
Encoding: METS
<fileGrp>
  <fileGrp ID="VERS1">
    <file ID="FILE001" MIMETYPE="application/xml" SIZE="257537" CREATED="2001-06-10">
      <FLocat LOCTYPE="URL">http://dlib.nyu.edu/tamwag/beame.xml</FLocat>
    </file>
  </fileGrp>
Encoding: METS
  <fileGrp ID="VERS2">
    <file ID="FILE002" MIMETYPE="audio/wav" SIZE="64232836" CREATED="2001-05-17" GROUPID="AUDIO1">
      <FLocat LOCTYPE="URL">http://dlib.nyu.edu/tamwag/beame.wav</FLocat>
    </file>
  </fileGrp>
Encoding: METS
  <fileGrp ID="VERS3" VERSDATE="2001-05-18">
    <file ID="FILE003" MIMETYPE="audio/mpeg" SIZE="8238866" CREATED="2001-05-18" GROUPID="AUDIO1">
      <FLocat LOCTYPE="URL">http://dlib.nyu.edu/tamwag/beame.mp3</FLocat>
    </file>
  </fileGrp>
</fileGrp>
Encoding: METS Five Sections
Structural Map
outlines the intellectual structure of the content of the digital resource
Encoding: METS
<structMap TYPE="logical">
  <div ID="div1" LABEL="Oral History: Mayor Abraham Beame" TYPE="oral history">
    <div ID="div1.1" LABEL="Interviewer Introduction" ORDER="1">
      <fptr FILEID="FILE001">
        <area FILEID="FILE001" BEGIN="INTVWBG" END="INTVWND" BETYPE="IDREF" />
      </fptr>
Encoding: METS
      <fptr FILEID="FILE002">
        <area FILEID="FILE002" BEGIN="00:00:00" END="00:01:47" BETYPE="TIME" />
      </fptr>
      <fptr FILEID="FILE003">
        <area FILEID="FILE003" BEGIN="00:00:00" END="00:01:47" BETYPE="TIME" />
      </fptr>
    </div>
Encoding: METS Five Sections
Behavior
used to associate executable behaviors with content
defines the behaviors
can contain executable code to run the behaviors
METS: Encoding
<METS:behaviorSec ID="DISS1.0" STRUCTID="S1" BTYPE="uva-bdef-image-w:101"
    CREATED="2002-05-25T08:32:00" LABEL="Watermark Behaviors"
    GROUPID="DISS1" ADMID="AUDREC1" STATUS="A">
  <METS:interfaceDef LABEL="Photo Watermark Behavior Definition"
      LOCTYPE="URN" xlink:href="uva-bdef-image-w:101"/>
  <METS:mechanism LABEL="Watermarking Behavior Mechanism for Images"
      LOCTYPE="URN" xlink:href="uva-bmech-image-w:112"/>
</METS:behaviorSec>
Preservation Metadata
If you are digitizing with preservation in mind, ALL metadata is preservation oriented
Metadata must be of the highest quality that is possible
Incorporate the creation and management of metadata into your project at the planning stage
Preservation Metadata
Designed to facilitate the process of preservation and management in a digital repository
Generally implemented at the time a digital resource is moved to a digital archive
Several schemas under development for particular operating environments and/or programs
Preservation Metadata Sets
CEDARS – Consortium of University Research Libraries Exemplars in Digital Archives project
www.leeds.ac.uk/cedars/guideto/metadata/
NLA -- National Library of Australia
www.nla.gov.au/preserve/pmeta.html
NEDLIB – Networked European Deposit Library
www.kb.nl/coop/nedlib/results/D4.2/D4.2.htm
OCLC Digital Archive
www.oclc.org/digitalarchive/about/works/metadata/
Preservation Metadata
The implication is that there is a core of metadata necessary for preservation regardless of the preservation strategy
More work needs to be done to identify the particular elements necessary for particular preservation strategies
Metadata Wrap up
New tools for new resources
Metadata schema = rules
Encoding schema = mark up and storage
Descriptive Metadata
Use an established metadata schema
Create a project style guide to facilitate standardized, high quality creation
Store in content management software or database to provide web access
Document the database design and map fields to DC (or other schema) within the documentation
Encode and back up using XML, if technically feasible
Technical and Structural
Use TIFF
Document the scanning software used, as TIFF has many different “flavors”
Use as much of the NISO draft standard as possible – watch for implementation developments, or…
Use descriptive schema to collect technical information
Structural metadata (METS) to manage numerous, complex digital objects, or…
Documented file naming and structures
Planning
Plan for the costs associated with good metadata:
Creation and research
Technical resources (staff, hardware, software, backups)
Get a team of appropriate people together
Identify goals and elements, and research appropriate schema and encoding
Style Guide for descriptive metadata
Create the highest quality, most thorough metadata possible in your situation
Document mappings
Some Conclusions
Metadata is a work in progress at both the community level and the project level
Use standards
Technical metadata will be easier to implement in time
Structural metadata is vital for large projects with complex digital objects
Preservation metadata isn’t standardized yet