Post on 22-Dec-2015
Metadata: Principles, Practices, Challenges
Sandra Payette
Digital Library Research Group
Cornell University
[email protected]
Metadata
CREATOR: Plato
TITLE: The Republic
Image 1: cdrom 1
Image 2: cdrom 1
Image 3: cdrom 2
Metadata is structured data about data that facilitates discovery, use, and
management of the data to which it refers.
Access Control List
Metadata enables …
Resource Discovery
Resource Presentation and Navigation
Rights Management
Preservation
We must support all these functions, but also recognize that these are artificial categories for metadata.
General Principles: Metadata
Designing one kind (e.g., descriptive) without consideration of others (e.g., usage, rights, preservation) can compromise utility and interoperability over time
The metadata problem should be approached analytically and methodically, not ad hoc
Metadata is expensive; its cost must be weighed against its benefit
Common metadata sets often represent a flattened or simplified view of reality
Challenge of Interoperability
Semantic
Structural
Syntactic
Media: CD-ROM
(refers to physical storage medium for digital image)
Media: 35mm film
(refers to original source)
Challenge of Interoperability
Semantic
Structural
Syntactic
Date: 10-6-99 | Date: 6-10-99
Type: image/tiff | Type: TIFF 4.0
Author: Sam Brown | Author: Brown, S
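The value pairs above can be made concrete with a minimal Python sketch. It assumes that the two dates are month-first and day-first readings of the same kind of string, and that author names arrive either as "Given Family" or as "Family, Initial". The function names `parse_date` and `normalize_author` are hypothetical and exist only for this illustration.

```python
from datetime import datetime
from typing import Tuple


def parse_date(value: str, order: str) -> datetime:
    """Parse a two-digit-year date; the caller must say which field comes first."""
    formats = {"month-first": "%m-%d-%y", "day-first": "%d-%m-%y"}
    return datetime.strptime(value, formats[order])


def normalize_author(value: str) -> Tuple[str, str]:
    """Reduce 'Sam Brown' and 'Brown, S' to the same (family name, initial) pair."""
    if "," in value:
        family, given = (part.strip() for part in value.split(",", 1))
    else:
        given, family = value.rsplit(" ", 1)
    return family, given[:1].upper()


# The same string yields two different dates depending on the assumed order.
print(parse_date("10-6-99", "month-first").date())  # 1999-10-06
print(parse_date("10-6-99", "day-first").date())    # 1999-06-10

# Two encodings of one author compare equal only after normalization.
print(normalize_author("Sam Brown") == normalize_author("Brown, S"))  # True
```

Nothing in the value "10-6-99" says which reading is intended, so the convention has to be agreed on and recorded outside the value itself.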
Challenge of Interoperability
Semantic
Structural
Syntactic
<META name="DC.creator" content="Junger, S">
<META name="DC.title" content="The Perfect Storm">
Creator: Junger, S
Title: The Perfect Storm
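To illustrate how software can pull the META-tag form of this record back out of a page, here is a minimal sketch using Python's standard `html.parser` module. The class name `DublinCoreMetaParser` is hypothetical, and treating every META name that starts with "DC." as a Dublin Core element is an assumption of this sketch.

```python
from html.parser import HTMLParser


class DublinCoreMetaParser(HTMLParser):
    """Collect META tags whose name starts with 'DC.' (assumed naming convention)."""

    def __init__(self):
        super().__init__()
        self.elements = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this method.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.lower().startswith("dc."):
            element = name[3:].lower()
            self.elements.setdefault(element, []).append(attributes.get("content") or "")


page = (
    '<META name="DC.creator" content="Junger, S">'
    '<META name="DC.title" content="The Perfect Storm">'
)
parser = DublinCoreMetaParser()
parser.feed(page)
for element, values in parser.elements.items():
    print(f"{element.capitalize()}: {'; '.join(values)}")
```

The loop prints the same two statements in the plain "Creator: / Title:" form, which shows that the two syntaxes carry identical content.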
Syntaxes for Expressing Metadata
HTML META tags
  Embed metadata in HTML documents
  Search engines can extract it
XML and SGML
  Communities can define own vocabularies (DTDs/Schema)
  Separation of structural description from rendering info
  Increasing support for XML in browsers and other software
Resource Description Framework (RDF) using XML
  Express complex relationships between resources
Proprietary
  TIFF headers
  Vendor-specific data structures
Functional Views of Metadata
Resource Discovery
Resource Presentation and Navigation
Rights Management
Preservation
Resource Discovery on Web: Challenges
Scale: much content is not visible or not found, so it’s not indexed
Format: much content non-textual (e.g., images!)
Context: lacking! (causing precision errors in search)
Rights: valuable content hidden behind firewalls
[Figure labels: text-based search engines (Lycos, Excite); Collection “A” web server; image DB]
Resource Discovery on Web
[Figure labels: Lycos, Excite; Collection “A” web server; image DB; “A” search engine; metadata store; web browser showing “Welcome to Collection ‘A’” with a search box]
Context established: customized metadata created at source
Resource Discovery: Dublin Core
15 descriptive elements
Facilitates simple resource discovery on Web
Cross-disciplinary, international, genre-independent
Very active and accepted “standard”
  100+ major projects
  20+ countries
http://purl.oclc.org/dc/
Resource Discovery: Dublin Core Caveats
Designed for simple discovery, don’t force it to do more than it can (rights, preservation)
Qualification – can compromise meaning and interoperability
Example: Hamlet
  dc:creator.playwright = “Shakespeare”
  dc:creator.birthplace = “Stratford”
Roll-up to root element and … “dc:creator = Stratford” ????
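The roll-up anomaly in the Hamlet example can be reproduced in a few lines of Python. This is a sketch that assumes qualifiers are written after a dot, as on the slide. The function name `roll_up` is hypothetical.

```python
from typing import Dict, List


def roll_up(qualified: Dict[str, str]) -> Dict[str, List[str]]:
    """Strip everything after the first dot so each value lands on its root element."""
    rolled: Dict[str, List[str]] = {}
    for name, value in qualified.items():
        root = name.split(".", 1)[0]
        rolled.setdefault(root, []).append(value)
    return rolled


hamlet = {
    "dc:creator.playwright": "Shakespeare",
    "dc:creator.birthplace": "Stratford",
}

# After roll-up, a place name is asserted as a creator of the play.
print(roll_up(hamlet))  # {'dc:creator': ['Shakespeare', 'Stratford']}
```

The first qualifier refines the root element (a playwright is still a creator). The second changes its meaning, since a birthplace is not a creator, so only the first survives roll-up intact.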
Open Archives Initiative (OAi)
Specification of simple metadata harvesting protocol to facilitate interoperability
Adoption of unqualified Dublin Core Element Set as required metadata
Common XML container format for metadata packaging
Institutional backing of CNI (Coalition for Networked Information) and DLF (Digital Library Federation)
http://www.openarchives.org/
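As a rough illustration of what a harvesting request looks like, the sketch below builds request URLs in the style of the OAI Protocol for Metadata Harvesting (OAI-PMH) as later published, where `verb` selects the operation and `metadataPrefix=oai_dc` asks for unqualified Dublin Core. The repository address is a hypothetical placeholder and the helper name `build_harvest_url` is an assumption. No network request is made.

```python
from urllib.parse import urlencode


def build_harvest_url(base_url: str, verb: str, **arguments: str) -> str:
    """Compose an HTTP GET URL for one harvesting request."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical repository address, used only for illustration.
BASE_URL = "http://repository.example.org/oai"

# Ask the repository to describe itself, then ask for its Dublin Core records.
print(build_harvest_url(BASE_URL, "Identify"))
print(build_harvest_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
```

A harvester would issue such requests against each registered repository and store the returned XML records in its own index.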
Exposing and Exchanging Metadata using OAi
[Figure labels: harvester; OAi; metadata; image collections; electronic journals; OPAC; e-texts]
OAi Registered Repositories
arXiv
OCLC Thesis and Dissertations
Perseus Digital Library
PhysNet
Oxford Text Archive
Library of Congress -- American Memory
CogPrints
Humboldt University
MIT Thesis
Linguistic Data Consortium
Resource Discovery Network
… and more
Resource Discovery: Principles
Decide what you want to be visible to which search services (Site home page? Specific items?)
Adopt standard metadata (e.g., DC) for cross-domain visibility of resources
Develop context-specific metadata to meet collection requirements
Design/adopt a metadata model that allows for graceful co-existence of multiple metadata sets
Express or expose metadata in syntax that promotes interoperability (e.g., XML, RDF)
Functional Views of Metadata
Resource Discovery
Resource Presentation and Navigation
Rights Management
Preservation
Structural Metadata
Facilitates
  Direct access to key points in objects
  Browsing objects
  Navigation (e.g., turning pages)
  Identification of relationships (e.g., parent/child)
  Access to different formats (e.g., TIFF, GIF, PDF)
Where is it?
  ASCII text files in directories
  Relational databases
  Embedded in documents or surrogates (e.g., XML, SGML)
Structural metadata can be a byproduct of data management
[Figure: one hierarchy shown both in a relational database and in a file system]
Level 1 (journals): atlantic, harpurs
Level 2 (volumes): V0001, V0002, V0003
Level 3 (issues): I0001, I0002
Level 4 (articles): 0001.tif, 0002.tif, 0003.tif
File system: atlantic (dir) / V0002 (dir) / I0001 (dir) / 0001.tif, 0002.tif
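A short Python sketch shows how structural metadata can fall out of a directory layout like the one in the figure. It assumes exactly four path components and reuses the figure's level names. Mapping the file name to the fourth level is an assumption of this sketch, and the function name `describe` is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict

# Level names taken from the figure, outermost first.
LEVELS = ("journal", "volume", "issue", "article")


def describe(path: str) -> Dict[str, str]:
    """Derive structural metadata for one file purely from where it is stored."""
    parts = PurePosixPath(path).parts
    if len(parts) != len(LEVELS):
        raise ValueError(f"expected {len(LEVELS)} path components, got {len(parts)}")
    return dict(zip(LEVELS, parts))


print(describe("atlantic/V0002/I0001/0001.tif"))
# {'journal': 'atlantic', 'volume': 'V0002', 'issue': 'I0001', 'article': '0001.tif'}
```

The drawback of relying on this byproduct is that the structure is lost as soon as files are moved or renamed without the same conventions.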
Structure via Document Encoding
Current trend is to use mark-up languages to encode the structure of document objects
SGML
  Text Encoding Initiative (TEI/TEI-Lite DTDs)
  Memory of World DTD for rare library materials
  Encoded Archival Description (EAD)
XML
  Goettingen Digitization Center (XML and RDF)
  Making of America II (archival object DTD, plus EAD)
  METS (under development, supported by DLF)
MOA2 DTD: Structural “Binding” of Images
<StructMap>
  <div N='1' TYPE='Book' LABEL='Diary of Patrick Breen one of the Donner Party 1846-57…'>
    <fptr FILEID='HRJ1' MIMETYPE='image/jpeg' />
    <fptr FILEID='LRJ1' MIMETYPE='image/jpeg' />
    <fptr FILEID='LRG1' MIMETYPE='image/gif' />
    <fptr FILEID='T1' MIMETYPE='text/sgml' TAGID='titlepage' />
    <div N='1' TYPE='Entry' LABEL='Friday Nov. 20th 1846 [Page 1]'>
      <fptr FILEID='HRJ2' MIMETYPE='image/jpeg' />
      <fptr FILEID='LRJ2' MIMETYPE='image/jpeg' />
      <fptr FILEID='LRG2' MIMETYPE='image/gif' />
      <fptr FILEID='T1' MIMETYPE='text/sgml' TAGID='entry1' />
    </div>
    <div N='2' TYPE='Entry' LABEL='Entry sat. 21st [Page 2]'>
      <fptr FILEID='HRJ3' MIMETYPE='image/jpeg' />
      <fptr FILEID='LRJ3' MIMETYPE='image/jpeg' />
      <fptr FILEID='LRG3' MIMETYPE='image/gif' />
      <fptr FILEID='T1' MIMETYPE='text/sgml' TAGID='entry2' />
    </div>
    …
  </div>
</StructMap>
Source: sunsite.berkeley.edu/moa2
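To show how a viewer might use such a structure map, the sketch below parses a shortened copy of the fragment with Python's standard `xml.etree.ElementTree` and lists the files bound to each division. The closing tags are added here so that the excerpt parses, which is an assumption about the parts of the document the slide does not show. The function name `files_by_division` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Shortened excerpt of the slide's fragment; closing tags added so it is well-formed.
STRUCT_MAP = """
<StructMap>
  <div N='1' TYPE='Book' LABEL='Diary of Patrick Breen one of the Donner Party 1846-57…'>
    <fptr FILEID='HRJ1' MIMETYPE='image/jpeg' />
    <fptr FILEID='T1' MIMETYPE='text/sgml' TAGID='titlepage' />
    <div N='1' TYPE='Entry' LABEL='Friday Nov. 20th 1846 [Page 1]'>
      <fptr FILEID='HRJ2' MIMETYPE='image/jpeg' />
      <fptr FILEID='LRG2' MIMETYPE='image/gif' />
      <fptr FILEID='T1' MIMETYPE='text/sgml' TAGID='entry1' />
    </div>
  </div>
</StructMap>
"""


def files_by_division(xml_text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each division label to the (file id, MIME type) pairs it points at."""
    root = ET.fromstring(xml_text.strip())
    bindings = {}
    for div in root.iter("div"):
        # findall("fptr") matches direct children only, so nested entries stay separate.
        bindings[div.get("LABEL")] = [
            (fptr.get("FILEID"), fptr.get("MIMETYPE")) for fptr in div.findall("fptr")
        ]
    return bindings


for label, files in files_by_division(STRUCT_MAP).items():
    print(label, files)
```

Each division offers the same page in several formats, and the pointer into the shared text file T1 carries a TAGID that selects one tagged passage of the transcription.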
Enabling Fine-grained Access to Images
Step 1: transcribe the image file (TIFF) into a text file (ASCII)
….
Monday 30th Snowing fast wind W about 4 or 5 feet deep, no drifts looks as likely to continue as when it commenced no liveing thing without wings can get about
December 1st Tuesday Still snowing wind W snow about 5 1/2 feet or 6 deep difficult to get wood no going from the house completely housed up looks as likely for snow as when it commenced, our cattle all killed but three or four [of] them, the horses & Stantons mules gone & cattle suppose lost in the Snow no hopes of finding them alive
wedns. 2nd. Continues to snow wind W sun shineing hazily thro the clouds dont snow quite as fast as it has done snow must be over six feet deep bad file this morning
….
Enabling Fine-grained Access to Images
Step 2: mark up the text file to produce an encoded text file
<div id='entry11'> Monday 30th Snowing fast wind W about 4 or 5 feet deep, no drifts looks as likely to continue as when it commenced no liveing thing without wings can get about </div>
<div id='entry12'> December 1st Tuesday Still snowing wind W snow about 5 1/2 feet or 6 deep difficult to get wood no going from the house completely housed up looks as likely for snow as when it commenced, our cattle all killed but three or four [of] them, the horses & Stantons mules gone & cattle suppose lost in the Snow no hopes of finding them alive</div>
<div id='entry13'> wedns. 2nd. Continues to snow wind W sun shineing hazily thro the clouds dont snow quite as fast as it has done snow must be over six feet deep bad file this morning</div>
Enabling Fine-grained Access to Images
Step 3: parse and render the encoded text file for viewing in a browser
Title Page
Entry 1
Entry 2
Entry 3
Entry 4
Entry 5
…
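The mark-up and parse-and-render steps can be sketched in Python as follows. The entries are shortened openings of the diary text, the `<diary>` wrapper element is a hypothetical addition that gives the parser a single root, and the function names `mark_up` and `table_of_contents` are assumptions of this sketch, not the project's actual tooling.

```python
import xml.etree.ElementTree as ET
from html import escape
from typing import List, Tuple


def mark_up(entries: List[str], first_number: int) -> str:
    """Step 2: wrap each transcribed entry in a div with a sequential id."""
    divs = [
        f"<div id='entry{number}'>{escape(text)}</div>"
        for number, text in enumerate(entries, start=first_number)
    ]
    # <diary> is a hypothetical wrapper so the result has exactly one root element.
    return "<diary>" + "".join(divs) + "</diary>"


def table_of_contents(encoded: str) -> List[Tuple[str, str]]:
    """Step 3: parse the encoded text and list (id, opening words) for navigation."""
    root = ET.fromstring(encoded)
    return [
        (div.get("id"), " ".join((div.text or "").split()[:3]))
        for div in root.iter("div")
    ]


# Shortened openings of the three entries, for brevity.
entries = [
    "Monday 30th Snowing fast wind W",
    "December 1st Tuesday Still snowing wind W",
    "wedns. 2nd. Continues to snow wind W",
]
encoded = mark_up(entries, first_number=11)
for entry_id, opening in table_of_contents(encoded):
    print(entry_id, "->", opening)
```

The call to `escape` matters for the full entries, which contain a bare `&` that an XML parser would otherwise reject. The ids produced here are what a structure map would reference to jump from a page image to its transcription.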
EAD: Encoded Archival Description
DTD for SGML mark-up of descriptive finding aids (e.g., inventories, registers, indexes, and guides)
Provides more detail about a collection than in typical catalog record
Facilitates access - “drill down” into collection
Potential international standard
Maintained jointly by Library of Congress and Society of American Archivists (SAA)
Presentation and Navigation: Principles
Decide how fine-grained you want the access experience to be
Determine the cost-benefit of creating this amount of structural metadata
Design/adopt a model (esp. DTD/Schema) that can be shared
Be prepared to express in XML, since it is poised to become standard on Web
Functional Views of Metadata
Resource Discovery
Resource Presentation and Navigation
Rights Management
Preservation
Rights and Security Metadata
Facilitates
  Access control
  Protection of intellectual property rights
  Transactions (e-commerce)
  Security (protect materials from attack)
  Monitoring
Digital Library Federation (DLF) Requirements
Must account for perspectives of publishers, intermediaries, users
Must not compromise privacy of users
Must accommodate ambiguity, as found in copyright (e.g., fair use)
Metadata relationships
  Descriptive (about objects)
  User profiles
  Rights declarations (Policies)
Expressing Policies for Automated Enforcement
Rights Metadata efforts (XML/RDF oriented)
  <indecs>
  Digital Object Identifier (DOI)
Policy Language initiatives
  Extensible Rights Markup Language (XrML) (www.xrml.org)
  KeyNote (www.crypto.com/trustmgt/kn.html)
  Cornell’s PSLang (Language-based security)
Modeling the “rights” problem: <indecs>
Supported by copyright societies, publishers, recording industry
Fundamental entities are modeled
  Creation
  Person
  Agreement
Inter-relationship of descriptive and rights metadata
Event-oriented (time and transactions)
Model will be expressed in RDF Schema
<indecs> Sample Encoding for Rights Metadata
[EventIdentifier=License No 12345][EventType=Agreement]
[Person=John Smith] [Role=GranterOfRight]
[Person=Bill Brown] [Role=Grantee]
[Event=Event No 11111] [Role=Permitted Act]…
[EventIdentifier=11111][EventType = Usage Event]
[Person=Bill Brown] [Role=Downloader]
[Manifestation=TextFile1 “Make Money…”…
Source: http://www.indecs.org/pdf/model3.pdf
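One way to read the sample encoding is as two event records linked by an identifier: an agreement that names a permitted act, and a usage event that may or may not be covered by it. The Python sketch below is an illustrative reading of the slide, not the <indecs> model itself. The class `Event`, the function `is_permitted`, and the rule that "Event No 11111" refers to the event whose identifier is "11111" are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Event:
    identifier: str
    event_type: str
    # Each pair is (participant, role), mirroring the [Person=...] [Role=...] lines.
    roles: List[Tuple[str, str]] = field(default_factory=list)


def is_permitted(usage: Event, agreements: List[Event]) -> bool:
    """Check whether some agreement permits this usage event for one of its actors."""
    actors = {participant for participant, _ in usage.roles}
    for agreement in agreements:
        if agreement.event_type != "Agreement":
            continue
        permitted = {p for p, role in agreement.roles if role == "Permitted Act"}
        grantees = {p for p, role in agreement.roles if role == "Grantee"}
        # Assumed linking rule: "Event No <id>" names the usage event with that id.
        if f"Event No {usage.identifier}" in permitted and actors & grantees:
            return True
    return False


license_12345 = Event(
    "License No 12345",
    "Agreement",
    [
        ("John Smith", "GranterOfRight"),
        ("Bill Brown", "Grantee"),
        ("Event No 11111", "Permitted Act"),
    ],
)
download = Event("11111", "Usage Event", [("Bill Brown", "Downloader")])

print(is_permitted(download, [license_12345]))  # True
```

Keeping rights as events tied to people and creations is what lets descriptive and rights metadata refer to the same entities.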
Functional Views of Metadata
Resource Discovery
Resource Presentation and Navigation
Rights Management
Preservation
Preservation Metadata
Image File Attributes
  formats
  versions
  compression
Image Attributes
  resolution
  bit depth
  orientation
Process Data
  creation date/time
  equipment used
Rights Data
  expiration dates
  copyright info
  source statements
Descriptive Data
  author
  title
  publish date
Structure Data
  pagination
  sub-groups
Electronic Records Community Perspective
Metadata requirements for preserving evidence
Six-layer metadata model
  Unique identifier
  Resource discovery metadata
  Data structure
  Terms and conditions
  Provenance information
Source: Pittsburgh Project, www.lis.pitt.edu/~nhprc
Unique Identifiers
Globally unique names (e.g., URN specification)
Name is permanent, location changes
Resolution services to locate the object
Implementations: PURL, Handles, DOI
Can create your own local resolution system
Unique Identifier: cnri.dlib/april97-payette
  Naming Authority: cnri.dlib
  Item Name: april97-payette
URL: http://www.somewebserver.org/somedirectory/somefile
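A local resolution system can be as small as a lookup table from permanent names to current locations. The Python sketch below assumes identifiers of the form naming-authority/item-name, as in the example above. The class `LocalResolver` and the second URL are hypothetical.

```python
class LocalResolver:
    """Minimal name-to-location table: the name stays fixed, the URL may change."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier: str, url: str) -> None:
        naming_authority, _, item_name = identifier.partition("/")
        if not naming_authority or not item_name:
            raise ValueError("identifier must look like 'naming-authority/item-name'")
        # Registering an existing name again simply records its new location.
        self._locations[identifier] = url

    def resolve(self, identifier: str) -> str:
        return self._locations[identifier]


resolver = LocalResolver()
resolver.register(
    "cnri.dlib/april97-payette",
    "http://www.somewebserver.org/somedirectory/somefile",
)
print(resolver.resolve("cnri.dlib/april97-payette"))

# The object moves (hypothetical new address); citations of the name remain valid.
resolver.register(
    "cnri.dlib/april97-payette",
    "http://www.otherwebserver.org/newdirectory/somefile",
)
print(resolver.resolve("cnri.dlib/april97-payette"))
```

Services such as PURL, Handles, and DOI add persistence guarantees and distributed administration on top of this basic indirection.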
Conceptual Model
[Figure labels: Descriptive View; Structural View; Technical View; Rights View; Preservation Metadata View]
A model for preservation should accommodate different metadata views
Model Projects for Preservation Metadata
Cedars (UK)
  Developed extensive preservation metadata set
  Evaluated all major initiatives, and influenced by RLG and Pittsburgh work
  Using OAIS model for distributed archives
  http://www.leeds.ac.uk/cedars/metadata.html
National Library of Australia
  Metadata to manage collections, objects, files
  Desired output of a metadata system
  http://www.nla.gov.au/preserve/pmeta.html
  http://www.nla.gov.au/padi
Wrap Up: Questions for setting metadata requirements
How will users locate digital image objects?
How will users interact with digital image objects or collections?
What policies are necessary to protect rights and provide access controls?
How will the program assure permanence of digital materials?
Wrap Up… Best Practices for Metadata
Well-conceived data models
  Understand functional requirements metadata will support
  Modularity in design (provides flexibility and extensibility)
  Prevent data anomalies (remember DC example?)
Well-structured metadata (machine-interpretable)
  Express or expose metadata using standard syntax (e.g., XML)
  Define “standard” community semantics and rules (e.g., XML Schema, DTDs)
Anticipate need to interoperate
  Exposure of metadata in standard syntax and semantics