The Open Archives Initiative Protocol for Metadata Harvesting and the IMLS Digital Collections & Content
Project at the University of Illinois
Timothy W. Cole ([email protected])
Mathematics Librarian & Professor of Library Administration
University of Illinois at Urbana-Champaign
Friday 12 November 2004
MCN 2004, Minneapolis, MN
http://imlsdcc.grainger.uiuc.edu/Cole_MCN2004_OAI.ppt
OAI-PMH & The IMLS DCC Project, MCN 2004, 12 November 2004
[email protected] of Illinois at UC 2
The Digital Information Landscape
The information landscape can be seen as a contour map in which there are mountains, hillocks, valleys, plains and plateaus…. A specialized collection of particular importance is like a sharp peak. Upon a plateau there might be undulations representing strengths and weaknesses…. The landscape is, however, multidimensional. Where one scholar may see a peak another may see a trough. The task is to devise mapping conventions which enable scholars to read the map of the landscape fruitfully, at the appropriate level of generality or specificity.
Michael Heaney (2000), “An Analytical Model of Collections and their Catalogues.”
Users & Uses of Digital Libraries
From Bibusages study (French National Library):
  Digital libraries are used in conjunction with Web search engines, generalist portals, and commercial sites
  Mix of intensive & casual users
  DL users skew somewhat older and more highly educated than the average French Internet user
  DL users seek answers to specific information needs; most of their time is spent discovering, viewing, & downloading documents
“Digital Libraries … are now attracting a new type of public, bringing about new, unique and original ways for reading and understanding texts.”
Houssem Assadi, et al. “Users & Uses of Online Digital Libraries in France,” ECDL 2003
Managing Digital Collections & Content
How do mandates translate & change in the digital world?
  Content & collections as virtual ‘information landscapes’
  New users, uses, & metrics
  Increased emphasis on interoperability & sharing
New models for sharing & resource discovery
  Harvesting, e.g., OAI-PMH
  Federated searching, e.g., Z39.50 / ZNG, DiGIR, ...
New emphasis on ‘shareable’ metadata
  Reconciling different descriptive metadata practices
  New metrics for metadata quality (for interoperability)
IMLS Digital Library Forum (2001)
Framework of Guidance for Building Good Digital Collections: http://www.niso.org/framework/forumframework.html
Stresses reusability, persistence, interoperability, verification, and documentation of digital collections & content
Accompanying report included recommendations encouraging:
  Creation of an IMLS Collection Registry
  Implementation of the Open Archives Initiative Protocol for Metadata Harvesting by IMLS projects creating digital content
  Development of infrastructure to facilitate interoperability between IMLS projects and initiatives like NSDL
IMLS DCC Project Overview
Collection description & prototype registry for IMLS National Leadership Grant projects with associated digital content
  Enhance discoverability of collections & content
  Provide alternative view of one output of IMLS NLG program
Prototype item-level metadata repository via OAI-PMH
  Demonstrate potential of metadata for interoperability
  Serve as testbed for IMLS projects interested in OAI-PMH
  Facilitate reuse of information resources paid for by IMLS
Research question: How can resource developers best represent collections and items to meet the needs of service providers and end users?
IMLS Grantees – A Diverse Community
Mix of library, museum, and archive traditions
Wide variation in technical skills, technology infrastructure & information management policy
Diverse perspectives on intellectual property; use and presentation of metadata & primary resources
Diverse embedded knowledge structures
Results in wide variability in:
  Metadata formats
  Content resource types
  Controlled vocabularies
  Descriptive metadata practices
Broad Categories of Institutions Represented in Collection Registry
Institutions in IMLS Collection Registry by Category (349 institutions from 134 collections / 92 NLG projects):
  Libraries: 41%
  Museums: 36%
  Archives: 3%
  Specimen Holding: 3%
  Other: 17%
Detailed Institution Types Represented in Collection Registry
Types of Institutions in IMLS Collection Registry (349 institutions from 134 collections from 92 NLG projects), number of institutions by type:
  Acad. Lib.: 69
  Historical Soc.: 58
  Public Lib.: 48
  Other: 39
  History Mus.: 24
  General Mus.: 15
  Other Higher Ed: 13
  Spec. Mus.: 12
  State Lib.: 12
  Research Lib./Archives: 11
  Art Museum: 10
  K-12 School: 8
  Lib. Cons.: 6
  Nat. His. Mus.: 6
  Science Mus.: 4
  Bot. Garden / Herbarium: 3
  Spec. library: 3
  Historic Site: 2
  Arboretum: 1
  Museum Lib.: 1
  Private Lib.: 1
  School Lib.: 1
  State Mus.: 1
Broad Categories of Institutions Represented in Metadata Repository
Institutions Represented in Metadata Repository (136 institutions, 27 harvested collections, 193,677 metadata records):
  Libraries: 42%
  Museums: 37%
  Archives: 4%
  Specimen Holding: 4%
  Other: 13%
Detailed Institution Types Represented in Metadata Repository
[Bar chart: Types of Institutions Represented in Item Level Metadata Repository (136 institutions, 27 harvested collections, 193,677 metadata records); x-axis: Institution Types; y-axis: Number of Institutions; nonzero bar values, largest to smallest: 34, 16, 16, 14, 8, 6, 6, 6, 5, 4, 4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1]
Metadata Formats
Metadata Formats in Use (number of respondents using each format alone, in combination with other formats, and in total):
  Dublin Core: 10 (13%) only; 38 (49%) in combination; 48 (62%) total
  EAD: 1 (1%) only; 11 (14%) in combination; 12 (16%) total
  MARC: 4 (5%) only; 24 (31%) in combination; 28 (36%) total
  [Chart also covers TEI, VRA Core, other metadata standards, and locally developed metadata]
Types of Resources
Type of Material in Digital Collection (number of respondents):
  Images: 8 (9%) only; 71 (80%) in combination; 79 (89%) total
  Text: 3 (3%) only; 69 (78%) in combination; 72 (81%) total
  Sound: 2 (2%) only; 24 (27%) in combination; 26 (29%) total
  Interactive Resource: 1 (1%) only; 15 (17%) in combination; 16 (18%) total
  Moving Image: 1 (1%) only; 13 (15%) in combination; 14 (16%) total
  Other: 0 only; 12 (13%) in combination; 12 (13%) total
Controlled Vocabularies
Element: top three controlled vocabularies used (% of respondents who identified a controlled vocabulary)
  Subject: LCSH (50%); LC TGM I (19%); AAT (13%)
  Format: LC TGM II (7%); AAT (7%); MIME types (4%)
  Type: DCMI Type (8%); LC TGM II (7%); AACR2 (7%)
  Personal names: LC Name Authority File (47%)
  Geographic names: LC Name Authority File (18%); LCSH (15%); Getty Thesaurus of Geographic Names (10%)
Descriptive Practice
Different traditions regarding:
  Inclusion of interpretive information
  Granularity of description
  Presentation of information resources
Shared problems / issues:
  How to provide context & collection description
  What exactly to describe
  Which metadata scheme(s) to use
Illustration – Coverlets (1 of 2)
Description: Digital image of a single-sized cotton coverlet for a bed with embroidered butterfly design. Handmade by Anna F. Ginsberg Hayutin.
Source: Materials: cotton and embroidery floss. Dimensions: 71 in. x 86 in. Markings: top right hand corner has 1 1/2 in. x 1/2 in. label cut outs at upper left and right hand side for head board; fabric is woven in a variation of a rib weave; color each of yellow and gray; hand-embroidered cotton butterflies and flowers from two shades of each color of embroidery floss - blue, pink, green and purple and single top 20 in. bordered with blue and black cotton embroidery thread; stitches used for embroidery: running stitch, chain stitch, French knot and back stitches; selvage edges left unfinished; lower edges turned under and finished with large gray running stitches made with embroidery floss.
Format: Epson Expression 836 XL Scanner with Adobe Photoshop version 5.5; 300 dpi; 21-53K bytes. Available via the World Wide Web.
Coverage: —
Date Created: 2001-09-19 09:45:18; Updated: 20011107162451; Created: 2001-04-05; Created: 1912-1920?
Type: Image
Illustration – Coverlets (2 of 2)
Description: Materials: Textile--Multi, Pigment—Dye; Manufacturing Process: Weaving--Hand, Spinning, Dyeing, Hand-loomed blue wool and white linen coverlet, worked in overshot weave in plain geometric variant of a checkerboard pattern. Coverlet is constructed from finely spun, indigo-dyed wool and undyed linen, woven with considerable skill. Although the pattern is simpler, the overall craftsmanship is higher than 1934.01.0094A. - D. Schrishuhn, 11/19/99 This coverlet is an example of early "overshot" weaving construction, probably dating to the 1820's and is not attributable to any particular weaver. -- Georgette Meredith, 10/9/1973
Source: —
Format: 228 x 169 x 1.2 cm (1,629 g)
Coverage: Euro-American; America, North; United States; Indiana? Illinois?
Date: Early 19th c. CE
Type: cultural; physical object; original
OAI Protocol for Metadata Harvesting
‘Harvesting’ approach to interoperability at metadata level
Divides world into Metadata Providers & Service Providers
Builds on HTTP, XML, & Community Metadata Standards
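On the metadata provider side, "community metadata standards" means at minimum unqualified Dublin Core, which OAI-PMH 2.0 requires every repository to disseminate under the `oai_dc` prefix. The following is a minimal sketch, assuming only Python's standard library, of how a provider might serialize a simple record in that container. The helper name `to_oai_dc` and the dict-of-lists record shape are hypothetical; the namespace URIs are the ones published with the OAI-PMH specification.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by the oai_dc container in OAI-PMH 2.0.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)
ET.register_namespace("xsi", XSI_NS)

# The 15 unqualified Dublin Core elements; anything else is rejected.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def to_oai_dc(record: dict[str, list[str]]) -> str:
    """Serialize {element: [values]} as an oai_dc XML string (hypothetical helper)."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    root.set(
        f"{{{XSI_NS}}}schemaLocation",
        f"{OAI_DC_NS} http://www.openarchives.org/OAI/2.0/oai_dc.xsd",
    )
    for element, values in record.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not an unqualified Dublin Core element: {element}")
        # Dublin Core elements are repeatable, so emit one child per value.
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")
```

For example, `to_oai_dc({"title": ["Coverlet"], "type": ["Image"]})` yields an `oai_dc:dc` element with one `dc:title` and one `dc:type` child. Rejecting non-DC keys is a design choice for the sketch; a real provider would more likely crosswalk local fields into the 15 elements first.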
Metadata Harvesting Model
How OAI-PMH Works
OAI “VERBS”
Identify
ListMetadataFormats
ListSets
ListIdentifiers
ListRecords
GetRecord
HARVESTER (OAI Service Provider) sends an HTTP Request carrying an OAI verb to the REPOSITORY (OAI Metadata Provider)
REPOSITORY returns an HTTP Response containing valid XML
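The request/response cycle can be sketched from the harvester's side. This is a minimal illustration using only Python's standard library, not a production harvester: it encodes a verb as an HTTP GET, parses the XML response, and follows `resumptionToken` elements until the list is complete (an empty token marks the last page). The function names and the `http://example.org/oai` base URL are hypothetical; retry handling for HTTP 503 responses, sets, and the metadata payload itself are deliberately left out.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_request(base_url: str, verb: str, **params: str) -> str:
    """Encode an OAI-PMH request as an HTTP GET URL."""
    query = urllib.parse.urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


def fetch(url: str) -> str:
    """Default transport: plain HTTP GET, response decoded as UTF-8."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def harvest_identifiers(base_url: str, verb: str = "ListRecords",
                        metadata_prefix: str = "oai_dc",
                        get: Callable[[str], str] = fetch) -> Iterator[str]:
    """Yield record identifiers, following resumptionTokens until exhausted."""
    url = build_request(base_url, verb, metadataPrefix=metadata_prefix)
    while True:
        root = ET.fromstring(get(url))
        error = root.find("oai:error", OAI_NS)
        if error is not None:
            # An empty result is reported as the noRecordsMatch error.
            if error.get("code") == "noRecordsMatch":
                return
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        # Works for both ListRecords and ListIdentifiers responses.
        for header in root.iterfind(".//oai:header", OAI_NS):
            yield header.findtext("oai:identifier", namespaces=OAI_NS)
        token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
        if not token:  # absent or empty token: the list is complete
            return
        # A resumption request carries only the verb and the token.
        url = build_request(base_url, verb, resumptionToken=token)
```

Injecting the `get` function keeps the paging logic testable without a live repository; in use, `list(harvest_identifiers("http://example.org/oai"))` would walk every page the repository returns.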
Why OAI-PMH for IMLS DCC Project
Offers low-technical-barrier options (e.g., OAI-PMH itself, OAI Static Repository, mod_oai); primary cost is metadata
Is a cross-domain, non-proprietary approach to interoperability
Already used by NSDL, OAIster, etc.
Seen as a way to bring content to the attention of a wider audience
  37% of visits to State Library of New South Wales image collection come via PictureAustralia (an OAI-PMH-based portal)
Facilitates metadata & metadata services research
  What makes for good ‘shareable’ metadata?
  Contrast & compare metadata designs & workflows
  Explore normalization, enhancement, and aggregated searching issues
OAI-PMH Issues
Harvesting vs. federated:
  Harvested metadata aggregation is always out of date, but
  federated real-time performance depends on the weakest link
  Sorting, ranking, & de-duping are easier with the harvesting model
Potential scale issues:
  Largest OAI-PMH provider serves 4 million records
  Largest OAI-PMH service provider holds < 10 million records
Integration into existing metadata workflow requires some investment; cost-to-benefit ratio still unclear
Practical metadata sharing issues:
  Persistent identifiers, date stamps, proper application of protocol
  Metadata quality, consistency, context, cross-walking, ...
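Persistent identifiers and datestamps are what make the harvesting model's de-duplication and incremental updates tractable. A minimal sketch follows, assuming record headers have already been parsed: each harvested header replaces the stored copy only if its datestamp is at least as recent, deleted records are kept as tombstones so an older copy cannot resurrect them, and the latest datestamp seen becomes the `from` argument of the next harvest. OAI-PMH datestamps are UTC at either day (`YYYY-MM-DD`) or seconds (`YYYY-MM-DDThh:mm:ssZ`) granularity, and `from` is inclusive, so re-harvested duplicates are expected. `Header`, `merge`, `live_records`, and `next_from` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Header:
    identifier: str        # persistent OAI identifier, e.g. "oai:example.org:42"
    datestamp: str         # "YYYY-MM-DD" or "YYYY-MM-DDThh:mm:ssZ" (UTC)
    deleted: bool = False  # header carried status="deleted"


def parse_datestamp(value: str) -> datetime:
    """Parse either OAI-PMH datestamp granularity into an aware UTC datetime."""
    fmt = "%Y-%m-%dT%H:%M:%SZ" if "T" in value else "%Y-%m-%d"
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)


def merge(store: dict[str, Header], batch: list[Header]) -> dict[str, Header]:
    """De-duplicate by identifier: the later datestamp wins, ties go to the batch."""
    for header in batch:
        current = store.get(header.identifier)
        if current is None or parse_datestamp(header.datestamp) >= parse_datestamp(current.datestamp):
            store[header.identifier] = header  # deletions stay as tombstones
    return store


def live_records(store: dict[str, Header]) -> list[str]:
    """Identifiers of records that are not marked deleted."""
    return sorted(h.identifier for h in store.values() if not h.deleted)


def next_from(batch: list[Header]) -> Optional[str]:
    """Latest datestamp in a batch, usable as `from` for the next (inclusive) harvest."""
    if not batch:
        return None
    return max(batch, key=lambda h: parse_datestamp(h.datestamp)).datestamp
```

Comparing parsed datetimes rather than raw strings avoids mis-ordering when a day-granularity value meets a seconds-granularity one.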
Federated Searching Model
Alternative Approaches for Interoperability
Federated search models
  Library: NISO Z39.50
  Specimen / Natural History: DiGIR
  More homogeneous metadata schemes, query rules
Collaborative, sometimes proprietary project portals
  RLG Cultural Materials
  ArtStor
  GBIF, MaNIS, ...
Generally higher technical threshold; rely on higher level of metadata homogeneity & compliance
OAI-PMH as Complement to Other Approaches
OAI-PMH provides a lowest-common-denominator approach to sharing & interoperability
Insufficient for some high-level, domain-specific applications, but useful for sharing across more heterogeneous communities & allowing participation with less technology
Portals can exploit a combination of approaches
  OAI-PMH metadata harvesters can normalize & augment metadata before passing it on to domain-specific federated search portals
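One concrete normalization a harvester might apply before passing metadata on: the two coverlet records above give Type as "Image" and as "cultural; physical object; original". The sketch below is a hypothetical harvester-side rule that splits a free-text `dc:type` value and keeps only the parts matching a DCMI Type Vocabulary term, ignoring case and spacing. The function name and the choice to drop unmatched parts are assumptions; a real service might keep them in a separate field.

```python
import re

# The 12 terms of the DCMI Type Vocabulary.
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}

# Lookup keyed by lower-case letters only: "physical object" -> "PhysicalObject".
_LOOKUP = {re.sub(r"[^a-z]", "", term.lower()): term for term in DCMI_TYPES}


def normalize_types(raw: str) -> list[str]:
    """Split a free-text dc:type on ';' or ',' and keep DCMI Type matches, in order."""
    matches: list[str] = []
    for part in re.split(r"[;,]", raw):
        term = _LOOKUP.get(re.sub(r"[^a-z]", "", part.lower()))
        if term and term not in matches:  # skip non-matches and duplicates
            matches.append(term)
    return matches
```

Applied to the second coverlet, `normalize_types("cultural; physical object; original")` keeps only `PhysicalObject`.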
IMLS DCC Collection Registry (alpha)
Features:
Searchable
Browseable
An entry point for item-level searching
IMLS DCC Metadata Repository (alpha)
Currently Harvesting:
  27 Collections
  193,677 Records
Ongoing analysis of metadata
Documenting practices
Potential for normalization
Implications for interface & search engine design
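As an illustration of what normalization might involve, the coverlet records above express dates as "2001-09-19 09:45:18", "20011107162451", "1912-1920?", and "Early 19th c. CE". A minimal sketch of reducing such strings to an inclusive year range, as a date-range search might need, is shown below. The function name and its conventions are assumptions, not an established standard: "19th c." is read as 1800-1899, "early/mid/late" as the first, middle, and last third of the century, and a trailing "?" is accepted but not recorded.

```python
import re
from typing import Optional, Tuple


def year_range(raw: str) -> Optional[Tuple[int, int]]:
    """Map a free-text date to an inclusive (start, end) year range, or None."""
    text = raw.strip()
    # Century expressions such as "Early 19th c." (assumed: 19th c. = 1800-1899).
    m = re.search(r"(early|mid|late)?\s*(\d{1,2})(?:st|nd|rd|th)\s*c", text, re.I)
    if m:
        start = (int(m.group(2)) - 1) * 100
        # Assumed convention: early/mid/late = first/middle/last third.
        offsets = {"early": (0, 32), "mid": (33, 65), "late": (66, 99), "": (0, 99)}
        lo, hi = offsets[(m.group(1) or "").lower()]
        return start + lo, start + hi
    # Explicit year ranges such as "1912-1920?" (trailing "?" marks uncertainty).
    m = re.fullmatch(r"(\d{4})\s*-\s*(\d{4})\??", text)
    if m:
        return int(m.group(1)), int(m.group(2))
    # ISO-like or compact timestamps: "2001-04-05", "20011107162451".
    m = re.match(r"(\d{4})", text)
    if m:
        year = int(m.group(1))
        return year, year
    return None
```

Under these conventions, "Early 19th c. CE" becomes (1800, 1832) and both system timestamps collapse to (2001, 2001). Whether such coarse ranges are acceptable is exactly the kind of interface and search-engine design question the repository analysis raises.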
More Information
This presentation: http://imlsdcc.grainger.uiuc.edu/Cole_MCN2004_OAI.ppt
Project Website: http://imlsdcc.grainger.uiuc.edu/
Project PI: Tim Cole, [email protected]
Project Coordinator: Sarah Shreeves, [email protected]
OAI-PMH resources: http://www.openarchives.org/
Online OAI-PMH tutorial: http://www.oaforum.org/tutorial/
DLF OAI-PMH & shareable metadata best practices (under development): http://oai-best.comm.nsdl.org/