Findings from the Mellon Metadata Harvesting Initiative Martin Halbert, Joanne Kaczmarek, and Kat Hagedorn Monday 18-Aug-2003 ECDL 2003



Findings from the Mellon Metadata Harvesting Initiative

Martin Halbert, Joanne Kaczmarek, and Kat Hagedorn

Monday 18-Aug-2003

ECDL 2003


ECDL 2003 – Trondheim, Norway Mellon Metadata Initiative – Slide 2

Overview

• Highlights of the Mellon projects

• Findings regarding metadata harvesting

• Questions about the context of metadata and metadata harvesting

• Next steps, subsequent research projects


Highlights of the Projects


Andrew W. Mellon Foundation

• Mellon is a major U.S. private philanthropic foundation that has been involved with the OAI-PMH from the beginning

• Sought to foster projects exploring how the OAI-PMH could be used by libraries and other organizations supporting research to make metadata concerning scholarly collections more visible to users

• Funded seven projects in 2001 with a total of US $1.5M


Seven Projects

1. University of Illinois at Urbana-Champaign

2. The University of Michigan (OAIster)

3. Emory University (MetaArchive)

4. SOLINET / ASERL (AmericanSouth)

5. The Research Libraries Group (RLG)

6. University of Virginia

7. (Woodrow Wilson International Center for Scholars at the Smithsonian)


Highlights of Projects

• OAIster and UIUC Repository harvested millions of records and developed sophisticated search tools

• Emory and SOLINET MetaScholar projects harvested focused collections, enhanced existing OSS harvesting tools, and formed teams of scholars and librarians to study the process and context of metadata harvesting for research portals

• Other projects examined internal uses of OAI-PMH for cultural scholarship


Findings Concerning Metadata Harvesting


Metadata Harvesting Findings: Slow Adoption of the OAI-PMH

• Most institutions with cultural materials collections had not yet implemented the protocol during the 2002-2003 period

• There are many reasons for this: lack of institutional priority, insufficient technical staff, little organizational understanding of the benefits of the protocol

• However, both Emory and Illinois found that having centralized regional centers provide relatively modest OAI technical expertise to other libraries was very effective in fostering adoption of the protocol


Metadata Harvesting Findings: Problems with Institutional Metadata

• Wide variations in implementation of Unqualified Dublin Core (UDC) descriptive metadata elements

• Duplication of records between collaborating institutions, difficult to de-dupe due to lack of unique inter-institutional identifiers

• Format incompatibilities/collisions, especially between Encoded Archival Descriptions (EAD) and UDC record perspectives

• Inconsistent access restrictions on content lead to confusion among users
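
The duplication problem can be illustrated with a minimal sketch. Because harvested records lack a shared inter-institutional identifier, the code below falls back on a crude match key built from normalized title, first creator, and date. The field names, the sample records, and the matching rule are assumptions made for illustration; the slides do not describe the technique any of the projects actually used.

```python
import re
from collections import defaultdict


def dedupe_key(record):
    """Build a crude match key from title, first creator, and date (assumed rule)."""
    def norm(value):
        # Lowercase, replace punctuation with spaces, collapse whitespace.
        return " ".join(re.sub(r"[^a-z0-9 ]", " ", value.lower()).split())

    creators = record.get("creator") or [""]
    return (norm(record.get("title", "")), norm(creators[0]), record.get("date", "").strip())


def group_duplicates(records):
    """Return lists of identifiers whose records share the same match key."""
    groups = defaultdict(list)
    for rec in records:
        groups[dedupe_key(rec)].append(rec["identifier"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical records as two collaborating repositories might expose them.
records = [
    {"identifier": "oai:repoA.example.org:17", "title": "Letters from Savannah, 1864",
     "creator": ["Smith, John"], "date": "1864"},
    {"identifier": "oai:repoB.example.org:902", "title": "letters from Savannah 1864",
     "creator": ["Smith, John."], "date": "1864"},
    {"identifier": "oai:repoA.example.org:18", "title": "Plantation ledger",
     "creator": ["Unknown"], "date": "1850"},
]
duplicate_groups = group_duplicates(records)
```

A key like this will miss duplicates whose titles or dates were entered differently and may merge distinct items that happen to share all three values, which is one reason the lack of unique identifiers makes de-duplication difficult.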


Metadata Harvesting Findings: Problems with Inst. Metadata (cont.)

• No controlled vocabulary in effect for any UDC field, nor would this make sense for most fields

• Although universal systems such as US Library of Congress Subject Headings (LCSH) exist, they are not granular enough for most repositories

• No uniform mechanism in place to express dates or locations (coverage), which can mean many things in UDC, and no authority control for creator field

• 96% of institutional repositories using Eprints software do not use standard controlled vocabularies
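
To make the date problem concrete, here is a small sketch of what an aggregator faces when the date field arrives as free text. The sample values and the year-extraction rule are assumptions for illustration only, not data or methods reported by the projects.

```python
import re

# Four-digit years from 1500 to 2099 that are not part of a longer digit run.
YEAR = re.compile(r"(?<!\d)(1[5-9]\d{2}|20\d{2})(?!\d)")


def year_range(date_value):
    """Return (earliest, latest) year found in a free-text date, or None."""
    years = [int(y) for y in YEAR.findall(date_value)]
    return (min(years), max(years)) if years else None


# Hypothetical date values in the styles different repositories might supply.
samples = ["1864-05-12", "ca. 1920", "1890-1910", "May 3, 1955", "19th century", "n.d."]
parsed = {s: year_range(s) for s in samples}
```

Values such as "19th century" or "n.d." yield nothing usable here, so even a forgiving parser leaves part of an aggregation without a sortable or searchable date.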


Metadata Harvesting Findings: Need for Metadata Gardening

• The best way to make metadata effective cross-institutionally is to coordinate the entire life cycle of metadata production

• Uncoordinated harvesting is relatively easy to do, but the resulting metadata aggregation then suffers from all the problems previously described and needs remediation (which may be effectively impossible)
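
As a rough illustration of why uncoordinated harvesting is relatively easy, the sketch below builds an OAI-PMH `ListRecords` request URL and pulls the Dublin Core fields out of a response. The base URL and the embedded sample response are hypothetical, and the sketch ignores resumption tokens, error responses, and deleted-record handling that a real harvester would need.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response, embedded so the example runs without a network.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repo.example.org:1</identifier>
        <datestamp>2003-06-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample finding aid</dc:title>
          <dc:creator>Example, Author</dc:creator>
          <dc:type>text</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for a repository base URL."""
    return base_url + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})


def parse_records(xml_text):
    """Return one dict per record: the identifier plus lists of DC element values."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("./oai:ListRecords/oai:record", NS):
        fields = {"identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS)}
        dc_root = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc_root is not None:  # records without metadata are kept with identifier only
            for elem in dc_root:
                name = elem.tag.split("}", 1)[1]  # strip the namespace prefix
                fields.setdefault(name, []).append((elem.text or "").strip())
        records.append(fields)
    return records


harvested = parse_records(SAMPLE_RESPONSE)
```

Collecting records this way takes little code; the values that land in each field are whatever the source repository chose to put there, which is where the remediation burden comes from.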


Metadata Harvesting Findings: Need for Metadata Gardening (cont.)

• Coordinated gardening of metadata is the long-standing solution to this problem

• Examples include virtually any community of information users that has come up with consistent standards for the metadata it shares

• The problem is that new information communities are still forming, having been enabled by the OAI-PMH

• Mature information communities are mature precisely because they have well-understood standards and practice in using and sharing information


Findings Concerning Metadata Context


Metadata Context

• Metadata without a context is useless, much like encrypted information without the key

• Metadata is considered useful precisely because it is created in particular contexts by particular communities

• OAI-PMH only prescribes UDC format

• UDC is some context, and is (probably?) better than nothing, but many groups inaccurately thought that it was enough context to build robust discovery systems around


Metadata Context Findings: Recovering Context

• Different opinions among the projects over how to recover context for aggregated heterogeneous metadata

• OAIster made some efforts to normalize some UDC metadata fields after harvesting (UDC type field)

• Illinois developed a mechanism for displaying original EAD context of records disaggregated from finding aid series information

• Emory/SOLINET AmericanSouth has a team of nationally renowned scholars studying how online scholarship can contextualize metadata and vice versa
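
Post-harvest normalization of the UDC type field, as mentioned for OAIster above, might look something like the following sketch. The mapping table and the "Unknown" fallback are assumptions for illustration; the slides do not give OAIster's actual rules. The target values are taken from the DCMI Type Vocabulary.

```python
# Assumed mapping from free-text type values to DCMI Type Vocabulary terms.
TYPE_MAP = {
    "text": "Text", "article": "Text", "thesis": "Text", "book": "Text",
    "image": "Image", "photograph": "Image", "jpeg": "Image",
    "audio": "Sound", "sound": "Sound",
    "video": "MovingImage", "moving image": "MovingImage",
    "dataset": "Dataset",
}


def normalize_type(raw):
    """Map a harvested type value to a normalized term, or "Unknown" (placeholder)."""
    key = " ".join(raw.lower().replace("_", " ").split())
    if key in TYPE_MAP:
        return TYPE_MAP[key]
    # Fall back to the first individual word that has a mapping.
    for word in key.split():
        if word in TYPE_MAP:
            return TYPE_MAP[word]
    return "Unknown"


# Hypothetical values in the styles different repositories might supply.
raw_types = ["Text", "  PHOTOGRAPH ", "Journal article", "moving_image", "map"]
normalized = [normalize_type(t) for t in raw_types]
```

A table like this has to be grown by hand as new repositories are harvested, since each one may introduce values no earlier source used.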


Metadata Context Findings: Harvesters vs. other Discovery Systems

• How do we understand harvesters vs. online catalogs, Google, and commercial databases?

• How do we articulate the difference to users?

• What information should we aggregate and make searchable? Metadata and crawled web content? Very different information realms need to be bridged through new federated search mechanisms


Next Steps and Subsequent Research


Next Steps for Emory, Michigan, and Illinois

• All of these projects learned a great deal during the Mellon Metadata Harvesting Initiative that has informed their subsequent planning for new services

• All of these projects are in the process of being mainstreamed using various strategies

• All of these projects continue to grapple with metadata quality and context issues


Next Steps: Illinois

• Additional research is being undertaken on the integration of EAD and OAI

• Beginning a three-year collaboration with the research libraries of other Committee on Institutional Cooperation (CIC) institutions to study the potential of OAI-PMH to facilitate resource sharing

• NSF grant to develop digital libraries for scientific communities in connection with National Science Digital Library (NSDL)

• Institute for Museum and Library Services (IMLS) grant to develop an OAI-based registry of IMLS projects


Next Steps: Michigan

• Working on further techniques for metadata remediation

– De-duplication

– Normalization of more UDC fields

– Further tailoring of metadata for research purposes

• Exploring use of OAIster in connection with campus courseware initiatives


Next Steps: Emory

• Undertaking further modeling of scholarly portals based on metadata harvesting, with application to an international Irish Literature portal

• New grant from the Mellon Foundation to build on previous projects

– Experiments in semantic clustering of metadata using support vector machines

– Exploration of combining metadata harvesting and web crawling

– Developing frameworks for federating loosely-coupled digital library components


Appreciation

• Enormous thanks go to the Andrew W. Mellon Foundation for advancing the understanding of metadata harvesting applications through these projects

• Mellon continues to be a driving force in the United States and internationally for research into digital library experiments benefiting scholarly communication


Contacts

• Martin Halbert ([email protected]) 404-727-2204

• Kat Hagedorn ([email protected])

• Joanne Kaczmarek ([email protected])