the future of marc: dead or reviving? rebecca guenther nytsl fall program nov. 4, 2011
The Future of MARC: dead or reviving?
Rebecca Guenther
NYTSL Fall Program
Nov. 4, 2011
Overview of presentation
History of MARC
The current bibliographic framework
Efforts to evolve MARC: XML formats, Linked data explorations, RDA changes
LC Bibliographic Framework Transition Initiative
What is MARC 21?
A syntax defined by an international standard for communications, with two expressions: Classic MARC (MARC 2709) and MARCXML
A data element set defined by content designation and semantics
Many data elements are defined by external content rules; a common misperception is that MARC 21 is tied to AACR2
It does not specify internal storage and institutions do not store “MARC 21”
A set of 5 formats for different purposes: Bibliographic, Authority, Holdings, Classification, Community Information
The current bibliographic environment
Billions of rich descriptive records in MARC systems
Many national formats have been harmonized with MARC 21
Integrated library systems support MARC bibliographic, authority and holdings formats for different functions
Wide sharing of records for 30+ years
OCLC is a major source of records
MARC records are being reused (sometimes converted) and repackaged
Need to interact with descriptions in other formats/syntaxes
MARC successes
Can carry data formulated by different cataloging rules and conventions
Multiple descriptive rules, different principles and models
Different subject thesauri
Multiple languages and scripts
Cooperation in record exchange has resulted in widespread use and cost savings
Richness of MARC records supports multifaceted retrieval: coded data, parsed data
Problems with MARC
MARC 2709 syntax problems: limitation of available fields, subfields, indicator values, etc.
Redundant data (fixed vs. variable fields)
The longevity of the format complicates reuse of data tags; redundancies have built up over time
Ability to link is limited
Lack of explicit hierarchical levels
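The fixed-width limits behind these syntax problems can be seen in a minimal sketch of the MARC 2709 record layout. It assumes ASCII-only data, so character counts equal byte counts. The helper names `build_record` and `parse_record` are hypothetical, and the leader values are placeholders rather than a recommended coding.

```python
# Sketch of the MARC 2709 layout: 24-character leader, directory of
# 12-character entries, then the variable fields.
FT = "\x1e"  # field terminator
RT = "\x1d"  # record terminator
SF = "\x1f"  # subfield delimiter, followed by a one-character subfield code


def build_record(fields):
    """fields: list of (tag, data); data already holds indicators and subfields."""
    directory, body = "", ""
    for tag, data in fields:
        data += FT
        # Directory entry: 3-character tag, 4-digit length, 5-digit offset.
        # A field therefore cannot exceed 9999 bytes.
        directory += f"{tag}{len(data):04d}{len(body):05d}"
        body += data
    directory += FT
    base = 24 + len(directory)           # base address of data
    total = base + len(body) + len(RT)   # 5 digits: at most 99999 bytes per record
    leader = f"{total:05d}nam a22{base:05d} a 4500"
    return leader + directory + body + RT


def parse_record(raw):
    leader = raw[:24]
    base = int(leader[12:17])
    directory = raw[24:base - 1]         # drop the directory's field terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = raw[base + start:base + start + length - 1]  # drop field terminator
        fields.append((tag, data))
    return leader, fields


raw = build_record([("245", "10" + SF + "aThe future of MARC")])
leader, fields = parse_record(raw)
```

The three-character tag, one-character subfield code and four-digit field length are what cap the number of fields and subfields and the size of any one field.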
Efforts to streamline MARC 21
Take advantage of XML
Increasingly use MARC 21 in an XML structure
Take advantage of freely available XML tools
Develop simpler (but compatible) alternatives: MODS and MADS
Allow for interoperability with different XML metadata schemas
Assemble coordinated set of tools
MARCXML
MARCXML uses the MARC data element set in an XML syntax
Lossless roundtrip conversions
Simple flexible XML schema, no need to change when MARC 21 changes
Continuity with current data and flexible transition options
Problems with limitations in tagging persist
http://lccn.loc.gov/2004012412/marcxml
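As a rough illustration of why the schema need not change when MARC 21 changes, the sketch below writes a record as MARCXML and reads it back. Tags, indicators and subfield codes are attribute values, not element names. The function names and the sample record are hypothetical; only the namespace and element names come from the MARCXML schema.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)


def q(name):
    """Qualify an element name with the MARCXML namespace."""
    return f"{{{MARC_NS}}}{name}"


def to_marcxml(leader, fields):
    """fields: list of (tag, ind1, ind2, [(code, value), ...])."""
    record = ET.Element(q("record"))
    ET.SubElement(record, q("leader")).text = leader
    for tag, ind1, ind2, subfields in fields:
        datafield = ET.SubElement(
            record, q("datafield"), {"tag": tag, "ind1": ind1, "ind2": ind2}
        )
        for code, value in subfields:
            ET.SubElement(datafield, q("subfield"), {"code": code}).text = value
    return ET.tostring(record, encoding="unicode")


def from_marcxml(xml_text):
    record = ET.fromstring(xml_text)
    leader = record.find(q("leader")).text
    fields = []
    for datafield in record.findall(q("datafield")):
        subfields = [(s.get("code"), s.text) for s in datafield.findall(q("subfield"))]
        fields.append(
            (datafield.get("tag"), datafield.get("ind1"), datafield.get("ind2"), subfields)
        )
    return leader, fields


sample_leader = "00000nam a2200000 a 4500"  # placeholder leader
sample_fields = [("245", "1", "0", [("a", "The future of MARC")])]
xml_text = to_marcxml(sample_leader, sample_fields)
round_trip = from_marcxml(xml_text)
```

A new MARC field would only be a new `tag` value here, which is also why the tagging limitations carry over unchanged.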
MARC derivatives: MODS and MADS
Attempts to deal with MARC limitations
Eliminate some of the problems with MARC (e.g. shortage of available tags/subfield codes)
More user-friendly (use language tags)
Repackage redundant data elements into one
Can carry hierarchical data
Less tied to cataloging rules
Highly compatible with MARC but simpler, although retaining some richness
Widely implemented especially for digital projects
Governed by Editorial Committee
Example: http://lccn.loc.gov/2004012412/mods
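To make "language tags" concrete, here is a minimal sketch that turns the subfields of a MARC 245 title field into a MODS `titleInfo` element. It is an illustration under simplifying assumptions, not the official MARC-to-MODS conversion: only $a and $b are handled, and the function name `title_to_mods` is hypothetical.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Numeric subfield codes become named elements.
ELEMENT_BY_SUBFIELD = {"a": "title", "b": "subTitle"}


def title_to_mods(subfields):
    """subfields: list of (code, value) pairs from a 245 field."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    for code, value in subfields:
        name = ELEMENT_BY_SUBFIELD.get(code)
        if name is None:
            continue  # other subfields are ignored in this sketch
        # Trailing ISBD punctuation is stripped, since MODS is less tied
        # to cataloging-rule punctuation.
        ET.SubElement(title_info, f"{{{MODS_NS}}}{name}").text = value.rstrip(" :/")
    return mods


mods = title_to_mods([("a", "The future of MARC :"), ("b", "dead or reviving? /")])
title = mods.find(f"{{{MODS_NS}}}titleInfo/{{{MODS_NS}}}title").text
sub_title = mods.find(f"{{{MODS_NS}}}titleInfo/{{{MODS_NS}}}subTitle").text
```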
Related XML schemas: METS
METS: a container/information package
Wrapper for MARCXML and MODS descriptions
Allows for additional technical and preservation metadata
Enables tracking of actions on the metadata itself
Many use METS as a framework for digital libraries and their metadata
Particularly useful for complex digital objects
Allows for reuse of rich descriptions
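The wrapper role can be sketched as follows: an existing description is placed inside a METS descriptive metadata section. This is a fragment for illustration only. A complete METS document needs further parts (such as a structural map) that are omitted here, and the function name `wrap_in_mets` and the ID value are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)


def wrap_in_mets(description, mdtype, dmd_id="DMD1"):
    """Embed an XML description (e.g. MODS or MARCXML) in a METS dmdSec."""
    mets = ET.Element(f"{{{METS_NS}}}mets")
    dmd_sec = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", {"ID": dmd_id})
    md_wrap = ET.SubElement(dmd_sec, f"{{{METS_NS}}}mdWrap", {"MDTYPE": mdtype})
    xml_data = ET.SubElement(md_wrap, f"{{{METS_NS}}}xmlData")
    xml_data.append(description)  # the existing description is reused unchanged
    return mets


# A tiny MODS description to wrap (sample data).
mods = ET.Element(f"{{{MODS_NS}}}mods")
title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = "The future of MARC"

mets = wrap_in_mets(mods, "MODS")
wrapped = mets.find(
    f"{{{METS_NS}}}dmdSec/{{{METS_NS}}}mdWrap/{{{METS_NS}}}xmlData/{{{MODS_NS}}}mods"
)
```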
Experimentation with “Linked data”
Library of Congress Authorities & Vocabularies service: http://id.loc.gov
Allows both human-oriented and programmatic access to LC authorities and vocabularies
Actionable URIs associated with concepts
First offering was Library of Congress Subject Headings, then Names, MARC code lists, Thesaurus of Graphic Materials, ISO 639-2, PREMIS vocabularies
Advantages
Facilitate development and maintenance process for vocabularies
Expose vocabularies to wider communities
Experiment with Linked Data
Offer bulk downloads
Example: http://id.loc.gov/authorities/sh85049843
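"Programmatic access" to an actionable URI might look like the sketch below, which prepares (but does not send) an HTTP request asking for RDF instead of a web page. It assumes the service honors the `Accept` header for this media type. The function name `rdf_request` is hypothetical.

```python
from urllib.request import Request


def rdf_request(concept_uri, media_type="application/rdf+xml"):
    """Build a request for a machine-readable view of the same concept URI."""
    return Request(concept_uri, headers={"Accept": media_type})


req = rdf_request("http://id.loc.gov/authorities/sh85049843")
# Passing req to urllib.request.urlopen would perform the actual lookup.
```

The same URI serves both audiences: a browser gets the human-oriented page, a program asks for data.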
Experimentation with Linked Data
MADS in RDF
MODS in RDF
Linking vocabularies in id.loc.gov with other external vocabularies
PREMIS OWL ontology
Integration between ontologies and controlled vocabularies becomes possible
MARC Changes for RDA
MARC community made many changes to accommodate RDA
In some cases RDA was more granular than MARC, and each data element had to be examined to decide whether that detail was needed
Limitations in number of fields/subfields prevented complete crosswalking
Need for additional experimentation to determine what needs to be accommodated
http://www.loc.gov/marc/RDAinMARC29-9-12-11.html
Challenges in adapting MARC for RDA
RDA was changing as MARC was revised
Not all MARC users will be using RDA
Continuity with current data is important
Not all RDA users will use the increased granularity: tension between simpler and more complex
Impact of FRBR
Financial constraints of too much change and scarce resources
Specific RDA changes
RDA Content, Media, Carrier
Fields 336, 337, 338
Controlled vocabularies: codes or text
Carrier characteristics
Additional values in 008
New subfields in 340
New fields for sound, video, digital
Authority changes
Attributes of Names and Resources
Changes to Authority format for additional metadata about persons, families, organizations
New fields for date, place, address, field of activity, occupation, gender, family information
Changes to Authority format for uniform titles (works or expressions)
New fields for date, content type, language, form of work, medium of performance, key
All elements for works/expressions also added to bibliographic
Other RDA changes
Relationships between resources
Name to resource (RDA App I)
Resource to resource (RDA App J)
Name to name (RDA App K)
Uses MARC relators or subfield $i
Production, publication, distribution
Field 264 with designation of function in indicator
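A small sketch of what "designation of function in indicator" means in practice: the second indicator of field 264 says which kind of statement the field carries, so one tag can serve several functions. The indicator values follow the MARC 21 bibliographic format. The function name and the sample fields are hypothetical.

```python
# Second indicator of field 264 and the function it designates.
FUNCTION_BY_IND2 = {
    "0": "production",
    "1": "publication",
    "2": "distribution",
    "3": "manufacture",
    "4": "copyright notice date",
}


def group_264_by_function(fields):
    """fields: list of (ind2, statement) pairs taken from 264 fields."""
    grouped = {}
    for ind2, statement in fields:
        function = FUNCTION_BY_IND2.get(ind2, "unknown")
        grouped.setdefault(function, []).append(statement)
    return grouped


# Sample data, invented for illustration.
grouped = group_264_by_function([
    ("1", "New York : Example Press, 2011."),
    ("4", "©2011"),
])
```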
URIs in MARC records
Links to resources
Field 856 for link to resource or related resource
URIs available in numerous fields as a link to additional information, e.g. 505, 506, 583
Links to values
Controlled vocabulary values may be identified by a URI
id.loc.gov
RDA vocabularies being established with URIs in Open Metadata Registry http://rdvocab.info/
URIs in MARC records
Do URIs need their own data element or are they self-identifying?
Data elements where needed
Code lists (relators, countries, GACs, orgs)
RDA controlled vocabs (e.g. 336, 337, 338)
Fields with controlled lists (with $2)
Headings
Approach (experimental)
Use same subfield where data is now
Both URI and textual data?
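One way to read "self-identifying" is that software can tell a URI from a textual value by inspection, so both can share a subfield. The sketch below tries that on a 336 field. The URI under `example.org` is a made-up placeholder, not a real vocabulary URI, and both function names are hypothetical.

```python
from urllib.parse import urlparse


def is_uri(value):
    """Treat a value as a URI if it parses with an http(s) scheme and a host."""
    parts = urlparse(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def split_values(subfields):
    """Separate textual values from URIs found in the same subfields."""
    text = [(code, value) for code, value in subfields if not is_uri(value)]
    uris = [(code, value) for code, value in subfields if is_uri(value)]
    return text, uris


# Field 336 (content type) carrying both a term and a URI in $a.
field_336 = [
    ("a", "text"),
    ("a", "http://example.org/vocab/content/text"),  # hypothetical URI
    ("2", "rdacontent"),
]
text, uris = split_values(field_336)
```

The check is a heuristic: a transcribed value that happens to look like a URL would be misread, which is one argument for giving URIs their own data element.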
Results of the RDA test
Feeling that MARC structure doesn’t allow for taking full advantage of RDA
Not all RDA data elements have a distinct place in MARC
RDA is element based; MARC groups elements that can’t live independently
Concerns whether MARC can interoperate with other metadata in a semantic web world
Limitations for showing relationships between entities and applying FRBR model
Evolving the bibliographic framework: issues to consider
Actionable vs. descriptive data
Parsed vs. text
Controlled/access vs. transcribed
Codes vs. words
Library vs. non-library traditions
My model vs. your model
Stability vs. change
Basic retrieval vs. scholar retrieval
Cost of change
Bibliographic Framework Transition Initiative
Rethinking bibliographic control because of technological and environmental changes
Content and packaging of RDA suggest that a different carrier is needed to fully exploit it
Reevaluate use of scarce resources and provide efficiencies in creating and sharing bibliographic metadata
Analyze present and future environment
Identify components of the bibliographic framework to support users
Plan for an evolution to a future framework
Issues to be addressed
Determine aspects of MARC that should be retained
Experiment with Semantic Web and Linked Data technologies
Foster reuse of existing rich metadata
Allow for navigating relationships among entities
Explore risks of action and inaction and pace of change
Plan for migrating existing metadata into a new infrastructure
Components of a new bibliographic framework
Based on Working Group on Bibliographic Control and RDA test
Continue to support MARC during the transition and as long as is needed
Broaden participation in a network of resources and be able to link patrons to all kinds of resources
Follow an open and transparent process
Requirements
Broad accommodation of content rules and data models
Provide for types of data that accompany or support bibliographic data, e.g. holdings, preservation
Accommodate textual and linked data with URIs
Reconsider the relationship between internal storage, displays and input screens
Requirements
Consider all sizes and types of libraries
Continue maintenance of MARC until no longer necessary; minimize changes to only those needed for RDA
Compatibility with existing records
Provide transformations from MARC 21 to the new environment to enable experimentation
General approach
Focus on the Web environment, Linked Data and RDF
Integrate library data and other cultural heritage data on the Web
Use of triplestores to provide more options for storing and retrieving data
Allow the library environment to become more readily understandable by data creators and software developers
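The storage-and-retrieval idea behind a triplestore can be sketched in a few lines: every statement is a (subject, predicate, object) triple, and retrieval is pattern matching with wildcards. This is a toy in-memory stand-in, not a real triplestore. The `example.org` record URI and the title value are invented sample data; the predicates are Dublin Core terms.

```python
DCTERMS = "http://purl.org/dc/terms/"
BIB = "http://example.org/bib/1"  # hypothetical record URI
HEADING = "http://id.loc.gov/authorities/sh85049843"

triples = [
    (BIB, DCTERMS + "title", "The future of MARC"),
    (BIB, DCTERMS + "subject", HEADING),
]


def match(store, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


subjects_of_record = match(triples, s=BIB, p=DCTERMS + "subject")
records_about_heading = match(triples, p=DCTERMS + "subject", o=HEADING)
```

The same data answers questions from either direction (what is this record about, which records are about this heading) without a record-shaped container.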
Explorations
Develop interaction scenarios in the broader information community
Develop use cases to scope its boundaries and interdependence with other initiatives, e.g. PREMIS, METS
Develop ontologies for the description of resources
Experiment collaboratively with new models
Use existing partners for prototyping
Collaborations
Close contact with MARC format partner institutions (national libraries)
Review and comment from MARC advisory bodies (e.g. MARBI)
Prototyping by networks and vendors
Input on modeling with general resource description community
Timetable and next steps
Provide funding through a 2-year grant
Organize consultative groups and prototyping activities
Develop models and scenarios
Assemble and review ontologies
Few real details on time frame
Community input
Individuals and institutions can recommend members to serve on the advisory or technical committee
Join and post thoughts to the bibliographic transition listserv ([email protected])
Comments will be publicly available
Likely characteristics of post-MARC
Web and linked data based
High level simple core ontologies
Modularized format that allows for extensions
Application builders can pick from ontologies and extensions
There should be a way to keep all elements of MARC
MARC to post-MARC could be lossy
Agnostic to cataloging rules
Ability to output in various syntaxes
Conclusions
MARC 21 has served the community well for wide sharing of bibliographic metadata
Much effort will go into the new initiative
There are widely differing views
More questions than answers remain
How much of MARC will be retained?
Will the new format look like MODS, a derivative, or will it be completely new?
How will supporting data be accommodated?
How will systems change?
How long will it take?
Thank you!
Rebecca [email protected]
www.meetyourdata.com