the future of marc: dead or reviving? rebecca guenther nytsl fall program nov. 4, 2011

The Future of MARC: dead or reviving?

Rebecca Guenther

NYTSL Fall Program

Nov. 4, 2011

Overview of presentation

History of MARC

The current bibliographic framework

Efforts to evolve MARC: XML formats, linked data explorations, RDA changes

LC Bibliographic Framework Transition Initiative

What is MARC 21?

A syntax defined by an international standard for communication, with two expressions: Classic MARC (ISO 2709) and MARCXML

A data element set defined by content designation and semantics

Many data elements are defined by external content rules; a common misperception is that MARC is tied to AACR2

It does not specify internal storage, and institutions do not store their data as “MARC 21”

A set of 5 formats for different purposes: Bibliographic, Authority, Holdings, Classification, Community Information
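To make the syntax/data-element distinction concrete, here is a minimal sketch of how an ISO 2709 record is laid out: a 24-character leader, a directory of 12-character entries (3-character tag, 4-digit field length, 5-digit starting position), then the field data. It assumes ASCII-only content so that character counts equal byte counts (real records count bytes), and the leader values other than record length and base address are illustrative filler. The function names are hypothetical.

```python
FT = "\x1e"  # field terminator
RT = "\x1d"  # record terminator
SF = "\x1f"  # subfield delimiter

def build_record(fields):
    """Assemble (tag, data) pairs into an ISO 2709 record string (ASCII-only sketch)."""
    directory, body = "", ""
    for tag, data in fields:
        chunk = data + FT
        # Directory entry: tag (3) + field length (4) + start offset within the data area (5)
        directory += f"{tag}{len(chunk):04d}{len(body):05d}"
        body += chunk
    directory += FT
    base = 24 + len(directory)        # base address of data = leader + directory
    total = base + len(body) + 1      # +1 for the record terminator
    # Leader: 00-04 record length, 12-16 base address; other positions are illustrative
    leader = f"{total:05d}nam a22{base:05d} a 4500"
    return leader + directory + body + RT

def parse_record(raw):
    """Return the leader and (tag, data) pairs by walking the directory."""
    leader = raw[:24]
    base = int(leader[12:17])
    directory = raw[24:base - 1]      # drop the directory's own field terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        fields.append((tag, raw[base + start: base + start + length - 1]))  # strip FT
    return leader, fields

def subfields(data):
    """Split a data field into its indicators and (code, value) subfields."""
    indicators, *parts = data.split(SF)
    return indicators, [(p[0], p[1:]) for p in parts]
```

The fixed widths here (three-character tags, one-character subfield codes and indicators) are the source of the field and subfield limits listed under "Problems with MARC".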

The current bibliographic environment

Billions of rich descriptive records in MARC systems

Many national formats have been harmonized with MARC 21

Integrated library systems support MARC bibliographic, authority and holdings formats for different functions

Wide sharing of records for 30+ years: OCLC is a major source of records; MARC records are being reused (sometimes converted) and repackaged

Need to interact with descriptions in other formats/syntaxes

MARC successes

Can carry data formulated by different cataloging rules and conventions: multiple descriptive rules, different principles and models

Different subject thesauri

Multiple languages and scripts

Cooperation in record exchange has resulted in widespread use and cost savings

Richness of MARC records supports multifaceted retrieval: coded data, parsed data

Problems with MARC

ISO 2709 syntax problems: limits on the available fields, subfields, indicator values, etc.

Redundant data (fixed vs. variable fields)

The longevity of the format complicates reuse of data tags; redundancies have built up over time

Ability to link is limited

Lack of explicit hierarchical levels
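One familiar case of fixed vs. variable field redundancy: the language of the item is coded in 008 positions 35-37 and may be recorded again in 041 $a. The sketch below (hypothetical function name, record reduced to a raw 008 string and a list of 041 $a codes) checks whether the two copies still agree, which is exactly the kind of drift that duplicated data invites.

```python
def language_redundancy(fixed_008, field_041_a):
    """Compare the language coded in 008/35-37 with the codes given in 041 $a."""
    coded = fixed_008[35:38]
    codes_041 = list(field_041_a)
    return {
        "008": coded,
        "041": codes_041,
        # No 041 means nothing to conflict with; otherwise the 008 code should appear in 041
        "consistent": not codes_041 or coded in codes_041,
    }
```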

Efforts to streamline MARC 21

Take advantage of XML: increasingly use MARC 21 in an XML structure

Take advantage of freely available XML tools

Develop simpler (but compatible) alternatives: MODS and MADS

Allow for interoperability with different XML metadata schemas

Assemble a coordinated set of tools

MARCXML

MARCXML uses the MARC data element set in an XML syntax

Lossless roundtrip conversions

Simple, flexible XML schema; no need to change it when MARC 21 changes

Continuity with current data and flexible transition options

Tagging limitations inherited from MARC persist

Example: http://lccn.loc.gov/2004012412/marcxml
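Because MARCXML maps each MARC construct one to one onto an element (leader, controlfield, datafield with ind1/ind2 attributes, subfield with a code attribute), generic XML tools can read it directly. A minimal sketch using Python's standard ElementTree and the MARCXML namespace; the sample record is invented for illustration and is not the LC record at the URL above.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented sample record, for illustration only
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">12345</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Sample title :</subfield>
    <subfield code="b">a subtitle</subfield>
  </datafield>
</record>"""

def read_marcxml(xml_text):
    """Return the leader plus control and data fields as plain Python tuples."""
    root = ET.fromstring(xml_text)
    record = {"leader": root.findtext("marc:leader", namespaces=NS), "fields": []}
    for cf in root.findall("marc:controlfield", NS):
        record["fields"].append((cf.get("tag"), cf.text))
    for df in root.findall("marc:datafield", NS):
        subs = [(sf.get("code"), sf.text) for sf in df.findall("marc:subfield", NS)]
        # Indicators are kept together as a two-character string, as in ISO 2709
        record["fields"].append((df.get("tag"), df.get("ind1") + df.get("ind2"), subs))
    return record
```

Since nothing is renamed or regrouped, the same tags and codes come back out; that is what makes the roundtrip lossless, and also why MARC's tagging limits carry over unchanged.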

MARC derivatives: MODS and MADS

Attempts to deal with MARC limitations

Eliminates some of the problems with MARC (e.g. the shortage of available tags and subfield codes)

More user-friendly (uses language-based tags rather than numeric ones)

Repackages redundant data elements into one

Can carry hierarchical data

Less tied to cataloging rules

Highly compatible with MARC but simpler, while retaining some of its richness

Widely implemented, especially for digital projects

Governed by an Editorial Committee

Example: http://lccn.loc.gov/2004012412/mods
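A minimal sketch of the repackaging MODS does: numeric tags become language-based elements, and trailing ISBD punctuation is dropped. It maps only 245 $a/$b to titleInfo/title and subTitle and 100 $a to name/namePart; the simplified input shape and the function name are hypothetical, and LC's published MARC-to-MODS mapping covers far more.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

def q(name):
    """Qualify an element name in the MODS namespace."""
    return f"{{{MODS_NS}}}{name}"

def marc_to_mods(fields):
    """fields: hypothetical simplified input, tag -> {subfield code: value}."""
    mods = ET.Element(q("mods"))
    title = fields.get("245")
    if title:
        title_info = ET.SubElement(mods, q("titleInfo"))
        # Strip trailing ISBD punctuation, which the MODS element structure makes unnecessary
        ET.SubElement(title_info, q("title")).text = title["a"].rstrip(" :/")
        if "b" in title:
            ET.SubElement(title_info, q("subTitle")).text = title["b"].rstrip(" /")
    author = fields.get("100")
    if author:
        name = ET.SubElement(mods, q("name"), type="personal")
        ET.SubElement(name, q("namePart")).text = author["a"].rstrip(",")
    return mods
```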

Related XML schemas: METS

METS: a container/information package

Wrapper for MARCXML and MODS descriptions

Allows for additional technical and preservation metadata

Enables tracking of actions on the metadata itself

Many use METS as a framework for digital libraries and their metadata

Particularly useful for complex digital objects

Allows for reuse of rich descriptions
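The wrapper role can be sketched as a minimal METS document: a dmdSec embeds the descriptive record via mdWrap/xmlData (MDTYPE "MODS", or "MARC" for MARCXML), a fileSec lists the content file, and the required structMap ties the two together. This sketch omits the amdSec where technical and preservation metadata would go; the IDs, the file URL and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)

def m(name):
    """Qualify an element name in the METS namespace."""
    return f"{{{METS_NS}}}{name}"

def wrap_in_mets(descriptive_record, file_url, mdtype="MODS"):
    """Wrap one descriptive record and one content file in a minimal METS document."""
    mets = ET.Element(m("mets"))
    # Descriptive metadata, embedded inline
    dmd = ET.SubElement(mets, m("dmdSec"), ID="dmd1")
    wrap = ET.SubElement(dmd, m("mdWrap"), MDTYPE=mdtype)
    ET.SubElement(wrap, m("xmlData")).append(descriptive_record)
    # File inventory, pointing at the content file by URL
    grp = ET.SubElement(ET.SubElement(mets, m("fileSec")), m("fileGrp"))
    file_el = ET.SubElement(grp, m("file"), ID="file1")
    ET.SubElement(file_el, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": file_url})
    # Structural map: links the description (DMDID) to the file (FILEID)
    div = ET.SubElement(ET.SubElement(mets, m("structMap")), m("div"), DMDID="dmd1")
    ET.SubElement(div, m("fptr"), FILEID="file1")
    return mets
```

For a complex digital object, the structMap would hold nested div elements (volumes, pages) that each point at their own files, while the same rich description is reused unchanged.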

Experimentation with “Linked data”

Library of Congress Authorities & Vocabularies service: http://id.loc.gov

Allows both human-oriented and programmatic access to LC authorities and vocabularies

Actionable URIs associated with concepts

First offering was Library of Congress Subject Headings, then Names, MARC code lists, Thesaurus of Graphic Materials, ISO 639-2, PREMIS vocabularies

Advantages: facilitate the development and maintenance process for vocabularies; expose vocabularies to wider communities; experiment with Linked Data; offer bulk downloads

Example: http://id.loc.gov/authorities/sh85049843

Experimentation with Linked Data

MADS in RDF

MODS in RDF

Linking vocabularies in id.loc.gov with other external vocabularies

PREMIS OWL ontology

Integration between ontologies and controlled vocabularies becomes possible
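Linking a vocabulary to an external one comes down to statements (triples) whose objects are other vocabularies' URIs. A minimal sketch using plain Python tuples: madsrdf:Topic, madsrdf:authoritativeLabel, skos:exactMatch and skos:prefLabel are real vocabulary terms, but both concept URIs and the label are hypothetical placeholders.

```python
MADS = "http://www.loc.gov/mads/rdf/v1#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical concept URIs, for illustration only
LOCAL = "http://example.org/vocab/topic1"
EXTERNAL = "http://example.net/thesaurus/topic1"

triples = {
    (LOCAL, RDF_TYPE, MADS + "Topic"),
    (LOCAL, MADS + "authoritativeLabel", '"Example topic"'),
    (LOCAL, SKOS + "exactMatch", EXTERNAL),
    (EXTERNAL, SKOS + "prefLabel", '"Example topic"@en'),
}

def to_ntriples(triples):
    """Serialize as N-Triples; values starting with a quote are treated as literals."""
    lines = []
    for s, p, o in sorted(triples):
        obj = o if o.startswith('"') else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

def labels_via_matches(triples, subject):
    """Follow skos:exactMatch links and collect the matched concepts' prefLabels."""
    targets = {o for s, p, o in triples if s == subject and p == SKOS + "exactMatch"}
    return {o for s, p, o in triples if s in targets and p == SKOS + "prefLabel"}
```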

MARC Changes for RDA

MARC community made many changes to accommodate RDA

In some cases RDA was more granular than MARC, and data elements had to be examined to determine whether such detail was needed

Limitations in the number of available fields/subfields prevented complete crosswalking

Need for additional experimentation to determine what needs to be accommodated

http://www.loc.gov/marc/RDAinMARC29-9-12-11.html

Challenges in adapting MARC for RDA

RDA was changing as MARC was revised

Not all MARC users will be using RDA

Continuity with current data is important

Not all RDA users will use the increased granularity: tension between simpler vs. more complex

Impact of FRBR

Financial constraints: too much change, scarce resources

Specific RDA changes

RDA content, media, and carrier types: fields 336, 337, 338, using controlled vocabularies (codes or text)

Carrier characteristics: additional values in 008, new subfields in 340, new fields for sound, video, and digital file characteristics
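"Codes or text" means each of 336/337/338 can carry the term ($a), the code ($b), and the source vocabulary ($2). A minimal sketch with the values commonly recorded for a printed book; the lookup table is a small hand-picked subset of the RDA vocabularies, not the full lists, and the function names are hypothetical.

```python
# Small subset of the RDA content/media/carrier vocabularies: term -> code
CODES = {
    "rdacontent": {"text": "txt", "still image": "sti"},
    "rdamedia": {"unmediated": "n", "computer": "c"},
    "rdacarrier": {"volume": "nc", "online resource": "cr"},
}

# Values commonly recorded for a printed book
BOOK = [
    ("336", {"a": "text", "b": "txt", "2": "rdacontent"}),
    ("337", {"a": "unmediated", "b": "n", "2": "rdamedia"}),
    ("338", {"a": "volume", "b": "nc", "2": "rdacarrier"}),
]

def render(tag, subs):
    """Render a field in the usual display form; both indicators are blank (##)."""
    return f"{tag} ## " + " ".join(f"${code} {value}" for code, value in subs.items())

def check(tag, subs):
    """True if the term in $a and the code in $b agree for the vocabulary named in $2."""
    table = CODES.get(subs.get("2"), {})
    return table.get(subs.get("a")) == subs.get("b")
```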

Authority changes

Attributes of names and resources: changes to the Authority format for additional metadata about persons, families, organizations

New fields for date, place, address, field of activity, occupation, gender, family information

Changes to the Authority format for uniform titles (works or expressions): new fields for date, content type, language, form of work, medium of performance, key

All elements for works/expressions also added to bibliographic

Other RDA changes

Relationships: name to resource (RDA App. I), resource to resource (RDA App. J), name to name (RDA App. K); recorded with MARC relators or subfield $i

Production, publication, distribution: field 264, with the function designated in the second indicator
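Unlike 260, where one field covers publication and distribution together, field 264 states its function in the second indicator. A minimal sketch of reading that indicator; the indicator values follow MARC 21 Bibliographic, while the function name and the publisher in the test are hypothetical.

```python
# MARC 21 field 264, second indicator: function of the entity
FUNCTION_BY_IND2 = {
    "0": "Production",
    "1": "Publication",
    "2": "Distribution",
    "3": "Manufacture",
    "4": "Copyright notice date",
}

def describe_264(ind2, subfields):
    """Label a 264 field by function and join place ($a), name ($b) and date ($c)."""
    function = FUNCTION_BY_IND2.get(ind2, "Unknown")
    parts = [subfields.get(code, "") for code in "abc"]
    return f"{function}: " + " ".join(p for p in parts if p)
```

Because the function is explicit, a record can repeat 264 (for example, once for publication and once for a copyright date) without relying on position or punctuation to tell the statements apart.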

URIs in MARC records

Links to resources

Field 856 for link to resource or related resource

URIs available in numerous fields as a link to additional information, e.g. 505, 506, 583

Links to values

Controlled vocabulary values may be identified by a URI: id.loc.gov; RDA vocabularies being established with URIs in the Open Metadata Registry (http://rdvocab.info/)

URIs in MARC records

Do URIs need their own data element, or are they self-identifying?

Data elements where needed: code lists (relators, countries, GACs, organizations), RDA controlled vocabularies (e.g. 336, 337, 338), fields with controlled lists (with $2), headings

Approach (experimental): use the same subfield where the data is now

Both URI and textual data?
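One way to read "self-identifying": if a subfield value parses as an absolute http(s) URI, software can recognize it as a link without a dedicated subfield, so a URI could sit in the same subfield as the textual value it replaces or accompanies. A minimal sketch of that test using the standard urllib.parse; the function names and the subfield layouts in the test are hypothetical and do not represent a MARC decision.

```python
from urllib.parse import urlparse

def is_uri(value):
    """Treat a value as self-identifying if it is an absolute http(s) URI."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

def split_values(subfields):
    """Separate textual values from URI values in a list of (code, value) pairs."""
    text = [(code, v) for code, v in subfields if not is_uri(v)]
    uris = [(code, v) for code, v in subfields if is_uri(v)]
    return text, uris
```

The trade-off is visible here: relying on self-identification keeps the field layout unchanged, but software has to inspect every value, whereas a dedicated subfield would make the URI's role explicit.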

Results of the RDA test

Feeling that MARC structure doesn’t allow for taking full advantage of RDA

Not all RDA data elements have a distinct place in MARC

RDA is element-based; MARC groups elements into fields where they cannot live independently

Concerns whether MARC can interoperate with other metadata in a semantic web world

Limitations in showing relationships between entities and applying the FRBR model
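The element-based vs. grouped contrast can be seen in field 245, which packs title proper ($a), other title information ($b) and statement of responsibility ($c) into one field, joined by ISBD punctuation. A minimal sketch that breaks such a field into standalone element statements; the element labels loosely follow RDA element names, and the (resource, element, value) statement shape and the identifier in the test are hypothetical.

```python
import re

RDA_ELEMENT_BY_CODE = {
    "a": "title proper",
    "b": "other title information",
    "c": "statement of responsibility",
}

def to_elements(resource_id, subfields):
    """Turn 245 subfields into independent (resource, element, value) statements."""
    statements = []
    for code, value in subfields:
        element = RDA_ELEMENT_BY_CODE.get(code)
        if element:
            # Drop one trailing ISBD mark (: / ; = .) that only made sense inside the field
            clean = re.sub(r"\s*[:/;=.]\s*$", "", value)
            statements.append((resource_id, element, clean))
    return statements
```

Stripping a final period this naively would also damage values that end in an abbreviation (e.g. "Jr."), one small example of why converting grouped MARC data into independent elements is not mechanical.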

Evolving the bibliographic framework: issues to consider

Actionable vs. descriptive data

Parsed vs. text

Controlled/access vs. transcribed

Codes vs. words

Library vs. non-library traditions

My model vs. your model

Stability vs. change

Basic retrieval vs. scholar retrieval

Cost of change

Bibliographic Framework Transition Initiative

Rethinking bibliographic control because of technological and environmental changes

Content and packaging of RDA suggest that a different carrier is needed to fully exploit it

Reevaluate use of scarce resources and provide efficiencies in creating and sharing bibliographic metadata

Analyze the present and future environment

Identify components of the bibliographic framework to support users

Plan for an evolution to a future framework

Issues to be addressed

Determine aspects of MARC that should be retained

Experiment with Semantic Web and Linked Data technologies

Foster reuse of existing rich metadata

Allow for navigating relationships among entities

Explore risks of action and inaction and pace of change

Plan for migrating existing metadata into a new infrastructure

Components of a new bibliographic framework

Based on the Working Group on the Future of Bibliographic Control and the RDA test

Continue to support MARC during the transition and as long as is needed

Broaden participation in a network of resources and be able to link patrons to all kinds of resources

Follow an open and transparent process

Requirements

Broad accommodation of content rules and data models

Provide for types of data that accompany or support bibliographic data, e.g. holdings, preservation

Accommodate textual and linked data with URIs

Reconsider the relationship between internal storage, displays and input screens

Requirements

Consider all sizes and types of libraries

Continue maintenance of MARC until no longer necessary; minimize changes to only those needed for RDA

Compatibility with existing records

Provide transformations from MARC 21 to the new environment to enable experimentation

General approach

Focus on the Web environment, Linked Data and RDF

Integrate library data and other cultural heritage data on the Web

Use of triplestores to provide more options for storing and retrieving data

Allow the library environment to become more readily understandable by data creators and software developers
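A triplestore holds every statement as subject, predicate, object and answers pattern queries, so library and non-library data that share a URI line up without a shared record format. A toy in-memory sketch (real triplestores add persistent indexes and SPARQL); dcterms:subject and dcterms:title are real Dublin Core terms, but the class name and all resource and topic URIs are hypothetical.

```python
DCT = "http://purl.org/dc/terms/"

class TripleStore:
    """Toy in-memory triple store; None in a query position acts as a wildcard."""

    def __init__(self):
        self.triples = set()
        self.by_predicate = {}  # simple index: predicate -> set of triples

    def add(self, s, p, o):
        triple = (s, p, o)
        self.triples.add(triple)
        self.by_predicate.setdefault(p, set()).add(triple)

    def match(self, s=None, p=None, o=None):
        candidates = self.by_predicate.get(p, set()) if p is not None else self.triples
        return sorted(t for t in candidates
                      if (s is None or t[0] == s) and (o is None or t[2] == o))

# Hypothetical library and museum descriptions sharing one topic URI
TOPIC = "http://example.org/topics/1"
store = TripleStore()
store.add("http://example.org/library/book1", DCT + "subject", TOPIC)
store.add("http://example.org/library/book1", DCT + "title", '"Sample title"')
store.add("http://example.org/museum/object7", DCT + "subject", TOPIC)
```

Querying with `store.match(p=DCT + "subject", o=TOPIC)` returns both the library book and the museum object, since they point at the same topic URI.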

Explorations

Develop interaction scenarios in the broader information community

Develop use cases to scope its boundaries and interdependence with other initiatives, e.g. PREMIS, METS

Develop ontologies for the description of resources

Experiment collaboratively with new models

Use existing partners for prototyping

Collaborations

Close contact with MARC format partner institutions (national libraries)

Review and comment from MARC advisory bodies (e.g. MARBI)

Prototyping by networks and vendors

Input on modeling with general resource description community

Timetable and next steps

Provide funding through a 2-year grant

Organize consultative groups and prototyping activities

Develop models and scenarios

Assemble and review ontologies

Few real details on time frame

Community input

Individuals and institutions can recommend members to serve on the advisory or technical committee

Join and post thoughts to the bibliographic transition listserv

Comments will be publicly available

Likely characteristics of post-MARC

Web- and linked-data-based

High-level, simple core ontologies

Modularized format that allows for extensions

Application builders can pick from ontologies and extensions

There should be a way to keep all elements of MARC

MARC to post-MARC could be lossy

Agnostic to cataloging rules

Ability to output in various syntaxes

Conclusions

MARC 21 has served the community well for wide sharing of bibliographic metadata

Much effort will go into the new initiative

There are widely differing views

More questions than answers remain

How much of MARC will be retained? Will the new format look like MODS, a derivative, or will it be completely new?

How will supporting data be accommodated? How will systems change? How long will it take?

Thank you!

Rebecca Guenther

http://www.meetyourdata.com
