
1

Metadata 101

Amy Benson

NELINET, Inc.

November 7, 2005

2

Standards

Increase interoperability

Lower use and participation barriers

Build larger communities of users, which can drive creation of a wider range of relevant services and tools (Windows vs. Mac)

Improve chances of long term survival of materials

Prefer open over proprietary

3

Categories

Metadata containers – XML, RDF

Metadata standards – MARC, MODS, DC, EAD, TEI, ONIX, FGDC, GILS

Metadata content standards

Transmission standards and protocols – METS, OAI, SOAP, Z39.50, SRW

Identifiers – URI, URL, PURL, URN, DOI, ISTC

4

Metadata - What is it?

Data about data

Information about any aspect of a resource: size, location, attributes, topic, origin, use, audience, creator, quality, access rights, reviews… the list is endless

An aid to the discovery, identification, assessment, and management of described entities

5

Types of Metadata

Descriptive – What is it?

Discovery – How can I find it?

Structural – What files comprise it?

Administrative – When was it created?

6

Types of Metadata

Identifiers – How can I get to it?

Terms & conditions – Can I use it?

Preservation – Which key characteristics of the resource need to be maintained?

7

MARC

Advantages
– Rich set of descriptive elements
– Highly interoperable within the library community
– Long, established history

Disadvantages
– Low extensibility
– As is, not interoperable beyond the library world
– Weak on administrative, rights, and other kinds of metadata important for digital resources

8

MARC

Future of MARC – Must MARC die? No. New life through XML

MARC XML from the Library of Congress (LC)

MODS: an XML schema derived from MARC, developed by the Library of Congress

Crosswalks between MARC and many other metadata schemas already exist

9

MARC XML

LC has developed a MARC XML schema, stylesheets, and tools

The schema allows representation of a complete MARC record in XML – lossless conversion

Will support transformations of MARC data for new uses – MARC to MARCXML to Dublin Core and MODS
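To make the MARC-to-MARCXML step concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The namespace http://www.loc.gov/MARC21/slim is the one LC's MARCXML schema uses; the helper names (marc_to_marcxml, marcxml_to_fields), the tuple input format, and the sample record are hypothetical. The round trip at the end illustrates what "lossless" means: every tag, indicator, and subfield comes back unchanged.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"  # namespace used by LC's MARCXML schema
ET.register_namespace("marc", MARC_NS)


def marc_to_marcxml(fields):
    """Build a MARCXML <record> from (tag, ind1, ind2, subfields) tuples.

    Hypothetical helper: 'subfields' is a list of (code, value) pairs.
    The leader and control fields (001 to 008) are omitted for brevity.
    """
    record = ET.Element(f"{{{MARC_NS}}}record")
    for tag, ind1, ind2, subfields in fields:
        datafield = ET.SubElement(
            record, f"{{{MARC_NS}}}datafield",
            {"tag": tag, "ind1": ind1, "ind2": ind2},
        )
        for code, value in subfields:
            ET.SubElement(datafield, f"{{{MARC_NS}}}subfield", {"code": code}).text = value
    return record


def marcxml_to_fields(record):
    """Read the tuples back out of a MARCXML <record>."""
    result = []
    for datafield in record.findall(f"{{{MARC_NS}}}datafield"):
        subfields = [
            (sf.get("code"), sf.text)
            for sf in datafield.findall(f"{{{MARC_NS}}}subfield")
        ]
        result.append((datafield.get("tag"), datafield.get("ind1"),
                       datafield.get("ind2"), subfields))
    return result


# Illustrative data: 100 = personal-name main entry, 245 = title statement
fields = [
    ("100", "1", " ", [("a", "Twain, Mark,"), ("d", "1835-1910.")]),
    ("245", "1", "0", [("a", "Adventures of Huckleberry Finn.")]),
]
xml_text = ET.tostring(marc_to_marcxml(fields), encoding="unicode")
print(xml_text)

# Round trip: the parsed XML yields exactly the original fields
assert marcxml_to_fields(ET.fromstring(xml_text)) == fields
```

In practice a project would more likely rely on LC's published MARCXML tools and stylesheets than on hand-built XML; the sketch only shows the shape of the mapping.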

10

Metadata Object Description Schema (MODS)

Set of 20 bibliographic elements - a subset of the MARC 21 Format for Bibliographic Data

Not as complete as the full MARC format, but richer than Dublin Core (for example)

Highly interoperable with existing MARC records

Uses language-based tags rather than numeric tags like those of MARC 21 (245, 650, etc.)

Under development by the LC Network Development and MARC Standards Office

11

MODS

XML-based – Intended to work with/complement other metadata formats

Can be used for conversion of existing MARC records or to create new resource description records

Useful particularly for library applications that want to go beyond the OPAC

Shares features of MARC and Dublin Core

12

MODS Elements

TitleInfo, Name, TypeOfResource, Genre, PublicationInfo, Language, PhysicalDescription, Abstract, TableOfContents, TargetAudience, Note, Cartographics, Subject, Classification, RelatedItem, Identifier, Location, AccessCondition, Extension, RecordInfo

13

MODS Elements

Title element is mandatory; all others are optional

Elements can have sub-elements and attributes that provide refining detail for the element

Elements and sub-elements are repeatable, except in certain cases

Elements may appear in any order

MODS Example
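A minimal MODS record can be sketched with ElementTree. This assumes MODS version 3 conventions: the namespace http://www.loc.gov/mods/v3 and lower-camel-case element names such as titleInfo/title, typeOfResource, originInfo/dateIssued (where version 3 places publication details), and subject/topic. The function name build_mods and the sample values are hypothetical. The function enforces the rule above that a title is required and everything else is optional, and it repeats subject once per heading.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag):
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def build_mods(title, type_of_resource=None, date_issued=None, topics=()):
    """Build a minimal MODS record; only the title is required here."""
    if not title:
        raise ValueError("a title is required")
    mods = ET.Element(q("mods"))
    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = title
    if type_of_resource:
        ET.SubElement(mods, q("typeOfResource")).text = type_of_resource
    if date_issued:
        origin = ET.SubElement(mods, q("originInfo"))  # publication details
        ET.SubElement(origin, q("dateIssued")).text = date_issued
    for topic in topics:  # subject is repeatable: one <subject> per heading
        subject = ET.SubElement(mods, q("subject"))
        ET.SubElement(subject, q("topic")).text = topic
    return mods


record = build_mods("The Sound of Music", "moving image",
                    topics=["Musical films", "Austria"])
print(ET.tostring(record, encoding="unicode"))
```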

15

MODS Implementation

MODS User Guidelines – http://www.loc.gov/standards/mods/registry.html

MODS Implementation Registry – contains descriptions of MODS projects planned, in progress, and fully implemented – http://www.loc.gov/standards/mods/registry.html

16

Dublin Core (DC)

A method of describing resources intended to facilitate the discovery of electronic resources

Designed to allow simple description of resources by non-catalogers as well as specialists

National and international standard
– ANSI/NISO standard Z39.85-2001
– ISO standard 15836

Includes 15 “core” elements

17

Dublin Core Elements

Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
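Because simple DC is a fixed set of 15 elements, all optional and repeatable, a record can be modeled as a mapping from element name to a list of values. The sketch below (the function name and sample values are hypothetical) accepts missing and multi-valued elements and reports only names outside the 15. In the sample, "audience" is a DCMI term that belongs to the qualified vocabulary, not to the 15 core elements.

```python
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def validate_simple_dc(record):
    """Return the names in 'record' that are not simple DC elements.

    Every element is optional and repeatable, so missing elements and
    multiple values are both allowed; only unknown names are reported.
    """
    return sorted(name for name in record if name not in DC_ELEMENTS)


record = {
    "title": ["The Sound of Music"],
    "subject": ["Musical films", "Austria"],  # repeatable
    "audience": ["General"],                  # not one of the 15 core elements
}
print(validate_simple_dc(record))  # ['audience']
```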

18

Dublin Core

All elements optional and repeatable

Elements may appear in any order

Authority control not required

Simple and Qualified DC

Extensible

Flexible

International

19

Dublin Core

Simple
– Lowest common denominator
– Less rich
– Discovery role: leads to the resource or to a more complete description of it

Qualified
– More precise
– Less interoperable

20

Dublin Core Examples

Generic

Title="The Sound of Music"

HTML

<meta name="DC.Title" content="The Sound of Music">

XML

<?xml version="1.0"?>
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>The Sound of Music</dc:title>
</metadata>

21

Dublin Core Examples - HTML
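The HTML approach can be sketched as a small generator that turns a simple DC record into `<meta>` tags following the DC.Element naming convention of the DC.Title example above, preceded by the `<link rel="schema.DC">` declaration that DCMI's HTML encoding guidelines pair with those tags. The function name and record values are hypothetical; values are HTML-escaped so that quotes and ampersands cannot break the attribute.

```python
from html import escape


def dc_meta_tags(record):
    """Render a simple DC record (element -> list of values) as HTML head lines."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, values in record.items():
        for value in values:  # a repeated element becomes repeated <meta> tags
            name = "DC." + element.capitalize()
            lines.append(f'<meta name="{name}" content="{escape(value, quote=True)}">')
    return "\n".join(lines)


print(dc_meta_tags({"title": ["The Sound of Music"], "language": ["en"]}))
```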

22

Dublin Core Examples - XML
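The XML encoding can be produced the same way with ElementTree: each value becomes a dc:-prefixed element in the DC element set namespace (http://purl.org/dc/elements/1.1/) inside an unqualified `<metadata>` wrapper, as in the earlier generic example. The wrapper name follows that example; the function name and values are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dc_to_xml(record):
    """Serialize a simple DC record (element -> list of values) as XML text."""
    root = ET.Element("metadata")
    for element, values in record.items():
        for value in values:  # repeatable: one XML element per value
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


xml_out = dc_to_xml({"title": ["The Sound of Music"], "language": ["en"]})
print(xml_out)
```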

23

Other Metadata Standards

Encoded Archival Description (EAD)
Text Encoding Initiative (TEI)
Visual Resources Association (VRA)
Global Information Locator Service (GILS)
Online Information Exchange (ONIX)
Content Standard for Digital Geospatial Metadata (CSDGM), aka FGDC
Data Documentation Initiative (DDI)

24

Crosswalks

Crosswalks map an element from one scheme to its closest equivalent in another scheme – Example: the MARC 1XX field is mapped to DC ‘creator’

Instrumental for converting data in one format to another format - one that is potentially more widely accessible

Support the demand for cross-domain searching and interoperability

25

Crosswalks

There is rarely a one-to-one correlation between elements of different schemes
– One to many: DC to MARC
– Many to one, or to none: MARC to DC
– None to one or many
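A crosswalk can be sketched as a lookup table from source fields to target elements. The table below is a small, hypothetical subset of the MARC-to-DC direction (the 1XX to creator mapping mentioned earlier, plus 245 to title and 650 to subject); it is not the LC crosswalk, which should be consulted for real conversions. Because several MARC tags collapse into one DC element and tags absent from the table are dropped, the function illustrates the "many to one, or to none" case.

```python
from collections import defaultdict

# Hypothetical, partial MARC -> simple DC mapping (illustration only)
MARC_TO_DC = {
    "100": "creator", "110": "creator", "111": "creator",  # 1XX main entries
    "245": "title",
    "650": "subject",
}


def crosswalk(marc_fields):
    """Map (tag, value) pairs to a DC record; report tags with no mapping."""
    dc = defaultdict(list)
    dropped = []
    for tag, value in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element is None:
            dropped.append(tag)         # many to none: no target in this table
        else:
            dc[element].append(value)   # many to one: several tags, one element
    return dict(dc), dropped


dc_record, dropped = crosswalk([
    ("100", "Twain, Mark"),
    ("245", "Adventures of Huckleberry Finn"),
    ("650", "Mississippi River"),
    ("300", "366 p."),  # physical description: not in this table
])
print(dc_record)
print(dropped)  # ['300']
```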

MARC to DC– http://www.loc.gov/marc/marc2dc.html#unqualif

26

Content Standards

AACR (Anglo-American Cataloguing Rules)
– “The rules cover the description of, and the provision of access points for, all library materials commonly collected at the present time.”
– The current text is the 2nd ed., 2002 Revision (with 2003, 2004, and 2005 updates)
– The Joint Steering Committee for Revision of AACR (JSC) is working on a new code, “RDA: Resource Description and Access,” scheduled to be published in 2008

27

Content Standards

International Standard Bibliographic Description (ISBD)
– A family of standards to regularize the form and content of bibliographic descriptions
– Available for different material types: monographs, computer files, etc.
– Designed to promote record sharing and exchange

28

Content Standards

Describing Archives: A Content Standard (DACS)
– Designed to facilitate consistent, appropriate, and self-explanatory description of archival materials and creators of archival materials
– Replaces Archives, Personal Papers, and Manuscripts (APPM)

29

Metadata Encoding & Transmission Standard (METS)

A system for packaging metadata necessary for both the management of digital library objects within a repository and the exchange of such objects between repositories, or between repositories and their users

Used for: digital collection repositories

Developed by the Digital Library Federation (DLF) and Library of Congress (LC)

30

Metadata Encoding & Transmission Standard (METS)

METS can be understood as a binder that unites metadata about a particular resource

A METS record includes six parts:
– Header
– Descriptive metadata
– Administrative metadata
– File groups
– Structural map
– Behavior section

[Figure: components of one digital object (21 files and counting…): 100-pixel GIF; 800-, 1400-, and 2000-pixel JPGs; TIFF, PDF, TEI, MrSID, and AIFF files; arranged as the whole document and pages 1 to 4]

32

METS Schema

[Figure: sections of a METS document]
metsHdr (METS Header)
dmdSec (Descriptive Metadata)
amdSec (Administrative Metadata)
fileSec (Files)
structMap (Structure)
behaviorSec (Viewers)
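The section layout can be sketched as a skeleton METS document for a paged object like the one in the previous figure. The element and attribute names (metsHdr, dmdSec, amdSec, fileSec/fileGrp/file/FLocat, structMap/div/fptr, behaviorSec) and the namespace http://www.loc.gov/METS/ come from the METS schema; the file IDs, file names, and two-page structure are hypothetical. The descriptive, administrative, and behavior sections are left empty, so the output is a structural outline rather than a schema-valid record. METS also defines a structLink section, which the slide does not show.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def mets_skeleton(files):
    """Build a METS outline for one object whose pages are the given files.

    'files' is a list of (file_id, href) pairs (hypothetical inputs).
    """
    root = ET.Element(m("mets"))
    ET.SubElement(root, m("metsHdr"))                 # header: creator, dates, etc.
    ET.SubElement(root, m("dmdSec"), {"ID": "DMD1"})  # descriptive metadata (e.g. MODS, DC)
    ET.SubElement(root, m("amdSec"), {"ID": "AMD1"})  # administrative metadata
    file_grp = ET.SubElement(ET.SubElement(root, m("fileSec")), m("fileGrp"))
    for file_id, href in files:
        file_el = ET.SubElement(file_grp, m("file"), {"ID": file_id})
        ET.SubElement(file_el, m("FLocat"),
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
    struct_map = ET.SubElement(root, m("structMap"))  # the only required section
    document = ET.SubElement(struct_map, m("div"),
                             {"TYPE": "document", "DMDID": "DMD1"})
    for order, (file_id, _href) in enumerate(files, start=1):
        page = ET.SubElement(document, m("div"), {"TYPE": "page", "ORDER": str(order)})
        ET.SubElement(page, m("fptr"), {"FILEID": file_id})  # points into fileSec
    ET.SubElement(root, m("behaviorSec"))             # behaviors / viewers
    return root


root = mets_skeleton([("FILE1", "page1.tif"), ("FILE2", "page2.tif")])
print(ET.tostring(root, encoding="unicode"))
```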

33

Open Archives Initiative (OAI)

A tool that supports interoperability among multiple databases

OAI goal: coarse-granularity resource discovery

OAI handles simple discovery from multiple community-specific repositories with metadata crosswalked to unqualified Dublin Core

34

OAI

Roots are in the science community interested in locating and searching multiple repositories of pre- and e-prints of scientific papers

Not really an archive, the way we traditionally think of the word

35

OAI

Data providers expose (make available) the metadata for their collections

Service providers harvest the exposed metadata and aggregate it (so that one search does it all) and/or provide additional services related to the harvested metadata, such as providing easy access to recent additions, updated materials, pre-set searches, etc.

36

OAI

OAI Protocol for Metadata Harvesting
– Metadata content must be encoded in XML and have a corresponding XML schema for validation
– Metadata must be supplied in unqualified Dublin Core format, at least
– Other metadata formats are optional
– Metadata may optionally include a link to the actual content / resource
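A harvester talks to a data provider through plain HTTP requests. The sketch below builds an OAI-PMH ListRecords request for unqualified Dublin Core (metadataPrefix=oai_dc), then pulls titles and the resumptionToken out of a response; when a token is present, the next request sends it in place of the other arguments. The verb, parameter, and namespace names follow the OAI-PMH 2.0 specification, while the base URL and the canned response are hypothetical, so no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request; a resumptionToken replaces the other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (titles, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = [t.text for t in root.iter(f"{{{DC_NS}}}title")]
    token_el = root.find(f".//{{{OAI_NS}}}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token


SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>The Sound of Music</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://repository.example.org/oai"))
print(parse_list_records(SAMPLE))
```

A real harvester would loop, fetching each URL and following resumption tokens until the provider returns an empty token.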

OAI Infrastructure

[Figure: several repositories each expose Dublin Core (DC) records, which a harvester gathers for a service provider]

OAI Infrastructure

[Figure: a user searches a repository]

OAI Infrastructure

[Figure: a user, a search, and two repositories]

40

Z39.50

Z39.50 is a search and retrieval protocol, maintained by LC, capable of operating over TCP/IP

Negotiates queries with multiple, separate databases; it does not harvest records to create a new database

Built into some library software systems

OAI is not intended to replace other approaches, but to provide an easy-to-use alternative for different constituencies and purposes
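The architectural difference between Z39.50-style searching and OAI-style harvesting can be sketched without either protocol. A federated search sends the query to every target at search time and merges the answers, while a harvester copies records ahead of time into one local index that is then searched directly. In this hypothetical sketch, plain Python dicts stand in for remote databases; it illustrates the two patterns only and implements neither protocol.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "remote" targets: database name -> list of titles
TARGETS = {
    "library_a": ["Metadata basics", "Cataloging rules"],
    "library_b": ["Digital libraries", "Metadata standards"],
}


def search_target(name, term):
    """Stand-in for querying one remote database at search time."""
    return [(name, title) for title in TARGETS[name] if term.lower() in title.lower()]


def federated_search(term):
    """Z39.50 style: query each target live, then merge the results."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_target, name, term) for name in TARGETS]
        return [hit for future in futures for hit in future.result()]


def harvest():
    """OAI style: copy every record into one local index ahead of time."""
    return [(name, title) for name, titles in TARGETS.items() for title in titles]


def local_search(index, term):
    """Search the harvested copy; the sources are not contacted again."""
    return [hit for hit in index if term.lower() in hit[1].lower()]


print(federated_search("metadata"))
print(local_search(harvest(), "metadata"))
```

Both calls find the same two records here; the difference is when the sources are contacted and where the searchable copy lives.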

41

Search/Retrieve Web Service

The primary function of SRW is to allow a user to search remote databases of records

Protocol uses easily available technologies -- XML, SOAP, HTTP, URI -- to perform tasks traditionally done using proprietary solutions such as database queries and responses

Builds on Z39.50 and moves it forward – ZING: Z39.50 International: Next Generation
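SRW's companion protocol from the same ZING effort, SRU, carries the same searchRetrieve operation as URL parameters instead of inside a SOAP envelope, which makes a request easy to show. The sketch assumes SRU 1.1 parameter names (operation, version, query, maximumRecords, recordSchema) and a CQL query string; the endpoint URL is hypothetical and nothing is sent.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def sru_search_url(base_url, cql_query, max_records=10, schema="dc"):
    """Build an SRU searchRetrieve request URL (SRU 1.1 parameter names assumed)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,                  # CQL, e.g. dc.title = "sound of music"
        "maximumRecords": str(max_records),
        "recordSchema": schema,              # ask for records in Dublin Core
    }
    return f"{base_url}?{urlencode(params)}"


url = sru_search_url("http://sru.example.org/catalog", 'dc.title = "sound of music"')
print(url)
# Decoding the query string recovers the original CQL expression
print(parse_qs(urlparse(url).query)["query"])
```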

42

Functional Requirements for Bibliographic Records (FRBR)

A study by IFLA (International Federation of Library Associations and Institutions) of the full range of functions performed by the bibliographic record – What do we use bibliographic records for?

Description, access, location, identification, annotations ...

The report provides a framework for the nature of and uses for bibliographic records

A conceptual model that can be used as a means to meet user needs and expectations

43

Functional Requirements for Bibliographic Records (FRBR)

Tasks we use bibliographic records for:
– Finding
– Identifying
– Selecting
– Obtaining access to resources

FRBR should allow systems to handle bibliographic data in new, useful ways that fulfill these tasks

44

Functional Requirements for Bibliographic Records (FRBR)

Conceptual model of relationships between bibliographic entities

Hierarchical relationships
– Work: the intellectual product
– Expression: an ‘expression’ of the parent work, such as a translation, edition, revision, annotated text, etc.; expressions entail additional intellectual effort

45

Functional Requirements for Bibliographic Records (FRBR)

Hierarchical relationships
– Manifestation: published runs of each expression in multiple formats over time; the level at which we traditionally create a catalog record
– Item: each copy of a specific manifestation; circulation records track items
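The four levels nest as a containment hierarchy: a work has expressions, each expression has manifestations, and each manifestation has items. The dataclasses below are a hypothetical sketch; the attribute names and sample data are illustrative and are not taken from the FRBR report. The sample shows one work with two expressions and the three copies a circulation system would track.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    barcode: str  # one copy; circulation records track items


@dataclass
class Manifestation:
    publisher: str
    year: int
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    language: str
    description: str  # e.g. original text, translation, annotated edition
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str  # the intellectual product, independent of any text or copy
    expressions: List[Expression] = field(default_factory=list)

    def items(self):
        """Every copy of every manifestation of every expression."""
        return [item
                for expression in self.expressions
                for manifestation in expression.manifestations
                for item in manifestation.items]


# Hypothetical sample data
work = Work("Hamlet", [
    Expression("en", "original text", [
        Manifestation("Example Press", 1990, [Item("c1"), Item("c2")]),
    ]),
    Expression("de", "German translation", [
        Manifestation("Beispiel Verlag", 2001, [Item("c3")]),
    ]),
])
print(len(work.items()))  # 3
```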

47

Functional Requirements for Bibliographic Records (FRBR)

OCLC is researching the application of FRBR to WorldCat – “FRBRization”

They have created an algorithm that groups records automatically based on the Work/Expression/Manifestation/Item model

http://www.oclc.org/research/projects/frbr/algorithm.htm
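The general idea of automatic grouping can be sketched by building a normalized author/title key for each record and collecting records that share a key into one work set. This is a deliberately simplified, hypothetical illustration, not OCLC's algorithm; the algorithm described at the URL above handles many cases this sketch ignores. Here, normalization lowercases, strips accents and punctuation, and collapses whitespace, so two records that differ only in trailing periods land in the same work.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def work_key(record):
    """Hypothetical work key: normalized (author, title)."""
    return normalize(record["author"]), normalize(record["title"])


def group_into_works(records):
    """Map each work key to the IDs of the records that share it."""
    works = defaultdict(list)
    for record in records:
        works[work_key(record)].append(record["id"])
    return dict(works)


records = [
    {"id": "r1", "author": "Twain, Mark", "title": "Adventures of Huckleberry Finn"},
    {"id": "r2", "author": "Twain, Mark.", "title": "Adventures of Huckleberry Finn."},
    {"id": "r3", "author": "Twain, Mark", "title": "The Prince and the Pauper"},
]
print(group_into_works(records))
```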

48

Identifiers

Four potential purposes
– Locator: Where is the document I seek?
– Identifier: Unique label for a resource
– Gatherer: Groups like resources, similar to a uniform title
– Differentiator: Helps identify different versions of the same resource

49

Identifiers

Uniform Resource Identifiers (URI)
– Generic set of all names/addresses that refer to resources on the Web, including:
Uniform Resource Locator (URL)
Persistent Uniform Resource Locator (PURL)
Uniform Resource Name (URN)

OpenURL, DOI, ISTC

50

Uniform Resource Locator (URL)

Web address or location at which a resource is held, not an identifier for the resource itself

Most common way to locate documents / items on the Web (http, ftp, mailto, etc.)

Not particularly stable or permanent – Error 404: File not Found

No metadata, but important starting point as we look at some of the related technologies
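Python's standard urllib.parse.urlsplit breaks a URL into its scheme, network location, and path, which shows that a URL records where and how to fetch something (http, ftp, mailto, and so on) rather than what the thing is. The helper name and example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def describe(url):
    """Return (scheme, network location, path) for a URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path


for url in [
    "http://www.example.org/docs/report.html",
    "ftp://ftp.example.org/pub/data.txt",
    "mailto:librarian@example.org",
]:
    print(describe(url))
```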

51

Persistent Uniform Resource Locator (PURL)

The PURL Service is managed by OCLC

Functionally, a PURL is a URL

The PURL remains constant even if the URL changes; its function is to automatically redirect a user to the current URL

The PURL system/resolver is updated by the resource manager to reflect any change to the location (URL) of the file
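The redirect mechanism amounts to a lookup table that the resource manager keeps current. In the hypothetical sketch below, a resolver answers a PURL path the way an HTTP server would, with a 302 status and a Location header pointing at the current URL (a temporary redirect of the kind PURL servers typically issue), or 404 if the path is unknown. When the file moves, updating one table entry fixes every link that cites the PURL. The class name, paths, and URLs are illustrative.

```python
class PurlResolver:
    """Hypothetical in-memory PURL resolver: PURL path -> current URL."""

    def __init__(self):
        self._targets = {}

    def register(self, purl_path, current_url):
        # Also used for updates: the resource manager overwrites the target
        self._targets[purl_path] = current_url

    def resolve(self, purl_path):
        """Return (status, headers) as an HTTP server would."""
        url = self._targets.get(purl_path)
        if url is None:
            return 404, {}
        return 302, {"Location": url}


resolver = PurlResolver()
resolver.register("/net/example-report", "http://old-host.example.org/report.html")
# The file moves; only the resolver entry changes, not the published PURL
resolver.register("/net/example-report", "http://new-host.example.org/docs/report.html")
print(resolver.resolve("/net/example-report"))
```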

52

PURLs

PURLs can be used both in documents and in cataloging systems

PURLs increase the probability of correct resolution and long-term access to resources

Use of PURLs can reduce the burden and expense of catalog maintenance (and business card printing)

53

PURL - Example

The US Government is a major user of PURLs – http://www.ccny.cuny.edu/library/Divisions/Government/iraqbib.html

54

Uniform Resource Name (URN)

Uniform Resource Names (URNs) are intended to serve as persistent, location-independent resource identifiers

Globally unique

Never change

Format – urn:<namespace identifier>:<namespace specific string>

Use a resolver system to indicate the current location of the resource
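The urn:<namespace identifier>:<namespace specific string> layout can be checked mechanically. The sketch below uses a simplified regular expression modeled on the RFC 2141 rule for the namespace identifier (a letter or digit followed by up to 31 letters, digits, or hyphens, with "urn" itself excluded). It does not validate the character rules for the namespace-specific string, and the function name is hypothetical.

```python
import re

# Simplified from RFC 2141: NID is 1 to 32 chars, starting with a letter or digit
URN_RE = re.compile(r"^urn:([A-Za-z0-9][A-Za-z0-9-]{0,31}):(.+)$", re.IGNORECASE)


def parse_urn(urn):
    """Split a URN into (namespace_id, namespace_specific_string), or return None."""
    match = URN_RE.match(urn)
    if match is None or match.group(1).lower() == "urn":
        return None
    # Namespace identifiers are case-insensitive, so normalize to lowercase
    return match.group(1).lower(), match.group(2)


print(parse_urn("urn:isbn:0451450523"))   # ('isbn', '0451450523')
print(parse_urn("http://example.org/x"))  # None
```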

55

Digital Object Identifier (DOI)

Overseen by the International DOI Foundation

DOIs are persistent, location-independent identifiers of resources

Developed to enable management of copyrightable materials in an electronic environment (locate, buy, sell, track, license)

Specific type / implementation of a URN

56

DOI

A two-part identifier: a prefix identifying the original publisher (the registrant) and a suffix identifying the specific work – similar to the ISBN

A DOI resolution request for a specific resource would return one or more URLs: locations where a user could obtain access to the resource – the appropriate copy: online, text, free, illustrated, etc.
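The prefix/suffix structure can be split at the first slash: the prefix begins with "10." and the suffix, which may itself contain slashes, is assigned by the registrant. The sketch below does that and forms a resolution link through the https://doi.org/ proxy. The example DOI 10.1000/182 is the DOI of the DOI Handbook; the function names are hypothetical.

```python
from urllib.parse import quote


def split_doi(doi):
    """Return (prefix, suffix); raise ValueError if the string is not DOI-shaped."""
    prefix, sep, suffix = doi.partition("/")  # split at the first slash only
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi):
    """Build a link through the doi.org proxy, which redirects to current locations."""
    split_doi(doi)  # validate first
    # Keep '/' readable; percent-encode other reserved characters
    return "https://doi.org/" + quote(doi, safe="/")


print(split_doi("10.1000/182"))     # ('10.1000', '182')
print(resolver_url("10.1000/182"))  # https://doi.org/10.1000/182
```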

57

DOI

Applications of the DOI will require metadata

The basis of the DOI metadata scheme is a minimal "kernel" of elements

DOI minimal kernel elements of metadata:
– DOI, DOI genre, identifier, title, type, origination, primary agent, agent role, and administrative data such as registrant and date of registration

58

Questions?

Amy Benson

Program Director

NELINET Digital Services

NELINET, Inc.


508.597.1937

800.635.4638 x1937