Using Descriptive Metadata and Controlled Vocabularies to Enhance Access to Visual Collections

TRANSCRIPT

  • M. Baca, Descriptive Metadata for VR Collections & Museums, Portland, May 2010, page 1

    Using Descriptive Metadata and Controlled Vocabularies to Enhance Access to Visual Collections

    a workshop at Reed College, May 28, 2010

    Murtha Baca, Head, Digital Art History Access, Getty Research Institute

    Master of St. Bartholomew, The Meeting of the Three Kings (detail), ca. 1480, J. Paul Getty Museum

    Information standards and controlled vocabularies can help extricate us from our metadata dilemmas...

    A Typology of Data Standards, from Introduction to Metadata, revised edition (2008)

    Type of Data Standard: Data structure standards (metadata element sets, schemas). These are “categories” or “containers” of data that make up a record or other information object.
    Examples: the set of MARC (Machine-Readable Cataloging format) fields, Encoded Archival Description (EAD), Dublin Core Metadata Element Set (DCMES), Categories for the Description of Works of Art (CDWA), VRA Core Categories

    Type of Data Standard: Data value standards (controlled vocabularies, thesauri, controlled lists). These are the terms, names, and other values that are used to populate data structure standards or metadata element sets.
    Examples: Library of Congress Subject Headings (LCSH), Library of Congress Name Authority File (LCNAF), LC Thesaurus for Graphic Materials (TGM), Medical Subject Headings (MeSH), Art & Architecture Thesaurus (AAT), Union List of Artist Names (ULAN), Getty Thesaurus of Geographic Names (TGN), ICONCLASS

    Type of Data Standard: Data content standards (cataloging rules and codes). These are guidelines for the format and syntax of the data values that are used to populate metadata elements.
    Examples: Anglo-American Cataloguing Rules (AACR), Resource Description and Access (RDA), International Standard Bibliographic Description (ISBD), Cataloging Cultural Objects (CCO), Describing Archives: A Content Standard (DACS)

    Type of Data Standard: Data format/technical interchange standards (metadata standards expressed in machine-readable form). This type of standard is often a manifestation of a particular data structure standard (type 1 above), encoded or marked up for machine processing.
    Examples: MARC21, MARCXML, EAD XML DTD, METS, MODS, CDWA Lite XML schema, Simple Dublin Core XML schema, Qualified Dublin Core XML schema, VRA Core 4.0 XML schema

  • M. Baca, Descriptive Metadata for VR Collections & Museums, Portland, May 2010, page 2

    The Pieces of the Puzzle: (Some) Essential Elements for Good Digital Collections

    • Content (“critical mass”)
    • Curation
    • Cataloging
    • Controlled Vocabularies
    • Copyright

    Speaking of cataloging…

    Before you even get started, the first question to ask is “What are we cataloging?”

    • a work?
    • an image?
    • a document or piece of ephemera relating to a work or exhibition?

    CCO 10 key principles

    (see article by M.J. Bates in Information Processing and Management 38, no. 3 (2002): 381-400. http://www.gseis.ucla.edu/faculty/bates/articles/cascade.html)

  • M. Baca, Descriptive Metadata for VR Collections & Museums, Portland, May 2010, page 3

    • characteristics of the information content: ambiguity or “fuzziness” of much information
    • different system(s) of metadata may be needed: MARC, MODS, METS, EAD, CDWA, VRA Core 4.0, etc.
    • actual indexing: AAT, ULAN, TGN, LCSH, LCNAF, ICONCLASS, other vocabularies
    • search capabilities/user understanding and searching activities: uncharted territory?
    • the World Wide Web and its vast, diverse user pool
    • financial and temporal constraints
    • institutional/organizational pressures and constraints
    • personal/personnel factors
    • legacy systems and data
    • unforeseeable technical factors
    • copyright/intellectual property issues
    • the nature of collections themselves (different languages, formats, etc.)

    The Pieces of the Puzzle: What form does the information take?

    • Bibliographic records (MARC, MODS)
    • Finding aids for intact archival collections with a common provenance (EAD)
    • Descriptive records for individual objects (CDWA, VRA, etc.)
    • “Home-grown” metadata element sets
    • Dublin Core for “resource discovery,” mapping, metadata harvesting

    The Pieces of the Puzzle: other considerations

    • Collection-level versus item-level records
    • Are you providing a “card catalog” or “finding aids,” and/or digital “surrogates”?

  • M. Baca, Descriptive Metadata for VR Collections & Museums, Portland, May 2010, page 4

    First Step: Select and Use Appropriate Metadata Elements

    Data Structure Standards (a.k.a. metadata standards)

    Guidelines for the structure of information systems: What elements should a database include?

    Meant to be customized according to institutional needs.

    MARC, EAD, MODS, Dublin Core, CDWA, VRA Core are examples of data structure standards.

    Second Step: Select and Use Vocabularies, Thesauri, and Classifications

    Data Value Standards

    Data values are used to “populate” or fill metadata elements.

    Examples are LCSH, AAT, TGM, Iconclass, “local” or “collection-specific” thesauri.

    Data Value Standards, continued

    Used as controlled vocabularies or authorities to assist with documentation and cataloging.

    Used as research tools – vocabularies contain rich information and contextual knowledge.

    Used as search assistants in database retrieval systems or with online collections.
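
    The “search assistant” role can be sketched as follows. This is a minimal illustration only: the tiny vocabulary and the function names `preferred_term` and `expand_query` are hypothetical and are not taken from AAT, LCSH, or any other published vocabulary.

```python
# Hypothetical mini-vocabulary: each preferred term lists its variant terms.
# A real system would draw these from a published vocabulary or a local authority file.
VOCABULARY = {
    "photographs": ["photos", "photographic prints"],
    "watercolors": ["watercolours", "aquarelles"],
}

# Reverse index: any known term (preferred or variant), lowercased -> preferred term.
_LOOKUP = {}
for preferred, variants in VOCABULARY.items():
    _LOOKUP[preferred.lower()] = preferred
    for variant in variants:
        _LOOKUP[variant.lower()] = preferred


def preferred_term(term):
    """Return the preferred form of a term, or None if it is not in the vocabulary."""
    return _LOOKUP.get(term.strip().lower())


def expand_query(term):
    """Return every term a search should match for the user's input."""
    preferred = preferred_term(term)
    if preferred is None:
        return [term]  # unknown term: search it as typed
    return [preferred] + VOCABULARY[preferred]
```

    A cataloger would store only the preferred term in the record, while a search on any variant (for example `expand_query("photos")`) still reaches it.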

  • M. Baca, Descriptive Metadata for VR Collections & Museums, Portland, May 2010, page 5

    Third Step: Follow Guidelines for Documentation

    Data Content Standards

    Best practices for documentation (i.e., implementing data structure and data value standards)

    Rules for the selection, organization, and formatting of content.

    AACR (Anglo-American Cataloguing Rules), CCO (Cataloging Cultural Objects), DACS (Describing Archives: A Content Standard)

    AACR: the “bible” for bibliographic cataloging (up to now)
    DACS: cataloging of archival collections; successor to APPM
    CCO: “AACR for art objects, built works, visual materials”
    RDA: heir apparent to AACR

  • M. Baca, Descriptive Metadata for VR Collections & Museums, Portland, May 2010, page 6

    CCO is the first data content (cataloging) standard specifically intended for cataloging cultural heritage materials and their images.

    CCO follows on the development of metadata element sets (e.g., CDWA, VRA Core) and controlled vocabulary standards (AAT, TGN, ULAN, etc.) specifically destined for art & cultural objects information.

    CCO is not a new data element set. Rather, it provides guidance for how to populate “data elements or fields” based on the VRA Core & CDWA elements. A map to Dublin Core and MARC 21 elements is provided.

    This simplified diagram illustrates how works may be related to other works, and how works may be related to images, sources, and authorities.

    Both explicitly deal with issues of display vs. indexing.

    Both stress “relationships.”

    Both stress the importance of authorities.

    Both are independent of information communication format (yet associated with certain formats).

    Both are compatible/combinable with other standards (see example of Morgan Library & Museum).

    Both are designed to build and rely upon the cataloger’s judgment.

    Both are derived from English-language conventions, but adaptable world-wide in other languages.

  • M. Baca, Descriptive Metadata for VR Collections & Museums, Portland, May 2010, page 7

    Definitions of “work,” “expression,” “manifestation,” and “item” (per FRBR) do not work well for unique cultural objects and built works.

    CCO provides guidelines for descriptive metadata for unique items, not for bibliographic items nor Web resources (however, some items, e.g. some decorative arts, prints, etc., may appear as multiples).

    For CCO, items are not “self-describing.”

    Titles and names are often handled differently.

    “A work is an abstract entity; there is no single material object one can point to as the work. We recognize the work through individual realizations or expressions of the work, but the work itself exists only in the commonality of content between and among the various expressions of the work.”

    “A creative product, including architecture, art works such as paintings, drawings, graphic arts, sculpture, decorative arts, photographs that are considered to be art, and other cultural artifacts. A work may be a single item or it may be made up of many physical parts.”

  • M. Baca, Descriptive Metadata for VR Collections & Museums, Portland, May 2010, page 8

    “A visual representation of a work, typically existing in photomechanical, photographic, or digital format.”

    FRBR

    Work: Ronald Hayman’s Playback
    Expression: the author's text edited for publication
    Manifestation: the book published in 1973 by Davis-Poynter
    Item: a copy you found on amazon.com for $1.99 from a used bookstore

    • Find
    • Identify
    • Select
    • Obtain
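
    The four FRBR levels nest, each one pointing to the level above it. A minimal sketch of that nesting, using the Playback example; the class layout and the `lineage` helper are illustrative assumptions, not part of FRBR itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Work:
    title: str
    creator: str


@dataclass(frozen=True)
class Expression:
    work: Work
    description: str


@dataclass(frozen=True)
class Manifestation:
    expression: Expression
    publisher: str
    year: int


@dataclass(frozen=True)
class Item:
    manifestation: Manifestation
    note: str


def lineage(item):
    """Walk from a single copy back up to the abstract work."""
    m = item.manifestation
    e = m.expression
    return [
        f"Item: {item.note}",
        f"Manifestation: {m.publisher}, {m.year}",
        f"Expression: {e.description}",
        f"Work: {e.work.creator}, {e.work.title}",
    ]


work = Work(title="Playback", creator="Ronald Hayman")
expression = Expression(work, "the author's text edited for publication")
manifestation = Manifestation(expression, publisher="Davis-Poynter", year=1973)
item = Item(manifestation, "a copy found online from a used bookstore")
```

    Calling `lineage(item)` lists the chain from the physical copy up to the work, which is the path a user follows when moving from “obtain” back to “find.”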

    CCO

    Work: Raphael, Entombment of Christ, Galleria Borghese, Rome

    Related Work: Raphael, Preliminary drawing for Entombment of Christ, Ashmolean Museum, Oxford

    Related Work: Degas, The Deposition after Raphael's Borghese Deposition, Leicester Galleries, London

    Work: find, identify, select

    Image/Surrogate:

    • 35 mm slide, view with frame
    • Color print, view without frame
    • JPEG, view without frame
    • TIFF, detail of the head of Mary

    Surrogate: select, “obtain”
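
    The CCO picture differs from FRBR in that a work record links sideways to related works and downward to any number of image surrogates. A rough sketch with the Raphael example; the field names and the `images_in_format` helper are assumptions made for illustration and are not CCO or VRA Core element names.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    """A surrogate: a visual representation of a work."""
    format: str
    view: str


# eq=False keeps identity comparison, since work records reference each other.
@dataclass(eq=False)
class Work:
    creator: str
    title: str
    repository: str
    related_works: list = field(default_factory=list)
    images: list = field(default_factory=list)


entombment = Work("Raphael", "Entombment of Christ", "Galleria Borghese, Rome")
drawing = Work("Raphael", "Preliminary drawing for Entombment of Christ",
               "Ashmolean Museum, Oxford")
copy_after = Work("Degas", "The Deposition after Raphael's Borghese Deposition",
                  "Leicester Galleries, London")

# Work-to-work relationships are recorded in both directions.
for other in (drawing, copy_after):
    entombment.related_works.append(other)
    other.related_works.append(entombment)

# Several surrogates can depict the same work.
entombment.images += [
    Image("35 mm slide", "view with frame"),
    Image("color print", "view without frame"),
    Image("JPEG", "view without frame"),
    Image("TIFF", "detail of the head of Mary"),
]


def images_in_format(work, fmt):
    """Select the surrogates of a work that exist in a given format."""
    return [img for img in work.images if img.format.lower() == fmt.lower()]
```

    Keeping work and image as separate records means the painting is described once, however many slides, prints, or digital files depict it.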

  • M. Baca, Descriptive Metadata for VR Collections & Museums, Portland, May 2010, page 9

    Fourth Step: Select the Appropriate Format for Expressing Data

    DATA FORMAT STANDARDS

    How will you “publish” and share your data in electronic form?

    How will service providers obtain, add value, and disseminate your data?

    Candidates are Dublin Core XML; MARC21; MARC XML; CDWA Lite XML schema; VRA Core XML schema, MODS, etc.

    What do search engines do? They “index” the Web.

    Web pages (HTML “document-like objects”) can be indexed by search engines.

    What about dynamic content that is generated on the fly from searchable databases?

    Title HTML Tag

    Limit to no more than 60 characters, including spaces. This is what displays in results lists from search engines, and at the top of the screen in a Web browser display. It is also what is used by Web browsers in creating the names of “bookmarks.”

    • Make sure that the Title HTML tag matches as closely as possible the actual title that appears on the Web page.
    • Always use upper and lower case, as this will appear at the top of the browser display and in the search results display.

    example: Antelope Valley Indian Museum

    example: Fowler Museum Collections
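
    The title guidance above lends itself to an automated check. A small sketch, assuming titles are generated by a script; the function name `title_tag` and its error messages are hypothetical.

```python
from html import escape

MAX_TITLE_LENGTH = 60  # limit stated in the guidance above, spaces included


def title_tag(page_title):
    """Build a <title> element, rejecting titles the guidance would flag."""
    title = " ".join(page_title.split())  # collapse stray whitespace
    if not title:
        raise ValueError("title is empty")
    if len(title) > MAX_TITLE_LENGTH:
        raise ValueError(f"title is {len(title)} characters; limit is {MAX_TITLE_LENGTH}")
    if title.isupper() or title.islower():
        raise ValueError("use mixed upper and lower case")
    # escape() protects characters such as & and < inside the element
    return f"<title>{escape(title)}</title>"
```

    Passing the same string that appears as the visible page heading keeps the tag and the on-page title in step, as the first bullet recommends.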

  • M. Baca, Descriptive Metadata for VR Collections & Museums, Portland, May 2010, page 10

    Description Tag

    A textual description of the content of the resource.

    • Limit to 120-140 characters, including spaces.
    • This should be a concise text that clearly describes the content of the particular resource; it is what appears as the “summary” or description of the resource in results lists from many Web search engines.

    example:

    Keywords Tag

    This tag is for words and phrases used to describe a resource. Judiciously selected keywords can provide important additional “access points” to a Web resource, even if those words and phrases do not actually appear on the page.

    • Limit to a total of 1000 characters, including spaces and punctuation.
    • Do not repeat keywords more than 7 times, or search engines may ignore them as “spamming.”
    • Don’t include generic keywords that will cause pages that aren’t particularly relevant to appear in search results displays. For example, don’t put “art” or “Los Angeles” as keywords for a particular resource unless there is really relevant content about art or Los Angeles contained in that resource.

    example:
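
    As an illustration only (this is not the slide's own example), the description and keywords limits could be checked by a helper like the one below. The function names are hypothetical, and the repetition check is a simplification that counts whole keyword entries rather than individual words inside phrases.

```python
from collections import Counter
from html import escape

DESCRIPTION_RANGE = (120, 140)   # characters, spaces included
MAX_KEYWORDS_LENGTH = 1000       # characters, spaces and punctuation included
MAX_KEYWORD_REPEATS = 7


def description_tag(text):
    """Build a description meta tag and report whether its length is in range."""
    text = " ".join(text.split())
    low, high = DESCRIPTION_RANGE
    in_range = low <= len(text) <= high
    tag = f'<meta name="description" content="{escape(text)}">'
    return tag, in_range


def keywords_tag(keywords):
    """Build a keywords meta tag, enforcing the length and repetition limits."""
    cleaned = [k.strip() for k in keywords if k.strip()]
    content = ", ".join(cleaned)
    if len(content) > MAX_KEYWORDS_LENGTH:
        raise ValueError(f"keywords run to {len(content)} characters")
    counts = Counter(k.lower() for k in cleaned)
    overused = sorted(k for k, n in counts.items() if n > MAX_KEYWORD_REPEATS)
    if overused:
        raise ValueError(f"repeated more than {MAX_KEYWORD_REPEATS} times: {overused}")
    return f'<meta name="keywords" content="{escape(content)}">'
```

    `description_tag` returns a flag rather than raising, on the assumption that a description slightly outside the range is a warning for an editor, not an error.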

    “No Index, No Follow” Meta Tag

    A tag that tells Web robots not to index a particular page.

    • This is an extremely important tag that helps searchers to avoid retrieving “decontextualized” or “disembodied” Web pages. It also helps to eliminate search results lists that contain page after page from the same resource.
    • Remember that when you use this tag, in many cases you will have to include important keywords from the “suppressed” pages on the home page or main page of the resource in question. In fact, it should be a priority to put keywords from pages that have the “no index, no follow” tag on the home page or main page of a resource.

    example:
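
    A hedged sketch of the second bullet: emit the standard robots meta tag for suppressed pages and carry their keywords up to the home page. The helper names and the `(suppressed, keywords)` page representation are assumptions made for this illustration.

```python
ROBOTS_TAG = '<meta name="robots" content="noindex, nofollow">'


def robots_tag(suppressed):
    """Return the robots meta tag for a suppressed page, or an empty string."""
    return ROBOTS_TAG if suppressed else ""


def home_page_keywords(home_keywords, pages):
    """Merge keywords from suppressed pages into the home page's keyword list.

    `pages` is a list of (suppressed, keywords) pairs. Order is preserved and
    duplicates (ignoring case) are dropped.
    """
    merged = []
    seen = set()
    sources = [home_keywords] + [kw for suppressed, kw in pages if suppressed]
    for keywords in sources:
        for keyword in keywords:
            key = keyword.lower()
            if key not in seen:
                seen.add(key)
                merged.append(keyword)
    return merged
```

    Only pages flagged as suppressed contribute keywords, since pages left open to indexing already speak for themselves.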

  • M. Baca, Descriptive Metadata for VR Collections & Museums, Portland, May 2010, page 11

    Speaking of the Web...

    • Are your collections “reachable” by commercial search engines?

    • If yes, how will you “contextualize” individual collection objects?

    • If not, what is your strategy to lead Web users to your search page?

    The “Visible Web” versus the “Deep Web”

    • The Visible Web is what you see in the results pages from general Web search engines & subject directories (static Web pages)

    • The Invisible or Deep Web consists of data from dynamically searchable databases that cannot be indexed by search engines, because they aren’t “stored” anywhere.

    The “Google factor”

    What Google “looks at”

    Title tag

    Text on the Web page

    Referring links

  • M. Baca, Descriptive Metadata for VR Collections & Museums, Portland, May 2010, page 12

    What Google doesn’t look at (mostly)

    Keywords meta tag

    Description meta tag

    searchenginewatch.com provides information on how commercial search engines work

    How to assist users in “unmediated” searching, browsing, etc.

    How to present large, complicated amounts of data in a way that users can understand and interpret

    How to create “cataloging for the Web”: harnessing and adapting the power of metadata and controlled vocabularies

    How to provide reliable, up-to-date, “authoritative” metadata

    Facing the Challenges

    Institutions need to carefully choose and consistently apply metadata schemas to their collections information.

    Application of vocabulary resources (including local authorities and thesauri) is essential for enhancing end-user access.

    Use of picklists, thematic groupings, and “browsing categories” based on institutions’ organized data improves end-user access.
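
    Picklists and browsing categories fall out almost for free once a metadata element is populated from a controlled list. A minimal sketch under that assumption; the records, the `work_type` element name, and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: "work_type" holds a term from a controlled list.
RECORDS = [
    {"title": "Harbor at Dusk", "work_type": "paintings"},
    {"title": "Study of Hands", "work_type": "drawings"},
    {"title": "River Landscape", "work_type": "paintings"},
    {"title": "Untitled Sketch", "work_type": None},
]


def browse_categories(records, element):
    """Group record titles under each controlled value of one metadata element."""
    groups = defaultdict(list)
    for record in records:
        value = record.get(element)
        if value:  # records lacking a value cannot be browsed by this element
            groups[value].append(record["title"])
    return {term: sorted(titles) for term, titles in sorted(groups.items())}


def picklist(records, element):
    """Return the sorted distinct values to offer in a search form."""
    return list(browse_categories(records, element))
```

    The record with no `work_type` drops out of both the categories and the picklist, which is a small demonstration of why consistent cataloging matters for end-user access.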

  • M. Baca, Descriptive Metadata for VR Collections & Museums, Portland, May 2010, page 13

    Facing the Challenges continued

    Careful and consistent implementation of title tags and other metadata on Web pages facilitates end-user searching and retrieval of Web resources

    Use metadata and usability analysis should be a routine part of digital library work.

    Provide both searching and browsing functionalities (and carefully consider whether or not to offer an advanced “fielded” searching option)

    Don’t “show” all your data, nor make it all available for end-user searching.

    Don’t create hyperlinks simply because you can!

    Create thematic groupings (based on carefully constructed metadata!) that reflect your collections and help your users.

    Study end-user behavior (including your own).