LBSC 670
Information Organization
Thoughts from last class
• “I feel like we are getting behind”
• “Why are we learning HTML/CSS?”
• “What is cloud computing?”
• “Can we have printouts again?”
LITA National Forum
http://www.ala.org/
LITA - Free your metadata
• http://freeyourmetadata.org/ (Amalia)
• http://code.google.com/p/google-refine/
LITA – Digital Institutes
Data Services Librarian
Advocate for data publishing, research, curation, collaboration
Class Plan
• Explore historical foundation of cataloging
• Identify metadata standards central to cataloging
• Explore metadata schemas useful for libraries
Review
• HTML implements a metadata schema (e.g., h1, h2, DOM . . .) and an encoding system (e.g., XHTML) in concert with supporting technologies (e.g., CSS, JavaScript)
• Digital documents have embedded structure that programs use to encode and decode information for use
Storage
• Relational database – Tables, SQL, indexes, abstracted but semi-fixed structure
• Object databases – Storage of objects which are directly accessible via programs
• Flat text files – Embedded structure, tight association with application, quick, simple
• XML files – Abstracted structure, portable, extensible, slower?
• Embedded in digital objects – Portable, associative
Representation of objects
[Diagram: Object, Representation, Record, Encoding model]
Metadata definitions
• Common
  – Data about data
  – Data that describes a resource
  – Information about information
• Gilliland-Swetland, Baca
  – “the sum total of what one can say about any information object at any level of aggregation.”
  – Content, Context, Structure
• Greenberg
  – Structured data about an information object that facilitates functions associated with the designated object
Metadata Life-cycle
Gilliland, 2007
• Codification
• Storage
• Use/Reuse
• Scalability
Standards Types
• Data structure standards – Standards that govern the scope and purpose of a metadata record (MARC, Dublin Core, Text Encoding Initiative (TEI))
• Data communication standards – Encoding (e.g., HTML/XHTML, XML)
• Data syntax standards – Element ordering, content syntax, and encoding syntax (e.g., date/time syntax)
Cataloging purposes
• “A list of books, maps, and other items arranged in some definite order” (Cutter)
  – Discovery
    • Catalogs, indexes, databases
  – Management
    • Technical and administrative metadata
  – Access
    • User interfaces (OPACs)
Dewey’s rules for cataloging
• Based on:
  – Panizzi’s 91 rules for cataloging
    • British Museum
  – Charles Coffin Jewett
    • Smithsonian librarian, 1852
    • Standard cataloging, printed entries on cards
• Dewey
  – Alphabetic ordering by subject
  – System of all knowledge
  – Classification – browse http://dewey.info/
A quick history of classification
• 245 BCE – Callimachus creates Pinakes – 120-volume catalog for 400 scrolls
  – Title, author, teachers, biography
  – Six genres (rhetoric, law, epic, tragedy, lyric poetry, history, medicine, mathematics, natural science, misc)
• 48 BCE – Alexandria burns
• . . . Then for a long time nothing happened . . .
• 1876 – Dewey Decimal System
• 1882 – Charles Cutter – Cutter classification
• 1897 – Herbert Putnam – LC Classification
Images from Wikipedia
Book Metadata (circa 1960)
Book Metadata (circa 1980)
• 100 2_ |a Berners-Lee, Tim.
• 245 10 |a Weaving the Web : |b the original design and ultimate destiny of the World Wide Web by its inventor / |c Tim Berners-Lee with Mark Fischetti.
• 250 __ |a 1st ed.
• 260 __ |a San Francisco : |b Harper SanFrancisco, |c c1999.
• 300 __ |a xi, 226 p. ; |c 25 cm.
• 500 __ |a Includes index.
• 650 _0 |a World Wide Web |x History.
• 600 20 |a Berners-Lee, Tim.
• 700 1_ |a Fischetti, Mark.
• 856 42 |3 Publisher description |u http://www.loc.gov/catdir/description/hc044/99027665.html
http://www.oclc.org/bibformats/en/default.shtm
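The display form of the record above follows a regular pattern: a three-digit tag, two indicator characters, then subfields introduced by "|" and a one-letter code. A minimal sketch of reading one such line, assuming that display layout only (it is not a parser for binary MARC 21, and `parse_marc_line` is a hypothetical helper name):

```python
import re

# Matches "TAG I1I2 rest", e.g. "245 10 |a Weaving the Web : |b ..."
FIELD_RE = re.compile(r"^(\d{3}) (.)(.) (.*)$")

def parse_marc_line(line):
    """Split one display-format MARC line into tag, indicators, and subfields."""
    match = FIELD_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a MARC display line: {line!r}")
    tag, ind1, ind2, rest = match.groups()
    subfields = []
    # Everything before the first "|" is empty, so skip it.
    for chunk in rest.split("|")[1:]:
        if not chunk:
            continue
        subfields.append((chunk[0], chunk[1:].strip()))
    return {"tag": tag, "ind1": ind1, "ind2": ind2, "subfields": subfields}

field = parse_marc_line("260 __ |a San Francisco : |b Harper SanFrancisco, |c c1999.")
print(field["tag"], field["subfields"])
```

Note that the ISBD punctuation (" :", ",") stays inside the subfield values, which is how the record itself stores it.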
Book Metadata (circa 2002)
Library uses of metadata
• Descriptive cataloging
• Inventory of holdings
• Technical and administrative metadata about acquisitions
• Interoperability with other systems
• Facilitating acquisition decisions
• Federate searches from other catalogs
Cataloging process
An example MARC record
1. Description
2. Access points
3. Headings
4. References
Anatomy of a bibliographic record
AACR2 processes
• Area 1: title, statement of responsibility
• Area 2: edition
• Area 3: material type
• Area 4: publication, distribution
• Area 5: physical description
• Area 6: series
• Area 7: notes
• Area 8: standard number, terms of availability
How to enter a title into a MARC record
• AACR2
  – Transcribe title exactly according to spelling but not necessarily punctuation/capitalization
  – If an alternative title is present, precede it by a comma following the regular title
  – Use a General Material Designation in brackets []
• MARC Standard
  – Use 245 field – indicates Main title
  – Indicator 2 – Number of non-filing characters (leading articles)
  – Subfield a – main title
  – Subfield b – remainder of title
  – Subfield h – General Material Designation in brackets []
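The 245 rules above can be sketched in a few lines. This is an illustration under stated assumptions: the article list is a short, English-only guess, `build_245` and `nonfiling_count` are hypothetical names, and the output uses the same display layout as the example record rather than a real MARC encoding.

```python
# Assumed, incomplete list of English leading articles (each with its space).
LEADING_ARTICLES = ("The ", "An ", "A ")

def nonfiling_count(title):
    """Characters to skip when filing: the leading article plus its space."""
    for article in LEADING_ARTICLES:
        if title.startswith(article):
            return len(article)
    return 0

def build_245(title, remainder=None, gmd=None, added_entry=True):
    """Build a display-format 245 field from a transcribed title."""
    ind1 = "1" if added_entry else "0"   # indicator 1: title added entry
    ind2 = str(nonfiling_count(title))   # indicator 2: non-filing characters
    parts = [f"|a {title}"]
    if gmd:
        parts.append(f"|h [{gmd}]")      # GMD follows the main title
    if remainder:
        parts[-1] += " :"                # ISBD punctuation before subfield b
        parts.append(f"|b {remainder}")
    return f"245 {ind1}{ind2} " + " ".join(parts)

print(build_245("Weaving the Web", remainder="the original design"))
print(build_245("The Hobbit"))
```

The second call yields indicator 2 = 4, so a catalog would file the title under "Hobbit" rather than "The".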
Dublin Core Overview
• Created out of a 1995 meeting in Dublin, Ohio
• An intentionally simple standard focused on resource description
• DCMI conference (2007)
• Enjoys widespread adoption in the library and digital library community, particularly as a lowest-common-denominator standard
Initial Dublin Core
• Focused on Digital Document-like-objects
• Simple description, human based
• Focus on descriptive metadata over technical, preservation, use metadata
Dublin Core (1.0 - 1995)
1. Subject: The topic addressed by the work
2. Title: The name of the object
3. Author: The person(s) primarily responsible for the intellectual content of the object
4. Publisher: The agent or agency responsible for making the object available
5. OtherAgent: The person(s), such as editors and transcribers, who have made other significant intellectual contributions to the work
6. Date: The date of publication
7. ObjectType: The genre of the object, such as novel, poem, or dictionary
8. Form: The physical manifestation of the object, such as Postscript file or Windows executable file
9. Identifier: String or number used to uniquely identify the object
10. Relation: Relationship to other objects
11. Source: Objects, either print or electronic, from which this object is derived, if applicable
12. Language: Language of the intellectual content
13. Coverage: The spatial locations and temporal durations characteristic of the object
Weibel, 1995
Dublin Core (1.1 - 1999)
• Title
• Author or Creator
• Subject and Keywords
• Description
• Publisher
• Other Contributor
• Date
• Resource Type
• Format
• Resource Identifier
• Source
• Language
• Relation
• Coverage
• Rights Management
Qualified Dublin Core (Current)
• 71 properties, 35 classes . . .(Registry)
• Expansion of scope/purpose
• Multiple encoding models (HTML/XHTML, XML, RDF)
• Addition of Application Profile concept
A possible record
• Title: New Web language promises smarter surfing
• Subject: World Wide Web
• Subject: Extensible Markup Language
• Subject: World Wide Web Consortium
• Subject: Standards, Web
• Creator: Heid, Jim
• Creator: Glenn McDonald
• Created: 01/07/1998
• Identifier: http://www.cnn.com/TECH/computing.......
• Publisher: Cable News Network
• Language: en
• Description: This article discusses the recent adoption of XML by the W3C as a standard and its possible uses in a web environment
• Format: text/html
• Rights: All Rights Reserved
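One way to encode a record like this in XML is sketched below, using a subset of the fields above. The `record` wrapper element and the `to_xml` helper are assumptions made for illustration; only the Dublin Core element namespace is standard. Elements are kept as a list of pairs rather than a dictionary because Dublin Core elements are repeatable (several Subject and Creator values).

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Subset of the record above; repeated names are allowed.
record = [
    ("title", "New Web language promises smarter surfing"),
    ("subject", "World Wide Web"),
    ("subject", "Extensible Markup Language"),
    ("creator", "Heid, Jim"),
    ("publisher", "Cable News Network"),
    ("language", "en"),
    ("format", "text/html"),
    ("rights", "All Rights Reserved"),
]

def to_xml(pairs):
    """Serialize (element, value) pairs as namespaced dc:* elements."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in pairs:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")

xml_text = to_xml(record)
print(xml_text)
```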
Dublin Core Abstract Model
HTML Encoding of DC
Example
• Title: Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web
• Author: Tim Berners-Lee
• Subject: World Wide Web
• Publisher: Collins
• Date: 2000
• Language: English
• ISBN-13: 978-0062515872
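A hedged sketch of the HTML encoding for this example: Dublin Core is commonly embedded in a page head as `<meta name="DC.element">` tags, with a `<link rel="schema.DC">` declaring the element set. The mapping choices here are assumptions for illustration (Author becomes creator, "English" becomes the code "en", the ISBN goes in identifier), and `dc_to_html_meta` is a hypothetical helper.

```python
from html import escape

def dc_to_html_meta(pairs):
    """Render (element, value) pairs as Dublin Core <meta> tags for an HTML head."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for name, value in pairs:
        # escape() protects quotes and ampersands inside attribute values
        lines.append(f'<meta name="DC.{name}" content="{escape(value, quote=True)}">')
    return "\n".join(lines)

book = [
    ("title", "Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web"),
    ("creator", "Tim Berners-Lee"),
    ("subject", "World Wide Web"),
    ("publisher", "Collins"),
    ("date", "2000"),
    ("language", "en"),
    ("identifier", "978-0062515872"),
]

print(dc_to_html_meta(book))
```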
Work time
• Complete pages 1-4 of the worksheet
  – What is Dublin Core
  – Creating a Dublin Core record
Issues in cataloging
• focus on 'by-value' cataloging instead of by-reference means that consistency is poor
• focus on text identifiers (title, author) over unique IDs means record duplication is rampant
• focus on traditional descriptive measures limits effectiveness in new discovery systems that do not respect complex metadata
• focus on print resources has made cataloging for internet resources difficult
New concepts in cataloging
• RDA: Resource Description and Access
• FRBR: Functional Requirements for Bibliographic Records
• FRAD: Functional Requirements for Authority Data
• FRSAD: Functional Requirements for Subject Authority Data
Resource Description and Access
• RDA is an update to AACR2
• RDA uses a new data model (FRBR)
• RDA includes new MARC fields
  – http://www.loc.gov/marc/formatchanges-RDA.html
• RDA is not yet implemented
Addresses user tasks
FRBR:
• Find
• Identify
• Select
• Obtain
FRAD:
• Find
• Identify
• Contextualize
• Justify
• ICP’s highest principle = “convenience of the user”
Slide from http://www.loc.gov/aba/rda/training_modules.html
FRBR’s Entity-Relationship Model
• Entities
• Relationships
• Attributes (data elements)
• National level required elements
[Diagram: One Entity – relationship – Another Entity]
FRBR’s Entity-Relationship Model
[Diagram: Shakespeare (Person) created Hamlet (Work); Hamlet (Work) was created by Shakespeare (Person)]
Terminology
• FRBR and FRAD “attributes” are “elements” in RDA = identifying characteristics
• FRBR and FRAD Group 1 entities:
  – Work
  – Expression
  – Manifestation
  – Item
FRBR
• Functional requirements for bibliographic records
  – Group 1 – entities: work, expression, manifestation, item
  – Group 2 – persons or corporate bodies responsible for a work (FRAD)
  – Group 3 – subjects: concepts, events, places . . . (FRSAD)
FRBR Model
http://www.ifla.org/
http://fictionfinder.oclc.org/
http://worldcat.org
http://www.frbr.org
FRBR components
• Work – distinct intellectual or artistic creation
• Expression – intellectual or artistic realization of a work
• Manifestation – physical embodiment of an expression of a work
• Item – a single exemplar of a manifestation
Adapted from Jane Greenberg
http://frbr.oclc.org/pages/Pages?sn=460059802&instname=
FRBR Example
• Rolling Stones’ IT'S ONLY ROCK-N-ROLL (1974) (Work)
  – Group’s performance recorded for the album (Expression)
    • Recording released in 1974 by MCA Records on tape cassette (Manifestation)
    • Recording released in 1974 by MCA Records on compact disc (Manifestation)
    • Sheet music released in 1992 (?)
Adapted from Jane Greenberg
FRBR diagram
Work, the Performance (1974)
E: Music and lyrics
E: Music (just the instruments)
M: CD, RCA, 2005
M: RS, LP 1974
M: 8-track, RCA, 1975
I: My CD, RCA, 2005 c.2
I: Your CD, RCA, 2005 c.1
I: UNC Mus. Lib. CD, RCA, 2005 c.3
Adapted from Jane Greenberg
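The Group 1 hierarchy in the diagram (Work, Expression, Manifestation, Item) can be modeled as nested objects. This is a sketch, not part of FRBR itself: the class and field names are hypothetical, and because the flattened diagram does not show which manifestations belong to which expression, all three are placed under "Music and lyrics" as an assumption.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Item:
    label: str

@dataclass
class Manifestation:
    label: str
    items: List[Item] = field(default_factory=list)

@dataclass
class Expression:
    label: str
    manifestations: List[Manifestation] = field(default_factory=list)

@dataclass
class Work:
    label: str
    expressions: List[Expression] = field(default_factory=list)

cd = Manifestation("CD, RCA, 2005", items=[
    Item("My CD, c.2"), Item("Your CD, c.1"), Item("UNC music library CD, c.3"),
])
lp = Manifestation("RS, LP 1974")
eight_track = Manifestation("8-track, RCA, 1975")

work = Work("The Performance (1974)", expressions=[
    Expression("Music and lyrics", manifestations=[cd, lp, eight_track]),
    Expression("Music (just the instruments)"),
])

def count_items(w):
    """Count every physical copy reachable from a work."""
    return sum(len(m.items) for e in w.expressions for m in e.manifestations)

print(count_items(work))
```

Each level holds a list of the level below, which mirrors the one-to-many relationships in the model: a single copy on a shelf always traces back to exactly one work.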
FRBR Algorithm (1)
• Process– Extract Author
• Process
  – Extract Author
    • Construct Authority author entry from 100, 400 using subfields and 008 data to limit
  – Extract Title
    • Construct Authority title entry from 130, 240, 245, etc. Normalize using NACO
  – Combine these two authorities to create a unique Work identifier
    • <author>Mitchell, Margaret</author><title>Gone with the wind</title>
FRBR Algorithm (2)
• Results from a sample extraction (from FRBR doc)
  – <author>/<title> (75.97%)
  – <uniform title> (1.34%)
  – /<title>/[one or more <name>] (17.35%)
  – /<title>/<control number> (5.34%)
• http://www.oclc.org/research/software/frbr/frbr_workset_algorithm.pdf
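The author/title step of the algorithm above can be sketched as follows. The `normalize` function is a simplified stand-in for NACO normalization, not the actual rules (it lowercases, strips diacritics, and drops punctuation, whereas the slide's example key keeps case and the comma), and `work_key` is a hypothetical name. The point is only that records with small transcription differences should collapse to the same work identifier.

```python
import re
import unicodedata

def normalize(text):
    """Simplified stand-in for NACO normalization (an assumption, not the full rules)."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))  # drop diacritics
    text = re.sub(r"[^\w\s]", " ", text.lower())                        # drop punctuation
    return " ".join(text.split())                                       # collapse whitespace

def work_key(author, title):
    """Combine normalized author and title into one work identifier."""
    return f"<author>{normalize(author)}</author><title>{normalize(title)}</title>"

a = work_key("Mitchell, Margaret", "Gone with the Wind.")
b = work_key("MITCHELL, MARGARET", "Gone with the wind")
print(a)
print(a == b)
```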
Worktime
• Complete pages 5 & 6
  – Mapping DC to MARC
Metadata tools
Tool Type                 Uses
Conversion / Crosswalk    Migrate data from one form to another
Creation                  Automatic or semi-automatic creation of metadata
Extraction / Harvesting   Pull metadata from digital objects or systems for use/re-use
Evaluation                Validate schema or encoding of metadata records
Searching                 Facilitate discovery and use of metadata
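A conversion/crosswalk tool is, at its core, a lookup table from one schema's elements to another's. The sketch below maps a few Dublin Core elements to MARC tag/subfield pairs. The table is a simplified, illustrative mapping chosen for this example and should not be read as an authoritative crosswalk; `crosswalk` is a hypothetical helper.

```python
# Illustrative DC element -> (MARC tag, subfield) mapping; an assumption
# for this sketch, not an authoritative crosswalk.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
    "language": ("546", "a"),
    "rights": ("540", "a"),
}

def crosswalk(pairs):
    """Convert (DC element, value) pairs to (tag, subfield, value) triples.

    Returns the mapped fields in tag order plus any elements with no mapping,
    so that data loss in the conversion is visible rather than silent.
    """
    fields, unmapped = [], []
    for name, value in pairs:
        target = DC_TO_MARC.get(name.lower())
        if target is None:
            unmapped.append((name, value))
        else:
            tag, code = target
            fields.append((tag, code, value))
    return sorted(fields, key=lambda f: f[0]), unmapped

fields, unmapped = crosswalk([
    ("Publisher", "Collins"),
    ("Title", "Weaving the Web"),
    ("Coverage", "1989-1999"),
])
print(fields)
print(unmapped)
```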
Evaluation
• Metadata evaluation methods
• Greenberg Review (2002)
  – Toezer (1999)
    • Accuracy, completeness, consistency, timeliness, and intelligibility
  – Rothenberg (1996)
    • Correctness, appropriateness
  – Zeng (1993)
    • Specificity, exhaustivity, record completeness
• Completeness, specificity, exhaustivity
  – Did the record capture essential elements of the object?
  – Does the encoded record differentiate appropriately between elements?
• Document/Index surrogation, retrieval
  – Is this a surrogate/abstraction and not a codification of the resource?
  – Is the level of surrogation/abstraction appropriate for storage/retrieval/use goals?
Evaluating Representation
• Accuracy, consistency
  – Are the details of abstraction correct?
  – Is the content represented/encoded accurately?
• Utility, effectiveness, timeliness
  – Is the representation appropriate for a given audience and use?
  – Does the representation solve an information need?
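Of the criteria above, completeness is the easiest to check automatically. A minimal sketch, assuming a hypothetical application profile that requires four elements; the `REQUIRED` list and function names are invented for illustration, and criteria such as accuracy or utility still need human judgment.

```python
# Hypothetical profile: elements a record must contain to count as complete.
REQUIRED = ("title", "creator", "date", "identifier")

def missing_elements(record, required=REQUIRED):
    """Required elements that are absent or empty in the record."""
    return [name for name in required if not (record.get(name) or "").strip()]

def completeness(record, required=REQUIRED):
    """Fraction of required elements that carry a non-empty value."""
    return 1 - len(missing_elements(record, required)) / len(required)

record = {"title": "Weaving the Web", "creator": "Tim Berners-Lee", "date": ""}
print(completeness(record))
print(missing_elements(record))
```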
Evaluating Representation
Worktime
• Complete pages 7-10
  – Metadata tools and evaluation
Next Week
• Online
  – Read, complete worksheet, discuss
• Encoding systems
  – XML overview
  – More on MARC encoding
• Assignment 1 questions
http://bit.ly/lbsc670_questions