LBSC 670 Information Organization

Upload: milton-greene

Post on 11-Jan-2016

Page 1: LBSC 670 Information Organization. Thoughts from last class “I feel like we are getting behind” “Why are we learning HTML/CSS?” “What is cloud computing?”

LBSC 670

Information Organization

Thoughts from last class

• “I feel like we are getting behind”

• “Why are we learning HTML/CSS?”

• “What is cloud computing?”

• “Can we have printouts again?”

LITA National Forum

http://www.ala.org/

LITA - Free your metadata

• http://freeyourmetadata.org/ (Amalia)

• http://code.google.com/p/google-refine/

LITA – Digital Institutes

Data Services Librarian

Advocate for data publishing, research, curation, collaboration

Class Plan

• Explore historical foundation of cataloging

• Identify metadata standards central to cataloging

• Explore metadata schemas useful for libraries

Review

• HTML implements a metadata schema (e.g., h1, h2, DOM...) and an encoding system (e.g., XHTML) in concert with supporting technologies (e.g., CSS, JavaScript)

• Digital documents have embedded structure that programs use to encode and decode information for use

Storage

• Relational databases – tables, SQL, indexes; abstracted but semi-fixed structure

• Object databases – storage of objects that are directly accessible via programs

• Flat text files – embedded structure, tight association with the application, quick, simple

• XML files – abstracted structure, portable, extensible, slower?

• Embedded in digital objects – portable, associative
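To make these trade-offs concrete, here is a minimal sketch that stores one hypothetical record three ways using only the Python standard library: an in-memory SQLite table (relational), a tab-separated line (flat text), and an XML element (abstracted, self-describing). The record values and the column/element names are illustrative assumptions, not a prescribed schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A hypothetical metadata record used only for illustration.
record = {"title": "Weaving the Web", "creator": "Berners-Lee, Tim", "date": "1999"}

# 1. Relational database: the structure (table and columns) is declared up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (title TEXT, creator TEXT, date TEXT)")
conn.execute("INSERT INTO records VALUES (:title, :creator, :date)", record)
row = conn.execute("SELECT title, creator, date FROM records").fetchone()

# 2. Flat text file: the structure lives in a convention (tab-separated,
#    fixed field order) that the reading program must already know.
flat_line = "\t".join([record["title"], record["creator"], record["date"]])

# 3. XML: element names travel with the data, so the structure is self-describing.
root = ET.Element("record")
for name, value in record.items():
    ET.SubElement(root, name).text = value
xml_text = ET.tostring(root, encoding="unicode")

print(row)
print(flat_line)
print(xml_text)
```

The flat line is only interpretable if the reader knows the column order in advance; the XML carries its element names along, which is part of what makes it more portable (and more verbose).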

Representation of objects

[Diagram: Object, Representation, Record, Encoding model]

Metadata definitions

• Common
  – Data about data
  – Data that describes a resource
  – Information about information

• Gilliland-Swetland, Baca
  – "The sum total of what one can say about any information object at any level of aggregation."
  – Content, context, structure

• Greenberg
  – Structured data about an information object that facilitates functions associated with the designated object

Metadata Life-cycle

Gilliland, 2007

• Codification

• Storage

• Use/Reuse

• Scalability

[Figure: Gilliland, 2007]

Standards Types

• Data structure standards – standards that govern the scope and purpose of a metadata record (MARC, Dublin Core, Text Encoding Initiative (TEI))

• Data communication standards – encoding (e.g., HTML/XHTML, XML)

• Data syntax standards – element ordering, content syntax, and encoding syntax (e.g., date/time syntax)
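Date/time syntax is a typical place where syntax standards matter. A minimal sketch, assuming the source dates are written month/day/year (an assumption: the string alone cannot tell you), that rewrites them in ISO 8601 form (YYYY-MM-DD) with Python's standard datetime module. The function name is hypothetical.

```python
from datetime import datetime


def to_iso8601(date_string: str, source_format: str = "%m/%d/%Y") -> str:
    """Reformat a date string as ISO 8601 (YYYY-MM-DD).

    The default source format ASSUMES US-style month/day/year. Under that
    reading "01/07/1998" is 7 January 1998; under day/month/year it is 1 July 1998.
    """
    return datetime.strptime(date_string, source_format).date().isoformat()


print(to_iso8601("01/07/1998"))              # 1998-01-07
print(to_iso8601("01/07/1998", "%d/%m/%Y"))  # 1998-07-01
```

The same input yields two different dates depending on the assumed convention, which is exactly the ambiguity an agreed syntax standard removes.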

Cataloging purposes

• “A list of books, maps, and other items arranged in some definite order” (Cutter)
  – Discovery
    • Catalogs, indexes, databases
  – Management
    • Technical and administrative metadata
  – Access
    • User interfaces (OPACs)

Dewey’s rules for cataloging

• Based on:
  – Panizzi’s 91 rules for cataloging
    • British Museum
  – Charles Coffin Jewett
    • Smithsonian librarian, 1852
    • Standard cataloging, printed entries on cards

• Dewey
  – Alphabetic ordering by subject
  – System of all knowledge
  – Classification – browse http://dewey.info/

A quick history of classification

• 245 BCE – Callimachus creates the Pinakes – 120-volume catalog for 400 scrolls
  – Title, author, teachers, biography
  – Six genres (rhetoric, law, epic, tragedy, lyric poetry, history, medicine, mathematics, natural science, misc.)

• 48 BCE – Alexandria burns
• ... Then for a long time nothing happened ...
• 1876 – Dewey Decimal System
• 1882 – Charles Cutter – Cutter classification
• 1897 – Herbert Putnam – LC Classification

Images from Wikipedia

Book Metadata (circa 1960)

Book Metadata (circa 1980)

• 100 2_ |a Berners-Lee, Tim.
• 245 10 |a Weaving the Web : |b the original design and ultimate destiny of the World Wide Web by its inventor / |c Tim Berners-Lee with Mark Fischetti.
• 250 __ |a 1st ed.
• 260 __ |a San Francisco : |b Harper SanFrancisco, |c c1999.
• 300 __ |a xi, 226 p. ; |c 25 cm.
• 500 __ |a Includes index.
• 650 _0 |a World Wide Web |x History.
• 600 20 |a Berners-Lee, Tim.
• 700 1_ |a Fischetti, Mark.
• 856 42 |3 Publisher description |u http://www.loc.gov/catdir/description/hc044/99027665.html

http://www.oclc.org/bibformats/en/default.shtm
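The display above writes each field as a tag, two indicators, and a run of |-prefixed subfields. As a sketch of how a program reads that structure, the following parses this human-readable display form (not the binary ISO 2709 exchange format, for which a library such as pymarc is the usual choice). The class and function names are hypothetical, and the parser assumes values never contain a literal "|".

```python
from dataclasses import dataclass, field


@dataclass
class MarcField:
    tag: str         # three-character field tag, e.g. "245"
    indicators: str  # two indicator positions; "_" marks a blank
    subfields: list[tuple[str, str]] = field(default_factory=list)  # (code, value) in order


def parse_display_line(line: str) -> MarcField:
    """Parse one line of the display form "TAG IND |a value |b value ..."."""
    tag, indicators, rest = line.split(" ", 2)
    subfields = []
    # Each "|" starts a new subfield; the first character after it is the code.
    for chunk in rest.split("|")[1:]:
        subfields.append((chunk[0], chunk[1:].strip()))
    return MarcField(tag, indicators, subfields)


record_lines = [
    "100 2_ |a Berners-Lee, Tim.",
    "245 10 |a Weaving the Web : |b the original design and ultimate destiny "
    "of the World Wide Web by its inventor / |c Tim Berners-Lee with Mark Fischetti.",
    "650 _0 |a World Wide Web |x History.",
]
fields = [parse_display_line(line) for line in record_lines]

title_field = next(f for f in fields if f.tag == "245")
print(title_field.indicators)    # 10
print(title_field.subfields[0])  # ('a', 'Weaving the Web :')
```

Note that the ISBD punctuation (" :", " /") stays attached to the subfield values, as it does in the record itself.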

Book Metadata (circa 2002)

Library uses of metadata

• Descriptive cataloging

• Inventory of holdings

• Technical and administrative metadata about acquisitions

• Interoperability with other systems

• Facilitating acquisition decisions

• Federated searching across other catalogs

Cataloging process

An example MARC record

1. Description
2. Access points
3. Headings
4. References

Anatomy of a bibliographic record

AACR2 processes

• Area 1: title, statement of responsibility

• Area 2: edition

• Area 3: material (or type of publication) specific details

• Area 4: publication, distribution

• Area 5: physical description

• Area 6: series

• Area 7: notes

• Area 8: standard number and terms of availability
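AACR2 decides what goes into each area; MARC decides where it is stored. As a rough illustration of how the two line up, here is a hedged Python sketch of an area-to-field lookup. The table reflects common MARC 21 bibliographic practice as an assumption and is deliberately incomplete; check the MARC 21 documentation before relying on any specific tag. The names are hypothetical.

```python
from typing import Optional

# APPROXIMATE mapping from AACR2 areas of description to the MARC 21
# bibliographic fields that usually carry them (simplified, not exhaustive).
AREA_TO_TAGS = {
    1: ["245"],                # title and statement of responsibility
    2: ["250"],                # edition
    3: ["254", "255", "256"],  # material (or type of publication) specific details
    4: ["260"],                # publication, distribution, etc.
    5: ["300"],                # physical description
    6: ["490"],                # series statement
    7: ["5XX"],                # notes: any tag in the 500s
    8: ["020", "022"],         # standard numbers (ISBN, ISSN)
}


def area_for_tag(tag: str) -> Optional[int]:
    """Return the AACR2 area a MARC tag most likely belongs to, or None."""
    for area, patterns in AREA_TO_TAGS.items():
        for pattern in patterns:
            # "5XX"-style patterns match any tag with the same leading digit.
            if pattern == tag or (pattern.endswith("XX") and pattern[0] == tag[0]):
                return area
    return None


for tag in ["245", "300", "500", "650"]:
    print(tag, area_for_tag(tag))
```

A tag such as 650 returns None because subject headings are access points, not part of the descriptive areas.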

How to enter a title into a MARC record

– AACR2
  • Transcribe the title exactly as to spelling, but not necessarily as to punctuation and capitalization.
  • If an alternative title is present, precede it by a comma following the regular title.
  • Give a General Material Designation in brackets [].

– MARC standard
  • Use the 245 field (title statement) for the main title.
  • Indicator 2 – number of non-filing characters (leading articles)
  • Subfield a – main title
  • Subfield b – remainder of title
  • Subfield h – General Material Designation in brackets []
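The second indicator can be computed mechanically from a leading article. A minimal Python sketch, assuming a tiny hypothetical list of English articles (real practice uses language-specific lists) and the |-subfield display form from the example record; the function names are hypothetical, and the first indicator is simply passed through.

```python
# HYPOTHETICAL, deliberately tiny list of English leading articles.
LEADING_ARTICLES = ("the ", "an ", "a ")


def nonfiling_characters(title: str) -> int:
    """Number of leading characters (article plus trailing space) to skip when filing."""
    lowered = title.lower()
    for article in LEADING_ARTICLES:
        if lowered.startswith(article):
            return len(article)
    return 0


def build_245(title: str, remainder: str = "", first_indicator: str = "1") -> str:
    """Assemble a 245 field in display form.

    The first indicator is passed through unchanged (the example record uses 1);
    the second indicator is the computed count of non-filing characters.
    """
    field = f"245 {first_indicator}{nonfiling_characters(title)} |a {title}"
    if remainder:
        field += f" : |b {remainder}"  # " :" is the ISBD punctuation before subfield b
    return field


print(build_245("The hobbit"))       # 245 14 |a The hobbit
print(build_245("Weaving the Web", "the original design"))
```

"The hobbit" files under "hobbit" because the count of 4 tells the system to skip "The ".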

Dublin Core Overview

• Created out of a 1995 meeting in Dublin, Ohio

• An intentionally simple standard focused on resource description

• DCMI conference (2007)
  – Enjoys widespread adoption in the library and digital library communities, particularly as a lowest-common-denominator standard

Initial Dublin Core

• Focused on digital document-like objects

• Simple, human-based description

• Focus on descriptive metadata over technical, preservation, use metadata

Dublin Core (1.0, 1995)

1. Subject: The topic addressed by the work
2. Title: The name of the object
3. Author: The person(s) primarily responsible for the intellectual content of the object
4. Publisher: The agent or agency responsible for making the object available
5. OtherAgent: The person(s), such as editors and transcribers, who have made other significant intellectual contributions to the work
6. Date: The date of publication
7. ObjectType: The genre of the object, such as novel, poem, or dictionary
8. Form: The physical manifestation of the object, such as PostScript file or Windows executable file
9. Identifier: String or number used to uniquely identify the object
10. Relation: Relationship to other objects
11. Source: Objects, either print or electronic, from which this object is derived, if applicable
12. Language: Language of the intellectual content
13. Coverage: The spatial locations and temporal durations characteristic of the object

Weibel, 1995

Dublin Core (1.1 - 1999)

• Title
• Author or Creator
• Subject and Keywords
• Description
• Publisher
• Other Contributor
• Date
• Resource Type
• Format
• Resource Identifier
• Source
• Language
• Relation
• Coverage
• Rights Management

Qualified Dublin Core (Current)

• 71 properties, 35 classes . . .(Registry)

• Expansion of scope/purpose

• Multiple encoding models (HTML/XHTML, XML, RDF)

• Addition of Application Profile concept

A possible record

• Title: New Web language promises smarter surfing
• Subject: World Wide Web
• Subject: Extensible Markup Language
• Subject: World Wide Web Consortium
• Subject: Standards, Web
• Creator: Heid, Jim
• Creator: Glenn McDonald
• Created: 01/07/1998
• Identifier: http://www.cnn.com/TECH/computing.......
• Publisher: Cable News Network
• Language: en
• Description: This article discusses the recent adoption of XML by the W3C as a standard and its possible uses in a web environment
• Format: text/html
• Rights: All Rights Reserved

Dublin Core Abstract Model

HTML Encoding of DC
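The HTML encoding declares the Dublin Core namespace once with a link element and then writes one meta element per value, named "DC." plus the element name. A minimal Python sketch that generates that markup for part of the "possible record" given earlier; the function name is hypothetical, and it covers only simple (unqualified) elements.

```python
from html import escape

DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"


def dc_to_html_head(record: list[tuple[str, str]]) -> str:
    """Render (element, value) pairs as <link>/<meta> tags for an HTML <head>.

    A list of pairs is used instead of a dict because DC elements repeat
    (the possible record has several Subject and Creator values).
    """
    lines = [f'<link rel="schema.DC" href="{DC_NAMESPACE}">']
    for element, value in record:
        # escape() protects &, <, > and quotes inside the attribute value.
        lines.append(f'<meta name="DC.{element}" content="{escape(value)}">')
    return "\n".join(lines)


record = [
    ("title", "New Web language promises smarter surfing"),
    ("subject", "World Wide Web"),
    ("subject", "Extensible Markup Language"),
    ("creator", "Heid, Jim"),
    ("publisher", "Cable News Network"),
    ("language", "en"),
    ("format", "text/html"),
]
print(dc_to_html_head(record))
```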

Example

• Title: Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web
• Author: Tim Berners-Lee
• Subject: World Wide Web
• Publisher: Collins
• Date: 2000
• Language: English
• ISBN-13: 978-0062515872
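One possible way to encode this example as Dublin Core XML, sketched with Python's standard ElementTree module. The choices are assumptions, not an answer key: the wrapper element name "metadata" is arbitrary, Author becomes dc:creator in inverted form, the language is recorded as a code, and the ISBN goes into dc:identifier.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)  # serialize the namespace with the familiar "dc" prefix


def dc_to_xml(record: list[tuple[str, str]], root_name: str = "metadata") -> str:
    """Encode (element, value) pairs as namespaced Dublin Core XML elements."""
    root = ET.Element(root_name)
    for element, value in record:
        # "{namespace}local" is ElementTree's notation for a namespaced tag.
        ET.SubElement(root, f"{{{DC}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


weaving_the_web = [
    ("title", "Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web"),
    ("creator", "Berners-Lee, Tim"),  # inverted form chosen here; the slide gives "Tim Berners-Lee"
    ("subject", "World Wide Web"),
    ("publisher", "Collins"),
    ("date", "2000"),
    ("language", "en"),               # a language code instead of the word "English"
    ("identifier", "ISBN 978-0062515872"),
]
print(dc_to_xml(weaving_the_web))
```

Compared with the HTML encoding, the XML form can be validated, transformed, and exchanged independently of any web page.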

Work time

• Complete pages 1-4 of the worksheet
  – What is Dublin Core
  – Creating a Dublin Core record

Issues in cataloging

• A focus on 'by-value' cataloging instead of 'by-reference' cataloging means that consistency is poor

• A focus on text identifiers (title, author) over unique IDs means record duplication is rampant

• A focus on traditional descriptive measures limits effectiveness in new discovery systems that do not respect complex metadata

• A focus on print resources has made cataloging internet resources difficult
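A toy contrast between the two approaches in the first bullet, in Python with invented names and identifiers: by-value records each repeat a name string, so variant forms creep in, while by-reference records share one authority entry, so a single correction reaches every record.

```python
# HYPOTHETICAL records: all names and IDs below are invented for illustration.

# By value: every record carries its own copy of the author's name.
records_by_value = [
    {"title": "First example title", "author": "Doe, Jane"},
    {"title": "Second example title", "author": "Doe, J."},
]

# By reference: records carry only an identifier pointing at one authority entry.
authorities = {"auth-001": "Doe, Jane"}
records_by_reference = [
    {"title": "First example title", "author_id": "auth-001"},
    {"title": "Second example title", "author_id": "auth-001"},
]


def author_forms_by_value(records):
    """Distinct author strings; more than one means the same person is split."""
    return {record["author"] for record in records}


def author_forms_by_reference(records, authority_file):
    """Resolve each identifier through the authority file at display time."""
    return {authority_file[record["author_id"]] for record in records}


print(author_forms_by_value(records_by_value))  # two variant forms

# One edit to the authority entry updates every record that points to it.
authorities["auth-001"] = "Doe, Jane, 1970-"
print(author_forms_by_reference(records_by_reference, authorities))
```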

New concepts in cataloging

• RDA: Resource Description and Access

• FRBR: Functional Requirements for Bibliographic Records

• FRAD: Functional Requirements for Authority Data

• FRSAD: Functional Requirements for Subject Authority Data

Resource Description and Access

• RDA is the successor to AACR2

• RDA uses a new data model (FRBR)

• RDA includes new MARC fields
  – http://www.loc.gov/marc/formatchanges-RDA.html

• RDA is not yet implemented

Addresses user tasks

FRBR:

• Find

• Identify

• Select

• Obtain

FRAD:

• Find

• Identify

• Contextualize

• Justify

• ICP’s highest principle = “convenience of the user”

Slide from http://www.loc.gov/aba/rda/training_modules.html

FRBR’s Entity-Relationship Model

• Entities
• Relationships
• Attributes (data elements)

• National level required elements

[Diagram: one entity linked to another entity by a relationship]

Slide from http://www.loc.gov/aba/rda/training_modules.html

FRBR’s Entity-Relationship Model

[Diagram: Shakespeare (Person) created Hamlet (Work); Hamlet (Work) was created by Shakespeare (Person)]

Slide from http://www.loc.gov/aba/rda/training_modules.html

Terminology

• FRBR and FRAD “attributes” are “elements” in RDA = identifying characteristics

• FRBR and FRAD Group 1 entities:
  – Work
  – Expression
  – Manifestation
  – Item

Slide from http://www.loc.gov/aba/rda/training_modules.html

FRBR

• Functional Requirements for Bibliographic Records
  – Group 1: entities – work, expression, manifestation, item
  – Group 2: persons or corporate bodies responsible for a work (FRAD)
  – Group 3: subjects – concepts, events, places... (FRSAD)

FRBR Model

http://www.ifla.org/

http://fictionfinder.oclc.org/

http://worldcat.org

http://www.frbr.org

FRBR components

• Work – a distinct intellectual or artistic creation

• Expression – the intellectual or artistic realization of a work

• Manifestation – the physical embodiment of an expression of a work

• Item – a single exemplar of a manifestation

Adapted from Jane Greenberg

http://frbr.oclc.org/pages/Pages?sn=460059802&instname=
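The four Group 1 entities form a chain from abstract to concrete. A minimal Python sketch using dataclasses, simplified to a strict tree (FRBR itself allows, for example, one manifestation to embody several expressions). The class names follow the entity names; the Shakespeare/Hamlet pair comes from the RDA training slide, while the expression, edition, and copy labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    label: str  # a single exemplar, e.g. one library's copy


@dataclass
class Manifestation:
    label: str  # a physical embodiment, e.g. one edition or release
    items: list[Item] = field(default_factory=list)


@dataclass
class Expression:
    label: str  # a realization of the work, e.g. a text or a translation
    manifestations: list[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    creator: str
    expressions: list[Expression] = field(default_factory=list)

    def all_items(self) -> list[Item]:
        """Walk work -> expression -> manifestation -> item."""
        return [
            item
            for expression in self.expressions
            for manifestation in expression.manifestations
            for item in manifestation.items
        ]


# HYPOTHETICAL expressions, editions, and copies attached to a real work/creator pair.
hamlet = Work(
    title="Hamlet",
    creator="Shakespeare",
    expressions=[
        Expression("English text", [
            Manifestation("Hypothetical edition A", [Item("Copy 1"), Item("Copy 2")]),
            Manifestation("Hypothetical edition B", [Item("Copy 1")]),
        ]),
        Expression("Hypothetical translation", [
            Manifestation("Hypothetical edition C", [Item("Copy 1")]),
        ]),
    ],
)
print(len(hamlet.all_items()))  # 4
```

Grouping at the Work level is what lets a catalog show one entry for "Hamlet" while still reaching every copy beneath it.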

FRBR Example

• Rolling Stones’ IT'S ONLY ROCK 'N ROLL (1974) (Work)
  – Group’s performance recorded for the album (Expression)
    • Recording released in 1974 by MCA Records on tape cassette (Manifestation)
    • Recording released in 1974 by MCA Records on compact disc (Manifestation)
    • Sheet music released in 1992 (?)

Adapted from Jane Greenberg

FRBR diagram

Work, the Performance (1974)

E: Music and lyrics

E: Music (just the instruments)

M: CD, RCA, 2005

M: RS, LP 1974

M: 8-track, RCA, 1975

I: My CD, RCA, 2005 c.2

I: Your CD, RCA, 2005 c.1

I: UNC Musllib.CD, RCA, 2005 c.3

Adapted from Jane Greenberg

FRBR Algorithm (1)

• Process
  – Extract author
    • Construct an authority author entry from 100 and 400, using subfields and 008 data to limit
  – Extract title
    • Construct an authority title entry from 130, 240, 245, etc.
    • Normalize using NACO
  – Combine these two authorities to create a unique Work identifier

• <author>Mitchell, Margaret</author><title>Gone with the wind</title>
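The author/title keying step can be sketched in a few lines of Python. This is not OCLC's implementation: normalize() is a simplified stand-in for NACO normalization (which has more detailed rules), there is no authority lookup (the 100/400 and 130/240 steps are skipped), and the records are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict

# Map every ASCII punctuation character to a space.
_PUNCTUATION_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def normalize(text: str) -> str:
    """SIMPLIFIED stand-in for NACO normalization: strip diacritics,
    turn punctuation into spaces, uppercase, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.translate(_PUNCTUATION_TO_SPACE).upper().split())


def work_key(author: str, title: str) -> str:
    """Combine normalized author and title into one author/title key."""
    return f"{normalize(author)}/{normalize(title)}"


def group_into_worksets(records: list[dict]) -> dict[str, list[str]]:
    """Group record IDs that share the same author/title key."""
    worksets = defaultdict(list)
    for record in records:
        worksets[work_key(record["author"], record["title"])].append(record["id"])
    return dict(worksets)


# HYPOTHETICAL records with small variations in punctuation and capitalization.
records = [
    {"id": "r1", "author": "Mitchell, Margaret", "title": "Gone with the wind"},
    {"id": "r2", "author": "Mitchell, Margaret.", "title": "Gone with the wind."},
    {"id": "r3", "author": "MITCHELL, MARGARET", "title": "Gone With The Wind"},
]
print(group_into_worksets(records))
# {'MITCHELL MARGARET/GONE WITH THE WIND': ['r1', 'r2', 'r3']}
```

Normalization is what lets three slightly different transcriptions collapse into a single work set.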

FRBR Algorithm (2)

• Results from a sample extraction (from the FRBR documentation)
  – <author>/<title> (75.97%)
  – <uniform title> (1.34%)
  – /<title>/[one or more <name>] (17.35%)
  – /<title>/<control number> (5.34%)

• http://www.oclc.org/research/software/frbr/frbr_workset_algorithm.pdf

Worktime

• Complete pages 5 & 6
  – Mapping DC to MARC

Metadata tools

Tool type – uses

• Conversion / Crosswalk – migrate data from one form to another
• Creation – automatic or semi-automatic creation of metadata
• Extraction / Harvesting – pull metadata from digital objects or systems for use/re-use
• Evaluation – validate the schema or encoding of metadata records
• Searching – facilitate discovery and use of metadata
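A crosswalk is at heart a lookup table plus a policy for what does not map. A minimal Python sketch of a partial Dublin Core to MARC crosswalk, loosely based on the Library of Congress mapping of unqualified DC to MARC 21 (an assumption; verify the tags against the official crosswalk before use). The table covers only six elements, the names are hypothetical, and output uses the display form of the example MARC record.

```python
# PARTIAL DC -> MARC 21 crosswalk: element -> (tag, indicators, subfield code).
DC_TO_MARC = {
    "title": ("245", "00", "a"),
    "creator": ("720", "__", "a"),
    "subject": ("653", "__", "a"),
    "description": ("520", "__", "a"),
    "language": ("546", "__", "a"),
    "rights": ("540", "__", "a"),
}


def crosswalk(dc_record: list[tuple[str, str]]) -> tuple[list[str], list[str]]:
    """Return MARC display lines plus the DC elements this partial table cannot map."""
    marc_lines, unmapped = [], []
    for element, value in dc_record:
        if element in DC_TO_MARC:
            tag, indicators, code = DC_TO_MARC[element]
            marc_lines.append(f"{tag} {indicators} |{code} {value}")
        else:
            unmapped.append(element)
    return marc_lines, unmapped


dc_record = [
    ("title", "Weaving the Web"),
    ("creator", "Berners-Lee, Tim"),
    ("subject", "World Wide Web"),
    ("publisher", "Collins"),
    ("language", "English"),
]
lines, left_over = crosswalk(dc_record)
print("\n".join(lines))
print("Not mapped:", left_over)  # Not mapped: ['publisher']
```

Reporting unmapped elements instead of silently dropping them makes the information loss of a crosswalk visible, which matters when the result is later evaluated.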

Evaluation

• Metadata evaluation methods

• Greenberg review (2002)
  – Toezer (1999)
    • Accuracy, completeness, consistency, timeliness, and intelligibility
  – Rothenberg (1996)
    • Correctness, appropriateness
  – Zeng (1993)
    • Specificity, exhaustivity, record completeness

Evaluating Representation

• Completeness, specificity, exhaustivity
  – Did the record capture essential elements of the object?
  – Does the encoded record differentiate appropriately between elements?

• Document/index surrogation, retrieval
  – Is this a surrogate/abstraction and not a codification of the resource?
  – Is the level of surrogation/abstraction appropriate for storage/retrieval/use goals?
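Completeness is the easiest of these criteria to quantify. One simple way to operationalize it (an assumption for illustration, not a method taken from the reviews cited above) is the share of expected elements that are filled in. A Python sketch using the 15 Dublin Core 1.1 elements as the checklist; the function name is hypothetical.

```python
# The 15 elements of simple Dublin Core (version 1.1), used here as the checklist.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def completeness(filled: set[str], expected: list[str]) -> tuple[float, list[str]]:
    """Share of expected elements that have a value, plus the missing ones in checklist order."""
    missing = [element for element in expected if element not in filled]
    score = (len(expected) - len(missing)) / len(expected)
    return score, missing


# Elements filled in the "possible record" from the Dublin Core slides
# (Created is counted as Date).
filled = {"title", "subject", "creator", "date", "identifier",
          "publisher", "language", "description", "format", "rights"}
score, missing = completeness(filled, DC_ELEMENTS)
print(f"{score:.2f}", missing)
# 0.67 ['contributor', 'type', 'source', 'relation', 'coverage']
```

A score like this says nothing about accuracy or appropriateness; a record can be complete and still wrong, which is why it is only one of the criteria listed.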

Evaluating Representation

• Accuracy, consistency
  – Are the details of abstraction correct?
  – Is the content represented/encoded accurately?

• Utility, effectiveness, timeliness
  – Is the representation appropriate for a given audience and use?
  – Does the representation solve an information need?

Worktime

• Complete pages 7-10 – Metadata tools and evaluation

Next Week

• Online
  – Read, complete worksheet, discuss

• Encoding systems
  – XML overview
  – More on MARC encoding

• Assignment 1 questions

http://bit.ly/lbsc670_questions