lecture 07: controlled vocabularies

2003.09.16 - SLIDE 1 - IS 202 - FALL 2003

Lecture 07: Controlled Vocabularies

Prof. Ray Larson & Prof. Marc Davis
UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm
Fall 2003

SIMS 202: Information Organization and Retrieval

Some slides in this lecture were developed by Prof. Marti Hearst

Lecture Contents

• Phone Project
• Review
  – Metadata Systems
  – Dublin Core
• Controlled Vocabularies
• Name Authority Files
• Other Types of Controlled Vocabularies
• Faceted vs. Hierarchic Organization of Vocabularies
• Discussion Questions

Assignments

• Assignment 2: Due
• Assignment 3: Photo Capture and Annotation
  – Assigned Sept 18
  – Due Sept 23

Phone Project Consent Forms

• Collection of Data for the Phone Project
• Informed Consent and Release Form
• Informed Consent to Release Academic Information
  – You must sign these forms to receive a phone and participate in the Phone Project
  – Signing these consent forms is not a condition of your participation in this course, nor will it be used as a basis for grading your performance therein

Collection of Data for the Phone Project

• Call logging
  – All phone calls made from the phones provided to you will be logged. The phone conversations themselves are not going to be recorded, but a record will be made of which numbers were called, when, and for how long.
• Approximate location logging
  – Your approximate location may be logged whenever the phone is used either for phone calls or to take, upload, annotate, or retrieve photos.
• Data correlation
  – The information from call logging and approximate location logging may be correlated with various other sources of information (e.g., raw location data may be correlated with map data to try to determine in which buildings the phone was used).
• Sublicensing of data collected
  – Garage Cinema Research may sublicense portions of the collected data to other parties. This may include images of you or provided by you, as well as metadata about you or provided by you.
• Privacy protections
  – Garage Cinema Research will not release your name, email address, or the complete phone numbers of the parties you called, except for their area codes and except for calls made between two Phone Project phones.

Informed Consent and Release Form

• License to content
  – License to the content contributed by you to the system, including but not limited to images, annotations, and annotation frameworks, as well as any data that will be collected in accordance with the privacy protecting measures.
• Identifying information and pseudonyms
  – Use of your name and email address by the system, understanding that they are not going to be released to third parties. Your name will be replaced with a pseudonym if the data is released to third parties.
• Personal data collection
  – Applications built in the system will benefit from the use of personal information; however, you are not required to provide the system with any personal information about yourself or other people beyond the data that is being collected automatically.
• Right of inspection/correction/deletion of photos
  – You have the right to inspect photos of you or information about you submitted by you and/or other users of the system and to have them corrected or removed.

Consent to Release Academic Information

• Agreement to post work on IS202 web site
  – You agree to have your Phone Project course work posted, including your name, on the IS202 web site, which is accessible to the general public.
• Understanding of course enrollment and authorship disclosure
  – You understand that this will publicly reveal that you are a student at the University of California at Berkeley, that you are taking this course, and that you are an author of this work.
• Indefinite time period of posting
  – You understand that your name may be posted on this web site indefinitely, starting in September 2003.
• Optional email address posting
  – The posting of student email addresses on the IS202 web site Phone Project group pages, while kindly requested, is not required.

Metadata

• Structures and languages for the description of information resources and their elements (components or features)

• “Metadata is information on the organization of the data, the various data domains, and the relationship between them” (Baeza-Yates p. 142)

Metadata

• Often two main types of metadata are distinguished:
  – Descriptive metadata
    • Describes the information/data object and its properties
    • May use a variety of descriptive formats and rules
  – Topical metadata
    • Describes the topic or “aboutness” of an information/data object
    • May include a variety of vocabularies for describing subjects, topics, categories, etc.

Metadata Systems and Standards

• Naming and ID systems – URLs, ISBNs
• Bibliographic description – MARC, Dublin Core, TEI, etc.
• Music – SMDL
• Images and objects – CIMI, VRA core categories
• Numeric data – DDI, SDSM
• Geospatial data – FGDC
• Collections – EAD

Dublin Core

• Simple metadata for describing internet resources

• For “Document-Like Objects”

• 15 Elements (in base DC)

Dublin Core Elements

• Title

• Creator

• Subject

• Description

• Publisher

• Other Contributors

• Date

• Resource Type

• Format

• Resource Identifier

• Source

• Language

• Relation

• Coverage

• Rights Management
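
As a rough illustration, the fifteen elements can be treated as the keys of a simple record. The sketch below is not an official Dublin Core encoding: the snake_case key names, the `make_record` helper, and the sample values are assumptions made for illustration. Dublin Core elements are all optional and repeatable, so each key that is present maps to a list.

```python
# Element names follow the slide; the snake_case spelling is an assumption.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "other_contributors", "date", "resource_type", "format",
    "resource_identifier", "source", "language", "relation",
    "coverage", "rights_management",
)


def make_record(**fields):
    """Build a record, rejecting any key that is not one of the 15 elements."""
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not a Dublin Core element: {sorted(unknown)}")
    # Elements are optional and repeatable, so store each value as a list.
    return {
        name: list(value) if isinstance(value, (list, tuple)) else [value]
        for name, value in fields.items()
    }


# Hypothetical description of this lecture (values chosen for illustration).
record = make_record(
    title="Lecture 07: Controlled Vocabularies",
    creator=["Larson, Ray", "Davis, Marc"],
    date="2003-09-16",
    language="en",
)
```

Note that the `creator` values are themselves a small vocabulary-control decision: the record only collocates with other works by the same people if everyone writes the names in the same form.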

Controlled Vocabularies

• Vocabulary control is the attempt to provide a standardized and consistent set of terms (such as subject headings, names, classifications, etc.) with the intent of aiding the searcher in finding information

• That is, it is an attempt to provide a consistent set of descriptions for use in (or as) metadata

Controlled Vocabularies

• Names and name authorities

• Gazetteers (geographic names)

• Code lists (e.g., LC language codes)

• Subject heading lists

• Classification schemes

• Thesauri

Control of Names

• Cutter’s (1876) objectives of bibliographic description
  – To enable a person to find a document of which
    • The author, or
    • The title, or
    • The subject is known
  – To show what a library has
    • By a given author
    • On a given subject (and related subjects)
    • In a given kind (or form) of literature
• First serves access
• Second serves collocation

Problems with Names

• How many names should be associated with a document?

• Which of these should be the “main entry?”

• What form should each of the names take?

• What references should be made from other possible forms of names that haven’t been used?

The Problem

• Proliferation of the forms of names
  – Different names for the same person
  – Different people with the same names
• Examples
  – From Books in Print (semi-controlled but not consistent)
  – ERIC author index (not controlled)

Goethe

…etc…

John Muir

Pauline Cochrane nee Atherton

Rules for Description

• AACR II and other sets of descriptive cataloging rules provide guidelines for:
  – Determining the number of name entries
  – Choosing a main entry
  – Deciding on the form of name to be used
  – Deciding when to make references

Authority Control

• Authority control is concerned with creation and maintenance of a set of terms that have been chosen as the standard representatives (also known as established) based on some set of rules
• If you have rules, why do you need to keep track of all of the headings? Can’t you just infer the headings from the rules?

Conditions of Authorship?

• Single person or single corporate entity

• Unknown or anonymous authors
  – Fictitiously ascribed works

• Shared responsibility

• Collections or editorially assembled works

• Works of mixed responsibility (e.g., translations)

• Related works

Added Entries

• Personal names
  – Collaborators
  – Editors, compilers, writers
  – Translators (in some cases)
  – Illustrators (in some cases)
  – Other persons associated with the work (such as the honoree in a festschrift)
• Corporate names
  – Any prominently named corporate body that has involvement in the work beyond publication, distribution, etc.

Choice of Name

• AACR II says that the predominant form of the name used in a particular author’s writings should be chosen as the form of name

• References should be made from the other forms of the name

Form of the Name

• When names appear in multiple forms, one form needs to be chosen
• Criteria for choice are:
  – Fullness (e.g., full names vs. initials only)
  – Language of the name
  – Spelling (choose predominant form)
• Entry element:
  – John Smith or Smith, John?
  – Mao Zedong or Zedong, Mao? (Mao Tse Tung?)

Name Authority Files

ID:NAFL8057230 ST:p EL:n STH:a MS:c UIP:a TD:19910821174242
KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF:05-14-80
RFE:a CSC: SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d 08-21-91
Other Versions: earlier
040 DLC$cDLC$dDLC$dOCoLC
053 PR6005.R517
100 10 Creasey, John
400 10 Cooke, M. E.
400 10 Cooke, Margaret,$d1908-1973
400 10 Cooper, Henry St. John,$d1908-1973
400 00 Credo,$d1908-1973
400 10 Fecamps, Elise
400 10 Gill, Patrick,$d1908-1973
400 10 Hope, Brian,$d1908-1973
400 10 Hughes, Colin,$d1908-1973
400 10 Marsden, James
400 10 Matheson, Rodney
400 10 Ranger, Ken
400 20 St. John, Henry,$d1908-1973
400 10 Wilde, Jimmy
500 10 $wnnnc$aAshe, Gordon,$d1908-1973

Different names for the same person
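
One way to see what such a record buys the searcher: the 100 field holds the established heading, and each 400 field holds a variant form that should lead to it. The sketch below is a simplification, not a MARC parser. The `parse_fields` and `build_see_references` helpers are hypothetical, the input is a four-line excerpt of the Creasey record, and only the `$a` and `$d` subfields seen in that record are handled.

```python
import re

# A few lines excerpted from the Creasey record: tag, indicators, then data.
RECORD = """\
100 10 Creasey, John
400 10 Cooke, M. E.
400 10 Gill, Patrick,$d1908-1973
500 10 $wnnnc$aAshe, Gordon,$d1908-1973
"""


def parse_fields(text):
    """Yield (tag, name) pairs, dropping indicators and subfield codes."""
    for line in text.splitlines():
        tag, _indicators, data = line.split(" ", 2)
        # Take the $a subfield when it is marked, else the leading text.
        marked = re.search(r"\$a([^$]*)", data)
        name = marked.group(1) if marked else data.split("$", 1)[0]
        yield tag, name.rstrip(", ")


def build_see_references(text):
    """Map each variant form (4xx) to the established heading (1xx)."""
    fields = list(parse_fields(text))
    heading = next(name for tag, name in fields if tag.startswith("1"))
    return {name: heading for tag, name in fields if tag.startswith("4")}


see = build_see_references(RECORD)
```

A searcher who enters `Gill, Patrick` is thus led to `Creasey, John`. The 500 field is a see-also link to another established heading rather than a variant form, so it is left out of the map.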

Name Authority Files

ID:NAFO9114111 ST:p EL:n STH:a MS:n UIP:a TD:19910817053048
KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF:06-03-91
RFE:a CSC:c SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d 08-19-91
040 OCoLC$cOCoLC
100 10 Marric, J. J.,$d1908-1973
500 10 $wnnnc$aCreasey, John
663 Works by this author are entered under the name used in the item. For a listing of other names used by this author, search also under$bCreasey, John
670 OCLC 13441825: His Gideon's day, 1955$b(hdg.: Creasey, John; usage: J.J. Marric)
670 LC data base, 6/10/91$b(hdg.: Creasey, John; usage: J.J. Marric)
670 Pseuds. and nicknames dict., c1987$b(Creasey, John, 1908-1973; British author; pseud.: Marric, J. J.)

Name Authority Files

ID:NAFL8166762 ST:p EL:n STH:a MS:c UIP:a TD:19910604053124
KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF:08-20-81
RFE:a CSC: SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d 06-06-91
Other Versions: earlier
040 DLC$cDLC$dDLC$dOCoLC
100 10 Butler, William Vivian,$d1927-
400 10 Butler, W. V.$q(William Vivian),$d1927-
400 10 Marric, J. J.,$d1927-
670 His The durable desperadoes, 1973.
670 His The young detective's handbook, c1981:$bt.p. (W.V. Butler)
670 His Gideon's way, 1986:$bCIP t.p. (William Vivian Butler writing as J.J. Marric)

Different people writing with the same name

The Haunting of Lauran Paine

1. Paine, Lauran. ALSO KNOWN AS: Carrel, Mark. Thompson, Russ. Andrews, A. A. Benton, Will. Bradford, Will. Bradley, Concho. Brennan, Will. Carter, Nevada. Allen, Clay. Almonte, Rosa. Armour, John. Cassady, Claude. Glendenning, Donn. Kelley, Ray. Kilgore, John. Martin, Tom. Slaughter, Jim. Standish, Buck. …

Batchelor, Reg. Beck, Harry. Bedford, Kenneth. Bosworth, Frank. Bovee, Ruth. Cassidy, Claude. Custer, Clint. Dana, Amber. Dana, Richard. Davis, Audrey. Drexler, J. F. Duchesne, Antoinette. Fisher, Margot. Fleck, Betty. Frost, Joni. Gordon, Angela. Gorman, Beth. Hayden, Jay. Houston, Will. Howard, Troy. Ingersol, Jared. …

Kelly, Ray. Ketchum, Jack. Liggett, Hunter. Lucas, J. K. Lyon, Buck. Morgan, Arlene. Morgan, Valerie. O'Connor, Clint. St. George, Arthur. Sharp, Helen. Thorn, Barbara. Archer, Dennis. Clark, Badger.

Some Interesting Ones…

Structure of an IR System

[Figure: Information Storage and Retrieval System. Search line: interest profiles & queries; formulating query in terms of descriptors; storage of profiles; Store 1: profiles/search requests. Storage line: documents & data; indexing (descriptive and subject); storage of documents; Store 2: document representations. The two stores meet at comparison/matching, which yields potentially relevant documents. Rules of the game = rules for subject indexing + thesaurus (which consists of lead-in vocabulary and indexing language).]

Adapted from Soergel, p. 19

Uses of Controlled Vocabularies

• Library subject headings, classification, and authority files
• Commercial journal indexing services and databases
• Yahoo, and other web classification schemes
• Online and manual systems within organizations
  – SunSolve
  – MacArthur

Types of Indexing Languages

• Uncontrolled keyword indexing
• Indexing languages
  – Controlled, but not structured
• Thesauri
  – Controlled and structured
• Classification systems
  – Controlled, structured, and coded
• Faceted thesauri and classification systems

Indexing Languages

• An index is a systematic guide designed to indicate topics or features of documents in order to facilitate retrieval of documents or parts of documents
• An indexing language is the set of terms used in an index to represent topics or features of documents, and the rules for combining or using those terms

Indexing Languages

• Library of Congress Subject Headings

• Yellow pages topics

• Wilson indexes (“reader’s guide”)

Thesauri

• A thesaurus is a collection of selected vocabulary (preferred terms or descriptors) with links among
  – Synonymous
  – Equivalent
  – Broader
  – Narrower, and
  – Other related terms
• National and international standards for thesauri (more next time)
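
The link types listed on this slide can be sketched as a small data structure. Everything in the block below is an illustrative assumption: the `Term` and `Thesaurus` classes are hypothetical, the sample terms and their links are made up, and the UF/BT/NT/RT comments use the conventional thesaurus abbreviations (used for, broader term, narrower term, related term).

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term (descriptor) and its links to other terms."""
    label: str
    used_for: set = field(default_factory=set)   # UF: synonyms/equivalents
    broader: set = field(default_factory=set)    # BT
    narrower: set = field(default_factory=set)   # NT
    related: set = field(default_factory=set)    # RT


class Thesaurus:
    def __init__(self):
        self.terms = {}   # preferred label -> Term
        self.use = {}     # non-preferred label -> preferred label

    def add(self, label, used_for=()):
        self.terms[label] = Term(label, set(used_for))
        for variant in used_for:
            self.use[variant] = label

    def link_broader(self, narrow, broad):
        # BT and NT are reciprocal, so record both directions at once.
        self.terms[narrow].broader.add(broad)
        self.terms[broad].narrower.add(narrow)

    def link_related(self, a, b):
        # RT is symmetric.
        self.terms[a].related.add(b)
        self.terms[b].related.add(a)

    def preferred(self, label):
        """Resolve any entry term to its preferred term."""
        return self.use.get(label, label)


# Made-up sample terms and links.
t = Thesaurus()
t.add("Athletics", used_for=["Sports"])
t.add("Athletic Equipment")
t.add("Athletes")
t.link_broader("Athletic Equipment", "Athletics")
t.link_related("Athletes", "Athletics")
```

Keeping the reciprocal links inside `link_broader` and `link_related` means the structure cannot drift into a state where a term lists a broader term that does not list it back.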

Classification Systems

• A classification system is an indexing language often based on a broad ordering of topical areas
• Thesauri and classification systems both use this broad ordering and maintain a structure of broader, narrower, and related topics
• Classification schemes commonly use a coded notation for representing a topic and its place in relation to other terms

Classification Systems (Cont.)

• Examples:
  – The Library of Congress Classification System
  – The Dewey Decimal Classification System
  – The ACM Computing Reviews Categories
  – The American Mathematical Society Classification System

Using Controlled Vocabulary

• Start with the text of the document
• Attempt to “control” or regularize:
  – The concepts expressed within
    • mutually exclusive
    • exhaustive
  – The language used to express those concepts
    • limit the normal linguistic variations
    • regulate word order and structure of phrases
    • reduce the number of synonyms or near-synonyms
• Also, provide cross-references between concepts and their expression

Slide author: Marti Hearst (These slides follow Bates 88)

Classification Schemes

• Classify possible concepts.
• Goals:
  – Completely distinct conceptual categories (mutually exclusive)
  – Complete coverage of conceptual categories (exhaustive)

Slide author: Marti Hearst

Assigning Headings vs. Descriptors

• Subject headings
  – Assign one (or a few) complex heading(s) to the document
• Descriptors
  – Mix and match

How would we describe recipes using each technique?

Slide author: Marti Hearst

Subject Heading vs. Descriptors

• Wilsonline
  – Athletes
  – Athletes -- Health & hygiene
  – Athletes -- Nutrition
  – Athletes -- Physical Exams
  – …
  – Athletics
  – Athletics -- Administration
  – Athletics -- Equipment -- Catalogs
  – …
  – Sports -- Accidents and Injuries
  – Sports -- Accidents and Injuries -- Prevention
• ERIC
  – Athletes
  – Athletic Coaches
  – Athletic Equipment
  – Athletic Fields
  – Athletics
  – …
  – Sports Psychology
  – Sportsmanship

Slide author: Marti Hearst

Subject Headings vs. Descriptors

Subject headings:
• Describe the contents of an entire document
• Designed to be looked up in an alphabetical index
  – Look up document under its heading
• Few (1-5) headings per document
• AKA: Precoordination

Descriptors:
• Describe one concept within a document
• Designed to be used in Boolean searching
  – Combine to describe the desired document
• Many (5-25) descriptors per document
• AKA: Postcoordination

Slide author: Marti Hearst
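
The contrast between precoordination and postcoordination can be made concrete with a toy index. In the sketch below the document IDs and all assignments are invented. The two headings are borrowed from the Wilsonline list on the previous slide; the descriptors are partly from the ERIC list and partly made up. A precoordinated heading is looked up whole, while postcoordinated descriptors are combined at search time with a Boolean AND.

```python
from collections import defaultdict

# Precoordination: one complex heading per document, built at indexing time.
heading_index = {
    "Athletes -- Nutrition": ["doc1"],
    "Sports -- Accidents and Injuries -- Prevention": ["doc2"],
}

# Postcoordination: several simple descriptors per document.
descriptors = {
    "doc1": {"Athletes", "Nutrition"},
    "doc2": {"Athletics", "Injuries", "Prevention"},
    "doc3": {"Athletes", "Injuries"},
}

# Invert the descriptor assignments: descriptor -> set of documents.
postings = defaultdict(set)
for doc, terms in descriptors.items():
    for term in terms:
        postings[term].add(doc)


def lookup_heading(heading):
    """Alphabetical-index style lookup: the whole heading must match."""
    return heading_index.get(heading, [])


def boolean_and(*terms):
    """Combine descriptors at search time; returns sorted matching docs."""
    sets = [postings.get(term, set()) for term in terms]
    return sorted(set.intersection(*sets)) if sets else []
```

Here `lookup_heading("Athletes -- Injuries")` finds nothing, because no indexer built that heading in advance, whereas `boolean_and("Athletes", "Injuries")` finds `doc3` by combining descriptors after the fact.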

Hierarchical Classification

• Each category is successively broken down into smaller and smaller subdivisions
• No item occurs in more than one subdivision
• Each level divided out by a “character of division” (also known as a feature)
  – Example:
    • Distinguish “Literature” based on:
      – Language
      – Genre
      – Time Period

Slide author: Marti Hearst

Hierarchical Classification

[Figure: tree diagram. Literature is divided by language (English, French, Spanish); each language is divided by genre (Prose, Poetry, Drama); each genre is divided by century (16th, 17th, 18th, 19th). Further branches are elided with "...".]

Slide author: Marti Hearst

Labeled Categories for Hierarchical Classification

• LITERATURE
  – 100 English Literature
    • 110 English Prose
      – English Prose 16th Century
      – English Prose 17th Century
      – English Prose 18th Century
      – ...
    • 111 English Poetry
      – 121 English Poetry 16th Century
      – 122 English Poetry 17th Century
      – ...
    • 112 English Drama
      – 130 English Drama 16th Century
      – …
  – 200 French Literature

Slide author: Marti Hearst
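
A hierarchy like this one enumerates every class in advance. The sketch below builds the tree from the three characters of division named earlier (language, then genre, then period) and lists each leaf as a path. The nested-dict representation and the helper names are assumptions for illustration, and the slide's numeric codes are deliberately not reproduced.

```python
LANGUAGES = ["English", "French", "Spanish"]
GENRES = ["Prose", "Poetry", "Drama"]
PERIODS = ["16th Century", "17th Century", "18th Century", "19th Century"]


def build_tree(levels):
    """Nest one character of division inside the previous one."""
    if not levels:
        return {}
    first, rest = levels[0], levels[1:]
    return {value: build_tree(rest) for value in first}


def leaf_paths(tree, prefix=()):
    """Yield the path from the root to every leaf class."""
    for label, subtree in tree.items():
        path = prefix + (label,)
        if subtree:
            yield from leaf_paths(subtree, path)
        else:
            yield path


tree = build_tree([LANGUAGES, GENRES, PERIODS])
paths = list(leaf_paths(tree))
```

With 3 languages, 3 genres, and 4 periods the tree has 3 × 3 × 4 = 36 leaf classes, each of which needs its own place in the scheme, and every item is filed under exactly one of them.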

Faceted Classification

• Create a separate, free-standing list for each characteristic or division (feature)

• Combine features to create a classification

Slide author: Marti Hearst

Faceted Classification Along With Labeled Categories

• A Language
  – a English
  – b French
  – c Spanish
• B Genre
  – a Prose
  – b Poetry
  – c Drama
• C Period
  – a 16th Century
  – b 17th Century
  – c 18th Century
  – d 19th Century

• Aa English Literature

• AaBa English Prose

• AaBaCa English Prose 16th Century

• AbBbCd French Poetry 19th Century

• BcCd Drama 19th Century

Slide author: Marti Hearst
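
The faceted notation can be decoded mechanically: an uppercase letter names a facet and the lowercase letter after it names a value in that facet. The sketch below assumes exactly that two-character pattern. The `FACETS` table copies the three lists on the slide, while `decode` and `encode` are hypothetical helpers.

```python
import re

# Facet letter -> (facet name, {value letter -> value}), copied from the slide.
FACETS = {
    "A": ("Language", {"a": "English", "b": "French", "c": "Spanish"}),
    "B": ("Genre", {"a": "Prose", "b": "Poetry", "c": "Drama"}),
    "C": ("Period", {"a": "16th Century", "b": "17th Century",
                     "c": "18th Century", "d": "19th Century"}),
}


def decode(code):
    """Turn a code such as 'AbBbCd' into {'Language': 'French', ...}."""
    pairs = re.findall(r"([A-Z])([a-z])", code)
    if "".join(f + v for f, v in pairs) != code:
        raise ValueError(f"malformed code: {code!r}")
    return {FACETS[f][0]: FACETS[f][1][v] for f, v in pairs}


def encode(**values):
    """Inverse of decode: encode(genre='Poetry') gives 'Bb'. Facets may be omitted."""
    parts = []
    for letter, (name, table) in FACETS.items():
        wanted = values.get(name.lower())
        if wanted is not None:
            by_value = {v: k for k, v in table.items()}
            parts.append(letter + by_value[wanted])
    return "".join(parts)
```

Three short lists with 3 + 3 + 4 = 10 values cover every combination, and a facet can simply be left out: `encode(language="Spanish", period="17th Century")` gives `AcCb`.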

Questions

• How (and when) to use both types of classification structures?

• How to look through them?

• How to use them in searching?

Slide author: Marti Hearst

Sarah Ellinger on Svenonius

• Many of the studies Svenonius cites seem to grapple with the same issue: how, or from whose perspective, do we measure the success of a database search? Should a successful search return all information, or distinguish by relevance? Can we always accept the searcher's view of relevant material? If an issue is under debate, should our search technologies provide the user with information from all sides, or only the side with which the searcher agrees?

Sarah Ellinger on Svenonius

• In regards to discipline-specific search vocabularies, Svenonius asks, "Would it not make more sense to custom tailor a vocabulary-control tool to the vocabulary being tailored?" In a world where academic terms are prone to change, how do we avoid replicating obsoletisms like "Vietnamese conflict" in disciplinary vocabularies? Would such a vocabulary preserve outdated associations in the minds of searchers or lend credence to some theories over others? How can a controlled vocabulary reflect academic debate?

Matt Meiske on Bates

• Bates’ article was written over 17 years ago. Since then, online catalogues have changed (e.g., web-based Melvyl), but not to the extent that Bates proposes. Why not?

Matt Meiske on Bates

• In her proposal, Bates states that a good online catalogue design will provide some means of “orientation,” so that the user can “get a feel” for the system. Seventeen years later, the world is a far more computer-centric place. Are we becoming naturally oriented to systems of this sort? Is the issue of “orientation” / “docking” still relevant?

Paul Laskowski on Borgman

• Borgman wrote in 1996, but certain passages already seem outdated to me (e.g., "a customer trying to operate a mouse as a foot pedal," p. 499). The spread of GUI interfaces, in particular, may solve some of the interface problems Borgman identifies. Is there still a problem with "user education," or is it now time to focus on how catalogues react to user queries? Is the real problem for users one of "technical skills," or should users be trained specifically to formulate queries "strategically"? Is this a skill that can be taught?

Paul Laskowski on Borgman

• I opened up JSTOR (http://www.jstor.com) to try to compare Borgman's ideas to practice. JSTOR allows me to query the following fields: author, title, abstract, and full-text. Borgman does not seem to foresee the ability to search the full text. Does this ability make a subject query obsolete? In what scenario might I prefer to query a subject field? JSTOR allows me to constrain my search along multiple fields, using the operators AND, OR, and NEAR (10 words or 25 words). Sure enough, no information seems to be given on the order of operations. In what cases might this foil my search attempt?

Next Time

• Thesaurus Design and Construction
• Readings/Discussion
  – Chapter F: Flow of Work in the Construction of Indexing Languages and Thesauri (Soergel) - Simon
  – The House of Quality (Hauser and Clausing) - Sean
  – Designing the Organizational Framework (Sano) - Lisa
• Phone Project
  – Phones!
  – Phone demo
  – Assignment 3: Photo Capture and Annotation

Discussion Questions Leaders

• Soergel

• Hauser and Clausing

• Sano