Knowledge Organization: Library Tools and Taxonomies for the Web Jan Herd [email protected] Business Reference Services Science, Technology & Business Division The Library of Congress


Post on 18-Dec-2015


Knowledge Organization: Library Tools and Taxonomies for the Web

Jan Herd [email protected]

Business Reference Services

Science, Technology & Business Division

The Library of Congress

2

3

Web is too big to organize?

One billion pages

1.5 million pages added daily

Selection of sites by collection development specialists/reference librarians

4

Librarians work in corporate settings

Yahoo.com (directory)

Northern Light.com (search engine)

Amazon.com (e-book seller)

Microsoft.com

5

OCLC Library Corporation Cooperatively Catalogs:

45 Million Works

350,000 Web sites and growing

6

Traditional Library Tools on the Web

Medical Subject Headings 1996

Web Dewey 2000

Classification Web 2001 (LCSH & LCC)

7

Importance of controlled vocabulary as metadata

American Library Association

Subject Analysis Committee (SAC)

Subcommittee on Metadata and

Subject Analysis recommendations

http://www.ala.org/alcts/organization/ccs/metarept2.html

8

Controlled Vocabularies: Why We Need Them

Used “behind” search engines

Standard in online databases

New adherents (i.e., Web Content Managers utilizing Taxonomies)

They work!

9

Sherry Vellucci, Associate Professor, St. John’s Univ., during the Conference on Bibliographic Control for the New Millennium:

“authority control is not only wonderful, but critical. Controlled vocabulary mediating tools should cover Subjects, Genres, Gazetteers, Names and Titles, etc.”

10

Metathesauri/Subject Correlations

Unified Medical Language System (UMLS) maps over 60 medical and health care thesauri in one
http://www.nlm.nih.gov/pubs/factsheets/umlsmeta.html

Classification Web: The Library of Congress subject headings and LC classification correlations
http://classweb.loc.gov

11

12

13

14

15

16

17

18

19

20

21

22

Mapping: Standard information exchange systems

Dublin Core to MARC
http://lcweb.loc.gov/marc/dccross.html

MARC to Dublin Core
http://www.loc.gov/marc/marc2dc.html

XMLMARC Crosswalk
http://lcweb.loc.gov/marc/marcsgml.html (Must download files)

MARC to XML to MARC Converter
http://www.logos.com/marc/default.asp
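A crosswalk of this kind is, at its simplest, a lookup table from the elements of one scheme to the fields of another. The Python sketch below illustrates the idea only. The six pairings are a small subset given from memory of the unqualified Dublin Core to MARC crosswalk and should be treated as assumptions to check against the crosswalk documents linked above; the function name `dc_record_to_marc` and the sample record are hypothetical.

```python
# Illustrative subset of an unqualified Dublin Core -> MARC 21 crosswalk.
# ASSUMPTION: these pairings are recalled for the sketch, not copied from
# the crosswalk pages; verify them before relying on them.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
}


def dc_record_to_marc(dc_record):
    """Return (tag, subfield, value) triples plus the names of unmapped elements."""
    fields = []
    unmapped = []
    for element, values in dc_record.items():
        target = DC_TO_MARC.get(element.lower())
        if target is None:
            # No rule for this element in the table: report it, do not guess.
            unmapped.append(element)
            continue
        tag, code = target
        for value in values:
            fields.append((tag, code, value))
    return fields, unmapped


# Hypothetical sample record, for illustration only.
record = {
    "Title": ["Knowledge Organization: Library Tools and Taxonomies for the Web"],
    "Creator": ["Herd, Jan"],
    "Subject": ["Taxonomies", "Controlled vocabularies"],
    "Coverage": ["Web"],
}

fields, unmapped = dc_record_to_marc(record)
for tag, code, value in fields:
    print(f"{tag} ${code} {value}")
print("unmapped:", unmapped)
```

Reporting unmapped elements instead of dropping them silently makes gaps in the table visible, which matters because a crosswalk between two schemes is rarely complete in both directions.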

23

Mapping: Specialized information exchange systems

Standard Industrial Classification (SIC codes)

to

North American Industry Classification System (NAICS codes)

24

25

SIC Code Example

Major group 73 = Business services

737 = Computer programming, data processing, and other computer related services

7372 = Prepackaged software

Equivalent NAICS codes are:

Sector 51 = Information

511 = Publishing industries

5112 = Software publishers (with cross ref. to Sector 42 for reselling packaged software)
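The example above can be sketched as a small concordance table in Python. The codes and titles come from the slide; the level-by-level pairing follows the slide's presentation and is not an official concordance, which is generally many-to-many. The fallback to a shorter code prefix and the function name `sic_to_naics` are assumptions made for illustration.

```python
# Codes and titles taken from the SIC/NAICS example above.
SIC_TITLES = {
    "73": "Business services",
    "737": "Computer programming, data processing, "
           "and other computer related services",
    "7372": "Prepackaged software",
}
NAICS_TITLES = {
    "51": "Information",
    "511": "Publishing industries",
    "5112": "Software publishers",
}

# ASSUMPTION: one-to-one pairing at each level, as presented on the slide.
SIC_TO_NAICS = {"73": "51", "737": "511", "7372": "5112"}


def sic_to_naics(sic_code):
    """Map a SIC code to a NAICS code.

    If the exact code is not in the table, fall back to the nearest
    mapped ancestor by dropping trailing digits (hypothetical rule).
    Returns None when no prefix of the code is mapped.
    """
    code = sic_code
    while code:
        if code in SIC_TO_NAICS:
            return SIC_TO_NAICS[code]
        code = code[:-1]
    return None


for sic in ("7372", "7371", "80"):
    naics = sic_to_naics(sic)
    title = NAICS_TITLES.get(naics, "no mapping in this table")
    print(f"SIC {sic} -> NAICS {naics}: {title}")
```

Because both schemes encode hierarchy in the digits of the code, trimming digits walks up the tree, so a code missing from the table still resolves to the closest broader category that is mapped.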

26

Using old and new tools for knowledge organization on the Web

Water into Wine

27

What is a Taxonomy?

A high-level information search device constructed to provide a means of understanding, navigating, and gaining access to intellectual capital.

28

History of Taxonomies

Aristotle, 384 - 322 B.C.

Kallimachos, 305 - 240 B.C.

Library of Alexandria

Carl Linnaeus, 1707 - 1778

29

“Classification” is used much more frequently than “Taxonomy”, in all fields of study.

30

Numerous formal taxonomies are maintained by

government and commercial enterprises

31

Taxonomies are used in:

Customized search engines

Interfaces in web portals

32

33

34

Service Codes

CODE  TITLE
A     Research and Development
B     Special Studies and Analysis - Not R&D
C     Architect and Engineering Services - Construction
D     Information Technology Services, including Telecommunication Services
E     Purchase of Structures and Facilities
F     Natural Resources and Conservation Services
G     Social Services
H     Quality Control, Testing and Inspection Services
J     Maintenance, Repair, and Rebuilding of Equipment
K     Modification of Equipment
L     Technical Representative Services
M     Operation of Government-Owned Facilities
N     Installation of Equipment
P     Salvage Services
Q     Medical Services
R     Professional, Administrative and Management Support Services
S     Utilities and Housekeeping Services
T     Photographic, Mapping, Printing, and Publication Services
U     Education and Training Services
V     Transportation, Travel and Relocation Services
W     Lease or Rental of Equipment
X     Lease or Rental of Facilities
Y     Construction of Structures and Facilities
Z     Maintenance, Repair or Alteration of Real Property

35

36

37

How do we define taxonomies in a wired world?

Taxonomy: A classification of elements within a domain

Domain: a sphere of knowledge, influence, or activity

Classification: the operation of grouping elements and establishing relationships between them (or the product of that operation)

Relationships: a defined linkage between two elements

Element: an object or concept

Crandall, Mike. “Taxonomies for the Real World: The Business Imperative to Simplify Content Access.” TFPL Taxonomies for Business Conference, London, Oct. 23, 2000.
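These definitions translate fairly directly into a data model. The Python sketch below is one possible reading, not Crandall's own formulation: the class names mirror the slide's terms, while the `narrower-than` relationship kind and the method names are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Element:
    """An object or concept."""
    name: str


@dataclass(frozen=True)
class Relationship:
    """A defined linkage between two elements."""
    source: Element
    kind: str  # e.g. "narrower-than" (hypothetical label)
    target: Element


@dataclass
class Taxonomy:
    """A classification of elements within a domain."""
    domain: str
    elements: set = field(default_factory=set)
    relationships: list = field(default_factory=list)

    def relate(self, source, kind, target):
        # Classification: group elements and record how they relate.
        self.elements.update({source, target})
        self.relationships.append(Relationship(source, kind, target))

    def narrower(self, element):
        """Elements recorded as narrower than the given element."""
        return [
            r.source
            for r in self.relationships
            if r.kind == "narrower-than" and r.target == element
        ]


services = Element("Business services")
software = Element("Prepackaged software")

industry = Taxonomy(domain="Industry classification")
industry.relate(software, "narrower-than", services)

print([e.name for e in industry.narrower(services)])
```

Keeping relationships as separate records, and not as a fixed parent pointer on each element, leaves room for other kinds of linkage such as "related-to" or "see-also".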

38

What are Taxonomies Good For?

Taxonomies are applied to:
Items (aka resources): individual pieces of information (documents, people...

By the use of:
Metadata (aka properties, attributes): information describing types of data

Which may or may not use values from a:
Vocabulary: selection of terms, classified or sorted

To create:
Content: an item and its associated metadata

Crandall, Mike. “Taxonomies for the Real World: The Business Imperative to Simplify Content Access.” TFPL Taxonomies for Business Conference, London, Oct. 23, 2000.
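The chain from item to content can be sketched in a few lines of Python. This is an illustrative reading of the slide, not a reference implementation: the vocabulary terms, the property names `subject` and `author`, and the function `describe` are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary for a "subject" property.
SUBJECT_VOCABULARY = {"Taxonomies", "Metadata", "Controlled vocabularies"}


@dataclass
class Content:
    """An item together with its associated metadata."""
    item: str  # identifier of the resource, e.g. a file name
    metadata: dict = field(default_factory=dict)


def describe(item, metadata, controlled):
    """Attach metadata to an item.

    `controlled` maps a property name to its vocabulary. Properties not
    listed there are free text, since metadata "may or may not" take
    its values from a vocabulary.
    """
    for prop, vocabulary in controlled.items():
        value = metadata.get(prop)
        if value is not None and value not in vocabulary:
            raise ValueError(f"{value!r} is not in the vocabulary for {prop!r}")
    return Content(item=item, metadata=dict(metadata))


content = describe(
    "report-001.pdf",
    {"subject": "Taxonomies", "author": "J. Herd"},
    controlled={"subject": SUBJECT_VOCABULARY},
)
print(content)

try:
    describe("report-002.pdf", {"subject": "Taxonomy stuff"},
             controlled={"subject": SUBJECT_VOCABULARY})
except ValueError as err:
    print("rejected:", err)
```

Rejecting out-of-vocabulary values at the moment of tagging is one way to keep subject terms consistent across items; the free-text `author` property passes through unchecked.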

39

Challenges

Information management across divisions of your agency

Agency global intranets/Internet portals

Global or national document management including technical documentation

Incorporating taxonomy technology into agency technology + info. policies

Cost of building a taxonomy

Moving a taxonomy from overhead to being a core part of your agency’s information management.

40

More Challenges

Certification of the taxonomy by an authoritative body.

Finding common ground across multiple taxonomies or schemas with similar terms and different meanings.

Ensuring the ongoing integrity of the taxonomy with constant maintenance.

Acceptance by developers of tagging tools.

Integrating with a legacy system and external content.

41

The core expertise required for constructing a taxonomy is:

Systems Analyst who understands specifications for creating taxonomies

Domain expert/Subject expert in the subject of the taxonomy

Computational linguist, AI engineer

Linguist and/or Lexicographer

Database/Application Development Expert

Administrative Support

Review Support

42

Example of a custom taxonomy marked up in XBRL:

<?xml version="1.0" encoding="utf-8"?>
<schema xmlns:xbrl="http://www.xbrl.org/core/2000-07-31"
        targetNamespace="http://www.xbrl.org/us/gaap/ci/2000-07-31">
  <import namespace="http://www.xbrl.org/core/2000-07-31/"
          schemaLocation="http://www.xbrl.org/core/2000-07-31/xbrl-meta-2000-07-31.xsd"/>
  <element name="propertyPlantAndEquipmentGrossNote.purchasedSoftwareForInternalUse" type="monetary">
    <annotation>
      <documentation>this is software that...</documentation>
      <appinfo>
        <xbrl:rollup to="ci:propertyPlantAndEquipmentNetNote.propertyPlantAndEquipmentGrossNote" weight="1" order="7.5" />
        <xbrl:label xml:lang="en">Purchased software for internal use</xbrl:label>
        <xbrl:reference name="GPSI" number="73" chapter="11" paragraph="b" subparagraph="i" />
      </appinfo>
    </annotation>
  </element>
</schema>
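Markup like this is meant to be read by software, and a short Python sketch shows what that might look like using only the standard library. It embeds a trimmed, well-formed copy of the example (straight quotes, no `import` element) and pulls out each element's name, type, label, and rollup parent. The function name `read_taxonomy_elements` and the shape of the returned dictionaries are assumptions for illustration, and the sketch does not validate against the XBRL schema.

```python
import xml.etree.ElementTree as ET

# Namespace URI as declared in the example above.
XBRL_NS = {"xbrl": "http://www.xbrl.org/core/2000-07-31"}

# Trimmed copy of the slide's example, kept well-formed for parsing.
TAXONOMY_XML = """
<schema xmlns:xbrl="http://www.xbrl.org/core/2000-07-31"
        targetNamespace="http://www.xbrl.org/us/gaap/ci/2000-07-31">
  <element name="propertyPlantAndEquipmentGrossNote.purchasedSoftwareForInternalUse"
           type="monetary">
    <annotation>
      <documentation>this is software that...</documentation>
      <appinfo>
        <xbrl:rollup to="ci:propertyPlantAndEquipmentNetNote.propertyPlantAndEquipmentGrossNote"
                     weight="1" order="7.5" />
        <xbrl:label xml:lang="en">Purchased software for internal use</xbrl:label>
      </appinfo>
    </annotation>
  </element>
</schema>
"""


def read_taxonomy_elements(xml_text):
    """Return one dict per <element>, with its label and rollup parent."""
    root = ET.fromstring(xml_text)
    results = []
    for el in root.iter("element"):
        label = el.find("annotation/appinfo/xbrl:label", XBRL_NS)
        rollup = el.find("annotation/appinfo/xbrl:rollup", XBRL_NS)
        results.append({
            "name": el.get("name"),
            "type": el.get("type"),
            "label": label.text if label is not None else None,
            "parent": rollup.get("to") if rollup is not None else None,
        })
    return results


parsed = read_taxonomy_elements(TAXONOMY_XML)
for entry in parsed:
    print(entry["label"], "rolls up to", entry["parent"])
```

The `rollup` attribute is what makes this a taxonomy and not a flat list: each element names the broader element it contributes to, so a consumer can rebuild the hierarchy from the file alone.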

43

44

Recommendations:

Actively seek out existing taxonomies in the target discipline or subject area. If your needs are met in part by an existing taxonomy, use it and build on it.

Look at the intended purpose of the taxonomy and select appropriate software tools.

Consider scalability of the taxonomy. Look at the big picture and see how the taxonomy will be able to hook into others.

Consider utilizing numerical taxonomy as a schema in the metadata in order to merge documents in foreign languages.

Accommodate new standards whenever possible.

Document “Best Practices” while creating the taxonomy and review them regularly.

Maintain and update the taxonomy continually.

45

Diagram labels:

Your Agency Taxonomy

Existing Taxonomy in your Field

Related Taxonomy of other agency in same field

Related Taxonomy of other agency hooked to one above

Electronic Document in XML

Core Schema (Describes how document is to be created)

Meta Model (Describes how taxonomies are created)

46

Efficient Web information retrieval systems, in the form of search engines or Web portals, require continued support and improvement of:

47

Web-based classification and numerical taxonomic tools, to use in

Web-based cataloging tools such as CORC, which provides metadata based on

Taxonomies such as controlled vocabularies/thesauri, which will be hooked together using

Metathesauri and standard information exchange systems such as MARC-XML

48

And this is the house that Jack built…

With a wine cellar...
