Taxonomy Fundamentals - SLA 2014



DESCRIPTION

An all-day version of Access Innovations' Taxonomy Fundamentals workshop, presented by Marjorie M.K. Hlava and Bob Kasenchak at the 2014 Special Libraries Association (SLA) annual meeting in Vancouver, British Columbia on June 7, 2014.

TRANSCRIPT

Page 1: Taxonomy Fundamentals - SLA 2014

Taxonomy Fundamentals

Why build a taxonomy?

SLA – Vancouver – June 7, 2014

www.accessinn.com  www.dataharmony.com

505-998-0800

Marjorie M.K. Hlava, President and Chief Scientist

Bob Kasenchak, Project Coordinator

Access Innovations, Inc.

Copyright © 2013 Access Innovations, Inc.

Page 2

A fast-moving and powerful introduction to both the theoretical and practical aspects of building a taxonomy, a thesaurus, and an ontology. A well-built taxonomy is part of the foundation of the information architecture underlying web sites, corporate intranets, search/retrieval, and access to relevant content in databases. After defining controlled vocabularies and identifying core standards, you will explore key concepts of taxonomy, thesaurus, indexing, classification, and filtering. Discussion will include the basics of taxonomy records and fundamental term relationships. Attendees will put concepts into practice through multiple exercises, and taxonomy, indexing, and related software tools will be demonstrated.

Introduction To Taxonomy Concepts

Page 3

About Access Innovations

Access Innovations are experts in content creation, enrichment, and conversion services. We provide services to semantically enrich and tag raw text, turning it into highly structured data. We deliver clean, well-formed, metadata-enriched content so our clients can reuse, repurpose, store, and find their knowledge assets. We go beyond the standards to build taxonomies and other data control structures as a solid foundation for your information. Our services and software allow organizations to use and present their information to both internal and external constituents by leveraging search, presentation, and e-commerce. We change search to found!

Quick Facts
• Founded in 1978
• Headquartered in Albuquerque, NM
• Privately held
• Delivered more than 2000 engagements

Page 4

What we do

Access Innovations
• Ensures clean, well-formed content
• Creates Knowledge Organization Systems (KOS)

Data Harmony tools
• To automatically index content
• To manage KOS and more
• To semantically enrich the content
• To organize the content

Visualization tools to portray the data

Page 5

Outline of the Day
• Why the excitement
• What is a taxonomy
• Card sort – Slide 39
• How to build a taxonomy
• Term relationships
• Thesaurus examples
• Pre- and post-coordination
• What are we controlling
• Vocabulary options
• TaxoMatch – Slide 189
• Term forms
• Facets / notation / roles / treatment / weighting
• Auto indexing
• A Taxing Situation – Slide 315
• Search
• Where do I use it?
• Standards and references

Page 6

Why The Excitement?
• Makes information findable!
• Cut search time by 50%! (The Weather Channel)
• Leverages information in new ways
• User satisfaction
• Organizes topical areas and web sites
• Provides better online help

Customer support is 30x more costly than web self-service*

*(Forrester Research, "Tier Zero Customer Support," 1999)

Page 7

Taxonomies are found…

• In “indexing”, tagging, categorizing, subject metadata
• In search – precision, recall
• In content management systems, web sites
• In SharePoint to replace term tree, tag uploads
• In mashups, repackaging, repurposing data
• In social networking sites
• In author tagging – peer reviewer selection
• In filtering data – e.g., spam filters and RSS feeds
• In web crawlers
• In text analytics – trend analysis
• … and much more

Because taxonomies make them work

Page 8

Where Does Implementation Happen?

At the back end:
• When the records / articles are added to the production system
• When the search software’s “inverted file” is created
• When the HTML for the web page is created
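The “inverted file” step can be illustrated with a small sketch. Assuming each record already carries its taxonomy descriptors (the record IDs and descriptors below are made-up examples, not from the workshop), an inverted file maps each descriptor to the set of records indexed with it, so a search on a descriptor becomes a single lookup instead of a scan of every record.

```python
from collections import defaultdict


def build_inverted_file(records: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each taxonomy descriptor to the IDs of the records indexed with it."""
    inverted: dict[str, set[str]] = defaultdict(set)
    for record_id, descriptors in records.items():
        for descriptor in descriptors:
            inverted[descriptor].add(record_id)
    return dict(inverted)


# Hypothetical records, each tagged with descriptors at production time.
records = {
    "doc1": ["Breast cancer", "Folate"],
    "doc2": ["Folate", "Colorectal adenoma"],
    "doc3": ["Breast cancer", "Alcohol"],
}

inverted_file = build_inverted_file(records)
print(sorted(inverted_file["Folate"]))  # ['doc1', 'doc2']
```

Because the descriptors come from a controlled vocabulary, every record about folate lands under one key, which is what makes the lookup precise.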

Page 9

Heart Of The “Big Data” Production Process

Page 10

From the production side to the website display, carry the taxonomy descriptors for use in precision search

Page 11

Taxonomy

Page 12

Authors at a place

MASHUP: locations mapped to a GPS grid of an area

Two data points:
• GPS coordinates
• Taxonomy description of the place

Page 13

Watch Crime In Action

Page 14

Page 15

Page 16

Two data points:
• GPS coordinates
• Taxonomy description of the crime

Page 17

Visualization Strategies

Matrix Visualization Software

Page 18

Page 19

Page 20

All Data Up-posted To The Top Level

Page 21

Pattern Analysis: Indexing Clusters

Page 22

Pattern Analysis: Domain Associations

Page 23

Pattern Analysis: Domain Correlations

Page 24

Pattern Analysis: Gap Analyses

Page 25

Pattern Analysis: Component Gaps

Page 26

More Like This – Recommender

Cancer Epidemiology Biomarkers & Prevention Vol. 12, 161-164, February 2003
© 2003 American Association for Cancer Research
Short Communications

Alcohol, Folate, Methionine, and Risk of Incident Breast Cancer in the American Cancer Society Cancer Prevention Study II Nutrition Cohort
Heather Spencer Feigelson, Carolyn R. Jonas, Andreas S. Robertson, Marjorie L. McCullough, Michael J. Thun and Eugenia E. Calle
Department of Epidemiology and Surveillance Research, American Cancer Society, National Home Office, Atlanta, Georgia 30329-4251

Recent studies suggest that the increased risk of breast cancer associated with alcohol consumption may be reduced by adequate folate intake. We examined this question among 66,561 postmenopausal women in the American Cancer Society Cancer Prevention Study II Nutrition Cohort.

Related Press Releases
• How What and How Much We Eat (And Drink) Affects Our Risk of Cancer
• Novel COX-2 Combination Treatment May Reduce Colon Cancer Risk: Combination Regimen of COX-2 Inhibitor and Fish Oil Causes Cell Death
• COX-2 Levels Are Elevated in Smokers

Related AACR Workshops and Conferences
• Frontiers in Cancer Prevention Research
• Continuing Medical Education (CME)
• Molecular Targets and Cancer Therapeutics

Related Meeting Abstracts
• Association between dietary folate intake, alcohol intake, and methylenetetrahydrofolate reductase C677T and A1298C polymorphisms and subsequent breast
• Folate, folate cofactor, and alcohol intakes and risk for colorectal adenoma
• Dietary folate intake and risk of prostate cancer in a large prospective cohort study

Related Working Groups
• Finance
• Charter
• Molecular Epidemiology

Related Education Book Content
• Oral Contraceptives, Postmenopausal Hormones, and Breast Cancer
• Physical Activity and Cancer
• Hormonal Interventions: From Adjuvant Therapy to Breast Cancer Prevention

Related Awards
• AACR-GlaxoSmithKline Clinical Cancer Research Scholar Awards
• ACS Award
• Weinstein Distinguished Lecture

Related Webcasts

Related Think Tank Report Content

Page 27

Link to Society Resources

Journal Article on Topic A
• Other Journal Articles on Topic A
• Upcoming Conference on Topic A
• Podcast Interview with Researcher Working on Topic A
• Grant Available for Researchers Working on Topic A
• CME Activity on Topic A
• Job Posting for Expert on Topic A

Page 28

Author Connections

Page 29

What is a taxonomy?

Albuquerque, NM 87110

www.accessinn.com  www.dataharmony.com

505-998-0800

Marjorie M.K. Hlava

President and Chief Scientist

Access Innovations, Inc.

Page 30

Vocabulary Control – Options
• Classification systems*
• Authority files
• Controlled term lists
• Uncontrolled term lists
• Thesauri

[*We will concentrate on taxonomies and thesauri first, and then cover the others as time permits.]

Page 31

Taxonomy Standards
• ANSI/NISO Z39.19 (2005) Controlled Vocabularies
• BS 8723 Parts 1 – 5
• ISO 25964 Parts 1 – 2
• TAG 37 and 46 standards
• SKOS – Simple Knowledge Organization System
• OWL – Web Ontology Language
• AND more!

Page 32

A Taxonomy is a Knowledge Organization System (KOS)

From not complex to highly complex:

Uncontrolled list, name authority file, synonym set/ring, controlled vocabulary, taxonomy, thesaurus, ontology, semantic network

Page 33

Structure Of Controlled Vocabularies

• Lists: ambiguity control
• Synonyms: ambiguity and synonym control
• Taxonomy: ambiguity, synonym, and hierarchy control
• Thesaurus: ambiguity, synonym, hierarchy, and relationships
• Ontology: specifies a KOS, with additional kinds of relationships

INCREASING COMPLEXITY and CONTROL

Page 34

What is a Taxonomy? ANSI/NISO Z39.19-2005

“A collection of controlled vocabulary terms organized into a hierarchical structure.”

controlled

Missing: equivalence, homographic, and associative relationships and notes

Yes!

Page 35

Taxonomy? Thesaurus?

• Often used interchangeably
• A thesaurus is a taxonomy with extras:
  – Related Terms
  – Non-preferred Terms (USE/Used for)
  – Scope Notes
  – More
• Taxonomies often have the actual information object at the final node.
• CMS and SharePoint tend to support only the hierarchical view, definitions, and USE references.

Page 36

Taxonomy? Thesaurus?

Main Term (MT)
Top Term (TT)
Broader Terms (BT)
Narrower Terms (NT)
Related Terms (RT)
See also (SA)
Non-Preferred Term (NP)
Used for (UF), See (S)
Scope Note (SN)
History (H)

= subject term, heading, node, category, descriptor, class

TAXONOMY

THESAURUS

OWL can specify

Page 37

The Semantic Roadmap: Knowledge Organization Systems

From top to bottom: Semantic network, Ontology, Thesaurus, Taxonomy, Controlled vocabulary, Synonym set/ring, Name authority file, Uncontrolled list

• Uncontrolled list end: unrelated entities, ambiguity; simple, low value
• Semantic network end: linked entities, contextual specificity; complex, high value

An uncontrolled list has the highest cost over time!

Page 38

Taxonomy view

Thesaurus Term Record view

Page 39

CARD SORT

Page 40

Taxonomy 101: How do you build a taxonomy?

Page 41

How Do You Build a Taxonomy?

• Define subject field
• Collect terms
• Organize terms
• Fill in gaps
• Flesh out and interrelate terms
• Apply to your data

You’re done!

Page 42

Foundations
• Start with what is known
• Build from there
• Use the literature, your data
• Use the lists you already have internally
• Build in continuous review throughout the process, and beyond

Who is involved?
• Taxonomists
• Subject matter experts (SMEs)
• Project management
• Users

Page 43

Define Subject Field

Review a representative collection of content. Determine:
• Core areas
• Peripheral topics

Example: Psychology, Education, Sociology, Law

Scope can be modified later

Page 44

Where Do I Get the Terms?

• Your documents and databases
• Departmental terminology
• Textbooks and their indexes
• Book tables of contents and indexes
• Journal quarterly indexes
• Encyclopedias
• Lexicons, glossaries on the topic
• Web resources
• Users and experts
• Search logs

Page 45

How Do You Choose Terms?

• Importance in the subject area
• Use in the literature, by the organization or community
• Necessary degree of specificity or detail
• Relationship with other controlled vocabularies
• Single concept = single term

Page 46

Build, Buy, Augment?
• Survey existing thesaurus/taxonomy resources for your domain
• Test for:
  – Scope
  – Depth
  – Make-or-break terms
  – Cost
• Adoption of existing taxonomies
• Term registries: Taxobank, Taxonomy Warehouse, other resources

Don’t reinvent the wheel!

Page 47

Gather Terms From Search Logs

• Top ~100 search terms from search logs
• Terms used more than 50 times
• Match each to the web site page with the appropriate answer
• Basis for favorites or best bets, presented at the top of the results list
• Behavior-based taxonomy
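Selecting candidate terms from a search log could be sketched as follows. This assumes the log is a plain list of query strings, normalized only by trimming and lower-casing; the threshold of more than 50 uses and the cap of about 100 terms come from the slide, while the function name `candidate_terms` and the sample log are hypothetical.

```python
from collections import Counter


def candidate_terms(queries, min_uses=50, top_n=100):
    """Return (term, count) pairs for the most frequent queries used more than min_uses times."""
    # Light normalization so "Folate" and " folate " count as the same query.
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    # most_common() sorts by descending count; keep only terms above the threshold.
    return [(term, n) for term, n in counts.most_common(top_n) if n > min_uses]


log = ["folate"] * 60 + ["Breast cancer"] * 55 + ["cox-2"] * 12
print(candidate_terms(log))  # [('folate', 60), ('breast cancer', 55)]
```

The output is only a candidate list: each term still needs to be matched to a page with the appropriate answer and reviewed by a taxonomist before it becomes a best bet or a taxonomy term.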

Page 48

Vocabulary Control – How?

• Use unambiguous terms, clear to the user group
• Distinguish between terms that appear similar
• Use scope notes when necessary
• Use terms as elements that can be coordinated in a flexible manner
• Create compound terms, if necessary

Page 49

Term Format

KISS – Keep it short and simple
• 1-2-3 words
• Effect on search
• Pre- and post-coordination

Establish a policy
• e.g., follow the Chicago Manual of Style

Grammatical issues
• Nouns and noun phrases
• Verbs: gerunds
• Adjectives: no
• Adverbs: no
• Initial articles: no
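Parts of a term-format policy like this can be checked by a script before terms are loaded. The sketch below assumes a hypothetical house policy of at most three words and no initial articles; it cannot judge parts of speech (adjectives, adverbs, gerunds), so those checks still need an editor.

```python
INITIAL_ARTICLES = {"a", "an", "the"}
MAX_WORDS = 3  # "1-2-3 words"


def term_format_problems(term: str) -> list[str]:
    """Return the policy problems found in a candidate term (empty list if none)."""
    words = term.split()
    if not words:
        return ["empty term"]
    problems = []
    if len(words) > MAX_WORDS:
        problems.append(f"{len(words)} words (policy allows {MAX_WORDS})")
    if words[0].lower() in INITIAL_ARTICLES:
        problems.append(f"initial article '{words[0]}'")
    return problems


print(term_format_problems("Art history"))  # []
print(term_format_problems("The history of modern art"))
```

The second call reports both a length problem and an initial article; fixed-phrase exceptions such as "Burden of proof" pass because they stay within three words.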

Page 50

Thesaurus - Format

• Main Entries
• Top Terms – TT
• Broader Terms – BT
• Narrower Terms – NT
• Related Terms – RT
• Scope Notes – SN
• History – HI
• Date term added/changed – DA
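One way to hold these fields in software is a simple record type. The sketch below is an assumption, not the record layout of any particular thesaurus tool: the class name `TermRecord` is hypothetical, the field names mirror the slide’s abbreviations, and `date_changed` stands in for DA.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class TermRecord:
    """A thesaurus main entry carrying the fields listed on the slide."""
    term: str                                          # main entry
    top_term: Optional[str] = None                     # TT
    broader: list[str] = field(default_factory=list)   # BT
    narrower: list[str] = field(default_factory=list)  # NT
    related: list[str] = field(default_factory=list)   # RT
    scope_note: str = ""                               # SN
    history: str = ""                                  # HI
    date_changed: Optional[date] = None                # DA


# Hypothetical entry; "Animals" as the top term is illustrative only.
squirrels = TermRecord(
    term="Squirrels",
    top_term="Animals",
    broader=["Rodents"],
    related=["Pests"],
    scope_note="Members of the family Sciuridae.",
    date_changed=date(2014, 6, 7),
)
print(squirrels.broader)  # ['Rodents']
```

Using `default_factory=list` gives each record its own empty lists, so adding an NT to one term never leaks into another.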

Page 51

Thesaurus - Format

• Related terms – RT
• See – S
• See also – SA
• Use – U
• Preferred Term – PT
• Use for – UF
• Non-Preferred Term – NP
• ..

Page 52

Definitions

Index term: the representation of a concept

Preferred term (International): a term used consistently to index a concept
• descriptor (USE)
• what the “USED FOR” reference points to

Page 53

Definitions

Non-preferred term (International): a synonym or quasi-synonym of a preferred term
• non-descriptor (USE)
• the “USE” reference
• the “SEE” reference

Related term: the “SEE ALSO” reference

Page 54

Indexing Terms

Three main categories:
• Concrete entities
• Abstract concepts
• Proper nouns

Page 55

One Term / One Concept

• Importance in the subject area
• Use in the literature, by the organization or community
• Necessary degree of specificity or detail
• Relationship with other controlled vocabularies

Page 56

One Term / One Concept

• Terms represent a simple or unitary concept
  – A unit of thought
• Can be a single-word term
• Can be a multiword term, if required to represent the concept
• Three main categories
  – Concrete entities
  – Abstract concepts
  – Proper nouns

“A unit of thought, formed by mentally combining some or all of the characteristics of a concrete or abstract, real or imaginary object. Concepts exist in the mind as abstract entities independent of terms used to express them.”

Page 57

Concrete Entities

Things and their physical parts: primates, head, buildings, floors, islands

Page 58

Concrete Entities as Terms

• Things and their physical parts
  – Birds
    • Feathers
  – Buildings
    • Floors
• Materials
  – Cement
  – Wood
  – Lead
  – Cards and Chips

Page 59

Concrete Entities

Materials cement wood lead cars refrigerators

Page 60

Abstract Concepts

Actions and events: evolution, respiration, skating, management, wars, ceremonies

Page 61

Abstract Concepts

Abstract entities, properties of things, materials, and actions: law, theory, strength, efficiency, lead (management)

Page 62

Abstract Concepts

Disciplines and sciences: physics, meteorology, mathematics, psychology

Page 63

Abstract Concepts

Units of measurement: kilograms, pounds, meters, miles

Page 64

Abstract Concepts as Terms
• Actions and events
  – evolution, skating, management, ceremonies
• Abstract entities
  – law, theory
• Properties of things, materials, and actions
  – strength, efficiency
• Disciplines and sciences
  – physics, meteorology, mathematics
• Units of measurement
  – pounds, kilograms, miles, meters, nanoseconds

Page 65

Proper Nouns*

Individual entities, or “classes of one,” expressed as proper nouns: San Francisco, United States of America, Lake Michigan

* Proper names of persons are not included

Page 66

Proper Nouns as Terms

Individual entities – “classes of one” – expressed as proper nouns: San Francisco, Lake Michigan

Thesaurus standards exclude proper names of persons and trade names, leaving them to authority files.

Taxonomies include them as final nodes.

Page 67

Most Terms Are Nouns

Nouns or simple noun phrases
• Adj + Noun – Art history (ANSI/NISO standard)
• Noun + Prep + Noun – History of art (ISO standard)
• Exceptions – Burden of proof, Coats of arms, Prisoners of war, Birds of prey, etc.

Page 68

About “and”

Avoid “and” in terms – it is not a single concept

Instead of: Children and television

Factor and postcoordinate:

USE Media influence + Television + Children

“And” is not in the standard

In real life, the need for granularity may dictate your choice
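Post-coordination means the separate terms are combined at search time rather than stored as one compound term. A minimal sketch, assuming a hypothetical inverted file in which documents were indexed with the separate terms Media influence, Television, and Children; the function name `postcoordinate` is illustrative only.

```python
def postcoordinate(inverted_file: dict[str, set[str]], *terms: str) -> set[str]:
    """Return the documents indexed with ALL of the given terms (Boolean AND)."""
    if not terms:
        return set()
    # Copy the first posting set so the inverted file itself is never modified.
    result = set(inverted_file.get(terms[0], set()))
    for term in terms[1:]:
        # Intersect: a document must carry every term to survive.
        result &= inverted_file.get(term, set())
    return result


inverted_file = {
    "Media influence": {"d1", "d2"},
    "Television": {"d1", "d3"},
    "Children": {"d1", "d2", "d3"},
}
print(postcoordinate(inverted_file, "Media influence", "Television", "Children"))  # {'d1'}
```

Because the three concepts stay separate, the same postings also answer other combinations, such as Television + Children, without a pre-built compound term.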

Page 69

Compound Terms – Nope!

“Terms in a thesaurus should represent simple or unitary concepts…” (ISO standard)

“Compound terms should be factored (split) into simple elements…” (ANSI/NISO standard)

Term phrases are okay (bigrams): Adjective + Noun – American history

Two concepts combined are not: Aromatherapy for bloating

Page 70

Organize Terms – Roughly

• Sort terms into several major categories – logical groups of similar concepts as Top Terms
• Identify core areas and peripheral topics
• 10 – 20 to start
• Consider moving proper names to authority files

Result: a loose collection of terms under several main headings
• Rough and tentative – see how it fits as you go
• Initial gap analysis
• Add / modify / delete as needed

Page 71

Term Relationships

Page 72

How Do Terms Relate?

• Hierarchical relationships – parents and their children
• Equivalence relationships – aliases
• Associative relationships – cousins

TAXONOMY

THESAURUS

Page 73

Hierarchical Relationships

• Broader Term (BT) represents the class, whole, or genus
• Narrower Term (NT) is a member, part, or species
• Three types:
  – Generic relationship
  – Whole-part relationship
  – Instance relationship
• NTs inherit all the BT characteristics
• BTs/NTs have a reciprocal relationship
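Because BT and NT are reciprocal, thesaurus software usually records both directions in one step. A minimal sketch, assuming terms are plain strings held in two dictionaries; the class `Hierarchy` and its method names are hypothetical.

```python
from collections import defaultdict


class Hierarchy:
    """Stores BT/NT links so that every BT has a matching NT, and vice versa."""

    def __init__(self) -> None:
        self.broader: dict[str, set[str]] = defaultdict(set)   # term -> its BTs
        self.narrower: dict[str, set[str]] = defaultdict(set)  # term -> its NTs

    def add_bt(self, term: str, broader_term: str) -> None:
        if term == broader_term:
            raise ValueError("a term cannot be its own broader term")
        # Record both directions together, so the reciprocal is never forgotten.
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def remove_bt(self, term: str, broader_term: str) -> None:
        # Deleting also happens in both directions.
        self.broader[term].discard(broader_term)
        self.narrower[broader_term].discard(term)


h = Hierarchy()
h.add_bt("Archaeological museum", "Museums")
h.add_bt("Ethnological museum", "Museums")
print(sorted(h.narrower["Museums"]))  # ['Archaeological museum', 'Ethnological museum']
```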

Page 74

Hierarchical Relationships

Class as a whole: superordination – broader term (BT), sometimes top term (TT)

Members or parts of the class: subordination – narrower term (NT)

Reciprocal

Page 75

Hierarchical Relationships

• BT/NT based on being part of the same class
• Same fundamental category: entities, activities, agents, properties

Page 76

Hierarchical Relationships

Museums
  Archaeological museum – type of entity – NT
  Ethnological museum – type of entity – NT
  Curators – agents – RT
  Museum techniques – action – RT
  Scientific museum – type of entity – NT

Page 77

Hierarchy – Whole-Part Relationships

Four general types:

1. Body systems and organs
   Ear
     Middle ear

2. Geographical locations
   Bernalillo County
     Albuquerque

3. Fields of study
   Geology
     Physical geology

4. Hierarchical social structures
   Ontario
     Manitoulin District

Page 78

Hierarchy – Instance Relationships

General category (common noun) as BT, with individual example (proper noun) as Narrower Term Instance (NTI)

Seas
  Baltic Sea
  Caspian Sea
  Mediterranean Sea

French cathedrals
  Chartres Cathedral
  Rheims Cathedral
  Rouen Cathedral

Essentially identical to the “final node” in taxonomies

Page 79

Hierarchical Types of Display

• Systematic
• Alphabetic
• Other, but less common, views

Page 80

DTIC Hierarchy

Page 81

Polyhierarchical Relationship

• Term can logically fit under more than one Broader Term – can have Multiple Broader Terms (MBT)

• Part of ISO standards, new to ANSI/NISO

Nurses
  NT Nurse administrators
Health administrators
  NT Nurse administrators

Finance
  NT Accounting
Careers
  NT Accounting
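With polyhierarchy a term has more than one path to the top of the hierarchy. The sketch below, assuming a hypothetical mapping from each term to its list of broader terms and a hierarchy without cycles, lists every path so a web interface could show the term under each of its categories.

```python
def paths_to_top(term: str, broader: dict[str, list[str]]) -> list[list[str]]:
    """Return every path from a top term down to `term` (assumes no cycles)."""
    parents = broader.get(term, [])
    if not parents:
        return [[term]]  # reached a top term
    paths = []
    for parent in parents:
        # Each path above a parent is extended with the current term.
        for path in paths_to_top(parent, broader):
            paths.append(path + [term])
    return paths


broader = {
    "Nurse administrators": ["Nurses", "Health administrators"],
    "Accounting": ["Finance", "Careers"],
}
for path in paths_to_top("Nurse administrators", broader):
    print(" > ".join(path))
# Nurses > Nurse administrators
# Health administrators > Nurse administrators
```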

Page 82

Polyhierarchical Relationships

• Great for the web click environment
• Terms occur in multiple categories
• Can be generic as well as hierarchical

Engineering
  NT Nanotechnology
Physics
  NT Nanotechnology

Nanotechnology
  BT Engineering
  BT Physics

Page 83

DTIC Alpha

Page 84

Generic Relationship Tests

Squirrels, Rodents, Pests

ALL squirrels are rodents
x NOT ALL squirrels are pests
x NOT ALL pests are rodents
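The all-and-some test can be expressed with sets. Assuming each term is represented by a hypothetical set of example members (the animal names are illustrative only), a BT/NT link passes only if ALL members of the narrower term belong to the broader term and only SOME members of the broader term belong to the narrower one, which is exactly a proper-subset check.

```python
def passes_all_and_some(narrower: set[str], broader: set[str]) -> bool:
    """True if ALL members of the narrower term are in the broader term,
    and the broader term has other members too (only SOME are in the narrower one)."""
    return narrower < broader  # proper subset


squirrels = {"red squirrel", "grey squirrel"}
rodents = {"red squirrel", "grey squirrel", "house mouse", "beaver"}
pests = {"grey squirrel", "house mouse", "cockroach"}

print(passes_all_and_some(squirrels, rodents))  # True:  Rodents NT Squirrels
print(passes_all_and_some(squirrels, pests))    # False: not all squirrels are pests
print(passes_all_and_some(pests, rodents))      # False: not all pests are rodents
```

Real vocabularies rarely have complete member lists, so in practice the test is a judgment an editor makes; the code only makes the logic of the rule explicit.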

Page 85

Generic Relationship Tests

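• Both terms in the same fundamental category
• “All-and-some” test
  – Rodents / Squirrels: SOME rodents are squirrels; ALL squirrels are rodents
  – Pests / Squirrels: SOME pests are squirrels; NOT ALL squirrels are pests

Consider the concepts of marketing and advertising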

Page 86

Generic Relationships

“Identifies the link between a class or category and its members or species.”

Easy in biology:

Rodents
  NT Squirrels

All-and-some rule

Page 87

All and Some Rule

Rodents
  NT Squirrels
  RT Pests

Q. Is this an example of polyhierarchy?

Q. Do you need to make RT relationships for “Pests” to all of the NTs under “Rodents”?

Page 88

Instance Relationships

ISO:
Seas
  NT Baltic Sea
  NT Caspian Sea
  NT Mediterranean Sea

NISO / ANSI:
French Cathedrals
  NTI Chartres Cathedral
  NTI Rheims Cathedral
  NTI Rouen Cathedral
  RT Gothic cathedrals

Page 89

Instance Relationships (NISO / ANSI)

French Cathedrals
  NTI Chartres Cathedral
  NTI Rheims Cathedral
  NTI Rouen Cathedral
  RT Gothic cathedrals

French Gothic Cathedral
  NTI Chartres Cathedral
  NTI Rheims Cathedral
  NTI Rouen Cathedral
  BT Gothic cathedrals

Q. Why/how do these differ?

Page 90


CABI Pages

Page 91

Instance Relationships

“…a general category of things and events expressed by a common noun, and an individual instance of that category, the instance then forming a class of one which is represented by a proper name.”

A way of adding the proper names and items from the Authority files to the thesaurus

Page 92

Questions before moving on to Associative Relationships?

Page 93

Associative Relationships

Related Terms (RTs) – cousins

“…terms related conceptually, but not hierarchically, and are not part of an equivalence set” (i.e., not synonyms)

• Both terms are valid thesaurus terms for indexing and have a reciprocal relationship
• Expands the user’s awareness and reflects thesaurus coverage of unanticipated areas
• Standards describe specific types

Page 94

Associative Relationships

Related terms

Physicians RT Medicine

(“Reciprocal posting” done automatically is highly desirable.)
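Automatic reciprocal posting for related terms can be sketched as follows. Unlike BT/NT, which are inverses of each other, RT is symmetric, so adding Physicians RT Medicine should add Medicine RT Physicians in the same step. The function name `add_rt` and the dictionary layout are assumptions for illustration.

```python
from collections import defaultdict

related: dict[str, set[str]] = defaultdict(set)


def add_rt(term_a: str, term_b: str) -> None:
    """Record an associative (RT) link in both directions."""
    if term_a == term_b:
        raise ValueError("a term cannot be related to itself")
    related[term_a].add(term_b)
    related[term_b].add(term_a)  # the reciprocal posting


add_rt("Physicians", "Medicine")
print(related["Medicine"])  # {'Physicians'}
```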

Page 95

Associative Relationships

Sibling relationships

Examples:
  Brother : Sister
  Desk : Chair

• Easier to create within well-defined facets (e.g., AAT)
• Usual step in building process
• Can be identified automatically

Page 96

Associative Relationships

RT relationships

Braking systems
  RT Trains
  RT Bicycle
  RT Motor vehicle

Office furniture
  RT Office buildings
  RT Ergonomics

Page 97

Associative Relationships

Field of study and objects studied

Seismology
  RT Earthquakes

Meteorology
  RT Weather patterns

Page 98

Associative Relationships

Operation or process and the agent or instrument

Hairdressing
  RT Hair dryers

Word processing
  RT Typing skills

Page 99

Associative Relationships

Occupation and person in occupation

Social work
  RT Social workers

Information science
  RT Special librarians

Page 100

Associative Relationships

Action and the product of the action

Publishing
  RT Music scores

Landscaping
  RT Lawn mowers
  RT Irrigation systems

Page 101

Associative Relationships

Action and its patient

Teaching
  RT Students

Conducting
  RT Musicians

Page 102

Associative Relationships

Concepts related to their properties

Women
  RT Femininity

Automobiles
  RT Automotive safety

Page 103

Associative Relationships

Concepts related to their origins

Water
  RT Water wells

Carpet
  RT Thread

Page 104

Associative Relationships

Concepts linked by causal dependence

Injuries
  RT Accidents

Cultural stress
  RT Culture shock

Page 105

Associative Relationships

Action and counteraction

Pests
  RT Pesticides

Log on
  RT Log off

Page 106

Associative Relationships

Raw material and its product

Hides
  RT Leather

Clothing
  RT Fabric

Page 107

Associative Relationships

Action and associated property

Precision instrument
  RT Accuracy

Production processes
  RT Quality control

Page 108

Associative Relationships

Concept and its opposite

Single people
  RT Married people

Height
  RT Depth
  RT Weight

If not hierarchical, probably associative

Page 109

Questions before moving on to Equivalence Relationships?

Page 110

Equivalence Relationships

Refer to the same concept
• (Use for) Prefix for non-preferred terms
• (Use) Prefix for preferred terms

Automobiles
  used for Cars

Cars
  use Automobiles

Page 111

Equivalence Relationships

Doctors
  Use Physicians

Physicians
  Use for Doctors

Page 112

Equivalence Relationships Synonyms

• popular and scientific: spiders – arachnida
• scientific and trade names: Motrin (TM) – ibuprofen
• standard names and slang: hi fi – high fidelity
• different linguistic origin: home care – domiciliary care


Page 113: Taxonomy Fundamentals - SLA 2014

Equivalence Relationships

Synonyms (continued)

Different cultures: aerials - antenna; trunk - boot; hire - rent
Emerging concepts: telecommuting - distance working
Outdated terms: refrigerators - iceboxes


Page 114: Taxonomy Fundamentals - SLA 2014

A “Term” Synonym Ring

Term, Node, Subject heading, Category, Descriptor
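Unlike a USE/UF pair, a synonym ring treats every member as equivalent, with no preferred term; a search on any member can simply be expanded to all of them. A minimal Python sketch, assuming rings are stored as plain sets of lowercase terms (a hypothetical layout):

```python
# Each ring is a set of interchangeable terms; no member is "preferred".
RINGS = [
    {"term", "node", "subject heading", "category", "descriptor"},
]


def expand_query(word, rings=RINGS):
    """Return every ring member equivalent to `word`, or just `word`."""
    key = word.lower()
    for ring in rings:
        if key in ring:
            return sorted(ring)
    return [key]


print(expand_query("Descriptor"))
# ['category', 'descriptor', 'node', 'subject heading', 'term']
```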


Page 115: Taxonomy Fundamentals - SLA 2014

Equivalence Relationships

Lexical variants

Variant spellings: Muslim - Moslem; center - centre
Direct and indirect (inverted) forms: electric power plants - power plants, electric
Abbreviations: ECG - electrocardiogram


Page 116: Taxonomy Fundamentals - SLA 2014

Equivalence Relationships

Quasi-synonyms: urban areas - cities; gifted people - geniuses

Antonyms: height - depth; literacy - illiteracy


Page 117: Taxonomy Fundamentals - SLA 2014

Equivalence Relationships

Upward posting (generic posting): useful for web interfaces

Narrower terms are treated as equivalent to their broader term, not as subspecies of it


Page 118: Taxonomy Fundamentals - SLA 2014

Equivalence Relationships: PsycINFO Rotated


Page 119: Taxonomy Fundamentals - SLA 2014

Equivalence Relationships

Factored terms: express a compound concept as a combination of terms

Milk hygiene
USE Milk AND Hygiene
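A factored entry is a USE reference that points to a combination of preferred terms rather than to a single one. A minimal Python sketch, assuming a hypothetical table that maps each non-preferred term to a tuple of preferred terms, shows how an indexing tool might expand such entries when assigning terms:

```python
# Hypothetical table: non-preferred term -> tuple of preferred terms.
# A one-element tuple is an ordinary USE; longer tuples are factored.
FACTORED_USE = {
    "Milk hygiene": ("Milk", "Hygiene"),
    "Cars": ("Automobiles",),
}


def index_terms(candidates, use_table=FACTORED_USE):
    """Replace each candidate with its preferred term(s), dropping duplicates."""
    assigned = []
    for term in candidates:
        for preferred in use_table.get(term, (term,)):
            if preferred not in assigned:
                assigned.append(preferred)
    return assigned


print(index_terms(["Milk hygiene", "Milk", "Cars"]))
# ['Milk', 'Hygiene', 'Automobiles']
```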


Page 120: Taxonomy Fundamentals - SLA 2014

Equivalence Relationship

• Preferred Term
– Thesaurus term, valid for indexing
– Thesaurus notation: USE

• Non-Preferred Term
– Not valid for indexing
– An alias or imposter
– Entry point; directs the user to the Preferred Term
– Thesaurus notation: UF or NPT

Spiders
UF Arachnids

Plant pathology
USE Phytopathology


Page 121: Taxonomy Fundamentals - SLA 2014

Equivalence – When to Use

Synonyms, slang, quasi-synonyms

Scientific and trade names
Ibuprofen UF Motrin™

Lexical variants
Fiber optics UF Fibre optics
Mouse UF Mice

Upward posting of narrower concepts not specified in the taxonomy or thesaurus
Social class UF Elite, Middle class, Working class

Get equivalent terms from search logs, brainstorming…


Page 122: Taxonomy Fundamentals - SLA 2014

Scope Notes (SN)

Indicate the meaning of the term in the context of this thesaurus, for this audience
Stress: metal, psychological, physiological

Indicate any restriction in meaning
Indicate the range of topics covered
Provide direction for indexers; for terms often confused, may suggest an alternative term
Use only as needed, not for every term
Establish and stick with a consistent format
Be concise


Page 123: Taxonomy Fundamentals - SLA 2014

Scope Notes (SN)

Restrictions on meaning
Range of topics covered
Instructions to indexers
Term histories
Reciprocal scope notes


Page 124: Taxonomy Fundamentals - SLA 2014

Questions before moving on to more thesaurus examples?

Page 125: Taxonomy Fundamentals - SLA 2014

Thesaurus - Examples

Roget's, 1852: synonyms
COSATI, 1964: concept linking
NASA
AEC - ERDA - DOE - ESA
National Library of Medicine: outline of a field; Medical Subject Headings (MeSH)


Page 126: Taxonomy Fundamentals - SLA 2014


NASA

Alphabetic


Page 127: Taxonomy Fundamentals - SLA 2014


NASA

Hierarchical


Page 128: Taxonomy Fundamentals - SLA 2014

Thesaurus - Examples

INSPEC: multifaceted
Thesaurus
Classification system
Free text terms
Variant spellings

NICEM: 27 top terms


Page 129: Taxonomy Fundamentals - SLA 2014


INSPEC


Page 130: Taxonomy Fundamentals - SLA 2014


INSPEC

Hierarchy


Page 131: Taxonomy Fundamentals - SLA 2014

Merged Vocabularies

Yahoo!: subject headings and authority files in a single list


Page 132: Taxonomy Fundamentals - SLA 2014



Page 133: Taxonomy Fundamentals - SLA 2014


Yahoo!

Hierarchy


Page 134: Taxonomy Fundamentals - SLA 2014

Merged Vocabularies - continued

Office.com: multiple broader terms; concept mapping


Page 135: Taxonomy Fundamentals - SLA 2014


Page 136: Taxonomy Fundamentals - SLA 2014

Eurovoc Thesaurus Pages

Page 137: Taxonomy Fundamentals - SLA 2014


Eurovoc Thesaurus Hierarchy


Page 138: Taxonomy Fundamentals - SLA 2014


Eurovoc Terms


Page 139: Taxonomy Fundamentals - SLA 2014

So far you’ve got…

• Hierarchy
– Broader and Narrower Terms
– Polyhierarchies when needed
• Equivalence relationships
– Preferred/Non-Preferred Terms
• Associative relationships
– Related Terms
• Complete term records
– Scope Notes
• Correct term format


Page 140: Taxonomy Fundamentals - SLA 2014

So far you’ve got…

Hierarchical relationships: parents and their children

Equivalence relationships: aliases

Associative relationships: cousins, "see also's"

TAXONOMY

THESAURUS


Page 141: Taxonomy Fundamentals - SLA 2014

So far you’ve got…

• Term format
• Grammatical issues
• Singular and plural forms
• Spelling
• Abbreviations and acronyms
• Capitalization
• Other punctuation
• Consistency


Page 142: Taxonomy Fundamentals - SLA 2014

Pre and Post Coordination

Page 143: Taxonomy Fundamentals - SLA 2014

Pre and Post Coordinate Terms

Pre-coordinate terms: two concepts
Subject headings, e.g., Library of Congress: American history – Civil War
Back-of-the-book indexes
Put together in advance by the publisher

Post-coordinate terms: taxonomy terms
Single concept
Put together by the user / searcher


Page 144: Taxonomy Fundamentals - SLA 2014

Pre-coordination

Card catalogs, printed indexes
Links and roles defined
Controlled vocabularies
High input costs
Precise recall; easier searching


Page 145: Taxonomy Fundamentals - SLA 2014

Post-coordination

Starting with punch cards
Machine readable
Frequently natural language
Currency and specificity
Exhaustive coverage; loss of precision
Low input costs
False drops
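In a post-coordinate system each document carries single-concept terms, and the searcher combines them at query time, typically with Boolean AND. The Python sketch below uses hypothetical documents and terms to show both the flexibility and the false-drop risk: the AND of two terms also retrieves a document where the terms co-occur but relate differently from what the searcher intended.

```python
# Hypothetical post-coordinate index: document -> single-concept terms.
INDEX = {
    "D1: School library collections": {"Schools", "Libraries", "Collections"},
    "D2: Library school curriculum": {"Libraries", "Schools", "Curriculum"},
    "D3: School buses": {"Schools", "Buses"},
}


def search_and(*terms, index=INDEX):
    """Post-coordinate search: documents indexed with ALL the given terms."""
    wanted = set(terms)
    return sorted(doc for doc, doc_terms in index.items() if wanted <= doc_terms)


# The searcher wants school libraries, but the AND also returns a library
# school: a false drop, because single-concept terms carry no information
# about how the concepts relate to each other.
print(search_and("Schools", "Libraries"))
# ['D1: School library collections', 'D2: Library school curriculum']
```

A pre-coordinate heading such as "School libraries" would avoid this particular false drop, at the higher input cost noted for pre-coordination.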


Page 146: Taxonomy Fundamentals - SLA 2014

Subject Matter Experts (SMEs)

Work first from the literature
Establish literary warrant for terms
Have someone else do the clerical work
Differentiate the lexicography work from the subject matter expert work
Let SMEs do the review and tailoring
Expert review ensures proper term use and application
Advisory Board… advisable!


Page 147: Taxonomy Fundamentals - SLA 2014

Again, why do we index?

Improve precision: define the scope of terms

Improve recall: link different terms for the same concept

Guide to a field of expertise
Learning tool
Richer expression


Page 148: Taxonomy Fundamentals - SLA 2014

Uses?

Indexing: the process by which subject terms or classification symbols are assigned to concepts in documents

A thesaurus is also known as an indexing language

M.A.I.™ is an automated indexing system


Page 149: Taxonomy Fundamentals - SLA 2014

What are We Controlling?

Page 150: Taxonomy Fundamentals - SLA 2014

What are We Controlling?

Synonyms: different terms, same concept

Polysemes or homonyms: same word, different meanings (lead, mercury)


Page 151: Taxonomy Fundamentals - SLA 2014

How?

Meaning: delineation of the scope of a term

Term equivalence: linking of synonyms

Disambiguation of homonyms: lead (metal), lead (element), lead (management)
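Qualified homographs such as lead (metal) and lead (management) only help if something chooses between them. Rules-based indexing tools do this from context, but their rule languages are product-specific; the Python sketch below is a hypothetical illustration that picks the qualified term whose clue words appear most often in the text. The clue lists are invented for the example.

```python
import re

# Hypothetical clue words for two qualified homographs of "lead".
SENSES = {
    "Lead (metal)": {"pipe", "paint", "poisoning", "alloy", "smelting"},
    "Lead (management)": {"team", "project", "manager", "staff", "leadership"},
}


def disambiguate(text, senses=SENSES):
    """Return the qualified term whose clue words occur most often in `text`,
    or None when no clue word is present."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {term: sum(w in clues for w in words) for term, clues in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("Old lead pipe and lead paint can cause poisoning"))  # Lead (metal)
print(disambiguate("She will lead the project team"))  # Lead (management)
```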


Page 152: Taxonomy Fundamentals - SLA 2014

Disambiguation

Bridge Structure

Bridge Dentistry

Bridge Game

Bridge Concept

Page 153: Taxonomy Fundamentals - SLA 2014

Disambiguation

Restriction and clarification of meaning

Cells: biological microsystems; electrical equipment; prison housing

Reading: town in England; communication process


Page 154: Taxonomy Fundamentals - SLA 2014

Disambiguation

Bill Invoice

Bill Legislative

Bill Sport

Bill Person

Page 155: Taxonomy Fundamentals - SLA 2014

Disambiguation: Pre-Coordinate vs. Post-Coordinate Forms

Cells (biology), Cells (electric), Cells (prison)

Reading (place), Reading (process)

Biological cells, Electric cells


Page 156: Taxonomy Fundamentals - SLA 2014

Precision Options

Language specificity Coordination Compound terms - level of

precoordination Homographs and scope notes Word distance indication


Page 157: Taxonomy Fundamentals - SLA 2014

Precision Options

Structural relationships
Links and roles
Treatment and aspect codes
Weighting


Page 158: Taxonomy Fundamentals - SLA 2014

Maintenance of a Controlled Vocabulary

Allow for new jargon to be added
Any living field will have new terms
Identifier field
Candidate terms
Consider multiple broader terms


Page 159: Taxonomy Fundamentals - SLA 2014

Review, edit, test, edit, use, edit, and maintain, i.e., edit

Review: users, expert reviewers

Test: index 500+ documents (more for variable writing styles; fewer for a strict style)

Monitor search logs

Edit and maintain:
Add term
Change existing term
Change term status
Delete term
Add term relationship
Delete term relationship
Add/modify Scope Note
Change overall structure

Consider automated / assisted indexing software


Page 160: Taxonomy Fundamentals - SLA 2014

When Do You Add More Terms?

On demand

When usage changes: stewardess - flight attendant

As the field evolves: 8 colors become 64 colors

In use: don’t freeze waiting for perfection


Page 161: Taxonomy Fundamentals - SLA 2014

Vocabulary Control - Options

Classification systems
Authority files
Controlled term lists
Uncontrolled term lists
Thesauri


Page 162: Taxonomy Fundamentals - SLA 2014

Classification Systems - Defined

Used to put an object in a specific place; in a traditional classification system, each item has a single spot to go

Follow an outline of knowledge
Used to shelve books in a library


Page 163: Taxonomy Fundamentals - SLA 2014

Catalog Systems - Defined

Used to catalog an object to identify its contents

Based on perception
Multiple terms are used to identify a single object
Not natural language
Pre-coordinated: subheadings


Page 164: Taxonomy Fundamentals - SLA 2014

Classification Systems - Examples

Classification of actual collections

New York State Library: Dewey, e.g., 810.01

Cutter: universities, 1800 - 1960s, e.g., Z34 Lan

Thomas Jefferson: Library of Congress, e.g., z34.18 la

Government documents: numbers based on government structure


Page 165: Taxonomy Fundamentals - SLA 2014

Catalog Systems - Examples

Library of Congress Subject Headings

Sears Subject Headings (used with Dewey)


Page 166: Taxonomy Fundamentals - SLA 2014

King of Catalogers

Charles Ammi Cutter: rules for alphabetical subject indexing

Use the most specific heading
Put two topics under two headings
Use English if possible
Cross-reference antonyms
Be careful with homographs

1895: ALA Subject Headings, following Cutter


Page 167: Taxonomy Fundamentals - SLA 2014

Politics in Libraries

In 1905, Dewey was president of ALA (American Library Association)

LC adopted DDC. Threw out Cutter. The two never spoke again.


Page 168: Taxonomy Fundamentals - SLA 2014

Types of Headings

Single word: Botany or Ethics
Adjective noun: Capital punishment
Noun - noun: Death penalty (American Standard)
Noun preposition noun: Penalty of death (International Standard)
Noun conjunction noun: Nurses and nursing


Page 169: Taxonomy Fundamentals - SLA 2014

Cutter Guidelines

File under the phrase “as it reads”
Use the most significant words
Reduce adjective nouns to noun phrases
Use singular rather than plural
File compound words under the first word
No subheadings


Page 170: Taxonomy Fundamentals - SLA 2014

Cross References

Cross-reference synonyms
Main heading should be what the class uses
Use the common term
Use the unambiguous heading
Prefer the one which brings out relations

“…with a well defined network of cross references the mob becomes an army…” C.A. Cutter


Page 171: Taxonomy Fundamentals - SLA 2014

Library of Congress (LC) Subject Headings

1911: List of Subject Headings

Extensive use of subheadings
Invert phrases for the main subject: file under the noun, not the adjective
"See" references, not cross filing
Placeholder terms
Homographs defined parenthetically


Page 172: Taxonomy Fundamentals - SLA 2014

Classification vs. Subject Headings

Classification
Single spot or placement
Browse physical list
Often a numbering system
Clear hierarchy
No or few cross references
Like Yahoo!


Page 173: Taxonomy Fundamentals - SLA 2014

Classification vs. Subject Headings

Subject headings
Generic search
Hidden classification system
Related terms and cross references in heavy use
Usually the inverted form, e.g., cells, electric


Page 174: Taxonomy Fundamentals - SLA 2014

Vocabulary Control - Options

Classification systems
Authority files
Controlled term lists
Uncontrolled term lists
Thesauri


Page 175: Taxonomy Fundamentals - SLA 2014

Authority Systems - Defined

Lists of terms in the preferred format for use

Frequently have cross references
Widely available
Frequently coded lists
Brand names, etc.


Page 176: Taxonomy Fundamentals - SLA 2014

Authority Files - Defined

People, places, things … NOT concepts, methods, processes


Page 177: Taxonomy Fundamentals - SLA 2014

Authority Files - Examples

ISO country names and codes (ISO: International Organization for Standardization)

ISO language list

NAICS (SIC)
Standard Industrial Classification (SIC) codes, replaced by the North American Industry Classification System (NAICS)


Page 178: Taxonomy Fundamentals - SLA 2014

Authority Lists - Format

Belgian Congo use Congo

Bill Gates use William H. Gates, III (computer scientist)
see also William Gates (basketball player)


Page 179: Taxonomy Fundamentals - SLA 2014

Authority Lists - Need Style Sheets

Names
AACR2: Anglo-American Cataloguing Rules, 2nd edition
AAP: Association of American Publishers
Chicago Manual of Style
Dun & Bradstreet Style Sheet


Page 180: Taxonomy Fundamentals - SLA 2014

Vocabulary Control - Options

Classification systems
Authority files
Controlled term lists
Uncontrolled term lists
Thesauri


Page 181: Taxonomy Fundamentals - SLA 2014

Controlled Term Lists - Defined

State the preferred terms
Provide allowed term entry
Heavily cross referenced
Not generally hierarchical
Popular
Easy to create


Page 182: Taxonomy Fundamentals - SLA 2014

Controlled Term Lists - Examples

ABI/Inform
Predicasts
RDS (Responsive Data Services)
Back-of-the-book indexes
Art and Architecture Thesaurus
… These are not FULL thesauri


Page 183: Taxonomy Fundamentals - SLA 2014

Controlled Term List - Format

Cars use Automobiles

Personal Computer use Microcomputer


Page 184: Taxonomy Fundamentals - SLA 2014

Vocabulary Control - Options

Classification systems
Authority files
Controlled term lists
Uncontrolled term lists
Thesauri


Page 185: Taxonomy Fundamentals - SLA 2014

Uncontrolled List - Defined

Add terms as they occur
No cross references
Simple flat structure


Page 186: Taxonomy Fundamentals - SLA 2014

Uncontrolled List - Example

List of names
Grocery list
Candidate term list


Page 187: Taxonomy Fundamentals - SLA 2014

Uncontrolled List - Format

Laundry
Trim bushes
Cat box needs cleaning
Tommy’s birthday (bake cake)
Iron
Water plants
… other natural language lists


Page 188: Taxonomy Fundamentals - SLA 2014

Trying to Impose Control...

Do laundry
Trim bushes
Clean cat box
Bake birthday cake
Iron shirts
Water plants


Page 189: Taxonomy Fundamentals - SLA 2014

Designed to enhance understanding and retention of the vocabulary concepts necessary for creating a taxonomy, ontology, thesaurus, or controlled vocabulary.

Game supplies: 1 deck of Orange Question and Challenge cards; 1 deck of Green Answer cards

Game setup: Shuffle the deck of Green Answer cards and deal the entire deck to the players. Shuffle the deck of Orange Question and Challenge cards and place them face down in a pile in the middle of the table, within reach of all players.

Reinforce what you just heard! Have fun!

TAXONOMATCH


Page 190: Taxonomy Fundamentals - SLA 2014

1. Play begins with the player to the left of the dealer.

2. That player draws the top card from the pile of Orange cards and reads it aloud to all of the players.

3. The player who read the card says out loud what they think the answer is.

4. Each player looks at the Green Answer cards in their hand.

a. If they have the correct answer to the Question or Challenge, they show their card to everyone at the table.

b. If everyone agrees that the answer is correct, the player holding the correct answer card gives it to the player who read the Question or Challenge card.

5. The player who read the card places the matched pair of cards (one Orange Question and Challenge card and one Green Answer card) face up on the table in front of them.

6. Play passes to the person who held the correct Green Answer card in their hand. Play continues as in step 2 above.

7. Discussion among the players to arrive at the correct answer is permissible and encouraged!

8. If players do not arrive at a consensus regarding the correct answer, the Orange Question and Challenge card may be returned to the bottom of the pile, and play passes to the person to the left of the player who drew the previous card.

9. When all of the Orange Question and Challenge cards have been drawn, read aloud, and matched with their Green Answer cards, the game ends.

10. If there are any Orange Question and Challenge cards remaining to which players cannot agree on an answer, players may consult their notes or ask the session speaker.


TAXONOMATCH RULES

Page 191: Taxonomy Fundamentals - SLA 2014

Term Forms

Page 192: Taxonomy Fundamentals - SLA 2014

Term Forms

Nouns
Prepositional forms
Adjectives
Adverbs
Initial articles
Singular and plural


Page 193: Taxonomy Fundamentals - SLA 2014

Term Forms - Noun and Noun Phrases

Nouns and noun phrases: print media, carpet


Page 194: Taxonomy Fundamentals - SLA 2014

Term Forms - Prepositional Forms

Prepositional forms are seldom used

Acceptable in the international standard (ISO): Philosophy of Education

ANSI/NISO: Educational philosophy


Page 195: Taxonomy Fundamentals - SLA 2014

Term Forms – Adjectives

Adjectives are not used in isolation, but may be used for coordination

Miniature paintings
USE PAINTINGS AND MINIATURE

Portable typewriters
USE TYPEWRITERS AND PORTABLE


Page 196: Taxonomy Fundamentals - SLA 2014

Term Forms – Adjectives

Adjectives may convert to noun forms

MINIATURE SIZE
PORTABLE DEVICES
TRIANGULAR SHAPE


Page 197: Taxonomy Fundamentals - SLA 2014

Term Forms - Adverbs

Adverbs are not used unless part of a compound term

VERY LARGE ARRAY RADIO TELESCOPE
Used for VLA


Page 198: Taxonomy Fundamentals - SLA 2014

Term Forms - Verbs

No infinitive or participle forms

For actions that can be expressed as nouns and retain a clear meaning, use the noun form or gerund

Examples: Speaking (not Speech), Walking (not Ambulation), Communication (not Communicate), Administration (not Administer)


Page 199: Taxonomy Fundamentals - SLA 2014

Term Forms - Initial Articles

AVOID THEM

Examples: Theater, not The theater; State (political entity), not The state

Use if part of a proper name: Le Mans, El Salvador


Page 200: Taxonomy Fundamentals - SLA 2014

Term Forms - Singular and Plural

Concrete entities

Count nouns are plural (how many?): planets, children

Noncount nouns are singular (how much?): nickel, snow, lace


Page 201: Taxonomy Fundamentals - SLA 2014

Term Forms - Singular and Plural

Fully formed organism: eyes, mouth

Objects are singular: lamp

Classes of things: fruits


Page 202: Taxonomy Fundamentals - SLA 2014

Term Forms - Singular and Plural

Abstract concepts: show in the singular form

authority, socialism, packaging, biochemistry


Page 203: Taxonomy Fundamentals - SLA 2014

Term Forms - Singular and Plural

Unique entities: show in the singular

Big Ben, Grand Canyon


Page 204: Taxonomy Fundamentals - SLA 2014

Other Formatting

Spelling
Punctuation
Capitalization
Abbreviations
...


Page 205: Taxonomy Fundamentals - SLA 2014

Spelling

Use what the users will use, and cross post variant spellings for multilingual audiences:
fiber - fibre
center - centre
organization - organisation
hemo - haemo
pediatrics - paediatrics


Page 206: Taxonomy Fundamentals - SLA 2014

Punctuation

Parentheses: only for qualifiers
Apostrophes: retained
Hyphens: avoid
Other punctuation: avoid


Page 207: Taxonomy Fundamentals - SLA 2014

Capitalization

NISO: initial capital only (AACR2 format)

Practice is to follow a manual of style:
Chicago Manual of Style
Associated Press
Association of American Publishers


Page 208: Taxonomy Fundamentals - SLA 2014

Abbreviations

Use only when well known
Always include the full meaning

LASER
Scope Note: Light Amplification by Stimulated Emission of Radiation

WHO: World Health Organization


Page 209: Taxonomy Fundamentals - SLA 2014

Other Ways of Adding Value

Cross references
Facets
Notation
Roles
Treatment
Term weighting


Page 210: Taxonomy Fundamentals - SLA 2014

Cross References

See (S)
See also (SA)
Not related or associated
Not opposite
Just helpful guides


Page 211: Taxonomy Fundamentals - SLA 2014

Synthesis in Classification

S.R. Ranganathan, 1933: Colon Classification

Analytico-synthetic classification
Analyze the subject into its component parts (facets)
Arrange the facets into schedules
Combine facets to express subject complexity


Page 212: Taxonomy Fundamentals - SLA 2014

Ranganathan

A General properties
Ab Configuration
Ac Tubular
B Materials
Bc Metals
Bcc Ferrous
Bcd Steels
Bcf Chromium steels
Bcfi Chromium-nickel steels
K Modes of failure
Kg Creep
Kgb Creep rupture
L Stresses and loads
Lb Tensile


Page 213: Taxonomy Fundamentals - SLA 2014

Ranganathan

Tubular chromium-nickel steel, creep rupture, tensile strength

Ac Bcfi Kgb Lb

Chain indexing:
Tubular
Chromium-nickel steel
Creep rupture
Tensile strength
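The synthesis step can be sketched in code: look up the code for each component concept in the schedule, then concatenate the codes in citation order. The Python sketch below reuses the schedule shown above; the citation order (A, B, K, L, following the schedule's own sequence) and the function names are assumptions for illustration. Its index entries simply rotate each term into the lead position, which is a permuted display, not a full implementation of Ranganathan's chain procedure.

```python
# Facet schedule from the slide: notation -> concept.
SCHEDULE = {
    "A": "General properties", "Ab": "Configuration", "Ac": "Tubular",
    "B": "Materials", "Bc": "Metals", "Bcc": "Ferrous", "Bcd": "Steels",
    "Bcf": "Chromium steels", "Bcfi": "Chromium-nickel steels",
    "K": "Modes of failure", "Kg": "Creep", "Kgb": "Creep rupture",
    "L": "Stresses and loads", "Lb": "Tensile",
}
# Assumed citation order of the facets, keyed by each code's first letter.
CITATION_ORDER = ["A", "B", "K", "L"]


def synthesize(codes):
    """Combine facet codes into one class mark, ordered by facet letter."""
    ordered = sorted(codes, key=lambda c: CITATION_ORDER.index(c[0]))
    return " ".join(ordered)


def rotated_entries(codes):
    """One index entry per component, each term taking a turn in the lead."""
    terms = [SCHEDULE[c] for c in synthesize(codes).split()]
    return [", ".join(terms[i:] + terms[:i]) for i in range(len(terms))]


codes = ["Kgb", "Lb", "Bcfi", "Ac"]
print(synthesize(codes))  # Ac Bcfi Kgb Lb
for entry in rotated_entries(codes):
    print(entry)
```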


Page 214: Taxonomy Fundamentals - SLA 2014

Other Ways of Adding Value

Cross references
Facets
Notation
Roles
Treatment
Term weighting


Page 215: Taxonomy Fundamentals - SLA 2014

Facets

Additional ways to add meaning
Divide terms into categories using a single characteristic
Limited number of categories


Page 216: Taxonomy Fundamentals - SLA 2014

Facets and Roles

PRECIS (Austin, 1984)
Order of terms
Pre-coordinate indexing system
Role of the term is important: tomato

Living plant? Marketable product?

Facet role indicator: organism; end product
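Role indicators can be modeled by posting each index term together with its role, so that a search can ask for tomato as an organism without retrieving tomato as an end product. A minimal Python sketch; the documents are hypothetical, and the role labels "organism" and "end product" are taken from the slide:

```python
# Hypothetical postings: document -> set of (term, role) pairs.
POSTINGS = {
    "D1: Blight resistance in tomato plants": {("Tomato", "organism")},
    "D2: Export prices for canned tomato": {("Tomato", "end product")},
}


def search(term, role=None, postings=POSTINGS):
    """Find documents posted to `term`, optionally restricted to one role."""
    return sorted(
        doc
        for doc, pairs in postings.items()
        if any(t == term and (role is None or r == role) for t, r in pairs)
    )


print(search("Tomato"))              # both documents
print(search("Tomato", "organism"))  # only D1
```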


Page 217: Taxonomy Fundamentals - SLA 2014

Many Faceted Vocabularies

UMLS Semantic Network (Unified Medical Language System): 49

Bliss Bibliographic Classification (Bliss Classification Association)

Dewey Decimal Classification System

Universal Decimal Classification System

Art and Architecture Thesaurus


Page 218: Taxonomy Fundamentals - SLA 2014

MeSH and Tree Pages


Page 219: Taxonomy Fundamentals - SLA 2014


MeSH Alpha


Page 220: Taxonomy Fundamentals - SLA 2014

Order of Facets

Post-coordinate
Means before order
Notation becomes important
Breaks down for large classes (more than 5,000 terms)


Page 221: Taxonomy Fundamentals - SLA 2014

Other Ways of Adding Value

Cross references
Facets
Notation
Roles
Treatment
Term weighting


Page 222: Taxonomy Fundamentals - SLA 2014

Notation Options

Expressive
Ordinal
Synthetic
Enumeration
Many style options


Page 223: Taxonomy Fundamentals - SLA 2014

Expressive Notation

83 Hazards
831 Fire
831.5 Fire fighting
831.53 Fire fighting equipment
831.532 Fire extinguishers
831.532.5 Carbon dioxide fire extinguishers
832 Explosions
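Because expressive notation encodes the hierarchy in the codes themselves, a program can recover a term's broader terms without stored BT links: drop the last digit (and any dangling point) until a code in the schedule is reached. A minimal Python sketch over the example above; the function name is hypothetical:

```python
SCHEDULE = {
    "83": "Hazards",
    "831": "Fire",
    "831.5": "Fire fighting",
    "831.53": "Fire fighting equipment",
    "831.532": "Fire extinguishers",
    "831.532.5": "Carbon dioxide fire extinguishers",
    "832": "Explosions",
}


def broader_codes(code, schedule=SCHEDULE):
    """Walk up the hierarchy by truncating the code one digit at a time."""
    chain = []
    while len(code) > 1:
        code = code[:-1].rstrip(".")  # drop the last digit, then any dangling point
        if code in schedule:
            chain.append(code)
    return chain


print([SCHEDULE[c] for c in broader_codes("831.532.5")])
# ['Fire extinguishers', 'Fire fighting equipment', 'Fire fighting', 'Fire', 'Hazards']
```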


Page 224: Taxonomy Fundamentals - SLA 2014

Ordinal and Semi-ordinal Notation

HK Hazards
  HL Fire
    HM Fire fighting
      HN Fire fighting equipment
        HNB Fire extinguishers
          HNE Carbon dioxide fire extinguishers
  HO Explosions

Indentation is the sole indication of hierarchy
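With ordinal notation the codes only fix the filing order, so the hierarchy has to be read from the indentation (HNB says nothing about HN). The Python sketch below assumes a plain-text listing with two spaces per level (a hypothetical layout) and rebuilds the broader-term links from indentation alone:

```python
LISTING = """\
HK Hazards
  HL Fire
    HM Fire fighting
      HN Fire fighting equipment
        HNB Fire extinguishers
          HNE Carbon dioxide fire extinguishers
  HO Explosions
"""


def parse_indented(text, indent=2):
    """Return {code: broader_code} using indentation depth only."""
    parents, stack = {}, []  # stack holds the codes on the current path
    for line in text.splitlines():
        depth = (len(line) - len(line.lstrip(" "))) // indent
        code = line.split()[0]
        del stack[depth:]  # pop back to this line's level
        parents[code] = stack[-1] if stack else None
        stack.append(code)
    return parents


links = parse_indented(LISTING)
print(links["HNE"], links["HO"])  # HNB HK
```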


Page 225: Taxonomy Fundamentals - SLA 2014

Synthetic and Enumeration Notation

Need to allow the classification system to grow

Synthetic example:
P Architecture
PAT Architectural information
PAT.M Architectural information services


Page 226: Taxonomy Fundamentals - SLA 2014


Notation Examples - AAT Facets


Page 227: Taxonomy Fundamentals - SLA 2014

Systematic Display

Paints
  (By composition)
    Oil paints
    Water paints
    Cement paints
  (By use)
    Primers
    Undercoats
    Top coats


Page 228: Taxonomy Fundamentals - SLA 2014


AAT Pages

Notice the faceted indentation


Page 229: Taxonomy Fundamentals - SLA 2014


AAT Term


Page 230: Taxonomy Fundamentals - SLA 2014

Alphabetical Display

Paints
  NT Cement paints
     Oil paints
     Primers
     Top coats
     Undercoats
     Water paints
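The alphabetical display can be generated from the systematic one: keep the terms, drop the parenthesized node labels (facet indicators, not indexing terms), and sort. A minimal Python sketch, assuming the systematic display is held as a nested dictionary (a hypothetical structure):

```python
SYSTEMATIC = {
    "Paints": {
        "(By composition)": ["Oil paints", "Water paints", "Cement paints"],
        "(By use)": ["Primers", "Undercoats", "Top coats"],
    }
}


def alphabetical_display(tree):
    """Drop node labels and list each term's NTs alphabetically."""
    lines = []
    for term, facets in tree.items():
        narrower = sorted(nt for group in facets.values() for nt in group)
        lines.append(term)
        lines.extend(f"  NT {nt}" for nt in narrower)
    return lines


print("\n".join(alphabetical_display(SYSTEMATIC)))
```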


Page 231: Taxonomy Fundamentals - SLA 2014

Other Ways of Adding Value

Cross references
Facets
Notation
Roles
Treatment
Term weighting


Page 232: Taxonomy Fundamentals - SLA 2014

Roles

ERIC Thesaurus role indicators

Adjectives: bibliographic terms
Input or raw material
Output or product
Undesirables
Indicated uses
Materials “in which”
Affects
Primary topics of discussion
Passive recipients, possessors, location
Means used


Page 233: Taxonomy Fundamentals - SLA 2014

Roles

CAS super roles:
Analytical study
Biological study
Formation, nonpreparative
Occurrence
Preparation
Process
Uses

CAS specific roles:
Miscellaneous
Properties
Reactant


Page 234: Taxonomy Fundamentals - SLA 2014

Subheadings as Roles

MeSH
Therapeutic use
Drug treatment (disease)
Adverse effect (drug treatment)
Diagnosis


Page 235: Taxonomy Fundamentals - SLA 2014

Other Ways of Adding Value

Cross references
Facets
Notation
Roles
Treatment
Term weighting


Page 236: Taxonomy Fundamentals - SLA 2014

Treatment and Aspect Codes

Apply codes or types at the article level:
Theoretical
New development
Experimental
Practical


Page 237: Taxonomy Fundamentals - SLA 2014

Other Ways of Adding Value

Cross references
Facets
Notation
Roles
Treatment
Term weighting


Page 238: Taxonomy Fundamentals - SLA 2014

Cranfield Project - Cleverdon 1966

Concepts in the main theme: 9/10
Major subsidiary theme: 7/8
Minor subsidiary theme: 5/6
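Weighted indexing lets a system rank documents rather than merely match them. The Python sketch below is a hypothetical illustration, not the Cranfield procedure itself: each posting carries a weight from the bands above (main theme 9 or 10, major subsidiary 7 or 8, minor subsidiary 5 or 6), and documents are ranked by the summed weights of the query terms they match. The documents and weights are invented.

```python
# Hypothetical weighted postings: document -> {term: weight}.
WEIGHTED_INDEX = {
    "D1": {"Creep": 10, "Steels": 7},
    "D2": {"Steels": 9, "Welding": 8, "Creep": 5},
    "D3": {"Welding": 10},
}


def rank(query_terms, index=WEIGHTED_INDEX):
    """Rank documents by the total weight of the query terms they contain."""
    scores = {
        doc: sum(weights.get(t, 0) for t in query_terms)
        for doc, weights in index.items()
    }
    hits = [(doc, s) for doc, s in scores.items() if s > 0]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


print(rank(["Creep", "Steels"]))  # [('D1', 17), ('D2', 14)]
```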


Page 239: Taxonomy Fundamentals - SLA 2014

Internet Engines

Complex weighting of terms
Use term frequency
Rank output; wholly automatic
Output based on input term weights
Can also use “well formed” data, like:
a thesaurus hierarchy
field-formatted data
XML files
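A common way to weight terms automatically is tf-idf: a word counts more when it is frequent in a document and rare across the collection. The slide names no formula, so the Python sketch below assumes one textbook variant (raw term frequency times log(N / document frequency)); production engines typically use more elaborate schemes such as BM25. The documents are invented.

```python
import math
import re
from collections import Counter

DOCS = {
    "D1": "taxonomy terms improve search precision",
    "D2": "search logs suggest new taxonomy terms",
    "D3": "search engines rank results",
}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def tf_idf_rank(query, docs=DOCS):
    """Score each document by the sum of tf * log(N / df) over query words."""
    tokens = {doc: Counter(tokenize(text)) for doc, text in docs.items()}
    n = len(docs)
    df = Counter(word for counts in tokens.values() for word in counts)
    scores = {}
    for doc, counts in tokens.items():
        score = sum(counts[w] * math.log(n / df[w]) for w in tokenize(query) if df[w])
        if score > 0:
            scores[doc] = round(score, 3)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(tf_idf_rank("taxonomy precision"))  # [('D1', 1.504), ('D2', 0.405)]
```

Note that "search", which occurs in every document, gets an idf of zero and contributes nothing to the ranking.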


Page 240: Taxonomy Fundamentals - SLA 2014

Automatic and Semi-automatic Classification?

Data Harmony® M.A.I.™
Semio
Autonomy - Muscat
Net Owl - Names
n-Stein
Quiver
Smart Logic


Page 241: Taxonomy Fundamentals - SLA 2014

Machine Aided Indexing Goals

Improve:
Indexing efficiency
Indexing consistency
Depth of indexing

Reduce:
Editorial drift
Over- and under-indexing
Term overuse and underuse


Page 242: Taxonomy Fundamentals - SLA 2014

Machine Aided Indexing Goals

Improve productivity: indexer, information worker

Disambiguate terms

Increase clarity


Page 243: Taxonomy Fundamentals - SLA 2014

Machine Aided Indexing - Intellectual Components

Word list or thesaurus

Knowledge base (rules-based)

Natural language (semantic)

Editorial evaluation

Page 244: Taxonomy Fundamentals - SLA 2014

Example: M.A.I.™ Software Components

Rule Builder

Concept Extractor

Statistics Collector
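
As a rough illustration of how a rule builder, concept extractor, and statistics collector can fit together, here is a minimal Python sketch. The thesaurus entries, the context-word rule format, and `suggest_terms` are hypothetical simplifications; they are not M.A.I.™'s actual rule syntax or API.

```python
import re
from collections import Counter

# Hypothetical thesaurus: each text form (preferred term or synonym) maps
# to the preferred indexing term.
THESAURUS = {
    "taxonomy": "Taxonomies",
    "taxonomies": "Taxonomies",
    "controlled vocabulary": "Controlled vocabularies",
    "mercury": "Mercury (planet)",
}

# Hypothetical rules for ambiguous forms: suggest the term only if at least
# one of these context words also appears in the text.
RULES = {
    "mercury": {"planet", "orbit", "solar"},
}

usage_stats = Counter()  # statistics collector: how often each term is suggested


def suggest_terms(text):
    """Concept extractor: return the preferred terms suggested for the text."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    suggested = set()
    for form, preferred in THESAURUS.items():
        if not re.search(r"\b" + re.escape(form) + r"\b", lowered):
            continue
        context = RULES.get(form)
        if context is not None and not (context & words):
            continue  # ambiguous match with no supporting context
        suggested.add(preferred)
    usage_stats.update(suggested)
    return sorted(suggested)


print(suggest_terms("Mercury is the planet closest to the Sun; a taxonomy helps."))
print(suggest_terms("Mercury levels in fish"))
```

The second call returns no terms: “mercury” matches the thesaurus, but the rule finds none of its context words, which is the kind of disambiguation the goals slide describes.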

Page 245: Taxonomy Fundamentals - SLA 2014

DATA HARMONY DISCOVERY TOUR

Page 246: Taxonomy Fundamentals - SLA 2014

Taxonomies in Search

Page 247: Taxonomy Fundamentals - SLA 2014

Do the Data FIRST

• What do you have?
• What does it need?
• How would you LIKE to access it?
• Look at the data BEFORE you create the specifications: a DTD built without data is not going to work.
• Then choose the system that will support your data.

Page 248: Taxonomy Fundamentals - SLA 2014

My Main Frustration

1. Select hardware

2. Select software

3. Design system

4. Try to load the data

5. Add the taxonomy, if at all

That’s BACKWARDS!

Page 249: Taxonomy Fundamentals - SLA 2014

Why Does Search Fail?

Most large organizations have 5 to 7 different search systems, all disappointing and sitting on the shelf.

• Inconsistent results
• Unclear path to results
• Lack of a single, unified, clear, consistent vocabulary
• Not tied to data governance:
  - Taxonomy
  - Other metadata

Page 250: Taxonomy Fundamentals - SLA 2014

SEARCH

• How search works
• Measuring accuracy in search:
  - Precision
  - Recall
  - Relevance
• Search theoretical basis: Bayes, Boole, and the rest of the guys
• The taxonomy effect

Page 251: Taxonomy Fundamentals - SLA 2014

Parts of Search

• Search software:
  - Inverted index
  - Search algorithms
• Presentation layer:
  - Search box
  - Autocompletion
  - Related and narrower terms
  - Hierarchical display

Page 252: Taxonomy Fundamentals - SLA 2014

Inverted Files and Boolean are Basic to ALL Search

[Diagram components: inverted file index; searchable index; taxonomy/thesaurus; hierarchical display]

Note: not available in all systems!

Page 253: Taxonomy Fundamentals - SLA 2014

Creating an Inverted File Index

Sample DOCUMENT:

“Outline of Presentation”
1 Define key terminology
2 Thesaurus tools
    Features
    Functions
3 Costs
    Thesaurus construction
    Thesaurus tools
4 Why & when?

Page 254: Taxonomy Fundamentals - SLA 2014

Simple Inverted File Index of the Terms from the “Outline”

&, 1, 2, 3, 4, construction, costs, define, features, functions, key, of, outline, presentation, terminology, thesaurus, tools, when, why

Page 255: Taxonomy Fundamentals - SLA 2014

Complex Inverted File Index - Placement, Location added

& - Stop
1 - Stop
2 - Stop
3 - Stop
4 - Stop
construction - L7, P2, SH
costs - L6, P1, H
define - L2, P1, H
features - L4, P1, SH
functions - L5, P1, SH
key - L2, P2, H
of - Stop
outline - L1, P1, T
presentation - L1, P3, T
terminology - L2, P3, H
thesaurus - (1) L3, P1, H; (2) L7, P1, SH; (3) L8, P1, SH
tools - (1) L3, P2, H; (2) L8, P2, SH
when - L9, P3, H
why - L9, P1, H
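
Both inverted files can be rebuilt from the sample document with a short Python sketch. It reads L as line number and P as word position within the line, and T, H, and SH as title, heading, and subheading (our interpretation of the slide's codes). Following the slides, outline numbers are listed as stop entries without taking a position, while stop words inside a line (“of”, “&”) do take one. The `STOP_WORDS` set and `build_index` name are assumptions.

```python
import re
from collections import defaultdict

STOP_WORDS = {"of", "&"}  # assumed stop list, matching the slide's "Stop" entries

# The sample document, one (structural code, text) pair per line.
SAMPLE = [
    ("T", "Outline of Presentation"),
    ("H", "1 Define key terminology"),
    ("H", "2 Thesaurus tools"),
    ("SH", "Features"),
    ("SH", "Functions"),
    ("H", "3 Costs"),
    ("SH", "Thesaurus construction"),
    ("SH", "Thesaurus tools"),
    ("H", "4 Why & when?"),
]


def build_index(document):
    """Map each term to 'Stop' or to its postings as (line, position, code)."""
    index = defaultdict(list)
    for line_no, (code, text) in enumerate(document, start=1):
        tokens = re.findall(r"[a-z]+|\d+|&", text.lower())
        # A leading outline number is a stop entry and takes no position.
        if tokens and tokens[0].isdigit():
            index[tokens.pop(0)] = "Stop"
        for position, token in enumerate(tokens, start=1):
            if token in STOP_WORDS:
                index[token] = "Stop"
            else:
                index[token].append((line_no, position, code))
    return dict(sorted(index.items()))  # sorted term order, as on the slides


index = build_index(SAMPLE)
print(", ".join(index))    # the simple inverted file: just the sorted terms
print(index["thesaurus"])  # [(3, 1, 'H'), (7, 1, 'SH'), (8, 1, 'SH')]
```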

Page 256: Taxonomy Fundamentals - SLA 2014

Search Presentation Layer

Automatic completion and type-ahead from the thesaurus
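
Type-ahead from the thesaurus can be sketched as a prefix lookup over the sorted list of thesaurus terms. In this Python sketch the term list is hypothetical and `build_completer` is an illustrative name; a production system would also index non-preferred terms and return their preferred forms.

```python
import bisect


def build_completer(terms):
    """Return a function that suggests thesaurus terms beginning with a prefix."""
    entries = sorted((term.lower(), term) for term in terms)
    keys = [key for key, _ in entries]

    def complete(prefix, limit=10):
        p = prefix.lower()
        # Binary search finds the first term >= prefix; all matches follow it.
        start = bisect.bisect_left(keys, p)
        matches = []
        for key, term in entries[start:start + limit]:
            if not key.startswith(p):
                break
            matches.append(term)
        return matches

    return complete


complete = build_completer(
    ["Taxonomies", "Thesauri", "Thesaurus construction", "Thesaurus tools"]
)
print(complete("thes"))  # ['Thesauri', 'Thesaurus construction', 'Thesaurus tools']
```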

Page 257: Taxonomy Fundamentals - SLA 2014

Search Presentation Layer

Related

Narrower

Page 258: Taxonomy Fundamentals - SLA 2014

Search Presentation Layer

The hierarchical view of the thesaurus is also a browsable view of the content.

The numbers show the number of hits:
1. For the term
2. For the branch
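
The two counts can be computed by walking the hierarchy. This Python sketch assumes a hypothetical broader-to-narrower mapping and per-term hit counts. It simply sums hits down the branch, so a document indexed with both a broader and a narrower term is counted twice; counting distinct documents would need sets of document IDs instead.

```python
def count_hits(children, hits, root):
    """Return {term: (hits for the term, hits for its whole branch)}."""
    result = {}

    def visit(term):
        own = hits.get(term, 0)
        branch = own + sum(visit(child) for child in children.get(term, []))
        result[term] = (own, branch)
        return branch

    visit(root)
    return result


def show(children, counts, term, depth=0):
    """Print the hierarchy as 'term (term hits / branch hits)', indented."""
    own, branch = counts[term]
    print("  " * depth + f"{term} ({own} / {branch})")
    for child in children.get(term, []):
        show(children, counts, child, depth + 1)


# Hypothetical hierarchy and hit counts.
children = {
    "Search": ["Boolean search", "Statistical search"],
    "Statistical search": ["Bayesian inference"],
}
hits = {"Search": 2, "Boolean search": 5, "Statistical search": 3,
        "Bayesian inference": 4}

counts = count_hits(children, hits, "Search")
show(children, counts, "Search")
```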

Page 259: Taxonomy Fundamentals - SLA 2014

How Does Search Work?

Many parts:
• Search software (of course)
• Computer network
• Parsing of text (the “inverted file”)
• Well-formed or structured text: CLEAN DATA
• Computer software (network)
• Computer hardware
• Telecommunications connection
• Training sets for statistical systems

Page 260: Taxonomy Fundamentals - SLA 2014

Technical Parts of Search

• Search technology: ranking algorithms, query language, federators, cache
• Inverted index (as discussed above)
• Other enhancements
• Presentation layer

Page 261: Taxonomy Fundamentals - SLA 2014

Access Innovations – Complex Farm With Perfect Search

[Diagram components: source data; cleanup, etc.; repository XIS (cache); index builders; cache builders; deploy hub; query servers; federators; Search Harmony; presentation layer; query]

Page 262: Taxonomy Fundamentals - SLA 2014

FAST Search Example: Core Architectural Components

[Diagram components: content sources (web content; files, documents; databases; email, groupware; custom applications); web crawler, file traverser, database connector, email connector, and custom connector; content API; document processor pipeline; index DB; search server; query processor pipeline; query API; query and results for vertical applications, portals, custom front-ends, and mobile devices; content push; filter server with agent DB and alerts; management API; administrator’s dashboard; Data Harmony Governance API; MAIstro; Search Harmony]

Page 263: Taxonomy Fundamentals - SLA 2014

Measuring Accuracy in Search

• Relevance
• Recall
• Precision
• Accuracy: hits, misses, noise
• Ranking
• Linguistics
• Query processing
• Results processing
• Display
• Search refinement
• Usability
• Business rules

Page 264: Taxonomy Fundamentals - SLA 2014

Relevance

• How well a set of returned documents answers the information need
• “Accuracy” related to the objective of the search:
  - Different user communities
  - Information resources
• Tension between user needs and the available context
• A confidence “guesstimate”

Page 265: Taxonomy Fundamentals - SLA 2014

The Formulas

$$\text{Recall} = \frac{\text{Number of relevant items retrieved}}{\text{Number of relevant items in the collection}}$$

$$\text{Precision} = \frac{\text{Number of relevant items retrieved}}{\text{Number of items retrieved}}$$

Relevance = Germane (Precision), Pertinent (Recall)
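
For a single query, the formulas reduce to set arithmetic on document IDs. In this Python sketch the retrieved and relevant sets are hypothetical relevance judgments. It also reports hits, misses, and noise: hits are relevant items retrieved, noise is retrieved but not relevant, and misses are relevant but not retrieved.

```python
def evaluate(retrieved, relevant):
    """Return precision, recall, and hit/noise/miss counts for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant     # relevant items retrieved
    noise = retrieved - relevant    # retrieved, but not relevant
    misses = relevant - retrieved   # relevant, but not retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return {"precision": precision, "recall": recall,
            "hits": len(hits), "noise": len(noise), "misses": len(misses)}


# Hypothetical query: documents 1-10 retrieved; the collection holds 8
# relevant documents, 5 of which were retrieved.
result = evaluate(retrieved=range(1, 11), relevant={2, 4, 6, 8, 10, 12, 14, 16})
print(result)
# {'precision': 0.5, 'recall': 0.625, 'hits': 5, 'noise': 5, 'misses': 3}
```

Note that someone still had to decide which eight documents are relevant; the arithmetic is objective, the judgments are not.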

Page 266: Taxonomy Fundamentals - SLA 2014

Measuring Relevance

• Concepts
• Context
• Age of documents
• Completeness (recall)
• Quality

Statistically determined? Nope, it is subjective.

Someone has to determine the rightness of the item. A confidence factor = canard!

Page 267: Taxonomy Fundamentals - SLA 2014

Kinds of Search

• Bayesian: FAST, Lucene, Autonomy / Verity
• Boolean: Dialog, Endeca, Perfect Search
• Ranking algorithms: Google

Page 268: Taxonomy Fundamentals - SLA 2014

George Boole and Boolean Algebra

George Boole, mathematician, 1815-1864

• Boolean algebra: an algebraic system of logic
• Operators: AND, OR, NOT, AND NOT
• Used in Dialog, BRS, Stairs

Page 269: Taxonomy Fundamentals - SLA 2014

Boolean Representation

Venn diagram showing the intersection of sets A AND B (in violet), the union of sets A OR B (all the colored regions), and set A XOR B (all the colored regions except the violet). The "universe" is represented by the rectangular frame.
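
In a search engine these sets are postings: the document IDs the inverted file records for each term. A minimal Python sketch with hypothetical postings shows each operator as a set operation; NOT needs the whole collection as its universe.

```python
# Hypothetical postings: the IDs of the documents indexed with each term.
universe = {1, 2, 3, 4, 5, 6}  # every document in the collection
a = {1, 2, 3, 5}               # documents indexed "taxonomy"
b = {2, 3, 4}                  # documents indexed "thesaurus"

a_and_b = a & b       # intersection: both terms
a_or_b = a | b        # union: either term
a_xor_b = a ^ b       # one term or the other, but not both
a_and_not_b = a - b   # "taxonomy" but not "thesaurus"
not_a = universe - a  # NOT is taken relative to the whole collection

print(sorted(a_and_b), sorted(a_or_b), sorted(a_xor_b))
# [2, 3] [1, 2, 3, 4, 5] [1, 4, 5]
```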

Page 270: Taxonomy Fundamentals - SLA 2014

Bayes and Bayes’ Theorem

Thomas Bayes, mathematician, 1702-1761

Bayes’ theorem:
• Uses probability inductively
• Established a mathematical basis for probabilistic inference

WHAT? A means of calculating, from the number of times an event has occurred and has not occurred, the probability that it will occur in future trials.
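
Both ideas fit in a few lines of Python. `posterior` applies Bayes' theorem, P(H | E) = P(E | H) P(H) / P(E); the example asks how likely a document is to be about taxonomies given that it contains the word “thesaurus”, using made-up probabilities. `rule_of_succession` answers the “WHAT?” question in its classic form, Laplace's rule (s + 1) / (n + 2), which assumes a uniform prior. That prior is exactly the kind of assumption the cautions that follow are about.

```python
from fractions import Fraction


def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e


def rule_of_succession(occurred, failed):
    """Laplace's estimate that the event occurs on the next trial,
    assuming a uniform prior over its unknown probability."""
    return Fraction(occurred + 1, occurred + failed + 2)


# Hypothetical numbers: 10% of documents are about taxonomies; "thesaurus"
# appears in 80% of those and in 10% of all others.
print(posterior(Fraction(1, 10), Fraction(8, 10), Fraction(1, 10)))  # 8/17

# The event occurred 9 times and failed once in 10 trials.
print(rule_of_succession(9, 1))  # 5/6
```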

Page 271: Taxonomy Fundamentals - SLA 2014

Bayesian Methods - Cautions

A user might wish to change the distribution of probabilities.

A user will make a novel request for information in a previously unanticipated way.

The computational difficulty of exploring a previously unknown network.

The quality and extent of the prior beliefs used in Bayesian inference processing.

Page 272: Taxonomy Fundamentals - SLA 2014

Bayesian Methods - Cautions (continued)

A Bayesian network is only as useful as the prior knowledge is reliable.

An optimistic or pessimistic expectation of the quality of these prior beliefs will distort the entire network and invalidate the results.

Must carefully select the statistical distribution used to model the data.

Must have the proper distribution model to describe the data.

That is… you have to constantly train and retrain the system on the data.

Page 273: Taxonomy Fundamentals - SLA 2014

Basic Areas of Natural Language Processing (NLP)

• Syntactic
• Semantic
• Morphological
• Phraseological
• Lemmatization (stemming)
• Statistical
• Grammatical
• Common sense

Page 274: Taxonomy Fundamentals - SLA 2014

Basic Areas of Automatic Language Processing (ALP)

• Auto translation
• Auto indexing
• Auto abstracting
• Artificial intelligence
• Searching
• Spell checking
• Semantic Web
• Natural language processing (NLP)
• Computational linguistics

Page 275: Taxonomy Fundamentals - SLA 2014

Statistical Search

• Cluster analysis
• Neural networks
• Co-occurrence
• Bayesian inference
• Latent semantic
• Etc.
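
Co-occurrence is the simplest of these statistical signals: terms that often appear in the same documents are probably related. Here is a minimal Python sketch over hypothetical, already-tokenized documents. Real systems normalize the raw counts (for example, against each term's overall frequency) before treating a pair as related.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(documents):
    """Count the documents in which each unordered pair of terms appears together."""
    counts = Counter()
    for terms in documents:
        # Sort so ("search", "taxonomy") and ("taxonomy", "search") share one key.
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts


docs = [
    ["taxonomy", "thesaurus", "search"],
    ["taxonomy", "search"],
    ["thesaurus", "indexing"],
]
print(cooccurrence(docs).most_common(1))  # [(('search', 'taxonomy'), 2)]
```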

Page 276: Taxonomy Fundamentals - SLA 2014

Word and Term Parsing

• Stemming: -ing, -ed, -es, -’s, -s’, etc.
• Depluralization
• Truncation: left and right
• Wild cards: Organi*ation
• Variant spellings: Centre, Center
• Hyphens
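
A few of these parsing steps can be sketched in Python. The suffix list follows the slide; the variant table, the choice to join hyphenated words (one possible policy; some systems index both forms), and the function names are assumptions. Production systems use proper stemmers, such as Porter's, and curated variant lists rather than rules this crude.

```python
import re

# Suffixes from the slide, checked longest first.
SUFFIXES = ("ing", "'s", "s'", "ed", "es", "s")

# Hypothetical variant-spelling table mapping each variant to one indexed form.
VARIANTS = {"centre": "center", "organisation": "organization"}


def naive_stem(word):
    """Strip one common suffix (a crude illustration, not a real stemmer)."""
    w = word.lower()
    for suffix in SUFFIXES:
        if suffix == "s" and w.endswith(("ss", "us")):
            continue  # keep words such as "class" and "thesaurus" intact
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def normalize(word):
    """Lowercase, join hyphenated forms, and map variant spellings."""
    w = word.lower().replace("-", "")
    return VARIANTS.get(w, w)


def wildcard(pattern):
    """Compile a pattern where '*' matches any run of letters and '?' one letter.

    A leading '*' gives left truncation; a trailing '*' gives right truncation.
    """
    parts = []
    for ch in pattern.lower():
        if ch == "*":
            parts.append("[a-z]*")
        elif ch == "?":
            parts.append("[a-z]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


print(naive_stem("indexing"), naive_stem("indexes"), naive_stem("thesaurus"))
print(normalize("Centre"), normalize("e-mail"))
print(bool(wildcard("Organi*ation").fullmatch("organisation")))
```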

Page 277: Taxonomy Fundamentals - SLA 2014

The Taxonomy Effect

• Where do the terms go?
• How are they used in search?
• What other ways can I use the taxonomy in search?

Page 278: Taxonomy Fundamentals - SLA 2014

For search all publications

Search database for Journals and pubs

Bookstore search

Search of 53 crawled sites including journals, books, web site, conference sites, etc.

Site search

Navigation

Page 279: Taxonomy Fundamentals - SLA 2014

Taxonomy-Driven Search Presentation

Navigate the full taxonomy “tree”

BROWSE

Auto-completion using the taxonomy

Guide the user

Page 280: Taxonomy Fundamentals - SLA 2014

Subject Browsing

Page 281: Taxonomy Fundamentals - SLA 2014

Targeted Resources Based on Subject or User Role

Page 282: Taxonomy Fundamentals - SLA 2014

Member Profile Tagging

User pastes or uploads CV

Button to auto-extract taxonomy attributes

Page 283: Taxonomy Fundamentals - SLA 2014

Adding Terms to SharePoint

[Diagram: TaxoTerm Server / Data Harmony (M.A.I.), Event Handler, Microsoft SharePoint Server 2010]

1. User uploads a document to a SharePoint space.
2. Before the document is uploaded to the SharePoint server, the Event Handler sends it to Data Harmony.
3. Data Harmony returns subject metadata, automatically attaching indexing terms before the document is uploaded to MOSS.

Page 284: Taxonomy Fundamentals - SLA 2014

SharePoint 2010 Only Shows 10 Lines of the Taxonomy

This add-on makes it all viewable.

Page 285: Taxonomy Fundamentals - SLA 2014

Taxonomies Added in Search Example: Core Architectural Components

[Diagram components: content sources (web content; files, documents; databases; email, groupware; custom applications); web crawler, file traverser, database connector, email connector, and custom connector; content API; document processor pipeline; index DB; search server; query processor pipeline; query API; query and results for vertical applications, portals, custom front-ends, and mobile devices; content push; filter server with agent DB and alerts; FAST management API; administrator’s dashboard; Data Harmony Governance API; MAIstro; Search Harmony; callout: “Use taxonomy terms here”]

Page 286: Taxonomy Fundamentals - SLA 2014

Auto-suggestion of Taxonomy Terms

Populate Keywords, Descriptors, Indexing terms, etc.

Allow for manual review of auto-tagging for quality assurance.

Page 287: Taxonomy Fundamentals - SLA 2014

Where do I use a taxonomy?

Page 288: Taxonomy Fundamentals - SLA 2014

The Workflow

1. Gather source data
2. Tag and create metadata
3. Put in database with tags
4. Build search inverted index
5. Create user interface

[Diagram components: client data (full text, HTML, PDF, data feeds, etc.); client taxonomy; Thesaurus Master; Machine Aided Indexer (M.A.I.™); inline tagging; metadata and entity extractor; automatic summarization; database repository; search software; search presentation layer (browse by subject, auto-completion, broader terms, narrower terms, related terms), which increases accuracy]

Page 289: Taxonomy Fundamentals - SLA 2014

Taxonomy In SharePoint

[Diagram components: client data (full text, HTML, PDF, data feeds, etc.); client taxonomy; Thesaurus Master; Machine Aided Indexer (M.A.I.™); inline tagging; metadata and entity extractor; automatic summarization; repository; search software; search presentation (browse by subject, auto-completion, broader terms, narrower terms, related terms): 90% accuracy]

[Data Harmony fully integrated with MOSS.]

Page 290: Taxonomy Fundamentals - SLA 2014

Adding Terms to Information Objects

• Part of the record: XML, MARC
• A relational table pointing the terms to a record ID number (secondary key)
• Adding data to the HTML META NAME="KEYWORDS" element
• Many other options

Page 291: Taxonomy Fundamentals - SLA 2014

Part of the Record - XML

• Added as an element in the XML record
• Need an element to put the data in, e.g. <TaxonomyTerm> (XML element names cannot contain spaces)
• Capture the terms when creating the records
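
Here is a minimal Python sketch of adding terms as elements in an XML record, using the standard library's ElementTree. The `<record>` structure and the `<TaxonomyTerm>` element name are hypothetical; use whatever element your DTD or schema defines for subject terms.

```python
import xml.etree.ElementTree as ET


def add_taxonomy_terms(record_xml, terms):
    """Append one <TaxonomyTerm> element per term to an XML record."""
    record = ET.fromstring(record_xml)
    for term in terms:
        ET.SubElement(record, "TaxonomyTerm").text = term
    return ET.tostring(record, encoding="unicode")


record = '<record id="42"><title>Inverted files</title></record>'
print(add_taxonomy_terms(record, ["Search", "Inverted indexes"]))
```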

Page 292: Taxonomy Fundamentals - SLA 2014

Author Submission Module

The author pastes the data into the document template, attaching images and graphs as necessary.

Page 293: Taxonomy Fundamentals - SLA 2014

Editorial Workflow Integration: Author Submission Module

The author fills in the document template, attaching images and graphs as necessary.

An API calls Data Harmony and generates a list of indexing terms based on the content.

Page 294: Taxonomy Fundamentals - SLA 2014

Editorial Workflow Integration: Author Submission Module

Authors review the indexing and may change it.

Content is stored in a data repository as HTML, XML, etc.

Page 295: Taxonomy Fundamentals - SLA 2014

In the HTML Record
• Makes it crawlable for the internet
• Used in CMS (content management system) applications

Add to the HTML:
• Manually
• In Dreamweaver
• In your CMS, like Extron
• Author submissions example
• Do the same with SharePoint

Page 296: Taxonomy Fundamentals - SLA 2014

META NAME="KEYWORDS"
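
Generating that element from a document's taxonomy terms takes a few lines of Python. This sketch assumes the terms are already selected; `keywords_meta` is a hypothetical helper. The escaping matters because terms such as “R&D” contain characters that are special in HTML.

```python
from html import escape


def keywords_meta(terms):
    """Render taxonomy terms as an HTML <meta name="keywords"> element."""
    unique = list(dict.fromkeys(terms))  # drop repeats, keep the original order
    content = escape(", ".join(unique), quote=True)
    return f'<meta name="keywords" content="{content}">'


print(keywords_meta(["Taxonomies", "Thesauri", "Taxonomies", "R&D"]))
# <meta name="keywords" content="Taxonomies, Thesauri, R&amp;D">
```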

Page 297: Taxonomy Fundamentals - SLA 2014

In Relational Database Table

• Primary key: the record
• Secondary key: all the metadata, like taxonomy terms, author, and publication date
• Used in Oracle, SQL, etc.
• Need a field to put the taxonomy data in
• Supports “faceted search”: each item in a separate field, element, or table
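
Here is a minimal sketch of this design using Python's built-in `sqlite3` module. The table names, column names, and rows are hypothetical. The pattern is a record table keyed by record ID plus a term table that points each taxonomy term at a record ID, which turns per-term facet counts and filters into simple queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE record (
        record_id INTEGER PRIMARY KEY,   -- primary key: the record
        title     TEXT,
        author    TEXT,
        pub_date  TEXT
    );
    -- One row per (record, taxonomy term); record_id points back to the record.
    CREATE TABLE record_term (
        record_id INTEGER REFERENCES record(record_id),
        term      TEXT,
        PRIMARY KEY (record_id, term)
    );
""")
conn.executemany("INSERT INTO record VALUES (?, ?, ?, ?)", [
    (1, "Building a thesaurus", "Author A", "2013-06-07"),
    (2, "Measuring search accuracy", "Author B", "2013-06-07"),
    (3, "Inverted files", "Author A", "2012-01-15"),
])
conn.executemany("INSERT INTO record_term VALUES (?, ?)", [
    (1, "Taxonomies"), (1, "Thesauri"),
    (2, "Search"), (2, "Taxonomies"),
    (3, "Search"), (3, "Inverted indexes"),
])

# Facet counts: how many records carry each term.
facets = conn.execute("""
    SELECT term, COUNT(*) AS n FROM record_term
    GROUP BY term ORDER BY n DESC, term
""").fetchall()

# Faceted filter: records tagged "Search" and written by Author A.
titles = conn.execute("""
    SELECT r.title FROM record AS r
    JOIN record_term AS t ON t.record_id = r.record_id
    WHERE t.term = ? AND r.author = ?
""", ("Search", "Author A")).fetchall()

print(facets)  # [('Search', 2), ('Taxonomies', 2), ('Inverted indexes', 1), ('Thesauri', 1)]
print(titles)  # [('Inverted files',)]
```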

Page 298: Taxonomy Fundamentals - SLA 2014

RDBMS Connection

Taxonomy term table

Page 299: Taxonomy Fundamentals - SLA 2014

Using Taxonomies in Applications

• Improve search
• Subject browsing
• Mobile intelligence
• Targeted resources based on subject or user role
• Link to society resources
• Author submission module
• Author authority database
• Expert reviewer identification
• Member profiles
• Data visualization
• More like this
• In “indexing” or categorizing, as subject metadata
• In content management systems
• In SharePoint
• In mashups
• In social networking sites
• In author tagging
• In filtering data, e.g., spam filters and RSS feeds
• In web crawlers
• Social media: community

Page 300: Taxonomy Fundamentals - SLA 2014

A Quick Look Behind the Scenes

[Diagram: database management system, thesaurus tool, indexing tool]

• Validate terms
• Add terms and rules
• Change terms and rules
• Delete terms and rules
• Search thesaurus
• Validate term entry
• Block invalid terms
• Record candidates
• Establish rules for term use
• Suggest indexing terms

Page 301: Taxonomy Fundamentals - SLA 2014

Taxonomy view

Thesaurus Term Record view

Page 302: Taxonomy Fundamentals - SLA 2014

Where Does the Subject Metadata Go?

• Apply to the content itself
• Use the meta name field in the HTML header
• Connect search to the keywords in the SQL or other database tables

Page 303: Taxonomy Fundamentals - SLA 2014

HTML Header

Page 304: Taxonomy Fundamentals - SLA 2014

Suggested taxonomy descriptors

Page 305: Taxonomy Fundamentals - SLA 2014

Page 306: Taxonomy Fundamentals - SLA 2014

Integrate Taxonomy to Enhance Findability

• Browsable categories of a directory
• Browsable faceted navigation
• Smart search for term equivalents
• Taxonomy terms (original or modified) as labels
• Navigation aids incorporate taxonomy terms and relationships

Page 307: Taxonomy Fundamentals - SLA 2014

More Taxonomy Enrichment

• Spelling alternatives and correction
• Related concepts
• Statistical information about the metadata
• Navigation or drill-downs
• Search refinement
• Recursive sets
• Concept linking
• Dictionary lookup (in taxonomy glossary)

Page 308: Taxonomy Fundamentals - SLA 2014

Brand is repeated in several spots and tied to search as well

Page 309: Taxonomy Fundamentals - SLA 2014

Database Plus Search Workflow

[Diagram components: raw full-text data feeds; printed source materials; data crawls on 53+ sources; XIS™ creation; Taxonomy Thesaurus Master®; taxonomy terms; M.A.I.™ Concept Extractor; M.A.I.™ Rule Base; add metadata; XIS™ repository; load to Perfect Search; Search Harmony™ display search; SQL for e-commerce]

Save data to search and repositories at the same time

Page 310: Taxonomy Fundamentals - SLA 2014

Database Plus Search Workflow

[Diagram stages: source data; clean and enhance data; search data. Components: raw full-text data feeds; printed source materials; data crawls on data sources; XIS creation; Taxonomy Thesaurus Master; taxonomy terms; MAI Concept Extractor; MAI Rule Base; add metadata; XIS repository; load to search; Search Harmony display search; SQL for e-commerce]

Page 311: Taxonomy Fundamentals - SLA 2014

Use Case: Inline Tagging

Show the exact point where the concept is mentioned

Mouse-over to view the term record

Statistical summary, showing the number of times each term is mentioned in the article
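
Inline tagging and the statistical summary can be sketched in Python with one regular expression built from the thesaurus. The thesaurus mapping (lowercase text form to preferred term), the `<span>` markup, and `inline_tag` are assumptions, and the input is assumed to be plain text with no existing HTML markup.

```python
import re
from collections import Counter
from html import escape


def inline_tag(text, thesaurus):
    """Wrap each term mention in a <span> and count mentions per preferred term.

    thesaurus maps a lowercase text form (preferred or non-preferred) to its
    preferred term. Assumes text is plain text, not HTML.
    """
    # Try longer forms first so "thesaurus construction" beats "thesaurus".
    forms = sorted(thesaurus, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, forms)) + r")\b", re.IGNORECASE
    )
    counts = Counter()

    def tag(match):
        preferred = thesaurus[match.group(0).lower()]
        counts[preferred] += 1
        return f'<span class="term" title="{escape(preferred)}">{match.group(0)}</span>'

    return pattern.sub(tag, text), counts


thesaurus = {
    "thesaurus": "Thesauri",
    "thesauri": "Thesauri",
    "taxonomy": "Taxonomies",
    "thesaurus construction": "Thesaurus construction",
}
tagged, counts = inline_tag(
    "Thesaurus construction starts from a taxonomy. A thesaurus adds relationships.",
    thesaurus,
)
print(tagged)
print(counts.most_common())  # the per-article statistical summary
```

The `title` attribute gives a simple mouse-over; a real viewer would link each span to the full term record instead.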

Page 312: Taxonomy Fundamentals - SLA 2014

Inline Tagging HTML View

Page 313: Taxonomy Fundamentals - SLA 2014

XML View for Inline Tagging

Page 314: Taxonomy Fundamentals - SLA 2014

Taxonomy view

Thesaurus Term Record view

Page 315: Taxonomy Fundamentals - SLA 2014

A TAXING SITUATION

The New Board Game:
• Applications
• Implementation
• The taxonomy

Page 316: Taxonomy Fundamentals - SLA 2014

The Changing Faces of Web Taxonomies

…and how the information is delivered, from the current site to the new version

Depends on TAXONOMY:
• Personalization
• Feeding ads
• Consistent information

Page 317: Taxonomy Fundamentals - SLA 2014

Page 318: Taxonomy Fundamentals - SLA 2014

Page 319: Taxonomy Fundamentals - SLA 2014

HTML Headers: META NAME="KEYWORDS"

Use the taxonomy here

Page 320: Taxonomy Fundamentals - SLA 2014

Page 321: Taxonomy Fundamentals - SLA 2014

More Innovations!

• Link topic to article to author to event
• Make visual links within domain
• Enable authors to submit and categorize conference submissions
• Create author authority database linking to co-authors, topics, locations, etc.
• Create expert reviewer database
• Create member profiles with alternate names, publications, tagged by topic
• Visualize data and domain distribution
• Display interest connections in social network
• Deliver accurate targeted information through mobile applications
• Etc.

Page 322: Taxonomy Fundamentals - SLA 2014

Change to Ready, Aim, Fire!

• Follow the data
• Look at the data, format and content
• Design taxonomy for data
• Leverage the standards
• Use taxonomy to tag data
• Choose search and repository software for data
• Load the data into the system
• Keep your eye on the target

Page 323: Taxonomy Fundamentals - SLA 2014

Standards for Monolingual Thesauri

• TEST (Thesaurus of Engineering and Scientific Terms) - COSATI, 1967
• AFNOR NF Z 47-100, 1981 - French
• DIN 1463, 1987-1993 - German
• NISO Z39.19-1993 - American

Page 324: Taxonomy Fundamentals - SLA 2014

Where Can I Get Taxonomy Standards?

• www.niso.org - Z39.19 (2010), Controlled Vocabularies
• www.iso.org - ISO 25964 parts 1 and 2 (2011 and 2013)
• www.bsi.uk.co
• www.w3c.org - SKOS and OWL
• www.accessinn.com/library

Page 325: Taxonomy Fundamentals - SLA 2014

Suggested Reading

• F.W. Lancaster, Vocabulary Control, 1986
• Aitchison, Gilchrist and Bawden, Thesaurus Construction and Use: A Practical Manual, 4th edition
• Heather Hedden, The Accidental Taxonomist
• TaxoDiary.com (blog site)

Page 326: Taxonomy Fundamentals - SLA 2014

Suggested Reading

The introduction to any thesaurus: INSPEC, NICEM, Psychological Abstracts, etc.

Page 327: Taxonomy Fundamentals - SLA 2014

It Just Takes a Little Imagination

Thank you

Marjorie M.K. Hlava, President
Bob Kasenchak, Project Coordinator
Access Innovations, Inc.
[email protected]
[email protected]