Taxonomy Fundamentals - SLA 2014
DESCRIPTION
An all-day version of Access Innovations' Taxonomy Fundamentals workshop, presented by Marjorie M.K. Hlava and Bob Kasenchak at the 2014 Special Libraries Association (SLA) annual meeting in Vancouver, British Columbia on June 7, 2014.
TRANSCRIPT
Taxonomy Fundamentals
Why build a taxonomy?
SLA – Vancouver – June 7, 2014
www.accessinn.com
www.dataharmony.com
505-998-0800
Marjorie M.K. Hlava, President and Chief Scientist
Bob Kasenchak, Project Coordinator
Access Innovations, Inc.
Copyright © 2013 Access Innovations, Inc.
A fast-moving and powerful introduction to both the theoretical and practical aspects of building a taxonomy, thesaurus, and ontology. A well-built taxonomy is part of the foundation of the information architecture underlying web sites, corporate intranets, search/retrieval, and access to relevant content in databases. After defining controlled vocabularies and identifying core standards, you will explore key concepts of taxonomy, thesaurus, indexing, classification, and filtering. Discussion will include the basics of taxonomy records and fundamental term relationships. Attendees will put concepts into practice through multiple exercises, and taxonomy, indexing, and related software tools will be demonstrated.
Introduction To Taxonomy Concepts
About Access Innovations
Access Innovations are experts in content creation, enrichment, and conversion services. We provide services to semantically enrich and tag raw text into highly structured data. We deliver clean, well-formed, metadata-enriched content so our clients can reuse, repurpose, store, and find their knowledge assets. We go beyond the standards to build taxonomies and other data control structures as a solid foundation for your information. Our services and software allow organizations to use and present their information to both internal and external constituents by leveraging search, presentation, and e-commerce. We change search to found!
Quick Facts
• Founded in 1978
• Headquartered in Albuquerque, NM
• Privately held
• Delivered more than 2000 engagements
What we do
Access Innovations
• Ensure clean, well-formed content
• Create Knowledge Organization Systems (KOS)
Data Harmony Tools
• To automatically index content
• To manage KOS and more
• To semantically enrich the content
• To organize the content
Visualization tools to portray the data
Outline of the Day
• Why the excitement
• What is a Taxonomy
• Card Sort
• How to build a taxonomy
• Term relationships
• Thesaurus Examples
• Pre and Post Coordination
• What are we controlling
• Vocabulary Options
• TaxoMatch
• Term Forms
• Facets / Notation / Roles / Treatment / Weighting
• Auto Indexing
• A Taxing Situation
• Search
• Where do I use it?
• Standards and references
Why The Excitement?
• Makes information findable!
• Cut search time by 50%! (The Weather Channel)
• Leverages information in new ways
• User satisfaction
• Organizes topical areas and web sites
• Provides better online help
• Customer support 30x more costly than web self-service*
*(Forrester Research "Tier Zero Customer Support" 1999)
Taxonomies are found…
• In “indexing”, tagging, categorizing, subject metadata
• In search - precision, recall
• In content management systems, web sites
• In SharePoint to replace term tree, tag uploads
• In mashups, repackaging, repurposing data
• In social networking sites
• In author tagging - peer reviewer selection
• In filtering data – e.g., spam filters and RSS feeds
• In web crawlers
• In text analytics – trend analysis
• … and much more
Because taxonomies make them work
Where Does Implementation Happen?
• At the backend
• When the records / articles are added to the production system
• When the search software’s “inverted file” is created
• When the HTML for the web page is created
Heart Of The “Big Data” Production Process
From the production side to the website display, carry the taxonomy descriptors for use in precision search
Taxonomy
Authors at a place
MASHUP locations to a GPS grid of an area
Two data points
• GPS Coordinates
• Taxonomy description of the place
Watch Crime In Action
Two data points
• GPS Coordinates
• Taxonomy description of the crime
Visualization Strategies
Matrix Visualization Software
All Data Up-posted To The Top Level
Pattern Analysis - Indexing Clusters
Pattern Analysis - Domain Associations
Pattern Analysis - Domain Correlations
Pattern Analysis - Gap Analyses
Pattern Analysis - Component Gaps
More Like This - Recommender
Cancer Epidemiology Biomarkers & Prevention Vol. 12, 161-164, February 2003
© 2003 American Association for Cancer Research
Short Communications
Alcohol, Folate, Methionine, and Risk of Incident Breast Cancer in the American Cancer Society Cancer Prevention Study II Nutrition Cohort
Heather Spencer Feigelson, Carolyn R. Jonas, Andreas S. Robertson, Marjorie L. McCullough, Michael J. Thun and Eugenia E. Calle
Department of Epidemiology and Surveillance Research, American Cancer Society, National Home Office, Atlanta, Georgia 30329-4251
Recent studies suggest that the increased risk of breast cancer associated with alcohol consumption may be reduced by adequate folate intake. We examined this question among 66,561 postmenopausal women in the American Cancer Society Cancer Prevention Study II Nutrition Cohort.
Related Press Releases
• How What and How Much We Eat (And Drink) Affects Our Risk of Cancer
• Novel COX-2 Combination Treatment May Reduce Colon Cancer Risk Combination Regimen of COX-2 Inhibitor and Fish Oil Causes Cell Death
• COX-2 Levels Are Elevated in Smokers
Related AACR Workshops and Conferences
• Frontiers in Cancer Prevention Research
• Continuing Medical Education (CME)
• Molecular Targets and Cancer Therapeutics
Related Meeting Abstracts
• Association between dietary folate intake, alcohol intake, and methylenetetrahydrofolate reductase C677T and A1298C polymorphisms and subsequent breast
• Folate, folate cofactor, and alcohol intakes and risk for colorectal adenoma
• Dietary folate intake and risk of prostate cancer in a large prospective cohort study
Related Working Groups
• Finance
• Charter
• Molecular Epidemiology
Related Education Book Content
• Oral Contraceptives, Postmenopausal Hormones, and Breast Cancer
• Physical Activity and Cancer
• Hormonal Interventions: From Adjuvant Therapy to Breast Cancer Prevention
Related Awards
• AACR-GlaxoSmithKline Clinical Cancer Research Scholar Awards
• ACS Award
• Weinstein Distinguished Lecture
Webcasts
Related Webcasts
Think Tank Report
Related Think Tank Report Content
Link to Society Resources
Journal Article on Topic A
Other Journal Articles on Topic A
Upcoming Conference on Topic A
Podcast Interview with Researcher Working on Topic A
Grant Available for Researchers Working on Topic A
CME Activity on Topic A
Job Posting for Expert on Topic A
Author Connections
What is a taxonomy?
Albuquerque, NM 87110
www.accessinn.com
www.dataharmony.com
505-998-0800
Marjorie M.K. Hlava
President and Chief Scientist
Access Innovations, Inc.
Vocabulary Control - Options
• Classification systems*
• Authority files
• Controlled term lists
• Uncontrolled term lists
• Thesauri
[*We will concentrate on taxonomies and thesauri, first, and then cover the others as time permits.]
Taxonomy Standards
• Z39.19 (2005) Controlled Vocabularies
• BS 8723 Parts 1 – 5
• ISO 25964 Parts 1 – 2
• TAG 37 and 46 standards
• SKOS - Simple Knowledge Organization System
• OWL - Web Ontology Language
• AND more!
A Taxonomy is a Knowledge Organization System (KOS)
Not complex
• Uncontrolled list
• Name authority file
• Synonym set/ring
• Controlled vocabulary
• Taxonomy
• Thesaurus
• Ontology
• Semantic network
Highly complex
Structure Of Controlled Vocabularies
Lists Synonyms Taxonomy Thesaurus Ontology
Ambiguity Ambiguity Ambiguity Specifies a KOS Synonym Synonym Additional kinds of
Hierarchy Hierarchy RelationshipsRelationships relationships
INCREASING COMPLEXITY and CONTROL
What is a Taxonomy? ANSI/NISO Z39.19-2005
“A collection of controlled vocabulary terms organized into a hierarchical structure.”
controlled
Missing: equivalence, homographic, and associative relationships and notes
Yes!
Taxonomy? Thesaurus?
• Often used interchangeably
• Thesaurus is a taxonomy with extras
  – Related Terms
  – Non-preferred Terms (USE/Used for)
  – Scope Notes
  – More
• Taxonomies often have the actual information object at the final node.
• CMS and SharePoint tend to the hierarchical view only, definition, and USE
Taxonomy? Thesaurus?
Main Term (MT) = subject term, heading, node, category, descriptor, class
Top Term (TT)
Broader Terms (BT)
Narrower Terms (NT)
Related Terms (RT)
See also (SA)
Non-Preferred Term (NP)
Used for (UF), See (S)
Scope Note (SN)
History (H)
TAXONOMY
THESAURUS
OWL can specify
The Semantic Roadmap: Knowledge Organization Systems
• Semantic network
• Ontology
• Thesaurus
• Taxonomy
• Controlled vocabulary
• Synonym set/ring
• Name authority file
• Uncontrolled list
Unrelated Entities / Ambiguity
Linked Entities / Contextual Specificity
Simple / Low Value
Complex / High value
Uncontrolled list has the Highest Cost over Time!
Copyright © 2005 - Access Innovations, Inc.
Taxonomy view
Thesaurus Term Record view
CARD SORT
Taxonomy 101 - How do you build a taxonomy?
How Do You Build a Taxonomy?
• Define subject field
• Collect terms
• Organize terms
• Fill in gaps
• Flesh out and interrelate terms
• Apply to your data
You’re done!
Foundations
• Start with what is known
• Build from there
• Use the literature, your data
• Use the lists you already have internally
• Built-in continuous review throughout the process, and beyond
Who is involved?
• Taxonomists
• Subject matter experts (SME)
• Project management
• Users
Define Subject Field
• Review representative collection of content
• Determine:
  – Core areas
  – Peripheral topics
Psychology
Education
Sociology
Law
Scope can be modified later
Where Do I Get the Terms?
• Your documents and databases
• Departmental terminology
• Text books and their indexes
• Book tables of contents and indexes
• Journal quarterly indexes
• Encyclopedias
• Lexicons, glossaries on the topic
• Web resources
• Users and experts
• Search logs
How Do You Choose Terms?
• Importance in the subject area
• Use in the literature, by the organization or community
• Necessary degree of specificity or detail
• Relationship with other controlled vocabularies
• Single concept = single term
Build, Buy, Augment?
• Survey existing thesaurus/taxonomy resources for your domain
• Test for
  – Scope
  – Depth
  – Make-or-break terms
  – Cost
• Adoption of existing taxonomies
  – Term registries
  – Taxobank
  – Taxonomy Warehouse
  – Other resources
Don’t reinvent the wheel!
Gather Terms From Search Logs
• Top ~100 search terms from search logs
• Terms used more than 50 times
• Match to web site with appropriate answer
• Basis for favorites or best bets, presented at the top of results list
• Behavior-based taxonomy
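The search-log step above can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' tooling: the function name `candidate_terms`, the one-query-per-entry log format, and the sample queries are all assumptions made for the example. Only the two thresholds (top ~100, used more than 50 times) come from the slide.

```python
from collections import Counter


def candidate_terms(queries, min_count=50, top_n=100):
    """Return (query, count) pairs for frequent searches, most frequent first."""
    # Normalize case and whitespace so "Weather  Radar" and "weather radar" count together.
    counts = Counter(" ".join(q.lower().split()) for q in queries if q.strip())
    # Keep the top_n most frequent queries that were used MORE than min_count times.
    return [(term, n) for term, n in counts.most_common(top_n) if n > min_count]


# Hypothetical search log: one query string per entry.
log = ["weather radar"] * 60 + ["Weather  Radar"] * 5 + ["hurricanes"] * 51 + ["snow"] * 50

for term, n in candidate_terms(log):
    print(f"{n:4d}  {term}")
```

In this sample, "snow" is dropped because it was searched exactly 50 times, not more than 50. The surviving queries are candidates to review as terms or best bets, not finished vocabulary.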
Vocabulary Control – How?
• Use unambiguous terms, clear to the user group
• Distinguish between terms that appear similar
• Use Scope Notes when necessary
• Use terms as elements that can be coordinated in a flexible manner
• Create compound terms, if necessary
Term Format
KISS – Keep it short and simple
• 1-2-3 words
• Effect on search
• Pre and Post Coordination
Establish a policy
• follow Chicago Manual of Style
Grammatical issues
• Nouns and noun phrases
• Verbs - gerunds
• Adjectives - no
• Adverbs - no
• Initial articles – no
Thesaurus - Format
Main Entries
• Top Terms - TT
• Broader Terms - BT
• Narrower Terms - NT
• Related Terms - RT
• Scope Notes - SN
• History - HI
• Date term added/changed - DA
Thesaurus - Format
• Related terms - RT
• See - S
• See also - SA
• Use - U
  – Preferred Term - PT
• Use for - UF
  – Non Preferred Term - NP
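As a rough illustration of the record format listed on these two slides, a term record can be modeled as a small data structure. This is a sketch only: the class name `TermRecord`, the field names, and the sample values (including the broader term "Motor vehicles" and the scope note text) are assumptions for the example and do not reflect the record layout of any particular thesaurus software.

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    """One thesaurus entry, using the field abbreviations from the slides."""
    term: str                                      # main entry
    top_term: str = ""                             # TT
    broader: list = field(default_factory=list)    # BT
    narrower: list = field(default_factory=list)   # NT
    related: list = field(default_factory=list)    # RT
    used_for: list = field(default_factory=list)   # UF (non-preferred terms)
    scope_note: str = ""                           # SN
    history: str = ""                              # HI
    date_added: str = ""                           # DA


def format_record(rec):
    """Render a record in the conventional indented thesaurus layout."""
    lines = [rec.term]
    if rec.top_term:
        lines.append(f"  TT {rec.top_term}")
    for label, values in (("BT", rec.broader), ("NT", rec.narrower),
                          ("RT", rec.related), ("UF", rec.used_for)):
        lines.extend(f"  {label} {value}" for value in values)
    for label, value in (("SN", rec.scope_note), ("HI", rec.history),
                         ("DA", rec.date_added)):
        if value:
            lines.append(f"  {label} {value}")
    return "\n".join(lines)


# Hypothetical record; only "Automobiles", "Cars" and "Automotive safety" appear in the slides.
rec = TermRecord(term="Automobiles", broader=["Motor vehicles"],
                 related=["Automotive safety"], used_for=["Cars"],
                 scope_note="Passenger road vehicles")
print(format_record(rec))
```

A plain taxonomy would populate only the hierarchy fields (TT, BT, NT). The remaining fields are the "extras" that make the record a thesaurus record.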
Definitions
Index term
• the representation of a concept
Preferred term (International)
• a term used consistently to index a concept
• descriptor (USE)
• what the “USED FOR” reference points to
Definitions
Non preferred term (International)
• synonym or quasi synonym of a preferred term
• non-descriptor (USE)
• the “USE” reference
• the “SEE” reference
Related term
• the “SEE ALSO”
Indexing Terms
Three main categories
• concrete entities
• abstract concepts
• proper nouns
One Term / One Concept
• Importance in the subject area
• Use in the literature, by the organization or community
• Necessary degree of specificity or detail
• Relationship with other controlled vocabularies
One Term / One Concept
• Terms represent a simple or unitary concept
  – A unit of thought
  – Can be a single-word term
  – Can be a multiword term, if required to represent the concept
• Three main categories
  – Concrete entities
  – Abstract concepts
  – Proper nouns
“A unit of thought, formed by mentally combining some or all of the characteristics of a concrete or abstract, real or imaginary object. Concepts exist in the mind as abstract entities independent of terms used to express them.”
Concrete Entities
Things and their physical parts
• primates
  – head
• buildings
  – floors
• islands
Concrete Entities as Terms
• Things and their physical parts
  – Birds
    • Feathers
  – Buildings
    • Floors
• Materials
  – Cement
  – Wood
  – Lead
  – Cards and Chips
Concrete Entities
Materials
• cement
• wood
• lead
• cars
• refrigerators
Abstract Concepts
Actions and events
• evolution
• respiration
• skating
• management
• wars
• ceremonies
Abstract Concepts
Abstract entities, properties of things, materials and actions
• law
• theory
• strength
• efficiency
• lead (management)
Abstract Concepts
Disciplines and sciences
• physics
• meteorology
• mathematics
• psychology
Abstract Concepts
Units of measurement
• kilograms
• pounds
• meters
• miles
Abstract Concepts as Terms
• Actions and events
  – evolution, skating, management, ceremonies
• Abstract entities
  – law, theory
• Properties of things, materials, and actions
  – strength, efficiency
• Disciplines and sciences
  – physics, meteorology, mathematics
• Units of measurement
  – pounds, kilograms, miles, meters, nanoseconds
Proper Nouns*
Individual entities, or “classes of one”, expressed as proper nouns
• San Francisco
• United States of America
• Lake Michigan
* Proper names – of persons – are not included
Proper Nouns as Terms
• Individual entities – “classes of one” – expressed as proper nouns
  – San Francisco, Lake Michigan
• Thesaurus standards exclude proper names, persons, and trade names (these go in authority files).
• Taxonomies include them as final nodes.
Most Terms Are Nouns
• Nouns or simple noun phrases
• Adj + Noun – Art history (ANSI/NISO standard)
• Noun + Prep + Noun – History of art (ISO standard)
• Exceptions – Burden of proof, Coats of arms, Prisoners of war, Birds of prey, etc.
About “and”
• Avoid “and” in terms – not a single concept
• Instead of: Children and television
• Factor and postcoordinate
  – USE Media influence + Television + Children
• “And” is not in the standard
• In real life, the need for granularity may dictate your choice
Compound Terms – Nope!
“Terms in a thesaurus should represent simple or unitary concepts…” (ISO standard)
“Compound terms should be factored (split) into simple elements…” (ANSI/NISO standard)
• Term phrases are okay (bigrams)
  – Adjective + Noun: American history
• Two concepts combined are not
  – Aromatherapy for bloating
Organize Terms – Roughly
• Sort terms into several major categories – logical groups of similar concepts as Top Terms
  – Identify core areas and peripheral topics
  – 10 – 20 to start
  – Consider moving proper names to authority files
• Result: loose collection of terms under several main headings
  – Rough and tentative – see how it fits as you go
  – Initial gap analysis
  – Add / modify / delete as needed
Term Relationships
How Do Terms Relate?
Hierarchical relationships -- Parents and their children
Equivalence relationships -- Aliases
Associative relationships -- Cousins
TAXONOMY
THESAURUS
Hierarchical Relationships
• Broader Term (BT) represents the class, whole, or genus
• Narrower Term (NT) is a member, part, or species
  – Generic relationship
  – Whole-part relationship
  – Instance relationship
• NTs inherit all the BT characteristics
• BTs/NTs have a reciprocal relationship
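The reciprocal nature of BT/NT can be sketched as a structure that records both directions whenever one link is added. This is an illustrative sketch under assumptions: the class name `Hierarchy` and its methods are invented for the example, and the sample terms are borrowed from the workshop's own whole-part and generic examples.

```python
from collections import defaultdict


class Hierarchy:
    """Stores BT/NT links so that each one always has its reciprocal."""

    def __init__(self):
        self.broader = defaultdict(set)   # term -> its BTs
        self.narrower = defaultdict(set)  # term -> its NTs

    def add_bt(self, term, broader_term):
        """Record 'term BT broader_term' and the reciprocal 'broader_term NT term'."""
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def is_reciprocal(self):
        """Check that every BT link has a matching NT link and vice versa."""
        forward = {(t, b) for t, bts in self.broader.items() for b in bts}
        backward = {(n, t) for t, nts in self.narrower.items() for n in nts}
        return forward == backward


h = Hierarchy()
h.add_bt("Middle ear", "Ear")
h.add_bt("Squirrels", "Rodents")
print(sorted(h.narrower["Ear"]))      # NTs of Ear
print(sorted(h.broader["Squirrels"])) # BTs of Squirrels
```

Because both directions are written in a single call, an editor cannot create a BT without its NT, which is the reciprocity the slide describes.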
Hierarchical Relationships
• Class as a whole
  – superordination
  – broader term (BT)
  – sometimes top term (TT)
• Members or parts of the class
  – subordination
  – narrower term (NT)
• Reciprocal
Hierarchical Relationships
• BT/NT based on being part of same class
• Same fundamental category
  – entities
  – activities
  – agents
  – properties
Hierarchical Relationships
Museums
  Archaeological museum    type of entity    NT
  Ethnological museum      type of entity    NT
  Curators                 agents            RT
  Museum techniques        action            RT
  Scientific museum        type of entity    NT
Hierarchy – Whole-Part Relationships
Four general types
1. Body systems and organs
   Ear
     Middle ear
2. Geographical locations
   Bernalillo County
     Albuquerque
3. Fields of study
   Geology
     Physical geology
4. Hierarchical social structures
   Ontario
     Manitoulin District
Hierarchy – Instance Relationships
General category (common noun) as BT, with individual example (proper noun) as Narrower Term Instance (NTI)
Seas
  Baltic Sea
  Caspian Sea
  Mediterranean Sea
French cathedrals
  Chartres Cathedral
  Rheims Cathedral
  Rouen Cathedral
Essentially identical to “final node” in taxonomies
Hierarchical Types of Display
• Systematic
• Alphabetic
• other, but less common views
DTIC Hierarchy
Polyhierarchical Relationship
• Term can logically fit under more than one Broader Term – can have Multiple Broader Terms (MBT)
• Part of ISO standards, new to ANSI/NISO
Nurses
  Nurse administrators
Health administrators
  Nurse administrators
Finance
  Accounting
Careers
  Accounting
Polyhierarchical Relationships
• Great for the web click environment
• Terms occur in multiple categories
• Can be generic as well as hierarchical
Engineering
  NT Nanotechnology
Physics
  NT Nanotechnology
Nanotechnology
  BT Engineering
  BT Physics
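One way to picture polyhierarchy is as a mapping from each term to a set of broader terms, where any term with more than one BT is polyhierarchical. The sketch below uses the examples from these two slides. The function name `multiple_broader_terms` and the dictionary layout are assumptions made for illustration.

```python
def multiple_broader_terms(broader):
    """Return {term: sorted BTs} for every term that has more than one BT."""
    return {term: sorted(bts) for term, bts in broader.items() if len(bts) > 1}


# BT links taken from the slide examples.
broader = {
    "Nanotechnology": {"Engineering", "Physics"},
    "Nurse administrators": {"Nurses", "Health administrators"},
    "Accounting": {"Finance", "Careers"},
    "Middle ear": {"Ear"},  # ordinary single-parent term, from the whole-part slide
}

for term, bts in sorted(multiple_broader_terms(broader).items()):
    print(term, "->", ", ".join(bts))
```

A report like this can help during review: each polyhierarchical term should pass the hierarchy tests against every one of its broader terms, not just the first.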
DTIC Alpha
Generic Relationship Tests
Pests, Squirrels, Rodents
ALL squirrels are rodents
x NOT ALL squirrels are pests
x NOT ALL pests are rodents
Generic Relationship Tests
• Both terms in same fundamental category
• “All-and-some” test
Rodents / Squirrels: SOME rodents are squirrels, ALL squirrels are rodents
Pests / Squirrels: SOME pests are squirrels, NOT ALL squirrels are pests
Consider concepts of marketing and advertising
Generic Relationships
“Identifies the link between a class or category and its members or species.”
Easy in biology
Rodents
  NT Squirrels
All and some rule
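The all-and-some rule can be expressed with sets: a generic BT/NT link is valid only if ALL members of the narrower class belong to the broader class, while only SOME members of the broader class belong to the narrower one. The sketch below is illustrative. The function name and the member lists are made-up sample data, since a real thesaurus judges this conceptually and not by enumerating instances.

```python
def passes_all_and_some(narrower_members, broader_members):
    """True if ALL narrower members are in the broader class and only SOME of the
    broader class are in the narrower one (a proper, non-empty subset)."""
    return bool(narrower_members) and narrower_members < broader_members


# Hypothetical instances, for illustration only.
squirrels = {"gray squirrel", "red squirrel"}
rodents = squirrels | {"house mouse", "beaver"}
pests = {"gray squirrel", "house mouse", "cockroach"}

print(passes_all_and_some(squirrels, rodents))  # Rodents NT Squirrels?
print(passes_all_and_some(squirrels, pests))    # Pests NT Squirrels?
print(passes_all_and_some(pests, rodents))      # Rodents NT Pests?
```

Only the first check passes, so Squirrels can be an NT of Rodents, while the Squirrels / Pests link has to be associative (RT) instead of hierarchical.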
All and Some Rule
Rodents
  NT Squirrels
  RT Pests
Q. Is this an example of polyhierarchy?
Q. Do you need to make RT relationships for “Pests” to all of the NTs under “Rodents”?
Instance Relationships
Seas (ISO)
  NT Baltic Sea
  NT Caspian Sea
  NT Mediterranean Sea
French Cathedrals (NISO / ANSI)
  NTI Chartres Cathedral
  NTI Rheims Cathedral
  NTI Rouen Cathedral
  RT Gothic cathedrals
Instance Relationships
French Cathedrals (NISO / ANSI)
  NTI Chartres Cathedral
  NTI Rheims Cathedral
  NTI Rouen Cathedral
  RT Gothic cathedrals
French Gothic Cathedral
  NTI Chartres Cathedral
  NTI Rheims Cathedral
  NTI Rouen Cathedral
  BT Gothic cathedrals
Q. Why/how do these differ?
CABI Pages
Instance Relationships
“…a general category of things and events expressed by a common noun, and an individual instance of that category, the instance then forming a class of one which is represented by a proper name.”
A way of adding the proper names and items from the Authority files to the thesaurus
Questions before moving on to Associative Relationships?
Associative Relationships
• Related Terms (RTs) – cousins
• “…terms related conceptually, but not hierarchically, and are not part of an equivalence set” (i.e. not synonyms)
• Both terms are valid thesaurus terms for indexing and have reciprocal relationship
• Expands user’s awareness and reflects thesaurus coverage of unanticipated areas
• Standards describe specific types
Associative Relationships
Related terms
Physicians Medicine
(“Reciprocal posting” done automatically is highly desirable.)
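Automatic reciprocal posting for related terms can be sketched as follows: whenever A lists B as an RT, B is made to list A as well. This is a minimal illustration. The function name `post_reciprocals` and the dictionary-of-sets layout are assumptions for the example and do not describe how any particular thesaurus tool implements the feature.

```python
def post_reciprocals(related):
    """Return a copy of the RT map in which every RT link is two-way."""
    result = {term: set(rts) for term, rts in related.items()}
    for term, rts in related.items():
        for other in rts:
            # Add the reverse link, creating the other term's entry if needed.
            result.setdefault(other, set()).add(term)
    return result


# One-way links as an editor might have entered them (terms from the slides).
related = {
    "Physicians": {"Medicine"},
    "Seismology": {"Earthquakes"},
    "Medicine": set(),
}

for term, rts in sorted(post_reciprocals(related).items()):
    print(term, "RT", ", ".join(sorted(rts)))
```

After posting, "Medicine" carries RT Physicians and "Earthquakes" gets its own entry with RT Seismology, so a user arriving at either term of a pair sees the other.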
Associative Relationships
• Sibling relationships
• Examples:
  – Brother : Sister
  – Desk : Chair
• Easier to create within well defined facets (e.g. AAT)
• Usual step in building process
• Can be identified automatically
Associative Relationships
RT relationships
Braking systems
  RT Trains
  RT Bicycle
  RT Motor vehicle
Office furniture
  RT Office buildings
  RT Ergonomics
Associative Relationships
Field of study and objects studied
Seismology
  RT Earthquakes
Meteorology
  RT Weather patterns
Associative Relationships
Operation or process and the agent or instrument
Hairdressing
  RT Hair dryers
Word processing
  RT Typing skills
Associative Relationships
Occupation and person in occupation
Social work
  RT Social workers
Information science
  RT Special librarians
Associative Relationships
Action and the product of the action
Publishing
  RT Music scores
Landscaping
  RT Lawn mowers
  RT Irrigation systems
Associative Relationships
Action and its patient
Teaching
  RT Students
Conducting
  RT Musicians
Associative Relationships
Concepts related to their properties
Women
  RT Femininity
Automobiles
  RT Automotive safety
Associative Relationships
Concepts related to their origins
Water
  RT Water wells
Carpet
  RT Thread
Associative Relationships
Concepts linked by causal dependence
Injuries
  RT Accidents
Cultural stress
  RT Culture shock
Associative Relationships
Action and counter action
Pests
  RT Pesticides
Log on
  RT Log off
Associative Relationships
Raw material and its product
Hides
  RT Leather
Clothing
  RT Fabric
Associative Relationships
Action and associated property
Precision instrument
  RT Accuracy
Production processes
  RT Quality control
Associative Relationships
Concept and its opposite
Single People
  RT Married people
Height
  RT Depth
  RT Weight
If not hierarchical, probably associative
Questions before moving on to Equivalence Relationships?
Equivalence Relationships
• Refer to the same concept
• (Use for) Prefix for non-preferred terms
• (Use) Prefix for preferred terms
Automobiles
  used for Cars
Cars
  use Automobiles
Equivalence Relationships
Use
Use for
Physicians
Doctors
Equivalence Relationships
Synonyms
• popular and scientific
  – spiders - arachnida
• scientific and trade names
  – Motrin (TM) - ibuprofen
• standard names and slang
  – hi fi - high fidelity
• different linguistic origin
  – home care - domiciliary care
Equivalence Relationships
Synonyms cont’d
• different cultures
  – aerials - antenna
  – trunk - boot
  – hire - rent
• emerging concepts
  – telecommuting - distance working
• outdated
  – refrigerators - iceboxes
A “Term” Synonym Ring
Term
Node
Subject heading
Category
Descriptor
Equivalence Relationships
Lexical variants
• variant spellings
  – Muslim - Moslem
  – center - centre
• direct and indirect forms
  – electric power plants
  – power plants, electric
• abbreviations
  – ECG - electrocardiograph
Equivalence Relationships
• Quasi synonyms
  – urban areas - cities
  – gifted people - geniuses
• Antonyms
  – height - depth
  – literacy - illiteracy
Equivalence Relationships
• Up posting (generic posting)
  – useful for web interfaces
  – NT equivalent to their BT
  – not sub species of BT
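Up-posting can be illustrated as adding every broader term, at every level, to the index terms already assigned to a document, so that a search on the broad term also retrieves items indexed only at the narrow level. The sketch below is an assumption-laden example: the function name `up_post`, the BT map layout, and the extra level "Mammals" are hypothetical.

```python
def up_post(doc_terms, broader):
    """Return the document's terms plus all of their broader terms, transitively."""
    posted = set(doc_terms)
    stack = list(doc_terms)
    while stack:
        term = stack.pop()
        for parent in broader.get(term, ()):
            if parent not in posted:   # avoid revisiting shared ancestors
                posted.add(parent)
                stack.append(parent)
    return posted


# BT map; "Rodents BT Mammals" is a hypothetical extra level for the example.
broader = {
    "Squirrels": ["Rodents"],
    "Rodents": ["Mammals"],
    "Nanotechnology": ["Engineering", "Physics"],
}

print(sorted(up_post({"Squirrels"}, broader)))
print(sorted(up_post({"Nanotechnology"}, broader)))
```

Running the same routine over a whole collection is one way to roll all postings up to the top-level terms for display or analysis.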
Equivalence Relationships - PsycINFO Rotated
Equivalence Relationships
Factored terms
• express terms in their combinations
Milk hygiene
  use Milk and Hygiene
Equivalence Relationship
• Preferred Term
  – Thesaurus term and valid for indexing
  – Thesaurus notation: USE
• Non-Preferred Term
  – Not valid for indexing
  – An alias or imposter
  – Entry point, directs user to Preferred Term
  – Thesaurus notation: UF or NPT
Spiders
  UF Arachnids
Plant pathology
  USE Phytopathology
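The entry-point behaviour of non-preferred terms can be sketched as a lookup table built from the UF lists: whatever the user or indexer types, the lookup returns the preferred term. This is an illustrative sketch. The names `build_lookup` and `resolve` and the case-insensitive matching are assumptions, and the term pairs are taken from the slides.

```python
def build_lookup(used_for):
    """Map every entry term (preferred or non-preferred, lowercased) to its preferred term."""
    lookup = {}
    for preferred, variants in used_for.items():
        lookup[preferred.lower()] = preferred
        for variant in variants:
            lookup[variant.lower()] = preferred   # the USE reference
    return lookup


# Preferred term -> its UF (non-preferred) terms, from the slide examples.
used_for = {
    "Spiders": ["Arachnids"],
    "Phytopathology": ["Plant pathology"],
    "Automobiles": ["Cars"],
}
lookup = build_lookup(used_for)


def resolve(entry):
    """Return the preferred term for an entry term, or None if it is not in the vocabulary."""
    return lookup.get(entry.strip().lower())


print(resolve("cars"))
print(resolve("Plant pathology"))
print(resolve("Trains"))
```

Entries that resolve to None are worth logging: they are candidates for new terms or new UF references.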
Equivalence – When to Use
• Synonyms, slang, quasi-synonyms
• Scientific and trade names
  – Ibuprofen UF Motrin™
• Lexical variants
  – Fiber optics UF Fibre optics
  – Mouse UF Mice
• Upward posting of narrow concepts not specified in taxonomy or thesaurus
  – Social class UF Elite, Middle class, Working class
• Get equivalent terms from search logs, brainstorming…
Scope Notes (SN)
• Indicate meaning of the term in the context of this thesaurus, for this audience
  – Stress – Metal, Psychological, Physiological
• Indicate any restriction in meaning
• Indicate range of topics covered
• Provide direction for indexers; for terms often confused, may suggest an alternative term
• Use only as needed – not for every term
• Establish and stick with consistent format
• Be concise
Scope Notes (SN)
• Restrictions on meaning
• Range of topics covered
• Instructions to indexers
• Term histories
• Reciprocal scope notes
Questions before moving on to more thesaurus examples?
Thesaurus - Examples
• Roget's 1852
  – synonyms
• COSATI - 1964
  – concept linking
  – NASA
  – AEC - ERDA - DOE - ESA
• National Library of Medicine
  – outline of a field
  – Medical Subject Headings - MeSH
Copyright © 2001 Access Innovations, Inc.
NASA Alphabetic
NASA Hierarchical
Thesaurus - Examples
• INSPEC - multifaceted
  – Thesaurus
  – Classification system
  – Free text terms
  – Variant spellings
• NICEM
  – 27 Top Terms
INSPEC
INSPEC Hierarchy
Merged Vocabularies
• Yahoo!
• Subject headings
• Authority files
• In a single list
Yahoo! Hierarchy
Merged Vocabularies - continued
• Office.com
• Multiple broader terms
• Concept mapping
Eurovoc Thesaurus Pages
Eurovoc Thesaurus Hierarchy
Eurovoc Terms
So far you’ve got…
• Hierarchy – Broader and Narrower Terms
  – Polyhierarchies when needed
• Preferred/Non-Preferred Terms – Equivalence relationships
• Related Terms – Associative relationships
• Scope Notes – Complete term records
• Correct term format
So far you’ve got…
Hierarchical relationships -- Parents and their children
Equivalence relationships -- Aliases
Associative relationships -- Cousins -- See Also’s
TAXONOMY
THESAURUS
So far you’ve got…
• Term format
• Grammatical issues
• Singular and plural forms
• Spelling
• Abbreviations and acronyms
• Capitalization
• Other punctuation
• Consistency
Pre and Post Coordination
Pre and Post Coordinate Terms
• Pre coordinates – two concepts
  – Subject headings – Library of Congress
    American history – Civil War
  – Back of the book
  – Put together in advance by the publisher
• Post Coordinate
  – Taxonomy terms
  – Single concept
  – Put together by the user / searcher
Pre-coordination
• Card catalogs - printed indexes
• Links and roles defined
• Controlled vocabularies
• High input costs
• Precise recall - easier searching
Post-coordination
• Starting with punch cards
• Machine readable
• Frequently natural language
• Currency and specificity
• Exhaustive coverage - loss of precision
• Low input costs
• False drops
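Post-coordination leaves the combining of single-concept terms to search time. A minimal sketch, assuming a small in-memory collection: documents are indexed with single-concept terms, an inverted file maps each term to the documents that carry it, and the searcher combines terms with AND. The function names, document IDs, and index terms are hypothetical.

```python
from collections import defaultdict


def build_inverted_file(index):
    """Turn {doc_id: terms} into {term: set of doc_ids}."""
    postings = defaultdict(set)
    for doc_id, terms in index.items():
        for term in terms:
            postings[term].add(doc_id)
    return postings


def search_all(postings, *terms):
    """Return the documents indexed with ALL of the given terms (Boolean AND)."""
    if not terms:
        return set()
    return set.intersection(*(postings.get(term, set()) for term in terms))


# Hypothetical indexing: each document carries single-concept terms only.
index = {
    "doc1": {"Children", "Television", "Media influence"},
    "doc2": {"Children", "Nutrition"},
    "doc3": {"Television", "Advertising"},
}
postings = build_inverted_file(index)

print(sorted(search_all(postings, "Children", "Television")))
```

The searcher, not the indexer, assembles "children and television" here. The cost is the false-drop risk the slide mentions: any document that happens to carry both terms for unrelated reasons would also be retrieved.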
Subject Matter Experts (SME)
• Work first from the literature
• Establish literary warrant for terms
• Have someone else do the clerical work
• Differentiate the lexicography work from the Subject Matter Expert work
• Let SMEs do the review and tailoring
• Expert review ensures the proper term use and application
• Advisory Board…advisable!
Again, why do we index?
• Improve precision
– define scope of terms
• Improve recall
– different terms for same concept
• Guide to a field of expertise
• Learning tool
• Richer expression
Uses?
• Indexing
– …process by which subject terms or classification symbols are assigned to concepts in documents
• A thesaurus is also known as an indexing language
• M.A.I.™ is an automated indexing system
What are We Controlling?
• Synonyms
– different terms, same concept
• Polysemes or Homonyms
– same word, different meanings
– lead or mercury
How?
• Meaning
– delineation of scope of a term
• Term equivalence
– linking of synonyms
• Disambiguation of homonyms
– lead (metal)
– lead (element)
– lead (management)
Disambiguation
Bridge Structure
Bridge Dentistry
Bridge Game
Bridge Concept
Disambiguation
• Restriction and clarification of meaning
• Cells
– biological microsystems
– electrical equipment
– prison housing
• Reading
– town in England
– communication process
Disambiguation
Bill Invoice
Bill Legislative
Bill Sport
Bill Person
Disambiguation: Pre-Coordinate vs. Post-Coordinate Forms
• Cells (biology)
• Cells (electric)
• Cells (prison)
• Reading (place)
• Reading (process)
• Biological cells
• Electric cells
Precision Options
• Language specificity
• Coordination
• Compound terms - level of precoordination
• Homographs and scope notes
• Word distance indication
Precision Options
• Structural relationships
• Links and roles
• Treatment and aspect codes
• Weighting
Maintenance of a Controlled Vocabulary
• Allow for new jargon to be added
• Any living field will have new terms
• Identifier field
• Candidate terms
• Consider multiple broader terms
Review, edit, test, edit, use, edit, and maintain, i.e., edit
• Review
– Users
– Expert reviewers
• Test
– Index 500+ documents (more for variable writing style; fewer for strict style)
– Monitor search log
• Edit and maintain
– Add term
– Change existing term
– Change term status
– Delete term
– Add term relationship
– Delete term relationship
– Add/modify Scope Note
– Change overall structure
• Consider automated / assisted indexing software
When Do You Add More Terms?
• On demand
• When usage changes
– Stewardess – flight attendant
• As the field evolves
– 8 changes to 64 colors
• In Use
– Don’t freeze waiting for perfection
Vocabulary Control - Options
• Classification systems
• Authority files
• Controlled term lists
• Uncontrolled term lists
• Thesauri
Classification Systems - Defined
• Are used to put an object in a specific place. In the traditional classification system each item has a single spot to go.
• Follow an outline of knowledge
• Used to shelve books in a library
Catalog Systems - Defined
• Used to catalog the object to identify its contents
• Based on perception
• Multiple terms are used to identify a single object
• Not natural language
• Pre-coordinated - subheadings
Classification Systems - Examples
• Classification of actual collections
• New York State Library - Dewey
– 810.01
• Cutter - Universities 1800 - 1960’s
– Z34 Lan
• Thomas Jefferson - Library of Congress
– z34.18 la
• Government Documents
– Numbers based on government structure
Catalog Systems - Examples
• Library of Congress Subject Headings
• Sears Subject Headings (used with Dewey)
King of Catalogers
• Charles Ammi Cutter
– rules for alphabetical subject indexing
– most specific heading
– put two topics under two headings
– use English if possible
– x ref antonyms
– careful with homographs
• 1895 ALA Subject Headings following Cutter
Politics in Libraries
• In 1905 Dewey was president of ALA (American Library Association)
• LC adopted DDC
• Threw out Cutter
• The two never spoke again.
Types of Headings
• Single word
– Botany or Ethics
• Adjective noun
– Capital punishment
• Noun - noun
– Death penalty – American Standard
• Noun preposition noun
– Penalty of death – International Standard
• Noun conjunction noun
– Nurses and nursing
Cutter Guidelines
• File under the phrase “as it reads”
• Use the most significant words
• Reduce adjective nouns to noun phrases
• Use singular rather than plural
• File compound words under the first word
• No subheadings
Cross References
• Cross reference synonyms
• main heading should be what the class uses
• use the common term
• use the unambiguous heading
• prefer the one which brings relations
• “…with a well defined network of cross references the mob becomes an army…” C.A. Cutter
Library of Congress (LC) Subject Headings
• 1911 - List of Subject Headings
– extensive use of sub-headings
– invert phrases for main subject
– file under the noun not the adjective
– see references not cross filing
– place holder terms
– homographs defined parenthetically
Classification vs. Subject Headings
• Classification
– single spot or placement
– browse physical list
– often a numbering system
– clear hierarchy
– no or few cross references
– Like Yahoo!
Classification vs. Subject Headings
• Subject headings
– generic search
– hidden classification system
– related terms and cross references in heavy use
– usually the inverted form: cells, electric
Vocabulary Control - Options
• Classification systems
• Authority files
• Controlled term lists
• Uncontrolled term lists
• Thesauri
Authority Systems - Defined
• Frequently have cross references
• Widely available
• Frequently coded lists
• Brand names…
• Lists of terms in the preferred format for use.
Authority Files - Defined
• People
• Places
• Things
…NOT
• Concepts
• Methods
• Processes
Authority Files - Examples
• ISO Country Name and Code
– International Organization for Standardization
• ISO Language list
• NAICS (SIC)
– Standard Industrial Classification (SIC) code, replaced by the North American Industry Classification System (NAICS)
Authority Lists - Format
• Belgian Congo
– use Congo
• Bill Gates
– use William H. Gates, III (computer scientist)
– see also William Gates (basketball player)
Authority Lists - Need Style Sheets
• Names
• AACR2
– Anglo-American Cataloguing Rules
• AAP
– Association of American Publishers
• Chicago Manual of Style
• Dun & Bradstreet Style Sheet
Vocabulary Control - Options
• Classification systems
• Authority files
• Controlled term lists
• Uncontrolled term lists
• Thesauri
Controlled Term Lists - Defined
• State the preferred terms
• Provide allowed term entry
• Heavily cross referenced
• Not generally hierarchical
• Popular
• Easy to create
Controlled Term Lists - Examples
• ABI/Inform
• Predicasts
• RDS - Responsive Data Services
• Back of book indexes
• Art and Architecture Thesaurus
…These are not FULL thesauri
Controlled Term List - Format
• Cars
– use Automobiles
• Personal Computer
– use Microcomputer
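The "use" format above amounts to a lookup from an entry term to its preferred term. The following sketch assumes a case-insensitive match and uses invented names (`USE`, `PREFERRED`, `resolve`); a real controlled list would be far larger and would be maintained in a thesaurus tool rather than in code.

```python
# Cross references from the slide: non-preferred entry -> preferred term.
USE = {
    "cars": "Automobiles",
    "personal computer": "Microcomputer",
}

# The preferred terms themselves are also valid entries.
PREFERRED = {"Automobiles", "Microcomputer"}


def resolve(entry):
    """Return the preferred term for an entry, or None if it is not in the list."""
    key = entry.strip().lower()
    if key in USE:
        return USE[key]
    for term in PREFERRED:
        if term.lower() == key:
            return term
    return None  # not controlled: a possible candidate term


examples = [resolve("Cars"), resolve("automobiles"), resolve("Trucks")]
```

An entry that resolves to `None` is not an error; it is the kind of term that would be recorded on a candidate list for later review.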
Vocabulary Control - Options
• Classification systems
• Authority files
• Controlled term lists
• Uncontrolled term lists
• Thesauri
Uncontrolled List - Defined
• Add terms as they occur
• No cross reference
• Simple flat structure
Uncontrolled List - Example
• List of names
• Grocery list
• Candidate term list
Uncontrolled List - Format
• Laundry
• Trim bushes
• Cat box needs cleaning
• Tommy’s birthday (bake cake)
• Iron
• Water plants
…other natural language lists
Trying to Impose Control...
• Do laundry
• Trim bushes
• Clean cat box
• Bake birthday cake
• Iron shirts
• Water plants
TAXONOMATCH
Designed to enhance understanding and retention of the vocabulary concepts necessary for creating a taxonomy, ontology, thesaurus, or controlled vocabulary.
Game supplies:
• 1 Deck of Orange Question and Challenge Cards
• 1 Deck of Green Answer Cards
Game setup:
• Shuffle the deck of Green Answer cards. Deal the entire deck to the players.
• Shuffle the deck of Orange Question and Challenge cards. Place them facedown in a pile in the middle of the table so that all players can reach the pile.
Reinforce what you just heard! Have fun!
TAXONOMATCH RULES
1. Play moves to the left of the dealer.
2. Draw a card from the top of the Orange cards. Read it aloud to all of the players.
3. The player who read the card says out loud what they think the answer is.
4. Each player looks at the Green Answer cards in their hand.
– If they have the correct answer to the Question or Challenge, they show their card to everyone at the table.
– If everyone agrees that the answer is correct, the player holding the correct answer card gives it to the player who read the Question or Challenge card.
5. The player places their associated pair of cards – one Orange Question and Challenge card and one Green Answer card – face up on the table in front of them.
6. Play passes to the person who held the correct Green Answer card in their hand. Play continues as in step 2 above.
7. Discussion among the players to arrive at the correct answer is permissible and encouraged!
8. If players do not arrive at a consensus regarding the correct answer, the Orange Question and Challenge card may be returned to the bottom of the pile, and play passes to the person to the left of the player who drew the previous card.
9. When all of the Orange Question and Challenge cards have been drawn, read aloud, and matched with their Green Answer cards, the game ends.
10. If there are any Orange Question and Challenge cards remaining to which players cannot agree on an answer, players may consult their notes or ask the session speaker.
Term Forms
• Nouns
• Prepositional forms
• Adjectives
• Adverbs
• Initial Articles
• Singular and plural
Term Forms - Noun and Noun Phrases
• Nouns and noun phrases
– print media
– carpet
Term Forms - Prepositional Forms
• Prepositional forms are seldom used
• okay in International Standard
• ISO
– Philosophy of Education
• ANSI / NISO
– Educational philosophy
Term Forms – Adjectives
• Adjectives
– not used in isolation
– may be used for coordination
• Miniature paintings
– USE PAINTINGS AND MINIATURE
• Portable typewriters
– USE TYPEWRITERS AND PORTABLE
Term Forms – Adjectives
• Adjectives may convert to noun forms
– MINIATURE SIZE
– PORTABLE DEVICES
– TRIANGULAR SHAPE
Term Forms - Adverbs
• Adverbs
– not used unless part of a compound term
• VERY LARGE ARRAY RADIO TELESCOPE
– Used for VLA
Term Forms - Verbs
• Verbs
– no infinitive or participle forms
– for actions that can be expressed as nouns and retain clear meaning, use noun form or gerunds
• Examples
– Speaking (not Speech)
– Walking (not Ambulation)
– Communication (not Communicate)
– Administration (not Administer)
Term Forms - Initial Articles
• AVOID THEM
• Example
– Theater not The theater
– State (political entity) not The state
• Use if part of a proper name
– Le Mans
– El Salvador
Term Forms - Singular and Plural
• Concrete entities
• count nouns are plurals - how many?
– planets
– children
• non count nouns - how much?
– nickel
– snow
– lace
Term Forms - Singular and Plural
• fully formed organism
– eyes
– mouth
• objects are singular
– lamp
• classes of things
– fruits
Term Forms - Singular and Plural
• Abstract concepts
• Show in the singular form
– authority
– socialism
– packaging
– biochemistry
Term Forms - Singular and Plural
• Unique entities
• Show in the singular
– Big Ben
– Grand Canyon
Other Formatting
• Spelling
• Punctuation
• Capitalization
• Abbreviations
• ...
Spelling
• Use what the users will use and cross post for multilingual
– fiber - fibre
– center - centre
– organization - organisation
– hemo - haemo
– Pediatrics - paediatrics
Punctuation
• Parentheses only for qualifiers
• Apostrophes are retained
• Hyphens - avoid
• Other punctuation - avoid
Capitalization
• NISO = initial only
• AACR2 format
• Practice is to follow a manual of style
– Chicago Manual of Style
– Associated Press
– Association of American Publishers
Abbreviations
• Use only when well known
• Always include the full meaning
• LASER
– Scope Note: Light Amplification by Stimulated Emission of Radiation
• WHO
– World Health Organization
Other Ways of Adding Value
• Cross references
• Facets
• Notation
• Roles
• Treatment
• Term weighting
Cross References
• See - S
• See also - SA
• Not related or associated
• Not opposite
• Just helpful guides
Synthesis in Classification
• S.R. Ranganathan
• 1933 Colon Classification
• analytico-synthetic classification
– analyze subject into component parts (facets)
– arrange facets into schedules
– combine facets to express subject complexity
Ranganathan
A General Properties
Ab Configuration
Ac Tubular
B Materials
Bc Metals
Bcc ferrous
Bcd steels
Bcf Chromium steels
Bcfi Chromium-nickel steels
K Modes of failure
Kg Creep
Kgb Creep rupture
L Stresses and loads
Lb Tensile
Ranganathan
• Tubular Chromium Nickel steel creep rupture Tensile strength
– Ac Bcfi Kgb Lb
• Chain indexing
– Tubular
– Chromium Nickel steel
– creep rupture
– Tensile strength
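Facet synthesis can be illustrated with the schedule entries listed on the previous slide. The sketch below assumes that facets are cited in alphabetical order of their schedule letters; that ordering rule and the function names `synthesize` and `chain_index` are assumptions for the example, not rules taken from the Colon Classification itself.

```python
# Schedule entries copied from the slide; only the codes used in the example.
SCHEDULE = {
    "Ac": "Tubular",
    "Bcfi": "Chromium-nickel steels",
    "Kgb": "Creep rupture",
    "Lb": "Tensile",
}


def synthesize(codes):
    """Combine facet codes into one notation, ordered by schedule letter."""
    unknown = [code for code in codes if code not in SCHEDULE]
    if unknown:
        raise KeyError(f"not in schedule: {unknown}")
    return " ".join(sorted(codes))


def chain_index(notation):
    """Expand a synthesized notation back into its facet labels."""
    return [SCHEDULE[code] for code in notation.split()]


notation = synthesize(["Kgb", "Lb", "Ac", "Bcfi"])
labels = chain_index(notation)
```

The indexer analyzes the subject into facets in any order; the synthesis step is what produces one consistent notation for the combined subject.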
Other Ways of Adding Value
• Cross references
• Facets
• Notation
• Roles
• Treatment
• Term weighting
Facets
• Additional ways to add meaning
• Divide terms into categories using a single characteristic
• Limited number of categories
Facets and Roles
• PRECIS - Austin 1984
– order of terms
– post-coordinate indexing system
– role of the term is important
• tomato
– living plant?
– marketable product?
• Facet role indicator
– organism
– end product
Many Faceted Vocabularies
• UMLS Semantic Network
– Unified Medical Language System - 49
• BLISS Classification Association
– British Library Information Science System
• Dewey Decimal Classification System
• Universal Decimal Classification System
• Art and Architecture Thesaurus
MeSH and Tree Pages
MeSH Alpha
Order of Facets
• Post-coordinate
• Means before order
• Notation becomes important
• Breaks down for large classes (more than 5,000 terms)
Other Ways of Adding Value
• Cross references
• Facets
• Notation
• Roles
• Treatment
• Term weighting
Notation Options
• Expressive
• Ordinal
• Synthetic
• Enumeration
• Many style options
Expressive Notation
83 Hazards
831 Fire
831.5 Fire fighting
831.53 Fire fighting equipment
831.532 Fire extinguishers
831.532.5 Carbon dioxide fire extinguishers
832 Explosions
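With an expressive notation the hierarchy can be read from the code itself. The sketch below treats a term's parent as the listed code whose digits form its longest proper prefix; the function names `parent` and `path` are invented for the example, and the dots are assumed to be separators with no hierarchical meaning of their own.

```python
# Notation and labels copied from the slide.
NOTATION = {
    "83": "Hazards",
    "831": "Fire",
    "831.5": "Fire fighting",
    "831.53": "Fire fighting equipment",
    "831.532": "Fire extinguishers",
    "831.532.5": "Carbon dioxide fire extinguishers",
    "832": "Explosions",
}


def digits(code):
    """Drop the separator dots so codes can be compared as digit strings."""
    return code.replace(".", "")


def parent(code):
    """Return the listed code that is the longest proper prefix, or None."""
    own = digits(code)
    candidates = [c for c in NOTATION if c != code and own.startswith(digits(c))]
    return max(candidates, key=lambda c: len(digits(c)), default=None)


def path(code):
    """Labels from the top of the hierarchy down to the given code."""
    chain = []
    while code is not None:
        chain.append(NOTATION[code])
        code = parent(code)
    return chain[::-1]


co2_path = path("831.532.5")
```

An ordinal notation cannot support this kind of lookup, because its codes only fix the filing order and say nothing about which term is the parent.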
Ordinal and Semi-ordinal Notation
HK Hazards
HL Fire
HM Fire fighting
HN Fire fighting equipment
HNB Fire extinguishers
HNE Carbon dioxide fire extinguishers
HO Explosions
Indention is the sole indication of hierarchy
Synthetic and Enumeration Notation
• Need to allow the classification system to grow
• Synthetic example
– P Architecture
– PAT Architectural information
– PAT.M Architectural information services
Notation Examples - AAT Facets
Systematic Display
• Paints
(By composition)
– Oil paints
– Water paints
– Cement paints
(By use)
– Primers
– Undercoats
– Top coats
AAT Pages
Notice faceted indentions
AAT Term
Alphabetical Display
• Paints
– NT Cement paints
– NT Oil paints
– NT Primers
– NT Top coats
– NT Undercoats
– NT Water paints
Other Ways of Adding Value
• Cross references
• Facets
• Notation
• Roles
• Treatment
• Term weighting
Roles
• ERIC Thesaurus - role indicators
– Adjectives - bibliographic terms
– Input or raw material
– Output or product
– Undesirables
– Indicated uses
– Materials “In which”
– Affects
– Primary topics of discussion
– Passive recipients, possessors, location
– Means used
Roles
• CAS - Super roles
– Analytical study
– Biological study
– Formation, nonpreparative
– Occurrence
– Preparation
– Process
– Uses
• CAS Specific roles
– Miscellaneous
– Properties
– Reactant
Subheadings as Roles
• MeSH
– Therapeutic use
– Drug treatment (disease)
– Adverse effect (drug treatment)
– Diagnosis
Other Ways of Adding Value
• Cross references
• Facets
• Notation
• Roles
• Treatment
• Term weighting
Treatment and Aspect Codes
• Apply codes or types at article level
– Theoretical
– New development
– Experimental
– Practical
Other Ways of Adding Value
• Cross references
• Facets
• Notation
• Roles
• Treatment
• Term weighting
Cranfield Project - Cleverdon 1966
• Concepts in the main theme 9/10
• Major subsidiary theme 7/8
• Minor subsidiary theme 5/6
Internet Engines
• Complex weighting of terms
• Use term frequency
• Rank output wholly automatic
• Output based on input term weights
• Can also use “well formed” data, like
– a thesaurus hierarchy
– field formatted data
– XML files
Automatic and Semi-automatic Classification?
• Data Harmony® M.A.I.™
• Semio
• Autonomy - Muscat
• Net Owl - Names
• n-Stein
• Quiver
• Smart Logic
Machine Aided Indexing Goals
• Improve
– Indexing efficiency
– Indexing consistency
– Reduce editorial drift
– Depth of Indexing
• Reduce
– Over and under indexing
– Term over use and under use
Machine Aided Indexing Goals
• Improve productivity
– Indexer
– Information worker
• Disambiguate terms
• Increase clarity
Machine Aided Indexing - Intellectual Components
• Word List or Thesaurus
• Knowledge base
– Rules based
– Natural Language (Semantic)
• Editorial evaluation
Example: M.A.I.™ Software Components
• Rule Builder
• Concept Extractor
• Statistics Collector
DATA HARMONY DISCOVERY TOUR
Taxonomies in Search
Do the Data FIRST
• What do you have?
• What does it need?
• How would you LIKE to access it?
• Look at the data BEFORE you create the specifications
– DTD built without data is not going to work
• Then choose the system that will support your data
My Main Frustration
1. Select hardware
2. Select software
3. Design system
4. Try to load the data
5. Add the taxonomy, if at all
That’s BACKWARDS
Why Does Search Fail?
• Most large organizations have 5 to 7 different search systems
– All disappointing and sitting on the shelf
• Inconsistent results
• Unclear path to results
• Lack of single unified clear consistent vocabulary
• Not tied to data governance
– Taxonomy
– Other metadata
SEARCH
• How search works
• Measuring accuracy in search
– Precision
– Recall
– Relevance
• Search theoretical basis
– Bayes, Boole, and the rest of the guys
• The taxonomy effect
Parts of Search
• Search software
– Inverted Index
– Search algorithms
• Presentation layer
– Search box
– Autocompletion
– Related and narrower terms
– Hierarchical display
Inverted Files and Boolean are Basic to ALL Search
[Diagram labels: Hierarchical Display; Inverted File Index; Searchable Index; Taxonomy / Thesaurus]
Note: not available in all systems!
Creating an Inverted File Index
Sample DOCUMENT
“Outline of Presentation”
1 Define key terminology
2 Thesaurus tools
Features
Functions
3 Costs
Thesaurus construction
Thesaurus tools
4 Why & when?
Simple Inverted File Index of the Terms from the “Outline”
&, 1, 2, 3, 4, construction, costs, define, features, functions, key, of, outline, presentation, terminology, thesaurus, tools, when, why
Complex Inverted File Index - Placement, Location added
& - Stop
1 - Stop
2 - Stop
3 - Stop
4 - Stop
construction - L7, P2, SH
costs - L6, P1, H
define - L2, P1, H
features - L4, P1, SH
functions - L5, P1, SH
key - L2, P2, H
of - Stop
outline - L1, P1, T
presentation - L1, P3, T
terminology - L2, P3, H
thesaurus - (1) L3, P1, H; (2) L7, P1, SH; (3) L8, P1, SH
tools - (1) L3, P2, H; (2) L8, P2, SH
when - L9, P3, H
why - L9, P1, H
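The complex inverted file can be reproduced with a few lines of code. This sketch makes several assumptions: the outline numbers are left out of the sample text so that positions match the slide, stop words still count toward word position, and the field codes T, H, and SH (title, heading, subheading) are supplied by hand. The names `build_index` and `phrase_lines` are invented for the example.

```python
import re
from collections import defaultdict

STOP_WORDS = {"&", "of"}

# The sample document, one entry per line, with its field code.
DOCUMENT = [
    ("Outline of Presentation", "T"),
    ("Define key terminology", "H"),
    ("Thesaurus tools", "H"),
    ("Features", "SH"),
    ("Functions", "SH"),
    ("Costs", "H"),
    ("Thesaurus construction", "SH"),
    ("Thesaurus tools", "SH"),
    ("Why & when?", "H"),
]


def build_index(lines):
    """Map each non-stop word to its (line, position, field) postings."""
    index = defaultdict(list)
    for line_no, (text, kind) in enumerate(lines, start=1):
        tokens = re.findall(r"[a-z]+|&", text.lower())
        for position, token in enumerate(tokens, start=1):
            if token not in STOP_WORDS:
                index[token].append((line_no, position, kind))
    return dict(index)


def phrase_lines(index, first, second):
    """Lines where `second` immediately follows `first` (word distance 1)."""
    follows = {(line, pos) for line, pos, _ in index.get(second, [])}
    return sorted(line for line, pos, _ in index.get(first, [])
                  if (line, pos + 1) in follows)


index = build_index(DOCUMENT)
thesaurus_tools = phrase_lines(index, "thesaurus", "tools")
```

Keeping the position with each posting is what makes phrase and word distance searching possible; the simple index can only say that a word occurs somewhere in the document.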
Search Presentation Layer
Automatic completion and type ahead from Thesaurus
Search Presentation Layer
Related
Narrower
Search Presentation Layer
The hierarchical view of the thesaurus is also a browsable view of the content.
The numbers include the number of hits
1. For the term
2. For the branch
How Does Search Work?
• Many parts
– Search software – of course
– Computer network
– Parsing of text – the “inverted file”
– Well formed or structured text
– CLEAN DATA
– Computer software – network
– Computer hardware
– Telecommunications connection
– Training sets for statistical systems
Technical Parts of Search
• Search technology
– Ranking algorithms
– Query language
– Federators
– Cache
• Inverted index – as discussed above
• Other enhancements
• Presentation Layer
Access Innovations – Complex Farm With Perfect Search
[Diagram components: Source Data; Query; Search Harmony; Presentation Layer; Repository XIS (cache); Cleanup, etc.; Federators; Query Servers; Index Builders; Deploy Hub; Cache Builders]
FAST Search Example
Core Architectural Components
[Diagram components: Query API; Content API; Management API; Custom Connector; Email Connector; Database Connector; File Traverser; Web Crawler; Document Processor pipeline; Query Processor pipeline; Search Server; Filter Server; Index DB; Agent DB; Alerts; Administrator’s Dashboard; Query; Results]
[Content sources: Web Content; Files, Documents; Databases; Custom Applications; Email, Groupware; Content Push]
[Front ends: Vertical Applications; Portals; Custom Front-Ends; Mobile Devices]
[Data Harmony components: Governance API; MAIstro; Search Harmony]
Measuring Accuracy in Search
• Relevance
• Recall
• Precision
• Accuracy – Hits, miss, noise
• Ranking
• Linguistics
• Query Processing
• Results Processing
• Display
• Search refinement
• Usability
• Business Rules
Relevance
• How well a set of returned documents answers the information need
• “Accuracy”
• Related to objective of search
– Different user communities
– Information resources
• Tension of user needs and context available
• A confidence “guesstimate”
The Formulas
Recall = (Number of relevant items retrieved) / (Number of relevant items in the collection)
Precision = (Number of relevant items retrieved) / (Number of items retrieved)
Relevance = Germane (Precision), Pertinent (Recall)
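The two ratios are easy to compute once someone has judged which items are relevant. The sketch below uses made-up document IDs and an invented function name, `precision_recall`; the relevance judgments themselves still have to come from a person.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search against judged relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 items came back, 5 items in the collection are relevant.
retrieved = {1, 2, 3, 4}
relevant = {2, 4, 6, 8, 10}

precision, recall = precision_recall(retrieved, relevant)
noise = retrieved - relevant    # retrieved but not relevant
misses = relevant - retrieved   # relevant but not retrieved
```

In this example two of the four retrieved items are relevant (precision 0.5) and two of the five relevant items were found (recall 0.4).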
Measuring Relevance
• Concepts
• Context
• Age of documents
• Completeness (recall)
• Quality
• Statistically determined? Nope, it is subjective
– Someone has to determine the rightness of the item
• A confidence factor = canard!
Kinds of Search
• Bayesian
– FAST
– Lucene
– Autonomy / Verity
• Boolean
– Dialog
– Endeca
– Perfect Search
• Ranking algorithms
– Google
George Boole and Boolean Algebra
• George Boole
– Mathematician
– 1815-1864
• Boolean algebra
– An algebraic system of logic
– AND, OR, NOT, ANDNOT
– Dialog, BRS, Stairs
Boolean Representation
Venn diagram showing the intersection of sets A AND B (in violet), the union of sets A OR B (all the colored regions), and set A XOR B (all the colored regions except the violet). The "universe" is represented by the rectangular frame.
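The regions of the Venn diagram correspond directly to set operations on the postings of an inverted file. The document IDs below are invented for illustration.

```python
# Hypothetical postings: IDs of documents containing term A and term B.
a = {1, 2, 3, 4}
b = {3, 4, 5}
universe = set(range(1, 8))  # every document in the collection (the frame)

a_and_b = a & b        # intersection: documents with both terms
a_or_b = a | b         # union: documents with either term
a_xor_b = a ^ b        # exclusive or: one term but not both
a_andnot_b = a - b     # ANDNOT: A without B
not_a = universe - a   # NOT: needs the universe to be defined
```

NOT on its own is only meaningful against a known universe, which is why search systems usually offer it as ANDNOT.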
Bayes and Bayes’ Theorem
• Thomas Bayes
– Mathematician
– 1702 - 1761
• Bayesian theorem
– Uses probability inductively
– Established a mathematical basis for probability inference
• WHAT?
– A means of calculating, from the number of times an event has not occurred, the probability that it will occur in future trials
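For reference, the theorem can be written for the search case as the probability that a document is relevant (R) given that it contains a term (t). The numbers in the worked line are assumptions chosen only to make the arithmetic easy to follow; they do not come from any real collection.

```latex
P(R \mid t) = \frac{P(t \mid R)\,P(R)}{P(t)},
\qquad
P(t) = P(t \mid R)\,P(R) + P(t \mid \bar{R})\,P(\bar{R})

% Assumed values: P(R) = 0.10, P(t \mid R) = 0.60, P(t \mid \bar{R}) = 0.05
P(R \mid t) = \frac{0.60 \times 0.10}{0.60 \times 0.10 + 0.05 \times 0.90}
            = \frac{0.060}{0.105} \approx 0.571
```

The prior P(R) and the two conditional probabilities have to be estimated from training data, so the result can only be as good as those estimates.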
Bayesian Methods – Cautions
• A user might wish to change the distribution of probabilities.
• A user will make a novel request for information in a previously unanticipated way.
• The computational difficulty of exploring a previously unknown network.
• The quality and extent of the prior beliefs used in Bayesian inference processing.
Bayesian Methods - Cautions (continued)
• A Bayesian network is only as useful as the prior knowledge is reliable.
• An optimistic or pessimistic expectation of the quality of these prior beliefs will distort the entire network and invalidate the results.
• Must ensure the selection of the statistical distribution induced in modeling the data.
• Must have the proper distribution model to describe the data.
• That is… you have to constantly train and retrain the data
Basic Areas of Natural Language Processing (NLP)
• Syntactic
• Semantic
• Morphological
• Phraseological
• Lemmatization (stemming)
• Statistical
• Grammatical
• Common Sense
Basic Areas of Automatic Language Processing (ALP)
• Auto Translation
• Auto Indexing
• Auto Abstracting
• Artificial Intelligence
• Searching
• Spell Checking
• Semantic Web
• Natural Language Processes (NLP)
• Computational Linguistics
Statistical Search
• Cluster analysis
• Neural networks
• Co-occurrence
• Bayesian inference
• Latent Semantic
• Etc.
Word and Term Parsing
• Stemming
– -ing, -ed, -es, -’s, -s’, etc.
– Depluralization
• Truncation
– Left and right
• Wild cards
– Organi*ation
• Variant Spellings
– Centre, Center
• Hyphens
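Each of these parsing techniques can be sketched in a line or two. The word list, the suffix list, and the function `stem` below are deliberately naive assumptions for illustration; production stemmers handle many more cases.

```python
from fnmatch import fnmatchcase

VOCAB = ["organization", "organisation", "organic",
         "indexing", "indexed", "indexes", "centre", "center"]

# Suffixes tried in order, so "-es" is tested before "-s".
SUFFIXES = ("ing", "ed", "es", "'s", "s'", "s")

# Variant spellings mapped to one form (hypothetical house style).
VARIANTS = {"centre": "center", "organisation": "organization"}


def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


wild = [w for w in VOCAB if fnmatchcase(w, "organi*ation")]   # wild card
right_truncated = [w for w in VOCAB if w.startswith("index")]  # index*
left_truncated = [w for w in VOCAB if w.endswith("ation")]     # *ation
stems = {w: stem(w) for w in ("indexing", "indexed", "indexes")}
normalized = sorted({VARIANTS.get(w, w) for w in VOCAB})
```

All three "index" forms reduce to the same stem, and the variant spellings collapse to one form, which is the effect a search engine relies on to improve recall.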
The Taxonomy Effect
• Where do the terms go?
• How are they used in search?
• What other ways can I use the taxonomy in search?
For search all publications
Search database for Journals and pubs
Bookstore search
Search of 53 crawled sites including journals, books, web site, conference sites, etc.
Site search
Navigation
Taxonomy Driven Search Presentation
Navigate the full taxonomy “tree”
BROWSE
Auto-completion using the taxonomy
Guide the user
Subject Browsing
Targeted Resources Based on Subject or User Role
Member Profile Tagging
User pastes or uploads CV
Button to auto-extract taxonomy attributes
Adding Terms to SharePoint
[Diagram components: TaxoTerm Server; Data Harmony (M.A.I.); Event Handler; Microsoft SharePoint Server 2010; arrow label: Returns subject metadata]
• User uploads a document to SharePoint space
• Before uploading to SharePoint server, the EventHandler sends the document to Data Harmony.
• Data Harmony automatically attaches indexing terms before uploading to MOSS
SharePoint 2010 Only Shows 10 Lines of the Taxonomy
This add-on makes it all viewable
Taxonomies Added in Search Example
Core Architectural Components
[Diagram components: Query API; Content API; FAST Management API; Custom Connector; Email Connector; Database Connector; File Traverser; Web Crawler; Document Processor pipeline; Query Processor pipeline; Search Server; Filter Server; Index DB; Agent DB; Alerts; Administrator’s Dashboard; Query; Results]
[Content sources: Web Content; Files, Documents; Databases; Custom Applications; Email, Groupware; Content Push]
[Front ends: Vertical Applications; Portals; Custom Front-Ends; Mobile Devices]
[Data Harmony components: Governance API; MAIstro; Search Harmony]
[Callout: Use taxonomy terms here]
Auto suggestion of Taxonomy Terms
• Populate Keywords, Descriptors, Indexing terms, etc.
• Allow for manual review of auto-tagging for quality assurance.
Where do I use a taxonomy?
The Workflow
[Diagram components: Thesaurus Master; Machine Aided Indexer (M.A.I.™); Database Repository; Search Software; Search Presentation Layer; Client Taxonomy; Inline Tagging; Metadata and Entity Extractor; Automatic Summarization; Client Data (Full Text, HTML, PDF, Data Feeds, etc.)]
[Search Presentation Layer: Increases accuracy; Browse by Subject; Auto-completion; Broader Terms; Narrower Terms; Related Terms]
• Gather source data
• Tag and create metadata
• Put in data base with tags
• Build search inverted index
• Create user interface
Taxonomy In SharePoint
[Diagram components: Thesaurus Master; Machine Aided Indexer (M.A.I.™); Repository; Search Software; Search Presentation; Client Taxonomy; Inline Tagging; Metadata and Entity Extractor; Automatic Summarization; Client Data (Full Text, HTML, PDF, Data Feeds, etc.)]
[Search Presentation: 90% accuracy; Browse by Subject; Auto-completion; Broader Terms; Narrower Terms; Related Terms]
[Data Harmony fully integrated with MOSS.]
Adding Terms to Information Objects
• Part of the record
– XML
– MARC
• A relational table pointing the terms to a record ID number (Secondary key)
• Adding data to the HTML META NAME KEYWORD Element
• Many other options
Part of the Record - XML
• Added as an element in the XML record
• Need an element to put the data in
– <Taxonomy Term>
• Capture the terms when creating the records
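Adding terms to an XML record might look like the sketch below. Because XML element names cannot contain spaces, the example uses a hypothetical element name, `TaxonomyTerm`; the record layout and the function `add_terms` are also assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def add_terms(record_xml, terms):
    """Append one <TaxonomyTerm> element per indexing term to the record."""
    root = ET.fromstring(record_xml)
    for term in terms:
        ET.SubElement(root, "TaxonomyTerm").text = term
    return ET.tostring(root, encoding="unicode")


# Hypothetical record captured at creation time.
record = "<record><title>Fire extinguishers in libraries</title></record>"
tagged = add_terms(record, ["Fire extinguishers", "Libraries"])
```

One element per term keeps each term separately addressable, which matters later if the terms are to drive faceted search.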
Author Submission Module
The author pastes the data to the document template, attaching images, graphs, as necessary:
Editorial Workflow Integration
Author Submission Module
The author fills in the document template, attaching images and graphs as necessary.
An API calls Data Harmony and generates a list of indexing terms based on the content.
Editorial Workflow Integration
Author Submission Module
Authors review the indexing and may change it.
Content is stored in a data repository as HTML, XML, etc.
In the HTML Record
Makes it crawlable for the internet
Used in CMS applications (Content Management Systems)
Add to the HTML
Manually
In Dreamweaver
In your CMS, like Ektron
Author Submissions Example
Do the same with SharePoint
META NAME “KEYWORDS”
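A minimal sketch of writing taxonomy terms into the keywords meta element of an HTML header follows. The function name and sample terms are made up for illustration; the only library call used is html.escape from the Python standard library.

```python
from html import escape


def keywords_meta_tag(terms):
    """Build a keywords meta element from a list of taxonomy terms."""
    content = ", ".join(terms)
    # Escape the attribute value so characters such as & and " stay valid HTML.
    return '<meta name="keywords" content="{}">'.format(escape(content, quote=True))


tag = keywords_meta_tag(["Taxonomies", "Indexing & abstracting"])
print(tag)
```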
In Relational Database Table
Primary key: the record
Secondary key: all the metadata
Like taxonomy terms
Like author
Like publication date
Used in Oracle, SQL, etc.
Need a field to put the taxonomy data in
Supports "Faceted Search": each item in a separate field or element or table
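The relational approach can be sketched with Python's built-in sqlite3 module. The table and column names (record, record_term) and the sample rows are assumptions for illustration; the slides name Oracle and SQL databases generally, not this schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE record (
        record_id INTEGER PRIMARY KEY,
        title     TEXT NOT NULL
    );
    -- One row per (record, taxonomy term) pair, keyed back to the record.
    CREATE TABLE record_term (
        record_id INTEGER NOT NULL REFERENCES record(record_id),
        term      TEXT NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO record VALUES (?, ?)",
    [(1, "Building a thesaurus"), (2, "Search interface design")],
)
conn.executemany(
    "INSERT INTO record_term VALUES (?, ?)",
    [(1, "Taxonomies"), (1, "Indexing"), (2, "Search engines"), (2, "Indexing")],
)


def records_for_term(connection, term):
    """Return the titles of all records tagged with the given taxonomy term."""
    rows = connection.execute(
        "SELECT r.title FROM record r "
        "JOIN record_term t ON t.record_id = r.record_id "
        "WHERE t.term = ? ORDER BY r.record_id",
        (term,),
    )
    return [title for (title,) in rows]


def facet_counts(connection):
    """Count records per term, the raw material for a faceted search display."""
    rows = connection.execute(
        "SELECT term, COUNT(*) FROM record_term GROUP BY term ORDER BY term"
    )
    return dict(rows)


print(records_for_term(conn, "Indexing"))
print(facet_counts(conn))
```

Keeping the terms in their own table is what lets each facet be counted and filtered independently of the record text.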
RDBMS Connection
Taxonomy term table
Using Taxonomies in Applications
• Improve search
• Subject browsing
• Mobile intelligence
• Targeted resources based on subject or user role
• Link to society resources
• Author submission module
• Author authority database
• Expert reviewer identification
• Member profiles
• Data visualization
• More like this
• In "indexing" or categorizing, as subject metadata
• In content management systems
• In SharePoint
• In mashups
• In social networking sites
• In author tagging
• In filtering data, e.g., spam filters and RSS feeds
• In web crawlers
• Social media - community
A Quick Look Behind the Scenes
Database Management System
Thesaurus tool
Indexing tool
• Validate terms
• Add terms and rules
• Change terms and rules
• Delete terms and rules
• Search thesaurus
• Validate term entry
• Block invalid terms
• Record candidates
• Establish rules for term use
• Suggest indexing terms
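The validate, block, and record-candidates functions listed above can be illustrated with a toy sketch. The vocabulary, the USE references, and the function name are all hypothetical; this is not the Data Harmony implementation.

```python
# Hypothetical controlled vocabulary: preferred terms plus USE references
# that point non-preferred entries to their preferred term.
preferred_terms = {"Taxonomies", "Indexing", "Thesauri"}
use_references = {"Thesauruses": "Thesauri", "Subject indexing": "Indexing"}
candidate_terms = []  # unknown entries kept for later editorial review


def validate_term(entry):
    """Return the preferred term for an entry, or None if the entry is blocked."""
    if entry in preferred_terms:
        return entry
    if entry in use_references:
        return use_references[entry]
    # Not in the vocabulary: block it, but record it as a candidate term.
    candidate_terms.append(entry)
    return None


print(validate_term("Thesauruses"))
```

Recording blocked entries rather than discarding them gives the taxonomy editor a running list of terms that users or indexers expected to find.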
Taxonomy view
Thesaurus Term Record view
Where Does the Subject Metadata Go?
Apply to content itself
Use meta name field in HTML header
Connect search to the keywords in the SQL or other database tables
HTML Header
Suggested taxonomy descriptors
Integrate Taxonomy to Enhance Find-ability
Browsable categories of a directory
Browsable faceted navigation
Smart search for term equivalents
Taxonomy terms (original or modified) as labels
Navigation aids incorporate taxonomy terms and relationships
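"Smart search for term equivalents" can be sketched as a query expansion over thesaurus relationships. The term records below, with UF (used for) and NT (narrower term) lists, are invented sample data, and the function is only one possible way to apply them.

```python
# Hypothetical term records with equivalence (UF) and narrower-term (NT) relationships.
thesaurus = {
    "Automobiles": {"UF": ["Cars", "Motor cars"], "NT": ["Electric vehicles"]},
    "Electric vehicles": {"UF": ["EVs"], "NT": []},
}


def expand_query(query, include_narrower=False):
    """Return the preferred term and its equivalents for a query word.

    Falls back to the original query when no term record matches.
    """
    wanted = query.lower()
    for preferred, record in thesaurus.items():
        labels = [preferred] + record["UF"]
        if wanted in (label.lower() for label in labels):
            expanded = list(labels)
            if include_narrower:
                expanded += record["NT"]
            return expanded
    return [query]


print(expand_query("cars"))
print(expand_query("cars", include_narrower=True))
```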
More Taxonomy Enrichment
Spelling alternatives and correction
Related concepts
Statistical information about the metadata
Navigation or drill downs
Search refinement
Recursive sets
Concept linking
Dictionary lookup (in taxonomy glossary)
Brand is repeated in several spots and tied to search as well
Database Plus Search Workflow
Raw full text data feeds
XIS™ Creation
Taxonomy Thesaurus Master®
Printed source materials
Taxonomy terms
M.A.I.™ Concept Extractor
M.A.I.™ Rule Base
Load to Perfect Search
Search Harmony™ Display Search
Data crawls on 53+ sources
Add metadata
XIS™ repository
SQL for ecommerce
Save data to search and repositories at the same time
Database Plus Search Workflow
Raw full text data feeds
XIS Creation
Taxonomy Thesaurus Master
Printed source materials
Taxonomy terms
MAI Rule Base
Load to Search
Search Harmony Display Search
Data crawls on data sources
Add metadata
XIS repository
SQL for ecommerce
MAI Concept Extractor
Source data
Clean and enhance data
Search data
Use Case: Inline Tagging
Show the exact point where the concept is mentioned
Mouse-over to view the term record
Statistical summary, showing the number of times each term is mentioned in the article
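A rough sketch of inline tagging is shown below: each mention of a taxonomy term is wrapped in a span, and mentions are counted for the statistical summary. The CSS class name, the use of the title attribute as a stand-in for the mouse-over term record, and the sample text are assumptions for illustration. The input is assumed to be plain text with no existing markup.

```python
import re
from collections import Counter
from html import escape


def tag_inline(text, terms):
    """Wrap each mention of a taxonomy term in a span and count the mentions."""
    counts = Counter()
    # Longest terms first so multiword terms win over shorter overlapping ones.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in ordered) + r")\b",
        re.IGNORECASE,
    )
    canonical = {term.lower(): term for term in terms}

    def wrap(match):
        term = canonical[match.group(0).lower()]
        counts[term] += 1
        # The title attribute stands in for the mouse-over view of the term record.
        return '<span class="taxonomy-term" title="{}">{}</span>'.format(
            escape(term, quote=True), match.group(0)
        )

    return pattern.sub(wrap, text), counts


sample = "A taxonomy supports indexing. Indexing improves search."
tagged_html, term_counts = tag_inline(sample, ["taxonomy", "indexing"])
print(tagged_html)
print(term_counts)
```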
Inline Tagging HTML View
XML View for Inline Tagging
Taxonomy view
Thesaurus Term Record view
A TAXING SITUATION
The New Board Game
Applications
Implementation
The taxonomy
The Changing Faces of Web Taxonomies
...and how the information is delivered
From current site
To new version
Depends on TAXONOMY
Personalization
Feeding ads
Consistent information
HTML Headers
META NAME KEYWORD
Use the taxonomy here
More Innovations!
Link topic to article to author to event
Make visual links within domain
Enable authors to submit and categorize conference submissions
Create author authority database linking to co-authors, topics, locations, etc.
Create expert reviewer database
Create member profiles with alternate names, publications, tagged by topic
Visualize data and domain distribution
Display interest connections in social network
Deliver accurate targeted information through mobile applications
Etc.
Change to Ready, Aim, Fire!
Follow the data
Look at the data, format and content
Design taxonomy for data
Leverage the standards
Use taxonomy to tag data
Choose search and repository software for data
Load the data into the system
Keep your eye on the target
Standards for Monolingual Thesauri
TEST - Thesaurus of Engineering and Scientific Terms - COSATI 1967
AFNOR NF Z 47-100 - 1981 - French
DIN 1463 - 1987-1993 - German
NISO Z39.19 - 1993 - American
Where Can I Get Taxonomy Standards?
www.niso.org: Z39.19 (2010) Controlled Vocabularies
www.iso.org: ISO 25964 parts 1 and 2 (2011 and 2013)
www.bsi.uk.co
www.w3c.org: SKOS and OWL
www.accessinn.com/library
Suggested Reading
F.W. Lancaster, Vocabulary Control, 1986
Aitchison, Gilchrist and Bawden, Thesaurus Construction and Use: A Practical Manual, 4th edition
Heather Hedden, The Accidental Taxonomist
TaxoDiary.com (blog site)
Suggested Reading
Introduction to any thesaurus: INSPEC, NICEM, Psychological Abstracts, etc.
It Just Takes a Little Imagination
Thank you
Marjorie M.K. Hlava, President
Bob Kasenchak, Project Coordinator
Access Innovations