May 16, 2005 Copyright 2005 Taxonomy Strategies LLC. All rights reserved.
Taxonomy Governance
Ron Daniel, Jr. & Joseph A. Busch
Taxonomy Strategies LLC
Taxonomy Strategies LLC: The business of organized information
Agenda
1:30 Welcome & Introductions
1:45 Exercise: Taxonomy Revisions
2:15 Fundamental Processes
2:30 Governance Team Roles and Structures
3:00 Tools
3:05 Break
3:15 Exercise: Organizational Self-Assessment
3:30 Maturity Model
3:40 Designing and Building Maintainable Taxonomies & Metadata
4:00 Additional Processes
4:20 Q&A
4:30 Adjourn
Who we are: Joseph Busch
Over 25 years in the business of organized information
  Founder, Taxonomy Strategies
  Director, Solutions Architecture, Interwoven
  VP, Infoware, Metacode Technologies
  Program Manager, Getty Foundation
  Manager, Pricewaterhouse
Metadata and taxonomies community leadership
  President, American Society for Information Science & Technology
  Director, Dublin Core Metadata Initiative
  Adviser, National Research Council Computer Science and Telecommunications Board
  Reviewer, National Science Foundation Division of Information and Intelligent Systems
  Founder, Networked Knowledge Organization Systems/Services
Who we are: Ron Daniel, Jr.
Over 15 years in the business of metadata & automatic classification
  Principal, Taxonomy Strategies
  Standards Architect, Interwoven
  Senior Information Scientist, Metacode Technologies
  Technical Staff Member, Los Alamos National Laboratory
Metadata and taxonomies community leadership
  Chair, PRISM (Publishers Requirements for Industry Standard Metadata) working group
  Acting chair: XML Linking working group
  Member: RDF working groups
  Co-editor: PRISM, XPointer, 3 IETF RFCs, and Dublin Core 1 & 2 reports.
Recent & current projects
Government: Commodity Futures Trading Commission; Defense Intelligence Agency; ERIC; Federal Aviation Administration; Federal Reserve Bank of Atlanta; Forest Service; GSA Office of Citizen Services (www.firstgov.gov); Head Start; Infocomm Development Authority of Singapore; NASA (nasataxonomy.jpl.nasa.gov); Small Business Administration; Social Security Administration; USDA Economic Research Service; USDA e-Government Program (www.usda.gov)
Commercial: Allstate Insurance; Blue Shield of California; Debevoise & Plimpton; Halliburton; Hewlett Packard; Motorola; PeopleSoft; PricewaterhouseCoopers; Siderean Software; Sprint; Time Inc.
Commercial subcontracts: Agency.com (top financial services); Critical Mass (Fortune 50 retailer); Deloitte Consulting (big credit card); Gistics/OTB (direct selling giant)
NGOs: CEN; IDEAlliance; IMF; OCLC
Participant Introductions
Who are you?
What do you do?
What brings you here today?
Taxonomy Governance Overview
Is “Taxonomy Governance” synonymous with “Taxonomy Maintenance”?
What kinds of changes can be made, and what are their costs?
What kinds of information are needed to determine the changes?
What kind of group should maintain the taxonomy?
What kinds of rules should the group follow to decide on changes?
What should the group do beyond maintaining the taxonomy?
Exercise: Taxonomy Modifications
Divide into small groups
Review assigned sample taxonomy
Discuss changes you would make
In 10 minutes, a spokesperson will speak for the group and briefly:
  Tell us something good about the taxonomy
  Characterize the short-term changes your group would make
  Characterize the questions your group would like answered before making other changes
Exercise Notes
Team Members:
Something good about the taxonomy:
Short term changes:
Questions for other changes:
Group 1 Sample Taxonomy
Group 2 Sample Taxonomy
Business / Accounting / Firms / Directories
Business / Biotechnology & Pharmaceuticals / Education & Training
Business / Employment / By Industry
Business / Healthcare / Employment / Regional
Business / Small Business / Finance / Accounting
Reference / Education / Colleges & Universities / North America / United States / Maryland / Columbia Union College / Athletics
Reference / Education / K-12 / Home Schooling / Unschooling / Chats and Forums
Regional / Europe / Ireland / Business & Economy / Employment / Health & Medical
Science / Math / Academic Departments / South America / Colombia
Science / Social Sciences / Linguistics / Translation / Associations
Society / People / Women / Science & Technology / Mathematics
Top Level
Random Samples of Detailed Categories
Group 3 Sample Taxonomy
Source: http://householdproducts.nlm.nih.gov/products.htm
Top Level
Detail in Auto Products Category
Predictions
Short-term changes will center on rules of style: ‘&’ vs. ‘and’, capitalization, plurals
Faceted subdivision will only be suggested by experienced practitioners, by groups given low-level details of a taxonomy, or both.
People will critique the UI presentation
Questions for long-term changes will focus, in decreasing order, on:
  Who are the users and what are they doing?
  What is the content and how much is in the various categories?
  …
  What kind of money depends on the taxonomy, and what kind of maintenance expenses are justified?
Anything else people want to cover?
Editorial Rules
Metadata Specification,
Design for maintainability
How to put it into action?
User Characterization
Content and Metadata
Maintenance
ROI
Fundamental Processes
What are the two fundamental processes every organization should implement to maintain its metadata and taxonomies?
  Query log / click trail examination
  Tagging error correction
What are the key outlooks a taxonomist should try to instill in their organization?
Fundamental Process #1 – Query Log Examination
How can we characterize users and what they are looking for?
Query Log & Click Trail Examination
  Sophisticated software available, but don’t wait.
  80/20 Rule: 80% of value from 20% of possible reports.
Greatest value comes from:
  Identifying a person as responsible for search quality
  Starting a “Measure & Improve” mindset
Greatest challenge:
  Getting a person assigned (≥ 10%)
  Getting logs turned back on
  What to do after the obvious fixes have been made
UltraSeek Reporting
  • Top queries
  • Queries with no results
  • Queries with no click-through
  • Most requested documents
  • Query trend analysis
  • Complete server usage summary
Click Trail Packages
  iWebTrack, NetTracker, OptimalIQ, SiteCatalyst, Visitorville, WebTrends
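The first few reports in the list above need very little tooling. The following is a minimal sketch, assuming a hypothetical log layout in which each search is recorded as a (query, result count, click count) tuple; it is not the format of UltraSeek or any other product named here.

```python
from collections import Counter, defaultdict

def query_log_report(records, top_n=3):
    """Build three basic reports from (query, result_count, click_count) records.

    The record layout is a hypothetical one chosen for illustration.
    """
    searches = Counter()
    best_result_count = defaultdict(int)
    total_clicks = defaultdict(int)
    for query, result_count, click_count in records:
        key = query.strip().lower()  # treat "Travel Policy" and "travel policy" alike
        searches[key] += 1
        best_result_count[key] = max(best_result_count[key], result_count)
        total_clicks[key] += click_count
    return {
        "top_queries": searches.most_common(top_n),
        "no_results": sorted(q for q in searches if best_result_count[q] == 0),
        "no_click_through": sorted(
            q for q in searches
            if best_result_count[q] > 0 and total_clicks[q] == 0
        ),
    }

# Made-up sample data, for illustration only.
sample_log = [
    ("travel policy", 12, 1),
    ("Travel Policy", 12, 0),
    ("expense form", 0, 0),
    ("holiday schedule", 5, 0),
    ("travel policy", 12, 2),
]
report = query_log_report(sample_log)
```

Queries with no results often point to missing synonyms or missing content; queries with results but no clicks suggest the results did not look relevant to the user.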
Fundamental Process #2 – Tagging Error Correction
For the Taxonomy to be used, its values must be associated with content. We will refer to this as “Tagging”.
Errors will happen, and some will be found. What are you going to do about them?
Define an error correction process. Process will accommodate questions like:
  Is it an error?
  What is the cost to correct or not correct?
  Does the correction need to be scheduled?
  etc.
Once an error is corrected, NEVER lose that fact.
  Manually reviewed pages are vital for training automatic classifiers.
  Has implications for metadata specification and review procedures.
Over time, multiple error detection methods will be defined.
  e.g. Statistical sampling of newly added pages
Gradually, additional error correction processes may be defined to deal with particular types of errors.
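One way to honor the "never lose that fact" rule is to record corrections alongside the tag rather than overwriting it. The sketch below is illustrative only: the `TagRecord` class, its field names, and the example URL are assumptions, not part of any particular CMS or metadata standard.

```python
from dataclasses import dataclass, field

@dataclass
class TagRecord:
    """One facet value assigned to one piece of content (hypothetical layout)."""
    url: str
    facet: str
    value: str
    manually_reviewed: bool = False
    history: list = field(default_factory=list)

    def correct(self, new_value, reviewer):
        # Keep the old value in the history instead of silently overwriting it,
        # so the fact that a person reviewed this tag is never lost.
        self.history.append((self.value, new_value, reviewer))
        self.value = new_value
        self.manually_reviewed = True

def training_set(records):
    """Manually reviewed tags are the ones suitable for training a classifier."""
    return [(r.url, r.facet, r.value) for r in records if r.manually_reviewed]

reviewed = TagRecord("http://example.org/awards", "Content Type", "News & Events")
untouched = TagRecord("http://example.org/memo", "Content Type", "News & Events")
reviewed.correct("Honors & Awards", reviewer="editor-1")
examples = training_set([reviewed, untouched])
```

The `manually_reviewed` flag is the kind of field the metadata specification would need to carry for this to work.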
Fundamental Outlooks
How are we going to build and maintain metadata structures and controlled vocabularies? (The taxonomy problem)
How are we going to populate metadata elements with complete and consistent values? (The tagging problem)
How are we then going to use metadata in applications and demonstrate benefits? (The ROI problem)
Taxonomy Governance is a standards process.
Take tips from other standards efforts:
  Team, with comment-handling responsibilities and an appeals process
  Issue Logs
  Announcements
  Release Schedule
Foster a “Measure & Improve” Mindset
Must know this to address other problems!
Taxonomy Business Processes
Taxonomies must change, gradually, over time if they are to remain relevant
Maintenance processes need to be specified so that the changes are based on rational cost/benefit decisions
A team will need to maintain the taxonomy on a part-time basis
Taxonomy team reports to some other steering committee
Controlled Vocabulary Governance Environment
[Diagram] Labels shown:
  Syndicated Terminologies: ISO 3166-1, Other External, ERP, Other Internal
  Vocabulary Management System: CVs, Other Controlled Items, Custodians
  Published CVs and STs; Notifications; Change Requests & Responses
  Consuming Applications: Intranet Search, Intranet Nav., Web CMS, DAM, Archives, ERMS, …
Definitions about the Controlled Vocabulary Governance Environment
  1: Syndicated Terminologies change on their own schedule
  2: CV Team decides when to update CVs
  3: Team adds value via mappings, translations, synonyms, training materials, etc.
  4: Updated versions of CVs published to consuming applications
Other Controlled Items
Taxonomy Team will have additional items to manage:
  Charter, Goals, Performance Measures
  Editorial rules
  Team processes
  Tagger training materials (manual and automatic)
  Outreach & ROI
  Communication plan
    Website
    Presentations
    Announcements
  Roadmap
Taxonomy governance | Generic team charter
Taxonomy Team is responsible for maintaining:
  The Taxonomy, a multi-faceted classification scheme
  Associated taxonomy materials, such as:
    Editorial Style Guide
    Taxonomy Training Materials
    Metadata Standard
    Team rules and procedures (subject to CIO review)
Team evaluates costs and benefits of suggested changes
Taxonomy Team will:
  Manage relationship between providers of source vocabularies and consumers of the Taxonomy
  Identify new opportunities for use of the Taxonomy across the Enterprise to improve information management practices
  Promote awareness and use of the Taxonomy
Editorial Rules
To ensure consistent style, rules are needed
Issues commonly addressed in the rules:
  Sources of Terms
  Abbreviations
  Ampersands
  Capitalization
  Continuations (More… or Other…)
  Duplicate Terms
  Hierarchy and Polyhierarchy
  Languages and Character Sets
  Length Limits
  “Other”: Allowed or Forbidden?
  Plural vs. Singular Forms
  Relation Types and Limits
  Scope Notes
  Serial Comma
  Spaces
  Synonyms and Acronyms
  Term Order (Alphabetic or …)
  Term Label Order (Direct vs. Inverted)
Must also address issue of what to do when rules conflict: which are more important?
Rule Name | Editorial Rule
Use Existing Vocabularies | Other things being equal, reusing an existing vocabulary is preferred to creating a new one.
Ampersands | The character ‘&’ is preferred to the word ‘and’ in Term Labels. Example: Use Type: “Manuals & Forms”, not “Manuals and Forms”.
Special Characters | Retain accented characters in Term Labels. Example: España
Serial comma | If a category name includes more than two items, separate the items by commas. The last item is separated by the character ‘&’ which IS NOT preceded by a comma. Example: “Education, Learning & Employment”, not “Education, Learning, & Employment”.
Capitalization | Use title case (where all words except articles are capitalized). Example: “Education, Learning & Employment”, NOT “Education, learning & employment”, NOT “EDUCATION, LEARNING & EMPLOYMENT”, NOT “education, learning & employment”
… | …
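Rules like the ampersand, serial comma, and capitalization rules above are mechanical enough to check automatically. The following is a rough sketch under stated assumptions: the function name `check_label` and the messages are invented for illustration, "title case" is read literally as "first letter upper case, the rest lower case", and acronyms would need an exception list that is not shown.

```python
import re

ARTICLES = {"a", "an", "the"}

def check_label(label):
    """Return a list of editorial-rule violations found in a term label."""
    problems = []
    # Ampersands rule: '&' is preferred to the word 'and'.
    if re.search(r"\band\b", label, flags=re.IGNORECASE):
        problems.append("ampersand: use '&' instead of 'and'")
    # Serial comma rule: the final '&' is not preceded by a comma.
    if re.search(r",\s*&", label):
        problems.append("serial comma: no comma before '&'")
    # Capitalization rule: every word except articles is capitalized.
    for word in re.findall(r"[^\W\d_]+", label):
        if word.lower() in ARTICLES or word.lower() == "and":
            continue  # articles are exempt; 'and' is already reported above
        if not (word[0].isupper() and word[1:] == word[1:].lower()):
            problems.append(f"capitalization: '{word}' is not title case")
    return problems

good = check_label("Education, Learning & Employment")
serial = check_label("Education, Learning, & Employment")
spelled_out = check_label("Manuals and Forms")
shouting = check_label("EDUCATION, LEARNING & EMPLOYMENT")
accented = check_label("España")
```

A checker like this only flags candidates; the question of which rule wins when rules conflict still has to be settled by the team.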
Roles in Two Taxonomy Governance Teams
Executive Sponsor
  Advocate for the taxonomy team
Business Lead
  Keeps team on track with larger business objectives
  Balances cost/benefit issues to decide appropriate levels of effort
    Specialists help in estimating costs
  Obtains needed resources if those in team can’t accomplish a particular task
Technical Specialist
  Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and processing burden, software changes, etc.
  Helps obtain data from various systems
Content Specialist
  Team’s liaison to content creators
  Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc.
  Small-scale Metadata QA Responsibility
Taxonomy Specialist
  Suggests potential taxonomy changes based on analysis of query logs, indexer feedback
  Makes edits to taxonomy, installs into system with aid of IT specialist
Content Owner
  Reality check on process change suggestions
Team structure at a different org.
Business Lead
Custodians
  Responsible for content in a specific CV.
Training Representative
  Develops communications plan, training materials
Work Practices Representative
  Develops processes, monitors adherence
IT Representative
  Backups, admin of CV Tool
Info. Mgmt. Representative
  Provides CV expertise, tie-in with larger IM effort in the organization.
Taxonomy governance | Where changes come from
[Diagram] Labels shown: End User (experience), Firewall, Application UI, Application Logic, Content, Tagging Logic, Tagging UI, Tagging Staff, Taxonomy, Taxonomy Editor, Taxonomy Team
Sources of change shown:
  Staff notes ‘missing’ concepts
  Query log analysis
  Requests from other parts of the organization
Recommendations by Editor
  1. Small taxonomy changes (labels, synonyms)
  2. Large taxonomy changes (retagging, application changes)
  3. New “best bets” content
Team considerations
  1. Business goals
  2. Changes in user experience
  3. Retagging cost
Processes
Different organizations will need to consider their own change processes.
  Organization 1: A custodian is responsible for the content, but checks facts with department heads before making changes.
  Organization 2: Analysts suggest changes, editors approve, copyeditors verify consistency.
Change process MUST also consider cost of implementing the change
  Retagging data
  Reconfiguring auto-classifier
  Retraining staff
  Changes in user expectations
Taxonomy Change Cases
  Case 1. Renaming a term
  Case 2. Adding a new leaf term
  Case 3. Inserting a new term
  Case 4. Splitting a term
  Case 5. Deleting a leaf term or subtree
  Case 6. Deleting a term
  Case 7. Moving a subtree
  Case 8. Merging terms
  Case 9. Adding a CV
  Case 10. Deleting a CV
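The change cases above differ mainly in how much tagged content they touch. As a small sketch of that cost difference, assume content is tagged with stable term IDs rather than labels (the IDs, labels, and document names below are hypothetical): renaming a term (Case 1) then touches no content, while splitting a term (Case 4) requires reviewing every document that carries it.

```python
# Terms are keyed by a stable, made-up ID; content is tagged with IDs, not labels.
taxonomy = {"t1": "Manuals & Forms", "t2": "News & Events"}
tags = {"doc-a": ["t1"], "doc-b": ["t1", "t2"], "doc-c": ["t2"]}

def rename_term(taxonomy, term_id, new_label):
    """Case 1: only the label changes; no content has to be retagged."""
    taxonomy[term_id] = new_label

def documents_to_review(tags, term_id):
    """Case 4: every document tagged with the term must be reviewed before a split."""
    return sorted(doc for doc, term_ids in tags.items() if term_id in term_ids)

tags_before = {doc: list(term_ids) for doc, term_ids in tags.items()}
rename_term(taxonomy, "t1", "Manuals, Forms & Templates")
split_workload = documents_to_review(tags, "t1")
```

The length of `split_workload` is a first estimate of the retagging cost the team would weigh against the benefit of the change.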
Taxonomy governance | Taxonomy maintenance workflow
[Workflow diagram] Roles shown: Analyst, Editor, Copywriter, Sys Admin
Steps shown:
  Suggest new name/category
  Review new name (Problem? Yes / No)
  Copy edit new name (Problem? Yes / No)
  Add to enterprise Taxonomy (Taxonomy Tool)
Taxonomy editing tools vendors
[Quadrant chart] Axes: Ability to Execute (low to high) vs. Completeness of Vision. Quadrants labeled: Niche Players, Visionaries.
Widely used, cheap, single-user
High functionality, high cost ($100k!)
Most popular taxonomy editor? MS Excel
Immature industry: no vendors in upper-right quadrant!
Sample Taxonomy Editor Functionality
Standard and Custom Fields
Standard and Custom Relations
  Data Typing, Restrictions, and Inference
Flexible Reporting
Flexible Importing
Multiple Vocabulary Support
Inter-Vocabulary Relations
Unique IDs
  ISO Codes not sufficient
Workflow
  Voting
  Change Request Management
Programmability
Hierarchy Browser
Term Editing
Where do I put the metadata?
Where can I store metadata?
  In the content: HTML headers, file properties, etc.
  In a centralized repository: search index, MDDB, etc.
  In multiple systems: the common case
Where should I store metadata?
  Consultant’s answer: “It depends.”
  If you are moving files through a process, putting it in the file keeps it from getting dropped at system borders.
  If you are doing search across multiple documents, it has to be at least copied out of the files.
  If you make copies of files and modify them, consistent in-file metadata will be impossible.
Real question is not where to STORE the metadata, it is how to MAINTAIN the metadata.
  Web CMS as an example.
  Central Metadata Database is a very advanced practice.
What Processes Should I Try to Institute?
Processes will vary from one organization to another.
Assessing the Organization’s state is the first step.
Determining the ROI and potential resources follows.
Plan on instituting processes over time, beginning with basic ones.
Search and Metadata Self-Assessment Form
Background
1) Rate your organization’s search & metadata maturity from 1 to 10.
2) What was the most recent change to your organization’s search & metadata processes?
3) What is the next step for your organization’s search & metadata processes?
Basic
4) Is there a process in place to examine query logs?
5) Is there an organization-wide metadata standard, such as an extension of the Dublin Core, for use by search tools, multiple repositories, etc.?
Intermediate
6) Is there an ongoing data cleansing procedure to look for ROT (Redundant, Obsolete, Trivial content)? If so, describe briefly.
7) Does the search engine index more than 4 repositories around the organization?
8) Are system features and metadata fields added based on cost/benefit analysis, rather than things that are easy to do with the current tools?
9) Are tools only acquired after requirements have been analyzed, or are major purchases sometimes made to use up year-end money?
10) Are there hiring and training practices especially for metadata and taxonomy positions? If so, describe briefly.
Advanced
11) Are there established qualitative and quantitative measures of metadata quality? If so, describe briefly.
12) Can the CEO explain the ROI for search and metadata?
Optional
13) Your name:
14) Organization:
15) E-mail:
Contact information will not be used for marketing purposes. It will only be used to follow up and clarify issues around the survey.
Metadata Maturity Model
Taxonomy governance processes must fit the organization
As consultants, we notice different levels of maturity in the business processes around Content Management, Taxonomy, and Metadata
Honestly assess your organization’s metadata maturity in order to design appropriate governance processes
We are starting to define a maturity model, similar to the CMMI model in the software world.
Metadata Maturity Model
Columns: Process Areas | Maturity Levels (Basic, Intermediate, Advanced, Bleeding Edge) | Limiting Processes
Search Capabilities: Uniform Search Box; Query Log Exam.; Index Multiple; Best Bets; Simple Grouping; Intranet Facet Navigation; Improved Ranking
Metadata and taxonomy standards: System MD Stds.; Organization MD Std.; Reuse ERP; Multiple Repos. Comply; Taxonomy Roadmap; Highly Abstract Subject Taxonomies
Tools and tool selection: Requirements, then Tools; Bakeoff Datasets; Budget for Bakeoffs; Unneeded Capabil.; Tools, then Reqs.
Staff training and hiring: Search Analyst Role; Librarian Expertise; Pre-hire Testing; SME Catalogers
Data creation and QA: CM Introduced; ROT-Elimination; Hybrid Creation Model; Adaptive Qualification; Quality Measures
Project management: Project Plan; Std. Proj. Methodol.; X-Functional Teams; Communication Plan; Multi-Year Plan; Early Termination
Executive support and ROI: External Search ROI; Intranet ROI Model; CEO knows Search ROI; Use it or Lose It Budgets
Shameless Plug: Tomorrow Morning at 9:45
Call for Data: Leave Self-Assessments with us
Purpose of Maturity Model
Estimating the maturity of an organization’s information management processes tells us:
  How involved the taxonomy development and maintenance process should be
    Overly sophisticated processes will fail
  What to recommend as next steps
Maturity is not a goal, it is a characterization of an organization’s methods for achieving particular goals.
Mature processes have expenses which must be justified by consequent cost savings or revenue gains.
Metadata Maturity may not be core to your business.
Overview of Best Practices in Metadata and Taxonomy
Avoid monolithic ‘subject’ taxonomies
  May have a browsing taxonomy constructed from combined facets.
Use (or map to) Dublin Core for basic information.
  Extend with custom elements for specific facts.
Use pre-existing, standard vocabularies as much as possible.
  Validate author names with LDAP directory
  ISO country codes for locations
  Product & service info from ERP system
Designate a team to manage the taxonomies and related materials
  Taxonomy Editorial Rules, Processes, Training materials, Outreach & ROI
Design a Metadata QC Process
  Start with an error-correction process, then get more formal on error detection.
  In the future, large-scale ontologies like CYC may be valuable in automated error detection.
Factor “Subject” into smaller facets
Size
  DMOZ tries to organize all web content, has more than 600k categories!
  Difficulty in navigating, maintaining
Hidden facet structure
  “Classification Schemes” vs. “Taxonomies”
Sources for 7 common vocabularies
Vocabulary | Definition | Potential Sources
Organization | Organizational structure. | FIPS 95-2, U.S. Government Manual, your organizational structure, competitors, partners, regulators, etc.
Content Type | Structured list of the various types of content being managed or used. | DC Types, AGLS Document Type, AAT Information Forms, records management policy, etc.
Industry | Broad market categories such as lines of business, life events, or industry codes. | FIPS 66, SIC, NAICS, etc.
Location | Place of operations or constituencies. | FIPS 5-2, FIPS 55-3, ISO 3166, UN Statistics Div, US Postal Service, etc.
Topic | Business topics relevant to your mission and goals. | Federal Register Thesaurus, NAL Agricultural Thesaurus, LCSH, etc.
Audience | Subset of constituents to whom a piece of content is directed or intended to be used. | GEM, ERIC Thesaurus, IEEE LOM, etc.
Products and Services | Names of products/programs & services. | ERP system, your products and services, etc.
Function | Functions and processes performed to accomplish mission and goals. | FEA Business Reference Model, Enterprise Ontology, AAT Functions, etc.
Facet Principles
Basic facets with identified items: people, places, projects, instruments, missions, organizations, …
  Note that these are not subjective “subjects”, they are objective “objects”.
Subjective views can be laid on top of the objective facts, but should be in a different namespace so they are clearly distinguishable.
  For example, labels like “Anarchist” or “Prime Minister” can be applied to the same person at different times (e.g. Nelson Mandela).
Iterative Development Vision (More participants and tagged content at each iteration)
[Diagram] Steps: 1 Identify Objectives; 2 Inventory Content; 3 Specify Metadata; 4 Model Content; 5 Specify Vocabularies; 6 Specify Procedures; 7 Train Staff
Stages and participants: Plan & Prototype (Project Team); Alpha Dev & Test (Stakeholders and SMEs); Beta D&T (Friendly Users); Final D&T (Audiences)
Activities shown, in transcript order:
  Interview core team and stakeholders
  ID sources, spider assets & extract metadata
  Define fields & purpose
  Define content chunks & XML DTDs
  Compile controlled vocabularies
  Start with UI sketches, off-the-shelf rules.
  Manually tag small sample
  Review tagged samples, default procedures
  Gather additional sources, if any
  Revise if needed, bake into alpha CMS
  Revise, use in alpha CMS
  Alpha workflows in CMS
  Use alpha CMS to tag larger sample
  Interview alpha users
  Modify CMS for beta
  Revise, use in beta CMS
  Modify & extend workflows
  Finalize training materials & train staff
  Gather additional sources, if any
  Tailor the default materials
  Use beta CMS to tag larger sample
  Interview beta users
  Modify for 1.0
  Revise using team procedure
  Finalize procedure materials
Planning for Taxonomy Changes
Error Correction: what to do when end-users and tagging staff notice problems?
  Provide for it in the Error Correction Process
  Add Query Log Analysis to help detect user problems
How to answer questions re. things to add, delete, or rearrange in the taxonomy?
  Keep a visible issue log
  Discuss with SMEs, tag samples, use other testing methods
Per-facet changes:
  Corporate reorganizations, product lineup changes, country splits & merges, … will happen.
  Prepare for them when deploying those facets
Long-term: what facets to create, when, and why
  See Taxonomy Roadmap section
Additional Processes: brief remarks on Measurements, ROI, Training, Roadmap
Measuring Metadata and Taxonomy Quality
Taxonomy development is an iterative process
Develop an organizational idea, then test it by tagging sample content
Elicit feedback via walk-throughs and card sorting exercises
Use both qualitative and quantitative methods
  Time, budget, and availability of tagged data will determine what methods are possible.
Taxonomy testing | Qualitative methods
Method | Process | Validation
Walk-throughs | Show and explain | Approach; Consistency to rules; Accuracy (SME checking); Appropriateness to task
Usability Testing | Card sorting, Contextual analysis | Repeatability of user classification; Tasks are completed successfully; Time to complete task is reduced
User Satisfaction | Survey | Reaction to new interface; Reaction to search results
Tagging samples | Tag sample content with taxonomy | Content ‘fit’; Fills out content inventory; Training materials for people & algorithms; Basis for quantitative methods
Include sample pages in walkthroughs, not just the hierarchy.
Tagged Samples
The Taxonomy must fit the content.
How to verify this? Tag samples!
  Spreadsheets are a convenient tool for this.
  URLs, drop-down choosers, text notes all allowed.
Team can review tagged samples when reviewing taxonomy
  More sophisticated teams may test inter-cataloger agreement
Samples should appear in training materials for tagging staff
  Show typical and unusual cases.
Samples are used to define training sets for automatic classifiers.
Metadata Element | Metadata Value
URL | sixbits.atl.frb.org/invoke.cfm?objectid=A01B30D1-10C2-11D6-981100508B104751&method=display
Headline | Innovation Awards
Organization | Federal Reserve Bank of Atlanta
Content Type | Honors & Awards
Subject | Salary & Compensation?
Spreadsheet columns: DOCUMENT URL | FACET A | FACET B | FACET C | FACET D | MISSING IDEAS
Quantitative Method | How evenly does it divide the content?
Background:
  Documents do not distribute uniformly across categories
  Zipf (1/x) distribution is expected behavior
  80/20 rule in action (actually 70/20 rule)
Methodology:
  Part of alpha test of ‘content type’ for corporate intranet
  115 URLs selected at random from search index were manually categorized.
  Inaccessible files and ‘junk’ were removed
Results:
  Results were slightly more uniform than the Zipf distribution, which is better than expected
[Figure: Measured and Expected Distribution of Content Types in an Intranet. X axis: Content Type (People, Groups & Places; News & Events; Manuals & Learning Materials; Operations & Internal Communications; Marketing & Sales; Regulations, Policies, Procedures & (label truncated); Papers & Presentations; Other & Unclassified; Programs, Proposals, Plans & Schedules). Y axis: # Documents (0 to 25). Series: Measured, Expected.]
[Figure: Measured and Expected Distribution of Top 10 Content Types in Library of Congress Database. X axis: Top 10 Content Types (recoverable labels: Congresses; Biography; Periodicals; Maps; Fiction; Exhibitions; Juvenile literature; Bibliography; Statistics). Y axis: Number of Records (0 to 350,000). Series: Series2, Series1.]
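The measured-versus-expected comparison can be sketched as follows. This is a minimal illustration assuming the expected curve is a plain 1/rank Zipf distribution scaled to the same total. The counts below are hypothetical, not the values plotted on the slide.

```python
def zipf_expected(total_docs, n_categories):
    """Expected count per rank if counts fall off as 1/rank (Zipf)."""
    weights = [1.0 / rank for rank in range(1, n_categories + 1)]
    norm = sum(weights)
    return [total_docs * w / norm for w in weights]

def compare_to_zipf(measured_counts):
    """Pair each measured count (largest first) with its Zipf expectation."""
    ranked = sorted(measured_counts, reverse=True)
    expected = zipf_expected(sum(ranked), len(ranked))
    return list(zip(ranked, expected))

# Hypothetical document counts for nine content types
measured = [20, 16, 14, 12, 10, 8, 6, 5, 4]

for rank, (m, e) in enumerate(compare_to_zipf(measured), start=1):
    print(f"rank {rank}: measured {m:3d}, Zipf-expected {e:5.1f}")
```

A top category that falls well below its Zipf expectation, with a fuller tail, indicates a more uniform split than Zipf predicts.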
Quantitative Method | How intuitive (repeatable) are the categorizations?
Methodology: Closed Card Sort
  For alpha test of a grocery site
  15 testers put each of 100 best-selling products into one of 10 pre-defined categories
  Categories where fewer than 14 of 15 testers put a product into the same category were flagged
Results:
% of Testers | Cumulative % of Products
15/15 | 54%
14/15 | 70%
13/15 | 77%
12/15 | 83%
11/15 | 85%
<11/15 | 100%
In the trade, “Corn Tortillas” are a Dairy item!
“Cocoa Drinks – Powder” is best categorized in both “Beverages” and “Grocery”.
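The flagging step of a closed card sort can be sketched like this. The tester assignments below are hypothetical (including the "Bakery" category); only the 14-of-15 threshold comes from the slide.

```python
from collections import Counter

def agreement(assignments):
    """Return (most-chosen category, number of testers who chose it)."""
    category, votes = Counter(assignments).most_common(1)[0]
    return category, votes

def flag_products(sort_results, min_votes):
    """Collect products whose most-chosen category got fewer than min_votes."""
    flagged = {}
    for product, assignments in sort_results.items():
        category, votes = agreement(assignments)
        if votes < min_votes:
            flagged[product] = (category, votes, len(assignments))
    return flagged

# Hypothetical results: one category choice per tester, 15 testers per product
results = {
    "Whole Milk": ["Dairy"] * 15,
    "Corn Tortillas": ["Bakery"] * 9 + ["Dairy"] * 4 + ["Grocery"] * 2,
    "Cocoa Drinks - Powder": ["Beverages"] * 8 + ["Grocery"] * 7,
}

flagged = flag_products(results, min_votes=14)
for product, (category, votes, total) in flagged.items():
    print(f"{product}: top category {category} with {votes}/{total} testers")
```

A near-even split, as in the cocoa example, suggests the product belongs in more than one category.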
Quantitative Method | How does taxonomy “shape” match that of content?
Term Group | % Terms | % Docs
Administrators | 7.8 | 15.8
Community Groups | 2.8 | 1.8
Counselors | 3.4 | 1.4
Federal Funds Recipients and Applicants | 9.5 | 34.4
Librarians | 2.8 | 1.1
News Media | 0.6 | 3.1
Other | 7.3 | 2.0
Parents and Families | 2.8 | 6.0
Policymakers | 4.5 | 11.5
Researchers | 2.2 | 3.6
School Support Staff | 2.2 | 0.2
Student Financial Aid Providers | 1.7 | 0.7
Students | 27.4 | 7.0
Teachers | 25.1 | 11.4
Source: Courtesy Keith Stubbs, U.S. Dept. of Education
Background:
  Hierarchical taxonomies allow comparison of “fit” between content and taxonomy areas
Methodology:
  25,380 resources tagged with taxonomy of 179 terms (avg. of 2 terms per resource)
  Counts of terms and documents summed within taxonomy hierarchy
Results:
  Roughly Zipf distributed (top 20 terms: 79%; top 30 terms: 87%)
  Mismatches between term % and document % flagged
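One way to flag mismatches between term % and document % is sketched below, using the figures from the table. The slide does not say what counted as a mismatch, so the threshold here (document share at least three times, or at most one third of, the term share) is an assumption.

```python
# (% of taxonomy terms, % of tagged documents) per term group, from the table
TERM_GROUPS = {
    "Administrators": (7.8, 15.8),
    "Community Groups": (2.8, 1.8),
    "Counselors": (3.4, 1.4),
    "Federal Funds Recipients and Applicants": (9.5, 34.4),
    "Librarians": (2.8, 1.1),
    "News Media": (0.6, 3.1),
    "Other": (7.3, 2.0),
    "Parents and Families": (2.8, 6.0),
    "Policymakers": (4.5, 11.5),
    "Researchers": (2.2, 3.6),
    "School Support Staff": (2.2, 0.2),
    "Student Financial Aid Providers": (1.7, 0.7),
    "Students": (27.4, 7.0),
    "Teachers": (25.1, 11.4),
}

def flag_mismatches(groups, ratio=3.0):
    """Flag groups whose docs-to-terms ratio is >= ratio or <= 1/ratio."""
    flagged = {}
    for name, (pct_terms, pct_docs) in groups.items():
        r = pct_docs / pct_terms
        if r >= ratio or r <= 1.0 / ratio:
            flagged[name] = round(r, 2)
    return flagged

for name, r in flag_mismatches(TERM_GROUPS).items():
    kind = "under-built (few terms, many docs)" if r > 1 else "over-built (many terms, few docs)"
    print(f"{name}: docs/terms = {r} -> {kind}")
```

A high ratio suggests an area of the taxonomy that may need more terms. A low ratio suggests terms that little content uses.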
Taxonomy ROI
What level of effort in taxonomy creation and maintenance is justified?
Fundamentals of Taxonomy ROI
Building and maintaining a taxonomy, and tagging data with it, are costs, not benefits.
There is no benefit without exposing the tagged data to users in some way that cuts costs or improves revenues.
Putting a new taxonomy into operation requires UI changes and/or backend system changes.
You need to determine those changes, and their costs, as part of the taxonomy ROI.
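The arithmetic behind these points can be sketched as a simple ROI calculation in which the UI and backend changes are counted alongside taxonomy and tagging costs. Every figure and cost category name below is hypothetical and only illustrates the shape of the calculation.

```python
def taxonomy_roi(costs, annual_benefit, years):
    """Simple ROI: (total benefit - total cost) / total cost."""
    total_cost = sum(costs.values())
    total_benefit = annual_benefit * years
    return (total_benefit - total_cost) / total_cost

# Hypothetical costs over the evaluation period, in dollars
costs = {
    "build_taxonomy": 50_000,
    "tag_content": 30_000,
    "ui_and_backend_changes": 40_000,  # required before users see any benefit
    "maintenance": 30_000,
}

# Hypothetical benefit: cost savings or added revenue per year once exposed to users
roi = taxonomy_roi(costs, annual_benefit=100_000, years=3)
print(f"ROI over 3 years: {roi:.0%}")
```

Leaving out the UI and backend line would overstate the return, which is why those changes belong in the estimate.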
Common Taxonomy ROI Scenarios
Catalog site - ROI based on increased sales through
  improved product findability
  product cross-sells and up-sells
  customer loyalty
Call center - ROI based on cutting costs through
  fewer customer calls due to improved website self-service
  faster, more accurate CSR responses through better information access
Knowledge worker productivity - ROI based on cutting costs through
  less time searching for things
  less time recreating existing materials, with knock-on benefits of less confusion and reduced storage and backup costs
Executive mandate
  No ROI at the start, just someone with a vision and the budget to make it happen.
Tagging and Training
How are we going to populate metadata elements with complete and consistent values?
  The tagging problem
How are we going to get people (and/or software) to assign consistent and accurate metadata to the content?
  The tagger training problem
Taxonomy governance: Workflow-driven metadata tagging
[Figure: workflow diagram. Recoverable labels: Compose in Template; Submit to CMS; Automatically fill-in metadata; Approve/Edit metadata; Review content; Problem? (Yes/No); Copy Edit content; Problem? (Yes/No); Hard Copy; Web site. Roles and systems: Analyst, Editor, Copywriter, Tagging Tool, Sys Admin. Callout: Tagging Process Doesn’t Stop Here!]
Training Taxonomy Editors and Tagging Staff
Staff will require training on
  The structure of the taxonomy
  The UI they use to tag the content
  The rules to follow when deciding what codes to apply
  The end-effect of the codes they apply – have a running prototype or QA environment.
Tagging examples come from samples tagged during taxonomy development.
Hardcopies of the taxonomy, and yellow highlighters, are helpful during training.
Indexing rules
Rule | Description
Specificity rule | Apply the most specific terms when tagging assets. Specific terms can always be generalized, but generic terms cannot be specialized.
Repeatable rule | All attributes should be repeatable. Use as many terms as necessary to describe What the asset is about and Why it is important. Storage is cheap. Re-creating content is expensive.
Appropriateness rule | Not all attributes apply to all assets. Only supply values for attributes that make sense.
Usability rule | Anticipate how the asset will be searched for in the future, and how to make it easy to find it. Remember that search engines can only operate on explicit information.
Indexing UI
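The reasoning behind the specificity rule can be illustrated with a small sketch. It assumes a hypothetical taxonomy fragment stored as a child-to-parent map. A specific tag can be rolled up to its ancestors at search time, but a generic tag gives no way to recover the specific term.

```python
# Hypothetical fragment of a subject taxonomy: child term -> parent term
PARENT = {
    "Health Insurance": "Benefits",
    "Retirement Plans": "Benefits",
    "Benefits": "Human Resources",
    "Air Travel": "Travel",
}

def generalize(term, parent=PARENT):
    """Return the term followed by all of its ancestors, most specific first."""
    chain = [term]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def matches(query_term, asset_tags, parent=PARENT):
    """True if any tag on the asset is the query term or a descendant of it."""
    return any(query_term in generalize(tag, parent) for tag in asset_tags)

# An asset tagged specifically is still found by a broader query...
print(matches("Benefits", ["Health Insurance"]))
# ...but an asset tagged generically cannot be found by a narrower query.
print(matches("Health Insurance", ["Benefits"]))
```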
Tagging tool example—Interwoven MetaTagger
Manual form fill-in w/ check boxes, pull-down lists, etc.
Auto keyword & summarization
Auto-categorization
Parse & lookup (recognize names)
Rules & pattern matching
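As a generic illustration of the "rules & pattern matching" approach, the sketch below suggests metadata values from regular-expression rules. It does not use or represent MetaTagger's actual API. The rule set and function name are hypothetical.

```python
import re

# Hypothetical rules: (compiled pattern, metadata element, suggested value)
RULES = [
    (re.compile(r"\baward(s)?\b", re.IGNORECASE), "Content Type", "Honors & Awards"),
    (re.compile(r"\bFederal Reserve Bank of Atlanta\b"), "Organization",
     "Federal Reserve Bank of Atlanta"),
]

def suggest_tags(text, rules=RULES):
    """Return {metadata element: [suggested values]} for every rule that fires."""
    suggestions = {}
    for pattern, element, value in rules:
        if pattern.search(text):
            suggestions.setdefault(element, []).append(value)
    return suggestions

print(suggest_tags("Innovation Awards at the Federal Reserve Bank of Atlanta"))
```

Suggestions produced this way would still go through the approve/edit step in the workflow rather than being applied blindly.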
Taxonomy Roadmap
How to plan for long-term taxonomy development projects?
Taxonomy Roadmap
Most organizations require a phased implementation of an Enterprise Taxonomy
A Taxonomy Roadmap defines the facets to be developed, their timing, and the reasons why
Factors to consider in prioritizing the facets include:
  Immediacy of application – how will the taxonomy be put into use? A Search Engine? Portal Navigation? Other? How long will that take?
  Impact – How many users will a facet help? How big of a help will it be?
  Ease of development – does the vocabulary exist, can it be bought, or must it be developed? How big and complex will it be? How often will it change? Are there tools to help manage taxonomy changes or must those be acquired too?
  What data must be tagged for that? What are the requirements on the metadata’s density and accuracy? Can those be met with automatic methods, or will more extensive human involvement be needed?
  Staff expertise and Team experience.
Roadmap: Dependencies
Roadmap requires that an organization plan its projects well in advance, so that upcoming projects can be influenced by the taxonomy.
  Consequently, this is an advanced practice.
Roadmap prioritizes vocabularies according to benefit, cost, and fit with projects.
Governance Team is responsible for maintaining the Roadmap and the necessary outreach.
Roadmap: Facet Prioritization Matrix
Facet | Description | Impact | Effort to create/maintain CV | Effort to tag
Language* | Languages supported by portal | Medium (High impact for subset) | Done/Low | Low
Format | File format (PDF, doc, html, etc.) | Low | Low/Low | Low
Location* | Geo, region, country, site | Med-High | Done/Low | Medium
Content Type | Also referred to as genre (news, policy, checklist, form, etc.) | Medium | Medium/Low | Medium
Organization | Publishing organization that owns content | Medium | Medium/High | Medium
Subject | Also referred to as topic (benefits, travel, etc.) | High | High/High | Medium
Products & Services | Corporate product and service offerings | Medium | High/High | High
Role (level of responsibility)* | Manager, employee, non-employee | High (In use on portal, but search has limited access to secure content) | Done/Low | High
Access Control | Organization as audience | Low | Medium/High | High
* Facets already in existence in client’s Intranet
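A matrix like this can be turned into a rough ranking. The sketch below assumes a numeric scale for the ratings and a score of impact minus average effort. Neither comes from the slides, and a real roadmap would also weigh fit with upcoming projects.

```python
# Assumed numeric scale for the matrix ratings
LEVEL = {"Done": 0, "Low": 1, "Medium": 2, "Med-High": 2.5, "High": 3}

# (impact, effort to create CV, effort to maintain CV, effort to tag), from the matrix
FACETS = {
    "Language": ("Medium", "Done", "Low", "Low"),
    "Format": ("Low", "Low", "Low", "Low"),
    "Location": ("Med-High", "Done", "Low", "Medium"),
    "Content Type": ("Medium", "Medium", "Low", "Medium"),
    "Organization": ("Medium", "Medium", "High", "Medium"),
    "Subject": ("High", "High", "High", "Medium"),
    "Products & Services": ("Medium", "High", "High", "High"),
    "Role (level of responsibility)": ("High", "Done", "Low", "High"),
    "Access Control": ("Low", "Medium", "High", "High"),
}

def priority(impact, create, maintain, tag):
    """Assumed score: impact minus the average of the three effort ratings."""
    effort = (LEVEL[create] + LEVEL[maintain] + LEVEL[tag]) / 3
    return LEVEL[impact] - effort

ranked = sorted(FACETS, key=lambda name: priority(*FACETS[name]), reverse=True)
for name in ranked:
    print(f"{priority(*FACETS[name]):+.2f}  {name}")
```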
Roadmap: Timeline
[Figure: roadmap timeline. Time axis: FY04Q2, FY04Q3, FY04Q4, FY05Q1, FY05Q2, FY05Q3, FY05Q4. Facet bars with the projects that use them: Language (Search); Organization (Search & Org Chart UI); Location (Country) (Search? Index); Format (Search); Content Type (Search); Location (Region) (Search); Subject (Search & Portal Nav); Products/Services (Search & Index); Access Control; Role (Search?). Other labels: CM? (twice), Index. Related projects: Taxonomy Tool Projects; Auto-Classification Tool.]
Timeline lists the facets to be developed, and when those development efforts start and end.
Timeline shows what projects will make use of the facet, and how long that should take.
Intermediate and related projects are also shown.
Taxonomy Strategies LLC
Contact Info
Ron Daniel, Jr.
925-368-8371
Joseph Busch
415-377-7912