TRANSCRIPT
Taxonomy Tools Requirements and Capabilities
Joseph A Busch, Project Performance Corporation
Zachary R Wahl, Project Performance Corporation
Tools
• Taxonomy editing – Data Harmony, Mondeca, MultiTes, PoolParty, Protégé, SmartLogic, Synaptica, TopBraid Composer, Wordmap
• Metadata tagging (automated categorization) – CIS, ConceptSearching, Data Harmony, MetaTagger, nStein, SmartLogic, Temis
• Content management – Documentum, Drupal, FatWire, Interwoven, Joomla!, OpenText, SharePoint
Typology of taxonomy tool functions
Functional area: Functions
Taxonomy Development: Create a taxonomy; User roles and permissions
Taxonomy Maintenance: Add, edit, move, delete items; Assign or modify privileges to one or a group of items; Activity logging
Taxonomy Governance: Approval workflow for additions and changes
Metadata Controlled Vocabulary: Assign attributes to a category; Associate controlled vocabulary with metadata field; Thesaurus capabilities
User Interface: Search and browse; Drag and drop; Multiple windows
Reporting: Alphabetical, hierarchical and other views; Visualizations; Importing and exporting taxonomies
Application Integration: APIs (WSDL, Scripts, Java, etc.); Application integration (CMS, DMS, search engine, etc.)
Normal taxonomy editor functionality requirements
[Figure: editor functions arranged in tiers labeled Basic, Midrange and Advanced]
• Hierarchy Browser
• Term Editing
• Standard and custom fields & attributes
• Standard and custom relations
• Data typing and restrictions
• Consistency enforcement
• Flexible reporting (exporting)
• Flexible importing
• UNICODE
• Multiple vocabulary support
• Inter-vocabulary relations
• Unique IDs/URIs: externally supplied IDs are not sufficient
• Workflow
• Voting
• Change Request Mgmt.
• Stylistic rules enforcement
• Programmability
Additional functionality for taxonomy editing software
Aliases – Need to deal with synonyms, but also with alternative labels based on language or other factors.
Notes – Useful to have several types of notes fields to keep public notes separate from team’s working notes.
Effective dates – Enable the determination of what was the ‘valid’ taxonomy on dates in the past. Part of a set of strong requirements on provenance.
URIs – Must be able to support the semantic web.
Poly-hierarchy – Must be able to support multiple parents (as well as the same string with different meanings).
Inter-category relations – Must be able to provide links that don’t follow hierarchy, and even go between vocabularies.
Rules checking – Check conformance to style rules like length, use of ampersand, etc.
Workflow – Tracking the handling of change requests, as well as the process of getting approvals for edits.
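Several of the requirements above (aliases, effective dates, URIs, poly-hierarchy and rules checking) can be pictured with a small data model. The following is a minimal sketch only, not how any of the listed products store terms: the `Term` class, its field names, the `example.org` URIs and the 40-character limit are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Term:
    """Hypothetical term record; field names are illustrative only."""
    uri: str                                      # unique ID usable on the semantic web
    label: str                                    # preferred label
    aliases: dict = field(default_factory=dict)   # language code -> alternative labels
    parents: list = field(default_factory=list)   # more than one parent = poly-hierarchy
    valid_from: date = date.min
    valid_to: Optional[date] = None               # None means the term is still valid

    def valid_on(self, day: date) -> bool:
        """True if the term was part of the taxonomy on the given day."""
        return self.valid_from <= day and (self.valid_to is None or day <= self.valid_to)


def style_violations(label: str, max_length: int = 40) -> list:
    """Check a label against two example style rules: length and ampersand use."""
    problems = []
    if len(label) > max_length:
        problems.append(f"longer than {max_length} characters")
    if "&" in label:
        problems.append("contains an ampersand")
    return problems


# Made-up sample data: an old term that was later replaced by a restyled one.
terms = [
    Term("http://example.org/t/1", "Research & Development",
         aliases={"en": ["R&D"]},
         parents=["http://example.org/t/10", "http://example.org/t/20"],
         valid_from=date(2009, 1, 1), valid_to=date(2010, 12, 31)),
    Term("http://example.org/t/2", "Research and Development",
         parents=["http://example.org/t/10"],
         valid_from=date(2011, 1, 1)),
]

# Which terms made up the 'valid' taxonomy on a past date?
valid_2010 = [t.label for t in terms if t.valid_on(date(2010, 6, 1))]

# Which labels break the style rules?
report = {t.label: style_violations(t.label) for t in terms}
```

Keeping the validity dates on the term itself is just one possible design; a tool could equally keep a separate change history and reconstruct past states from it.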
Scenarios to evaluate taxonomy tools
Functional area: Functions
Database Definition: How is the database created? Where is it stored? Is it Z39.19 and ISO 2788 compliant? Database license requirement?
Importing/Exporting Data: How are data imported? What file formats are supported? Can data files be imported in batches?
Add, Edit, Delete Categories: How easily are categories added, edited, or deleted? Can categories be added, edited, or deleted in batches?
Relationship Types: How are relationship types defined? What types are supported? How is polyhierarchy handled?
Add, Edit, Delete Relationships: How easily are relationships added, edited, or deleted? Can relationships be added, edited, or deleted in batches? Does a change propagate to all instances?
Reporting: How does the taxonomy management system (TMS) report new, edited, and deleted taxonomies and categories; new, edited, and deleted relationship types and relationships; and mapped taxonomies and categories? How are the reports presented? What audit logs are available? Can changes be traced to users who suggested them? Is an “approval” step for changes available for administrators?
User Access: Can the TMS integrate user accounts with existing authentication systems, e.g. Active Directory? Is there support for role-based access or defined group membership with configurable access? Is there a workflow to approve changes? What functionality is available or restricted based on a user’s security privileges?
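The reporting and user access questions above (audit logs, tracing changes to the users who suggested them, an approval step, role-based restrictions) describe behavior that can be sketched in a few lines. This is a hedged illustration, not the workflow of any evaluated tool: the role names, the `PERMISSIONS` table and the `ChangeRequest` class are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical roles and the actions each one may perform.
PERMISSIONS = {
    "contributor": {"suggest"},
    "editor": {"suggest", "approve", "reject"},
}


@dataclass
class ChangeRequest:
    term: str
    change: str
    suggested_by: str                         # lets a change be traced to its author
    status: str = "pending"
    log: list = field(default_factory=list)   # audit trail of (time, user, action)

    def record(self, user: str, action: str) -> None:
        self.log.append((datetime.now(timezone.utc), user, action))


def suggest(term: str, change: str, user: str, role: str) -> ChangeRequest:
    """Create a pending change request if the role allows suggestions."""
    if "suggest" not in PERMISSIONS.get(role, set()):
        raise PermissionError(f"{role} may not suggest changes")
    request = ChangeRequest(term, change, suggested_by=user)
    request.record(user, "suggested")
    return request


def review(request: ChangeRequest, user: str, role: str, approve: bool = True) -> ChangeRequest:
    """Approve or reject a request; only roles with that permission may do so."""
    action = "approve" if approve else "reject"
    if action not in PERMISSIONS.get(role, set()):
        raise PermissionError(f"{role} may not {action} changes")
    request.status = "approved" if approve else "rejected"
    request.record(user, request.status)
    return request


# Made-up example: a contributor suggests a rename and an editor approves it.
request = suggest("Widgets", "rename to 'Widgets and Gadgets'", user="alice", role="contributor")
review(request, user="bob", role="editor")
```

When evaluating a tool, the equivalent checks are whether the log entries and the approval step exist out of the box, and whether roles can be mapped to accounts in an existing directory.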
MultiTes search, browse and edit term UI
Search
Browse Alpha List
Edit Term
MultiTes add relationships UI
Add Relationships
Edit Term
MultiTes strength is its report writer
Alphabetical Report
Hierarchy (Top term) Report
MultiTes ratings
Functional area MultiTes
Database Definition 5
Importing Data 3
Add, Edit, Delete Categories 2
Relationship Types 4
Add, Edit, Delete Relationships 2
Reporting 1
Exporting Data 3
User Access 0
Total Score 20
Average Score 2.5
Strengths
• Widely used.
• Inexpensive.
• Flexible report writer for exporting data.
Synaptica term record
Term
Description/Scope Note
Parent
Children
Variations
Synaptica tree view
Term
Tree
Description/Scope Note
Synaptica visualizations
Word Map
Radial Map
Synaptica ratings
Functional area Synaptica
Database Definition 5
Importing Data 4
Add, Edit, Delete Categories 5
Relationship Types 4
Add, Edit, Delete Relationships 5
Reporting 5
Exporting Data 5
User Access 5
Total Score 38
Average Score 4.75
Strengths
• Proven performance with very large data sets.
TopBraid term record
Term
Relationships
Hierarchy
TopBraid edit term record
TopBraid visualization
Graph View
TopBraid ratings
Functional area TopBraid
Database Definition 5
Importing Data 5
Add, Edit, Delete Categories 5
Relationship Types 5
Add, Edit, Delete Relationships 5
Reporting 5
Exporting Data 5
User Access 5
Total Score 40
Average Score 5
Strengths
• XML RDF under the hood
SmartLogic term record
Term
Relationships
Hierarchy
SmartLogic term edit options
Add non-preferred terms
Add hierarchical relationships
SmartLogic visualization
Graph View
SmartLogic ratings
Functional area SmartLogic
Database Definition 5
Importing Data 4
Add, Edit, Delete Categories 5
Relationship Types 4
Add, Edit, Delete Relationships 5
Reporting 5
Exporting Data 5
User Access 4
Total Score 37
Average Score 4.63
Strengths
• Integrations with CMS (SharePoint, Documentum) and search engines (FAST, Google)
Summary of taxonomy tool ratings
Functional area MultiTes SmartLogic Synaptica TopBraid
Database Definition 5 5 5 5
Importing Data 3 4 4 5
Add, Edit, Delete Categories 2 5 5 5
Relationship Types 4 4 4 5
Add, Edit, Delete Relationships 2 5 5 5
Reporting 1 5 5 5
Exporting Data 3 5 5 5
User Access 0 4 5 5
Total Score 20 37 38 40
Average Score 2.5 4.63 4.75 5
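The totals and averages in the summary can be recomputed from the per-area ratings. The sketch below copies the scores from the table and assumes, as the figures suggest, that the average is a plain unweighted mean over the eight functional areas; the variable names are illustrative.

```python
# Functional areas in the order used by the summary table.
areas = [
    "Database Definition", "Importing Data", "Add, Edit, Delete Categories",
    "Relationship Types", "Add, Edit, Delete Relationships", "Reporting",
    "Exporting Data", "User Access",
]

# Ratings per tool, one score per functional area, copied from the table.
ratings = {
    "MultiTes":   [5, 3, 2, 4, 2, 1, 3, 0],
    "SmartLogic": [5, 4, 5, 4, 5, 5, 5, 4],
    "Synaptica":  [5, 4, 5, 4, 5, 5, 5, 5],
    "TopBraid":   [5, 5, 5, 5, 5, 5, 5, 5],
}

# Every tool must have exactly one score per area.
for tool, scores in ratings.items():
    assert len(scores) == len(areas), tool

totals = {tool: sum(scores) for tool, scores in ratings.items()}
averages = {tool: totals[tool] / len(areas) for tool in ratings}

for tool in ratings:
    print(f"{tool}: total {totals[tool]}, average {averages[tool]}")
```

The recomputed totals match the table. The SmartLogic mean is exactly 4.625, which the table shows rounded to 4.63.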
Taxonomy editing tools vendors
[Figure: vendor quadrant chart. Axes: Ability to Execute (low to high) and Completeness of Vision. Quadrant labels shown: Visionaries, Niche Players.]
Most popular taxonomy editor is MS Excel
An immature area: no vendors are in the upper-right quadrant!
MultiTes is widely used, cheap with functionality
High functionality / high cost products ($50-100K)
Taxonomy Tools
Vendor Taxonomy Editing Tools URL
Apelon Distributed Terminology System (DTS) www.apelon.com/Products/DTS/tabid/97/Default.aspx
Cuadra STAR/Thesaurus www.cuadra.com/products/vocabulary.html
Thesaurus Master www.dataharmony.com/products/thesaurus_master.html
MS Excel www.microsoft.com
Intelligent Topic Manager www.mondeca.com/Products/ITM
MultiTes Pro www.multites.com
PoolParty Thesaurus Manager poolparty.punkt.at/poolparty-thesaurus-manager-3-0-release-notes/
Protégé protege.stanford.edu
Semaphore Ontology Manager www.smartlogic.com/home/products/semaphore-modules/ontology-manager/ontology-manager-overview
Synaptica www.synapticasoftware.com
SAS Ontology Management www.sas.com/text-analytics/ontology-management/index.html
Temis Luxid www.temis.com/?id=201&selt=1
Term Tree www.termtree.com.au
TopBraid Composer www.topquadrant.com/products/TB_Composer.html
WordMap Designer www.wordmap.com
QUESTIONS Thank You
For More Information:
Zach Wahl
Email: [email protected]
Twitter: twitter.com/#!/ppc_corp, twitter.com/#!/ZacharyWahl
Joseph Busch
Email: [email protected]
Twitter: twitter.com/joebusch
Blog: blog.ppc.com
Web: www.ppc.com/Pages/KMWorld2011.aspx