taxonomy and scratchpads
DESCRIPTION
Presentation by Vince Smith and Simon Rycroft given at the Encyclopedia of Life Code Sprint hosted at the BioSynC facility at the Chicago Field Museum.
TRANSCRIPT
Vincent Smith & Simon Rycroft
Taxonomy & Scratchpads
Biological taxonomy
Findability, relationships (ontology) & tagging
• Scale
• Metadata
Scratchpads
http://scratchpads.eu/
• Multi-host Drupal (5) site
• Drupal customized for taxonomists
• Communities apply for a site
• 65+ sites, 750+ users, 130k nodes
• Taxonomy central to many features
Import / Export
Excel Template (CSV file) & uBio (XML feed)
http://www.ubio.org/webservices/classificationbank/search.php?classification=sp2000&node=Pediculus
(Our) Taxonomy Import
http://svn.scratchpads.eu/viewvc/trunk/sites/all/modules/taxonomy_import/
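To make the CSV import route concrete, here is a minimal sketch of turning a flat spreadsheet export into a term hierarchy. The column names (`term`, `parent`, `rank`) and the sample rows are assumptions made for illustration; the slides do not describe the actual layout of the Excel template, and this is not the taxonomy_import module's code.

```python
import csv
import io
from collections import defaultdict

# Hypothetical template layout: one row per term, naming its parent
# (blank parent means the term is a root of the vocabulary).
SAMPLE_CSV = """term,parent,rank
Pediculidae,,family
Pediculus,Pediculidae,genus
Pediculus humanus,Pediculus,species
Pediculus schaeffi,Pediculus,species
"""


def load_terms(csv_text):
    """Read rows into a parent -> [children] map plus a term -> rank map."""
    children = defaultdict(list)
    ranks = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["term"].strip()
        parent = row["parent"].strip() or None
        ranks[name] = row["rank"].strip()
        children[parent].append(name)
    return children, ranks


def walk(children, parent=None, depth=0):
    """Yield (depth, term) pairs in depth-first order, starting at the roots."""
    for name in children.get(parent, []):
        yield depth, name
        yield from walk(children, name, depth + 1)


children, ranks = load_terms(SAMPLE_CSV)
outline = list(walk(children))
for depth, name in outline:
    print("  " * depth + name + " (" + ranks[name] + ")")
```

A parent-name column keeps the spreadsheet easy to edit by hand, at the cost of requiring term names to be unique within the file.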
Management
Taxonomy Manager & Taxonomy Core
http://drupal.org/project/taxonomy_manager
• Principle good, but no one uses it
• Confusing and slow (HCI issues)
• Major cross-browser issues (Firefox)
• Requires a number of “fixes”…
• Flexible metadata on terms (core)
• Treat synonyms as full terms (core)
• Link nodes as term attributes (e.g. biblio)
• Improve manager HCI (drag-and-drop)
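The first three fixes describe a richer term record than a bare name. The sketch below is one possible shape for such a record, not the Drupal schema: the `Term` class, its field names, and the placeholder taxon names ("Examplus novus", "Examplus vetus") are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    """Hypothetical term record; field names are illustrative only."""
    name: str
    parent: Optional["Term"] = None
    # Flexible metadata: arbitrary key/value pairs rather than fixed columns.
    metadata: dict = field(default_factory=dict)
    # A synonym is a full term that points at the accepted term it stands for.
    accepted: Optional["Term"] = None
    # Nodes attached to the term as attributes, e.g. ids of biblio nodes.
    linked_nodes: list = field(default_factory=list)

    def is_synonym(self):
        return self.accepted is not None

    def resolve(self):
        """Return the accepted term for a synonym, or the term itself."""
        return self.accepted if self.accepted is not None else self


# Placeholder names, invented for this example.
genus = Term("Examplus", metadata={"rank": "genus"})
valid = Term("Examplus novus", parent=genus, metadata={"rank": "species"})
old_name = Term("Examplus vetus", parent=genus, accepted=valid,
                metadata={"rank": "species", "status": "synonym"})
valid.linked_nodes.append(101)  # hypothetical id of a bibliography node

print(old_name.resolve().name)
```

Because the synonym is itself a `Term`, it can carry its own metadata and linked nodes, which is the point of treating synonyms as full terms.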
Search & Browse
Navigation for finding tagged content
TinyTax
http://drupal.org/project/tinytax
Automatically creates a mini-menu (block) of a vocabulary that is configurable for a default term
• Intuitive
• Small footprint
• Integrates with a term’s page
TaxTab
http://svn.scratchpads.eu/viewvc/trunk/sites/all/modules/taxtab/
Augments default search with a tab for term searches (includes term autocomplete)
• Quick & intuitive
• Two step submission
• Fast (but could be quicker)
• Encourages tagging
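Term autocomplete can be sketched as a prefix lookup over a sorted list of term names. This is an illustration of the general idea only; the slides do not say how TaxTab implements it, and the function names and sample terms below are assumptions.

```python
from bisect import bisect_left


def build_index(terms):
    """Sort terms case-insensitively, keeping the original spelling alongside."""
    return sorted((t.lower(), t) for t in terms)


def autocomplete(index, prefix, limit=10):
    """Return up to `limit` terms whose names start with `prefix`."""
    key = prefix.lower()
    # Binary search for the first entry that sorts at or after the prefix.
    start = bisect_left(index, (key,))
    matches = []
    for lowered, original in index[start:]:
        if not lowered.startswith(key):
            break  # sorted order: no later entry can match
        matches.append(original)
        if len(matches) == limit:
            break
    return matches


# Sample vocabulary, chosen for illustration.
index = build_index([
    "Pediculus", "Pediculus humanus", "Pthirus", "Pedicinus", "Haematopinus",
])
print(autocomplete(index, "pedic"))
```

The binary search keeps each lookup cheap even for large vocabularies, since only the matching run of the sorted list is scanned.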
Autotagging
Automated tagging of content
Untagged node
Use or ignore discovered tags (drag & drop or add)
http://drupal.org/project/autotag
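The discovery step can be pictured as scanning a node's text for names that already exist in the vocabulary, then letting the author keep or drop each suggestion. The sketch below assumes simple whole-word, case-insensitive matching; the autotag module's actual matching rules are not described in the slides, and `suggest_tags` is a hypothetical name.

```python
import re


def suggest_tags(text, vocabulary):
    """Return the vocabulary terms that occur as whole words in the text."""
    found = []
    for term in vocabulary:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


# Sample node text and vocabulary, chosen for illustration.
node_text = ("Specimens of Pediculus humanus were collected "
             "alongside Pthirus pubis.")
vocabulary = ["Pediculus", "Pediculus humanus", "Pthirus pubis", "Haematopinus"]

suggestions = suggest_tags(node_text, vocabulary)

# The author then uses or ignores each discovered tag.
ignored = {"Pediculus"}
applied = [t for t in suggestions if t not in ignored]
print(applied)
```

Note that the genus name is suggested as well as the species name that contains it, which is why a use-or-ignore step is helpful.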
Mega-Vocabularies
Sites with a million plus terms
e.g. http://catlife.myspecies.info (2 million+ terms)
Current Taxonomy Problems
• PHP requires too much memory for large hierarchies
• Very slow, especially above 50k terms
• Some sites with 300k terms (unusable)
• 1.8 million known species (6-80M est.)
Possible Solution
http://drupal.org/project/leftandright
• Taxonomy LeftandRight module
• Implements nested sets
• Overrides 3 taxonomy core functions
- taxonomy_get_tree
- taxonomy_overview_terms
- taxonomy_select_nodes
• Very fast (in use with 2 million terms)
• Solves insertion problem with decimals
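In a nested set, every term stores a left and a right bound, and a term's descendants are exactly those whose bounds fall inside its own, so a subtree can be fetched with one range comparison instead of recursive lookups. The sketch below illustrates that idea and one reading of "insertion with decimals": placing a new term at fractional bounds inside its parent's remaining gap so that no existing term is renumbered. It is not the LeftandRight module's code; the function names, the gap-splitting rule, and the sample tree are assumptions.

```python
from fractions import Fraction


def number_tree(children, root):
    """Assign (left, right) bounds to every term by depth-first traversal."""
    bounds = {}
    counter = 0

    def visit(name):
        nonlocal counter
        counter += 1
        left = counter
        for child in children.get(name, []):
            visit(child)
        counter += 1
        bounds[name] = (Fraction(left), Fraction(counter))

    visit(root)
    return bounds


def descendants(bounds, name):
    """All terms nested inside `name`, ordered by left bound."""
    left, right = bounds[name]
    inside = [n for n, (l, r) in bounds.items() if left < l and r < right]
    return sorted(inside, key=lambda n: bounds[n][0])


def insert_leaf(bounds, parent, name):
    """Add a leaf as the last child of `parent` without renumbering anything."""
    p_left, p_right = bounds[parent]
    # Largest bound already used inside the parent (its own left if childless).
    inner = max((r for l, r in bounds.values() if p_left < l and r < p_right),
                default=p_left)
    gap = p_right - inner
    # Assumed rule: place the new term in the middle third of the free gap.
    bounds[name] = (inner + gap / 3, inner + 2 * gap / 3)


# Sample tree, chosen for illustration.
children = {
    "Anoplura": ["Pediculidae", "Pthiridae"],
    "Pediculidae": ["Pediculus"],
    "Pthiridae": ["Pthirus"],
}
bounds = number_tree(children, "Anoplura")
insert_leaf(bounds, "Pediculus", "Pediculus humanus")
print(descendants(bounds, "Pediculidae"))
```

With integer bounds, the same insertion would shift the bounds of every term to the right of the new one, which is costly at the scale of millions of terms. Exact fractions are used here to keep the arithmetic precise; a database would need a numeric column type with enough precision for repeated insertions.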
Sprint Expectations
What we are looking to achieve
• Import and export of terms (TCS-XML?) from a repository
• Improved & flexible term metadata
• Handle synonyms as full terms
• Link nodes as attributes of terms
• Term and metadata management
• Permissions on terms (low priority?)
Questions?
Search & Browse 2
Split Layout TreeMaps
e.g. http://scratchpads.eu/progress
• Intuitive
• Small footprint
• Integrates with a term’s page
• Potentially integrates multi-site content