Crowd Sourcing and Community Management Capabilities Available within Symbiota Data Portals. Nico Franz¹, Corinna Gries², Thomas Nash III² & Edward Gilbert¹. ¹ School of Life Sciences, Arizona State University; ² Center for Limnology, University of Wisconsin. TDWG 2013 Annual Conference, Florence, Italy. Building and Maintaining Crowd Sourcing Websites and Their Communities. October 29, 2013. Presentation Overview @ http://taxonbytes.org/tdwg-2013-crowd-sourcing-and-community-management-capabilities-with-symbiota/

Upload: taxonbytes

Post on 10-May-2015


DESCRIPTION

Symbiota (http://symbiota.org/tiki/tiki-index.php) is open source software designed to promote and facilitate collaboration among those working to document biodiversity. Symbiota has become increasingly popular in North America in recent years, due in part to its suitability for supporting large herbarium networks and NSF-sponsored Thematic Collections Networks (TCNs; see https://www.idigbio.org/content/thematic-collectionsnetworks). The specimen-based Content Management System (CMS) provides a shared platform that allows researchers to manage biological resources as an integrated network. Data management through a community-based system has allowed for the development of several features and workflows that make data entry more efficient while improving overall data integrity and quality. On-line data entry directly from an image of the specimen label allows for label transcription and error resolution that can call upon a global user community. A novel crowd sourcing feature in Symbiota offers collection managers the ability to submit specimen label images to a queue for group data entry by a volunteer task force. To improve efficiency and quality, the user interface incorporates Optical Character Recognition (OCR) and Natural Language Processing (NLP) capabilities, as well as duplicate and exsiccati record harvesting and real-time data validation. The duplicate clustering module groups duplicate specimen records across institutions, thereby obviating the need to re-enter a previously processed specimen and making it easier to locate and resolve misidentified specimens, viz. by highlighting the most recent annotation events within a cluster. As an additional review step, collections can opt to allow registered users to fix basic errors if and when they encounter them. Collection managers have the ability to review, approve, or revert such edits.
Several other novel community features are available through Symbiota, including an integrated loan management module and pre-accessioned data entry by the original collector. We will demonstrate and discuss these features, their underlying concepts, implementation, utility, and future steps to further augment the community of contributing users.
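The duplicate clustering idea described above can be illustrated with a minimal sketch. It is not Symbiota's implementation: the matching rule (same collector and collector number), the `Specimen` fields, and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Specimen:
    institution: str
    collector: str
    collector_number: str
    scientific_name: str
    last_annotated: date


def cluster_key(specimen):
    # Hypothetical matching rule: normalized collector name plus collector number.
    return (specimen.collector.strip().lower(),
            specimen.collector_number.strip().lower())


def cluster_duplicates(specimens):
    """Group records that share a key; sort each cluster newest annotation first."""
    groups = defaultdict(list)
    for specimen in specimens:
        groups[cluster_key(specimen)].append(specimen)
    return {
        key: sorted(group, key=lambda s: s.last_annotated, reverse=True)
        for key, group in groups.items()
        if len(group) > 1  # a single record is not a duplicate cluster
    }


# Invented sample records held by two hypothetical herbaria.
records = [
    Specimen("HERB-A", "A. Collector", "1234", "Genus alpha", date(1998, 5, 1)),
    Specimen("HERB-B", "A. Collector ", "1234", "Genus beta", date(2011, 3, 9)),
    Specimen("HERB-A", "A. Collector", "9999", "Genus gamma", date(2005, 7, 2)),
]
clusters = cluster_duplicates(records)
latest = clusters[("a. collector", "1234")][0]
```

Sorting each cluster by annotation date puts the most recent determination first, which is the record a reviewer would check when two institutions disagree on an identification.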

TRANSCRIPT

Page 1: Franz Et Al. Crowd Sourcing and Community Management Capabilities Available within Symbiota Data Portals

Crowd Sourcing and Community Management Capabilities Available within Symbiota Data Portals

Nico Franz¹, Corinna Gries², Thomas Nash III² & Edward Gilbert¹

¹ School of Life Sciences, Arizona State University

² Center for Limnology, University of Wisconsin

TDWG 2013 Annual Conference, Florence, Italy

Building and Maintaining Crowd Sourcing Websites and Their Communities

October 29, 2013

Presentation Overview @ http://taxonbytes.org/tdwg-2013-crowd-sourcing-and-community-management-capabilities-with-symbiota/

Page 2

Lichens, Bryophytes and Climate Change (LBCC) – scope and goals

• Support – NSF ADBC Program – Award EF-1115116

• Covering ~2.3 million specimens: 900,000 lichens & 1.4 million bryophytes

• 90% of all specimens housed in this region; > 60 non-governmental herbaria

• LBCC has 16 focused digitization centers where voucher labels are imaged

[Map legend: markers = LBCC imaging centers]

Page 3

LBCC sustaining Symbiota-based data portals

• Consortium of North American Lichen Herbaria (CNALH)

  • URL: http://lichenportal.org/portal/

  • Currently with 51 member collections

  • Total of 1,084,888 records (October 2013)

• Consortium of North American Bryophyte Herbaria (CNABH)

  • URL: http://bryophyteportal/

  • Currently with 46 member collections

  • Total of 1,437,735 records (October 2013)

• Each portal is sustained by Symbiota

Page 4

Lichen portal – 7,302 visitors / 3 months

Bryophyte portal – 1,530 visitors / 3 months

LBCC member portals are active virtual environments

Page 5

Overview of the LBCC digitization workflow (label imaging)

• The LBCC digitization workflow (completion of records) depends critically on the sustained participation of editors/transcribers, and of volunteers.

Page 6

Detailed workflow diagram

[Figure: LBCC digitization workflow. Box labels recovered from the diagram, in extraction order:]

• Imaging Stage

• Capture Image: barcode in file name

• Create Skeleton File: species name, country, state, exsiccati, etc.

• Upload to FTP server

• Image processing: extract barcode, create web versions, map to portal DBs

• Herbarium Database

• Automated OCR: Tesseract, ABBYY

• Existing Record: simply link image

• Upload to FTP server

• Image URLs

• Manage Specimen Data in Portal

• Manage / Review Records in Portal

• Symbiota Editor: review, edit, keystroke

• Create New Record: barcode, image, skeletal data

• Automated NLP: Darwin Core Parsing

• Crowdsourcing Central

[Legend: LBCC Crowdsourcing elements; workflow "closes" in home collection]
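The image-processing step in the diagram (barcode captured in the file name, then either linking the image to an existing record or creating a new skeletal record) might be sketched as follows. The file naming convention, the function names, and the sample barcodes are assumptions for illustration only; the slides do not specify them.

```python
import re
from pathlib import Path

# Hypothetical convention: uppercase institution code, optional "-" or "_",
# then digits, at the start of the file name (e.g. "ASU0012345.jpg").
BARCODE_PATTERN = re.compile(r"^([A-Z]+)[-_]?(\d+)")


def barcode_from_filename(filename):
    """Return the barcode embedded in an image file name, or None."""
    stem = Path(filename).stem
    match = BARCODE_PATTERN.match(stem)
    if match is None:
        return None
    return f"{match.group(1)}{match.group(2)}"


def route_image(filename, existing_barcodes):
    """Decide whether an image links to an existing record or needs a new one."""
    barcode = barcode_from_filename(filename)
    if barcode is None:
        return ("unmatched", None)  # needs manual attention
    action = "link" if barcode in existing_barcodes else "create"
    return (action, barcode)


known = {"ASU0012345"}  # invented barcode already present in the database
linked = route_image("/incoming/ASU0012345.jpg", known)
created = route_image("ASU-0099999_lg.JPG", known)
skipped = route_image("label.jpg", known)
```

Returning an explicit "unmatched" outcome keeps files with unreadable names out of both branches, so they can be reviewed by hand instead of silently creating empty records.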

Page 7

How does LBCC engage volunteers?

Page 8

• The LBCC Drupal website is primarily a means to engage prospective volunteers socially and intellectually, and to recruit them.

http://lbcc1.acis.ufl.edu/

Page 9

Login to each member portal is simple, requiring no special rights.

Page 10

"Annotate the Harriman Alaska Expedition"

Page 11

"Transcribe the ALCAN Expedition"

"Create your own…" ** Instant feedback on data volume.

Page 12

Listing of pending "Harriman" records; each Symbiota ID is clickable to edit.

Page 13

• The LBCC digitization workflow pipeline has produced a "skeletal record", including:

  • Record GUID

  • Thesaurus-ratified Scientific Name (not editable)

  • OCR of voucher locality label image

• "Parse OCR (LBCC)" [a custom LBCC program] will get the transcription process underway.

[Screenshot: LBCC Crowd Sourcing Central, Record 1545184]
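As a rough illustration of the skeletal record described on this slide, the sketch below models a record whose scientific name cannot be changed by a transcriber. The class, field names, and placeholder values are hypothetical and do not reflect Symbiota's actual schema; locking the GUID as well is an added assumption.

```python
from dataclasses import dataclass, field


@dataclass
class SkeletalRecord:
    guid: str             # record GUID supplied by the pipeline
    scientific_name: str  # thesaurus-ratified; not editable by transcribers
    ocr_text: str         # raw OCR of the voucher label image
    transcription: dict = field(default_factory=dict)

    def set_field(self, name, value):
        # Assumption: the GUID is locked in the same way as the scientific name.
        if name in ("guid", "scientific_name"):
            raise PermissionError(f"{name} is not editable")
        self.transcription[name] = value


record = SkeletalRecord(
    guid="urn:uuid:00000000-0000-0000-0000-000000000000",  # placeholder GUID
    scientific_name="Genus species",                        # placeholder name
    ocr_text="Arizona. Maricopa Co. ...",                   # invented OCR snippet
)
record.set_field("stateProvince", "Arizona")
```

Keeping the transcribed values in a separate mapping leaves the pipeline-supplied fields untouched, so a reviewer can always compare the volunteer's entry against the original OCR text.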

Page 14

1. Initial "Parse OCR" outcome (issues with lat/long transcription)

Page 15

2. Correction of the parse using Symbiota tools (e.g. GeoLocate)

Page 16

3. Approaching a clean record¹ transcription, ready for saving

¹ DwC Class: CleanRecord – Utter these two words in front of a TDWG audience, then immediately prepare to… [remainder not yet ratified].
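The lat/long issues noted in step 1 are typical of OCR output, where the degree sign is often misread as a letter or symbol. The sketch below is not the "Parse OCR (LBCC)" program; it is a minimal, assumed example of tolerant degrees-minutes-seconds parsing and conversion to decimal degrees, using the standard formula decimal = degrees + minutes/60 + seconds/3600 (negative for S and W). The sample label strings are invented.

```python
import re

DMS = re.compile(
    r"""(\d{1,3})\s*[°ºo*]\s*                # degrees; tolerate OCR stand-ins for the degree sign
        (\d{1,2})\s*['’′]\s*                 # minutes
        (?:(\d{1,2}(?:\.\d+)?)\s*["”″]\s*)?  # optional seconds
        ([NSEW])                             # hemisphere letter
    """,
    re.VERBOSE,
)


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in "SW" else value


def parse_coordinates(text):
    """Return every DMS coordinate found in the text as decimal degrees."""
    return [
        round(dms_to_decimal(int(d), int(m), float(s or 0), h), 5)
        for d, m, s, h in DMS.findall(text)
    ]


clean = parse_coordinates("33°25'30\"N, 111°56'00\"W")
garbled = parse_coordinates("33o25'N")  # degree sign misread as the letter "o"
```

A tolerant pattern like this can also produce false matches in free text, which is one reason a human correction step with tools such as GeoLocate remains part of the workflow.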

Page 17

Crowd Sourcing Central – Score Board *

Options to review one's submitted records and review points assigned (by the collection's manager).

* See also Appendix I.

Page 18

Crowd Sourcing Central – User's Review Pages *

My pending records with LBCC.

My 2/4 approved records/points.

* See also Appendix III.

Page 19

Crowd Sourcing Central – Collection Manager's Control Panel *

* See also Appendix II.

4,215 newly digitized records are available for addition to the queue.

25 submissions pending.

Page 20

Crowd Sourcing Central – Collection Manager's Review Pages *

* See also Appendix III.

2 points = default score. Specific feedback possible.

Page 21

Lichens, Bryophytes and Climate Change – CS in review

• A key purpose of the LBCC portal CS entry environment is to create a user experience that is personalized.

• Special expeditions are a subset of the records queue for CS data entry, and are identified as being part of a "special group/theme" of specimens.

• Expeditions are meant to educate those who are performing the data entry about a specific event.

• They also aid data entry because the user generally deals with a homogeneous type of label format, as opposed to shifting between numerous layout types.

• User input and managerial control (review, feedback, scoring) are interactively facilitated in the same Crowd Sourcing module implemented in Symbiota.

Page 22

Acknowledgments

• TDWG 2013 Symposium organizers – Paul Kenneth Flemons

• Ben Brandt & John Brinda – LBCC software development

• Participating CNALH & CNABH collections

• NSF Award EF-1115116. "Digitization TCN – Collaborative Research: North American Lichens and Bryophytes: Sensitive Indicators of Environmental Quality and Change."

http://taxonbytes.org

https://sols.asu.edu

http://symbiota.org/tiki/tiki-index.php

http://lbcc1.acis.ufl.edu/

http://lichenportal.org/portal/

http://bryophyteportal/

Page 23

Appendix I: Crowd Sourcing Central "rules of engagement"

1. Available at http://lichenportal.org/portal/collections/editor/crowdsource/central.php

2. Shows scores and collections participating in crowd sourcing, along with their statistics.

3. Available to all viewers of the site, irrespective of whether they are logged in.

4. If users log in, their scores are displayed in a separate information "box".

5. On most portal sites, the link above will generally be added to the main left menu, or made available from another crowd sourcing page that is custom generated for a project. For instance, LBCC will likely link to this page from their main Drupal page.

6. Clicking on "review records" within the Current User's Standing box will take the user to the Review page (see Appendix III).

7. Clicking on numbers within the collection table will take the user to a list of specimens queued up for data entry (and open specimens within the CS queue).

Page 24

Appendix II: Collection Manager's Crowdsourcing Control Panel

1. Available at http://lichenportal.org/portal/collections/editor/crowdsource/controlpanel.php?collid=22

2. Available only to collection managers.

3. Shows statistics only for a given collection.

4. Available also from the collections control panel (not yet implemented in the public site) and in Crowd Sourcing Central (via the editing symbol to the right of the collection names).

5. Allows managers to edit crowd sourcing instructions or link to a training URL.

6. A link to the right of "Available to Add" is where a collection manager would add their records to the Crowdsourcing queue.

Page 25

Appendix III: Review Pages for contributors, managers

1. Available from a collection manager's perspective or a user's perspective, yet behaves somewhat differently depending on the perspective.

2. Collection manager perspective:

  1. Main purpose is to enable a quick review of specimen records that are pending (or re-review of closed records).

  2. Available from the Collection Manager’s Crowdsourcing Control Panel by clicking on the "Review" link to the right of the numbers.

  3. A collection manager can assign points to an annotated record (2 points is the default value), comment, and change the CS status to closed (approved).

  4. Managers can edit all records, whether they are pending or closed.

3. User perspective:

  1. Available from Crowd Sourcing Central by clicking on "Review Records" within the Current User's Standing box.

  2. Allows user to review and access records with pending status.

  3. Allows user to review points and provide comments for closed records.

  4. Users can edit all pending records.

  5. Users can review yet not edit records that have been closed by a collection manager.
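The review rules listed in this appendix amount to a small permission model, which can be sketched as follows. The names (`Role`, `Status`, `Submission`, `can_edit`, `approve`) are hypothetical and are not Symbiota's code; only the rules themselves (default of 2 points, managers edit everything, users edit pending records only) come from the slides. Record ownership is ignored for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    CLOSED = "closed"  # closed means approved by a collection manager


class Role(Enum):
    USER = "user"
    MANAGER = "manager"


DEFAULT_POINTS = 2  # default score stated in the slides


@dataclass
class Submission:
    record_id: int
    status: Status = Status.PENDING
    points: int = 0
    comment: str = ""


def can_edit(role, submission):
    """Managers may edit any record; users may edit pending records only."""
    if role is Role.MANAGER:
        return True
    return submission.status is Status.PENDING


def approve(submission, points=DEFAULT_POINTS, comment=""):
    """Manager action: assign points, optionally comment, and close the record."""
    submission.points = points
    submission.comment = comment
    submission.status = Status.CLOSED
    return submission


submission = Submission(record_id=1545184)
editable_before = can_edit(Role.USER, submission)
approve(submission, comment="Good transcription")
editable_after = can_edit(Role.USER, submission)
```

Once a record is closed, the user can still read its points and comments but `can_edit` returns False, while a manager retains edit access for re-review.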