helpingscience
Presentation given at iDigBio
Developed by
SilverBiology
Michael Giddens
LABEL PROCESSING METHOD
HELPINGSCIENCE.ORG
From This To This
• StateProvince: Arkansas
• County: Bradley
• Genus: Botrychium
• SpecificEpithet: biternatum
• Authorship: (Sav.) Underwood
• Collector: Sherri Leslie, Kaylon Cornish
• CollectorNumber: 593
• DateCollected: 1984-09-23
• TRS: Sec. 3, T12S, R9W
GOAL (REPEAT 60+ MILLION TIMES)
Species Lookup: http://ecat-dev.gbif.org/usage/2650191
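The target record on this slide can be sketched as a plain mapping from field names to values. This is a minimal illustration, not the HelpingScience implementation: the `parse_trs` helper and its regular expression are assumptions introduced here, and they only cover the `Sec. 3, T12S, R9W` form shown in the example.

```python
import re

# Field names and values copied from the example label on the slide.
record = {
    "StateProvince": "Arkansas",
    "County": "Bradley",
    "Genus": "Botrychium",
    "SpecificEpithet": "biternatum",
    "Authorship": "(Sav.) Underwood",
    "Collector": "Sherri Leslie, Kaylon Cornish",
    "CollectorNumber": "593",
    "DateCollected": "1984-09-23",
    "TRS": "Sec. 3, T12S, R9W",
}

# Hypothetical pattern: section number, township with N/S, range with E/W.
TRS_PATTERN = re.compile(r"Sec\.\s*(\d+),\s*T(\d+)([NS]),\s*R(\d+)([EW])")


def parse_trs(text):
    """Split a Township/Range/Section string into parts, or return None."""
    match = TRS_PATTERN.fullmatch(text.strip())
    if match is None:
        return None
    section, township, ns, rng, ew = match.groups()
    return {
        "section": int(section),
        "township": f"{township}{ns}",
        "range": f"{rng}{ew}",
    }


print(parse_trs(record["TRS"]))
```

Returning None for anything that does not match keeps unparsed values visible so they can be routed to a person instead of being silently guessed.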
Identify Labels
OCR Labels
Identify Primary DwC Fields
Lexical Grouping & Verification
Enter Values for Fields
Accept Value for Each Field
Assemble Label Data
WORKFLOW
Step 1 To This
IDENTIFYING LABELS
Sign In
Request Image
Click & Drag
Click & Drag
<Enter>
Repeat
Average rate: 300/hr per person
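To put the 300/hr figure next to the goal of 60+ million labels, the arithmetic can be sketched as below. The volunteer count is a hypothetical number chosen only for illustration; it does not come from the presentation.

```python
LABELS_TOTAL = 60_000_000  # lower bound of "60+ million" from the goal slide
LABELS_PER_HOUR = 300      # average rate per person from this slide

# Total effort if every label is identified once at the average rate.
person_hours = LABELS_TOTAL / LABELS_PER_HOUR


def hours_per_volunteer(volunteers):
    """Hours each person contributes if the work is split evenly."""
    return person_hours / volunteers


print(person_hours)               # total person-hours
print(hours_per_volunteer(1000))  # 1000 volunteers is an assumed figure
```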
Sample JPG label
JSON output
$5/1GB per month
~ label cost: $0.001
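The per-label OCR cost can be scaled to the full goal with a short calculation. This is a rough sketch: it assumes the $0.001 figure holds at any volume, and the `labels_for_budget` helper is a hypothetical name introduced here. `Decimal` is used so that the fractional dollar amount is not distorted by binary floating point.

```python
from decimal import Decimal

COST_PER_LABEL = Decimal("0.001")  # dollars per label, from the slide
LABELS_TOTAL = 60_000_000          # lower bound of "60+ million" labels


def ocr_cost(labels):
    """OCR cost in dollars for a given number of labels."""
    return COST_PER_LABEL * labels


def labels_for_budget(dollars):
    """Number of labels a budget covers at the quoted per-label cost."""
    return int(Decimal(dollars) / COST_PER_LABEL)


print(ocr_cost(LABELS_TOTAL))
print(labels_for_budget(5))
```

If the $5 per month figure and the per-label cost are linked, which the slide does not state explicitly, the second call suggests roughly how many labels fit in one month's allowance.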
EVERNOTE OCR PROCESSING
AFTER EVERNOTE
What we capture.
• Scientific Information (Family, Genus, Species, Subspecies, Author)
• Collection Information (Name, Number, Date)
• Geographical Information (Country, State, County, Locality, Lat/Lon, TRS)
• Determination Information (Determiner, Scientific Name, Date Determined)
• Extra Information (Accession Number, Type Status)
What we leave on the label.
• Habitat Information
• Locality Description
• Collector Notes
• Other
STEP 2) IDENTIFYING LABELS
Internal Step
Compare each word to the OCR value; if it is distinct, assign it to a lexical set.
Send the image to data entry.
LEXICAL GROUPING
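One way to read the lexical grouping step is that images are bucketed by the distinct OCR string found for a field, so that each distinct string only needs to be keyed once. The sketch below follows that reading; the function name, the image ids, and the lowercase normalization are all assumptions for illustration, not details from the presentation.

```python
from collections import defaultdict


def group_by_ocr_value(ocr_results):
    """Map each distinct OCR string to the image ids that produced it."""
    groups = defaultdict(list)
    for image_id, ocr_value in ocr_results:
        # Normalizing case and whitespace is an assumption of this sketch.
        groups[ocr_value.strip().lower()].append(image_id)
    return dict(groups)


# Hypothetical OCR output for the genus field of four label images.
ocr_results = [
    ("img-001", "Asplenium"),
    ("img-002", "Asplenlum"),
    ("img-003", "Asplenium"),
    ("img-004", "Botrychium"),
]

groups = group_by_ocr_value(ocr_results)

# One representative image per lexical set goes to data entry.
to_data_entry = [images[0] for images in groups.values()]

print(groups)
print(to_data_entry)
```

Under this reading, the saving comes from the ratio of images to distinct strings: four images here produce three data entry tasks, and the ratio improves as common values repeat.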
Internal Step
Look at the value that will be assigned to the list of images. If any images do not show the correct value, move them to the manual data entry blacklist.
Repeat
Based on Lexical Groups
BULK VALIDATION
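Bulk validation, as described on this slide, can be sketched as a single pass over one lexical group: every image receives the group's value unless a reviewer has flagged it, in which case it is routed to manual data entry. The function and variable names below are hypothetical.

```python
def bulk_validate(image_ids, value, rejected):
    """Assign `value` to each image unless a reviewer rejected it.

    Returns the accepted assignments and the list of images that
    should be sent to manual data entry instead.
    """
    rejected = set(rejected)
    accepted = {img: value for img in image_ids if img not in rejected}
    manual_queue = [img for img in image_ids if img in rejected]
    return accepted, manual_queue


# Hypothetical group of three images that share the OCR value "Asplenium".
accepted, manual_queue = bulk_validate(
    ["img-001", "img-003", "img-007"],
    "Asplenium",
    rejected=["img-007"],
)

print(accepted)
print(manual_queue)
```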
Public Step
Multiple Interfaces
Dates
Lat/Lng
Names
Scientific Names
Users receive virtual tokens to use in the store for every correct word
DATA ENTRY
DATA ENTRY VARIATIONS
                           Frequency
Computer: Asplenium        Asplenium: 1
Volunteer 2: Asplenium     Asplenium: 2
Volunteer 3: Asplenlum     Asplenlum: 1
Volunteer 4: Asplenium     Asplenium: 3
Points Earned: Volunteer 2 & 4
FIELD VERIFICATION
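The tally on this slide can be reproduced with a simple frequency count. The sketch below assumes that the most frequent value is accepted and that volunteers who entered it earn points; the slide shows the outcome but does not state the exact rule or any threshold, so treat the majority rule as an assumption.

```python
from collections import Counter


def verify_field(entries, computer="Computer"):
    """Pick the most frequent value and list the volunteers who entered it."""
    counts = Counter(entries.values())
    accepted_value, _ = counts.most_common(1)[0]
    winners = [
        who
        for who, value in entries.items()
        if value == accepted_value and who != computer
    ]
    return accepted_value, counts, winners


# Values from the slide: the OCR result plus three volunteer entries.
entries = {
    "Computer": "Asplenium",
    "Volunteer 2": "Asplenium",
    "Volunteer 3": "Asplenlum",
    "Volunteer 4": "Asplenium",
}

accepted_value, counts, winners = verify_field(entries)
print(accepted_value, dict(counts), winners)
```

With these inputs the accepted value is Asplenium and the points go to Volunteer 2 and Volunteer 4, matching the slide. A real system would also need a rule for ties, which this sketch resolves by first appearance.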
Export Formats
CSV
Darwin Core Archive
Other on request
Filters
By any combination of Darwin Core fields
RESTful web services
OCCURRENCE DATA
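Filtering by a combination of fields and exporting to CSV can be sketched with the standard library. This is an illustration only: the `export_csv` function is a hypothetical name, not the HelpingScience web service API, and the second record is invented sample data.

```python
import csv
import io


def export_csv(records, fields, **filters):
    """Return CSV text for records whose fields match every filter."""
    rows = [
        r for r in records
        if all(r.get(key) == value for key, value in filters.items())
    ]
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


records = [
    # Values from the example label earlier in the presentation.
    {"StateProvince": "Arkansas", "County": "Bradley",
     "Genus": "Botrychium", "SpecificEpithet": "biternatum"},
    # Hypothetical second record, included only to show the filter working.
    {"StateProvince": "Texas", "County": "Travis",
     "Genus": "Asplenium", "SpecificEpithet": "resiliens"},
]

print(export_csv(records, ["Genus", "SpecificEpithet", "County"],
                 StateProvince="Arkansas"))
```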
HelpingScience depends on a symbiotic relationship between collections, which provide specimen sheets, and volunteers, who perform data entry.
Volunteers are given HS Tokens to be used in the HS Store in exchange for their time.
The store is funded by a percentage of the cost per label, which is given back to the community.
The Store
Fundraisers
Micro loans given to botany undergraduate students for research
Sponsorships for students to attend scientific conferences
K12 equipment funding for science departments
Charitable Organizations
Fund Small Herbaria Digitization
SUSTAINABILITY
HELPINGSCIENCE.ORG
If you manage a collection and are interested in testing or
processing labels, please contact:
www.SilverBiology.com