helpingscience


Upload: mikegiddens

Post on 20-Jun-2015

DESCRIPTION

Presentation given at iDigBio

TRANSCRIPT

Page 1: HelpingScience

Developed by

SilverBiology

Michael Giddens

LABEL PROCESSING METHOD

HELPINGSCIENCE.ORG

Page 2: HelpingScience

From This To This

• StateProvince: Arkansas

• County: Bradley

• Genus: Botrychium

• SpecificEpithet: biternatum

• Authorship: (Sav.) Underwood

• Collector: Sherri Leslie, Kaylon Cornish

• CollectorNumber: 593

• DateCollected: 1984-09-23

• TRS: Sec. 3, T12S, R9W

GOAL (REPEAT 60+ MILLION TIMES)

Species Lookup: http://ecat-dev.gbif.org/usage/2650191
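The target record above can be held as a simple field-to-value mapping. The sketch below is illustrative only: the helper names `TRS`, `TRS_PATTERN`, and `parse_trs` are hypothetical and not part of HelpingScience, and the pattern only covers strings shaped like the one on this slide.

```python
import re
from typing import NamedTuple, Optional


class TRS(NamedTuple):
    """Section/Township/Range reference (hypothetical helper type)."""
    section: int
    township: str
    range_: str


# Matches strings shaped like "Sec. 3, T12S, R9W"; other label layouts
# would need additional patterns.
TRS_PATTERN = re.compile(
    r"Sec\.?\s*(\d+),\s*T(\d+[NS]),\s*R(\d+[EW])", re.IGNORECASE
)


def parse_trs(text: str) -> Optional[TRS]:
    """Return the parsed TRS value, or None if the text does not match."""
    match = TRS_PATTERN.search(text)
    if match is None:
        return None
    return TRS(
        int(match.group(1)),
        match.group(2).upper(),
        match.group(3).upper(),
    )


# The record from the slide, as a plain mapping of field name to value.
record = {
    "StateProvince": "Arkansas",
    "County": "Bradley",
    "Genus": "Botrychium",
    "SpecificEpithet": "biternatum",
    "Authorship": "(Sav.) Underwood",
    "Collector": "Sherri Leslie, Kaylon Cornish",
    "CollectorNumber": "593",
    "DateCollected": "1984-09-23",
    "TRS": "Sec. 3, T12S, R9W",
}

trs = parse_trs(record["TRS"])
```

Here `trs` holds section 3, township 12S, range 9W; a string that does not fit the pattern yields `None` rather than a partial guess.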

Page 3: HelpingScience

Identify Labels

OCR Labels

Identify Primary DwC Fields

Lexical Grouping & Verification

Enter Values for Fields

Accept Value for Each Field

Assemble Label Data

WORKFLOW

Page 4: HelpingScience

Step 1

IDENTIFYING LABELS

Sign In

Request Image

Click & Drag

Click & Drag

<Enter>

Repeat

Average rate: 300/hr per person
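As a rough worked example, the quoted rate can be combined with the "60+ million" goal to estimate the effort for this step. The volunteer count and hours per day used below are hypothetical inputs, and the sketch assumes the 300/hr figure applies to every item.

```python
ITEMS_TOTAL = 60_000_000        # lower bound of "60+ million" from the goal slide
RATE_PER_PERSON_HOUR = 300      # average rate quoted on this slide

# Total effort for the label-identification step alone.
person_hours = ITEMS_TOTAL / RATE_PER_PERSON_HOUR


def days_to_finish(volunteers: int, hours_per_day: float) -> float:
    """Calendar days needed if every volunteer works the given hours daily."""
    return person_hours / (volunteers * hours_per_day)


print(person_hours)                 # 200000.0
print(days_to_finish(1000, 2))      # 100.0 (hypothetical crowd of 1000 at 2 h/day)
```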

Page 5: HelpingScience

WORKFLOW (same diagram as Page 3)

Page 6: HelpingScience

Sample JPG label

JSON output

$5/1GB per month

~ label cost: $0.001

EVERNOTE OCR PROCESSING
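The two figures on this slide can be related with some simple arithmetic. This is a sketch under one assumption that the slide does not state: that the per-label cost is derived from the $5 per 1 GB price, with 1 GB taken as 10^9 bytes.

```python
from fractions import Fraction

PRICE_PER_GB = Fraction(5)             # dollars per 1 GB per month (from the slide)
COST_PER_LABEL = Fraction("0.001")     # approximate dollars per label (from the slide)
LABELS_TOTAL = 60_000_000              # lower bound of the "60+ million" goal

# How many labels fit in 1 GB if the two slide figures are consistent.
labels_per_gb = PRICE_PER_GB / COST_PER_LABEL

# Implied average image size, assuming 1 GB = 10**9 bytes.
bytes_per_label = Fraction(10**9) / labels_per_gb

# OCR cost for the whole goal at the quoted per-label figure.
total_cost = COST_PER_LABEL * LABELS_TOTAL

print(labels_per_gb)      # 5000
print(bytes_per_label)    # 200000
print(total_cost)         # 60000
```

Under that assumption a label image averages about 200 KB, and OCR for 60 million labels comes to roughly $60,000.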

Page 7: HelpingScience

AFTER EVERNOTE

Page 8: HelpingScience

WORKFLOW (same diagram as Page 3)

Page 9: HelpingScience

What we capture.

• Scientific Information (Family, Genus, Species, Subspecies, Author)

• Collection Information (Name, Number, Date)

• Geographical Information (Country, State, County, Locality, Lat/Lon, TRS)

• Determination Information (Determiner, Scientific Name, Date Determined)

• Extra Information (Accession Number, Type Status)

What we leave on the label.

• Habitat Information

• Locality Description

• Collector Notes

• Other

STEP 2) IDENTIFYING LABELS
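One way to picture the capture/leave split is a lookup from slide field names to Darwin Core terms. The mapping below is an assumption made for illustration (the talk does not give one), it covers only part of the capture list, and the choice of `catalogNumber` for the accession number in particular is a guess. `CAPTURED_FIELDS`, `LEFT_ON_LABEL`, and `to_darwin_core` are hypothetical names.

```python
# Slide field name -> Darwin Core term (assumed mapping, not from the talk).
CAPTURED_FIELDS = {
    "Family": "family",
    "Genus": "genus",
    "Species": "specificEpithet",
    "Subspecies": "infraspecificEpithet",
    "Author": "scientificNameAuthorship",
    "Collector Name": "recordedBy",
    "Collector Number": "recordNumber",
    "Date Collected": "eventDate",
    "Country": "country",
    "State": "stateProvince",
    "County": "county",
    "Locality": "locality",
    "Determiner": "identifiedBy",
    "Date Determined": "dateIdentified",
    "Accession Number": "catalogNumber",   # assumption: could map elsewhere
    "Type Status": "typeStatus",
}

# Categories the slide says stay on the label image.
LEFT_ON_LABEL = {"Habitat", "Locality Description", "Collector Notes", "Other"}


def to_darwin_core(parsed: dict) -> dict:
    """Keep only captured fields, renamed to their assumed Darwin Core terms."""
    return {
        CAPTURED_FIELDS[name]: value
        for name, value in parsed.items()
        if name in CAPTURED_FIELDS
    }


# "Habitat" is dropped because it is left on the label (sample value is made up).
example = to_darwin_core({"Genus": "Botrychium", "Habitat": "moist woods"})
```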

Page 10: HelpingScience

WORKFLOW (same diagram as Page 3)

Page 11: HelpingScience

Internal Step

Compare each word to the OCR value and, if it is distinct, assign it to a lexical set.

Send image to data entry.

LEXICAL GROUPING
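A minimal sketch of one reading of this step: every distinct OCR word gets its own lexical set, which collects the images that word came from. The normalization rule (trim and lowercase), the function name `build_lexical_sets`, and the image IDs are all assumptions for illustration, not the actual HelpingScience implementation.

```python
from collections import defaultdict


def build_lexical_sets(ocr_words):
    """Group (image_id, word) pairs by normalized word.

    Returns a dict mapping each distinct word to the list of image IDs
    whose OCR output contained it.
    """
    sets = defaultdict(list)
    for image_id, word in ocr_words:
        key = word.strip().lower()   # assumed normalization
        if key:                      # skip empty OCR tokens
            sets[key].append(image_id)
    return dict(sets)


# Hypothetical OCR output for one field across four images.
ocr_words = [
    ("img-001", "Arkansas"),
    ("img-002", "arkansas"),
    ("img-003", "Bradley"),
    ("img-004", "Arkansas "),
]

lexical_sets = build_lexical_sets(ocr_words)
```

With this input, three images fall into the "arkansas" set and one into "bradley", so a single decision about each word can later be applied to every image in its set.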

Page 12: HelpingScience

Internal Step

Look at the value that will be assigned to the list of images; if any are not the correct value, move them to the manual data entry blacklist.

Repeat

Based on Lexical Groups

BULK VALIDATION
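The bulk step can be sketched as applying one value to a whole group of images while setting aside the ones a reviewer rejects. The function name `bulk_validate`, its arguments, and the image IDs are hypothetical; this illustrates the idea rather than the project's code.

```python
def bulk_validate(value, image_ids, rejected):
    """Assign `value` to every image except those the reviewer rejected.

    Returns (accepted, blacklist): a mapping of image ID to the assigned
    value, and the list of image IDs sent to manual data entry.
    """
    accepted = {}
    blacklist = []
    for image_id in image_ids:
        if image_id in rejected:
            blacklist.append(image_id)
        else:
            accepted[image_id] = value
    return accepted, blacklist


# Hypothetical group: the reviewer spots that img-004 shows a different word.
accepted, blacklist = bulk_validate(
    "Arkansas",
    ["img-001", "img-002", "img-004"],
    rejected={"img-004"},
)
```

One reviewer action confirms the value for `img-001` and `img-002`, while `img-004` goes to manual entry.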

Page 13: HelpingScience

WORKFLOW (same diagram as Page 3)

Page 14: HelpingScience

Public Step

Multiple Interfaces

Dates

Lat/Lng

Names

Scientific Names

Users receive virtual tokens to use in the store for every correct word.

DATA ENTRY

Page 15: HelpingScience

DATA ENTRY VARIATIONS

Page 16: HelpingScience

WORKFLOW (same diagram as Page 3)

Page 17: HelpingScience

                           Frequency

Computer: Asplenium        Asplenium: 1

Volunteer 2: Asplenium     Asplenium: 2

Volunteer 3: Asplenlum     Asplenlum: 1

Volunteer 4: Asplenium     Asplenium: 3

Points Earned: Volunteer 2 & 4

FIELD VERIFICATION
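The example on this slide behaves like a frequency vote: the most common value is accepted and the volunteers who entered it earn points. The sketch below assumes a simple plurality rule with no winner on a tie; the actual threshold used by HelpingScience is not stated, and `verify_field` is a hypothetical name.

```python
from collections import Counter


def verify_field(entries, machine_source="Computer"):
    """Pick the most frequent value and list the volunteers who entered it.

    `entries` maps a source name to the value it supplied. Returns
    (accepted_value, rewarded_volunteers), or (None, []) when there is
    no input or the top two values are tied.
    """
    counts = Counter(entries.values())
    ranked = counts.most_common()
    if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return None, []
    accepted = ranked[0][0]
    rewarded = [
        who
        for who, value in entries.items()
        if value == accepted and who != machine_source  # OCR earns no points
    ]
    return accepted, rewarded


# Values from the slide.
entries = {
    "Computer": "Asplenium",
    "Volunteer 2": "Asplenium",
    "Volunteer 3": "Asplenlum",
    "Volunteer 4": "Asplenium",
}

accepted, rewarded = verify_field(entries)
```

This reproduces the slide's outcome: "Asplenium" wins three to one, and Volunteers 2 and 4 earn points.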

Page 18: HelpingScience

WORKFLOW (same diagram as Page 3)

Page 19: HelpingScience

Export Formats

CSV

Darwin Core Archive

Other on request

Filters

By any combination of DarwinCore Fields

RESTful web services

OCCURRENCE DATA
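Filtering by a combination of fields and exporting to CSV could look roughly like the following. This is a local sketch using Python's standard `csv` module, not the HelpingScience web service; `filter_records` and `to_csv` are hypothetical names, and the second sample record is made up.

```python
import csv
import io


def filter_records(records, **criteria):
    """Keep records whose fields equal every given field=value pair."""
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in criteria.items())
    ]


def to_csv(records, fieldnames):
    """Serialize the chosen fields of each record as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        extrasaction="ignore",   # drop fields not requested for export
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = [
    {"StateProvince": "Arkansas", "County": "Bradley", "Genus": "Botrychium"},
    {"StateProvince": "Arkansas", "County": "Drew", "Genus": "Asplenium"},  # made up
]

selected = filter_records(records, StateProvince="Arkansas", County="Bradley")
output = to_csv(selected, ["Genus", "County"])
```

Here `output` contains a header row followed by the single matching record.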

Page 20: HelpingScience

HelpingScience depends on a symbiotic relationship between collections providing specimen sheets and volunteers to perform data entry.

Volunteers are given HS Tokens to be used in the HS Store in exchange for their time.

The store is funded by a percentage of the cost per label, which is given back to the community.

The Store

Fundraisers

Micro loans given to botany undergraduate students for research

Sponsorships for students to attend scientific conferences

K12 equipment funding for science departments

Charitable Organizations

Fund Small Herbaria Digitization

SUSTAINABILITY

Page 21: HelpingScience

HELPINGSCIENCE.ORG

If you manage a collection and are interested in testing or processing labels, please contact:

[email protected]

www.SilverBiology.com