making information more accessible
About me
• Biochemist
• Curious about Information
• Librarian
• Informatician/Project Director/Library Director
• Asst. Dean, Scholarly Content & Faculty Engagement
DOMO
Every minute:
• Facebook users share nearly 2.5 million pieces of content.
• Twitter users tweet nearly 300,000 times.
• Email users send over 200 million messages.
Image via Flickr: Jean-Etienne Minh-Duy Poirrier
Scientific literature doubles every nine years (via Nature News blog)
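To put the nine-year doubling time in yearly terms, here is a minimal sketch of the arithmetic. It assumes steady exponential growth, which the slide does not state; the names `DOUBLING_TIME_YEARS` and `growth_factor` are only illustrative.

```python
# Assumption: the literature grows at a constant exponential rate.
DOUBLING_TIME_YEARS = 9  # figure quoted on the slide

# Annual growth rate g that satisfies (1 + g) ** 9 == 2
annual_growth = 2 ** (1 / DOUBLING_TIME_YEARS) - 1


def growth_factor(years: float) -> float:
    """How many times larger the literature is after `years` years."""
    return 2 ** (years / DOUBLING_TIME_YEARS)


print(f"Implied annual growth: {annual_growth:.1%}")   # about 8% per year
print(f"Growth over 18 years: {growth_factor(18):.1f}x")
```

Under that assumption, doubling every nine years works out to roughly 8% more literature each year.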
Cancer - 694,372 articles in the last 5 years
Climate change – 88,565
Species extinction – 8,453
Median # of articles read in a year – 264*
[Bar chart, logarithmic axis "Number of Articles": Cancer (694,372); Climate Change (88,565); Species Extinction (8,453); # of articles read/year (264)]
*Nature News (2014) Scientists may be reaching a peak in reading habits
How to do it? In the past…
Georges Louis Leclerc, comte de Buffon. Histoire naturelle : générale et particulière (Oiseaux), 1799-1808
The OCR Problem
Epitonium foliaceicostwm Orbigny Wrinkled-ribbed Wentletrap Southeast Florida to the Lesser Antilles.
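One common way to cope with OCR slips like the "w" above is fuzzy matching against a list of accepted names. The sketch below is an illustration only, not something described in the talk: `KNOWN_NAMES` is a hypothetical reference list, the helper name is made up, and it assumes the intended spelling is "Epitonium foliaceicostum". It uses `difflib` from the Python standard library.

```python
import difflib

# Hypothetical reference list of accepted names (an assumption for this sketch).
KNOWN_NAMES = [
    "Epitonium foliaceicostum",
    "Epitonium angulatum",
    "Phyllodesmium acanthorhinum",
]


def closest_known_name(ocr_name, known=KNOWN_NAMES, cutoff=0.8):
    """Return the best fuzzy match for an OCR'd name, or None if nothing is close."""
    matches = difflib.get_close_matches(ocr_name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(closest_known_name("Epitonium foliaceicostwm"))
```

This only helps when the correct name is already in the list; names that appear in no list still have to be discovered in the text.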
Machine Learning for Species Identification
Reptilia and Batrachia. (1885-1902) by Albert C.L.G. Günther
NetiNeti: Name Extraction from Textual Information-Name Extraction for Taxonomic Indexing
The fluorescent sea slug Phyllodesmium acanthorhinum is more than just a pretty collection of colors: the creature bridged the gap for scientists trying to understand the relationship between sea slugs that feed on hydroids and those that dine on corals.
Source: http://ab.co/1ByZcIb Photographer: Robert Bolland
Akella et al. BMC Bioinformatics 2012, 13:211 http://www.biomedcentral.com/1471-2105/13/211
Named Entity Recognition (NER)
to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations…
The fluorescent sea slug Phyllodesmium acanthorhinum is more than just a pretty collection of colors:
Adjective noun unknown
How does NetiNeti work?
• Text is tokenized (broken into chunks)
• Prefiltering step
• Probability that token is a name is calculated (structure and context)
• Training (positive and negative examples)
• Features (letter combinations, # of vowels, part of speech)
The fluorescent sea slug Phyllodesmium acanthorhinum is more than just a pretty collection of colors:
name not a name
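The steps listed above can be sketched in a few dozen lines of Python. This is a toy illustration under stated assumptions, not NetiNeti's actual code: a tiny Naive Bayes classifier stands in for the real model, the training examples are hypothetical, the prefilter rule (a capitalized word followed by a lowercase word) is a guess at what a prefilter might do, and the part-of-speech feature is left out because it would need a tagger.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Break text into word tokens (letters only)."""
    return re.findall(r"[A-Za-z]+", text)


def candidates(tokens):
    """Prefilter (assumed rule): keep 'Capitalized lowercase' word pairs plus context."""
    for i in range(len(tokens) - 1):
        a, b = tokens[i], tokens[i + 1]
        if a[0].isupper() and a[1:].islower() and b.islower() and len(b) > 3:
            prev = tokens[i - 1].lower() if i > 0 else "<s>"
            yield a, b, prev


def features(genus, species, prev_word):
    """Structure features (letter combinations, vowel count) and one context feature."""
    vowels = sum(ch in "aeiou" for ch in (genus + species).lower())
    return {
        "genus_suffix=" + genus[-2:].lower(),
        "species_suffix=" + species[-2:].lower(),
        "vowels=" + ("many" if vowels >= 8 else "few"),
        "prev=" + prev_word,
    }


class NaiveBayes:
    """Minimal Naive Bayes with add-one smoothing (a stand-in classifier)."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        for feats, label in examples:
            self.label_counts[label] += 1
            for f in feats:
                self.feature_counts[label][f] += 1
                self.vocab.add(f)

    def prob_name(self, feats):
        """Probability that a candidate with these features is a name."""
        total = sum(self.label_counts.values())
        log_scores = {}
        for label, n in self.label_counts.items():
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for f in feats:
                score += math.log((self.feature_counts[label][f] + 1) / denom)
            log_scores[label] = score
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        return exp[True] / sum(exp.values())


# Hypothetical training data: (genus, species, previous word), is_name
TRAINING = [
    (("Epitonium", "foliaceicostum", "the"), True),
    (("Homo", "sapiens", "species"), True),
    (("Escherichia", "coli", "bacterium"), True),
    (("Phyllodesmium", "briareum", "slug"), True),
    (("The", "creature", "<s>"), False),
    (("Every", "minute", "<s>"), False),
    (("Scientific", "literature", "<s>"), False),
    (("Machine", "learning", "<s>"), False),
]

model = NaiveBayes()
model.train((features(*cand), label) for cand, label in TRAINING)


def find_names(text, model, threshold=0.5):
    """Return (candidate, is_name) for every candidate that passes the prefilter."""
    return [
        (f"{g} {s}", model.prob_name(features(g, s, prev)) >= threshold)
        for g, s, prev in candidates(tokenize(text))
    ]


sentence = ("The fluorescent sea slug Phyllodesmium acanthorhinum is more "
            "than just a pretty collection of colors:")
print(find_names(sentence, model))
```

With this toy training set, "Phyllodesmium acanthorhinum" is labeled a name and "The fluorescent" is not, mirroring the "name / not a name" labels on the slide. A real system would need far more training examples and richer features.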
Questions?
The language of birds. London: Saunders and Otley, 1837. biodiversitylibrary.org/page/47512020, via Flickr
Thank You!