open research data: taxonomy
Donat Agosti, Plazi, http://plazi.org
Linked Data Switzerland Workshop, 8th October, HES-SO, Sierre
Open Access to Scientific Data
[Map slides: Google Maps and Search.ch views. Places are first labelled "food", "coffee" and "sleep" with walking times of 4 to 9 min; then given as coordinates (46°16'57.80"N 7°32'23.11"E; 46°16'56.81"N 7°32'13.59"E; 46°17'3.98"N 7°32'13.00"E; 46°17'5.18"N 7°32'18.34"E) with distances of 253 m, 228 m and 203 m; then abstracted to "location" and "distance"; and finally typed as "host", "welfare" and "gastronomy".]
Latin Names as Access to Science
[Diagram: a treatment, available as XML and RDF, with the link types "cites", "same as" and "refers to"]
The «Mehlwurm» (mealworm) lives in dry bread... or the Potential of LOD
[Diagram link labels: "has traits", "is part of", "refers to"]
“The larva of the mealworm lives in dry bread and can be eaten in Switzerland”
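The mealworm sentence can be broken into a few linked statements. The sketch below is only an illustration in plain Python tuples, not Plazi's RDF model: the relation names ("same as", "is part of", "has traits", "refers to") come from the slide, while the node names and the way they are paired are assumptions made for this example.

```python
from collections import defaultdict

# Illustrative triples only: relation names are taken from the slide,
# node names and their pairing are assumptions for this sketch.
TRIPLES = [
    ("Mehlwurm", "same as", "mealworm"),
    ("larva", "is part of", "mealworm"),
    ("larva", "has traits", "lives in dry bread"),
    ("larva", "has traits", "can be eaten in Switzerland"),
    ("sentence", "refers to", "larva"),
]


def build_index(triples):
    """Index triples by (subject, predicate) for simple lookups."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].append(obj)
    return index


def traits_of(index, subject):
    """Return every object linked to `subject` via 'has traits'."""
    return list(index.get((subject, "has traits"), []))


index = build_index(TRIPLES)
print(traits_of(index, "larva"))
```

Once the sentence is held as separate statements, each trait can be queried or linked to other sources on its own, which is the potential the slide points to.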
The Scientific Challenge
  1 tnntttccca cgaataaata atataagatt ttgattatta cctccttctt taattttatt
 61 attatcaaga agattagttt ataaaggagt aggaacagga tgaactgttt atcctccttt
121 atctaataat ttatatcata atggattttc aactgattta gcaatttttt ctttacatat
181 tgcaggaata tcatcaatta taggagcaat taattttatt tcaacaattt taaatataca
241 tcataaaaat ttatcattag ataaaattcc attgttagtt tgatcaattt taattacagc
301 tattttatta ttattatctt tacctgtatt agcaggtgca attactatat tattaactga
361 tcgaaatcta aatacaactt tttttgatcc ttcgggtgga ggagatccaa ttttatatca
421 acatttattt
[Figure labels: LOD, PDF, HNS]
The Political Vision: Create the LOD-Cloud
The Plazi Vision: The Giant Global Biodiversity Graph
Plazi’s intended uses of LOD
• Public discovery of data about species, extracted from the publications in which they are described, by people not otherwise connected to discovery services such as Plazi’s own, the GBIF data repository, ….
• Public facility for citation of Plazi’s data by arbitrary internet users.
• Plazi creates a new dataset from literature/publications rather than republishing existing data sets.
Legal, Social, Technical, Ontologies, Infrastructure
500 M pages 5*
What does this mean?
The Linking Open Data cloud diagram
Linked Open Data Cloud
Text
<tax:treatment>
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name>
    <tax:status>n. sp.</tax:status>
    Fig 1 D - F
  </tax:nomenclature>
  <tax:div type="description">
    <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93, SL 1.30, SI 137, PW 0.73, ML 0.38. Mandible outer margin strongly curving to a sharp apical tooth, the apex parallel to the anterior clypeal margin. (Holotype with material in mandibles, so mandibles and anterior clypeus $ described below from paratypes.) Median clypeus....
</tax:treatment>
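To show what the markup makes possible, here is a minimal sketch that reads the name elements from a shortened, well-formed copy of the treatment excerpt using Python's standard library. The namespace URIs are placeholders (the slide does not show the real declarations), and `read_name` is a hypothetical helper, not part of any Plazi tool.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs: the slide does not show the real declarations.
NS = {"tax": "http://example.org/tax", "dc": "http://example.org/dc"}

# Shortened, well-formed copy of the excerpt (description omitted).
SNIPPET = """\
<tax:treatment xmlns:tax="http://example.org/tax" xmlns:dc="http://example.org/dc">
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name>
    <tax:status>n. sp.</tax:status>
  </tax:nomenclature>
</tax:treatment>
"""


def read_name(xml_text):
    """Extract genus, species, status and external identifier."""
    root = ET.fromstring(xml_text)
    name = root.find("tax:nomenclature/tax:name", NS)
    xid = name.find("tax:xid", NS)
    return {
        "genus": name.findtext("tax:xmldata/dc:Genus", namespaces=NS),
        "species": name.findtext("tax:xmldata/dc:Species", namespaces=NS),
        "status": root.findtext("tax:nomenclature/tax:status", namespaces=NS),
        "source": xid.get("source"),
        "identifier": xid.get("identifier"),
    }


record = read_name(SNIPPET)
print(record)
```

The same lookup on untagged text would need name recognition first; with the markup in place it is a plain path query.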
Semantically enhanced text (TaxonX/TaxPub)
… alternatives: From human to machine readable text
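As one illustration of moving from human-readable to machine-readable text, the measurement string in the holotype description of the treatment excerpt can be turned into numbers with a regular expression. This is a sketch under assumptions: the slide does not define the abbreviations, so the code assumes the customary ant morphometric conventions CI = 100 × HW / HL and SI = 100 × SL / HW, and the function names are hypothetical.

```python
import re

# Measurement string copied from the holotype description in the excerpt.
DESCRIPTION = (
    "HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93, "
    "SL 1.30, SI 137, PW 0.73, ML 0.38."
)


def parse_measurements(text):
    """Map each two-letter measurement code to its numeric value."""
    pairs = re.findall(r"\b([A-Z]{2}) (\d+(?:\.\d+)?)", text)
    return {code: float(value) for code, value in pairs}


def derived_indices(m):
    """Recompute the indices, assuming CI = 100*HW/HL and SI = 100*SL/HW."""
    return {
        "CI": round(100 * m["HW"] / m["HL"]),
        "SI": round(100 * m["SL"] / m["HW"]),
    }


measurements = parse_measurements(DESCRIPTION)
print(measurements)
print(derived_indices(measurements))
```

Under these assumed definitions the recomputed indices agree with the CI 93 and SI 137 printed in the description, which is the kind of consistency check that only becomes possible once the values are extracted.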
RDF
Countries (Region): Australia (Queensland)
Export species materials citations (DwC)
Treatment Content Visualization
10.3897/BDJ.3.e5063
Treatment Graph for the Malagasy Ants Aphaenogaster
[Diagram: an original description and several re-descriptions, connected by "cites" and "cites / synonymizes" links]
Treatment Graph for the Ant Azteca alfari
https://github.com/plazi/TreatmentOntologies
Pseudomyrmex ants and Vachellia ant-acacias are a classic example of mutualism in biology.
allenii
melanoceras
ruddiae
chiapensis
collinsii
cookii
cornigera
globulifera
hindsii
janzenii
mayana
sphaerocephala
boopis
flavicornis
hesperius
ita
janzeni
kuenckeli
mixtecus
nigrocinctus
nigropilosus
opaciceps
particeps
peperi
reconditus
satanicus
simulans
spinicola
subtilissimus
veneficus
ferrugineus
gentlei
gracilis
Transbiotic link network: associated species linked through references in taxonomic treatments
Acacia-ant species: Pseudomyrmex gracilis
Treatment: redescription
Associated ant-acacia: Acacia gentlei
Ants Plants
Photo credits: Alex Wild
Treatment
Treatments linked through citations
Treatment opportunities
Treatment
Linking the data with external references
5* 2014
NCBI
Access to scientific literature: DOI via Zenodo/CERN
Open Access as Necessity
Before antbase.org, Harvard's Museum of Comparative Zoology could claim to be the only location with a complete set of ant systematics publications from 1758 to the present.
Through antbase.org's digital library, this body of literature is accessible worldwide, and it is actively used (more than 10,000 visits in a single month).
Online catalogue, open access, online library, 2004
Conversion Workflows: Plazi
Plazi SRS
find → scan → «OCR» → markup → store + access
The Swiss exceptions to copyright law that permit data extraction are an advantage for the sciences in Switzerland
Conversion Workflows: Plazi: Scientific Names
Conversion Workflows: Plazi: Tables
«Treatment», scientific species name, distribution record, bibliographic records, external links
ENVO? Names
Cataglyphis tartessica workers
Variable          mean ± SD
Head length       11.23 ± 0.12
Head width        11.15 ± 0.12
Scape length      11.47 ± 0.12
Mesosoma length   11.94 ± 0.16
Femur length      12.03 ± 0.14
Cephalic index    0 93.60 ± 3.940
Scape index       128.10 ± 7.660
Conversion Workflows: Plazi: Bibliographic References
Conversion Workflows: Plazi: Geographic data
Conversion Workflows: Plazi: Pipelines
Status quo
• 50,000+ treatments live
• RDF in beta version
• GoldenGate Imagine (text mining tool) in beta version
• Data provider for NCBI, GBIF, EOL, antweb
• Biodiversity Literature Repository functional
Trait information machine ready
BioDiP
Resolution
Reconciliation
TreatmentBank
NAMES MANAGEMENT
CITATION MANAGEMENT
REFBANK
TREATMENT MANAGEMENT
ATOMIZATION & SEMANTICIZATION OF CONTENT MARKUP / initial trait extraction
Specialist taxonomic databases
Next steps: HES-SO, HEG Geneva
Next steps: ContentMine
Planned collaboration with ContentMine to extract treatments on a daily basis
http://www.slideshare.net/petermurrayrust/?
BioDiv
article
treatment
cites (http URI)
cites (DOI)
Scientific name
https://www.wikidata.org/wiki/Property:P1992
Feed Wikipedia with taxonomic data
Publications, one of our footprints
Next steps
• 1 million treatments live
• RDF version 1
• GoldenGate Imagine (text mining tool) version 1
• Data provider for NCBI, GBIF, EOL, antweb
• Biodiversity Literature Repository with 100,000 bibliographic references and digital versions
Thank you!
Donat Agosti