Open Research Data: Taxonomy

Donat Agosti, Plazi, http://plazi.org. Linked Data Switzerland Workshop, 8th October, HES-SO, Sierre. Open Access to Scientific Data.


Page 1: Open Research Data: Taxonomy

Donat Agosti, Plazi, http://plazi.org

Linked Data Switzerland Workshop, 8th October, HES-SO, Sierre

Open Access to Scientific Data

Page 2: Open Research Data: Taxonomy

Google.maps

Page 3: Open Research Data: Taxonomy

Search.ch

Page 4: Open Research Data: Taxonomy

Search.ch

Page 5: Open Research Data: Taxonomy

[Map from Search.ch with points of interest labelled: food (three times), coffee, sleep]

Page 6: Open Research Data: Taxonomy

[Map from Search.ch annotated with times: 4 min, 8 min, 8 min, 9 min, 7 min]

Page 7: Open Research Data: Taxonomy

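[Map from Search.ch annotated with coordinates, distances, and times]

Coordinates: 46°16'57.80"N 7°32'23.11"E; 46°16'56.81"N 7°32'13.59"E; 46°17'3.98"N 7°32'13.00"E; 46°17'5.18"N 7°32'18.34"E

Distances: 253 m, 228 m, 203 m

Times: 8 min, 9 min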
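The coordinates on this slide can be turned into distances. The following is a minimal sketch, assuming a spherical Earth with a mean radius of 6,371 km and the haversine formula; the helper names `dms_to_decimal` and `haversine_m` are illustrative and not from the slides. The transcript does not show which pair of points each distance label belongs to, and the labels may be walking distances rather than straight-line distances, so the result is only indicative.

```python
import math
import re


def dms_to_decimal(dms: str) -> float:
    """Convert a string such as 46°16'57.80"N to signed decimal degrees."""
    m = re.fullmatch(r"""(\d+)°(\d+)'([\d.]+)"([NSEW])""", dms.strip())
    if m is None:
        raise ValueError(f"unrecognised coordinate: {dms!r}")
    deg, minutes, seconds, hemisphere = m.groups()
    value = int(deg) + int(minutes) / 60 + float(seconds) / 3600
    # South and west are negative by convention.
    return -value if hemisphere in "SW" else value


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float,
                radius_m: float = 6_371_000.0) -> float:
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


# The first two coordinate pairs from the slide.
point_a = (dms_to_decimal("46°16'57.80\"N"), dms_to_decimal("7°32'23.11\"E"))
point_b = (dms_to_decimal("46°16'56.81\"N"), dms_to_decimal("7°32'13.59\"E"))

distance = haversine_m(*point_a, *point_b)
print(f"straight-line distance: {distance:.0f} m")
```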

Page 8: Open Research Data: Taxonomy

[Map from Search.ch with each annotation replaced by its data type: location, distance]

Page 9: Open Research Data: Taxonomy

[Map from Search.ch with each point classified by category: host, welfare, gastronomy]

Page 10: Open Research Data: Taxonomy
Page 11: Open Research Data: Taxonomy

Latin names as an entry point to science

Page 12: Open Research Data: Taxonomy
Page 13: Open Research Data: Taxonomy
Page 14: Open Research Data: Taxonomy
Page 15: Open Research Data: Taxonomy
Page 16: Open Research Data: Taxonomy

Treatment

Page 17: Open Research Data: Taxonomy
Page 18: Open Research Data: Taxonomy
Page 19: Open Research Data: Taxonomy

XML

RDF

Page 20: Open Research Data: Taxonomy

The «Mehlwurm» lives in dry bread... or the Potential of LOD

[Graph with edges labelled: cites, same as, refers to, has traits, is part of]

“The larva of the mealworm lives in dry bread and can be eaten in Switzerland”
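The edge labels on this slide (cites, same as, refers to, has traits, is part of) can be read as predicates in subject-predicate-object statements. The sketch below uses plain Python tuples rather than an RDF library. Every `ex:` identifier is a hypothetical placeholder invented for illustration; none is a real Plazi URI, and the slide does not say which nodes each edge connects.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Hypothetical statements loosely modelled on the slide's sentence.
TRIPLES: list[Triple] = [
    ("ex:mealworm", "same as", "ex:Tenebrio_molitor_larva"),
    ("ex:Tenebrio_molitor_larva", "is part of", "ex:Tenebrio_molitor"),
    ("ex:Tenebrio_molitor_larva", "has traits", "ex:lives_in_dry_bread"),
    ("ex:Tenebrio_molitor_larva", "has traits", "ex:edible_in_Switzerland"),
    ("ex:treatment_1", "refers to", "ex:Tenebrio_molitor"),
    ("ex:article_1", "cites", "ex:treatment_1"),
]


def match(triples: list[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> list[Triple]:
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def traits_of(name: str, triples: list[Triple]) -> list[str]:
    """Collect traits of `name`, following 'same as' links one step."""
    aliases = {name}
    aliases |= {o for s, p, o in triples if p == "same as" and s == name}
    aliases |= {s for s, p, o in triples if p == "same as" and o == name}
    return sorted(o for s, p, o in triples
                  if p == "has traits" and s in aliases)


print(traits_of("ex:mealworm", TRIPLES))
```

Following the "same as" link is what lets a query about the common name reach traits recorded under another identifier, which is the kind of linking the slide illustrates.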

Page 21: Open Research Data: Taxonomy

The Scientific Challenge

  1 tnntttccca cgaataaata atataagatt ttgattatta cctccttctt taattttatt
 61 attatcaaga agattagttt ataaaggagt aggaacagga tgaactgttt atcctccttt
121 atctaataat ttatatcata atggattttc aactgattta gcaatttttt ctttacatat
181 tgcaggaata tcatcaatta taggagcaat taattttatt tcaacaattt taaatataca
241 tcataaaaat ttatcattag ataaaattcc attgttagtt tgatcaattt taattacagc
301 tattttatta ttattatctt tacctgtatt agcaggtgca attactatat tattaactga
361 tcgaaatcta aatacaactt tttttgatcc ttcgggtgga ggagatccaa ttttatatca
421 acatttattt
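The sequence above is laid out as a leading position number followed by blocks of ten bases. The sketch below assumes that layout, reads the first two rows into a single string, and counts the bases. The function name `parse_origin` is illustrative; `n` is the standard code for an undetermined base.

```python
from collections import Counter

# First two rows of the sequence shown on the slide.
ORIGIN_EXCERPT = """
  1 tnntttccca cgaataaata atataagatt ttgattatta cctccttctt taattttatt
 61 attatcaaga agattagttt ataaaggagt aggaacagga tgaactgttt atcctccttt
"""


def parse_origin(text: str) -> str:
    """Join position-numbered rows of base blocks into one sequence string."""
    sequence = ""
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if not parts[0].isdigit():
            raise ValueError(f"row does not start with a position: {line!r}")
        # The position number must match the bases read so far.
        if int(parts[0]) != len(sequence) + 1:
            raise ValueError(f"unexpected position {parts[0]}")
        sequence += "".join(parts[1:])
    return sequence


sequence = parse_origin(ORIGIN_EXCERPT)
counts = Counter(sequence)

# GC fraction over determined bases only (undetermined 'n' is excluded).
determined = sum(counts[base] for base in "acgt")
gc_fraction = (counts["g"] + counts["c"]) / determined

print(len(sequence), dict(counts), f"GC fraction: {gc_fraction:.2f}")
```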

Page 22: Open Research Data: Taxonomy

The Scientific Challenge

Page 23: Open Research Data: Taxonomy

The Scientific Challenge

Page 24: Open Research Data: Taxonomy

The Scientific Challenge

Page 25: Open Research Data: Taxonomy

The Scientific Challenge

[Figure labels: LOD, PDF, HNS, HNS]

Page 26: Open Research Data: Taxonomy

The Scientific Challenge

Page 27: Open Research Data: Taxonomy

The Scientific Challenge

Page 28: Open Research Data: Taxonomy

The Political Vision: Create the LOD-Cloud

Page 29: Open Research Data: Taxonomy

The Plazi Vision: The Giant Global Biodiversity Graph

Plazi’s intended uses of LOD

• Public discovery of data about species, extracted from the publications in which they are described, by people not otherwise connected to other discovery services such as Plazi’s own, the GBIF data repository, …

• Public facility for citation of Plazi’s data by arbitrary internet users.

• Plazi creates a new dataset from literature/publications rather than republishing existing data sets.

Page 30: Open Research Data: Taxonomy

The Plazi Vision: The Giant Global Biodiversity Graph

Legal, Social, Technical, Ontologies, Infrastructure

500 M pages 5*

Page 31: Open Research Data: Taxonomy

What does this mean?

The Linking Open Data cloud diagram

Linked Open Data Cloud

Page 32: Open Research Data: Taxonomy

Text

<tax:treatment>
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name>
    <tax:status>n. sp.</tax:status>
    Fig 1 D - F
  </tax:nomenclature>
  <tax:div type="description">
    <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93, SL 1.30, SI 137, PW 0.73, ML 0.38. Mandible outer margin strongly curving to a sharp apical tooth, the apex parallel to the anterior clypeal margin. (Holotype with material in mandibles, so mandibles and anterior clypeus $ described below from paratypes.) Median clypeus....</treatment>

Semantically enhanced text (TaxonX/TaxPub)

… alternatives: from human-readable to machine-readable text

RDF
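Once a treatment is marked up, a program can read the name and measurements directly. The sketch below parses a shortened, well-formed version of the fragment on this slide with Python's standard library. The namespace URIs are placeholders (the slide does not show the real declarations), and the index formulas CI = 100 × HW / HL and SI = 100 × SL / HW are the conventional ant-morphometric definitions, assumed here because the slides do not define them.

```python
import re
import xml.etree.ElementTree as ET

# Placeholder namespace URIs, assumed for illustration only.
NS = {"tax": "http://example.org/taxonx", "dc": "http://example.org/dc"}

FRAGMENT = """
<tax:treatment xmlns:tax="http://example.org/taxonx"
               xmlns:dc="http://example.org/dc">
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name>
    <tax:status>n. sp.</tax:status>
  </tax:nomenclature>
  <tax:div type="description">
    <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93, SL 1.30, SI 137, PW 0.73, ML 0.38.</tax:p>
  </tax:div>
</tax:treatment>
"""

root = ET.fromstring(FRAGMENT.strip())

genus = root.findtext(".//dc:Genus", namespaces=NS)
species = root.findtext(".//dc:Species", namespaces=NS)
status = root.findtext(".//tax:status", namespaces=NS)
xid = root.find(".//tax:xid", NS)

# Pull "XX 1.23" pairs (two-letter code, then a number) out of the description.
description = root.findtext(".//tax:p", namespaces=NS)
measurements = {
    code: float(value)
    for code, value in re.findall(r"\b([A-Z]{2}) (\d+(?:\.\d+)?)", description)
}

# Recompute the indices from the raw measurements (assumed definitions).
cephalic_index = 100 * measurements["HW"] / measurements["HL"]
scape_index = 100 * measurements["SL"] / measurements["HW"]

print(genus, species, status, xid.get("source"), xid.get("identifier"))
print(f"CI reported {measurements['CI']:.0f}, recomputed {cephalic_index:.1f}")
print(f"SI reported {measurements['SI']:.0f}, recomputed {scape_index:.1f}")
```

Recomputing the indices from the extracted measurements is a simple consistency check that becomes possible only after the text is machine readable.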

Page 33: Open Research Data: Taxonomy

Countries (Region): Australia (Queensland)

Export species materials citations (DwC)

Treatment Content Visualization

10.3897/BDJ.3.e5063

Page 34: Open Research Data: Taxonomy

Treatment Graph for the Malagasy Ants Aphaenogaster

Page 35: Open Research Data: Taxonomy

Treatment Graph for the Malagasy Ants Aphaenogaster

[Graph nodes: Original description; Re-description (several). Edge labels: cites; cites / synonymizes]

Page 36: Open Research Data: Taxonomy

Treatment Graph for the Ant Azteca alfari

https://github.com/plazi/TreatmentOntologies

Page 37: Open Research Data: Taxonomy

Pseudomyrmex ants and Vachellia ant-acacias are a classic example of mutualism in biology.

allenii

melanoceras

ruddiae

chiapensis

collinsii

cookii

cornigera

globulifera

hindsii

janzenii

mayana

sphaerocephala

boopis

flavicornis

hesperius

ita

janzeni

kuenckeli

mixtecus

nigrocinctus

nigropilosus

opaciceps

particeps

peperi

reconditus

satanicus

simulans

spinicola

subtilissimus

veneficus

ferrugineus

gentlei

gracilis

Transbiotic link network: associated species linked through references in taxonomic treatments

Acacia-ant species: Pseudomyrmex gracilis

Treatment: redescription

Associated ant-acacia: Acacia gentlei

Ants Plants

Photo credits: Alex Wild

Treatment

Treatments linked through citations

Treatment opportunities

Page 38: Open Research Data: Taxonomy

Treatment

Linking the data with external references

5*2014

NCBI

Page 39: Open Research Data: Taxonomy

Access to scientific literature: DOI via Zenodo/CERN

Page 40: Open Research Data: Taxonomy

Open Access as Necessity

Before antbase.org, Harvard's Museum of Comparative Zoology could claim to be the only location with a complete set of ant systematics publications from 1758 to the present.

Through antbase.org's digital library, this body of literature is accessible worldwide, and it is actively used (more than 10,000 visits in a single month).

Online catalogue, open access, online library, 2004

Page 41: Open Research Data: Taxonomy

Conversion Workflows: Plazi

Plazi SRS

find, scan, «OCR», markup, store + access

The exceptions in Swiss copyright law that permit data extraction are an advantage for the sciences in Switzerland.

Page 42: Open Research Data: Taxonomy

Conversion Workflows: Plazi: Scientific Names

Page 43: Open Research Data: Taxonomy

Conversion Workflows: Plazi: Tables

«Treatment», scientific species name, distribution record, bibliographic records, external links

ENVO? Names

Cataglyphis tartessica workers
Variable: mean ± SD
Head length: 11.23 ± 0.12
Head width: 11.15 ± 0.12
Scape length: 11.47 ± 0.12
Mesosoma length: 11.94 ± 0.16
Femur length: 12.03 ± 0.14
Cephalic index: 0 93.60 ± 3.940
Scape index: 128.10 ± 7.660

Page 44: Open Research Data: Taxonomy

Conversion Workflows: Plazi: Bibliographic References

Page 45: Open Research Data: Taxonomy

Conversion Workflows: Plazi: Geographic data

Page 46: Open Research Data: Taxonomy

Conversion Workflows: Plazi: Pipelines

Page 47: Open Research Data: Taxonomy

Status quo

• 50,000+ treatments live
• RDF in beta version
• GoldenGate Imagine (text mining tool) in beta version
• Data provider for NCBI, GBIF, EOL, antweb
• Biodiversity Literature Repository functional

Page 48: Open Research Data: Taxonomy

Trait information machine ready

BioDiP, Resolution, Reconciliation

TreatmentBank

NAMES MANAGEMENT

CITATION MANAGEMENT

REFBANK

TREATMENT MANAGEMENT

ATOMIZATION & SEMANTICIZATION OF CONTENT MARKUP / initial trait extraction

Specialist taxonomic databases

Next steps: HES-SO, HEG Geneva

Page 49: Open Research Data: Taxonomy

Next steps: ContentMine

Planned collaboration with ContentMine to extract treatments on a daily basis

http://www.slideshare.net/petermurrayrust/?

BioDiv

Page 50: Open Research Data: Taxonomy

article

treatment

Cites (HTTP URI)

cites (DOI)

Scientific name

https://www.wikidata.org/wiki/Property:P1992

Feed Wikipedia with taxonomic data

Page 51: Open Research Data: Taxonomy

Publications, one of our footprints

Page 52: Open Research Data: Taxonomy

Next steps

• 1 million treatments live
• RDF version 1
• GoldenGate Imagine (text mining tool) version 1
• Data provider for NCBI, GBIF, EOL, antweb
• Biodiversity Literature Repository with 100,000 bibliographic references and digital versions

Page 53: Open Research Data: Taxonomy

Thank you!

Donat Agosti