20110122 vibrant final

The Future of Scientific Publishing Donat Agosti (Plazi, Bern) 21 January 2011 Paris

Upload: agosti

Post on 10-May-2015


DESCRIPTION

Lecture presented at the ViBRANT meeting in Paris, January 20, 2011

TRANSCRIPT

Page 1: 20110122 vibrant final

The Future of Scientific Publishing

Donat Agosti (Plazi, Bern) 21 January 2011

Paris

Page 2: 20110122 vibrant final

I don’t know the future, but I have a dream…

Page 3: 20110122 vibrant final

Immersing in the knowledge

Page 4: 20110122 vibrant final

I want to ask a publication a question, not have the author tell me what I have to read.

Page 5: 20110122 vibrant final

I want to find out

how many and which species are there? how are they related? do they disappear?

how are they distributed?

Page 6: 20110122 vibrant final

I want to find out

how many and which species are there? how are they related? do they disappear?

Other people have different interests

Page 7: 20110122 vibrant final

An example from the Neurocommons text mining pilot:

• PubMed abstracts: >16,000,000
• CNS classified abstracts: 874,727
• text mining recognized: 368,688
• text mining processed: 94,381
• extracted graph of 30,000+ relationships and 5,500 genes and proteins

“protein-protein interaction networks” John Wilbanks, Neurocommons
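The funnel above can be read as a series of retention rates. The following is a minimal sketch that only does the arithmetic on the four counts quoted on the slide; the function name `stage_retention` is made up for illustration, and the PubMed figure is a lower bound (">"), so the shares relative to it are upper bounds.

```python
# Counts as quoted on the slide; the PubMed figure is a lower bound (">").
FUNNEL = [
    ("PubMed abstracts", 16_000_000),
    ("CNS classified abstracts", 874_727),
    ("text mining recognized", 368_688),
    ("text mining processed", 94_381),
]


def stage_retention(funnel):
    """Return (stage, count, share of previous stage, share of first stage)."""
    rows = []
    first = funnel[0][1]
    previous = first
    for stage, count in funnel:
        rows.append((stage, count, count / previous, count / first))
        previous = count
    return rows


if __name__ == "__main__":
    for stage, count, of_previous, of_first in stage_retention(FUNNEL):
        print(f"{stage:<26} {count:>12,} {of_previous:>9.2%} {of_first:>9.2%}")
```

Read this way, well under 1% of the abstracts reach the final processing stage.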

Page 8: 20110122 vibrant final
Page 9: 20110122 vibrant final

In a semantic Web environment (where machines talk to each other and do most of our work), data need to be able to talk to each other:

27,266 papers

4,563 papers

41,985 papers

10,365 papers

128,437 papers

“protein-protein interaction networks” John Wilbanks, Neurocommons

Page 10: 20110122 vibrant final

It will open up scientific literature for data mining

“protein-protein interaction networks” John Wilbanks, Neurocommons

Page 11: 20110122 vibrant final

An example from the taxonomy text mining pilot:

• Every year: >17,000 new species described
• Every year: >100,000 species redescribed
• Total journals: >2,000 with taxonomic content
• Total: 1,900,000 species described
• Total: >20,000,000 treatments
• text mining processed: 0
• extracted graph of 0 species, 0 relationships

Taxon mining project

Page 12: 20110122 vibrant final

1996

Conservation, Phylogeny, Systematics, Curiosity, Aesthetics, Fascination

Page 13: 20110122 vibrant final

2011

Experience, Frustration, Wonder, Excitement, Satisfaction, Determination

Page 14: 20110122 vibrant final

Modeling taxonomic literature: TaxonX

TaxPub NLM DTD

Plazi

Page 15: 20110122 vibrant final

Plazi workflow: GoldenGate mark up as an example

- Get LSID from Hymenoptera Name Server (HNS) for names; ZooBank?
- Add new names
- Get bibliographic metadata from HNS (MODS)
- Get bibliographic GUIDs from bioguid (or EDIT?)
- Get geographic long/lat from geonames.org
- Get GUIDs for: CBOL, NCBI, specimen, images, .....
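The workflow steps above share one pattern: each marked-up mention (a name, a place, a citation) is sent to an external resolver, and whatever cannot be resolved is set aside for manual handling, such as adding a new name. The sketch below illustrates that pattern only. It is not GoldenGate's implementation: the lookup tables are hypothetical stand-ins for the remote services, and every identifier, coordinate, and unresolved name in it is made up.

```python
from typing import Callable, Dict, List, Optional, Tuple

Mention = Tuple[str, str]  # (kind, text), e.g. ("name", "Mystrium leonie")
Resolver = Callable[[str], Optional[str]]

# Hypothetical lookup tables standing in for remote services
# (a name server, a gazetteer). All values are invented placeholders.
NAME_IDS = {"Mystrium leonie": "urn:lsid:example.org:names:0001"}
PLACE_COORDS = {"Example Locality": "-12.3456,45.6789"}

RESOLVERS: Dict[str, Resolver] = {
    "name": NAME_IDS.get,
    "place": PLACE_COORDS.get,
}


def resolve_mentions(
    mentions: List[Mention], resolvers: Dict[str, Resolver]
) -> Tuple[List[Tuple[str, str, str]], List[Mention]]:
    """Split mentions into those with an identifier and those without."""
    resolved = []
    unresolved = []
    for kind, text in mentions:
        resolver = resolvers.get(kind)
        identifier = resolver(text) if resolver is not None else None
        if identifier is None:
            # No resolver for this kind, or the service does not know the
            # text: keep it for manual follow-up (e.g. "add new names").
            unresolved.append((kind, text))
        else:
            resolved.append((kind, text, identifier))
    return resolved, unresolved


if __name__ == "__main__":
    found, missing = resolve_mentions(
        [
            ("name", "Mystrium leonie"),
            ("name", "Examplegenus newspecies"),  # invented, unknown to the table
            ("place", "Example Locality"),
        ],
        RESOLVERS,
    )
    print(found)
    print(missing)
```

Keeping the resolvers in a dictionary means a further source (specimen or image identifiers, for instance) can be added as one more entry without touching the loop.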

Page 16: 20110122 vibrant final

The semantically enhanced treatments, extracted, stored on Plazi.org, and served in a human readable form, are linked to the underlying data: Fisher & Smith, 2008, PLoS ONE.

Page 17: 20110122 vibrant final

Plazi Search and Retrieval Server: Access to data

TAPIR, SPM

[Diagram labels: You (three times); human; machine]

Page 18: 20110122 vibrant final

The conversion comes at a cost, even though GoldenGate and other editors exist

Page 19: 20110122 vibrant final

[Chart: Ann. Soc. Entomol. Belg.; x-axis: HNS ID, publications in chronological order; y-axis: min, 0 to 7]

Time in minutes to produce clean OCR using ABBYY; publications in chronological order

Production metrics to measure effort and compare various approaches and algorithms

Page 20: 20110122 vibrant final

How to mark up large body of legacy publications?

In-house?
Build / use commercial services?
Use the community, e.g. volunteers?

[Diagram: activation energy between Gutenberg and the Semantic Web; y-axis: cost per knowledge]

Page 21: 20110122 vibrant final

Training and demos...

Page 22: 20110122 vibrant final

Avoid it

Page 23: 20110122 vibrant final

Prospective publications: ZooKeys / PhytoKeys

Page 24: 20110122 vibrant final

Semantic enhancements to published texts

Page 25: 20110122 vibrant final

2036

?

Page 26: 20110122 vibrant final

Why do we publish?

Page 27: 20110122 vibrant final

Publicly funded research

Page 28: 20110122 vibrant final

Contribute to the welfare of the nations…

Page 29: 20110122 vibrant final

Dissemination

Page 30: 20110122 vibrant final

Access

Page 31: 20110122 vibrant final

Before antbase.org, Harvard’s Museum of Comparative Zoology could claim to be the only location with a complete set of ant systematics publications from 1758 to the present.

Through antbase.org’s digital library, this body of literature is accessible worldwide, and it is actively used (>10,000 visits in a single month).

Page 32: 20110122 vibrant final

Access to ant taxonomic publications through antbase.org / Smithsonian Institution, currently including the entire body of non-copyrighted publications since 1758 (>4,000 publications or 85,000 pages)

Page 33: 20110122 vibrant final

The Biodiversity Heritage Library is currently digitizing and making accessible >100 million pages, most of them out of copyright, i.e. older than 1925 ... to be finished in 2048...

Page 34: 20110122 vibrant final

What is a publication from publicly funded science?

Page 35: 20110122 vibrant final
Page 36: 20110122 vibrant final

Open Access

Page 37: 20110122 vibrant final

What is a scientific publication?

Print, journal, article, treatment, public funding, pdf, xml

Tool to disseminate scientific knowledge

Page 38: 20110122 vibrant final

Why do we publish the way we publish?

Page 39: 20110122 vibrant final

What kind of publications serve our needs?

Page 40: 20110122 vibrant final

IPBES

Page 41: 20110122 vibrant final

Access

Page 42: 20110122 vibrant final

Beyond the PDF

Page 43: 20110122 vibrant final

Access to what?

Page 44: 20110122 vibrant final

Scratchpad, EOL page, Wikipage, species page

Page 45: 20110122 vibrant final

Treatment

Page 46: 20110122 vibrant final

Treatments come with a lot of overhead

Page 47: 20110122 vibrant final

The structure of a systematics publication

Publication: Title, Author, Abstract, Introduction, Taxon descriptions, Suppl. Materials, Acknowledgments, References

Taxon descriptions: Genus, Diagnosis, Notes, Biology, Distribution, Key to sp., Species descriptions

Species descriptions: Species 1, Species 2, Species 3, Species 4, Species .., Species n

Species treatments (Species 1): Nomenclature, Diagnosis, Distribution, Material Examined, Comments, Description, Graphic art
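Because the parts of a publication and of a species treatment are this regular, they map naturally onto a typed record. The following is a minimal sketch of that idea, assuming one field per section named on the slide; the class and method names are illustrative and are not taken from TaxonX or TaxPub.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Treatment:
    """One species treatment; one field per section named on the slide."""
    nomenclature: str = ""
    diagnosis: str = ""
    distribution: str = ""
    material_examined: str = ""
    comments: str = ""
    description: str = ""
    graphic_art: List[str] = field(default_factory=list)

    def filled_sections(self) -> List[str]:
        """Names of the sections that actually carry content."""
        return [f.name for f in fields(self) if getattr(self, f.name)]


@dataclass
class Publication:
    """The article-level wrapper around the treatments."""
    title: str
    author: str
    abstract: str = ""
    introduction: str = ""
    treatments: List[Treatment] = field(default_factory=list)
    supplementary_materials: List[str] = field(default_factory=list)
    acknowledgments: str = ""
    references: List[str] = field(default_factory=list)


if __name__ == "__main__":
    treatment = Treatment(
        nomenclature="Mystrium leonie n. sp.",
        description="HOLOTYPE WORKER: TL 3.95, HL 1.02, ...",
    )
    print(treatment.filled_sections())
```

A record like this makes it possible to ask for one section across many treatments, for example every distribution statement, without reading the surrounding article.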

Page 48: 20110122 vibrant final

Treatments come with a lot of overhead

Treatments are highly structured

Page 49: 20110122 vibrant final

[Diagram repeated: The structure of a systematics publication]

Page 50: 20110122 vibrant final

Treatments come with a lot of overhead

Treatments are highly structured

Content is defined

Page 51: 20110122 vibrant final

Treatments come with a lot of overhead

Treatments are highly structured

Content is defined

XML can define it

Page 52: 20110122 vibrant final

This can also be applied to entire sections of text, such as the descriptions of a species and its parts.

<tax:treatment>
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name>
    <tax:status>n. sp.</tax:status>
    Fig 1 D - F
  </tax:nomenclature>
  <tax:div type="description">
    <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93, SL 1.30, SI 137, PW 0.73, ML 0.38. Mandible outer margin strongly curving to a sharp apical tooth, the apex parallel to the anterior clypeal margin. (Holotype with material in mandibles, so mandibles and anterior clypeus $ described below from paratypes.) Median clypeus....</tax:p>
  </tax:div>
</tax:treatment>
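Once a treatment is marked up like this, a machine can answer simple questions about it directly. The sketch below uses Python's standard `xml.etree.ElementTree` on a shortened copy of the fragment. The two namespace URIs are placeholders invented so the fragment parses (the slide does not show the declarations), and `summarize_treatment` and `parse_measurements` are illustrative names.

```python
import re
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumptions; the slide omits the declarations).
NS = {"tax": "http://example.org/taxonx", "dc": "http://example.org/dc"}

TREATMENT_XML = """
<tax:treatment xmlns:tax="http://example.org/taxonx"
               xmlns:dc="http://example.org/dc">
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name>
    <tax:status>n. sp.</tax:status>
  </tax:nomenclature>
  <tax:div type="description">
    <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93, SL 1.30, SI 137, PW 0.73, ML 0.38.</tax:p>
  </tax:div>
</tax:treatment>
"""


def parse_measurements(text):
    """Map two-letter measurement codes (TL, HL, ...) to numeric values."""
    pairs = re.findall(r"\b([A-Z]{2})\s+(\d+(?:\.\d+)?)", text)
    return {code: float(value) for code, value in pairs}


def summarize_treatment(xml_text):
    """Pull the name, identifier, status and measurements out of a treatment."""
    root = ET.fromstring(xml_text.strip())
    name = root.find("tax:nomenclature/tax:name", NS)
    genus = name.findtext("tax:xmldata/dc:Genus", namespaces=NS)
    species = name.findtext("tax:xmldata/dc:Species", namespaces=NS)
    description = root.findtext(
        "tax:div[@type='description']/tax:p", default="", namespaces=NS
    )
    return {
        "name": f"{genus} {species}",
        "hns_id": name.find("tax:xid", NS).get("identifier"),
        "status": root.findtext("tax:nomenclature/tax:status", namespaces=NS),
        "sections": [div.get("type") for div in root.findall("tax:div", NS)],
        "measurements": parse_measurements(description),
    }


if __name__ == "__main__":
    print(summarize_treatment(TREATMENT_XML))
```

The measurements still sit in free text inside the paragraph, so they are recovered here with a regular expression; finer-grained mark-up would make that step unnecessary.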

Page 53: 20110122 vibrant final

Treatments come with a lot of overhead

Treatments are highly structured

Content is defined

XML defines them

The question is, how to get them

Page 54: 20110122 vibrant final

Mark-up of legacy publications

Page 55: 20110122 vibrant final

$$$$$$$$$$$$$$$$$

Page 56: 20110122 vibrant final

Prospective semantic mark-up and linking to external sources is the future

Page 57: 20110122 vibrant final

Treatment repository + external resources

Page 58: 20110122 vibrant final

BHL-Modern

Page 59: 20110122 vibrant final

The future is writable.

Page 60: 20110122 vibrant final

Happy Birthday! January 15, 2001

Page 61: 20110122 vibrant final

What is a scientific publication?

Wikipedia entry as a publication?

Page 62: 20110122 vibrant final

Quality control

Page 63: 20110122 vibrant final

What is a scientific publication?

Centrifugal versus centripetal forces, or are we attractive enough?

Page 64: 20110122 vibrant final

Continuity

Page 65: 20110122 vibrant final

$$$$$$$

Page 66: 20110122 vibrant final

http://plazi.org

Thank you very much!

Donat Agosti

[email protected]