Broadening the Scope of Nanopublications

Tobias Kuhn (1,2), Paolo Emilio Barbano (3), Mate Levente Nagy (4), Michael Krauthammer (4,1)

(1) Department of Pathology, Yale University
(2) Chair of Sociology, in particular of Modeling and Simulation, ETH Zurich
(3) Department of Mathematics, Yale University
(4) Program for Computational Biology and Bioinformatics, Yale University

ESWC 2013, Montpellier (France), 29 May 2013


DESCRIPTION

(CC Attribution License does not apply to included third-party material on slide 3; see the paper for the references: http://www.tkuhn.ch/pub/kuhn2013eswc.pdf)

TRANSCRIPT

Page 1: Broadening the Scope of Nanopublications

Broadening the Scope of Nanopublications

Tobias Kuhn (1,2), Paolo Emilio Barbano (3), Mate Levente Nagy (4), Michael Krauthammer (4,1)

(1) Department of Pathology, Yale University
(2) Chair of Sociology, in particular of Modeling and Simulation, ETH Zurich
(3) Department of Mathematics, Yale University
(4) Program for Computational Biology and Bioinformatics, Yale University

ESWC 2013, Montpellier (France), 29 May 2013

Page 2: Broadening the Scope of Nanopublications

Motivation

The key problem of the current system of scholarly communication is that it is centered around narrative articles:

• They are good for individual consumption by human beings but very bad for aggregation or automatic processing

• They are very ineffective for sharing scientific information, especially in data-intensive sciences

• There are no rewards for providing, sharing, and maintaining datasets, software, and other digital artifacts

The current system is slow and inefficient.

Nanopublications have been proposed to solve these problems: they are minimal portions of scientific contributions in RDF.

Tobias Kuhn, Yale / ETH Zurich Broadening the Scope of Nanopublications 2 / 24

Page 3: Broadening the Scope of Nanopublications

Vision: Changing Scholarly Communication

Now: Narrative articles at the center

Future: Nanopublications at the center

Images from Mons et al. The value of data. Nature Genetics, 43(4):281–283, 2011

Page 4: Broadening the Scope of Nanopublications

Structure of Nanopublications

Nanopub0001

Assertion:
[Diagram: ns1:mosquito and ns2:malaria connected by ns3:transmission]

Provenance:
opm:wasDerivedFrom d:DataSourceX
cito:cites n:nanopub0042
dc:created “2013-01-01”
pav:createdBy p:Isabelle_Dubois
dc:isPartOf c:NanoPubCollection1

Assertion:

• Formalized scientific claim (or hypothesis)

Provenance:

• Reference to article, experimental methods, etc.

• Who published it when and how
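
The two-part structure above can be sketched as a small data structure. This is only an illustration in Python, not the RDF serialization used by actual nanopublications: the class name, the identifier n:nanopub0001, the choice of the nanopublication itself as the subject of the provenance statements, and the direction of the assertion triple are all assumptions, since the slide shows only the prefixed names.

```python
from dataclasses import dataclass, field

# One RDF-style statement: (subject, predicate, object), written with the
# prefixed names shown on the slide.
Triple = tuple[str, str, str]


@dataclass
class Nanopublication:
    """Minimal container with the two parts named on the slide."""
    identifier: str
    assertion: list[Triple] = field(default_factory=list)
    provenance: list[Triple] = field(default_factory=list)


# Assumption: the nanopublication is addressed as n:nanopub0001 and is the
# subject of its own provenance statements; the slide only shows the
# predicate/object pairs.
NP = "n:nanopub0001"

example = Nanopublication(
    identifier=NP,
    # Assumption: mosquito is the subject and malaria the object.
    assertion=[("ns1:mosquito", "ns3:transmission", "ns2:malaria")],
    provenance=[
        (NP, "opm:wasDerivedFrom", "d:DataSourceX"),
        (NP, "cito:cites", "n:nanopub0042"),
        (NP, "dc:created", "2013-01-01"),
        (NP, "pav:createdBy", "p:Isabelle_Dubois"),
        (NP, "dc:isPartOf", "c:NanoPubCollection1"),
    ],
)

# Print every statement, assertion first, then provenance.
for subject, predicate, obj in example.assertion + example.provenance:
    print(subject, predicate, obj)
```

Keeping assertion and provenance in separate lists mirrors the slide: the claim can be compared across nanopublications while who published it, when, and from which source stays attached to each individual one.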


Page 5: Broadening the Scope of Nanopublications

nanobrowser: Classical Nanopublication

http://nanobrowser.inn.ac


Page 6: Broadening the Scope of Nanopublications

Ocean of Nanopublications

[Figure: an ocean of nanopublications with channels numbered 1 to 6. Labels: authors, users, curators, bots, structured data sources, unstructured data sources, nanopublication portals, in silico experiments, network analyses, aggregated views, hypothesis generation, ...]

Page 7: Broadening the Scope of Nanopublications

Proposed Extension 1: Informal Assertions

Nanopub0012

Assertion:
Malaria is transmitted by mosquitoes.

Provenance:
opm:wasDerivedFrom d:DataSourceX
cito:cites n:nanopub0042
dc:created “2013-01-01”
pav:createdBy p:Isabelle_Dubois
dc:isPartOf c:NanoPubCollection1

Assertion:

• Informal English sentence

• Sentences are independent entities and represented by URIs: http://purl.org/aida/Malaria+is+transmitted+by+mosquitoes.
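
The sentence-to-URI mapping shown above can be sketched in a few lines of Python. The prefix http://purl.org/aida/ and the use of "+" for spaces are taken from the slide; treating the rest of the encoding as standard form-style percent-encoding (`urllib.parse.quote_plus`) is an assumption, and the function names are hypothetical.

```python
from urllib.parse import quote_plus, unquote_plus

# Prefix taken from the example URI on the slide.
AIDA_PREFIX = "http://purl.org/aida/"


def sentence_to_uri(sentence: str) -> str:
    """Encode a sentence as a URI (assumed encoding: spaces become '+')."""
    return AIDA_PREFIX + quote_plus(sentence)


def uri_to_sentence(uri: str) -> str:
    """Recover the sentence from a URI built by sentence_to_uri."""
    if not uri.startswith(AIDA_PREFIX):
        raise ValueError(f"not an AIDA sentence URI: {uri!r}")
    return unquote_plus(uri[len(AIDA_PREFIX):])


uri = sentence_to_uri("Malaria is transmitted by mosquitoes.")
print(uri)
print(uri_to_sentence(uri))
```

Because the URI is derived from the sentence text alone, two authors who write the same sentence arrive at the same identifier, which is what lets a sentence exist independently of whoever states it.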


Page 8: Broadening the Scope of Nanopublications

Levels of Formalization

• Informal (only an AIDA sentence):
  Malaria is transmitted by mosquitoes.

• Underspecified (formal representation for part of the sentence):
  Malaria is transmitted by mosquitoes.
  [Diagram: ns3:transmission, ns1:mosquito]

• Fully formal (formal representation for the complete sentence):
  Malaria is transmitted by mosquitoes.
  [Diagram: ns1:mosquito, ns3:transmission, ns2:malaria]

Page 9: Broadening the Scope of Nanopublications

Proposed Extension 2: Non-Scientific Assertions

Nanopub0042

Assertion:
p:Giuseppe npx:disagrees a:Malaria+is+transmitted+by+...

Provenance:
dc:created “2013-05-01”
pav:createdBy p:Giuseppe

Assertion:

• Meta-statement or other non-scientific assertion

• Can include opinions, social relations, introduction of new entities, ...

Page 10: Broadening the Scope of Nanopublications

nanobrowser: Nanopublication with Sentence


Page 11: Broadening the Scope of Nanopublications

Approach

Related existing approaches: SWAN, EXPO, GeneRIF, BEL

Our approach differs from these approaches in the following respects:

• Very broad application area (science as a whole and beyond)

• Sentences exist independently from authors (no ownership of sentences)

• Use of a controlled natural language

• Continuum from informal over underspecified to fully formal statements

Page 12: Broadening the Scope of Nanopublications

AIDA Sentences

To fit into the nanopublication concept, the sentences of our approach should be AIDA:

• Atomic: a sentence describing one thought that cannot be further broken down in a practical way

• Independent: a sentence that can stand on its own, without external references like “this effect” or “we”

• Declarative: a complete sentence ending with a full stop that could in theory be either true or false

• Absolute: a sentence describing the core of a claim ignoring the uncertainty about its truth and how it was discovered (no “probably” or “evaluation showed”); typically in present tense

Example

The majority of patients with idiopathic REM sleep behavior disorder who develop a neurodegenerative disease develop Parkinson disease and Lewy body dementia.

Page 13: Broadening the Scope of Nanopublications

Linking Scientific Claims

[Diagram: the AIDA sentence “Malaria is transmitted by mosquitoes.” shown together with the terms ns1:mosquito, ns3:transmission, and ns2:malaria]

Possible relations:

• [CLAIM] is equivalent to / contradicts / is similar to [CLAIM]

• [PERSON] agrees with / disagrees with / challenges [CLAIM]

• [STUDY] provides (counter-)evidence for [CLAIM]

These relations can be published as nanopublications too!
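
As a rough illustration of what published relations make possible, the Python sketch below groups [PERSON]-[CLAIM] statements by claim to obtain a simple aggregated view. The relation labels are the plain-English ones from the list above, not actual RDF predicates; the function name and the sample statements (including who agrees) are hypothetical.

```python
from collections import defaultdict

# Relation labels copied from the slide's [PERSON] ... [CLAIM] pattern.
RELATIONS = {"agrees with", "disagrees with", "challenges"}


def opinions_by_claim(statements):
    """Group (person, relation, claim) statements by the claim they target."""
    grouped = defaultdict(lambda: defaultdict(list))
    for person, relation, claim in statements:
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation!r}")
        grouped[claim][relation].append(person)
    # Convert to plain dicts so the result prints and compares cleanly.
    return {claim: dict(by_relation) for claim, by_relation in grouped.items()}


CLAIM = "http://purl.org/aida/Malaria+is+transmitted+by+mosquitoes."

# Hypothetical statements; each would be the assertion of its own nanopublication.
statements = [
    ("p:Giuseppe", "disagrees with", CLAIM),
    ("p:Isabelle_Dubois", "agrees with", CLAIM),
]

view = opinions_by_claim(statements)
print(view[CLAIM])
```

Since every opinion is its own nanopublication with its own provenance, a view like this can be recomputed at any time from whatever statements are currently available.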


Page 14: Broadening the Scope of Nanopublications

nanobrowser: Opinions and Sentence Relations


Page 15: Broadening the Scope of Nanopublications

Publishing AIDA-Nanopubs

http://nanobrowser.inn.ac/publish


Page 16: Broadening the Scope of Nanopublications

Evaluation

[Figure: the ocean of nanopublications diagram from page 6, with its channels numbered 1 to 6]

Page 17: Broadening the Scope of Nanopublications

Evaluation of AIDA-Nanopublications

So far, the following aspects have been evaluated:

• How well can authors or curators express scientific findings as AIDA sentences? (channels 1 and 3)

• How well can AIDA sentences be automatically extracted from existing text sources? (channel 4)

• How well can similar AIDA sentences be automatically clustered? (channel 6)

Page 18: Broadening the Scope of Nanopublications

User Study Design

Questionnaire-style online study:

• 16 participants: Scientists with strong background in biology and/or medicine

• Five short texts (one or two sentences) from conclusion sections of structured abstracts of PubMed articles

• Task: Rewrite each short text as one or more AIDA sentences

• Brief explanation of the task and the AIDA concept

Example

Original text: The results of this study showed that the hepatic reticuloendothelial function is impaired in cirrhotic patients, but the degree of impairment does not differ between patients with and without previous history of SBP.

AIDA 1: The hepatic reticuloendothelial function is impaired in cirrhotic patients.

AIDA 2: The degree of hepatic reticuloendothelial function impairment does not differ between cirrhotic patients with and without previous history of SBP.

Page 19: Broadening the Scope of Nanopublications

User Study Results

Quality of the sentences created within the user study:

Category              Sentences   Share
Total                       163    100%
Perfect                     114     70%
Typo etc.                     7      4%
Inaccurate                   10      6%
Not AIDA                     32     20%
  Not atomic                  4      2%
  Not independent             5      3%
  Not declarative             3      2%
  Not absolute               25     15%

Page 20: Broadening the Scope of Nanopublications

Automatic Extraction

Evaluation of automatic extraction of AIDA-nanopubs:

• We used the GeneRIF dataset, which contains sentences describing the functions of genes and proteins

• Simple regular expressions to filter out sentences that are not AIDA-compliant

• Simple transformations on the sentences, such as dropping certain initial phrases

Example

Original GeneRIF sentence: We have established that LadC plays an important role in L. pneumophila infection.

Extracted AIDA sentence: LadC plays an important role in L. pneumophila infection.
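
The filter-and-transform idea can be sketched as follows. The slides do not list the regular expressions or transformations that were actually used, so both patterns below are hypothetical stand-ins chosen to reproduce the LadC example and the AIDA criteria (no “probably”, no “we”); the function name is also an assumption.

```python
import re

# Hypothetical initial phrases to drop; the actual list is not given in the slides.
LEADING_PHRASES = re.compile(
    r"^(?:we have established that"
    r"|we (?:show|found|demonstrate) that"
    r"|(?:these |our )?(?:results|data) (?:show|suggest|indicate) that)\s+",
    re.IGNORECASE,
)

# Hypothetical markers of non-AIDA sentences: hedges and external references.
NON_AIDA = re.compile(
    r"\b(?:probably|possibly|may|might|this study|these results|we|our)\b",
    re.IGNORECASE,
)


def extract_aida(sentence: str):
    """Return a cleaned candidate AIDA sentence, or None if it is filtered out."""
    candidate = LEADING_PHRASES.sub("", sentence.strip())
    # Declarative: must be a non-empty sentence ending with a full stop.
    if not candidate.endswith("."):
        return None
    # Independent / absolute: reject hedges and references like "we".
    if NON_AIDA.search(candidate):
        return None
    # Restore the initial capital after dropping a leading phrase.
    return candidate[0].upper() + candidate[1:]


original = ("We have established that LadC plays an important role "
            "in L. pneumophila infection.")
print(extract_aida(original))
print(extract_aida("LadC probably plays a role in infection."))
```

The transformation runs before the filter so that a sentence such as the LadC example is rescued by dropping its opening phrase instead of being rejected for containing “we”.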


Page 21: Broadening the Scope of Nanopublications

Automatic Extraction Results

Quality of the sentences extracted from the GeneRIF dataset:

Category              Sentences   Share
Total                       250    100%
Perfect                     177     71%
Typo etc.                     8      3%
Not AIDA                     65     26%
  Not atomic                 34     14%
  Not independent             1      0%
  Not declarative            15      6%
  Not absolute               30     12%

Page 22: Broadening the Scope of Nanopublications

Automatic Clustering

Evaluation of automatic clustering:

• Sentences extracted from GeneRIF: 119 088 unique sentences

• Sentences from user study: 94 unique sentences from five tasks

• Results: Sentences from a user study task were clustered almost exclusively (99.2%) with other sentences from the same task

Example

Hepatic reticuloendothelial function is impaired to the same degree in cirrhotic patients with or without a previous history of SBP.

History of spontaneous bacterial peritonitis does not affect impairment of hepatic reticuloendothelial function in cirrhotic patients.
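
The slides do not say which clustering method was used. As a stand-in, the sketch below uses Jaccard similarity over word sets and a nearest-neighbour check: for each user-study sentence it asks whether the most similar other sentence comes from the same task. The function names and task labels are hypothetical, and this simplified measure is not the evaluation that produced the 99.2% figure.

```python
import re


def tokens(sentence: str) -> set:
    """Lower-cased set of word tokens."""
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two sentences' word sets."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def same_task_rate(labelled) -> float:
    """Fraction of task sentences whose nearest neighbour shares their task.

    labelled: list of (sentence, task) pairs with at least two entries;
    task is None for background sentences (e.g. from GeneRIF), which can
    be neighbours but are not scored themselves.
    """
    scored = hits = 0
    for i, (sentence, task) in enumerate(labelled):
        if task is None:
            continue
        neighbours = [
            (jaccard(sentence, other), other_task)
            for j, (other, other_task) in enumerate(labelled)
            if j != i
        ]
        _, nearest_task = max(neighbours, key=lambda pair: pair[0])
        scored += 1
        hits += nearest_task == task
    return hits / scored if scored else 0.0


sbp_1 = ("Hepatic reticuloendothelial function is impaired to the same degree "
         "in cirrhotic patients with or without a previous history of SBP.")
sbp_2 = ("History of spontaneous bacterial peritonitis does not affect impairment "
         "of hepatic reticuloendothelial function in cirrhotic patients.")
ladc = "LadC plays an important role in L. pneumophila infection."

data = [(sbp_1, "task-SBP"), (sbp_2, "task-SBP"), (ladc, None)]
print(jaccard(sbp_1, sbp_2), jaccard(sbp_1, ladc))
print(same_task_rate(data))
```

Even this crude word-overlap measure places the two differently worded SBP sentences much closer to each other than to the unrelated GeneRIF sentence.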


Page 23: Broadening the Scope of Nanopublications

Conclusions

As AIDA sentences:


Page 24: Broadening the Scope of Nanopublications

Thank you for your Attention!

Questions?
