How to Execute the Research Paper

Anita de Waard, Disruptive Technology Director, Elsevier Labs

http://elsatglabs.com/labs/anita

DESCRIPTION

Talk held at STM Innovations Workshop, London, UK, December 2, 2011. http://www.stm-assoc.org/events/stm-innovations-seminar-2011/

TRANSCRIPT

Page 1: Executing the Research Paper

How to Execute the Research Paper

Anita de Waard, Disruptive Technology Director, Elsevier Labs

http://elsatglabs.com/labs/anita

Page 2: Executing the Research Paper

How to execute a research paper

- Why?

- Three use cases for linked, integrated knowledge

- What?

- Three technologies for enabling this linking and execution

- How?

- Three tools for annotation, storage and access

- What next?

- Force11 and ideas about the future

Page 3: Executing the Research Paper

Three Use Cases

Page 4: Executing the Research Paper

Use case #1: Claim-Evidence Network in Medicine

Page 7: Executing the Research Paper

Use case #1: Claim-Evidence Network in Medicine

Background: Proper implementation of clinical decision support systems (CDS) can:

- Reduce errors in medical care
- Bring research results faster to the front-line clinician
- Significantly improve patient outcome

Requirements: To that end, such systems need to:

- Be able to answer complex questions
- Aggregate data from multiple sources, combining complex patient-specific data with information from external sources
- Be semantically aware
- Be continually updated with the latest validated research results

Components: To develop such semantically aware systems, we need:

- Flexible frameworks supporting the development of such applications
- Seamless integration of relevant content
- Content sources with high-quality content
- Tools enabling the extraction and aggregation of such content

Page 10: Executing the Research Paper

Use case #1: Claim-Evidence Network in Medicine

A. Philips’ Electronic Patient Records

B. Elsevier-published Clinical Guideline

C. Elsevier (or other publisher’s) Research Report or Data

Step 1: Patient data + diagnosis link to Guideline recommendation

Step 2: Guideline recommendation links to evidence in report or data
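The two linking steps on this slide can be sketched as a small graph walk. This is only an illustration under stated assumptions: the `Node` class, the `supported_by` field and all identifiers are hypothetical and are not taken from any Philips or Elsevier system.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One item in a claim-evidence network (hypothetical structure)."""
    node_id: str
    kind: str  # "patient_record", "guideline" or "evidence"
    text: str
    supported_by: list = field(default_factory=list)  # ids of linked nodes


def evidence_trail(nodes, start_id):
    """Follow the first supported_by link of each node and return the ids visited."""
    trail = [start_id]
    current = nodes[start_id]
    while current.supported_by:
        next_id = current.supported_by[0]  # sketch: only the first link is followed
        trail.append(next_id)
        current = nodes[next_id]
    return trail


# Placeholder data: record (A) -> guideline (B) -> research report (C).
nodes = {
    "record-1": Node("record-1", "patient_record", "placeholder diagnosis", ["guideline-7"]),
    "guideline-7": Node("guideline-7", "guideline", "placeholder recommendation", ["report-42"]),
    "report-42": Node("report-42", "evidence", "placeholder research report"),
}

print(evidence_trail(nodes, "record-1"))
```

Step 1 corresponds to the link from the record to the guideline, and Step 2 to the link from the guideline to the report.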

Page 11: Executing the Research Paper

Use case #2: Updating Drug-Drug Interactions

Page 14: Executing the Research Paper

Use case #2: Updating Drug-Drug Interactions

Background:

- Drug-drug interactions (DDIs) are a significant source of preventable adverse effects
- Factors contributing to the occurrence of preventable DDIs include:
  - a lack of knowledge of the patient’s concurrent medications
  - inaccurate or inadequate knowledge of interactions by health care providers

Requirements: We (HCLS SciDiscourse group: Elsevier, DERI, Pittsburgh, EBI) will:

- Manually mark up a diverse collection of content with DDIs
- Develop/train NLP tools to recognize these
- Create a triple store to maintain the relationships between drugs, DDIs and content

Components: To develop this system, we need:

- Scientific discourse ontologies to mark up relevant statements and seed NLP
- Natural language processing to identify relevant DDIs
- Linked Data architecture to enable storage of and access to this information

Page 18: Executing the Research Paper

Use case #2: Updating Drug-Drug Interactions

Step 1: Manually identify DDIs and drug names in wide collection of content sources

Step 2: Develop a model of Drug-Drug Interaction and define candidates

Step 3: Automate this process and store as Linked Data

Images from: Discovering drug–drug interactions: a text-mining and reasoning approach based on properties of drug metabolism, Luis Tari, Saadat Anwar, Shanshan Liang, James Cai and Chitta Baral, Vol. 26 ECCB 2010, pages i547–i553, doi:10.1093/bioinformatics/btq382
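As a rough sketch of what storing drug, DDI and content relationships as triples might look like, the following uses a tiny in-memory store. Everything here is an assumption for illustration: the `TripleStore` class, the predicate names `involves` and `statedIn`, and the placeholder identifiers are hypothetical and do not describe the group's actual schema or any real interaction.

```python
class TripleStore:
    """Minimal in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the given pattern; None is a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


def interactions_for(store, drug):
    """Ids of all interactions that involve the given drug."""
    return [s for s, _, _ in store.match(predicate="involves", obj=drug)]


def sources_for(store, ddi):
    """Ids of the content items in which the interaction is stated."""
    return [o for _, _, o in store.match(subject=ddi, predicate="statedIn")]


store = TripleStore()
store.add("ddi:1", "involves", "drug:A")          # placeholder drug ids
store.add("ddi:1", "involves", "drug:B")
store.add("ddi:1", "statedIn", "doc:example-article")

print(interactions_for(store, "drug:A"))
print(sources_for(store, "ddi:1"))
```

Keeping the link from each interaction back to its source content is what would let an updated article change what is known about a drug pair.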

Page 19: Executing the Research Paper

Use Case #3: Review and share code

Page 22: Executing the Research Paper

Use Case #3: Review and share code

Background:

- Core of computational papers is the software
- If code is not part of the paper, hard to assess quality
- Code reuse can reduce waste of time and (taxpayer’s) money

Requirements:

- Provide a way to create, share and review code
- Integrate this with the research paper
- Enable integration with publisher’s system

Components:

- Integration between workflow and text authoring
- Code authoring tools and standards that allow reuse
- User environment that allows access to disparate results types

Page 25: Executing the Research Paper

Use Case #3: Review and share code

Step 1: Develop Virtual Machine environment for creating code

Step 2: Create authoring/review environment to allow VM evaluation

Step 3: Allow access to integrated environment through SciVerse App store

Pieter Van Gorp, Steffen Mazanek, SHARE: a web portal for creating and sharing executable research papers, Procedia Computer Science 00 (2011) 1–6

Page 26: Executing the Research Paper

Three Technologies

Page 27: Executing the Research Paper

Technology #1: Discourse Annotation - at text level

Page 28: Executing the Research Paper

Technology #1: Discourse Annotation - at text level

Aristotle | Quintilian | Quintilian (description) | Scientific Paper

prooimion | Introduction / exordium | The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal to ethos in order to establish credibility with the audience. | Introduction: positioning

prothesis | Statement of Facts / narratio | The speaker here provides a narrative account of what has happened and generally explains the nature of the case. | Introduction: research question

- | Summary / propositio | The propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation. | Summary of contents

pistis | Proof / confirmatio | The main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here. | Results

- | Refutation / refutatio | As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent. | Related Work

epilogos | peroratio | Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up. | Discussion: summary, implications.

Page 30: Executing the Research Paper

Technology #1: Discourse Annotation - at paragraph level

The Story of Goldilocks and the Three Bears | Story Grammar | Paper | The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins

Once upon a time | Time (Setting) | Background | The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.

a little girl named Goldilocks | Characters (Setting) | Objects of study | the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract,

She went for a walk in the forest. Pretty soon, she came upon a house. | Location (Setting) | Experimental setup | studied and compared in vivo effects and interactions to those of the human protein

She knocked and, when no one answered, | Goal (Theme) | Research goal | Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.

she walked right in. | Attempt (Theme) | Hypothesis | Atx-1 may play a role in the regulation of gene expression

At the table in the kitchen, there were three bowls of porridge. | Name (Episode 1) | Name | dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies

Goldilocks was hungry. | Subgoal (Episode 1) | Subgoal | test the function of the AXH domain

She tasted the porridge from the first bowl. | Attempt (Episode 1) | Method | overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.

This porridge is too hot! she exclaimed. | Outcome (Episode 1) | Results | Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1[82Q]. Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells

So, she tasted the porridge from the second bowl. | Activity (Episode 1) | Data | (data not shown),

This porridge is too cold, she said | Outcome (Episode 1) | Results | both genotypes show many large holes and loss of cell integrity at 28 days

So, she tasted the last bowl of porridge. | Activity (Episode 1) | Data | (Figures 1B-1D).

Ahhh, this porridge is just right, she said happily and | Outcome (Episode 1) | Results | Overexpression of dAtx-1 using the GMR-GAL4 driver also induces eye abnormalities. The external structures of the eyes that overexpress dAtx-1 show disorganized ommatidia and loss of interommatidial bristles

she ate it all up. | (Episode 1) | Data | (Figure 1F),

Page 41: Executing the Research Paper

Technology #1: Discourse Annotation - at clause level

Both seminomas and the EC component of nonseminomas share features with ES cells. To exclude that the detection of miR-371-3 merely reflects its expression pattern in ES cells, we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004). In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8), suggesting that miR-371-3 expression is a selective event during tumorigenesis.

The same paragraph, split into clauses:

- Both seminomas and the EC component of nonseminomas share features with ES cells.
- To exclude that
- the detection of miR-371-3 merely reflects its expression pattern in ES cells,
- we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004).
- In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),
- suggesting that
- miR-371-3 expression is a selective event during tumorigenesis.

Clause labels shown on the slide: Fact, Hypothesis, Method, Result, Implication, Goal, Reg-Implication

Grouping labels shown on the slide: Conceptual knowledge, ExperimentalEvidence
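A clause-level annotation like this one could be held in a simple list of labelled segments. The sketch below is illustrative only: the `Segment` class is hypothetical, and the assignment of each label to a clause is our reading of the slide, since the transcript lists the labels separately from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A clause with one discourse label (hypothetical structure)."""
    text: str
    label: str


# Label assignments are an assumption, not confirmed by the transcript.
segments = [
    Segment("Both seminomas and the EC component of nonseminomas share features with ES cells.", "Fact"),
    Segment("To exclude that", "Goal"),
    Segment("the detection of miR-371-3 merely reflects its expression pattern in ES cells,", "Hypothesis"),
    Segment("we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004).", "Method"),
    Segment("In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),", "Result"),
    Segment("suggesting that", "Reg-Implication"),
    Segment("miR-371-3 expression is a selective event during tumorigenesis.", "Implication"),
]


def by_label(items, label):
    """Return the text of every segment carrying the given label."""
    return [s.text for s in items if s.label == label]


def reassemble(items):
    """Join the segments back into the running paragraph."""
    return " ".join(s.text for s in items)


print(by_label(segments, "Method"))
```

Because the segments keep their order, the original paragraph can be rebuilt with `reassemble(segments)` while each clause stays individually addressable.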

Page 48: Executing the Research Paper

Technology #1: Discourse Annotation - across texts

Voorhoeve et al., Cell, 2006:

- Hypothesis: "To investigate the possibility that miR-372 and miR-373 suppress the expression of LATS2, we..."
- Implication: "Therefore, these results point to LATS2 as a mediator of the miR-372 and miR-373 effects on cell proliferation and tumorigenicity,"

Raver-Shapira et al., JMolCell 2007:

- Cited Implication: "... two miRNAs, miRNA-372 and-373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006)."

Yabuta, JBioChem 2007:

- Fact: "miR-372 and miR-373 target the Lats2 tumor suppressor (Voorhoeve et al., 2006)"
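One way to trace how a claim's status shifts as it is cited is to record each statement with its source, its label and the paper it cites. This is a minimal sketch: the `Statement` class and `label_history` function are hypothetical, the quotes are shortened, and the pairing of labels with statements follows the order in which the slide builds introduce them, which is an inference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Statement:
    """A labelled statement and, optionally, the paper it cites (hypothetical)."""
    source: str
    text: str
    label: str
    cites: Optional[str] = None


ORIGIN = "Voorhoeve et al., 2006"

statements = [
    Statement(ORIGIN, "To investigate the possibility that miR-372 and miR-373 suppress the expression of LATS2, we...", "Hypothesis"),
    Statement(ORIGIN, "Therefore, these results point to LATS2 as a mediator ...", "Implication"),
    Statement("Raver-Shapira et al., 2007", "... which suggests that Lats2 is an important tumor suppressor", "Cited Implication", cites=ORIGIN),
    Statement("Yabuta, 2007", "miR-372 and miR-373 target the Lats2 tumor suppressor", "Fact", cites=ORIGIN),
]


def label_history(items, origin):
    """Labels of statements made in, or citing, the origin paper, in list order."""
    return [s.label for s in items if s.source == origin or s.cites == origin]


print(label_history(statements, ORIGIN))
```

Read in order, the labels show the same claim moving from hypothesis in the original paper to fact in a later citing paper.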

Page 51: Executing the Research Paper

Technology #1: Towards automated Discourse Annotation: CoreSC

- Classified with Support Vector Machines (SVM)

- Sequence labelling by Conditional Random Fields (CRF)

- F-score between 18% (motivation) and 76% (experimental methods)

- ‘We plan to use CoreSC annotated papers in biology to guide information extraction and retrieval, characterise extracted events and relations and facilitate inference from hypotheses to conclusions in scientific papers.’

Automatic recognition of conceptualisation zones in scientific articles to aid biological information extraction, Maria Liakata, Shyamasree Saha, Simon Dobnik, Colin Batchelor and Dietrich Rebholz-Schuhmann

Bioinformatics 2011 (Accepted)
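For readers unfamiliar with the measure, the per-category F-score quoted on this slide is the harmonic mean of precision and recall. The sketch below computes it on toy labels; the data and the label names are made up for illustration and are not CoreSC results.

```python
def f_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_category_f(gold, predicted):
    """F-score for every label that occurs in the gold or predicted sequence."""
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        scores[label] = f_score(tp, fp, fn)
    return scores


# Toy example: four sentences with made-up gold and predicted labels.
gold = ["Method", "Method", "Result", "Motivation"]
predicted = ["Method", "Result", "Result", "Method"]

scores = per_category_f(gold, predicted)
print(scores)
```

A category that is never predicted correctly, such as "Motivation" in this toy data, scores zero, which helps explain how scores can differ widely between categories.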

Page 52: Executing the Research Paper

Technology #2: Linked Data

Page 54: Executing the Research Paper

Technology #2: Linked Data

1. Use URIs to name things

2. Use HTTP URIs so they can be looked up

3. Return useful data when things are looked up

4. Include links to other things in the returned data

“Linked data is just a term for how to publish data on the web while working with the web. And the web is the best architecture we know for publishing information in a hugely diverse and distributed environment, in a gradual and sustainable way.”

Tennison J, 2010. Why Linked Data for data.gov.uk? http://www.jenitennison.com/blog/node/140
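The four principles can be illustrated with a small sketch, assuming an in-memory stand-in for the web so that nothing is fetched over the network. Every URI below is a hypothetical example.org address, and `look_up` and `crawl` are illustrative names, not part of any Linked Data library.

```python
from collections import deque

# Principles 1 and 2: every thing is named by an HTTP URI (all hypothetical).
# Principles 3 and 4: looking a URI up returns data, including links to other URIs.
WEB = {
    "http://example.org/paper/1": {
        "label": "Paper 1",
        "links": {
            "cites": ["http://example.org/paper/2"],
            "mentions": ["http://example.org/gene/LATS2"],
        },
    },
    "http://example.org/paper/2": {
        "label": "Paper 2",
        "links": {"mentions": ["http://example.org/gene/LATS2"]},
    },
    "http://example.org/gene/LATS2": {"label": "LATS2", "links": {}},
}


def look_up(uri):
    """Stand-in for an HTTP GET: return the data published at this URI."""
    if not uri.startswith("http://"):
        raise ValueError("names must be HTTP URIs so they can be looked up")
    return WEB[uri]


def crawl(start):
    """Follow links breadth-first and return every URI reachable from start."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        uri = queue.popleft()
        order.append(uri)
        for targets in look_up(uri)["links"].values():
            for target in targets:
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
    return order


found = crawl("http://example.org/paper/1")
print([look_up(uri)["label"] for uri in found])
```

Because each lookup returns further URIs, a client that starts from one paper can discover related papers and entities without knowing about them in advance.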

Page 56: Executing the Research Paper

Technology # 3: Workflow integration

A. de Waard, The Future of the Journal? Integrating research data with scientific discourse

http://precedings.nature.com/documents/4742/version/1

1. Research: Each item in the system has metadata (including provenance) and relations to other data items added to it.

2. Workflow: All data items created in the lab are added to a (lab-owned) workflow system.

3. Authoring: A paper is written in an authoring tool which can pull data with provenance from the workflow tool in the appropriate representation into the document.

Rats were subjected to two grueling tests (click on fig 2 to see underlying data). These results suggest that the neurological pain pro-

4. Editing and review: Once the co-authors agree, the paper is ‘exposed’ to the editors, who in turn expose it to reviewers. Reports are stored in the authoring/editing system, the paper gets updated, until it is validated.

5. Publishing and distribution: When a paper is published, a collection of validated information is exposed to the world. It remains connected to its related data item, and its heritage can be traced.

6. User applications: distributed applications run on this ‘exposed data’ universe.

[Diagram labels: metadata; Review, Edit, Revise; Some other publisher]
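A minimal sketch of the idea behind these steps, assuming a very simple in-memory model: each data item carries metadata and a list of the items it was derived from, so the heritage of a published paper can be traced back to the lab data. The class names, field names and item ids are hypothetical and do not describe any actual workflow system.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    item_id: str
    kind: str
    metadata: dict                      # provenance and other metadata (step 1)
    derived_from: list = field(default_factory=list)  # relations to other items


class WorkflowSystem:
    """Toy lab-owned store to which every data item is added (step 2)."""

    def __init__(self):
        self.items = {}

    def add(self, item):
        for parent in item.derived_from:
            if parent not in self.items:
                raise KeyError(f"unknown source item: {parent}")
        self.items[item.item_id] = item
        return item

    def heritage(self, item_id):
        """Return the ids of all items this one derives from, nearest first."""
        trail = []
        pending = list(self.items[item_id].derived_from)
        while pending:
            current = pending.pop(0)
            if current not in trail:
                trail.append(current)
                pending.extend(self.items[current].derived_from)
        return trail


lab = WorkflowSystem()
lab.add(DataItem("raw-1", "measurement", {"creator": "lab member"}))
lab.add(DataItem("fig-2", "figure", {"creator": "lab member"}, ["raw-1"]))
# Steps 3 to 5: the paper pulls in the figure and stays connected to it
lab.add(DataItem("paper-1", "paper", {"status": "published"}, ["fig-2"]))

trail = lab.heritage("paper-1")
print(trail)
```

An application running on the exposed data (step 6) would only need the item ids and the `derived_from` relations to walk from a claim in the paper back to the underlying measurement.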

Page 62: Executing the Research Paper

[Figure, (C) Dave De Roure: Workflow 16, Workflow 13, Results, Logs, Metadata, Paper, Slides, Common pathways and QTL, connected by the relations "feeds into", "produces", "included in" and "published in".]

Technology # 3: Workflow integration

Page 65: Executing the Research Paper

Three Tools

Page 66: Executing the Research Paper

Tool # 1: DOMEO annotation tool

http://purl.org/swan/af e.g. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2224208/?tool=pubmed

- Allows manual annotation, automated annotation, or both

- Now linked to NCBO text mining tool, expanding to all UIMA

- Standoff annotations in Annotation Ontology = RDF format, can be exported

Paolo Ciccarese, Marco Ocana, Tim Clark, DOMEO: a web-based tool for semantic annotation of online documents, Bioontologies, 2011
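To make "standoff annotation" concrete, here is a minimal sketch in which annotations live outside the text and point into it by character offsets, so the document itself is never modified. The `Annotation` fields and the labels are assumptions chosen for illustration; this is not the Annotation Ontology schema or DOMEO's data model.

```python
from dataclasses import dataclass

# Sentence quoted earlier in the talk; it is never altered by annotating it
TEXT = "miR-372 and miR-373 target the Lats2 tumor suppressor"


@dataclass(frozen=True)
class Annotation:
    start: int    # offset of the first annotated character
    end: int      # offset one past the last annotated character
    label: str    # hypothetical semantic type
    source: str   # "manual" or "automated"


def annotate(text, phrase, label, source):
    """Create a standoff annotation for the first occurrence of phrase."""
    start = text.index(phrase)
    return Annotation(start, start + len(phrase), label, source)


def resolve(text, annotation):
    """Recover the annotated span from the untouched text."""
    return text[annotation.start:annotation.end]


annotations = [
    annotate(TEXT, "miR-372", "miRNA", "manual"),
    annotate(TEXT, "Lats2", "gene", "automated"),
]
for a in annotations:
    print(a.start, a.end, a.label, a.source, resolve(TEXT, a))
```

Keeping the annotations separate is what allows manual and automated annotations on the same passage to coexist and be exported independently of the article.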

Page 69: Executing the Research Paper

SWAN’s PAV (Provenance, Authoring and Versioning) ontology

Dublin Core and SKOS

Tool # 2: Linked Data Repository
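As a rough illustration of how these vocabularies might be combined, the sketch below describes one annotation with Dublin Core, PAV and SKOS properties and prints it as N-Triples-style lines. The subject URIs (example.org) and the literal values are invented, the namespace URIs are given to the best of my knowledge, and the serializer is a simplified hand-written helper rather than an RDF library.

```python
DCTERMS = "http://purl.org/dc/terms/"
PAV = "http://purl.org/pav/"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical resources, used only for this example
ANNOTATION = "http://example.org/annotation/1"
PERSON = "http://example.org/person/1"
CONCEPT = "http://example.org/concept/tumor-suppressor"


def uri(value):
    """Format a URI as an N-Triples term."""
    return f"<{value}>"


def literal(value):
    """Format a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


triples = [
    (ANNOTATION, DCTERMS + "title", literal("Example annotation")),
    (ANNOTATION, DCTERMS + "subject", uri(CONCEPT)),
    (ANNOTATION, PAV + "createdBy", uri(PERSON)),
    (ANNOTATION, PAV + "version", literal("1")),
    (CONCEPT, SKOS + "prefLabel", literal("tumor suppressor")),
]


def to_ntriples(rows):
    """Serialize (subject, predicate, object-term) rows, one triple per line."""
    return "\n".join(f"{uri(s)} {uri(p)} {o} ." for s, p, o in rows)


output = to_ntriples(triples)
print(output)
```

In this sketch Dublin Core carries the descriptive metadata, PAV the authoring and versioning information, and SKOS the label of the concept the annotation is about.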

Page 72: Executing the Research Paper

Tool # 3: ScienceDirect app store

- Eclipse SDK platform accessing all ScienceDirect/Scopus content

- Build applications on top of content

- Offer to users in marketplace

Page 74: Executing the Research Paper

What next?

Page 75: Executing the Research Paper

Force11 http://force11.org

Force11 (the Future of Research Communication and e-Scholarship, 2011) is a community of scholars, librarians, archivists, publishers and research funders that has arisen organically to help facilitate the change toward improved knowledge creation and sharing.

- Individually and collectively, we aim to bring about a change in scholarly communication through the effective use of information technologies

- Next step: work on these issues.

- We need more publishers on board!

Page 78: Executing the Research Paper

Some thoughts about the future:

- Let’s think in terms of use cases, not technologies:

- Identify where knowledge exists, within and outside of the article

- Identify what the information needs are, and which components need to be connected

- Only if our content plays well with others does it get to stay in the game!

- Work with scientists, grant agencies, libraries, software developers big and small and.... each other!

- For instance, let’s collectively look at enabling:

- Standoff annotation formats

- Research data and workflow standards/integration

- Claim-evidence networks and discourse annotation:

Page 82: Executing the Research Paper
Page 83: Executing the Research Paper

- Which discourse annotation schemes are most portable? Can they be applied to both full papers and abstracts? Can they be applied to texts in different domains and different genres (research papers, reviews, patents, etc)?

- How can we compare annotations, and how can we decide which features, approaches or techniques work best? What are the most topical use cases? How can we evaluate performance and what are the most appropriate tasks?

- What corpora are currently available for comparing and contrasting discourse annotation, and how can we improve and increase these?

- How applicable are these efforts for improving methods of publishing, detecting and correcting author's errors at the discourse level, or summarizing scholarly text? How close are we to implementing them at a production scale?
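On the question of how annotations can be compared, one common starting point (offered here only as an illustration, not as the approach proposed in the talk) is a chance-corrected agreement measure such as Cohen's kappa between two annotators. The labels and data below are invented.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotators labelled at random
    # according to their own label frequencies
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        count_a[label] * count_b[label]
        for label in set(labels_a) | set(labels_b)
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


# Toy discourse labels for six clauses, invented for illustration
annotator_1 = ["Fact", "Hypothesis", "Fact", "Implication", "Fact", "Hypothesis"]
annotator_2 = ["Fact", "Hypothesis", "Implication", "Implication", "Fact", "Fact"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

The same function can compare a human annotator with an automated tagger, although comparing different annotation schemes would first require a mapping between their label sets.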

Page 84: Executing the Research Paper

Thank you!

More information:

- Data2Semantics: http://www.data2semantics.org

- W3C group on Discourse Structure: http://www.w3.org/wiki/HCLSIG/SWANSIOC

- Executable Paper Challenge: http://www.executablepapers.com

- Parsing rhetoric: http://elsatglabs.com/labs/anita/

- Sapienta: http://www.sapientaproject.com/

- SciVerse: http://developer.sciverse.com

- Force11: http://force11.org

- DSSD2012: http://www.nactem.ac.uk/dssd/

Or contact me: Anita de Waard, [email protected]

- Tim Clark, Paolo Ciccarese, Harvard, Cambridge, USA

- Eduard Hovy, Gully Burns, Cartic Ramakrishnan, ISI/USC, Los Angeles, USA

- Phil Bourne, Maryann Martone, UCSD, USA

- Sophia Ananiadou, NaCTeM, Manchester, UK

- Dave De Roure, Oxford eScience Center, UK

- Maria Liakata, EBI, Cambridge, UK

- Paul Groth, Frank van Harmelen, Vrije Universiteit, Amsterdam, Netherlands

- Henk Pander Maat, Ted Sanders, Universiteit Utrecht, Netherlands

- The Force11 members