How Scientists Read, and Whether Computers Can Help Them
Talk given at the COBRE workshop, August 23-25, 2012, Bozeman, MT: http://www.chemistry.montana.edu/cobre/workshop/Program.html
Anita de Waard, Disruptive Technologies Director
Elsevier Labs
Making Sense of Biological Systems, Bozeman, MT
Outline
• Why do scientists read?
• How do we read? (Discourse comprehension 101)
• What do we need to read:
– Noun phrases
– Triples
– Metadiscourse
– Claims and Evidence
• Can the computer identify these components?
• Some thoughts on explaining our texts to computers
How and why scientists read:
• Why do we read? To learn, i.e.: obtain the knowledge contained within the text and integrate it with what we already know.
• What do we read? Things that are ‘interesting’:
– Pertinent
– Possibly/probably true
– Novel, but in agreement with what I know
• How do we read?
Discourse Comprehension 101
• Letter < syllable < word < clause < sentence < discourse: This is how linguistics is structured. But it is not how we understand text!
• Kintsch and Van Dijk, ‘93: we read a text at three levels:
– surface code: literal text, exact words/syntax
– text base: preserves meaning, but not exact wording
– situation model: ‘microworld’ that the text is about: constructed inferentially through interaction between the text and background knowledge
• We use knowledge about text genre to activate a schema: this allows creation of the text base and situation model
What is this paper about? A. NOUN PHRASES
human breast cancer
noninvasive MCF7-Ras
antisense oligonucleotides
high-grade malignancy
cell viability
retroviral vector
miR-31
cloned
transiently expressed miRNA sponges
• Is it pertinent? -> Possibly…
• Is it true? -> ?
• Is it new, but in agreement with what I know? -> ?
What is this paper about? B. TRIPLES
miR-31 PREVENT acquisition of aggressive traits
miR-31 INHIBIT noninvasive MCF7-Ras cells
miR-31 ENHANCE invasion
cell viability AFFECT inhibitor
miR-31 expression DEPRIVE metastatic cells
• Is it pertinent? -> Possibly…
• Is it true? -> ?
• Is it new, but in agreement with what I know? -> ?
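As a minimal sketch of what "triples" look like as data, the five subject-predicate-object statements from this slide can be held in a small Python structure. The `Triple` type and the `predicates_for` helper are hypothetical names introduced only for illustration; they are not part of any tool mentioned in the talk.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str


# The five triples shown on the slide.
TRIPLES = [
    Triple("miR-31", "PREVENT", "acquisition of aggressive traits"),
    Triple("miR-31", "INHIBIT", "noninvasive MCF7-Ras cells"),
    Triple("miR-31", "ENHANCE", "invasion"),
    Triple("cell viability", "AFFECT", "inhibitor"),
    Triple("miR-31 expression", "DEPRIVE", "metastatic cells"),
]


def predicates_for(subject: str, triples: list) -> list:
    """Return every predicate asserted for an exact subject string."""
    return [t.predicate for t in triples if t.subject == subject]


print(predicates_for("miR-31", TRIPLES))
```

For miR-31 the list contains PREVENT, INHIBIT and ENHANCE side by side, which illustrates the slide's point: without the surrounding discourse, triples alone do not tell the reader what is true or new.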
What is this paper about? C. METADISCOURSE
The preceding observations demonstrated that X expression deprives Y cells of attributes associated with Z. We next asked whether X also prevents the acquisition of A traits by B cells. To do so, we transiently inhibited X in C cells with either D or E. Both approaches inhibited X function by > 4.5-fold (Figure S7A). Suppression of X enhanced invasion by 20-fold and motility by 5-fold, but F was unaffected by either inhibitor (Figure 3A; Figure S7B). The E sponge reduced X function by 2.5-fold, but did not affect the activity of other known Js (Figures S8A and S8B). Collectively, these data indicated that sustained X activity is necessary to prevent the acquisition of Z traits by both K and untransformed B cells.
• Is it pertinent? -> Need content
• Is it true? -> Sounds likely! I know this stuff!
• Is it new, but in agreement with what I know? -> Need content
What is this paper about? D. CLAIMS AND EVIDENCE
Claim:
• Sustained miR-31 activity is necessary to prevent the acquisition of aggressive traits by both tumor cells and untransformed breast epithelial cells.
Evidence: Method:
• We transiently inhibited miR-31 in noninvasive MCF7-Ras cells with either antisense oligonucleotides or miRNA sponges.
Evidence: Result:
• Both approaches inhibited miR-31 function by >4.5-fold (Figure S7A).
• Suppression of miR-31 enhanced invasion by 20-fold and motility by 5-fold, but cell viability was unaffected by either inhibitor (Figure 3A; Figure S7B).
• The miR-31 sponge reduced miR-31 function by 2.5-fold, but did not affect the activity of other known antimetastatic miRNAs (Figures S8A and S8B).
Reading questions:
• Is it pertinent? -> Probably
• Is it true? -> Sounds likely!
• Is it new, but in agreement with what I know? -> Check/know
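The claim-plus-evidence view of this passage can be sketched as a small data structure. This is only an illustration under stated assumptions: the `Claim` and `Evidence` classes and the `figures_cited` method are hypothetical names, and the texts are abbreviated from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    kind: str                      # "Method" or "Result", as labelled on the slide
    text: str
    figures: list = field(default_factory=list)


@dataclass
class Claim:
    text: str
    evidence: list = field(default_factory=list)

    def figures_cited(self) -> list:
        """Figures backing the claim, in order of first mention."""
        seen = []
        for item in self.evidence:
            for fig in item.figures:
                if fig not in seen:
                    seen.append(fig)
        return seen


claim = Claim(
    "Sustained miR-31 activity is necessary to prevent the acquisition of aggressive traits",
    [
        Evidence("Method", "We transiently inhibited miR-31 in noninvasive MCF7-Ras cells"),
        Evidence("Result", "Both approaches inhibited miR-31 function by >4.5-fold",
                 ["Figure S7A"]),
        Evidence("Result", "Suppression of miR-31 enhanced invasion by 20-fold",
                 ["Figure 3A", "Figure S7B"]),
        Evidence("Result", "The miR-31 sponge reduced miR-31 function by 2.5-fold",
                 ["Figures S8A and S8B"]),
    ],
)

print(claim.figures_cited())
```

Keeping the figure pointers attached to each result is a design choice made here so that a reader (or a program) can walk from the claim back to the data that supports it.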
What is this paper about? E. JOURNAL & AUTHOR’S NAMES/AFFILIATIONS
• Is it pertinent? -> Possibly
• Is it true? -> Probably!
• Is it new, but in agreement with what I know? -> Need background
In summary, how scientists read:
• Surface code provides noun phrases and triples that offer pointers re. topical relevance
• Text base and situation model are created through specific metadiscourse conventions (e.g. refs at the end) that create a biological reasoning model:
– Hypothesis: We next asked whether …
– Goal/Method: To do so, we transiently inhibited…
– Result: Suppression of X enhanced invasion …
– Results: but F was unaffected … (Figure 3A).
– Implication: … Collectively, these data indicated that … .
• This can be expressed as a set of claims, linked to evidence, that can help represent key points in the paper
• Journal name and author’s affiliation help define schema and provide ‘willingness to be convinced’ socially/interpersonally.
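The pairing of cue phrases with segment types (Hypothesis, Goal/Method, Result, Implication) suggests a very simple rule-based labeller. The sketch below is an assumption-laden toy: the `CUES` list and the `label` function are hypothetical, the patterns cover only the cue phrases quoted on this slide, and real discourse segmentation would need far more than four regular expressions.

```python
import re

# Cue patterns taken from the example passage; checked in this order.
CUES = [
    (re.compile(r"^We next asked whether", re.IGNORECASE), "Hypothesis"),
    (re.compile(r"^To do so,", re.IGNORECASE), "Goal/Method"),
    (re.compile(r"these data indicated that", re.IGNORECASE), "Implication"),
    (re.compile(r"\(Figures? \w+", re.IGNORECASE), "Result"),
]


def label(segment: str) -> str:
    """Return the first segment type whose cue matches, else 'Unlabelled'."""
    for pattern, name in CUES:
        if pattern.search(segment):
            return name
    return "Unlabelled"


SEGMENTS = [
    "We next asked whether X also prevents the acquisition of A traits by B cells.",
    "To do so, we transiently inhibited X in C cells with either D or E.",
    "Suppression of X enhanced invasion by 20-fold (Figure 3A; Figure S7B).",
    "Collectively, these data indicated that sustained X activity is necessary.",
]

for seg in SEGMENTS:
    print(label(seg), "|", seg)
```

Note that the labels are assigned from the metadiscourse alone: the biological content is still masked as X, A, B and so on.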
Can computers help us identify:
A. Noun phrases
B. Triples
C. Metadiscourse elements
D. Claims + evidence
E. Journal and author’s names and affiliation
Noun Phrases: some issues
• Problem 1: disambiguating terms (© GoPubMed):
– Hnrpa1 = Tis = Fli-2 = nuclear ribonucleoprotein A1 = helix destabilizing protein = single-strand binding protein = hnRNP core protein A1 = HDP-1 = topoisomerase-inhibitor suppressed.
– Cellulose 1,4-beta-cellobiosidase = exoglucanase
– COLD ≠ C.O.L.D. ≠ cold (runny nose) ≠ cold (low T)
• Problem 2: disambiguating entities (© M. Martone):
– 95 antibodies were (manually!) identified in 8 articles
– 52 did not contain enough information to determine the antibody used
– Some provided details in other papers
– Failed to give species, clonality, vendor, or catalog number
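The term problem can be illustrated with a toy synonym table. This is a sketch, not how GoPubMed works: the choice of `Hnrpa1` as the canonical name is arbitrary, and `SYNONYMS`, `LOOKUP` and `normalize` are hypothetical names.

```python
# Synonym sets copied from the slide; the canonical key is an arbitrary choice.
SYNONYMS = {
    "Hnrpa1": [
        "Tis", "Fli-2", "nuclear ribonucleoprotein A1",
        "helix destabilizing protein", "single-strand binding protein",
        "hnRNP core protein A1", "HDP-1", "topoisomerase-inhibitor suppressed",
    ],
    "cellulose 1,4-beta-cellobiosidase": ["exoglucanase"],
}

# Case-insensitive lookup from any known name to its canonical form.
LOOKUP = {}
for canonical, names in SYNONYMS.items():
    for name in [canonical, *names]:
        LOOKUP[name.lower()] = canonical


def normalize(term: str):
    """Return the canonical name for a known synonym, or None."""
    return LOOKUP.get(term.lower())


print(normalize("HDP-1"))
print(normalize("Exoglucanase"))
print(normalize("cold"))
```

The case folding that makes `Exoglucanase` match is exactly what would break on the third example from the slide: once lower-cased, COLD and cold become the same string, so a dictionary alone cannot keep them apart and context is needed.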
Noun Phrases: some progress
• Despite these difficulties, noun phrase recall/precision is quite high, e.g. I2B2 2011 [1], [2], others: 90%-98%
• Many tools, see [3] for a list; e.g. GoPubMed
Triples: some issues:
• Contingent on good NP & VP detection
• Hard to parse text! E.g. a commercial tool gave:
– insulin / maintaining / glucose homeostasis, from: “When insulin secretion cannot be increased adequately (type I diabetes defect) to overcome insulin resistance in maintaining glucose homeostasis, hyperglycemia and glucose intolerance ensues.”
– insulin / may be involved / glucose homeostasis, from: “Because PANDER is expressed by pancreatic beta-cells and in response to glucose in a similar way to those of insulin, PANDER may be involved in glucose homeostasis.”
Triples: some progress: Biological Expression Language [4]:
“We provide evidence that these miRNAs are potential novel oncogenes participating in the development of human testicular germ cell tumors by numbing the p53 pathway, thus allowing tumorigenic growth in the presence of wild-type p53.”
• Increased abundance of miR-372 decreases activity of TP53: r(MIR:miR-372) -| tscript(p(HUGO:Trp53))
• Context: cancer: SET Disease = "Cancer"
• Activity of TP53 decreases cell growth: tscript(p(HUGO:Trp53)) -| bp(GO:"Cell Growth")
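To show why such formal statements are useful, here is a deliberately simplified sketch that splits the two statements from the slide on the `-|` (decreases) symbol and composes their signs. It is not the OpenBEL parser: `RELATION_SIGNS`, `parse` and `net_effect` are hypothetical names, and only the one relation symbol that appears on the slide is handled.

```python
# Only the relation shown on the slide: "-|" means "decreases".
RELATION_SIGNS = {"-|": -1}


def parse(statement: str):
    """Split 'subject -| object' into (subject, sign, object)."""
    for symbol, sign in RELATION_SIGNS.items():
        subject, sep, obj = statement.partition(f" {symbol} ")
        if sep:
            return subject.strip(), sign, obj.strip()
    raise ValueError(f"no known relation in: {statement!r}")


def net_effect(first: str, second: str):
    """Chain two statements whose middle term matches and multiply the signs."""
    s1, sign1, o1 = parse(first)
    s2, sign2, o2 = parse(second)
    if o1 != s2:
        raise ValueError("statements do not chain")
    return s1, sign1 * sign2, o2


A = 'r(MIR:miR-372) -| tscript(p(HUGO:Trp53))'
B = 'tscript(p(HUGO:Trp53)) -| bp(GO:"Cell Growth")'

print(net_effect(A, B))
```

Two decreases compose to a positive sign, which is the machine-readable counterpart of the quoted sentence: more miR-372 means less TP53 activity, and therefore more cell growth.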
Metadiscourse: why it matters
• Voorhoeve et al., 2006: “These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor suppressor LATS2.”
• Kloosterman and Plasterk, 2006: “In a genetic screen, miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006).”
• Yabuta et al., 2007: “[On the other hand,] two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).”
• Okada et al., 2011: “Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 (Voorhoeve et al., 2006).”
“[Y]ou can transform .. fiction into fact just by adding or subtracting references”, Bruno Latour [5]
Metadiscourse: some progress
• Hedging cues, speculative language, modality/negation:
– Light et al [6]: finding speculative language
– Wilbur et al (Hagit) [7]: focus, polarity, certainty, evidence, and directionality
– Thompson et al (Sophia) [8]: level of speculation, type/source of the evidence and level of certainty
• Sentiment detection (e.g. Kim and Hovy [9], among many others):
– Holder of the opinion, strength, polarity as ‘mathematical function’ acting on main propositional content
• Can make this part of the semantic web (e.g., Ontology for Reasoning, Certainty and Attribution, ORCA [10]):
– Value (Presumed True, Probable, Possible, Unknown)
– Source (Author, Named Other, Unknown)
– Basis (Data, Reasoning, Unknown)
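The three ORCA dimensions listed above map naturally onto enumerations. The sketch below is an illustration only: the class names and the `Evaluation` record are hypothetical, and the two annotations are one reader's guess at how the Voorhoeve and Okada sentences quoted earlier might be scored, not annotations from the ORCA paper.

```python
from dataclasses import dataclass
from enum import Enum


class Value(Enum):
    PRESUMED_TRUE = "Presumed True"
    PROBABLE = "Probable"
    POSSIBLE = "Possible"
    UNKNOWN = "Unknown"


class Source(Enum):
    AUTHOR = "Author"
    NAMED_OTHER = "Named Other"
    UNKNOWN = "Unknown"


class Basis(Enum):
    DATA = "Data"
    REASONING = "Reasoning"
    UNKNOWN = "Unknown"


@dataclass(frozen=True)
class Evaluation:
    proposition: str
    value: Value
    source: Source
    basis: Basis


PROPOSITION = "miR-372/373 inhibit expression of LATS2"

# Illustrative (assumed) readings of the 2006 and 2011 sentences.
voorhoeve_2006 = Evaluation(PROPOSITION, Value.POSSIBLE, Source.AUTHOR, Basis.UNKNOWN)
okada_2011 = Evaluation(PROPOSITION, Value.PRESUMED_TRUE, Source.NAMED_OTHER, Basis.UNKNOWN)

print(voorhoeve_2006.value.value, "->", okada_2011.value.value)
```

Holding the proposition constant while the value and source change makes the drift from "possibly" to fact explicit and comparable across citing papers.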
Claims and Evidence: some issues:
• Data2Semantics [11]: linking clinical guidelines to evidence. Inconsistency within guideline and guidelines v. evidence:
– Studies have demonstrated inconsistent results regarding the use of such markers of inflammation as C-reactive protein (CRP), interleukins-6 (IL-6) and -8, and procalcitonin (PCT) in neutropenic patients with cancer [55–57].
– [55]: PCT and IL-6 are more reliable markers than CRP for predicting bacteremia in patients with febrile neutropenia
– [56]: In conclusion, daily measurement of PCT or IL-6 could help identify neutropenic patients with a stable course when the fever lasts >3 d. …, it would reduce adverse events and treatment costs.
– [57]: Our study supports the value of PCT as a reliable tool to predict clinical outcome in febrile neutropenia.
• Drug Interaction Knowledgebase [12]: how to identify evidence?
– R-citalopram_is_not_substrate_of_cyp2c19:
– At 10uM R- or S-CT, ketoconazole reduced reaction velocity to 55-60% of control, quinidine to 80%, and omeprazole to 80-85% of control (Fig. 6).
Claims and Evidence: some progress
• Defining ‘salient knowledge components’ in text:
– Argumentative zones, CoreSC can both be found
– Blake, Claim networks (more soon!)
– Claimed Knowledge Updates (Sandor/de Waard, [13]):
Perhaps we should start writing for computers?
• So why doesn’t the author add this information? If you know you’re going to mine it, why bury it?
• Authoring tools for entity identification: MS for Chemistry, Math, proteins; some experiments but no solution yet [14]
• Authoring tool for triple identification (MS ActiveText)
• But the question remains: After we’ve ‘extracted’ all the ‘facts’, what is all the gunk that remains in the filter?
Aristotle | Quintilian | Scientific Paper
• prooimion | Introduction/exordium | Introduction: positioning
  The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal to ethos in order to establish credibility with the audience.
• prothesis | Statement of Facts/narratio | Introduction: research question
  The speaker here provides a narrative account of what has happened and generally explains the nature of the case.
• - | Summary/propositio | Summary of contents
  The propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation.
• pistis | Proof/confirmatio | Results
  The main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here.
• - | Refutation/refutatio | Related Work
  As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent.
• epilogos | peroratio | Discussion: summary, implications.
  Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up.
Perhaps we should explain: a paper is rhetorical?
- goal of the paper is to be published; it uses author/journal as a host
- format has co-evolved: predator-prey relationship with reviewers
Story Grammar The Story of Goldilocks and the Three Bears
Setting Time Once upon a time
Character a little girl named Goldilocks
Location She went for a walk in the forest. Pretty soon, she came upon a house.
Theme Goal She knocked and, when no one answered,
Attempt she walked right in.
Episode Name At the table in the kitchen, there were three bowls of porridge.
Subgoal Goldilocks was hungry.
Attempt She tasted the porridge from the first bowl.
Outcome This porridge is too hot! she exclaimed.
Attempt So, she tasted the porridge from the second bowl.
Outcome This porridge is too cold, she said
Attempt So, she tasted the last bowl of porridge.
Outcome Ahhh, this porridge is just right, she said happily and
Outcome she ate it all up.
Paper Grammar
The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins
Background The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.
Objects of study
the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract,
Experimental setup
studied and compared in vivo effects and interactions to those of the human protein
Research goal
Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.
Hypothesis Atx-1 may play a role in the regulation of gene expression
Name dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies
Subgoal test the function of the AXH domain
Method overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.
Results Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1[82Q]. Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells
Data (data not shown),
Results both genotypes show many large holes and loss of cell integrity at 28 days
Data (Figures 1B-1D).
Results Overexpression of dAtx-1 using the GMR-GAL4 driver also induces eye abnormalities. The external structures of the eyes that overexpress dAtx-1 show disorganized ommatidia and loss of interommatidial bristles
Data (Figure 1F),
Perhaps we should explain: a paper is a story?
A closer look at verb tense:
• Conceptual realm: ‘state’ (gnomic) present
– ‘Dopaminergic innervation plays a major role in the control of mood and its perturbation’
• Experimental realm: ‘event’ past
– ‘Four out of seven cell lines expressed this cluster’
– ‘Adult rats were individually housed for 2 days before testing.’
• Argumentational realm: ‘instantaneous’ present; to-infinitive
– ‘These results suggest that...’
– ‘To identify these mechanisms…’
• Discourse progression: ‘instantaneous’ present
– ‘Fig 2a shows that’
– ‘see figure 7A’
• Reference to other work: present perfect - ‘finalised’ past
– ‘Previous work has demonstrated that VPCs are sensitive to the levels of let-60/RAS (Han and Sternberg, 1990).’
Facts in the eternal present
Endogenous small RNAs (miRNAs) regulate gene expression by mechanisms conserved across metazoans.
I sing of golden-throned Hera whom Rhea bare. Queen of the immortals is she, surpassing all in beauty: she is the sister and the wife of loud-thundering Zeus, --the glorious one whom all the blessed throughout high Olympus reverence and honor.
Events in the simple past
Vehicle-treated animals spent equivalent time investigating a juvenile in the first and second sessions in experiments conducted in the NAC and the striatum: T1 values were 122 ± 6 s and 114 ± 5 s.
Now the wooers turned to the dance and to gladsome song, and made them merry, and waited till evening should come; and as they made merry dark evening came upon them.
Events with embedded facts
We also generated BJ/ET cells expressing the RASV12-ERTAM chimera gene, which is only active when tamoxifen is added (De Vita et al, 2005).
And she took her mighty spear, tipped with sharp bronze, heavy and huge and strong, wherewith she vanquishes the ranks of men-of warriors, with whom she is wroth, she, the daughter of the mighty sire.
Attribution in the present perfect
miRNAs have emerged as important regulators of development and control processes such as cell fate determination and cell death (Abrahante et al., 2003, Brennecke et al., 2003, Chang et al., 2004, Chen et al., 2004, Johnston and Hobert, 2003, Lee et al., 1993)
In this book I have had old stories written down, as I have heard them told by intelligent people, concerning chiefs who have held dominion in the northern countries, and who spoke the Danish tongue; and also concerning some of their family branches, according to what has been told me.
Implications are hedged, and in the present tense
These results indicate that although miR-372&3 confer complete protection to oncogene-induced senescence in a manner similar to p53 inactivation, the cellular response to DNA damage remains intact
Now it is said that ever since then whenever the camel sees a place where ashes have been scattered, he wants to get revenge with his enemy the rat and stomps and rolls in the ashes hoping to get the rat
Tense use in science and mythology:
Some conclusions:
• How we read: surface code, textbase, situation model
• Useful components: find noun phrases, triples, metadiscourse, claims and evidence
• Computers keep getting better at identifying these
• Authoring tools might let us help computers
• But for the foreseeable future, scientists will continue to need to scan the literature to understand and believe science and make connections between knowledge
• To achieve progress, perhaps focus less on what computers can do and more on how humans communicate?
• Let’s pursue collaborations with linguists, cognitive psychologists etc. on how we read and learn!
Acknowledgements
• Funding:
– Elsevier Labs
– NWO
• Collaborators:
– Henk Pander Maat, UU
– Agnes Sandor, XRCE
– Jodi Schneider, DERI
– Rinke Hoekstra & co, VU
– Richard Boyce & co, UPitt
– Maria Liakata, EBI
– Sophia Ananiadou & co, NaCTeM
• Discussion partners:
– Phil Bourne, UCSD
– Ed Hovy
– Gully Burns, ISI
– Joanne Luciano, RPI
– Tim Clark et al., Harvard
… and all of you :-)!
References
[1] J Am Med Inform Assoc. 2010 September; 17(5): 514–518, http://dx.doi.org/10.1136/jamia.2010.003947
[2] Quanzhi Li, Yi-Fang Brook Wu (2006). Identifying important concepts from medical documents. Journal of Biomedical Informatics 39 (2006) 668–679.
[3] Useful list of resources in bioinformatics: http://www.bioinformatics.ca/
[4] Biological Expression Language: http://www.openbel.org
[5] Latour, B. and Woolgar, S. (1979). Laboratory Life: the Social Construction of Scientific Facts. Sage Publications.
[6] Light M, Qiu XY, Srinivasan P. (2004). The language of bioscience: facts, speculations, and statements in between. BioLINK 2004: Linking Biological Literature, Ontologies and Databases 2004:17-24.
[7] Wilbur WJ, Rzhetsky A, Shatkay H (2006). New directions in biomedical text annotations: definitions, guidelines and corpus construction. BMC Bioinformatics 2006, 7:356.
[8] Thompson P., Venturi G., McNaught J, Montemagni S, Ananiadou S. (2008). Categorising modality in biomedical texts. Proc. LREC 2008 Wkshp Building and Evaluating Resources for Biomedical Text Mining 2008.
[9] Kim, S-M. Hovy, E.H. (2004). Determining the Sentiment of Opinions. Proceedings of the COLING conference, Geneva, 2004.
[10] de Waard, A. and Schneider, J. (2012). Formalising Uncertainty: An Ontology of Reasoning, Certainty and Attribution (ORCA). Semantic Technologies Applied to Biomedical Informatics and Individualized Medicine workshop at ISWC 2012 (submitted).
[11] Data2Semantics project: http://www.data2semantics.org/
[12] Boyce R, Collins C, Horn J, Kalet I. (2009). Computing with evidence Part I: A drug-mechanism evidence taxonomy oriented toward confidence assignment. J Biomed Inform. 2009 Dec;42(6):979-89. Epub 2009 May 10, see also http://dbmi-icode-01.dbmi.pitt.edu/dikb-evidence/front-page.html
[13] Sándor, Àgnes and de Waard, Anita (2012). Identifying Claimed Knowledge Updates in Biomedical Research Articles. Workshop on Detecting Structure in Scholarly Discourse, ACL 2012.
[14] See e.g. http://ucsdbiolit.codeplex.com/ and http://research.microsoft.com/en-us/projects/ontology/ for MS Word ontology add-ins
Logical structure of epistemic evaluations:
For a Proposition P, an epistemically marked clause E is an evaluation of P, written E_{V,B,S}(P), with:
– V = Value: 3 = Assumed true, 2 = Probable, 1 = Possible, 0 = Unknown, (-1 = possibly untrue, -2 = probably untrue, -3 = assumed untrue)
– B = Basis: Reasoning, Data
– S = Source: A = speaker is author A, explicit; IA = speaker is author A, implicit; N = other author N, explicit; NN = other author NN, implicit
Model suggested by Eduard Hovy, Information Sciences Institute, University of Southern California
Adding Epistemic Evaluation (Claim, followed by its ORCA Value)
• Claim: Together, Lats2 and ASPP1 shunt p53 to proapoptotic promoters and promote the death of polyploid cells [1]. (…)
  ORCA value: Value = 3; Source = N; Basis = 0
• Claim: Further biochemical characterization of hMOBs showed that only hMOB1A and hMOB1B interact with both LATS1 and LATS2 in vitro and in vivo [39]. (…)
  ORCA value: Value = 3; Source = N; Basis = Data
• Claim: Our findings reveal that miR-373 would be a potential oncogene and it participates in the carcinogenesis of human esophageal cancer by suppressing LATS2 expression.
  ORCA value: Value = 1 or 2 ?; Source = Author; Basis = Data
• Claim: Furthermore, we demonstrated that the direct inhibition of LATS2 protein was mediated by miR-373 and manipulated the expression of miR-373 to affect esophageal cancer cells growth.
  ORCA value: Value = 2 (or 3?); Source = Author; Basis = Data
Textual Markers
• Modal auxiliary verbs (e.g. can, could, might)
• Qualifying adverbs and adjectives (e.g. interestingly, possibly, likely, potential, somewhat, slightly, powerful, unknown, undefined)
• References, either external (e.g. ‘[Voorhoeve et al., 2006]’) or internal (e.g. ‘See fig. 2a’).
• Reporting/epistemic verbs (e.g. suggest, imply, indicate, show)
– either within the clause: ‘These results suggest that...’
– or in a subordinate clause governed by a reporting-verb matrix clause: ‘{These results suggest that} indeed, this represents the true endogenous activity.’
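The four marker classes above lend themselves to a first-pass pattern matcher. The sketch below uses only the example words given on this slide; `MARKER_PATTERNS` and `markers_in` are hypothetical names, the verb inflections are hand-written assumptions, and a real system would need part-of-speech information rather than bare regular expressions.

```python
import re

# One pattern per marker class, built from the slide's example words.
MARKER_PATTERNS = {
    "modal auxiliary": r"\b(?:can|could|might)\b",
    "qualifier": (r"\b(?:interestingly|possibly|likely|potential|somewhat|"
                  r"slightly|powerful|unknown|undefined)\b"),
    "reference": r"et al\.,? \d{4}|\bfig(?:ure|\.)? ?\d+\w*",
    "reporting verb": (r"\b(?:suggest(?:s|ed)?|impl(?:y|ies|ied)|"
                       r"indicat(?:e|es|ed)|show(?:s|ed|n)?)\b"),
}


def markers_in(sentence: str) -> list:
    """Return the marker classes found in a sentence, in dictionary order."""
    return [name for name, pattern in MARKER_PATTERNS.items()
            if re.search(pattern, sentence, flags=re.IGNORECASE)]


print(markers_in("These results suggest that indeed, this represents "
                 "the true endogenous activity."))
print(markers_in("possibly by inhibiting the tumor suppressor LATS2 "
                 "(Voorhoeve et al., 2006)."))
```

Returning classes rather than matched words keeps the output directly comparable with a marker-by-type count per segment.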
Markers v. Types: 1 paper, 640 segments
Value | Modal Aux | Reporting Verb | Ruled by RV | Adverbs/Adjectives | References | None | Total
Total Value = 3 | 1 (0.5%) | 81 (40%) | 24 (12%) | 7 (4%) | 41 (20%) | 47 (24%) | 201 (100%)
Total Value = 2 | - | 29 (51%) | 23 (40%) | 1 (2%) | 4 (7%) | - | 57 (100%)
Total Value = 1 | 9 (27%) | 11 (33%) | 11 (33%) | 1 (3%) | 1 (3%) | - | 33 (100%)
Total Value = 0 | - | 9 (64%) | 3 (21%) | 1 (7%) | 1 (7%) | - | 14 (100%)
Total No Modality | - | 16 (37%) | 3 (7%) | 0 | 3 (7%) | 22 (50%) | 44 (100%)
Overall Total | 10 (2%) | 146 (23%) | 64 (10%) | 10 (2%) | 50 (8%) | 69 (11%) | 640 (100%)
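Because the table was flattened in extraction, the column of each count in the shorter rows has to be inferred. The sketch below encodes one assignment (blank cells treated as zero, which is an assumption) and checks it against the row totals and the "Overall Total" row printed on the slide. The variable names are hypothetical.

```python
COLUMNS = ["Modal Aux", "Reporting Verb", "Ruled by RV",
           "Adverbs/Adjectives", "References", "None"]

# Counts per row; zeros stand in for cells that are blank on the slide.
ROWS = {
    "Value = 3":   [1, 81, 24, 7, 41, 47],
    "Value = 2":   [0, 29, 23, 1, 4, 0],
    "Value = 1":   [9, 11, 11, 1, 1, 0],
    "Value = 0":   [0, 9, 3, 1, 1, 0],
    "No Modality": [0, 16, 3, 0, 3, 22],
}

# Totals as printed on the slide.
STATED_ROW_TOTALS = {"Value = 3": 201, "Value = 2": 57, "Value = 1": 33,
                     "Value = 0": 14, "No Modality": 44}
STATED_COLUMN_TOTALS = [10, 146, 64, 10, 50, 69]

row_totals = {name: sum(counts) for name, counts in ROWS.items()}
column_totals = [sum(col) for col in zip(*ROWS.values())]

print(row_totals == STATED_ROW_TOTALS)
print(column_totals == STATED_COLUMN_TOTALS)
print(sum(column_totals))
```

Under this assignment both the row and the column totals agree with the slide. The marker counts add up to 349, so the overall percentages on the slide appear to be taken relative to all 640 segments rather than to the segments listed in these rows.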
Most prevalent clause type: “These results suggest that...”
• Adverb/Connective: thus, therefore, together, recently, in summary
• Determiner/Pronoun: it, this, these, we/our
• Adjective: previous, future, better
• Noun phrase: data, report, study, result(s); method or reference
• Modal: form of ‘to be’, may, remain
• Adjective: often, recently, generally
• Verb: show, obtain, consider, view, reveal, suggest, hypothesize, indicate, believe
• Preposition: that, to
Reporting verbs vs. epistemic value:
• Value = 0 (unknown): establish, (remain to be) elucidated, be (clear/useful), (remain to be) examined/determined, describe, make difficult to infer, report
• Value = 1 (hypothetical): be important, consider, expect, hypothesize (5x), give insight, raise possibility that, suspect, think
• Value = 2 (probable): appear, believe, implicate (2x), imply, indicate (12x), play a role, represent, suggest (18x), validate (2x)
• Value = 3 (presumed true): be able/apparent/important/positive/visible, compare (2x), confirm (2x), define, demonstrate (15x), detect (5x), discover, display (3x), eliminate, find (3x), identify (4x), know, need, note (2x), observe (2x), obtain (success/results - 3x), prove to be, refer, report (2x), reveal (3x), see (2x), show (24x), study, view
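The verb lists above can be turned into a lookup from reporting verb to epistemic value. The sketch uses only a subset of the single-word verbs from the slide; `VERBS_BY_VALUE` and `values_for` are hypothetical names, and no lemmatisation is attempted.

```python
from collections import defaultdict

# Subset of the single-word reporting verbs listed per value on the slide.
VERBS_BY_VALUE = {
    0: ["establish", "describe", "report"],
    1: ["consider", "expect", "hypothesize", "suspect", "think"],
    2: ["appear", "believe", "implicate", "imply", "indicate", "suggest", "validate"],
    3: ["confirm", "demonstrate", "detect", "discover", "find", "identify",
        "observe", "report", "reveal", "show"],
}

# Invert the table: a verb may be listed under more than one value.
VALUES_BY_VERB = defaultdict(set)
for value, verbs in VERBS_BY_VALUE.items():
    for verb in verbs:
        VALUES_BY_VERB[verb].add(value)


def values_for(verb: str) -> list:
    """Return the epistemic values a verb was observed with (may be empty)."""
    return sorted(VALUES_BY_VERB.get(verb.lower(), set()))


for verb in ("suggest", "demonstrate", "report"):
    print(verb, values_for(verb))
```

The verb "report" is listed under both value 0 and value 3, so the verb alone does not fix the epistemic value; the lookup returns a list for that reason rather than a single number.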