How Scientists Read, and Whether Computers Can Help Them

Anita de Waard, Disruptive Technologies Director, Elsevier Labs. Making Sense of Biological Systems, Bozeman, MT

Upload: anita-de-waard

Post on 10-May-2015


DESCRIPTION

Talk given at the COBRE workshop August 23-25 2012, Bozeman, MT http://www.chemistry.montana.edu/cobre/workshop/Program.html

TRANSCRIPT

How Scientists Read, and Whether Computers Can Help Them

Anita de Waard
Disruptive Technologies Director
Elsevier Labs

Making Sense of Biological Systems, Bozeman, MT

Outline

• Why do scientists read?
• How do we read? (Discourse comprehension 101)
• What do we need to read:
  – Noun phrases
  – Triples
  – Metadiscourse
  – Claims and Evidence
• Can the computer identify these components?
• Some thoughts on explaining our texts to computers

How and why scientists read:

• Why do we read? To learn, i.e.: obtain the knowledge contained within the text and integrate it with what we already know.
• What do we read? Things that are ‘interesting’:
  – Pertinent
  – Possibly/probably true
  – Novel, but in agreement with what I know
• How do we read?

Discourse Comprehension 101

• Letter < syllable < word < clause < sentence < discourse: This is how linguistics is structured. But it is not how we understand text!


• Kintsch and Van Dijk, ‘93: we read a text at three levels:
  – surface code: literal text, exact words/syntax
  – text base: preserves meaning, but not exact wording
  – situation model: ‘microworld’ that the text is about, constructed inferentially through interaction between the text and background knowledge
• We use knowledge about text genre to activate a schema: this allows creation of the text base and situation model

Discourse Comprehension 101

Examples of schemas:

What is this paper about? A. NOUN PHRASES

human breast cancer
noninvasive MCF7-Ras
antisense oligonucleotides
high-grade malignancy
cell viability
retroviral vector
miR-31
cloned
transiently expressed miRNA sponges

Is it pertinent? -> Possibly…
Is it true? -> ?
Is it new, but in agreement with what I know? -> ?

What is this paper about? B. TRIPLES

miR-31 PREVENT acquisition of aggressive traits
miR-31 INHIBIT noninvasive MCF7-Ras cells
miR-31 ENHANCE invasion
cell viability AFFECT inhibitor
miR-31 expression DEPRIVE metastatic cells

Is it pertinent? -> Possibly…
Is it true? -> ?
Is it new, but in agreement with what I know? -> ?
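The triples on this slide fit in a very small data structure. The sketch below is illustrative only: the `Triple` type and the `about` helper are hypothetical names, not the output format of any tool mentioned in the talk, and the triples themselves are copied from the slide.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A subject-predicate-object statement lifted from the surface text."""
    subject: str
    predicate: str
    obj: str


# Triples copied from the slide; the container is a hypothetical illustration.
TRIPLES = [
    Triple("miR-31", "PREVENT", "acquisition of aggressive traits"),
    Triple("miR-31", "INHIBIT", "noninvasive MCF7-Ras cells"),
    Triple("miR-31", "ENHANCE", "invasion"),
    Triple("cell viability", "AFFECT", "inhibitor"),
    Triple("miR-31 expression", "DEPRIVE", "metastatic cells"),
]


def about(term, triples=TRIPLES):
    """Return every triple whose subject or object mentions the term."""
    needle = term.lower()
    return [t for t in triples
            if needle in t.subject.lower() or needle in t.obj.lower()]
```

A bare triple such as miR-31 ENHANCE invasion drops the context of the source sentence ("Suppression of miR-31 enhanced invasion"), which is one reason the slide leaves "Is it true?" unanswered.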

What is this paper about? C. METADISCOURSE

The preceding observations demonstrated that X expression deprives Y cells of attributes associated with Z. We next asked whether X also prevents the acquisition of A traits by B cells. To do so, we transiently inhibited X in C cells with either D or E. Both approaches inhibited X function by > 4.5-fold (Figure S7A). Suppression of X enhanced invasion by 20-fold and motility by 5-fold, but F was unaffected by either inhibitor (Figure 3A; Figure S7B). The E sponge reduced X function by 2.5-fold, but did not affect the activity of other known Js (Figures S8A and S8B). Collectively, these data indicated that sustained X activity is necessary to prevent the acquisition of Z traits by both K and untransformed B cells.

Is it pertinent? -> Need content
Is it true? -> Sounds likely! I know this stuff!
Is it new, but in agreement with what I know? -> Need content

What is this paper about? D. CLAIMS AND EVIDENCE

Claim:
• Sustained miR-31 activity is necessary to prevent the acquisition of aggressive traits by both tumor cells and untransformed breast epithelial cells.

Evidence: Method:
• We transiently inhibited miR-31 in noninvasive MCF7-Ras cells with either antisense oligonucleotides or miRNA sponges.

Evidence: Result:
• Both approaches inhibited miR-31 function by >4.5-fold (Figure S7A).
• Suppression of miR-31 enhanced invasion by 20-fold and motility by 5-fold, but cell viability was unaffected by either inhibitor (Figure 3A; Figure S7B).
• The miR-31 sponge reduced miR-31 function by 2.5-fold, but did not affect the activity of other known antimetastatic miRNAs (Figures S8A and S8B).

Is it pertinent? -> Probably
Is it true? -> Sounds likely!
Is it new, but in agreement with what I know? -> Check/know

What is this paper about? E. JOURNAL & AUTHOR’S NAMES/AFFILIATIONS

Is it pertinent? -> Possibly
Is it true? -> Probably!
Is it new, but in agreement with what I know? -> Need background

In summary, how scientists read:

• Surface code provides noun phrases and triples that offer pointers re. topical relevance
• Text base and situation model are created through specific metadiscourse conventions (e.g. refs at the end) that create a biological reasoning model:
  – Hypothesis: We next asked whether …
  – Goal/Method: To do so, we transiently inhibited…
  – Result: Suppression of X enhanced invasion …
  – Results: but F was unaffected … (Figure 3A). …
  – Implication: Collectively, these data indicated that … .
• This can be expressed as a set of claims, linked to evidence, that can help represent key points in the paper
• Journal name and author’s affiliation help define schema and provide ‘willingness to be convinced’ socially/interpersonally.

Can computers help us identify:

A. Noun phrases
B. Triples
C. Metadiscourse elements
D. Claims + evidence
E. Journal and author’s names and affiliation


Noun Phrases: some issues

• Problem 1: disambiguating terms (© GoPubMed):
  – Hnrpa1 = Tis = Fli-2 = nuclear ribonucleoprotein A1 = helix destabilizing protein = single-strand binding protein = hnRNP core protein A1 = HDP-1 = topoisomerase-inhibitor suppressed.
  – Cellulose 1,4-beta-cellobiosidase = exoglucanase
  – COLD ≠ C.O.L.D. ≠ cold (runny nose) ≠ cold (low T)
• Problem 2: disambiguating entities (© M. Martone):
  – 95 antibodies were (manually!) identified in 8 articles
  – 52 did not contain enough information to determine the antibody used
  – Some provided details in other papers
  – Failed to give species, clonality, vendor, or catalog number
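Problem 1 has two sides: many names for one thing, and one spelling for several things. A toy lookup shows both. This is a minimal sketch, not GoPubMed's method: the `resolve` function and both tables are hypothetical, the entries are copied from the slide, and treating "Hnrpa1" as the canonical name is an assumption made for illustration.

```python
# Synonym sets copied from the slide; which name counts as canonical is an assumption.
SYNONYMS = {
    "Hnrpa1": [
        "Tis", "Fli-2", "nuclear ribonucleoprotein A1",
        "helix destabilizing protein", "single-strand binding protein",
        "hnRNP core protein A1", "HDP-1", "topoisomerase-inhibitor suppressed",
    ],
    "Cellulose 1,4-beta-cellobiosidase": ["exoglucanase"],
}

# One spelling, several senses (from the slide). Matching is case-sensitive on
# purpose: "COLD" and "cold" must not be collapsed into one entry.
HOMOGRAPHS = {
    "cold": ["cold (runny nose)", "cold (low T)"],
}


def resolve(term):
    """Return the candidate meanings of a term; an empty list means unknown."""
    term = term.strip()
    if term in HOMOGRAPHS:
        return list(HOMOGRAPHS[term])
    for canonical, aliases in SYNONYMS.items():
        if term == canonical or term in aliases:
            return [canonical]
    return []
```

A result with more than one candidate is exactly the case where surrounding context, not the noun phrase itself, has to decide.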

Noun Phrases: some progress

• Despite these difficulties, noun phrase recall/precision is quite high, e.g. I2B2 2011 [1], [2], others: 90%-98%
• Many tools, see [3] for a list; e.g. GoPubMed

Triples: some issues:

• Contingent on good NP & VP detection
• Hard to parse text! E.g. a commercial tool gave:
  – insulin | maintaining | glucose homeostasis
    Source sentence: When insulin secretion cannot be increased adequately (type I diabetes defect) to overcome insulin resistance in maintaining glucose homeostasis, hyperglycemia and glucose intolerance ensues.
  – insulin | may be involved | glucose homeostasis
    Source sentence: Because PANDER is expressed by pancreatic beta-cells and in response to glucose in a similar way to those of insulin, PANDER may be involved in glucose homeostasis.

Triples: some progress: Biological Expression Language [4]:

We provide evidence that these miRNAs are potential novel oncogenes participating in the development of human testicular germ cell tumors by numbing the p53 pathway, thus allowing tumorigenic growth in the presence of wild-type p53.

Increased abundance of miR-372 decreases activity of TP53:
r(MIR:miR-372) -| tscript(p(HUGO:Trp53))

Context: cancer:
SET Disease = "Cancer"

Activity of TP53 decreases cell growth:
tscript(p(HUGO:Trp53)) -| bp(GO:"Cell Growth")

Metadiscourse: why it matters

• Voorhoeve et al., 2006: “These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor suppressor LATS2.”

• Kloosterman and Plasterk, 2006: “In a genetic screen, miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006).”

• Yabuta et al., 2007: “[On the other hand,] two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).”

• Okada et al., 2011: “Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 (Voorhoeve et al., 2006).”

“[Y]ou can transform .. fiction into fact just by adding or subtracting references”, Bruno Latour [5]

Metadiscourse: some progress

• Hedging cues, speculative language, modality/negation:
  – Light et al [6]: finding speculative language
  – Wilbur et al (Hagit) [7]: focus, polarity, certainty, evidence, and directionality
  – Thompson et al (Sophia) [8]: level of speculation, type/source of the evidence and level of certainty
• Sentiment detection (e.g. Kim and Hovy [9] a.m.o.):
  – Holder of the opinion, strength, polarity as ‘mathematical function’ acting on main propositional content
• Can make this part of the semantic web (e.g., Ontology for Reasoning, Certainty and Attribution, ORCA [10]):
  – Value (Presumed True, Probable, Possible, Unknown)
  – Source (Author, Named Other, Unknown)
  – Basis (Data, Reasoning, Unknown)

Claims and Evidence: some issues:

• Data2Semantics [11]: linking clinical guidelines to evidence. Inconsistency within guideline and guidelines v. evidence:
  – Studies have demonstrated inconsistent results regarding the use of such markers of inflammation as C-reactive protein (CRP), interleukins-6 (IL-6) and -8, and procalcitonin (PCT) in neutropenic patients with cancer [55–57].
  – [55]: PCT and IL-6 are more reliable markers than CRP for predicting bacteremia in patients with febrile neutropenia
  – [56]: In conclusion, daily measurement of PCT or IL-6 could help identify neutropenic patients with a stable course when the fever lasts >3 d. …, it would reduce adverse events and treatment costs.
  – [57]: Our study supports the value of PCT as a reliable tool to predict clinical outcome in febrile neutropenia.
• Drug Interaction Knowledgebase [12]: how to identify evidence?
  – R-citalopram_is_not_substrate_of_cyp2c19:
  – At 10uM R- or S-CT, ketoconazole reduced reaction velocity to 55-60% of control, quinidine to 80%, and omeprazole to 80-85% of control (Fig. 6).

Claims and Evidence: some progress

• Defining ‘salient knowledge components’ in text:
  – Argumentative zones, CoreSC can both be found
  – Blake, Claim networks (more soon!)
  – Claimed Knowledge Updates (Sandor/de Waard, [13]):

 

Perhaps we should start writing for computers?

• So why doesn’t the author add this information? If you know you’re going to mine it, why bury it?
• Authoring tools for entity identification: MS for Chemistry, Math, proteins; some experiments but no solution yet [14]
• Authoring tool for triple identification (MS ActiveText)
• But the question remains: After we’ve ‘extracted’ all the ‘facts’, what is all the gunk that remains in the filter?

 

Perhaps we should explain: a paper is rhetorical?

Aristotle | Quintilian | Scientific Paper
prooimion | Introduction / exordium: The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal to ethos in order to establish credibility with the audience. | Introduction: positioning
prothesis | Statement of Facts / narratio: The speaker here provides a narrative account of what has happened and generally explains the nature of the case. | Introduction: research question
 | Summary / propositio: The propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation. | Summary of contents
pistis | Proof / confirmatio: The main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here. | Results
 | Refutation / refutatio: As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent. | Related Work
epilogos | peroratio: Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up. | Discussion: summary, implications.

- goal of the paper is to be published; it uses author/journal as a host
- format has co-evolved: predator-prey relationship with reviewers

Story Grammar: The Story of Goldilocks and the Three Bears

Setting
  Time: Once upon a time
  Character: a little girl named Goldilocks
  Location: She went for a walk in the forest. Pretty soon, she came upon a house.
Theme
  Goal: She knocked and, when no one answered,
  Attempt: she walked right in.
Episode
  Name: At the table in the kitchen, there were three bowls of porridge.
  Subgoal: Goldilocks was hungry.
  Attempt: She tasted the porridge from the first bowl.
  Outcome: This porridge is too hot! she exclaimed.
  Attempt: So, she tasted the porridge from the second bowl.
  Outcome: This porridge is too cold, she said
  Attempt: So, she tasted the last bowl of porridge.
  Outcome: Ahhh, this porridge is just right, she said happily and
  Outcome: she ate it all up.

Paper Grammar: The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins

Background: The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.
Objects of study: the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract,
Experimental setup: studied and compared in vivo effects and interactions to those of the human protein
Research goal: Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.
Hypothesis: Atx-1 may play a role in the regulation of gene expression
Name: dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies
Subgoal: test the function of the AXH domain
Method: overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.
Results: Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1[82Q]. Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells
Data: (data not shown),
Results: both genotypes show many large holes and loss of cell integrity at 28 days
Data: (Figures 1B-1D).
Results: Overexpression of dAtx-1 using the GMR-GAL4 driver also induces eye abnormalities. The external structures of the eyes that overexpress dAtx-1 show disorganized ommatidia and loss of interommatidial bristles
Data: (Figure 1F),

Perhaps  we  should  explain:  a  paper  is  a  story?  

A closer look at verb tense:

Conceptual realm: ‘state’ (gnomic) present
• ‘Dopaminergic innervation plays a major role in the control of mood and its perturbation’

Experimental realm: ‘event’ past
• ‘Four out of seven cell lines expressed this cluster’,
• ‘Adult rats were individually housed for 2 days before testing.’

Argumentational realm: ‘instantaneous’ present; to-infinitive
• ‘These results suggest that...’,
• ‘To identify these mechanisms…’

Discourse progression: ‘instantaneous’ present
• ‘Fig 2a shows that’
• ‘see figure 7A’,

Reference to other work: present perfect - ‘finalised’ past
• ‘Previous work has demonstrated that VPCs are sensitive to the levels of let-60/RAS (Han and Sternberg, 1990).’
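The tense-to-realm mapping above can be approximated with surface cues. The sketch below is a deliberately crude illustration, not a method from the talk: the `realm` function, its regular expressions, and the order in which the cues are tried are all assumptions, and a real system would need a tagger or parser (for example, "need" ends in "-ed" and would be misread as a past-tense verb).

```python
import re

# Hypothetical surface cues, tried in this order.
PERFECT = re.compile(r"\b(?:has|have)\s+(?:\w+\s+)?\w+(?:ed|en|wn)\b", re.IGNORECASE)
FIGURE = re.compile(r"\bfig(?:ure)?s?\b", re.IGNORECASE)
ARGUMENT = re.compile(r"\b(?:suggest|indicate|imply)s?\s+that\b|^\s*to\s+\w+",
                      re.IGNORECASE)
PAST = re.compile(r"\b(?:was|were|\w+ed)\b", re.IGNORECASE)


def realm(clause):
    """Guess the realm of a clause from tense-related surface cues."""
    if PERFECT.search(clause):
        return "reference to other work"   # present perfect
    if FIGURE.search(clause):
        return "discourse progression"     # pointer to a figure
    if ARGUMENT.search(clause):
        return "argumentational"           # reporting verb + that, or to-infinitive
    if PAST.search(clause):
        return "experimental"              # simple past
    return "conceptual"                    # default: gnomic present
```

On the example clauses from the slide this ordering happens to reproduce the intended labels; it is not expected to generalize without proper linguistic analysis.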

Tense use in science and mythology:

Facts in the eternal present
• Endogenous small RNAs (miRNAs) regulate gene expression by mechanisms conserved across metazoans.
• I sing of golden-throned Hera whom Rhea bare. Queen of the immortals is she, surpassing all in beauty: she is the sister and the wife of loud-thundering Zeus, the glorious one whom all the blessed throughout high Olympus reverence and honor.

Events in the simple past
• Vehicle-treated animals spent equivalent time investigating a juvenile in the first and second sessions in experiments conducted in the NAC and the striatum: T1 values were 122 ± 6 s and 114 ± 5 s.
• Now the wooers turned to the dance and to gladsome song, and made them merry, and waited till evening should come; and as they made merry dark evening came upon them.

Events with embedded facts
• We also generated BJ/ET cells expressing the RASV12-ERTAM chimera gene, which is only active when tamoxifen is added (De Vita et al, 2005).
• And she took her mighty spear, tipped with sharp bronze, heavy and huge and strong, wherewith she vanquishes the ranks of men-of warriors, with whom she is wroth, she, the daughter of the mighty sire.

Attribution in the present perfect
• miRNAs have emerged as important regulators of development and control processes such as cell fate determination and cell death (Abrahante et al., 2003, Brennecke et al., 2003, Chang et al., 2004, Chen et al., 2004, Johnston and Hobert, 2003, Lee et al., 1993)
• In this book I have had old stories written down, as I have heard them told by intelligent people, concerning chiefs who have held dominion in the northern countries, and who spoke the Danish tongue; and also concerning some of their family branches, according to what has been told me.

Implications are hedged, and in the present tense
• These results indicate that although miR-372&3 confer complete protection to oncogene-induced senescence in a manner similar to p53 inactivation, the cellular response to DNA damage remains intact
• Now it is said that ever since then whenever the camel sees a place where ashes have been scattered, he wants to get revenge with his enemy the rat and stomps and rolls in the ashes hoping to get the rat

Some conclusions:

• How we read: surface code, textbase, situation model
• Useful components: find noun phrases, triples, metadiscourse, claims and evidence
• Computers keep getting better at identifying these
• Authoring tools might let us help computers
• But for the foreseeable future, scientists will continue to need to scan the literature to understand and believe science and make connections between knowledge
• To achieve progress, perhaps focus less on what computers can do and more on how humans communicate?
• Let’s pursue collaborations with linguists, cognitive psychologists etc. on how we read and learn!

Acknowledgements

• Funding:
  – Elsevier Labs
  – NWO
• Collaborators:
  – Henk Pander Maat, UU
  – Agnes Sandor, XRCE
  – Jodi Schneider, DERI
  – Rinke Hoekstra & co, VU
  – Richard Boyce & co, UPitt
  – Maria Liakata, EBI
  – Sophia Ananiadou & co, NaCTeM
• Discussion partners:
  – Phil Bourne, UCSD
  – Ed Hovy
  – Gully Burns, ISI
  – Joanne Luciano, RPI
  – Tim Clark et al., Harvard

… and all of you!

Questions?

Anita de Waard

[email protected] http://elsatglabs.com/labs/anita/

References

[1] J Am Med Inform Assoc. 2010 September; 17(5): 514–518, http://dx.doi.org/10.1136/jamia.2010.003947
[2] Quanzhi Li, Yi-Fang Brook Wu (2006): Identifying important concepts from medical documents, Journal of Biomedical Informatics 39 (2006) 668–679
[3] Useful list of resources in bioinformatics: http://www.bioinformatics.ca/
[4] Biological Expression Language: http://www.openbel.org
[5] Latour, B. and Woolgar, S., Laboratory Life: the Social Construction of Scientific Facts, 1979, Sage Publications
[6] Light M, Qiu XY, Srinivasan P. (2004). The language of bioscience: facts, speculations, and statements in between. BioLINK 2004: Linking Biological Literature, Ontologies and Databases 2004:17-24.
[7] Wilbur WJ, Rzhetsky A, Shatkay H (2006). New directions in biomedical text annotations: definitions, guidelines and corpus construction. BMC Bioinformatics 2006, 7:356.
[8] Thompson P., Venturi G., McNaught J, Montemagni S, Ananiadou S. (2008). Categorising modality in biomedical texts. Proc. LREC 2008 Wkshp Building and Evaluating Resources for Biomedical Text Mining 2008.
[9] Kim, S-M. Hovy, E.H. (2004). Determining the Sentiment of Opinions. Proceedings of the COLING conference, Geneva, 2004.
[10] de Waard, A. and Schneider, J. (2012) Formalising Uncertainty: An Ontology of Reasoning, Certainty and Attribution (ORCA), Semantic Technologies Applied to Biomedical Informatics and Individualized Medicine workshop at ISWC 2012 (submitted)
[11] Data2Semantics project: http://www.data2semantics.org/
[12] Boyce R, Collins C, Horn J, Kalet I. (2009) Computing with evidence Part I: A drug-mechanism evidence taxonomy oriented toward confidence assignment. J Biomed Inform. 2009 Dec;42(6):979-89. Epub 2009 May 10, see also http://dbmi-icode-01.dbmi.pitt.edu/dikb-evidence/front-page.html
[13] Sándor, Àgnes and de Waard, Anita, (2012). Identifying Claimed Knowledge Updates in Biomedical Research Articles, Workshop on Detecting Structure in Scholarly Discourse, ACL 2012.
[14] See e.g. http://ucsdbiolit.codeplex.com/ and http://research.microsoft.com/en-us/projects/ontology/ for MS Word ontology add-ins

Appendix: ORCA

Logical structure of epistemic evaluations:

For a Proposition P, an epistemically marked clause E is an evaluation of P, where $E_{V,B,S}(P)$, with:

– V = Value: 3 = Assumed true, 2 = Probable, 1 = Possible, 0 = Unknown, (-1 = possibly untrue, -2 = probably untrue, -3 = assumed untrue)
– B = Basis: Reasoning, Data
– S = Source: A = speaker is author A, explicit; IA = speaker is author A, implicit; N = other author N, explicit; NN = other author NN, implicit

Model suggested by Eduard Hovy, Information Sciences Institute, University of Southern California
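The evaluation structure above maps naturally onto a small record type. The following is a minimal sketch under stated assumptions, not the ORCA ontology itself: the class and member names are hypothetical, the numeric values and source codes are taken from this appendix, and the `UNKNOWN` basis is taken from the earlier ORCA slide (Basis: Data, Reasoning, Unknown).

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Value(IntEnum):
    """Epistemic value, using the numeric scale from the appendix."""
    ASSUMED_UNTRUE = -3
    PROBABLY_UNTRUE = -2
    POSSIBLY_UNTRUE = -1
    UNKNOWN = 0
    POSSIBLE = 1
    PROBABLE = 2
    ASSUMED_TRUE = 3


class Basis(Enum):
    REASONING = "Reasoning"
    DATA = "Data"
    UNKNOWN = "Unknown"  # listed on the earlier ORCA slide, not in the appendix


class Source(Enum):
    AUTHOR_EXPLICIT = "A"
    AUTHOR_IMPLICIT = "IA"
    OTHER_EXPLICIT = "N"
    OTHER_IMPLICIT = "NN"


@dataclass(frozen=True)
class Evaluation:
    """An epistemic evaluation E of a proposition P (hypothetical container)."""
    proposition: str
    value: Value
    basis: Basis
    source: Source

    def __str__(self):
        return (f"E[V={int(self.value)}, B={self.basis.value}, "
                f"S={self.source.value}]({self.proposition})")
```

Because `Value` is an integer enumeration, evaluations can be compared or sorted by strength of commitment, for example to check whether a citing paper states a claim more firmly than the paper it cites.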

Adding Epistemic Evaluation

Claim | ORCA Value
Together, Lats2 and ASPP1 shunt p53 to proapoptotic promoters and promote the death of polyploid cells [1]. (…) | Value = 3, Source = N, Basis = 0
Further biochemical characterization of hMOBs showed that only hMOB1A and hMOB1B interact with both LATS1 and LATS2 in vitro and in vivo [39]. (…) | Value = 3, Source = N, Basis = Data
Our findings reveal that miR-373 would be a potential oncogene and it participates in the carcinogenesis of human esophageal cancer by suppressing LATS2 expression. | Value = 1 or 2 ?, Source = Author, Basis = Data
Furthermore, we demonstrated that the direct inhibition of LATS2 protein was mediated by miR-373 and manipulated the expression of miR-373 to affect esophageal cancer cells growth. | Value = 2 (or 3?), Source = Author, Basis = Data

Textual Markers

• Modal auxiliary verbs (e.g. can, could, might)
• Qualifying adverbs and adjectives (e.g. interestingly, possibly, likely, potential, somewhat, slightly, powerful, unknown, undefined)
• References, either external (e.g. ‘[Voorhoeve et al., 2006]’) or internal (e.g. ‘See fig. 2a’).
• Reporting/epistemic verbs (e.g. suggest, imply, indicate, show)
  – either within the clause: ‘These results suggest that...’
  – or in a subordinate clause governed by reporting-verb matrix clause ‘{These results suggest that} indeed, this represents the true endogenous activity.’
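The marker classes listed above can be spotted with simple pattern matching. The sketch below is illustrative only: `find_markers` and its tables are hypothetical, the word lists contain just the examples given on the slide, and the handling of verb inflection is a rough assumption (it misses forms such as "implies" or "shown").

```python
import re

# Example words copied from the slide; real lexicons would be much larger.
MARKERS = {
    "modal_auxiliary": ["can", "could", "might"],
    "qualifier": ["interestingly", "possibly", "likely", "potential", "somewhat",
                  "slightly", "powerful", "unknown", "undefined"],
    "reporting_verb": ["suggest", "imply", "indicate", "show"],
}

# External references such as "[Voorhoeve et al., 2006]" or internal ones
# such as "See fig. 2a".
REFERENCE = re.compile(r"\[[^\]]*\d{4}[^\]]*\]|\bsee fig(?:ure|\.)?\s*\w+",
                       re.IGNORECASE)


def find_markers(segment):
    """Return the epistemic markers found in a text segment, grouped by class."""
    found = {}
    for kind, words in MARKERS.items():
        # Allow simple inflections for reporting verbs only (suggests, indicated).
        suffix = r"(?:s|ed|d|ing)?" if kind == "reporting_verb" else ""
        pattern = r"\b(" + "|".join(map(re.escape, words)) + r")" + suffix + r"\b"
        hits = re.findall(pattern, segment, re.IGNORECASE)
        if hits:
            found[kind] = [h.lower() for h in hits]
    refs = REFERENCE.findall(segment)
    if refs:
        found["reference"] = refs
    return found
```

Finding a marker is only the first step; deciding which proposition it governs (the "ruled by reporting verb" case) needs clause structure that this sketch does not model.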

Markers v. Types: 1 paper, 640 segments

Value | Modal Aux | Reporting Verb | Ruled by RV | Adverbs/Adjectives | References | None | Total
Total Value = 3 | 1 (0.5%) | 81 (40%) | 24 (12%) | 7 (4%) | 41 (20%) | 47 (24%) | 201 (100%)
Total Value = 2 | - | 29 (51%) | 23 (40%) | 1 (2%) | 4 (7%) | - | 57 (100%)
Total Value = 1 | 9 (27%) | 11 (33%) | 11 (33%) | 1 (3%) | 1 (3%) | - | 33 (100%)
Total Value = 0 | - | 9 (64%) | 3 (21%) | 1 (7%) | 1 (7%) | - | 14 (100%)
Total No Modality | - | 16 (37%) | 3 (7%) | 0 | 3 (7%) | 22 (50%) | 44 (100%)
Overall Total | 10 (2%) | 146 (23%) | 64 (10%) | 10 (2%) | 50 (8%) | 69 (11%) | 640 (100%)

Most prevalent clause type: “These results suggest that...”

Adverb/Connective: thus, therefore, together, recently, in summary
Determiner/Pronoun: it, this, these, we/our
Adjective: previous, future, better
Noun phrase: data, report, study, result(s); method or reference
Modal: form of ‘to be’, may, remain
Adverb: often, recently, generally
Verb: show, obtain, consider, view, reveal, suggest, hypothesize, indicate, believe
Preposition: that, to

Reporting verbs vs. epistemic value:

Value = 0 (unknown): establish, (remain to be) elucidated, be (clear/useful), (remain to be) examined/determined, describe, make difficult to infer, report

Value = 1 (hypothetical): be important, consider, expect, hypothesize (5x), give insight, raise possibility that, suspect, think

Value = 2 (probable): appear, believe, implicate (2x), imply, indicate (12x), play a role, represent, suggest (18x), validate (2x)

Value = 3 (presumed true): be able/apparent/important/positive/visible, compare (2x), confirm (2x), define, demonstrate (15x), detect (5x), discover, display (3x), eliminate, find (3x), identify (4x), know, need, note (2x), observe (2x), obtain (success/results, 3x), prove to be, refer, report (2x), reveal (3x), see (2x), show (24x), study, view
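The verb lists above can serve as a first-pass lexicon for assigning an epistemic value. The following sketch is an assumption-laden illustration rather than the annotation procedure behind the slide: `values_for` and the table are hypothetical, only the single-word verbs from the slide are included, and multi-word cues such as "remain to be elucidated" are left out.

```python
# Single-word reporting verbs copied from the slide, keyed by epistemic value.
REPORTING_VERBS = {
    0: ["establish", "describe", "report"],
    1: ["consider", "expect", "hypothesize", "suspect", "think"],
    2: ["appear", "believe", "implicate", "imply", "indicate", "represent",
        "suggest", "validate"],
    3: ["compare", "confirm", "define", "demonstrate", "detect", "discover",
        "display", "eliminate", "find", "identify", "know", "need", "note",
        "observe", "obtain", "refer", "report", "reveal", "see", "show",
        "study", "view"],
}


def values_for(verb):
    """Return every epistemic value under which the verb was observed."""
    verb = verb.lower()
    return sorted(value for value, verbs in REPORTING_VERBS.items()
                  if verb in verbs)
```

A verb such as "report" appears under more than one value, so the lexicon alone cannot settle the value; the surrounding clause has to.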