Semantic Relations for Interpreting DNA Microarray Data
and for Novel Hypotheses Generation
Dimitar Hristovski,1 PhD, Andrej Kastrin,2 Borut Peterlin,2 MD PhD, Thomas C Rindflesch,3 PhD
1Institute of Biomedical Informatics, Medical Faculty, University of Ljubljana, Slovenia
2Institute of Medical Genetics, University Medical Centre, Ljubljana, Slovenia
3National Library of Medicine, National Institutes of Health, Bethesda, MD, U.S.A.
e-mail: [email protected]
Introduction
Microarray experiments:
• great potential to support progress in biomedical research,
• results NOT EASY to interpret,
• information about functions and relations of relevant genes needs to be extracted from the vast biomedical literature
Related Work
• Text mining and microarray analysis
• Literature-based Discovery
Proposed Solution
• Computerized text analysis system
• Extract semantic relations from literature
– SemRep
• Integrate with microarray experiments
• Develop tools for:
– Interpretation
– Novel hypotheses generation
Overall Design
[Diagram: MEDLINE → SemRep (semantic relation extraction) → semantic relations; GEO → R/Bioconductor scripts → microarrays; both feed the Integrated Database (semantic relations + microarrays), which is used by the Interpretation & Discovery Tools]
SemRep
• Extracts semantic relations from biomedical text (implemented in Prolog)
• Based on UMLS Metathesaurus and Semantic Network
– <MetaConc> SEMNET RELATION <MetaConc>
• Database of relations extracted from MEDLINE
– 6.7M citations (01/01/1999 through 03/31/2009)
– 43M sentences
– 21M relation instances
– 7M relation types
Semantic Relations Extracted
• Wide range of relations in:
– Clinical medicine
– Molecular genetics
– Pharmacogenomics
• Genetic Etiology: associated_with, predisposes, causes
• Substance Relations: interacts_with, inhibits, stimulates
• Pharmacological Effects: affects, disrupts, augments
• Clinical Actions: administered_to, manifestation_of, treats
• Organism Characteristics: location_of, part_of, process_of
• Co-existence: co-exists_with
Examples
• “… the loss of Mbd1 could lead to autism-like behavioral phenotypes …”
• Relation: MBD1 causes Autistic Disorder
• “… Mbd1 can directly regulate the expression of Htr2c, one of the serotonin receptors, …”
• Relation: MBD1 interacts_with HTR2C
Interpretation of Microarrays
Find known facts from the literature:
• Disease related:
– Associated genes
– Current treatments
– …
• Microarray genes:
– Relations between genes (INHIBITS, STIMULATES, …)
– Relations between the genes and anything else
Relations with “Parkinson” as Argument?
What Treats Parkinson?
What (causes, associated_with) Parkinson?
Sentences from which Relations are Extracted
Genes from the Microarray Related to Anything?
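The questions on these slides amount to lookups over subject-predicate-object triples. The following is a minimal Python sketch of such a lookup, assuming the relations are held in memory as tuples. The `Relation` type and the `find_relations` helper are hypothetical names chosen for illustration and are not part of SemRep or of the tool itself; the sample triples are relations quoted elsewhere in this presentation.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Sample triples taken from relations quoted in this presentation.
RELATIONS = [
    Relation("MBD1", "CAUSES", "Autistic Disorder"),
    Relation("MBD1", "INTERACTS_WITH", "HTR2C"),
    Relation("Paclitaxel", "INHIBITS", "HSPB1"),
    Relation("Quercetin", "INHIBITS", "HSPB1"),
    Relation("NR4A2", "ASSOCIATED_WITH", "Parkinson Disease"),
]


def find_relations(relations, predicates=None, argument=None):
    """Return relations with a predicate in `predicates` (if given) whose
    subject or object contains `argument` (case-insensitive, if given)."""
    wanted = {p.upper() for p in predicates} if predicates else None
    needle = argument.lower() if argument else None
    hits = []
    for rel in relations:
        if wanted is not None and rel.predicate.upper() not in wanted:
            continue
        if needle is not None and needle not in rel.subject.lower() \
                and needle not in rel.obj.lower():
            continue
        hits.append(rel)
    return hits


# "What (causes, associated_with) Parkinson?"
parkinson_hits = find_relations(RELATIONS, ["causes", "associated_with"], "Parkinson")
for rel in parkinson_hits:
    print(rel.subject, rel.predicate, rel.obj)
```

A real deployment would query the integrated database rather than a Python list; the in-memory list only keeps the sketch self-contained.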
Novel Hypotheses Generation
• Based on discovery patterns
• Discovery patterns:
– search templates that have a higher likelihood of returning a new discovery
• Specific discovery patterns for specific discovery tasks
Discovery Patterns
• Inhibit the upregulated:
– Search for substances, genes, ... which, according to the literature, inhibit the top N (e.g. 300) genes that are upregulated on a given microarray
– Such substances, genes, … might be used to regulate the upregulated genes
• Stimulate the downregulated:
– Search for substances, genes, ... which, according to the literature, stimulate the top N (e.g. 300) genes that are downregulated on a given microarray
– Such substances, genes, … might be used to regulate the downregulated genes
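Both patterns can be read as the same search run with a different predicate and a different gene list. The sketch below illustrates that reading in Python; `apply_pattern` is a hypothetical helper, the gene lists are assumed to be already ranked by differential expression, and the sample relations are the ones reported in the results of this presentation for the Parkinson microarray.

```python
def apply_pattern(relations, target_genes, predicate):
    """Map each target gene to the agents that, per the literature
    relations, stand in `predicate` relation to it."""
    targets = set(target_genes)
    found = {}
    for subject, pred, obj in relations:
        if pred == predicate and obj in targets:
            found.setdefault(obj, []).append(subject)
    return found


# Literature relations (subject, predicate, object) quoted in this presentation.
LITERATURE = [
    ("Paclitaxel", "INHIBITS", "HSPB1"),
    ("Quercetin", "INHIBITS", "HSPB1"),
    ("Pramipexole", "STIMULATES", "NR4A2"),
]

# Assumed input: gene symbols ranked by up- or downregulation on the microarray.
upregulated = ["HSPB1"]
downregulated = ["NR4A2"]
TOP_N = 300  # the "top N (e.g. 300)" cut-off named on the slide

inhibit_up = apply_pattern(LITERATURE, upregulated[:TOP_N], "INHIBITS")
stimulate_down = apply_pattern(LITERATURE, downregulated[:TOP_N], "STIMULATES")
print(inhibit_up)
print(stimulate_down)
```

Keeping the predicate as a parameter means further discovery patterns can reuse the same function with other predicates from the relation inventory.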
Discovery Patterns – Graphical View
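[Diagram: on the microarray side, Disease X is linked to upregulated Genes Y1 and downregulated Genes Y2. On the literature side, Drug (or substance) Z1 Inhibits Genes Y1 and Drug (or substance) Z2 Stimulates Genes Y2. The resulting hypotheses between the drugs and Disease X are labelled Maybe_Treats1? and Maybe_Treats2?]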
Results – Inhibit the Upregulated
Paclitaxel INHIBITS HSPB1|HSPB1 protein
Paclitaxel completely inhibited the expression of HSP27 (PMID: 15304155)
Quercetin INHIBITS HSPB1|HSPB1 gene
Quercetin …, inhibited the expression of both HSP70 and HSP27 (PMID: 12926076)
• Parkinson microarray GSE8397
• HSP27 (HSPB1) gene is upregulated on the microarray
• We identified paclitaxel and quercetin as substances that inhibit the expression of this gene
Results – Stimulate the Downregulated
• NR4A2 downregulated on the microarray
• We found that:
– Pramipexole stimulates expression of NR4A2
– NR4A2 is associated with Parkinson disease
pramipexole STIMULATES NR4A2
… the increase of Nurr1 gene expression induced by PRX, ... (PMID: 15740846)
… the induction of Nurr1 gene expression by PRX ... (PMID: 15740846)
NR4A2 ASSOCIATED_WITH Parkinson Disease
… lower levels of NURR1 gene expression were associated with significantly increased risk for PD (PMID: 18684475)
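The two relations above can be chained into a candidate hypothesis: a substance that stimulates a downregulated gene which the literature also associates with the disease. The Python sketch below shows one way to express that join. The `maybe_treats` function and the `MAYBE_TREATS` label are illustrative assumptions modelled on the "Maybe_Treats" arrows of the graphical view, not the tool's actual implementation.

```python
def maybe_treats(relations, downregulated, disease):
    """Propose (substance, MAYBE_TREATS, disease, via_gene) hypotheses for
    substances that stimulate a downregulated, disease-associated gene."""
    linked = {s for s, p, o in relations
              if p == "ASSOCIATED_WITH" and o == disease}
    hypotheses = []
    for s, p, o in relations:
        if p == "STIMULATES" and o in downregulated and o in linked:
            hypotheses.append((s, "MAYBE_TREATS", disease, o))
    return hypotheses


# The two relations reported on this slide.
EVIDENCE = [
    ("Pramipexole", "STIMULATES", "NR4A2"),
    ("NR4A2", "ASSOCIATED_WITH", "Parkinson Disease"),
]

hypotheses = maybe_treats(EVIDENCE, {"NR4A2"}, "Parkinson Disease")
print(hypotheses)
```

The output is only a candidate for expert review; the supporting sentences and PMIDs are what allow a user to judge it.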
Explaining a Relation - Closed Discovery
Closed Discovery – Aligned Relations
Evaluation
• Estimate – based on [Masseroli, BMC Bioinformatics 2006]:
• Extract known facts – baseline precision on 2,042 extracted relations:
– Gene – Disease (causes, assoc_with, …) P=74.2%
– Gene – Gene (inhibits, stimulates, …) P=41.95%
• Propose Argument-Predicate distance for filtering (Gene-Gene):
– At distance no more than 1: P=70.75%; R=43.6%
– At distance no more than 2: P=55.88%; R=66.28%
• We use Argument-Predicate distance for ranking of semantic relations and we show relations more likely to be correct first.
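Filtering and ranking by Argument-Predicate distance can be sketched as a threshold followed by a sort. In the Python sketch below the distance is assumed to be precomputed and stored with each relation; how it is measured is not shown here. `ScoredRelation` and `rank_by_distance` are hypothetical names, and the distance values in the sample data are invented purely to exercise the code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredRelation:
    subject: str
    predicate: str
    obj: str
    distance: int  # assumed precomputed Argument-Predicate distance


def rank_by_distance(relations, max_distance=None):
    """Drop relations beyond `max_distance` (if given) and list the rest
    with the smallest distance, i.e. the most likely correct, first."""
    kept = [r for r in relations
            if max_distance is None or r.distance <= max_distance]
    return sorted(kept, key=lambda r: r.distance)


# Distances below are made-up illustration values, not measured results.
sample = [
    ScoredRelation("Paclitaxel", "INHIBITS", "HSPB1", 3),
    ScoredRelation("MBD1", "INTERACTS_WITH", "HTR2C", 1),
    ScoredRelation("Quercetin", "INHIBITS", "HSPB1", 2),
]

ranked = rank_by_distance(sample)
filtered = rank_by_distance(sample, max_distance=2)
print([r.subject for r in ranked])
print([r.subject for r in filtered])
```

A stricter threshold trades recall for precision, as the figures above show; ranking without a cut-off keeps every relation and only changes the display order.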
Conclusion
• A new bioinformatics tool for interpretation and novel hypotheses generation
• Based on integration of semantic relations extracted from literature with microarrays
• Available at:
• http://sembt.mf.uni-lj.si
Syntactic Processing
Mbd1 can directly regulate the expression of Htr2c
• MedPost tagger and shallow parser
[ NP[head([… inputmatch(mbd1),tag(noun)])], ...
[verb([inputmatch(regulate),lexmatch(regulate),tag(verb)])],...
NP[… head([… inputmatch(htr2c),tag(noun)])] ]
Semantic Processing
• Identify concepts: MetaMap and ABGene
[ NP[head([… semtype(gngm),entrez(MBD1,4152)])], ...
[verb([inputmatch(regulate),lexmatch(regulate),tag(verb)])],...
NP[… head([… semtype(gngm),entrez(HTR2C,3358)])] ]
• Match semantic type patterns to ontology:
<gngm> INTERACTS_WITH <gngm>
• Apply indicator rule: Verb(regulate) INTERACTS_WITH
• Substitute concepts for semantic types:
MBD1 INTERACTS_WITH HTR2C
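The steps on this slide can be traced with a toy Python sketch. It is not SemRep itself (which is implemented in Prolog): the three lookup tables are hypothetical stand-ins for the concept identification done by MetaMap and ABGene, for the indicator rules, and for the permitted semantic type patterns, and they contain only the entries needed for this one example sentence.

```python
# Hypothetical stand-in for concept identification (MetaMap and ABGene):
# lowercased head noun -> (semantic type, gene symbol)
CONCEPTS = {
    "mbd1": ("gngm", "MBD1"),
    "htr2c": ("gngm", "HTR2C"),
}

# Hypothetical stand-in for indicator rules: verb -> predicate
INDICATOR_RULES = {"regulate": "INTERACTS_WITH"}

# Hypothetical stand-in for permitted semantic type patterns
TYPE_PATTERNS = {("gngm", "INTERACTS_WITH", "gngm")}


def extract_relation(subject_head, verb, object_head):
    """Return (subject, predicate, object) for one parsed clause, or None
    when a concept, indicator rule, or type pattern is missing."""
    subj = CONCEPTS.get(subject_head.lower())
    obj = CONCEPTS.get(object_head.lower())
    predicate = INDICATOR_RULES.get(verb.lower())
    if subj is None or obj is None or predicate is None:
        return None
    # The semantic types of both arguments must match a permitted pattern.
    if (subj[0], predicate, obj[0]) not in TYPE_PATTERNS:
        return None
    # Substitute the identified concepts for the semantic types.
    return (subj[1], predicate, obj[1])


# "Mbd1 can directly regulate the expression of Htr2c"
relation = extract_relation("Mbd1", "regulate", "Htr2c")
print(relation)
```

The sketch takes the head nouns and the verb as given, so the tagging and shallow parsing of the previous slide are assumed to have already run.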