PhD thesis presentation
Next-generation text-mining applied to toxicogenomics data analysis
Kristina Hettne
PhD thesis defense
20 December, 2012
Toxicogenomics: the study of whether a chemical causes damage to genes
Text mining: teach a computer to “read” articles and extract explicit information
Next-generation text mining: teach a computer to find implicit information in articles
Image source: The Independent, July 12, 2012
Drug safety is essential! But… how to minimize animal testing?
Toxicogenomics data interpretation using knowledge from manually curated databases
Image sources: Verhallen and Piersma, 2011, de Jong et al 2011, http://www.flickr.com/photos/jseita/3764113525/
Not sufficient in coverage
We hypothesize that next-generation text mining can increase the information coverage
Information cloud for a chemical concept
Information cloud for a gene concept
Shared concepts
Next-generation text mining = concept profile matching
Image source: Herman van Haagen
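The matching idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming a concept profile is a mapping from concept names to association weights and that two profiles are compared with cosine similarity. The concept names, the weights, and the choice of cosine similarity are assumptions made for illustration and are not taken from the slides.

```python
import math

def shared_concepts(profile_a, profile_b):
    """Return the concepts that occur in both profiles, sorted by name."""
    return sorted(set(profile_a) & set(profile_b))

def match_score(profile_a, profile_b):
    """Cosine similarity between two concept profiles (assumed scoring)."""
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[c] * profile_b[c] for c in shared)
    norm_a = math.sqrt(sum(w * w for w in profile_a.values()))
    norm_b = math.sqrt(sum(w * w for w in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical information clouds: concept -> association weight.
chemical_profile = {"liver injury": 0.8, "oxidative stress": 0.6, "teratogenesis": 0.3}
gene_profile = {"oxidative stress": 0.7, "apoptosis": 0.5, "liver injury": 0.2}

print(shared_concepts(chemical_profile, gene_profile))
print(match_score(chemical_profile, gene_profile))
```

A chemical and a gene that never appear in the same article can still score well here, as long as their information clouds overlap. That is the sense in which the match is implicit.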
Concepts come from a thesaurus and are identified in text with concept identification software
A good thesaurus = the basis for good concept identification
Image source: Herman van Haagen
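As a rough sketch of what thesaurus-based concept identification involves, the Python below looks up thesaurus terms in a sentence and prefers the longest matching term. The concept identifiers, the tiny thesaurus, and the longest-match strategy are all assumptions for illustration; the actual concept identification software is not described on the slides.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_term_index(thesaurus):
    """Map each tokenized term (synonym) to its concept identifier."""
    index = {}
    for concept_id, terms in thesaurus.items():
        for term in terms:
            index[tuple(tokenize(term))] = concept_id
    return index

def identify_concepts(text, index):
    """Scan left to right, taking the longest thesaurus term at each position."""
    tokens = tokenize(text)
    max_len = max((len(key) for key in index), default=0)
    found = []
    i = 0
    while i < len(tokens):
        matched = 0
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in index:
                found.append((index[candidate], " ".join(candidate)))
                matched = n
                break
        i += matched if matched else 1
    return found

# Hypothetical mini-thesaurus: concept identifier -> terms and synonyms.
THESAURUS = {
    "CHEM:0001": ["flusilazole"],
    "CHEM:0002": ["valproic acid", "valproate"],
    "GENE:0001": ["Cyp26a1"],
}

index = build_term_index(THESAURUS)
print(identify_concepts("Valproic acid and flusilazole changed Cyp26a1 expression.", index))
```

The lookup can only find what the thesaurus contains, so missing synonyms or ambiguous terms directly limit what is identified.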
Research objectives:
• Investigate information coverage in public biomedical and chemical thesauri and databases
• Provide methods to improve the quality and coverage
• Give recommendations for use
• Investigate added value of next-generation text mining when interpreting toxicogenomics data
Results
A thesaurus of chemical concepts [1] and methods [1,2,3] to prepare a thesaurus to be used with concept identification software
1. Hettne et al. Bioinformatics, 2009
2. Hettne et al. Journal of Biomedical Semantics, 2010
3. Hettne et al. Journal of Cheminformatics, 2010
http://www.biosemantics.org/casper
http://www.biosemantics.org/jochem
A next-generation text mining-based method for interpreting biological data
Biological data Statistical test Next-generation text mining
This method gives more, and more specific, results [1] than other available tools
1. Jelier R, Goeman JJ, Hettne KM, Schuemie MJ, den Dunnen JT, 't Hoen PA. Briefings in Bioinformatics, 2011
http://www.biosemantics.org/weightedglobaltest
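To make the combination of biological data, a statistical test, and text-mining weights concrete, here is a deliberately simplified Python sketch. It is not the weighted global test itself. It assumes each gene has a per-gene statistic from an experiment and a text-mining weight linking it to a concept, and it uses a plain gene-label permutation test as a stand-in. All gene names, statistics, and weights are hypothetical.

```python
import random

def weighted_score(gene_stats, weights):
    """Sum of absolute per-gene statistics, weighted by text-mining association."""
    return sum(weights.get(gene, 0.0) * abs(stat) for gene, stat in gene_stats.items())

def permutation_p_value(gene_stats, weights, n_permutations=1000, seed=0):
    """Estimate how often shuffled gene labels score at least as high as observed."""
    rng = random.Random(seed)
    genes = list(gene_stats)
    values = [gene_stats[g] for g in genes]
    observed = weighted_score(gene_stats, weights)
    hits = 0
    for _ in range(n_permutations):
        shuffled = values[:]
        rng.shuffle(shuffled)
        if weighted_score(dict(zip(genes, shuffled)), weights) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical per-gene statistics (for example, t-statistics from an expression study).
gene_stats = {"g1": 3.0, "g2": 2.5, "g3": 0.1, "g4": -0.2,
              "g5": 0.3, "g6": -0.1, "g7": 0.2, "g8": 0.05}
# Hypothetical text-mining weights linking genes to one concept.
weights = {"g1": 1.0, "g2": 0.8}

print(permutation_p_value(gene_stats, weights))
```

A small p-value in this sketch means the genes that the literature associates most strongly with the concept are also the ones that respond most in the data.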
Application to toxicogenomics
http://www.biosemantics.org/index.php?page=chemicalresponse-specific-gene-sets
Hettne et al. (submitted)
Image sources: 1. Verhallen and Piersma, 2011; 2. De Jong et al., 2012
See developmental defects in stem cells instead of in animal embryos
1. Embryonic structure
2. A) Control group rat embryo B) Triazole-exposed rat embryo
Posterior neuropore open
Toxicity class prediction (case study: Triazoles)
1. Chemical
Image source 1: Verhallen and Piersma, 2011
A 25 times larger chemical-gene matrix than manual curation provides (Comparative Toxicogenomics Database)
Conclusions
Next-generation text mining combined with statistical tests complements, and is sometimes superior to, manually curated databases in:
- Relating chemical information to gene expression data
- Identifying toxic effects already at the gene expression stage
- Discriminating between different classes of chemicals
Future
1. Make the method easier to use (currently being worked on)
2. Apply the method to new drugs with unknown toxicity
Early prediction of toxicity -> less animal testing and safer drugs
Thank you to all who made this possible!