phd thesis presentation

Next-generation text-mining applied to toxicogenomics data analysis

Kristina Hettne

PhD thesis defense

20 December, 2012

Toxicogenomics: the study of whether a chemical causes damage to genes

Text mining: teach a computer to “read” articles and extract explicit information

Next-generation text mining: teach a computer to find implicit information in articles

Image source: The Independent, July 12, 2012

Drug safety is essential! But… how to minimize animal testing?

Toxicogenomics data interpretation using knowledge from manually curated databases

Image sources: Verhallen and Piersma, 2011, de Jong et al 2011, http://www.flickr.com/photos/jseita/3764113525/

Not sufficient in coverage

We hypothesize that next-generation text mining can increase the information coverage

Image sources: Verhallen and Piersma, 2011, de Jong et al 2011, http://www.flickr.com/photos/jseita/3764113525/

Information cloud for a chemical concept

Information cloud for a gene concept

Shared concepts

Next-generation text mining = concept profile matching

Image source: Herman van Haagen
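Concept profile matching can be illustrated with a minimal sketch. It assumes that each concept (a chemical or a gene) is described by a profile mapping associated concepts to weights, and that two profiles are compared over the concepts they share. The cosine score, the function name `profile_match`, and the example concepts and weights are illustrative assumptions, not the measure or data used in the thesis.

```python
from math import sqrt


def profile_match(profile_a, profile_b):
    """Compare two concept profiles (dict: concept -> weight).

    Returns (score, shared), where score is the cosine similarity of the
    two weight vectors and shared is the set of concepts found in both.
    """
    shared = set(profile_a) & set(profile_b)
    # Only shared concepts contribute to the dot product.
    dot = sum(profile_a[c] * profile_b[c] for c in shared)
    norm_a = sqrt(sum(w * w for w in profile_a.values()))
    norm_b = sqrt(sum(w * w for w in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0, shared
    return dot / (norm_a * norm_b), shared


# Hypothetical profiles; concepts and weights are made up for illustration.
chemical_profile = {
    "embryonic development": 0.8,
    "oxidative stress": 0.5,
    "liver": 0.2,
}
gene_profile = {
    "embryonic development": 0.6,
    "neural tube": 0.7,
    "oxidative stress": 0.1,
}

score, shared = profile_match(chemical_profile, gene_profile)
print(sorted(shared), round(score, 3))
```

A chemical and a gene that never appear in the same article can still get a non-zero score through their shared concepts, which is the sense in which the information found is implicit.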

Concepts come from a thesaurus and are identified in text with concept identification software

A good thesaurus = the basis for good concept identification

Image source: Herman van Haagen
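As a rough sketch of why the thesaurus matters, dictionary-based concept identification can be reduced to looking up thesaurus terms in text. The thesaurus entries, identifiers, and function names below are hypothetical, and real concept identification software does considerably more (for example normalization and disambiguation); this only shows the basic lookup.

```python
import re

# Hypothetical mini-thesaurus: concept identifier -> terms (synonyms).
THESAURUS = {
    "CHEM:0001": ["triazole", "triazoles"],
    "GENE:0001": ["CYP26A1", "cytochrome P450 26A1"],
}


def build_index(thesaurus):
    """Map each lower-cased term to its concept identifier."""
    index = {}
    for concept_id, terms in thesaurus.items():
        for term in terms:
            index[term.lower()] = concept_id
    return index


def identify_concepts(text, index):
    """Return sorted (start, end, concept_id) tuples for whole-term matches.

    Matching is case-insensitive and does not resolve overlapping matches.
    """
    lowered = text.lower()
    found = []
    for term, concept_id in index.items():
        # Require non-word characters (or text boundaries) around the term.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        for match in re.finditer(pattern, lowered):
            found.append((match.start(), match.end(), concept_id))
    return sorted(found)


index = build_index(THESAURUS)
hits = identify_concepts(
    "Triazoles were tested for effects on CYP26A1 expression.", index
)
print(hits)
```

Any term missing from the thesaurus is simply never found, so gaps or errors in the thesaurus carry straight through to the concept profiles built from the identified concepts.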

Research objectives:

• Investigate information coverage in public biomedical and chemical thesauri and databases

• Provide methods to improve the quality and coverage

• Give recommendations for use

• Investigate added value of next-generation text mining when interpreting toxicogenomics data

Results

A thesaurus of chemical concepts (1) and methods (1, 2, 3) to prepare a thesaurus to be used with concept identification software

1. Hettne et al. Bioinformatics, 2009
2. Hettne et al. Journal of Biomedical Semantics, 2010
3. Hettne et al. Journal of Cheminformatics, 2010

http://www.biosemantics.org/casper
http://www.biosemantics.org/jochem

A next-generation text mining-based method for interpreting biological data

Biological data Statistical test Next-generation text mining

This method gives more results, and more specific results, than other available tools (1)

1. Jelier R, Goeman JJ, Hettne KM, Schuemie MJ, den Dunnen JT, 't Hoen PA. Briefings in Bioinformatics, 2011

http://www.biosemantics.org/weightedglobaltest

Application to toxicogenomics

http://www.biosemantics.org/index.php?page=chemicalresponse-specific-gene-sets

Hettne et al. (submitted)

Image sources: 1. Verhallen and Piersma, 2011; 2. De Jong et al 2012

See developmental defects in stem cells instead of in animal embryos

Figure panels: A) Control group rat embryo; B) Triazole-exposed rat embryo. Labels: embryonic structure; posterior neuropore open.

Toxicity class prediction (case study: Triazoles)

1. Chemical

Image source 1: Verhallen and Piersma, 2011

A chemical-gene matrix 25 times larger than the one obtained by manual work (Comparative Toxicogenomics Database)

Conclusions

Next-generation text mining combined with statistical tests complements, and is sometimes superior to, manually curated databases in:

- Relating chemical information to gene expression data

- Identifying toxic effects already at the gene expression stage

- Discriminating between different classes of chemicals

Future

1. Make the method easier to use (currently being worked on)

2. Apply the method to new drugs with unknown toxicity

Early prediction of toxicity -> less animal testing and safer drugs

Thank you to all who made this possible!
