Gene Extrapolation Models for Toxicogenomic Data

Daniel Gusenleitner, Nacho Caballero

Upload: nacho-caballero

Post on 09-May-2015


TRANSCRIPT

Page 1: Gene Extrapolation Models for Toxicogenomic Data

Gene Extrapolation Models for Toxicogenomic Data

Daniel Gusenleitner

Nacho Caballero

Page 2: Gene Extrapolation Models for Toxicogenomic Data

Testing for carcinogenicity is costly

Page 3: Gene Extrapolation Models for Toxicogenomic Data

Genes show clustered responses

Expression correlates between platforms

Page 4: Gene Extrapolation Models for Toxicogenomic Data

Data matrix: 2K arrays × 11000 genes

1K landmark genes

10K regular genes

We want to extrapolate the expression of regular genes

Page 5: Gene Extrapolation Models for Toxicogenomic Data

X: 1K landmark genes, 2K arrays

Expression Gene 1 = X_1β_1 + X_2β_2 + … + X_2Kβ_2K

Expression Gene 2 = X_1β_1 + X_2β_2 + … + X_2Kβ_2K

Expression Gene 10K = X_1β_1 + X_2β_2 + … + X_2Kβ_2K

Predicted Expression = Xβ

We fit a linear model to each regular gene
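A minimal sketch of the per-gene fit, assuming ordinary least squares with no intercept (matching Predicted Expression = Xβ). The toy sizes, the gene names, and the helpers `solve_linear_system`, `fit_ols`, and `predict` are illustrative and not from the slides. It uses only the Python standard library, so the normal equations are solved by hand.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(X, y):
    """Least-squares beta for y ~ X beta, from the normal equations X'X beta = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(X, beta):
    """Predicted expression X beta, one value per array."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


# Toy stand-in for the real data: 20 arrays, 3 landmark genes, 2 regular genes.
random.seed(0)
n_arrays, n_landmarks = 20, 3
X = [[random.gauss(0, 1) for _ in range(n_landmarks)] for _ in range(n_arrays)]
true_betas = {"regular_gene_1": [1.0, -2.0, 0.5], "regular_gene_2": [0.0, 3.0, 1.5]}
Y = {gene: predict(X, beta) for gene, beta in true_betas.items()}  # noise-free

# One independent model per regular gene, all sharing the same landmark matrix X.
models = {gene: fit_ols(X, y) for gene, y in Y.items()}
for gene, beta in models.items():
    print(gene, [round(b, 3) for b in beta])
```

Because every regular gene shares the same X, the per-gene fits are independent of one another and can be run in parallel.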

Page 6: Gene Extrapolation Models for Toxicogenomic Data

http://cran.r-project.org/web/packages/glmnet/index.html

Elastic Net

Plot: mean error vs. number of variables

glmnet: Lasso and elastic-net regularized generalized linear models
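To make the elastic net slide concrete, here is a small sketch of the technique using hand-rolled coordinate descent. It is a stand-in written for illustration, not glmnet and not the authors' code. It assumes the usual penalty form (1/2n)·RSS + λ[α‖β‖₁ + (1−α)/2·‖β‖²], no intercept, and no standardization. The names `elastic_net` and `soft_threshold` and the toy data are hypothetical.

```python
import random


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha=0.5, n_iter=200):
    """Cyclic coordinate descent for the elastic net (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta while beta is all zeros
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the residual that excludes j's own term.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_beta = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Toy data: 50 arrays, 10 landmark genes, only genes 0 and 3 carry signal.
random.seed(42)
n_arrays, n_landmarks = 50, 10
X = [[random.gauss(0, 1) for _ in range(n_landmarks)] for _ in range(n_arrays)]
y = [2.0 * row[0] - 1.5 * row[3] + random.gauss(0, 0.1) for row in X]

# Larger penalties keep fewer variables at the cost of a larger fitting error.
for lam in (0.01, 0.1, 1.0, 100.0):
    beta = elastic_net(X, y, lam)
    n_used = sum(1 for b in beta if b != 0.0)
    fitted = [sum(x * b for x, b in zip(row, beta)) for row in X]
    mean_err = sum(abs(f - t) for f, t in zip(fitted, y)) / n_arrays
    print(f"lambda={lam:<6} variables={n_used:<3} mean abs error={mean_err:.3f}")
```

Sweeping λ like this traces the relationship plotted on the slide: the number of variables kept against the mean error.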

Page 7: Gene Extrapolation Models for Toxicogenomic Data

Neural Networks

Diagram: landmark genes → hidden layer → regular genes

http://cran.r-project.org/web/packages/nnet/index.html

nnet: Feed-forward Neural Networks and Multinomial Log-Linear Models
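The network on this slide can be sketched as a forward pass through one hidden layer. This is an illustration only: the logistic hidden units, the linear output layer, the layer sizes, and the random weights are all assumptions, and no training is shown. `forward` is a hypothetical helper name.

```python
import math
import random


def logistic(z):
    """Logistic activation for the hidden units."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(landmarks, w_hidden, b_hidden, w_out, b_out):
    """Map landmark-gene expression to predicted regular-gene expression."""
    hidden = [
        logistic(sum(w * x for w, x in zip(weights, landmarks)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    # Linear output layer (an assumption): expression values are continuous.
    return [
        sum(w * h for w, h in zip(weights, hidden)) + bias
        for weights, bias in zip(w_out, b_out)
    ]


# Toy sizes: 4 landmark genes -> 3 hidden units -> 2 regular genes.
random.seed(1)
n_landmarks, n_hidden, n_regular = 4, 3, 2
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(n_landmarks)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_regular)]
b_out = [0.0] * n_regular

predicted = forward([0.2, -1.0, 0.7, 1.5], w_hidden, b_hidden, w_out, b_out)
print(predicted)  # one predicted value per regular gene
```

Unlike the per-gene linear models, a single network of this shape can predict several regular genes at once through the shared hidden layer.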

Page 8: Gene Extrapolation Models for Toxicogenomic Data

Signal-to-noise

SNR = intensity standard deviation / extrapolation mean error

Plot axes: mean fluorescent intensity; intensity variation ratio
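A short sketch of the signal-to-noise calculation. It assumes the slide's ratio reads as intensity standard deviation divided by extrapolation mean error, and that "mean error" means mean absolute error across arrays. Both readings are assumptions, and the numbers are made up for illustration.

```python
from statistics import mean, stdev


def signal_to_noise(measured, predicted):
    """SNR for one gene: spread of measured intensity over mean absolute extrapolation error."""
    errors = [abs(m - p) for m, p in zip(measured, predicted)]
    return stdev(measured) / mean(errors)


# Hypothetical intensities for one regular gene across four arrays.
measured = [8.0, 10.0, 12.0, 14.0]
predicted = [8.5, 9.5, 12.5, 13.5]

snr = signal_to_noise(measured, predicted)
print(round(snr, 3))
```

Under this reading, a gene with SNR well above 1 varies across arrays by much more than the extrapolation error, so its predicted values remain informative.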

Page 9: Gene Extrapolation Models for Toxicogenomic Data

Building 10451 models takes a long time…

                     total runtime    runtime per model    single CPU runtime
linear regression    120 x 3 h        2 min                360 h
elastic net          120 x 16 h       11 min               1920 h
neural network       50 x 0.75 h      45 min               7800 h ?
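The runtime figures can be cross-checked with a few lines of arithmetic. This sketch assumes "120 x 3 h" means 120 parallel jobs of 3 hours each, and that the per-model runtime applies to all 10451 models; both are readings of the table, not statements from the slides.

```python
N_MODELS = 10451  # number of models quoted on the slide

# (method, jobs, hours per job, minutes per model) as listed in the table
rows = [
    ("linear regression", 120, 3.0, 2),
    ("elastic net", 120, 16.0, 11),
    ("neural network", 50, 0.75, 45),
]


def cpu_hours_from_jobs(n_jobs, hours_each):
    """Single-CPU hours implied by running n_jobs jobs of hours_each hours."""
    return n_jobs * hours_each


def cpu_hours_from_models(minutes_per_model, n_models=N_MODELS):
    """Single-CPU hours implied by the per-model runtime."""
    return minutes_per_model * n_models / 60.0


for method, n_jobs, hours_each, minutes_per_model in rows:
    by_jobs = cpu_hours_from_jobs(n_jobs, hours_each)
    by_models = cpu_hours_from_models(minutes_per_model)
    print(f"{method:<18} jobs: {by_jobs:7.1f} h   per-model: {by_models:7.1f} h")
```

For linear regression and the elastic net the two estimates agree to within a few percent. For the neural network, only the per-model estimate lands near 7800 h; 50 x 0.75 h is 37.5 h, which may describe a partial run, though that is a guess.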

Page 10: Gene Extrapolation Models for Toxicogenomic Data

Signal-to-Noise Comparison

Gene                   E-net      LM       NN
ENSRNOG00000013133    135.58    19.62    13.21
ENSRNOG00000011861    209.82    28.82    12.08
ENSRNOG00000033466    190.81    26.58    11.86
ENSRNOG00000036816    197.82    23.09     9.93
ENSRNOG00000003515    273.62    29.35     8.68
ENSRNOG00000002254     53.43     8.83     7.21
ENSRNOG00000031266     76.19     8.19     6.70
ENSRNOG00000005963    145.06     6.99     6.49
ENSRNOG00000008613     38.86     3.97     6.07
ENSRNOG00000023095     13.57     2.70     5.98
ENSRNOG00000020947     17.27     2.41     5.04
ENSRNOG00000007258    103.77    13.71     4.91
ENSRNOG00000019813     16.53     3.01     4.68
ENSRNOG00000014232     61.69     9.17     4.05
ENSRNOG00000002454     50.71     5.58     3.80
ENSRNOG00000018201      5.04     1.64     3.39
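The comparison can be tallied directly from the sixteen genes listed on this slide. The values below are copied from the slide; the tallying code and its names (`SNR_TABLE`, `count_wins`) are illustrative.

```python
# (gene, E-net, LM, NN) signal-to-noise values as listed on the slide
SNR_TABLE = [
    ("ENSRNOG00000013133", 135.58, 19.62, 13.21),
    ("ENSRNOG00000011861", 209.82, 28.82, 12.08),
    ("ENSRNOG00000033466", 190.81, 26.58, 11.86),
    ("ENSRNOG00000036816", 197.82, 23.09, 9.93),
    ("ENSRNOG00000003515", 273.62, 29.35, 8.68),
    ("ENSRNOG00000002254", 53.43, 8.83, 7.21),
    ("ENSRNOG00000031266", 76.19, 8.19, 6.70),
    ("ENSRNOG00000005963", 145.06, 6.99, 6.49),
    ("ENSRNOG00000008613", 38.86, 3.97, 6.07),
    ("ENSRNOG00000023095", 13.57, 2.70, 5.98),
    ("ENSRNOG00000020947", 17.27, 2.41, 5.04),
    ("ENSRNOG00000007258", 103.77, 13.71, 4.91),
    ("ENSRNOG00000019813", 16.53, 3.01, 4.68),
    ("ENSRNOG00000014232", 61.69, 9.17, 4.05),
    ("ENSRNOG00000002454", 50.71, 5.58, 3.80),
    ("ENSRNOG00000018201", 5.04, 1.64, 3.39),
]


def count_wins(table):
    """Count, per method, the genes on which it has the highest SNR."""
    wins = {"E-net": 0, "LM": 0, "NN": 0}
    for _gene, enet, lm, nn in table:
        scores = {"E-net": enet, "LM": lm, "NN": nn}
        wins[max(scores, key=scores.get)] += 1
    return wins


wins = count_wins(SNR_TABLE)
enet_beats_lm = sum(1 for _g, enet, lm, _nn in SNR_TABLE if enet > lm)
lm_beats_nn = sum(1 for _g, _enet, lm, nn in SNR_TABLE if lm > nn)

print(wins)
print(f"E-net > LM on {enet_beats_lm} of {len(SNR_TABLE)} genes")
print(f"LM > NN on {lm_beats_nn} of {len(SNR_TABLE)} genes")
```

This covers only the genes shown on the slide, so it says nothing about the remaining models.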

Page 11: Gene Extrapolation Models for Toxicogenomic Data

The elastic net outperforms standard linear regression

Plot: signal-to-noise ratio, elastic net vs. linear regression

Page 12: Gene Extrapolation Models for Toxicogenomic Data

Additional feature selection

Performance of extrapolation models on carcinogenicity classifiers

Correlation between Luminex and Affymetrix chips