Genomic Signal Processing: Ensemble Dependence Model for Classification and Prediction of Cancer Based on Gene Expression Data Joseph DePasquale Engineering Frontiers 26 Apr 07

Post on 20-Dec-2015


Page 1

Genomic Signal Processing: Ensemble Dependence Model for Classification and Prediction of Cancer Based on Gene Expression Data

Joseph DePasquale

Engineering Frontiers

26 Apr 07

Page 2

Overview

• Motivation

• Background
– Genes, Cancer, DNA Microarrays

• Ensemble Dependence Model
– Basic structure
– Inclusion in a classification system

• Results

• Conclusions

Page 3

Motivation

• Estimated 1.4 million new cases of cancer
– Roughly 550,000 will die from their disease

• In New Jersey: 43,910 new cases
– 17,720 deaths

• NIH estimates the overall cost of cancer in 2005 at 210 billion dollars

Page 4

Background

• What is cancer?
– Uncontrolled division of damaged cells

• Apoptosis

– Risk increases with age

• Cause of unregulated cell growth

Page 5

Background

• What is a gene?
– Components
– Functionality

• What is the importance of proteins?
– Essential to all living things
– Participate in all functions within cells

• What is the significance of gene products?

Page 6

DNA Microarrays

• Expression profiling
– Represents the simultaneous activity of thousands of individual genes

• Publicly available data
– Complexity has led to a need for the standardization of experimental setup
  • MIAME
  • MAQC

Page 7

Taken from: http://en.wikipedia.org/wiki/DNA_microarray

Page 8

Ensemble Dependence Model

• Genes with similar expression profiles are combined together into clusters
– Expression profile of each cluster is the average profile of all genes in that cluster

Taken from: P. Qiu, Z. J. Wang, and K.J.R. Liu. “Genomic Processing for Cancer Classification and Prediction,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 100-110, Jan. 2007.
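The averaging step described on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `cluster_profiles`, the toy expression values, and the cluster labels are assumptions made for the example, not data or code from the cited paper.

```python
from collections import defaultdict

def cluster_profiles(expression, labels):
    """Average the expression profiles of the genes in each cluster.

    expression: list of per-gene profiles (one list of sample values per gene)
    labels:     cluster id for each gene, in the same order as `expression`
    Returns {cluster_id: averaged profile}.
    """
    members = defaultdict(list)
    for profile, label in zip(expression, labels):
        members[label].append(profile)
    # zip(*profiles) walks the samples, so each average is taken per sample.
    return {
        label: [sum(values) / len(values) for values in zip(*profiles)]
        for label, profiles in members.items()
    }

# Toy data (hypothetical): 4 genes measured in 3 samples, grouped into 2 clusters.
genes = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
    [0.0, 0.0, 6.0],
    [2.0, 2.0, 0.0],
]
profiles = cluster_profiles(genes, labels=[0, 0, 1, 1])
print(profiles)
```

Each cluster is reduced to a single profile, so a data set with thousands of genes becomes a handful of cluster-level signals.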

Page 9

Ensemble Dependence Model

$$
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
=
\begin{bmatrix}
0 & a_{12} & a_{13} & a_{14} \\
a_{21} & 0 & a_{23} & a_{24} \\
a_{31} & a_{32} & 0 & a_{34} \\
a_{41} & a_{42} & a_{43} & 0
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
+
\begin{bmatrix} n_1 \\ n_2 \\ n_3 \\ n_4 \end{bmatrix}
$$

$$X = AX + N$$
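In the model X = AX + N, each cluster profile is written as a weighted sum of the other cluster profiles plus noise, with a zero diagonal in A. The slides do not say how A is estimated, so the sketch below assumes an ordinary least-squares fit, one row at a time. The function name `estimate_dependence_matrix` and the synthetic data are hypothetical.

```python
import numpy as np

def estimate_dependence_matrix(X):
    """Least-squares estimate of A in X = A X + N, with a zero diagonal.

    X: array of shape (k, n_samples), one row per cluster profile.
    Returns (A, N), where A is (k, k) and N = X - A @ X is the residual noise.
    """
    k = X.shape[0]
    A = np.zeros((k, k))
    for i in range(k):
        others = [j for j in range(k) if j != i]
        # Regress cluster i on the remaining clusters (no intercept term).
        coeffs, *_ = np.linalg.lstsq(X[others].T, X[i], rcond=None)
        A[i, others] = coeffs
    return A, X - A @ X

# Synthetic check (assumed data): cluster 0 is an exact combination of 1 and 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 50))
X[0] = 2.0 * X[1] - X[2]
A, N = estimate_dependence_matrix(X)
print(np.round(A[0], 6))
```

The first row of A recovers the weights used to build cluster 0, and the corresponding noise row is numerically zero.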

Page 10

Ensemble Dependence Model

• Model-driven method
– Feature selection
  • Not all genes are relevant
  • T-test
– Gene clustering
  • Number of clusters
  • Gaussian mixture model
– Model learning/classification
  • Dependence matrices generated for two cases
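The feature-selection step can be illustrated with a small t-test ranking. The slides name only a "T-test", so the Welch (unequal-variance) form used here is an assumption, as are the helper names `t_statistic` and `select_genes` and the toy gene values. Gene clustering with a Gaussian mixture model is not shown.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(a, b):
    """Welch two-sample t-statistic (unequal variances)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def select_genes(cancer, normal, top):
    """Rank genes by |t| between the two classes and keep the `top` best.

    cancer, normal: dicts mapping gene name -> list of expression values.
    """
    scores = {g: abs(t_statistic(cancer[g], normal[g])) for g in cancer}
    return sorted(scores, key=scores.get, reverse=True)[:top]

# Toy data (hypothetical): only g1 differs clearly between the two classes.
cancer = {
    "g1": [5.1, 4.9, 5.3, 5.0],
    "g2": [2.0, 2.4, 1.9, 2.2],
    "g3": [7.9, 8.2, 8.0, 8.1],
}
normal = {
    "g1": [2.0, 2.2, 1.9, 2.1],
    "g2": [2.1, 2.3, 2.0, 2.2],
    "g3": [8.0, 7.8, 8.3, 8.1],
}
selected = select_genes(cancer, normal, top=1)
print(selected)
```

Genes whose expression barely changes between cancer and normal samples score near zero and are dropped before clustering.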

Page 11

Classification

• Maximum likelihood rule
– Binary hypothesis-testing problem
– Tests fit of unknown samples to each model

Cancer Case:

$$\log \Pr(X \mid H_1) = -0.5 \log\left((2\pi)^k |V_C|\right) - 0.5\,(X - A_C X - M_C)^T \, V_C^{-1} \, (X - A_C X - M_C)$$

Normal Case:

$$\log \Pr(X \mid H_0) = -0.5 \log\left((2\pi)^k |V_N|\right) - 0.5\,(X - A_N X - M_N)^T \, V_N^{-1} \, (X - A_N X - M_N)$$
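A minimal sketch of the maximum likelihood rule follows. It assumes that M and V are the mean vector and covariance matrix of the noise term X - AX under each hypothesis and that k is the number of clusters. The function names and the two toy models are hypothetical and chosen only to make the decision easy to follow.

```python
import numpy as np

def log_likelihood(x, A, M, V):
    """Gaussian log-likelihood of the residual x - A x under N(M, V)."""
    r = x - A @ x - M
    k = x.shape[0]
    _, logdet = np.linalg.slogdet(V)  # log|V|, computed stably
    # log((2*pi)^k |V|) = k*log(2*pi) + log|V|
    return -0.5 * (k * np.log(2 * np.pi) + logdet) - 0.5 * r @ np.linalg.solve(V, r)

def classify(x, cancer_model, normal_model):
    """Choose H1 (cancer) or H0 (normal), whichever fits the sample better."""
    ll_cancer = log_likelihood(x, *cancer_model)
    ll_normal = log_likelihood(x, *normal_model)
    return "cancer" if ll_cancer > ll_normal else "normal"

# Toy two-cluster models (assumed): in "cancer" the clusters move together,
# in "normal" they move in opposite directions.
V = 0.1 * np.eye(2)
cancer_model = (np.array([[0.0, 1.0], [1.0, 0.0]]), np.zeros(2), V)
normal_model = (np.array([[0.0, -1.0], [-1.0, 0.0]]), np.zeros(2), V)

print(classify(np.array([1.0, 1.1]), cancer_model, normal_model))
print(classify(np.array([1.0, -0.9]), cancer_model, normal_model))
```

Using `np.linalg.solve` avoids forming the inverse of V explicitly, which is usually the safer choice numerically.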

Page 12

EDM-Based Cancer Classification

Taken from: P. Qiu, Z. J. Wang, and K.J.R. Liu. “Genomic Processing for Cancer Classification and Prediction,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 100-110, Jan. 2007.

Page 13

Results

Taken from: P. Qiu, Z. J. Wang, and K.J.R. Liu. “Genomic Processing for Cancer Classification and Prediction,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 100-110, Jan. 2007.

Page 14

Results

Here, 200 different subsets of gastric data are used to calculate 200 different dependence matrices; the eigenvalues of these matrices are plotted.

Taken from: P. Qiu, Z. J. Wang, and K.J.R. Liu. “Genomic Processing for Cancer Classification and Prediction,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 100-110, Jan. 2007.

$$X = AX + N$$

Page 15

Results

Eigenvalues = {1, 1, 1, -3}

$$X = AX + N$$

[Equation: ideal dependence matrix A_ideal]
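The eigenvalue set {1, 1, 1, -3} can be reproduced numerically. As an assumed example, not necessarily the matrix shown on the slide, take four clusters with a zero diagonal and every off-diagonal entry equal to -1. With no noise, X = AX then says that each cluster profile is the negative sum of the other three.

```python
import numpy as np

k = 4
# Hypothetical example: zero diagonal, every off-diagonal entry equal to -1.
A = np.eye(k) - np.ones((k, k))

# A is symmetric, so eigvalsh applies; it returns eigenvalues in ascending order.
eigenvalues = np.linalg.eigvalsh(A)
print(np.round(eigenvalues, 6))
```

The all-ones direction gives the eigenvalue 1 - k = -3, and the three directions orthogonal to it give the repeated eigenvalue 1.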

Page 16

Results

Taken from: P. Qiu, Z. J. Wang, and K.J.R. Liu. “Genomic Processing for Cancer Classification and Prediction,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 100-110, Jan. 2007.

Page 17

In Summary

Taken from: P. Qiu, Z. J. Wang, and K.J.R. Liu. “Genomic Processing for Cancer Classification and Prediction,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 100-110, Jan. 2007.

Page 18

Conclusions

• EDM is a model-based system that is used for cancer classification and prediction based on publicly available gene expression data
– Dependence of clusters on other clusters

• Classification results are comparable with those of a widely accepted ML algorithm

• Eigenvalues of the dependence matrix could be a valuable cancer prediction tool

Page 19

References

[1] P. Qiu, Z. J. Wang, and K.J.R. Liu. “Genomic Processing for Cancer Classification and Prediction,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 100-110, Jan. 2007.

[2] P. Qiu, Z. J. Wang, and K.J.R. Liu. “Ensemble dependence model for classification and prediction of cancer and normal gene expression data,” Bioinformatics, vol. 21, no. 14, pp. 3114-3121, May 2005.

[3] D. Anastassiou. “Genomic Signal Processing,” IEEE Signal Processing Magazine, vol. 18, no. 4, pp. 8-20, July 2001.

[4] J. Astola, I. Tabus, I. Shmulevich, and E. Dougherty. “Genomic Signal Processing,” Signal Processing (Elsevier), vol. 83, pp. 691-694, 2003.

[5] American Cancer Society. “Cancer Facts and Figures 2006,” ACS :: Statistics for 2006 [Online]. Available: http://www.cancer.org/downloads/STT/CAFF2006PWSecured.pdf

[6] http://en.wikipedia.org/wiki/Gene

[7] http://en.wikipedia.org/wiki/Gene_expression

[8] http://en.wikipedia.org/wiki/Protein

[9] http://en.wikipedia.org/wiki/DNA_microarray

[10] M. Karnick. “Genomic Signal Processing,” Engineering Frontiers, the presentation directly previous to mine, Apr 2007.