Transcript
Page 1: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

IDENTIFYING CAUSAL GENES AND DYSREGULATED PATHWAYS IN COMPLEX DISEASES

Nov. 6th, 2010

YOO-AH KIM
NIH / NLM / NCBI

Page 2: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

Complex Diseases

Associated with the effects of multiple genes
• As opposed to single-gene diseases

The combination of genomic alterations may vary strongly among patients, yet the alterations dysregulate the same components and thus often lead to the same disease phenotype

Difficult to study and treat
• Cancer, heart disease, diabetes, etc.

Page 3: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

Copy Number Variations

Two copies of each gene are generally assumed to be present in a genome

Genomic regions may be deleted or duplicated, causing copy number variations (CNVs)

Some CNVs are associated with susceptibility or resistance to diseases such as cancer

Copy Number Variations in 158 Glioblastoma patients

Page 4: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

Identifying Genomic Causes in Complex Diseases

Identify genotypic causes in individual patients as well as dysregulated pathways

Systems biology approach
• Genome-wide search
• Graph-theoretic algorithms
  • Circuit flow
  • Set cover

158 Glioblastoma multiforme patients

Page 5: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

Glioblastoma multiforme (GBM)

The most common and most aggressive type of primary brain tumor in humans

Page 6: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

Expression as Quantitative Trait

Genotype: copy number variations

Phenotype: gene expression

Page 7: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

eQTL (expression Quantitative Trait Loci) Analysis

While we assume that the genetic variation is the cause and the expression change is the effect, we do not know the molecular pathways behind the relationship

[Figure: putative causal gene/loci and putative target gene]

Page 8: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

Method Outline

A. Target gene selection
• Gene expression

B. eQTL
• Find associations between expression and copy number

C. Circuit flow algorithm
• Molecular interactions
• Candidate causal genes

D. Causal gene selection
• Weighted multiset cover

[Figure: panels A-D. Labels: cases; target genes g1 ... gm; tag loci s1 ... sn; target gene gm; tag SNP sn; causal genes; current source and sink (+/-); edge types TF-DNA, phosphorylation event, protein-protein]

Page 9: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

Target Gene Selection

Select a representative set of disease genes
• Filter differentially expressed genes for each case
• Multi-set cover

[Figure: gene expression of Gene 1, Gene 2, Gene 3, ... in controls and disease cases]
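The filtering step can be illustrated with a minimal sketch. The slide does not say how differential expression is scored, so the z-score against controls, the cutoff of 2.0, and every name below (`differentially_expressed`, `controls`, `cases`) are assumptions made for illustration only.

```python
from statistics import mean, stdev

def differentially_expressed(controls, cases, z_cutoff=2.0):
    """Return {case: set of genes} whose expression deviates from controls.

    controls: {gene: [expression values in control samples]}
    cases:    {case: {gene: expression value}}
    The z-score rule and cutoff are illustrative assumptions.
    """
    stats = {g: (mean(v), stdev(v)) for g, v in controls.items()}
    selected = {}
    for case, profile in cases.items():
        selected[case] = {
            g for g, x in profile.items()
            if g in stats
            and stats[g][1] > 0
            and abs(x - stats[g][0]) / stats[g][1] >= z_cutoff
        }
    return selected

# Toy values invented for this example.
controls = {"g1": [1.0, 1.2, 0.8, 1.0], "g2": [2.0, 2.1, 1.9, 2.0]}
cases = {"case1": {"g1": 3.0, "g2": 2.05}, "case2": {"g1": 1.1, "g2": 0.5}}
de = differentially_expressed(controls, cases)
```

Each case ends up with its own set of differentially expressed genes, which is the form a later multi-set cover step would take as input.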

Page 10: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

eQTL

Associations between the expression of target genes and copy number variations of genomic loci
• Linear regression
• For every pair of tag loci and target genes

[Figure: target genes by cases and tag loci by cases]
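As a rough sketch of the pairwise scan, the code below fits a least-squares line for every (tag locus, target gene) pair across cases. The slide names linear regression but gives no significance rule, so the correlation threshold `min_abs_r` and all function and variable names are assumptions.

```python
from math import sqrt

def linear_fit(x, y):
    """Least-squares fit y ~ intercept + slope * x; returns (slope, intercept, r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0, my, 0.0  # no variation, so no detectable association
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / sqrt(sxx * syy)

def eqtl_scan(copy_number, expression, min_abs_r=0.8):
    """Test every (tag locus, target gene) pair; keep strongly associated pairs.

    min_abs_r is an illustrative cutoff, not a value taken from the talk.
    """
    hits = []
    for locus, cn in copy_number.items():
        for gene, expr in expression.items():
            slope, _, r = linear_fit(cn, expr)
            if abs(r) >= min_abs_r:
                hits.append((locus, gene, round(slope, 3), round(r, 3)))
    return hits

# Toy data for 6 cases; values invented for this example.
copy_number = {"s1": [2, 2, 3, 4, 1, 2], "s2": [2, 1, 2, 2, 3, 2]}
expression = {
    "g1": [1.0, 1.1, 2.1, 2.9, 0.2, 0.9],
    "g2": [0.5, 0.4, 0.6, 0.5, 0.4, 0.6],
}
hits = eqtl_scan(copy_number, expression)
```

In this toy data only the pair (s1, g1) passes the cutoff. A real analysis would more likely rank pairs by a regression p-value with multiple-testing correction.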

Page 11: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

Finding Candidate Causal Genes

[Figure: genotypic variations and target genes]

Page 12: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

Finding Candidate Causal Genes

[Figure: genotypic variations, candidate genes C1-C5, and target genes; the link between them is marked "?"]

Page 13: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

Finding Candidate Causal Genes

[Figure: genotypic variations, candidate genes C1-C5, and gene D connected through an interaction network]

Interaction network
• Protein-protein interactions
• Phosphorylation events
• Transcription factor interactions

Page 14: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

Finding Candidate Causal Genes

[Figure: current flow (+/-) through the interaction network linking genotypic variations, candidate genes C1-C5, and gene D; an edge (u, v) is highlighted]

The resistance of edge (u, v) is set to be inversely proportional to (|corr(expr(u), expr(D))| + |corr(expr(v), expr(D))|)/2
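Written as a formula, the resistance rule reads as follows. The proportionality constant is not given on the slide, so only the proportionality is shown.

```latex
R(u,v) \;\propto\;
\frac{2}{\bigl|\operatorname{corr}(\mathrm{expr}(u),\mathrm{expr}(D))\bigr|
       + \bigl|\operatorname{corr}(\mathrm{expr}(v),\mathrm{expr}(D))\bigr|}
```

Edges whose endpoints are both strongly correlated with D therefore get low resistance and carry more current.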

Page 15: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

Finding Candidate Causal Genes

[Figure: current flow (+/-) through the interaction network linking genotypic variations, candidate genes C1-C5, and gene D]

Compute the amount of current entering each causal gene by solving a system of linear equations
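A minimal sketch of that linear system, under stated assumptions: gene D is held at voltage 1, every candidate gene is grounded at voltage 0, and each edge has a known conductance (the reciprocal of its resistance). Kirchhoff's current law at the remaining nodes gives the equations. The boundary conditions, the function names, and the toy network are illustrative and not taken from the talk.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: a node is cut off from the sources")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def current_into_candidates(conductance, source, candidates):
    """conductance: {(u, v): g} for undirected edges.

    Returns ({candidate: entering current}, {node: voltage}).
    """
    nbrs = {}
    for (u, v), g in conductance.items():
        nbrs.setdefault(u, {})[v] = g
        nbrs.setdefault(v, {})[u] = g
    fixed = {source: 1.0, **{c: 0.0 for c in candidates}}
    free = [n for n in nbrs if n not in fixed]
    idx = {n: i for i, n in enumerate(free)}
    a = [[0.0] * len(free) for _ in free]
    b = [0.0] * len(free)
    for n in free:  # Kirchhoff: sum over neighbours of g * (V_n - V_j) = 0
        i = idx[n]
        for j, g in nbrs[n].items():
            a[i][i] += g
            if j in fixed:
                b[i] += g * fixed[j]
            else:
                a[i][idx[j]] -= g
    volt = dict(fixed)
    volt.update(zip(free, solve_linear(a, b)))
    # Candidates sit at voltage 0, so entering current is sum of g * V_neighbour.
    currents = {
        c: sum(g * volt[j] for j, g in nbrs.get(c, {}).items())
        for c in candidates
    }
    return currents, volt

# Toy network: D connects to intermediate nodes u and v, which reach C1 and C2.
conductance = {("D", "u"): 1.0, ("u", "C1"): 2.0, ("u", "C2"): 1.0,
               ("D", "v"): 1.0, ("v", "C2"): 1.0}
currents, volt = current_into_candidates(conductance, "D", ["C1", "C2"])
```

In the toy network C2 receives more current than C1 because it is reachable over two routes. A real interaction network would call for a sparse solver instead of dense elimination.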

Page 16: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

Method Outline

A. Target gene selection
• Gene expression

B. eQTL
• Find associations between expression and copy number

C. Circuit flow algorithm
• Molecular interactions
• Candidate causal genes

D. Causal gene selection
• Weighted multiset cover

Page 17: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

Final Causal Gene Selection

A putative causal gene explains a disease case if
• its corresponding tag locus has a copy number alteration
• its affected target genes (i.e., genes sending a significant amount of current to the causal gene) are differentially expressed in the disease case

[Figure: causal genes and the cases they explain]

Page 19: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

Final Causal Gene Selection

A putative causal gene explains a disease case if
• its corresponding tag locus has a copy number alteration
• its affected target genes (i.e., genes sending a significant amount of current to the causal gene) are differentially expressed in the disease case

[Figure: causal genes and the cases they explain, annotated with a weight]

Page 20: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

Final Causal Gene Selection

Find a smallest set of genes covering (almost) all cases at least k’ times
• Minimum weighted multi-set cover
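Minimum weighted multi-set cover is usually tackled with a greedy approximation, and the sketch below shows one. Whether the talk used this heuristic is not stated. The per-gene `cost`, the `max_uncovered` allowance standing in for "(almost) all", and all names are assumptions.

```python
def greedy_multiset_cover(covers, cost, k, max_uncovered=0):
    """Greedy approximation for weighted multi-set cover.

    covers: {gene: set of cases the gene explains}
    cost:   {gene: positive weight of selecting the gene}
    Picks genes until all but max_uncovered cases are covered at least k times.
    """
    cases = set().union(*covers.values())
    need = {c: k for c in cases}  # remaining coverage demand per case
    remaining = set(covers)
    chosen = []

    def gain(g):
        # Number of still-unsatisfied cases this gene would help with.
        return sum(1 for c in covers[g] if need[c] > 0)

    while remaining and sum(1 for c in need if need[c] > 0) > max_uncovered:
        # Sorting makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda g: gain(g) / cost[g])
        if gain(best) == 0:
            break  # no remaining gene improves coverage
        chosen.append(best)
        remaining.remove(best)
        for c in covers[best]:
            if need[c] > 0:
                need[c] -= 1
    return chosen

# Toy instance: four genes, four cases, unit costs.
covers = {"A": {1, 2, 3}, "B": {1, 2}, "C": {3, 4}, "D": {4}}
cost = {g: 1.0 for g in covers}
once = greedy_multiset_cover(covers, cost, k=1)
twice = greedy_multiset_cover(covers, cost, k=2)
```

With k = 1 two genes suffice in the toy instance, while k = 2 needs all four, which shows how the coverage demand drives the size of the selected set.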

Page 21: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

Dysregulated Pathways

Causal paths between a target and a causal gene
• A maximum current path

[Figure: network with candidate genes C1-C5 and gene D]
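The slide does not define "maximum current path". One plausible reading, used here as an assumption, is the path whose weakest edge carries the most current (a widest-path search). The edge currents, the direction of flow, and the function name are all illustrative.

```python
import heapq

def widest_path(current, source, sink):
    """current: {(u, v): positive current flowing from u to v}.

    Returns (bottleneck, path), where the path maximizes the smallest
    edge current between source and sink.
    """
    out = {}
    for (u, v), amount in current.items():
        out.setdefault(u, []).append((v, amount))
    best = {source: float("inf")}
    prev = {}
    heap = [(-best[source], source)]
    while heap:
        neg_b, u = heapq.heappop(heap)
        b = -neg_b
        if b < best.get(u, 0.0):
            continue  # stale queue entry
        if u == sink:
            break
        for v, amount in out.get(u, []):
            cand = min(b, amount)
            if cand > best.get(v, 0.0):
                best[v] = cand
                prev[v] = u
                heapq.heappush(heap, (-cand, v))
    if sink not in best:
        return 0.0, []
    path = [sink]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return best[sink], path[::-1]

# Toy edge currents, invented for this example.
current = {("D", "u"): 0.75, ("D", "v"): 0.5,
           ("u", "C1"): 0.5, ("u", "C2"): 0.25, ("v", "C2"): 0.5}
bottleneck, path = widest_path(current, "D", "C2")
```

Here the route through v wins over the route through u, because the edge from u to C2 carries only 0.25. Other definitions, such as greedily following the largest outgoing current, could give a different path.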

Page 22: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

Selected Causal Genes

Step                   Number of genes   Overlap with GBM genes
Step B: eQTL           16056             0.56 (75)
Step C: Circuit flow   701               0.045 (10)
Step D: Set cover      128               4.7 × 10^-4 (6)
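Overlap with a known gene list is commonly scored with a hypergeometric tail probability. The slide does not state which test produced the values above, so the sketch below is a generic illustration with toy numbers, and the function name is hypothetical.

```python
from math import comb

def hypergeom_pvalue(population, known, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `population` genes, of which `known` are annotated disease genes."""
    total = comb(population, selected)
    upper = min(known, selected)
    tail = sum(
        comb(known, x) * comb(population - known, selected - x)
        for x in range(overlap, upper + 1)
    )
    return tail / total

# Toy numbers: 20 genes in total, 5 known disease genes, 4 selected, 3 overlapping.
p = hypergeom_pvalue(population=20, known=5, selected=4, overlap=3)
```

A small value means the selected set contains more known disease genes than a random draw of the same size would be expected to.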

Page 23: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

Results

128 causal genes from set cover (Step D)

701 candidate causal genes from the circuit flow algorithm (Step C)

Page 24: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

Causal Genes

BSOSC Review, November 2008

Pathway                 P-value   Genes
Glioma                  0.008     PRKCA, EGFR, AKT1, CDKN2A, CAMK2G, TP53, RB1, PTEN
Cell cycle              0.028     MCM7, CDKN2A, CDC2, TP53, ORC5L, RB1, ATR, BUB3, CUL1
p53 signaling pathway   0.030     CDKN2A, CDC2, TP53, ATR, FAS, THBS1, PTEN
Proteasome              0.026     PSMA1, PSMC6, PSMB1, PSMC3, PSMA5, PSMA4

Functional analysis using DAVID

The selected causal gene set includes many known cancer-implicated genes

Page 25: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

PTEN as causal gene

[Figure: network around PTEN. Legend: fold change (-, 0, +); edge types TF-DNA and protein-protein; node types kinase, TF, and causal genes]

Page 26: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

EGFR as causal and target gene

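[Figure: network around EGFR, with nodes labeled "Causal EGFR" and "Target EGFR". Legend: fold change (-, 0, +); edge types TF-DNA, protein-protein, and phosphorylation; node types kinase, TF, and causal genes]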

Page 27: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

Conclusion

A novel computational method to simultaneously identify causal genes and dysregulated pathways
• Circuit flow algorithm
• Multi-set cover

Augmenting eQTL evidence with interaction information resulted in a very powerful approach to uncover potential causal genes as well as intermediate nodes on molecular pathways

Our method can be applied to any disease system where genetic variations play a fundamental causal role

Page 28: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

Acknowledgements

Teresa M. Przytycka
Stefan Wuchty

Other group members
• Dong Yeon Cho
• Yang Huang
• Damian Wojtowicz
• Jie Zheng

Page 31: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

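EGFR as causal and target gene

CAUSAL PATHS

[Figure: causal paths around EGFR, with nodes labeled "causal EGFR" and "target EGFR". Legend: fold change (-, 0, +); edge types TF-DNA, protein-protein, and phosphorylation; node types kinase, TF, and causal genes]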

Page 32: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

PTEN as causal gene

CAUSAL PATHS

[Figure: causal paths around PTEN. Legend: fold change (-, 0, +); edge types TF-DNA and protein-protein; node types kinase, TF, and causal genes]

Page 33: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

Our Method

Integrate several types of data
• Gene expression
• Copy number variations
• Molecular interactions

Page 34: Identifying Causal Genes and  Dysregulated  Pathways in Complex Diseases

Methods and Results

Method
• Model the expression change of disease genes as a function of genomic alterations
• Translate the propagation of information from a potential causal gene to a disease gene into the flow of electric current through a network of molecular interactions
• Multi-set cover: select the most prominent genes

Validated our approach by testing the enrichment of selected causal genes with known GBM/glioma-related genes

[Figure: current flow (+/-) between disease gene gm and tag SNP sn through causal genes]

