inferring cancer subnetwork markers


Inferring Cancer Subnetwork Markers using Density-Constrained Biclustering

Phuong Dao∗,1, Recep Colak∗,3

Raheleh Salari1, Flavia Moser4, Elai Davicioni5

Alexander Schönhuth†,2, Martin Ester1,†

1School of Computing Science, Simon Fraser University, Canada

2Centrum Wiskunde & Informatica, Amsterdam, Netherlands

3Department of Computing Science, University of Toronto, Canada

4Center for Disease Control, University of British Columbia

5GenomeDX Biosciences Inc.

∗: Joint first authors, †: Joint corresponding, last authors


Introduction: Personalized Medicine

• Determination of disease status based on patient genetics/genomics

• Goal: Specific, individual choice of treatment

• Necessary: Reliable disease markers

• Monogenic: Each marker is a single gene

• Multigenic: Each marker is a set of genes


Single Gene Markers

[Figure: expression heatmaps of Genes 1 to 6 across Cases 1 to 3 and Controls 1 to 3, before and after sorting the genes into "Differentially Expressed" and "Non-Differentially Expressed" groups.]

Caveat: Single gene markers vary significantly across different studies


Marker Selection: Multigenic Traits

[Figure: two panels. "Gene Expression Profiles" shows Genes 1 to 4 across Cases 1 to 3 and Controls 1 to 3. "Interaction/Association Network" shows nodes G1 to G4 joined by edges with confidence scores (0.75) to (0.95).]

Solution: Differentially expressed genes participating in the same pathway [Chuang et al., 2007], [Chowdhury et al. 2010]


Our Approach

Each of our subnetwork markers:

• is a densely connected subnetwork

+ Disease-related genes have more PPI interactions than expected [Goh et al., PNAS (2007)]

• contains genes which are differentially expressed in a subset of samples

+ Cancer tumors vary greatly in phenotype, although belonging to the same (sub)type [Hampton et al., GR (2009)]


Density-Constrained Biclusters

Definition: G is called α-dense if

$$\frac{\sum_{e \in E} w_e}{\binom{|V|}{2}} \geq \alpha \geq 0.5.$$

[Figure: a confidence-weighted interaction network over genes G1 to G7, and binary differential-expression matrices over samples S1 to S3. Two example biclusters are highlighted, one on genes G1 to G4 and one on genes G4 to G7, each drawn with its edge weights.]

Our markers are α-densely connected subnetworks of genes that are differentially expressed in a subset of patients of size at least k (here: k = 2).
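The marker definition above can be checked mechanically. The following is a minimal sketch, not the authors' implementation: it assumes edge weights are stored in a dictionary keyed by `frozenset` gene pairs, and it reads "differentially expressed in a subset of patients" as "every gene of the subnetwork is flagged in that sample". The function names, the toy weights, and the toy sample flags are all hypothetical.

```python
from itertools import combinations

def density(nodes, weights):
    """Sum of edge weights inside `nodes` divided by the number of possible edges."""
    nodes = list(nodes)
    if len(nodes) < 2:
        return 0.0
    total = sum(weights.get(frozenset(pair), 0.0) for pair in combinations(nodes, 2))
    return total / (len(nodes) * (len(nodes) - 1) / 2)

def is_connected(nodes, weights):
    """Depth-first search that only follows edges with positive weight."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in nodes - seen:
            if weights.get(frozenset((u, v)), 0.0) > 0:
                seen.add(v)
                stack.append(v)
    return seen == nodes

def supporting_samples(genes, diff_expr):
    """Samples in which every gene of the subnetwork is flagged (assumed reading)."""
    return {s for s, flagged in diff_expr.items() if set(genes) <= flagged}

def is_marker(genes, weights, diff_expr, alpha=0.5, k=2):
    """alpha-dense, connected, and supported by at least k samples."""
    return (density(genes, weights) >= alpha
            and is_connected(genes, weights)
            and len(supporting_samples(genes, diff_expr)) >= k)

# Hypothetical toy data, not taken from the slide figure.
weights = {
    frozenset(("G1", "G2")): 0.8, frozenset(("G1", "G3")): 0.9,
    frozenset(("G2", "G3")): 0.85, frozenset(("G2", "G4")): 0.75,
    frozenset(("G3", "G4")): 0.95,
}
diff_expr = {
    "S1": {"G1", "G2", "G3", "G4"},
    "S2": {"G1", "G2", "G3", "G4"},
    "S3": {"G1", "G3"},
}
genes = ["G1", "G2", "G3", "G4"]
print(is_marker(genes, weights, diff_expr, alpha=0.5, k=2))
```

With these toy values the gene set has two supporting samples (S1 and S2), so it qualifies for k = 2 but not for k = 3.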


Methods


Density Constrained Biclustering: Search Strategy

Theorem: Every α-densely connected network of size n contains an α-densely connected subnetwork of size n − 1.

[Figure: search over the subnetworks of an example network on nodes A, B, C, D with edge weights 0.8, 0.9, 0.6, and 0.4. Candidate subnetworks are labeled "wDCB", "maximal wDCB", "Not Connected", or "Not Dense". The full four-node network has density (0.8 + 0.9 + 0.6 + 0.4) / 6 = 0.45.]

Search Strategy: Breadth-first search.
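The theorem suggests a level-wise search: since every α-densely connected network of size n contains one of size n − 1, all of them can be reached by growing smaller ones one adjacent node at a time. The sketch below illustrates that idea only; it is not the authors' algorithm, it omits the sample (bicluster) constraint, and the edge assignment of the weights 0.8, 0.9, 0.6, 0.4 to node pairs is an assumption, because the figure layout is not recoverable here.

```python
from itertools import combinations

def density(nodes, weights):
    """Sum of edge weights inside `nodes` divided by the number of possible edges."""
    nodes = list(nodes)
    total = sum(weights.get(frozenset(pair), 0.0) for pair in combinations(nodes, 2))
    return total / (len(nodes) * (len(nodes) - 1) / 2)

def neighbors(node, weights):
    """All nodes sharing an edge with `node`."""
    return {v for edge in weights if node in edge for v in edge if v != node}

def dense_connected_subnetworks(weights, alpha=0.5, max_size=4):
    """Breadth-first, level-wise enumeration of alpha-densely connected subnetworks."""
    # Level 2: a single edge is a subnetwork whose density equals its weight.
    level = {edge for edge, w in weights.items() if w >= alpha}
    found = set(level)
    size = 2
    while level and size < max_size:
        next_level = set()
        for sub in level:
            # Adding only adjacent nodes keeps every candidate connected.
            frontier = set().union(*(neighbors(u, weights) for u in sub)) - sub
            for v in frontier:
                candidate = sub | {v}
                if density(candidate, weights) >= alpha:
                    next_level.add(candidate)
        found |= next_level
        level = next_level
        size += 1
    return found

# Assumed edge assignment (a 4-cycle); only the four weights come from the slide.
weights = {
    frozenset("AB"): 0.9, frozenset("BC"): 0.8,
    frozenset("CD"): 0.6, frozenset("DA"): 0.4,
}
result = dense_connected_subnetworks(weights, alpha=0.5)
for sub in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(sub), round(density(sub, weights), 3))
```

Under this assumed layout the search keeps the edges A-B, B-C, C-D and the triple {A, B, C}, and rejects the full network because its density is 0.45.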


Classification

1. Marker computation: Feature space creation (marker = dimension)

2. Construct classifier using training data

3. Perform classification on test data

Cross-platform study: Marker computation and test data from different platforms


Experimental Results


Network Data

Confidence-scored PPI network [STRING, von Mering et al., NAR 2009]

• Edges reflect physical protein-protein interactions

• Confidence scores reflect the probability that the interaction is associated with a cellular phenomenon (and not an experimental artifact)

• Scoring system based on KEGG pathways

[Figure: example PPI network with edge confidence scores.]


Gene Expression Data

Colon cancer

• GSE8671, 32 patients / tissue pairs

• GSE10950, 24 patients / tissue pairs

• GSE6988, 123 samples across several cancer subtypes

Breast cancer

• GSE3494, 251 patients with different TP53 mutation status (wildtype vs. mutant)


Colon Cancer: Prediction

[Figure: AUC (y-axis, 0.6 to 1) versus number of subnetworks/genes (x-axis, 0 to 50) for GSE8671 >> GSE6988. Curves: SGM, GMI, NETCOVER, wDCB.]
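For readers unfamiliar with the metric on the y-axis, AUC can be computed as the fraction of (positive, negative) sample pairs that a classifier ranks correctly. The sketch below uses that standard pairwise definition; the function name and the example scores are hypothetical and unrelated to the reported results.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores for two cases (label 1) and two controls (label 0).
print(auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

In the example, three of the four case/control pairs are ordered correctly.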


Colon Cancer: Prognosis

[Figure: AUC (y-axis, 0.4 to 1) versus number of subnetworks/genes (x-axis, 0 to 50) for GSE8671 >> GSE6988, prognosis. Curves: SGM, GMI, NETCOVER, wDCB.]


Colon Cancer: Prognosis Accuracy

      8671→6988, Prognosis           10950→6988, Prognosis
K     SGM    GMI    NC     wDCB      SGM    GMI    NC     wDCB
1     0.57   0.57   0.51   0.56      0.57   0.68   N/A    0.47
5     0.74   0.62   0.74   0.6       0.63   0.81   N/A    0.68
10    0.76   0.77   0.74   0.88      0.57   0.77   N/A    0.74
20    0.72   0.62   0.77   0.83      0.61   0.79   N/A    0.85
30    0.65   0.74   0.83   0.88      0.63   0.81   N/A    0.85
40    0.67   0.79   0.83   0.90      0.78   0.85   N/A    0.89
50    0.74   0.77   0.81   0.92      0.76   0.85   N/A    0.91

Highlighting in the original slide marks the top values of previous methods and the top value of our method.


Breast Cancer: TP53 Wildtype vs. Mutant

[Figure: accuracy (y-axis, 0.7 to 0.9) versus number of subnetworks/genes (x-axis, 0 to 25) on GSE3494 (Miller et al.). Curves: SGM (mappable), GMI (mappable), wDCB (mappable), SPM (not mappable).]


Subnetwork Marker Statistics

        8671 Subnetworks               10950 Subnetworks
        # Subnetworks   Enrichment     # Subnetworks   Enrichment
GMI     806             0.38           755             0.34
NC      923             0.12           N/A             N/A
wDCB    282             0.76           216             0.74

GMI = Greedy Mutual Information (Chuang et al.)

NC = NetCover (Chowdhury et al.)

wDCB = weighted Density Constrained Biclustering

# Subnetworks = total number of subnetworks computed

Enrichment = enrichment rate of the top-50 markers


Top Markers in GSE8671

• Enriched with DNA replication initiation (p=6.39e-14), DNA metabolic process (p=6.15e-12)

• TP53, BRCA1: tumor suppressor genes

• Minichromosome maintenance (MCM) complex

• MCM2, MCM5: early markers for colon cancer (Burger et al., 2008)


Outlook / Acknowledgments

Outlook:

• Analyze subnetwork signatures

• ncRNA-protein interaction data

Acknowledgments:

• Mehmet Koyutürk

• David DesJardins, Google Inc.

• Lab for Mathematical and Computational Biology, UC Berkeley


Thanks for the attention!


Densely Connected Subnetworks: Properties

Let G = (V, E) be a network with edge weights w_e, e ∈ E.

• The density θ(G) of G is

$$\theta(G) := \frac{\sum_{e \in E} w_e}{\binom{|V|}{2}} = \frac{2 \cdot \sum_{e \in E} w_e}{|V|(|V| - 1)}$$

where $\binom{|V|}{2}$ is the number of possible edges in G.

• G is called α-dense if

$$\theta(G) \geq \alpha \geq 0.5$$

• An α-dense, connected network G is called α-densely connected.
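As a worked instance of the definition, take the four-node example from the search strategy slide, whose edges carry the weights 0.8, 0.9, 0.6, and 0.4. With |V| = 4 there are six possible edges, so:

```latex
\theta(G) = \frac{0.8 + 0.9 + 0.6 + 0.4}{\binom{4}{2}}
          = \frac{2 \cdot 2.7}{4 \cdot 3}
          = \frac{2.7}{6}
          = 0.45 < 0.5
```

Because α must be at least 0.5, this network is not α-dense for any admissible α, even though it is connected.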


Classifier Construction

1. Rank density constrained biclusters according to density significance

2. Keep only high-ranked subnetworks with little overlap

3. Feature space dimension = number of markers

4. SVM classification
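Steps 1 and 2 can be pictured as a greedy filter over a scored list of subnetworks. The slides do not say how overlap is measured or what cutoff is used, so the sketch below makes two labeled assumptions: overlap is the Jaccard index of the gene sets, and `max_overlap` and `top` are hypothetical parameters. The scores stand in for the density significance ranking.

```python
def select_markers(ranked, max_overlap=0.5, top=50):
    """Greedily keep high-scoring gene sets with small Jaccard overlap (assumed measure)."""
    kept = []
    # Visit candidates from best to worst score.
    for genes, score in sorted(ranked, key=lambda item: item[1], reverse=True):
        genes = frozenset(genes)
        # Keep a candidate only if it overlaps little with every marker kept so far.
        if all(len(genes & other) / len(genes | other) <= max_overlap for other in kept):
            kept.append(genes)
        if len(kept) == top:
            break
    return kept

# Hypothetical candidates: (gene set, significance score).
ranked = [
    ({"G1", "G2", "G3", "G4"}, 0.9),
    ({"G1", "G2", "G3", "G5"}, 0.8),
    ({"G4", "G5", "G6", "G7"}, 0.7),
]
selected = select_markers(ranked, max_overlap=0.5)
print([sorted(m) for m in selected])
```

Here the second candidate shares three of five genes with the first (Jaccard 0.6) and is dropped, while the third shares only G4 and is kept.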

[Figure: "Average Gene Expression Profile". Expression values 1.25, 1.5, 1.0, 1.25, 0.5, 0.0, 0.25 for Genes 1 to 7 are averaged over the genes of each subnetwork, giving Marker 1 = 1.25 for the subnetwork on G1 to G4 and Marker 2 = 0.5 for the subnetwork on G4 to G7.]
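The averaging shown in the figure turns each marker into one feature per sample. A minimal sketch follows, assuming the gene values are assigned to Genes 1 to 7 in the order they appear on the slide and that the two markers cover G1 to G4 and G4 to G7; the function name is hypothetical, and the SVM step is not shown.

```python
def marker_features(expression, markers):
    """Map one sample's gene expression values to one averaged value per marker."""
    return [sum(expression[g] for g in genes) / len(genes) for genes in markers]

# Values as they appear on the slide (assumed gene order).
sample = {
    "Gene 1": 1.25, "Gene 2": 1.5, "Gene 3": 1.0, "Gene 4": 1.25,
    "Gene 5": 0.5, "Gene 6": 0.0, "Gene 7": 0.25,
}
markers = [
    ["Gene 1", "Gene 2", "Gene 3", "Gene 4"],  # Marker 1
    ["Gene 4", "Gene 5", "Gene 6", "Gene 7"],  # Marker 2
]
features = marker_features(sample, markers)
print(features)  # [1.25, 0.5]
```

The resulting vector has one entry per marker, which matches "feature space dimension = number of markers"; a classifier would then be trained on one such vector per sample.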


Colon Cancer: Prediction Accuracy

      8671→6988                      10950→6988
K     SGM    GMI    NC     wDCB      SGM    GMI    NC     wDCB
1     0.56   0.84   0.72   0.84      0.63   0.37   N/A    0.77
5     0.73   0.72   0.72   0.82      0.82   0.68   N/A    0.86
10    0.76   0.76   0.83   0.85      0.82   0.81   N/A    0.88
20    0.80   0.84   0.86   0.89      0.84   0.83   N/A    0.89
30    0.80   0.83   0.84   0.91      0.83   0.85   N/A    0.85
40    0.85   0.85   0.87   0.90      0.84   0.84   N/A    0.89
50    0.85   0.84   0.85   0.93      0.81   0.82   N/A    0.89

Highlighting in the original slide marks the top values of previous methods and of our method.
