inferring cancer subnetwork markers

Introduction Methods Experimental Results Inferring Cancer Subnetwork Markers using Density-Constrained Biclustering Phuong Dao *,1 , Recep Colak *,3 Raheleh Salari 1 , Flavia Moser 4 , Elai Davicioni 5 Alexander Schönhuth ,2 , Martin Ester 1,1 School of Computing Science, Simon Fraser University, Canada 2 Centrum Wiskunde & Informatica, Amsterdam, Netherlands 3 Department of Computing Science, University of Toronto, Canada 4 Center for Disease Control, University of British Columbia 5 GenomeDX Biosciences Inc. * : Joint first authors, : Joint corresponding, last authors


Page 1: Inferring Cancer Subnetwork Markers

Introduction Methods Experimental Results

Inferring Cancer Subnetwork Markers using Density-Constrained Biclustering

Phuong Dao∗,1, Recep Colak∗,3

Raheleh Salari1, Flavia Moser4, Elai Davicioni5

Alexander Schönhuth†,2, Martin Ester1,†

1School of Computing Science, Simon Fraser University, Canada

2Centrum Wiskunde & Informatica, Amsterdam, Netherlands

3Department of Computing Science, University of Toronto, Canada

4Center for Disease Control, University of British Columbia

5GenomeDX Biosciences Inc.

∗: Joint first authors, †: Joint corresponding, last authors

Page 2

Introduction: Personalized Medicine

• Determination of disease status based on patient genetics/genomics

• Goal: Specific, individual choice of treatment

• Necessary: Reliable disease markers

• Monogenic: Each marker is a single gene

• Multigenic: Each marker is a set of genes


Page 4

Single Gene Markers

[Figure: expression of Genes 1 to 6 in Cases 1 to 3 and Controls 1 to 3, with the genes grouped into "Differentially Expressed" and "Non-Differentially Expressed"]

Caveat: Single-gene markers vary considerably across studies

Page 5

Marker Selection: Multigenic Traits

[Figure: gene expression profiles of Genes 1 to 4 in Cases 1 to 3 and Controls 1 to 3, next to an interaction/association network over G1 to G4 with edge confidences between 0.75 and 0.95]

Solution: Differentially expressed genes participating in the same pathway [Chuang et al., 2007], [Chowdhury et al., 2010]

Page 6

Our Approach

Each of our subnetwork markers:

• is a densely connected subnetwork
+ Disease-related genes have more protein-protein interactions (PPIs) than expected [Goh et al., PNAS (2007)]

• contains genes that are differentially expressed in a subset of samples
+ Cancer tumors vary greatly in phenotype, even when they belong to the same (sub)type [Hampton et al., GR (2009)]

Page 7

Density-Constrained Biclusters

Definition: G is called α-dense if $\frac{\sum_{e \in E} w_e}{\binom{|V|}{2}} \ge \alpha \ge 0.5$.

[Figure: weighted PPI network together with binary (0/1) differential-expression matrices for samples S1 to S3, showing two example markers: genes G1 to G4 and genes G4 to G7]

Our markers are α-densely connected subnetworks of genes that are differentially expressed in a common subset of at least k patients (here: k = 2).
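The marker definition combines a network condition (α-dense and connected) with an expression condition (a common set of at least k samples). A minimal Python sketch of a membership test, assuming edge weights are stored in a dict keyed by frozenset gene pairs and that differential expression has already been binarized into the set of samples in which each gene is called differentially expressed, as in the 0/1 matrices of the figure. The names `density`, `is_connected`, `support`, `is_marker` and the example data are hypothetical, not the authors' code.

```python
from collections import deque
from itertools import combinations
from math import comb


def density(genes, weights):
    """Sum of edge weights divided by the number of possible gene pairs."""
    genes = list(genes)
    if len(genes) < 2:
        return 0.0
    total = sum(weights.get(frozenset(pair), 0.0) for pair in combinations(genes, 2))
    return total / comb(len(genes), 2)


def is_connected(genes, weights):
    """Breadth-first search restricted to the subnetwork induced by `genes`."""
    genes = set(genes)
    if not genes:
        return False
    start = next(iter(genes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in genes - seen:
            if weights.get(frozenset((u, v)), 0.0) > 0:
                seen.add(v)
                queue.append(v)
    return seen == genes


def support(genes, diff_expr):
    """Samples in which every gene of the set is differentially expressed."""
    if not genes:
        return set()
    return set.intersection(*(set(diff_expr.get(g, ())) for g in genes))


def is_marker(genes, weights, diff_expr, alpha, k):
    """alpha-densely connected AND differentially expressed in >= k common samples."""
    if alpha < 0.5:
        raise ValueError("the definition requires alpha >= 0.5")
    return (
        density(genes, weights) >= alpha
        and is_connected(genes, weights)
        and len(support(genes, diff_expr)) >= k
    )


# Hypothetical example data, loosely modeled on the figure.
WEIGHTS = {
    frozenset(("G1", "G2")): 0.8,
    frozenset(("G1", "G3")): 0.75,
    frozenset(("G2", "G3")): 0.9,
    frozenset(("G3", "G4")): 0.85,
    frozenset(("G2", "G4")): 0.95,
}
DIFF_EXPR = {
    "G1": {"S1", "S2"},
    "G2": {"S1", "S2", "S3"},
    "G3": {"S1", "S2"},
    "G4": {"S1", "S2", "S3"},
    "G5": {"S3"},
}

if __name__ == "__main__":
    print(is_marker({"G1", "G2", "G3", "G4"}, WEIGHTS, DIFF_EXPR, alpha=0.5, k=2))  # True
    print(is_marker({"G1", "G2", "G3", "G4", "G5"}, WEIGHTS, DIFF_EXPR, alpha=0.5, k=2))  # False
```

Adding G5 fails twice over: it has no edges into the subnetwork, which pulls the density below 0.5, and it shares no sample with G1 and G3.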

Page 8

Methods

Page 9

Density-Constrained Biclustering: Search Strategy

Theorem: Every α-densely connected network of size n contains an α-densely connected subnetwork of size n − 1.
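The density half of the theorem can be checked with a short averaging argument. This is a sketch of that one step, not the authors' proof: it shows that deleting a vertex of minimum weighted degree never lowers the density, while the claim that the remaining subnetwork is still connected is not covered here. Notation (introduced for this sketch): W is the total edge weight, n = |V| ≥ 3, and d(v) is the weighted degree of v, i.e. the sum of the weights of the edges incident to v.

```latex
\begin{align*}
\sum_{v \in V} d(v) = 2W
  &\;\Longrightarrow\; d(v^{*}) := \min_{v \in V} d(v) \le \frac{2W}{n},\\
W' := W - d(v^{*}) &\ge W - \frac{2W}{n} = \frac{(n-2)\,W}{n},\\
\theta(G - v^{*}) = \frac{W'}{\binom{n-1}{2}} = \frac{2W'}{(n-1)(n-2)}
  &\ge \frac{2(n-2)\,W}{n\,(n-1)(n-2)} = \frac{2W}{n(n-1)} = \theta(G).
\end{align*}
```

So if θ(G) ≥ α, then θ(G − v*) ≥ α as well.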

[Figure: candidate subnetworks over genes A, B, C and D, leading to a maximal wDCB; rejected candidates are labeled "Not Connected" or "Not Dense". For example, the four-gene subnetwork with edge weights 0.8, 0.9, 0.6 and 0.4 has density (0.8 + 0.9 + 0.6 + 0.4) / 6 = 0.45.]

Search Strategy: Breadth-first search.
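Why a breadth-first search is enough: by the theorem, every α-densely connected gene set of size n arises from an α-densely connected set of size n − 1 by adding one gene, and that gene must be adjacent to the smaller set because the larger one is connected. The sample condition is anti-monotone: removing genes can only enlarge the set of samples in which all remaining genes are differentially expressed. A level-wise search that starts from single edges and extends each surviving set by one neighboring gene therefore reaches every candidate. The Python sketch below assumes a dict of frozenset edges to weights and a dict of gene to sample set; `enumerate_biclusters`, `maximal_only`, `max_size` and the example data are hypothetical, and any further pruning or ranking used by the authors is not shown.

```python
from itertools import combinations
from math import comb


def density(genes, weights):
    """Sum of edge weights divided by the number of possible gene pairs."""
    total = sum(weights.get(frozenset(p), 0.0) for p in combinations(genes, 2))
    return total / comb(len(genes), 2)


def support(genes, diff_expr):
    """Samples in which every gene of the set is differentially expressed."""
    return set.intersection(*(set(diff_expr.get(g, ())) for g in genes))


def enumerate_biclusters(weights, diff_expr, alpha, k, max_size=10):
    """Level-wise (breadth-first) enumeration of alpha-densely connected gene
    sets that are differentially expressed in at least k common samples."""
    adj = {}
    for edge, w in weights.items():
        if w > 0:
            u, v = tuple(edge)
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)

    # Level 2: single edges; a gene pair is alpha-dense iff its weight is >= alpha.
    level = {
        frozenset(e) for e, w in weights.items()
        if w >= alpha and len(support(e, diff_expr)) >= k
    }
    found = set(level)
    size = 2
    while level and size < max_size:
        next_level = set()
        for genes in level:
            # Only neighbors are tried, so every extension stays connected.
            for v in set().union(*(adj[g] for g in genes)) - genes:
                bigger = genes | {v}
                if bigger in next_level:
                    continue
                if density(bigger, weights) >= alpha and len(support(bigger, diff_expr)) >= k:
                    next_level.add(bigger)
        found |= next_level
        level, size = next_level, size + 1
    return found


def maximal_only(sets):
    """Keep the sets that are not strictly contained in another found set."""
    return {s for s in sets if not any(s < t for t in sets)}


# Hypothetical example: a dense triangle A-B-C plus a weak edge to D.
EXAMPLE_WEIGHTS = {
    frozenset(("A", "B")): 0.9,
    frozenset(("B", "C")): 0.8,
    frozenset(("A", "C")): 0.7,
    frozenset(("C", "D")): 0.3,
}
EXAMPLE_DIFF_EXPR = {
    "A": {"s1", "s2", "s3"},
    "B": {"s1", "s2", "s3"},
    "C": {"s1", "s2", "s3"},
    "D": {"s1"},
}

if __name__ == "__main__":
    found = enumerate_biclusters(EXAMPLE_WEIGHTS, EXAMPLE_DIFF_EXPR, alpha=0.5, k=2)
    print(sorted(sorted(s) for s in maximal_only(found)))  # [['A', 'B', 'C']]
```

Adding D to the triangle gives density (0.9 + 0.8 + 0.7 + 0.3) / 6 = 0.45, so the search stops at {A, B, C}, the only maximal set here.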

Page 10

Classification

1. Marker computation: feature-space creation (each marker is one dimension)

2. Construct a classifier using the training data

3. Classify the test data

Cross-platform study: marker computation and test data come from different platforms
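Step 1 turns each marker into one feature dimension. The aggregation is not spelled out here; the classifier-construction slide refers to an "Average Gene Expression Profile", so this Python sketch assumes that a sample's value for a marker is the mean expression of the marker's genes in that sample. The function name `marker_features` and the example values are hypothetical.

```python
from statistics import mean


def marker_features(expression, markers):
    """One feature vector per sample, one dimension per marker.

    expression: {sample: {gene: expression value}}
    markers:    list of gene sets, one per marker
    Assumed aggregation: mean expression of the marker's genes in the sample.
    """
    return {
        sample: [mean(profile[g] for g in marker) for marker in markers]
        for sample, profile in expression.items()
    }


if __name__ == "__main__":
    expression = {
        "Case 1": {"G1": 1.0, "G2": 1.5, "G3": 0.5},
        "Control 1": {"G1": 0.0, "G2": 0.5, "G3": 0.25},
    }
    markers = [{"G1", "G2"}, {"G2", "G3"}]
    print(marker_features(expression, markers))
    # {'Case 1': [1.25, 1.0], 'Control 1': [0.25, 0.375]}
```

Because these features depend only on which genes belong to each marker, markers computed on one platform can presumably be evaluated on another platform's data as long as it measures those genes.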

Page 11

Experimental Results

Page 12

Network Data

Confidence-scored PPI network [STRING, von Mering et al., NAR 2009]

• Edges reflect physical protein-protein interactions

• Confidence scores reflect the probability that the interaction is associated with a cellular phenomenon (and not an experimental artifact)

• Scoring system based on KEGG pathways

[Figure: example PPI network with confidence scores between 0.25 and 0.95 on its edges]

Page 13

Gene Expression Data

Colon cancer

• GSE8671, 32 patients / tissue pairs

• GSE10950, 24 patients / tissue pairs

• GSE6988, 123 samples across several cancer subtypes

Breast cancer

• GSE3494, 251 patients with different TP53 mutation status (wild type vs. mutant)

Page 14

Colon Cancer: Prediction

[Figure: AUC vs. # subnetworks/genes (0 to 50), GSE8671 >> GSE6988; methods: SGM, GMI, NETCOVER, wDCB]

Page 15

Colon Cancer: Prognosis

[Figure: AUC vs. # subnetworks/genes (0 to 50), GSE8671 >> GSE6988, prognosis; methods: SGM, GMI, NETCOVER, wDCB]

Page 16

Colon Cancer: Prognosis Accuracy

K  | 8671→6988, Prognosis      | 10950→6988, Prognosis
   | SGM   GMI   NC    wDCB    | SGM   GMI   NC    wDCB
1  | 0.57  0.57  0.51  0.56    | 0.57  0.68  N/A   0.47
5  | 0.74  0.62  0.74  0.6     | 0.63  0.81  N/A   0.68
10 | 0.76  0.77  0.74  0.88    | 0.57  0.77  N/A   0.74
20 | 0.72  0.62  0.77  0.83    | 0.61  0.79  N/A   0.85
30 | 0.65  0.74  0.83  0.88    | 0.63  0.81  N/A   0.85
40 | 0.67  0.79  0.83  0.90    | 0.78  0.85  N/A   0.89
50 | 0.74  0.77  0.81  0.92    | 0.76  0.85  N/A   0.91

Legend: top values of previous methods; top value of our method

Page 17

Breast Cancer: TP53 Wild Type vs. Mutant

[Figure: accuracy vs. # subnetworks/genes (0 to 25) on GSE3494 (Miller et al.); methods: SGM (mappable), GMI (mappable), wDCB (mappable), SPM (not mappable)]

Page 18

Subnetwork Marker Statistics

Method | 8671 Subnetworks          | 10950 Subnetworks
       | # Subnetworks  Enrichment | # Subnetworks  Enrichment
GMI    | 806            0.38       | 755            0.34
NC     | 923            0.12       | N/A            N/A
wDCB   | 282            0.76       | 216            0.74

GMI = Greedy Mutual Information (Chuang et al.)

NC = NetCover (Chowdhury et al.)

wDCB = weighted Density-Constrained Biclustering

# Subnetworks = total number of subnetworks computed

Enrichment = enrichment rate of the top-50 markers

Page 19

Top Markers in GSE8671

• Enriched with DNA replication initiation (p = 6.39e-14) and DNA metabolic process (p = 6.15e-12)

• TP53, BRCA1: tumor suppressor genes

• Minichromosome maintenance (MCM) complex

• MCM2, MCM5: early markers for colon cancer (Burger et al., 2008)

Page 20

Outlook / Acknowledgments

Outlook:

• Analyze subnetwork signatures

• ncRNA-protein interaction data

Acknowledgments:

• Mehmet Koyutürk

• David DesJardins, Google Inc.

• Lab for Mathematical and Computational Biology, UC Berkeley

Page 21

Thank you for your attention!

Page 22

Densely Connected Subnetworks: Properties

Let G = (V, E) be a network with edge weights $w_e$, $e \in E$.

• The density θ(G) of G is

$$\theta(G) := \frac{\sum_{e \in E} w_e}{\binom{|V|}{2}} = \frac{2 \sum_{e \in E} w_e}{|V|\,(|V| - 1)},$$

where $\binom{|V|}{2}$ is the number of possible edges in G.

• G is called α-dense if θ(G) ≥ α ≥ 0.5.

• An α-dense, connected network G is called α-denselyconnected.
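The two expressions for θ(G) agree because C(|V|, 2) = |V|(|V| − 1)/2. A small Python sketch that evaluates both forms and reproduces the density 0.45 of the four-gene example on the search-strategy slide. Since that figure's exact edge layout is not recoverable, the assignment of the weights 0.8, 0.9, 0.6 and 0.4 to gene pairs below is hypothetical; the density does not depend on it. Function names are illustrative.

```python
from itertools import combinations
from math import comb


def total_weight(nodes, weights):
    """Sum of w_e over all edges among `nodes` (missing pairs count as 0)."""
    return sum(weights.get(frozenset(p), 0.0) for p in combinations(nodes, 2))


def density(nodes, weights):
    """theta(G) = sum_e w_e / C(|V|, 2)."""
    return total_weight(nodes, weights) / comb(len(nodes), 2)


def density_closed_form(nodes, weights):
    """Equivalent form: 2 * sum_e w_e / (|V| (|V| - 1))."""
    n = len(nodes)
    return 2 * total_weight(nodes, weights) / (n * (n - 1))


def is_alpha_dense(nodes, weights, alpha):
    """G is alpha-dense if theta(G) >= alpha >= 0.5."""
    if alpha < 0.5:
        raise ValueError("alpha must be at least 0.5")
    return density(nodes, weights) >= alpha


# Hypothetical placement of the example weights on four genes.
EXAMPLE_NODES = ["A", "B", "C", "D"]
EXAMPLE_WEIGHTS = {
    frozenset(("A", "B")): 0.8,
    frozenset(("B", "D")): 0.9,
    frozenset(("A", "C")): 0.6,
    frozenset(("C", "D")): 0.4,
}

if __name__ == "__main__":
    print(round(density(EXAMPLE_NODES, EXAMPLE_WEIGHTS), 2))  # 0.45
    print(is_alpha_dense(EXAMPLE_NODES, EXAMPLE_WEIGHTS, 0.5))  # False
```

The four existing edges are spread over six possible pairs, which is why a subnetwork with fairly high individual confidences can still fail the α ≥ 0.5 threshold.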

Page 23

Classifier Construction

1. Rank density-constrained biclusters according to density significance

2. Keep only high-ranked subnetworks with little overlap

3. Feature space dimension = number of markers

4. SVM classification
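Step 2 can be sketched as a greedy filter over the ranked list. The slides do not say how overlap is measured or what counts as "little"; this Python sketch assumes Jaccard similarity between gene sets with a hypothetical threshold `max_overlap` and a hypothetical cap `limit`, and it takes the ranking by density significance as precomputed input. The kept markers would then define the feature dimensions of step 3; the SVM of step 4 is not shown.

```python
def select_markers(ranked_markers, max_overlap=0.5, limit=50):
    """Greedy, best-first selection of markers with little mutual overlap.

    ranked_markers: gene sets sorted from best to worst (ranking assumed given)
    max_overlap:    hypothetical Jaccard threshold for "little overlap"
    limit:          hypothetical cap on the number of markers kept
    """
    selected = []
    for genes in ranked_markers:
        genes = frozenset(genes)
        if not genes:
            continue
        # Keep the marker only if it overlaps little with every marker kept so far.
        if all(len(genes & kept) / len(genes | kept) <= max_overlap for kept in selected):
            selected.append(genes)
            if len(selected) == limit:
                break
    return selected


if __name__ == "__main__":
    ranked = [{"A", "B", "C"}, {"A", "B", "C", "D"}, {"E", "F"}, {"C", "E"}]
    for marker in select_markers(ranked):
        print(sorted(marker))
    # ['A', 'B', 'C'], then ['E', 'F'], then ['C', 'E']
```

Here {A, B, C, D} is dropped because it shares three of its four genes with the higher-ranked {A, B, C} (Jaccard 0.75).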

[Figure: average gene expression profiles of Genes 1 to 7, summarized per marker (Marker 1, Marker 2) over two example subnetworks]

Page 24

Colon Cancer: Prediction Accuracy

K  | 8671→6988                 | 10950→6988
   | SGM   GMI   NC    wDCB    | SGM   GMI   NC    wDCB
1  | 0.56  0.84  0.72  0.84    | 0.63  0.37  N/A   0.77
5  | 0.73  0.72  0.72  0.82    | 0.82  0.68  N/A   0.86
10 | 0.76  0.76  0.83  0.85    | 0.82  0.81  N/A   0.88
20 | 0.80  0.84  0.86  0.89    | 0.84  0.83  N/A   0.89
30 | 0.80  0.83  0.84  0.91    | 0.83  0.85  N/A   0.85
40 | 0.85  0.85  0.87  0.90    | 0.84  0.84  N/A   0.89
50 | 0.85  0.84  0.85  0.93    | 0.81  0.82  N/A   0.89

Legend: top values of previous methods and of our method