Inferring Cancer Subnetwork Markers using Density-Constrained Biclustering
Phuong Dao∗,1, Recep Colak∗,3
Raheleh Salari1, Flavia Moser4, Elai Davicioni5
Alexander Schönhuth†,2, Martin Ester1,†
1School of Computing Science, Simon Fraser University, Canada
2Centrum Wiskunde & Informatica, Amsterdam, Netherlands
3Department of Computing Science, University of Toronto, Canada
4Center for Disease Control, University of British Columbia
5GenomeDX Biosciences Inc.
∗: Joint first authors, †: Joint corresponding, last authors
Introduction: Personalized Medicine
• Determination of disease status based on patient genetics/genomics
• Goal: Specific, individual choice of treatment
• Necessary: Reliable disease markers
• Monogenic: Each marker is a single gene
• Multigenic: Each marker is a set of genes
Single Gene Markers
[Figure: expression of Genes 1 to 6 across Cases 1 to 3 and Controls 1 to 3, in two panels; legend: differentially expressed vs. non-differentially expressed]
Caveat: Single gene markers vary significantly across different studies
Marker Selection: Multigenic Traits
[Figure: gene expression profiles of Genes 1 to 4 across Cases 1 to 3 and Controls 1 to 3, next to an interaction/association network on G1 to G4 with confidence-weighted edges]
Solution: Differentially expressed genes participating in the same pathway [Chuang et al., 2007], [Chowdhury et al., 2010]
Our Approach
Each of our subnetwork markers:
• is a densely connected subnetwork
  + Disease-related genes have more PPI interactions than expected [Goh et al., PNAS (2007)]
• contains genes that are differentially expressed in a subset of samples
  + Cancer tumors vary greatly in phenotype, even when they belong to the same (sub)type [Hampton et al., GR (2009)]
Density-Constrained Biclusters
Definition: G is called α-dense if $\frac{\sum_{e \in E} w_e}{\binom{|V|}{2}} \geq \alpha \geq 0.5$.
[Figure: binary differential-expression matrices (genes G1 to G4 and G4 to G7 across samples S1 to S3) next to the corresponding confidence-weighted subnetworks]
Our markers are α-densely connected subnetworks of genes that are differentially expressed in a subset of at least k patients (here: k = 2).
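A minimal Python sketch of the sample-support half of this definition, assuming the strict bicluster reading that every gene of a marker must be differentially expressed in every one of at least k shared samples. The dictionary `de` (gene to set of samples where it is differentially expressed) is a hypothetical encoding of the binary matrices above; the names `common_samples` and `has_support` are ours, and the density condition is checked separately.

```python
def common_samples(genes, de):
    """Return the samples in which every gene in `genes` is differentially expressed.

    `de` maps each gene to the set of samples where it is differentially
    expressed (a 1 in the binary matrix); genes missing from `de` have none.
    """
    genes = list(genes)
    if not genes:
        raise ValueError("a marker needs at least one gene")
    shared = set(de.get(genes[0], set()))
    for gene in genes[1:]:
        shared &= de.get(gene, set())
    return shared


def has_support(genes, de, k):
    """True if the genes are jointly differentially expressed in >= k samples."""
    return len(common_samples(genes, de)) >= k


# Hypothetical toy data with samples S1..S3.
de = {
    "G1": {"S1", "S2"},
    "G2": {"S1", "S2", "S3"},
    "G3": {"S2", "S3"},
}
print(sorted(common_samples({"G1", "G2"}, de)))  # ['S1', 'S2']
print(has_support({"G1", "G2", "G3"}, de, k=2))  # False: only S2 is shared
```

Because dropping a gene can only enlarge the shared sample set, this support condition is anti-monotone: every sub-marker of a supported marker is itself supported.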
Methods
Density-Constrained Biclustering: Search Strategy
Theorem: Every α-densely connected network of size n contains an α-densely connected subnetwork of size n − 1.
[Figure: search over subnetworks of the four-node network A, B, C, D with edge weights 0.8, 0.9, 0.6, and 0.4; candidate subnetworks are labeled wDCB, maximal wDCB, "Not Connected", or "Not Dense". For the full network, density = (0.8 + 0.9 + 0.6 + 0.4) / 6 = 0.45 < 0.5, so it is not dense.]
Search Strategy: Breadth-first search.
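The theorem is what makes a level-wise search complete: applied repeatedly, it shows that every α-densely connected subnetwork can be reached from a single edge of weight at least α by adding one neighboring node at a time, with every intermediate subnetwork α-densely connected as well. Since removing a gene never shrinks the set of shared samples, the support condition also holds along that chain. The Python sketch below illustrates such a breadth-first (level-wise) enumeration under these assumptions; it is a hypothetical toy, not the authors' implementation, it enumerates exhaustively (so it only suits small networks), and the names `weights`, `de`, `alpha`, `k`, and the edge weight assignment in the example are ours.

```python
def level_wise_search(weights, de, alpha, k):
    """Enumerate wDCBs (alpha-densely connected gene sets with >= k shared
    differentially expressed samples) breadth-first, one node per level.

    weights: dict frozenset({u, v}) -> edge weight in [0, 1]
    de:      dict gene -> samples where the gene is differentially expressed
    Returns (all wDCBs found, wDCBs that no one-node extension keeps valid).
    """
    if alpha < 0.5:
        raise ValueError("the density definition requires alpha >= 0.5")

    neighbors = {}
    for edge in weights:
        u, v = tuple(edge)
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    def density(nodes):
        total = sum(w for e, w in weights.items() if e <= nodes)
        n = len(nodes)
        return 2 * total / (n * (n - 1))

    def supported(nodes):
        shared = set.intersection(*(set(de.get(g, ())) for g in nodes))
        return len(shared) >= k

    def valid(nodes):
        # Connectivity is implicit: we start from edges and only add neighbors.
        return density(nodes) >= alpha and supported(nodes)

    level = {edge for edge in weights if valid(edge)}  # size-2 subnetworks
    found, maximal = set(), set()
    while level:
        found |= level
        next_level = set()
        for sub in level:
            extended = False
            for u in sub:
                for v in neighbors[u] - sub:
                    candidate = sub | {v}
                    if valid(candidate):
                        next_level.add(candidate)
                        extended = True
            if not extended:
                maximal.add(sub)
        level = next_level
    return found, maximal


# Hypothetical assignment of the figure's four weights to edges.
w = {
    frozenset("AB"): 0.8, frozenset("AD"): 0.9,
    frozenset("BC"): 0.6, frozenset("CD"): 0.4,
}
de = {g: {"S1", "S2"} for g in "ABCD"}
found, maximal = level_wise_search(w, de, alpha=0.5, k=2)
print(sorted("".join(sorted(s)) for s in maximal))  # ['ABD', 'BC']
```

Here the edge C-D (0.4) is never a seed, {A, B, C} (density 1.4 / 3 ≈ 0.47) is rejected, and the full network (0.45) is rejected, so {A, B, D} and {B, C} are the subnetworks that cannot be extended.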
Classification
1. Marker computation: Feature space creation (marker = dimension)
2. Construct classifier using training data
3. Perform classification on test data
Cross-platform study: Marker computation and test data come from different platforms
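Steps 2 and 3 can be illustrated with a deliberately simple stand-in: the slides use an SVM, but the Python sketch below swaps in a nearest-centroid classifier so that it runs with the standard library alone. It assumes each sample has already been turned into one feature value per marker (step 1); the function names, toy feature vectors, and the labels "tumor" and "normal" are hypothetical.

```python
from math import dist


def train_nearest_centroid(features, labels):
    """Step 2: learn one mean feature vector (centroid) per class from training data."""
    sums, counts = {}, {}
    for x, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Step 3: assign each test sample to the class with the closest centroid (Euclidean)."""
    return [min(centroids, key=lambda label: dist(x, centroids[label])) for x in features]


# Hypothetical data: each row holds one value per marker (here two markers).
train_x = [[1.3, 0.4], [1.1, 0.6], [0.2, 1.2], [0.4, 1.0]]
train_y = ["tumor", "tumor", "normal", "normal"]
test_x = [[1.0, 0.5], [0.3, 0.9]]

centroids = train_nearest_centroid(train_x, train_y)
print(classify(centroids, test_x))  # ['tumor', 'normal']
```

In a cross-platform run, `train_x` would presumably come from one data set and `test_x` from another (e.g. reading "8671→6988" in the result tables as training → test); the classifier itself only ever sees per-marker features.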
Experimental Results
Network Data
Confidence-scored PPI network [STRING, von Mering et al., NAR 2009]
• Edges reflect physical protein-protein interactions
• Confidence scores reflect the probability that the interaction is associated with a cellular phenomenon (and not an experimental artifact)
• Scoring system based on KEGG pathways
Gene Expression Data
Colon cancer
• GSE8671, 32 patients / tissue pairs
• GSE10950, 24 patients / tissue pairs
• GSE6988, 123 samples across several cancer subtypes
Breast cancer
• GSE3494, 251 patients of differing TP53 mutation status (wildtype vs. mutant)
Colon Cancer: Prediction
[Figure: AUC vs. # subnetworks/genes, GSE8671 → GSE6988; methods: SGM, GMI, NetCover, wDCB]
Colon Cancer: Prognosis
[Figure: AUC vs. # subnetworks/genes, GSE8671 → GSE6988, prognosis; methods: SGM, GMI, NetCover, wDCB]
Colon Cancer: Prognosis Accuracy
     8671→6988, Prognosis        10950→6988, Prognosis
K    SGM   GMI   NC    wDCB      SGM   GMI   NC    wDCB
1    0.57  0.57  0.51  0.56      0.57  0.68  N/A   0.47
5    0.74  0.62  0.74  0.6       0.63  0.81  N/A   0.68
10   0.76  0.77  0.74  0.88      0.57  0.77  N/A   0.74
20   0.72  0.62  0.77  0.83      0.61  0.79  N/A   0.85
30   0.65  0.74  0.83  0.88      0.63  0.81  N/A   0.85
40   0.67  0.79  0.83  0.90      0.78  0.85  N/A   0.89
50   0.74  0.77  0.81  0.92      0.76  0.85  N/A   0.91
Legend: top values of previous methods; top value of our method
Breast Cancer: TP53 Wildtype vs. Mutant
[Figure: accuracy vs. # subnetworks/genes on GSE3494 (Miller et al.); methods: SGM (mappable), GMI (mappable), wDCB (mappable), SPM (not mappable)]
Subnetwork Marker Statistics
         8671 Subnetworks              10950 Subnetworks
Method   # Subnetworks   Enrichment    # Subnetworks   Enrichment
GMI      806             0.38          755             0.34
NC       923             0.12          N/A             N/A
wDCB     282             0.76          216             0.74

GMI = Greedy Mutual Information (Chuang et al.)
NC = NetCover (Chowdhury et al.)
wDCB = weighted Density-Constrained Biclustering
# Subnetworks = total number of subnetworks computed
Enrichment = enrichment rate of the top-50 markers
Top Markers in GSE8671
• Enriched with DNA replication initiation (p=6.39e-14) and DNA metabolic process (p=6.15e-12)
• TP53, BRCA1: tumor suppressor genes
• Minichromosome maintenance (MCM) complex
• MCM2, MCM5: early markers for colon cancer (Burger et al., 2008)
Outlook / Acknowledgments
Outlook:
• Analyze subnetwork signatures
• ncRNA-protein interaction data
Acknowledgments:
• Mehmet Koyutürk
• David DesJardins, Google Inc.
• Lab for Mathematical and Computational Biology, UC Berkeley
Thanks for your attention!
Densely Connected Subnetworks: Properties
Let $G = (V, E)$ be a network with edge weights $w_e$, $e \in E$.
• The density $\theta(G)$ of $G$ is
$$\theta(G) := \frac{\sum_{e \in E} w_e}{\binom{|V|}{2}} = \frac{2 \sum_{e \in E} w_e}{|V|\,(|V| - 1)},$$
where $\binom{|V|}{2}$ is the number of possible edges in $G$.
• $G$ is called α-dense if $\theta(G) \geq \alpha \geq 0.5$.
• An α-dense, connected network $G$ is called α-densely connected.
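The density and α-density definitions translate directly into code. This minimal Python sketch stores edge weights in a dictionary keyed by two-node `frozenset`s and passes node sets as `frozenset`s (our choice of representation, not the authors'); the four weights reuse the search-strategy example (0.8, 0.9, 0.6, 0.4) with a hypothetical assignment to edges.

```python
def density(nodes, weights):
    """theta(G) = sum of edge weights / C(|V|, 2) = 2 * sum(w_e) / (|V| * (|V| - 1))."""
    n = len(nodes)
    if n < 2:
        raise ValueError("density is defined here for networks with at least two nodes")
    total = sum(w for edge, w in weights.items() if edge <= nodes)
    return 2 * total / (n * (n - 1))


def is_connected(nodes, weights):
    """Depth-first traversal restricted to edges with both endpoints in `nodes`."""
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for edge in weights:
            if u in edge and edge <= nodes:
                (v,) = edge - {u}
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return seen == set(nodes)


def is_alpha_densely_connected(nodes, weights, alpha):
    """alpha-dense (theta(G) >= alpha >= 0.5) and connected."""
    if alpha < 0.5:
        raise ValueError("the definition requires alpha >= 0.5")
    return density(nodes, weights) >= alpha and is_connected(nodes, weights)


weights = {
    frozenset("AB"): 0.8, frozenset("AD"): 0.9,
    frozenset("BC"): 0.6, frozenset("CD"): 0.4,
}
print(round(density(frozenset("ABCD"), weights), 2))                    # 0.45
print(is_alpha_densely_connected(frozenset("ABD"), weights, alpha=0.5))  # True
```

With this assignment, the subnetwork {A, B, D} has density 2 · 1.7 / 6 ≈ 0.57, so it passes for α = 0.5, while the full network (0.45) does not.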
Classifier Construction
1. Rank density-constrained biclusters according to density significance
2. Keep only high-ranked subnetworks with little overlap
3. Feature space dimension = number of markers
4. SVM classification
[Figure: average gene expression profile as marker feature. Genes 1 to 7 form Marker 1 (G1 to G4) and Marker 2 (G4 to G7); each marker's feature value is the average expression of its genes (1.25 and 0.5 in the example).]
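Steps 1 to 3 can be sketched in Python as below. The slides do not say how overlap is measured, so the Jaccard index and the threshold `max_overlap` are our assumptions, as is taking the candidate list as already ranked by density significance. Each kept marker contributes one feature: the average expression of its genes in a sample, as in the example figure. The expression values are illustrative, chosen so that the two marker averages come out to 1.25 and 0.5.

```python
def select_markers(ranked_markers, max_overlap):
    """Steps 1-2: walk the candidates best-first and keep a gene set only if its
    Jaccard overlap with every marker kept so far is at most `max_overlap`."""
    kept = []
    for genes in ranked_markers:
        if all(len(genes & m) / len(genes | m) <= max_overlap for m in kept):
            kept.append(genes)
    return kept


def marker_features(expression, markers):
    """Step 3: one feature per marker, the mean expression of the marker's genes."""
    return [sum(expression[g] for g in m) / len(m) for m in markers]


# Hypothetical candidates, already ranked by density significance (best first).
ranked = [
    frozenset({"G1", "G2", "G3", "G4"}),
    frozenset({"G1", "G2", "G3"}),        # Jaccard 3/4 with the first: dropped
    frozenset({"G4", "G5", "G6", "G7"}),  # Jaccard 1/7 with the first: kept
]
markers = select_markers(ranked, max_overlap=0.2)

# Illustrative expression values for one sample.
sample = {"G1": 1.25, "G2": 1.5, "G3": 1.0, "G4": 1.25, "G5": 0.5, "G6": 0.0, "G7": 0.25}
print(marker_features(sample, markers))  # [1.25, 0.5]
```

Averaging makes the feature space size equal to the number of kept markers rather than the number of genes, which is what step 3 states.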
Colon Cancer: Prediction Accuracy
     8671→6988                   10950→6988
K    SGM   GMI   NC    wDCB      SGM   GMI   NC    wDCB
1    0.56  0.84  0.72  0.84      0.63  0.37  N/A   0.77
5    0.73  0.72  0.72  0.82      0.82  0.68  N/A   0.86
10   0.76  0.76  0.83  0.85      0.82  0.81  N/A   0.88
20   0.80  0.84  0.86  0.89      0.84  0.83  N/A   0.89
30   0.80  0.83  0.84  0.91      0.83  0.85  N/A   0.85
40   0.85  0.85  0.87  0.90      0.84  0.84  N/A   0.89
50   0.85  0.84  0.85  0.93      0.81  0.82  N/A   0.89
Legend: top values of previous methods and of our method