2016 presentation at the University of Hawaii Cancer Center
TRANSCRIPT
It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.
- Attributed to Mark Twain
Everybody knows there are 4 subtypes of HGSC.
Everybody
… but Greg.
Tothill et al. Clinical Cancer Research. 2008
One hundred and seventy one tumors consistently segregated into one of the six k-means clusters. Most of the remaining tumors (80 of 114) could be further assigned to one of the molecular subsets by performing class prediction.
171 clustered cleanly
80 could be assigned
34 ???
12-40% unclear
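The 12-40% range appears to follow directly from the counts quoted above, assuming the 171 cleanly clustered tumors and the 114 remaining tumors together make up the whole cohort (285). A minimal sketch of that arithmetic, with illustrative variable names:

```python
clustered = 171        # consistently segregated into a k-means cluster
remaining = 114        # did not segregate consistently
assigned_later = 80    # placed in a subset afterwards by class prediction
unassigned = remaining - assigned_later

# Assumption: these two groups cover every tumor in the analysis.
total = clustered + remaining

low = unassigned / total   # count only the never-assigned tumors as unclear
high = remaining / total   # count every tumor that needed class prediction as unclear

print(f"{low:.0%} to {high:.0%} unclear")
```

The lower bound treats class prediction as a valid assignment; the upper bound does not.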
The Cancer Genome Atlas, Nature. 2011
The silhouette width was computed to filter out expression profiles that were included in a subclass, but that were not robust representatives of the subclass. This resulted in the removal of 51 of 135 samples of the Differentiated subclass; 12 of 107 samples of the Immunoreactive subclass; 0 of 109 samples of the Mesenchymal subclass; and 13 of 138 samples of the Proliferative subclass.
Verhaak et al. JCI. 2013
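The silhouette-width filter in the quote above can be illustrated with a small sketch. This is not the published pipeline: it assumes Euclidean distance, at least two clusters, made-up 2-D points in place of expression profiles, and a hypothetical cutoff of 0 (the quote does not state the cutoff used).

```python
import math

def silhouette_widths(points, labels):
    """Return s(i) = (b - a) / max(a, b) for every point.

    a = mean distance to the other members of the point's own cluster
    b = smallest mean distance to the members of any other cluster
    """
    widths = []
    for i, (p, own) in enumerate(zip(points, labels)):
        sums, counts = {}, {}
        for j, (q, lab) in enumerate(zip(points, labels)):
            if i == j:
                continue
            sums[lab] = sums.get(lab, 0.0) + math.dist(p, q)
            counts[lab] = counts.get(lab, 0) + 1
        if own not in counts:
            widths.append(0.0)  # singleton cluster: width is defined as 0
            continue
        a = sums[own] / counts[own]
        b = min(sums[lab] / counts[lab] for lab in sums if lab != own)
        widths.append((b - a) / max(a, b))
    return widths

# Toy data: the fourth point is labeled "A" but sits closer to cluster "B".
points = [(0, 0), (0, 1), (1, 0), (6, 6), (10, 10), (10, 11), (11, 10)]
labels = ["A", "A", "A", "A", "B", "B", "B"]

widths = silhouette_widths(points, labels)
cutoff = 0.0  # hypothetical threshold, chosen only for illustration
kept = [i for i, w in enumerate(widths) if w > cutoff]
print(kept)
```

Samples with a low or negative width are labeled with one subclass but look at least as much like another, which is why filtering on it removes the least representative profiles.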
What’s the deal with HGSC subtypes?
Casey Greene
Assistant Professor Systems Pharmacology and Translational Therapeutics
Unified Bioinformatics Pipeline
Data source: curatedOvarianData
Dataset Inclusion Criteria: Remove • <130 tumors • Custom array technology
Datasets: TCGA, Tothill, Yoshihara, Bonome, Mayo*
*Our group deposited 528 samples to GEO (GSE74357)
Sample Inclusion Criteria: Keep • Histology • High Grade
Gene Selection Criteria: Keep • 1500 MAD • Union
Clustering
Analyses: SAM, Overrepresented Pathways, Survival, Match Clusters
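One reading of the gene selection criterion (1500 MAD, Union) is: rank genes by median absolute deviation (MAD) within each dataset, keep the top 1,500 per dataset, and take the union across datasets. The sketch below follows that reading; the function names, the dictionary layout, and the toy values are assumptions made for illustration, not the pipeline's actual code.

```python
from statistics import median

def mad(values):
    """Median absolute deviation of one gene's expression values."""
    center = median(values)
    return median([abs(v - center) for v in values])

def top_mad_genes(expression, n_genes):
    """Names of the n_genes most variable genes (by MAD) in one dataset."""
    ranked = sorted(expression, key=lambda gene: mad(expression[gene]), reverse=True)
    return set(ranked[:n_genes])

def union_of_top_genes(datasets, n_genes=1500):
    """Union of each dataset's top-MAD genes."""
    selected = set()
    for expression in datasets.values():
        selected |= top_mad_genes(expression, n_genes)
    return selected

# Toy example: two hypothetical datasets, three genes, keep one gene per dataset.
datasets = {
    "dataset_a": {"GENE1": [1, 1, 1, 1], "GENE2": [0, 5, 10, 15], "GENE3": [2, 2, 3, 3]},
    "dataset_b": {"GENE1": [0, 4, 8, 12], "GENE2": [3, 3, 3, 3], "GENE3": [1, 1, 2, 2]},
}
selected = union_of_top_genes(datasets, n_genes=1)
print(sorted(selected))
```

Taking the union rather than the intersection keeps a gene that is highly variable in any one population, at the cost of a gene set somewhat larger than 1,500.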
Are HGSC subtypes consistent?
Cluster Comparison
Results are consistent across clustering algorithms.
Cross-population Comparison
Are HGSC subtypes consistent across populations?
Syn-clusters: Consistent across method, study, and population
Concordance with other publications
TCGA Tothill Konecny
Unified Pipeline
Syn-cluster 1?
Syn-cluster 2?
Syn-cluster 3?
Mesenchymal-like
Proliferative-like
Immunoreactive/Differentiated-like
Why didn’t TCGA (2011) find this?
Why didn’t Konecny (2014) find this?
What about TCGA’s re-analysis of Tothill?
What if you re-analyze Tothill without LMP samples?
Cross-population analysis of high-grade serous ovarian cancer reveals only two robust subtypes. bioRxiv: http://dx.doi.org/10.1101/030239 github: http://github.com/greenelab/hgsc_subtypes
Research is to see what everybody else has seen and to think what nobody else has thought.
- Albert Szent-Györgyi
Image by J.W. McGuire/NIH
Image from You Don’t Know Jack. Vol 3.
Classes of Algorithms
Data
What patterns exist in the data?
Unsupervised algorithms
What fits this pattern in the data?
Supervised algorithms
Early-Mid 2000s Mid 2000s-Present & Mid 2010s -
If you showed 16,000 computers 10 million images from youtube, what would they see?
Le et al. 2012
Analysis with Denoising Autoencoders of Gene Expression (ADAGE)
Tan et al. Pac Sym Bio 2015; Tan et al. mSystems 2016.
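For readers unfamiliar with the model behind the ADAGE name, here is a minimal denoising autoencoder sketch. It is not the ADAGE implementation: it assumes one hidden layer, tied weights, sigmoid units, masking noise, a cross-entropy loss, and expression values scaled to the 0-1 range, and it trains on made-up data. All names and hyperparameters are illustrative.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class DenoisingAutoencoder:
    """One hidden layer, tied weights, sigmoid units, masking noise."""

    def __init__(self, n_genes, n_nodes, seed=0):
        self.rng = random.Random(seed)
        self.n_genes, self.n_nodes = n_genes, n_nodes
        # weights[g][k] links gene g to hidden node k; the decoder reuses them.
        self.weights = [[self.rng.uniform(-0.1, 0.1) for _ in range(n_nodes)]
                        for _ in range(n_genes)]
        self.hidden_bias = [0.0] * n_nodes
        self.gene_bias = [0.0] * n_genes

    def encode(self, sample):
        return [sigmoid(self.hidden_bias[k] +
                        sum(sample[g] * self.weights[g][k] for g in range(self.n_genes)))
                for k in range(self.n_nodes)]

    def decode(self, hidden):
        return [sigmoid(self.gene_bias[g] +
                        sum(hidden[k] * self.weights[g][k] for k in range(self.n_nodes)))
                for g in range(self.n_genes)]

    def train_step(self, sample, learning_rate=0.5, corruption=0.1):
        # Corrupt the input by zeroing a random subset of genes ...
        noisy = [0.0 if self.rng.random() < corruption else v for v in sample]
        hidden = self.encode(noisy)
        output = self.decode(hidden)
        # ... but score the reconstruction against the clean sample.
        # With sigmoid outputs and cross-entropy loss the output delta is (output - target).
        d_out = [output[g] - sample[g] for g in range(self.n_genes)]
        d_hid = [hidden[k] * (1 - hidden[k]) *
                 sum(d_out[g] * self.weights[g][k] for g in range(self.n_genes))
                 for k in range(self.n_nodes)]
        for g in range(self.n_genes):
            for k in range(self.n_nodes):
                # Tied weights collect gradient from both the decoder and the encoder.
                grad = d_out[g] * hidden[k] + d_hid[k] * noisy[g]
                self.weights[g][k] -= learning_rate * grad
            self.gene_bias[g] -= learning_rate * d_out[g]
        for k in range(self.n_nodes):
            self.hidden_bias[k] -= learning_rate * d_hid[k]

    def reconstruction_error(self, samples):
        """Mean squared error of reconstructing uncorrupted samples."""
        total = 0.0
        for sample in samples:
            output = self.decode(self.encode(sample))
            total += sum((o - v) ** 2 for o, v in zip(output, sample))
        return total / (len(samples) * self.n_genes)

# Toy data: six "genes" that follow two opposite expression patterns.
samples = [
    [0.9, 0.9, 0.8, 0.1, 0.2, 0.1],
    [0.8, 0.9, 0.9, 0.2, 0.1, 0.1],
    [0.1, 0.2, 0.1, 0.9, 0.8, 0.9],
    [0.2, 0.1, 0.1, 0.9, 0.9, 0.8],
]
model = DenoisingAutoencoder(n_genes=6, n_nodes=2)
error_before = model.reconstruction_error(samples)
for _ in range(300):
    for sample in samples:
        model.train_step(sample)
error_after = model.reconstruction_error(samples)
print(error_before, error_after)
```

After training, each hidden node's activity (the output of `encode`) summarizes a group of co-varying genes, which is the sense in which the later slides talk about a node such as Node42.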
ADAGE Identifies Genes’ Pathways
Assign Pathway
… and produces useful networks
The Transcription Factor Anr Controls P.a. Response to Low O2
[Diagram labels: Low O2; Anr; CF Lung Epithelium]
Node42 reflects Anr Activity
[Figure: heatmaps of Node42 (Anr activity), panels A, B, and C. Datasets: E-GEOD-17179, E-GEOD-17296, E-GEOD-52445, E-GEOD-33160. Sample labels: wt, Δanr, Δdnr, ΔanrΔroxSR, EXP, STAT, O2. Platforms: Microarray, RNAseq PAO1, RNAseq J215.]
New Experiment Validates Node 42’s Low-O2 Signature
CF lung epithelial cells Jack Hammond
[Figure: Node42 (Anr activity) heatmaps, panels A, B, and C; wt and Δanr samples; Microarray, RNAseq PAO1, RNAseq J215]
ADAGE complements PCA/ICA
[Figure: heatmaps comparing Node42 with PC4, PC7, IC14, and IC49 across E-GEOD-17179, E-GEOD-17296, E-GEOD-33160, and E-GEOD-52445, and in the Anr-Microarray and Anr-RNAseq experiments (wt and Δanr; PAO1 and J215)]
Cross-platform normalization of microarray and RNA-seq data for machine learning applications
Thompson, Tan, Greene. PeerJ. Jeff Thompson
ADAGE analysis of publicly available gene expression data collections illuminates Pseudomonas aeruginosa-host interactions. bioRxiv: http://dx.doi.org/10.1101/030650 mSystems. 2016.
LeCun, Bengio, and Hinton. Nature 2015.
How do we move from this to mechanisms?
What “pathways” did my experiment affect?
ADAGE-based Pathway Analysis of Transcriptomic Changes
ADAGE of Cancer Biopsies.
Hyperactive RAS
TP53
FOXM1
TCGA & METABRIC
Greene et al. Journal of Cell. Phys. 2014
Molecular Subtype Features
[Figure: accuracy (0 to 1) by subtype (Basal, Her2-enriched, Luminal A, Luminal B, Normal-like) for METABRIC Discovery, METABRIC Test, and TCGA Evaluation. Node labels: Node30, Node29, Node5, Node66, Node42, Node6.]
TCGA, 2012 Tan et al. 2015, PSB
But what about survival?
- Usually some portion of the audience
ER status
[Kaplan-Meier plot: disease-specific survival probability (0.0 to 1.0) vs. months (0 to 150)]
ER pos: 1494(342); ER neg: 434(161)
Logrank p=8.8e-09
Survival-associated Feature
LumA Subtype
[Kaplan-Meier plot: disease-specific survival probability (0.0 to 1.0) vs. months (0 to 150)]
LumA: 709(110); non-LumA: 1263(396)
Logrank p=2.5e-17
Tumor Grade
[Kaplan-Meier plot: disease-specific survival probability (0.0 to 1.0) vs. months (0 to 150)]
Grade 3: 950(305); Grade 1,2: 933(186)
Logrank p=3.3e-08
Node5
[Kaplan-Meier plot: disease-specific survival probability (0.0 to 1.0) vs. months (0 to 150)]
Node5 active: 912(153); Node5 inactive: 1060(353)
Logrank p=2.1e-20
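Each logrank p-value above compares survival between two groups (for example, Node5 active vs. inactive). As a reminder of what that test computes, here is a minimal two-group sketch. The follow-up times and event flags are made up for illustration, and the p-value uses the usual chi-square approximation with one degree of freedom; it is not the code behind these slides.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group logrank test; returns (chi-square statistic, p-value).

    times_*  : follow-up time for each subject
    events_* : 1 if the event was observed, 0 if the subject was censored
    """
    subjects = [(t, e, 0) for t, e in zip(times_a, events_a)]
    subjects += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in subjects if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk = [(s_t, e, g) for s_t, e, g in subjects if s_t >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for s_t, e, _ in at_risk if s_t == t and e)
        d_a = sum(1 for s_t, e, g in at_risk if s_t == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n  # expected events in group A under no difference
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

# Hypothetical follow-up data (months); group B has much earlier events.
times_a, events_a = [100, 110, 120, 130, 140, 150], [1, 0, 1, 0, 0, 0]
times_b, events_b = [5, 10, 15, 20, 25, 30], [1, 1, 1, 1, 1, 1]

statistic, p_value = logrank_test(times_a, events_a, times_b, events_b)
print(statistic, p_value)
```

The test only asks whether the two survival curves differ; it says nothing about whether a feature adds information beyond known covariates such as ER status or grade.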
Pathway Analysis of Survival Feature (Node 5)
Pathway                                   FDR q-value
FOXM1 transcription factor network        <1×10^-4
Aurora B signaling                        4.93×10^-4
Aurora A signaling                        0.001
PLK1 signaling                            0.003
Integrin-linked kinase signaling          0.068
C-MYB transcription factor network        0.074
Cell cycle
Luminal Subtype
Tumor Progression
Pan-Cancer Analysis (Node 196 of 300)
supermarketnews.com
Mine Your Own Business
Greene and Troyanskaya, Nucleic Acids Research. 2011
Wong*, Park*, Greene* et al., Nucleic Acids Research. 2012
Greene,* Wong,* Krishnan,* et al. Nature Genetics. 2015.
Zelaya and Greene, In Preparation
ADAGE Webserver coming soon! http://www.greenelab.com/webservers
When you’re caught in the data deluge…
… don’t grab an umbrella…
… get a bucket.
Greene Lab: Jie Tan (Grad Student), Gregory Way (Grad Student), Brett Beaulieu-Jones (Grad Student), Sammy Klasfeld (Rotation Student), René Zelaya (Programmer), Matt Huyck (Programmer), Dongbo Hu (Programmer), Kathy Chen (Undergrad), Mulin Xiong (Undergrad), Tim Chang (Undergrad)
Collaborators: Jen Doherty & James Rudd; Deb Hogan & Jack Hammond
Data: All investigators who publicly release their gene expression data.
Images: Artists who release their work under a Creative Commons license.
Funding: G&B Moore Investigator in Data-Driven Discovery; National Science Foundation; Cystic Fibrosis Foundation; American Cancer Society
Find us online: http://www.greenelab.com; Twitter: @GreeneScientist