Tools for analyzing cancer variation
Ekta Khurana, PhD, Assistant Professor
Meyer Cancer Center; Englander Institute for Precision Medicine; Institute for Computational Biomedicine;
Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY
@ekta_khurana
First cancer WGS: Ley, Mardis, et al., Nature, 2008
~500 cancer WGS: Alexandrov et al., Nature, 2013
~3000 WGS from ICGC/TCGA: 2014
International Cancer Genome Consortium & The Cancer Genome Atlas
~3000 WGS (tumor & normal), ~1600 RNA-Seq, ~1500 methylation
Most variants are in noncoding regions
Khurana et al, Nature Rev Genet, 2016
MB: medulloblastoma; DLBC: B cell lymphoma; STAD: gastric; BRCA: breast; PAAD: pancreatic; PRAD: prostate; LIHC: liver; PA: pilocytic astrocytoma; LUAD: lung adenocarcinoma
[Figure: gain-of-motif example (sequences CGGAGG and CGGAAG; TF binds and mRNA is produced) and loss-of-motif example (sequences TATCTAT and TATTTAT; TF binding lost), each with wild-type and mutated sequence logos; altered binding effects at a promoter upstream of a gene]
• MYB motif created & drives TAL1 overexpression in T-ALL (Mansour et al, Science, 2014)
Modes of action of noncoding variants: transcription factor binding disruption
TERT promoter mutated in many different cancer types
Killela et al, PNAS, 2013 Horn et al, Science, 2013 Huang et al, Science, 2013
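How a single substitution can create or destroy a TF binding site can be illustrated with a toy position weight matrix (PWM). The sketch below is only an illustration under stated assumptions: the matrix `PWM` is made up (hypothetical), not a real TF model, and the two 6-mers are the pair shown on the gain-of-motif slide, with no claim about which one is the wild type.

```python
import math

# Hypothetical 6-bp position weight matrix: base probabilities per position.
# These numbers are invented for illustration; they do not describe a real TF.
PWM = [
    {"A": 0.10, "C": 0.70, "G": 0.10, "T": 0.10},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.10, "C": 0.10, "G": 0.70, "T": 0.10},
]
BACKGROUND = 0.25  # assumed uniform base composition


def motif_score(seq, pwm=PWM):
    """Log-odds score (bits) of seq against the PWM versus the background."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the PWM width")
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(pwm, seq))


def score_change(ref, alt, pwm=PWM):
    """Positive values suggest motif gain, negative values motif loss."""
    return motif_score(alt, pwm) - motif_score(ref, pwm)


if __name__ == "__main__":
    for seq in ("CGGAGG", "CGGAAG"):
        print(seq, round(motif_score(seq), 2))
```

With this toy matrix the two sequences differ only at the fifth base, so the entire score difference comes from that one position.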
Covariates of mutation rates: increased mutation density at TF binding sites in melanoma and lung cancer
Perera et al, Nature, 2016 Sabarinathan et al, Nature, 2016 Khurana, Nature News & Views, 2016
Outline
• Variants with high functional impact: FunSeq
• Driver elements w/ more recurrent & high functional impact mutations than expected randomly: CompositeDriver
Khurana et al, Nature Rev Genet, 2016
Identifying noncoding variants associated with cancer
FunSeq
Evolutionary conservation: typically defined by comparison across species
Conservation among humans: depletion of common variants / enrichment of rare variants
Estimating negative selection
Common variant vs. rare variants
Fraction of rare variants = (number of rare variants / total number of variants)
Enrichment of rare SNPs as a metric for negative selection
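The metric can be sketched in a few lines. This is a minimal illustration, assuming the talk's definition of rare (derived allele frequency below 0.5%); the function name and the two example frequency lists are hypothetical, not data from the study.

```python
RARE_DAF_THRESHOLD = 0.005  # 0.5%, the definition of "rare" used in the talk


def fraction_rare(derived_allele_freqs, threshold=RARE_DAF_THRESHOLD):
    """Fraction of variants whose derived allele frequency is below threshold."""
    freqs = list(derived_allele_freqs)
    if not freqs:
        raise ValueError("need at least one variant")
    n_rare = sum(1 for f in freqs if f < threshold)
    return n_rare / len(freqs)


# Hypothetical derived allele frequencies for variants in two element classes.
constrained_like = [0.001, 0.002, 0.004, 0.03, 0.2]
neutral_like = [0.001, 0.01, 0.08, 0.3, 0.45]

if __name__ == "__main__":
    print(fraction_rare(constrained_like))  # 3 of 5 variants are rare
    print(fraction_rare(neutral_like))      # 1 of 5 variants is rare
```

A higher fraction in one class than in another is read as stronger negative selection, because deleterious alleles are kept at low frequency.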
[Figure, panel A: fraction of rare SNPs (nonsyn) for gene categories: All coding, LOF-tol., Recessive, GWAS, Dominant, Essential, Cancer]
[Figure, panel B: fraction of rare SNPs for DHS and coding regions across tissues: bone, urothelium, embr. stem cell, breast, blood, pancr. duct, pancreas, skin, eye, liver, muscle, blood vessel, heart, lung, blastula, myometrium, epithelium, prostate, brain, embr. lung, gingival, kidney, spinal cord, foreskin, connective; adipose, liver, cervix, trachea, spleen, prostate, esophagus, lung, testis, placenta, bladder, colon, heart, thyroid, thymus, kidney, brain, ovary]
(rare = derived allele frequency < 0.5%)
• Depletion of common polymorphisms in regions under selection
Negative selection restricts the allele frequency of deleterious mutations.
• Results for coding genes consistent with known phenotypic impacts
• Other metrics for selection
  • Evolutionary conservation (e.g. GERP)
  • SNP density (confounded by mutation rate)
LOF-tol (loss-of-function tolerant): least negative selection; Cancer: most selection
Khurana et al., Science, 2013
Organism-level negative selection in noncoding elements
Khurana et al., Science, 2013
Negative selection and tissue-specificity of coding and noncoding regions
• Ubiquitously expressed genes and bound regions show stronger selection
• Differences in constraints amongst tissues
• Constraints in coding genes and regulatory genes are correlated across tissues
Which noncoding categories are under very strong "coding-like" selection?
• Top categories among 102 ranked categories
• Binding peaks of some general TFs (e.g. FAM48A)
• Core motifs of some TF families (e.g. JUN, GATA)
• DHS sites in spinal cord and connective tissue
~0.4% genomic coverage (~top 25)
~0.02% genomic coverage (top 5)
~400-fold
~40-fold
Enrichment of known disease-causing mutations from the Human Gene Mutation Database
Human regulatory network from ENCODE ChIP-Seq
Peak calling (ChIP-Seq)
Assigning TF binding sites to targets
Filtering high-confidence edges
~28K proximal edges
Potential distal edge
Strong proximal edge
Nodes: 119 TFs and ~9000 target genes
Edges: 28,000 interactions
Using correlation with expression data
Gerstein¶…..Khurana¶…., Nature, 2012 (¶ co-first authors); Yip et al, Genome Res, 2012
Gene essentiality and human regulatory network
[Figure: example network in which a non-TF target has in-degree = 1 and a TF has out-degree = 5]
Gerstein¶…..Khurana¶…., Nature, 2012 (¶ co-first authors)
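In-degree and out-degree can be computed directly from a list of directed TF-to-target edges. The following is a small sketch, not the pipeline used in the cited work; the edge list and the node names (`TF_A`, `TF_B`, `gene_1`, ...) are hypothetical and chosen to mirror the slide's example of a TF with out-degree 5 and a non-TF target with in-degree 1.

```python
from collections import Counter


def degrees(edges):
    """Return in-, out- and total degree per node for directed (TF, target) edges."""
    out_deg = Counter()
    in_deg = Counter()
    for tf, target in edges:
        out_deg[tf] += 1
        in_deg[target] += 1
    nodes = set(out_deg) | set(in_deg)
    return {
        node: {
            "in": in_deg[node],
            "out": out_deg[node],
            "total": in_deg[node] + out_deg[node],
        }
        for node in nodes
    }


# Hypothetical edges: TF_A regulates five genes, and TF_B regulates TF_A.
EDGES = [("TF_A", f"gene_{i}") for i in range(1, 6)] + [("TF_B", "TF_A")]

if __name__ == "__main__":
    for node, deg in sorted(degrees(EDGES).items()):
        print(node, deg)
```

Total degree (in + out) is the quantity compared between essential and LoF-tolerant genes in the talk.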
[Figure: regulatory degree, total degree (IN+OUT), log scale, for LoF-tolerant vs. essential genes; Wilcoxon p-value = 1.29e-2]
Essential genes tend to be central
Khurana et al., PLoS Comp. Bio., 2013
[Figure: network view with essential and LoF-tolerant genes marked; size of nodes scaled by total degree]
Z Gumus, iCAVE movie
Identification of noncoding mutations with high impact: FunSeq
• Feature weight: each feature is weighted using mutation patterns in natural polymorphisms
(features frequently observed in polymorphisms are weighted less), with an entropy-based method
[Figure: genome track with polymorphisms, a HOT region, and a sensitive region]
p = probability of the feature overlapping natural polymorphisms
Feature weight:
$w_d = 1 + p_d \log_2 p_d + (1 - p_d) \log_2 (1 - p_d)$
For a variant:
$\mathrm{score} = \sum_{d \,\in\, \text{observed features}} w_d$
FunSeq2:weightedscoringscheme
Fu et al., Genome Biology, 2014
https://github.com/khuranalab/FunSeq_PCAWG
http://funseq2.gersteinlab.org
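The weighted scoring scheme can be sketched as follows. This is a minimal illustration of the two formulas on the slide (feature weight 1 + p log2 p + (1 - p) log2 (1 - p), summed over the features a variant overlaps), not the FunSeq2 code itself; the function names, feature names, and p values are hypothetical.

```python
import math


def feature_weight(p):
    """Entropy-based weight: 1 + p*log2(p) + (1-p)*log2(1-p), taking 0*log2(0) as 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")

    def plogp(x):
        return x * math.log2(x) if x > 0 else 0.0

    return 1.0 + plogp(p) + plogp(1.0 - p)


def variant_score(observed_features, p_by_feature):
    """Sum the weights of all features observed for one variant."""
    return sum(feature_weight(p_by_feature[name]) for name in observed_features)


# Hypothetical probabilities of each feature overlapping natural polymorphisms.
P_BY_FEATURE = {"sensitive_region": 0.01, "hot_region": 0.05, "tf_motif": 0.10}

if __name__ == "__main__":
    for name, p in P_BY_FEATURE.items():
        print(name, round(feature_weight(p), 3))
    print(round(variant_score(["sensitive_region", "tf_motif"], P_BY_FEATURE), 3))
```

A feature that rarely overlaps natural polymorphisms (small p) gets a weight close to 1, while a feature with p near 0.5 gets a weight close to 0, which matches the slide's statement that frequently observed features are weighted less.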
Identifying noncoding variants associated with cancer
CompositeDriver
FunSeq
CompositeDriver for detecting driver coding & noncoding elements
(A) Alterations are functionally annotated by the FunSeq2 pipeline. FSi = original FunSeq2 score
(B) Calculate the positional recurrence of each mutation in the cohort (Sample 1, Sample 2, Sample 3)
(C) Within each functional region, the composite functional score (CFSr) is the sum of recurrence multiplied by FunSeq2 score at each position with an alteration.
    r = region (cds, promoter, enhancer and lincRNA)
    n = number of variants in r
    Wi = number of samples with variant i
(D) A P-value for each region is produced from a permutation test, and the Benjamini and Hochberg method is used to correct for multiple hypothesis testing.
[Figure: example region with per-position FunSeq2 scores FS1 to FS9]
Eric Minwei Liu
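Steps (C) and (D) can be sketched as below. The composite score follows the slide's description (sum over variants of Wi times FSi). The slides do not say how the permutations are generated, so the null model here is an assumption: FunSeq2 scores are redrawn at random from a hypothetical background pool while the recurrence counts are kept fixed. The Benjamini-Hochberg step is the standard procedure. All function names are illustrative, not CompositeDriver's API.

```python
import random


def composite_functional_score(variants):
    """CFS for one region; variants is a list of (W_i, FS_i) pairs."""
    return sum(w * fs for w, fs in variants)


def permutation_p_value(variants, background_scores, n_perm=1000, seed=0):
    """Empirical P-value under an ASSUMED null: scores redrawn from a background pool."""
    rng = random.Random(seed)
    observed = composite_functional_score(variants)
    recurrences = [w for w, _ in variants]
    hits = 0
    for _ in range(n_perm):
        # Sample one background score per variant, with replacement.
        null_scores = rng.choices(background_scores, k=len(recurrences))
        null_cfs = sum(w * fs for w, fs in zip(recurrences, null_scores))
        if null_cfs >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    region = [(3, 2.0), (1, 0.5)]           # hypothetical (W_i, FS_i) pairs
    background = [0.1, 0.3, 0.5, 1.0, 2.0]  # hypothetical background scores
    print(composite_functional_score(region))
    print(permutation_p_value(region, background))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

Keeping recurrence fixed and permuting only the scores is one of several possible nulls; a real analysis would also need to account for local mutation rate, which this sketch ignores.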
Results from 40 lung adenocarcinoma samples
[Figure: panels for coding sequences, promoters, and lincRNA; x-axis: Expected (-logP)]
Data from TCGA
[Figure: Q-Q plots of SNVs in coding regions (SPOP labeled), promoters, and enhancers; x-axis: Expected (-logP), y-axis: Observed (-logP)]
Results from 188 prostate cancer samples
Data from ICGC, Baca et al, Cell, 2013, Berger et al, Nature, 2011
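The coordinates of a Q-Q plot like the ones in these results can be derived from a list of per-element P-values. This is a generic sketch, not the plotting code behind the slides; it assumes base-10 logarithms for "-logP" and the common i/(n+1) convention for expected P-values under a uniform null.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs, sorted from smallest to largest."""
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one P-value")
    if any(p <= 0 or p > 1 for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    observed = sorted(-math.log10(p) for p in p_values)
    # Under the null, the i-th smallest of n P-values is expected near i/(n+1).
    expected = sorted(-math.log10(i / (n + 1)) for i in range(1, n + 1))
    return list(zip(expected, observed))


if __name__ == "__main__":
    # Hypothetical P-values: two null-like elements and one strong outlier.
    for exp, obs in qq_points([1e-6, 0.5, 0.75]):
        print(round(exp, 3), round(obs, 3))
```

Elements that fall well above the diagonal (observed much larger than expected) are the candidate drivers, as with the labeled gene in the coding panel.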
Functional validation of candidates in prostate cancer
WDR74 promoter
• Sanger sequencing in 19 additional samples confirms the recurrence
• WDR74 shows increased expression in tumor samples
[Figure: WDR74 expression, benign vs. PCa]
RET promoter: increased activity
In collaboration w/ Mark Rubin
EIF4EBP3 promoter: reduced activity
Acknowledgements
Yale: Yao Fu (now at Bina), Xinmeng Mu (now at Broad), Jieming Chen, Lucas Lochovsky, Arif Harmanci, Alexej Abyzov, Suganthi Balasubramanian, Cristina Sisu, Declan Clarke, Mike Wilson, Yong Kong, Mark Gerstein
Sanger: Vincenza Colonna, Yuan Chen, Yali Xue, Chris Tyler-Smith
Cornell: Steven Lipkin, Jishnu Das, Robert Fragoza, Xiaomu Wei, Haiyuan Yu
Andrea Sboner, Dimple Chakravarty, Naoki Kitabayashi, Vaja Liluashvili, Zeynep H. Gümüş, Kellie Cotter, Mark A. Rubin
U of Michigan: Hyun Min Kang
U of Geneva: Tuuli Lappalainen (NYGC), Emmanouil T. Dermitzakis
Baylor: Daniel Challis, Uday Evani, Donna Muzny, Fuli Yu, Richard Gibbs
EBI: Kathryn Beal, Laura Clarke, Fiona Cunningham, Paul Flicek, Javier Herrero, Graham R. S. Ritchie
Boston College: Erik Garrison, Gabor Marth
Mass Gen Hospital: Kasper Lage, Daniel G. MacArthur, Tune H. Pers
Rutgers: Jeffrey A. Rosenfeld
Functional Interpretation Group: ~50 participants
~40 institutes, ~550 participants
Khurana lab: Eric Minwei Liu, Priyanka Dhingra, Alexander Fundichely, Tawny Cuykendall, Andrea Sboner, Mark Rubin, Dimple Chakravarty, Kellie Cotter, Steve Lipkin, Chason Lee
Sandra and Edward Meyer Cancer Center; Englander Institute for Precision Medicine; Institute for Computational Biomedicine
Postdoc positions available: khuranalab.med.cornell.edu/jobs