R21: Enhanced deconvolution and prediction of mutational signatures


Page 1

R21: Enhanced deconvolution and prediction of mutational signatures

Joshua D. Campbell, PhD; Masanao Yajima, PhD

Section of Computational Biomedicine, Department of Medicine, Boston University School of Medicine

ITCR Annual Meeting, 5/28/2020

Page 2

Various exogenous exposures or endogenous biological processes contribute to the mutational load in cancer.

[Figure: signatures are defined by the substitution and its tri-nucleotide context. Examples: smoking (C>A), UV radiation (C>T at TCC/CCC), APOBEC (C>G at TCT/TCA).]
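The "tri-nucleotide context" here is the standard 96-category scheme: six substitution classes, written from the pyrimidine (C or T) base of the mutated base pair, each combined with the 16 possible 5' and 3' neighboring bases. A minimal Python sketch of that enumeration and of labeling a single SNV follows. It assumes the reference-strand trinucleotide and the alternate base are already known; the function names `sbs96_categories` and `classify` are illustrative and not part of any package.

```python
from itertools import product

BASES = "ACGT"
# Six substitution classes, written from the pyrimidine (C or T) reference base.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sbs96_categories() -> list[str]:
    """All 6 x 4 x 4 = 96 labels in the form C>A_ACA (5' base, ref, 3' base)."""
    return [
        f"{sub}_{left}{sub[0]}{right}"
        for sub in SUBSTITUTIONS
        for left, right in product(BASES, repeat=2)
    ]


def classify(context: str, alt: str) -> str:
    """Label one SNV from its reference-strand trinucleotide and alternate base.

    A purine (A/G) reference is reverse-complemented so that every label is
    expressed relative to the pyrimidine base, as in the 96-category scheme.
    """
    if context[1] in "AG":
        context, alt = revcomp(context), revcomp(alt)
    return f"{context[1]}>{alt}_{context}"
```

For example, `classify("TGA", "A")` returns `"C>T_TCA"`: a G>A change on the reference strand is the same event as a C>T change on the opposite strand.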

Page 3

Deconvolution of mutational signatures using non-negative matrix factorization (NMF)

Alexandrov et al., Deciphering Signatures of Mutational Processes Operative in Human Cancer, Cell Reports, 2013.
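As a rough illustration of the NMF decomposition, a catalog V (mutation types x tumors) is approximated by W H, where the columns of W are signatures and the columns of H are per-tumor exposures. The sketch below uses the Lee and Seung multiplicative updates for squared error with NumPy; this objective and the function name `nmf` are assumptions for illustration, not necessarily what Alexandrov et al. or any particular package use.

```python
import numpy as np


def nmf(V: np.ndarray, k: int, n_iter: int = 500, seed: int = 0, eps: float = 1e-10):
    """Approximate V (mutation types x tumors) by W @ H with W, H >= 0.

    W (types x k) holds signatures; H (k x tumors) holds exposures.
    Lee & Seung multiplicative updates for the squared Frobenius error.
    """
    rng = np.random.default_rng(seed)
    n_types, n_tumors = V.shape
    W = rng.random((n_types, k)) + eps
    H = rng.random((k, n_tumors)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Rescale so each signature (column of W) sums to 1; W @ H is unchanged.
    scale = W.sum(axis=0)
    return W / scale, H * scale[:, None]
```

The number of signatures k must be chosen in advance, and different random seeds can give different local optima.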

Page 4

Limitations of NMF and current software packages for mutational signature inference

1. No inherent method for predicting new samples given an existing training model.

2. Limited flexibility to incorporate additional information into the signature inference process.
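On limitation 1: plain NMF factors a fixed training matrix and has no built-in step for scoring new tumors against the signatures it learned. One simple option, sketched here under the assumption that the trained signatures are available as a column-normalized matrix W, is to hold W fixed and update only the exposures with the multiplicative (EM) rule for the Poisson/KL objective. The function name `predict_exposures` is hypothetical, and this is not necessarily how deconstructSigs or the package described in this talk scores new samples.

```python
import numpy as np


def predict_exposures(V_new: np.ndarray, W: np.ndarray, n_iter: int = 1000,
                      eps: float = 1e-12) -> np.ndarray:
    """Estimate exposures (k x new tumors) with signatures W held fixed.

    V_new: mutation types x new tumors (counts). W: mutation types x k, columns sum to 1.
    Only the exposures are updated (multiplicative/EM update for the KL objective);
    each tumor's exposures then sum to its mutation count.
    """
    k, n = W.shape[1], V_new.shape[1]
    H = np.ones((k, n)) * (V_new.sum(axis=0) / k)   # start from equal exposures
    col_sums = W.sum(axis=0)[:, None]               # equals 1 for normalized signatures
    for _ in range(n_iter):
        ratio = V_new / (W @ H + eps)
        H *= (W.T @ ratio) / col_sums
    return H
```

Because W never changes, the same trained model can score any number of new tumors independently.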

[Figure panels A-C: train/test?; characterize flanking bases, comparing trinucleotide labels (C>A_ACA ... T>G_TTT) with heptanucleotide labels (C>A_AAACAAA ... T>G_TTTTTTT); utilize pre-existing (known) signatures in signature discovery on tumors.]

Number of items in each distribution, and whether it is powered with standard data:

  Model / distribution                               Items    Powered with standard data?
  NMF, trinucleotide context: mutation                  96    Yes
  NMF, heptanucleotide context: mutation            24,576    No
  Novel model with joint probability, heptanucleotide context with strand:
    Base -3                                              4    Yes
    Base -2                                              4    Yes
    Mutation (trinucleotide context)                    96    Yes
    Base +2                                              4    Yes
    Base +3                                              4    Yes
    Strand                                               2    Yes
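The item counts on this slide are simple arithmetic: 6 substitution classes times 4 choices for each flanking base gives 6 x 4^2 = 96 categories for the trinucleotide context and 6 x 4^6 = 24,576 for the heptanucleotide context. The joint-probability model instead keeps the 96-category trinucleotide mutation and adds one 4-category distribution per extra flanking position, plus a 2-category strand variable. A small Python check of these numbers (the helper names are illustrative only):

```python
N_SUBSTITUTIONS = 6   # C>A, C>G, C>T, T>A, T>C, T>G
N_BASES = 4           # A, C, G, T


def joint_context_size(flank: int) -> int:
    """Categories when the substitution and +/- `flank` bases form one joint variable."""
    return N_SUBSTITUTIONS * N_BASES ** (2 * flank)


def factorized_sizes(flank: int, strand_levels: int = 2) -> dict[str, int]:
    """Categories per distribution when bases beyond +/-1 are separate 4-way variables."""
    sizes = {"mutation": joint_context_size(1)}   # trinucleotide-context mutation
    for offset in range(2, flank + 1):
        sizes[f"base-{offset}"] = N_BASES
        sizes[f"base+{offset}"] = N_BASES
    sizes["strand"] = strand_levels
    return sizes


print(joint_context_size(1), joint_context_size(3))  # 96 24576
print(sum(factorized_sizes(3).values()))             # 114
```

With three bases on each side, the factorized representation has 96 + 4 x 4 + 2 = 114 categories per signature instead of 24,576, so each category receives many more observations from the same number of mutations.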

Page 5

LDA identified signatures similar to those found by NMF in a Pan-Lung cancer dataset from TCGA.

[Figure: fraction of bases and differential base usage (P-value, -log10) at upstream (5’) and downstream (3’) flanking positions, up to six bases from the mutation, for APOBEC mutations versus all other mutations; mutation-type probabilities by trinucleotide context and by flanking-base position; NMF-derived versus LDA-derived signatures by trinucleotide context (C>A, C>G, C>T, T>A, T>C, T>G) for UV, smoking, MMR, APOBEC and clock-like/aging signatures, with R = 0.98, 0.99, 0.85, 0.94 and 0.79.]
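The R values on this slide compare each NMF-derived signature with its LDA-derived counterpart across the 96 trinucleotide categories, which first requires pairing the two sets of signatures. A small sketch with NumPy, assuming R is a Pearson correlation and that the number of signatures is small enough for an exhaustive search over pairings; `match_signatures` is a hypothetical helper.

```python
from itertools import permutations

import numpy as np


def match_signatures(A: np.ndarray, B: np.ndarray):
    """Pair columns of A with columns of B to maximize the total Pearson correlation.

    A, B: mutation types x K signature matrices (e.g. NMF- and LDA-derived).
    Exhaustive search over permutations, so only suitable for small K.
    Returns (perm, r): column j of A matches column perm[j] of B with correlation r[j].
    """
    K = A.shape[1]
    # Cross-correlation block: rows are columns of A, columns are columns of B.
    R = np.corrcoef(A.T, B.T)[:K, K:]
    best = max(permutations(range(K)), key=lambda p: sum(R[j, p[j]] for j in range(K)))
    return list(best), [float(R[j, best[j]]) for j in range(K)]
```

For larger K, the same cross-correlation matrix can be passed to a linear-assignment solver instead of enumerating all K! pairings.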

“Pan-Lung” dataset = 1,144 lung cancer exomes (Campbell et al., Nature Genetics, 2016)
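LDA treats each tumor as a document, each mutation as a word drawn from the 96 mutation types, and each signature as a topic. Below is a compact collapsed Gibbs sampler in Python with NumPy. It is a textbook LDA sampler with assumed symmetric priors α (exposures) and γ (signatures), named to match θi ∼ DirK(α) and Mk ∼ DirT(γ) in the Bayesian model; it is not necessarily the inference method the authors used, and `lda_gibbs` is a hypothetical name.

```python
import numpy as np


def lda_gibbs(counts: np.ndarray, k: int, alpha: float = 0.1, gamma: float = 0.1,
              n_iter: int = 100, seed: int = 0):
    """Collapsed Gibbs sampler for LDA on an integer samples x mutation-types matrix.

    Returns (signatures: k x types, exposures: samples x k), both row-normalized.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_types = counts.shape
    # One token per mutation: which sample it came from and which type it is.
    doc = np.repeat(np.arange(n_samples), counts.sum(axis=1))
    word = np.concatenate([np.repeat(np.arange(n_types), row) for row in counts])
    z = rng.integers(k, size=doc.size)          # random initial signature assignment
    n_dk = np.zeros((n_samples, k))             # mutations per sample and signature
    n_kw = np.zeros((k, n_types))               # mutations per signature and type
    np.add.at(n_dk, (doc, z), 1)
    np.add.at(n_kw, (z, word), 1)
    n_k = n_kw.sum(axis=1)
    for _ in range(n_iter):
        for i in range(doc.size):
            d, w, old = doc[i], word[i], z[i]
            n_dk[d, old] -= 1                   # remove token i from the counts
            n_kw[old, w] -= 1
            n_k[old] -= 1
            # Full conditional p(z_i = k | all other assignments), up to a constant.
            p = (n_dk[d] + alpha) * (n_kw[:, w] + gamma) / (n_k + n_types * gamma)
            cum = np.cumsum(p)
            new = min(int(np.searchsorted(cum, rng.random() * cum[-1], side="right")), k - 1)
            n_dk[d, new] += 1                   # add it back under the new assignment
            n_kw[new, w] += 1
            n_k[new] += 1
            z[i] = new
    signatures = (n_kw + gamma) / (n_k[:, None] + n_types * gamma)
    exposures = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + k * alpha)
    return signatures, exposures
```

The pure-Python inner loop keeps the logic readable but is slow for exome-scale cohorts; a production sampler would be vectorized or compiled.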

Page 6

Development of a novel Bayesian model that allows for inclusion of other features such as additional flanking bases.

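A) Model specification

$$
\begin{aligned}
\theta_i &\sim \mathrm{Dir}_K(\alpha) && i = 1..S \\
M_k &\sim \mathrm{Dir}_T(\gamma) && k = 1..K \\
D_{k,a} &\sim \mathrm{Dir}_4(\beta) && k = 1..K;\ a = (p-w)..(p-1) \\
U_{k,b} &\sim \mathrm{Dir}_4(\beta) && k = 1..K;\ b = (p+1)..(p+w) \\
G_{k,f} &\sim \mathrm{Dir}_{E_f}(\delta) && k = 1..K;\ f = 1..F \\
z_{i,j} &\sim \mathrm{Categorical}(\theta_i) && i = 1..S;\ j = 1..N_i \\
m_{i,j} &\sim \mathrm{Categorical}(M_{z_{i,j}}) && i = 1..S;\ j = 1..N_i \\
d_{i,j,a} &\sim \mathrm{Categorical}(D_{z_{i,j},a}) && i = 1..S;\ j = 1..N_i;\ a = (p-w)..(p-1) \\
u_{i,j,b} &\sim \mathrm{Categorical}(U_{z_{i,j},b}) && i = 1..S;\ j = 1..N_i;\ b = (p+1)..(p+w) \\
g_{i,j,f} &\sim \mathrm{Categorical}(G_{z_{i,j},f}) && i = 1..S;\ j = 1..N_i;\ f = 1..F
\end{aligned}
$$

S is the number of samples.
K is the number of mutational signatures.
N_i is the number of mutations in sample i.
T is the number of mutation types.
E_f is the number of entries in feature distribution G_f.
F is the total number of genomic feature distributions.
p is the position of the mutation in the genome.
w is the length of the flanking motif.
a, b are the index positions in the flanking sequence relative to p.
z_{i,j} is the signature assigned to the jth mutation in sample i.
m_{i,j} is the jth observed mutation in sample i.
d_{i,j,a} is the observed base at downstream position a for the jth mutation in sample i.
u_{i,j,b} is the observed base at upstream position b for the jth mutation in sample i.
g_{i,j,f} is the observed value of feature f for the jth mutation in sample i.

B) Plate diagram

[Figure: α → θ → z → m, d_{p-w}, ..., d_{p-1}, u_{p+1}, ..., u_{p+w}, g_1, ..., g_F; signature parameters M, D, U, G with priors γ, β, δ on a plate over K signatures; mutations on a plate over N nested in a plate over S samples.]

C) Likelihood

$$
\begin{aligned}
p(\theta, Z, M, D, U, G, m, d, u, g \mid \alpha, \gamma, \beta, \delta)
={}& \prod_{i=1}^{S} p(\theta_i \mid \alpha) \prod_{j=1}^{N_i} p(z_{i,j} \mid \theta_i) \\
&\times \prod_{k=1}^{K} p(M_k \mid \gamma) \prod_{a=p-w}^{p-1} p(D_{k,a} \mid \beta) \prod_{b=p+1}^{p+w} p(U_{k,b} \mid \beta) \prod_{f=1}^{F} p(G_{k,f} \mid \delta) \\
&\times \prod_{i=1}^{S} \prod_{j=1}^{N_i} p(m_{i,j} \mid M_{z_{i,j}}) \prod_{a=p-w}^{p-1} p(d_{i,j,a} \mid D_{z_{i,j},a}) \prod_{b=p+1}^{p+w} p(u_{i,j,b} \mid U_{z_{i,j},b}) \prod_{f=1}^{F} p(g_{i,j,f} \mid G_{z_{i,j},f})
\end{aligned}
$$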

Models flanking bases in signatures as independently observed variables.
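The generative story in the model specification can be made concrete with a forward simulation. This Python/NumPy sketch follows the sampling statements for θ, M, D, U, G, z, m, d, u and g. The hyperparameter values, the window width w = 2, the single two-level genomic feature (for example a strand label) and the helper names `draw` and `simulate` are assumptions chosen for illustration; the sketch requires at least one genomic feature.

```python
import numpy as np


def draw(rng: np.random.Generator, probs: np.ndarray) -> np.ndarray:
    """One categorical draw per row of `probs` (last axis = categories), by inverse CDF."""
    cum = np.cumsum(probs, axis=-1)
    u = rng.random(probs.shape[:-1] + (1,))
    return np.minimum((u > cum).sum(axis=-1), probs.shape[-1] - 1)


def simulate(n_samples=5, n_mut=200, k=3, n_types=96, w=2, feature_sizes=(2,),
             alpha=0.5, gamma=0.5, beta=0.5, delta=1.0, seed=0):
    """Forward-simulate tumors from the flanking-base signature model."""
    rng = np.random.default_rng(seed)
    theta = rng.dirichlet(np.full(k, alpha), size=n_samples)   # theta_i ~ Dir_K(alpha)
    M = rng.dirichlet(np.full(n_types, gamma), size=k)         # M_k ~ Dir_T(gamma)
    D = rng.dirichlet(np.full(4, beta), size=(k, w))           # D_{k,a}, a = p-w..p-1
    U = rng.dirichlet(np.full(4, beta), size=(k, w))           # U_{k,b}, b = p+1..p+w
    G = [rng.dirichlet(np.full(e, delta), size=k) for e in feature_sizes]  # G_{k,f}
    samples = []
    for i in range(n_samples):
        z = draw(rng, np.tile(theta[i], (n_mut, 1)))           # z_{i,j} ~ Cat(theta_i)
        samples.append({
            "z": z,
            "m": draw(rng, M[z]),                              # m_{i,j} ~ Cat(M_{z_ij})
            "d": draw(rng, D[z]),                              # d_{i,j,a}, shape (n_mut, w)
            "u": draw(rng, U[z]),                              # u_{i,j,b}, shape (n_mut, w)
            "g": np.stack([draw(rng, Gf[z]) for Gf in G], axis=1),  # g_{i,j,f}
        })
    return {"theta": theta, "M": M, "D": D, "U": U, "G": G}, samples
```

Each flanking position gets its own 4-category distribution per signature, which is what "independently observed variables" means in practice: the context is never expanded into one joint category.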

Page 7

Development of a novel Bayesian model that allows for joint learning of known and novel signatures.

[Figure: fixed signatures and estimated signatures.]
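One way to read "fixed" versus "estimated" signatures in an LDA-style sampler: rows for known signatures (for example, from a reference catalog) are plugged into the full conditional unchanged, while the remaining signatures are re-estimated from token counts as usual. The sketch below is a hypothetical illustration of that idea in Python with NumPy, not the authors' implementation; `lda_fixed_gibbs`, the priors and the row-normalized `known` matrix are assumptions.

```python
import numpy as np


def lda_fixed_gibbs(counts, known, n_new, alpha=0.1, gamma=0.1, n_iter=50, seed=0):
    """Collapsed Gibbs sampling where some signatures are fixed and others are learned.

    counts: integer samples x types matrix. known: n_fixed x types, rows sum to 1,
    never updated. n_new: number of additional signatures estimated from the data.
    Returns (new signatures: n_new x types, exposures: samples x (n_fixed + n_new)).
    """
    rng = np.random.default_rng(seed)
    n_samples, n_types = counts.shape
    n_fixed = known.shape[0]
    k = n_fixed + n_new
    doc = np.repeat(np.arange(n_samples), counts.sum(axis=1))
    word = np.concatenate([np.repeat(np.arange(n_types), row) for row in counts])
    z = rng.integers(k, size=doc.size)
    n_dk = np.zeros((n_samples, k))
    n_kw = np.zeros((k, n_types))
    np.add.at(n_dk, (doc, z), 1)
    np.add.at(n_kw, (z, word), 1)
    n_k = n_kw.sum(axis=1)
    phi_w = np.empty(k)
    for _ in range(n_iter):
        for i in range(doc.size):
            d, w, old = doc[i], word[i], z[i]
            n_dk[d, old] -= 1
            n_kw[old, w] -= 1
            n_k[old] -= 1
            phi_w[:n_fixed] = known[:, w]        # fixed signatures: known probabilities
            phi_w[n_fixed:] = (n_kw[n_fixed:, w] + gamma) / (n_k[n_fixed:] + n_types * gamma)
            cum = np.cumsum((n_dk[d] + alpha) * phi_w)
            new = min(int(np.searchsorted(cum, rng.random() * cum[-1], side="right")), k - 1)
            n_dk[d, new] += 1
            n_kw[new, w] += 1
            n_k[new] += 1
            z[i] = new
    new_sigs = (n_kw[n_fixed:] + gamma) / (n_k[n_fixed:, None] + n_types * gamma)
    exposures = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + k * alpha)
    return new_sigs, exposures
```

Mutations that the known signatures explain well are absorbed by them, so the estimated signatures are pushed toward patterns the known set does not cover.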

Page 8

Developing a comprehensive R package for mutational signature inference.

Page 9

Comparison of features across existing software packages

Page 10

Developing a comprehensive R package for mutational signature inference.

[Figure: package workflow. Input: VCF, MAF or table. Create tables: SNV-96; SNV-192 with transcription strand; SNV-192 with replication strand; doublet base substitutions (DBS); insertions (Ins); deletions (Del); custom. Mix and match tables into combined representations. Deconvolution.]
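The "create tables" and "mix and match" steps can be pictured as building one count table per mutation representation and then concatenating tables into a single per-sample feature table. A minimal pure-Python sketch, assuming each mutation record already carries its sample name, reference-strand trinucleotide, alternate base and any custom annotation; the record keys and function names are hypothetical, and parsing VCF/MAF input is out of scope.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def snv96_label(context: str, alt: str) -> str:
    """Pyrimidine-normalized trinucleotide label, e.g. 'C>T_TCA'."""
    if context[1] in "AG":
        context = context.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    return f"{context[1]}>{alt}_{context}"


def count_table(records, label_fn):
    """Build {sample: Counter(label -> count)} for one mutation representation."""
    table = {}
    for rec in records:
        table.setdefault(rec["sample"], Counter())[label_fn(rec)] += 1
    return table


def mix_and_match(*named_tables):
    """Merge (name, table) pairs into one feature table, prefixing labels with the name."""
    merged = {}
    for name, table in named_tables:
        for sample, counts in table.items():
            row = merged.setdefault(sample, Counter())
            for label, n in counts.items():
                row[f"{name}:{label}"] += n
    return merged
```

For example, `count_table(records, lambda r: snv96_label(r["context"], r["alt"]))` gives an SNV-96 table, `count_table(records, lambda r: r["region"])` gives a custom table, and `mix_and_match(("SNV96", snv), ("custom", custom))` combines them; prefixing each label keeps features from different tables distinct.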

Page 11

Comprehensive sets of visualizations for exploratory analysis.

[Figure panels: COSMIC V2 signatures; signatures; tumor profiles; comparisons between signatures; embedding of tumors in 2D.]
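The slide does not say which method produces the 2D embedding of tumors. As an assumed, simple linear stand-in, the sketch below projects tumors onto the first two principal components of their exposure proportions using NumPy's SVD; a nonlinear embedding could be substituted, and `embed_2d` is a hypothetical name.

```python
import numpy as np


def embed_2d(exposures: np.ndarray) -> np.ndarray:
    """Project tumors (rows of a tumors x signatures exposure matrix) into 2D with PCA."""
    props = exposures / exposures.sum(axis=1, keepdims=True)   # exposure proportions
    centered = props - props.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T
```

Working with proportions rather than raw counts keeps heavily mutated tumors from dominating the layout.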

Page 12

LDA had significantly better performance than NMF/deconstructSigs in a 5-fold cross-validation.

Cross-validation can be useful for determining signature discovery stability.

Subsampling can be useful for determining signature discovery sensitivity.
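A sketch of the bookkeeping behind these two evaluations, in Python with NumPy: splitting tumors into 5 cross-validation folds, thinning mutation counts for subsampling, and scoring held-out tumors by perplexity. The slide does not state which performance metric was used; perplexity, the helper names, and the assumption that held-out exposures have already been estimated (for example with the signatures held fixed) are all illustrative choices.

```python
import numpy as np


def kfold_indices(n_samples: int, n_folds: int = 5, seed: int = 0) -> list[np.ndarray]:
    """Shuffle tumor indices and split them into `n_folds` disjoint held-out folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)


def subsample_counts(counts: np.ndarray, fraction: float, seed: int = 0) -> np.ndarray:
    """Keep each mutation independently with probability `fraction` (binomial thinning)."""
    rng = np.random.default_rng(seed)
    return rng.binomial(counts, fraction)


def heldout_perplexity(counts: np.ndarray, signatures: np.ndarray,
                       exposures: np.ndarray) -> float:
    """Per-mutation perplexity of held-out counts (tumors x types); lower is better.

    signatures: k x types, rows sum to 1. exposures: tumors x k, rows sum to 1.
    """
    probs = exposures @ signatures                 # tumors x types; each row sums to 1
    loglik = np.sum(counts * np.log(np.clip(probs, 1e-300, None)))
    return float(np.exp(-loglik / counts.sum()))
```

A uniform model over T mutation types has perplexity exactly T, which gives a convenient baseline when comparing methods across folds or subsampling fractions.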

Page 13

Acknowledgements

Boston University School of Medicine, Computational Biomedicine

Evan Johnson

Masanao Yajima

Shiyi Yang

Aaron Chevalier

Kelly Geyer

https://github.com/campbio/BAGEL/