![Page 1: Identifying structural templates using alignments of designed sequences Stefan M. Larson Pande Group Biophysics Program December, 2002 smlarson@stanford.edu](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a18388/html5/thumbnails/1.jpg)
Identifying structural templates using alignments of designed sequences
Stefan M. Larson
Pande Group
Biophysics Program
December, 2002
smlarson@stanford.edu
Structure prediction & sequence space
[Figure: placeholder protein sequences of varying length illustrating sequence space.]
Multiple sequence alignments aid comparative protein modeling
• 1 in 3 sequences are recognizably related to at least one protein structure.
• A significant fraction of the remaining 2/3 have homologues of known structure, but these are not recognized by sequence-similarity searches.
• Marti-Renom et al. (2000)
• Multiple sequence alignments greatly improve the efficacy and accuracy of almost all phases of comparative modeling.
• Venclovas (2001)
Computational protein design
Native structure → iterative refinement → new sequence
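The design loop on this slide (start from a native structure, refine iteratively, emit a new sequence) can be illustrated with a toy Metropolis Monte Carlo search over sequence space. This is a minimal sketch under stated assumptions: the energy function `toy_energy` and the per-position `burial` flags are hypothetical stand-ins for a real scoring function evaluated on the backbone, and nothing here reproduces the design method actually used in this work.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")


def toy_energy(seq, burial):
    """Hypothetical toy energy: penalize polar residues in the core and
    hydrophobic residues on the surface. burial[i] is True for core positions.
    A real design method would score packing and solvation on the backbone."""
    energy = 0.0
    for aa, buried in zip(seq, burial):
        if buried and aa not in HYDROPHOBIC:
            energy += 1.0
        elif not buried and aa in HYDROPHOBIC:
            energy += 1.0
    return energy


def design(native, burial, steps=5000, temperature=0.3, seed=0):
    """Metropolis Monte Carlo: mutate one position per step and accept the
    mutation with the Boltzmann probability exp(-dE / T)."""
    if len(native) != len(burial):
        raise ValueError("burial must have one flag per residue")
    rng = random.Random(seed)
    seq = list(native)
    energy = toy_energy(seq, burial)
    for _ in range(steps):
        i = rng.randrange(len(seq))
        old = seq[i]
        seq[i] = rng.choice(AMINO_ACIDS)
        new_energy = toy_energy(seq, burial)
        delta = new_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy  # accept the mutation
        else:
            seq[i] = old  # reject: restore the previous residue
    return "".join(seq), energy
```

Raising `temperature` accepts more unfavorable mutations and yields more diverse sequences; lowering it concentrates the search near energy minima.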
Large scale sequence generation
“Reverse BLAST” study:
• Total sequences generated: 200,000
• Processors available: 4,000
• Total time of data collection: 80 days
• Total backbone variants: 26,400
• Total structures: 264
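The totals on this slide can be cross-checked with simple arithmetic. The ratios below are averages derived from the reported numbers, not figures stated on the slide; in particular, the processor-day total assumes every available processor ran for the full 80 days, so it is only an upper bound.

```python
# Totals as reported on the slide.
structures = 264
backbone_variants = 26_400
sequences = 200_000
processors = 4_000
days = 80

# Derived averages (the slide does not say how sequences were distributed).
variants_per_structure = backbone_variants / structures  # 100 backbone variants per structure
sequences_per_variant = sequences / backbone_variants    # about 7.6 sequences per backbone
sequences_per_structure = sequences / structures         # about 758 sequences per structure

# Upper bound: assumes all processors were busy for the whole collection period.
max_processor_days = processors * days                   # 320,000 processor-days

print(f"{variants_per_structure:.0f} variants/structure, "
      f"{sequences_per_variant:.2f} sequences/variant, "
      f"{sequences_per_structure:.1f} sequences/structure, "
      f"<= {max_processor_days:,} processor-days")
```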
“Reverse BLAST”: finding templates for comparative modeling
Larson SM, Garg A, Desjarlais JR, Pande VS. (2003) Proteins: Structure, Function, and Genetics
Experiment: Sequence quality
[Figure: schematic of the experiment. Design produces a new sequence, which is searched with BLAST; hits are accepted at E < 0.01.]
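The sequence-quality test applies a BLAST E-value cutoff of 0.01 to each designed sequence. A minimal sketch of that filtering step, assuming the search was run with BLAST+ and written in its default 12-column tabular format (`-outfmt 6`), keeps the lowest-E-value hit per designed sequence. The function name `best_hits` and the example identifiers are hypothetical; the slide does not describe how results were parsed.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, max_evalue=0.01):
    """Return {query_id: (subject_id, evalue)} holding the lowest-E-value hit
    for each query, keeping only hits with evalue < max_evalue."""
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(FIELDS, row))
        evalue = float(rec["evalue"])
        if evalue >= max_evalue:
            continue
        query = rec["qseqid"]
        if query not in best or evalue < best[query][1]:
            best[query] = (rec["sseqid"], evalue)
    return best
```

Using a strict `<` comparison mirrors the slide's "E<0.01" criterion.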
Results: Sequence quality
[Figure: for each designed sequence profile (x-axis, 0 to 225, ranked by E-value), the E-value of the best PDB hit (left axis, log scale from 1E-17 to 10) and the average identity to the native sequence (right axis, 0 to 30%).]
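Average identity to the native sequence, plotted on this slide, and the related mean pairwise identity among designed sequences are both percent-identity averages. Because each designed sequence is built on a backbone of the native protein, this sketch assumes designed and native sequences have the same length and compares them position by position without gaps; the slide does not state how identity was computed.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent of positions with identical residues in a gap-free,
    equal-length alignment."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def mean_native_identity(designed, native):
    """Average identity of each designed sequence to the native sequence."""
    return sum(percent_identity(s, native) for s in designed) / len(designed)


def mean_pairwise_identity(designed):
    """Average identity over all unordered pairs of designed sequences."""
    pairs = list(combinations(designed, 2))
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)
```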
Method: “Reverse BLAST”
[Figure: schematic with three columns, Designed Sequences, Hypothetical Proteins, and Structural Templates. Designed sequences and hypothetical protein sequences are aligned with BLAST at E < 0.01, and each hit links a hypothetical protein to a structural template.]
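In this method, a BLAST hit between a designed sequence and a hypothetical protein suggests that the protein may adopt the structure the designed sequence was built on. The sketch assumes hits arrive as `(designed_id, protein_id, evalue)` tuples and that a lookup table `design_to_template` (hypothetical) records the parent structure of each designed sequence; it collects candidate templates per protein at the slide's E < 0.01 cutoff.

```python
from collections import defaultdict


def assign_templates(hits, design_to_template, max_evalue=0.01):
    """Map each hypothetical protein to the set of structures whose designed
    sequences it matched.

    hits: iterable of (designed_id, protein_id, evalue) tuples.
    design_to_template: dict from designed-sequence ID to the structure
    (e.g. a PDB code) that the sequence was designed on.
    """
    templates = defaultdict(set)
    for designed_id, protein_id, evalue in hits:
        if evalue < max_evalue:
            templates[protein_id].add(design_to_template[designed_id])
    return dict(templates)
```

Returning a set per protein collapses multiple designed sequences from the same parent structure into a single candidate template.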
Do the designed sequences help?
[Figure: x-axis, E-value threshold as -log(E), from 2 to 10. Left axis: ratio of hits with sequence alignment to hits without (0 to 5). Right axis: total unique hits (0 to 160). Series: correctly identified structural templates; fold-increase in # of templates; fold-increase in # of genes; total hits.]
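The left axis of this plot is a ratio of hits found with the designed-sequence alignments to hits found without them, swept across E-value thresholds. A minimal sketch of that sweep, assuming each search is summarized as a dict from hit ID (gene or template) to its best E-value; the function name and data layout are illustrative, not taken from the study.

```python
def fold_increase(hits_with, hits_without, thresholds):
    """For each threshold t, given as -log10(E), return the ratio of hits
    with E < 10**-t found with versus without the designed sequences.

    hits_with / hits_without: dicts mapping a hit ID to its best E-value.
    """
    ratios = {}
    for t in thresholds:
        cutoff = 10.0 ** (-t)
        n_with = sum(1 for e in hits_with.values() if e < cutoff)
        n_without = sum(1 for e in hits_without.values() if e < cutoff)
        # Guard against division by zero when the baseline finds nothing.
        ratios[t] = n_with / n_without if n_without else float("inf")
    return ratios
```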
Remote homology detection
[Figure: number of structural templates identified (y-axis, 0 to 35) for each genome searched (x-axis): Pyrococcus horikoshii, Sulfolobus solfataricus, Thermoplasma acidophilum, Thermoplasma volcanium, Treponema pallidum, Helicobacter pylori 26695, Helicobacter pylori J99, Campylobacter jejuni, Mycobacterium tuberculosis CDC1551, Mycobacterium tuberculosis H37Rv, Rickettsia prowazekii, Chlamydophila pneumoniae AR39, Chlamydophila pneumoniae CWL029, Chlamydophila pneumoniae J138, Mycobacterium leprae, Chlamydia muridarum, Chlamydia trachomatis, Aquifex aeolicus, Mycoplasma genitalium, Mycoplasma pneumoniae, Mycoplasma pulmonis, Streptococcus pyogenes, Mesorhizobium loti, Methanococcus jannaschii, Borrelia burgdorferi, Deinococcus radiodurans, Ureaplasma urealyticum, Halobacterium sp., Caulobacter crescentus, Lactococcus lactis, Archaeoglobus fulgidus, Pyrococcus abyssi, Methanobacterium thermoautotrophicum, Neisseria meningitidis MC58, Neisseria meningitidis Z2491, Haemophilus influenzae, Xylella fastidiosa, Buchnera sp., Staphylococcus aureus Mu50, Staphylococcus aureus N315, Pasteurella multocida, Thermotoga maritima, Vibrio cholerae, Bacillus subtilis, Pseudomonas aeruginosa, Synechocystis PCC6803, Escherichia coli O157:H7 EDL933, Escherichia coli O157:H7, Escherichia coli K12.]
Optimizing structural diversity
[Figure: x-axis, RMSD of structural ensemble (Angstroms), 0 to 2. Left axis: percent (0 to 80). Right axis: sequence entropy (0 to 6). Series: sequence entropy, prediction accuracy, prediction coverage, mean pairwise %ID, mean native %ID.]
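Sequence entropy measures how diverse the designed sequences are at each position. The slide does not give the exact definition, so this sketch assumes a common form: Shannon entropy (natural log) of residue frequencies in each column of a gap-free alignment of equal-length designed sequences, averaged over all positions.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy -sum(p * ln p) of residue frequencies in one column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def mean_sequence_entropy(sequences):
    """Average per-position entropy over equal-length, gap-free sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be the same length")
    columns = zip(*sequences)  # transpose: one tuple of residues per position
    return sum(column_entropy(col) for col in columns) / length
```

With natural logs, a column that uses all 20 amino acids equally has the maximum entropy ln 20 ≈ 3.0; the slide's axis scale may reflect a different log base or normalization.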
Future work
• Compare “reverse BLAST” to other remote homology detection approaches (3D-PSSM, HMMER, etc.).
• Retrodict CASP targets, especially those that comparative modeling failed to predict successfully.
• Increase the coverage and accuracy of the designed sequence sets.
Collaborators
Stanford University
• Amit Garg
• Dr. Vijay Pande
Harvard University
• Jeremy England
Xencor, Inc.
• Dr. John Desjarlais