protein structure prediction using coarse grain force fields · nasir mahmood. 12.02.2010. protein...
Nasir Mahmood
12.02.2010
Protein Structure Prediction using Coarse Grain Force Fields
Overview
• Introduction
• Probabilistic Ab Initio – Standard
– Score Function
– Search Method
– Results
• Probabilistic Ab Initio – Extended
– Score Function: Introducing Solvation
– Search Method: Bias Fix
– Results
• Outlook
• Summary
“All the information required by protein to adopt its final conformation is encoded in its sequence”
Christian B. Anfinsen (1916 - 1995)
Source: http://nobelprize.org/
• The information he referred to has not been decoded yet
• Interestingly, we now also know of proteins such as prions, which can adopt more than one stable conformation
Experimental Methods
• X-Ray Crystallography
• NMR Spectroscopy
• Cryo-EM
[Figure: number of structures (N) vs. time (year)]
More than 3 decades and only 60000+ structures
[Figure: Sequence Database Growth, number of sequences (N) vs. time (year); N axis from 10 × 10^6 to 100 × 10^6]
Methods
Experimental Methods
• X-Ray Crystallography
• NMR Spectroscopy
• Cryo-EM
Computational Methods
• Homology Modeling
• Fold Recognition
• Ab Initio Modeling
[Figure: computational methods arranged by accuracy, computation cost and PDB dependence, between experimental data (PDB) and physical principles]
Ab Initio Methods
• Physics-based
– Best but most difficult (force fields)
– Computationally expensive
• Statistics-based
– Boltzmann distributions
– Statistical mechanical ensembles
• We use descriptive statistics
– Bayesian formulation
– No hidden approximations
– No energies, but find distributions
• Monte Carlo Methods
• Molecular Dynamics
Boltzmann factor: $P_i = e^{-\Delta E / k_B T}$
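For contrast with the probabilistic method, the Boltzmann-style acceptance used by conventional energy-based Monte Carlo can be sketched as follows. This is a minimal illustration: the function names, the choice of kcal/mol as the energy unit, and the matching value of k_B are assumptions and do not come from the slides.

```python
import math
import random

# Boltzmann constant in kcal/(mol*K); the unit choice is an assumption.
K_B = 0.0019872041


def boltzmann_factor(delta_e, temperature):
    """Return exp(-delta_e / (k_B * T)) for an energy change delta_e."""
    return math.exp(-delta_e / (K_B * temperature))


def metropolis_accept(delta_e, temperature, rng=random):
    """Accept downhill moves always, uphill moves with the Boltzmann factor."""
    if delta_e <= 0.0:
        return True
    return rng.random() < boltzmann_factor(delta_e, temperature)
```

Raising the temperature makes uphill moves more likely to be accepted, which is what simulated annealing exploits early in a run.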
Our Ab Initio Method
• Purely probabilistic force field
– Mixture of probabilities: sequence, structure, solvation
– No energies
– No Boltzmann statistics
• Coarse grained
– Reduced dimensionality
– Relies on dihedral angles
– No side chains
– 5-atom representation
– Fragment assembly
• Simulated annealing / Monte Carlo
– Move set: biased & unbiased
– Acceptance criterion: ratio of probabilities
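One way to read "mixture of probabilities" is as a weighted sum over classes, P(x) = Σ_c w_c P(x | c). The sketch below evaluates that sum in log space. The function name and the log-space formulation are assumptions made for illustration; the slides do not show how the mixture is computed.

```python
import math


def log_mixture_probability(log_weights, log_class_probs):
    """Return log(sum_c w_c * P(x | c)), evaluated stably in log space."""
    terms = [lw + lp for lw, lp in zip(log_weights, log_class_probs)]
    top = max(terms)  # factor out the largest term to avoid underflow
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

Working with logarithms matters here because the probability of a whole fragment is a product of many small numbers.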
Probabilistic Score Function
• Representation
– Reduced, simplified
– 5 atoms per amino acid
– Dihedral angles (phi, psi)
1. Sequence
• Multi-way Bernoulli
[Figure: amino acid one-letter codes]
2. Structure
• Bivariate Gaussian
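The two terms of the score function can be sketched for a single class as below. The multi-way Bernoulli is read here as a categorical distribution over residue types at each fragment position, and the bivariate Gaussian is evaluated per (phi, psi) pair. All function and parameter names are hypothetical, and angle periodicity is ignored in this sketch.

```python
import math


def log_categorical(residues, residue_probs):
    """Sum of log P(residue) over positions, one categorical per position."""
    return sum(math.log(probs[aa]) for aa, probs in zip(residues, residue_probs))


def log_bivariate_gaussian(phi, psi, mean, var, cov):
    """Log density of a bivariate Gaussian at one (phi, psi) pair.

    mean = (mu_phi, mu_psi), var = (var_phi, var_psi), cov = covariance.
    """
    d_phi, d_psi = phi - mean[0], psi - mean[1]
    det = var[0] * var[1] - cov * cov
    # Quadratic form d^T Sigma^{-1} d written out for the 2x2 case.
    quad = (var[1] * d_phi ** 2
            - 2.0 * cov * d_phi * d_psi
            + var[0] * d_psi ** 2) / det
    return -0.5 * quad - math.log(2.0 * math.pi) - 0.5 * math.log(det)
```

A fragment's log score within a class is then the sum of these terms over its positions, assuming the terms are independent given the class.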
[Figure: (A) overlapping fragments starting at residues i, i + 1, i + 2; (B) a fragment as a sequence (e.g. P L E N R R V) paired with a structure (one pair of dihedral angles per residue); (C) a sliding window over a chain yields overlapping fragments (A S T C W R I, S T C W R I M, T C W R I M F, ...); label: 1.5 × 10^6]
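The overlapping fragments at positions i, i + 1, i + 2 can be produced with a sliding window. This is a minimal sketch: the function name and the fragment length parameter are assumptions, and the angle values in the usage note are illustrative only.

```python
def make_fragments(sequence, angles, length):
    """Cut a chain into overlapping fragments starting at every position i.

    sequence: string of one-letter residue codes
    angles:   list of (phi, psi) pairs, one per residue
    """
    if len(sequence) != len(angles):
        raise ValueError("need one (phi, psi) pair per residue")
    return [
        (sequence[i:i + length], angles[i:i + length])
        for i in range(len(sequence) - length + 1)
    ]
```

For example, `make_fragments("ASLTI", pairs, 4)` with five angle pairs returns two fragments, `"ASLT"` and `"SLTI"`, each carrying its own four pairs.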
[Figure: fragment classification pipeline. Fragment Generation cuts each chain's sequence and structure into overlapping fragments that form the Fragment Library; Expectation Maximization yields the Statistical Models; a Bayesian Classifier then assigns the fragments (GGGG.., GAEG.., DCWF.., WFDC.., STDC.., STST.., WFTG.., CCAD.., ACAD..) to classes (class 0 to class 6)]
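The classification step can be illustrated with Bayes' rule: given class weights and the likelihood of a fragment under each class, the posterior class probabilities follow by normalization. The sketch below assumes the per-class log likelihoods are already available; the function names are hypothetical and the slides do not specify the classifier's implementation.

```python
import math


def class_posteriors(log_priors, log_likelihoods):
    """Posterior P(class | fragment) from log priors and log likelihoods."""
    log_joint = [lp + ll for lp, ll in zip(log_priors, log_likelihoods)]
    top = max(log_joint)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


def classify(log_priors, log_likelihoods):
    """Return the index of the most probable class."""
    post = class_posteriors(log_priors, log_likelihoods)
    return max(range(len(post)), key=post.__getitem__)
```

The same posteriors are what an Expectation Maximization fit uses as soft class memberships in its E-step.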
Search Method
[Figure: probability landscape over conformational space; the search runs from an initial (random) conformation through steps (i-1) and (i) to the final model]
Relative probabilities: $P_i = \frac{p(x_i)}{p(x_{i-1})}$
• Normal methods: $P_i = e^{-\Delta E / k_B T}$
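An acceptance test based on the ratio of probabilities of successive conformations might look like the sketch below. It works on log probabilities to avoid underflow. The function names are hypothetical, and how the annealing temperature enters the criterion is not stated in the slides, so it is left out here.

```python
import math
import random


def acceptance_ratio(log_p_new, log_p_old):
    """P_i = p(x_i) / p(x_{i-1}), computed from log probabilities."""
    return math.exp(log_p_new - log_p_old)


def accept_move(log_p_new, log_p_old, rng=random):
    """Accept a more probable conformation always, otherwise with probability P_i."""
    if log_p_new >= log_p_old:
        return True
    return rng.random() < acceptance_ratio(log_p_new, log_p_old)
```

No energy difference appears anywhere: the ratio plays the role that the Boltzmann factor plays in energy-based methods.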
[Figure: move set. Unbiased: a random angle generator draws phi and psi between -180 and 180. Biased: phi and psi are taken from fragments in the fragment library built from the PDB; label: ≈ 2 × 10^6]
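The two kinds of move can be sketched as follows: an unbiased move draws angles uniformly, while a biased move copies the angles of a fragment stored in the library. All names are hypothetical, angles are in degrees, and uniform selection of library fragments is an assumption (the slides do not say how fragments are chosen).

```python
import random


def unbiased_move(rng=random):
    """Draw a (phi, psi) pair uniformly between -180 and 180 degrees."""
    return (rng.uniform(-180.0, 180.0), rng.uniform(-180.0, 180.0))


def biased_move(fragment_library, rng=random):
    """Pick one stored fragment (a list of (phi, psi) pairs) from the library."""
    return rng.choice(fragment_library)


def apply_move(angles, start, new_angles):
    """Return a copy of the chain angles with a stretch replaced at `start`."""
    end = start + len(new_angles)
    if end > len(angles):
        raise ValueError("fragment does not fit at this position")
    return angles[:start] + list(new_angles) + angles[end:]
```

Returning a new list from `apply_move` keeps the previous conformation intact, so a rejected move can simply be discarded.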
Interplay of Cartesian Coordinates & Dihedral Angles
Choi, V.: 2005, On Updating torsion angles of molecular conformations, J Chem Inf Model 46, 438–444.
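Going from Cartesian coordinates to a dihedral angle uses the standard four-point formula, sketched below with plain tuples. This is a generic textbook construction, not the update scheme of the cited paper, and the helper names are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)  # normals of the two planes
    b1_len = math.sqrt(_dot(b1, b1))
    # The atan2 form keeps the sign and stays stable near 0 and 180 degrees.
    y = b1_len * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

For a backbone, phi and psi are obtained by passing the appropriate four consecutive backbone atoms for each residue.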
Results
[Figures: native structure vs. model comparisons; PDB IDs shown: 2hfq, 2hd3, 2gzv, 2hj1]
[Figures: phi/psi plots; score and temperature vs. time]
Score Function: Introducing Solvation
[Figure: PDB examples with residues Trp, Gly, Lys, Ser labelled]
• Representation
– Reduced, simplified
– 5 atoms per amino acid
– Dihedral angles (phi, psi)
1. Sequence
• Multi-way Bernoulli
[Figure: amino acid one-letter codes]
2. Structure
• Bivariate Gaussian
3. Solvation
• Simple Gaussian
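Adding solvation as a third term can be sketched as one univariate Gaussian per fragment position, summed with the sequence and structure terms in log space. The function names are hypothetical, and treating the three terms as independent within a class is an assumption of this sketch.

```python
import math


def log_gaussian(x, mean, std):
    """Log density of a univariate Gaussian."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def log_fragment_probability(log_p_sequence, log_p_structure,
                             solvation, solv_params):
    """Combine sequence, structure and solvation terms for one class.

    solvation:   one solvation value per residue in the fragment
    solv_params: one (mean, std) pair per residue position
    """
    log_p_solv = sum(
        log_gaussian(s, m, sd) for s, (m, sd) in zip(solvation, solv_params)
    )
    return log_p_sequence + log_p_structure + log_p_solv
```

Because the terms simply add in log space, the solvation term can be switched on without changing the search method.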
• Mixture models: connections between residues, geometry and location in protein
[Figure: classification pipeline with solvation. PDB, Fragment Library, Expectation Maximization, Statistical Models, Bayesian Classifier, Re-Classified fragments (GGGG.., GAEG.., DCWF.., WFDC.., STDC.., STST.., WFTG.., CCAD.., ACAD..)]
Sequence   Structure                                              Solvation
A S L T    (-3.1, -1.1) (-2.0, -0.9) (-0.5, -0.7) (-1.7, -0.5)    12 07 08 11
S L T I    (-2.0, -0.9) (-0.5, -0.7) (-1.7, -0.5) (-1.2, -0.4)    07 08 11 09
Search Method: Bias Fix & Combining Fragments
Bias Fix
Combining Fragments and Probabilities
Results
[Figures: native structure vs. model comparisons; PDB IDs shown: 1fsv, 2hep, 2k4x, 1agt, 2k53, 2k4n, 2hf1]
Future Outlook
• Introduce hydrogen bonds as a probabilistic term
• Hydrogen bond energies follow a normal distribution
• Use a simple Gaussian model
[Figure: histogram of hydrogen bond energy (kcal/mol) vs. count (N)]
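A hydrogen bond term of this kind could be fitted and evaluated as in the sketch below, which uses `statistics.NormalDist` from the Python standard library. The function names are hypothetical, and the energies in the usage note are toy values, not measured data.

```python
import math
from statistics import NormalDist


def fit_hbond_term(energies):
    """Fit a Gaussian to observed hydrogen bond energies (kcal/mol)."""
    return NormalDist.from_samples(energies)


def log_hbond_probability(model, energy):
    """Log density of one hydrogen bond energy under the fitted Gaussian."""
    return math.log(model.pdf(energy))
```

With toy energies `[-3.0, -2.0, -1.0]` the fitted model has mean -2.0 and standard deviation 1.0, so an energy of -2.0 scores higher than an energy of 0.0.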
Summary
• Purely probabilistic approach for protein structure prediction
• Score function consists of a set of probability distributions
• Conformation probabilities: mixture of probabilities, no energies at all
• Generates protein/protein-like conformations
• Long-range interactions not well represented
• In future, a hydrogen bond term could improve results
• Application to sequence optimization
• Rapid sampling: combine with other score functions
Thanks for your attention!