comparative modeling with modeller ben webb, andrej sali lab uc san francisco maya topf, birkbeck...
TRANSCRIPT
Comparative modeling with MODELLER
http://salilab.org/modeller/
Ben Webb, Andrej Sali Lab
UC San Francisco
Maya Topf, Birkbeck College, London
Comparative modeling overview
Why build comparative models?Many more sequences available than structures (millions vs. tens of thousands)
Many applications (e.g. determination of function) rely on structural information
Structure is often more conserved than sequence (in 2005, 900K of 1.7M structures modeled), since evolution tends to preserve function
Comparative modeling overview
How does it work?Extract information from known structures (one or more templates), and use to build the structure for the ‘target’ sequence
Should also consider information from other sources: physical force fields, statistics (e.g. PDB mining)
Classes of methods for comparative modelingAssembly of rigid bodies (core, loops, sidechains)
Segment matching
Satisfaction of spatial restraints
Comparative modeling by satisfaction of spatial restraints - MODELLER
A. Šali & T. Blundell. J. Mol. Biol. 234, 779, 1993.J.P. Overington & A. Šali. Prot. Sci. 3, 1582, 1994.A. Fiser, R. Do & A. Šali, Prot. Sci., 9, 1753, 2000.
1. Align sequence with structures
First, must determine the template structures
Simplistically, try to align the target sequence against every known structure’s sequence
In practice, this is too slow, so heuristics are used (e.g. BLAST)
Profile or HMM searches are generally more sensitive in difficult cases (e.g. Modeller’s profile.build method, or PSI-BLAST)
Could also use threading or other web servers
Alignment to templates generally uses global dynamic programming
Sequence-sequence: relies purely on a matrix of observed residue-residue mutation probabilities (‘align’)
Sequence-structure: gap insertion is penalized within secondary structure (helices etc.) (‘align2d’)
Other features and/or user-defined (‘salign’) or use an external program
2. Extract spatial restraints
Spatial restraints incorporate homology information, statistical preferences, and physical knowledge
Template Cα- Cα internal distances
Backbone dihedrals (φ/ψ)
Sidechain dihedrals givenresidue type of both targetand template
Force field stereochemistry(bond, angle, dihedral)
Statistical potentials
Other experimental constraints
etc.
3. Satisfy spatial restraints
All information is combined into a single objective function
Restraints and statistics are converted to an “energy” by taking the negative logForce field (CHARMM 22) simply added in
Function is optimized by conjugate gradients and simulated annealing molecular dynamics, starting from the target sequence threaded onto template structure(s)Multiple models are generally recommended; ‘best’ model or cluster or models chosen by simply taking the lowest objective function score, or using a model assessment method such as Modeller’s own DOPE or GA341, fit to EM density, or external programs such as PROSA or DFIRE
Typical errors in comparative models
Distortion/shifts in
aligned regions
Region without a
template
Sidechain packing
Incorrect template
MODEL
X-RAY
TEMPLATE
Misalignment
Marti-Renom et al. Annu.Rev.Biophys.Biomol.Struct. 29, 291-325, 2000.
Model Accuracy as a Function ofTarget-Template Sequence Identity
Sánchez, R., Šali, A. Proc Natl Acad Sci U S A. 95 pp13597-602. (1998).
Model accuracy
Marti-Renom et al. Annu.Rev.Biophys.Biomol.Struct. 29, 291-325, 2000.
MEDIUM ACCURACY LOW ACCURACYHIGH ACCURACY
NM23Seq id 77%
CRABPSeq id 41%
EDNSeq id 33%
X-RAY / MODEL
Scope for improvement:Sidechains
Cα equiv 147/148RMSD 0.41Å
SidechainsCore backboneLoops
Cα equiv 122/137RMSD 1.34Å
SidechainsCore backbone, LoopsAlignment, Fold assignment
Cα equiv 90/134RMSD 1.17Å
Applications of protein structure models
D. Baker & A. Sali. Science 294, 93, 2001.
Loop modeling
Often, there are parts of the sequence which have no detectable templates (usually loops)
“Mini folding problem” – these loops must be sampled to get improved conformations
Database searches only complete for 4-6 residue loops
Modeller uses conformational search with a custom energy function optimized for loop modeling (statistical potential derived from PDB)
Fiser/Melo protocol (‘loopmodel’)
Newer DOPE + GB/SA protocol (‘dope_loopmodel’)
Accuracy of loop models as a function of amount of optimization
Fraction of loops modeled with medium accuracy (<2Å)
Fitting Structural Models in cryoEM maps
• Problem: comparative models are often inaccurate.
• Solution: Use cryoEM maps to assess the models by rigid density fitting.
refinement
Δ G
• Problem: the structures may exhibit conformational changes (induced fit, target-template differences).
• Solution: use flexible fitting to refine the structures in the map.
• Problem: the resolution of the map can be too low for an unambiguous placement of a component.
• Solution: use additional information to determine the assembly architecture.
Topf & Sali. Curr Opin Struct Biol 2005.
Errors in Comparative Modelingvs. Resolution
Distortion and
shifts of
aligned regions
Regions
without
a template
Sidechain
packing
Incorrect
templates
MisalignmentsRigid-body
movements
20 Å 10 Å 2 Å
Rigid fitting
Rigid Density Fitting with MODELLER/Mod-EM
CC(Ra,rk ) EM (rj )probe (Rarj rk )
j1
J
LE
…………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………
r
θ
probe native
•LE - Local exhaustive search (rotations only or rotations+translations)
MC
…………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………
θ
probe
nativer
•MC - Monte Carlo in translation, with exhaustive rotation
•SMC - Scanning of the map to find regions with high CC; LE or MC search
probe
probe
…………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………
probe
SMC
native
Rigid fitting Topf, Baker, John, Chiu & Sali. J Struct Biol 2005.
Quality of fit vs. quality of model
0.55
0.7
0.85
1
0 0.2 0.4 0.6 0.8 1
Native overlap
Mo
d-E
M s
co
re
1DXT, 8
1DXT, 12
R2=0.6-0.7
Fit
tin
g s
core
(C
C)
Topf, Baker, John, Chiu & Sali. J Struct Biol 2005.
Ranking:
Native (1dxt ): 1
Best model: 2
Template (1hbg):
132 (8 Å), 139 (12 Å)
Rigid model-fitting
Structural overlap
cryoEM Density Map Selects an Accurate Model
1cid:2rhe
12% seq. identity
10 Å resolution
Native structure
(0)1.00
Structural Overlap
(Rank by CC)
Template
(101)0.55
Best-fitting model
(1)0.69 (11)0.73
Most accurate model
Rigid fitting
QuickTime™ and ampeg4 decompressor
are needed to see this picture.
30.0
40.0
50.0
60.0
70.0
80.0
0 5 10 15 20 25
Iteration
Na
tiv
e o
ve
rla
p (
%)
[0,1]
8i1b:4fgf
14% seq. identity
10 Å resolution
Iterative Alignment, Model Building and CC-based Assessment
Topf, Baker, Marti-Renom, Chiu & Sali. J. Mol. Biol., 2006.
X-ray Initial model (A)Final model (B)
Rigid fitting
B
A
Cα RMSD = 9.1ÅStructural Overlap = 36%
Cα RMSD = 5.2ÅStructural Overlap = 62%
Modeling, Rigid and Flexible Fitting Protocol
QuickTime™ and ampeg4 decompressor
are needed to see this picture.
initial modelfinal modelx-ray structure
33% sequence identity
RMSD: 5.4 Å -> 3.5 Å
Density-based real-Space refinement while maintaining correct stereochemistry
Flexible fitting
Arranging Components in a CryoEM Map of their Assembly
Simultaneous optimization of multi-component assembly.
Assembly architecture
Single component fitting result
Multi-component optimization result
20 Å resolution
with Keren Lasker & Haim Wolfson
Programs, servers and databases
External ResourcesPDB, Uniprot, GENBANK, NR, PIR, INTERPRO, Kinase Resource
UCSC Genome Browser, CHIMERA, Pfam, SCOP, CATH
LS-SNPWeb Server
http://salilab.org/LS-SNP/Predicts functional impact
of residue substitution
MODBASEDatabase
http://salilab.org/modbase/
Fold assignments,alignments
models, model assessments for all
sequences related to a known structure
CCPRCenter for
Computational Proteomics Research
http://www.ccpr.ucsf.edu
MODWEBWeb Server
http://salilab.org/modweb/
Provides a web interface to MODPIPE
ICEDBDatabase/LIMShttp://nysgxrc.orgTracks targets for
structural genomics by NYSGXRC
MODELLERProgram
http://salilab.org/modeller/
Implements most operations in comparative
modeling
MODLOOPWeb Server
http://salilab.org/modloop/
Models loops in protein structures
EVAWeb Server
http://salilab.org/eva/Evaluates and ranks web
servers for protein structure prediction
PIBASEDatabase
http://salilab.org/pibase/Contains structurally
defined protein interfaces
DBALIDatabase
http://salilab.org/DBAli/Contains a
comprehensive set of pairwise and multiple
structure-based alignments
LIGBASEDatabase
Ligand binding sites and inheritance
(accessiblethrough MODBASE)
MODPIPEProgram
Automatically calculates comparative models of
many protein sequences
Useful resources
http://salilab.org/bioinformatics_resources.shtml
For further examples…
http://salilab.org/modeller/tutorial/
ReferencesProtein Structure Prediction:
Marti-Renom el al. Annu. Rev. Biophys. Biomol. Struct. 29, 291-325, 2000.Baker & Sali. Science 294, 93-96, 2001.
Comparative Modeling:Marti-Renom et al. Annu. Rev. Biophys. Biomol. Struct. 29, 291-325, 2000.Marti-Renom et al. Current Protocols in Protein Science 1, 2.9.1-2.9.22, 2002.Shen & Sali. Protein Science 15, 2507 - 2524, 2006.Eswar et al. Current Protocols in Bioinformatics, Supplement 15, 5.6.1-5.6.30, 2006. Madhusudhan et al, The Proteomics Protocols Handbook. Humana Press Inc.,831-860, 2005.
MODELLER:Sali & Blundell. J. Mol. Biol. 234, 779-815, 1993.
Density fitting:Topf et al. J Struct Biol 2005Topf et al. J Mol. Biol. 2006Topf & Sali Curr Opin Struct Biol 2006
UCSF
Andrej Sali LabNarayanan EswarUrsula PieperM. S. MadhusudhanMarc Marti-RenomRoberto Sanchez (MSSM)Min-yi ShenAndras Fiser (AECOM)David EramianMark PetersonFrancisco Melo (Catholic U.)Ash Stuart (Rampallo Coll.)Eric Feyfant (GI)Valentin Ilyin (NE)Frank AlberBino John (Pitsburg U.)Fred DavisAndrea Rossi
Tom Goddard (Chimera group)
Acknowledgements
Baylor College
Wah Chiu
Matt Baker
Yao Cong
Irina Serysheva
Mike Schmid
Tel Aviv University
Haim Wolfson
Keren Lasker