
Comparative modeling with MODELLER

http://salilab.org/modeller/

Ben Webb, Andrej Sali Lab

UC San Francisco

Maya Topf, Birkbeck College, London

Comparative modeling overview

Why build comparative models?

Many more sequences available than structures (millions vs. tens of thousands)

Many applications (e.g. determination of function) rely on structural information

Structure is often more conserved than sequence (in 2005, 900K of 1.7M structures modeled), since evolution tends to preserve function

Comparative modeling overview

How does it work?

Extract information from known structures (one or more templates), and use it to build the structure for the ‘target’ sequence

Should also consider information from other sources: physical force fields, statistics (e.g. PDB mining)

Classes of methods for comparative modeling

Assembly of rigid bodies (core, loops, sidechains)

Segment matching

Satisfaction of spatial restraints

Comparative modeling by satisfaction of spatial restraints - MODELLER

A. Šali & T. Blundell. J. Mol. Biol. 234, 779, 1993.

J.P. Overington & A. Šali. Prot. Sci. 3, 1582, 1994.

A. Fiser, R. Do & A. Šali. Prot. Sci. 9, 1753, 2000.

1. Align sequence with structures

First, must determine the template structures

Simplistically, try to align the target sequence against every known structure’s sequence

In practice, this is too slow, so heuristics are used (e.g. BLAST)

Profile or HMM searches are generally more sensitive in difficult cases (e.g. Modeller’s profile.build method, or PSI-BLAST)

Could also use threading or other web servers

Alignment to templates generally uses global dynamic programming

Sequence-sequence: relies purely on a matrix of observed residue-residue mutation probabilities (‘align’)

Sequence-structure: gap insertion is penalized within secondary structure (helices etc.) (‘align2d’)

Other features and/or user-defined (‘salign’) or use an external program
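
The global dynamic programming mentioned above can be illustrated with a minimal Needleman-Wunsch sketch. It is not MODELLER's ‘align’ or ‘align2d’ code: the function name `global_align`, the flat match/mismatch scores (standing in for a residue-residue substitution matrix) and the linear gap penalty are all assumptions made for illustration.

```python
def global_align(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    match/mismatch are a toy stand-in for a substitution matrix.
    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(seq_a), len(seq_b)
    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align the two residues
                              score[i - 1][j] + gap,      # gap in seq_b
                              score[i][j - 1] + gap)      # gap in seq_a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(seq_a[i - 1])
                out_b.append(seq_b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append('-')
            i -= 1
        else:
            out_a.append('-')
            out_b.append(seq_b[j - 1])
            j -= 1
    return score[n][m], ''.join(reversed(out_a)), ''.join(reversed(out_b))


if __name__ == "__main__":
    print(global_align("ACGT", "AGT"))
```

A sequence-structure method would, as the slide notes, make the gap penalty depend on the template's secondary structure; in this sketch that would mean replacing the constant `gap` with a position-dependent value.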

2. Extract spatial restraints

Spatial restraints incorporate homology information, statistical preferences, and physical knowledge

Template Cα-Cα internal distances

Backbone dihedrals (φ/ψ)

Sidechain dihedrals given residue type of both target and template

Force field stereochemistry (bond, angle, dihedral)

Statistical potentials

Other experimental constraints

etc.
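
As a rough illustration of the first restraint type listed above, the sketch below transfers template Cα-Cα distances to aligned target residue pairs. The function name `ca_distance_restraints`, the data layout and the 10 Å cutoff are assumptions for this example, not MODELLER's actual restraint-generation code.

```python
import math


def ca_distance_restraints(template_ca, alignment, max_dist=10.0):
    """Derive target distance restraints from template C-alpha coordinates.

    template_ca: dict mapping template residue index -> (x, y, z)
    alignment:   list of (target_index, template_index) pairs for aligned residues
    max_dist:    hypothetical cutoff; longer template distances are ignored
    Returns a list of (target_i, target_j, template_distance).
    """
    restraints = []
    for a in range(len(alignment)):
        for b in range(a + 1, len(alignment)):
            target_i, templ_i = alignment[a]
            target_j, templ_j = alignment[b]
            d = math.dist(template_ca[templ_i], template_ca[templ_j])
            if d <= max_dist:
                restraints.append((target_i, target_j, d))
    return restraints


if __name__ == "__main__":
    template = {0: (0.0, 0.0, 0.0), 1: (3.0, 4.0, 0.0), 2: (30.0, 0.0, 0.0)}
    aligned = [(0, 0), (1, 1), (2, 2)]
    print(ca_distance_restraints(template, aligned))
```

Only aligned positions contribute, which is why regions of the target without a template end up with no homology-derived restraints.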

3. Satisfy spatial restraints

All information is combined into a single objective function

Restraints and statistics are converted to an “energy” by taking the negative log

Force field (CHARMM 22) simply added in

Function is optimized by conjugate gradients and simulated annealing molecular dynamics, starting from the target sequence threaded onto the template structure(s)

Multiple models are generally recommended; the ‘best’ model or cluster of models is chosen by simply taking the lowest objective function score, or by using a model assessment method such as Modeller’s own DOPE or GA341, fit to EM density, or external programs such as PROSA or DFIRE
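
To make the negative-log step concrete, here is a minimal sketch that assumes each distance restraint is a single Gaussian probability density. Real restraint forms are richer, and the function names and the (i, j, mean, sigma) tuple layout are hypothetical.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Probability density of a Gaussian restraint at value x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def restraint_energy(x, mean, sigma):
    """Negative log of the Gaussian density, written out analytically.

    -ln p(x) = 0.5 * ((x - mean) / sigma)**2 + ln(sigma * sqrt(2 * pi)),
    i.e. a harmonic term plus a constant.
    """
    z = (x - mean) / sigma
    return 0.5 * z * z + math.log(sigma * math.sqrt(2.0 * math.pi))


def objective(coords, restraints):
    """Sum the energies of all distance restraints.

    coords:     list of (x, y, z) positions
    restraints: list of (i, j, mean, sigma) tuples (assumed layout)
    """
    total = 0.0
    for i, j, mean, sigma in restraints:
        total += restraint_energy(math.dist(coords[i], coords[j]), mean, sigma)
    return total


if __name__ == "__main__":
    restraints = [(0, 1, 3.8, 0.5)]
    good = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
    bad = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
    print(objective(good, restraints), objective(bad, restraints))
```

Because the log turns products of independent densities into sums, every restraint contributes one additive term, which is what lets a force field energy be "simply added in".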

Typical errors in comparative models

[Figure: examples of each error type; legend: MODEL, X-RAY, TEMPLATE]

Distortion/shifts in aligned regions

Region without a template

Sidechain packing

Incorrect template

Misalignment

Marti-Renom et al. Annu.Rev.Biophys.Biomol.Struct. 29, 291-325, 2000.

Model Accuracy as a Function of Target-Template Sequence Identity

Sánchez & Šali. Proc. Natl. Acad. Sci. USA 95, 13597-13602, 1998.

Model accuracy

Marti-Renom et al. Annu.Rev.Biophys.Biomol.Struct. 29, 291-325, 2000.

[Figure: X-RAY / MODEL comparison for three examples]

HIGH ACCURACY: NM23, Seq id 77%, Cα equiv 147/148, RMSD 0.41 Å. Scope for improvement: Sidechains

MEDIUM ACCURACY: CRABP, Seq id 41%. Scope for improvement: Sidechains, Core backbone, Loops

LOW ACCURACY: EDN, Seq id 33%. Scope for improvement: Sidechains, Core backbone, Loops, Alignment, Fold assignment


Applications of protein structure models

D. Baker & A. Sali. Science 294, 93, 2001.

Loop modeling

Often, there are parts of the sequence which have no detectable templates (usually loops)

“Mini folding problem” – these loops must be sampled to get improved conformations

Database searches only complete for 4-6 residue loops

Modeller uses conformational search with a custom energy function optimized for loop modeling (statistical potential derived from PDB)

Fiser/Melo protocol (‘loopmodel’)

Newer DOPE + GB/SA protocol (‘dope_loopmodel’)

Accuracy of loop models as a function of amount of optimization

Fraction of loops modeled with medium accuracy (<2Å)

Fitting Structural Models in cryoEM maps

• Problem: comparative models are often inaccurate.

• Solution: Use cryoEM maps to assess the models by rigid density fitting.


• Problem: the structures may exhibit conformational changes (induced fit, target-template differences).

• Solution: use flexible fitting to refine the structures in the map.

• Problem: the resolution of the map can be too low for an unambiguous placement of a component.

• Solution: use additional information to determine the assembly architecture.

Topf & Sali. Curr Opin Struct Biol 2005.

Errors in Comparative Modeling vs. Resolution

[Figure: error types placed along a resolution axis (20 Å, 10 Å, 2 Å)]

Distortion and shifts of aligned regions

Regions without a template

Sidechain packing

Incorrect templates

Misalignments

Rigid-body movements

Rigid fitting

Rigid Density Fitting with MODELLER/Mod-EM

$$CC(R_a, r_k) = \sum_{j=1}^{J} \rho_{EM}(r_j)\,\rho_{probe}(R_a r_j + r_k)$$

[Figure: schematic of the LE, MC and SMC searches; labels: probe, native, r, θ]

• LE - Local exhaustive search (rotations only or rotations + translations)

• MC - Monte Carlo in translation, with exhaustive rotation

• SMC - Scanning of the map to find regions with high CC; LE or MC search

Topf, Baker, John, Chiu & Sali. J Struct Biol 2005.

Rigid fitting
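
The idea of scoring a probe placement by cross-correlation and exhaustively searching placements can be sketched on a toy 1D density. This is not the Mod-EM implementation: the normalization of the score, the restriction to integer translations (no rotations) and all function names are assumptions made to keep the example small.

```python
import math


def cross_correlation(em, probe):
    """Normalized cross-correlation of two equally sized voxel lists."""
    num = sum(e * p for e, p in zip(em, probe))
    den = math.sqrt(sum(e * e for e in em) * sum(p * p for p in probe))
    return num / den if den > 0 else 0.0


def shift(density, k):
    """Translate a 1D density by k voxels, padding with zeros."""
    n = len(density)
    return [density[i - k] if 0 <= i - k < n else 0.0 for i in range(n)]


def best_translation(em, probe, max_shift):
    """Score every integer shift in [-max_shift, max_shift]; return (CC, shift)."""
    scored = [(cross_correlation(em, shift(probe, k)), k)
              for k in range(-max_shift, max_shift + 1)]
    return max(scored)


if __name__ == "__main__":
    em_map = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
    probe = [1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(best_translation(em_map, probe, 3))
```

An exhaustive search like this grows quickly once rotations and three translational dimensions are included, which is one motivation for the Monte Carlo and map-scanning variants listed on the slide.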

Quality of fit vs. quality of model

[Plot: fitting score (CC; Mod-EM score) vs. native overlap for 1DXT at 8 Å and 12 Å resolution; R2 = 0.6-0.7]

Topf, Baker, John, Chiu & Sali. J Struct Biol 2005.

Ranking:

Native (1dxt): 1

Best model: 2

Template (1hbg): 132 (8 Å), 139 (12 Å)

Rigid model-fitting

Structural overlap

cryoEM Density Map Selects an Accurate Model

1cid:2rhe, 12% seq. identity, 10 Å resolution

Structural overlap (rank by CC):

Native structure: 1.00 (0)

Template: 0.55 (101)

Best-fitting model: 0.69 (1)

Most accurate model: 0.73 (11)

Rigid fitting


[Plot: native overlap (%) vs. iteration]

8i1b:4fgf

14% seq. identity

10 Å resolution

Iterative Alignment, Model Building and CC-based Assessment

Topf, Baker, Marti-Renom, Chiu & Sali. J. Mol. Biol., 2006.

X-ray, initial model (A), final model (B)

Rigid fitting

Initial model (A): Cα RMSD = 9.1 Å, structural overlap = 36%

Final model (B): Cα RMSD = 5.2 Å, structural overlap = 62%
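
The two accuracy measures quoted on this slide can be sketched as follows. The code assumes the model has already been superposed on the X-ray structure and that both coordinate lists contain the same equivalent Cα atoms in the same order. The slides do not define structural overlap, so the definition used here (fraction of Cα atoms within a distance cutoff) and the 3.5 Å default are assumptions.

```python
import math


def ca_rmsd(model, native):
    """Root-mean-square deviation over equivalent, pre-superposed C-alpha atoms."""
    squared = [math.dist(a, b) ** 2 for a, b in zip(model, native)]
    return math.sqrt(sum(squared) / len(squared))


def structural_overlap(model, native, cutoff=3.5):
    """Fraction of C-alpha atoms within `cutoff` of their native position.

    The cutoff value is a hypothetical choice for this sketch.
    """
    close = sum(1 for a, b in zip(model, native) if math.dist(a, b) <= cutoff)
    return close / len(native)


if __name__ == "__main__":
    native = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    print(ca_rmsd(model, native), structural_overlap(model, native))
```

The toy data show why both numbers are reported: one badly placed atom dominates the RMSD while the overlap still records that most atoms are close.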

Modeling, Rigid and Flexible Fitting Protocol


initial model, final model, x-ray structure

33% sequence identity

RMSD: 5.4 Å -> 3.5 Å

Density-based real-space refinement while maintaining correct stereochemistry

Flexible fitting

Arranging Components in a CryoEM Map of their Assembly

Simultaneous optimization of multi-component assembly.

Assembly architecture

Single component fitting result

Multi-component optimization result

20 Å resolution

with Keren Lasker & Haim Wolfson

Programs, servers and databases

External Resources
PDB, Uniprot, GENBANK, NR, PIR, INTERPRO, Kinase Resource, UCSC Genome Browser, CHIMERA, Pfam, SCOP, CATH

MODELLER (Program)
http://salilab.org/modeller/
Implements most operations in comparative modeling

MODPIPE (Program)
Automatically calculates comparative models of many protein sequences

MODWEB (Web Server)
http://salilab.org/modweb/
Provides a web interface to MODPIPE

MODLOOP (Web Server)
http://salilab.org/modloop/
Models loops in protein structures

MODBASE (Database)
http://salilab.org/modbase/
Fold assignments, alignments, models, model assessments for all sequences related to a known structure

LIGBASE (Database)
Ligand binding sites and inheritance (accessible through MODBASE)

LS-SNP (Web Server)
http://salilab.org/LS-SNP/
Predicts functional impact of residue substitution

EVA (Web Server)
http://salilab.org/eva/
Evaluates and ranks web servers for protein structure prediction

PIBASE (Database)
http://salilab.org/pibase/
Contains structurally defined protein interfaces

DBALI (Database)
http://salilab.org/DBAli/
Contains a comprehensive set of pairwise and multiple structure-based alignments

ICEDB (Database/LIMS)
http://nysgxrc.org
Tracks targets for structural genomics by NYSGXRC

CCPR (Center for Computational Proteomics Research)
http://www.ccpr.ucsf.edu

Useful resources

http://salilab.org/bioinformatics_resources.shtml

For further examples…

http://salilab.org/modeller/tutorial/

References

Protein Structure Prediction:
Marti-Renom et al. Annu. Rev. Biophys. Biomol. Struct. 29, 291-325, 2000.
Baker & Sali. Science 294, 93-96, 2001.

Comparative Modeling:
Marti-Renom et al. Annu. Rev. Biophys. Biomol. Struct. 29, 291-325, 2000.
Marti-Renom et al. Current Protocols in Protein Science 1, 2.9.1-2.9.22, 2002.
Shen & Sali. Protein Science 15, 2507-2524, 2006.
Eswar et al. Current Protocols in Bioinformatics, Supplement 15, 5.6.1-5.6.30, 2006.
Madhusudhan et al. The Proteomics Protocols Handbook. Humana Press Inc., 831-860, 2005.

MODELLER:
Sali & Blundell. J. Mol. Biol. 234, 779-815, 1993.

Density fitting:
Topf et al. J. Struct. Biol. 2005.
Topf et al. J. Mol. Biol. 2006.
Topf & Sali. Curr. Opin. Struct. Biol. 2005.

UCSF

Andrej Sali Lab
Narayanan Eswar
Ursula Pieper
M. S. Madhusudhan
Marc Marti-Renom
Roberto Sanchez (MSSM)
Min-yi Shen
Andras Fiser (AECOM)
David Eramian
Mark Peterson
Francisco Melo (Catholic U.)
Ash Stuart (Rampallo Coll.)
Eric Feyfant (GI)
Valentin Ilyin (NE)
Frank Alber
Bino John (Pittsburgh U.)
Fred Davis
Andrea Rossi

Tom Goddard (Chimera group)

Acknowledgements

Baylor College

Wah Chiu

Matt Baker

Yao Cong

Irina Serysheva

Mike Schmid

Tel Aviv University

Haim Wolfson

Keren Lasker
