Homology Modeling

Roberto Lins, EPFL - summer semester 2005

Disclaimer: course material is mainly taken from: P.E. Bourne & H. Weissig, Structural Bioinformatics; C.A. Orengo, D.T. Jones & J.M. Thornton, Bioinformatics, genes, protein & computers; A.M. Lesk, Introduction to Bioinformatics; A.D. Baxevanis & B.F. Ouellette, Bioinformatics, a practical guide to the analysis of genes and proteins; several online materials (George Washington University, University of Houston, Tel-Aviv University) and resources (RCSB, NCBI, SWISS-PROT) as well as personal research data.



[Figure: Functional Genomics. Levels shown: Genome, Expressome, Proteome, Metabolome, with algorithms and databases attached to the levels; panel title: Tertiary Structure (fold).]

Limitations of Experimental Methods

Annotated proteins in the databank: ~100,000

Proteins with known structure: ~5,000!

Total number including ORFs: ~700,000

An ORF, or Open Reading Frame, is a region of the genome that codes for a protein.

ORFs have been identified by whole-genome sequencing efforts. ORFs with no known function are termed orphans.

Dataset for analysis

Structural Biology Consortia: Brute Force Approach Towards Structure Elucidation

Employment of an army of PhDs & postdocs

Aim to solve about 400 structures a year

Large-scale expression & crystallization attempts

+

– Basic strategies remain the same

No (known) new tricks

*

Enhances the statistical base for inferring sequence-structure relationships

“Unrelenting” targets will be ignored


Can we predict structure from sequence?

GCTCCTCACTGTCTGTGTTTATTCTTTTAGCTTCTTCAGATCTTTTAGTCTGAGGAAGCCTGGCATGTGCAAATGAAGTTAACCTAA...

Comparative Modeling (Homology Modeling)

Basis

Structure is much more conserved than sequence during evolution.

The higher the similarity, the higher the confidence in the modeled structure.

Limited applicability

A large number of proteins and ORFs have no similarity to proteins with known structure.

What’s homology modeling?

Predicts the three-dimensional structure of a given protein sequence (target) based on an alignment to one or more known protein structures (templates).

If similarity between the target sequence and the template sequence is detected, structural similarity can be assumed.

In general, at least about 30% sequence identity is required to generate a useful model.

It can be used to understand function, activity, specificity, etc.

It is of interest to drug companies wishing to do structure-aided drug design.

A keystone of structural proteomics
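The 30% rule of thumb above refers to sequence identity measured on an alignment. As a minimal sketch, assuming the two sequences are already aligned and that identity is counted over columns where neither row has a gap (other conventions for the denominator exist), it could be computed as follows. The function name and the example sequences are hypothetical.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity over aligned (non-gap) columns of a pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Keep only columns where both sequences have a residue
    aligned = [(x, y) for x, y in zip(row_a, row_b) if x != gap and y != gap]
    if not aligned:
        return 0.0
    identical = sum(1 for x, y in aligned if x == y)
    return 100.0 * identical / len(aligned)

# Hypothetical target/template rows of an alignment
target_row = "MKTAYIAKQR"
template_row = "MKSAY-AKNR"
identity = percent_identity(target_row, template_row)
print(f"{identity:.1f}% identity")  # 7 identical residues out of 9 aligned columns
```

With this convention the example is well above the 30% guideline; a value close to or below it would call for extra care in the later steps.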


Homology modeling - applications

Structure-based assessment of target druggability

Structure-guided design of mutagenesis experiments

Tool compound design for probing biological function

Homology model based ligand design

Design of in vitro test assays

Structure-based prediction of drug metabolism and toxicity


Accuracy and application of protein structure

Does sequence similarity imply structure similarity?

Twilight zone

Safe zone (thanks to evolution!)

[Figure: RMSD of backbone atoms (Å, axis from 0.0 to 2.5) versus % identical residues in core (axis from 100 to 0). Chothia & Lesk, 1986.]

$$\mathrm{RMSD} = \sqrt{\frac{1}{N_{atoms}} \sum_{i=1}^{N_{atoms}} d_i^{2}}$$

N_atoms = total number of atoms; d_i = distance between the coordinates of an atom i at t0 and tn, when the structures are superimposed.
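As an illustration of the RMSD definition, here is a minimal sketch that assumes the two coordinate sets are already superimposed and list the same atoms in the same order. The superposition step itself is not shown, and the coordinates are made up.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two already-superimposed sets of 3D coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    # d_i^2 for every pair of equivalent atoms
    squared = [
        sum((a - b) ** 2 for a, b in zip(atom_a, atom_b))
        for atom_a, atom_b in zip(coords_a, coords_b)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical backbone atoms of two superimposed structures (in Å)
structure_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
structure_2 = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
print(rmsd(structure_1, structure_2))  # every atom is displaced by 1 Å
```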

My target sequence has over 30% sequence identity with a known protein structure, so I want to generate a 3D model.

What do I have to do?


Structure prediction by homology modeling

Homology modeling makes two fundamental assumptions

– The structure of a protein is determined by its primary amino acid sequence (Anfinsen).

– During evolution, the structure of a protein has changed much more slowly than its sequence.

• Similar sequences adopt practically identical structures, and distantly related sequences fold into similar structures.

In summary: homology modeling steps

1) Template recognition & initial alignment

2) Alignment correction

3) Backbone generation

4) Loop modeling

5) Side-chain modeling

6) Model optimization

7) Model validation

Template recognition & initial alignment

Select the best template from a library of known protein structures derived from the PDB.

Templates can be found by using the target sequence as a query in a FASTA or BLAST search.

Gaining confidence in template searching

Once a suitable template is found, a literature search on the relevant fold can determine what biological role it plays.

Does this match the biological/biochemical function that you expect?

Ligand(s) present?

Resolution of the template

Family of Proteins

Multiple templates?

Further Considerations:

Proteins are homologous if they are related by divergence from a common ancestor.

[Figure: tree with a duplication event and a speciation event leading to species 1 and species 2; duplication gives paralogues, speciation gives orthologues.]

Paralogues: function may be related or very different!

Orthologues: function more likely to be conserved.

In summary: there are two types of homologs

- Orthologs: proteins that carry out the same function in different species
- Paralogs: proteins that perform different, but related functions within one organism

Alignment of the target onto the template

Correct alignment is necessary to create the most probable 3D structure of the target.

If the sequences are aligned incorrectly, the result will be false positives or false negatives.

Important to consider:
- algorithms
- scoring alignments
- gap penalties

Identify SCRs (Structurally Conserved Regions) and SVRs (Structurally Variable Regions).

Alignment Outcome

The (true) alignment indicates the evolutionary process giving rise to the different sequences, starting from the same ancestor sequence and then changing through mutations (insertions, deletions, and substitutions).

Alignment vs. databases

Task: given a query sequence and millions of database records, find the optimal alignment between the query and a record.

AGTCTCCAGTTATGCCA…

Alignment vs. databases

Tool: given two sequences, there exists an algorithm to find the best alignment.

Naïve solution: apply the algorithm to each of the records, one by one.

Problem: an exact algorithm is just too slow to run millions of times (even a linear-time algorithm will run slowly on a huge database).

Solution:
- run in parallel (expensive)
- use a fast (heuristic) method to discard irrelevant records and then apply the exact algorithm to the remaining few
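The "filter first, align exactly afterwards" idea can be sketched as follows. This is only an illustration under stated assumptions: the heuristic is a shared k-mer test (similar in spirit to, but much cruder than, the word lookups in FASTA and BLAST), the exact step is a Smith-Waterman local alignment score with a linear gap penalty, and the scoring values and toy database are hypothetical.

```python
def kmers(seq, k=3):
    """All overlapping words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shares_kmer(query_kmers, record, k=3):
    """Fast heuristic filter: does the record contain any word of the query?"""
    return any(record[i:i + k] in query_kmers for i in range(len(record) - k + 1))

def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Exact best local alignment score, O(len(a) * len(b)) time."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

def search(query, database, k=3):
    """Discard records without a shared k-mer, then score the rest exactly."""
    query_kmers = kmers(query, k)
    candidates = [rec for rec in database if shares_kmer(query_kmers, rec, k)]
    scored = [(smith_waterman_score(query, rec), rec) for rec in candidates]
    return sorted(scored, reverse=True)

# Hypothetical toy database
database = ["AGTCTCCAGTTATGCCA", "TTTTTTTTTTTT", "CCAGTTAT", "GGGGGCGCGC"]
hits = search("CCAGTTATG", database)
for score, record in hits:
    print(score, record)
```

Only two of the four records survive the filter and reach the expensive step. The price of the speed-up is that a distant relative with no exact shared word would be discarded, which is why such filters are heuristic.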

Sequence alignment algorithms

Used to calculate a similarity score to infer homology between two sequences.

Examples: the two most used in homology modeling are:

BLAST: the general strategy is to optimise the maximal segment pair (MSP) score - BLAST computes similarity, not alignment (Altschul, S. F., Gish, W., Miller, W., Myers, E. W., Lipman, D. J., J. Mol. Biol. (1990) 215:403-410).

FastA (local alignment): searches for both full and partial sequence matches, i.e., local similarity is obtained; more sensitive than BLAST, but slower; many gaps may represent a problem (Pearson, W. R., Lipman, D. J., P.N.A.S. (1988) 85:2444-2448).

Sequence alignment outputs

[Figure: example outputs from FastA and BLAST.]

Alignment corrections

Alignments are scored (substitution score) in order to define the similarity between 2 aa residues in the sequences.

A substitution score is calculated for each aligned pair of letters.

Substitution matrices:

- reflect the true probabilities of mutations occurring through a period of evolution

- PAM family: based on global alignments of closely related proteins. Mutation probability matrix.

- BLOSUM family: based on directly observed alignments; not extrapolated from comparisons of closely related sequences.

Gap Penalties

A gap is one or more empty spaces in one sequence aligned with letters in the other sequence.

These empty spaces may or may not be penalized:

- a higher penalty is assigned to the first missing aa than to the subsequent ones; this reflects the fact that a single mutational event can insert or delete many residues at a time
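A common way to express "the first missing residue costs more than the following ones" is an affine gap penalty. The sketch below assumes that scheme; the opening and extension values are hypothetical and not taken from the slides.

```python
def affine_gap_penalty(length, gap_open=10.0, gap_extend=1.0):
    """Penalty for one gap of `length` residues: expensive to open, cheap to extend."""
    if length < 0:
        raise ValueError("gap length cannot be negative")
    if length == 0:
        return 0.0
    return gap_open + gap_extend * (length - 1)

# One mutational event deleting 4 residues vs. four separate single-residue gaps
one_long_gap = affine_gap_penalty(4)
four_short_gaps = 4 * affine_gap_penalty(1)
print(one_long_gap, four_short_gaps)
```

Under these assumed values a single four-residue gap is penalized far less than four isolated gaps, which matches the idea that one event can insert or delete many residues at once.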

Gap Penalties

[Figure: protein structure with N and C labels.]

Insertion/deletion of structural domains can ‘easily’ be done at loop sites.

Gap Penalties

The overall alignment score is the sum of similarity and gap scores:

the higher the overall alignment score, the better the alignment (more conserved)
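To make the sum concrete, here is a minimal sketch that scores an existing gapped alignment. It assumes a toy substitution score (not PAM or BLOSUM) and a gap cost that is higher for the first missing residue than for the following ones; all numbers and sequences are hypothetical.

```python
def simple_substitution(x, y):
    """Toy substitution score (an assumption, not a PAM or BLOSUM matrix)."""
    return 4 if x == y else -1

def alignment_score(row_a, row_b, substitution=simple_substitution,
                    gap_open=10, gap_extend=1):
    """Sum of substitution scores minus gap penalties for a pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    in_gap = None  # which row ("a" or "b") the current run of gaps is in
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot contain two gaps")
        if x == "-" or y == "-":
            gap_row = "a" if x == "-" else "b"
            # First position of a gap pays gap_open, later positions gap_extend
            score -= gap_extend if in_gap == gap_row else gap_open
            in_gap = gap_row
        else:
            score += substitution(x, y)
            in_gap = None
    return score

# Same pair of sequences, two candidate alignments
one_gap = alignment_score("ACDEFGHIK", "ACD---HIK")
scattered_gaps = alignment_score("ACDEFGHIK", "AC-E-G-IK")
print(one_gap, scattered_gaps)
```

Both alignments contain six identical pairs and three gap positions, but the one with a single contiguous gap scores higher, so it would be preferred under this scheme.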


Corrections by hand may still be needed!

Multiple Sequence Alignments

Multiple nucleotide or amino acid sequence alignments are usually performed for one of the following purposes:

- to characterize protein families and identify shared regions of homology in a multiple sequence alignment (this generally happens when a sequence search has revealed homologies to several sequences);

- to determine the consensus sequence of several aligned sequences;

- to help predict the secondary and tertiary structures of new sequences;

- as a preliminary step in molecular evolution analysis using phylogenetic methods for constructing phylogenetic trees.
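As an example of one of these uses, a consensus sequence can be derived column by column from a multiple alignment. The sketch below assumes a simple majority rule that ignores gap characters; real tools apply more refined rules (thresholds, ambiguity codes), and the aligned sequences are made up.

```python
from collections import Counter

def consensus(alignment, gap="-"):
    """Most frequent non-gap residue in each column of a multiple alignment."""
    if not alignment or len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment rows must be non-empty and of equal length")
    result = []
    for column in zip(*alignment):
        counts = Counter(residue for residue in column if residue != gap)
        if not counts:
            result.append(gap)  # column made only of gaps
            continue
        residue, _ = counts.most_common(1)[0]
        result.append(residue)
    return "".join(result)

# Hypothetical alignment of four related sequences
msa = [
    "MKT-LLV",
    "MKTALLV",
    "MRTALIV",
    "MKSALLV",
]
print(consensus(msa))
```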

Backbone generation

Uses known structurally conserved regions to generate coordinates for the unknown.

For SCRs - copy coordinates from known structures.

For variable regions (SVRs) - copy from the known structure if the residue types are similar; otherwise, use databases of loop fragments.
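The coordinate-copying step can be sketched as a walk along the target-template alignment. This is a simplification under stated assumptions: one coordinate per residue (for example the Cα atom) instead of all backbone atoms, a hypothetical alignment, and made-up coordinates. Target residues aligned to a gap in the template receive no coordinates and are left for loop modeling.

```python
def copy_backbone(target_row, template_row, template_coords, gap="-"):
    """Assign template coordinates to aligned target residues; None where unaligned."""
    if len(target_row) != len(template_row):
        raise ValueError("aligned rows must have the same length")
    model = []
    t_index = 0  # index of the current residue in template_coords
    for target_res, template_res in zip(target_row, template_row):
        coord = None
        if template_res != gap:
            if target_res != gap:
                coord = template_coords[t_index]  # aligned pair: copy coordinates
            t_index += 1
        if target_res != gap:
            model.append((target_res, coord))
    return model

# Hypothetical alignment and per-residue template coordinates (Å)
target_row = "MKV-LSA"
template_row = "MRVG--A"
template_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
                   (11.4, 0.0, 0.0), (15.2, 0.0, 0.0)]
model = copy_backbone(target_row, template_row, template_coords)
for residue, coord in model:
    print(residue, coord)
```

In this example L and S form an insertion relative to the template, so they come out without coordinates, while the template residue G, which has no partner in the target, is simply skipped.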

Backbone generation

Template-based fragment assembly

a) Find structurally conserved regions
b) Build model core

Loop modeling

1. Database search for segments from known protein structures fitting fixed end-points
2. Molecular mechanics/molecular dynamics
3. Combination of 1 + 2
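Option 1 can be illustrated with a very reduced sketch. It assumes a hypothetical fragment library that stores one coordinate per residue, with the first and last entry of each fragment playing the role of the anchor residues, and it selects fragments of the required length whose end-to-end distance matches the distance between the fixed end-points. Real procedures additionally superimpose the candidates and check for clashes; the tolerance and all coordinates are made up.

```python
import math

def matching_fragments(library, anchor_start, anchor_end, fragment_length,
                       tolerance=0.5):
    """Names of fragments whose end-to-end distance fits the fixed end-points."""
    target_distance = math.dist(anchor_start, anchor_end)
    hits = []
    for name, coords in library.items():
        if len(coords) != fragment_length:
            continue  # wrong number of residues for this loop
        deviation = abs(math.dist(coords[0], coords[-1]) - target_distance)
        if deviation <= tolerance:
            hits.append((deviation, name))
    # Best-fitting fragments first
    return [name for _, name in sorted(hits)]

# Hypothetical fragment library (Å)
library = {
    "frag_A": [(0, 0, 0), (2, 3, 0), (5, 3, 0), (7, 0, 0)],
    "frag_B": [(0, 0, 0), (3, 2, 0), (6, 2, 0), (9, 0, 0)],
    "frag_C": [(0, 0, 0), (3, 3, 0), (7.2, 0, 0)],
    "frag_D": [(0, 0, 0), (2, 2, 1), (5, 2, 1), (7.3, 0, 0)],
}
candidates = matching_fragments(library, (10, 0, 0), (10, 7.1, 0), fragment_length=4)
print(candidates)
```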

Loop modeling

Ab initio rebuilding (e.g., Monte Carlo, MD, etc.) to build missing loops

Side chain modeling

1. Use of rotamer libraries (backbone dependent)

2. Molecular mechanics optimization
- Dead-end elimination (exact pruning criterion, although it does not always converge to a single solution)
- Monte Carlo (heuristic)
- Branch & Bound (exact)

3. Mean-field methods

Model optimization

Molecular mechanics methods

Model validation/evaluation

Model should be evaluated for:

- correctness of the overall fold/structure
- errors over localized regions
- stereochemical parameters: bond lengths, angles, etc.

Some software for model verification:

- Procheck http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
- WHAT IF http://swift.cmbi.kun.nl/whatif
- PROSA II http://www.came.sbg.ac.at/Services/prosa.html
- Profile 3D & Verify 3D http://shannon.mbi.ucla.edu/DOE/Services


Model validation/evaluation

The Ramachandran plot
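A Ramachandran plot shows the backbone torsion angles φ and ψ of each residue. As a minimal sketch of how these angles are obtained from coordinates, the code below computes a dihedral angle from four points; φ is defined by the atoms C(i-1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). The helper names are hypothetical and the test points are geometric toy values, not a real residue.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees) around the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of b0 and b2 perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone torsions of one residue from its own and neighbouring atoms."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)

# Toy geometry: central bond along z, outer points at right angles to each other
quarter_turn = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(quarter_turn, trans)
```

Validation programs such as Procheck compare the resulting (φ, ψ) pairs with the regions populated in well-resolved experimental structures; that comparison is not reproduced here.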


Model validation/evaluation

Profile 3D & Verify 3D:

- verify newly solved structures or homology models
- find structures/folds compatible with a given sequence
- find sequences compatible with a known structure/fold from a database of sequences