structure alignment

Michael Schroeder, BioTechnological Center (Biotec), TU Dresden, www.biotec.tu-dresden.de

Post on 30-Dec-2015


Page 1: Structure Alignment

Structure Alignment

Page 2: Structure Alignment

By Michael Schroeder, Biotec 2

Structure Alignment

Page 3: Structure Alignment

Content

Motivation

Some basics

Double Dynamic Programming

Page 4: Structure Alignment

PART I: Motivation

Page 5: Structure Alignment

Motivation: Conformational changes

Upon ligand binding, structures may change

Structural alignment can highlight the changes

Page 6: Structure Alignment

Conformational changes: Small GTPases

Small GTPases act as molecular switches that control and regulate important functions and pathways within the cell

Activated by guanine nucleotide exchange factors (GEFs)

Inactivated by GTPase-activating proteins (GAPs)

Page 7: Structure Alignment

G proteins: Conformational change in GTP and GDP bound state

Page 8: Structure Alignment

Open and closed conformation of citrate synthase (1cts, 5cts)

Open: oxaloacetate; closed: oxaloacetate and coenzyme A

Loop between two helices moves by 6 Å and rotates by 28°; some atoms move by 10 Å

Page 10: Structure Alignment

Hinge motion in lactoferrin (1lfh, 1lfg)

Lactoferrin is an iron-binding protein found in secretions such as milk or tears

Rotation of 54° upon iron binding

Page 13: Structure Alignment

Motivation: (Distant) Relatives

Sequence similarity may be low, but structural similarity can still be high

Picture from www.jenner.ac.uk/YBF/DanielleTalbot.ppt

Page 14: Structure Alignment

Distant relatives

Globins occur widely

Primary function: binding oxygen

Assembly of helices surrounding haem group

Page 15: Structure Alignment

Relatives

Sperm whale myoglobin (1mbd) and lupin leghaemoglobin (2lh7)

Page 16: Structure Alignment

Distant Relatives

Page 17: Structure Alignment

Relatives

Actinidin (2act) and papain (9pap)

Sequence identity 49%, rmsd 0.77 Å

Same family: papain-like

Page 18: Structure Alignment

Relatives

Plastocyanin (5pcy) and azurin (2aza)

Core of structure is conserved

Page 19: Structure Alignment

Relatives

Structure classifications like CATH and FSSP use structural alignments to identify superfamilies.

Page 20: Structure Alignment

Motivation: Convergent Evolution

Page 21: Structure Alignment

Sequence similarity: low

>1cse Subtilisin
AQTVPYGIPLIKADKVQAQGFKGANVKVAVLDTGIQASHPDLNVVGGASFVAGEAYNTDGNGHGTHVAGTVAALDNTTGVLGVAPSVSLYAVKVLNSSGSGSYSGIVSGIEWATTNGMDVINMSLGGASGSTAMKQAVDNAYARGVVVVAAAGNSGNSGSTNTIGYPAKYDSVIAVGAVDSNSNRASFSSVGAELEVMAPGAGVYSTYPTNTYATLNGTSMASPHVAGAAALILSKHPNLSASQVRNRLSSTATYLGSSFYYGKGLINVEAAAQ

>1acb Chymotrypsin
CGVPAIQPVLSGLSRIVNGEEAVPGSWPWQVSLQDKTGFHFCGGSLINENWVVTAAHCGVTTSDVVVAGEFDQGSSSEKIQKLKIAKVFKNSKYNSLTINNDITLLKLSTAASFSQTVSAVCLPSASDDFAAGTTCVTTGWGLTRYTNANTPDRLQQASLPLLSNTNCKKYWGTKIKDAMICAGASGVSSCMGDSGGPLVCKKNGAWTLVGIVSWGSSTCSTSTPGVYARVTALVNWVQQTLAAN

Page 22: Structure Alignment

Structural similarity: low

1CSE:E, 1ACB:E

Page 23: Structure Alignment

Convergent Evolution

c.41.1 and b.47.1 share interaction partners

c.41.1 Subtilisin-like

d.58.3 Protease propeptides/inhibitors

d.84.1 Subtilisin inhibitor

d.40.1 CI-2 family of serine protease inhibitors

b.47.1 Trypsin-like serine proteases

c.56.5 Zn-dependent exopeptidase

g.15.1 Ovomucoid/PCI-1 like inhibitor

Page 24: Structure Alignment

Convergent Evolution

4sgb: Ovomucoid/PCI-1 like inhibitor, g.15.1 (top); Trypsin-like serine proteases, b.47.1.2 (bottom)

1oyv: Ovomucoid/PCI-1 like inhibitor, g.15.1 (top); Subtilisin-like, c.41.1 (bottom)

Page 25: Structure Alignment

Convergent Evolution

Aligned structures

1cse: CI-2 family of serine protease inhibitors, d.40.1 (top); Subtilisin-like, c.41.1 (bottom)

1acb: CI-2 family of serine protease inhibitors, d.40.1 (top); Trypsin-like serine proteases, b.47.1.2 (bottom)

Page 26: Structure Alignment

Catalytic Triad

>1cse Subtilisin
AQTVPYGIPLIKADKVQAQGFKGANVKVAVLDTGIQASHPDLNVVGGASFVAGEAYNTDGNGHGTHVAGTVAALDNTTGVLGVAPSVSLYAVKVLNSSGSGSYSGIVSGIEWATTNGMDVINMSLGGASGSTAMKQAVDNAYARGVVVVAAAGNSGNSGSTNTIGYPAKYDSVIAVGAVDSNSNRASFSSVGAELEVMAPGAGVYSTYPTNTYATLNGTSMASPHVAGAAALILSKHPNLSASQVRNRLSSTATYLGSSFYYGKGLINVEAAAQ

>1acb Chymotrypsin
CGVPAIQPVLSGLSRIVNGEEAVPGSWPWQVSLQDKTGFHFCGGSLINENWVVTAAHCGVTTSDVVVAGEFDQGSSSEKIQKLKIAKVFKNSKYNSLTINNDITLLKLSTAASFSQTVSAVCLPSASDDFAAGTTCVTTGWGLTRYTNANTPDRLQQASLPLLSNTNCKKYWGTKIKDAMICAGASGVSSCMGDSGGPLVCKKNGAWTLVGIVSWGSSTCSTSTPGVYARVTALVNWVQQTLAAN

Page 27: Structure Alignment

Convergent evolution

A and B are native, C is viral

[Figure: interaction diagram with partners A, A', B and C]

Henschel et al., Bioinformatics 2006

Page 28: Structure Alignment

HIV Nef mimics kinase in binding SH3

Comparison of Nef-SH3 and the intra-chain interaction of the catalytic domain and SH3 of Hck, PDBs: 1efn and 2hck

No evidence of homology between Nef and kinase

[Figure labels: HIV1-Nef; Kinase (Src haematopoietic cell kinase, catalytic domain); Fyn-SH3/Hck-SH3]

Henschel et al., Bioinformatics 2006

Page 29: Structure Alignment

Automatic calculation of equivalent residues

Apart from the PxxP motif, matches: Arg71/Lys249, Phe90/His289

Residues with equivalents are strictly conserved in HIV-Nef

[Figure labels: Nef, Kinase]

Henschel et al., Bioinformatics 2006

Page 30: Structure Alignment

Mimicry of baculovirus p35 and human inhibitor of apoptosis

Caspase (red), p35 (yellow), IAP (green)

Upon infection the cell starts its apoptosis programme; p35 tries to stop it

Henschel et al., Bioinformatics 2006

Page 31: Structure Alignment

Mimicry of Capsids and Cyclophilin

HIV capsid protein (yellow)

Cyclophilin (red, green)

Cyclophilin A restricts HIV infectivity

Upon mutation of cyclophilin or inhibition with cyclosporin, infectivity goes up by a factor of more than 100 (Towers, Nature Medicine, 2003)

Henschel et al., Bioinformatics 2006

Page 32: Structure Alignment

PART II: Some basics

Page 33: Structure Alignment

What do we need?

Two main operations to align structures:

Translation

Rotation

How to evaluate a structural alignment?

Root mean square deviation, rmsd

Page 34: Structure Alignment

Basic Operations: Translation

Page 37: Structure Alignment

Basic Operations: Rotation

Page 38: Structure Alignment

Root Mean Square Deviation

What is the distance between two points a with coordinates $x_a$ and $y_a$ and b with coordinates $x_b$ and $y_b$?

Euclidean distance:

$$d(a,b) = \sqrt{(x_a - x_b)^2 + (y_a - y_b)^2}$$

And in 3D?

Page 39: Structure Alignment

Root Mean Square Deviation

In a structure alignment the score measures how far the aligned atoms are from each other on average

Given the distances $d_i$ between n aligned atoms, the root mean square deviation is defined as

$$\mathrm{rmsd} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} d_i^2}$$
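The two definitions can be tried out with a minimal Python sketch. It extends the Euclidean distance to 3D by adding the z term and then applies the rmsd formula to a list of aligned atom pairs. The function names and the toy coordinates are illustrative assumptions, not part of the slides.

```python
import math

def distance(a, b):
    """Euclidean distance between two points given as (x, y, z) tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def rmsd(atoms_a, atoms_b):
    """Root mean square deviation over n aligned atom pairs.

    atoms_a[i] is assumed to be aligned with atoms_b[i].
    """
    if len(atoms_a) != len(atoms_b) or not atoms_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = [distance(a, b) ** 2 for a, b in zip(atoms_a, atoms_b)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical toy coordinates: two aligned atoms, 3 and 4 units apart.
example = rmsd([(0, 0, 0), (0, 0, 0)], [(3, 0, 0), (0, 4, 0)])
```

The rmsd is only meaningful after the structures have been superposed, since it is computed on the coordinates as given.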

Page 40: Structure Alignment

Quality of Alignment and Example

Unit of RMSD => e.g. Ångstroms

Identical structures => RMSD = 0

Similar structures => RMSD is small (1 – 3 Å)

Distant structures => RMSD > 3 Å

Page 41: Structure Alignment

PART III: Dynamic Programming

Page 42: Structure Alignment

A very simple algorithm…

…to align identical structures with conformational changes

Generate a sequence alignment (not necessary if both sequences are really 100% identical)

Compute center of mass for both structures

Move both structures so that the centers of mass are at the origin

Compute the angle between all aligned residues

Rotate structure by median of all angles

Page 43: Structure Alignment

A very simple algorithm…

Compute center of mass for both structures

Question: How? Assume n atoms $(x_1, y_1, z_1)$ to $(x_n, y_n, z_n)$ (for one structure)

Page 44: Structure Alignment

A very simple algorithm…

Compute center of mass for both structures

Question: How? Assume n atoms $(x_1, y_1, z_1)$ to $(x_n, y_n, z_n)$

Center of mass:

$$(x_{CoM}, y_{CoM}, z_{CoM}) = \left( \frac{1}{n} \sum_{i=1}^{n} x_i,\ \frac{1}{n} \sum_{i=1}^{n} y_i,\ \frac{1}{n} \sum_{i=1}^{n} z_i \right)$$

Move both structures so that the centers of mass are at the origin

Question: How?

Page 45: Structure Alignment

A very simple algorithm…

Move both structures so that the centers of mass are at the origin

For all i do: $x_i := x_i - x_{CoM}$, $y_i := y_i - y_{CoM}$, $z_i := z_i - z_{CoM}$
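The center-of-mass and translation steps can be sketched in a few lines of Python. This is a minimal illustration that follows the slide's formula, which is an unweighted mean of the coordinates (all atoms are treated as having equal mass). The function names and toy coordinates are assumptions made for this example.

```python
def center_of_mass(atoms):
    """Unweighted mean of the x, y and z coordinates of all atoms."""
    n = len(atoms)
    return tuple(sum(atom[k] for atom in atoms) / n for k in range(3))

def move_to_origin(atoms):
    """Translate a structure so that its center of mass is at the origin."""
    cx, cy, cz = center_of_mass(atoms)
    return [(x - cx, y - cy, z - cz) for x, y, z in atoms]

# Hypothetical toy structure with two atoms.
structure = [(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)]
centered = move_to_origin(structure)
```

Applying `move_to_origin` to both structures removes the translational difference between them, so that only the rotation remains to be determined.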

Page 46: Structure Alignment

A very simple algorithm…

Rotate structure by median of all angles

Why median and not mean?
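One way to explore this question is a small 2D experiment in Python. The sketch below is an illustration under simplifying assumptions, not the slides' method: residues are reduced to points on the unit circle around the origin, and the "angle between aligned residues" is taken to be the rotation about the origin that turns one point towards its partner. All names and the toy data are hypothetical.

```python
import math
from statistics import mean, median

def rotation_angles(points_a, points_b):
    """Angle (radians) by which each 2D point in points_a must turn
    about the origin to point towards its partner in points_b."""
    angles = []
    for (ax, ay), (bx, by) in zip(points_a, points_b):
        delta = math.atan2(by, bx) - math.atan2(ay, ax)
        # Wrap into [-pi, pi) so that e.g. 350 degrees becomes -10 degrees.
        angles.append((delta + math.pi) % (2 * math.pi) - math.pi)
    return angles

def on_unit_circle(degrees):
    rad = math.radians(degrees)
    return (math.cos(rad), math.sin(rad))

# Hypothetical toy data: five residues, all rotated by 30 degrees,
# except the first one, which sits in a loop that moved by 120 degrees.
structure_a = [on_unit_circle(d) for d in (10, 50, 90, 130, 170)]
structure_b = [on_unit_circle(d) for d in (130, 80, 120, 160, 200)]

angles = [math.degrees(a) for a in rotation_angles(structure_a, structure_b)]
median_angle = median(angles)  # close to 30: unaffected by the moved loop
mean_angle = mean(angles)      # close to 48: pulled up by the single outlier
```

In this toy case the median recovers the rotation shared by the rigid part of the structure, while the mean is shifted by the one residue that changed conformation.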

Page 47: Structure Alignment

A refinement: Alternating alignment and superposition

1. P = initial alignment (e.g. based on sequence alignment)

2. Superpose structures A and B based on P

3. Generate distance-based scoring matrix R from superposition

4. Use dynamic programming to align A and B using scoring matrix R

5. P' = new alignment derived from dynamic programming step

6. If P' is different from P, then set P = P' and go to step 2 again
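Steps 4 and 5 can be sketched as a standard global alignment over a precomputed scoring matrix. The Python sketch below is a minimal illustration that assumes a gap penalty of 0 and takes R as a plain list of rows; the superposition of step 2 is not shown. The function name `align` and the toy matrix are assumptions made for this example.

```python
def align(R):
    """Global alignment over scoring matrix R with gap penalty 0.

    R[i][j] is the score for pairing residue i of A with residue j of B.
    Returns the best total score and the list of aligned index pairs.
    """
    n, m = len(R), len(R[0])
    # H[i][j] = best score for the first i residues of A and first j of B.
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(H[i - 1][j - 1] + R[i - 1][j - 1],  # pair i with j
                          H[i - 1][j],                        # skip residue of A
                          H[i][j - 1])                        # skip residue of B
    # Trace back, preferring gaps so that non-rewarding pairs are left out.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if H[i][j] == H[i - 1][j]:
            i -= 1
        elif H[i][j] == H[i][j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
    pairs.reverse()
    return H[n][m], pairs

# Hypothetical toy matrix: residue 0 of A fits residue 0 of B,
# residue 1 of A fits residue 2 of B.
score, pairs = align([[2, -1, -1],
                      [-1, -1, 2]])
```

The returned pairs play the role of P' in step 5; comparing them with the previous alignment decides whether another round of superposition is needed.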

Page 48: Structure Alignment

Distance-based scoring matrix

Let $d(A_i, B_j)$ be the Euclidean distance between $A_i$ and $B_j$

Let t be the upper distance limit for residues to be rewarded

The scoring matrix R is defined as follows:

$$R(A_i, B_j) = \frac{1}{d(A_i, B_j)} - \frac{1}{t}$$

if $R(A_i, B_j)$ > max. score then $R(A_i, B_j)$ = max. score

The gap/mismatch penalty is set to 0
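The definition of R translates directly into code. The following Python sketch is a minimal illustration; the function name, the parameter values and the toy coordinates are assumptions. Pairs at distance 0 would give an infinite score, so they are assigned the maximum score directly.

```python
import math

def scoring_matrix(coords_a, coords_b, t, max_score):
    """R[i][j] = 1/d(A_i, B_j) - 1/t, capped at max_score.

    coords_a and coords_b are lists of (x, y, z) tuples taken from the
    superposed structures; t is the upper distance limit for a reward.
    """
    R = []
    for a in coords_a:
        row = []
        for b in coords_b:
            d = math.dist(a, b)  # Euclidean distance (Python 3.8+)
            if d == 0:
                row.append(max_score)  # identical positions: cap applies
            else:
                row.append(min(1.0 / d - 1.0 / t, max_score))
        R.append(row)
    return R

# Hypothetical toy data: one residue of A against four residues of B.
R = scoring_matrix([(0, 0, 0)],
                   [(0, 0, 0), (0.25, 0, 0), (1, 0, 0), (4, 0, 0)],
                   t=2.0, max_score=2.0)
```

Pairs closer than t receive a positive score, pairs farther apart than t receive a negative one, and very close pairs are capped at the maximum score.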

Page 49: Structure Alignment

Distance-based scoring matrix

What size does PAM have?

What size does R have?

Page 50: Structure Alignment

Example

$R(A_i, B_j) = 1/d(A_i, B_j) - 1/t$ for $t = 1/10$ and max. score = 2

Page 51: Structure Alignment

Part IV: Double dynamic programming (chapter 9)

Page 52: Structure Alignment

Double dynamic programming

Goal: Simultaneously align and superpose structures

Double dynamic programming is a heuristic which tries to achieve this goal

Implemented as part of SSAP (used e.g. by CATH)

Page 53: Structure Alignment

Idea of double dynamic programming

Use two levels of dynamic programming:

High level, which summarises low level DP

Low level, which generates alignment based on assumption that $a_i$ and $b_j$ are part of an optimal alignment

Page 54: Structure Alignment

Low level matrix

${}^{ij}R$ is the low level scoring matrix assuming the pair $a_i$ and $b_j$ are aligned

${}^{ij}R_{kl}$ is the score showing how well $a_k$ fits onto $b_l$ under the constraint that $a_i$ and $b_j$ are aligned

Perform dynamic programming for all pairs i,j using ${}^{ij}R$ with constraint that optimal alignment includes (i,j)
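The two levels can be illustrated with a simplified Python sketch. It is not the SSAP implementation and rests on several assumptions that are not in the slides: the low-level score compares the distance from a_i to a_k with the distance from b_j to b_l, the gap penalty is 0, and the high-level matrix stores the total score of the best constrained path rather than accumulating scores along that path. All function names and parameter values are hypothetical.

```python
import math

def dp_score(R):
    """Best global alignment score over matrix R with gap penalty 0."""
    if not R or not R[0]:
        return 0.0
    n, m = len(R), len(R[0])
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(H[i - 1][j - 1] + R[i - 1][j - 1],
                          H[i - 1][j], H[i][j - 1])
    return H[n][m]

def low_level_matrix(A, B, i, j, t, max_score):
    """Low-level matrix for the assumed pair (a_i, b_j).

    Assumption: a_k fits onto b_l well if a_k is as far from a_i
    as b_l is from b_j.
    """
    R = []
    for k in range(len(A)):
        row = []
        for l in range(len(B)):
            diff = abs(math.dist(A[i], A[k]) - math.dist(B[j], B[l]))
            row.append(max_score if diff == 0
                       else min(1.0 / diff - 1.0 / t, max_score))
        R.append(row)
    return R

def high_level_matrix(A, B, t=2.0, max_score=2.0):
    """Summarise one constrained low-level DP per pair (i, j)."""
    S = [[0.0] * len(B) for _ in A]
    for i in range(len(A)):
        for j in range(len(B)):
            R = low_level_matrix(A, B, i, j, t, max_score)
            # A path through (i, j) aligns the residues before the pair
            # and the residues after the pair independently.
            before = dp_score([row[:j] for row in R[:i]])
            after = dp_score([row[j + 1:] for row in R[i + 1:]])
            S[i][j] = before + R[i][j] + after
    return S

# Hypothetical toy structure compared with itself.
A = [(0, 0, 0), (1, 0, 0), (3, 0, 0)]
S = high_level_matrix(A, A)
```

A final dynamic programming pass over S (for example with `dp_score`, or a variant that also traces back the path) would then yield the overall alignment. With n and m residues this sketch runs one low-level DP per pair, so it is only practical for small inputs.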

Page 57: Structure Alignment

Questions: How was max. score set in this example?

Page 63: Structure Alignment

Summary

Structural alignments are useful

to study conformational changes

to classify domains into families (DDP is used in CATH)

to study proteins with distant relationships and hence low sequence similarity

Algorithms

Basic operations: translate and rotate

Simple algorithm based on dynamic programming

Double dynamic programming:

Low-level dynamic programming using a scoring matrix based on residue distances

Aggregation of best paths for high-level dynamic programming