Conformational Space

Upload: kelton

Post on 11-Jan-2016


TRANSCRIPT

Page 1: Conformational Space

Conformational Space

Page 2: Conformational Space

Conformational Space

Conformation of a molecule: specification of the relative positions of all atoms in 3D space

Typical parameterizations:
- List of coordinates of atom centers
- List of torsional angles (e.g., the φ-ψ-χ angles for a protein)

Conformational space: space of all conformations

Page 3: Conformational Space

Conformational Space

[Figure labels: q1, q2, ..., qi, ..., qj, ..., qN-1, qN]

Page 4: Conformational Space

Conformational Space

[Figure labels: q0, q1, q3, q4, ..., qn]

Page 5: Conformational Space

Relation to Robotics/Graphics

[Figure: configuration space; labels q0, q1, q2, q3, q4, ..., qn, (t)]

Page 6: Conformational Space

Need for a Metric

Simulation and sampling techniques can produce millions of conformations

- Which conformations are similar?
- Which ones are close to the folded one?
- Do some conformations form small clusters (e.g., key intermediates while folding)?

Page 7: Conformational Space

Metric in Conformational Space

A metric over conformational space C is a function

$d : (c, c') \in C^2 \mapsto d(c, c') \in \mathbb{R}^+ \cup \{0\}$

such that:
- $d(c, c') = 0 \iff c = c'$ (non-degeneracy)
- $d(c, c') = d(c', c)$ (symmetry)
- $d(c, c') + d(c', c'') \ge d(c, c'')$ (triangle inequality)

Page 8: Conformational Space

But not all metrics are “good”

Euclidean metric:

$d(c, c') = \sum_{i=1}^{n} \left( |\phi_i - \phi_i'|^2 + |\psi_i - \psi_i'|^2 \right)$

Page 9: Conformational Space
Page 10: Conformational Space
Page 11: Conformational Space

Metric in Conformational Space

A “good” metric should measure how well the atoms in two conformations can be aligned

Usual metrics: cRMSD, dRMSD

Page 12: Conformational Space

RMSD

Given two sets of n points in $\mathbb{R}^3$, A = {a1,…,an} and B = {b1,…,bn}

The RMSD between A and B is:

$\mathrm{RMSD}(A, B) = \left[ \frac{1}{n} \sum_{i=1}^{n} \lVert a_i - b_i \rVert^2 \right]^{1/2}$

where $\lVert a_i - b_i \rVert$ denotes the Euclidean distance between ai and bi in $\mathbb{R}^3$

RMSD(A,B) = 0 iff ai = bi for all i
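The RMSD definition translates directly into a few lines of NumPy. This is a minimal sketch, assuming each point set is stored as an (n, 3) array with corresponding points in the same row; the function name `rmsd` and the sample points are illustrative only.

```python
import numpy as np

def rmsd(A, B):
    """RMSD between two (n, 3) arrays of corresponding points; no alignment is done."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    if A.shape != B.shape:
        raise ValueError("A and B must contain the same number of points")
    sq_dists = np.sum((A - B) ** 2, axis=1)  # ||a_i - b_i||^2 for each i
    return float(np.sqrt(sq_dists.mean()))

A = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
B = A + np.array([0.0, 0.0, 2.0])  # every point shifted by 2 along z
print(rmsd(A, A))  # 0.0
print(rmsd(A, B))  # 2.0
```

Note that a pure translation already gives a nonzero RMSD, which is why the deck next minimizes over rigid-body transforms.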

Page 13: Conformational Space

cRMSD

- Molecule M with n atoms a1,…,an
- Two conformations c and c’ of M
- ai(c) is the position of ai when M is at c

cRMSD(c,c’) is the minimized RMSD between the two sets of atom centers:

$\mathrm{cRMSD}(c, c') = \min_T \left[ \frac{1}{n} \sum_{i=1}^{n} \lVert a_i(c) - T(a_i(c')) \rVert^2 \right]^{1/2}$

where the minimization is over all possible rigid-body transforms T
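The slides do not say how the minimization over T is carried out. One standard choice is the Kabsch algorithm: superimpose the centroids, then obtain the optimal rotation from the SVD of a 3×3 covariance matrix. The sketch below assumes that choice; the function name `crmsd` and the test data are made up for illustration.

```python
import numpy as np

def crmsd(P, Q):
    """Minimum RMSD between (n, 3) arrays P and Q over rigid-body transforms of Q."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # The optimal translation superimposes the two centroids.
    P0 = P - P.mean(axis=0)
    Q0 = Q - Q.mean(axis=0)
    # Optimal rotation from the SVD of the 3x3 covariance matrix (Kabsch).
    U, _, Vt = np.linalg.svd(Q0.T @ P0)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(U @ Vt) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P0 - Q0 @ R.T  # rows of Q0 rotated by R, compared with P0
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

rng = np.random.default_rng(0)
P = rng.normal(size=(10, 3))
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # 90 degrees about z
Q = P @ Rz.T + np.array([5.0, -2.0, 1.0])  # rotated and translated copy of P
print(crmsd(P, Q))  # numerically close to zero
```

Because reflections are excluded, a conformation and its mirror image generally have a nonzero cRMSD.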

Page 14: Conformational Space
Page 15: Conformational Space
Page 16: Conformational Space

cRMSD

- cRMSD satisfies the triangle inequality
- cRMSD takes linear time to compute
- Often, cRMSD is restricted to a subset of atoms, e.g., the Cα atoms on a protein’s backbone

Page 17: Conformational Space

Representation Restricted to Cα Atoms

Protein 1tph

- The positions of the amino-acid residue centers (Cα atoms) mainly determine the structure of a protein.
- In structural comparison, people usually work only on the backbone of Cα atoms and neglect the other atoms.

Page 18: Conformational Space

Possible project: Design a method for efficiently finding nearest neighbors in a sampled conformation space of a protein, using the cRMSD metric.

Page 19: Conformational Space

dRMSD

- Molecule M with n atoms a1,…,an
- Two conformations c and c’ of M
- {dij(c)}: n×n symmetric intra-molecular distance matrix in M at c

dRMSD(c, c’) is:

$\mathrm{dRMSD}(c, c') = \left[ \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( d_{ij}(c) - d_{ij}(c') \right)^2 \right]^{1/2}$

{dij} is usually restricted to a subset of atoms, e.g., the Cα atoms on a protein’s backbone

Page 20: Conformational Space

Intra-Molecular Distance Matrix

Distances between Cα pairs of a protein with 142 residues. Darker squares represent shorter distances.

Page 21: Conformational Space

Intra-Molecular Distance Matrix

Distances between Cα pairs of a protein with 142 residues. Darker squares represent shorter distances.

[Figure: distance matrix with residue indices 1, 40, 45, and 85 marked]

Page 22: Conformational Space

Intra-Molecular Distance Matrix

Page 24: Conformational Space

dRMSD

- Molecule M with n atoms a1,…,an
- Two conformations c and c’ of M
- {dij(c)}: n×n symmetric intra-molecular distance matrix in M at c

$\mathrm{dRMSD}(c, c') = \left[ \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( d_{ij}(c) - d_{ij}(c') \right)^2 \right]^{1/2}$

{dij} is usually restricted to a subset of atoms, e.g., the Cα atoms on a protein’s backbone

- Advantage: no aligning transform
- Drawback: takes quadratic time to compute
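A minimal NumPy sketch of dRMSD, assuming each conformation is given as an (n, 3) coordinate array. The function name `drmsd` is illustrative. Averaging over the n(n-1)/2 atom pairs supplies the 2/(n(n-1)) factor, and no alignment step appears anywhere.

```python
import numpy as np

def drmsd(X, Y):
    """dRMSD between two conformations given as (n, 3) coordinate arrays."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    i, j = np.triu_indices(len(X), k=1)        # all pairs with i < j
    dX = np.linalg.norm(X[i] - X[j], axis=1)   # d_ij(c)
    dY = np.linalg.norm(Y[i] - Y[j], axis=1)   # d_ij(c')
    # The mean over n(n-1)/2 pairs is the 2/(n(n-1)) factor in the formula.
    return float(np.sqrt(np.mean((dX - dY) ** 2)))

X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
Y = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
print(drmsd(X, Y))  # 2.0: the single pair distance changes from 1 to 3
```

The pair enumeration is where the quadratic cost comes from.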

Page 25: Conformational Space

Is dRMSD a metric?

$\mathrm{dRMSD}(c, c') = \left[ \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( d_{ij}(c) - d_{ij}(c') \right)^2 \right]^{1/2}$

is a metric in the n(n-1)/2-dimensional space, where a conformation c is represented by {dij(c)}

But, in this representation, the same point represents both a conformation and its mirror image
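The mirror-image caveat is easy to check numerically. The sketch below builds a made-up 8-atom conformation, reflects it through the xy-plane, and compares the two vectors of pairwise distances; all names and data are illustrative.

```python
import numpy as np

def pairwise_distances(X):
    """Vector of the n(n-1)/2 intra-molecular distances d_ij, i < j."""
    i, j = np.triu_indices(len(X), k=1)
    return np.linalg.norm(X[i] - X[j], axis=1)

rng = np.random.default_rng(1)
c = rng.normal(size=(8, 3))              # hypothetical 8-atom conformation
mirror = c * np.array([1.0, 1.0, -1.0])  # reflection through the xy-plane

same_point = np.allclose(pairwise_distances(c), pairwise_distances(mirror))
print(same_point)  # True: both map to the same point in distance space
```

So dRMSD between a conformation and its mirror image is zero even though the two are different conformations.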

Page 26: Conformational Space

k-Nearest-Neighbors Problem

Given a set S of conformations of a protein and a query conformation c, find the k conformations in S most similar to c (w.r.t. cRMSD, dRMSD, other metric)

Can be done in time O(N(log k + L)), where:
- N = size of S
- L = time to compare two conformations
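One way to reach the O(N(log k + L)) bound is a single pass over S that evaluates the metric once per conformation and keeps only the k best candidates in a heap. The sketch below uses Python's `heapq.nsmallest` for that; the function `k_nearest`, the stand-in metric `plain_rmsd` (RMSD without alignment), and the toy data are assumptions for illustration.

```python
import heapq
import numpy as np

def k_nearest(S, query, k, dist):
    """Indices of the k conformations in S closest to query under dist, nearest first."""
    # One distance evaluation per conformation (cost L each); nsmallest keeps
    # at most k candidates at a time, so bookkeeping is O(log k) per item.
    scored = ((dist(conf, query), idx) for idx, conf in enumerate(S))
    return [idx for _, idx in heapq.nsmallest(k, scored)]

def plain_rmsd(A, B):
    """Stand-in metric: RMSD without alignment."""
    return float(np.sqrt(((A - B) ** 2).sum(axis=1).mean()))

# Five toy "conformations": 4 atoms each, all coordinates equal to one value.
S = [np.full((4, 3), float(v)) for v in (0, 5, 1, 9, 2)]
query = np.full((4, 3), 1.4)
print(k_nearest(S, query, 3, plain_rmsd))  # [2, 4, 0]
```

Any of the metrics in the deck can be passed as `dist`; only L changes.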

Page 27: Conformational Space

k-Nearest-Neighbors Problem

The total time needed to compute the k nearest neighbors of every conformation in S is $O(N^2(\log k + L))$

Much too long for large datasets, where N ranges from tens of thousands to millions!

Can be improved by:
1. Reducing L
2. Using a more efficient algorithm (e.g., kd-tree)

Page 28: Conformational Space

kd-Tree

In a d-dimensional space, where d > 2, range searching for a point takes $O(d\,n^{1-1/d})$ time
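For readers unfamiliar with the data structure, here is a compact kd-tree sketch. It is an illustration only: it answers a single nearest-neighbor query under the Euclidean distance rather than the range search whose bound is quoted above, and the dictionary-based node layout and function names are assumptions, not the implementation used in the cited experiments.

```python
import numpy as np

def build_kdtree(points, indices=None, depth=0):
    """Recursively split the points at the median along one axis per level."""
    if indices is None:
        indices = np.arange(len(points))
    if len(indices) == 0:
        return None
    axis = depth % points.shape[1]
    order = indices[np.argsort(points[indices, axis])]
    mid = len(order) // 2
    return {
        "index": int(order[mid]),
        "axis": axis,
        "left": build_kdtree(points, order[:mid], depth + 1),
        "right": build_kdtree(points, order[mid + 1:], depth + 1),
    }

def nearest(node, points, query, best=None):
    """Return (distance, index) of the stored point closest to query."""
    if node is None:
        return best
    dist = float(np.linalg.norm(points[node["index"]] - query))
    if best is None or dist < best[0]:
        best = (dist, node["index"])
    diff = query[node["axis"]] - points[node["index"], node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, points, query, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if abs(diff) < best[0]:
        best = nearest(far, points, query, best)
    return best

rng = np.random.default_rng(5)
points = rng.random((200, 3))  # made-up data set
tree = build_kdtree(points)
query = np.array([0.5, 0.5, 0.5])
dist, idx = nearest(tree, points, query)
```

The pruning test is what saves work in low dimensions; as d grows, fewer subtrees can be skipped, which motivates the dimension reduction discussed later in the deck.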

Page 29: Conformational Space

k-Nearest-Neighbors Problem

Idea: simplify protein’s description

Page 30: Conformational Space

- cRMSD: O(n) time
- dRMSD: $O(n^2)$ time

Assume that each conformation is described by the coordinates of the n Cα atoms

Page 31: Conformational Space

This representation is highly redundant

Proximity along the chain entails spatial proximity

Atoms can’t bunch up, hence far away atoms along the chain are on average spatially distant

[Figure: atoms ci and cj; labels d, l]

Page 32: Conformational Space

m-Averaged Approximation

- Cut the backbone into fragments of m Cα atoms
- Replace each fragment by the centroid of the m Cα atoms
- Simplified cRMSD and dRMSD

3n coordinates → 3n/m coordinates
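The m-averaging step can be sketched in a few lines, assuming the backbone is an (n, 3) array of Cα coordinates in chain order. The function name `m_average` is illustrative, and the handling of a leftover fragment shorter than m (averaged as-is) is an assumption, since the slides do not say what happens when m does not divide n.

```python
import numpy as np

def m_average(coords, m):
    """Replace each run of m consecutive atoms by its centroid."""
    coords = np.asarray(coords, dtype=float)
    # Assumption: a final fragment shorter than m is averaged as-is.
    return np.array([coords[i:i + m].mean(axis=0) for i in range(0, len(coords), m)])

backbone = np.arange(36, dtype=float).reshape(12, 3)  # 12 made-up atom positions
reduced = m_average(backbone, 4)
print(reduced.shape)  # (3, 3): 12 atoms become 3 centroids
```

The reduced array can be fed unchanged to any cRMSD or dRMSD routine, which then runs on n/m points instead of n.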

Page 33: Conformational Space

Evaluation: Test Sets [Lotan and Schwarzer, 2003]

- 8 diverse proteins (54-76 residues)
- Decoy sets of N = 10,000 conformations from the Park-Levitt set [Park et al., 1997]

Correlation:

m    cRMSD       dRMSD
3    0.99        0.96-0.98
4    0.98-0.99   0.94-0.97
6    0.92-0.99   0.78-0.93
9    0.81-0.98   0.65-0.96
12   0.54-0.92   0.52-0.69

Higher correlation for random sets (→ greater savings)

Page 34: Conformational Space

Running Times

Page 35: Conformational Space

Further Reduction for dRMSD

1) Stack m-averaged distance matrices as vectors of a matrix A

Page 36: Conformational Space

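[Figure: the r × N matrix A; column ai is the vector of elements of the distance matrix of the ith conformation (i = 1 to N)]

$r = \frac{1}{2} \, \frac{n}{m} \left( \frac{n}{m} - 1 \right)$

$\mathrm{dRMSD}_m(c_i, c_j) = \frac{1}{\sqrt{r}} \, \lVert a_i - a_j \rVert$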
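The stacking step and the identity between dRMSD and a scaled Euclidean distance between columns can be checked numerically. In this sketch the conformations are random stand-ins for m-averaged backbones with 6 centroids each, so r = 6·5/2 = 15; the helper names are illustrative.

```python
import numpy as np

def distance_vector(X):
    """The n(n-1)/2 pairwise distances of one conformation, as a flat vector."""
    i, j = np.triu_indices(len(X), k=1)
    return np.linalg.norm(X[i] - X[j], axis=1)

def stack_conformations(confs):
    """r x N matrix A whose i-th column is the distance vector of confs[i]."""
    return np.column_stack([distance_vector(X) for X in confs])

rng = np.random.default_rng(2)
confs = [rng.normal(size=(6, 3)) for _ in range(5)]  # 5 made-up reduced conformations
A = stack_conformations(confs)
r = A.shape[0]  # 15 rows
# dRMSD between conformations 0 and 1 as a scaled distance between columns.
d01 = np.linalg.norm(A[:, 0] - A[:, 1]) / np.sqrt(r)
```

Once conformations are columns of A, dRMSD queries become ordinary Euclidean nearest-neighbor queries, up to the constant factor 1/√r.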

Page 37: Conformational Space

Further Reduction for dRMSD

1) Stack m-averaged distance matrices as vectors of a matrix A

2) Compute the SVD $A = U D V^T$

Page 38: Conformational Space

SVD Decomposition

$A_{(r \times N)} = U_{(r \times r)} \, D_{(r \times r)} \, V^T_{(r \times N)}$

- A: column aj is the vector of elements of the distance matrix of the jth conformation (j = 1 to N)
- U: orthonormal (rotation) matrix
- D: diagonal matrix

Page 39: Conformational Space

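SVD Decomposition

$A_{(r \times N)} = U_{(r \times r)} \, D_{(r \times r)} \, V^T_{(r \times N)}$

- A: column aj is the vector of elements of the distance matrix of the jth conformation (j = 1 to N)
- U: orthonormal (rotation) matrix
- D: diagonal matrix, $D = \mathrm{diag}(s_1, s_2, \ldots, s_r)$

$s_1 \ge s_2 \ge \cdots \ge s_r \ge 0$ (singular values)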

Page 40: Conformational Space

SVD Decomposition

$A_{(r \times N)} = U_{(r \times r)} \, D_{(r \times r)} \, V^T_{(r \times N)}$

- A: column aj is the vector of elements of the distance matrix of the jth conformation (j = 1 to N)
- U: orthonormal (rotation) matrix
- D: diagonal matrix
- VT: matrix with orthonormal rows vjT, vkT, ...

vj and vk are orthogonal unit N×1 vectors

Page 41: Conformational Space

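SVD Decomposition

$A_{(r \times N)} = U_{(r \times r)} \, D_{(r \times r)} \, V^T_{(r \times N)}$

[Figure: r-dimensional space with original axes (x, y) and rotated axes (X, Y); caption: representation of A in space (X, Y)]

$\mathrm{dRMSD}_m(c_i, c_j) = \frac{1}{\sqrt{r}} \, \lVert a_i - a_j \rVert$ does not depend on the coordinate system!

$r = \frac{1}{2} \, \frac{n}{m} \left( \frac{n}{m} - 1 \right)$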

Page 42: Conformational Space

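SVD Decomposition

$A_{(r \times N)} = U_{(r \times r)} \, D_{(r \times r)} \, V^T_{(r \times N)}$

- D: singular values $s_1, s_2, s_3, \ldots, s_r$ on the diagonal
- VT: rows v1T, v2T, ...

$\lVert s_1 v_1 \rVert \ge \lVert s_2 v_2 \rVert \ge \cdots$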

Page 43: Conformational Space

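SVD Decomposition

$A_{(r \times N)} = U_{(r \times r)} \, D_{(r \times r)} \, V^T_{(r \times N)}$

[Figure: D with singular values $s_1, s_2, s_3, \ldots, s_r$; rows v1T, v2T, ..., vpT of VT labeled as the p principal components]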

Page 44: Conformational Space

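SVD Decomposition

$A_{(r \times N)} = U_{(r \times r)} \, D_{(r \times r)} \, V^T_{(r \times N)}$

[Figure: only $s_1, s_2, \ldots, s_p$ and rows v1T, v2T, ..., vpT are kept (the p principal components); the remaining entries are 0]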

Page 45: Conformational Space

Further Reduction for dRMSD

1) Stack m-averaged distance matrices as vectors of a matrix A

2) Compute the SVD $A = U D V^T$

3) Project onto p principal components
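Steps 2 and 3 can be sketched with `numpy.linalg.svd`. This is an illustration under assumptions: A is a random stand-in for the stacked distance vectors, the columns are not mean-centered (the slides do not mention centering), and the function name `project_onto_pcs` is made up.

```python
import numpy as np

def project_onto_pcs(A, p):
    """p x N coordinates of the columns of A along the first p left singular vectors."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)  # A = U @ diag(s) @ Vt
    return U[:, :p].T @ A  # same as diag(s[:p]) @ Vt[:p]

rng = np.random.default_rng(3)
A = rng.random((15, 40))         # made-up r x N matrix with r = 15, N = 40
full = project_onto_pcs(A, 15)   # rotation only: distances are preserved
low = project_onto_pcs(A, 4)     # keep 4 principal components

d_exact = np.linalg.norm(A[:, 0] - A[:, 1])
d_full = np.linalg.norm(full[:, 0] - full[:, 1])
d_low = np.linalg.norm(low[:, 0] - low[:, 1])
```

With all r components the projection is a rotation, so `d_full` matches `d_exact` up to rounding. With p < r the projected distance can only shrink, so `d_low` is a lower bound on `d_exact`, computed from p terms instead of r.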

Page 46: Conformational Space

Correlation between dRMSD and $\mathrm{dRMSD}_4^{PC}$

Computing $\mathrm{dRMSD}_4^{PC}$ is reduced to summing up 12 to 20 terms (instead of ~80 to 200, since the proteins have 54 to 76 amino acids)

Page 47: Conformational Space

Complexity of SVD

- SVD of an r×N matrix, where N > r, takes $O(r^2 N)$ time
- Here $r \sim (n/m)^2$
- So, the time complexity is $O((n/m)^4 N)$, i.e., $O(n^4 N)$ for fixed m
- Would be too costly without m-averaging

Page 48: Conformational Space

Evaluation for 1CTF Decoy Sets [Lotan and Schwarzer, 2003]

- N = 100,000, k = 100, 4-averaging, 16 PCs
- 70% correct, with furthest NN off by 20%
- Brute-force: 84 h
- Brute-force + m-averaging: 4.8 h
- Brute-force + m-averaging + PC: 41 min
- kD-tree + m-averaging + PC: 19 min
- Speedup greater than ×200
- 6k approximate NNs contain all true k NNs
- Use m-averaging and PC reduction as fast filters
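The "fast filter" idea can be sketched as a two-stage search: shortlist 6k candidates with the cheap reduced representation, then re-rank only those with the exact one. The factor 6 comes from the slide; everything else here (the function name `filtered_knn`, plain Euclidean distances on row vectors, the random data, and truncating coordinates as a crude stand-in for m-averaging plus PC reduction) is an assumption for illustration.

```python
import numpy as np

def filtered_knn(exact, approx, query_idx, k, factor=6):
    """k nearest neighbors of row query_idx: shortlist with approx, re-rank with exact."""
    d_approx = np.linalg.norm(approx - approx[query_idx], axis=1)
    # Shortlist factor*k candidates (+1 because the query itself is in the set).
    shortlist = np.argsort(d_approx)[: factor * k + 1]
    shortlist = shortlist[shortlist != query_idx]
    # Exact distances are computed for the shortlist only.
    d_exact = np.linalg.norm(exact[shortlist] - exact[query_idx], axis=1)
    return shortlist[np.argsort(d_exact)][:k]

rng = np.random.default_rng(4)
exact = rng.random((300, 20))  # made-up full descriptors, one row per conformation
approx = exact[:, :5]          # crude stand-in for a reduced representation
neighbors = filtered_knn(exact, approx, query_idx=0, k=5)
```

The result is exact whenever the shortlist contains all true k nearest neighbors, which is the property the slide reports for 6k candidates on the 1CTF decoy set.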