The Side-Chain Positioning Problem

Joint work with Bernard Chazelle and Mona Singh

Carl Kingsford

Princeton University

Proteins

Many functions: Structural, messaging, catalytic, …

Sequence of amino acids strung together on a backbone

Each amino acid has a flexible side-chain

Proteins fold. Function depends highly on 3D shape

Protein Structure

[Figure: protein structure with the backbone and the side-chains labeled]

Side-chain Positioning Problem

Given:

• fixed backbone

• amino acid sequence

Find the 3D positions for the side-chains that minimize the energy of the structure

Assume lowest energy is best

IILVPACW…

Side-chain Positioning Applications

Homology-modeling: Use known backbone of similar protein to predict new structure

Unknown: KNVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHII
          NV CKNG  NCY S S + ITDCR  G+SKYPNC YKT+   KHII
Known:   ENVTCKNGKKNCYKSTSALHITDCRLKGNSKYPNCDYKTSDYQKHII

Rotamers

Each amino acid has some number of statistically preferred side-chain positions

These are called rotamers

Continuum of positions is well approximated by rotamers

3 rotamers of Arginine

An Equivalent Graph Problem

For protein with p side-chains:

p-partite graph:

• part Vi for each side-chain i

• node u for each rotamer

• edge {u,v} if u interacts with v

Weights:

• E(u) = self-energy

• E(u,v) = interaction energy

n nodes

[Figure: p-partite graph with positions V1 and V2; nodes are rotamers, edges are interactions]

Feasible Solution

Feasible solution: one node from each part

cost(feasible) = cost of induced subgraph

Hard to approximate within a factor of cn, where n is the number of nodes
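The cost of a feasible solution can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the toy instance, the names `cost` and `brute_force`, and the dictionary layout are all assumptions made for the example, and the exhaustive search is only practical for tiny graphs.

```python
from itertools import product

def cost(choice, node_energy, pair_energy):
    """Cost of the subgraph induced by `choice` (one node per part)."""
    total = sum(node_energy[u] for u in choice)
    for i, u in enumerate(choice):
        for v in choice[i + 1:]:
            # Edges are unordered pairs; a missing edge means no interaction.
            total += pair_energy.get(frozenset((u, v)), 0.0)
    return total

def brute_force(parts, node_energy, pair_energy):
    """Try every way of picking one node per part (exponential in p)."""
    return min(product(*parts), key=lambda c: cost(c, node_energy, pair_energy))

# Hypothetical toy instance: 3 positions with 2 rotamers each.
parts = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
node_energy = {"a1": 1.0, "a2": 0.5, "b1": 0.2, "b2": 0.9, "c1": 0.3, "c2": 0.1}
pair_energy = {
    frozenset(("a1", "b1")): -1.0,
    frozenset(("a2", "b1")): 2.0,
    frozenset(("b1", "c2")): 1.5,
    frozenset(("a2", "c2")): -0.2,
}

best = brute_force(parts, node_energy, pair_energy)
print(best, cost(best, node_energy, pair_energy))
```

Enumerating all choices takes time proportional to the product of the part sizes, which is why the rest of the talk turns to a relaxation instead.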

Determining the Energy

• Energy of a protein conformation is the sum of several energy terms: van der Waals, electrostatics, bond lengths, bond angles, dihedral angles, hydrogen bonds

• No triangle inequality

Plan of Attack

1. Formulate as a quadratic integer program

2. Relax into a semidefinite program

3. Solve the SDP in polynomial time

4. Round solution vectors to choice of rotamers

Quadratic Integer Program

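minimize $\sum_{u} E(u)\, x_u + \sum_{\{u,v\}} E(u,v)\, x_u x_v$

subject to

$\sum_{u \in V_j} x_u = 1$ for each posn j

$\sum_{u \in V_j} x_u x_v = x_v$ for each posn j, node v

$x_u \in \{0,1\}$ for each node u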
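As a small sanity check of the integer program, the sketch below evaluates a 0/1 assignment against an objective of the form "node energies plus products of selected pairs" and against two constraints: one selected node per position, and, for every position j and node v, the products x_u·x_v over u in position j summing to x_v. That second form is our reading of the constraint labeled "for each posn j, node v"; the function names and the toy data are hypothetical.

```python
def qip_objective(x, node_energy, pair_energy):
    """sum_u E(u) x_u + sum_{u,v} E(u,v) x_u x_v for a 0/1 assignment x."""
    linear = sum(node_energy[u] * x[u] for u in x)
    quadratic = sum(w * x[u] * x[v] for (u, v), w in pair_energy.items())
    return linear + quadratic

def satisfies_qip_constraints(x, parts):
    if any(value not in (0, 1) for value in x.values()):
        return False
    for part in parts:
        # Position constraint: exactly one node is selected in each position.
        if sum(x[u] for u in part) != 1:
            return False
        # Assumed flow constraint: sum over u in the part of x_u * x_v equals x_v.
        if any(sum(x[u] * x[v] for u in part) != x[v] for v in x):
            return False
    return True

# Hypothetical toy instance: 2 positions with 2 rotamers each.
parts = [["a1", "a2"], ["b1", "b2"]]
node_energy = {"a1": 1.0, "a2": 0.5, "b1": 0.2, "b2": 0.9}
pair_energy = {("a1", "b1"): -1.0, ("a2", "b1"): 2.0}

x_good = {"a1": 1, "a2": 0, "b1": 1, "b2": 0}
x_bad = {"a1": 1, "a2": 1, "b1": 1, "b2": 0}  # two rotamers at one position
print(satisfies_qip_constraints(x_good, parts), qip_objective(x_good, node_energy, pair_energy))
print(satisfies_qip_constraints(x_bad, parts))
```

For 0/1 values the flow constraint is implied by the position constraint; it only starts to matter once the variables are relaxed.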

Relax Into Vector Program

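Use $x_u = x_u^2$ for $x_u \in \{0,1\}$ to write as a pure quadratic program

Variables become n-dimensional vectors $x_u$

minimize $\sum_{u} E(u)\, x_u^T x_u + \sum_{\{u,v\}} E(u,v)\, x_u^T x_v$

subject to

$\sum_{u \in V_j} x_u^T x_u = 1$ for each posn j

$\sum_{u \in V_j} x_u^T x_v = x_v^T x_v$ for each node v, posn j

$x_u^T x_v \ge 0$ for all nodes u, v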

Rewrite As Semidefinite Program

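$X = (x_{uv})$ is PSD $\iff$ $x_{uv} = x_u^T x_v$

minimize $\sum_{u} E(u)\, x_{uu} + \sum_{\{u,v\}} E(u,v)\, x_{uv}$

subject to

$\sum_{u \in V_j} x_{uu} = 1$ for each posn j

$\sum_{u \in V_j} x_{uv} = x_{vv}$ for each node v, posn j

$x_{uv} \ge 0$ for all nodes u, v

Position constraints: the sum of the node variables $x_{vv}$ in each position $V_i$ is 1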

Constraints & Dummy Position

Insert a new position $V_0$ with a single node $u_0$. No edges, no node cost.

Flow constraints: the sum of the edge variables $x_{uv}$, $u \in V_j$, adjacent to a node v equals that node variable $x_{vv}$
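The position and flow constraints can be checked mechanically on a candidate matrix. The sketch below is an illustration under stated assumptions: node 0 plays the role of the dummy node, the fractional matrix is built by averaging three integral solutions (any such average satisfies the linear constraints and stays PSD, although PSD-ness is not tested here), and all function names are hypothetical.

```python
def integral_matrix(choice, n):
    """X* = chi chi^T for the 0/1 characteristic vector chi of `choice`."""
    chi = [1.0 if u in choice else 0.0 for u in range(n)]
    return [[a * b for b in chi] for a in chi]

def average(mats):
    """Entrywise mean of equally sized square matrices."""
    k, n = len(mats), len(mats[0])
    return [[sum(m[i][j] for m in mats) / k for j in range(n)] for i in range(n)]

def satisfies_constraints(X, parts, tol=1e-9):
    n = len(X)
    for part in parts:
        # Position constraint: node variables x_uu in each position sum to 1.
        if abs(sum(X[u][u] for u in part) - 1.0) > tol:
            return False
        # Flow constraint: edge variables between v and this position sum to x_vv.
        for v in range(n):
            if abs(sum(X[u][v] for u in part) - X[v][v]) > tol:
                return False
    # Nonnegativity of every entry.
    return all(X[u][v] >= -tol for u in range(n) for v in range(n))

# Node 0 is the dummy position; positions {1, 2} and {3, 4} are real.
parts = [[0], [1, 2], [3, 4]]
choices = [{0, 1, 3}, {0, 2, 4}, {0, 1, 4}]
X = average([integral_matrix(c, 5) for c in choices])
print(satisfies_constraints(X, parts))
```

With the dummy position included, the flow constraint for position 0 forces $x_{u_0 v} = x_{vv}$, which is what ties every solution vector to the dummy vector.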

Geometry of the Solution Vectors

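Let $y = \sum_{u \in V_j} x_u$ be the sum of the solution vectors in position j. Simple algebra shows that:

$y^T y = \sum_{v \in V_j} \sum_{u \in V_j} x_{uv} = \sum_{v \in V_j} x_{vv} = 1$

$y^T x_{u_0} = \sum_{u \in V_j} x_{u u_0} = x_{u_0 u_0} = 1$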

Geometry of Solution Vectors

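Lemma. For every position j, $y = \sum_{u \in V_j} x_u$ equals $x_{u_0}$.

Proof.

• Length of y is 1

• Length of $x_{u_0}$ is 1

• Length of projection of y onto $x_{u_0}$ is 1

Two unit vectors with inner product 1 are equal.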

Solution Vectors Lie on a Sphere

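Note. Length of projection of $x_u$ onto $x_{u_0}$ is the length of vector $x_u$ squared, because $x_u^T x_{u_0} = x_{uu} = \|x_u\|^2$.

Each solution vector lies on a sphere of radius ½ centered at $x_{u_0}/2$:

$a^2 = \|x_u - x_{u_0}/2\|^2 = \|x_u\|^2 - x_u^T x_{u_0} + \tfrac14 = \tfrac14$

[Figure: sphere through the origin O and $x_{u_0}$, with $x_u$ at distance a from its center]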

How do we round the solution of the SDP relaxation?

Convert fractional solutions into feasible 0/1 solutions

• Projection rounding

• Perron-Frobenius rounding

Projection Rounding

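Since $\sum_{u \in V_j} x_{uu} = 1$, the $x_{uu}$ give a probability distribution at each position.

Pick node u with probability $x_{uu}$

$x_{uu}$ = length of the projection of $x_u$ onto $x_{u_0}$

[Figure: solution vectors $x_u$ and $x_v$ and their projections onto $x_{u_0}$, drawn from the origin O]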
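A minimal sketch of this rounding step follows, assuming the SDP solution is available as a nested list `X` whose diagonal sums to 1 within each position. The function name, the optional `rng` argument, and the example diagonal are hypothetical; positions are rounded independently of each other.

```python
import random

def projection_round(X, parts, rng=random):
    """Independently pick one node per position, node u with probability x_uu."""
    choice = []
    for part in parts:
        weights = [X[u][u] for u in part]
        choice.append(rng.choices(part, weights=weights, k=1)[0])
    return choice

# Only the diagonal of X matters for this rounding; values are made up.
parts = [[0], [1, 2], [3, 4]]
diag = [1.0, 0.25, 0.75, 0.6, 0.4]
X = [[diag[i] if i == j else 0.0 for j in range(5)] for i in range(5)]

picked = projection_round(X, parts, random.Random(0))
print(picked)
```

Passing a seeded `random.Random` instance makes a run reproducible; over many runs node 2 should be picked at its position about three times out of four.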

Drift for Projection Rounding

Drift = expected difference between fractional & rounded solutions.

Comes entirely from pairwise interactions.

In fact, $\Delta_{uv} = E(u,v)\,(x_{uv} - \Pr[uv])$

By Cauchy-Schwarz, and because the $x_u$ are on a sphere, each $|\Delta_{uv}|$ can be bounded.
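One way the bound could go is sketched below. It rests on assumptions that the slide does not spell out: positions are rounded independently, so that $\Pr[uv] = x_{uu} x_{vv}$ for u and v in different positions; $\Delta_{uv}$ denotes the drift contributed by the pair; and $y_u$ is taken to be the component of $x_u$ orthogonal to $x_{u_0}$. It uses $\|x_{u_0}\| = 1$ and $x_u^T x_{u_0} = x_{uu} = \|x_u\|^2$.

```latex
\begin{align*}
x_u &= x_{uu}\, x_{u_0} + y_u, \qquad y_u^{T} x_{u_0} = 0, \\
x_{uv} - x_{uu} x_{vv} &= x_u^{T} x_v - x_{uu} x_{vv} = y_u^{T} y_v, \\
\|y_u\|^2 &= \|x_u\|^2 - x_{uu}^2 = x_{uu}\,(1 - x_{uu}) \le \tfrac14, \\
|\Delta_{uv}| &= |E(u,v)|\,|y_u^{T} y_v|
  \le |E(u,v)|\,\|y_u\|\,\|y_v\| \le \tfrac14\,|E(u,v)|.
\end{align*}
```

The third line is where the sphere enters: a point on the sphere of radius ½ centered at $x_{u_0}/2$ is never farther than ½ from the line through $x_{u_0}$.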

Perron-Frobenius Rounding

$\chi$ = 0/1 characteristic n-vector of optimal solution

Optimal integral $X^* = \chi\chi^T \Rightarrow \operatorname{rank}(X^*) = 1$

Idea: Approximate fractional X by a rank 1 matrix $qq^T$

Want to sample from $\chi$, but settle for q

[Figure: $\chi$ is a 0/1 vector with a single 1 in each position's block; the entries of q sum to 1 within each position's block]

q needs to contain probability distributions for each position. How do we choose q?

Lemma. Any nonnegative vector q with L1-norm p in the image space of X contains the required set of probability distributions.

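Proof. $X = W^T W$, where $W = [x_1\ x_2\ \dots\ x_n]$.

Let $1_i$ be the characteristic vector for position i.

Suppose q = Xy for some y.

Then, $1_i^T q = 1_i^T W^T W y = (W 1_i)^T W y = \left(\sum_{u \in V_i} x_u\right)^T W y = x_{u_0}^T W y$, since the vectors in each position sum to $x_{u_0}$.

The final value is independent of i, and the L1-norm of q is p, so each position sums to 1.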

Possible Choices for q

A Choice for q

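By spectral decomposition $X = \sum_{i=1}^{n} \lambda_i z_i z_i^T$

where $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0$

Take $q = p\, z_1 / \|z_1\|_1$

By Perron-Frobenius theorem for nonnegative matrices q ≥ 0.

$z_1$ is in the image space of X.

By Lemma, q contains the needed probability distributions.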
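The following sketch illustrates one way to compute such a q, assuming q is the leading eigenvector of X rescaled to have L1-norm equal to the number of positions (the dummy position is counted here). Power iteration is our choice for the example, not something the slides prescribe, and the matrix `X` is a made-up fractional solution with node 0 as the dummy node.

```python
def top_eigenvector(X, iters=1000):
    """Power iteration; for an entrywise nonnegative X the iterates stay nonnegative."""
    n = len(X)
    z = [1.0] * n
    for _ in range(iters):
        w = [sum(X[i][j] * z[j] for j in range(n)) for i in range(n)]
        norm = sum(a * a for a in w) ** 0.5
        z = [a / norm for a in w]
    return z

def perron_frobenius_q(X, parts):
    """Scale the leading eigenvector so that its L1-norm equals len(parts)."""
    z = top_eigenvector(X)
    scale = len(parts) / sum(z)
    return [scale * a for a in z]

# Hypothetical fractional solution: node 0 is the dummy, positions {1,2} and {3,4}.
t = 1.0 / 3.0
X = [
    [1.0,   2 * t, t,   t,   2 * t],
    [2 * t, 2 * t, 0.0, t,   t],
    [t,     0.0,   t,   0.0, t],
    [t,     t,     0.0, t,   0.0],
    [2 * t, t,     t,   0.0, 2 * t],
]
parts = [[0], [1, 2], [3, 4]]

q = perron_frobenius_q(X, parts)
print([round(sum(q[u] for u in part), 6) for part in parts])
```

Every iterate of the power method is of the form Xz, so it already lies in the image space of X; the per-position sums therefore agree even before the iteration has fully converged. A node can then be sampled at each position with probability $q_u$.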

Computational Results

Compare solutions from:

• Simple LP

• SDP Fractional

• Projection rounded

• Perron-Frobenius rounded

30 random graphs

60 nodes, 15 positions

edge probability ½

weights uniformly from [0,1]
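A test instance of this shape could be generated as follows. The slides do not say how nodes are split among positions or which weights are random, so the equal split (4 nodes per position), the random node weights, the restriction of edges to nodes in different positions, and the function name are all assumptions of this sketch.

```python
import random

def random_instance(n_nodes=60, n_positions=15, edge_prob=0.5, seed=0):
    rng = random.Random(seed)
    per_position = n_nodes // n_positions  # assumption: equal-sized positions
    parts = [list(range(j * per_position, (j + 1) * per_position))
             for j in range(n_positions)]
    position_of = {u: j for j, part in enumerate(parts) for u in part}
    # Assumption: node weights are drawn the same way as edge weights.
    node_energy = {u: rng.random() for u in range(n_nodes)}
    pair_energy = {}
    for u in range(n_nodes):
        for v in range(u + 1, n_nodes):
            # Only nodes in different positions can interact.
            if position_of[u] != position_of[v] and rng.random() < edge_prob:
                pair_energy[(u, v)] = rng.random()
    return parts, node_energy, pair_energy

parts, node_energy, pair_energy = random_instance()
print(len(parts), len(node_energy), len(pair_energy))
```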

Future Work

Can the rounding schemes be applied to other problems?

Can the semidefinite program be sped up?

─ Can only routinely solve graphs with ≤ 120 nodes (reasonable protein problems contain 1000 to 5000 nodes)

─ xuv ≥ 0 constraints are the bottleneck

Can the requirement of a fixed backbone be relaxed?

We’ve worked quite a bit with real proteins using an LP approach. It seems an SDP formulation might be useful.

More Information

The Side-Chain Positioning Problem: A Semidefinite Programming Formulation with New Rounding Schemes, B. Chazelle, C. Kingsford, M. Singh, Proc. ACM FCRC'2003, Principles of Computing and Knowledge: Paris Kanellakis Memorial Workshop (2003).

http://www.cs.princeton.edu/~carlk/papers.html
