The Side-Chain Positioning Problem Joint work with Bernard Chazelle and Mona Singh Carl Kingsford Princeton University

Page 1

The Side-Chain Positioning Problem

Joint work with Bernard Chazelle and Mona Singh

Carl Kingsford, Princeton University

Page 2

Proteins

Many functions: Structural, messaging, catalytic, …

Sequence of amino acids strung together on a backbone

Each amino acid has a flexible side-chain

Proteins fold. Function depends highly on 3D shape

Page 3

Protein Structure

[Figure: protein structure with the backbone and the side-chains labeled]

Page 4

Side-chain Positioning Problem

Given:

• fixed backbone

• amino acid sequence

Find the 3D positions for the side-chains that minimize the energy of the structure

Assume lowest energy is best

[Figure: example amino acid sequence IILVPACW…]

Page 5

Side-chain Positioning Applications

Homology-modeling: Use known backbone of similar protein to predict new structure

Unknown: KNVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHII
          NV CKNG  NCY S S + ITDCR  G+SKYPNC YKT+   KHII
Known:   ENVTCKNGKKNCYKSTSALHITDCRLKGNSKYPNCDYKTSDYQKHII

Page 6

Rotamers

Each amino acid has some number of statistically preferred side-chain positions

These are called rotamers

Continuum of positions is well approximated by rotamers

3 rotamers of Arginine

Page 7

An Equivalent Graph Problem

For protein with p side-chains:

p-partite graph:

• part Vi for each side-chain i

• node u for each rotamer

• edge {u,v} if u interacts with v

Weights:

• E(u) = self-energy

• E(u,v) = interaction energy

[Figure: p-partite graph with n nodes; each part (V1, V2, …) is a position, each node is a rotamer, each edge is an interaction]

Page 8

Feasible Solution

Feasible solution: one node from each part

cost(feasible) = cost of induced subgraph

Hard to approximate within a factor of cn, where c is a constant and n is the number of nodes

[Figure: the same graph (positions V1, V2, …; rotamer nodes; interaction edges) with one node selected in each position]
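The definition of a feasible solution and its cost can be sketched in a few lines of Python. This is only an illustration on a made-up toy instance: the rotamer names, the energies, and the helper names `cost` and `brute_force` are assumptions, not part of the talk. Exhaustive search is used purely to make the definition concrete; it is exponential in the number of positions.

```python
from itertools import product

# Hypothetical toy instance: 3 positions (parts), rotamer names are invented.
positions = [["a1", "a2"], ["b1", "b2", "b3"], ["c1", "c2"]]

# E(u): self-energy of each rotamer (made-up values).
self_energy = {"a1": 1.0, "a2": 0.5, "b1": 0.2, "b2": 0.9,
               "b3": 0.4, "c1": 0.3, "c2": 0.7}

# E(u, v): interaction energy on the edges that exist (made-up values).
pair_energy = {
    frozenset(("a1", "b1")): -1.0,
    frozenset(("a2", "b3")): 2.0,
    frozenset(("b1", "c2")): 0.5,
    frozenset(("a2", "c1")): -0.4,
}

def cost(choice):
    """Cost of the subgraph induced by one chosen node per position."""
    total = sum(self_energy[u] for u in choice)
    for i in range(len(choice)):
        for j in range(i + 1, len(choice)):
            # Missing edges contribute no interaction energy.
            total += pair_energy.get(frozenset((choice[i], choice[j])), 0.0)
    return total

def brute_force(parts):
    """Try every feasible solution; only viable for tiny instances."""
    return min(product(*parts), key=cost)

best = brute_force(positions)
print(best, cost(best))
```

Because interaction energies may be negative, the cheapest feasible solution is not simply the cheapest node in each part.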

Page 9

Determining the Energy

• Energy of a protein conformation is the sum of several energy terms

• No triangle inequality

Energy terms:

• van der Waals

• electrostatics

• bond lengths

• bond angles

• dihedral angles

• hydrogen bonds

Page 10

Plan of Attack

1. Formulate as a quadratic integer program

2. Relax into a semidefinite program

3. Solve the SDP in polynomial time

4. Round solution vectors to choice of rotamers

Page 11

Quadratic Integer Program

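minimize $\sum_{u} E(u)\, x_u + \sum_{\{u,v\}} E(u,v)\, x_u x_v$

subject to

$\sum_{u \in V_j} x_u = 1$ for each posn j

$\sum_{u \in V_j} x_u x_v = x_v$ for each posn j, node v

$x_u \in \{0, 1\}$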

Page 12

Relax Into Vector Program

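Use $x_u = x_u^2$ (true for 0/1 variables) to write as a pure quadratic program

Variables become n-dimensional vectors ($x_u x_v \to x_u^T x_v$)

minimize $\sum_{u} E(u)\, x_u^T x_u + \sum_{\{u,v\}} E(u,v)\, x_u^T x_v$

subject to

$\sum_{u \in V_j} x_u^T x_u = 1$ for each posn j

$\sum_{u \in V_j} x_u^T x_v = x_v^T x_v$ for each node v, posn j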

Page 13

Rewrite As Semidefinite Program

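$X = (x_{uv})$ is PSD, with $x_{uv} = x_u^T x_v$

minimize $\sum_{u} E(u)\, x_{uu} + \sum_{\{u,v\}} E(u,v)\, x_{uv}$

subject to

$\sum_{u \in V_j} x_{uu} = 1$ for each posn j

$\sum_{u \in V_j} x_{uv} = x_{vv}$ for each node v, posn j

$x_{uv} \ge 0$, $X \succeq 0$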

Page 14

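Constraints & Dummy Position

Position constraints: the sum of the node variables $x_{vv}$ in each position $V_i$ is 1.

Flow constraints: the sum of the edge variables $x_{uv}$ adjacent to a node v, taken over the nodes u of a position $V_j$, equals that node variable $x_{vv}$.

Dummy position: insert a new position $V_0$ with a single node $u_0$. No edges, no node cost. Its edge variables are $x_{u u_0}$.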
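As a sanity check on the two constraint families, the following sketch lifts a 0/1 choice of rotamers to the matrix X = x xᵀ (with the dummy node included) and verifies the position and flow constraints. The helper names `lift` and `check_constraints`, the label `"u0"` for the dummy node, and the toy positions are assumptions made for illustration.

```python
# Hypothetical toy positions; node names are invented.
positions = [["a1", "a2"], ["b1", "b2", "b3"]]

def lift(parts, choice):
    """Return the node order (dummy node first) and X = x x^T for a 0/1 choice."""
    nodes = ["u0"] + [u for part in parts for u in part]
    x = [1 if (u == "u0" or u in choice) else 0 for u in nodes]
    X = [[xi * xj for xj in x] for xi in x]
    return nodes, X

def check_constraints(parts, nodes, X, tol=1e-9):
    """Check position and flow constraints, including the dummy position V0."""
    idx = {u: k for k, u in enumerate(nodes)}
    all_parts = [["u0"]] + parts  # V0 holds the single dummy node
    for part in all_parts:
        # Position constraint: node variables in the position sum to 1.
        if abs(sum(X[idx[u]][idx[u]] for u in part) - 1) > tol:
            return False
        for v in nodes:
            # Flow constraint: edge variables from this position into v sum to x_vv.
            flow = sum(X[idx[u]][idx[v]] for u in part)
            if abs(flow - X[idx[v]][idx[v]]) > tol:
                return False
    return True

nodes, X = lift(positions, ("a2", "b1"))
print(check_constraints(positions, nodes, X))
```

Picking two rotamers in the same position violates the position constraint, so the check rejects it.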

Page 15

Geometry of the Solution Vectors

Page 16

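Geometry of Solution Vectors

Lemma. For each position i, $\sum_{u \in V_i} x_u = x_{u_0}$.

Proof. Let $y = \sum_{u \in V_i} x_u$. Simple algebra shows that:

• Length of y is 1

• Length of $x_{u_0}$ is 1

• Length of projection of y onto $x_{u_0}$ is 1

A unit vector whose projection onto another unit vector has length 1 must equal that vector, so $y = x_{u_0}$.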

Page 17

Solution Vectors Lie on a Sphere

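Note. Length of projection of $x_u$ onto $x_{u_0}$ is the length of vector $x_u$ squared, because $x_u^T x_{u_0} = x_{uu} = \|x_u\|^2$.

Each solution vector lies on a sphere of radius ½ centered at $x_{u_0}/2$:

$a^2 = \|x_u - x_{u_0}/2\|^2 = \|x_u\|^2 - x_u^T x_{u_0} + \tfrac{1}{4}\|x_{u_0}\|^2 = \tfrac{1}{4}$

[Figure: sphere through the origin O containing $x_{u_0}$ and $x_u$; a is the distance from the center to $x_u$]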
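The sphere property can be checked numerically on a small example. The sketch below assumes a special kind of fractional solution, namely a convex combination of integral solutions, for which explicit solution vectors are easy to write down; a general SDP solution need not have this form. The toy positions, the weights, and the helper names are all hypothetical.

```python
import math

# Hypothetical toy data: 2 positions and three feasible 0/1 solutions
# whose weights sum to 1.
positions = [["a1", "a2"], ["b1", "b2"]]
solutions = [("a1", "b1"), ("a2", "b1"), ("a2", "b2")]
weights = [0.5, 0.3, 0.2]

def solution_vectors(parts, sols, ws):
    """Vectors whose Gram matrix is X = sum_k w_k * chi_k chi_k^T.

    Coordinate k of x_u is sqrt(w_k) if u is chosen in solution k, else 0.
    """
    nodes = [u for part in parts for u in part]
    vec = {u: [math.sqrt(w) if u in s else 0.0 for s, w in zip(sols, ws)]
           for u in nodes}
    # The dummy node u0 belongs to every solution.
    vec["u0"] = [math.sqrt(w) for w in ws]
    return vec

def dist_sq_from_center(xu, xu0):
    """Squared distance from x_u to the sphere center x_u0 / 2."""
    return sum((a - b / 2) ** 2 for a, b in zip(xu, xu0))

vec = solution_vectors(positions, solutions, weights)
for u, xu in vec.items():
    print(u, round(dist_sq_from_center(xu, vec["u0"]), 12))
```

Every vector, including the dummy vector itself, sits at squared distance ¼ from the center.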

Page 18

How do we round the solution of the SDP relaxation?

Convert fractional solutions into feasible 0/1 solutions

• Projection rounding

• Perron-Frobenius rounding

Page 19

Projection Rounding

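Since $\sum_{u \in V_i} x_{uu} = 1$, the $x_{uu}$ give a probability distribution at each position.

Pick node u with probability $x_{uu}$

$x_{uu}$ = length of the projection of $x_u$ onto $x_{u_0}$

[Figure: vectors $x_{u_0}$, $x_u$ and $x_v$ drawn from the origin O, with the projections onto $x_{u_0}$ marked]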
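Projection rounding as described here amounts to one independent draw per position. A minimal sketch, assuming the diagonal entries of X are available as a dictionary (the names `projection_round` and `x_diag` and the sample values are hypothetical):

```python
import random

# Hypothetical diagonal of a fractional solution: x_uu for every node.
positions = [["a1", "a2"], ["b1", "b2"]]
x_diag = {"a1": 0.5, "a2": 0.5, "b1": 0.8, "b2": 0.2}

def projection_round(parts, diag, rng=random):
    """Pick one node per position, node u with probability x_uu."""
    choice = []
    for part in parts:
        # Position constraints make these weights sum to 1 within the part.
        node_weights = [diag[u] for u in part]
        choice.append(rng.choices(part, weights=node_weights, k=1)[0])
    return tuple(choice)

print(projection_round(positions, x_diag, random.Random(0)))
```

Passing a seeded `random.Random` instance makes a run reproducible; the draws at different positions are independent of each other.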

Page 20

Drift for Projection Rounding

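Drift: the expected difference between the fractional and rounded solutions.

Comes entirely from pairwise interactions (node costs agree in expectation, since node u is picked with probability $x_{uu}$).

In fact, the drift for a pair uv = E(u,v)($x_{uv}$ – Pr[uv])

By Cauchy-Schwarz,

Because the $x_u$ are on a sphere,

[Figure: vectors $y_u$, $y_v$ and $x_u$, $x_v$]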

Page 21

Perron-Frobenius Rounding

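The optimal integral X* is the outer product of the 0/1 characteristic n-vector of the optimal solution with itself, so rank(X*) = 1

Idea: Approximate fractional X by a rank 1 matrix $qq^T$

Want to sample from the characteristic vector of the optimal solution, but settle for q

[Figure: the characteristic vector has 0/1 entries; the entries of q within each position sum to 1]

q needs to contain a probability distribution for each position. How do we choose q?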

Page 22

Possible Choices for q

Lemma. Any nonnegative vector q with L1-norm p in the image space of X contains the required set of probability distributions.

Proof. $X = W^T W$, where $W = [x_1\ x_2\ \dots\ x_n]$.

Let $1_i$ be the characteristic vector for position i.

Suppose q = Xy for some y.

Then, $1_i^T q = 1_i^T W^T W y = (W 1_i)^T W y = x_{u_0}^T W y$, because the vectors at each position sum to the dummy vector: $W 1_i = \sum_{u \in V_i} x_u = x_{u_0}$.

The final value is independent of i, so all p positions have the same sum. Since q is nonnegative with L1-norm p, each position sums to 1.

Page 23

A Choice for q

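By spectral decomposition, $X = \sum_{i=1}^{n} \lambda_i z_i z_i^T$, where $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0$ are the eigenvalues and the $z_i$ are orthonormal eigenvectors.

Take $q = p\, z_1 / \|z_1\|_1$

By Perron-Frobenius theorem for nonnegative matrices, q ≥ 0.

$z_1$ is in the image space of X.

By Lemma, q contains the needed probability distributions.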
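The choice of q can be illustrated with plain power iteration, used here as a stand-in for a proper eigensolver. The sketch assumes X is built as a convex combination of integral solutions (so that it is nonnegative, PSD, and satisfies the position and flow constraints); the toy data and the names `build_X` and `principal_q` are hypothetical. The dummy node is left out so that q has L1-norm p.

```python
# Hypothetical toy data: p = 2 positions, three weighted integral solutions.
positions = [["a1", "a2"], ["b1", "b2"]]
solutions = [("a1", "b1"), ("a2", "b1"), ("a2", "b2")]
weights = [0.5, 0.3, 0.2]
nodes = [u for part in positions for u in part]

def build_X(node_list, sols, ws):
    """X = sum_k w_k * chi_k chi_k^T over the real (non-dummy) nodes."""
    idx = {u: k for k, u in enumerate(node_list)}
    n = len(node_list)
    X = [[0.0] * n for _ in range(n)]
    for s, w in zip(sols, ws):
        for u in s:
            for v in s:
                X[idx[u]][idx[v]] += w
    return X

def principal_q(X, p, iters=200):
    """Power iteration from the all-ones vector, rescaled to L1-norm p."""
    n = len(X)
    q = [1.0] * n
    for _ in range(iters):
        q = [sum(X[i][j] * q[j] for j in range(n)) for i in range(n)]
        total = sum(q)  # q stays nonnegative, so this is its L1-norm
        q = [p * val / total for val in q]
    return q

X = build_X(nodes, solutions, weights)
q = principal_q(X, p=len(positions))
for part in positions:
    print(part, sum(q[nodes.index(u)] for u in part))
```

Every iterate has the form Xy, so it already lies in the image space of X and the lemma applies at each step, not only in the limit.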

Page 24

Computational Results

Compare solutions from: simple LP, SDP fractional, projection rounded, Perron-Frobenius rounded

30 random graphs

60 nodes, 15 positions

edge probability ½

weights uniformly from [0,1]
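A test instance of this shape could be generated along the following lines. Several details are assumptions because the slide does not state them: positions are taken to be equal-sized (4 nodes each), edges are only drawn between different positions, and node costs are drawn from the same uniform distribution as edge weights. The function name `random_instance` is hypothetical.

```python
import random

def random_instance(n_nodes=60, n_positions=15, edge_prob=0.5, seed=0):
    """Random p-partite instance; equal-sized positions are an assumption."""
    rng = random.Random(seed)
    per = n_nodes // n_positions
    parts = [[i * per + k for k in range(per)] for i in range(n_positions)]
    # Assumption: node costs are also uniform; rng.random() samples [0, 1).
    node_cost = {u: rng.random() for part in parts for u in part}
    edge_cost = {}
    for i in range(n_positions):
        for j in range(i + 1, n_positions):
            for u in parts[i]:
                for v in parts[j]:
                    # Each cross-position pair is an edge with probability edge_prob.
                    if rng.random() < edge_prob:
                        edge_cost[(u, v)] = rng.random()
    return parts, node_cost, edge_cost

# 30 instances, one per seed, mirroring the experiment size on the slide.
instances = [random_instance(seed=s) for s in range(30)]
print(len(instances), len(instances[0][2]))
```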

Page 25

Future Work

Can the rounding schemes be applied to other problems?

Can the semidefinite program be sped up?

─ Can only routinely solve graphs with ≤ 120 nodes (reasonable protein problems contain 1000 to 5000 nodes)

─ xuv ≥ 0 constraints are the bottleneck

Can the requirement of a fixed backbone be relaxed?

We’ve worked quite a bit with real proteins using an LP approach. It seems an SDP formulation might be useful.

Page 26

More Information

The Side-Chain Positioning Problem: A Semidefinite Programming Formulation with New Rounding Schemes, B. Chazelle, C. Kingsford, M. Singh, Proc. ACM FCRC'2003, Principles of Computing and Knowledge: Paris Kanellakis Memorial Workshop (2003).

http://www.cs.princeton.edu/~carlk/papers.html