Posted on 19-Dec-2015
The Side-Chain Positioning Problem
Joint work with Bernard Chazelle and Mona Singh
Carl Kingsford, Princeton University
Proteins
Many functions: Structural, messaging, catalytic, …
Sequence of amino acids strung together on a backbone
Each amino acid has a flexible side-chain
Proteins fold. Function depends highly on 3D shape
Protein Structure
[Figure: a protein with its backbone and side-chains labeled]
Side-chain Positioning Problem
Given:
• fixed backbone
• amino acid sequence
Find the 3D positions for the side-chains that minimize the energy of the structure
Assume lowest energy is best
Example sequence: IILVPACW…
Side-chain Positioning Applications
Homology modeling: use the known backbone of a similar protein to predict the structure of a new one
Unknown: KNVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHII
          NV CKNG  NCY S S + ITDCR  G+SKYPNC YKT+   KHII
Known:   ENVTCKNGKKNCYKSTSALHITDCRLKGNSKYPNCDYKTSDYQKHII
Rotamers
Each amino acid has some number of statistically preferred side-chain positions
These are called rotamers
The continuum of possible side-chain positions is well approximated by a discrete set of rotamers
[Figure: 3 rotamers of arginine]
An Equivalent Graph Problem
For a protein with p side-chains:
p-partite graph:
• part Vi for each side-chain i
• node u for each rotamer
• edge {u,v} if u interacts with v
Weights:
• E(u) = self-energy
• E(u,v) = interaction energy
n nodes in total
[Figure: parts V1 and V2 (positions) containing rotamer nodes, with interaction edges between them]
Feasible Solution
Feasible solution: one node from each part
cost(feasible) = cost of the induced subgraph (self-energies of the chosen nodes plus interaction energies of the edges between them)
Hard to approximate within a factor of cn, for some constant c, where n is the number of nodes
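To make the objective concrete, here is a minimal Python sketch (not the authors' code) that scores a feasible solution as the cost of its induced subgraph and finds the optimum by exhaustive search on a toy instance. The dictionary layout, the names `solution_cost` and `brute_force`, and the energies are all hypothetical; exhaustive search is exponential in the number of positions, so it only serves as a reference on tiny inputs.

```python
from itertools import product


def solution_cost(choice, self_energy, pair_energy):
    """Cost of the subgraph induced by one chosen rotamer per position.

    self_energy maps node -> E(u); pair_energy maps frozenset({u, v}) -> E(u, v).
    Pairs missing from pair_energy have no edge (no interaction).
    """
    cost = sum(self_energy[u] for u in choice)
    for i in range(len(choice)):
        for j in range(i + 1, len(choice)):
            cost += pair_energy.get(frozenset((choice[i], choice[j])), 0.0)
    return cost


def brute_force(positions, self_energy, pair_energy):
    """Try every feasible solution (one node from each part V_i)."""
    best = min(product(*positions),
               key=lambda c: solution_cost(c, self_energy, pair_energy))
    return best, solution_cost(best, self_energy, pair_energy)


# Hypothetical toy instance: two positions with two rotamers each.
positions = [["a1", "a2"], ["b1", "b2"]]
self_energy = {"a1": 1.0, "a2": 0.5, "b1": 0.25, "b2": 0.0}
pair_energy = {frozenset(("a1", "b1")): -1.0, frozenset(("a2", "b2")): 2.0}
print(brute_force(positions, self_energy, pair_energy))  # (('a1', 'b1'), 0.25)
```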
Determining the Energy
• The energy of a protein conformation is the sum of several energy terms: van der Waals, electrostatics, bond lengths, bond angles, dihedral angles, hydrogen bonds
• No Δ-inequality: the energies need not satisfy the triangle inequality
Plan of Attack
1. Formulate as a quadratic integer program
2. Relax into a semidefinite program
3. Solve the SDP in polynomial time
4. Round solution vectors to a choice of rotamers
Quadratic Integer Program
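min $\sum_{u} E(u)\, x_u + \sum_{\text{edges } \{u,v\}} E(u,v)\, x_u x_v$
subject to
$\sum_{u \in V_j} x_u = 1$ for each posn j
$\sum_{u \in V_j} x_u x_v = x_v$ for each posn j, node v
$x_u \in \{0,1\}$ for each node u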
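As a sanity check on the formulation, the sketch below evaluates the objective and the three kinds of constraints for a 0/1 vector x, using hypothetical toy energies and illustrative function names. For a 0/1 vector that satisfies the position constraints, the flow constraints hold automatically; they only start to matter once the program is relaxed.

```python
def qip_objective(x, self_energy, pair_energy):
    """sum_u E(u) x_u + sum over edges {u,v} of E(u,v) x_u x_v."""
    value = sum(self_energy[u] * x[u] for u in x)
    for edge, e_uv in pair_energy.items():
        u, v = tuple(edge)
        value += e_uv * x[u] * x[v]
    return value


def qip_feasible(x, positions):
    """Check x_u in {0,1}, the position constraints and the flow constraints."""
    if any(val not in (0, 1) for val in x.values()):
        return False
    for part in positions:
        if sum(x[u] for u in part) != 1:  # position constraint for V_j
            return False
        for v in x:  # flow constraint for (posn j, node v)
            if sum(x[u] * x[v] for u in part) != x[v]:
                return False
    return True


positions = [["a1", "a2"], ["b1", "b2"]]
self_energy = {"a1": 1.0, "a2": 0.5, "b1": 0.25, "b2": 0.0}
pair_energy = {frozenset(("a1", "b1")): -1.0, frozenset(("a2", "b2")): 2.0}
x = {"a1": 1, "a2": 0, "b1": 1, "b2": 0}
print(qip_feasible(x, positions), qip_objective(x, self_energy, pair_energy))  # True 0.25
```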
Relax Into Vector Program
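Use $x_u = x_u^2$ for $x_u \in \{0,1\}$ to write it as a pure quadratic program
Variables: n-dimensional vectors $x_u$, one per node u
minimize $\sum_{u} E(u)\, x_u^T x_u + \sum_{\text{edges } \{u,v\}} E(u,v)\, x_u^T x_v$
subject to
$\sum_{u \in V_j} x_u^T x_u = 1$ for each posn j
$\sum_{u \in V_j} x_u^T x_v = x_v^T x_v$ for each node v, posn j
$x_u^T x_v \ge 0$ for all nodes u, v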
Rewrite As Semidefinite Program
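$X = (x_{uv})$ is PSD, where $x_{uv} = x_u^T x_v$
minimize $\sum_{u} E(u)\, x_{uu} + \sum_{\text{edges } \{u,v\}} E(u,v)\, x_{uv}$
subject to
$\sum_{u \in V_j} x_{uu} = 1$ for each posn j
$\sum_{u \in V_j} x_{uv} = x_{vv}$ for each node v, posn j
$x_{uv} \ge 0$ for all nodes u, v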
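Constraints & Dummy Position
• position constraints: the node variables in each position sum to 1, $\sum_{v \in V_i} x_{vv} = 1$
• flow constraints: for each node v and position j, the edge variables between v and the nodes of $V_j$ sum to that node's variable, $\sum_{u \in V_j} x_{uv} = x_{vv}$
• Insert a new position $V_0$ with a single node $u_0$ (vector $x_{u_0}$). No edges, no node cost.
  Since $V_0 = \{u_0\}$, the constraints give $x_{u_0 u_0} = 1$ and $x_{u_0 v} = x_{vv}$ for every node v.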
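One cheap way to see that the relaxation is consistent is to take a convex combination of integral solutions: with indicator vectors $\chi_k$ and weights $w_k$, the matrix $X = \sum_k w_k \chi_k \chi_k^T$ is PSD by construction (a nonnegative combination of rank-1 PSD matrices) and, with the dummy node placed in every solution, should satisfy every constraint above. The helpers below are hypothetical and only check the linear constraints on such an X; they do not solve the SDP, which would need an SDP solver.

```python
def mixture_gram(solutions, weights, nodes):
    """X = sum_k w_k chi_k chi_k^T; the dummy node "u0" is in every solution."""
    chi = [{u: float(u == "u0" or u in s) for u in nodes} for s in solutions]
    return {(u, v): sum(w * c[u] * c[v] for w, c in zip(weights, chi))
            for u in nodes for v in nodes}


def satisfies_constraints(X, positions, tol=1e-9):
    """Check x_uv >= 0, the position constraints and the flow constraints."""
    nodes = [u for part in positions for u in part]
    if any(X[u, v] < -tol for u in nodes for v in nodes):
        return False
    for part in positions:
        if abs(sum(X[u, u] for u in part) - 1.0) > tol:  # position constraint
            return False
        for v in nodes:  # flow constraint for (node v, posn j)
            if abs(sum(X[u, v] for u in part) - X[v, v]) > tol:
                return False
    return True


positions = [["u0"], ["a1", "a2"], ["b1", "b2"]]  # V_0 = {u0} is the dummy
nodes = [u for part in positions for u in part]
X = mixture_gram([{"a1", "b1"}, {"a2", "b1"}], [0.75, 0.25], nodes)
print(satisfies_constraints(X, positions), X["u0", "u0"], X["u0", "a1"])  # True 1.0 0.75
```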
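Geometry of the Solution Vectors
For a position j, let $y = \sum_{u \in V_j} x_u$.
Lemma. $y = x_{u_0}$: the solution vectors of each position sum to the dummy vector.
Proof.
• Length of y is 1: $\|y\|^2 = \sum_{v \in V_j} \sum_{u \in V_j} x_{uv} = \sum_{v \in V_j} x_{vv} = 1$ (flow, then position constraints)
• Length of $x_{u_0}$ is 1: $\|x_{u_0}\|^2 = x_{u_0 u_0} = 1$
• Length of projection of y onto $x_{u_0}$ is 1: $y^T x_{u_0} = \sum_{u \in V_j} x_{u u_0} = \sum_{u \in V_j} x_{uu} = 1$
Two unit vectors whose inner product is 1 are equal, so $y = x_{u_0}$. ∎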
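The lemma can be checked numerically by building explicit vectors rather than only their Gram matrix. In the hypothetical construction below, each of K integral solutions gets its own coordinate, and $x_u$ has entry $\sqrt{w_k}$ in coordinate k when u belongs to solution k; the dummy $u_0$ belongs to every solution. This is just a convenient feasible point for illustration, not an SDP optimum.

```python
import math


def vectors_from_mixture(solutions, weights, nodes):
    """Explicit vectors x_u = sum_k sqrt(w_k) * chi_k[u] * e_k.

    Their inner products reproduce X = sum_k w_k chi_k chi_k^T; the dummy
    node "u0" belongs to every solution.
    """
    return {u: [math.sqrt(w) * float(u == "u0" or u in s)
                for s, w in zip(solutions, weights)]
            for u in nodes}


def position_sum(vecs, part):
    """y = sum of the solution vectors x_u over u in V_j."""
    return [sum(coords) for coords in zip(*(vecs[u] for u in part))]


positions = [["a1", "a2"], ["b1", "b2"]]
nodes = ["u0"] + [u for part in positions for u in part]
vecs = vectors_from_mixture([{"a1", "b1"}, {"a2", "b1"}], [0.75, 0.25], nodes)
for part in positions:
    y = position_sum(vecs, part)
    print(part, all(math.isclose(a, b) for a, b in zip(y, vecs["u0"])))
```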
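Solution Vectors Lie on a Sphere
[Figure: origin O, the dummy vector $x_{u_0}$, and a solution vector $x_u$]
Note. The length of the projection of $x_u$ onto $x_{u_0}$ is the squared length of $x_u$, because $\|x_{u_0}\| = 1$ and $x_u^T x_{u_0} = x_{uu} = \|x_u\|^2$.
Each solution vector lies on a sphere of radius ½ centered at $x_{u_0}/2$. With a the distance from $x_u$ to $x_{u_0}/2$:
$a^2 = \|x_u - x_{u_0}/2\|^2 = x_{uu} - x_u^T x_{u_0} + \tfrac{1}{4}\|x_{u_0}\|^2 = x_{uu} - x_{uu} + \tfrac{1}{4} = \tfrac{1}{4}$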
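The sphere identity uses only entries of X, so it can be checked without factoring X. The sketch below builds a hypothetical feasible point as a mixture of two integral solutions (dummy node in both) and evaluates $a^2 = x_{uu} - x_{u u_0} + x_{u_0 u_0}/4$ for each real node.

```python
def mixture_gram(solutions, weights, nodes):
    """X = sum_k w_k chi_k chi_k^T; the dummy node "u0" is in every solution."""
    chi = [{u: float(u == "u0" or u in s) for u in nodes} for s in solutions]
    return {(u, v): sum(w * c[u] * c[v] for w, c in zip(weights, chi))
            for u in nodes for v in nodes}


def sq_dist_to_center(X, u):
    """a^2 = ||x_u - x_u0/2||^2, expanded using inner products only."""
    return X[u, u] - X[u, "u0"] + X["u0", "u0"] / 4.0


nodes = ["u0", "a1", "a2", "b1", "b2"]
X = mixture_gram([{"a1", "b1"}, {"a2", "b1"}], [0.75, 0.25], nodes)
print({u: sq_dist_to_center(X, u) for u in nodes[1:]})
# {'a1': 0.25, 'a2': 0.25, 'b1': 0.25, 'b2': 0.25}
```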
How do we round the solution of the SDP relaxation?
Convert fractional solutions into feasible 0/1 solutions
• Projection rounding
• Perron-Frobenius rounding
Projection Rounding
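Since $\sum_{u \in V_j} x_{uu} = 1$ and $x_{uu} \ge 0$, the $x_{uu}$ give a probability distribution at each position.
Pick node u with probability $x_{uu}$.
$x_{uu}$ = length of the projection of $x_u$ onto $x_{u_0}$
[Figure: origin O with vectors $x_{u_0}$, $x_u$, $x_v$]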
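A minimal sketch of projection rounding, assuming the SDP has already been solved and its matrix is available as a dictionary `X` keyed by node pairs (a hypothetical layout). The slide does not say how different positions interact during sampling; this sketch assumes they are rounded independently.

```python
import random


def projection_round(X, positions, rng):
    """Pick one node per position, node u with probability x_uu.

    Assumption: positions are rounded independently of one another.
    """
    choice = []
    for part in positions:
        # Clip tiny negative values that a numerical SDP solver might return.
        probs = [max(X[u, u], 0.0) for u in part]
        choice.append(rng.choices(part, weights=probs, k=1)[0])
    return choice


# Diagonal of a hypothetical fractional solution (only x_uu is needed here).
X = {("a1", "a1"): 0.75, ("a2", "a2"): 0.25, ("b1", "b1"): 1.0, ("b2", "b2"): 0.0}
positions = [["a1", "a2"], ["b1", "b2"]]
rng = random.Random(0)
draws = [projection_round(X, positions, rng) for _ in range(10000)]
print(sum(d[0] == "a1" for d in draws) / len(draws))  # roughly 0.75
```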
Drift for Projection Rounding
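Drift: the expected difference between the costs of the fractional and rounded solutions.
Comes entirely from pairwise interactions: node u is picked with probability exactly $x_{uu}$, so the node-cost terms agree in expectation.
For an edge {u,v}, the drift is $\Delta_{uv} = E(u,v)\,(x_{uv} - \Pr[uv])$, where Pr[uv] is the probability that both u and v are picked.
In fact,
[Figure: vectors $x_u$, $x_v$ and $y_u$, $y_v$]
By Cauchy-Schwarz,
Because the $x_u$ are on a sphere,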
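The drift can be evaluated directly from X. This sketch assumes independent rounding across positions, so $\Pr[uv] = x_{uu} x_{vv}$ for u and v in different positions, which the slide does not state explicitly; the dictionary layout and names are hypothetical. Under this sign convention a negative drift means the rounded solution is more expensive in expectation than the fractional one.

```python
def expected_drift(X, pair_energy):
    """Sum over edges {u,v} of E(u,v) * (x_uv - Pr[uv]).

    Assumes independent rounding across positions, so for u and v in
    different positions Pr[uv] = x_uu * x_vv.
    """
    return sum(e_uv * (X[u, v] - X[u, u] * X[v, v])
               for (u, v), e_uv in pair_energy.items())


# Hypothetical fractional solution: 50/50 mixture of {a1, b1} and {a2, b2}.
X = {("a1", "a1"): 0.5, ("a2", "a2"): 0.5, ("b1", "b1"): 0.5, ("b2", "b2"): 0.5,
     ("a1", "b1"): 0.5, ("a2", "b2"): 0.5, ("a1", "b2"): 0.0}
pair_energy = {("a1", "b1"): 1.0, ("a2", "b2"): 2.0, ("a1", "b2"): 4.0}
print(expected_drift(X, pair_energy))  # -0.25
```

Here the fractional pair cost is 1.5 while independent rounding gives 1.75 in expectation, which matches the printed drift of -0.25.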
Perron-Frobenius Rounding
Let χ be the 0/1 characteristic n-vector of the optimal solution. The optimal integral $X^* = \chi\chi^T$, so rank($X^*$) = 1.
Idea: approximate the fractional X by a rank-1 matrix $qq^T$.
Want to sample from χ, but settle for q.
[Figure: example vector q with four positions, the entries of each summing to 1]
q needs to contain a probability distribution for each position. How do we choose q?
Lemma. Any nonnegative vector q with L1-norm p in the image space of X contains the required set of probability distributions.
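Proof. $X = W^T W$, where $W = [x_1\ x_2\ \dots\ x_n]$.
Let $\mathbf{1}_i$ be the characteristic vector for position i.
Suppose $q = Xy$ for some y.
Then, since the vectors of each position sum to $x_{u_0}$,
$\mathbf{1}_i^T q = \mathbf{1}_i^T W^T W y = \big(\sum_{u \in V_i} x_u\big)^T W y = x_{u_0}^T W y$.
The final value is independent of i, so every position has the same sum; since q ≥ 0 has L1-norm p and there are p positions, each position sums to 1.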
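The lemma is easy to check numerically. Below, X is built (hypothetically) as a convex combination of integral solutions over the real nodes, y is an arbitrary nonnegative vector, and q = Xy is rescaled to L1-norm p. Before rescaling, every position sum should equal $x_{u_0}^T W y = \sum_v x_{vv} y_v$, which is 4.25 for these numbers.

```python
def mixture_gram(solutions, weights, nodes):
    """X = sum_k w_k chi_k chi_k^T over the real (non-dummy) nodes."""
    chi = [{u: float(u in s) for u in nodes} for s in solutions]
    return {(u, v): sum(w * c[u] * c[v] for w, c in zip(weights, chi))
            for u in nodes for v in nodes}


positions = [["a1", "a2"], ["b1", "b2"]]
nodes = [u for part in positions for u in part]
X = mixture_gram([{"a1", "b1"}, {"a2", "b1"}], [0.75, 0.25], nodes)

y = {"a1": 1.0, "a2": 2.0, "b1": 3.0, "b2": 4.0}  # any nonnegative y
q = {u: sum(X[u, v] * y[v] for v in nodes) for u in nodes}  # q = X y
print([sum(q[u] for u in part) for part in positions])  # [4.25, 4.25]

p = len(positions)
total = sum(q.values())
q = {u: p * val / total for u, val in q.items()}  # rescale to L1-norm p
print([sum(q[u] for u in part) for part in positions])  # both 1, up to rounding
```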
Possible Choices for q
A Choice for q
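By spectral decomposition, $X = \sum_{k=1}^{n} \lambda_k z_k z_k^T$,
where $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0$ are the eigenvalues of X and $z_1, \dots, z_n$ are orthonormal eigenvectors.
Take $q = p\, z_1 / \|z_1\|_1$.
X is nonnegative (by the $x_{uv} \ge 0$ constraints), so by the Perron-Frobenius theorem for nonnegative matrices $z_1$ can be chosen nonnegative, and q ≥ 0.
$z_1$ is in the image space of X, since $X z_1 = \lambda_1 z_1$ with $\lambda_1 > 0$; hence so is q.
By the Lemma, q contains the needed probability distributions.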
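A minimal sketch of Perron-Frobenius rounding, assuming X is available as a dictionary over the real (non-dummy) nodes. Plain power iteration stands in for a full eigendecomposition (a substitution for illustration; a library symmetric eigensolver would be the usual choice). Because X is entrywise nonnegative and the start vector is positive, every iterate stays nonnegative, in line with the Perron-Frobenius argument. Names, the iteration count, and the toy X are hypothetical.

```python
import random


def leading_eigenvector(X, nodes, iters=500):
    """Power iteration for z_1; started from the all-ones vector, every iterate
    of a nonnegative X stays nonnegative."""
    z = {u: 1.0 for u in nodes}
    for _ in range(iters):
        z_next = {u: sum(X[u, v] * z[v] for v in nodes) for u in nodes}
        scale = max(z_next.values())
        z = {u: val / scale for u, val in z_next.items()}
    return z


def perron_frobenius_round(X, positions, rng):
    """Return one node per position, sampled from q = p * z_1 / ||z_1||_1."""
    nodes = [u for part in positions for u in part]
    z = leading_eigenvector(X, nodes)
    total = sum(z.values())
    q = {u: len(positions) * z[u] / total for u in nodes}
    choice = [rng.choices(part, weights=[q[u] for u in part], k=1)[0]
              for part in positions]
    return choice, q


# Hypothetical fractional X over the real nodes: 0.75 * {a1, b1} + 0.25 * {a2, b1}.
positions = [["a1", "a2"], ["b1", "b2"]]
nodes = [u for part in positions for u in part]
mix = [({"a1", "b1"}, 0.75), ({"a2", "b1"}, 0.25)]
X = {(u, v): sum(w * (u in s) * (v in s) for s, w in mix) for u in nodes for v in nodes}
choice, q = perron_frobenius_round(X, positions, random.Random(1))
print(choice, [round(sum(q[u] for u in part), 6) for part in positions])
```

As the lemma predicts, the entries of q within each position sum to 1, so sampling per position is well defined.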
Computational Results
Compare solutions from:
• Simple LP
• SDP fractional
• Projection rounded
• Perron-Frobenius rounded
30 random graphs
60 nodes, 15 positions
edge probability ½
weights uniformly from [0,1]
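For readers who want to reproduce this kind of experiment, the sketch below generates random instances with the listed parameters. It is a hypothetical generator, not the authors' code, and it makes explicit assumptions where the slide is silent (even split of nodes across positions, no edges inside a position, self-energies drawn like edge weights). The LP and SDP solves would need an external solver and are not shown.

```python
import random


def random_instance(n_nodes=60, n_positions=15, edge_prob=0.5, seed=None):
    """Random p-partite instance mirroring the setup listed above.

    Assumptions not stated on the slide: nodes are split evenly across
    positions, edges only join nodes in different positions, and self-energies
    are drawn uniformly from [0, 1] like the edge weights.
    """
    if n_nodes % n_positions:
        raise ValueError("n_nodes must be a multiple of n_positions")
    rng = random.Random(seed)
    size = n_nodes // n_positions
    positions = [list(range(i * size, (i + 1) * size)) for i in range(n_positions)]
    pos_of = {u: i for i, part in enumerate(positions) for u in part}
    self_energy = {u: rng.uniform(0.0, 1.0) for u in range(n_nodes)}
    pair_energy = {}
    for u in range(n_nodes):
        for v in range(u + 1, n_nodes):
            if pos_of[u] != pos_of[v] and rng.random() < edge_prob:
                pair_energy[u, v] = rng.uniform(0.0, 1.0)
    return positions, self_energy, pair_energy


positions, self_energy, pair_energy = random_instance(seed=1)
print(len(positions), len(self_energy), len(pair_energy))
```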
Future Work
Can the rounding schemes be applied to other problems?
Can the semidefinite program be sped up?
─ Can only routinely solve graphs with ≤ 120 nodes (reasonable protein problems contain 1000 to 5000 nodes)
─ $x_{uv} \ge 0$ constraints are the bottleneck
Can the requirement of a fixed backbone be relaxed?
We’ve worked quite a bit with real proteins using an LP approach; it seems an SDP formulation might be useful
More Information
B. Chazelle, C. Kingsford, M. Singh. The Side-Chain Positioning Problem: A Semidefinite Programming Formulation with New Rounding Schemes. Proc. ACM FCRC'2003, Principles of Computing and Knowledge: Paris Kanellakis Memorial Workshop (2003).
http://www.cs.princeton.edu/~carlk/papers.html