Post on 19-Dec-2015
Bio-CS
Exploration of Molecular Conformational Spaces
Jean-Claude Latombe
Computer Science Department
Robotics Laboratory & Bio-X Clark Center
Range of Bio-CS Research
Scales: Gene, Molecules, Cells, Tissue/Organs, Body system
Molecular structures, similarities and motions
Simulation of cell interaction
Soft-tissue simulation and surgical training
Robotic surgery (Accuray)
[Diagram: Structure, Motion, Function]
Develop efficient algorithms and data structures to explore protein conformational spaces:
Sampling
Similarities
Pathways
Vision for the Future
In-silico experiments
Drugs on demand
“Interactive” Biology
Analogy with Robotics
[Figure label: free space]
[Kavraki, Svestka, Latombe, Overmars, 95]
But Biology ≠ Robotics …
Energy field, instead of joint control
Continuous energy field, instead of binary free and in-collision spaces
Multiple pathways, instead of single collision-free path
Potentially many more degrees of freedom
Relation to real world is more complex
Overview
Part I. Probabilistic Roadmaps: A Tool for Computing Ensemble Properties of Molecular Motions
M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, J.C. Latombe, and C. Varma. Stochastic Roadmap Simulation: An Efficient Representation and Algorithm for Analyzing Molecular Motion. J. Computational Biology, 10(3-4):257-281, 2003.
Part II. ChainTree: A Data Structure for Efficient Monte Carlo Simulation of Proteins
I. Lotan, F. Schwarzer, J.C. Latombe. Efficient Energy Computation for Monte Carlo Simulation of Proteins. 3rd Workshop on Algorithms in Bioinformatics (WABI), Budapest, Hungary, Sept., 2003.
Part I. Probabilistic Roadmaps: A Tool for Computing Ensemble Properties of Molecular Motions
Serkan Apaydin, Doug Brutlag (1), Carlos Guestrin, David Hsu (2), Jean-Claude Latombe, Chris Varma
Computer Science Department, Stanford University
(1) Department of Biochemistry, Stanford University
(2) Computer Science Department, Nat. Univ. of Singapore
Initial Work [Singh, Latombe, Brutlag, 99]
Study of ligand-protein binding
Probabilistic roadmaps with edges weighted by energetic plausibility
[Figure: roadmap edge between nodes vi and vj with weight wij]
Search of most plausible path
Study of energy profiles along most plausible paths
Extensions to protein folding [Song and Amato, 01] [Apaydin et al., 01]
But: Molecules fold/bind along a myriad of pathways. Any single pathway is of limited interest.
[Figure labels: catalytic site, energy]
New Idea: Capture the stochastic nature of molecular motion by assigning probabilities to edges
[Figure: roadmap edge between nodes vi and vj with probability Pij]
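Edge probabilities
Follow Metropolis criteria:
\[ P_{ij} = \begin{cases} \dfrac{1}{N_i} \exp\left(-\dfrac{\Delta E_{ij}}{k_B T}\right), & \text{if } \Delta E_{ij} > 0; \\ \dfrac{1}{N_i}, & \text{otherwise.} \end{cases} \]
where N_i is the number of neighbors of node vi and ΔE_ij = E_j − E_i is the energy difference between vj and vi
Self-transition probability:
\[ P_{ii} = 1 - \sum_{j \neq i} P_{ij} \]
[Figure: node vi with self-transition Pii and edge Pij to node vj]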
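A minimal sketch of the Metropolis edge rule, in which each neighbor gets probability 1/N_i, scaled by exp(−ΔE/k_B T) when the move goes uphill in energy, and the leftover probability becomes the self-transition. The function name `edge_probabilities`, the dictionary layout, and the toy energies are illustrative assumptions, not part of the original slides.

```python
import math

def edge_probabilities(energies, neighbors, i, kT=1.0):
    """Return ({j: P_ij}, P_ii) for node i under the Metropolis rule.

    energies:  dict node -> energy (hypothetical toy values below)
    neighbors: dict node -> list of adjacent roadmap nodes
    """
    n_i = len(neighbors[i])
    probs = {}
    for j in neighbors[i]:
        delta_e = energies[j] - energies[i]
        if delta_e > 0:
            # Uphill move: damped by the Boltzmann factor.
            probs[j] = math.exp(-delta_e / kT) / n_i
        else:
            # Downhill or flat move: taken whenever it is proposed.
            probs[j] = 1.0 / n_i
    # Whatever probability is left over stays on node i.
    p_self = 1.0 - sum(probs.values())
    return probs, p_self

# Toy roadmap: node 0 has an uphill neighbor (1) and a downhill neighbor (2).
energies = {0: 0.0, 1: 1.0, 2: -0.5}
neighbors = {0: [1, 2], 1: [0], 2: [0]}
probs, p_self = edge_probabilities(energies, neighbors, 0)
```

Because every P_ij is at most 1/N_i, the self-transition probability is never negative.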
Stochastic Roadmap Simulation
Stochastic simulation on roadmap and Monte Carlo simulation converge to same Boltzmann distribution
Problems with Monte Carlo Simulation
Much time is wasted escaping local minima
Each run generates a single pathway
Proposed Solution
Treat a roadmap as a Markov chain and use the First-Step Analysis tool
Example #1: Probability of Folding pfold
[Figure: a conformation reaches the folded state first with probability pfold and the unfolded state first with probability 1 − pfold]
“We stress that we do not suggest using pfold as a transition coordinate for practical purposes as it is very computationally intensive.” Du, Pande, Grosberg, Tanaka, and Shakhnovich, “On the Transition Coordinate for Protein Folding,” Journal of Chemical Physics (1998).
HIV integrase [Du et al. ’98]
F: Folded set
U: Unfolded set
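First-Step Analysis
[Figure: node i with neighbors j, k, l, m and transition probabilities Pij, Pik, Pil, Pim]
Let fi = pfold(i)
After one step:
\[ f_i = P_{ii} f_i + P_{ij} f_j + P_{ik} f_k + P_{il} f_l + P_{im} f_m \]
with f = 1 for every node in F and f = 0 for every node in U (two of the terms are marked “= 1” on the slide)
One linear equation per node
Solution gives pfold for all nodes
No explicit simulation run
All pathways are taken into account
Sparse linear system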
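A minimal sketch of first-step analysis on a toy roadmap: each interior node contributes the equation f_i = Σ_j P_ij f_j, nodes in F are pinned to 1, nodes in U are pinned to 0, and the resulting linear system is solved once for all nodes. The four-node chain, the function names, and the dense Gaussian elimination are assumptions chosen for illustration; a real roadmap would presumably call for a sparse solver.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def pfold(P, folded, unfolded):
    """Return pfold for every node of a roadmap with transition matrix P."""
    n = len(P)
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i in range(n):
        if i in folded or i in unfolded:
            # Boundary condition: f_i = 1 in F, f_i = 0 in U.
            A[i][i] = 1.0
            b[i] = 1.0 if i in folded else 0.0
        else:
            # Interior node: f_i - sum_j P_ij f_j = 0.
            for j in range(n):
                A[i][j] = (1.0 if i == j else 0.0) - P[i][j]
    return solve_linear(A, b)

# Toy chain 0 - 1 - 2 - 3 with U = {0} and F = {3} (hypothetical values).
P = [
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]
f = pfold(P, folded={3}, unfolded={0})
```

For this chain the solution is f = [0, 1/3, 2/3, 1], obtained without running any simulation.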
In Contrast …
Computing pfold with MC simulation requires:
For every conformation c of interest
Perform many MC simulation runs from c
Count number of times F is attained first
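The Monte Carlo procedure above can be sketched as follows, as an illustration only: start many random walks from one conformation and count the fraction that reach F before U. The toy transition matrix, the function name `estimate_pfold`, and the run count are assumptions; a real simulation would perturb a molecular conformation rather than walk on a small discrete chain.

```python
import random

def estimate_pfold(P, start, folded, unfolded, runs, rng):
    """Estimate pfold(start) by counting walks that hit F before U."""
    n = len(P)
    hits = 0
    for _ in range(runs):
        state = start
        while state not in folded and state not in unfolded:
            # One stochastic step according to row `state` of P.
            state = rng.choices(range(n), weights=P[state])[0]
        if state in folded:
            hits += 1
    return hits / runs

# Toy chain 0 - 1 - 2 - 3 with U = {0} and F = {3} (hypothetical values).
P = [
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]
rng = random.Random(0)
estimate = estimate_pfold(P, start=1, folded={3}, unfolded={0}, runs=20000, rng=rng)
```

The estimate covers a single starting conformation and carries sampling noise, so the whole batch of runs has to be repeated for every conformation of interest.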
Computational Tests
• 1ROP (repressor of primer)
– 2 helices
– 6 DOF
• 1HDD (Engrailed homeodomain)
– 3 helices
– 12 DOF
H-P energy model with steric clash exclusion [Sun et al., 95]
Correlation with MC Approach
[Plot: 1ROP]
Computation Times (1ROP)
Monte Carlo:
49 conformations
Over 11 days of computer time
Over 10^6 energy computations
Roadmap:
5000 conformations
1.5 hours of computer time
~15,000 energy computations
~4 orders of magnitude speedup!
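One way to read the quoted speedup is as computer time per conformation. That reading is an assumption, since the slide does not say how the figure was obtained. The arithmetic below uses only the numbers on the slide, treating "over 11 days" as exactly 11 days, which makes the result a lower bound.

```python
import math

# Figures from the slide; "over 11 days" is taken as exactly 11 days.
mc_seconds = 11 * 24 * 3600
mc_conformations = 49
roadmap_seconds = 1.5 * 3600
roadmap_conformations = 5000

mc_per_conf = mc_seconds / mc_conformations                 # about 5.4 hours each
roadmap_per_conf = roadmap_seconds / roadmap_conformations  # about 1 second each
speedup = mc_per_conf / roadmap_per_conf
orders = math.log10(speedup)
```

Under this per-conformation reading the ratio is roughly 1.8 × 10^4, a little over four orders of magnitude.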
Example #2: Ligand-Protein Interaction
Computation of escape time from funnels of attraction around potential binding sites
funnel = ball of 10Å rmsd [Camacho, Vajda, 01]
Similar Computation Through Simulation [Sept, Elcock and McCammon ’99]
10K to 30K independent simulations
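Computing Escape Time with Roadmap
[Figure: funnel of attraction containing node i, with neighbors j, k, l, m and transition probabilities Pii, Pij, Pik, Pil, Pim]
\[ \tau_i = 1 + P_{ii} \tau_i + P_{ij} \tau_j + P_{ik} \tau_k + P_{il} \tau_l + P_{im} \tau_m \]
with τ = 0 for a node outside the funnel
(escape time is measured as number of steps of stochastic simulation)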
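A minimal sketch of the escape-time computation: every node inside the funnel satisfies τ_i = 1 + Σ_j P_ij τ_j, and every node outside has τ = 0. The system is solved here by repeated Gauss-Seidel sweeps, an illustrative choice rather than the solver used by the authors. The three-node example and the function name `escape_times` are assumptions, and each funnel node needs P_ii < 1.

```python
def escape_times(P, funnel, sweeps=10000, tol=1e-12):
    """Expected number of steps to leave the funnel from each node."""
    n = len(P)
    tau = [0.0] * n  # nodes outside the funnel keep tau = 0
    for _ in range(sweeps):
        change = 0.0
        for i in sorted(funnel):
            # Rearranged equation: tau_i (1 - P_ii) = 1 + sum_{j != i} P_ij tau_j.
            rest = sum(P[i][j] * tau[j] for j in range(n) if j != i)
            new = (1.0 + rest) / (1.0 - P[i][i])
            change = max(change, abs(new - tau[i]))
            tau[i] = new
        if change < tol:
            break
    return tau

# Toy example: nodes 0 and 1 are inside the funnel, node 2 is outside.
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.25, 0.50],
    [0.00, 0.00, 1.00],
]
tau = escape_times(P, funnel={0, 1})
```

Solving the two equations by hand gives τ_1 = 3 and τ_0 = 5 steps, which the sweeps converge to.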
Distinguishing Catalytic Site
Given several potential binding sites, which one is the catalytic site?
Energy: electrostatic + van der Waals + solvation free energy terms
Complexes Studied
Ligand          Protein   # random nodes   # DOFs
oxamate         1ldm      8000             7
Streptavidin    1stp      8000             11
Hydroxylamine   4ts1      8000             9
COT             1cjw      8000             21
THK             1aid      8000             14
IPM             1ao5      8000             10
PTI             3tpi      8000             13
Distinction Based on Energy
Energies in kcal/mol
Protein   Bound state   Best potential binding site
1stp      -15.1         -14.6
4ts1      -19.4         -14.6
3tpi      -25.2         -16.0
1ldm      -11.8         -13.6
1cjw      -11.7         -18.0
1aid      -11.2         -22.2
1ao5      -7.5          -13.1
Able to distinguish catalytic site (bound state has the lower energy): 1stp, 4ts1, 3tpi
Not able: 1ldm, 1cjw, 1aid, 1ao5
Distinction Based on Escape Time
Escape times in # steps
Protein   Bound state   Best potential binding site
1stp      3.4E+9        1.1E+7
4ts1      3.8E+10       1.8E+6
3tpi      1.3E+11       5.9E+5
1ldm      8.1E+5        3.4E+6
1cjw      5.4E+8        4.2E+6
1aid      9.7E+5        1.6E+8
1ao5      6.6E+7        5.7E+6
Able to distinguish catalytic site (bound state has the longer escape time): 1stp, 4ts1, 3tpi, 1cjw, 1ao5
Not able: 1ldm, 1aid
Conclusion
Probabilistic roadmaps are a promising tool for computing ensemble properties of molecular pathways
Current work:
– Non-uniform sampling strategies to handle more complex molecules
– More realistic energetic models
– Extension to molecular dynamics simulation
– Connection to in-vitro experiments (interaction of two proteins)
Part II. ChainTree: A Data Structure for Efficient Monte Carlo Simulation of Proteins
Itay Lotan, Fabian Schwarzer, Dan Halperin (1), Jean-Claude Latombe
Computer Science Department, Stanford University
(1) Computer Science Department, Tel Aviv University
Monte Carlo Simulation (MCS)
Used to study thermodynamic and kinetic properties of proteins
Random walk through conformation space
At each attempted step:
– Perturb current conformation at random
– Accept step with probability:
\[ P(\text{accept}) = \min\left\{1, e^{-\Delta E / k_B T}\right\} \]
Problem: How to maintain energy efficiently?
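A minimal sketch of the Monte Carlo loop with the Metropolis acceptance rule, where a step is accepted with probability min{1, exp(−ΔE/k_B T)}. The one-dimensional quadratic energy, the step size, and the function names are toy assumptions standing in for a protein conformation and its energy function.

```python
import math
import random

def accept_probability(delta_e, kT):
    """Metropolis rule: min(1, exp(-delta_e / kT))."""
    if delta_e <= 0:
        return 1.0  # downhill moves are always accepted
    return math.exp(-delta_e / kT)

def mc_walk(energy, x0, steps, step_size, kT, rng):
    """Random walk over a 1-D toy 'conformation' x; returns final state."""
    x, e = x0, energy(x0)
    accepted = 0
    for _ in range(steps):
        x_new = x + rng.uniform(-step_size, step_size)  # random perturbation
        e_new = energy(x_new)                           # the costly part in practice
        if rng.random() < accept_probability(e_new - e, kT):
            x, e = x_new, e_new
            accepted += 1
    return x, e, accepted

rng = random.Random(1)
x_final, e_final, accepted = mc_walk(lambda x: x * x, x0=3.0, steps=1000,
                                     step_size=0.5, kT=1.0, rng=rng)
```

Each attempted step calls `energy` once, which is why the cost of the energy evaluation dominates the running time of the simulation.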
Energy Function
E = bonded terms + non-bonded terms
Bonded terms, e.g. bond length
Easy to compute
Non-bonded terms, e.g. Van der Waals, depend on distances between pairs of atoms
Expensive to compute, O(n^2)
Energy Function
Non-bonded terms
Use cutoff distance (6 - 12Å)
Only O(n) interacting pairs [Halperin & Overmars ’98]
Problem: How to find interacting pairs without enumerating all atom pairs?
Grid Method
Subdivide space into cubic cells
Compute cell that contains each atom center
Store results in hash table
[Figure label: d_cutoff]
• Θ(n) time to update grid
• O(1) time to find interactions for each atom
• Θ(n) to find all interactions
Asymptotically optimal in worst-case!
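A minimal sketch of the grid method: atom centers are hashed into cubic cells, and each atom is tested only against atoms in its own cell and the 26 adjacent cells. Setting the cell side equal to the cutoff distance is an assumption that makes those neighboring cells sufficient. The function names and sample coordinates are hypothetical.

```python
from collections import defaultdict
from itertools import product
import math

def build_grid(atoms, cutoff):
    """Hash table mapping a cell index (ix, iy, iz) to the atoms inside it."""
    grid = defaultdict(list)
    for idx, (x, y, z) in enumerate(atoms):
        cell = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        grid[cell].append(idx)
    return grid

def interacting_pairs(atoms, cutoff):
    """All index pairs (i, j), i < j, whose centers lie within the cutoff."""
    grid = build_grid(atoms, cutoff)
    pairs = set()
    for (cx, cy, cz), members in grid.items():
        # Visit the cell itself and its 26 neighbors.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    if i < j and math.dist(atoms[i], atoms[j]) <= cutoff:
                        pairs.add((i, j))
    return pairs

atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0),
         (5.5, 5.0, 5.0), (-0.5, 0.0, 0.0)]
cutoff = 2.0
pairs = interacting_pairs(atoms, cutoff)
```

The grid is rebuilt from all n atoms on every call, which matches the Θ(n) update cost listed above and is the cost the slides go on to question for chains that change only locally.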
Can We Do Better on Average?
Proteins are long kinematic chains
Protein’s Kinematic Structure
Torsional DOFs: angles for backbone and for side-chains
Conformational space
Can We Do Better on Average?
Proteins are long kinematic chains
Few DOFs are perturbed at each MC step
Long sub-chains stay rigid at each step
Many partial energy sums remain constant
How to retrieve unchanged partial sums?
Two New Data Structures
1. ChainTree: fast detection of interacting atom pairs
2. EnergyTree: reuse of unchanged partial energy sums
ChainTree
Combination of two hierarchies:
Transform hierarchy: approximate kinematics of protein backbone at successive resolutions
Bounding volume hierarchy: approximate geometry of protein at successive resolutions (Larsen et al., ’00)
Updating the ChainTree
Update path to root
– Recompute transforms that shortcut change
– Recompute bounding volumes that contain change
Finding Interacting Pairs
• Do not search inside rigid sub-chains (unmarked nodes)
• Do not test two nodes with no marked node between them
Computational Complexity
• n : total number of DOFs in protein backbone
• k : number of simultaneous DOF changes at each step of MCS
• Updating complexity: \( O\left(k \log \frac{n}{k}\right) \)
• Worst-case complexity of finding all interacting pairs: \( \Theta(n^{4/3}) \), but performs much better in practice!
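EnergyTree
[Figure: EnergyTree with stored terms E(N,N), E(N,O), E(O,O), E(P,P)]
\[ E(\alpha, \beta) = E(\alpha_l, \beta_l) + E(\alpha_r, \beta_r) + E(\alpha_l, \beta_r) + E(\alpha_r, \beta_l) \]
\[ E(\alpha, \alpha) = E(\alpha_l, \alpha_l) + E(\alpha_r, \alpha_r) + E(\alpha_l, \alpha_r) \]
where the subscripts l and r denote the left and right children of a node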
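A minimal sketch of the recursive energy decomposition only: the energy inside a node is the energy inside each child plus the cross term between the two children, and a cross term between two nodes splits into cross terms between their children. It does not include the caching or pruning that the EnergyTree relies on for its speed-up. The `Node` class, the inverse-distance pair term, and the sample coordinates are illustrative assumptions.

```python
import math

class Node:
    """Binary tree node; leaves hold one atom position."""
    def __init__(self, left=None, right=None, atom=None):
        self.left, self.right, self.atom = left, right, atom

def build(atoms):
    """Balanced binary tree whose leaves are the atoms in chain order."""
    if len(atoms) == 1:
        return Node(atom=atoms[0])
    mid = len(atoms) // 2
    return Node(build(atoms[:mid]), build(atoms[mid:]))

def pair_energy(p, q):
    """Toy pairwise term (an assumption): inverse distance."""
    return 1.0 / math.dist(p, q)

def cross(a, b):
    """E(a, b): energy between atoms under node a and atoms under node b."""
    if a.atom is not None and b.atom is not None:
        return pair_energy(a.atom, b.atom)
    if a.atom is not None:          # a is a leaf, split b only
        return cross(a, b.left) + cross(a, b.right)
    if b.atom is not None:          # b is a leaf, split a only
        return cross(a.left, b) + cross(a.right, b)
    return (cross(a.left, b.left) + cross(a.right, b.right)
            + cross(a.left, b.right) + cross(a.right, b.left))

def self_energy(a):
    """E(a, a) = E(l, l) + E(r, r) + E(l, r)."""
    if a.atom is not None:
        return 0.0
    return self_energy(a.left) + self_energy(a.right) + cross(a.left, a.right)

atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.5, 0.0),
         (4.0, 2.0, 0.0), (5.0, 2.0, 1.0)]
total = self_energy(build(atoms))
```

If each partial sum were stored at its node, a step that leaves a sub-chain rigid would let the stored value be reused instead of recomputed, which is the reuse the slides describe.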
Experimental Setup
Energy function:
– Van der Waals
– Electrostatic
– Attraction between native contacts
– Cutoff at 12Å
300,000 steps MCS
Early rejection for large vdW terms
Results: 1-DOF change
[Plot; proteins labeled (68), (144), (374), (755)]
Results: 5-DOF change
[Plot; proteins labeled (68), (144), (374), (755)]
Two-Pass ChainTree
[Plot; proteins labeled (68), (144), (374), (755)]
Conclusion
• Chain/EnergyTree reduces average time per step in MCS of proteins (vs. grid)
• Exploit chain kinematics of protein
• Larger speed-up for bigger proteins and for smaller number of simultaneous DOF changes
What is Computational Biology?
Using computers in Biology?
Designing efficient algorithms for analyzing biological data and simulating biological processes?
Using Biology to design new algorithms and computing hardware?
Cultural clash:
Biology: classification
Computer Science: abstraction
In any case, Computational Biology will be a critical domain for the next 20 years, probably the next “big thing” after the Internet