Conformation Networks: an Application to Protein Folding Zoltán Toroczkai Center for Nonlinear Studies Los Alamos National Laboratory Center for Nonlinear Stud Erzsébet Ravasz Gnana Gnanakaran (T- 10) Theoretical Biology and Biophysics

Upload: prosper-byrd

Post on 14-Jan-2016


Page 1

Conformation Networks: an Application to Protein Folding

Zoltán Toroczkai

Center for Nonlinear Studies

Los Alamos National Laboratory

Erzsébet Ravasz

Gnana Gnanakaran (T-10), Theoretical Biology and Biophysics

Page 2

Proteins

the most complex molecules in nature

globular or fibrous

basic functional units of a cell

chains of amino acids (50–10³)

peptide bonds link the backbone

unique 3D structure (native physiological conditions)

biological function

fold in nanoseconds to minutes

about 1000 known 3D structures: X-ray crystallography, NMR

Native state

Page 3

153 residues, mol. weight = 17181 Da, 1260 atoms

Main function: primary oxygen storage and carrier in muscle tissue

It contains a heme (iron-containing porphyrin) group in the center: C34H32N4O4FeHO

Myoglobin

Page 4

Protein conformations

• defined by dihedral angles

2 dihedral angles (φ, ψ) per residue, each with 2–3 local minima of the torsion energy

N monomers → about 10ᴺ different conformations

Amino-acid
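The 10ᴺ estimate can be reproduced with a few lines of Python. This is a sketch under the slide's own simplification: each residue contributes two backbone dihedrals, and each dihedral independently sits in one of `m` torsional minima (m = 2 or 3). The function name `conformation_count` is illustrative.

```python
import math


def conformation_count(n_residues: int, minima_per_angle: int, angles_per_residue: int = 2) -> int:
    """Discrete conformations when every dihedral independently occupies one torsional minimum."""
    states_per_residue = minima_per_angle ** angles_per_residue  # 4 for m = 2, 9 for m = 3
    return states_per_residue ** n_residues


for m in (2, 3):
    count = conformation_count(100, m)
    # log10(count) / N is the per-residue exponent; 4 to 9 states per residue is "about 10"
    print(f"m={m}: {m ** 2} states per residue, about 10^{math.log10(count):.1f} conformations for N=100")
```

For m = 3 this gives roughly 10⁹⁵ conformations for 100 residues, the same order of magnitude as the 10ᴺ rule.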

Page 5

Levinthal’s paradox

• Levinthal’s paradox, 1968

finding the native state by random sampling is not possible

40-monomer polypeptide, 10¹³ conformations/s

3 × 10¹⁹ years to sample all

universe ~2 × 10¹⁰ years old

Wetlaufer, P.N.A.S. 70, 691 (1973)

Levinthal, J. Chim. Phys. 65, 44-45 (1968)

nucleation

folding pathways

• Anfinsen: thermodynamic hypothesis

native state is at the global minimum of the free energy

Epstein, Goldberger & Anfinsen, Cold Spring Harbor Symp. Quant. Biol. 28, 439 (1963)
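The Levinthal numbers above follow from simple arithmetic. A minimal sketch, assuming about 10 states per monomer (the 10ᴺ rule from the previous slide) and the quoted sampling rate of 10¹³ conformations per second; `levinthal_years` is a hypothetical helper name.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 s


def levinthal_years(n_monomers: int, states_per_monomer: float = 10.0, rate_per_s: float = 1e13) -> float:
    """Time, in years, to visit every conformation once by exhaustive sampling."""
    n_conformations = states_per_monomer ** n_monomers
    return n_conformations / rate_per_s / SECONDS_PER_YEAR


print(f"{levinthal_years(40):.1e} years")  # about 3e19 years, versus ~2e10 years for the universe
```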

Page 6

Free energy landscapes

• Bryngelson & Wolynes, 1987

free energy landscape

Bryngelson & Wolynes, P.N.A.S. 84, 7524 (1987)

a random hetero-polymer typically does NOT fold

Davidson & Sauer, P.N.A.S. 91, 2146 (1994)

Experiment: random sequences of GLU, ARG, LEU; 80–100 amino acids

~ 95% did not fold in a stable manner

Page 7

Funnels

• Leopold, Montal & Onuchic, 1992

Leopold, Montal & Onuchic, P.N.A.S. 89, 8721 (1992)

many folding pathways

Energy funnels

Given any amino-acid sequence, can we tell whether it is a good folder?

experiments (X-ray, NMR), molecular dynamics simulations, homology modeling

Difficult and slow

Page 8

Molecular dynamics

• State of the art

supercomputer (LANL)

Ribosome in explicit solvent:
– targeted MD
– 2.64 × 10⁶ atoms (2.5 × 10⁵ + water)
– Q machine, 768 processors
– 260 days of simulation (event: 2 ns)

Sanbonmatsu, Joseph & Tung, P.N.A.S. 102, 15854 (2005)

– more than 100,000 CPUs
– simulation of a complete folding event

» BBA5, 23 residues, implicit water
» 10,000 CPU days per folding event (~1 μs)

distributed computing (Stanford, Folding@home)

Shirts & Pande, Science 290, 1903 (2000); Snow, Nguyen, Pande & Gruebele, Nature 420, 102 (2002)

~10¹⁶ times slower
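One reading of the "~10¹⁶ times slower" figure is the ratio of wall-clock time to simulated physical time for the ribosome run quoted above (260 days of computing for a 2 ns event). The check below is a sketch under that interpretation; `slowdown` is an illustrative name.

```python
SECONDS_PER_DAY = 86_400


def slowdown(wall_clock_days: float, simulated_time_s: float) -> float:
    """How many times slower than nature: wall-clock seconds per simulated second."""
    return wall_clock_days * SECONDS_PER_DAY / simulated_time_s


ratio = slowdown(260, 2e-9)  # 260 days of computing for 2 ns of ribosome dynamics
print(f"{ratio:.1e}")  # 1.1e+16
```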

Page 9

Configuration networks

• Configuration networks

NODE: configuration

LINK: change of one degree of freedom (angle)

refinement of angle values → continuous case

Protein conformations

dihedral angles have few preferred values

Ramachandran map, PDB structures

Ramachandran & Sasisekharan, J. Mol. Biol. 7, 95 (1963)

• Helix
• Sheet
• other
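A configuration network of this kind can be built directly. The sketch below assumes every angle is coarse-grained to one of a few preferred values (for instance the three Ramachandran regions helix, sheet, other) and, following the NODE/LINK definition above, links two configurations when they differ in exactly one angle. `configuration_network` is an illustrative name, not code from the talk.

```python
from itertools import product


def configuration_network(n_angles: int, n_values: int) -> dict:
    """Adjacency sets: nodes are tuples of discrete angle values,
    links join tuples that differ in exactly one position."""
    adjacency = {}
    for node in product(range(n_values), repeat=n_angles):
        neighbors = set()
        for position in range(n_angles):
            for value in range(n_values):
                if value != node[position]:
                    neighbors.add(node[:position] + (value,) + node[position + 1:])
        adjacency[node] = neighbors
    return adjacency


net = configuration_network(n_angles=4, n_values=3)  # e.g. helix / sheet / other per angle
print(len(net))                              # 81 nodes (3**4)
print({len(nbrs) for nbrs in net.values()})  # {8}: each node has 4 * (3 - 1) neighbors
```

Without further constraints this network is regular: every node has degree n(s − 1) for n angles with s values each.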

Page 10

Why networks?

• VERY LARGE: 100 monomers → 10¹⁰⁰ nodes. However:

Generic features of folding are determined by STATISTICAL properties of the configuration network:

degree distribution, average distance, clustering, degree correlations

Albert & Barabási, Rev. Mod. Phys. 74, 67 (2002); Newman, SIAM Rev. 45, 167 (2003)

toolkit from network research

captures the high dimensionality

faster algorithms to simulate folding events

pre-screening synthetic proteins

insights into misfolding
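The statistical properties listed above can be measured with plain breadth-first search and neighbor counting. A minimal, dependency-free sketch (function names are illustrative); `mean_neighbor_degree` reports k_nn(k), one common way to probe degree correlations. The example graph is a hypothetical two-angle, three-value configuration network.

```python
from collections import Counter, deque
from itertools import combinations


def degree_distribution(adj):
    """Number of nodes with each degree k."""
    return Counter(len(nbrs) for nbrs in adj.values())


def average_distance(adj):
    """Mean shortest-path length over ordered pairs of distinct nodes in the same component."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    values = []
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            values.append(0.0)
            continue
        neighbor_links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        values.append(2.0 * neighbor_links / (k * (k - 1)))
    return sum(values) / len(values)


def mean_neighbor_degree(adj):
    """k_nn(k): average degree of the neighbors of degree-k nodes."""
    sums, counts = Counter(), Counter()
    for nbrs in adj.values():
        k = len(nbrs)
        if k:
            sums[k] += sum(len(adj[v]) for v in nbrs) / k
            counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(counts)}


# Two angles with three values each; link = exactly one angle changed
nodes = [(a, b) for a in range(3) for b in range(3)]
adj = {u: {v for v in nodes if (u[0] == v[0]) != (u[1] == v[1])} for u in nodes}
print(dict(degree_distribution(adj)), average_distance(adj), average_clustering(adj), mean_neighbor_degree(adj))
```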

Page 11

A real example

• The Protein Folding Network: F. Rao, A. Caflisch, J. Mol. Biol. 342, 299 (2004)

beta3s: 20 monomers, antiparallel beta sheets

MD simulation, implicit water

330 K, equilibrium: folded ⇌ random coil

NODE -- 8 letters / AA (local secondary struct)

LINK -- 2ps transition

Page 12

Its native conformation has been studied by NMR experiments:

De Alba et al., Protein Sci. 8, 854 (1999).

Beta3s in aqueous solution forms a monomeric triple-stranded antiparallel beta sheet in equilibrium with the denatured state.

• Simulations at 330 K

• The average folding time from the denatured state is ~83 ns

• The average unfolding time is ~83 ns

• Simulation time ~12.6 μs

• Coordinates saved every 20 ps (5 × 10⁵ snapshots in 10 μs)

• Secondary structures: H, G, I, E, B, T, S, - (α-helix, 3₁₀-helix, π-helix, extended, isolated β-bridge, hydrogen-bonded turn, bend, and unstructured).

• The native state: -EEEESSEEEEEESSEEEE-

• There are approx. 8¹⁸ ≈ 1.8 × 10¹⁶ conformations.

• Nodes: conformations; links: transitions.
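The size of the beta3s conformation space and the construction of a network from a trajectory can be sketched as follows. Assumptions: the two terminal residues are always "-" (as in the native string), so 18 of the 20 positions vary over the 8 DSSP letters; nodes are distinct DSSP strings and a link joins two consecutive saved snapshots with different strings. The short trajectory is a hypothetical toy, not beta3s data, and the helper names are illustrative.

```python
from collections import Counter

DSSP_CODES = "HGIEBTS-"  # the eight secondary-structure letters listed above


def conformation_space_size(n_residues: int, n_fixed_termini: int = 2) -> int:
    """8**(n - 2): assumes both terminal residues are always '-', as in the native string."""
    return len(DSSP_CODES) ** (n_residues - n_fixed_termini)


def transition_network(snapshots):
    """Nodes: distinct DSSP strings with visit counts.
    Links: unordered pairs of consecutive, different snapshots, weighted by how often they occur."""
    visits = Counter(snapshots)
    links = Counter()
    for before, after in zip(snapshots, snapshots[1:]):
        if before != after:
            links[frozenset((before, after))] += 1
    return visits, links


print(conformation_space_size(20))  # 18014398509481984, i.e. about 1.8e16

# Toy 6-residue trajectory (hypothetical strings, not beta3s data)
trajectory = ["------", "-SS---", "-EE---", "-SS---", "-EEEE-"]
visits, links = transition_network(trajectory)
print(len(visits), len(links))  # 4 nodes, 3 distinct links
```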

Page 13

Many real-world networks are scale free

hubs

co-authorship (γ = 1–2.5), citations (γ = 3), sexual contacts (γ = 3.4), movie actors (γ = 2.3), Internet (γ = 2.4), World Wide Web (γ = 2.1/2.5), Genetic regulation (γ = 1.3), Protein-protein interactions (γ = 2.4), Metabolic pathways (γ = 2.2), Food webs (γ = 1.1)

Barabási & Albert, Science 286, 509 (1999)

Scale-free network

beta3s / randomized

Many reasons behind SF topology

• Why is the protein network scale free?
• Why does the randomized chain have a similar degree distribution?
• Why is the exponent −2?
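Degree exponents like those quoted above are commonly estimated by maximum likelihood rather than by fitting a line to a log-log histogram. A sketch of the standard continuous estimator, γ̂ = 1 + n / Σ ln(k_i / k_min), checked on synthetic data drawn by inverse-transform sampling. Here γ is defined through P(k) ∼ k^(−γ), so an exponent quoted as "−2" corresponds to γ = 2; for integer degrees, replacing k_min by k_min − 1/2 is a common approximation. Function names are illustrative.

```python
import math
import random


def powerlaw_exponent_mle(values, x_min):
    """Continuous maximum-likelihood estimate of gamma for p(x) ~ x**(-gamma), x >= x_min."""
    tail = [x for x in values if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


def sample_powerlaw(gamma, x_min, n, rng):
    """Inverse-transform sampling: x = x_min * (1 - u) ** (-1 / (gamma - 1)), u uniform in [0, 1)."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(n)]


rng = random.Random(1)
synthetic = sample_powerlaw(gamma=2.5, x_min=1.0, n=50_000, rng=rng)
print(round(powerlaw_exponent_mle(synthetic, x_min=1.0), 2))  # close to 2.5
```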

Page 14

Robot arm networks

[Figure: robot-arm configuration networks (panels n = 0, 1, 2); nodes are labelled by joint positions such as 000, 100, 010, 021]

n-dimensional hypercube

binomial degree distribution

• Steric constraints?

missing nodes

missing links

Swiss cheese

Homogeneous
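The binomial degree distribution can be checked by enumeration. The sketch assumes each joint has three positions (0, 1, 2) and that a link moves one joint by a single step, which is one reading of the figure labels (000 → 100 → 200); the `forbidden` argument is a hypothetical stand-in for steric constraints that punch holes into the lattice ("Swiss cheese").

```python
from collections import Counter
from itertools import product
from math import comb


def robot_arm_network(n_joints, forbidden=frozenset()):
    """Nodes: joint settings in {0, 1, 2}^n, minus `forbidden` ones.
    Links: move a single joint by one step (0 <-> 1 or 1 <-> 2)."""
    nodes = [s for s in product(range(3), repeat=n_joints) if s not in forbidden]
    allowed = set(nodes)
    adj = {}
    for s in nodes:
        neighbors = set()
        for j in range(n_joints):
            for step in (-1, 1):
                value = s[j] + step
                t = s[:j] + (value,) + s[j + 1:]
                if 0 <= value <= 2 and t in allowed:
                    neighbors.add(t)
        adj[s] = neighbors
    return adj


n = 3
degrees = Counter(len(nbrs) for nbrs in robot_arm_network(n).values())
# degree = n + (number of joints in the middle position), hence binomial counts
expected = {n + m: comb(n, m) * 2 ** (n - m) for m in range(n + 1)}
print(sorted(degrees.items()))    # [(3, 8), (4, 12), (5, 6), (6, 1)]
print(dict(degrees) == expected)  # True

holes = robot_arm_network(n, forbidden=frozenset({(1, 1, 1)}))  # one sterically forbidden state
```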

Page 15

A bead-chain model

• Beads on a chain in 3D: robot arm model

similar to Cα protein models

rod-rod angle

3 positions around axis

N = 18, rod-rod angle 120°: 2212112212111122

N = 6, rod-rod angle 90°

Honeycutt & Thirumalai, Biopolymers 32, 695 (1992)

Homogeneous network

Page 16

Another example: L = 7, rod-rod angle 75°, r = 0.25

[Figure: configuration network of the chain, with allowed and forbidden states marked and the state “00100” highlighted]
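The allowed/forbidden classification can be sketched for a bead chain. Assumptions, none taken from the talk: rods have unit length; the quoted angle is the interior bond angle at every bead; digit d ∈ {0, 1, 2} sets the torsion to 120·d degrees ("3 positions around axis"); L rods give L + 1 beads and L − 2 torsion digits (matching the 5-digit state “00100” for L = 7); and r is a bead radius, so beads that are not directly bonded must be at least 2r apart. Helper names are hypothetical.

```python
import math
from itertools import product


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _add(*vectors):
    return tuple(sum(components) for components in zip(*vectors))


def _scale(a, s):
    return tuple(x * s for x in a)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _unit(a):
    return _scale(a, 1.0 / math.sqrt(sum(x * x for x in a)))


def chain_coordinates(state, bond_angle_deg):
    """Bead positions of a chain of unit rods; digit d in `state` sets the torsion to 120*d degrees."""
    theta = math.radians(bond_angle_deg)
    beads = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0 - math.cos(theta), math.sin(theta), 0.0)]
    for digit in state:
        a, b, c = beads[-3:]
        bc = _unit(_sub(c, b))
        n = _unit(_cross(_sub(b, a), bc))  # normal of the plane through the last three beads
        m = _cross(n, bc)
        phi = math.radians(120 * digit)
        step = _add(_scale(bc, -math.cos(theta)),
                    _scale(m, math.sin(theta) * math.cos(phi)),
                    _scale(n, math.sin(theta) * math.sin(phi)))
        beads.append(_add(c, step))  # unit rod, interior angle theta at bead c
    return beads


def is_allowed(state, bond_angle_deg, radius):
    """Excluded volume: beads that are not directly bonded must be at least 2 * radius apart."""
    beads = chain_coordinates(state, bond_angle_deg)
    return all(math.dist(beads[i], beads[j]) >= 2 * radius
               for i in range(len(beads)) for j in range(i + 2, len(beads)))


def allowed_states(n_rods, bond_angle_deg, radius):
    """All torsion strings of length n_rods - 2 that pass the excluded-volume test."""
    return [s for s in product(range(3), repeat=n_rods - 2) if is_allowed(s, bond_angle_deg, radius)]


def configuration_network(states):
    """Links join allowed states that differ in exactly one torsion."""
    return {s: {t for t in states if sum(x != y for x, y in zip(s, t)) == 1} for s in states}


states = allowed_states(n_rods=7, bond_angle_deg=75, radius=0.25)
net = configuration_network(states)
print(len(states), "allowed of", 3 ** 5, "states;", (0, 0, 1, 0, 0) in net)
```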

Page 17

Adding monomers increases not only the number of nodes in the network but also its dimensionality! The combined effect is a small-world network.
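The dimensionality argument can be made concrete on the unconstrained network, where adding a monomer multiplies the node count by the number of values per angle and adds one dimension. The sketch below assumes three values per angle and "change any one angle" links (no steric constraints); the average distance then grows linearly in n, i.e. only logarithmically in the number of nodes N = 3ⁿ. Helper names are illustrative.

```python
import math
from collections import deque
from itertools import product


def hamming_network(n_angles, n_values=3):
    """Unconstrained configuration network: change any single angle to any other value."""
    return {u: [u[:i] + (v,) + u[i + 1:] for i in range(n_angles) for v in range(n_values) if v != u[i]]
            for u in product(range(n_values), repeat=n_angles)}


def average_distance(adj):
    """Mean BFS distance over ordered pairs of distinct nodes (the network here is connected)."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


for n in range(1, 6):
    net = hamming_network(n)
    N = len(net)
    # <d> grows like n = log_3(N): adding a monomer adds nodes AND a dimension
    print(n, N, round(average_distance(net), 3), round(math.log(N, 3), 3))
```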

Page 18

Shortcuts in Folding Space

Page 19

Page 20

The “dilemma”

HOMOGENEOUS

• from studies of conformation networks

bead chain

robot arm

SCALE FREE

• from polypeptide MD simulations

beta3s

randomized version

?

Page 21

Gradient Networks

Ex.:

Y. Rabani, A. Sinclair and R. Wanka, Proc. 39th Symp. On Foundations of Computer Science (FOCS), 1998: “Local Divergence of Markov Chains and the Analysis of Iterative Load-balancing Schemes”

Load balancing in parallel computation and packet routing on the internet

Gradients of a scalar (temperature, concentration, potential, etc.) induce flows (heat, particles, currents, etc.).

Naturally, gradients will induce flows on networks as well.

Z. T. and K.E. Bassler, “Jamming is Limited in Scale-free Systems”, Nature 428, 716 (2004)

Z. T., B. Kozma, K.E. Bassler, N.W. Hengartner and G. Korniss “Gradient Networks”, http://www.arxiv.org/cond-mat/0408262

References:

Page 22

Setup:

Let $G = G(V, E)$ be an undirected graph, which we call the substrate network.

The vertex set: $V = \{x_0, x_1, \ldots, x_{N-1}\} \equiv \{0, 1, 2, \ldots, N-1\}$

The edge set: $E \subset V \times V$, $e \in E$, $e = (x_i, x_j)$, $(x_i, x_i) \notin E$ (no self-loops)

A simple representation of $E$ is via the $N \times N$ adjacency (or incidence) matrix $A$:

$$A(x_i, x_j) = a_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E \\ 0 & \text{if } (i, j) \notin E \end{cases}$$

Let us consider a scalar field $\{h\}: V \to \mathbb{R}$.

Set of nearest-neighbor nodes of $i$ on $G$: $S_i^{(1)}$   (1)

Page 23

Definition 1 The gradient $\nabla h(i)$ of the field $\{h\}$ in node $i$ is a directed edge:

$$\nabla h(i) = (i, \mu(i)) \qquad (2)$$

which points from $i$ to that nearest neighbor $\mu(i) \in S_i^{(1)} \cup \{i\}$ on $G$ for which the increase in the scalar is the largest, i.e.:

$$\mu(i) = \arg\max_{j \in S_i^{(1)} \cup \{i\}} h_j \qquad (3)$$

The weight associated with edge $(i, \mu)$ is given by:

$$|\nabla h(i)| = h_{\mu(i)} - h_i$$

If $\mu(i) = i$, then $\nabla h(i) = (i, i) \equiv \mathbf{0}$: the self-loop $(i, i)$ is a loop through $i$ with zero weight.

Definition 2 The set $F$ of directed gradient edges on $G$, together with the vertex set $V$, forms the gradient network:

$$\nabla G = \nabla G(V, F)$$

If (3) admits more than one solution, then the gradient in $i$ is degenerate.

Page 24

In the following we will only consider scalar fields with non-degenerate gradients. This means:

$$\mathrm{Prob}\{h_i = h_j \text{ if } (i, j) \in E\} = 0$$

Theorem 1 Non-degenerate gradient networks form forests.

Proof:

Page 25

Theorem 2 The number of trees in this forest equals the number of local maxima of {h} on G.

[Figure: example network with scalar values on the nodes, illustrating Theorem 2]
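The definitions and both theorems can be checked numerically. A minimal sketch, assuming an Erdős–Rényi substrate and i.i.d. uniform scalars (so gradients are non-degenerate with probability 1); function names are illustrative. Each node's gradient edge points to the argmax of h over S_i^(1) ∪ {i}; following these edges from any node climbs strictly in h until it reaches a local maximum, so the edges form a forest whose roots are exactly the local maxima.

```python
import random
from collections import Counter


def erdos_renyi(n, p, rng):
    """G(n, p) substrate as adjacency sets."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def gradient_network(adj, h):
    """mu(i) = argmax of h over S_i^(1) plus i itself; mu(i) == i is the zero-weight self-loop."""
    return {i: max(list(adj[i]) + [i], key=lambda j: h[j]) for i in adj}


def tree_roots(mu):
    """Follow gradient edges from every node until a fixed point (a local maximum) is reached."""
    roots = set()
    for i in mu:
        j = i
        while mu[j] != j:  # h strictly increases along the path, so it cannot cycle
            j = mu[j]
        roots.add(j)
    return roots


def local_maxima(adj, h):
    return {i for i in adj if all(h[i] > h[j] for j in adj[i])}


rng = random.Random(42)
adj = erdos_renyi(500, 0.02, rng)
h = {i: rng.random() for i in adj}
mu = gradient_network(adj, h)
in_degree = Counter(mu[i] for i in mu if mu[i] != i)
print(len(tree_roots(mu)), len(local_maxima(adj, h)))      # Theorem 2: the two counts agree
print(sorted(Counter(in_degree[i] for i in adj).items()))  # in-degree histogram of the gradient network
```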

Page 26

For Erdős–Rényi random graph substrates with i.i.d. random numbers as scalars, the in-degree distribution is:

$$R(l) \approx \frac{1}{z\,l}, \qquad 1 \le l < z$$

in the limit $N \to \infty$, $p \to 0$, $Np = z = \text{const.}$, $z \gg 1$.

Page 27

Page 28

The Configuration model

A. Clauset, C. Moore, Z.T., E. Lopez, to be published.

Page 29

K-th Power of a Ring

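Generating functions: $g(z) = \sum_k p_k z^k$ (substrate degree distribution $p_k$) and $R(z) = \sum_l R(l)\, z^l$ (in-degree distribution of the gradient network)

$$R(z) = \int_0^1 dx\; g\!\left(1 - (1 - z)\,\frac{x\, g'(x)}{g'(1)}\right)$$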
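The generating-function result R(z) = ∫₀¹ dx g(1 − (1 − z) x g′(x)/g′(1)) can be evaluated numerically. A sketch, assuming p_k is the substrate degree distribution and the scalars are i.i.d.; the integrand can be read as follows (our interpretation): a node whose scalar has rank x ∈ [0, 1] is pointed to by each of its neighbours independently with probability φ(x) = x g′(x)/g′(1), so the coefficient of zˡ is a binomial term. The function name and the midpoint-rule quadrature are illustrative choices.

```python
from math import comb, exp, factorial


def gradient_in_degree_distribution(p, l_max, n_grid=2000):
    """R(l), l = 0..l_max, from R(z) = int_0^1 dx g(1 - (1 - z) x g'(x) / g'(1)),
    with g(z) = sum_k p[k] z^k.  Midpoint rule on [0, 1]."""
    g_prime_1 = sum(k * pk for k, pk in enumerate(p))
    R = [0.0] * (l_max + 1)
    for step in range(n_grid):
        x = (step + 0.5) / n_grid
        g_prime_x = sum(k * pk * x ** (k - 1) for k, pk in enumerate(p) if k > 0)
        phi = x * g_prime_x / g_prime_1  # chance that one given neighbour points to this node
        for k, pk in enumerate(p):
            # the coefficient of z^l in (1 - phi + phi z)^k is binomial in k
            for l in range(min(k, l_max) + 1):
                R[l] += pk * comb(k, l) * phi ** l * (1.0 - phi) ** (k - l) / n_grid
    return R


regular = [0.0, 0.0, 0.0, 1.0]  # every substrate node has degree 3
R = gradient_in_degree_distribution(regular, l_max=3)
print([round(r, 4) for r in R], sum(l * r for l, r in enumerate(R)))  # mean in-degree close to 3/4

# Poisson degrees with mean z = 10 (the Erdos-Renyi limit), truncated at k = 60
z = 10.0
poisson = [exp(-z) * z ** k / factorial(k) for k in range(61)]
R_er = gradient_in_degree_distribution(poisson, l_max=20, n_grid=500)
```

A useful consistency check: the mean in-degree R′(1) = 1 − ∫₀¹ g(x) dx, the fraction of nodes that are not local maxima (3/4 for the 3-regular case).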

Page 30

[Equation: exact piecewise expression for the in-degree distribution $R_K(l)$ of the gradient network on the K-th power of a ring, with separate cases over the range $0 \le l \le 2K$]

Page 31

[Figure: in-degree distribution $R_K(l)$ plotted against $2K + l$]

Power law with exponent −3
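The ring-power result can be compared against a direct simulation. A sketch, assuming i.i.d. uniform scalars on the K-th power of a ring (node j linked to j − K, …, j + K, indices mod n) and gradient edges to the largest h in each closed neighbourhood; it produces an empirical histogram and does not reproduce the exact piecewise formula. Names and parameters are illustrative.

```python
import random
from collections import Counter


def ring_power_gradient_in_degrees(n, K, rng):
    """Gradient network on the K-th power of a ring with i.i.d. uniform scalars:
    node j points to the largest h among j-K..j+K (mod n)."""
    if n < 2 * K + 1:
        raise ValueError("need n >= 2K + 1 so that a neighbourhood has no repeated nodes")
    h = [rng.random() for _ in range(n)]
    in_degree = [0] * n
    for j in range(n):
        target = max(((j + d) % n for d in range(-K, K + 1)), key=lambda i: h[i])
        if target != j:
            in_degree[target] += 1
    return h, in_degree


rng = random.Random(7)
n, K = 20_000, 10
h, in_degree = ring_power_gradient_in_degrees(n, K, rng)
histogram = Counter(in_degree)
for l in range(2 * K + 1):
    print(l, 2 * K + l, histogram[l] / n)  # empirical R_K(l), to compare with the exact expression
```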

Page 32

Page 33

The energy landscape

What generates the exponent −2?

• Energy associated with each node (configuration)

the gradient network

most favorable transitions

T = 0: backbone of the flow

MD simulation

tracks the flow network

biased walk close to the gradient network

trees

basins of local minima

The REM (random energy model) generates an exponent of −1.

Page 34

Model ingredients

• A network model of configuration spaces

network topology

homogeneous

degree correlations

constrained (folded): small k_conf

lower energy

loose (random coil): large k_conf

higher energy

as k increases, E increases

how to associate energies

Page 35

Random geometric graph

• random geometric graph

in higher D: similar to hypercube with holes

degree correlations k

E

• Energy proportional to connectivity

R=0.113, <k>=20

Dall & Christensen, Phys.Rev.E 66, 026121 (2002)

Page 36

N=30000, <k> = 1000, d=2.

Page 37

Exponent is −2

2 essential ingredients:

1) k1-k2 correlations
2) <E> monotonic in k
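The two ingredients can be combined in a toy version of the random-geometric-graph model. A sketch under stated assumptions: 500 points in the unit square with R = 0.113 (so that <k> is near 20, as on the slide, apart from boundary effects), energy E_i = k_i plus a tiny random term that only breaks ties (our choice), and a T = 0 flow in which every node points to the lowest-energy node of its closed neighbourhood. The geometry supplies the k1-k2 correlations; the energy rule supplies <E> monotonic in k. Helper names are illustrative, and the code does not assert any particular exponent.

```python
import math
import random
from collections import Counter


def random_geometric_graph(n, radius, rng):
    """n uniform points in the unit square; link pairs closer than `radius` (open boundaries)."""
    points = [(rng.random(), rng.random()) for _ in range(n)]
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < radius:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def downhill_in_degrees(adj, energy):
    """T = 0 flow: each node points to the lowest-energy node among itself and its neighbours."""
    in_degree = Counter()
    for i, nbrs in adj.items():
        target = min(nbrs + [i], key=lambda j: energy[j])
        if target != i:
            in_degree[target] += 1
    return [in_degree[i] for i in adj]


rng = random.Random(3)
adj = random_geometric_graph(500, 0.113, rng)
# Ingredient 2: energy increases with connectivity (tiny noise only breaks ties)
energy = {i: len(nbrs) + 1e-6 * rng.random() for i, nbrs in adj.items()}
in_degrees = downhill_in_degrees(adj, energy)
print(sum(len(v) for v in adj.values()) / len(adj))  # mean degree, near 20 up to boundary effects
print(sorted(Counter(in_degrees).items()))
```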

Page 38

[Figure: Lennard-Jones potential, with its attractive and repulsive parts]

Bead-chain model

• more realistic model: bead-chain

configuration network: excluded volume

energy: Lennard-Jones
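For the bead-chain energy, a minimal sketch of the standard 12-6 Lennard-Jones pair potential, V(r) = 4ε[(σ/r)¹² − (σ/r)⁶], summed over bead pairs. Skipping directly bonded pairs (|i − j| = 1) and using reduced units ε = σ = 1 are assumptions, not taken from the talk; function names are illustrative.

```python
import math
from itertools import combinations


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 potential: repulsive for r < 2**(1/6) * sigma, attractive beyond, minimum -epsilon."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def chain_energy(beads, epsilon=1.0, sigma=1.0):
    """Sum of Lennard-Jones terms over non-bonded bead pairs (|i - j| >= 2) of a bead chain."""
    return sum(lennard_jones(math.dist(beads[i], beads[j]), epsilon, sigma)
               for i, j in combinations(range(len(beads)), 2) if j - i >= 2)


straight = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(lennard_jones(2 ** (1 / 6)), chain_energy(straight))
```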

Page 39

L = 30, rod-rod angle 75°

Page 40

Page 41

The case of the α-helix

AKA peptide

• ALA: orange
• LYS: blue
• TYR: green

MD simulations, no water.

Page 42

T = 400

More than one simulation

3 different runs: yellow, red and green

The MD traced network

The role of temperature

Page 43

Page 44

Conclusions

• A network approach was introduced to study sterically constrained conformations of ball-chain-like objects.
• This network approach is based on the “statistical dogma”: generic features must be the result of statistical properties of the networks and should not depend on details.
• Protein conformation dynamics happens in high-dimensional spaces that are not adequately described by simplistic reaction coordinates.
• The dynamics performs a locally biased sampling of the full conformational network. At low enough temperatures the sampled network is a gradient graph, which is typically scale-free.
• The −2 degree exponent appears at and below the temperature where the basins of the local energy minima become kinetically disconnected.
• Understanding the protein folding network has the potential to lead to faster simulation algorithms, closing the gap between nature’s speed and ours.

Coming up: conditions on side chain distributions for the existence of funneled energy landscapes.