
Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 667 _____________________________ _____________________________ Computer Simulation of Protein Tyrosine Phosphatase Reaction Mechanisms and Dihydrofolate Reductase Inhibition BY KARIN KOLMODIN ACTA UNIVERSITATIS UPSALIENSIS UPPSALA 2001




    Dissertation for the Degree of Doctor of Philosophy in Molecular Biotechnology presented at Uppsala University in 2001

    ABSTRACT

    Kolmodin, K. 2001. Computer Simulation of Protein Tyrosine Phosphatase Reaction Mechanisms and Dihydrofolate Reductase Inhibition. Acta Universitatis Upsaliensis. Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 667. 65 pp. Uppsala. ISBN 91-554-5148-9.

    Protein tyrosine phosphatases catalyse the hydrolysis of phosphotyrosine residues in proteins, which is an important reaction in the cell signalling system. The three dimensional structure of such a protein tyrosine phosphatase has been used in a computational study of the reaction mechanism at the atomic level. Free energy calculations of different reaction pathways were performed using the empirical valence bond method in combination with free energy perturbation and molecular dynamics simulations. The objective was to find a reaction mechanism that is compatible with experimental data and to elucidate the specific interactions important for catalysis.

    The free energy calculations of the simulated reaction in the solvated enzyme–substrate complex correctly reproduce the observed reaction rates for wild type and mutant enzymes. However, the results show that a different reaction mechanism is energetically more plausible than that previously proposed. The difference pertains to the ionisation state of the enzyme–substrate complex. This mechanism is found to be compatible with enzymological and structural data from earlier studies of protein tyrosine phosphatases.

    Molecular dynamics simulations of a cdc25 phosphatase reveal that this enzyme has to undergo a conformational change in association with substrate binding in order to efficiently catalyse the phosphate hydrolysis reaction. The predicted change in the protein structure has later been confirmed by X-ray crystallography.

    Kinetic isotope effects are often used to investigate possible reaction mechanisms in phosphoryl transfer processes in enzymes and solution. Quantum mechanical calculations of heavy atom isotope effects on phosphate and phosphate esters demonstrate the importance of a realistic representation of the surrounding solvent. Calculations with a dielectric continuum are not adequate because the vibrational coupling to the solvent molecules has to be included in order to accurately reproduce the experimentally measured isotope effects.

    Computational studies of enzyme–inhibitor complexes were also conducted as a part of a multi-disciplinary drug design project. The linear interaction energy method as well as empirical scoring functions were used to predict the affinity and species selectivity of novel lipophilic inhibitors of the enzyme dihydrofolate reductase. The aim was to design compounds that preferentially inhibit dihydrofolate reductase from Pneumocystis carinii and not the corresponding human enzyme, which turns out to be a challenging task.

    Karin Kolmodin, Department of Cell and Molecular Biology, Uppsala University, Biomedical Centre, Box 596, SE-751 24 Uppsala, Sweden

    © Karin Kolmodin 2001
    ISSN 1104-232X
    ISBN 91-554-5148-9
    Printed in Sweden by Uppsala University, Tryck & Medier, Uppsala 2001


    “Computers are useless. They can only give you answers.” - Pablo Picasso


    Papers included in the thesis

    This thesis is based on the following publications and manuscripts, which will be referred to in the summary by their Roman numerals.

    I Kolmodin, K. and Åqvist, J. (1999). Computational modelling of catalysis and binding in low-molecular-weight protein tyrosine phosphatase. Int. J. Quant. Chem. 73, 147-159.

    II Kolmodin, K., Nordlund, P. and Åqvist, J. (1999). Mechanism of substrate dephosphorylation in low Mr protein tyrosine phosphatase. Proteins 36, 370-379.

    III Kolmodin, K. and Åqvist, J. (1999). Computational modelling of the rate limiting step in low molecular weight protein tyrosine phosphatase. FEBS Letters 456, 301-305.

    IV Kolmodin, K. and Åqvist, J. (2000). Prediction of a ligand induced conformational change in the catalytic core of cdc25A. FEBS Letters 465, 8-11.

    V Kolmodin, K. and Åqvist, J. (2001). The catalytic mechanism of protein tyrosine phosphatases revisited. FEBS Letters 498, 208-213.

    VI Kolmodin, K., Luzhkov, V.B. and Åqvist, J. (2001). The influence of solvent on phosphate deprotonation equilibrium isotope effects. Submitted for publication in J. Am. Chem. Soc.

    VII Graffner-Nordberg, M., Kolmodin, K., Åqvist, J., Queener, S.F. and Hallberg, A. (2001). Design, synthesis, computational prediction and biological evaluation of ester soft drugs as inhibitors of dihydrofolate reductase from Pneumocystis carinii. J. Med. Chem. 44, 2391-2402.

    VIII Graffner-Nordberg, M., Kolmodin, K., Åqvist, J., Queener, S.F. and Hallberg, A. (2001). Design, synthesis and computational affinity prediction of ester soft drugs as inhibitors of dihydrofolate reductase from Pneumocystis carinii. Submitted for publication in J. Med. Chem.

    Reprints were made with permission from the copyright holders.

    Related publications

    i Marelius, J., Kolmodin, K., Feierberg, I. and Åqvist, J. (1998). Q: An MD program for free energy calculations and empirical valence bond simulations in biomolecular systems. J. Mol. Graph. Model. 16, 213-225.

    ii Åqvist, J., Kolmodin, K., Floriàn, J. and Warshel, A. (1999). Mechanistic alternatives in phosphate monoester hydrolysis: What conclusions can be drawn from available experimental data? Chem. Biol. 6, R71-R80.

    iii Kolmodin, K., Luzhkov, V.B. and Åqvist, J. (2001). Computational enzymology: Protein tyrosine phosphatase reactions. In Theoretical biochemistry: Processes and properties of biological systems. (Ed. Eriksson, L.A.) Elsevier Science B.V.


    Contents

    1. Introduction

    2. Some concepts in physical chemistry
        2.1 Chemical reactions and free energy
        2.2 Linear free energy relationships
        2.3 Isotope effects
        2.4 Enzyme catalysis

    3. Computational methods for studies of chemical processes
        3.1 Quantum chemical calculations
        3.2 Molecular mechanics and dynamics
        3.3 Free energy perturbation
        3.4 The empirical valence bond method
        3.5 Calculation of protein–ligand affinity

    4. Computational studies of protein tyrosine phosphatases
        4.1 Protein tyrosine phosphatases
        4.2 The structure and catalytic mechanism of PTPs (V)
        4.3 Modelling the reaction mechanism in low Mr PTP (I-III)
        4.4 Prediction of a ligand induced conformational change in cdc25A (IV)

    5. Calculation of 18O isotope effects on phosphates
        5.1 Alternative mechanisms in phosphate monoester hydrolysis
        5.2 The use of isotopes in phosphoryl transfer reactions
        5.3 Calculation of 18O isotope effects on phosphates in solution (VI)

    6. Affinity prediction of DHFR inhibitors
        6.1 Dihydrofolate reductase as a drug target
        6.2 Soft drugs as inhibitors of DHFR from Pneumocystis carinii
        6.3 Design of novel ester soft drugs (VII, VIII)

    7. Acknowledgements

    8. References


    Abbreviations

    ATP      Adenosine triphosphate
    DHFR     Dihydrofolate reductase
    DSP      Dual specific phosphatase
    EIE      Equilibrium isotope effect
    EVB      Empirical valence bond
    FEP      Free energy perturbation
    fs       femtosecond, 10^-15 s
    ∆G       Difference in Gibbs free energy
    h        Planck's constant
    hDHFR    human DHFR
    K        Kelvin
    Keq      Equilibrium constant
    KIE      Kinetic isotope effect
    k        Rate constant
    kB       Boltzmann's constant
    LFER     Linear free energy relationship
    MD       Molecular dynamics
    MM       Molecular mechanics
    PCP      Pneumocystis carinii pneumonia
    PTP      Protein tyrosine phosphatase
    pcDHFR   Pneumocystis carinii DHFR
    pNPP     para-nitrophenyl phosphate
    ps       picosecond, 10^-12 s
    QM       Quantum mechanics
    R        Gas constant
    T        Temperature


    1. Introduction

    Phosphates are central chemical building blocks in biomolecules including DNA, RNA, various co-factors and the biological unit of energy, ATP. In the process of synthesis and degradation of these molecules, phosphate groups are transferred from one molecule to another. Phosphoryl transfer processes are also important in the complex cell signalling system, where specific amino acids in proteins are phosphorylated and dephosphorylated as a regulatory mechanism. A phosphorylated residue may change the activity of an enzyme or provide a recognition site for other proteins. The hydrolysis of such phosphorylated tyrosine residues is catalysed by enzymes known as protein tyrosine phosphatases (PTPs). The ability of these enzymes to accelerate phosphate hydrolysis reactions by several orders of magnitude has been the focus of much recent research. In the case of PTPs, the interpretation of enzymological experiments has not been entirely clear, and contradictions regarding the exact reaction mechanism are found in the literature. The objective of this work was to elucidate the structure–function relationship of this enzyme family by applying ‘computational enzymology’ starting from the three dimensional structure of a representative PTP. Thus, the first part of this thesis concerns enzymes and how they may operate in vivo.

    Occasionally, the enzyme machinery does not work properly and some external regulation is needed to retain a balanced level of activity in the organism. It may also be desirable to completely block the life-supporting mechanisms of infectious pathogens by the inhibition of a specific protein. Therapeutic drugs are such molecular regulators. The development of new drugs is time-consuming and very costly. All means by which the speed and economy of the drug discovery process can be improved are most welcome, since the demand for new efficient drugs increases faster than ever.
    Here too, structure-based computer modelling can be a useful tool, because reliable protein–ligand affinity predictions may save expensive laboratory efforts. The second part of this work constitutes a drug design project where computational methods were used to predict structures and binding energies of complexes between the pharmaceutically important enzyme dihydrofolate reductase and a set of new potential drug molecules. The aim was to gain more information about the structural requirements for species selective enzyme inhibitors and to support the medicinal chemists working in the drug discovery process.

    My contributions to the two research projects are solely based on computer simulations of biomolecular systems. Different levels of theory are applied in the study of enzyme kinetics, with focus on phosphate monoester hydrolysis reactions, and in the prediction of protein–ligand interactions. The most evident advantage of modelling approaches is that chemical processes can be resolved and analysed in bits and pieces, which may rationalise the understanding and interpretation of experimental results. It is also possible to predict properties that are not easily accessible by experiments. Another advantage is that alternative molecular models and mechanisms, physical as well as unphysical, can be simulated and validated. A model which shows good agreement with experiments is more likely to be operational than one that does not.

    Outline of this thesis

    The outline of this comprehensive summary is as follows. First, a short overview is given of some basic concepts in physical chemistry concerning chemical reactions and free energy (Chapter 2). This is followed by a brief introduction to the computational methods which have been used to calculate free energies in this work (Chapter 3). In the three following chapters, applications of these methods are presented. Chapter 4 is devoted to the reaction mechanism catalysed by the protein tyrosine phosphatases, while Chapter 5 concerns uncatalysed phosphate hydrolysis reactions with focus on isotope effects. Chapter 6 presents calculations of protein–ligand affinities.

    The interested reader should read this summary together with the appended papers and manuscripts, where more detailed descriptions of the parts constituting this work can be found. Papers I-III present the computational work on the catalytic mechanism of the low molecular weight protein tyrosine phosphatase. The first two papers concern substrate binding and the first step of the reaction. I and II overlap to a large extent, but I also contains parts that are not found in II and vice versa. Paper III treats the second and rate-limiting reaction step, and the complete free energy profile of the enzyme is presented. In IV the active site structure of a second phosphatase, cdc25A, is discussed. The computational work on these enzymes led to a revised model of the PTP reaction mechanism. The compatibility of this new model with available experimental and theoretical data is reviewed in V. Studies of enzyme mechanisms also require knowledge about uncatalysed chemistry, and VI presents quantum chemical calculations of isotope effects on phosphates in solution. In the last two papers, VII and VIII, the enzyme of interest is dihydrofolate reductase. A collaboration with medicinal chemists was focused on the development of novel soft drugs against Pneumocystis carinii pneumonia by the design of species selective DHFR inhibitors.


    2. Some concepts in physical chemistry

    2.1 Chemical reactions and free energy

    A chemical reaction can be described as the transformation of one set of compounds, the reactants into another, the products. Since molecules consist of atoms, which are connected to each other by chemical bonds, chemical reactions involve breaking of bonds in the reactants and the formation of new bonds to obtain the products. The chemical bonds consist of valence electrons shared between the atoms and thus, the electronic structure has to rearrange during a reaction.

    The transformation from reactants to products usually involves an activation barrier. The highest point of this barrier is known as the transition state and is characterised by an energy maximum along the reaction path and a short-lived transition state structure. An elementary reaction is a reaction step that only involves one such transition state (Figure 1) and hence, a complex chemical reaction can proceed via a series of elementary reaction steps. A reaction mechanism is the order of elementary reactions in which the reactants are transformed into the final products. A chemical reaction can either require energy or release energy depending on the difference in standard reaction free energy (∆G°) between the reactants and products. This free energy difference is related to the equilibrium constant by the following expression:

    $$\Delta G^{\circ} = -RT \ln K_{eq} \tag{1}$$

    Equilibrium constants do not only describe the equilibrium between reactants and products, but also thermodynamic processes such as binding, phase partitioning, pKa, conformational transitions, etc. For instance, imagine two molecules interacting to form a complex:

    $$\mathrm{A} + \mathrm{B} \rightleftharpoons \mathrm{AB} \tag{2}$$

    Figure 1. The relationship between reactants, transition state and products in terms of activation free energy, ∆G‡, and reaction free energy, ∆G°.


    The equilibrium constant, or rather the dissociation constant, describing the strength of that interaction is given by Equation 3, where c° is a concentration factor for the standard state, 1 M at 25°C.

    $$K_d = \frac{[\mathrm{A}][\mathrm{B}]}{[\mathrm{AB}]\,c^{\circ}} = \exp\left(\frac{\Delta G^{\circ}}{RT}\right) \tag{3}$$

    As mentioned above, a chemical reaction passes through a transition state on the way from reactants to products. The free energy difference between the reactants and the transition state (∆G‡) may also be regarded as an equilibrium. This equilibrium constant describes the rate of the reaction as a function of the activation energy according to Eyring’s equation:

    $$k = \kappa\,\frac{k_B T}{(c^{\circ})^{m-1}\,h}\,\exp\left\{-\frac{\Delta G^{\ddagger}}{RT}\right\} \tag{4}$$

    where c° is a concentration factor and m the order of the reaction. κ is the transmission coefficient reflecting the dynamical effects on the reaction rate. This factor is often approximated to unity, assuming that once the transition state is reached, the reaction will always proceed to the product state without re-crossing the barrier.

    The concept of free energy is central in thermodynamics and chemistry due to its close relation to measurable quantities. Free energy can be viewed as a probability measure and it describes the average energetics of a large ensemble of molecules, for example a sample in a test tube. In chemistry, one usually refers to Gibbs free energy, since most chemical experiments take place at constant temperature and pressure, in contrast to Helmholtz free energy, which refers to constant volume conditions. Gibbs free energy can be defined in a number of different ways. The most common definition is:

    $$G = H - TS \tag{5}$$

    as described in thermodynamics, where H is the enthalpy, T the temperature and S the entropy of the system. In statistical mechanics Gibbs free energy is instead defined as:

    $$G = -k_B T \ln Q + k_B T V \left(\frac{\partial \ln Q}{\partial V}\right)_T, \qquad Q = \sum_{\nu} \exp\left(-E_{\nu} / k_B T\right) \tag{6}$$

    where Q is the canonical partition function and V the volume of the system. The sum is over all quantum mechanical states Eν of all degrees of freedom of the system, i.e. including contributions from translations, rotations and vibrations. For classical particles the partition function is dependent on the configuration integral Z over the positions of N particles:


    $$Q = C(E_{kin}) \cdot Z, \qquad Z = \int \cdots \int \exp\left\{-V_{pot}(\mathbf{r}_1, \ldots, \mathbf{r}_N) / k_B T\right\} d\mathbf{r}_1 \cdots d\mathbf{r}_N \tag{7}$$

    Here, C is a factor describing the kinetic contribution to the partition function and Vpot represents the intermolecular interaction energies. The most important consequence of statistical mechanics is that the above relations enable calculations of macroscopic quantities such as free energies from microscopic molecular interactions using the quantum mechanical or classical mechanical methods described in Chapter 3. The calculated free energies can then be directly related to experimental equilibria or rate constants.
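    As an illustration of how a calculated free energy maps onto a measurable quantity, the following Python sketch evaluates Eyring’s equation and the relation between ∆G° and Keq numerically. It is a minimal sketch under stated assumptions: free energies are given in kJ/mol, the reaction is first order (m = 1, so the c° factor drops out) and κ defaults to unity. The function names are illustrative and do not come from the thesis or from any simulation package.

```python
import math

R = 8.314462618e-3   # gas constant in kJ/(mol K); the unit choice is an assumption
K_B = 1.380649e-23   # Boltzmann's constant in J/K
H = 6.62607015e-34   # Planck's constant in J s


def eyring_rate(dg_act, temperature=298.15, kappa=1.0):
    """Rate constant (s^-1) of a first-order step from its activation free energy (kJ/mol)."""
    return kappa * K_B * temperature / H * math.exp(-dg_act / (R * temperature))


def activation_free_energy(rate, temperature=298.15, kappa=1.0):
    """Inverse of eyring_rate: activation free energy (kJ/mol) from a rate constant (s^-1)."""
    return -R * temperature * math.log(rate * H / (kappa * K_B * temperature))


def reaction_free_energy(k_eq, temperature=298.15):
    """Standard reaction free energy (kJ/mol) from a dimensionless equilibrium constant."""
    return -R * temperature * math.log(k_eq)


if __name__ == "__main__":
    for barrier in (50.0, 60.0, 70.0):
        print(f"dG_act = {barrier:5.1f} kJ/mol  ->  k = {eyring_rate(barrier):.3e} s^-1")
```

    Because the rate depends exponentially on ∆G‡, a change of RT ln 10 in the barrier (about 5.7 kJ/mol at 25°C) corresponds to one order of magnitude in the rate constant.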

    2.2 Linear free energy relationships

    In many cases, the activation free energies (∆G‡) of related reactions are linearly correlated with the corresponding reaction free energies (∆G°). As a reaction becomes more favourable, the reaction rate increases. This phenomenon is known as a linear free energy relationship (LFER). LFERs are empirical and thus the proportionality constant for each type of reaction must be experimentally determined. According to the Hammond postulate,1 for reactions where the transition state is close to the reactants (exothermic), ∆G‡ is not particularly sensitive to a change in ∆G°, leading to a proportionality constant close to zero. On the other hand, for reactions with a transition state close to the products (endothermic), a change in ∆G° would directly show up in the reaction rate, resulting in a proportionality constant close to unity. Where LFERs exist they can be used to elucidate reaction mechanisms, to predict reaction rates and to discover under what conditions a change in reaction mechanism occurs.

    One example of a linear free energy relationship is the correlation between the activation free energy of proton transfer from an acid to a base and the corresponding change in standard free energy (difference in pKa).2 Similar relations are also measured for phosphoryl transfer reactions, where the reaction rate depends on the pKa of the nucleophile and the leaving group.3 These LFERs have been used in the work on phosphate hydrolysis reactions in Chapter 4 and in I-III.

    2.3 Isotope effects

    The nature of chemical reactions can also be studied by measuring isotope effects. The basic principle behind this type of experiment is that if one atom of the reactant species is replaced by its (usually) heavier isotope, the rate of the reaction and the equilibrium constant will be affected, whereas the reaction path will remain essentially the same.


    The change in mass mainly affects the vibrational frequencies included in the partition function of the molecular system. The effects of isotopes on equilibrium constants are known as equilibrium isotope effects (EIE). The EIE is defined as the ratio between the equilibrium constants of the reference reaction (L as in light) and the isotopically substituted (H as in heavy) reaction:

    $$\mathrm{EIE} = K_{eq}^{L} / K_{eq}^{H} = \exp\left((\Delta G_H - \Delta G_L) / RT\right) \tag{8}$$

    When studying isotope effects on reaction rates, one deals with kinetic isotope effects (KIE), which are defined as the ratio between rate constants:

    $$\mathrm{KIE} = k_L / k_H = \exp\left((\Delta G_H^{\ddagger} - \Delta G_L^{\ddagger}) / RT\right) \tag{9}$$

    where ∆G‡L and ∆G‡H are the activation free energies of the light and heavy reactions, respectively. A possible isotope effect on the transmission coefficient κ is neglected in this expression. Isotope effects can be determined experimentally or obtained from calculated differences in free energy.

    Primary isotope effects describe the change in equilibrium or reaction rate when a bond involving the heavier isotope is broken in the reaction. Secondary isotope effects, which are usually smaller than the primary effects, are studied when the heavier isotope is not directly involved in the broken bond, but still affects the rate or equilibrium constant by changing the vibrations of the overall system. Solvent isotope effects refer to the case where the surrounding solvent contains heavier isotopes, whereas other reactants may not.

    The maximum value of the isotope effect is related to the relative change in mass. When hydrogen is replaced with deuterium, the mass of the atom in that position is doubled and the isotope effect is large. If an already heavy atom, for example carbon or oxygen, is replaced by a heavier isotope, the relative change in mass will be smaller and thus also the observed isotope effect. In general, secondary KIEs are the most interesting isotope effects since they can provide information about the transition state structure. This is because the magnitude of the KIE is also related to the change in bond order at the substituted atom in the transition state. Calculations of heavy atom EIEs and KIEs on phosphates are presented in Chapter 5 and in VI.
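    The two ratios defined above, and the dependence of the effect on the relative change in mass, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: free energies are in kJ/mol, and the mass dependence is estimated with an isolated harmonic diatomic oscillator, which is a much cruder model than the calculations reported in VI. The function names are hypothetical.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol K); the unit choice is an assumption


def isotope_effect(dg_light, dg_heavy, temperature=298.15):
    """EIE or KIE from the (activation) free energies of the light and heavy species."""
    return math.exp((dg_heavy - dg_light) / (R * temperature))


def reduced_mass(m1, m2):
    """Reduced mass of a two-body oscillator (same unit as the inputs)."""
    return m1 * m2 / (m1 + m2)


def frequency_ratio(m_fixed, m_light, m_heavy):
    """Heavy/light stretching frequency ratio of a harmonic diatomic (same force constant)."""
    return math.sqrt(reduced_mass(m_fixed, m_light) / reduced_mass(m_fixed, m_heavy))


if __name__ == "__main__":
    # Isotope masses in atomic mass units
    print("C-D / C-H     :", round(frequency_ratio(12.000, 1.00783, 2.01410), 4))
    print("P-18O / P-16O :", round(frequency_ratio(30.97376, 15.99491, 17.99916), 4))
```

    The frequency shift on going from 16O to 18O is far smaller than that on going from H to D, which is why heavy atom isotope effects are small and why their calculation requires accurate vibrational frequencies.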


    2.4 Enzyme catalysis

    Catalysts enhance the rate of chemical reactions by lowering the activation free energy ∆G‡. This work concerns enzymes, which are proteins working as biological catalysts. Enzymes possess outstanding catalytic power and can speed up complex chemical reactions by several orders of magnitude at ‘normal’ temperatures and atmospheric pressure. This implies altered free energy profiles for the enzyme catalysed reactions, compared to the corresponding uncatalysed reactions in solution. An enzyme acts by specifically binding the substrate(s) in its active site, where the reaction is confined. After the reaction has been catalysed, the product(s) is released and the retained enzyme is ready for a new round of catalysis. Thus, the energetics of an enzyme catalysed reaction includes the actual reaction rate as well as the substrate/product binding processes (Figure 2).

    Enzyme catalysis is usually explained by the concept that enzymes interact more favourably with the transition state than with the substrate. The enzymes must also stabilise the transition state more efficiently than bulk water does. Exactly how this is accomplished is difficult to measure, and the exact origin of enzyme catalysis has been the subject of both study and controversy. Among the different theories are: transition state stabilisation by providing a geometric/electrostatic complementary active site, substrate destabilisation, desolvation, entropy trapping, orbital steering, dynamical effects, strain etc.4 At least one thing is certain: the catalytic power is somehow stored in the three dimensional structure of the enzyme, determined by its primary sequence of amino acids. Computer simulation of enzyme reactions allows dissection of the catalytic origin, because energy contributions from different parts of the system can be calculated separately. It has for example been shown that electrostatic stabilisation of the transition state by pre-organised dipoles is important for enzyme catalysis,5,6 but entropic factors have also been invoked.7

    E + S ⇌ ES → [ES]‡ → EP ⇌ E + P   (labels in the figure: Kass, kcat, Kdiss)

    Figure 2. Schematic representation of an enzyme catalysed reaction. Here, E is the enzyme, S the substrate and P the product. The processes of substrate binding and product release are described by the equilibrium constants Kass and Kdiss, respectively. The turnover rate of the enzyme is kcat, which is related to the activation free energy ∆G‡ according to Equation 4.


    3. Computational methods for studies of chemical processes

    Obviously, computational chemistry depends on the use of computers and therefore has a rather short history. However, the field has grown just as exponentially as the use of computers in the rest of society. Today, computational chemistry covers a wide range of theoretical methods, from calculations of quantum mechanical properties of atoms and molecules to simulations of macromolecular systems and statistical analysis of large amounts of chemical data. Common to these applications is the use of numerical calculations in the study of chemical problems. This thesis describes chemical problems addressed by computational methods of varying complexity: quantum mechanics (QM), molecular mechanics (MM) as well as a combination of the two, QM/MM. The basic principles behind these different methods will be briefly introduced in this section, but comprehensive backgrounds can be found in textbooks on the specific topics.8

    3.1 Quantum chemical calculations

    Quantum chemistry is based on the equations of quantum mechanics and deals with the electrons and how these are distributed around the nuclei in a molecule. A fundamental approximation in most quantum chemical methods is the Born-Oppenheimer approximation, which separates the nuclear motion from the electronic motion. The idea is that the mass of the electron is much smaller than the masses of the nuclei and therefore the electrons move much faster. This implies that the electrons will adjust almost instantaneously to any change in the position of the nuclei. The task in quantum chemistry is to solve the electronic Schrödinger equation, for which the solution is the wave function. The wave function depends only parametrically on the relative positions of the nuclei and can be calculated from the first principles of theory, ab initio.

    The Schrödinger equation has an exact solution only for the simplest molecules (like H2+), so for larger systems further approximations need to be introduced. One approximation is the commonly used Hartree-Fock method, by which each electron only interacts with the mean field of the other electrons. This means that calculations at the Hartree-Fock level neglect the electron-electron correlation, resulting in overestimated energies. The electron-electron correlation can be included in QM calculations by methods like Møller-Plesset (MP) perturbation theory or configuration interaction (CI). Another approximation is the introduction of basis sets. The atomic orbitals involved in solving the Schrödinger equation are approximated as linear combinations of spatial Gaussian functions. The more Gaussian functions in the basis set used to describe the different orbitals, the better the approximation. However, more exact models are computationally more demanding and one always has to compromise between exactness and computational effort.

    Some of the software packages developed for quantum chemical calculations contain an optimisation algorithm used for finding the spatial arrangement of the nuclei that corresponds to the lowest electronic energy of the system. During the optimisation the nuclei approach the closest minimum on a potential energy surface created by the surrounding electrons. Quantum chemical calculations can thus be used to find the geometry corresponding to an energy minimum on this potential energy surface, but also the geometry of transition states, and of course the corresponding energies. Analysis of vibrational modes at the stationary points can further be used to calculate spectroscopic data and thermodynamic quantities, such as free energies, from the partition function (Equation 6). These methods are used in the work on isotope effects presented in Chapter 5 and in VI.

    QM methods give very detailed descriptions of molecular systems on the electronic level, but the computational work needed for solving the Schrödinger equation increases quickly with the number of electrons. In practice, this means that only a relatively small number of atoms in gas phase can be handled with this level of theory. The effects of solvents, e.g. water molecules or a protein surrounding the molecules of interest, are therefore difficult to include in a realistic way. There are, however, implicit solvation models describing the macroscopic effects of solvents,9 but explicit hydrogen bonds between the solute and the surroundings are then ignored. Density functional methods are also increasingly used in studies of larger systems such as protein active sites, but most efforts today are focused on development of mixed quantum/classical QM/MM methods that can take into account the entire environment in an explicit way. These methods will be discussed below.
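    As a hedged illustration of how vibrational analysis yields a thermodynamic quantity, the Python sketch below evaluates the vibrational free energy of a set of normal modes in the harmonic oscillator approximation, G_vib = Σ [hν/2 + kBT ln(1 − exp(−hν/kBT))]. The harmonic approximation, the use of wavenumbers in cm^-1 as input and the function name are assumptions made for this example; translational and rotational contributions are left out.

```python
import math

H = 6.62607015e-34    # Planck's constant, J s
K_B = 1.380649e-23    # Boltzmann's constant, J/K
C_CM = 2.99792458e10  # speed of light, cm/s
N_A = 6.02214076e23   # Avogadro's number, 1/mol


def vibrational_free_energy(wavenumbers, temperature=298.15):
    """Harmonic vibrational free energy in kJ/mol, including zero-point energy.

    wavenumbers: iterable of real, positive normal-mode wavenumbers in cm^-1.
    """
    kt = K_B * temperature
    total = 0.0
    for w in wavenumbers:
        quantum = H * C_CM * w                      # energy of one vibrational quantum, J
        total += 0.5 * quantum                      # zero-point energy
        total += kt * math.log(1.0 - math.exp(-quantum / kt))  # thermal population term
    return total * N_A / 1000.0


if __name__ == "__main__":
    # Hypothetical single mode, before and after a small isotopic frequency shift
    light = vibrational_free_energy([1000.0])
    heavy = vibrational_free_energy([962.0])
    print(f"light: {light:.3f} kJ/mol, heavy: {heavy:.3f} kJ/mol")
```

    Isotopic substitution lowers the frequencies and therefore the vibrational free energy; differences of this kind between two states are what enter the isotope effect expressions of Section 2.3.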

    3.2 Molecular mechanics and dynamics

    Molecular mechanics (MM) provides a simplified model of atoms and molecules compared to quantum mechanical methods. The structure of a molecule is modelled as a system of soft spheres carrying partial charges (atoms) connected by springs (bonds). The energy of such a molecular system, which in quantum chemistry is described by the wave function, is instead described by simple analytical functions. These potential energy functions describe both bonded and non-bonded interactions between atoms and the total sum of potentials is called a force field. The energy functions are quickly evaluated and are therefore suitable for calculations on large complex molecules such as proteins and systems containing many solvent molecules.

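    $$
    \begin{aligned}
    V_{pot} ={}& \sum_{bonds} \tfrac{1}{2} k_b (r - r_0)^2 + \sum_{angles} \tfrac{1}{2} k_\theta (\theta - \theta_0)^2 + \sum_{torsions} k_\varphi \left( 1 - \cos(n\varphi - \delta) \right) \\
    &+ \sum_{\substack{improper \\ torsions}} \tfrac{1}{2} k_\xi (\xi - \xi_0)^2 + \sum_{\substack{non\text{-}bonded \\ i<j}} \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}} + \sum_{\substack{non\text{-}bonded \\ i<j}} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right)
    \end{aligned}
    \tag{10}
    $$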

    Equation 10 demonstrates a typical pair-potential derived in the late 1960s. It is still in use for calculations on proteins and other organic molecules and includes energy contributions from bond stretching, angle bending, torsional rotations and non-bonded terms for electrostatic and van der Waals interactions. The total potential energy of a molecular system depends on the positions of all the atoms in space and also the parameters k, r0, θ0, n, δ, ξ0, q, A, B etc., which must be defined for all bonds, angles, torsions and atom types of the system. These parameters are calibrated to reproduce energies, frequencies and geometries obtained either from experiments or QM methods. There are many different force fields developed today and which one to use depends on the application, availability or tradition. Some examples of force fields applicable to proteins are AMBER,10 CHARMM,11 GROMOS 12 and OPLS-AA.13 The force field can be used to calculate the potential energy of a molecular system as a function of the positions of all the atoms. In addition, the negative derivative (with respect to position) of the potential energy gives the force exerted on each atom at that particular geometry. The force applied on a particle is related to its acceleration by Newton’s second law:

    $$
    F_i = -\frac{\partial V_{pot}}{\partial r_i} = m_i a_i \tag{11}
    $$

    This means that the particle of mass mi will change its position and velocity depending on the direction and magnitude of the applied force. Newton’s laws of motion can be numerically integrated with respect to time which yields the velocity and position of all particles at any moment. The process of numerical integration over time of a system of second order differential equations is known as molecular dynamics (MD) simulation. Thus, MD simulations allow us to follow the motion of a system of atoms or molecules during a period of time (the trajectory of the system), given the initial positions and velocities of all atoms. The initial positions can for instance be obtained from crystal structures or other molecular models optimised by MM or QM methods. The initial velocities are usually taken randomly from a Maxwell-Boltzmann distribution at an appropriate simulation temperature. Due to the relation between temperature and the velocities of the atoms, the temperature of the simulated system can be easily regulated by periodically scaling the velocities. The velocities are scaled proportionally to the difference between the actual temperature and the target temperature using a coupling parameter and this scaling procedure is known as coupling to a thermal bath.14
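As an illustration of the integration scheme described above, the following minimal sketch propagates a set of independent one-dimensional harmonic oscillators with the velocity Verlet algorithm and couples them weakly to a thermal bath after every step. The harmonic potential, the reduced units (k_B = 1), the function names and all parameter values are assumptions chosen for illustration and are not taken from the simulations in this work. The velocity scaling factor is assumed to have the usual weak-coupling form sqrt(1 + (dt/tau)(T0/T - 1)).

```python
import numpy as np

def force(x, k):
    """Force as the negative derivative of V(x) = 0.5 * k * x**2."""
    return -k * x

def kinetic_temperature(v, m, k_B=1.0):
    """Instantaneous temperature of 1D particles, from <m v^2> = k_B T."""
    return m * np.mean(v ** 2) / k_B

def run_md(n_particles=4000, n_steps=5000, dt=0.01, m=1.0, k=1.0,
           t_start=2.0, t_target=1.0, tau=1.0, seed=0):
    """Velocity Verlet integration with weak coupling to a thermal bath."""
    rng = np.random.default_rng(seed)
    # Initial positions and velocities drawn from Boltzmann distributions
    # at the (hypothetical) starting temperature t_start.
    x = rng.normal(0.0, np.sqrt(t_start / k), n_particles)
    v = rng.normal(0.0, np.sqrt(t_start / m), n_particles)
    f = force(x, k)
    for _ in range(n_steps):
        v += 0.5 * dt * f / m          # first half-step for the velocities
        x += dt * v                    # full step for the positions
        f = force(x, k)                # forces at the new positions
        v += 0.5 * dt * f / m          # second half-step for the velocities
        # Scale velocities in proportion to the deviation from the target.
        t_now = kinetic_temperature(v, m)
        v *= np.sqrt(1.0 + (dt / tau) * (t_target / t_now - 1.0))
    return x, v

x, v = run_md()
final_temperature = kinetic_temperature(v, 1.0)
print(f"final kinetic temperature: {final_temperature:.3f}")
```

In this toy system the kinetic temperature relaxes from the starting value towards the target on a time scale set by the coupling parameter tau.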


    MD simulations can be used to study structural properties of molecules, e.g., root mean square deviation from the initial structure as in II or as in IV where a conformational change in the active site of a protein is studied. However, studies of structural motions during trajectories are not the only application for MD simulations. MD simulations can also be used to sample the available conformational space of the molecular system as the atoms move around. Conformations of low energy will be explored more often than those of high energy and if the simulation is long enough, the set of explored conformations has a Boltzmann distribution of energies. The time average of a property will then approach a thermodynamic ensemble average according to the ergodic assumption and these averages are needed to calculate macroscopic (thermodynamic) quantities. The results presented in I-III, VII and VIII are based on MD simulations of protein–ligand complexes in solution. The average potential energies obtained from the generated conformations are used to calculate free energies, which are directly related to measured reaction rates and equilibrium constants.

    3.3 Free energy perturbation

    In principle it would be possible to calculate the configurational free energy of a system from an MD simulation, since the Boltzmann distributed conformations constitute the configuration integral (Equation 7) from which the free energy can be calculated. However, due to bad convergence this is not easily pursued. On the other hand, it is much easier to calculate relative free energies using free energy perturbation (FEP) methods.15-19 The free energy difference between two systems is related to the corresponding difference in potential energy in the expression derived directly from statistical mechanics:

    $$
    \Delta G = -RT \ln \left\langle \exp\left[ -(k_B T)^{-1} (V_B - V_A) \right] \right\rangle_A \tag{12}
    $$

    where VA and VB are the potential energy functions of the two states. The ensemble average, denoted ⟨…⟩A, can be calculated from an MD simulation of one of the states, A. However, this method converges only if the difference in potential energy of the two states is small for a given configuration. If this is not the case, intermediate potentials are created by linear combinations of the two potentials using a coupling parameter λ.

    $$
    V_{eff}(\lambda_m) = \varepsilon_m = (1 - \lambda_m) V_A + \lambda_m V_B \tag{13}
    $$

    The coupling parameter λ is incrementally changed from 0 to 1 resulting in a gradual transformation of the effective potential from VA to VB. The free energy contribution from each transformation step can then be added to give the total free energy difference between state A and B. The FEP method is preferentially used in combination with a thermodynamic cycle which allows calculation of relative free energies of solvation, binding, pKa shifts, etc. It is also an important technique in combination with the empirical valence bond method described in the next section.
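The stepwise summation over λ can be illustrated with a toy model. The sketch below is not the protocol used in this work: it assumes two one-dimensional harmonic potentials with hypothetical force constants, for which the free energy difference is known analytically, and it replaces the MD sampling of each intermediate potential by exact Gaussian sampling. All names and numbers are assumptions.

```python
import numpy as np

RT = 0.596  # kcal/mol, roughly 300 K

def fep_free_energy(k_a, k_b, n_windows=11, n_samples=20000, seed=0):
    """Sum the free energy contributions of each lambda step between
    V_A = 0.5*k_a*x**2 and V_B = 0.5*k_b*x**2."""
    rng = np.random.default_rng(seed)
    lambdas = np.linspace(0.0, 1.0, n_windows)
    delta_g = 0.0
    for lam_m, lam_next in zip(lambdas[:-1], lambdas[1:]):
        # The linear combination of two harmonic potentials is harmonic,
        # with an effective force constant k_m.
        k_m = (1.0 - lam_m) * k_a + lam_m * k_b
        k_next = (1.0 - lam_next) * k_a + lam_next * k_b
        # Stand-in for MD sampling: draw x from the Boltzmann distribution
        # of the intermediate potential (a Gaussian of variance RT/k_m).
        x = rng.normal(0.0, np.sqrt(RT / k_m), n_samples)
        # Potential energy difference to the next intermediate state.
        dv = 0.5 * (k_next - k_m) * x ** 2
        delta_g += -RT * np.log(np.mean(np.exp(-dv / RT)))
    return delta_g

k_a, k_b = 1.0, 4.0                          # hypothetical force constants
dg_fep = fep_free_energy(k_a, k_b)
dg_exact = 0.5 * RT * np.log(k_b / k_a)      # analytical result for this model
print(f"FEP: {dg_fep:.3f} kcal/mol, exact: {dg_exact:.3f} kcal/mol")
```

Because neighbouring intermediate potentials overlap well, each step converges quickly, which is the reason for introducing the intermediate states in the first place.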

    3.4 The empirical valence bond method

    As mentioned earlier, chemical reactions involve reorganisation of the valence electrons constituting the chemical bonds. Consequently, computational studies of chemical reactions require a quantum mechanical description of the pathway from reactants to products. Unfortunately, QM methods can only solve the Schrödinger equation for a rather small number of atoms. Therefore, it is impossible to study interesting chemical reactions taking place inside an enzyme or in a system of explicit bulk solvent using a high level of theory for the entire system. One solution to this problem is to divide the system of interest into two parts. One part, comprising the important regions of the active site, is studied using a detailed quantum mechanical method. The rest of the system, the surrounding, can be described using a less detailed MM model. This partitioning approach was pioneered by Warshel and Levitt20 and is the idea behind all hybrid QM/MM methods. QM/MM approaches are often employed for enzyme catalysed reactions, since continuum models cannot easily model the inhomogeneous dielectric properties of a protein interior. These hybrid methods cover a range from ab initio (or semi-empirical) coupled MM models20-25 to the empirical valence bond (EVB) method5,26-31 used in this thesis. The EVB method was developed by Warshel et al. following the ideas of Coulson and Danielsson.32 It was originally used for studies of chemical reactions in solution,26 but has successfully been used in studies of enzyme catalysed reactions, e.g., serine proteases,5 lysozyme,5 triosephosphate isomerase,33 acetylcholine esterase,34 carbonic anhydrase,27 glyoxalase I35 and orotidine 5’-monophosphate decarboxylase.36 In the EVB method the reactants, intermediates and products along the reaction pathway are defined as resonance structures, or valence bond (VB) states. Each VB state (i) is described by a regular force field expression representing its atomic arrangement and charge distribution.

    $$
    \varepsilon_i = H_{ii} = V_{bonds}^{(i)} + V_{angles}^{(i)} + V_{torsions}^{(i)} + V_{nb,rr}^{(i)} + V_{nb,rs}^{(i)} + V_{ss} + \alpha^{(i)} \tag{14}
    $$

    The first four terms in this expression are similar to those of Equation 10 with the main difference being that bonds which are broken or formed during the reaction need to be represented by Morse potentials of finite dissociation energy in contrast to the commonly used harmonic potentials. Energy contributions from angles and torsions are coupled to the degree of bond formation. The fifth term represents the interactions between the quantum atoms and the surrounding across the QM/MM boundary and the sixth term denotes the potential energy of the surrounding (protein/solvent) system. The last term of Equation 14 is the intrinsic gas phase energy of VB state i with all fragments at infinite separation. The reaction proceeds by transformation between relevant VB states which is accomplished by the free energy perturbation method. The ground state energy is calculated at each sampled conformation by solving the corresponding secular equation:

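    $$
    \mathbf{H}\mathbf{C} = E_g \mathbf{C} \tag{15}
    $$

    For a two-state system the solution to this eigenvalue problem is simply the lowest root of a second order equation:

    $$
    E_g = \tfrac{1}{2} (\varepsilon_1 + \varepsilon_2) - \tfrac{1}{2} \sqrt{(\varepsilon_1 - \varepsilon_2)^2 + 4 H_{12}^2} \tag{16}
    $$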

    The diagonal elements of the Hamiltonian matrix are the pure VB potentials (Equation 14) and the off-diagonal elements Hij, which determine the degree of coupling of these states, are represented by Equation 17.28

    $$
    H_{ij} = A_{ij} \exp\left( \mu (r_{ij} - r_0) \right) \exp\left( \eta (r_{ij} - r_0)^2 \right) \tag{17}
    $$
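For a two-state system, the closed-form ground-state energy of Equation 16 can be checked against a numerical diagonalisation of the 2x2 Hamiltonian. The sketch below does this for a single configuration; the three energies are hypothetical numbers (in kcal/mol) and the function names are illustrative only.

```python
import numpy as np

def ground_state_energy(eps1, eps2, h12):
    """Lowest root of the two-state secular equation (closed form)."""
    return 0.5 * (eps1 + eps2) - 0.5 * np.sqrt((eps1 - eps2) ** 2 + 4.0 * h12 ** 2)

def ground_state_energy_numeric(eps1, eps2, h12):
    """Lowest eigenvalue of the symmetric 2x2 EVB Hamiltonian."""
    hamiltonian = np.array([[eps1, h12],
                            [h12, eps2]])
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    return np.linalg.eigvalsh(hamiltonian)[0]

# Hypothetical diabatic energies and coupling for one sampled configuration.
eps1, eps2, h12 = 12.0, 15.0, 20.0
e_closed = ground_state_energy(eps1, eps2, h12)
e_numeric = ground_state_energy_numeric(eps1, eps2, h12)
print(f"closed form: {e_closed:.4f}, diagonalisation: {e_numeric:.4f}")
```

The coupling always lowers the ground state below both diabatic energies, and the lowering is largest where the two states are degenerate.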

    Here A, µ and η are calibration parameters, whereas r and r0 are the actual and reference distances between two reacting atoms, respectively. A reaction path must somehow be characterised by a reaction coordinate, which describes the progress of the reaction from reactants to products. The reaction coordinate can for instance be distance dependent, but it may be difficult to choose a proper geometric coordinate in a multi-dimensional system. Constraining an atom–atom distance also leads to neglect of non-equilibrium solvation effects.6 A more general reaction coordinate is therefore the energy gap between the VB states.

    $$
    X = \varepsilon_i - \varepsilon_j \tag{18}
    $$

    The energy gap can be discretised in intervals X’ and as the reaction is mapped out by the parameter λ, the position of each configuration on the reaction coordinate is calculated. The ground state free energy profile along X can be calculated using the free energy perturbation umbrella sampling method given by Equation 19 (Figure 3).

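    $$
    \begin{aligned}
    \Delta G(X') &= \Delta G(\lambda_m) - RT \ln \left\langle \delta(X - X') \times \exp\left\{ -\left[ E_g(X') - \varepsilon_m(X') \right] / RT \right\} \right\rangle_m \\
    \Delta G(\lambda_m) &= -RT \sum_{n=0}^{m-1} \ln \left\langle \exp\left\{ -(\varepsilon_{n+1} - \varepsilon_n) / RT \right\} \right\rangle_n
    \end{aligned}
    \tag{19}
    $$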

    Figure 3. Free energy functions for two diabatic states Φ1 and Φ2, and the ground state free energy ∆G corresponding to the conversion between these states. ∆G is calculated according to Equation 19 as a function of the energy gap X (α2≠0, H12≠0). [Plot: free energy (kcal/mol) versus X (kcal/mol), with curves ∆g(Φ1), ∆g(Φ2) and ∆g.]

    As a result of this sampling method and the fluctuations in the simulations, each value of the mapping parameter λ will generate data points in more than one interval on the reaction coordinate X, and each interval X’ will contain data from different λs. The sample size weighted average gives the final free energies at X’. A simple, but very useful, quality control of the conformational sampling along the perturbation is to plot the sampled energies with the reaction coordinate X on one axis and the mapping parameter λ on the other as in Figure 4. This should give a smooth diagonal of data points, without discontinuities or outliers, for reliable sampling. Discontinuities in the perturbation, caused for example by sudden conformational changes, are easily revealed in such a plot, but may not show up in the ground state energy profile.

    One advantage with the EVB method is that the Hamiltonian can be calibrated to reproduce free energies of a reference reaction obtained from either experiments or quantum chemical calculations. This surface fitting procedure involves adjusting the parameters α(i) in the force field expression of Hii, and the coupling parameters Hij, so that the calculated free energy profile coincides with the energetics of the corresponding reference reaction. The constant α, or rather the difference in α between two VB states, ∆αij, is related to the difference in gas phase heat of formation, which is not included in the force field per se. After calibration the effect of a change in the surrounding on the reaction energy profile can be evaluated using exactly the same Hamiltonian as was used in the reference reaction. This is possible since the energy contributions from the environment only enter as Vrs in the diagonal elements Hii, which is a consequence of the regular QM/MM approximation that no charge transfer occurs across the quantum/classical boundary. The ability to calculate the activation free energy for a reaction in water and the corresponding reaction in a protein allows us to estimate the rate acceleration in enzyme catalysed reactions. The EVB method is used in the work on phosphate monoester hydrolysis by protein tyrosine phosphatases presented in Chapter 4 and in I-III.
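The mapping procedure of Equation 19 can be sketched on a one-dimensional toy model. The example below is an assumption-laden illustration and not the procedure used for the enzyme: the two diabatic states are harmonic wells with a hypothetical force constant, the off-diagonal element is a constant, and exact Gaussian sampling of each mapping potential stands in for MD. The delta function is approximated by binning the energy gap, and the per-window estimates are combined by a sample size weighted average.

```python
import numpy as np

RT = 0.596   # kcal/mol, roughly 300 K
K = 20.0     # force constant of both diabatic states (hypothetical)
H12 = 2.0    # constant off-diagonal element (hypothetical)

def eps1(x):
    return 0.5 * K * (x + 1.0) ** 2

def eps2(x):
    return 0.5 * K * (x - 1.0) ** 2

def e_ground(x):
    """Two-state ground-state energy, as in Equation 16."""
    e1, e2 = eps1(x), eps2(x)
    return 0.5 * (e1 + e2) - 0.5 * np.sqrt((e1 - e2) ** 2 + 4.0 * H12 ** 2)

def evb_profile(n_windows=21, n_samples=5000, seed=0):
    rng = np.random.default_rng(seed)
    lambdas = np.linspace(0.0, 1.0, n_windows)
    edges = np.linspace(-61.0, 61.0, 62)      # intervals X' of width 2 kcal/mol
    n_bins = len(edges) - 1
    weighted_sum = np.zeros(n_bins)
    counts_total = np.zeros(n_bins)
    dg_lambda = 0.0                           # Delta G(lambda_m), zero for m = 0
    for m, lam in enumerate(lambdas):
        # The mapping potential is harmonic with its minimum at 2*lam - 1,
        # so it can be sampled exactly instead of running MD.
        x = rng.normal(2.0 * lam - 1.0, np.sqrt(RT / K), n_samples)
        e_m = (1.0 - lam) * eps1(x) + lam * eps2(x)
        gap = eps1(x) - eps2(x)               # reaction coordinate X
        boltz = np.exp(-(e_ground(x) - e_m) / RT)
        idx = np.digitize(gap, edges) - 1
        inside = (idx >= 0) & (idx < n_bins)
        counts = np.bincount(idx[inside], minlength=n_bins)
        sums = np.bincount(idx[inside], weights=boltz[inside], minlength=n_bins)
        filled = counts > 0
        g = dg_lambda - RT * np.log(sums[filled] / n_samples)
        weighted_sum[filled] += counts[filled] * g
        counts_total[filled] += counts[filled]
        if m + 1 < n_windows:
            # FEP step from window m to window m + 1.
            lam_next = lambdas[m + 1]
            e_next = (1.0 - lam_next) * eps1(x) + lam_next * eps2(x)
            dg_lambda += -RT * np.log(np.mean(np.exp(-(e_next - e_m) / RT)))
    centers = 0.5 * (edges[:-1] + edges[1:])
    sampled = counts_total > 0
    return centers[sampled], weighted_sum[sampled] / counts_total[sampled]

gap, profile = evb_profile()
reactant = profile[gap < -20.0].min()
barrier = profile[np.abs(gap) < 10.0].max() - reactant
exact_barrier = e_ground(0.0) - e_ground(-1.0)
print(f"mapped barrier: {barrier:.2f} kcal/mol, model value: {exact_barrier:.2f} kcal/mol")
```

In this model the energy gap is linear in x, so the mapped profile should follow the ground-state energy up to a constant, which gives a simple consistency check on the binning and weighting.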

    3.5 Calculation of protein–ligand affinity

    A major task in computational chemistry is to predict the strength of non-covalent binding of a small molecule (peptide, substrate or drug molecule) to the active site of a protein in terms of binding free energy (Equation 3). This is not a simple problem considering that measured binding constants are averages from large ensembles of dynamic molecules where both sides of the equilibrium (free protein and ligand as well as the complex) are represented. Nevertheless, affinity predictions based on the three dimensional structure of protein–ligand complexes can be accomplished by a number of different techniques each having its advantages and limitations with respect to accuracy and computational time. Three commonly used methods, which have been used in this work, will be presented below.

    Figure 4. Example of a well-sampled EVB simulation. Each interval of the reaction coordinate X contains data from several values of the mapping parameter λ and vice versa. [Plot: mapping parameter λ versus X (kcal/mol).]

    Free energy perturbation (FEP)

    The free energy perturbation method can be derived directly from statistical mechanics, as shown in a previous section, and does not rely on any calibration of empirical parameters other than those of the force field. This method is also the most computationally demanding, since extensive conformational sampling of the transformation is required for good convergence. In practice, this means that only relative binding free energies of rather similar systems can be obtained with good accuracy (for reviews in this field, see Refs.15-19). Calculation of relative binding free energies using FEP requires the use of a thermodynamic cycle and two separate perturbations, according to Figure 5: one perturbation where ligand A is transformed to ligand B free in solution, and one similar perturbation with the ligand bound to the solvated protein. The difference in binding free energy between ligand B and A can then be calculated from the energies corresponding to the two unphysical perturbations (vertical arrows) instead of the actual binding processes (horizontal arrows), which would be very complicated to model. The same methodology can also be used to calculate the effect of a mutation in the protein. The two analogous perturbations are then the transformation of one amino acid side chain into another, with and without the ligand bound. Free energy perturbations are used to calculate relative binding free energies in I and II.

    A(water) + P  ---- ∆Gbind(A) ---->  AP
         |                               |
    ∆GFEP(water)                   ∆GFEP(protein)
         |                               |
         v                               v
    B(water) + P  ---- ∆Gbind(B) ---->  BP

    $$
    \Delta\Delta G_{bind} = \Delta G_{bind}(B) - \Delta G_{bind}(A) = \Delta G_{FEP}(\mathrm{protein}) - \Delta G_{FEP}(\mathrm{water}) \tag{20}
    $$

    Figure 5. Thermodynamic cycle for the calculation of relative binding free energies using free energy perturbation.

    The linear interaction energy (LIE) method

    The LIE method introduced by Åqvist and co-workers was developed for calculation of absolute binding free energies.37,38 The binding free energy is expressed as a linear function of the difference in average intermolecular interaction energies of the solvated protein–ligand complex (p) and the solvated ligand (w). The average interaction energies are force field energies obtained from MD simulations, so this method also involves two separate simulations just like FEP, but no transformation between different states is needed, resulting in a less time-consuming method.
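The LIE estimate formalised in Equation 21 is a weighted sum of two differences in average ligand–surrounding energies plus a constant. A minimal sketch of the arithmetic is given here. The default coefficients are the values quoted in this section (α = 0.181, β = ½, γ = 0), whereas the function name and the four average energies are hypothetical and serve only to show how the terms combine.

```python
def lie_binding_free_energy(vdw_bound, vdw_free, el_bound, el_free,
                            alpha=0.181, beta=0.5, gamma=0.0):
    """Binding free energy (kcal/mol) from average ligand-surrounding energies.

    vdw_bound, el_bound: averages from the simulation of the solvated complex (p)
    vdw_free, el_free:   averages from the simulation of the solvated ligand (w)
    """
    return alpha * (vdw_bound - vdw_free) + beta * (el_bound - el_free) + gamma

# Hypothetical average interaction energies in kcal/mol.
dg_bind = lie_binding_free_energy(vdw_bound=-40.0, vdw_free=-25.0,
                                  el_bound=-60.0, el_free=-50.0)
print(f"estimated binding free energy: {dg_bind:.3f} kcal/mol")
```

Only the two end states are simulated, so the cost is that of two ordinary MD runs rather than a series of intermediate transformations.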

    $$
    \Delta G_{bind} = \alpha \left( \left\langle V_{l-s}^{vdW} \right\rangle_p - \left\langle V_{l-s}^{vdW} \right\rangle_w \right) + \beta \left( \left\langle V_{l-s}^{el} \right\rangle_p - \left\langle V_{l-s}^{el} \right\rangle_w \right) + \gamma \tag{21}
    $$

    The interaction energy of the ligand and its surrounding (denoted l-s in Equation 21) is separated into a non-polar (vdW) and an electrostatic (el) term. The coefficient β describing the contribution from the electrostatic interactions to the total free energy is ½ according to the linear response approximation, which is at least valid for charged ligands.39 The coefficients α and γ are left for calibration against experimental data. The initial parametrisation using a set of proteins with similar polar binding sites resulted in α = 0.181, whereas the constant term γ was close to zero and therefore excluded.38 The values of α and β appear to be relatively robust for many sets of proteins and ligands. However, recent studies show that the value of γ cannot be generalised for all systems. For example, a set of thrombin inhibitors required γ = -2.9 for good correlation between calculated and measured absolute affinities.40 Advances in the use of the LIE method as well as approaches to calibrate Equation 21 are reviewed in Ref. 41. The LIE method is used in the studies of DHFR inhibitors summarised in Chapter 6 and in VII and VIII.

    Empirical scoring functions

    Empirical scoring functions are by far the quickest methods for estimating binding free energies, since these functions are intended to work on a single protein–ligand conformation. Hence, this type of scoring approach neglects thermal averaging and the explicit unbound state of the binding process, which are included in the FEP and LIE methods. An affinity score is primarily based on a simple energy function as exemplified below:

    $$
    \Delta G_{bind}^{score} = \Delta G_{H-bond} N_{H-bond} + \Delta G_{ionic} N_{ionic} + \Delta G_{lipo} N_{lipo} + \Delta G_{aro} N_{aro} + \Delta G_{rot} N_{rot} + \Delta G_0 \tag{22}
    $$

    Here the affinity score ∆Gbind(score) is a sum of contributions from hydrogen bonds, ionic, lipophilic and aromatic interactions, the number of rotatable bonds in the ligand and a constant term. Some scoring functions also include the important contribution from solvation effects. N… in the first four terms are the numbers of interactions between the protein and ligand. Usually, only short-range interactions are included, so N is expressed as a distance dependent function with a value between 0 and 1 for each counted protein–ligand interaction. Nrot is the number of rotatable bonds in the ligand and reflects entropic contributions to the binding energy. The coefficients ∆G… are empirically determined using linear regression analysis from a set of protein–ligand complexes with known binding constants. Thus, the quality of the score is limited by the number of empirical parameters and the amount of experimental data available for calibration. The speed of evaluation makes the empirical scoring functions suitable in docking programs and in applications where large virtual libraries of chemical compounds are screened for affinity.

    Automated docking is a useful technique for predicting the favoured conformation of a protein–ligand complex. Different binding modes are generated by a search algorithm and each conformation is evaluated using a scoring function. The conformation corresponding to the best score is the outcome of such a docking procedure and consequently, the quality of the docking is highly dependent on the quality of the scoring function. It is of crucial importance that the conformation giving the best score is as close as possible to the ‘correct’ structure. By letting a docking program re-dock protein–ligand complexes of known structure one can easily test its performance. Docking a flexible ligand to a rigid protein is a relatively simple task, i.e. finding an energy minimum on the potential energy surface given by the scoring function. However, the commonly used approach with a rigid protein structure is not always accurate, because the protein may undergo conformational changes upon ligand binding as illustrated in IV. Tight binding may also be accomplished by water mediated interactions, which are difficult to predict in a straightforward way. There are a large number of docking programs with associated scoring functions available today: AutoDock,42 FlexX,43 DOCK,44 GOLD,45 just to mention a few. Two of these programs are used in VII and VIII where the interactions between small organic compounds and dihydrofolate reductase are studied.
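The calibration of an empirical scoring function of the form of Equation 22 amounts to a linear least-squares fit. The following sketch illustrates the idea on synthetic data only: the interaction counts, the "true" coefficients and the function names are all hypothetical and do not correspond to any published scoring function or to the programs used in this work.

```python
import numpy as np

TERMS = ["H-bond", "ionic", "lipo", "aro", "rot"]

def fit_scoring_function(counts, measured_dg):
    """Least-squares fit of one coefficient per term plus a constant."""
    # Append a column of ones so that the last coefficient is the constant term.
    design = np.column_stack([counts, np.ones(len(counts))])
    coeffs, *_ = np.linalg.lstsq(design, measured_dg, rcond=None)
    return coeffs

def score(counts_row, coeffs):
    """Affinity score of one protein-ligand conformation."""
    return float(np.dot(counts_row, coeffs[:-1]) + coeffs[-1])

rng = np.random.default_rng(0)
# Hypothetical coefficients (kcal/mol); the last entry is the constant term.
true_coeffs = np.array([-1.2, -2.0, -0.04, -0.2, 0.3, -1.3])
# Synthetic training set: 30 complexes with fractional interaction counts.
counts = rng.uniform(0.0, 10.0, size=(30, len(TERMS)))
measured = counts @ true_coeffs[:-1] + true_coeffs[-1]

fitted = fit_scoring_function(counts, measured)
print("fitted coefficients:", np.round(fitted, 3))
print("score of first complex:", round(score(counts[0], fitted), 3))
```

With noise-free synthetic data the fit recovers the input coefficients exactly; with real binding data the quality of the fit depends on the size and diversity of the training set, as noted above.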


    4. Computational studies of protein tyrosine phosphatases

    All processes taking place in living cells are regulated by complex signalling systems in order to guarantee the correct response of the cell to an external signal. Protein tyrosine phosphorylation is a central regulatory mechanism in these cell signalling networks. The addition of phosphates to tyrosine residues may change the catalytic activity of enzymes or provide specific binding sites for other proteins in the cell. The regulatory tyrosines are phosphorylated by specific protein tyrosine kinases (PTKs), which use ATP as the phosphate source. However, the level of tyrosine phosphorylation is also dependent on the specific action of the protein tyrosine phosphatases (PTPs), which counteract the PTKs by removing the phosphate groups from the same residues.46 Tyrosine phosphorylation is involved in the regulation of many important processes such as growth control, proliferation, differentiation, T-cell response, adhesion and insulin action,47 and perturbations in the level of tyrosine phosphorylation may underlie diseases such as cancer and diabetes. Much research effort has been focused on the functions of the PTK and PTP enzyme families due to their biomedical relevance. However, much remains to be learned about the phosphatases in particular, since these enzymes have only been studied during the last two decades. This work concerns the structure and catalytic mechanism of the protein tyrosine phosphatases, and the kinases will not be discussed further.

    4.1 Protein tyrosine phosphatases

    The family of protein tyrosine phosphatases can be divided into three subclasses. i) The tyrosine specific phosphatases constitute the largest group of PTPs and have been extensively reviewed in the literature.46-54 The first tyrosine specific phosphatase to be characterised and crystallised was PTP1B from human placenta55-57 and after that 50-100 homologous phosphatases have been identified.58 They are either membrane integrated receptors or cytosolic proteins.59 This diverse group of proteins all share at least one common catalytic domain of about 240 residues and in most cases, additional cellular localisation domains are associated with it. ii) The low molecular weight PTPs are ubiquitous cytosolic enzymes of 140-180 residues with no additional localisation domains. iii) The dual specific phosphatases (DSPs) can dephosphorylate pSer and pThr in addition to phosphotyrosines due to a somewhat more shallow active site. The DSPs can be divided into two groups characterised by the cdc25 phosphatases, responsible for the activation of the cyclin-dependent kinase complex, and the VHR (vaccinia virus late H1 gene related) phosphatase, respectively (for a review on DSPs see Ref. 60). There is also a large number of specific serine/threonine phosphatases, which have a totally different strategy for catalysis. These enzymes utilise bound metal ions for catalysing the reaction, which is not the case for the PTPs and DSPs.


    There has been much progress in the structural investigations of the PTPs and crystal structures representing all subclasses are available.57,61-65 Structural data show that the low molecular weight PTPs and cdc25 phosphatases are neither sequentially nor structurally homologous to the tyrosine specific phosphatases. Yet, they share the same active site architecture, implying a common catalytic mechanism. In this work, the bovine liver low molecular weight (low Mr) PTP was used for modelling the catalytic mechanism of the PTPs. This enzyme was chosen because our collaborators in the group of Nordlund provided us with high-resolution X-ray structures. In addition, large amounts of enzymological data are reported in the literature.

    The low molecular weight PTPs

    The low Mr PTPs have a molecular weight of 18 kD and are found in a wide range of organisms from prokaryotes and yeast to mammals (for a review see Ref. 66). The enzyme dephosphorylates a number of phosphotyrosine containing peptides and proteins in addition to other small arylphosphates in vitro, but the natural substrates are not yet identified and the biological function is still unclear. Nevertheless, overexpression of low Mr PTP has been shown to prevent Src binding to the platelet derived growth factor (PDGF) receptor, thus decreasing PDGF induced mitotic signalling.67 It is also known that the ephrin receptor recruits low Mr PTP upon binding of clustered multimeric ephrin B1.68 This receptor is part of the Eph family of tyrosine kinase receptors involved in axon development.69 The low Mr PTP can itself be regulated by tyrosine phosphorylation70 and recent data on a crystallised dimer suggest a potential self-regulating mechanism.71

    4.2 The structure and catalytic mechanism of PTPs (V)

    All PTPs possess the active site signature motif H/V-C-(X)5-R-S/T comprising the characteristic phosphate binding loop, referred to as the P-loop. The backbone amide NH-groups of the P-loop residues are oriented towards the centre of the substrate binding crevice, forming a phosphate anion hole (Figure 6). As an extension of the P-loop backbone, the guanidinium group of the invariant arginine side chain is involved in binding the substrate and stabilising the transition states, by forming a bidentate interaction with two of the non-bridging oxygens of the substrate phosphate group. The hydroxyl group of the serine/threonine residue immediately after the arginine (not in cdc25) forms an important hydrogen bond with the catalytic cysteine.72,73 In addition to the P-loop, all PTPs (except possibly cdc25) also feature a conserved aspartic acid residue positioned on a more or less flexible loop close to the active site. In most PTP crystal structures this side chain is within hydrogen bond distance of the bridging oxygen of the ligand. Therefore, this residue is believed to function as a general acid which donates its proton to the leaving group oxygen.


    Figure 6. The active site structure of a typical PTP. Here, low Mr PTP in complex with sulphate,63 a competitive inhibitor. The nucleophilic cysteine resides at the bottom of the P-loop formed by the backbone of the conserved sequence C-X5-R-S. Hydrogen bonds are indicated with dashed lines. [Labelled residues: Cys12, Arg18, Ser19 and Asp129.]

    PTPs catalyse the hydrolysis of phosphotyrosines yielding inorganic phosphate and the dephosphorylated residue as products. The fact that active site structures and kinetic properties, such as formation of a cysteinyl phosphate intermediate, pH-rate profiles etc., are similar for different types of PTPs72,74,75 indicates that they all employ a common mechanism for catalysis. The catalytic reaction in PTPs has been shown to proceed via a double displacement mechanism involving a phosphoenzyme intermediate where the phosphate group is covalently bound to the cysteine residue in the active site motif.76-78 The formation of this thiophosphate intermediate is accomplished by a substitution reaction where the catalytic cysteine attacks the phosphorus atom and the leaving group oxygen is protonated by the general acid as the P–O bond is cleaved.79,80 This aspartate residue subsequently activates a water molecule which hydrolyses the phosphorylated cysteine in the following step (Figure 7). It is most likely that the catalytic cysteine is in its ionised form when the first nucleophilic displacement takes place. pKa estimations by enzyme inactivation experiments indicate that the cysteine may also be ionised in the free enzyme at physiological pH.81-84 The identification of the important catalytic residues and their function has been verified from a wealth of experiments including enzyme kinetics, site directed mutagenesis and structure determination. Hence, the basis for catalysis is well known, but there are some uncertainties regarding the interpretation of the available data and several contradictions are found in the literature. In particular, the pKas and ionisation states of the catalytic groups are not well established (Figure 7). These issues have also been the subject of computational studies aimed at a more detailed picture of phosphate hydrolysis by the PTPs.72,85-88

    In V these experimental data and theoretical calculations on the PTP reaction mechanism are reviewed. We have found that the general model of the mechanism with an ionised nucleophile and a dianionic substrate may be questioned on the basis of available data. The model is mainly based on pH-rate profiles which do not have a unique interpretation. Nor is it supported by structural data, since available crystal structures indicate that only two negative charges are preferred in the active site. Furthermore, none of the recent theoretical studies of the PTPs have supplied evidence for this ionisation state. An alternative model would be the case where one proton is present on the nucleophile and the substrate in the enzyme–substrate complex. Depending on the relative pKa of these groups the hydrogen would either be bound to the sulphur or to one of the equatorial oxygens of the phosphate group when the Michaelis complex is formed. In fact, theoretical studies as well as analysis of crystal structures support such a mechanistic model without being in conflict with enzymological data.

    The following section describes how the empirical valence bond method was used to study the energetics of the reaction catalysed by low Mr PTP. Different plausible reaction pathways and protonation states were investigated. The aim was to find a reaction mechanism that is compatible with the experimental observations and to understand the specific interactions important for catalysis. By combining the EVB results with binding free energy calculations the most probable protonation state of the reacting groups could be determined. The consistency of the modelled reaction mechanism was further verified by studies of mutant enzymes.

    Figure 7. The reaction mechanism catalysed by the PTPs. The two proposed protonation states of the substrate are indicated.


    Figure 8. Valence bond states used in the EVB calculations of the reaction catalysed by low Mr PTP. The formal charges of the reacting groups are indicated. [Structural diagrams of states Φ1–Φ8 and Ψ2–Ψ4 not reproduced.]


    4.3 Modelling the reaction mechanism in low Mr PTP (I-III)

    The phosphate hydrolysis reaction was divided into a series of elementary reaction steps each defined as the conversion between two valence bond (VB) states. The final VB structures used are shown in Figure 8. The first step of the reaction (Φ1→Φ2) represents activation of the nucleophile by proton transfer from the cysteine to the dianionic phosphate group of the substrate.72 The next step is the formation of a transient high energy penta-coordinated structure (Φ3), followed by release of the leaving group with concerted proton transfer from the general acid residue (Φ3→Φ4). In Φ4 the phosphate group is covalently bound to the enzyme via a thiophosphate linkage. In the second part of the reaction the leaving group is replaced by a water molecule which hydrolyses the phosphoenzyme via a second penta-coordinated structure (Φ5→Φ6). Inorganic phosphate is obtained as the final product when the S–P bond is broken (Φ6→Φ7) and finally, one of the protons is transferred back to the cysteine (Φ7→Φ8) restoring the initial state of the enzyme. The first substitution reaction was also examined by simulating a plausible pathway for an unprotonated mechanism (Ψ2→Ψ3→Ψ4) with a total charge of –3 on the reacting fragments. In this case there is no proton transfer between the nucleophile and the phosphate group and the negatively charged cysteine reacts directly with the phosphorus atom of the dianionic substrate. The EVB Hamiltonians of the different reaction steps simulated in water were calibrated utilising experimental data from uncatalysed reactions, linear free energy relationships as well as semi-empirical and ab initio geometry optimisations, as described in detail in I. The corresponding reactions in the solvated enzyme structure were then simulated in an analogous manner and the catalytic effect due to the change in environment, from water to protein, could be evaluated.
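    The conversion between two VB states described above can be illustrated with a minimal two-state sketch. It assumes the standard two-state EVB form, in which the ground-state energy is the lowest eigenvalue of a 2×2 Hamiltonian built from the energies of the two VB states and an off-diagonal coupling, and in which sampling is driven along a linear mapping potential. The function names, the constant coupling and all numerical values are illustrative assumptions; they are not taken from papers I-III or from the program Q.

```python
import math


def evb_ground_state(e1, e2, h12):
    """Lowest eigenvalue of the 2x2 Hamiltonian [[e1, h12], [h12, e2]].

    e1, e2: energies of the two valence bond states (kcal/mol).
    h12:    off-diagonal coupling (kcal/mol), assumed constant here.
    """
    mean = 0.5 * (e1 + e2)
    gap = e1 - e2
    return mean - 0.5 * math.sqrt(gap * gap + 4.0 * h12 * h12)


def mapping_potential(e1, e2, lam):
    """Linear mapping potential used to drive the system from state 1
    (lam = 0) to state 2 (lam = 1)."""
    return (1.0 - lam) * e1 + lam * e2


# Illustrative numbers only: two states that cross, coupled by 10 kcal/mol.
crossing_energy = evb_ground_state(30.0, 30.0, 10.0)
```

    At the crossing point of the two states the coupling lowers the ground state by exactly its own magnitude, which is why the off-diagonal element is one of the quantities that has to be calibrated against the reference reaction in water.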

    Substrate dephosphorylation

    We found that for the protonated reaction the protein environment facilitates proton transfer from Cys12 to the phosphate group of the substrate (Φ1→Φ2), thus ensuring availability of the nucleophilic anion for the substitution reaction. This proton transfer would then correspond to a substrate assisted reaction mechanism if the cysteine is in its thiol form in the free enzyme. A small difference in free energy between Φ1 and Φ2 (1.5 kcal/mol) indicates that the pKa of the cysteine is close to that of the substrate, i.e. it is lowered by the enzymatic environment. It has previously been shown that, among other interactions, a hydrogen bond from Ser19 is important for lowering the pKa of Cys12.72,73
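    The link between a proton transfer free energy and a pKa difference can be made explicit with a short calculation. The sketch below assumes the usual relation ∆G = 2.303·RT·∆pKa and a temperature of 298.15 K, which gives about 1.36 kcal/mol per pKa unit; the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def kcal_per_pka_unit(temperature_k=298.15):
    """Free energy equivalent of one pKa unit, 2.303*R*T, in kcal/mol."""
    return math.log(10.0) * R_KCAL * temperature_k


def pka_shift(delta_g_kcal, temperature_k=298.15):
    """pKa difference corresponding to a proton transfer free energy."""
    return delta_g_kcal / kcal_per_pka_unit(temperature_k)


# Free energy difference between the two states of the proton transfer step.
shift_phi1_phi2 = pka_shift(1.5)
```

    Under these assumptions the 1.5 kcal/mol difference between Φ1 and Φ2 corresponds to roughly one pKa unit, in line with the statement that the two pKa values are close.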


    The simulations show that the presence of the protein structure lowers the transition state energies of the protonated reaction (Φ1→Φ4) by 9 kcal/mol compared to the water reaction, resulting in a transition state energy of 14 kcal/mol, in perfect agreement with the experimental turn-over rate of this step with pNPP as substrate at pH 5 (540 s–1, ref. 89; 789.5 s–1, ref. 79). This transition state stabilisation originates almost entirely from non-bonded electrostatic stabilisation by the P-loop residues. Surprisingly, the corresponding reaction of the deprotonated complex (Ψ2→Ψ4) is catalysed by as much as 20 kcal/mol, resulting in the same barrier height of 14 kcal/mol. Also here the P-loop contributes with the very same electrostatic stabilisation. The additional 10 kcal/mol comes from the large substrate destabilisation, which originates from electrostatic repulsion between the reactants when they carry a total charge of –3. The restraints imposed on the substrate by the protein do not allow this repulsion to relax as much as in water. Thus, the unprotonated reaction represents an electrostatically strained enzyme–substrate complex, suggesting that the substrate may not bind at all. An unfavourable state of the enzyme–substrate complex was also reflected in a calculated exothermicity of 13 kcal/mol for the unprotonated (Ψ2→Ψ4) reaction. The difference in free energy between Ψ4 and Φ4 is given by 1.36·(pH–pKa), where the relevant pKa is that of the thiophosphate group in the enzyme. This pKa value is most likely close to the pH normally used in experiments and thus, only a small free energy difference between Ψ4 and Φ4 is expected. Here, the calculated difference is 16 kcal/mol, which indicates that the free energy profiles for the two simulated reactions are shifted relative to each other and that the exothermicity results from destabilisation of the reactants (Ψ2) rather than a large stabilisation of the phosphoenzyme intermediate (Ψ4).

    Simulations of the P–O bond cleavage and leaving group departure clearly indicate that bond cleavage at the bridging oxygen has to be concerted with protonation of the leaving group in order to suppress a charge separation in the active site. This is in agreement with interpretations of solvent isotope effects and proton inventory experiments which suggest that the proton from the general acid is largely transferred to the bridging oxygen in the transition state.90 The bond cleavage was first simulated along a stepwise pathway with consecutive P–O bond breaking and proton transfer, via a phenolate species (Figure 9). This pathway was predicted to be energetically unfavourable in the enzyme, yielding barriers of ∼22 and ∼35 kcal/mol for the protonated and unprotonated reactions respectively. A developing negative charge on the leaving group oxygen apparently cannot be stabilised by the relatively hydrophobic surroundings in this region and, since the binding cavity is very narrow, solvating water molecules are excluded from the active site. The concerted pathway (Φ3→Φ4) is strongly facilitated by the enzyme and the resulting negative charge on Asp129 is, unlike the phenolate ion, accessible to solvent.
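    The correspondence between measured turn-over rates and free energy barriers used in this section can be checked with a short transition state theory estimate. The sketch assumes the Eyring expression with a transmission coefficient of one and a temperature of 298.15 K; neither assumption is stated in the text, and the function name is hypothetical.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def barrier_from_rate(rate_per_s, temperature_k=298.15):
    """Activation free energy (kcal/mol) from a first-order rate constant,
    using k = (kB*T/h) * exp(-dG/(R*T))."""
    prefactor = K_B * temperature_k / H  # about 6e12 s^-1 near room temperature
    return R_KCAL * temperature_k * math.log(prefactor / rate_per_s)


# Turn-over rates quoted for the first step with pNPP as substrate.
barriers = [barrier_from_rate(k) for k in (540.0, 789.5)]
```

    With these assumptions both quoted rates translate into barriers a little under 14 kcal/mol, which is the level of agreement with the calculated transition state energy referred to above.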


    Substrate binding

    The EVB simulations indicate that the unprotonated enzyme–substrate complex is energetically strained. The ability to bind the substrate is of course a prerequisite for enzyme catalysis and an important step in the reaction. The binding step should therefore be considered for a complete understanding of the enzyme energetics. In this case, substrate binding appeared to be the key to resolving the question of a monoanionic or dianionic substrate. We examined the issue by evaluating the difference in substrate affinity for the two different protonation states. This was performed using free energy perturbation (FEP) calculations where the substrate phenyl phosphate was transformed from monoanion to dianion in aqueous solution and in the solvated protein with Cys12 in its anionic form, according to the thermodynamic cycle in Figure 5. The calculated difference in binding free energy was 15.9±0.9 kcal/mol for the monoanion to dianion perturbation, indicating that there is much less affinity for a dianionic substrate than a monoanionic substrate with Cys12 ionised. The result also shows that the pKa of the substrate–nucleophile complex is highly elevated. This was confirmed by a corresponding FEP calculation where the cysteine was transformed from thiol to thiolate in the absence and presence of a dianionic substrate. Combining these two perturbations with the EVB result from the proton transfer step we can close the thermodynamic cycle depicted in Figure 10. Average MD structures from the perturbation showed that the distance between the nucleophile and the phosphorus atom increased significantly (from 3.6 to 4.6 Å) due to electrostatic repulsion, as the perturbation proceeded from monoanion to dianion. Superposition of the MD structures from the protonated and unprotonated states and the crystal structure showed that the overall P-loop structure was significantly distorted in the case where the proton was absent. On the other hand, with Cys12 ionised and a

    Figure 9. The energetics of step-wise and concerted leaving group departure and proton transfer from the general acid. The numbers are the calculated activation energies in kcal/mol for the two simulated pathways. [Diagram not reproduced; the two pathways are labelled 14 and 22.]


    proton on one of the oxygens the average MD structures were in excellent agreement with the crystal structure. In accordance with the calculated energetics the MD structures showed that a dianionic substrate, although having favourable interactions with the P-loop amide nitrogens and the positively charged Arg18, is in an electrostatically disfavoured environment. The destabilisation of the unprotonated ES complex obtained from the binding calculations agrees well with the exothermicity of 13 kcal/mol observed in the Ψ2→Ψ3→Ψ4 simulations with no proton present on the reacting groups. This allows us to rather accurately close the thermodynamic cycle describing the states involved in the two possible reaction pathways. The difference in binding free energy shifts the unprotonated ES state (Ψ2) by +16 kcal/mol relative to the corresponding protonated state (Φ2) and as a result, the levels of Ψ4 and Φ4 closely coincide, as expected (see Figure in I or II).

    Phosphoenzyme hydrolysis

    The second step of the reaction, phosphoenzyme hydrolysis, was simulated analogously to the first reaction step, but here only the protonated mechanism was considered. As can be seen in Figure 11, where the complete reaction is summarised, all steps of the reaction are significantly catalysed by the enzyme compared to the uncatalysed reference reaction in water. In particular, the activation barrier of the rate limiting step, formation of the second penta-coordinated high-energy structure (Φ5→Φ6), is lowered by as much as 15 kcal/mol. The calculated rate limiting barrier is 16 kcal/mol which is in excellent agreement with the reported kcat value of 27.5 s–1 for phenyl phosphate.89 The larger catalytic effect on the second reaction step compared to the first reflects an increased pKa of the aspartic acid. This pKa perturbation improves its role as a general base, beneficial for the rate limiting step, at the expense of its role as a general acid. The increased pKa was confirmed by free energy perturbation calculations where the aspartic acid was ionised free in solution and in the enzyme–substrate complex. The calculated difference in ionisation free energy (∆∆G) was 4.2 kcal/mol corresponding to a pKa value increased by 3 units.

    Figure 10. Thermodynamic cycle describing the ionisation of the substrate–nucleophile complex. ∆∆G refers to calculated ∆Gprotein–∆Gwater in kcal/mol. [Cycle diagram not reproduced. Recoverable labels: monoanion + thiolate → dianion + thiolate, ∆∆G = 16 ± 1; two further legs labelled ∆∆G = –2 ± 1 and ∆∆G = –13 ± 1.]
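    How well the cycle in Figure 10 closes can be checked by summing its legs. The sketch below is only a bookkeeping illustration: it assumes that the three ∆∆G values are signed consistently for one traversal of the cycle and that their uncertainties are independent, neither of which can be confirmed from the figure labels alone. The function name is hypothetical.

```python
import math


def cycle_residual(legs):
    """Sum the free energy changes around a closed cycle.

    legs: list of (value, uncertainty) pairs in kcal/mol, all signed for
    the same direction of traversal. Returns the residual and the
    combined uncertainty, assuming independent errors.
    """
    total = sum(value for value, _ in legs)
    sigma = math.sqrt(sum(err * err for _, err in legs))
    return total, sigma


# Values as labelled in Figure 10 (signs assumed to follow one traversal).
legs = [(16.0, 1.0), (-2.0, 1.0), (-13.0, 1.0)]
residual, sigma = cycle_residual(legs)
closes_within_error = abs(residual) <= sigma
```

    Under these assumptions the residual is smaller than the combined uncertainty, which is what closing the cycle amounts to numerically.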


    The role of Cys17

    MD trajectories of the wild type phosphoenzyme intermediate showed that two to three water molecules interact directly with the phosphate group. One of these water molecules will be in the right position for the hydrolysis reaction to occur. In PTP1B Gln262 has been found to be an important residue for coordinating the nucleophilic water molecule. Mutating this residue to alanine resulted in phosphoenzyme trapping which made it possible to crystallise the reaction intermediate.78 Although very similar in active site structure, there is no corresponding glutamine present in low Mr PTP. However, our simulations of the hydrolysis reaction (Φ5→Φ6) showed that Cys17 interacts with the nucleophilic water. It seems that this interaction is involved in coordinating the water molecule in favour of the reaction. The involvement of Cys17 in the phosphoenzyme hydrolysis step was proposed by Cirri et al.91 already in 1993, before the structure was solved. When Cys17 was mutated to a serine the enzyme displayed low activity, but significant amounts of phosphoenzyme intermediate were trapped. This suggests that the larger thiol group better orients the water molecule than the smaller hydroxyl group in position 17. We therefore calculated the free energy profile for the water attack (Φ5→Φ6) in the Cys17Ser mutant enzyme and it was found that the free energy barrier increased by 1.6 kcal/mol. This is fully consistent with the 6% residual activity compared to the wild type enzyme presented by Davis et al.92 Superimposing the active site residues of PTP1B and low Mr PTP reveals that the proposed water coordinating residues (Gln262 in PTP1B and Cys17 in low Mr PTP) are in the same spatial position relative to the active site, although not sequentially related.
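    The comparison between a calculated barrier increase and a measured residual activity can be sketched as follows. The example assumes that the rate scales as exp(–∆G‡/RT), so that a change in barrier translates directly into a rate ratio, and uses 300 K, the temperature of the simulations; the temperature of the kinetic experiments is not given here. The function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def relative_rate(barrier_increase_kcal, temperature_k=300.0):
    """Rate of the mutant relative to the wild type for a given increase
    in activation free energy (kcal/mol)."""
    return math.exp(-barrier_increase_kcal / (R_KCAL * temperature_k))


# Barrier increase calculated for the Cys17Ser mutant (water attack step).
residual_activity = relative_rate(1.6)
```

    With these assumptions a 1.6 kcal/mol higher barrier leaves a few per cent of the wild type rate, of the same size as the 6% residual activity quoted above.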

    Figure 11. Free energy profile for the reaction catalysed by low Mr PTP. The reaction coordinate refers to the valence bond states shown in Figure 8. The solid curve is the reference reaction simulated in water and calibrated against experimental data. The dashed curve shows the calculated free energy profile for the corresponding reaction in the enzyme. [Plot not reproduced; axes: ∆G (kcal/mol), –5 to 30, against reaction coordinate Φ1, Φ2, Φ3, Φ4/5, Φ6, Φ7, Φ8; curves labelled water and low Mr PTP.]


    Cys17 is a residue in the phosphate binding loop, whereas Gln262 is positioned in a flexible loop that can apparently move in and out of the active site.78

    An alternative reaction mechanism for mutants lacking the general acid

    The Asp129Ala mutant of low Mr PTP has been extensively studied by enzymological experiments. This mutant lacks the catalytically important general acid/base residue. However, the mutant is not entirely inactive, but retains a turn-over rate around 3000 times slower than that of the wild-type enzyme.79 The EVB calculations showed that protonation of the leaving group is essential for catalysis of phenyl phosphate hydrolysis since release of a negatively charged phenolate species is energetically disfavoured. If the leaving group was modelled to depart as an anion the calculations predicted an energy barrier that is not compatible with the experimentally observed activity. This led to the hypothesis that the phosphate group itself may act as an acid in the first reaction step of this mutant and protonate the leaving group concertedly with its release. The alternative reaction mechanism for Asp129Ala was simulated in the same way as the wild type reaction, but with a slightly different set of valence bond states. This alternative protonation of the leaving group yielded an activation barrier of the first step that was 5 kcal/mol higher than the corresponding wild type reaction step. This corresponds to a decrease in rate by a factor of 4000 which is consistent with experiments. It then seemed reasonable that the –2 charged phosphocysteine could itself abstract a proton from the attacking water molecule in the second hydrolysis step of the Asp129Ala mutant enzyme. This mechanism would then be similar to the substrate assisted reaction mechanism proposed for the acylphosphatase.93 The complete free energy profile for this reaction mechanism in the Asp129Ala mutant is shown in Figure 12. The free energy level of the phosphoenzyme intermediate lies somewhat below the initial enzyme–substrate level. From this lowest point of the profile the rate limiting

    Figure 12. Calculated free energy profile for wild type and Asp129Ala mutant low Mr PTP. [Plot not reproduced; axes: ∆G (kcal/mol), –5 to 20, against reaction coordinate Φ1, Φ2, Φ3, Φ4/5, Φ6, Φ7, Φ8; curves labelled low Mr PTP and Asp129Ala mutant.]


    barrier Φ5→Φ6 was calculated to be 20 kcal/mol which is in accordance with the observed turn-over rate of 0.012 s–1.79 For the wild type enzyme, the phosphoenzyme intermediate (Φ4/5) is higher in energy than the initial enzyme–substrate complex, while in the mutant it is slightly lower. This would imply that more phosphoenzyme intermediate accumulates in the Asp129Ala mutant than in the wild-type, which has also been observed by phosphoenzyme trapping experiments.79 The results from these simulations suggest that mutations of the general acid/base residue may enforce a change in the reaction mechanism.

    Conclusions

    Arylphosphate hydrolysis is effectively catalysed by the PTPs without the use of active site bound cations utilised by many other proteins that handle phosphorylated substrates. The catalytic power of the PTPs instead arises from the perfectly designed active site structure which stabilises each step of the reaction by electrostatic interactions. The major properties that contribute to catalysis can be summarised as follows:

    i) The essential nucleophilic thiolate species is stabilised by the interaction with a hydroxyl group and a number of backbone amide hydrogen bonds. This stabilisation lowers the pKa of the cysteine, favouring its activation (Φ1→Φ2).

    ii) The P-loop backbone amides and the side chain of the arginine residue provide perfect electrostatic stabilisation of the equatorial oxygens of the penta-coordinated transition states by a network of hydrogen bonds.

    iii) An increased pKa of the general acid/base residue results in a larger catalytic effect in the second, rate limiting step where the water molecule is activated by the general base (Φ5→Φ6), compared to the first step where the same residue acts as an acid (Φ3→Φ4).

    iv) The proton transfer has to occur synchronously with the P–O bond breakage to suppress a negative charge on the leaving group.

    v) The hydrolytic water molecule is stabilised by interactions with Cys17.

    The effect of these structural features on the phosphate hydrolysis reaction is demonstrated by the reported EVB calculations and the obtained free energies of each reaction step are compatible with experimental observations. The fact that also the energetics of low Mr PTP mutants (Asp129Ala and Cys17Ser) are fully consistent indicates that the present computational modelling approach can successfully describe the catalytic process in a PTP.

    Most importantly, the calculations show that the P-loop is designed to stabilise exactly two negative charges, which means that the reacting fragments (nucleophile and phosphate group) must be singly protonated in the Michaelis complex. This is in contradiction to the previously proposed reaction mechanism, but fully compatible with the reported structural, enzymological and theoretical data as discussed in V. Establishing the protonation state of the groups involved is essential for fully understanding the energetics of the catalytic mechanism and thus, the results presented


    here could serve as a framework in which enzymological experiments may be interpreted.

    Simulation details

    All MD/FEP/EVB calculations were carried out using the program Q94 developed in our lab. The force field parameters for the different VB states were taken as far as possible from the GROMOS87 potential,95 which was also used to model the rest of the protein. However, bonds within the reacting fragments were represented by Morse potentials using standard bond lengths and dissociation energies. Charges for the non-standard moieties involving S–P bonding were also derived from AM1-SM2 calculations and merged with those of the standard GROMOS fragments to maintain compatibility with these. Charges and van der Waals parameters for the thiolate and phosphate species were those developed by Hansson et al.72 The protein coordinates used in the MD simulations were those of bovine liver low Mr PTP in complex with sulphate ion (PDB entry 1PHR).63 The simulation system was spherical with a radius of 16 Å and the VB structures were surrounded by SPC water in the solution (calibration) simulations and by the solvated X-ray structure of the protein in the enzyme simulations. Water molecules generated closer than 2.3 Å to the protein or crystal waters were removed. Protein atoms outside the simulation sphere were restrained to their crystallographic coordinates and interacted only via bonds, angles and torsions across the boundary. A non-bonded cut-off radius of 10 Å was used together with the local reaction field (LRF) method96 for longer range electrostatics. The water surface was subjected to radial and polarisation surface restraints according to a new model described by Marelius et al.94 After equilibration the MD trajectories were run at 300 K using a time step of 1 fs and energy data were collected every fifth step. The free energy perturbations were sampled using around 50 λ-points and 5 ps simulation for each value of λ. Data from the first 2 ps of each step were discarded for equilibration.
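    The sampling protocol described above can be illustrated with a generic free energy perturbation sketch. It uses the textbook exponential-averaging (Zwanzig) estimator between neighbouring λ-points and the stated settings (5 ps per λ-point, 1 fs time step, energies saved every fifth step, first 2 ps discarded). It is not the implementation in Q; the function names and the input layout are assumptions made for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def samples_per_window(window_ps=5.0, discard_ps=2.0, timestep_fs=1.0, stride=5):
    """Number of energy samples kept per lambda-point after discarding
    the equilibration part of the window."""
    total_steps = round(window_ps * 1000.0 / timestep_fs)
    discarded_steps = round(discard_ps * 1000.0 / timestep_fs)
    return (total_steps - discarded_steps) // stride


def zwanzig_delta_g(energy_gaps_kcal, temperature_k=300.0):
    """Free energy change between two neighbouring lambda-points from
    energy gaps dU = U(next) - U(current) sampled on the current one:
    dG = -RT * ln < exp(-dU/RT) >.
    The smallest gap is factored out to keep the exponentials bounded."""
    kt = R_KCAL * temperature_k
    lowest = min(energy_gaps_kcal)
    mean_boltzmann = sum(
        math.exp(-(gap - lowest) / kt) for gap in energy_gaps_kcal
    ) / len(energy_gaps_kcal)
    return lowest - kt * math.log(mean_boltzmann)


def total_delta_g(gaps_per_window, temperature_k=300.0):
    """Sum the window-to-window free energy changes along the lambda path."""
    return sum(zwanzig_delta_g(gaps, temperature_k) for gaps in gaps_per_window)
```

    With the stated settings each λ-point contributes 600 retained energy samples, which gives a feeling for the amount of data behind each point on the free energy profiles.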
This sampling protocol yielded a standard deviation of 0.2–0.5 kcal/mol at an interval X’ on the reaction coordinate. To exclude the risk of force field dependent results the energetics of the first displacement reactions (Φ1→Φ2→Φ3→Φ4 and Ψ1→Ψ2→Ψ3) were verified using the CHARMM22 force field11 as implemented in Q.94 These simulations were performed using the same simulation protocols as above, but with completely independent atomic charges and atom type parameters. Other differences compared to the GROMOS simulations were an all-atom representation of the protein and the TIP3P water model.97 The free energy profiles obtained using CHARMM22 agreed almost perfectly with those of GROMOS87 (Figure 13) and the transition state energies differed by less than 1 kcal/mol.


    [Plot not reproduced; axes: free energy [kcal/mol], 0 to 12, against reaction coordinate Φ1, Φ2, Φ3; curves labelled CHARMM22 and GROMOS87.]

    Figure 13. Calculated reaction free energy profiles representing the two