A Mathematical Model for Peptide Inhibitor Design
*XIAODONG PANG,1,2 *LINXIANG ZHOU,1 MINGJUN ZHANG,3
FANG XIE,3 LONG YU,3 LILI ZHANG,4 LINA XU,4 and XINYI ZHANG1,2
ABSTRACT
This article presents a mathematical model for the design of peptide inhibitors for proteins. The model combines the two rules on protein-ligand interaction, the Miyazawa-Jernigan (M-J) matrix, and the hidden Markov model (HMM). It is applied to predict peptide inhibitors for the proteins cyclophilin A (CypA) and FKBP12, and the predictions are then validated by highest occupied molecular orbital calculation, docking between protein and inhibitor, and biological experiments. The results are encouraging and suggest that we have taken a step forward towards building a mathematical theory of the design of peptide inhibitors for proteins. The mathematical model is rough at present, but if it represents a correct direction for the theoretical trends of biology, as we believe, then this theory can be further developed and become more and more precise.
Key words: hidden Markov model, mathematical model, Miyazawa-Jernigan matrix, peptide
inhibitor design, protein-ligand interaction.
1. INTRODUCTION
So far, finding a real ligand for a given target protein has been limited either to experimental
screening of large numbers of small molecules in drug databases or to computational screening that assesses
each ligand by free energy calculation. Here, we build a mathematical model to help find the peptide inhibitor
of a protein. As the Nobel laureate Walter Gilbert said (Gilbert, 1991), ‘‘The new paradigm, now emerging, is
that all the ‘genes’ will be known (in the sense of being resident in databases available electronically), and that
the starting point of a biological investigation will be theoretical. An individual scientist will begin with a
theoretical conjecture, only then turning to experiment to follow or test that hypothesis. The biology will not be
a science based on observation and experiment only, it would have theoretical trends.’’
Under specific circumstances, peptides can play an important role in the discovery of lead compounds.
De novo peptide design was started in 1995 and ever since has received intense scrutiny in drug discovery
for its key advantage of easy synthesis. Current methods for structure-based drug design can be roughly
divided into two categories. The first category is directly screening a large number of candidates from an
existing database. The second category is structure generation (Lewis and Leach, 1994), also referred to as
de novo design.
1State Key Laboratory of Surface Physics and Department of Physics, 2Synchrotron Radiation Research Center, and 3State Key Laboratory of Genetic Engineering Institute of Genetics, School of Life Sciences, Fudan University, Shanghai, China.
4Department of Electrical and Computer Engineering, Rice University, Houston, Texas.
*These two authors contributed equally to this work.
JOURNAL OF COMPUTATIONAL BIOLOGY
Volume 17, Number 8, 2010
© Mary Ann Liebert, Inc.
Pp. 1081–1093
DOI: 10.1089/cmb.2009.0272
The main purpose of this article is to present a new method of peptide design. We also aim to test our
overall theoretical approach to peptide inhibitor design and to show that it works.
Our goal is to build a mathematical model and then to treat the design of peptide inhibitors as a first step on
the path of our theory. Our mathematical model comprises four knowledge blocks: (1) the two rules on
protein-ligand interaction, (2) the Miyazawa-Jernigan (M-J) matrix, (3) the hidden Markov model (HMM), and (4)
residue-residue contact preferences.
The outline of this article is as follows. First, we explain the four knowledge blocks and then our
mathematical model itself. In Results and Discussion, we present the design of
the tripeptide inhibitor Ala-Gly-Pro (AGP) for the protein cyclophilin A (CypA) and of the dipeptide Gly-Gln for
the protein FKBP12, together with our selection criteria and experimental results. Finally, we provide a conclusion.
2. METHODS
2.1. Two rules on protein-ligand interaction
Combining full electronic structure calculations with surface pocket calculations of proteins, we proposed
two rules on protein-ligand interaction; for details, see Pang et al. (2008). The first
rule is that interactions occur only between the lowest unoccupied molecular orbitals (LUMOs) of a protein
and the highest occupied molecular orbital (HOMO) of its ligand, not between the HOMOs of a protein and
the LUMO of its ligand. This provides a rough criterion for ligand selection. The second rule is that only
those residues or atoms located both on the LUMOs of a protein and in a surface pocket of the protein are
active residues or active atoms, and the corresponding pocket is the ligand binding site. This
enables us to identify not only the ligand binding site, but also the active residues, and even the active
atoms, of a protein.
2.2. Miyazawa-Jernigan matrix
Miyazawa and Jernigan (1985) used existing protein databases to demonstrate that the stronger the
interaction between two residues, the greater their chance of being in contact with each other. They analyzed
thousands of crystal structures of proteins in the existing protein databases and obtained the statistical
contact energies between all 20 kinds of residues, which form a 20 × 20 symmetric matrix. (We call it the ‘‘M-J
matrix,’’ and the unit is RT = 0.60 kcal/mol = $4.2 \times 10^{-21}$ J = 0.0260160 eV.)
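As a quick consistency check on the unit quoted above, the conversion can be worked out per molecule with standard constants (1 kcal = 4184 J, Avogadro's number, and the electron-volt). The rounding shown is ours:

```latex
\begin{align*}
RT &= 0.60\ \text{kcal/mol}
    = \frac{0.60 \times 4184\ \text{J/mol}}{6.022\times10^{23}\ \text{mol}^{-1}}
    \approx 4.17\times10^{-21}\ \text{J} \\
   &= \frac{4.17\times10^{-21}\ \text{J}}{1.602\times10^{-19}\ \text{J/eV}}
    \approx 0.0260\ \text{eV}
\end{align*}
```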
The M-J matrix has 20 × 20 = 400 elements. As a symmetric matrix, it has 210 independent elements:
20 diagonal elements and 190 off-diagonal elements. However, after subtracting the average value
from each element and solving the eigenvalue equation, the matrix can be described by only 22 independent
parameters (Li et al., 1997). Twenty of them express the relative average potential of the 20 residues in the
folded protein. Of the other two, one expresses the potential strength, and the other expresses the coupling
strength of the interaction between residues. The potential strength is two orders of magnitude larger than the
coupling strength, which means that the structure of a protein is decided not by the interaction between
residues, but rather by the average potential in the fold.
If degeneracy is not considered, the M-J matrix has 20 eigenvectors $V_\alpha$ with eigenvalues $\lambda_\alpha$. The M-J matrix can then be expressed by formula (1):
$$M_{ij} = \sum_{\alpha=1}^{20} \lambda_\alpha V_{\alpha,i} V_{\alpha,j} \tag{1}$$
where i and j are residue indices. Two of the 20 eigenvalues have the largest absolute values:
$$\lambda_1 = -22.49, \qquad \lambda_2 = 18.62, \qquad \lambda_{\text{other}} = 0.013 \sim 2.17 \tag{2}$$
Therefore, taking the average $\langle M_{ij} \rangle$, formula (1) can be written as
$$M_{ij} = \langle M_{ij} \rangle + \lambda_1 V_{1,i} V_{1,j} + \lambda_2 V_{2,i} V_{2,j} \tag{3}$$
The eigenvectors $V_1$ and $V_2$ are linearly related:
$$V_{2,i} = b + c V_{1,i} = -0.30 - 0.90 V_{1,i} \tag{4}$$
Let $q_i = V_{1,i}$; then $M_{ij}$ can be simplified to $M'_{ij}$:
$$M'_{ij} = C_0 + C_1 (q_i + q_j) + C_3 q_i q_j = -1.492 + 5.030 (q_i + q_j) - 7.400 q_i q_j \tag{5}$$
The q value of each residue is shown in Table 1.
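The step from formula (3) to formula (5) can be made explicit. Substituting the linear relation of formula (4) into formula (3) and collecting terms in q gives the coefficients below. The numerical check uses only the values quoted in (2) and (4); the small differences from the coefficients printed in (5) presumably reflect rounding of b and c:

```latex
\begin{align*}
M_{ij} &\approx \langle M_{ij}\rangle + \lambda_1 q_i q_j
          + \lambda_2 (b + c\,q_i)(b + c\,q_j) \\
       &= \underbrace{\langle M_{ij}\rangle + \lambda_2 b^2}_{C_0}
          + \underbrace{\lambda_2 b c}_{C_1}\,(q_i + q_j)
          + \underbrace{(\lambda_1 + \lambda_2 c^2)}_{\text{coefficient of } q_i q_j}\, q_i q_j \\
\lambda_2 b c &= 18.62 \times (-0.30) \times (-0.90) \approx 5.03 \\
\lambda_1 + \lambda_2 c^2 &= -22.49 + 18.62 \times 0.81 \approx -7.41
\end{align*}
```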
The M-J matrix has two dominant eigenvalues, which indicates that the 20 kinds of residue can be roughly
divided into two groups: hydrophobic residues (H) and polar residues (P). Accordingly, the interaction
between residues comes in three varieties: H-H, P-P, and H-P. The q values of the residues show an
interesting pattern as well: they fall into two groups separated by a gap (Fig. 1).
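To make formula (5) and the H/P split concrete, the following minimal Python sketch evaluates the approximate contact energy from the q values of Table 1. The function names and the threshold of -0.175 (the midpoint of the gap reported in Fig. 1) are our own illustrative choices, not part of the original method.

```python
# q values (in units of RT) transcribed from Table 1.
Q_VALUES = {
    "L": -0.443, "F": -0.438, "I": -0.390, "M": -0.327, "V": -0.315,
    "W": -0.298, "C": -0.265, "Y": -0.226, "A": -0.125, "H": -0.107,
    "T": -0.058, "P": -0.054, "G": -0.048, "Q": -0.023, "R": -0.020,
    "S": -0.011, "N": -0.011, "E": 0.028, "D": 0.048, "K": 0.065,
}

# Coefficients of formula (5), named as they are printed there.
C0, C1, C3 = -1.492, 5.030, -7.400


def contact_energy(res_i: str, res_j: str) -> float:
    """Approximate M-J contact energy M'_ij (in RT) from formula (5)."""
    qi, qj = Q_VALUES[res_i], Q_VALUES[res_j]
    return C0 + C1 * (qi + qj) + C3 * qi * qj


def hp_class(res: str, gap: float = -0.175) -> str:
    """Label a residue H or P by the side of the q-value gap it falls on.

    The default threshold is an assumption: the midpoint of the gap
    that Fig. 1 places between -0.20 and -0.15.
    """
    return "H" if Q_VALUES[res] < gap else "P"


if __name__ == "__main__":
    for pair in (("L", "L"), ("L", "K"), ("K", "K")):
        print(pair, round(contact_energy(*pair), 3))
    print("".join(hp_class(r) for r in Q_VALUES))
```

With this threshold the eight residues from Leu to Tyr in Table 1 are labelled H and the remaining twelve are labelled P.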
2.3. HMM
A Markov chain is a stochastic process. The next state of a one-step Markov chain depends only on the present
state, not on earlier states. In this article, we simply call a one-step Markov chain a
‘‘Markov chain.’’
Suppose the states so far are $X_0, X_1, \ldots, X_t$; then the probability $P(X_{t+1} \mid X_t)$ of the next state $X_{t+1}$
depends only on the state $X_t$. The probabilities of moving from state i to state j form the ‘‘transition matrix’’:
$$P_{ij} = P\{X_{t+1} = a_j \mid X_t = a_i\} \tag{6}$$
The HMM contains two sequences of stochastic variables. One is the non-observable Markov chain,
which is described by the transition matrix. The other is an observable stochastic sequence; an
emission matrix gives the probability of each observable value under each state of the Markov chain.
Another element of a Markov chain is its initial distribution $\pi = \{\pi_i\}$, so an HMM has five elements:
1. The number M of states of the Markov chain.
2. The number N of observable values for each state.
3. A transition matrix T of the Markov chain with M × M dimensions; each row sums to one.
4. An emission matrix E with M × N dimensions; each row sums to one.
5. An initial distribution π of the Markov chain.
These five elements form the HMM $\lambda = (T, E, \pi)$.
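The five elements listed above map directly onto a small data structure. The sketch below is a generic illustration in Python, not the authors' PEPTIDE.m; the class name, method names, and the toy two-state numbers are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class HMM:
    """The five HMM elements; M and N are implied by the matrix shapes."""
    T: list   # M x M transition matrix, each row sums to one
    E: list   # M x N emission matrix, each row sums to one
    pi: list  # initial distribution over the M hidden states

    def validate(self, tol: float = 1e-9) -> None:
        """Check the shapes and the row-sum constraints."""
        m = len(self.pi)
        assert len(self.T) == m and all(len(row) == m for row in self.T)
        assert len(self.E) == m
        for row in self.T + self.E + [self.pi]:
            assert abs(sum(row) - 1.0) < tol, "row must sum to one"

    def chain_probability(self, states: list) -> float:
        """P(X) = pi[X_0] * prod_t T[X_t][X_{t+1}] for a hidden-state path."""
        p = self.pi[states[0]]
        for a, b in zip(states, states[1:]):
            p *= self.T[a][b]
        return p

    def sample(self, length: int, rng: random.Random):
        """Draw a hidden path and the observable sequence it emits."""
        hidden, observed = [], []
        state = rng.choices(range(len(self.pi)), weights=self.pi)[0]
        for _ in range(length):
            hidden.append(state)
            observed.append(
                rng.choices(range(len(self.E[state])), weights=self.E[state])[0]
            )
            state = rng.choices(range(len(self.T[state])), weights=self.T[state])[0]
        return hidden, observed


# Toy two-state, three-symbol model (hypothetical numbers, illustration only).
toy = HMM(
    T=[[0.9, 0.1], [0.4, 0.6]],
    E=[[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]],
    pi=[0.5, 0.5],
)
toy.validate()
```

For the peptide model described later, M and N would both be 20, one state and one observable symbol per residue type.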
Table 1. The q Value of Each Residue
Residue q (RT)
Leu (L) -0.443
Phe (F) -0.438
Ile (I) -0.390
Met (M) -0.327
Val (V) -0.315
Trp (W) -0.298
Cys (C) -0.265
Tyr (Y) -0.226
Ala (A) -0.125
His (H) -0.107
Thr (T) -0.058
Pro (P) -0.054
Gly (G) -0.048
Gln (Q) -0.023
Arg (R) -0.020
Ser (S) -0.011
Asn (N) -0.011
Glu (E) +0.028
Asp (D) +0.048
Lys (K) +0.065
2.4. Residue-residue contact preferences
Glaser et al. (2001) used a non-redundant set of 621 protein-protein interfaces of known high-resolution
structure to derive residue composition and residue-residue contact preferences. They estimated the
likelihood $G_{ij}(v)$ of contacts between a pair of residues i and j as a criterion for the propensity of
residue-residue contact (Table 2):
$$G_{ij}(v) = A \log\left(\frac{Q_{ij}(v)}{W_i W_j}\right) \tag{7}$$
where $Q_{ij}(v)$ is the number of residue-residue contacts weighted by the residue volumes V:
$$Q_{ij}(v) = \frac{C_{ij} V_i V_j}{\sum_{k,l} C_{kl} V_k V_l} \tag{8}$$
and $C_{ij}$ is the total number of contacts observed between residue types i and j, and $W_i$ is defined as
$$W_i = \frac{F_i}{\sum_i F_i} \tag{9}$$
$F_i$ is the number of residues of type i having at least one contact with any residue across the interface.
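Formulas (7) to (9) can be traced on a toy example. The Python sketch below follows the formulas as they are printed here; the three-residue alphabet, the contact counts, the interface counts, the residue volumes, the prefactor A = 1, and the use of the natural logarithm are all assumptions made for illustration and are not taken from Glaser et al. (2001).

```python
import math

# Hypothetical toy data for three residue types (illustration only).
RESIDUES = ["A", "G", "W"]
VOLUME = {"A": 88.6, "G": 60.1, "W": 227.8}        # assumed residue volumes V_i
CONTACTS = {                                        # assumed contact counts C_ij
    ("A", "A"): 10, ("A", "G"): 4, ("A", "W"): 12,
    ("G", "G"): 2, ("G", "W"): 5, ("W", "W"): 20,
}
INTERFACE_COUNT = {"A": 30, "G": 15, "W": 25}       # assumed counts F_i


def c(i, j):
    """Symmetric lookup of the contact count C_ij."""
    return CONTACTS.get((i, j), CONTACTS.get((j, i)))


def q_matrix():
    """Formula (8): volume-weighted contact counts, normalised over all pairs."""
    total = sum(c(k, m) * VOLUME[k] * VOLUME[m]
                for k in RESIDUES for m in RESIDUES)
    return {(i, j): c(i, j) * VOLUME[i] * VOLUME[j] / total
            for i in RESIDUES for j in RESIDUES}


def w_vector():
    """Formula (9): share of interface residues that are of type i."""
    total = sum(INTERFACE_COUNT.values())
    return {i: INTERFACE_COUNT[i] / total for i in RESIDUES}


def g_matrix(a=1.0):
    """Formula (7), with an assumed prefactor A and the natural logarithm."""
    q, w = q_matrix(), w_vector()
    return {(i, j): a * math.log(q[i, j] / (w[i] * w[j])) for (i, j) in q}


if __name__ == "__main__":
    for pair, value in g_matrix().items():
        print(pair, round(value, 3))
```

Because the contact counts are symmetric, the resulting preference matrix is symmetric as well, which matches the layout of Table 2.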
2.5. Mathematical model for peptide inhibitor design
As we mentioned above, we will build a mathematical model for peptide inhibitor design based on the
above four knowledge blocks. Its principles are as follows:
1. Take a protein sequence as a Markov chain. That is, the probability of a residue appearing in a
sequence is determined by the preceding residue only.
2. The Markov chain has 20 states, one for each kind of residue. The 20 × 20 transition matrix is constructed
from the elements of the M-J matrix and normalized so that each row sums to one:
$$P_{ij} = \frac{\exp(-M'_{ij}/kT)}{\sum_{j=1}^{20} \exp(-M'_{ij}/kT)} \tag{10}$$
where k is the Boltzmann constant and the temperature is taken as T = 300 K.
3. The emission matrix has to express the probability of a residue appearing in a peptide inhibitor. It is
also a 20 × 20 matrix, and it should incorporate as much biological knowledge as possible.
FIG. 1. q value of each residue. These values are divided into two groups by a gap between -0.20 and -0.15.
Hydrophobic residues are on the left side of the gap, while the polar residues are on the right side.
The biggest challenge is how to determine the probability of a residue appearing in a peptide inhibitor.
So far we have only limited experimental data from which to write this emission matrix. Our rules of thumb
for constructing the emission matrix are as follows:
a. Because the peptide is treated as a Markov chain, we first have to determine which residue is the initial residue of the peptide inhibitor.
b. We take the residue-residue contact preferences $G_{ij}(v)$ of Glaser et al. (2001) as the emission matrix (Table 2),
normalized so that each row sums to one:
$$g_{ij} = \frac{G_{ij}(v)}{\sum_{j=1}^{20} G_{ij}(v)} \tag{11}$$
c. According to the existing experimental data, six residues (Ile, Trp, Tyr, Pro, Arg, and Asp) are often found
in the active pocket, and four small residues (Phe, Val, Ala, and Gly) often accompany them.
Therefore, we should pay more attention to these 10 residues.
This is a key step in determining the emission matrix in our peptide inhibitor design. At present we have
constructed only a mathematical theory; the emission matrix needs to be revised step by step as new, accurate
experimental data become available, which will give the theory higher precision.
4. According to the two rules on protein-ligand interaction, the LUMO energy of a given protein is
fixed. Hence, from the energy viewpoint, a ligand with a higher HOMO energy has a greater chance of
interacting with the protein. In other words, the higher the energy level of a peptide, the higher its
probability of interacting with the protein. It could therefore be argued that we should select a peptide
inhibitor that has a higher M-J energy.
5. With the transition matrix and the emission matrix of the HMM and the two rules on protein-ligand
interaction in hand, we have enough material to write a Matlab script program called ‘‘PEPTIDE.m’’
that generates potential peptide sequences for a given protein.
6. However, selecting an inhibitor is a complex engineering task, and HMM theory alone is not sufficient.
Additional criteria, such as HOMO calculation and docking, are needed for re-selection.
Finally, biological experiments are undertaken to assist and validate
the identification.
Table 2. Residue-Residue Contact Preferences
I V L F C M A G T S W Y P H E Q D N K R
I 3.89 4.91 4.59 5.33 1.76 5.25 2.84 0.77 3.05 1.00 6.24 5.61 3.27 3.38 3.20 3.60 2.30 1.59 3.23 3.80
V 4.91 3.74 4.20 4.69 2.89 4.37 2.57 -0.41 2.83 1.42 2.92 3.95 2.90 3.21 3.22 3.22 1.93 1.36 4.45 4.18
L 4.59 4.20 4.03 4.86 2.93 5.32 2.77 -0.37 2.07 1.41 5.77 4.19 2.50 4.88 3.12 3.46 1.40 2.31 3.15 4.99
F 5.33 4.69 4.86 5.34 3.68 5.28 3.00 0.14 3.34 1.75 5.83 5.83 4.25 3.47 2.87 4.25 0.99 3.11 3.57 4.49
C 1.76 2.89 2.93 3.68 7.65 1.84 1.46 -0.25 1.03 2.48 2.14 2.47 2.74 4.12 2.51 1.33 0.24 -0.42 2.05 2.81
M 5.25 4.37 5.32 5.28 1.84 6.02 2.30 0.91 2.09 1.61 4.89 4.81 3.38 4.65 3.88 4.18 0.36 2.30 3.93 3.62
A 2.84 2.57 2.77 3.00 1.46 2.30 -0.52 -1.77 1.21 0.39 3.37 2.47 1.22 2.59 1.71 1.72 1.13 1.69 2.13 1.90
G 0.77 -0.4 -0.4 0.14 -0.3 0.91 -1.8 4.40 0.21 -1.5 1.42 1.25 -0.5 1.08 -0.9 0.70 -0.1 -0.5 1.33 1.59
T 3.05 2.83 2.07 3.34 1.03 2.09 1.21 0.21 1.27 1.91 5.12 3.14 2.65 2.71 2.88 1.82 3.88 2.52 3.67 3.77
S 1.00 1.42 1.41 1.75 2.48 1.61 0.39 -1.5 1.91 -0.1 2.87 2.30 1.33 0.80 2.60 2.00 2.94 1.77 2.74 2.82
W 6.24 2.92 5.77 5.83 2.14 4.89 3.37 1.42 5.12 2.87 5.85 6.19 7.87 6.46 1.20 1.37 2.62 3.54 5.76 8.57
Y 5.61 3.95 4.19 5.83 2.47 4.81 2.47 1.25 3.14 2.30 6.19 5.93 4.22 6.05 4.54 2.05 1.76 3.66 5.26 5.28
P 3.27 2.90 2.50 4.25 2.74 3.38 1.22 -0.51 2.65 1.33 7.87 4.22 0.60 2.89 3.17 3.50 1.46 3.09 3.75 3.99
H 3.38 3.21 4.88 3.47 4.12 4.65 2.59 1.08 2.71 0.80 6.46 6.05 2.89 5.37 2.30 4.00 5.20 2.38 2.72 4.90
E 3.20 3.22 3.12 2.87 2.51 3.88 1.71 -0.9 2.88 2.60 1.20 4.54 3.17 2.30 1.65 1.95 0.08 2.68 5.32 5.75
Q 3.60 3.22 3.46 4.25 1.33 4.18 1.72 0.70 1.82 2.00 1.37 2.05 3.50 4.00 1.95 2.83 3.26 3.45 3.50 4.50
D 2.30 1.93 1.40 0.99 0.24 0.36 1.13 -0.08 3.88 2.94 2.62 1.76 1.46 5.20 0.08 3.26 0.13 3.85 3.90 4.94
N 1.59 1.36 2.31 3.11 -0.4 2.30 1.69 -0.54 2.52 1.77 3.54 3.66 3.09 2.38 2.68 3.45 3.85 2.92 3.17 3.85
K 3.23 4.45 3.15 3.57 2.05 3.93 2.13 1.33 3.67 2.74 5.76 5.26 3.75 2.72 5.32 3.50 3.90 3.17 3.24 2.29
R 3.80 4.18 4.99 4.49 2.81 3.62 1.90 1.59 3.77 2.82 8.57 5.28 3.99 4.90 5.75 4.50 4.94 3.85 2.29 2.87
7. A peptide inhibitor has one disadvantage: it does not readily enter the cell. For a complete drug design,
one can, if needed, design a corresponding chemical molecule with the same active spot as the peptide
inhibitor. Certain chemical modifications, such as methylation, should also be made to
improve the properties of the inhibitor.
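Principle 2 above, the construction of the transition matrix by formula (10), can be sketched in a few lines of Python. This is not the authors' PEPTIDE.m, which is not reproduced in the article. It assumes that the energies from formula (5) are already expressed in units of RT, so that kT = 1 in those units; the function names are hypothetical.

```python
import math

# q values (in units of RT) transcribed from Table 1.
Q_VALUES = {
    "L": -0.443, "F": -0.438, "I": -0.390, "M": -0.327, "V": -0.315,
    "W": -0.298, "C": -0.265, "Y": -0.226, "A": -0.125, "H": -0.107,
    "T": -0.058, "P": -0.054, "G": -0.048, "Q": -0.023, "R": -0.020,
    "S": -0.011, "N": -0.011, "E": 0.028, "D": 0.048, "K": 0.065,
}


def contact_energy(qi: float, qj: float) -> float:
    """Approximate M-J contact energy M'_ij (in RT) from formula (5)."""
    return -1.492 + 5.030 * (qi + qj) - 7.400 * qi * qj


def transition_matrix(kt: float = 1.0) -> dict:
    """Formula (10): Boltzmann weights of M'_ij, normalised row by row.

    kt is given in the same unit as the energies. The default of 1.0
    is an assumption that M' is in units of RT at the working temperature.
    """
    residues = list(Q_VALUES)
    matrix = {}
    for i in residues:
        weights = {
            j: math.exp(-contact_energy(Q_VALUES[i], Q_VALUES[j]) / kt)
            for j in residues
        }
        z = sum(weights.values())          # row normalisation constant
        matrix[i] = {j: w / z for j, w in weights.items()}
    return matrix


P = transition_matrix()
# Most probable successor of Pro under this transition matrix alone.
most_likely_after_pro = max(P["P"], key=P["P"].get)

if __name__ == "__main__":
    print(most_likely_after_pro, round(P["P"][most_likely_after_pro], 3))
```

The transition matrix is only one ingredient. In the model described here it is combined with the emission matrix and the later HOMO and docking criteria, so this sketch alone should not be expected to reproduce the suggested peptides.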
The working flowchart is as follows:
1. Preparatory work for the active pocket and active residues:
a) Molecular dynamics
b) Full electronic structure calculation
c) Protein pocket calculation
2. Select an initial residue:
a) Compare the active-atom structure of existing ligands.
b) Compare the active atoms (active spot) of the protein.
3. Run our PEPTIDE.m script program.
4. Use HOMO calculation as the first round of selection.
5. Use docking as the second round of selection.
6. Final evaluation by biological assays.
7. Chemical modification for further improvement.
3. RESULTS AND DISCUSSION
We now apply our mathematical model to design peptide inhibitors for protein CypA and FKBP12.
This work is based on our previous article (Pang et al., 2008).
3.1. The peptide inhibitor design of CypA
Previously (Pang et al., 2008), we obtained the ligand binding pocket and active atoms of CypA. Now,
suppose we want to design a tripeptide inhibitor for CypA.
1. Selecting an initial residue for the tripeptide inhibitor. We select Pro as the initial residue
of the tripeptide inhibitor for CypA, for three reasons. First, Pro is one of three residues (Pro, Gly, and Cys)
with special characteristics, and it contains an imino acid ring, which is often the position of active
atoms. Second, we had identified the active residues of the receptor CypA as Phe113 and Phe60 (Pang et
al., 2008), and from the M-J matrix we found that Pro has the strongest contact energy with
Phe. Third, Pro is a hydrophobic, non-polar residue, and the ligand binding site of CypA
consists primarily of hydrophobic and non-polar residues. Thus, we selected Pro as the
initial residue of the tripeptide inhibitor.
2. Running the PEPTIDE.m program. After running our PEPTIDE.m program, five tripeptide inhibitors
were suggested for CypA: AGP, Ala-Val-Pro, Val-Ile-Pro, Ala-Trp-Pro, and Ile-Ala-Pro.
3. HOMO calculation (first criterion). Suppose a peptide is composed of n residues, $X = X_1 X_2 X_3 \cdots X_n$. If
we take the peptide as a Markov chain, then its probability is:
$$P(X) = P(X_1) P(X_2 \mid X_1) P(X_3 \mid X_2) \cdots P(X_n \mid X_{n-1}).$$
According to the Bayes formula $P(X_i \mid X_{i-1}) = P(X_{i-1} X_i)/P(X_{i-1})$, we can calculate the HOMO of a peptide
pair by pair of residues. The HOMOs of the X-Pro and X-Gly pairs versus the q values (Table 1) are shown in Figure 2a and 2b,
respectively.
We can see from Figure 2 that Gly-Pro has the highest HOMO among the X-Pro pairs and Ala-Gly has the
highest HOMO among the X-Gly pairs. According to the two rules on protein-ligand interaction we proposed
previously and the character of the Markov chain, we can conclude that AGP, as a combination of the
X-Gly and X-Pro dipeptides, may be the most promising peptide inhibitor. It explains well the previous
FIG. 2. HOMO of X-Pro pairs and X-Gly pairs against q value. (a) The dipeptide Gly-Pro has the highest HOMO
energy among X-Pro. (b) The dipeptide Ala-Gly has the highest HOMO energy among X-Gly.
experimental observation that CypA recognition of hexapeptides involves contacts with peptide residues
Ala, Gly, and Pro, and is independent of the context of longer sequences (Vajdos et al., 1997).
4. Docking (second criterion). We docked the tripeptide AGP to CypA using the
program AutoDock 4.0 (Morris et al., 1998; Sousa et al., 2006) with the Lamarckian genetic algorithm
(GA) and default parameters. For each ligand, 200 docking runs were performed, and the maximum
number of energy evaluations in each GA run was 3,000,000, which
is large enough to test whether the complex system has converged. At the end of docking, a cluster
analysis was performed on the resulting docking conformations. The docking results are as follows:
• The first cluster, which has the lowest free energy, contains 168 of the 200 conformations.
• The estimated free energy for the first cluster is ΔG = -7.25 kcal/mol.
• The inhibition constant for the first cluster is 4.88 μM.
• The convergence of all conformations is excellent (Fig. 3).
FIG. 4. Spatial configuration of the peptide AGP and the active pocket of CypA. (a) Peptide AGP (blue) perfectly lies
down in the ligand binding pocket of CypA (white). The binding mode of AGP/CypA is generated by Autodock. (b)
The active spot of AGP just covers the active spot of CypA. The small color circles are active atoms of AGP, and they
are located right above the active atoms of protein CypA (large gray circles).
FIG. 3. Number of conformations of peptide AGP in each cluster. The first cluster populates 168 conformations with
the lowest binding energy of -7.25 kcal/mol. The convergence situation of all conformations is excellent.
• The conformation with the lowest free energy inserts perfectly into the active pocket of
CypA (Fig. 4a).
5. Checking the relation of active spots between CypA and peptide AGP. The active atoms of the
tripeptide AGP, obtained by the method described in our previous work, are shown in Table 3.
For protein CypA, the binding pocket and the active atoms had been obtained previously (Pang
et al., 2008). The active spot of CypA has seven atoms forming a quincunx-type pattern. The conformations of
the active atoms of both AGP and CypA are depicted in Figure 4b, where we can see that the active
atoms of AGP cover exactly the active region of CypA; that is, the active spot of the peptide AGP
just covers the active spot of CypA.
Table 3. The Active Spot of the Peptide AGP
Atom Residue X Y Z
C GLY6 51.721 27.873 -11.020
O GLY6 51.972 27.654 -12.218
N PRO4 51.859 29.096 -10.481
CA PRO4 52.326 30.269 -11.241
CB PRO4 51.974 31.423 -10.289
C PRO4 53.858 30.191 -11.443
OT1 PRO4 54.305 30.196 -10.549
OT2 PRO4 54.314 30.135 -12.584
FIG. 5. Sensorgram for AGP and CsA binding to the CypA surface on the CM5 sensor chip. Binding responses are
shown for AGP and CsA injected at concentrations of 0.625, 1.25, 2.5, 5, and 10 μM (bottom to top). The biosensor
responses (RUs) are concentration-dependent. The equilibrium constants (KD values) characterizing the
protein-ligand binding affinities are denoted.
This perfect coverage of the docking system not only proved the two rules on protein-ligand interactions
(the higher the HOMO of ligand, the stronger the protein-ligand interactions), but also proved that this
model based on HMM mathematical theory is feasible.
6. Performing biological assays: binding affinity determination and inhibition of PPIase activity of
CypA. The binding affinity of the designed peptides to CypA was measured by surface plasmon
resonance (SPR) with a Biacore 3000 instrument (BiacoreAB Corp., Uppsala, Sweden) as described
elsewhere (Chen et al., 2007; Thurmond et al., 2001). AGP was found to bind to CypA in a
concentration-dependent manner with a KD value of $1.95 \times 10^{-6}$ M (Fig. 5), whereas CsA, as a positive
control, showed a KD value of $6.42 \times 10^{-6}$ M.
Since AGP could bind to CypA, we subsequently measured its inhibition of the PPIase activity of CypA. The
standard spectrophotometric method was applied to determine the inhibitory activity of the compounds on
PPIase. During the assay, the rate constants for the cis–trans conversion were evaluated by fitting the data
to the integrated first-order rate equation through nonlinear least-square analysis.
$$\text{Inhibitory rate (\%)} = \frac{\text{CypA (dAbs/time)} - \text{compounds (dAbs/time)}}{\text{CypA (dAbs/time)} - \text{control (dAbs/time)}}$$
The inhibition results are shown in Figure 6. As a positive control, CsA showed an inhibition of 59.95%
against the PPIase activity of CypA at 1 μM, while AGP showed an inhibition of 37.47% at 1 μM.
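The inhibitory-rate formula above amounts to a ratio of rate differences. A minimal Python sketch follows; the function name, the reading of ‘‘control’’ as the reaction without enzyme activity, the factor of 100 that turns the ratio into a percentage, and the numbers in the example are all our assumptions, not measured data.

```python
def inhibitory_rate(cypa_rate: float, compound_rate: float,
                    control_rate: float) -> float:
    """Percent inhibition from three rate constants (dAbs/time).

    cypa_rate:     CypA-catalysed reaction without any compound
    compound_rate: CypA pre-incubated with the tested compound
    control_rate:  control reaction (assumed here to be the baseline
                   without enzyme activity)
    """
    span = cypa_rate - control_rate
    if span == 0:
        raise ValueError("CypA and control rates must differ")
    # Factor 100 is an assumption so that the result reads as a percentage.
    return 100.0 * (cypa_rate - compound_rate) / span


if __name__ == "__main__":
    # Hypothetical rates, for illustration only.
    print(round(inhibitory_rate(0.10, 0.07, 0.02), 2))
```

A compound that leaves the rate unchanged gives 0%, and one that reduces the rate to the control level gives 100%.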
Our designed peptide AGP has the same order of binding affinity and inhibition of PPIase activity as
CsA. In addition, AGP has no impact on cell proliferation or the cell cycle (data not shown). Therefore, peptide
AGP may be a new inhibitor of CypA.
It should be noted that, before obtaining the inhibitor AGP for CypA with our model, we were unaware of the
two hypotheses proposed by Vajdos et al. (1997) on the basis of experiments with the hexapeptide
His-Ala-Gly-Pro-Ile-Ala: one is ‘‘CypA recognition of these hexapeptides involves contacts with peptide
residues Ala(Val) 88, Gly 89, and Pro 90, and is independent of the context of longer sequences’’; the other
is ‘‘the CypA active site is complementary to sequences containing the dipeptide Gly-trans-Pro.’’ Our
calculation results (Table 3 and Fig. 4b) explain both hypotheses well from a theoretical
viewpoint: the active spot of AGP is located only in Gly and Pro (not in Ala), and it perfectly
covers the active spot of CypA. Our findings are independent and derive from the calculation of the full
electronic structure of CypA, the two rules on protein-ligand interactions, and a statistical mathematical model.
The prediction of AGP is not accidental.
FIG. 6. CypA PPIase inhibitory activities of AGP and CsA at 1 μM. CypA was pre-incubated with 1 μM of each tested
compound, and the PPIase activity was evaluated by fitting the data to the integrated first-order rate equation through
nonlinear least-square analysis. (a) The value of dAbs/time represents the rate constant for the cis–trans conversion. (b)
The percent inhibition of the PPIase activity by AGP and CsA at 1 μM.
3.2. The peptide inhibitor design of FKBP12
For FKBP12, we explain why we selected Gln as the initial residue of the peptide inhibitor and
why we chose a dipeptide, rather than a tripeptide, as its inhibitor. We also demonstrate another way to
select the initial residue.
From our previous article (Pang et al., 2008), we knew that the inhibitor FK506 bound well to FKBP12
and we knew the structure of its active spot. Through comparison, we found out that the side chain of
residue Gln has similar construction to that of the active spot of FK506, and residue Gln was therefore
selected as the initial residue. Besides, the active pocket of FKBP12 is smaller than that of CypA; thus, the
dipeptides were chosen as potential inhibitors to test on FKBP12.
After running the PEPTIDE.m program, three dipeptides were suggested: Gly-Gln, Ile-Gly, and Val-Gly.
As the second round of selection, the HOMO calculation identified Gly-Gln as having the highest HOMO
(Fig. 7). We then docked Gly-Gln to FKBP12 with the program Autodock to generate their binding
model, using the same docking parameters as for AGP above.
The results of Autodock and the active spot of Gly-Gln are as follows:
• The first cluster, which has the lowest free energy, contains 62 of the 200 conformations.
• The estimated free energy of the conformation with the lowest free energy is ΔG = -6.02 kcal/mol.
• The inhibition constant is 38.79 μM.
FIG. 7. HOMO of X-Gln pairs against q value. Dipeptide Gly-Gln has the highest HOMO energy among X-Gln.
FIG. 8. Number of conformations of peptide Gly-Gln in each cluster. The convergence situation of all conformations
is excellent.
• The convergence of the conformations is good (Fig. 8).
• The active spot of Gly-Gln comprises six atoms, as shown in Table 4. It perfectly covers the active spot
of FKBP12 in the active pocket, as shown in Figure 9.
All these docking results demonstrate that Gly-Gln may be a peptide inhibitor for FKBP12. A biological
assay needs to be done to evaluate the interactions between Gly-Gln and FKBP12. Unfortunately, we have
not performed such an assay yet.
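The inhibition constants quoted with the docking free energies for AGP and Gly-Gln are consistent with the standard thermodynamic relation $K_i = \exp(\Delta G / RT)$. The Python sketch below checks this under our own assumptions: T = 298.15 K, R expressed in kcal/(mol·K), and the reported constants read as micromolar. The function name is illustrative and is not part of any docking program.

```python
import math

R_KCAL = 1.9872e-3  # gas constant in kcal/(mol*K)


def inhibition_constant(delta_g_kcal: float,
                        temperature_k: float = 298.15) -> float:
    """Inhibition constant in mol/L from a binding free energy in kcal/mol.

    Uses Ki = exp(dG / (R * T)); the default temperature is an assumption.
    """
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


# Docking free energies reported in the text, converted to micromolar.
ki_agp_um = inhibition_constant(-7.25) * 1e6      # AGP with CypA
ki_glygln_um = inhibition_constant(-6.02) * 1e6   # Gly-Gln with FKBP12

if __name__ == "__main__":
    print(f"AGP:     {ki_agp_um:.2f} uM")
    print(f"Gly-Gln: {ki_glygln_um:.2f} uM")
```

Under these assumptions the two values come out at roughly 5 and 39 micromolar, in line with the reported 4.88 and 38.79; the small differences are what one would expect from rounding of the quoted free energies.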
4. CONCLUSION
We have proved that our mathematical model can be applied to peptide inhibitor design for a target
protein, based on our previous two rules on protein-ligand interaction, the M-J matrix, and the HMM.
Our results on CypA and FKBP12 show that the approach is promising for this type of problem, which is
typical of the de novo drug design problems currently being tackled by other workers in the field. Our
method does not require exhaustive search, and the properties of the suggested peptides can at least guide
the design of a novel compound. We have taken a step forward towards building a mathematical theory to
select peptide inhibitors for proteins. Our mathematical model is rough at present, especially its emission
matrix. How to perfect the emission matrix is still a challenge for us and needs more investigation. If it
represents a correct direction for biological theoretical trends, this mathematical model can be further
developed.
Table 4. The Active Atoms of Gly-Gln
Atom Residue X Y Z
O GLY2 27.403 18.511 40.115
C GLN3 25.621 15.257 41.099
OT1 GLN3 24.665 15.519 41.229
OT2 GLN3 26.436 14.305 41.766
N GLN3 27.644 16.309 40.226
CA GLN3 26.210 16.081 39.968
FIG. 9. Spatial configuration of dipeptide Gly-Gln and the binding pocket of FKBP12. (a) The active spot of Gly-
Gln (blue) covers the active spot of FKBP12 (white) in the active pocket. (b) The color circles are active atoms of
Gly-Gln.
ACKNOWLEDGMENTS
We thank Ye Yuanjie, Wang Xun, and Ye Ling for their kind help, and the Modern Applied Mathematical
Key Laboratory in Shanghai (Department of Mathematics at Fudan University) and Shanghai
Supercomputer Center (SSC) for providing the parallel computer. This work was supported by the National
Basic Research Program of China (grant 2006CB504509) and the Project of the State Key Program of
National Natural Science Foundation of China (grant 10635060).
DISCLOSURE STATEMENT
No competing financial interests exist.
REFERENCES
Chen, S.A., Zhao, X.M., Tan, J.Z., et al. 2007. Structure-based identification of small molecule compounds targeting
cell cyclophilin A with anti-HIV-1 activity. Eur. J. Pharmacol. 565, 54–59.
Gilbert, W. 1991. Towards a paradigm shift in biology. Nature 349, 99–99.
Glaser, F., Steinberg, D.M., Vakser, I.A., et al. 2001. Residue frequencies and pairing preferences at protein-protein
interfaces. Proteins 43, 89–102.
Lewis, R.A., and Leach, A.R. 1994. Current methods for site-directed structure generation. J. Comput. Aided Mol.
Design 8, 467–475.
Li, H., Tang, C., and Wingreen, N.S. 1997. Nature of driving force for protein folding: a result from analyzing the
statistical potential. Phys. Rev. Lett. 79, 765–768.
Miyazawa, S., and Jernigan, R.L. 1985. Estimation of effective interresidue contact energies from protein crystal-
structures—quasi-chemical approximation. Macromolecules 18, 534–552.
Pang, X., Zhou, L., Zhang, L., et al. 2008. Two rules on the protein-ligand interaction. Nat. Proc.
http://precedings.nature.com/documents/2728/version/1.
Thurmond, R.L., Wadsworth, S.A., Schafer, P.H., et al. 2001. Kinetics of small molecule inhibitor binding to p38
kinase. Eur. J. Biochem. 268, 5747–5754.
Vajdos, F.E., Yoo, S.H., Houseweart, M., et al. 1997. Crystal structure of cyclophilin A complexed with a binding site
peptide from the HIV-1 capsid protein. Protein Sci. 6, 2297–2307.
Address correspondence to:
Dr. Xinyi Zhang
Department of Physics
Fudan University
Shanghai 200433, China
E-mail: [email protected]