what base pairings can occur in dna? a distributed multipole study of the electrostatic interactions...

Published on 01 January 1993.

Page 1: What base pairings can occur in DNA? A distributed multipole study of the electrostatic interactions between normal and alkylated nucleic acid bases

J. CHEM. SOC. FARADAY TRANS., 1993, 89(18), 3407-3417

What Base Pairings Can Occur in DNA? A Distributed Multipole Study of the Electrostatic Interactions between Normal and Alkylated Nucleic Acid Bases

Sarah L. Price,* Fabrizio Lo Celso and Julia A. Treichel
Department of Chemistry, University College London, 20 Gordon Street, London, UK WC1H 0AJ

Julia M. Goodfellow and Yagnesh Umrania
Department of Crystallography, Birkbeck College, Malet Street, London, UK WC1E 7HX

Ab initio distributed multipole electrostatic calculations are used to predict likely nucleic acid base pair structures for both the gas phase and within a double helical backbone, as represented by simple constraints. The resulting structures are interpreted by comparison with an analysis of the experimental variation of base pair geometries found in oligonucleotide crystals. Our calculations on all pairs of the normal bases (G, A, T, C) correctly predict all the multiply hydrogen-bonded structures, in agreement with supermolecule SCF calculations, and also predict some new low-energy structures. Consideration of the helical constraints confirms that the Watson-Crick G·C and A·T pairings are most favourable for inclusion in DNA, but certain mismatch base pairs, G·T and G·A, are also energetically favourable and their geometries correspond to the experimentally observed wobble conformations. This approach is also used to study the effect of the O6 methylation of guanine, which can form a doubly hydrogen-bonded Watson-Crick-like structure with thymine. However, there is also a range of O6-methylguanine·cytosine structures which fit into the helical backbone and are energetically competitive. Thus the mutation-inducing effects of this base modification are likely to be very sensitive to the exact sequence and local conformation of the DNA.

1. Introduction

Modifications of the nucleic acid bases will produce changes in the structure and energetics of DNA, which, in turn, will determine the biological implications of the modification. The structure of DNA including any combination of normal or unusual bases is influenced by many factors, including the interactions between the base pairs, the steric constraints of the backbone, the stacking interactions between the base pairs, and the influence of the surrounding water and counter-ions. Although we can hope to include all these influences in molecular dynamics simulations,1,2 both the methodology and the required assumptions about the interatomic interactions need refining before we can be confident about the realism of the simulations. There is an immediate practical need for simple, well defined models for the dominant effects, which could be used to predict the likely structural effects of nucleic base mutations. This paper describes such a model, based on the interactions of the base pairs in the gas phase and under the constraints of the DNA backbone conformation. It is first applied to the interactions between the normal base pairs, to establish whether it can predict which pairs are normally, and occasionally, found in DNA. This test is passed successfully, so the approach is applied to the controversial problem of the base pairing possibilities of the modified base O6-methylguanine.

The hydrogen bonding between the isolated base pairs is likely to be a major factor in determining the base pair structure within DNA. However, there are 29 different geometries for pairs of the four common bases which correspond to two or more hydrogen bonds, assuming that the N9 of the purines and N1 of the pyrimidines are not available for hydrogen bonding. Most of these structures were originally described by Donohue,3 and diagrams and notation for the complete set have been given by Hobza and Sandorfy.4 There have been many attempts to predict the relative stability of these possible base pairs, using empirical force-fields or combinations of perturbation based models for the various contributions to the intermolecular forces. These have produced somewhat different results, as concluded in a critical review by Rein5 in 1978, and further varied calculations have been reported since, including SCF supermolecule calculations.4 However, there is a consensus that whilst the triply hydrogen-bonded G·C Watson-Crick structure is the most stable, there are many other pairs that are more stable than the Watson-Crick A·T structure. Thus, the interaction between the bases is not the sole factor responsible for the specificity of the base pairing in DNA. The geometrical similarity of the Watson-Crick G·C and A·T pairs suggests that steric constraints play a major role.

The steric constraints do not ensure coplanarity of the bases within DNA, as assumed in many theoretical predictions. Indeed, an analysis of base pair geometries in co-crystal complexes and compounds of nucleic acid bases showed that significant deviation from planarity was sufficiently common that the author suggested that theoretical calculations on base pairs should take into account the possibility of propeller-twist. Thus, initial studies of unusual base pairs should use a method which can search all orientation space to suggest low-energy structures which might not be intuitively obvious.

The simple model investigated in this paper is based on optimising the electrostatic interaction between the bases, within sterically allowed orientations, using a realistic distributed multipole electrostatic model derived from the ab initio charge densities of the isolated bases. Hobza and Sandorfy4 found this electrostatic approximation, which effectively identifies the total interaction energy of the complex with just the electrostatic energy, to be surprisingly good when compared with the SCF interaction energy of the nucleic acid base pairs. There is also evidence that this model should predict the low-energy structures even better than the relative energies. The structures of many hydrogen-bonded van der Waals complexes of small molecules can be predicted by minimising their electrostatic energy, within sterically accessible conformations, provided that the electrostatic forces are represented accurately. The orientation dependence of the total intermolecular potential appears to be similar to that of the electrostatic component because the orientation dependence of the exchange-repulsion, dispersion, charge transfer and polarisation effects approximately cancels. Studies on NH···O=C hydrogen bonds also show that the electrostatic forces dominate both the energy and the orientation dependence of this important interaction. Thus, this paper investigates the structures for the 10 possible pairs of nucleic acid bases predicted by the electrostatic model, to confirm that it is also able to predict minimum-energy structures for pairs of nucleic acid bases.
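The figure of 10 possible pairs follows from choosing two of the four normal bases when a base may also pair with itself (four like pairs plus six unlike pairs). A minimal sketch of that enumeration is given below; the variable names are illustrative and are not taken from the paper.

```python
from itertools import combinations_with_replacement

# Unordered pairs of the four normal bases, allowing a base to pair with itself.
BASES = "GATC"
pairs = ["·".join(p) for p in combinations_with_replacement(BASES, 2)]

like = [p for p in pairs if p[0] == p[-1]]    # G·G, A·A, T·T, C·C
unlike = [p for p in pairs if p[0] != p[-1]]  # G·A, G·T, G·C, A·T, A·C, T·C

print(len(pairs), len(like), len(unlike))  # prints: 10 4 6
```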

The structures at minima in the electrostatic energy are expected to correspond to the gas-phase structures of the complex, as found for various van der Waals complexes involving benzene and s-tetrazine. Thus, if we were primarily interested in the gas-phase complex, more elaborate ab initio calculations could then be performed starting from these predicted structures to give better predictions of the complex binding energy and confirm the structure. However, as we are interested in base pairing within the DNA helix, the errors in the electrostatic approximation are likely to be far smaller than those arising from the neglect of other effects. The next most important effect is likely to be the steric constraints of the helix. We take this into account by considering the experimental variation in the relative orientation of the bases within oligonucleotides, and the electrostatic minima when the C1' atom positions are constrained. This correctly predicts the four base pairings which are observed within DNA.

Having validated this simple approach, we apply it to predict possible base pair structures for a rarely observed base, O6-methylguanine (MeG), which results from chemical modification. Such modifications are important because incorporation of such a modified base can lead on replication to MeG·T and then A·T base pairs (i.e. a G to A transition) and convert protooncogenes into oncogenes. Although X-ray crystallography and NMR approaches have been used to elucidate the structure of MeG·C and MeG·T, there are still some features which are unclear, such as the conformation of the methyl group relative to the base, and the relative energies. Our method predicts an observed O6-methylguanine·thymine base pair structure, but we also show that a pairing with cytosine can be equally energetically favourable.

2. Methods

The electrostatic modelling was performed with rigid base structures based on the average dimensions derived from crystal structures.22 The geometry of O6-methylguanine was taken from the crystal structure.23 A bond length of 1 Å was assumed for the NH and CH bonds, and 1.453 Å for the N9C1' or N1C1' bond lengths. Ab initio SCF wavefunctions for these structures were obtained using the program CADPAC24 with a 6-31G**25 basis set. Many of the calculations were also performed using a 3-21G basis26 wavefunction. This confirmed that the minimum-energy structures and relative energies were insensitive to the basis set, as previously found for a blocked peptide.27

The ab initio charge density was represented by a set of multipoles up to hexadecapole on every atomic site, derived by a distributed multipole analysis (DMA) of the wavefunction.28 The method of splitting up the molecular charge density used to derive the distributed multipole moments differs from that used by Sokalski et al.29 or by Pullman and Perahia30 or by Rein,31 and thus the individual atomic moments may differ significantly even if derived from the same wavefunction. However, the predicted electrostatic properties should converge to the same values provided sufficient terms of the multipole series are included. The equivalence of various distributed multipole schemes was demonstrated for the HF dimer by Spackman, and contrasts strongly with the wide variation in the predicted electrostatic properties derived from atomic charge models. The fundamental difference between the various distributed multipole schemes and any atomic point charge model is that the anisotropic multipole moments represent the electrostatic effects of lone-pair, π-electron and other non-spherical features in the atomic charge distribution. These features make a major contribution to hydrogen bonding and π-π interactions, and therefore it is not surprising that the anisotropic multipole moments make a significant contribution to the electrostatic interactions between nucleic acid bases. The accuracy of the electrostatic energies calculated from such a DMA is mainly limited by the quality of the wavefunction, although it is less basis set dependent than atomic charge representations. The DMA calculations do not include the effect of penetration into the charge density on the electrostatic potential.
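To illustrate why anisotropic terms matter, the sketch below evaluates the electrostatic energy between sites that each carry a charge and a dipole, truncating the multipole series after the dipole-dipole term. It is not the DMA or ORIENT implementation used in the paper, which carries moments up to hexadecapole on every atom; the function names, the use of atomic units and the example sites are assumptions made here purely for illustration.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def site_pair_energy(qa, mua, ra, qb, mub, rb):
    """Energy (atomic units) between two sites, each with a charge q and a
    dipole vector mu, truncated after the dipole-dipole (R^-3) term."""
    rvec = _sub(rb, ra)                      # vector from site a to site b
    r = math.sqrt(_dot(rvec, rvec))
    e = tuple(x / r for x in rvec)           # unit vector a -> b
    u_qq = qa * qb / r                                      # R^-1 term
    u_qmu = (qb * _dot(mua, e) - qa * _dot(mub, e)) / r**2  # R^-2 terms
    u_mumu = (_dot(mua, mub)
              - 3.0 * _dot(mua, e) * _dot(mub, e)) / r**3   # R^-3 term
    return u_qq + u_qmu + u_mumu

def electrostatic_energy(mol_a, mol_b):
    """Sum over all site pairs; each site is a tuple (q, mu, position)."""
    return sum(site_pair_energy(qa, mua, ra, qb, mub, rb)
               for qa, mua, ra in mol_a
               for qb, mub, rb in mol_b)

# Two neutral sites with unit dipoles along z (hypothetical values).
# A point-charge-only model would give zero energy for both arrangements.
site = (0.0, (0.0, 0.0, 1.0), (0.0, 0.0, 0.0))
head_to_tail = [(0.0, (0.0, 0.0, 1.0), (0.0, 0.0, 2.0))]
side_by_side = [(0.0, (0.0, 0.0, 1.0), (2.0, 0.0, 0.0))]
print(electrostatic_energy([site], head_to_tail))   # attractive (negative)
print(electrostatic_energy([site], side_by_side))   # repulsive (positive)
```

The two example arrangements have the same site separation and the same (zero) charges, so the difference in energy comes entirely from the anisotropic dipole terms, which is the kind of orientation dependence an atomic point charge model cannot represent.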

The electrostatic interaction energies were calculated from the DMAs using the program ORIENT,34 including all terms in the multipole expansion of the electrostatic energy in R^-n with n < 5. The minimisations were carried out using a pseudo-hard-sphere function to define sterically accessible conformations, which was a slightly softened repulsive wall (tanh⁻¹) defined by the van der Waals radii of 1.4 Å for O, 1.5 Å for N and 2.0 Å for C. The hydrogen atom radius was zero, corresponding to the effective behaviour of polar protons in hydrogen bonding, and a united atom approach to the methyl and >C-H groups. Thus the hydrogen-bond distances and other van der Waals contacts are determined by these assumed radii, and it is the relative orientation of the molecules that is being predicted by the electrostatic model. As with all minimisation procedures, there is no way to be sure that all the local minima have been located. However, the electrostatic hard-sphere model is sufficiently computationally efficient that the six-variable minimisations could be started from a variety of initial positions, which was done to try to detect all the significant minima, particularly the unexpected ones. For the gas-phase minimisations, one base was held fixed, and the six variables corresponded to the position and orientation of the second base. In the constrained minimisations, to determine the effect of a rigid backbone, the C1' atoms of each base were held fixed, and the orientations of the two molecules were optimised.
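The role of the assumed radii can be sketched with a plain hard-sphere overlap test. This is a simplification: the paper uses a slightly softened wall whose functional form is not given in this section, so the sharp cut-off, the function name and the example coordinates below are assumptions for illustration only. The radii are those quoted in the text.

```python
import math

# van der Waals radii in Å as quoted in the text; H is assigned zero radius.
VDW_RADIUS = {"O": 1.4, "N": 1.5, "C": 2.0, "H": 0.0}

def is_sterically_accessible(mol_a, mol_b):
    """Return True if no atom of mol_a overlaps any atom of mol_b.

    Each molecule is a list of (element, (x, y, z)) with coordinates in Å.
    Hard-sphere simplification of the softened wall described in the text.
    """
    for el_a, pos_a in mol_a:
        for el_b, pos_b in mol_b:
            contact = VDW_RADIUS[el_a] + VDW_RADIUS[el_b]
            if math.dist(pos_a, pos_b) < contact:
                return False
    return True

# Hypothetical N-H...O contact along the x axis (N-H bond length 1 Å).
donor = [("N", (0.0, 0.0, 0.0)), ("H", (1.0, 0.0, 0.0))]
acceptor_ok = [("O", (3.0, 0.0, 0.0))]     # N...O = 3.0 Å, beyond 1.5 + 1.4
acceptor_close = [("O", (2.5, 0.0, 0.0))]  # N...O = 2.5 Å, overlapping

print(is_sterically_accessible(donor, acceptor_ok))     # True
print(is_sterically_accessible(donor, acceptor_close))  # False
```

With a zero hydrogen radius the closest allowed N···O approach in this sketch is set by the heavy-atom radii alone, which mirrors the statement that the hydrogen-bond distances are fixed by the assumed radii while the electrostatic model decides the relative orientation.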

3. Results

3.1 Normal Base Pairs

3.1.1 Gas-phase Minima in the Electrostatic Energy

Electrostatic minima were found corresponding to all the expected 29 multiply hydrogen bonded structures (Table 1), except one, although a few of the structures were not coplanar. The missing structure was the cytosine dimer with the symmetrical N4H···O2 and O2···HN4 hydrogen bonds. This structure was also found not to correspond to a minimum in the SCF calculations4 and some empirical calculations,6 and this has been plausibly attributed to the structure involving a strong repulsive interaction between the N3 atoms. The non-coplanar structures all involved an aromatic nitrogen donor and an amine NH2 acceptor, often, but not always, in the same hydrogen bond. Many of the non-coplanar structures were only slightly non-coplanar, and in the case of G·CI a corresponding coplanar structure was negligibly less stable. The predicted structure would be sensitive to the exact hydrogen atom positions and choice of van der Waals radii in our model, and to the other terms in the potential in reality. However, this observation does suggest that these complexes involving NH2 or aromatic nitrogen in the hydrogen bonds might have low barriers to out-of-plane distortions, which is consistent with the high propeller-twist found for AA base pairs in crystal structures of nucleosides and nucleotides.

Table 1 Minima in electrostatic energy U for all base pairs

| pair (a) | number (a) | hydrogen bonds and other contacts (b) | U /kJ mol⁻¹ | R /Å | θ /degrees | φ /degrees | ΔR /Å | λa /degrees | λb /degrees | CN,C'N' (c) /degrees |
|---|---|---|---|---|---|---|---|---|---|---|
| G·GI | 9 | O6···N1, N1···O6 | -110.9 | 13.6 | 151 | 0 | 5.8 | 29 | 29 | 180 |
| G·GII | 10 | N1···O6, (N2···N7), N2···O6 | -66.4 | 11.9 | 158 | 180 | 14.0 | 22 | 73 | 85 |
| G·GIII | 11 | N1···N7, N2···O6 | -83.1 | 11.5 | 150 | 0 | 4.7 | 30 | 30 | 180 |
| G·GIV np | 12 | N3···N2, N2···N3 | -49.9 | 7.8 | 95 | -14 | 6.1 | 85 | 85 | 143 |
| A·AI | 2 | N1···N6, N6···N1 | -58.9 | 13.8 | 152 | 0 | 6.2 | 28 | 28 | 180 |
| A·AII | 3 | N6···N7, N1···N6 | -54.0 | 12.4 | 160 | 0 | 7.0 | 20 | 20 | 180 |
| A·AIII | 1 | N6···N7, N7···N6 | -47.8 | 11.3 | 170 | 180 | 11.7 | 10 | 10 | 180 |
| C·CI | 23 | N4···N3, N3···N4 | -87.8 | 11.1 | 158 | 0 | 6.0 | 22 | 22 | 180 |
| C·CII open np | | O2···N4 | -48.5 | 6.8 | 139 | -7 | 4.6 | 41 | 119 | 53 |
| C·CII open | 24 | O2···N4 | -47.0 | 9.5 | 93 | 0 | 5.9 | 87 | 31 | 63 |
| T·TI | 19 | O4···N3, N3···O2 | -48.0 | 8.8 | 136 | 0 | 2.7 | 44 | 79 | 57 |
| T·TII | 18 | O4···N3, N3···O4 | -50.0 | 11.4 | 161 | 0 | 6.7 | 19 | 19 | 180 |
| T·TIII | 20 | O2···N3, N3···O2 | -47.2 | 8.7 | 135 | 0 | 2.6 | 45 | 45 | 180 |
| G·C WC | 15 | O6···N4, N1···N3, N2···O2 | -122.1 | 10.9 | 126 | 0 | 0.0 | 54 | 54 | 72 |
| G·CI | | N3···N4, N2···N3 | -68.9 | 9.1 | 76 | 0 | 8.6 | 104 | 5 | 81 |
| G·CI np | | N3···N4, N2···N3 | -69.2 | 9.1 | 76 | 2 | 8.6 | 104 | 8 | 79 |
| G·CII | 17 | N1···O2, N2···N3 | -67.5 | 10.6 | 139 | 0 | 2.4 | 41 | 64 | 157 |
| G·Ciii | | N7···N4, (C8···N3) | -50.9 | 8.7 | 111 | 180 | 17.2 | 69 | 48 | 63 |
| G·Civ | | N7···N4, O6···N4 | -48.3 | 12.0 | 157 | 180 | 14.2 | 23 | 14 | 144 |
| G·Cv | | N7···N4, (O6···C5) | -46.0 | 11.5 | 166 | 180 | 12.5 | 14 | 15 | 179 |
| A·T WC | 5 | N6···O4, N1···N3 | -60.5 | 10.9 | 125 | 0 | 0.2 | 55 | 52 | 73 |
| A·T RWC | 4 | N6···O2, N1···N3 | -58.8 | 11.3 | 152 | 0 | 5.0 | 28 | 44 | 164 |
| A·T H | 25 | N6···O4, N7···N3 | -61.4 | 8.8 | 132 | 180 | 15.4 | 48 | 58 | 74 |
| A·T RH | 26 | N6···O2, N7···N3 | -60.2 | 9.9 | 162 | 180 | 12.2 | 18 | 34 | 164 |
| G·TI | 13 | O6···N3, N1···O4 | -66.6 | 12.2 | 162 | 0 | 7.3 | 18 | 32 | 166 |
| G·TII | 14 | O6···N3, N1···O2 | -63.2 | 10.7 | 138 | 0 | 2.2 | 42 | 67 | 71 |
| G·Tiii np | | N2···O2, N3···N3 | -54.0 | 7.4 | 72 | 12 | 9.0 | 108 | 28 | 74 |
| G·Tiv | | N2···O4, N1···O4 | -52.3 | 13.2 | 138 | 0 | 3.4 | 42 | 4 | 142 |
| G·Tv | | N2···O2, (C1'···O4) | -46.6 | 8.0 | 74 | 0 | 8.7 | 106 | 23 | 97 |
| A·CI | 6 | N6···N3, N1···N4 | -68.8 | 12.3 | 161 | 0 | 7.1 | 19 | 34 | 165 |
| A·CII | 27 | N6···N3, N7···N4 | -65.6 | 11.2 | 172 | 180 | 11.4 | 8 | 24 | 164 |
| A·Ciii np | | N7···N4, (N6···N4) | -43.4 | 10.6 | 150 | 241 | 12.6 | 30 | 32 | 121 |
| A·Civ | | N7···N4, (C8···N3) | -40.0 | 8.6 | 108 | 180 | 17.4 | 72 | 48 | 60 |
| A·Cv np | | N1···N4, (N6···N4) | -40.0 | 11.8 | 143 | 43 | 6.7 | 37 | 37 | 118 |
| G·AI np | 7 | O6···N6, N1···N1 anti-anti | -67.0 | 13.1 | 133 | 3 | 2.7 | 47 | 45 | 88 |
| G·AII np | 29 | N3···N6, N2···N7 | -52.2 | 8.9 | 73 | -1 | 9.0 | 107 | 15 | 61 |
| G·AIII np | 28 | O6···N6, N1···N7 anti-syn | -59.4 | 11.1 | 127 | -1 | 0.4 | 53 | 38 | 90 |
| G·AIV np | 8 | N3···N6, N2···N1 | -57.5 | 9.5 | 61 | -8 | 11.1 | 119 | 11 | 64 |
| G·Av np | | N2···N7, (N2···N6, N1···N7) | -51.5 | 11.4 | 139 | -5 | 2.7 | 41 | 21 | 155 |
| G·Avi np | | N1···N3, N2···N3 | -47.0 | 7.9 | 136 | -31 | 5.1 | 44 | 132 | 62 |
| T·CI | 21 | O2···N4, N3···N3 | -41.6 | 9.8 | 146 | 0 | 3.8 | 34 | 31 | 178 |
| T·CII np | 22 | N3···N3, O4···N4 | -48.9 | 8.6 | 122 | -2 | 2.3 | 58 | 59 | 65 |

(a) Notation defines hydrogen-bonded structures as illustrated by Hobza and Sandorfy,4 mainly corresponding to pairings predicted by Donohue,3 whose numbers are also given. Small roman numerals denote structures which were not foreseen by these authors. Non-planar structures are denoted by np. (b) Parentheses denote traditional hydrogen bonds which are slightly elongated from van der Waals contact, or van der Waals contacts which do not correspond to traditional hydrogen bonds. (c) The structural parameters are defined in Fig. 1, except CN,C'N', which is the angle between the glycosyl bonds, as determined from their vector scalar product. When the angles between the glycosyl bonds and the C1'-C1' vector are suitable (0° < λa ≈ λb < 75°), then this angle will be approximately 180° for anti-parallel bonds, and less than 90° when the geometry has the required approximate pseudo-dyad. Minima which might fit within a DNA helix, as discussed in 3.1.3, are underlined. Minima with U > -40 kJ mol⁻¹ have not been tabulated.
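The ΔR column of Table 1 (defined in Section 3.1.3 as the shift needed to bring the second C1' onto its position in the Watson-Crick G·C minimum) can be cross-checked from the tabulated R, θ and φ. The sketch below assumes that (R, θ, φ) are ordinary spherical polar coordinates of the second C1' in a frame fixed on the first base, with θ measured from z and φ the azimuth from x; this reading of Fig. 1 and the function names are assumptions, not taken from the paper.

```python
import math

def c1_position(R, theta_deg, phi_deg):
    """Cartesian position (Å) of the second C1' from spherical polars."""
    t, p = math.radians(theta_deg), math.radians(phi_deg)
    return (R * math.sin(t) * math.cos(p),
            R * math.sin(t) * math.sin(p),
            R * math.cos(t))

# Reference: Watson-Crick G·C minimum from Table 1 (R, theta, phi).
WC_GC = c1_position(10.9, 126, 0)

def delta_r(R, theta_deg, phi_deg):
    """Distance (Å) of the second C1' from its Watson-Crick G·C position."""
    return math.dist(c1_position(R, theta_deg, phi_deg), WC_GC)

# (pair, R, theta, phi, tabulated delta R) copied from Table 1.
rows = [
    ("A·T WC", 10.9, 125, 0, 0.2),
    ("G·TII", 10.7, 138, 0, 2.2),
    ("G·GI", 13.6, 151, 0, 5.8),
    ("A·AIII", 11.3, 170, 180, 11.7),
    ("G·GII", 11.9, 158, 180, 14.0),
]
for name, R, theta, phi, tabulated in rows:
    print(f"{name:8s} computed {delta_r(R, theta, phi):5.2f}  tabulated {tabulated:5.1f}")
```

Because the tabulated R, θ and φ are rounded, the recomputed values should only be expected to agree with the ΔR column to within a few tenths of an ångström.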

Other electrostatic minima were found (denoted by small roman numerals in Table 1), which were not the global minimum for any pair, but were often as stable as the weaker doubly hydrogen bonded complexes. Some of these, G·Ciii, G·Cv and A·Civ, involved one hydrogen bond and a close contact between an aromatic CH group and a proton acceptor, giving a coplanar complex. Others, G·Civ, G·Avi and G·Tiv, involved two hydrogen bonds to the same acceptor. Some non-coplanar structures, A·Ciii, A·Cv and G·Av, appeared to allow for a favourable interaction between two NH2 groups. There was also one non-coplanar structure, G·Tiii, involving two hydrogen bonds, N2H···O2 and N3···HN3, which could not be planar because of steric repulsion between C1' and O4, and was probably not included in Donohue's3 list for this reason. The ability to find these less obvious structures is an advantage of using the electrostatic model, as previous workers have constrained the complexes to be coplanar.

3.1.2 Experimental Variations in Normal Base Pairs within Oligonucleotide Helices

The DNA backbone conformation clearly puts a major constraint on the geometries of the base pairs which can be accommodated within the double helix. Simple constraints include the relative positions of the C1' sugar atoms attached to each base of the base pair and the directions of the glycosyl bonds (C1'N1 and C1'N9). Although there is often a significant propeller twist within the base pairs, the possible degree of non-planarity is limited by the proximity of the adjacent base pairs. The relative orientation of the glycosyl bonds to each other and to the C1'-C1' vector is also believed to be important. Thus, the pseudo-dyad in the plane of the Watson-Crick base pairs (Fig. 1) leads to the two glycosyl bonds being on the same side of the C1'-C1' vector and pointing towards each other at approximately equal angles (λa = ∠N9C1'C1' ≈ ∠N1C1'C1' = λb < 90°).

In order to assess the apparent flexibility of the helical constraints, we have surveyed the geometries of base pairs found in oligonucleotide crystal structures deposited in the Brookhaven Data Bank36 (Table 2). This analysis was carried out by superimposing all the guanines on a reference base and then using the same transformation matrix to show the relative location of the C1' atoms attached to the opposite, cytosine, base. This procedure has been repeated for A·T base pairs and for A- and B-like conformations. The definition of the geometrical parameters is shown in Fig. 1. This quantitative analysis of the experimental data shows that the C1'-C1' separation, R, falls between 9.8 and 11.3 Å, although the larger separations are only occasionally found in A-DNA (Fig. 2). The orientation of the pyrimidine C1' relative to the purine covers a wide range, with 80 < θ/degrees < 130. The A·T pairs show a stronger tendency for non-coplanarity, with φ usually between -45° and -30°, whereas for G·C |φ| is usually less than 30°. The variation in the relative orientation of the glycosidic bonds is quite large (Fig. 3), with 35 < λ/degrees < 70, although values between 50° and 60° are most common. The two glycosyl bonds always lie on the same side of the C1'-C1' vector.

Fig. 1 Definition of structural parameters for the base pairs, illustrated for the Watson-Crick G·C structure. The out-of-plane angle φ is defined by spherical polar coordinates, with the guanine defining the xz plane. R denotes the C1'-C1' distance.
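The parameters R, θ and φ can be obtained from atomic coordinates once a frame is fixed on the first base. The sketch below is one way to do this under stated assumptions: the origin is the C1' of the first base, z points along the glycosyl bond from N9 (or N1) to C1', and x is the in-plane direction towards a chosen ring atom, so that the base lies in the xz plane and φ = 0 or 180° for a coplanar partner. The figure itself is not reproduced here, so these axis choices, the function names and the example coordinates are assumptions rather than the paper's exact convention.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    n = math.sqrt(_dot(a, a))
    return tuple(x / n for x in a)

def base_pair_parameters(c1_a, n_glyc_a, in_plane_a, c1_b):
    """Return (R /Å, theta /degrees, phi /degrees) of the second C1'.

    c1_a, n_glyc_a : C1' and glycosyl N (N9 or N1) of the first base
    in_plane_a     : any other atom of the first base, fixing the xz plane
    c1_b           : C1' attached to the second base
    """
    z = _unit(_sub(c1_a, n_glyc_a))             # assumed z: N -> C1'
    w = _sub(in_plane_a, c1_a)
    x = _unit(_sub(w, tuple(_dot(w, z) * c for c in z)))  # in-plane, perpendicular to z
    y = _cross(z, x)                            # normal to the base plane
    v = _sub(c1_b, c1_a)                        # C1'-C1' vector
    R = math.sqrt(_dot(v, v))
    cos_theta = max(-1.0, min(1.0, _dot(v, z) / R))
    theta = math.degrees(math.acos(cos_theta))
    phi = math.degrees(math.atan2(_dot(v, y), _dot(v, x)))
    return R, theta, phi

# Hypothetical coordinates: glycosyl bond of 1.453 Å along z, partner C1'
# placed at R = 10.9 Å, theta = 126 degrees in the base plane.
t = math.radians(126.0)
c1_b = (10.9 * math.sin(t), 0.0, 10.9 * math.cos(t))
R, theta, phi = base_pair_parameters((0.0, 0.0, 0.0), (0.0, 0.0, -1.453),
                                     (1.0, 0.0, -2.0), c1_b)
print(round(R, 2), round(theta, 1), round(phi, 1))
```

In this frame the angle between the glycosyl bond and the C1'-C1' vector is 180° - θ, which matches the relation between the θ and λa columns of Table 1.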

3.1.3 Gas-phase Minimum-energy Structures which might fit within the Helical Structure

The most stable base pair structure found in our gas-phase calculations was the Watson-Crick G·C structure. This structure matches the experimental structure of the G·C pair within DNA sufficiently well that optimising the G·C interaction appears to play a significant role in determining the structure of DNA. Thus, a first measure of the ability of the other base pairs to fit within the DNA structure is whether the relative orientation of the two C1' sugar atoms is similar. The distance (ΔR) that the second C1' atom needs to be shifted to be superimposed on the cytosine C1' in our minimum-energy G·C (WC) structure, when the C1' and N1/N9 positions and the plane of the first base are superimposed on the guanine, is therefore also given in Table 1. The Watson-Crick A·T structure clearly requires the least distortion (0.2 Å) to match the relative position of C1' in G·C (WC). The remarkable similarity between the dimensions of the G·C and A·T (WC) structures is well known, and it has been recognised from the earliest calculations on base pair binding that this steric constraint favours the A·T (WC) pair over other base pairs which are more stable in isolation. Although this similarity is believed to explain the normal complementary base pairs, we need to look further to establish other possibilities. The non-planar structure G·AIII comes second (ΔR = 0.4 Å), with all other minima corresponding to a distortion of over 2 Å, although several have ΔR < 3 Å.

The DNA backbone is not equally flexible in all directions, so this analysis can be refined by considering the direction of the displacement (ΔR) of the second C1' from the WC structure, using the experimental variations in R, θ and φ discussed in the previous section. Since the backbone would be expected to give further if needed to accommodate an unusual base, we should add a generous margin to the degree of flexibility shown in B- and A-DNA for G·C and A·T pairs. The base pairs whose gas-phase minima correspond to geometries with 9.5 < R/Å ≤ 11.6, 65 < θ/degrees < 140 and |φ| < 60° are G·C (WC), G·CII, G·TII, A·T (WC), G·AIII, G·AV and the open planar C·CII, in order of decreasing stability (Table 1). If we further exclude base pairs where the glycosyl bond angles λa and λb are not between 30° and 75°, or where the two glycosyl bonds are not on the same side of the C1'-C1' vector, then only four structures, namely the Watson-Crick G·C and A·T, G·AIII and G·TII, seem likely to fit within the helix. All these are likely to meet the stacking requirements of the helix: G·AIII is non-planar, but the maximum distance that any of the adenine atoms lies from the plane of the guanine is 2.3 Å, and we have already noted the apparent flexibility for out-of-plane distortions of complexes involving NH···N hydrogen bonds.
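The geometric screen described in this section can be written as a small predicate. The following is a minimal sketch, not the authors' code: the function name `fits_helix` and its argument names are hypothetical, the R, θ and φ windows and the 30° to 75° glycosyl range are taken from the text, and the "same side of the C1'-C1' vector" condition is approximated by requiring the angle between the two glycosyl bonds to be below 90°, which is an assumption based on the footnote to Table 3.

```python
def fits_helix(r, theta, phi, lam_a, lam_b, cn_angle):
    """Return True if a base-pair geometry passes the screening windows.

    r        : C1'-C1' distance in angstrom
    theta    : first direction angle of the C1'-C1' vector, degrees
    phi      : second direction angle of the C1'-C1' vector, degrees
    lam_a/b  : glycosyl bond angles, degrees
    cn_angle : angle between the two glycosyl bonds, degrees
               (assumed proxy for "same side of the C1'-C1' vector")
    """
    vector_ok = 9.5 < r <= 11.6 and 65 < theta < 140 and abs(phi) < 60
    glycosyl_ok = 30 < lam_a < 75 and 30 < lam_b < 75
    same_side = cn_angle < 90
    return vector_ok and glycosyl_ok and same_side


# Two geometries quoted in Table 4 for O6-methylguanine pairs.
print(fits_helix(10.9, 127, 0, 53, 55, 72))     # Watson-Crick-like MeG(p).T
print(fits_helix(12.3, 169, 180, 11, 4, 174))   # MeG(d).C with O6...N4, N7...N4
```

The three conditions are kept as separate booleans so that each criterion can be relaxed on its own, for example to add a wider margin for backbone flexibility.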

Table 2 Oligonucleotide structures analysed

| DNA type | sequence | file | res./Å | R-factor | ref. |
|---|---|---|---|---|---|
| A | ᴵCCGG | 1ANA | 2.1 | 0.165 | Conner et al., Nature (London), 1982, 295, 294 |
| A | GGGGCCCC | 2ANA | 2.5 | 0.20 | McCall et al., J. Mol. Biol., 1985, 183, 385 |
| A | GGATGGGAG | 1DN6 | 3.0 | 0.33 | McCall et al., Nature (London), 1986, 322, 385 |
| A | GGGATCCC | 3ANA | 2.5 | 0.16 | Lauble et al., Nucleic Acids Res., 1988, 16, 7799 |
| A | GCCCGGGC | 9DNA | 1.8 | 0.17 | Heinemann et al., Nucleic Acids Res., 1988, 15, 9531 |
| A | GTACGTAC | 5ANA | 2.25 | 0.18 | Takusagawa, J. Biomol. Struct. Dynam., 1990, 7, 795 |
| B | CGCGAATTCGCG (17 °C) | 1BNA | 2.3 | 0.178 | Dickerson et al., J. Mol. Biol., 1981, 149, 761 |
| B | CGCGAATTCGCG (−257 °C) | 2BNA | 2.7 | 0.21 | Drew et al., Proc. Natl. Acad. Sci. USA, 1982, 79, 4040 |
| B | CGCGAATTbrCGCG (20 °C) | 3BNA | 3.0 | 0.173 | Fratini et al., J. Biol. Chem., 1982, 257, 14686 |
| B | CGCGAATTbrCGCG (7 °C) | 4BNA | 2.3 | 0.216 | Fratini et al., J. Biol. Chem., 1982, 257, 14686 |
| B | CGCGA(m⁶A)TTCGCG | 4DNB | 2.0 | 0.169 | Frederick et al., J. Biol. Chem., 1988, 263, 17872 |
| B | CCAGGCCTGG | 1BD1 | 1.6 | 0.169 | Heinemann et al., J. Mol. Biol., 1989, 210, 369 |
| B | CGCATATATGCG | 1DN9 | 2.2 | 0.189 | Yoon et al., Proc. Natl. Acad. Sci. USA, 1989, 85, 6332 |


Fig. 2 Experimental variation in the length and direction of the C1'-C1' vector for (a) G·C pairs in B-DNA, (b) G·C pairs in A-DNA, (c) A·T pairs in B-DNA and (d) A·T pairs in A-DNA. R, θ and φ are defined in Fig. 1.

3.1.4 Electrostatic Minima for Bases constrained by a Rigid Backbone

The above argument was derived from the probable low-energy gas-phase structures of the base pairs, but a base pair within a DNA helix need not correspond to a minimum-energy conformation of the isolated pair. Although a gas-phase minimum which is close to matching the WC structures would generally be expected to tolerate some distortion to fit into the helix, hydrogen bonding is sufficiently sensitive to geometry that this need not always be the case. Also, it is possible for a base pair to be relatively stable in a Watson-Crick-like structure without having an unconstrained minimum in the region. To consider these possibilities, further electrostatic energy minimisations were performed within a rigid backbone model, where the two C1' atoms were constrained to have the same separation as in the G·C (WC) structure of minimum electrostatic energy. This structure was chosen to represent a typical C1'-C1' separation in DNA, as it is consistent with both the experimental data and the assumed hard-sphere radii which determine the exact separations. In the first instance, all six orientational degrees of freedom were optimised, and therefore most of the pairs were able to adopt a structure containing two hydrogen bonds, often with only a small loss in electrostatic stabilisation (Table 3). This shows that it is often possible for the competing interactions involved in multiple hydrogen bonds to lead to a range of structures which are close in energy. The destabilisation under the C1'-C1' constraint was smallest for the A·T (WC) structure, as might have been expected from the close correspondence of the relative positions of the C1' atoms to those of G·C (WC) in the gas-phase structures. At the other extreme, A·A and T·C were the only pairs which

Fig. 3 Experimental variation in the glycosyl bond angles. Top row: λ1 = ∠N1-C1'-C1' within (a) B-DNA and (b) A-DNA. Bottom row: λ9 = ∠N9-C1'-C1' within (c) B-DNA and (d) A-DNA.


Table 3 Minima in electrostatic energy U for fixed-backbone C1'-C1' distance

| pair | close contacts (a) /Å | U /kJ mol⁻¹ | U − U_opt (b) /kJ mol⁻¹ | height (c) /Å | λa /degrees | λb /degrees | CNa·CNb (d) /degrees |
|---|---|---|---|---|---|---|---|
| G·G | N7···N1 3.1, O6···N2 3.0 | −76.0 | 7.1 | −1.6 to 1.5 | 32 | 39 | 147 |
| A·A | N7···N6 3.1 | −31.0 | | 0 | 42 | 58 | 81 |
| C·C | N4···N3 3.1, N3···N4 3.1 | −84.7 | 3.1 | 0 | 24 | 24 | 180 |
| T·T | N3···O4 3.0, O4···N3 3.0 | −47.8 | 2.2 | −3.8 to 2.1 | 26 | 26 | 145 |
| G·C WC | O6···N4, N1···N3, N2···O2 | −122.1 | 0.0 | 0 | 54 | 54 | 72 |
| G·C | N2···N3 3.1 | −59.2 | 10.0 | 0 | 41 | 58 | 163 |
| A·T | N1···N3 3.1, N6···O4 3.0 | −60.4 | 0.1 | 0 | 55 | 53 | 72 |
| G·T | O6···N3 3.0, N1···O2 3.1 | −56.5 | 6.7 | 0 | 41 | 65 | 75 |
| C·C | N4···N3 3.1, (N4···N4 3.2) | −46.7 | | −3.0 to 1.0 | 39 | 28 | 120 |
| G·T | N2···O2 3.0 | −35.5 | | −5.5 to 0.3 | 53 | 60 | 123 |
| A·C | N7···N4 3.1, N6···N3 3.1 | −57.6 | 8.0 | −2.2 to 2.5 | 13 | 27 | 155 |
| A·C | N7···N4 3.1, (N6···N4 3.1) | −43.1 | 0.3 | −3.0 to 2.1 | 27 | 29 | 127 |
| G·A | N1···N7 3.1, O6···N6 3.0 | −57.4 | 2.0 | −1.3 to 4.9 | 56 | 40 | 85 |
| G·A | N2···N7 3.1, N1···N7 3.1 | −46.5 | 5.0 | −2.9 to 2.7 | 41 | 40 | 132 |
| G·A | N2···N1 3.1 | −36.3 | | −2.2 to 2.4 | 103 | 20 | 80 |
| T·C | O4···N4 3.0, (C7···N4 3.7) | −31.9 | | −4.8 to 0.1 | 36 | 44 | 101 |
| T·C | O4···N4 3.0 | −31.5 | | 0 | 43 | 48 | 89 |
| further minima if one base also fixed in plane of WC pairs | | | | | | | |
| G·G | N7···N1 3.1, O6···N2 3.0 | −67.1 | 16.0 | −3.6 to 1.6 | 26 | 39 | 167 |
| T·T | N3···O4 3.0 | −23.9 | | 0 | 58 | 25 | 97 |
| A·C | N6···O2 3.0 | −27.0 | | 0 | 32 | 92 | 56 |
| G·A | N1···N7 3.1, O6···N6 3.0 | −56.6 | 2.8 | −1.3 to 3.2 | 56 | 40 | 84 |
| further minima if both bases constrained to be coplanar | | | | | | | |
| G·G | (N7···N1 3.4, C8···N1 3.6) | −52.2 | | 0 | 50 | 26 | 156 |
| G·G | N2···O6 3.0 | −35.3 | | 0 | 41 | 116 | 23 |
| C·C | N4···O2 3.1 | −41.5 | | 0 | 67 | 13 | 99 |
| C·C | N4···O2 3.0 | −36.3 | | 0 | 56 | 11 | 135 |
| C·C | N7···N4 3.1, (N7···C5 3.6) | −36.9 | | 0 | 15 | 20 | 175 |
| G·A | N1···N7 3.1, (O6···N6 3.3) | −44.3 | 15.1 | 0 | 54 | 45 | 81 |
| G·A | N7···N6 3.1, O6···N6 3.0 | −37.7 | | 0 | 43 | 21 | 117 |

The base pairs have a C1'-C1' distance of 10.9 Å, as in the WC G·C minimum. The other parameters characterising the base pair structure are defined in Fig. 1. Minima suitable for fitting within the DNA helix are underlined. Minima below −30 kJ mol⁻¹ have generally been excluded.

(a) Parentheses denote traditional hydrogen bonds which are slightly elongated from van der Waals contact, or noteworthy van der Waals contacts which do not correspond to traditional hydrogen bonds. (b) Loss in electrostatic stabilisation energy in going from the unconstrained minimum with similar hydrogen bonds (in Table 1). (c) The range of z values for all atoms except the hydrogens bonded to C1'. (d) The angle between the two glycosyl bonds, calculated from the scalar product of the two C1'-N bond vectors. When the angles between the glycosyl bonds and the C1'-C1' vector are suitable (30° < λa ≈ λb < 75°), then this angle will be approximately 180° for anti-parallel bonds, and less than 90° when the geometry has the required approximate pseudo-dyad.

were unable to form a reasonably stable structure within a fixed backbone constraint, because thymine and cytosine are too small to form more than one hydrogen bond, and adenine is too large, given the relative disposition of its hydrogen-bonding groups.

The strongly bound structures for a fixed C1'-C1' distance were significantly non-planar for G·G, T·T, A·C and G·A. To establish the importance of the non-planarity in establishing multiple hydrogen bonds, further optimisations were performed where the C1'-C1' vector was constrained to be in the plane of first one, and then both, of the bases, until a planar minimum-energy structure was found. In the case of G·G and G·A, the constraint of coplanarity led to relatively little loss in stabilisation energy, giving quite favourable binding for the coplanar complex. However, for T·T and A·C the strongest coplanar structures had only one hydrogen bond and were not particularly favourable.

In the fixed backbone model, the G·C (WC) structure obviously becomes the most stable by a larger margin. However, both C·C and G·G can form structures that are significantly more stable than the A·T (WC) structure within the rigid backbone constraint. Both of these structures can nevertheless be excluded from DNA because of the relative orientations of their glycosyl bonds: the highly symmetrical C·C structure has a dyad perpendicular to the plane of the bases (instead of in this plane), and the G·G structure also has the glycosyl bonds in the wrong relative direction. Thus the observed G·C and A·T structures also emerge from this analysis as the most probable to form within the DNA backbone. An analysis of the glycosyl bond geometry and stacking heights of the constrained structures shows that only the complementary pairs, and a G·T, G·A, A·A and T·C structure, fit the geometric criterion described in Section 3.1.3. The A·A and T·C structures are not particularly energetically favourable. Thus the fixed backbone model also predicts that G·C, A·T, G·T and G·A base pairings are the most favourable within a double helix.
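The selection argument of this section, a geometric filter followed by an energy ranking, can be illustrated with the values in Table 3. This is a sketch under stated assumptions rather than the authors' procedure: the energies and angles are read from Table 3, the names `glycosyl_ok` and `rank_candidates` are hypothetical, the pseudo-dyad requirement is approximated by a glycosyl-glycosyl angle below 90°, and the −40 kJ mol⁻¹ cut-off used to drop the weakly bound A·A and T·C structures is an illustrative choice that does not appear in the paper.

```python
# (pair, U / kJ mol^-1, lambda_a, lambda_b, angle between glycosyl bonds),
# values read from Table 3
MINIMA = [
    ("G.C (WC)", -122.1, 54, 54, 72),
    ("C.C", -84.7, 24, 24, 180),
    ("G.G", -76.0, 32, 39, 147),
    ("A.T (WC)", -60.4, 55, 53, 72),
    ("G.A", -57.4, 56, 40, 85),
    ("G.T", -56.5, 41, 65, 75),
    ("T.C", -31.5, 43, 48, 89),
    ("A.A", -31.0, 42, 58, 81),
]


def glycosyl_ok(lam_a, lam_b, cn_angle):
    """Glycosyl angles within 30-75 degrees and bonds on the same side (assumed < 90)."""
    return 30 < lam_a < 75 and 30 < lam_b < 75 and cn_angle < 90


def rank_candidates(minima, u_max=-40.0):
    """Keep geometrically acceptable minima with U <= u_max, most stable first."""
    passing = [m for m in minima if glycosyl_ok(*m[2:])]
    favourable = [m for m in passing if m[1] <= u_max]
    return sorted(favourable, key=lambda m: m[1])


for name, u, *_ in rank_candidates(MINIMA):
    print(f"{name}: {u} kJ/mol")
```

With these inputs the C·C and G·G structures are rejected on glycosyl geometry alone, despite being more strongly bound than A·T (WC), which is the point made in the text.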

3.2 Base Pair Structures of O6-Methylguanine

In this section, we apply the above electrostatic modelling methods to the alkylated base MeG when it is base-paired with C and T. This modification converts the C6=O6 carbonyl group to a C6-O6-C7H3 group, and N1 consequently loses its bonded proton. Thus two of the groups involved in the Watson-Crick G·C hydrogen bonds are altered.

Furthermore, the new methyl group can rotate around the C6-O6 bond. The most stable conformation of O6-methylguanine is believed to be the distal form with the C7 methyl group on the N1 side, in the hydrogen-bonding region. However, the proximal conformer would cause less disruption to the WC base pairing structure, as the methyl group is on the other side, close to N7. There is experimental evidence for structures involving both, and intermediate conformations, as discussed below. Ab initio calculations on the isolated base38 predict two stable conformers, with the proximal conformer being 42 kJ mol⁻¹ less stable than the distal conformer for an STO-3G basis set, and 19 kJ mol⁻¹ less stable for a 3-21G basis set. These calculations also predicted a change of about 7° in the C6-O6-C7 angle between the two conformers. Hence the energy difference between the conformers is sufficiently small that this study must consider both conformers, but it may be sufficiently large to introduce considerable uncertainty into the interpretation of relative inter-base interaction energies.

Thus, we performed SCF and electrostatic modelling calculations using both the crystal structure distal geometry,23 and a proximal geometry derived by just changing the torsion angle. The likely error in the assumed C6-O6-C7H3 angle was found to have a negligible effect on the calculated electrostatic interaction energies.

3.2.1 Gas-phase Minima involving O6-Methylguanine

O6-Methylguanine can form some very stable complexes with cytosine (Table 4). The most stable, which has the methylated guanine in the proximal conformation, has an electrostatic interaction energy of −70 kJ mol⁻¹ and is, therefore, more stable than most of the gas-phase complexes between the normal bases. The complex has the two hydrogen bonds, N1···HN4 and N2H···N3, of a wobble conformation, and is planar. This wobble conformation was suggested by Leonard et al.16 on the basis of UV melting experiments, although they depicted it with the O6-Me group distal. The O6-Me group conformation does affect this complex, in that the corresponding minimum for the distal conformer is less stable by 20 kJ mol⁻¹ and significantly non-planar. Both of these minima correspond to C1'-C1' distances that are only a few tenths of an ångström longer than the longest found experimentally in A-DNA (Section 3.1.2), although they are significantly longer than most C1'-C1' distances in B- or A-DNA. The direction of the C1'-C1' vectors and the relative orientations of the glycosyl bonds are definitely suitable for the DNA helix, according to the analysis in Section 3.1.3, although the non-planarity of the O6-methylguanine·cytosine complex with the distal conformer could cause some stacking problems. The strongly electrostatically favoured wobble structure is the most likely to be tolerated in a DNA helix, although there would be strain in the helical backbone; it is most stable, and planar, for the proximal conformation. None of the other gas-phase minima between O6-methylguanine and cytosine could possibly be accommodated within the DNA backbone, as the C1'-C1' vectors are in the wrong direction and considerably too short or too long.

The complexes of O6-methylguanine with thymine are less energetically favourable. However, for both conformers there is a minimum-energy structure with N2H···O2 and N1···HN3 hydrogen bonds which has a suitable C1'-C1' vector and glycosyl bond disposition to fit within the helix. The complex with the distal conformer is extremely non-planar, owing to the steric hindrance of the C7-Me group. The steric clash with the neighbouring base pairs in the helix might be alleviated by twisting about the C6-O6 bond, if the intramolecular torsion barrier was not too steep. The complex between the proximal conformer and thymine is not particularly stable (−45 kJ mol⁻¹) considering that there are two hydrogen bonds, presumably because of the strong electrostatic repulsion between O6 and O4, which are separated by only 3.1 Å. This structure is not the only gas-phase minimum that would be expected to fit within the double helix structure, but it is a particularly good match for the Watson-Crick base pairs.

Table 4 Minima in electrostatic energy U for base pairs including O6-methylguanine

| pair | hydrogen bonds and other contacts (a) | U /kJ mol⁻¹ | R (b) /Å | θ /degrees | φ /degrees | ΔR /Å | λa /degrees | λb /degrees | CNa·CNb /degrees |
|---|---|---|---|---|---|---|---|---|---|
| MeG(d)·C | N3···N4, N2···N3 | −57.7 | 9.0 | 76 | −2 | 8.6 | 104 | 9 | 79 |
| | N2···N3, N1···N4 [N2···O2 4.2, N1···N3 3.9, O6···N4 4.2] | −50.7 | 11.6 | 115 | −5 | 2.4 | 65 | 35 | 85 |
| | N7···N4, (C8···N3) | −47.0 | 8.7 | 110 | 180 | 17.3 | 70 | 48 | 62 |
| | O6···N4, N7···N4 | −40.0 | 12.3 | 169 | 180 | 12.5 | 11 | 4 | 174 |
| MeG(p)·C | N2···N3, N1···N4 [N2···O2 3.8, N1···N3 3.8, O6···N4 3.8] | −70.6 | 11.5 | 116 | 0 | 2.1 | 64 | 42 | 74 |
| | N2···N3, N3···N4 | −55.0 | 9.0 | 76 | −2 | 8.6 | 104 | 10 | 78 |
| | O6···N4, (C7···N3) | −32.0 | 13.2 | −3 | 0 | 10.5 | 3 | 31 | 152 |
| | (C8···N3, C1···O2) | −26.2 | 6.2 | 68 | 180 | 16.9 | 112 | 68 | 0 |
| MeG(d)·T | N2···O2, N1···N3 | −40.0 | 10.1 | 132 | 47 | 6.6 | 48 | 54 | 111 |
| | N2···O2, N3···N3 | −40.0 | 8.0 | 74 | 0 | 8.7 | 106 | 23 | 97 |
| | N2···O2, N1···N3 | −37.4 | 10.6 | 122 | 14 | 2.2 | 58 | 46 | 86 |
| | N7···N3 | −33.3 | 10.0 | 159 | 180 | 12.8 | 21 | 29 | 173 |
| | N7···N3 | −31.8 | 8.4 | 127 | 180 | 15.5 | 53 | 64 | 63 |
| MeG(p)·T | N2···O2, N3···N3 | −50.1 | 7.4 | 73 | 13 | 9.0 | 107 | 28 | 74 |
| | N2···O4, N1···N3 | −48.0 | 11.0 | 144 | −34 | 5.6 | 36 | 46 | 138 |
| | N2···O2, N1···N3 | −44.8 | 10.9 | 127 | 0 | 0.1 | 53 | 55 | 72 |
| | N7···N3, (C8···O2) | −23.7 | 7.9 | 112 | 180 | 16.4 | 68 | 70 | 42 |

(a) Parentheses denote van der Waals contacts which do not correspond to traditional hydrogen bonds. Distances corresponding to the hydrogen bonds in the Watson-Crick G·C structure are given in [ ] where relevant. (b) The structural parameters are defined in Fig. 1, except CNa·CNb, which is the angle between the glycosyl bonds, as determined from their vector scalar product. When the angles between the glycosyl bonds and the C1'-C1' vector are suitable (30° < λa ≈ λb < 75°), then this angle will be ca. 180° for anti-parallel bonds, and less than 90° when the geometry has the required approximate pseudo-dyad. Minima which might fit within a DNA helix, as discussed in Section 3.1.3, are underlined. Minima with U > −20 kJ mol⁻¹ have not been tabulated.
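The ΔR column of Table 4 measures how far the second C1' atom lies from its position in the reference G·C (WC) geometry. A rough way to see how such a displacement follows from R, θ and φ is sketched below. The convention used here (θ as a polar angle and φ as an azimuth in a frame fixed on the first base) is an assumption, since the actual definitions are those of Fig. 1; the function names are hypothetical; and the reference geometry is a stand-in taken from the MeG(p)·T row that Table 4 lists with ΔR = 0.1 Å, not the G·C (WC) values of Table 1.

```python
import math


def c1_vector(r, theta_deg, phi_deg):
    """Cartesian C1'-C1' vector from (R, theta, phi); spherical convention assumed."""
    t, p = math.radians(theta_deg), math.radians(phi_deg)
    return (r * math.sin(t) * math.cos(p),
            r * math.sin(t) * math.sin(p),
            r * math.cos(t))


def delta_r(geom, ref):
    """Distance between the second C1' positions of two (R, theta, phi) geometries."""
    return math.dist(c1_vector(*geom), c1_vector(*ref))


# Stand-in reference (illustrative only): MeG(p).T row with Delta R = 0.1 angstrom.
reference = (10.9, 127.0, 0.0)
wobble = (11.5, 116.0, 0.0)   # MeG(p).C wobble minimum from Table 4
print(round(delta_r(wobble, reference), 2))
```

Because the reference is only approximately the Watson-Crick geometry and the angle convention is assumed, the result should be read as an order-of-magnitude check against the tabulated ΔR rather than a reproduction of it.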


3.2.2 Fixed-backbone Electrostatic Minima for O6-Methylguanine Base Pairs

If the C1' atoms are constrained to the separation found for the Watson-Crick G·C pair, then there is a considerable reduction in the possible electrostatic stabilisation energy for all but the proximal thymine pair, as would be expected from the optimal C1'-C1' separations. However, both distal and proximal O6-methylguanine can form low-energy structures with either cytosine or thymine that are compatible with the helical constraints (i.e. planar with a reasonable glycosyl bond geometry, see Table 5) and have a favourable electrostatic stabilisation energy of 30-45 kJ mol⁻¹.

The most favourable constrained minimum (−55 kJ mol⁻¹) corresponds to the proximal conformer interacting with cytosine in a distorted and non-planar wobble conformation with slightly elongated N2H···N3 and N1···HN4 hydrogen bonds. In this conformation, the Watson-Crick hydrogen bond distances are around 3.6 Å, about 0.2 Å shorter than in the unconstrained minimum. The most favourable planar minimum (−45 kJ mol⁻¹) conserves the N2H···N3 hydrogen bond, and the loss of the N1···HN4 bond is partially compensated for by the shortening of the N2H···O2 distance to 3.26 Å. This could be a normal hydrogen bond, giving a bifurcated hydrogen bond structure involving N2H, if the van der Waals radius on the cytosine C2 were smaller. The distal conformer behaves similarly, although its structures are less stable and more sterically constrained by the C7-Me group. However, both sets of constrained minima have considerably less binding energy than the gas-phase minimum which has an optimal N1···HN4 hydrogen bond, and there is likely to be a steep path from these constrained minima down to the doubly hydrogen-bonded minima on the unconstrained potential-energy surface. Hence, the interaction between the bases would produce a strong force for distorting the C1'-C1' geometry. There is a range of structures which could occur in DNA which lie between the planar fixed-backbone structure [e.g. Fig. 4(b)] and gas-phase geometries with a wobble pair of hydrogen bonds [e.g. Fig. 4(a)], including non-planar structures and structures with bifurcated hydrogen bonds.

Within the constraints of the fixed-backbone model, the interactions of O6-methylguanine with thymine are slightly less favourable than those with cytosine, though this difference is considerably smaller than for the gas-phase minima. The distal conformer has some constrained minima that are only marginally less favourable than the gas-phase minima, but these highly twisted structures would not fit within the neighbouring base pairs. The most stable constrained planar structure for the distal conformer is the least favourable energetically of the four combinations, but by a small margin compared with the intramolecular energy required to twist the O6-Me into a non-planar or proximal position. The gas-phase minimum for the proximal conformer with thymine, with N2H···O2 and N1···HN3 hydrogen bonds, is virtually

Table 5 Minima in electrostatic energy U for O6-methylguanine complexes for fixed-backbone C1'-C1' distance

| pair | close contacts (a) /Å | U /kJ mol⁻¹ | U − U_opt (b) /kJ mol⁻¹ | height (c) /Å | λa /degrees | λb /degrees | CNa·CNb (d) /degrees |
|---|---|---|---|---|---|---|---|
| MeG(d)·C | N2···N3 3.1, (N2···C2 3.6, N1···N4 3.9) [N2···O2 3.4, N1···N3 4.0, O6···N4 4.8] | −36.3 | 14.4 | −0.7 to 3.2 | 73 | 56 | 51 |
| | N2···N3 3.1, (N2···C2 3.6, N1···N4 4.2) [N2···O2 3.3, N1···N3 4.2, O6···N4 5.1] | −36.0 | 14.7 | 0 | 75 | 59 | 47 |
| | N2···O2 3.0, (N2···C2 3.7) | −27.9 | | −1.7 to 0.8 | 58 | 59 | 177 |
| MeG(p)·C | N2···N3 3.1, N1···N4 3.1 [N2···O2 3.6, N1···N3 3.6, O6···N4 3.7] | −54.7 | 15.9 | −1.6 to 1.2 | 66 | 47 | 67 |
| | N2···N3 3.1, (N2···C2 3.6, N1···N4 4.1) [N2···O2 3.3, N1···N3 4.1, O6···N4 5.0] | −45.0 | 25.6 | 0 | 74 | 59 | 47 |
| | N1···N4 3.1, (N2···N4 3.1) | −41.2 | | −0.1 to 4.7 | 60 | 43 | 90 |
| MeG(d)·T | N2···O4 3.0, (N1···N3 3.4) | −37.5 | | −2.2 to 4.3 | 46 | 48 | 122 |
| | N2···O2 3.0, (N1···N3 3.5) | −33.7 | 6.3 | −1.1 to 5.6 | 59 | 54 | 75 |
| MeG(p)·T | N2···O2 3.0, N1···N3 3.1 | −44.8 | 0.0 | 0 | 53 | 55 | 72 |
| | N2···O2 3.0, N1···N3 3.1 | −44.8 | 0.0 | −3.2 to 0.7 | 54 | 53 | 74 |
| | N2···O4 3.0, (N2···C7 3.7) | −20.9 | | −6.1 to 1.9 | 101 | 22 | 67 |
| further minima when both constrained to be coplanar | | | | | | | |
| MeG(d)·C | N2···O2 3.0, (N2···C2 3.7) | −27.7 | | 0 | 54 | 66 | 168 |
| | N2···O2 3.0, (N2···C2 3.6) | −26.8 | | 0 | 68 | 47 | 159 |
| | N2···O2 3.1 | −19.3 | | 0 | 68 | 74 | 38 |
| MeG(p)·C | N2···O2 3.0 | −23.7 | | 0 | 55 | 68 | 167 |
| MeG(d)·T | N2···O2 3.0, (C7···O4 3.5) | −31.1 | | 0 | 62 | 69 | 49 |

The base pairs have a C1'-C1' distance of 10.9 Å, as in the WC G·C minimum. The other parameters characterising the base pair structure are defined in Fig. 1. Minima suitable for fitting within the DNA helix are underlined. Minima below −20 kJ mol⁻¹ have generally been excluded.

(a) Parentheses denote hydrogen bonds which are slightly elongated from van der Waals contact, or noteworthy van der Waals contacts which do not correspond to traditional hydrogen bonds. Distances corresponding to the hydrogen bonds in the Watson-Crick G·C structure are given in square brackets where relevant. (b) Loss in electrostatic stabilisation energy in going from the unconstrained minimum with similar hydrogen bonds (in Table 4). (c) The range of z values for all atoms, except the hydrogens bonded to a C1'. (d) The angle between the two glycosyl bonds, calculated from the scalar product of the two C1'-N bond vectors. When the angles between the glycosyl bonds and the C1'-C1' vector are suitable (30° < λa ≈ λb < 75°), then this angle will be approximately 180° for anti-parallel bonds, and less than 90° when the geometry has the required approximate pseudo-dyad.

Fig. 4 Comparison of certain electrostatic minima between proximal O6-methylguanine and cytosine (thick lines) and the Watson-Crick G·C structure (thin lines). (a) Unconstrained gas-phase minimum with U_estatic = −71 kJ mol⁻¹. (b) Minimum when constrained to have the G·C C1'-C1' separation and to be planar, with U_estatic = −45 kJ mol⁻¹.

unaltered by the constraint of the C1' positions, and will be at quite a favourable energy minimum within the DNA structure. There is also a corresponding non-planar constrained minimum with the same energy but a slightly longer O6···O4 separation (3.25 Å against 3.13 Å in the constrained planar structure), emphasising the destabilising effect of this contact.
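The loss in stabilisation on imposing the backbone constraint is simply the difference between the fixed-backbone and gas-phase energies. The short sketch below recomputes it for four O6-methylguanine pairs, using energies read from Tables 4 and 5; the labels and the function name `stabilisation_loss` are hypothetical and chosen only for this illustration.

```python
# (label, gas-phase U from Table 4, fixed-backbone U from Table 5), in kJ mol^-1
PAIRS = [
    ("MeG(p).C wobble, non-planar", -70.6, -54.7),
    ("MeG(p).C planar", -70.6, -45.0),
    ("MeG(d).C non-planar", -50.7, -36.3),
    ("MeG(p).T", -44.8, -44.8),
]


def stabilisation_loss(u_gas, u_fixed):
    """Energy given up on fixing the C1'-C1' separation (positive means a loss)."""
    return round(u_fixed - u_gas, 1)


for label, u_gas, u_fixed in PAIRS:
    print(f"{label}: loss = {stabilisation_loss(u_gas, u_fixed)} kJ/mol")
```

The proximal thymine pair is the only one of the four with no loss, consistent with its gas-phase minimum already having the Watson-Crick C1'-C1' separation.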

4. Discussion

Consideration of the electrostatic interactions alone between nucleic acid bases leads to the prediction of low-energy structures in good agreement with experiment and with more complete treatments of the intermolecular forces. Thus, a comparison of our calculated minima in the electrostatic energy for pairs of the normal bases with the SCF energy of the complexes calculated by Hobza and Sandorfy4 shows agreement on the main features: namely, the G·C (WC) structure is the most stable, then G·GI, C·CI and G·GIII are significantly more stable than the rest, and C·CII is not a stable structure. More detailed comparisons of the two methods are not appropriate, as the SCF calculations used different monomer geometries, with the bases protonated at N1/N9, whereas these calculations use methylated bases. The predicted geometries of the base pairs are sometimes different, most notably when the less constrained electrostatic calculations predicted non-planar structures. Moreover, the two theoretical methods differ in their approximations and weaknesses. Since the SCF calculations used a minimal basis set, whereas the electrostatic model was based on a split-valence plus polarisation description of the monomer charge distribution, the electrostatic component of the SCF calculations is less well estimated. However, allowing for these uncertainties, the comparison with the SCF calculations confirms that the electrostatic model is able to locate the most favourable structures for the base pairs in vacuo, and to estimate their relative stability.

When we use the experimentally determined ranges for key geometric parameters as a filter, we find that only the Watson-Crick G·C and A·T and the mismatch G·T and G·A base pairs can fit onto the helical backbone. Moreover, the fixed-backbone constrained minima led to the same conclusion: after the Watson-Crick complementary base pairs, the most probable alternative base pairings are a G·T structure with O6···N3 and N1···O2 hydrogen bonds and a G·A structure with N1···N7 and O6···N6 hydrogen bonds. The detailed structure of any likely mismatch pairs within DNA would probably be intermediate between the gas-phase and the fixed-backbone models, as it would be a compromise between the base-base interactions and the backbone steric constraints, although other factors such as stacking interactions and hydration would also play a role.

The predicted G·T and G·A wobble base pairs have been observed in DNA. The G·T wobble base pair, with the G·TII hydrogen bonding pattern, has been observed in A-, B- and Z-oligonucleotides.39 Crystal structures of both an A-type40 and a B-type41 double helix contain such G·T mismatches, corresponding to λ9 = 42°, 43°, 45° and λ1 = 67°, 70°, 72°, in remarkable agreement with our gas-phase values (λ9 = 42° and λ1 = 67°) and fixed-backbone model (λ9 = 41° and λ1 = 65°).
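The agreement quoted for the G·T mismatch can be put on a numerical footing with a mean absolute deviation between the predicted and observed glycosyl angles. The sketch below uses only the angles quoted in the preceding paragraph; labelling the guanine angle as `lambda9` and the thymine angle as `lambda1` follows the purine/pyrimidine definitions of Fig. 3 and is an assumption about the garbled subscripts, and the function name is hypothetical.

```python
from statistics import mean

# Observed glycosyl angles (degrees) for the three crystallographic G.T mismatches
observed = {"lambda9": [42, 43, 45], "lambda1": [67, 70, 72]}

# Predicted angles (degrees) from the two models
models = {
    "gas phase": {"lambda9": 42, "lambda1": 67},
    "fixed backbone": {"lambda9": 41, "lambda1": 65},
}


def mean_abs_deviation(predicted, measured):
    """Average absolute difference between one prediction and several observations."""
    return mean(abs(predicted - m) for m in measured)


for name, pred in models.items():
    devs = {k: round(mean_abs_deviation(pred[k], observed[k]), 1) for k in observed}
    print(name, devs)
```

Both models stay within a few degrees of the crystal values, with the gas-phase minimum slightly closer on both angles.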

G·A mismatches with the same hydrogen bonds as our predicted G·AIII structure have also been observed in the crystal structure of d(C-G-C-G-A-A-T-T-A-G-C-G) in a B-DNA helix.42 The two observed G·A base pairs have propeller twists of 17.5° and 11.8° and glycosyl bond angles of λ9G = 55.2°, λ9A = 40.8° and λ9G = 62.3°, λ9A = 39.5°, respectively. This is in reasonable agreement with our non-coplanar gas-phase minimum G·AIII (λ9G = 53°, λ9A = 38°) and the fixed-backbone non-coplanar (λ9G = 56°, λ9A = 40°) and coplanar (λ9G = 54°, λ9A = 45°) structures. There is experimental evidence for different conformations for the G·A mismatch43 which lie outside our criteria for fitting into a normal helix. Proton NMR44 and X-ray crystallography45 have found this mismatch with both G and A in the anti conformation and with N1···N1 and O6···N6 hydrogen bonds, as considered by Crick.46 This corresponds to the gas-phase minimum G·AI, which has the appropriate glycosidic bond symmetry but a long C1'-C1' distance (13.1 Å). The unusually large C1'-C1' distance has been noted experimentally.

It has also been proposed, based on evidence from combined NMR and modelling studies,47 that the G·AII conformation, which also has both glycosidic angles as anti, can exist within a B-like helix.

Many of the other electrostatic minimum structures are likely to be found when the steric constraints of the DNA backbone are relaxed. For example, a T·T wobble pair destabilises a DNA helix,48 as does a C·T mismatch at the centre of a duplex, though it behaves as a stable base pair at the terminus.49 This is consistent with the T·T and C·T pairs having gas-phase minima with short C1'-C1' distances, which are otherwise geometrically acceptable (T·TI and T·CII), and constrained minima that are not particularly stable. Many of the minima are observed in crystal structures of nucleic acid components and derivatives.6

Our electrostatic calculations on the base pairing of O6-methylated guanine show that the likely energy differences between an O6-methylguanine·cytosine and an O6-methylguanine·thymine base pair within DNA are not very large. Although the proximal conformer has stronger base pair interactions than the distal conformer, the latter is more stable in isolation, and these two factors will cancel to a large degree, so there is unlikely to be a clearly preferred conformer. Thus more information on the barrier to rotation about C6-O6 will be essential for more detailed theoretical studies of the effects of this modified base on DNA. Our conclusions differ from those of Nagata and Aida, who performed SCF STO-3G supermolecule calculations (without correction for basis set superposition error) on the MeG·C and MeG·T pairs at the Watson-Crick structure, and found that MeG·C was repulsive. Although the reliability of such calculations is poor, the main cause of the disagreement is likely to be that their search for other relative orientations was too limited and would not have detected the range of favourable structures found by our method.
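The cancellation between the stronger pairing of the proximal conformer and its higher intramolecular energy can be checked with simple bookkeeping. The sketch below adds the conformer energy difference quoted in Section 3.2 (42 kJ mol⁻¹ for STO-3G, 19 kJ mol⁻¹ for 3-21G) to the electrostatic pair energies of the MeG·C wobble minima from Tables 4 and 5. Treating the total as a plain sum of the two terms is an assumption made for illustration, and the names used are hypothetical.

```python
# Electrostatic energies (kJ mol^-1) of the MeG.C wobble minima, Tables 4 and 5
U_PAIR = {
    "gas phase": {"proximal": -70.6, "distal": -50.7},
    "fixed backbone": {"proximal": -54.7, "distal": -36.3},
}

# Intramolecular cost of the proximal conformer relative to distal (kJ mol^-1)
E_CONF = {"STO-3G": 42.0, "3-21G": 19.0}


def net_preference(u_prox, u_dist, e_conf):
    """Proximal minus distal total energy; a negative value favours proximal."""
    return round((u_prox + e_conf) - u_dist, 1)


for model, u in U_PAIR.items():
    for basis, e_conf in E_CONF.items():
        diff = net_preference(u["proximal"], u["distal"], e_conf)
        print(f"{model}, {basis}: {diff} kJ/mol")
```

With the 3-21G estimate the two conformers come out within about 1 kJ mol⁻¹ of each other in both models, whereas the STO-3G estimate favours the distal form by over 20 kJ mol⁻¹, which illustrates why the rotational energetics matter for the conclusion.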

This lack of an unambiguously preferred structure for O6-methylguanine base pairs explains the controversy in the literature over the effects of this modification, as it suggests that the exact geometry of the base pair will depend on its surroundings, including the neighbouring base pairs. A proximal O6-methylguanine·thymine base pair structure with N2H···O2 and N1···HN3 hydrogen bonds can clearly be at a potential minimum within the DNA helix, despite the proximity of O6 and O4. This structure has been observed within a dodecanucleotide crystal of B-DNA.16 The observed structure has slightly shorter contact distances than the normal van der Waals radii in our model will allow, but is otherwise essentially the same. Our calculations confirm that the mutagenic base pair would not distort the DNA backbone, as it is almost identical to the G·C pair structurally. However, even though the base pairing has two hydrogen bonds, it would be destabilising, because the base-base interaction is only about a third of that for the Watson-Crick G·C base pair.

The O6-methylguanine·cytosine interactions can produce a very stable planar wobble structure with two hydrogen bonds (N2H···N3 and N1···HN4), but this would strain the helical backbone. However, even if the backbone was effectively rigid, the pair could form a planar structure with the N2H···N3 hydrogen bond, or a twisted structure with both hydrogen bonds. These have comparable energies to the proximal O6-methylguanine·thymine minimum. Thus the O6-methylguanine·cytosine base pair is likely to be found in wobble structures with an N2H···N3 hydrogen bond, which are a compromise between our gas-phase and fixed-backbone minima (Fig. 4). This is consistent with the structures of two different O6-ethylguanine·cytosine base pairs found, by X-ray crystallography, in B-DNA dodecamers complexed to minor groove binding drugs. One base pair has the predicted wobble hydrogen bonds with an N2···N3 distance of 2.61 Å and an N1···N4 distance of 2.60 Å, in a non-planar structure with the ethyl group distal. The second structure has N2H in a bifurcated hydrogen bond to O2 (2.63 Å) and to N3 (2.90 Å), and N4H in a longer bifurcated hydrogen bond to O6 (2.97 Å) and N1 (3.16 Å), in a non-planar structure with the ethyl group proximal. This is in reasonable agreement with our model, considering that the hydrogen bonds in the structure (which has a resolution of ca. 2.0 Å) are significantly shorter than permitted by our model.

The predicted wobble structure for MeG·C corresponds to the structures postulated on the basis of NMR studies of a dodecanucleotide duplex18 and a related DNA fragment in aqueous solution,19 with the latter showing that the distal conformation was involved in the complex. Similar NMR studies on the MeG·T base pair19,20 also postulated a distal conformer with the N2H···O2 hydrogen bond, and the C7-Me group forcing a significant elongation of the N1···HN3 potential hydrogen bond, as in our coplanar constrained minimum.

Thus the NMR and X-ray studies mentioned above are consistent with our predictions for MeG·T and the conclusion that MeG·C will adopt a wobble structure, distorted by the backbone constraints, without a clear preference for the conformation of the O6-Me group. We do not predict the O6-alkylated guanine·cytosine base pair structure observed within Z-DNA.21 In this X-ray study, the electron density observed for the two MeG·C base pairs 'shows unequivocally that the bases are paired in the conventional Watson-Crick arrangement'. This result is surprising, since the loss of the proton on N1 on alkylation of the guanine would be expected to lead to a strongly repulsive interaction with the cytosine N3. Ginell et al.21 suggested that their observed structure implies protonation of the cytosine N3 or the occurrence of minor tautomers. They also suggest that a wobble base pairing may be tolerated in B-DNA, but would be inordinately destabilising for the Z conformation due to unfavourable stacking interactions. We calculate that their proposed MeG·C base pair structure has a surprisingly favourable electrostatic interaction (ca. −25 kJ mol⁻¹ for the proximal conformer in our optimum G·C conformation), although there could be some steric repulsion between N1 and N3 depending on the exact intra- and intermolecular geometries. Hence, if the steric constraints are more severe in Z-DNA than in the DNA structures that we have considered, then the observed structure is possible without invoking protonation or rare tautomers, though it would destabilise the helix.

In this paper, we have concentrated on structures rather than on energetics, as this is the proven strength of the electrostatic plus hard-sphere model. However, it is worth noting that our results do predict that O6-methylguanine forms stronger complexes with cytosine than with thymine, although these are both significantly weaker than the Watson-Crick G·C base pair, in agreement with the thermodynamic measurements on oligonucleotides.51
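The form of an electrostatic plus hard-sphere energy can be illustrated with a minimal sketch. This is not the distributed multipole model used in this work: it replaces the multipoles with simple point charges, and the function name, the site layout and any charges or radii supplied to it are hypothetical choices made only for illustration. It shows how a hard-sphere overlap excludes a geometry outright, while the remaining geometries are ranked by their electrostatic energy alone.

```python
import math

# Coulomb constant e^2 / (4 pi eps0) in kJ mol^-1 Angstrom e^-2.
COULOMB_KJ_MOL_ANGSTROM = 1389.35


def electrostatic_hard_sphere_energy(sites_a, sites_b):
    """Point-charge electrostatic energy (kJ/mol) between two rigid molecules.

    Each site is a tuple (x, y, z, charge, radius), with coordinates and
    radii in Angstrom and charges in units of e. This layout is an assumption
    of the sketch. Returns math.inf if any pair of hard spheres overlaps.
    """
    energy = 0.0
    for xa, ya, za, qa, ra in sites_a:
        for xb, yb, zb, qb, rb in sites_b:
            r = math.dist((xa, ya, za), (xb, yb, zb))
            # Hard-sphere repulsion: contacts closer than the sum of the
            # two radii are forbidden.
            if r < ra + rb:
                return math.inf
            # Coulomb interaction between the two point charges.
            energy += COULOMB_KJ_MOL_ANGSTROM * qa * qb / r
    return energy
```

In such a scheme a contact that is shorter than the sum of the hard-sphere radii is rejected however favourable its electrostatic energy, which is why an observed structure can have slightly shorter contacts than the model allows while otherwise agreeing with a predicted minimum.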

5. Conclusion

The analysis of the electrostatic hard-sphere minima in both the gas phase and under constraints is a useful method of finding likely structures for the base pairs and for determining which are likely to be found within DNA. These structures not only include those which are obviously stable because of multiple hydrogen bonds, but also those which are either less obvious or non-planar. Such calculations are necessary to interpret experimental results, as counting hydrogen bonds, and invoking rare tautomers and protonation to increase their numbers, is too simplistic. In the case of the normal base pairs, the possibilities have been so widely discussed that the identification of likely wobble pair structures seems rather obvious. However, it should be remembered that there were omissions in the papers that discussed the likely base pair structures on the basis of merely geometrical optimisation of the number of hydrogen bonds,3 and that this approach predicted unstable structures (C·CII).

The ability of the electrostatic model to predict when the other interactions present will significantly destabilise structures with multiple hydrogen bonds, such as C·CII, has provided useful insights into the effects of a modification of guanine on the DNA structure. An MeG·T base pairing structure with two hydrogen bonds, which is an excellent structural match for the Watson-Crick G·C pair, is sufficiently destabilised by the proximity of O6 and O4 that less obvious MeG·C base pair structures are energetically competitive. Thus theoretical calculations which are capable of giving a reasonable estimate of the binding energy and can produce minima in unexpected geometries are essential for predicting possible base pairing with unusual bases.

S.L.P. thanks the SERC for computer facilities, provided under GR/G11668. J.M.G. acknowledges support from the Leverhulme Trust and from the SERC under GR/H35170 (MRI) and GR/09016 (CSI) as well as the Wellcome Trust for a Research Leave Fellowship. F.L.C. and J.T. worked at UCL as ERASMUS students.

References

1 V. Fritsch and E. Westhof, J. Am. Chem. Soc., 1991, 113, 8271.
2 L. Cruzeiro-Hanson, P. F. Swann, L. Pearl and J. M. Goodfellow, Carcinogenesis, 1992, 13, 2067.
3 J. Donohue, Proc. Natl. Acad. Sci. USA, 1956, 42, 60; J. Donohue and K. N. Trueblood, J. Mol. Biol., 1960, 2, 363.
4 P. Hobza and C. Sandorfy, J. Am. Chem. Soc., 1987, 109, 1302.
5 R. Rein, Perspectives in Quantum Chemistry and Biochemistry, 1978, 2, 307.
6 V. I. Poltev and N. V. Shulyupina, J. Biomol. Struct. Dynam., 1986, 3, 739.
7 R. G. A. R. Maclagan, Aust. J. Chem., 1979, 32, 1635.
8 J. Langlet, P. Claverie, F. Caron and J. C. Boeuve, Int. J. Quantum Chem., 1981, 19, 299.
9 C. C. Wilson, Nucleic Acids Res., 1988, 16, 385.
10 A. D. Buckingham and P. W. Fowler, Can. J. Chem., 1985, 63, 2018.
11 G. J. B. Hurst, P. W. Fowler, A. J. Stone and A. D. Buckingham, Int. J. Quantum Chem., 1986, 29, 1223.
12 J. B. O. Mitchell and S. L. Price, J. Comput. Chem., 1990, 11, 1217.
13 J. B. O. Mitchell and S. L. Price, Chem. Phys. Lett., 1989, 154, 267.
14 S. L. Price and A. J. Stone, J. Chem. Phys., 1987, 86, 2859.
15 G. Mitra, G. T. Pauly, R. Kumar, G. K. Pei, S. H. Hughes, R. C. Moschel and M. Barbacid, Proc. Natl. Acad. Sci. USA, 1989, 86, 8650.
16 G. A. Leonard, J. Thomson, W. P. Watson and T. Brown, Proc. Natl. Acad. Sci. USA, 1990, 87, 9573.
17 M. Sriram, G. A. van der Marel, H. L. P. F. Roelen, J. H. van Boom and A. H. J. Wang, EMBO J., 1992, 11, 225.
18 D. J. Patel, L. Shapiro, S. A. Kozlowski, B. L. Gaffney and R. A. Jones, Biochemistry, 1986, 25, 1027.
19 B. F. Li, P. F. Swann, M. W. Kalnik, M. Kouchakdjian and D. J. Patel, in NMR Spectroscopy in Drug Research, ed. J. W. Jaroszewski, K. Schaumburg and H. Kofod, Munksgaard, Copenhagen, 1988, pp. 309-340.
20 D. J. Patel, L. Shapiro, S. A. Kozlowski, B. L. Gaffney and R. A. Jones, Biochemistry, 1986, 25, 1036.
21 S. L. Ginell, S. Kuzmich, R. A. Jones and H. M. Berman, Biochemistry, 1990, 29, 10461.
22 R. Taylor and O. Kennard, J. Am. Chem. Soc., 1982, 104, 3209.
23 Y. Yamagata, K. Kohda and K. Tomita, Nucleic Acids Res., 1988, 16, 9307.
24 CADPAC5: The Cambridge Analytic Derivatives Package Issue 5, Cambridge 1992, A suite of quantum chemistry programs developed by R. D. Amos with contributions from I. L. Alberts, J. S. Andrews, S. M. Colwell, N. C. Handy, D. Jayatilaka, P. J. Knowles, R. Kobayashi, N. Koga, K. E. Laidig, P. E. Maslen, C. W. Murray, J. E. Rice, J. Sanz, E. D. Simandiras, A. J. Stone and M.-D. Su.
25 P. C. Hariharan and J. A. Pople, Theor. Chim. Acta, 1973, 28, 213.
26 J. S. Binkley, J. A. Pople and W. J. Hehre, J. Am. Chem. Soc., 1980, 102, 939.
27 S. L. Price, J. S. Andrews, C. W. Murray and R. D. Amos, J. Am. Chem. Soc., 1992, 114, 8268.
28 A. J. Stone and M. Alderton, Mol. Phys., 1985, 56, 1047.
29 W. A. Sokalski, P. C. Hariharan and J. J. Kaufman, Int. J. Quantum Chem: Quantum Biol. Symp., 1987, 14, 111.
30 A. Pullman and D. Perahia, Theor. Chim. Acta, 1978, 48, 29.
31 R. Rein, Adv. Quantum Chem., 1973, 7, 335.
32 M. A. Spackman, J. Chem. Phys., 1986, 85, 6587.
33 W. A. Sokalski and S. F. Sneddon, J. Mol. Graphics, 1991, 9, 74.
34 A. J. Stone, ORIENT: a program for calculating electrostatic interactions between molecules, University of Cambridge, 1992.
35 C. C. Wilson, Nucleic Acids Res., 1987, 15, 8577.
36 F. C. Bernstein, T. F. Koetzle, G. J. B. Williams, G. F. Mayer, M. D. Brice, J. R. Rodgers, O. Kennard, T. Shimanouchi and M. Tasumi, J. Mol. Biol., 1977, 112, 535.
37 D. F. Bradley and H. A. Nash, Mol. Ass. Biol. Proc. Int. Symp. 1967, ed. B. Pullman, 1968, 137.
38 L. G. Pedersen, T. A. Darden, D. W. Deerfield, M. W. Anderson and D. G. Hoel, Carcinogenesis, 1988, 9, 1553.
39 See discussion within W. N. Hunter, T. Brown, N. N. Anand and O. Kennard, Nature (London), 1986, 320, 552.
40 G. Kneale, T. Brown and O. Kennard, J. Mol. Biol., 1985, 186, 805.
41 W. N. Hunter, T. Brown, G. Kneale, N. N. Anand, D. Rabinovich and O. Kennard, J. Biol. Chem., 1987, 262, 9962.
42 T. Brown, W. N. Hunter, G. Kneale and O. Kennard, Proc. Natl. Acad. Sci. USA, 1986, 83, 2402.
43 T. Brown, W. N. Hunter and G. A. Leonard, Chem. Br., 1993, June, 484.
44 L-S. Kan, S. Chandrasegaran, S. M. Pulford and P. S. Miller, Proc. Natl. Acad. Sci. USA, 1983, 80, 4263.
45 G. G. Prive, U. Heinemann, S. Chandrasegaran, L-S. Kan, M. L. Kopka and R. E. Dickerson, Science, 1987, 238, 498.
46 F. H. C. Crick, J. Mol. Biol., 1966, 19, 548.
47 Y. Li, G. Zon and W. D. Wilson, Proc. Natl. Acad. Sci. USA, 1991, 88, 26.
48 A. G. Cornelis, J. H. J. Haasnoot, J. F. den Hartog, M. de Rooij, J. H. van Boom and A. Cornelis, Nature (London), 1979, 281, 235.
49 S. A. M. Vanhommerig, M. H. P. van Genderen and H. M. Buck, Biopolymers, 1991, 31, 1087.
50 C. Nagata and M. Aida, J. Mol. Struct., 1988, 179, 451.
51 B. L. Gaffney and R. A. Jones, Biochemistry, 1989, 28, 5881.

Paper 3/02258G; Received 20th April, 1993
