Nonplanar peptide bonds in proteins are common and conserved



Nonplanar peptide bonds in proteins are common and conserved but not biased toward active sites

Donald S. Berkholz a,b, Camden M. Driggers a, Maxim V. Shapovalov c, Roland L. Dunbrack, Jr. c, and P. Andrew Karplus a,1

a Department of Biochemistry and Biophysics, Oregon State University, 2011 Agriculture and Life Sciences Building, Corvallis, OR 97331; b Departments of Physiology and Biomedical Engineering and Pediatric and Adolescent Medicine, Mayo Clinic, 200 First Street SW, Rochester, MN 55905; and c Institute for Cancer Research, Fox Chase Cancer Center, 333 Cottman Avenue, Philadelphia, PA 19111

Edited by Axel T. Brunger, Stanford University, Stanford, CA, and approved October 19, 2011 (received for review May 4, 2011)

The planarity of peptide bonds is an assumption that underlies decades of theoretical modeling of proteins. Peptide bonds strongly deviating from planarity are considered very rare features of protein structure that occur for functional reasons. Here, empirical analyses of atomic-resolution protein structures reveal that trans peptide groups can vary by more than 25° from planarity and that the true extent of nonplanarity is underestimated even in 1.2 Å resolution structures. Analyses as a function of the φ,ψ-backbone dihedral angles show that the expected value deviates by ±8° from planar as a systematic function of conformation, but that the large majority of variation in planarity depends on tertiary effects. Furthermore, we show that those peptide bonds in proteins that are most nonplanar, deviating by over 20° from planarity, are not strongly associated with active sites. Instead, highly nonplanar peptides are simply integral components of protein structure related to local and tertiary structural features that tend to be conserved among homologs. To account for the systematic φ,ψ-dependent component of nonplanarity, we present a conformation-dependent library that can be used in crystallographic refinement and predictive protein modeling.

omega torsion angle | peptide planarity | protein geometry | kernel density regression | strain

The prediction of the dominant forms of secondary structure in proteins, α-helices and β-strands, was enabled by the simplifying assumption that the peptide bond was planar, consistent with its expected partial double-bond character and evidence from small-molecule crystal structures (1–3). Pauling et al. were aware that deformations from planarity associated with an energetic cost could occur, but the expectation was that the minimum-energy conformation was always planar (2, 4). In proteins, the ω torsion angle measures peptide planarity, with ω = 180° and ω = 0° representing planar trans and cis peptides, respectively. In an early large-scale empirical study of peptide planarity, MacArthur and Thornton (5) found that in proteins determined at better than 2 Å resolution and in small-molecule peptides, the ω-distributions were Gaussian-like with averages of 179.6° (σ = 4.7°) and 179.7° (σ = 5.9°), respectively. These authors further proposed that the smaller spread seen in proteins was an artifact due to the planarity restraints used in crystallographic refinements. This study and that of Karplus (6) also showed that the average ω-value varies as a function of the conformation of the backbone torsion angles φ and ψ, with MacArthur and Thornton suggesting that the direction of nonplanarity was related to the handedness of the φ,ψ-associated chain twist (5).
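As an illustration of the definition above, ω can be computed as the dihedral angle defined by the atoms Cα(i), C(i), N(i+1), and Cα(i+1). The following is a minimal Python sketch and not code from this study; the function names and the toy coordinates are hypothetical.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees, in [-180, 180]) for four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n2 = _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))

def deviation_from_trans(omega_deg):
    """Signed deviation (omega - 180 deg), wrapped into [-180, 180)."""
    return (omega_deg % 360.0) - 180.0

# Toy coordinates (hypothetical, not from a real structure): four coplanar
# points in a trans arrangement, standing in for CA(i), C(i), N(i+1), CA(i+1).
ca_i, c_i = (-1.0, 1.0, 0.0), (0.0, 0.0, 0.0)
n_next, ca_next = (1.0, 0.0, 0.0), (2.0, -1.0, 0.0)
omega = dihedral_deg(ca_i, c_i, n_next, ca_next)
print(abs(deviation_from_trans(omega)))  # planar trans peptide: deviation of 0
```

Expressing ω as a wrapped deviation from 180° avoids the discontinuity at ±180° when values on either side of planar trans are averaged or compared.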

As more structures have been analyzed at ultrahigh (≤1.2 Å) resolution (7), larger deviations from planarity have emerged (8–13). It has also been proposed that highly nonplanar residues are biased toward active sites (14), and a number of descriptions of protein structures have emphasized nonplanar peptide bonds in the active site (14–17). The question of conformation dependence was revisited by Esposito et al. (8) using structures refined at better than 1.2 Å resolution, and a correlation with the handedness of the chain twist was not found. Instead, peptide planarity was seen to depend most strongly on the ψ torsion angle of the residue preceding the peptide bond in question (8), with additional influence caused by participation in an α-helix or a β-strand. The authors proposed that accounting for these variations by conformation-dependent crystallographic restraints would be beneficial (8).

In a related effort, we recently created the Protein Geometry Database (PGD; 18) and used it to document how protein backbone bond lengths and angles vary as a function of φ and ψ and to produce a backbone conformation-dependent library (CDL) for use in protein modeling (19). We further showed that using this CDL to move beyond the paradigm of a single, context-independent ideal geometry does greatly improve the behavior of crystallographic refinements (20).

Here, we extend this CDL to include the nonplanarity of the peptide bond. In the course of the analysis, we gain additional insight into aspects of peptide nonplanarity that allow it to be viewed as a feature that is widely seen in folded proteins and heavily influenced by nonlocal interactions.

Results and Discussion

The Resolution Dependence of Observed Deviations from Planarity. Consistent with earlier studies, for nonredundant structures determined at 1.0 Å resolution or better (see Materials and Methods), the distribution of ω-values for trans peptides has σ = 6.3°, much broader than the σ = 4.8° distribution seen for structures determined at the lesser but still quite high resolution of 1.7 Å (Fig. 1A). This sizable increase in the standard deviation brings the 1 Å resolution structures to a spread on par with the deviation from planarity of σ = 5.9° seen in linear small-molecule peptides (5).

What has not yet been documented is at which resolution the artifact due to planarity restraints used in refinement ceases to be a problem. Compared to the standard deviation of the distribution, a more sensitive measure of the effects of restraints is the number of highly deviating residues. These residues incur the largest restraint penalties: for instance, a 20°-outlier experiences a fourfold greater restraint penalty pushing it back toward a planar conformation than does a 10°-outlier, assuming a harmonic restraint such as is used in protein crystallography (23). Indeed, the fractions of peptides deviating by >10° or >20° from planarity are about two- and threefold higher, respectively, for structures at 1 Å resolution compared with those at 1.6 Å resolution (Fig. 1B), and the electron density at the highest resolutions provides unambiguous evidence for the reality and the level of nonplanarity of such extreme outliers (Fig. 1 C and D).
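The fourfold figure follows directly from the quadratic form of a harmonic restraint. A minimal sketch, assuming a penalty of the form (Δω/σ)² with a hypothetical restraint width σ (the actual weights used by refinement programs are not specified here):

```python
def harmonic_penalty(deviation_deg, sigma_deg=5.0):
    """Harmonic restraint penalty for a deviation from planarity.

    sigma_deg is a hypothetical restraint width chosen only for illustration.
    """
    return (deviation_deg / sigma_deg) ** 2

ratio = harmonic_penalty(20.0) / harmonic_penalty(10.0)
print(ratio)  # 4.0: doubling the deviation quadruples the penalty
```

The ratio is independent of the value chosen for σ, which is why the statement holds for any harmonic planarity restraint.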

Author contributions: D.S.B. and P.A.K. designed research; D.S.B., C.M.D., and M.V.S. performed research; R.L.D. and M.V.S. developed kernel-regression methods; D.S.B. and P.A.K. analyzed data; and D.S.B. and P.A.K. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

1 To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1107115108/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1107115108 PNAS ∣ January 10, 2012 ∣ vol. 109 ∣ no. 2 ∣ 449–453

BIOPHYSICS AND COMPUTATIONAL BIOLOGY

Assuming that the proteins in each resolution bin have similar behavior in terms of nonplanarity, a surprise finding is that even at the normally used ultrahigh resolution threshold of 1.2 Å, crystal structures still underestimate by ca. 30% and 100% the numbers of peptides that have deviations from planarity of >10° and >20°, respectively. It is not until ∼0.9–1.0 Å resolution that the curves level out. At 0.9 Å resolution the number of observations (at ∼6,900 residues) is still large enough to be considered broadly representative, so we suspect the increase in outlier observation between 1.0 Å and 0.9 Å is real. The fewer observations at 0.8 Å (∼1,900 residues) and especially 0.7 Å (∼500 residues) lead us not to propose a more stringent resolution cutoff associated with the reliable determination of extreme outlier ω-values.

The Local Conformation Dependence of Observed Deviations from Planarity. To analyze the dependence of peptide planarity on backbone φ,ψ-angles, we used a dataset of 28,917 well-defined 3-residue segments from diverse protein chains determined at 1.0 Å resolution or better (see Materials and Methods) and carried out separate statistical analyses for eight groups of residues (Gly, Pro, Ile/Val, other “general” residues, and each of these groups preceding Pro) as well as control calculations that grouped all residues together and all prePro residues together (19). Although even 1 Å resolution structures may not have fully accurate ω-values for extreme outliers, we have chosen this 1 Å resolution cutoff as a trade-off that provides sufficient numbers of observations to carry out a φ,ψ-dependent analysis while at the same time providing sufficiently accurate ω-values for extreme outliers.

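Because a peptide bond resides halfway between two residues, we assessed the φ,ψ-dependence of the planarity of both the peptide bonds before and after the central residue (residue 0):

$$\mathrm{Xaa}_{-1} \;\overset{\omega_{\mathrm{before}}}{-}\; \mathrm{Xaa}_{0} \;\overset{\omega_{\mathrm{after}}}{-}\; \mathrm{Xaa}_{+1}.$$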

With this nomenclature, ωbefore is the omega-angle traditionally assigned as belonging to residue 0. For observing conformation-dependent trends, we used kernel density regression with periodic von Mises functions as a method for achieving smooth local regressions as a function of φ and ψ (24). As seen in Fig. 2 for general residues, the variation of ωbefore is largely φ-dependent with a pattern of vertical stripes, and for ωafter the dependence is mostly on ψ, resulting in horizontal stripes. For both peptide units, the conformation-dependent averages vary over ∼16–17°, yet the standard deviation within each conformation remains near 6°, close to the 6.3° standard deviation of the overall distribution. For other residue types (i.e., Ile/Val, Gly, Pro, prePro) the variations as a function of conformation show similar trends yet include distinct features (see Figs. S1, S2, S3, and S4).
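The idea of a smooth, periodic local regression can be sketched with a generic Nadaraya-Watson estimator that uses a product of von Mises kernels in φ and ψ. This is an illustrative assumption about the general technique, not the implementation described in SI Methods; the function name, the concentration parameter `kappa`, and the sample observations are all hypothetical.

```python
import math

def von_mises_regression(phi_q, psi_q, data, kappa=50.0):
    """Kernel-weighted mean of omega deviations at a query (phi, psi).

    data  : iterable of (phi_deg, psi_deg, omega_minus_180_deg) tuples
    kappa : von Mises concentration (hypothetical value); larger = narrower
    """
    num = 0.0
    den = 0.0
    for phi, psi, dev in data:
        # Product of two von Mises kernels; normalization constants cancel
        # in the ratio. Subtracting 2 keeps the exponent <= 0.
        w = math.exp(kappa * (math.cos(math.radians(phi_q - phi))
                              + math.cos(math.radians(psi_q - psi)) - 2.0))
        num += w * dev
        den += w
    return num / den

# Hypothetical observations, for illustration only.
observations = [(-60.0, -45.0, 0.0), (-65.0, -40.0, 2.0), (-120.0, 150.0, -8.0)]
print(von_mises_regression(-62.0, -43.0, observations))
```

Because the kernel depends only on cosines of angle differences, the estimate is automatically periodic in φ and ψ and needs no special handling at the ±180° edges of the Ramachandran plot.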

[Fig. 1 image, panels A–D. Panel A: histogram of observations vs. ω (°) over 150–210° for the 0–1.0 Å and 1.66–1.69 Å resolution ranges. Panel B: % residues with |ω − 180°| > 10° and % residues with |ω − 180°| > 20° vs. resolution (Å).]

Fig. 1. Observed nonplanarity in peptide bonds increases at atomic resolution. (A) Histogram of the distribution of ω angles for two resolution ranges (≤1.00 Å and 1.66–1.69 Å). Observation numbers and means of the two histograms are 32,549/179.1° and 17,001/179.3°, respectively. (B) The percent of general peptides that are modeled as highly nonplanar (≥10° and ≥20° as noted in figure) are plotted as a function of resolution in 0.1 Å resolution slices. (C, D) Two well-defined highly nonplanar peptide bonds (one rotated in each direction) located outside of protein active sites. Shown are the peptide bonds between (C) residues Ile102-Asp103 from a carboxylic esterase (PDB code 1qlw) (21) with (ω − 180°) = −26° (electron density at 6.0ρrms), and (D) residues Asp105-Asn106 from a β-glycoside hydrolase (PDB code 7a3h) (22) with (ω − 180°) = 23° (electron density at 4.6ρrms). In a planar peptide bond, all five atoms would lie in the plane shown in gray. Searches were done with the PGD (18) for dipeptides, using a 90% sequence-identity threshold and otherwise default search parameters.

Fig. 2. Conformation-dependent variation in the planarity of peptide bonds for general residues. Ramachandran plots of the averages (A) and standard deviations (B) are shown for ωbefore and ωafter as a function of the φ,ψ-angles of residue 0, and for ωafter as a function of the ψ of residue 0 and φ of residue +1 (i.e., ωbetween; right boxes). Within each plot, colors indicate ω values ranging from the global minimum (blue) to the global maximum (red) as calculated using kernel density regression (see SI Methods). The global minimum and maximum are provided in each plot. With ∼90% of the data in bins having ≥36 observations (N), the standard errors of the means (s/√N) are below 1° for the large majority of residues.


Given these patterns of dependence, the peptide planarity varies mostly with ψ of the preceding residue and φ of the following residue. Focusing on the peptide unit following the central residue (i.e., ωafter), the dependence of its planarity on these two torsion angles can be visualized with a φ+1,ψ0 plot. Although this plot (called ωbetween because the analyzed peptide bond is between the two torsion angles being varied) shows a smaller total spread of only 13.5°, it appears that the extreme values are more centrally located in populated regions. Also, the standard deviations, while similar, are a few tenths of a degree smaller throughout. This plot also shows that the ψ0-dependence appears to dominate over the φ+1-dependence (i.e., the main variations occur with ψ0 so plots have horizontal stripes of relatively constant ω), so for generating a local CDL we expect that using either ωafter or ωbetween will lead to the highest predictive power; ωbetween will likely have somewhat higher predictive power, because it has both dihedrals adjacent to the peptide in question.

Despite the large standard deviations, histograms of the ωbetween distributions for selected regions with high and low averages emphasize their distinct natures (Fig. 3A). Also rather striking is that the distribution of the φ,ψ-dependent averages shows distinct maxima ∼4° to either side of 180° in addition to a main peak near 179° (Fig. 3B). To assess the impact of secondary structure on nonplanarity, we carried out separate analyses of ωbetween for residues adopting α-region φ,ψ-angles but not residing in helices and for residues adopting β-region φ,ψ-angles but not in β-strands (Fig. 3A). For the non-α-helical subpopulation, the average ω-value shifted from 180° to 183° and the σ rose from 2.5° to 3.9°. For the non-β-strand subpopulation, the average ω-value shifted from 172° to 176° along with little change in the spread of the distribution, from a σ of 6.9° to 6.7°. Thus secondary structure formation causes a systematic ∼3°–4° adjustment in the expected ω-values (in one case closer to planar and in the other case away from planar) that modulates the ∼15° range correlated with variations in φ and ψ.

The observation of high (∼6°) standard deviations for the individual φ,ψ-bins contrasts strongly with the behavior of backbone bond angles, for which the standard deviations of the conformation-dependent distributions (at ∼1.0°–1.5°) were about half of the standard deviations seen for the population as a whole (19). This distinct behavior of ω, with values within each φ,ψ-bin spanning ∼25° (which is ±2σ with σ ≈ 6°), implies that longer-range interactions are playing a dominant role in influencing individual ω-values. This implication is consistent with a quantum mechanics study of peptide planarity in which calculated φ,ψ-associated deviations did not match closely with the ω-values in the small protein crambin (25). In contrast, the authors reported that quantum mechanics calculations done for each residue in the context of the whole protein produced much better agreement. In this light, the near 3° standard deviation of residues in α-helices can be explained by their highly consistent longer-range context even compared with β-strands. Interestingly, the ∼3° standard deviation seen for α-helices may be limited by coordinate accuracy, as it roughly matches the estimated uncertainty of ω-values in the 1.2 Å resolution crystal structure of ribonuclease (13) and the agreement between noncrystallographic symmetry related extreme nonplanar peptides in this study (see Table S1).

Fig. 3. Properties and implications of conformation-dependent ω deviations. (A) Observed deviations from planarity (ωbetween − 180°) are shown for observations from selected 10° × 10° bins in three regions of the bordering torsion angles [ψ0/φ+1]: α-helical residues in the region [−45 ± 5/−65 ± 5]; β-strand residues in the region [+155 ± 5/−115 ± 5]; and all residues in the region [−45 ± 5/−95 ± 5]. In addition, distributions for residues in the first two bins but not in α-helices or β-strands, respectively, are shown. The distributions, based on structures at ≤1.2 Å resolution, are normalized to have the same area and smoothed using Gaussian kernel-density estimates with a 1.5° bandwidth. (B) The distribution of median values for peptide nonplanarity (ωbetween − 180°) seen in the 10° × 10° φ,ψ bins. Distributions are treated as in A, with a 0.5° bandwidth. (C) The predictive power of a CDL-estimated ωbetween (ωCDL), plotted as the observed deviation from planarity for ωbetween (ωexptl − 180°) vs. ωCDL-predicted nonplanarity (ωCDL − 180°). In contrast, fixed 180° predictions would collapse all data to x = 0. Plotted are observations in the ωexptl range shown as well as the best-fit linear regression (black line and equation), which has a standard uncertainty in the slope of 0.01. The coefficient of determination indicates that the CDL accounts for ∼20–25% of nonplanarity. The slope of >1.0 results from extreme deviations in less-populated regions being damped by the kernel density estimate fitting; this damping is intended to create a better predictive model. As an estimate of the gain in predictive accuracy, comparing use of the CDL vs. a fixed prediction of 180°, the ω rmsd of the models from the reference is 5.6° vs. 6.3°. (D) Conceptual illustration of how a shift in the minimum of a harmonic energy well for nonplanarity with no change in its width would make large nonplanarities accessible at a lower computed strain energy. The true nature of the potential functions in relevant environments is unknown, but for the illustration we use a generic energy form previously suggested [Energy = A sin² ω, with A = 30 kcal/mol (2, 4)], and show minimum-energy ω-values shifted ±10° (red, blue) from 180° (gray). This conceptual illustration shows how, all other things being equal, the 10°-shifted potentials enable a nonplanarity of 30° to be reached at a 4 kcal/mol lower computed energetic cost relative to the minimal energy in the relevant environment.

Given the strong dependence of planarity on tertiary factors, the first-generation ω-CDL we generate here will not capture the full diversity of ω-values in proteins. Nevertheless, such a CDL is still a valuable step forward compared to a universal target value of 180°. To decide which parameters to use in this first-generation ω-CDL, a set of trial CDLs was generated and tested for predictive power (Table S2). As expected, both ωafter and ωbetween strongly outperformed ωbefore. For CDLs using ωafter or ωbetween, and with or without residue classes, the performance differences are smaller, but overall the CDL based on ωbetween and using classes of residue types performed best (Table S2).

Fig. 3C illustrates the systematically improved agreement of this ωbetween-CDL with the observed ω-values in protein structures. The slope near 1 shows that the CDL, as expected based on how it was developed, is a good match to the averages of the observed values. However, the large spread in ωexptl at each ωCDL value indicates the dominant impact of tertiary factors. The coefficient of determination of ∼0.20–0.25 implies that the local conformational dependence accounts for about one-quarter of the total variation. Also, the tendency of the most extreme deviations to occur in the same direction as the local effects supports the further insight that the conformation-dependent shifts in the expected value of ω enable the larger deviations to occur at a much lower computed energetic cost (Fig. 3D).
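The quoted coefficient of determination can be cross-checked roughly against the two rmsd values given in the Fig. 3 legend (5.6° with the CDL vs. 6.3° with a fixed 180° target). The sketch below assumes that the fixed-target rmsd approximates the total spread, so it is a consistency check rather than a reproduction of the regression fit.

```python
rmsd_fixed = 6.3  # degrees, predicting 180 for every peptide (Fig. 3 legend)
rmsd_cdl = 5.6    # degrees, predicting the CDL value (Fig. 3 legend)

# Fraction of the squared deviation removed by the CDL; a rough stand-in
# for the coefficient of determination, not the regression fit itself.
explained = 1.0 - (rmsd_cdl / rmsd_fixed) ** 2
print(round(explained, 2))  # 0.21
```

The result falls within the ∼0.20–0.25 range, consistent with local conformation accounting for roughly one-quarter of the variation.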

Extreme Deviations from Planarity Tend to Be Conserved but Do Not Favor Functional Sites. To investigate the conservation and functional significance of the most extreme examples of peptide nonplanarity, we searched the PGD (18) for peptides ≥20° from planar using a slightly less stringent resolution criterion of ≤1.2 Å. Manual inspection of the electron-density evidence for each of the occurrences yielded 116 examples of proven reliability (Table S1). We assessed evolutionary conservation by finding the subset of these proteins for which a homolog was also known at ≤1.2 Å resolution.

This search yielded homologs (having ∼25–50% sequence identity) for eight proteins (from five protein families) that included 16 of the 116 highly deviating peptides. For 15 of the 16 cases, the local backbone conformation is conserved and the equivalent peptide in the homolog is strongly nonplanar in the same direction: greater than 9° in every case, with a median value of 16° (Table S3). For seven of these, the high deviation from planarity is maintained despite mutation of the residue. For one of the 16 cases [PDB code 1o5x:Phe150 (26–28)], the local backbone conformation in the homolog changed, and the nonplanarity was not conserved.

Viewing the distribution of these ω-outliers in the protein structures, we were surprised that the large majority of them, 13 of 16, were not associated with the protein’s active site (Fig. 4). To carry out a more general assessment of any correlation between the most extreme nonplanar peptides and functional sites in proteins, we used the Sequence Annotated by Structure (SAS) resource (31). Automated searches for all 116 ω-outliers showed no significant enrichment (at p ≤ 0.05) at functional sites compared to a control set of randomly chosen residues (Table S4).

Interestingly, a consideration of the secondary structural context of the ω-outliers reinforces the idea that secondary structure is not a strong determinant of peptide nonplanarity. Considering the central tripeptide residue (i.e., residue 0) of each of the reliable ω-outliers, all secondary structure types are represented: 65 are in β-structure, 12 in α-/3₁₀-helices, 14 in H-bonded turns, 11 in non-H-bonded bends, and 14 have no defined secondary structure (Table S1). Notably, all five secondary structure types include ω-outliers on both sides of 180°, showing that within a given secondary structure context, tertiary factors can cause ω to vary over 40°.

Outlook. In this work, we have shown that peptide nonplanarity is a common, even mundane, feature of proteins that is distributed throughout their structures, and it is not in general a marker for functional sites. The perceived association with active sites appears to be due to a bias in what has been noticed rather than reflecting what exists. Indeed, the overwhelming majority of the extreme outliers we studied [including those in Fig. 1 C and D (21, 22)] were not mentioned in the original structure reports.

We also show that using current refinement methodologies, better than 1 Å resolution data are required to accurately model the most extreme outliers and that based on such structures, a generic protein will have on the order of 10–15% of general residues deviating ≥10° from planarity with occasional residues deviating over 30° from planarity. When backbone path is conserved, such extreme ω-deviations also tend to be conserved. One factor that makes such extreme deviations more energetically accessible is the φ,ψ-dependent shift in the thermodynamically most stable ω-value (Fig. 3D).
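The size of this effect can be checked numerically with the generic energy form quoted in the Fig. 3D legend, Energy = A sin² ω with A = 30 kcal/mol. The sketch below assumes that a shifted well can be written as A sin²(ω − ωmin); that shifted form and the function name are illustrative assumptions, not a potential used in refinement.

```python
import math

A = 30.0  # kcal/mol, amplitude quoted in the Fig. 3D legend

def strain_energy(omega_deg, omega_min_deg=180.0):
    """Energy = A * sin^2(omega - omega_min), in kcal/mol.

    Shifting the well by replacing omega with (omega - omega_min) is an
    assumption made here to mimic the shifted curves in Fig. 3D.
    """
    return A * math.sin(math.radians(omega_deg - omega_min_deg)) ** 2

omega = 210.0  # a peptide 30 degrees away from planar trans
unshifted = strain_energy(omega, 180.0)  # minimum at 180
shifted = strain_energy(omega, 190.0)    # minimum shifted by +10
print(round(unshifted, 1), round(shifted, 1), round(unshifted - shifted, 1))
# 7.5 3.5 4.0
```

Under these assumptions a 30° nonplanarity costs about 7.5 kcal/mol in the unshifted well but only about 3.5 kcal/mol when the minimum is shifted by 10° in the same direction, reproducing the ∼4 kcal/mol difference stated in the legend.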

We have documented these φ,ψ-dependent shifts in a first-generation ω-CDL. As was seen for a backbone bond length and angle CDL (19), the implementation of this CDL should help with the accuracy of protein modeling, even though in this case the local effects only capture a modest portion of the variations in planarity. As specific longer-range effects that influence peptide planarity are discovered and the effects of secondary structure are more fully worked out, these can be incorporated into future more general “context-dependent” restraint libraries.

Fig. 4. Highly nonplanar residues are not dominantly present in active sites. For five protein families having extreme ω-outliers and at least two divergent members analyzed at atomic resolution (Table S1), a backbone ribbon is shown with ≥20° ω-outlier residues (red sticks) labeled and the active site region identified by a bound ligand (cyan sticks). (A) Penicillinopepsin at 0.95 Å resolution (PDB code 1bxo) (29). (B) Triose phosphate isomerase at 1.10 Å resolution (PDB code 1n55) (27). (C) Cellulase 6A at 1.11 Å resolution (PDB code 1oc7) (30). (D) Nitrophorin at 1.10 Å resolution (PDB code 1pm1). (E) β-glucosidase at 0.99 Å resolution (PDB code 1ug6). A stereoview of each of these molecules that in addition has the backbone ribbon colored by ω from 160° (red) to 200° (blue) is provided as Fig. S5. Those images provide additional visualization of the lack of correlation of ω-variations and active sites.

Finally, the prevalence of widespread and substantial deviations from planarity in proteins supports the view that the exquisite packing of folded proteins is not as ideal as it appears to the eye. Instead, folded protein structures are filled with hidden strain (6) and are a dynamic ensemble of many similar-energy structures that are “minimally frustrated,” but nevertheless frustrated (32, 33).

Materials and Methods

Quantifying φ,ψ-Dependent Variations in Peptide Planarity. The φ,ψ-dependent variations in ω were derived in the same way as were the φ,ψ-dependent variations in bond lengths and angles by Berkholz, et al. (19). Briefly, a PGD (18) search of structures determined at ≤1.0 Å resolution with a maximum sequence identity of 25% as determined by the PISCES (34) 06-18-2011 dataset resulted in 28,917 well ordered three-residue segments (from 204 protein chains) with average main-chain, side-chain, and Cγ B-factors below 25 Å². The systematic ω-variations are represented using a smoothing technique called kernel density regression (SI Methods), which lacks the artifacts caused by binning.
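The idea of smoothing ω over φ,ψ without bins can be illustrated with a short sketch. It is a plain Nadaraya-Watson kernel regression with a fixed-concentration von Mises kernel, chosen here only because φ and ψ are periodic; it is not the procedure of SI Methods, which may differ in kernel choice and bandwidth selection. The function names, the `kappa` value, and the toy samples are assumptions made for illustration.

```python
import math


def omega_deviation(omega_deg):
    """Signed deviation of omega from planar trans (180 deg), wrapped to (-180, 180]."""
    d = (omega_deg - 180.0) % 360.0
    return d - 360.0 if d > 180.0 else d


def von_mises_weight(phi, psi, phi0, psi0, kappa):
    """Periodic kernel weight; equals 1 when (phi, psi) coincides with (phi0, psi0)."""
    a = math.radians(phi - phi0)
    b = math.radians(psi - psi0)
    return math.exp(kappa * (math.cos(a) + math.cos(b) - 2.0))


def kernel_regress_omega(samples, phi0, psi0, kappa=50.0):
    """Kernel-weighted mean omega at (phi0, psi0).

    samples: iterable of (phi, psi, omega) in degrees (hypothetical input format).
    kappa:   kernel concentration; larger values mean less smoothing (assumed value).
    """
    num = 0.0
    den = 0.0
    for phi, psi, omega in samples:
        w = von_mises_weight(phi, psi, phi0, psi0, kappa)
        num += w * omega_deviation(omega)
        den += w
    if den == 0.0:
        raise ValueError("no samples contribute at this (phi, psi)")
    return 180.0 + num / den


if __name__ == "__main__":
    toy = [(-60.0, -45.0, 176.0), (-60.0, -45.0, 178.0), (120.0, 130.0, 190.0)]
    print(kernel_regress_omega(toy, -60.0, -45.0))
```

Averaging the deviation from 180° rather than ω itself avoids the wraparound at ±180°, and the cosine-based kernel treats φ = 179° and φ = −179° as neighbors, which a fixed grid of bins does not do at its edges.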

Creation and Analysis of a Set of Extreme ω Outliers. The set of extreme ω-outliers was created by a PGD (18) search similar to that above but using a ≤1.2 Å resolution for three-residue segments with ωafter ≥ 20° from planarity (performed in July 2009). For each of the 66 proteins containing an ω-outlier, a BLASTP (35) search of the Protein Data Bank (SI Methods) was used to identify all homologs with structures determined at 1.2 Å resolution or better.
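As an illustration of the ≥20° criterion, the following sketch computes ω from the four atoms that define it (Cα and C of one residue, N and Cα of the next) and flags trans peptides that deviate from planarity by at least a threshold. It is a generic dihedral calculation, not the PGD search itself; the function names and the example coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees for four points; for omega pass CA(i), C(i), N(i+1), CA(i+1)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def planarity_deviation(omega_deg):
    """Absolute deviation of omega from planar trans (180 deg), in degrees."""
    d = (omega_deg - 180.0) % 360.0
    if d > 180.0:
        d -= 360.0
    return abs(d)


def is_extreme_outlier(omega_deg, threshold=20.0):
    """True when a trans peptide deviates from planarity by at least the threshold."""
    return planarity_deviation(omega_deg) >= threshold


if __name__ == "__main__":
    t = math.radians(25.0)  # twist the last atom 25 degrees out of the plane
    omega = dihedral((0.0, 1.0, 0.0), (0.0, 0.0, 0.0),
                     (1.0, 0.0, 0.0), (1.0, -math.cos(t), math.sin(t)))
    print(omega, is_extreme_outlier(omega))
```

Note that this sketch measures deviation from the trans value only; cis peptides (ω near 0°) would need to be handled separately.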

Automated searches of the SAS (31) server were carried out using the WSsas interface (36) (SI Methods). For each homolog, the two residues bordering the ω-outlier (i.e., positions “0” and “+1”) were searched for all functional annotations. As a control, equivalent searches were carried out on five randomly chosen peptides in the same protein chain.
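The control design can be sketched as follows. This is only an illustration of drawing five random peptides from the same chain and scoring whether either bordering residue carries an annotation; it does not query the SAS server, and the exclusion of the outlier position from the control pool, the index convention, and all names are assumptions.

```python
import random


def pick_control_peptides(chain_length, outlier_index, n_controls=5, seed=None):
    """Pick distinct control peptides from one chain, excluding the outlier peptide.

    Peptide i joins residues i and i+1 (0-based), so a chain of L residues
    has peptides 0..L-2. Excluding the outlier itself is an assumption.
    """
    candidates = [i for i in range(chain_length - 1) if i != outlier_index]
    if len(candidates) < n_controls:
        raise ValueError("chain too short for the requested number of controls")
    rng = random.Random(seed)
    return sorted(rng.sample(candidates, n_controls))


def annotated_fraction(peptides, annotated_residues):
    """Fraction of peptides whose position 0 or +1 residue has any annotation."""
    if not peptides:
        raise ValueError("no peptides given")
    hits = sum(1 for i in peptides
               if i in annotated_residues or (i + 1) in annotated_residues)
    return hits / len(peptides)


if __name__ == "__main__":
    annotated = {11, 43, 50}  # hypothetical annotated residue indices
    controls = pick_control_peptides(100, outlier_index=42, seed=1)
    print(annotated_fraction([42], annotated), annotated_fraction(controls, annotated))
```

Comparing the annotated fraction for outlier peptides with that for the controls, pooled over many proteins, is one way to ask whether outliers are enriched at functional sites.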

Library Availability. The ω-CDL is freely available at http://dunbrack.fccc.edu/ and http://proteingeometry.sourceforge.net/.

ACKNOWLEDGMENTS. This work was supported by National Institutes of Health (NIH) Grant R01-GM083136 (to P.A.K.), NIH grants P20-GM76222 and R01 GM84453 (to R.L.D.), and an American Heart Association (Midwest affiliate) postdoctoral fellowship (to D.S.B.).

1. Pauling L, Corey RB (1951) Configurations of polypeptide chains with favored orientations around single bonds: two new pleated sheets. Proc Natl Acad Sci USA 37:729–740.

2. Corey RB, Pauling L (1953) Fundamental dimensions of polypeptide chains. P R Soc Lond B 141:10–20.

3. Corey RB, Branson HR, Pauling L (1951) The structure of proteins: two hydrogen-bonded helical configurations of the polypeptide chain. Proc Natl Acad Sci USA 37:205–211.

4. Edison AS (2001) Linus Pauling and the planar peptide bond. Nat Struct Biol 8:201–202.

5. MacArthur MW, Thornton JM (1996) Deviations from planarity of the peptide bond in peptides and proteins. J Mol Biol 264:1180–1195.

6. Karplus PA (1996) Experimentally observed conformation-dependent geometry and hidden strain in proteins. Protein Sci 5:1406–1420.

7. Schmidt A, Lamzin VS (2002) Veni, vidi, vici - atomic resolution unraveling the mysteries of protein function. Curr Opin Struct Biol 12:698–703.

8. Esposito L, De Simone A, Zagari A, Vitagliano L (2005) Correlation between ω and ψ dihedral angles in protein structures. J Mol Biol 347:483–487.

9. Esposito L, Vitagliano L, Zagari A, Mazzarella L (2000) Pyramidalization of backbone carbonyl carbon atoms in proteins. Protein Sci 9:2038–2042.

10. Herzberg O, Moult J (1991) Analysis of the steric strain in the polypeptide backbone of protein molecules. Proteins 11:223–229.

11. Kang BS, Devedjiev Y, Derewenda U, Derewenda ZS (2004) The PDZ domain of syntenin at ultra-high resolution: bridging the gap between macromolecular and small molecule crystallography. J Mol Biol 338:483–493.

12. Longhi S, Czjzek M, Lamzin V, Nicolas A, Cambillau C (1997) Atomic resolution (1.0 Å) crystal structure of Fusarium solani cutinase: stereochemical analysis. J Mol Biol 268:779–799.

13. Sevcik J, Dauter Z, Lamzin VS, Wilson KS (1996) Ribonuclease from Streptomyces aureofaciens at atomic resolution. Acta Crystallogr D 52:327–344.

14. Merritt EA, et al. (1998) The 1.25 Å resolution refinement of the cholera toxin B-pentamer: evidence of peptide backbone strain at the receptor-binding site. J Mol Biol 282:1043–1059.

15. Lawson CL (1996) An atomic view of the L-tryptophan binding site of trp repressor. Nat Struct Biol 3:986–987.

16. Xu Q, Buckley D, Guan C, Guo HC (1999) Structural insights into the mechanism of intramolecular proteolysis. Cell 98:651–661.

17. Dobson RCJ, et al. (2008) Conserved main-chain peptide distortions: a proposed role for Ile203 in catalysis by dihydrodipicolinate synthase. Protein Sci 17:2080–2090.

18. Berkholz DS, Krenesky PB, Davidson JR, Karplus PA (2010) Protein Geometry Database: a flexible engine to explore backbone conformations and their relationships to covalent geometry. Nucleic Acids Res 38:D320–325.

19. Berkholz DS, Shapovalov MV, Dunbrack RL, Jr, Karplus PA (2009) Conformation dependence of backbone geometry in proteins. Structure 17:1316–1325.

20. Tronrud DE, Berkholz DS, Karplus PA (2010) Using a conformation-dependent stereochemical library improves crystallographic refinement of proteins. Acta Crystallogr D 64:834–842.

21. Sevrioukova IF, Li H, Poulos TL (2004) Crystal structure of putidaredoxin reductase from Pseudomonas putida, the final structural component of the cytochrome P450cam monooxygenase. J Mol Biol 336:889–902.

22. Davies GJ, et al. (1998) Snapshots along an enzymatic reaction coordinate: analysis of a retaining β-glycoside hydrolase. Biochemistry 37:11707–11713.

23. Evans PR (2007) An introduction to stereochemical restraints. Acta Crystallogr D 63:58–61.

24. Shapovalov MV, Dunbrack RL, Jr (2011) A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions. Structure 19:844–858.

25. Ramek M, Yu C-H, Sakon J, Schafer L (2000) Ab initio study of the conformational dependence of the nonplanarity of the peptide group. J Phys Chem A 104:9636–9645.

26. Parthasarathy S, Eaazhisai K, Balaram H, Balaram P, Murthy MR (2003) Structure of Plasmodium falciparum triose-phosphate isomerase-2-phosphoglycerate complex at 1.1-Å resolution. J Biol Chem 278:52461–52470.

27. Kursula I, Wierenga RK (2003) Crystal structure of Triosephosphate isomerase complexed with 2-phosphoglycolate at 0.83-Å resolution. J Biol Chem 278:9544–9551.

28. Jogl G, Rozovsky S, McDermott AE, Tong L (2003) Optimal alignment for enzymatic proton transfer: structure of the Michaelis complex of triosephosphate isomerase at 1.2-Å resolution. Proc Natl Acad Sci USA 100:50–55.

29. Khan AR, et al. (1998) Lowering the entropic barrier for binding conformationally flexible inhibitors to enzymes. Biochemistry 37:16839–16845.

30. Varrot A, et al. (2003) Structural basis for ligand binding and processivity in cellobiohydrolase Cel6A from Humicola insolens. Structure 11:855–864.

31. Milburn D, Laskowski RA, Thornton JM (1998) Sequences annotated by structure: a tool to facilitate the use of structural information in sequence analysis. Protein Eng 11:855–859.

32. Panchenko AR, Luthey-Schulten Z, Wolynes PG (1996) Foldons, protein structural modules, and exons. Proc Natl Acad Sci USA 93:2008–2013.

33. Zhuravlev PI, Papoian GA (2010) Functional versus folding landscape: the same yet different. Curr Opin Struct Biol 20:16–22.

34. Wang G, Dunbrack RL, Jr (2005) PISCES: recent improvements to a PDB sequence culling server. Nucleic Acids Res 33:W94–98.

35. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410.

36. Talavera D, Laskowski RA, Thornton JM (2009) WSsas: a web service for the annotation of functional residues through structural homologues. Bioinformatics 25:1192–1194.

Berkholz et al. PNAS ∣ January 10, 2012 ∣ vol. 109 ∣ no. 2 ∣ 453
