on the validation of crystallographic symmetry and the quality of … · 2015-05-05 · has been...




On the validation of crystallographic symmetry and the quality of structures

Jimin Wang*

Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520

Received 4 September 2014; Revised 20 October 2014; Accepted 24 October 2014

DOI: 10.1002/pro.2595

Published online 29 October 2014 proteinscience.org

Abstract: In 2008, Zwart and colleagues observed that the fraction of the structures deposited in the PDB alleged to have “pseudosymmetry” or “special noncrystallographic symmetry” (NCS) was about 6%, and that this percentage was rising annually. A few years later, Poon and colleagues found that 2% of all the crystal structures in the PDB belonged to higher symmetry space groups than those assigned to them. Here, I report an analysis of the X-ray diffraction data deposited for this class of structures, which shows that most of the “pseudosymmetry” and “special NCS” that has been reported is in fact true crystallographic symmetry (CS). This distinction is important because the credibility of crystal structures depends heavily on quality control statistics such as Rfree that are unreliable when they are computed incorrectly, which they often are when CS is misidentified as “special NCS” or “pseudosymmetry”. When mistakes of this kind are made, artificially low values of Rfree can give unjustified confidence in the accuracy of the reported structures.

Keywords: pseudosymmetry; special noncrystallographic symmetry; Rfree values; reliability of reliability index; symmetry R-factor; merging R-factor; measurement R-factor; crystallographic R-factor; symmetry downshifting; local minimum; refinement statistics

Introduction

Phase retrieval is a classic inverse problem because the phases associated with the Bragg reflections in a diffraction pattern, which are not directly measured, are fully encoded in their amplitudes. Phases can be retrieved from the diffraction patterns of the crystals of any continuous, three-dimensional object by computation as long as: (i) diffraction amplitudes are accurately measured with a signal-to-noise ratio of over 20, and (ii) there is 2.6-fold oversampling in reciprocal space (for review and additional information, see Palatinus and online supporting materials, OSM, Supporting Information Fig. S1).[1,2] This level of oversampling occurs naturally in macromolecular crystals that have an unordered solvent content of 65% or greater. A survey of the Protein Data Bank (PDB) carried out in April 2014 demonstrated that the average solvent content for all macromolecular crystals is about 50%, which is not much less than 65%, and also showed that about 40% of all entries are reported to have at least twofold noncrystallographic symmetry (NCS), which should also contribute to oversampling. Thus, it is possible that the majority of the structures in the PDB could have been solved starting with measured amplitudes only. Even though none of the structures in the database was in fact determined this way, complete a priori phase retrieval using NCS averaging has been successfully carried out for a number of test cases.[3–7] Thus, it would be interesting to know if NCS really is as prevalent in macromolecular crystals as the data in the PDB might lead one to believe, and this is why the work described below was undertaken.

Additional Supporting Information may be found in the online version of this article.

Grant sponsor: National Institutes of Health; Grant number: P01 GM022778; Grant sponsor: Steitz Center for Structural Biology, Gwangju Institute of Science and Technology, Republic of Korea.

*Correspondence to: Jimin Wang; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520. E-mail: [email protected]

Published by Wiley-Blackwell. © 2014 The Protein Society. PROTEIN SCIENCE 2015 VOL 24:621–632

In 2008, using coordinates available in the PDB repository, Zwart et al. calculated the root-mean-square deviations of the Cα coordinates of molecules supposedly related by NCS from average structures that were obtained from the same data assuming that the symmetry operations relating them are actually CS. They concluded that 6% of all structures in the PDB ought to be considered as having “special NCS” or “pseudosymmetry,” and noted that the percentage of such structures reported each year was increasing.[8] In addition, 15 years earlier, Wang and Janin had shown that the NCS axes being reported in the PDB tended to be either parallel or orthogonal to a unit cell axis.[9] Taken together, these observations suggested that at least some of the claims of NCS that have been made might be spurious.

Shortly after the study of Zwart and colleagues was published, another report appeared indicating that there is indeed a widespread problem with NCS and CS in the PDB. Poon and colleagues found that up to 2% of all the crystal structures in the PDB belonged to higher symmetry space groups than the ones to which they were assigned by their authors.[10] Their study focused on whether one could detect “missed symmetry” by analyzing the coordinates of already determined structures, trace any problems so identified back to the X-ray diffraction data, and then correct them. However, given the large differences that exist between the calculated and observed amplitudes of structure factors for crystal structures in the PDB,[11] this kind of coordinate-based analysis is likely to underestimate the extent of the special NCS problem because when CS is treated as NCS, experimental errors are certain to make coordinates diverge that should be identical by reason of symmetry.

Here, I report the result of an analysis of the X-ray diffraction data in the PDB,[12] which confirms that there is reason for concern about the way crystal symmetry is being treated by macromolecular structural biologists. Unlike the earlier studies of Zwart et al. and Poon et al.,[8,10] the analysis discussed below focused on how widespread the phenomenon of missed symmetry might be, and the deleterious effects it can have on structure quality. It is clear that at least in some cases, the symmetry in question was not missed. It was instead treated as NCS for the purposes of refinement in the belief that it would lead to an improvement in the quality of the structures that emerged, which is not the case. It is also clear that the problems created by missed symmetry cannot be addressed using techniques based on quality control statistics such as Rfree, the cross-validation (CV) statistic introduced in 1992 on which so much reliance is placed today.[13]

Results

Special noncrystallographic symmetry is often crystallographic symmetry

NCS R-factors (Rsymm) were computed from observed intensities for all low-symmetry space group entries in the PDB database using XPREP from the SHELX suite, and the orientations of NCS axes were determined using MolRep from CCP4.[12,14–16] It was found that in most cases where NCS has been reported, the Rsymm between NCS-related reflections is less than 15% and often smaller than expected given the value quoted for <I/σI> (Table I, Fig. 1, OSM, Supporting Information Tables S1–S3). Thus, the data do not support the hypothesis that the NCS reported for most of these crystals is anything other than CS.

Included among the many entries that appear to have been incorrectly identified as having NCS are 219 of the 3003 P1 entries in the database (Table I). The mean value of the averaged <I/σI> for these entries is 12.1 ± 6.2, suggesting that the data for most of them were quite weak. However, the mean value of Rsymm for the NCS-related reflections of these entries is 0.071 ± 0.032, which is smaller than that expected given the average value of <R(sigma)> for these data, which is 0.083 [R(sigma) is <σI/I> and can sometimes be approximated as <σI>/<I>]. Thus, poor data quality does not explain why so many of these crystals were assigned to the space group P1.
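
The comparison described above can be sketched numerically. The following is only an illustration under stated assumptions, not the XPREP procedure: it assumes that pairs of intensities related by the candidate symmetry operation have already been matched, it uses the pairwise form of the symmetry R-factor, and the function names `r_symm` and `r_sigma` and the synthetic data are hypothetical.

```python
import numpy as np

def r_symm(i_a, i_b):
    """Pairwise R-factor between intensities related by a candidate operation:
    sum|Ia - Ib| / sum(Ia + Ib)."""
    i_a, i_b = np.asarray(i_a, float), np.asarray(i_b, float)
    return np.abs(i_a - i_b).sum() / (i_a + i_b).sum()

def r_sigma(i, sig):
    """<sigma_I>/<I>, the approximation to R(sigma) mentioned in the text."""
    return np.asarray(sig, float).sum() / np.asarray(i, float).sum()

# Synthetic example (assumption): the symmetry is real, so both mates share
# one true intensity and differ only by Gaussian measurement noise.
rng = np.random.default_rng(0)
true_i = rng.exponential(100.0, size=5000)
sigma = np.full_like(true_i, 10.0)
obs_a = true_i + rng.normal(0.0, sigma)
obs_b = true_i + rng.normal(0.0, sigma)

rs = r_symm(obs_a, obs_b)
rsig = r_sigma(np.concatenate([obs_a, obs_b]), np.concatenate([sigma, sigma]))

# When Rsymm does not exceed what the measurement errors alone predict,
# the data give no reason to treat the operation as anything other than CS.
consistent_with_cs = rs <= rsig
```

In this toy case the mates differ only by noise, so `rs` stays below `rsig`; a genuine NCS operation would be expected to push `rs` well above it.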

A strong hint about what has been going on is to be found in the fact that the data associated with a noticeable fraction of the P1 entries in the PDB, plus all other low-symmetry entries, and even some P212121 entries (and beyond, see below), have Rsymm values of zero when the NCS it is claimed they display is treated as CS for purposes of data processing (Fig. 1). The only way this could happen is if those data sets were processed initially in a higher symmetry space group, and then deliberately expanded into P1 without reprocessing, which is totally unacceptable.

Table I. Number of Structures with Misassigned Space Groups (a)

Space group                             Entries (b)   Total     Percentage (%)
P1                                      219           3003      7.3
P2                                      14            152       9.2
P21                                     563 (c)       12,345    4.6
C2                                      187 (d)       7705      2.4
P3                                      10            547       1.8
P31                                     15            521       2.9
P32                                     29            494       5.9
P212121                                 133 (e)       17,965    2.2
P212121 from Tetragonal or Cubic (f)    106           1823      5.8
P212121 from Hexagonal (g)              7             2034      0.3

(a) An analysis was carried out with entries released prior to mid-May 2014. Whereas this analysis focused on low symmetry space groups, higher symmetry space groups are not immune to the problems identified here.
(b) Entries with Rsymm less than 0.15 in higher symmetry space groups are counted.
(c) Break-down is: 530 orthorhombic, 23 tetragonal, 9 hexagonal, and 1 cubic.
(d) Break-down is: 101 orthorhombic, 33 rhombohedral, 24 hexagonal, 15 tetragonal, and 1 cubic, and a few other choices.
(e) Break-down is: 128 tetragonal and 5 cubic.
(f) Crystals having unit cell lengths within 2.5% of those expected either for tetragonal or cubic were selected for analysis.
(g) Crystals having unit cell lengths within 2.5% of those expected for hexagonal were selected for analysis.

The special noncrystallographic symmetry problem is widespread

The criterion used above to assess whether special NCS is actually CS, namely that Rsymm is smaller than the value expected on the basis of the <I/σI> value of the data, probably underestimates the extent of the special NCS problem in the database. It is instructive in this regard to analyze the orientations of NCS axes (Supporting Information Table S1, Fig. 2). In P1, the c* axis is the shortest unit cell vector in reciprocal space, and it corresponds to the longest real-space unit cell vector, c. When an axis of NCS is almost perfectly parallel or orthogonal to the c* axis, and the peak in the self-rotation function (SRF) corresponding to this NCS is sufficiently high relative to the origin peak, it is highly probable that the axis of the alleged NCS will be perfectly aligned from one unit cell to the next throughout the entire crystal lattice. If this is so, then the NCS will certainly be CS. The NCS axes of about one-third (34.6%) of all P1 entries in the PDB have this property.

It is not easy to decide how high the SRF peak attributed to special NCS must be to make it reasonable to conclude that it is, in fact, CS. This is because CS peaks in SRF maps are often substantially weaker than origin peaks when real CS is ignored during data processing, especially when the data are weak. The extent of that reduction in strength is a function of the multiplicity of the NCS. For example, an SRF peak associated with threefold NCS is likely to be weaker than a twofold NCS peak. Nevertheless, a crude estimate of the extent of the special NCS problem can be obtained using the criterion that the height of the SRF peak (relative to the origin peak) assigned to NCS should be at least one standard deviation above the mean value within a given class. For example, the mean SRF peak height for all P1 entries is 0.45 ± 0.25 (Fig. 2), including all possible kinds of NCS. Thus, if a special NCS SRF peak is higher than 0.70, there is a high probability that it really represents a CS operation. This class of structures includes 17.5% of all P1 entries.

Figure 1. Distribution of Rsymm (as a fraction, with an arbitrary cutoff of 0.15) for X-ray data in reported low-symmetry space groups (P1, P21, C2) and P212121 in the database as a function of 1/(<I/σI>). Entries with Rsymm of zero are distributed along the horizontal axis. The red lines are the merging statistics expected from the given <I/σI> value of each data set. Data sets below these lines can definitely be scaled in higher symmetry space groups. Since Rsymm does not take multiplicity into account and is often overestimated, the majority of the data sets should be scaled in higher symmetry space groups. When multiple higher symmetry space groups are possible, only the best one is used in the plots shown.


When this analysis was carried out on the subgroup of P1 structures for which cyclic twofold rotational NCS was reported (i.e., excluding all other types of NCS), the mean SRF peak height proved to be 0.64 ± 0.18. Thus, 11.3% of the data sets in this class have an SRF peak height greater than 0.82, which could be indicative of an invalid P2 to P1 conversion. When this kind of SRF peak analysis is done on all the P1 entries (including ones without any NCS), the mean SRF peak height drops to 0.35 ± 0.25, and 18.6% of them are found to have special NCS SRF peak heights higher than 0.60. These estimates of the extent of the special NCS problems in the PDB are broadly consistent with the two studies discussed earlier.[8,10] It is a significant problem (Supporting Information Table S1).
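
The one-standard-deviation criterion used for the SRF peak heights amounts to simple arithmetic on the class means and standard deviations quoted in the text. The sketch below only restates that arithmetic; the function names and the dictionary layout are hypothetical, and the criterion itself is described above as a crude estimate.

```python
def srf_threshold(mean, sd, n_sd=1.0):
    """Relative SRF peak height above which an 'NCS' peak is suspected to be CS."""
    return mean + n_sd * sd

# (mean, standard deviation) of relative SRF peak heights quoted in the text
classes = {
    "P1 entries with any NCS": (0.45, 0.25),
    "P1 entries with twofold NCS only": (0.64, 0.18),
    "all P1 entries": (0.35, 0.25),
}
thresholds = {name: round(srf_threshold(m, s), 2) for name, (m, s) in classes.items()}

def is_suspect(peak_height, mean, sd):
    """True when a peak lies more than one standard deviation above the class mean."""
    return peak_height > srf_threshold(mean, sd)
```

The resulting cutoffs are the 0.70, 0.82, and 0.60 values used in the text for the three classes.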

Systematic absences are as useful for determining the true crystal symmetry as Rsymm and SRF analysis, even when data are very weak. Systematic absences are a property of the entire crystal lattice, whereas NCS is a local property of individual asymmetric units. For example, when a structure that has special P212121 NCS is refined in P1, it is highly unlikely that the refined structure that emerges will predict the systematic absences characteristic of crystals that belong to the space group P212121. Although translational NCS may exist, it can at most generate pseudosystematic absences along a single axis, which often break down rapidly with increasing resolution. The probability of encountering a crystal that has three orthogonal sets of translational NCS characterized by perfect systematic absences along all three major axes is zero for all intents and purposes. Examination of the data of the P1 space group entries containing so-called pseudo P212121 NCS reveals that the systematic absences called for by P212121 CS are often unmistakably clear in the data. It follows that most of the structures solved in P1 that are alleged to involve special P212121 NCS actually belong to the space group P212121.
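
A check of this kind can be sketched as follows. In P212121 the three twofold screw axes require axial reflections (h00, 0k0, 00l) with an odd index to be absent. The code below is a minimal illustration, assuming the reflections are available as integer indices with I/σ values; the function names and the cutoff of 3.0 are hypothetical choices, not part of the analysis reported here.

```python
import numpy as np

def axial_odd_even(hkl, i_over_sigma):
    """Mean I/sigma of odd and even axial reflections for each of the three axes."""
    hkl = np.asarray(hkl, int)
    ios = np.asarray(i_over_sigma, float)
    out = {}
    for axis, name in enumerate("hkl"):
        others = [a for a in range(3) if a != axis]
        on_axis = (hkl[:, others] == 0).all(axis=1) & (hkl[:, axis] != 0)
        odd = on_axis & (hkl[:, axis] % 2 == 1)
        even = on_axis & (hkl[:, axis] % 2 == 0)
        out[name] = (
            ios[odd].mean() if odd.any() else float("nan"),
            ios[even].mean() if even.any() else float("nan"),
        )
    return out

def consistent_with_p212121(report, cutoff=3.0):
    """True when odd axial reflections are weak and even ones strong on all axes."""
    return all(odd < cutoff < even for odd, even in report.values())
```

Perfect absences along all three axes at once are the signature discussed above; translational NCS would at most mimic them along a single axis.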

On the intensity differences that distinguish crystallographic symmetry from noncrystallographic symmetry

To address whether Rsymm statistics are sensitive enough to distinguish between special NCS and CS, a simulation was carried out in which CS was converted computationally into NCS by tilting a true CS rotation axis in the unit cell of a crystal with respect to the axes of that unit cell. The homodimeric structure 3R5G,[17] which was solved at 1.5 Å resolution in the space group P2 (a = 51.36 Å, b = 36.31 Å, c = 94.76 Å, β = 98.76°), was used for this simulation (for this structure, the SRF peak for the two nondyad-related subunits in the original asymmetric unit is only 11% of the origin peak). The contents of the P1 unit cell were generated from the published structure using the CS twofold operation at (1/2, 0, 0), and the Bragg reflection intensities were computed for the crystal that resulted when this new local dyad was tilted by 1°, 2°, 3°, 4°, 5°, and 10° toward either the a* or c* axis of the unit cell.[16] This tilting operation, of course, converted what was previously a CS dyad into an NCS dyad. The calculated intensities in P1 for (hkl) reflections were then compared with those of (-h,k,-l) reflections, which would be identical if the NCS so generated were actually CS. Figure 3 shows the results. A 1° difference between the orientation of the twofold NCS axis and a unit cell axis can result in an intensity Rsymm of over 18% in the 5-Å resolution shell, which is far greater than the merging R-factor statistic Rmeas is likely to be at such low resolution (Rmeas ~ 5%).[6] Thus, even small deviations in axis orientation from true rotational CS are readily detected using Rsymm.
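
The logic of this simulation can be illustrated with a toy model. The sketch below is not the calculation reported here: it assumes random point atoms instead of the 3R5G coordinates, an orthogonal cell (β set to 90°), a dyad through the origin rather than at (1/2, 0, 0), and a low-resolution cube of indices instead of resolution shells. The function names are hypothetical.

```python
import numpy as np

# a, b, c in Å; beta is taken as 90° here for simplicity (assumption)
CELL = np.array([51.36, 36.31, 94.76])

def dyad_matrix(tilt_deg):
    """180° rotation about an axis along b, tilted by tilt_deg toward a."""
    t = np.radians(tilt_deg)
    n = np.array([np.sin(t), np.cos(t), 0.0])
    return 2.0 * np.outer(n, n) - np.eye(3)

def intensities(cart, hkl):
    """|F|^2 for unit point scatterers at Cartesian positions cart."""
    frac = cart / CELL
    phase = 2.0 * np.pi * (hkl @ frac.T)
    return np.abs(np.exp(1j * phase).sum(axis=1)) ** 2

def r_symm_for_tilt(tilt_deg, n_atoms=200, hmax=4, seed=0):
    """Rsymm between I(hkl) and I(-h,k,-l) for a dimer with a tilted dyad."""
    rng = np.random.default_rng(seed)
    mol = rng.uniform(0.0, 1.0, (n_atoms, 3)) * CELL * 0.4  # random 'molecule'
    mate = mol @ dyad_matrix(tilt_deg).T                    # dyad-related copy
    atoms = np.vstack([mol, mate])
    r = np.arange(-hmax, hmax + 1)
    hkl = np.array([(h, k, l) for h in r for k in r for l in r
                    if (h, k, l) != (0, 0, 0)])
    i1 = intensities(atoms, hkl)
    i2 = intensities(atoms, hkl * np.array([-1, 1, -1]))
    return np.abs(i1 - i2).sum() / (i1 + i2).sum()
```

With no tilt the dyad coincides with the b axis and the two sets of intensities agree to rounding error; any nonzero tilt breaks the equality, which is the effect the text describes.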

Figure 2. Distribution of the orientations (θ values) of SRF peaks in all P1 entries, with only the top peak per entry shown. (a) Most orientations have θ values parallel (180°) or orthogonal (90°) to the c* axis and overlap in the plot, so that a single dot may represent as many as 100 entries in the range of high SRF peaks. (b) Histogram of the distribution. The black line shows the counts for θ values parallel (180°) or orthogonal (90°) to the c* axis, and the red line shows the counts for θ values elsewhere.


On the effects of expanding data using crystallographic symmetry

The practice of expanding high symmetry data to lower symmetry for the purpose of structure refinement has become widespread in the last decade (Supporting Information Table S1). The justifications most commonly given for doing so are: (i) that Rfree values fell as a result, and (ii) that composite-omit, simulated-annealing procedures produced better maps in P1 or some other reduced symmetry space group.[18–22] Neither is valid, for reasons that will shortly be explained.

Many of the problems that arise when the true symmetry of crystals is not respected stem from the effects this practice has on data statistics. If the redundancy that results from high symmetry is not taken into account, the merging statistics obtained for a data set will underestimate its true quality (in which case Rpim, which includes both redundancy and multiplicity, is more appropriate), and the level of noise in the electron density maps that ultimately emerge will increase. For example, the <I/σI> value for the data associated with the 4HYO structure, which is one of the many characterized by missed symmetry, increases by ~2.8-fold when the data originally processed in P1 are eightfold averaged in the correct space group, which is P4212.[20]

Problems caused by symmetry downshifting were first addressed over a decade ago,[23] and since then many other examples of this practice have appeared in the literature. Several have been provided by a series of structures that have been reported for crystals of the Thermus thermophilus RNA polymerase (TTHRNP) complex.[24–34] One of the crystal forms of this molecule has data-scaling statistics, systematic absences, and molecular-replacement solutions that are consistent with space group P65.[24–26] Nevertheless, the authors of all the structures obtained with this crystal form (e.g., 1SMY) expanded data sets that had been processed in P65 into P32 for the purposes of structure refinement, and attributed the additional symmetry apparent in the data to perfect merohedral twinning with the twinning operation parallel with the 32 (threefold screw) axis.[24–26] The rationale given was that, after molecular replacement, rigid-body refinement in P65 led to a reduction in Rfree from 50.0% to only 45.3%, whereas rigid-body refinement done in P32, assuming perfect merohedral twinning, reduced the Rfree value to 38.7%, and therefore the latter refinement had to be correct.[26] Similar arguments were made recently by a different group to justify the correctness of molecular replacement solutions in the presence of NCS, even though the solutions they obtained thereby could not be further refined.[35] In fact, the mistake made by these authors, like many others, is to give more credence to differences in Rfree values than they deserve. Evans and Murshudov have shown that in the presence of perfect merohedral twinning, a structure with an Rfree value of 29.1% can be completely wrong (OSM)![36]

There are problems with the 1SMY structure that emerged from the refinements mentioned above that should have been enough to call it into question.[24–26] For one thing, it does not explain the pattern of systematic absences along the screw axis that are clearly evident in the data (Supporting Information Fig. S2). In addition, perfect merohedral twinning of a P32 crystal along its threefold axis can only result in apparent symmetry of P62 or P64, but never P65! Furthermore, there are so many divalent cations included in the structure that at neutral pH, the charge per enzyme molecule would be +1,856.

Figure 3. Intensity change caused by tilting a CS axis to convert it into an NCS axis. (A) Dyad tilted toward the x axis by 1°, 2°, 3°, 4°, 5°, and 10° (black, red, green, blue, dark brown, and brown, respectively). (B) Dyad tilted toward the z axis. The vertical axis is the fractional intensity change and the horizontal axis is reciprocal resolution (Å⁻¹).

Given the relatively modest resolution of this structure (2.71 Å), the addition of 9031 geometrically unconstrained water molecules (20% of all the atoms in the model!), together with 1897 unconstrained Mg2+ ions, is likely to have resulted in severe overfitting of the data. However, it did not produce the increase in Rfree values it should have because the selection of reflections for the CV set was done incorrectly. The correct space group for this TTHRNP crystal form is P65, as initially identified by the original authors (Supporting Information Table S4).[26] The data from all closely related crystal structures display varying degrees of mild twinning (e.g., about 4.4% in 3DXJ) with the twin axis orthogonal to the P65 axis, so that the selection of reflections for CV has to be made assuming that the higher symmetry space group is P6522, and then expanded back to the correct, lower symmetry space group of P65 to obtain the correct structure (Supporting Information Table S4). This was not done.[24–26]

To further illustrate why the methods used for both CV reflection selection and refinement of the TTHRNP structures in question (i.e., data expansion with added twinning) are invalid, a computational experiment was done in which the original CV set was kept, and a molecular replacement solution for the TTHRNP data was sought using as a starting model the structure of a smaller, completely unrelated polymerase (RB69 DNA polymerase). The molecular centers of the two models were aligned to generate a plausible model for the packing of the RB69 DNA polymerase structure in the TTHRNP unit cell, and the resulting model was then refined. During rigid-body refinement, Rfree values fluctuated by more than 10% from one refinement step to the next, and Rfree decreased by a further 7% or more (to about 49% for the data between 40.0 and 4.0 Å resolution) during atomic refinement. These reductions in Rfree had nothing to do with the validity of the molecular model being used, which was unrelated to the crystals in question, but spoke only to the fact that after refinement, the symmetry of the calculated amplitudes now approximately reproduced the symmetry of the data. Thus, reductions in Rfree values obtained by refining molecular replacement models should not be interpreted as proof that a correct solution has been found, especially when that solution cannot be refined further (or when Rfree values are above 42.3%, see OSM). Specific examples of other problems that have been caused by the incorrect assignment of space groups may be found elsewhere.[8,37]

On the effects of treating crystallographic symmetry as noncrystallographic symmetry: A test case

The practical consequences of refining a structure of high symmetry in a lower symmetry space group were further explored by refining the same crystal structure under strictly identical conditions using data for that structure that had been processed in the correct space group, and the same data processed in a reduced symmetry space group. 4HYO was chosen for this computational experiment,[20] for the following two reasons. First, the noise in the data set was relatively small because the resolution was quite high, ~1.65 Å. Second, the data clearly display P4212 symmetry, which means that the difference in the quality of the data processed in P1 and in P4212 is magnified because of the eightfold averaging that occurs when they are processed in P4212 (Table II).

Three parallel structure refinement runs were carried out (see Materials and Methods). First, the structure was refined against the 4HYO data after the P1 data had been eightfold averaged to create a P4212 data set, and reflections had been selected for the CV set in a manner that respected the P4212 symmetry of the data (the “P4212” run). Second, it was also refined against the original 4HYO P1 data using reflections for the CV set chosen by the original authors in P1 using the thin slice method of constant resolution shells (the “P1-P1” run).[20] Third, it was refined against the 4HYO P1 data using the CV reflections that were used for the P4212 run, followed by expansion into P1 (the “P1-P4212” run). The same protein-only 4HYO model was used as the starting model for all three runs. After symmetry downshifting, the number of atoms in the asymmetric unit increased about eightfold but the number of observations increased only about sevenfold because about 12% of the data in P4212 are centric but none are centric in P1 (Table III). The reduced observation-to-atom ratio in the symmetry-downshifted model will presumably lower both free and working R-factor values. As expected, after the first two passes of refinement (a total of 40 cycles, see Materials and Methods), the P4212 run had the highest Rfree value of the three by more than 0.5% (Table III). Thus, if Rfree values had been used at this stage to decide which refinement strategy to continue pursuing, one might have stopped the P4212 run. However, after all three refinements converged fully, the Rfree value for the P4212 run was 16.4%, but it was 17.5% for the P1-P4212 run and 16.8% for the P1-P1 run (Table III).

Table II. Data Reprocessing Statistics of 4HYO in P4212 Space Group

Resolution (Å)   <I/σI>   Rsymm   Rmeas
99–3.55          49.0     0.016   0.018
3.55–2.82        60.0     0.019   0.020
2.82–2.46        60.4     0.023   0.025
2.46–2.24        54.9     0.031   0.033
2.24–2.08        46.6     0.041   0.044
2.08–1.96        36.8     0.053   0.057
1.96–1.86        22.6     0.082   0.089
1.86–1.78        13.6     0.132   0.143
1.78–1.71        10.4     0.163   0.176
1.71–1.65        6.1      0.256   0.277
Overall          46.1     0.025   0.027


Because CS averaging reduced the levels of noise

in the data used for the P4212 run, the quality of the

electron density map that emerged from this run was

superior to those obtained in the other two runs, par-

ticularly in the vicinity of both the N and C-termini of

the protein. For the same reason, the Ramachandran

statistics of the product of the P4212 refinement are

superior to those obtained in the two P1 refinements,

even though the starting model for all three was iden-

tical and had perfect geometry, and even though

the same manual editing was done to correct geometrical

errors introduced by refinement (Table III).

If P4212 is, in fact, the right space group to use for

this structure, why did the P1-P1 run produce such a

“good” result as measured by Rfree? This question can

be answered by comparing the results of the P1-P1 run

and the P1-P4212 run. Even though the converged

Rfree value for the P1-P1 run is significantly better

than that of the P1-P4212 run, their converged work-

ing R-factor distributions are nearly identical (Fig. 4).

This observation suggests that the reason the P1-P1

refinement has a lower Rfree value is that reflections in

its CV set are not fully independent of the reflections

in the working set, which is not the case for the CV set

used for the P1-P4212 run (OSM).
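The difference between a CV set chosen in P1 and one chosen in P4212 and then expanded can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names are hypothetical, the 422 rotations are written with the fourfold axis along c, Friedel mates are treated as equivalent, and plain random selection in P1 stands in for "selection that ignores the higher symmetry" (it is not the thin-shell procedure used by the original authors). The sketch measures how many free reflections have a symmetry mate in the working set.

```python
import random

# Rotations of point group 422 acting on Miller indices (h, k, l).
# Assumption: fourfold axis along c, twofold axes perpendicular to it.
OPS_422 = [
    lambda h, k, l: (h, k, l),
    lambda h, k, l: (-k, h, l),
    lambda h, k, l: (-h, -k, l),
    lambda h, k, l: (k, -h, l),
    lambda h, k, l: (h, -k, -l),
    lambda h, k, l: (-h, k, -l),
    lambda h, k, l: (k, h, -l),
    lambda h, k, l: (-k, -h, -l),
]


def equivalents(hkl):
    """Indices related to hkl by the 422 rotations and by Friedel's law."""
    mates = set()
    for op in OPS_422:
        h, k, l = op(*hkl)
        mates.add((h, k, l))
        mates.add((-h, -k, -l))
    return mates


def canonical(hkl):
    """One fixed representative per class of symmetry-equivalent indices."""
    return max(equivalents(hkl))


def flags_by_symmetry(reflections, free_fraction, seed=0):
    """Pick free reflections among unique classes, then expand to all mates."""
    reflections = list(reflections)
    rng = random.Random(seed)
    unique = sorted({canonical(r) for r in reflections})
    free_unique = set(rng.sample(unique, round(free_fraction * len(unique))))
    return {r: canonical(r) in free_unique for r in reflections}


def flags_at_random(reflections, free_fraction, seed=0):
    """Pick free reflections individually, ignoring the higher symmetry."""
    reflections = sorted(reflections)
    rng = random.Random(seed)
    free = set(rng.sample(reflections, round(free_fraction * len(reflections))))
    return {r: r in free for r in reflections}


def leaked_fraction(flags):
    """Fraction of free reflections with a symmetry mate in the working set."""
    work_classes = {canonical(r) for r, is_free in flags.items() if not is_free}
    free = [r for r, is_free in flags.items() if is_free]
    if not free:
        return 0.0
    return sum(canonical(r) in work_classes for r in free) / len(free)


if __name__ == "__main__":
    # A small P1 hemisphere of indices (one member of each Friedel pair).
    p1 = [(h, k, l) for h in range(-6, 7) for k in range(-6, 7)
          for l in range(-6, 7) if (h, k, l) > (0, 0, 0)]
    print("leak, symmetry-aware:", leaked_fraction(flags_by_symmetry(p1, 0.05)))
    print("leak, random in P1:  ", leaked_fraction(flags_at_random(p1, 0.05)))
```

With the symmetry-aware flags every member of a class carries the same flag, so no free reflection has a mate in the working set; with flags assigned reflection by reflection, most free reflections do.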

Symmetry downshifting can have consequences

for the resulting structures that are biochemically

significant. The structure 4HYO discussed above is

closely related to 4HZ3.20 The channel blocker tetra-

ammonium antimony was included in the crystalliza-

tions that led to 4HZ3, but not in the crystallizations

that resulted in 4HYO. The authors reported that

bound tetra-ammonium antimony could be visualized

in the 4HZ3 structure, but only if the data were processed

in P1.20 However, an isomorphous difference

Fourier map between the observed 4HZ3 and 4HYO

data provides no evidence whatsoever showing that

the channel blocker in question is bound in the 4HZ3

structure (Fig. 4, OSM). Phases for these maps were

calculated from the final complete 4HYO model that

emerged from the P4212 run described above (which

has an Rwork of 9.1% at 1.65-Å resolution, Table III).

Experimental errors in the data for other structures of

this kind that have resolutions lower than that of

4HYO are likely to be much higher than those in

4HYO, and thus the errors introduced into the result-

ing structures by symmetry downshifting are likely to

be greater, and every bit as likely to lead to misleading

conclusions about biochemically significant issues.

Table III. Refinement Statistics of 4HYO in P4212 and in P1

                                   “P4212”               “P1-P1”               “P1-P4212”
Resolution range (Å)a              45–1.65 (1.69–1.65)   45–1.65 (1.69–1.65)   45–1.65 (1.69–1.65)
Number of observations             10,760                75,462                75,584
Final refinement statistics
  Number of atoms                  917                   6,483                 6,597
  Observation/atom ratio, RO2A     11.7                  11.6                  11.5
  Rwork (%)                        9.1 (10.2)            11.4 (16.7)           11.2 (17.6)
  Rfree (%)                        16.4 (16.2)           16.8 (21.1)           17.5 (23.6)
  RO2A/Rwork b                     128.6                 101.7                 102.3
  rmsd bond length (Å)             0.0087                0.0082                0.0091
Ramachandran plot c
  Preferred                        76 residues (96.2%)   627 (95.6%)           602 (94.2%)
  Generously allowed               1 residue (1.3%)      12 (1.8%)             14 (2.2%)
  Disallowed                       2 residues (2.5%)     17 (2.6%)             23 (3.6%)
Initial R-factor statistics of protein-only models with bulk solvent correction d
  Rwork (%)                        20.98                 20.53                 20.57
  Rfree (%)                        22.64                 22.50                 21.45
R-factor statistics after two passes of refinement without manual editing d
  Rwork (%)                        20.1 (19.1)           19.7 (22.7)           19.6 (23.3)
  Rfree (%)                        24.2 (24.1)           23.5 (25.8)           23.3 (25.8)

a The highest resolution shell and its statistics are in parentheses.
b When RO2A/Rwork is used as a measure for global goodness of fit, the model refined in the correct P4212 space group is about 30% better than those in the reduced P1.
c Increased outliers in the Ramachandran plot in P1 relative to P4212 were clearly due to the reduced quality of the resulting electron densities, particularly for residues at both N and C termini where all outliers are located.
d R-factor statistics for the same protein-only model with symmetry expansion (for P1) and with bulk solvent correction at the start of refinement and after the second pass of refinement, each with 20 cycles. The first pass was refined without NCS restraints and the second pass with NCS restraints in P1. The lower R-factor values for the “P1-P4212” and “P1-P1” runs (particularly the “P1-P4212” run) before any refinement, relative to those for the P4212 run, were due to reduced multiplicity weights in R-factor calculations in P1 for the pseudocentric reflections, which typically have higher R-factors. Once refinement was carried out in P1, all pseudocentric reflections became acentric when NCS axes were not constrained.

Better structures can often be obtained when CS is properly taken into account

Two of the three P1 structures deposited prior to

2005 that are listed in Supporting Information

Table S1 as likely to belong to a higher symmetry

space group (1JXO and 1EGW) were rerefined in

that higher symmetry space group to further assess

the impact that CS-to-NCS conversions might have

on structural quality (all other entries in Supporting

Information Table S1 were deposited after

2005).38,39 Re-examination of the data on file for

1JXO revealed that the crystal on which it is based

belongs to P21; the systematic absences along the

screw axis are unmistakably clear.38 When the P1

data were remerged into P21, the quality of the

data for rerefinement in P21 improved, and with the CV set chosen appropriately, all of the 42 residues (14% of the total residues) missing from the original 1JXO structure, which was refined in P1, plus 39 missing side chains (13%), were clearly evident in Fobs-Fcalc difference Fourier maps (Fig. 5, Supporting Information Tables S5 and S6). The refined structure that emerged has working and free R-factors of 17.3 and 23.6%, respectively. The corresponding values for the original P1 structure are 22.0 and 26.4%, even though the CV set used was incorrectly chosen.38

Figure 4. Rerefined 4HYO structure and its relationship to the 3LDC and 4HZ3 structures. (a) Packing of the tetrameric K+ channel in the P4212 unit cell. (b) Isomorphous amplitude differences (fractional unit) between the observed 4HYO and 3LDC data (black) and between 4HYO and 4HZ3 (red) as a function of reciprocal resolution (Å−1, see text). (c) Final working (solid lines with open symbols) and free (dashed lines with filled symbols) R-factor distribution as a function of reciprocal resolution for three parallel refinement runs, (i) the “P4212” run (black), (ii) the “P1-P1” run (red), and (iii) the “P1-P4212” run (green). (d) Initial R-factors after the second pass of refinement. (e) The observed Fobs(3LDC)-Fobs(4HYO) difference Fourier maps contoured at +3.0 σ. (f) The observed Fobs(4HZ3)-Fobs(4HYO) difference Fourier maps contoured at +4.0 σ. See OSM for additional discussion.

Examination of the data for 1EGW shows that

its crystal too belongs to P21.39 However, while the

systematic absences along the twofold screw axis are

obvious upon inspection of the reported intensities,

they are much less clear from I/σI ratios for

unknown reasons (Supporting Information Fig. S3).

The structure obtained by rerefining the 1EGW data

in P21 has working and free R-factors of 14.2 and

18.3%, respectively, as compared to the original P1

structure for which the corresponding values are

20.6 and 22.9% (Supporting Information Fig. S4

and Supporting Information Tables S7 and S8). In

that case, the CV set used for the refinement of the

original structure in P21 was correctly selected, and

thus the same set was kept for its rerefinement.

Eight other structures were rerefined in higher

symmetry space groups using properly selected CV

reflection sets, and with only one exception, the

structures obtained for all of them were statistically

better than those on deposit in the PDB, which

firmly supports the conclusion that better structures

result when data are processed in the correct

highest-possible symmetry space group (Supporting

Information Table S2). As most of these structures

were solved recently using the same or similar ver-

sions of the refinement programs used here to rere-

fine them, the improvements in structure quality

obtained are likely to be ascribable both to the

improvement in data quality that resulted from

merging them in the correct, higher symmetry space

group, and to the use of a properly selected CV set.

This conclusion is strongly supported by the

results of three parallel refinement runs for 4HYO

described above.

It should be noted that comparison of the Rfree

values of these rerefined structures with those origi-

nally reported may be unfair given the fact that the

reflections in the CV sets used were incorrectly

selected in P1 or reduced symmetry space groups

(OSM). In such cases, comparison of corresponding

Rwork values might be more appropriate. For exam-

ple, the newly rerefined P4212 4HYO structure has

a working R-factor of 9.1% (Table III), whereas the

working R-factor for the original P1 4HYO structure

was 16.5% (Supporting Information Table S2). Additionally,

it is noted that the 1JXO data set has an <I/σI>

of 12.9 in the highest resolution shell and a

value of Rfree in that shell that is less than the overall

Rfree value for the structure (22.7 vs. 23.6%; Fig.

5, Supporting Information Tables S5 and S6). Similarly,

the 1EGW data set also has a very high I/σI

value (9.31) and a very low Rfree value (18.4%) in

the highest resolution shell (Fig. 5, Supporting

Information Tables S7 and S8).

Figure 5. Rerefined 1JXO structure and refinement statistics. (A) The complete structure from N to C in rainbow colors. (B) Superposition with the original 1JXO structure (gray) shows locations of four gaps in the original structure and other differences in termini. (C) R-factor distributions for both 1EGW (black) and 1JXO (red) rerefined structures. It should be noted that R-factors for high-resolution data were actually smaller than those for low-resolution data due to an excessive truncation of high-resolution data.

Discussion

Treating CS as NCS for the purpose of refinement is

invalid for two reasons. First, when higher symmetry

data are expanded into lower symmetry space groups,

the coordinates of “NCS-related” molecules that ought

to be identical are allowed to vary independently dur-

ing refinement, thereby increasing the number of

parameters that can be adjusted but not the number

of independent observations. This will result in a

decrease in Rfree, even when the set of reflections

used for CV is properly selected, which brings us to

the second problem. For most of these structures, the

reflections for CV were selected using invalid proce-

dures (OSM).

In addition, symmetry downshifting can

increase the noise in the electron density maps, and

thereby degrade the quality of the structures

obtained from them even though this (as well as

multicopy refinement,40 see OSM) always leads to a

decrease in Rfree values, as demonstrated in this

study. Symmetry downshifting for structure refine-

ment should be discouraged, or even banned since

there is no valid basis for it. Moreover, the invoca-

tion of special NCS or unidentified CS not only com-

plicates the selection of reflections for CV, it also

reduces the effectiveness of composite-omit map cal-

culations.41,42 For example, when a crystal structure

that belongs to the space group P212121 is refined in

P1, its four identical asymmetric units are treated

as though they are independent. For every omit

block, therefore, there will be three identical unomit-

ted blocks that are related to the omitted block by

the “special P212121 NCS.” Model-bias due to the

three nonomitted blocks will completely invalidate

the resulting omit map. Because this is so,

composite-omit maps always look better in a

symmetry-downshifted space group than in the cor-

rect space group when the maps from these two pro-

cedures are compared side-by-side.22,23

Historical perspective

In crystallography, the term “crystal symmetry” is well

defined. Operationally, a crystal must be assigned to

the highest symmetry its data permit, given the level

of error evident in the experimental data, that is, pro-

vided Rsymm < R(sigma). The term “pseudosymmetry”

is occasionally used to describe the symmetry evident

in data sets that show peaks in their SRFs that are

aligned well enough with crystal axes to be true CS,

but fail the conventional Rsymm<R(sigma) test none-

theless. Not only is it wrong to treat CS as NCS, it can

have serious consequences for the quality of inferred

crystal structures, as is the case for the TTHRNP

structure discussed above.
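The Rsymm < R(sigma) criterion can be illustrated with a short Python sketch. It assumes the conventional definitions Rsymm = Σ|I − ⟨I⟩| / ΣI, summed over reflections that have at least one symmetry mate under the candidate symmetry, and R(sigma) = Σσ(I) / ΣI over all observations; these definitions, the function names, and the choice of a twofold axis along b (as in the P21 cases) are assumptions made for illustration, not a description of how XPREP or any other program implements the test.

```python
from collections import defaultdict


def canonical_2_along_b(hkl):
    """Class representative under a twofold axis along b plus Friedel's law."""
    h, k, l = hkl
    return max([(h, k, l), (-h, k, -l), (-h, -k, -l), (h, -k, l)])


def r_symm_and_r_sigma(observations, canonical):
    """Return (Rsymm, R(sigma)) for a candidate symmetry.

    observations: iterable of ((h, k, l), intensity, sigma) tuples
    canonical:    function mapping (h, k, l) to its class representative
    """
    groups = defaultdict(list)
    for hkl, intensity, sigma in observations:
        groups[canonical(hkl)].append((intensity, sigma))

    deviation_sum = 0.0   # sum of |I - <I>| over classes with >= 2 members
    merged_i_sum = 0.0    # sum of I over the same classes
    sigma_sum = 0.0       # sum of sigma(I) over all observations
    intensity_sum = 0.0   # sum of I over all observations
    for members in groups.values():
        for intensity, sigma in members:
            sigma_sum += sigma
            intensity_sum += intensity
        if len(members) < 2:
            continue  # a lone observation says nothing about the symmetry
        mean = sum(i for i, _ in members) / len(members)
        for intensity, _ in members:
            deviation_sum += abs(intensity - mean)
            merged_i_sum += intensity

    r_symm = deviation_sum / merged_i_sum if merged_i_sum else float("nan")
    r_sigma = sigma_sum / intensity_sum
    return r_symm, r_sigma


def symmetry_supported(observations, canonical):
    """True when the mates agree to within the stated experimental error."""
    r_symm, r_sigma = r_symm_and_r_sigma(observations, canonical)
    return r_symm < r_sigma
```

In this sketch a candidate symmetry is accepted only when its mates disagree by less than the reported measurement error, which is the operational rule stated above.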

Before the introduction of Rfree, in a commen-

tary on erroneous structures that shook the confi-

dence of the structural biology community, Branden

and Jones warned that (working) R-factor values of

25% or higher could be indicative of problematic

crystal structures.43 In a test case at 1.8-Å resolution,

it was shown that a structure obtained by

deliberately fitting its polypeptide chain backwards

through the corresponding electron density map

could be refined so that its R-factor was less than

25%.44,45 The introduction of Rfree went a long way

towards curing many of the problems Branden and

Jones identified. Nevertheless, numerous cases are

known where the use of Rfree did not prevent the

publication of seriously defective structures.46–49

Today, one might want to make a statement about

structure quality similar to the one Branden and

Jones made 2 decades ago, namely that Rfree values

of 25% or higher could be indicative of structures

that have serious problems.

As shown above, the Rfree values calculated for

structures obtained for crystals alleged to contain NCS

may have little value as measures of structure quality,

especially when the CV sets used have been chosen

improperly. Many other factors can also reduce the

objectivity of Rfree values, a subject that will be discussed

elsewhere. Nevertheless, the majority of the CS/NCS

problems identified here could easily be avoided,

or corrected, if necessary. All that is required is for the

community to decide collectively that this problem

needs to be addressed. An important point of this

manuscript is to deliver a call to arms to the entire

structural biology community so that the important,

but entirely correctable problems identified above get

resolved. Our scientific credibility is on the line.

A proposal for reform

Given the extent of the NCS problem discussed

here, it would be beneficial if, along with the many

other statistical tests that are already routinely

done to assess structure quality, Rsymm<R(sigma)

tests were run on all the data associated with struc-

tures for which NCS is claimed at the time of struc-

ture/data deposition.50 Equivalent tests have been

included for decades in many of the programs used

for crystallographic data processing; they enable

those programs to determine the symmetries of crys-

tals automatically. Those who choose to over-ride the

space group assignments provided by these pro-

grams should be warned that this is bad practice,

and asked either to show cause, or to take corrective

steps. If they do not, it would be appropriate that

warning flags be added to the PDB entry.

Materials and Methods

All observed structure factors were retrieved in mid-

April 2014 from the PDB,12 and converted to the

scalepack format.51 When experimentally measured

intensities (I) were absent and measured amplitudes

(F) were present, intensities were derived assuming

that I = F² and σI = 2FσF. Entries were rejected if

the corresponding standard deviation column was

missing. Intensities in each entry were then rescaled

to have maximal accuracy. Rsymm was analyzed

using XPREP from the Shelx suite.14,15 The


orientations of NCS axes were determined using

MolRep from CCP4.16 Merging intensities to appro-

priate higher symmetry space groups was done

using scalepack from HKL2000.51 Rsymm obtained

from prescaled P1 intensity data in this study differs

slightly from that of unscaled integrated intensity

data in a single-step data processing. This difference

does not affect the conclusion of this study. For rere-

finement of structures in the correct space group,

the location of the special CS dyad was determined

using least square superposition methods, and

proper origin-shift was applied to convert special

NCS back to CS whenever possible.
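The conversion of deposited amplitudes to intensities described above (I = F², σI = 2FσF, with entries lacking standard deviations rejected) can be written as a minimal Python sketch. The function name and the record layout are hypothetical and chosen only for illustration; the sketch does not reproduce the scalepack conversion or the subsequent rescaling.

```python
def amplitudes_to_intensities(records):
    """Convert (hkl, F, sigma_F) records to (hkl, I, sigma_I).

    Uses I = F**2 and sigma_I = 2 * F * sigma_F (first-order error
    propagation). Returns None when any record lacks a standard deviation,
    mirroring the rule that such entries were rejected as a whole.
    """
    converted = []
    for hkl, amplitude, sigma_f in records:
        if sigma_f is None:
            return None  # entry rejected: no standard deviation column
        intensity = amplitude * amplitude
        sigma_i = 2.0 * amplitude * sigma_f
        converted.append((hkl, intensity, sigma_i))
    return converted
```

For example, an amplitude of 10.0 with σF = 0.5 becomes an intensity of 100.0 with σI = 10.0.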

To minimize any differences due to different

refinement procedures, different levels of investiga-

tors’ modeling skills, and any potential bias of this

author, three parallel refinement runs were carried

out using conditions as strictly identical as possible

per suggestions of one reviewer of this manuscript.

These were as follows: (i) the “P4212” run against the

4HYO P4212 data in the correct space group with

properly selected reflections for the CV set,20 (ii) the

“P1-P1” run against the P1 data with the reflections

for the CV set selected in P1 using a thin slicing

method of resolution shells by the original authors,20

and (iii) the “P1-P4212” run against the P1 data

with the reflections for the CV set selected in P4212

and then expanded to P1. All three runs started

with the same protein-alone model. This model was

derived from the rerefined 4HYO model with Rwork = 12.6%

and Rfree = 17.5% at 1.65 Å resolution at the time of

original manuscript submission after all ordered

water molecules and 1,6-hexanediol molecules were

removed.20 An initial refinement was carried out in

two passes each with 20 cycles using Refmac5.52 For

the “P4212” run, both passes had no NCS restraints

applied. For the “P1-P1” and “P1-P4212” runs, NCS

restraints were applied in the second pass but not in

the first pass. For the remaining 10 passes of struc-

ture refinement, including two passes of parallel

Ramachandran backbone editing, exactly the same

criteria were applied to locate the ordered water

molecules and 1,6-hexanediol molecules or any

errors in models. For example, all three runs used

the same 5.0 σ or 4.5 σ, and finally 4.0 σ as cut-off

criteria for peaks and holes in the residual Fobs-Fcalc

difference Fourier maps. The three models were

then allowed to diverge only because of different

numbers of peaks or holes in residual maps at a

given selected cut-off. Results of this refinement are

summarized in Table III and Figure 4.

Acknowledgments

The author is indebted to Dr. Peter Moore for extensively

editing this manuscript to make it appropriate

for the entire structural biology community rather

than just those experts in crystallography. The author

thanks Drs. Z. Dauter and T. A. Jones for critical

comments and useful suggestions to improve this

manuscript, and Drs. W. A. Hendrickson, B. W. Mat-

thews, T. Richmond, A. T. Brunger, W. Yang, S. Burley,

H. Berman, M. G. Rossmann, A. Horwich, M. Hoch-

strasser, and W. Meng for discussion and comments.

References

1. Miao J, Sayre D, Chapman HN (1998) Phase retrieval from the magnitude of the Fourier transforms of nonperiodic objects. J Optic Soc Amer 15:1662–1669.

2. Palatinus L (2013) The charge-flipping algorithm in crystallography. Acta Cryst B69:1–16.

3. Braig K, Otwinowski Z, Hegde R, Boisvert DC, Joachimiak A, Horwich AL, Sigler PB (1994) The crystal structure of the bacterial chaperonin GroEL at 2.8 Å. Nature 371:578–586.

4. Kleywegt GJ, Jones TA (1999) Software for handling macromolecular envelopes. Acta Cryst D55:941–944.

5. Pattanayek R, Wang JM, Mori T, Xu Y, Johnson CH, Egli M (2004) Visualizing a circadian clock protein: crystal structure of KaiC and functional insights. Mol Cell 15:375–388.

6. Wang J, Wing R (2014) Diamonds in the rough: a strong case for the inclusion of weak-intensity X-ray diffraction data. Acta Cryst D70:1491–1497.

7. Wang J, Hartling JA, Flanagan JM (1998) Crystal structure determination of Escherichia coli ClpP starting from an EM-derived mask. J Struct Biol 124:151–163.

8. Zwart PH, Grosse-Kunstleve RW, Lebedev AA, Murshudov GN, Adams PD (2008) Surprises and pitfalls arising from (pseudo)symmetry. Acta Cryst D64:99–107.

9. Wang XD, Janin J (1993) Orientation of non-crystallographic symmetry axes in protein crystals. Acta Cryst D49:505–512.

10. Poon BK, Grosse-Kunstleve RW, Zwart PH, Sauter NK (2010) Detection and correction of underassigned rotational symmetry prior to structure deposition. Acta Cryst D66:503–513.

11. Holton JM, Classen S, Frankel KA, Tainer JA (2014) The R-factor gap in macromolecular crystallography: an untapped potential for insights on accurate structures. FEBS J 281:4046–4060.

12. Berman HM, Bhat TN, Bourne PE, Feng Z, Gilliland G, Weissig H, Westbrook J (2000) The Protein Data Bank and the challenge of structural genomics. Nat Struct Biol 7:957–959.

13. Brunger AT (1992) Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature 355:472–475.

14. Sheldrick GM (2000) XPREP Version 6.0, Bruker AXS, Inc., Madison, Wisconsin, USA.

15. Sheldrick GM (2008) A short history of SHELX. Acta Cryst A64:112–122.

16. Winn MD, Ballard CC, Cowtan KD, Dodson EJ, Emsley P, Evans PR, Keegan RM, Krissinel EB, Leslie AG, McCoy A, McNicholas SJ, Murshudov GN, Pannu NS, Potterton EA, Powell HR, Read RJ, Vagin A, Wilson KS (2011) Overview of the CCP4 suite and current developments. Acta Cryst D67:235–242.

17. Topal H, Fulcher NB, Bitterman J, Salazar E, Buck J, Levin LR, Cann MJ, Wolfgang MC, Steegborn C (2012) Crystal structure and regulation mechanisms of the CyaB adenylyl cyclase from the human pathogen Pseudomonas aeruginosa. J Mol Biol 416:271–286.

18. Guo L, Han A, Bates DL, Cao J, Chen L (2007) Crystal structure of a conserved N-terminal domain of histone deacetylase 4 reveals functional insights into glutamine-rich domains. Proc Natl Acad Sci USA 104:4297–4302.

19. Kummel D, Krishnakumar SS, Radoff DT, Li F, Giraudo CG, Pincet F, Rothman JE, Reinisch KM (2011) Complexin cross-links prefusion SNAREs into a zigzag array. Nat Struct Mol Biol 18:927–933.

20. Posson DJ, McCoy JG, Nimigean CM (2013) The voltage-dependent gate in MthK potassium channels is located at the selectivity filter. Nat Struct Mol Biol 20:159–166.

21. Gourdon P, Liu XY, Skjorringe T, Morth JP, Moller LB, Pedersen BP, Nissen P (2011) Crystal structure of a copper-transporting PIB-type ATPase. Nature 475:59–64.

22. Aggarwal A, Nair D, Johnson R, Prakash L, Prakash S (2005) Reply to Wang: Hoogsteen base-pairing in DNA replication? Nature 437:E7; discussion E7.

23. Wang J (2005) DNA polymerases: Hoogsteen base-pairing in DNA replication? Nature 437:E6–7; discussion E7.

24. Mukhopadhyay J, Das K, Ismail S, Koppstein D, Jang M, Hudson B, Sarafianos S, Tuske S, Patel J, Jansen R, Irschik H, Arnold E, Ebright RH (2008) The RNA polymerase "switch region" is a target for inhibitors. Cell 135:295–307.

25. Artsimovitch I, Patlan V, Sekine S, Vassylyeva MN, Hosaka T, Ochi K, Yokoyama S, Vassylyev DG (2004) Structural basis for transcription regulation by alarmone ppGpp. Cell 117:299–310.

26. Vassylyeva MN, Lee J, Sekine SI, Laptenko O, Kuramitsu S, Shibata T, Inoue Y, Borukhov S, Vassylyev DG, Yokoyama S (2002) Purification, crystallization and initial crystallographic analysis of RNA polymerase holoenzyme from Thermus thermophilus. Acta Cryst D58:1497–1500.

27. Vassylyev DG, Sekine S, Laptenko O, Lee J, Vassylyeva MN, Borukhov S, Yokoyama S (2002) Crystal structure of a bacterial RNA polymerase holoenzyme at 2.6 Å resolution. Nature 417:712–719.

28. Artsimovitch I, Vassylyeva MN, Svetlov D, Svetlov V, Perederina A, Igarashi N, Matsugaki N, Wakatsuki S, Tahirov TH, Vassylyev DG (2005) Allosteric modulation of the RNA polymerase catalytic reaction is an essential component of transcription control by rifamycins. Cell 122:351–363.

29. Vassylyev DG, Svetlov V, Vassylyeva MN, Perederina A, Igarashi N, Matsugaki N, Wakatsuki S, Artsimovitch I (2005) Structural basis for transcription inhibition by tagetitoxin. Nat Struct Mol Biol 12:1086–1093.

30. Vassylyev DG, Vassylyeva MN, Zhang J, Palangat M, Artsimovitch I, Landick R (2007) Structural basis for substrate loading in bacterial RNA polymerase. Nature 448:163–168.

31. Vassylyev DG, Vassylyeva MN, Perederina A, Tahirov TH, Artsimovitch I (2007) Structural basis for transcription elongation by bacterial RNA polymerase. Nature 448:157–162.

32. Belogurov GA, Vassylyeva MN, Sevostyanova A, Appleman JR, Xiang AX, Lira R, Webber SE, Klyuyev S, Nudler E, Artsimovitch I, Vassylyev DG (2009) Transcription inactivation through local refolding of the RNA polymerase structure. Nature 457:332–335.

33. Feklistov A, Mekler V, Jiang Q, Westblade LF, Irschik H, Jansen R, Mustaev A, Darst SA, Ebright RH (2008) Rifamycins do not function by allosteric modulation of binding of Mg2+ to the RNA polymerase active center. Proc Natl Acad Sci USA 105:14820–14825.

34. Tuske S, Sarafianos SG, Wang X, Hudson B, Sineva E, Mukhopadhyay J, Birktoft JJ, Leroy O, Ismail S, Clark AD, Jr, Dharia C, Napoli A, Laptenko O, Lee J, Borukhov S, Ebright RH, Arnold E (2005) Inhibition of bacterial RNA polymerase by streptolydigin: stabilization of a straight-bridge-helix active-center conformation. Cell 122:541–552.

35. DiMaio F, Terwilliger TC, Read RJ, Wlodawer A, Oberdorfer G, Wagner U, Valkov E, Alon A, Fass D, Axelrod HL, Das D, Vorobiev SM, Iwai H, Pokkuluri PR, Baker D (2011) Improved molecular replacement by density- and energy-guided protein structure optimization. Nature 473:540–543.

36. Evans PR, Murshudov GN (2013) How good are my data and what is the resolution? Acta Cryst D69:1204–1214.

37. Dauter Z, Wlodawer A, Minor W, Jaskolski M, Rupp B (2014) Avoidable errors in deposited macromolecular structures: an impediment to efficient data mining. IUCr J 1:179–193.

38. Tavares GA, Panepucci EH, Brunger AT (2001) Structural characterization of the intramolecular interaction between the SH3 and guanylate kinase domains of PSD-95. Mol Cell 8:1313–1325.

39. Santelli E, Richmond TJ (2000) Crystal structure of MEF2A core bound to DNA at 1.5 Å resolution. J Mol Biol 297:437–449.

40. Pellegrini M, Gronbech-Jensen N, Kelly JA, Pfluegl GMU, Yeates TO (1997) Highly constrained multiple-copy refinement of protein crystal structures. Proteins 29:426–432.

41. Brunger AT, Rice LM (1997) Crystallographic refinement by simulated annealing: methods and applications. Methods Enzymol 277:243–269.

42. Rice LM, Shamoo Y, Brunger AT (1998) Phase improvement by multi-start simulated annealing refinement and structure-factor averaging. J Appl Cryst 31:798–805.

43. Branden CI, Jones TA (1990) Between objectivity and subjectivity. Nature 343:687–689.

44. Kleywegt GJ, Jones TA (1995) Where freedom is given, liberties are taken. Structure 3:535–540.

45. Jones TA, Zou JY, Cowan SW, Kjeldgaard M (1991) Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Cryst A47:110–119.

46. Matthews BW (2007) Five retracted structure reports: inverted or incorrect? Protein Sci 16:1013–1016.

47. Jeffrey PD (2009) Analysis of errors in the structure determination of MsbA. Acta Cryst D65:193–199.

48. Wang J (2001) A corrected quaternary arrangement of the peptidase HslV and ATPase HslU in a cocrystal structure. J Struct Biol 134:15–24.

49. Wang J, Rho SH, Park HH, Eom SH (2005) Correction of X-ray intensities from an HslV-HslU co-crystal containing lattice-translocation defects. Acta Cryst D61:932–941.

50. Read RJ, Adams PD, Arendall WB, III, Brunger AT, Emsley P, Joosten RP, Kleywegt GJ, Krissinel EB, Lutteke T, Otwinowski Z, Perrakis A, Richardson JS, Sheffler WH, Smith JL, Tickle IJ, Vriend G, Zwart PH (2011) A new generation of crystallographic validation tools for the protein data bank. Structure 19:1395–1412.

51. Otwinowski Z, Minor W (1997) Processing of X-ray diffraction data collected in oscillation mode. Macromol Cryst A 276:307–326.

52. Murshudov GN, Vagin AA, Dodson EJ (1997) Refinement of macromolecular structures by the maximum-likelihood method. Acta Cryst D53:240–255.
