On the validation of crystallographic symmetry and the quality of structures
Jimin Wang*
Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520
Received 4 September 2014; Revised 20 October 2014; Accepted 24 October 2014
DOI: 10.1002/pro.2595
Published online 29 October 2014 proteinscience.org
Abstract: In 2008, Zwart and colleagues observed that the fraction of the structures deposited in the PDB alleged to have “pseudosymmetry” or “special noncrystallographic symmetry” (NCS) was
about 6%, and that this percentage was rising annually. A few years later, Poon and colleagues
found that 2% of all the crystal structures in the PDB belonged to higher symmetry space groups than those assigned to them. Here, I report an analysis of the X-ray diffraction data deposited for
this class of structures, which shows that most of the “pseudosymmetry” and “special NCS” that
has been reported is in fact true crystallographic symmetry (CS). This distinction is important because the credibility of crystal structures depends heavily on quality control statistics such as
Rfree that are unreliable when they are computed incorrectly, which they often are when CS is misidentified
as “special NCS” or “pseudosymmetry”. When mistakes of this kind are made, artificially low values of Rfree can give unjustified confidence in the accuracy of the reported structures.
Keywords: pseudosymmetry; special noncrystallographic symmetry; Rfree values; reliability of reliability index; symmetry R-factor; merging R-factor; measurement R-factor; crystallographic R-factor;
symmetry downshifting; local minimum; refinement statistics
Introduction
Phase retrieval is a classic inverse problem because
the phases associated with the Bragg reflections in a
diffraction pattern, which are not directly measured,
are fully encoded in their amplitudes. Phases can be
retrieved from the diffraction patterns of the crys-
tals of any continuous, three-dimensional object by
computation as long as: (i) diffraction amplitudes
are accurately measured with a signal-to-noise ratio
of over 20, and (ii) there is 2.6-fold oversampling in
reciprocal space (for review and additional informa-
tion, see Palatinus and online supporting materials,
OSM, Supporting Information Fig. S1).1,2 This level
of oversampling occurs naturally in macromolecular
crystals that have an unordered solvent content of
65% or greater. A survey of the Protein Data Bank
(PDB) carried out in April 2014 demonstrated that
the average solvent content for all macromolecular
crystals is about 50%, which is not much less than
65%, and also showed that about 40% of all entries
are reported to have at least twofold noncrystallo-
graphic symmetry (NCS), which should also contrib-
ute to oversampling. Thus, it is possible that the
majority of the structures in the PDB could have
been solved starting with measured amplitudes only.
Even though none of the structures in the database
was in fact determined this way, complete a priori
phase retrieval using NCS averaging has been suc-
cessfully carried out for a number of test cases.3–7
Thus, it would be interesting to know if NCS really
is as prevalent in macromolecular crystals as the
data in the PDB might lead one to believe, and this
is why the work described below was undertaken.
In 2008, using coordinates available in the PDB
repository, Zwart et al. calculated the root-mean-square
deviations of the Cα coordinates of molecules
Additional Supporting Information may be found in the online version of this article.
Grant sponsor: National Institutes of Health; Grant number: P01GM022778; Grant sponsor: Steitz Center for Structural Biology, Gwangju Institute of Science and Technology, Republic of Korea.
*Correspondence to: Jimin Wang; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520. E-mail: [email protected]
Published by Wiley-Blackwell. © 2014 The Protein Society. PROTEIN SCIENCE 2015 VOL 24:621-632
supposedly related by NCS from average structures
that were obtained from the same data assuming that
the symmetry operations relating them are actually
CS. They concluded that 6% of all structures in the
PDB ought to be considered as having “special NCS” or
“pseudosymmetry,” and noted that the percentage of
such structures reported each year was increasing.8 In
addition, 15 years earlier, Wang and Janin had shown
that the NCS axes being reported in the PDB tended
to be either parallel or orthogonal to a unit cell axis.9
Taken together, these observations suggested that at
least some of the claims of NCS that have been made
might be spurious.
Shortly after the study of Zwart and colleagues
was published, another report appeared indicating
that there is indeed a widespread problem with NCS
and CS in the PDB. Poon and colleagues found that
up to 2% of all the crystal structures in the PDB
belonged to higher symmetry space groups than the
ones to which they were assigned by their authors.10
Their study focused on whether one could detect
“missed symmetry” by analyzing the coordinates of
already determined structures, trace any problems so
identified back to the X-ray diffraction data, and then
correct them. However, given the large differences
that exist between the calculated and observed ampli-
tudes of structure factors for crystal structures in the
PDB,11 this kind of coordinate-based analysis is likely
to underestimate the extent of the special NCS prob-
lem because when CS is treated as NCS, experimental
errors are certain to make coordinates diverge that
should be identical by reason of symmetry.
Here, I report the result of an analysis of the X-
ray diffraction data in the PDB,12 which confirms
that there is reason for concern about the way crys-
tal symmetry is being treated by macromolecular
structural biologists. Unlike the earlier studies of
Zwart et al. and Poon et al.,8,10 the analysis dis-
cussed below focused on how widespread the phe-
nomenon of missed symmetry might be, and the
deleterious effects it can have on structure quality.
It is clear that at least in some cases, the symmetry
in question was not missed. It was instead treated
as NCS for the purposes of refinement in the belief
that it would lead to an improvement in the quality
of the structures that emerged, which is not the
case. It is also clear that the problems created by
missed symmetry cannot be addressed using techni-
ques based on quality control statistics such as Rfree,
the crossvalidation (CV) statistic introduced in 1992
on which so much reliance is placed today.13
Results
Special noncrystallographic symmetry is often crystallographic symmetry
NCS R-factors (Rsymm) were computed from observed
intensities for all low-symmetry space group entries
in the PDB database using XPREP from the Shelx
suite, and the orientations of NCS axes were deter-
mined using MolRep from CCP4.12,14–16 It was found
that in most cases where NCS has been reported,
the Rsymm between NCS-related reflections is less
than 15% and often smaller than expected given the
value quoted for <I/σI> (Table I, Fig. 1, OSM, Supporting
Information Tables S1–S3). Thus, the data do
not support the hypothesis that the NCS reported for
most of these crystals is anything other than CS.
Included among the many entries that appear to
have been incorrectly identified as having NCS are
219 of the 3003 P1 entries in the database (Table I).
The mean value of the averaged <I/σI> for these
entries is 12.1 ± 6.2, suggesting that the data for
most of them was quite weak. However, the mean
value of Rsymm for the NCS-related reflections of
these entries is 0.071 ± 0.032, which is smaller than
that expected, given the average value of <R(sigma)>
for these data, which is 0.083 [R(sigma) is <σI/I> and
can sometimes be approximated as <σI>/<I>]. Thus
poor data quality does not explain why so many of
these crystals were assigned to the space group P1.
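The comparison made in this paragraph can be sketched in a few lines. The following is only an illustration, assuming the conventional merging-R definition for Rsymm and the <σI>/<I> approximation for R(sigma) mentioned above; the function names and the toy intensities are hypothetical and are not taken from the deposited data or from XPREP.

```python
def r_symm(groups):
    """Merging R-factor over groups of intensities that a candidate
    symmetry operation would make equivalent (assumed definition:
    sum |I - <I>| / sum I)."""
    num = 0.0
    den = 0.0
    for group in groups:
        mean_i = sum(group) / len(group)
        num += sum(abs(i - mean_i) for i in group)
        den += sum(group)
    return num / den


def r_sigma(observations):
    """R(sigma) approximated as <sigma_I> / <I> over (I, sigma_I) pairs."""
    total_sigma = sum(sigma for _, sigma in observations)
    total_i = sum(i for i, _ in observations)
    return total_sigma / total_i


# Hypothetical toy data: three reflection pairs related by an alleged NCS twofold.
groups = [[100.0, 104.0], [250.0, 243.0], [60.0, 63.0]]
observations = [(100.0, 8.0), (104.0, 8.5), (250.0, 15.0),
                (243.0, 15.0), (60.0, 6.0), (63.0, 6.0)]

rs = r_symm(groups)
rg = r_sigma(observations)
# If the "NCS" mates agree at least as well as the measurement noise predicts,
# the data give no reason to treat the operation as anything other than CS.
consistent_with_cs = rs <= rg
```

In this toy case the symmetry mates agree better than the quoted uncertainties would predict, which is the pattern the text describes for the flagged P1 entries.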
A strong hint about what has been going on is
to be found in the fact that the data associated with
a noticeable fraction of the P1 entries in the PDB,
plus all other low-symmetry entries, and even some
P212121 entries (and beyond, see below), have Rsymm
values of zero when the NCS it is claimed they
Table I. Number of Structures with Misassigned Space Groups (a)

Space group                                 Entries (b)   Total     Percentage (%)
P1                                          219           3003      7.3
P2                                          14            152       9.2
P21                                         563 (c)       12,345    4.6
C2                                          187 (d)       7705      2.4
P3                                          10            547       1.8
P31                                         15            521       2.9
P32                                         29            494       5.9
P212121                                     133 (e)       17,965    2.2
P212121 from Tetragonal or Cubic (f)        106           1823      5.8
P212121 from Hexagonal (g)                  7             2034      0.3

(a) An analysis was carried out with entries released prior to mid-May 2014. Whereas this analysis focused on low symmetry space groups, higher symmetry space groups are not immune to the problems identified here.
(b) Entries with Rsymm less than 0.15 in higher symmetry space groups are counted.
(c) Break-down is: 530 orthorhombic, 23 tetragonal, 9 hexagonal, and 1 cubic.
(d) Break-down is: 101 orthorhombic, 33 rhombohedral, 24 hexagonal, 15 tetragonal, and 1 cubic, and a few other choices.
(e) Break-down is: 128 tetragonal and 5 cubic.
(f) Crystals having unit cell lengths within 2.5% of those expected either for tetragonal or cubic were selected for analysis.
(g) Crystals having unit cell lengths within 2.5% of those expected for hexagonal were selected for analysis.
622 PROTEINSCIENCE.ORG Crystallographic or Noncrystallographic Symmetry?
display is treated as CS for purposes of data process-
ing (Fig. 1). The only way this could happen is if
those data sets were processed initially in a higher
symmetry space group, and then deliberately
expanded into P1 without reprocessing, which is
totally unacceptable.
The special noncrystallographic symmetry problem is widespread
The criterion used above to assess whether special
NCS is actually CS, namely that Rsymm is smaller
than the value expected on the basis of the <I/σI>
value of the data, probably underestimates the
extent of the special NCS problem in the database.
It is instructive in this regard to analyze the orien-
tations of NCS axes (Supporting Information Table
S1, Fig. 2). In P1, the c* axis is the shortest unit cell
vector in the reciprocal space, and it corresponds to
the longest real-space unit cell vector, c. When an
axis of NCS is almost perfectly parallel or orthogo-
nal to the c* axis, and the peak in the self-rotation
function (SRF) corresponding to this NCS is suffi-
ciently high relative to the origin peak, it is highly
probable that the axis of the alleged NCS will be
perfectly aligned from one unit cell to the next
throughout the entire crystal lattice. If this is so,
then the NCS will certainly be CS. The NCS axes of
about one-third (34.6%) of all P1 entries in the PDB
have this property.
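The orientation test described here amounts to measuring the angle between the NCS axis and c*. A minimal sketch follows, assuming both directions are already expressed in a common Cartesian (orthonormal) frame; the 1° tolerance and the function names are illustrative assumptions, not values used in the analysis.

```python
import math


def angle_between_deg(axis, reference):
    """Angle in degrees between two direction vectors in a Cartesian frame."""
    dot = sum(a * b for a, b in zip(axis, reference))
    norm_a = math.sqrt(sum(a * a for a in axis))
    norm_b = math.sqrt(sum(b * b for b in reference))
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_angle))


def classify_axis(axis, reference, tol_deg=1.0):
    """Label an NCS axis as parallel, orthogonal, or general relative to a
    reference direction such as c*. tol_deg is an assumed tolerance."""
    angle = angle_between_deg(axis, reference)
    if angle <= tol_deg or angle >= 180.0 - tol_deg:
        return "parallel"
    if abs(angle - 90.0) <= tol_deg:
        return "orthogonal"
    return "general"


c_star = (0.0, 0.0, 1.0)
labels = [classify_axis(v, c_star)
          for v in [(0.0, 0.0, -1.0), (1.0, 0.0, 0.0), (1.0, 0.0, 1.0)]]
```

An axis labelled "parallel" or "orthogonal" is only a candidate for CS; as the text notes, the SRF peak height also has to be high relative to the origin peak.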
It is not easy to decide how high the SRF peak
attributed to special NCS must be to make it reason-
able to conclude that it is, in fact, CS. This is because
CS peaks in SRF maps are often substantially
weaker than origin peaks when real CS is ignored
during data processing, especially when the data are
weak. The extent of that reduction in strength is a
function of the multiplicity of the NCS. For
example, an SRF peak associated with threefold NCS
is likely to be weaker than a twofold NCS peak.
Nevertheless, a crude estimate of the extent of the
special NCS problem can be obtained using the crite-
rion that the height of the SRF peak (relative to the
origin peak) assigned to NCS should be at least one
standard deviation above the mean value within a
given class. For example, the mean SRF peak height
for all P1 entries is 0.45 ± 0.25 (Fig. 2), including all
possible kinds of NCS. Thus, if a special NCS SRF
peak is higher than 0.70, there is a high probability
that it really represents a CS operation. This class of
structures includes 17.5% of all P1 entries.
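The cutoff used above is simply the class mean plus one standard deviation. A small sketch using the P1 summary values quoted in the text; the entry labels and peak heights are hypothetical placeholders.

```python
def srf_cutoff(mean_height, sd_height, n_sd=1.0):
    """Cutoff on relative SRF peak height: class mean plus n_sd standard deviations."""
    return mean_height + n_sd * sd_height


def flag_probable_cs(peaks, cutoff):
    """Return labels of entries whose relative SRF peak height exceeds the cutoff."""
    return [label for label, height in peaks if height > cutoff]


# Mean and standard deviation quoted in the text for all P1 entries with NCS.
cutoff_p1 = srf_cutoff(0.45, 0.25)

# Hypothetical entries: (label, SRF peak height relative to the origin peak).
peaks = [("entry_A", 0.52), ("entry_B", 0.91), ("entry_C", 0.30)]
flagged = flag_probable_cs(peaks, cutoff_p1)
```

Because the criterion depends on the class mean, the cutoff shifts when the class is redefined, which is why different subgroups of P1 entries give different thresholds.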
Figure 1. Distribution of Rsymm (in fractional units, with an arbitrary cutoff of 0.15) for X-ray data in reported low-symmetry space groups
(P1, P21, C2) and P212121 in the database as a function of 1/(<I/σI>). Entries with Rsymm of zero are distributed along the horizontal
axis. The red lines are the merging statistics expected from the given <I/σI> value of each data set. Data sets below
these lines can definitely be scaled in higher symmetry space groups. Since Rsymm does not take multiplicity into account and
is often overestimated, the majority of the data sets should be scaled in higher symmetry space groups. When multiple higher
symmetry space groups are possible, only the best one is used in the plots shown.
When this analysis was carried out on the sub-
group of P1 structures for which cyclic twofold rota-
tional NCS was reported (i.e., excluding all other
types of NCS), the mean SRF peak height proved to
be 0.64 ± 0.18. Thus, 11.3% of the data sets in this
class have an SRF peak height greater than 0.82,
which could be indicative of an invalid P2 to P1 conversion.
When this kind of SRF peak analysis is done
on all the P1 entries (including ones without any
NCS), the mean SRF peak height drops to
0.35 ± 0.25, and 18.6% of them are found to have
special NCS SRF peak heights higher than 0.60.
These estimates of the extent of the special NCS
problems in the PDB are broadly consistent with the
two studies discussed earlier.8,10 It is a significant
problem (Supporting Information Table S1).
Systematic absences are as useful for determin-
ing the true crystal symmetry as Rsymm and SRF
analysis even when data are very weak. Systematic
absences are a property of the entire crystal lattice,
whereas NCS is a local property of individual
asymmetric units. For example, when a structure is
refined in P1 that has special P212121 NCS, it is
highly unlikely that the refined structure that
emerges will predict the systematic absences charac-
teristic of crystals that belong to the space group of
P212121. Although translational NCS may exist, it
can at most generate pseudosystematic absences
along a single axis, which often break down rapidly
with increasing resolution. The probability of
encountering a crystal that has three orthogonal
sets of translational NCS characterized by perfect
systematic absences along all three major axes is
zero for all intents and purposes. Examination of the
data of the P1 space group entries containing so-
called pseudo P212121 NCS reveals that the system-
atic absences called for by P212121 CS are often
unmistakably clear in the data. It follows that most
of the structures solved in P1 that are alleged to
involve special P212121 NCS actually belong to the
space group P212121.
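A check of this kind can be sketched as follows. For a 2₁ screw axis the axial reflections with odd index along that axis are systematically absent, so in P212121 the odd h00, 0k0, and 00l reflections should all be at noise level. The data layout, function name, and toy values below are assumptions made for illustration only.

```python
def axial_absence_summary(reflections):
    """Mean I/sigma for odd-index and even-index axial reflections.

    reflections: iterable of (h, k, l, intensity, sigma).
    Only h00, 0k0, and 00l reflections are considered.
    """
    odd, even = [], []
    for h, k, l, intensity, sigma in reflections:
        nonzero = [index for index in (h, k, l) if index != 0]
        if len(nonzero) != 1:
            continue  # not an axial reflection
        target = odd if nonzero[0] % 2 else even
        target.append(intensity / sigma)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(odd), mean(even)


# Hypothetical P1-processed data from a crystal that is really P212121.
data = [
    (1, 0, 0, 2.0, 2.0), (2, 0, 0, 400.0, 10.0),
    (0, 3, 0, -1.0, 2.0), (0, 4, 0, 300.0, 10.0),
    (0, 0, 5, 1.0, 2.0), (0, 0, 6, 500.0, 10.0),
    (1, 1, 0, 250.0, 10.0),  # general reflection, ignored
]
odd_mean, even_mean = axial_absence_summary(data)
```

A mean I/sigma near zero for the odd axial reflections on all three axes, alongside strong even ones, is the signature the text refers to; where to draw the line on real data is a judgment this sketch does not attempt.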
On the intensity differences that distinguish
crystallographic symmetry from noncrystallographic symmetry
To address whether Rsymm statistics are sensitive
enough to distinguish between special NCS and CS, a
simulation was carried out in which CS was converted
computationally into NCS by tilting a true CS rota-
tion axis in the unit cell of a crystal with respect to
the axes of that unit cell. The homodimeric structure
3R5G,17 which was solved at 1.5 Å resolution in the
space group P2 (a = 51.36 Å, b = 36.31 Å, c = 94.76 Å,
β = 98.76°), was used for this simulation (for this
structure, the SRF peak for the two nondyad-related
subunits in the original asymmetric unit is only 11%
of the origin peak). The contents of the P1 unit cell
were generated from the published structure
using the CS twofold operation at (1/2, 0, 0), and the
Bragg reflection intensities were computed for the
crystal that resulted when this new local dyad was
tilted by 1°, 2°, 3°, 4°, 5°, and 10° toward either the a*
or c* axis of the unit cell.16 This tilting operation, of
course, converted what was previously a CS dyad into
an NCS dyad. The calculated intensities in P1 for
(hkl) reflections were then compared with those of
(-h,k,-l) reflections, which would be identical if the
NCS so generated were actually CS. Figure 3 shows
the results. A 1° difference between the orientation of
the twofold NCS axis and a unit cell axis can result in
an intensity Rsymm of over 18% in the 5-Å resolution
shell, which is far greater than the merging R-factor
statistics Rmeas are likely to be at such low resolution
(Rmeas ~ 5%).6 Thus, even small deviations in axis
orientation from true rotational CS are readily
detected using Rsymm.
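The logic of this simulation can be reproduced on a toy model. The sketch below is not the calculation performed for 3R5G: it assumes a hypothetical orthogonal cell, random unit point scatterers instead of real atoms, and a dyad along b passing through (1/2, 0, 0) that is tilted toward a. It then compares I(hkl) with I(-h,k,-l), as in the text.

```python
import cmath
import math
import random

CELL = (50.0, 36.0, 95.0)  # hypothetical orthogonal cell edges in Å


def rotate_180(point, axis_dir, axis_point):
    """Rotate a Cartesian point by 180 degrees about a line.
    For a unit vector u, the rotation of d is 2 (u.d) u - d."""
    norm = math.sqrt(sum(c * c for c in axis_dir))
    u = [c / norm for c in axis_dir]
    d = [p - a for p, a in zip(point, axis_point)]
    proj = sum(ui * di for ui, di in zip(u, d))
    return tuple(a + 2.0 * proj * ui - di
                 for a, ui, di in zip(axis_point, u, d))


def intensity(hkl, atoms):
    """|F|^2 for unit point scatterers at Cartesian positions in CELL."""
    f = 0j
    for atom in atoms:
        phase = sum(i * x / edge for i, x, edge in zip(hkl, atom, CELL))
        f += cmath.exp(2j * math.pi * phase)
    return abs(f) ** 2


def r_symm_for_tilt(tilt_deg, n_atoms=30, hmax=4, seed=0):
    """Rsymm between I(hkl) and I(-h,k,-l) for a dimer whose dyad is
    tilted away from the b axis by tilt_deg degrees."""
    rng = random.Random(seed)
    monomer = [tuple(rng.uniform(0.0, e) for e in CELL) for _ in range(n_atoms)]
    t = math.radians(tilt_deg)
    axis_dir = (math.sin(t), math.cos(t), 0.0)  # along b, tilted toward a
    axis_point = (CELL[0] / 2.0, 0.0, 0.0)
    atoms = monomer + [rotate_180(p, axis_dir, axis_point) for p in monomer]
    num = den = 0.0
    for h in range(-hmax, hmax + 1):
        for k in range(-hmax, hmax + 1):
            for l in range(-hmax, hmax + 1):
                if (h, k, l) == (0, 0, 0):
                    continue
                i1 = intensity((h, k, l), atoms)
                i2 = intensity((-h, k, -l), atoms)
                num += abs(i1 - i2)
                den += i1 + i2
    return num / den


r_exact = r_symm_for_tilt(0.0)   # true CS: mates agree to rounding error
r_tilted = r_symm_for_tilt(1.0)  # 1 degree tilt: mates no longer agree
```

With no tilt the two intensities are identical by construction, so any nonzero Rsymm in the tilted case comes purely from the loss of exact symmetry. The magnitudes from this toy model are not comparable with the values reported for 3R5G.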
Figure 2. Distribution of the orientations (θ values) of SRF peaks in all P1 entries, with only the top peak per entry. (a) Most peaks have θ values that are
parallel (180°) or orthogonal (90°) to the c* axis; these overlap in the plot, so that a single dot
may represent as many as 100 entries across the range of high SRF peaks. (b) Histogram of the distribution. The black line shows the counts
for θ values parallel (180°) or orthogonal (90°) to the c* axis, and the red line shows the counts for θ values elsewhere.
On the effects of expanding data using crystallographic symmetry
The practice of expanding high symmetry data to
lower symmetry for the purpose of structure refine-
ment has become widespread in the last decade
(Supporting Information Table S1). The justifications
most commonly given for doing so are: (i) that Rfree
values fell as a result, and (ii) that composite-omit,
simulated-annealing procedures produced better
maps in P1 or some other reduced symmetry space
group.18–22 Neither is valid for reasons that will
shortly be explained.
Many of the problems that arise when the true
symmetry of crystals is not respected stem from the
effects this practice has on data statistics. If the
redundancy that results from high symmetry is not
taken into account, the merging statistics obtained
for a data set will not only underestimate its true
quality (in which case Rpim is more appropriate,
which includes both redundancy and multiplicity)
but will also lead to increases in the level of noise in
the electron density maps that ultimately emerge.
For example, the <I/σI> value for the data associated
with the 4HYO structure, which is one of the
many characterized by missed symmetry, increases
by ~2.8-fold when the data originally processed in
P1 are eightfold averaged in the correct space group,
which is P4212.20
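The size of this gain is what one would expect from averaging eight equivalent observations with independent errors, since the uncertainty of the mean then falls as the square root of the multiplicity and √8 ≈ 2.83. A minimal sketch under that idealized assumption (equal, uncorrelated sigmas; the intensity values are hypothetical):

```python
import math


def merge_equivalents(observations):
    """Inverse-variance weighted mean of (I, sigma) observations of one
    reflection, returning the merged intensity and its sigma."""
    weights = [1.0 / (sigma * sigma) for _, sigma in observations]
    weight_sum = sum(weights)
    merged_i = sum(w * i for w, (i, _) in zip(weights, observations)) / weight_sum
    merged_sigma = math.sqrt(1.0 / weight_sum)
    return merged_i, merged_sigma


# One hypothetical observation, repeated eight times as it would be when
# eight symmetry mates in P1 are merged in a space group of order eight.
single = (100.0, 10.0)
merged_i, merged_sigma = merge_equivalents([single] * 8)
gain = (merged_i / merged_sigma) / (single[0] / single[1])
```

Real data contain systematic errors that do not average away, so the observed gain need not match this idealized figure exactly.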
Problems caused by symmetry downshifting
were first addressed over a decade ago,23 and since
then many other examples of this practice have
appeared in the literature. Several have been pro-
vided by a series of structures that have been
reported for crystals of the Thermus thermophilus
RNA polymerase (TTHRNP) complex.24–34 One of
the crystal forms of this molecule has data-scaling
statistics, systematic absences, and molecular-
replacement solutions that are consistent with space
group P65.24–26 Nevertheless, the authors of all the
structures obtained with this crystal form (e.g.
1SMY) expanded data sets that had been processed
in P65 into P32 for the purposes of structure refine-
ment, and attributed the additional symmetry appa-
rent in the data to perfect merohedral twinning with
the twinning operation parallel with the 32 axis.24–26
The rationale given was that, after molecular
replacement, rigid-body refinement in P65 led to a
reduction in Rfree from 50.0% to only 45.3%, whereas
rigid-body refinement done in P32, assuming perfect
merohedral twinning, reduced the Rfree value to
38.7%, and therefore the latter refinement had to be
correct.26 Similar arguments were made recently by
a different group to justify the correctness of molecu-
lar replacement solutions in the presence of NCS
even though such solutions they obtained thereby
could not be further refined.35 In fact, the mistake
made by these authors, like many others, is to give
more credence to differences in Rfree values than
they deserve. Evans and Murshudov have shown
that in the presence of perfect merohedral twinning,
a structure with an Rfree value of 29.1% can be com-
pletely wrong (OSM)!36
There are problems with the 1SMY structure
that emerged from the refinements mentioned above
that should have been enough to call it into ques-
tion.24–26 For one thing, it does not explain the pat-
tern of systematic absences along the screw axis
that are clearly evident in data (Supporting Infor-
mation Fig. S2). In addition, perfect merohedral
twinning of a P32 crystal along its threefold axis
can only result in apparent symmetry of P62 or P64,
but never P65! Furthermore, there are so many diva-
lent cations included in the structure that at neutral
pH, the charge per enzyme molecule would be
+1,856.
Given the relatively modest resolution of this
structure (2.71 Å), the addition of 9031 geometrically
unconstrained water molecules (20% of all the atoms
Figure 3. Intensity change caused by tilting a CS axis to convert it into an NCS axis. (A) Dyad tilted toward the x axis by 1°, 2°, 3°, 4°,
5°, and 10° (black, red, green, blue, dark brown, and brown, respectively). (B) Dyad tilted toward the z axis. The vertical axis is the fractional
intensity change and the horizontal axis is reciprocal resolution (Å⁻¹).
in the model!), together with 1897 unconstrained
Mg2+ ions is likely to have resulted in severe overfitting
of the data. However, it did not produce the
increase in Rfree values it should have because the
selection of reflections for the CV set was done incor-
rectly. The correct space group for this TTHRNP
crystal form is P65 as initially identified by the original
authors (Supporting Information Table S4).26
The data from all closely related crystal structures display
varying degrees of mild twinning (e.g., about 4.4%
in 3DXJ) with the twin axis orthogonal to the P65
axis in such a way that the selection of reflections
for CV has to be made assuming that the higher
symmetry space group is P6522, and then expanded
back to the correct, lower symmetry space group of
P65 to obtain the correct structure (Supporting
Information Table S4). This was not done.24–26
To further illustrate why the methods used for
both CV reflection selection and refinement of the
TTHRNP structures in question (i.e., data-expansion
with added twinning) are invalid, a computational
experiment was done in which the original CV set
was kept, and a molecular replacement solution for
the TTHRNP data was sought using as a starting
model the structure of a smaller, completely unre-
lated polymerase (RB69 DNA polymerase). The
molecular centers of the two models were aligned to
generate a plausible model for the packing of the
RB69 DNA polymerase structure in the TTHRNP
unit cell, and the resulting model was then refined.
During rigid-body refinement, Rfree values fluctuated
by more than 10% from one refinement step to the
next, and they decreased additionally by more than 7%
(to about 49% for the data between 40.0 and 4.0 Å
resolution) during atomic refinement. These reduc-
tions in Rfree had nothing to do with the validity of
the molecular model being used, which was unre-
lated to the crystals in question, but spoke only to
the fact that after refinement, the symmetry of the
calculated amplitudes now approximately repro-
duced the symmetry of the data. Thus, reductions in
Rfree values obtained by refining molecular replace-
ment models should not be interpreted as proof that
a correct solution has been found, especially when
that solution cannot be refined further (or when
Rfree values are above 42.3%, see OSM). Specific
examples of other problems that have been caused
by the incorrect assignment of space groups may be
found elsewhere.8,37
On the effects of treating crystallographic
symmetry as noncrystallographic symmetry:
A test case
The practical consequences of refining a structure of
high symmetry in a lower symmetry space group
were further explored by refining the same crystal
structure under strictly identical conditions using
data for that structure that had been processed in
the correct space group, and the same data proc-
essed in a reduced symmetry space group. 4HYO
was chosen for this computational experiment,20 for
the two following reasons. First, the noise in the
data set was relatively small because the resolution
was quite high, ~1.65 Å. Second, the data clearly
display P4212 symmetry, which means that the difference
in the quality of the data processed between
P1 and P4212 is magnified because of the eightfold
averaging that occurs when they are processed in
P4212 (Table II).
Three parallel structure refinement runs were
carried out (see Materials and Methods). First, the
structure was refined against the 4HYO data after
the P1 data had been eightfold averaged to create a
P4212 data set, and reflections had been selected for
the CV set in a manner that respected the P4212
symmetry of the data (the “P4212” run). Second, it
was also refined against the original 4HYO P1 data
using reflections for the CV set chosen by the original
authors in P1 using the thin slice method of constant
resolution shells (the “P1-P1” run).20 Third, it was
refined against the 4HYO P1 data using the CV
reflections that were used for the P4212 run, followed
by expansion into P1 (the “P1-P4212” run). The same
protein-only 4HYO model was used as the starting
model for all three runs. After symmetry downshift-
ing, the number of atoms in the asymmetric unit
increased about eightfold but the number of observations
increased only about sevenfold because about
12% of the data in P4212 are centric but none are centric in
P1 (Table III). The reduced observation-to-atom ratio
in the symmetry-downshifted model will presumably
lower both free and working R-factor values. As
expected, after the first two passes of refinement (a
total of 40 cycles, see Material and Methods), the
P4212 run had the highest Rfree value of the three by
more than 0.5% (Table III). Thus, if Rfree values had
been used at this stage to decide which refinement
strategy to continue pursuing, one might have
stopped the P4212 run. However, after all three
refinements converged fully, the Rfree value for the
P4212 run was 16.4%, but it was 17.5% for the P1-
P4212 run and 16.8% for the P1-P1 run (Table III).
Table II. Data Reprocessing Statistics of 4HYO in P4212 Space Group

Resolution (Å)   <I/σI>   Rsymm   Rmeas
99–3.55          49.0     0.016   0.018
3.55–2.82        60.0     0.019   0.020
2.82–2.46        60.4     0.023   0.025
2.46–2.24        54.9     0.031   0.033
2.24–2.08        46.6     0.041   0.044
2.08–1.96        36.8     0.053   0.057
1.96–1.86        22.6     0.082   0.089
1.86–1.78        13.6     0.132   0.143
1.78–1.71        10.4     0.163   0.176
1.71–1.65        6.1      0.256   0.277
Overall          46.1     0.025   0.027
Because CS averaging reduced the levels of noise
in the data used for the P4212 run, the quality of the
electron density map that emerged from this run was
superior to those obtained in the other two runs, par-
ticularly in the vicinity of both the N and C-termini of
the protein. For the same reason, the Ramachandran
statistics of the product of the P4212 refinement are
superior to those obtained in the two P1 refinements,
even though the starting model for all three was iden-
tical and had perfect geometry, and even though
the same manual editing was done to correct geometrical
errors introduced by refinement (Table III).
If P4212 is, in fact, the right space group to use for
this structure, why did the P1-P1 run produce such a
“good” result as measured by Rfree? This question can
be answered by comparing the results of the P1-P1 run
and the P1-P4212 run. Even though the converged
Rfree value for the P1-P1 run is significantly better
than that of the P1-P4212 run, their converged work-
ing R-factor distributions are nearly identical (Fig. 4).
This observation suggests that the reason the P1-P1
refinement has a lower Rfree value is that reflections in
its CV set are not fully independent of the reflections
in the working set, which is not the case for the CV set
used for the P1-P4212 run (OSM).
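One way to keep the CV set independent of the working set after symmetry expansion is to assign the free flag per symmetry-unique reflection, so that all mates share it. The sketch below is an illustration of that idea rather than the procedure used for the runs described here. It assumes Laue group 4/mmm (appropriate for P4212 when Friedel mates are treated as equivalent), a hash-based flag, and a 5% free fraction; all names are hypothetical.

```python
import hashlib


def canonical_4mmm(hkl):
    """Representative of a reflection under Laue group 4/mmm (unique axis c):
    signs are dropped and h, k are ordered."""
    h, k, l = (abs(index) for index in hkl)
    return (max(h, k), min(h, k), l)


def is_free(hkl, percent=5):
    """Deterministic free-set flag that depends only on the canonical
    representative, so every symmetry mate receives the same flag."""
    key = ",".join(str(index) for index in canonical_4mmm(hkl)).encode()
    bucket = int(hashlib.md5(key).hexdigest(), 16) % 100
    return bucket < percent


def equivalents_4mmm(hkl):
    """All reflections equivalent to hkl under 4/mmm."""
    h, k, l = hkl
    mates = set()
    for a, b in ((h, k), (k, h)):
        for sign_a in (1, -1):
            for sign_b in (1, -1):
                for sign_l in (1, -1):
                    mates.add((sign_a * a, sign_b * b, sign_l * l))
    return mates


mates = equivalents_4mmm((3, 1, 2))
flags = {is_free(mate) for mate in mates}
```

If flags were instead drawn independently for each P1 reflection, most free reflections would have symmetry mates in the working set, which is the coupling the text identifies in the P1-P1 run.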
Symmetry downshifting can have consequences
for the resulting structures that are biochemically
significant. The structure 4HYO discussed above is
closely related to 4HZ3.20 The channel blocker tetra-
ammonium antimony was included in the crystalliza-
tions that led to 4HZ3, but not in the crystallizations
that resulted in 4HYO. The authors reported that
bound tetra-ammonium antimony could be visualized
in the 4HZ3 structure, but only if the data were processed
in P1.20 However, an isomorphous difference
Fourier map between the observed 4HZ3 and 4HYO
data provides no evidence whatsoever showing that
the channel blocker in question is bound in the 4HZ3
structure (Fig. 4, OSM). Phases for these maps were
calculated from the final complete 4HYO model that
emerged from the P4212 run described above (which
has an Rwork of 9.1% at 1.65-Å resolution, Table III).
Experimental errors in the data for other structures of
this kind that have resolutions lower than that of
4HYO are likely to be much higher than those in
4HYO, and thus the errors introduced into the result-
ing structures by symmetry downshifting are likely to
be greater, and every bit as likely to lead to misleading
conclusions about biochemically significant issues.
Better structures can often be obtained when CS is properly taken into account
Two of the three P1 structures deposited prior to
2005 that are listed in Supporting Information
Table III. Refinement Statistics of 4HYO in P4212 and in P1

                                   “P4212”               “P1-P1”               “P1-P4212”
Resolution range (Å) (a)           45–1.65 (1.69–1.65)   45–1.65 (1.69–1.65)   45–1.65 (1.69–1.65)
Number of observations             10,760                75,462                75,584

Final refinement statistics
Number of atoms                    917                   6,483                 6,597
Observation/atom ratio, RO2A       11.7                  11.6                  11.5
Rwork (%)                          9.1 (10.2)            11.4 (16.7)           11.2 (17.6)
Rfree (%)                          16.4 (16.2)           16.8 (21.1)           17.5 (23.6)
RO2A/Rwork (b)                     128.6                 101.7                 102.3
rmsd bond length (Å)               0.0087                0.0082                0.0091

Ramachandran plot (c)
Preferred                          76 residues (96.2%)   627 (95.6%)           602 (94.2%)
Generously allowed                 1 residue (1.3%)      12 (1.8%)             14 (2.2%)
Disallowed                         2 residues (2.5%)     17 (2.6%)             23 (3.6%)

Initial R-factor statistics of protein-only models with bulk solvent correction (d)
Rwork (%)                          20.98                 20.53                 20.57
Rfree (%)                          22.64                 22.50                 21.45

R-factor statistics after two passes of refinement without manual editing (d)
Rwork (%)                          20.1 (19.1)           19.7 (22.7)           19.6 (23.3)
Rfree (%)                          24.2 (24.1)           23.5 (25.8)           23.3 (25.8)

(a) The highest resolution shell and its statistics are in parentheses.
(b) When RO2A/Rwork is used as a measure of global goodness of fit, the model refined in the correct P4212 space group is about 30% better than those in the reduced P1.
(c) The increased outliers in the Ramachandran plot in P1 relative to P4212 were clearly due to the reduced quality of the resulting electron densities, particularly for residues at both N and C termini, where all outliers are located.
(d) R-factor statistics for the same protein-only model with symmetry expansion (for P1) and with bulk solvent correction at the start of refinement and after the second pass of refinement, each with 20 cycles. The first pass was refined without NCS restraints and the second pass with NCS restraints in P1. The lower R-factor values for the “P1-P4212” and “P1-P1” runs (particularly the “P1-P4212” run) before any refinement, relative to those for the P4212 run, were due to reduced multiplicity weights in R-factor calculations in P1 for the pseudocentric reflections, which typically have higher R-factors. Once refinement was carried out in P1, all pseudocentric reflections became acentric when NCS axes were not constrained.
Table S1 as likely to belong to a higher symmetry
space group (1JXO and 1EGW) were rerefined in
that higher symmetry space group to further assess
the impact that CS-to-NCS conversions might have
on structural quality (all other entries in Supporting
Information Table S1 were deposited after
2005).38,39 Re-examination of the data on file for
1JXO revealed that the crystal on which it is based
belongs to P21; the systematic absences along the
screw axis are unmistakably clear.38 When the P1
data were remerged into P21, the quality of the
data for rerefinement in P21 improved, and with the
Figure 4. Rerefined 4HYO structure and its relationship to the 3LDC and 4HZ3 structures. (a) Packing of the tetrameric K+ channel in
the P4212 unit cell. (b) Isomorphous amplitude differences (fractional units) between the observed 4HYO and 3LDC data (black) and
between the 4HYO and 4HZ3 data (red) as a function of reciprocal resolution (Å⁻¹, see text). (c) Final working (solid lines with open
symbols) and free (dashed lines with filled symbols) R-factor distributions as a function of reciprocal resolution for three parallel
refinement runs: (i) the “P4212” run (black), (ii) the “P1-P1” run (red), and (iii) the “P1-P4212” run (green). (d) Initial R-factors after
the second pass of refinement. (e) The observed Fobs(3LDC)-Fobs(4HYO) difference Fourier map contoured at +3.0 σ. (f) The
observed Fobs(4HZ3)-Fobs(4HYO) difference Fourier map contoured at +4.0 σ. See OSM for additional discussion.
628 PROTEINSCIENCE.ORG Crystallographic or Noncrystallographic Symmetry?
CV set chosen appropriately, all of the 42 residues
(14% of the total residues) missing from the original
1JXO structure, which was refined in P1, plus 39
missing side chains (13%), were clearly evident in
Fobs-Fcalc difference Fourier maps (Fig. 5, Support-
ing Information Tables S5 and S6). The refined
structure that emerged has working and free R-factors
of 17.3 and 23.6%, respectively. The corresponding
values for the original P1 structure are
22.0 and 26.4%, even though the CV set used was
incorrectly chosen.38
Examination of the data for 1EGW shows that
its crystal too belongs to P21.39 However, while the
systematic absences along the twofold screw axis are
obvious upon inspection of the reported intensities,
they are much less clear from I/σI ratios for
unknown reasons (Supporting Information Fig. S3).
The structure obtained by rerefining the 1EGW data
in P21 has working and free R-factors of 14.2 and
18.3%, respectively, as compared to the original P1
structure for which the corresponding values are
20.6 and 22.9% (Supporting Information Fig. S4
and Tables S7 and S8). In
that case, the CV set used for the refinement of the
original structure in P21 was correctly selected, and
thus the same set was kept for its rerefinement.
Eight other structures were rerefined in higher
symmetry space groups using properly selected CV
reflection sets, and with only one exception, the
structures obtained for all of them were statistically
better than those on deposit in the PDB, which
firmly supports the conclusion that better structures
result when data are processed in the correct
highest-possible symmetry space group (Supporting
Information Table S2). As most of these structures
were solved recently using the same or similar ver-
sions of the refinement programs used here to rere-
fine them, the improvements in structure quality
obtained are likely to be ascribable both to the
improvement in data quality that resulted from
merging them in the correct, higher symmetry space
group, and to the use of a properly selected CV set.
This conclusion is strongly supported by the
results of three parallel refinement runs for 4HYO
described above.
It should be noted that comparison of the Rfree
values of these rerefined structures with those origi-
nally reported may be unfair given the fact that the
reflections in the CV sets used were incorrectly
selected in P1 or reduced symmetry space groups
(OSM). In such cases, comparison of corresponding
Rwork values might be more appropriate. For exam-
ple, the newly rerefined P4212 4HYO structure has
a working R-factor of 9.1% (Table III), whereas the
working R-factor for the original P1 4HYO structure
was 16.5% (Supporting Information Table S2). Additionally,
it is noted that the 1JXO data set has an <I/σI>
of 12.9 in the highest resolution shell and a
value of Rfree in that shell that is less than the overall
Rfree value for the structure (22.7 vs. 23.6%; Fig. 5,
Supporting Information Tables S5 and S6). Similarly,
the 1EGW data set also has a very high I/σI
value (9.31) and a very low Rfree value (18.4%) in
the highest resolution shell (Fig. 5, Supporting
Information Tables S7 and S8).
Discussion
Treating CS as NCS for the purpose of refinement is
invalid for two reasons. First, when higher symmetry
Figure 5. Rerefined 1JXO structure and refinement statistics.
(A) The complete structure from N to C in rainbow colors. (B)
Superposition with the original 1JXO structure (gray) shows
locations of four gaps in the original structures and other dif-
ferences in termini. (C) R-factor distributions for both 1EGW
(black) and 1JXO (red) rerefined structures. It should be
noted that R-factors for high-resolution data were actually
smaller than those for low-resolution data due to an excessive truncation
of high-resolution data.
data are expanded into lower symmetry space groups,
the coordinates of “NCS-related” molecules that ought
to be identical are allowed to vary independently dur-
ing refinement, thereby increasing the number of
parameters that can be adjusted but not the number
of independent observations. This will result in a
decrease in Rfree, even when the set of reflections
used for CV is properly selected, which brings us to
the second problem. For most of these structures, the
reflections for CV were selected using invalid proce-
dures (OSM).
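The first point can be illustrated with a small tally of independent observations per refined parameter before and after symmetry expansion. This is a sketch only: the function name, the reflection and atom counts, and the assumption of four parameters per atom (x, y, z, and an isotropic B-factor) are hypothetical and are not taken from any structure discussed here.

```python
def data_to_parameter_ratio(n_unique_obs, n_atoms_asu, n_copies=1, params_per_atom=4):
    """Ratio of independent observations to refined parameters.

    n_unique_obs:    symmetry-unique reflections in the true space group
    n_atoms_asu:     atoms in the true asymmetric unit
    n_copies:        copies refined independently (1 in the true space group;
                     the number of symmetry operators after unconstrained
                     expansion to P1)
    params_per_atom: assumed to be 4 (x, y, z, isotropic B); an assumption
    """
    if min(n_unique_obs, n_atoms_asu, n_copies, params_per_atom) <= 0:
        raise ValueError("all counts must be positive")
    n_params = n_atoms_asu * n_copies * params_per_atom
    return n_unique_obs / n_params


# Hypothetical numbers, for illustration only.
N_UNIQUE = 20000  # unique reflections in P4212
N_ATOMS = 700     # atoms in one asymmetric unit

ratio_p4212 = data_to_parameter_ratio(N_UNIQUE, N_ATOMS, n_copies=1)
# Expansion to P1 lists eight times as many reflections, but the additional
# ones are symmetry copies, so the number of independent observations is
# unchanged while the number of adjustable parameters grows eightfold.
ratio_p1 = data_to_parameter_ratio(N_UNIQUE, N_ATOMS, n_copies=8)

print(f"P4212: {ratio_p4212:.2f} observations per parameter")
print(f"P1:    {ratio_p1:.2f} observations per parameter")
```

The factor of eight corresponds to the eight asymmetric units of a P4212 cell. NCS restraints would reduce the effective freedom of the P1 model; the sketch covers only the unrestrained case.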
In addition, symmetry downshifting can
increase the noise in the electron density maps, and
thereby degrade the quality of the structures
obtained from them even though this (as well as
multicopy refinement,40 see OSM) always leads to a
decrease in Rfree values, as demonstrated in this
study. Symmetry downshifting for structure refine-
ment should be discouraged, or even banned since
there is no valid basis for it. Moreover, the invoca-
tion of special NCS or unidentified CS not only com-
plicates the selection of reflections for CV, it also
reduces the effectiveness of composite-omit map cal-
culations.41,42 For example, when a crystal structure
that belongs to the space group P212121 is refined in
P1, its four identical asymmetric units are treated
as though they are independent. For every omit
block, therefore, there will be three identical unomit-
ted blocks that are related to the omitted block by
the “special P212121 NCS.” Model-bias due to the
three nonomitted blocks will completely invalidate
the resulting omit map. Because this is so,
composite-omit maps always look better in a
symmetry-downshifted space group than in the cor-
rect space group when the maps from these two pro-
cedures are compared side-by-side.22,23
Historical perspective
In crystallography, the term “crystal symmetry” is well
defined. Operationally, a crystal must be assigned to
the highest symmetry its data permit, given the level
of error evident in the experimental data, that is, pro-
vided Rsymm < R(sigma). The term “pseudosymmetry”
is occasionally used to describe the symmetry evident
in data sets that show peaks in their SRFs that are
aligned well enough with crystal axes to be true CS,
but fail the conventional Rsymm<R(sigma) test none-
theless. Not only is it wrong to treat CS as NCS, it can
have serious consequences for the quality of inferred
crystal structures, as is the case for the TTHRNP
structure discussed above.
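The Rsymm < R(sigma) criterion mentioned above can be sketched as follows. The definitions used, Rsymm = Σ|I − <I>| / ΣI and R(sigma) = Σσ(I) / ΣI, with both sums taken over reflections that have at least one candidate symmetry mate, are commonly used forms and are assumed here; the exact sums used by any particular program may differ. The function names and the twofold operator chosen for the example are illustrative only.

```python
from collections import defaultdict


def symmetry_test(data, key):
    """Return (Rsymm, Rsigma) for a candidate symmetry.

    data: dict mapping (h, k, l) -> (intensity, sigma)
    key:  function mapping (h, k, l) to a canonical representative shared by
          all reflections that the candidate symmetry would make equivalent
    """
    groups = defaultdict(list)
    for hkl, (intensity, sigma) in data.items():
        groups[key(hkl)].append((intensity, sigma))

    num = den = sig = 0.0
    for members in groups.values():
        if len(members) < 2:
            continue  # no symmetry mate measured; cannot contribute
        mean_i = sum(i for i, _ in members) / len(members)
        for i, s in members:
            num += abs(i - mean_i)
            den += i
            sig += s
    if den <= 0.0:
        raise ValueError("no usable symmetry-related reflections")
    return num / den, sig / den


def twofold_b_key(hkl):
    """Canonical index under a candidate twofold along b: (h,k,l) ~ (-h,k,-l)."""
    h, k, l = hkl
    return max((h, k, l), (-h, k, -l))


# Hypothetical P1 intensities, for illustration only.
p1_data = {
    (1, 2, 3): (100.0, 5.0), (-1, 2, -3): (102.0, 5.0),
    (2, 1, 0): (50.0, 3.0), (-2, 1, 0): (49.0, 3.0),
}
r_symm, r_sigma = symmetry_test(p1_data, twofold_b_key)
accept_higher_symmetry = r_symm < r_sigma
```

When the candidate mates agree to within their measurement errors, Rsymm falls below R(sigma) and the higher symmetry is accepted under this criterion.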
Before the introduction of Rfree, in a commen-
tary on erroneous structures that shook the confi-
dence of the structural biology community, Branden
and Jones warned that (working) R-factor values of
25% or higher could be indicative of problematic
crystal structures.43 In a test case at 1.8-Å resolution,
it was shown that a structure obtained by
deliberately fitting its polypeptide chain backwards
through the corresponding electron density map
could be refined so that its R-factor was less than
25%.44,45 The introduction of Rfree went a long way
towards curing many of the problems Branden and
Jones identified. Nevertheless, numerous cases are
known where the use of Rfree did not prevent the
publication of seriously defective structures.46–49
Today, one might want to make a statement about
structure quality similar to the one Branden and
Jones made two decades ago, namely that Rfree values
of 25% or higher could be indicative of structures
that have serious problems.
As shown above, the Rfree values calculated for
structures obtained for crystals alleged to contain NCS
may have little value as measures of structure quality,
especially when the CV sets used have been chosen
improperly. Many other factors can also reduce the
objectivity of Rfree values, a subject that will be discussed
elsewhere. Nevertheless, the majority of the CS/NCS
problems identified here could easily be avoided,
or corrected, if necessary. All that is required is for the
community to decide collectively that this problem
needs to be addressed. An important point of this
manuscript is to deliver a call to arms to the entire
structural biology community so that the important,
but entirely correctable problems identified above get
resolved. Our scientific credibility is on the line.
A proposal for reform
Given the extent of the NCS problem discussed
here, it would be beneficial if, along with the many
other statistical tests that are already routinely
done to assess structure quality, Rsymm<R(sigma)
tests were run on all the data associated with struc-
tures for which NCS is claimed at the time of struc-
ture/data deposition.50 Equivalent tests have been
included for decades in many of the programs used
for crystallographic data processing; they enable
those programs to determine the symmetries of crys-
tals automatically. Those who choose to over-ride the
space group assignments provided by these pro-
grams should be warned that this is bad practice,
and asked either to show cause, or to take corrective
steps. If they do not, it would be appropriate that
warning flags be added to the PDB entry.
Materials and Methods
All observed structure factors were retrieved in mid-April
2014 from the PDB,12 and converted to the
scalepack format.51 When experimentally measured
intensities (I) were absent and measured amplitudes
(F) were present, intensities were derived from the amplitudes assuming
that I = F² and σI = 2FσF. Entries were rejected if
the corresponding standard deviation column was
missing. Intensities in each entry were then rescaled
to have maximal accuracy. Rsymm was analyzed
using XPREP from the SHELX suite.14,15 The
orientations of NCS axes were determined using
MolRep from CCP4.16 Merging intensities to appro-
priate higher symmetry space groups was done
using scalepack from HKL2000.51 Rsymm obtained
from prescaled P1 intensity data in this study differs
slightly from that of unscaled integrated intensity
data in a single-step data processing. This difference
does not affect the conclusion of this study. For rere-
finement of structures in the correct space group,
the location of the special CS dyad was determined
using least-squares superposition methods, and
a proper origin shift was applied to convert special
NCS back to CS whenever possible.
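The amplitude-to-intensity conversion stated earlier in this section (I = F², σI = 2FσF) is first-order error propagation, and can be sketched in a few lines. The function name is hypothetical, and the handling of negative inputs is an assumption of this sketch rather than a description of the procedure actually used.

```python
def amplitude_to_intensity(f, sig_f):
    """Convert an amplitude and its standard deviation to an intensity.

    Uses I = F**2 and, by first-order error propagation,
    sigma(I) = |dI/dF| * sigma(F) = 2 * F * sigma(F).
    """
    if f < 0 or sig_f < 0:
        # Assumption of this sketch: such entries are treated as invalid.
        raise ValueError("amplitude and sigma must be non-negative")
    return f * f, 2.0 * f * sig_f


intensity, sig_i = amplitude_to_intensity(10.0, 0.5)
```

Note that the propagated σI vanishes as F approaches zero, so the weakest reflections receive unrealistically small error estimates under this approximation.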
To minimize any differences due to different
refinement procedures, different levels of investiga-
tors’ modeling skills, and any potential bias of this
author, three parallel refinement runs were carried
out using conditions as strictly identical as possible
at the suggestion of one reviewer of this manuscript.
The runs were as follows: (i) the “P4212” run against the
4HYO P4212 data in the correct space group with
properly selected reflections for the CV set,20 (ii) the
“P1-P1” run against the P1 data with the reflections
for the CV set selected in P1 using a thin slicing
method of resolution shells by the original authors,20
and (iii) the “P1-P4212” run against the P1 data
with the reflections for the CV set selected in P4212
and then expanded to P1. All three runs started
with the same protein-only model. This model was
derived from the rerefined 4HYO model with Rwork = 12.6%
and Rfree = 17.5% at 1.65 Å resolution at the time of
original manuscript submission after all ordered
water molecules and 1,6-hexanediol molecules were
removed.20 An initial refinement was carried out in
two passes each with 20 cycles using Refmac5.52 For
the “P4212” run, both passes had no NCS restraints
applied. For the “P1-P1” and “P1-P4212” runs, NCS
restraints were applied in the second pass but not in
the first pass. For the remaining 10 passes of struc-
ture refinement, including two passes of parallel
Ramachandran backbone editing, exactly the same
criteria were applied to locate the ordered water
molecules and 1,6-hexanediol molecules or any
errors in models. For example, all three runs used
the same 5.0 σ, then 4.5 σ, and finally 4.0 σ cut-off
criteria for peaks and holes in the residual Fobs-Fcalc
difference Fourier maps. The three models were
then allowed to diverge only because of different
numbers of peaks or holes in residual maps at a
given selected cut-off. Results of this refinement are
summarized in Table III and Figure 4.
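The CV-set selection used in the “P1-P4212” run, in which free reflections are chosen in the higher symmetry and then expanded to P1, can be sketched as below. This is a minimal illustration, not the procedure actually used: the function names, the random selection scheme, and the 5% free fraction are assumptions. The essential property is that every symmetry mate (including Friedel mates) of a free reflection is also free.

```python
import random

# Rotational operators of point group 422 acting on Miller indices (h, k, l).
OPS_422 = [
    lambda h, k, l: (h, k, l),
    lambda h, k, l: (-k, h, l),
    lambda h, k, l: (-h, -k, l),
    lambda h, k, l: (k, -h, l),
    lambda h, k, l: (h, -k, -l),
    lambda h, k, l: (-h, k, -l),
    lambda h, k, l: (k, h, -l),
    lambda h, k, l: (-k, -h, -l),
]


def equivalents(hkl):
    """All mates of hkl under 422 plus Friedel's law (Laue group 4/mmm)."""
    mates = set()
    for op in OPS_422:
        m = op(*hkl)
        mates.add(m)
        mates.add((-m[0], -m[1], -m[2]))
    return mates


def canonical(hkl):
    """A single representative shared by all symmetry mates of hkl."""
    return max(equivalents(hkl))


def assign_free_flags(p1_reflections, free_fraction=0.05, seed=0):
    """Pick free reflections among the unique set, then expand to P1."""
    rng = random.Random(seed)
    unique = sorted({canonical(hkl) for hkl in p1_reflections})
    n_free = max(1, round(free_fraction * len(unique)))
    free_unique = set(rng.sample(unique, n_free))
    return {hkl: canonical(hkl) in free_unique for hkl in p1_reflections}


# Hypothetical P1 index list, for illustration only.
p1_hkl = [(h, k, l)
          for h in range(-3, 4) for k in range(-3, 4) for l in range(-3, 4)
          if (h, k, l) != (0, 0, 0)]
free_flags = assign_free_flags(p1_hkl, free_fraction=0.1, seed=1)
```

Selecting flags directly on the P1 list instead would place symmetry mates of working reflections in the free set, which is the coupling between the two sets that the “P1-P1” run is meant to expose.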
Acknowledgments
The author is indebted to Dr. Peter Moore for extensively
editing this manuscript to make it appropriate
for the entire structural biology community rather
than only for experts in crystallography. The author
thanks Drs. Z. Dauter and T. A. Jones for critical
comments and useful suggestions to improve this
manuscript, and Drs. W. A. Hendrickson, B. W. Mat-
thews, T. Richmond, A. T. Brunger, W. Yang, S. Burley,
H. Berman, M. G. Rossmann, A. Horwich, M. Hoch-
strasser, and W. Meng for discussion and comments.
References
1. Miao J, Sayre D, Chapman HN (1998) Phase retrieval from the magnitude of the Fourier transforms of non-periodic objects. J Optic Soc Amer 15:1662–1669.
2. Palatinus L (2013) The charge-flipping algorithm in crystallography. Acta Cryst B69:1–16.
3. Braig K, Otwinowski Z, Hegde R, Boisvert DC, Joachimiak A, Horwich AL, Sigler PB (1994) The crystal structure of the bacterial chaperonin GroEL at 2.8 Å. Nature 371:578–586.
4. Kleywegt GJ, Jones TA (1999) Software for handling macromolecular envelopes. Acta Cryst D55:941–944.
5. Pattanayek R, Wang JM, Mori T, Xu Y, Johnson CH, Egli M (2004) Visualizing a circadian clock protein: crystal structure of KaiC and functional insights. Mol Cell 15:375–388.
6. Wang J, Wing R (2014) Diamonds in the rough: a strong case for the inclusion of weak-intensity X-ray diffraction data. Acta Cryst D70:1491–1497.
7. Wang J, Hartling JA, Flanagan JM (1998) Crystal structure determination of Escherichia coli ClpP starting from an EM-derived mask. J Struct Biol 124:151–163.
8. Zwart PH, Grosse-Kunstleve RW, Lebedev AA, Murshudov GN, Adams PD (2008) Surprises and pitfalls arising from (pseudo)symmetry. Acta Cryst D64:99–107.
9. Wang XD, Janin J (1993) Orientation of non-crystallographic symmetry axes in protein crystals. Acta Cryst D49:505–512.
10. Poon BK, Grosse-Kunstleve RW, Zwart PH, Sauter NK (2010) Detection and correction of underassigned rotational symmetry prior to structure deposition. Acta Cryst D66:503–513.
11. Holton JM, Classen S, Frankel KA, Tainer JA (2014) The R-factor gap in macromolecular crystallography: an untapped potential for insights on accurate structures. FEBS J 281:4046–4060.
12. Berman HM, Bhat TN, Bourne PE, Feng Z, Gilliland G, Weissig H, Westbrook J (2000) The Protein Data Bank and the challenge of structural genomics. Nat Struct Biol 7:957–959.
13. Brunger AT (1992) Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature 355:472–475.
14. Sheldrick GM (2000) XPREP Version 6.0. Bruker AXS, Inc., Madison, Wisconsin, USA.
15. Sheldrick GM (2008) A short history of SHELX. Acta Cryst A64:112–122.
16. Winn MD, Ballard CC, Cowtan KD, Dodson EJ, Emsley P, Evans PR, Keegan RM, Krissinel EB, Leslie AG, McCoy A, McNicholas SJ, Murshudov GN, Pannu NS, Potterton EA, Powell HR, Read RJ, Vagin A, Wilson KS (2011) Overview of the CCP4 suite and current developments. Acta Cryst D67:235–242.
18. Guo L, Han A, Bates DL, Cao J, Chen L (2007) Crystalstructure of a conserved N-terminal domain of histone
Wang PROTEIN SCIENCE VOL 24:621—632 631
deacetylase 4 reveals functional insights into glutamine-rich domains. Proc Natl Acad Sci USA 104:4297–4302.
19. Kummel D, Krishnakumar SS, Radoff DT, Li F, Giraudo CG, Pincet F, Rothman JE, Reinisch KM (2011) Complexin cross-links prefusion SNAREs into a zigzag array. Nat Struct Mol Biol 18:927–933.
20. Posson DJ, McCoy JG, Nimigean CM (2013) The voltage-dependent gate in MthK potassium channels is located at the selectivity filter. Nat Struct Mol Biol 20:159–166.
21. Gourdon P, Liu XY, Skjorringe T, Morth JP, Moller LB, Pedersen BP, Nissen P (2011) Crystal structure of a copper-transporting PIB-type ATPase. Nature 475:59–64.
22. Aggarwal A, Nair D, Johnson R, Prakash L, Prakash S (2005) Reply to Wang: Hoogsteen base-pairing in DNA replication? Nature 437:E7; discussion E7.
23. Wang J (2005) DNA polymerases: Hoogsteen base-pairing in DNA replication? Nature 437:E6–7; discussion E7.
24. Mukhopadhyay J, Das K, Ismail S, Koppstein D, Jang M, Hudson B, Sarafianos S, Tuske S, Patel J, Jansen R, Irschik H, Arnold E, Ebright RH (2008) The RNA polymerase "switch region" is a target for inhibitors. Cell 135:295–307.
25. Artsimovitch I, Patlan V, Sekine S, Vassylyeva MN, Hosaka T, Ochi K, Yokoyama S, Vassylyev DG (2004) Structural basis for transcription regulation by alarmone ppGpp. Cell 117:299–310.
26. Vassylyeva MN, Lee J, Sekine SI, Laptenko O, Kuramitsu S, Shibata T, Inoue Y, Borukhov S, Vassylyev DG, Yokoyama S (2002) Purification, crystallization and initial crystallographic analysis of RNA polymerase holoenzyme from Thermus thermophilus. Acta Cryst D58:1497–1500.
27. Vassylyev DG, Sekine S, Laptenko O, Lee J, Vassylyeva MN, Borukhov S, Yokoyama S (2002) Crystal structure of a bacterial RNA polymerase holoenzyme at 2.6 Å resolution. Nature 417:712–719.
28. Artsimovitch I, Vassylyeva MN, Svetlov D, Svetlov V, Perederina A, Igarashi N, Matsugaki N, Wakatsuki S, Tahirov TH, Vassylyev DG (2005) Allosteric modulation of the RNA polymerase catalytic reaction is an essential component of transcription control by rifamycins. Cell 122:351–363.
29. Vassylyev DG, Svetlov V, Vassylyeva MN, Perederina A, Igarashi N, Matsugaki N, Wakatsuki S, Artsimovitch I (2005) Structural basis for transcription inhibition by tagetitoxin. Nat Struct Mol Biol 12:1086–1093.
30. Vassylyev DG, Vassylyeva MN, Zhang J, Palangat M, Artsimovitch I, Landick R (2007) Structural basis for substrate loading in bacterial RNA polymerase. Nature 448:163–168.
31. Vassylyev DG, Vassylyeva MN, Perederina A, Tahirov TH, Artsimovitch I (2007) Structural basis for transcription elongation by bacterial RNA polymerase. Nature 448:157–162.
32. Belogurov GA, Vassylyeva MN, Sevostyanova A, Appleman JR, Xiang AX, Lira R, Webber SE, Klyuyev S, Nudler E, Artsimovitch I, Vassylyev DG (2009) Transcription inactivation through local refolding of the RNA polymerase structure. Nature 457:332–335.
33. Feklistov A, Mekler V, Jiang Q, Westblade LF, Irschik H, Jansen R, Mustaev A, Darst SA, Ebright RH (2008) Rifamycins do not function by allosteric modulation of binding of Mg2+ to the RNA polymerase active center. Proc Natl Acad Sci USA 105:14820–14825.
34. Tuske S, Sarafianos SG, Wang X, Hudson B, Sineva E, Mukhopadhyay J, Birktoft JJ, Leroy O, Ismail S, Clark AD Jr, Dharia C, Napoli A, Laptenko O, Lee J, Borukhov S, Ebright RH, Arnold E (2005) Inhibition of bacterial RNA polymerase by streptolydigin: stabilization of a straight-bridge-helix active-center conformation. Cell 122:541–552.
35. DiMaio F, Terwilliger TC, Read RJ, Wlodawer A, Oberdorfer G, Wagner U, Valkov E, Alon A, Fass D, Axelrod HL, Das D, Vorobiev SM, Iwai H, Pokkuluri PR, Baker D (2011) Improved molecular replacement by density- and energy-guided protein structure optimization. Nature 473:540–543.
36. Evans PR, Murshudov GN (2013) How good are my data and what is the resolution? Acta Cryst D69:1204–1214.
37. Dauter Z, Wlodawer A, Minor W, Jaskolski M, Rupp B (2014) Avoidable errors in deposited macromolecular structures: an impediment to efficient data mining. IUCr J 1:179–193.
38. Tavares GA, Panepucci EH, Brunger AT (2001) Structural characterization of the intramolecular interaction between the SH3 and guanylate kinase domains of PSD-95. Mol Cell 8:1313–1325.
39. Santelli E, Richmond TJ (2000) Crystal structure of MEF2A core bound to DNA at 1.5 Å resolution. J Mol Biol 297:437–449.
40. Pellegrini M, Gronbech-Jensen N, Kelly JA, Pfluegl GMU, Yeates TO (1997) Highly constrained multiple-copy refinement of protein crystal structures. Proteins 29:426–432.
41. Brunger AT, Rice LM (1997) Crystallographic refinement by simulated annealing: methods and applications. Methods Enzymol 277:243–269.
42. Rice LM, Shamoo Y, Brunger AT (1998) Phase improvement by multi-start simulated annealing refinement and structure-factor averaging. J Appl Cryst 31:798–805.
43. Branden CI, Jones TA (1990) Between objectivity and subjectivity. Nature 343:687–689.
44. Kleywegt GJ, Jones TA (1995) Where freedom is given, liberties are taken. Structure 3:535–540.
45. Jones TA, Zou JY, Cowan SW, Kjeldgaard M (1991) Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Cryst A47:110–119.
46. Matthews BW (2007) Five retracted structure reports: inverted or incorrect? Protein Sci 16:1013–1016.
47. Jeffrey PD (2009) Analysis of errors in the structure determination of MsbA. Acta Cryst D65:193–199.
48. Wang J (2001) A corrected quaternary arrangement of the peptidase HslV and ATPase HslU in a cocrystal structure. J Struct Biol 134:15–24.
49. Wang J, Rho SH, Park HH, Eom SH (2005) Correction of X-ray intensities from an HslV-HslU co-crystal containing lattice-translocation defects. Acta Cryst D61:932–941.
50. Read RJ, Adams PD, Arendall WB III, Brunger AT, Emsley P, Joosten RP, Kleywegt GJ, Krissinel EB, Lutteke T, Otwinowski Z, Perrakis A, Richardson JS, Sheffler WH, Smith JL, Tickle IJ, Vriend G, Zwart PH (2011) A new generation of crystallographic validation tools for the protein data bank. Structure 19:1395–1412.
51. Otwinowski Z, Minor W (1997) Processing of X-ray diffraction data collected in oscillation mode. Macromol Cryst A 276:307–326.
52. Murshudov GN, Vagin AA, Dodson EJ (1997) Refinement of macromolecular structures by the maximum-likelihood method. Acta Cryst D53:240–255.