situs:apackage for docking crystal structures into low...

12

Upload: others

Post on 18-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Situs:APackage for Docking Crystal Structures into Low ...wriggers.biomachina.org/publications_web/WRIG99A.pdf · Situs:APackage for Docking Crystal Structures into Low-Resolution
Page 2: Situs:APackage for Docking Crystal Structures into Low ...wriggers.biomachina.org/publications_web/WRIG99A.pdf · Situs:APackage for Docking Crystal Structures into Low-Resolution

Situs: A Package for Docking Crystal Structures into Low-ResolutionMaps from Electron Microscopy

Willy Wriggers,*,†,1 Ronald A. Milligan,† and J. Andrew McCammon*

*Department of Chemistry and Biochemistry and Department of Pharmacology, University of California, San Diego, 9500 Gilman Drive,La Jolla, California 92093-0365; and †Department of Cell Biology, The Scripps Research Institute, 10666 N. Torrey Pines Road,

La Jolla, California 92037

Received October 26, 1998, and in revised form December 16, 1998

Three-dimensional image reconstructions of large-scale protein aggregates are routinely determinedby electron microscopy (EM). We combine low-resolution EM data with high-resolution structuresof proteins determined by x-ray crystallography. Aset of visualization and analysis procedures, termedthe Situs package, has been developed to provide anefficient and robust method for the localization ofprotein subunits in low-resolution data. Topology-representing neural networks are employed to vec-tor-quantize and to correlate features within thestructural data sets. Microtubules decorated withkinesin-related ncd motors are used as model aggre-gates to demonstrate the utility of this package ofroutines. The precision of the docking has allowedfor the extraction of unique conformations of themacromolecules and is limited only by the reliabil-ity of the underlying structural data.

1999 Academic Press

Key Words: topology representing neural net-works; multiresolution; visualization; macromolecu-lar assemblies; kinesin; microtubules.

INTRODUCTION

The role of visualization in structural biology iscurrently expanding, as novel modeling and graphicstools begin to integrate data from a variety ofbiophysical sources (Bajaj, 1998). The developmentof ‘‘multiresolution’’ structural visualization pro-grams is stimulated in part by recent successfulconstructions of electron microscopy (EM)-basedatomic-resolution models of viruses and cytoskeletalmotor-filament complexes (Rayment et al., 1993;Schroder et al., 1993; Milligan, 1996; DeRosier andHarrison, 1997; Chiu et al., 1997). Such synergisticapplications require the development of advanced

modeling techniques to bridge the scales in spaceand complexity of the underlying biological data(Mendelson and Morris, 1997; Hanein et al., 1998;Wriggers et al., 1998; Belnap et al., 1999). In this work,we describe Situs, a set of routines for the quantitativedocking of 3D data at variable resolution. The programsare deployed to support structural biologists in theconstruction of near-atomic resolution models oflarge-scale macromolecular assemblies.

In past applications, researchers employed mainly‘‘visual docking’’ to determine the position of proteinsubunits in low-resolution envelopes. This approachis feasible in cases where the reliability of cryo-EMdata provides a docking precision of four to five timesthe resolution of the experimental data (Baker andJohnson, 1996) or where there are additional con-straints such as subunit localizations on gold label-ing. However, in the case of ambiguous proteinshapes, divergent models of complexes docked by eyehave been reported (Sosa et al., 1997; Kozielski et al.,1998). Therefore, it is desirable to employ a quantita-tive method that would allow the researcher toassess the docking of multiple conformations of thecomplex by means of a score function. In this work, wedescribe a suite of programs that simplify this task ofgenerating optimally docked macromolecules.

Uncertainties of the visual approach are overcomethrough the use of the 3D density distribution thatdescribes the protein structure at low resolution(Frank, 1996). Key to the docking procedure aretopology-representing neural networks (TRNs) thatare used to correlate the high- and low-resolutiondata sets (Wriggers et al., 1998). TRNs are one ofseveral possible algorithmic realizations of vectorquantization, a data compression technique fre-quently used in image and speech processing applica-tions (Gray, 1984; Makhoul et al., 1985). The repre-sentation of data by a small number of so-calledcodebook vectors reduces the combinatorial complex-ity of the structural comparison, as data sets are

1 To whom correspondence should be addressed. Fax: (619)534-7042. E-mail: [email protected].

Journal of Structural Biology 125, 185–195 (1999)Article ID jsbi.1998.4080, available online at http://www.idealibrary.com on

185 1047-8477/99 $30.00Copyright 1999 by Academic Press

All rights of reproduction in any form reserved.

Page 3: Situs:APackage for Docking Crystal Structures into Low ...wriggers.biomachina.org/publications_web/WRIG99A.pdf · Situs:APackage for Docking Crystal Structures into Low-Resolution

compared indirectly by the corresponding vectors.This reduced representation is crucial to the perfor-mance of the method. Situs performs a full exhaus-tive search of possible docking conformations withinseconds, allowing the user to assess interactively thebiological relevance of a list of ranked docking confor-mations.

The described docking makes use of single-molecule density maps that can be obtained bysubtraction of maps from specimens of variablesubunit composition. The more complex problem oflocalizing subunits in a single low-resolution dataset of a biomolecular aggregate is described else-where (Wriggers et al., 1998). We demonstrate theapplication of the Situs package using microtubulesdecorated with the kinesin-related motor protein ncdas a model specimen. The results are compared topublished models of the assembly that were obtainedby visual docking (Sosa et al., 1997). Since theorientation of ncd relative to the microtubule is stillcontroversial (Kozielski et al., 1998), we discussthese preliminary results mainly in terms of thequality of the fit. We hope that the Situs package willbecome a standard tool that removes human biasfrom the docking of multiresolution data and therebycontributes to a consensus regarding the orientationof kinesin-related motors relative to the microtubule.We also discuss possible applications of the procedures

to other specimens and to other areas of microscopywhere 3D biological data sets are compared.

DESIGN OF THE SITUS PACKAGE

Overview. The software currently consists of indi-vidual, stand-alone programs for format conversion,analysis, visualization, manipulation, vector quanti-zation, and docking of 3D data sets. Each program isself-explanatory as the user is asked to enter allrelevant information at the shell prompt duringinteractive use on a UNIX workstation. The modulardesign of the package allows the user to maintainflexibility in its application. The documented sourcecode written in ANSI C can be ported to otherarchitectures. File names are passed as argumentsto the programs of the distribution: ‘‘program [inputfile] [output file].’’

The series of steps and the programs that arerequired to dock an atomic-resolution structure intolow-resolution EM data are shown schematically inFig. 1. Standard EM formats are supported and areconverted into Situs format. Subsequently, the dataare inspected and, if necessary, prepared for thevector quantization using a variety of visualizationand analysis tools. Atomic coordinates in PDB for-mat (Abola et al., 1997) are filtered through UNIXshell scripts (e.g., to purge a crystal structure of

FIG. 1. Schematic diagram of the Situs package (version 1.0). Individual shell scripts and C program components are classified by theirfunctionality. The data flow is indicated by the arrows. The procedures are discussed in the text. Additional documentation is available atURL http://chemcca10.ucsd.edu/�situs.

186 WRIGGERS, MILLIGAN, AND MCCAMMON

Page 4: Situs:APackage for Docking Crystal Structures into Low ...wriggers.biomachina.org/publications_web/WRIG99A.pdf · Situs:APackage for Docking Crystal Structures into Low-Resolution

associated water molecules). After the vector quanti-zation, the high-resolution structure is docked to thelow-resolution density by the corresponding code-book vectors. The resulting docked complex can beinspected using a standard molecular graphicsviewer. The user learns about the progress of eachstep from comments on results and parameters. Allinformation provided by the user at the prompt isvalidated; error messages are returned if undefinedprogram behavior would ensue.

The suite of programs is supported by a header file(situs.h) containing user-defined parameters (e.g.,the floating point type used, the allowed range of thenumber of codebook vectors, the number of iterativesteps of the TRN algorithm) and by auxiliary libraryprograms. The library programs handle input andoutput of atomic coordinates in PDB format (pd-bio.c), input and output of volumetric data (volio.c),input of data at the prompt (stdread.c), and Eigenvec-tor computation for real symmetric 3 � 3 matrices(jacobi3.c). The program pdbio.c was adapted fromthe program DOWSER (Zhang and Hermans, 1996).The main C programs (Fig. 1) supported by thelibrary programs will be described in some detail inthe following paragraphs. The ‘‘.c’’ suffix will bedropped to denote the executables.

Conversion of file formats. Volumetric 3D densitydata are represented in a Situs-specific format thatprovides data compatibility between the individualprograms to keep track of the relative positions ofcreated maps. The convert utility reads file formatsof standard EM application software. These includethe MRC (Henderson et al., 1990) and SPIDER(Frank et al., 1996) formats, as well as similar 4-bytefloating-point binary formats in which a header isfollowed by the sequence of density values. ASCIIfiles that contain a sequence of density values in freeformat are also recognized. In the current version, agrid of cubic voxels is required, and the voxel size, aswell as the number of x, y, and z increments (col-umns, rows, and sections), must be known in ad-vance. The order of values in the sequence of densi-ties is altered, if necessary, such that x incrementschange fastest and z increments change slowest. Ashort header holds the mentioned voxel size andnumbers of x, y, and z increments, as well as the gridorigin (i.e., the 3D coordinates of grid point x � 1,y � 1, z � 1). The header is followed by the sequenceof data values. The converted Situs files are in ASCIIformat, allowing the user to verify the successfulconversion of the data.

Preparation of volumetric data. The package con-tains a variety of routines for the manipulation ofvolumetric data in the Situs format. The subtractutility subtracts a set of density values from a secondset exhibiting identical grid parameters. If neces-

sary, the sign of the density values can be changedwith the invert utility so that the protein corre-sponds to the maximum values of the density distri-bution. A smaller grid defining a region of interestcan be extracted from a large data set with thepindown utility. The interpolate utility allows theuser to alter the voxel size; density values will beconverted to a new grid by linear interpolation. Thefloodfill utility finds a contiguous volume of densityvalues above a threshold density level in the vicinityof a start grid point. Note that a special utility for therescaling of density values is not necessary, becausethe data are automatically normalized by the vectorquantization routines described below.

Volumetric density sets can be inspected in theUNIX shell without resorting to external graphicspackages. Cross sections of the density data in the(x, y), (y, z), or (z, x) planes can be inspected with thevisualization program volslice (Fig. 2). Those voxelswhose corresponding density values exceed a thresholdlevel are represented by text characters. The back-ground is represented by a two-dimensional grid to helpidentify positions in the cross section. This simple render-ing method is sufficient to locate individual voxels in themap.

The histovox utility prints the voxel histogram(Frank et al., 1991; Frank, 1996) of the densityvalues (Fig. 3). The histogram illustrates two gen-eral properties of EM density distributions. First, apronounced peak at low densities (here: �12) is dueto background scattering. The protein density corre-sponds to a second, broader peak at higher densities(here: �20). When integrating the histogram ‘‘fromthe top down,’’ the known molecular volume of aprotein can be used to compute its boundary densityvalue (Frank, 1996). The histovox program alsoallows the user to add a constant value to thedensities to shift the background density peak to theorigin (Wriggers et al., 1998).

Vector quantization. The TRN algorithm (Mar-tinetz and Schulten, 1993; Martinetz and Schulten,1994) offers a flexible way to develop a discreterepresentation of biological data including neighbor-hood relationships (Wriggers et al., 1998). It can beshown that the algorithm minimizes the so-calleddistortion error (Makhoul et al., 1985; Martinetz andSchulten, 1994) of the discrete representation. Froma signal processing point of view this error measuresthe fidelity of the data encoding, i.e., the mean-squares deviation of the discrete set of vectors fromthe corresponding data (Makhoul et al., 1985). If theglobal minimum of the error is found, the vector distribu-tion is fully determined by the input density distribu-tion. In this situation the codebook vectors form a set ofcontrol points that provide information about the shapeand density distribution of a 3D biological object, similar

187THE SITUS DOCKING PACKAGE

Page 5: Situs:APackage for Docking Crystal Structures into Low ...wriggers.biomachina.org/publications_web/WRIG99A.pdf · Situs:APackage for Docking Crystal Structures into Low-Resolution

to the control points that determine the shape of Bezierspline curves in computer graphics (Foley et al., 1990).Therefore, the TRN algorithm presents a good choice forlearning the characteristics of 3D biological data sets atreduced complexity. During the iterative training, thedistribution of codebook vectors responds to randominput stimuli determined by the density distribution,lowering the distortion error.

Like other vector quantizers, the TRN algorithm isnot guaranteed to find the global minimum of thedistortion error. In the Situs implementation, asmall number of calculations (eight by default) arerepeated with different random number seeds. The

averaged codebook vectors are returned in PDB-formatted atom coordinate files, and the rms vari-abilities, representing the precision of the codebookvectors, are written to the occupancy fields of thePDB-style atom entries. Both the statistical uncer-tainty and the existence of multiple minima of thedistortion error contribute to the rms variabilities ofthe found codebook vectors. In general, the averagevariability depends on the shape of the 3D inputdensity distribution and on the number of vectors. Tooptimally represent the input density distribution,the user should select the number of codebook vec-tors that minimizes this average rms variability of

FIG. 2. Inspecting volumetric data with the volslice utility. Shown is the (x, y) cross section of a 534 Å � 534 Å � 366 Å data block of anncd dimer-decorated microtubule (Sosa et al., 1997). Note that pairs of voxels neighboring in the y-direction are represented by a singlecharacter: ‘‘ ˆ ’’ if the upper voxel density exceeds a threshold (cut-off ) level, ‘‘u,’’ if the lower voxel density exceeds the threshold level, ‘‘0’’ ifboth upper and lower voxel densities exceed the threshold level, and ‘‘�,’’ if the densities are below the threshold. This program requires theuse of a fixed-width font in the shell window. The recommended font on a Silicon Graphics workstation is ‘‘fixed(misc) mediumsemicondensed 13 pt.’’

188 WRIGGERS, MILLIGAN, AND MCCAMMON

Page 6: Situs:APackage for Docking Crystal Structures into Low ...wriggers.biomachina.org/publications_web/WRIG99A.pdf · Situs:APackage for Docking Crystal Structures into Low-Resolution

the vectors. Also, the convergence of the algorithm isimproved by a neighborhood ranking of the codebookvectors to avoid getting trapped in local minima(Martinetz et al., 1993). In combination with theaveraging of the vectors from statistically indepen-dent runs, one can achieve near-optimal results thatare sufficiently reliable for the docking problem.

Two vector quantization routines are provided bythe Situs package: qvol, for the quantization ofvolumetric data, and qpdb, for the quantization ofatomic resolution data. Only densities above a user-defined threshold value are considered by qvol toeliminate background noise in the EM data. Themass-weighted atomic coordinate vectors of the proteinsubunit form a discrete input vector distribution forqpdb. Each atom i of mass mi is represented by anumber ni � round(mi/14) of equally weighted inputvectors.As a result, chemical elements 1–3 (including H)are ignored (i.e., ni � 0), chemical elements 4–10 (includ-ing C, N, and O) are represented by one input vector,chemical elements 11–16 (including P and S) are repre-

sented by two input vectors, etc. The vector quantizationis not affected, within the precision of the codebookvectors, by this simple but efficient mass-weightingscheme (W.W., unpublished results).

Combining high- and low-resolution data. Let usassume that each of the data sets is represented by kcodebook vectors xi, corresponding to high-resolu-tion data, and by k codebook vectors yj, correspond-ing to low-resolution data (i, j � 1, . . . , k). If theindex map I : j= i, defining the k pairs of correspond-ing vectors, is known, it is straightforward to dockthe vectors xI( j) to the vectors yj by a least-squares fit(Kabsch, 1976, 1978). The resulting rigid-body trans-formation, applied to the original atomic structure,solves the docking problem. In practical situations,however, the index map I is not known a priori, andall k! � k · (k � 1) . . . 3 · 2 possible permutations(I(1), . . . , I(k)) must be explored. The program qdockcarries out an exhaustive search of the permutationsand returns a list of best least-squares fits, ranked

FIG. 3. Voxel histogram of volumetric data (Frank et al., 1991; Frank, 1996) echoed to the screen by histovox. The shown densitydistribution corresponds to a (150 Å)3 block of data from a microtubule surface (Sosa et al., 1997), as shown in Fig. 6.

189THE SITUS DOCKING PACKAGE

Page 7: Situs:APackage for Docking Crystal Structures into Low ...wriggers.biomachina.org/publications_web/WRIG99A.pdf · Situs:APackage for Docking Crystal Structures into Low-Resolution

by the remaining rms deviation after superpositionof the vectors xI( j) and yj (Fig. 4). The ranking bycodebook vector rms deviation typically produces aclear prediction of the optimum docking configura-tion. Figure 5 presents an example of optimallysuperimposed codebook vectors.

Volkmann and Hanein (1999) use a correlationcoefficient that measures the overlap of the high- andlow-resolution data as the criterion for the docking.The program qdock computes this coefficient for thebest least-squares fits of the codebook vectors, asshown in Fig. 4. A lower rms deviation of theleast-squares fit typically corresponds to a highercorrelation coefficient. However, the coefficients alllie within a very narrow numeric range. Therefore,as mentioned by Hanein et al. (1998), fits based onthe correlation coefficient alone are often ambiguous.

The program volcube in the Situs distribution

produces wireframe meshes or solid surfaces ofisocontours that can be displayed with atomic mod-els of proteins using the free molecular graphicspackage VMD (Humphrey et al., 1996), available atURL http://www.ks.uiuc.edu/Research/vmd. The vol-cube utility takes advantage of VMD’s Tcl scriptingcapabilities (Dalke and Schulten, 1997) that allowthe user to display geometric objects and surfacesconsisting of shaded triangles. In particular, volcubegenerates the vertices and vertex norm vectors oftriangles that can be used for solid rendering ofsurfaces. The VMD-specific part of the volcube codeis concentrated in a single subroutine and can easilybe modified to other standard graphics packageswith surface or mesh rendering capabilities. Theisofurfaces are generated with an improved version(Heiden et al., 1993) of the ‘‘marching cubes’’ algo-rithm (Lorensen and Cline, 1987), adapted from the

FIG. 4. List of best least-squares fits echoed to the screen by the qdock tool. Shown are the rms deviations of the codebook vectorscorresponding to high- and low-resolution data, the correlation coefficients, and the vector permutations. The correlation coefficients Cpqbetween the grid-based EM density px, y,z and the protein densities qx, y,z, calculated by linear interpolation from the x-ray crystal structurein the trial orientations, were determined using the expression

Cpq �

�x,y,z

px, y,z � qx, y,z

(�x,y,zpx, y,z

2 )1/2 (�x,y,zqx,y,z

2 )1/2.

The results shown correspond to the docking of the ncd structure to the density of the attached motor domain, as described underApplication of the Situs Package. The complex model resulting from the best fit is shown in Fig. 5.

190 WRIGGERS, MILLIGAN, AND MCCAMMON

Page 8: Situs:APackage for Docking Crystal Structures into Low ...wriggers.biomachina.org/publications_web/WRIG99A.pdf · Situs:APackage for Docking Crystal Structures into Low-Resolution

program VIEWMOL (Bleiber and Hill, 1993). In apractical application of volcube, the user first ex-tracts from the volumetric data a region of interest(e.g., with pindown or floodfill) and then generatesthe VMD script using volcube. Information re-quested by volcube include the rendering style (wire-frame or solid) and the new voxel size for therendering of the isocontours, allowing the user toadjust the mesh size (the input grid is automaticallyinterpolated).

APPLICATION OF THE SITUS PACKAGE

In practice a user would begin the docking with avector quantization of the high-resolution structurewith the program qpdb. The ncd monomer structure(residues 335–587 and 597–672) used in this workwas obtained from the crystal structure of dimericncd (Sablin et al., 1998). Nine residues of loop L11are not defined in the crystal due to their disorder.

We did not attempt to include these residues in thedocking as was done by Sosa et al. (1997), sincecrystal B-factors and differences in two ncd crystalstructures (Kull et al., 1996; Sablin et al., 1996,1998), as well as the variability of the loop inmolecular dynamics simulations of kinesin (Wrig-gers and Schulten, 1998), suggest that this is aflexible loop that changes conformation and positionwhen the motor binds to the track. We also note thatthe crystal structure used in this work includes partof ncd’s ‘‘neck’’ region not reported in the earlier crystalstructure of this protein (Sablin et al., 1996) and, there-fore, corresponds better to the construct used for EM.

The rms variability of the codebook vectors indi-cates whether a unique minimum of the distortionerror was found. Hence, to obtain more accurateresults, the vector quantization of the high-resolu-tion structure should be repeated for a range of thenumber of vectors. In the case of ncd, a minimum

FIG. 5. Docking of high- and low-resolution data with the qdock tool (stereo view). Shown are the crystal structure of the ncd motordomain (Sablin et al., 1998) and volumetric data (black wireframe) corresponding to the attached motor, as described in the text. A whitebackbone trace represents the protein, gray van-der-Waals spheres correspond to the ADP nucleotide and associated Mg2� ion, whitespheres indicate the position of the high-resolution codebook vectors, and black spheres identify the position of the low-resolution codebookvectors. The isocontour wireframe (isocontour level 20) was generated with volcube and the scene was rendered with the moleculargraphics program VMD (Humphrey et al., 1996) as viewed from the outside of the microtubule.

191THE SITUS DOCKING PACKAGE

Page 9: Situs:APackage for Docking Crystal Structures into Low ...wriggers.biomachina.org/publications_web/WRIG99A.pdf · Situs:APackage for Docking Crystal Structures into Low-Resolution

average rms variability of 1.4 Å was obtained using 7and 11 codebook vectors (Table 1).

The performance of the qpdb and qdock utilitiesdepends on the number k of codebook vectors chosen(Table 1). The running time of the vector quantiza-tion, roughly on the order of a minute on a standardUNIX workstation, increases only linearly with k.However, the explosive growth of the k! combina-tions of the codebook vectors renders an exhaustivesearch impractical for comparisons of more than 12vector pairs. It would be possible, in principle, toimplement more efficient, nonexhaustive searchmethods in the qdock program for a large number ofvectors. We note that the number of vectors neces-sary for the encoding of the underlying EM data islimited both by the experimental resolution and bythe number of recognizable features in the data.Hence, for single molecules and a resolution below10 Å, less than 10 vectors will usually be sufficient.In the present example, 7 vectors were chosen basedon the vector rms variability criterion (Table 1).

As a next step, the user prepares aligned volumet-ric data sets of protein aggregates such that differ-ence densities would correspond to single-moleculedensity distributions. The procedures presented herewere tested on data sets of (1) undecorated microtu-bules and microtubules decorated with (2) mono-meric ncd (residues 335–700) and (3) dimeric ncdproteins (residues 250–700) described by Sosa et al.(1997). The three density maps were converted to theSitus format using the convert utility, and differencemaps (2-1) and (3-2) were computed with the sub-tract utility. The difference density (2-1) can beattributed to the first motor domain (residues 335–

700) when attached to the microtubule. The differ-ence density (3-2) corresponds to the density of thesecond motor domain (res. 335–700) plus the densityof the coiled coil stalk (residues 250–334) of thedimer (Sablin et al., 1998). The position of individualproteins in the density maps can be identified withthe volslice program (cf. Fig. 2). As a final step in thepreparation of the data, individual, contiguous pro-tein densities are extracted with the floodfill utility.We note that for the ncd difference maps a (dimension-less) minimum threshold density of 12 was necessary toseparate the individual protein densities.

Subsequently, the user carries out the vector quan-tization of the low-resolution density maps with theprogram qvol. A threshold value must be chosen toseparate the protein density (above the threshold)from the background noise (below the threshold).This information can be obtained with the histovoxutility (see above) or by consideration of the knownmolecular volume. For the data sets from decoratedmicrotubules, an upper bound of 20 was estimated(cf. Fig. 3). A value of 12 was the lower bound basedon the distribution of individual protein densitiesafter application of floodfill. We chose a thresholdvalue of 20 for the density corresponding to the firstmotor domain and a value of 12 for the densitycorresponding to the second motor domain to adjustfor the lower densities of the latter data set (Sosa etal., 1997). In the quantization of the EM data k � 7codebook vectors were used, which were found torepresent the crystal structure optimally (see above).The codebook vectors corresponding to high- andlow-resolution data were then superimposed usingthe qdock utility (cf. Fig. 5). We note that the resultswere not sensitive to the choice of threshold value inthe quantization. Similar to the docking of thecrystal structure to the density of the attached motordomain (Fig. 4), the docking to the difference densitybetween the dimer- and the monomer-decoratedmicrotubule resulted in an unambiguous predictionusing the rms deviation of the codebook vectors as acriterion (best fit 4.8 Å; second best fit 7.1 Å). Thecorrelation coefficients (Eq. (1)) corresponding to the20 best fits fell into the range of 0.82–0.90. Thecorrelation coefficient criterion alone failed to iden-tify a unique orientation of ncd: Both the 1st and the11th fits (codebook vector rms deviation 9.4 Å)exhibited the maximum coefficient 0.90, but they areconsiderably different.

The resulting model of the two motor domainsdocked to the microtubule is shown in Fig. 6. Theorientation of the first motor domain is close to thatreported by Sosa et al. (1997). For the second,detached motor domain, the structural details in theEM map were insufficient to allow determination of aunique fit by eye (Sosa et al., 1997). It is thereforesurprising that the docking with Situs clearly pre-

TABLE IPerformance of the Programs qpdb and qdock as a

Function of the Number k of Codebook Vectors

k rmsf (Å) Time (qpdb) Time (qdock)

4 2.2 35 s 91 s5 4.0 42 s 91 s6 3.4 53 s �1 s7 1.4 63 s 1 s8 1.7 74 s 4 s9 2.5 85 s 46 s

10 2.7 95 s 8 min11 1.4 110 s 85 min12 1.8 122 s 17 h

Note. The rms fluctuations (rmsf) of the codebook vectors wereobtained with the program qpdb from eight statistically indepen-dent TRN calculations involving 100 000 iterations each (Wrig-gers et al., 1998). The variability values depend on the underlyingstructural data. Shown here are the values computed for thestructure of an ncd motor domain from the dimer crystal structure(Sablin et al., 1998) described in the text. Also shown are the timesrequired to complete the qpdb calculations, as well as times for anexhaustive search of the k! least-squares fits (Kabsch, 1976, 1978)with the program qdock, on a Silicon Graphics Indigo-2 R4400workstation.

192 WRIGGERS, MILLIGAN, AND MCCAMMON

Page 10: Situs:APackage for Docking Crystal Structures into Low ...wriggers.biomachina.org/publications_web/WRIG99A.pdf · Situs:APackage for Docking Crystal Structures into Low-Resolution

dicted a single conformation of the motor domainthat fits the EM data very well (Fig. 6). In this model,the two motor domains form a close-packed dimer:The heads are mutually attached by their neckregions, the nucleotide binding sites are orientedaway from the microtubule, and the putative micro-tubule binding sites (Sosa et al., 1997; Woehlke et al.,1997) face the microtubule. The positions of themotor domains represented by their centers of masssuggest that the two motor domains move about 10 Å

closer compared to the dimer crystal structure, wherethey are separated by the stalk (Sablin et al., 1998).Also, the relative orientation of the two heads issignificantly different from that in the dimer crystalstructure, where the nucleotide-binding sites pointin opposite directions (Sablin et al., 1998).

The familiar, motor domain-like shape of the differ-ence density distribution between the dimer- and themonomer-decorated microtubule motivated the dock-ing of the second, detached motor domain. However,

FIG. 6. Visualization of docked complexes with volcube. Shown is a solid surface representation of a (150 Å)2 patch of the microtubulesurface (Sosa et al., 1997) at isocontour level 20. Also shown are wireframe representations of the attached (black; isocontour level 20) anddetached (gray, isocontour level 12) motor-domain density maps and cartoon representations of the best fitting ncd structures (as describedin the text), with associated ADP nucleotides and Mg2� ions (red). The microtubule ‘‘�’’ end is oriented toward the lower left corner. Thescene was rendered with the molecular graphics program VMD (Humphrey et al., 1996).

193THE SITUS DOCKING PACKAGE

Page 11: Situs:APackage for Docking Crystal Structures into Low ...wriggers.biomachina.org/publications_web/WRIG99A.pdf · Situs:APackage for Docking Crystal Structures into Low-Resolution

the accuracy of the docking is limited by the reliabil-ity of the underlying structural data. Therefore, weenter two caveats against overly optimistic conclu-sions drawn form the unique fit of the second motordomain. First, the densities of the detached motordomain were significantly lower than those of thefirst motor domain (Sosa et al., 1997), suggesting ahigher degree of disorder of the detached head,which may affect the density distribution. Second,the model does not account for the density of thestalk of the ncd dimer (residues 250–334) and onemust assume that this part of the protein is delocal-ized. Further experimental work and a comparisonof data from different laboratories are necessary todetermine with more confidence the orientation ofthe second motor domain.

DISCUSSION AND FUTURE PROSPECTS

We have developed and disseminated a set ofprocedures for the quantitative and reproducibledocking of x-ray crystal structures to EM data. It ispossible for a single operator to complete the dockingwithin about an hour, a process that previouslywould have involved visual predictions by severalindividuals and a subsequent structural evaluationof the resulting fits (Sosa et al., 1997). The package isfreely available and designed to be user-friendly andportable. The modular design and the use of theANSI Cprogramming language should readily accommodateany modifications desired for applications other thanrigid-body docking of crystallographic and EM data.

The major conceptual difference between Situsand other methods of combining 3D biological data(Mendelson and Morris, 1997; Volkmann and Hanein,1999) is the discretization of the configurationalsearch space by vector quantization. This discretiza-tion yields a list of best-scoring superpositions of thecodebook vectors that represent the structural datasets. Our results suggest that this ranking consti-tutes a more stringent criterion for finding theoptimum docking configuration than the correlationcoefficient. The correlation coefficient, which mea-sures the position-dependent overlap of the datasets, depends less sensitively on the relative orienta-tion of the data sets and, therefore, may lead toambiguous results unless it is combined with addi-tional biochemical information (Hanein et al., 1998)to eliminate false positives.

The vector quantization of 3D biological data hasadditional benefits that can be exploited in variouspractical applications: First, the pairs of correspond-ing codebook vectors could be used as constraints forflexible docking of protein structures in steeredmolecular dynamics simulations (Izrailev et al., 1998).Second, the quantization paradigm can be employedin the construction of large-scale assemblages of

protein subunits. As described recently (Wriggers etal., 1998), atomic resolution models of actin fila-ments can be built from low-resolution EM datausing the TRN technique. Third, the quantizationsimplifies the computational task of the search inconfigurational space. Comparisons of less than 10pairs of codebook vectors can be carried out interac-tively within seconds. This speed would make itpossible, in principle, to develop a World Wide Web-based version of the Situs package.

The mutual fitting of low-resolution density mapsrepresents a problem often encountered in micros-copy. A recent study of the formation of the gapjunction intercellular channel indicated that twoapposing connexons, imaged at low resolution, re-quire a 30° rotation for the protrusions of oneconnexon to fit into the valleys of the other, whichwould make for an ionically tight interface necessaryfor a functional cell–cell channel (Perkins et al.,1998). This model was substantiated by a docking oftwo connexons into a reconstruction of the wholechannel. Future applications of the Situs packagemay be helpful to automate pure low-resolutionmodeling efforts such as described by Perkins et al.(1998). The described multiresolution method mayalso be useful in other microscopic applicationswhere 3D biological data sets are compared, such asin the combination of electron- with light-micro-scopic images of cellular organelles.

The current implementation of the Situs packageis available from URL http://chemcca10.ucsd.edu/�situs.

We thank Michael Zeller, Devin Lee Drew, and Gina Sosinskyfor discussions. We also thank Jorg-Rudiger Hill and Jan Her-mans for their permission to include source code of the VIEWMOLand DOWSER programs in the Situs distribution. This work wassupported in part by grants from NIH and the NRAC program ofthe NSF Supercomputer Centers. W.W. acknowledges the LJISInterdisciplinary Training Program and The Burroughs WellcomeFund for fellowship support.

REFERENCES

Abola, E., Sussman, J., Prilusky, J., and Manning, N. (1997)Protein data bank archives of three-dimensional macromolecu-lar structures. in Carter, C., Jr., and Sweet, R. (Eds.), Methodsin Enzymology Vol. 277, Academic Press, San Diego.

Bajaj, C. (1998) Data Visualization Techniques. Wiley, New York.Baker, T., and Johnson, J. (1996) Low resolution meets high:

Towards a resolution continuum from cells to atoms, Curr. Opin.Struct. Biol. 6,585–594.

Belnap, D. M., Kumar, A., Folk, J. T., Smith, T. J., and Baler. T. S.(1999) Low-resolution density maps from atomic models: [Howstepping ‘‘back’’ can be a step ‘‘forward’’], J. Struct. Biol. 125,166–175.

Bleiber, A., and Hill, J.-R. (1993) VIEWMOL, Version 1.4. Arbeits-gruppe Quantenchemie an der Humboldt-Universitat zu Ber-lin. Max-Planck-Gesellschaft.

Chiu, W., Burnett, R., and Garcea, R. (1997) Structural Biology ofViruses, Oxford Univ. Press, New York.

194 WRIGGERS, MILLIGAN, AND MCCAMMON

Page 12: Situs:APackage for Docking Crystal Structures into Low ...wriggers.biomachina.org/publications_web/WRIG99A.pdf · Situs:APackage for Docking Crystal Structures into Low-Resolution

Dalke, A., and Schulten, K. (1997) Using Tcl for molecularvisualization and analysis. In Proceedings of the Pacific Sympo-sium on Biocomputing 97 on Interactive Molecular Visualiza-tion. World Scientific, Singapore.

DeRosier, D., and Harrison, S. (1997) Macromolecular assem-blages: Sizing things up, Curr. Opin. Struct. Biol. 7, 237–238.

Foley, J., van Dam, A., Feiner, S., and Hughes, J. (1990) ComputerGraphics, Principles and Practices, Addison-Wesley, New York.

Frank, J. (1996) Three-Dimensional Electron Microscopy of Mac-romolecular Assemblies, Academic Press, San Diego.

Frank, J., Penczek, P., Grassucci, R., and Srivastava, S. (1991)Three-dimensional reconstruction of the 70S E. coli ribosome inice: The distribution of ribosomal RNA, J. Cell Biol. 115,597–605.

Frank, J., Radermacher, J., Penczek, P., Zhu, J., Li, Y., Ladjadj,M., and Leith, A. (1996) SPIDER and WEB: Processing andvisualization of images in 3D electron microscopy and relatedfields, J. Struct. Biol. 116, 190–199.

Gray, R. (1984) Vector quantization, IEEE ASSP Mag. 1, 4–29.Hanein, D., Volkmann, N., Goldsmith, S., Michon, A.-M., Lehman,

W., Craig, R., DeRosier, D., Almo, S., and Matsudaira, P. (1998)An atomic model of fimbrin binding to F-actin and its implica-tions for filament crosslinking and regulation, Nat. Struct. Biol.5, 787–792.

Heiden, W., Goetze, T., and Brickmann, J. (1993) Fast generationof molecular surfaces from 3D data fields with an enhanced‘‘marching cube’’ algorithm, J. Comput. Chem. 14, 246–250.

Henderson, R., Baldwin, J. M., Ceska, T. A., Zemlin, F., Beck-mann, E., and Downing, K. H. (1990) Model for the structure ofbacteriorhodopsin based on high-resolution electron cryo-microscopy, J. Mol. Biol. 213, 899–929.

Humphrey, W., Dalke, A., and Schulten, K. (1996) VMD—Visualmolecular dynamics, J. Mol. Graphics 14, 33–38.

Izrailev, S., Stepaniants, S., Isralewitz, B., Kosztin, D., Lu, H.,Molnar, F., Wriggers, W., and Schulten, K. (1998) Steeredmolecular dynamics. in Algorithms for Macromolecular Model-ling, Lecture Notes in Computational Science and Engineering,Springer Verlag, Berlin.

Kabsch, W. (1976) A solution for the best rotation to relate two setsof vectors, Acta Crystallogr. Sect. A 32, 922–923.

Kabsch, W. (1978) A discussion of the solution for the best rotationto relate two sets of vectors, Acta Crystallogr. Sect. A 34,827–828.

Kozielski, F., Arnal, I., and Wade, R. (1998) A model of themicrotubule–kinesin complex based on electron cryomicroscopyand X-ray crystallography, Curr. Biol. 8, 191–198.

Kull, F. J., Sablin, E. P., Lau, R., Fletterick, R. J., and Vale, R. D.(1996) Crystal structure of the kinesin motor domain reveals astructural similarity to myosin, Nature 380, 550–555.

Lorensen, W., and Cline, H. (1987) Marching cubes: A highresolution 3D surface construction algorithm, Comput. Graph.21, 163–169.

Makhoul, J., Roucos, S., and Gish, H. (1985) Vector quantizationin speech coding, Proc. IEEE 73, 1551–1588.

Martinetz, T., and Schulten, K. (1994) Topology representingnetworks, Neural Networks 7, 507–522.

Martinetz, T., and Schulten, K. (1993) A neural network for robotcontrol: Cooperation between neural units as a requirement forlearning, Comput. Elect. Eng. 19, 315–332.

Martinetz, T. M., Berkovich, S. G., and Schulten, K. (1993)‘‘Neural gas’’ for vector quantization and its application totime-series prediction, IEEE Trans. Neural Networks 4, 558–569.

Mendelson, R., and Morris, E. (1997) The structure of acto-myosinsubfragment 1 complex: Results of searches using data fromelectron microscopy and X-ray crystallography, Proc. Natl.Acad. Sci. USA 94, 8533–8538.

Milligan, R. (1996) Protein–protein interactions in the rigoractomyosin complex, Proc. Natl. Acad. Sci. USA 93, 21–26.

Perkins, G., Goodenough, D., and Sosinsky, G. (1998) Formation ofthe gap junction intercellular channel requires a 30° rotationfor interdigitating two apposing connexons, J. Mol. Biol. 277,171–177.

Rayment, I., Holden, H., Whittaker, M., Yohn, C., Lorenz, M.,Holmes, K., and Milligan, R. (1993) Structure of the actin–myosin complex and its implications for muscle contraction,Science 261, 58–65.

Sablin, E. P., Kull, F. J., Cooke, R., Vale, R. D., and Fletterick, R. J.(1996) Crystal structure of the motor domain of the kinesin-related motor ncd, Nature 380, 555–559.

Sablin, E., Case, R., Dai, S., Hart, C., Ruby, A., Vale, R., andFletterick, R. (1998) Direction determination in the minus-end-directed kinesin motor ncd, Nature 395, 813–816.

Schroder, R., Manstein, D., Jahn, W., Holden, H., Rayment, I.,Holmes, K., and Spudich, J. (1993) Three-dimensional atomicmodel of F-actin decorated with Dictyostelium myosin S1,Nature 364, 171–174.

Sosa, H., Dias, P., Hoenger, A., Whittaker, M., Wilson-Kubalek, E.,Sablin, E., Fletterick, R., Vale, R., and Milligan, R. (1997) Amodel for the microtubule–ncd motor protein complex obtainedby cryo-electron microscopy and image analysis, Cell 90, 217–224.

Volkmann, N., and Hanein, D. (1999) Quantitative fitting ofatomic models into observed densities derived by electronmicroscopy, J. Struct. Biol. 125, 176–184.

Woehlke, G., Ruby, A., Hart, C., Ly, B., Hom-Boohrer, N., and Vale,R. (1997) Microtubule interaction site of the kinesin motor, Cell90, 207–216.

Wriggers, W., and Schulten, K. (1998) Nucleotide-dependentmovements of the kinesin motor domain predicted by simulatedannealing, Biophys. J. 75,646–661.

Wriggers, W., Milligan, R., Schulten, K., and McCammon, J.(1998) Self-organizing neural networks bridge the biomolecularresolution gap, J. Mol. Biol. 284, 1247–1254.

Zhang, L., and Hermans, J. (1996) Hydrophilicity of cavities inproteins, Proteins: Struct. Function Genet. 24, 433–438.

195THE SITUS DOCKING PACKAGE