
Structure

Meeting Review

The Protein Data Bank at 40: Reflecting on the Past to Prepare for the Future

Helen M. Berman,1,* Gerard J. Kleywegt,2 Haruki Nakamura,3 and John L. Markley4
1RCSB PDB, Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
2Protein Data Bank in Europe, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, United Kingdom
3Protein Data Bank Japan, Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita, Osaka, 565-0871, Japan
4BioMagResBank, University of Wisconsin-Madison, Madison, WI 53706, USA
*Correspondence: [email protected]
DOI 10.1016/j.str.2012.01.010

A symposium celebrating the 40th anniversary of the Protein Data Bank archive (PDB), organized by the Worldwide Protein Data Bank, was held at Cold Spring Harbor Laboratory (CSHL) October 28–30, 2011. PDB40’s distinguished speakers highlighted four decades of innovation in structural biology, from the early era of structural determination to future directions for the field.

Structural biology was born in Cambridge, England in the 1950s, when the race to the DNA double helix reached the finish line (Franklin and Gosling, 1953; Watson and Crick, 1953; Wilkins et al., 1953) and the first three-dimensional (3D) crystal structures of hemoglobin and myoglobin (Kendrew et al., 1958; Perutz et al., 1960) were determined. In the years that followed, a slow but steady trickle of new protein structures brought unexpected insights into the principles and consequences of protein structure and evolution, and contributed to our growing understanding of the intricate relationships between protein sequence, structure, and function.

In 1971, a landmark meeting was held at Cold Spring Harbor Laboratory (CSHL) entitled “Structure and Function of Proteins at the Three Dimensional Level.” At that symposium, the earliest 3D structures were described by the pioneers of structural biology in a way that led David Phillips to announce structural biology’s “coming of age” (Cold Spring Laboratory Press, 1972). The meeting also provided a venue for ongoing conversations about what it would mean for all scientists to have access to the structural data (Berman, 2008). These discussions culminated in the offer by Walter Hamilton to host the Protein Data Bank (PDB) at Brookhaven National Laboratory (BNL) (Protein Data Bank, 1971). The rest, as the saying goes, is history (Figure 1).

A symposium celebrating the 40th anniversary of the PDB, organized by the leadership of the current guardians of the PDB archive, the Worldwide Protein Data Bank (wwPDB; http://wwpdb.org) (Berman et al., 2003), was held at CSHL October 28–30, 2011. Many people involved in the PDB’s past and present were in attendance: staff members of the current wwPDB partners, past PDB BNL heads Tom Koetzle and Joel Sussman, and former staff, including the PDB’s longest-serving data processor, Frances Bernstein, who annotated entries from 1974 until 1998 (Figure 2). “PDB40” was attended by almost 300 scientists from all over the world, several of whom had been at the seminal 1971 meeting. Thanks to generous funding from the National Science Foundation, National Institutes of Health, Wellcome Trust, Japan Society for the Promotion of Science, as well as more than 20 industrial sponsors, 34 students from as far away as India and South Africa were able to participate in the meeting. Almost 100 posters were presented, and in spite of rain, snow (yes, snow in New York in October), and wind pelting on the tent in which they were displayed, they engendered lively discussions until the very last minute of the meeting.

The program boasted 19 distinguished speakers. Several speakers who had been a part of that early era of structural biology described their experiences determining structures before the advent of cryocooling, high-speed computers, intense X-rays, powerful detectors and automated phasing and model-building software. Michael Rossmann, who had worked in Max Perutz’s laboratory, described how he built brass models and measured coordinates with a lead plumb. Richard Henderson, who determined one of the first structures using electron microscopy (EM), recounted how when he was part of the chymotrypsin group, it took so long to measure the coordinates with the plumb (a cord with a lead bob) that the values changed because the string stretched! A research assistant at the time of the 1971 meeting, Jane Richardson showed how colored models were built using Tygon tubing filled with fluorescein. Kurt Wüthrich, who was awarded the Nobel Prize for developing the methods used to determine biomacromolecular structures by NMR spectroscopy, and others paid homage to Dick Dickerson and Irving Geis for their marvelous hand-drawn depictions of DNA and protein structures (Figure 3) (Dickerson, 1997).

Many speakers commented on the importance of data sharing and the pioneering and exemplary role that the PDB has played in ensuring that biological data are kept in the public domain. Tribute was given to structural biologists such as Fred Richards, Dick Dickerson, and Max Perutz, who rallied their colleagues to deposit their data in spite of the considerable initial resistance of some, as pointed out by Hans Deisenhofer. Following the guidelines published in 1989 (International Union of Crystallography, 1989), virtually every journal requires deposition of coordinates and experimental data as a prerequisite to publication.

It was pointed out that the PDB was conceived as a global resource from the outset. In 1971, it was jointly operated at Brookhaven and the Cambridge Crystallographic Data Centre (CCDC) (Protein Data Bank, 1971). Nowadays, structures and experimental data are deposited at and processed by the wwPDB partner sites in America (RCSB PDB; http://rcsb.org), Europe (PDBe; http://pdbe.org), and Japan (PDBj; http://pdbj.org). A formal Memorandum of Understanding among the three partner institutes was signed in 2003 (Berman et al., 2003). In 2006, the BioMagResBank (BMRB) joined the wwPDB partnership to collect and curate experimental NMR data belonging to PDB entries. The wwPDB now operates as a cohesive unit of equal partners to steward the world’s structural data. wwPDB members collaborate on all issues related to the archive, including deposition and annotation policies, requirements and procedures, formats, validation standards, description of chemical components, interactions with journals, distribution, remediation, and weekly updates to the archive. All four partners maintain independent websites and services through which the archived data is presented in a variety of ways to the many user communities.

Figure 1. Timeline of Key PDB Events and Structural Biology Highlights, 1971–2011
(Left) Key events in the evolution of the PDB. (Right) Selected key structures in the field of structural biology (Ban et al., 2000; Carter et al., 2000; Schluenzen et al., 2000; Henderson et al., 1990; Driscoll et al., 1989; Drew et al., 1981; Wang et al., 1979; Kim et al., 1973; Robertus et al., 1974).

Figure 2. PDB Staff Members, Past and Present, at PDB40
Members of the PDB team who attended the conference, including members from RCSB PDB, PDBe, PDBj, BMRB, and BNL. Photo by Constance Brukin used with permission. Additional information about the symposium is available from the wwPDB site (http://wwpdb.org).

Figure 3. Myoglobin Fold, 1987
Illustration, Irving Geis. The protein chain is shown in blue, and the heme is shown as a gold disk. Image from the Irving Geis Collection, Howard Hughes Medical Institute. Rights owned by HHMI. Not to be reproduced without permission. Reprinted here with permission.

Structure 20, March 7, 2012 ©2012 Elsevier Ltd All rights reserved

The impact that technological advances have had on the field of structural biology was emphasized in many ways. Stephen Burley pointed out that these advances have made it possible to determine a typical protein structure in 20 days rather than the 20 man-years it took 40 years ago. Nowadays, synchrotron radiation is used for 75% of the X-ray crystallographic structures deposited in the PDB. In the early 1980s, synchrotrons made it possible for Wayne Hendrickson to develop new phasing methods based on anomalous dispersion, thereby facilitating the direct determination of structures with far fewer crystals than had been possible previously. Soichi Wakatsuki showed dramatic video footage of the earthquake damage at the Photon Factory. Fortunately, the damage has now been repaired and data collection once more goes on. For the future, the community eagerly anticipates the next leap into nano-crystallography using resources such as X-ray free-electron lasers.

In the early days of protein crystallography, protein structures were built as physical models and the atomic coordinates measured by hand. Such models have long since been replaced by well-refined structures thanks to major developments in interactive graphics model building and reciprocal-space refinement methods. Such methods work very well at relatively high resolution, but below 3–4 Å, crystallographers are often still struggling. New methods such as Deformable Elastic Networks, described by Axel Brunger, are designed to get the most out of very low-resolution data. In the field of EM, Richard Henderson demonstrated the problems that microscopists face because the samples are moving. He pointed out that these problems must be solved in order for the field of cryo-EM to make the leap to higher resolution structure determination.

Jane Richardson showed how she developed the now famous ribbon depictions of protein structures and described the evolution of the validation methods developed in her laboratory. Her interest in de novo design led to the recognition that hydrogen atoms are key to the success of these designs. The realization that many structures have serious steric clashes, which can be identified and resolved if explicit hydrogen atoms are taken into account, inspired the use of “hydrogenated” crystal structures in model validation, and was implemented in MolProbity (Davis et al., 2004).

Common themes in many of the presentations on state-of-the-art experimental structural biology included the use of groups of structures to understand complex biological systems, the use of more than one experimental method to study such systems, and an increasing number of successes in tackling membrane-protein structures. Cheryl Arrowsmith showed how she uses high-throughput X-ray and NMR methods to understand epigenetics and brute-force screening to find inhibitors against histone-modifying proteins. Using both X-ray and NMR methods, Susan Taylor’s group has made major inroads into understanding protein kinases and the human kinome. Angela Gronenborn, well known for her work in NMR spectroscopy, described how she has added cryo-EM to her arsenal of methods to study and understand HIV-capsid assembly. Wah Chiu gave an impressive account of the use of cryo-EM to investigate complex molecular machines in isolation and as part of cells and organelles. Wayne Hendrickson focused on X-ray studies to understand plant stomata (guard cells) and presented his work on membrane structures with novel pore channels. Using solution NMR methods, Ad Bax studies influenza fusion protein and its interactions when inserted into membranes. Mei Hong presented the use of sophisticated solid-state NMR techniques to study influenza membrane proteins.

The talks on bioinformatics and computational biology highlighted how the availability of tens of thousands of structures has enabled entirely new ways of doing science. For example, Janet Thornton uses bioinformatics approaches to try to understand enzyme families and function. In her presentation, she emphasized the importance of integrating structural data with those from other sources. Andrej Sali described an integrated approach to structure determination. Using data with relatively limited information content from a variety of biophysical and other methods, he demonstrated how it was possible to arrive at a very plausible model for a nuclear pore complex. David Baker explained how he uses experimental and modeling methods to design and predict structures. In a unique demonstration of community involvement and outreach, computer-game players were able to successfully fold proteins using software developed in Baker’s lab. David Searls, who is well known for his work on describing DNA as a language, is now using protein structure to create a new grammar.

Figure 4. Number of Structures Released Per Year, Organized by Experimental Method
X-ray structure counts are shown in red, EM structure counts in purple, and NMR structure counts in orange.

The combined presentations about the history, state-of-the-art trends, and future directions of structural biology demonstrated that although the field has grown and blossomed almost beyond recognition, the characteristics that were apparent in the 1971 meeting (and which may explain much of the success of the field) are still the same. First and foremost, there is the curiosity and determination to understand biology in molecular terms and, increasingly, on larger scales (either physically, e.g., at the level of organelles and cells, or conceptually, at the level of proteomes and diseases). Whereas in 1971 the only available structure-determination method was X-ray crystallography, nowadays many other biophysical methods, such as NMR spectroscopy, cryo-EM, and small-angle scattering are contributing structural data and insights. Even more exciting is the combined use of multiple methods to study large and complex systems and the fact that molecular modeling approaches are becoming sophisticated enough to integrate the data from the many methods. As Soichi Wakatsuki and others stated, we have entered the era of an “integrated structural biology.” Another theme that has characterized the field since 1971 is the never-ending push to develop and apply new technologies and methods to solve problems and remove bottlenecks. Early on, the focus was on new instruments and computer technologies to determine structures from samples sometimes provided by others. Structural biologists have now developed new ways to crystallize proteins and to produce large amounts of pure samples. Today, high-throughput structure determination is a reality, although structural biologists will always continue to push the boundaries and tackle problems that are ambitious and difficult, but whose solution will bring important rewards and insights (and, occasionally, a Nobel Prize). The field continues to be populated with scientists of uncommon passion and persistence and with a strong sense of collegiality, community, and collaboration.

This 40th anniversary symposium highlighted some of the challenges to the wwPDB as it manages the PDB archive of the future. Although the growth is no longer exponential, it is still formidable, especially in an era of flat or possibly diminished funding. Moreover, deposited structures are increasing in both size and complexity. More and more structures are determined in complex with small-molecule ligands, such as peptide inhibitors and antibiotics. Whereas the majority of structures in the PDB are still determined by X-ray crystallographic methods, the largest relative growth is seen for structures determined using EM methods (Figure 4). Furthermore, as was noted several times at the symposium, hybrid methods in which multiple experimental approaches are combined with modeling are being used increasingly to unravel the structures of large systems not amenable to any single method.

The wwPDB partners have anticipated many of these challenges and trends and have mechanisms in place to meet them. Validation task forces (VTFs) made up of community experts have been set up for each of the major structure-determination methods represented in the PDB. These VTFs advise the wwPDB as to what data must be archived to ensure that the depositions can be validated and what type of validation procedures should be used. Starting in 2012, EM volume maps will be collected, archived and distributed by wwPDB. The internal working format for the PDB archive, called PDBx, can accommodate any structure-determination method and structures of any size with none of the restrictions imposed by the legacy 80-column punched-card format (Bernstein et al., 1977). Following a workshop at the EBI in September 2011, a wwPDB Format Working Group is now implementing and refining this format for most of the commonly used structure-determination packages. The providers of these software packages are collaborating on its implementation with a self-imposed deadline of January 1, 2013. The planned launch of a new common wwPDB Deposition and Annotation System in 2012 will allow users to deposit models and data derived by single or multiple methods. Annotators at all wwPDB data centers will use these powerful new tools to process, annotate and validate the depositions much more efficiently.
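The size restriction of the legacy fixed-column format mentioned above can be illustrated with a minimal sketch. It assumes the published column layout of the legacy ATOM record (serial number in columns 7–11, coordinates in columns 31–54); the function names are hypothetical and are not part of any wwPDB software, and the record written here is simplified to the fields needed for the illustration.

```python
"""Sketch: why fixed-column records restrict the size of a structure."""

SERIAL_WIDTH = 5  # legacy ATOM records give the atom serial number 5 columns


def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Write a simplified legacy-style ATOM record (hypothetical helper)."""
    return (
        f"ATOM  {serial:>5d} {name:<4s} {res_name:>3s} {chain:1s}{res_seq:>4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


def parse_atom(line):
    """Read fields back by column position, as fixed-column readers must."""
    return {
        "serial": int(line[6:11]),      # columns 7-11
        "name": line[12:16].strip(),    # columns 13-16
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain": line[21],              # column 22
        "res_seq": int(line[22:26]),    # columns 23-26
        "x": float(line[30:38]),        # columns 31-38
        "y": float(line[38:46]),        # columns 39-46
        "z": float(line[46:54]),        # columns 47-54
    }


def serial_fits(serial):
    """True if the serial number fits in its fixed-width field."""
    return len(str(serial)) <= SERIAL_WIDTH


if __name__ == "__main__":
    line = format_atom(1, "CA", "MET", "A", 1, 27.340, 24.430, 2.614)
    print(parse_atom(line))
    # A 100,000th atom overflows the field and shifts every later column.
    print(serial_fits(99999), serial_fits(100000))
```

A keyed representation, such as the dictionary returned by `parse_atom`, has no such width limit, which is the general idea behind moving away from column positions.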

ACKNOWLEDGMENTS

PDB40 was organized by the wwPDB directors and Stephen K. Burley (Eli Lilly & Company). We gratefully acknowledge the support of our PDB40 host, Cold Spring Harbor Laboratory. Thanks to Christine Zardecki for her help in coordinating the meeting. The wwPDB is supported by: RCSB PDB (NSF DBI0829586, NIGMS, DOE, NLM, NCI, NINDS, and NIDDK), PDBe (EMBL-EBI, Wellcome Trust, BBSRC, NIGMS, and EU), PDBj (NBDC-JST) and BMRB (NLM).

REFERENCES

Ban, N., Nissen, P., Hansen, J., Moore, P.B., and Steitz, T.A. (2000). Science 289, 905–920.

Berman, H.M. (2008). Acta Crystallogr. A 64, 88–95.

Berman, H.M., Henrick, K., and Nakamura, H. (2003). Nat. Struct. Biol. 10, 980.

Bernstein, F.C., Koetzle, T.F., Williams, G.J.B., Meyer, E.F., Jr., Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T., and Tasumi, M. (1977). J. Mol. Biol. 112, 535–542.

Carter, A.P., Clemons, W.M., Brodersen, D.E., Morgan-Warren, R.J., Wimberly, B.T., and Ramakrishnan, V. (2000). Nature 407, 340–348.

Cold Spring Laboratory Press. (1972). Cold Spring Harbor Symposia on Quantitative Biology, Vol 36.

Davis, I.W., Murray, L.W., Richardson, J.S., and Richardson, D.C. (2004). Nucleic Acids Res. 32 (Web Server issue), W615–W619.

Dickerson, R.E. (1997). Protein Sci. 6, 2483–2484.

Drew, H.R., Wing, R.M., Takano, T., Broka, C., Tanaka, S., Itakura, K., and Dickerson, R.E. (1981). Proc. Natl. Acad. Sci. USA 78, 2179–2183.

Driscoll, P.C., Gronenborn, A.M., Beress, L., and Clore, G.M. (1989). Biochemistry 28, 2188–2198.

Franklin, R.E., and Gosling, R.G. (1953). Nature 171, 740–741.

Henderson, R., Baldwin, J.M., Ceska, T.A., Zemlin, F., Beckmann, E., and Downing, K.H. (1990). J. Mol. Biol. 213, 899–929.

International Union of Crystallography. (1989). Acta Crystallogr. A 45, 658.

Kendrew, J.C., Bodo, G., Dintzis, H.M., Parrish, R.G., Wyckoff, H., and Phillips, D.C. (1958). Nature 181, 662–666.

Kim, S.H., Quigley, G.J., Suddath, F.L., McPherson, A., Sneden, D., Kim, J.J., Weinzierl, J., and Rich, A. (1973). Science 179, 285–288.

Perutz, M.F., Rossmann, M.G., Cullis, A.F., Muirhead, H., Will, G., and North, A.C.T. (1960). Nature 185, 416–422.

Protein Data Bank. (1971). Nat. New Biol. 233, 223.

Robertus, J.D., Ladner, J.E., Finch, J.T., Rhodes, D., Brown, R.S., Clark, B.F.C., and Klug, A. (1974). Nature 250, 546–551.

Schluenzen, F., Tocilj, A., Zarivach, R., Harms, J., Gluehmann, M., Janell, D., Bashan, A., Bartels, H., Agmon, I., Franceschi, F., and Yonath, A. (2000). Cell 102, 615–623.

Wang, A.H.-J., Quigley, G.J., Kolpak, F.J., Crawford, J.L., van Boom, J.H., van der Marel, G.A., and Rich, A. (1979). Nature 282, 680–686.

Watson, J.D., and Crick, F.H.C. (1953). Nature 171, 737–738.

Wilkins, M.H.F., Stokes, A.R., and Wilson, H.R. (1953). Nature 171, 738–740.