design of a novel globular protein fold with atomic-level accuracy · 2017. 3. 14. · design of a...

Design of a Novel Globular Protein Fold with Atomic-Level Accuracy

Science (2003). 302: 1364-1368.

Brian Kuhlman, Gautam Dantas, Gregory C. Ireton, Gabriele Varani, Barry L. Stoddard, David Baker


Overview Background Initialization Iteration Results Conclusion

Goal:

✦ Given a novel 3D topology, develop a corresponding protein sequence.

Choose a topology that does not exist in known protein structures.

Find the amino-acid sequence that leads to this topology.

Question:

✦ Can we design 3D protein structures that have never been observed?

✦ Are the 3D structures that have not been observed in nature unattainable, or just unsampled?

Approach:

✦ Iterate between sequence optimization and structure prediction.

Develop sequence-structure pairs with low free energy.

Low free energy corresponds to high structural stability.



Structure Redesign: Reina, et al. (2002).

✦ Redesigned the binding site of a known fold.

Two PDZ-ligand complexes, with the proposed mutations (green).

✦ Starting point: PDZ domain of PSD-95 protein.

Binds to a specific C-terminal motif of target proteins.

✦ Computational process (Perla).

Manually identified residues that interact with ligand.

New amino acids chosen for each position based on:

• geometry

• conformation

Further restricted the set by pairwise comparisons.

Ranked the resulting structures by the free energy of the complex.

Repeated the process with the resulting set of structures.

✦ Specific approach; difficult to generalize.

J. Reina, et al. Nature Struct. Biol., 9(8): 621-627, 2002.



Sequence prediction: Dahiyat & Mayo (1997).

✦ Computed the optimal sequence for a known fold.

Zif268 zinc-finger domain II (top) and FSD-1 NMR structure (bottom).

Excerpt from Dahiyat & Mayo (Science 278, 1997, p. 84): BLAST analysis of the FSD-1 sequence; Fig. 2, comparison of Zif268 and computed FSD-1 structures; Table 1, NMR structure statistics; Fig. 3, circular dichroism measurements of FSD-1.

✦ Starting point: Zinc-finger domain of Zif268.

Used a small domain from the PDB as a template.

✦ Computational process.

Started with all possible amino acids at all positions.

Restricted to allowed rotamers.

Limited sequence space by dead-end elimination.

Optimized sequence using various interactions:

• backbone-backbone

• backbone-side chain

• side chain-side chain

✦ Structure does not change during computation.

B. I. Dahiyat and S. L. Mayo.

Science, 278(5335): 82–87, 1997.
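Dead-end elimination, the pruning step listed above, can be illustrated with a minimal sketch. It assumes the energy splits into single-rotamer terms (rotamer against the fixed backbone) and pairwise terms, and it applies only the original Desmet criterion; Dahiyat and Mayo's actual implementation and energy function are not reproduced. The toy energies and the names `dee_prune` and `gmec` are hypothetical. Because each rotamer carries an amino-acid identity, pruning rotamers also prunes sequences.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# single[i][r]: energy of rotamer r at position i against the fixed backbone.
# pair[(i, j)][r][s]: energy between rotamer r at i and rotamer s at j, stored for i < j.
Single = List[List[float]]
Pair = Dict[Tuple[int, int], List[List[float]]]


def pair_energy(pair: Pair, i: int, r: int, j: int, s: int) -> float:
    """Look up E(i_r, j_s) whichever way round the pair is stored."""
    return pair[(i, j)][r][s] if i < j else pair[(j, i)][s][r]


def dee_prune(single: Single, pair: Pair) -> List[List[int]]:
    """Repeat the Desmet dead-end elimination test until no rotamer is removed.

    Rotamer r at i is dead if another rotamer t at i satisfies
    E(i_r) + sum_j min_s E(i_r, j_s) > E(i_t) + sum_j max_s E(i_t, j_s).
    """
    n = len(single)
    alive = [list(range(len(single[i]))) for i in range(n)]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            others = [j for j in range(n) if j != i]
            # Best- and worst-case energies of each rotamer at i, given surviving partners.
            best = {r: single[i][r] + sum(min(pair_energy(pair, i, r, j, s) for s in alive[j])
                                          for j in others) for r in alive[i]}
            worst = {t: single[i][t] + sum(max(pair_energy(pair, i, t, j, s) for s in alive[j])
                                           for j in others) for t in alive[i]}
            keep = [r for r in alive[i]
                    if not any(best[r] > worst[t] for t in alive[i] if t != r)]
            if len(keep) < len(alive[i]):
                alive[i], changed = keep, True
    return alive


def total_energy(single: Single, pair: Pair, rots: Sequence[int]) -> float:
    e = sum(single[i][r] for i, r in enumerate(rots))
    for i in range(len(rots)):
        for j in range(i + 1, len(rots)):
            e += pair_energy(pair, i, rots[i], j, rots[j])
    return e


def gmec(single: Single, pair: Pair, alive: List[List[int]]) -> Tuple[Tuple[int, ...], float]:
    """Brute-force the global minimum energy conformation over the surviving rotamers."""
    best = min(product(*alive), key=lambda rots: total_energy(single, pair, rots))
    return best, total_energy(single, pair, best)


# Toy problem: position 0 has three rotamers, position 1 has two.
single = [[0.0, 5.0, 1.0], [0.0, 3.0]]
pair = {(0, 1): [[0.0, 1.0], [0.5, 0.5], [0.2, 0.0]]}
alive = dee_prune(single, pair)
print(alive, gmec(single, pair, alive))  # [[0], [0]] ((0, 0), 0.0)
```

DEE never removes the rotamer that belongs to the global minimum, so the brute-force search over the survivors returns the same answer as a search over everything, just over a much smaller space.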



Designing a protein from scratch: this study.

✦ Why might a target structure (or fold) be unknown?

Maybe it just hasn't been sampled by scientists yet.

Maybe it hasn't been tried during protein evolution.

Maybe it is not designable at all.

✦ How do we design with the fewest initial constraints?

Previous studies have started from a small set of sequences or structures.

Not necessarily a globally optimal fold or sequence.

Not widely applicable, but specific (e.g., for a certain function).

✦ Obstacles to pure de novo design:

Must vary both sequence and structure space during design process.

Potentially large computational cost.



General algorithm design.

Choose topology → Generate starting models → Design sequences → Rank → Select lowest-energy sequence → Optimize model

Repeat: Design sequence → Rank → Select → Optimize model
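The design loop in the flowchart can be written as a short higher-order function. This is a sketch under stated assumptions: `design`, `optimize`, and `energy` are hypothetical stand-ins for RosettaDesign, the backbone optimization, and the Rosetta energy function, and the toy float "models" and integer "sequences" exist only to make the example runnable. The default of 15 rounds follows the per-simulation count given later in the deck.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Model = TypeVar("Model")  # a backbone model (hypothetical representation)
Seq = TypeVar("Seq")      # a designed sequence (hypothetical representation)


def alternate_design(
    start_models: Sequence[Model],
    design: Callable[[Model], List[Seq]],        # candidate sequences for a fixed backbone
    optimize: Callable[[Seq, Model], Model],     # relax the backbone for a fixed sequence
    energy: Callable[[Seq, Model], float],
    rounds: int = 15,
) -> Tuple[Seq, Model, float]:
    # Design on every starting model, rank all (sequence, model) pairs, keep the lowest.
    pairs = [(s, m) for m in start_models for s in design(m)]
    seq, model = min(pairs, key=lambda p: energy(p[0], p[1]))
    for _ in range(rounds):
        model = optimize(seq, model)                               # optimize model
        seq = min(design(model), key=lambda s: energy(s, model))   # design, rank, select
    return seq, model, energy(seq, model)


# Toy stand-ins: a "model" is a float, a "sequence" an int, and the energy couples them.
def toy_design(m: float) -> List[int]:
    return [round(m) - 1, round(m), round(m) + 1]


def toy_energy(s: int, m: float) -> float:
    return (s - m) ** 2 + 0.1 * m * m


def toy_optimize(s: int, m: float) -> float:
    return s / 1.1  # exact minimizer of toy_energy for a fixed s


print(alternate_design([3.7, -2.2, 5.0], toy_design, toy_optimize, toy_energy))
```

Separating the loop from the two inner steps keeps the alternation itself explicit: each round holds the sequence fixed while the backbone moves, then holds the backbone fixed while the sequence is redesigned.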


Choose a target topology.

Novel α/β topology chosen as the design template. Initial constraints on the structure were defined by hydrogen-bond interactions (purple arrows).


Generate starting models from topology.

✦ Chose fragments of existing proteins that fit initial constraints.

Used fragments 3 and 9 amino acids in length.

Taken from structures in the PDB.

Chosen to fit the secondary structures in the topology diagram.

✦ Assembled fragments into set of backbone models.

Used Rosetta software to build 172 starting models.

Models are backbone only; side chain packing is ignored at this step.

Models are fairly close structurally (RMSDs of 2-3 Å).
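Fragment assembly can be sketched in torsion space. Assumptions: each residue's backbone is reduced to a (phi, psi) pair, fragment lengths of 9 and 3 residues are used, fragments are picked only by secondary-structure agreement with the target, and the "database" is a hypothetical list of idealized entries rather than real PDB chains. Rosetta's actual assembly also scores each insertion and accepts or rejects it; that step is omitted, so the sketch only shows how windows are matched and copied in.

```python
import random
from typing import List, Sequence, Tuple

Torsion = Tuple[float, float]  # (phi, psi) in degrees
Fragment = List[Torsion]

# Idealized torsions per secondary-structure type: H helix, E strand, L loop.
IDEAL = {"H": (-57.0, -47.0), "E": (-120.0, 130.0), "L": (-70.0, 150.0)}


def toy_database(ss_strings: Sequence[str]) -> List[Tuple[str, Fragment]]:
    """Hypothetical stand-in for PDB chains: a secondary-structure string plus torsions."""
    return [(ss, [IDEAL[c] for c in ss]) for ss in ss_strings]


def pick_fragments(target_ss: str, database, length: int, top_k: int) -> List[List[Fragment]]:
    """For each window of the target, keep the top_k database windows by ss agreement."""
    library = []
    for start in range(len(target_ss) - length + 1):
        want = target_ss[start:start + length]
        scored = []
        for ss, torsions in database:
            for s in range(len(ss) - length + 1):
                score = sum(a == b for a, b in zip(want, ss[s:s + length]))
                scored.append((score, torsions[s:s + length]))
        scored.sort(key=lambda item: -item[0])
        library.append([frag for _, frag in scored[:top_k]])
    return library


def assemble(target_ss: str, database, rng: random.Random,
             lengths: Sequence[int] = (9, 3), top_k: int = 3) -> List[Torsion]:
    """Backbone-only model: start from loop torsions, then copy in fragments window by window."""
    model: List[Torsion] = [IDEAL["L"]] * len(target_ss)
    for length in lengths:  # long fragments first, then short ones for local detail
        library = pick_fragments(target_ss, database, length, top_k)
        for start, candidates in enumerate(library):
            if candidates:
                model[start:start + length] = rng.choice(candidates)
    return model


target = "LEEEEELLEEEEELLHHHHHHHHHHL"  # hypothetical topology string
db = toy_database([target, "HHHHHHHHHH", "LEEEELLEEEEL"])
model = assemble(target, db, random.Random(0), top_k=1)
print(model[:3])
```

Like the starting models on this slide, the result is backbone only: no side chains are placed until the sequence-design step.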



Step 3: Generate a sequence for each model.

✦ Used RosettaDesign to generate sequence.

RosettaDesign uses a Monte Carlo search method and energy function.

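Monte Carlo methods are stochastic methods for solving computational problems with many variables.

Instead of enumerating every possibility, they draw many random samples; in a Monte Carlo search, each random change is accepted or rejected according to the resulting change in energy.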

✦ RosettaDesign energy function based on:

Lennard-Jones potential.

Hydrogen bonding.

Implicit solvation.

✦ Further restricted certain positions.

Only polar residues allowed at surface positions of the β sheet.

Cysteines disallowed.

Lennard-Jones potential for Ar.

http://en.wikipedia.org/wiki/Lennard_Jones_potential
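The Monte Carlo sequence search and the residue restrictions on this slide can be sketched as simulated annealing over amino-acid identities. Assumptions, all hypothetical: rotamers are collapsed into amino-acid identities (RosettaDesign itself searches over rotamers), the polar set `POLAR` and the cooling schedule are choices made for this sketch, and `toy_energy` is a made-up placeholder for the Lennard-Jones, hydrogen-bonding, and solvation terms.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
POLAR = "DEHKNQRST"  # assumed polar set for this sketch


def allowed_residues(surface_strand: Sequence[bool]) -> List[str]:
    """Per-position alphabet: no cysteine anywhere, polar only at surface beta-sheet sites."""
    no_cys = AMINO_ACIDS.replace("C", "")
    return [POLAR if exposed else no_cys for exposed in surface_strand]


def mc_design(energy: Callable[[Sequence[str]], float], allowed: List[str], steps: int,
              rng: random.Random, t_start: float = 3.0, t_end: float = 0.1) -> Tuple[str, float]:
    """Metropolis Monte Carlo with geometric cooling; returns the lowest-energy sequence seen."""
    seq = [rng.choice(a) for a in allowed]
    e = energy(seq)
    best, best_e = list(seq), e
    for k in range(steps):
        t = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        i = rng.randrange(len(seq))
        old, seq[i] = seq[i], rng.choice(allowed[i])  # propose a single-site mutation
        new_e = energy(seq)
        if new_e <= e or rng.random() < math.exp((e - new_e) / t):
            e = new_e  # accept
            if e < best_e:
                best, best_e = list(seq), e
        else:
            seq[i] = old  # reject: undo the mutation
    return "".join(best), best_e


HYDROPHOBIC = "AFILMVW"
BURIED = [False, True, False, True, True, False, True, False]
SURFACE_STRAND = [True, False, False, False, False, True, False, False]


def toy_energy(seq: Sequence[str]) -> float:
    """Made-up score: buried sites reward hydrophobics, exposed sites reward polar residues."""
    e = 0.0
    for aa, buried in zip(seq, BURIED):
        if buried:
            e += -1.0 if aa in HYDROPHOBIC else 1.0
        else:
            e += -1.0 if aa in POLAR else 0.5
    return e


designed, score = mc_design(toy_energy, allowed_residues(SURFACE_STRAND), 2000, random.Random(1))
print(designed, score)
```

The restrictions are enforced by shrinking each position's alphabet before the search rather than by penalizing forbidden residues, so no sampled sequence can ever violate them.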

Rank and select.

✦ Sequences ranked according to free energy.

Initial (starting) sequences had higher free energies than natural proteins.

Explained by the absence of side chains when the starting backbones were built.

The gap closed during backbone optimization.

✦ The sequence with the lowest free energy is carried into model optimization.

First round uses the starting model that generated this sequence.

Subsequent rounds use the optimized model (input structure).



Optimize backbone model.

✦ Goal: Identify lowest free energy model for a fixed sequence.

✦ Process: Perturb, relax, repeat.

1. Perturb the structure.

• random change to 1-5 torsion angles; or

• replace 1-3 random torsion angles with values taken from PDB fragments.

2. Optimize any high-energy side chains.

• cycle through each position whose energy increased after the perturbation.

• replace its side chain with the lowest-energy rotamer.

3. Optimize region around the site of perturbation.

• minimize energy in a 10-residue window around the perturbation site.
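The perturb-and-relax loop can be sketched on a plain vector of torsion angles. Assumptions: the side-chain step (step 2) is omitted because the toy model has no side chains, the "fragments" are hypothetical lists of torsion values, the local minimizer is a simple greedy coordinate descent over a window of about ten torsions, and each relaxed move is accepted or rejected with a Metropolis test, as in standard Monte Carlo minimization. None of these names are Rosetta APIs.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Energy = Callable[[Sequence[float]], float]


def perturb(x: Sequence[float], rng: random.Random,
            fragments: List[List[float]]) -> Tuple[List[float], int]:
    """Step 1: nudge 1-5 random torsions, or copy 1-3 torsions from a stored fragment."""
    x = list(x)
    if not fragments or rng.random() < 0.5:
        sites = rng.sample(range(len(x)), min(len(x), rng.randint(1, 5)))
        for i in sites:
            x[i] += rng.gauss(0.0, 10.0)
        return x, sites[0]
    frag = rng.choice(fragments)
    n = min(len(frag), len(x), rng.randint(1, 3))
    start = rng.randrange(len(x) - n + 1)
    x[start:start + n] = frag[:n]
    return x, start


def local_minimize(x: List[float], energy: Energy, center: int, half_window: int = 5,
                   step: float = 1.0, sweeps: int = 20) -> Tuple[List[float], float]:
    """Step 3: greedy coordinate descent on the ~10 torsions around the perturbation."""
    x = list(x)
    lo, hi = max(0, center - half_window), min(len(x), center + half_window)
    e = energy(x)
    for _ in range(sweeps):
        improved = False
        for i in range(lo, hi):
            for delta in (step, -step):
                x[i] += delta
                trial = energy(x)
                if trial < e:
                    e, improved = trial, True
                    break
                x[i] -= delta  # no gain: undo
        if not improved:
            break
    return x, e


def relax(x: Sequence[float], energy: Energy, rng: random.Random, cycles: int = 200,
          kT: float = 1.0, fragments: Sequence[Sequence[float]] = ()) -> Tuple[List[float], float]:
    """Perturb, minimize locally, accept or reject (Metropolis), repeat; keep the best model."""
    x, e = list(x), energy(x)
    best, best_e = list(x), e
    frags = [list(f) for f in fragments]
    for _ in range(cycles):
        trial, site = perturb(x, rng, frags)
        trial, trial_e = local_minimize(trial, energy, site)
        if trial_e <= e or rng.random() < math.exp((e - trial_e) / kT):
            x, e = trial, trial_e
            if e < best_e:
                best, best_e = list(x), e
    return best, best_e


def toy_energy(x: Sequence[float]) -> float:
    """Made-up smooth energy with its minimum at 60 degrees for every torsion."""
    return sum((t - 60.0) ** 2 for t in x) / 100.0


x0 = [0.0] * 30
best, best_e = relax(x0, toy_energy, random.Random(7), fragments=[[60.0, 60.0, 60.0]])
print(round(toy_energy(x0), 1), round(best_e, 3))
```

Minimizing only a local window keeps each move cheap, which matters given the number of minimizations the full protocol performs.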



Crunching the numbers.

✦ 172 starting models.

✦ 5 simulations for each model: 860 simulations.


✦ 15 rounds of sequence-design/model-optimization for each simulation: 12,900 rounds.

✦ "Several thousand" minimizations per optimization: 1 × 10^7 minimizations to produce a final sequence-structure pair.
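The counts on this slide multiply out as below. The slide gives only "several thousand" minimizations per optimization, so the per-round count is left as a free parameter in this sketch; it simply shows how the total scales with that assumed value.

```python
STARTING_MODELS = 172
SIMULATIONS_PER_MODEL = 5
ROUNDS_PER_SIMULATION = 15

simulations = STARTING_MODELS * SIMULATIONS_PER_MODEL  # 860
rounds = simulations * ROUNDS_PER_SIMULATION           # 12,900


def total_minimizations(per_round: int) -> int:
    """Total count if every design/optimization round runs `per_round` minimizations (assumed)."""
    return rounds * per_round


for per_round in (500, 1_000, 3_000):
    print(f"{per_round:>5} per round -> {total_minimizations(per_round):.1e} minimizations")
```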



Protein Top7.

✦ 93-residue protein with no significant BLAST matches.

✦ Modest changes in structure during optimization (RMSD 1.1 Å).

✦ Substantial changes in sequence:

Top7 sequence before and after alternating cycles of backbone and sequence optimization (from the Supplementary Online Materials, Kuhlman et al.; Kuhlman and Dantas contributed equally):

before  DIEITVRINNNGEDYDYKKTATTLSEINAHFEELEKHLKEENGEKITISVKLRNEKEAYW
after   DIQVQVNIDDNGKNFDYTYTVTTESELQKVLNELKDYIKKQGAKRVRISITARTKKEAEK

before  VAAKIKEQALRAGVETIQIDKQSDTMTATLGKQ
after   FAAILIKVFAELGYNDINVTFDGDTVTVEGQLE

Table S1 of the Supplementary Online Materials gives the corresponding energies in kcal/mol.
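The extent of the sequence change can be quantified directly from the before and after sequences of Top7 by a position-by-position comparison; no gaps are needed because both sequences are 93 residues long. This is a plain identity count, not an alignment score of the kind BLAST reports, and `sequence_identity` is a name chosen for this sketch.

```python
from typing import Tuple

BEFORE = ("DIEITVRINNNGEDYDYKKTATTLSEINAHFEELEKHLKEENGEKITISVKLRNEKEAYW"
          "VAAKIKEQALRAGVETIQIDKQSDTMTATLGKQ")
AFTER = ("DIQVQVNIDDNGKNFDYTYTVTTESELQKVLNELKDYIKKQGAKRVRISITARTKKEAEK"
         "FAAILIKVFAELGYNDINVTFDGDTVTVEGQLE")


def sequence_identity(a: str, b: str) -> Tuple[int, float]:
    """Count identical residues in two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    same = sum(x == y for x, y in zip(a, b))
    return same, same / len(a)


same, fraction = sequence_identity(BEFORE, AFTER)
print(f"{same}/{len(BEFORE)} residues identical ({fraction:.0%})")  # 29/93 residues identical (31%)
```

About two thirds of the positions (64 of 93) changed, while the backbone moved by only 1.1 Å RMSD.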



Top7 has favorable energies.

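✦ Dramatic decrease in Lennard-Jones repulsive energy during optimization (from 28 to 8.6 kcal/mol).

✦ Smaller improvements in Lennard-Jones attractive and solvation energies; hydrogen-bonding energy became slightly less favorable (from -89 to -80 kcal/mol).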
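The attractive and repulsive Lennard-Jones terms can be understood by splitting the textbook 12-6 potential at its minimum, where attraction and repulsion trade off. This sketch assumes that split; Rosetta's own implementation differs in details (for example, atom-type-specific radii) that are not reproduced here. The parameters are approximate argon values, matching the Ar example cited earlier in the deck.

```python
from typing import Tuple


def lennard_jones(r: float, epsilon: float, sigma: float) -> float:
    """Textbook 12-6 potential: V(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    x = (sigma / r) ** 6
    return 4.0 * epsilon * (x * x - x)


def split_lj(r: float, epsilon: float, sigma: float) -> Tuple[float, float]:
    """Split V(r) at its minimum r_min = 2**(1/6) * sigma into (attractive, repulsive).

    Inside r_min the attractive part is held at the well depth -eps and the rest is
    repulsive; outside r_min the whole potential counts as attractive. The parts
    always sum to V(r).
    """
    r_min = 2.0 ** (1.0 / 6.0) * sigma
    v = lennard_jones(r, epsilon, sigma)
    if r < r_min:
        return -epsilon, v + epsilon
    return v, 0.0


EPS, SIGMA = 0.997, 3.40  # approximate argon values: kJ/mol and angstroms
for r in (3.2, 3.6, 3.82, 4.5, 6.0):
    attractive, repulsive = split_lj(r, EPS, SIGMA)
    print(f"r = {r:.2f} A  attractive = {attractive:7.3f}  repulsive = {repulsive:7.3f}")
```

In this decomposition, only atom pairs pushed closer than r_min contribute a positive repulsive term, so a falling repulsive energy corresponds to fewer overly close contacts.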

Energy (kcal/mol) Starting Model Final Model

Lennard-Jones attractive -370 -385

Lennard-Jones repulsive 28 8.6

Hydrogen bonding -89 -80

Solvation energy 188 175

Total energy -324 -386
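The per-term changes in the table can be tabulated with a few lines of arithmetic. Summing the four listed terms gives -243 (starting model) and -281.4 (final model), not the totals of -324 and -386, so the reported total evidently includes further energy terms that the slide does not list. The sketch reports that unlisted remainder explicitly rather than guessing what it contains.

```python
# kcal/mol, (starting model, final model), copied from the table above
TERMS = {
    "Lennard-Jones attractive": (-370.0, -385.0),
    "Lennard-Jones repulsive": (28.0, 8.6),
    "Hydrogen bonding": (-89.0, -80.0),
    "Solvation energy": (188.0, 175.0),
}
TOTAL = (-324.0, -386.0)

for name, (start, final) in TERMS.items():
    print(f"{name:<26} {final - start:+7.1f}")

listed_start = sum(start for start, _ in TERMS.values())   # -243.0
listed_final = sum(final for _, final in TERMS.values())   # -281.4
listed_change = listed_final - listed_start                # -38.4
total_change = TOTAL[1] - TOTAL[0]                         # -62.0
unlisted_change = total_change - listed_change             # -23.6 from terms not shown
print(f"listed terms {listed_change:+.1f}, total {total_change:+.1f}, "
      f"other terms {unlisted_change:+.1f}")
```

The repulsive term accounts for the largest single drop (-19.4), consistent with the slide's emphasis on it.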



α/β topology.

Very thermostable.

Unfolds cooperatively.

Denatures in cold.

Typical heat capacity of unfolding.

Typical chemical denaturation.

More stable than most proteins of similar size.

Suggests mixed α helix and β sheet character.

Highly soluble.

Monomeric.

Crystals diffract to 2.5Å.



High backbone similarity overall.

Comparing Top7 model backbone with crystal structure. Model is in blue, actual structure is in red.

Very high similarity at the C-terminus.

View Top7 in 3D.


http://users.molvisions.loc/projects/seq123/seq123.php?id=1qys



Conclusion.

✦ Design of heretofore unknown protein folds is possible.

These may yet exist in nature, or they may not.

Next step: larger proteins with multiple folds?

✦ Alternation of sequence design and structure optimization is a powerful method.

Allows more flexibility in design of each.

May be applicable to purposeful design (i.e., design for specific function).

May be applicable to ab initio structure prediction, but high sequence variability would be a problem.

✦ New direction for designer proteins.

No longer limited to known structures.

Appears to be fairly efficient design and optimization process.
