University of Groningen

Engineering specificity and activity of thermolysin-like proteases from Bacillus
de Kreij, Arno

IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below.

Document Version
Publisher's PDF, also known as Version of record

Publication date: 2001

Link to publication in University of Groningen/UMCG research database

Citation for published version (APA):
de Kreij, A. (2001). Engineering specificity and activity of thermolysin-like proteases from Bacillus. Groningen: s.n.

Copyright
Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).

Take-down policy
If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum.

Download date: 31-08-2020




1. Introduction.

1.1 General.

Enzymes offer many advantages over traditional chemical catalysts. They are clean, often cheap, and renewable. However, full exploitation of enzymes in industry requires the modification of enzymes in such a way that they perform a desired reaction under a set of specific conditions. The problems that have to be solved to be able to rationally modify enzymes are diverse and relate to questions such as: why are proteins folded the way they are, which structural features determine protein stability, how does enzymatic catalysis work, and which structural elements control the specificity of an enzyme? Protein engineering is the multi-disciplinary approach to answer these questions by studying the structure-function relationships of proteins. Once these structure-function relationships are known, the obtained knowledge can be exploited to endow proteins with changed properties. Since the breakthrough of recombinant DNA technology tremendous progress has been made in solving structure-function relationships. For example, the understanding of thermostability has led to algorithms that can predict which changes in the primary structure are likely to improve the thermostability of a protein (1). Protein engineering could eventually enable the de novo design of virtually any enzymatic property.

However, fully de novo design of an enzyme is still far away, and many questions on the structure-function relationships of enzymes remain unanswered. One of the most important steps of the last decade in exploiting enzymes is undoubtedly the development of molecular breeding strategies (2), in which gene splicing and random mutagenesis are combined to yield enzymes with novel or strongly improved properties. Molecular breeding and other random mutagenesis methods (3) are greatly facilitated by the advent of robotics, enabling the handling and screening of tens of thousands of mutants.

However impressive the results with random mutagenesis methods are in yielding novel or improved biocatalysts, they do not directly contribute to the knowledge of how proteins work. Reverse engineering, that is, obtaining an enzyme with a novel functionality through random mutagenesis and reverting it step by step to the wild-type lacking this functionality, would help in revealing complicated structure-function relationships while at the same time yielding many new biocatalysts.

This thesis describes the development and use of protein engineering technologies to change the properties of thermolysin-like proteases produced by various Bacillus species.

1.2 The exploitation of the genus Bacillus with respect to industrially important enzymes.

The Gram-positive family of Bacillaceae contains four aerobic, endospore-forming genera, i.e. Thermoactinomyces, Sporosarcina, Sporolactobacillus and Bacillus (4). Compared to the three other genera, the genus Bacillus is quite heterogeneous: it consists of all the aerobic spore-forming Gram-positive bacteria that cannot be classified as belonging to one of the three other genera within the Bacillaceae family. The Bacillus species are widely distributed in soil and water, and certain strains tolerate high temperatures and extreme pH values. Most species are harmless to humans and animals and have been used in several traditional food fermentations, including the production in Japan of natto from soybeans. Only a few pathogens are known, including B. anthracis, the causative agent of anthrax (5), B. cereus, which causes food poisoning, and several insect pathogens, of which B. thuringiensis is the best known (6). The genus Bacillus is an important source of commercial enzymes, such as cellulases, lipases, starch-degrading enzymes and proteases (7). The low level of reported incidence of pathogenicity of B. subtilis and the widespread use of its products and those of its close relatives in the food, beverage, and detergent industries have resulted in the granting of GRAS (generally recognized as safe) status to B. subtilis by the U.S. Food and Drug Administration.

The genetics and physiology of B. subtilis (8), considered to be the model organism for bacilli in general, are well developed. Natural competence is one of several post-exponential phase phenomena that are characteristic of this bacterium (9). Transformation of competent B. subtilis was first described in 1958 by Spizizen (10), and several reviews dealing with genetic transformation exist (11-14). Maximal competence develops shortly after the transition from exponential to stationary growth phase, and high cell densities promote the initiation of competence development via a quorum-sensing mechanism in which secreted oligopeptides are involved (15). In the presence of polyethylene glycol, protoplasts of bacilli can be stabilized and incorporate DNA from the medium. After subsequent cell wall regeneration, transformed cells can be selected (16). Plasmid transformation systems for Bacillus were originally developed from plasmids of other Gram-positive bacteria, such as Staphylococcus aureus and later Lactococcus lactis (14). More recently, plasmids based on cryptic Bacillus plasmids have been developed as well (17).

Figure 1.1. Contribution of various types of enzymes to the total worldwide enzyme sales. 1% corresponds to approximately $10 million. The proteases (shaded sections) contribute approximately 59% of total enzyme sales of $1 billion.

The availability of a wide variety of genetic tools and the recent sequencing of the complete genome (18) make B. subtilis one of the best-understood microorganisms (19). In addition, the protein secretion pathways of this organism have been elucidated in considerable detail; for reviews see (20-22). The ability of many Bacillus species to secrete high levels of proteins, both of homologous and heterologous nature, has made these bacteria of considerable importance for biotechnological applications (7).

The current estimated value of the worldwide sales of industrial enzymes amounts to $1 billion, 59% of which is accounted for by proteases (23) (Fig. 1.1). The two main types of proteases secreted by Bacillus are alkaline proteases (subtilisin, EC 3.4.21.62) and neutral proteases (thermolysin, EC 3.4.24.27). Subtilisins account for approximately 30% of total enzyme sales (7). Amylases, mainly from Bacillus amyloliquefaciens, account for another 18% (24). Thus enzymes from Bacillus species generate more than 48% of total enzyme sales.
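The arithmetic behind these sales figures can be retraced in a few lines. The sketch below is only a bookkeeping aid: the percentages and the $1 billion total are taken from the text, while the dictionary layout and the helper name `share_to_sales` are illustrative assumptions.

```python
# Shares of worldwide industrial enzyme sales as quoted in the text.
TOTAL_SALES_MILLION_USD = 1000  # "$1 billion", expressed in million USD

shares_percent = {
    "proteases (all)": 59,
    "subtilisins": 30,
    "amylases": 18,
}


def share_to_sales(percent: int, total: int = TOTAL_SALES_MILLION_USD) -> float:
    """Convert a percentage share into sales in million USD (hypothetical helper)."""
    return total * percent / 100


# Subtilisins plus amylases give the lower bound quoted for Bacillus enzymes.
bacillus_percent = shares_percent["subtilisins"] + shares_percent["amylases"]

for name, pct in shares_percent.items():
    print(f"{name}: {pct}% = ${share_to_sales(pct):.0f} million")
print(f"subtilisins + amylases: {bacillus_percent}% = "
      f"${share_to_sales(bacillus_percent):.0f} million")
```

One percent of the total corresponds to $10 million, in line with the caption of Fig. 1.1.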

Thermolysin is used in diverse applications in the leather industry, in baking and in breweries (24, 25). Although proteases are hydrolytic enzymes, they also catalyze the reverse reaction resulting in peptide synthesis. This applies in particular to the synthesis of the artificial sweetener aspartame, a dipeptide composed of L-aspartic acid and the methyl ester of L-phenylalanine, by thermolysin variants (26, 27). DSM is one of the major industrial producers of aspartame.

The subtilisins and thermolysin-like proteases (TLPs) together constitute more than 95% of the extracellular protease content and are called major proteases (28). Subtilisin constitutes approximately 15%, and the TLPs approximately 80% of all proteolytic activity expressed by Bacillus (29). These two proteases are produced mainly at the end of the exponential growth phase and during sporulation (4, 28, 29). Mutant bacilli failing to express these proteases show normal growth and sporulation in artificial growth media (30). In nature, these proteases are thought to liberate amino acids and small peptides from external sources (29).

1.3 Enzyme classification and the thermolysin-like proteases.

By the late 1950s the number of known enzymes had increased rapidly; however, there was no guiding authority coordinating the nomenclature. This situation resulted in a chaotic and unintelligible enzyme nomenclature. On the initiative of the International Union of Biochemistry, in cooperation with the International Union of Pure and Applied Chemistry (IUPAC), a committee was formed that created the guidelines for the current enzyme nomenclature. Today the nomenclature is maintained by the International Union of Biochemistry and Molecular Biology (IUBMB) (31).

Enzymes are classified according to the type of reaction they catalyze, e.g. oxidoreductases, transferases, hydrolases, lyases, isomerases and ligases. The classification of peptidases, which hydrolyse peptide bonds, according to their mode of action on substrates (Fig. 1.2) has proved more informative for the exopeptidases than for the endopeptidases, as may be judged from the fact that subclasses have been further subdivided only with respect to exopeptidases (31). Since endopeptidases all cut within a peptide substrate, a further subdivision based on their mode of action on substrates cannot be made. However, Hartley (32) pointed out that four distinct types of catalytic mechanisms are being used by peptidases, namely serine-, cysteine-, aspartic- and metallo-type mechanisms. The carboxypeptidases and the endopeptidases have been subdivided on the basis of their catalytic mechanism (Fig. 1.3).

Each enzyme has a code number consisting of four figures, e.g. thermolysin EC 3.4.24.27. The first figure is the class number. All enzymes are divided into 6 classes, class 3 being the hydrolases, which catalyze the hydrolytic cleavage of C-O, C-N, C-C and some other bonds, including phosphoric anhydride bonds. The second figure is the subclass. Subclass 3.4, one of the 12 current subclasses of class 3, consists of those hydrolases that act on peptide bonds. The third figure usually specifies the nature of the substrate on which the enzyme acts. However, in the case of the peptidyl-peptide hydrolases the third figure is based on the nature of the catalytic mechanism. Sub-subclass 3.4.24 consists of those endopeptidases that have water as the nucleophile and have the water molecule coordinated by a metal ion (metallo-endopeptidases). The fourth figure is the serial number of an enzyme in its sub-subclass. Thus, thermolysin EC 3.4.24.27 is a hydrolase, acts on a peptide bond, belongs to the sub-subclass of metallo-endopeptidases, and is the 27th member of this sub-subclass (Fig. 1.3).
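The four-figure structure of an EC code can be made concrete with a small parser. The following is a minimal sketch, not part of any official nomenclature tool: the class table follows the six classes named in the text, and the names `ECNumber`, `parse_ec` and `describe` are hypothetical.

```python
from typing import NamedTuple

# The six enzyme classes in the order given in the text.
EC_CLASSES = {
    1: "oxidoreductases",
    2: "transferases",
    3: "hydrolases",
    4: "lyases",
    5: "isomerases",
    6: "ligases",
}


class ECNumber(NamedTuple):
    enzyme_class: int   # first figure
    subclass: int       # second figure
    sub_subclass: int   # third figure
    serial: int         # fourth figure


def parse_ec(code: str) -> ECNumber:
    """Split a code such as 'EC 3.4.24.27' into its four figures."""
    text = code.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four figures, got {code!r}")
    return ECNumber(*(int(p) for p in parts))


def describe(ec: ECNumber) -> str:
    """Spell out the hierarchy encoded in the four figures."""
    name = EC_CLASSES.get(ec.enzyme_class, "unknown class")
    return (
        f"class {ec.enzyme_class} ({name}), "
        f"subclass {ec.enzyme_class}.{ec.subclass}, "
        f"sub-subclass {ec.enzyme_class}.{ec.subclass}.{ec.sub_subclass}, "
        f"serial number {ec.serial}"
    )


thermolysin = parse_ec("EC 3.4.24.27")
subtilisin = parse_ec("3.4.21.62")
print(describe(thermolysin))
print(describe(subtilisin))
```

Both enzymes share the first two figures (3.4, hydrolases acting on peptide bonds) and differ only from the third figure onward, which for peptidases encodes the catalytic mechanism.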

Ideally the systematic name of a peptidase should be derived from the reaction catalyzed. However, for the majority of the endopeptidases the specificity is too complex to provide the basis for a name. In such cases, trivial names such as trypsin and thermolysin may serve quite well. Rawlings and Barrett (33) used the term "family" to describe a group of peptidases that on the basis of their primary structure are believed to be evolutionarily related in the sense that they have arisen by divergent evolution from a single ancestral protein. In practice, members of families are recognized by the fact that each shows a statistically significant relatedness in amino acid sequence to at least one other member, either throughout the whole sequence or at least in the domain responsible for catalytic activity. The system of families that is arrived at on the basis of primary structures almost certainly contains several sets of families that have a common evolutionary origin. Proteins in these sets of families have diverged from a single ancestral protein, but have diverged to such an extent that their primary structure is no longer indicative of their relatedness. Therefore, such proteins are accommodated in separate families. However, indications of distant relationship such as the linear order of the catalytic residues and the tertiary structure of proteins suggest a common ancestor. The term "clan" was introduced (33) to describe such a group of families. As yet, however, there are no generally accepted, objective methods by which to decide whether these similarities truly reflect divergent relationships as opposed to convergence of structures under evolutionary pressure.

Figure 1.2. Classification of peptidases by the type of reaction catalyzed. Indicated are the cleavage sites on the various substrates. Filled circles are the residues comprising blocks of one, two, or three terminal amino acids that are cleaved off by these enzymes. The triangles indicate the N- and C-terminal modification of peptides that provide substrates for some of the omega peptidases. Further subdivisions of the carboxypeptidases and endopeptidases have been made on the basis of catalytic type, as shown in Fig. 1.3.


Figure 1.3. IUPAC enzyme classification. The peptidases, a subclass of hydrolases, are subdivided into exo- and endopeptidases, based on their mode of action on substrates. The carboxy- and endopeptidases have been further subdivided according to their catalytic type. Due to the establishment of the primary structure of a large number of peptidases it has become possible to further subdivide the peptidases based on evolutionary relationship. The metalloendopeptidases are divided into 13 clans. Clan MA consists of 20 families, one of which is the M4 or thermolysin family. Only 6 peptidases of this family have been provided with their own EC number (see text for details).

The M4 family (34) is represented by thermolysin (EC 3.4.24.27) as its prototype and consists of secreted eubacterial endopeptidases from both Gram-positive and Gram-negative sources. Thermolysin is produced by the thermophilic Bacillus thermoproteolyticus (35). The neutral proteases, or thermolysin-like proteases (TLPs), are inhibited by specific zinc chelators, such as 1,10-phenanthroline and EDTA, and have their pH optimum mainly at neutral pH (36). The M4 family currently consists of approximately 38 members, six of which have their own EC number. The M4 family in turn is one of many families that form the clan MA. The MA clan is characterized by a water nucleophile in the form of water bound by a single zinc ion ligated to two histidine residues, within the motif HExxH, and Glu, His or Asp as the catalytic unit. Other families in the MA clan include M34, with anthrax lethal factor as its prototype, and M13, with neprilysin as its prototype. The MA clan, together with other clans that can be distinguished on the basis of their catalytic residues and the metal ions involved in catalysis, forms the sub-subclass of metallo-endopeptidases, EC 3.4.24 (Fig. 1.3). The primary sequence (37) and structure of thermolysin (38) were published as early as 1972. All TLPs bind two calcium ions in a double calcium binding site, whereas the more stable TLPs bind two additional calcium ions in two separate single calcium binding sites (39).
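The HExxH motif mentioned above lends itself to a simple pattern search. The sketch below scans a protein sequence in one-letter code for every occurrence of the motif; the function name `find_hexxh` is hypothetical and the example sequence is made up for illustration, not taken from any real protease.

```python
import re
from typing import List, Tuple

# H, E, any two residues, H. The lookahead also reports overlapping hits.
HEXXH = re.compile(r"(?=(HE..H))")


def find_hexxh(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based position, matched residues) for each HExxH motif."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in HEXXH.finditer(seq)]


# Invented sequence containing two motif instances.
example = "MKTAYHEAGHWLLKHEQSHD"
for position, residues in find_hexxh(example):
    print(f"HExxH at residue {position}: {residues}")
```

A match on the motif alone is only a first filter. Assignment to clan MA rests on the catalytic residues and the zinc ligands, which a sequence pattern cannot confirm by itself.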

Subtilisin (EC 3.4.21.62), a member of the sub-subclass of serine endopeptidases, is the prototype of family S8A (40). This family in turn is part of clan SB. The SB clan is characterized by a serine nucleophile and has its catalytic residues in the order Asp, His, Ser in the primary amino acid sequence. In contrast to the TLPs of family M4, subtilisins are inhibited by phenylmethylsulfonyl fluoride (PMSF) and di-isopropyl fluorophosphate (DIPF) and have their pH optimum mainly at alkaline pH (4, 41-43).

1.4 Enzyme catalysis.

During the last few decades many key questions in the life sciences have been solved. However, the origin of the catalytic power of enzymes is still controversial (44). Although as early as 1946 Pauling postulated (45) that an enzyme might be complementary to the transition state of a reaction and accelerate the reaction by binding the transition state and lowering the energy of activation, the quantitation of the physicochemical properties involved is still problematic. The catalytic power of enzymes, that is the acceleration of chemical reactions, should be explained by the contribution that various enzyme-substrate interactions make to lowering the energy of the transition state, without the need to invoke special enzymatic secrets, i.e. a special force.

The catalytic power of an enzyme is simple in its definition (44), namely the rate of the catalyzed reaction divided by the rate of the uncatalyzed reaction. The simplicity of this definition is in contrast with the problems of measuring the rate enhancement. A few of the problems include (a) the frequent lack of a measurable uncatalyzed reaction, (b) comparison of a pseudo-first order enzymatic reaction to a second order simple catalyst (such as OH⁻), and (c) the automatic exclusion of proximity effects if an intramolecular catalyst is used for comparison. Nevertheless, the rate enhancements of chymotrypsin and triose-phosphate isomerase are estimated at 10⁸ to 10¹²-fold (44).

Most of the current theories include Pauling's suggestion that an enzyme might be complementary to the transition state of a reaction, enhancing the reaction by binding the transition state and lowering the energy of activation. Some aspect of entropy is usually incorporated, although it can take many forms. Ideas of orbital steering (46, 47), solvation (48), low barrier hydrogen bonds (49), electrostatics, and pre-organized active sites all incorporate entropic factors. The second problem in explaining the catalytic power of enzymes is the determination of the magnitude of the contributions to the rate enhancement that these different effects have.

One of the most popular models to explain the catalytic power of enzymes is the model of the pre-organized active site (50). In polar solvents half of the energy gained from charge-dipole interactions between a substrate and the solvent is spent on changing the dipole-dipole interaction needed to stabilize the transition state. In proteins, however, internal water molecules and ionized residues are already partially oriented toward the transition state charge center. Thus, the reorganization energy is lower and the enzyme is a more efficient catalyst than the solvent. One of the implications of this model is that electrostatic interactions within the active site are responsible for the majority of the stabilization of the transition state due to the pre-organized polar environment.

Figure 1.4. Abbreviated mechanism of thermolysin catalysis, as advanced by Matthews and co-workers (61, 62). The Zn²⁺-bound water molecule is displaced towards Glu143 upon substrate binding. The tetrahedral intermediate, formed after a nucleophilic attack of the displaced water, coordinates to the Zn²⁺, forming a tetrahedral transition state. His231 acts as proton donor in the subsequent cleavage step. See text for further details. The figure was reproduced from (63).

Figure 1.5. Summary of thermolysin catalysis, as advanced by Mock and co-workers (63, 64). The Zn²⁺-bound water molecule is displaced upon substrate binding, and is activated by His231 to perform a nucleophilic attack. His231 performs a crucial proton donation in the subsequent cleavage step. This figure was reproduced from (63).


1.5 Catalysis in thermolysin-like proteases.

The amino acid sequences of several TLPs have been determined [see (51), or the Merops data base at http://www.merops.co.uk/merops/famcards/m4.htm], and the three-dimensional structures of TLPs isolated from several bacteria have been solved, i.e. Bacillus thermoproteolyticus (52), Bacillus cereus (53, 54), Pseudomonas aeruginosa (55) and Staphylococcus aureus (56). TLPs consist of an α-helical C-terminal domain and an N-terminal domain mainly consisting of β-strands. The domains are connected by a central α-helix. The active site is located in the cleft between these two domains, and the catalytically essential Zn²⁺ ion is located at the bottom of this cleft. The Zn²⁺ ion is co-ordinated by His142, His146 and Glu166.

In a significant number of published structures in which thermolysin (TLN) was co-crystallised with inhibitors (52, 57-61), the residues involved in catalysis could be identified. Matthews and co-workers proposed a mechanism, presented in Fig. 1.4, in which the water molecule bound to the Zn²⁺ is displaced to Glu143 upon substrate binding. The water molecule is then activated by Glu143 and performs a nucleophilic attack on the carbonyl group in the scissile bond. The carbonyl group is polarised by Tyr157, His231 and the Zn²⁺. The nucleophilic attack leads to a tetrahedral intermediate which co-ordinates the Zn²⁺ with both oxygens. The tetrahedral intermediate is stabilised by Glu143, Tyr157 and His231. In this mechanism, His231 acts as proton donor in the subsequent cleavage step, and the side chain of Asn112 and the carbonyl oxygen of residue 113 stabilise the newly formed amino group (62).

This mechanism was questioned by Mock and co-workers (63, 64). Their experiments suggested that the pH-dependence of kcat and Km was incompatible with the role of Glu143 and His231. Instead, Mock et al. proposed a reverse protonation mechanism in which the acidic limb of the pH profile is determined by a water molecule or hydroxide ion, bound to the Zn²⁺ ion, whereas the basic limb is determined by His231 (Fig. 1.5). According to Mock et al. the kcat increases with increasing pH and has a pKa of 8.3. This would mean that the kcat depends on a deprotonated His231. Therefore, His231 cannot be a proton donor but must be a proton acceptor. Mock et al. therefore proposed that TLPs follow a reverse protonation mechanism.

Site-directed mutagenesis studies have confirmed the importance of various residues in the active site. Beaumont et al. (65) showed that mutating His231 in thermolysin resulted in a drastic decrease in activity and a greatly reduced pH dependence of activity in the alkaline range. Toma et al. (66) confirmed the importance of Glu143 and His231 in B. subtilis TLP (TLP-sub) by showing that mutation of these residues resulted in a drastic reduction of activity. However, results from mutagenesis studies seem to favour the less critical role of His231, as proposed by Matthews and co-workers (62, 65).

1.6 Changing the pH-activity profile of thermolysin-like proteases.

The pH-profile of an enzyme is a rather broad term, which can have several meanings. In this thesis the following definitions are used:

• pH-performance profile: The specific activity measured by a given assay as a function of pH under a given set of conditions.

• pH-activity profile (kcat-profile or kcat/Km-profile): The kcat or kcat/Km as a function of pH under a given set of conditions, excluding any effects from stability on these parameters.

• pH-stability profile: The stability of the enzyme as a function of pH under a given set of conditions.

The pH-performance profile (which is often called the pH-profile) of an enzyme is, in fact, a combination of the pH-activity profile and the pH-stability profile under the conditions where the assay is performed. To be able to determine the pH-activity profile, assay conditions have to be used that do not affect the pH-stability.

The use of furylacryloyl (Fa) substrates, particularly Fa-Gly-Leu-Ala (FaGLA), allows the determination of initial rates of substrate degradation. As long as substrate conversion increases linearly with time during the initial-rate measurement, it may be assumed that the enzyme is stable during the time of the assay and, therefore, that the pH stability remains unchanged. The effects measured are therefore changes in the pH-activity profile.
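The linearity criterion can be checked numerically with an ordinary least-squares fit of substrate conversion against time. The sketch below uses only the standard library; both progress curves are invented for illustration, and any cut-off on R² would be an arbitrary choice rather than a value from this thesis.

```python
from typing import Sequence, Tuple


def linear_fit(times: Sequence[float], conversions: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares line through (time, conversion); returns slope, intercept, R^2."""
    n = len(times)
    mean_t = sum(times) / n
    mean_c = sum(conversions) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    syy = sum((c - mean_c) ** 2 for c in conversions)
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in zip(times, conversions))
    slope = sxy / sxx
    intercept = mean_c - slope * mean_t
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Invented progress curves: time in seconds, conversion in arbitrary units.
times = [0.0, 30.0, 60.0, 90.0, 120.0]
stable_enzyme = [0.0, 1.0, 2.0, 3.0, 4.0]      # constant rate
unstable_enzyme = [0.0, 1.0, 1.7, 2.1, 2.3]    # rate decays during the assay

for label, curve in (("stable", stable_enzyme), ("unstable", unstable_enzyme)):
    slope, _, r2 = linear_fit(times, curve)
    print(f"{label}: initial rate {slope:.4f} per s, R^2 = {r2:.3f}")
```

A curve that bends over during the assay, as in the second data set, signals that inactivation or substrate depletion is contributing and that the measured rate no longer reflects the pH-activity profile alone.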

The dependence of the Michaelis-Menten parameters on the pKa values of the active site residues in either the substrate-bound or substrate-free form can be derived using kinetic equations (67). The kcat depends only on the pKa values of the active site groups when the substrate is bound. The Km depends on the pKa values of both the substrate-bound and substrate-free form of the enzyme. Information on the pKa values in the substrate-free form of the enzyme is obtained by measuring kcat/Km.
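As an illustration of such a dependence, the sketch below evaluates the textbook bell-shaped profile for an enzyme form that is active only when one group is deprotonated and a second group is still protonated. This simple two-pKa model is an assumption made here for illustration; it is not necessarily the set of equations used in reference (67), and the pKa values of 5.0 and 8.0 are hypothetical. Applied to kcat the two pKa values refer to the substrate-bound enzyme, applied to kcat/Km to the substrate-free enzyme.

```python
def ph_profile(ph: float, limit: float, pka_acid: float, pka_base: float) -> float:
    """Bell-shaped pH dependence for two titratable groups.

    limit    : pH-independent limiting value of kcat (or kcat/Km)
    pka_acid : group that must be deprotonated (acidic limb)
    pka_base : group that must remain protonated (basic limb)
    """
    return limit / (1.0 + 10.0 ** (pka_acid - ph) + 10.0 ** (ph - pka_base))


# Hypothetical pKa values, for illustration only.
PKA_ACID, PKA_BASE = 5.0, 8.0

for ph in (4.0, 5.0, 6.0, 6.5, 7.0, 8.0, 9.0):
    relative = ph_profile(ph, 1.0, PKA_ACID, PKA_BASE)
    print(f"pH {ph:.1f}: relative activity {relative:.3f}")
```

In this model the optimum lies midway between the two pKa values, and the activity falls to about half of the limiting value at each pKa as long as the two are well separated. Shifting either pKa therefore shifts the corresponding limb of the profile.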

To change the pH-activity profile, the pKa values of the active site residues have to be changed. The pKa values of titratable groups in proteins can be changed by altering their environment. Three possible changes in the environment of a titratable group and their effects on a positively and negatively charged group are shown in Fig. 1.6. The change in pKa upon placing an acid in a negative, positive or hydrophobic environment can be summarized as follows.

Placing a negative charge close to the acid will give an unfavourable interaction with the negative charge of the charged form of the acid. Compared to the situation in water this means that the charged state of the acid becomes less favourable than the neutral state. The pKa value of a titratable group is the pH value where the group is half-protonated. Placing a negative charge close to an acid will make the acid keep its proton longer upon raising the pH, compared to the situation in water, and the pKa of the acid will, therefore, be elevated. Placing a positive charge next to the acid will give a favourable interaction with the negative charge of the deprotonated acid. Thus the acid will lose its proton sooner with increasing pH than it would have done in water. Consequently, the pKa value of an acid is lowered when a positive charge is placed in its close proximity.

Figure 1.6. Environmental effects on the pKa. Shown is the effect on the pKa value of an acid (exemplified by a glutamic acid in panel A) and of a base (exemplified by a lysine in panel B). The effect on the pKa values of inserting a charged residue nearby is the same for the acid and the base, namely an upward shift in the pKa upon the insertion of a negative charge, and a downward shift in the pKa upon insertion of a positive charge. Transfer of an acid to a hydrophobic environment gives an upward shift in the pKa value, whereas a hydrophobic environment causes a base to shift its pKa value downwards (67).

Generally, it is unfavourable for a charged residue to reside in a hydrophobic environment as compared to residing in water. Placing an acid in a hydrophobic environment will therefore have the same effect as placing it in a negative environment. The pKa value of the acid will therefore be elevated in a hydrophobic environment. The effect of the environment on the pKa of a base can be deduced in the same way as for an acid. For a base, however, it should be noted that the charged form is predominant at low pH values, whereas for the acid the charged form is dominant at high pH values.
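The reasoning on pKa shifts can be made quantitative with the Henderson-Hasselbalch relation, which gives the deprotonated fraction of a titratable group as 1/(1 + 10^(pKa - pH)). The sketch below compares an acid in water with the same acid after an upward or downward pKa shift; the reference pKa of 4.3 and the shifts of one unit are hypothetical values chosen for illustration.

```python
def fraction_deprotonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of a titratable group without its proton."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Hypothetical values: a carboxylic acid in water and two shifted environments.
PKA_WATER = 4.3
environments = {
    "water": PKA_WATER,
    "near negative charge or hydrophobic (pKa up)": PKA_WATER + 1.0,
    "near positive charge (pKa down)": PKA_WATER - 1.0,
}

ph = 4.3
for label, pka in environments.items():
    charged = fraction_deprotonated(ph, pka)  # for an acid, deprotonated = charged
    print(f"{label}: pKa {pka:.1f}, charged fraction at pH {ph} = {charged:.2f}")
```

For a base the same function applies, but the charged form is the protonated one, so its charged fraction is one minus the value returned here.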

Although it is well established that pKa values can be modified by adjacent residues (68), rational modification of the pH-performance profile, or pH-optimum, has proven to be difficult. In the most successful studies, natural variants with a different pH-optimum have been used to design mutations close to the active site (69, 70). This approach is limited to those cases where well characterised, highly homologous proteins with a different pH optimum are available. Even then success is not guaranteed, as mutations in and close to the active site often produce strong negative effects on the activity (71-73). An alternative approach to mutations close to the active site is the modification of the surface charge of the enzyme (72-74). Since the electrostatic interaction between a charged group and a titratable group is a long-range interaction, mutations outside the active site should be able to influence the pKa values of the active site residues. In this respect, Russell et al. (74) have shown that mutating residues as far as 15 Å from the active site of subtilisin caused a shift in the pKa values of the active site residues of approximately 0.4 pH units. However, even the complete reversal of the surface charge does not necessarily have an effect on catalysis (75).

1.7 Engineering thermal stability of thermolysin-like proteases.

For an extensive review on the engineering of the thermal stability determinants of TLPs see (76). Bacillus strains display large differences in optimum growth temperature and the stabilities of their TLPs vary accordingly (77). The thermal stability of the thermolysin-like protease from B. stearothermophilus (TLP-ste) is considerably lower than that of thermolysin, which is the most stable TLP. Initial protein engineering studies of TLP-ste were aimed at stabilising the enzyme by mutations designed on the basis of general principles of protein stability (78). On the basis of these results, it was suggested that thermal inactivation of TLP-ste is governed by local unfolding processes that involve only parts of the molecule. Mutations with large effects on stability were located in or near a surface loop in the N-terminal domain, spanning residues 56 through 69. Combining five stabilising TLP-ste→thermolysin mutations (A4T, T56A, G58A, T63F, A69P), four of which are located in the stability-determining surface loop, resulted in an enzyme that was considerably more stable than thermolysin itself (79).

An important consequence of the fact that stability-determining unfolding processes in TLP-ste have a local character is that mutational effects may display extreme non-additivity (80). To illustrate this, Vriend et al. (81) created four pseudo wild-types of TLP-ste in which a second unfolding region had been created by mutations in the C-terminal domain of the enzyme. After making these pseudo wild-types, the effects of stabilising the two regions individually or simultaneously were studied. The results obtained with these mutants show the "enough is enough" effect (80). This effect means that it does not help to stabilise a region of the protein once this region has become so stable that its unfolding no longer contributes to the overall thermal inactivation process. It is interesting to note that mutations in the hydrophobic core of the C-terminal domain that had surprisingly marginal effects on the stability of wild-type TLP-ste became important once the N-terminal unfolding region had been considerably stabilised. In other words, the contribution of a second unfolding process (involving the hydrophobic core of the C-terminal domain) to the overall rate of thermal inactivation becomes noticeable after the contribution of the other major unfolding process (involving the 56-69 region) has been reduced by stabilising mutations.
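The non-additivity described here can be reproduced with a toy kinetic model. The sketch assumes that each local unfolding process is an independent, irreversible, first-order route to inactivation, so that the rate constants simply add. This model and all numerical values are illustrative assumptions and are not the analysis of (80, 81).

```python
import math


def half_life(unfolding_rates):
    """Half-life when any one of several local unfolding events inactivates the enzyme.

    Assumes parallel, independent, first-order routes, so the rates add.
    """
    return math.log(2) / sum(unfolding_rates)


# Hypothetical rate constants (per minute), chosen only for illustration.
k_n, k_c = 1.0, 0.01  # N-terminal loop region, C-terminal core region

wild_type = half_life([k_n, k_c])
n_stabilised = half_life([k_n / 100, k_c])          # N region 100x slower
n_overstabilised = half_life([k_n / 10_000, k_c])   # a further 100x on the same region
c_only = half_life([k_n, k_c / 100])                # C region stabilised in the wild-type
both = half_life([k_n / 10_000, k_c / 100])         # both regions stabilised

print(f"wild-type            {wild_type:10.2f} min")
print(f"N stabilised         {n_stabilised:10.2f} min")
print(f"N over-stabilised    {n_overstabilised:10.2f} min")
print(f"C stabilised only    {c_only:10.2f} min")
print(f"both stabilised      {both:10.2f} min")
```

In this model the first round of stabilising the N-terminal region pays off fully, the second round barely doubles the half-life, and stabilising the C-terminal region matters only once the N-terminal route has been suppressed.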

The most spectacular example of a designed stabilising mutation in the 56-69 region in TLP-ste was an engineered disulphide bridge between residues 8 and 60, which increased the T50 by as much as 16.7 °C (82). This result contrasts with the rather marginal stability effects that were obtained upon introducing a variety of designed disulphide bridges in the broad-specificity protease subtilisin (83). Eijsink et al. (76) propose that the lack of success of the engineered disulphide bridges in subtilisin is at least partly due to the fact that these bridges were introduced into regions of the protease that do not play a dominant role in stability-determining local unfolding processes.

Combination of the five stabilising TLP-ste→thermolysin mutations (A4T, T56A, G58A, T63F, A69P) with the designed mutations S65P, G8C, and N60C yielded one of the most stable proteins ever obtained by protein engineering (84, 85). This 8-fold variant had a half-life of 170 minutes at 100 °C and was named boilysin™. Boilysin was stable for at least 24 hours at 90 °C, a temperature at which the half-life of the original enzyme (wild-type TLP-ste) was approximately one minute. It was shown that, in contrast to wild-type TLP-ste, boilysin tolerates considerable amounts of urea, guanidinium-HCl and SDS. For example, the enzyme retained approximately 60% of its activity in the presence of 5 M urea and 40% of its activity in the presence of 1% (wt/vol) SDS (84).
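To give a feeling for what these half-lives mean in practice, the sketch below converts a half-life into residual activity, assuming simple single-exponential (first-order) inactivation. That kinetic form is an assumption made for illustration, and the function names are hypothetical. Note that the two half-lives quoted in the text were measured at different temperatures and should not be compared directly.

```python
import math


def residual_activity(minutes: float, half_life_min: float) -> float:
    """Fraction of activity left after `minutes`, assuming first-order decay."""
    return 0.5 ** (minutes / half_life_min)


def time_to_fraction(fraction: float, half_life_min: float) -> float:
    """Minutes needed for the activity to drop to `fraction` of its initial value."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return half_life_min * math.log(fraction) / math.log(0.5)


# Half-life of 170 min (the boilysin value quoted at 100 degrees C).
print(residual_activity(60, 170))      # activity left after one hour
print(time_to_fraction(0.10, 170))     # minutes until 10% activity remains

# Half-life of about 1 min (the wild-type TLP-ste value quoted at 90 degrees C).
print(residual_activity(10, 1))        # activity left after ten minutes
```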

1.8 Engineering hinge bending in thermolysin-like proteases.

Many proteins undergo hinge-bending motions during catalysis (86-90). Hinge-bending is the motion of two domains of an enzyme around a hinge point or hinge axis. Through this hinge movement one domain of an enzyme can close onto another to isolate a substrate from the environment. In several cases glycine residues have been shown to be crucial for providing a protein with the necessary (hinge-bending) flexibility (54, 89, 90). Van Aalten et al. (91) showed that mutation of a glycine residue in the proposed hinge region in retinol binding protein indeed dramatically reduced retinol binding.

Originally, the crystal structure of thermolysin was supposed to be that of the pure enzyme (92, 93). When the crystal structure of B. cereus TLP (TLP-cer) became available, it was noticed that the active site cleft of this enzyme was more open than that in thermolysin, resulting from a hinge-bending between the N- and C-terminal domains (54, 86). Puzzled by this observation, Holland et al. (86) re-examined the original crystallographic data for thermolysin and TLP-cer. They concluded that the structure of thermolysin was not that of the free enzyme, but of the enzyme containing a dipeptide in the active site. This peptide was not present in the TLP-cer structure. These observations suggested that a hinge-bending motion is part of the catalytic mechanism of TLPs, and that substrate binding yields a closure of the active site (86). Interestingly, a hinge-bending motion quite similar to the one proposed by Holland et al. was observed when concerted motions in thermolysin were studied using so-called 'essential dynamics' analysis of molecular dynamics simulations of thermolysin (89). Originally, Stark et al. (54) suggested that the hinge-bending region in TLP-cer and thermolysin is located around glycines 135 and 136. This is an attractive proposal, since it places the hinge at the beginning of the α-helix connecting the two domains between which hinge-bending occurs. In later studies, Holland et al. proposed a more complex scheme, in which bending of the N-terminal α-helix, around Gly78, plays a prominent role. On the basis of their molecular dynamics simulations, Van Aalten et al. suggested the hinge region in thermolysin to be at both ends of the α-helix connecting the N- and C-terminal domain (residues 135 and 136 are on the N-terminal end of the connecting α-helix).

The alignment of several TLPs [see (54) or http://www.merops.co.uk/merops/famcards/m4.htm] shows that Gly78 and several glycines in the α-helix connecting the two domains are conserved in certain groups of TLPs. Residue 135 is the most conserved glycine; other glycines are less well conserved, but a correlation exists between the disappearance of certain glycines and the appearance of others. For example, a glycine at 147 is conserved in those TLPs that have no glycine at position 136. All these glycine residues (78, 135, 136, 141, 147, 154) have dihedral angles that are compatible with some non-glycine amino acid residues and are therefore unlikely to represent residues exclusively required for the 3D structure of the enzyme. Rather, they may be required for hinge-bending.

1.9 Engineering substrate specificity and activity.

Rational design of substrate specificity is one of the main goals of protein engineering. Exploitation of enzymes in industry would be facilitated by the ability to rationally modify the substrate specificity of an enzyme. Consequently, an extensive body of literature exists on engineering the substrate specificity of proteases. The subsite and substrate nomenclature regarding specificity is summarized in Fig. 1.7 (94). Three examples of specificity determinants are discussed below.

The first and most common example is the engineering of the substrate binding pockets to change the substrate specificity. Mei et al. (95) replaced glycines in the S1 subsite of subtilisin YaB by larger residues such as alanine and valine. This resulted in an increase in activity towards substrates with a P1 Ala and a sharp decrease in activity towards substrates with a P1 Phe or Leu. Many other examples exist in which the preference for large hydrophobic substrates was diminished by reducing the substrate binding pocket size through the replacement of small binding pocket residues by larger residues (95-97) or by blocking the entrance of the binding pocket (98).

Figure 1.7. Subsite and substrate nomenclature of proteases. The amino acids of the substrate are counted from the cleavage site toward the N-terminus as P1, P2 etc., and towards the C-terminus as P1', P2' etc. The corresponding binding pockets on the enzyme are called S1, S2 and S1' and S2'.
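The P/P' numbering of substrate residues relative to the scissile bond can be expressed in a few lines of code. The sketch below is only an illustration of the nomenclature; the function name `label_positions` and the example peptide are hypothetical.

```python
def label_positions(peptide: str, bond_index: int) -> list[tuple[str, str]]:
    """Label each residue of a substrate relative to the scissile bond.

    `bond_index` is the number of residues on the N-terminal side of the
    cleaved bond, so the bond lies between peptide[bond_index - 1] (P1)
    and peptide[bond_index] (P1').
    """
    if not 0 < bond_index < len(peptide):
        raise ValueError("the scissile bond must lie inside the peptide")
    labels = []
    for i, residue in enumerate(peptide):
        if i < bond_index:
            # Counting away from the bond towards the N-terminus: P1, P2, ...
            labels.append((f"P{bond_index - i}", residue))
        else:
            # Counting away from the bond towards the C-terminus: P1', P2', ...
            labels.append((f"P{i - bond_index + 1}'", residue))
    return labels


# Hypothetical hexapeptide cleaved between the fourth and fifth residue.
for label, residue in label_positions("AAPFGL", 4):
    print(label, residue)
```

Each label Pn corresponds to the enzyme subsite Sn that accommodates that residue, so replacing a glycine in S1 by a bulkier residue is expected to affect mainly the residue labelled P1.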

The second example concerns the conversion of trypsin to chymotrypsin, one of the most thoroughly studied and now best understood systems for the structural basis of substrate specificity in the serine proteases (99-101). Trypsin (EC 3.4.21.4) and chymotrypsin (EC 3.4.21.1) both belong to the S1 peptidase family and catalyze peptide bond cleavage by identical mechanisms. A serine residue acts as a nucleophile and the catalytic residues are in the order His, Asp, Ser in the primary sequence. Both enzymes are endopeptidases and possess very similar tertiary structures consisting of two juxtaposed six-stranded β-barrel domains (102, 103). The substrate specificity of trypsin, expressed in relative kcat/Km values, is nearly 10^6-fold higher for P1 Arg or Lys containing substrates compared to the activity towards analogous P1 Phe containing substrates. Conversely, chymotrypsin favours peptide substrates possessing Trp, Tyr and Phe at the P1 position, with an overall specificity relative to P1 Lys substrates of up to 10^4-fold.
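Relative kcat/Km values translate directly into discrimination between substrates that compete for the same enzyme: the ratio of the two cleavage rates equals the ratio of the specificity constants multiplied by the ratio of the substrate concentrations. The sketch below applies this standard relation. The function names and the concentrations are assumptions chosen for illustration.

```python
def rate_ratio(spec_a: float, spec_b: float,
               conc_a: float = 1.0, conc_b: float = 1.0) -> float:
    """v_A / v_B for two substrates competing for one enzyme.

    spec_a and spec_b are (relative) kcat/Km values; conc_a and conc_b are
    the substrate concentrations in the same units.
    """
    if min(spec_a, spec_b, conc_a, conc_b) <= 0:
        raise ValueError("all arguments must be positive")
    return (spec_a * conc_a) / (spec_b * conc_b)


def fraction_cleaved_as_a(spec_a: float, spec_b: float,
                          conc_a: float = 1.0, conc_b: float = 1.0) -> float:
    """Share of all cleavage events that occur on substrate A."""
    ratio = rate_ratio(spec_a, spec_b, conc_a, conc_b)
    return ratio / (1.0 + ratio)


# A 10^6-fold preference (the order of magnitude quoted for trypsin, Arg/Lys vs Phe).
print(rate_ratio(1e6, 1.0))                  # equal concentrations
print(rate_ratio(1e6, 1.0, conc_b=1e6))      # disfavoured substrate in 10^6-fold excess

# A 10^4-fold preference (the order of magnitude quoted for chymotrypsin).
print(fraction_cleaved_as_a(1e4, 1.0))
```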

Since the structures of the S1 subsites of the two enzymes are very similar, the difference in substrate specificity was thought to be a simple property of the local electrostatic environment. However, replacement of the primary binding determinant Asp189 of trypsin with the analogous Ser189 of chymotrypsin failed to convert the specificity but, instead, resulted in a poorly performing nonspecific protease (104). Conversion of trypsin to a chymotrypsin-like protease required the substitution of four residues in the S1 subsite together with the exchange of two adjacent surface loops, which do not directly contact the substrate (105). Inspection of the crystal structures of the wild-type trypsin and chymotrypsin and those of several mutants revealed the specificity determinants involved (99). The conserved Gly216, which contacts the P3 residue in both trypsin and chymotrypsin, turned out to be crucial for correct positioning of the substrate in the active site. The different structures of the surface loops in trypsin and chymotrypsin maintain Gly216 in distinct conformations, enabling this residue to function as a specificity determinant despite being conserved in both proteases.

The study of the trypsin-chymotrypsin system has led to a definition of two types of specificity determinants (99): primary specificity determinants, encompassing amino acids that directly contact the substrate, and secondary specificity determinants, which are more distantly located elements in the protein. The secondary determinants can act through various mechanisms, such as influencing the conformation of primary determinants, as in the case of Gly216 in trypsin and chymotrypsin, or modulating the degree of flexibility in the substrate binding site. Examples of the latter can be found in elastase (106, 107) and coenzyme A transferase (108). The existence of secondary specificity determinants implies that substrate specificity is not necessarily determined by a limited set of amino acids in the substrate binding pockets. Instead, substrate specificity can be a globally distributed property determined by a large part of the protein fold.

The third example is another example of a specificity determinant which is not located in a subsite. This example relates to the S10 family


Introduction

1. Introduction.

1.1 General.

Enzymes offer many advantages over traditional chemical catalysts. They are clean, often cheap, and renewable. However, full exploitation of enzymes in industry requires the modification of enzymes in such a way that they perform a desired reaction under a set of specific conditions. The problems that have to be solved to be able to rationally modify enzymes are diverse and relate to questions such as: why are proteins folded the way they are, which structural features determine protein stability, how does enzymatic catalysis work, and which structural elements control the specificity of an enzyme? Protein engineering is the multi-disciplinary approach to answer these questions by studying the structure-function relationships of proteins. Once these structure-function relationships are known, the obtained knowledge can be exploited to endow proteins with changed properties. Since the breakthrough of recombinant DNA technology, tremendous progress has been made in solving structure-function relationships. For example, the understanding of thermostability has led to algorithms that can predict which changes in the primary structure are likely to improve the thermostability of a protein (1). Protein engineering could eventually enable the de novo design of virtually any enzymatic property.

However, fully de novo design of an enzyme is still far away, and many questions on the structure-function relationships of enzymes remain unanswered. One of the most important steps of the last decade in exploiting enzymes is undoubtedly the development of molecular breeding strategies (2), in which gene splicing and random mutagenesis are combined to yield enzymes with novel or strongly improved properties. Molecular breeding and other random mutagenesis methods (3) are greatly facilitated by the advent of robotics, enabling the handling and screening of tens of thousands of mutants.

However impressive the results with random mutagenesis methods are in yielding novel or improved biocatalysts, they do not directly contribute to the knowledge of how proteins work. Reverse engineering, that is, obtaining an enzyme with a novel functionality through random mutagenesis and reverting it step by step to the wild-type lacking this functionality, would help in revealing complicated structure-function relationships while at the same time yielding many new biocatalysts.

This thesis describes the development and use of protein engineering technologies to change the properties of thermolysin-like proteases produced by various Bacillus species.

1.2 The exploitation of the genus Bacillus with respect to industrially important enzymes.

The Gram-positive family of Bacillaceae contains four aerobic, endospore-forming genera, i.e. Thermoactinomyces, Sporosarcina, Sporolactobacillus and Bacillus (4). Compared to the three other genera, the genus Bacillus is quite heterogeneous: it consists of all the aerobic spore-forming Gram-positive bacteria that cannot be classified as belonging to one of the three other genera within the Bacillaceae family. The Bacillus species are widely distributed in soil and water, and certain strains tolerate high temperatures and extreme pH values. Most species are harmless to humans and animals and have been used in several traditional food fermentations, including the production in Japan of natto from soybeans. Only a few pathogens are known, including B. anthracis, the causative agent of anthrax (5), B. cereus, which causes food poisoning, and several insect pathogens, of which B. thuringiensis is the best known (6). The genus Bacillus is an important source of commercial enzymes, such as cellulases, lipases, starch-degrading enzymes and proteases (7). The low level of reported incidence of pathogenicity of B. subtilis and the widespread use of its products and those of its close relatives in the food, beverage, and detergent industries have resulted in the granting of GRAS (generally recognized as safe) status to B. subtilis by the U.S. Food and Drug Administration.

The genetics and physiology of B. subtilis (8), considered to be the model organism for bacilli in general, are well developed. Natural competence is one of several post-exponential phase phenomena that are characteristic of this bacterium (9). Transformation of competent B. subtilis was first described in 1958 by Spizizen (10), and several reviews dealing with genetic transformation exist (11-14). Maximal competence develops shortly after the transition from exponential to stationary growth phase, and high cell densities promote the initiation of competence development via a quorum-sensing mechanism in which secreted oligopeptides are involved (15). In the presence of polyethylene glycol, protoplasts of bacilli can be stabilized and incorporate DNA from the medium. After subsequent cell wall regeneration, transformed cells can be selected (16). Plasmid transformation systems for Bacillus were originally developed from plasmids of other Gram-positive bacteria, such as Staphylococcus aureus and later Lactococcus lactis (14). More recently, plasmids based on cryptic Bacillus plasmids have been developed as well (17).

Figure 1.1. Contribution of various types of enzymes to the total worldwide enzyme sales. 1% corresponds to approximately $10 million. The proteases (shaded sections) contribute approximately 59% of total enzyme sales of $1 billion.

The availability of a wide variety of genetic tools and the recent sequencing of the complete genome (18) make B. subtilis one of the best-understood microorganisms (19). In addition, the protein secretion pathways of this organism have been elucidated in considerable detail; for reviews see (20-22). The ability of many Bacillus species to secrete high levels of proteins, both of homologous and heterologous nature, has made these bacteria of considerable importance for biotechnological applications (7).

The current estimated value of the worldwide sales of industrial enzymes amounts to $1 billion, 59% of which is accounted for by proteases (23) (Fig. 1.1). The two main types of proteases secreted by Bacillus are alkaline proteases (subtilisin, EC 3.4.21.62) and neutral proteases (thermolysin, EC 3.4.24.27). Subtilisins account for approximately 30% of total enzyme sales (7). Amylases, mainly from Bacillus amyloliquefaciens, account for another 18% (24). Thus enzymes from Bacillus species generate more than 48% of total enzyme sales.

Thermolysin is used in diverse applications in the leather industry, in baking and in breweries (24, 25). Although proteases are hydrolytic enzymes, they also catalyze the reverse reaction resulting in peptide synthesis. This applies in particular to the synthesis of the artificial sweetener aspartame, a dipeptide composed of L-aspartic acid and the methyl ester of L-phenylalanine, by thermolysin variants (26, 27). DSM is one of the major industrial producers of aspartame.

The subtilisins and thermolysin-like proteases (TLPs) together constitute more than 95% of the extracellular protease content and are called major proteases (28). Subtilisin constitutes approximately 15%, and the TLPs approximately 80%, of all proteolytic activity expressed by Bacillus (29). These two proteases are produced mainly at the end of the exponential growth phase and during sporulation (4, 28, 29). Mutant bacilli that fail to express these proteases show normal growth and sporulation in artificial growth media (30). In nature, these proteases are thought to liberate amino acids and small peptides from external sources (29).

1.3 Enzyme classification and the thermolysin-like proteases.

By the late 1950s the number of known enzymes had increased rapidly; however, there was no guiding authority coordinating the nomenclature. This situation resulted in a chaotic and unintelligible enzyme nomenclature. On the initiative of the International Union of Biochemistry, in cooperation with the International Union of Pure and Applied Chemistry (IUPAC), a committee was formed that created the guidelines for the current enzyme nomenclature. Today the nomenclature is maintained by the International Union of Biochemistry and Molecular Biology (IUBMB) (31).

Enzymes are classified according to the type of reaction they catalyze, e.g. oxidoreductases, transferases, hydrolases, lyases, isomerases and ligases. The classification of peptidases, which hydrolyse peptide bonds, according to their mode of action on substrates (Fig. 1.2) has proved more informative for the exopeptidases than for the endopeptidases, as may be judged from the fact that subclasses have been further subdivided only with respect to exopeptidases (31). Since endopeptidases all cut within a peptide substrate, a further subdivision based on their mode of action on substrates cannot be made. However, Hartley (32) pointed out that four distinct types of catalytic mechanisms are being used by peptidases, namely serine-, cysteine-, aspartic- and metallo-type mechanisms. The carboxypeptidases and the endopeptidases have been subdivided on the basis of their catalytic mechanism (Fig. 1.3).

Each enzyme has a code number consisting of four numbers, e.g. thermolysin EC 3.4.24.27. The first figure is the class number. All enzymes are divided into 6 classes, class 3 being the hydrolases, which catalyze the hydrolytic cleavage of C-O, C-N, C-C and some other bonds, including phosphoric anhydride bonds. The second figure is the subclass. Subclass 3.4, one of the 12 current subclasses of class 3, consists of those hydrolases that act on peptide bonds. The third figure usually specifies the nature of the substrate on which the enzyme acts. However, in the case of the peptidyl-peptide hydrolases the third figure is based on the nature of the catalytic mechanism. Sub-subclass 3.4.24 consists of those endopeptidases that have water as the nucleophile and have the water molecule coordinated by a metal ion (metallo-endopeptidases). The fourth figure is the serial number of an enzyme in its sub-subclass. Thus, thermolysin EC 3.4.24.27 is a hydrolase acting on peptide bonds, belongs to the sub-subclass of metallo-endopeptidases, and is the 27th member of this sub-subclass (Fig. 1.3).
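The four-level structure of an EC code can be illustrated by a small parser. This is a minimal sketch: the names `ECNumber`, `EC_CLASSES` and `parse_ec` are hypothetical, and the class table contains only the six classes named in the text.

```python
from typing import NamedTuple


class ECNumber(NamedTuple):
    enzyme_class: int   # first figure, e.g. 3 = hydrolases
    subclass: int       # second figure, e.g. 3.4 = acting on peptide bonds
    sub_subclass: int   # third figure, e.g. 3.4.24 = metallo-endopeptidases
    serial: int         # fourth figure, serial number within the sub-subclass


EC_CLASSES = {
    1: "oxidoreductases",
    2: "transferases",
    3: "hydrolases",
    4: "lyases",
    5: "isomerases",
    6: "ligases",
}


def parse_ec(code: str) -> ECNumber:
    """Split a code such as 'EC 3.4.24.27' into its four figures."""
    text = code.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four figures, got {code!r}")
    return ECNumber(*(int(part) for part in parts))


thermolysin = parse_ec("EC 3.4.24.27")
print(thermolysin)
print(EC_CLASSES[thermolysin.enzyme_class])
```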

Ideally the systematic name of a peptidase should be derived from the reaction catalyzed. However, for the majority of the endopeptidases the specificity is too complex to provide the basis for a name. In such cases, trivial names such as trypsin and thermolysin may serve quite well. Rawlings and Barrett (33) used the term "family" to describe a group of peptidases that on the basis of their primary structure are believed to be evolutionarily related in the sense that they have arisen by divergent evolution from a single ancestral protein. In practice, members of families are recognized by the fact that each shows a statistically significant relatedness in amino acid sequence to at least one other member, either throughout the whole sequence or at least in the domain responsible for catalytic activity. The system of families that is arrived at on the basis of primary structures almost certainly contains several sets of families that have a common evolutionary origin. Proteins in these sets of families have diverged from a single ancestral protein, but have diverged to such an extent that their primary structure is no longer indicative of their relatedness. Therefore, such proteins are accommodated in separate families. However, indications of distant relationship such as the linear order of the catalytic residues and the tertiary structure of proteins suggest a common ancestor. The term "clan" was introduced (33) to describe such a group of families. As yet, however, there are no generally accepted, objective methods by which to decide whether these similarities truly reflect divergent relationships as opposed to convergence of structures under evolutionary pressure.

Figure 1.2. Classification of peptidases by the type of reaction catalyzed. Indicated are the cleavage sites on the various substrates. Filled circles are the residues comprising blocks of one, two, or three terminal amino acids that are cleaved off by these enzymes. The triangles indicate the N- and C-terminal modification of peptides that provide substrates for some of the omega peptidases. Further subdivisions of the carboxypeptidases and endopeptidases have been made on the basis of catalytic type, as shown in Fig. 1.3.


The M4 family (34) is represented by thermolysin (EC 3.4.24.27) as its prototype and consists of secreted eubacterial endopeptidases from both Gram-positive and Gram-negative sources. Thermolysin is produced by the thermophilic Bacillus thermoproteolyticus (35). The neutral proteases, or thermolysin-like proteases (TLPs), are inhibited by specific zinc chelators, such as 1,10-phenanthroline and EDTA, and have their pH optimum mainly at neutral pH (36). The M4 family currently consists of approximately 38 members, six of which have their own EC number. The M4 family in turn is one of many families that form the clan MA. The MA clan is characterized by a water nucleophile, bound by a single zinc ion that is ligated to two histidine residues within the motif HExxH, and Glu, His or Asp as the catalytic unit. Other families in the MA clan include M34, with anthrax lethal factor as its prototype, and M13, with neprilysin as its prototype. The MA clan, together with other clans that can be distinguished on the basis of their catalytic residues and the metal ions involved in catalysis, forms the sub-subclass of metallo-endopeptidases, EC 3.4.24 (Fig. 1.3). The primary sequence (37) and structure of thermolysin (38) were published as early as 1972. All TLPs bind two calcium ions in a double calcium binding site, whereas the more stable TLPs bind two additional calcium ions in two separate single calcium binding sites (39).

Figure 1.3. IUPAC enzyme classification. The peptidases, a subclass of hydrolases, are subdivided into exo- and endopeptidases, based on their mode of action on substrates. The carboxy- and endopeptidases have been further subdivided according to their catalytic type. Due to the establishment of the primary structure of a large number of peptidases it has become possible to further subdivide the peptidases based on evolutionary relationship. The metalloendopeptidases are divided in 13 clans. Clan MA consists of 20 families, one of which is the M4 or thermolysin family. Only 6 peptidases of this family have been provided with their own EC number (see text for details).
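A motif such as HExxH, in which x stands for any amino acid, is easily located in a one-letter protein sequence with a regular expression. The sketch below is illustrative only: the function name `find_hexxh` is hypothetical and the test sequence is a made-up fragment, not a real protease sequence.

```python
import re

# His, Glu, any two residues, His. The lookahead lets overlapping matches be reported.
HEXXH = re.compile(r"(?=(HE[A-Z]{2}H))")


def find_hexxh(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position of the first His, matched residues) for each HExxH motif."""
    seq = sequence.upper()
    return [(match.start() + 1, match.group(1)) for match in HEXXH.finditer(seq)]


# Made-up fragment used purely to exercise the function.
print(find_hexxh("GGSHELTHAAG"))
```

A hit for this motif is of course only a first indication of a zinc-binding site; as described above, assignment to a family or clan rests on sequence relatedness and structure, not on the motif alone.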

Subtilisin (EC 3.4.21.62), a member of the sub-subclass of serine endopeptidases, is the prototype of family S8A (40). This family in turn is part of clan SB. The SB clan is characterized by a serine nucleophile and has its catalytic residues in the order Asp, His, Ser in the primary amino acid sequence. In contrast to the TLPs of family M4, subtilisins are inhibited by phenylmethylsulfonylfluoride (PMSF) and di-isopropylfluorophosphate (DIPF) and have their pH optimum mainly at alkaline pH (4, 41-43).

1.4 Enzyme catalysis.

During the last few decades many key questions in the life sciences have been solved. However, the origin of the catalytic power of enzymes is still controversial (44). Although Pauling postulated as early as 1946 (45) that an enzyme might be complementary to the transition state of a reaction and accelerate the reaction by binding the transition state and lowering the energy of activation, the quantitation of the physicochemical properties involved is still problematic. The catalytic power of enzymes, that is the acceleration of chemical reactions, should be explained by the contribution that various enzyme-substrate interactions make to lowering the energy of the transition state, without the need to invoke special enzymatic secrets, i.e. a special force.

The catalytic power of an enzyme is simple in its definition (44), namely the rate of the catalyzed reaction divided by the rate of the uncatalyzed reaction. The simplicity of this definition is in contrast with the problems of measuring the rate enhancement. A few of the problems include (a) the frequent lack of a measurable uncatalyzed reaction, (b) comparison of a pseudo-first order enzymatic reaction to a second order simple catalyst (such as OH-), and (c) the automatic exclusion of proximity effects if an intramolecular catalyst is used for comparison. Nevertheless, the rate enhancements of chymotrypsin and triose-phosphate isomerase are estimated at 10^8 to 10^12-fold (44).
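The rate enhancement can be related to the lowering of the activation energy by a short worked estimate. This is a sketch under assumptions that are not made in the text: transition state theory with the same pre-exponential factor for both reactions, rate constants of the same kinetic order (which, as noted under (b), is often not the case), and T = 298 K.

```latex
\[
k = \frac{k_B T}{h}\, e^{-\Delta G^{\ddagger}/RT}
\quad\Longrightarrow\quad
\frac{k_{\mathrm{cat}}}{k_{\mathrm{uncat}}} = e^{\Delta\Delta G^{\ddagger}/RT},
\qquad
\Delta\Delta G^{\ddagger}
  = \Delta G^{\ddagger}_{\mathrm{uncat}} - \Delta G^{\ddagger}_{\mathrm{cat}}
  = RT \ln\frac{k_{\mathrm{cat}}}{k_{\mathrm{uncat}}}
\]
\[
RT \approx 2.48\ \mathrm{kJ\,mol^{-1}}:\qquad
10^{8}\text{-fold} \;\Rightarrow\; \Delta\Delta G^{\ddagger} \approx 45.6\ \mathrm{kJ\,mol^{-1}},
\qquad
10^{12}\text{-fold} \;\Rightarrow\; \Delta\Delta G^{\ddagger} \approx 68.5\ \mathrm{kJ\,mol^{-1}}
\]
```

Under these assumptions, the quoted rate enhancements correspond to a transition state stabilisation of roughly 46 to 68 kJ/mol, which is the quantity that the various theories discussed here attempt to account for.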

Most of the current theories include Pauling's suggestion that an enzyme might be complementary to the transition state of a reaction, enhancing the reaction by binding the transition state and lowering the energy of activation. Some aspect of entropy is usually incorporated, although it can take many forms. Ideas of orbital steering (46, 47), solvation (48), low barrier hydrogen bonds (49), electrostatics, and pre-organized active sites all incorporate entropic factors. The second problem in explaining the catalytic power of enzymes is the determination of the magnitude of the contributions to the rate enhancement that these different effects have.

One of the most popular models to explain the catalytic power of enzymes is the model of the pre-organized active sites (50). In polar solvents half of the energy gained from charge-dipole interactions between a substrate and the solvent is spent on changing the dipole-dipole interaction needed to stabilize the transition state. In proteins, however, internal water molecules and ionized residues are already partially oriented toward the transition state charge center. Thus, the reorganization energy is lower and the enzyme is a more efficient catalyst than the solvent. One of the implications of this model is that electrostatic interactions within the active site are responsible for the majority of the stabilization of the transition state due to the pre-organized polar environment.

Figure 1.4. Abbreviated mechanism of thermolysin catalysis, as advanced by Matthews and co-workers (61, 62). The Zn2+-bound water molecule is displaced towards Glu143 upon substrate binding. The tetrahedral intermediate, formed after a nucleophilic attack of the displaced water, coordinates to the Zn2+, forming a tetrahedral transition state. His231 acts as proton donor in the subsequent cleavage step. See text for further details. The figure was reproduced from (63).

Figure 1.5. Summary of thermolysin catalysis, as advanced by Mock and co-workers (63, 64). The Zn2+-bound water molecule is displaced upon substrate binding, and is activated by His231 to perform a nucleophilic attack. His231 performs a crucial proton donation in the subsequent cleavage step. This figure was reproduced from (63).


1.5 Catalysis in thermolysin-like proteases.

The amino acid sequences of several TLPs have been determined [see (51), or the Merops data base at http://www.merops.co.uk/merops/famcards/m4.htm], and the three-dimensional structures of TLPs isolated from several bacteria have been solved, i.e. Bacillus thermoproteolyticus (52), Bacillus cereus (53, 54), Pseudomonas aeruginosa (55) and Staphylococcus aureus (56). TLPs consist of an α-helical C-terminal domain and an N-terminal domain mainly consisting of β-strands. The domains are connected by a central α-helix. The active site is located in the cleft between these two domains, and the catalytically essential Zn2+ ion is located at the bottom of this cleft. The Zn2+ ion is co-ordinated by His142, His146 and Glu166.

In a significant number of published structures in which TLN was co-crystallised with inhibitors (52, 57-61), the residues involved in catalysis could be identified. Matthews and co-workers proposed a mechanism, presented in Fig. 1.4, in which the water molecule bound to the Zn2+ is displaced to Glu143 upon substrate binding. The water molecule is then activated by Glu143 and performs a nucleophilic attack on the carbonyl group in the scissile bond. The carbonyl group is polarised by Tyr157, His231 and the Zn2+. The nucleophilic attack leads to a tetrahedral intermediate which co-ordinates the Zn2+ with both oxygens. The tetrahedral intermediate is stabilised by Glu143, Tyr157 and His231. In this mechanism, His231 acts as proton donor in the subsequent cleavage step, and the side chain of Asn112 and the carbonyl oxygen of residue 113 stabilise the newly formed amino group (62).

This mechanism was questioned by Mock and co-workers (63, 64). Their experiments suggested that the pH-dependence of kcat and Km was incompatible with the roles proposed for Glu143 and His231. Instead, Mock et al. proposed a reverse protonation mechanism in which the acidic limb of the pH profile is determined by a water molecule or hydroxide ion bound to the Zn2+ ion, whereas the basic limb is determined by His231 (Fig. 1.5). According to Mock et al. the kcat increases with increasing pH and has a pKa of 8.3. This would mean that the kcat depends on a deprotonated His231. In that case His231 cannot be a proton donor but must be a proton acceptor.

Site-directed mutagenesis studies have confirmed the importance of various residues in the active site. Beaumont et al. (65) showed that mutating His231 in thermolysin resulted in a drastic decrease in activity and a greatly reduced pH dependence of activity in the alkaline range. Toma et al. (66) confirmed the importance of Glu143 and His231 in B. subtilis TLP (TLP-sub) by showing that mutation of these residues resulted in a drastic reduction of activity. However, results from mutagenesis studies seem to favour the less critical role of His231, as proposed by Matthews and co-workers (62, 65).

1.6 Changing the pH-activity profile of thermolysin-like proteases.

The pH-profile of an enzyme is a rather broad term, which can have several meanings. In this thesis the following definitions are used:

• pH-performance profile: The specific activity measured by a given assay as a function of pH under a given set of conditions.

• pH-activity profile (kcat-profile or kcat/Km-profile): The kcat or kcat/Km as a function of pH under a given set of conditions, excluding any effects from stability on these parameters.

• pH-stability profile: The stability of the enzyme as a function of pH under a given set of conditions.

The pH-performance profile (which is often called the pH-profile) of an enzyme is, in fact, a combination of the pH-activity profile and the pH-stability profile under the conditions where the assay is performed. To be able to determine the pH-activity profile, assay conditions have to be used that do not affect the pH-stability.

The use of furylacryloyl (Fa) substrates, particularly Fa-Gly-Leu-Ala (FaGLA), allows the determination of initial rates of substrate degradation. As long as the initial rate shows a linear relation between time and substrate conversion, it may be assumed that the enzyme is stable during the time of the assay and, therefore, that the pH stability remains unchanged. The effects measured are therefore changes in the pH-activity profile.

The dependence of the Michaelis-Menten parameters on the pKa values of the active site residues in either the substrate-bound or substrate-free form can be derived using kinetic equations (67). The kcat depends only on the pKa values of the active site groups when the substrate is bound. The Km depends on the pKa values of both the substrate-bound and substrate-free form of the enzyme. Information on the pKa values in the substrate-free form of the enzyme is obtained by measuring kcat/Km.
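The relation between active site pKa values and a pH-activity profile can be illustrated with the commonly used two-pKa (bell-shaped) model. The sketch below is only an illustration under that assumption and is not the derivation given in (67); the function name `ph_activity` and all pKa values are hypothetical. Applied to kcat, the two pKa values would refer to the substrate-bound enzyme; applied to kcat/Km, to the substrate-free enzyme.

```python
def ph_activity(ph, k_max, pka_acid, pka_base):
    """Bell-shaped pH-activity profile.

    Assumes the active form needs one group deprotonated (pka_acid,
    acidic limb) and one group protonated (pka_base, basic limb).
    """
    return k_max / (1.0 + 10.0 ** (pka_acid - ph) + 10.0 ** (ph - pka_base))


# Hypothetical pKa values, chosen only for illustration.
for ph in (4.0, 5.0, 6.5, 8.0, 9.0):
    rel = ph_activity(ph, k_max=1.0, pka_acid=5.0, pka_base=8.0)
    print(f"pH {ph:4.1f}: relative activity = {rel:.3f}")
```

In this model the activity is half-maximal close to each pKa, so a shift in either pKa moves the corresponding limb of the profile.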

To change the pH-activity profile, the pKa values of the active site residues have to be changed. The pKa values of titratable groups in proteins can be changed by altering their environment. Three possible changes in the environment of a titratable group and their effects on a positively and negatively charged group are shown in Fig. 1.6. The change in pKa upon placing an acid in a negative, positive or hydrophobic environment can be summarized in the following way:

Placing a negative charge close to the acid will give an unfavourable interaction with the negative charge of the charged form of the acid. Compared to the situation in water this means that the charged state of the acid becomes less favourable than the neutral state. The pKa value of a titratable group is the pH value where the group is half-protonated. Placing a negative charge close to an acid will make the acid keep its proton longer upon raising the pH, compared to the situation in water, and the pKa of the acid will, therefore, be elevated. Placing a positive charge next to the acid will give a favourable interaction with the negative charge of the deprotonated acid. Thus the acid will lose its proton sooner with increasing pH than it would have done in water. Consequently, the pKa value of an acid would be lowered when a positive charge is placed in its close proximity.

Figure 1.6. Environmental effects on the pKa. The effect on the pKa value of an acid (exemplified by a glutamic acid in panel A) and on a base (exemplified by a lysine in panel B). The effect on the pKa values of inserting a charged residue nearby is the same for the acid and the base, namely an upward shift in the pKa upon the insertion of a negative charge, and a downward shift in the pKa upon insertion of a positive charge. Transfer of an acid to a hydrophobic environment gives an upward shift in the pKa value, whereas a hydrophobic environment causes a base to shift its pKa value downwards (67).

Generally, it is unfavourable for a charged residue to reside in a hydrophobic environment as compared to residing in water. Placing an acid in a hydrophobic environment will therefore have the same effect as placing it in a negative environment. The pKa value of the acid will therefore be elevated in a hydrophobic environment. The effect of the environment on the pKa of a base can be deduced in the same way as for an acid. For a base, however, it should be noted that the charged form is predominant at low pH values, whereas for the acid the charged form is dominant at high pH values.
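The qualitative argument above can be made concrete with the Henderson-Hasselbalch relation. The following is a minimal sketch: the pKa values (4.5 in water, shifted by one unit in either direction) and the pH of 5.0 are hypothetical and serve only to show the direction of the effect for an acid.

```python
def fraction_deprotonated(ph, pka):
    """Henderson-Hasselbalch: fraction of a titratable group without its proton.

    For an acid this is the charged fraction; for a base the charged
    fraction is 1 - fraction_deprotonated.
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


ASSAY_PH = 5.0  # hypothetical pH
environments = [
    ("acid in water", 4.5),
    ("acid near a negative charge or in a hydrophobic site", 5.5),
    ("acid near a positive charge", 3.5),
]
for label, pka in environments:
    charged = fraction_deprotonated(ASSAY_PH, pka)
    print(f"{label}: pKa {pka:.1f}, charged fraction {charged:.2f}")
```

At the same pH, the acid with the elevated pKa retains its proton in a larger fraction of the molecules, whereas the acid with the lowered pKa is almost fully deprotonated.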

Although it is well established that pKa values can be modified by adjacent residues (68), rational modification of the pH-performance profile, or pH-optimum, has proven to be difficult. In the most successful studies, natural variants with a different pH-optimum have been used to design mutations close to the active site (69, 70). This approach is limited to those cases where well characterised, highly homologous proteins with a different pH optimum are available. Even then success is not guaranteed, as mutations in and close to the active site often produce strong negative effects on the activity (71-73). An alternative approach to mutations close to the active site is the modification of the surface charge of the enzyme (72-74). Since the electrostatic interaction between a charged group and a titratable group is a long-range interaction, mutations outside the active site should be able to influence the pKa values of the active site residues. In this respect, Russell et al. (74) have shown that mutating residues as far as 15 Å from the active site of subtilisin caused a shift in the pKa values of the active site residues of approximately 0.4 pH units. However, even the complete reversal of the surface charge does not necessarily have an effect on catalysis (75).
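The long-range character of such electrostatic effects can be illustrated with a simple Coulomb estimate. This is a rough sketch and not the analysis used in (74): it treats the mutated residue as a point charge in a uniform medium, and the effective dielectric constants used below are assumptions, since the appropriate value inside and around a protein is uncertain.

```python
import math

COULOMB_KCAL = 332.06  # Coulomb constant in kcal*Angstrom/(mol*e^2)
R_KCAL = 1.987e-3      # gas constant in kcal/(mol*K)


def pka_shift(charge, distance_angstrom, dielectric, temperature=298.15):
    """Estimated pKa shift of a titratable group caused by a point charge.

    The interaction energy between the point charge and the proton is
    converted to pH units. A negative charge gives a positive (upward)
    shift, a positive charge a negative (downward) shift.
    """
    energy = COULOMB_KCAL * charge / (dielectric * distance_angstrom)
    return -energy / (math.log(10) * R_KCAL * temperature)


# Hypothetical effective dielectric constants for a charge at 15 Angstrom.
for eps in (20.0, 40.0, 80.0):
    print(f"dielectric {eps:4.0f}: shift = {pka_shift(-1.0, 15.0, eps):+.2f} pH units")
```

Under these assumptions a single charge at 15 Å gives a shift of a few tenths of a pH unit, which is the order of magnitude reported for subtilisin.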

1.7 Engineering thermal stability of thermolysin-like proteases.

For an extensive review on the engineering of the thermal stability determinants of TLPs see (76). Bacillus strains display large differences in optimum growth temperature and the stabilities of their TLPs vary accordingly (77). The thermal stability of the thermolysin-like protease from B. stearothermophilus (TLP-ste) is considerably lower than that of thermolysin, which is the most stable TLP. Initial protein engineering studies of TLP-ste were aimed at stabilising the enzyme by mutations designed on the basis of general principles of protein stability (78). On the basis of these results, it was suggested that thermal inactivation of TLP-ste is governed by local unfolding processes that involve only parts of the molecule. Mutations with large effects on stability were located in or near a surface loop in the N-terminal domain, spanning residues 56 through 69. Combining five stabilising TLP-ste→thermolysin mutations (A4T, T56A, G58A, T63F, A69P), four of which are located in the stability-determining surface loop, resulted in an enzyme that was considerably more stable than thermolysin itself (79).

An important consequence of the fact that stability-determining unfolding processes in TLP-ste have a local character is that mutational effects may display extreme non-additivity (80). To illustrate this, Vriend et al. (81) created four pseudo wild-types of TLP-ste in which a second unfolding region had been created by mutations in the C-terminal domain of the enzyme. After making these pseudo wild-types, the effects of stabilising the two regions individually or simultaneously were studied. The results obtained with these mutants show the "enough is enough" effect (80). This effect means that it does not help to stabilise a region of the protein once this region has become so stable that its unfolding no longer contributes to the overall thermal inactivation process. It is interesting to note that mutations in the hydrophobic core of the C-terminal domain that had surprisingly marginal effects on the stability of wild-type TLP-ste became important once the N-terminal unfolding region had been considerably stabilised. In other words, the contribution of a second unfolding process (involving the hydrophobic core of the C-terminal domain) to the overall rate of thermal inactivation becomes noticeable after the contribution of the other major unfolding process (involving the 56-69 region) has been reduced by stabilising mutations.
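The "enough is enough" effect can be illustrated with a deliberately simplified kinetic picture in which two independent local unfolding processes contribute in parallel to inactivation, so that their rates add. This additive model and all rate values below are assumptions made for illustration; they are not taken from (80, 81).

```python
def overall_rate(local_rates):
    """Overall inactivation rate for independent, parallel local unfolding processes."""
    return sum(local_rates)


K_REGION_A = 1.0   # hypothetical rate of the dominant unfolding region (arbitrary units)
K_REGION_B = 0.01  # hypothetical rate of a second, slower unfolding region

# Stabilise region A step by step and watch the overall rate level off.
for stabilisation in (1, 10, 100, 1000, 10000):
    k_total = overall_rate([K_REGION_A / stabilisation, K_REGION_B])
    print(f"region A stabilised {stabilisation:>5d}-fold: overall rate = {k_total:.4f}")
```

Once the rate of region A has dropped well below that of region B, further stabilisation of region A hardly changes the overall rate, and mutations in region B become the ones that matter.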

The most spectacular example of a designed stabilising mutation in the 56-69 region in TLP-ste was an engineered disulphide bridge between residues 8 and 60 which increased the T50 by as much as 16.7 °C (82). This result contrasts with the rather marginal stability effects that were obtained upon introducing a variety of designed disulphide bridges in the broad-specificity protease subtilisin (83). Eijsink et al. (76) propose that the lack of success of the engineered disulphide bridges in subtilisin is at least partly due to the fact that these bridges were introduced into regions of the protease that do not play a dominant role in stability-determining local unfolding processes.

Combination of the five stabilising TLP-ste→thermolysin mutations (A4T, T56A, G58A, T63F, A69P) with the designed mutations S65P, G8C, and N60C yielded one of the most stable proteins ever obtained by protein engineering (84, 85). This 8-fold variant had a half-life of 170 minutes at 100 °C and was named boilysin™. Boilysin was stable for at least 24 hours at 90 °C, a temperature at which the half-life of the original enzyme (wild-type TLP-ste) was approximately one minute. It was shown that, in contrast to wild-type TLP-ste, boilysin tolerates considerable amounts of urea, guanidinium-HCl and SDS. For example, the enzyme retained approximately 60 % of its activity in the presence of 5 M urea and 40 % of its activity in the presence of 1 % (wt/vol) SDS (84).
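To put the reported half-lives in perspective, they can be converted to rate constants and residual activities. The sketch below assumes simple first-order inactivation, which is an assumption made here for illustration; the half-life of 170 minutes is the value quoted above for boilysin at 100 °C, and the incubation times are arbitrary.

```python
import math


def rate_constant(half_life_min):
    """First-order inactivation rate constant (per minute) from a half-life."""
    return math.log(2) / half_life_min


def fraction_remaining(time_min, half_life_min):
    """Fraction of active enzyme left after time_min, assuming first-order decay."""
    return math.exp(-rate_constant(half_life_min) * time_min)


BOILYSIN_HALF_LIFE_100C = 170.0  # minutes, value quoted in the text

for t in (30.0, 60.0, 170.0):
    left = fraction_remaining(t, BOILYSIN_HALF_LIFE_100C)
    print(f"after {t:5.0f} min at 100 °C: {left:.2f} of the activity remains")
```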

1.8 Engineering hinge bending in thermolysin-like proteases.

Many proteins undergo hinge-bending motions during catalysis (86-90). Hinge-bending is the motion of two domains of an enzyme around a hinge point or hinge axis. Through this hinge movement one domain of an enzyme can close onto another to isolate a substrate from the environment. In several cases glycine residues have been shown to be crucial for providing a protein with the necessary (hinge-bending) flexibility (54, 89, 90). Van Aalten et al. (91) showed that mutation of a glycine residue in the proposed hinge region in retinol binding protein indeed dramatically reduced retinol binding.

Originally, the crystal structure of thermolysin was supposed to be that of the pure enzyme (92, 93). When the crystal structure of B. cereus TLP (TLP-cer) became available, it was noticed that the active site cleft of this enzyme was more open than that in thermolysin, resulting from a hinge-bending between the N- and C-terminal domains (54, 86). Puzzled by this observation, Holland et al. (86) re-examined the original crystallographic data for thermolysin and TLP-cer. They concluded that the structure of thermolysin was not that of the free enzyme, but of the enzyme containing a dipeptide in the active site. This peptide was not present in the TLP-cer structure. These observations suggested that a hinge-bending motion is part of the catalytic mechanism of TLPs, and that substrate binding yields a closure of the active site (86). Interestingly, a hinge-bending motion quite similar to the one proposed by Holland et al. was observed when concerted motions in thermolysin were studied using so-called 'essential dynamics' analysis of molecular dynamics simulations of thermolysin (89). Originally, Stark et al. (54) suggested that the hinge-bending region in TLP-cer and thermolysin is located around glycines 135 and 136. This is an attractive proposal, since it places the hinge at the beginning of the α-helix connecting the two domains between which hinge-bending occurs. In later studies, Holland et al. proposed a more complex scheme, in which bending of the N-terminal α-helix, around Gly78, plays a prominent role. On the basis of their molecular dynamics simulations, Van Aalten et al. suggested the hinge region in thermolysin to be at both ends of the α-helix connecting the N- and C-terminal domain (residues 135 and 136 are on the N-terminal end of the connecting α-helix).

The alignment of several TLPs [see (54) or http://www.merops.co.uk/merops/famcards/m4.htm] shows that Gly78 and several glycines in the α-helix connecting the two domains are conserved in certain groups of TLPs. Residue 135 is the most conserved glycine; other glycines are less well conserved, but a correlation exists between the disappearance of certain glycines and the appearance of others. For example, a glycine at 147 is conserved in those TLPs that have no glycine at position 136. All these glycine residues (78, 135, 136, 141, 147, 154) have dihedral angles that are compatible with some non-glycine amino acid residues and are therefore unlikely to represent residues exclusively required for the 3D structure of the enzyme. Rather, they may be required for hinge-bending.

1.9 Engineering substrate specificity and activity.

Rational design of substrate specificity is one of the main goals of protein engineering. Exploitation of enzymes in industry would be facilitated by the ability to rationally modify the substrate specificity of an enzyme. Consequently, an extensive body of literature exists on engineering the substrate specificity of proteases. The subsite and substrate nomenclature regarding specificity is summarized in Fig. 1.7 (94). Three examples of specificity determinants are discussed below.
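The P/S nomenclature is easy to apply mechanically: residues are counted outward from the scissile bond, unprimed towards the N-terminus and primed towards the C-terminus. The helper below is a small sketch of that convention; the function name `label_positions` and the example peptide are hypothetical.

```python
def label_positions(peptide, scissile_index):
    """Label substrate residues relative to the scissile bond.

    peptide: residue names listed from N- to C-terminus.
    scissile_index: number of residues on the N-terminal side of the
    cleaved bond (the bond lies between index scissile_index - 1 and
    index scissile_index).
    """
    labels = {}
    for i, residue in enumerate(peptide):
        if i < scissile_index:
            labels[f"P{scissile_index - i}"] = residue
        else:
            labels[f"P{i - scissile_index + 1}'"] = residue
    return labels


# Hypothetical tetrapeptide cleaved between Gly and Leu.
print(label_positions(["Ala", "Gly", "Leu", "Phe"], scissile_index=2))
```

Each residue Pn (or Pn') is taken to bind in the corresponding enzyme subsite Sn (or Sn').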

Figure 1.7. Subsite and substrate nomenclature of proteases. The amino acids of the substrate are counted from the cleavage site toward the N-terminus as P1, P2 etc., and towards the C-terminus as P1', P2' etc. The corresponding binding pockets on the enzyme are called S1, S2 and S1' and S2'.

The first and most common example is the engineering of the substrate binding pockets to change the substrate specificity. Mei et al. (95) replaced glycines in the S1 subsite of subtilisin YaB by larger residues such as alanine and valine. This resulted in an increase in activity towards substrates with a P1 Ala and a sharp decrease in activity towards substrates with a P1 Phe or Leu. Many other examples exist in which the preference for large hydrophobic substrates was diminished by reducing the substrate binding pocket size through the replacement of small binding pocket residues by larger residues (95-97) or by blocking the entrance of the binding pocket (98).

The second example concerns the conversion of trypsin to chymotrypsin, one of the most thoroughly studied and now best understood systems for the structural basis of substrate specificity in the serine proteases (99-101). Trypsin (EC 3.4.21.4) and chymotrypsin (EC 3.4.21.1) both belong to the S1 peptidase family and catalyze peptide bond cleavage by identical mechanisms. A serine residue acts as a nucleophile and the catalytic residues are in the order His, Asp, Ser in the primary sequence. Both enzymes are endopeptidases and possess very similar tertiary structures consisting of two juxtaposed six-stranded β-barrel domains (102, 103). The substrate specificity of trypsin, expressed in relative kcat/Km values, is nearly 10^6-fold higher for P1 Arg or Lys containing substrates compared to the activity towards analogous P1 Phe containing substrates. Conversely, chymotrypsin favours peptide substrates possessing Trp, Tyr and Phe at the P1 position, with an overall specificity relative to P1 Lys substrates of up to 10^4-fold.
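Ratios of kcat/Km values of this size can be translated into differences in transition-state stabilisation using the standard relation ΔΔG = RT ln(ratio). The snippet below is a small illustrative calculation; the temperature of 298.15 K is an assumption, and the ratios are the approximate values quoted above.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ddg_from_ratio(ratio, temperature=298.15):
    """Free energy difference (kcal/mol) corresponding to a ratio of kcat/Km values."""
    return R_KCAL * temperature * math.log(ratio)


for label, ratio in [("trypsin, P1 Arg/Lys vs Phe", 1e6),
                     ("chymotrypsin, P1 aromatic vs Lys", 1e4)]:
    print(f"{label}: about {ddg_from_ratio(ratio):.1f} kcal/mol")
```

Every factor of ten in kcat/Km corresponds to roughly 1.4 kcal/mol at this temperature, so a 10^6-fold preference amounts to about 8 kcal/mol.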

Since the structures of the S1 subsites of the two enzymes are very similar, the difference in substrate specificity was thought to be a simple property of the local electrostatic environment. However, replacement of the primary binding determinant Asp189 of trypsin with the analogous Ser189 of chymotrypsin failed to convert the specificity but, instead, resulted in a poorly performing nonspecific protease (104). Conversion of trypsin to a chymotrypsin-like protease required the substitution of four residues in the S1 subsite together with the exchange of two adjacent surface loops, which do not directly contact the substrate (105). Inspection of the crystal structures of the wild-type trypsin and chymotrypsin and those of several mutants revealed the specificity determinants involved (99). The conserved Gly216, which contacts the P3 residue in both trypsin and chymotrypsin, turned out to be crucial for correct positioning of the substrate in the active site. The different structures of the surface loops in trypsin and chymotrypsin maintain Gly216 in distinct conformations, enabling this residue to function as a specificity determinant despite being conserved in both proteases.

The study of the trypsin-chymotrypsin system has led to a definition of two types of specificity determinants (99): primary specificity determinants, encompassing amino acids that directly contact the substrate, and secondary specificity determinants, which are more distantly located elements in the protein. The secondary determinants can act through various mechanisms such as influencing the conformation of primary determinants, as in the case of Gly216 in trypsin and chymotrypsin, or by modulating the degree of flexibility in the substrate binding site. Examples of the latter can be found in elastase (106, 107) and coenzyme A transferase (108). The existence of secondary specificity determinants implies that substrate specificity is not necessarily determined by a limited set of amino acids in the substrate binding pockets. Instead, substrate specificity can be a globally distributed property determined by a large part of the protein fold.

The third example also concerns a specificity determinant that is not located in a subsite, and relates to the S10 family of serine carboxypeptidases. Carboxypeptidases (CPDs) catalyze the removal of amino acids from the C-terminus of peptide substrates. The S10 family of serine carboxypeptidases is a group of eukaryotic proteases that, based on their primary structures, can be divided into three groups (109), namely those that have an S1 pocket environment similar to that of CPD-C, those that have an S1 pocket similar to that of CPD-D and a small group of unassignable proteases. All CPD-D like proteases preferentially hydrolyze substrates with a P1 Lys as compared to analogous Leu containing substrates. Among the CPD-C carboxypeptidases some are selective for P1 Leu, others for P1 Lys.

Unexpectedly, the comparison of primary structures showed that the substrate binding pocket itself is fully conserved in all S10 family members, offering no explanation for the differences in substrate specificity. However, three residues around the S1 pocket were not conserved. Mutation of these residues and analysis of their effects showed that the preference for a P1 Lys originated from the accessibility of the P1 side chain in the pocket to water, not from a direct interaction between the protein and the P1 side chain of the substrate (109-112).

The examples referred to above illustrate that the substrate binding pockets play an important role in determining the substrate specificity. However, other residues outside the binding pockets can influence the substrate specificity as well.

1.10 Binding modes and geometric effects in hydrophobic subsites.

A number of studies have shown that hydrophobic binding pockets can display complex substrate binding behaviour (113). Different amino acids can show different binding modes in which substrates interact with different residues in a hydrophobic binding pocket. Furthermore, examples exist in which neighbouring amino acids in the substrate influence the exact conformation of a substrate amino acid in a binding pocket. This is illustrated by the following two examples.

Table I.I. Dependence of the activity of thermolysin on the fluorogenic group of the substrate.

kcat/Km (s^-1 M^-1 × 10^-3)

Substrate       X = Fa-Gly (a)   X = Cbz-Gly (b)   X = Aaf (c)
X-Leu-NH2       22               5.1               0.0054
X-Leu-Gly-OH    83               6.1               0.020
X-Leu-Phe-OH    300              50                0.10
X-Leu-Ala-OH    870              78                0.47
X-Leu-Leu-OH    n.d.             144               0.58

P2' specificity (ratio of kcat/Km)

Ratio           X = Fa-Gly (a)   X = Cbz-Gly (b)   X = Aaf (c)
Gly/NH2         3.8              1.2               3.7
Phe/Gly         3.6              8.2               5.0
Ala/Phe         2.9              1.6               4.7
Leu/Ala         n.d.             1.8               1.2
Ala/NH2         39.5             15.3              87
Leu/NH2         n.d.             28                107

(a) data from (116), (b) data from (115), (c) data from (64). Fa = 3-(2-Furyl)acryloyl, Cbz = benzyloxycarbonyl, Aaf = N-4-methoxyphenylazoformyl, n.d. = not determined.


The first of these examples concerns subtilisin Lentus, also known as savinase, which is a close relative of subtilisin BPN' (EC 3.4.21.62) and belongs to the peptidase family S8. Investigations into the origin of the specificity of this subtilisin for substrates with large hydrophobic P4 residues revealed that not all substrates interact with the same pocket residues (113). The orientation of a P4 Leu differs from that of a P4 Phe, such that they interact with different residues of the S4 pocket. The differences in interaction enabled the selective increase in specificity towards either P4 Leu or P4 Phe by selectively mutating pocket residues without changing the activity towards other substrates (98, 111, 113).

The second example concerns peptide substrates labeled with a fluorogenic group, which are widely used to determine the substrate specificity of a protease. An often ignored problem in using such peptides is the fact that the fluorogenic group can have a considerable effect on substrate preference. The effects of different fluorogenic groups became apparent when the substrate specificity of TLN was determined with Morihara-Tsuzuki peptide substrates that only differed in their fluorogenic group (114). Table I.I clearly demonstrates this effect.

The Morihara-Tsuzuki peptide series is a set of five peptides commonly used to determine the P2' preference of TLPs (115). The data in Table I.I show that the specificity for these peptides strongly depends on the nature of the fluorogenic group. This range of peptides with a variable P2' residue has been used with three different fluorogenic groups, indicated in the table. With Cbz as the fluorogenic group the activity increased only 28-fold when the P2' NH2 was changed to P2' Leu, which is in sharp contrast with the 107-fold increase in activity when Aaf was used as the fluorogenic group. The effect of the fluorogenic group probably originates from the fact that the group itself occupies a binding pocket, which in TLPs is usually either the S2 or S1 subsite. The binding of an amino acid or fluorogenic group in a subsite can influence the binding mode of the rest of the substrate, resulting in a change in activity.
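The specificity ratios quoted from Table I.I follow directly from the kcat/Km values in the same table. The snippet below recomputes them as a consistency check; the values are transcribed from Table I.I, while the dictionary layout and the function name `specificity_ratio` are merely illustrative choices.

```python
# kcat/Km in s^-1 M^-1 x 10^-3, transcribed from Table I.I (None = not determined).
KCAT_KM = {
    "Fa-Gly":  {"NH2": 22.0,   "Gly": 83.0,  "Phe": 300.0, "Ala": 870.0, "Leu": None},
    "Cbz-Gly": {"NH2": 5.1,    "Gly": 6.1,   "Phe": 50.0,  "Ala": 78.0,  "Leu": 144.0},
    "Aaf":     {"NH2": 0.0054, "Gly": 0.020, "Phe": 0.10,  "Ala": 0.47,  "Leu": 0.58},
}


def specificity_ratio(group, numerator, denominator):
    """Ratio of kcat/Km for two P2' residues with the same labelling group."""
    top = KCAT_KM[group][numerator]
    bottom = KCAT_KM[group][denominator]
    if top is None or bottom is None:
        return None  # ratio cannot be computed when a value was not determined
    return top / bottom


for group in KCAT_KM:
    ratio = specificity_ratio(group, "Leu", "NH2")
    text = "n.d." if ratio is None else f"{ratio:.0f}-fold"
    print(f"{group}: Leu/NH2 = {text}")
```

The same P2' substitution thus gives a roughly four times larger effect with Aaf than with Cbz-Gly, which is the dependence on the labelling group that the text describes.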

The examples referred to above illustrate that the configuration of a substrate is important for the activity of the enzyme. The examples also illustrate that the conformation of a particular side chain of the substrate may depend on the other amino acids of the substrate. Thus, if peptide substrate data are used to discuss any mechanism or change in specificity of a protease, these effects should be taken into account.

1.11 Chemical modification and substrate specificity.

Prior to the development of site-directed mutagenesis techniques only chemical methods were available to protein chemists to alter enzyme properties (116). One of the main problems of chemical modification of enzymes is that the extent and precise location of the modifications often remain uncertain because most reagents are unspecific. Furthermore, heterogeneous mixtures are often produced. Despite such disadvantages, combining chemical modification and site-directed mutagenesis provides a unique handle for protein modification, because chemical modification of the substituted amino acid offers the possibility to introduce virtually any desired molecule at a specific site in the protein. This approach allows one to introduce unnatural amino acid side chains and to circumvent the limitations in structural variation imposed by the occurrence of only 20 natural amino acids.

An example of the combined mutagenesis and chemical modification approach to modify the substrate specificity of subtilisin concerns the introduction of a unique cysteine in the S1 binding pocket, followed by its chemical modification with methanethiosulfonate reagents to generate chemically modified mutant enzymes (116-118). A potential problem with this approach is the size reduction of the binding pocket due to the introduced chemical modification. Studies with subtilisin indeed showed some size exclusion effects (118), although a proper choice of the modification site could avoid some of these problems. The various available reagents offer the possibility to introduce novel functionalities in a binding pocket, such as multiple negative charges. In this way encouraging results concerning the alteration of substrate specificity have already been obtained (119).

Another possible application of chemical modification of proteins is the production of glycosylated heterologous proteins by prokaryotes. A persistent problem of eukaryotic gene expression in prokaryotes is the lack of glycosylation of the expressed proteins. Regioselective glycosylation of subtilisin was obtained through site-directed mutagenesis and subsequent chemical modification after purification of the protein (120, 121). This is an important advance for the production of properly glycosylated eukaryotic proteins in prokaryotes.

1.12 Scope of this thesis.

This thesis describes the engineering of the activity and substrate specificity of several thermolysin-like proteases from Bacillus. The majority of the experiments were performed with the thermolysin-like protease from Bacillus stearothermophilus (TLP-ste) and thermolysin from Bacillus thermoproteolyticus (TLN).

After a general introduction in chapter 1, chapter 2 describes the characterization of TLPs with peptide substrates, as well as HPLC analysis of β-casein digests. The results indicate that the M4 family is a homogeneous family in terms of catalysis, even though there is a significant degree of amino acid sequence variation. The results of this study show that differences in substrate specificity within the M4 family do not correlate with overall sequence differences but depend on a small number of identifiable amino acids. Indeed, molecular modeling, followed by site-directed mutagenesis of one of the substrate binding pocket residues of the TLP of B. stearothermophilus, converted the catalytic characteristics of this variant into those of thermolysin.

Chapter 3 shows the importance of conserved glycines in the proposed hinge-bending regions by analyzing the effects of Gly→Ala mutations on catalytic activity. Comparisons of effects on kcat/Km for various substrates with effects on the Ki for phosphoramidon suggested that the mutation at position 78 primarily had an effect on substrate binding, whereas the mutations at positions 135 and 136 primarily influenced kcat. The apparent importance of conserved glycine residues in proposed hinge-bending regions for TLP activity supports the idea that hinge-bending is an essential part of catalysis.

Chapter 4 describes the properties of the major specificity-determining hydrophobic S1' pocket. The results indicate that the S1' Phe/Leu preference can be changed by increasing the activity towards substrates with a P1' Phe. In addition, the results obtained with TLN and TLP-ste support the quality of the TLP-ste model and indicate that the substrate preference of all TLPs can be modified in a manner similar to that of TLP-ste. The 16-fold increase in activity of the Leu202Tyr mutant towards a P1' Phe containing substrate is one of the highest found in the literature for a single mutant.

Chapter 5 examines the possibility of changing the active site electrostatics of the thermolysin-like protease from B. stearothermophilus by inserting or removing charges on the protein surface by site-directed mutagenesis. The results show that the effects on the kcat/Km of single point mutations are non-additive, even in cases where the point mutations are 10 Å or more removed from the active site Zn2+ and separated from each other by up to 25 Å. This suggests that electrostatic networks are probably more complex than previously thought and that possibly other effects, such as active site dynamics, play an important role in determining the active site electrostatics. Several mutations caused a significant increase in enzyme activity, the most active mutant being almost four times as active as the wild-type. The shape of the pH-activity profile was changed significantly. Remarkably, this was achieved without large changes of the pH-optimum of the enzyme.
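
The notion of non-additive effects on kcat/Km can be made concrete with a double-mutant-cycle style calculation. The following is a minimal sketch and not the analysis used in this thesis: the function names, the default temperature of 298.15 K, and the example ratios are illustrative assumptions. It converts the ratio (kcat/Km)mutant / (kcat/Km)wild-type into an apparent change in transition-state stabilization energy and reports how far a double mutant deviates from the sum of its single mutants.

```python
import math

R_KJ = 8.314462618e-3  # molar gas constant in kJ/(mol*K)


def ddg_transition_state(ratio, temperature_k=298.15):
    """Apparent change in transition-state stabilization energy (kJ/mol).

    ratio is (kcat/Km)_mutant / (kcat/Km)_wild_type. The default
    temperature is an assumption, not a value taken from the thesis.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return -R_KJ * temperature_k * math.log(ratio)


def coupling_energy(ratio_a, ratio_b, ratio_ab, temperature_k=298.15):
    """Deviation from additivity for a double mutant (kJ/mol).

    Returns ddG(AB) - [ddG(A) + ddG(B)]. A value of zero means the two
    mutations act independently, i.e. ratio_ab == ratio_a * ratio_b.
    """
    return (
        ddg_transition_state(ratio_ab, temperature_k)
        - ddg_transition_state(ratio_a, temperature_k)
        - ddg_transition_state(ratio_b, temperature_k)
    )


if __name__ == "__main__":
    # Hypothetical ratios, chosen only to illustrate the calculation.
    additive = coupling_energy(2.0, 1.5, 3.0)
    synergistic = coupling_energy(2.0, 1.5, 4.0)
    print(f"additive case:    {additive:+.3f} kJ/mol")
    print(f"synergistic case: {synergistic:+.3f} kJ/mol")
```

In this convention a negative coupling energy means that the double mutant is more active than expected from the single mutants, and a positive value means that it is less active than expected.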

Chapter 6 contains the summary and a general discussion of the results and conclusions of the present work.