
Supplementary Information

Rational Assignment of Key Motifs for Function Guide in silico Enzyme Identification

Matthias Höhne1,3, Sebastian Schätzle1,3, Helge Jochens1, Karen Robins2, Uwe T. Bornscheuer*1

1Institute of Biochemistry, Dept. of Biotechnology and Enzyme Catalysis, Greifswald University, Felix Hausdorff-Str. 4, 17487 Greifswald (Germany)

2Lonza AG, Valais Works, Visp (Switzerland)

3These authors contributed equally to this work.

Supplementary Results

Figure S1: View of the superimposed active sites of BCAT and DATA. The structures of DATA (pdb: 3DAA) and BCAT (pdb: 1IYE) are colored blue and yellow, respectively. Selected amino acid residues and the PLP-cofactor with bound D-alanine of DATA are shown as lines, residues from BCAT including PLP-bound L-glutamate as sticks. For reasons of clarity, only the α-carbon atom and the α-carboxyl group of the PLP-bound L-glutamate of BCAT are shown. Residues that are part of binding pocket A or B (Figure 1B) are highlighted in cyan and green, respectively. An asterisk indicates that a residue belongs to the other monomer of the dimeric enzyme. The numbering corresponds to the unified numbering scheme shown in the multiple sequence alignment in Figure S2.

[Figure S1 image; residue labels: Y95, R97, R107*, H109*, Y36, A263, T262, R/K40, PLP]

Nature Chemical Biology: doi: 10.1038/nchembio.447


Figure S2: Multiple structural sequence alignment of fold class IV PLP-dependent enzymes. ttBCAT: Thermus thermophilus BCAT (pdb: 1WRV); ecBCAT: E. coli BCAT (pdb: 1IYD); bsDATA: Bacillus species DATA (pdb: 3DAA); ecADCL: E. coli 4-amino-4-deoxychorismate lyase (ADCL, pdb: 1ET0). The alignment was performed using the ClustalW3D algorithm of the program STRAP. The color shading indicates the secondary structure: yellow for β-sheets, pink for α-helices. The catalytically important lysine residue is shaded in red, as is the catalytically important threonine residue of ADCL. The amino acids contributing to substrate recognition in binding pocket A (Figure S1, Figure 1B) are marked in cyan; amino acids that are important for binding the substrate in binding pocket B are highlighted in green. The first block of the sequence motif is located on a β-sheet, which ranges across the whole active site (Figure S1). This is why, within a short segment of five amino acids, different residues contribute to different binding pockets.



Figure S3: Section of the multiple sequence alignment of bacterial ADCL-sequences from the B6-Database. NP_251654 (Pseudomonas aeruginosa); NP_744071 (Pseudomonas putida); NP_718200 (Shewanella oneidensis MR-1); PABC_VIBCH (Vibrio cholerae); T12054 (Vibrio harveyi); ZP_00128240 (Pseudomonas syringae); CAE15204 (Photorhabdus luminescens); NP_455691 (Salmonella enterica subsp. enterica serovar Typhi); NP_903080 (Chromobacterium violaceum); NP_405184 (Yersinia pestis); NP_798430 (Vibrio parahaemolyticus); ZP_00089659 (Azotobacter vinelandii); PABC_ECOLI (Escherichia coli). Protein sequences with experimentally confirmed activity are shown in bold.



Figure S4: Section of the multiple sequence alignment of BCAT-sequences from the B6-Database. ILVE_SALTY (Salmonella typhimurium); ZP_00117422 (Cytophaga hutchinsonii); NP_715980 (Shewanella oneidensis); AAK79447 (Clostridium acetobutylicum); AAM72830 (Chlorobium tepidum); AAQ00003 (Prochlorococcus marinus); ZP_00071333 (Trichodesmium erythraeum); BAC96006 (Vibrio vulnificus); ILVE_BACSU (Bacillus subtilis); CAC12788 (Staphylococcus carnosus); AAO91867 (Bacillus cereus); AAO91868 (Bacillus anthracis); ILVE_ECOLI (Escherichia coli); ZP_00061646 (Clostridium thermocellum); ILVE_HELPY (Helicobacter pylori); AAF34406 (Lactococcus lactis); NP_216726 (Mycobacterium tuberculosis H37Rv); NP_607111 (Streptococcus pyogenes); YBGE_BACSU (Bacillus subtilis). Fungi: BCA2_YEAST (Saccharomyces cerevisiae); BCA1_YEAST (Saccharomyces cerevisiae); BCA1_SCHPO (Schizosaccharomyces pombe). Metazoa: AF184916 (Ovis aries); BCAT_CAEEL (Caenorhabditis elegans); BCAT_HUMAN (Homo sapiens); AAH48072 (Mus musculus); BCAM_HUMAN (Homo sapiens); BCAM_RAT (Rattus norvegicus); AAH59513 (Danio rerio); BCAT_RAT (Rattus norvegicus). Protein sequences with experimentally confirmed activity are shown in bold.



Figure S5: Section of the multiple sequence alignment of DATA-sequences from the B6-Database. DAAA_LISIN (Listeria innocua); AAY98539 (Geobacillus toebii); AAO91869 (Bacillus cereus); ZP_00015048 (Rhodospirillum rubrum); NP_243677 (Bacillus halodurans); DAAA_BACSU (Bacillus subtilis); CAB82475 (Staphylococcus aureus); DAAA_BACSP (Bacillus sp.); DAAA_STAHA (Staphylococcus haemolyticus); AAY98538 (Geobacillus sp. KLS-1); NP_692004 (Oceanobacillus iheyensis); NP_847638 (Bacillus anthracis); DAAA_LISMO (Listeria monocytogenes); NP_764978 (Staphylococcus epidermidis); DAAA_BACSH (Bacillus sphaericus); ZP_00023345 (Ralstonia metallidurans); CAE39908 (Bordetella parapertussis). Protein sequences with experimentally confirmed activity are shown in bold.



Figure S6: Flow chart for the sequence motif-based prediction of substrate specificity and

enantiopreference of PLP-dependent proteins of fold-class IV. In the first two steps of the algorithm,

query proteins are aligned to BCAT or DATA sequences and proteins that are likely to be non-functional

(e.g. truncated proteins) are removed. Next, proteins that can be annotated as BCAT, DATA or ADCL by

means of the respective sequence motifs are removed. In the last step of the algorithm, the remaining

proteins are examined as to whether they match criteria predicted for (R)-selective amine-TA activity.
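
The order of decisions in the flow chart can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each query is taken to be already reduced to its residues at key positions (unified numbering of Figure S2), and the function `screen`, its parameters and the label strings are hypothetical names introduced here. The actual tests for "likely functional", for the BCAT, DATA and ADCL motifs, and for the (R)-amine-TA criteria are passed in as functions and are not defined by this sketch.

```python
from typing import Callable, Dict

# A query protein reduced to the residues at key positions, keyed by the
# unified numbering of Figure S2, e.g. {38: "V", 40: "S"}. Producing this
# mapping (the alignment step) is outside the scope of the sketch.
Residues = Dict[int, str]
Predicate = Callable[[Residues], bool]


def screen(
    queries: Dict[str, Residues],
    looks_functional: Predicate,
    known_classes: Dict[str, Predicate],
    amine_ta_criteria: Predicate,
) -> Dict[str, str]:
    """Label each query in the order given by the flow chart (hypothetical helper)."""
    labels: Dict[str, str] = {}
    for name, residues in queries.items():
        # Steps 1-2: drop proteins that are likely non-functional.
        if not looks_functional(residues):
            labels[name] = "discarded"
            continue
        # Step 3: remove proteins that match a known class motif.
        hit = next(
            (cls for cls, test in known_classes.items() if test(residues)),
            None,
        )
        if hit is not None:
            labels[name] = hit
        # Step 4: examine what remains against the amine-TA criteria.
        elif amine_ta_criteria(residues):
            labels[name] = "candidate (R)-amine-TA"
        else:
            labels[name] = "unassigned"
    return labels
```

Passing the class tests in as a dictionary keeps the sketch independent of how the motifs are encoded; known activities are checked before the amine-TA criteria, mirroring the flow chart.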



Figure S7: Section of the multiple sequence alignment of DATA protein sequences found in the NCBI curated CDD Database. The NCBI curated CDD database lists 26 proteins that were annotated as DATA (http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?hslf=1&uid=cd01558&#seqhrch; as of Sept. 10th, 2009). Amino acids that are in accordance with the sequence motifs described here are highlighted in yellow. In some cases, we suggest a different annotation, since important amino acids correspond to the sequence motifs found for BCAT (colored in green) or amine-TA (colored in cyan). In other cases, an unambiguous annotation according to our sequence motifs is not possible, for example for gi_22299586, gi_22299586, and gi_26391648.



Table S1: Structural features used for annotation of PLP-fold class IV proteins

Conserved motifs considered important for substrate recognition and enantiopreference are described here. They were derived from multiple sequence alignments (Figures S3-S5) performed with the sequences deposited in the B6-database (ref. 1; http://bioinformatics.unipr.it/cgi-bin/bioinformatics/B6db/home.pl, as of March 1st, 2009). The numbering of amino acid residues in the following section corresponds to the unified numbering scheme shown in the multiple sequence alignment in Figure S2.

Most transaminases exhibit dual substrate recognition; for example, BCAT converts the hydrophobic amino acids valine and leucine, but also the acidic amino acid glutamate. Hence, to allow binding of acidic and hydrophobic side chains in the same binding pocket, transaminases developed complex structural features. For reasons of clarity, these dual substrate recognition mechanisms are not discussed here in detail. More structural information on dual substrate recognition can be found in a review (ref. 2).

Feature                   Explanation/remark

4-Amino-4-deoxychorismate lyase (ADCL); for a multiple sequence alignment of ADCL see Figure S3.

Thr38                     Conserved catalytic residue; plays a role in the catalytic mechanism for shuttling a proton in the transition state (see Fig. 2 in main text)

Arg97, Lys107             Binding of a carboxyl group of the substrate in binding pockets A and B

Phe36                     Conserved residue

107 RGY-motif             Conserved motif with unknown structural function. The presence of this motif may indicate a lyase activity; its absence, however, cannot be used to rule out a lyase activity.

Branched chain amino acid aminotransferase (BCAT); for a multiple sequence alignment see Figure S4.

95YzR-motif               Coordination of substrate α-carboxylate group in binding pocket B; most characteristic conserved amino acid sequence motif; z is one of the hydrophobic amino acids valine, isoleucine or leucine.

Arg/Lys 40                Activates the backbone-amide nitrogen of Thr262/Ala263

G38                       Conserved in order to provide enough space for the carboxyl oxygen atoms

107 ZGZ-motif, F36        Z is one of the hydrophobic amino acids valine, isoleucine, leucine or methionine; accommodation of the hydrophobic side chain of the substrate in binding pocket B

Y31                       Conserved residue; this tyrosine is involved in the dual substrate recognition of the γ-carboxyl group of L-glutamate in BCAT



D-Amino acid aminotransferase (DATA); for a multiple sequence alignment see Figure S5.

107RxH, Tyr36             Most important structural determinant in DATA, since these residues are important for coordination of the carboxyl group in binding pocket A. Tyr36 may also contribute to the α-carboxylate group coordination in binding pocket A.

95-xzYzQ-motif            Strongly conserved motif; x is any amino acid and z represents one of the hydrophobic amino acids valine, leucine, isoleucine or methionine; binding of the hydrophobic substituent in pocket B

F31                       Conserved residue near binding pocket A. This F31 differs from Y31 of BCAT. The reason might be that the hydroxyl group of a tyrosine would cause a steric clash with the side chain of Arg107 (in BCAT there is more space because of the smaller methionine107 residue, which additionally adopts a different configuration). Furthermore, this tyrosine is not needed for dual substrate recognition, since the γ-carboxyl group of glutamate has to be bound on the opposite side, close to binding pocket B, because of the D-configuration of glutamate.

V38                       Valine is found as a conserved residue in the DATA sequences, compared to G38 in BCAT and T38 in ADCL.

(R)-Amine-TA

95Y(F)Vz-motif            Binding of small alkyl substituent in binding pocket B; z preferably being glutamate, but the amino acids alanine, serine, glutamine and asparagine also occur.

40 ≠ Arg,Lys              Although the presence of R or K40 could not exclude an amine-TA activity, all sequences showed amino acids different from arginine or lysine at this position, namely His, Ser, Ala, Thr or Pro. This additionally supports a possible amine-TA activity.

105 and following residues   – a)

a) Binding of a carboxylate group in binding pocket A cannot be excluded, but excluding it is not necessary, since all amine-TAs described in the literature show dual substrate recognition. Alongside the hydrophobic residue, a carboxylate group also has to be recognized in binding pocket A (for example if alanine or pyruvate is bound). An arginine or lysine residue in positions 107–109 may be involved in this dual substrate recognition. Differentiating a possible amine-TA from a putative DATA is therefore not as straightforward as the differentiation from a BCAT, since an (R)-amine-TA is expected to convert the D-amino acid alanine. Thus the question is not whether the candidate amine-TAs convert D-amino acids, but whether their activity is restricted to D-amino acids. In the sequence alignments of the putative amine-TAs, clear differences from all known DATA are observed, since the 97-YxQ and 107-RxH motifs are absent in the putative amine-TA sequences.



Table S2: Summary of the most important amino acid differences in fold class IV PLP-dependent enzymes.

Position 31 36 38 40 95ff 105ff

ADCL F/Y F T X zxK RGY

DATA F Y V K(R/X) xzYzQ RxH

BCAT Y F G R(K) YzR zGz

Amine-TA H/R Y V/T S(T/A/H/P) F(Y)VE(ANQ) –
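
The single-position columns of Table S2 lend themselves to a simple lookup. The following sketch is an illustration only: the dictionary `TABLE_S2` and the function `compatible_classes` are hypothetical names, residues in parentheses in the table are treated as allowed alternatives, "X" is read as "any residue", and the 95ff and 105ff motifs are deliberately left out, so a hit here is a partial check and not an annotation.

```python
from typing import Dict, List, Optional, Set

# Single-position columns of Table S2 (unified numbering of Figure S2).
# None stands for "X" (any residue); DATA position 40 is listed as K(R/X)
# and is therefore treated as unrestricted. The 95ff and 105ff motifs are
# omitted from this sketch.
TABLE_S2: Dict[str, Dict[int, Optional[Set[str]]]] = {
    "ADCL": {31: {"F", "Y"}, 36: {"F"}, 38: {"T"}, 40: None},
    "DATA": {31: {"F"}, 36: {"Y"}, 38: {"V"}, 40: None},
    "BCAT": {31: {"Y"}, 36: {"F"}, 38: {"G"}, 40: {"R", "K"}},
    "Amine-TA": {
        31: {"H", "R"},
        36: {"Y"},
        38: {"V", "T"},
        40: {"S", "T", "A", "H", "P"},
    },
}


def compatible_classes(residues: Dict[int, str]) -> List[str]:
    """Return the classes whose positions 31, 36, 38 and 40 all match."""
    hits: List[str] = []
    for cls, pattern in TABLE_S2.items():
        if all(
            allowed is None or residues.get(pos) in allowed
            for pos, allowed in pattern.items()
        ):
            hits.append(cls)
    return hits
```

A query with a missing position fails every restricted column, which errs on the side of leaving a protein unassigned.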



Table S3: Protein yield after purification of (R)-amine-TA.

Entry   Enzyme source   Protein yield after purification [mg]

4 Aspergillus terreus 8.6

5 Penicillium chrysogenum 26.2

6 Aspergillus niger –

7 Aspergillus oryzae 20.6

8 Aspergillus fumigatus 14.8

9 Neosartorya fischeri 23.3

10 Gibberella zeae 4.8

11 Hyphomonas neptunium 6.5

12 Mycobacterium vanbaalenii 8.9

13 Mesorhizobium loti 6.9

14 Mesorhizobium loti 5.3

15 Roseobacter sp. 27.5

16 Marinomonas sp. 23.7

17 Rhizobium etli 6.5

18 Rhodoferax ferrireducens 7.5

19 Jannaschia sp. 24.8

20 Labrenzia alexandrii 12.5

21 Burkholderia sp. 41.6

22 Burkholderia cenocepacia –

23 alpha proteobacterium –

24 gamma proteobacterium 2.6

The purified protein was obtained from ~3 g of cells (see Methods section).



Table S4: Specific activities of the enzymes towards various substrates.

Substrate a)   Pyruvate   Pyruvate   Pyruvate   Pyruvate   2KG     MOB

               1          2          3          4          D-Ala   L-Glu

Entry          R    S     R    S     R    S     R    S

4 2.91 <0.001 15.2 <0.001 9.7 <0.001 0.031 <0.001 <0.001 0.003

5 1.1 0.044 1.3 <0.001 5.6 <0.001 0.264 <0.001 <0.001 <0.001

6 - b) - - - - - - - - -

7 1.4 0.023 3.7 0.001 5.2 0.002 0.051 0.002 <0.001 <0.001

8 2.4 <0.001 4.1 <0.001 4.5 <0.001 0.009 <0.001 <0.001 0.005

9 7.4 <0.001 4.5 <0.001 6.0 <0.001 0.013 <0.001 <0.001 0.005

10 19.6 <0.001 18.6 <0.001 8.2 <0.001 <0.001 <0.001 <0.001 0.016

11 3.2 0.225 3.6 <0.001 20.7 <0.001 0.163 <0.001 <0.001 0.012

12 5.6 <0.001 4.7 <0.001 2.6 <0.001 <0.001 <0.001 <0.001 0.003

13 0.003 <0.001 0.011 <0.001 0.010 <0.001 0.001 <0.001 0.004 0.004

14 0.124 <0.001 0.013 <0.001 0.002 <0.001 <0.001 <0.001 <0.001 0.005

15 0.001 <0.001 0.003 <0.001 0.001 <0.001 0.001 <0.001 0.003 0.002

16 0.020 <0.001 0.002 <0.001 0.003 <0.001 <0.001 <0.001 <0.001 0.003

17 0.012 <0.001 0.867 <0.001 0.260 <0.001 <0.001 <0.001 0.020 0.016

18 0.001 <0.001 0.056 <0.001 0.307 <0.001 <0.001 <0.001 0.010 0.098

19 0.071 0.002 0.059 0.007 0.370 0.068 0.022 <0.001 0.062 0.020

20 0.073 0.001 0.060 0.003 0.120 0.027 0.205 0.002 0.063 0.023

21 0.002 <0.001 0.017 <0.001 1.1 0.007 <0.001 <0.001 <0.001 0.001

22 - - - - - - - - - -

23 - - - - - - - - - -

24 0.610 0.004 0.028 <0.001 0.034 <0.001 <0.001 <0.001 <0.001 0.031

a) 1: 1-phenylethylamine; 2: 2-aminohexane; 3: 4-phenyl-2-aminobutane; 4: 1-N-Boc-3-aminopyrrolidine; 2KG: 2-ketoglutarate; D-Ala: D-alanine; L-Glu: L-glutamate; MOB: 3-methyl-2-oxobutyric acid.

Details of the assays are described in Materials and Methods. The protein entry numbers are identical to those in Supplementary Table S3. All measurements were done at least in duplicate. The deviation of single measurements from the mean value was < 10 %. The specific activity is expressed as units per milligram protein. One unit of activity was defined as the amount of enzyme that produced 1 µmol ketone product per minute.

b) Measurement was not possible since the protein yield during expression was very low or the protein was unstable during purification.
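
The unit definition above amounts to simple arithmetic, which the sketch below spells out. It assumes that the amount of ketone product formed, the assay time and the amount of protein are already known; the function names and any input values are illustrative and not taken from the paper.

```python
from statistics import mean
from typing import Sequence


def specific_activity(product_umol: float, minutes: float, protein_mg: float) -> float:
    """Specific activity in U/mg, with 1 U = 1 µmol ketone product per minute."""
    return product_umol / minutes / protein_mg


def max_relative_deviation(values: Sequence[float]) -> float:
    """Largest deviation of a single measurement from the mean, as a fraction."""
    m = mean(values)
    return max(abs(v - m) for v in values) / m
```

For replicate measurements, `max_relative_deviation` gives the quantity that the footnote reports as being below 10 %.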



Table S5: Asymmetric synthesis of (R)-amines 2–4 using three newly identified amine-TAs a).

Product   Amine-TA b)   Conversion [%] c)   Enantiomeric excess [%eeP] d)

2 Ate 32 >99

2 Mlo 41 >99

2 Mva 35 >99

3 Ate 15 >99

3 Mlo 1 95.0

3 Mva 2 >99

4a Ate 14 >99

4b Ate 11 >99

a) Reaction conditions: 50 mM ketone, 250 mM D-alanine, 100 mM sodium phosphate buffer pH 7.0, 1 mM PLP, 1 mM NADH. The co-product pyruvate from the reaction was removed with lactate dehydrogenase (LDH) as described in the literature (ref. 31). For cofactor recycling, glucose dehydrogenase (GDH) was used.

b) Ate: Aspergillus terreus; Mva: Mycobacterium vanbaalenii; Mlo: Mesorhizobium loti (entries no. 4, 12 and 14 in Table 1).

c) Conversions were not optimized. The deviation of a single measurement from the mean value did not exceed 10 %. Amines 4a–b were only detected in reactions with Ate-TA.

d) (R)-enantiomers.
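
For orientation, enantiomeric excess is conventionally defined as the difference between the amounts of the two enantiomers divided by their sum, expressed in percent. The sketch below uses that standard definition; it assumes that amounts (for example peak areas) of both enantiomers are available, and the function names are hypothetical. How the values in Table S5 were measured is not described in this section.

```python
from typing import Tuple


def enantiomeric_excess(major: float, minor: float) -> float:
    """Enantiomeric excess in percent from the amounts of both enantiomers."""
    return 100.0 * (major - minor) / (major + minor)


def ratio_from_ee(ee_percent: float) -> Tuple[float, float]:
    """Percent shares (major, minor) of the two enantiomers for a given ee."""
    return 50.0 + ee_percent / 2.0, 50.0 - ee_percent / 2.0
```

Under this definition, an excess of 95.0 % corresponds to a 97.5 : 2.5 mixture of the two enantiomers.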

References

1. Percudani, R. & Peracchi, A. The B6 database: a tool for the description and classification of vitamin B6-dependent enzymatic activities and of the corresponding protein families. BMC Bioinformatics 10, 273 (2009).

2. Hirotsu, K., Goto, M., Okamoto, A. & Miyahara, I. Dual substrate recognition of aminotransferases. Chem. Rec. 5, 160–172 (2005).

