Supplementary Information
Rational Assignment of Key Motifs for Function Guide in silico Enzyme Identification
Matthias Höhne1,3, Sebastian Schätzle1,3, Helge Jochens1, Karen Robins2, Uwe T. Bornscheuer*1
1Institute of Biochemistry, Dept. of Biotechnology and Enzyme Catalysis, Greifswald University,
Felix Hausdorff-Str. 4, 17487 Greifswald (Germany)
2Lonza AG, Valais Works, Visp (Switzerland)
3These authors contributed equally to this work.
Supplementary Results
Figure S1: View of the superimposed active sites of BCAT and DATA. The structures of DATA (pdb:
3DAA) and BCAT (pdb: 1IYE) are colored blue and yellow, respectively. Selected amino acid residues and
the PLP cofactor with bound D-alanine of DATA are shown as lines; residues of BCAT, including the
PLP-bound L-glutamate, are shown as sticks. For reasons of clarity, only the α-carbon atom and the α-carboxyl group of the
PLP-bound L-glutamate of BCAT are shown. Residues that are part of binding pocket A or B (Figure 1B)
are highlighted in cyan and green, respectively. An asterisk indicates that these residues are part of the other
monomer of the dimeric enzyme. The numbering corresponds to the unified numbering scheme shown in the
multiple sequence alignment in Figure S2.
[Figure S1 image; residue labels shown: Y95, R97, R107*, H109*, Y36, A263, T262, R/K40, PLP]
Nature Chemical Biology: doi: 10.1038/nchembio.447
Figure S2: Multiple structural sequence alignment of fold class IV PLP-dependent enzymes.
ttBCAT: Thermus thermophilus BCAT (pdb: 1WRV); ecBCAT: E. coli BCAT (pdb: 1IYD); bsDATA:
Bacillus species DATA (pdb: 3DAA); ecADCL: E. coli 4-amino-4-deoxychorismate lyase (ADCL, pdb:
1ET0). The alignment was performed using the ClustalW3D algorithm of the program STRAP. The color
shading indicates the secondary structure: yellow – β-sheets, pink – α-helices. The catalytically important
lysine residue is shaded in red, likewise the catalytically important threonine residue of ADCL. The amino
acids contributing to substrate recognition in binding pocket A (Figure S1, Figure 1B) are marked in cyan,
amino acids that are important for binding the substrate in binding pocket B are highlighted in green. The
first block of the sequence motif is located on a β-sheet, which ranges across the whole active site (Figure
S1). This is the reason why in a short segment of five amino acids different residues contribute to different
binding pockets.
Figure S3: Section of the multiple sequence alignment of bacterial ADCL-sequences from the B6-
Database. NP_251654 (Pseudomonas aeruginosa); NP_744071 (Pseudomonas putida); NP_718200
(Shewanella oneidensis MR-1); PABC_VIBCH (Vibrio cholerae); T12054 (Vibrio harveyi); ZP_00128240
(Pseudomonas syringae); CAE15204 (Photorhabdus luminescens); NP_455691 (Salmonella enterica subsp.
enterica serovar Typhi); NP_903080 (Chromobacterium violaceum); NP_405184 (Yersinia pestis);
NP_798430 (Vibrio parahaemolyticus); ZP_00089659 (Azotobacter vinelandii); PABC_ECOLI
(Escherichia coli). Protein sequences with experimentally confirmed activity are shown in bold.
Figure S4: Section of the multiple sequence alignment of BCAT-sequences from the B6-Database. ILVE_SALTY (Salmonella typhimurium); ZP_00117422 (Cytophaga hutchinsonii); NP_715980 (Shewanella oneidensis); AAK79447 (Clostridium acetobutylicum); AAM72830 (Chlorobium tepidum); AAQ00003 (Prochlorococcus marinus); ZP_00071333 (Trichodesmium erythraeum); BAC96006 (Vibrio vulnificus); ILVE_BACSU (Bacillus subtilis); CAC12788 (Staphylococcus carnosus); AAO91867 (Bacillus cereus); AAO91868 (Bacillus anthracis); ILVE_ECOLI (Escherichia coli); ZP_00061646 (Clostridium thermocellum); ILVE_HELPY (Helicobacter pylori); AAF34406 (Lactococcus lactis); NP_216726 (Mycobacterium tuberculosis H37Rv); NP_607111 (Streptococcus pyogenes); YBGE_BACSU (Bacillus subtilis). Fungi: BCA2_YEAST (Saccharomyces cerevisiae); BCA1_YEAST (Saccharomyces cerevisiae); BCA1_SCHPO (Schizosaccharomyces pombe). Metazoa: AF184916 (Ovis aries); BCAT_CAEEL (Caenorhabditis elegans); BCAT_HUMAN (Homo sapiens); AAH48072 (Mus musculus); BCAM_HUMAN (Homo sapiens); BCAM_RAT (Rattus norvegicus); AAH59513 (Danio rerio); BCAT_RAT (Rattus norvegicus). Protein sequences with experimentally confirmed activity are shown in bold.
Figure S5: Section of the multiple sequence alignment of DATA-sequences from the B6-Database.
DAAA_LISIN (Listeria innocua); AAY98539 (Geobacillus toebii); AAO91869 (Bacillus cereus);
ZP_00015048 (Rhodospirillum rubrum); NP_243677 (Bacillus halodurans); DAAA_BACSU (Bacillus
subtilis); CAB82475 (Staphylococcus aureus); DAAA_BACSP (Bacillus sp.); DAAA_STAHA
(Staphylococcus haemolyticus); AAY98538 (Geobacillus sp. KLS-1); NP_692004 (Oceanobacillus
iheyensis); NP_847638 (Bacillus anthracis); DAAA_LISMO (Listeria monocytogenes); NP_764978
(Staphylococcus epidermidis); DAAA_BACSH (Bacillus sphaericus); ZP_00023345 (Ralstonia
metallidurans); CAE39908 (Bordetella parapertussis). Protein sequences with experimentally confirmed
activity are shown in bold.
Figure S6: Flow chart for the sequence motif-based prediction of substrate specificity and
enantiopreference of PLP-dependent proteins of fold class IV. In the first two steps of the algorithm,
query proteins are aligned to BCAT or DATA sequences, and proteins that are likely to be non-functional
(e.g. truncated proteins) are removed. Next, proteins that can be annotated as BCAT, DATA or ADCL by
means of the respective sequence motifs are removed. In the last step of the algorithm, the remaining
proteins are examined as to whether they match the criteria predicted for (R)-selective amine-TA activity.
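The order of the filtering steps in the flow chart can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the `Candidate` record, its field names and the `min_coverage` cutoff of 0.8 are hypothetical stand-ins for the alignment and motif checks, whose exact criteria are not given in the caption.

```python
from dataclasses import dataclass
from typing import List, Optional

# Enzyme classes that the known sequence motifs can already annotate.
KNOWN_CLASSES = {"BCAT", "DATA", "ADCL"}


@dataclass
class Candidate:
    """Hypothetical record for one query protein (all fields are assumptions)."""
    name: str
    coverage: float             # fraction of the reference alignment covered
    motif_class: Optional[str]  # "BCAT", "DATA", "ADCL", or None if no motif matched
    amine_ta_like: bool         # True if the predicted amine-TA criteria are met


def screen(candidates: List[Candidate], min_coverage: float = 0.8) -> List[str]:
    """Return names of candidates that pass every step of the flow chart."""
    hits = []
    for c in candidates:
        # Steps 1-2: drop proteins that are likely non-functional (e.g. truncated).
        if c.coverage < min_coverage:
            continue
        # Step 3: drop proteins already annotated as BCAT, DATA or ADCL.
        if c.motif_class in KNOWN_CLASSES:
            continue
        # Step 4: keep only proteins matching the predicted amine-TA criteria.
        if c.amine_ta_like:
            hits.append(c.name)
    return hits


demo = [
    Candidate("p1", 0.40, None, True),     # truncated, removed in steps 1-2
    Candidate("p2", 0.95, "BCAT", False),  # annotated as BCAT, removed in step 3
    Candidate("p3", 0.97, None, True),     # passes all steps
    Candidate("p4", 0.99, None, False),    # fails the amine-TA criteria
]
print(screen(demo))
```

Ordering the cheap exclusion steps first means the final, most specific check is only applied to proteins that survive the earlier filters.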
Figure S7: Section of the multiple sequence alignment of DATA protein sequences found in the NCBI
curated CDD Database. The NCBI curated CDD database lists 26 proteins that were annotated as DATA
(http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?hslf=1&uid=cd01558&#seqhrch; as of Sept. 10th
2009).
Amino acids that are in accordance with the sequence motifs described here are highlighted in yellow. In
some cases, we suggest a different annotation since important amino acids correspond to the sequence motifs
found for BCAT (colored in green) or amine-TA (colored in cyan). In other cases, an unambiguous
annotation according to our sequence motifs is not possible, for example for gi_22299586, gi_22299586, and
gi_26391648.
Table S1: Structural features used for annotation of PLP-fold class IV proteins
Conserved motifs considered important for substrate recognition and enantiopreference were derived from
multiple sequence alignments (Figures S3-S5) of the sequences deposited in the B6-database1
(http://bioinformatics.unipr.it/cgi-bin/bioinformatics/B6db/home.pl; as of March 1st 2009) and are described here. The
numbering of amino acid residues in the following section corresponds to the unified numbering scheme
shown in the multiple sequence alignment in Figure S2.
Most transaminases exhibit dual substrate recognition; for example, BCAT converts the hydrophobic amino
acids valine and leucine as well as the acidic amino acid glutamate. Hence, to allow binding of acidic and
hydrophobic side chains in the same binding pocket, transaminases developed complex structural features.
For reasons of clarity, these dual substrate recognition mechanisms of the enzymes are not discussed here in
every detail. More structural information on dual substrate recognition can be found in a review2.
Feature: Explanation/remark

4-Amino-4-deoxychorismate lyase (ADCL); for a multiple sequence alignment of ADCL see Figure S3.
Thr38: Conserved catalytic residue; plays a role in the catalytic mechanism by shuttling a proton in the transition state (see Fig. 2 in main text).
Arg97, Lys107: Binding of a carboxyl group of the substrate in binding pockets A and B.
Phe36: Conserved residue.
107 RGY-motif: Conserved motif with unknown structural function. The presence of this motif may indicate a lyase activity; its absence, however, cannot be used to rule out a lyase activity.

Branched chain amino acid aminotransferase (BCAT); for a multiple sequence alignment see Figure S4.
95YzR-motif: Coordination of the substrate α-carboxylate group in binding pocket B; most characteristic conserved amino acid sequence motif; z is one of the hydrophobic amino acids valine, isoleucine or leucine.
Arg/Lys40: Activates the backbone-amide nitrogen of Thr262/Ala263.
G38: Conserved in order to provide enough space for the carboxyl oxygen atoms.
107 ZGZ-motif, F36: Z is one of the hydrophobic amino acids valine, isoleucine, leucine or methionine; accommodation of the hydrophobic side chain of the substrate in binding pocket B.
Y31: Conserved residue; this tyrosine is involved in the dual substrate recognition of the γ-carboxyl group of L-glutamate in BCAT.
D-Amino acid aminotransferase (DATA); for a multiple sequence alignment see Figure S5.
107RxH, Tyr36: Most important structural determinant in DATA, since these residues are important for coordination of the carboxyl group in binding pocket A. Tyr36 may also contribute to the α-carboxylate group coordination in binding pocket A.
95-xzYzQ-motif: Strongly conserved motif; x is any amino acid and z represents one of the hydrophobic amino acids valine, leucine, isoleucine or methionine; binding of the hydrophobic substituent in pocket B.
F31: Conserved residue near binding pocket A. This F31 differs from Y31 of BCAT. The reason might be that the hydroxyl group of a tyrosine would cause a steric clash with the side chain of Arg107 (in BCAT there is more space because of the smaller methionine107 residue, which additionally adopts a different configuration). Furthermore, this tyrosine is not needed for dual substrate recognition, since the γ-carboxyl group of glutamate has to be bound on the opposite side, close to binding pocket B, because of the D-configuration of glutamate.
V38: Valine is found as a conserved residue in the DATA sequences, compared to G38 in BCAT and T38 in ADCL.

(R)-Amine-TA
95Y(F)Vz-motif: Binding of a small alkyl substituent in binding pocket B; z is preferably glutamate, but the amino acids alanine, serine, glutamine and asparagine also occur.
40 ≠ Arg, Lys: Although the presence of R or K40 would not exclude an amine-TA activity, all sequences showed amino acids different from arginine or lysine at this position, namely His, Ser, Ala, Thr or Pro. This additionally supports a possible amine-TA activity.
105 and following residues: – a)

a) Binding of a carboxylate group in binding pocket A cannot be excluded, nor does it need to be, since all
amine-TAs described in the literature show dual substrate recognition. Alongside the hydrophobic
residue, a carboxylate group also has to be recognized in binding pocket A (for example if alanine or
pyruvate is bound). An arginine or lysine residue in positions 107–109 may be involved in
dual substrate recognition. Differentiating a possible amine-TA from a putative DATA is therefore not as
straightforward as the differentiation from a BCAT, since an (R)-amine-TA is expected to convert the
D-amino acid alanine. Thus the question is not whether the candidate amine-TAs convert D-amino acids,
but whether their activity is restricted to D-amino acids. In the sequence alignments of the
putative amine-TAs, clear differences from all known DATA are observed, since the 97-YxQ and 107-RxH
motifs are absent in the putative amine-TA sequences.
Table S2. Summary of most important amino acid differences in fold class IV PLP-dependent enzymes.
Position   31    36   38    40            95ff          105ff
ADCL       F/Y   F    T     X             zxK           RGY
DATA       F     Y    V     K(R/X)        xzYzQ         RxH
BCAT       Y     F    G     R(K)          YzR           zGz
Amine-TA   H/R   Y    V/T   S(T/A/H/P)    F(Y)VE(ANQ)   –
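As an illustration of how Table S2 could be used as a lookup, the following Python sketch encodes only the four single-residue columns (positions 31, 36, 38 and 40) and reports which enzyme classes are compatible with a given set of residues. It is a simplified sketch, not the authors' procedure: the multi-residue motifs at 95ff and 105ff are deliberately left out, the names `TABLE_S2` and `matching_classes` are hypothetical, and the residues are assumed to have already been mapped to the unified numbering scheme of Figure S2.

```python
from typing import Dict, List, Optional, Set

# Allowed residues per position, transcribed from Table S2.
# None means "any residue" (X in the table).
TABLE_S2: Dict[str, Dict[int, Optional[Set[str]]]] = {
    "ADCL":     {31: {"F", "Y"}, 36: {"F"}, 38: {"T"},      40: None},
    "DATA":     {31: {"F"},      36: {"Y"}, 38: {"V"},      40: None},  # K(R/X): any residue occurs
    "BCAT":     {31: {"Y"},      36: {"F"}, 38: {"G"},      40: {"R", "K"}},
    "Amine-TA": {31: {"H", "R"}, 36: {"Y"}, 38: {"V", "T"}, 40: {"S", "T", "A", "H", "P"}},
}


def matching_classes(residues: Dict[int, str]) -> List[str]:
    """Return all enzyme classes whose single-position pattern fits `residues`.

    `residues` maps a unified alignment position to a one-letter amino acid code.
    A missing position counts as a mismatch unless the table allows any residue.
    """
    matches = []
    for enzyme, pattern in TABLE_S2.items():
        if all(
            allowed is None or residues.get(position) in allowed
            for position, allowed in pattern.items()
        ):
            matches.append(enzyme)
    return matches


print(matching_classes({31: "Y", 36: "F", 38: "G", 40: "R"}))
print(matching_classes({31: "H", 36: "Y", 38: "V", 40: "S"}))
```

Returning a list rather than a single label keeps ambiguous cases visible, which matters because some sequences cannot be annotated unambiguously from the motifs alone.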
Table S3: Protein yield after purification of (R)-amine-TA.
Entry   Enzyme source   Protein yield after purification [mg]
4 Aspergillus terreus 8.6
5 Penicillium chrysogenum 26.2
6 Aspergillus niger –
7 Aspergillus oryzae 20.6
8 Aspergillus fumigatus 14.8
9 Neosartorya fischeri 23.3
10 Gibberella zeae 4.8
11 Hyphomonas neptunium 6.5
12 Mycobacterium vanbaalenii 8.9
13 Mesorhizobium loti 6.9
14 Mesorhizobium loti 5.3
15 Roseobacter sp. 27.5
16 Marimonas sp. 23.7
17 Rhizobium etli 6.5
18 Rhodoferax ferrireducens 7.5
19 Jannaschia sp. 24.8
20 Labrenzia alexandrii 12.5
21 Burkholderia sp. 41.6
22 Burkholderia cenocepacia –
23 alpha proteobacterium –
24 gamma proteobacterium 2.6
The amount of purified protein originates from ~3 g of cells (see Methods section).
Table S4: Specific activities of the enzymes towards various substrates.
Substrate a): 1 with pyruvate; 2 with pyruvate; 3 with pyruvate; 4 with pyruvate; D-Ala with 2KG; L-Glu with MOB
Columns: Entry; 1 (R, S); 2 (R, S); 3 (R, S); 4 (R, S); D-Ala; L-Glu
4 2.91 <0.001 15.2 <0.001 9.7 <0.001 0.031 <0.001 <0.001 0.003
5 1.1 0.044 1.3 <0.001 5.6 <0.001 0.264 <0.001 <0.001 <0.001
6 - b) - - - - - - - - -
7 1.4 0.023 3.7 0.001 5.2 0.002 0.051 0.002 <0.001 <0.001
8 2.4 <0.001 4.1 <0.001 4.5 <0.001 0.009 <0.001 <0.001 0.005
9 7.4 <0.001 4.5 <0.001 6.0 <0.001 0.013 <0.001 <0.001 0.005
10 19.6 <0.001 18.6 <0.001 8.2 <0.001 <0.001 <0.001 <0.001 0.016
11 3.2 0.225 3.6 <0.001 20.7 <0.001 0.163 <0.001 <0.001 0.012
12 5.6 <0.001 4.7 <0.001 2.6 <0.001 <0.001 <0.001 <0.001 0.003
13 0.003 <0.001 0.011 <0.001 0.010 <0.001 0.001 <0.001 0.004 0.004
14 0.124 <0.001 0.013 <0.001 0.002 <0.001 <0.001 <0.001 <0.001 0.005
15 0.001 <0.001 0.003 <0.001 0.001 <0.001 0.001 <0.001 0.003 0.002
16 0.020 <0.001 0.002 <0.001 0.003 <0.001 <0.001 <0.001 <0.001 0.003
17 0.012 <0.001 0.867 <0.001 0.260 <0.001 <0.001 <0.001 0.020 0.016
18 0.001 <0.001 0.056 <0.001 0.307 <0.001 <0.001 <0.001 0.010 0.098
19 0.071 0.002 0.059 0.007 0.370 0.068 0.022 <0.001 0.062 0.020
20 0.073 0.001 0.060 0.003 0.120 0.027 0.205 0.002 0.063 0.023
21 0.002 <0.001 0.017 <0.001 1.1 0.007 <0.001 <0.001 <0.001 0.001
22 - - - - - - - - - -
23 - - - - - - - - - -
24 0.610 0.004 0.028 <0.001 0.034 <0.001 <0.001 <0.001 <0.001 0.031
a) 1: 1-phenylethylamine; 2: 2-aminohexane; 3: 4-phenyl-2-aminobutane; 4: 1-N-Boc-3-aminopyrrolidine; 2KG: 2-ketoglutarate; D-Ala: D-alanine; L-Glu: L-glutamate; MOB: 3-methyl-2-oxobutyric acid.
Details of the assays are described in Materials and Methods. The protein entry numbers are identical to those in Supplementary Table S3. All measurements were done at least in duplicate. The deviation of single measurements from the mean value was < 10 %. The specific activity is expressed as units per milligram of protein. One unit of activity was defined as the amount of enzyme that produced 1 µmol ketone product per minute.
b) Measurement was not possible since the protein yield during expression was very low or the protein was unstable during purification.
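The unit definition in the Table S4 footnotes translates directly into a short calculation. The sketch below only illustrates that arithmetic; the function name `specific_activity` and the input values are hypothetical and are not taken from the measurements in the table.

```python
def specific_activity(product_umol: float, minutes: float, protein_mg: float) -> float:
    """Specific activity in U/mg, with 1 U = 1 µmol ketone product per minute."""
    if minutes <= 0 or protein_mg <= 0:
        raise ValueError("assay time and protein amount must be positive")
    units = product_umol / minutes  # enzyme activity in U (µmol/min)
    return units / protein_mg       # normalized to the amount of protein


# Hypothetical example: 5 µmol ketone formed in 10 min by 0.25 mg protein.
print(specific_activity(5.0, 10.0, 0.25))
```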
Table S5: Asymmetric synthesis of (R)-amines 2–4 using three newly identified amine-TAs. a)
Product   Amine-TA b)   Conversion [%] c)   Enantiomeric excess [%eeP] d)
2 Ate 32 >99
2 Mlo 41 >99
2 Mva 35 >99
3 Ate 15 >99
3 Mlo 1 95.0
3 Mva 2 >99
4a Ate 14 >99
4b Ate 11 >99
a) Reaction conditions: 50 mM ketone, 250 mM D-alanine, 100 mM sodium phosphate buffer pH 7.0, 1 mM
PLP, 1 mM NADH. The co-product pyruvate was removed with lactate dehydrogenase
(LDH) as described in the literature31. For cofactor recycling, glucose dehydrogenase (GDH) was used.
b) Ate: Aspergillus terreus; Mva: Mycobacterium vanbaalenii; Mlo: Mesorhizobium loti (entries no. 4, 12 and
14 in Table 1).
c) Conversions were not optimized. The deviation of a single measurement from the mean value
did not exceed 10 %. Amines 4a–b were only detected in reactions with Ate-TA.
d) (R)-enantiomers.
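For readers who want the arithmetic behind the last column of Table S5, the conventional definition of the enantiomeric excess of the (R)-product is given below. This is the standard textbook formula, not one stated in this document, and [R] and [S] simply denote the amounts of the two product enantiomers, however they were quantified.

```latex
\[
  ee_{P}\,[\%] \;=\; \frac{[R] - [S]}{[R] + [S]} \times 100
\]
% Illustrative numbers only (not measured data):
% [R] : [S] = 97.5 : 2.5 gives (97.5 - 2.5) / (97.5 + 2.5) x 100 = 95.0 %ee
```

Under this definition, a value of >99 %ee means that the (S)-enantiomer accounts for less than 0.5 % of the product.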
References
1. Percudani, R. & Peracchi, A. The B6 database: a tool for the description and classification of vitamin
B6-dependent enzymatic activities and of the corresponding protein families. BMC Bioinformatics
10, 273 (2009).
2. Hirotsu, K., Goto, M., Okamoto, A. & Miyahara, I. Dual substrate recognition of aminotransferases.
Chem. Rec. 5, 160-172 (2005).