genome and proteome of long-chain alkane degrading ... · alkane degradation pathway; and the...

6
Genome and proteome of long-chain alkane degrading Geobacillus thermodenitrificans NG80-2 isolated from a deep-subsurface oil reservoir Lu Feng* †‡ , Wei Wang* †‡ , Jiansong Cheng*, Yi Ren*, Guang Zhao*, Chunxu Gao*, Yun Tang*, Xueqian Liu*, Weiqing Han*, Xia Peng* †‡ , Rulin Liu †§ , and Lei Wang* †‡§¶ *TEDA School of Biological Sciences and Biotechnology and Tianjin Key Laboratory of Microbial Functional Genomics, Nankai University, 23 Hongda Street, Tianjin Economic-Technological Development Area (TEDA), Tianjin 300457, China; Tianjin Research Center for Functional Genomics and Biochip, 23 Hongda Street, Tianjin Economic-Technological Development Area (TEDA), Tianjin 300457, China; and § College of Life Sciences, Nankai University, 94 Weijin Road, Tianjin 300071, China Edited by David M. Karl, University of Hawaii, Honolulu, HI, and approved February 6, 2007 (received for review November 2, 2006) The complete genome sequence of Geobacillus thermodenitrifi- cans NG80-2, a thermophilic bacillus isolated from a deep oil reservoir in Northern China, consists of a 3,550,319-bp chromo- some and a 57,693-bp plasmid. The genome reveals that NG80-2 is well equipped for adaptation into a wide variety of environmental niches, including oil reservoirs, by possessing genes for utilization of a broad range of energy sources, genes encoding various transporters for efficient nutrient uptake and detoxification, and genes for a flexible respiration system including an aerobic branch comprising five terminal oxidases and an anaerobic branch com- prising a complete denitrification pathway for quick response to dissolved oxygen fluctuation. The identification of a nitrous oxide reductase gene has not been previously described in Gram-positive bacteria. The proteome further reveals the presence of a long-chain alkane degradation pathway; and the function of the key enzyme in the pathway, the long-chain alkane monooxygenase LadA, is confirmed by in vivo and in vitro experiments. The thermophilic soluble monomeric LadA is an ideal candidate for treatment of en- vironmental oil pollutions and biosynthesis of complex molecules. adaptation degradation monooxygenase G eobacillus is a phenotypically and phylogenetically coherent genus of thermophilic bacilli with a high 16S rRNA se- quence similarity (98.5–99.2%), and was recently separated from the genus Bacillus (1). Members of Geobacillus have been isolated from various terrestrial and marine environments, not only in geothermal areas, but also in temperate regions and permanently cold habitats (2), demonstrating great capabilities for adaptation to a wide variety of environmental niches. Geobacillus spp. have attracted industrial interest for their potential applications in biotechnological processes as sources of various thermostable enzymes (2). There are 11 validly described Geobacillus species (3), and only the whole genome sequence of Geobacillus kaustophilus HTA426, a deep-sea sediment isolate, has been reported (4). G. thermodenitrificans are Gram-positive, facultative soil bacteria with distinctive property of denitrification (5). G. thermodeni- trificans NG80-2 was isolated from a deep-subsurface oil reser- voir in Dagang oilfield, Northern China. It grows between 45°C and 73°C (optimum 65°C) and can use long-chain (C 15 -C 36 ) alkanes as a sole carbon source (6). Alkanes are the major components of crude oils and are commonly found in oil contaminated environments. Biotechno- logical applications for microbial degradation of those pollutants are of long-standing interest. Although long-chain alkanes are more persistent in the environment than shorter-chains, only genes involved in the degradation of alkanes up to C 16 have been well studied (7–11), and those for longer alkanes have not yet been reported. G. thermodenitrificans NG80-2 is the only de- scribed thermophilic bacterial strain that degrades long-chain alkanes up to at least C 36 (6). Here, we report the full genome sequence of NG80-2 and its partial proteome. General and specific metabolic features are described, and mechanisms for adaptation into oil reservoirs are discussed. We identified in silico the genes for the degradation pathway of long-chain alkanes, and confirmed in vitro their encoded proteins by proteomic analysis. The key enzyme of this pathway, a long-chain alkane monooxygenase, was fully characterized and shows particular potential to be used for the treatment of environ- mental oil pollutions and in other biocatalytic processes. Results General Genome Features. The genome of G. thermodenitrificans NG80-2 is composed of a 3,550,319-bp chromosome and a 57,693-bp plasmid, designated pLW1071, with mean G C con- tents of 49.0% and 39.8%, respectively (Table 1). Both the size and structure of the NG80-2 genome are similar to that of G. kausto- philus HTA426, the only other sequenced Geobacillus genome, which contains a 3,544,776-bp chromosome and a 47,890-bp plas- mid (4). There are 3,499 predicted ORFs, 11 rRNA operons and 87 tRNA genes for all 20 aa, covering 86% of the genome (Table 1 and Fig. 1). Putative functions were assigned to 2,479 ORFs [70.9%; supporting information (SI) Table 2]. Of the remainder, 757 (21.6%) showed similarity to hypothetical proteins, and 263 (7.5%) had no detectable homologs in the public protein databases (e-value 1 10 10 ). There are 68 putative transposase genes in intact or mutated forms. The genome contains 30 phage-related genes, 22 of which constitute a defective prophage located 2,915 kb from oriC. The likely origin of replication (oriC) on the chromosome was assigned to the intergenic region upstream dnaA (GT0001) based on GC skew analysis, and four DnaA box-like sequences were also found in this region. pLW1071 shows no overall similarity to other sequenced plas- mids. Genes involved in plasmid replication and conjugation were Author contributions: L.F. and W.W. contributed equally to this work; L.F. and L.W. designed research; W.W., G.Z., C.G., Y.T., X.L., and W.H. performed research; L.F., W.W., J.C., Y.R., X.P., and R.L. contributed new reagents/analytic tools; L.F., W.W., J.C., Y.R., G.Z., and C.G. analyzed data; and L.F., W.W., J.C., and L.W. wrote the paper. Conflict of interest statement: L.F., W.W., Y.T., and L.W. have a financial conflict of interest resulting from a patent application for the DNA sequence of the ladA gene. This article is a PNAS Direct Submission. Abbreviation: COG, clusters of orthologous groups. Data deposition: The complete genome sequence of G. thermodenitrificans NG80-2 re- ported in this paper has been deposited in the GenBank database [accession nos. CP000557 (chromosome) and CP000558 (plasmid)]. To whom correspondence should be addressed. E-mail: [email protected]. This article contains supporting information online at www.pnas.org/cgi/content/full/ 0609650104/DC1. © 2007 by The National Academy of Sciences of the USA 5602–5607 PNAS March 27, 2007 vol. 104 no. 13 www.pnas.orgcgidoi10.1073pnas.0609650104 Downloaded by guest on July 17, 2020

Upload: others

Post on 28-Jun-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Genome and proteome of long-chain alkane degrading ... · alkane degradation pathway; and the function of the key enzyme in the pathway, the long-chain alkane monooxygenase LadA,

Genome and proteome of long-chain alkanedegrading Geobacillus thermodenitrificans NG80-2isolated from a deep-subsurface oil reservoirLu Feng*†‡, Wei Wang*†‡, Jiansong Cheng*, Yi Ren*, Guang Zhao*, Chunxu Gao*, Yun Tang*, Xueqian Liu*,Weiqing Han*, Xia Peng*†‡, Rulin Liu†§, and Lei Wang*†‡§¶

*TEDA School of Biological Sciences and Biotechnology and †Tianjin Key Laboratory of Microbial Functional Genomics, Nankai University, 23 HongdaStreet, Tianjin Economic-Technological Development Area (TEDA), Tianjin 300457, China; ‡Tianjin Research Center for Functional Genomics andBiochip, 23 Hongda Street, Tianjin Economic-Technological Development Area (TEDA), Tianjin 300457, China; and §College of Life Sciences,Nankai University, 94 Weijin Road, Tianjin 300071, China

Edited by David M. Karl, University of Hawaii, Honolulu, HI, and approved February 6, 2007 (received for review November 2, 2006)

The complete genome sequence of Geobacillus thermodenitrifi-cans NG80-2, a thermophilic bacillus isolated from a deep oilreservoir in Northern China, consists of a 3,550,319-bp chromo-some and a 57,693-bp plasmid. The genome reveals that NG80-2 iswell equipped for adaptation into a wide variety of environmentalniches, including oil reservoirs, by possessing genes for utilizationof a broad range of energy sources, genes encoding varioustransporters for efficient nutrient uptake and detoxification, andgenes for a flexible respiration system including an aerobic branchcomprising five terminal oxidases and an anaerobic branch com-prising a complete denitrification pathway for quick response todissolved oxygen fluctuation. The identification of a nitrous oxidereductase gene has not been previously described in Gram-positivebacteria. The proteome further reveals the presence of a long-chainalkane degradation pathway; and the function of the key enzymein the pathway, the long-chain alkane monooxygenase LadA, isconfirmed by in vivo and in vitro experiments. The thermophilicsoluble monomeric LadA is an ideal candidate for treatment of en-vironmental oil pollutions and biosynthesis of complex molecules.

adaptation � degradation � monooxygenase

Geobacillus is a phenotypically and phylogenetically coherentgenus of thermophilic bacilli with a high 16S rRNA se-

quence similarity (98.5–99.2%), and was recently separated fromthe genus Bacillus (1). Members of Geobacillus have beenisolated from various terrestrial and marine environments, notonly in geothermal areas, but also in temperate regions andpermanently cold habitats (2), demonstrating great capabilitiesfor adaptation to a wide variety of environmental niches.Geobacillus spp. have attracted industrial interest for theirpotential applications in biotechnological processes as sources ofvarious thermostable enzymes (2).

There are 11 validly described Geobacillus species (3), and onlythe whole genome sequence of Geobacillus kaustophilusHTA426, a deep-sea sediment isolate, has been reported (4). G.thermodenitrificans are Gram-positive, facultative soil bacteriawith distinctive property of denitrification (5). G. thermodeni-trificans NG80-2 was isolated from a deep-subsurface oil reser-voir in Dagang oilfield, Northern China. It grows between 45°Cand 73°C (optimum 65°C) and can use long-chain (C15-C36)alkanes as a sole carbon source (6).

Alkanes are the major components of crude oils and arecommonly found in oil contaminated environments. Biotechno-logical applications for microbial degradation of those pollutantsare of long-standing interest. Although long-chain alkanes aremore persistent in the environment than shorter-chains, onlygenes involved in the degradation of alkanes up to C16 have beenwell studied (7–11), and those for longer alkanes have not yetbeen reported. G. thermodenitrificans NG80-2 is the only de-

scribed thermophilic bacterial strain that degrades long-chainalkanes up to at least C36 (6).

Here, we report the full genome sequence of NG80-2 and itspartial proteome. General and specific metabolic features aredescribed, and mechanisms for adaptation into oil reservoirs arediscussed. We identified in silico the genes for the degradationpathway of long-chain alkanes, and confirmed in vitro their encodedproteins by proteomic analysis. The key enzyme of this pathway, along-chain alkane monooxygenase, was fully characterized andshows particular potential to be used for the treatment of environ-mental oil pollutions and in other biocatalytic processes.

ResultsGeneral Genome Features. The genome of G. thermodenitrificansNG80-2 is composed of a 3,550,319-bp chromosome and a57,693-bp plasmid, designated pLW1071, with mean G � C con-tents of 49.0% and 39.8%, respectively (Table 1). Both the size andstructure of the NG80-2 genome are similar to that of G. kausto-philus HTA426, the only other sequenced Geobacillus genome,which contains a 3,544,776-bp chromosome and a 47,890-bp plas-mid (4). There are 3,499 predicted ORFs, 11 rRNA operons and 87tRNA genes for all 20 aa, covering 86% of the genome (Table 1 andFig. 1). Putative functions were assigned to 2,479 ORFs [70.9%;supporting information (SI) Table 2]. Of the remainder, 757(21.6%) showed similarity to hypothetical proteins, and 263 (7.5%)had no detectable homologs in the public protein databases (e-value�1 � 10�10). There are 68 putative transposase genes in intact ormutated forms. The genome contains 30 phage-related genes, 22 ofwhich constitute a defective prophage located 2,915 kb from oriC.

The likely origin of replication (oriC) on the chromosome wasassigned to the intergenic region upstream dnaA (GT0001) basedon GC skew analysis, and four DnaA box-like sequences were alsofound in this region.

pLW1071 shows no overall similarity to other sequenced plas-mids. Genes involved in plasmid replication and conjugation were

Author contributions: L.F. and W.W. contributed equally to this work; L.F. and L.W.designed research; W.W., G.Z., C.G., Y.T., X.L., and W.H. performed research; L.F., W.W., J.C.,Y.R., X.P., and R.L. contributed new reagents/analytic tools; L.F., W.W., J.C., Y.R., G.Z., andC.G. analyzed data; and L.F., W.W., J.C., and L.W. wrote the paper.

Conflict of interest statement: L.F., W.W., Y.T., and L.W. have a financial conflict of interestresulting from a patent application for the DNA sequence of the ladA gene.

This article is a PNAS Direct Submission.

Abbreviation: COG, clusters of orthologous groups.

Data deposition: The complete genome sequence of G. thermodenitrificans NG80-2 re-ported in this paper has been deposited in the GenBank database [accession nos. CP000557(chromosome) and CP000558 (plasmid)].

¶To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/cgi/content/full/0609650104/DC1.

© 2007 by The National Academy of Sciences of the USA

5602–5607 � PNAS � March 27, 2007 � vol. 104 � no. 13 www.pnas.org�cgi�doi�10.1073�pnas.0609650104

Dow

nloa

ded

by g

uest

on

July

17,

202

0

Page 2: Genome and proteome of long-chain alkane degrading ... · alkane degradation pathway; and the function of the key enzyme in the pathway, the long-chain alkane monooxygenase LadA,

found, and 22 more genes could be assigned functions including along-chain alkane monooxygenase gene (ladA, see below). Involve-ment of pLW1071 in alkane degradation was confirmed by the factthat a derivative of NG80-2 with the plasmid cured failed to degradehexadecane (data not shown).

Comparative Genomics. Most predicted proteins (75.6%) have clos-est homologs in Geobacillus. Of the remainder, 11.4% are inBacillus, 3.0% in other Gram-positive bacteria, and only 2.7% inother organisms. There are 2,578 (74.9%) NG80-2 genes havingorthologs with an average protein identity of 83.2% in G. kausto-phillus HTA426, and genome-wide synteny for the orthologs be-tween the two strains could be detected (SI Fig. 4). Also, between53.1% and 58.7% of the NG80-2 genes have orthologs in each of 13Bacillus strains sequenced (www.ncbi.nih.gov/genomes/lproks.cgi),and the average identity levels are in the range of 53.5–58.1%. Noorthologs were found between the plasmids of NG80-2 andHTA426, indicating different origins.

There are 1,179 orthologs shared among NG80-2, HTA426 andall 13 sequenced Bacillus genomes, of which, 1,099 (93.2%) arelocated in the 0–1.3 Mb and 2.2–3.55 Mb regions of the NG80-2genome (SI Table 3). The NG80-2 genes involved in nucleotidemetabolism and translation were found most conserved, with79.4% and 76.7%, respectively, of the total related genes havingorthologs in others, in contrast to 26.8% of the genes involved incarbohydrate metabolism shared (SI Fig. 5).

G. thermodenitrificans NG80-2 and G. kaustophillus HTA426share 385 orthologs which are absent in the 13 Bacillus genomes, ofwhich, 200 could be assigned putative functions (SI Table 4). 14genes involved in fatty acid synthesis and degradation are shared byNG80-2 and HTA426, and coincidentally, all Geobacillus specieshave a distinctive fatty acid profile with high proportion of iso-branched saturated acids (1). Other shared genes include a genecluster (GT1516-GT1518) with genes encoding a heme O oxygen-ase and subunits of b(o/a)3 cytochrome c oxidase involved inaerobic respiration, and a gene cluster (GT1698-GT1681) for thesynthesis of cobalamin. The ability to synthesize cobalamin was alsoreported for G. stearothermophilus, but not for Bacillus spp. with theexception of B. megaterium in which cobalamin synthesis genes arelocated on a plasmid (12).

Four hundred ninety-eight of the NG80-2 genes are absent inHTA426 and the 13 Bacillus genomes, of which, 253 are orphangenes and 202 could be assigned putative functions (SI Table 5).They include those encoding proteins for the reduction of N2O to

N2 (GT1734-GT1731), in line with the distinctive denitrificationproperty of G. thermodenitrificans.

General Metabolism and Adaptation to Oil Reservoirs. Genes re-quired for the synthesis of purine and pyrimidine nucleotides, fattyacids, and all 20 aa except for serB of the serine pathway wereidentified. serB was not found in any of the sequenced Bacillusgenomes, suggesting the presence of nonorthologous genes. Com-plete sets of genes for the synthesis of all vitamins and cofactors arepresent except for biotin. However, a putative biotin transportergene (bioY, GT2896) was found, and the requirement of exogenousbiotin for growth of NG80-2 was confirmed.

All central metabolic pathways for carbohydrates except for theEntner-Doudoroff pathway are present. NG80-2 utilizes a largevariety of carbohydrates such as glycerol, cellobiose, trehalose and

Table 1. General features of G. thermodenitrificansNG80-2 genome

Chromosome Plasmid

Total size, bp 3,550,319 57,693G � C content, % 49.01 39.75Coding density, % 84.5 83.5ORFs:

with assignedfunction

2,445 34

conservedhypothetical

749 8

with no databasematch

250 13

total 3,444 55Average ORF length, bp 871 876rRNA operons 11 0tRNAs 87 0Transposases 55 13Prophage-related genes 30 0

Fig. 1. Circular maps of the G. thermodenitrificans NG80-2 chromosome andplasmid pWL1071. (a) The chromosome map from the outside inward: the firstand second circles show predicted protein-coding regions on the plus andminus strands, respectively (colors were assigned according to the color codeof the COG functional classes; see key); the third circle shows transposasegenes and insertion sequence elements in red; the fourth and fifth circles showtRNAs and rRNAs in dark slate blue and dark golden red, respectively; the sixthand seventh circles show percentage G � C in relation to the mean G � C andGC skew, respectively. (b) The plasmid map shows ORFs color-coded accordingto their assigned functions (see key).

Feng et al. PNAS � March 27, 2007 � vol. 104 � no. 13 � 5603

MIC

ROBI

OLO

GY

Dow

nloa

ded

by g

uest

on

July

17,

202

0

Page 3: Genome and proteome of long-chain alkane degrading ... · alkane degradation pathway; and the function of the key enzyme in the pathway, the long-chain alkane monooxygenase LadA,

starch as indicated by the fermentation tests using API 50 CHB/Etest strips (BioMerieux, Marcy l’Etoile, France), and correspondinggenes including genes encoding different glycolytic activities werefound (SI Table 6). Genome analysis further revealed the presenceof a gene cluster (GT1801-GT1756) for utilization of plant hemi-cellulose xylans, and the ability of NG80-2 to use xylans as a solecarbon source was confirmed. A gene cluster for the synthesis ofcarbon storage compound glycogen (GT2782-GT2778) was alsofound. In line with its capacity to use a broad range of carbohy-drates, NG80-2 has large number of genes involved in uptake ofsugars (SI Table 7). At least four complete PTS systems withpredicted specificity to glucose (GT0878), fructose (GT1726),mannitol (GT1857) and cellulose (GT1748-GT1750), as well as aglycerol facilitator (GT1215) are present. In addition, 16 ABC-typesugar transporter systems were found, 5 of which are clustered withthe genes involved in utilization of starch or xylans.

In oil reservoirs, organic acids, with acetate being the mostabundant, are commonly detected and thought to be important forbacterial survival (13). Acetate may be assimilated through thereversible Pta-AckA (GT3361, GT2688) pathway and Glyoxylatebypass. The presence of genes encoding butyrate kinase (GT2309)and formate dehydrogenases (GT0464, GT1579) indicates thatbutyrate and formate may also be used. A variety of aromaticcompounds are present in reservoirs, and NG80-2 encodes fourdistinct aerobic ring cleavage pathways for degradation of benzoate(CoA activated, GT1899-GT1888), phenylacetate (CoA activated,GT1930-GT1919), 4-hydroxyphenylacetate (GT2993-GT2973),and 3-hydroxyanthranilate (GT3163-GT3150). Fatty acids are themajor products of alkane degradation (see below) and NG80-2contains multiple candidate genes for enzymes of �-oxidation.Genes encoding the two subunits of methylmalonyl-CoA mutase(B12-dependent) required for degradation of odd-carbon numberfatty acids are also present (GT2300-GT2299, GT3336).

Ammonia, which is the primary nitrogen source in oil reservoirs(14), may be taken up by a specific Amt type transporter (GT1313),and assimilated via glutamate dehydrogenase (GT2171, GT3148),or the glutamine synthetase (GT1182, GT1483)-glutamate synthase(GT1292-GT1293) pathway. The latter route has a high affinity forammonia and is more efficient in N-limited conditions (15). TypicalnasAB genes encoding assimilatory nitrate and nitrite reductaseswere not found, indicating presence of nonorthologous genes inNG80-2, which can use nitrate as a nitrogen source. G. thermod-enitrificans does not produce urease (5), and no genes encodingurease activity were found in NG80-2. Instead, we found genesencoding allophanate hydrolase (GT1351-GT1352) of the ureaamidolysis reaction (16), and utilization of urea by NG80-2 wasconfirmed. NG80-2 contains genes for degradation of most aminoacids, and at least 60 genes encoding various proteases and pepti-dases were also found (SI Table 6). NG80-2 has a large number oftransporter genes for uptake of exogenous amino acids and peptidesincluding 13 sets of ABC-type systems with predicted specificity foroligopeptides (4 sets), branched-chain amino acids (3 sets), polaramino acids (2 sets), spermidine/putrescine (2 sets) and methionine(2 sets) (SI Table 7), indicating that exogenous proteins, peptidesand amino acids can serve as nitrogen sources.

Respiration and Fermentation. In oil reservoirs, oxygen is onlytransiently available during water flushing used for oil production(17). Therefore, a flexible respiration system for quick response tochanged O2 concentration is very important for the survival ofbacteria in reservoirs. G. thermodenitrificans is a facultative aerobe,capable of oxygen and nitrate respiration (5). For aerobic respira-tion, NG80-2 contains a gene cluster encoding Complex I-likeenzymes, comprising 11 (GT3302-GT3292) of the 14 subunitsencoded by the E. coli nuo operon, adjacent to the ATP synthasecomplex genes (GT3311-GT3303). As in some thermophilic bac-teria such as G. stearothermophilus (18), NG80-2 lacks three sub-units of the nuo operon (nuoEFG) that code for NADH dehydro-

genase activity, and no homologs of the FpoF subunit [containingthe motif to bind F420H2 in M. mazei (18)] were found despite thepresence of genes encoding F420H2 related proteins. Therefore, thenature of the electron donors for the Complex I-like enzymes is notclear. The NG80-2 genome also encodes two type II NADHdehydrogenases (GT2904, GT2909), a succinate dehydrogenasecomplex (GT2601-GT2599), a cytochrome b6c1 reductase complex(GT2126-GT2124), and five terminal oxidases of caa3-type(GT0947-GT0950), b(o/a)3-type (GT1395-GT1394, GT1517-GT1518), aa3-type (GT3391-GT3388), and bd-type (GT0518-GT0519). The bd-type quinol oxidase has high affinity to O2 foroperating under low O2 concentrations (19).

NG80-2 has genes needed for the reduction of nitrate to N2.Typical narGHJI operon encoding membrane-bound nitrate reduc-tase is present in two copies (GT0653-GT0656, GT1712-GT1715),and the identity level between orthologous genes ranges from 38%to 68%, indicating different origins. The GT0653-GT0656 set isclustered with nirK (GT0650) encoding a copper-containing nitritereductase, norZ (GT0643) encoding a quinol oxidizing nitric oxidereductase, narK (GT0649) for nitrate or nitrite uptake, fnr(GT0657, GT0668) encoding the global anaerobic regulator, dnrN(GT0660) encoding the nitric oxide-dependent regulator, andgenes involved in synthesis of molybdenum cofactor (GT0645,GT0658-GT0659, GT0662-GT0665), which is required for theactivity of nitrate reductase. Fnr is the anaerobic activator ofnarGHJI in E. coli and B. subtilis, and both Fnr and other Fnr-likeproteins such as Dnr are also involved in regulation of differentdenitrifying reductase genes (20).

A gene cluster (GT1735-GT1729) containing homologs ofnosZDYF for the reduction of N2O to N2 as the last step ofdenitrification was found. GT1734 shares 41% identity with NosZ(nitrous oxide reductase) of Wolinella succinogenes (21). The pu-rified product of GT1734 showed nitrous oxide reductase activity,and the gene was confirmed to be a nosZ (unpublished data). Thepresence of a nos gene cluster in Gram-positive bacteria has notbeen previously described.

Fermentation is an important mechanism for anaerobic growthof bacteria in the absence of alternative electron acceptors (22), andseveral genes of typical pyruvate fermentation pathways were foundsuch as pta (GT3361), ackA (GT2688), ldh (GT0487), and buk(GT2309). The por genes (GT1718-GT1717), which encode pyru-vate-ferredoxin oxidoreductase and are commonly present in an-aerobic bacteria (23), were also found.

Nutrient Uptake and Detoxification. Petroleum reservoirs representan oligotrophic environment in which such elements as N, P, S, andFe are limited (14). In another hand, growth and survival ofmicroorganisms may be affected by the toxicity of crude oilhydrocarbons, various antimicrobial metabolites and heavy metals.Efficient transporter systems for uptake of nutrient elements anddetoxification play a crucial role for the survival of bacteria underreservoir conditions. We identified 368 (10.5%) genes encodingtransporter-related proteins by searching the Transport ProteinDatabase (www.tcdb.org) (SI Table 7). The largest number (193) ofthese proteins are primary active transporters (class 3) including 186members of the ATP-binding cassette transporter superfamily, andthe second largest number (128) are electrochemical potential-driven transporters (class 2).

Phosphorus is considered to be the major rate limiting elementfor biological activity in oil reservoirs (14). NG80-2 contains atypical ABC-type Pst system (GT2397-GT2394), which is undercontrol of the phosphate uptake regulator PhoU (GT2393). Phos-phate may also be taken up by a Na�/phosphate symporter(GT2434) and two members of the PiT (inorganic phosphatetransporter) family (GT0388, GT2897). GT0388 is also likely to actas a sulfate transporter, as it is clustered with other sulfate assim-ilation genes (GT0389-GT0386). Typical sulfate ABC-type trans-porter and sulfate permease (SulP) were not found. Iron uptake

5604 � www.pnas.org�cgi�doi�10.1073�pnas.0609650104 Feng et al.

Dow

nloa

ded

by g

uest

on

July

17,

202

0

Page 4: Genome and proteome of long-chain alkane degrading ... · alkane degradation pathway; and the function of the key enzyme in the pathway, the long-chain alkane monooxygenase LadA,

seems to be very important for NG80-2, as 4 sets of iron chelateABC-type systems (GT2199-GT2197, GT1317-GT1320, GT1272-GT1275, GT0175-GT0172) and a set of FeoAB proteins (GT1740-GT1739) were found. NG80-2 also has specific uptake transportersfor Zn2� (ZnuABC, GT2407-GT2409), Co2� (CbiMNOQ,GT1695-GT1698), Mg2�/Co2� (CorA, GT0899, GT1415, GT1433),Zn2�/Fe2� (ZIP, GT0845), Mn2�/Fe2� (Nramp, GT1354), andMg2� (MgtC, GT2865). Organic acids including benzoate, propi-onate and butyrate are commonly detected in reservoirs (13), andputative genes for their uptake were found (GT1615, GT1850,GT3161). We also found a putative transporter gene (GT2686) forfatty acids, the common degradation products of n-alkanes.

NG80-2 utilizes both primary and secondary transporters forexport of various drugs and heavy metals. At least 7 ABC-typedrugs efflux systems are recognized, and another 27 electrochem-ical gradient driven systems including 14 members of the Drug:H�

antiporter, 4 of the cation diffusion facilitator (CDF), 3 of theresistance-nodulation-cell division (RND), 2 of the small multidrugresistance (SMR), and 2 of the multi antimicrobial extrusion(MATE) families; as well as two arsenite efflux pumps (GT1547,GT1599) were found. In addition to be expelled, arsenite may alsobe detoxified by arsenate reductase (GT1600, GT2957). The P-ATPase exporting systems for Cu2� (GT1534-GT1535), and Zn2�/Cd2�/Pb2� (GT0637) were also found.

Motility, Chemotaxis, and Signal Transduction. Similar to those foundin B. subtilis, NG80-2 possesses 50 genes involved in assembly andfunction of flagella (GT1069-GT1100, GT2142, GT2466-GT2467,GT3052-GT3069, GT3273-GT3274) including a cluster of 31 genes(GT1069-GT1100) corresponding to the fla/che operon of B. sub-tilis. However, motPS encoding the second set of Mot proteins in B.subtilis was not found in NG80-2. Therefore, the flagella system inNG80-2 seems to be powered solely by MotAB complex. Inaddition, five intact and one interrupted methyl-accepting chemo-taxis proteins (MCPs) genes (GT345, GT661, GT884, GT1165-GT1166, GT2002, GT3317) were found. Inactivation of the MCPmay be resulted from changed environment in oil reservoirs.

Most of the Gram-positive bacteria genomes contain relativesmaller number of genes involved in signal transduction (24).NG80-2 has genes encoding 27 histidine kinases and 29 responseregulators including 18 pairs of histidine kinases-response regula-tors, 5 Ser/Thr protein kinases (GT0059, GT0497, GT1530,GT3032, GT3493), a Tyr protein kinase (GT3268) and 10 proteinswith EAL (GT0376, GT0905, GT1555), HD-GYP (GT0007,GT1736), GGDEF (GT1560, GT2703, GT3401), EAL:GGDEF(GT0605) and GAF (GT2702) domains.

Thermotolerance. As a thermophilic bacterium, NG80-2 can easilyadapt to geothermal oil reservoirs when other survival require-ments are met. Mesophilic bacteria response to heat inducedstresses by induction of heat shock proteins, which remove or refolddamaged proteins (25). NG80-2 contains a wide range of genesencoding molecular chaperones including dnaK operon comprisedof genes encoding DnaJ-DnaK-GrpE and the HrcA repressor(GT2444-GT2441), genes encoding GroEL-GroES (GT0223-GT0224), a disulfide bond chaperone of HSP33 family (GT0065),and small heat shock proteins of IbpA family (GT0237, GT2085,GT2097). Genes encoding ATP-dependent heat shock-responsiveproteases such as HslVU (GT1067-GT1068), Clp (GT0079,GT0678, GT0856), and Lon (GT2582, GT2583) are also present.Makarova et al. (26) listed 58 COG (clusters of orthologous groups)families, whose members are frequently detected in archaea andthermophilic bacteria and predicted to be associated with the(hyper)thermophilic phenotype. NG80-2 has three genes encodingproteins of those COG families: GT0279 (COG1583), GT0445(COG3044), and GT3202 (COG2152). Homologs of GT0445 andGT3202 are also present in HTA426, and GT0279 is ‘‘unique’’ toNG80-2. None of the three genes were found in any of the

sequenced Bacillus genomes. Polyamines such as norspermine andnorspermidine are commonly found in hyperthermophilic bacteriaand are necessary for the hyperthermophilic phenotype of thosebacteria (27). Spermine is the major type of polyamine in Geoba-cillus species, and has been implicated in thermophily in G. kaus-tophilus based on the presence of the ‘‘unique’’ genes encodingspermine/spermidine synthase and polyamine ABC transporters(4). Those unique genes were also found in NG80-2 (GT1637,GT0623-GT0626). Takami et al. (4) reported possible association ofasymmetric amino acid substitutions (Arg, Ala, Gly, Val, and Proare more frequently used in HTA426 than in mesophilic Bacillusstrains) with thermoadaptation of G. kaustophilus HTA426. Similaramino acid substitution pattern was also observed in NG80-2 whencompared with 7 sequenced Bacillus strains including 4 used tocompare with HTA426 (SI Table 8), supporting the previoussuggestion.

Long-Chain Alkane Degradation System. Alkanes are the majorcomponents of crude oil, and potentially the most abundant andavailable carbon and energy sources in reservoirs. NG80-2 can uselong-chain alkanes as its sole carbon and energy source underaerobic conditions (6). Here, we describe the pathway and func-tional genes (Fig. 2).

Alkane molecules are chemically inert and must be activated toallow further metabolic steps. An aerobic degradation pathway forshort to medium-chain alkanes (C5-C16) has been well studied anda membrane-bound monooxygenase AlkB is responsible for thecrucial initial activation of alkanes to the corresponding primary

Fig. 2. Proteomic characterization of pathways involved in hexadecanemetabolism in G. thermodenitrificans NG80-2. Differentially expressed pro-teins in hexadecane-grown cells, in comparison with sucrose-grown cells, wereinvestigated by 2-D electrophoresis (2-DE)/MALDI-TOF MS analysis. The pre-dicted metabolic pathways for sucrose and hexadecane are shown. The en-zymes of the pathways and corresponding genes in NG80-2 (as gene IDnumbers) are noted. Proteins detected by 2-DE/MALDI-TOF MS are high-lighted. Blue, induced; red up-regulated; yellow, down-regulated; green,unchanged. Proteins with pI values outside the range of pH 4 to 7 were notinvestigated. Analogs are in brackets. P, phosphate; DH, dehydrogenase.

Feng et al. PNAS � March 27, 2007 � vol. 104 � no. 13 � 5605

MIC

ROBI

OLO

GY

Dow

nloa

ded

by g

uest

on

July

17,

202

0

Page 5: Genome and proteome of long-chain alkane degrading ... · alkane degradation pathway; and the function of the key enzyme in the pathway, the long-chain alkane monooxygenase LadA,

alcohols, which are further oxidized by alcohol and aldehydedehydrogenases to fatty acids before entering �-oxidation (28).NG80-2 does not have any AlkB homologs. Real-time RT-PCRshowed a 120-fold increase in transcription of a plasmid-borneputative monooxygenase gene (GT3499) when crude oil was usedas a sole carbon source instead of sucrose (SI Fig. 6). GT3499 shares38% identity with DBT-5,5�-dioxide monooxygenase of Paeniba-cillus sp. A11-2 (29) but has no detectable similarity with any knownalkane monooxygenases.

The purified product of GT3499 converted alkanes ranging fromC15 to C36 to corresponding primary alcohols as determined by GCand GC-MS (Fig. 3 b–d). In vivo experiments showed that a plasmidcarrying GT3499 (pCOM8-ladA) partially restored an alkB knock-out mutant of P. fluorescens CHA0 (KOB2‚1) the ability to growon alkanes by allowing growth on C15-C17, but not C6-C14. Comple-mentation on longer chain alkanes could not be assessed as themutant retains the ability to grow on C18-C28 because of presenceof a second unidentified alkane hydroxylase system (30). ThusGT3499 was identified as an alkane monooxygenase gene, anddesignated as ladA (long-chain alkane degradation). The resultsindicate that NG80-2 also utilizes a terminal oxidation pathway forthe degradation of long-chain alkanes. LadA and AlkB, whichcatalyze the initial attack on alkanes, determine the size range ofalkanes to be degraded.

By using sucrose-grown cells as the reference state, proteinsdifferentially expressed in hexadecane-grown cells were investi-gated by 2-D electrophoresis and MALDI-TOF MS analysis.Expression of LadA was induced and mainly found in extracellularfraction (Figs. 2 and 3a; see also SI Fig. 7), although this protein hasno leader peptides. A putative aldehyde dehydrogenase (GT3117)and enzymes of the �-oxidation pathway were also up-regulated,

whereas enzymes of the glycolysis pathway were down-regulated(Fig. 2; see also SI Fig. 8). Alcohol dehydrogenase is the secondenzyme of the terminal oxidation pathway and three putativealcohol dehydrogenases (GT1287, GT1754, GT2878) were de-tected at similar levels under both conditions, suggesting theirinvolvement in multiple metabolic functions (Fig. 2; see also SI Fig.8). Expression of the TCA cycle enzymes was also unchanged asexpected for this central metabolic pathway. However, glyoxylatebypass which allows growth on acetate as the sole carbon source wasfound up-regulated. The proteomic analysis confirmed the pres-ence of a terminal oxidation pathway for degradation of long-chainalkanes, which is initiated by LadA, and fatty acids produced arefurther degraded to acetyl-CoA via �-oxidation before entering theTCA cycle or glyoxylate bypass when alkanes are present as the solecarbon source for energy production (Fig. 2).

Thermophilic enzymes offer major biotechnological advantagesover mesophilic enzymes. Oxygenases have been used for industrialsynthesis of many complex molecules (31), and LadA has greatpotential to be used in this area, particularly using petroleumcompounds as substrates, and also in treatment of environmentaloil pollution. Industrial applications of currently known alkaneoxygenases are restricted because of their complex biochemistryand process requirements, e.g., AlkB is a membrane protein andrequires rubredoxin and a rubredoxin reductase for activity (31).LadA acts on long-chain alkanes, is single-component with nocoenzyme requirement, soluble (extracellular), and easily expressedand purified in E. coli.

DiscussionG. thermodenitrificans NG80-2 has versatile metabolic pathways,f lexible respiration systems and robust transport systems for itssurvival under various environmental conditions. The ability todegrade long-chain alkanes and carry out denitrification pro-vides major survival advantages to NG80-2 in the current oilreservoir, in which water had been used as the driving force foroil exploitation for 11 years before NG80-2 was isolated (un-published data). When oxygen is available through water flush-ing, alkanes can be used. Upon exhaustion of oxygen, NG80-2can use nitrate, which may be carried in by the flushing waters,as an alternative electron acceptor and various organic acids aselectron donors for survival.

Presence of genes involved in utilization of xylans and degrada-tion of oligopeptides confirms that NG80-2 originated from a soilenvironment. We postulate that NG80-2 gained the capacity foralkane oxidization using plasmid pLW1071 in an oil contaminatedsoil before invading the oil reservoir. Gene ladA is flanked at bothends (14,207 bp upstream and 267 bp downstream) by two insertionsequences of the IS21 family. IS21 elements are mainly found inBacillus, Pseudomonas and Yersinia strains (32), indicating that ladAoriginated outside of G. thermodenitrificans and moved into itscurrent position on the plasmid mediated by the two flanking IS21elements. The upstream IS21 is intact, indicating that the move-ment of the ladA gene and its flanking DNA was a recent event. Thedownstream IS21 is also intact except for the insertion of an intactIS982 in the istB gene. The IS982 element is also present in theNG80-2 chromosome in eight places, and these nine elements sharehigh level DNA identity ranging from 90% to 100%. This factindicates that IS982 inserted into the plasmid in NG80-2.

NG80-2 can facilitate oil recovery when added to oil reser-voirs, and this property may be partially attributed to its abilityto selectively degrade long-chain alkanes, which leads to reducedviscosity of crude oils (data not shown). We have demonstrateda combined genomic and proteomic approach to predict in silicoand confirm in vitro a long-chain alkane degrading pathway. TheNG80-2 genome provides an excellent platform for furtherimprovement of this organism for oil bioremediation and otherbiotechnological applications.

Fig. 3. Characterization of long-chain alkane monooxygenase LadA. (a)Induced expression of LadA as an extracellular protein in hexadecane-grownNG80-2 cells. Sucrose-grown cells were used as the reference state. Sections of2-D PAGE gels stained with colloidal CBB G-250 are shown. The spot in circlewas identified as LadA by MALDI-TOF MS analysis. (b) Expression and purifi-cation of LadA in E. coli as shown by SDS/PAGE. Lane 1, crude extract of E. coliBL21 (pET-ladA) before IPTG induction; lane 2, crude extract induced withIPTG; lane 3, fractions eluted from Ni2� column; lane 4, molecular size markers.(c) Hydroxylation of hexadecane by purified LadA. GC chromatographs showconversion of hexadecane to 1-hexadecanol. IS, internal standard (squalane).(d) Specific activity of purified LadA on alkanes with different chain lengths.

5606 � www.pnas.org�cgi�doi�10.1073�pnas.0609650104 Feng et al.

Dow

nloa

ded

by g

uest

on

July

17,

202

0

Page 6: Genome and proteome of long-chain alkane degrading ... · alkane degradation pathway; and the function of the key enzyme in the pathway, the long-chain alkane monooxygenase LadA,

MethodsGenome Sequencing and Assembly. Genomic DNA, isolated from G.thermodenitrificans NG80-2, was sequenced by using a conventionalwhole genome random shotgun strategy (33). Plasmid libraries ofsmall inserts (2–3 kb) and large inserts (8–10 kb) generated bymechanical shearing of genomic DNA were constructed in pUC118(Takara, Dalian, China). Plasmid DNA, for use as template forsequencing, was extracted by the alkaline lysis method from cellsgrown in 96-well plates, and purified by using Whatman Unifilterplates (Whatman Inc., Clifton, NJ). Double-ended plasmid se-quencing reactions were carried out by using ABI BigDye Termi-nator V3.1 Cycle Sequencing Kit, and the sequencing ladders wereresolved on an ABI 3730 Automated DNA Analyzer (AppliedBiosystems, Foster City, CA). Approximately 52,000 reads with anaverage length of 629 bp were generated, providing a 9.1-foldgenome coverage. These sequences were assembled into contigs byusing the PHRED�PHRAP�CONSED software package (34).Sequence gaps were closed by primer walking on gap-spanninglibrary clones identified based on linking information from forwardand reverse reads. The remaining 229 physical gaps were closed byusing an improved Multiplex PCR method (35). The sequence wasedited manually, and additional PCR and sequencing reactionswere done to improve coverage and resolve sequence ambiguities.All repeated DNA regions were verified by PCR amplificationacross the repeat and sequencing of the product. The final genomeis based on 55,727 reads.

Genome Analysis. ORFs were identified by using CRITICA (36) andGLIMMER (37), followed by BLASTX searches of the remainingintergenic regions. RBSFINDER (www.tigr.org/software) was usedto locate the start codons. Transfer RNAs were predicted by using

tRNAscan-SE (38). Ribosomal RNAs were identified by searchingthe genomic sequences against a set of known rRNAs withBLASTN and verified by multiple alignments with other knownrRNA sequences. The software Artemis (39) was used for whole-genome visualization. Genome annotation was performed by com-paring protein sequences with those in the NCBI protein database(www.ncbi.nlm.nih.gov) and the clusters of orthologous groups(COG) database (www.ncbi.nlm.nih.gov/COG). All annotationswere inspected manually through searches against PFAM (40),SMART (41), and PROSITE (42) databases. TMHMM 2.0 (43)was used to identify transmembrane domains, and SignalP 3.0 (44)was used to predict signal peptide regions. Orthologous gene setswere identified by BLASTP Reciprocal Best Hits (RBH) (45).Putative metabolic pathways were analyzed by Metacyc (46) andthe KEGG database (47).

Other Methods. Proteomic analysis, and methods used for in vivoand in vitro characterization of ladA are presented as SI Materialsand Methods.

We thank Dr. Jan B. van Beilen (Institute of Biotechnology, Zurich,Switzerland) for kindly providing Pseudomonas fluorescens KOB�1 andplasmid pCom8 and for advice on in vivo functional analysis of long-chainalkane monooxygenase genes. The following persons are acknowledgedfor their contributions to genome sequencing: Yuan Sui, Yue Li, ChunZhang, Gang Yuan, Likun Lu, Tao Tang, Hao Yu, Jianxiong Peng,Zhifeng Yang, and Chengtai Liao. This work was supported by TianjinMunicipal Special Fund for Science and Technology InnovationGrant 05FZZDSH00800, National 863 Program of China Grants2002AA226011 and 2006AA020703, and funds from the Tianjin Mu-nicipal Science and Technology Committee (Grants 033105811 and06YFJZJC02200).

1. Nazina TN, Tourova TP, Poltaraus AB, Novikova EV, Grigoryan AA, IvanovaAE, Lysenko AM, Petrunyaka VV, Osipov GA, Belyaev SS, et al. (2001) Int J SystEvol Microbiol 51:433–446.

2. McMullan G, Christie JM, Rahman TJ, Banat IM, Ternan NG, Marchant R (2004)Biochem Soc Trans 32:214–217.

3. Rahman TJ, Marchant R, Banat IM (2004) Biochem Soc Trans 32:209–213.4. Takami H, Takaki Y, Chee GJ, Nishi S, Shimamura S, Suzuki H, Matsui S,

Uchiyama I (2004) Nucleic Acids Res 32:6292–6303.5. Manachini PL, Mora D, Nicastro G, Parini C, Stackebrandt E, Pukall R, Fortina

MG (2000) Int J Syst Evol Microbiol 50:1331–1337.6. Wang L, Tang Y, Wang S, Liu RL, Liu MZ, Zhang Y, Liang FL, Feng L (2006)

Extremophiles 10:347–356.7. van Beilen JB, Panke S, Lucchini S, Franchini AG, Rothlisberger M, Witholt B

(2001) Microbiology 147:1621–1630.8. Tani A, Ishige T, Sakai Y, Kato N (2001) J Bacteriol 183:1819–1823.9. van Beilen JB, Smits TH, Whyte LG, Schorcht S, Rothlisberger M, Plaggemeier

T, Engesser KH, Witholt B (2002) Environ Microbiol 4:676–682.10. Whyte LG, Smits TH, Labbe D, Witholt B, Greer CW, van Beilen JB (2002) Appl

Environ Microbiol 68:5933–5942.11. Yadav JS, Loper JC (1999) Gene 226:139–146.12. Rodionov DA, Vitreschak AG, Mironov AA, Gelfand MS (2003) J Biol Chem

278:41148–41159.13. Barth T (1991) Appl Geochem 6:1–15.14. Head IM, Jones DM, Larter SR (2003) Nature 426:344–352.15. Merrick MJ, Edwards RA (1995) Microbiol Rev 59:604–622.16. Kanamori T, Kanou N, Atomi H, Imanaka T (2004) J Bacteriol 186:2532–2539.17. Whelan JK, Kennicutt MC, Brooks JM, Schumacher D, Eglington LB (1994) Org

Geochem 22:587–615.18. Pereira MM, Bandeiras TM, Fernandes AS, Lemos RS, Melo AM, Teixeira M

(2004) J Bionenerg Biomembr 36:93–105.19. D’Mello R, Hill S, Poole RK (1996) Microbiology 142:755–763.20. Philippot L (2002) Biochim Biophys Acta 1577:355–376.21. Simon J, Einsle O, Kroneck PM, Zumft WG (2004) FEBS Lett 569:7–12.22. Eschbach M, Schreiber K, Trunk K, Buer J, Jahn D, Schobert M (2004) J Bacteriol

186:4596–4604.23. Kletzin A, Adams MW (1996) J Bacteriol 178:248–257.24. Galperin MY (2004) Environ Microbiol 6:552–567.

25. Yura T, Nakahigashi K (1999) Curr Opin Microbiol 2:153–158.26. Makarova KS, Koonin EV (2003) Genome Biol 4:115.27. Daniel RM, Cowan DA (2000) Cell Mol Life Sci 57:250–264.28. van Beilen JB, Li Z, Duetz WA, Smits THM, Witholt B (2003) Oil Gas Sci Technol

58:427–440.29. Ishii Y, Konishi J, Okada H, Hirasawa K, Onaka T, Suzuki M (2000) Biochem

Biophys Res Commun 270:81–88.30. Smits TH, Balada SB, Witholt B, van Beilen JB (2002) J Bacteriol 184:1733–1742.31. van Beilen JB, Funhoff EG (2005) Curr Opin Biotechnol 16:308–314.32. Mahillon J, Chandler M (1998) Microbiol Mol Biol Rev 62:725–774.33. Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage

AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM, et al. (1995) Science269:496–512.

34. Gordon D, Abajian C, Green P (1998) Genome Res 8:195–202.35. Tettelin H, Radune D, Kasif S, Khouri H, Salzberg SL (1999) Genomics 62:500–

507.36. Badger JH, Olsen GJ (1999) Mol Biol Evol 16:512–524.37. Delcher AL, Harmon D, Kasif S, White O, Salzberg SL (1999) Nucleic Acids Res

27:4636–4641.38. Lowe TM, Eddy SR (1997) Nucleic Acids Res 25:955–964.39. Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell

B (2000) Bioinformatics 16:944–945.40. Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T,

Moxon S, Marshall M, Khanna A, Durbin R, et al. (2006) Nucleic Acids Res34:D247–251.

41. Letunic I, Copley RR, Pils B, Pinkert S, Schultz J, Bork P (2006) Nucleic Acids Res34:D257–260.

42. Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS,Pagni M, Sigrist CJ (2006) Nucleic Acids Res 34:D227–230.

43. Krogh A, Larsson B, von Heijne G, Sonnhammer EL (2001) J Mol Biol 305:567–580.

44. Bendtsen JD, Nielsen H, von Heijne G, Brunak S (2004) J Mol Biol 340:783–795.45. Hirsh AE, Fraser HB (2001) Nature 411:1046–1049.46. Caspi R, Foerster H, Fulcher CA, Hopkinson R, Ingraham J, Kaipa P, Krum-

menacker M, Paley S, Pick J, Rhee SY, et al. (2006) Nucleic Acids Res 34:D511–516.47. Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M (2004) Nucleic Acids Res

32:D277–280.

Feng et al. PNAS � March 27, 2007 � vol. 104 � no. 13 � 5607

MIC

ROBI

OLO

GY

Dow

nloa

ded

by g

uest

on

July

17,

202

0