Phylogeny of Nitrogenase Sequences in Frankia and Other Nitrogen-Fixing Microorganisms



J Mol Evol (1989) 29:436-447
Journal of Molecular Evolution
© Springer-Verlag New York Inc. 1989

Phylogeny of Nitrogenase Sequences in Frankia and Other Nitrogen-Fixing Microorganisms

Philippe Normand and Jean Bousquet

Centre de Recherche en Biologie Forestière, Université Laval, Québec, Province of Québec, Canada G1K 7P4

Summary. The complete nucleotide sequence of a nitrogenase (nifH) gene was determined from a second strain (HRN18a) of Frankia, an aerobic soil bacterium. The open reading frame is 870 bp long and encodes a polypeptide of 290 amino acids. The amino acid and nucleotide sequences were compared with 21 other published sequences. The two Frankia strains were 96% similar at the amino acid level and 93% similar at the nucleotide level. A number of methods were used to infer phylogenies of these nitrogen fixers, based on nifH amino acid and nucleotide sequences. The results obtained do not agree completely with other phylogenies for these bacteria and thus make probable occurrences of lateral transfer of the nif genes. The time of divergence of the two Frankia strains could be estimated at about 100 million years. The vanadium-dependent (Type 2) nitrogenase present in Azotobacter spp. appears to be a recent derivation from the conventional molybdenum-dependent (Type 1) enzyme, whereas the iron-dependent (Type 3) alternative nitrogenase would have a much older origin.

Key words: Bacterial phylogeny -- Base composition -- Base sequence -- Frankia -- Molecular dating -- Nitrogen fixation -- Nitrogenase gene

Introduction

The capacity to fix nitrogen, which requires around 20 nif genes, exists in microorganisms of widely different habitats and from different phylogenetic backgrounds (Akkermans and Houwers 1983). Sequence similarity is high between the nifH genes, which code for a subunit of the nitrogenase complex (Hennecke et al. 1985). This interspecific similarity could be explained in either of two ways: (1) by recent lateral transfers (Ruvkun and Ausubel 1980), a hypothesis based on the fact that nif genes are frequently found on plasmids, or (2) by vertical descent coupled with strong functional constraints (Hennecke et al. 1985). The enzyme complex is known to be able to reduce other molecules such as R-C≡C and R-C≡N, besides its natural N≡N substrate. This low substrate specificity has given rise to the theory of an ancient origin of the complex, according to which the first function of nitrogenase would have been to detoxify cyanides and other chemicals present in the primitive reducing earth atmosphere (Silver and Postgate 1973).

Offprint requests to: P. Normand

The nifH gene has been sequenced from 14 eubacteria, including a Frankia strain (ArI3) that has a symbiotic association with the roots of trees in the genus Alnus (Normand et al. 1988). We now have determined the corresponding sequence of a second Frankia strain (HRN18a) that lives symbiotically with species of the genus Elaeagnus (Simonet et al. 1989). This sequence allowed us to measure phylogenetic relationships among eubacterial nifH genes and also to estimate the approximate time of divergence between the two Frankia strains.

Materials and Methods

Source of DNA. Frankia strain HRN18a, referred to here as strain A, was chosen because it differs greatly biologically (Gardes et al. 1987; Simonet et al. 1989) from the previously sequenced strain ArI3 (designated B in the present paper). The source of DNA was plasmid pFQ165, described before (Normand et al. 1988), which carries nifH-hybridizing sequences from Frankia strain HRN18a, an Elaeagnus-compatible strain isolated from a Hippophaë rhamnoides growing in the French Alps (P. Simonet, 1983, Thèse de 3ème cycle, Université Lyon I). Sequencing was done according to Sanger et al. (1977) using M13 mp18 and mp19 derivatives (Yanisch-Perron et al. 1985) cloned in Escherichia coli DH5αF' frozen competent cells (BRL, Gaithersburg, MD) following the strategy presented in Fig. 1.


Numerical Analysis. Sequences (nucleotide and amino acid) were compiled, aligned, and compared using the BestFit algorithm (Smith and Waterman 1981) of the University of Wisconsin nucleic acid sequence analysis software (Devereux et al. 1984).

The fraction of identical sites (q) between the 22 nifH sequences (listed in Table 1) was computed at the amino acid level. Estimates of the number of substitutions per site (d), hereafter referred to as replacements per site, were calculated using:

$$d = -\ln q \tag{1}$$

the variance being calculated as recommended by Nei (1987).

Phylogenetic trees were obtained using both distance and parsimony methods. The distance approach was applied only to the amino acid data. Rooted trees were obtained using the average distance method (UPGMA, Sneath and Sokal 1973) and the standard errors of the branching points were calculated to assess the accuracy of the topologies obtained (Nei et al. 1985). This method assumes that there is a molecular clock, the expected rate of gene substitution being constant among lineages. The KITSCH algorithm of the PHYLogeny Inference Package (PHYLIP, Felsenstein 1985) was also used for a similar purpose. For each set of data in this case, more than 1000 trees were examined and the criterion for selecting the best fitting topology was the minimization of the quantity

$$\sum \frac{(O - E)^2}{O^p} \tag{2}$$

Fig. 1. Sequencing strategy for the Frankia HRN18a nifH gene. The coordinates (0-3.7) are those of plasmid pFQ165 (Normand et al. 1988) with a map of relevant restriction sites. The direction and extent of sequence determination are indicated by arrows above the map. Clones A and B were obtained by partial digestion of fragment SalI-SmaI with Sau3AI (mp18 × BamHI) or TaqI (mp19 × AccI). Clones C are BglII-BamHI fragments (mp18 × BamHI), clones D are BglII-SstI fragments (mp18 × BamHI-SstI), and E are SstI-BglII fragments (mp19 × SstI-BamHI). The boxes indicate where synthetic 15-mer oligonucleotides were used for sequencing and the numbers above them refer to those in Normand et al. (1988). The direction of transcription of the DNA coding fragment (shaded area) was determined by sequence analysis. The sequence was determined for both strands.

where O and E refer, respectively, to the observed and expected distances. The power p was set to 1.0 because the standard de- viation was approximately proportional to the square root of distances and similarities.
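As a minimal sketch of Eqs. (1) and (2), the following Python functions compute replacements per site from a fraction of identical sites and evaluate the fit criterion for a candidate tree. The function names are illustrative assumptions and are not part of the software used in the paper; the two similarity values are taken from Table 1.

```python
import math

def replacements_per_site(q):
    """Eq. (1): d = -ln q, where q is the fraction of identical sites."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    return -math.log(q)

def fit_criterion(observed, expected, p=1.0):
    """Eq. (2): sum of (O - E)^2 / O^p over all pairs of sequences.

    observed: pairwise distances from the data (O), all > 0
    expected: the corresponding path lengths on a candidate tree (E)
    """
    return sum((o - e) ** 2 / o ** p for o, e in zip(observed, expected))

# Similarities from Table 1: Frankia A vs Frankia B (96%) and vs Anabaena (79%).
d_frankia = replacements_per_site(0.96)
d_anabaena = replacements_per_site(0.79)
```

A tree whose path lengths reproduce the observed distances exactly gives a criterion of zero; among candidate topologies, the one with the smallest value is retained.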

Two distance methods were used to obtain unrooted gene trees that did not assume strict homogeneity of substitution rates among lineages. The Fitch and Margoliash (1967) method (FM) was used, with or without an outgroup, and for each set of data, more than 1000 trees were assessed. The criterion for selecting the best fitting topology was the same as for the KITSCH algo- rithm. The Neighbor-Joining (NJ) method (Saitou and Nei 1987) was also used.
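The average distance method (UPGMA) described above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the program used by the authors: it returns only the topology (no branch lengths or standard errors), and the helper name `upgma` is hypothetical. The four-taxon example uses similarities from Table 1 converted with Eq. (1).

```python
import math
from itertools import combinations

def upgma(names, dist):
    """Return a nested-tuple topology built by average-linkage clustering.

    dist maps frozenset({a, b}) to the distance between leaves a and b.
    """
    # Each cluster is (topology, tuple_of_leaf_names).
    clusters = [(n, (n,)) for n in names]

    def avg(c1, c2):
        # Unweighted mean over all leaf pairs between the two clusters.
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset(pr)] for pr in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pr: avg(*pr))
        clusters = [c for c in clusters if c is not c1 and c is not c2]
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0]

# Percentage similarities from Table 1 for four of the 22 sequences.
similarity = {
    ("Frankia A", "Frankia B"): 96,
    ("Frankia A", "Anabaena"): 79,
    ("Frankia B", "Anabaena"): 78,
    ("Frankia A", "K. pneumoniae"): 74,
    ("Frankia B", "K. pneumoniae"): 75,
    ("Anabaena", "K. pneumoniae"): 70,
}
dist = {frozenset(k): -math.log(v / 100.0) for k, v in similarity.items()}
tree = upgma(["Frankia A", "Frankia B", "Anabaena", "K. pneumoniae"], dist)
```

On this subset the two Frankia strains join first, then Anabaena, then K. pneumoniae. Because UPGMA assumes a molecular clock, the unrooted FM and NJ methods serve as a check when rates differ among lineages.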

For parsimony analysis, the amino acid and nucleotide sequences of two groups of four representative taxa (group A, Type 1: Frankia A, Anabaena, Klebsiella pneumoniae, and Clostridium pasteurianum #2; group B, Type 1 vs 3: Azotobacter vinelandi #1, A. vinelandi #3, C. pasteurianum #2, and C. pasteurianum #3) were aligned (not shown). In the nucleotide sequences, the analysis was applied only to the first and second positions of codons, after testing for compositional equilibrium among sequences. The phylogenetically informative sites were compiled and the probabilities for each of the three possible configurations (Fig. 2) calculated according to Prager and Wilson (1988).

Results

Frankia Sequence

The nifH sequence of strain A appears in Fig. 3 with the predicted amino acids given underneath. The open reading frame (ORF) is 870 bp and encodes a 290 amino acid polypeptide (predicted molecular weight 31,814). The first ATG begins at position 91 (Fig. 3) and is preceded 6 bases upstream by a GAGGAG motif, i.e., a Shine-Dalgarno (1974) sequence.
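The ORF arithmetic and the start of the translation can be checked with a short Python sketch. The 30 nucleotides are coordinates 91-120 of Fig. 3; the `translate` helper and the compact encoding of the standard genetic code are illustrative assumptions, not the software used in the paper.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

# The ORF runs from the ATG at coordinate 91 to the TGA at coordinate 961.
orf_length = 961 - 91       # nucleotides before the stop codon
n_codons = orf_length // 3  # sense codons

# First ten codons of the Frankia HRN18a nifH ORF (Fig. 3, coordinates 91-120).
first_30 = "ATGCGCCAGATCGCGTTCTACGGCAAGGGT"
peptide = translate(first_30)
```

The translated fragment matches the first ten residues of the Frankia A row in Fig. 4 (MRQIAFYGKG).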

Sequence Comparisons

The coding sequences of Frankia strains A and B are 96% similar at the amino acid level and 93% similar at the nucleotide level. The similarity decreases dramatically both upstream and downstream from this region. Upstream of the initial ATG, there is no similarity on a 30-bp stretch except for the Shine-Dalgarno motif (GAGGAG). Downstream of the nonsense TGA, sequence similarity is around 50% for 25 bp and then increases to about 90% in the zone that is expected to contain the beginning of gene nifD. There is an ATG at coordinate 1007, preceded by a GAGG motif at coordinate 995, which is a potential ribosome binding site, and followed by an ORF (Fig. 3).

When the 22 sequences were aligned (Fig. 4), several regions were found to be conserved in all strains considered (uppercase letters in the consensus sequence), whereas others were found to be highly


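Table 1. Matrices of percentage similarity between pairs of nifH amino acid sequences

| Bacterial strain | 1 Fh | 2 Fa | 3 An | 4 Ac | 5 Av | 6 Av3 | 7 Kp | 8 Rc | 9 Tf | 10 Rm |
|---|---|---|---|---|---|---|---|---|---|---|
| Length (amino acids) | 290 | 287 | 299 | 290 | 290 | 275 | 293 | 297 | 298 | 298 |
| 1. Frankia A | -- | | | | | | | | | |
| 2. Frankia B | 96 | -- | | | | | | | | |
| 3. Anabaena | 79 | 78 | -- | | | | | | | |
| 4. A. chroococcum a | 75 | 76 | 73 | -- | | | | | | |
| 5. A. vinelandi #1 | 76 | 76 | 72 | 90 | -- | | | | | |
| 6. A. vinelandi #3 | 58 | 59 | 61 | 63 | 63 | -- | | | | |
| 7. K. pneumoniae | 74 | 75 | 70 | 86 | 88 | 63 | -- | | | |
| 8. Rh. capsulatus | 70 | 74 | 69 | 74 | 74 | 61 | 73 | -- | | |
| 9. T. ferrooxidans | 70 | 73 | 71 | 73 | 74 | 60 | 76 | 79 | -- | |
| 10. R. meliloti | 69 | 71 | 71 | 71 | 69 | 59 | 68 | 79 | 78 | -- |
| 11. R. phaseoli | 70 | 71 | 72 | 71 | 76 | 60 | 76 | 80 | 79 | 94 |
| 12. R. trifoli | 70 | 72 | 70 | 72 | 71 | 61 | 69 | 79 | 78 | 90 |
| 13. B. japonicum | 71 | 74 | 72 | 75 | 76 | 59 | 76 | 77 | 86 | 78 |
| 14. P. Rhizobium | 71 | 74 | 72 | 75 | 76 | 59 | 76 | 77 | 86 | 78 |
| 15. A. caulinodans | 71 | 75 | 72 | 72 | 74 | 59 | 75 | 76 | 83 | 76 |
| 16. C. pasteurianum #1 b | 65 | 65 | 64 | 68 | 69 | 62 | 68 | 68 | 64 | 65 |
| 17. C. pasteurianum #2 | 64 | 64 | 65 | 68 | 69 | 64 | 69 | 68 | 63 | 65 |
| 18. C. pasteurianum #3 | 62 | 61 | 61 | 64 | 63 | 81 | 64 | 61 | 58 | 59 |
| 19. M. voltae | 48 | 48 | 50 | 51 | 50 | 51 | 50 | 46 | 46 | 46 |
| 20. M. t. #1 | 61 | 61 | 62 | 63 | 63 | 73 | 65 | 61 | 61 | 60 |
| 21. M. t. #2 | 47 | 48 | 48 | 47 | 46 | 47 | 46 | 47 | 44 | 47 |
| 22. M. ivanovi | 49 | 49 | 48 | 50 | 51 | 51 | 50 | 49 | 47 | 47 |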

The matrix was constructed using the BestFit function (perfect matches only/pair = 1.5). The numbers below the strains' abbreviations refer to the length in amino acids of each sequence. Frankia A (this work), Frankia B (Normand et al. 1988), Anabaena strain 7120 (Mevarech et al. 1980), Azotobacter chroococcum Type 2 (Robson et al. 1986), Azotobacter vinelandi Type 1 (Brigle et al. 1985) and Type 3 (Bishop et al. 1988), Klebsiella pneumoniae (Sundaresan and Ausubel 1981), Rhodobacter capsulatus (Schumann et al. 1986), Thiobacillus ferrooxidans (Pretorius et al. 1987), Rhizobium meliloti (Török and Kondorosi 1981), Rhizobium phaseoli (Quinto et al. 1985), Rhizobium trifoli (Scott et al. 1983a), Bradyrhizobium japonicum (Fuhrmann and Hennecke 1984), Parasponia Rhizobium (Scott et al. 1983b), Azorhizobium caulinodans strain ORS571 (Norel and Elmerich 1987), Clostridium pasteurianum copies 1, 2, and 3 (Wang et al. 1988), Methanococcus voltae (Souillard and Sibold 1986), Methanococcus thermolithotrophicus #1 (Sibold and Souillard 1988) and M. thermolithotrophicus #2, and Methanobacterium ivanovi (Souillard et al. 1988)

[Figure 2: tree diagrams not reproduced. Panels A and B each show the three possible unrooted trees (I, II, III). Taxa in panel A: Frankia A, Anabaena, K. pneumoniae, C. pasteurianum #2. Taxa in panel B: A. vinelandi #1, A. vinelandi #3, C. pasteurianum #2, C. pasteurianum #3.]

Fig. 2. The three possible unrooted trees relating nitrogenases of Frankia, Clostridium pasteurianum #2, Anabaena, and Klebsiella pneumoniae (Group A) and C. pasteurianum sequences #2 and #3 and Azotobacter vinelandi sequences #1 (Type 1) and #3 (Type 3) (Group B).

divergent. The four conserved cysteine residues marked with an asterisk refer to the ligands for the iron-sulfur [4Fe-4S] cluster (Burgess 1985). The conserved arginine residue at coordinate 99, necessary for reversible inactivation through ADP-ribose fixation (Zumft 1985), is highlighted with an "@" sign and the conserved putative Mg-ATP binding site (Jones et al. 1985) at coordinates 8-13 is

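Table 1. Extended

| Bacterial strain | 11 Rp | 12 Rt | 13 Bj | 14 Pr | 15 As | 16 Cp1 | 17 Cp2 | 18 Cp3 | 19 Mv | 20 Mt1 | 21 Mt2 | 22 Mi |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Length (amino acids) | 297 | 297 | 294 | 294 | 296 | 273 | 272 | 275 | 278 | 284 | 292 | 263 |
| 11. R. phaseoli | -- | | | | | | | | | | | |
| 12. R. trifoli | 90 | -- | | | | | | | | | | |
| 13. B. japonicum | 79 | 78 | -- | | | | | | | | | |
| 14. P. Rhizobium | 79 | 78 | 97 | -- | | | | | | | | |
| 15. A. caulinodans | 78 | 77 | 92 | 93 | -- | | | | | | | |
| 16. C. pasteurianum #1 b | 66 | 65 | 64 | 64 | 64 | -- | | | | | | |
| 17. C. pasteurianum #2 | 66 | 65 | 65 | 65 | 65 | 92 | -- | | | | | |
| 18. C. pasteurianum #3 | 59 | 58 | 60 | 60 | 60 | 65 | 65 | -- | | | | |
| 19. M. voltae | 47 | 49 | 48 | 47 | 47 | 51 | 52 | 51 | -- | | | |
| 20. M. t. #1 | 61 | 60 | 59 | 60 | 60 | 64 | 64 | 70 | 52 | -- | | |
| 21. M. t. #2 | 47 | 48 | 45 | 45 | 44 | 52 | 53 | 48 | 54 | 50 | -- | |
| 22. M. ivanovi | 48 | 51 | 48 | 48 | 48 | 51 | 51 | 51 | 52 | 52 | 52 | -- |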

a The A. chroococcum sequence used is that of Type 2 (vanadium-dependent) nitrogenase, that of Type 1 (molybdenum-dependent) being >99% similar to that of A. vinelandi (Robson et al. 1986)

b Three other copies present in that strain were omitted as they are very similar to copies #1 and #2

highlighted with "$$$" signs. Regions at coordinates 5-16, 37-45, 82-104, 123-138, 153-164, 177-186, and 238-242, which cover about 25% of the polypeptide length, are invariant (uppercase letters in the consensus sequence).

Base Composition

The base compositions of the nifH regions are presented in Table 2 and show that Clostridium, Anabaena, and methanogens have a low total percent GC content (35-38% GC) as compared to Frankia (64-65% GC), which is typical of actinomycetes. According to that character, Frankia would be clustered readily with typical purple bacteria such as Rhizobium (55-64% GC) and Azotobacter (58-64% GC). The third (degenerate) codon position is particularly skewed with respect to percent GC, with extreme values of 17-94% GC. The total percent GC is strongly dependent upon the percent GC of the third codon position, which explains most of the variation observed. Individual bases were analyzed and appear to be at compositional equilibrium at second codon positions, but not at first and especially not at third codon positions (Table 2). Due to the skew at third positions, nucleotides were not used for distance matrix analysis. However, when only four sequences were considered in the two parsimony analyses, compositional equilibrium was reached at first and second codon positions (Table 3).
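The per-position GC percentages discussed here can be computed as in the following Python sketch. The function name is an illustrative assumption, and the input is only the first 30 nucleotides of the Frankia HRN18a ORF (Fig. 3, coordinates 91-120), so the values are for illustration and are not those of Table 2.

```python
def gc_by_codon_position(orf):
    """Return percent GC at the first, second, and third codon positions."""
    orf = orf.upper()
    if len(orf) == 0 or len(orf) % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    percents = []
    for pos in range(3):
        bases = orf[pos::3]  # every third base, starting at this codon position
        gc = sum(base in "GC" for base in bases)
        percents.append(100.0 * gc / len(bases))
    return tuple(percents)

# First ten codons of the Frankia HRN18a nifH ORF.
first_30 = "ATGCGCCAGATCGCGTTCTACGGCAAGGGT"
gc1, gc2, gc3 = gc_by_codon_position(first_30)
```

Even on this short fragment the third position is far richer in GC than the first two, which is the kind of skew that led the authors to exclude third positions from the analyses.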

Amino Acid Distance Trees

Figure 5a shows the rooted tree obtained by UPGMA using the amino acid data, where standard errors of branching points indicate uncertainty in the branching order of a certain number of clusters. The two Frankia strains are clustered together; the two Azotobacter molybdenum-dependent (#1) and vanadium-dependent (#2, Type 2) sequences are grouped with Klebsiella; Rhodobacter capsulatus and the fast-growing Rhizobium species are separated from the Bradyrhizobium group, but the two groups join later on; and the C. pasteurianum copies #1 and #2 are grouped as expected. Surprisingly, the Frankia are not clustered closely with C. pasteurianum, the other gram-positive in this tree, but with Anabaena, a cyanobacterium. Another unexpected feature is that


TCCGGGGCCACGACCGCGACCGATATCCGGCCCGCGGGACGCACTGTGCGAAACCGCGCA 60

TGCGTACCCAGGCTCCAGGAGGAGAAAGCAATGCGCCAGATCGCGTTCTACGGCAAGGGT 120
RBS M R Q I A F Y G K G

GGTATCGGCAAGTCCACCACCCAGCAGAACACCATGGCTGCCATGGCCGAGATGGGCCGT 180
G I G K S T T Q Q N T M A A M A E M G R

CGGGTCATGATCGTCGGCTGCGAUCCCAAGGCTGACTCGACCCGCCTCATCCTGCACTCG 240
R V M I V G C D P K A D S T R L I L H S

SstI
AAGGCCCAGACCTCGGTGATCCAGCTCGCTGCCGAGAAGGGGTCCGTCGAGGACCTGGAG 300
K A Q T S V I Q L A A E K G S V E D L E

CTCGACGAGGTGCTCGTCGAGGGCCAGTGGGGCATCAAGTGCGTCGAGTCCGGTGGCCCG 360
L D E V L V E G Q W G I K C V E S G G P

GAGCCGGGCGTCGGCTGCGCCGGCCGCGGTGTCATCACCTCCATCACGTACCTGGAGGAG 420
E P G V G C A G R G V I T S I T Y L E E

GCCGGCGCCTACGAGAACCTCGACTTCGTCACCTACGACGTCCTCGGTGACGTTGTCTGC 480
A G A Y E N L D F V T Y D V L G D V V C

BglII
GGTGGCTTCGCGATGCCGATCCGCCGGGGCAAGGCCCAGGAGATCTACATCGTGACCTCC 540
G G F A M P I R Q G K A Q E I Y I V T S

GGCGAGATGATGGCGATGTACGCGGCGAACAACATCGCCCGCGGCATCCTCAAGTACGCG 600
G E M M A M Y A A N N I A R G I L K Y A

CACTCCGGCGGCGTCCGCCTCGGTGGCCTCATCTGCAACAGCCGCAAGACCGACCGTGAG 660
H S G G V R L G G L I C N S R K T D R E

SstI
GACGAGCTGATCATGGAGCTCGCCCGCCGTCTCAACACCCAGATGATCCACTTCATCCCG 720
D E L I M E L A R R L N T Q M I H F I P

SstI
CGTAACAACGTCGTGCAGCACGCCGAGCTCCGCCGGATGACGGTCATCGAGTACGACGAG 780
R N N V V Q H A E L R R M T V I E Y D E

AAGAACTCGCAGGCCGACGAGTACCCCGCGCTGGCCAAGAAGATCGACGACGAGAACGAG 840
K N S Q A D E Y R A L A K K I D D E N E

SstI/XhoI/SstI
ATGAAGACCATCCCGACTCCGATCACAATGGACGAGCTCGAGGAGCTCCTGATCGAGTTC 900
M K T I P T P I T M D E L E E L L I E F

GGGATCATGGCGCAGGAAGACGAGGGCGTCATCGGCAAGAAGGCCGACGCGACCATCGCC 960
G I M A Q E D E G V I G K K A D A T I A

TGAC TCCGATCACCGGCTCTGAAACCTCGAGGATGAC~C.TCCCGATCATGCCGACGTCGCC 1020
* RBS M P T S P

GACCCCGCCGCGGGCCGAGACCGAGGCGAZGA~CGCCG~G~CCTG~CGCACTACCCCGC 1080
T P P R A E r ~ A M ~ A E v L s H y o A

Fig. 3. Complete nucleotide sequence of nifH from Frankia strain HRN18a with its derived amino acid sequence underneath. The nifH ORF is preceded 6 bp upstream by an underlined RBS sequence (GAGGAG), starts at coordinate 91 (ATG), and ends at coordinate 961 (TGA). A second ORF preceded 8 bp upstream by another underlined RBS sequence (GAGG) starts at coordinate 1007 (ATG) and extends into the zone corresponding to nifD (Normand et al. 1988). The sites of the four conserved cysteine residues (*) are highlighted. The amino acids conserved in all sequences published so far are underlined. Relevant restriction sites are indicated above the sequence.

A. vinelandi #3 (alternative molybdenum- and vanadium-free, iron-dependent nitrogenase, Type 3) clusters with the C. pasteurianum #3 (Type 3) and Methanococcus thermolithotrophicus #1 (Type 3). The tree obtained with KITSCH had a very similar topology and the only difference was the clustering of the Azotobacter-Klebsiella group with the Rhizobium-Bradyrhizobium group before clustering of the Frankia-Anabaena group with the large gram-negative cluster. From these analyses, the three methanogens (Type 1) appear to have diverged early.

Figure 5b shows the unrooted tree (network) obtained with FM applied on amino acid replacement data where the branch lengths are allowed to vary among lineages. Whether or not the methanogens were considered as an outgroup, the tree topology did not vary. The tree obtained with NJ has the same topology as the one obtained with FM. The topology of this tree differs in one respect from that in Fig. 5a: the Frankia-Anabaena group is located closer to the Rhizobium-Bradyrhizobium group than to the Azotobacter-Klebsiella group. The tree in Fig. 5b implies that methanogens (Type 1) exhibit a slower replacement rate than other bacteria represented.

Parsimony Analysis

Parsimony analysis was performed with the first and second positions of codons, and with amino acid data. Analysis of the first group clustered Frankia with Anabaena both at the amino acid and at the nucleotide levels. Fifteen phylogenetically informative sites were found in the amino acid sequence, seven of which favored tree I (Fig. 2), three tree II, and five tree III. This is not significant, however (P = 0.172), due to the low number of informative amino acid sites in the nifH sequences. The same strains had 39 informative nucleotide sites, 24 of which favored tree I in a significant manner (P < 0.01). The case was more clear-cut with group B, where out of 30 informative amino acid sites, 29 favored grouping A. vinelandi #3 with C. pasteurianum #3; and out of 37 nucleotide sites, 34 favored the same tree in a highly significant manner (Table 3). In all parsimony analyses, third codon positions were not considered because of the observed skew.
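The counting step behind these numbers can be sketched in Python. For four aligned sequences, a site is phylogenetically informative only when it shows exactly two states, each shared by two sequences, and each such site supports one of the three unrooted trees. The sequences below are hypothetical toy data, the function name is an assumption, and the significance test of Prager and Wilson (1988) is not reproduced.

```python
def count_tree_support(a, b, c, d, gap="-"):
    """Count informative sites supporting each four-taxon unrooted tree.

    Returns (n_ab_cd, n_ac_bd, n_ad_bc): sites grouping a with b,
    a with c, and a with d, respectively.
    """
    if not len(a) == len(b) == len(c) == len(d):
        raise ValueError("sequences must be aligned to equal length")
    support = [0, 0, 0]
    for w, x, y, z in zip(a, b, c, d):
        if gap in (w, x, y, z):
            continue  # skip alignment gaps
        if w == x and y == z and w != y:
            support[0] += 1  # pattern xxyy groups (a,b) and (c,d)
        elif w == y and x == z and w != x:
            support[1] += 1  # pattern xyxy groups (a,c) and (b,d)
        elif w == z and x == y and w != x:
            support[2] += 1  # pattern xyyx groups (a,d) and (b,c)
    return tuple(support)

# Hypothetical aligned fragments, for illustration only.
counts = count_tree_support("AAGTC", "AAGTT", "GCGAC", "GCTAT")
```

With these toy fragments, three sites group the first two sequences together and one site groups the first with the third; sites with a single variant sequence carry no information about the topology.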

Discussion

Comparison of Nitrogenase and 16S rRNA Phylogenies

The topology of the nifH trees, based on amino acid replacement data, is comparable to that obtained by Ochman and Wilson (1987) for Klebsiella, Bradyrhizobium, Rhizobium, and Clostridium and by Woese (1987) and Fox et al. (1980) for Archaebacteria, Clostridium, and Rhizobium. However, there are two remarkable discrepancies with 16S rRNA trees, that of Frankia and Anabaena, and that of A. vinelandi #3 and C. pasteurianum #3. Frankia is a typical actinomycete according to 16S rRNA measurements (Stackebrandt 1985), and Anabaena is a typical filamentous heterocystous cyanobacterium belonging to the Nostocaceae (Humm and Wicks 1973). Yet, Frankia consistently groups with Ana-

                            1          $ $ $                            * 43
Frankia A                   M......RQI AFYGKGGIGK STTQQNTMAA MAE-MGRRVM IVGCDPKADS
Frankia B                   M......RQI AFYGKGGIGK STTQQNTMAA MAE-MGQRVM IVGCDPKADS
Anabaena 7120               MT-DENIRQI AFYGKGGIGK STTSQNTLAA MAE-MGQRIM IVGCDPKADS
A. chroococcum #2*          M....ALRQC AIYGKGGIGK STTTQNLVAA LAE-AGKKVM IVGCDPKADS
A. vinelandi #1             MAM....RQC AIYGKGGIGK STTTQNLVAA LAE-MGKKVM IVGCDPKADS
A. vinelandi #3             MT.....RKV AIYGKGGIGK STTTQNTAAA LAYFHDKKVF THGCDPKADS
K. pneumoniae               MTM....RQC AIYGKGGIGK STTTQNLVAA LAE-MGKKVM IVGCDPKADS
Rh. capsulatus              M---GKLRQI AFYGKGGIGK STTSQNTLAA LVE-MGQKIL IVGCDPKADS
T. ferrooxidans             MAMSDKLRQI AFYGKGGIGK STTSQKHLAA LAE-MGQKIL IVGCDPKADS
R. meliloti                 M---AALRQI AFYGKGGIGK STTSQNTLAA LVD-LGQKIL IVGCDPKADS
R. phaseoli                 M--SD-LRQI AFYGKGGIGK STTSQNTLAA LVD-LGQKIL IVGCDPKADS
R. trifoli                  M---AALRQI AFYGKGGIGK STTSQNTLAA LVE-LGQKIL IVGCDPKADS
B. japonicum                M---ASLRQI AFYGKGGIGK STTSQNTLAA LAE-MGQKIL IVGCDPKADS
R. Parasponia               M---SSLRQI AFYGKGGIGK STTSQNTLAA LAE-MGQKIL IVGCDPKADS
A. caulinodans              M---SSLRQI AFYGKGGIGK STTSQNTLAA LAE-MGHRIL IVGCDPKADS
C. pasteurianum #1          M......RQV AIYGKGGIGK STTTQNLTSG LHA-MGKTIM VVGCDPKADS
C. pasteurianum #2          M......RQL AIYGKGGIAK STTTQNLTAG LVE-RGNKIM VVGCDPKADS
C. pasteurianum #3          MT.....RKI AIYGKGGIGK STTQQNTAAA MAHFYDKKVF IHGCDPKADS
M. voltae                   M......RKF CIYGKGGIGK STNVGNMAAA LAE-DGKKVL VVGCDPKADS
M. thermolithotrophicus #1  M~IAPDAKKV AIYGKGGIGK STTTQNTAAA LAYFFDKKVM IHGCDPKADS
M. thermolithotrophicus #2  ......LKQI AFYGKGGIGK STTVCNIAAA LAD-QGKKVM VVGCDPKHDC
M. ivanovi                  M---SK-R-I AIYGKGGIGK STIVSNIAAA YSK-DYN-VL VIGCDPKADT

Consensus                   m-m---irqi afYGKGGIgK STtsqntlaa lae-mgqkil ivGCDPKaDs


                    44                               *                  93
Frankia A           TRLILHSKAQ TSVIQLAAEK GSVEDLELDE VLVEGQWGIK CVESGGPEPG
Frankia B           TRLILHSKAQ TSVIQLAAEK GSVEDLELDE VLVEGQWGIK CVESGGPEPG
Anabaena 7120       TRLMLHSKAQ TTVLHLAAER GAVEDLELHE VMLTGFRGVK CVESGGPEPG
A. chroococcum #2   TRLILHSKAQ NTVMEMAASA GSGEDLELED VLQIGYGGVK CVESGGPEPG
A. vinelandi #1     TRLILHSKAQ NTIMEMAAEA GTVEDLELED VLKAGYGGVK CVESGGPEPG
A. vinelandi #3     TRLILGGKPE ETLMDMVRDK G-AEKITNDD VIKKGFLDIQ CVESGGPEPG
K. pneumoniae       TRLILHAKAQ NTIMEMAAEV GSVEDLELED VLQIGYGDVR CAESGGPEPG
Rh. capsulatus      TRLILNTKLQ DTVLHLAAEV GSVEDLELED VVKIGYRGIK CTESGGPEPG
T. ferrooxidans     TRLILHSKAQ DTVLSLAAEA GSVEDLELED VMKVGYRDIR CVESGGPEPG
R. meliloti         TRLILNAKAQ DTVLHLAATE GSVEDLELED VLKVGYRGIK CVESGGPEPG
R. phaseoli         TRLILNAKAQ DTVLHLAAQE GSVEDLELED VLKAGYKGIK CVESGGPEPG
R. trifoli          TRLILNSKAQ GTVLDLAATK GSVEDLELGD VLKTGYGGIK CVESGGPEPG
B. japonicum        TRLILHAKAQ DTILSLAASA GSVEDLELED VMKVGYQDIR CVESGGPEPG
R. Parasponia       TRLILHAEAQ DTILSLAASA GSVEDLELED VMKVGYKDIR CVESGGPEPG
A. caulinodans      TRLILHAKAQ DTILSLAAAA GSVEDLELEE VMKIGYRDIR CVESGGPEPG
C. pasteurianum #1  TRLLLGGLAQ KSVLDTLREE G--EDVELDS ILKEGYGGIR CVESGGPEPG
C. pasteurianum #2  TRLLLGGLAQ KTVLDTLREE G--EDVELDS ILKTGYAGIR C~-ESGGPEPG
C. pasteurianum #3  TRLILGGMPQ KTLMDMLRDE G-EEKITTEN IVRVGYEDIR CVESGGPEPG
M. voltae           TR-TLMHGKI NTVLDTFRDK GPEYMKIED- IVYEGFNGVY CVESGGPEPG
Mt #1               TRMILHGKPQ DTVMDVLREE GEEAVTLEK- VRKIGFKDIL CVESGGPEPG
Mt #2               TSNLRGGQEI PTVLDILREK GLDKMIEIND IIYEGYNGIY CVEAGGPKPG
M. ivanovi          TRTLIGKRL- PTILDIVKKK KNAS---IEE VLFEGYGNVK CVESGGPEPG

Consensus           Trlilhskaq dtvldlaae- gsvedleled vlkeGyggik CvEsGGPePG

~: The two underlined bases in the M. thermolithotrophicus #1 sequence at coordinates 1-2 frame a 3 bp insertion (MSFDEI).

~: The two underlined bases in the M. thermolithotrophicus #2 sequence at coordinates 67-68 frame a 10 bp insertion (D•LGLETIIEKEMI).

Fig. 4. Line-up of amino acid and consensus sequences. Numbers above line refer to the Frankia HRN18a polypeptide with its f-met numbered 1. The consensus line represents amino acids present in all sequences with an uppercase letter and those occurring more frequently than the others with a lowercase letter. The * signs above the Frankia sequence highlight the cysteine residues conserved in all sequences [the bold *s would be the ligands for the (4Fe-4S) cluster (Burgess 1985)], and the @ sign above the conserved arginine residue at coordinate 96 shows the site for the ADP-ribosylation interaction [which provides a reversible enzymatic inactivation of the Fe-protein when attached to it in response to environmental factors (Zumft 1985)]. The $ sign above the conserved glycine residues indicates the site where MgATP binding has been postulated to occur (Jones et al. 1985). Continued on pages 442-443.

baena rather than with Clostridium, using both distance and parsimony analyses. Therefore, "illegitimate" lateral transfer could be invoked to resolve this discrepancy, and the nifH genes of Frankia and Anabaena may have been exchanged by such a mechanism. These transfers would involve an ancestral gram-negative soil bacterium toward Anabaena and toward Frankia, both of which live in aerobic soil. The idea of such transfers is reinforced by the observations of Simonet et al. (1986) that nif genes have been observed on a plasmid in Frankia.

The closeness of R. capsulatus to fast-growing Rhizobium (Table 1) agrees with the 16S rRNA-based phylogeny of Woese (1987) and differs from the classic taxonomy in Bergey's manual (Buchanan and Gibbons 1974) where the two organisms belong to two different parts, namely the phototrophic bacteria (part 1) and the gram-negative aerobic rods


                    94                   * @                    *      141
Frankia A           VGCAGRGVIT SITYLEEAGA Y--ENLDFVT YDVLGDVVCG GFAMPIRQGK
Frankia B           VGCAGRGVIT SITYLEEAGA Y--ENLDFVT YDVLGDVVCG GFAMPIRQGK
Anabaena 7120       VGCAGRGIIT AINFLEENGA YQ-D-LDFVS YDVLGDVVCG GFAMPIREGK
A. chroococcum #2   VGCAGRGVIT AINFLEEEGA YS-DDLDFVF YDVLGDVVCG GFAMPIRENK
A. vinelandi #1     VGCAGRGVIT AINFLEEEGA YE-DDLDFVF YDVLGDVVCG GFAMPIRENK
A. vinelandi #3     VGCAGRGVIT AIDLMEENGA YT-DDLDFVF FDDLGDVVCG GFAMPIRDGK
K. pneumoniae       VGCAGRGVIT AINFLEEEGA YE-DDLDFVF YDVLGDVVCG GFAMPIRENK
Rh. capsulatus      VGCAGRGVIT AINFLEENGA Y--DDVDYVS YDVLGDVVCG GFAMPIRENK
T. ferrooxidans     VGCAGRGVIT SINFLEENGA Y--DGANYVS YDVLGDVVCG GFAMPIRK-Q
R. meliloti         VGCAGRGVIT SINFLEENGA Y--NDVDYVS YDVLGDVVCG GFAMPIRENK
R. phaseoli         VGCAGRGVIT SINFLEENGA Y--DDVDYVS YDVLGDVVCG GFAMPIRENK
R. trifoli          VGCAGRGVIT SINFLEENGA Y--DDVDYVS YDVLGDVVCG GFAMPIRENK
B. japonicum        VGCAGRGVIT SINFLEENGA Y--ENIDYVS YDVLGDVVCG GFAMPIRENK
R. Parasponia       VGCAGRGVIT SINFLEENGA Y--ENIDYVS YDVLGDVVCG GFAMPIRENK
A. caulinodans      VGCAGRGVIT SINFLEENGA Y--EDIDYVS YDVLGDVVCG GFAMPIRENK
C. pasteurianum #1  VGCAGRGIIT SINMLEQLGA YT-DDLDYVF YDVLGDVVCG GFAMPIREGK
C. pasteurianum #2  VGCAGRGIIT SINMLEQLGA YT-DDLDFVF YDVLGDVVCG GFAMPIREGK
C. pasteurianum #3  VGCAGRGVIT AIDLMEKNGA YT-EDLDFVF FDVLGDVVCG GFAMPIRDGK
M. voltae           VGCAGRGVIT AVDMLDRLGV YDELKPDVVI YDILGDVVCG GFAMPLQKKL
Mt #1               VGCAGRGVIT AVDMMBELEG YP-DDLDNLF FDVLGDVVCG GFAMPLRDGL
Mt #2               YGCAGRGVIV VIDLLKKMNL YKDLKLDIVL YDVLGDVVCG GFAMPLRMGL
M. ivanovi          VGCAGRGVIV AMGLLDKLGT FS-DDIDIII YDVLGDVVCG GFAVPLREDF

Consensus           vGCAGRGvIt sinfleenga y--ddldyvs yDvLGDVVCG GFAmPirenk

                    142                                                191
Frankia A           AQEIYIVTSG EMMAMYAANN IARGILKYAH SGGVRLGGLI CNSRKTDRED
Frankia B           AQEIYIVTSG EMMAMYAANN IARGILKYAH SGGVRLGGLI CNSRKTDRED
Anabaena 7120       AQEIYIVTSG EMMAMYAANN IARGILKYAH SGGVRLGGLI CNSRKVDRED
A. chroococcum #2   AQEIYIVCSG EMMAMYAANN IAKGIVKYAH SGSVRLGGLI CNSRKTDRED
A. vinelandi #1     AQEIYIVCSG EMMAMYAANN ISKGIVKYAN SGSVRLGGLI CNSRNTDRED
A. vinelandi #3     AQEVYIVASG EMMAIYAANN ICKGLVKYAK QSAVGLGGII CNSRKVDGER
K. pneumoniae       AQEIYIVCSG EMMAMYAANN ISKGIVKYAK SGKVRLGGLI CNSRQTDRED
Rh. capsulatus      AQEIYIVCSG EMMAVYAANN IPKGILKYAN SGGVRLGGLI CNERKTDREL
T. ferrooxidans     AQEIYIVMSG EMMAMYAANN ISKGVLKYAN SGGVRLGGLI CNERQTDKEL
R. meliloti         AQEIYIVMSG EMMALYAANN IAKGILKYAH AGGVRLGGLI CNERHTDREL
R. phaseoli         AQEIYIVMSG EMMALYAANN IAKGILKYAH SGGVRLGGLI CNERQTDREL
R. trifoli          AQEIYIVMSG EMMALYAANN IARGILKYAS AGSVRLGGLI CNERQTDREL
B. japonicum        AQEIYIVMSG EMMAMYAANN ISKGILKYAN SGGVRLGGLI CNERQTDKEL
R. Parasponia       AQEIYIVMSG EMMAMYAANN ISKGILKYAN SGGVRLGGLI CNERQTDKEL
A. caulinodans      AQEIYIVMSG EMMAMYAANN ISKGILKYAN SGGVRLGGLV CNERQTDKEL
C. pasteurianum #1  AQEIYIVASG EMMALYAANN ISKGIQKYAK SGGVRLGGII CNSRKVANEY
C. pasteurianum #2  AQEIYIVASG EMMALYAANN ISKGIQKYAK SGGVRLGGII CNSRKVANEY
C. pasteurianum #3  AQEVYIVASG EMMAVYAANN ICKGLVKYAN QSGVRLGGII CNSRMVDLER
M. voltae           AEDVYIVTTC DPMAIYAANN ICKGIKRYGN RGKIALGGII YNGRSVVDEP
Mt #1               AQEIYIVTSG EMMALYAANN IAKGILKYAE QSGVRLGGII CNARNVDGEK
Mt #2               AEQIYVVTSS DYMAIYAANN ICRGISEFVK RGGSKLGGLI YNVRGSMDAY
M. ivanovi          ADEVYIVTSG EYMALYAANN ICRGI....K KLKSNLGGII CNCRGIENEV

Consensus           AqeiYiVmsg emMAmYAANN IskGilkyan sggvrLGGli cNsR-tdrel

Fig. 4. Continued

and cocci (part 7). Similarly, Thiobacillus ferrooxidans was assigned to part 12 (gram-negative chemolithotrophic) (Buchanan and Gibbons 1974), but Woese (1987) grouped it in the purple bacteria together with other bacteria formerly designated as gram-negative such as Rhizobium.

Alternative Nitrogenase Forms a Separate Lineage

The three amino acid sequences Mt1, Cp3, and Av3 form a coherent cluster, which branches out early from the main lineage (Fig. 5a). The conclusion of Sibold and Souillard (1988), who studied the Mt1 and Cp3 sequences, was that lateral transfer could explain the observed similarity. Another explanation, which we propose, is that these three sequences must be considered as a separate phylum, being derived early from an alternative nitrogenase (Type 3). The Av3 sequence has been shown to code for an iron protein in a molybdenum-free, vanadium-free nitrogenase complex (Bishop et al. 1988). The Cp3 sequence was not transcribed in a Mo-containing medium, whereas the five other nifH copies were detected as mRNA (Wang et al. 1988). Furthermore, the Mt1 sequence had properties reminiscent of Type 3 nitrogenase (Sibold and Souillard 1988). The Type 2 nitrogenase would appear to have diverged much later, as there is 88-90% similarity between the Mo-dependent (Av1) and the V-dependent (Ac) nitrogenase proteins (Table 1).

Dating the Divergence of Two Frankia Strains

An approximation of the replacement rate per site per year in nifH can be deduced if we assume homogeneity among lineages and if certain assumptions are made about events used to calibrate the molecular clock. Hennecke et al. (1985) have argued against using nif genes as a clock because of strong selective pressures on several regions of the proteins. However, Woese (1987) pointed out that molecules with highly constrained regions situated next to unconstrained areas such as nifH or 16S rRNA are reliable molecular clocks. The likely occurrence of lateral transfers may not invalidate the estimation of replacement rate if dating events are chosen so as to exclude divergent lineages between which lateral transfers are hypothesized.

192 241
Frankia A           ELIMELARRL NTQMIHFIPR NNVVQHAELR RMTVIEYDEK NSQADEYRAL
Frankia B           ELIMELARRL NTQMIHFIPR NNVVQHAELR RMTVIEYDPK NSQADEYRQL
Anabaena 7120       ELIMNLAERL NTQMIHFVPR DNIVQHAELR RMTVNEYAPD SNQGQEYRAL
A. chroococcum #2   ELIMALAAKI GTQMIHFVPR DNVVQHAEIR RMTVIEYDPK AKQADEYRAL
A. vinelandi #1     ELIIALANKL GTQMIHFVPR DNVVQRAEIR RMTVIEYDPK AKQADEYRAL
A. vinelandi #3     ESVEEFTAAI GTKMIHFVPR DNIVQKAEFN KKTVTEFAPE ENQAKEYGEL
K. pneumoniae       ELIIALAEKL GTQMIHFVPR DNIVQRAEIR RMTVIEYDPA CKQANEYRTL
Rh. capsulatus      ELAEALAAKL GCKMIHFVPR NNVVQHAELR RETVIQYDPT CSQAQEYREL
T. ferrooxidans     ELAEALAGKL GTKLIHFVPR DFIVQHAELR RMTVLEYAPE SKQAQEYRTL
R. meliloti         DLAEALAARL NSKLIHFVPR DNIVQHAELR KMTVIQYAPN SKQAGEYRAL
R. phaseoli         DLSEALAARL NSKLIHFVPR DNIVQHAELR KMTVIQYAPD SKQAGEYRAL
R. trifoli          DLAEALAAKL NSKLIHFVPR DNIVQHAELR KMTVIQYAPR SKQAAEYRWL
B. japonicum        ELAEALAKKL GTQLIYFVPR DNVVQHAELR RMTVLEYAPD SKQADHYRKL
R. Parasponia       ELAEALAKKL GTQLIYFVPR DNVVQHAELR RMTVLEYAPE SQQADHYRNL
A. caulinodans      ELAENLAKKL GTQLIYF"gPR DNIVQHAELR P/~I~/IEYAPD SVQANHYRNL
C. pasteurianum #1  ELLDAFAKEL GSQLIHFVPR SPMVTKAEIN KQTVIEYDPT CEQAEEYREL
C. pasteurianum #2  ELLDAFAKEL GSQLIHFVPR SPSVTKAEIN KKTVIEYDPT CEQANEYREL
C. pasteurianum #3  EFIEEFAASI GTQMIHFMPR DNIVQKAEFN KQTVIEFDDT CNQAKEYGEL
M. voltae           EIIDKFVEGI NSQVMGKVPM SNIITKAELR KQTTIEYAPD SEIANKFREL
Mt #1               ELMDEFCDKL GTKLIHYVPR DNIVQKAEFN KMTVIEFDPE CNQAKEYRTL
Mt #2               DIINEFADKL GANIVGKVPN SHLIPEAEIE GKTVIEYDPN DEISQVYREL
M. ivanovi          QIVSEFAGKV GSKVIGIIPG SEMVQKSEID AKTVIEKFGE SEQADLYREL

Consensus           elieala-kl gtqlihfvPr dnivqhaElr rmTviey-pd skqadeyr-L

242 290
Frankia A           AKKIDDENEM KTIPTPITMD ELEELLIEFG IMAQEDE-S- VIGKKADATI A
Frankia B           ANKIVNN-DM KTIPTPITMD ELEELLIDFG IMAQEDE-S- VIGK-AAA-V A
Anabaena 7120       AKKI-NN-DK LTIPTPMEMD ELEALKIEYG LLDDDTKHSE IIGKPAEATN RSCRN
A. chroococcum #2   AQKILNN-KL LVIPNPGSME DLEELLMEFG IMEAEDE-SI V-GKAGAEG
A. vinelandi #1     ARKVVDN-KL LVIPNPITMD ELEELLMEFG IMEVEDE-SI V-GKTAEEV
A. vinelandi #3     ARKIIEN-DE FVIPKPLTMD QLEDMVVKYG IAD
K. pneumoniae       AQKIVNNTM- VVPTPCTMD ELESLLMEFG IMEEEDT-SI I-GKT-A-AE ENAA
Rh. capsulatus      ARKVPRNLGQ GVIPTPITME ELEEMLMDFG IMQSEEDREK QIAEMEAAMK PEA
T. ferrooxidans     AEKIHANAGN PAIPTPITMD ELEDLLMDFG IMQKEDT-SI I-GKTAAELA AAGM
R. meliloti         AEKIHANSGR GTVPTPITME ELEDMLLDFG IMK-SDEQML AELHAKEAKV IAPH
R. phaseoli         AEKIHANSGQ GTIPTPITME ELEDMLLDFG IMK-SDEQML AELQAKESAV -VAAQ
R. trifoli          AEKIHSNSGK GTIPTPITME ELEDMLLDFG IMK-SDEQML EELLAKEVQA AVAP
B. japonicum        AAKVHNNGGK GIIPTPISMD ELEDMLMEHG IIKAVDE-SI I-GKTAAELA AS
R. Parasponia       ATKVHNNGGK GIIPTPISMD ELEDMLMEHG IMKPVDE-SI V-GKTAAELA AS
A. caulinodans      AERVHNNGGK GIIPTPITMD ELEDMLMEHG IMKTVDE-SQ V-GKTAAELA ALSA
C. pasteurianum #1  ARKVDAN-EL FVIPKPMTQE RLEEILMQYG LMDL
C. pasteurianum #2  ARKVEEN-DM FVIPKPMTQE RLEQILMEHG LID
C. pasteurianum #3  ARKIIEN-EM FVIPTPLKMD DLEAMVVKYG MTD
M. voltae           ANSIYEN.KK TTIPTPLSEQ GLDELTESIE ELVRRKYE
Mt #1               AKNIDEN-DE LVKPTPMTMD ELEELVVKYG LIDL
Mt #2               AKKIYEN-NE GTIPKPLENI EIMTIGKKIK ERLKKERMKN
M. ivanovi          AKSIYSN-ED FVIPEPMGVD EFDEFF.... ......RGFQ

Consensus           Arkihnn.gk gviPtPitmd elee.lmefg im..ede.si v.gktaaela asaa

* The sequence given for A. chroococcum (#2) is that of the alternative Vanadium nifH*. The normal ("molybdenum") nifH product is nearly identical to the sequence of A. vinelandi (#1)

Fig. 4. Continued from pages 442-443

The amino acid sequence similarity of 96% between HRN18a and ArI3 nifH sequences was high. Based on 16S rRNA, the 24% divergence observed between the Rhizobium and the Bradyrhizobium groups has been estimated to correspond to roughly t = 600 million years (Myr) ago (Ochman and Wilson 1987). The number of replacements per site per year (λ = d/2t, where d is the number of replacements per site separating the two lineages) would then be 0.21 × 10⁻⁹. Slightly higher estimates are obtained if one assumes a later divergence between Rhizobium and Bradyrhizobium, and slightly lower estimates may be deduced if one is willing to accept that divergence between Clostridium and purple bacteria did not occur later than 1300 Myr ago. Overall, the possible range of λ seems to be restricted from 0.15 × 10⁻⁹ to 0.25 × 10⁻⁹. Given that rate of divergence, the two Frankia strains may be estimated to have diverged about


Table 2. Base composition characteristic of nifH nucleotide sequences

                                   % G + C    % G + C of               First position
                                   of nifH    First   Second  Third
Sequence                           ORF        base    base    base     G       A

 1. Frankia A                      65         61      39      94       118     83
 2. Frankia B                      65         61      39      94       115     82
 3. Anabaena                       48         56      39      48       115     87
 4. A. chroococcum                 64         66      39      90       128     77
 5. A. vinelandi #1                59         62      37      78       128     81
 6. A. vinelandi #3                58         59      37      78       120     83
 7. K. pneumoniae                  59         60      39      77       123     85
 8. Rh. capsulatus                 60         63      38      79       123     78
 9. T. ferrooxidans                58         61      40      73       122     80
10. R. meliloti                    58         64      39      72       124     76
11. R. phaseoli                    60         63      38      79       123     73
12. R. trifoli                     57         61      40      69       121     77
13. B. japonicum                   55         60      39      67       120     81
14. P. Rhizobium                   59         60      39      78       118     81
15. A. caulinodans                 64         62      40      92       120     81
16. C. pasteurianum #1             36         54      37      19       109     85
17. C. pasteurianum #2             37         54      39      19       108     89
18. C. pasteurianum #3             35         52      36      17       111     94
19. M. voltae                      38         49      37      27       110     101
20. M. thermolithotrophicus #1     n.a.ᵃ      .       .       .        .       .
21. M. thermolithotrophicus #2     36         49      33      27       111     106
22. M. ivanovi                     38         52      37      27       110     87

Compositional equilibrium at first, second, and third codon positions was assessed using a heterogeneity χ² procedure (Sokal and Rohlf 1981): heterogeneity was not significant (P > 0.05) at the second position (χ²[60] = 53.45), whereas it was significant (P < 0.05) for the first (χ²[60] = 107.93) and especially third positions (χ²[60] = 1969.79)

Fig. 5. Phylogenetic gene trees. A Rooted tree showing relatedness of 22 nifH genes based on UPGMA using the amino acid pairwise distance matrix. The thick bars at nodes indicate the standard error of branching points. B Unrooted tree showing relatedness of 22 nifH genes based on the FM method using the amino acid pairwise distance matrix. Bar length represents a distance of 0.1 amino acid replacements per site; only horizontal bars represent distances.
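The UPGMA clustering named in the caption of Fig. 5A can be illustrated with a minimal sketch. The function name `upgma`, the three taxon labels, and the toy distances below are hypothetical and are not taken from the paper's distance matrix; the sketch recomputes average distances directly from the leaf-to-leaf distances rather than reproducing the software actually used.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster taxa by UPGMA.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> pairwise distance.
    Returns a nested tuple (left, right, node_height).
    """
    # Each cluster is (subtree, list_of_member_leaves).
    clusters = [(n, [n]) for n in names]

    def avg(a, b):
        # Unweighted mean of all leaf-to-leaf distances between two clusters.
        total = sum(dist[frozenset((x, y))] for x in a[1] for y in b[1])
        return total / (len(a[1]) * len(b[1]))

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg(clusters[p[0]], clusters[p[1]]))
        a, b = clusters[i], clusters[j]
        height = round(avg(a, b) / 2, 4)  # node sits halfway between clusters
        merged = ((a[0], b[0], height), a[1] + b[1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

# Hypothetical toy distances (amino acid replacements per site).
toy = {
    frozenset(("A", "B")): 0.04,
    frozenset(("A", "C")): 0.30,
    frozenset(("B", "C")): 0.32,
}
tree = upgma(["A", "B", "C"], toy)
print(tree)
```

In this toy case the two closest taxa (A and B) join first, and the third taxon joins at half the mean of its distances to them, which is the behavior that gives UPGMA trees their rooted, ultrametric shape.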

100 Myr ago, at about the time of appearance of the Betulaceae and Myricaceae families (Furlow 1979; Thomas and Spicer 1987), and much before the Elaeagnaceae (Thomas and Spicer 1987). The amino acid replacement rate of nifH in HRN18a is three times higher than in ArI3, which would indicate that evolution is taking place faster in that lineage (Fig. 5b). This parallels the more recent appearance of Elaeagnaceae 30 Myr ago relative to the actinorhizal genera nodulated by ArI3: the Myricaceae and Betulaceae (Thomas and Spicer 1987). Whether coevolution between the microsymbiont and its host is very stringent and results in similar evolutionary rates is not known at this time.
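The dating arithmetic in this section can be retraced with a short sketch. It assumes that d for the two Frankia strains is simply the uncorrected proportion of differing amino acid sites (1 - 0.96 = 0.04); whether a multiple-hit correction was applied is not stated here. The function names are illustrative only.

```python
def replacement_rate(d, t_years):
    """lambda = d / (2t): replacements per site per year."""
    return d / (2.0 * t_years)

def divergence_time(d, rate):
    """Invert lambda = d / (2t) to obtain t = d / (2 * lambda), in years."""
    return d / (2.0 * rate)

# Assumption: uncorrected distance from the reported 96% similarity.
frankia_d = 0.04

# Central estimate and the range of lambda quoted in the text.
for rate in (0.15e-9, 0.21e-9, 0.25e-9):
    t = divergence_time(frankia_d, rate)
    print(f"lambda = {rate:.2e} per site per year -> t = {t / 1e6:.0f} Myr")
```

With the central rate of 0.21 × 10⁻⁹ the sketch gives roughly 95 Myr, in line with the figure of about 100 Myr; the quoted range of rates corresponds to roughly 80 to 133 Myr.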

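Table 2. Extended

Base composition

                                   First position   Second position        Third position
Sequence                           T     C          G     A     T     C     G     A     T     C

 1. Frankia A                      31    59         53    94    84    60    105   2     16    168
 2. Frankia B                      30    61         53    91    85    59    109   3     14    162
 3. Anabaena                       46    52         55    99    84    62    39    74    83    104
 4. A. chroococcum                 33    53         54    92    86    59    97    17    14    163
 5. A. vinelandi #1                31    51         52    94    90    55    79    32    34    146
 6. A. vinelandi #3                32    41         47    96    78    55    78    25    36    137
 7. K. pneumoniae                  34    52         52    94    87    61    87    36    31    140
 8. Rh. capsulatus                 32    65         56    98    86    58    93    35    29    141
 9. T. ferrooxidans                38    59         52    94    86    67    91    40    42    126
10. R. meliloti                    33    65         49    96    87    66    87    40    43    128
11. R. phaseoli                    37    65         46    100   86    66    104   28    35    131
12. R. trifoli                     39    61         50    93    87    68    93    47    46    112
13. B. japonicum                   39    55         49    96    85    65    98    38    59    100
14. P. Rhizobium                   39    57         49    96    85    65    112   25    42    116
15. A. caulinodans                 34    62         53    94    86    64    102   10    14    171
16. C. pasteurianum #1             46    33         55    87    81    50    20    140   88    25
17. C. pasteurianum #2             44    32         54    88    80    51    26    134   86    27
18. C. pasteurianum #3             39    32         50    91    85    50    26    116   112   22
19. M. voltae                      42    26         54    95    81    49    34    96    108   41
20. M. thermolithotrophicus #1     n.a.ᵃ
21. M. thermolithotrophicus #2     45    31         53    107   91    42    36    134   80    43
22. M. ivanovi                     41    26         54    84    84    42    35    80    113   36

ᵃ The M. thermolithotrophicus DNA sequence was not available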
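The heterogeneity test described in the Table 2 footnote can be sketched on a few rows of the table. The code below uses the Pearson form of the χ² statistic on a sequences × bases contingency table; this is an assumption, since the exact variant taken from Sokal and Rohlf (1981) is not specified here. Only three third-position rows are transcribed, so the result is not comparable with the χ²[60] values reported for all 21 sequences. Function names are illustrative.

```python
def heterogeneity_chi2(rows):
    """Pearson chi-square for an r x c table of counts.

    Returns (chi2, degrees_of_freedom). Assumes no column total is zero.
    """
    row_tot = [sum(r) for r in rows]
    col_tot = [sum(c) for c in zip(*rows)]
    grand = sum(row_tot)
    chi2 = 0.0
    for r, rt in zip(rows, row_tot):
        for obs, ct in zip(r, col_tot):
            exp = rt * ct / grand  # expected count under homogeneity
            chi2 += (obs - exp) ** 2 / exp
    df = (len(rows) - 1) * (len(col_tot) - 1)
    return chi2, df

def gc_percent(counts):
    """% G + C from a (G, A, T, C) tuple of counts."""
    g, a, t, c = counts
    return 100.0 * (g + c) / (g + a + t + c)

# Third-position counts (G, A, T, C) transcribed from Table 2.
third = {
    "Frankia A": (105, 2, 16, 168),
    "Frankia B": (109, 3, 14, 162),
    "Anabaena": (39, 74, 83, 104),
}

chi2_frankia, df_frankia = heterogeneity_chi2(
    [third["Frankia A"], third["Frankia B"]])
chi2_all, df_all = heterogeneity_chi2(list(third.values()))
print(f"Frankia A vs B: chi2 = {chi2_frankia:.2f}, df = {df_frankia}")
print(f"All three:      chi2 = {chi2_all:.2f}, df = {df_all}")
print(f"Frankia A third-position G + C: {gc_percent(third['Frankia A']):.0f}%")
```

For the full table, 21 sequences and 4 bases give (21 - 1) × (4 - 1) = 60 degrees of freedom, which matches the subscript of the χ² values in the footnote.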

Table 3. Parsimony analysis and statistical comparison of trees for two groups of nifH sequences at both the amino acid and nucleotide (first and second position) levels

                                                   Probabilityᵃ
Tree                         Sites favored   (I vs II)       (I vs III)

Group Aᵇ

Amino acids
  I   (FA) (CK)              7               0.17            0.81
  II  (FC) (AK)              3
  III (FK) (AC)              5

Nucleotides
  I   (FA) (CK)              24              2.7 × 10⁻⁴*     0.01*
  II  (FC) (AK)              5
  III (FK) (AC)              10

Group Bᶜ

Amino acids
  I   (A3, C3) (A1, C2)      29              1.9 × 10⁻⁹*     2.9 × 10⁻⁸*
  II  (C2, C3) (A1, A3)      0
  III (A1, C3) (C2, A3)      1

Nucleotides
  I   (A3, C3) (A1, C2)      34              1.9 × 10⁻⁴*     6.2 × 10⁻⁸*
  II  (C2, C3) (A1, A3)      0
  III (A1, C3) (C2, A3)      3

Heterogeneity was assessed using a χ² procedure (Sokal and Rohlf 1981): group A heterogeneity was not significant (P > 0.05) at the first position (χ²[9] = 11.90) nor at the second position (χ²[9] = 1.66), whereas it was significant at the third position (χ²[9] = 407.73). For group B, heterogeneity was not significant (P > 0.05) at the first position (χ²[9] = 12.69) nor at the second position (χ²[9] = 0.98), whereas it was significant at the third position (χ²[9] = 403.50)

ᵃ Probabilities (P) calculated according to Prager and Wilson (1988). P values < 0.05 are marked with asterisks
ᵇ F: Frankia A, A: Anabaena, K: K. pneumoniae, C: C. pasteurianum #2; see Fig. 2
ᶜ C: C. pasteurianum #2, #3, A: A. vinelandi #1, #3; see Fig. 2
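The probabilities in Table 3 can be approximated with a short sketch. It assumes that the comparison of two trees reduces to a one-tailed binomial (sign) test on the numbers of informative sites favoring each tree, with equal support expected under the null hypothesis. That reading of Prager and Wilson (1988) is an assumption made here; it reproduces several of the tabulated values (for example 0.17 for 7 vs 3 sites) but not all of them. The function name is illustrative.

```python
from math import comb

def sign_test_p(favor_a, favor_b):
    """One-tailed binomial probability of a split at least this uneven.

    Under the null hypothesis each informative site favors either
    tree with probability 1/2.
    """
    n = favor_a + favor_b
    k = max(favor_a, favor_b)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Site counts taken from Table 3.
print(f"Group A, amino acids, I vs II  (7 vs 3):  {sign_test_p(7, 3):.2f}")
print(f"Group A, nucleotides, I vs II  (24 vs 5): {sign_test_p(24, 5):.1e}")
print(f"Group A, nucleotides, I vs III (24 vs 10): {sign_test_p(24, 10):.2f}")
print(f"Group B, amino acids, I vs II  (29 vs 0): {sign_test_p(29, 0):.1e}")
```

Because the test depends only on the two site counts, a tree can be favored by a small absolute margin (7 vs 3) without that preference being statistically significant.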


Scenario

One scenario that could account for the observed phylogenetic gene tree is that nitrogenase (Type 1) arose early in the reducing atmosphere of primitive earth in anaerobic bacteria that had to detoxify cyanides. A mutation may then have taken place leading to the emergence of a molybdenum-free, iron-dependent enzyme (Type 3). Nitrogenase became a common feature of all microbial life until photosynthesizing cyanobacteria largely increased the oxygen concentration and burned cyanides about 800 Myr ago (Ochman and Wilson 1987). The genes involved then disappeared from most rapidly evolving aerobic lineages (Herdman 1985) such as actinomycetes (Woese 1987) or eucaryotes, and presumably from some cyanobacterial lineages with large genomes. The enzyme was then modified and adapted to another related triple bond substrate, dinitrogen, and was selected for and retained by some eubacterial lineages to enable survival in nitrogen-deficient environments (Silver and Postgate 1973). Later, it is probable that lateral transfer(s) occurred in the soil and that the Frankia and Anabaena ancestors acquired the capacity to reduce dinitrogen not later than 100 Myr ago or earlier from some soil gram-negative microbe such as from an Azotobacter ancestor. The ancestral Frankia then established a symbiosis probably with an ancient and completely actinorhizal family such as the Myricaceae, from which strains isolated today are widely divergent (Gardes et al. 1987; St-Laurent et al. 1987).

Acknowledgments. P.N. gratefully acknowledges receiving grants (#A-3501) from NSERC of Canada and from Coopération France-Québec, and J.B. thanks NSERC of Canada and Fondation Desjardins (Montréal) for scholarships. Thanks are expressed to Drs. W. Ellis (Instituut voor Taxonomische Zoologie, Zoologisch Museum, Universiteit van Amsterdam), M. Nei (Center for Demographic and Population Genetics, The University of Texas, Houston) and P.H. Roy (Dept. Biochimie, U. Laval) for making sequence analysis and phylogeny inference software available, to C. Lemieux (Dept. Biochimie, U. Laval) for making oligonucleotide primers available to us, to N. Souillard (Institut Pasteur, Paris) and P.E. Bishop (North Carolina State University, Raleigh) for communication of results prior to publication, and to A. Wilson (Univ. Calif., Berkeley) for critical comments.

References

Akkermans ADL, Houwers A (1983) Morphology of nitrogen fixers in forest ecosystems. In: Gordon JC, Wheeler CT (eds) Biological nitrogen fixation in forest ecosystems: foundations and applications. Martinus Nijhoff/Dr W Junk Publ., The Hague, pp 7-54

Bishop PE, Premakumar R, Joerger RD, Jacobson MR, Dalton DA, Chisnell JR, Wolfinger ED (1988) Alternative nitrogen fixation systems in Azotobacter vinelandii. In: Bothe H, de Bruijn FJ, Newton WE (eds) Nitrogen fixation: hundred years after. Gustav Fischer, Stuttgart, pp 71-79

Brigle KE, Newton WE, Dean DR (1985) Complete nucleotide sequence of the Azotobacter vinelandii nitrogenase structural gene cluster. Gene 37:37-44

Buchanan RE, Gibbons NE (1974) Bergey's manual of determinative bacteriology, ed 8. The Williams and Wilkins Co, Baltimore MD

Burgess BK (1985) Nitrogenase mechanism, an overview. In: Evans HJ, Bottomley PJ, Newton WE (eds) Nitrogen fixation research progress. Martinus Nijhoff, Dordrecht, The Netherlands, pp 543-549

Devereux J, Haeberli P, Smithies O (1984) A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res 12:387-395

Felsenstein J (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783-791

Fitch WM, Margoliash E (1967) Construction of phylogenetic trees. Science 155:279-284

Fox GE, Stackebrandt E, Hespell RB, Gibson J, Maniloff J, Dyer TA, Wolfe RS, Balch WE, Tanner RS, Magrum LJ, Zablen LB, Blakemore R, Gupta R, Bonen L, Lewis BJ, Stahl DA, Luehrsen KR, Chen KN, Woese CR (1980) The phylogeny of prokaryotes. Science 209:457-463

Fuhrmann M, Hennecke H (1984) Rhizobium japonicum nitrogenase Fe protein gene (nifH). J Bacteriol 158:1005-1011

Furlow JJ (1979) The systematics of the American species of Alnus (Betulaceae). Rhodora 81:1-121

Gardes M, Bousquet J, Lalonde M (1987) Isozyme variation among 40 Frankia strains. Appl Environ Microbiol 53:1596-1603

Hennecke H, Kaluza K, Thöny B, Fuhrmann M, Ludwig W, Stackebrandt E (1985) Concurrent evolution of nitrogenase genes and 16S rRNA in Rhizobium species and other nitrogen-fixing bacteria. Arch Microbiol 142:342-348

Herdman M (1985) The evolution of bacterial genomes. In: Cavalier-Smith T (ed) The evolution of genome size. Wiley & Sons, New York, pp 37-68

Humm HJ, Wicks SR (1973) Introduction and guide to the marine blue-green algae. Wiley & Sons, New York

Jones R, Taylor W, Robson R (1985) A model for the MgATP binding site of nitrogenase iron protein. In: Evans HJ, Bottomley PJ, Newton WE (eds) Nitrogen fixation research progress. Martinus Nijhoff, Dordrecht, The Netherlands, p 634

Mevarech M, Rice DA, Haselkorn R (1980) Nucleotide sequence of a cyanobacterial nifH gene coding for nitrogenase reductase. Proc Natl Acad Sci USA 77:6476-6480

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York

Nei M, Stephens JC, Saitou N (1985) Methods for computing the standard errors of branching points in an evolutionary tree and their application to the molecular data from human and apes. Mol Biol Evol 2:66-85

Norel F, Elmerich C (1987) Nucleotide sequence and functional analysis of the two nifH copies of Rhizobium ORS571. J Gen Microbiol 133:1563-1576

Normand P, Simonet P, Bardin R (1988) Conservation of nif sequences in Frankia. Mol Gen Genet 213:238-246

Ochman H, Wilson AC (1987) Evolution in bacteria: evidence for a universal substitution rate in cellular genomes. J Mol Evol 26:74-86

Prager EM, Wilson AC (1988) Ancient origin of lactalbumin from lysozyme: analysis of DNA and amino acid sequences. J Mol Evol 27:326-335

Pretorius IM, Rawlings DE, O'Neill EG, Jones WA, Kirby R, Woods DR (1987) Nucleotide sequence of the gene encoding the nitrogenase iron protein of Thiobacillus ferrooxidans. J Bacteriol 169:367-370

Quinto C, De la Vega H, Flores M, Leemans J, Cevallos MA, Pardo MA, Azpiroz R, Girard M de L, Calva E, Palacios R (1985) Nitrogenase reductase: a functional multigene family in Rhizobium phaseoli. Proc Natl Acad Sci USA 82:1170-1174

Robson R, Woodley P, Jones R (1986) Second gene (nifH*) coding for a nitrogenase iron protein in Azotobacter chroococcum is adjacent to a gene coding for a ferredoxin-like protein. EMBO J 5:1159-1163

Ruvkun GB, Ausubel FM (1980) Interspecies homology of nitrogenase genes. Proc Natl Acad Sci USA 77:191-195

St-Laurent L, Bousquet J, Simon L, Lalonde M (1987) Separation of various Frankia strains in the Alnus and Elaeagnus host specificity groups using sugar analysis. Can J Microbiol 33:764-772

Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406-425

Sanger F, Nicklen S, Coulson AR (1977) DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 74:5463-5467

Schumann JP, Waitches GM, Scolnick PA (1986) A DNA fragment hybridizing to a nif probe in Rhodobacter capsulatus is homologous to a 16S rRNA gene. Gene 48:81-92

Scott KF, Rolfe BG, Shine J (1983a) Biological nitrogen fixation: primary structure of the Rhizobium trifolii iron protein gene. DNA 2:149-155

Scott KF, Rolfe BG, Shine J (1983b) Nitrogenase structural genes are unlinked in the nonlegume symbiont Parasponia Rhizobium. DNA 2:141-148

Shine J, Dalgarno L (1974) The 3'-terminal sequence of Escherichia coli 16S ribosomal RNA: complementary to nonsense triplets and ribosome binding sites. Proc Natl Acad Sci USA 71:1342-1346

Sibold L, Souillard N (1988) Genetic analysis of nitrogen fixation in methanogenic archaebacteria. In: Bothe H, de Bruijn FJ, Newton WE (eds) Nitrogen fixation: hundred years after. Gustav Fischer, Stuttgart, pp 705-710

Silver WS, Postgate JR (1973) Evolution of asymbiotic nitrogen fixation. J Theor Biol 40:1-10

Simonet P, Haurat J, Normand P, Bardin R, Moiroud A (1986) Localization of nif genes on a large plasmid in Frankia sp. strain ULQ0132105009. Mol Gen Genet 204:492-495

Simonet P, Normand P, Hirsh AM, Akkermans ADL (1989) The genetics of the Frankia actinorhizal symbiosis. CRC Crit Rev (in press)

Smith TF, Waterman MS (1981) Comparison of biosequences. Adv Appl Math 2:482-489

Sneath PHA, Sokal RR (1973) Numerical taxonomy. W.H. Freeman, San Francisco

Sokal RR, Rohlf FJ (1981) Biometry, ed 2. W.H. Freeman, New York

Souillard N, Sibold L (1986) Primary structure and expression of a gene homologous to nifH (nitrogenase Fe protein) from the archaebacterium Methanococcus voltae. Mol Gen Genet 203:21-28

Souillard N, Magot M, Possot O, Sibold L (1988) Nucleotide sequence of regions homologous to nifH (nitrogenase Fe protein) from the nitrogen-fixing archaebacteria Methanococcus thermolithotrophicus and Methanobacterium ivanovii: evolutionary implications. J Mol Evol 26:65-76

Stackebrandt E (1985) The significance of "wall types" in phylogenetically based taxonomic studies on actinomycetes. In: Szabo G, Biro S, Goodfellow M (eds) Sixth International Symposium on Actinomycetes biology, Symposia Biologica Hungarica, vol 32. Akademiai Kiado, Budapest, pp 497-506

Sundaresan V, Ausubel FM (1981) Nucleotide sequence of the gene coding for the nitrogenase iron protein from Klebsiella pneumoniae. J Biol Chem 256:2808-2812

Thomas BA, Spicer RA (1987) The evolution and palaeobiology of land plants. Croom Helm, London.

Török I, Kondorosi A (1981) Nucleotide sequence of the Rhizobium meliloti nitrogenase reductase (nifH) gene. Nucleic Acids Res 9:5711-5723

Wang S-Z, Chen J-S, Johnson JL (1988) The presence of five nifH-like sequences in Clostridium pasteurianum: sequence divergence and transcription properties. Nucleic Acids Res 16:439-454

Woese CR (1987) Bacterial evolution. Microbiol Rev 51:221-271

Yanisch-Perron C, Vieira J, Messing J (1985) Improved M13 phage cloning vectors and host strains: nucleotide sequences of the M13mp18 and pUC19 vectors. Gene 33:103-119

Zumft WG (1985) Regulation of nitrogenase activity in the anoxygenic phototrophic bacteria. In: Evans HJ, Bottomley PJ, Newton WE (eds) Nitrogen fixation research progress. Martinus Nijhoff, Dordrecht, The Netherlands, pp 551-557

Received August 8, 1988/Revised and accepted March 14, 1989