www.sciencemag.org/cgi/content/full/332/6030/714/DC1
Supporting Online Material for
Single-Cell Genomics Reveals Organismal Interactions in Uncultivated Marine Protists
Hwan Su Yoon, Dana C. Price, Ramunas Stepanauskas, Veeran D. Rajah, Michael E. Sieracki, William H. Wilson, Eun Chan Yang, Siobain Duffy, Debashish Bhattacharya*
*To whom correspondence should be addressed. E-mail: [email protected]
Published 6 May 2011, Science 332, 714 (2011)
DOI: 10.1126/science.1203163
This PDF file includes:
Materials and Methods
SOM Text
Figs. S1 to S3
Tables S1 to S8
References
Analysis of Plastid Genes in Paulinella chromatophora
To determine whether the inability to identify plastid DNA in the picobiliphytes, in spite of
extensive genome sampling of MS584-11 and MS584-22, may reflect an unknown bias
associated with our approach, we searched for plastid DNA in SAG-MDA derived Illumina
genome data from a 50-cell sample of the photosynthetic amoeba Paulinella chromatophora
FK01 for which the plastid genome sequence is known (S24). We chose this species because the
genome data for P. chromatophora were generated using the same approach as for the
picobiliphytes and therefore provided a direct test of the idea that plastid genes can be
successfully recovered from SAG-MDA derived Illumina sequence reads. Ten bins of
unassembled data, each totaling 80 Mbp (theoretical 1x coverage of the amoeba nuclear
genome), were created by randomly retrieving 640,000 reads of length 125 bp from a 3.1 Gbp P.
chromatophora Illumina-generated DNA library. The bins were then each used as a BLASTx
query (e-value ≤ 1e-20) against a protein database containing all FK01 plastid proteins. Using
this approach, we identified an average of 149 matches per bin to the 841 distinct proteins on the
FK01 organelle genome. A total of 459/841 plastid proteins had matches over the ten bins of
data (the P. chromatophora plastid sequence and Illumina genome data used to determine the
frequency of plastid genes recovered from these reads are freely available at
http://dbdata.rutgers.edu/data/pico). Although the P. chromatophora plastid genome is ~5-6-fold
larger than in a typical alga (S24), and we sampled pooled DNA from a culture, our data suggest
that if present, plastid DNA should have been identified among the ~3 Gbp and ~9 Gbp of total
data from MS584-11 and MS584-22, respectively.
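The bin arithmetic and the random read draw described above can be sketched in Python. This is a minimal illustration, not the authors' script: the function names, the fixed seed, and the choice to draw each bin independently and without replacement within a bin are assumptions; the text states only the read count, the read length, and the number of bins.

```python
import random

READ_LENGTH_BP = 125      # read length stated in the text
READS_PER_BIN = 640_000   # reads drawn per bin, stated in the text
N_BINS = 10               # number of bins, stated in the text


def bin_size_bp(reads_per_bin, read_length_bp):
    """Total bases in one bin of equal-length reads."""
    return reads_per_bin * read_length_bp


def draw_bins(n_reads_in_library, reads_per_bin, n_bins, seed=0):
    """Draw read indices for each bin.

    Assumption: bins are drawn independently of one another, and a read
    is used at most once within a bin (sampling without replacement).
    """
    rng = random.Random(seed)
    return [rng.sample(range(n_reads_in_library), reads_per_bin)
            for _ in range(n_bins)]


def fraction_recovered(n_matched, n_total):
    """Fraction of reference proteins with at least one match."""
    return n_matched / n_total


if __name__ == "__main__":
    print(bin_size_bp(READS_PER_BIN, READ_LENGTH_BP))  # bases per bin
    print(round(fraction_recovered(459, 841), 3))      # proteins matched over all bins
```

With the values given in the text, one bin holds 640,000 x 125 = 80,000,000 bases (80 Mbp), and 459/841 corresponds to about 55% of the plastid proteins.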
Materials and Methods
A 50 mL coastal water sample was collected from 1 m depth in Boothbay Harbor in the Gulf of
Maine, U.S.A. (43°50'39.76"N, 69°38'27.76"W). Sampling was at high tide (8:15 am) on July
25th, 2007. Water temperature was 18°C. Samples were kept in the dark at in situ temperature
until processing (<6 h). Subsamples (3 mL) were incubated for 10 min with Lysotracker Green
DND-26 (75 nmol L⁻¹; Invitrogen), a pH-sensitive green fluorescing probe that stains food
vacuoles in protists (S25). Target cells were identified and sorted using a MoFlo™
(Beckman-Coulter) flow cytometer equipped with a 488 nm laser for excitation. Prior to sorting,
the cytometer was cleaned thoroughly with bleach; all tubes, plates, and buffers were UV-treated
prior to use to remove any DNA contamination; and a 1% NaCl solution (0.2 µm filtered and UV
treated) was used as sheath fluid (S26).
Heterotrophic protists were identified by the presence of Lysotracker fluorescence and absence
of chlorophyll fluorescence. Side scatter was used to select protists <10 µm in diameter that were
deposited into 96 well plates, with some wells dedicated to positive (10 cells/well) and negative
controls (0 cells/well). All wells on the microplates contained 5 µL 1 x PBS (sample labels
starting with MS584) or Lyse-N-Go (Pierce) (sample labels starting with MS609). Samples were
centrifuged briefly and stored at -80°C. Processing of a cell to generate a single cell amplified
genome (SAG) using multiple displacement amplification (MDA) was done as previously
described (S25). The PCR survey of the SAGs included 18S rDNA, actin, alpha-tubulin, and
beta-tubulin, all of which returned positive gene products. DNA from four picobiliphyte SAGs
(MS584-5, MS584-11, MS584-22, and MS609-66) was re-amplified using the Repli-G midi kit
(Qiagen) following the manufacturer’s instructions. The products of the second MDA reaction were
de-branched with S1 nuclease to reduce chimeric sequences generated during MDA (S27) and
purified with a spin column (QIAquick PCR Purification Kit, Qiagen).
About 5 µg of genomic DNA derived from each SAG, with an A260/280 ratio of 1.85, was used
for shotgun sequencing with the GS-FLX Titanium platform (Roche) at the DNA Facility at the
University of Iowa (http://dna-9.int-med.uiowa.edu/). One-quarter of a picotitre plate was used to
generate sequence data from each picobiliphyte SAG, resulting in over 230,000 reads per SAG.
The individual sequence reads were assembled using Celera wgs-6.0 beta (see
http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Main_Page) with default
settings (see table S3 for assembly output).
Thereafter, about 10 µg of MDA-derived total DNA from each of MS584-11 and MS584-22 was
used to construct a library (sheared DNA fragments were of size 500 bp) for 100 bp x 100 bp
paired-end sequencing using an Illumina GAIIx instrument in the Bhattacharya lab. Standard
Illumina protocols (http://www.illumina.com/) were used to generate the library. We generated
29,286,431 reads totaling nearly 3 Gbp for MS584-11 and 68,757,098 reads totaling 9.5 Gbp for
MS584-22. The MS584-11 Illumina data were co-assembled with the 454 reads from this SAG
using the proprietary software in CLC Genomics Workbench (http://www.clcbio.com/), resulting
in 73,286 contigs with a total size of 27.6 Mbp and an N50 of 638 bp. Assembly of only the Illumina
data from MS584-22 using the CLC Genomics Workbench resulted in 74,660 contigs with a total
size of 29.4 Mbp and an N50 of 506 bp.
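For readers unfamiliar with the N50 statistic quoted above, the sketch below computes it under the usual definition: the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly size. The function name and the toy contig lengths are illustrative; the exact convention used by CLC Genomics Workbench is not stated in the text and is assumed here to match this definition.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Contigs are sorted from longest to shortest, and the length of the
    contig that brings the running total to at least half of the summed
    assembly length is returned.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Toy assembly (hypothetical lengths, not data from the study).
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

A low N50 relative to the total assembly size, as reported for both SAG assemblies, indicates that the assembly is spread over many short contigs.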
A local database was used to analyze the singletons and contigs resulting from the picobiliphyte
454-derived single cell genome assemblies. This database is described in Moustafa et al. (S28)
and is composed of predicted and annotated proteins from RefSeq (Release 42), the genome of
the red alga Cyanidioschyzon merolae (S29), diatom and green algal genomes available from the
Joint Genome Institute, and partial EST data from protists such as dinoflagellates and
cryptophytes available from other public repositories. The singleton analysis was done for each
SAG 454 assembly to determine the phylogenetic origins of the unassembled reads. Using a
BLASTx cut-off value of E≤1e-10 and the database described above, we found hits to 14402,
17671, and 2244 singletons in MS584-5, MS584-11, and MS584-22, respectively (list of
singleton hits for each SAG available at http://dbdata.rutgers.edu/data/pico). BLASTx analysis
with a threshold value of E≤1e-5 identified 62, 3646, and 102 hits to mitochondrial DNA in the
contigs of MS584-5, MS584-11, and MS584-22, respectively. Phylogenomic analysis was done
as described in Moustafa et al. (S28). Resulting alignments were analyzed using PhyML (S30)
with the approximate likelihood ratio test (aLRT) SH-like support values (S31) to infer ML trees
under the WAG model. These trees were filtered with PhyloSort (S32) by searching for the
monophyly of picobiliphytes with other eukaryotic and prokaryotic groups of interest with aLRT
support score ≥0.90, or ≥0.70. For the trees presented in the main text we also used
RAxML (S33) with the WAG + Γ + I model of amino acid evolution. One
hundred bootstrap replicates were used with RAxML, PhyML, or maximum parsimony (for
rDNA) to assess the stability of nodes in these phylogenies (e.g., S34).
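The E-value cut-offs used above (E≤1e-10 for singletons, E≤1e-5 for mitochondrial hits) amount to a simple filter on BLAST output. The sketch below counts query sequences with at least one hit passing a cut-off. It assumes the standard 12-column tab-separated BLAST tabular format, in which the query ID is the first column and the E-value the eleventh; the text does not say which output format was used, and the function name and the example records are hypothetical.

```python
def queries_with_hits(tabular_lines, max_evalue):
    """Return the set of query IDs with at least one hit at or below max_evalue.

    Assumes 12-column tab-separated BLAST tabular records:
    query ID in column 1, E-value in column 11.
    """
    hits = set()
    for line in tabular_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits


# Hypothetical records for illustration only (not data from the study).
EXAMPLE = [
    "read1\tprotA\t80.0\t40\t8\t0\t1\t120\t5\t44\t1e-12\t70.1",
    "read1\tprotB\t50.0\t40\t20\t0\t1\t120\t9\t48\t2e-03\t35.0",
    "read2\tprotC\t45.0\t38\t21\t0\t4\t117\t2\t39\t5e-06\t48.5",
]

if __name__ == "__main__":
    print(len(queries_with_hits(EXAMPLE, 1e-10)))  # stricter cut-off
    print(len(queries_with_hits(EXAMPLE, 1e-5)))   # more permissive cut-off
```

Counting distinct query IDs rather than rows mirrors the way the text reports the number of singletons with hits, since one read can match several database proteins.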
References and Notes
S24. A. Reyes-Prieto et al., Mol Biol Evol 27, 1530 (2010).
S25. J. M. Rose, D. A. Caron, M. E. Sieracki, N. Poulton, Aquat Microb Ecol 34, 263 (2004).
S26. R. Stepanauskas, M. E. Sieracki, Proc Natl Acad Sci U S A 104, 9052 (2007).
S27. K. Zhang et al., Nature Biotechnol 24, 680 (2006).
S28. A. Moustafa et al., Science 324, 1724 (2009).
S29. M. Matsuzaki et al., Nature 428, 653 (2004).
S30. S. Guindon, O. Gascuel, Syst Biol 52, 696 (2003).
S31. M. Anisimova, O. Gascuel, Syst Biol 55, 539 (2006).
S32. A. Moustafa, D. Bhattacharya, BMC Evol Biol 8, 6 (2008).
S33. A. Stamatakis, T. Ludwig, H. Meier, Bioinformatics 21, 456 (2005).
S34. J. D. Hackett et al., Mol Biol Evol 24, 1702 (2007).
S35. S. Q. Le, O. Gascuel, Mol Biol Evol 25, 1307 (2008).
S36. J. P. Huelsenbeck, M. A. Suchard, Syst Biol 56, 975 (2007).
Figure Legends
Figure S1. Analysis of genome data from picobiliphyte SAGs. (A) The bar graphs on the left are
the results of analysis of the taxonomic distribution of total and unique BLASTx hits for genes in
eukaryotic phyla using as query the 454-derived singleton reads from each SAG assembly. The
total number of singletons analyzed for MS584-5, MS584-11, and MS584-22 is shown. The pie
charts on the right of the bar graphs show the total number of hits to viral or bacterial phyla. (B)
Distribution of the total number of BLASTx hits to different ssDNA virus sequences using as
query contigs derived from the assembly of 454 data from MS584-5.
Figure S2. Phylogeny of picobiliphyte sequences. (A) Maximum likelihood (RAxML) tree of
Rep proteins from representative ssDNA viruses showing the phylogenetic position of the
MS584-5 Rep. RAxML bootstrap values are above the branches and those derived from PhyML
(when nodes are shared) are below the branches. Only bootstrap values ≥60% are shown.
Circoviruses and their proposed sister group cycloviruses are in maroon text and nanoviruses in
green. Rep sequences from marine ssDNA viruses are shown in blue, whereas sequences derived from
ocean metagenome data are in red. RW viruses are from reclaimed water, CB from Chesapeake
Bay, and BCC from the coast of British Columbia. (B) Bayesian phylogeny inferred using a
concatenated alignment (2594 aa) of the nuclear proteins actin, alpha-tubulin, beta-tubulin, heat
shock protein 90, cytosolic heat shock protein 70, ribosomal protein L3, and 26S proteasome
non-ATPase regulatory subunit. This is the most-likely tree derived from Phylobayes (V3.2e)
analysis under the LG rate matrix (S35). Rates across sites were modeled under a Dirichlet
process (S36). Four independent chains were run for 43,191 cycles each, until the mean
discrepancy (meandiff) across all bipartitions was < 0.0015 (burnin = 20%). Bayesian posterior
probability values are shown above the branches, whereas RAxML bootstrap values (when
≥60%) are shown below.
Figure S3. Maximum likelihood (PhyML) tree returned by the phylogenomics pipeline that
shows members of the major facilitator superfamily (MFS) of membrane transporters. MFS
proteins are single-polypeptide secondary carriers that facilitate the transport across cytoplasmic
or internal membranes of a variety of small metabolites. The aLRT values (when ≥0.500) are
shown at the branches. GenBank numbers are shown for each taxon. Viridiplantae are shown in
green text, chromalveolates are shown in brown text, and Cyanobacteria in blue.
Table S1. Temperature, chlorophyll a (Chl), and microbe abundances (by flow cytometry) in the
25 July 2007 sample, compared to the 10-year average for week number 30 in Boothbay Harbor,
ME. Abbreviations: HBac: heterotrophic bacteria, Syn: Synechococcus, PPROT: phototrophic
protists (<20µm), Crypt: cryptophytes, HPROT: heterotrophic protists (<20µm).
Table S2. Results of rDNA analysis of SAG DNA generated using FACS-MDA. The SAG data
shown in black text were derived from cells sorted using Lysotracker Green DND-26 to identify
heterotrophs. The SAG data shown in green text were derived from cells sorted using
chlorophyll autofluorescence to identify phototrophs. The SAG data shown in red text had
intermediate autofluorescence levels. Note that picobiliphytes occur only in the heterotrophic
fraction in these SAG data.
Table S3. Results of the Celera wgs-6.0 beta draft genome assembly using as input 454
pyrosequencing reads from SAGs MS584-5, MS584-11, and MS584-22.
Table S4. The number of protein sequences in our local database that was used for the BLASTx
and phylogenomic analyses (based on phyla).
Table S5. Annotation of representative BLASTx hits to mtDNA and ptDNA (in gray
background) using as query, translated 454-derived picobiliphyte genome contigs (utg [unitig]
under Celera) from MS584-5, MS584-11, and MS584-22.
Table S6. BLASTx top hits to contigs derived from the MS584-22 Illumina assembly using the
CLC Genomics Workbench. Proteins with plastid-encoded homologs in other taxa are shown
with the green background and mitochondrial proteins with the red background.
Table S7. Results of the phylogenomic analysis of contigs generated from the assembly of
454+Illumina data from MS584-11. Putative proteins were predicted using BLASTx and
then used as queries against our local database; the output was analyzed with PhyloSort (S9)
to identify the different monophyletic groups. A total of 5231 maximum likelihood (PhyML)
trees were returned by the pipeline.
Table S8. Gene ontology (GO) annotations of the 1683 Stramenopiles proteins that grouped at
aLRT≥0.70 (using PhyML) with proteins encoded on MS584-11 contigs (454+Illumina
assembly). The maximum likelihood phylogenetic approach provides strong evidence that the
Stramenopiles and picobiliphyte proteins are putative homologs.
Figure S1 (image not reproduced; only text labels survive extraction). Panel A: bar graphs
labeled "Eukaryote BLASTx Hits" for MS584-5, MS584-11, and MS584-22 (series: all gene hits and
unique gene hits; categories are eukaryotic groups such as Metazoa, Viridiplantae,
Stramenopiles, Haptophyta, and Fungi), each with a pie chart labeled "BLASTx singleton hits"
listing viral and bacterial hits. Total reads given in the three panels: 194,410; 187,791; and
203,608. Panel B: charts labeled "MS584-5 Virus BLASTx contig hits" and "Virus
Replication-associated (Rep) protein top hits", with hits to nanoviruses and circoviruses such
as Faba bean necrotic yellows virus, Milk vetch dwarf virus, Subterranean clover stunt virus,
and Columbid circovirus.
Figure S2 (image not reproduced; only text labels survive extraction). Panel A: Rep protein
tree containing the sequence labeled "MS584-5 nanovirus", environmental ssDNA virus sequences
(RW_A to RW_E, CB_A, BBC_A), GOS metagenome sequences, nanoviruses, circoviruses, and
cycloviruses; scale bar 0.2 substitutions/site. Panel B: tree of eukaryotes with groups labeled
Excavates, Alveolates (Ciliates, Dinoflagellates, Apicomplexans), Stramenopiles, Viridiplantae,
Rhizaria, Haptophytes, Cryptophytes, Rhodophytes, Glaucophytes, Katablepharids, Opisthokonts,
and Amoebozoans, together with picobiliphytes and Telonema spp.; scale bar 0.05
substitutions/site.
Figure S3 (image not reproduced; only text labels survive extraction). PhyML tree containing
the sequence labeled "Picobiliphyte MS584-11 Contig31230_3" together with sequences from
Stramenopiles, Haptophyceae, Alveolata, Viridiplantae, Rhizaria, Excavata, Fungi, Metazoa,
Amoebozoa, Archaea, and bacterial phyla (e.g., Firmicutes, Proteobacteria, Chloroflexi,
Actinobacteria, Cyanobacteria); aLRT values at the branches; scale bar 1 substitution/site.