Supporting Information - PNAS · 10/8/2014



  • Supporting Information
    Sakowski et al. 10.1073/pnas.1401322111

    SI Materials and Methods

    Metagenomic Libraries. Chesapeake Bay library CBB was sampled in September 2002 and amplified using the linker-amplified shotgun library (LASL) method before transformation and picking random colonies for Sanger sequencing (1). Chesapeake Bay library CBJ was sampled in October 2004 as part of the first Global Ocean Survey (2). DNA was inserted in a medium-copy plasmid and randomly selected clones were Sanger-sequenced (2). Dry Tortugas libraries were sampled from surface seawater near the Dry Tortugas, Florida (24°29′N, 83°4′W) in January 2004. Chesapeake Bay libraries CFA through CFD were collected over 24 h at station CB 858 (38°58′N, 76°23′W) in July 2007. Water samples were collected on July 30, 2007 at 0600 hours (CFA), 1130 hours (CFB), 1630 hours (CFC), and July 31, 2007 at 0600 hours (CFD). Total viral nucleic acids from the time series samples were separated into dsDNA, ssDNA, and RNA fractions using hydroxyapatite chromatography. The ssDNA and RNA fractions from each time point were pooled and transformed into dsDNA to provide libraries CBS and CBR, respectively. Dry Tortugas and Chesapeake Bay libraries were amplified by the LASL method before transformation and sequencing (3). After the induction treatment, virus particles were concentrated by tangential-flow filtration (4). The Gulf of Maine library GMF was sampled at station GOM04 (44°07′5″N, 67°58′3″W) in January 2006.

    Identification and Distribution of Putative Virioplankton Ribonucleotide Reductases. All sequences were screened by a Conserved Domain BLAST search (5) to confirm homology to ribonucleotide reductase (RNR). Each read was queried against the CDD database (v3.10) using Conserved Domain BLAST (5) and BLASTx (6) and sorted by the top results.

    Interlibrary RNR Frequency Normalization. Differences in read length between libraries were corrected as follows to allow for interlibrary RNR frequency comparisons:

    $$\%\ \text{RNR of library (corrected)} = \left(\frac{\dfrac{r}{R} \times \mathrm{RNR}}{L}\right) \times 100\%$$

    where r is the mean read length of the individual library, R is the mean read length of all libraries being compared, RNR is the number of reads with homology to RNR within the individual library, and L is the number of reads in the individual library.
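The interlibrary correction can be illustrated with a minimal Python sketch. The library names and counts below are invented for illustration, and computing R as a read-weighted (pooled) mean across libraries is an assumption; the text only says that R is the mean read length of all libraries being compared.

```python
def pooled_mean_read_length(libraries):
    """Read-weighted mean read length across libraries (assumed meaning of R)."""
    total_bases = sum(lib["mean_read_length"] * lib["reads"] for lib in libraries.values())
    total_reads = sum(lib["reads"] for lib in libraries.values())
    return total_bases / total_reads


def corrected_rnr_percent(r, R, rnr_reads, total_reads):
    """Percent RNR of a library corrected for read length: ((r / R) * RNR / L) * 100."""
    if R <= 0 or total_reads <= 0:
        raise ValueError("R and the library read count must be positive")
    return (r / R) * rnr_reads / total_reads * 100.0


# Hypothetical libraries; names and numbers are not taken from the study.
libraries = {
    "LIB_A": {"mean_read_length": 700.0, "reads": 10_000, "rnr_reads": 50},
    "LIB_B": {"mean_read_length": 350.0, "reads": 20_000, "rnr_reads": 40},
}

R = pooled_mean_read_length(libraries)
corrected = {
    name: corrected_rnr_percent(lib["mean_read_length"], R, lib["rnr_reads"], lib["reads"])
    for name, lib in libraries.items()
}

for name, pct in corrected.items():
    print(f"{name}: {pct:.3f}% RNR (corrected)")
```

If R is instead meant as the unweighted mean of the per-library means, only `pooled_mean_read_length` would need to change.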

    Intralibrary RNR Frequency Normalization. RNR frequencies were normalized within a library by the mean read length of the library and gene length of the top BLAST hit reference phage for each metagenomic sequence. The corrected number of reads was calculated as follows:

    $$\text{Corrected reads } (\mathrm{RNR}_c) = \left(\frac{r}{G}\right) \times \mathrm{RNR}_n$$

    where r is the mean read length of the individual library, G is the gene length of the RNR from reference phage, and RNRn is the number of reads with a top BLAST hit to a particular reference phage within the individual library.

    The proportion of each group/subunit combination per library was calculated as follows:

    $$\%\ \text{RNR group/subunit} = \left(\frac{\mathrm{RNR}_c}{\sum \mathrm{RNR}_c}\right) \times 100\%$$

    where RNRc is the corrected number of reads for a given RNR group/subunit.
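The two intralibrary formulas can be combined in a short Python sketch. The group names, gene lengths, and read counts are hypothetical, and summing the corrected reads of all reference phages that fall in the same group/subunit is an assumption about how RNRc is aggregated per group.

```python
from collections import defaultdict


def corrected_reads(r, gene_length, n_reads):
    """RNRc = (r / G) * RNRn for reads whose top BLAST hit is one reference phage."""
    if gene_length <= 0:
        raise ValueError("gene length must be positive")
    return (r / gene_length) * n_reads


def group_percentages(r, hits):
    """Percent of each RNR group/subunit: RNRc / sum(RNRc) * 100.

    hits is a list of (group, reference gene length, read count) tuples,
    one tuple per reference phage (hypothetical input layout).
    """
    per_group = defaultdict(float)
    for group, gene_length, n_reads in hits:
        per_group[group] += corrected_reads(r, gene_length, n_reads)
    total = sum(per_group.values())
    if total == 0:
        return {}
    return {group: value / total * 100.0 for group, value in per_group.items()}


# Hypothetical library with a mean read length of 600 bp.
hits = [
    ("Class I alpha", 2400, 8),
    ("Class I alpha", 1200, 2),
    ("Class II", 1800, 3),
]
percentages = group_percentages(600.0, hits)

for group, pct in percentages.items():
    print(f"{group}: {pct:.1f}%")
```

Dividing by gene length keeps long RNR genes, which recruit more reads simply because they present a larger target, from dominating the per-group proportions.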

    Alignments and Phylogenetic Trees. Subsequent alignments and trees were made by extracting sequences from the original alignment (Figs. S2, S3, and S4). Because of the high number of reference sequences within the class I Other, class II Other, and class II Ribonucleotide TriPhosphate Reductase (RTPR) groups (Fig. 1) it was necessary to cluster these sequences at 80% identity using the furthest neighbor algorithm in mothur (7). Representative sequences from each cluster were aligned with the metagenomic sequences. Metagenomic sequences belonging to the class II RTPR group were also clustered at 80% identity to reduce the number of sequences on the tree (Fig. S1).
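To show what furthest-neighbor clustering at 80% identity implies, the following is a minimal Python sketch of complete-linkage clustering with a distance cutoff of 0.20. It is not mothur's implementation, the sequence names and pairwise distances are invented, and treating 80% identity as a distance of 0.20 is an assumption.

```python
def furthest_neighbor_clusters(names, dist, cutoff=0.20):
    """Agglomerative clustering in which the distance between two clusters is
    the largest pairwise distance between their members (complete linkage).
    Merging stops once the closest pair of clusters exceeds the cutoff."""
    clusters = [[name] for name in names]

    def linkage(a, b):
        return max(dist[frozenset((x, y))] for x in a for y in b)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical pairwise distances (1 - fractional identity).
names = ["seqA", "seqB", "seqC", "seqD"]
dist = {
    frozenset(("seqA", "seqB")): 0.05,
    frozenset(("seqA", "seqC")): 0.15,
    frozenset(("seqB", "seqC")): 0.25,
    frozenset(("seqA", "seqD")): 0.60,
    frozenset(("seqB", "seqD")): 0.60,
    frozenset(("seqC", "seqD")): 0.60,
}
clusters = furthest_neighbor_clusters(names, dist)
print(clusters)
```

In this example seqC stays outside the seqA/seqB cluster even though it is within 0.20 of seqA, because its distance to seqB is 0.25. Under furthest neighbor, every member of a cluster is within the cutoff of every other member.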

    Predicted RNR Group Abundances in the Chesapeake Bay. The abundance (viruses per milliliter) of identified RNR groups was predicted for libraries CFA–CFD using direct count values obtained from epifluorescence microscopy (8) and recruitment to RefSeq viral genomes. The abundance of each group was calculated as (VA) × (BR/TB) × (RNR/G), where VA is the observed viral abundance (per milliliter) for a given library, BR is the number of bases in the library that recruited to reference viral genomes, TB is the number of total bases in the library, RNR is the number of predicted RNR genes sampled in a given group in the library, and G is the number of total predicted genomes sampled in the library.
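As a worked illustration of the abundance formula, here is a small Python sketch. All input values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def predicted_group_abundance(va, br, tb, rnr, g):
    """Predicted abundance (viruses per milliliter) = VA * (BR / TB) * (RNR / G)."""
    if tb <= 0 or g <= 0:
        raise ValueError("total bases and total predicted genomes must be positive")
    return va * (br / tb) * (rnr / g)


# Hypothetical library: 1e7 viruses per milliliter by direct count,
# 2e6 of 1e7 bases recruited to reference viral genomes,
# 5 RNR genes of the group among 100 predicted genomes.
abundance = predicted_group_abundance(va=1e7, br=2e6, tb=1e7, rnr=5, g=100)
print(f"{abundance:.3g} viruses per milliliter")
```

The BR/TB term scales the direct count down to the fraction of the library that recruits to reference viral genomes, and RNR/G then gives the share of sampled genomes that carry an RNR of the group.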

    Identification of Redoxins. Thioredoxins and glutaredoxins from RNR-encoding phages within the order Caudovirales were obtained from the GenBank nr database. These sequences were compiled to create a reference database. Phage genomes were queried against this reference database using BLASTx with an e-value cutoff of 1e-01 to identify hypothetical or unannotated proteins that may be putative redoxins.
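The e-value screen described above can be sketched as a filter over BLAST tabular output. This assumes the standard 12-column tabular format, in which the e-value is the 11th field. The query and subject identifiers are invented, and whether the cutoff was applied as "less than" or "less than or equal to" is not stated in the text, so the `<=` below is an assumption.

```python
def best_hits_below_cutoff(tabular_lines, max_evalue=1e-1):
    """Return {query: (subject, evalue)} with the lowest e-value hit per query,
    keeping only hits whose e-value does not exceed max_evalue."""
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical tabular rows: qseqid, sseqid, pident, length, mismatch, gapopen,
# qstart, qend, sstart, send, evalue, bitscore.
rows = [
    "genomeA_orf12\ttrx_ref1\t35.0\t90\t55\t3\t1\t270\t5\t94\t2e-05\t48.1",
    "genomeA_orf12\tgrx_ref2\t28.0\t80\t54\t4\t10\t249\t3\t82\t0.5\t30.0",
    "genomeB_orf3\tgrx_ref2\t31.0\t85\t56\t2\t4\t258\t1\t85\t0.08\t33.5",
    "genomeC_orf7\ttrx_ref1\t25.0\t60\t43\t2\t1\t180\t2\t61\t3.0\t25.0",
]
candidates = best_hits_below_cutoff(rows)
for query, (subject, evalue) in candidates.items():
    print(query, subject, evalue)
```

A cutoff as permissive as 1e-01 admits weak matches, so hits retained this way are candidates for putative redoxins rather than confirmed annotations.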

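    Contig Assembly and Annotation from the Rhode River. Fifty liters of surface water from the Rhode River was sampled at the Smithsonian Environmental Research Center in Edgewater, Maryland. The <0.2-μm fraction was concentrated by the Fe(III) chloride method (9); 2 × 150 bp paired-end reads were sequenced with the Illumina HiSeq at the University of Delaware Sequencing and Genotyping Center at the Delaware Biotechnology Institute. Contigs were assembled from 50 million paired-end reads using MetaVelvet (kmer = 67). Contigs over 5 kb were queried against the UniProt90 RNR database (BLASTx, e value 1e-05) and retained for ORF identification using MetaGeneAnnotator (10). ORFs were annotated by homology to reference sequences identified by BLAST (6) (accession nos. KM520158–KM520331).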

  • 1. Bench SR, et al. (2007) Metagenomic characterization of Chesapeake Bay virioplankton. Appl Environ Microbiol 73(23):7629–7641.

    2. Rusch DB, et al. (2007) The Sorcerer II Global Ocean Sampling expedition: Northwest Atlantic through eastern tropical Pacific. PLoS Biol 5(3):e77.

    3. Andrews-Pfannkoch C, Fadrosh DW, Thorpe J, Williamson SJ (2010) Hydroxyapatite-mediated separation of double-stranded DNA, single-stranded DNA, and RNA genomes from natural viral assemblages. Appl Environ Microbiol 76(15):5039–5045.

    4. Wommack KE, Sime-Ngando T, Winget DM, Jamindar S, Helton RR (2010) Filtration-based methods for the collection of viral concentrates from large water samples. Manual of Aquatic Viral Ecology, eds Wilhelm SW, Weinbauer MG, Suttle CA (Am. Soc. of Limnology and Oceanography, Waco, TX), pp 110–117.

    5. Marchler-Bauer A, et al. (2011) CDD: A Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Res 39(Database issue):D225–D229.

    6. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410.

    7. Schloss PD, et al. (2009) Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 75(23):7537–7541.

    8. Chen F, Lu JR, Binder BJ, Liu YC, Hodson RE (2001) Application of digital image analysis and flow cytometry to enumerate marine viruses stained with SYBR gold. Appl Environ Microbiol 67(2):539–545.

    9. John SG, et al. (2011) A simple and efficient method for concentration of ocean viruses by chemical flocculation. Environ Microbiol Rep 3(2):195–202.

    10. Noguchi H, Taniguchi T, Itoh T (2008) MetaGeneAnnotator: Detecting species-specific patterns of ribosomal binding site for precise gene prediction in anonymous prokaryotic and phage genomes. DNA Res 15(6):387–396.

    Sakowski et al. www.pnas.org/cgi/content/short/1401322111

  • [Fig. S1 tree image not reproduced. Labels on the figure: Clade I, Clade II, Clade III, Clade IV, Mycobacterium phage, Phage RM378, ATCV-1; scale bar, 0.4.]

    Fig. S1. Unrooted maximum likelihood tree with 100 bootstrap replicates of class II RNR reference and putative metagenomic RTPR sequences. Metagenomic sequences from the large tree (Inset) were clustered at 80% identity. Representative metagenomic sequences were placed on the tree, with the number of reads from each environment within that cluster listed. Bacterial references from the large tree (Inset) were clustered at 80% identity. Representative sequences were placed on each tree. Numbers in parentheses following bacterial references indicate the number of reference sequences within that cluster. Scale bar represents amino acid substitutions per site. Bacteria are shown in purple, eukaryotes and eukaryotic viruses in orange, myoviruses in red, siphoviruses in blue, podoviruses in green, and metagenomic sequences in black. Celeribacter phage P12053L was colored as a podovirus based on its T7-like DNA polymerase even though it is officially listed as an unclassified dsDNA phage. Black, gray, and white circles represent bootstrap support ≥100%, 75%, and 50%, respectively. CB, Chesapeake Bay; DT, Dry Tortugas; GM, Gulf of Maine.


  • [Fig. S2 tree image not reproduced. Group labels on the figure: Class I, Class II; Cyanobacteria, T4-like cyanomyoviruses, Cyanosiphoviruses & cyanopodoviruses; scale bar, 0.3.]

    Fig. S2. Unrooted maximum likelihood tree with 100 bootstrap replicates of class I alpha and class II RNR reference and putative metagenomic Cyano sequences. Numbers in parentheses indicate the number of reads assembled in each contig. Scale bar represents amino acid substitutions per site. Bacteria are shown in purple, myoviruses in red, siphoviruses in blue, podoviruses in green, and metagenomic sequences in black. Black, gray, and white circles represent bootstrap support ≥100%, 75%, and 50%, respectively.


  • [Fig. S3 tree images not reproduced. Group labels on the figure: Class I, Class II; Clade I, Clade II, Clade III, PhiJL-like clade; α-proteobacteria, β-proteobacteria, γ-proteobacteria, Bacteroidetes, Firmicutes; scale bars, 0.4 and 0.3.]

    Fig. S3. Unrooted maximum likelihood tree with 100 bootstrap replicates of class I alpha (Left) and class II (Right) RNR reference and putative metagenomic Other sequences. Numbers in parentheses following metagenomic contigs indicate the number of reads assembled in each contig. Bacterial references from the large tree (Inset) were clustered at 80% identity. Representative sequences were placed on each tree. Numbers in parentheses following bacterial references indicate the number of reference sequences within that cluster. Scale bar represents amino acid substitutions per site. Bacteria are shown in purple, eukaryotic viruses in orange, myoviruses in red, siphoviruses in blue, podoviruses in green, and metagenomic sequences in black. Black, gray, and white circles represent bootstrap support ≥100%, 75%, and 50%, respectively.


  • Fig. S4. Distribution and dynamics of cyanophage populations. ORFs with a top BLASTx hit to a cyanophage were translated, aligned, and clustered at 98% identity. (A) Rank abundance of cyanophage-like RNR clusters in the Chesapeake Bay, Gulf of Maine, and Dry Tortugas. The morphology of the reference phage with the closest homology to sequences in each cluster is identified. Myoviruses are shown in red, siphoviruses in blue, and podoviruses in green. (B) Comparison of RNR distribution frequency in extracted region for peptide cluster analysis and those predicted by normalization of RNR sequences. Gulf of Maine (GM), Chesapeake Bay (CB), and Dry Tortugas (DT). (C) Dynamics of phage populations by cluster in the Chesapeake Bay time series. Myoviruses are shown in red, siphoviruses in blue, and podoviruses in green.


  • [Fig. S5 image not reproduced: ORF maps of Rhode River contigs 5585, 8066, and 12399 (Class II 'RTPR') and contig 12643 (Class I 'Other'). ORF color legend: Myoviridae, Podoviridae, Eukaryotic virus, Non-viral/No hit. Scale bars, 500 bp. Annotations shown on the RTPR contig maps: Primase/Helicase, DNA Primase, DNA Helicase, DNA Polymerase A, DNA Polymerase A Exonuclease Domain, Methyltransferase, Thymidylate Synthase, DUF3310, Exonuclease, Endonuclease I, Nucleotide Pyrophosphohydrolase, Ribonucleotide Reductase, and Hypothetical Protein.]

    The tables embedded in the figure are listed below. ORF numbers follow the row order of the extracted tables; E values are given in parentheses.

    RTPR contig with 10 ORFs
    ORF # | Top BLAST hit | Viral sequence with greatest homology
    1 | gamma proteobacterium SCGC AAA160-D02 (1e-165) | Vibrio phage CHOED (7e-142)
    2 | gamma proteobacterium SCGC AAA160-D02 (2e-157) | Podovirus GOM (7e-133)
    3 | Puniceispirillum phage HMO-2011 (8e-80) | Puniceispirillum phage HMO-2011 (8e-80)
    4 | gamma proteobacterium SCGC AAA160-D02 (2e-06) | Vibrio phage CHOED (1e-03)
    5 | Rhizobium leguminosarum (43-31) | Tetraselmis viridis virus S20 (1e-18)
    6 | Puniceispirillum phage HMO-2011 (6e-23) | Puniceispirillum phage HMO-2011 (6e-23)
    7 | Celeribacter phage P12053L (1e-73) | Celeribacter phage P12053L (1e-73)
    8 | Puniceispirillum phage HMO-2011 (9e-32) | Puniceispirillum phage HMO-2011 (9e-32)
    9 | Rickettsia felis URRWXCal2 (2e-04) | N/A
    10 | Puniceispirillum phage HMO-2011 (0) | Puniceispirillum phage HMO-2011 (0)

    RTPR contig with 13 ORFs
    ORF # | Top BLAST hit | Viral sequence with greatest homology
    1 | Enterobacteria phage phi92 (2e-06) | Enterobacteria phage phi92 (2e-06)
    2 | gamma proteobacterium SCGC AAA160-D02 (1e-27) | Roseobacter phage SIO1 (6e-26)
    3 | Roseobacter phage SIO1 (7e-77) | Roseobacter phage SIO1 (7e-77)
    4 | Dialister micraerophilus (9e-2) | N/A
    5 | Roseobacter phage SIO1 (2e-101) | Roseobacter phage SIO1 (2e-101)
    6 | N/A | N/A
    7 | Vibrio cholerae (1e-19) | Yersinia phage phiA1122 (9e-18)
    8 | Celeribacter phage P12053L (3e-50) | Celeribacter phage P12053L (3e-50)
    9 | Synechococcus phage S-CRM01 (1e-29) | Synechococcus phage S-CRM01 (1e-29)
    10 | Odoribacter laneus (6e-2) | N/A
    11 | N/A | N/A
    12 | Celeribacter phage P12053L (8e-29) | Celeribacter phage P12053L (8e-29)
    13 | Puniceispirillum phage HMO-2011 (0) | Puniceispirillum phage HMO-2011 (0)

    RTPR contig with 15 ORFs
    ORF # | Top BLAST hit | Viral sequence with greatest homology
    1 | Puniceispirillum phage HMO-2011 (4e-103) | Puniceispirillum phage HMO-2011 (4e-103)
    2 | Puniceispirillum phage HMO-2011 (0) | Puniceispirillum phage HMO-2011 (0)
    3 | Polymorphum gilvum SL003B-26A1 (2e-09) | N/A
    4 | Roseobacter phage SIO1 (6e-97) | Roseobacter phage SIO1 (6e-97)
    5 | Celeribacter phage P12053L (1e-09) | Celeribacter phage P12053L (1e-09)
    6 | N/A | N/A
    7 | Puniceispirillum phage HMO-2011 (0) | Puniceispirillum phage HMO-2011 (0)
    8 | Puniceispirillum phage HMO-2011 (1e-115) | Puniceispirillum phage HMO-2011 (1e-115)
    9 | Puniceispirillum phage HMO-2011 (1e-134) | Puniceispirillum phage HMO-2011 (1e-134)
    10 | N/A | N/A
    11 | Puniceispirillum phage HMO-2011 (3e-56) | Puniceispirillum phage HMO-2011 (3e-56)
    12 | Puniceispirillum phage HMO-2011 (1e-08) | Puniceispirillum phage HMO-2011 (1e-08)
    13 | Olsenella profusa (1e-15) | Salicola phage CGphi29 (2e-15)
    14 | N/A | N/A
    15 | Paramecium bursaria Chlorella Virus OR0704.2.2 (7e-95) | Paramecium bursaria Chlorella Virus OR0704.2.2 (7e-95)

    Contig 12643 (the Homologous Sequences columns for HTVC011P, HTVC019P, and SIO1 did not survive text extraction)
    ORF # | Annotation | Top BLAST hit (E value)
    1 | RNR beta | Novosphingobium sp. PP1Y (3e-14)
    2 | RNR alpha | Pelagibacter phage HTVC019P (0)
    3 | Hypothetical protein | Clostridium leptum (4e-16)
    4 | Hypothetical protein | NA
    5 | Endonuclease I | Celeribacter phage P12053L (1e-27)
    6 | Glutaredoxin | alpha proteobacterium HIMB59 (1e-20)
    7 | Hypothetical protein | NA
    8 | Exonuclease | Celeribacter phage P12053L (1e-77)
    9 | DNA Polymerase A | Roseobacter phage SIO1 (2e-125)
    10 | DNA Primase | Azorhizobium caulinodans ORS 571 (3e-151)

    Fig. S5. Predicted ORFs on Rhode River contigs 12643, 5585, 8066, and 12399. These contigs contained class I Other, clade II (contig 12643) and RTPR (contigs 5585, 8066, and 12399) RNR sequences. The contigs were assembled from ∼50 million Illumina 2 × 150 bp reads using MetaVelvet. ORFs were predicted using MetaGeneAnnotator. Annotations were assigned by consensus BLASTx results. ORFs without hits with E values below 1e-3 or lacking hits to definitive genes were annotated as hypothetical protein. ORFs with homology to phage sequences in the Caudovirales were colored by the viral family of the top BLASTx representatives. Scale bar represents 500 nucleotides.


  • [Fig. S6 graphic: two unrooted trees; tree topology and the node-by-node placement of bootstrap values are not recoverable from the extracted text. (A) Class II ‘RTPR’: Clades I–IV, scale bar 0.4, with Rhode River contigs 5585 and 8066 among the leaves. (B) Class I ‘Other’: Clades I–III, scale bar 0.2, with Rhode River contig 12643 among the leaves.]

    Fig. S6. Unrooted maximum likelihood trees with 100 bootstrap replicates of Rhode River RNRs from contigs >5 kb. (A) RTPR RNRs from contigs 5585 and 8066 (bold) with class II RNR reference and putative metagenomic RTPR sequences. Metagenomic sequences were clustered at 80% identity. Representative metagenomic sequences were placed on the tree, with the number of reads from each environment within that cluster listed. Bacterial references were clustered at 80% identity. Representative sequences were placed on each tree. (B) Rhode River contig 12643 (bold) on the class I alpha Other tree. Numbers in parentheses following metagenomic contigs indicate the number of reads assembled in each contig. Bacterial references were clustered at 80% identity. Representative sequences were placed on the tree. Numbers in parentheses following bacterial references indicate the number of reference sequences within that cluster. Scale bar represents amino acid substitutions per site. Bacteria are shown in purple, eukaryotic viruses in orange, myoviruses in red, siphoviruses in blue, podoviruses in green, and metagenomic sequences in black. Integer values are bootstrap support values. CB, Chesapeake Bay; DT, Dry Tortugas; GM, Gulf of Maine.
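The 80% identity clustering mentioned in the legend was done with the furthest neighbor algorithm in mothur (SI Materials and Methods). The following is only a small pure-Python sketch of the furthest-neighbor (complete-linkage) idea, not mothur's implementation: two clusters merge only if every cross-cluster pair is at least 80% identical. The sequence names and identity values are hypothetical.

```python
def furthest_neighbor_clusters(names, identity, threshold=0.80):
    """Greedy complete-linkage clustering on pairwise identities.

    names: list of sequence names.
    identity: dict mapping frozenset({a, b}) -> fractional identity.
    Two clusters may merge only if their LEAST similar pair still meets
    the threshold (the furthest-neighbor criterion).
    """
    clusters = [[name] for name in names]

    def weakest_link(left, right):
        return min(identity[frozenset((a, b))] for a in left for b in right)

    while True:
        best = None  # (similarity, i, j) of the best mergeable pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                similarity = weakest_link(clusters[i], clusters[j])
                if similarity >= threshold and (best is None or similarity > best[0]):
                    best = (similarity, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]


# Hypothetical pairwise identities for four sequences.
names = ["A", "B", "C", "D"]
identity = {
    frozenset(("A", "B")): 0.95,
    frozenset(("A", "C")): 0.85,
    frozenset(("B", "C")): 0.70,
    frozenset(("A", "D")): 0.30,
    frozenset(("B", "D")): 0.30,
    frozenset(("C", "D")): 0.30,
}
clusters = furthest_neighbor_clusters(names, identity)
print(clusters)
```

In this toy input, C is 85% identical to A but only 70% identical to B, so it stays outside the A/B cluster; a nearest-neighbor rule would have absorbed it.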


  • [Fig. S7 graphic. (A) Panels labeled Domain, dsDNA viruses, and Podoviridae for libraries CB (dsDNA), CB (ssDNA), DT (dsDNA), and DT (ssDNA); percentage labels as extracted: 18%, 48%, 16%, 25%; 37%, 84%, 31%, 73%; 38%, 94%, 42%, 93%. (B) Recruitment plots for CB (dsDNA), CB (ssDNA), DT (dsDNA), and DT (ssDNA) reads; horizontal axes show genome position in steps of 2,000, up to 40,000 or 44,000; labeled genes: RNR, DNA Pol, DNA primase/helicase, endonuclease, RNA polymerase; coverage labels as extracted: 24 33; 50 270; 52 183; 2547; 26225.]

    Fig. S7. Taxonomic distribution and alignment of ssDNA virome sequences. (A) Taxonomic distribution of translated ORFs in Chesapeake Bay and Dry Tortugas dsDNA and ssDNA virome libraries. Chesapeake Bay libraries CFA–CFD were combined for taxonomic composition analysis. Podoviral sequences with a top BLASTp hit to known cyanopodoviral sequences were categorized as cyanophage-like. (B) Recruitment of dsDNA and ssDNA virome library reads to cyanopodoviral genomes. Reads from Chesapeake Bay dsDNA libraries CFA–CFD were combined before mapping. Reads were mapped to each genome independently. Maximal coverage values of reads from dsDNA libraries and ssDNA libraries against each genome are listed on the left and right sides of each plot, respectively. Genomes were aligned with Mauve. Colors on horizontal axes are aligned regions.
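The maximal coverage values reported in panel B can be understood as the largest number of mapped reads overlapping any single genome position. A minimal sketch of that calculation follows; the read coordinates are hypothetical, the function name `max_coverage` is illustrative, and the mapping software used in the study is not assumed here.

```python
def max_coverage(intervals):
    """Return the largest number of reads overlapping any one position.

    intervals: iterable of (start, end) alignment coordinates on the
    genome, 1-based and inclusive.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))      # depth rises at the read start
        events.append((end + 1, -1))   # and drops just past the read end
    depth = 0
    best = 0
    # Sorting places -1 before +1 at the same coordinate, so a read that
    # ends at position p and another that starts at p + 1 never overlap.
    for _, delta in sorted(events):
        depth += delta
        best = max(best, depth)
    return best


# Hypothetical read alignments against one genome.
reads = [(1, 100), (50, 150), (100, 200), (300, 400)]
print(max_coverage(reads))
```

Running the same function separately on dsDNA-library and ssDNA-library alignments would give one value for each side of a plot.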


  • Table S1. Frequency of sampled genomes and RNR alpha subunits in cyanophage and pelagiphage populations

    Group                No. of predicted RNR genes   No. of predicted genomes   Predicted RNR frequency, %
    Cyano I              22                           16                         138
    Cyano II             63                           85                         74
    Pelagiphage
      HTVC008M           4                            6
      HTVC010P           —                            44
      HTVC011P           —                            15
      HTVC019P           36                           11
    Pelagiphage total    40                           76                         53
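The frequency column of Table S1 is consistent with dividing the number of predicted RNR genes by the number of predicted genomes and expressing the result as a percentage (for example, 22/16 gives about 138%). The sketch below reproduces the column under that assumption; the rounding rule (half up to the nearest integer) is also an assumption.

```python
# Counts transcribed from Table S1: (predicted RNR genes, predicted genomes).
table_s1 = {
    "Cyano I": (22, 16),
    "Cyano II": (63, 85),
    "Pelagiphage total": (40, 76),
}


def rnr_frequency_percent(rnr_genes, genomes):
    """Predicted RNR genes per predicted genome, as a whole-number percentage."""
    value = 100.0 * rnr_genes / genomes
    return int(value + 0.5)  # round half up (assumed rounding rule)


frequencies = {
    group: rnr_frequency_percent(genes, genomes)
    for group, (genes, genomes) in table_s1.items()
}
for group, percent in frequencies.items():
    print(f"{group}: {percent}%")
```

A value above 100%, as for Cyano I, means more predicted RNR genes than predicted genomes in that group.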

    Table S2. Distribution frequencies of RNR alpha subunit sequences among designated groups

    Library

    Groups CBB CBJ CBR CBS CFA CFB CFC CFD CIA DTF DTR DTS GMF

    Nucleic acid type dsDNA dsDNA RNA ssDNA dsDNA dsDNA dsDNA dsDNA dsDNA dsDNA RNA ssDNA dsDNA

    Cyano

    Class I 25% (5) 16% (4) — — 4% (1) 5% (1) 3% (