Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Upload: jonathan-eisen

Post on 11-Apr-2017

Page 1: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Paper Analysis

Page 2: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Page 3: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Page 4: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Cover Page

• Jonathan Eisen

• Unusual biology across a group comprising more than 15% of domain Bacteria.

• For EVE161 Winter 2016

Page 5: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Introduction

• Provide an introduction and background to your topic and analysis.

• Why do you think this paper is interesting?

• How does it relate to the class?

• Why did the authors conduct this study?

• Why is it important?

• What questions does this study address?

• What hypotheses did the authors provide (if any)?

Page 6: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Analysis of Paper Sections

• Methods

• Results

• Discussion and Conclusion

Page 7: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Methods

• Provide an overview of the Methods used in the paper (not fine-scale detail, just a general summary of what was done)

• Are all the methods used presented in a clear and comprehensive manner? Was it easy to tell what the authors did and why?

• Were any new methods presented?

• How do these methods address the questions and hypotheses of the study?

• Include a discussion of at least one strength and one weakness of the Methods of the paper.

Page 8: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Results

• Provide an overview of the Results of the paper.

• Highlight at least one Result and discuss it in more detail.

• Are all the Results presented in a clear and comprehensive manner?

• Include a discussion of at least one strength and one weakness of the Results of the paper.

Page 9: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Discussion

• Provide an overview of the Discussion and Conclusions of the paper

• Discuss one strength and one weakness of the Discussion and Conclusions of the paper.

Page 10: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Overview

• Discuss the overall strengths and weaknesses of the paper.

• What would you do differently?

• How does the paper compare to other work?

• What would you do next to follow up on this study?

• Do you think the paper is important?

Page 11: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

What was Sampled?

Page 12: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

What was Sampled?

Page 13: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Page 14: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

How to Sample?

• Sample “natural” system or modified system?

Page 15: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Page 16: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Page 17: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Comments on Sampling?

Page 18: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

What Obtained from Samples?

• What did they collect from the samples and why?

Page 19: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

What Obtained from Samples?

Page 20: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Comments on What Done with Samples?

Page 21: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Binning

• Why bin metagenomes?

• How did they bin metagenomes?

• How did they test their bins?

Page 22: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Binning

Page 23: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Page 24: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

How do ESOMs work?

Page 25: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Page 26: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Page 27: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Page 28: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Self Organized Maps

To analyze the distribution of genome signatures among and between populations, all contigs and assembled genomes were fragmented into 5-kb pieces, then pooled and clustered by self-organizing map (SOM) [58] based on tetranucleotide frequency distributions (Figure 1; see Materials and methods for details). The SOM is an unsupervised neural network algorithm that clusters multidimensional data and represents it on a two-dimensional map. SOMs of tetranucleotide frequencies have been used previously to successfully bin sequence fragments from isolate genomes [33, 59] and some environmental samples [46, 48, 52]. We utilized an implementation of the SOM, emergent SOM (ESOM), which is distinguished by its use of large borderless maps (for example, thousands of neurons) and visualization of underlying distance structure with background topography [60]. This visualization, where map 'elevation' represents the distance in tetranucleotide frequency between data points, is referred to as the U-Matrix [60]. Thus, genomic clusters were visualized not only by the cohesive clustering of fragments from each genome, but also by distance structure whereby barriers between clusters represent the large differences in genome signatures between genomes relative to those within genomes (Figure 3). This visualization of genomic clustering was used to evaluate the accuracy of the binning based on assembled genomes and to identify novel regions of sequence signature space.
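To make the U-Matrix idea above concrete, the sketch below computes, for each node of a small borderless (toroidal) map, the mean Euclidean distance between its weight vector and those of its four grid neighbours. It is a minimal illustration, not the Databionics ESOM Tools implementation: the function name `umatrix`, the four-neighbour choice, and the toy 2-dimensional weights are all assumptions made for this example.

```python
import math

def umatrix(weights):
    """Return a rows x cols grid of U-Matrix heights.

    `weights` is a rows x cols grid (list of lists) of equal-length
    weight vectors. The height of a node is the mean Euclidean distance
    to its four grid neighbours, with wrap-around at the edges so the
    map behaves as a borderless torus.
    """
    rows, cols = len(weights), len(weights[0])
    heights = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                ((r - 1) % rows, c), ((r + 1) % rows, c),
                (r, (c - 1) % cols), (r, (c + 1) % cols),
            ]
            dists = [math.dist(weights[r][c], weights[nr][nc])
                     for nr, nc in neighbours]
            heights[r][c] = sum(dists) / len(dists)
    return heights

# Toy map (hypothetical values): columns 0-2 hold one "genome signature",
# columns 3-5 hold another. Real maps would hold trained SOM weights.
grid = [[[0.0, 0.0] if c < 3 else [1.0, 1.0] for c in range(6)]
        for _ in range(4)]
heights = umatrix(grid)
```

In this toy map the heights are zero inside each cluster and positive along the two columns where the clusters meet (including the wrap-around edge), which is the "barrier between clusters" that the excerpt describes.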

Page 29: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

ESOMs

Contigs were clustered by tetranucleotide frequency utilizing Databionics ESOM Tools [94]. The input for tetra-ESOM was a 136-dimensional vector (representing the frequencies of the 136 unique reverse complement tetranucleotide pairs, normalized for contig length) for each contig/window. These raw frequencies were transformed with the 'Robust ZT' option built into Databionics ESOM Tools, which normalizes the data using robust estimates of mean and variance. Data were permuted before each run to avoid errors due to sampling order. Maps were toroidal (borderless) with Euclidean grid distance and dimensions scaled from the default map size (50 × 82) as a function of the number of data points, to a ratio of approximately 5.5 map nodes per data point. For example, a typical clustering with approximately 7,500 data points was run on map with dimensions 155 × 255. Training was conducted with the K-Batch algorithm (k = 0.15%) for 20 training epochs. The standard best match search method was used with local best match search radius of 8. Other training parameters were as follows: Gaussian weight initialization method; Euclidean data space function; starting value for training radius of 50 with linear cooling to 1; starting value for learning rate of 0.5 with linear cooling to 0.1; Gaussian kernel function.
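The figure of 136 dimensions follows from merging each of the 256 tetranucleotides with its reverse complement: 16 tetramers are their own reverse complement, so there are 16 + (256 - 16) / 2 = 136 classes. The sketch below builds that vector for one sequence window. It is an illustration under stated assumptions, not the authors' pipeline: the names `canonical` and `tetra_vector` are hypothetical, "normalized for contig length" is approximated here by dividing by the number of counted tetramers, and the 'Robust ZT' transform is not reproduced.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def canonical(kmer):
    """Represent a k-mer and its reverse complement by one shared key."""
    return min(kmer, reverse_complement(kmer))

# The 136 reverse-complement classes of the 256 possible tetramers.
CANONICAL_TETRAMERS = sorted(
    {canonical("".join(p)) for p in product("ACGT", repeat=4)}
)

def tetra_vector(seq):
    """Relative frequency of each canonical tetramer in `seq`.

    Windows containing characters other than A, C, G, T are skipped.
    Dividing by the number of counted windows is one simple way to
    normalize for length (an assumption, see the note above).
    """
    seq = seq.upper()
    counts = dict.fromkeys(CANONICAL_TETRAMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if all(base in COMPLEMENT for base in kmer):
            counts[canonical(kmer)] += 1
            total += 1
    if total == 0:
        return [0.0] * len(CANONICAL_TETRAMERS)
    return [counts[k] / total for k in CANONICAL_TETRAMERS]

# Hypothetical toy fragment; real input would be 5-kb contig windows.
fragment = "ATGCGTACGTTAGCCGATAGCTAGGCTTAACG"
vector = tetra_vector(fragment)
```

Because each tetramer is pooled with its reverse complement, a fragment and its reverse complement produce the same vector, so the strand on which a contig happened to be assembled does not affect clustering.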

Page 30: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Comments on Binning?

Page 31: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Binning

Page 32: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

CPR

Page 33: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

How did they name new phyla?

Page 34: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

CPR

Page 35: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

CPR

Page 36: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Page 37: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

CPR Genomes

Page 38: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Comments on Phyla and Genomes?

Page 39: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

CPR Size

• Why does size matter here?

Page 40: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

CPR are Small

Page 41: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Comments on Size?

Page 42: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Unusual rRNAs 1

• Why so much focus on unusual rRNAs?

Page 43: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Unusual rRNAs 1

Page 44: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Insertions Diverse

Page 45: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Page 46: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Unusual rRNAs 1

Page 47: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Insertion Sites

Page 48: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Unusual rRNAs 2

Page 49: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Insertion Locations by Phyla

Page 50: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Page 51: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Unusual rRNAs 2

Page 52: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

rRNA Copy = 1 in CPR

Page 53: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Comments on rRNAs?

Page 54: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Metatranscriptomes and rRNAs

• What is metatranscriptomics?

• Why do it?

Page 55: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Metatranscriptomes and rRNAs

Page 56: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Page 57: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Comments on Metatranscriptomes?

Page 58: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

rRNA insertions

Page 59: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

23S rRNA insertions

Page 60: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Metagenomics and rRNA

• Why examine rRNAs in metagenomic data rather than PCR data?

• What can you learn from this?

Page 61: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Metagenomics and rRNA

Page 62: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Universal Primers?

a, b, PrimerProspector was used to assess the ability of primers 515F and 806R to bind a non-redundant set of assembled near-complete 16S rRNA gene sequences (clustered at 97% sequence identity). The percentage of sequences that would be amplified by these primers is shown on the left axis, the total number of sequences analysed is on the top of each bar, and the number of sequences these primers would not bind to is indicated by the shading. Many assembled groundwater-associated 16S rRNA gene sequences would evade amplification by PCR primers 515F and 806R. Results of the analysis are shown at the domain (a) and superphylum or phylum (b) levels.
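The kind of in silico check described in this caption can be sketched as a scan of each 16S sequence for the best ungapped match to a degenerate primer. The code below is a simplified stand-in, not PrimerProspector's scoring, which also weights mismatch position. The 515F sequence is the commonly cited one and is included here as an assumption, since the slide does not give it. The templates are invented toy strings, and the function names are hypothetical.

```python
# IUPAC nucleotide codes: each primer symbol maps to the bases it accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def mismatches(primer, site):
    """Count positions where the template base is not allowed by the primer."""
    return sum(base not in IUPAC[p] for p, base in zip(primer, site))

def best_site(primer, template):
    """Return (fewest mismatches, start position), or None if too short."""
    n = len(primer)
    best = None
    for i in range(len(template) - n + 1):
        mm = mismatches(primer, template[i:i + n])
        if best is None or mm < best[0]:
            best = (mm, i)
    return best

def fraction_bound(primer, templates, max_mismatches=0):
    """Fraction of templates with a site within the mismatch allowance."""
    hits = 0
    for template in templates:
        site = best_site(primer, template)
        if site is not None and site[0] <= max_mismatches:
            hits += 1
    return hits / len(templates)

PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"  # assumed sequence, not from the slide

# Toy templates: the first carries a perfect site, the second has a T
# where the primer's degenerate M only accepts A or C.
templates = [
    "AAA" + "GTGCCAGCAGCCGCGGTAA" + "TTT",
    "AAA" + "GTGCCAGCTGCCGCGGTAA" + "TTT",
]
strict = fraction_bound(PRIMER_515F, templates, max_mismatches=0)
relaxed = fraction_bound(PRIMER_515F, templates, max_mismatches=1)
```

A reverse primer such as 806R would be checked against the reverse complement of each template. This sketch covers only the forward direction.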

Page 63: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Comments on Primers?

Page 64: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Phylogeny

• How to infer the phylogeny of CPR?

Page 65: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Phylogeny

Page 66: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

CPR

Page 67: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Comments on Phylogeny?

Page 68: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Phyla

• How many phyla are there?

• Why does it matter?

• How to determine the number of phyla?

Page 69: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Phyla

Page 70: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Comments on Phyla?

Page 71: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Novel Ribosomes

• Why examine ribosomes (beyond rRNA) in these organisms?

• What do the findings mean?

Page 72: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Novel Ribosomes

Page 73: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Assembled genomes were analysed using ggKbase (Supplementary Data 4). Shown here is a non-redundant set of complete and near-complete genomes (≥75% of single copy genes, ≤1.125 copies) organized based on a subset of a maximum-likelihood 16S rRNA gene phylogeny (Supplementary Fig. 1). CPR organisms have partial tricarboxylic acid (TCA) cycles and lack electron transport chain (ETC) complexes. In addition, they have incomplete biosynthetic pathways for nucleotides and amino acids. The Peregrinibacteria are a notable exception to some of these limitations. Several Parcubacteria exhibit a complete ubiquinol (cytochrome bo) oxidase operon, as previously seen in Saccharibacteria (ref. 3). However, lack of NADH dehydrogenase and other ETC components suggests that this enzyme is involved in oxygen scavenging/detoxification rather than energy production. AA Syn., amino acid synthesis; PP, pentose phosphate pathway.

Page 74: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Novel Ribosomes 2

Page 75: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Comments on Ribosomes?

Page 76: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Novel Metabolism

• Why examine metabolism in these organisms?

Page 77: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Novel Metabolism

Page 78: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Assembled genomes were analysed using ggKbase (Supplementary Data 4). Shown here is a non-redundant set of complete and near-complete genomes (≥75% of single copy genes, ≤1.125 copies) organized based on a subset of a maximum-likelihood 16S rRNA gene phylogeny (Supplementary Fig. 1). CPR organisms have partial tricarboxylic acid (TCA) cycles and lack electron transport chain (ETC) complexes. In addition, they have incomplete biosynthetic pathways for nucleotides and amino acids. The Peregrinibacteria are a notable exception to some of these limitations. Several Parcubacteria exhibit a complete ubiquinol (cytochrome bo) oxidase operon, as previously seen in Saccharibacteria (ref. 3). However, lack of NADH dehydrogenase and other ETC components suggests that this enzyme is involved in oxygen scavenging/detoxification rather than energy production. AA Syn., amino acid synthesis; PP, pentose phosphate pathway.

Page 79: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Comments on Metabolism?

Page 80: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Paper Analysis

Page 81: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Environmental Genome Shotgun Sequencing of the Sargasso Sea

J. Craig Venter,1* Karin Remington,1 John F. Heidelberg,3 Aaron L. Halpern,2 Doug Rusch,2 Jonathan A. Eisen,3 Dongying Wu,3 Ian Paulsen,3 Karen E. Nelson,3 William Nelson,3 Derrick E. Fouts,3 Samuel Levy,2 Anthony H. Knap,6 Michael W. Lomas,6 Ken Nealson,5 Owen White,3 Jeremy Peterson,3 Jeff Hoffman,1 Rachel Parsons,6 Holly Baden-Tillson,1 Cynthia Pfannkoch,1 Yu-Hui Rogers,4 Hamilton O. Smith1

We have applied “whole-genome shotgun sequencing” to microbial populations collected en masse on tangential flow and impact filters from seawater samples collected from the Sargasso Sea near Bermuda. A total of 1.045 billion base pairs of nonredundant sequence was generated, annotated, and analyzed to elucidate the gene content, diversity, and relative abundance of the organisms within these environmental samples. These data are estimated to derive from at least 1800 genomic species based on sequence relatedness, including 148 previously unknown bacterial phylotypes. We have identified over 1.2 million previously unknown genes represented in these samples, including more than 782 new rhodopsin-like photoreceptors. Variation in species present and stoichiometry suggests substantial oceanic microbial diversity.

Microorganisms are responsible for most of the biogeochemical cycles that shape the environment of Earth and its oceans. Yet, these organisms are the least well understood on Earth, as the ability to study and understand the metabolic potential of microorganisms has been hampered by the inability to generate pure cultures. Recent studies have begun to explore environmental bacteria in a culture-independent manner by isolating DNA from environmental samples and transforming it into large insert clones. For example, a previously unknown light-driven proton pump, proteorhodopsin, was discovered within a bacterial artificial chromosome (BAC) from the genome of a SAR86 ribotype (1), and soil microbial DNA libraries have been constructed and screened for specific activities (2).

Here we have applied whole-genome shotgun sequencing to environmental-pooled DNA samples to test whether new genomic approaches can be effectively applied to gene and species discovery and to overall environmental characterization. To help ensure a tractable pilot study, we sampled in the Sargasso Sea, a nutrient-limited, open ocean environment. Further, we concentrated on the genetic material captured on filters sized to isolate primarily microbial inhabitants of the environment, leaving detailed analysis of dissolved DNA and viral particles on one end of the size spectrum and eukaryotic inhabitants on the other, for subsequent studies.

The Sargasso Sea. The northwest Sargasso Sea, at the Bermuda Atlantic Time-series Study site (BATS), is one of the best-studied and arguably most well-characterized regions of the global ocean. The Gulf Stream represents the western and northern boundaries of this region and provides a strong physical boundary, separating the low nutrient, oligotrophic open ocean from the more nutrient-rich waters of the U.S. continental shelf. The Sargasso Sea has been intensively studied as part of the 50-year time series of ocean physics and biogeochemistry (3, 4) and provides an opportunity for interpretation of environmental genomic data in an oceanographic context. In this region, formation of subtropical mode water occurs each winter as the passage of cold fronts across the region erodes the seasonal thermocline and causes convective mixing, resulting in mixed layers of 150 to 300 m depth. The introduction of nutrient-rich deep water, following the breakdown of seasonal thermoclines into the brightly lit surface waters, leads to the blooming of single cell phytoplankton, including two cyanobacteria species, Synechococcus and Prochlorococcus, that numerically dominate the photosynthetic biomass in the Sargasso Sea.

Surface water samples (170 to 200 liters) were collected aboard the RV Weatherbird II from three sites off the coast of Bermuda in February 2003. Additional samples were collected aboard the SV Sorcerer II from “Hydrostation S” in May 2003. Sample site locations are indicated on Fig. 1 and described in table S1; sampling protocols were fine-tuned from one expedition to the next (5). Genomic DNA was extracted from filters of 0.1 to 3.0 µm, and genomic libraries with insert sizes ranging from 2 to 6 kb were made as described (5). The prepared plasmid clones were sequenced from both ends to provide paired-end reads at the J. Craig Venter Science Foundation Joint Technology Center on ABI 3730XL DNA sequencers (Applied Biosystems, Foster City, CA). Whole-genome random shotgun sequencing of the Weatherbird II samples (table S1, samples 1 to 4) produced 1.66 million reads averaging 818 bp in length, for a total of approximately 1.36 Gbp of microbial DNA sequence. An additional 325,561 sequences were generated from the Sorcerer II samples (table S1, samples 5 to 7), yielding approximately 265 Mbp of DNA sequence.

Environmental genome shotgun assembly. Whole-genome shotgun sequencing projects have traditionally been applied to identify the genome sequence(s) from one particular organism, whereas the approach taken here is intended to capture representative sequence from many diverse organisms simultaneously. Variation in genome size and relative abundance determines the depth of coverage of any particular organism in the sample at a given level of sequencing and has strong implications for both the application of assembly algorithms and for the metrics used in evaluating the resulting assembly. Although we would expect abundant species to be deeply covered and well assembled, species of lower abundance may be represented by only a few sequences. For a single genome analysis, assembly coverage depth in unique regions should approximate a Poisson distribution. The mean of this distribution can be estimated from the observed data, looking at the depth of coverage of contigs generated before any scaffolding. The assembler used in this study, the Celera Assembler (6), uses this value to heuristically identify clearly unique regions to form the backbone of the final assembly within the scaffolding phase. However, when the starting material consists of a mixture of genomes of varying abundance, a threshold estimated in this way would classify samples from the most abundant organism(s) as repetitive, due to their greater-than-average depth of coverage, paradoxically leaving the most abundant organisms poorly assembled. We therefore used manual curation of an initial

1The Institute for Biological Energy Alternatives, 2The Center for the Advancement of Genomics, 1901 Research Boulevard, Rockville, MD 20850, USA. 3The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA. 4The J. Craig Venter Science Foundation Joint Technology Center, 5 Research Place, Rockville, MD 20850, USA. 5University of Southern California, 223 Science Hall, Los Angeles, CA 90089–0740, USA. 6Bermuda Biological Station for Research, Inc., 17 Biological Lane, St George GE 01, Bermuda.

*To whom correspondence should be addressed. E-mail: [email protected]

RESEARCH ARTICLE

2 APRIL 2004 VOL 304 SCIENCE www.sciencemag.org 66
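The coverage argument in the assembly paragraph of this excerpt can be worked through with a few lines of arithmetic. The sketch below first checks the read total quoted in the excerpt, then shows why a depth threshold tuned to a pooled mean would flag an abundant genome as repetitive. It is a simplified illustration, not the Celera Assembler's actual statistic. The genome size, the DNA fractions, and the pooled mean depth of 2x are hypothetical values chosen only for this example.

```python
import math

def mean_coverage(total_bp, genome_size_bp, dna_fraction):
    """Expected depth for a genome supplying `dna_fraction` of the sequenced DNA."""
    return total_bp * dna_fraction / genome_size_bp

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean `lam`, computed in log space."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def poisson_tail(k, lam):
    """P(X >= k) for a Poisson variable with mean `lam`."""
    cdf = sum(poisson_pmf(i, lam) for i in range(k))
    return max(0.0, 1.0 - cdf)

# Reads x mean read length for the Weatherbird II samples (from the excerpt).
total_bp = 1.66e6 * 818  # roughly 1.36 Gbp

# Hypothetical community members, both with 2 Mbp genomes.
abundant_depth = mean_coverage(total_bp, 2.0e6, dna_fraction=0.03)
rare_depth = mean_coverage(total_bp, 2.0e6, dna_fraction=0.001)

# Hypothetical pooled mean depth estimated across all contigs.
pooled_mean = 2.0

# Chance of seeing the abundant genome's depth if it were a unique region
# sampled at the pooled mean depth.
p_unique = poisson_tail(round(abundant_depth), pooled_mean)
```

Under these assumed numbers the abundant genome sits near 20x while the pooled mean is 2x, so `p_unique` is vanishingly small and a single-genome heuristic would treat that genome's contigs as repeats. The rare genome, at well under 1x, would barely assemble at all.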

Page 82: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Cover Page

• Jonathan Eisen

• Environmental Genome Shotgun Sequencing of the Sargasso Sea.

• For EVE161 Winter 2016

Page 83: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Introduction

• Provide an introduction and background to your topic and analysis.

• Why do you think this paper is interesting?

• How does it relate to the class?

• Why did the authors conduct this study?

• Why is it important?

• What questions does this study address?

• What hypotheses did the authors provide (if any)?

Page 84: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Analysis of Paper Sections

• Methods

• Results

• Discussion and Conclusion

Page 85: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Methods

• Provide an overview of the Methods used in the paper (not fine-scale detail, just a general summary of what was done)

• Are all the methods used presented in a clear and comprehensive manner? Was it easy to tell what the authors did and why?

• Were any new methods presented?

• How do these methods address the questions and hypotheses of the study?

• Include a discussion of at least one strength and one weakness of the Methods of the paper.

Page 86: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Results

• Provide an overview of the Results of the paper.

• Highlight at least one Result and discuss it in more detail.

• Are all the Results presented in a clear and comprehensive manner?

• Include a discussion of at least one strength and one weakness of the Results of the paper.

Page 87: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Discussion

• Provide an overview of the Discussion and Conclusions of the paper

• Discuss one strength and one weakness of the Discussion and Conclusions of the paper.

Page 88: Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured

Overview

• Discuss the overall strengths and weaknesses of the paper.

• What would you do differently?

• How does the paper compare to other work?

• What would you do next to follow up on this study?

• Do you think the paper is important?