metagenome analysis and draft genome reconstruction of … · 2016. 10. 3. · importantly, a draft...

1
Research & Innovation Center Science & Engineering To Power Our Future Metagenome Analysis and Draft Genome Reconstruction of Produced Water Samples from Coalbed Methane Environments Daniel E. Ross, 1,2 * Djuna Gulliver 1 1 National Energy Technology Laboratory, U.S. Department of Energy, 2 AECOM, (*[email protected]) Abstract Biogasification is a process that utilizes the microbial community native to coalbeds to naturally convert currently unusable coal into readily available methane. One methodology involves injection of nutrients into the coal seams to stimulate biogenic coal degradation and methanogenesis. Identification of major functional pathways of biogenic coal degradation and subsequent methane production will lead to a better understanding of the coal-to-methane conversion, the microorganisms responsible for this conversion, and the nutrients required to bolster this conversion in situ. This study examines the metagenome of four produced water samples from the Central Appalachian Basin (Pocahontas 3 coal seam) to determine the composition (who’s there) and the potential functional pathways (what can they do) of the resident microbial community. Nucleic acid was recovered from produced water samples using DNA isolation techniques and the quality and quantity of DNA was assessed. Illumina MiSeq next generation sequencing was employed, and the resultant nucleic acid sequence data was processed using a suite of bioinformatics software. Four metagenomes, named K34, K35, BB137, and L32A were obtained from produced water samples from a depth of 1704 ft, 1912 ft, 1980 ft, and 2578 ft, respectively. Methanogens were present in all samples, suggesting methanogenesis can occur. Furthermore, hydrocarbon degradation pathways were found, suggesting a route for biodegradation of coal. Importantly, a draft genome most closely related to Pseudomonas stutzeri CCUG was extracted from the K35 metagenome. Initial analysis of the draft genome revealed a complete nitrogen fixation pathway, and a naphthalene degradation pathway. Methods: Metagenome Analysis Pipeline (MAP) Metagenome Analysis Genome Analysis 1. Sample Collection and Preservation 2. Nucleic Acid extraction 3. Nucleic Acid Analysis 3.1 16S rRNA gene analysis 3.2 Metagenome analysis 3.1.1 16S amplification and sequencing 3.2.1 DNA sequencing 3.2.2 Binning and Assembly 3.1.3 16S clustering and annotation 3.2.3 Annotation Quality assessment | DNA Library preparation metadata Store aliquots for later use Store aliquots for later use PAL analysis Enrichment cultures Sequencing to be done at NETL-PIT in Gulliver Lab (84-223) and samples sent to Earth Microbiome Project (EMP) Samples to be collected by drillers, mailed to NETL-MGN. Bioinformatics on Linux in Gaia Lab at NETL-PIT K35 1912 ft. BB137 1980 ft. K34 1704 ft. L32A 2578 ft. DISCLAIMER: This project was funded by the Department of Energy, National Energy Technology Laboratory, an agency of the United States Government, through a support contract with AECOM. Neither the United States Government nor any agency thereof, nor any of their employees, nor AECOM, nor any of their employees, makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof. REFERENCES: 1. Milici, R.C., and Polyak, D.E., 2014, Coalbed-methane production in the Appalachian basin, chap. G.2 of Ruppert, L.F., and Ryder, R.T., eds., Coal and petroleum resources in the Appalachian basin; Distribution, geologic framework, and geochemical character: U.S. Geological Survey Professional Paper 1708, 25 p., http://dx.doi.org/10.3133/pp1708G.2. (Chapter G.2 supersedes USGS Open-File Report 02–105.) 2. Jones EJP, Voytek MA, Corum MD, and Orem WH (2010). Stimulation of Methane Generation from Nonproductive Coal by Addition of Nutrients or a Microbial Consortium 76(21):7013-7022. K35 K34 L32A BB137 Pseudomonas Methanogens Methanogens Methanogens Marine isolates Soil/sludge isolates Pseudomonas stutzeri K35 Nitrate Nitrite Dinitrogen oxide Dinitrogen Nitrite Ammonia Fd Fd Sugars Glucose-6-P 2-keto-3-deoxy-6-P-gluconate Glyceraldehyde- 3-phosphate Pyruvate Glyceraldehyde- 3-phosphate Acetyl-CoA TCA Fructose- 6-P ED Pathway Fructose- 1,6-2P D-Xylulose-5P pro, arg ser, gly, cys DHAP purines, pyrimidines his FAD, folate, riboflavin Goals: Investigate the microbial community in potential biogasification sites Characterize relevant functional pathways found in coal systems required for coal-to-methane bioconversion Construct draft genomes of abundant microorganisms in coal systems to complete a detailed characterization of prevalent functional potential Metagenome Assembly Taxonomic assessment of metagenome Metagenome Results. Four metagenomes were analyzed and classified taxonomically. Generally, all samples contained Bacteria and Archaea, mostly comprised of Proteobacteria and Euryarchaeota. With the exception of K35, all metagenomes were dominated by Methanogens from the order Euryarchaeota. Short DNA sequencing reads (250 bp) were assembled into longer contigs. The contigs were binned according to genomic signature. Each bin was individually analyzed for genome completeness by comparing contigs to a reference marker gene set. Based upon the presence or absence of these marker genes, the % genome completeness was estimated. DNA isolation and sequencing Metagenome Assembly Metagenome Binning Metagenome Metagenome contigs Metagenome reads Binned contigs Genome bins Mapping binned contigs to reference genomes Jones et al., 2010 ACKNOWLEDGEMENT: This technical effort was performed in support of the National Energy Technology Laboratory’s ongoing research under the RES contract DEFE0004000. Genome Results. The K35 metagenome was estimated to be ~50% Pseudomonas. After careful contig binning and genome mapping, the Pseudomonas genome bin was 99.2% complete, estimated by the presence/absence of 833 marker genes. The pan-genome was determined and the core genome (genes common across all strains tested) was estimated. The draft genome encodes for a complete nitrification pathway as well as the upper and lower naphthalene degradation pathways. The work presented here represents an initial metagenomic/genomic approach to functional characterization of coalbed methane microbial communities. The first step in the metagenome analysis pipeline (MAP) involves careful and calculated sample collection. Samples are collected by drillers, or when possible on site by NETL researchers. Importantly, to preserve sample integrity and prevent nucleic acid degradation, samples are immediately aliquoted and frozen. After transport from the field, samples are thawed and processed for DNA extraction. The quality and quantity of DNA is assessed before preparing samples for sequencing. The second step involves processing DNA samples to generate a sequence library to be loaded into the sequencer (Illumina MiSeq). Processing involves cleaning, barcoding, and pooling DNA samples. The third MAP step is the most time- consuming and computationally intensive. Here, data that is retrieved from the sequencer is processed and analyzed. Processing involves removing barcodes and trimming reads based upon the quality score (a measure of the confidence of each base call). Analysis involves metagenome assembly, binning, and annotation. Pan-genome Analysis Contig mapping to reference genome sequence Whole genome comparisons Overview of functional potential Future work. Results presented here will provide a framework for tailoring nutrient amendments for microbial enhanced coalbed methane and provide a baseline for monitoring changes in the microbial community during amendments. Clinical isolates Sequence reads were assembled and contig length was plotted against contig number. A steeper slope represents a better assembly. Metagenome Binning Contigs from the K35 metagenome assembly were binned. Each dot represents a contig and each cluster represents a potential genome. Contigs from the Pseudomonas genome bin from the K35 metagenome (bottom) were aligned to the complete genome sequence of Pseudomonas stutzeri CCUG 29243 (top). Red lines demarcate contig boundary. Pan-genome tree comparing the draft genome of P. stutzeri K35 to 31 complete and draft Pseudomonas genomes. The tree represents the degree of similarity between the predicted proteins encoded by each genome. Pan-genome of 32 Pseudomonas stutzeri strains. Each bar represents the number of gene clusters found in 1 to 32 strains examined. Milici and Polyak, 2014

Upload: others

Post on 29-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Metagenome Analysis and Draft Genome Reconstruction of … · 2016. 10. 3. · Importantly, a draft genome most closely related to. Pseudomonas stutzeri. CCUG was extracted from the

Research & Innovation Center

Science & Engineering To Power Our Future

Metagenome Analysis and Draft Genome Reconstruction of Produced Water Samples from Coalbed Methane EnvironmentsDaniel E. Ross,1,2* Djuna Gulliver1

1National Energy Technology Laboratory, U.S. Department of Energy, 2AECOM, (*[email protected])

AbstractBiogasification is a process that utilizes the microbialcommunity native to coalbeds to naturally convertcurrently unusable coal into readily available methane.One methodology involves injection of nutrients intothe coal seams to stimulate biogenic coal degradationand methanogenesis. Identification of major functionalpathways of biogenic coal degradation and subsequentmethane production will lead to a better understandingof the coal-to-methane conversion, the microorganismsresponsible for this conversion, and the nutrientsrequired to bolster this conversion in situ. This studyexamines the metagenome of four produced watersamples from the Central Appalachian Basin(Pocahontas 3 coal seam) to determine thecomposition (who’s there) and the potential functionalpathways (what can they do) of the resident microbialcommunity. Nucleic acid was recovered from producedwater samples using DNA isolation techniques and thequality and quantity of DNA was assessed. IlluminaMiSeq next generation sequencing was employed, andthe resultant nucleic acid sequence data wasprocessed using a suite of bioinformatics software.Four metagenomes, named K34, K35, BB137, andL32A were obtained from produced water samplesfrom a depth of 1704 ft, 1912 ft, 1980 ft, and 2578 ft,respectively. Methanogens were present in allsamples, suggesting methanogenesis can occur.Furthermore, hydrocarbon degradation pathways werefound, suggesting a route for biodegradation of coal.Importantly, a draft genome most closely related toPseudomonas stutzeri CCUG was extracted from theK35 metagenome. Initial analysis of the draft genomerevealed a complete nitrogen fixation pathway, and anaphthalene degradation pathway.

Methods: Metagenome Analysis Pipeline (MAP)

Metagenome Analysis Genome Analysis

1. Sample Collection and Preservation

2. Nucleic Acid extraction

3. Nucleic Acid Analysis

3.1 16S rRNA gene analysis

3.2 Metagenome analysis

3.1.1 16S amplification and

sequencing

3.2.1 DNA sequencing

3.2.2 Binning and Assembly

3.1.3 16S clustering and

annotation 3.2.3 Annotation

Quality assessment | DNA Library preparation

metadata

Store aliquots for later use

Store aliquots for later use

PAL analysis

Enrichment cultures

Sequencing to be done at NETL-PIT in Gulliver Lab (84-223) and samples sent to Earth Microbiome Project (EMP)

Samples to be collected by drillers, mailed to NETL-MGN.

Bioinformatics on Linux in Gaia Lab at NETL-PIT

K351912 ft.

BB1371980 ft.

K341704 ft.

L32A2578 ft.

DISCLAIMER: This project was funded by the Department of Energy, National Energy Technology Laboratory, an agency of the United States Government, through asupport contract with AECOM. Neither the United States Government nor any agency thereof, nor any of their employees, nor AECOM, nor any of their employees,makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus,product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, orservice by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the UnitedStates Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United StatesGovernment or any agency thereof.

REFERENCES:1. Milici, R.C., and Polyak, D.E., 2014, Coalbed-methane production in the Appalachian basin, chap. G.2

of Ruppert, L.F., and Ryder, R.T., eds., Coal and petroleum resources in the Appalachian basin;Distribution, geologic framework, and geochemical character: U.S. Geological Survey ProfessionalPaper 1708, 25 p., http://dx.doi.org/10.3133/pp1708G.2. (Chapter G.2 supersedes USGS Open-FileReport 02–105.)

2. Jones EJP, Voytek MA, Corum MD, and Orem WH (2010). Stimulation of Methane Generation fromNonproductive Coal by Addition of Nutrients or a Microbial Consortium 76(21):7013-7022.

K35 K34

L32A

BB137

Pseudomonas

Methanogens

Methanogens

Methanogens

Marine isolates

Soil/sludgeisolates

Pseudomonas stutzeri K35

NitrateNitrite

Dinitrogen oxide

Dinitrogen

Nitrite

Ammonia

Fd

Fd

Sugars Glucose-6-P

2-keto-3-deoxy-6-P-gluconate

Glyceraldehyde-3-phosphate

Pyruvate

Glyceraldehyde-3-phosphate

Acetyl-CoA

TCA

Fructose-6-P

ED Pathway

Fructose-1,6-2P

D-Xylulose-5P

pro, arg

ser, gly, cys

DHAP

purines, pyrimidines his FAD, folate,

riboflavin

Goals:• Investigate the microbial community in potential

biogasification sites• Characterize relevant functional pathways found in

coal systems required for coal-to-methanebioconversion

• Construct draft genomes of abundantmicroorganisms in coal systems to complete adetailed characterization of prevalent functionalpotential

Metagenome AssemblyTaxonomic assessment of metagenome

Metagenome Results. Four metagenomes wereanalyzed and classified taxonomically. Generally,all samples contained Bacteria and Archaea,mostly comprised of Proteobacteria andEuryarchaeota. With the exception of K35, allmetagenomes were dominated by Methanogensfrom the order Euryarchaeota. Short DNAsequencing reads (250 bp) were assembled intolonger contigs. The contigs were binned accordingto genomic signature. Each bin was individuallyanalyzed for genome completeness by comparingcontigs to a reference marker gene set. Basedupon the presence or absence of these markergenes, the % genome completeness wasestimated.

DNA isolation and sequencing

Metagenome Assembly

Metagenome Binning

Metagenome Metagenome contigs

Metagenome reads Binned contigs

Genome bins

Mapping binned contigsto reference genomes

Jones et al., 2010

ACKNOWLEDGEMENT: This technical effort was performed in support of the National Energy Technology Laboratory’s ongoing research under the RES contractDE‐FE0004000.

Genome Results. The K35metagenome was estimated to be~50% Pseudomonas. After carefulcontig binning and genomemapping, the Pseudomonasgenome bin was 99.2% complete,estimated by thepresence/absence of 833 markergenes. The pan-genome wasdetermined and the core genome(genes common across all strainstested) was estimated. The draftgenome encodes for a completenitrification pathway as well as theupper and lower naphthalenedegradation pathways. The workpresented here represents aninitial metagenomic/genomicapproach to functionalcharacterization of coalbedmethane microbial communities.

The first step in the metagenome analysispipeline (MAP) involves careful andcalculated sample collection. Samples arecollected by drillers, or when possible onsite by NETL researchers. Importantly, topreserve sample integrity and preventnucleic acid degradation, samples areimmediately aliquoted and frozen. Aftertransport from the field, samples are thawedand processed for DNA extraction. Thequality and quantity of DNA is assessedbefore preparing samples for sequencing.

The second step involves processing DNAsamples to generate a sequence library tobe loaded into the sequencer (IlluminaMiSeq). Processing involves cleaning,barcoding, and pooling DNA samples.

The third MAP step is the most time-consuming and computationally intensive.Here, data that is retrieved from thesequencer is processed and analyzed.Processing involves removing barcodes andtrimming reads based upon the qualityscore (a measure of the confidence of eachbase call). Analysis involves metagenomeassembly, binning, and annotation.

Pan-genome Analysis

Contig mapping to reference genome sequence

Whole genome comparisons

Overview of functional potential

Future work. Results presentedhere will provide a framework fortailoring nutrient amendments formicrobial enhanced coalbedmethane and provide a baselinefor monitoring changes in themicrobial community duringamendments.

Clinical isolates

Sequence reads were assembled and contig length was plottedagainst contig number. A steeper slope represents a betterassembly.

Metagenome Binning

Contigs from the K35 metagenome assembly were binned. Eachdot represents a contig and each cluster represents a potentialgenome.

Contigs from the Pseudomonas genome bin from the K35 metagenome (bottom) were aligned to the complete genome sequenceof Pseudomonas stutzeri CCUG 29243 (top). Red lines demarcate contig boundary.

Pan-genome tree comparing the draft genome of P. stutzeri K35 to 31 complete and draft Pseudomonas genomes. The treerepresents the degree of similarity between the predicted proteins encoded by each genome.

Pan-genome of 32 Pseudomonas stutzeristrains. Each bar represents the number ofgene clusters found in 1 to 32 strainsexamined.

Milici and Polyak, 2014