metagenome analysis and draft genome reconstruction of … · 2016. 10. 3. · importantly, a draft...
TRANSCRIPT
Research & Innovation Center
Science & Engineering To Power Our Future
Metagenome Analysis and Draft Genome Reconstruction of Produced Water Samples from Coalbed Methane EnvironmentsDaniel E. Ross,1,2* Djuna Gulliver1
1National Energy Technology Laboratory, U.S. Department of Energy, 2AECOM, (*[email protected])
AbstractBiogasification is a process that utilizes the microbialcommunity native to coalbeds to naturally convertcurrently unusable coal into readily available methane.One methodology involves injection of nutrients intothe coal seams to stimulate biogenic coal degradationand methanogenesis. Identification of major functionalpathways of biogenic coal degradation and subsequentmethane production will lead to a better understandingof the coal-to-methane conversion, the microorganismsresponsible for this conversion, and the nutrientsrequired to bolster this conversion in situ. This studyexamines the metagenome of four produced watersamples from the Central Appalachian Basin(Pocahontas 3 coal seam) to determine thecomposition (who’s there) and the potential functionalpathways (what can they do) of the resident microbialcommunity. Nucleic acid was recovered from producedwater samples using DNA isolation techniques and thequality and quantity of DNA was assessed. IlluminaMiSeq next generation sequencing was employed, andthe resultant nucleic acid sequence data wasprocessed using a suite of bioinformatics software.Four metagenomes, named K34, K35, BB137, andL32A were obtained from produced water samplesfrom a depth of 1704 ft, 1912 ft, 1980 ft, and 2578 ft,respectively. Methanogens were present in allsamples, suggesting methanogenesis can occur.Furthermore, hydrocarbon degradation pathways werefound, suggesting a route for biodegradation of coal.Importantly, a draft genome most closely related toPseudomonas stutzeri CCUG was extracted from theK35 metagenome. Initial analysis of the draft genomerevealed a complete nitrogen fixation pathway, and anaphthalene degradation pathway.
Methods: Metagenome Analysis Pipeline (MAP)
Metagenome Analysis Genome Analysis
1. Sample Collection and Preservation
2. Nucleic Acid extraction
3. Nucleic Acid Analysis
3.1 16S rRNA gene analysis
3.2 Metagenome analysis
3.1.1 16S amplification and
sequencing
3.2.1 DNA sequencing
3.2.2 Binning and Assembly
3.1.3 16S clustering and
annotation 3.2.3 Annotation
Quality assessment | DNA Library preparation
metadata
Store aliquots for later use
Store aliquots for later use
PAL analysis
Enrichment cultures
Sequencing to be done at NETL-PIT in Gulliver Lab (84-223) and samples sent to Earth Microbiome Project (EMP)
Samples to be collected by drillers, mailed to NETL-MGN.
Bioinformatics on Linux in Gaia Lab at NETL-PIT
K351912 ft.
BB1371980 ft.
K341704 ft.
L32A2578 ft.
DISCLAIMER: This project was funded by the Department of Energy, National Energy Technology Laboratory, an agency of the United States Government, through asupport contract with AECOM. Neither the United States Government nor any agency thereof, nor any of their employees, nor AECOM, nor any of their employees,makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus,product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, orservice by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the UnitedStates Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United StatesGovernment or any agency thereof.
REFERENCES:1. Milici, R.C., and Polyak, D.E., 2014, Coalbed-methane production in the Appalachian basin, chap. G.2
of Ruppert, L.F., and Ryder, R.T., eds., Coal and petroleum resources in the Appalachian basin;Distribution, geologic framework, and geochemical character: U.S. Geological Survey ProfessionalPaper 1708, 25 p., http://dx.doi.org/10.3133/pp1708G.2. (Chapter G.2 supersedes USGS Open-FileReport 02–105.)
2. Jones EJP, Voytek MA, Corum MD, and Orem WH (2010). Stimulation of Methane Generation fromNonproductive Coal by Addition of Nutrients or a Microbial Consortium 76(21):7013-7022.
K35 K34
L32A
BB137
Pseudomonas
Methanogens
Methanogens
Methanogens
Marine isolates
Soil/sludgeisolates
Pseudomonas stutzeri K35
NitrateNitrite
Dinitrogen oxide
Dinitrogen
Nitrite
Ammonia
Fd
Fd
Sugars Glucose-6-P
2-keto-3-deoxy-6-P-gluconate
Glyceraldehyde-3-phosphate
Pyruvate
Glyceraldehyde-3-phosphate
Acetyl-CoA
TCA
Fructose-6-P
ED Pathway
Fructose-1,6-2P
D-Xylulose-5P
pro, arg
ser, gly, cys
DHAP
purines, pyrimidines his FAD, folate,
riboflavin
Goals:• Investigate the microbial community in potential
biogasification sites• Characterize relevant functional pathways found in
coal systems required for coal-to-methanebioconversion
• Construct draft genomes of abundantmicroorganisms in coal systems to complete adetailed characterization of prevalent functionalpotential
Metagenome AssemblyTaxonomic assessment of metagenome
Metagenome Results. Four metagenomes wereanalyzed and classified taxonomically. Generally,all samples contained Bacteria and Archaea,mostly comprised of Proteobacteria andEuryarchaeota. With the exception of K35, allmetagenomes were dominated by Methanogensfrom the order Euryarchaeota. Short DNAsequencing reads (250 bp) were assembled intolonger contigs. The contigs were binned accordingto genomic signature. Each bin was individuallyanalyzed for genome completeness by comparingcontigs to a reference marker gene set. Basedupon the presence or absence of these markergenes, the % genome completeness wasestimated.
DNA isolation and sequencing
Metagenome Assembly
Metagenome Binning
Metagenome Metagenome contigs
Metagenome reads Binned contigs
Genome bins
Mapping binned contigsto reference genomes
Jones et al., 2010
ACKNOWLEDGEMENT: This technical effort was performed in support of the National Energy Technology Laboratory’s ongoing research under the RES contractDE‐FE0004000.
Genome Results. The K35metagenome was estimated to be~50% Pseudomonas. After carefulcontig binning and genomemapping, the Pseudomonasgenome bin was 99.2% complete,estimated by thepresence/absence of 833 markergenes. The pan-genome wasdetermined and the core genome(genes common across all strainstested) was estimated. The draftgenome encodes for a completenitrification pathway as well as theupper and lower naphthalenedegradation pathways. The workpresented here represents aninitial metagenomic/genomicapproach to functionalcharacterization of coalbedmethane microbial communities.
The first step in the metagenome analysispipeline (MAP) involves careful andcalculated sample collection. Samples arecollected by drillers, or when possible onsite by NETL researchers. Importantly, topreserve sample integrity and preventnucleic acid degradation, samples areimmediately aliquoted and frozen. Aftertransport from the field, samples are thawedand processed for DNA extraction. Thequality and quantity of DNA is assessedbefore preparing samples for sequencing.
The second step involves processing DNAsamples to generate a sequence library tobe loaded into the sequencer (IlluminaMiSeq). Processing involves cleaning,barcoding, and pooling DNA samples.
The third MAP step is the most time-consuming and computationally intensive.Here, data that is retrieved from thesequencer is processed and analyzed.Processing involves removing barcodes andtrimming reads based upon the qualityscore (a measure of the confidence of eachbase call). Analysis involves metagenomeassembly, binning, and annotation.
Pan-genome Analysis
Contig mapping to reference genome sequence
Whole genome comparisons
Overview of functional potential
Future work. Results presentedhere will provide a framework fortailoring nutrient amendments formicrobial enhanced coalbedmethane and provide a baselinefor monitoring changes in themicrobial community duringamendments.
Clinical isolates
Sequence reads were assembled and contig length was plottedagainst contig number. A steeper slope represents a betterassembly.
Metagenome Binning
Contigs from the K35 metagenome assembly were binned. Eachdot represents a contig and each cluster represents a potentialgenome.
Contigs from the Pseudomonas genome bin from the K35 metagenome (bottom) were aligned to the complete genome sequenceof Pseudomonas stutzeri CCUG 29243 (top). Red lines demarcate contig boundary.
Pan-genome tree comparing the draft genome of P. stutzeri K35 to 31 complete and draft Pseudomonas genomes. The treerepresents the degree of similarity between the predicted proteins encoded by each genome.
Pan-genome of 32 Pseudomonas stutzeristrains. Each bar represents the number ofgene clusters found in 1 to 32 strainsexamined.
Milici and Polyak, 2014