Metagenomics Tools:
BACs/Fosmid Libraries
Whole Genome Shotgun Sequencing
Amy Apprill
OCN 750: Molecular Methods in Biological Oceanography
November 17, 2005
Why Metagenomics?
- Cultures provide only limited information about the physiology and functional roles of microbes
- Phylotypes of noncultured microbes derived from rRNA genes provide only phylogenetic information, nothing about physiology, biochemistry, or ecological function, and are subject to PCR-based biases
- Metagenomics allows isolation of large portions of genomes, giving access to protein-coding genes for biochemical pathways
→ insight into specific physiological and ecological functions and the metabolic variability of an environment
History of Marine Metagenomics
1991: Lambda phage used as a vector to create 10-20 kb insert shotgun library of picoplankton rRNA gene sequences, but also revealed other genes of interest (Schmidt TM, DeLong EF, Pace NR, 1991)
1992: Introduction of BAC & Fosmid cloning vectors from E. coli improved cloning efforts by controlling copy numbers
- BAC vectors stably maintain inserts >300 kb & display few chimeras (Shizuya et al. 1992)
1996: First fosmid library built from environmental samples, collected off the Oregon coast (Stein et al. 1996)
2000: First BAC library from marine environment (Beja et al. 2000); proteorhodopsin discovered from Monterey Bay BAC (Beja et al. 2000)
2002: Aerobic anoxygenic phototroph (AAnP) diversity uncovered from Monterey Bay BAC (Beja et al. 2002)
2005: Whole genome shotgun sequencing approach used on first marine environmental samples from the Sargasso Sea (Venter et al. 2005)
BAC: bacterial artificial chromosome
A modified plasmid containing an origin of replication derived from the E. coli F factor, frequently used for large-insert cloning experiments; it exists within the cell much like a cellular chromosome.
Specifics:
- 100-300 kb (even 600 kb!) inserts; 1 insert ≈ 10-15% of a bacterial genome
- Requires large amounts of DNA (800-2000 L seawater)
- Useful for screening specific protein-coding genes and genes of uncultivated microbes
- Used to discover proteorhodopsin in several phylotypes and genes for anoxygenic photosynthesis
Marine bacteria BAC/fosmid construction - general (DeLong 2005)
How to create a BAC from seawater:
1. Collect ~1000 L seawater
2. Pre-filter, then use tangential flow filtration (TFF) to concentrate the cells into a pellet
3. Embed the cell pellet in agarose
4. Lyse the agarose-embedded cells
5. Prepare large DNA fragments by HindIII digestion of agarose slices
- Run PFGE
- Excise 150-400 kbp regions
- Extract gel-embedded DNA
(Beja et al. 2000, Fig. 1)
6. Ligate DNA into vector (previously removed from cells)
How to create a BAC, cont.
7. Transform vector into cells using electroporation
http://www.ptf.okstate.edu/pulser.html
8. Screen for phylogenetic info, purify & sequence plasmid
Pulsed-field gel electrophoresis (PFGE) of BAC clones digested with NotI shows the size of the inserts (Beja et al. 2000, Fig. 2A)
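The insert sizing read off the NotI gel can be sketched as simple arithmetic: sum the bands in a lane and leave out the band that corresponds to the vector backbone. This is only an illustration under stated assumptions. The 7.5 kb vector size, the band sizes, and the clone names are hypothetical, and the sketch assumes NotI sites flank the cloning site so that the backbone runs as its own band.

```python
# Assumed vector backbone size in kb (hypothetical; use the real vector's size).
VECTOR_KB = 7.5


def estimate_insert_kb(band_sizes_kb, vector_kb=VECTOR_KB, tolerance_kb=1.0):
    """Estimate insert size from one lane of NotI bands.

    Drops the single band closest to the assumed vector size and sums the
    remaining bands (an insert with internal NotI sites gives several bands).
    """
    bands = sorted(band_sizes_kb)
    # Pick the band that best matches the expected vector backbone.
    vector_band = min(bands, key=lambda b: abs(b - vector_kb))
    if abs(vector_band - vector_kb) > tolerance_kb:
        raise ValueError("no band matches the assumed vector size")
    bands.remove(vector_band)
    return sum(bands)


# Hypothetical band sizes (kb) read from a PFGE gel.
clones = {"clone_A": [7.5, 82.0], "clone_B": [7.4, 45.0, 60.5]}
for name, bands in clones.items():
    print(name, estimate_insert_kb(bands), "kb insert")
```

Band sizes read from a gel are themselves estimates against a size ladder, so the result is only as good as that calibration.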
BAC Screening: rRNA Gene Surveys using Multiplex PCR
- Digest BAC/fosmid DNA preparations to remove E. coli chromosomal DNA
- Screen clones for rRNA genes using 3 bacterial primer sets (SSU & LSU) and Archaea-specific primers
- Excise amplicons from gel, purify
- Clone & sequence purified products
Phylogenetically informative multiplex PCR products indicate phylogenetic groupings (Beja et al. 2000, Fig. 5)
BAC Screening: ITS-LH-PCR
Figure 4. Suzuki et al. 2004
Uses natural length variations in the ITS, and the location of the tRNA-alanine gene within the ITS, to identify unique gene fragments corresponding to phylogenetic groupings
1. Pool Plasmid-Safe-treated DNA and PCR with fluorescently labeled SSU & LSU primers to amplify ITS & tRNA genes
2. Capillary electrophoresis sizes the fragments against size standards
3. Sequence unknown fragments with ITS primers and 16S primers
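Steps 2 and 3 amount to a lookup: each sized fragment either matches a known length for a phylogenetic group or is set aside for sequencing. A minimal sketch of that matching follows. The group names, reference lengths, and the 1.5 bp tolerance are hypothetical, and the sketch uses a single fragment length per group, whereas the method described above also uses the tRNA-alanine position.

```python
# Hypothetical reference fragment lengths (bp) for known phylogenetic groups.
REFERENCE_BP = {"group_1": 512.0, "group_2": 640.0, "group_3": 701.0}


def assign_fragments(observed_bp, reference=REFERENCE_BP, tolerance_bp=1.5):
    """Match observed fragment lengths to the nearest reference length.

    Returns (assigned, unknown): fragments within the tolerance are grouped
    by name; everything else is listed as unknown, i.e. a candidate for
    sequencing.
    """
    assigned, unknown = {}, []
    for length in observed_bp:
        # Nearest reference group by absolute length difference.
        name, ref_len = min(reference.items(), key=lambda kv: abs(kv[1] - length))
        if abs(ref_len - length) <= tolerance_bp:
            assigned.setdefault(name, []).append(length)
        else:
            unknown.append(length)
    return assigned, unknown


assigned, unknown = assign_fragments([511.2, 640.9, 588.0, 700.4])
print("assigned:", assigned)
print("to sequence:", unknown)
```

A fixed tolerance is the simplest choice; it also shows why clones with overlapping fragment sizes cannot be told apart by length alone.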
BAC Screening Comparison
rRNA gene surveys
PROS:
- Sequence data; no fragment interpretation
CONS:
- Contaminating E. coli DNA
- PCR-based biases
- Not suitable for high-throughput analysis
ITS-LH-PCR
PROS:
- No direct DNA sequencing
- Easier to distinguish E. coli fragments
- High-throughput analysis
CONS:
- Multiple clones with overlapping sizes
- Disruption of ITS may occur with cloning
- Some groups lack linked SSU & LSU genes
- PCR-based biases
Pros & Cons of BAC libraries
Pros:
- Each insert represents 10-15% of a bacterial genome; gain info about uncultured microbes
- Functional gene presence implies physiology or ecology
- Controlled replication (replicon at 2 copies/cell)
- Low level of chimerism
Cons:
- Requires large sample volumes (800-2000 L seawater)
- No direct phylogenetic information
- Screening may introduce PCR biases
- Expensive (time, screening)
Fosmid library
- F-factor origin-based cosmid vector
- ~40 kb DNA inserts
- Requires smaller samples (>1 L seawater)
PROS: Quick; takes days, compared to months to a year for BACs
CONS: Recovers fewer clones & more sheared DNA compared to BACs
Figure from Epicentre® Biotechnologies (http://www.epibio.com/item.asp?ID=278&CatID=125&SubCatID=60)
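The trade-off between ~40 kb fosmid inserts and 100-300 kb BAC inserts can be made concrete with the Clarke-Carbon estimate, N = ln(1 - P) / ln(1 - f), where f is insert size divided by genome size and P is the desired probability that a given sequence is in the library. The formula is not from the slides; it is brought in here as a standard library-size estimate. The 2 Mb genome and the 150 kb BAC insert are hypothetical values, and the calculation treats a single genome, so a mixed seawater community would need far more clones.

```python
import math


def clones_needed(insert_kb, genome_kb, probability=0.99):
    """Clarke-Carbon estimate of clones needed to cover one genome.

    f is the fraction of the genome carried by a single insert; the result
    is rounded up to a whole number of clones.
    """
    f = insert_kb / genome_kb
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - f))


GENOME_KB = 2000  # hypothetical 2 Mb bacterial genome

for label, insert_kb in (("fosmid, 40 kb", 40), ("BAC, 150 kb", 150)):
    print(label, "->", clones_needed(insert_kb, GENOME_KB), "clones")
```

Larger inserts cut the number of clones to screen, which is one reason the extra effort of a BAC library can pay off.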
Whole genome shotgun sequencing: cloning the entire genome in a random fashion and sequencing the resultant clones
- Collect >200 L seawater, pre-filter, then collect cells by TFF or on a 0.22 µm filter
- Shotgun cloning of small fragments ranging 2-6 kb
- Shotgun assembly: a computer program searches for overlapping sequences and assembles the sequenced fragments in the correct order
(DeLong 2005)
Figure 2. Venter et al. 2005
Assembled fragments: Prochlorococcus marinus MED4
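The shotgun assembly step, finding overlapping sequences and joining fragments in order, can be illustrated with a toy greedy overlap merger. This is a teaching sketch only, not the assembler used in any of the cited studies: it assumes error-free reads, all from the same strand, and the reads and the `min_len` threshold are made up.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the best overlap is shorter than min_len.
    """
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left that overlaps; remaining contigs stay apart
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Made-up reads drawn from a short made-up sequence.
reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]
print(greedy_assemble(reads))
```

With real environmental data, repeats and closely related strains create false overlaps, which is why correct assembly is hard at this scale.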
Whole genome shotgun sequencing
Pros:
- Lots of data
- Various phylogenetic marker genes assess diversity without PCR biases
- Unbiased identification of gene diversity
- Functional gene info implies ecology and physiology, useful for generating hypotheses
Cons:
- Challenging to assemble fragments in their correct context (lots of data!)
- Redundant sequencing
- Unknown order and orientation of clones
- Expensive
- Large sample size (>200 L)
Table 1. Suzuki et al. 2004
Figure 1. Suzuki et al. 2004
Figure 2. Suzuki et al. 2004
Figure 3. Suzuki et al. 2004
Figure 4. Suzuki et al. 2004
Figure 1. Venter et al. 2005
Figure 2. Venter et al. 2005
Figure 3. Venter et al. 2005
Figure 4. Venter et al. 2005
Figure 5. Venter et al. 2005
Table 1. Venter et al. 2005
Figure 6. Venter et al. 2005
Tables 2 and 3. Venter et al. 2005
Figure 7. Venter et al. 2005
Whole genome shotgun sequencing success
Sargasso Sea WGS (Venter et al. 2005):
Large magnitude and total gene count
- 1.045 billion base pairs of non-redundant sequence
- 1,625 Mb DNA sequence
- 1,214,207 new genes identified
New discoveries
- 1,800 new microbial species
- 148 previously unknown bacterial phylotypes
- 782 new rhodopsin-like photoreceptors
- Presence of Burkholderia and Shewanella in the open ocean (??)
- Archaea with amo gene (followed up by Francis et al. 2005)
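The two sequence totals quoted for the Sargasso Sea data give a rough sense of how much of the sequencing was redundant. The sketch below simply divides total sequence by non-redundant sequence. It assumes both figures describe the same pooled dataset, and it yields a crude average only, since real coverage is very uneven across abundant and rare organisms.

```python
# Figures quoted on the slide for the Sargasso Sea dataset.
TOTAL_SEQUENCE_BP = 1_625_000_000   # 1,625 Mb of DNA sequence
NONREDUNDANT_BP = 1_045_000_000     # 1.045 billion bp non-redundant


def mean_redundancy(total_bp, nonredundant_bp):
    """Average number of times each non-redundant base was sequenced."""
    return total_bp / nonredundant_bp


ratio = mean_redundancy(TOTAL_SEQUENCE_BP, NONREDUNDANT_BP)
print(f"average redundancy: {ratio:.2f}x")
print(f"share of sequence that is redundant: {1 - 1 / ratio:.0%}")
```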
What we can learn from marine BAC libraries
Apparent taxonomic affiliation of protein-encoding genes from different depths in Monterey Bay (DeLong 2005).
Published Metagenomics studies
DeLong 2005