hertweck bbl2012
DESCRIPTION
Annual presentation to NESCent scientists in Dec 2012 about my postdoctoral research projects.
TRANSCRIPT
Genome-wide effects of transposable element evolution
Kate L Hertweck
National Evolutionary Synthesis Center (NESCent)
But first...a teaching interlude
● Teaching half time for Duke Bio 202 (genetics and evolution)
● Responsible for one lab section, lab development, and lecturing
● Interesting integration of Duke course with Coursera next semester
Overview
1. Transposable elements as a model system
2. Genomic contributions to life history evolution in Asparagales
3. TEs and aging in Drosophila
What is in a genome?
● The first step in analyzing genomes is usually to mask or filter repetitive sequences, which often comprise a large portion of the nuclear genome
● Repetitive sequences include satellites, telomeres, and other “junk” DNA elements
● “Selfish” DNA (or mobile genetic elements) is a category of repetitive sequences representing transposable elements (parasitic, self-replicating sequences derived from viruses)
● Growing evidence (including ENCODE) supports the view that “junk” DNA has essential functions and provides material for evolutionary innovation
TEs Asparagales Drosophila
Class I: Retrotransposons (LTR, LINE, SINE, ERV, SVA)
Class II: DNA transposons (TIR, Crypton, Helitron, Maverick)
www.virtualsciencefair.org
TEs directly affect organisms as they move throughout a genome
Kate Hertweck, Genomic effects of repetitive DNA
● TEs interact with genes
● TE insertion within a gene disrupts function
● Exaptation of TEs into genes: Alu elements contributed to evolution of three color vision (Dulai, 1999)
● Gene expression and regulatory changes
● TEs affect molecular evolution
● Indels
● Increased recombination (chromosomal restructuring)
● Links between TEs and adaptation/speciation
TEs indirectly affect organisms through changes in genome size
● Changes in overall genome size
● Physical-mechanical effects of nuclear size and mass
● Many historical hypotheses about relationships between genome size and life history (complexity, mean generation time, ecology, growth form)
Research questions and goals
● What are patterns of genome expansion and contraction throughout the evolutionary history of organisms?
● Patterns in genome size change
● Proliferation of TEs within lineages
Evolutionnews.org
● Do genomic patterns correlate with changes in life history?
● Improving methods for comparative genomics across broad taxonomic levels
● Application of phylogenetic comparative methods to genomic data
Overview
Collaborators: J. Chris Pires and lab (U of Missouri), Patrick Edger, Dustin Mayfield
1. Transposable elements as a model system
2. Genomic contributions to life history evolution in Asparagales
3. TEs and aging in Drosophila
Genomic evolution in Asparagales
● Many edible species (onion, asparagus, agave) and ornamentals (orchid, amaryllis, yucca)
● Lots of variation in life history traits: physiology, growth habit, habitat
● Interesting patterns of genomic evolution
● Wide variation in genome size
● Bimodal karyotypes
● Although Asparagales possess some of the largest angiosperm genomes, we know little about their TEs
● Possibility to test hypotheses of correlations between genomic changes and life history traits
ag.arizona.edu Naturehills.com
Our data
● Illumina (80-120 bp single end), 6 taxa per lane
● GSS (Genome Survey Sequences): total genomic DNA!
● Data originally collected for systematics
● Assembled plastomes, mtDNA genes, and nrDNA genes from less than 10% of the data (Steele et al. 2012)
● Poaceae (family of grasses, model system)
● Medium-sized genomes
● Well-annotated library of repeats
● Asparagales (order of petaloid monocots, non-model system)
● Very large genomes
● Discovery of novel repeats
● Is there a way to characterize repeats when the genome is a big black box?
Bioinformatics approach
● Sequence assembly:
● Ab initio repeat construction: use raw sequence reads to build pseudomolecules or ancestral sequences
● De novo sequence assembly: standard genome assembly methods, screen resulting contigs
● Annotation method:
● Motif searching
● Reference library
Sidenote: improving the ontology for transposable elements (classification and annotation)
● Sequence Ontology (SO)
● Comparative Data Analysis Ontology (CDAO)
Pipeline
Raw fastq files
De novo genome assembly (MSR-CA)
Filter out scaffolds that BLAST to reference organellar genomes
Map raw reads back to scaffolds to estimate relative proportion of TE
Run RepeatMasker to identify similarity to known repeats (3110 repeats, 98.7% are from grasses)
Discard unknown scaffolds and “unimportant” repeats, categorize others by type
Scripts available on GitHub: AsparagalesTEscripts
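The organelle-filtering and read-mapping steps above can be sketched roughly as follows. This is a minimal illustration, not the code in the AsparagalesTEscripts repository: it assumes the BLAST hits are in the standard 12-column tabular format, and the function names, scaffold IDs, e-value cutoff, and the choice of total reads as the denominator are all hypothetical.

```python
from collections import defaultdict


def organellar_scaffolds(blast_tab_lines, max_evalue=1e-10):
    """Return IDs of scaffolds with a hit against an organellar reference.

    Assumes 12-column tabular BLAST output: query ID in column 1,
    e-value in column 11. The cutoff is an illustrative assumption.
    """
    hits = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits


def repeat_proportions(mapped_reads, scaffold_class, total_reads, exclude=()):
    """Percent of all reads that map to scaffolds of each repeat class.

    mapped_reads:   {scaffold_id: number of reads mapped}
    scaffold_class: {scaffold_id: repeat class}; unclassified scaffolds
                    are simply absent and therefore discarded.
    """
    counts = defaultdict(int)
    for scaffold, n_reads in mapped_reads.items():
        if scaffold in exclude or scaffold not in scaffold_class:
            continue
        counts[scaffold_class[scaffold]] += n_reads
    return {cls: 100.0 * n / total_reads for cls, n in counts.items()}


# Toy input (hypothetical values, not data from the talk).
blast = ["scaf1\tchloroplast_ref\t99.5\t1000\t5\t0\t1\t1000\t1\t1000\t0.0\t1800"]
mapped = {"scaf1": 500, "scaf2": 200, "scaf3": 100, "scaf4": 50}
classes = {"scaf2": "Gypsy LTR", "scaf3": "Copia LTR", "scaf4": "Gypsy LTR"}

organellar = organellar_scaffolds(blast)
props = repeat_proportions(mapped, classes, total_reads=1000, exclude=organellar)
print(props)
```

Because every taxon goes through the same calculation, any bias in these percentages is at least shared across taxa, which is the rationale given in the quality control notes.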
Quality control: Poaceae
● Largest scaffolds with deepest coverage are from the chloroplast and mitochondrial genomes, but are easily identified for exclusion
● All relevant classes of repeats are present in scaffolds from a single genome
● Even long repeats can be reconstructed into a single scaffold
● Characterization of repeats is not dependent on sequence coverage
● Estimates of repeat quantity are not very accurate, but there is little consensus on TE quantification in the published literature!
● Decision: use a dataset constructed from similar data and analyzed in the same pipeline so any error is systematic and shared among all taxa
● How well do these methods work for non-model systems?
Example: LTR from Hosta
● Reads map across scaffold: assembly is reliable
● Some divergence in reads: measure of diversity?
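One way to check that reads map across a whole scaffold is to compute per-base depth from the read alignments. The sketch below assumes alignments have already been reduced to (start, length) pairs with 0-based starts; the function names and the toy reads are hypothetical, and this is not taken from the talk's scripts.

```python
def depth_profile(scaffold_length, alignments):
    """Per-base read depth along a scaffold.

    alignments: iterable of (start, length) pairs, 0-based start.
    Uses a difference array so each read costs O(1) to add.
    """
    diff = [0] * (scaffold_length + 1)
    for start, length in alignments:
        if start < 0 or start >= scaffold_length:
            continue  # ignore reads that start outside the scaffold
        end = min(start + length, scaffold_length)
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for i in range(scaffold_length):
        running += diff[i]
        depth.append(running)
    return depth


def coverage_summary(depth):
    """Fraction of bases covered by at least one read, plus mean and minimum depth."""
    covered = sum(1 for d in depth if d > 0)
    return {
        "fraction_covered": covered / len(depth),
        "mean_depth": sum(depth) / len(depth),
        "min_depth": min(depth),
    }


# Toy example: three reads on a 10 bp scaffold.
reads = [(0, 4), (2, 4), (5, 5)]
depth = depth_profile(10, reads)
summary = coverage_summary(depth)
print(depth)
print(summary)
```

A scaffold with a covered fraction near 1 and no positions of zero depth is consistent with a reliable assembly; gaps in depth would flag a possible misjoin.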
REs in Core Asparagales
Genome size varies among core Asparagales
[Figure; legend: Genome size (Gb), # reads (billions); axis 0 to 25]
Number of scaffolds varies among taxa
[Figure; legend: Total scaffolds, Nuclear scaffolds; axis 0 to 3000]
Proportion of TEs varies among taxa
[Figure; legend: other (RC, satellite, low complexity, simple repeats), % Copia LTRs, % Gypsy LTRs, % LINEs, % DNA TEs; axis 0 to 60]
Very large genomes in Core Asparagales
Small genomes contain variation
Developing genomic traits for comparative biology
● Genomic traits can be treated just like any other phenotype
• Number of gene copies of a single family
• Genome size, intron size, GC content, number of chromosomes, polyploidy, karyotype (sex chromosomes)
• Sometimes genomic traits evolve in such a way that models need to be altered to accommodate their variation
● We finally have enough information to be able to apply these methods across robust phylogenies of organisms!
● What about transposable elements?
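As a small illustration of treating genomic characteristics as traits, the sketch below computes GC content and sequence length per taxon and collects them into a trait table of the kind that could be passed to a comparative analysis. The taxon names and sequences are hypothetical placeholders, and the function names are mine.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T) in a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / (gc + at)


def trait_table(sequences):
    """Build {taxon: {trait: value}} from {taxon: sequence}."""
    return {
        taxon: {"gc_content": gc_content(seq), "length_bp": len(seq)}
        for taxon, seq in sequences.items()
    }


# Toy sequences (hypothetical, for illustration only).
traits = trait_table({"taxon_A": "ATGCGC", "taxon_B": "atnnat"})
print(traits)
```

TE-derived traits, such as the percentage of reads assigned to each repeat class, could be added to the same table as further columns.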
So what?
● You can peek into the black box of large plant genomes with even very limited genomic sequence data
● There is a great deal of variation in TE complements among closely related plant species
● These methods can easily be applied to extant datasets to summarize TEs
So what?
● Data available for most plants are low coverage, with little known about the TEs present and their direct effects on the genome and organism
● Plant genomes tolerate more plasticity than animal genomes
• Polyploidy, chromosomal restructuring more common in plants
• Repetitive complement comprises a higher proportion of plant genomes
• Differences in gene silencing
● Pretty plants are great, but what if we want a more applied approach?
Overview
Collaborators: Joseph Graves (UNCG, NC A&T), Michael Rose (UC Irvine), Mira Han (NESCent)
1. Transposable elements as a model system
2. Genomic contributions to life history evolution in Asparagales
3. TEs and aging in Drosophila
Genomics of aging
● Aging as “detuning” of adaptation
● Age-related genes and expression patterns
● Does the movement of TEs throughout a genome correspond to how long an organism lives?
● Previously discussed life history traits only involve TE proliferation in gametic tissue
● Questions about aging involve changes in organisms throughout lifespan, especially if results can be transferred to human research
Experimental data
● Replicate populations of fruit flies selected for both short and long life spans (Burke et al. 2010)
● Next-gen sequencing of pooled populations
● SNP analysis indicates allele frequency changes at many loci, but little evidence for selective sweeps
● Extensive gene expression change
Experimental approach
FB, MITE, LINE, LTR, TIR
● Does the frequency of a TE differ between control and treatment populations?
● Are there patterns consistent with type of TE?
● T-lex: Perl script for identifying presence and absence of annotated transposable elements
● 2947 transposable elements from publicly available genome sequence
Scripts available on GitHub: flyTEscripts
Preliminary results
● Controls and populations selected for shorter lifespan
● All population pairs are statistically the same (Kruskal-Wallis, p=0.9414)
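For readers unfamiliar with the test named above, here is a minimal pure-Python sketch of the Kruskal-Wallis H statistic with average ranks, a tie correction, and a chi-square p-value. It is not the code used for the result on this slide, the function names are mine, and the toy groups are made up; in practice a statistics library would normally be used instead.

```python
import math


def average_ranks(values):
    """Ranks (1-based) with ties given their average rank; also returns tie group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution for integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def kruskal_wallis(groups):
    """Return (H, p) for a list of groups of numeric observations."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks, ties = average_ranks(pooled)
    total, pos = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[pos:pos + len(g)])
        total += rank_sum * rank_sum / len(g)
        pos += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    correction = 1.0 - sum(t ** 3 - t for t in ties) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h = max(h / correction, 0.0)
    return h, chi2_sf(h, len(groups) - 1)


# Toy groups (hypothetical values, not the fly data).
h_stat, p_value = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h_stat, p_value)
```

A large p-value, like the one reported on this slide, means the test found no evidence that the groups differ.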
[Figure: number of TEs (y-axis, 0 to 700) by population (x-axis, 1 to 5); legend: NA, 0, 100, final]
Preliminary results
● Controls and populations selected for shorter lifespan
● 153 TEs vary in one or more populations
● 70 TEs vary in all five populations
● Some TE frequencies move to fixation
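Counts like these can be produced by comparing per-TE frequencies between each control and selected population. The sketch below is only one possible reading: the talk does not say how "vary" was defined, so the 0.1 frequency-difference threshold, the function name, the TE IDs, and the toy frequencies are all assumptions.

```python
def summarize_te_variation(freqs, threshold=0.1):
    """Classify TEs by how their frequency differs between population pairs.

    freqs: {te_id: [(control_freq, selected_freq), ...]}, one tuple per
           population pair, frequencies between 0.0 and 1.0.
    A TE "varies" in a pair if the absolute difference exceeds `threshold`
    (an assumed cutoff). A TE "moves to fixation" if it reaches 0.0 or 1.0
    in a selected population where the control was not already fixed.
    """
    varies_any, varies_all, to_fixation = [], [], []
    for te, pairs in freqs.items():
        varies = [abs(c - s) > threshold for c, s in pairs]
        if any(varies):
            varies_any.append(te)
        if all(varies):
            varies_all.append(te)
        if any(s in (0.0, 1.0) and c not in (0.0, 1.0) for c, s in pairs):
            to_fixation.append(te)
    return varies_any, varies_all, to_fixation


# Toy frequencies for three population pairs (hypothetical values).
toy = {
    "TE_a": [(0.2, 0.8), (0.1, 1.0), (0.3, 0.9)],
    "TE_b": [(0.5, 0.5), (0.4, 0.9), (0.5, 0.55)],
    "TE_c": [(0.6, 0.6), (0.6, 0.62), (0.6, 0.6)],
}
varies_any, varies_all, to_fixation = summarize_te_variation(toy)
print(len(varies_any), len(varies_all), to_fixation)
```

The threshold is a stand-in for the formal statistical testing that is still listed as work to be done.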
Finishing the job...
● What are patterns from other population pairs (selection for longer lifespan)?
● Formal statistical testing for variation
● Where are TEs of interest located in the genome? What genes are located nearby?
● T-lex de novo: searching for unannotated insertions
– Are there unique TE insertions related to longer life spans?
Conclusions
● What are general patterns of TE evolution?
● Different TEs contribute to genome size obesity.
● We still need better methods to compare genomes.
● Are there common patterns between TEs and life history trait evolution?
● Yes, very specific insertions, at least in Drosophila.
● How can comparative methods be appropriated for genomic characteristics?
● Does TE proliferation contribute to diversification or shifts in rates of molecular evolution?
● We are getting closer to possessing enough data to answer these questions.
Conclusions
● There are many interesting questions to be investigated using other folks' genomic trash!
● A little sequencing data can tell you a lot about a genome.
● Many markers for systematic purposes
● You can characterize major groups of repeats even in the absence of a robust reference library for the species.
● Informatics tools and resources abound!
Acknowledgements
Kate Hertweck, Evolutionary effects of junk DNA
Kate Hertweck, TE ontology
NESCent (National Evolutionary Synthesis Center)
Allen Rodrigo
Karen Cranston (and bioinformatics group!)
www.nescent.org
k8hert.blogspot.com
Find me: Twitter @k8hert, Google+ [email protected]