hertweck asparagales 2013
Transposable element proliferation and genome size evolution in Asparagales
Kate L. Hertweck
National Evolutionary Synthesis Center (NESCent), Durham, NC, [email protected], Twitter: @k8hert, http://k8hert.blogspot.com, http://www.slideshare.net/katehertweck
Introduction
Asparagales as a model system
● 14 families, 1122 genera, ~26000 species; diverged over 100 mya (Stevens, 2001 onwards)
● Many edible and ornamental species
● Variation in karyotype and genome size (Pires et al., 2006; Figure 1)
● Paucity of genomic resources, especially for TEs (see below)
● “Core” Asparagales represents monophyletic lineage of three closely related families
Transposable elements (TEs)
● Mobile genetic elements able to replicate and move throughout a genome
● Represent at least 50% of the DNA in many eukaryotic genomes
● Both fine and coarse scale implications in genomic and organismal evolution
● Contribute to increases in genome size independent of, but sometimes in conjunction with, polyploidy and other types of sequence duplication (Fedoroff, 2012)
Research objectives
● Assemble consensus sequences of the most abundant (recently proliferated) TEs in Asparagales genomes
● Estimate the relative abundance of each type of TE
Results, conclusions, future directions
● Assembly of all classes of TEs possible from GSS
● Most scaffolds are partial sequences, although full-length TEs occur
● Proportion of different TEs varies independent of genome size and phylogeny
● What variation is there among families of each TE type?
● Are there unique TE families in Asparagales?
● What is the sequence variation of reads mapping to these scaffolds?
● Are there correlations between TE presence/abundance and life history traits?
Methods
Sequencing
● Genome survey sequences (GSS): anonymous, low-coverage sequencing from total genomic DNA
● Illumina GAIIx, single-end, 80 bp reads (Steele et al., 2012)
● Proof-of-concept and quality control with six Poaceae taxa, the monocot genomic model system (data not shown)
Bioinformatics
● De novo genome assembly, TE annotation, scaffold filtering, read mapping (Figure 2)
● Custom scripts available at http://github.com/k8hertweck/AsparagalesTEscripts
Acknowledgements
I acknowledge the National Science Foundation for funding (DEB 0829849 and DEB 1146603), as well as collaborators on the Monocot AToL project.
References
APGIII. 2009. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III. Botanical Journal of the Linnean Society 161: 105-121.
Bennett, M. D., and I. J. Leitch. 2010. Angiosperm DNA C-values database. http://www.kew.org/cvalues.
Fedoroff, N. V. 2012. Transposable Elements, Epigenetics, and Genome Evolution. Science 338: 758-767.
Pires, J. C., I. J. Maureira, T. J. Givnish, K. J. Sytsma, O. Seberg, G. Petersen, J. I. Davis, et al. 2006. Phylogeny, genome size, and chromosome evolution of Asparagales. Aliso 22: 285-302.
Steele, P. R., K. L. Hertweck, D. Mayfield, M. R. McKain, J. Leebens-Mack, and J. C. Pires. 2012. Quality and quantity of data recovered from massively parallel sequencing: Examples in Asparagales and Poaceae. American Journal of Botany 99: 330-348.
Stevens, P. F. 2001 onwards. Angiosperm Phylogeny Website. http://www.mobot.org/MOBOT/research/APweb/ [accessed Jan 2013].
Figure 1. Phylogeny of subfamilies in core Asparagales based on all plastome genes (Steele et al., 2012). Classification based on APGIII (2009). Subfamilies in green were included in sampling for the present study. Species estimates for each subfamily are from the Angiosperm Phylogeny Website (Stevens, 2001 onwards). Chromosome number and genome size ranges were obtained from the Plant DNA C-values Database (Bennett and Leitch, 2010). Asterisks (*) indicate subfamilies containing taxa with bimodal karyotypes.
Subfamily         Taxon          Genome size (pg/1C)  Average genome coverage  MSR scaffolds  % organellar scaffolds  % repeat scaffolds
Asphodeloideae    Haworthia      15.2                 0.02X                    1360           2.6                     29.1
Agapanthoideae    Agapanthus     10.5                 0.01X                    438            7.8                     34.9
Allioideae        Allium         13.2                 0.03X                    1858           7.6                     23.9
Amaryllidoideae   Scadoxus       22.1                 0.02X                    1336           4.1                     30.0
Lomandroideae     Lomandra       1.15                 0.33X                    1491           7.6                     29.2
Asparagoideae     Asparagus      1.36                 0.30X                    1977           2.6                     26.5
Nolinoideae       Sansevieria    1.25                 0.32X                    835            6.9                     26.7
Aphyllanthoideae  Aphyllanthes   0.65                 0.34X                    436            15.3                    38.0
Agavoideae        Hosta          19.6                 N/A                      1084           6.1                     34.5
Scilloideae       Ledebouria     8.85                 0.04X                    2481           4.4                     24.8
Brodiaeoideae     Dichelostemma  9.35                 0.03X                    1706           1.5                     27.7
Subfamily            # of species  Chromosome # (taxa sampled)  Genome size (pg/1C) (taxa sampled)
Xeronemataceae       2             34 (1)                       3.28 (1)
Asphodeloideae*      785           12-78 (128)                  5.25-38.3 (139)
Hemerocallodoideae   85            32 (1)                       0.76 (1)
Xanthorrhoeoideae*   30            22 (1)                       1.04 (1)
Agapanthoideae       9             30 (7)                       11.23-23.78 (9)
Allioideae           795           10-66 (153)                  7.6-74.5 (162)
Amaryllidoideae      800           10-72 (93)                   6.15-82.15 (112)
Lomandroideae        178           8-32 (6)                     1.25-25.3 (8)
Asparagoideae        165-295       20-112 (3)                   1.28-4.18 (3)
Nolinoideae*         475           30-108 (11)                  0.93-53.5 (33)
Aphyllanthoideae     1             N/A                          0.65 (1)
Agavoideae*          637           16-180 (56)                  2.55-19.6 (98)
Scilloideae          770-1000      6-54 (75)                    2.6-75.9 (109)
Brodiaeoideae        62            4 (1)                        10.65-18.15 (3)
Figure 1 clade labels: outgroup, Xanthorrhoeaceae, Amaryllidaceae, Asparagaceae
Raw fastq files from low-coverage, anonymous sequencing of total genomic DNA
De novo genome assembly (MSR-CA, http://www.genome.umd.edu/SR_CA_MANUAL.htm)
Filter out scaffolds that BLAST to reference organellar genomes
Run RepeatMasker to identify similarity to known repeats (3110 repeats, 98.7% are from grasses)
Discard unknown scaffolds and “unimportant” repeats, categorize others by type
Map raw reads back to scaffolds to estimate relative proportion of TE
Figure 2. Diagram of the bioinformatics pipeline to assemble and annotate TEs from Illumina GSS data in this study.
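The organellar filtering step in Figure 2 can be illustrated with a short sketch. This is not the script from the project repository; it assumes the BLAST search was saved in the standard 12-column tabular format (`-outfmt 6`), and the function names, the e-value cutoff, and the example scaffold and reference names are all hypothetical.

```python
import csv
import io


def organellar_scaffold_ids(blast_tab, max_evalue=1e-10):
    """Collect query scaffolds with at least one BLAST hit at or below max_evalue.

    Expects BLAST tabular output (-outfmt 6): column 1 is the query ID
    and column 11 is the e-value. The cutoff is an illustrative assumption.
    """
    hits = set()
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, evalue = row[0], float(row[10])
        if evalue <= max_evalue:
            hits.add(qseqid)
    return hits


def filter_scaffolds(scaffold_ids, organellar_ids):
    """Keep only scaffolds that did not match an organellar reference."""
    return [s for s in scaffold_ids if s not in organellar_ids]


# Hypothetical BLAST results: one strong plastome hit, one weak mitochondrial hit.
example_blast = io.StringIO(
    "scaffold_2\tplastome_ref\t99.1\t500\t4\t0\t1\t500\t1001\t1500\t1e-150\t900\n"
    "scaffold_3\tmito_ref\t80.0\t40\t8\t0\t1\t40\t10\t49\t0.5\t30\n"
)
organellar = organellar_scaffold_ids(example_blast)
nuclear = filter_scaffolds(["scaffold_1", "scaffold_2", "scaffold_3"], organellar)
print(nuclear)
```

Only the scaffold with the strong plastome hit is removed here; the weak hit falls above the cutoff, so that scaffold is retained for repeat annotation.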
Table 1. Results of TE assembly and annotation from Asparagales taxa following the bioinformatics methods in Figure 2. Genome size data for the samples sequenced are described in Steele et al. (2012). Average genome coverage was calculated from 1C genome size, read length, and number of reads for each sample; coverage data for Hosta are unavailable because sequencing was performed on DNA enriched for the plastome. Percentages represent proportion of reads belonging to organelles and annotated repeats from total number of MSR scaffolds.
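The coverage calculation described in the Table 1 caption can be sketched as follows. The conversion of 1 pg of DNA to roughly 0.978 × 10^9 bp is the commonly used value; whether the poster used exactly this factor is an assumption, and the read count in the example is hypothetical (chosen only to illustrate a value near the 0.30X reported for Asparagus).

```python
BP_PER_PG = 0.978e9  # commonly used conversion: 1 pg DNA is about 0.978 x 10^9 bp


def average_coverage(genome_size_pg_1c, read_length_bp, n_reads):
    """Average genome coverage = total sequenced bases / 1C genome size in bp."""
    genome_size_bp = genome_size_pg_1c * BP_PER_PG
    return (n_reads * read_length_bp) / genome_size_bp


# 1.36 pg/1C (Asparagus, Table 1) and 80 bp reads come from the poster;
# the 5,000,000 read count is a hypothetical input.
cov = average_coverage(1.36, 80, 5_000_000)
print(f"{cov:.2f}X")
```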
Figure 3 legend: LINEs; Copia LTRs; Gypsy LTRs; DNA TEs; other (RC, satellite, low complexity, simple repeats); genome size (pg/1C). Axes: percentage (0 to 100) and genome size (pg/1C, 0 to 25).
Figure 3. Percentage of each TE type in the total repetitive fraction of representative core Asparagales taxa, arranged in order of increasing total genome size.
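The percentages plotted in Figure 3 could be derived from read-mapping counts along the following lines. This is a minimal sketch under stated assumptions, not the project's own script: the function name, the input dictionaries, and the example counts are hypothetical, and it assumes each annotated scaffold carries a single TE type.

```python
from collections import defaultdict


def te_type_percentages(mapped_reads, scaffold_type):
    """Percentage of repeat-mapped reads assigned to each TE type.

    mapped_reads: scaffold ID -> number of reads mapped to that scaffold.
    scaffold_type: scaffold ID -> TE type; unannotated scaffolds are absent.
    """
    totals = defaultdict(int)
    for scaffold, n_reads in mapped_reads.items():
        te_type = scaffold_type.get(scaffold)
        if te_type is not None:  # ignore scaffolds without a repeat annotation
            totals[te_type] += n_reads
    repeat_total = sum(totals.values())
    if repeat_total == 0:
        return {}
    return {t: 100.0 * n / repeat_total for t, n in totals.items()}


# Hypothetical counts; scaffold_4 has no annotation and is excluded.
mapped_reads = {"scaffold_1": 600, "scaffold_2": 300, "scaffold_3": 100, "scaffold_4": 50}
scaffold_type = {
    "scaffold_1": "Gypsy LTR",
    "scaffold_2": "Copia LTR",
    "scaffold_3": "Gypsy LTR",
}
percentages = te_type_percentages(mapped_reads, scaffold_type)
print(percentages)
```

Normalizing by the repeat-mapped total rather than by all reads matches the caption's "total repetitive fraction"; dividing by all reads instead would give genome-wide abundance estimates.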
Image credits: Naturehills.com; Erica Wheeler; ag.arizona.edu; wikicommons