hertweck asparagales 2013


Upload: kate-hertweck

Post on 17-Jul-2015


Transposable element proliferation and genome size evolution in Asparagales

Kate L. Hertweck

National Evolutionary Synthesis Center (NESCent), Durham, NC, USA
[email protected], Twitter: @k8hert, http://k8hert.blogspot.com, http://www.slideshare.net/katehertweck

Introduction

Asparagales as a model system
● 14 families, 1122 genera, ~26000 species; diverged over 100 mya (Stevens, 2001 onwards)
● Many edible and ornamental species
● Variation in karyotype and genome size (Pires et al., 2006; Figure 1)
● Paucity of genomic resources, especially for TEs (see below)
● “Core” Asparagales represents a monophyletic lineage of three closely related families

Transposable elements (TEs)
● Mobile genetic elements able to replicate and move throughout a genome
● Represent at least 50% of the DNA in many eukaryotic genomes
● Have both fine- and coarse-scale implications for genomic and organismal evolution
● Contribute to increases in genome size independent of, but sometimes in conjunction with, polyploidy and other types of sequence duplication (Fedoroff, 2012)

Research objectives
● Assemble consensus sequences of the most abundant (recently proliferated) TEs in Asparagales genomes
● Estimate the relative abundance of each type of TE

Results, conclusions, future directions
● Assembly of all classes of TEs is possible from GSS
● Most scaffolds are partial sequences, although full-length TEs occur
● Proportion of different TEs varies independently of genome size and phylogeny

● What variation is there among families of each TE type?
● Are there unique TE families in Asparagales?
● What is the sequence variation of reads mapping to these scaffolds?
● Are there correlations between TE presence/abundance and life history traits?

Methods

Sequencing
● Genome survey sequences (GSS): anonymous, low-coverage sequencing from total genomic DNA
● Illumina GAIIx, single-end, 80 bp reads (Steele et al., 2012)
● Proof-of-concept and quality control with six Poaceae taxa, the monocot genomic model system (data not shown)

Bioinformatics
● De novo genome assembly, TE annotation, scaffold filtering, read mapping (Figure 2)
● Custom scripts available at http://github.com/k8hertweck/AsparagalesTEscripts

Acknowledgements

I acknowledge the National Science Foundation for funding (DEB 0829849 and DEB 1146603), as well as collaborators on the Monocot AToL project.

References

APGIII. 2009. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III. Botanical Journal Of The Linnean Society 161: 105-121.

Bennett, M. D., and I. J. Leitch. 2010. Angiosperm DNA C-values database. http://www.kew.org/cvalues.

Fedoroff, N. V. 2012. Transposable Elements, Epigenetics, and Genome Evolution. Science 338: 758-767.

Pires, J. C., I. J. Maureira, T. J. Givnish, K. J. Sytsma, O. Seberg, G. Petersen, J. I. Davis, et al. 2006. Phylogeny, genome size, and chromosome evolution of Asparagales. Aliso 22: 285-302.

Steele, P. R., K. L. Hertweck, D. Mayfield, M. R. McKain, J. Leebens-Mack, and J. C. Pires. 2012. Quality and quantity of data recovered from massively parallel sequencing: Examples in Asparagales and Poaceae. American Journal Of Botany 99: 330-348.

Stevens, P. F. 2001 onwards. Angiosperm Phylogeny Website http://www.mobot.org/MOBOT/research/APweb/ [accessed Jan 2013].

Figure 1. Phylogeny of subfamilies in core Asparagales based on all plastome genes (Steele et al., 2012). Classification based on APGIII (2009). Subfamilies in green were included in sampling for the present study. Species estimates for each subfamily are from the Angiosperm Phylogeny Website (Stevens, 2001 onwards). Chromosome number and genome size ranges were obtained from the Plant DNA C-values Database (Bennett and Leitch, 2010). Asterisks (*) indicate subfamilies containing taxa with bimodal karyotypes.

Subfamily         Taxon          Genome size (pg/1C)  Average genome coverage  MSR scaffolds  % organellar scaffolds  % repeat scaffolds
Asphodeloideae    Haworthia      15.2                 0.02X                    1360           2.6                     29.1
Agapanthoideae    Agapanthus     10.5                 0.01X                    438            7.8                     34.9
Allioideae        Allium         13.2                 0.03X                    1858           7.6                     23.9
Amaryllidoideae   Scadoxus       22.1                 0.02X                    1336           4.1                     30.0
Lomandroideae     Lomandra       1.15                 0.33X                    1491           7.6                     29.2
Asparagoideae     Asparagus      1.36                 0.30X                    1977           2.6                     26.5
Nolinoideae       Sansevieria    1.25                 0.32X                    835            6.9                     26.7
Aphyllanthoideae  Aphyllanthes   0.65                 0.34X                    436            15.3                    38.0
Agavoideae        Hosta          19.6                 N/A                      1084           6.1                     34.5
Scilloideae       Ledebouria     8.85                 0.04X                    2481           4.4                     24.8
Brodiaeoideae     Dichelostemma  9.35                 0.03X                    1706           1.5                     27.7
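As a rough illustration of how the relationship between genome size and repeat content could be checked, the sketch below computes a Pearson correlation between the genome size and % repeat scaffolds columns of Table 1. This is not the analysis used on the poster: it uses % repeat scaffolds rather than the proportions of individual TE types, it ignores phylogenetic non-independence among taxa, and the function name `pearson_r` is introduced here only for illustration.

```python
from math import sqrt

# (genome size in pg/1C, % repeat scaffolds), copied from Table 1
TABLE_1 = {
    "Haworthia": (15.2, 29.1),
    "Agapanthus": (10.5, 34.9),
    "Allium": (13.2, 23.9),
    "Scadoxus": (22.1, 30.0),
    "Lomandra": (1.15, 29.2),
    "Asparagus": (1.36, 26.5),
    "Sansevieria": (1.25, 26.7),
    "Aphyllanthes": (0.65, 38.0),
    "Hosta": (19.6, 34.5),
    "Ledebouria": (8.85, 24.8),
    "Dichelostemma": (9.35, 27.7),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


genome_sizes = [size for size, _ in TABLE_1.values()]
repeat_pct = [pct for _, pct in TABLE_1.values()]
r = pearson_r(genome_sizes, repeat_pct)
print(f"Pearson r (genome size vs. % repeat scaffolds) = {r:.2f}")
```

A coefficient near zero would be consistent with repeat content varying independently of genome size in this sample, although eleven taxa are too few for a strong test.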

Subfamily            # of species  Chromosome # (taxa sampled)  Genome size (pg/1C) (taxa sampled)
Xeronemataceae       2             34 (1)                       3.28 (1)
Asphodeloideae*      785           12-78 (128)                  5.25-38.3 (139)
Hemerocallidoideae   85            32 (1)                       0.76 (1)
Xanthorrhoeoideae*   30            22 (1)                       1.04 (1)
Agapanthoideae       9             30 (7)                       11.23-23.78 (9)
Allioideae           795           10-66 (153)                  7.6-74.5 (162)
Amaryllidoideae      800           10-72 (93)                   6.15-82.15 (112)
Lomandroideae        178           8-32 (6)                     1.25-25.3 (8)
Asparagoideae        165-295       20-112 (3)                   1.28-4.18 (3)
Nolinoideae*         475           30-108 (11)                  0.93-53.5 (33)
Aphyllanthoideae     1             N/A                          0.65 (1)
Agavoideae*          637           16-180 (56)                  2.55-19.6 (98)
Scilloideae          770-1000      6-54 (75)                    2.6-75.9 (109)
Brodiaeoideae        62            4 (1)                        10.65-18.15 (3)

[Figure 1 clade labels: outgroup; Xanthorrhoeaceae; Amaryllidaceae; Asparagaceae]

Raw fastq files from low-coverage, anonymous sequencing of total genomic DNA

De novo genome assembly (MSR-CA, http://www.genome.umd.edu/SR_CA_MANUAL.htm)

Filter out scaffolds that BLAST to reference organellar genomes

Map raw reads back to scaffolds to estimate the relative proportion of each TE

Run RepeatMasker to identify similarity to known repeats (3110 repeats, 98.7% of which are from grasses)

Discard unknown scaffolds and “unimportant” repeats; categorize others by type

Figure 2. Diagram of the bioinformatics pipeline to assemble and annotate TEs from Illumina GSS data in this study.
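The organellar filtering step in Figure 2 could be sketched as follows, assuming the BLAST results are saved in the standard 12-column tabular format (`-outfmt 6`), where column 1 is the query ID and column 11 is the e-value. The function names and the e-value cutoff of 1e-10 are assumptions made for illustration; the poster does not state the thresholds used, and the author's own scripts are at the GitHub address listed under Methods.

```python
import csv
import io


def organellar_scaffolds(blast_tsv, max_evalue=1e-10):
    """Return IDs of query scaffolds with at least one hit at or below max_evalue.

    blast_tsv is an open file (or any iterable of lines) in BLAST tabular
    format with the default 12 columns. The cutoff is a hypothetical choice.
    """
    hits = set()
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, evalue = row[0], float(row[10])
        if evalue <= max_evalue:
            hits.add(qseqid)
    return hits


def filter_scaffolds(scaffold_ids, organellar):
    """Keep only scaffolds that were not flagged as organellar."""
    return [s for s in scaffold_ids if s not in organellar]


# Made-up BLAST lines: scaf1 has a strong plastid hit, scaf3 only a weak hit.
example = io.StringIO(
    "scaf1\tchloroplast_ref\t99.1\t500\t4\t0\t1\t500\t1000\t1499\t1e-150\t900\n"
    "scaf3\tmito_ref\t80.0\t40\t8\t0\t1\t40\t10\t49\t0.5\t30\n"
)
organellar = organellar_scaffolds(example)
kept = filter_scaffolds(["scaf1", "scaf2", "scaf3"], organellar)
print(kept)
```

Collecting flagged IDs into a set first keeps the filter independent of how many hits each scaffold has.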

Table 1. Results of TE assembly and annotation from Asparagales taxa following the bioinformatics methods in Figure 2. Genome size data for the samples sequenced are described in Steele et al. (2012). Average genome coverage was calculated from 1C genome size, read length, and number of reads for each sample; coverage data for Hosta are unavailable because sequencing was performed on DNA enriched for plastome. Percentages represent proportion of reads belonging to organelles and annotated repeats from total number of MSR scaffolds.
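The coverage calculation described in the Table 1 caption can be sketched as total sequenced bases divided by genome length. The sketch assumes the widely used conversion of 1 pg of DNA to roughly 0.978 × 10^9 bp; the read count in the example is hypothetical, since per-sample read numbers are not given here, and the function name is introduced only for illustration.

```python
BP_PER_PG = 0.978e9  # assumed conversion: 1 pg of DNA is about 978 Mbp


def genome_coverage(n_reads, read_length_bp, genome_size_pg):
    """Average coverage = total sequenced bases / 1C genome length in bp."""
    genome_bp = genome_size_pg * BP_PER_PG
    return n_reads * read_length_bp / genome_bp


# Hypothetical example: 4 million single-end 80 bp reads for a 1.15 pg genome
# (the genome size matches Lomandra in Table 1; the read count is made up).
cov = genome_coverage(n_reads=4_000_000, read_length_bp=80, genome_size_pg=1.15)
print(f"{cov:.2f}X")
```

With a fixed number of 80 bp reads, coverage falls in proportion to genome size, which is why the large-genome taxa in Table 1 have coverage near 0.02X while the small-genome taxa are near 0.3X.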

[Figure 3 plot. Legend: LINEs; Copia LTRs; Gypsy LTRs; DNA TEs; other (RC, satellite, low complexity, simple repeats); genome size (pg/1C). Axes: percentage (0 to 100); genome size (pg/1C, 0 to 25).]

Figure 3. Percentage of different TE types of total repetitive fraction of representative core Asparagales taxa, arranged in order of increasing total genome size.
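One way the percentages in Figure 3 could be derived from read mapping is sketched below: reads mapped to scaffolds of each TE type are summed, expressed as a percentage of all reads mapped to repeats, and the taxa are then ordered by genome size. The taxon names, genome sizes, and read counts are hypothetical placeholders, not data from this study.

```python
def te_percentages(read_counts):
    """Convert mapped-read counts per TE type into percentages of the total."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads mapped to repeats")
    return {te_type: 100.0 * n / total for te_type, n in read_counts.items()}


# Hypothetical mapped-read counts per TE type for two made-up taxa.
samples = {
    "taxon_B": {
        "genome_size_pg": 12.0,
        "reads": {"LINEs": 200, "Copia LTRs": 900, "Gypsy LTRs": 700,
                  "DNA TEs": 100, "other": 100},
    },
    "taxon_A": {
        "genome_size_pg": 1.2,
        "reads": {"LINEs": 50, "Copia LTRs": 300, "Gypsy LTRs": 500,
                  "DNA TEs": 100, "other": 50},
    },
}

# Arrange taxa in order of increasing genome size, as in Figure 3.
ordered = sorted(samples.items(), key=lambda item: item[1]["genome_size_pg"])
for name, sample in ordered:
    pct = te_percentages(sample["reads"])
    summary = ", ".join(f"{te}: {value:.1f}%" for te, value in pct.items())
    print(f"{name} ({sample['genome_size_pg']} pg/1C): {summary}")
```

Because each taxon is normalized by its own repetitive fraction, the values compare the composition of the repeats rather than their absolute abundance in the genome.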

Photo credits: Naturehills.com; Erica Wheeler; ag.arizona.edu; wikicommons