Transposable elements of Agavoideae

Kate L Hertweck (@k8hert), The University of Texas at Tyler
Alexandros Bousios, University of Sussex
Michael McKain, Donald Danforth Plant Science Center

Upload: kate-hertweck

Post on 15-Aug-2015



Why Agavoideae? (besides the obvious)

● Asparagaceae subfamily Agavoideae: 23 genera, 637 species
● agave, yucca, Joshua tree
● Economically important:
  ● tequila, food starches
  ● biofuels
  ● ornamentals
● Interesting morphological, ecological, and life history traits
● Recent diversification correlated with ecological traits (Good-Avila, 2006)


Hertweck et al., TEs in Agavoideae


Agavoideae genomics

● Emerging genomic/transcriptomic resources
● Polyploidy, bimodality (McKain et al., 2012)

● Variation in TEs (Bousios et al., 2007) and genome size (Zonneveld, 2003)

Darlington 1963


Guadelupe et al., 2008

Transposable elements as a model system

● TEs, mobile genetic elements, or jumping genes
● Parasitic, self-replicating, move independently in the genome
● Many different types; some similar to or derived from viruses

Class I: Retrotransposons (copy and paste)
● LTR (Gypsy, Copia/Sireviruses, Caulimoviruses)
● LINE
● SINE

Class II: DNA transposons (cut and paste)
● TIR (EnSpm, hAT, MuDR, TcMar, PIF)
● MITE
● Helitron


Our goals

● TE proliferation is associated with modifications across the genome, including changes to gene expression and genome size
● TE composition/abundance may interact with organismal changes, like hybridization, polyploidy, phenotype, life history

● Mine existing genomic resources across Agavoideae to characterize repetitive elements
● Estimate abundance and diversity of transposable elements (TEs)
● Cross-validate results from different methods

The big questions:
● Is transposon composition in Agavoideae genomes related to hypothesized patterns of genomic evolution?
● Do transposon proliferation and other genomic traits correlate with life history traits in Agavoideae?

Agavoideae includes substantial diversity (even by Asparagales standards)

[Figure: chart for Aphyllanthes, Lomandra, Sansevieria, Asparagus, Ledebouria, Dichelostemma, Agapanthus, Allium, Haworthia, Hosta, and Scadoxus. Axes: percentage of sequence reads from nuclear genome (0-70%) and genome size (Mb/1C, 0-25000). Legend: unknown contigs, known repeats. Source: Hertweck, 2013, Genome.]

● Genomes are difficult to assemble
● Genome size varies

Repeat characterization methods

Genome survey sequences
● most from MonAToL project (Illumina SE, 30-100 bp)
● quality control of fastq files with PRINSEQ
● assembled with MaSuRCA v2.3.2 or RepARK v1.3.0
● organellar sequences filtered with BLAST
● 0.02-0.38x coverage
● 12 taxa, only 8 with sufficient contigs to analyze
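Coverage values like those listed above come from simple arithmetic: total sequenced bases divided by the haploid (1C) genome size. The following is a minimal Python sketch of that calculation. The function name `sequencing_coverage`, the read count, and the genome size are hypothetical and chosen only for illustration; none of them come from the talk or from REpipe.

```python
def sequencing_coverage(n_reads: int, read_length_bp: int, genome_size_bp: float) -> float:
    """Average depth: total sequenced bases divided by haploid genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return n_reads * read_length_bp / genome_size_bp


if __name__ == "__main__":
    # Hypothetical values: 2 million single-end reads of 100 bp
    # from a genome of 2,000 Mb (1C).
    depth = sequencing_coverage(2_000_000, 100, 2_000 * 1e6)
    print(f"{depth:.2f}x coverage")
```

At depths this far below 1x, most single-copy sequence is never sampled, which is why only highly repeated sequence can be expected to assemble.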

Scripts available: github.com/k8hertweck/REpipe


Nuclear contigs

● assembled contigs are consensus of most abundant TEs in the genome
● TEs must exist in high copy to have sufficient reads for detection (assembly)
● the older a TE insertion, the more likely it has accumulated mutations which will inhibit detection
● data presented as percentage of TE type in nuclear genome (relative abundance)
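The link between copy number and detectability can be illustrated with a rough back-of-the-envelope model. The sketch below assumes that every copy of a TE family is full length and close enough to the consensus to contribute reads, which older, diverged insertions would not be. The function names and example numbers are hypothetical.

```python
import math


def expected_family_depth(genome_coverage: float, copy_number: int) -> float:
    """Expected read depth over a repeat family's consensus sequence.

    Simplifying assumption: all copies are full length and near-identical,
    so depth scales linearly with copy number.
    """
    return genome_coverage * copy_number


def min_copies_for_depth(genome_coverage: float, target_depth: float) -> int:
    """Smallest copy number expected to reach a target consensus depth."""
    if genome_coverage <= 0:
        raise ValueError("genome coverage must be positive")
    return math.ceil(target_depth / genome_coverage)


if __name__ == "__main__":
    # 0.25x is a hypothetical value inside the 0.02-0.38x range of the survey data.
    for copies in (1, 100, 10_000):
        depth = expected_family_depth(0.25, copies)
        print(f"{copies:>6} copies: about {depth:g}x over the consensus")
```

Under this model a single-copy sequence stays far below assembly depth, while a family with thousands of similar copies is sampled deeply enough to yield a consensus contig.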


Transcriptomes
● various sources, tissues, coverage, assembly methods
● downloaded assemblies (no other filtering)

Nuclear contigs
● contigs represent actively transcribed TEs, which may or may not relate to abundance in the genome
● even relatively rare TEs may be detectable
● data presented as percentage of transcripts (relative expressed diversity)


RepeatMasker
● Liliopsida library (mostly references from grasses)
● searches many types of TEs, including parts without genes
● some ambiguous results (same contig, multiple types of TE)
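One possible way to handle a contig annotated with several TE types is a majority rule on annotated bases. The sketch below is an illustration only and is not the rule used in the talk or in REpipe. The input is a hypothetical list of `(contig, class, bases)` tuples, and the 0.8 threshold is an arbitrary assumption. Class labels follow the `class/family` style that RepeatMasker reports.

```python
from collections import defaultdict


def assign_contig_class(hits, min_fraction=0.8):
    """Assign one TE class per contig, or 'ambiguous' if no class dominates.

    hits: iterable of (contig_id, te_class, annotated_bp) tuples (hypothetical format).
    min_fraction: share of annotated bases the top class must cover (assumed value).
    """
    per_contig = defaultdict(lambda: defaultdict(int))
    for contig, te_class, bp in hits:
        per_contig[contig][te_class] += bp

    calls = {}
    for contig, by_class in per_contig.items():
        total = sum(by_class.values())
        best_class, best_bp = max(by_class.items(), key=lambda kv: kv[1])
        if total > 0 and best_bp / total >= min_fraction:
            calls[contig] = best_class
        else:
            calls[contig] = "ambiguous"
    return calls


if __name__ == "__main__":
    toy_hits = [
        ("contig1", "LTR/Gypsy", 400),
        ("contig1", "LTR/Copia", 50),
        ("contig2", "LTR/Gypsy", 100),
        ("contig2", "DNA/hAT", 100),
    ]
    for contig, call in assign_contig_class(toy_hits).items():
        print(contig, call)
```

Keeping an explicit "ambiguous" bin, rather than forcing a call, avoids inflating any one class with contigs that may be nested insertions or chimeric assemblies.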

Domain searching
● rpstblastn against protein domain models (CDD) for TE-specific genes
● clustering with CD-HIT-EST

[Workflow diagram labels: repeat contigs, unknown contigs, read mapping]


Detectable repeats vary across species


Repeat abundance
● percentage of total reads
● repeat annotations from RepeatMasker
● most reads map to unannotated contigs (or remain unmapped)

Repeat diversity
● percentage of nuclear contigs
● annotations from RepeatMasker
● most contigs are LTRs
● transcriptomes represent broader variation in diverse TEs (because of the overall number of contigs)
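The two measures above use different denominators: abundance is a share of reads, diversity is a share of contigs. The sketch below shows the distinction on toy data. The dictionary layout, function names, and all numbers are hypothetical and are not results from the talk.

```python
from collections import Counter


def repeat_abundance(contigs, total_reads):
    """Percentage of all reads that map to contigs of each class.

    contigs: {contig_id: (te_class, mapped_reads)} (hypothetical layout).
    total_reads: every read in the library, including unmapped reads.
    """
    reads = Counter()
    for te_class, mapped in contigs.values():
        reads[te_class] += mapped
    return {te_class: 100 * n / total_reads for te_class, n in reads.items()}


def repeat_diversity(contigs):
    """Percentage of contigs that belong to each class."""
    counts = Counter(te_class for te_class, _ in contigs.values())
    return {te_class: 100 * k / len(contigs) for te_class, k in counts.items()}


if __name__ == "__main__":
    toy = {
        "c1": ("LTR", 600),
        "c2": ("LTR", 200),
        "c3": ("DNA", 100),
        "c4": ("unannotated", 1100),
    }
    print("abundance (% of reads):", repeat_abundance(toy, total_reads=4000))
    print("diversity (% of contigs):", repeat_diversity(toy))
```

In this toy case LTR contigs make up half of the contigs but only a fifth of the reads, so the two percentages should not be compared directly.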

[Figure panels: GSS, transcriptome]

Sampled taxa possess same diversity of DNA TE families, but at different abundance


GSS data
● percentage of nuclear genome
● annotations from RepeatMasker
● most taxa have a single family present in high abundance
● may reflect karyotype

Transcriptome data
● percentage of contigs
● annotations from RepeatMasker
● all families present (active?) in all taxa
● minor variation in family-level diversity for some taxa
● not incongruent with GSS data

Patterns of LTR abundance rely on annotation method


● Gypsy more abundant in most genomes, although proportions vary
● no relationship between LTR abundance and genome size
● including CDD annotations can double LTR abundance in some genomes
● Proportion of Copia:Gypsy remains the same for some taxa (Schoenolirion), but changes for others (Hosta)
● LTR diversity (numbers of contigs) shows similar patterns
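The effect of adding a second annotation source can be illustrated with a small example. The sketch below merges two hypothetical annotation tables and recomputes Copia and Gypsy read totals. Giving RepeatMasker calls precedence over domain-based (CDD) calls is an assumption made here for illustration, not a rule stated in the talk, and all contig names and read counts are invented.

```python
def merge_annotations(repeatmasker, cdd):
    """Combine two {contig_id: superfamily} tables.

    Assumption: where both sources annotate a contig, the RepeatMasker call wins.
    """
    merged = dict(cdd)
    merged.update(repeatmasker)
    return merged


def ltr_read_totals(annotations, mapped_reads):
    """Sum mapped reads for Copia and Gypsy contigs."""
    totals = {"Copia": 0, "Gypsy": 0}
    for contig, superfamily in annotations.items():
        if superfamily in totals:
            totals[superfamily] += mapped_reads.get(contig, 0)
    return totals


if __name__ == "__main__":
    rm_calls = {"c1": "Gypsy", "c2": "Copia"}
    cdd_calls = {"c2": "Gypsy", "c3": "Gypsy", "c4": "Copia"}
    reads = {"c1": 300, "c2": 100, "c3": 300, "c4": 100}

    before = ltr_read_totals(rm_calls, reads)
    after = ltr_read_totals(merge_annotations(rm_calls, cdd_calls), reads)
    print("RepeatMasker only:", before)
    print("RepeatMasker + CDD:", after)
```

In this toy case the LTR read total doubles while the Copia:Gypsy ratio is unchanged, which corresponds to the first of the two outcomes described above; different toy inputs would shift the ratio instead.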

tetraploid, largest (known) genome in dataset


Conclusions

● Mine existing genomic resources across Agavoideae to characterize repetitive elements
● Methods matter; bias is not evenly distributed and patterns are difficult to discern
● Low proportion of GSS data assemble for Agavoideae
  ● large numbers of ancestral (inactive) insertions, related to whole genome duplication event?
  ● low-level diversity in abundant TEs just different enough from available libraries to remain undetectable
● DNA transposon dominance may differ among clades
● Gypsy more abundant in most genomes


Future work

● Future work:
  ● improve annotations (build custom repeat libraries) and analyze TE subtaxonomy
  ● improve quantification of repeats (P-clouds, RepeatExplorer)
  ● validate results using multiple sequencing attempts/data types
● Big questions:
  ● Is transposon composition in Agavoideae genomes related to hypothesized patterns of genomic evolution?
  ● Do transposon proliferation and other genomic traits correlate with life history traits in Agavoideae?

Acknowledgements

MonAToL

Texas Advanced Computing Center (TACC)

National Evolutionary Synthesis Center (NESCent, Duke U)

Research: https://sites.google.com/site/k8hertweck
Blog: k8hert.blogspot.com
Twitter: @k8hert
Google+: [email protected]