Plant molecular genetics
• Plant genome
• Chromatin and DNA methylation
• RNA interference
• Genome of plastids and mitochondria
• Transposable elements
• Viruses
• Classical genetic mapping
• Transgenosis and reverse genetics
• Genomics, next generation sequencing
• Transcriptomics
• Proteomics
Components of plant genome
• nuclear genome = genome sensu stricto
• plastids - plastome
• mitochondria - chondriome
54 Mbp – Cardamine amara
124 852 Mbp – Fritillaria
149 000 Mbp – Paris japonica
- currently the largest known genome (not only among plants)
http://data.kew.org/cvalues/
Plant genome sizes
10 Mb Ostreococcus (single-celled alga)
54 Mb Cardamine amara
64 Mb Genlisea aurea
125 Mb Arabidopsis
500 Mb Oryza
5 000 Mb Hordeum
17 000 Mb Triticum
84 000 Mb Fritillaria (largest diploid)
143 000 Mb Paris (octoploid)
- Angiosperms – size differences up to almost 3 000 times
- Gymnosperms – genome sizes often around 10 000 Mb
- Gene number differences much lower (approx. 20 – 200 fold)
[Figure: two globes whose volumes differ 3000 times]
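The size range quoted above can be checked with a few lines of arithmetic. This is only an illustrative sketch: it uses the 54 Mb and 143 000 Mb values from the list, and the variable names are made up for the example. It also shows how much the radii of two globes differ when their volumes differ 3000 times.

```python
# Genome sizes taken from the list above (in Mb).
smallest_mb = 54        # Cardamine amara
largest_mb = 143_000    # Paris

# Ratio between the largest and smallest angiosperm genome in the list.
size_ratio = largest_mb / smallest_mb

# Volume scales with the cube of the radius, so a 3000-fold volume
# difference corresponds to the cube root of 3000 in radius.
radius_ratio = 3000 ** (1 / 3)

print(f"genome size ratio: about {size_ratio:.0f}x")
print(f"radius ratio of globes with 3000x volume difference: about {radius_ratio:.1f}x")
```

The ratio comes out somewhat below 3000, in line with "almost 3 000 times", while the globes differ only about 14-fold in radius.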
Plant genome sizes
What can we deduce?
- Genomes tend to increase in size during evolution
- The average increase is higher in monocots
C-value paradox
- there is no strong correlation between complexity of an organism and the size of its genome
• C-value = size of genome in non-replicated gamete
• genomes of related organisms often differ strongly in size
causes:
- duplications of whole genomes (polyploidization) or chromosome segments
- replication of invasive DNA (transposable elements)
- but reductions are also possible (recombination – diploid cotton sp.)
genome size (bp) = (0.910 × 10^9) × DNA content (pg)
DNA content (pg) = genome size (bp) / (0.910 × 10^9)
1 pg = approx. 910 Mbp; MW (1 bp) = approx. 660 Da
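The conversion above is easy to wrap in two helper functions. This is a minimal sketch using the slide's constant (0.910 × 10^9 bp per pg); the function names are hypothetical. The last lines check that the constant follows approximately from the 660 Da per base pair figure and Avogadro's number.

```python
BP_PER_PG = 0.910e9  # conversion factor given on the slide (bp per pg)

def pg_to_bp(dna_pg):
    """Convert DNA content in picograms to genome size in base pairs."""
    return dna_pg * BP_PER_PG

def bp_to_pg(genome_bp):
    """Convert genome size in base pairs to DNA content in picograms."""
    return genome_bp / BP_PER_PG

# Where the factor comes from: 1 pg = 1e-12 g, one bp weighs about 660 g/mol.
AVOGADRO = 6.02214076e23
derived_bp_per_pg = 1e-12 * AVOGADRO / 660

print(f"Arabidopsis, 125 Mbp = {bp_to_pg(125e6):.3f} pg")
print(f"derived factor: {derived_bp_per_pg:.3e} bp per pg")
```

The derived factor is close to, but not exactly, the rounded value used on the slide.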
Sequences in plant genomes
Unique sequences – genes, but also non-coding (!)
Repetitive:
• Duplications of chromosomal regions
• Medium repetitive DNA
– Tandem repeats of rRNA, tRNA and histone genes
– Gene families with multiple members
– Transposable elements – also highly repetitive
• Highly repetitive – low complexity DNA
– Tandemly arranged simple sequence repeats (SSR)
– Centromeres (180 bp repeat in Arabidopsis) and telomeres (TTTAGGG)n
Types of sequences in plant genomes
• Unique sequences – coding genes, but also non-coding regulatory (!)
• Medium repetitive DNA
– Tandem repeats of rRNA, tRNA and histone genes
– Gene families with multiple members
– Transposable elements – also highly repetitive
• Low complexity DNA (highly repetitive)
– Tandemly arranged simple sequence repeats (SSR)
– Centromeres (180 bp repeat in Arabidopsis) and telomeres (TTTAGGG)n
- some behave as satellite DNA
Aside – term definition: sequence complexity (~ the amount of information)
repetitive:
AAAAAAAAAAAAAAAAAAAAA – complexity 1 (21 × A)
ATCATCATCATCATCATCATC – complexity 3 (7 × ATC)
(what is the complexity if it is a coding sequence?)
unique:
ATCGTATCGCGATTTTAACGT – complexity 21 (1 × AT…)
- unique vs. repetitive – depends on the size of the evaluated frame (= size of the analyzed DNA fragments)
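The three examples above can be reproduced with a small function. This is a simplified sketch that treats complexity as the length of the shortest unit whose exact tandem repetition gives the whole sequence; the function name is hypothetical, and real genomic repeats are rarely this perfect.

```python
def sequence_complexity(seq):
    """Length of the shortest unit that, repeated in tandem, rebuilds seq exactly."""
    n = len(seq)
    for unit in range(1, n + 1):
        # Only unit lengths that divide the sequence length can tile it exactly.
        if n % unit == 0 and seq[:unit] * (n // unit) == seq:
            return unit

for s in ("AAAAAAAAAAAAAAAAAAAAA",
          "ATCATCATCATCATCATCATC",
          "ATCGTATCGCGATTTTAACGT"):
    print(s, sequence_complexity(s))
```

Running the function on shorter windows of the same sequences illustrates the point about the evaluated frame: the result depends on how much sequence is examined.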
Examples of repetitive DNA representation in soybean and Silene (clusters of related sequences)
Silene latifolia
Gypsy, copia = retrotransposon families
clDNA = chloroplast DNA (partly contamination, but also recent insertions)
Measuring genome complexity – reassociation kinetics
• DNA fragmented to 300 - 500 bp, denatured
• Monitoring of reassociation in time - separation (chromatographic) of ss and ds DNA
• Analysis of kinetics (Cot curves) shows the representation of various types of repetitive DNA
– rare sequences reassociate more slowly than repetitive ones
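The slides do not give the kinetic equation, so the following is only a sketch under the common textbook assumption of ideal second-order reassociation, where the single-stranded fraction is 1 / (1 + k · C0t). The rate constants are hypothetical values chosen to show that repetitive fractions reassociate at lower C0t than unique ones.

```python
def single_stranded_fraction(c0t, k):
    """Fraction of DNA still single-stranded at a given C0t (ideal second-order kinetics)."""
    return 1.0 / (1.0 + k * c0t)

def cot_half(k):
    """C0t value at which half of the DNA has reassociated."""
    return 1.0 / k

# Hypothetical rate constants: the more copies a sequence has, the larger k.
rate_constants = {
    "highly repetitive": 100.0,
    "middle repetitive": 1.0,
    "unique": 0.001,
}

for name, k in rate_constants.items():
    print(f"{name:18s} C0t(1/2) = {cot_half(k):g}")
```

A measured Cot curve of a whole genome is a mixture of such components, which is why it shows several steps.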
Eukaryotic genomes usually contain three fractions of sequences with different complexity:
- Low complexity = highly repetitive
- Middle repetitive
- Unique sequences = high complexity
[Figure panels: 180 bp repeat (A.th.), copia (A.th.), 45S rDNA (Crocus), tandem repeats dp5a1 (wheat)]
(Heslop-Harrison, Plant Cell 12:617, 2000)
Repetitive sequences can be easily detected in situ
FISH = fluorescence in situ hybridization (possible even with unique sequences)
Subtelomeric repeats in rye
(Heslop-Harrison, Plant Cell 12:617, 2000)
Telomeres in rye (TTTAGGG)n
Differences in small and large genome arrangements
large genomes: genes present in „gene-rich islands“ separated by long regions of repetitive DNA
Reconstruction of the gradual accumulation of transposable elements in the maize genome
In Panicum the presented region contains no transposable elements; in maize they make up 60 % of its size
Plant Genome Sequencing
http://genomevolution.org/wiki/index.php/Sequenced_plant_genomes
April 13 – less complete in gray
Large Genome Sequencing
- sequencing in parts (separated chromosomes)
- sequencing of non-methylated DNA (= transcriptionally active)
- sequencing of ESTs
Aside – term definition: Expressed Sequence Tags (ESTs)
- short sequenced regions of cDNA (300-600 nt)
- mostly gene segments (primarily from mRNA)
- alternative source of coding sequences for large genomes (rapid and inexpensive)
Weak points:
- highly redundant, incomplete (!)
- transcript levels vary: gene expression is regulated spatially, temporally, developmentally and environmentally
- regulatory sequences are not represented (promoters, introns, ...)
Preparation of EST library
mRNA
- RT with oligo(dT) primer → cDNA
- cleavage of RNA from the heteroduplex (RNase H)
- 2nd strand cDNA synthesis
- cleavage with restriction endonuclease
- adaptor ligation → cloning
- sequencing → Expressed Sequence Tags (ESTs)
Arabidopsis genome: 125 Mbp
[Figure: density of genes, ESTs and TEs along the five chromosomes; scale from high density to low density]
Feature                        Chr.1        Chr.2        Chr.3        Chr.4        Chr.5        SUM
Length (bp)                    29,105,111   19,646,945   23,172,617   17,549,867   25,935,409   115,409,949
  Top arm (bp)                 14,449,213   3,607,091    13,590,268   3,052,108    11,132,192
  Bottom arm (bp)              14,655,898   16,039,854   9,582,349    14,497,759   14,803,217
Base composition (%GC)
  Overall                      33.4         35.5         35.4         35.5         34.5
  Coding                       44.0         44.0         44.3         44.1         44.1
  Non-coding                   32.4         32.9         33.0         32.8         32.5
Number of genes                6,543        4,036        5,220        3,825        5,874        25,498
Gene density (kb per gene)     4.0          4.9          4.5          4.6          4.4
Average gene length (bp)       2,078        1,949        1,925        2,138        1,974
Average peptide length (aa)    446          421          424          448          429
Exons
  Number                       35,482       19,631       26,570       20,073       31,226       132,982
  Total length (bp)            8,772,559    5,100,288    6,654,507    5,150,883    7,571,013    33,249,250
  Average per gene             5.4          4.9          5.1          5.2          5.3
  Average size (bp)            247          259          250          256          242
Genes with ESTs (%)            60.8         56.9         59.8         61.4         61.4
Number of ESTs                 30,522       14,989       20,732       16,605       22,885       105,773
Genome of Arabidopsis - statistics
+ hundreds of MIR genes - role in regulation of gene expression
The majority of plant genes form gene families
• gene families are often in tandem arrangement, but also spread across the genome
• tandem repeats are composed of closely as well as distantly related paralogues (recombination)
• duplications of long chromosomal regions
Number of paralogues
Aside – term definitions:
• Homologous genes
genes with similar sequences derived from the same ancestral gene (quantification – sequence identity, similarity)
• Paralogous genes
genes with similar sequences derived from the same ancestral gene, present at different loci within the same genome
• Orthologous genes
genes in different species that are similar to each other because they originated from a common ancestral gene in a common ancestor (if several paralogues are present, the genes serving the same function are regarded as orthologues)
Orthologues vs. paralogues
Paralogous genes = genes duplicated within the species
[Diagram – orthologous genes: ancestral species with Gene A; species A and species B with Gene A’ and Gene A”]
[Diagram – paralogous genes: ancestral species with Gene A; species A and species B with Gene A’, Gene A” and Gene A’”]
Mechanisms of gene duplications (increase in paralogue number)
• tandem duplication
• transposition
• segmental duplications
• whole genome duplications
Differences in genes/gene families in genomes
Genes Gene families
Arabidopsis x Populus – large overlap, about 1.5 times more paralogues in poplar
(Arabidopsis + Populus) x Oryza – many genes specific for Monocots
Arabidopsis is an ancient tetraploid (as probably are the majority of plants)
Duplicated chromosomal regions form about 60 % of the genome (67.9 Mb)
Polyploidization significantly increases genome (and organism) plasticity and has played a very important role in plant (genome) evolution
About 30-80 % of plant species are polyploid
Dating of whole genome duplication according to the number of synonymous mutations per synonymous site - Ks
Ks = 3/2.66
Phe      Leu      Met      Val
UUU      CUA      AUG      GUU
UUC      UUG      AUG      GUU
0 0 1/3  1/3 0 1  0 0 0    0 0 1   = number of syn. sites (sum = 2.66)
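The worked example (three synonymous differences over 2.66 synonymous sites) can be recomputed in code. This is a rough sketch, not a full Ks estimator: it assumes the standard genetic code, counts synonymous sites on the first sequence only (as the numbers in the example do), averages over mutation paths for codons that differ at more than one position, and applies no multiple-hit correction. All function names are hypothetical.

```python
from itertools import permutations

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons written with T instead of U.
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def codons(seq):
    """Split an RNA or DNA string into codons (spaces ignored, U read as T)."""
    seq = seq.upper().replace("U", "T").replace(" ", "")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]

def syn_sites(codon):
    """Synonymous sites of one codon: each synonymous single-base change adds 1/3."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    total += 1 / 3
    return total

def syn_differences(c1, c2):
    """Synonymous differences between two codons, averaged over mutation orders."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    if not diff:
        return 0.0
    paths = list(permutations(diff))
    syn = 0
    for order in paths:
        cur = c1
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODE[nxt] == CODE[cur]:
                syn += 1
            cur = nxt
    return syn / len(paths)

seq1 = "UUU CUA AUG GUU"
seq2 = "UUC UUG AUG GUU"
S = sum(syn_sites(c) for c in codons(seq1))
Sd = sum(syn_differences(a, b) for a, b in zip(codons(seq1), codons(seq2)))
ks = Sd / S
print(f"synonymous sites = {S:.2f}, synonymous differences = {Sd:.0f}, Ks = {ks:.3f}")
```

Published methods usually average the site counts over both sequences and correct for multiple substitutions, so real Ks values will differ from this raw ratio.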
[Figure: distribution of Ks for paralogue pairs; axes: Ks, gene number (Fawcett et al. 2013)]
Comparisons of paralogue pairs
Peaks indicate genome duplications
Polyploidization in plant evolution
• 35 % of species are neopolyploids
• most species were repeatedly polyploid during evolution
• viable aneuploid variants (frequently after allopolyploidization – hexaploid wheat): stable wheat lines with a missing chromosomal arm (of a homeologous chromosome)
Blue dots – duplications, asterisks – triplications
K-T
(Fawcett et al. 2013)
Polyploidization – fusion of non-reduced gametes or endoreduplication
Spontaneous duplication (endoreduplication)
autopolyploidy: n = x = 4 × n = x = 4 → 2n = 4x = 16
allopolyploidy: n = x = 4 × n = x = 7 → 2n = 4x = 22
Both types occur at similar frequency among polyploid plant species
Chromosome doubling is necessary for meiosis in hybrids
species A × species B → sterile hybrid → genome duplication → fertile
Preferential pairing of homologous chromosomes
Related chromosomes from different species (homeologous) can also pair
Allopolyploid genomes in the genus Brassica
Species          Karyotype       Genome
Brassica rapa    2n = 2x = 20    A
B. nigra         2n = 2x = 16    B
B. oleracea      2n = 2x = 18    C
B. juncea        2n = 4x = 36    AB
B. napus         2n = 4x = 38    AC
B. carinata      2n = 4x = 34    BC
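The chromosome numbers in the table are additive: each allotetraploid carries the full diploid sets of both parents. A short check, using only the values from the table (the dictionary layout is just an illustrative choice):

```python
# Diploid parents: genome letter -> (species, 2n)
diploids = {
    "A": ("Brassica rapa", 20),
    "B": ("B. nigra", 16),
    "C": ("B. oleracea", 18),
}

# Allotetraploids: genome formula -> (species, 2n from the table)
allotetraploids = {
    "AB": ("B. juncea", 36),
    "AC": ("B. napus", 38),
    "BC": ("B. carinata", 34),
}

results = {}
for formula, (species, observed) in allotetraploids.items():
    # Expected 2n is the sum of the 2n values of the two parental genomes.
    expected = sum(diploids[letter][1] for letter in formula)
    results[species] = (expected, observed)
    print(f"{species}: expected 2n = {expected}, table 2n = {observed}")
```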
Ancient interspecies hybrids
[Diagram: Brassica nigra (BB), Brassica rapa (AA) and Brassica oleracea (CC) with their hybrids Brassica carinata (BBCC), Brassica juncea (AABB) and Brassica napus (AACC)]
Fates of duplicated genes differ (gene dosage balance theory)
• genes encoding interacting proteins, „connected genes“ (signal pathways, complex subunits, …), are easily preserved in the genome after duplication
- loss or partial duplication of one component results in gene imbalance, decreasing fitness
- a whole duplicated complex can be specialized for a new function and increase organism complexity
- the secondary function was probably already present in the ancestral complex (pathway), but only duplication allowed adaptive evolution for both functions without selection constraints – Escape from Adaptive Conflict (EAC) model
• other „single genes“ are more easily lost after genome duplication, but can be preserved after individual duplication
- most duplicated genes are lost after whole genome duplication
- loss is not even (↑) in both copies – probably frequent epigenetic marks in one copy (methylation) – preferential gene loss and mutagenesis of the methylated copy
- gene conversion and homogenization can occur (!)
de novo allopolyploids (~ rapeseed) – recombination preferentially between homeologous chromosomes, without preference for either parental genome (homeologous = homologous, within one genome, but originating from different parents)
Changes in a newly formed allopolyploid genome:
- DNA methylation changes
- losses of parts of chromosomes or whole chromosomes (aneuploidy – decreased fertility)
- frequent activation of TEs
- expression of homeologous genes is usually not additive
- transcriptome usually more reduced than genome
- different regulation of expression – often organ-specific expression of genes from each parent, new sites of expression, new regulation
- „divergent resolution“ – speciation (different gene loss in individuals – lethality in F2; absence of an essential gene = reproduction barrier)
Plants can also survive with a haploid genome!
- reprogramming of male or female gametophyte development in vitro – no gamete formation, but development resembling embryogenesis
- usually from immature microspores = androgenesis
- from the female gametophyte = gynogenesis
- haploid plants are sterile
- through endoreduplication (colchicine-induced or spontaneous) – completely homozygous plants – dihaploids
Androgenesis in rapeseed (pollen embryogenesis)
... But genomes are still similar
Colinearity, synteny
Paterson et al., Plant Cell 12: 1523-1539, 2000
„Synteny“ is usually misused to describe colinearity
Synteny = orthologous loci in two species on the same chromosome
[Diagram: ancestral species with loci ABC; species A with loci A’, B’, C’; species B with loci A”, B”, C”]
Colinearity = group of loci in two species on a chromosome in the same order
[Diagram: ancestral species with loci ABC; species A with loci A’, B’, C’; species B with loci A”, B”, C”]
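The difference between the two definitions can be made concrete with a small sketch. The assumptions here are not from the slides: orthologous loci share a name in both species, a genome is a mapping of chromosome name to an ordered list of loci, and an inverted order is still counted as colinear. The species and chromosome names are hypothetical.

```python
def locus_positions(genome):
    """Map each locus to (chromosome, index) for a genome given as {chromosome: [loci]}."""
    return {locus: (chrom, i)
            for chrom, loci in genome.items()
            for i, locus in enumerate(loci)}

def syntenic(loci, genome_a, genome_b):
    """True if the loci lie on a single chromosome in each of the two species."""
    pos_a, pos_b = locus_positions(genome_a), locus_positions(genome_b)
    return (len({pos_a[l][0] for l in loci}) == 1 and
            len({pos_b[l][0] for l in loci}) == 1)

def colinear(loci, genome_a, genome_b):
    """True if the loci are syntenic and in the same (or fully inverted) order."""
    if not syntenic(loci, genome_a, genome_b):
        return False
    pos_a, pos_b = locus_positions(genome_a), locus_positions(genome_b)
    order_a = sorted(loci, key=lambda l: pos_a[l][1])
    order_b = sorted(loci, key=lambda l: pos_b[l][1])
    # Assumption: a whole-block inversion is still treated as colinear.
    return order_a == order_b or order_a == order_b[::-1]

loci = ["A", "B", "C"]
species_1 = {"chr1": ["A", "B", "C"]}
species_2 = {"chr1": ["A", "C", "B"]}           # same chromosome, order changed
species_3 = {"chr1": ["A", "B"], "chr2": ["C"]}  # loci split between chromosomes

print(syntenic(loci, species_1, species_2), colinear(loci, species_1, species_2))
print(syntenic(loci, species_1, species_3), colinear(loci, species_1, species_3))
```

In this sketch colinearity implies synteny, but not the other way round, which is the distinction the definitions above draw.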