Plant molecular genetics: Plant genome, Chromatin and DNA methylation, RNA interference, Genome of plastids and mitochondria, Transposable elements, Viruses, Classical genetic mapping, Transgenosis and reverse genetics, Genomics, next generation sequencing, Transcriptomics, Proteomics

Upload: tracy-owen

Post on 27-Dec-2015


Plant molecular genetics
• Plant genome
• Chromatin and DNA methylation
• RNA interference
• Genome of plastids and mitochondria
• Transposable elements
• Viruses
• Classical genetic mapping
• Transgenosis and reverse genetics
• Genomics, next generation sequencing
• Transcriptomics
• Proteomics

Components of plant genome

• nuclear genome = genome sensu stricto

• plastids - plastome

• mitochondria - chondriome

• 54 Mbp - Cardamine amara
• 124 852 Mbp - Fritillaria
• 149 000 Mbp - Paris japonica - currently the largest (not only plant)

http://data.kew.org/cvalues/

Plant genome sizes

10 Mb Ostreococcus (single-cell alga)

54 Mb Cardamine amara

64 Mb Genlisea aurea

125 Mb Arabidopsis

500 Mb Oryza

5 000 Mb Hordeum

17 000 Mb Triticum

84 000 Mb Fritillaria (largest diploid)

143 000 Mb Paris (octoploid)

- Angiosperms – size differences up to almost 3 000 times

- Gymnosperms – genome sizes often around 10 000 Mb

- Gene number differences much lower (approx. 20 – 200 fold)

Ratio of globe volumes differing 3000 times
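The two figures on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: it takes the smallest and largest angiosperm genome sizes quoted earlier (54 Mbp and 149 000 Mbp) and computes the fold difference, then the ratio of radii of two globes whose volumes differ 3000 times. The variable names are chosen here and are not part of the slides.

```python
# Genome sizes in Mbp, as quoted on the slides.
smallest_angiosperm = 54        # Cardamine amara
largest_angiosperm = 149_000    # Paris japonica

# Fold difference between the largest and the smallest genome.
fold = largest_angiosperm / smallest_angiosperm

# Volume scales with the cube of the radius, so a 3000-fold volume
# ratio corresponds to a radius ratio of 3000 ** (1/3).
radius_ratio = 3000 ** (1 / 3)

print(f"fold difference: {fold:.0f}")
print(f"radius ratio for a 3000-fold volume ratio: {radius_ratio:.1f}")
```

So a 3000-fold difference in volume corresponds to globes whose radii differ only about 14-fold.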

Plant genome sizes

What can we deduce?

- Genomes are increasing in evolution
- Average increase is higher in Monocots

C-value paradox

- there is no strong correlation between complexity of an organism and the size of its genome

• C-value = size of genome in non-replicated gamete

• genomes of related organisms often differ strongly in size

causes:
- duplications of whole genomes (polyploidization) or chromosome segments
- replication of invasive DNA (transposable elements)
- but reductions are also possible (recombination – diploid cotton sp.)

genome size (bp) = (0.910 × 10^9) × DNA content (pg)

DNA content (pg) = genome size (bp) / (0.910 × 10^9)

1 pg = approx. 910 Mbp; MW (1 bp) = approx. 660 Da
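The conversion between DNA content and genome size can be sketched in a few lines. This is a minimal illustration using the factor given on the slide (0.910 × 10^9 bp per pg); the function names are hypothetical. The last lines show that the factor follows approximately from the 660 Da per base pair quoted above and the Avogadro constant.

```python
AVOGADRO = 6.022e23    # molecules per mole
BP_MASS_DA = 660.0     # approximate mass of one base pair (value from the slide)
BP_PER_PG = 0.910e9    # conversion factor from the slide

def pg_to_bp(pg: float) -> float:
    """Genome size in bp from DNA content in pg."""
    return pg * BP_PER_PG

def bp_to_pg(bp: float) -> float:
    """DNA content in pg from genome size in bp."""
    return bp / BP_PER_PG

# Where the factor comes from: 1 pg = 1e-12 g, and one mole of base pairs
# weighs about 660 g, so 1 pg holds 1e-12 * N_A / 660 base pairs.
derived_bp_per_pg = 1e-12 * AVOGADRO / BP_MASS_DA

print(f"Arabidopsis (125 Mbp): {bp_to_pg(125e6):.3f} pg")
print(f"derived factor: {derived_bp_per_pg:.3e} bp per pg")
```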

Sequences in plant genomes

Unique sequences – genes, but also non-coding (!)

Repetitive:
• Duplications of chromosomal regions
• Medium repetitive DNA
– Tandem repeats of rRNA, tRNA and histone genes
– Gene families with multiple members
– Transposable elements – also highly repetitive
• Highly repetitive – low complexity DNA
– Tandemly arranged simple sequence repeats (SSR)
– Centromeres (180 bp repeat in Arabidopsis) and telomeres (TTTAGGG)n

Types of sequences in plant genomes

• Unique sequences – coding genes, but also non-coding regulatory (!)

• Medium repetitive DNA
– Tandem repeats of rRNA, tRNA and histone genes
– Gene families with multiple members
– Transposable elements – also highly repetitive

• Low complexity DNA (highly repetitive)
– Tandemly arranged simple sequence repeats (SSR)
– Centromeres (180 bp repeat in Arabidopsis) and telomeres (TTTAGGG)n
- some behave as satellite DNA

Aside – term definition: sequence complexity (~ the amount of information)

repetitive:
AAAAAAAAAAAAAAAAAAAAA   complexity 1 (21xA)
ATCATCATCATCATCATCATC   complexity 3 (7xATC)
(what is the complexity if it is a coding sequence?)

unique:
ATCGTATCGCGATTTTAACGT   complexity 21 (1xAT…)

- unique x repetitive – depends on the size of the evaluated frame (= size of analyzed DNA fragments)
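One way to make the three examples above concrete is to take "complexity" as the length of the shortest unit whose exact repetition reproduces the whole sequence. This is an assumption made for illustration, since the slide defines complexity only informally, and the function name is hypothetical.

```python
def sequence_complexity(seq: str) -> int:
    """Length of the shortest unit that, repeated end to end, gives seq exactly."""
    n = len(seq)
    for unit in range(1, n + 1):
        # A unit length qualifies only if it divides the sequence length
        # and repeating the prefix rebuilds the whole sequence.
        if n % unit == 0 and seq[:unit] * (n // unit) == seq:
            return unit
    return n  # reached only for an empty sequence

print(sequence_complexity("A" * 21))                  # 21 x A
print(sequence_complexity("ATC" * 7))                 # 7 x ATC
print(sequence_complexity("ATCGTATCGCGATTTTAACGT"))   # unique
```

The result depends on the analyzed window: a fragment shorter than one repeat unit looks unique, which is the point of the last remark on the slide.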

Sequence complexity of plant genomes

[Figure labels: highly repetitive, medium repetitive, unique]

Examples of repetitive DNA representation in soybean and Silene (clusters of related sequences)

Silene latifolia

Gypsy, copia = retrotransposon families

clDNA = chloroplast DNA (partially contamination, but also recent insertions)

Measuring genome complexity - reassociation kinetics

• DNA fragmented to 300 - 500 bp, denatured

• Monitoring of reassociation in time - separation (chromatographic) of ss and ds DNA

• Analysis of kinetics (Cot curves) shows representation of various types of repetitive DNA
– rare sequences reassociate more slowly than repetitive ones

Reassociation kinetics depends on sequence complexity

Eukaryotic genomes usually contain three fractions of sequences with different complexity:
- Low complexity = highly repetitive
- Middle repetitive
- Unique sequences = high complexity

[Figure labels: unique, medium repetitive, highly repetitive]
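The shape of a Cot curve for such a three-fraction genome can be sketched with the ideal second-order reassociation model, in which the single-stranded fraction of one component is 1 / (1 + k · C0t). The model is the standard textbook idealization and is not taken from the slides. The fraction sizes and rate constants below are hypothetical values chosen only to separate the three components.

```python
def single_stranded_fraction(cot: float, components) -> float:
    """Fraction of DNA still single-stranded at a given C0t value.

    components: list of (genome_fraction, rate_constant) pairs; each
    component follows ideal second-order kinetics, 1 / (1 + k * C0t).
    """
    return sum(f / (1.0 + k * cot) for f, k in components)

# Hypothetical genome: fractions sum to 1, rate constants are illustrative.
genome = [
    (0.25, 1e3),   # highly repetitive: reassociates fastest
    (0.30, 1e0),   # medium repetitive
    (0.45, 1e-3),  # unique: reassociates slowest
]

for cot in (1e-4, 1e-2, 1e0, 1e2, 1e4):
    print(f"C0t = {cot:g}: {single_stranded_fraction(cot, genome):.3f} single-stranded")
```

Each component is half reassociated at C0t = 1/k, so repetitive fractions show up as steps at low C0t and the unique fraction as the last step.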

Reassociation kinetics of small and large genomes

[Figure panels: 180 bp repeat (A. thaliana); copia (A. thaliana); 45S rDNA (Crocus); tandem repeats dp5a1 (wheat)]

(Heslop-Harrison, Plant Cell 12:617, 2000)

Repetitive sequences can be easily detected in situ
FISH = fluorescent in situ hybridization (possible even with unique seq.)

Subtelomeric repeats in rye

(Heslop-Harrison, Plant Cell 12:617, 2000)

Telomeres in rye (TTTAGGG)n

Differences in small and large genome arrangements

large genomes: genes present in „gene-rich islands“ separated by long regions of repetitive DNA

Reconstruction of gradual accumulation of transposable elements in the maize genome

In Panicum there are no transposable elements in the presented region; in maize they make up 60 % of its size

Plant Genome Sequencing
http://genomevolution.org/wiki/index.php/Sequenced_plant_genomes

April 13 – less complete in gray

Large Genome Sequencing

- sequencing per partes (separated chromosomes)
- sequencing of non-methylated DNA (= transcriptionally active)
- sequencing of ESTs

Aside – term definition: Expressed Sequence Tags (ESTs)

- short sequenced regions of cDNA (300-600 nt)
- mostly gene segments (primarily from mRNA)
- alternative source of coding sequences for large genomes (rapid and inexpensive)

Weak points:
- highly redundant, incomplete (!)
- transcript levels vary: gene expression is regulated spatially, temporally, developmentally and environmentally
- regulatory sequences are not represented (promoters, introns, ...)

Preparation of EST library

- mRNA
- RT with oligo(dT) primer: cDNA
- cleavage of RNA from the heteroduplex: RNase H
- 2nd strand cDNA synthesis
- cleavage with restriction endonuclease
- adaptor ligation, cloning
- sequencing: Expressed Sequence Tags (ESTs)

Aside:

Arabidopsis thaliana

the most important model of plant biology

1 week 3 weeks

4 weeks 6 weeks

Arabidopsis genome: 125 Mbp

[Figure: density of genes, ESTs and TEs; scale from high density to low density]

Total gene number prediction in time (after whole genome sequencing)

Genome of Arabidopsis - statistics

Feature                        Chr.1       Chr.2       Chr.3       Chr.4       Chr.5       SUM
Length (bp)                    29,105,111  19,646,945  23,172,617  17,549,867  25,53,409   115,409,949
  Top arm (bp)                 14,449,213  3,607,091   13,590,268  3,052,108   11,132,192
  Bottom arm (bp)              14,655,898  16,039,854  9,582,349   14,497,759  14,803,217
Base composition (%GC)
  Overall                      33.4        35.5        35.4        35.5        34.5
  Coding                       44.0        44.0        44.3        44.1        44.1
  Non-coding                   32.4        32.9        33.0        32.8        32.5
Number of genes                6,543       4,036       5,220       3,825       5,874       25,498
Gene density (kb per gene)     4.0         4.9         4.5         4.6         4.4
Average gene length (bp)       2,078       1,949       1,925       2,138       1,974
Average peptide length (bp)    446         421         424         448         429
Exons
  Number                       35,482      19,631      26,570      20,073      31,226      132,982
  Total length (bp)            8,772,559   5,100,288   6,654,507   5,150,883   7,571,013   33,249,250
  Average per gene             5.4         4.9         5.1         5.2         5.3
  Average size (bp)            247         259         250         256         242
Number of genes with ESTs (%)  60.8        56.9        59.8        61.4        61.4
Number of ESTs                 30,522      14,989      20,732      16,605      22,885      105,773

+ hundreds of MIR genes - role in regulation of gene expression

Gene function

The majority of plant genes form gene families

• gene families are often in tandem arrangement, but also spread across the genome

• tandem repeats are composed of closely related, but also distantly related paralogues (recombinations)

• duplications of long chromosomal regions

Number of paralogues

Aside – term definitions:

• Homologous genes: genes with similar sequences derived from the same ancestral gene (quantification – sequence identity, similarity)

• Paralogous genes: genes with similar sequences derived from the same ancestral gene, present at different loci within the same genome.

• Orthologous genes: genes in different species that are similar to each other because they originated from a common ancestral gene in a common ancestor. (If several paralogues are present, the genes serving the same function are regarded as orthologues.)

Orthologues vs. paralogues

[Diagram, orthologous genes: Gene A in the ancestral species; Gene A’ and Gene A” in species A and species B]

[Diagram, paralogous genes: Gene A in the ancestral species; Gene A’, Gene A” and Gene A’” in species A and species B]

Paralogous genes = genes duplicated within the species

Mechanisms of gene duplications

(increase in paralogue number)

• tandem duplication
• transposition
• segmental duplications
• whole genome duplications

Differences in genes/gene families in genomes

Genes Gene families

Arabidopsis x Populus – large overlap, about 1.5 times more paralogues in poplar

(Arabidopsis + Populus) x Oryza – many genes specific for Monocots

Arabidopsis is an ancient tetraploid (as probably are the majority of plants)

Duplicated chromosomal regions form about 60 % of genome (67.9 Mb)

Polyploidization significantly increases genome (and organism) plasticity and played a very important role in plant (genome) evolution

About 30-80 % of plant species are polyploid

Polyploidization in Angiosperm evolution

Fawcett et al. 2009

Dating of whole genome duplication according to the number of synonymous mutations per synonymous site - Ks

Example: Ks = 3/2.66

Amino acids:  Phe      Leu      Met    Val
Sequence 1:   UUU      CUA      AUG    GUU
Sequence 2:   UUC      UUG      AUG    GUU
Syn. sites:   0 0 1/3  1/3 0 1  0 0 0  0 0 1   = number of syn. sites (sum 2.66)
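The numbers in the Ks example above can be reproduced with a short script. This is a minimal sketch under stated assumptions, not a full Ks estimator. Synonymous sites are counted on the first sequence only, as in the example. A nucleotide difference is counted as synonymous whenever both codons encode the same amino acid. No correction for multiple substitutions is applied. The function names are hypothetical; the codon table is the standard genetic code.

```python
from fractions import Fraction
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def syn_sites(codon: str) -> Fraction:
    """Number of synonymous sites in a codon (0 to 1 per position)."""
    total = Fraction(0)
    for pos in range(3):
        # Count the single-base changes at this position that keep the amino acid.
        syn = sum(
            1
            for b in BASES
            if b != codon[pos]
            and CODE[codon[:pos] + b + codon[pos + 1:]] == CODE[codon]
        )
        total += Fraction(syn, 3)
    return total

def syn_differences(c1: str, c2: str) -> int:
    """Simplification: differences count only if both codons encode the same amino acid."""
    if CODE[c1] != CODE[c2]:
        return 0
    return sum(a != b for a, b in zip(c1, c2))

seq1 = ["UUU", "CUA", "AUG", "GUU"]
seq2 = ["UUC", "UUG", "AUG", "GUU"]

sites = sum(syn_sites(c) for c in seq1)
diffs = sum(syn_differences(a, b) for a, b in zip(seq1, seq2))
ks = Fraction(diffs) / sites

print(f"synonymous sites: {float(sites):.2f}")
print(f"synonymous differences: {diffs}")
print(f"uncorrected Ks: {float(ks):.3f}")
```

Published methods differ in detail, for example by averaging the site counts over both sequences and correcting for multiple hits, so values from real tools will not match this raw ratio.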

[Histogram: x-axis Ks, y-axis gene number]

Fawcett et al. 2013

Comparisons of paralogue pairs

Peaks indicate genome duplications

Polyploidization in plant evolution
• 35 % of species are neopolyploids
• most species were repeatedly polyploid in evolution
• viable aneuploid variants (frequently after allopolyploidization – hexaploid wheat): stable wheat lines with a missing chromosomal arm (of a homeologous chromosome)

Blue dots – duplications, asterisk – triplication

K-T

(Fawcett et al. 2013)

Polyploidization - fusion of non-reduced gametes or endoreduplication

[Diagram: spontaneous duplication (endoreduplication)]

autopolyploidy: n = x = 4 and n = x = 4 give 2n = 4x = 16
allopolyploidy: n = x = 4 and n = x = 7 give 2n = 4x = 22

Similar frequency in polyploid plant species

Chromosome doubling is necessary for meiosis in hybrids

species A × species B → sterile hybrid → genome duplication → fertile

Preferential pairing of homologous chromosomes

Related chromosomes from different species (homeologous) can also pair

Allopolyploid genomes in Brassica genus

Species        Karyotype      Genome
Brassica rapa  2n = 2x = 20   A
B. nigra       2n = 2x = 16   B
B. oleracea    2n = 2x = 18   C
B. juncea      2n = 4x = 36   AB
B. napus       2n = 4x = 38   AC
B. carinata    2n = 4x = 34   BC
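The chromosome numbers of the three allotetraploid Brassica species follow from adding the complete diploid sets of their parental genomes. The sketch below checks this against the values listed above; the dictionary and function names are hypothetical.

```python
# Diploid chromosome numbers (2n) of the three parental genomes, from the table.
DIPLOID_2N = {"A": 20, "B": 16, "C": 18}

def allotetraploid_2n(genome: str) -> int:
    """2n of an allopolyploid that carries the full diploid set of each listed genome."""
    return sum(DIPLOID_2N[g] for g in genome)

for species, genome in [("B. juncea", "AB"), ("B. napus", "AC"), ("B. carinata", "BC")]:
    print(f"{species} ({genome}): 2n = {allotetraploid_2n(genome)}")
```

The same addition explains the allopolyploidy diagram earlier: parents with x = 4 and x = 7 give 2n = 2 × (4 + 7) = 22.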

Ancient interspecies hybrids

Brassica nigra (BB), Brassica rapa (AA), Brassica oleracea (CC)

Brassica carinata (BBCC), Brassica juncea (AABB), Brassica napus (AACC)

Allopolyploid tobacco species – DNA size changes

The fate of duplicated genes differs (gene dosage balance theory)

• genes encoding interacting proteins, „connected genes“ (signal pathways, complex subunits, …), are easily preserved in the genome after duplication
- loss or partial duplication of one component results in gene imbalance that decreases fitness
- the whole duplicated complex can specialize for a new function and increase organism complexity
- the secondary function was probably already present in the ancestral complex (pathway), but only duplication allowed adaptive evolution of both functions without selection constraints: Escape from Adaptive Conflict (EAC) model

• other „single genes“ are more easily lost after genome duplication, but can be preserved after individual duplication

- most duplicated genes are lost after whole genome duplication
- the loss is not even between the two copies, probably because of frequent epigenetic marks (methylation) in one copy: preferential gene loss and mutagenesis of the methylated copy
- gene conversion and homogenization can occur (!)

de novo allopolyploids (~ rapeseed) – recombinations preferentially between homeologous chromosomes, without preference for either parental genome (homeologous = homologous chromosomes within one genome, but originating from different parents)

Changes in newly formed allopolyploid genome:

- DNA methylation changes
- losses of parts or whole chromosomes (aneuploidy – decreased fertility)
- frequent activation of TEs
- expression of homeologous genes is usually not additive
- transcriptome usually more reduced than genome
- different regulation of expression: often organ-specific expression of genes from each parent, new sites of expression, new regulation
- „divergent resolution“ - speciation (different gene loss in individuals - lethality in F2; absence of an essential gene = reproduction barrier)

Plants can survive also with a haploid genome!
- reprogramming of male or female gametophyte development in vitro – no gamete formation, but development resembling embryogenesis
- usually from immature microspores = androgenesis
- from female gametophyte = gynogenesis
- haploid plants are sterile
- through endoreduplication (colchicine-induced or spontaneous) – completely homozygous plants – dihaploids

Androgenesis in rapeseed (pollen embryogenesis)

... But genomes are still similar

Colinearity, synteny

Paterson et al., Plant Cell 12: 1523-1539, 2000

„Synteny“ is usually misused to describe colinearity

Synteny = orthologous loci in two species on the same chromosome

[Diagram: ancestral species ABC; species A: A’B’ and C’; species B: A”B”C”]

Colinearity = group of loci in two species on a chromosome in the same order

[Diagram: ancestral species ABC; species A: A’C’ and B’; species B: C”B”A”]

Changes in colinearity caused by chromosomal arm inversion
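The difference between the two terms can be illustrated with a small sketch. The data layout (a locus-to-chromosome mapping and an ordered list of loci per chromosome) and the function names are hypothetical. Treating a fully reversed order as colinear is an assumption made here, on the grounds that a chromosome can be read from either end.

```python
def is_syntenic(loci, chrom_in_a, chrom_in_b) -> bool:
    """True if all loci lie on one chromosome in species A and on one in species B."""
    return (
        len({chrom_in_a[locus] for locus in loci}) == 1
        and len({chrom_in_b[locus] for locus in loci}) == 1
    )

def is_colinear(order_a, order_b) -> bool:
    """True if the loci occur in the same order (read from either chromosome end)."""
    return list(order_a) == list(order_b) or list(order_a) == list(order_b)[::-1]

loci = ["A", "B", "C"]

# Synteny without colinearity: same chromosome in both species, different order.
chrom_in_a = {"A": "chr1", "B": "chr1", "C": "chr1"}
chrom_in_b = {"A": "chr3", "B": "chr3", "C": "chr3"}
print(is_syntenic(loci, chrom_in_a, chrom_in_b))      # True
print(is_colinear(["A", "B", "C"], ["A", "C", "B"]))  # False: B-C segment inverted

# Colinear: identical order, or the same order read from the other end.
print(is_colinear(["A", "B", "C"], ["C", "B", "A"]))  # True
```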

Colinearity of Poaceae genomes

Colinear regions differ mainly in repetitive DNA

Summary:

• Current plant genomes result from repeated cycles of partial and complete duplications, followed by reduction and modification of duplicated sequences.

• There are no genomes without redundancy.

• Plant genomes are still very dynamic.

• A high portion of the genome consists of repetitive DNA.