alternative splicing: a playground of evolution

Alternative splicing: A playground of evolution Mikhail Gelfand Research and Training Center for Bioinformatics Institute for Information Transmission Problems RAS, Moscow, Russia October 2008


Page 1: Alternative splicing:  A playground of evolution

Alternative splicing: A playground of evolution

Mikhail Gelfand

Research and Training Center for Bioinformatics, Institute for Information Transmission Problems RAS,

Moscow, Russia

October 2008

Page 2: Alternative splicing:  A playground of evolution

% of alternatively spliced human and mouse genes by year of publication

Human (genome / random sample)

Human (individual chromosomes)

Mouse (genome / random sample)

All genes

Only multiexon genes

Genes with high EST coverage

Page 3: Alternative splicing:  A playground of evolution

Roles of alternative splicing

• Functional:
  – creating protein diversity
    • ~30,000 genes, >100,000 proteins
  – maintaining protein identity
    • e.g. membrane (receptor) and secreted isoforms
    • dominant negative isoforms
    • combinatorial (transcription factors, signaling domains)
  – regulatory
    • e.g. via channelling to NMD

• Evolutionary

Page 4: Alternative splicing:  A playground of evolution

Plan

• Evolution of alternative exon-intron structure
  – mammals:
    • human compared to mouse and dog
    • mouse and rat compared to human and dog
    • paralogs
  – dipteran insects
    • Drosophila melanogaster, D. pseudoobscura, Anopheles gambiae
    • many drosophilas
• Evolutionary rates in constitutive and alternative regions
  – human and mouse
  – D. melanogaster and D. pseudoobscura
  – many drosophilas
  – human-chimpanzee vs. human SNPs
• Alternative splicing and protein domains
• Regulation of AS via conserved RNA structures

Page 5: Alternative splicing:  A playground of evolution

Elementary alternatives

Cassette exon

Alternative donor site

Alternative acceptor site

Retained intron

Page 6: Alternative splicing:  A playground of evolution

EDAS: a database of alternative splicing

• Sources:
  – human and mouse genomes
  – GenBank
  – RefSeq
• consider cassette exons and alternative splicing sites
• functionality: potentially translated vs. NMD-inducing elementary alternatives (in-frame stops, length not divisible by 3)

                           human      mouse
genes                      28957      31811
mRNA / cDNA               114624     215212
proteins                   91844     126797
ESTs                     4294590    3817531
all alternatives           51713      44030
elementary alternatives    31746      21329
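The functionality rule on this slide (in-frame stops, length not divisible by 3) can be sketched in a few lines of Python. This is a minimal illustration, not the EDAS implementation: the function names, the `frame` parameter and the returned labels are assumptions made here, and the sketch looks only at the inserted segment itself, ignoring downstream effects and the position of the stop relative to the last exon junction.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_in_frame_stop(seq: str, frame: int = 0) -> bool:
    """True if seq contains a stop codon in the given reading frame.

    frame (hypothetical parameter) is the offset of the first complete
    codon inside the segment: 0, 1 or 2.
    """
    seq = seq.upper()
    return any(seq[i:i + 3] in STOP_CODONS for i in range(frame, len(seq) - 2, 3))


def classify_alternative(inserted_seq: str, frame: int = 0) -> str:
    """Label an inserted segment (e.g. a cassette exon) by the slide's two criteria."""
    if len(inserted_seq) % 3 != 0:
        # Length not divisible by 3: the downstream reading frame is shifted.
        return "NMD-inducing (frameshift)"
    if has_in_frame_stop(inserted_seq, frame):
        return "NMD-inducing (in-frame stop)"
    return "potentially translated"


if __name__ == "__main__":
    for segment in ("ATGGCC", "ATGTAAGCC", "ATGG"):
        print(segment, "->", classify_alternative(segment))
```

As the caveats slide notes, assignments of this kind are not very reliable; the sketch only makes the two stated criteria explicit.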

Page 7: Alternative splicing:  A playground of evolution
Page 8: Alternative splicing:  A playground of evolution

Alternative exon-intron structure in the human, mouse and dog genomes

• Human-mouse-dog triples of orthologous genes

• We follow the fate of human alternative sites and exons in the mouse and dog genomes

• Each human AS isoform is spliced-aligned to the mouse and dog genomes. Definition of conservation:
  – conservation of the corresponding region (homologous exon is actually present in the considered genome);
  – conservation of splicing sites (GT and AG)
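The second criterion (conserved GT and AG) can be illustrated with a small check on the aligned region. This is a sketch under stated assumptions: the exon coordinates are taken to come from the spliced alignment, to be 0-based and half-open, and to lie on the plus strand; the function name and return format are invented for the example.

```python
def splice_sites_conserved(genome: str, exon_start: int, exon_end: int) -> dict:
    """Check the canonical dinucleotides flanking an aligned exon.

    genome     : target genomic sequence (plus strand, assumption)
    exon_start : 0-based index of the first exon base
    exon_end   : 0-based index one past the last exon base
    """
    genome = genome.upper()
    # Acceptor site: the last two intron bases upstream of the exon should be AG.
    acceptor = genome[max(exon_start - 2, 0):exon_start]
    # Donor site: the first two intron bases downstream of the exon should be GT.
    donor = genome[exon_end:exon_end + 2]
    return {"acceptor_AG": acceptor == "AG", "donor_GT": donor == "GT"}


if __name__ == "__main__":
    target = "ccttagATGGCCgtaagt"   # toy sequence: intron, exon (uppercase), intron
    print(splice_sites_conserved(target, 6, 12))
```

A first or last exon has only one of the two sites, so a real pipeline would need to treat terminal exons separately.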

Page 9: Alternative splicing:  A playground of evolution

Caveats

• we consider only the possibility of AS in mouse and dog: we do not require the actual existence of the corresponding isoforms in known transcriptomes

• we do not account for situations when an alternative human exon (or site) is constitutive in mouse or dog

• of course, functionality assignments (translated / NMD-inducing) are not very reliable

Page 10: Alternative splicing:  A playground of evolution

Gains/losses: loss in mouse

Common ancestor

Page 11: Alternative splicing:  A playground of evolution

Gains/losses: gain in human (or noise)

Common ancestor

Page 12: Alternative splicing:  A playground of evolution

Gains/losses: loss in dog (or possible gain in human+mouse)

Common ancestor
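The three gain/loss slides amount to a small decision table over the presence of a human alternative in mouse and in dog. A minimal sketch follows; the function name and the boolean inputs are assumptions made for illustration, and "present" means conserved in the sense defined earlier (region plus splice sites).

```python
from collections import Counter


def classify_pattern(in_mouse: bool, in_dog: bool) -> str:
    """Interpret the fate of a human alternative, with dog as the outgroup."""
    if in_mouse and in_dog:
        return "conserved"
    if in_dog:
        return "loss in mouse"
    if in_mouse:
        return "loss in dog (or gain in human+mouse)"
    return "gain in human (or noise)"


if __name__ == "__main__":
    # Toy observations, not data from the talk: (present in mouse, present in dog)
    observed = [(True, True), (False, True), (True, False), (False, False), (True, True)]
    print(Counter(classify_pattern(m, d) for m, d in observed))
```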

Page 13: Alternative splicing:  A playground of evolution

Triple comparison

[Figure: human alternatives in the human-mouse-dog comparison, split into conserved alternatives, lost in mouse, lost in dog, and human-specific alternatives (noise?)]

Page 14: Alternative splicing:  A playground of evolution

Translated and NMD-inducing cassette exons

• Mainly included exons are highly conserved irrespective of function
• Mainly skipped translated exons are more conserved than NMD-inducing ones
• Numerous lineage-specific losses
  – more in mouse than in dog
  – more of NMD-inducing than of translated exons
• ~40% of almost always skipped (<1% inclusion) exons are conserved in at least one lineage

Page 15: Alternative splicing:  A playground of evolution

Mouse+rat vs human and dog: a possibility to distinguish between exon gain and noise

Page 16: Alternative splicing:  A playground of evolution

The rate of exon gain: decreases with the exon inclusion rate; increases with the sequence evolutionary rate

• Caveat: spurious exons still may seem to be conserved in the rodent lineage due to short time

• Solution: estimate “FDR” by analysis of conservation of pseudoexons
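One way to read the "FDR" idea is as a ratio of two conservation rates. The sketch below is an assumption about how such an estimate could be formed, not the procedure used in the talk: pseudoexons are taken to be conserved only by chance, so their conservation rate approximates the background, which is then divided by the conservation rate of the candidate exons. All names and the example counts are hypothetical.

```python
def estimate_fdr(pseudo_conserved: int, pseudo_total: int,
                 exon_conserved: int, exon_total: int) -> float:
    """Rough false discovery rate among exons that look conserved.

    background = fraction of pseudoexons that appear conserved
    observed   = fraction of candidate exons that appear conserved
    FDR is approximated as background / observed, capped at 1.
    """
    if pseudo_total <= 0 or exon_total <= 0:
        raise ValueError("totals must be positive")
    if exon_conserved <= 0:
        raise ValueError("no conserved exons: FDR is undefined")
    background = pseudo_conserved / pseudo_total
    observed = exon_conserved / exon_total
    return min(1.0, background / observed)


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(estimate_fdr(pseudo_conserved=10, pseudo_total=100,
                       exon_conserved=40, exon_total=100))
```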

Page 17: Alternative splicing:  A playground of evolution

Alternative donor and acceptor sites: same trends

• Higher conservation of ~uniformly used sites
• Internal sites are more conserved than external ones (as expected)

Page 18: Alternative splicing:  A playground of evolution

Source of innovation: Model of random site fixation

• Plots: fraction of exon-extending alternative sites as a function of exon length
  – Main site defined as the one in protein or in more ESTs
  – Same trends for the acceptor (top) and donor (bottom) sites
• The distribution of alt. region lengths is consistent with fixation of random sites
  – Extend short exons
  – Shorten long exons

Page 19: Alternative splicing:  A playground of evolution

Genetic diseases

• Mutations in splice sites yield exon skips or activation of cryptic sites
• Exon skip or activation of a cryptic site depends on:
  – Density of exonic splicing enhancers (lower in skipped exons)
  – Presence of a strong cryptic site nearby

Av. dist. to a stronger site   Skipped exons   Cryptic site exons   Non-mutated exons
Donor sites                    220             75                   289
Acceptor sites                 185             66                   81

Page 20: Alternative splicing:  A playground of evolution

One more source of innovation: site creation

• MAGE-A family of human CT-antigens
  – Retroposition of a spliced mRNA, then duplication
  – Numerous new (alternative) exons in individual copies arising from point mutations

Creation of donor sites

Page 21: Alternative splicing:  A playground of evolution

Improvement of an acceptor site

Page 22: Alternative splicing:  A playground of evolution

Alternative exon-intron structure in fruit flies and the malarial mosquito

• Same procedure (AS data from FlyBase)

– cassette exons, splicing sites

– also mutually exclusive exons, retained introns

• Follow the fate of D. melanogaster exons in the D. pseudoobscura and Anopheles genomes

• Technically more difficult:

– incomplete genomes

– the quality of alignment with the Anopheles genome is lower

– frequent intron insertion/loss (~4.7 introns per gene in Drosophila vs. ~3.5 introns per gene in Anopheles)

Page 23: Alternative splicing:  A playground of evolution

Conservation of coding segments

                                      constitutive segments   alternative segments
D. melanogaster – D. pseudoobscura    97%                     75-80%
D. melanogaster – Anopheles gambiae   77%                     ~45%

Page 24: Alternative splicing:  A playground of evolution

Conservation of D. melanogaster elementary alternatives in D. pseudoobscura genes

[Figure: stacked bars (0-100%) for constant exon, donor site, acceptor site, retained intron, cassette exon and exclusive exon. Colors: blue – exact, green – divided exons, yellow – joined exons, orange – mixed, red – non-conserved]

• retained introns are the least conserved (are all of them really functional?)
• mutually exclusive exons are as conserved as constitutive exons

Page 25: Alternative splicing:  A playground of evolution

Conservation of D. melanogaster elementary alternatives in Anopheles gambiae genes

[Figure: stacked bars (0-100%) for constant exon, donor site, acceptor site, retained intron, cassette exon and exclusive exon. Colors: blue – exact, green – divided exons, yellow – joined exons, orange – mixed, red – non-conserved]

• ~30% joined, ~10% divided exons (fewer introns in Aga)
• mutually exclusive exons are conserved exactly
• cassette exons are the least conserved

Page 26: Alternative splicing:  A playground of evolution

Evolution of (alternative) exon-intron structure in nine Drosophila spp.

[Figure: phylogenetic tree of the nine species]

Dmel – D. melanogaster
Dsec – D. sechelia
Dyak – D. yakuba
Dere – D. erecta
Dana – D. ananassae
Dpse – D. pseudoobscura
Dmoj – D. mojavensis
Dvir – D. virilis
Dgri – D. grimshawi

D. Pollard, http://rana.lbl.gov/~dan/trees.html

Page 27: Alternative splicing:  A playground of evolution

Gain and loss of alternative segments and constitutive exons

[Figure: the Drosophila tree (Dmel, Dsec, Dyak, Dere, Dana, Dpse, Dmoj, Dvir, Dgri) with event counts on the branches]

Notation: patterns with single events / patterns with multiple events (Dollo parsimony)

Sample size: 397 / 452; 18596 / 18874

Caveat: we cannot observe exon gain outside and exon loss within the D. melanogaster lineage

Page 28: Alternative splicing:  A playground of evolution

Evolutionary rate in constitutive and alternative regions

• Human and mouse orthologous genes
• D. melanogaster and D. pseudoobscura

• Estimation of the dn/ds ratio: higher fraction of non-synonymous substitutions (changing amino acid) => weaker stabilizing (or stronger positive) selection
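To make the synonymous / non-synonymous distinction concrete, the sketch below counts the two kinds of codon differences between two aligned coding sequences using the standard genetic code. It is a crude illustration only: real dn/ds estimation also normalizes by the numbers of synonymous and non-synonymous sites and corrects for multiple hits, and the slides do not say which estimator was used. The function name and input conventions (gap-free, in-frame, equal length) are assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_substitutions(seq1: str, seq2: str) -> tuple:
    """Return (synonymous, non_synonymous) counts of differing codons."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned, gap-free and in frame")
    syn = nonsyn = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 == c2:
            continue
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue  # skip codons with ambiguous bases
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1      # same amino acid: synonymous change
        else:
            nonsyn += 1   # amino acid changed: non-synonymous change
    return syn, nonsyn


if __name__ == "__main__":
    print(count_substitutions("ATGGCTAAA", "ATGGCCAGA"))
```

Running the same count separately on constitutive and alternative regions of a gene gives the raw material for the comparison described on this slide.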

Page 29: Alternative splicing:  A playground of evolution

Human/mouse genes: non-symmetrical histogram of dn/ds(const) – dn/ds(alt)

[Figure: histogram of the number of genes (y-axis, ticks at 1, 10, 100, 1000) against dn/ds(const) – dn/ds(alt) (x-axis, from –1 to 1 in steps of 0.1)]

Black: shadow of the left half. In a larger fraction of genes dn/ds(alt) > dn/ds(const), especially for larger values

Page 30: Alternative splicing:  A playground of evolution

Concatenated regions: alternative regions evolve faster than constitutive ones

[Figure: bar charts of dN/dS, dN and dS for constitutive (C) and alternative (A) regions; two panels]

              dN/dS    dN      dS
Panel 1   C   0.168    0.068   0.405
          A   0.183    0.076   0.414
Panel 2   C   0.28     0.22    0.79
          A   0.31     0.25    0.80

Page 31: Alternative splicing:  A playground of evolution

Weaker stabilizing selection (or positive selection) in alternative regions

(insignificant in Drosophila)

[Figure: the dN/dS, dN and dS bar charts from the previous slide, repeated]

Page 32: Alternative splicing:  A playground of evolution

Different behavior of terminal alternatives

[Figure: bar charts of dN/dS, dN and dS for constitutive regions (C), all alternative regions (A), and N-terminal (AN), internal (AI) and C-terminal (AC) alternative regions; two panels]

First panel:
       dN/dS    dN      dS
C      0.168    0.068   0.405
A      0.183    0.076   0.414
AN     0.186    0.076   0.410
AI     0.169    0.074   0.437
AC     0.297    0.132   0.445

Mammals: density of substitutions increases in the N-to-C direction

Drosophila: synonymous substitutions prevalent in terminal alternative regions; non-synonymous substitutions, in internal alternative regions

Page 33: Alternative splicing:  A playground of evolution

Many drosophilas:
• dN in mut. exclusive exons same as in constitutive exons
• dS lower in almost all alternatives: regulation?

Page 34: Alternative splicing:  A playground of evolution

Many drosophilas: relaxed (positive?) selection in alternative regions

Page 35: Alternative splicing:  A playground of evolution

The McDonald-Kreitman test: evidence for positive selection in (minor isoform) alternative regions

• Human and chimpanzee genome substitutions vs. human SNPs
• Exons conserved in mouse and/or dog
• Genes with at least 60 ESTs (median number)
• Fisher’s exact test for significance

          Pn/Ps (SNPs)   Kn/Ks (genomes)   diff.    Signif.
Const.    0.72           0.62              –0.10    0
Major     0.78           0.65              –0.13    0.5%
Minor     1.41           1.89              +0.48    0.1%

Minor isoform alternative regions:
• More non-synonymous SNPs: Pn(alt_minor) = 0.12% >> Pn(const) = 0.06%
• More non-synonymous substitutions: Kn(alt_minor) = 0.91% >> Kn(const) = 0.37%
• Positive selection (as opposed to lower stabilizing selection): α = 1 – (Pn/Ps) / (Kn/Ks) ≈ 25% of positions

• Similar results for all highly covered genes or all conserved exons
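The α value quoted on this slide follows directly from the table. A short check, using only the Pn/Ps and Kn/Ks ratios shown above (the function name and the dictionary layout are choices made for this example):

```python
def mk_alpha(pn_ps: float, kn_ks: float) -> float:
    """Fraction of non-synonymous substitutions attributed to positive selection.

    alpha = 1 - (Pn/Ps) / (Kn/Ks), with polymorphism ratios from SNPs and
    divergence ratios from the genome comparison.
    """
    if kn_ks <= 0:
        raise ValueError("Kn/Ks must be positive")
    return 1.0 - pn_ps / kn_ks


# (Pn/Ps, Kn/Ks) pairs copied from the table on this slide.
ROWS = {"Const.": (0.72, 0.62), "Major": (0.78, 0.65), "Minor": (1.41, 1.89)}
ALPHAS = {name: mk_alpha(p, k) for name, (p, k) in ROWS.items()}

if __name__ == "__main__":
    for name, alpha in ALPHAS.items():
        print(f"{name:7s} alpha = {alpha:+.3f}")
```

Only the minor-isoform row gives a positive α (about 0.25); for constitutive and major-isoform regions the value is negative, consistent with an excess of non-synonymous polymorphism relative to divergence. The significance column relies on Fisher's exact test over the underlying counts, which the slide does not list, so it is not reproduced here.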

Page 36: Alternative splicing:  A playground of evolution

What does alternative splicing do to proteins?

• SwissProt proteins
• PFAM domains
• SwissProt feature tables

Page 37: Alternative splicing:  A playground of evolution

Alternative splicing avoids disrupting domains (and non-domain units)

[Figure a: stacked bars (0-100%), expected vs. observed, by effect of the alternative region: domains completely; non-domain functional units completely; no annotated unit affected; domains partially; non-domain functional units partially]

Control: fix the domain structure; randomly place alternative regions
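The control can be sketched as a simple randomization: keep the domain coordinates fixed, drop each alternative region at a random position in the protein, and tabulate how it overlaps the domains. This is an illustrative sketch rather than the procedure used for the figure; the category labels, the half-open residue coordinates, uniform placement and all names are assumptions.

```python
import random
from collections import Counter


def classify_overlap(region, domains):
    """Classify a region (start, end) against domain intervals (half-open)."""
    start, end = region
    complete = any(start <= d_start and d_end <= end for d_start, d_end in domains)
    partial = any(start < d_end and d_start < end for d_start, d_end in domains)
    if complete:
        # Simplification: a region covering one whole domain counts as
        # "completely" even if it also clips a neighbouring domain.
        return "domain completely"
    if partial:
        return "domain partially"
    return "no domain affected"


def expected_fractions(protein_len, domains, region_lengths, n_iter=1000, seed=0):
    """Expected category fractions when regions are placed uniformly at random."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_iter):
        for length in region_lengths:
            start = rng.randint(0, protein_len - length)  # inclusive bounds
            counts[classify_overlap((start, start + length), domains)] += 1
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


if __name__ == "__main__":
    # Toy protein: 300 residues, two domains, three alternative regions.
    print(expected_fractions(300, [(20, 120), (180, 260)], [15, 40, 110]))
```

Comparing these expected fractions with the observed ones is what the "Expected" and "Observed" bars in the figure represent.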

Page 38: Alternative splicing:  A playground of evolution

… and this is not simply a consequence of the (disputed) exon-domain correlation

[Figure: AS and exon boundaries and SMART domains. Ratio (observed/expected), from 0 to 1, of boundaries inside and outside domains for non-AS exons, AS exons and AS, each shown for mouse and human]

Page 39: Alternative splicing:  A playground of evolution

Positive selection towards domain shuffling (not simply avoidance of disrupting domains)

[Figure a: stacked bars (0-100%), expected vs. observed, by effect of the alternative region: domains completely; non-domain functional units completely; no annotated unit affected; domains partially; non-domain functional units partially]

[Figure b: expected vs. observed for domains completely, non-domain units completely, and no annotated units affected]

Page 40: Alternative splicing:  A playground of evolution

Short (<50 aa) alternative splicing events within domains target protein functional sites

[Figure a: stacked bars (0-100%), expected vs. observed, by effect of the alternative region: domains completely; non-domain functional units completely; no annotated unit affected; domains partially; non-domain functional units partially]

[Figure c: expected vs. observed for Prosite patterns unaffected / affected and FT positions unaffected / affected]

Page 41: Alternative splicing:  A playground of evolution

An attempt at integration

• AS is often species-specific

• young AS isoforms are often minor and tissue-specific

• … but still functional
  – although species-specific isoforms may result from aberrant splicing

• AS regions show evidence for decreased negative selection
  – excess non-synonymous codon substitutions

• AS regions show evidence for positive selection
  – excess fixation of non-synonymous substitutions (compared to SNPs)

• AS tends to shuffle domains and target functional sites in proteins

• Thus AS may serve as a testing ground for new functions without sacrificing old ones

Page 42: Alternative splicing:  A playground of evolution

What next?

• AS in one species, constitutive splicing in another (data from microarrays)

• Changes in inclusion rates

• Evolution of regulation of AS

• Control for:
  – functionality: translated / NMD-inducing (frameshifts, stop codons)
  – exon inclusion (or site choice) level: major / minor isoform
  – tissue specificity pattern (?)
  – type of alternative – 1: N-terminal / internal / C-terminal
  – type of alternative – 2: cassette and mutually exclusive exon, alternative site

Page 43: Alternative splicing:  A playground of evolution

Acknowledgements

• Discussions
  – Eugene Koonin (NCBI)
  – Igor Rogozin (NCBI)
  – Vsevolod Makeev (GosNIIGenetika)
  – Dmitry Petrov (Stanford)
  – Dmitry Frishman (GSF, TUM)
• Data
  – King Jordan (NCBI)
• Support
  – Howard Hughes Medical Institute
  – INTAS
  – Russian Academy of Sciences (program “Molecular and Cellular Biology”)
  – Russian Foundation for Basic Research

Page 44: Alternative splicing:  A playground of evolution

Authors

• Andrei Mironov (Moscow State University)

• Ramil Nurtdinov (Moscow State University) – human/mouse+rat/dog

• Dmitry Malko (GosNIIGenetika, Moscow) – drosophila/mosquito

• Ekaterina Ermakova (Moscow State University, IITP) – Kn/Ks

• Vasily Ramensky (Institute of Molecular Biology, Moscow) – SNPs, McDonald-Kreitman test

• Evgenia Kriventseva (now at U. of Geneva) and Shamil Sunyaev (now at Harvard U. Medical School) – protein structure

• Irena Artamonova (Inst. of General Genetics, Moscow) – human/mouse, plots, MAGE-A

• Alexei Neverov (GosNIIGenetika, Moscow) – functionality of isoforms

Page 45: Alternative splicing:  A playground of evolution

Bonus track: conserved secondary structures regulating (alternative) splicing in the Drosophila spp.

• ~ 50 000 introns

• 17% alternative, 2% with alt. polyA signals

• >95% of D. melanogaster introns mapped to at least 7 of 12 other Drosophila genomes

• Search for conserved complementary words at intron termini (within 150 nt. of intron boundaries), then align

• Restrictive search => 200 candidates

• 6 tested in experiment (3 const., 3 alt.). All 3 alt. ones confirmed
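The search step (complementary words near the two ends of an intron) can be sketched as follows. This shows only the single-sequence part, under assumptions made for illustration: the windows are taken inside the intron, only Watson-Crick pairs are allowed (no G-U wobble), and the word length `k` is a hypothetical parameter; the cross-species conservation filter and the alignment step described on the slide are not shown.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def complementary_words(intron: str, k: int = 8, window: int = 150):
    """Find k-mers near the 5' end whose reverse complement occurs near the 3' end.

    Returns a list of (pos_5prime, pos_3prime, word) with 0-based positions
    in the intron; overlapping word pairs are skipped.
    """
    intron = intron.upper().replace("U", "T")
    left = intron[:window]           # first `window` nt of the intron
    right = intron[-window:]         # last `window` nt of the intron
    right_offset = len(intron) - len(right)

    # Index every k-mer of the 3' window by its position in the intron.
    index = {}
    for j in range(len(right) - k + 1):
        index.setdefault(right[j:j + k], []).append(right_offset + j)

    hits = []
    for i in range(len(left) - k + 1):
        word = left[i:i + k]
        for j in index.get(reverse_complement(word), []):
            if j >= i + k:           # the two words must not overlap
                hits.append((i, j, word))
    return hits


if __name__ == "__main__":
    toy_intron = "GTGGGAAACC" + "T" * 30 + "GGTTTCCCAG"
    for hit in complementary_words(toy_intron, k=8):
        print(hit)
```

In a multi-genome setting the same search would be run on the orthologous introns, keeping only word pairs that stay complementary across species.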

Page 46: Alternative splicing:  A playground of evolution

CG33298 (phospholipid translocating ATPase): alternative donor sites

Page 47: Alternative splicing:  A playground of evolution

Atrophin (histone deacetylase): alternative acceptor sites

Page 48: Alternative splicing:  A playground of evolution

Nmnat (nicotinamide mononucleotide adenylyltransferase): alternative splicing and polyadenylation

Page 49: Alternative splicing:  A playground of evolution

Less restrictive search => many more candidates

Page 50: Alternative splicing:  A playground of evolution

Properties of regulated introns

• Often alternative
• Longer than usual
• Overrepresented in genes linked to development

Page 51: Alternative splicing:  A playground of evolution

Authors

• Andrei Mironov (idea)
• Dmitry Pervouchine (bioinformatics)
• Veronica Raker, Center for Genome Regulation, Barcelona (experiment)
• Juan Valcarcel, Center for Genome Regulation, Barcelona (advice)
• Mikhail Gelfand (general pessimism)