Alternative splicing: A playground of evolution Mikhail Gelfand Research and Training Center for Bioinformatics Institute for Information Transmission Problems RAS, Moscow, Russia October 2008


Page 1

Alternative splicing: A playground of evolution

Mikhail Gelfand

Research and Training Center for Bioinformatics, Institute for Information Transmission Problems RAS,

Moscow, Russia

October 2008

Page 2

% of alternatively spliced human and mouse genes by year of publication

[Figure: series: human (genome / random sample), human (individual chromosomes), mouse (genome / random sample); gene sets: all genes, only multiexon genes, genes with high EST coverage.]

Page 3

Roles of alternative splicing

• Functional:
– creating protein diversity
  • ~30,000 genes, >100,000 proteins
– maintaining protein identity
  • e.g. membrane (receptor) and secreted isoforms
  • dominant negative isoforms
  • combinatorial (transcription factors, signaling domains)
– regulatory
  • e.g. via channelling to NMD (nonsense-mediated decay)

• Evolutionary

Page 4

Plan

• Evolution of alternative exon-intron structure
– mammals:
  • human compared to mouse and dog
  • mouse and rat compared to human and dog
  • paralogs
– dipteran insects:
  • Drosophila melanogaster, D. pseudoobscura, Anopheles gambiae
  • many drosophilas

• Evolutionary rates in constitutive and alternative regions
– human and mouse
– D. melanogaster and D. pseudoobscura
– many drosophilas
– human-chimpanzee vs. human SNPs

• Alternative splicing and protein domains

• Regulation of AS via conserved RNA structures

Page 5

Elementary alternatives

Cassette exon

Alternative donor site

Alternative acceptor site

Retained intron
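To make the four elementary alternatives concrete, here is a minimal Python sketch that compares the intron chains of two isoforms and reports which elementary alternatives separate them. It assumes isoforms are given as sorted lists of (start, end) exons on the + strand in 1-based inclusive coordinates; the function names are hypothetical, and this is not the procedure used to build EDAS.

```python
from typing import List, Set, Tuple

Exon = Tuple[int, int]    # (start, end), 1-based inclusive, + strand
Intron = Tuple[int, int]  # (end of upstream exon, start of downstream exon)


def introns(exons: List[Exon]) -> List[Intron]:
    """Intron chain of an isoform, as pairs of flanking exon boundaries."""
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


def elementary_alternatives(iso_a: List[Exon], iso_b: List[Exon]) -> List[tuple]:
    ints = {"a": set(introns(iso_a)), "b": set(introns(iso_b))}
    exons = {"a": iso_a, "b": iso_b}
    only = {"a": ints["a"] - ints["b"], "b": ints["b"] - ints["a"]}
    used: Set[Intron] = set()
    events = []
    for x, y in (("a", "b"), ("b", "a")):
        for d, a in sorted(only[x]):
            for s, e in exons[y]:
                if d < s and e < a and (d, s) in ints[y] and (e, a) in ints[y]:
                    # Cassette exon: isoform y splits intron (d, a) of x with exon (s, e).
                    events.append(("cassette exon", (s, e)))
                    used |= {(d, a), (d, s), (e, a)}
                elif s <= d and a <= e:
                    # Retained intron: intron (d, a) of x lies inside exon (s, e) of y.
                    events.append(("retained intron", (d, a)))
                    used.add((d, a))
    # Remaining unmatched introns that share one end are alternative sites.
    for d1, a1 in sorted(only["a"] - used):
        for d2, a2 in sorted(only["b"] - used):
            if a1 == a2 and d1 != d2:
                events.append(("alternative donor", (d1, d2)))
            elif d1 == d2 and a1 != a2:
                events.append(("alternative acceptor", (a1, a2)))
    return events


print(elementary_alternatives([(1, 100), (201, 300), (401, 500)], [(1, 100), (401, 500)]))
# [('cassette exon', (201, 300))]
```

On the minus strand donor and acceptor swap roles, so a real pipeline would first convert coordinates to transcript orientation.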

Page 6

EDAS: a database of alternative splicing

• Sources:
– human and mouse genomes
– GenBank
– RefSeq

• Considers cassette exons and alternative splicing sites

• Functionality: potentially translated vs. NMD-inducing elementary alternatives (in-frame stops, length not divisible by 3)

                          human      mouse
genes                     28957      31811
mRNA / cDNA               114624     215212
proteins                  91844      126797
ESTs                      4294590    3817531
all alternatives          51713      44030
elementary alternatives   31746      21329
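The functionality split can be sketched as a simple rule on the sequence of the alternative region: a length not divisible by 3 shifts the reading frame, and an in-frame stop codon truncates the protein. This minimal sketch assumes the region starts at a codon boundary and ignores where the stop lies relative to the last exon junction (which matters for whether NMD is actually triggered); `alternative_effect` is a hypothetical name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def alternative_effect(region: str) -> str:
    """Classify an elementary alternative by its effect on the reading frame.

    Assumes the alternative region starts at a codon boundary.
    """
    region = region.upper()
    if len(region) % 3 != 0:
        return "NMD-inducing (frameshift)"
    codons = (region[i:i + 3] for i in range(0, len(region), 3))
    if any(codon in STOP_CODONS for codon in codons):
        return "NMD-inducing (in-frame stop)"
    return "potentially translated"
```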

Page 7

Page 8

Alternative exon-intron structure in the human, mouse and dog genomes

• Human-mouse-dog triples of orthologous genes

• We follow the fate of human alternative sites and exons in the mouse and dog genomes

• Each human AS isoform is spliced-aligned to the mouse and dog genomes. Definition of conservation:
– conservation of the corresponding region (the homologous exon is actually present in the considered genome);
– conservation of the splicing sites (GT and AG)
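The splice-site part of the conservation definition amounts to checking the canonical dinucleotides around the aligned exon. A minimal sketch, assuming the spliced alignment has already produced 0-based inclusive coordinates of the homologous internal exon on the + strand of the other genome (or `None` if no homologous region was found); `splice_sites_conserved` is a hypothetical helper that checks only GT-AG sites.

```python
from typing import Optional, Tuple


def splice_sites_conserved(genome: str, hit: Optional[Tuple[int, int]]) -> bool:
    """hit: 0-based inclusive (start, end) of the homologous internal exon in
    the other genome (+ strand), or None if no homologous region was found."""
    if hit is None:
        return False  # the region itself is not conserved
    start, end = hit
    if start < 2 or end + 3 > len(genome):
        return False  # not enough flanking sequence to check both sites
    acceptor = genome[start - 2:start].upper()  # last two intron bases before the exon
    donor = genome[end + 1:end + 3].upper()     # first two intron bases after the exon
    return acceptor == "AG" and donor == "GT"
```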

Page 9

Caveats

• we consider only the possibility of AS in mouse and dog: we do not require the corresponding isoforms to actually exist in known transcriptomes

• we do not account for situations where an alternative human exon (or site) is constitutive in mouse or dog

• of course, the functionality assignments (translated / NMD-inducing) are not very reliable

Page 10

Gains/losses: loss in mouse

Common ancestor

Page 11

Gains/losses: gain in human (or noise)

Common ancestor

Page 12

Gains/losses: loss in dog (or possible gain in human+mouse)

Common ancestor

Page 13

Human-specific alternatives: noise?

[Figure: triple comparison of human, mouse and dog; categories: conserved alternatives, lost in mouse, lost in dog, human-specific alternatives (noise?).]
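The gain/loss interpretations on the preceding pages can be written as a small lookup. Because the alternatives are taken from human data, presence in human is given; this hypothetical helper maps conservation in mouse and dog to the interpretations used on those pages.

```python
def interpret_pattern(in_mouse: bool, in_dog: bool) -> str:
    """Interpret a human alternative (exon or site) given whether it is
    conserved in the mouse and dog genomes; human presence is given by
    construction."""
    if in_mouse and in_dog:
        return "conserved"
    if in_dog:
        return "lost in mouse"
    if in_mouse:
        return "lost in dog (or gained in human+mouse)"
    return "human-specific: gained in human (or noise)"
```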

Page 14

Translated and NMD-inducing cassette exons

• Mainly included exons are highly conserved irrespective of function
• Mainly skipped translated exons are more conserved than NMD-inducing ones
• Numerous lineage-specific losses:
– more in mouse than in dog
– more of NMD-inducing than of translated exons
• ~40% of almost always skipped (<1% inclusion) exons are conserved in at least one lineage

Page 15

Mouse+rat vs human and dog: a possibility to distinguish between exon gain and noise

Page 16

The rate of exon gain decreases with the exon inclusion rate and increases with the sequence evolutionary rate

• Caveat: spurious exons may still appear conserved in the rodent lineage because of the short mouse-rat divergence time

• Solution: estimate the “FDR” by analyzing the conservation of pseudoexons
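One simple way to turn pseudoexon conservation into an FDR-like number is sketched below. It assumes that pseudoexons (exon-like sequences not known to be spliced) look conserved in rat at the same rate as spurious mouse exons would, and it treats every candidate as potentially spurious, so the result is an upper bound. The function and its arguments are hypothetical and not necessarily the estimate used in this study.

```python
def pseudoexon_fdr(pseudo_conserved: int, pseudo_total: int,
                   candidates: int, conserved_candidates: int) -> float:
    """Crude upper-bound FDR for 'conserved' rodent-specific exons.

    pseudo_conserved / pseudo_total: fraction of pseudoexons that look
    conserved in rat, taken as the chance that a spurious exon does so.
    candidates: mouse exons absent from human and dog.
    conserved_candidates: those of them that are also conserved in rat.
    """
    p_spurious = pseudo_conserved / pseudo_total
    # Upper bound: pretend every candidate might be spurious.
    expected_false = p_spurious * candidates
    return min(1.0, expected_false / conserved_candidates)
```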

Page 17

Alternative donor and acceptor sites: same trends

• Higher conservation of ~uniformly used sites
• Internal sites are more conserved than external ones (as expected)

Page 18

Source of innovation: Model of random site fixation

• Plots: fraction of exon-extending alternative sites as a function of exon length
– the main site is defined as the one in the protein or in more ESTs
– same trends for the acceptor (top) and donor (bottom) sites

• The distribution of alternative region lengths is consistent with fixation of random sites:
– extend short exons
– shorten long exons
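The random-fixation argument can be illustrated with a toy calculation: new sites appear at random offsets from the main site, and only those that keep the exon length within an acceptable range can be fixed. The window and the length bounds below are hypothetical parameters chosen for illustration, not values from the study; under such bounds, fixable sites mostly extend short exons and shorten long ones.

```python
def extending_fraction(exon_len: int, window: int = 100,
                       min_len: int = 50, max_len: int = 250) -> float:
    """Toy model: a new alternative site appears at offset d from the main site
    (every integer 0 < |d| <= window equally likely); d > 0 extends the exon,
    d < 0 shortens it. Only sites keeping the exon length within
    [min_len, max_len] are assumed to be fixable."""
    fixable = [d for d in range(-window, window + 1)
               if d != 0 and min_len <= exon_len + d <= max_len]
    if not fixable:
        return float("nan")
    return sum(d > 0 for d in fixable) / len(fixable)


for length in (60, 150, 240):
    print(length, round(extending_fraction(length), 2))
# 60 0.91
# 150 0.5
# 240 0.09
```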

Page 19

Genetic diseases

• Mutations in splice sites yield exon skipping or activation of cryptic sites

• Whether the exon is skipped or a cryptic site is activated depends on:
– the density of exonic splicing enhancers (lower in skipped exons)
– the presence of a strong cryptic site nearby

Average distance to a stronger site:

                  Skipped exons   Cryptic site exons   Non-mutated exons
Donor sites       220             75                   289
Acceptor sites    185             66                   81
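The distance statistic in the table can be computed as below, given candidate sites of the same type and their strengths. How sites are scored (for example with a position weight matrix) is an assumption left outside the sketch; the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional


def distance_to_stronger(site: int, scores: Dict[int, float]) -> Optional[int]:
    """Distance from the authentic site to the nearest candidate site with a
    higher score, or None if there is none. scores maps positions of candidate
    sites of the same type (e.g. all GT dinucleotides) to their strength."""
    own = scores[site]
    stronger = [abs(pos - site) for pos, s in scores.items() if s > own]
    return min(stronger) if stronger else None


def average_distance(sites: List[int], score_maps: List[Dict[int, float]]) -> float:
    """Mean distance over exons that have at least one stronger candidate."""
    dists = [distance_to_stronger(s, m) for s, m in zip(sites, score_maps)]
    return mean(d for d in dists if d is not None)
```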

Page 20

One more source of innovation: site creation

• MAGE-A family of human CT (cancer-testis) antigens
– retroposition of a spliced mRNA, then duplication
– numerous new (alternative) exons in individual copies, arising from point mutations

Creation of donor sites

Page 21

Improvement of an acceptor site

Page 22

Alternative exon-intron structure in fruit flies and the malarial mosquito

• Same procedure (AS data from FlyBase)

– cassette exons, splicing sites

– also mutually exclusive exons, retained introns

• Follow the fate of D. melanogaster exons in the D. pseudoobscura and Anopheles genomes

• Technically more difficult:

– incomplete genomes

– the quality of alignment with the Anopheles genome is lower

– frequent intron insertion/loss (~4.7 introns per gene in Drosophila vs. ~3.5 introns per gene in Anopheles)

Page 23

Conservation of coding segments

                                      constitutive segments   alternative segments
D. melanogaster – D. pseudoobscura    97%                     75-80%
D. melanogaster – Anopheles gambiae   77%                     ~45%

Page 24

Conservation of D. melanogaster elementary alternatives in D. pseudoobscura genes

Legend: blue – exact; green – divided exons; yellow – joined exon; orange – mixed; red – non-conserved

• retained introns are the least conserved (are all of them really functional?)
• mutually exclusive exons are as conserved as constitutive exons

[Bar chart: fraction (0-100%) of each conservation class for constitutive exons, donor sites, acceptor sites, retained introns, cassette exons and mutually exclusive exons.]

Page 25

Conservation of D. melanogaster elementary alternatives in Anopheles gambiae genes

Legend: blue – exact; green – divided exons; yellow – joined exons; orange – mixed; red – non-conserved

• ~30% joined, ~10% divided exons (fewer introns in Anopheles)
• mutually exclusive exons are conserved exactly
• cassette exons are the least conserved

[Bar chart: fraction (0-100%) of each conservation class for constitutive exons, donor sites, acceptor sites, retained introns, cassette exons and mutually exclusive exons.]

Page 26

Evolution of (alternative) exon-intron structure in nine Drosophila spp.

[Figure: phylogenetic tree of D. melanogaster (Dmel), D. sechellia (Dsec), D. yakuba (Dyak), D. erecta (Dere), D. ananassae (Dana), D. pseudoobscura (Dpse), D. mojavensis (Dmoj), D. virilis (Dvir) and D. grimshawi (Dgri).]

D. Pollard, http://rana.lbl.gov/~dan/trees.html

Page 27

Gain and loss of alternative segments and constitutive exons

[Figure: the Drosophila tree (Dmel, Dsec, Dyak, Dere, Dana, Dpse, Dmoj, Dvir, Dgri) annotated with the numbers of gain and loss events on each branch, inferred by Dollo parsimony.]

Notation: patterns with single events / patterns with multiple events

Sample size: 397 / 452; 18596 / 18874

Caveat: we cannot observe exon gain outside the D. melanogaster lineage, or exon loss within it
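Under Dollo parsimony a feature is gained only once, at the last common ancestor of the species that have it, and may then be lost independently on several branches. A minimal sketch, assuming the tree is given as nested Python tuples; the topology below is an illustrative subset of the species (the melanogaster subgroup plus D. ananassae and D. pseudoobscura), not the full tree used here, and all names are hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple, Union

Tree = Union[str, tuple]  # a leaf name or a tuple of subtrees


def leaves(t: Tree) -> FrozenSet[str]:
    if isinstance(t, str):
        return frozenset([t])
    return frozenset().union(*(leaves(c) for c in t))


def lca(t: Tree, present: FrozenSet[str]) -> Tree:
    """Smallest subtree containing all species where the feature is present."""
    if isinstance(t, str):
        return t
    for child in t:
        if present <= leaves(child):
            return lca(child, present)
    return t


def losses(t: Tree, present: FrozenSet[str]) -> int:
    """Minimal number of independent losses inside subtree t."""
    if not leaves(t) & present:
        return 1  # the whole clade lacks the feature: one loss on the branch above
    if isinstance(t, str):
        return 0
    return sum(losses(c, present) for c in t)


def dollo(tree: Tree, present: Iterable[str]) -> Tuple[Tree, int]:
    """Single gain at the LCA of the present species, then minimal losses below it."""
    present = frozenset(present)
    if not present:
        raise ValueError("feature absent in all species")
    gain_node = lca(tree, present)
    return gain_node, losses(gain_node, present)


# Illustrative topology: melanogaster subgroup, then D. ananassae, then D. pseudoobscura.
TREE = (((("Dmel", "Dsec"), ("Dyak", "Dere")), "Dana"), "Dpse")
```

Because the exons are taken from D. melanogaster, `present` always contains Dmel; the inferred gain then always lies on the Dmel lineage and all losses lie outside it, which is the caveat stated above.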

Page 28

Evolutionary rate in constitutive and alternative regions

• Human and mouse orthologous genes
• D. melanogaster and D. pseudoobscura

• Estimation of the dN/dS ratio: a higher fraction of non-synonymous (amino-acid-changing) substitutions indicates weaker stabilizing (or stronger positive) selection
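For readers who want to reproduce the basic quantity, below is a minimal Python sketch of the Nei-Gojobori method with Jukes-Cantor correction for two aligned coding sequences. This textbook estimator is chosen for illustration only; the slides do not say which dN/dS method was used, and the stop-codon conventions noted in the comments are simplifying assumptions.

```python
import itertools
import math
from typing import Tuple

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {b1 + b2 + b3: AMINO[16 * i + 4 * j + k]  # standard genetic code, '*' = stop
        for i, b1 in enumerate(BASES)
        for j, b2 in enumerate(BASES)
        for k, b3 in enumerate(BASES)}


def site_counts(codon: str) -> Tuple[float, float]:
    """Synonymous and non-synonymous sites of one codon.
    Changes to stop codons are counted as non-synonymous here."""
    syn = 0.0
    for pos in range(3):
        for b in BASES:
            if b != codon[pos] and CODE[codon[:pos] + b + codon[pos + 1:]] == CODE[codon]:
                syn += 1 / 3
    return syn, 3 - syn


def diff_counts(c1: str, c2: str) -> Tuple[float, float]:
    """Synonymous and non-synonymous differences between two codons, averaged
    over mutational pathways; pathways through stop codons are skipped unless
    no other pathway exists."""
    diffs = [i for i in range(3) if c1[i] != c2[i]]
    paths = []
    for order in itertools.permutations(diffs):
        cur, s, n, via_stop = c1, 0, 0, False
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            via_stop = via_stop or CODE[nxt] == "*"
            if CODE[nxt] == CODE[cur]:
                s += 1
            else:
                n += 1
            cur = nxt
        paths.append((s, n, via_stop))
    usable = [p for p in paths if not p[2]] or paths
    return (sum(p[0] for p in usable) / len(usable),
            sum(p[1] for p in usable) / len(usable))


def jukes_cantor(p: float) -> float:
    if p >= 0.75:
        raise ValueError("too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1: str, seq2: str) -> Tuple[float, float]:
    """(dN, dS) for two aligned, gap-free coding sequences without stop codons."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned codon by codon")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        (s1, n1), (s2, n2) = site_counts(c1), site_counts(c2)
        S += (s1 + s2) / 2
        N += (n1 + n2) / 2
        sd, nd = diff_counts(c1, c2)
        Sd += sd
        Nd += nd
    if S == 0 or N == 0:
        raise ValueError("no synonymous or no non-synonymous sites")
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)
```

The ratio is then `dn / ds`, which is undefined when no synonymous differences are observed; comparing constitutive and alternative regions means running this separately on the concatenated codons of each region class.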

Page 29

Human/mouse genes: non-symmetrical histogram of dN/dS(const) – dN/dS(alt)

[Histogram: number of genes vs. dN/dS(const) – dN/dS(alt), from –1 to 1.]

Black: shadow (mirror image) of the left half. In a larger fraction of genes dN/dS(alt) > dN/dS(const), especially for larger values of the difference

Page 30

Concatenated regions: alternative regions evolve faster than constitutive ones

[Bar charts: dN, dS and dN/dS for constitutive (C) and alternative (A) regions; human-mouse and Drosophila panels.]

Human-mouse:
          C       A
dN        0.068   0.076
dS        0.405   0.414
dN/dS     0.168   0.183

Page 31

Weaker stabilizing selection (or positive selection) in alternative regions

(insignificant in Drosophila)

[Bar charts: the same dN, dS and dN/dS values as on Page 30.]

Page 32

Different behavior of terminal alternatives

C: constitutive; A: alternative; AN: N-terminal alternative; AI: internal alternative; AC: C-terminal alternative

Human-mouse:
          C       A       AN      AI      AC
dN        0.068   0.076   0.076   0.074   0.132
dS        0.405   0.414   0.410   0.437   0.445
dN/dS     0.168   0.183   0.186   0.169   0.297

[Drosophila panel: dN, dS and dN/dS for the same categories.]

Mammals: density of substitutions increases in the N-to-C direction

Drosophila: synonymous substitutions are prevalent in terminal alternative regions; non-synonymous substitutions, in internal alternative regions

Page 33

Many drosophilas: dN in mutually exclusive exons is the same as in constitutive exons; dS is lower in almost all alternatives: regulation?

Page 34

Many drosophilas: relaxed (positive?) selection in alternative regions

Page 35

The McDonald-Kreitman test: evidence for positive selection in (minor isoform) alternative regions

• Human and chimpanzee genome substitutions vs. human SNPs
• Exons conserved in mouse and/or dog
• Genes with at least 60 ESTs (median number)
• Fisher's exact test for significance

          Pn/Ps (SNPs)   Kn/Ks (genomes)   diff.    Signif.
Const.    0.72           0.62              –0.10    0
Major     0.78           0.65              –0.13    0.5%
Minor     1.41           1.89              +0.48    0.1%

Minor isoform alternative regions:
• More non-synonymous SNPs: Pn(alt_minor) = 0.12% >> Pn(const) = 0.06%
• More non-synonymous substitutions: Kn(alt_minor) = 0.91% >> Kn(const) = 0.37%
• Positive selection (as opposed to weaker stabilizing selection): α = 1 – (Pn/Ps) / (Kn/Ks) ≈ 25% (the estimated fraction of non-synonymous substitutions fixed by positive selection)
• Similar results for all highly covered genes or all conserved exons
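The McDonald-Kreitman comparison reduces to a 2x2 table of non-synonymous and synonymous counts for polymorphisms and fixed differences. Below is a minimal sketch with a hand-written two-sided Fisher's exact test, so it needs only the standard library; the function names are hypothetical. As long as SNPs and substitutions are counted in the same regions, the per-site normalizations cancel, so α computed from counts equals α computed from per-site ratios such as those in the table above.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]:
    sum the probabilities of all tables with the same margins that are no
    more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    return sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7))


def mk_test(pn: int, ps: int, dn: int, ds: int):
    """Counts of non-synonymous/synonymous polymorphisms (pn, ps; e.g. human
    SNPs) and fixed differences (dn, ds; e.g. human-chimpanzee).
    Returns (alpha, two-sided p-value)."""
    alpha = 1 - (pn / ps) / (dn / ds)
    return alpha, fisher_exact_two_sided(pn, ps, dn, ds)


def alpha_from_ratios(pn_ps: float, kn_ks: float) -> float:
    """The same alpha from per-site ratios Pn/Ps and Kn/Ks."""
    return 1 - pn_ps / kn_ks


print(round(alpha_from_ratios(1.41, 1.89), 3))  # minor isoform regions: 0.254
```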

Page 36

What does alternative splicing do to proteins?

• Swiss-Prot proteins
• Pfam domains
• Swiss-Prot feature tables

Page 37

[Figure a): expected vs. observed fractions (0-100%) of alternative regions that affect domains completely, non-domain functional units completely, domains partially, non-domain functional units partially, or no annotated unit.]

Alternative splicing avoids disrupting domains (and non-domain units)

Control: fix the domain structure; randomly place alternative regions
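The control can be sketched as an exhaustive version of random placement: keep the domains fixed, slide an alternative region of the same length along the protein, and count how often it removes a domain completely, cuts one partially, or misses all domains. Domains are half-open residue intervals here, non-domain functional units are omitted, and all names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def classify_placement(start: int, length: int, domains: List[Tuple[int, int]]) -> str:
    """Effect of an alternative region [start, start + length) on domains
    given as half-open [d_start, d_end) intervals."""
    end = start + length
    partial = complete = False
    for d_start, d_end in domains:
        if min(end, d_end) <= max(start, d_start):
            continue  # no overlap with this domain
        if start <= d_start and d_end <= end:
            complete = True
        else:
            partial = True
    if partial:
        return "domains partially"
    if complete:
        return "domains completely"
    return "no domain affected"


def expected_counts(protein_len: int, region_len: int,
                    domains: List[Tuple[int, int]]) -> Counter:
    """Keep the domain structure fixed and place the alternative region at
    every possible position, all positions being equally likely."""
    counts = Counter()
    for start in range(protein_len - region_len + 1):
        counts[classify_placement(start, region_len, domains)] += 1
    return counts
```

For example, `expected_counts(100, 30, [(20, 40)])` gives 29 partial, 11 complete and 31 unaffected placements; dividing by their total yields the expected fractions to compare with the observed ones.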

Page 38

… and this is not simply a consequence of the (disputed) exon-domain correlation

[Figure: AS and exon boundaries and SMART domains. Ratio (observed/expected) of boundaries inside and outside domains for non-AS exons (nonAS_Exons), AS exons (AS_Exons) and AS boundaries (AS), in mouse and human.]

Page 39

Positive selection towards domain shuffling (not simply avoidance of disrupting domains)

[Figure a): expected vs. observed fractions of alternative regions that affect domains completely, non-domain functional units completely, domains partially, non-domain functional units partially, or no annotated unit. Figure b): expected vs. observed fractions that remove domains completely, remove non-domain units completely, or affect no annotated units.]

Page 40

Short (<50 aa) alternative splicing events within domains target protein functional sites

[Figure a): expected vs. observed fractions of alternative regions that affect domains completely, non-domain functional units completely, domains partially, non-domain functional units partially, or no annotated unit. Figure c): expected vs. observed fractions of events that affect or leave unaffected Prosite patterns and feature-table (FT) positions.]

Page 41

An attempt at integration

• AS is often species-specific

• young AS isoforms are often minor and tissue-specific

• … but still functional
– although species-specific isoforms may result from aberrant splicing

• AS regions show evidence for decreased negative selection
– excess non-synonymous codon substitutions

• AS regions show evidence for positive selection
– excess fixation of non-synonymous substitutions (compared to SNPs)

• AS tends to shuffle domains and target functional sites in proteins

• Thus AS may serve as a testing ground for new functions without sacrificing old ones

Page 42

What next?

• AS in one species, constitutive splicing in another (data from microarrays)

• Changes in inclusion rates

• Evolution of regulation of AS

• Control for:
– functionality: translated / NMD-inducing (frameshifts, stop codons)
– exon inclusion (or site choice) level: major / minor isoform
– tissue specificity pattern (?)
– type of alternative, 1: N-terminal / internal / C-terminal
– type of alternative, 2: cassette and mutually exclusive exons, alternative sites

Page 43

Acknowledgements

• Discussions
– Eugene Koonin (NCBI)
– Igor Rogozin (NCBI)
– Vsevolod Makeev (GosNIIGenetika)
– Dmitry Petrov (Stanford)
– Dmitry Frishman (GSF, TUM)

• Data
– King Jordan (NCBI)

• Support
– Howard Hughes Medical Institute
– INTAS
– Russian Academy of Sciences (program “Molecular and Cellular Biology”)
– Russian Foundation for Basic Research

Page 44

Authors

• Andrei Mironov (Moscow State University)

• Ramil Nurtdinov (Moscow State University) – human/mouse+rat/dog

• Dmitry Malko (GosNIIGenetika, Moscow) – drosophila/mosquito

• Ekaterina Ermakova (Moscow State University, IITP) – Kn/Ks

• Vasily Ramensky (Institute of Molecular Biology, Moscow) – SNPs, McDonald-Kreitman test

• Evgenia Kriventseva (now at U. of Geneva) and Shamil Sunyaev (now at Harvard U. Medical School) – protein structure

• Irena Artamonova (Inst. of General Genetics, Moscow) – human/mouse, plots, MAGE-A

• Alexei Neverov (GosNIIGenetika, Moscow) – functionality of isoforms

Page 45

Bonus track: conserved secondary structures regulating (alternative) splicing in Drosophila spp.

• ~50,000 introns

• 17% alternative, 2% with alternative polyA signals

• >95% of D. melanogaster introns mapped to at least 7 of 12 other Drosophila genomes

• Search for conserved complementary words at intron termini (within 150 nt of intron boundaries), then align

• Restrictive search => 200 candidates

• 6 tested in experiment (3 constitutive, 3 alternative); all 3 alternative ones confirmed
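The first step of the search, finding complementary words at the two intron ends, can be sketched as follows. The word length `k`, Watson-Crick-only pairing (G-U pairs are ignored) and the requirement that the same word recur in several species are assumptions made for illustration; the subsequent alignment step is not shown, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna: str) -> str:
    return rna.translate(_COMPLEMENT)[::-1]


def complementary_words(intron: str, k: int = 7, window: int = 150) -> List[Tuple[int, str]]:
    """k-mers in the first `window` nt of an intron whose reverse complement
    occurs in the last `window` nt (Watson-Crick pairs only)."""
    rna = intron.upper().replace("T", "U")
    head, tail = rna[:window], rna[-window:]
    tail_words = {tail[i:i + k] for i in range(len(tail) - k + 1)}
    return [(i, head[i:i + k]) for i in range(len(head) - k + 1)
            if revcomp(head[i:i + k]) in tail_words]


def conserved_words(orthologous_introns: Iterable[str], min_species: int,
                    k: int = 7, window: int = 150) -> Set[str]:
    """Words forming complementary pairs in at least `min_species` of the
    orthologous introns (one intron per species)."""
    counts = Counter()
    for intron in orthologous_introns:
        counts.update({w for _, w in complementary_words(intron, k, window)})
    return {w for w, c in counts.items() if c >= min_species}
```

Requiring the identical word in every species is stricter than necessary, since compensatory substitutions can preserve a stem while changing both words; that is one reason to follow the word search with an alignment.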

Page 46

CG33298 (phospholipid-translocating ATPase): alternative donor sites

Page 47

Atrophin (histone deacetylase): alternative acceptor sites

Page 48

Nmnat (nicotinamide mononucleotide adenylyltransferase): alternative splicing and polyadenylation

Page 49

Less restrictive search => many more candidates

Page 50

Properties of regulated introns

• Often alternative
• Longer than usual
• Overrepresented in genes linked to development

Page 51

Authors

• Andrei Mironov (idea)
• Dmitry Pervouchine (bioinformatics)
• Veronica Raker, Centre for Genomic Regulation, Barcelona (experiment)
• Juan Valcárcel, Centre for Genomic Regulation, Barcelona (advice)
• Mikhail Gelfand (general pessimism)