Alternative splicing: A playground of evolution

Mikhail Gelfand

Institute for Information Transmission Problems, RAS

May 2004

Alternative splicing of human (and mouse) genes

5% Sharp, 1994 (Nobel lecture)

35% Mironov-Fickett-Gelfand, 1999

38% Brett-…-Bork, 2000 (ESTs/mRNA)

22% Croft et al., 2000 (ISIS database)

55% Kan et al., 2001 (11% AS patterns conserved in mouse ESTs)

42% Modrek et al., 2001 (HASDB)

~33% CELERA, 2001

59% Human Genome Consortium, 2001

28% Clark and Thanaraj, 2002

all? Kan et al., 2002 (17-28% with total minor isoform frequency > 5%)

41% (mouse) FANTOM & RIKEN, 2002

60% (mouse) Zavolan et al., 2003

• Alternative splicing of orthologous human and mouse genes

• Sequence divergence in alternative and constitutive regions

• Evolution of splicing sites

• Alternative splicing and protein structure

Data

• known alternative splicing
  – HASDB (human, ESTs+mRNAs)
  – ASMamDB (mouse, mRNAs+genes)

• additional variants
  – UniGene (human and mouse EST clusters)

• complete genes and genomic DNA
  – GenBank (full-length mouse genes)
  – human genome

Methods

• Direct comparison of EST-derived alternatives difficult because of uneven coverage.

• Instead, align alternative isoforms from one species to the genomic DNA of other species.

• If alignable (complete exon or part of exon, no significant loss of similarity, no in-frame stops, conserved splicing sites), then the alternative is considered conserved.

• This is an upper estimate on conservation: an isoform may be non-functional for other reasons (e.g. disruption of regulatory sites).

• Cannot analyze skipped exons.
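The conservation criteria listed above can be sketched as a simple predicate. This is only an illustration under stated assumptions, not the Pro-EST/Pro-Frame procedure: the `AlignedAlternative` record, its field names, the canonical GT/AG site check and the `max_identity_drop` threshold are all hypothetical.

```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class AlignedAlternative:
    """Hypothetical record: an alternative region aligned to the other genome."""
    coding_seq: str        # aligned genomic segment, assumed to start in frame
    identity: float        # identity of the alternative region (0..1)
    flank_identity: float  # identity of neighbouring constitutive exons (0..1)
    donor: str = "GT"      # intron bases right after the region
    acceptor: str = "AG"   # intron bases right before the region


def has_in_frame_stop(seq: str) -> bool:
    """True if any complete codon, read from position 0, is a stop codon."""
    seq = seq.upper()
    return any(seq[i:i + 3] in STOP_CODONS for i in range(0, len(seq) - 2, 3))


def is_conserved(alt: AlignedAlternative, max_identity_drop: float = 0.2) -> bool:
    """Apply the three slide criteria; the threshold is an assumed value."""
    similar = alt.flank_identity - alt.identity <= max_identity_drop
    sites_ok = alt.donor.upper() == "GT" and alt.acceptor.upper() == "AG"
    return similar and sites_ok and not has_in_frame_stop(alt.coding_seq)


print(is_conserved(AlignedAlternative("ATGGCCAAA", 0.85, 0.90)))  # no stop codon
print(is_conserved(AlignedAlternative("ATGTAAGCC", 0.85, 0.90)))  # in-frame TAA
```

As on the slide, passing such a check gives only an upper estimate: an isoform can satisfy all three criteria and still be non-functional.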

Tools

• TBLASTN (initial identification of orthologs: mRNAs against genomic DNA)

• BLASTN (human mRNAs against genome)

• Pro-EST (spliced alignment, ESTs and mRNA against genomic DNA)

• Pro-Frame (spliced alignment, proteins against genomic DNA)
  – confirmation of orthology
    • same exon-intron structure
    • >70% identity over the entire protein length
  – analysis of conservation of alternative splicing
    • conservation of exons or parts of exons
    • conservation of sites

166 gene pairs

Known alternative splicing: 126 human genes and 124 mouse genes (42 in human only, 84 in both species, 40 in mouse only)

Elementary alternatives

Cassette exon

Alternative donor site

Alternative acceptor site

Retained intron

Human genes

                    mRNA                 EST
                    cons.  non-cons.     cons.  non-cons.
Cassette exons       56      25           74      26
Alt. donors          18       7           16      10
Alt. acceptors       13       5           19      15
Retained introns      4       3            5       0
Total                96      30          114      51
Total genes          45      28           41      44

Conserved elementary alternatives: 69% (EST) - 76% (mRNA)

Genes with all isoforms conserved: 57 (45%)

Mouse genes

                    mRNA                 EST
                    cons.  non-cons.     cons.  non-cons.
Cassette exons       70       5           39       9
Alt. donors          24       6           17       6
Alt. acceptors       15       6           16       9
Retained introns      8       7           10       4
Total               117      24           82      28
Total genes          68      22           30      26

Conserved elementary alternatives: 75% (EST) - 83% (mRNA)

Genes with all isoforms conserved: 79 (64%)
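The quoted percentages of conserved elementary alternatives follow from the "Total" rows of the two tables. A short check, using only those totals (the dictionary layout and function name are illustrative):

```python
# (conserved, non-conserved) totals of elementary alternatives, from the "Total" rows
totals = {
    ("human", "mRNA"): (96, 30),
    ("human", "EST"): (114, 51),
    ("mouse", "mRNA"): (117, 24),
    ("mouse", "EST"): (82, 28),
}


def conserved_fraction(conserved: int, non_conserved: int) -> float:
    """Share of conserved elementary alternatives among all observed ones."""
    return conserved / (conserved + non_conserved)


for (species, source), (cons, non_cons) in totals.items():
    print(f"{species} {source}: {conserved_fraction(cons, non_cons):.0%} conserved")
```

The complements of these fractions are the non-conserved shares compared on the next slide.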

Real or aberrant non-conserved AS?

• 24-31% human vs. 17-25% mouse elementary alternatives are not conserved

• 55% human vs 36% mouse genes have at least one non-conserved variant

• denser coverage of human genes by ESTs:
  – pick up rare (tissue- and stage-specific) => younger variants
  – pick up aberrant (non-functional) variants

• 17-24% mRNA-derived elementary alternatives are non-conserved (compared to 25-32% EST-derived ones)

smoothelin

[Figure: human, common and mouse isoforms; human-specific donor site; mouse-specific cassette exon]

autoimmune regulator

[Figure: human, common and mouse isoforms; retained intron; downstream exons read in two frames]

Na/K-ATPase gamma subunit (Fxyd2)

[Figure: human, common and mouse isoforms; (deleted) intron; alternative acceptor site within (inserted) intron]

MutS homolog (DNA mismatch repair)

[Figure: human and common isoforms; dual donor/acceptor site]

Modrek and Lee, 2003:

• conserved skipped exons:
  – 98% constitutive
  – 98% major form
  – 28% minor form

• inclusion level:
  – highly correlated
  – good predictor of conservation

• Minor non-conserved form exons are not aberrant:
  – minor form exons are supported by multiple ESTs
  – 28% of minor form exons are upregulated in one specific tissue
  – 70% of tissue-specific exons are not conserved

Thanaraj et al., 2003:

61% (47-86%) alternative splice junctions are conserved

• Alternative splicing of orthologous human and mouse genes

• Sequence divergence in alternative and constitutive regions

• Evolution of splicing sites

• Alternative splicing and protein structure

Our preliminary observations: less synonymous, more non-synonymous divergence in alternative exons (human/mouse) => positive selection towards variability

“Contrary to our prediction, synonymous divergence between humans and non-human mammals was significantly higher in constitutive exons … Intriguingly, non-synonymous divergence was marginally significantly higher in alternative exons”

Iida and Akashi, 2000

279 proteins from SwissProt+TREMBL with “varsplic” features

                 constitutive    alternative    % alt. to all
length           199270          66054          25%
all SNPs         1126            368            25%
synonymous       576 (51%)       167 (45%)      22%
benign           401 (36%)       141 (38%)      26%
damaging         149 (13%)       60 (16%)       29%

Again, there is some evidence of positive selection towards diversity. This is not due to aberrant ESTs (only protein data are considered).
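The "% alt. to all" column and the strength of the trend can be reproduced from the SNP counts. The sketch below recomputes the shares and adds a plain chi-square test of homogeneity. The test is an illustrative check that is not part of the original slides, and the function names are assumptions.

```python
import math

# SNP counts from the table: class -> (constitutive, alternative)
counts = {
    "synonymous": (576, 167),
    "benign": (401, 141),
    "damaging": (149, 60),
}


def alt_share(constitutive: int, alternative: int) -> float:
    """Fraction of SNPs of one class that fall in alternative regions."""
    return alternative / (constitutive + alternative)


def chi_square(table: dict) -> float:
    """Pearson chi-square statistic for a (classes x 2) contingency table."""
    rows = list(table.values())
    col_totals = [sum(row[j] for row in rows) for j in range(2)]
    grand_total = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for j in range(2):
            expected = row_total * col_totals[j] / grand_total
            stat += (row[j] - expected) ** 2 / expected
    return stat


for snp_class, (cons, alt) in counts.items():
    print(f"{snp_class}: {alt_share(cons, alt):.0%} in alternative regions")

stat = chi_square(counts)
# With 3 classes and 2 region types there are 2 degrees of freedom,
# for which the chi-square survival function is exactly exp(-x / 2).
p_value = math.exp(-stat / 2)
print(f"chi-square = {stat:.2f}, p = {p_value:.3f}")
```

On these counts the statistic is about 4.3 (p of roughly 0.12), in line with the cautious wording "some evidence".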

• Alternative splicing of orthologous human and mouse genes

• Sequence divergence in alternative and constitutive regions

• Evolution of splicing sites

• Alternative splicing and protein structure

Alternative splicing in a multigene family: the MAGEA family of cancer/testis specific antigens

• A locus at the X chromosome containing eleven recently duplicated genes: two subfamilies of four genes each and three single genes

• One protein-coding exon, multiple different 5’-UTR exons

• Originates from retroposed spliced mRNA

• Mutations create new splicing sites or disrupt existing sites

Phylogenetic trees (protein-coding and upstream regions)

Expression data

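• pooled by organ/tissue; maximum recorded expression level retained

• no data for MAGEA10; MAGEA3 and MAGEA6 likely non-distinguishable

• green: normal; brown: cancer

Expression levels by tissue (organ). Gene columns in the original table: MAGEA8, 9, 10, 11, 4, 1, 5, 3, 6, 2, 12. Empty cells are not shown, so the values of each row are listed in column order without gene assignment; the last value is the row total.

testis: 0, 1.5, 3, 3, 2.5, 3, 3, 3, 3; total 22
chronic myelogenous leukemia (K562): 2, 3, 3, 3, 3, 3, 3; total 20
thymus (THY): 1.5, 1.5, 1.5, 1.5, 1.5, 2.5, 2.5, 1.5, 2.5; total 16.5
placenta: 2, 1.5, 2, 2, 0, 2.5, 2.5, 1.5; total 14
ovary: 2, 1.5, 1.5, 1.5, 2, 2, 1.5; total 12
pancreas: 1.5, 1.5, 1.5, 1.5, 1.5, 2; total 9.5
brain (fetal, cortex, amygdala, etc.): 1.5, 1.5, 0, 2, 2, 1.5; total 8.5
umbilical vein endothelium (HUVEC): 1.5, 1.5, 2.5; total 5.5
10N: 2.5, 2.5; total 5
uterus: 1.5, 1.5, 1.5; total 4.5
Burkitt's lymphoma (DAUDI): 1.5, 2.5; total 4
prostate cancer? (PC4, PC6, PC8): 2, 2; total 4
acute lymphoblastic leukemia (MOLT4): 2, 2; total 4
salivary gland: 1.5, 1.5; total 3
lung: 1.5, 1.5; total 3
heart: 2; total 2
spleen: 2; total 2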

Simple genes with alternatives in exon 1 (MAGEA1, MAGEA5, MAGEA3/6)

[Figure: exon diagrams for MAGEA1, MAGEA5 (normal placenta), MAGEA3 and MAGEA6 (testis, brain/medulla, cancer); exon labels 1, 1a and 1b]

Two more genes of subfamily B: multiple isoforms of MAGEA2 and a deletion in MAGEA12

[Figure: isoform diagrams for MAGEA2 and MAGEA12; exon labels include 1, 2a, 4d and 5]

Isoforms of subfamily A

[Figure: isoform diagrams for MAGEA8, MAGEA9 (testis, no cancers), MAGEA10 and MAGEA11; exon labels include 1, 2, 2d, 3, 4a, 4b and 4c]

Multiple duplications of the initial exon in MAGEA4

[Figure: isoform diagrams for MAGEA4 (testis and cancers; brain/medulla; also common 3’ ESTs in placenta)]

Chimaeric mRNAs (splicing of readthrough transcripts)

[Figure: two chimaeric transcripts. Labels of the first: initial exon of MAGEA10, exons of MAGEA5, exon in intergenic space. Labels of the second: initial exon of MAGEA12, exons of BC013171, exon in intergenic space]

Other examples:

• galactose-1-phosphate uridylyltransferase + interleukin-11 receptor alpha chain (Magrangeas et al., 1998)

• P2Y11 [receptor] + SSF1 [nuclear protein] (Communi et al., 2001)

• PrP [prion protein] + Dpl [prion-like protein Doppel] (Moore et al., 1999)

• cytochrome P450 3A: CYP3A7 + two exons of a downstream pseudogene read in a different frame (Finta & Zaphiropoulos, 2000)

• HHLA1 + OC90 [otoconin-90] (Kowalski et al., 1999)

• TRAX [translin-associated factor X] + DISC1 [candidate schizophrenia gene] (Millar et al., 2000)

• Kua + UEV1 [polyubiquitination coeffector] (Thomson et al., 2000)

• FR + GAP [Rho GTPase activating protein] (Romani et al., 2003) - ?

• methionyl tRNA synthetase + advillin (Romani et al., 2003) - ?

Birth of donor sites (new GT in alternative initial exon 5)

Ancestral gene: GCCAGGCACGCGGATCCTGACGTTCACATCTAGGGCT
MAGEA3          GCCAGGCACGTGAGTCCTGAGGTTCACATCTACGGCT
MAGEA6          GCCAGGCACGTGAGTCCTGAGGTTCACATCTACGGCT
MAGEA2          GCCAAGCACGCGGATCCTGACGTTCACATGTACGGCT
MAGEA12         GCCAAGCACGCGGATCCTGACGTTCACATCTGTGGCT
MAGEA1          GCCAGGCACTCGGATCTTGACGTCCCCATCCAGGGCT
MAGEA4          --CAGGCACTCGGATCTTGACATCCACATCGAGGGCT
MAGEA5          GACAGGCACACCCATTCTGACGTCCACATCCAGGGCT
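Candidate "newborn" donor sites in such an alignment can be located by listing GT dinucleotides that are present in a gene but absent at the same column of the reconstructed ancestral sequence. The sketch below does only that. The function names are illustrative, and a gained GT is a candidate: whether it is actually used as a donor site has to come from transcript data.

```python
def dinucleotide_positions(seq: str, dinucleotide: str = "GT") -> set:
    """0-based alignment columns at which the dinucleotide starts."""
    seq = seq.upper()
    return {i for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide}


def gained_sites(ancestral: str, derived: str, dinucleotide: str = "GT") -> list:
    """Columns where the derived sequence has the dinucleotide and the ancestor does not."""
    if len(ancestral) != len(derived):
        raise ValueError("sequences must come from the same alignment")
    gained = dinucleotide_positions(derived, dinucleotide)
    return sorted(gained - dinucleotide_positions(ancestral, dinucleotide))


# Rows copied from the alignment above
ancestral = "GCCAGGCACGCGGATCCTGACGTTCACATCTAGGGCT"
magea3 = "GCCAGGCACGTGAGTCCTGAGGTTCACATCTACGGCT"
magea12 = "GCCAAGCACGCGGATCCTGACGTTCACATCTGTGGCT"

print("MAGEA3 gained GT at columns:", gained_sites(ancestral, magea3))
print("MAGEA12 gained GT at columns:", gained_sites(ancestral, magea12))
```

The same helper can be applied to acceptor sites by passing `dinucleotide="AG"`.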

Birth of an acceptor site (new AG and polyY tract in MAGEA8-specific cassette exon 3)

MAGEA3          TTGAGGGTACC-----------CCTGGGA---CAGAATGCGGA
MAGEA6          TTGAGGGTACC-----------CCTGGGA---CAGAATGCGGA
MAGEA2          TTGAGGGTACT-----------CCTGGGC---CAGAATGCAGA
MAGEA12         TTGAGGGTACC-----------CCTGGGC---CAGAACGCTGA
MAGEA1          CTGAGGGTACC-----------CCAGGAC---CAGAACACTGA
MAGEA4          TTGAGGGTACC-----------ACAGGGC---CAGAACGCAGA
MAGEA5          TTGAGGGCACC-----------CTTGGGC---CAGAACACAGA
MAGEA8          TTGAGGGTACCCTCGATGGTTCTCCTAGCAGGCAAAAAACAGA
MAGEA9          TCGAGGGTACC-----------TCCAGGC---CAGAGAAACTC
MAGEA10         CTGAGGGTACC-----------CCCAGCC---CATAACACAGA
MAGEA11         TTGAGGGTTCC-----------TCCTGGC---CAGAACACAGA

Birth of an alternative donor site (enhanced match to the consensus (AG) in cassette exon 2)

Ancestral gene: GAGCTCCAGGAACmAGGCAGTGAGGCCTTGGTCTG
MAGEA3          GAGCTCCAGGAACAAGGCAGTGAGGACTTGGTCTG
MAGEA6          GAGCTCCAGGAACAAGGCAGTGAGGACTTGGTCTG
MAGEA2          GAGCTCCAGGAACCAGGCAGTGAGGCCTTGGTCTG
MAGEA12         GAGTTCCAAGAACAAGGCAGTGAGGCCTTGGTCTG
MAGEA1          GAGCTCCAGGAACCAGGCAGTGAGGCCTTGGTCTG
MAGEA4          GAGCTCCAGGAACAAGGCAGTGAGGCCTTGGTCTG
MAGEA5          GAGCTCCAGGAAACAGACACTGAGGCCTTGGTCTG
MAGEA8          GAGCTCCAGGAACCAGGCTGTGAGGTCTTGGTCTG
MAGEA9          GAGCTCCAGGAA----GCAGGCAGGCCTTGGTCTG
MAGEA10         GAGCTCCAGGGACTGTGAGGTGAGGCCTTGGTCTA
MAGEA11         AAGCTCCAAAAACTGAGCAGTGAGGCCTTGGTCTC

Birth of an alternative acceptor site (enhanced polyY tract in cassette exon 4)

Ancestral gene: AGGGGCCCCCATGTGGTCGACAGACACAGTGG
MAGEA3          AGGGGCCCCTATGTGGTGGACAGATGCAGTGG
MAGEA6          AGGGGCCCCTATGTGGTGGACAGATGCAGTGG
MAGEA2          AGGGGCCCCCATCTGGTCGACAGATGCAGTGG
MAGEA12         AGGGGCCCCCATGTAGTCGACAGACACAGTGG
MAGEA1          AGGGACCCCCATCTGGTCTAAAGACAGAGCGG
MAGEA4          AGGGACCCCCATCTGGTCTACAGACACAGTGG
MAGEA5          AGGGGCCCCCATCTGGTGGATAGACAGAGTGG
MAGEA8          AGGGACCCCCATGTGGGCAACAGACTCAGTGG
MAGEA9          AGGGAGGCCC-TGTGTTCGACAGACACAGTGG
MAGEA10         AGGGAACCCC-TCTTTTCTACAGACACAGTGG
MAGEA11         AAAGAGCCCCATATGGTCCACAACTACAGTGG

Inactivation of a donor site and birth of a new site (non-consensus G and new GT in major-isoform cassette exon 4)

Ancestral gene: GCCAAGmGTCCAGGTGAGGAACCGGAGGGAGGATTGAGGGTACC
MAGEA3          GCCAAGCATCCAGGTGAAGAGACTGAGGGAGGATTGAGGGTACC
MAGEA6          GCCAAGCATCCAGGTGAAGAGACTGAGGGAGGATTGAGGGTACC
MAGEA2          GCCAAGCATCCAGGTGGAGAGCCTGAGGTAGGATTGAGGGTACT
MAGEA12         ACCAAGCATCCAGGTGAGAAGCCTGAGGTAGGATTGAGGGTACC
MAGEA1          GCCATGCGTTCGGGTGAGGAACATGAGGGAGGACTGAGGGTACC
MAGEA4          GCCAAGAGTCCTGGTGAGGAATGTGAGGGAGGATTGAGGGTACC
MAGEA5          GTCAGTAGTTCCGGTGAGGAACATGAGGGACGATTGAGGGCACC
MAGEA8          ACCAAGAGTCTAGGTGACAACACTGAGGGAAGATTGAGGGTACC
MAGEA9          GAGAGCAGTCCAGGTGAGGAACCTAAGGGAGGATCGAGGGTACC
MAGEA10         GACAAGAGTCCAGGTAAGGAACCTGAGGGAAATCTGAGGGTACC
MAGEA11         GCCAAGAGTCCAGGTGAGAAACCTGAGGGAGGATTGAGGGTTCC

Series of mutations sequentially activating downstream acceptor sites (mutated AG in exon 4)

Ancestral gene: CCTCCTCACTTCTGTTTCCAGATCTCAGGGAGGTGAGG
MAGEA2          CCTCCTCACTTCTGTTTCCAGATCTCAGGGAGTTGATG
MAGEA12         CCTCCTCACTTCTGTTTCCAGATCTCAGGGAGTTGAGG
MAGEA1          TCTTTTCACTCCTGTTTCCAGATCTGGGGCAGGTGAGG
MAGEA4          CCTTCTCATTTCTGATTCCAGATCTCAGTGAGGTGAGG
MAGEA5          CCATCTCATTCCTGTTTTCAGATCTCGGGGAGGTGAGG
MAGEA8          GCTCCTCATTTCTCTCTTGAGATCTCAGGGAAGTGAGG
MAGEA9          CCTCCTCACCTCTGTTTCTGGATCTCAGGGAGGTGAGG
MAGEA10         CCTTCTTACTTTTGTTTTGGAATCTCAGGGAGGTGAGA
MAGEA11         CCC-CTTACTTCTGTTTTGGAATCTTGGGCAGGTGAGC

• Alternative splicing of orthologous human and mouse genes

• Sequence divergence in alternative and constitutive regions

• Evolution of splicing sites

• Alternative splicing and protein structure

Data

• Alternatively spliced genes (proteins) from SwissProt
  – human
  – mouse

• Protein structures from PDB

• Domains from InterPro
  – SMART
  – Pfam
  – Prosite
  – etc.

[Figure a): stacked bars, Expected vs. Observed, scale 0-100%. Categories: domains completely; non-domain functional units completely; no annotated unit affected; domains partially; non-domain functional units partially. Value labels: 6%, 10%, 15%, 37%, 40%, 34%, 21%, 19%, 6%, 13%]

Alternative splicing avoids disrupting domains (and non-domain units)

Control:

fix the domain structure; randomly place alternative regions
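A randomization control of this kind can be sketched as follows: keep the domain coordinates of a protein fixed, drop an alternative region of the observed length at a uniformly random position, and record how it relates to the domains. The category names follow the figure legend. The interval representation, the function names and the tie-breaking rule (any partial overlap takes precedence) are assumptions made for illustration, not the authors' implementation.

```python
import random
from collections import Counter


def classify(region, domains):
    """Classify a half-open region (start, end) against half-open domain intervals."""
    start, end = region
    covers_whole_domain = False
    for d_start, d_end in domains:
        if min(end, d_end) <= max(start, d_start):
            continue  # no overlap with this domain
        if start <= d_start and d_end <= end:
            covers_whole_domain = True
        else:
            return "domain partially"  # assumed rule: partial overlap wins
    if covers_whole_domain:
        return "domain completely"
    return "no annotated unit affected"


def expected_distribution(protein_length, domains, region_length, trials=10_000, seed=0):
    """Expected category frequencies under uniformly random placement of the region."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        start = rng.randrange(protein_length - region_length + 1)
        counts[classify((start, start + region_length), domains)] += 1
    return {category: n / trials for category, n in counts.items()}


# Hypothetical 300-residue protein with two domains and a 40-residue alternative region
print(expected_distribution(300, [(20, 90), (150, 260)], 40))
```

Comparing such expected frequencies with the observed ones, summed over all proteins, gives the Expected and Observed bars.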

… and this is not simply a consequence of the (disputed) exon-domain correlation

[Figure: AS & exon boundaries and SMART domains. Ratio (observed/expected), scale 0 to 1, of boundaries inside and outside domains for nonAS_Exons, AS_Exons and AS, each shown for mouse and human]

Positive selection towards domain shuffling (not simply avoidance of disrupting domains)

[Figure b): Expected vs. Observed for a subset of the categories of panel a): domains completely; non-domain units completely; no annotated units affected]

Short (<50 aa) alternative splicing events within domains target protein functional sites

[Figure c): Expected vs. Observed for Prosite patterns unaffected, Prosite patterns affected, FT positions unaffected and FT positions affected]

An attempt of integration

• AS is often young (as opposed to degenerating)

• young AS isoforms are often minor and tissue-specific

• … but still functional
  – although unique isoforms may be the result of aberrant splicing

• AS regions show evidence for positive selection
  – excess damaging SNPs
  – excess non-synonymous codon substitutions

• MAGEA - not aberrant, because explainable by effects of mutations

What to do

• Each isoform (alternative region) can be characterized:
  – by conservation (between genomes)
  – if conserved, by selection (positive vs negative)
    • human-mouse, also add rat
  – pattern of SNPs (synonymous, benign, damaging)
  – tissue-specificity
    • in particular, whether it is cancer-specific
  – degree of inclusion (major/minor)
  – functionality (for isoforms)
    • whether it generates a frameshift
    • how bad it is (the distance between the stop-codon and the last exon-exon junction)
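The last criterion, the distance between the stop codon and the last exon-exon junction, is straightforward to compute from exon lengths in mRNA coordinates. The sketch below is illustrative: the function names are hypothetical, and the 50 nt threshold is an assumption taken from the commonly cited rule for nonsense-mediated decay, not a value given on the slides.

```python
def last_junction(exon_lengths):
    """mRNA coordinate (0-based) at which the last exon starts; None for single-exon transcripts."""
    if len(exon_lengths) < 2:
        return None
    return sum(exon_lengths[:-1])


def stop_to_last_junction(exon_lengths, stop_end):
    """Distance in nt from the end of the stop codon to the last junction.

    Positive values mean the stop codon lies upstream of the junction.
    """
    junction = last_junction(exon_lengths)
    if junction is None:
        return None
    return junction - stop_end


def is_nmd_candidate(exon_lengths, stop_end, threshold=50):
    """Flag isoforms whose stop codon ends more than `threshold` nt upstream (assumed cutoff)."""
    distance = stop_to_last_junction(exon_lengths, stop_end)
    return distance is not None and distance > threshold


exons = [100, 200, 150]  # hypothetical exon lengths; the last junction is at position 300
for stop_end in (200, 280, 400):
    print(stop_end, stop_to_last_junction(exons, stop_end), is_nmd_candidate(exons, stop_end))
```

A frameshift-inducing alternative typically moves `stop_end` upstream, so the same function can score "how bad" a frameshifted isoform is.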

What to expect (hypotheses)

• Cancer-specific isoforms will be less functional and more often non-conserved

• Non-conserved isoforms will contain a larger fraction of non-functional isoforms; and this may influence evolutionary conclusions

• Still, after removal of non-functional isoforms, one should see positive selection in alternative regions (more non-synonymous substitutions compared to constant regions etc.); especially in tissue-specific ones.

Plans

• careful and detailed analysis of human-mouse-(rat)-((dog)) AS isoforms (human and mouse ESTs)

• conservation of AS regulatory sites

• mosquito-drosophila

• more families of paralogs; add mouse data

• AS of transcription factors and receptors

Acknowledgements

• Discussions
  – Vsevolod Makeev (GosNIIGenetika)
  – Eugene Koonin (NCBI)
  – Igor Rogozin (NCBI)
  – Dmitry Petrov (Stanford)

• Support
  – Ludwig Institute of Cancer Research
  – Howard Hughes Medical Institute

Authors

• Andrei Mironov (GosNIIGenetika) – spliced alignment

• Shamil Sunyaev (EMBL, now Harvard University Medical School) – protein structure

• Vasily Ramensky (Institute of Molecular Biology) – SNPs

• Irena Artamonova (Institute of Bioorganic Chemistry) – human/mouse comparison, MAGEA family

• Dmitry Malko (GosNIIGenetika) – mosquito/drosophila comparison

• Eugenia Kriventseva (EBI, now BASF) – protein structure

• Ramil Nurtdinov (Moscow State University) – human/mouse comparison

• Ekaterina Ermakova (Moscow State University) – evolution of alternative/constitutive regions

References

Nurtdinov RN, Artamonova II, Mironov AA, Gelfand MS (2003) Low conservation of alternative splicing patterns in the human and mouse genomes. Human Molecular Genetics 12: 1313-1320.

Kriventseva EV, Koch I, Apweiler R, Vingron M, Bork P, Gelfand MS, Sunyaev S. (2003) Increase of functional diversity by alternative splicing. Trends in Genetics 19: 124-128.

Brudno M, Gelfand MS, Spengler S, Zorn M, Dubchak I, Conboy JG (2001) Computational analysis of candidate intron regulatory elements for tissue-specific alternative pre-mRNA splicing. Nucleic Acids Research 29: 2338-2348.

Dralyuk I, Brudno M, Gelfand MS, Zorn M, Dubchak I (2000) ASDB: database of alternatively spliced genes. Nucleic Acids Research 28: 296-297.

Mironov AA, Fickett JW, Gelfand MS (1999). Frequent alternative splicing of human genes. Genome Research 9: 1288-1293.
