
Page 1: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004

On the biological significance of alternative splicing: a bioinformatics approach

Sandro J. de Souza

TDR, 07/05/2004

RNA 10:757-765, 2004


Genomics

Bioinformatics

Large-scale Biology


The Real Revolution

Early 20th century: rediscovery of Mendel's inheritance laws

Mid 20th century: DNA as the genetic element (Avery)

Mid 20th century: Watson and Crick and the structure of DNA.

1970s and 1980s: Molecular biology/biotechnology

1990s and 21st century: Genomics and Bioinformatics

Paradigm in Biology: Evolution by means of natural selection (Darwin and Wallace, mid 19th century)


Bioinformatics

Development of tools

Gateway to explore new datasets

Processing of data derived from large-scale projects

A new way to do hypothesis-driven science


Splicing (1977)

Roberts and Sharp (Nobel 1993)


Exons Introns

mRNA

Coding Non-coding


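Splicing

Splicing depends on recognition of exon-intron boundaries

Splice sites are generic and consist solely of:

5’ boundary

3’ boundary

Acceptor site

Polypyrimidine tract

Consensus at the exon-intron boundaries (% conservation of each base in parentheses; | marks an exon/intron boundary):

5’ site: Exon A (64) G (73) | G (100) U (100) A (62) A (68) G (84) U (63) … Intron

3’ site: Intron … Py12 N C (65) A (100) G (100) | N Exon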
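As a rough illustration of how the 5’ consensus on this slide could be used, the sketch below scores a candidate site by the share of consensus weight it matches. Only the bases and percentages come from the slide; the weighting scheme and the function names are assumptions made for illustration, not a method described in the talk.

```python
# 5' splice site consensus from the slide: two exon bases (A, G) followed by
# six intron bases (G U A A G U), each with its conservation percentage.
FIVE_PRIME_CONSENSUS = [
    ("A", 64), ("G", 73),               # last two exon bases
    ("G", 100), ("U", 100), ("A", 62),  # first intron bases
    ("A", 68), ("G", 84), ("U", 63),
]


def score_five_prime_site(site: str) -> float:
    """Return the fraction (0 to 1) of total consensus weight matched by `site`.

    `site` is an 8-base RNA or DNA window: 2 exon bases + 6 intron bases.
    Hypothetical, simplistic weighting; not a full position weight matrix.
    """
    site = site.upper().replace("T", "U")
    if len(site) != len(FIVE_PRIME_CONSENSUS):
        raise ValueError("expected an 8-base window")
    total = sum(weight for _, weight in FIVE_PRIME_CONSENSUS)
    matched = sum(
        weight
        for base, (consensus_base, weight) in zip(site, FIVE_PRIME_CONSENSUS)
        if base == consensus_base
    )
    return matched / total


def has_invariant_gu(site: str) -> bool:
    """True if the intron starts with the GU dinucleotide (100% on the slide)."""
    return site.upper().replace("T", "U")[2:4] == "GU"
```

A window such as `"AGGTAAGT"` matches every consensus position, while a window that keeps only the invariant GU still passes `has_invariant_gu` but receives a low score.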


.....if they occur at the boundaries of the regions to be spliced out, can change the splicing pattern, resulting in the deletion or addition of whole sequences of amino acids.

Walter Gilbert. Why genes in pieces? Nature 271:501, 1978.


At least half of all human genes undergo alternative splicing

Biological significance or spurious events?


Alternative splicing

1. Chromosomal (X:autosome) ratio activates transcription of Sxl in females only

2. SXL controls splicing of tra mRNA

3. Females: exon 2 (which has a stop codon) is removed via SXL. Males: exon 2 is not removed.

4. Males: no active TRA. Females: TRA is made.

5. TRA directs splicing of dsx mRNA in a specific manner; in males default splicing occurs.
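The five steps above form a simple regulatory cascade, which can be sketched as boolean logic. This is a toy model only: the function name, the return strings, and the use of a single threshold on the X:autosome ratio (1.0 for XX females, 0.5 for XY males) are simplifying assumptions, not part of the talk.

```python
def dsx_splicing(x_to_autosome_ratio: float) -> str:
    """Toy model of the Drosophila sex-determination splicing cascade."""
    # Step 1: Sxl is transcribed only at the female X:autosome ratio
    # (threshold of 1.0 is a simplification for illustration).
    sxl_active = x_to_autosome_ratio >= 1.0

    # Steps 2-3: SXL removes tra exon 2, which carries a stop codon.
    tra_exon2_retained = not sxl_active

    # Step 4: functional TRA is made only when exon 2 is removed.
    tra_functional = not tra_exon2_retained

    # Step 5: TRA directs female-specific dsx splicing; otherwise default.
    if tra_functional:
        return "female-specific dsx splicing"
    return "default (male) dsx splicing"
```

Calling `dsx_splicing(1.0)` follows the female branch and `dsx_splicing(0.5)` the male default, mirroring the two outcomes listed on the slide.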


Alternative Splicing – Auditory Hair Cells

Cytosol

PM

AVSGRKAVSGRKAMFARYVPEIAALILNRKKYGGTFNSTRGRK

Ca2+ concentration at which K+ channel opens depends on alternative splicing of K+ channel – 576 possible alternative splicing combinations

K+ channel

Dotted lines show regions of the protein dependent on splicing

Picture of human cochlear hair cells from http://www.sickkids.on.ca/otolaryngology/Hearloss.asp

Sound frequency

Cytosolic Ca2+ concentration

K+ channel opens

Therefore Ca2+ concentration ‘decodes’ frequency


Types of alternative splicing:

Exon skipping

Intron retention

Alternative 5’ splice site

Alternative 3’ splice site

[Figure: each type drawn on an mRNA, 5’ to 3’]
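The four event types listed on this slide can be told apart by comparing exon coordinates of a variant transcript against a reference transcript. The sketch below is a minimal illustration under stated assumptions: exons are `(start, end)` genomic coordinates on the forward strand, both transcripts come from the same gene, and all function names are hypothetical. It is not the pipeline used in the study.

```python
def introns(exons):
    """Return introns as (donor, acceptor) pairs between consecutive exons."""
    exons = sorted(exons)
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def classify_events(reference, variant):
    """Return the set of alternative splicing event types seen in `variant`."""
    ref_introns = introns(reference)
    events = set()

    # Intron retention: a reference intron lies entirely inside a variant exon.
    for donor, acceptor in ref_introns:
        if any(start <= donor and acceptor <= end for start, end in variant):
            events.add("intron retention")

    for donor, acceptor in introns(variant):
        if (donor, acceptor) in ref_introns:
            continue  # same intron as the reference, nothing alternative
        if any(donor < start and end < acceptor for start, end in reference):
            # The variant intron swallows a whole reference exon.
            events.add("exon skipping")
        elif any(d == donor for d, _ in ref_introns):
            # Same donor, different acceptor.
            events.add("alternative 3' splice site")
        elif any(a == acceptor for _, a in ref_introns):
            # Same acceptor, different donor.
            events.add("alternative 5' splice site")
    return events
```

For example, with a reference of `[(100, 200), (300, 400), (500, 600)]`, a variant of `[(100, 400), (500, 600)]` is reported as intron retention because the first reference intron is covered by a single variant exon.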


Large-scale analysis of intron retention in the human transcriptome

Pedro F.A. Galante, Noboru Jo Sakabe, Natanja Slager, Sandro J. de Souza


Examples of intron retention events with biological significance

Msl2 in Drosophila

P element in Drosophila

Retroviruses


Ig gene

Immature B cells express membrane-bound Ig. Activation leads to production of the secreted form.

In immature B cells, an intron containing an early translational stop signal is removed, yielding a long transcript. The additional sequence encodes a transmembrane region.

This intron is not removed in activated B cells, giving rise to a truncated (secreted) product.

[Figure labels: Immature B cell; Activation; Stop codons; Hydrophilic stretch; Hydrophilic tail; Transmembrane domain]


Intron retention and cancer

Gene                   Tumor
CD44                   several tumors
Gastrin receptor       pancreas
Ret tyrosine kinase    pheochromocytomas
Fas receptor           T-cell lymphoma


Transcriptome Database

EST data

Known mRNAs

SAGE data

Genome Data


Genome-based cDNA clustering

[Figure labels: DNA with Exon 1, Exon 2, Exon 3; mRNA; cluster]
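A minimal sketch of what genome-based clustering can look like: cDNA sequences that have already been aligned to the genome are grouped whenever their genomic spans overlap on the same chromosome and strand. The tuple layout, the function name, and the overlap-only criterion are assumptions made for illustration; the slide does not specify the clustering rule, and real pipelines often also require shared exon or splice boundaries.

```python
from collections import defaultdict


def cluster_by_genome(alignments):
    """Group sequence IDs whose genomic spans overlap.

    `alignments` is an iterable of (seq_id, chrom, strand, start, end),
    with inclusive coordinates. Returns a list of clusters (lists of IDs).
    """
    by_locus = defaultdict(list)
    for seq_id, chrom, strand, start, end in alignments:
        by_locus[(chrom, strand)].append((start, end, seq_id))

    clusters = []
    for _, spans in sorted(by_locus.items()):
        spans.sort()  # sweep left to right along the chromosome
        current, current_end = [], None
        for start, end, seq_id in spans:
            if current and start > current_end:
                # Gap found: close the running cluster and start a new one.
                clusters.append(current)
                current, current_end = [], None
            current.append(seq_id)
            current_end = end if current_end is None else max(current_end, end)
        if current:
            clusters.append(current)
    return clusters
```

With two overlapping alignments on one locus and a third further downstream, the function returns two clusters, the first containing both overlapping sequences.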


Transcript Mapping

P53


Types of Data


Dataset

Retention \ Prototype    Full length    EST     Total
Full length              640            691     1120
EST                      2594           n.d.    2594
Total                    2793           691     3127


Experimental validation


14% of all human genes show evidence of intron retention

Kan, States & Gish (2002): 36% of RefSeq database!

After sample statistics: 5%


Distribution of events along transcripts.

Elite group

Events in    Observed      Expected
CDS          287 (53%)     502 (93%)
5’ UTR       84 (15%)      27 (5%)
3’ UTR       170 (32%)     12 (2%)

p << 0.005

MGC

Events in    Observed      Expected
CDS          87 (52%)      155 (93%)
5’ UTR       15 (9%)       8 (5%)
3’ UTR       65 (39%)      4 (2%)

p << 0.005

This bias can be a product of:

Underreporting of sequences

Nonsense-mediated decay (NMD)
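One way to check observed against expected counts like those on this slide is a chi-square goodness-of-fit test. The slide does not say which test was used, so this is an assumption; the function names are also hypothetical. With three regions there are 2 degrees of freedom, for which the upper-tail probability has the closed form exp(-chi2 / 2), so only the standard library is needed.

```python
import math


def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic for matched count lists."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df2(chi2):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-chi2 / 2)


# Counts from the slide, in the order CDS, 5' UTR, 3' UTR.
elite_observed, elite_expected = [287, 84, 170], [502, 27, 12]
mgc_observed, mgc_expected = [87, 15, 65], [155, 8, 4]

elite_chi2 = chi_square_gof(elite_observed, elite_expected)
mgc_chi2 = chi_square_gof(mgc_observed, mgc_expected)
```

Both statistics are far above the df = 2 critical value for p = 0.005 (about 10.6), which agrees with the "p << 0.005" reported on the slide. Note that the MGC expected count for the 3’ UTR is below 5, where the chi-square approximation is usually considered unreliable, so an exact test may be preferable there.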


2563 out of 3195 (80%) sequences with a retained intron had an exon/exon boundary downstream of the retention event.


Retained introns are shorter

p << 0.001


Domains encoded by retained introns


Number of domains entirely encoded by:

Retained introns only: 02

Exon-intron-exon: 31

Number of domains partially encoded by:

Retained introns only: 25

Exon-intron-exon: 10


Retained introns have a higher GC content

p << 0.001
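GC content is the fraction of G and C bases in a sequence. A minimal sketch of the calculation follows; the function name and the choice to ignore ambiguous bases such as N are assumptions, since the slide does not describe how GC content was computed.

```python
def gc_content(seq: str) -> float:
    """Fraction of G/C among the unambiguous bases (A, C, G, T) of `seq`."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / counted
```

Applied to every retained and non-retained intron, this yields two distributions of values that can then be compared with a statistical test of choice.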


Did retained introns encode protein domains?

Only retained introns in the CDS were used.

Only retained introns defined by full-length mRNAs were used.

Protein sequences were searched against PFAM database.


Codon Usage


Conservation of intron retention in mouse cDNA sequences

40%-57% of all retained introns present a mouse hit

Identity of orthologous retained introns is 84%

Identity of non-retained introns is 60%; of exons, 87%

Mouse cDNA also corresponds to a retention variant

26% - 10 out of 46


Frequency of stop codons

[Figure: mRNA drawn as exon, retained intron, exon, with the CDS and the stop position marked; example sequences:]

TACTTGTGCGTAGTCCCCGCGATCTAACGCCACGATGGATGACACTGTGA

TACTTGTGCGTAGTCCCCGCGATCTAACGCCACGATGGATGACAC

Stop codons – TAG, TGA, TAA

Found 651 stop codons

Expected: 1064

p-value << 0.005

88 cases where the retention generates a putative truncated protein
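Counting stop codons in a retained intron amounts to scanning the sequence codon by codon in a chosen reading frame. The sketch below shows that scan; the function name is hypothetical, and the reading frame of the example sequence on the slide is not given, so frame 0 is used purely as an assumption.

```python
STOP_CODONS = {"TAG", "TGA", "TAA"}


def in_frame_stops(seq: str, frame: int = 0) -> list:
    """Return 0-based start positions of stop codons in the given frame."""
    seq = seq.upper().replace("U", "T")
    return [
        pos
        for pos in range(frame, len(seq) - 2, 3)
        if seq[pos:pos + 3] in STOP_CODONS
    ]


# Example sequence shown on the slide (reading frame 0 is an assumption).
slide_seq = "TACTTGTGCGTAGTCCCCGCGATCTAACGCCACGATGGATGACACTGTGA"
stops = in_frame_stops(slide_seq)
```

In frame 0 the slide sequence contains TAA and TGA in frame, while the TAG near its start falls out of frame, which shows why the frame inherited from the upstream exon matters when deciding whether a retention truncates the protein.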


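GC content for sequences upstream and downstream of the premature stop codon – 88 cases

[Figure: exon, retained intron, exon drawn 5’ to 3’, with the stop codon inside the retained intron]

Upstream of the stop codon: GC 58%

Downstream of the stop codon: GC 49%

Sequences upstream of the premature stop are under selective pressure for coding potential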


Why is the argument of ‘selection’ important?

• As noted originally by Gilbert (1978), mutations that affect splicing can allow the production of new proteins without the loss of the original one.

• If, however, the new variant has some biological significance, selection will act to maintain the function of this variant.

• Therefore, there should not be any “negative selection” on this variant.


Intron Retention in Tumors

Tissue      T/N    IR
Breast      T      1.52*
            N      0.62
Prostate    T      1.45*
            N      0.44
Brain       T      2.52*
            N      3.16
Colon       T      0.85
            N      0.60


Towards a reliable set of intron retention events

w/ downstream spliced intron             2563/3195    80 %
w/ hit w/ mouse cDNAs*                   74/152       49 %
encoding protein domains*                47/151       31 %
experimentally validated (both forms)    2/2

* full-length vs full-length set and retained intron entirely in the CDS


Second International Conference on Bioinformatics and Computational Biology

www.icobicobi.com.br

25-28/10/2004

Angra dos Reis


Group of Computational Biology

Sandro J. de Souza, tennis player
Helena Samaia, Research Assistant
Ana C. Pereira, Admin. Assistant
Maarten Leerkes, Ph.D student
Noboru Sakabe, Ph.D student
Maria Vibranovski, Ph.D student
Elza Helena, Ph.D student
Natanja Slager, Ph.D student
Pedro Galante, Ph.D student
Elisson C. Osorio, programmer
Jorge E. de Souza, Ph.D student
Rodrigo Soares, programmer
Andre Zaiats, system admin.
