On the biological significance of alternative splicing: a bioinformatics approach
Sandro J. de Souza
TDR, 07/05/2004
RNA 10:757-765, 2004
Genomics
Bioinformatics
Large-scale Biology
The Real Revolution
Early 20th century: Mendel and the inheritance laws
Mid 20th century: DNA as the genetic element (Avery)
Mid 20th century: Watson and Crick and the structure of DNA.
70’s and 80’s: Molecular biology/biotechnology
90’s and 21st century: Genomics and Bioinformatics
Paradigm in Biology: Evolution by means of natural selection (Darwin and Wallace, mid 19th century)
Bioinformatics
Development of tools
Gateway to explore new datasets
Processing of data derived from large-scale projects
A new way to do hypothesis-driven science
Splicing (1977)
Roberts and Sharp (Nobel 1993)
Exons Introns
mRNA
Coding Non-coding
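Splice-site consensus (percent conservation of each base in parentheses):
5’ site (exon | intron): A (64) G (73) | G (100) U (100) A (62) A (68) G (84) U (63) …
3’ site (intron | exon): … Py12 N C (65) A (100) G (100) | N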
Splicing
Splicing depends on recognition of exon-intron boundaries
Splice sites are generic and consist solely of:
5’ boundary
3’ boundary
Acceptor site
Polypyrimidine tract
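As a rough illustration of boundary recognition, the sketch below checks whether a candidate intron begins with GT (GU in RNA), ends with AG, and carries a pyrimidine-rich stretch just upstream of the 3’ site. The function name, the 12-base window, and the 0.75 pyrimidine threshold are assumptions made for illustration; they are not values from the talk.

```python
def looks_like_intron(genomic, start, end, window=12, min_py=0.75):
    """Return True if genomic[start:end] has GT...AG ends and a
    pyrimidine-rich tract before the 3' site (hypothetical thresholds)."""
    intron = genomic[start:end].upper()
    if len(intron) < window + 4:
        return False
    if not (intron.startswith("GT") and intron.endswith("AG")):
        return False
    # The 3' consensus is Py12 N C A G, so skip the last 4 bases (N, C, A, G)
    # and inspect the `window` bases in front of them.
    tract = intron[-(window + 4):-4]
    py_fraction = sum(base in "CT" for base in tract) / len(tract)
    return py_fraction >= min_py


# Toy genomic sequence: exon, intron (GTAAGT ... Py tract ... ACAG), exon.
toy = "AAAG" + "GTAAGT" + "A" * 10 + "TCTTCCTTTCCT" + "ACAG" + "GGCC"
print(looks_like_intron(toy, 4, 36))
```

Real splice-site prediction relies on position weight matrices or trained models; a fixed-motif check like this one only shows why such generic signals are easy to create or destroy by mutation.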
.....if they occur at the boundaries of the regions to be spliced
out, can change the splicing pattern, resulting in the deletion
or addition of whole sequences of amino acids.
Walter Gilbert. Why genes in pieces. Nature 271:501, 1978.
At least half of all human genes undergo alternative
splicing
Biological significance or spurious events?
Alternative splicing
1. Chromosomal ratio activates transcription of Sxl in females only
2. SXL controls splicing of tra mRNA
3. Females: exon 2 (which has a stop codon) is removed via SXL. Males: exon 2 is not removed.
4. Males: no active TRA. Females: TRA is made.
5. TRA directs splicing of dsx mRNA in a specific manner; in males default splicing occurs.
Alternative Splicing – Auditory Hair Cells
Cytosol
PM
AVSGRKAVSGRKAMFARYVPEIAALILNRKKYGGTFNSTRGRK
Ca2+ concentration at which K+ channel opens depends on alternative splicing of K+ channel – 576 possible alternative splicing combinations
K+ channel
Dotted lines show regions of the protein dependent on splicing
Picture of human cochlear hair cells from http://www.sickkids.on.ca/otolaryngology/Hearloss.asp
Sound frequency
Cytosolic Ca2+ concentration
K+ channel opens
Therefore Ca2+ concentration ‘decodes’ frequency
Types of alternative splicing:
Exon skipping
Intron Retention
5´ 3´
Alternative 5’ splice site
Alternative 3’ splice site
mRNA
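Of the event types listed, intron retention is the focus of the rest of the talk. A minimal sketch of how such an event might be flagged from genome-aligned exon coordinates follows: an intron of a reference transcript counts as retained when it lies entirely inside a single exon of another transcript. The function names and the half-open coordinate convention are assumptions for illustration, not the pipeline used in the study.

```python
def introns(exons):
    """Introns implied by a sorted list of (start, end) exons (half-open)."""
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


def retained_introns(reference_exons, candidate_exons):
    """Introns of the reference that lie entirely inside one candidate exon."""
    hits = []
    for i_start, i_end in introns(sorted(reference_exons)):
        for e_start, e_end in candidate_exons:
            # Require exonic sequence on both sides of the intron, so that a
            # transcript merely ending inside the intron is not counted.
            if e_start < i_start and i_end < e_end:
                hits.append((i_start, i_end))
                break
    return hits


reference = [(0, 100), (200, 300), (400, 500)]
candidate = [(0, 300), (400, 500)]  # first intron is not spliced out
print(retained_introns(reference, candidate))
```

Requiring flanking exonic sequence on both sides is a deliberate choice: it avoids calling retention for partially processed or truncated cDNAs that simply stop within an intron.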
Large-scale analysis of intron retention in the human transcriptome
Pedro F.A. Galante, Noboru Jo Sakabe, Natanja Slager, Sandro J. de Souza
Examples of intron retention events with biological significance
Msl2 in Drosophila
P element in Drosophila
retroviruses
Transmembrane domain
In immature B cells an intron containing an early translational stop signal is removed, yielding a long transcript. The additional sequence encodes a transmembrane region.
Hydrophilic stretch
This intron is not removed in activated B cells, giving rise to a truncated (secreted) product.
Ig gene
Immature B Cell
Stop codons
Hydrophilic tail
Transmembrane domain
Activation
Immature B cells express membrane-bound Ig. Activation leads to production of secreted form
Intron retention and cancer
CD44: several tumors
Gastrin receptor: pancreas
Ret tyrosine kinase: pheochromocytomas
Fas receptor: T-cell lymphoma
Transcriptome Database
EST data
Known mRNAs
SAGE data
Genome Data
Genome-based cDNA clustering
Figure labels: DNA (Exon 1, Exon 2, Exon 3), mRNA, cluster
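The idea behind genome-based clustering can be sketched as follows: transcripts (ESTs and mRNAs) are aligned to the genome, and those whose alignments overlap on the same chromosome and strand are grouped into one cluster. The tuple layout, the function name, and the use of whole-alignment spans rather than individual exons are simplifying assumptions, not a description of the actual pipeline.

```python
from collections import defaultdict


def cluster_transcripts(alignments):
    """Group transcripts whose genomic spans overlap on one chromosome/strand.

    alignments: iterable of (name, chrom, strand, start, end), half-open.
    Returns a list of clusters, each a list of transcript names.
    """
    by_locus = defaultdict(list)
    for name, chrom, strand, start, end in alignments:
        by_locus[(chrom, strand)].append((start, end, name))

    clusters = []
    for key in sorted(by_locus):
        current, current_end = [], None
        for start, end, name in sorted(by_locus[key]):
            # Close the running cluster when the next span starts past its end.
            if current and start >= current_end:
                clusters.append(current)
                current, current_end = [], None
            current.append(name)
            current_end = end if current_end is None else max(current_end, end)
        if current:
            clusters.append(current)
    return clusters


example = [
    ("est1", "chr17", "-", 100, 500),
    ("mrna1", "chr17", "-", 400, 900),
    ("est2", "chr17", "-", 2000, 2500),
    ("est3", "chr17", "+", 450, 600),
]
print(cluster_transcripts(example))
```

Sorting by start coordinate and sweeping once keeps the grouping linear after the sort, which matters when millions of ESTs are involved.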
Transcript Mapping
P53
Types of Data
Retention / Prototype   Full length   EST    Total
Full length             640           691    1120
EST                     2594          n.d.   2594
Total                   2793          691    3127
Dataset
Experimental validation
14% of all human genes show evidence of intron retention
Kan, States & Gish (2002): 36% of RefSeq database!
After sample statistics: 5%
Distribution of events along transcripts.
             elite group               MGC
events in    observed    expected      observed   expected
CDS          287 (53%)   502 (93%)     87 (52%)   155 (93%)
5’ UTR       84 (15%)    27 (5%)       15 (9%)    8 (5%)
3’ UTR       170 (32%)   12 (2%)       65 (39%)   4 (2%)
This bias can be a product of:
Underreporting of sequences
Nonsense-mediated decay (NMD)
p << 0.005
p << 0.005
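The slides do not say which test produced these p-values. As one plausible way to reproduce the comparison, the sketch below runs a Pearson chi-square goodness-of-fit test on the observed and expected counts for CDS, 5’ UTR and 3’ UTR. Three categories give two degrees of freedom, for which the upper-tail probability has the closed form exp(-x/2). The choice of test and the function names are assumptions.

```python
import math


def chi_square_gof(observed, expected):
    """Pearson chi-square statistic for observed vs expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_two_df(statistic):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-statistic / 2)


# Counts in the order CDS, 5' UTR, 3' UTR.
datasets = {
    "elite group": ([287, 84, 170], [502, 27, 12]),
    "MGC": ([87, 15, 65], [155, 8, 4]),
}

for label, (observed, expected) in datasets.items():
    stat = chi_square_gof(observed, expected)
    print(label, round(stat, 1), p_value_two_df(stat) < 0.005)
```

Under this test both datasets fall far below the 0.005 threshold, with the excess of events in the 3’ UTR contributing most of the statistic.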
2563 out of 3195 (80%) sequences with a retained intron had an exon/exon boundary downstream of the retention event.
Retained introns are shorter
P<<<<0.001
Domains encoded by retained introns
Number of domains entirely encoded by:
Retained introns only: 02
Exon-intron-exon: 31
Number of domains partially encoded by:
Retained introns only: 25
Exon-intron-exon: 10
Retained introns have a higher GC content
P<<<<0.001
Did retained introns encode protein domains?
Only retained introns in the CDS were used.
Only retained introns defined by full-length mRNAs were used.
Protein sequences were searched against PFAM database.
Codon Usage
Conservation of intron retention in mouse cDNA sequences
40%-57% of all retained introns present a mouse hit
Identity of orthologous retained introns is 84%
Identity of non-retained introns is 60%; exons 87%
Mouse cDNA also corresponds to a retention variant
26% - 10 out of 46
Frequency of stop codon
Expected: 1064
88 cases where the retention generates a putative truncated protein
TACTTGTGCGTAGTCCCCGCGATCTAACGCCACGATGGATGACACTGTGA
exon | retained intron | exon
Stop codons – TAG, TGA, TAA
Found 651 stop codons
Figure labels: mRNA, cds, stop
p-value << 0.005
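Counting stop codons in retained introns comes down to reading the sequence in the frame set by the upstream exon. A minimal sketch of that scan follows, assuming the reading frame at the start of the upstream exon is known; the function names are hypothetical, and the way the expected count was derived in the study is not reproduced here.

```python
STOP_CODONS = {"TAG", "TGA", "TAA"}


def in_frame_stops(sequence, frame=0):
    """0-based positions of stop codons read in the given frame (0, 1 or 2)."""
    seq = sequence.upper()
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]


def first_premature_stop(exon_up, intron, frame=0):
    """Position (in exon_up + intron) of the first in-frame stop codon that
    reaches into the retained intron, or None if there is none."""
    boundary = len(exon_up)
    for pos in in_frame_stops(exon_up + intron, frame):
        # Accept codons that end inside the intron, including ones that
        # straddle the exon/intron boundary.
        if pos + 3 > boundary:
            return pos
    return None


# Toy example: codons ATG GCC | GTA AGT TAG CAG
print(first_premature_stop("ATGGCC", "GTAAGTTAGCAG"))
```

A retention event with a hit from `first_premature_stop` would correspond to a putative truncated protein, as in the 88 cases mentioned on the slide.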
TACTTGTGCGTAGTCCCCGCGATCTAACGCCACGATGGATGACAC
GC content for sequences upstream and downstream of the premature stop codon – 88 cases
GC 58%
stop
exon | retained intron | exon
GC 49%
Are under selective pressure for coding potential
5’ 3’
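The comparison reported here (GC 58% versus 49%) could be computed along these lines: split each retained intron at its premature stop codon and measure the GC fraction of the two parts separately. The function names and the decision to exclude the stop codon itself from both parts are assumptions for illustration.

```python
def gc_content(sequence):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_around_stop(intron, stop_pos):
    """GC content upstream and downstream of a stop codon starting at stop_pos."""
    upstream = intron[:stop_pos]
    downstream = intron[stop_pos + 3:]
    return gc_content(upstream), gc_content(downstream)


# Toy intron: GC-rich before the TAG, AT-rich after it.
print(gc_around_stop("GCGCTAGATAT", 4))
```

Averaging the two values over all introns with a premature stop gives one pair of percentages, which is the form in which the slide reports its result.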
Why is the argument of ‘selection’ important?
• As noted originally by Gilbert (1978), mutations that affect splicing can allow the production of new proteins without the loss of the original one.
• If, however, the new variant has some biological significance, selection will act to maintain the function of this variant.
• Therefore, there should not be any “negative selection” on this variant.
Tissue     T/N   IR
Breast     T     1.52*
           N     0.62
Prostate   T     1.45*
           N     0.44
Brain      T     2.52*
           N     3.16
Colon      T     0.85
           N     0.60
Intron Retention in Tumors
w/ downstream spliced intron: 2563/3195 (80%)
w/ hit w/ mouse cDNAs*: 74/152 (49%)
encoding protein domains*: 47/151 (31%)
experimentally validated (both forms): 2/2
* full-length vs full-length set and retained intron entirely in the CDS
Towards a reliable set of intron retention events
Second International Conference on Bioinformatics and Computational Biology
www.icobicobi.com.br
25-28/10/2004
Angra dos Reis
Group of Computational Biology
Sandro J. de Souza, tennis player
Helena Samaia, Research Assistant
Ana C. Pereira, Admin. Assistant
Maarten Leerkes, Ph.D student
Noboru Sakabe, Ph.D student
Maria Vibranovski, Ph.D student
Elza Helena, Ph.D student
Natanja Slager, Ph.D student
Pedro Galante, Ph.D student
Elisson C. Osorio, programmer
Jorge E. de Souza, Ph.D student
Rodrigo Soares, programmer
Andre Zaiats, system admin.