alternative splicing prediction -...

Post on 19-Aug-2019


Alternative Splicing Prediction

Fundamentals of Genetics

"Central Dogma of Molecular Biology"

DNA → (Transcription) → mRNA → (Translation) → protein

Fundamentals of Genetics

pre-mRNA → (Splicing) → mRNA → (Translation) → protein

mRNA Splicing

The pre-mRNA (5' → 3') transcribed from DNA contains both exons and introns. Splicing removes the introns and joins the exons to form the mature mRNA.

Alternative Splicing

A single pre-mRNA can be spliced in more than one way, producing several mRNA variants. Common event types:

● Exon skipping
● Intron retention
● Alternative 3' splice site
● Alternative 5' splice site
● Alternative 5' and 3' splice sites ("Alt. Both")
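To make two of these event types concrete, here is a minimal Python sketch, assuming each transcript is a sorted list of half-open exon intervals `[start, end)` with invented coordinates. It uses simple containment: an exon of one transcript lying inside an intron of another suggests exon skipping, and an intron lying inside another transcript's exon suggests intron retention. The function names are hypothetical, and the tests are deliberately simplified (they would also flag some alternative first or last exons).

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end); coordinates are invented


def introns(exons: List[Interval]) -> List[Interval]:
    """Introns are the gaps between consecutive exons of one transcript."""
    return [(prev[1], nxt[0]) for prev, nxt in zip(exons, exons[1:])]


def contained(inner: Interval, outer: Interval) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def skipped_exons(with_exon: List[Interval], without: List[Interval]) -> List[Interval]:
    """Exons of `with_exon` lying entirely inside an intron of `without`."""
    gaps = introns(without)
    return [e for e in with_exon if any(contained(e, g) for g in gaps)]


def retained_introns(spliced: List[Interval], retaining: List[Interval]) -> List[Interval]:
    """Introns of `spliced` lying entirely inside an exon of `retaining`."""
    return [i for i in introns(spliced) if any(contained(i, e) for e in retaining)]


print(skipped_exons([(100, 200), (300, 350), (400, 500)], [(100, 200), (400, 500)]))
# [(300, 350)]
print(retained_introns([(100, 200), (300, 400)], [(100, 400)]))
# [(200, 300)]
```

Alternative 5' and 3' sites could be detected the same way by comparing intron endpoints that share one coordinate but not the other.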

Why Study Alternative Splicing?

● One gene, multiple transcripts
● Regulation
● Protein diversity
● Disease

~23,000 genes / ~20,000 genes

mRNA Sequences

Conventional sequencing:
● up to full-length mRNA transcripts
● costly

Sources of mRNA sequence: full-length cDNA (fl-cDNA) and expressed sequence tags (ESTs), which are aligned to the genome.

Gene Models

Description of known features:
● start, end, introns, exons, etc.
● “wet-lab” results
● computational results (ESTs)

RNA-Seq

Compared with fl-cDNA and EST data, RNA-Seq uses "next-generation" sequencing:
● short "reads"
● cheap, plentiful

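Ungapped Alignment

A read (mRNA sequence) that lies within a single exon matches the DNA sequence contiguously, with no gaps:

DNA sequence: TGTTTTTTACCAGGAGTTGCCAAGAATTGGCCAATGCCTTCTTACGACC
mRNA read:    GAATTGGCCAATGCCTTCTTAC (begins at base 24 of the DNA sequence)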
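An ungapped alignment can be sketched as a sliding comparison that tolerates substitutions but never opens a gap. This naive scan is for illustration only; index-based aligners such as Bowtie avoid scanning the whole reference. `ungapped_hits` and `max_mismatches` are hypothetical names. The example read occurs at 0-based offset 23.

```python
from typing import List, Tuple


def ungapped_hits(read: str, reference: str, max_mismatches: int = 0) -> List[Tuple[int, int]]:
    """Return (0-based start, mismatch count) for every gap-free placement of the read."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        # Count substitutions only: read and window always have equal length.
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


reference = "TGTTTTTTACCAGGAGTTGCCAAGAATTGGCCAATGCCTTCTTACGACC"
read = "GAATTGGCCAATGCCTTCTTAC"
print(ungapped_hits(read, reference))  # [(23, 0)]
```

The scan costs O(n·m) for a reference of length n and a read of length m, which is why real aligners build an index first.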

Spliced Alignment

A short read that spans an exon-exon junction cannot be aligned contiguously. It must be split across the intron, which here begins with the canonical GT donor dinucleotide and ends with the AG acceptor dinucleotide:

short read:          TGATTCAGTCATCACTTTAAGAGCCATGGAGT
split read:          TGATTCAGTCATCA CTTTAAGAGCCATGGAGT
genomic reference:   TGATTCAGTCATCA GT.....AG CTTTAAGAGCCATGGAGT
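The split shown above can be found with a brute-force search over split points, sketched here in Python. Assumptions: both pieces must match exactly, each side needs a minimum anchor of 8nt, and an invented intron body (`GTCCCCCCCCCCAG`) stands in for the elided `GT.....AG`. The function names are hypothetical; production spliced aligners such as TopHat use far more efficient strategies.

```python
from typing import List, Tuple


def find_all(text: str, pattern: str) -> List[int]:
    """All 0-based start positions of `pattern` in `text` (overlaps allowed)."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def spliced_hits(read: str, reference: str, min_anchor: int = 8,
                 min_intron: int = 4) -> List[Tuple[int, int, int]]:
    """Return (read_start, intron_start, intron_end) for each GT...AG split."""
    results = []
    for k in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:k], read[k:]
        for p in find_all(reference, left):
            donor = p + k                      # first intron base
            if reference[donor:donor + 2] != "GT":
                continue
            for q in find_all(reference, right):   # q = first base of next exon
                if q - donor >= min_intron and reference[q - 2:q] == "AG":
                    results.append((p, donor, q))
    return results


exon1 = "TGATTCAGTCATCA"
intron = "GTCCCCCCCCCCAG"   # hypothetical filler for the elided "GT.....AG"
exon2 = "CTTTAAGAGCCATGGAGT"
reference = exon1 + intron + exon2
read = "TGATTCAGTCATCACTTTAAGAGCCATGGAGT"
print(spliced_hits(read, reference))  # [(0, 14, 28)]
```

Only the split after base 14 passes the GT...AG check; every other split point either fails to match or lands on a non-canonical dinucleotide.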

Anchor Regions

Anchor region: the minimum length by which a read overlaps a junction on either side.

genomic reference:   TGATTCAGTCATCA GT.....AG CTTTAAGAGCCATGGAGT
read, 2nt anchor:    TGATTCAGTCATCA CT
read, 8nt anchor:    TGATTCAGTCATCA CTTTAAGA

Assuming each base matches by chance with probability 1/4:

2nt anchor: P(match by chance) = $(1/4)^2 = 1/16$
8nt anchor: P(match by chance) = $(1/4)^8 = 1/65{,}536$
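The chance-match probability for an anchor of $L$ nucleotides is $(1/4)^L$ under a uniform, independent-base assumption. A short Python check; `expected_chance_hits` is a hypothetical helper that scales the probability by an arbitrary number of candidate positions (10,000 here) to show why short anchors are unreliable.

```python
from fractions import Fraction


def chance_match(anchor_len: int) -> Fraction:
    """P(anchor matches a random position), assuming i.i.d. uniform bases (1/4 each)."""
    return Fraction(1, 4) ** anchor_len


def expected_chance_hits(anchor_len: int, window: int) -> Fraction:
    """Expected chance matches when `window` candidate positions are tried (illustrative)."""
    return window * chance_match(anchor_len)


for anchor in (2, 8):
    print(anchor, chance_match(anchor), float(expected_chance_hits(anchor, 10_000)))
```

Over 10,000 positions a 2nt anchor is expected to match by chance 625 times, while an 8nt anchor is expected to match fewer than once.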

File Formats

Sequences: FASTQ
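FASTQ stores each read as four lines: an `@` header, the sequence, a `+` separator, and a quality string with one character per base. A minimal Python reader, assuming unwrapped four-line records and Phred+33 quality encoding (the common modern convention). `read_fastq` is a hypothetical helper; Biopython's `SeqIO` handles the general case.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read id, sequence, Phred scores) for each four-line record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # Phred+33: quality character code minus 33 gives the Phred score.
        yield header[1:], seq, [ord(c) - 33 for c in qual]


text = "@read1\nGAATTGGCCAATGCCTTCTTAC\n+\n" + "I" * 20 + "#!\n"
for name, seq, quals in read_fastq(io.StringIO(text)):
    print(name, seq, quals[-3:])  # read1 GAATTGGCCAATGCCTTCTTAC [40, 2, 0]
```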


Gene Models: GFF3


Gene Models: GTF
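GFF3 and GTF share nine tab-separated columns (seqid, source, type, start, end, score, strand, phase/frame, attributes) and differ mainly in the ninth: GFF3 uses `key=value` pairs, while GTF uses `key "value"` pairs. A minimal sketch parsing one feature line from either format; the feature lines are invented, the function names are hypothetical, and the parsers ignore GFF3 percent-encoding and comma-separated multi-values.

```python
from typing import Dict, List


def parse_gff3_attributes(field: str) -> Dict[str, str]:
    """GFF3 column 9: ID=exon1;Parent=mRNA1"""
    attrs = {}
    for item in field.strip().split(";"):
        if item:
            key, _, value = item.partition("=")
            attrs[key] = value
    return attrs


def parse_gtf_attributes(field: str) -> Dict[str, str]:
    """GTF column 9: gene_id "g1"; transcript_id "t1";"""
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(" ")
            attrs[key] = value.strip().strip('"')
    return attrs


def parse_feature(line: str, gtf: bool = False) -> Dict[str, object]:
    cols: List[str] = line.rstrip("\n").split("\t")
    seqid, _source, ftype, start, end, _score, strand, _phase, attributes = cols
    parse = parse_gtf_attributes if gtf else parse_gff3_attributes
    return {"seqid": seqid, "type": ftype, "start": int(start), "end": int(end),
            "strand": strand, "attributes": parse(attributes)}


print(parse_feature("chr1\tdemo\texon\t100\t200\t.\t+\t.\tID=exon1;Parent=mRNA1"))
print(parse_feature('chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";', gtf=True))
```

Both formats use 1-based, inclusive start and end coordinates, so the numeric columns need no conversion between them.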


Alignments: SAM
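In SAM, each alignment records a 1-based leftmost reference position (POS) and a CIGAR string. The `N` operation marks a skipped reference region, which is how spliced RNA-Seq alignments encode an intron; `M`, `D`, `N`, `=` and `X` consume reference bases. A minimal sketch recovering aligned blocks and introns from POS and CIGAR, using a hypothetical 32nt read split 14 + 18 across a 14nt intron. The function name is hypothetical; libraries such as pysam provide equivalent functionality.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]  # 1-based, inclusive reference coordinates
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def blocks_and_introns(pos: int, cigar: str) -> Tuple[List[Span], List[Span]]:
    ref = pos  # current reference position
    blocks: List[Span] = []
    introns: List[Span] = []
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op in "M=X":             # aligned bases
            blocks.append((ref, ref + length - 1))
            ref += length
        elif op == "D":             # deletion from the reference
            ref += length
        elif op == "N":             # skipped region: an intron for RNA-Seq
            introns.append((ref, ref + length - 1))
            ref += length
        # I, S, H and P do not consume reference bases
    return blocks, introns


print(blocks_and_introns(1, "14M14N18M"))
# ([(1, 14), (29, 46)], [(15, 28)])
```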

Splice Graphs vs. Transcripts

● Transcripts: from fl-cDNA
● Splice graph: from fl-cDNA, ESTs, and RNA-Seq
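A splice graph stores exons as nodes and observed exon-exon connections as edges, so it can summarize evidence from many partial sources without committing to full-length transcripts. The sketch below (invented coordinates, hypothetical function names) builds a graph from two transcripts that differ by two exon-skipping events and enumerates its source-to-sink paths. The graph admits four paths even though only two transcripts were observed, and that gap is the key difference between a splice graph and a transcript list.

```python
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]


def build_splice_graph(transcripts: List[List[Exon]]) -> Dict[Exon, Set[Exon]]:
    """Nodes are exons; an edge joins exons that are consecutive in some transcript."""
    graph: Dict[Exon, Set[Exon]] = {}
    for exons in transcripts:
        for exon in exons:
            graph.setdefault(exon, set())
        for upstream, downstream in zip(exons, exons[1:]):
            graph[upstream].add(downstream)
    return graph


def enumerate_paths(graph: Dict[Exon, Set[Exon]]) -> List[List[Exon]]:
    """All source-to-sink paths, i.e. every transcript the graph can encode."""
    has_pred = {d for succs in graph.values() for d in succs}
    paths: List[List[Exon]] = []

    def walk(node: Exon, path: List[Exon]) -> None:
        path = path + [node]
        if not graph[node]:          # sink: no downstream exon
            paths.append(path)
        for nxt in sorted(graph[node]):
            walk(nxt, path)

    for source in sorted(n for n in graph if n not in has_pred):
        walk(source, [])
    return paths


A, B, C, D, E = (100, 200), (300, 350), (400, 500), (600, 650), (700, 800)
observed = [[A, B, C, D, E], [A, C, E]]   # two transcripts, two skipping events
for path in enumerate_paths(build_splice_graph(observed)):
    print(path)
```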

SpliceGrapher

Inputs: Gene Model, ESTs, RNA-Seq

RNA-Seq Data → Predicted Splice Graph

● Ungapped alignments
● Spliced alignments

offset = 14nt
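One plausible way to summarize the two kinds of RNA-Seq evidence, sketched under stated assumptions: overlapping or abutting ungapped alignments are merged into covered regions (candidate exonic sequence), and spliced alignments are tallied by the intron they imply (candidate splice-graph edges). This is an illustrative sketch with invented inclusive coordinates and hypothetical function names, not SpliceGrapher's actual algorithm.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # inclusive coordinates


def covered_regions(alignments: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or abutting ungapped alignments into covered regions."""
    merged: List[Interval] = []
    for start, end in sorted(alignments):
        if merged and start <= merged[-1][1] + 1:   # overlaps or touches previous region
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def junction_support(introns: Iterable[Interval]) -> Counter:
    """Number of spliced alignments supporting each intron."""
    return Counter(introns)


ungapped = [(10, 40), (30, 60), (100, 130), (61, 70)]
spliced = [(71, 99), (71, 99), (41, 99)]
print(covered_regions(ungapped))   # [(10, 70), (100, 130)]
print(junction_support(spliced).most_common())
```

A real predictor would also filter junctions by anchor length and splice-site quality before adding them as edges.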

Challenges with RNA-Seq

● Short reads
● Ambiguous origins
● Variable coverage
● Highly localized evidence

Validating Splice Sites

Splice-site SVM, trained on sites from gene models and EST alignments

Accuracy ~87-97%

Genomic reference:
TCATGTCTTCATGTTTGCGGTAAGAGGTAGTCATCACTTTAAGAG
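Every GT and AG dinucleotide in a reference is a potential donor or acceptor, which is why a classifier is needed to separate real sites from chance occurrences. The sketch enumerates candidates in the example sequence and one-hot encodes a fixed window around each, the kind of feature vector an SVM (for example scikit-learn's `sklearn.svm.SVC`) could be trained on. The window sizes and function names are assumptions; this is not the feature set or kernel of the splice-site SVM described here.

```python
from typing import List, Optional


def candidate_sites(seq: str, dimer: str) -> List[int]:
    """0-based positions of every occurrence of `dimer` (e.g. GT donors, AG acceptors)."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dimer]


def one_hot_window(seq: str, pos: int, upstream: int = 4,
                   downstream: int = 6) -> Optional[List[int]]:
    """One-hot encode seq[pos-upstream : pos+2+downstream]; None if it runs off the end."""
    start, end = pos - upstream, pos + 2 + downstream
    if start < 0 or end > len(seq):
        return None
    vec: List[int] = []
    for base in seq[start:end]:
        vec.extend(1 if base == b else 0 for b in "ACGT")  # 4 features per base
    return vec


reference = "TCATGTCTTCATGTTTGCGGTAAGAGGTAGTCATCACTTTAAGAG"
print("GT candidates:", candidate_sites(reference, "GT"))  # [4, 12, 19, 26, 29]
print("AG candidates:", candidate_sites(reference, "AG"))  # [22, 24, 28, 41, 43]
features = {p: one_hot_window(reference, p) for p in candidate_sites(reference, "GT")}
```

Even this 45nt sequence holds five GT and five AG candidates, so sequence context, not the dinucleotide alone, has to decide which sites are real.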

Other Approaches

● Splice graph prediction: Sircah (EST only)
● Transcript prediction: Bowtie/TopHat/Cufflinks, HashMatch/Supersplat/TAU, Scripture

Results

AS Predictions for A. thaliana

Example 1 - Cufflinks

Example 1 - TAU

Example 1 - SpliceGrapher

Example 2 - Cufflinks

Example 2 - TAU

Example 2 - SpliceGrapher

Example 3 - Cufflinks

Example 3 - TAU

Example 3 - SpliceGrapher

Conclusions

● Uses gene models, ESTs, and RNA-Seq
● Conservative splice graph predictions
  ○ Curated gene models establish context
  ○ Accurate splice site models
● Visualization aids

Ongoing Analyses

Plants: A. thaliana, V. vinifera, B. distachyon, G. max, O. sativa

Mammals: B. taurus, H. sapiens

More Information

Funding from NSF award 0743097

Software: splicegrapher.sourceforge.net

Results: http://combi.cs.colostate.edu/SpliceGrapher
