Post on 11-Jan-2016
HJAY: Human (Research) Junction Array
EURASNET, Cambridge
Sept. 14th, 2007
Tyson A. Clark
Typical Junction Array Design
Junction Arrays vs. Exon Arrays
Junction Arrays: Pros
• Direct measure of skipping events
• Reciprocal analysis
• Multiple independent measurements of a single event
• Information on how exons are joined
• Ability to monitor small exons
Exon Arrays: Pros
• Increased genome coverage
• “Discovery” of Alt. Splicing
• Probe selection flexibility
• Comprehensive / unbiased design
Junction Arrays: Cons
• Observed events only (no discovery)
• Requires lots of probes
• Limited flexibility
• Half-Hyb (half-hybridization)
• Difficult to predict joining events without empirical evidence
Exon Arrays: Cons
• Cannot distinguish some isoforms
• Fewer probes per splicing event
• No joining information
Why use Exon-Exon Junction probes?
• Alt. spliced Exons (A & B) present in 50% of transcripts
[Figure: two possible situations, labeled 1 and 2 ("- or -")]
• Cannot distinguish between situation 1 & 2 (using only exon representation)
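The figure for this slide is not preserved in the transcript, so the sketch below assumes one plausible reading of the two situations: in situation 1, exons A and B are always included or skipped together; in situation 2, they are mutually exclusive. The flanking exons E1 and E2 and the 50/50 isoform mixes are hypothetical. Under these assumptions, exon-level usage is identical in both situations, while junction usage differs.

```python
from collections import Counter

# Hypothetical gene: constitutive exons E1 and E2 flank alternative exons A and B.
# Each situation is an assumed 50/50 mix of two isoforms (the slide figure is not available).
SITUATION_1 = [(("E1", "A", "B", "E2"), 0.5), (("E1", "E2"), 0.5)]  # A and B included together
SITUATION_2 = [(("E1", "A", "E2"), 0.5), (("E1", "B", "E2"), 0.5)]  # A and B mutually exclusive


def exon_usage(isoforms):
    """Fraction of transcripts containing each exon (what exon probes report)."""
    usage = Counter()
    for exons, fraction in isoforms:
        for exon in exons:
            usage[exon] += fraction
    return dict(usage)


def junction_usage(isoforms):
    """Fraction of transcripts containing each exon-exon junction."""
    usage = Counter()
    for exons, fraction in isoforms:
        for left, right in zip(exons, exons[1:]):
            usage[(left, right)] += fraction
    return dict(usage)


if __name__ == "__main__":
    print(exon_usage(SITUATION_1) == exon_usage(SITUATION_2))          # same exon-level picture
    print(junction_usage(SITUATION_1) == junction_usage(SITUATION_2))  # junction-level picture differs
```

In this toy model, only the junction view contains an A-B junction in situation 1 and A-E2 / E1-B junctions in situation 2, which is the kind of joining information exon probes alone cannot supply.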
Advantages of Junction Probes
• Information on how exons are joined together
• Exon skipping events are measured directly
– rather than just a decrease in exon signal
– allows for reciprocal change analyses
• Exon-Exon junctions are non-genomic sequence
– not present on a genome tiling array
• Ability to monitor small exons and distinguish alternative splice sites that are very close
HJAY Design Information
New Research Junction Array Design
Genome-wide “Observed” Junctions
• Using content from ExonWalk (C. Sugnet), Ensembl, and RefSeq, we have designed an array that will include:
– Exon Probes: 8 – 12 PM probes per exon
– Exon – Exon Junction Probes: 8 probes per junction (-4 to +4)
– >30,000 Human & Mouse Genes
– Human and Mouse designs will be manufactured onto separate chips
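As a rough illustration of "8 probes per junction (-4 to +4)", the sketch below slides a probe window across an exon-exon junction. The 25-base probe length and the choice of offsets (-4 to -1 and +1 to +4, so that 9 to 16 bases come from the upstream exon) are assumptions; the slides do not say how the eight positions are chosen. Sequences are treated as plain transcript-orientation strings, with no strand or complement handling.

```python
def junction_probes(upstream_exon, downstream_exon, probe_len=25,
                    offsets=(-4, -3, -2, -1, 1, 2, 3, 4)):
    """Return (offset, probe) pairs for probes spanning an exon-exon junction.

    The offset shifts the junction away from the probe centre; both the
    offsets and probe_len are illustrative assumptions, not the array's spec.
    """
    probes = []
    for offset in offsets:
        # Bases taken from the upstream exon: 9..16 for a 25-mer with these offsets.
        left_len = probe_len // 2 + offset + (1 if offset < 0 else 0)
        right_len = probe_len - left_len
        if left_len > len(upstream_exon) or right_len > len(downstream_exon):
            continue  # exon too short to supply this many bases
        probes.append((offset, upstream_exon[-left_len:] + downstream_exon[:right_len]))
    return probes


if __name__ == "__main__":
    # Toy exons made of a single base each so the junction position is easy to see.
    for offset, probe in junction_probes("A" * 20, "C" * 20):
        print(f"{offset:+d} {probe}")
```

Because every probe straddles the junction, none of these sequences exists contiguously in the genome, which is why such probes are absent from a genome tiling array.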
Design Input
• Human Input files (10,063,211 input exons)
– (NCBI 36, March 2006 Genome Assembly)
– RefSeq (hNCBI36)
– Ensembl (38)
– ExonWalk (hNCBI36_exonwalkall)
• Mouse Input files (3,963,343 input exons)
– (MM7, August 2005 Genome Assembly)
– RefSeq (mm7)
– Ensembl (38)
– ExonWalk (mm7_exonwalkall)
Exon Walk (developed by Chuck Sugnet)
• The ExonWalk program merges cDNA evidence together to predict full length isoforms, including alternative transcripts.
• ESTs Filtered
– Present in cDNA libraries of another organism (i.e. also present in mouse)
– Or have three separate cDNA GenBank entries supporting it.
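The EST filter described above can be read as a simple either/or rule. The sketch below is one such reading; the record type, its field names, and the interpretation of "three" as "at least three" are hypothetical and are not taken from the ExonWalk program itself.

```python
from dataclasses import dataclass


@dataclass
class EstEvidence:
    """Hypothetical summary of the support for one EST-derived feature."""
    name: str
    seen_in_other_organism: bool   # e.g. also present in mouse cDNA libraries
    genbank_cdna_entries: int      # separate cDNA GenBank entries supporting it


def passes_filter(est, min_genbank_entries=3):
    """Keep the EST if it is cross-species supported or has enough GenBank entries."""
    return est.seen_in_other_organism or est.genbank_cdna_entries >= min_genbank_entries


if __name__ == "__main__":
    candidates = [
        EstEvidence("est_conserved", True, 1),
        EstEvidence("est_well_supported", False, 3),
        EstEvidence("est_weak", False, 2),
    ]
    print([est.name for est in candidates if passes_filter(est)])
```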
ExonWalk Transcripts – TCF7L2 as an example
Junction Design Strategy
(Note: reverse strand and non-overlapping transcripts are separated into unique Transcript Clusters)
[Figure labels: Genome Assembly (5' edge, 3' edge); Observed Junctions; Input Transcripts (redundant); Transcript Cluster (non-redundant transcripts); Exons and Exon Clusters (non-redundant: EX_A, EX_B, EX_C); PSRs (non-redundant, numbered 1 to 6)]
New Junction Array – Design Example
Human Design Run #2 (1 week 1 day 41 minutes and 55 seconds elapsed)
• Transcript Clusters 35,123
– with junctions 24,753
• Transcripts 335,663
• Junctions (Obs) 260,488
• Exons 360,569
• Exon Clusters 249,240
• PSRs 315,137
Human Design: Transcript Cluster Content
35,123 "Genes"
• Ensembl and Refseq and ExonWalk: 16,673 (48%)
• Ensembl only: 11,069 (32%)
• Refseq only: 99 (0%)
• ExonWalk only: 4,428 (13%)
• Ensembl and ExonWalk: 1,499 (4%)
• Refseq and ExonWalk: 134 (0%)
• Ensembl and Refseq: 1,221 (3%)
Mouse Design (3 days 17 hrs 34 minutes and 1 second elapsed)
• Transcript Clusters 30,833
– with junctions 25,431
• Transcripts 145,993
• Junctions (Obs) 237,871
• Exons 319,769
• Exon Clusters 239,114
• PSRs 282,186
Mouse Design: Transcript Cluster Content
30,883 "Genes"
• Ensembl and Refseq and ExonWalk: 15,376 (50%)
• Refseq only: 518 (2%)
• Ensembl only: 6,802 (22%)
• ExonWalk only: 4,428 (13%)
• Ensembl and ExonWalk: 2,257 (7%)
• Refseq and ExonWalk: 530 (2%)
• Ensembl and Refseq: 1,670 (5%)
One 49 Format 5 Micron Mask Set Split Between Designs (6,553,600 features)
Note: these are estimates pre-probe selection.
HJAY (Human Junction ArraY)
                      Count      Probes Per   Oligos
Observed Junctions    260,488     8           2,083,904
PSRs (Exon Probes)    315,137     8           2,521,096
Tiling Targets         33,086    60           1,985,160
Controls                                         75,000
Total                                         6,665,160
MJAY (Mouse Junction ArraY)
                      Count      Probes Per   Oligos
Observed Junctions    237,871     8           1,902,968
PSRs (Exon Probes)    282,186    10           2,821,860
Tiling Targets         26,949    60           1,616,940
Controls                                         75,000
Total                                         6,416,768
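The oligo estimates above are products of feature counts and probes per feature, plus controls. The short sketch below recomputes them from the slide's numbers and compares each total with the stated 6,553,600-feature capacity; the dictionary layout and function name are illustrative only.

```python
CHIP_FEATURES = 6_553_600  # capacity stated for the 49-format, 5-micron design
CONTROLS = 75_000          # control oligos budgeted per design

# (count, probes per feature), copied from the slide
HJAY = {
    "Observed Junctions": (260_488, 8),
    "PSRs (Exon Probes)": (315_137, 8),
    "Tiling Targets": (33_086, 60),
}
MJAY = {
    "Observed Junctions": (237_871, 8),
    "PSRs (Exon Probes)": (282_186, 10),
    "Tiling Targets": (26_949, 60),
}


def total_oligos(design, controls=CONTROLS):
    """Sum count x probes-per-feature over all content types, plus controls."""
    return sum(count * per_feature for count, per_feature in design.values()) + controls


if __name__ == "__main__":
    for name, design in (("HJAY", HJAY), ("MJAY", MJAY)):
        total = total_oligos(design)
        print(f"{name}: {total:,} oligos ({total - CHIP_FEATURES:+,} vs. chip capacity)")
```

With these inputs the human estimate comes out 111,560 oligos above the stated capacity and the mouse estimate 136,832 below it, which is consistent with the note that the figures are estimates made before probe selection.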
Tiling of adjacent intronic sequence
[Figure: Tiling Targets. Intronic sequence at positions 1 to 301 adjacent to the exon, with the tiling probes drawn beneath it.]
Tiling Probes Flanking the Exon (Only showing upstream and same strand) (ACE01009 Adam15)
                      Human     Mouse
alt 3' splice sites    1,555     1,408
alt 5' splice sites      904       807
Cassette Exons        15,367     9,603
Retained Intron        1,260     1,131
Conserved Exons        7,000     7,000
Constitutive Exons     7,000     7,000
Total                 33,086    26,949
Flanking Tiling Probes (15 per side, both strands)
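The figure of 15 probes per side on both strands can be tied back to the 60 probes per tiling target used in the oligo estimates. The sketch below does that arithmetic and re-adds the tiling-target categories; it assumes "side" means upstream and downstream of the exon, which the slide implies but does not state.

```python
PROBES_PER_SIDE = 15
SIDES = 2    # assumed: upstream and downstream flank of the exon
STRANDS = 2  # both strands

# (human, mouse) tiling targets per category, copied from the slide
TILING_TARGETS = {
    "alt 3' splice sites": (1_555, 1_408),
    "alt 5' splice sites": (904, 807),
    "Cassette Exons": (15_367, 9_603),
    "Retained Intron": (1_260, 1_131),
    "Conserved Exons": (7_000, 7_000),
    "Constitutive Exons": (7_000, 7_000),
}

probes_per_target = PROBES_PER_SIDE * SIDES * STRANDS
human_targets = sum(human for human, _ in TILING_TARGETS.values())
mouse_targets = sum(mouse for _, mouse in TILING_TARGETS.values())

if __name__ == "__main__":
    print(probes_per_target, human_targets, mouse_targets)
    print(human_targets * probes_per_target, mouse_targets * probes_per_target)
```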
Controls
Bg GC (Research)         20,595
AFFX controls             4,084
GC Bins (Antigenomic)    16,943
20000_st                  4,398
cross_at                      9
default_at                2,008
default_st                9,098
norm_exon_pm_st           4,518
norm_intron_pm_st        10,990
text_at                     242
text_st                     493
Total Controls           73,378
Constitutive PSRs (Human, 116,284 constitutive, 36.9%)
[Figure labels: Genome Assembly; Transcript Cluster; Exons / Exon Clusters (EX_A, EX_B, EX_C); PSRs (Probe Selection Regions), numbered 1 to 5; Constitutive PSRs]
Data Analysis
Analysis Approach
• Used the Splicing Index Algorithm, treating each probeset (Exon or Junction) as independent
– P-value cutoff < 0.001
– Magnitude of change: |log2 ratio| > 0.5 (~1.4-fold)
– 1570 probesets in total passed those cutoffs
• Looked for Splicing Events that had more than 1 significant probeset
– 252 genes with multiple probesets on the list
– ~75% looked like real AS events
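The slides name the Splicing Index but do not show its formula. The sketch below uses the commonly described form: each probeset signal is normalized to its gene-level signal, and the index is the log2 ratio of the normalized intensities between two sample groups. The function names and example intensities are hypothetical, and the p-value is taken as an input because the statistical test is not specified here.

```python
import math


def splicing_index(probeset_a, gene_a, probeset_b, gene_b):
    """log2 ratio of gene-normalized probeset intensities (group A vs. group B)."""
    normalized_a = probeset_a / gene_a
    normalized_b = probeset_b / gene_b
    return math.log2(normalized_a / normalized_b)


def passes_cutoffs(si, p_value, max_p=0.001, min_abs_si=0.5):
    """Apply the cutoffs from the slide: p < 0.001 and |splicing index| > 0.5."""
    return p_value < max_p and abs(si) > min_abs_si


if __name__ == "__main__":
    # Hypothetical intensities: same gene level, probeset twice as high in group A.
    si = splicing_index(probeset_a=200.0, gene_a=1000.0, probeset_b=100.0, gene_b=1000.0)
    print(si, passes_cutoffs(si, p_value=0.0005))
    print(2 ** 0.5)  # fold change corresponding to the 0.5 log2 cutoff
```

Normalizing by the gene-level signal is what separates a splicing change from a change in overall expression: a probeset that simply tracks its gene yields an index near zero.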
4 Probesets Monitor a Simple Cassette Exon
Some Examples
Highly Conserved 12 bp Exon
CLASP1 Event #1
CLASP1 Event #2
Excel Sheet with Combined Results
221 Exons Total, MED or higher confidence
• By Confidence Level
– 26 HIGHEST: all probesets from that splicing event made the list
– 89 HIGH: reciprocal junctions from the splicing event made the list
– 53 MED HIGH: multiple junctions from the splicing event made the list
– 53 MEDIUM
• By Splicing Event
– 179 Cassette Exons (including 19 Mutually Exclusive Cassettes and 27 Multiple (consecutive) Cassettes)
– 32 Alternative 3’ Terminal Exons
– 5 Alt. 5’ss
– 4 Alt. 3’ss
Histogram of Cassette Exon Size
16 Exons < 25 bp