![Page 1: Proteomic Characterization of Alternative Splicing and Coding Polymorphism](https://reader035.vdocuments.net/reader035/viewer/2022070404/56813b55550346895da44625/html5/thumbnails/1.jpg)
Proteomic Characterization of Alternative Splicing and Coding Polymorphism
Nathan Edwards
Center for Bioinformatics and Computational Biology
University of Maryland, College Park
![Page 2: Proteomic Characterization of Alternative Splicing and Coding Polymorphism](https://reader035.vdocuments.net/reader035/viewer/2022070404/56813b55550346895da44625/html5/thumbnails/2.jpg)
Why don’t we see more novel peptides?
Tandem mass spectrometry doesn’t discriminate against novel peptides...
...but protein sequence databases do!
Searching traditional protein sequence databases biases the results towards well-understood protein isoforms!
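The bias described above can be illustrated with a toy search: a database search can only report peptides that its sequence entries are able to produce. The sketch below is an illustration only. The two sequences, the entry names, and the helper functions are invented for this example, and the digestion rule (cleave after K or R, but not before P) is the usual simplified trypsin rule rather than anything stated on the slide.

```python
import re


def tryptic_peptides(protein, min_len=6):
    """Digest a protein in silico: cleave after K or R, except before P."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in pieces if len(p) >= min_len}


def searchable(observed, database):
    """True if some database entry can yield the observed peptide."""
    return any(observed in tryptic_peptides(seq) for seq in database.values())


# Hypothetical sequences: the variant differs by a single A -> P substitution.
REFERENCE = "MSGLEAVKTPLDQAGHIRWFNEDLSAAKGGTVELYQR"
VARIANT = "MSGLEAVKTPLDQAGHIRWFNEDLSPAKGGTVELYQR"

reference_db = {"toy_ref": REFERENCE}
extended_db = {"toy_ref": REFERENCE, "toy_variant": VARIANT}

# The instrument fragments the variant peptide just as well as the
# reference one, but only the extended database can match it.
observed = "WFNEDLSPAK"
print(searchable(observed, reference_db))  # False
print(searchable(observed, extended_db))   # True
```

The spectrum for the variant peptide exists in both cases; only the choice of database decides whether it can be identified.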
![Page 3: Proteomic Characterization of Alternative Splicing and Coding Polymorphism](https://reader035.vdocuments.net/reader035/viewer/2022070404/56813b55550346895da44625/html5/thumbnails/3.jpg)
What goes missing?
Known coding SNPs
Novel coding mutations
Alternative splicing isoforms
Alternative translation start-sites
Microexons
Alternative translation frames
![Page 4: Proteomic Characterization of Alternative Splicing and Coding Polymorphism](https://reader035.vdocuments.net/reader035/viewer/2022070404/56813b55550346895da44625/html5/thumbnails/4.jpg)
Why should we care?
Alternative splicing is the norm!
• Only 20-25K human genes
• Each gene makes many proteins
Proteins have clinical implications
• Biomarker discovery
Evidence for SNPs and alternative splicing stops with transcription
• Genomic assays, ESTs, mRNA sequence
• Little hard evidence for translation start site
![Page 5: Proteomic Characterization of Alternative Splicing and Coding Polymorphism](https://reader035.vdocuments.net/reader035/viewer/2022070404/56813b55550346895da44625/html5/thumbnails/5.jpg)
Novel Splice Isoform
Human Jurkat leukemia cell-line
• Lipid-raft extraction protocol, targeting T cells
• von Haller, et al. MCP 2003.
LIME1 gene:
• LCK interacting transmembrane adaptor 1
LCK gene:
• Leukocyte-specific protein tyrosine kinase
• Proto-oncogene
• Chromosomal aberration involving LCK in leukemias
Multiple significant peptide identifications
![Page 6: Proteomic Characterization of Alternative Splicing and Coding Polymorphism](https://reader035.vdocuments.net/reader035/viewer/2022070404/56813b55550346895da44625/html5/thumbnails/6.jpg)
Novel Splice Isoform
![Page 7: Proteomic Characterization of Alternative Splicing and Coding Polymorphism](https://reader035.vdocuments.net/reader035/viewer/2022070404/56813b55550346895da44625/html5/thumbnails/7.jpg)
Novel Splice Isoform
![Page 8: Proteomic Characterization of Alternative Splicing and Coding Polymorphism](https://reader035.vdocuments.net/reader035/viewer/2022070404/56813b55550346895da44625/html5/thumbnails/8.jpg)
Novel Mutation
HUPO Plasma Proteome Project
• Pooled samples from 10 male & 10 female healthy Chinese subjects
• Plasma/EDTA sample protocol
• Li, et al. Proteomics 2005. (Lab 29)
TTR gene
• Transthyretin (pre-albumin)
• Defects in TTR are a cause of amyloidosis
• Familial amyloidotic polyneuropathy
  • late-onset, dominant inheritance
![Page 9: Proteomic Characterization of Alternative Splicing and Coding Polymorphism](https://reader035.vdocuments.net/reader035/viewer/2022070404/56813b55550346895da44625/html5/thumbnails/9.jpg)
Novel Mutation
Ala2→Pro associated with familial amyloid polyneuropathy
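A single-residue substitution such as Ala→Pro shows up in MS/MS data as a fixed mass shift on the affected peptide and on every fragment ion that contains the substituted position. The sketch below computes that shift from standard monoisotopic residue masses. The function names are chosen for this illustration, and the slide does not give the peptide sequence, so none is assumed here.

```python
# Standard monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def substitution_mass_shift(ref, alt):
    """Mass change (Da) when residue `ref` is replaced by residue `alt`."""
    return MONO[alt] - MONO[ref]


def mz_shift(ref, alt, charge):
    """Shift in m/z seen for an ion of the given charge state."""
    return substitution_mass_shift(ref, alt) / charge


print(f"{substitution_mass_shift('A', 'P'):+.5f} Da")  # +26.01565 Da
```

A database that lacks the variant sequence has no candidate peptide at the shifted precursor mass, which is why the mutant peptide goes unidentified in a conventional search.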
![Page 10: Proteomic Characterization of Alternative Splicing and Coding Polymorphism](https://reader035.vdocuments.net/reader035/viewer/2022070404/56813b55550346895da44625/html5/thumbnails/10.jpg)
Novel Mutation
![Page 11: Proteomic Characterization of Alternative Splicing and Coding Polymorphism](https://reader035.vdocuments.net/reader035/viewer/2022070404/56813b55550346895da44625/html5/thumbnails/11.jpg)
Searching Expressed Sequence Tags (ESTs)
Pros
No introns!
Primary splicing evidence for annotation pipelines
Evidence for dbSNP
Often derived from clinical cancer samples
Cons
No frame
Large (8Gb)
“Untrusted” by annotation pipelines
Highly redundant
Nucleotide error rate ~ 1%
![Page 12: Proteomic Characterization of Alternative Splicing and Coding Polymorphism](https://reader035.vdocuments.net/reader035/viewer/2022070404/56813b55550346895da44625/html5/thumbnails/12.jpg)
Compressed EST Peptide Sequence Database
For all ESTs mapped to a UniGene gene:
• Six-frame translation
• Eliminate ORFs < 30 amino-acids
• Eliminate amino-acid 30-mers observed once
• Compress to C2 FASTA database
• Complete, Correct for amino-acid 30-mers
Gene-centric peptide sequence database:
• Size: 223 Mb vs 8 Gb, 20774 FASTA entries
• Running time: 15 mins vs 22 hours
• E-values: 50-fold reduction
Download:
• http://www.umiacs.umd.edu/~nedwards
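The first three steps of the construction above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline behind the published database: ORFs are taken to be stop-to-stop segments of each translation, a 30-mer counts as "observed once" when it occurs in only one EST, and the final C2 compression step is not attempted. All function and parameter names are invented for the example.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def six_frame_translations(dna):
    """Yield the three forward and three reverse-strand translations."""
    dna = dna.upper()
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            codons = (strand[i:i + 3] for i in range(offset, len(strand) - 2, 3))
            # Codons with ambiguous bases (e.g. N) become 'X'.
            yield "".join(CODON_TABLE.get(c, "X") for c in codons)


def open_reading_frames(dna, min_len=30):
    """Yield stop-to-stop ORFs of at least min_len amino acids (assumption)."""
    for protein in six_frame_translations(dna):
        for orf in protein.split("*"):
            if len(orf) >= min_len:
                yield orf


def supported_kmers(ests, k=30, min_len=30, min_support=2):
    """Keep amino-acid k-mers seen in at least min_support distinct ESTs."""
    support = Counter()
    for est in ests:
        kmers = set()
        for orf in open_reading_frames(est, min_len):
            kmers.update(orf[i:i + k] for i in range(len(orf) - k + 1))
        support.update(kmers)  # each EST votes at most once per k-mer
    return {kmer for kmer, n in support.items() if n >= min_support}
```

Requiring support from more than one EST is what makes the roughly 1% nucleotide error rate tolerable: a sequencing error is unlikely to produce the same erroneous 30-mer in two independent ESTs. For a quick experiment on short toy sequences, call `supported_kmers` with smaller `k` and `min_len` values.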
![Page 13: Proteomic Characterization of Alternative Splicing and Coding Polymorphism](https://reader035.vdocuments.net/reader035/viewer/2022070404/56813b55550346895da44625/html5/thumbnails/13.jpg)
Back to the lab...
Current LC/MS/MS workflows identify a few peptides per protein
• ...not sufficient for protein isoforms
Need to raise the sequence coverage to (say) 80%
• ...protein separation prior to LC/MS/MS analysis
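Sequence coverage here means the fraction of a protein's residues that fall inside at least one identified peptide. A minimal sketch of that calculation follows; the protein and peptide sequences are invented for illustration, and the function name is not taken from the talk.

```python
def sequence_coverage(protein, peptides):
    """Fraction of residues covered by at least one identified peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in peptides:
        start = protein.find(peptide)
        while start != -1:  # mark every occurrence, including overlapping ones
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)


# Hypothetical 37-residue protein with two of its four tryptic peptides seen.
protein = "MSGLEAVKTPLDQAGHIRWFNEDLSAAKGGTVELYQR"
identified = ["TPLDQAGHIR", "GGTVELYQR"]
print(f"{sequence_coverage(protein, identified):.0%}")  # 51%
```

With only a few peptides per protein, most residues stay unobserved, so a variant or splice junction that lies in the uncovered region cannot be confirmed or ruled out.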
![Page 14: Proteomic Characterization of Alternative Splicing and Coding Polymorphism](https://reader035.vdocuments.net/reader035/viewer/2022070404/56813b55550346895da44625/html5/thumbnails/14.jpg)
Future informatics directions...
Combine results from multiple searches from multiple engines
Fast, automated triage of “significant false-positive” peptide identifications
Compressed EST peptide sequence database for other species
• Mouse, Rat, Zebrafish, Chicken, Cow, A. thaliana, ??
Relational database and web-application infrastructure
• Interactive browser data-grid, flexible web-services export
• Java Applet MS/MS viewers, GFF for Genome Browser
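One simple way to combine results from multiple search engines, offered only as a sketch of the idea, is to keep the spectrum-to-peptide assignments that at least two engines report independently below an E-value threshold. The data layout, engine names, spectrum identifiers, and thresholds below are all hypothetical; the talk does not specify a combination scheme.

```python
from collections import defaultdict


def consensus_identifications(results_by_engine, max_evalue=0.05, min_engines=2):
    """Return {(spectrum_id, peptide): [engines]} for agreed assignments.

    results_by_engine maps an engine name to {spectrum_id: (peptide, evalue)}.
    """
    votes = defaultdict(set)
    for engine, hits in results_by_engine.items():
        for spectrum_id, (peptide, evalue) in hits.items():
            if evalue <= max_evalue:
                votes[(spectrum_id, peptide)].add(engine)
    return {key: sorted(engines)
            for key, engines in votes.items()
            if len(engines) >= min_engines}


# Hypothetical results from two engines for three spectra.
results = {
    "engine_a": {"scan_101": ("WFNEDLSPAK", 0.001), "scan_102": ("GGTVELYQR", 0.2)},
    "engine_b": {"scan_101": ("WFNEDLSPAK", 0.010), "scan_103": ("MSGLEAVK", 0.03)},
}
print(consensus_identifications(results))
# {('scan_101', 'WFNEDLSPAK'): ['engine_a', 'engine_b']}
```

Agreement between engines is a conservative filter: it discards assignments that only one engine supports, which is useful for triage but can also drop correct identifications.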
![Page 15: Proteomic Characterization of Alternative Splicing and Coding Polymorphism](https://reader035.vdocuments.net/reader035/viewer/2022070404/56813b55550346895da44625/html5/thumbnails/15.jpg)
Conclusions
Peptides identify more than just proteins
• Untapped source of disease biomarkers
• Functional vs silencing variants
Compressed peptide sequence databases make routine EST searching feasible
Statistically significant peptide identification is only the first step
![Page 16: Proteomic Characterization of Alternative Splicing and Coding Polymorphism](https://reader035.vdocuments.net/reader035/viewer/2022070404/56813b55550346895da44625/html5/thumbnails/16.jpg)
Acknowledgements
Catherine Fenselau, Steve Swatkoski
• UMCP Biochemistry
Chau-Wen Tseng, Xue Wu
• UMCP Computer Science
Cheng Lee
• Calibrant Biosystems
PeptideAtlas, HUPO PPP, X!Tandem
Funding: NCI