

New Tools

• Samifier: A tool which converts results from protein tandem mass spectrometry into SAM format. This enables co-visualization of genomics, transcriptomics, and proteomics data using the Integrative Genomics Viewer (IGV), which displays SAM files.

• Results analyzer: This tool reports the number and types of peptides and proteins identified, together with their Mascot scores, subject to customizable filters. It also highlights peptides that span exon-exon junctions, which can be used to validate alternatively spliced protein isoforms.
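The poster does not describe Samifier's internals. What follows is only a minimal sketch of how one peptide hit could become a SAM record, assuming a forward-strand gene whose coding exons are known in genomic coordinates. Each residue covers three reference bases, so a peptide becomes M (aligned) blocks. Any intron the peptide crosses becomes an N (skipped reference) operation, the same CIGAR convention used for spliced RNA-seq reads. Several details are assumptions and not Samifier's actual output format: the function names, the `XP:Z:` tag, and leaving SEQ and QUAL as `*`.

```python
from typing import List, Optional, Tuple

# Hypothetical input model: coding exons of one forward-strand gene as
# 1-based, inclusive (start, end) genomic coordinates, in transcript order.
Exon = Tuple[int, int]


def peptide_alignment(exons: List[Exon], cds_start: int, cds_end: int) -> Tuple[int, str]:
    """Return (1-based POS, CIGAR) for the CDS nucleotide range [cds_start, cds_end).

    Covered exon blocks become M operations; introns between them become N.
    """
    ops: List[str] = []
    pos: Optional[int] = None
    prev_end = 0  # genomic end of the last exon block the peptide covered
    offset = 0    # CDS offset at which the current exon begins
    for start, end in exons:
        length = end - start + 1
        lo, hi = max(cds_start, offset), min(cds_end, offset + length)
        if lo < hi:  # the peptide overlaps this exon
            block_start = start + (lo - offset)
            if pos is None:
                pos = block_start
            else:
                ops.append(f"{block_start - prev_end - 1}N")  # skipped intron
            ops.append(f"{hi - lo}M")
            prev_end = start + (hi - offset) - 1
        offset += length
    if pos is None:
        raise ValueError("peptide lies outside the coding sequence")
    return pos, "".join(ops)


def peptide_to_sam(qname: str, chrom: str, exons: List[Exon],
                   residue_start: int, peptide: str) -> str:
    """Build one tab-separated SAM record for a peptide starting at a 1-based residue."""
    cds_start = (residue_start - 1) * 3
    cds_end = cds_start + 3 * len(peptide)
    pos, cigar = peptide_alignment(exons, cds_start, cds_end)
    # FLAG 0 = forward strand, MAPQ 255 = unavailable, SEQ/QUAL omitted ("*").
    # "XP:Z:" is a hypothetical user-defined tag carrying the peptide sequence.
    fields = [qname, "0", chrom, str(pos), "255", cigar,
              "*", "0", "0", "*", "*", f"XP:Z:{peptide}"]
    return "\t".join(fields)
```

Consider coding exons at 101-110 and 201-230. A four-residue peptide starting at residue 3 maps to position 107 with CIGAR `4M90N8M`: 4 bases in the first exon, a 90-base intron, then 8 bases in the second exon. Reverse-strand genes would also need coordinate flipping, which this sketch omits.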

Tools for the Validation of Genomes and Transcriptomes with Proteomics Data

Aims

Although large amounts of genomics and proteomics data are now available, there remains a lack of tools to integrate data from these two fields. This project aims to provide a ‘nexus’ for integrating genomics and transcriptomics data generated by next-generation sequencing with proteomics data generated by protein mass spectrometry. We are developing a set of tools that allow users to:

• Co-visualise genomics, transcriptomics, and proteomics data using the Integrative Genomics Viewer (IGV).1

• Validate the existence of genes and mRNAs using peptides identified from mass spectrometry experiments.

• Validate alternatively spliced mRNA isoforms by searching for peptides that span exon-exon junctions.
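One way to decide whether a peptide spans an exon-exon junction is to compare its codon range with the cumulative coding length of each exon. The peptide spans a junction when it has at least one base on each side of an exon boundary. The sketch below assumes the per-exon coding lengths are already known for the isoform being tested. The function name and inputs are hypothetical and not part of the released tools.

```python
from itertools import accumulate
from typing import List


def junctions_spanned(exon_cds_lengths: List[int], residue_start: int,
                      n_residues: int) -> List[int]:
    """Return the 1-based indices of exon-exon junctions covered by a peptide.

    exon_cds_lengths: coding nucleotides contributed by each exon, in order.
    Junction k lies between exon k and exon k + 1.
    """
    # 0-based CDS offsets at which exons 2, 3, ... begin.
    boundaries = list(accumulate(exon_cds_lengths))[:-1]
    first_nt = (residue_start - 1) * 3        # first base of the first codon
    last_nt = first_nt + 3 * n_residues - 1   # last base of the last codon
    # A junction is spanned when the peptide has bases on both sides of it.
    return [k + 1 for k, b in enumerate(boundaries) if first_nt < b <= last_nt]
```

A residue whose codon is split across two exons counts as spanning the junction. For example, `junctions_spanned([10, 30], 4, 1)` returns `[1]`.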

Chi Nam Ignatius Pang,1 Carlos Aya,2 Aidan Tay,1 Nandan P. Deshpande,1 Nadeem O. Kaakoush,1 Hazel Mitchell,1 Natalie A. Twine,1 Moustapha Kassem,3 Marc R. Wilkins1

1. Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia
2. Intersect Australia Limited, Sydney, Australia
3. Center for Experimental Bioinformatics, Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark

Downloads

The software is available via the GitHub code repository:

https://github.com/IntersectAustralia/ap11_samifier

Project Blog

http://intersectaustralia.github.com/ap11/

Contact

Prof. Marc Wilkins - [email protected]

Acknowledgements

This project is supported by the Australian National Data Service (ANDS). ANDS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy (NCRIS) Program and the Education Investment Fund (EIF) Super Science Initiative. The software is developed in conjunction with Intersect Australia Limited, a not-for-profit eResearch company. We thank the Australian Proteomics Computational Facility (APCF) for providing access to their Mascot server. We also thank Dr. Gene Hart-Smith for access to the Wilkins Lab yeast proteomics data.

Analysis of Novel Bacterial Proteomes

• Virtual protein generator: A tool which generates Mascot sequence databases based on genes predicted by tools such as Glimmer.3 To account for novel open reading frames, it also creates a database of ‘virtual proteins’, in which the genome is sliced into overlapping, fixed-size regions that are translated in all six frames.4

• Virtual protein merger: This tool takes a list of peptides that match ‘virtual proteins’ and recalculates the positions of the corresponding open reading frames by searching for flanking start and stop codons.
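The virtual protein approach can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's implementation. The window size, step, and identifier scheme are hypothetical. Translation uses the standard genetic code, whose amino acid assignments match NCBI translation table 11 for bacteria.

```python
from typing import Iterator, List, Tuple

BASES = "TCAG"
# Standard genetic code in TCAG order (same amino acids as NCBI table 11).
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq: str) -> str:
    """Translate whole codons; unknown codons become X and stops become *."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def window_starts(length: int, size: int, step: int) -> List[int]:
    """0-based starts of overlapping windows; the last window is aligned to the end."""
    starts = list(range(0, max(length - size, 0) + 1, step))
    if starts[-1] + size < length:
        starts.append(length - size)
    return starts


def virtual_proteins(genome: str, size: int, step: int) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) for every window translated in six frames."""
    genome = genome.upper()
    for start in window_starts(len(genome), size, step):
        window = genome[start:start + size]
        for strand, seq in (("+", window), ("-", reverse_complement(window))):
            for frame in range(3):
                # Hypothetical identifier: 1-based window start, strand, frame.
                yield f"vp_{start + 1}_{strand}{frame + 1}", translate(seq[frame:])
```

A step smaller than the window size makes consecutive windows overlap by `size - step` bases. Any peptide whose coding span is no longer than that overlap therefore falls entirely inside at least one window. Each (identifier, sequence) pair can then be written out as a FASTA record for the Mascot search.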

References

1. Robinson, J. T.; Thorvaldsdottir, H.; Winckler, W.; Guttman, M.; Lander, E. S.; Getz, G.; Mesirov, J. P., Integrative genomics viewer. Nat Biotechnol 2011, 29, (1), 24-6.
2. Deshpande, N. P.; Kaakoush, N. O.; Mitchell, H.; Janitz, K.; Raftery, M. J.; Li, S. S.; Wilkins, M. R., Sequencing and validation of the genome of a Campylobacter concisus reveals intra-species diversity. PLoS One 2011, 6, (7), e22170.
3. Delcher, A. L.; Bratke, K. A.; Powers, E. C.; Salzberg, S. L., Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 2007, 23, (6), 673-9.
4. Arthur, J. W.; Wilkins, M. R., Using proteomics to mine genome sequences. J Proteome Res 2004, 3, (3), 393-402.

 

Figure 2. The Integrative Genomics Viewer was used to visualize the peptides of the yeast 40S ribosomal protein S7-B (YNL096C). A peptide that spans an exon-exon junction is highlighted in the red box.

Analytical Pipeline

The pipeline consists of several tools and requires several input files; it is shown as a diagram in Figure 1.

Summary Statistics

• The Results Analyzer was used to calculate summary statistics for the Campylobacter concisus and Saccharomyces cerevisiae proteomes. Only proteins identified by two or more peptides with Mascot scores exceeding the identity threshold are included in the statistics.

• Campylobacter concisus (emergent gut pathogen): peptide evidence for 66% (1320/2002) of the proteins in UniProt.2

• Saccharomyces cerevisiae (baker’s yeast): peptide evidence for 14% (895/6621) of the proteins in UniProt, and for 29 exon-exon junctions, 9% of all splice junctions in the yeast proteome.
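The inclusion rule used for these statistics is simple to express in code: two or more peptides with Mascot scores above the identity threshold. In this minimal sketch the `PeptideHit` fields are hypothetical stand-ins for values parsed from Mascot results. "Two or more peptides" is assumed to mean two or more distinct peptide sequences.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set


class PeptideHit(NamedTuple):
    """One peptide match parsed from Mascot results (hypothetical fields)."""
    protein: str
    peptide: str
    score: float
    identity_threshold: float


def supported_proteins(hits: Iterable[PeptideHit], min_peptides: int = 2) -> Set[str]:
    """Proteins with at least `min_peptides` distinct peptides scoring above identity."""
    peptides: Dict[str, Set[str]] = defaultdict(set)
    for hit in hits:
        if hit.score > hit.identity_threshold:
            peptides[hit.protein].add(hit.peptide)
    return {protein for protein, seqs in peptides.items() if len(seqs) >= min_peptides}


def percent_of_proteome(n_supported: int, n_total: int) -> float:
    """Share of the reference proteome with peptide evidence, as a percentage."""
    return 100.0 * n_supported / n_total
```

With the counts reported above, `round(percent_of_proteome(1320, 2002))` gives 66 and `round(percent_of_proteome(895, 6621))` gives 14. These match the percentages quoted for C. concisus and S. cerevisiae.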

Figure 3. The Virtual protein generator and Virtual protein merger. The bacterial genome is sliced into overlapping, fixed-size regions, which are translated in all six frames to create a database of ‘virtual proteins’. Peptides that match ‘virtual proteins’ are merged into putative open reading frames based on flanking start and stop codons.
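The merging step in the Figure 3 caption can be sketched as follows for forward-strand matches. Reverse-strand matches would be handled the same way on the reverse complement. The sketch rests on several assumptions, and all names in it are hypothetical:

* Peptide positions are 0-based genomic offsets. A small helper converts a residue in a forward-strand virtual protein back to one.
* The start codon set includes the bacterial alternatives GTG and TTG.
* Peptides sharing the same downstream in-frame stop codon belong to one open reading frame.

```python
from typing import Dict, Iterable, List, Optional, Tuple

STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG", "TTG"}  # assumption: ATG plus common bacterial alternatives


def virtual_to_genome(window_start: int, frame: int, residue_index: int) -> int:
    """0-based genomic offset of a residue in a forward-strand virtual protein."""
    return window_start + frame + 3 * residue_index


def orf_bounds(genome: str, pep_start: int,
               pep_end: int) -> Tuple[Optional[int], Optional[int]]:
    """Return (ORF start, ORF end) around a forward-strand peptide match.

    pep_start/pep_end are 0-based, end-exclusive offsets of the peptide's codons.
    The start is the most upstream in-frame start codon after the nearest
    upstream in-frame stop; the end lies just past the first downstream
    in-frame stop. Either value is None if no such codon is found.
    """
    end = None
    i = pep_end
    while i + 3 <= len(genome):
        if genome[i:i + 3] in STOPS:
            end = i + 3
            break
        i += 3
    start = None
    j = pep_start
    while j >= 0 and genome[j:j + 3] not in STOPS:
        if genome[j:j + 3] in STARTS:
            start = j  # keep walking upstream; a later hit replaces this one
        j -= 3
    return start, end


def merge_peptides(genome: str, hits: Iterable[Tuple[str, int, int]]) -> List[Dict]:
    """Group peptides (id, start, end) that share an in-frame stop codon into one ORF."""
    orfs: Dict[object, Dict] = {}
    # Visiting upstream peptides first means each ORF's start is searched from
    # its most upstream peptide, so the start lies upstream of all of them.
    for pid, s, e in sorted(hits, key=lambda h: h[1]):
        start, end = orf_bounds(genome, s, e)
        key = end if end is not None else ("open", s % 3)
        orf = orfs.setdefault(key, {"start": start, "end": end, "peptides": []})
        orf["peptides"].append(pid)
    return list(orfs.values())
```

Keying ORFs on their stop codon works because, on one strand, a stop codon position fixes both the reading frame and the ORF's end. Taking the most upstream start codon after the preceding in-frame stop gives the longest candidate ORF.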

[Figure labels: Genomic location; Peptide matches from Mascot; Gene architecture (exons and introns); Peptide at exon-exon junction]

Figure 1. The analytical pipeline allows genomics and transcriptomics data generated from next-generation sequencing platforms to be used in custom sequence databases for Mascot searches. This enables novel genes or novel alternatively spliced mRNA isoforms to be verified with proteomics data.

