

CHORUS: A Community Based Solution for the Storage, Analysis, and Exchange of Mass Spectrometry Data and Information

Andrey Bondarenko, Oleskii Tymchenko, Nick Shulman, Brendan MacLean, Christine Wu, Michael MacCoss, and Nathan Yates

Stratus Biosciences, University of Washington, and University of Pittsburgh

Introduction:

Advances in computer technology and the widespread adoption of social media tools have revolutionized the way people create, share, and exchange information and ideas. The development of community-based software also provides new and exciting avenues for scientific exploration in the field of mass spectrometry and its impact on the biological sciences and the practice of medicine. Here we describe CHORUS, a highly integrated platform for the storage, visualization, analysis, and exchange of mass spectrometry data. CHORUS is a non-profit, community-driven project that provides a platform for the mining of public and private datasets. Built on a modern analysis framework, CHORUS provides a platform for tool integration and parallel data processing.

The CHORUS project is a not-for-profit public/private partnership to create a sustainable cloud-based platform for the storage, analysis, and sharing of mass spectrometry data. Access to CHORUS is free, and more than 550 user accounts from over 150 laboratories have been created. CHORUS makes use of the Amazon Web Services (AWS) cloud computing environment to 1) store and archive raw instrument data, 2) translate different vendor files into a unique distributed file format for random file access and parallelization, and 3) provide scalable data access and processing. CHORUS has been designed to simplify common data management tasks, and new capabilities have been integrated into the software, including protein database search tools and support for quantitative proteomics experiments.

User Interface:

CHORUS was designed to have a user interface that resembles Google Drive.

[Figure: screenshots of the Google Drive and CHORUS Project user interfaces]

Data Structure:

Traditional data file storage

• Fast to get a spectrum

• Slow to get a chromatogram

CHORUS data storage

• Random access to the file

• Many processes can be used to extract many chromatograms/spectra using MapReduce
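
The contrast above can be illustrated with a minimal Python sketch. It is not the CHORUS implementation or file format: the toy `SCANS` data, the function names, the m/z tolerance, and the chunk size are all assumptions made for illustration. In spectrum-ordered storage a single spectrum is one lookup, while a chromatogram needs a pass over every scan. Splitting the scans into chunks lets independent workers do that pass in parallel (the map step), after which the partial results are grouped per target (the reduce step). Threads stand in here for the separate processes a real MapReduce job would use.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data: each scan is (retention_time_min, [(mz, intensity), ...]).
SCANS = [
    (0.1, [(400.2, 10.0), (500.3, 5.0)]),
    (0.2, [(400.2, 30.0), (500.3, 7.0)]),
    (0.3, [(400.2, 20.0), (600.4, 9.0)]),
    (0.4, [(500.3, 11.0), (600.4, 4.0)]),
]


def get_spectrum(scans, index):
    """Spectrum-ordered storage: fetching one scan is a single lookup."""
    return scans[index]


def map_chunk(chunk, targets, tol):
    """Map step: for one chunk of scans, emit (target_mz, (rt, intensity)) pairs."""
    pairs = []
    for rt, peaks in chunk:
        for target in targets:
            # Sum the intensity of all peaks within the m/z tolerance window.
            total = sum((i for mz, i in peaks if abs(mz - target) <= tol), 0.0)
            pairs.append((target, (rt, total)))
    return pairs


def reduce_pairs(pairs):
    """Reduce step: group points by target and order them by retention time."""
    chroms = defaultdict(list)
    for target, point in pairs:
        chroms[target].append(point)
    return {target: sorted(points) for target, points in chroms.items()}


def extract_chromatograms(scans, targets, tol=0.01, chunk_size=2, workers=2):
    """Split scans into chunks, map them concurrently, then reduce."""
    chunks = [scans[i:i + chunk_size] for i in range(0, len(scans), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = pool.map(lambda chunk: map_chunk(chunk, targets, tol), chunks)
        pairs = [pair for part in mapped for pair in part]
    return reduce_pairs(pairs)


chromatograms = extract_chromatograms(SCANS, targets=[400.2, 500.3])
```

Because each chunk is processed without reference to any other, the same map function could run on separate machines, provided the file format allows a worker to seek directly to its own chunk. That is the role random access plays in the storage design described above.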

CHORUS Growth:

The number of users has grown linearly since ASMS 2013, with ~40 new users added per month.

The rate of new data (in GB) added to CHORUS has increased substantially since Jan 2014.

http://chorusproject.org

Spectrum and Chromatogram Viewer:

State-of-the-art, vendor-neutral viewer for data stored within CHORUS

Project Blogs:

Announce public datasets, communicate with collaborators on shared private datasets

Protein Database Search:

• Currently integrated with Comet and Percolator

• New pipelines coming:

• Byonic – Protein Metrics

• Mascot – Matrix Science

[Figure panels: Protein Result View, Peptide Result View]

Integration with the Skyline Client:

The import data dialog in Skyline supports browsing CHORUS data.

Performance Tests: It is faster to access DIA data remotely from CHORUS than from the local hard drive.