CHORUS: A Community Based Solution for the Storage, Analysis, and Exchange of Mass Spectrometry Data and Information
Andrey Bondarenko, Oleskii Tymchenko, Nick Shulman, Brendan MacLean, Christine Wu, Michael MacCoss, and Nathan Yates
Stratus Biosciences, University of Washington, and University of Pittsburgh
Introduction:
Advances in computer technology and the
widespread adoption of social media tools have
revolutionized the way people create, share, and
exchange information and ideas. The
development of community based software also
provides new and exciting avenues for scientific
exploration in the field of mass spectrometry and
its impact on the biological sciences and the
practice of medicine. Here we describe CHORUS,
a highly integrated platform for the storage,
visualization, analysis, and exchange of mass
spectrometry data. CHORUS is a non-profit,
community driven project that provides a platform
for the mining of public and private datasets. Built
on a modern analysis framework, CHORUS
provides a platform for tool integration and parallel
data processing.
The CHORUS project is a not-for-profit
public/private partnership to create a sustainable
cloud based platform for the storage, analysis, and
sharing of mass spectrometry data. Access to
CHORUS is free and more than 550 user accounts
from over 150 laboratories have been created.
CHORUS makes use of the Amazon Web Services
(AWS) cloud computing environment to 1) store
and archive raw instrument data, 2) translate
different vendor files into a common distributed file
format for random file access and parallelization,
and 3) provide scalable data access and processing.
CHORUS has been designed to simplify common
data management tasks and new capabilities have
been integrated into the software, including protein
database search tools and support for quantitative
proteomics experiments.
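The poster does not describe the layout of the translated file format, so the following is only a minimal sketch of the general idea behind "random file access": store each scan at a known byte offset and keep a small index, so that any scan can be read without parsing the scans before it. The names `write_indexed` and `read_scan` and the 16-byte peak record are assumptions made for illustration, not the CHORUS format.

```python
import struct

PEAK = struct.Struct("<dd")  # hypothetical record: (m/z, intensity) as two float64

def write_indexed(scans):
    """Serialize scans and return (blob, index), index[i] = (byte_offset, n_peaks)."""
    chunks, index, offset = [], [], 0
    for peaks in scans:
        index.append((offset, len(peaks)))
        for mz, intensity in peaks:
            chunks.append(PEAK.pack(mz, intensity))
            offset += PEAK.size
    return b"".join(chunks), index

def read_scan(blob, index, i):
    """Read scan i directly from its offset, without touching earlier scans."""
    offset, n_peaks = index[i]
    return [PEAK.unpack_from(blob, offset + k * PEAK.size) for k in range(n_peaks)]

scans = [[(100.0, 1.0)], [(200.5, 2.0), (300.25, 3.0)]]
blob, index = write_indexed(scans)
second = read_scan(blob, index, 1)
```

Because each read depends only on the index entry, independent workers could read different scans at the same time, which is the property that parallel processing relies on.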
User Interface:
CHORUS was designed to have a user interface that resembles Google Drive.
[Screenshots: Google Drive; The CHORUS Project]
Data Structure:
Traditional data file storage
• Fast to get a spectrum
• Slow to get a chromatogram
CHORUS data storage
• Random access to the file
• Many processes can be used to extract many chromatograms/spectra using MapReduce
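The contrast above can be illustrated with a small sketch. A chromatogram for one m/z value needs a data point from every scan, so a scan-by-scan file must be read end to end. With random access, the scans can be split into partitions that are processed independently (map) and then merged (reduce). This is a toy, in-memory illustration under stated assumptions: the partition layout, the function names, and the thread pool standing in for a MapReduce cluster are all hypothetical, not the CHORUS implementation.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical scan partitions: each scan is (retention_time, [(mz, intensity), ...]).
PARTITIONS = [
    [(0.1, [(500.25, 10.0), (622.03, 5.0)]), (0.2, [(500.26, 20.0)])],
    [(0.3, [(500.24, 30.0), (622.02, 7.0)]), (0.4, [(700.00, 1.0)])],
]

def map_partition(scans, targets, tol):
    """Map step: emit (target_mz, (rt, summed_intensity)) for one partition."""
    emitted = []
    for rt, peaks in scans:
        for target in targets:
            total = sum(i for mz, i in peaks if abs(mz - target) <= tol)
            if total > 0:
                emitted.append((target, (rt, total)))
    return emitted

def reduce_pairs(mapped):
    """Reduce step: group points by target m/z and order them by retention time."""
    chroms = defaultdict(list)
    for pairs in mapped:
        for target, point in pairs:
            chroms[target].append(point)
    return {target: sorted(points) for target, points in chroms.items()}

def extract_chromatograms(partitions, targets, tol=0.02):
    # A thread pool stands in for the cluster workers of a real MapReduce job.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(lambda p: map_partition(p, targets, tol), partitions))
    return reduce_pairs(mapped)

chroms = extract_chromatograms(PARTITIONS, [500.25, 622.03])
```

Each partition is scanned once no matter how many target m/z values are requested, which is why extracting many chromatograms in one pass is attractive.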
CHORUS Growth:
New users have grown
linearly since ASMS 2013
with ~40 new users added
per month
The rate of new data (in
GB) added to CHORUS has
increased substantially
since Jan 2014.
http://chorusproject.org
Spectrum and Chromatogram Viewer:
State-of-the-art, vendor-neutral viewer for data stored within CHORUS
Project Blogs:
Announce public datasets,
communicate with collaborators on
shared private datasets
Integration with the Skyline Client:
The import data dialog in Skyline
supports browsing CHORUS data
Protein Database Search:
• Currently integrated with Comet
and Percolator
• New pipelines coming:
  • Byonic – Protein Metrics
  • Mascot – Matrix Science
[Screenshots: Protein Result View; Peptide Result View]
Performance Tests: It is faster to access DIA data
remotely from CHORUS than from the local hard drive