Post on 23-Sep-2020


CHORUS: A Community Based Solution for the Storage, Analysis, and Exchange of Mass Spectrometry Data and Information

Andrey Bondarenko, Oleskii Tymchenko, Nick Shulman, Brendan MacLean, Christine Wu, Michael MacCoss, and Nathan Yates

Stratus Biosciences, University of Washington, and University of Pittsburgh

Introduction:

Advances in computer technology and the widespread adoption of social media tools have revolutionized the way people create, share, and exchange information and ideas. The development of community-based software also provides new and exciting avenues for scientific exploration in the field of mass spectrometry and its impact on the biological sciences and the practice of medicine. Here we describe CHORUS, a highly integrated platform for the storage, visualization, analysis, and exchange of mass spectrometry data. CHORUS is a non-profit, community-driven project that provides a platform for the mining of public and private datasets. Built on a modern analysis framework, CHORUS provides a platform for tool integration and parallel data processing.

The CHORUS project is a not-for-profit public/private partnership to create a sustainable cloud-based platform for the storage, analysis, and sharing of mass spectrometry data. Access to CHORUS is free, and more than 550 user accounts from over 150 laboratories have been created. CHORUS makes use of the Amazon Web Services (AWS) cloud computing environment to 1) store and archive raw instrument data, 2) translate different vendor files into a unique distributed file format for random file access and parallelization, and 3) provide scalable data access and processing. CHORUS has been designed to simplify common data management tasks, and new capabilities have been integrated into the software, including protein database search tools and support for quantitative proteomics experiments.

User Interface:

CHORUS was designed to have a user interface that resembles Google Drive.

[Figure: screenshots of Google Drive and The CHORUS Project]

Data Structure:

Traditional data file storage

• Fast to get a spectrum
• Slow to get a chromatogram

CHORUS data storage

• Random access to the file
• Many processes can be used to extract many chromatograms/spectra using MapReduce
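The contrast above can be illustrated with a small sketch. It does not show the CHORUS file format or its implementation, which the poster does not describe. The toy data, the function names (`get_spectrum`, `map_chunk`, `merge`, `extract_chromatogram`), the m/z tolerance, and the use of a thread pool in place of distributed MapReduce workers are all assumptions made for illustration. The idea is that in spectrum-ordered storage a single spectrum is one lookup, while a chromatogram has to touch every spectrum, so the run is split into chunks that are processed concurrently and then merged.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Toy run (made-up values): each spectrum is
# (retention_time, [(mz, intensity), ...]), stored in acquisition order.
SPECTRA = [
    (0.0, [(400.2, 10.0), (500.3, 5.0)]),
    (1.0, [(400.2, 20.0), (600.1, 7.0)]),
    (2.0, [(400.2, 15.0), (500.3, 9.0)]),
    (3.0, [(500.3, 4.0), (600.1, 8.0)]),
]


def get_spectrum(spectra, index):
    """Spectrum-ordered storage: one spectrum is a single lookup."""
    return spectra[index]


def map_chunk(chunk, target_mz, tol):
    """Map step: for each spectrum in the chunk, sum intensity near target_mz."""
    points = []
    for rt, peaks in chunk:
        total = sum((i for mz, i in peaks if abs(mz - target_mz) <= tol), 0.0)
        points.append((rt, total))
    return points


def merge(left, right):
    """Reduce step: combine two partial chromatograms, ordered by time."""
    return sorted(left + right)


def extract_chromatogram(spectra, target_mz, tol=0.01, chunk_size=2, workers=2):
    """Split the run into chunks, map them concurrently, then reduce."""
    chunks = [spectra[i:i + chunk_size]
              for i in range(0, len(spectra), chunk_size)]
    # Threads stand in here for the distributed workers of a MapReduce job.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: map_chunk(c, target_mz, tol), chunks))
    return reduce(merge, partials, [])
```

Calling `extract_chromatogram(SPECTRA, 400.2)` returns one `(retention_time, intensity)` point per spectrum. Because each chunk is processed independently, the same pattern extends to many chromatograms or spectra requested at once, provided the underlying file supports random access to each chunk.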

CHORUS Growth:

The number of users has grown linearly since ASMS 2013, with ~40 new users added per month.

The rate of new data (in GB) added to CHORUS has increased substantially since Jan 2014.

http://chorusproject.org

Spectrum and Chromatogram Viewer:

State-of-the-art, vendor-neutral viewer for data stored within CHORUS

Project Blogs:

Announce public datasets and communicate with collaborators on shared private datasets.

Integration with the Skyline Client:

The import data dialog in Skyline supports browsing CHORUS data.

Protein Database Search:

• Currently integrated with Comet and Percolator
• New pipelines coming:
  • Byonic – Protein Metrics
  • Mascot – Matrix Science

[Figure panels: Protein Result View; Peptide Result View]

Performance Tests:

It is faster to access DIA data remotely from CHORUS than from the local hard drive.
