The elastic analysis with Galaxy on the cloud

POSTER PRESENTATION Open Access

Enis Afgan 1*, Dannon Baker 1, The Galaxy Team 2, Anton Nekrutenko 3, James Taylor 1

From Beyond the Genome: The true gene count, human evolution and disease genomics. Boston, MA, USA. 11-13 October 2010

As experimental biologists become increasingly reliant on high-throughput data production, the scale and sophistication of the computational infrastructure needed to support data storage and analysis have grown dramatically. In addition, the computational infrastructure needs to be coupled with the appropriate data analysis tools. Such an environment requires informatics support to set up, configure, and maintain the infrastructure. Moreover, once set up, the complete environment needs to be maintained even during periods of inactivity or low usage. For experimentalists, such requirements represent a barrier to realizing the next step in science.

Cloud computing has recently emerged as a model that is well suited to the periodic computational requirements of experimental biologists. However, cloud computing resources are not yet suitable for immediate use by experimentalists because they still need to be configured and managed. To help enable seamless next-generation sequencing (NGS) analyses on the cloud, we have developed Galaxy CloudMan. Galaxy CloudMan is a comprehensive manager for running and managing cloud computing resources. Cloud resources managed by Galaxy CloudMan are preconfigured with the tools necessary for NGS analyses. Access to and interaction with the preconfigured NGS tools are handled through Galaxy, an open-source, web-based system that provides an integrated analysis environment where domain scientists can, without informatics expertise, interactively construct multi-step analyses, with outputs from one step feeding seamlessly into the next. Separate from the Galaxy analysis interface, CloudMan offers a simple web-based interface that allows anyone to acquire a desired number of computational and storage resources on a cloud infrastructure and access the familiar Galaxy interface and associated tools. CloudMan automatically handles all aspects of resource acquisition, configuration, and data persistence, thus entirely insulating the user from low-level computational details. With Galaxy CloudMan, an individual researcher can, without any informatics support, gain access to a complete NGS data analysis solution in a matter of minutes and release it once the analysis has completed, thus eliminating the need for infrastructure maintenance.

Author details
1 Department of Biology and Department of Mathematics and Computer Science, Emory University, Atlanta, GA 30322, USA.
2 http://galaxyproject.org.
3 Huck Institutes of the Life Sciences and Department of Biochemistry and Molecular Biology, Pennsylvania State University, State College, PA 16801, USA.

Published: 11 October 2010

doi:10.1186/1465-6906-11-S1-P2
Cite this article as: Afgan et al.: The elastic analysis with galaxy on the cloud. Genome Biology 2010 11(Suppl 1):P2.

Afgan et al. Genome Biology 2010, 11(Suppl 1):P2
http://genomebiology.com/2010/11/S1/P2

© 2010 Afgan et al; licensee BioMed Central Ltd.


