

SiZhe Xiao GigaScience 2013

POSTER Open Access

GigaDB – revolutionizing data dissemination, organization and use

Xiao Si Zhe 1, Chris Hunter, Tam P. Sneddon, Scott C. Edmunds, Alexandra T. Basford, Peter Li, and Laurie Goodman.

Abstract

GigaScience, the online open-access open-data journal, has recently developed GigaDB, a new integrated database of ‘big-data’ studies from the life and biomedical sciences. The initial goals of GigaDB are to assign DOIs to datasets so that they can be tracked and cited, and to provide a user-friendly web interface giving easy access to selected GigaDB datasets and files. We will be working with authors to make the raw data, computational tools and data processing pipelines described in the GigaScience papers available and, where possible, executable on an informatics platform. We hope that by making both the data and the processes involved in their analysis freely accessible, this novel form of publication will help articles published in GigaScience to have a much higher impact in the scientific literature, and maximize their reuse within the community.

GigaDB currently accepts submissions in Excel format. Example submission and template files can be found on the website (http://gigadb.org/). To date, GigaDB comprises over 56 datasets and includes Genomic, Transcriptomic, Epigenomic and Metagenomic dataset types, but we also accept many other dataset types, including proteomic and neuroimaging studies. Future goals include integration with the BGI Cloud and with the Galaxy software tools, to enable users to upload files directly to Galaxy for further analysis. We are also working with ISA-Tab and other scientific standards groups to support and extend the usability and interoperability model.

Keywords: DOI, Galaxy, big-data, database, informatics platform, GigaScience

doi:10.6084/m9.figshare.786486

Cite this poster as: GigaDB – revolutionizing data dissemination, organization and use. Xiao Si Zhe, Chris Hunter, Tam P. Sneddon, Scott C. Edmunds, Alexandra T. Basford, Peter Li, and Laurie Goodman. http://dx.doi.org/10.6084/m9.figshare.786486

© 2013 Edmunds et al. This is an Open Access poster distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Correspondence: [email protected]

1. BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong SAR, China.
2. BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen, China.
3. School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.
4. CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.
5. HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pok Fu Lam, Hong Kong.
6. Oxford e-Research Centre, University of Oxford, Oxford, UK.

Acknowledgements

Thanks to: Laurie Goodman, Chris Hunter, Scott Edmunds, Tam Sneddon (GigaScience), Shaoguang Liang (BGI-SZ), Qiong Luo, Senghong Wang, Yan Zhou (HKUST), Rob Davidson and Mark Viant (Birmingham Uni), Marco Galardini (Unifi)

Financial support from:

Linking papers to data and analyses

[Figure: an Open-Paper linked via DOIs to its data sets and analyses. Labels: Open-Paper; Open-Pipelines; Open-Workflows; Open-Data; 78GB CC0 data; DOI:10.5524/100044; DOI:10.5524/100038.]

Background

Growing replication gap:

• 10/18 microarray papers cannot be reproduced
• Ioannidis: “Most Published Research Findings Are False”
• >15X increase in retracted papers in last decade
• Lack of incentives to make data/methods available
• Poor metadata quality and lack of interoperability

GigaSolution: deconstructing the paper

Combine and integrate (via citable DOIs):

• Open-access journal: www.gigasciencejournal.com
• Data Publishing Platform: gigadb.org
• Data Analysis Platform: galaxy.cbiit.cuhk.edu.hk

Submit your next manuscript containing large-scale data and workflows to GigaScience and take full advantage of:

• No space constraints, and unlimited data and workflow hosting in GigaDB and GigaGalaxy
• Article processing charges for all submissions in 2013 covered by BGI
• Open access, open data and highly visible work freely available for distribution
• Inclusion in PubMed and Google Scholar

GigaDB home page: www.gigadb.org

Aspera data transfer: faster download speeds

GigaDB Submission Workflow

[Flowchart: box labels listed in the apparent order of the workflow.]

• Submitter logs in to GigaDB website and uploads Excel submission file
• Validation checks
  Fail – submitter is provided error report
  Pass – dataset is uploaded to GigaDB
• Curator review
• DOI assigned
• Curator contacts submitter with DOI citation and to arrange file transfer (and resolve any other questions/issues)
• Submitter provides files by ftp or Aspera
• XML is generated and registered with DataCite (DataCite XML file)
• Curator makes dataset public (can be set as future date if required)
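The validation and DataCite steps of the workflow can be illustrated with a minimal Python sketch. This is not GigaDB's code: the field names, the `REQUIRED_FIELDS` list, the example DOI `10.1234/example` and the function names are all hypothetical, and the XML is a simplified DataCite-style record (identifier, creators, titles, publisher, publicationYear) without the schema namespace that a real registration would need.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of mandatory fields; the real submission template may differ.
REQUIRED_FIELDS = ("title", "creators", "publisher", "publication_year")


def validate_submission(record):
    """Return a list of error messages; an empty list means the record passes."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            errors.append(f"missing required field: {field}")
    year = str(record.get("publication_year") or "")
    if year and not (year.isdigit() and len(year) == 4):
        errors.append("publication_year must be a four-digit year")
    return errors


def build_datacite_xml(record, doi):
    """Build a simplified DataCite-style metadata document for an accepted record."""
    resource = ET.Element("resource")
    identifier = ET.SubElement(resource, "identifier", identifierType="DOI")
    identifier.text = doi
    creators = ET.SubElement(resource, "creators")
    for name in record["creators"]:
        creator = ET.SubElement(creators, "creator")
        ET.SubElement(creator, "creatorName").text = name
    titles = ET.SubElement(resource, "titles")
    ET.SubElement(titles, "title").text = record["title"]
    ET.SubElement(resource, "publisher").text = record["publisher"]
    ET.SubElement(resource, "publicationYear").text = str(record["publication_year"])
    return ET.tostring(resource, encoding="unicode")


def process_submission(record, doi):
    """Fail: return an error report. Pass: return the metadata XML for the DOI."""
    errors = validate_submission(record)
    if errors:
        return {"status": "fail", "error_report": errors}
    return {"status": "pass", "datacite_xml": build_datacite_xml(record, doi)}


if __name__ == "__main__":
    # Placeholder record, not a real GigaDB dataset.
    example = {
        "title": "Example dataset",
        "creators": ["Example Author"],
        "publisher": "Example Publisher",
        "publication_year": 2013,
    }
    print(process_submission(example, "10.1234/example"))
    print(process_submission({"title": "Example dataset"}, "10.1234/example"))
```

Returning the error report as data rather than raising an exception mirrors the Fail branch of the flowchart, where the submitter receives a report and can resubmit.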

Public GigaDB dataset

DOI 10.5524/100003: Genomic data from the crab-eating macaque/cynomolgus monkey (Macaca fascicularis) (2011)

[Figure: Datasets public in GigaDB]
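Because the datasets are cited by DOI, a reader can reach any of them through the public DOI resolver at https://doi.org/. The sketch below is an illustration only, with hypothetical function names and a deliberately loose pattern: it normalizes the different ways a DOI is written on this poster (for example "DOI 10.5524/100003" or "DOI:10.5524 / 100044") and builds the resolver URL.

```python
import re

# Loose shape check: "10.", a numeric registrant prefix, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_doi(text):
    """Strip common prefixes and stray spaces, returning a bare DOI string."""
    s = text.strip()
    # Remove a resolver URL or a "doi:" / "DOI " label at the start.
    s = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:?\s*)", "", s, flags=re.IGNORECASE)
    # Remove spaces around the first slash, as in "10.5524 / 100044".
    s = re.sub(r"\s*/\s*", "/", s, count=1)
    if not DOI_PATTERN.match(s):
        raise ValueError(f"not a recognizable DOI: {text!r}")
    return s


def doi_url(text):
    """Return the resolver URL for a DOI written in any of the forms above."""
    return "https://doi.org/" + normalize_doi(text)


if __name__ == "__main__":
    for raw in ("DOI 10.5524/100003", "DOI:10.5524 / 100044", "doi:10.6084/m9.figshare.786486"):
        print(raw, "->", doi_url(raw))
```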