reproducible and reusable research: are journal data sharing … · 2017-04-26 · reproducible and...

18
Submitted 10 November 2016 Accepted 20 March 2017 Published 25 April 2017 Corresponding author Robin E. Champieux, [email protected] Academic editor Ana Marusic Additional Information and Declarations can be found on page 15 DOI 10.7717/peerj.3208 Copyright 2017 Vasilevsky et al. Distributed under Creative Commons CC-BY 4.0 OPEN ACCESS Reproducible and reusable research: are journal data sharing policies meeting the mark? Nicole A. Vasilevsky 1 ,2 , Jessica Minnier 3 , Melissa A. Haendel 1 ,2 and Robin E. Champieux 1 1 OHSU Library, Oregon Health & Science University, Portland, OR, United States 2 Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, United States 3 OHSU-PSU School of Public Health, Oregon Health & Science University, Portland, OR, United States ABSTRACT Background. There is wide agreement in the biomedical research community that research data sharing is a primary ingredient for ensuring that science is more transparent and reproducible. Publishers could play an important role in facilitating and enforcing data sharing; however, many journals have not yet implemented data sharing policies and the requirements vary widely across journals. This study set out to analyze the pervasiveness and quality of data sharing policies in the biomedical literature. Methods. The online author’s instructions and editorial policies for 318 biomedical journals were manually reviewed to analyze the journal’s data sharing requirements and characteristics. The data sharing policies were ranked using a rubric to determine if data sharing was required, recommended, required only for omics data, or not addressed at all. The data sharing method and licensing recommendations were examined, as well any mention of reproducibility or similar concepts. The data was analyzed for patterns relating to publishing volume, Journal Impact Factor, and the publishing model (open access or subscription) of each journal. Results. A total of 11.9% of journals analyzed explicitly stated that data sharing was required as a condition of publication. A total of 9.1% of journals required data sharing, but did not state that it would affect publication decisions. 23.3% of journals had a statement encouraging authors to share their data but did not require it. A total of 9.1% of journals mentioned data sharing indirectly, and only 14.8% addressed protein, proteomic, and/or genomic data sharing. There was no mention of data sharing in 31.8% of journals. Impact factors were significantly higher for journals with the strongest data sharing policies compared to all other data sharing criteria. Open access journals were not more likely to require data sharing than subscription journals. Discussion. Our study confirmed earlier investigations which observed that only a minority of biomedical journals require data sharing, and a significant association between higher Impact Factors and journals with a data sharing requirement. Moreover, while 65.7% of the journals in our study that required data sharing addressed the concept of reproducibility, as with earlier investigations, we found that most data sharing policies did not provide specific guidance on the practices that ensure data is maximally available and reusable. How to cite this article Vasilevsky et al. (2017), Reproducible and reusable research: are journal data sharing policies meeting the mark? PeerJ 5:e3208; DOI 10.7717/peerj.3208

Upload: others

Post on 08-Jun-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Reproducible and reusable research: are journal data sharing … · 2017-04-26 · Reproducible and reusable research: are journal data sharing policies meeting the mark? Nicole A

Submitted 10 November 2016Accepted 20 March 2017Published 25 April 2017

Corresponding authorRobin E. Champieux,[email protected]

Academic editorAna Marusic

Additional Information andDeclarations can be found onpage 15

DOI 10.7717/peerj.3208

Copyright2017 Vasilevsky et al.

Distributed underCreative Commons CC-BY 4.0

OPEN ACCESS

Reproducible and reusable research: arejournal data sharing policies meeting themark?Nicole A. Vasilevsky1,2, Jessica Minnier3, Melissa A. Haendel1,2 andRobin E. Champieux1

1OHSU Library, Oregon Health & Science University, Portland, OR, United States2Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University,Portland, OR, United States

3OHSU-PSU School of Public Health, Oregon Health & Science University, Portland, OR, United States

ABSTRACTBackground. There is wide agreement in the biomedical research community thatresearch data sharing is a primary ingredient for ensuring that science is moretransparent and reproducible. Publishers could play an important role in facilitating andenforcing data sharing; however, many journals have not yet implemented data sharingpolicies and the requirements vary widely across journals. This study set out to analyzethe pervasiveness and quality of data sharing policies in the biomedical literature.Methods. The online author’s instructions and editorial policies for 318 biomedicaljournals weremanually reviewed to analyze the journal’s data sharing requirements andcharacteristics. The data sharing policies were ranked using a rubric to determine if datasharing was required, recommended, required only for omics data, or not addressed atall. The data sharing method and licensing recommendations were examined, as wellany mention of reproducibility or similar concepts. The data was analyzed for patternsrelating to publishing volume, Journal Impact Factor, and the publishing model (openaccess or subscription) of each journal.Results. A total of 11.9% of journals analyzed explicitly stated that data sharing wasrequired as a condition of publication. A total of 9.1% of journals required data sharing,but did not state that it would affect publication decisions. 23.3% of journals hada statement encouraging authors to share their data but did not require it. A totalof 9.1% of journals mentioned data sharing indirectly, and only 14.8% addressedprotein, proteomic, and/or genomic data sharing. There was nomention of data sharingin 31.8% of journals. Impact factors were significantly higher for journals with thestrongest data sharing policies compared to all other data sharing criteria. Open accessjournals were not more likely to require data sharing than subscription journals.Discussion. Our study confirmed earlier investigations which observed that only aminority of biomedical journals require data sharing, and a significant associationbetween higher Impact Factors and journals with a data sharing requirement.Moreover,while 65.7% of the journals in our study that required data sharing addressed theconcept of reproducibility, as with earlier investigations, we found that most datasharing policies did not provide specific guidance on the practices that ensure datais maximally available and reusable.

How to cite this article Vasilevsky et al. (2017), Reproducible and reusable research: are journal data sharing policies meeting the mark?PeerJ 5:e3208; DOI 10.7717/peerj.3208

Page 2: Reproducible and reusable research: are journal data sharing … · 2017-04-26 · Reproducible and reusable research: are journal data sharing policies meeting the mark? Nicole A

Subjects Ethical Issues, Science and Medical Education, Science Policy, StatisticsKeywords Scholarly communication, Scientific communication, Reproducibility, Open data,Data sharing, Data management

INTRODUCTIONOver the last several years, the importance and benefits of research data sharing havebeen emphasized by many communities, including professional societies, funders, policymakers, and publishers (NIH, 2016b; Holdren, 2012; Research Councils UK, 2011; Drazen etal., 2016; Medium.com, 2016). Several rationales underpin the arguments for better accessto and the curation of research data (Borgman, 2012). While the factors contributing to thepoor reproducibility of biomedical research are varied and complex, and even the meaningof reproducible research is fraught, data availability is regarded as one necessary componentfor the assessment of replication and validation studies (Collins & Tabak, 2014). If raw dataare made available, others have the opportunity to replicate or correct earlier findings and,ostensibly, influence the pace and efficiency of future research endeavors. Researchers canask new questions of existing data, and data can be combined and curated in ways thatfurther its value and scholarship (Borgman, 2012). As Fischer and Zigmond argue, the greatadvances in science depend not only on the contributions of many individual researchers,but also their willingness to share the products of their work (Fischer & Zigmond, 2010).

The benefits described above have motivated many of the organizations that supportresearch to require that data be made publicly available. Since 2011, the National ScienceFoundation (NSF) has required applicants to submit a datamanagement plan documentinghow investigators will conform to the NSF’s expectation that primary data and researchresources will be shared with other researchers (NSF, 2016). The White House Officeof Science and Technology Policy issued a memorandum in 2013 directing agencies tomake plans for ensuring public access to federally funded research results, includingdata (Holdren, 2012). In 2014, the National Institutes of Health (NIH) implemented astrong data sharing policy for large-scale human and non-human genomic data (NIH,2016a). Additionally, the European Research Council’s Open Access Guidelines includeand support public access to research data, and open is the default for all data generatedvia its Horizon 2020 program (European Commission, 2016).

However, data sharing and its long-term stewardship involve an array of activities,participants, and technologies, especially if discovery, reuse, and preservation are to beensured (Dallmeier-Tiessen et al., 2014). Moreover, despite a belief in the importance ofaccess to other’s data for their ownwork,many scientists do not consistently share their data,reporting a variety of barriers and disincentives (Tenopir et al., 2011). Roadblocks to sharinginclude insufficient time, a lack of funding, fear of scrutiny or misinterpretation, a deficitof requirements, attribution concerns, competition, difficulty navigating infrastructureoptions, and a paucity of data sharing related rewards (Longo & Drazen, 2016; LeClere,2010; Savage & Vickers, 2009). For quality data sharing to become the norm, broad systemicchange and solutions are needed.

Vasilevsky et al. (2017), PeerJ, DOI 10.7717/peerj.3208 2/18

Page 3: Reproducible and reusable research: are journal data sharing … · 2017-04-26 · Reproducible and reusable research: are journal data sharing policies meeting the mark? Nicole A

Journal publication is the current and primarymode of sharing scientific research.Whilearguably problematic, it has the most influence on an individual’s credibility and success(Fischer & Zigmond, 2010). As Lin and Strasser write, journals and publishers occupy animportant ‘‘leverage point in the research process,’’ and are key to affecting the changesneeded to realize data sharing as a ‘‘fundamental practice’’ of scholarly communication(Lin & Strasser, 2014). There has been significant support for and progress toward this end.At a joint workshop held at the NIH in June 2014, editors from 30 basic and preclinicalscience journals met to discuss how to enhance reproducible, robust, and transparentscience. As an outcome, they produced the ‘‘Principles and Guidelines for ReportingPreclinical Research,’’ which included the recommendation that journals require that allof the data supporting a paper’s conclusion be made available as part of the review processand upon publication, that datasets be deposited to public repositories, and that datasetsbe bi-directionally linked to published articles in a way that ensures attribution (NIH,2016b). In 2013, Nature journals implemented an 18-point reporting checklist for lifescience articles. It included required data and code availability statements, and a strongrecommendation for data sharing via public repositories (Nature Publishing Group, 2013).Additionally, many large and influential journals and publishers have implemented datasharing requirements, including Science, Nature, the Public Library of Science (PLOS),and the Royal Society (AAAS S, 2016; Nature, 2016; PLOS, 2016; The Royal Society, 2016).

Given these developments, and the influence of journal publishing on scientificcommunication and researcher success, we sought to investigate the prevalence andcharacteristics of journal data sharing policies within the biomedical research literature.The study was designed to determine the pervasiveness and quality of data sharing policiesas reflected in editorial policies and the instructions to authors. In other words, we aimedto assess the specific characteristics of the policies as they relate to data sharing in practice.We focused our analysis on the biomedical literature because of the intense attention dataavailability and its relationship to issues of reproducibility and discovery have received,and on account of our own roles as and work with biomedical researchers.

MATERIALS & METHODSWe evaluated the data sharing policies of journals that were included in Thomson Reuter’sInCites 2013 Journal Citations Reports (JCR) (Clarivate Analytics, 2016) classified withinthe following Web of Science schema categories: Biochemistry and Molecular Biology,Biology, Cell Biology, Crystallography, Developmental Biology, Biomedical Engineering,Immunology, Medical Informatics, Microbiology, Microscopy, Multidisciplinary Sciences,and Neurosciences. These categories were selected to capture the journals publishing themajority of peer-reviewed biomedical research. The original data pull included 1,166journals, collectively publishing 213,449 articles. We filtered this list to the journals in thetop quartiles by Impact Factor (IF) or number of articles published in 2013. Additionally,the list was manually reviewed to exclude short report and review journals, and titlesdetermined to be outside the fields of basic medical science or clinical research. The finalstudy sample included 318 journals, which published 130,330 articles in 2013. The study

Vasilevsky et al. (2017), PeerJ, DOI 10.7717/peerj.3208 3/18

Page 4: Reproducible and reusable research: are journal data sharing … · 2017-04-26 · Reproducible and reusable research: are journal data sharing policies meeting the mark? Nicole A

sample represented 27% of the original Journal Citation Report list and 61% of the originalcitable articles. After the sample was identified, the 2014 Journal Citations Reports wasreleased. While we did not use this data to change the journals in the study sample, wedid employ data from both reports in the analyses presented here. After our analysis, the2015 Journal Citation Reports was released. While we found no significant differencesin the distribution of impact factor nor citable items by year, nor in subsets definedby data sharing marks, results for the 2015 data are available in our Github repository(https://github.com/OHSU-Library/Biomedical_Journal_Data_Sharing_Policies). In ourdata pulls from JCR, we included the journal title, International Standard Serial Number(ISSN), the total citable items, the total citations to the journal, Impact Factor, and thepublisher. Table 1 reports the number (and percentage) of journals across 2013 ImpactFactors, and Table 2 reports the number of 2013 citable items per journal.

Two independent curators divided the dataset and manually reviewed each journal’sonline author instructions and editorial policies between February 2016 and June 2016,and later spot checked each other’s work. Because we were specifically interested inthe information being communicated to manuscript submitting authors about datasharing requirements, we did not consider more peripheral sources of information, such asfootnoted links to additional web pages, unless authors were specifically instructed to reviewthis information in order to understand or comply with a journal’s data sharing policy. Weranked the journals’ data sharing policies using a rubric adapted from Stodden, Guo & Ma(2013) (Table 3). The StoddenGuo, andMa rubricwas utilized because it provided a startingscale for evaluating the current state of journal data sharing policies, as it was originallyconstructed to assess to what degree journals had implemented the National Academy ofSciences guidelines for data and code sharing. Moreover, the rubric was relatively easy toadapt and expand to assess policy characteristics specific to biomedical research, such asdifferentiating those policies that only addressed omics and structural data. Additionally,we examined the policies to determine the recommended data sharing method (e.g., apublic repository or journal hosted), if data copyright or licensing recommendations werementioned, the inclusion of instructions on how long the data should be made available,and if the policy noted reproducibility or analogous concepts. We chose to examinethese characteristics based on their relationship to a growing body of best practices andrecommendations associated with data sharing, such as those addressed in the TOPGuidelines (https://cos.io/our-services/top-guidelines/) that provision policy templates forjournals. Finally, each journal was classified as either open access or subscription-based onits inclusion in the Directory of Open Access Journals database and this was confirmed oneach journal’s website (Table 4). If a discrepancy was found between these sources, we usedthe evaluation we gleaned from the journal’s website.While we did not track the occurrenceof the discrepancies, they were rare. Four independent curators were randomly assigned toevaluate the journal data sharing policies for 10 journals in our sample, 40 journals (12.5%of our sample) in total. The percentage of agreement and Cohen’s kappa was calculated foreach dimension of the rubric. Agreement between the original and independent curatorsranged from 92.308% to 100%. The Cohen’s kappa coefficient, which takes into account

Vasilevsky et al. (2017), PeerJ, DOI 10.7717/peerj.3208 4/18

Page 5: Reproducible and reusable research: are journal data sharing … · 2017-04-26 · Reproducible and reusable research: are journal data sharing policies meeting the mark? Nicole A

Table 1 Journal impact factor category.

Journal impact factor category N (%)

<2 19 (6%)2–3.99 125 (39.3%)4–5.99 102 (32.1%)6–7.99 25 (7.9%)8–9.99 15 (4.7%)10–29.99 29 (9.1%)30–43 3 (0.9%)

Table 2 Number of citable items per journal.

Number of citable items per journal N (%)

<100 42 (13.2%)100–500 239 (75.2%)500–1,000 28 (8.8%)1,000–32,000 9 (2.8%)

the possibility of agreement occurring by chance and is generally considered a more robustmeasure, ranged from .629 to 1.0 (Table 5).

Statistical methodsContinuous variables are summarized with medians and interquartile ranges (IQRs)denoting the 25th and 75th percentiles. Categorical variables are summarized with countsand percentages. The variables IF and total citable items are not normally distributed(Shapiro Wilk’s Test p-values < 0.001), so medians are presented instead of means, andnonparametric methods are used for statistical tests.

The association of IF with 6-level data sharing mark (DSM) was tested with anonparametric Kruskal-Wallis one-way analysis of variance (ANOVA) of IF in 2013and 2014 with DSM as a grouping factor. Post-hoc pairwise two-sample Wilcoxon testswere used to determine whether the median IF for journals differ between the two-leveldata sharing policy (required vs. not required) categories. P-values from the Wilcoxontests were adjusted for multiple comparisons with the Holm procedure.

Pearson’s chi-square test was used to test the association of data sharing policy (twolevels: required vs not required) and open access status. Fisher’s Exact Test was used totest the association of the 6-level DSM with open access status. Fisher’s Test was used asopposed to Chi-square test due to the low number of open access journals within someDSM categories. To examine the association of open access status and data sharing weightedby publishing volume we examined the number of citable items in each category and testedfor the association of open access and data sharing with Pearson’s chi-square test.

All statistical analyses were performed with R version 3.2.1 (R Foundation for StatisticalComputing RCT, 2016). All code and data to reproduce these results can be foundon GitHub (https://github.com/OHSU-Library/Biomedical_Journal_Data_Sharing_Policies).

Vasilevsky et al. (2017), PeerJ, DOI 10.7717/peerj.3208 5/18

Page 6: Reproducible and reusable research: are journal data sharing … · 2017-04-26 · Reproducible and reusable research: are journal data sharing policies meeting the mark? Nicole A

Table 3 Journal scoring rubric used in this study, adapted from Stodden, Guo &Ma (2013).

Data sharing mark1 Required as condition of publication, barring exceptions2 Required but, no explicit statement regarding effect on

publication/editorial decisions3 Explicitly encouraged/addressed, but not required.4 Mentioned indirectly5 Only protein, proteomic, and/or genomic data sharing are

addressed.6 No mention

Journal access mark (whole journal model, does not consider hybrid publishing)1 Open access0 Subscription

Protein, proteomic, genomic data sharing required with deposit to specific data banksa Yesb No

Recommended sharing methodA Public online repositoryB Journal hostedC By reader request to authorsD Multiple methods equally recommendedE Unspecified

If journal hosteda Journal will host regardless of sizeb Journal has data hosting file/s size limitc Unspecified

Copyright/licensing of dataa Explicitly stated or mentionedb No mention

Archival/retention policy (statement about how long the data should be retained)a Explicitly statedb No mention

Reproducibility or analogous concepts noted as purpose of data policya Explicitly statedb No mention

RESULTSOf the 318 journals examined, 38 (11.9%) required data sharing as a condition ofpublication and 29 (9.1%) required data sharing, but made no explicit statement regardingthe effect on publication and editorial decisions. A total of 74 (23.3%) journals explicitlyencouraged or addressed data sharing, but did not require it. 29 (9.1%) of journalsmentioned data sharing indirectly. A total of 47 (14.8%) journals only addressed datasharing for proteomic, genomic data, or other specific omics data. A total of 101 (31.8% ofjournals did not mention anything about data sharing (Fig. 1 and Table 6).

Vasilevsky et al. (2017), PeerJ, DOI 10.7717/peerj.3208 6/18

Page 7: Reproducible and reusable research: are journal data sharing … · 2017-04-26 · Reproducible and reusable research: are journal data sharing policies meeting the mark? Nicole A

Figure 1 Percentage of journals per each data sharing mark (DSM). (A) shows the percentage of alljournals for each data sharing mark. (B) shows the percentage of citable items from each journal (includ-ing PLOS ONE) for each data sharing mark. (C) shows the percentage of citable items for each journal(excluding PLOS ONE) for each data sharing mark. Because of the journal PLOS ONE’s high publishingactivity, we analyzed the percentage of citable items for each data sharing mark including and excludingPLOS ONE.

Table 4 Number of journals per open access.

Open access # Journals (%) Median # citableitems perjournal 2013

# Citable items2013 (%)

# Citable items2013, removePLOS ONE (%)

Median # citableitems perjournal 2014

# Citable items2014 (%)

# Citable items2014, removePLOS ONE (%)

Open access 44 (13.8%) 199.5 43,789 (33.6%) 12,293 (12.4%) 207 45,831 (35.0%) 15,791 (15.6%)Subscription 274 (86.2%) 246.5 86,541 (66.4%) 86,541 (87.6%) 240 85,276 (65.0%) 85,276 (84.4%)

In order to understand the potential influence of the policies on the published literature,we also evaluated the distribution of publication volume by each data sharing mark. In2013, the total number of citable items (papers) in the studied journals was 130,330. In2014, the total number of citable items was 131,107. The median number of citable itemsper journal was 243.0 and 237.5, respectively (Table 5).

Table 5 shows the 2013 and 2014 publishing volume in citable items for each datasharing mark. While it is likely that some of the journals in the study implemented orrevised their data sharing policies after 2014, the publishing volume data is current enoughto provide an insight into the potential influence of existing journal data sharing policieson the published literature.

While only 21% of the journals in the study required data sharing (DSM 1 and 2), thesejournals published 42.1% of the citable items in 2013 and 2014 (23.6% and 24.9% of thecitable items in 2013, 2014 after removing PLOS ONE) (Table 5).

The median 2013 journal IF for journals with the strongest data sharing policies (DSM1) was 8.2; whereas, the median 2013 IF for journals with no mention of data sharingwas 3.5. Figure 2 shows the median IF for each DSM category by report year. The IFwasalso analyzed by collapsing the DSM into two categories: Required (DSM 1, 2) and Not

Vasilevsky et al. (2017), PeerJ, DOI 10.7717/peerj.3208 7/18

Page 8: Reproducible and reusable research: are journal data sharing … · 2017-04-26 · Reproducible and reusable research: are journal data sharing policies meeting the mark? Nicole A

Table 5 Curator reliability.

Score # Journals % Agreement Cohen’s Kappa

Open access mark 40 100.000 1.000Data sharing mark 40 92.500 0.905Protein proteomic genomic or microaray sequence orstructural data sharing addressed required with deposit tospecific data banks

40 100.000 1.000

Recommended preferred sharing mark 40 97.500 0.959Size guidelines if journal hosted provided 13 92.308 0.629Copyright licensing mark 40 100.000 1.000Archival retention mark 40 97.500 0.655Reproducibility noted mark 40 92.500 0.754

Required (DSM 3, 4, 5, 6). The median 2013 IF for the journals that required data sharingwas 6.8, and the median 2013 IF for the journals that did not require data sharing was 4.0.

Impact Factor is significantly associated with the six-category data sharing marks(Kruskal-Wallis rank sum test, 5 df , p < 0.001, 2013 and 2014). Examining pairwisedifferences between DSM categories, we see that journals with DSM 1 have significantlyhigher IF than journals with DSM 3, 4, 5, or 6 (Wilcoxon test, p< 0.001, <0.001, 0.04,<0.001; 2013 data, 2014 similar). Journals with DSM 2 have significantly higher IF thanjournals with DSM 3, 4, or 6 (Wilcoxon test, p= 0.034, 0.0072, 0.0033; 2013 data, 2014similar). Journals with DSM 5 have significantly higher IF than journals with DSM 3, 4,and 6 (Wilcoxon test, p 0.0022, <0.001, <0.001; 2013 data, 2014 similar). In general, IF isnot significantly different between DSM 1 and 2 and between DSM 2 and 5, reflecting thesimilar IF for journals with explicit data sharing requirements, either full or partial sharing.After collapsing DSM into two categories, required (DSM 1, 2) and not required (DSM 3,4, 5, 6), we still see a highly significant increase in IF for journals with required data sharing(Wilcoxon Rank Sum Test, p< 0.001, 2013 and 2014 data) (Fig. 2).

Table 7 shows the count of subscription and open access journals for each DSMcategory, and the count and percentage of subscription and open access journals for eachDSM category. The Fisher’s Exact Test result, which yielded a p-value of 0.07, showedno significant association between the DSM and a journal’s access model. We also testedthis association by collapsing the DSM into two categories, required (DSM 1, 2) and notrequired (DSM 3, 4, 5, 6), and using a Chi-square test. Again, no significant associationwas found (Chi-square Test, df = 1, p= 0.62). Both results suggest that journals with adata sharing requirement are not more likely to be open access than journals without adata sharing requirement.

Although there was no significant association between open access and DSM at thejournal level, we observed a highly significant association at the citable item level (Chi-square Test, df = 1, p< 2e−16). That is, a citable item that is open access is much morelikely to be published in a journal with a data sharing requirement (DSM 1 or 2). Theproportion of open access journals that require data sharing is much larger than theproportion of subscription journals (64.3% vs 11.3%). The very small p-value is partially

Vasilevsky et al. (2017), PeerJ, DOI 10.7717/peerj.3208 8/18

Page 9: Reproducible and reusable research: are journal data sharing … · 2017-04-26 · Reproducible and reusable research: are journal data sharing policies meeting the mark? Nicole A

Table 6 Publishing volume by data sharing mark.

DSM DSM description # Journals (%) Median # citableitems perjournal 2013

# Citable items2013 (%)

# Citable items2013, removePLOS ONE (%)

Median # citableitems perjournal 2014

# Citable items2014 (%)

# Citable items2014, removePLOS ONE (%)

1 Required as conditionof publication,barring exceptions

38 (11.9%) 230.5 42,669 (32.7%) 11,173 (11.3%) 220 42,794 (32.6%) 12,754 (12.6%)

2 Required but noexplicit statementregarding effect onpublication/editorialdecisions

29 (9.1%) 209 12,138 (9.3%) 12,138 (12.3%) 227 12,436 (9.5%) 12,436 (12.3%)

3 Explicitlyencouraged/addressed,but not required.

74 (23.3%) 259.5 25,519 (19.6%) 25,519 (25.8%) 282.5 26,026 (19.9%) 26,026 (25.8%)

4 Mentioned indirectly 29 (9.1%) 256 8,062 (6.2%) 8,062 (8.2%) 225 7,894 (6%) 7,894 (7.8%)5 Only protein,

proteomic, and/orgenomic data sharingare addressed.

47 (14.8%) 277 19,339 (14.8%) 19,339 (19.6%) 316 19,080 (14.6%) 19,080 (18.9%)

6 No mention 101 (31.8%) 211 22,603 (17.3%) 22,603 (22.9%) 213 22,877 (17.4%) 22,877 (22.6%)

Publishing volume by data sharing requirementDSM 1&2 Required 67 (21.1%) 226 54,807 (42.1%) 23,311 (23.6%) 221 55,230 (42.1%) 25,190 (24.9%)DSM 3–6 Not Required 251 (78.9%) 248 75,523 (57.9%) 75,523 (76.4%) 244 75,877 (57.9%) 75,877 (75.1%)

Publishing volume in all journalsTotal All Journals 318 (100%) 243 130,330 (100%) 98,834 (100%) 237.5 131,107 (100%) 101,067 (100%)

Vasilevskyetal.(2017),PeerJ,D

OI10.7717/peerj.3208

9/18

Page 10: Reproducible and reusable research: are journal data sharing … · 2017-04-26 · Reproducible and reusable research: are journal data sharing policies meeting the mark? Nicole A

Figure 2 Impact factors were higher for journals with the strongest data sharing policies (DSM 1)compared to journals with nomention of data sharing (DSM 6). The median Impact Factor was calcu-lated for the journals with each data sharing mark (1–6) for each report year (left= 2013, right= 2014).The lower and upper hinges of the boxplots represent the first and third quartiles of journal Impact Fac-tor, the horizontal line represents the median, the triangle represents the mean, and the upper and lowerwhiskers extend from the hinge to the highest (lowest) value that is within 1.5 times the interquartile rangeof the hinge, with journals outside this range represented as points.

due to the large number of total citable articles studied and also due to the large proportionof open access citable items in PLOS ONE. However, even with PLOS ONE removedfrom the analysis, an open access article is still more likely to have been published in ajournal with a data sharing requirement and the proportion of open access journals versussubscription journals that require data sharing is 16.0% vs 11.3% (Chi-square Test, df = 1,p< 2e−16).

As illustrated in Fig. 3, excluding those journals with no mention of data sharing(DSM 6), 57.6% (125) of the journals in the data set recommended data sharing via apublic repository, 20.7% (45) recommended sharing via a journal hosted method, 1.8%(4) recommend sharing by reader request to authors, 5.1% (11) state multiple equallyrecommended methods and 14.8% (32) do not specify.

Of the journals requiring data sharing (DSM 1 or 2), 85% (57) recommend data sharingvia a public repository. Of the journals that recommended data sharing via a journal hostedmethod, the majority, 88.8% (40), did not specify any size limitations.

Vasilevsky et al. (2017), PeerJ, DOI 10.7717/peerj.3208 10/18

Page 11: Reproducible and reusable research: are journal data sharing … · 2017-04-26 · Reproducible and reusable research: are journal data sharing policies meeting the mark? Nicole A

Table 7 Open access journals and citable items by data sharing mark.

Open access journals and citable items by data sharingmark

Subscription Open access %Open access

Data sharing mark # Journals(# Citable items)

# Journals(# Citable items)

% Journals(% Citable items)

1- Required as condition of publication, barring exceptions 29 (7,709) 9 (34,960; 3464a) 23.7% (81.9%; 31%a)2- Required but no explicit statement regarding effect onpublication/editorial decisions

27 (11,864) 2 (274a) 6.9% (2.3%)

3- Explicitly encouraged/addressed, but not required. 63 (22,884) 11 (2,635) 14.9% (10.3%)4- Mentioned indirectly 29 (8,062) 0 (0) 0% (0%)5- Only protein, proteomic, and/or genomic data sharingare addressed.

40 (17,401) 7 (1,938) 14.9% (10.0%)

6- No mention 86 (18,621) 15 (3,982) 14.9% (17.6%)

Data sharing requirementDSM 1&2 - Required 56 (19,573) 11 (35,234; 3,738a) 16.42% (64.29%; 16.04%a)DSM 3–6 - Not required 218 (66,968) 33 (8,555) 13.15% (11.33%)

Notes.aAfter removing PLOS ONE.

Only 7.3% (16) journals that addressed data sharing (DSM 1,2,3, 4, and 5) explicitlymentioned copyright or licensing considerations. Even for those journals that requireddata sharing (DSM 1 or 2), only 16.4% (11) mentioned copyright or licensing; however,these journals published 31.9% of the citable items in 2013 of the journals that addresseddata sharing. Only 2 journals in the entire data set addressed how long the data should beretained.

In light of its frequently used justification, we also coded the data sharing policiesfor a mention of scientific reproducibility or analogous concepts. Reproducibility orsimilar language was mentioned by 16.9% (54) of the total studied journals. Of the journalsrequiring data sharing (DSM1 or 2), 65.5% (44)mentioned the concept of reproducibility.

DISCUSSIONPublishers have an influential role to play in promoting, facilitating, and enforcing datasharing (Dallmeier-Tiessen et al., 2014; Lin & Strasser, 2014). However, only a minorityof the journals analyzed for this this study required data sharing. While the capacity ofthe existing policies is more promising if considered from the perspective of publishingvolume, our results were consistent with other examinations of data sharing policies(Piwowar, Chapman &Wendy, 2008; Barbui, 2016; McCain, 1995). Like Piwowar andChapman (Piwowar, Chapman &Wendy, 2008), we found that a large proportion of thejournals we examined (40%) required the deposition of omics data to specific repositories.Less frequent and more varied, however, were requirements that addressed data in general.The higher prevalence of omics data sharing requirements we observed may be due tothe more mature guidelines, reporting standards, and centralized repositories for omicsdata types (Piwowar, Chapman &Wendy, 2008; Brazma et al., 2001; Hrynaszkiewicz et al.,2016; Piwowar & Chapman, 2010). The further development and implementation of wellcommunicated best practices and resources for general data types, could be a means for

Vasilevsky et al. (2017), PeerJ, DOI 10.7717/peerj.3208 11/18

Page 12: Reproducible and reusable research: are journal data sharing … · 2017-04-26 · Reproducible and reusable research: are journal data sharing policies meeting the mark? Nicole A

Figure 3 Recommended data sharing method by data sharing mark (DSM) 1–5. The number (percent)of journals with each recommended data sharing method is represented by each tile, with brighter blueshades denoting higher percentages of journals with the given data sharing method.

increasing the prevalence and strength of journal data sharing requirements and ensuringcompliance (Lin & Strasser, 2014).

While a problematic and often abused proxy for quality, the IF is closely associatedwith a journal’s prestige (Lariviere et al., 2016). It influences publication decisions and theperceived significance of individual papers (Lariviere et al., 2016). Because of its impacton scholarly communication, it is noteworthy that there was a significantly higher IFassociated with the journals with a data sharing requirement. This result was similar toother studies (Piwowar, Chapman &Wendy, 2008; Stodden, Guo & Ma, 2012; Sturges et al.,2015; Magee, May & Moore, 2014). As has been noted, prestigious journals may be betterpositioned and be more willing to impose new requirements and practices on authors (Lin& Strasser, 2014; Piwowar, Chapman &Wendy, 2008).

The importance and benefits of data sharing are often linked to and discussed withinthe larger context of open access to research results, specifically the published literature.Public access to both peer reviewed articles and data are regarded as necessary elementsfor addressing problems within the scientific enterprise and realizing the full value ofresearch investments, as exemplified in recent policies aimed at increasing access to and thequality of biomedical research in which both open access and data sharing are addressed

Vasilevsky et al. (2017), PeerJ, DOI 10.7717/peerj.3208 12/18

Page 13: Reproducible and reusable research: are journal data sharing … · 2017-04-26 · Reproducible and reusable research: are journal data sharing policies meeting the mark? Nicole A

(Holdren, 2012). As such, we wanted to investigate if there was a positive relationshipbetween open access journals and data sharing requirements. While we found that an openaccess citable item is much more likely to be published in a journal with a data sharingrequirement, we did not find that open access journals are any more likely to require datasharing than subscription journals. This result is in contrast with a previous finding fromPiwowar and Chapman (Piwowar, Chapman &Wendy, 2008). However, we analyzed agreater number of journals and a greater number of open access journals. We hypothesizesome open access journals may be less willing to impose additional requirements, becausethey lack the prestige or prominence of more established journals and publishers. Smallerand independent open access journals may also lack the resources to facilitate and enforcedata sharing

How data is managed and shared affects its value. If a data set is difficult to retrieveor understand, for example, replication studies can’t be performed and researchers can’tuse the data to investigate new questions. While 65.7% of the journals in our study thatrequired data sharing addressed the concept of reproducibility, as with earlier investigations(Piwowar, Chapman &Wendy, 2008; Sturges et al., 2015) we found that most data sharingpolicies did not provide specific guidance on the practices that ensure data is maximallyavailable and reusable (DataONE, 2016; Starr et al., 2015). For example, the majority ofjournals that addressed data sharing (DSM 1–5) recommended depositing data in a publicrepository; however, only a handful of journals provided guidelines or requirements relatedto licensing considerations or retention timeframes. While a higher IF was associated withthe presence of a data sharing requirement, overall the policies did not provide guidelinesor specificity to facilitate reproducible and reusable research. This result is similar to aprevious study in which we showed that the majority of biomedical research resources arenot uniquely identifiable in the biomedical literature, regardless of journal Impact Factor(Vasilevsky et al., 2013).

Our study confirms earlier investigations which observed that only a minority ofbiomedical journals require data sharing, and a significant association between higherImpact Factors and journals with a data sharing requirement. Our approach, however,included several limitations. Only journals in the top quartiles by volume or Impact Factorfor the Web of Science categories we identified as belonging to the biomedical corpus wereanalyzed, which introduced some inherent biases. While the 2014 median Impact Factorfor the journals in our sample was 4.16, the median Impact Factor for all journals in theWeb of Science categories we utilized was 2.50. Similarly, the median number of 2014citable items was 237.5 for our sample, and 94.0 for all journals. In hindsight, it wouldhave been valuable to have systematically analyzed more nuanced aspects of the policies’quality characteristics, such as whether minimal information or metadata standards wereaddressed and if the shared data was reviewed in the peer review process. Finally, it shouldbe noted that many of the policies we reviewed were difficult to interpret. While the study’sauthors are confident that the data sharing scores we assigned reflect the most accurateinterpretation of each journal’s policy at the time of our data collection, the policies ingeneral included ambiguous and fragmented information. It is possible, therefore, thatthere are gaps between the scores we assigned to some policies and their editorial intent.

Vasilevsky et al. (2017), PeerJ, DOI 10.7717/peerj.3208 13/18

Page 14: Reproducible and reusable research: are journal data sharing … · 2017-04-26 · Reproducible and reusable research: are journal data sharing policies meeting the mark? Nicole A

There have also been changes to some policies since our last data collection. Springer,for example, implemented a new policy in July of 2016 as described by Freedman, 2016(http://www.springersource.com/simplifying-research-data-policy-across-journals/).

As a continuation of this work, we plan to maintain a community curated and regularlyupdated public repository of journal data sharing policies, which will include journalsbeyond the sample represented in this study. The repository will utilize a schema thatbuilds upon the rubric used for this study, but also addresses additional and more nuancedaspects of data sharing, such as those described above and addressed in the TOP Guidelines(The TOP Guidelines Committe, 2016), the FAIR Data Principles (Wilkinson et al., 2016), aswell as the FAIR-TLCmetrics, which build upon the FAIRData Principles to alsomake dataTraceable, Licensed, and Connected (Haendel et al., 2016). We will also collect additionalinformation about the journals, such as age and frequency of publication, as outlined byStodden, Guo & Ma (2012). In addition to providing a queryable resource of journal datasharing policies, the database’s curation schedule will facilitate an understanding of policychanges over time and inform future evolution. It is our hope that data from the repositorywill be ingested and used by other resources, such as BioSharing (McQuilton et al., 2016), tofurther the discovery and analysis of data sharing policies. A follow-up study will look at thedata availability for articles associated with the journals in this study. Finally, building uponrecommendations outlined by the Journal Research Data (JoRD) Project (Sturges et al.,2015), Lin and Strasser (Lin & Strasser, 2014), the Center for Open Science (https://cos.io/),and the FAIR-TLC metrics, we intend to convene a community of stakeholders to furtherwork on recommendations and template language for strengthening and communicatingjournal data sharing policies. Maximally available and reusable data will not be achieved viathe implementation of vague data sharing policies that lack specific direction on where datashould be shared, how it should be licensed, or the ways in which it should be described.On the contrary, such specificity is essential.

CONCLUSIONSWe observed a two-pronged problem with journal data sharing policies. First, given theattention the benefits of data sharing have received from the biomedical community, itis problematic that only a minority of journals have implemented a strong data sharingrequirement. Second, among the policies that do exist, guidelines vary and are relativelyambiguous. Overall, the biomedical literature is lacking policies that would ensure thatunderlying data is maximally available and reusable.

This is problematic if we are to realize the outcomes and improvements that open datais supposed to facilitate.

ACKNOWLEDGEMENTSThanks to the following colleagues for their curation assistance or visualization advice:Steven Bedrick, PhD; Heather Coates, MLIS; Jill Emery, MLIS; Erin Foster, MLIS; DanielleRobinson; Chris Shaffer, MLIS; Kate Thornhill, MLIS; Jackie Wirz, PhD.

Vasilevsky et al. (2017), PeerJ, DOI 10.7717/peerj.3208 14/18

Page 15: Reproducible and reusable research: are journal data sharing … · 2017-04-26 · Reproducible and reusable research: are journal data sharing policies meeting the mark? Nicole A

ADDITIONAL INFORMATION AND DECLARATIONS

FundingThis work was supported by National Institutes of Health grant numbers NIH BD2K SCCHHSN316201200001W, HHSN27200001. The funders had no role in study design, datacollection and analysis, decision to publish, or preparation of the manuscript.

Grant DisclosuresThe following grant information was disclosed by the authors:National Institutes of Health: NIH BD2K SCC HHSN316201200001W, HHSN27200001.

Competing InterestsThe authors declare there are no competing interests.

Author Contributions• Nicole A. Vasilevsky and Robin E. Champieux conceived and designed the experiments,performed the experiments, wrote the paper, reviewed drafts of the paper.• JessicaMinnier analyzed the data, contributed reagents/materials/analysis tools, preparedfigures and/or tables, reviewed drafts of the paper.• Melissa A. Haendel reviewed drafts of the paper.

Data AvailabilityThe following information was supplied regarding data availability:

Github: https://github.com/OHSU-Ontology-Development-Group/Biomedical_Journal_Data_Sharing_Policies.

REFERENCESAAAS S. 2016. Science: editorial policies | Science | AAAS. Available at http://www.

sciencemag.org/authors/ science-editorial-policies (accessed on 21 October 2016).Barbui C. 2016. Sharing all types of clinical data and harmonizing journal standards.

BMCMedicine 14:63 DOI 10.1186/s12916-016-0612-8.Borgman CL. 2012. The conundrum of sharing research data. Acta Anaesthesiologica

Scandinavica 63:1059–1078 DOI 10.1002/asi.22634.Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J,

AnsorgeW, Ball C, Causton H, Gaasterland T, Glenisson P, Holstege F, Kim I,Markowitz V, Matese J, Parkinson H, Robinson A, Sarkans U, Schulze-KremerS, Stewart J, Taylor R, Vilo J, VingronM. 2001.Minimum information about amicroarray experiment (MIAME)-toward standards for microarray data. NatureGenetics 29:365–371 DOI 10.1038/ng1201-365.

Clarivate Analytics. 2016. Journal citation reports. Available at http:// clarivate.com/scientific-and-academic-research/ research-evalution/ journal-citation-reports/(accessed on 3 November 2016).

Vasilevsky et al. (2017), PeerJ, DOI 10.7717/peerj.3208 15/18

Page 16: Reproducible and reusable research: are journal data sharing … · 2017-04-26 · Reproducible and reusable research: are journal data sharing policies meeting the mark? Nicole A

Collins FS, Tabak LA. 2014. Policy: NIH plans to enhance reproducibility. Nature505:612–613 DOI 10.1038/505612a.

Dallmeier-Tiessen S, Darby R, Gitmans K, Lambert S, Matthews B, Mele S, SuhonenJ, WilsonM. 2014. Enabling sharing and reuse of scientific data. New Review ofInformation Networking 19:16–43 DOI 10.1080/13614576.2014.883936.

DataONE. 2016. All best practices | DataONE. Available at https://www.dataone.org/all-best-practices (accessed on 25 October 2016).

Drazen JM, Morrissey S, Malina D, Hamel M. 2016. The importance - and thecomplexities - of data sharing. New England Journal of Medicine 375:1182–1183DOI 10.1056/NEJMe1611027.

European Commission. (2016). FAIR data management in horizon 2020. Available athttp:// aims.fao.org/activity/blog/ guidelines-fair-data-management-horizon-2020(accessed on 20 October 2016).

Fischer BA, ZigmondMJ. 2010. The essential nature of sharing in science. Science andEngineering Ethics 16:783–799 DOI 10.1007/s11948-010-9239-x.

Freedman P. 2016. Simplifying research data policy across journals. [Web Blog Post].Haendel M, Su A, McMurry J, Chute CG, Mungall C, Good B,Wu C, McWeeney S,

Hochheiser H, Robinson P, Hoatlin M, BrushM, Smedley D, Munoz-TorresM, Jiang G, Liu H, McQuilton P. 2016.Metrics to assess value of biomedical digitalrepositories: response to RFI NOT-OD-16-133. Geneva: Zenodo.

Holdren JP. 2012. Increasing access to the results of federally funded scientific research.Available at https:// obamawhitehouse.archives.gov/ sites/default/ files/microsites/ ostp/ostp_public_access_memo_2013.pdf .

Hrynaszkiewicz I, Khodiyar V, Hufton A, Sansone S-A. 2016. Publishing descriptions ofnon-public clinical datasets: proposed guidance for researchers, repositories, editorsand funding organisations. Research Integrity and Peer Review 1:6DOI 10.1186/s41073-016-0015-6.

Lariviere V, Kiermer V, MacCallum C, McNutt M, PattersonM, Pulverer B, Swami-nathan S, Taylor S, Curry S. 2016. A simple proposal for the publication of journalcitation distributions. BioRxiv DOI 10.1101/062109.

LeClere F. 2010. Too many researchers are reluctant to share their data. The Chronicleof Higher Education. Available at http://www.scienceofsciencepolicy.net/ sites/default/files/ attachments/Too%20Many%20Researchers%20Are%20Re...pdf .

Lin J, Strasser C. 2014. Recommendations for the role of publishers in access to data.PLOS Biology 12:e1001975 DOI 10.1371/journal.pbio.1001975.

Longo DL, Drazen JM. 2016. Data sharing. New England Journal of Medicine374:276–277 DOI 10.1056/NEJMe1516564.

Magee AF, MayMR, Moore BR. 2014. The dawn of open access to phylogenetic data.PLOS ONE 9(10):e110268 DOI 10.1371/journal.pone.0110268.

McCain KW. 1995.Mandating sharing: journal policies in the natural sciences. ScienceCommunication 16:403–431 DOI 10.1177/1075547095016004003.

McQuilton P, Gonzalez-Beltran A, Rocca-Serra P, ThurstonM, Lister A, Maguire E,Sansone S-A. 2016. BioSharing: curated and crowd-sourced metadata standards,

Vasilevsky et al. (2017), PeerJ, DOI 10.7717/peerj.3208 16/18

Page 17: Reproducible and reusable research: are journal data sharing … · 2017-04-26 · Reproducible and reusable research: are journal data sharing policies meeting the mark? Nicole A

databases and data policies in the life sciences. Database: The Journal of BiologicalDatabases and Curation 2016:baw075 DOI 10.1093/database/baw075.

Medium.com. 2016. Inspiring a new generation to defy the bounds of innovation: amoonshot to cure cancer—cancer moonshot. Available at https://medium.com/cancer-moonshot/ inspiring-a-new-generation-to-defy-the-bounds-of-innovation-a-moonshot-to-cure-cancer-fbdf71d01c2e#.gx72sbluo (accessed on 5 November 2016).

Nature. 2016. Availability of data, and material, and methods. Available at http://www.nature.com/authors/policies/ availability.html (accessed on 21 October 2016).

Nature Publishing Group. 2013. Reporting checklist for life sciences articles. Available athttp://www.nature.com/authors/policies/ checklist.pdf .

NIH. 2016a. National institutes of health genomic data sharing policy. Available athttps:// gds.nih.gov/03policy2.html (accessed on 20 October 2016).

NIH. 2016b. Principles and Guidelines for Reporting Preclinical Research | NationalInstitutes of Health (NIH). Available at https://www.nih.gov/ research-training/ rigor-reproducibility/principles-guidelines-reporting-preclinical-research (accessed on 20October 2016).

NSF. 2016. Dissemination and sharing of research results | NSF—National ScienceFoundation. Available at http://www.nsf.gov/bfa/dias/policy/dmp.jsp (accessed on20 October 2016).

Piwowar HA, ChapmanW. 2010. Public sharing of research datasets: a pilot study ofassociations. Journal of Informetrics 4:148–156 DOI 10.1016/j.joi.2009.11.010.

Piwowar, ChapmanHA,WendyW. 2008. A review of journal policies for sharingresearch data. Available at http:// ocs.library.utoronto.ca/ index.php/Elpub/2008/paper/ view/684/0.

PLOS. 2016. PLOS data availability. Available at http:// journals.plos.org/plosone/ s/data-availability (accessed on 21 October 2016).

R Foundation for Statistical Computing RCT. 2016. R: a language and environmentfor statistical computing. Available at https://www.R-project.org (accessed on 2November 2016).

Research Councils UK. 2011. RCUK common principles on data policy—researchcouncils UK. Available at http://www.rcuk.ac.uk/ research/datapolicy/ (accessed on25 October 2016).

Savage CJ, Vickers AJ. 2009. Empirical study of data sharing by authors publishing inPLoS journals. PLOS ONE 4(9):e7078 DOI 10.1371/journal.pone.0007078.

Starr J, Castro E, Crosas M, Dumontier M, Downs R, Duerr R, Haak LL, Haendel M,Herman I, Hodson S, Hourclé J, Kratz JE, Lin J, Nielsen LH, Nurnberger A, ProellS, Rauber A, Sacchi S, Smith A, Taylor M, Clark T. 2015. Achieving human andmachine accessibility of cited data in scholarly publications. PeerJ Computer Science1 DOI 10.7717/peerj-cs.1.

Stodden V, Guo P, Ma Z. 2012.How journals are adopting open data and code policies.In: Proceedings for the governing pooled knowledge resources. Digital Library of theCommons: University of Indiana. Available at http://hdl.handle.net/ 10535/9584.

Vasilevsky et al. (2017), PeerJ, DOI 10.7717/peerj.3208 17/18

Page 18: Reproducible and reusable research: are journal data sharing … · 2017-04-26 · Reproducible and reusable research: are journal data sharing policies meeting the mark? Nicole A

Stodden V, Guo P, Ma Z. 2013. Toward reproducible computational research: an empir-ical analysis of data and code policy adoption by journals. PLOS ONE 8(6):e67111DOI 10.1371/journal.pone.0067111.

Sturges P, BamkinM, Anders JHS, Hubbard B, Hussain A, Heeley M. 2015. Re-search data sharing: developing a stakeholder-driven model for journal policies.Journal of the Association for Information Science and Technology 66:2445–2455DOI 10.1002/asi.23336.

Tenopir C, Allard S, Douglass K, Aydinoglu A,Wu L, Read E, Manoff M, FrameM.2011. Data sharing by scientists: practices and perceptions. PLOS ONE 6:e21101DOI 10.1371/journal.pone.0021101.

The Royal Society. 2016. Royal Society data sharing and mining. Available at https:// royalsociety.org/ journals/ ethics-policies/data-sharing-mining/ (accessed on 21October 2016).

The TOP Guidelines Committe. 2016. Promoting an open research culture: the TOPguidelines for journals. OSF Preprints. Available at https:// osf.io/ xd6gr/ ?action=download (accessed on 05 October 2016).

Vasilevsky N, BrushMH, Paddock H, Ponting L, Tripathy SJ, Larocca GM, HaendelMA. 2013. On the reproducibility of science: unique identification of researchresources in the biomedical literature. PeerJ 1:e148 DOI 10.7717/peerj.148.

WilkinsonMD, Dumontier M, Jan Aalbersberg I, Appleton G, AxtonM, Baak A,Blomberg N, Boiten J-W, Da Silva Santos LB, Bourne PE, Bouwman J, Brookes AJ,Clark T, Crosas M, Dillo I, DumonO, Edmunds S, Evelo CT, Finkers R, Gonzalez-Beltran A, Gray AJG, Groth P, Goble C, Grethe JS, Heringa J, ’t Hoen PAC, HooftR, Kuhn T, Kok R, Kok J, Lusher SJ, MartoneME, Mons A, Packer AL, Persson B,Rocca-Serra P, Roos M, Van Schaik R, Sansone S-A, Schultes E, Sengstag T, SlaterT, Strawn G, Swertz MA, ThompsonM, Van der Lei J, VanMulligen E, Velterop J,Waagmeester A,Wittenburg P,Wolstencroft K, Zhao J, Mons B. 2016. The FAIRguiding principles for scientific data management and stewardship. Scientific Data3:160018 DOI 10.1038/sdata.2016.18.

Vasilevsky et al. (2017), PeerJ, DOI 10.7717/peerj.3208 18/18