Natasha Simons What’s coming next? The future of research data management Australian National Data Service iSchools Data Science Winter Institute Hong Kong, 7-8 December 2017

Upload: ands-nectar-rds

Post on 21-Jan-2018


Natasha Simons

What’s coming next? The future of research data management

Australian National Data Service

iSchools Data Science Winter Institute

Hong Kong, 7-8 December 2017

Brisbane, Australia

University of Queensland

What is ANDS?

NCRIS

• National Collaborative Research Infrastructure Strategy (NCRIS)

• Australian government program

• Drives research excellence and collaboration between researchers, government and industry to deliver practical outcomes

• Funds research infrastructure projects including ANDS, Nectar and RDS

• 2016 National Research Infrastructure Roadmap outlines the Australian research infrastructure required over the next decade

ANDS/Nectar/RDS

Aligned set of joint investments to deliver four key transformations in the research sector:

1. A world leading data advantage

2. Accelerated innovation

3. Collaboration for borderless research

4. Enhanced translation of research

Our approach

Building on and leveraging previous investments and relationships:

1. Research domain program

2. Research data platforms

3. Sector-wide support and engagement

Trend #1 Data policies

Funder data sharing policies are on the rise.

Examples:

Data sharing is essential for expedited translation of research results into knowledge, products and procedures to improve human health... [and it] should be made as widely and freely available as possible... - National Institutes of Health USA (1)

Publicly funded research data are a public good... which should be made openly available with as few restrictions as possible in a timely and responsible manner - Research Councils UK (2)

(1) National Institutes of Health. 2003. Data Sharing Policy and Implementation Guidance.

(2) Research Councils UK. 2011 (revised 2015). RCUK Common Principles on Data Policy.

Photo by Christine Roy on Unsplash

Trend #1 Data policies

Research data principle: as open as possible, as closed as necessary - European Commission Horizon 2020 Guidelines (1)

We expect our researchers to maximise the availability of research data, software and materials with as few restrictions as possible - Wellcome Trust (2)

The ARC is committed to maximising the benefits from ARC-funded research, including by ensuring greater access to research data. Since 2007, the ARC has encouraged researchers to deposit data arising from research projects in publicly accessible repositories. The ARC’s position reflects an increased focus in Australian and international research policy and practice on open access to data generated through publicly funded research. - Australian Research Council (3)

(1) National Institutes of Health. 2003. Data Sharing Policy and Implementation Guidance.

(2) Research Councils UK. 2011 (revised 2015). RCUK Common Principles on Data Policy.

(3) Australian Research Council. http://www.arc.gov.au/research-data-management.

Trend #1 Data policies

Government open data policies are on the rise.

Examples:

Newly-generated [USA] government data is required to be made available in open, machine-readable formats, while continuing to ensure privacy and security (1)

All EU institutions are invited to make their data publicly available whenever possible (2)

The Japanese government is promoting the Open Data initiative, in which the government widely discloses public data (3)

(1) USA Federal Government. 2013. Memorandum - Open Data Policy - Managing Information as an Asset.

(2) European Union. Open Data Portal - About.

(3) Japan Open Data Initiative.

Trend #1 Data policies

The Australian Government Open Data Declaration is about making more government information available to the public online (4)

The public sector information portal of the Government of the Hong Kong Special Administrative Region with datasets from different government departments and public/private organisations (5)

Keywords: transparency, openness, return on investment, economy, industry/government/research collaborations, innovation

(4) Australian Government. 2010. Declaration of Open Government.

(5) Hong Kong Open Data Portal. data.gov.hk.

Trend #1 Data policies

Publisher/Journal data policies and initiatives are on the rise. Examples:

PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception - PLOS (1)

A condition of publication in a Nature Research journal is that authors are required to make materials, data, code, and associated protocols promptly available to readers without undue qualifications. - Nature (2)

(1) PLOS. Data availability.

(2) Nature. Availability of data, materials and methods.

Trend #1 Data policies

Publisher signed statement examples:

The ultimate measure of success is in the replicability of science, generation of new discoveries, and in progress on the grand challenges facing society that depend on the integration of open data, tools, and models from multiple sources. This statement of commitment signals important progress and a continuing commitment by publishers and data facilities to enable open data in the Earth and space sciences - COPDESS (1)

Transparency, open sharing, and reproducibility are core values of science. Over 5,000 journals and organizations have already become signatories of the TOP Guidelines. - TOP Guidelines (2)

(1) COPDESS. Statement of Commitment.

(2) Center for Open Science. Transparency and Openness Promotion (TOP) Guidelines.

Photo by Drew Hays on Unsplash

Policy challenges

● Existence of data policy e.g. the higher the Impact Factor of the journal the more likely they are to have a data availability policy and to enforce it (1)

● Data policies vary widely: content; discoverability; ease of interpretation; infrastructure providers; support for compliance (2)

● Most journal data sharing policies do not provide specific guidance on the practices that ensure data is maximally available and reusable (3)

(1) Piwowar, HA and Chapman, WW (2010) Public sharing of research datasets: A pilot study of associations. Journal of Informetrics, 4 (2). 148-156. ISSN 1751-1577

(2) Naughton, L. & Kernohan, D., (2016). Making sense of journal research data policies. Insights. 29(1), pp.84–89. DOI: http://doi.org/10.1629/uksg.284

(3) Vasilevsky NA, Minnier J, Haendel MA, Champieux RE. (2017) Reproducible and reusable research: are journal data sharing policies meeting the mark? PeerJ 5:e3208 https://doi.org/10.7717/peerj.3208

Policy challenges

● Data availability declines over time (1)

● The most effective journal data policies mandate data sharing in a repository and a data availability statement with a link to the data (2)

● Data availability from authors on request has been found wanting in several studies/case studies (3-5)

● The introduction of a data availability policy can polarize the research community e.g. PLOS, ICMJE

(1) Vines et al. (2013) Current Biology. DOI: http://dx.doi.org/10.1016/j.cub.2013.11.014

(2) Vines et al. (2013) FASEB J. doi:10.1096/fj.12-218164

(3) Systematic Reviews 2014, 3:97. doi:10.1186/2046-4053-3-97

(4) American Psychologist, Vol 61(7), Oct 2006, 726-728. doi:10.1037/0003-066X.61.7.726

(5) PLoS ONE 4(9): e7078. doi:10.1371/journal.pone.0007078

Thanks to Iain Hrynaszkiewicz, Springer Nature, for dot points 1-3 above.

Trend #2 Data sharing

Figshare open data survey 2017:

● 82% aware of open data sets

● 80% willing to reuse open data sets in own research

● 60% routinely share their data (frequently or sometimes)

● 21% have never made a data set openly available

● 74% are now curating their data for sharing

● 77% value a data citation the same as an article

Digital Science (2017): The State of Open Data 2017 Report - Infographic. figshare. https://doi.org/10.6084/m9.figshare.5519155.v1 pp. 7-11

Trend #2 Data sharing

We can see strong signals that open data is becoming more embedded [but] there is still a lack of confidence around open data.

Figshare open data survey 2017

Trend #2 Data sharing

A 2011 study of 500 papers that were published in 2009 from 50 top-ranked research journals showed that only 47 papers (9%) of those reviewed had deposited full primary raw data online.

As another study notes, the number of datasets being shared annually has increased by more than 400% from 2011 to 2015, and this pace will likely continue.

What Constitutes Peer Review of Data? A Survey of Peer Review Guidelines by Todd A. Carpenter. Scholarly Kitchen blog post 11 April 2017. https://scholarlykitchen.sspnet.org/2017/04/11/what-constitutes-peer-review-research-data/

Trend #2 Data sharing

More than two thirds of Wiley researchers reported they are now sharing their data. Though this varies geographically and across research disciplines, we are seeing that more researchers are sharing their data and making efforts to make it reproducible.

Wiley Global Data Sharing Infographic June 2017. https://authorservices.wiley.com/author-resources/Journal-Authors/licensing-open-access/open-access/data-sharing.html

Data sharing challenges

Lack of understanding of the open/shared/closed model.

Lack of skills/understanding about how to share sensitive data.

Still too few “rewards” for data sharing.

Researchers may lack skills needed to manage and share data.

Wiley survey - Top 4 reasons why researchers are hesitant to share their data:

● 50% Intellectual Property or confidentiality issues

● 31% Ethics concerns

● 23% Concerns about misinterpretation or misuse of my research

● 22% Concerns that my research will be scooped

Photo by rawpixel.com on Unsplash

Trend #3 Connected research/data

Connected research (researchers, research organisations, publications, data, grants, software, methods and more) is important:

● for better discovery of research (data)

● to assist the ability to reproduce research

● to support research transparency

● to aid attribution and credit

● to track use and impact

Persistent Identifiers (PIDs) and global standards play a key role in connecting research.

Looks something like this...

Research Graph is an open collaborative project that builds the capability for connecting researchers, publications, research grants and research datasets (data in research).

http://researchgraph.org/
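As a rough illustration of the idea, connected research can be modelled as a small graph whose nodes are keyed by persistent identifiers. This is a minimal sketch only: it is not how Research Graph itself is implemented, and every identifier, label, relation name and function name below is a made-up placeholder.

```python
from collections import defaultdict

# Nodes keyed by a persistent identifier (PID).
# All identifiers and labels are hypothetical placeholders.
nodes = {
    "orcid:EXAMPLE-RESEARCHER": {"type": "researcher", "label": "Researcher A"},
    "doi:10.9999/example.paper": {"type": "publication", "label": "Example paper"},
    "doi:10.9999/example.dataset": {"type": "dataset", "label": "Example dataset"},
    "grant:EX-0001": {"type": "grant", "label": "Example grant"},
}

# Links are (source PID, relation, target PID) triples.
links = [
    ("orcid:EXAMPLE-RESEARCHER", "authored", "doi:10.9999/example.paper"),
    ("doi:10.9999/example.paper", "references", "doi:10.9999/example.dataset"),
    ("grant:EX-0001", "funded", "doi:10.9999/example.paper"),
]


def build_index(links):
    """Build an undirected adjacency index so links can be followed both ways."""
    index = defaultdict(set)
    for source, _relation, target in links:
        index[source].add(target)
        index[target].add(source)
    return index


def connected(index, start):
    """Return every PID reachable from `start` by following links."""
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for neighbour in index[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen - {start}


index = build_index(links)
for pid in sorted(connected(index, "doi:10.9999/example.dataset")):
    print(nodes[pid]["type"], pid)
```

Starting from the dataset, the traversal reaches the paper that references it, and through the paper the researcher and the grant. That is the kind of discovery, attribution and impact tracking that connected research is meant to enable.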

Trend #3 Connected research/data

Examples of progress:

The ability to access and review the data behind research is a well sought after, but often elusive, resource. In recognition of this, Scopus has been working to incorporate new tools that can make it easier to search and share data - Scopus makes strides in data linking

Major publishers have committed to requiring ORCID iDs in the publishing process for their journals and invite other publishers to do the same - Requiring ORCID in Publication Workflows: Open Letter

Approximately 148 million DOIs have been assigned [to publications, data, software and more] through a federation of Registration Agencies world-wide - Frequently asked questions about the DOI system

Connected data challenges

Photo by William Bout on Unsplash

● Raise PID adoption levels e.g. THOR Project

● ORCIDs - need to be populated and used

● Increasing PIDs in research workflows

● Need standard ways to exchange information e.g. Scholix initiative to link data and publications

● Data Citation practice challenges
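One small, concrete way to build PIDs into a workflow is to validate an ORCID iD at the point of entry. The sketch below assumes the checksum ORCID documents for its identifiers (ISO 7064 MOD 11-2 over the first 15 digits); the function names are illustrative, and the check only confirms the iD is well formed, not that it is registered to anyone.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if `orcid` (with or without hyphens) has a valid checksum."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


# The iD given on the closing slide of this deck passes the check.
print(is_valid_orcid("0000-0003-0635-1998"))
```

A check like this catches typing errors before an iD is stored in a repository or submission system, which is one small step towards ORCIDs being populated and used.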

Trend #4 Data reuse

There is a push for reusable research data. Examples:

Why enable reuse? The UK Data Archive provides many reasons, including: encouraging scientific enquiry and debate; promoting innovation and potential new data uses.

2013 study: “We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003” - Piwowar HA, Vision TJ. (2013) Data reuse and the open data citation advantage. PeerJ 1:e175 https://doi.org/10.7717/peerj.175.

Early 2017: Springer Nature responded to the US National Institutes of Health’s request for information on Strategies for NIH Data Management, Sharing, and Citation. They made a number of recommendations to the NIH, and funding organisations, including: Encouraging researchers to share and describe datasets in a way that facilitates reuse and reproducibility.

Data reuse challenges

“87% of researchers don’t know what licence to apply to their data” - Daniel Hook, CEO Digital Science, 3/1/17

There is a quality issue: (a) sharing data is necessary but not sufficient for future reuse, (b) ensuring that data is “independently understandable” is crucial, and (c) incorporating a data review process is feasible - Peer et al. Committing to a Data Quality review. IDCC14 Practice Paper.

Other issues: Geographic differences and differences across age groups: younger respondents feel more favorably toward data sharing and reuse, yet make less of their data available than older respondents - Tenopir et al. Changes in Data Sharing and Data Reuse Practices and Perceptions among Scientists Worldwide

So the future of RDM is... FAIR Data

#1 Data policies help support Findable data

#2 Data sharing helps create Accessible data

#3 Connected research/data is part of Interoperable data

#4 Data reuse is enabled by Reusable data

FORCE11 FAIR Data Principles

By SangyaPundir (Own work) [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons

FAIR Data...

● Requires good data management across the whole lifecycle.

● Requires many stakeholders to work together

iProfessionals have:

● a challenge

● an opportunity

● an incredible amount of skills and knowledge to contribute!

ANDS FAIR Data flyer

Research lifecycle - traditional

University of Bournemouth - http://blogs.bournemouth.ac.uk/research/tag/rkeo/

DATA

Research lifecycle - data infused

Find data

Plan to manage data

Publish data

Collect, store, analyse, visualise data

Cite data

The future is an opportunity

"The challenge of the unknown future is so much more exciting than the stories of the accomplished past." - Simon Sinek

Photo by Warren Wong on Unsplash

With the exception of third party images or where otherwise indicated, this work is licensed under the Creative Commons Attribution 4.0 International Licence.

ANDS, Nectar and RDS are supported by the Australian Government through the National Collaborative Research Infrastructure Strategy Program (NCRIS).


orcid.org/0000-0003-0635-1998

@n_simons

Natasha Simons