
Data Archiving and Networked Services

DANS is an institute of KNAW and NWO

Access to Research Data in Trustworthy Digital Archives

Peter Doorn, Director DANS

Round table: Open Access and Access to Data

CPU meeting, Brussels, Thursday 16 October 2014

Contents

• Data is hot!
• To share or not to share…
• Data fraud and emerging data policies
• Data cultures: variations across disciplines
• Archiving research data for permanent access
• Trust, data quality, research data management
• The role of journals in data access
• International organizations and research data
• Conclusions

Data is hot!

Neelie Kroes (Vice-President of the European Commission responsible for the Digital Agenda): “Data is the new gold”

Horizon 2020 and data

Máire Geoghegan-Quinn (European Commissioner for Research and Innovation): “We must give taxpayers more bang for their buck. Open access to scientific papers and data will speed up important breakthroughs by our researchers and businesses, boosting knowledge and competitiveness in Europe.”

To share or not to share...

What is in it for the researcher? Benefits:
• Visibility
• Citation: researchers who share data are cited more often than others

Piwowar, H., & Vision, T. J. (2013). Data reuse and the open data citation advantage. PeerJ PrePrints, 1:e1. doi:10.7287/peerj.preprints.1

“[…] we find a robust citation benefit from open data […] there is a direct effect of third-party data reuse that persists for years […] a substantial fraction of archived datasets are reused, and the intensity of dataset reuse has been steadily increasing since 2003.”

Researchers who share data get more citations!

Seven common objections to data sharing… and how to overcome them (1-4)

Why not share? How to overcome?

1. No one else can understand the complexity of my data

Document the data, describing the conditions of the research

2. If someone else analyzes my data, they may come up with a different answer disproving my perspective

By considering different perspectives on the same data set, we will come closer to the “right” answer

3. Someone else may find something new in my data that I did not see

Finding something new in an existing data set will increase the return on investment in the data collection

4. I have not finished analyzing my data, and I will make it available once my analysis is complete

Research is a never-ending story… A published paper suggests that the data have been substantially analyzed; thus sharing at this point seems appropriate

Adapted from Stephen H. Koslow (2000)

Seven common objections to data sharing… and how to overcome them (5-7)

Why not share? How to overcome?

5. It is my data that I worked very hard to collect, and no one else has the right to it.

Publicly funded data should be publicly available. A publication implies that research results are to be shared. Reviewers and readers should have access to the primary data on which publications are based.

6. I cannot trust or understand the data produced in another laboratory

If this is not possible, how can we trust the scientific literature? This is the mirror image of objection 1.

7. Documenting my data for others costs me time for which I get no credit

You will be rewarded by more citations; your work will be more visible and your data will be cited as well.

Adapted from Stephen H. Koslow (2000), ‘Should the neuroscience community make a paradigm shift to sharing primary data?’, Nature Neuroscience, 3:9 (September), pp. 863-865.

Data sharing cultures across scientific domains

[Pie chart: where respondents store their research data]
• Locally, on my own computer(s) or on computer(s) of my department or laboratory: 36%
• On a network disk of my department or institute: 31%
• On external hard disks or backup media (CD, DVD, tape, etc.): 23%
• On a central storage facility outside my department or institute: 9%
• Other: 1%

“Netherlands: Renowned psychologist confesses to fabrications” (German-language news headline)

• September 2011: Diederik Stapel, Social Psychology
• November 2011: Don Poldermans, Cardiovascular Medicine
• June 2012: Dirk Smeesters, Experimental Social Psychology
• October 2012: Mart Bax, Cultural Anthropology

The KNAW “Schuyt report” on data practices

• A lot of variation across and within disciplines

• Pattern: data management in small-scale research is riskier than in big science

• Risk: missing checks and balances, especially in period after granting a research proposal and before publication

• Peer pressure is an important control mechanism

http://www.knaw.nl/Content/Internet_KNAW/publicaties/pdf/20131009.pdf

Increased awareness of the need for Data Policies

• Dutch Academy (KNAW) “supports the free movement of data and results. Taking into account variations across and within scientific disciplines, free availability of data should be the default”.

• Dutch universities are developing data policies

• Dutch research funding organisation NWO: Data Management Plans (DMPs) and data sharing are becoming requirements for funding

• Science Europe: the Research Data Working Group recommends that members develop open data policies

• EU research funding programme Horizon 2020: a DMP and open data access are the standard, with opt-out possible

• EU Vice-President Neelie Kroes: “Data is the oil for science” (Riding the Wave report, 2010)

• RECODE recommendations: develop and implement explicit policies requiring timely open access to research data as the default position

What is DANS?

• Institute of the Dutch Academy and Research Funding Organisation (KNAW & NWO) since 2005
• First predecessor dates back to 1964 (Steinmetz Foundation); Historical Data Archive 1989
• Mission: promote permanent access to digital research information

Research Data Netherlands: federated infrastructure

• Mission: the promotion of sustained access and responsible re-use of digital research data

• Cooperation of DANS, 3TU.Datacentre and SURFsara (training, Dutch Data Award)

• Towards a collaborative data infrastructure

www.researchdatanetherlands.nl

Our core services

• EASY: Electronic Archiving System (self-deposit for long-term storage)
• Hosting Dutch Dataverse Network for Universities (data storage during research projects)
• NARCIS: Gateway to scholarly information in the Netherlands

Datasets archived at DANS and accessible through Narcis.nl

[Figure: cumulative number of datasets by source (CentERdata, 3TU.DC, DANS), from 1960-69 to 2011; y-axis: datasets (cumulative), 0 to 30,000. Excludes 1,276 datasets in 2012 without a year of deposit, among them 99 from Tilburg University.]

• November 2010: 1 million data files
• March 2013: 2 million data files (25,000 datasets)

Reuse of datasets at DANS 2005-2013

[Figure: bar chart of dataset reuse per year, 2005 to 2013; y-axis 0 to 35,000]

Access categories, March 2014:
• Open (after login): 63%
• Other (closed): 1%
• Restricted (permission request): 11%
• Restricted (group access): 25%

Access categories, 2012:
• Open (after login): 49%
• Other (closed): 2%
• Restricted (permission request): 12%
• Restricted (group access): 37%

Trust in research data

• Trust is at the very heart of creating, managing, storing and sharing data for all stakeholders:
– Data creators
– Data users
– Data repositories
– Funders

• Data quality:
– valid
– accurate
– consistent
– integrity
– timely
– complete

ESFRI Research Infrastructures and Trust

Requirements for CLARIN Centres

“Centres need to have a proper and clearly specified repository system and participate in a quality assessment procedure as proposed by the Data Seal of Approval or MOIMS-RAC approaches”

Building Trust: CESSDA Self-Assessment Project

Participants from fifteen CESSDA member organisations discussed the CESSDA-ERIC requirements and agreed upon using the Data Seal of Approval (DSA) guidelines as a tool to gain information on the level of their conformance with the DSA and the CESSDA-ERIC requirements.

Certification of digital repositories

• International framework
• 3 standards
• 3 levels (basic, extended, formal)

Certification Standards: Data Seal of Approval (DSA)

• DANS initiative (2005/6)
• International Board
• 16 guidelines
• Self-assessment
• Transparency
• 34 seals awarded since 2010

The research data:
• can be found on the Internet
• are accessible (clear rights and licenses)
• are in a usable format
• are reliable
• can be referred to (persistent identifier)

Data producers are responsible for the quality of research data, repositories for storage and long-term access, and users for correct use of data

http://datasealofapproval.org/

Journals and data

Nature Publishing Group (NPG) has announced the Spring 2014 launch of Scientific Data: http://www.nature.com/scientificdata/

Around 30 Elsevier journals have a Data Availability Policy (DAP)

DANS will soon launch its own data journal

International data organisations

• Alliance for Permanent Access (APA): to develop a shared vision and framework for a sustainable organisational infrastructure for permanent access to scientific information. www.alliancepermanentaccess.org

• Research Data Alliance (RDA): to accelerate international data-driven innovation and discovery by facilitating research data sharing and exchange, use and re-use, standards harmonization, and discoverability. www.rd-alliance.org

• International Council for Science / Committee on Data for Science and Technology (ICSU/CODATA): to strengthen international science for the benefit of society by promoting improved scientific and technical data management and use. www.codata.org/

Concluding remarks

• Archiving and sharing data is good for research!
– it has added value: higher return on investment
– it increases visibility and citation of the researcher
– it does not prevent data fraud, but it increases the transparency of research and is a deterrent against “sloppy science”

• Archiving and sharing or publishing data, via trustworthy data repositories, increases the reliance on quality data

• Research funding and performing organizations should develop data policies, requiring that research projects have a data management plan, and that such plans contain a section on accessibility of data after publication of the results in a trusted repository


Thank you for your attention

www.dans.knaw.nl
www.narcis.nl

http://youtu.be/HJbo-OAaJ1I

This presentation is summarized in a 4-minute video, introduced by Neelie Kroes!