Are the FAIR Data Principles Fair? · H2020 / EU demands on open data and research data management....

Alastair Dunning @alastairdunning Jasmin Böhmer @JasminBoehmer Madeleine de Smaele @MadeleineSmaele Technical University of Delft Hosts of 4TU.Centre for Research Data Are the FAIR Data Principles Fair?


Page 1: Are the FAIR Data Principles Fair? · H2020 / EU demands on open data and research data management. Providing insight and support for repositories to improve their information architecture

Alastair Dunning @alastairdunning

Jasmin Böhmer @JasminBoehmer

Madeleine de Smaele @MadeleineSmaele

Technical University of Delft
Hosts of 4TU.Centre for Research Data

Are the FAIR Data Principles Fair?


Are the FAIR Data Principles Fair?

Blog post with all the information:

FAIR Principles – Connecting the Dots for the IDCC 2017
http://bit.ly/2lIgc9p


Motivation for this Project

● H2020 / EU demands on open data and research data management.

● Providing insight and support for repositories to improve their information architecture and digital infrastructure to comply with H2020 and FAIR demands.

● Own aspiration to offer the best possible service and support for 4TU.Centre for Research Data.

● Working towards practices to improve interoperability and reuse-value of data-sets in research data repositories.


Methodology

● Using the FAIR principles and corresponding facets as a scoring matrix

● Applying a traffic-light rating system

● Using the information available on the repository's online web interface to evaluate the FAIR Principles
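The scoring approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the meaning attached to each colour, the repository names, and the ratings are all made up here and are not taken from the study.

```python
from collections import Counter
from enum import Enum


class Rating(Enum):
    # Assumed meanings for the three lights; the slides do not define them.
    GREEN = "complies"
    AMBER = "unclear or partly complies"
    RED = "does not comply"


# Hypothetical scoring matrix: repository -> FAIR facet -> rating.
# These are invented examples, not results from the study.
scores = {
    "repo-a": {"F1": Rating.GREEN, "A4": Rating.RED, "I2": Rating.RED},
    "repo-b": {"F1": Rating.RED, "A4": Rating.AMBER, "I2": Rating.RED},
}


def share_not_green(scores, facet):
    """Return the fraction of repositories not rated GREEN for a facet."""
    ratings = [facets[facet] for facets in scores.values()]
    return sum(r is not Rating.GREEN for r in ratings) / len(ratings)


def tally(scores, facet):
    """Count how many repositories received each rating for a facet."""
    return Counter(facets[facet] for facets in scores.values())
```

A per-facet percentage such as "X% of the repositories do not ..." would then be `share_not_green(scores, facet) * 100` computed over the full set of repositories.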


Our Interpretation of the FAIR Principles

http://bit.ly/2lI2CCJ


1. Compliance is not high


[Figure: General Overview Charts (N = 37), with one panel each for Findable, Accessible, Interoperable, and Re-Usable. Source: https://data.4tu.nl/repository/uuid:5146dd06-98e4-426c-9ae5-dc8fa65c549f]


F1 (meta)data are assigned a globally unique and eternally persistent identifier.

49% of the repositories do not assign a DOI, Handle, or URN.

For example, subject-based repositories use project IDs or subject-specific ID systems. These identifiers do not work as links on the public web.
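As a rough illustration of the F1 check, the sketch below classifies an identifier string as a DOI, Handle, or URN. The regular expressions are simplified assumptions for illustration only; the real identifier syntaxes allow more variation, and a syntactic match says nothing about whether the identifier actually resolves or persists.

```python
import re

# Simplified patterns, assumed for illustration. DOIs are checked first
# because a DOI name is also a syntactically valid Handle.
PATTERNS = [
    ("DOI", re.compile(r"^(?:doi:|https?://(?:dx\.)?doi\.org/)?10\.\d{4,9}/\S+$", re.I)),
    ("Handle", re.compile(r"^(?:hdl:|https?://hdl\.handle\.net/)?\d[\d.]*/\S+$", re.I)),
    ("URN", re.compile(r"^urn:[a-z0-9][a-z0-9-]{0,31}:\S+$", re.I)),
]


def classify_identifier(identifier):
    """Return 'DOI', 'Handle', 'URN', or None for any other (local) ID."""
    value = identifier.strip()
    for scheme, pattern in PATTERNS:
        if pattern.match(value):
            return scheme
    return None
```

A project-specific ID such as the made-up `project-42` matches none of the patterns, which is the situation described for subject-based repositories.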


A4 metadata are accessible, even when the data are no longer available.

97% of the repositories do not clearly state whether their metadata persist when the data are no longer available.

The transparency and integrity of the repository is improved by providing metadata-records for closed, restricted, or unavailable data-sets.


I2 (meta)data use vocabularies that follow FAIR principles.

None of the repositories have visible ontologies or (controlled) vocabularies.

Adding a semantic layer that enables links to unambiguous terms and definitions needs a lot of curation effort.

Is ORCID (Open Researcher and Contributor ID), for example, a vocabulary?


R1 (meta)data have a plurality of accurate and relevant attributes.

38% of the repositories do not provide sufficient information to help the information seeker determine the value of reuse.

Specific information is mostly included in the documentation. Displaying that information in appropriate metadata fields would be beneficial.


2. Some principles are easily measured; some are much more subjective


Pretty Obvious - (meta)data are assigned a globally unique and eternally persistent identifier.


Vague - data are described with rich metadata

What makes metadata rich?


Subjective - (meta)data meet domain-relevant community standards


Philosophically dubious - (meta)data use vocabularies that follow FAIR principles


3. Some principles are narrow; others are broad


Narrow -

● (meta)data are retrievable by their identifier using a standardized communications protocol.

● the protocol is open, free, and universally implementable.

● the protocol allows for an authentication and authorization procedure, where necessary.


Broad -

● (meta)data include qualified references to other (meta)data.

● (meta)data meet domain-relevant community standards (takes a long time to figure out)


Technical vs Policy

● (meta)data are retrievable by their identifier using a standardized communications protocol.

● the protocol is open, free, and universally implementable.

● the protocol allows for an authentication and authorization procedure, where necessary.

● metadata are accessible, even when the data are no longer available.


4. Some subject areas fare badly


Compliance of Social Science Data Repositories against FAIR Findable Principles (F1, F2, F3 and F4)


Practice for Social Science Repositories Analysed

● Data only available on request

● Licence not visible / clear

● Plenty of free text documentation on collection of data exists

● No structured metadata per dataset / no machine-readable metadata

● But still seem to work well within the discipline


LASA - Longitudinal Aging Study Amsterdam. Aging research and collecting data on aging in the Netherlands

No global identifier

No structured metadata

But plenty of documentation


Practice for Climate Data Repositories Analysed

● Licence sometimes clear (no data protection issues)

● Some free text documentation on the overall collection of data exists

● No structured metadata per dataset / sometimes the data is dynamically created following a query

● No global identifiers per dataset

● Meeting existing disciplinary norms but not fully embedded as machine-readable data


SACA - Southeast Asian Climate Assessment

No global identifier

No structured metadata

But plenty of documentation


5. For repositories, doing some simple(ish) things vastly helps compliance


● Create a permanent identifier for each dataset

● Always use an open license or a clear license

● Make sure each dataset has rich metadata associated with it (Dublin Core is a good starting place!)

● Make data available via HTTP
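The four points above could be turned into a simple per-dataset checklist, sketched here. Several things are assumptions made for illustration: the record is a plain dictionary keyed by Dublin Core element names, `url` is a hypothetical extra field, and the `min_elements` threshold for "rich" metadata is arbitrary, since the principles do not define what makes metadata rich.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def simple_checks(record, min_elements=8):
    """Apply four simple checks to one dataset record (a plain dict).

    The identifier and license checks only test that a value is present;
    persistence and openness cannot be judged from the record alone.
    """
    filled = [name for name in DUBLIN_CORE if record.get(name)]
    url = str(record.get("url", ""))  # hypothetical field, not Dublin Core
    return {
        "permanent identifier": bool(record.get("identifier")),
        "clear license": bool(record.get("rights")),
        "rich metadata": len(filled) >= min_elements,
        "available via HTTP": url.startswith(("http://", "https://")),
    }
```

A record with only a title, creator, identifier, and rights statement would pass the identifier and license checks but fail the "rich metadata" check at the default threshold, which mirrors the gap between having documentation and having structured metadata.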


Some Final Points (I)

● FAIR principles are deliberately vague - principles to be interpreted

● Nothing about back-up and preservation. Relationship to Data Seal of Approval?

● Much more work to be done on relationship between FAIR data and FAIR repository


Some Final Points (II)

● Creating a FAIR dataset demands an alliance between repository and dataset creator

● Governance? How are the principles updated?

● FAIR principles derive not from libraries / archives but more from life sciences; but still require good knowledge of metadata / archiving practice