Open FAIR Data and Open Science: Developing Partnerships, Strategies, Policies
Simon Hodson, Executive Director, CODATA
www.codata.org
African Open Science Platform Project and Research Data Alliance Workshop, Association of African Universities Conference
Palm Royal Beach Hotel, Accra, Ghana, 8 June 2017
Why Open Science, Open Data and FAIR Data?
Why Open Science / FAIR Data?
• Good scientific practice depends on communicating the evidence.
• Open research data are essential for reproducibility, self-correction.
• Academic publishing has not kept up with age of digital data.
• Danger of a replication / evidence / credibility gap.
• Boulton: to fail to communicate the data that supports scientific assertions is malpractice
• Open data practices have transformed certain areas of research.
• Genomics and related biomedical sciences; crystallography; astronomy; areas of earth systems science; various disciplines using remote sensing data…
• FAIR data helps use of data at scale, by machines, harnessing technological potential.
• Research data often have considerable potential for reuse, reinterpretation, use in different studies.
• Open data foster innovation and accelerate scientific discovery through reuse of data within and outside the academic system.
• Research data produced by publicly funded research are a public asset.
Data Revolution: A World that Counts!
Creating a world that counts: Mobilising the Data Revolution for Sustainable Development.
To meet the new sustainability goals ‘there is an urgent need to mobilise the data revolution for all people and the whole planet in order to monitor progress, hold governments accountable and foster sustainable development.’
Without immediate action, gaps between developed and developing countries, between information-rich and information-poor people, and between the private and public sectors will widen, and risks of harm and abuses of human rights will grow.
Data quality and integrity
Data disaggregation (no-one should be invisible)
Data timeliness
Data transparency and openness
Data usability and curation
Data protection and privacy
Data governance and independence
Data resources and capacity
Data rights
Data Revolution: how can we improve … with open data?
GODAN-ODI Report: improving agriculture, food and nutrition with open data.
‘Although the amount of data openly available is constantly increasing, there are still challenges related to data management, licensing, interoperability and exploitation. There is a need to evolve policies, practices and ethics around closed, shared, and open data.’
Enabling more efficient and effective decision making > lowers cost of accessing information and underpins tools that farmers themselves can use.
Fostering innovation to benefit everyone > an opportunity that must not be missed for creating new businesses and jobs in ‘new data-powered innovation ecosystems’.
Driving organisational and sector change through transparency > open data is essential to understanding complex systems, interventions, targets, change.
Availability is not enough > essential that the data be interoperable and machine-readable.
Problem oriented and solution-based data strategies.
Develop infrastructure and human capacity.
The Value of Open Data Sharing
Report by CODATA for GEO, the Group on Earth Observations.
Provides a concise, accessible, high level synthesis of key arguments and evidence of the benefits and value of open data sharing.
Particular, but not exclusive, reference to Earth Observation data.
Benefits in the areas of:
Economic Benefits
Social Welfare Benefits
Research and Innovation Opportunities
Education
Governance
Available at http://dx.doi.org/10.5281/zenodo.33830
GEO DSWG is building on this work with further examples: would be valuable to work with this community.
Africa, Data and Open Science
21st century is the century of data.
Data skills and infrastructure will be essential for economic advancement and for sustainable development.
We need to create a ‘world that counts’ that gathers data and uses data to understand itself.
Open data is essential to increase impact of research and translation for practitioners.
African governments, research and education systems and universities have an interest in developing data skills and infrastructure.
African universities have an essential role to play as educators of a data savvy generation and as the stewards of the data created by African research.
The data from many research projects conducted in Africa is not looked after in African institutions.
African institutions need to present their research outputs, including data, as a shop window and a record of their activities, achievements, impact.
Open Science
What is Open Science:
Open access to research literature.
Data that is as Open as possible, as closed as necessary.
FAIR Data (Findable, Accessible, Interoperable, Reusable).
A shop window and repository of all research outputs.
A culture and methodology of open discussion and enquiry (including methodology, lab notebooks, pre-prints)
Research data is evidence: it is fundamental to the validity and reproducibility of science.
Those research disciplines that have leapt forward in the past 15-20 years are those that have shared and analysed data at scale: genomics, astronomy, disciplines using remote sensing data etc.
African research institutions have an opportunity to build their reputation around research specialisation: and this requires data specialisation and FAIR data collections.
The Case for Open Data in a Big Data World
• Science International Accord on Open Data in a Big Data World: http://www.science-international.org/
• Supported by four major international science organisations.
• Presents a powerful case that the profound transformations mean that data should be:
• Open by default
• Intelligently open
• Lays out a framework of principles, responsibilities and enabling practices for how the vision of Open Data in a Big Data World can be achieved.
• Campaign for endorsements: over 150 organisations so far.
• Please consider endorsing the Accord: http://www.science-international.org/#endorse
National and Institutional Data Strategies
Opportunities for Research Institutions and for National Research Base
Open and FAIR Research Data Presents Major Opportunities for Research Systems
Research intensive RPOs will be data intensive RPOs.
Supporting researchers’ use of data is a key strategic mission and enabler: world class research environment includes support for data stewardship.
An RPO’s reputation is increasingly built on all research outputs and wider societal and economic impact: data is core to this.
Development of significant data collections of research intensive universities. Leading departments / research groups will be characterised by excellence in data, by Open FAIR data collections.
The contribution to research of both the individual researcher and the institution will increasingly be measured on the basis of data outputs as well as research articles.
Policies are less and less ambiguous: data stewardship and RDM are necessary for grant funding success.
Avoid reputational damage through data loss.
Challenges for Research Systems
Policy development: unpicking Open and FAIR data
Supporting data through the lifecycle.
Culture and incentives: what’s in it for us?
Skills gaps: training and support.
Technical systems and infrastructure.
Developing culture of conscious data stewardship: what to keep and what to discard.
Supporting the long term stewardship of research data: finding a niche in the data ecosystem, clarifying the division of responsibility between institutional, national and international repositories.
Sustainability and finance.
…
Framework for National and Institutional Data Strategies
National / Institutional Open and FAIR Data Strategy.
Open data policies and guidance at national and institutional level.
Clarify the boundaries of open (particularly privacy, IPR).
Mechanisms (infrastructure and policy) to ensure concurrent publication of data as research output.
Data ‘publication’ and citations of data included in assessment of research contribution.
Promotion of data skills (researchers and data stewards).
Development of institutional infrastructure for research collaboration and data stewardship/RDM.
Collaborative infrastructures for certain research disciplines, nationally, regionally to pool expertise and lower costs.
Components of RDM support services
RDM Policy and Roadmap
Business Plan and Sustainability
Guidance, Training and Support
Data Management Planning
Managing Active Data
Processes for selection and retention
Deposit / Handover
Data Repositories / Catalogues
Research Data Registry / Infrastructure
Institutional Research Data Management Policies: http://www.dcc.ac.uk/resources/policy-and-legal/institutional-data-policies/uk-institutional-data-policies
Group Work 1: Materials
Consider the principles laid out in the Science International Accord on Open Data in a Big Data World: http://www.science-international.org/#accord (short version) or http://bit.ly/AOSP_Accord_Short
Consider Section D of the Accord on Enabling Practices: http://www.science-international.org/#accord (long version) or http://bit.ly/AOSP_Accord_Long
Group Work 1: Activity
1. Endorse the Accord: Please consider endorsing the Accord and take an action to discuss it in your institution: http://www.science-international.org/#endorse
2. Responsibilities. Do you agree with the description of responsibilities in the Accord? Who are the key national stakeholders? Who are the key stakeholders in the research institution? What are their roles and responsibilities? What would their roles and responsibilities be ideally? What needs to be done to achieve this? (30 minutes)
3. Enabling Practices. What are the most important enabling practices? What things should a national or institutional data strategy address? Critique the framework for National and Institutional Data Strategies. (30 minutes)
4. Reporting Back (20 minutes)
Open and FAIR Data Policy at National and Institutional Level
Resources: Current Best Practice for Research Data Management Policies
Expert report commissioned by CODATA member: http://dx.doi.org/10.5281/zenodo.27872
Provides comprehensive summary of best practice in funder data policies.
Identifies key elements to be addressed:
1. Summary of policy drivers
2. Intelligent openness
3. Limits of openness
4. Definition of research data
5. Define data in scope
6. Criteria for selection
7. Summary of responsibilities
8. Infrastructure and costs
9. DMP requirements
10. Enabling discovery and reuse
11. Recognition and reward
12. Reporting requirements, compliance monitoring
Resources: Current Best Practice for Research Data Management Policies
See also RECODE Report, Annex on Policy Development: http://recodeproject.eu/
LEARN Project Toolkit: http://learn-rdm.eu/en/about/
FOSTER Knowledge Base on Open Science: https://www.fosteropenscience.eu/
Group Work 2: Materials
Consider the framework for data policies in ‘Current Best Practice for Research Data Management Policies’ http://dx.doi.org/10.5281/zenodo.27872 or http://bit.ly/AOSP_CODATA_Policy
Consider the elements of data policies in the LEARN project ‘Developing a Research Data Policy: Core Elements of the Content of a Research Data Management Policy’: http://bit.ly/AOSP_LEARN_Policy
Group Work 2: Activity
1. Develop the Outline of an Institutional (University or RPO) Research Data Management Policy (60 minutes)
• What elements need to be included?
• What could the institution say about these issues?
• What would the process be for developing and adopting a data policy?
• What are the key dependencies?
• How would you go about it?
2. Reporting back (20 minutes)
Simon Hodson Executive Director CODATA
www.codata.org http://lists.codata.org/mailman/listinfo/codata-international_lists.codata.org
Twitter: @simonhodson99
Tel (Office): +33 1 45 25 04 96 | Tel (Cell): +33 6 86 30 42 59
CODATA (ICSU Committee on Data for Science and Technology), 5 rue Auguste Vacquerie, 75016 Paris, FRANCE
Thank you for your attention!
The Open Data Iceberg
The Technical Challenge
The Ecosystem Challenge
The Funding Challenge
The Support Challenge
The Skills Challenge
The Incentives Challenge
The Mindset Challenge
Processes & Organisation
People
Geoffrey Boulton (CODATA) - developed from an idea by Deetjen, U., E. T. Meyer and R. Schroeder (2015). OECD Digital Economy Papers, No. 246, OECD Publishing.
A National Infrastructure
Technology
Where should research data go?
Homogenous data collections essential for research
• Earth observation data;
• Genetic data;
• Social science survey data…
> National and international data archives
Significant data outputs of publicly funded research
• Significant data outputs from funded projects;
• Raw and analysed experimental data…
> National or institutional data archives; data papers
Data underpinning research publications
• Raw and analysed data for reproducibility (evidence);
• Data behind the graph…
> Dedicated data archives (e.g. Dryad)
Boundaries of Open
For data created with public funds or where there is a strong demonstrable public interest, Open should be the default.
As Open as Possible, as Closed as Necessary.
Proportionate exceptions for:
Legitimate commercial interests (sectoral variation)
Privacy (‘safe data’ vs Open data – the anonymisation problem)
Public interest (e.g. endangered species, archaeological sites)
Safety, security and dual use (impacts contentious)
All these boundaries are fuzzy and need to be understood better!
There is a need to evolve policies, practices and ethics around closed, shared, and open data.
Emerging Policy Consensus? FAIR Data
• FAIR Data (see original guiding principles at https://www.force11.org/node/6062)
• Findable: have sufficiently rich metadata and a unique and persistent identifier.
• Accessible: retrievable by humans and machines through a standard protocol; open and free; authentication and authorization where necessary.
• Interoperable: metadata use a ‘formal, accessible, shared, and broadly applicable language for knowledge representation’.
• Reusable: metadata provide rich and accurate information; clear usage license; detailed provenance.
• FAIR Data now at the heart of H2020 policy, European Open Science Cloud etc.
• Under the revised version of the 2017 work programme, the Open Research Data pilot has been extended to cover all the thematic areas of Horizon 2020.
• Current EC Guidance at http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf and http://ec.europa.eu/research/press/2016/pdf/opendata-infographic_072016.pdf
• European Commission Expert Group (chaired by Simon Hodson, CODATA; Sarah Jones, DCC, Rapporteur) producing implementation guidelines for FAIR Data for EC Funded Programmes: draft report end 2017, final report March 2018: http://bit.ly/FAIRdata-EG
FAIR Guiding Principles (1)
• To be Findable:
• F1. (meta)data are assigned a globally unique and persistent identifier
• F2. data are described with rich metadata (defined by R1 below)
• F3. metadata clearly and explicitly include the identifier of the data it describes
• F4. (meta)data are registered or indexed in a searchable resource
• To be Accessible:
• A1. (meta)data are retrievable by their identifier using a standardized communications protocol
• A1.1 the protocol is open, free, and universally implementable
• A1.2 the protocol allows for an authentication and authorization procedure, where necessary
• A2. metadata are accessible, even when the data are no longer available
(Wilkinson, M. D., et al., The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data, http://dx.doi.org/10.1038/sdata.2016.18)
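As an illustration of F1 and A1, the sketch below builds (but does not send) an HTTPS request that resolves a DOI and asks for machine-readable metadata. The function name `doi_request` is hypothetical, and the resolver address and the CSL JSON media type are assumptions about how DOI content negotiation is commonly used; they are not part of the FAIR principles themselves.

```python
from urllib.request import Request

# Assumption: the public DOI resolver, reached over HTTPS (an open, free protocol, cf. A1.1).
DOI_RESOLVER = "https://doi.org/"

# Assumption: CSL JSON is one media type commonly offered via DOI content negotiation.
CSL_JSON = "application/vnd.citationstyles.csl+json"

# Prefixes that people often write in front of a bare DOI.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_request(doi: str, media_type: str = CSL_JSON) -> Request:
    """Build an HTTP request for the metadata behind a DOI (hypothetical helper)."""
    doi = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Very light sanity check: DOIs start with "10." and contain a "/".
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # The Accept header asks the resolver for metadata rather than a landing page.
    return Request(DOI_RESOLVER + doi, headers={"Accept": media_type})


req = doi_request("http://dx.doi.org/10.1038/sdata.2016.18")
print(req.full_url)
print(req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual retrieval; it is left out here so the sketch runs without network access.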
FAIR Guiding Principles (2)
• To be Interoperable:
• I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
• I2. (meta)data use vocabularies that follow FAIR principles
• I3. (meta)data include qualified references to other (meta)data
• To be Reusable:
• R1. (meta)data are richly described with a plurality of accurate and relevant attributes
• R1.1. (meta)data are released with a clear and accessible data usage license
• R1.2. (meta)data are associated with detailed provenance
• R1.3. (meta)data meet domain-relevant community standards
(Wilkinson, M. D., et al., The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data, http://dx.doi.org/10.1038/sdata.2016.18)
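A repository or data steward might screen metadata records against some of these principles before deposit. The following is a minimal sketch only: the field names (`identifier`, `license`, `related_identifiers` and so on) and the function `fair_gaps` are hypothetical, and the check tests only whether a field is present, which is far weaker than assessing whether a dataset is actually FAIR.

```python
def fair_gaps(record: dict) -> list:
    """Return the principles for which the record has no corresponding field.

    Presence check only; field names are illustrative assumptions.
    """
    checks = {
        "F1 persistent identifier": bool(record.get("identifier")),
        "F2 descriptive metadata": bool(record.get("title")) and bool(record.get("description")),
        "I3 qualified references": bool(record.get("related_identifiers")),
        "R1.1 usage license": bool(record.get("license")),
        "R1.2 provenance": bool(record.get("creators")) and bool(record.get("created")),
    }
    # Dicts keep insertion order, so gaps are reported in the order listed above.
    return [name for name, ok in checks.items() if not ok]


# Hypothetical example record: no license and no links to related (meta)data.
example = {
    "identifier": "10.1234/example-dataset",  # placeholder, not a real DOI
    "title": "Rainfall observations",
    "description": "Daily rainfall totals from an illustrative station network.",
    "creators": ["A. Researcher"],
    "created": "2017-06-08",
}

print(fair_gaps(example))
```

For the example record this reports the missing license (R1.1) and the missing qualified references (I3), which are the kinds of gaps a data steward could raise with the depositor.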