TRANSCRIPT
Open Science: Global Perspectives
Simon Hodson, Executive Director, CODATA
www.codata.org
SA-EU Open Science Dialogue Workshop, Birchwood Hotel & OR Tambo Conference Centre
Johannesburg, South Africa, 30 November 2017
Why Open Science / FAIR Data?
• Good scientific practice depends on communicating the evidence.
• Open research data are essential for reproducibility, self-correction.
• Academic publishing has not kept up with the age of digital data.
• Danger of a replication / evidence / credibility gap.
• Boulton: to fail to communicate the data that supports scientific assertions is malpractice
• Open data practices have transformed certain areas of research.
• Genomics and related biomedical sciences; crystallography; astronomy; areas of earth systems science; various disciplines using remote sensing data…
• FAIR data enable the use of data at scale, by machines, harnessing technological potential.
• Research data often have considerable potential for reuse, reinterpretation, use in different studies.
• Open data foster innovation and accelerate scientific discovery through reuse of data within and outside the academic system.
• Research data produced by publicly funded research are a public asset.
Policy Push for Open Research Data
• The three Bs (Budapest, Berlin and Bethesda) and Open Access, 2002-3
• OECD Principles and Guidelines on Access to Research Data, 2004, 2007
• UK Funder Data Policies, from 2001, but accelerates from 2009
• NSF Data Management Plan Requirements, 2010
• Royal Society Report ‘Science as an Open Enterprise’, 2012
• OSTP Memo ‘Increasing Access to the Results of Federally Funded Scientific Research’, Feb 2013
• G8 Science Ministers Statement, June 2013
• G8 Open Data Charter and Technical Appendix, June 2013
• EC H2020 Open Data Policy Pilot, 2014; Adoption of FAIR Data Principles, 2017.
• Science International Accord on Open Data in a Big Data World, Dec 2015: http://bit.ly/opendata-bigdata
Developments: Journal Data Policies
Dryad Joint Data Archiving Policy, Feb 2010: http://datadryad.org/jdap
This journal requires, as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive, such as GenBank, TreeBASE, Dryad, or the Knowledge Network for Biocomplexity.
PLOS Data Availability Policy, revised Feb 2014: http://www.plosone.org/static/policies.action#sharing
PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exceptions.
Springer Nature initiative to standardise policies: http://www.springernature.com/gp/group/data-policy/policy-types
FAIRsharing: https://fairsharing.org
RDA Interest Group to encourage development and adoption of journal data policies
AGU COPDESS activity to promote greater data availability in earth system sciences.
Resources: Current Best Practice for Research Data Management Policies
Expert report commissioned by a CODATA member.
Provides comprehensive summary of best practice in funder data policies.
Identifies key elements to be addressed:
1. Summary of policy drivers
2. Intelligent openness
3. Limits of openness
4. Definition of research data
5. Define data in scope
6. Criteria for selection
7. Summary of responsibilities
8. Infrastructure and costs
9. DMP requirements
10. Enabling discovery and reuse
11. Recognition and reward
12. Reporting requirements, compliance monitoring
Zenodo: http://dx.doi.org/10.5281/zenodo.27872
CODATA Data Policy Activities
New Data Policy Committee, chaired by Paul Uhlir, international expert in Data Policies and member of CODATA Executive Committee.
Current Best Practice for Research Data Management Policies http://dx.doi.org/10.5281/zenodo.27872
The Value of Open Data Sharing, report for GEO http://dx.doi.org/10.5281/zenodo.33830
Legal Interoperability, Principles and Implementation Guidelines https://doi.org/10.5281/zenodo.162241
FAIR Data
Simon Hodson is chairing the European Commission’s Expert Group on FAIR Data: http://bit.ly/FAIR_Data_Expert_Group
OECD Global Science Forum and CODATA Project on Business Models for Sustainable Data Repositories: http://www.codata.org/working-groups/oecd-gsf-sustainable-business-models
The Case for Open Data in a Big Data World
• Science International Accord on Open Data in a Big Data World: http://www.science-international.org/
• Supported by four major international science organisations.
• Takes a global approach: data revolution and Open Science are phenomena with global ramifications. Repeatedly stresses the opportunities for LMICs and the negative consequences of being left outside a system of data intensive research.
• Presents a powerful case that the profound transformations mean that data should be:
• Open by default
• Intelligently open, FAIR data
• Lays out a framework of principles, responsibilities and enabling practices for how the vision of Open Data in a Big Data World can be achieved.
• Campaign for endorsements: over 150 organisations so far.
• Please consider endorsing the Accord: http://www.science-international.org/#endorse
The “Science International” Accord: principles of open data
(www.icsu.org/science-international)
Responsibilities
1-2. Scientists
3. Research institutions & universities
4. Publishers
5. Funding agencies
6. Scholarly societies and academies
7. Libraries & repositories
8. Boundaries of openness
Enabling practices
9. Citation and provenance
10. Interoperability
11. Non-restrictive re-use
12. Linkability
Framework for Regional, National and Institutional Data Strategies
National / Institutional Open Science and FAIR Data Strategy
Consultative forum, stakeholder engagement.
Open data policies and guidance at national and institutional level.
Clarify the boundaries of open (particularly privacy, IPR).
Clarify the data in scope, guidelines on selection.
Develop incentives and reward systems.
Mechanisms (infrastructure and policy) to ensure concurrent publication of data as research output.
Data ‘publication’ and citations of data included in assessment of research contribution.
Promotion of data skills:
Essential data skills for researchers.
Develop skills and competencies for data stewards, data scientists.
Framework for Regional, National and Institutional Data Strategies
Scope, roadmap and implement data infrastructure.
Key components of national and regional infrastructure (network / NREN, economies of scale for storage and compute).
Development of regional, national and institutional infrastructure(s) for research collaboration and data stewardship/RDM, generic research platforms/environments, trusted digital repositories.
Collaborative infrastructures for certain research disciplines, nationally, regionally to pool expertise and lower costs.
International infrastructure / data ecosystem components: permanent identifiers, metadata standards.
Establish African Open Data Forum / Platform
Funded Research Data Infrastructure Initiatives
Funded, co-designed transdisciplinary research projects
Co-design African Open Data Policies
Develop Incentives Frameworks
Develop Research Data Science Training
African Research Data Infrastructure Roadmap
Activities require low funding for coordination, secondment, contributions in kind and evaluation.
Activities require higher investment for coordination, co-design, implementation and evaluation.
African Open Science Platform Pilot Project Work Packages
International ‘ecosystem’ of open science components
Open Science infrastructure is not just the network, storage and compute.
Ecosystem of components which are created and governed internationally.
Reporting Research Outputs: information systems for research output reporting (CRIS), metadata standards e.g. CERIF, managed by euroCRIS.
Persistent and Unique Identifiers: DOIs for articles (CrossRef); DOIs for data sets (DataCite); author IDs (ORCID).
Data and Metadata Standards: CIF in crystallography, FITS in astronomy, DDI in social science surveys, Darwin Core in biodiversity, etc.
DCC Registry of Metadata Standards http://www.dcc.ac.uk/resources/metadata-standards ; now maintained by RDA IG http://rd-alliance.github.io/metadata-directory/
Data Repositories: listed in Re3Data, registry of data repositories: https://www.re3data.org/
Trusted Data Repositories: Core Trust Seal https://www.coretrustseal.org/, a merger of Data Seal of Approval and the World Data System criteria.
Criteria for Trustworthy Digital Archives (DIN 31644) http://www.data-archive.ac.uk/curate/trusted-digital-repositories/standards-of-trust?index=3
Audit and certification of trustworthy digital repositories (ISO 16363) http://www.data-archive.ac.uk/curate/trusted-digital-repositories/standards-of-trust?index=2
Global Registry of Data Repositories
Country coverage in Re3Data.org (registry of data repositories).
Data Seal of Approval
Location of repositories having acquired Data Seal of Approval
CODATA 2017 Session: ‘World Tour of Open Data and Open Science’
Presentations from USA, Finland, Japan, China, Australia, Canada, Israel, South Africa, Kenya.
Presentations covered: National Policies, Key Policy Players, Most Significant Projects, Barriers.
Intention is for the session to be written up as a series of surveys.
Australia: Policies
Australia Research Council (ARC)
Encourages researchers to deposit data arising from research in publicly accessible repositories.
Since 2014, requires researchers to provide a DMP as part of the application process
Australian Code for the Responsible Conduct of Research (2007) includes the proper management of research data
“Research data should be made available for use by the other researchers unless this is prevented by ethical, privacy or confidentiality matters.”
References the OECD Principles and Guidelines for Access to Research Data from Public Funding (2007)
National Health and Medical Research Council (NHMRC)
Acknowledges the importance of making data publicly accessible.
Encourages data sharing and providing access to data and other research outputs arising from NHMRC supported research.
2016 National Research Infrastructure Roadmap
Recommends Australian Research Data Cloud
Slide Credit: Jane Hunter
Australia: Key Initiatives
ANDS – Australian National Data Service
Research Data Australia https://researchdata.ands.org.au/ (currently 133539 records)
ANDS has been a major player nationally and internationally (A$72M over 3.5 years, 2009-13)
NeCTAR – National eResearch Collaborative Tools and Resources
RDS – Research Data Services
CSIRO data activities; Data61 (digital and data innovation group).
Various Data Initiatives – particularly strong in ecosystem / biodiversity studies
TERN Terrestrial Ecosystem Research Network http://www.tern.org.au/
Atlas of Living Australia https://www.ala.org.au/
IMOS, Integrated Marine Observing System, a National Collaborative Research Infrastructure
Slide Credit: Jane Hunter
Canada: Policies
Canada does not have an overarching policy per se but a patchwork of policy initiatives by important players is starting to emerge.
“Government and its information must be open by default. Simply put, it is time to shine more light on government to make sure it remains focused on the people it was created to serve – Canadians.” (PM Trudeau)
The 3 federal granting councils (NSERC, SSHRC, CIHR) have drafted a policy embracing the FAIR principles for research data and are entering a consultation process. They embraced open access to publications in 2015.
Nine provinces/territories and 55 cities have embraced open data
Federal publications are available as part of the Depository Services Program.
Some individual universities are developing data repositories and support services to assist researchers in research data management.
Open journal publishing is at an embryonic stage but is strongly supported by academic libraries.
Slide Credit: Jon Broome and Ernie Boyko
Canada: Policies
Canada’s Portage Network works with research libraries and other stakeholders to coordinate expertise, services, and technology in research data management: https://portagenetwork.ca/
Federated Research Data Repository: a scalable federated platform for digital research data management and the discovery of Canadian research data https://www.frdr.ca/repo/?locale=en
Research Data Canada: a stakeholder-driven and supported organization dedicated to improving the management of research data in Canada.
DataCite Canada: Canada's data registration service, provided by the National Research Council.
GENOME Canada
Canadian Astronomy Data Centre: thirty years of open data research
GeoGratis: an early example of open data, when Natural Resources Canada made its mapping data open for use by the public
FACETS: Canada's first multidisciplinary open access science journal, which will have a data section edited by Chuck Humphrey, Canada’s preeminent data scientist
Slide Credit: Jon Broome and Ernie Boyko
Indian National Data Sharing and Accessibility Policy
Global experience has demonstrated convincingly that access to data leads to breakthroughs in scientific understanding as well as to economic and public good, in addition to several benefits to civil society. Given the substantial investment of public funds in the collection of data and the untapped potential benefits to society, it has become important to make non-sensitive data available for legitimate and registered use.
National Data Sharing and Accessibility Policy (NDSAP), March 2012: http://www.dst.gov.in/NDSAP.pdf
Places emphasis on a negative list of sensitive data types, rather than a positive list of data to be released: i.e. the default is open, unless the data is on the ‘stop’ list.
Allows for data to be open, accessible only to registered users, or under restricted access.
Indian National Data Sharing and Accessibility Policy
Implementation Guidelines, Feb 2014: http://data.gov.in/sites/default/files/NDSAP_Implementation_Guidelines_2.2.pdf
Deals mostly with public data, but research data produced by government institutes or funded by the government are in scope.
For data to be reused, they need to be adequately described and linked to services that disseminate the data to other researchers and stakeholders. The current methods of storing data are as diverse as the disciplines that generate them. It is necessary to develop institutional repositories and data centres at domain and national levels, so that all methods of storing and sharing exist within a specific infrastructure that enables all users to access and use the data.
India: Policies and Initiatives
Policies
Government of India National Data Sharing and Accessibility Policy (NDSAP), March 2012
Followed by Implementation Guidelines, Feb 2014
The level of implementation is very unclear, owing to a lack of data infrastructure and to cultural barriers.
Key Initiatives
Lots of Open Access initiatives: e.g. in major universities; Librarians’ Digital Library (LDL)(https://drtc.isibang.ac.in); OpenMed@NIC (http://openmed.nic.in/) Open Access self-archive for medical and allied sciences.
Specific initiatives: e.g. National Mission for Manuscripts (http://namami.nic.in/) ; lots of activities around multilingual computing.
Little national infrastructure: activities around specific projects of institutions and portals?
Slide Credit: Devika Madalli
China: Policies
MOST (Ministry of Science and Technology) has spent a year drafting a regulation on open research data, which is expected to be published in 2018
Previously there have been some data sharing policies at program level, for example, NSTIC (National Science and Technology Infrastructure), CAS (Chinese Academy of Science) scientific data program.
NSFC is promoting open research outputs, mainly focusing on OA for research papers.
Slide Credit: Jianhui LI
China: Key Initiatives
National Science and Technology Infrastructure, supported by MOST (the Ministry of Science and Technology)
From an early stage (from 2001), supported the creation of 13 scientific data centres/sharing platforms covering the fields of agriculture, forestry, seismicity, meteorology, marine science, earth system, population and health, biology, chemistry, materials, energy, etc.
CAS Practice-Scientific Data Programme
Long-term mission started in 1986, funded by CAS
Many institutes involved; long-term, large-scale collaboration; data from research, for research
Collecting multi-discipline research data and promoting data sharing
More than 350 research databases and 1350 datasets by 61 institutes
Over 600TB data available to open access and download
CAS Big Data Earth programme
The strategic Priority Research Program (Category A)
Almost 1.8 billion RMB over 5 years;
Biodiversity, ecosystems, environment, natural resources, earth science
Slide Credit: Jianhui LI
RDA and Open Science
RDA is a unique global platform to synchronize between the national, European, disciplinary and sector stakeholders to support the transition towards Open Science, an Open World and Open Innovation. Our open, neutral social platform where international research data experts meet, covering 130 countries and more than 6000 individual members, facilitates the exchange of views and alignment on topics at the heart of Open Science, e.g. social hurdles on data culture, data stewardship and training challenges, data management plans and certification of data repositories to name just a few of the priorities addressed.
RDA is building the social and technical bridges that enable open sharing of data to achieve its vision of researchers and innovators openly sharing data across technologies, disciplines, and countries to address the grand challenges of society.
rd-alliance.org/about-rda | www.rd-alliance.org | @resdatall
CC BY-SA 4.0
CODATA Prospectus: https://doi.org/10.5281/zenodo.165830
Principles, Policies and Practice
Capacity Building
Frontiers of Data Science
Data Science Journal
CODATA 2017, Saint Petersburg, 8-13 Oct 2017
INTERNATIONAL DATA WEEK
IDW 2018
Gaborone, Botswana: 22–26 October 2018
Digital Frontiers of Global Science
Frontier issues for research in a global and digital age.
Applications, progress and challenges of data intensive research.
Data infrastructure and enabling practices for international and collaborative research.
Data, development and innovation: data as an interface between research, industry, government, society and development.
Botswana, Africa and the World!
Stable, safe, modern and exciting country
Simon Hodson, Executive Director, CODATA
www.codata.org | http://lists.codata.org/mailman/listinfo/codata-international_lists.codata.org
Email: [email protected] | @simonhodson99
Tel (Office): +33 1 45 25 04 96 | Tel (Cell): +33 6 86 30 42 59
CODATA (ICSU Committee on Data for Science and Technology), 5 rue Auguste Vacquerie, 75016 Paris, FRANCE
Thank you for your attention!
Slide Credits: Geoffrey Boulton, Jane Hunter, John Broome, Ernie Boyko, Devika Madalli, LI Jianhui
What Are the Barriers?
Reasons why respondents are hesitant to share their data, n=7082
What Are the Barriers in Australia to Advancing Open Data/Open Science?
• Future funding situation: long-term sustainable business models for the “Australian Research Data Cloud”
• Simple, fast, clear easy-to-use processes & services
• Data curation is expensive & time consuming
– Other priorities, no incentives
• University repositories vs national repositories (RDA) vs discipline repositories?
• Data licensing & agreements – govt & agency data
What are the key barriers in Canada to advancing OR/OS?
• The lack of a national data services strategy. There have been previous articulations of a strategy but all that exists is a patchwork.
• The tenure and promotion procedures used by research institutions do not recognize data as a research product.
• The competitive nature of obtaining grants.
• Researcher reluctance is causing Tri-Agencies to move slowly on implementing research data policies linking grants to data deposit (even if data are not shared).
• Incomplete understanding of the issues by senior university administration.
• Insufficient expertise and financial support to prepare data for reuse.
• A shortage of suitable data repositories.
DataTrieste film on Vimeo: https://vimeo.com/232209813
Göttingen-CODATA Symposium, 18-20 March 2018
The critical role of university RDM infrastructure in transforming data to knowledge
http://conference.codata.org/2018-Goettingen-RDM/
An opportunity to share experiences, research and insights in the development and implementation of RDM services in research institutions.
Special collection of Data Science Journal.
Themes: services and solutions; strategy; measuring success; skills and support; sustainability; shared services and outsourcing / consortiums; service level, trust and FAIR; champions and engaging with researchers.
Announcement of call for papers in the next week or so.
Deadline for abstracts, 15 December.