fairsharing presentation at the japan science and technology agency

38
Interlinking standards, repositories and policies Peter McQuilton, PhD @fairsharing_org NBDC Seminar, Tokyo, Japan 4 th Dec., 2017

Upload: peter-mcquilton

Post on 21-Jan-2018

95 views

Category:

Science


2 download

TRANSCRIPT

Interlinking standards, repositories and policies

Peter McQuilton, PhD

@fairsharing_org

NBDC Seminar, Tokyo, Japan 4th Dec., 2017

Biology is big data!

Credit to: ttps://projects.ac/blog/five-top-reasons-to-protect-your-data-and-practise-safe-science/ 2014

But we don’t handle data well

A set of principles, for those wishing to enhance

the value of their

data holdings

Designed and endorsed by a diverse set of stakeholders - representing academia, industry, funding

agencies, and scholarly publishers

FAIRFindable

Accessible

Interoperable

Reusable

Visible, citable

Trackable

Community standards

Reproducible

These put emphasis on enhancing the

ability of machines to automatically find

and use the data, in addition to supporting

its reuse by individuals

Most data aren’t FAIR

Most data aren’t FAIR

• Not always well cited, stored

o Software, code, workflows are hard to find/access

• Poorly described for third party reuse

o Different levels of detail and annotation

• Curation activities are perceived as time-consuming

o Collection and harmonization of detailed methods and

experimental steps is rushed at the publication stage

Not FAIR – low findability and badly documented

• Available in a public repository

• Findable through some sort of search facility

• Retrievable in a standard format

• Self-described so that third parties can make sense of it

• Intended to outlive the experiment for which they were collected

To do better science, more efficiently, we need data that are…

My database is going offline, where should I

put the data, and in what format?

Before accepting my paper, this journal

wants my data to be in a public repository, but

which one?

My funder says I should deposit the data in a reputable

repository. But which one?

I’m collecting in-vivo animal

testing data –what metadata should I curate?

I’m about to start a set of experiments. In what

format should I record the data?

A web-based, curated, and searchable portal that monitors the

development and evolution of standards*, across all disciplines,

inter-related to databases/repositories and data policies

* A standard is a formal community specification for reporting, sharing and citing data, metadata and other digital assets

Initial focus on metadata (or content) standards

Content standards

Models/Formats = Conceptual

model, conceptual schema,

exchange formats

Terminologies = Controlled

vocabularies, taxonomies,

thesauri, ontologies etc.

Guidelines = Minimum information

reporting requirements, checklists

Formats Terminologies Guidelines

Formats Terminologies Guidelines

FAIRsharing enhances their findability

240+

119+

709+

Source:

Sources:

MIAME

MIRIAM

MIQASMIX

MIGEN

ARRIVEMIAPE

MIASE

MIQE

MISFISHIE….

REMARK

CONSORT

SRAxml

SOFT FASTA

DICOM

MzML

SBRML

SEDML…

GELML

ISA

CML

MITAB

AAO

CHEBIOBI

PATO ENVO

MOD

BTO

IDO…

TEDDY

PRO

XAO

DO

VO

~1500

Source:

Content standards

Data policies by funders, journals and other organizations

Databases/Repositories

Formats Terminologies Guidelines

Mapping a complex and evolving landscape

270

4823

2

97

87 4

204

9 6 8

Paper in preparation, preliminary information as of July 2017

Ready for use, implementation, or recommendation

In development

Status uncertain

Deprecated as subsumed or superseded

All records are manually curated

in-house and verified by the

community behind each resource

Community verified status indicators

My funder’s data policy recommends the use of established standards, but which are widely endorsed and applicable to my crop data?

We need a standard for sharing social science data, what’s out there and who should we talk to?

I have some old rice genomic data in format X, which is now deprecated; what format has replaced X?

Which are the mature standards and standards-compliant databases that we should recommend to our authors?

Finding and Accessing the data

Collections group together

one or more types of

resource by domain,

project or organization.

Recommendations are a

core-set of resources that

are selected and

recommended by a funder

or journal data policy.

Grouping the data

Data Policy

Visualizing the relationships between data…

Dr Massimiliano Izzo

How do we make FAIRsharing FAIR?

A pan-European infrastructure for biological information

€19 million 2015 - 2019

Making FAIRsharing FAIR –Findable - ELIXIR

Making FAIRsharing FAIR –Findable - Bioschemas• Web mark-up – Schema.org and Bioschema.org

scscscsc/BioSchemas/specifications/tree/master/DataCatalog

Consortium of 33 pan-European organisations & 15 third parties covering a range of disciplines and organisations working together to develop a European-wide

governance framework for a pan-European “trusted virtual environment with free, open and seamless services for data storage, management, analysis, sharing and re-

use, across disciplines”

European Open Science Cloud (EOSC) Pilot

Wider adoption of FAIRsharing by many biomedical research infrastructure programmes in EU and USA, e.g.

Embeddable Widget

• Recommendation/Collection Widget for embedding in third-party websites• Journal data policies (GigaScience, PLOS, Springer

Nature…)

• Standard Developing Organisations (e.g. TDWG)

• Societies/Organisations (e.g. ELIXIR)

Dr Massimiliano Izzo

FAIR - Interoperability/Accessibility

• Data annotation:• Users/Maintainers – ORCID

• Organisations – FundRef

• Species – NCBI Taxon ontology

• Disciplines and Domains – re3data/EDAM/BRO

• API – swagger (ELIXIR guidelines)

• DOIs for standards (coming soon)

Reaching out to the community - What were the aims of the RDA /Force BioSharing WG?

• To develop guidelines for linking information on databases, content standards and journal and funder data policies in the life sciences

• To develop a curated registry (running since 2011),to access and cross-search this information, suchthat a variety of stakeholders can make decisionson which standards and databases to use orendorse

Standard developing groups, incl:Journal publishers, incl:

Cross-links, data exchange, incl:

Societies and organisations, incl: Institutional RDM services, incl:

Projects, programmes:

Working with and for the community

OBO

The FAIRsharing team Our Advisory Board

Thank-you for listening.Questions?