FAIR digital research assets: beyond the acronym
Susanna-Assunta Sansone, PhD (@SusannaASansone)
ORCiD 0000-0001-5306-5690
Consultant, Founding Academic Editor
Associate Director, Principal Investigator
Neuroinformatics, Kuala Lumpur, 20-21 August 2017
To do better science, more efficiently, we need data that are…
• Available in a public repository
• Findable through some sort of search facility
• Retrievable in a standard format
• Self-described so that third parties can make sense of it
• Intended to outlive the experiment for which they were collected
A set of principles, for those wishing to enhance the value of their data holdings
Wider adoption of the FAIR principles, by research infrastructure programmes, e.g.
Defining FAIRness
Defining a framework for evaluating FAIRness
By the fairmetrics.org Working Group
NOTE: The Principles are high-level; they do not suggest any specific technology, standard, or implementation-solution
Principles put emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals
Interoperability standards – the pillars of FAIR
The invisible machinery
• Identifiers and metadata to be implemented by technical experts in tools, registries, catalogues, databases, services
• It is essential to make standards ‘invisible’ to lay users, who often have little or no familiarity with them
http://nometadata.org/logo
Metadata standards – fundamentals
• Descriptors for a digital object that help to understand what it is, where to find it, how to access it etc.
• The type of metadata depends also on the type of digital object (e.g. software, dataset)
• The depth and breadth of metadata vary according to their purpose
  - e.g. reproducibility requires richer metadata than citation
Metadata standards - datasets
• Domain-level descriptors that are essential for interpretation, verification and reproducibility of datasets
• The depth and breadth of descriptors vary according to the domain, broadly covering the what, who, when, how and why, allowing:
  - experimental components (e.g., design, conditions, parameters),
  - fundamental biological entities (e.g., samples, genes, cells),
  - complex concepts (such as bioprocesses, tissues and diseases),
  - analytical processes and the mathematical models, and
  - their instantiation in computational simulations (from the molecular level through to whole populations of individuals)
  to be harmonized with respect to structure, format and annotation
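As an illustration of the "what, who, when, how and why" descriptors above, here is a minimal sketch in Python. It assumes a hypothetical `DatasetDescriptor` record; the field names, the example values and the `EXAMPLE:0000001` term identifier are placeholders invented for this sketch and do not come from any particular metadata standard.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DatasetDescriptor:
    """Hypothetical minimal descriptor; field names are illustrative only."""
    what: str   # the entities studied, e.g. samples, genes, cells
    who: str    # the people or organisation responsible
    when: str   # date of collection, written as an ISO 8601 string
    how: str    # the experimental design and technology used
    why: str    # the purpose of the study
    # Free-text label -> terminology identifier (placeholder IDs here).
    annotations: dict = field(default_factory=dict)


def missing_fields(descriptor):
    """Return the names of core descriptors that were left empty."""
    core = ("what", "who", "when", "how", "why")
    return [name for name in core if not getattr(descriptor, name).strip()]


record = DatasetDescriptor(
    what="brain tissue samples",
    who="Example Lab",
    when="2017-08-20",
    how="mass spectrometry",
    why="",  # deliberately left empty to show the check below
    annotations={"brain": "EXAMPLE:0000001"},
)

# Serialise to a standard format so third parties (and machines) can read it.
print(json.dumps(asdict(record), indent=2))
print(missing_fields(record))
```

A real implementation would take its required fields and term identifiers from the relevant community guideline, format and terminology rather than from a hand-written class.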
Metadata standards - datasets
Metadata for discovery
model and related formats
Metadata for discovery, but not only
…..
Domain-specific metadata standards for datasets
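Guidelines: MIAME, MIRIAM, MIQAS, MIX, MIGEN, ARRIVE, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT, …
Formats: SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML, GELML, ISA, CML, MITAB, …
Terminologies: AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO, …
Developed by de jure standard organizations and de facto grass-roots groups
Counts shown on the slide: 220+, 115+, 548+, ~1000 (categories: Formats, Terminologies, Guidelines)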
https://doi.org/10.6084/m9.figshare.3795816.v2
https://doi.org/10.6084/m9.figshare.4055496.v1
A complex landscape
• Perspective and focus vary, ranging:
  - from standards with a specific biological or clinical domain of study (e.g. neuroscience) or significance (e.g. model processes)
  - to the technology used (e.g. imaging modality)
• Motivation is different, spanning:
  - creation of new standards (to fill a gap)
  - mapping and harmonization of complementary or contrasting efforts
  - extensions and repurposing of existing standards
• Stakeholders are diverse, including:
  - those involved in managing, serving, curating, preserving, publishing or regulating data and/or other digital objects
  - academia, industry, governmental sectors, and funding agencies
  - producers but also consumers of the standards, as domain (and not just technical) expertise is a must
Standards’ life cycle
• Formulation
  - use cases, scope, prioritization and expertise
• Development
  - iterations, tests, feedback and evaluation
  - harmonization of different perspectives and available options
• Maintenance
  - (exemplar) implementations, technical documentation, education material, metrics
  - sustainability, evolution (versions) and conversion modules
Technologically-delineated views of the world: transcriptomics, proteomics, metabolomics (arrays & scanning, columns, gels, MS, FTIR, NMR, …)
Biologically-delineated views of the world: plant biology, epidemiology, neuroscience
Generic features (‘common core’)
  - description of source biomaterial
  - experimental design components
Fragmentation, duplications and gaps
[Figure: technology modules (arrays & scanning, columns, gels, MS, FTIR, NMR, …) across transcriptomics, proteomics and metabolomics]
Modularization to combine and validate
Domains: plant biology, epidemiology, neuroscience
• Proteomics-based investigations of neurodegenerative diseases
• Proteomics and metabolomics-based investigations of neurodegenerative diseases
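The idea of combining modules and validating against the result can be sketched as follows. This is only an illustration under stated assumptions: the module names mirror the slide, but the required fields (`separation_method`, `brain_region` and so on) and the `combine` and `validate` helpers are hypothetical and are not taken from any published checklist.

```python
# Hypothetical checklist modules: each lists the fields it requires.
COMMON_CORE = {"source_biomaterial", "experimental_design"}
MODULES = {
    "proteomics": {"separation_method", "mass_spectrometer"},
    "metabolomics": {"extraction_protocol", "nmr_or_ms_settings"},
    "neuroscience": {"brain_region", "disease_stage"},
}


def combine(*module_names):
    """Union of the common core and the requested modules' required fields."""
    required = set(COMMON_CORE)
    for name in module_names:
        required |= MODULES[name]
    return required


def validate(record, required):
    """Return the sorted list of required fields missing or empty in the record."""
    return sorted(f for f in required if not record.get(f))


# A proteomics-based investigation of a neurodegenerative disease.
checklist = combine("proteomics", "neuroscience")
record = {
    "source_biomaterial": "post-mortem cortex",
    "experimental_design": "case-control",
    "separation_method": "2D gel",
    "mass_spectrometer": "",  # present but empty, so reported as missing
    "brain_region": "frontal cortex",
}
print(validate(record, checklist))
```

Adding metabolomics to the same investigation is then a matter of calling `combine("proteomics", "metabolomics", "neuroscience")` instead of writing a new checklist from scratch.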
Working in/across multiple domains is challenging
• Requires
  - Mapping between/among heterogeneous representations
  - Conceptual modelling framework to encompass the domain-specific metadata standards
  - Tools to handle customizable annotation, multiple conversions and validation
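A minimal sketch of what mapping between heterogeneous representations can look like in practice is given here. The two source layouts (`format_a`, `format_b`), their field names and the shared model are all assumptions made up for the example; real mappings would be derived from the standards involved.

```python
# Hypothetical field mappings from two source layouts to one shared model.
MAPPINGS = {
    "format_a": {"organism": "species", "tissue": "organ", "assay": "technology"},
    "format_b": {"taxon": "species", "anatomy": "organ", "platform": "technology"},
}


def to_common(record, source_format):
    """Rename known fields to the shared model; keep unknown ones separately."""
    mapping = MAPPINGS[source_format]
    common, unmapped = {}, {}
    for key, value in record.items():
        if key in mapping:
            common[mapping[key]] = value
        else:
            unmapped[key] = value
    return common, unmapped


a = {"organism": "Homo sapiens", "tissue": "brain", "assay": "MS"}
b = {"taxon": "Homo sapiens", "anatomy": "brain", "platform": "MS", "batch": "7"}

common_a, _ = to_common(a, "format_a")
common_b, left_over = to_common(b, "format_b")
print(common_a == common_b)  # both records describe the same thing
print(left_over)             # fields with no counterpart in the shared model
```

Returning the unmapped fields rather than dropping them makes gaps in the shared model visible, which is where extensions or harmonization work would be needed.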
Technical and social engineering required
• Pain points include
  - Fragmentation
  - Coordination, harmonization, extensions
  - Credit, incentives for contributors
  - Governance, ownership
  - Indicators and evaluation methods
  - Outreach and engagement with all stakeholders
  - Synergies between basic and clinical/medical areas
  - Implementations: infrastructures, tools, services
  - Education, documentation and training
  - Funding streams
  - Business models for sustainability
Too many cooks in the standards’ kitchen?
Standards fusion… anyone?
doi: 10.1126/science.1180598
doi: 10.1038/nbt1346
OBO Portal and Foundry, doi: 10.1038/nbt.1411
Doing my fair share
Improving discoverability of standards
• Consumers:
  - How do I find the standards appropriate for my case?
• Producers:
  - How do I make my standards visible to others?
Monitors the development and evolution of standards, their use in databases and the adoption of both in data policies, to inform and educate the user community
Working with and for producers and consumers
• Standard developing groups
• Journals, publishers
• Cross-links, data exchange
• Societies and organisations
• Institutional RDM services
• Projects, programmes
Interlink standards among themselves and with repositories
• Metadata standards (Formats, Terminologies, Guidelines)
• Databases/data repositories
…and to indicate ‘adoption’
• Metadata standards (Formats, Terminologies, Guidelines)
• Databases/data repositories
• Data policies by funders, journals and other organizations
[Figure: counts of relationships among standards, databases and policies]
Assign ‘indicators’ to describe their status…
Paper in preparation, preliminary information as of July 2017
• Ready for use, implementation, or recommendation
• In development
• Status uncertain
• Deprecated as subsumed or superseded
All records are manually curated in-house and verified by the community behind each resource
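The four status indicators above lend themselves to a small enumeration. The sketch below is an assumption about how such indicators could be modelled, not a description of the registry's actual implementation; the record names ("Standard A" and so on) and the `superseded_by` field are placeholders.

```python
from enum import Enum


class Status(Enum):
    """The four indicators listed on the slide."""
    READY = "Ready for use, implementation, or recommendation"
    IN_DEVELOPMENT = "In development"
    UNCERTAIN = "Status uncertain"
    DEPRECATED = "Deprecated as subsumed or superseded"


# Hypothetical records; names and statuses are placeholders, not registry data.
records = [
    {"name": "Standard A", "status": Status.READY},
    {"name": "Standard B", "status": Status.IN_DEVELOPMENT},
    {"name": "Standard C", "status": Status.DEPRECATED, "superseded_by": "Standard A"},
]


def recommendable(items):
    """Keep only the records whose indicator marks them as ready."""
    return [r["name"] for r in items if r["status"] is Status.READY]


print(recommendable(records))
```

A consumer choosing a standard could filter on `Status.READY`, while a deprecated record can still point to whatever replaced it.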
Help us map the neuroscience standards landscape
Activating the decision-making chain
Paper in preparation, preliminary information as of July 2017

Journal Recommendations: number of standards recommended by 68 journals/publishers policies (the top one)
• Models/Formats: 6 out of 223 (ISA-Tab)
• Reporting Guidelines: 26 out of 118 (MIAME)
• Terminology Artifacts: 8 out of 343 (NCBI Tax)

Database Implementations: number of standards implemented by 544 databases/repositories (the top one)
• Models/Formats: 146 out of 223 (FASTA)
• Reporting Guidelines: 59 out of 116 (MIAME)
• Terminology Artifacts: 121 out of 343 (GO)
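Figures of the form "N out of M (top one)" can be derived from links between adopters and standards. The following is a minimal sketch under stated assumptions: the `links` list, the adopter names and the standard names (`FormatX`, `FormatY`, `FormatZ`) are invented placeholders, and `adoption_summary` is a hypothetical helper, not part of any registry API.

```python
from collections import Counter

# Hypothetical links: (database or policy, standard it implements or recommends).
# All names are placeholders, not registry content.
links = [
    ("db1", "FormatX"), ("db2", "FormatX"), ("db3", "FormatY"),
    ("policy1", "FormatY"), ("db2", "FormatY"), ("db4", "FormatY"),
]
all_standards = {"FormatX", "FormatY", "FormatZ"}


def adoption_summary(pairs, standards):
    """Return (number adopted, total number, most adopted standard)."""
    # Use a set so a repeated link is not counted twice.
    counts = Counter(std for _, std in set(pairs) if std in standards)
    top, _ = counts.most_common(1)[0] if counts else (None, 0)
    return len(counts), len(standards), top


adopted, total, top = adoption_summary(links, all_standards)
print(f"{adopted} out of {total} ({top})")
```

With the placeholder data this reports that two of the three standards have at least one adopter and that `FormatY` is the most adopted.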
Philippe Rocca-Serra, PhD, Senior Research Lecturer
Alejandra Gonzalez-Beltran, PhD, Research Lecturer
Milo Thurston, DPhD, Research Software Engineer
Massimiliano Izzo, PhD, Research Software Engineer
Peter McQuilton, PhD, Knowledge Engineer
Allyson Lister, PhD, Knowledge Engineer
Eamonn Maguire, DPhil, Contractor
David Johnson, PhD, Research Software Engineer
Melanie Adekale, PhD, Biocurator Contractor
Delphine Dauga, PhD, Biocurator Contractor
Susanna-Assunta Sansone, PhD, Principal Investigator, Associate Director
The (long) road to FAIR
Interoperability standards are digital objects in their own right, with their associated research, development and educational activities