FAIRy stories
for Christmas
Carole Goble
The University of Manchester, [email protected]
ELIXIR-UK, FAIRDOM, ISBE, BioExcel CoE, Software Sustainability Institute, Open PHACTS
SWAT4HCLS 2017, 5th Dec 2017, Rome
Once upon a time in a land far, far away lived a King …
Who wanted all data to be FAIR….
Mark D. Wilkinson, Michel Dumontier,
IJsbrand Jan Aalbersberg, Gabrielle Appleton,
Myles Axton, Arie Baak,
Niklas Blomberg, Jan-Willem Boiten,
Luiz Bonino da Silva Santos, Philip E. Bourne,
Jildau Bouwman, Anthony J. Brookes,
Tim Clark, Mercè Crosas,
Ingrid Dillo, Olivier Dumon, Scott Edmunds,
Chris T. Evelo, Richard Finkers,
Alejandra Gonzalez-Beltran, Alasdair J.G. Gray,
Paul Groth, Carole Goble,
Jeffrey S. Grethe, Jaap Heringa,
Peter A.C ’t Hoen, Rob Hooft,
Tobias Kuhn, Ruben Kok,
Joost Kok, Scott J. Lusher,
Maryann E. Martone, Albert Mons,
Abel L. Packer, Bengt Persson,
Philippe Rocca-Serra, Marco Roos,
Rene van Schaik, Susanna-Assunta Sansone,
Erik Schultes, Thierry Sengstag,
Ted Slater, George Strawn,
Morris A. Swertz, Mark Thompson,
Johan van der Lei, Erik van Mulligen,
Jan Velterop, Andra Waagmeester,
Peter Wittenburg, Katherine Wolstencroft,
Jun Zhao, Barend Mons
Wilkinson Dumontier Schultes
Scientific Data 3, 160018 (2016) doi:10.1038/sdata.2016.18
Queens…
And FAIRY GODMOTHERS
Machine Processable Metadata
• Catalogues, Search, Stores
• Metadata Standards
• Standard Access protocols
• Identifiers, Policies
• Authorised Access
• Licensing
FAIR spread across the lands ……
VIVO/SciTS Conferences 6-8 August 2014, Austin, TX
FAIR spread across the lands ……
Stakeholder FAIR Awareness
UK Institutional Research Data Management guidance*
* Jisc: Final Report FAIR in Practice, Nov 2017
Government, Funder, Publisher, National & International Infrastructures…
Institutional
Researchers
FAIR spread across the lands …… BUT not
necessarily all the peoples
FAIR spread across the lands ……
Moral: Names are important
Spinning (metadata) straw into gold
Be careful what you promise…
Me Too!
staking claims
we { are | will be | always have been } FAIR
a rallying flag
Hype
Curve
http://dx.doi.org/10.1101/225490
http://blog.ukdataservice.ac.uk/fair-data-assessment-tool/
http://fairmetrics.org/
Beware…
beauty is in the
eye of the
beholder
What’s FAIR from a cataloguer’s perspective may be useless from a biologist’s viewpoint
My Semantic FAIRy Stories
The Scientist and
the FAIR Commons
The MAGIC
Research Object
little semantics and
the big Web
The Scientists and the FAIR Research Commons
Supporting mixed types and many researchers
FAIR
The Scientists and the
FAIR Research
Commons
Find: ID resolution, Faceted Navigation, Search, RDF, SPARQL endpoint, APIs
A Commons for Workflows
myexperiment.org
A Commons for Systems Biology Projects
fairdomhub.org
investigation
study
assay/analysis
data
models
SOPs
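The investigation / study / assay hierarchy with attached data, models and SOPs can be pictured as a small nested data structure. The sketch below is only an illustration of that shape: the class names, field names and example titles are assumptions made here, not the FAIRDOMHub or ISA data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Asset:
    """A data file, model or SOP attached to an assay (hypothetical class)."""
    kind: str   # "data", "model" or "sop"
    title: str


@dataclass
class Assay:
    title: str
    assets: List[Asset] = field(default_factory=list)


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)

    def assets_of_kind(self, kind: str) -> List[Asset]:
        """Walk investigation -> study -> assay and collect matching assets."""
        return [
            asset
            for study in self.studies
            for assay in study.assays
            for asset in assay.assets
            if asset.kind == kind
        ]


# Invented example content, used only to show the nesting.
inv = Investigation(
    "Example investigation",
    [Study("Example study", [Assay("Example assay", [
        Asset("data", "measurements.csv"),
        Asset("model", "model.xml"),
        Asset("sop", "protocol.pdf"),
    ])])],
)
print([a.title for a in inv.assets_of_kind("model")])
```

Keeping mixed asset types under one assay is what lets a single structure organise data, models and SOPs side by side.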
Community & Project Commons
Structured organisation across standards and types
Federation over autonomous resources
Laissez-Faire
Independent Users
Ecosystem of types, stores and metadata
Own little houses: from straw to bricks
Permission controls
Staged sharing
Licenses
Negotiated access
Embargos
Open
Schema
Dublin Core, DataCite, DCAT, Bioschemas
Catalogue Level
Investigation, Studies, Assay/Analysis
Content level
Persistent Identifiers
Content level
subject thematic standards
Content level
Stratified Linked Data
Getting the best FAIR metadata….
FAIR Access
– myExperiment -> open
– FAIRDOM -> friends and family
– Hand over straw houses to FAIRDOMHub
“The Tragedy of the Commons”*
– Metadata quality and quantity
– Identifier hygiene
– Curation & contributions
– Public good vs personal burden
– Incorporation into processes
– Community socialisation - obligations mismatches. Credit!
*Mark Musen, https://ncip.nci.nih.gov/blog/face-new-tragedy-commons-remedy-better-metadata/
project PIs, funders
time burden, distrust
project PIs, funders
PALs – juniors, advocates and Cinderellas
templates, tools
benefit
Moral: Incentives
Bake in
“Semantic Nudging”
Ontologies stealthily embedded in Excel spreadsheet templates
Added value - Model execution
Vanity, guilt, shaming
Automation
rightfield.org.uk
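One way to picture "semantic nudging" is a template whose columns only accept terms from a controlled list, with each accepted label silently mapped to an identifier. The sketch below works on CSV text rather than Excel, and it is not how RightField is implemented; the term list, the `example.org` IRIs and the function name are all hypothetical.

```python
import csv
import io

# Hypothetical controlled vocabulary: column name -> {label: term IRI}.
COLUMN_TERMS = {
    "organism": {
        "Escherichia coli": "http://example.org/terms/ecoli",
        "Saccharomyces cerevisiae": "http://example.org/terms/yeast",
    }
}


def annotate_rows(csv_text, column_terms):
    """Attach term IRIs to recognised values and report unrecognised ones."""
    reader = csv.DictReader(io.StringIO(csv_text))
    annotated, problems = [], []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        out = dict(row)
        for column, terms in column_terms.items():
            value = (row.get(column) or "").strip()
            if value in terms:
                out[column + "_iri"] = terms[value]
            else:
                problems.append((line_no, column, value))
        annotated.append(out)
    return annotated, problems


SHEET = "sample,organism\ns1,Escherichia coli\ns2,E. coli K12\n"
rows, issues = annotate_rows(SHEET, COLUMN_TERMS)
print(rows[0]["organism_iri"])
print(issues)
```

The researcher only ever sees familiar labels in a familiar spreadsheet; the ontology terms ride along underneath.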
Cinderella?
The Spreadsheet
“The Last Mile”* -> The First Mile
FAIR from bench to cloud
Last mile - Infrastructure view
First mile - researcher / resource view
* Dimitrios Koureas et al. Community engagement: The ‘last mile’ challenge for European research e-infrastructures. Research Ideas and Outcomes 2: e9933 (20 Jul 2016) https://doi.org/10.3897/rio.2.e9933
the generic vs specific zig zag path
The MAGIC Research OBJECT
GENERIC Framework For exchange, reproducibility,
Preservation, active artefacts
Universal Catering, bottomless content
FAIR
The FAIR Research Object import, exchange, portability, maintenance
ISA-TAB
Bergman et al COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
workflow engine
Workflow Run Provenance
Inputs, Outputs
Intermediates, Parameters, Configs
Narrative
Exchange: between people & platforms
Commons: store, catalogue & archive
Reproduce: preserve, port, repair
Activate: re-compute, mix, compare, evolve
The FAIR Workflow Research Object
researchobject.org
Bechhofer et al (2013) Why linked data is not enough for scientists, https://doi.org/10.1016/j.future.2011.08.004
Bechhofer et al (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/
Standards-based generic
metadata framework for
bundling internal and external
resources with context
citable reproducible packaging
Data used and results produced in study
Methods employed to produce/analyse data
Provenance and settings for the experiments
People involved in the investigation
Annotations about these resources: understanding & interpretation
Linking across ROs and into the Linked Open Data Cloud
• Recording & linking together the components of an experiment
• Linking across experiments.
• Linked ROs
• A Semantic Web of Research Objects
• Resource References – a bottomless pot
Technology Independent.
The least possible. The simplest feasible. Low tech.
Low user overhead and thin client
Graceful degradation.
FAIR ROs Desiderata
Construction Content Profile Types
Identification: to locate things
Aggregates: to link things together
Annotations: about things & their relationships
Type Checklists: what should be there
Provenance: where it came from
Versioning: its evolution
Dependencies: what else is needed
Manifest checklist
Type Checklists: describing what should be there
Container
Metadata
Objects
Construction
http://www.researchobject.org/specifications/
RO Model
Identifiers: URI, RRI, DOI, ORCID
W3C Web Annotation Vocabulary
Open Archives Initiative Object Reuse and Exchange
Aggregation
Annotation
Container
Content
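The aggregation and annotation parts of the model can be sketched as a small JSON manifest: a list of aggregated resources (internal paths or external URIs) plus annotations that say which resource they are about. The layout below is loosely modelled on the Research Object Bundle manifest, but the exact key names and context URL should be treated as assumptions and checked against the specifications at researchobject.org; the file names are invented.

```python
import json


def build_manifest(aggregated, annotations):
    """Build a manifest dict: what is aggregated, and what is said about it.

    aggregated  -- list of paths or URIs bundled by the Research Object
    annotations -- list of (about, content) pairs linking a resource to a
                   file or URI that describes it
    """
    return {
        # Assumed context URL; verify against the published specification.
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        "aggregates": [{"uri": uri} for uri in aggregated],
        "annotations": [
            {"about": about, "content": content}
            for about, content in annotations
        ],
    }


manifest = build_manifest(
    aggregated=[
        "/data/results.csv",                      # internal resource
        "https://doi.org/10.1038/sdata.2016.18",  # external resource
    ],
    annotations=[("/data/results.csv", "annotations/results-provenance.ttl")],
)
print(json.dumps(manifest, indent=2))
```

Because external resources are aggregated by reference, the container stays small even when the things it points to are not.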
Profiles. Progression Levels
Container
Profile
http://purl.org/minim/description
W3C Shape Specs
*Gamble, Zhao, Klyne, Goble. "MIM: A Minimum Information Model Vocabulary and Framework for Scientific Linked Data", IEEE eScience 2012, Chicago, USA, October 2012, http://dx.doi.org/10.1109/eScience.2012.6404489
validators / viewers
Minim model for defining checklists*
multiple profiles for different consumers
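The idea of checklists with multiple profiles can be illustrated with a toy validator: the same Research Object contents are judged against different lists of requirements depending on who is consuming it. This is a plain-Python sketch, not the Minim vocabulary (which is expressed in RDF); the profile names, requirement labels and file names are invented for illustration.

```python
# Hypothetical Research Object contents (what the container aggregates).
aggregated = ["/workflow.cwl", "/data/input.csv", "/README.md"]


def has_suffix(suffix):
    """Return a test that passes if any aggregated URI ends with suffix."""
    return lambda uris: any(u.endswith(suffix) for u in uris)


# Each profile is a list of (level, label, test) entries. The MUST / SHOULD /
# MAY levels are a common convention; the entries themselves are made up.
ARCHIVIST_PROFILE = [
    ("MUST", "a README", has_suffix("README.md")),
    ("SHOULD", "a licence file", has_suffix("LICENSE")),
]
RERUNNER_PROFILE = [
    ("MUST", "a workflow definition", has_suffix(".cwl")),
    ("MUST", "input data", has_suffix(".csv")),
    ("SHOULD", "provenance of a previous run", has_suffix(".prov.jsonld")),
]


def evaluate(uris, profile):
    """Return (satisfied, missing) where missing groups labels by level."""
    missing = {"MUST": [], "SHOULD": [], "MAY": []}
    for level, label, test in profile:
        if not test(uris):
            missing[level].append(label)
    # Only unmet MUST requirements make the checklist fail.
    return (not missing["MUST"]), missing


for name, profile in [("archivist", ARCHIVIST_PROFILE),
                      ("re-runner", RERUNNER_PROFILE)]:
    ok, missing = evaluate(aggregated, profile)
    print(name, "satisfied" if ok else "not satisfied", missing)
```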
Generic
Specifics
RO-SHOW
Container
Linked Data Pharmacological Discovery Platform Data Releases
Dataset “build”
RO Library
Earth Sciences
Public Health Learning Systems
Asthma Research e-Lab sharing and computing statistical cohort studies
Happy Endings!
ISA based Packaging, Systems Biology commons & publishing
Managing distributed unmovable large datasets for Biomedical HTS analytic pipelines *
* Chard et al I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets, https://doi.org/10.1109/BigData.2016.7840618
Happy Ending – Workflows
Biomedical HTS analytic pipelines
Manifest description of CWL workflows + rich context + provenance +other objects + snapshots
Precision medicine NGS pipelines regulation*
*Alterovitz, Dean II, Goble, Crusoe, Soiland-Reyes et al Enabling Precision Medicine via standard communication of NGS provenance, analysis, and results, biorxiv.org, 2017, https://doi.org/10.1101/191783
EDAM
Biomolecular modelling
Portable Workflows
BagIT, JSON(-LD), schema.org
https://dokie.li/
https://linkedresearch.org/
Manifest: Schema.org, JSON-LD, RDF
Archive: .tar.gz
Reproducible Document Stack project
eLife, Substance and Stencila
BagIT data profile + schema.org JSON-LD annotations
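For readers unfamiliar with BagIt, a bag is essentially a directory with a version declaration, a `data/` payload folder and a checksum manifest. The sketch below writes and re-verifies such a layout using only the Python standard library; it is a minimal illustration rather than a complete BagIt implementation, it omits the schema.org JSON-LD annotations mentioned above, and the helper names and payload are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir: Path, payload: dict) -> Path:
    """Write payload files (flat name -> bytes) into a minimal bag layout."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)
    manifest_lines = []
    for name, content in sorted(payload.items()):
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")
    # Bag declaration: version and tag-file encoding.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    # Payload manifest: one "<checksum>  <path>" line per payload file.
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


def verify_bag(bag_dir: Path) -> bool:
    """Recompute each payload checksum and compare it with the manifest."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    bag = make_bag(Path(tmp) / "example-bag", {"results.csv": b"a,b\n1,2\n"})
    print(sorted(p.name for p in bag.iterdir()))
    print(verify_bag(bag))
```

The checksum manifest is what makes the package portable: a receiver can confirm the payload arrived intact without trusting the transport.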
Many Roads
Morals
Incremental, open frameworks hard work
– Extensive reuse of standards is tricky
– Too Generic vs Too Specific
– Multi-element type & nesting challenges
– ROs with a Purpose
– Examples & templates
Representational Beauty vs Tools
– Easy to make, hard to consume
– Be specific, be developer friendly
– Profiles & tools critical
Patience is a virtue
Bioschemas:
Little Semantics and
the big web
Being and keeping light,
small and viral
FAIR
Structured data markup for web pages
Schema.org adds simple structured metadata markup to web pages & sitemaps for harvesting, search and summary snippet making.
Search engines often highlight websites containing Schema.org
Widespread commercial and open source infrastructure creates a low barrier to adoption
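As a concrete picture of what such markup looks like, the sketch below generates a JSON-LD `<script>` block describing a dataset with a handful of Schema.org properties. The property choice, the function name and the example values are assumptions made for illustration; they are not a Bioschemas profile.

```python
import json


def dataset_markup(name, description, url, keywords):
    """Return an HTML snippet embedding Schema.org Dataset metadata."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
    }
    # The snippet is pasted into the page's HTML; crawlers read it from there.
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(doc, indent=2)
        + "\n</script>"
    )


# Invented example values.
snippet = dataset_markup(
    name="Example expression dataset",
    description="Illustrative record used to show the markup shape.",
    url="https://example.org/datasets/1",
    keywords=["example", "FAIR"],
)
print(snippet)
```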
Goldilocks & the 3 Use Cases
Standardised metadata mark-up
Metadata published & harvested without APIs or special feeds
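Harvesting without an API amounts to fetching a page and reading the structured metadata embedded in it. The sketch below pulls JSON-LD blocks out of an HTML string with the standard library parser; the page content is invented, and a real harvester would also need to fetch pages, follow sitemaps and handle malformed markup.

```python
import json
from html.parser import HTMLParser


class JsonLdHarvester(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.items.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def harvest(html):
    """Return every JSON-LD object embedded in the given HTML text."""
    parser = JsonLdHarvester()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical page carrying Schema.org Event markup.
PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Event",
 "name": "Example training course", "startDate": "2017-12-05"}
</script>
</head><body><h1>Example training course</h1></body></html>"""

for item in harvest(PAGE):
    print(item["@type"], item["name"])
```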
3 Use Cases
1. Finding/Citing
2. Summary snippets
3. Metadata exchange / ingest
Goldilocks
• Reuse ubiquitous commercial platform
• The least possible change, the max possible reuse
• Minimum properties – 6
• Reuse domain ontologies – we are not reinventing them!
Commodity Off the Shelf tools
App eco-system
Repository Level
Content type level
Training materials
Events
Organizations
Data
Software
Lab Protocols
schema.org tailored to the Biosciences for FAIR
simple structured metadata markup on web pages & sitemaps
bio.tools
schema.org tailored to the Biosciences
simple structured metadata markup on web pages & sitemaps
• Specific for life sciences
• Extends existing Schema.org types
• Focused on few types and well defined relationships
• Minimum properties for finding and accessing data
• Best practices for selected properties
• Managed by Bioschemas.org
• Generic data model
• Generous list of properties to describe data types
• Managed by Schema.org
Tailored schema.org to improve Findability and Accessibility in Bioscience
Layer of constraints + documentation + extensions
Leyla Garcia. Poster & Flashtalk
2-3 Oct 2017, Hinxton, ~50 people
Ideally 6 concepts
Reuse ontologies
schema.org
Real mark-up
Tools
Find, Cite, Snippets, Metadata exchange
Community
http://www.france-bioinformatique.fr/en/training_material
https://search.google.com/structured-data/testing-tool
Applied Drupal 7 schema.org extension
Took about 2 hours
Included in TeSS in an hour [Niall Beard]
MORALs
Community Buy-in Worth it
• First specs & main mechanism for training
• Google / Schema & ELIXIR support
• Research Schemas for European Open Science Cloud pilot
Goldilocks works but is hard work
• Types & Profiles debates
• Elegance vs best for tools
• Reuse domain ontologies
• Validation, mark-up & harvesting tools
Trolls
How are we FAIRing?
Different levels with different emphasis
It’s an Ecosystem, not a single solution
• Catalogues, Search, Stores
• Metadata Standards
• Standard Access protocols
• Identifiers, Policies
• Authorised Access
• Licensing
smart rebrand launch
Still hard, same stuff
Rally big communities and grassroots initiatives
Examine our capabilities
There is no magic
FAIRy Land PEST
Political
Economic
Social
Technical
Platform & user buy-in from the get-go
Passionate, dedicated leadership
Seeding critical mass
Community
Tools Driver
Bottom-up initiatives fostered by big umbrella infrastructures
FAIR Semantic Village*
Simple & Lightweight
Ramps not revolutions
FAIR with a PURPOSE & With PEOPLE
FAIR
Support typical developer – Familiarity – JSON, APIs
*Deb McGuinness
Research for FAIR
FAIR representation
• The Semantic Web
Automated metadata
• Deep learning, machine learning, AI
• Text Mining, Ontology mapping
Social metadata
• User Experience, Crowd Sourcing
• Choice architecture
FAIR action
• Blockchain
• Virtualised & remote execution
• Image processing
• Preservation & portability
• Provenance tracking, object trajectories
• Engineering & Design, Ethics, Social Sciences
Research +
Developer Practitioner
practices
Mark Robinson, Norman Morrison, Paul Groth, Tim Clark, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza, Daniel Garijo, Catarina Martins, Iain Buchan, Caroline Jay, David De Roure, Oscar Corcho, Steve Pettifer, Khalid Belhajjame, Jun Zhao, Phil Crouch, Lilian Gorea, Oluwatomide Fasugba
Stian Soiland-Reyes, Michael Crusoe, Rafael Jimenez, Alasdair Gray, Barend Mons, Sean Bechhofer
Michel Dumontier, Mark Wilkinson, Leyla Garcia, Stuart Owen, Katy Wolstencroft, Finn Bacall, Alan Williams, Wolfgang Mueller, Olga Krebs, Jacky Snoep, Matthew Gamble, Raul Palma, Mark Musen
http://www.researchobject.org
http://www.myexperiment.org
http://wf4ever.org
http://www.fair-dom.org
http://www.fairdomhub.org
http://seek4science.org
http://rightfield.org.uk
http://www.bioschemas.org
http://www.commonwl.org
http://www.bioexcel.eu
http://www.openphacts.org