![Page 1: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/1.jpg)
Common Framework Working Groups
Owen White and many more
![Page 2: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/2.jpg)
Why this is confusing
■ Several different initiatives
  ■ BD2K, Common Fund, Global Alliance, Genomic Data Commons
■ Several different virtual spaces
  ■ GDC, Hutch Data Commonwealth, Cloud pilots
■ Co-opting several pre-existing activities
  ■ MODs, GA4GH, HMP
![Page 3: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/3.jpg)
Why this is REALLY confusing
![Page 4: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/4.jpg)
Ready or not … we are building an ecosystem
■ Living, thriving, dynamic and very new concept
■ Composed of many incubators
■ Some technologies will prevail, some will not
■ It is not appropriate or possible to:
  ■ do this in isolation
  ■ burn resources while just doing our own research
![Page 5: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/5.jpg)
Data Commons Components
Reference Data Sets
Resource Search & Index
Cloud Credit Model
Commons Framework Pilots
• GDC
• Human Microbiome Project
• Global Alliance
• MODs
• RFI – engage community
Winter 2017
• FOAs – Place high impact data sets in the cloud
Spring 2017
![Page 6: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/6.jpg)
Data Commons Components
Reference Data Sets
Resource Search & Index
Cloud Credit Model
Commons Framework Pilots
• Data Discovery Index (DDI) Consortium (bioCADDIE, DataMed, OmicsDI, others)
• Aggregation of metadata presented on web
• Driving metadata standards
• Search/query services
![Page 7: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/7.jpg)
Data Commons Components
Reference Data Sets
Resource Search & Index
Cloud Credit Model
Commons Framework Pilots
• 3-year pilot to test business model
• Investigators receive credits for use with cloud providers
• Provider debits against account in pay-as-you-go model
• Amazon Reseller, IBM, Google Reseller, Broad and NCI Cloud Pilots
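The credit-and-debit flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CreditAccount` class, its fields, and the provider names are hypothetical and do not describe how the pilot is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class CreditAccount:
    """Hypothetical investigator account holding cloud credits."""
    investigator: str
    balance: float
    ledger: list = field(default_factory=list)

    def debit(self, provider: str, amount: float) -> float:
        """Record a provider's pay-as-you-go charge and return the new balance."""
        if amount <= 0:
            raise ValueError("charge must be positive")
        if amount > self.balance:
            raise ValueError("insufficient credits")
        self.balance -= amount
        self.ledger.append((provider, amount))
        return self.balance


# Example: an investigator receives 1000 credits and two providers debit usage.
account = CreditAccount(investigator="example-lab", balance=1000.0)
account.debit("provider-a", 250.0)
account.debit("provider-b", 100.0)
print(account.balance)  # 650.0
```

Keeping a ledger of (provider, amount) pairs is one simple way to let both the investigator and the funder audit how credits were spent.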
![Page 8: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/8.jpg)
Data Commons Components
Reference Data Sets
Resource Search & Index
Cloud Credit Model
Commons Framework Pilots
Several examples…
![Page 9: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/9.jpg)
The CEDAR Approach to Metadata
![Page 10: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/10.jpg)
AzTec
Building a Technology Platform to Integrate Biomedical Resources
https://aztec.bio
![Page 11: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/11.jpg)
smartAPI Interoperability Pilot
Development of a community-based standard
Intelligent authoring of API metadata
Faceted search
Metadata editor
API testing
Repository API
![Page 12: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/12.jpg)
Brian O’Connor - UCSC
![Page 13: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/13.jpg)
Motivation – broad goals … and why you should participate
■ Everyone has a lot to share – let’s ensure we socialize our research products
■ Vision for Commons implementation
■ Self-governance
■ Managing standards proliferation
■ We are not in competition with each other
■ Setting guidelines in RFAs
![Page 14: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/14.jpg)
Common Framework Working Group
■ Development of FAIR-ness Metrics
  ■ objective measures for the degree of data availability
■ Metadata documentation of APIs
  ■ creating a "minimal list" that describes available APIs
■ Data-object registry / Indexing
  ■ approaches to make all data findable
■ Workflow sharing and Docker Registry
  ■ we have lots of workflows; how do we share them?
■ Commons publication initiative
  ■ a coordinated publication plan
![Page 15: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/15.jpg)
FAIRness
![Page 16: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/16.jpg)
FAIRness Metrics
Optimizing the FAIR alignment of research assets, roles & relationships
Neil McKenna, Ph.D
Baylor College of Medicine
Co-Chair, FAIRness Metrics Subgroup
Co-chair: Michel Dumontier, Ph.D
![Page 17: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/17.jpg)
What are FAIRness Metrics?
• The FAIR principles articulate ideals in research
• FAIRness Metrics give effect to these principles to advance FAIRness in research
• Commons FAIRness Metrics Subgroup (FMSG) has been tasked with developing FAIRness Metrics
• First (ongoing) step for the FMSG is to comprehensively define the components of the research ecosystem:
  – Research assets
  – Research roles
  – Research relationships
![Page 18: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/18.jpg)
What are the assets of research?
Datasets
Metadata & Standards
Research Resources
Applications, services & tools
![Page 19: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/19.jpg)
Defining research roles: two examples
Asset Producers
Asset Stewards
Individual bench researchers
Ontology organizations
Tool & app developers
Primary data repos
Software registries
Research Resource Stores
![Page 20: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/20.jpg)
What are the roles in research?
Asset Consumers
Asset Producers
Asset Stewards
Publishers
Asset sponsors
Asset indexers & registries
![Page 21: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/21.jpg)
What are the relationships between these roles?
![Page 22: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/22.jpg)
What are the relationships between assets and roles?
Web-based analysis widget
![Page 23: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/23.jpg)
Asset Producers
Asset Stewards
Publishers
Asset Sponsors
Asset Consumers
Asset Indexers & Registries
FAIRness Metrics & Indexes
• FAIRness metrics seek to optimize the alignment of research assets, roles and relationships with the FAIR principles
• Unique roles & relationships require unique sets of metrics
  – FAIRness Index: custom set of metrics tailored to a specific research role, its assets and its relationships with other roles
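One way to picture a FAIRness Index as "a custom set of metrics tailored to a role" is the small Python sketch below. It is purely illustrative: the subgroup's metrics were still being defined, so the metric functions, role keys, field names and the averaging rule here are all assumptions made for the example.

```python
from typing import Callable, Dict

# A metric takes an asset description and returns a score between 0.0 and 1.0.
Metric = Callable[[dict], float]


def has_identifier(asset: dict) -> float:
    """Hypothetical metric: does the asset carry a persistent identifier?"""
    return 1.0 if asset.get("identifier") else 0.0


def has_license(asset: dict) -> float:
    """Hypothetical metric: is a reuse license stated?"""
    return 1.0 if asset.get("license") else 0.0


def has_machine_readable_metadata(asset: dict) -> float:
    """Hypothetical metric: is the metadata in a machine-readable format?"""
    return 1.0 if asset.get("metadata_format") in {"json-ld", "rdf"} else 0.0


# Each role gets its own set of metrics (its "index"); the sets are invented here.
FAIRNESS_INDEXES: Dict[str, Dict[str, Metric]] = {
    "asset_producer": {"identifier": has_identifier, "license": has_license},
    "asset_steward": {
        "identifier": has_identifier,
        "machine_readable": has_machine_readable_metadata,
    },
}


def score(role: str, asset: dict) -> float:
    """Average the role-specific metrics for one asset."""
    metrics = FAIRNESS_INDEXES[role]
    return sum(metric(asset) for metric in metrics.values()) / len(metrics)


dataset = {
    "identifier": "doi:10.0000/example",  # placeholder identifier
    "license": None,
    "metadata_format": "json-ld",
}
producer_score = score("asset_producer", dataset)
steward_score = score("asset_steward", dataset)
print(producer_score, steward_score)  # 0.5 1.0
```

The same asset scores differently depending on the role being assessed, which is the point of tailoring an index to a role rather than using a single global score.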
![Page 24: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/24.jpg)
FAIRness Indexes: holding a mirror up to research roles
• Asset Producers: How well are the products of my research shared with other researchers?
• Asset Stewards: Are assets optimally exposed to both machines & humans?
• Publishers: Is the relationship between research articles and their supporting assets properly recognized?
• Consumers: Do I give full attribution when I re-use assets?
![Page 25: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/25.jpg)
Asset Producers
Publishers
Asset Sponsors
Asset Consumers
Asset Indexers & Registries
Get involved – please!
• Long-term goal is to have FAIRness Indexes adopted by funding agencies & incorporated into FOAs
• We need help from the community:
  – Defining & identifying research assets and roles
  – Developing & refining use cases that define the relationships between research assets & roles
  – The more roles that are represented in the FMSG, the better FAIRness Indexes will reflect the real research world
• To get involved in the FMSG, complain, or just find out more about what we’re doing, contact
  – Neil McKenna ([email protected])
  – Michel Dumontier ([email protected])
• Or stop by Poster 135 later on!
![Page 26: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/26.jpg)
Current FMSG roster: thank you
Mark Wilkinson (University of Madrid)
Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Susanna Sansone (Oxford University)
Allen Dearry, Elaine Collier (NIH)
Lucila Ohno-Machado, Jeff Grethe (UCSD)
Mark Musen (Stanford University)
Tim Clark (Harvard Medical School)
Nolan Nichols (SRI/Stanford)
Tobias Kuhn (VU University Amsterdam)
Carole Goble (The University of Manchester)
Jo McEntyre (EBI)
Luiz Bonino (DTL/VU)
Alasdair Gray (Heriot-Watt University)
Marco Roos, Katy Wolstencroft, Mark Thompson (Leiden University Medical Center)
Richard Finkers (Wageningen UR)
Christina Lohr, Holly Falk-Krzesinski, Anita deWaard, Paul Groth (Elsevier)
Ronak Patel (Baylor College of Medicine)
Lisa Federer (NIH Library)
![Page 27: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/27.jpg)
![Page 28: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/28.jpg)
CFWG API Interoperability Working Group
Improving the discoverability, accessibility, interoperability and reuse of web APIs
Co-Chairs: Michel Dumontier and Chunlei Wu
@micheldumontier::CFWG:30-11-2016
![Page 29: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/29.jpg)
Motivation
Biomedical science is increasingly being done using cloud-based, web-friendly application programming interfaces (APIs).
BUT it’s pretty much impossible to automatically discover which API to use and how to connect these together to create an effective workflow.
-> barrier to discovery.
51 APIs
1,184 APIs
14,952 APIs
![Page 30: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/30.jpg)
Examining the metadata for the myGene.info web API
[Diagram: Gene, myGene.info, ?]
![Page 31: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/31.jpg)
GenBank identifier
Affymetrix identifier
Taxonomy identifier
… 1340 lines …
HGNC symbol
?
NCBI Gene Terminology
Profiling the API output
What do these symbols refer to? How do we find out more?
![Page 32: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/32.jpg)
How does myGene.info connect with myVariant.info?
Gene
myGene.info
?
myVariant.info
![Page 33: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/33.jpg)
Knowing how APIs connect is essential for (automated) workflow composition
![Page 34: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/34.jpg)
Problem Statement
There is an overwhelming lack of explicit knowledge pertaining to the structure and datatype of web API inputs and outputs.
If web APIs were annotated with semantic metadata, they would be easier to discover, connect together, and reuse.
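To make the claim concrete, the sketch below shows how input/output annotations could let a program chain APIs automatically. It is a minimal illustration, not the working group's method: the API names, the type labels and the `find_workflow` helper are all hypothetical, and real annotations would use shared identifiers rather than plain strings.

```python
from collections import deque

# Hypothetical semantic annotations: each API declares the type it
# consumes and the type it produces.
API_ANNOTATIONS = {
    "gene_lookup": {"input": "gene_symbol", "output": "gene_id"},
    "variant_lookup": {"input": "gene_id", "output": "variant_id"},
    "drug_lookup": {"input": "variant_id", "output": "drug_id"},
}


def find_workflow(start_type, goal_type, annotations=API_ANNOTATIONS):
    """Breadth-first search over types; returns the shortest chain of API
    names that turns start_type into goal_type, or None if there is none."""
    queue = deque([(start_type, [])])
    seen = {start_type}
    while queue:
        current, path = queue.popleft()
        if current == goal_type:
            return path
        for name, io in annotations.items():
            if io["input"] == current and io["output"] not in seen:
                seen.add(io["output"])
                queue.append((io["output"], path + [name]))
    return None


print(find_workflow("gene_symbol", "variant_id"))
# ['gene_lookup', 'variant_lookup']
```

Without the `input` and `output` annotations there is nothing for the search to match on, which is the discovery barrier the problem statement describes.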
![Page 35: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/35.jpg)
API Interoperability CFWG
To foster a collaborative environment for the discussion, development and evaluation of infrastructure and guidelines that facilitate the discoverability, implementation, deployment, interoperability and reuse of web APIs
![Page 36: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/36.jpg)
API Interoperability WG People
Michel Dumontier
Amrapali Zaveri
Shima Dastgheib
Chunlei Wu
Caty Chung
Raymond Terryn
Paul Avillach
Kevin Osborn
David Steinberg
Kathleen Jagodnik
Gregg Kellogg
Nolan Nichols
Mark Wilkinson
Ruben Verborgh
Mary Shimoyama
Jeff De Pons
Denise Luna

http://mygene.info
http://ruben.verborgh.org/blog/2013/11/29/the-lie-of-the-api/
http://dumontierlab.com
http://www.lincsproject.org
http://bd2k-picsure.hms.harvard.edu
https://spec-ops.io
http://nidm.nidash.org/
https://cgl.genomics.ucsc.edu/
http://sadiframework.org
https://bd2kccc.org/
http://rgd.mcw.edu/
![Page 37: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/37.jpg)
Metadata Survey
We performed a survey of 3 repositories (BioCatalogue, ProgrammableWeb, Elixir Tools & Services Registry) and 4 specifications (MIAS, OpenAPI, SADI, schema.org, and a preliminary smartAPI metadata specification).
![Page 38: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/38.jpg)
Metadata Elements: 20 basic, 6 provider, 10 operation, 12 parameters, 6 response
![Page 39: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/39.jpg)
smartAPI metadata authoring tool
Metadata authoring made easy. We extended the Swagger Editor to validate using the smartAPI specification and to suggest metadata elements and values from the smartAPI repository API.
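The two behaviours mentioned here, validation and suggestion, can be illustrated with a short standalone Python sketch. It does not reproduce the Swagger Editor extension or the smartAPI specification: the required-field list, the known-field list and both helper functions are assumptions chosen only to show the idea.

```python
import difflib

# Hypothetical field lists; the real smartAPI specification defines its own.
REQUIRED_FIELDS = ["title", "description", "version", "contact"]
KNOWN_FIELDS = REQUIRED_FIELDS + ["termsOfService", "license"]


def validate(metadata: dict) -> list:
    """Return the required fields that are missing from a metadata draft."""
    return [name for name in REQUIRED_FIELDS if name not in metadata]


def suggest(field_name: str) -> list:
    """Suggest known field names that closely match a possibly mistyped one."""
    return difflib.get_close_matches(field_name, KNOWN_FIELDS, n=3, cutoff=0.6)


# A draft with one misspelled key and one missing key.
draft = {"title": "Example API", "descripton": "Looks up genes", "version": "1.0"}
print(validate(draft))        # ['description', 'contact']
print(suggest("descripton"))  # ['description']
```

In the tool described on the slide, suggestions come from the smartAPI repository API; the static `KNOWN_FIELDS` list stands in for that lookup.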
![Page 40: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/40.jpg)
Unify API data with Linked Open Data
![Page 41: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/41.jpg)
WG members are documenting their APIs!
![Page 42: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/42.jpg)
API Interoperability CFWG
Mission: To foster a collaborative environment for the discussion, development and evaluation of infrastructure and guidelines that facilitate the discoverability, implementation, deployment, interoperability and reuse of web APIs.
Planned Activities
– Finalizing vision and API metadata specification
– Demonstrations and evaluations of usability and utility of our work
– Implementation and use of smartAPIs in reproducible discovery science
– Coordinate activities with the GA4GH API group
– Investigating FAIR metrics for APIs
– Your idea here!
Participation
– Join mailing list and participate in biweekly teleconference calls
– Work with an excellent group of people with broad expertise
– Take credit for transforming the API ecosystem in BD2K … and beyond!
![Page 43: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/43.jpg)
[email protected]: http://dumontierlab.com
Presentations: http://slideshare.com/micheldumontier
![Page 44: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/44.jpg)
![Page 45: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/45.jpg)
BD2K Indexing Working Group
a consolidated effort of the Commons Framework Pilots WG, the Centers of Excellence Coordination Center,
and the Data Discovery Index Consortium
Current co-chairs
Wei Wang, UC Los Angeles
Michel Dumontier, Stanford
Lucila Ohno-Machado, UC San Diego
Founding members (everyone is welcome to join)
George Alter, Univ. Michigan
Elizabeth Bell, UCSD
Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Susanna Sansone, Univ. Oxford
Judith Blake, The Jackson Laboratory
Brian Bleakley, BD2K centers Coordinating Center
Benjamin Hitz, Stanford
Iyad Obeid, Temple
Joe Picone, Temple
Kevin Read, NYU
![Page 46: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/46.jpg)
Operating Principles
• Data integration is key to functional and comparative biomedicine (-omics, clinical medicine, public health, health economics)
  – Allows data to be evaluated in new contexts
• Standards are key to data integration
  – Nomenclature
    • Standardized nomenclature, keywords, etc.
  – Knowledge representation
    • Gene Ontology (GO)
    • Mammalian Phenotype Ontology
    • Others
Adapted from J Blake’s slide, The Jackson Laboratory
![Page 47: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/47.jpg)
Gaps in the Metadata Workflow
Most data are “born digital,” but metadata are orphans
• Curating data is an expensive manual process
• When data are created in silico, why are annotations entered manually?
• There are gaps in the scientific workflow because tools for managing, transforming, and analyzing data are not metadata-aware
• Tools to automate the capture and maintenance of metadata are needed
• Example:
  – Many types of data are analyzed in statistical packages (R, SAS, etc.) that do not read or write metadata (data transformations from statistical software must be annotated by hand)
  – Other analytical software should also read/write metadata (and be indexed)
Adapted from G Alter’s slide, University of Michigan
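As an illustration of what "metadata-aware" could mean for an analysis step, the Python sketch below applies a transformation and records what was done alongside the data. The dataset layout, the `transform_with_provenance` helper and the provenance fields are assumptions for this example, not an existing tool or standard.

```python
import math
from datetime import datetime, timezone


def transform_with_provenance(dataset, column, func, description):
    """Apply func to one column and append a provenance record to the
    metadata, so the transformation does not have to be annotated by hand."""
    new_values = [func(value) for value in dataset["data"][column]]
    record = {
        "column": column,
        "operation": description,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return {
        "data": {**dataset["data"], column: new_values},
        "metadata": {
            **dataset["metadata"],
            "provenance": dataset["metadata"].get("provenance", []) + [record],
        },
    }


raw = {
    "data": {"expression": [1.0, 10.0, 100.0]},
    "metadata": {"units": {"expression": "counts"}},
}
logged = transform_with_provenance(raw, "expression", math.log10, "log10 transform")
print(logged["data"]["expression"])                      # [0.0, 1.0, 2.0]
print(logged["metadata"]["provenance"][0]["operation"])  # log10 transform
```

The input dataset is left unchanged and the output carries its own history, which is the kind of automatic capture the slide argues analysis software should provide.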
![Page 48: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/48.jpg)
Annotating Data Repositories
[Architecture diagram labels]
Data sources: Repositories, Data Sets, Funding Agencies, Data Producers, Publishers
Metadata Ingestion
Metadata Management
• Mapping
• Indexing
Terminology server
• Query expansion
• Result ranking
Search Engine
DataMed User Interface
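The query expansion and result ranking steps named in the diagram can be sketched as follows. This is a toy illustration, not DataMed's implementation: the synonym table stands in for a terminology server, and the index records, scoring rule and function names are all hypothetical.

```python
# Hypothetical synonym table standing in for a terminology server.
SYNONYMS = {
    "heart attack": ["myocardial infarction"],
    "tumor": ["neoplasm", "tumour"],
}

# Hypothetical indexed dataset records.
INDEX = [
    {"id": "ds1", "title": "Myocardial infarction cohort study"},
    {"id": "ds2", "title": "Heart attack outcomes and myocardial infarction biomarkers"},
    {"id": "ds3", "title": "Lung neoplasm imaging"},
]


def expand(query):
    """Query expansion: the original term plus any known synonyms."""
    term = query.lower()
    return [term] + SYNONYMS.get(term, [])


def search(query, index=INDEX):
    """Result ranking: order matching records by how many expanded terms
    appear in the title, breaking ties by record id."""
    terms = expand(query)
    scored = []
    for record in index:
        title = record["title"].lower()
        hits = sum(1 for term in terms if term in title)
        if hits:
            scored.append((hits, record["id"]))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [record_id for _, record_id in scored]


print(search("heart attack"))  # ['ds2', 'ds1']
```

Without expansion, the query "heart attack" would miss `ds1` entirely, because that record only uses the clinical term.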
![Page 49: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/49.jpg)
Dataset Ingestion Challenges and Costs (1)
Challenges we have encountered, with their costs:
1. Lack of metadata documentation
   Cost: Human labor and time spent on investigating the repository website to understand the data it provides, and to find solutions for obtaining metadata
2. Limited readily accessible metadata
   Cost: Human labor and time spent on designing a web crawler to collect available data from the repository website before translating them into the metadata required for indexing
   Cost: Hardware to meet computational needs for web crawling tasks
3. Lack of domain knowledge (from the indexing team)
   Cost: Human labor and time spent on understanding the biological and/or technical contents of the data repository
4. Heterogeneity in metadata and data formats
   Cost: Human labor and time spent on iterative refinement of DATS mapping as well as transformation and ingestion scripts (or codes)
Adapted from H Kim’s slide, UC San Diego
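The heterogeneity problem in item 4 can be illustrated with a small mapping sketch in Python. Everything named here is hypothetical: the repositories, their field names and the common schema are invented for the example and are not taken from DATS or from any real repository.

```python
# Hypothetical per-repository mappings from source field names onto a
# hypothetical common schema (not the actual DATS model).
FIELD_MAPS = {
    "repo_a": {"name": "title", "desc": "description", "org": "organism"},
    "repo_b": {"dataset_title": "title", "summary": "description", "species": "organism"},
}


def to_common(repository, record):
    """Map one repository record onto the common schema.

    Returns the mapped record and a sorted list of source fields that have
    no mapping yet, so a curator can decide how to refine the mapping."""
    mapping = FIELD_MAPS[repository]
    mapped = {
        common: record[source]
        for source, common in mapping.items()
        if source in record
    }
    unmapped = sorted(key for key in record if key not in mapping)
    return mapped, unmapped


record_a = {"name": "RNA-seq of liver", "org": "Mus musculus", "platform": "sequencer-x"}
mapped, unmapped = to_common("repo_a", record_a)
print(mapped)    # {'title': 'RNA-seq of liver', 'organism': 'Mus musculus'}
print(unmapped)  # ['platform']
```

Each new repository needs its own entry in `FIELD_MAPS`, and the `unmapped` list grows whenever a repository adds fields, which is where the iterative refinement cost comes from.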
![Page 50: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/50.jpg)
Dataset Ingestion Challenges and Costs (2)
Challenges
• Setting up the ingestion pipeline is complicated and time-consuming (one-time process)
• Metadata download and ingestion requires domain expertise to verify validity & granularity
• Domain experts required to verify indexing
• Heterogeneity across curators during the mapping process
• Code for harvesting metadata needs to be invariably customized for each repository
• Poor documentation (including lack of APIs, no defined metadata) in a large number of repositories
• Requires interaction and communication with repository personnel (time-consuming) to initiate the ingestion process
Costs
• Personnel (domain experts & programmers)
• Time-consuming process
Adapted from H Xu’s slide, University of Texas Houston
![Page 51: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/51.jpg)
Just as JATS (Journal Article Tag Suite) is used by PubMed to index literature, DATS (DatA Tag Suite) is needed for a scalable way to index data sources in the DataMed prototype
A community effort
Adapted from a slide by Sansone, Gonzalez-Beltran, and Rocca-Serra, University of Oxford
![Page 52: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/52.jpg)
Example of a model for scalable indexing
Convergence of elements extracted from competency questions and existing (generic and biomedical) data models (incl. DataCite, DCAT, schema.org, HCLS dataset, RIF-CS, ISA-Tab, SRA-xml, etc.)
[Diagram labels: core entities, extended entities]
Adapted from a slide by Sansone, Gonzalez-Beltran, and Rocca-Serra, University of Oxford
![Page 53: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/53.jpg)
Interlinking to other indexes
Adapted from a slide by Sansone, Gonzalez-Beltran, and Rocca-Serra, University of Oxford
![Page 54: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/54.jpg)
Two Fronts
Annotating existing data
• Continue to work with data repositories to map into a minimal standard
• Incentive$ for data producers/repositories to facilitate mapping
• Incentive$ for data reuse/citation
Annotating new data
• Could be done at the source, like publishers do for JATS
• Additional re$ource$ need to be provided for data producers/repositories to prepare data for sharing (e.g., after grant funding period ends)
Re$ource$ for data producers and/or repositories to maintain data and their annotations are needed
Leveraging resources from various paid projects, consolidation of efforts, and incentivizing data producers/keepers saves time and money
![Page 55: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/55.jpg)
Working Group Charter
Make recommendations to funders to increase adoption of standardized metadata by the biomedical science community
• Establish framework for calculation of costs and sustainability
• Propose mechanisms to enable effective metadata curation
– What: Re$ource$
– When: Timelines
– How: Minimal metadata
– Who: Self- or assisted mapping
![Page 56: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/56.jpg)
Workflow Sharing and Docker Registries Work Group
Umberto Ravaioli, University of Illinois
Brian O’Connor, University of California Santa Cruz
![Page 57: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/57.jpg)
FAIR-ness
• Adherence to FAIR principles: Registries to make tools Findable and Accessible and (Docker) container adoption to make tool components Interoperable and Reusable.
• Important mission of the NIH Commons should be to develop a culture of open source development, data sharing, and accessible tools for reproducible science.
![Page 58: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/58.jpg)
Overview of Activities - Membership
• Ravaioli, Umberto (University of Illinois at Urbana-Champaign)
• O'Connor, Brian (University of California, Santa Cruz)
• Diekhans, Mark (University of California, Santa Cruz)
• Paten, Benedict (University of California, Santa Cruz)
• Blatti, Charles (University of Illinois at Urbana-Champaign)
• Epstein, Milt (University of Illinois at Urbana-Champaign)
• Armstrong, Don (University of Illinois at Urbana-Champaign)
• Madduri, Ravi (University of Chicago & Argonne National Lab)
• Amaro, Rommie (University of California, San Diego)
• Ramsey, Stephen (Oregon State University)
• Hitz, Benjamin (Stanford University)
• Crusoe, Michael (Common Workflow Language Project)
• Sofia, Heidi (National Human Genome Research Institute, NIH)
• Tsang, Steve Hsinyi (NIH/NCI & Attain)
![Page 59: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/59.jpg)
Organization of Activities
• Monthly conference calls (3rd Thursday of the month)
• Use of Google tools and workspaces to communicate and share documents
• Administrative assistance received from Coordinating Center (UCLA – Denise Luna)
![Page 60: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/60.jpg)
Position Paper
• Discussion of the State-of-the-Art
• Goals of the Work Group
• Sharing mechanisms
• Docker containers
• Workflow Languages
• Case Studies/Prototypes/etc.
• Recommendations based on experience, future path of technologies, e.g.:
– Standards, APIs / External Collaborations
– Other considerations (security, legal, etc.)
– Adherence to FAIR Commons Concepts
![Page 61: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/61.jpg)
Workflow Languages and Specs
• This area keeps evolving
• There are two main languages (CWL and WDL) used by the genomics community
• Workflow Execution Services:
– Seven Bridges
– FireCloud (Broad Institute, specialized for Google)
– Consonance (Java)
– TOIL (UCSC, Python; wide support of computing systems)
![Page 62: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/62.jpg)
Common Workflow Language (CWL)
• CWL is a way to describe command line tools and connect them to create workflows.
• CWL is a specification and not a piece of software.
• Tools and workflows described using CWL are portable across a variety of platforms that support the CWL standard.
• The CWL approach emphasizes execution features and machine-readability, and serves a core target audience of software and platform developers.
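To make "describing a command line tool" concrete, here is a minimal sketch of a CWL v1.0 `CommandLineTool` description that wraps `echo`, built as a Python dictionary and serialized to JSON (CWL documents can be written in JSON as well as YAML). The helper names `to_json` and `describe` are assumptions made for this illustration and are not part of CWL.

```python
import json

# Minimal CWL v1.0 description of a command line tool wrapping `echo`.
# One string input is passed as the first positional argument.
echo_tool = {
    "cwlVersion": "v1.0",
    "class": "CommandLineTool",
    "baseCommand": "echo",
    "inputs": {
        "message": {
            "type": "string",
            "inputBinding": {"position": 1},
        }
    },
    "outputs": [],
}


def to_json(tool):
    """Serialize the description to JSON text (hypothetical helper)."""
    return json.dumps(tool, indent=2)


def describe(tool):
    """Return a one-line summary of the tool (hypothetical helper)."""
    inputs = ", ".join(sorted(tool["inputs"]))
    return f'{tool["class"]} running "{tool["baseCommand"]}" with inputs: {inputs}'


summary = describe(echo_tool)
```

The description is plain data rather than software, which is what lets any platform that supports the CWL standard read and execute it, given an input object that supplies `message`.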
![Page 63: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/63.jpg)
Workflow Description Language (WDL)
• Developed by the Broad Institute engineering team supporting genome analysis pipelines
• WDL emphasizes scripting and is designed from the ground up as a human-readable and -writable way to express tasks and workflows.
• A WDL script provides a complete analysis solution built from five components: workflow, task, call, command, and output
• Work is underway to ensure interoperability between CWL and WDL, through conversion and related utilities.
![Page 64: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/64.jpg)
Reaching out to GA4GH
• We are in contact with the GA4GH Containers and Workflows Group to coordinate technical discussions and possibly to merge development of the position paper into a joint activity (Brian taking the lead on this)
![Page 65: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/65.jpg)
GA4GH – API proposal for:
• ability to request a workflow run using CWL or WDL (and maybe future formats)
• ability to parameterize that workflow using a JSON schema that's simple and used in common between CWL and WDL
• ability to get information about running workflows, status, errors, output file locations
• ISSUE: standardization of terms
– job, workflow, steps, tools, etc.
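The three abilities listed above (request a run, parameterize it with JSON-style inputs, and query status, errors, and outputs) can be sketched as a small in-memory service. Everything here is hypothetical: `ToyWorkflowService`, its method names, the state strings, and the example URLs are assumptions for illustration and do not reproduce the actual GA4GH proposal.

```python
import itertools
from dataclasses import dataclass, field

SUPPORTED_TYPES = {"CWL", "WDL"}  # could grow to cover future formats


@dataclass
class Run:
    run_id: str
    workflow_url: str
    workflow_type: str
    params: dict
    state: str = "QUEUED"
    errors: list = field(default_factory=list)
    outputs: dict = field(default_factory=dict)


class ToyWorkflowService:
    """In-memory stand-in for a workflow execution service (hypothetical)."""

    def __init__(self):
        self._runs = {}
        self._ids = itertools.count(1)

    def submit(self, workflow_url, workflow_type, params):
        """Request a run of a CWL or WDL workflow with JSON-style parameters."""
        if workflow_type not in SUPPORTED_TYPES:
            raise ValueError(f"unsupported workflow type: {workflow_type}")
        run_id = f"run-{next(self._ids)}"
        self._runs[run_id] = Run(run_id, workflow_url, workflow_type, dict(params))
        return run_id

    def complete(self, run_id, outputs):
        """Simulate the engine finishing a run; a real service would do this itself."""
        run = self._runs[run_id]
        run.state = "COMPLETE"
        run.outputs = dict(outputs)

    def status(self, run_id):
        """Report state, errors, and output file locations for a run."""
        run = self._runs[run_id]
        return {
            "run_id": run.run_id,
            "state": run.state,
            "errors": list(run.errors),
            "outputs": dict(run.outputs),
        }


service = ToyWorkflowService()
run_id = service.submit("https://example.org/align.cwl", "CWL", {"reads": "sample.fastq"})
queued = service.status(run_id)
service.complete(run_id, {"bam": "https://example.org/out/sample.bam"})
finished = service.status(run_id)
```

Even in a toy like this, the terminology issue is visible: the code has to pick one word ("run") where different engines might say job, workflow, or step.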
![Page 66: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/66.jpg)
GA4GH – API (continued)
• Having this standard API supported by multiple execution engines will give options of processing the same workflow (e.g., CWL or WDL) across different workflow execution platforms running across various clouds/environments.
• Example of possible scenario:
– Get workflow in CWL on Dockstore.org
– Use Dockstore to generate a JSON parameterization file
– Submit to SevenBridges/FireCloud/Consonance or some other GA4GH-compliant workflow execution service (if API is supported!)
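The scenario above amounts to one workflow and one JSON parameterization being handed unchanged to any engine that implements the same interface. A minimal sketch, assuming a hypothetical `ExecutionService` interface and a stand-in `RecordingEngine` (neither is a real Dockstore, Seven Bridges, FireCloud, or Consonance client):

```python
import json
from typing import Protocol


class ExecutionService(Protocol):
    """Hypothetical common interface every compliant engine would expose."""

    name: str

    def submit(self, workflow_url: str, params_json: str) -> str: ...


class RecordingEngine:
    """Stand-in engine that only records what it was asked to run."""

    def __init__(self, name):
        self.name = name
        self.submissions = []

    def submit(self, workflow_url, params_json):
        self.submissions.append((workflow_url, json.loads(params_json)))
        return f"{self.name}-{len(self.submissions)}"


def make_params(**inputs):
    """Build the JSON parameterization document for a workflow run."""
    return json.dumps(inputs, sort_keys=True)


def run_everywhere(engines, workflow_url, params_json):
    """Submit the identical workflow and parameters to every engine."""
    return {engine.name: engine.submit(workflow_url, params_json) for engine in engines}


engines = [RecordingEngine("engine-a"), RecordingEngine("engine-b")]
params = make_params(reads="sample.fastq", threads=4)
run_ids = run_everywhere(engines, "https://example.org/workflows/align.cwl", params)
```

Because the caller depends only on the shared interface, switching clouds or platforms means swapping the engine object, not rewriting the workflow or its parameter file.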
![Page 67: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/67.jpg)
Containerization
• How do we approach standardization of Docker containers to promote reusability?
• Computational efficiency goes hand in hand with workflow definition and execution.
• Parallelization: Macrotasking vs Microtasking.
• Optimization of numerical procedures is of paramount importance.
• Discoverability: Standardization of terms mentioned before is very important.
![Page 68: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/68.jpg)
Computing Landscape
• Cloud architectures and providers are proliferating in a climate of competition
• Will platforms standardize and perhaps consolidate over time?
• Need to understand the trade-off between cost, efficiency and adherence to FAIR principles.
![Page 69: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/69.jpg)
Disruptive Technologies on the Horizon
• Amazon “Lambda” serverless computing paradigm is intended to maximize utilization of resources.
• A server virtual machine is not “allocated permanently” to a given system; compute instances are fired up only when needed.
• Considerable cost reduction with present charging scheme.
• Need to understand how design of workflows and containers may be affected.
![Page 70: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/70.jpg)
![Page 71: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/71.jpg)
+
BD2k Collections Issue
Owen White
Ian Foster
Xinzhi Zhang
Susanna-Assunta Sansone
![Page 73: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/73.jpg)
+ Oversight Committee
Develop rules of engagement regarding consortium membership, disclosing intended publications to the group, and areas of professional conduct.
Discussion of topical areas.
Search and open call for possible manuscript authors.
Promote publication plan across BD2k network.
Promote coordination with European or other international networks.
![Page 74: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/74.jpg)
+ Oversight Committee
Generation of an overview publication describing the BD2k commons, and general NIH data management ecosystem.
Organization and general announcements to the larger group.
Hold periodic meetings to discuss progress.
Coordination and milestone completion.
Generation of artwork for special collection.
![Page 75: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/75.jpg)
+ Timeline
November 2016 BD2k meeting: broad announcement for special collections
November: formation of Steering Committee
November: Oversight Committee representative contacts potential journal editors
January 2017: Finalize target journal for special collections
Present to November 2017: Manuscript generation
July - November: Periodic meetings for exposure of content, discussion of progress
November 2017: Submission deadline
December 2017/January 2018: Review and revision
February 2018: publication appears
![Page 76: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/76.jpg)
+ Open Issues
Iterative process / Multiple deadlines
Publish an earlier marker paper or set of position papers
![Page 77: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/77.jpg)
+ The CFWG: Looming Issues
![Page 78: Common Framework Working Groups - Data Science · Ronak Patel (Baylor College of Medicine) Lisa Federer (NIH Library) CFWG API Interoperability Working Group Improving the discoverability,](https://reader035.vdocuments.net/reader035/viewer/2022070812/5f0b03cb7e708231d42e6e85/html5/thumbnails/78.jpg)
+ Looming Issues
■ Consortium-wide tools
■ Diversity of datatypes
■ Genomic / 'omic / variants
■ Phenotypes / patient
■ Clinical studies
■ Overlapping working groups
■ Funding / identify / mandate
■ NIH, trans-agency, international, NGOs
■ Awareness
■ Longevity / sustainability