Valentina Di Francesco
PAG XXVII – San Diego
Jan 14, 2019
Sustainability Plans for the NIH/NHGRI Model Organism Databases
Current Status and a Vision for 2019 and Beyond
NHGRI MODs and GO Consortium Organizational Model in 2016
SGD
FlyBase
WormBase
MGD
ZFIN
GO Consortium
Different interfaces for common data types
• No single access to multiple data resources and different user interfaces
• Different annotation strategies and workflows
• Different data representation
Redundancy of operations at 6 resources
• Data management systems for related data structures and types
• System administration and IT support
• User support
• Links to/from the same public resources
MODs and GO Data Types
MODs Reorganization
NHGRI Goals
• Facilitate access to and use of these valuable resources
• Transition the resources to a more effective and sustainable funding model
• Repurpose NHGRI funds toward other needed genomics-based data science research
Alliance of Genome Resources
• Founding members of the AGR Consortium: MGD, FlyBase, SGD, ZFIN, WormBase, GO Consortium, RGD
• 3-year supplement awarded to WormBase in Sept 2016 to establish an integrated resource
• Gradual decrease of each resource’s budget starting in FY17
Scope of the AGR
• Enhance comparative genome biology by providing unified data access mechanisms to the MODs and GO Consortium data
  • genes, genomes, gene function and genotype/phenotype associations
• Implement a common, flexible ecosystem of shared modular infrastructure that will support core functionality
• Sustainability goals: increase efficiency and decrease the costs of maintaining the resources
Now…
Individual MODs hold most of the data and function. The Alliance currently has a small subset.
Jeff DePons, AGR/RGD
Moving forward
Over time, the software and data residing at individual MODs get smaller and the Alliance gets larger.
Jeff DePons, AGR/RGD
Overview of Accomplishments
https://www.alliancegenome.org/
• Alliance web site and APIs for data access
  • gene symbol/name (synonyms)
  • functional annotation terms (GO)
  • disease terms (DO)
• Common orthology set
• S3 buckets in the cloud for NIH Data Commons
• Multi-organism display of curated gene-to-human-disease associations
• Common instance of JBrowse
• Modular sequence display widget
• Common automated concise gene description text
• Compact annotation ribbon display for function, phenotype, expression
• Common ingest and display of interaction data
Paul Sternberg, AGR/WormBase
AGR Five-Year Goals
• Modular, shared infrastructure that handles all core MOD software functions
• MODs will be able to move residual functions to Alliance Central Portals
• Flexibility to allow evolution of resources and onboarding of other resources
• Users (both individual researchers and power users) have better access to data and faster access to new data and new software technology
• Clinical researchers will have facile access to model organism data and some human data
• Coordination of MOD/GO communication and outreach
Paul Sternberg, AGR/WormBase
Individual MODs – Next steps
Focus on model organism-specific needs:
• Data curation efforts
• Outreach to individual model organism research communities
• Streamlined or eliminated user interfaces at each individual MOD
• Incorporation of additional data types with co-funding from other NIH Institutes and Centers (ICs)
AGR/MODs Stakeholders
• Model organism researchers
• Genomicists
• NIH – Sustainability
  • NHGRI
  • NIH Scientific Data Council
  • NIH Data Commons Pilot
NIH Strategic Plan for Data Science
• Requested by Congress
• Developed by the NIH Scientific Data Council
• Released on June 4, 2018
• The plan provides a roadmap for modernizing the NIH-funded biomedical data science ecosystem and focuses on:
  • Modernizing the data resources ecosystem to increase its utility for researchers and to optimize its operational efficiency
  • Separating funding support for databases, knowledgebases and tools
NIH Data Commons Pilot Phase (DCPP)
• Pilot phase 1: Sept 2017 – Nov 2018
• AGR had a key role in phase 1 and will continue that role in phase 2:
  • Procurement and deployment of AGR data and services on multiple cloud platforms
  • Development and implementation of use cases spanning the 3 initial datasets
  • Development and implementation of Data Commons best practices for FAIR data
Data Commons Pilot Phase OT Awardees:
• Stanley Ahalt, RENCI
• Titus Brown, UC Davis
• Merce Crosas, Harvard
• Brandi Davis-Dusenbery, Seven Bridges
• Ian Foster, UChicago
• Robert Grossman, UChicago
• Isaac Kohane, Harvard
• Avi Ma’ayan, Mt. Sinai
• Lucila Ohno-Machado, UCSD
• Benedict Paten, UCSC
• Anthony Philippakis, Broad Institute
• Owen White, UMD
Data Sets
TOPMed
DCPP Consortium
Summary
• AGR will enhance comparative genome biology by providing unified data access mechanisms to the MODs and GO Consortium data
Acknowledgements
NIH/NHGRI Colleagues: Robert Fullem, Ajay Pillai
AGR Consortium