The Challenge
BD2K-SDC Sustainability
Sustainability - Agenda
• Introduction: Allen Dearry
• Results From Metrics RFI: Izumi Hinkson
• Panel Presentations and Discussion
– Susan Gregurick, Moderator
– David Giaretta: The Role of Trustworthy Digital Repositories in Sustainability
– George Alter: The Sloan Stewardship Gap Project
– Melissa Landrum: Archiving Interpretations of Variants in ClinVar
– Cathy Wu: Interoperability, Sustainability, and Impact: A UniProt Case Study
• Posters
– 134, Combining Protein and Genome Annotation for Interpretation of Genomic Variants, Peter McGarvey
– 135, Interoperability of NURSA, PharmGKB, dkNET, and DataMed, Neil McKenna
• Notes
– https://docs.google.com/document/d/1EaYSuOeR7BJnmjzAAoS8iuS86UuS-3eK-MesX6-9_Eg/edit (hotlink on program book p6)
RFI- Metrics to Assess the Value of Biomedical Digital Repositories
Izumi Hinkson
AAAS S&T Policy Fellow
NCI CBIIT
BD2K & SDC Sustainability Working Group November 30, 2016
Goals of RFI
Solicit input from stakeholders
Foundation for long-term sustainability
o Enable repository owners to prioritize repository management
o Support decisions made by funding agencies
o Support diverse domains of science
o Communication between stakeholders
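Distribution of Response Themes
[Chart: distribution of responses across themes]
Themes: Use & Users; Quality & Impact; Quality of Service; Governance; Infrastructure; Surveys & Case Studies; Other considerations
Values shown (order as extracted, not matched to themes): 24% (n=65); 26% (n=71); 9% (n=24); 8% (n=21); 6% (n=16); 11% (n=29); 16% (n=42)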
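The percentages in this distribution can be checked against the counts. The sketch below is not part of the original slides. It assumes each percentage is that theme's count divided by the sum of the seven listed counts, rounded to the nearest whole percent. The variable names are illustrative only.

```python
# Counts (n) listed for the seven response themes, in the order shown above.
counts = [65, 71, 24, 21, 16, 29, 42]
# Percentages listed alongside those counts, in the same order.
listed_percentages = [24, 26, 9, 8, 6, 11, 16]

# Assumption: the denominator is the total of the seven listed counts.
total = sum(counts)

# Share of each count, rounded to the nearest whole percent.
computed = [round(100 * n / total) for n in counts]

for n, pct in zip(counts, computed):
    print(f"n={n}: {pct}% of {total}")
```

Under that assumption the computed shares agree with the listed percentages, which suggests the seven counts make up the full set of coded responses.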
Examples of metrics
Use & Users: 24% (n=65)
• Number of downloads
• Size of user community
• International reach
• Cautions
– User counts vs coverage
– Bias of utilization statistics
Examples of metrics
Quality & Impact: 26% (n=71)
• Number of publications, citations, grants, and patents
• Data and metadata standards
• Educational tools and protocols
• Altmetrics
• Cautions
– Data vs repository value
– Context for quantitative metrics
Examples of metrics
Quality of Service: 16% (n=42)
• Expertise of staff
• Up time and response time
• Regularly scheduled updates and maintenance
• Help desk and FAQs
• Tutorials, webinars, and training
• Cautions
– Down time context
Examples of metrics
Governance: 11% (n=29)
• Scientific Advisory Board
• Legal, regulatory, and contractual framework
• Documentation
• Terms of use
• Licensing
• Encryption and security
• Lifecycle management plan
• Funding
Examples of metrics
Infrastructure: 16% (n=42)
• Infrastructure funding
• Technology architecture
• Hardware and software
• Maintenance
• Sustainability
• Office space
Examples of metrics
Surveys & Case Studies: 9% (n=24)
• User experience surveys
• Stakeholder interviews
• Availability of alternatives
• Counterfactuals
• External audits
• Cautions
– Testimonials may be biased
– Validity of counterfactuals
Examples of metrics
. . .
Other Considerations: 8% (n=21)
• Terminology
• Indicators vs metrics
• Research cycle specific metrics
• Tracking of missing, uncertain, contradictory, or retracted data
• “Diversity in data complexity calls for different metrics”
• Existing metrics and assessment resources (e.g., ISO 16363, DSA-WDS)
Where to find more information?
• Executive Summary, mid-December
o datascience.nih.gov/bd2k
• For more information, email:
o [email protected]
Data Science at NIH
https://datascience.nih.gov/adds
[email protected]
@NIH_BD2K
#BD2K, #BigData
Framing Questions for Sustainability Session at the 2016 BD2K AHM (also on the Google Doc notes page)
1. In thinking about data preservation, scientific quality and impact, what do you find are the most important elements, or indicators, that promote data quality and ensure data impact?
2. As data integration and database cross integration become commonplace, how will this affect attribution and adherence to the FAIR principles*? For example, are there best practices for using format standards or for correctly referencing data identifiers that best support cross-linking data & repositories from one source/provider to another?
3. For data repositories that deal with clinical data and information, are there particular issues or challenges with adhering to the FAIR principles and performance indicators that could impact success?
4. What is the role for certifications of biomedical data repositories and trusted digital repositories? Will this differ between different biomedical and scientific domains?
5. What role does the scientific research community play in planning for and evaluating indicators for data preservation?
*FAIR principles are Findable, Accessible, Interoperable and Reusable.