![Page 1: EMBL-ABR - BD2K and why bioinformatics matters...BD2K and why bioinformatics matters relevance to Australia EMBL - Australia AHM 2016 Vivien Bonazzi Senior Advisor for Data Science](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3d6de9e883df4b5b55efcc/html5/thumbnails/1.jpg)
BD2K and why bioinformatics matters
relevance to Australia
EMBL - Australia AHM 2016
Vivien Bonazzi, Senior Advisor for Data Science Technologies
ADDS (Associate Director for Data Science) Office
Office of the Director (OD), National Institutes of Health (NIH)
The NIH Data Commons: Digital Ecosystems for Using and Sharing FAIR Data
A word about BD2K (Big Data to Knowledge)
http://datascience.nih.gov/bd2k
What’s driving the need for a Data Commons?
Convergence of factors
¤ Mountains of Data
¤ Increasing need and support for Data sharing
¤ Availability of digital technologies and infrastructures that support Data at scale
NIH Genomic Data Sharing (GDS) Policy: https://gds.nih.gov/
Went into effect January 25, 2015
Requires public sharing of genomic data sets
NCI guidance: http://www.cancer.gov/grants-training/grants-management/nci-policies/genomic-data
Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect a broad array of large datasets so that researchers, clinicians, and patients will be able to both contribute and analyze data, facilitating discovery that will ultimately improve patient care and outcomes.
Challenges with Biomedical Data
¤ The journal article is the end goal
¤ Data is a means to an end (low value)
¤ Data is not FAIR: Findable, Accessible, Interoperable, Reusable
¤ Limited e-infrastructures to support FAIR data
What’s Changing?
Digital ecosystems
Development of the NIH Data Commons
¤ How do we find data, software, standards?
¤ How can we make (large) data, annotations, software, metadata accessible?
¤ How do we reuse data, tools and standards?
¤ How do we make more data machine readable?
¤ How do we leverage existing digital technologies, systems, and infrastructures?
¤ How do we collaborate?
¤ How do we enable a digital ecosystem?
Changing the conversation around Data sharing and access
NIH Data Commons
Data Commons: enabling data-driven science
Enable investigators to leverage all possible data and tools in the effort to accelerate biomedical discoveries, therapies and cures by driving the development of data infrastructure and data science capabilities through collaborative research and robust engineering.
Matthew Trunnell, FHC
Data Commons
Developing a Data Commons
¤ Treats products of research (data, methods, papers, etc.) as digital objects
¤ These digital objects exist in a shared virtual space
  • Find, deposit, manage, share, and reuse data, software, metadata and workflows
¤ Digital object compliance through FAIR principles:
  • Findable
  • Accessible (and usable)
  • Interoperable
  • Reusable
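To make the idea of a FAIR-compliant digital object concrete, here is a minimal Python sketch. The `DigitalObject` fields and the `fair_checklist` function are hypothetical illustrations, not part of any NIH Commons specification, and each check is a deliberately crude proxy for one FAIR principle; a real FAIRness metric would look at persistent identifiers, rich metadata, shared vocabularies and much more.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical record for one research product (data set, tool, workflow, paper)."""
    identifier: str                                # unique ID (Findable)
    kind: str                                      # e.g. "data", "software", "workflow"
    metadata: dict = field(default_factory=dict)   # descriptive metadata (Findable)
    access_url: str = ""                           # retrieval location (Accessible)
    formats: list = field(default_factory=list)    # standard formats used (Interoperable)
    license: str = ""                              # terms of reuse (Reusable)


def fair_checklist(obj: DigitalObject) -> dict:
    """Return one crude pass/fail flag per FAIR principle (illustrative only)."""
    return {
        "findable": bool(obj.identifier) and "title" in obj.metadata,
        "accessible": obj.access_url.startswith(("https://", "http://")),
        "interoperable": len(obj.formats) > 0,
        "reusable": bool(obj.license),
    }


if __name__ == "__main__":
    example = DigitalObject(
        identifier="hypothetical-id-0001",
        kind="data",
        metadata={"title": "Example reference data set"},
        access_url="https://example.org/data/0001",
        formats=["VCF"],
        license="CC-BY-4.0",
    )
    print(fair_checklist(example))
    # {'findable': True, 'accessible': True, 'interoperable': True, 'reusable': True}
```

Returning one flag per principle, rather than a single score, shows which aspect of FAIRness an object is missing.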
The Data Commons is a framework that supports FAIR data access and sharing and fosters the development of a digital ecosystem.
https://datascience.nih.gov/commons
The Data Commons Framework
Framework layers (from the slide diagram):
¤ Compute Platform: Cloud
¤ Services: APIs, Containers, Indexing
¤ Software: Services & Tools; scientific analysis tools/workflows
¤ Data: "Reference" Data Sets; user-defined data
¤ Digital Object Compliance
¤ App store/User Interface
¤ Cloud service models: PaaS, SaaS, IaaS
https://datascience.nih.gov/commons
Current Data Commons Pilots
Explore feasibility of the Commons Framework
Facilitate collaboration and interoperability
Making large and/or high-impact NIH-funded data sets and tools accessible in the cloud
Developing Data and Software indexing methods
Leveraging BD2K efforts: bioCADDIE and others
Collaborating with external groups
Provide access to cloud (IaaS) and PaaS/SaaS via credits
Connecting credits to the grants system
Reference Data Sets Pilot
Large, High-Impact Datasets in the Cloud
Commons Framework Pilots: Software and Services
Commons Framework
• FAIRness Metrics
• Data-object registry
• Interoperability of APIs
• Workflow sharing and Docker registry
• Commons Framework Publications
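A data-object registry can be pictured as a service that mints a stable identifier for each object, records a checksum and the cloud locations holding copies, and keeps a version history. The sketch below is an in-memory toy built on those assumptions; the class names, the `dg.example` prefix and the identifier format are all hypothetical and do not describe any actual NIH registry.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class RegistryEntry:
    guid: str                    # stable identifier minted by the registry
    version: int                 # 1 for the first registration, then incremented
    checksum: str                # SHA-256 of the content, to detect silent changes
    locations: Tuple[str, ...]   # cloud URLs holding copies of this version


class DataObjectRegistry:
    """Toy in-memory data-object registry (hypothetical; not an NIH API)."""

    def __init__(self, prefix: str = "dg.example") -> None:
        self._prefix = prefix
        self._history: Dict[str, List[RegistryEntry]] = {}
        self._next_id = 1

    def register(self, content: bytes, locations: List[str]) -> RegistryEntry:
        """Mint a new identifier for previously unregistered content."""
        guid = f"{self._prefix}/{self._next_id:06d}"
        self._next_id += 1
        entry = RegistryEntry(guid, 1, hashlib.sha256(content).hexdigest(), tuple(locations))
        self._history[guid] = [entry]
        return entry

    def add_version(self, guid: str, content: bytes, locations: List[str]) -> RegistryEntry:
        """Record a new version under an existing identifier."""
        history = self._history[guid]  # raises KeyError for unknown identifiers
        entry = RegistryEntry(guid, history[-1].version + 1,
                              hashlib.sha256(content).hexdigest(), tuple(locations))
        history.append(entry)
        return entry

    def resolve(self, guid: str, version: Optional[int] = None) -> RegistryEntry:
        """Return the latest version, or a specific one if requested."""
        history = self._history[guid]
        if version is None:
            return history[-1]
        for entry in history:
            if entry.version == version:
                return entry
        raise KeyError(f"{guid} has no version {version}")
```

Resolving without a version returns the newest copy, while pinning a version lets an analysis keep pointing at exactly the bytes it used, even after the working copy changes.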
Resource Search & Indexing
Discoverability of data and software
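Discoverability through resource indexing can be illustrated with a small inverted index that maps keywords in data set and software descriptions to resource IDs. This is only a sketch that assumes plain keyword matching; production indexes typically work over structured metadata rather than free-text tokens alone, and `ResourceIndex` is a hypothetical name.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Set


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ResourceIndex:
    """Toy keyword index over data and software descriptions (hypothetical)."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)  # token -> resource IDs
        self._kinds: Dict[str, str] = {}                        # resource ID -> kind

    def add(self, resource_id: str, kind: str, description: str) -> None:
        self._kinds[resource_id] = kind
        for token in tokenize(description):
            self._postings[token].add(resource_id)

    def search(self, query: str, kind: Optional[str] = None) -> List[str]:
        """Return IDs whose descriptions contain every query token."""
        tokens = tokenize(query)
        if not tokens:
            return []
        # .get avoids inserting empty postings for unknown tokens
        hits = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        if kind is not None:
            hits = {r for r in hits if self._kinds[r] == kind}
        return sorted(hits)


if __name__ == "__main__":
    index = ResourceIndex()
    index.add("ds-1", "data", "Whole genome sequencing of marsupial tissue")
    index.add("sw-1", "software", "Variant calling workflow for genome sequencing")
    print(index.search("genome sequencing"))                   # ['ds-1', 'sw-1']
    print(index.search("genome sequencing", kind="software"))  # ['sw-1']
```

Indexing data and software in the same structure, with a `kind` filter, reflects the slide's point that both need to be discoverable.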
Cloud Credits Model
Dollar-denominated NIH credits for using cloud resources (IaaS) and services (PaaS/SaaS)
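The credits model amounts to a dollar-denominated balance that is drawn down as a researcher uses cloud infrastructure or services. The sketch below assumes one account per grant and tracks spend per service tier (IaaS, PaaS, SaaS), which also gives the kind of usage accounting raised later under Metrics. `CreditAccount` and the grant identifier are hypothetical, and `Decimal` is used to avoid floating-point rounding on currency.

```python
from decimal import Decimal
from typing import Dict


class InsufficientCredits(Exception):
    """Raised when a charge would exceed the remaining credit balance."""


class CreditAccount:
    """Hypothetical dollar-denominated cloud credit account for one grant."""

    SERVICE_TIERS = ("IaaS", "PaaS", "SaaS")

    def __init__(self, grant_id: str, balance: Decimal) -> None:
        self.grant_id = grant_id
        self.balance = balance
        self.usage: Dict[str, Decimal] = {tier: Decimal("0") for tier in self.SERVICE_TIERS}

    def charge(self, tier: str, amount: Decimal) -> Decimal:
        """Deduct a charge for one service tier and return the new balance."""
        if tier not in self.SERVICE_TIERS:
            raise ValueError(f"unknown service tier: {tier}")
        if amount <= 0:
            raise ValueError("charge must be positive")
        if amount > self.balance:
            raise InsufficientCredits(
                f"{self.grant_id}: charge {amount} exceeds balance {self.balance}")
        self.balance -= amount
        self.usage[tier] += amount  # per-tier totals double as usage metrics
        return self.balance


if __name__ == "__main__":
    account = CreditAccount("hypothetical-grant-001", Decimal("100.00"))
    account.charge("IaaS", Decimal("30.50"))  # e.g. compute hours
    account.charge("SaaS", Decimal("19.50"))  # e.g. a hosted analysis tool
    print(account.balance)         # 50.00
    print(account.usage["IaaS"])   # 30.50
```

Rejecting a charge that exceeds the balance, instead of letting it go negative, is one possible policy; a real scheme tied to the grants system could equally allow top-ups or overdrafts.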
Authorization/authentication layer
Digital Ecosystem
Considerations and Concluding Thoughts
Considerations
¤ Metrics – Understanding and accounting of data usage patterns
¤ Cost
  • Cloud storage
  • Pay-per-use cloud compute (NIH credits pilot)
  • Indirect costs for cloud
¤ Hybrid Clouds – Institution (private) and commercial (public) clouds
¤ Managing open vs. controlled-access data
  • Auth: single sign-on, dreams or nightmares?
¤ Archive vs. working copies of data, and versioning
¤ Interoperability with other Commons (clouds)
¤ Standards – Metadata, UIDs, APIs
¤ Discoverability – Finding digital objects across clouds
¤ Interfaces – For users with different needs and capabilities
¤ Consent – Re-consenting data, dynamic consent?
¤ Policies
  • Data sharing policies that are useful and effective
  • Keep pace with use of technology (e.g. dbGaP data in the cloud)
¤ Incentives
  • Access to, and shareability of, FAIR data as part of NIH grant review criteria
¤ Governance – Community involvement in governance models
¤ Sustainability – Long term support
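The open vs. controlled-access and consent considerations can be made concrete with a small authorization check. This sketch assumes that open data needs no login, while controlled-access data requires both an approved access request for that data set and a stated research purpose that participants' consent covers. All names (`Dataset`, `User`, `can_access`, the purpose strings) are hypothetical, and real controlled-access workflows involve much more than this.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class AccessLevel(Enum):
    OPEN = "open"
    CONTROLLED = "controlled"


@dataclass
class Dataset:
    dataset_id: str
    access: AccessLevel
    consented_uses: Set[str] = field(default_factory=set)  # uses participants agreed to


@dataclass
class User:
    user_id: str                                               # identity from single sign-on (assumed)
    approved_datasets: Set[str] = field(default_factory=set)  # granted access requests


def can_access(user: Optional[User], dataset: Dataset, purpose: str) -> bool:
    """Open data needs no login; controlled data needs approval and a consented purpose."""
    if dataset.access is AccessLevel.OPEN:
        return True
    if user is None:  # anonymous users never see controlled-access data
        return False
    return dataset.dataset_id in user.approved_datasets and purpose in dataset.consented_uses


if __name__ == "__main__":
    cohort = Dataset("ds-ctrl-01", AccessLevel.CONTROLLED, {"disease-specific research"})
    alice = User("alice", {"ds-ctrl-01"})
    print(can_access(alice, cohort, "disease-specific research"))  # True
    print(can_access(alice, cohort, "commercial use"))             # False
```

Because consent is stored on the data set rather than baked into each approval, a dynamic-consent scheme would amount to updating `consented_uses` over time, with every later access check picking up the change.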
Relevance to Australia?
Relevance to Australia
¤ The value of Australian data
  • Unique flora and fauna, e.g. marsupials
  • Indigenous Australians
  • Understanding of genomic structure in health and disease
  • Medicinal products
¤ Making this data (securely) available
  • With high-quality annotation and metadata
  • With attribution to original authors
  • On the cloud
  • Via open, standard APIs
¤ Aggregation of data via an Australia-wide Commons?
Authorization/authentication layer
Oz Digital Ecosystem
Summary
¤ We need an unprecedented level of convergence and collaboration to drive biomedical science to the next level.
¤ Supporting this model of data-intensive collaborative science requires a shift in academic research culture and new investments in data infrastructure and capabilities.
Matthew Trunnell, FHC
Acknowledgments
• ADDS Office: Jennie Larkin, Phil Bourne, Michelle Dunn, Mark Guyer, Allen Dearry, Sonynka Ngosso, Tonya Scott, Lisa Dunneback, Vivek Navale (CIT/ADDS)
• NCBI: George Komatsoulis
• NHGRI: Valentina di Francesco
• NIGMS: Susan Gregurick
• CIT: Andrea Norris, Debbie Sinmao
• NIH Common Fund: Jim Anderson, Betsy Wilder, Leslie Derr
• NCI Cloud Pilots/GDC: Warren Kibbe, Tony Kerlavage, Tanja Davidsen
• Commons Reference Data Set Working Group: Weiniu Gan (HL), Ajay Pillai (HG), Elaine Ayres (BITRIS), Sean Davis (NCI), Vinay Pai (NIBIB), Maria Giovanni (AI), Leslie Derr (CF), Claire Schulkey (AI)
• RIWG Core Team: Ron Margolis (DK), Ian Fore (NCI), Alison Yao (AI), Claire Schulkey (AI), Eric Choi (AI)
• OSP: Dina Paltoo, Kris Langlais, Erin Luetkemeier, Agnes Rooke
• Research and Industry: Matthew Trunnell (FHC), Bob Grossman (Chicago), Toby Bloom (NYGC)
Stay in Touch
QR Business Card
@Vivien.Bonazzi
Slideshare
Blog (Coming soon!)