Towards Biomedical Research as a Digital Enterprise
DESCRIPTION
Presented at the American College of Medical Informatics (ACMI) Winter Symposium in Phoenix, Arizona, February 14, 2014
TRANSCRIPT
2014 ACMI Winter Symposium 1
Towards Biomedical Research as a Digital Enterprise
Philip E. Bourne
University of California San Diego
2/14/14
2014 ACMI Winter Symposium 2
My Background/Bias
• Limited Biomedical Informatics Experience – IAIMS, Pharmacy Informatics
• RCSB PDB/IEDB Database Developer – Views on community, quality, sustainability …
• PLOS Journal Co-founder – Open Science Advocate
• Associate Vice Chancellor for Innovation – Business models, interaction with the private sector, sustainability
• Professor – Mentoring, reward system, value (or not) of research
2014 ACMI Winter Symposium 3
Why Am I Here?
• In two weeks I will take on the NIH role of Associate Director for Data Science (ADDS):
– NIH Data Science Point Person
– Reports to NIH Director
– Lead the BD2K initiative
– Trans-NIH responsibilities for data
Eric Green, Acting
[Modified slide from Eric Green]
2014 ACMI Winter Symposium 4
Disclaimer
• These comments are currently being made as an employee of the University of California system and reflect my own opinions.
2014 ACMI Winter Symposium 5
I want to engage with this community to:
• Understand the most pressing problems
• Begin a dialog
• Inform you of what I am currently thinking
• Inform you of NIH initiatives that are underway or planned
• Have you change my thinking appropriately
2014 ACMI Winter Symposium 6
The NIH Process Thus Far
An external advisory group provided a valuable blueprint for what should be done
acd.od.nih.gov/diwg.htm
2014 ACMI Winter Symposium 7
Blueprint Recommendations
• Promote central and federated catalogs
– Establish minimal metadata framework
– Tools to facilitate data sharing
– Elaborate on existing data sharing policies
• Support methods and applications
– Fund all phases of software development
– Leverage lessons from National Centers
• Training
– More funding
– Enhance review of training apps
– Quantitative component to all awards
• On campus IT strategic plan
– Catalog of existing tools
– Informatics laboratory
– Ditto big data
• Sustainable funding commitment
acd.od.nih.gov/diwg.htm
2014 ACMI Winter Symposium 8
Special Considerations for Phenotypic Data Relevant to ACMI
• Definition: From cellular to human; sensitive or non-sensitive
• Need:
– Provide transparency regarding current policies
– Develop a common language for appropriate data access
– Establish the appropriate forum to establish policies
2014 ACMI Winter Symposium 9
Some of the Phenotypic Data Issues
• Data Governance
– Needs a balance of technology and policy solutions
• Data Sharing
– Query with or without data release
• Data Characterization
– Local vs standard nomenclature and associated mapping
Aligns well with: Hripcsak et al. J. Am. Med. Inform. Assoc. 2014 21:204-211
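Two of the issues above, mapping local nomenclature to a standard one and querying without releasing data, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the term map, the "STD:" codes, the function names, and the suppression threshold are hypothetical and do not come from the talk or from any real vocabulary.

```python
from collections import Counter

# Hypothetical local-to-standard term map; the codes are invented for illustration.
LOCAL_TO_STANDARD = {
    "dm2": "STD:0001",
    "type 2 diabetes": "STD:0001",
    "htn": "STD:0002",
}


def to_standard(local_term):
    """Return the standard code for a local term, or None if it is unmapped."""
    return LOCAL_TO_STANDARD.get(local_term.strip().lower())


def count_by_standard_code(records, min_cell_size=2):
    """Answer an aggregate query without releasing row-level data.

    Only counts per standard code leave this function, and counts below
    min_cell_size are suppressed (an assumed policy threshold).
    """
    counts = Counter()
    for record in records:
        code = to_standard(record["diagnosis"])
        if code is not None:
            counts[code] += 1
    return {code: n for code, n in counts.items() if n >= min_cell_size}
```

A site could run such a query locally and return only the suppressed counts, which is one possible reading of "query without data release".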
2014 ACMI Winter Symposium 10
Let Me Outline Then in General Terms Where I See My Effort Being Spent Going Forward
http://pebourne.wordpress.com/2013/12/
2014 ACMI Winter Symposium 11
ADDS Initial Thrusts
• How data are currently being used
• Lightweight metadata standards
• Data & software registries
• Expanded policies on data sharing, open source software
• Training programs & reward systems
• Institutional incentives
• Private sector incentives
• Data centers serving community needs
2014 ACMI Winter Symposium 13
We Need to Start By Asking How Are We Using the Data Now!
Only Then Can We Make Rational Decisions About Data – Large or Small
2014 ACMI Winter Symposium 14
How Data Are Used
[Figure: Structure Summary page activity for H1N1 Influenza related structures, Jan. 2008 to Jul. 2010. Structures shown: 1RUZ (1918 H1 Hemagglutinin) and 3B7E (Neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir).]
* http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm
[Andreas Prlic]
2014 ACMI Winter Symposium 15
We Need to Learn from Industries Whose Livelihood Addresses the Question of Use
2014 ACMI Winter Symposium 16
ADDS Initial Thrusts – More Detail
• Now:
– Data centers (under review)
– Data science training grants (call out)
– Pilot data catalog consortium (call out)
– Genomic Data Sharing Policy (being finalized)
– Piloting “NIH-drive”
• What Is Planned:
– Extended public-private programs specifically for data science activities
– Interagency activities
– International exchange programs
– Cold Spring Harbor-like training facilities – bicoastal?
– Programs for better data descriptions
– Reward institutions/communities
– Policies to get clinical trial data into the public domain
2014 ACMI Winter Symposium 18
Pilot NIH-Drive
• Investigator A from the NCI makes frequent reference to the overexpression of genes x and y.
• Investigator B from the NHLBI makes frequent reference to the underexpression of genes x and y.
• Automatic notification of a potential common interest before publication or database deposition
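The matching idea behind the pilot can be sketched in a few lines. This is only an illustration under stated assumptions, not how NIH-Drive works: the function names, the plain-text notes, the list of known gene symbols, and the "at least two mentions" threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations
import re


def gene_mentions(text, gene_symbols):
    """Count how often each known gene symbol appears in free text."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    return Counter(w.upper() for w in words if w.upper() in gene_symbols)


def find_common_interests(notes, gene_symbols, min_mentions=2):
    """Return (investigator_a, investigator_b, shared_genes) for every pair
    of investigators who each mention the same gene at least min_mentions times.

    notes maps an investigator label to the text of their notes.
    gene_symbols is a set of upper-case gene symbols to look for.
    """
    frequent = {}
    for who, text in notes.items():
        counts = gene_mentions(text, gene_symbols)
        frequent[who] = {g for g, n in counts.items() if n >= min_mentions}

    matches = []
    for a, b in combinations(sorted(frequent), 2):
        shared = frequent[a] & frequent[b]
        if shared:
            matches.append((a, b, sorted(shared)))
    return matches
```

Each returned tuple would be a candidate notification. Note that the sketch matches on the gene alone, so overexpression in one lab and underexpression in another still count as a common interest, as in the example above.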
2014 ACMI Winter Symposium 19
Let Me Bring Us Back to a More Far Reaching View Embodied in the Title of This Talk:
Towards Biomedical Research as a Digital Enterprise
2014 ACMI Winter Symposium 20
First Consider What We Do (or Wish We Could Do) Every Day:
We take actions on digital data increasingly across boundaries
2014 ACMI Winter Symposium 21
Actions on Biomedical Data Imply:
• Ensuring data quality and hence trust
• Making data sustainable
• Making data open and accessible
• Making data findable
• Providing suitable metadata and annotation
• Making data queryable
• Making data analyzable
• Presenting data so as to maximize its value
• Rewarding good data practices
2014 ACMI Winter Symposium 23
Boundaries on Biomedical Data Imply:
• Working across biological scales
• Working across biomedical disciplines
• Working across basic and clinical research and practice
• Working across institutional boundaries
• Working across public and private sectors
• Working across national and international borders
• Working across funding agencies
2014 ACMI Winter Symposium 25
These Issues Have Been Around Almost As Long As Biomedical Informatics
The Good News is That “Big Data” Has Brought More Attention to the Problem
2014 ACMI Winter Symposium 26
What Are Big Data?
• Large datasets from high throughput experiments
• Large numbers of small datasets
• Data which are “ill-formed”
• The why (causality) is replaced by the what
• A signal that a fundamental change is taking place – a tipping point?
2014 ACMI Winter Symposium 27
That Change is Embodied in The Digital Enterprise
• Consists of digital assets
• E.g. datasets, papers, software, lab notes
• Each asset is uniquely identified and has provenance, including access control
• E.g. publishing simply involves changing the access control
• Digital assets are interoperable across the enterprise
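A minimal sketch of what such a digital asset could look like as a data structure follows. It is an illustration only: the class name DigitalAsset, its fields, and the "private"/"public" access values are hypothetical choices, not a schema from the talk.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import uuid


@dataclass
class DigitalAsset:
    """A hypothetical digital asset: dataset, paper, software, or lab note."""
    kind: str
    title: str
    owner: str
    access: str = "private"  # assumed values: "private" or "public"
    # Each asset gets its own unique identifier at creation time.
    asset_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Provenance is kept as an append-only list of events.
    provenance: list = field(default_factory=list)

    def record(self, actor, action):
        """Append a provenance event with a UTC timestamp."""
        self.provenance.append({
            "actor": actor,
            "action": action,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def publish(self, actor):
        """Publishing only changes the access control; the asset itself is untouched."""
        self.access = "public"
        self.record(actor, "access changed to public")
```

In this sketch, publishing a lab note or a dataset is the same operation: the identifier and content stay fixed, and only the access setting and the provenance trail change.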
2014 ACMI Winter Symposium 28
The Enterprise Is Almost Anything … Your Lab, Your Institution, the NIH …
2014 ACMI Winter Symposium 29
Consider an Academic Institution As A Digital Enterprise
• Jane scores extremely well in parts of her graduate on-line neurology class. Neurology professors, whose research profiles are on-line and well described, are automatically notified of Jane’s potential based on a computer analysis of her scores against the background interests of the neuroscience professors. Consequently, professor Smith interviews Jane and offers her a research rotation. During the rotation she enters details of her experiments related to understanding a widespread neurodegenerative disease in an on-line laboratory notebook kept in a shared on-line research space – an institutional resource where stakeholders provide metadata, including access rights and provenance beyond that available in a commercial offering. According to Jane’s preferences, the underlying computer system may automatically bring to Jane’s attention Jack, a graduate student in the chemistry department whose notebook reveals he is working on using bacteria for purposes of toxic waste cleanup. Why the connection? They reference the same gene a number of times in their notes, which is of interest to two very different disciplines – neurology and environmental sciences. In the analog academic health center they would never have discovered each other, but thanks to the Digital Enterprise, pooled knowledge can lead to a distinct advantage. The collaboration results in the discovery of a homologous human gene product as a putative target in treating the neurodegenerative disorder. A new chemical entity is developed and patented. Accordingly, by automatically matching details of the innovation with biotech companies worldwide that might have potential interest, a licensee is found. The licensee hires Jack to continue working on the project. Jane joins Joe’s laboratory, and he hires another student using the revenue from the license. The research continues and leads to a federal grant award. 
The students are employed, further research is supported and in time societal benefit arises from the technology.
From: What Big Data Means to Me. JAMIA 2014 21:194
2014 ACMI Winter Symposium 30
The NIH is Starting to Think About the Digital Enterprise, Witness…
bd2k.nih.gov
2014 ACMI Winter Symposium 31
What Will Define the NIH Digital Enterprise?
• NCBI/NLM
• Trans-NIH collaboration – a culture change
• Long-term NIH strategic planning
• The BD2K Initiative
• A “hub” of data science activities
• International cooperation
• Interagency cooperation
• Data sharing policies
• External forces …
2014 ACMI Winter Symposium 32
External Forces: Science Will Continue to Become More Open
• The public (and hence the politicians) demand it
• It’s the right thing to do
• It’s part of the modern psyche
• The scholarly enterprise is broken and more stakeholders are acknowledging it
2014 ACMI Winter Symposium 33
Result: Discovery is Too Slow
http://sagecongress.org/Presentations/Sommer.pdf
[Josh Sommer]
2014 ACMI Winter Symposium 35
Personal Evidence for a Broken System
• I have a paper with 16,000 citations that no one has ever read
• I have papers in PLOS ONE that have more citations than ones in PNAS
• I have data sets I am proud of but no place to put them
• I “can’t” reproduce work from my own lab …
2014 ACMI Winter Symposium 36
Personal Evidence for a Broken System
• I can’t immediately reproduce the research in my own laboratory:
• It took an estimated 280 hours for an average user to approximately reproduce the paper
• Workflows are maturing and becoming helpful
• Data and software versions and accessibility prevent exact reproducibility
Daniel Garijo et al. 2013 Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome. PLOS ONE 8(11) e80278.
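Since data and software versions are named as an obstacle to exact reproducibility, one common mitigation is to save a small manifest next to every result. The sketch below is an assumption-laden illustration, not the method of the cited paper: the function name run_manifest and the manifest fields are hypothetical, and the package versions are supplied by the caller rather than discovered automatically.

```python
import hashlib
import json
import platform
import sys


def run_manifest(data, packages):
    """Describe one analysis run: interpreter, platform, package versions,
    and a checksum of the exact input bytes.

    data is the raw input as bytes; packages maps package name to version string.
    """
    return {
        "python": platform.python_version(),
        "platform": sys.platform,
        "packages": dict(sorted(packages.items())),
        "data_sha256": hashlib.sha256(data).hexdigest(),
    }


def manifest_json(data, packages):
    """Serialize the manifest so it can be stored alongside the results."""
    return json.dumps(run_manifest(data, packages), indent=2, sort_keys=True)
```

Comparing two manifests shows at a glance whether a failed reproduction used different input data or different software versions.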
2014 ACMI Winter Symposium 37
Politicians Demand It: G8 open data charter
http://opensource.com/government/13/7/open-data-charter-g8
2014 ACMI Winter Symposium 38
External Forces: The Deinstitutionalization of Science
Daniel Hulshizer/Associated Press
2014 ACMI Winter Symposium 40
An Example of That External Force: The Story of Meredith
http://fora.tv/2012/04/20/Congress_Unplugged_Phil_Bourne
2014 ACMI Winter Symposium 43
There Still Needs to be a Reward System
The Wikipedia Experiment – Topic Pages
• Identify areas of Wikipedia that relate to the journal that are missing or stubs
• Develop a Wikipedia page in the sandbox
• Have a Topic Page Editor review the page
• Publish the copy of record with associated rewards
• Release the living version into Wikipedia
2014 ACMI Winter Symposium 44
One Possible End Product of Open Science
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
1. User clicks on thumbnail
2. Metadata and a webservices call provide a renderable image that can be annotated
3. Selecting a feature provides a database/literature mashup
4. That leads to new papers
PLoS Comp. Biol. 2005 1(3) e34
2014 ACMI Winter Symposium 45
If This Vision of a Digital Enterprise Comes to Pass Based Upon:
• More open science
• Deinstitutionalization
• New modes of scholarly communication
• Changing rewards for scholarship
What Will Biomedical Research Look Like?
2014 ACMI Winter Symposium 46
The Research Life Cycle will Persist
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
Tools and Resources Will Continue To Be Developed
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
[Figure labels: Authoring Tools, Lab Notebooks, Data Capture, Software, Analysis Tools, Visualization, Scholarly Communication]
Those Elements of the Research Life Cycle will Become More Interconnected Around a Common Framework
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
[Figure labels: Authoring Tools, Lab Notebooks, Data Capture, Software, Analysis Tools, Visualization, Scholarly Communication]
2014 ACMI Winter Symposium 49
New/Extended Support Structures Will Emerge
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
[Figure labels, tools: Authoring Tools, Lab Notebooks, Data Capture, Software, Analysis Tools, Visualization, Scholarly Communication]
[Figure labels, support structures: Commercial & Public Tools, Git-like Resources By Discipline, Data Journals, Discipline-Based Metadata Standards, Community Portals, Institutional Repositories, New Reward Systems, Commercial Repositories, Training]
2014 ACMI Winter Symposium 50
Change in the Way we Support the Research Lifecycle
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
[Figure labels, tools: Authoring Tools, Lab Notebooks, Data Capture, Software, Analysis Tools, Visualization, Scholarly Communication]
[Figure labels, support structures: Commercial & Public Tools, Git-like Resources By Discipline, Data Journals, Discipline-Based Metadata Standards, Community Portals, Institutional Repositories, New Reward Systems, Commercial Repositories, Training]
2014 ACMI Winter Symposium 51
Conclusion: Biomedical Research Will Increasingly Become a Digital Enterprise in the Way I Have Described
Agree/Disagree?
If Agree, Where Should Resources be Put?
If Disagree, What is Your Vision?
2014 ACMI Winter Symposium 52
Provocative Questions Perhaps?
• Do BMIs see openness in the same way as computational biologists; if not, why not?
• Is there indeed perturbation in what it means to be a research scholar, and if so, is that disruption as prevalent in clinical research as in basic research?
• What would you do in my shoes?