infrastructure for communicating data-intensive science
brian m. bot | senior scientist | community manager | sage bionetworks
clearScience
a non-profit organization which pilots a variety of components that are necessary to build a scientific research “commons”
why?
Sage Bionetworks
“We Must Guard Against the acquisition of unwarranted influence,
whether sought or unsought, by the Military Industrial Complex”
- Dwight D. Eisenhower, 1961
Medical
institutional incrementalism
individual tenure
proprietary shortsighted solutions
not conducive to a ‘commons’
“The problem is that right now, it’s not easy to donate your data to health research.”
“The goal of Consent to Research is to play a part in the transformation of health from
something we experience passively to something we
experience actively.”
http://weconsent.us
John Wilbanks, Chief Commons Officer
open data
accessible platform
a collaborative compute space that allows scientists to share and analyze
data together
the status quo tolerates poor communication of findings
6%
21%
8%
11%
54%
cannot reproduce
can reproduce in principle
can reproduce w/discrepancies
can reproduce from processed data w/discrepancies
can reproduce partially
Ioannidis J.P.A. et al. Repeatability of published microarray gene expression analyses. Nature Genetics 41, 149-155 (2009) | doi:10.1038/ng.295
208,294,724 datapoints
124 pages supplemental material
?? lines unobtainable source code
?? version or architecture of statistical analysis program (R)
innumerable R packages and package dependencies
key R package “ClaNC” no longer available
442 citations
often, what is reproducible in principle is not reproducible in practice
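One way to narrow the gaps listed above (unknown program version, unknown architecture, untracked package dependencies) is to record the compute environment next to the analysis itself. The following is a minimal sketch under stated assumptions: it uses Python rather than R, relies only on the standard library, and the function name `environment_snapshot` and the layout of its output are hypothetical choices made for illustration, not something taken from the talk.

```python
import json
import platform
from importlib import metadata


def environment_snapshot():
    """Collect interpreter version, architecture, and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        # Skip distributions whose metadata is missing or has no name.
        name = meta.get("Name") if meta is not None else None
        if name:
            packages[name] = dist.version
    return {
        "language": "Python",
        "version": platform.python_version(),
        "architecture": platform.machine(),
        "system": platform.system(),
        # Sort by package name so two snapshots are easy to compare.
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Store this output with the data and the analysis code.
    print(json.dumps(environment_snapshot(), indent=2))
```

Saving such a snapshot with every analysis would answer the "??" items above for later readers, although it does not by itself guarantee that the listed packages remain obtainable.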
unidentified publication
‣ from journal with 5 year impact factor of 28
‣ article freely available for download
‣ data freely available for download
scientific method
1. define a question
2. gather information and resources (background research)
3. form a hypothesis
4. test hypothesis experimentally
5. analyze experimental data
6. draw conclusions based on data
7. publish results
8. retest (frequently done by other scientists)
experimentally generate data
store on local server
analyze on local machine
write a document
submit to journal
sent to reviewers as pdf
accepted & digitally typeset
static pdf representation
static html representation
printed on paper
clearScience
re-imagining scientific communication
allow consumption of content at a variety of levels of complexity
and abstraction
leverage Synapse RESTful APIs
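As a rough illustration of what building on a REST interface can look like, the sketch below constructs the URL for a single entity and, optionally, fetches its JSON description. The base URL, the `/entity/{id}` path, and the identifier `syn123` are assumptions made for illustration and should be checked against the current Synapse REST documentation. The helper names `entity_url` and `fetch_entity` are hypothetical and do not come from the talk.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed base URL of the repository service; verify before relying on it.
ASSUMED_BASE_URL = "https://repo-prod.prod.sagebase.org/repo/v1"


def entity_url(entity_id, base_url=ASSUMED_BASE_URL):
    """Build the (assumed) REST URL for one entity, escaping the identifier."""
    return f"{base_url.rstrip('/')}/entity/{quote(entity_id, safe='')}"


def fetch_entity(entity_id, base_url=ASSUMED_BASE_URL, timeout=10):
    """Fetch an entity's JSON description; needs network access and,
    for non-public entities, authentication that is not shown here."""
    request = Request(
        entity_url(entity_id, base_url),
        headers={"Accept": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # "syn123" is a placeholder identifier, not a real dataset reference.
    print(entity_url("syn123"))
```

Keeping URL construction separate from the network call means a front end such as clearScience could render links to data and code without fetching anything until a reader asks for more detail.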
“hand the keys over” to the reviewers
“Scientists often study the past as obsessively as historians because few
other professions depend so acutely on it. Every experiment is a conversation with
a prior experiment, every new theory a refutation of the old”
-Siddhartha Mukherjee, The Emperor of All Maladies
Acknowledgements
Sage Bionetworks
David Burdick - Senior Software Engineer
Stephen Friend - President and CEO
Erich S. Huang - Director of Cancer Research
Michael Kellen - Director of Technology
External Partners
Myles Axton - Nature Genetics
Phil Bourne - PLoS Computational Biology
Josh Greenberg - Alfred P. Sloan Foundation
Kelly LaMarco - Science Translational Medicine
Eric Schadt - Mount Sinai School of Medicine