tools for communicating in the computational sciences
Brian M. Bot | Senior Scientist | Sage Bionetworks
clearScience
14 December 2012
Sage Bionetworks
a non-profit organization that pilots a variety of components necessary to build a scientific research “commons”
why?
“We must guard against the acquisition of unwarranted influence, whether sought or unsought, by the military-industrial complex.”
- Dwight D. Eisenhower, 1961
(slide annotation: “Medical”)
not conducive to a ‘commons’
institutional incrementalism
individual tenure
proprietary short term solutions
enabling a commons
‣ open data
‣ accessible platform
‣ clear communication
open data
“The problem is that right now, it’s not easy to donate your data to health research.”
“The goal of Consent to Research is to play a part in the transformation of health from something we experience passively to something we experience actively.”
- John Wilbanks, Chief Commons Officer
http://weconsent.us
accessible platform
‣ compute ‣ hardware ‣ software
‣ data ‣ code
analysis environment
RESTful APIs
clear communication
Deception at Duke
research scandals represent merely the extreme of a continuum in the culture of academic research
the status quo tolerates poor communication of findings
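[Figure: pie chart of reproducibility outcomes for published microarray gene expression analyses. 54% cannot reproduce; the remaining slices (21%, 11%, 8%, 6%) are divided among: can reproduce in principle, can reproduce with discrepancies, can reproduce from processed data with discrepancies, can reproduce partially]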
Ioannidis J.P.A. et al. Repeatability of published microarray gene expression analyses. Nature Genetics 41, 149-155 (2009) | doi:10.1038/ng.295
208,294,724 datapoints
124 pages supplemental material
?? lines of unobtainable source code
?? version or architecture of the statistical analysis program (R)
innumerable R packages and package dependencies
key R package “ClaNC” no longer available
442 citations
often what is reproducible in principle is not reproducible in practice
unidentified publication
‣ from a journal with a 5-year impact factor of 28
‣ article freely available for download
‣ data freely available for download
how are we to move science forward if we cannot understand what was done previously?
let’s go back to basics
scientific method
1. define a question
2. gather information and resources (background research)
3. form a hypothesis
4. test hypothesis experimentally
5. analyze experimental data
6. draw conclusions based on data
7. publish results
8. retest (frequently done by other scientists)
7. publish results
finite in ∞...
experimentally generate data
store on local server
analyze on local machine
write a document
submit to journal
sent to reviewers as pdf
accepted & digitally typeset
static pdf representation
static html representation
printed on paper
scientific claims are being artificially uncoupled from science itself
science is hard
communication is hard
(especially for scientists)
clearScience
re-imagining scientific communication
allow consumption of content at a variety of levels of complexity and abstraction
leverage Synapse RESTful APIs
“hand the keys over” to the reviewers
scientific communication needs to evolve
along with science
make it easy to do good science
clearScience
Acknowledgements
Sage Bionetworks
David Burdick - Rockstar Engineer
Stephen Friend - President and CEO
Erich S. Huang - Director of Cancer Research
Mike Kellen - Director of Technology
External Partners
Myles Axton - Nature Genetics
Phil Bourne - PLoS Computational Biology
Josh Greenberg - Alfred P. Sloan Foundation
Kelly LaMarco - Science Translational Medicine
Eric Schadt - Mount Sinai School of Medicine