open source scientific software
Brief talk at BrainHack 2013 on developing open-source scientific software: strategic issues on project positioning, ...
Open source scientific software: What, why, & how
Gael Varoquaux
Slides on slideshare
Please allow me to introduce myself
I’m a man of wealth and taste
I’ve been around for a long, long year
2005-2007: Experimental-control software (quantum physics, free-fall airplanes)
Since 2006: Open source scientific Python (Mayavi, scikit-learn, joblib, nipy, nilearn...)
2008: Consultant for a scientific Python startup (Enthought, Texas)
Scipy/Euroscipy conference chair
G Varoquaux 2
1 What
Source, access, science, data
1 Open Source: definitions
OSI: Open Source Initiative http://opensource.org
Free redistribution
Access to source code
Allow derived work
No discrimination against persons or groups, or against fields of endeavor
FSL, I am looking at you
Universities are commercial entities (Madey v. Duke)
Open community: access to a code repository, read & write
SPM, FreeSurfer... I am looking at you
1 Choice of license
http://opensource.org/licenses
Use it, don’t screw my users: BSD, MIT
Viral by code inclusion: LGPL
Copyleft: GPL
Do you understand the consequences?
- GPL code cannot be linked to MKL
- LGPL code can only be reused in GPL/LGPL code
- Code with no license cannot be used
Don’t invent licenses: legalese should be left to lawyers
Use BSD
- foster the private sector
- avoid legal difficulties
- we need as much reuse as possible
- science should not have strings attached
2 Why
www.phdcomics.com
How do we justify the investment to our bosses? To the funding agencies?
2 For the Good of Science
“if it’s not open and verifiable by others, it’s not science, or engineering, or whatever it is you call what we do” Stodden, 2010
“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment.”
Buckheit & Donoho, 1995
Reproducible science
These are high-level conclusions
We need more down-to-earth arguments
2 Lab survival: beyond the oral tradition
Can you run the analysis of the lab’s former students?
We need basic building blocks
More eyes make bugs shallow
2 The economics
Code maintenance is expensive
scikit-learn ∼ 300 emails/month, nipy ∼ 45 emails/month
joblib ∼ 45 emails/month, mayavi ∼ 30 emails/month
“Hey Gael, I take it you’re too busy. That’s okay, I spent a day trying to install XXX and I think I’ll succeed myself. Next time though please don’t ignore my emails, I really don’t like it. You can say, ‘sorry, I have no time to help you.’ Just don’t ignore.”
Your “benefits” come from a fraction of the code
Data loading? Standard algorithms?
Share the common code...
...to avoid dying under code
Code becomes less precious with time
And somebody might contribute features
2 Having an impact
To reach our target audience (neuroscientists, MDs)
To disseminate our ideas
To facilitate new ideas
Can bring citations
3 How
3 Choice of environment
Python, what else?
High-level language
- interactive: ipython
- easy to debug
- general purpose
Scientific computing environment
- array computing: numpy
- rich ecosystem: scipy, scikit-learn, scikit-image...
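A minimal sketch of what “array computing” buys in practice, assuming numpy is installed; the data are synthetic and the z-scoring task is only an illustration. The same per-signal standardization is written once with explicit Python loops and once with whole-array numpy operations, and the two are checked to agree:

```python
import numpy as np

# Synthetic "signals": 5 time series of 100 samples each (illustrative data only)
rng = np.random.default_rng(0)
signals = rng.standard_normal((5, 100))


def zscore_loop(x):
    """Standardize each row using explicit Python loops."""
    out = np.empty_like(x)
    for i, row in enumerate(x):
        mean = sum(row) / len(row)
        std = (sum((v - mean) ** 2 for v in row) / len(row)) ** 0.5
        out[i] = [(v - mean) / std for v in row]
    return out


def zscore_vectorized(x):
    """Same computation, expressed as whole-array operations."""
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)  # population std (ddof=0), as in the loop
    return (x - mean) / std


z = zscore_vectorized(signals)
assert np.allclose(z, zscore_loop(signals))
print(z.mean(axis=1).round(6), z.std(axis=1).round(6))
```

The vectorized version is shorter, reads like the formula, and runs in compiled code rather than in the Python interpreter, which is much of why numpy serves as the foundation for the rest of the ecosystem.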
3 6 steps to a successful project
1 Focus on quality
2 Build great docs and examples
3 Use GitHub
4 Limit the technicality of your codebase
5 Releasing and packaging matter
6 Focus on your contributors, give them credit
http://www.slideshare.net/GaelVaroquaux/scikit-learn-dveloppement-communautaire
3 Scikit-learn: a very successful project
General-purpose machine learning in Python
Over 200 contributors, ∼ 12 core devs
Huge feature list: benefits of a wide team
Success recipe: product vision, great docs, high-level
Documentation: all figures are generated
Crafting simple didactic examples has taught us a lot
⇒ Executable docs = textbooks of the future
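As a small-scale illustration of executable documentation, here is a sketch using Python’s standard-library `doctest` module. `moving_average` is a hypothetical helper, and this is not the machinery scikit-learn uses to build its figure gallery. The examples in the docstring are run and their output compared against what is written, so the documentation cannot silently drift away from the code:

```python
import doctest


def moving_average(values, width):
    """Return the moving average of ``values`` over windows of ``width``.

    The examples below are both documentation and tests:

    >>> moving_average([1, 2, 3, 4, 5], 2)
    [1.5, 2.5, 3.5, 4.5]
    >>> moving_average([1, 2, 3], 3)
    [2.0]
    """
    if width < 1 or width > len(values):
        raise ValueError("width must be between 1 and len(values)")
    # One average per full window, sliding one element at a time
    return [sum(values[i:i + width]) / width
            for i in range(len(values) - width + 1)]


if __name__ == "__main__":
    # Run every example found in this module's docstrings and report the tally
    print(doctest.testmod())
```

Running the file directly executes the examples; `python -m doctest -v file.py` does the same from the command line.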
3 Nilearn: making multivariate analysis routine
Project scope: very preliminary
Machine learning for neuroimaging: make using scikit-learn on neuroimaging easy
The target user base is small
Examples in the docs
- Run out of the box, downloading open data
- Produce a clear figure
Data from Miyawaki 2008
Routine, simple, reproduction of papers
Open source scientific software
It’s worth it
Do it right:
- Liberal licensing (BSD)
- Realistic engineering compromises
- Quality and ease of use (the Apple strategy)
Work with us on nilearn
Examples = open science
@GaelVaroquaux
Open source, a tragedy: 1/f distribution
Source: Fernando Perez