open source scientific software

Open source scientific software: What, why, & how. Gaël Varoquaux. Slides on slideshare.

Upload: gaelvaroquaux

Post on 27-Jan-2015

DESCRIPTION

Brief talk at BrainHack 2013 on developing open-source scientific software: strategic issues on project positioning,

TRANSCRIPT

Page 1: Open Source Scientific Software

Open source scientific software
What, why, & how

Gaël Varoquaux

Slides on slideshare

Page 2: Open Source Scientific Software

Please allow me to introduce myself
I’m a man of wealth and taste
I’ve been around for a long, long year

2005..2007: Experimental-control software
Quantum physics, free-fall airplanes

2006... Open source scientific Python
Mayavi, scikit-learn, joblib, nipy, nilearn...

2008 Consultant, scientific Python
Startup: Enthought, Texas

Scipy/Euroscipy conference chair

G Varoquaux 2

Page 3: Open Source Scientific Software

Open source scientific software

1 What

source
access
science
data

Page 4: Open Source Scientific Software

1 Open Source: definitions

OSI: Open Source Initiative http://opensource.org

Free redistribution

Access to source code

Allow derived work

No discrimination against persons or groups / against fields of endeavor

FSL, I am looking at you
Universities are commercial entities

(Madey vs Duke)

Open Community
Access to a code repository: read & write
SPM, FreeSurfer... I am looking at you

Page 6: Open Source Scientific Software

1 Choice of license

http://opensource.org/licenses

Use it, don’t screw my users: BSD, MIT

Viral by code inclusion: LGPL

Copyleft: GPL

Do you understand the consequences?
- GPL code cannot be linked to MKL
- LGPL code can only be reused in GPL/LGPL code
- Code with no licenses cannot be used
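
The last point, that code with no license cannot be used, suggests checking that every source file actually declares one. The sketch below is one minimal way to do that, assuming files carry an SPDX-style `SPDX-License-Identifier:` comment near the top. That convention, the function names, and the 10-line search window are illustrative assumptions, not something stated in the talk.

```python
import re

# Assumed convention: an SPDX short-form tag near the top of each file,
# e.g. "# SPDX-License-Identifier: BSD-3-Clause".
_SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")


def declared_license(source_text, max_lines=10):
    """Return the identifier declared in the first lines, or None."""
    for line in source_text.splitlines()[:max_lines]:
        match = _SPDX_TAG.search(line)
        if match:
            return match.group(1)
    return None


def unlicensed_files(sources):
    """Given {filename: source text}, list the files with no declared license."""
    return sorted(
        name for name, text in sources.items()
        if declared_license(text) is None
    )
```

Calling `unlicensed_files` on a mapping of file names to their contents returns the names that would need a license header before anyone else can safely reuse them.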

Page 8: Open Source Scientific Software

1 Choice of license

http://opensource.org/licenses

Use it, don’t screw my users: BSD, MIT

Viral by code inclusion: LGPL

Copyleft: GPL

Do you understand the consequences?
Don’t invent licenses
Legalese should be left to lawyers

Use BSD
- foster private sector
- avoid legal difficulties
- we need as much reuse as possible
- science should not have strings attached

Page 9: Open Source Scientific Software

Open source scientific software

2 Why

www.phdcomics.com

How do we justify the investment
- to our bosses
- to the funding agencies

Page 10: Open Source Scientific Software

2 For the Good of Science

“if it’s not open and verifiable by others, it’s not science, or engineering, or whatever it is you call what we do” Stodden, 2010

“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment.”

Buckheit & Donoho, 1995

Reproducible science

These are high-level conclusions

Need more down-to-earth arguments

Page 12: Open Source Scientific Software

2 Lab survival: beyond the oral tradition

Can you run the analysis of the lab’s former students?

We need basic building blocks

More eyes make bugs shallow

Page 13: Open Source Scientific Software

2 The economics

Code maintenance is expensive

scikit-learn ∼ 300 emails/month
nipy ∼ 45 emails/month
joblib ∼ 45 emails/month
mayavi ∼ 30 emails/month

“Hey Gael, I take it you’re too busy. That’s okay, I spent a day trying to install XXX and I think I’ll succeed myself. Next time though please don’t ignore my emails, I really don’t like it. You can say, ‘sorry, I have no time to help you.’ Just don’t ignore.”

Your “benefits” come from a fraction of the code
Data loading?
Standard algorithms?

Share the common code...
...to avoid dying under code

Code becomes less precious with time
And somebody might contribute features

Page 15: Open Source Scientific Software

2 Having an impact

To reach our target audience (neuroscientists, MDs)

To disseminate our ideas

To facilitate new ideas

Can bring citations

Page 16: Open Source Scientific Software

Open source scientific software

3 How

Page 17: Open Source Scientific Software

3 Choice of environment

Python, what else?

High-level language
- interactive: ipython
- easy to debug
- general purpose

Scientific computing environment
- array computing: numpy
- rich ecosystem: scipy, scikit-learn, scikit-image...

Page 18: Open Source Scientific Software

3 6 steps to a successful project

1 Focus on quality

2 Build great docs and examples

3 Use github

4 Limit the technicality of your codebase

5 Releasing and packaging matter

6 Focus on your contributors, give them credit

http://www.slideshare.net/GaelVaroquaux/scikit-learn-dveloppement-communautaire

Page 19: Open Source Scientific Software

3 Scikit-learn: a very successful project

General-purpose machine learning in Python

Over 200 contributors
∼ 12 core devs

Huge feature list: benefits of wide team
Success recipe: product vision, great docs, high-level

Documentation: all figures are generated
Crafting simple didactic examples has taught us a lot

⇒ Executable docs = textbooks of the future
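
One way to read "executable docs" is documentation whose examples are run as code, so they cannot silently drift away from the implementation. The sketch below illustrates the idea with Python's standard `doctest` module. The `zscore` function and the `run_doc_examples` helper are hypothetical illustrations, and this is not necessarily the mechanism scikit-learn uses to generate its figures.

```python
import doctest
import math


def zscore(values):
    """Center values on their mean and scale to unit standard deviation.

    The example below is documentation and a test at the same time:

    >>> zscore([1.0, 2.0, 3.0])
    [-1.2247, 0.0, 1.2247]
    """
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    # Round so the documented output stays short and stable.
    return [round((v - mean) / std, 4) for v in values]


def run_doc_examples(func):
    """Execute the docstring examples of func; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = attempted = 0
    for test in finder.find(func, name=func.__name__,
                            globs={func.__name__: func}):
        result = runner.run(test)
        failed += result.failed
        attempted += result.attempted
    return failed, attempted
```

If someone later changes `zscore` without updating its docstring, `run_doc_examples(zscore)` reports a failure, which is the property that makes a didactic example trustworthy.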

Page 20: Open Source Scientific Software

3 Nilearn: making multivariate analysis routine

Project scope (very preliminary)
Machine learning for neuroimaging:
make using scikit-learn on neuroimaging easy

The target user base is small

Examples in the docs

Run out of the box, downloading open data
Produce a clear figure

Data from Miyawaki 2008

Routine, simple, reproduction of papers

Page 21: Open Source Scientific Software

Open source scientific software

It’s worth it
Do it right:
- Liberal licensing (BSD)
- Realistic engineering compromises
- Quality and ease of use (the Apple strategy)

Work with us on nilearn
Examples = open science

@GaelVaroquaux

Page 22: Open Source Scientific Software

Open source: a tragedy
1/f distribution

Source: Fernando Perez