practice what you preach - cbiovikings.org

31
Practice What You Preach Reproducible Research at the Front Lines of Science Kai Blin Novo Nordisk Foundation Center for Biosustainability Tools for Reproducability in Bioinformatics 2016-04-21

Upload: truonghanh

Post on 31-Dec-2016

239 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Practice What You Preach - cbiovikings.org

Practice What You PreachReproducible Research at the Front Lines of Science

Kai BlinNovo Nordisk Foundation Center for Biosustainability

Tools for Reproducability in Bioinformatics 2016-04-21

Page 2: Practice What You Preach - cbiovikings.org

HejWho is that guy?

More info:

Scientific Software Engineering

Microbiology

antiSMASH

CRISPy

·

·

·

·

phdops.kblin.org

github.com/kblin

@kaiblin

orcid.org/0000-0003-3764-6051

·

·

·

·

orcid.org/0000-0003-3764-6051

2/31

Page 3: Practice What You Preach - cbiovikings.org

Overview

Reproducible Research

Reproducible Research in Practice

·

·

Problems-

3/31

Page 4: Practice What You Preach - cbiovikings.org

Overview

Reproducible Research

Reproducible Research in Practice

·

·

Challenges-

4/31

Page 5: Practice What You Preach - cbiovikings.org

Overview

Reproducible Research

Reproducible Research in Practice

What does actually work (for me)

·

·

Challenges-

·

5/31

Page 6: Practice What You Preach - cbiovikings.org

There is no such thing as "reproducible science", there is just"science" and "not science". – Someone on Twitter

Reproducible Research

Page 7: Practice What You Preach - cbiovikings.org

Reproducibility

Ideally:

Lab notebook available

Data on Figshare

Code on GitHub/Bitbucket (and Figshare)

Preprint on a preprint server

·

·

·

·

7/31

Page 8: Practice What You Preach - cbiovikings.org

Reproducibility

Reviewers check and reproduce results

Fame and glory (and grants)

·

·

8/31

Page 9: Practice What You Preach - cbiovikings.org

In theory, there is no difference between theory and practice.But, in practice, there is. – Jan L. A. van de Snepscheut

Reproducible Research inPractice

Page 10: Practice What You Preach - cbiovikings.org

ChallengesReproducibility isn't free

Making sure your research is 100% reproducible is a lot of work.

This takes time and effort. (see Reproducibility isn't free by FitzJohnet al.)

Even if you are convinced, is your PI / supervisor? Their boss?

·

·

·

10/31

Page 11: Practice What You Preach - cbiovikings.org

ChallengesReproducibility isn't compelling

Nice post by Greg Wilson in the context of Software Carpentry.

~ 5 mio articles published between 1990-2000

Of these ~ 100 retracted for "computational irreproducibility"

Chances that your paper is retracted: 1 : 5 000 000

Assuming ~ 8 months to write paper and 48 hour work week, canspend 115 seconds on reproducibility

·

·

·

·

11/31

Page 12: Practice What You Preach - cbiovikings.org

ChallengesChicken and egg

Reviewers don't ask for it

Researchers don't provide it

Catching this at publication stage is too late

·

·

·

12/31

Page 13: Practice What You Preach - cbiovikings.org

ChallengesIt's not a reflex yet

13/31

Page 14: Practice What You Preach - cbiovikings.org

But…

Some points raised by C. Titus Brown

start small

any reproducibility is better than no reproducibility

·

provide raw data

provide code

provide what version of what tool you called with whatparameters

-

-

-

·

14/31

Page 15: Practice What You Preach - cbiovikings.org

Reproducible Research != Open Science

Disclaimer: I like Open Science

Can work reproducibly even in closed science

Maybe easier to get buy-in from senior scientists for RR

Many selfish reasons to do RR

·

·

·

15/31

Page 16: Practice What You Preach - cbiovikings.org

It works for me. – Christopher Walken

What works (for me)

Page 17: Practice What You Preach - cbiovikings.org

What Works (for me)

Reduce special cases

Document remaining special cases

less surprises == better work

learn from past mistakes

·

·

·

·

17/31

Page 18: Practice What You Preach - cbiovikings.org

Directory Layout

As regular as possible, at least per project type.

2016-04-21_e.xamplis_de_novo/ |-- fastqc/ |-- post/ \-- pre/ |-- output/ |-- reads/ |-- scripts/ |-- trimmed/ |-- Makefile \-- README.md

18/31

Page 19: Practice What You Preach - cbiovikings.org

Directory Layout

Use a script to create project layout

#!/bin/bash mkdir -p reads fastqc/{pre,post} output scripts trimmed touch README.md Makefile git init cat >> .gitignore <<EOF reads/*gz fastqc/* *.swp EOF git add .gitignore git commit -m "Initial commit"

19/31

Page 20: Practice What You Preach - cbiovikings.org

Lab Notebook

README.md file per project

copy & paste README.md into ELN

Keep in git

·

Have an explanation what the project is about for future self

Then add whatever you're doing to it as it happens

-

-

·

·

20/31

Page 21: Practice What You Preach - cbiovikings.org

gitall the things! (almost)

Whenever possible, keep stuff in git

README.md, Makefile, scripts, etc.

Commit when something is changed

Don't commit reads / generated data

·

·

·

·

(Maybe use git-annex for large files)·

21/31

Page 22: Practice What You Preach - cbiovikings.org

gitpart deux

Have some git repo for random stuff unrelated to projects

Make small commits of logical units of change

Write good commit messages

·

Avoid "where did I put this script?"-

·

·

22/31

Page 23: Practice What You Preach - cbiovikings.org

gitcommit messages

I suggest the following format:

Short summary line < 60 characters A longer explanation of what the change is about. This is what your'll read to figure out what this change is about. Make this count, your future self will thank you.

23/31

Page 24: Practice What You Preach - cbiovikings.org

Don't Run Commands Manually

Every project has a scritps directory or Makefile.

Data manipulation is driven from there

Also applies to (most) other things I do

·

·

·

If more elaborated than ls, put it in a script

Not perfect yet, but getting there

-

-

24/31

Page 25: Practice What You Preach - cbiovikings.org

Software Management

Package software

Use fpm & aptly / createrepo

Install locally or put into Docker containers

·

·

·

25/31

Page 26: Practice What You Preach - cbiovikings.org

Docker

BUT

Great to deal with software with many dependencies·

Clumsy for CLI tools·

26/31

Page 27: Practice What You Preach - cbiovikings.org

Docker CLI example

#!/bin/bash readonly INPUT_FILE=$(basename $1) readonly INPUT_DIR=$(dirname $(readlink -f $1)); shift readonly OUTPUT_DIR=$(readlink -f $1); shift readonly CONTAINER_SRC_DIR="/input" readonly CONTAINER_DST_DIR="/output" if [ ! -d ${OUTPUT_DIR} ]; then mkdir ${OUTPUT_DIR} fi docker run \ --volume ${INPUT_DIR}:${CONTAINER_SRC_DIR}:ro \ --volume ${OUTPUT_DIR}:${CONTAINER_DST_DIR}:rw \ --detach=false --rm --user=$(id -u):$(id -g) \ antismash/standalone ${INPUT_FILE} $@

27/31

Page 28: Practice What You Preach - cbiovikings.org

Workflow Management Systems

Good idea for repetitive workflows

Check what your colleagues are using

Maybe use Common Workflow Language (CWL)

·

·

·

28/31

Page 29: Practice What You Preach - cbiovikings.org

Workflow Management Systems

Keep track of inputs and parameters

Easily run a workflow 5, 10, 100 times

Lots of overhead for one-off analyses

·

·

·

29/31

Page 30: Practice What You Preach - cbiovikings.org

Further reading

Git can facilitate greater reproducibility and increased transparencyin science

Reproducible research is still a challenge

Some myths of reproducible computational research

·

·

·

30/31

Page 31: Practice What You Preach - cbiovikings.org

Thanks for your attention.

Questions?