five selfish reasons to work reproducibly

Florian Markowetz CRUK Cambridge Institute www.markowetzlab.org 5 selfish reasons to work reproducibly More publications, more grants, more awesome!

Upload: florian-markowetz

Post on 07-Aug-2015

Florian Markowetz CRUK Cambridge Institute

www.markowetzlab.org

5 selfish reasons to work reproducibly

More publications, more grants, more awesome!

Systems Genetics of Cancer

Genetic variation

•  In people

•  In tumours

•  In clones

Phenotypic variation

•  Tumour subtypes

•  Aggressiveness

•  Survival

Cancer genome Evolution

Cancer tissue Context

Cancer genome Function

Ines

Wei Edith

Geoff

Ke Anne Joe

Leon

Andy

Amanda

Science ≠ miracles

“How Bright Promise in Cancer Testing Fell Apart”

New York Times, July 7, 2011

Slides by Keith Baggerly

Baggerly & Coombes, AOAS 2009

http://videolectures.net/cancerbioinformatics2010_baggerly_irrh/

Reproducible Research

•  It’s the right thing to do!

•  The world would be a better place if everyone did it!

•  It’s the foundation of Science!

•  It’s the honourable thing to do!

Reproducibility helps to avoid disaster

[Figure: Anatomy of the NFκB pathway. Labels: Weak/Strong phenotype; Step 1; Step 2; Hits; Knock-down; Known pathway members; New RNAi hits; Compare expression phenotypes by NEMs; NFκB; ?]

What a nice result!

A project is more than a beautiful result!

Starting with reproducibility early helps save time later

Reproducibility helps writing papers

Why is well-documented and easily accessible code+data useful?

•  Easy to look up numbers and put them in manuscript

•  Be confident your figures and tables are up-to-date

•  Numbers and results automatically update when data change.

•  It is engaging and more eyes can look over the analysis.

•  Easier to spot mistakes.
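The points above about numbers that update when data change can be sketched in a few lines. This is only an illustration, not the workflow used in the talk: the sample table, the column names, and the helper functions `load_rows`, `summarise`, and `manuscript_sentence` are all hypothetical. The idea is that the sentence for the manuscript is generated from the data, so nothing is copied by hand.

```python
import csv
import io
import statistics

# Hypothetical example data. In a real project this would be read from
# a file in the data directory instead of an inline string.
RAW_CSV = """sample,group,expression
s1,control,1.0
s2,control,2.0
s3,treated,4.0
s4,treated,6.0
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per sample."""
    return list(csv.DictReader(io.StringIO(text)))


def summarise(rows):
    """Return the mean expression value for each group."""
    by_group = {}
    for row in rows:
        by_group.setdefault(row["group"], []).append(float(row["expression"]))
    return {group: statistics.mean(values) for group, values in by_group.items()}


def manuscript_sentence(means):
    """Fill the computed means into a sentence for the paper."""
    return (
        "Mean expression was {treated:.1f} in treated samples "
        "and {control:.1f} in controls."
    ).format(**means)


if __name__ == "__main__":
    print(manuscript_sentence(summarise(load_rows(RAW_CSV))))
```

If a sample is added or corrected, rerunning the script regenerates the sentence, and a co-author can read the few lines of code to check how each number was obtained.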

Reproducibility helps arguing with reviewers

A very engaged reviewer

•  Reviewer: “I downloaded the authors’ data and tried out a variation of their analysis which gave an insignificant result”

•  We: “Thank you, the reason is XXX and if you do YYY everything is fine.”

Reproducibility enables continuity

“I am so busy, I can’t remember all the details of all my projects”

“I did this analysis 6 months ago. Of course I can’t remember all the details any more …”

“My PI said I should continue the project of a previous postdoc. But that postdoc is long gone and hasn’t saved any scripts or data.”

Reproducibility helps to build your reputation

http://www.sciencemag.org/content/348/6242/1422/F1.large.jpg

“Mind your own business! I document my data the way I want!”

“Excel works just fine. I don’t need any fancy R or Python or whatever.”

“Sounds alright, but my code and data are spread over so many hard drives and directories that it would just be too much work to collect them all in one place”

“My field is very competitive and I can’t risk wasting time”

“We can always sort out the code and data after submission”

“It’s only the result that matters!”

“I’d rather do real science than tidy up my data”

5 selfish reasons to work reproducibly

1.  Avoid disaster

2.  Easier to write papers

3.  Easier to talk to reviewers

4.  Continuity of your work/in the lab

5.  Reputation

When do you need to worry about reproducibility?

•  Before you start the project

•  While you do the analysis

•  When you write the paper

•  When you co-author a paper

•  When you review a paper

Scientific SOFT SKILLS

•  Organization of project

•  Tidy data

•  Tidy code

•  Control over tools

•  Documentation

•  Reproducibility

\project
    \data
    \code
    \analysis
    \paper

Less clicking and pasting, more scripting and coding
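As a small example of scripting instead of clicking, the project layout above can be created by a script. This is a minimal sketch under stated assumptions: the four folder names come from the slide, while the function name `create_project` and the temporary location used in the demo are hypothetical choices for illustration.

```python
from pathlib import Path
import tempfile

# Folder names taken from the slide; everything else here is illustrative.
SUBDIRS = ("data", "code", "analysis", "paper")


def create_project(root):
    """Create the project folder with one subfolder per SUBDIRS entry.

    Returns the sorted names of the subfolders that now exist under root.
    Running it again on an existing project is harmless.
    """
    root = Path(root)
    for name in SUBDIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    # The demo uses a temporary directory so that it leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        print(create_project(Path(tmp) / "project"))
```

Starting every project from the same script means everyone in the lab knows where to find the data, the code, and the paper.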

Reproducibility is important for

•  PhD students

•  Postdocs

•  PIs

Learn tools and apply in daily work!

Create a ‘culture of reproducibility’ in your lab!
