csiro digital productivity and services...

30
Statistical experimental design principles for biological studies Alexander (Alec) Zwart 10 July 2014 CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP

Upload: others

Post on 25-Aug-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

Statistical experimental design principles for biological studiesAlexander (Alec) Zwart10 July 2014

CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP

Page 2: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

An apology, and a caveat

•Apology: I am not a genetics/genomics expert!•But I DO know a bit about design…

• Caveat: Experimental design is a science! (as is statistical analysis…)•Every experiment is different•This talk is necessarily simplistic, and veryincomplete…

• Intended as a reminder of some concepts, some bits of advice, and a plea…

Presentation title  |  Presenter name2 |

Page 3: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

An experiment – recap:

• Apply treatments (experimental conditions) or treatment combinations to subjects (or experimental units).

• Carefully control/remove other influences as much as possible.

• Observe one or more responses (observed or measured results)

• A way to infer cause => effect.

• Genotype/Species/Variety is a treatment?  Yes...

Presentation title  |  Presenter name3 |

Page 4: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

What is statistical experimental design concerned with?• Everything!  The entire process leading to the final dataset…

• Recognising sources of variation (systematic, random) that influence your results and hence controlling for them

•How to ensure that we obtain data that provides ‘honest’ and accurate answers to our questions

Presentation title  |  Presenter name4 |

Page 5: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

Key point:

•Good statistics cannot rescue a bad dataset!

•And a bad dataset may not be able to answer your questions (honestly, that is)

•Do the experiment right => get a good dataset

• Consider design from the very beginning of planning the experiment

Presentation title  |  Presenter name5 |

Page 6: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

Fundamentals

Randomisation

Blocking

Replication

Presentation title  |  Presenter name6 |

Page 7: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

Randomisation

•Allocate treatments to subjects randomly(ideally using a proper random number generator, not your own personal judgement)

•Also randomise:•Spatial layout•Order of processing

•Avoid systematic patterns (But: see Blocking)

•Protects against unforeseen biases

ED for Biology |  Alexander Zwart7 |

Page 8: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

Blocking

• Identify groupings that may effect your results•Choice of operator/technician/machine•Cage/Table/shelf/assay plate/batch

•Try to assign (ideally) one of each treatment to each group! (Randomised Block Design or RBD)•(Example coming later…)

ED for Biology |  Alexander Zwart8 |

Page 9: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

Replication

• Sample size•G*Power (http://www.gpower.hhu.de/en.html)•Russ Lenth’s applets (http://homepage.stat.uiowa.edu/~rlenth/Power/)

•Simulation studies in the literature? (Google scholar)

• Biological/internal/technical replicates•Don’t confuse these – Internal/technical replicates are not a substitute for biological replication.

• (more later…)

ED for Biology |  Alexander Zwart9 |

Page 10: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

12 mice, three treatments (Colour)

Presentation title  |  Presenter name10 |

Completely randomised designBalanced (equal info on each treatment,

treatments compared with equal accuracy)

Page 11: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

But, the mice must be housed…

Presentation title  |  Presenter name11 |

Here, treatments are confounded with cages

Cannot separate Cage effects from Trt effects!

Page 12: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

Cages as Blocks

Presentation title  |  Presenter name12 |

Page 13: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

Grr

Presentation title  |  Presenter name13 |

Animals (especially) can pose a problem…

Grrr!

Grrr!GrrEeeek!

Page 14: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

Cages as Experimental Units / Subjects…

Presentation title  |  Presenter name14 |

…& Cage = Mouse

Page 15: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

(Aside‐ env. trend across columns of cages?

Presentation title  |  Presenter name15 |

=> Block by Columns!

Page 16: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

Not enough cages? A compromise:

But, Cages are the Experimental unitsOnly two replicates of each Trt!

Page 17: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

More on replication:

• The replication used determines the population(s) being sampled...

...hence…

• The replication used determines the conclusions that may be validly drawn from a statistical test

ED for Biology |  Alexander Zwart17 |

Page 18: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

Comparison between populations of plants

ED for Biology |  Alexander Zwart18 |

Mutant

2.1 1.4 1.9 1.6 4.3 4.44.13.9

Wild Type

Page 19: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

Comparisons between two plants

ED for Biology |  Alexander Zwart19 |

Mutant

4.3 4.44.13.9

Wild Type

2.1 1.4 1.9 1.6

Page 20: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

Recording

•Details of treatments and associated randomisation, blocking, types of replication should always be recorded

• Check what information is required for the analysis, or talk to the statistician about what is required.

• This is important meta data when distributing your data…

ED for Biology |  Alexander Zwart20 |

Page 21: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

More on Blocking…

• ‘One of each treatment per block’ is not always feasible – blocks may be too small• ‘Incomplete blocks’

• Think about:•Balance – treatments occur together in the same blocks the same number of times.

•Linkage – common treatments in different blocks provide links between blocks – ensure you have plenty of linkage.

ED for Biology |  Alexander Zwart21 |

Page 22: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

Balanced Incomplete Block Design (BIBD)

- Trts occur together the same number of times- Plenty of treatment-linkages between blocks

Page 23: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

Notes on incomplete blocks

• BIBDs are only availble for certain combinations of parameters.

•However, the general principles are still relevant• ‘Unbalanced incomplete blocks’

•Note that teasing apart the treatment effects from block effects in unbalanced situations is dependent on the capabilities of the analysis method•Check on said capabilities, consider simpler experiment if necessary…

ED for Biology |  Alexander Zwart23 |

Page 24: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

Treatment structure

• Sir Ronald Fisher => factorial treatment structure

(3x4 factorial)

• 5 = Number of replicates of each treatment

ED for Biology |  Alexander Zwart24 |

Page 25: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

Factorial treatment structure

• Separate main effects and interactions• Can include more factors (in theory)•Unbalanced numbers of treatments usually OK:

ED for Biology |  Alexander Zwart25 |

Page 26: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

But beware missing treatment combinations

• The more that entire treatments are missing from the factorial, the harder it is to tease apart main effects and interactions and the more confounded different treatment effects become.

ED for Biology |  Alexander Zwart26 |

Page 27: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

Don’t get too ambitious!

• Two‐way factorials are much easier to interpret than three‐way, four way… etc.

• The higher the factorial structure, the more likely the structure will be incomplete • (missing values, treatment combos which don’t make sense, treatment combos you aren’t interested in and don’t include).

• Statisticians and their tools can’t work miracles!• Simpler is better…• Talk to a statistician!

ED for Biology |  Alexander Zwart27 |

Page 28: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

Plea:

• If you don’t already know how to design and conduct good experiments…

…then…

•…involve a statistician in the early stages of planning your experiment!

•And note – the statistician may need time…

Presentation title  |  Presenter name28 |

Page 29: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

Sir Ronald Fisher, ‘the father of modern statistics’…

“To consult the statistician after an experiment is finished is often merely to ask them to conduct a post mortem examination. 

They can perhaps say what the experiment died of.”

(R. Fisher)

Presentation title  |  Presenter name29 |

Page 30: CSIRO DIGITAL PRODUCTIVITY AND SERVICES …bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/...CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP Thank you Title Microsoft PowerPoint

CSIRO DIGITAL PRODUCTIVITY AND SERVICES FLAGSHIP

Thank you