Kathy Ruggiero k.ruggiero@auckland.ac.nz

Department of Statistics

A web-based personal assistant for designing statistically sound experiments

9 July 2015

Kathy Ruggiero and David Banks


All too often…


Inevitably…

• Wrong data is collected
  – Cannot answer the questions for which they were intended

OR

• Data is inefficiently collected
  – Comparisons between treatment means made with lower precision than may otherwise have been possible


The solution

“Use statistically designed experiments to ensure that the “right answers” are found with minimum effort, subjects and other resources.”

• Three fundamental principles (Fisher, 1926):
  – Replication, to enable separation of signal and noise
  – Randomisation, to eliminate bias and induce independence
  – Blocking, to control for systematic nuisance sources of variation
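
The three principles can be seen together in a small sketch of a randomised complete block design. This is only an illustration: the treatment labels, the block count and the function name `randomised_complete_block` are hypothetical, and it is not the algorithm used by ConceptionC2D.

```python
import random


def randomised_complete_block(treatments, n_blocks, seed=None):
    """Return {block: treatments in random order}, each treatment once per block."""
    rng = random.Random(seed)
    design = {}
    for block in range(1, n_blocks + 1):
        # Blocking: treatments are compared within each block.
        # Replication: every treatment appears once in every block.
        order = list(treatments)
        # Randomisation: an independent random run order inside each block.
        rng.shuffle(order)
        design[block] = order
    return design


# Hypothetical example: 3 treatments replicated across 4 blocks.
design = randomised_complete_block(["A", "B", "C"], n_blocks=4, seed=2015)
for block, order in design.items():
    print(block, order)
```

Fixing `seed` makes the layout reproducible, which is useful when the design has to be documented alongside the data.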


The problem

• Many software packages with experimental design capabilities but…
  – User must know a priori the “layout” of the design, e.g.
      Completely randomised design
      Randomised complete block design
      Balanced incomplete block design
      Row-column design
      Split-plot design
      Etc., etc., etc.!

• Large body of literature, largely inaccessible to novices


Our idea

“An expert system with the embedded requisite knowledge which interactively guides users through the thinking needed to develop an efficient statistically designed experiment.”


ConceptionC2D: Concept 2 Design

Flow diagram: Start, Research questions, Objectives, Measurable responses, Experimental factors, Experimental material, Nuisance variables, Generalisability, Finish

Outputs: PDF Summary, Design of Experiment


Example: Simulating a high CO2 world

Experiment: Simulate a high CO2 world

Identify/quantify metabolite, protein and gene transcript abundances

Goal: Elucidate molecular mechanisms involved in calcification


How were samples obtained?

Male:Female pair

Divide into 3 cultures:
  Control = 380 ppm (current level)
  Mid = 540 ppm
  High = 1000 ppm

Until 4-arm stage

Images:
http://uncw.edu/aquaculture/images/Larval-rearing-tanks.jpg
http://www.ceoe.udel.edu/Antarctica/pluticon.jpg


How were samples obtained?

8 Male:Female pairs

We will focus on proteomics and assume one sample is taken from each tank.


Front matter


The most important question?

Why am I collecting this data?

How will each variable you measure enable you to answer your research question(s)?


Why this experiment?

Why is this information being collected?

To direct you towards writing focused statements about the investigative questions you want your experiment to answer.


What will you measure?

Why is this information being collected?

To direct you towards thinking about the variable(s) you will measure on the subjects of your experiment.


Factors you will deliberately vary?

Why is this information being collected?

To direct you towards thinking about the experimental factors you will study in your experiment for the purpose of learning how they may help to explain observed changes in your measurable responses.


Factors you will deliberately vary?


Factors you will deliberately vary?


The subjects of your experiment?

Why is this information being collected?

To direct you towards thinking about:
1. Who or what are the subjects of your experiment?
2. How will you apply the experimental treatments to them?
3. How will they be managed for making measurements on them?


The subjects of your experiment?


Other experimental variables?

Why is this information being collected?

To direct you towards thinking about:
1. The source of each portion of experimental material, i.e. is each derived from:
   a. an independent source?
   b. a common source?
   c. a combination of both independent and common sources?
2. How the experimental material will be:
   a. processed and/or
   b. arranged in time and/or space
   in order to make measurements on them.
3. How 1 and 2 may contribute to systematic differences in the observed values on your measurable variables.


Other experimental variables?

Common source: A male:female pair

Male:Female parent-pair

Group of subjects = culture or tank
3 per parent-pair


Other experimental variables?

How will the experimental material be:
a. processed and/or
b. arranged in time and/or space
in order to make measurements on them?

8 samples can be simultaneously analysed in a single mass spec run (ignoring isobaric tags used to label samples)

Can each run accommodate a sample from each male:female pair?

Yes


Other experimental variables?

How will the experimental material be:
a. processed and/or
b. arranged in time and/or space
in order to make measurements on them?

          Male:Female Pair
Run   1   2   3   4   5   6   7   8
1
2
3


Other experimental variables?

Cells with same fill pattern = Samples analysed in a run
Cells with same fill colour = Samples from common source

          Male:Female Pair
Run   1   2   3   4   5   6   7   8
1
2
3

Every cell colour (M:F pair) occurs with every pattern (run)

Row-Column design


Other experimental variables?

Allocation of CO2 levels to row-column arrangement

                    Male:Female Pair
Run      1     2     3     4     5     6     7     8
1      380  1000   540   380  1000   540   380   540
2      540   380  1000   540   380  1000   540  1000
3     1000   540   380  1000   540   380  1000   380

• Full complement of CO2 levels in columns
• CO2 levels balanced within runs
• Biological variation orthogonal to run-to-run variation
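
The properties claimed for this allocation can be checked mechanically. The sketch below encodes the run-by-pair table and tests two of them: every column (male:female pair) receives all three CO2 levels, and the levels are spread as evenly as possible within each run. The run 1, pair 2 entry is taken to be 1000, which is an assumption: it is the only value that gives that column a full complement of CO2 levels given runs 2 and 3. The function names are illustrative, not part of ConceptionC2D.

```python
from collections import Counter

LEVELS = {380, 540, 1000}  # CO2 levels in ppm

# Rows are runs, positions within a row are male:female pairs 1..8.
# Assumption: run 1, pair 2 = 1000 (inferred, see the text above).
layout = {
    1: [380, 1000, 540, 380, 1000, 540, 380, 540],
    2: [540, 380, 1000, 540, 380, 1000, 540, 1000],
    3: [1000, 540, 380, 1000, 540, 380, 1000, 380],
}


def columns_complete(layout, levels):
    """True if every pair (column) receives each CO2 level exactly once."""
    columns = zip(*(layout[run] for run in sorted(layout)))
    return all(set(column) == levels for column in columns)


def run_counts(layout):
    """Number of samples at each CO2 level within each run."""
    return {run: Counter(row) for run, row in layout.items()}


print(columns_complete(layout, LEVELS))
for run, counts in run_counts(layout).items():
    print(run, dict(counts))
```

With 8 samples per run and 3 levels, exact equality within a run is impossible, so the nearest thing to balance is a 3/3/2 split, which is what the counts show.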


Our design versus theirs?

Design from ConceptionC2D
• 3 runs
• 12 residual d.f.
• 98.4% efficiency
• CO2 levels estimated independently of tags

Their design
• 4 runs (extra $4500)
• 14 residual d.f.
• 100% efficiency
• CO2 levels partially confounded with tags
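
The 12 residual d.f. quoted for the ConceptionC2D design are consistent with a simple accounting, sketched here under assumptions: 3 runs × 8 pairs = 24 samples, and an additive model with run, pair and CO2 effects (isobaric tags ignored, as in the example). The helper `residual_df` is hypothetical. The same accounting is not applied to the other design because its layout is not given.

```python
def residual_df(n_observations, levels_per_term):
    """Residual d.f. = (n - 1) minus (levels - 1) for each additive model term."""
    model_df = sum(levels - 1 for levels in levels_per_term.values())
    return (n_observations - 1) - model_df


# Assumed model terms: 3 runs, 8 male:female pairs, 3 CO2 levels.
df = residual_df(3 * 8, {"run": 3, "pair": 8, "co2": 3})
print(df)  # 23 - (2 + 7 + 2) = 12
```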


Output

• Summary of input (PDF)


Output

• Statistical design (CSV)

Run  Pair  CO2
1    1     380
1    2     540
1    3     1000
1    4     380
1    5     540
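
A design file of this shape is easy to load for downstream use. The sketch below parses the five rows shown above with Python's standard `csv` module. The comma delimiter and the exact header names `Run`, `Pair` and `CO2` are assumptions about the file, and `read_design` is an illustrative helper, not part of the tool.

```python
import csv
import io

# The rows shown on the slide, assuming a comma-separated file with a header.
design_csv = """Run,Pair,CO2
1,1,380
1,2,540
1,3,1000
1,4,380
1,5,540
"""


def read_design(text):
    """Parse the design into a list of {'Run': int, 'Pair': int, 'CO2': int}."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: int(value) for key, value in row.items()} for row in reader]


rows = read_design(design_csv)
for row in rows:
    print(row["Run"], row["Pair"], row["CO2"])
```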


What’s next

• Beta test it
  – Is it accessible and easy-to-use?
  Want to try it out? Email me: k.ruggiero@auckland.ac.nz

• Make it “smart”
  – An adaptive expert system which learns from its interactions with end-users

• Make it available on smart devices


The End

Thank you!