GRAPHICAL DISPLAY OF DATA AND STATISTICS FOR BASIC SCIENCE RESEARCHERS Elizabeth Garrett-Mayer, PhD Professor of Biostatistics, Hollings Cancer Center and Dept. of Public Health Sciences


10 Principles of Display of Data

1. Look at the raw data
2. Make graphs before implementing statistical analysis
3. The amount of data you can display is inversely proportional to the amount of data you have
4. Transformations are important
5. Consistency is important
6. Reflect experimental design and inferences in your displays
7. Outliers: do not just delete them!
8. Careful of over-interpretation
9. Use captions and legends
10. Avoid gratuitous fancy stuff


1. Look at the raw data

There is a tendency to ‘adjust’ or ‘normalize’ before seeing what the data say on their own.

Why is this important?
- Detection of outliers
- Helps to determine patterns in the data
- You need to know your data before you analyze them.

Do not get ahead of yourself: the analytic tools you use depend on what the data ‘look like.’

Example:

[Figure: interferon-gamma release (pg/ml), no restimulation vs. restimulation]

2. Make graphs before implementing statistical analysis

Graphical displays lead us to the correct analysis approach

[Figure: tumor volume vs. time (days, 0 to 80); volume axis ticks at 0, 30, 100, 300, 600, 1000, 2000, 3000, 4000; groups: High Cis+GR, High Cis, Low Cis+GR, Low Cis, GR, Control]

GR is a thromboxane receptor antagonist

[Figure, three panels; legend entries: High Cis+GR, High Cis, Low Cis+GR, Low Cis, GR, and Control or noGR:
(a) Tumor volume vs. time from injection (days)
(b) Proportion tumor-free vs. time to tumor (days)
(c) Volume vs. days from tumor onset]

3. The amount of data you can display is inversely proportional to the amount of data you have

Natural History of PSA: N=39

[Figure: PSA (axis ticks from 0.01 to 200) vs. time on study (months, 0 to 200)]


Three treatment clinical trial. N=1000.

[Figure: three displays of PSA (axis ticks from 0.1 to 10000) by treatment group: Doce, Sched 1; Doce, Sched 2; Mitox]


But, N=10? Or N=5? Show the data!

4. Transformations are important

Fold-Changes

Fold-change should usually be displayed on the log scale.

Fold-change should be ANALYZED on the log scale.

The same applies to other measures such as tumor size, PSA, etc.
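A minimal sketch of why the log scale suits fold-changes, using made-up numbers (the values and variable names below are illustrative assumptions, not data from the talk). On the raw scale a halving (0.5) and a doubling (2.0) sit at unequal distances from 1; on the log2 scale they are symmetric around 0, so averaging there gives the geometric mean.

```python
import math
from statistics import mean

# Hypothetical fold-changes (treated / control); not data from the talk.
fold_changes = [0.25, 0.5, 2.0, 4.0]

# On the log2 scale, a 2-fold decrease and a 2-fold increase are -1 and +1.
log2_fc = [math.log2(fc) for fc in fold_changes]

# Averaging on the raw scale is pulled upward by the large ratios.
arithmetic_mean = mean(fold_changes)

# Averaging on the log scale and back-transforming gives the geometric mean.
geometric_mean = 2 ** mean(log2_fc)

print("log2 fold-changes:", log2_fc)
print("arithmetic mean of fold-changes:", arithmetic_mean)
print("geometric mean of fold-changes:", geometric_mean)
```

Here the decreases and increases balance exactly, which the log-scale summary reflects and the raw-scale average does not.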

5. Consistency is important

Make the inferences simpler by being consistent.

Example: Why?
- Data are means ± SD (n=8)
- Data are means ± SE (n=4)

Figures: the same groups or conditions should be the same across figures
- Symbols
- Colors
- Lines

Analysis:
- t-test in some; rank sum test in others? Picking the best p-value?
- If there is a reason for INconsistency, you should explain it.
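The SD versus SE example can be made concrete with a short sketch. The measurements below are hypothetical, and `mean_sd_se` is a helper name introduced only for this illustration. Because SE = SD / sqrt(n), error bars drawn as SE are always narrower than SD bars for n > 1, so switching between them from one figure to the next changes how variable the data appear.

```python
import math
from statistics import mean, stdev

def mean_sd_se(values):
    """Return (mean, sample SD, standard error of the mean)."""
    n = len(values)
    m = mean(values)
    sd = stdev(values)          # sample SD, n - 1 in the denominator
    se = sd / math.sqrt(n)      # standard error of the mean
    return m, sd, se

# Hypothetical measurements (n = 8); not data from the talk.
values = [4.1, 5.0, 5.6, 6.3, 4.8, 5.2, 5.9, 5.1]

m, sd, se = mean_sd_se(values)
print(f"mean = {m:.2f}, SD = {sd:.2f}, SE = {se:.2f}")
```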

6. Reflect experimental design and inferences in your displays

Paired data

[Figure: interferon-gamma release (pg/ml), no restimulation vs. restimulation]


Longitudinal Data

[Figure: tumor volume vs. time (days, 0 to 80); volume axis ticks at 0, 30, 100, 300, 600, 1000, 2000, 3000, 4000; groups: High Cis+GR, High Cis, Low Cis+GR, Low Cis, GR, Control]

Longitudinal Experiment

7. Outliers: do not just delete them!

Outliers are usually real.

- Check the data entry.
- But just because a mouse or experiment did not show consistent results, you cannot and should not remove it.
- “Results shown are based on 6 representative mice”: RED FLAG.


Statistical assumptions?

Sometimes outliers create skewness or cause other problems for statistical analysis

Making the data fit the analytic approach is not correct.

Find an analytic approach that is valid for your data

Examples: non-parametric tests, transformations
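As a rough illustration of why a rank-based test can be a valid choice when an outlier is present, the sketch below compares a rank-sum (Mann-Whitney) U statistic with a Welch t statistic on hypothetical data. The function names and numbers are assumptions made for this example. The U statistic depends only on the ordering of the values, so making the outlier more extreme leaves it unchanged, while the t statistic shrinks because the outlier inflates the variance.

```python
import math
from statistics import mean, stdev

def mann_whitney_u(x, y):
    """U statistic: number of (x, y) pairs with y > x; ties count one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if yj > xi:
                u += 1.0
            elif yj == xi:
                u += 0.5
    return u

def welch_t(x, y):
    """Two-sample t statistic without assuming equal variances."""
    se = math.sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    return (mean(y) - mean(x)) / se

# Hypothetical data; the last treated value is the outlier.
control = [10, 12, 11, 13, 12]
treated = [14, 15, 16, 17, 30]
treated_extreme = [14, 15, 16, 17, 300]

u_mild = mann_whitney_u(control, treated)
u_extreme = mann_whitney_u(control, treated_extreme)
t_mild = welch_t(control, treated)
t_extreme = welch_t(control, treated_extreme)

print("U:", u_mild, u_extreme)                        # same ordering, same U
print("t:", round(t_mild, 2), round(t_extreme, 2))    # t drops as the outlier grows
```

The outlier stays in the data in both analyses; only the choice of method changes.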

8. Careful of over-interpretation

Don’t overinterpret

- What is the population?
- Is one donor representative of the population? The same question applies to cell lines.
- Who can you generalize to, and therefore, what does the p-value or confidence interval mean?
- Sometimes the p-value really ends up just testing the precision of your assay!


P-values

The role of p-values
- P-values were never intended to be the ‘last’ line of defense (Regina Nuzzo. Scientific method: statistical errors. Nature, 12 February 2014; 506: 150-52).
- “P values, the 'gold standard' of statistical validity, are not as reliable as many scientists assume.”

P-values are based on statistical tests
- The tests have assumptions and assume a specific experimental design.
- E.g., paired vs. two-sample t-tests
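To show how much the assumed design matters, here is a minimal sketch that computes the t statistic for the same hypothetical measurements in two ways: as paired data (each subject measured before and after) and as two independent samples. The data and function names are assumptions for illustration only. When subjects differ widely from one another but each changes in the same direction, the paired analysis detects the shift and the two-sample analysis buries it in between-subject variability.

```python
import math
from statistics import mean, stdev

def two_sample_t(x, y):
    """Welch t statistic, treating x and y as independent groups."""
    se = math.sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    return (mean(y) - mean(x)) / se

def paired_t(x, y):
    """Paired t statistic, based on within-subject differences."""
    diffs = [b - a for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical paired measurements on five subjects; not data from the talk.
before = [100, 500, 2000, 8000, 15000]
after = [150, 620, 2300, 8500, 15900]

t_paired = paired_t(before, after)
t_unpaired = two_sample_t(before, after)

print("paired t statistic:    ", round(t_paired, 2))
print("two-sample t statistic:", round(t_unpaired, 2))
```

The two statistics also have different degrees of freedom, so they must be referred to different t distributions when a p-value is computed.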


P-values

Never interpret a p-value on its own.

The p-value from a two-sample t-test depends on
- The variance in the groups being compared
- The sample size in each group
- The difference between the group means

Many non-significant p-values accompany highly meaningful effect sizes. Why?
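One way to see the answer is to hold the difference in means fixed and vary the other two ingredients. The sketch below assumes equal group sizes and a common SD, which is a simplification chosen for this example, and `t_from_summary` is a hypothetical helper name. The same 10-unit difference gives a small t statistic when the groups are small and variable, and a large one when the sample is bigger or the variance is lower.

```python
import math

def t_from_summary(mean_diff, sd, n_per_group):
    """Two-sample t statistic from summary values.

    Assumes both groups have the same SD and the same size (an assumption
    made for this illustration).
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return mean_diff / standard_error

# Same difference in means (10 units) in all three scenarios.
t_small_n = t_from_summary(10.0, sd=8.0, n_per_group=3)
t_large_n = t_from_summary(10.0, sd=8.0, n_per_group=30)
t_low_sd = t_from_summary(10.0, sd=2.0, n_per_group=3)

print("n = 3,  SD = 8:", round(t_small_n, 2))
print("n = 30, SD = 8:", round(t_large_n, 2))
print("n = 3,  SD = 2:", round(t_low_sd, 2))
```

A large effect measured in a few noisy samples can therefore fail to reach p < 0.05, which is why the effect size and its uncertainty should be reported alongside the p-value.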

Incorrect implementation of t-test

           CD62L Low  CD62L High | CD62L Low  CD62L High | CD62L Low  CD62L High
Donor 1    1          1.53       | 1.00       2.49       | 1.00       2.06
Donor 2    1          1.87       | 1.00       2.52       | 1.00       1.87
Average    1          1.69966    | 1          2.504847   | 1          1.96379526
SD         0          0.24416    | 0          0.024433   | 0          0.13068069
p value    0.055839              | 0.000132              | 0.00906744

[Figure: three plots comparing CD62L Low and CD62L High, one for each pair of columns in the table]


Multiplicity

When you use α = 0.05 (i.e., p < 0.05) as a threshold, each test has a 5% chance of a false positive when there is no true effect.

Multiple experiments:
- In a paper with 20 figures (including sub-figures) and 5 groups, there are 10 pairwise comparisons per figure, which means 200 p-values for all comparisons.
- By chance alone (meaning, if there are NO associations at all), you would expect 10 significant p-values.

Multiple markers:
- If you have a panel of 200 markers, you expect about 10 to be significant by chance alone.

High-throughput setting:
- If you have 60,000 genes, you would expect 3000 false positives.
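The arithmetic behind these counts can be sketched in a few lines. The function names are introduced for this example, and the second function additionally assumes the tests are independent, which real experiments often are not.

```python
from math import comb

ALPHA = 0.05

def expected_false_positives(n_tests, alpha=ALPHA):
    """Expected number of p < alpha results when every null hypothesis is true."""
    return n_tests * alpha

def prob_at_least_one(n_tests, alpha=ALPHA):
    """Chance of one or more false positives, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

# 5 groups give C(5, 2) = 10 pairwise comparisons per figure.
pairwise = comb(5, 2)
n_pvalues = 20 * pairwise   # 20 figures -> 200 p-values

print("p-values in the paper:", n_pvalues)
print("expected false positives, 200 tests:", expected_false_positives(200))
print("expected false positives, 60,000 genes:", expected_false_positives(60000))
print("P(at least one false positive in 20 tests):", round(prob_at_least_one(20), 2))
```

Even 20 independent tests give roughly a 64% chance of at least one false positive at α = 0.05.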

9. Use captions and legends

Captions

Unlike other areas of medical research, in basic science a lot goes into the captions:
- Statistical methods, p-values, experimental design
- Figures are multi-paneled (sometimes more than 8 displays in one figure).

If you haven’t been clear in the (statistical) methods section about your analysis approach, you need to be in the caption.

Figures should speak for themselves (i.e., the reader should not have to reference the text).


Legends

When possible, put clarifying information in legends within the figure.

Makes interpretation simpler than sifting through all of the information in the caption.

The "Acid Test" for Tables and Figures: Any Table or Figure you present must be sufficiently clear, well-labeled, and described by its legend to be understood by your intended audience without reading the results section, i.e., it must be able to stand alone and be interpretable. Overly complicated Figures or Tables may be difficult to understand in or out of context, so strive for simplicity whenever possible. If you are unsure whether your tables or figures meet these criteria, give them to a fellow [scientist] and ask them to interpret your results.*

*http://abacus.bates.edu/~ganderso/biology/resources/writing/HTWtablefigs.html

10. Avoid gratuitous fancy stuff

Avoid 3-D unless you have ‘truly’ 3-D results

Bonus Principle: Contact your statistician early and often!

Get a statistical colleague onboard

- Engage a statistician.
- But, “there is no such thing as a free lunch”: for support with analyses for grants, they should be included as collaborator/consultant/co-investigator.

Resources:
- Hollings Cancer Center: Biostatistics Shared Resource
- CTSA: Biostatistics, Epidemiology & Research Design Services


References/Resources

Karl Broman’s top ten worst graphs:
http://www.biostat.wisc.edu/~kbroman/topten_worstgraphs/

Reporting statistical results in your paper and figures:
http://abacus.bates.edu/~ganderso/biology/resources/writing/HTWstats.html

Open courseware: JHU statistics for laboratory scientists course
http://ocw.jhsph.edu/index.cfm/go/viewCourse/course/StatisticsLaboratoryScientistsI/coursePage/index/

ARRIVE guidelines:
http://www.nc3rs.org.uk/page.asp?id=1357
http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1000412
http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001756


Acknowledgements: Thanks to my colleagues

Examples are drawn from work from the labs of the following MUSC investigators:
- Chris Voelkel-Johnson
- Shikhar Mehrotra
- Mark Rubinstein
- Omar Moussa/Dennis Watson

Clinical data are based on work done in collaboration with Mario Eisenberger (Johns Hopkins).