Copenhagen 2008

How to improve the chance of getting your manuscript accepted for publication

Jonas Ranstam PhD

Anecdotal evidence

(Case reports)

Evidence-based medicine

(The Cochrane collaboration 1993)

Cohort study of smoking and lung cancer (1954) (Bradford Hill)

Case-control study of smoking and lung cancer (1950) (Bradford Hill)

Randomised clinical trial of streptomycin and tuberculosis (1948) (Bradford Hill)

EU directive (2001)

ICH GCP (1996)

CONSORT (1996)

WHO CIOMS (1993)

ICMJE Uniform Requirements (1978)

Helsinki declaration (1964)

Nürnberg convention (1949)

Trial registration (2005)

Mandatory disclosure of trial results (2008)

Plan

1. Methodological background

2. General guidelines

3. Special recommendations
   a) case reports
   b) mechanical experiments
   c) in vitro/cadaver experiments
   d) cross-sectional studies
   e) epidemiological studies
   f) randomized trials

4. Summary

1. Methodological background

What is statistics used for?

1. Describing data (statistics in the plural)

2. Interpreting uncertain data (statistics in the singular)

Two kinds of uncertainty

1. Uncertainty of measurement

2. Uncertainty of sampling

1. Uncertainty of measurement

The precision of the measurement instrument used.

The precision of the Finapres non-invasive blood pressure monitor is on average 12.1 mm Hg.

2. Uncertainty of sampling

Individual effects vary between subjects. Different samples of subjects yield different observed mean effects.

Example

Assume that the cumulative 10-year revision rate of the Oxford knee prosthesis is 8% and that two groups of 100 patients receiving the prosthesis are randomly selected and followed over time.

The two groups are likely to end up with different numbers of revised patients during follow-up.
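The example can be simulated. The sketch below assumes that each patient is revised independently with probability 0.08; the function name and the seed are illustrative and not part of the original material.

```python
import random

def simulate_revisions(n_patients=100, rate=0.08, rng=None):
    """Count the revised patients in one random sample."""
    if rng is None:
        rng = random.Random()
    # Each patient is revised with probability `rate` (assumed independent).
    return sum(rng.random() < rate for _ in range(n_patients))

rng = random.Random(2008)  # arbitrary seed, used only for reproducibility
group_a = simulate_revisions(rng=rng)
group_b = simulate_revisions(rng=rng)
print(f"Group A: {group_a}/100 revised, group B: {group_b}/100 revised")
```

Running it with different seeds shows how much two samples from the same population can differ.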

[Figure: 375 randomly ordered patients, of whom 30 (8%) will be revised within 10 years. Two samples of 100 are drawn: 6% revised in one, 12% revised in the other.]

Sampling uncertainty

Sample 1: 6% revised

Sample 2: 12% revised

H0: The two samples represent the same population

H1: The two samples represent different populations

P-value

The probability of observing an effect at least as large as the one seen, if only sampling uncertainty is at work (i.e. if H0 is true).

12/100 vs. 6/100, Fisher's exact test p = 0.22
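The p-value above can be reproduced with a few lines of code. This is a minimal sketch of a two-sided Fisher's exact test using only the standard library; the function name is illustrative, and the two-sided rule used (summing all tables that are no more likely than the observed one) is an assumption about how the slide's value was obtained.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, total = a + b, a + c, a + b + c + d

    def weight(k):
        # Number of tables with k events in the first row, margins fixed.
        return comb(col1, k) * comb(total - col1, row1 - k)

    k_min = max(0, row1 + col1 - total)
    k_max = min(row1, col1)
    weights = [weight(k) for k in range(k_min, k_max + 1)]
    observed = weight(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    extreme = sum(w for w in weights if w <= observed)
    return extreme / sum(weights)

# 12 of 100 revised in one group, 6 of 100 in the other.
p = fisher_exact_two_sided(12, 88, 6, 94)
print(f"p = {p:.2f}")
```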

P-values are often misunderstood

They cannot

- describe clinical relevance (they depend on sample size)

- show that a difference “does not exist”, because n.s. is absence of evidence, not evidence of absence

Confidence interval

A range of values that, with the specified confidence level, is expected to include the population parameter being estimated.

12/100 vs. 6/100, RR = 2.0 (95% CI: 0.7 to 5.6)

[Figure: the interval drawn on a relative risk axis with ticks at 1/2, 1 and 2. An interval that excludes 1 corresponds to p < 0.05; an interval that includes 1 corresponds to n.s.]
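A relative risk and its confidence interval can be computed as sketched below. The slides do not say which interval method was used, so the Wald interval on the log scale is an assumption here; it is a common approximation, and its limits need not match the 0.7 to 5.6 shown above.

```python
from math import exp, log, sqrt

def relative_risk_ci(events_1, n_1, events_2, n_2, z=1.96):
    """Relative risk with a Wald confidence interval on the log scale."""
    rr = (events_1 / n_1) / (events_2 / n_2)
    # Standard error of log(RR) for two independent binomial samples.
    se_log_rr = sqrt(1 / events_1 - 1 / n_1 + 1 / events_2 - 1 / n_2)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper

rr, lower, upper = relative_risk_ci(12, 100, 6, 100)
print(f"RR = {rr:.1f} (95% CI: {lower:.1f} to {upper:.1f})")
```

Because the interval includes 1, it agrees with the non-significant test result for the same data.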

Important assumptions

Many statistical methods like the Student's t-test and ANOVA are based on the assumption of Gaussian distribution and homogeneous variance.


If the assumptions are not met, use alternative (non-parametric) methods, such as the Mann-Whitney U-test or the Kruskal-Wallis test (non-parametric ANOVA).


Most conventional methods (both parametric and non-parametric) require independent observations.



- Patients are independent

- Patients' knees, hips, shoulders, feet, etc. are not

Copyright ©1995 BMJ Publishing Group Ltd.

Bland, J M. et al. BMJ 1995;310:446

pH against PaCO2 for eight subjects, with parallel lines fitted for each subject

Incorrect analysis: r = -0.51, p < 0.001

Correct analysis: r = -0.07, p = 0.7

How Many Patients? How Many Limbs? Analysis of Patients or Limbs in the Orthopaedic Literature: A Systematic Review

Bryant et al. JBJS Am. 2006;88:41-45.

Our findings suggest that a high proportion (42%) of clinical studies in high-impact-factor orthopaedic journals involve the inappropriate use of multiple observations from single individuals, potentially biasing results. Orthopaedic researchers should attend to this issue when reporting results.



Include only one observation per patient, or use a statistical method that can handle dependent data, e.g. multilevel or mixed effects models.

Always present both the number of observations and the number of patients.
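One simple way to get one observation per patient is to average the observations within each patient before analysis. The sketch below uses made-up data and an illustrative function name; averaging is only one possible choice, and a mixed effects model would use the individual observations instead.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: (patient id, measured value), one row per operated knee.
knees = [("p1", 4.0), ("p1", 6.0), ("p2", 5.0), ("p3", 7.0), ("p3", 9.0)]

def one_value_per_patient(observations):
    """Average all observations that belong to the same patient."""
    by_patient = defaultdict(list)
    for patient, value in observations:
        by_patient[patient].append(value)
    return {patient: mean(values) for patient, values in by_patient.items()}

per_patient = one_value_per_patient(knees)
# Report both counts, as recommended above.
print(f"{len(knees)} observations from {len(per_patient)} patients")
print(per_patient)
```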

Multiplicity

In contrast to many other forms of precision, statistical precision depends on the number of performed measurements (significance tests).

Multiplicity

Each significance test at a 5% significance level has 5% risk of a false positive test.

Repeated testing increases the risk of at least one false positive test.

Number of tests    Risk of at least one false positive
1                  0.05
2                  0.10
5                  0.23
10                 0.40

Example 1 (Subgroups, two tests)

Example 2 (Repeated testing, five tests)

Example 3 (Liver function, 10 tests)

Example 4 (Scores, 135 tests)
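The risks in the table follow from 1 - (1 - 0.05)^n. The sketch below assumes that the tests are independent and that every null hypothesis is true; the function name is illustrative.

```python
def risk_of_false_positive(n_tests, alpha=0.05):
    """Probability of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests

# The numbers of tests used in the table and in Examples 1 to 4.
for n_tests in (1, 2, 5, 10, 135):
    print(f"{n_tests:>3} tests: {risk_of_false_positive(n_tests):.2f}")
```

With 135 tests, as in Example 4, at least one false positive is almost certain.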

Multiplicity

Common in exploratory analyses

Unacceptable in confirmatory analyses

2. General guidelines

Statistical Methods

“Describe statistical methods with enough detail to enable a knowledgeable reader with access to the original data to verify the reported results.”

Required for analytical methods (statistical models, hypothesis tests, confidence intervals).

Descriptions are often unclear, vague or ambiguous. They need to be clear and detailed.

Results

“When possible, quantify findings and present them with appropriate indicators of measurement error or uncertainty (such as confidence intervals).”

Measures of statistical precision (p-values and confidence intervals) are necessary for generalization of results beyond the examined patients.

Results

“Avoid relying solely on statistical hypothesis testing, such as the use of P values, which fails to convey important information about effect size.”

Describe both your observations and how you interpret them (use confidence intervals or p-values).

Clinically       Statistically significant
significant      yes      no
yes              a        b
no               c        d

Writing only that there was, or was not, a (statistically significant) difference is too simplistic.

Example

Two side effects with a new osteoporosis treatment:

- A statistically significant reduction in body hair growth rate by 5% (p = 0.04)

- A statistically insignificant increase in systolic blood pressure by 25 mmHg (p = 0.06)

Confidence intervals are better than p-values

In contrast to p-values, they do

- relate to clinical significance

- show when a difference “does not exist”

because they present lower and upper limits of potential clinical effects/differences

[Figure: confidence intervals drawn on an effect axis, with 0 and the range of clinically significant effects marked]

P-value     Conclusion from confidence interval
p < 0.05    Statistically and clinically significant effect
p < 0.05    Statistically, but not necessarily clinically, significant effect
n.s.        Inconclusive
n.s.        Neither statistically nor clinically significant effect
p < 0.05    Statistically significant reversed effect
p < 0.05    Statistically but not clinically significant effect

P-values give 2 alternative conclusions; confidence intervals give 6.
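The six conclusions can be written as a decision rule. This is a sketch under stated assumptions: the effect is measured so that 0 means no effect and positive values are beneficial, and `relevant` (a hypothetical parameter) is the smallest effect regarded as clinically significant.

```python
def interpret_interval(lower, upper, relevant):
    """Classify a confidence interval (lower <= upper) for an effect.

    0 means no effect; `relevant` is the smallest clinically
    significant effect (a positive number, chosen by the investigator).
    """
    if upper < 0:
        return "statistically significant reversed effect"
    if lower > 0:
        if lower >= relevant:
            return "statistically and clinically significant effect"
        if upper < relevant:
            return "statistically but not clinically significant effect"
        return "statistically, but not necessarily clinically, significant effect"
    # From here on the interval includes 0, so the result is n.s.
    if upper < relevant:
        return "neither statistically nor clinically significant effect"
    return "inconclusive"

print(interpret_interval(-2.0, 12.0, relevant=5.0))
print(interpret_interval(1.0, 3.0, relevant=5.0))
```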

When there is a difference in data

Do not write that there is not a difference!

There were indeed differences: 0.45 and 0.57.

Better alternative:

“The observed differences in extraction torques between the two types of uncoated distal pins can be explained by chance.”

Avoid non-technical use of technical terms and use clear expressions

- significant: clinically or statistically?

- no difference: statistically insignificant?

- statistical difference: statistically significant?

- matched: selected or just comparable?

- correlation: relation, regression?

- normal: Gaussian distribution?

- random: mathematical algorithm?

- etc.

3. Special recommendations

a) case reports

Case reports can be used for

- Generation of new hypotheses

- Showing inconsistencies in established “facts”

Case reports may need statistics (in the plural sense)

- Summary description of characteristics

- Description of change or variation over time

Case reports cannot be used for

- Generalizing findings like risk or treatment effect

(This requires statistics in the singular sense)

b) mechanical experiments

Mechanical experiments

What do p-values and confidence intervals relate to?

- Measurement uncertainty (Perhaps)

- Sampling uncertainty (No, there is no information on subject variation. The findings cannot be generalized beyond the device).

c) in vitro/cadaver experiments

In vitro/cadaver experiments

What do p-values and confidence intervals relate to?

- Measurement uncertainty (Perhaps)

- Sampling uncertainty (Perhaps, if the observations provide information on variation between subjects)

Example

In a study with 60 observations, 20 specimens had been taken from each of 3 subjects.

The specimens were distributed randomly between one control group and one experimental group.

What do significance tests of these two groups tell us?

d) cross-sectional studies

Remember

- Sampling frame

- Target population

  - Superpopulation (for scientific questions)

  - Finite population (requires corrections)

- Non-responders
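For a finite target population, the usual correction is the finite population correction, sqrt((N - n) / (N - 1)), applied to the standard error. The sketch below assumes that this is the correction meant above and that simple random sampling without replacement is used; the function name and the numbers are illustrative.

```python
from math import sqrt

def se_proportion(p, n, population_size=None):
    """Standard error of a sample proportion.

    If `population_size` (N) is given, the finite population
    correction sqrt((N - n) / (N - 1)) is applied.
    """
    se = sqrt(p * (1 - p) / n)
    if population_size is not None:
        se *= sqrt((population_size - n) / (population_size - 1))
    return se

print(se_proportion(0.08, 100))                       # superpopulation
print(se_proportion(0.08, 100, population_size=375))  # finite population
```

The correction shrinks the standard error, and it reaches zero when the whole population has been sampled.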

e) epidemiological studies

Epidemiological studies

- Exploratory, hypothesis generating, multiplicity issues considered less important than validity issues

- External validity (source of subjects)

- Internal validity (confounding)

Results

Uniform Requirements: “Where scientifically appropriate, analyses of the data by variables such as age and sex should be included.”

Observational studies require adjustment for known and suspected confounding factors to produce valid effect estimates.

This adjustment is usually performed using statistical modelling (e.g. ANCOVA or regression analysis). The purpose is to increase validity.
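The effect of adjustment can be illustrated with made-up counts. The sketch below does not use ANCOVA or regression; as a simpler stand-in it uses Mantel-Haenszel stratification, with hypothetical data in which the exposure has no effect within either stratum of the confounder.

```python
# Hypothetical counts per stratum of a confounder (e.g. age group):
# (events exposed, n exposed, events unexposed, n unexposed)
strata = [(2, 20, 8, 80), (40, 80, 10, 20)]

def crude_rr(strata):
    """Relative risk ignoring the stratification variable."""
    events_exp = sum(a for a, n1, c, n0 in strata)
    total_exp = sum(n1 for a, n1, c, n0 in strata)
    events_unexp = sum(c for a, n1, c, n0 in strata)
    total_unexp = sum(n0 for a, n1, c, n0 in strata)
    return (events_exp / total_exp) / (events_unexp / total_unexp)

def mantel_haenszel_rr(strata):
    """Relative risk adjusted for the stratification variable."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den

print(f"Crude RR:    {crude_rr(strata):.2f}")
print(f"Adjusted RR: {mantel_haenszel_rr(strata):.2f}")
```

The crude estimate suggests a doubled risk only because exposed subjects are concentrated in the high-risk stratum.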

Results

Automatic stepwise regression (forward or backward) is not an adequate method for confounding adjustment.

f) randomized trials

Clinical trials

“The ICMJE member journals will require, as a condition of consideration for publication in their journals, registration in a public trials registry.”

“The ICMJE recommends that journals publish the trial registration number at the end of the Abstract.”

Clinical trials

“When reporting experiments on human subjects, authors should indicate whether the procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2000 (5).”

WORLD MEDICAL ASSOCIATION DECLARATION OF HELSINKI

Ethical Principles for Medical Research Involving Human Subjects

27. ...Reports of experimentation not in accordance with the principles laid down in this Declaration should not be accepted for publication.

Purpose of a randomized trial

To test a hypothesis with control of random and systematic errors.

- No bias (randomization & blinding)

- No multiplicity problems

Randomization

Mathematical algorithm

Stratified

Concealment of outcome

Reproducible
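Three of these properties (algorithmic, stratified, reproducible) can be sketched in code. The example assumes permuted-block randomization with two arms; the block size, seeds, strata and function name are all hypothetical. Concealment is an organisational matter and is not shown.

```python
import random

def block_randomize(n_blocks, seed, arms=("A", "B"), block_size=4):
    """Permuted-block allocation list for one stratum."""
    rng = random.Random(seed)  # a fixed seed makes the list reproducible
    allocations = []
    for _ in range(n_blocks):
        # Each block holds every arm equally often, in random order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        allocations.extend(block)
    return allocations

# One allocation list per stratum (hypothetical strata and seeds).
lists = {stratum: block_randomize(3, seed)
         for stratum, seed in (("male", 101), ("female", 202))}
for stratum, allocations in lists.items():
    print(stratum, "".join(allocations))
```

Keeping the seeds with the trial documentation lets an auditor regenerate the same lists.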

Study populations

Intention-to-treat (ITT) principle: Analyze all randomized subjects according to planned treatment regimen.

Full analysis set (FAS): The set of subjects that is as close as possible to the ideal implied by the ITT principle.

Per protocol (PP) set: The set of subjects who complied with the protocol sufficiently to ensure that they are likely to exhibit the effects of treatment according to the underlying scientific model.

FAS vs. PP-set

FAS
+ no selection bias
- misclassification problem (effect dilution)

PP-set
+ no contamination problem
- possible selection bias (confounding)

When the FAS and PP-set lead to essentially the same conclusions, confidence in the trial is supported.

Endpoints

Primary: The variable capable of providing the most clinically relevant evidence directly related to the primary objective of the trial.

Secondary: Either measurements supporting the primary endpoint or effects related to secondary objectives.

Statistical analyses

Confirmatory: The result concerns a primary endpoint and the p-value or confidence interval accounts for potential multiplicity. The result can support a claim of superiority, equivalence or non-inferiority.

Exploratory: All other analyses. The result is either supporting or explanatory, or simply just a new hypothesis.

Reporting

“For reports of randomized controlled trials authors should refer to the CONSORT statement.”

Include with the manuscript

Study Protocol

Statistical Analysis Plan

Clinical trials

International regulatory guidelines

ICH Topic E9 - Statistical Principles for Clinical Trials

EMEA Points to consider:

- baseline covariates

- missing data

- multiplicity issues

- etc.

and similar documents from the FDA

These guidelines can all be found on the internet.

4. Summary

The responsibilities of a statistical reviewer

“To make sure that the authors spell out for the reader the limitations imposed upon the conclusions by the design of the study, the collection of data, and the analyses performed.”

Shor S. The responsibilities of a statistical reviewer. Chest 1972;61:486-487.

Read the manuscript from end to beginning, and look for weaknesses in the links between:

1. Conclusion

2. Discussion (Discussion section)

3. Results (Results section)

4. Methods (Material & methods section)

5. Data (Material & methods section)

6. Hypothesis (Introduction)

Make sure the chain holds all the way!

Summary

1. Present statistical methods in detail, and the number of observations included in each analysis.

2. Present data, statistical results and your conclusions
   - data description vs. results interpretation
   - clinical vs. statistical significance
   - absence of evidence is not evidence of absence

3. Adjust for confounding factors in observational studies (but do not use stepwise regression)

4. Comply with the CONSORT checklist in randomized studies

Thank you for your attention!
