research by magic

Research by MAGIC*

Magnitude
Articulation
Generality
Interest
Credibility

*Abelson, Statistics as Principled Argument

Magnitude

What’s the smallest result anyone will care about?

Reduce the length of stay by one day?

Decrease mortality from 1% to 0.9%?

Are we trying to prove that there is a meaningful difference, or that any difference is too small to care about?
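To put a number on the mortality example above, here is a minimal sketch of the arithmetic. The function name and the idea of reporting a "number needed to treat" are illustrative additions, not part of the original slides; the 1% and 0.9% figures are the ones quoted above.

```python
def risk_reduction(control_risk, treated_risk):
    """Return (absolute reduction, relative reduction, number needed to treat)."""
    absolute = control_risk - treated_risk   # difference in event probability
    relative = absolute / control_risk       # reduction as a fraction of baseline risk
    number_needed_to_treat = 1 / absolute    # patients treated per event avoided
    return absolute, relative, number_needed_to_treat


# Mortality falling from 1% to 0.9%, as in the example above
absolute, relative, nnt = risk_reduction(0.01, 0.009)
print(f"absolute reduction: {absolute:.4f}")
print(f"relative reduction: {relative:.0%}")
print(f"number needed to treat: {nnt:.0f}")
```

The same change reads as a 10% relative reduction or as roughly 1,000 patients treated per death avoided, which is why the smallest result worth caring about should be stated before the study starts.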

Articulation – What’s the Story?

Articulation – tell the complete story (using all relevant variables)

Variable(s) of primary interest

Outcomes
Continuous: length of stay, pain scores
Events: infection, DVTs

Confounding variables
May be demographics or comorbidities
Known or reasonably expected to affect outcome

Not all outcomes can be neatly measured as discrete events or physical units (pain, disability…).

Not every measurable variable is a confounder. Only control or match for ones you are sure of.

Articulation – a clear story

Tell as much of your story as you can using graphs and tables. Clinicians are a visual audience.

Can you explain how variables may interact to produce the observed results?

Can you explain to a clinician (insurer, administrator, patient…) what the result means?

Articulation – telling the right story

[Figure: four scatterplots]
Panel 1: Straight line with error
Panel 2: Nonlinear, no error
Panel 3: No error, but outlier
Panel 4: No result except for outlier?

All of these have the same regression line and R².
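The four panels described above match the pattern of Anscombe's quartet, a published set of four small data sets. Assuming that is what the figure showed (the slides do not say so), the sketch below fits a least-squares line to each set in plain Python and prints the slope, intercept and R², which come out nearly identical even though the scatterplots look nothing alike.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor: (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Anscombe's quartet (Anscombe, 1973); sets I-III share the same x values
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I: straight line with error": (
        x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II: nonlinear": (
        x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III: line plus outlier": (
        x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV: outlier only": (
        [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {name: fit_line(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, (slope, intercept, r2) in results.items():
    print(f"{name:30s} slope={slope:.2f} intercept={intercept:.2f} R2={r2:.2f}")
```

The summary statistics alone cannot tell these four stories apart; only the plot can.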

Generalizable

Who will be able to benefit from the results of your study?

All surgeons and patients?

A subset such as:
Urban or rural locations?
Older or younger patients?

An infrequent result (5-10% of cases)?

Something so rare a surgeon may never see it?

Generality

ALL RETROSPECTIVE STUDIES ARE EXPLORATORY!

Without comparing to another data set, you can’t confirm a result.

GROUPS DEFINED BY THE OUTCOMES SHOULD BE SUSPECT!

Your data set should not drive the analysis.

Interesting

"Not everything that counts can be counted, and not everything that can be counted, counts." Einstein on endpoints.

Is this new information?

Is this useful? (see also: Generalizable)

Is this something you yourself would want to read about on your own?

Credibility - Data ain’t fish!

You can make tasty imitation crabmeat, shrimp, etc. by mixing together cheaper fish and seasoning.

You can NOT pull the same trick with data.

Collect it right the first time!

https://en.wikipedia.org/wiki/Crab_stick

Rosenwasser’s Special Case

“Meta-Analysis is to Analysis what Metaphysics is to Physics.”

Robert H. Rosenwasser, MD, FACS, FAHA

A special case of “data ain’t fish”

Good studies + bad studies do not equal good on average

Many bad studies do not equal one good study

Credibility – Prospective Studies

A 22-item checklist for good reporting of a randomized controlled trial is available at http://www.consort-statement.org/

Why Randomize?

If you don’t know what other factors affect the result, you can at least be confident they’re the same, on average, in all groups.
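The argument for randomizing can be checked with a small simulation. This is a sketch under stated assumptions: each patient carries one unmeasured factor drawn from a standard normal distribution, and arms are assigned by a fair coin flip. The function name, sample size and seed are illustrative, not from the slides.

```python
import random


def hidden_factor_imbalance(n_patients=10_000, seed=0):
    """Difference in the mean of an unmeasured factor between randomized arms."""
    rng = random.Random(seed)
    groups = {"treatment": [], "control": []}
    for _ in range(n_patients):
        hidden = rng.gauss(0, 1)                    # factor nobody measured
        arm = rng.choice(["treatment", "control"])  # coin-flip assignment
        groups[arm].append(hidden)
    mean_treatment = sum(groups["treatment"]) / len(groups["treatment"])
    mean_control = sum(groups["control"]) / len(groups["control"])
    return mean_treatment - mean_control


difference = hidden_factor_imbalance()
print(f"difference in hidden factor between arms: {difference:+.3f}")
```

With enough patients the difference sits close to zero, even though the analyst never saw the factor. Small trials can still be unbalanced by chance, so this is a statement about averages, not a guarantee.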

Credibility – Retrospective Studies

Bradford Hill’s nine criteria for causality

Strength of Association

Consistency – repeated observation across different studies, settings and populations

Specificity (more causes, less specific)

Temporal relationship – cause before effect

Dose response – more exposure, greater odds

Plausibility – existing theory linking cause + effect

Coherence – does not contradict existing knowledge

Experimental evidence (such as animal studies)

Analogy – parallels other known cause-effect association

Presence doesn’t prove, absence doesn’t disprove, but each one helps.

Credibility: Math problem

If the Type I error is limited to 5%, then on average we expect one false positive out of 20 different tests where the null hypothesis is true.

These could be:

20 different studies from the same person

20 different sites attempting the same study

One study containing 20 different tests

This last case is the only one under our control
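The arithmetic behind the 20-test example can be sketched as follows. The second function assumes the tests are independent, which the slides do not state; the function names are illustrative.

```python
def expected_false_positives(alpha, n_tests):
    """Average number of false positives when every null hypothesis is true."""
    return alpha * n_tests


def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests


expected = expected_false_positives(0.05, 20)
at_least_one = familywise_error_rate(0.05, 20)
print(f"expected false positives in 20 tests: {expected:.1f}")
print(f"chance of at least one false positive: {at_least_one:.1%}")
```

One false positive on average translates to roughly a 64% chance that a study with 20 independent tests reports at least one spurious finding.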

Correcting for multiple tests

In both one-tailed and two-tailed tests, the total Type I error probability (area in red) sums up to α.

In two-tailed tests, the error is divided into α/2 for each of the two possibilities.

Bonferroni and other corrections for multiple tests also divide up the Type I error between tests.

Bonferroni divides up α among N tests as α/N.

This correction protects against inflated Type I error.
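A minimal sketch of the Bonferroni rule described above. The p-values are made up for illustration and the function name is hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the per-test threshold alpha/N and which tests pass it."""
    threshold = alpha / len(p_values)
    return threshold, [p <= threshold for p in p_values]


# Hypothetical p-values from four tests in one study
p_values = [0.001, 0.012, 0.030, 0.049]
threshold, significant = bonferroni(p_values)
print(f"per-test threshold: {threshold}")
for p, keep in zip(p_values, significant):
    print(f"p={p:.3f} -> {'significant' if keep else 'not significant'}")
```

All four p-values fall below the uncorrected 0.05, but only the first two survive the corrected threshold of 0.0125.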

Intention to Treat

In randomized studies, analysis must always be based on the group patients were assigned to, even if they cross over.

This prevents bias. For example, patients assigned to a non-operative group may still be given surgery, but operative patients can’t cross over to non-operative.

Patients having more trouble with one treatment may be more likely to cross over or drop out

The intention to treat analysis doesn’t ask whether the treatment is effective; it asks whether the policy of assigning a patient to the treatment is effective.
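The difference between analyzing by assignment and analyzing by treatment received can be sketched with a toy data set. The eight patients below are invented for illustration: two non-operative patients who were doing poorly crossed over to surgery.

```python
# Hypothetical patients: (assigned arm, arm actually received, recovered?)
patients = [
    ("non-operative", "non-operative", True),
    ("non-operative", "non-operative", True),
    ("non-operative", "surgery", False),   # crossed over after doing poorly
    ("non-operative", "surgery", False),   # crossed over after doing poorly
    ("surgery", "surgery", True),
    ("surgery", "surgery", True),
    ("surgery", "surgery", True),
    ("surgery", "surgery", False),
]


def recovery_rate_by(records, column):
    """Recovery rate per arm; column 0 = assigned arm, column 1 = received arm."""
    totals = {}
    for record in records:
        arm = record[column]
        count, successes = totals.get(arm, (0, 0))
        totals[arm] = (count + 1, successes + int(record[2]))
    return {arm: successes / count for arm, (count, successes) in totals.items()}


intention_to_treat = recovery_rate_by(patients, 0)
as_treated = recovery_rate_by(patients, 1)
print("intention to treat:", intention_to_treat)
print("as treated:        ", as_treated)
```

Grouping by treatment received moves the struggling crossover patients into the surgery arm, so the non-operative arm looks better than the assignment policy actually was. Grouping by assignment keeps the randomized comparison intact.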

Six Ways to p-Hack

(list from Leif D. Nelson, Berkeley Initiative for Transparency in the Social Sciences)

Stop collecting data once p<.05

Analyze many measures, but report only those with p<.05.

Collect and analyze many conditions, but only report those with p<.05.

Use covariates to get p<.05.

Exclude participants to get p<.05.

Transform the data to get p<.05.

Goodhart’s Law: When a measure becomes a target, it ceases to be a good measure.
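The first item on the p-hacking list, stopping once p<.05, can be illustrated by simulation. This is a sketch under simplifying assumptions that are not in the slides: the null hypothesis is true, observations are standard normal, the test is a two-sided z-test with known standard deviation, and the analyst peeks after every 10 observations up to 100.

```python
import math
import random


def two_sided_p(values):
    """Two-sided z-test of mean zero, assuming a known standard deviation of 1."""
    n = len(values)
    z = (sum(values) / n) * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rates(n_sims=2000, max_n=100, look_every=10, seed=1):
    """Return (rate when stopping at the first p<.05, rate with one final test)."""
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(n_sims):
        data = [rng.gauss(0, 1) for _ in range(max_n)]  # no true effect
        looks = range(look_every, max_n + 1, look_every)
        if any(two_sided_p(data[:n]) < 0.05 for n in looks):
            peeking_hits += 1
        if two_sided_p(data) < 0.05:
            fixed_hits += 1
    return peeking_hits / n_sims, fixed_hits / n_sims


peeking_rate, fixed_rate = false_positive_rates()
print(f"false positives when stopping at first p<.05: {peeking_rate:.1%}")
print(f"false positives with a single planned test:   {fixed_rate:.1%}")
```

The single planned test stays near the nominal 5%, while repeated peeking pushes the false positive rate well above it, even though no individual test was computed incorrectly.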

COLLECT DATA CONSISTENTLY

Revision required before analysis is practical:

Sex    Age (years)  Implant         Smoker       Disability (%)
M      45           Acme Brass      Y            75
f      30           Presto Ceramic  2 packs/day  45%
Y      N/A          Zenith Ceramic  No           0.3
male   56           Delta Brass     NO           50
F      ?            Metal           Sometimes    half

The same data, clearly coded with minimal chance of error:

Male   Age (years)  Implant         Ever Smoked?  Disability (%)
1      45           Brass           1             75
0      30           Ceramic         1             45
0      .            Ceramic         0             30
1      56           Brass           0             50
0      .            Brass           1             50
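One way to enforce consistent coding is to check each record against the coding scheme at entry time. The sketch below assumes the coding used in the clean table (0/1 flags, two implant types, disability as a percentage); the field names, the 0 to 120 age range and the function itself are illustrative assumptions.

```python
ALLOWED_IMPLANTS = {"Brass", "Ceramic"}


def check_row(row):
    """Return a list of coding problems; an empty list means the row is clean."""
    problems = []
    if row["male"] not in (0, 1):
        problems.append("male must be coded 0 or 1")
    age = row["age"]
    if age is not None and not (isinstance(age, (int, float)) and 0 <= age <= 120):
        problems.append("age must be a number of years, or None if missing")
    if row["implant"] not in ALLOWED_IMPLANTS:
        problems.append("implant must be one of " + ", ".join(sorted(ALLOWED_IMPLANTS)))
    if row["ever_smoked"] not in (0, 1):
        problems.append("ever_smoked must be coded 0 or 1")
    disability = row["disability_pct"]
    if not (isinstance(disability, (int, float)) and 0 <= disability <= 100):
        problems.append("disability_pct must be a number from 0 to 100")
    return problems


clean_row = {"male": 1, "age": 45, "implant": "Brass",
             "ever_smoked": 1, "disability_pct": 75}
messy_row = {"male": "f", "age": 30, "implant": "Presto Ceramic",
             "ever_smoked": "2 packs/day", "disability_pct": "45%"}

print(check_row(clean_row))
for problem in check_row(messy_row):
    print("-", problem)
```

Rejecting a bad value when it is entered is far cheaper than guessing later what "Sometimes" or "half" was meant to say.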

Useful Cynicism from Statisticians

All models are wrong, but some are useful. (George E. P. Box)

An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem. (John Tukey)

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. (also John Tukey)

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. (R. A. Fisher)

Also remember:

People who interview you – whether hiring committees or patients – are going to remember whether you spoke with depth, insight and enthusiasm.

The difference between good medicine and no medicine is generally smaller than the difference between good medicine and bad medicine. Caution and skepticism help keep bad medicine from getting out there.
