Comparative Research on Training Simulators in Emergency Medicine: A Methodological Review


Page 1: Comparative Research on Training Simulators in Emergency Medicine: A Methodological Review


Comparative Research on Training Simulators in Emergency Medicine: A Methodological Review

Matt Lineberry, Ph.D., Research Psychologist, NAWCTSD
[email protected]
Medical Technology, Training, & Treatment (MT3), May 2012

Page 2

Credits and Disclaimers
• Co-authors:
– Melissa Walwanis, Senior Research Psychologist, NAWCTSD
– Joseph Reni, Research Psychologist, NAWCTSD
• These are my professional views, not necessarily those of NAWCTSD, NAVMED, etc.

Page 3

Objectives
• Motivate the conduct of comparative research in simulation-based training (SBT) for healthcare
• Identify challenges evident from past comparative research
• Promote more optimal research methodologies in future research

Page 4

Cook et al. (2011) meta-analysis in JAMA:

“…we question the need for further studies comparing simulation with no intervention (ie, single-group pretest-posttest studies and comparisons with no-intervention controls).

…theory-based comparisons between different technology-enhanced simulation designs (simulation vs. simulation studies) that minimize bias, achieve appropriate power, and avoid confounding… are necessary”

Page 5

Issenberg et al. (2011) research agenda in SIH:

“…studies that compare simulation training to traditional training or no training (as is often the case in control groups), in which the goal is to justify its use or prove it can work, do little to advance the field of human learning and training.”

Page 6

Moving forward: comparative research
• How do varying degrees and types of fidelity affect learning?
• Are some simulation approaches or modalities superior to others? For what learning objectives? Which learners? Which tasks? Etc.
• How do cost and throughput considerations affect the utility of different approaches?

Page 7

Where are we now?
Searched for peer-reviewed studies comparing the training effectiveness of simulation approaches and/or measured practice on human patients for emergency medical skills:
• Searched PubMed and CINAHL using the terms mannequin, manikin, animal, cadaver, simulat*, virtual reality, VR, compar*, versus, and VS
• Exhaustively searched Simulation in Healthcare
• Among identified studies, searched references forward and backward

Page 8

Reviewed studies
17 studies met criteria
• Procedure trained: predominantly needle access (7 studies); also 4 airway adjunct, 3 TEAM, 2 FAST, etc.
• Simulators compared: predominantly manikins, VR systems, and part-task trainers

Page 9

Reviewed studies
• Design: almost entirely between-subjects (16 of 17)
• Trainee performance measurement:
– 7 were post-test only; all others included pre-tests
– Most (9 studies) used expert ratings; also knowledge tests (7), success/failure (6), and objective criteria (5)
– 6 studies tested trainees on actual patients; 6 tested trainees on one of the simulators used in training

Page 10

Apparent methodological challenges
1. Inherently smaller differences between conditions and, consequently, underpowered designs
2. An understandable desire to “prove the null,” but inappropriate approaches to testing equivalence
3. Difficulty measuring or approximating the ultimate criterion: performance on the job

Page 11

Challenge #1: Detecting “small” differences
• Cook et al. (2011) meta-analysis: differences in outcomes of roughly 0.5-1.2 standard deviations, favoring simulation-based training over no simulation. Comparative research should expect smaller differences than these.
• HOWEVER, small differences can have great practical significance if they:
– correspond to important outcomes (e.g., morbidity or mortality),
– can be exploited widely, and/or
– can be exploited inexpensively.

Page 12

The power of small differences…
• Physicians’ Health Study: aspirin trial halted prematurely due to obvious benefit for heart attack reduction
– Effect size: r = .034
– Of 22,000 participants, 85 fewer heart attacks in the aspirin group

Page 13

…and the tyranny of small differences
• The probability of detecting differences (power) drops off rapidly as effect size decreases
• We generally can’t control effect sizes. Among other things, we can control:
– Sample size
– Reliability of measurement
– Chosen error rates

Page 14

Sample size
• Among the reviewed studies, n ranges from 8 to 62; median n = 15.
• If n = 15, α = .05, the true difference is 0.2 SDs, and measurement is perfectly reliable, the probability of detecting the difference is only 13%

RECOMMENDATION: Pool resources in multi-site collaborations to achieve the power needed to detect effects (and estimate power requirements a priori)
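The 13% figure can be checked with a quick Monte Carlo sketch. This is my own illustration, not part of the talk; the function name and the simulation count are arbitrary choices:

```python
import numpy as np

def simulated_power(n_per_group, effect_size_sd, n_sims=20000, seed=0):
    """Monte Carlo power estimate for a one-sided two-sample t-test.

    Both groups are drawn from unit-variance normal distributions whose
    means differ by `effect_size_sd` standard deviations; the function
    counts how often the t statistic clears the critical value.
    """
    rng = np.random.default_rng(seed)
    t_crit = 1.701  # one-sided .05 critical t for df = 28 (i.e., n_per_group = 15)
    rejections = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n_per_group)
        treated = rng.normal(effect_size_sd, 1.0, n_per_group)
        pooled_var = (control.var(ddof=1) + treated.var(ddof=1)) / 2.0
        t = (treated.mean() - control.mean()) / np.sqrt(2.0 * pooled_var / n_per_group)
        rejections += int(t > t_crit)
    return rejections / n_sims

power = simulated_power(n_per_group=15, effect_size_sd=0.2)  # roughly 0.13
```

Re-running with larger samples (and the matching critical t) shows directly how multi-site pooling recovers power that single-site designs cannot reach.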

Page 15

Reliability of measurement
• Potential rater errors are numerous
• Typical statistical estimates (e.g., coefficient alpha, inter-rater correlations) can be uninformative
• If measures are unreliable, and especially if samples are also small, you’ll almost always fail to find differences, whether they exist or not

Page 16

Reliability of measurement
Among the nine studies using expert ratings:
• Only two used multiple raters for all participants
• Six studies did not estimate reliability at all
• One study reported an inter-rater reliability coefficient, and two reported correlations between raters’ scores; both approaches make unfounded assumptions
• Ratings were never collected on multiple occasions

Page 17

Reliability of measurement
RECOMMENDATIONS:
1. Use robust measurement protocols, e.g., frame-of-reference rater training and multiple raters
2. For expert ratings, use generalizability theory to estimate and improve reliability

G-theory respects a basic truth: “reliability” is not a single value associated with a measurement tool. Rather, it depends on how you conduct measurement, who is being measured, the type of comparison for which you use the scores, etc.

Page 18

G-theory process, in a nutshell
1. Collect ratings, using an experimental design to expose sources of error (e.g., have multiple raters give ratings, on multiple occasions)
2. Use ANOVA to estimate the magnitude of errors
3. Given the results from step 2, forecast what reliability will result from different combinations of raters, occasions, etc.
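The three steps above can be sketched for the simplest crossed design: persons crossed with raters, one occasion. This is a minimal illustration under that assumption; the function names are mine, and a real G-study would use dedicated G-theory software or a mixed-model routine:

```python
import numpy as np

def g_study(scores):
    """Variance components for a persons x raters crossed design (one occasion).

    `scores` is an (n_persons, n_raters) array of ratings. Returns the
    person variance, the residual variance (person-by-rater interaction
    confounded with error), and a forecasting function for the relative
    G coefficient at any number of raters.
    """
    n_p, n_r = scores.shape
    grand = scores.mean()
    person_means = scores.mean(axis=1)
    rater_means = scores.mean(axis=0)
    ms_person = n_r * ((person_means - grand) ** 2).sum() / (n_p - 1)
    ms_rater = n_p * ((rater_means - grand) ** 2).sum() / (n_r - 1)
    ss_resid = (((scores - grand) ** 2).sum()
                - ms_person * (n_p - 1) - ms_rater * (n_r - 1))
    ms_resid = ss_resid / ((n_p - 1) * (n_r - 1))
    var_person = max((ms_person - ms_resid) / n_r, 0.0)
    var_resid = max(ms_resid, 0.0)

    def g_coefficient(n_raters):
        # Person variance over person variance plus rater-averaged error
        return var_person / (var_person + var_resid / n_raters)

    return var_person, var_resid, g_coefficient

# Toy ratings: all score variance is between persons, so the G coefficient is 1.0
var_p, var_e, g = g_study(np.array([[1.0, 2.0], [5.0, 6.0], [9.0, 10.0]]))
```

Step 3 is the payoff: `g_coefficient(n)` forecasts how much reliability a study would buy by adding raters, before any new data are collected.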

Page 19

Weighted scoring
• Two studies used weighting schemes: more points associated with more critical procedural steps
– Can improve both reliability and validity
• RECOMMENDATION: Use task-analytic procedures to identify the criticality of subtasks; weight scores accordingly
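As a sketch of criticality weighting: the subtasks and weights below are hypothetical, chosen only to illustrate the arithmetic, not taken from any reviewed study:

```python
def weighted_score(performed, weights):
    """Percentage score for a procedural checklist with criticality weights.

    `performed` maps each subtask to True/False (performed correctly);
    `weights` maps each subtask to its criticality weight, as derived
    from a task analysis.
    """
    earned = sum(weights[step] for step, done in performed.items() if done)
    return 100.0 * earned / sum(weights.values())

# Hypothetical needle-access subtasks and weights, for illustration only
weights = {"identify landmarks": 3, "maintain sterile technique": 2,
           "insert needle at correct angle": 5, "confirm placement": 5}
performed = {"identify landmarks": True, "maintain sterile technique": True,
             "insert needle at correct angle": True, "confirm placement": False}
score = weighted_score(performed, weights)  # 10 of 15 points, about 66.7
```

An unweighted checklist would give this trainee 75% (3 of 4 steps); weighting the critical final step drops the score to about 67%, which better reflects clinical risk.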

Page 20

Selecting error rates
Why do we choose p = .05 as the threshold for statistical significance?

Page 21

Relative severity of errors
Type I error: “Simulator X is more effective than Simulator Y” (but really, they’re equally effective)
Potential outcome: largely trivial; both are equally effective, so erroneously favoring one does not affect learning or patient outcomes

Type II error: “Simulators X and Y are equally effective” (but really, Simulator X is superior)
Potential outcome: adverse effects on learning and patient outcomes if Simulator X is consequently underutilized

Page 22

Relative severity of errors (error rates)
Type I error rate: α = .05
Type II error rate: β = 1 - power (e.g., 1 - .80 = .20)

Page 23

Relative severity of errors
• RECOMMENDATION: Particularly in a new line of research, adopt an alpha level that rationally balances inferential errors according to their severity

Cascio, W. F., & Zedeck, S. (1983). Open a new window in rational research planning: Adjust alpha to maximize statistical power. Personnel Psychology, 36, 517-526.

Murphy, K. (2004). Using power analysis to evaluate and improve research. In S. G. Rogelberg (Ed.), Handbook of research methods in industrial and organizational psychology (pp. 119-137). Malden, MA: Blackwell.
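One way to operationalize this recommendation is to pick the alpha at which the two error rates stand in a chosen severity ratio. The sketch below does this for a one-sided two-sample test under a normal approximation; it is my own simplification of the idea, with hypothetical function names:

```python
from math import erf, sqrt

def normal_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def normal_quantile(p):
    # Bisection inverse of the normal CDF; ample precision for a sketch
    lo, hi = -10.0, 10.0
    for _ in range(80):
        mid = (lo + hi) / 2.0
        if normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def balanced_alpha(n_per_group, effect_size_sd, severity_ratio=1.0):
    """Alpha at which beta = severity_ratio * alpha for a one-sided
    two-sample z-test (normal approximation).

    `severity_ratio` is the judged severity of a Type I error relative
    to a Type II error; 1.0 treats the two errors as equally bad.
    """
    ncp = effect_size_sd / sqrt(2.0 / n_per_group)  # noncentrality
    lo, hi = 1e-6, 0.5
    for _ in range(60):
        alpha = (lo + hi) / 2.0
        beta = normal_cdf(normal_quantile(1.0 - alpha) - ncp)
        if beta > severity_ratio * alpha:  # beta still too large: raise alpha
            lo = alpha
        else:
            hi = alpha
    return (lo + hi) / 2.0

alpha = balanced_alpha(n_per_group=15, effect_size_sd=0.2)  # well above .05
```

With n = 15 per group and a 0.2 SD effect, balancing equally severe errors pushes alpha to roughly .4; with n = 1000 it falls near .01. The conventional .05 is rational only for particular combinations of sample size, effect size, and error severity.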

Page 24

Challenge #2: Proving the null
• Language in studies often reflects a desire to assert equivalence
– e.g., different simulators are “reaching parity”
• Standard null hypothesis statistical testing (NHST) does not support this assertion
– Failure to detect effects should prompt reservation of judgment, not acceptance of the null hypothesis

Page 25

Which assertion is more bold?

“Sim X is more effective than Sim Y”

“Sims X and Y are equally effective”

[Figure: two effect-size scales running from “Y favored” through 0 to “X favored,” contrasting the range of plausible differences each assertion allows]

Page 26

Proving the null
• It is possible to “prove the null”:
– Set a region of practical equivalence around zero
– Evaluate whether all plausible differences (e.g., the 95% confidence interval) fall within that region
• RECOMMENDATION:
– Avoid unjustified acceptance of the null
– Use strong tests of equivalence when hoping to assert equivalence
– Be explicit about what effect size you would consider practically significant, and why
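The region-of-practical-equivalence idea reduces to a simple confidence-interval rule, the standard two one-sided tests (TOST) shortcut. A minimal sketch with illustrative numbers; note that TOST at alpha = .05 conventionally uses a 90% interval, while a 95% interval is a more conservative variant:

```python
def within_equivalence_region(mean_diff, se_diff, margin, z=1.645):
    """Two one-sided tests (TOST) via the confidence-interval shortcut.

    Equivalence at alpha = .05 is claimed only when the 90% confidence
    interval for the difference lies entirely inside the pre-specified
    region of practical equivalence (-margin, +margin).
    """
    lower = mean_diff - z * se_diff
    upper = mean_diff + z * se_diff
    return -margin < lower and upper < margin

# Observed difference of 0.01 SDs, SE 0.05, margin of 0.2 SDs:
# CI is about (-0.07, 0.09), entirely inside the region
ok = within_equivalence_region(0.01, 0.05, 0.2)       # True
# A 0.15 SD difference pushes the upper CI bound past the margin
not_ok = within_equivalence_region(0.15, 0.05, 0.2)   # False
```

Failing this check does not show that two simulators differ; it shows that the data cannot rule out a practically significant difference, which is exactly the reservation of judgment that standard NHST already demands.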

Page 27

Challenge #3: Getting to the ultimate criterion
• The goal is not test performance but job performance; “the map is not the terrain”
• It is typical to test demonstration of procedures, often on a simulator
– Will trainees perform similarly on actual patients, under authentic work conditions?
– Do trainees know when to execute the procedure?
– Are trainees willing to act promptly?

Page 28

e.g., Roberts et al. (1997)
• No differences detected in the rate of successful laryngeal mask airway placement for manikin vs. manikin-plus-live-patient training
– However: confidence was very low, and increased only with live-patient practice
• “…if a nurse does not feel confident enough… the patient will initially receive pocket-mask or bag-mask ventilation, and this is clearly less desirable”
An issue of willingness to act decisively

Page 29

Criterion relevance
• RECOMMENDATION: Where possible, use criterion testbeds that correspond highly to actual job performance
– Assess performance on human patients/volunteers
– Replicate performance-shaping factors (not just the environment)
– Test knowledge of indications and willingness to act

Page 30

What if patients can’t be used?
• Using simulators as the criterion testbed introduces potential biases
– e.g., train on a cadaver or manikin; test on a different manikin

Page 31

A partial solution: the crossed-criterion design (each training group is tested on the other group’s simulator as well as its own, crossing training condition with criterion testbed)

Page 32

A partial solution: crossed-criterion design
• Advantages
– Mitigates bias
– Allows comparison of how learning generalizes from each training condition
• Disadvantages
– Precludes pre-testing, if pre-test exposure to each simulator is lengthy enough to confer learning benefits

Page 33

Conclusions
• “The greatest enemy of a good plan is the dream of a perfect plan”
• All previous comparative research is to be lauded for pushing the field forward
• Concrete steps can be taken to maximize the theoretical and practical value of future comparative research