
Class 11-12

Chapter 5 & Elkin et al. (1989)

Threats to Statistical Conclusion Validity: Are the observed relations among variables accurate?

Power

Unreliability of Measures: introduces error variance; attenuates correlations

Unreliability of Treatment Implementation

Specificity (active ingredients); fidelity of delivery; competency

Extraneous Variance in the Experimental Setting

Heterogeneity of Participants
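The note that unreliability attenuates correlations can be made concrete with the classical test theory (Spearman) attenuation formula, r_observed = r_true × sqrt(r_xx × r_yy), where r_xx and r_yy are the reliabilities of the two measures. The sketch below is illustrative only; the function names and the reliability values are hypothetical, not taken from the slides.

```python
import math


def attenuated_r(true_r: float, rel_x: float, rel_y: float) -> float:
    """Expected observed correlation when both measures contain random error.

    Classical test theory (Spearman): r_obs = r_true * sqrt(r_xx * r_yy).
    """
    return true_r * math.sqrt(rel_x * rel_y)


def disattenuated_r(observed_r: float, rel_x: float, rel_y: float) -> float:
    """Correction for attenuation: estimated correlation between true scores."""
    return observed_r / math.sqrt(rel_x * rel_y)


# Hypothetical values: true-score correlation .50, reliabilities .80 and .70
r_obs = attenuated_r(0.50, 0.80, 0.70)
print(f"observed r is about {r_obs:.2f}")  # shrinks from .50 to about .37
print(f"corrected r is about {disattenuated_r(r_obs, 0.80, 0.70):.2f}")
```

The less reliable either measure is, the further a real relation shrinks toward zero, which lowers power to detect it.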


Threats to Internal Validity: Can we conclude that there is a causal relation between the IV and the DV? Did treatment cause differences in the DV across groups?

Selection: inclusion and exclusion criteria; who gets assigned to which group?

History

Attrition: What do we know about dropouts?

Repeated Testing Effects

Reaction to Control Group Assignment

Double-blind designs (pharmaceutical studies)

Placebo effects: are non-specific factors or the active ingredient responsible for the observed changes?

Houston study


Department of Veterans Affairs (VA) and Baylor College of Medicine, Houston

180 patients with osteoarthritis and knee pain were randomly assigned to one of three conditions (New England Journal of Medicine, 2002):

Debridement: worn, torn cartilage is cut and removed with a viewing tube called an arthroscope

Arthroscopic lavage: bad cartilage is flushed out

Simulated arthroscopic surgery: small incisions were made, but no instruments were inserted and no cartilage was removed


Findings during two years of follow-up:

Patients in all three groups reported moderate improvements in pain and ability to function.

The intervention groups did not report less pain or better function than the placebo group.

Placebo patients reported better outcomes than the debridement patients at certain points during follow-up.

Patients were blind to the type of surgery.

Threats to Construct Validity

To what extent do the variables capture the intended constructs?

Mono-Operation Bias (instruments)

Mono-Method Bias (self-report, clinician-rated)

Experimenter Expectancies

Allegiance Effect


Threats to External Validity: Can we generalize observed relations across persons, settings, and times?

Person-Units

Outcome Measures

Settings


Elkin et al.: Purpose

Test feasibility of the collaborative clinical trial model

Examine relative efficacy of CBT, IPT, and Medication for Depression


NIMH Treatment of Depression Collaborative Research Program

Sites: University of Pittsburgh, George Washington University, University of Oklahoma

250 patients with major depressive disorder

28 therapists: 2-27 years of experience; 71% male; 10 psychologists, 18 psychiatrists


Experimental Between-Group Designs

1. Post-Test Only Control
2. Pre-Test/Post-Test Control
3. Solomon Four Group (combination of 1 and 2 above)

Factorial Design: more than one independent variable; interactions (treatment X therapist or treatment X patient characteristic)

Dependent Sample Design (Matching)


Experimental Between-Group Designs

1. Post-Test Only Control
2. Pre-Test/Post-Test Control
3. Solomon Four Group (combination of 1 and 2 above)

Factorial Design (post hoc): more than one independent variable; interactions (treatment X patient characteristic: depression level at intake)

Dependent Sample Design (Matching)


IVs: Experimental Groups:

Cognitive Behavioral Therapy and Interpersonal Therapy: 16 individual sessions of 50 min.

Medication + Clinical Management* and Pill-Placebo + Clinical Management*: 1st session 55 min., then 20 to 25 min.

* Clinical Management: a minimal supportive therapy condition


Dependent Variables

Clinical Evaluator

• Hamilton Rating Scale Depression (HRSD)

• Global Assessment Scale (GAS)

Self Report

• Beck Depression Inventory (BDI)

• Hopkins Symptom Checklist (HSCL-90)


Outcome Research Strategies

Primary Analyses

Secondary Analyses (Post-Hoc)


Types of Outcome Studies (Kazdin, chap. 18)
1. Treatment Package Strategy
2. Dismantling Strategy
3. Constructive Strategy
4. Parametric Strategy (structural components)
5. Comparative Outcome Strategy
6. Client and Therapist Variation Strategy (moderation designs)


Primary Analyses: treatment package; comparative

Secondary Analyses: client variation (moderation effect?)



Secondary Analyses: Client Variation (moderation effect)

Depression level at intake as a moderator of differences in outcomes between treatment groups

Were outcomes across treatment groups different for patients with higher versus lower levels of depression at pre-test?


Control Groups

CBT; IPT; Medication + Clinical Management*; Pill-Placebo + Clinical Management*



Treatments & Therapists

Cognitive Behavioral Therapy and Interpersonal Therapy: a different group of experienced therapists for each (potential confound)

Medication + Clinical Management and Pill-Placebo + Clinical Management: the same therapists, psychiatrists (safeguards internal validity but undermines generalizability)


Ensure Valid Treatments

Specify the treatment(s): manuals

Therapist training/monitoring; fidelity checks (therapy tapes)

Collaborative Study Psychotherapy Rating Scale (CSPRS): taped treatments could be discriminated 95% of the time


Attrition (>15 sessions or 12 weeks)

Total: 77/239 (32%)

CBT: 32%

IPT: 23%

Meds/CM: 33%

Placebo/CM: 40%

Early terminators were more depressed at pre-test than completers.


Which sample should be used in the outcome analysis?

Total N = 239

Completers: N = 155 (at least 15 weeks or 12 sessions)

End-Point: N = 204 (at least 3.5 weeks or 4 sessions)

End-Point: N = 239 (intent-to-treat group; last assessment or pre-test)
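The end-point samples score each patient on the last available assessment, falling back to the pre-test when nothing later exists. A minimal sketch of that rule (last observation carried forward), using a hypothetical helper and made-up HRSD scores; it is not the authors' code, and the paper's exact handling of missing data should be checked in the original.

```python
from typing import Optional, Sequence


def endpoint_score(assessments: Sequence[Optional[float]]) -> float:
    """Last observed score; assessments[0] is the pre-test, None means missing."""
    observed = [score for score in assessments if score is not None]
    if not observed:
        raise ValueError("patient has no assessments, not even a pre-test")
    return observed[-1]


# Hypothetical HRSD scores at pre-test, week 4, week 8, week 12, termination
patients = {
    "completer": [24.0, 17.0, 12.0, 9.0, 7.0],
    "early_dropout": [22.0, 18.0, None, None, None],
    "no_follow_up": [20.0, None, None, None, None],
}

for name, scores in patients.items():
    print(name, endpoint_score(scores))
```

A patient who never returns contributes a pre-test score, i.e. zero change, which is why the intent-to-treat sample tends to give conservative estimates of treatment effects.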


Assessment Times

Pre-treatment

During treatment: 4, 8, 12 weeks

Termination: 15 weeks

Follow-up: 6, 12, 18 months


Table 1 Completer Group: At least 12 sessions; n=155 (page 975)


Analyses of Pre-test/Post-test (1)

Paired t-tests to examine differences between pre-test and post-test scores (p. 974)

How many tests?

4 treatment groups (CBT, IPT, IMI-CM, PLA-CM) X 4 outcome measures (HRSD, GAS, BDI, HSCL-90) X 3 samples (Completers; End-Point 204; End-Point 239) = 48 tests
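Each of these comparisons is a paired t-test on the same patients' pre- and post-test scores. The sketch below computes the paired t statistic with the standard library and enumerates the 4 X 4 X 3 design to confirm the count; the scores are hypothetical, and the helper name `paired_t` is ours.

```python
import math
import statistics
from itertools import product


def paired_t(pre, post):
    """Paired t statistic and degrees of freedom for pre/post scores.

    Differences are pre - post, so a positive t means scores fell; that is
    improvement on HRSD, BDI, and HSCL-90 but worsening on GAS.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1


# Hypothetical HRSD scores for five patients
t, df = paired_t([20, 22, 18, 25, 19], [12, 15, 13, 16, 14])
print(f"t({df}) = {t:.2f}")

groups = ["CBT", "IPT", "IMI-CM", "PLA-CM"]
measures = ["HRSD", "GAS", "BDI", "HSCL-90"]
samples = ["Completers", "End-Point 204", "End-Point 239"]
tests = list(product(groups, measures, samples))
print(len(tests), "paired t-tests")
```

Running this many tests raises the chance of at least one false positive, which is one reason the later between-group comparisons use a Bonferroni-adjusted criterion.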

Findings – T-Tests


(p. 974, right column)


Analyses of Post-test Scores: use the pre-test as a covariate in analyses of covariance (ANCOVAs) to compare mean post-test scores across the 4 treatment groups

Calculate a residualized change score: the variability in the post-test that is not associated with the pre-test score

Used p < .10 for the ANCOVAs and p = .10/6 = .0167 ≈ .017 for the 6 pairwise comparisons (Bonferroni correction, p. 974)
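The Bonferroni arithmetic follows from dividing the family-wise alpha of .10 by the number of pairwise comparisons among 4 groups, C(4, 2) = 6. A short sketch that reproduces the criterion and applies it to the p-values reported on the pairwise-comparison slide (.006, .010, .017, .018, .020); the helper name is hypothetical.

```python
from itertools import combinations

GROUPS = ["CBT", "IPT", "IMI-CM", "PLA-CM"]
FAMILY_ALPHA = 0.10

pairs = list(combinations(GROUPS, 2))  # the 6 pairwise comparisons
criterion = FAMILY_ALPHA / len(pairs)  # .10 / 6
print(len(pairs), round(criterion, 4))


def passes_bonferroni(p: float, alpha: float = FAMILY_ALPHA, m: int = len(pairs)) -> bool:
    """True if p is below the Bonferroni-adjusted per-comparison criterion."""
    return p < alpha / m


for p in [0.006, 0.010, 0.017, 0.018, 0.020]:
    print(p, "significant" if passes_bonferroni(p) else "not significant")
```

Values such as .017 and .018 sit just above the .0167 cut-off, which is why the slides describe them as trends rather than significant differences.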


Table 1 Completer Group: At least 12 sessions; n=155 (page 975)


ANCOVAs: Post-test Scores

Statistically significant differences between groups on the scales at post-test? Four 3 X 4 ANCOVAs tested differences across treatments in post-treatment scores on HRSD, GAS, BDI, and HSCL-90

3 (sites) X 4 (treatment groups); results are reported only for treatment groups, combined across sites

Covariates: pre-test scores; marital status (1, 2)

Why not MANCOVAs? (p. 973)


Table 1 Completer Group: At least 12 sessions; n=155 (page 975) p<.10


BDI: no significant differences in pairwise comparisons

Table 1, End-Point 239 group: CBT, IPT, IMI-CM, PLA-CM (p < .10)


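Findings: Pairwise Comparisons (by sample; clinical evaluator and self-report measures)

Completer sample (N = 155), self-report: BDI pairwise NS; HSCL-90-T p = .006, IMI-CM < PLA-CM

End-Point 204, clinical evaluator: GAS, IMI-CM < PLA-CM (trend, p = .020 vs. criterion .017)

End-Point 239, clinical evaluator: HRSD, IPT and IMI-CM < PLA-CM (trend, p = .017, .018); GAS, p = .010, IMI-CM < PLA-CM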


Measuring Change (Elkin et al., 1989)

Statistical significance: differences between groups on the scales at post-test, controlling for pre-test scores

Clinical significance: percentage of participants who changed from a dysfunctional to a functional level (using cut-off scores)

Clinical Significance: Recovery Analysis

Proportion of patients who improved vs. did not improve

Cut-off scores: not depressed = HRSD < 6 and BDI < 9; depressed = HRSD > 6 or BDI > 9

Statistical analyses: chi-square test comparing the proportion of depressed and non-depressed patients across treatment groups at termination
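The recovery analysis has two steps: classify each patient with the cut-offs, then test whether the recovered/not-recovered split differs across treatment groups. A minimal standard-library sketch of both steps; the patient counts are hypothetical, and the handling of scores exactly at 6 (HRSD) or 9 (BDI), which the two rules above leave unspecified, is an assumption.

```python
def recovered(hrsd: float, bdi: float) -> bool:
    """Slide cut-offs: not depressed if HRSD < 6 and BDI < 9.

    Assumption: a score exactly at 6 or 9 counts as not recovered.
    """
    return hrsd < 6 and bdi < 9


def chi_square(table):
    """Pearson chi-square statistic and df for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical counts: rows = CBT, IPT, IMI-CM, PLA-CM; columns = recovered, not
table = [[18, 32], [22, 28], [21, 29], [10, 40]]
stat, df = chi_square(table)
print(f"chi-square({df}) = {stat:.2f}")  # compare with 7.81, the .05 critical value for df = 3
```

A significant overall chi-square only says the proportions differ somewhere; follow-up comparisons are needed to say which groups differ.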


End-Point 239 sample, HRSD (p = .04)

The chi-square (χ²) test assesses whether the proportion in each group is what would be expected by chance, or larger or smaller than expected.

Proportion of cases that met recovery criteria: CBT 36% (ns); IPT 43%; IMI-CM 42%; PLA-CM 21%

IPT = IMI-CM > Placebo-CM; for CBT, the percentage comparison was not significant against any group


Completer group, HRSD

Proportion of cases that met recovery criteria: CBT 51%; IPT 55%; IMI-CM 57%; PLA-CM 29%

IPT, IMI-CM > Placebo-CM

Secondary Analyses

Purpose: to examine the effect of pre-treatment severity (HRSD/GAS) on outcome by treatment group

DVs: post-treatment scores

Severity criteria: HRSD > 20 (44% of sample); GAS < 50 (41% of sample)

Covariate: marital status


2 X 4 ANCOVA (severity X treatment); DVs: post-test HRSD, GAS, BDI, HSCL-90

Main effect for severity: more severe at pre-test (HRSD > 20; GAS < 50) vs. less severe at pre-test

Main effect for treatment: CBT, IPT, IMI-CM, P-CM

Severity X treatment (interaction term)
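A severity X treatment interaction means the treatment difference is not the same at both severity levels. In terms of cell means it is a difference of differences, sketched below with hypothetical adjusted post-test HRSD means (lower is better) and the HRSD > 20 severity split; the function names are ours, and computing a contrast is not itself a significance test (the ANCOVA F test for the interaction term plays that role).

```python
def severity_group(hrsd_pre: float) -> str:
    """Slide criterion: HRSD > 20 at pre-test counts as more severe."""
    return "more severe" if hrsd_pre > 20 else "less severe"


def interaction_contrast(means, treat_a: str, treat_b: str) -> float:
    """(A - B) in the more severe group minus (A - B) in the less severe group."""
    diff_severe = means[("more severe", treat_a)] - means[("more severe", treat_b)]
    diff_mild = means[("less severe", treat_a)] - means[("less severe", treat_b)]
    return diff_severe - diff_mild


# Hypothetical adjusted post-test HRSD means (lower = less depressed)
means = {
    ("more severe", "IMI-CM"): 10.0, ("more severe", "PLA-CM"): 16.0,
    ("less severe", "IMI-CM"): 8.0, ("less severe", "PLA-CM"): 9.0,
}
print(severity_group(24), severity_group(18))
print(interaction_contrast(means, "IMI-CM", "PLA-CM"))
```

Here medication beats placebo by 6 points among more severe patients but by only 1 point among less severe patients, the pattern an interaction term is designed to detect.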


Interaction Effect: HRSD Severity X Treatment Group (p. 976)

Dependent variables: HRSD*, GAS, BDI, HSCL-90; 4 sets of three 2 X 4 ANCOVAs (4 DVs, 3 samples)

[Figure: post-test scores for CBT, IPT, IMI-CM, and P-CM in high vs. low HRSD severity groups; panels: Completer sample (BDI), Completer sample (HRSD)*, End-Point 239^, End-Point 204*]

*p < .10; ^p < .11


Interaction Effect: GAS Severity X Treatment Group

Dependent variables: HRSD, GAS, BDI, HSCL-90

[Figure: post-test scores for CBT, IPT, IMI-CM, and P-CM in high vs. low GAS severity groups; panels: Completer**, End-Point 239*, End-Point 204****]


Treatment by Severity Interaction, End-Point 204 sample: higher score = negative outcome (HRSD, BDI, HSCL-90); higher score = positive outcome (GAS)

Summary of all pairwise analyses following the interaction effects (p. 976)

Less severe groups: no differences across treatment groups

More severe groups: IPT was more effective than PLA-CM in 3 instances, all on the HRSD measure in the End-Point 204 sample (3 out of 4 comparisons)

IMI-CM was more effective than PLA-CM across a number of measures (8 out of 10 comparisons)


Figure 2: Recovery rates (%) in the End-Point 204 sample for severity groups (p. 977)

Less severe subgroups: no significant differences among treatments in any sample, with either HRSD or GAS as the severity measure.

More severe subgroups (HRSD and GAS): consistent findings across the three samples; IPT > PLA-CM in 5 of 6 and IMI-CM > PLA-CM in 6 of 6 comparisons


Threats to Statistical Conclusion Validity: Are the observed relations among variables accurate?

Power
• Large N per group (range 34-62) (+)
• Outcome measures are well known (+)
• Power analyses: 81-95% for medium effects (+)
• p < .10 for ANCOVAs and .10/6 for pairwise comparisons

Unreliability of Measures

Unreliability of Treatment Implementation
• Experienced therapists: 2-27 years, mean = 11 (+)
• Manuals and training for each treatment group (+)
• Closely monitored (+)
• Taped sessions: 95% correctly classified (+)

Extraneous Variance in the Experimental Setting
• Not known for the most part (-)
• 28 therapists, each with 3-11 patients (-)
• No way to control for therapist effects (p. 980) (-)
• One site CBT, another site IPT, similar to Meds/CM

Heterogeneity of Participants
• Random assignment to groups (+)
• Only included 45% of those screened (+)
• Mostly women: 70% female (+)
• 89% white participants (+)
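The power bullet can be explored with a normal-approximation formula for comparing two group means, power ≈ Φ(d × sqrt(n/2) - z(1 - α/2)), where d is the standardized effect size (0.5 is Cohen's conventional "medium"). This sketch is not expected to reproduce the 81-95% figures above, whose assumptions (test, covariates, α) are not stated on the slide; the helper is hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided, two-group comparison.

    Ignores the (negligible) chance of rejecting in the wrong direction.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n_per_group / 2)
    return std_normal.cdf(shift - z_crit)


# Medium effect (d = 0.5) at the smallest and largest group sizes on the slide
for n in (34, 62):
    for alpha in (0.05, 0.10):
        print(n, alpha, round(approx_power(0.5, n, alpha), 2))
```

Power rises with group size and with a more lenient α, which is the trade-off behind using p < .10 for the omnibus tests.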


Threats to Internal Validity: Can we conclude that there is a causal relation between the IV and the DV?

Selection: used randomization; see factors under Heterogeneity of Participants

History: time frame of the study not reported. Did therapy happen at about the same time for everyone?

Attrition: relatively high attrition rate (32%); about 25% was for negative reasons related to treatment (-). Early terminators were more depressed at intake (-)

Repeated Testing Effects: tested at frequent intervals (pre-test; 4, 8, and 12 weeks; termination; 6-, 12-, and 18-month follow-up)

Reaction to Control Group Assignment: not known, but could be the case. Placebo/CM had the highest attrition: 32% CBT, 23% IPT, 33% Meds/CM, 40% Placebo/CM





Threats to Construct Validity

Mono-Operation Bias
• Used 4 different outcome measures: HRSD, BDI, GAS, HSCL-90 (+)
• Measures of well-known psychometric properties (+)

Mono-Method Bias
• Used both patient self-report and clinician-completed measures (+)
• Measures of well-known psychometric properties (+)

Experimenter Expectancies
• Clinicians not blind to therapy modality (-)
• Psychiatrists blind to medication condition (+)



Threats to External Validity

Person-Units
• Highly selected sample (-)
• Only 45% of those screened were selected (-)
• Generalizable to white (89%), female (70%), highly educated (75% with a college degree or some college) patients who were less severely depressed (p. 974)

Outcome Measures
• Interview and self-report measures (+)
• Clinical significance recovery rates (+)
• Statistically significant findings were not consistent across measures: HRSD detected more differences in depression than BDI (-)

Settings: an empirical question


Results: Summary 1/3

Paired t-tests showed statistically significant pre-post differences (p < .001) on all measures for all treatment groups (even Placebo/CM) in all three samples: intent-to-treat; minimum-treatment completers (at least 3.5 weeks or 4 sessions); and completers of all or most sessions (at least 12 sessions or 15 weeks; n = 155)


ANCOVAs showed no statistically significant differences in pre-test scores on any measure for any treatment group

Statistically significant differences at post-test: BDI/HSCL-90 in the Completer sample; HRSD/GAS in the total group (239)



Pairwise follow-up ANCOVAs: HSCL-90, IMI-CM > PLA-CM (Completer); GAS, IMI-CM > PLA-CM (total 239 group); HRSD, IPT and IMI-CM > PLA-CM, trend (total 239)

Recovery findings (clinical significance, post-test HRSD < 6): IPT (43%) and IMI-CM (42%) > PLA-CM (21%) in the End-Point 239 sample; CBT = 36% (NS)
