considerations in the use of propensity scores in observational studies · 2016. 11. 1. ·...

Considerations in the Use of Propensity

Scores in Observational Studies

Lawrence Rasouliyan, Estel Plana, Jaume Aguado

October 11, 2016

PhUSE 2016 / Real World Evidence Stream

RTI Health Solutions, Barcelona, Spain

2

• Randomized Controlled Trials = Gold standard for treatment

comparison

• Observational studies are increasingly being used to

estimate the effects of treatments, exposures, and

interventions on outcomes.

• Treated and untreated patients often have systematic

differences in their distributions of underlying characteristics.

• Traditionally, multivariable models have been used.

• Propensity scores can be used to minimize the effects of

bias in the estimate of the treatment effect.

– Serves as a “balancing” score for measured confounders

– Outcomes are compared taking into account this balancing to reduce

potential bias of confounders

Background

3

• Probability of receiving Treatment A (versus Treatment B)

given the patient’s underlying characteristics.

• Usually determined by logistic regression:

Propensity Score: Definition

4




𝑙𝑛𝑃𝐴

1 − 𝑃𝐴= 𝛽0 + 𝛽1𝑋1 + 𝛽2𝑋2 + 𝛽3𝑋3 +⋯


5




𝑙𝑛𝑃𝐴

1 − 𝑃𝐴= 𝛽0 + 𝛽1𝑋1 + 𝛽2𝑋2 + 𝛽3𝑋3 +⋯

• Solving for PA, we get the propensity score:

𝑃𝐴 =𝑒𝑥𝑝 𝛽0 + 𝛽1𝑋1 + 𝛽2𝑋2 + 𝛽3𝑋3 +⋯

1 + 𝑒𝑥𝑝 𝛽0 + 𝛽1𝑋1 + 𝛽2𝑋2 + 𝛽3𝑋3 +⋯


6




𝑙𝑛𝑃𝐴

1 − 𝑃𝐴= 𝛽0 + 𝛽1𝑋1 + 𝛽2𝑋2 + 𝛽3𝑋3 +⋯

• Solving for PA, we get the propensity score:

𝑃𝐴 =𝑒𝑥𝑝 𝛽0 + 𝛽1𝑋1 + 𝛽2𝑋2 + 𝛽3𝑋3 +⋯

1 + 𝑒𝑥𝑝 𝛽0 + 𝛽1𝑋1 + 𝛽2𝑋2 + 𝛽3𝑋3 +⋯

• Then, we account for this propensity score to make

treatment comparison.


7

• Prospective 8-year observational study in oncology

• Outcomes of interest:

– Overall survival (OS) = Time to death

– Progression-free survival (PFS) = Time to progression or death

– Overall response rate (ORR) = Response to treatment

• Data simulated to mimic general attributes of actual results

• Disease indication: “Cancer”

• Two possible therapies…

Illustrative Example: Data Source

8

Two Possible Therapies

Blue Pill

N = 1,871

Red Pill

N = 1,384

Which pill is “better” for outcomes of interest?

9

Characteristic Blue Pill Red Pill P value

N = 1871 N = 1384

Age group, n (%)

≤ 50 59 (3.2%) 234 (16.9%) <0.001

51 to 60 576 (30.8%) 684 (49.4%)

61 to 70 925 (49.4%) 412 (29.8%)

> 70 311 (16.6%) 54 (3.9%)

Sex, n (%)

Male 834 (44.6%) 593 (42.8%) 0.326

Female 1037 (55.4%) 791 (57.2%)

Geographic region, n (%)

North America 846 (45.2%) 553 (40.0%) <0.001

Europe 705 (37.7%) 702 (50.7%)

Asia 320 (17.1%) 129 (9.3%)

Performance status, n (%)

0 1060 (56.7%) 1001 (72.3%) <0.001

1 688 (36.8%) 352 (25.4%)

≥ 2 123 (6.6%) 31 (2.2%)

Patient Characteristics

10


N = 1871 N = 1384

Age group, n (%)

≤ 50 59 (3.2%) 234 (16.9%) <0.001

51 to 60 576 (30.8%) 684 (49.4%)

61 to 70 925 (49.4%) 412 (29.8%)

> 70 311 (16.6%) 54 (3.9%)

Sex, n (%)

Male 834 (44.6%) 593 (42.8%) 0.326

Female 1037 (55.4%) 791 (57.2%)

Geographic region, n (%)

North America 846 (45.2%) 553 (40.0%) <0.001

Europe 705 (37.7%) 702 (50.7%)

Asia 320 (17.1%) 129 (9.3%)

Performance status, n (%)

0 1060 (56.7%) 1001 (72.3%) <0.001

1 688 (36.8%) 352 (25.4%)

≥ 2 123 (6.6%) 31 (2.2%)

Patient Characteristics

Red Pill Patients

Younger

From Europe

Better

Performance

11

Patient Characteristics (Continued)


N = 1871 N = 1384

Prognosis risk category, n (%)

Low risk 297 (15.9%) 228 (16.5%) <0.001

Medium risk 684 (36.6%) 656 (47.4%)

High risk 890 (47.6%) 500 (36.1%)

Disease stage, n (%)

III 1210 (64.7%) 851 (61.5%) 0.063

IV 661 (35.3%) 533 (38.5%)

Extranodal sites, n (%)

≥ 2 591 (31.6%) 417 (30.1%) 0.374

< 2 1280 (68.4%) 967 (69.9%)

LDH level, n (%)

Elevated 498 (26.6%) 290 (21.0%) <0.001

Normal 1373 (73.4%) 1094 (79.0%)

Bone marrow involvement, n (%)

Yes 1165 (62.3%) 900 (65.0%) 0.106

No 706 (37.7%) 484 (35.0%)

Center type, n (%)

Academic 400 (21.4%) 197 (14.2%) <0.001

Community 1471 (78.6%) 1187 (85.8%)

Red Pill Patients

Better Prognosis

Risk

Normal LDH Level

From Community

Center

12

• Choosing main effects is an “art” and science

• Several factors to consider (e.g., strength of association,

clinical plausibility, multicollinearity, effect modification)

• Remember the purpose: Control for confounding variables

• Simulation studies have shown:

– Confounding variables should always be included

Specifying the Propensity Score Model

Treatment Outcome

Variable

13







– Variables associated with only the treatment decrease precision


Treatment Outcome

Variable

14








– Variables associated with only the outcome increase precision


Treatment Outcome

Variable

15








– Variables associated with only the outcome increase precision

• Our strategy: Include all variables associated with outcome


Treatment Outcome

Variable

16

• Look at variables one at a time (univariable models)

• Determine which variables are associated with any outcome

Selection of Patient Characteristics for Model

* Univariable Cox Regression Model Predicting OS;

proc phreg data=ad01;

class region (ref="North America");

model osmo*osevt(0) = region;

run;

* Univariable Cox Regression Model Predicting PFS;



model pfsmo*pfsevt(0) = region;

run;

* Univariable Logistic Regression Model Predicting ORR;

proc logistic data=ad01;


model orr (event = "1") = region;

run;

17

Variable OS PFS ORR

β P Value β P Value β P Value Age group

≤ 50 Ref Ref Ref Ref Ref Ref

51 to 60 0.291 0.141 0.038 0.728 0.411 0.001

61 to 70 0.922 <0.001 0.511 <0.001 -0.189 0.070

> 70 1.057 <0.001 0.393 0.001 -0.806 <0.001

Sex

Male 0.049 0.536 0.002 0.970 0.021 0.734

Female Ref Ref Ref Ref Ref Ref

Geographic region

North America Ref Ref Ref Ref Ref Ref

Europe -0.041 0.628 -0.058 0.307 0.093 0.274

Asia 0.167 0.144 0.138 0.076 -0.298 0.004

Performance status

0 Ref Ref Ref Ref Ref Ref

1 0.059 0.491 0.092 0.104 0.019 0.859

≥ 2 0.679 <0.001 0.423 <0.001 -0.334 0.036

Disease stage

III Ref Ref Ref Ref Ref Ref

IV 0.349 <0.001 0.335 <0.001 -0.217 0.001

Extranodal sites

≥ 2 0.209 0.011 0.152 0.006 -0.201 0.001

< 2 Ref Ref Ref Ref Ref Ref

LDH level

Elevated 0.673 <0.001 0.335 <0.001 -0.417 <0.001

Normal Ref Ref Ref Ref Ref Ref

Bone marrow involvement

Yes -0.019 0.813 0.001 0.986 -0.018 0.780

No Ref Ref Ref Ref Ref Ref

Center type

Academic 0.187 0.051 0.129 0.047 0.068 0.405

Community Ref Ref Ref Ref Ref Ref


18

Variable OS PFS ORR

β P Value β P Value β P Value Age group

≤ 50 Ref Ref Ref Ref Ref Ref

51 to 60 0.291 0.141 0.038 0.728 0.411 0.001

61 to 70 0.922 <0.001 0.511 <0.001 -0.189 0.070

> 70 1.057 <0.001 0.393 0.001 -0.806 <0.001

Sex

Male 0.049 0.536 0.002 0.970 0.021 0.734

Female Ref Ref Ref Ref Ref Ref

Geographic region

North America Ref Ref Ref Ref Ref Ref

Europe -0.041 0.628 -0.058 0.307 0.093 0.274

Asia 0.167 0.144 0.138 0.076 -0.298 0.004

Performance status

0 Ref Ref Ref Ref Ref Ref

1 0.059 0.491 0.092 0.104 0.019 0.859

≥ 2 0.679 <0.001 0.423 <0.001 -0.334 0.036

Disease stage

III Ref Ref Ref Ref Ref Ref

IV 0.349 <0.001 0.335 <0.001 -0.217 0.001

Extranodal sites

≥ 2 0.209 0.011 0.152 0.006 -0.201 0.001

< 2 Ref Ref Ref Ref Ref Ref

LDH level

Elevated 0.673 <0.001 0.335 <0.001 -0.417 <0.001

Normal Ref Ref Ref Ref Ref Ref

Bone marrow involvement

Yes -0.019 0.813 0.001 0.986 -0.018 0.780

No Ref Ref Ref Ref Ref Ref

Center type

Academic 0.187 0.051 0.129 0.047 0.068 0.405

Community Ref Ref Ref Ref Ref Ref


Selected Variables

• Age group

• Geographic

region

• Performance

status

• Disease stage

• Extranodal sites

• LDH level

19

Propensity Score Model

* PS Model: Multivariable Logistic Regression Predicting Receipt of BP;


class agegrp (ref="<=50") region (ref="North America")

perfstat (ref="0") stage (ref="III")

extranod (ref=">=2") ldhlevel (ref="Elevated") / param=ref;

model tx (event="BP") = agegrp region perfstat stage extranod ldhlevel;

output out=ad02 pred=propensity;

run;

20









run;

Model receipt

of the Blue Pill

21









run;

Model receipt

of the Blue Pill

As a function of the

6 selected

variables

22









run;

Model receipt

of the Blue Pill


6 selected

variables Output the

results into a

data set named

AD02

23









run;

Model receipt

of the Blue Pill


6 selected

variables

With the variable

PROPENSITY for

each patient Output the

results into a

data set named

AD02

24









run;

Analysis of Maximum Likelihood Estimates

Standard Wald

Parameter DF Estimate Error Chi-Square Pr > ChiSq

Intercept 1 -1.1882 0.1860 40.7886 <.0001

agegrp 51 to 60 1 1.1653 0.1600 53.0770 <.0001

agegrp 61 to 70 1 2.1720 0.1610 181.9532 <.0001

agegrp >=70 1 3.1191 0.2114 217.6932 <.0001

region Asia 1 0.5041 0.1274 15.6471 <.0001

region Europe 1 -0.4297 0.0835 26.4601 <.0001

perfstat 1 1 0.6027 0.0856 49.5625 <.0001

perfstat >=2 1 1.2876 0.2182 34.8199 <.0001

stage IV 1 0.1179 0.0811 2.1121 0.1461

extranod <2 1 -0.0882 0.0849 1.0814 0.2984

ldhlevel Normal 1 -0.3877 0.0925 17.5754 <.0001

25

Propensity Score Trimming

• Propensity score distribution through stacked histograms

26

Propensity Score Trimming

• Propensity score distribution through stacked histograms

1st Percentile among BP

99th Percentile among RP

• 317 patients

excluded from

further analysis

• PS-trimmed

cohort = 2,945

patients

27

Application of the Propensity Score

• We will cover 4 methods of applying the propensity score to

estimate the adjusted treatment effect:

– Stratification

– Matching

– Inverse Probability Treatment Weighting (IPTW)

– Covariate adjustment

• Note about covariate balance

28

Application of the Propensity Score

• We will cover 4 methods of applying the propensity score to

estimate the adjusted treatment effect:

– Stratification

– Matching

– Inverse Probability Treatment Weighting (IPTW)

– Covariate adjustment

• Note about covariate balance

• Continuous variables

𝑑 =𝑋𝐵𝑃 − 𝑋𝑅𝑃

𝑠𝐵𝑃2 − 𝑠𝑅𝑃

2

2

• Categorical variables

𝑑 =𝑝𝐵𝑃 − 𝑝𝑅𝑃

𝑝𝐵𝑃 1 − 𝑝𝐵𝑃 + 𝑝𝑅𝑃 1 − 𝑝𝑅𝑃2

29

Stratification

• Rank propensity scores into quintiles.

• Patients within any particular quintile are similar in characteristics.

• Treatment effect within a particular quintile should not be

influenced by measured differences between BP and RP patients.

* Categorize PS-Trimmed Cohort into PS Quintiles;

proc rank data=ad02 out=strat01 groups=5;

where (pstrimcohort eq 1);

var propensity;

ranks psrank;

run;

data strat02;

set strat01;

psquin = (psrank + 1);

label psquin = "Propensity Score Quintile (1 to 5)";

run;

30

Stratification: Relative Treatment Effect

• Model each outcome as a function of treatment and propensity

score quintile:

* PS Stratification;

* Relative Treatment Effect (Hazard Ratio): OS;

proc phreg data=strat02;

class tx (ref="BP") psquin;

model osmo*osevt(0) = tx psquin / rl;

run;

* Relative Treatment Effect (Hazard Ratio): PFS;



model pfsmo*pfsevt(0) = tx psquin / rl;

run;

* Relative Treatment Effect (Odds Ratio): ORR;

proc logistic data=strat02;

class tx (ref="BP") psquin / param=ref;

model orr (event = "1") = tx psquin / rl;

run;

31

Stratification: Relative Treatment Effect

• Model each outcome as a function of treatment and propensity

score quintile:

* PS Stratification;




model osmo*osevt(0) = tx psquin / rl;

run;




model pfsmo*pfsevt(0) = tx psquin / rl;

run;


proc logistic data=strat02;

class tx (ref="BP") psquin / param=ref;

model orr (event = "1") = tx psquin / rl;

run;

32

Matching

• Each RP patient is matched to a BP patient with similar propensity

score

• Various matching algorithms (we used 1:1 “greedy” matching)

• In our example:

– 960 RP patients matched to 960 BP patients = 1,920 patients total

– Additional 1,025 patients excluded

• Account for matched pairs in treatment effect.

33

Matching: Relative Treatment Effect

• Model each outcome adjusting for matched pairs:

* PS Matching Accounting for Clustering Within Matched Pairs;


proc phreg data=ad_matching covs(aggregate);

id matchid;

class tx (ref="BP");

model osmo*osevt(0) = tx / rl;

run;



id matchid;


model pfsmo*pfsevt(0) = tx / rl;

run;



proc genmod data=ad_matching desc;

class matchid;

model orr = tx / dist=bin link=logit;

repeated subject=matchid / type=un corrw covb;

estimate 'RP vs. BP' tx 1 /exp;

run;

34






id matchid;



run;



id matchid;



run;




class matchid;




run;

35






id matchid;



run;



id matchid;



run;




class matchid;




run;

36

IPTW

• Each patient weighted by the inverse of the probability of receiving

the treatment that he or she actually received:

– BP patients: WEIGHT = 1 / PS

– RP patients: WEIGHT = 1 / (1 – PS)

• Weights are often “stabilized” as follows:

– BP Patients: STWEIGHT = PBP / PS

– RP Patients: STWEIGHT = (1 – PBP) / (1 – PS)

37

IPTW: Relative Treatment Effect

• Perform the appropriate regression while adjusting for stabilized

weights:

* IPTW with Stabilized Weights;


proc phreg data=ad_iptw;



weight stweight / normalize;

run;






run;


proc logistic data=ad_iptw;

class tx (ref="BP") / param=ref;

model orr (event="1") = tx / rl;


run;

38

IPTW: Relative Treatment Effect

• Perform the appropriate regression while adjusting for stabilized

weights:

* IPTW with Stabilized Weights;






run;






run;


proc logistic data=ad_iptw;


model orr (event="1") = tx / rl;


run;

39

Covariate adjustment

• Most elementary approach

• Simply include propensity score as a continuous covariate in the

model.

* Covariate Adjustment of PS;

* Relative Treatment Effect: OS;



model osmo*osevt(0) = tx propensity / rl;

run;

* Relative Treatment Effect: PFS;



model pfsmo*pfsevt(0) = tx propensity / rl;

run;

* Relative Treatment Effect: ORR;

proc logistic data=ad_matching;


model orr (event = "1") = tx propensity / rl;

run;

40

Covariate adjustment

• Most elementary approach

• Simply include propensity score as a continuous covariate in the

model.

* Covariate Adjustment of PS;

* Relative Treatment Effect: OS;



model osmo*osevt(0) = tx propensity / rl;

run;

* Relative Treatment Effect: PFS;



model pfsmo*pfsevt(0) = tx propensity / rl;

run;

* Relative Treatment Effect: ORR;

proc logistic data=ad_matching;


model orr (event = "1") = tx propensity / rl;

run;

41

Results: OS

Covariate Adjustment

IPTW

Matching

Stratification

Multivariable Model

Crude (no adjustment)

0,125 0,25 0,5 1 2 4 8

Hazard Ratio

0.565 (0.478-0.667)

HR (95% CI)

0.771 (0.643-0.924)

0.823 (0.697-0.972)

0.792 (0.661-0.950)

0.799 (0.656-0.974)

0.765 (0.638-0.916)

Red Pill is Better Blue Pill is Better

42

Results: OS


IPTW

Matching

Stratification

Multivariable Model


0,125 0,25 0,5 1 2 4 8

Hazard Ratio

0.565 (0.478-0.667)

HR (95% CI)

0.771 (0.643-0.924)

0.823 (0.697-0.972)

0.792 (0.661-0.950)

0.799 (0.656-0.974)

0.765 (0.638-0.916)


43

Results: PFS


IPTW

Matching

Stratification

Multivariable Model


0,125 0,25 0,5 1 2 4 8

Hazard Ratio

0.594 (0.533-0.663)

HR (95% CI)

0.688 (0.609-0.777)

0.707 (0.632-0.791)

0.686 (0.607-0.775)

0.659 (0.575-0.756)

0.679 (0.603-0.766)


44

Results: ORR


IPTW

Matching

Stratification

Multivariable Model


0,125 0,25 0,5 1 2 4 8

Odds Ratio

3.84 (2.82-5.23)

OR (95% CI)

2.62 (1.88-3.66)

2.64 (1.94-3.59)

2.61 (1.87-3.63)

2.62 (1.81-3.75)

2.93 (2.10-4.10)

Blue Pill is Better Red Pill is Better

45

Conclusions

• Propensity scores are a powerful tool to deal with confounding

when comparing treatments in observational studies, especially

when there are few outcomes.

• Propensity score methods can adjust only for measured

confounding variables.

• In our simulation:

– Propensity score methods attenuated the crude observed treatment effect

across all outcomes (all measures of association closer to the null).

– Stratification, matching, and covariate adjustment yielded similar results

for OS and PFS, while IPTW yielded point estimates closer to the null.

– For ORR, similar odds ratios were obtained for all propensity score

methods.

– Multivariable model gave similar results to the propensity score results.

46

Conclusions

Choose the Red Pill!

47

Contact Information

Lawrence Rasouliyan

+34 933 624 297 [email protected]

Jaume Aguado


Estel Plana


considerations in the use of propensity scores in observational studies · 2016. 11. 1. ·...

Documents