Statistical Methods in Clinical Trials
II Categorical Data
Ziad Taib
Biostatistics
AstraZeneca
March 7, 2012
Types of Data
Quantitative
• Continuous: blood pressure, time to event
• Discrete: no. of relapses
Qualitative
• Ordered categorical: pain level
• Categorical: sex
Types of data analysis (Inference)
• Parametric vs. non-parametric
• Frequentist vs. Bayesian
• Model-based vs. data-driven
Inference problems
1. Binary data (proportions)
• One sample
• Paired data
2. Ordered categorical data
3. Combining categorical data
4. Logistic regression
5. A Bayesian alternative
Categorical data
In an RCT, endpoints and surrogate endpoints can be categorical or ordered categorical variables. In the simplest cases we have binary responses (e.g. responders/non-responders). In outcomes research it is common to use several ordered categories (e.g. no improvement, moderate improvement, high improvement).
Bernoulli experiment
Random experiment (e.g. a hole in one?)
• Success (1) with probability p
• Failure (0) with probability 1-p
Binary variables
• Sex
• Mortality
• Presence/absence of an AE
• Responder/non-responder according to some pre-defined criteria
• Success/Failure
Estimation
• Assume that a treatment has been applied to n patients and that at the end of the trial they were classified according to how they responded to the treatment: 0 meaning not cured and 1 meaning cured. The data at hand is thus a sample of n independent binary variables $Y_1, \dots, Y_n$ with $P(Y_i = 1) = p$.
• The probability of being cured by this treatment can be estimated by $\hat{p} = \frac{1}{n}\sum_{i=1}^{n} Y_i$, satisfying $E[\hat{p}] = p$ and $\mathrm{Var}(\hat{p}) = p(1-p)/n$.
Hypothesis testing
• We can test the null hypothesis $H_0: p = p_0$
• Using the test statistic $Z = (\hat{p} - p_0) / \sqrt{p_0(1-p_0)/n}$
• When n is large, Z follows, under the null hypothesis, approximately the standard normal distribution (note: the approximation is poor when p is very small or very large).
Hypothesis testing
• For moderate values of n we can use the exact Bernoulli distribution of the individual responses $Y_i$, leading to the sum being binomially distributed, i.e. $\sum_{i=1}^{n} Y_i \sim \mathrm{Bin}(n, p)$.
• As with continuous variables, tests can be used to build confidence intervals.
Example 1: Hypothesis test based on the binomial distribution
Consider testing H0: P = 0.5 against Ha: P > 0.5, where n = 10 and y = number of successes = 8.
p-value = (probability of obtaining a result at least as extreme as the one observed) = Prob(8 or more responders) = P8 + P9 + P10 = {using the binomial formula} = 0.0547
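The tail sum in Example 1 can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `binom_upper_tail` is an illustrative choice, not something from the slides.

```python
from math import comb

def binom_upper_tail(y: int, n: int, p: float) -> float:
    """P(Y >= y) for Y ~ Bin(n, p), summed term by term from the binomial formula."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(y, n + 1))

# Example 1: n = 10 patients, 8 responders, H0: P = 0.5 against Ha: P > 0.5
p_value = binom_upper_tail(8, 10, 0.5)
print(round(p_value, 4))  # 0.0547, i.e. (45 + 10 + 1) / 1024
```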
Example 2
RCT of two analgesic drugs A and B given in a random order to each of 100 patients. After both treatment periods, each patient states a preference for one of the drugs.
Result: 65 patients preferred A and 35 B
Example (cont’d)
Hypotheses: H0: P = 0.5 against H1: P ≠ 0.5
Observed test-statistic: z=2.90
p-value: p=0.0037
(exact p-value using the binomial distr. = 0.0035)
95% CI for P: (0.56 ; 0.74)
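The figures of Example 2 can be reproduced as follows. This is a sketch under two assumptions that are not stated on the slide but match its numbers: the z value uses a continuity correction of 1/(2n), and the confidence interval is the simple Wald interval based on the observed proportion.

```python
from math import comb, erfc, sqrt

n, y, p0 = 100, 65, 0.5
p_hat = y / n

# Normal approximation with a continuity correction of 1/(2n) (assumed)
z = (abs(p_hat - p0) - 1 / (2 * n)) / sqrt(p0 * (1 - p0) / n)
p_normal = erfc(z / sqrt(2))  # two-sided p-value, 2 * (1 - Phi(z))

# Exact two-sided p-value: twice the upper binomial tail
# (doubling is appropriate here because Bin(n, 0.5) is symmetric)
upper_tail = sum(comb(n, k) for k in range(y, n + 1)) / 2**n
p_exact = 2 * upper_tail

# Wald 95% confidence interval (assumed form)
half_width = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - half_width, p_hat + half_width)

print(round(z, 2), round(p_normal, 4), round(p_exact, 4), [round(c, 2) for c in ci])
```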
Example 3
We want to test whether the proportion of patients experiencing an early improvement after some treatment is 0.35. n = 312 patients were observed, of whom 147 experienced such an improvement, yielding a proportion of 0.471 (47.1%). The Z value is 4.3, yielding a p-value of 0.00002; the exact distribution gives 0.00001. Since n is large here, the normal approximation is good enough. A 95% confidence interval for the proportion is [0.41, 0.52] and does not contain the point 0.35.
Two proportions
• Sometimes we want to compare the proportion of successes in two separate groups. For this purpose we take two samples of sizes n1 and n2. We let yi1 and pi1 = yi1/ni be the observed number and the proportion of successes in the ith group. The difference in population proportions of successes and its large-sample variance can be estimated by $d = p_{11} - p_{21}$ and $\widehat{\mathrm{Var}}(d) = p_{11}(1-p_{11})/n_1 + p_{21}(1-p_{21})/n_2$
Two proportions (continued)
• Assume we want to test the null hypothesis that there is no difference between the proportions of success in the two groups. Under the null hypothesis, we can estimate the common proportion by $p_0 = (y_{11} + y_{21})/(n_1 + n_2)$
• The large-sample variance of the difference is then estimated by $v(p_0) = p_0(1-p_0)(1/n_1 + 1/n_2)$
Example 4
NINDS trial in acute ischemic stroke
Treatment   n     Responders*
rt-PA       312   147 (47.1%)
placebo     312   122 (39.1%)
*early improvement defined on a neurological scale
Point estimate: 0.080 (s.e.=0.0397)
95% CI: (0.003 ; 0.158)
p-value: 0.043
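A sketch of the calculation behind Example 4. It assumes, as the reported numbers suggest, that the standard error and p-value use the pooled variance under the null hypothesis, while the confidence interval uses the unpooled variance; the variable names are illustrative.

```python
from math import erfc, sqrt

y1, n1 = 147, 312  # rt-PA: responders, patients
y2, n2 = 122, 312  # placebo: responders, patients
p1, p2 = y1 / n1, y2 / n2
diff = p1 - p2

# Pooled proportion and standard error under H0 (no difference)
p0 = (y1 + y2) / (n1 + n2)
se_pooled = sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
z = diff / se_pooled
p_value = erfc(abs(z) / sqrt(2))  # two-sided normal p-value

# Unpooled standard error for the 95% confidence interval
se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci = (diff - 1.96 * se_unpooled, diff + 1.96 * se_unpooled)

print(round(diff, 3), round(p_value, 3), [round(c, 3) for c in ci])
```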
Two proportions (Chi square)
• The problem of comparing two proportions can sometimes be formulated as a problem of independence. Assume we have two groups as above (treatment and placebo). Assume further that the subjects were randomized to these groups. We can then test for independence between belonging to a certain group and the clinical endpoint (success or failure). The data can be organized in the form of a contingency table in which the marginal totals and the total number of subjects are considered as fixed.
2 x 2 contingency table (rows: treatment; columns: response)
           Failure   Success   Total
Drug       Y10       Y11       Y1.
Placebo    Y20       Y21       Y2.
Total      Y.0       Y.1       N=Y..
2 x 2 contingency table (rows: treatment; columns: response)
           Failure   Success   Total
Drug       165       147       312
Placebo    190       122       312
Total      355       269       N=624
Hypergeometric distribution
Urn containing W white balls and R red balls: N = W + R
• n balls are drawn at random without replacement.
• Y is the number of white balls (successes) among them.
• Y follows the hypergeometric distribution with parameters (N, W, n).
Contingency tables
• N subjects in total
• y.1 of these are special (success)
• y1. are drawn at random
• Y11 is the number of successes among these y1.
• Y11 is HG(N, y.1, y1.); in general, $P(Y_{11} = k) = \binom{y_{.1}}{k}\binom{y_{.0}}{y_{1.}-k} \Big/ \binom{N}{y_{1.}}$
Contingency tables
• The null hypothesis of independence is tested using the chi-square statistic $\chi^2 = \sum_{i}\sum_{j} (O_{ij} - E_{ij})^2 / E_{ij}$, where $O_{ij}$ and $E_{ij}$ are the observed and expected cell frequencies
• which, under the null hypothesis, is approximately chi-square distributed with one degree of freedom, provided the sample sizes in the two groups are large (over 30) and the expected frequency in each cell is non-negligible (over 5)
Contingency tables
• For moderate sample sizes we use Fisher’s exact test, in which the desired probabilities are calculated using the exact hypergeometric distribution. The variance can then be calculated. To illustrate, consider $v_{11} = \mathrm{Var}(Y_{11}) = \dfrac{y_{1.}\, y_{2.}\, y_{.0}\, y_{.1}}{N^2 (N-1)}$
• Using this and the expectation $m_{11} = y_{1.}\, y_{.1} / N$ we have the randomization chi-square statistic $Q = (y_{11} - m_{11})^2 / v_{11}$. With fixed margins only one cell is allowed to vary. Randomization is crucial for this approach.
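Fisher’s exact test can be sketched directly from the hypergeometric probabilities. The example below applies it to the NINDS table (147 of 312 successes on drug, 269 successes among N = 624). The two-sided rule used here, summing all tables no more probable than the observed one, is one common convention and is an assumption about what the software does.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(Y = k) when n subjects are drawn without replacement from N, K of which are successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

N, successes, drug_n, observed = 624, 269, 312, 147
lo = max(0, drug_n - (N - successes))
hi = min(drug_n, successes)
pmf = {k: hypergeom_pmf(k, N, successes, drug_n) for k in range(lo, hi + 1)}

# One-sided (right) p-value: at least as many successes on drug as observed
right_tail = sum(p for k, p in pmf.items() if k >= observed)
# Two-sided p-value: all tables that are no more probable than the observed one
two_sided = sum(p for p in pmf.values() if p <= pmf[observed] * (1 + 1e-9))

print(round(right_tail, 3), round(two_sided, 3))
```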
The (Pearson) Chi-square test
3 x 5 contingency table
The chi-square test is used for testing the independence between the two factors.

                       Other factor
              A      B      C      D      E      Total
One     i     niA    niB    niC    niD    niE    ni
factor  ii    niiA   niiB   niiC   niiD   niiE   nii
        iii   niiiA  niiiB  niiiC  niiiD  niiiE  niii
Total         nA     nB     nC     nD     nE     n
The (Pearson) Chi-square test
The test statistic is:
$$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
where Oij = observed frequencies
and Eij = expected frequencies (under independence).
The test statistic approximately follows a chi-square distribution.
Example 5: Chi-square test for a 2 x 2 table
Examining the independence between two treatments and a classification into responder/non-responder is equivalent to comparing the proportion of responders in the two groups.

NINDS again

Observed frequencies
           non-resp   responder   Total
rt-PA      165        147         312
placebo    190        122         312
Total      355        269         624

Expected frequencies
           non-resp   responder   Total
rt-PA      177.5      134.5       312
placebo    177.5      134.5       312
Total      355        269         624
• p0 = (122+147)/624 = 0.43
• v(p0) = 0.00157
which gives a p-value of 0.043 in all these cases. This suggests that the drug is better than placebo at the 5% level. However, when using Fisher’s exact test, or the chi-square test with a continuity correction, the p-value is 0.052.
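The chi-square figures for the NINDS table can be recomputed from the observed and expected frequencies. This is a minimal sketch: the continuity-corrected version subtracts 0.5 from each |O - E| (Yates' correction), and the randomization version rescales the Pearson statistic by (N-1)/N.

```python
from math import erfc, sqrt

observed = [[165, 147], [190, 122]]  # rows: rt-PA, placebo; columns: non-resp, responder
row = [sum(r) for r in observed]
col = [sum(c) for c in zip(*observed)]
N = sum(row)
expected = [[r * c / N for c in col] for r in row]

def chi2_sf_1df(x: float) -> float:
    """Upper tail of the chi-square distribution with 1 df (same as the two-sided normal tail)."""
    return erfc(sqrt(x / 2))

cells = [(o, e) for obs_row, exp_row in zip(observed, expected) for o, e in zip(obs_row, exp_row)]
pearson = sum((o - e) ** 2 / e for o, e in cells)
corrected = sum((abs(o - e) - 0.5) ** 2 / e for o, e in cells)  # continuity correction
randomization = pearson * (N - 1) / N

for name, stat in [("Pearson", pearson), ("Continuity adj.", corrected), ("Randomization", randomization)]:
    print(f"{name}: {stat:.3f}, p = {chi2_sf_1df(stat):.3f}")
```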
SAS output

TABLE OF GRP BY Y

Frequency
Row Pct     nonresp    resp     Total
placebo     190        122      312
            60.90      39.10
rt-PA       165        147      312
            52.88      47.12
Total       355        269      624

STATISTICS FOR TABLE OF GRP BY Y

Statistic                      DF   Value   Prob
Chi-Square                     1    4.084   0.043
Likelihood Ratio Chi-Square    1    4.089   0.043
Continuity Adj. Chi-Square     1    3.764   0.052
Mantel-Haenszel Chi-Square     1    4.077   0.043
Fisher's Exact Test (Left)                  0.982
                    (Right)                 0.026
                    (2-Tail)                0.052
Phi Coefficient                     0.081
Contingency Coefficient             0.081
Cramer's V                          0.081

Sample Size = 624
Odds, Odds Ratios and relative Risks
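The odds of success in group i is estimated by $\hat{o}_i = p_{i1}/(1 - p_{i1}) = y_{i1}/y_{i0}$
The odds ratio of success between the two groups is estimated by $\widehat{OR} = \hat{o}_1/\hat{o}_2 = (y_{11}\, y_{20})/(y_{10}\, y_{21})$
Define the risk of success in the ith group as the proportion of cases with success. The relative risk between the two groups is estimated by $\widehat{RR} = p_{11}/p_{21}$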
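As an illustration, the odds, odds ratio and relative risk can be computed for the NINDS counts used earlier (rt-PA: 165 failures, 147 successes; placebo: 190 failures, 122 successes). The dictionary layout is only an illustrative choice.

```python
# (failures, successes) per group, taken from the NINDS 2 x 2 table
counts = {"rt-PA": (165, 147), "placebo": (190, 122)}

odds = {g: s / f for g, (f, s) in counts.items()}        # successes per failure
risk = {g: s / (f + s) for g, (f, s) in counts.items()}  # proportion of successes

odds_ratio = odds["rt-PA"] / odds["placebo"]
relative_risk = risk["rt-PA"] / risk["placebo"]

print(round(odds_ratio, 3), round(relative_risk, 3))
```

The odds ratio agrees with the value 1.387 reported by the logistic regression output for the same data.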
Categorical data
• Nominal
– E.g. patient residence at end of follow-up (hospital, nursing home, own home, etc.)
• Ordinal (ordered)
– E.g. some global rating
  • Normal, not at all ill
  • Borderline mentally ill
  • Mildly ill
  • Moderately ill
  • Markedly ill
  • Severely ill
  • Among the most extremely ill patients
Categorical data & Chi-square test

                       Other factor
              A      B      C      D      E      Total
One     i     niA    niB    niC    niD    niE    ni
factor  ii    niiA   niiB   niiC   niiD   niiE   nii
        iii   niiiA  niiiB  niiiC  niiiD  niiiE  niii
Total         nA     nB     nC     nD     nE     n
The chi-square test is useful for detection of a general association between treatment and categorical response (in either the nominal or ordinal scale), but it cannot identify a particular relationship, e.g. a location shift.
Nominal categorical data

                          Disease category
                    dip   snip   fup   bop   other   Total
Treatment   A       33    15     34    26    8       116
group       B       28    18     34    20    14      114
Total               61    33     68    46    22      230

Chi-square test: χ² = 3.084, df = 4, p = 0.544
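The test on this 2 x 5 table can be reproduced with a few lines of standard-library Python. The p-value uses the closed-form upper tail of the chi-square distribution that holds for an even number of degrees of freedom; the helper name is illustrative.

```python
from math import exp, factorial

table = [[33, 15, 34, 26, 8],    # treatment group A
         [28, 18, 34, 20, 14]]   # treatment group B
row = [sum(r) for r in table]
col = [sum(c) for c in zip(*table)]
N = sum(row)

chi2 = sum((table[i][j] - row[i] * col[j] / N) ** 2 / (row[i] * col[j] / N)
           for i in range(len(row)) for j in range(len(col)))
df = (len(row) - 1) * (len(col) - 1)

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper tail P(X > x) of a chi-square variable; closed form valid only for even df."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    return exp(-half) * sum(half**j / factorial(j) for j in range(df // 2))

p_value = chi2_sf_even_df(chi2, df)
print(round(chi2, 3), df, round(p_value, 3))
```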
Ordered categorical data
• Here we assume two groups, one receiving the drug and one placebo. The response is assumed to be ordered categorical with J categories.
• The null hypothesis is that the distribution of subjects in response categories is the same for both groups.
• Again the randomization and the hypergeometric (HG) distribution lead to the same chi-square test statistic, but this time with (J-1) df. Moreover, the same relationship exists between the two versions of the chi-square statistic.
The Mantel-Haenszel statistic
The aim here is to combine data from several (H) strata for comparing two groups, drug and placebo. The expected frequency $m_{h11}$ and the variance $v_{h11}$ for each stratum h are used to define the Mantel-Haenszel statistic
$$Q_{MH} = \frac{\left[\sum_{h=1}^{H} (y_{h11} - m_{h11})\right]^2}{\sum_{h=1}^{H} v_{h11}}$$
which is chi-square distributed with one df.
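A sketch of how the stratified statistic could be computed, assuming the usual form in which the differences between observed and expected successes on drug are summed over strata, squared, and divided by the summed hypergeometric variances. The function name and the table layout are illustrative. With the NINDS table as a single stratum, the result agrees with the Mantel-Haenszel chi-square reported by SAS.

```python
def mantel_haenszel(strata):
    """strata: list of 2 x 2 tables [[y10, y11], [y20, y21]]
    (rows: drug, placebo; columns: failure, success)."""
    diff_sum, var_sum = 0.0, 0.0
    for (y10, y11), (y20, y21) in strata:
        n1, n2 = y10 + y11, y20 + y21          # group sizes
        succ, fail = y11 + y21, y10 + y20      # column totals
        N = n1 + n2
        m11 = n1 * succ / N                    # expected successes on drug
        v11 = n1 * n2 * succ * fail / (N**2 * (N - 1))  # hypergeometric variance
        diff_sum += y11 - m11
        var_sum += v11
    return diff_sum**2 / var_sum

# Single stratum: the NINDS table
q_single = mantel_haenszel([[[165, 147], [190, 122]]])
print(round(q_single, 3))
```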
Logistic regression
• Consider again the Bernoulli situation, where Y is a binary r.v. (success or failure) with p being the success probability. Sometimes Y can depend on some other factors or covariates. Since Y is binary we cannot use ordinary regression.
Logistic regression
• Logistic regression is part of a category of statistical models called generalized linear models (GLM). This broad class of models includes ordinary regression and ANOVA, as well as multivariate statistics such as ANCOVA and loglinear regression. An excellent treatment of generalized linear models is presented in Agresti (1996).
• Logistic regression allows one to predict a discrete outcome, such as group membership, from a set of variables that may be continuous, discrete, dichotomous, or a mix of any of these. Generally, the dependent or response variable is dichotomous, such as presence/absence or success/failure.
Simple linear regression
Table 1 Age and systolic blood pressure (SBP) among 33 adult women

Age   SBP     Age   SBP     Age   SBP
22    131     41    139     52    128
23    128     41    171     54    105
24    116     46    137     56    145
27    106     47    111     57    141
28    114     48    115     58    153
29    123     49    133     59    157
30    117     49    128     63    155
32    122     50    183     67    176
33     99     51    130     71    172
35    121     51    133     77    178
40    147     51    144     81    217

[Figure: scatter plot of SBP (mm Hg) against age (years)]
adapted from Colton T. Statistics in Medicine. Boston: Little Brown, 1974
Simple linear regression
• Relation between 2 continuous variables (SBP and age)
• Regression coefficient β1
– Measures association between y and x
– Amount by which y changes on average when x changes by one unit
– Least squares method
$y = \alpha + \beta_1 x_1$ (slope: $\beta_1$)
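For illustration, the least squares line can be fitted to the age and SBP data of Table 1 with the textbook formulas for slope and intercept. This is a minimal sketch in plain Python; the variable names are illustrative.

```python
# Age and SBP for the 33 women in Table 1
ages = [22, 23, 24, 27, 28, 29, 30, 32, 33, 35, 40,
        41, 41, 46, 47, 48, 49, 49, 50, 51, 51, 51,
        52, 54, 56, 57, 58, 59, 63, 67, 71, 77, 81]
sbp = [131, 128, 116, 106, 114, 123, 117, 122, 99, 121, 147,
       139, 171, 137, 111, 115, 133, 128, 183, 130, 133, 144,
       128, 105, 145, 141, 153, 157, 155, 176, 172, 178, 217]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(sbp) / n

# Least squares estimates: slope = Sxy / Sxx, intercept = mean_y - slope * mean_x
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, sbp))
sxx = sum((x - mean_x) ** 2 for x in ages)
beta1 = sxy / sxx
alpha = mean_y - beta1 * mean_x

residuals = [y - (alpha + beta1 * x) for x, y in zip(ages, sbp)]
print(f"SBP = {alpha:.1f} + {beta1:.2f} * age")
```

The slope is the estimated average change in SBP (mm Hg) per additional year of age.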
Multiple linear regression
• Relation between a continuous variable and a set of i continuous variables
• Partial regression coefficients βi
– Amount by which y changes on average when xi changes by one unit and all the other xi remain constant
– Measures association between xi and y adjusted for all other xi
• Example
– SBP versus age, weight, height, etc.
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$
Multiple linear regression
y                     x1, ..., xi
Predicted             Predictor variables
Response variable     Explanatory variables
Outcome variable      Covariables
Dependent variable    Independent variables

$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$
Logistic regression
Table 2 Age and signs of coronary heart disease (CD)
How can we analyse these data?
• Compare mean age of diseased and non-diseased
– Non-diseased: 38.6 years
– Diseased: 58.7 years (p<0.0001)
• Linear regression?
Dot-plot: Data from Table 2
Logistic regression (2)
Table 3 Prevalence (%) of signs of CD according to age group
Dot-plot: Data from Table 3
[Figure: diseased (%) against age group]
Logistic function (1)
[Figure: logistic function; probability of disease (0.0 to 1.0) against x]
Transformation
$$\ln\left[\frac{P(y|x)}{1 - P(y|x)}\right] = \alpha + \beta x$$
The left-hand side is the logit of P(y|x).
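The logit transformation and its inverse (the logistic function) can be written in a couple of lines. A minimal sketch with illustrative function names:

```python
from math import exp, log

def logit(p: float) -> float:
    """Log-odds of a probability p, for 0 < p < 1."""
    return log(p / (1 - p))

def inv_logit(eta: float) -> float:
    """Logistic function: maps any real number back to a probability in (0, 1)."""
    return 1 / (1 + exp(-eta))

for p in (0.1, 0.5, 0.9):
    print(p, round(logit(p), 3), round(inv_logit(logit(p)), 3))
```

The logit is unbounded in both directions, which is what makes a linear predictor on that scale compatible with probabilities restricted to (0, 1).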
Fitting equation to the data
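• Linear regression: least squares or maximum likelihood
• Logistic regression: maximum likelihood
• Likelihood function
– Estimates parameters α and β
– Practically easier to work with log-likelihood
$$L(\beta) = \ln\left[l(\beta)\right] = \sum_{i=1}^{n} \left\{ y_i \ln\left[\pi(x_i)\right] + (1 - y_i) \ln\left[1 - \pi(x_i)\right] \right\}$$
where $l$ is the likelihood and $\pi(x_i) = P(y_i = 1 | x_i)$.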
Maximum likelihood
• Iterative computing (Newton-Raphson)
– Choice of an arbitrary value for the coefficients (usually 0)
– Computing of log-likelihood
– Variation of coefficients’ values
– Reiteration until maximisation (plateau)
• Results
– Maximum likelihood estimates (MLE) for α and β
– Estimates of P(y) for a given value of x
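The iteration can be sketched for the simplest case of one binary covariate, here the NINDS data with x = 1 for rt-PA and x = 0 for placebo. This is an illustrative hand-written Newton-Raphson loop, not the algorithm of any particular package; it solves the 2 x 2 system explicitly at each step.

```python
from math import exp

# (x, successes, total): x = 1 for rt-PA, x = 0 for placebo
groups = [(1, 147, 312), (0, 122, 312)]

alpha, beta = 0.0, 0.0  # arbitrary starting values
for _ in range(25):
    g_a = g_b = h_aa = h_ab = h_bb = 0.0
    for x, y, n in groups:
        pi = 1 / (1 + exp(-(alpha + beta * x)))
        g_a += y - n * pi            # score for alpha
        g_b += x * (y - n * pi)      # score for beta
        w = n * pi * (1 - pi)
        h_aa += w                    # information matrix entries
        h_ab += w * x
        h_bb += w * x * x
    det = h_aa * h_bb - h_ab**2
    step_a = (h_bb * g_a - h_ab * g_b) / det  # (information)^-1 * score
    step_b = (h_aa * g_b - h_ab * g_a) / det
    alpha += step_a
    beta += step_b
    if abs(step_a) + abs(step_b) < 1e-10:     # plateau reached
        break

print(round(alpha, 4), round(beta, 4))
```

The converged values equal log(122/190) and the log odds ratio, which is what one expects when the only covariate is the treatment group.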
Multiple logistic regression
• More than one independent variable
– Dichotomous, ordinal, nominal, continuous …
• Interpretation of βi
– Increase in log-odds for a one unit increase in xi with all the other xi constant
– Measures association between xi and log-odds adjusted for all other xi
$$\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$$
Statistical testing
• Question
– Does a model including a given independent variable provide more information about the dependent variable than a model without this variable?
• Three tests
– Likelihood ratio statistic (LRS)
– Wald test
– Score test
Likelihood ratio statistic
• Compares two nested models
Log(odds) = α + β1x1 + β2x2 + β3x3 (model 1)
Log(odds) = α + β1x1 + β2x2 (model 2)
• LR statistic
-2 log (likelihood model 2 / likelihood model 1) = -2 log (likelihood model 2) minus -2 log (likelihood model 1)
The LR statistic is a χ² with DF = number of extra parameters in the larger model (model 1)
Example 6
Fitting a Logistic regression model to the NINDS data, using only one covariate (treatment group).
NINDS again

Observed frequencies
           non-resp   responder   Total
rt-PA      165        147         312
placebo    190        122         312
Total      355        269

SAS output

The LOGISTIC Procedure

Response Profile
Ordered Value   Binary Outcome   Count
1               EVENT            269
2               NO EVENT         355

Model Fitting Information and Testing Global Null Hypothesis BETA=0
Criterion   Intercept Only   Intercept and Covariates   Chi-Square for Covariates
AIC         855.157          853.069                    .
SC          859.593          861.941                    .
-2 LOG L    853.157          849.069                    4.089 with 1 DF (p=0.0432)
Score       .                .                          4.084 with 1 DF (p=0.0433)

Analysis of Maximum Likelihood Estimates
Variable   DF   Parameter Estimate   Standard Error   Wald Chi-Square   Pr > Chi-Square   Standardized Estimate   Odds Ratio
INTERCPT   1    -0.4430              0.1160           14.5805           0.0001            .                       .
GRP        1    0.3275               0.1622           4.0743            0.0435            0.090350                1.387
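The -2 LOG L line of this output can be checked by hand. The sketch below relies on the fact that, with a single binary covariate, the fitted probabilities of the logistic model equal the observed group proportions, so no iterative fitting is needed. The function name is illustrative.

```python
from math import erfc, log, sqrt

def minus2_log_lik(groups):
    """-2 log-likelihood when each (successes, failures) group gets its own fitted proportion."""
    total = 0.0
    for s, f in groups:
        p = s / (s + f)
        total += s * log(p) + f * log(1 - p)
    return -2 * total

null_model = minus2_log_lik([(269, 355)])              # intercept only
full_model = minus2_log_lik([(147, 165), (122, 190)])  # intercept + treatment group
lr = null_model - full_model
p_value = erfc(sqrt(lr / 2))  # chi-square upper tail with 1 df

print(round(null_model, 3), round(full_model, 3), round(lr, 3), round(p_value, 4))
```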
Logistic regression example
• AZ trial (CLASS) in acute stroke comparing clomethiazole (n=678) with placebo (n=675)
• Response defined as a Barthel Index score ≥ 60 at 90 days
• Covariates:
– STRATUM (time to start of trmt: 0-6, 6-12)
– AGE
– SEVERITY (baseline SSS score)
– TRT (treatment group)
SAS output

Response Profile
Ordered Value   BI_60   Count
1               1       750
2               0       603

Analysis of Maximum Likelihood Estimates
Variable   DF   Parameter Estimate   Standard Error   Wald Chi-Square   Pr > Chi-Square   Standardized Estimate   Odds Ratio
INTERCPT   1    2.4244               0.5116           22.4603           0.0001
TRT        1    0.1299               0.1310           0.9838            0.3213            0.035826                1.139
STRATUM    1    0.1079               0.1323           0.6648            0.4149            0.029751                1.114
AGE        1    -0.0673              0.00671          100.6676          0.0001            -0.409641               0.935
SEVERITY   1    0.0942               0.00642          215.0990          0.0001            0.621293                1.099

Conditional Odds Ratios and 95% Confidence Intervals
                                  Wald Confidence Limits
Variable   Unit     Odds Ratio    Lower    Upper
TRT        1.0000   1.139         0.881    1.472
STRATUM    1.0000   1.114         0.859    1.444
AGE        1.0000   0.935         0.923    0.947
SEVERITY   1.0000   1.099         1.085    1.113
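The odds ratios and Wald confidence limits in this output follow from the parameter estimates and standard errors by exponentiation. A minimal sketch, using the estimates listed above and the usual 1.96 multiplier; the names are illustrative.

```python
from math import exp

# (parameter estimate, standard error) from the maximum likelihood table
estimates = {
    "TRT": (0.1299, 0.1310),
    "STRATUM": (0.1079, 0.1323),
    "AGE": (-0.0673, 0.00671),
    "SEVERITY": (0.0942, 0.00642),
}

def odds_ratio_ci(beta: float, se: float, z: float = 1.96):
    """Odds ratio per unit increase and its Wald 95% limits: exp(beta), exp(beta -/+ z * se)."""
    return exp(beta), exp(beta - z * se), exp(beta + z * se)

results = {name: odds_ratio_ci(b, se) for name, (b, se) in estimates.items()}
for name, (or_, lower, upper) in results.items():
    print(f"{name:9s} {or_:.3f} ({lower:.3f}, {upper:.3f})")
```

An interval that contains 1, as for TRT and STRATUM, corresponds to a Wald test that is not significant at the 5% level.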
A Bayesian alternative
• Prior knowledge is part of the Bayesian approach.
• Prior knowledge matters.
Case-Control
• Imagine a randomised clinical trial or a case-control study. The analysis uses a chi-square test and the corresponding p-value. If the p-value turns out to be less than 0.05 we declare significance.
Example 7:
Some studies from the year 1990 suggested that the risk of CHD is associated with childhood poverty. Since infection with the bacterium H. pylori is also linked to poverty, some researchers suspected H. pylori to be the missing link. In a case-control study in which levels of infection were compared between patients and controls, the following results were obtained.
Level of infection by case/control status
         Case (CHD)   Control    Total
High     60%          39%        n11+n12
Low      40%          61%        n21+n22
Total    n11+n21      n12+n22
$$P[H_0 | D] = \left[1 + \frac{1 - P[H_0]}{P[H_0]} \cdot \frac{1}{BF}\right]^{-1}$$
where
$$BF = \frac{P[D | H_0]}{P[D | H_1]}$$
The chi-square statistic having, in this case, the value 4.73 yields a p-value of 0.03, which is less than the formal level of significance 0.05.
There is, however, no theoretical reason to believe that this result is true. So we take P(H0) = 0.5. This leads to
$$P[H_0 | D] = \left[1 + \frac{1/2}{1/2} \cdot \frac{1}{BF}\right]^{-1} = \left[1 + \frac{1}{BF}\right]^{-1} = \frac{BF}{1 + BF}$$
Berger and Sellke (1987) have shown that, for a very wide range of cases including the case-control case,
$$BF \ge \sqrt{\chi^2}\; e^{-(\chi^2 - 1)/2}$$
Using the value 4.73 for the chi-square variable leads to a BF value of at least 0.337
(M. A. Mendall et al. Relation between H. Pylori infection and coronary heart disease. Heart J. (1994)).
Conclusion
$$P[H_0 | D] \ge \frac{0.337}{1 + 0.337} = 0.252$$
Taking another (more or less sceptical) attitude does not change the conclusion that much:
• P(H0) = 0.75 => P[H0 | D] > 0.5
• P(H0) = 0.25 => P[H0 | D] > 0.1
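These posterior bounds can be checked numerically. The sketch assumes the lower bound on the Bayes factor takes the form sqrt(χ²)·exp(-(χ² - 1)/2), which reproduces the value 0.337 for χ² = 4.73, and that the posterior probability of H0 is [1 + (1 - P(H0))/P(H0) · 1/BF]^(-1). The function names are illustrative.

```python
from math import exp, sqrt

def bf_lower_bound(chi2: float) -> float:
    """Assumed form of the lower bound on the Bayes factor in favour of H0 (requires chi2 > 1)."""
    return sqrt(chi2) * exp(-(chi2 - 1) / 2)

def posterior_h0(prior: float, bf: float) -> float:
    """Posterior probability of H0 given its prior probability and the Bayes factor."""
    return 1 / (1 + (1 - prior) / prior / bf)

bf = bf_lower_bound(4.73)
posteriors = {prior: posterior_h0(prior, bf) for prior in (0.25, 0.5, 0.75)}

print(round(bf, 3))
for prior, post in posteriors.items():
    print(f"P(H0) = {prior}: P[H0 | D] >= {post:.3f}")
```

Even with a p-value of 0.03, the null hypothesis retains a posterior probability of at least about 0.25 under an even prior, which is the point of the example.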
Questions or Comments?