
THE UNIVERSITY OF ADELAIDE

Biostatistics III

Lecture Notes

Associate Professor Patty Solomon

School of Mathematical Sciences

Semester 2, 2007

Contents

1 Introduction

1.1 What is epidemiology?

1.2 What are clinical trials?

1.3 Randomization

2 The design and analysis of clinical trials

2.1 Phases of trials

2.2 Key aspects of trial design

2.3 Methods of randomization

2.3.1 Simple (or complete) randomization

2.3.2 Restricted randomization

2.3.3 Biased coin designs (BCD)

2.3.4 Minimization

2.3.5 Stratification

2.3.6 Randomization tests

2.3.7 Randomized consent designs

2.4 Trial size

2.4.1 Introduction

2.4.2 Fixed trial size (non-sequential analysis)

2.4.3 Sequential Trials

2.5 Crossover trials

2.5.1 Introduction

2.5.2 Model formulation

2.5.3 Analysis

2.6 Equivalence trials

3 Epidemiology and observational studies

3.1 Introduction

3.2 Cohort Studies

3.3 Case-control Studies

3.4 Other designs

3.5 Binary responses and case-control studies

3.6 Estimation and inference for measures of association

3.6.1 Finding the approximate variance in a cohort study

3.7 Attributable risk

3.7.1 Estimation of AR

4 Inference for the 2x2 table

4.1 Introduction

4.2 Wald tests

4.3 Likelihood Ratio test

4.3.1 Profile Likelihood

4.3.2 Conditional Inference

5 Tests based on the likelihood

5.1 Wald test statistic

5.2 Likelihood ratio test statistic

5.3 Score test statistic


Biostatistics III Course Coverage

• Design and analysis of clinical trials

• Statistical epidemiology

1 Introduction

1.1 What is epidemiology?

• There is no standard definition, but broadly, it is the study of death and disease in human populations.

• Problem: epidemiology is not often experimental, and this leads to problems in statistical analysis and interpretation.

• → we can establish association, but not causation.

A common epidemiological question: is a particular disease or illness associated with age, sex, . . ., or lifestyle factors, life experiences, or environmental factors, . . .? For example:

• do mobile phones cause brain tumours?

• will human consumption of genetically modified crops lead to cancer later in life?

• which breast cancers are inherited (i.e., a case of nature versus nurture)?

An early example: Snow’s map of the London Cholera epidemic, 1854.

The greatest achievement of statistical epidemiology was establishing the link between smoking and lung cancer (before the biological link was observed).

Epidemiology encompasses:

• chronic disease epidemiology

• infectious disease epidemiology

• genetic epidemiology

• environmental epidemiology

• occupational epidemiology

• disease surveillance . . .

and so on.

© School of Mathematical Sciences, University of Adelaide

Examples of chronic diseases: asthma, heart disease, cancer

• Do radioactive particles cause childhood leukemia? New Scientist, 2004, 19/7

• Are there long term effects of eating GM crops? New Scientist, 2004, 26/7

• Does traffic pollution cause asthma?

• Will wearing ties make you go blind?

Examples of infectious diseases: measles, malaria, meningitis, SARS, influenza, HIV/AIDS

• Which MMR (measles, mumps, rubella) vaccination strategies are optimal?

• Are mosquito nets or insecticides more effective at preventing malaria?

• How great is the threat of bioterrorism (e.g., anthrax, smallpox)?

And diseases are global: if you catch a cold in Africa, the first sneeze may be back in Adelaide!

HIV/AIDS remains one of the biggest threats:

• globally, it is one of the top five causes of death

• the main burden falls on developing nations

• in Swaziland, Botswana:

- the infection rate is 40%

- life expectancy is 38 years

- 10% of households are headed by children



HIV/AIDS disease progression

HIV infection → seroconversion (antibodies detectable) → . . . → AIDS diagnosis → death

The incubation period is the interval from HIV infection to AIDS diagnosis.

Incubation period for AIDS:

• median ∼ 10 years, and increasing

• long and variable

• treatment effects (e.g., AZT, HAART)

[See Assignment 1 and AAO video # 26.]

1.2 What are clinical trials?

Clinical trials are designed medical experiments. They have a long history (see handout article from Encyclopedia of Biostatistics, 1998), although modern clinical trials date from the 20th century.

Although not without problems and controversies of their own, clinical trials avoid the difficulties associated with statistical epidemiology.

Examples:

• Would prescription heroin prevent long-term drug use? (People randomized to the methadone-only arm are likely to drop out.)

• Does tamoxifen prevent primary breast cancer in women?

The key step is randomization: the use of chance to allocate patients to treatments.

The idea is that patients differ only by the accidents of randomization, or by the treatment they receive.

Clinical trials enable us to establish causality.

The gold standard is a:

• randomized

• controlled

• double-blind (or single- or triple-blind)

clinical trial.

Example: Early AZT trial (AAO video #26).

Randomized: patients randomized to zidovudine or placebo.

Controlled: placebo group provided baseline for comparison.

Blind: neither patient nor doctor knew which treatment group; analyst also blinded (triple-blind).

What is the purpose of these features?

Note, though, that the ‘gold standard’ is not always attainable.

1.3 Randomization

The first randomized experiments were in agriculture, in which the experimental units were plots of land, and the treatments were crops or fertilizers.

The pioneering statistical work was by R.A. Fisher in the 1920s, in agricultural experiments.

An important difference: patient entry into clinical trials is ‘staggered’, often over many years, and the data usually accumulate gradually. This affects both the conduct and analysis of the trial. (If Fisher had worked in clinical trials, we may speculate that modern trial designs would have evolved 80 or more years ago!)

Illustration of randomization:

Suppose the effects of two treatments, A and B, on lowering blood pressure are to be compared; the response Y is continuous.

Suppose eight patients are available for the study. How should we allocate four patients to treatment A, and four to treatment B?

(1) Suppose the first four are given A, the next four, B

AAAABBBB

This is called the randomization list.

How could this allocation lead to confounding of treatment effects?



(2) Try alternating A and B:

ABABABAB

But this also runs the risk of confounding (and potential selection bias). How?

We need an objective method of allocating treatments to patients.

(3) It is best to use randomization, which means choosing an allocation at random, such that each of the $\binom{8}{4} = 70$ possible arrangements is equally likely.
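As an illustration (not part of the original notes), scheme (3) can be sketched in Python: enumerate every list with four A's and four B's, then pick one uniformly at random. The function name `all_allocations` and the seed are arbitrary choices for this sketch.

```python
import itertools
import random

def all_allocations(n_patients=8, n_a=4):
    """Every randomization list with exactly n_a A's among n_patients (hypothetical helper)."""
    allocations = []
    for a_positions in itertools.combinations(range(n_patients), n_a):
        chosen = set(a_positions)
        allocations.append("".join("A" if i in chosen else "B" for i in range(n_patients)))
    return allocations

allocs = all_allocations()
print(len(allocs))          # 70 = C(8, 4)
rng = random.Random(2007)   # arbitrary seed, used only for reproducibility
print(rng.choice(allocs))   # each of the 70 lists is chosen with probability 1/70
```

The systematic lists in (1) and (2) are among the 70 candidates: randomization does not forbid them, it only makes each of them no more likely than any other list.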

Randomization often enables us to obtain unbiased estimates of treatment differences even in the presence of unsuspected systematic variation.

To see how randomization works, consider the following.

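Our assumed model is

$$Y_{ij} = \alpha_i + \varepsilon_{ij},$$

where

• $i = A, B$ indicates treatment, and $j = 1, \ldots, 4$ indexes patients within treatment;

• $Y_{ij}$ is the response of patient $j$ receiving treatment $i$;

• $\alpha_i$ are the treatment effects;

• $\varepsilon_{ij}$ are measurement errors, i.i.d. with zero mean and $\operatorname{Var}(\varepsilon_{ij}) = \sigma^2$.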

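We want the treatment difference, so the ‘target quantity’ is $\alpha_A - \alpha_B$.

So the natural estimator is $\bar{Y}_{A\cdot} - \bar{Y}_{B\cdot}$, where

$$\bar{Y}_{A\cdot} = \frac{1}{4}\sum_{j=1}^{4} Y_{Aj}, \qquad \bar{Y}_{B\cdot} = \frac{1}{4}\sum_{j=1}^{4} Y_{Bj}.$$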

However, the true model is

$$Y_{ij} = \alpha_i + \gamma_{ij} + \varepsilon_{ij},$$

where $\gamma_{ij}$ is a ‘patient effect’ representing (unknown) systematic variation, e.g., disease state at randomization.

We can demonstrate that under randomization, these effects average out. To do this, we need to study the statistical properties of $\bar{Y}_{A\cdot} - \bar{Y}_{B\cdot}$ under the true model.

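Now,

$$\bar{Y}_{A\cdot} - \bar{Y}_{B\cdot} = (\alpha_A - \alpha_B) + (\bar{\gamma}_{A\cdot} - \bar{\gamma}_{B\cdot}) + (\bar{\varepsilon}_{A\cdot} - \bar{\varepsilon}_{B\cdot}).$$

Thus, for any given (i.e., fixed) treatment allocation, the only random variation is measurement error, so that

$$E(\bar{Y}_{A\cdot} - \bar{Y}_{B\cdot}) = \alpha_A - \alpha_B + \underbrace{(\bar{\gamma}_{A\cdot} - \bar{\gamma}_{B\cdot})}_{\text{nuisance component}},$$

since $E(\bar{\varepsilon}_{A\cdot}) = E(\bar{\varepsilon}_{B\cdot}) = 0$.

That is, for a given treatment allocation, $\bar{Y}_{A\cdot} - \bar{Y}_{B\cdot}$ is a biased estimator of the true treatment difference.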

We now take expectations of $E(\bar{Y}_{A\cdot} - \bar{Y}_{B\cdot})$ over the randomization distribution, which attaches probability $1/\binom{8}{4}$ to every possible treatment allocation (i.e., every possible sequence of four A's and four B's).

Intuitively, this implies that any four of the $\gamma_{ij}$ are equally likely to be in the same treatment group, i.e.,

$$E_R(\bar{\gamma}_{A\cdot}) = E_R(\bar{\gamma}_{B\cdot})$$

by symmetry, where $E_R$ denotes expectation with respect to the randomization distribution.

This implies that $E_R(\bar{\gamma}_{A\cdot} - \bar{\gamma}_{B\cdot}) = 0$, so we obtain

$$E_R\{E(\bar{Y}_{A\cdot} - \bar{Y}_{B\cdot} \mid R)\} = \alpha_A - \alpha_B,$$

and known or unknown patient effects average out under randomization.
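The averaging argument can be checked numerically. The sketch below assumes hypothetical patient effects $\gamma$ with a strong time trend (e.g., later entrants being sicker), sets the measurement errors to their expectation of zero, and averages the nuisance component $\bar{\gamma}_{A\cdot} - \bar{\gamma}_{B\cdot}$ over all 70 equally likely allocations. The helper name is hypothetical.

```python
import itertools
from statistics import mean

def nuisance_over_randomization(gamma, n_a):
    """All values of gamma_A. - gamma_B. over allocations with n_a patients on A (hypothetical helper)."""
    n = len(gamma)
    diffs = []
    for a_idx in itertools.combinations(range(n), n_a):
        a_set = set(a_idx)
        g_a = [gamma[i] for i in range(n) if i in a_set]
        g_b = [gamma[i] for i in range(n) if i not in a_set]
        diffs.append(mean(g_a) - mean(g_b))
    return mean(diffs), diffs

# Hypothetical patient effects with a linear trend in order of entry.
gamma = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
avg, diffs = nuisance_over_randomization(gamma, 4)
print(avg)                       # 0.0: the bias averages out under randomization
print(min(diffs), max(diffs))    # -4.0 4.0: single allocations can be badly biased
```

The extreme values come from AAAABBBB and BBBBAAAA, the kind of systematic list in (1): any one allocation can be confounded with the trend, but the average over the randomization distribution is zero.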

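Remark 1: We can show that the usual estimate of standard error is approximately unbiased too.

In the usual situation in which $\gamma_{ij} = 0$, we know that

$$\operatorname{Var}(\bar{Y}_{A\cdot} - \bar{Y}_{B\cdot}) = \sigma^2\left(\frac{1}{n_A} + \frac{1}{n_B}\right),$$

and we estimate $\sigma^2$ by

$$s_p^2 = \frac{(n_A - 1)s_A^2 + (n_B - 1)s_B^2}{n_A + n_B - 2}.$$

Here, $n_A = n_B = 4$, so

$$s_p^2 = \frac{1}{6}\left\{\sum_{j=1}^{4}(Y_{Aj} - \bar{Y}_{A\cdot})^2 + \sum_{j=1}^{4}(Y_{Bj} - \bar{Y}_{B\cdot})^2\right\},$$

and we can show (but won't) that $s_p^2(1/4 + 1/4)$ is an approximately unbiased estimator of $\operatorname{Var}(\bar{Y}_{A\cdot} - \bar{Y}_{B\cdot})$ under randomization.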

Remark 2: Randomization forms the basis of important classes of testing procedures known as randomization and permutation tests.

In summary: randomization

• protects against confounding variables and avoids bias (including selection bias)

• provides the basis for formal inference

• facilitates the use of blinding (or masking)

• facilitates the use of a control group.

See handout from the British Medical Journal, ‘Why randomize’, for reasons against non-random assignment, such as systematic allocation, or historical controls.

Example: Systematic allocation

e.g., odd birthday → A, even birthday → B

invites selection bias. The physician will be able to determine in advance whether a potential patient will receive a treatment or control, and may use this information to exclude them from the study.

Example: Historical controls:

e.g., current patients get the treatment; pre-trial patients are used as controls

Problems? We don't know that observed differences are due to treatments. There may be temporal trends, changes in the definition of disease, improved diagnostic procedures, and so on, which argue against this sort of assignment.

2 The design and analysis of clinical trials

Key references: See ‘Notes for students’ for a number of important textbooks on clinical trials; see especially Pocock's book, Piantadosi's book, and Armitage, Berry and Matthews; see also the Encyclopedia of Biostatistics overview article on clinical trials (handout).

2.1 Phases of trials

Within the pharmaceutical industry and more broadly, clinical trials are classified into four types, each of which has well-defined objectives, as follows:

Phase I: trials are exploratory and concerned with aspects of clinical pharmacology and toxicity. An objective is to find a suitable dose level that avoids unacceptable adverse side effects. Usually the study involves between 20 and 80 healthy volunteers, often pharmaceutical company employees or medical students.

Phase II: trials are pilot studies representing the initial clinical investigation. They are of moderate size, involving 100 to 300 diseased patients, and are concerned with evaluating efficacy and safety aspects of the drug or treatment.

∗Phase III: trials are the definitive, full-scale evaluation of the new treatment, in which effectiveness is verified and the presence of any long-term adverse effects is monitored. Patients are randomly assigned to the new treatment or the current standard (or placebo). These trials are typically large, often involving more than 1000 patients, and can last 3 to 5 years or longer, depending on recruitment rates and necessary follow-up time. In this phase, the statistical design and analysis come under the most attention and scrutiny. These trials usually represent the final stage of testing, which leads to the request to market the drug or treatment.

∗Phase IV: trials refer to further testing and monitoring of experience with the new treatment after it has been accepted and approved for general use. This is sometimes referred to as ‘post-marketing surveillance’, and these trials tend to be large, population-based studies.

Notes:

• The above categorization is not a strict one; the purpose of a trial can overlap the boundaries of these phases, especially II and III.

• The nomenclature was originally introduced for therapeutic trials, but is now used more widely, especially in disease prevention trials.

• There is an ongoing ethical debate about if and when to randomize, and for how long. The best basic principle: start randomizing early, although not necessarily at the beginning of drug testing, and continue to randomize for as long as legitimate uncertainty exists about the safety and efficacy of the therapy, and about the best treatment for the patient.

2.2 Key aspects of trial design

‘Design’ encompasses all the structural aspects of the trial. This is an extremely important aspect of clinical trials: design flaws, such as the trial being too small, or the failure to record a key variable, cannot be corrected at the analysis stage.

The main features are:

• The study population, which must be well-defined, with eligibility set out in the study protocol.

• The treatments to be evaluated, especially the choice of control group.

• The sample size, i.e., how many patients? Calculations usually use power arguments, and depend on the type of trial (fixed, sequential). There may be a stopping rule, or some other more flexible design; there may be interim analyses. (N.B.: prior sample size calculations are only a guide to an order of magnitude; there may well be logistical, financial or other constraints (a rare disease, for example) which need to be considered.)

• Method of randomization (i.e., method of treatment allocation), including methods of protecting the randomization code from being broken.

• Procedures for blinding and monitoring compliance.

• Type of trial. A trial can be one or more of the following. The simplest is the two-group

- parallel group trial, e.g., A or B

- crossover trial (2 treatments, 2 periods), the 2 × 2 design:

G1 : A → B

G2 : B → A

- factorial trial


- equivalence trial: these trials are designed to show that a new treatment's efficacy or safety is ‘the same as’ or ‘at least no worse than’ that of a standard treatment. This is different from the usual superiority trials, which seek to find improved or superior therapies.

- sequential trial, as opposed to a fixed size trial. Sequential or group-sequential trials enable stopping the trial when enough evidence has accrued.

- other trial types, e.g., cluster-randomized trials.

• outcome measures, which we use to assess treatment efficacy

- disease incidence

- death rate

- survival time (‘time to event’ data)

- alleviation of symptoms

- . . .

A trial will usually have

• a detailed study protocol covering all aspects of the study, especially procedures for individual patient eligibility, and

• an operations manual, which specifies how the study is to be conducted.

The Data Safety Monitoring Board (DSMB) plays a key role in monitoring all aspects of the design, conduct, analysis and interpretation of the trial. See “Data monitoring committees in clinical trials” by Ellenberg, Fleming and DeMets, Wiley, 2002.

2.3 Methods of randomization

The aim is to generate a randomization list, then to allocate patients to treatment according to the list.

The list can be generated in advance and kept in the trial coordination centre, for example, or the allocation can be generated dynamically as the patients enrol.

Historically, trialists used tables of random numbers. However, tables can be cumbersome for all but the simplest trials, and if staff or doctors have access to the tables, they may be able to find the sequence in use and predict the next assignment.


Nowadays, we typically use computer assignment by phone, fax, or encrypted code online. Ideally, the list should be verifiable.

In Australia, the largest and best-known trial centre is the NHMRC Clinical Trials Centre, Sydney University, www.ctc.usyd.edu.au

2.3.1 Simple (or complete) randomization

Illustration: we could assign patients to treatments A and B by tossing a coin

Heads → A

Tails → B

• this tends to be time consuming, impractical for large trials, and

• cannot be checked.

In practice, we use random-number generators, or tables for our purposes of practice and illustration. To perform simple randomization, we apply a suitable rule to the stream of random numbers.

Example: Table 5.2 from Pocock (handout)

• randomly choose a starting point in the table,

• obtain a ‘stream’ of random numbers by working across rows (or down columns) thereafter,

• apply a suitable rule to the stream of numbers

For example, for 2 treatments A, B (p = 1/2 for A or B):

0–4 → A

5–9 → B

Start at the top left hand corner of Table 5.2:

digits: 0 5 2 7 8 4 3 7 4 . . .

randomization list: A B A B B A A B A . . .

Keep going to 20 patients, to get 8 on A and 12 on B.


This is unbalanced, but it could be worse. For example, the probability of obtaining 4 on one treatment and 16 on the other, or worse, is ≈ 0.0118. [See Tutorial 2.]
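The probability quoted above can be checked exactly: under simple randomization with p = 1/2, the number of patients on A is Binomial(20, 1/2). This is a minimal sketch; the helper name is hypothetical.

```python
from math import comb

def prob_smaller_arm_at_most(n, k):
    """P(the smaller arm has <= k patients) under simple randomization with p = 1/2.

    Hypothetical helper; assumes k < n / 2, so the two tails (A small, B small) do not overlap.
    """
    one_tail = sum(comb(n, j) for j in range(k + 1)) / 2 ** n
    return 2 * one_tail

print(round(prob_smaller_arm_at_most(20, 4), 4))  # 0.0118 (a 4/16 split or worse)
print(round(prob_smaller_arm_at_most(20, 8), 3))  # 0.503  (an 8/12 split or worse)
```

So a split at least as uneven as the 8/12 list above happens about half the time with 20 patients, while 4/16 or worse happens about once in 85 trials.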

The idea is that the treatment assignment is unpredictable, and in the long run, the sizes of the groups are roughly comparable (i.e., balanced).

Example: assume 3 treatments A, B, C, to be allocated to patients with equal probability. A suitable assignment rule is

1–3 → A

4–6 → B

7–9 → C

0: ignore

Exercise: generate a randomization list for 15 patients.
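Both digit rules can be applied mechanically. In this sketch (function and variable names are hypothetical), a rule is a dictionary from digit to treatment, with `None` for digits that are ignored. The stream is the nine Table 5.2 digits quoted above; applying the three-treatment rule to the same digits is only an illustration, not the answer to the exercise. In practice a seeded pseudo-random generator replaces the printed table, as in the last lines.

```python
import random

def allocate_from_digits(digits, rule):
    """Apply a digit -> treatment rule to a stream of random digits (hypothetical helper).

    Digits mapped to None are skipped, e.g. '0: ignore' in the three-treatment rule.
    """
    allocations = []
    for d in digits:
        treatment = rule[d]
        if treatment is not None:
            allocations.append(treatment)
    return "".join(allocations)

# Two treatments, p = 1/2: 0-4 -> A, 5-9 -> B
two_arm_rule = {d: ("A" if d <= 4 else "B") for d in range(10)}
# Three treatments, p = 1/3: 1-3 -> A, 4-6 -> B, 7-9 -> C, 0 ignored
three_arm_rule = {0: None, **{d: "ABC"[(d - 1) // 3] for d in range(1, 10)}}

stream = [0, 5, 2, 7, 8, 4, 3, 7, 4]  # digits quoted from Table 5.2
print(allocate_from_digits(stream, two_arm_rule))    # ABABBAABA
print(allocate_from_digits(stream, three_arm_rule))  # BACCBACB

# Replacing the table by a seeded generator (seed chosen arbitrarily):
rng = random.Random(15)
digits = [rng.randrange(10) for _ in range(30)]
print(allocate_from_digits(digits, three_arm_rule)[:15])
```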

To summarize:

Want: treatment assignment to be unpredictable.

Want: overall balance. Simple randomization is OK for > 200 patients and two treatments, in that the chance of severe imbalance is negligible.

However, we also want reasonable balance

• at any time

• and within subgroups.

But simple randomization may not be adequate for these requirements. Although we will achieve balance in the long run with simple randomization, there may be occasional long sequences of one treatment (which induces an unwanted homogeneity amongst all patients recruited at that time).

2.3.2 Restricted randomization

Restricted randomization refers to schemes with enhanced balancing properties.

Random permuted blocks: (RPB)

• this is the easiest method

• guarantees equal numbers in each group after every block of r patients

• it is the most widely used method of restricted randomization.

[The Altman and Gore handout from the BMJ provides excellent background reading.]

The scheme:

• if there are t treatments, choose k = number of replicates of each treatment per block, and take the block size to be r = kt

• for each block of size r = kt, choose a random permutation of treatments in which each treatment is replicated k times

• concatenate the blocks to form the randomization list.

Example: 2 treatments A, B; choose k = 1.

In this case, there are two possible blocks: AB BA

Using random digits 0–9 from Table 5.2, we can construct the randomization list as follows:

0–4 → AB

5–9 → BA

Thus, the stream and the resulting randomization list are

digits: 0 5 2 7 8 4 . . .

randomization list: AB BA AB BA BA AB . . .

That is, after every second patient, there are equal numbers of patients on each treatment. This gives tight control over balance, but is predictable.
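The k = 1 block rule can be written the same way. This hypothetical helper reproduces the list above from the six quoted digits and checks the balance property: the running difference #A − #B returns to zero after every second patient.

```python
def blocks_from_digits(digits):
    """Random permuted blocks with t = 2, k = 1: digit 0-4 -> block AB, 5-9 -> block BA."""
    return "".join("AB" if d <= 4 else "BA" for d in digits)

allocation = blocks_from_digits([0, 5, 2, 7, 8, 4])
print(allocation)  # ABBAABBABAAB

# After every complete block (every second patient) the arms are level.
running = 0
for patient, arm in enumerate(allocation, start=1):
    running += 1 if arm == "A" else -1
    if patient % 2 == 0:
        assert running == 0
```

The same check exposes the predictability: once the first patient of a block is known, the second allocation is fixed.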

There is less predictability with k = 2; then r = 2 × 2 = 4, and there are $\binom{4}{2} = 6$ possible arrangements of block length four:

AABB BBAA ABAB BABA ABBA BAAB

Increasing k will reduce predictability, and reduce balance.


A good way to reduce predictability further, without compromising balance, is to randomly choose a different k for each block, i.e., randomly vary the block length.
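A sketch of random permuted blocks with a randomly varying block length, assuming Python's `random` module as the source of randomness. The function name and `k_choices` (the possible numbers of replicates of each treatment per block) are illustrative choices, not taken from the notes.

```python
import random

def permuted_blocks(n_patients, treatments=("A", "B"), k_choices=(1, 2), rng=None):
    """Random permuted blocks, with block length r = k * t chosen at random for each block.

    Hypothetical helper. Each block contains every treatment exactly k times in random
    order; the list is truncated to n_patients, so only the last block may be incomplete.
    """
    if rng is None:
        rng = random.Random()
    t = len(treatments)
    allocation, block_sizes = [], []
    while len(allocation) < n_patients:
        k = rng.choice(k_choices)      # random replicates per block -> random block length
        block = list(treatments) * k
        rng.shuffle(block)             # random permutation within the block
        allocation.extend(block)
        block_sizes.append(k * t)
    return allocation[:n_patients], block_sizes

alloc, sizes = permuted_blocks(20, rng=random.Random(42))
print("".join(alloc), sizes)
```

Because the block boundaries are themselves random, an observer who knows the scheme cannot reliably tell when a block is about to close, which is the point of varying k.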

Example: use Table 5.4 from Pocock to assign 3 treatments in blocks of 15 patients.

Assign digits

1–5 → A

6–10 → B

11–15 → C

0, 16–19: ignore

Block 1

digits: 11 19 15 5 9 0 6 13 7 2 . . .

allocation: C − C A B − B C B A . . .

Block 2

digits: 14 12 0 1 19 8 7 17 11 . . .

allocation: C C − A − B B − C . . .

and so on.

2.3.3 Biased coin designs (BCD)

• This is a method of dynamic allocation.

• Like random permuted blocks, the biased coin method controls balance in the randomization list, but is less predictable, and therefore less liable to selection bias.

• Like RPBs, BCDs avoid severe imbalance in small trials and within subgroups.

The idea is to compromise between a perfectly balanced experiment and the advantages of complete (i.e., simple) randomization.

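The process is as follows: assume we have two treatments,

T: active treatment

C: control

Suppose that $n$ patients have currently been allocated to treatment, with $T_n$ on T and $C_n$ on C.

Let $D_n = T_n - C_n$, and allocate the $(n+1)$th patient as follows:

$$\text{If } D_n < 0: \quad \begin{cases} T & \text{with probability } p, \\ C & \text{with probability } q = 1 - p, \end{cases} \quad \text{where } p > 1/2;$$

$$\text{If } D_n = 0: \quad \begin{cases} T & \text{with probability } 1/2, \\ C & \text{with probability } 1/2; \end{cases}$$

$$\text{If } D_n > 0: \quad \begin{cases} T & \text{with probability } q, \\ C & \text{with probability } p, \end{cases}$$

where $p + q = 1$.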

The assignment rule balances the number of T's and C's by observing which group has fewer patients so far; that group then has probability greater than 1/2 of being assigned.

Clearly, we must have p > 1/2; p is called the bias and we write BCD(p).

Typical values for p are:

• $p = 3/5$, which is adequate for large trials ($n > 100$);

• $p = 2/3$, which is useful for small trials;

• $p = 3/4$, which maintains strict control over balance, but is predictable.
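A simulation sketch of BCD(p) for two arms T and C, using Python's `random` module; the function name is hypothetical. Accepting p = 1/2 (which reduces to simple randomization) is a convenience for comparison only; the notes require p > 1/2 for a genuine biased coin.

```python
import random

def biased_coin_allocation(n_patients, p=2 / 3, rng=None):
    """Biased coin design BCD(p): the arm with fewer patients so far gets probability p.

    Hypothetical helper; a fair coin is used when the arms are level (D_n = 0).
    """
    if not 0.5 <= p <= 1:
        raise ValueError("the bias p must satisfy 1/2 <= p <= 1")
    if rng is None:
        rng = random.Random()
    allocation, d = [], 0  # d = D_n = T_n - C_n
    for _ in range(n_patients):
        if d == 0:
            prob_t = 0.5
        elif d < 0:        # T is behind, so favour T
            prob_t = p
        else:              # C is behind, so favour C
            prob_t = 1 - p
        arm = "T" if rng.random() < prob_t else "C"
        d += 1 if arm == "T" else -1
        allocation.append(arm)
    return allocation

# Compare the average final imbalance |D_20| for several biases (p = 0.5 is simple randomization).
rng = random.Random(2007)
for p in (0.5, 3 / 5, 2 / 3, 3 / 4):
    trials = [biased_coin_allocation(20, p, rng) for _ in range(5000)]
    mean_abs_d = sum(abs(a.count("T") - a.count("C")) for a in trials) / len(trials)
    print(f"p = {p:.2f}: mean |D_20| = {mean_abs_d:.2f}")
```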

Example: allocate 20 patients to T or C using a BCD(3/5) and Table 5.2.

Let $D_n = T_n - C_n$.

Scheme:

$D_n < 0$: 0–5 → T ($p = 0.6$); 6–9 → C ($p = 0.4$)

$D_n = 0$: 0–4 → T ($p = 0.5$); 5–9 → C ($p = 0.5$)

$D_n > 0$: 0–5 → C ($p = 0.6$); 6–9 → T ($p = 0.4$)

Using the 15th row of Table 5.2:

digits: 0 2 2 7 2 4 6 . . .

allocation: T C T T C C C . . .

Patients 1, 3 and 7 (marked ↑) are allocated when $D_n = 0$, i.e., with a fair coin.

[Finish this as an exercise.]
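The table-driven scheme can be coded directly. This sketch (hypothetical function name) reproduces the seven allocations shown above; finishing the exercise still needs the remaining digits of the 15th row of Table 5.2, which are not reproduced in these notes.

```python
def bcd_from_digits(digits):
    """BCD(3/5) driven by random digits, following the scheme above (hypothetical helper).

    D_n < 0: 0-5 -> T, 6-9 -> C;  D_n = 0: 0-4 -> T, 5-9 -> C;  D_n > 0: 0-5 -> C, 6-9 -> T.
    """
    d, allocation = 0, []
    for digit in digits:
        if d < 0:
            arm = "T" if digit <= 5 else "C"
        elif d == 0:
            arm = "T" if digit <= 4 else "C"
        else:
            arm = "C" if digit <= 5 else "T"
        d += 1 if arm == "T" else -1
        allocation.append(arm)
    return "".join(allocation)

print(bcd_from_digits([0, 2, 2, 7, 2, 4, 6]))  # TCTTCCC
```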

The balancing properties of BCD:

Let $X_n = |D_n|$, that is, the absolute difference between the number of T's and C's after $n$ allocations.

Then $X_n$ forms a Markov chain with states $\{0, 1, 2, \ldots\}$.

Beginning at $X_0 = 0$ with probability 1, the transition probabilities are

$$P(X_{n+1} = x - 1 \mid X_n = x) = p, \qquad x = 1, 2, \ldots$$

$$P(X_{n+1} = x + 1 \mid X_n = x) = q, \qquad x = 1, 2, \ldots$$

$$P(X_{n+1} = 1 \mid X_n = 0) = 1.$$

These probabilities describe the evolution of the chain, which is called a random walk with a reflecting barrier at the origin.

• Based on the stationary distribution, one can show that for $p = 3/5$, there is a 1/20 chance of the imbalance being $\geq 10$.

• For $p = 3/4$, the corresponding imbalance is 4.

• One can set pre-defined limits for imbalance.

• An extension: use simple randomization until the limit is exceeded, then introduce a biased coin allocation to correct the imbalance.

We often want to achieve balance within subgroups defined by important factors, such as age, sex, time in remission from leukaemia, . . ., and so on.

Two strategies for achieving such balance are:

• minimization, and

• stratification

2.3.4 Minimization

• Minimization is a dynamic method of restricted randomization, and an extension of BCDs.

• Aims to minimize the imbalance in the numbers of patients allocated to T and C over a factor or factors known to affect prognosis, including centre or hospital in a multi-centre trial.

• Minimization does not directly address imbalance within subgroups defined by combining several factors simultaneously to form strata: see stratification. Minimization addresses imbalance in a ‘marginal’ way.

It works as follows:

For a new patient, identify their levels of several important prognostic factors, for example, sex, age at diagnosis, clinical stage of disease at diagnosis, and so on. Call these categories.

For the $i$th category, observe that the new patient's level already has $T_i$ patients on T and $C_i$ patients on C.

Define a discrepancy score, $S_i$ (this is also called a balancing function). Typically, we use one of

$$S_{1i} = C_i - T_i \quad \text{(range)},$$

$$S_{2i} = \frac{C_i - T_i}{C_i + T_i + 1},$$

$$S_{3i} = \begin{cases} 1 & \text{if } C_i > T_i, \\ 0 & \text{if } C_i = T_i, \\ -1 & \text{if } C_i < T_i. \end{cases}$$

The total discrepancy score is then

$$S = \sum_i w_i S_{ji} \quad \text{or} \quad S = \sum_i S_{ji}, \qquad j = 1, 2, 3,$$

where $w_i$ is the weight attached to each factor or category; $w_i$ is large if the $i$th factor is important, and small otherwise.

Once the discrepancy score is determined, we allocate the new patient to whichever of T or C minimizes the overall imbalance, using a biased coin with high bias ($p = 3/4$).

[See handout by Gore on ‘Restricted randomization’ from the BMJ.]

Summary:

• Identify levels of important factors for the new patient, e.g., male, age 75, Caucasian, no history of lesions. Denote these by $i = 1, \ldots, n$.

• Choose the score: one of S1, S2, or S3. Let's choose S1.

• Calculate $S = \sum_{i=1}^{n} S_{1i}$; then allocate the patient to T or C to reduce the imbalance.

Example: Simple mastectomy plus radiotherapy (T) versus radical mastectomy (C). (See handout by Gore.)

$$S = \sum_{i=1}^{3}(T_i - C_i) = (8 - 7) + (12 - 13) + (13 - 16) = 1 + (-1) + (-3) = -3$$

⇒ choose T with high probability ($p = 3/4$), because allocation to T reduces $|S|$.


If you allocate the next patient to C, the imbalance gets worse:

$$S^* = (8 - 8) + (12 - 14) + (13 - 17) = -6.$$

On the other hand, if we allocate to T,

$$S^* = (9 - 7) + (13 - 13) + (14 - 16) = 0.$$

In this example, balance is characterized by the range of treatment totals, and the next allocation is selected by minimizing the sum of the ranges across the factors/categories.
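A sketch of the two-treatment minimization step in the mastectomy example: each candidate arm is scored by the signed total $S = \sum_i (T_i - C_i)$ after the hypothetical allocation, and a biased coin with p = 3/4 favours the arm with smaller $|S|$. Function names are hypothetical, and breaking ties with a fair coin is an assumption of this sketch.

```python
import random

def minimization_scores(counts):
    """Signed total imbalance S = sum_i (T_i - C_i) if the next patient goes to T or to C.

    counts: list of (T_i, C_i) pairs, one per factor level that the new patient has.
    """
    s_if_t = sum((t + 1) - c for t, c in counts)
    s_if_c = sum(t - (c + 1) for t, c in counts)
    return {"T": s_if_t, "C": s_if_c}

def minimize_allocate(counts, p=0.75, rng=None):
    """Send the patient to the arm with the smaller |S| with probability p (biased coin)."""
    if rng is None:
        rng = random.Random()
    scores = minimization_scores(counts)
    if abs(scores["T"]) == abs(scores["C"]):
        return rng.choice("TC")  # tie: fair coin (assumption of this sketch)
    preferred = "T" if abs(scores["T"]) < abs(scores["C"]) else "C"
    other = "C" if preferred == "T" else "T"
    return preferred if rng.random() < p else other

# (T_i, C_i) at the new patient's levels of the three factors in the example
counts = [(8, 7), (12, 13), (13, 16)]
print(minimization_scores(counts))  # {'T': 0, 'C': -6}
print(minimize_allocate(counts, rng=random.Random(1)))
```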

Minimization can be generalized to > 2 treatments.

Minimization: a more general scheme

Reference: Pocock and Simon, (1975) Biometrics.

For $r$ treatments, $k = 1, \ldots, r$; $f$ factors, $i = 1, \ldots, f$; and $l_i$ levels of factor $i$, $j = 1, \ldots, l_i$:

• $t_{ijk}$ = number on treatment $k$ in the $j$th level of the $i$th factor;

• $t^*_{ijk}(k)$ = number on treatment $k$ (etc.) if treatment $k$ is allocated to the next patient;

• $F_{ij}(t^*_{ijk}(k))$ is the balancing function, e.g., the range or variance.

Then the overall balancing function is a weighted sum of the balancing functions of the individual factors:

$$B_k = \sum_i \sum_j w_i F_{ij}(t^*_{ijk}(k)),$$

where again, the weights are assigned if necessary on the basis of the relative importance of the prognostic factors. Moreover, the biased coin probabilities are determined by the $B_k$.

For example, rank the $r$ treatment assignments from least imbalance, $k = 1$, to most imbalance, $k = r$. Then take

$$p_1 > \frac{1}{r}, \qquad p_k = \frac{1 - p_1}{r - 1}, \quad k = 2, \ldots, r.$$

That is, the degree of randomization is inversely related to $p_1$. Note that the design is fully randomized if $p_1 = 1/r$.

Remarks: Note that only the new patient's own level of each factor is affected by the choice of $k$. We can show that the values of $B_k$ are especially easy to update and compute if the variance is used as the balancing function.
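A sketch of the general Pocock-Simon scheme, using the range as $F_{ij}$, weights $w_i$, and the probabilities $p_1$ and $(1 - p_1)/(r - 1)$ above. As the remark notes, only the counts at the new patient's own level of each factor are needed. The function name, argument layout, and tie-breaking (the lower treatment index ranks first) are assumptions of this sketch.

```python
import random

def pocock_simon_probabilities(level_counts, weights, p1):
    """Weighted range imbalance B_k for each treatment k, and allocation probabilities.

    Hypothetical helper. level_counts[i] lists the counts t_ijk (k = 0..r-1) at the new
    patient's level j of factor i; weights[i] is w_i. p1 = 1/r gives full randomization.
    """
    r = len(level_counts[0])
    if r < 2 or not (1 / r <= p1 <= 1):
        raise ValueError("need r >= 2 treatments and 1/r <= p1 <= 1")
    B = []
    for k in range(r):
        total = 0.0
        for counts, w in zip(level_counts, weights):
            updated = list(counts)
            updated[k] += 1                              # t*_ijk(k): counts if k is allocated
            total += w * (max(updated) - min(updated))   # range as balancing function
        B.append(total)
    order = sorted(range(r), key=lambda k: B[k])         # least imbalance first; stable on ties
    probs = [0.0] * r
    probs[order[0]] = p1
    for k in order[1:]:
        probs[k] = (1 - p1) / (r - 1)
    return B, probs

# Mastectomy example: treatments (T, C), unit weights
B, probs = pocock_simon_probabilities([[8, 7], [12, 13], [13, 16]], [1, 1, 1], p1=0.75)
print(B, probs)  # [4.0, 6.0] [0.75, 0.25]
arm = random.Random(1).choices(["T", "C"], weights=probs)[0]
```

With the range, allocating to T gives B = 4 and to C gives B = 6, so T is again preferred, in agreement with the signed-sum calculation in the two-treatment example.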

2.3.5 Stratification

• Stratification enables balance within strata defined by simultaneous combinations of factors.

• In essence, we generate a separate randomization list using RPB or BCD for each stratum.

• Random permuted blocks within strata is probably the most widely used method of randomization in clinical trials.

It is important to avoid too many strata ⇒ only use important factors. For example, a trial with 5 factors each with 3 levels gives $3^5 = 243$ distinct strata.

⇒ blocking will be rendered ineffective unless the trial is very large, because many strata will contain only a few patients and end part-way through a block. Minimization can be used to avoid this problem.
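A sketch of random permuted blocks within strata: one independent list per stratum, each built from shuffled blocks of size 4. The two factors, their levels, and the 12 patients per stratum are hypothetical, and the function name is illustrative.

```python
import itertools
import random

def stratified_blocks(strata, patients_per_stratum, block_size=4, rng=None):
    """Random permuted blocks within strata: an independent two-arm list per stratum."""
    if block_size % 2:
        raise ValueError("block_size must be even for two arms")
    if rng is None:
        rng = random.Random()
    lists = {}
    for stratum in strata:
        allocation = []
        while len(allocation) < patients_per_stratum:
            block = ["A", "B"] * (block_size // 2)
            rng.shuffle(block)
            allocation.extend(block)
        lists[stratum] = allocation[:patients_per_stratum]
    return lists

# Hypothetical factors: sex (2 levels) x age group (2 levels) -> 4 strata
strata = list(itertools.product(["M", "F"], ["<60", ">=60"]))
lists = stratified_blocks(strata, patients_per_stratum=12, rng=random.Random(7))
for stratum, allocation in lists.items():
    print(stratum, "".join(allocation))

print(3 ** 5)  # 243 strata for 5 factors with 3 levels each
```

Here each stratum's count is a multiple of the block size, so every list ends balanced; with 243 strata and a modest trial, most strata would stop part-way through their first block.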

Remarks: In larger trials, we usually ignore blocking and stratification in the primary analysis. The reason is that the extra complexity is not worth the (small) gain in expected power.

However, this is a topic of debate: stratification makes treatment groups more alike, which in turn implies that the treatment estimate is more precise; but if the analysis ignores the stratification, the variance estimate is biased (positively), and this implies that tests are conservative.

[See the Encyclopedia of Biostatistics for further discussion.]

2.3.6 Randomization tests

Randomization provides the basis for classes of non-parametric (or distribution-free) tests called randomization tests, which are finding increasing popularity in clinical trials applications.

Suppose there are $2m$ patients to be randomized to two equal-sized groups: $m$ randomized to A and $m$ randomized to B.

How many possible permutations are there? $(2m)!$

How many possible distinct permutations?

$$\binom{2m}{m} = \frac{(2m)!}{(m!)^2}$$

We use randomization to choose one of the $(2m)!$ possible designs, all equally likely.

Consider the null hypothesis of no treatment difference,

$$H_0: \mu_A = \mu_B,$$

i.e., the expected response for any individual is the same, whichever treatment they receive.

We view $H_0$ in reference only to the $2m$ patients randomized (this is known as a deterministic hypothesis). It follows that for any of the possible designs, we can find exactly the observations that would have been obtained, simply by permuting the data.

Thus, for any test statistic, we can find the exact null hypothesis distribution. Here, we take all permutations of the $2m$ observations and treat them as equally likely.

These are known as resampling-based methods.

Example: The two-sample permutation t-test.

Suppose we observe responses

$$x = (x_1, \ldots, x_{2m}).$$

Under the null hypothesis and the randomization distribution, all permutations of $x$ are equally likely.

The optimal normal-theory test statistic for comparing two groups (assuming continuous responses) is the two-sample t-statistic:

$$T = \frac{\bar{X}_{A\cdot} - \bar{X}_{B\cdot}}{s_p\sqrt{2/m}},$$

where $s_p$ is the (sample) pooled standard deviation. Recall: we obtain this form of the t-statistic by assuming equal group variances,

$$\sigma_A^2 = \sigma_B^2 = \sigma^2,$$

so that the standard error of the target difference $\bar{X}_{A\cdot} - \bar{X}_{B\cdot}$ is

$$\sqrt{\sigma^2\left(\frac{1}{m} + \frac{1}{m}\right)} = \sigma\sqrt{\frac{2}{m}}.$$

We estimate $\sigma^2$ by the pooled sample variance, $s_p^2$.

Now suppose the observed value of $T$ for the data $x$ is

$$t_{obs} = \frac{\bar{x}_{A\cdot} - \bar{x}_{B\cdot}}{s_p\sqrt{2/m}}.$$

Permutation test procedure:

• For each permutation of the data x, take the first m values in the permutation to be ‘group A’ and the second set of m values to be ‘group B’.

• Calculate the test statistic (here, the two-sample t-statistic) for each such split.

• One of the values of t obtained in this way corresponds to the observed data, $t_{obs}$.

• Each split into groups arises from $(m!)^2$ of the $(2m)!$ permutations, so we know

$$P(T = t_{obs}) = \frac{(m!)^2}{(2m)!} = \binom{2m}{m}^{-1},$$

provided the values of T for the $\binom{2m}{m}$ distinct splits are all distinct.

For simplicity, we will consider testing the one-sided alternative hypothesis

HA : µA > µB

i.e., large values of t are evidence against H0.

We therefore calculate the permutation P-value corresponding to $t_{obs}$:

$$P(T \ge t_{obs}) = \frac{k(x)}{(2m)!},$$

where $k(x)$ is the number of permutations of $\{1, \ldots, 2m\}$ giving values of the test statistic $\ge$ the observed value $t_{obs}$.

For a level α test, reject H0 if $t_{obs}$ is among the largest 100α% of the permutation values.

Recall: α = P (reject H0|H0 true)

= Type I error probability

Notes:

(1) We need only consider the $\binom{2m}{m}$ possible distinct permutations of the data (i.e., distinct permutations are equivalence classes with the same value of the test statistic; $k(x)$ is then counted over these classes, and the denominator $(2m)!$ is replaced by $\binom{2m}{m}$).

(2) If m is large, the permutation procedure will involve substantial computation, and we can instead take a random sample of permutations. This leads to an approximate permutation distribution.
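The procedure and both notes can be sketched in Python as follows. This is a minimal illustration, not the notes' own code: the function names and the small data set are hypothetical, and the one-sided alternative $H_A: \mu_A > \mu_B$ is assumed. With `n_random=None` it enumerates all $\binom{2m}{m}$ distinct splits, as in note (1); otherwise it draws random splits, as in note (2).

```python
import itertools
import math
import random
import statistics


def two_sample_t(a, b):
    """Two-sample t-statistic with pooled standard deviation s_p."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def permutation_p_value(x_a, x_b, n_random=None, seed=0):
    """One-sided permutation P-value for H_A: mu_A > mu_B.

    n_random=None enumerates all C(2m, m) distinct splits (exact);
    otherwise n_random splits are drawn at random (approximate).
    """
    pooled = list(x_a) + list(x_b)
    n, m = len(pooled), len(x_a)
    t_obs = two_sample_t(x_a, x_b)
    if n_random is None:
        splits = itertools.combinations(range(n), m)
        total = math.comb(n, m)
    else:
        rng = random.Random(seed)
        splits = (rng.sample(range(n), m) for _ in range(n_random))
        total = n_random
    k = 0  # number of splits giving t >= t_obs
    for idx in splits:
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(n) if i not in chosen]
        if two_sample_t(group_a, group_b) >= t_obs - 1e-12:  # tolerance for float ties
            k += 1
    return k / total


x_a = [12.1, 14.3, 13.8, 15.0]   # hypothetical responses on A
x_b = [10.2, 11.5, 12.0, 9.8]    # hypothetical responses on B
print(permutation_p_value(x_a, x_b))          # exact: 1/70 for these data
print(permutation_p_value(x_a, x_b, 5000))    # Monte Carlo approximation
```

In these toy data every A value exceeds every B value, so the observed split is the most extreme of the $\binom{8}{4} = 70$ splits and the exact P-value is 1/70.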

Remark: Biased coin designs affect the validity of standard permutation tests, because the allocation sequences are not equiprobable. One can, however, simulate the correct null hypothesis distribution.

2.3.7 Randomized consent designs

• See the Altman and Gore Handout from the BMJ.

• Due to Zelen (1979).

• Different rationale from other trial designs.

• The results are analysed by ‘intention to treat’, i.e., we compare the original randomized groups:

group 1: mean µ
group 2: mean µ + ∆/2 (e.g., if only half of group 2 accept the new treatment)

The comparison is weaker than that of µ with µ + ∆.



Figure 1: Standard design and Randomized consent design

• Design only works if a high proportion of Group 2 consent to the new treatment → otherwise we obtain an inefficient estimator of ∆.

• Used in criminology.

• Randomized consent designs have been used in medical trials, but heavily criticised.

• Unethical?

2.4 Trial size

Key references: Armitage, Berry & Matthews, §6.6.

2.4.1 Introduction

A key issue in the design of any experiment is the number of observations to be made.

The two extremes of design are:

Fixed trial size: the number of patients is specified at the outset (e.g., the AZT trial).
– most common

Fully sequential trial (adaptive design): enrollment and observation continue until a stopping boundary is crossed.
– not widely used
– often impractical
– stopping rule can be complex
– trial continues if no difference (known as futility trials)

In between these two extremes are group sequential methods, for which

• interim analyses are performed on the accumulating data, with accurate control of Type I error;

• these designs are popular, flexible, and relatively easy to conduct;

• and they have been developed in recognition of the fact that large multi-centre trials are usually subject to regular analyses by the Data Safety Monitoring Board.

In determining the appropriate trial type and size, we usually have to strike a balance between

• the cost per patient, and

• the increase in precision for each additional patient.

Ultimately, this usually comes down to a matter of judgement.

Nevertheless, it is highly desirable to make an advance, approximate calculation of the likely precision to be achieved in the trial. This protects against wasting resources on achieving unnecessary precision, or, more commonly, against undertaking a trial of low power/precision, from which useful conclusions are unlikely.

2.4.2 Fixed trial size (non-sequential analysis)

The basic principle is to decide on the precision to be aimed at in detecting a treatment difference (or contrast), and then to relate the sample size to this.

Consider quantitative (continuous) outcomes, for two independent groups.

Let the responses on A and B be

group A: $X_{A1}, X_{A2}, \ldots, X_{An}$ i.i.d. $N(\mu_A, \sigma_A^2)$,

group B: $X_{B1}, X_{B2}, \ldots, X_{Bn}$ i.i.d. $N(\mu_B, \sigma_B^2)$,

and all observations independent.

Consider the problem of testing the null hypothesis of no treatment difference:

H0 : µA = µB

against the two-sided alternative

HA : µA ≠ µB,

or, the problem of estimating the true treatment difference

δ = µA − µB

The treatment difference δ = µA − µB is called the target quantity.

The natural estimator for δ is $\bar{X}_{A\cdot} - \bar{X}_{B\cdot}$, which has (true) standard error

$$\sqrt{\frac{\sigma_A^2}{n} + \frac{\sigma_B^2}{n}}.$$

When $\sigma_A^2 = \sigma_B^2 = \sigma^2$, the standard error is

$$\sigma\sqrt{\frac{2}{n}}.$$

We consider two approaches to determining n, the number of patients required in each group:

(i) Fix the standard error, and solve for n. We specify in advance that the standard error of the difference must not exceed ε, i.e., s.e. ≤ ε. Then

$$\sigma\sqrt{\frac{2}{n}} \le \varepsilon \implies n \ge \frac{2\sigma^2}{\varepsilon^2}.$$

Clearly, as n increases, the s.e. decreases and the precision increases. Note that specifying ε determines the shape (i.e., the spread) of the distribution of the observed difference, $D = \bar{X}_{A\cdot} - \bar{X}_{B\cdot}$. See Figure 2.

Figure 2: Different shapes for the distribution of D

(ii) Power calculations: these involve finding n for a given power against a specified alternative hypothesis. This is the most common approach to estimating trial size.

We specify δ = δ1, which is the smallest difference of clinical importance that we would not want to overlook.

Consider now the problem of testing the null hypothesis of no treatment difference:

$$H_0: \delta = 0 \quad \text{vs.} \quad H_A: \delta = \delta_1,$$

where δ = µA − µB.

If σ² is known, use the z-test:

$$Z = \frac{(\bar{X}_{A\cdot} - \bar{X}_{B\cdot}) - 0}{\sigma\sqrt{2/n}}.$$

Under H0, Z ∼ N(0, 1).

To test H0 at the 2α level of significance, we use the rule: reject H0 if |Z| ≥ zα, where zα is the upper α point of N(0, 1) (see Figure 3).

Figure 3: Standard normal distribution

Recall,

2α = Type I error probability

= PH0(reject H0|H0 true).

Consider the alternative hypothesis HA : δ = δ1, then

β = Type II error probability

= PHA(retain H0|HA true).

The power of a test against a specified alternative hypothesis is

1− β = P (reject H0|HA true).

We want procedures to find n to achieve specified Type I and Type II error probabilities.

Figure 4: Sample size power calculation

Now, for given σ², α and δ1, β is a function of n.

Thus, for trial sample size calculations we specify β, then solve for n; Figure 4 gives an outline.

C1 is what the critical value must be to achieve the desired significance level, 2α; C2 is the critical value needed to achieve the desired power, 1 − β. We cannot both reject and retain H0 at the same time, so the sample size must be such that C1 ≤ C2.

Consider the difference $D = \bar{X}_{A\cdot} - \bar{X}_{B\cdot}$.

Under H0, $D \sim N(0, 2\sigma^2/n)$, and

$$C_1 = z_\alpha\,\sigma\sqrt{\frac{2}{n}}.$$

We ignore the probability of $Z < -z_\alpha$ as being very small when $\mu_A - \mu_B = \delta_1$.

Now, under HA, $D \sim N(\delta_1, 2\sigma^2/n)$, so

$$C_2 = \delta_1 - z_\beta\,\sigma\sqrt{\frac{2}{n}}.$$

Note that the power is one-sided: in practice, it is not desirable to reject H0 in favour of δ1 < 0 when actually δ1 > 0, as this would imply a recommendation of the inferior of the two treatments.

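Thus for C1 ≤ C2, we must have

$$\begin{aligned}
z_\alpha\sigma\sqrt{\frac{2}{n}} &\le \delta_1 - z_\beta\sigma\sqrt{\frac{2}{n}} \\
\Rightarrow\quad \delta_1 &\ge \sigma\sqrt{\frac{2}{n}}\,(z_\alpha + z_\beta) \\
\Rightarrow\quad \delta_1^2 &\ge \frac{2\sigma^2}{n}(z_\alpha + z_\beta)^2 \\
\Rightarrow\quad n &\ge \frac{2\sigma^2}{\delta_1^2}(z_\alpha + z_\beta)^2,
\end{aligned}$$

where n is the minimum number of patients required in each group. Increasing n will better separate the two distributions, or equivalently, increase the power.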

What considerations will push the sample size up?

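A more formal argument: consider the standardized test statistic

$$Z = \frac{\bar{X}_{A\cdot} - \bar{X}_{B\cdot}}{\sigma\sqrt{2/n}}.$$

We can write this as

$$Z = \frac{\sum_{i=1}^{n} X_{Ai} - \sum_{i=1}^{n} X_{Bi}}{\sqrt{2n\sigma^2}} \sim N\!\left(\frac{\mu_A - \mu_B}{\sqrt{2\sigma^2/n}},\; 1\right).$$

Under H0 (µA = µB): $Z \sim N(0, 1)$.

Under HA: $Z \sim N\!\left(\pm\delta_1\sqrt{\frac{n}{2\sigma^2}},\; 1\right)$.

Consider the positive alternative,

µA − µB = δ1,

and again ignore the very small probability that Z < −zα.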

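Thus

$$\beta = P_{H_A}(Z \le z_\alpha) = P_{H_A}\!\left(Z - \delta_1\sqrt{\frac{n}{2\sigma^2}} \le z_\alpha - \delta_1\sqrt{\frac{n}{2\sigma^2}}\right),$$

and since $Z - \delta_1\sqrt{n/(2\sigma^2)} \sim N(0, 1)$ under HA,

$$z_\alpha - \delta_1\sqrt{\frac{n}{2\sigma^2}} = -z_\beta.$$

Rearranging gives

$$n = (z_\alpha + z_\beta)^2\,\frac{2\sigma^2}{\delta_1^2}.$$

To achieve power ≥ 1 − β (i.e., better separation) we must have

$$z_\alpha - \delta_1\sqrt{\frac{n}{2\sigma^2}} \le -z_\beta \implies n \ge (z_\alpha + z_\beta)^2\,\frac{2\sigma^2}{\delta_1^2},$$

where n is the minimum number of patients required in each group.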

• We want a high value of power, which can only be controlled at the design stage. (Note that we can control α during analysis by choice of significance level.)

• If σ is estimated, we can use the t-distribution.

Example: We propose to study lung function in two groups of men. The response is FEV, forced expiratory volume (in litres).

From previous studies, we know that

σ = 0.5 (litres).

The minimum ‘clinically significant’ difference is δ1 = 0.25 (litres).

Use a two-sided test, 2α = 0.05, and power 80% (this is a typical power assumption).

The question: how many men are required in each group?

α = 0.025, zα = 1.96

β = 0.2, zβ = 0.842


So

$$n \ge 2\left\{\frac{(1.96 + 0.842) \times 0.5}{0.25}\right\}^2 = 62.8,$$

i.e., the minimum trial size that achieves 80% power is 63 men in each group.

What if we want 95% power?

Check as exercise: would need at least 104 men in each group.
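The calculation and the exercise can be reproduced with the Python sketch below. It is an illustration under the same normal-theory assumptions; the function names are hypothetical, and it uses exact normal quantiles from `statistics.NormalDist` rather than the rounded values 1.96 and 0.842, which gives the same rounded-up answers. The power function also ignores the small probability that Z < −zα.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)


def n_per_group(sigma, delta1, two_alpha=0.05, power=0.80):
    """Minimum n per group from n >= 2 sigma^2 (z_alpha + z_beta)^2 / delta1^2."""
    z_alpha = STD_NORMAL.inv_cdf(1 - two_alpha / 2)  # upper alpha point, alpha = two_alpha / 2
    z_beta = STD_NORMAL.inv_cdf(power)               # upper beta point, beta = 1 - power
    n = 2 * sigma**2 * (z_alpha + z_beta) ** 2 / delta1**2
    return math.ceil(n)


def power_for_n(n, sigma, delta1, two_alpha=0.05):
    """Approximate power Phi(delta1 * sqrt(n / (2 sigma^2)) - z_alpha),
    ignoring the small probability that Z < -z_alpha."""
    z_alpha = STD_NORMAL.inv_cdf(1 - two_alpha / 2)
    return STD_NORMAL.cdf(delta1 * math.sqrt(n / (2 * sigma**2)) - z_alpha)


print(n_per_group(0.5, 0.25))               # 63
print(n_per_group(0.5, 0.25, power=0.95))   # 104
print(power_for_n(63, 0.5, 0.25))           # just above 0.80
```

Evaluating `power_for_n` over a range of n gives a power curve, which can be a useful way to present the trade-off between trial size and power.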

Note that C1 is what the critical value must be to achieve the desired significance level 2α. C2 is the critical value needed to achieve the desired power, (1 − β). The sample size must be such that C1 ≤ C2.

The above arguments generalize to other types of outcomes, and the principles are quite general. In particular, the normal approximation works well for a wide variety of situations.

Consider

$$C_1 = z_\alpha\,\text{s.e.}_{H_0}(\bar{X}_{A\cdot} - \bar{X}_{B\cdot}), \qquad C_2 = \delta_1 - z_\beta\,\text{s.e.}_{H_A}(\bar{X}_{A\cdot} - \bar{X}_{B\cdot}).$$

Note that the standard errors under the null and alternative hypotheses can be different, e.g., if comparing two proportions.

Further remarks on comparing 2 means:

(1) If $\sigma_A^2 \ne \sigma_B^2$, we can approximate σ² by their average.

(2) If we know $\sigma_A^2$ and $\sigma_B^2$ (with $\sigma_B^2 > \sigma_A^2$), we can let $n_A = n$, $n_B = kn$, for some k.

The standard error is then

$$\sqrt{\frac{\sigma_A^2}{n} + \frac{\sigma_B^2}{kn}},$$

and we can use this to determine the sample size [via C1 ≤ C2] with possibly unequal group sizes.
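Remark (2) can be made concrete by solving C1 ≤ C2 with the same standard error under both hypotheses, $(z_\alpha + z_\beta)\,\text{s.e.} \le \delta_1$, which gives $n \ge (z_\alpha + z_\beta)^2(\sigma_A^2 + \sigma_B^2/k)/\delta_1^2$. The Python sketch below is a hypothetical helper, not from the notes, and the second example's values (σB = 1.0, k = 2) are purely illustrative.

```python
import math
from statistics import NormalDist


def n_unequal(sigma_a, sigma_b, k, delta1, two_alpha=0.05, power=0.80):
    """Return (n_A, n_B) with n_B = k * n_A, from C1 <= C2 using
    s.e. = sqrt(sigma_a^2 / n + sigma_b^2 / (k n)) under both hypotheses."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - two_alpha / 2) + z.inv_cdf(power)
    # (z_alpha + z_beta) * s.e. <= delta1  =>  n >= z_sum^2 (sigma_a^2 + sigma_b^2 / k) / delta1^2
    n = z_sum**2 * (sigma_a**2 + sigma_b**2 / k) / delta1**2
    return math.ceil(n), math.ceil(k * n)


print(n_unequal(0.5, 0.5, 1, 0.25))   # (63, 63): equal variances and equal groups
print(n_unequal(0.5, 1.0, 2, 0.25))   # (95, 189): hypothetical sigma_B = 2 sigma_A
```

For a fixed total $n_A + n_B$, the standard error is smallest when $k = \sigma_B/\sigma_A$, so allocating more patients to the more variable group is usually the sensible choice.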

Trial size for a binary response:



We now consider qualitative outcomes, and in particular a binary response Y, e.g., yes/no, survived/not survived.

Assume we have two groups A, B, and assume there may be nA subjects on A and nB subjects on B (in the first instance, we are allowing the group sizes to differ).

Suppose we observe Xj ‘successes’ in group j, with j = A, B. Under the usual independence assumptions,

Xj ∼ B(nj, πj) independently,

i.e., XA, XB are independent binomial random variables.

We want to compare two binomial proportions, i.e., to make inference on

πA − πB (the target quantity).

Under the above assumptions,

$$P_j = \frac{X_j}{n_j} \quad \text{has} \quad E(P_j) = \pi_j, \qquad \text{Var}(P_j) = \frac{\pi_j(1 - \pi_j)}{n_j}.$$

Hence, we use $P_A - P_B$, the observed difference in proportions, to estimate $\pi_A - \pi_B$, where

$$\text{s.e.}(P_A - P_B) = \sqrt{\frac{\pi_A(1 - \pi_A)}{n_A} + \frac{\pi_B(1 - \pi_B)}{n_B}}.$$

The hypotheses are:

H0 : πA = πB vs. HA : πA ≠ πB

We use the usual approximate normal-theory test (the Wald test) based on

$$Z = \frac{P_A - P_B}{\sqrt{\frac{P_A(1 - P_A)}{n_A} + \frac{P_B(1 - P_B)}{n_B}}},$$

which is approximately N(0, 1) under H0.
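A minimal Python sketch of this Wald test, assuming the unpooled standard error shown above. The function name and the counts (45/100 versus 30/100) are hypothetical and only illustrate the arithmetic.

```python
import math
from statistics import NormalDist


def wald_two_proportions(x_a, n_a, x_b, n_b):
    """Wald test of H0: pi_A = pi_B using the unpooled standard error.

    Returns (Z, two-sided P-value from the N(0, 1) approximation)."""
    p_a, p_b = x_a / n_a, x_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = (p_a - p_b) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


z, p = wald_two_proportions(45, 100, 30, 100)   # hypothetical counts
print(round(z, 3), round(p, 3))
```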

Sample size calculation:

Assume nA = nB = n (i.e., equal group sizes), then solve for n in C1 ≤ C2:

$$z_\alpha\,\text{s.e.}_{H_0}(P_A - P_B) \le \delta_1 - z_\beta\,\text{s.e.}_{H_A}(P_A - P_B).$$

However, it is typically not the case that either standard error can be estimated in advance in a simple manner.

We will look at 4 approaches:

(1) We know that if 0 < πi < 1, then

$$0 < \pi_i(1 - \pi_i) \le \frac{1}{4}.$$

So as a conservative approach, appropriate when there is little or no information available, use

$$\text{s.e.}(P_A - P_B) \le \sqrt{\frac{1}{2n}}.$$

(2) Sometimes we have available a ‘prior’ estimate for the probability of success in the control group, πB. Call this $\pi_B^*$.

Then under H0 : πA = πB, s.e.(PA − PB) is estimated by

$$\text{s.e.}^*_{H_0} = \sqrt{\frac{2\pi_B^*(1 - \pi_B^*)}{n}}.$$

Under HA, we still estimate πB by $\pi_B^*$, and use $\pi_A(1 - \pi_A) \le 1/4$. Then

$$\text{s.e.}^*_{H_A} = \sqrt{\frac{\pi_B^*(1 - \pi_B^*) + \frac{1}{4}}{n}}.$$

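(3) A further improvement on (2): observe that under

$$H_A: \pi_A - \pi_B = \delta_1,$$

we have $\pi_A = \delta_1 + \pi_B$.

So if $\pi_B^*$ is available, let $\pi_A^* = \delta_1 + \pi_B^*$ and use

$$\text{s.e.}^*_{H_0} = \sqrt{\frac{2\pi_B^*(1 - \pi_B^*)}{n}}$$

as above, and

$$\text{s.e.}^*_{H_A} = \sqrt{\frac{(\pi_B^* + \delta_1)(1 - \pi_B^* - \delta_1) + \pi_B^*(1 - \pi_B^*)}{n}} = \sqrt{\frac{\pi_A^*(1 - \pi_A^*) + \pi_B^*(1 - \pi_B^*)}{n}}.$$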

(4) If prior estimates are available under H0 : πA = πB = π, say, then use for π the pooled, or average, proportion.

Continuity correction:

For small samples, a continuity correction is used to obtain a more accurate assessment of significance from the asymptotic normal distribution for Z. Fleiss recommends adding

$$\frac{2}{|\pi_A - \pi_B|} = \frac{2}{|\delta_1|}$$

to the sample size to allow for this.

[See Fleiss (1980), or the 2nd edition of his book.]
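The approaches above differ only in the standard errors plugged into C1 ≤ C2. Writing $\text{s.e.}_{H_0} = u_0/\sqrt{n}$ and $\text{s.e.}_{H_A} = u_1/\sqrt{n}$, the condition becomes $n \ge \{(z_\alpha u_0 + z_\beta u_1)/\delta_1\}^2$. The Python sketch below implements approaches (1) to (3) and Fleiss's approximate correction under that reading. The helper names are hypothetical, and the inputs ($\pi_B^* = 0.30$, δ1 = 0.15, 2α = 0.05, 80% power) are illustrative only.

```python
import math
from statistics import NormalDist


def n_binary(delta1, u0, u1, two_alpha=0.05, power=0.80):
    """Unrounded n per group solving z_alpha * u0/sqrt(n) <= delta1 - z_beta * u1/sqrt(n),
    where s.e._H0 = u0 / sqrt(n) and s.e._HA = u1 / sqrt(n)."""
    z = NormalDist()
    z_alpha, z_beta = z.inv_cdf(1 - two_alpha / 2), z.inv_cdf(power)
    return ((z_alpha * u0 + z_beta * u1) / abs(delta1)) ** 2


def n_binary_approach(pi_b_star, delta1, approach, **kw):
    """Approaches (1)-(3) for choosing the standard errors."""
    if approach == 1:      # no prior information: pi(1 - pi) <= 1/4 in both groups
        u0 = u1 = math.sqrt(0.5)
    elif approach == 2:    # prior pi_B*, and pi_A(1 - pi_A) <= 1/4 under H_A
        u0 = math.sqrt(2 * pi_b_star * (1 - pi_b_star))
        u1 = math.sqrt(pi_b_star * (1 - pi_b_star) + 0.25)
    elif approach == 3:    # prior pi_B*, and pi_A* = pi_B* + delta1 under H_A
        pi_a_star = pi_b_star + delta1
        u0 = math.sqrt(2 * pi_b_star * (1 - pi_b_star))
        u1 = math.sqrt(pi_a_star * (1 - pi_a_star) + pi_b_star * (1 - pi_b_star))
    else:
        raise ValueError("approach must be 1, 2 or 3")
    return n_binary(delta1, u0, u1, **kw)


def with_fleiss_correction(n, delta1):
    """Add Fleiss's approximate continuity correction 2/|delta1|, then round up."""
    return math.ceil(n + 2 / abs(delta1))


for approach in (1, 2, 3):
    print(approach, math.ceil(n_binary_approach(0.30, 0.15, approach)))   # 175, 151 and 151
print(with_fleiss_correction(n_binary_approach(0.30, 0.15, 3), 0.15))    # 164
```

As expected, the conservative approach (1) asks for the most patients, and the continuity correction adds about 2/0.15 ≈ 13 patients per group here.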

Some general remarks:

(1) Sample size calculations are possible for more complex outcomes, e.g., survival time, correlations. These often require specialized software.

(2) Sample size calculations are only a guide to the order of magnitude required:

- they use ‘inputs’ that are subject to unquantifiable errors;

- one can also make adjustments for dropouts, noncompliance, etc.

(3) As a general rule, try to obtain as many observations as possible to estimate the treatment difference!

2.4.3 Sequential Trials

Key references: Chapter 18, Armitage et al. (2002); Jennison and Turnbull (2000); Whitehead (2002).

Sequential trials have a long history in medical trials and in industrial processes. Fully sequential trials, which require that the data be analysed after each outcome has been observed, are not always practical in the medical context (they need a quick response, and storing and updating the information can become a major task).

Consider a simple case, where we want to achieve a specified precision in estimating µ.

Suppose we observe a random sample of size n from a distribution with mean µ and variance σ².

The estimated mean is $\bar{x}_n$, and the estimated standard deviation is $s_n$.

Therefore, the estimated s.e. of $\bar{x}_n$ is

$$\text{s.e.}(\bar{x}_n) = \frac{s_n}{\sqrt{n}}.$$

This will tend to decrease as n increases.

Suppose we require that $\text{s.e.}(\bar{x}_n) < \varepsilon$.

Then a stopping rule will be:

continue sampling until $s_n/\sqrt{n}$ first falls below ε.

This is an example of an adaptive design, where the stopping rule is based on the outcomes as they emerge. (One can show that if the sampling is repeated many times, the usual confidence intervals are approximately valid.)
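The stopping rule can be sketched as a simple simulation in Python. This is an illustration under stated assumptions, not the notes' method. The `n_min` burn-in is a hypothetical addition so that $s_n$ is not based on very few points, and the N(10, 2²) data-generating distribution is invented for the example. With σ = 2 and ε = 0.1, sampling should stop near $n = (\sigma/\varepsilon)^2 = 400$.

```python
import math
import random
import statistics


def sample_until_precise(draw, eps, n_min=10, n_max=100_000):
    """Sample one observation at a time until s_n / sqrt(n) first falls below eps.

    n_min is a hypothetical burn-in so that s_n is not based on very few points.
    Returns (n, xbar_n, s.e.(xbar_n)).
    """
    xs = [draw() for _ in range(n_min)]
    while True:
        se = statistics.stdev(xs) / math.sqrt(len(xs))
        if se < eps or len(xs) >= n_max:
            return len(xs), statistics.fmean(xs), se
        xs.append(draw())


rng = random.Random(1)
n, xbar, se = sample_until_precise(lambda: rng.gauss(10.0, 2.0), eps=0.1)
print(n, round(xbar, 2), round(se, 3))   # n should be near (sigma / eps)^2 = 400
```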

Group sequential methods: Pocock’s Test

Key references: Jennison and Turnbull (2000); Chapter 18, Armitage, Berry & Matthews (2002).

The whole approach is based on the idea of a ‘repeated significance test’ (RST). The situation is very similar to that of adjusting for multiple comparisons: the aim is to control the Type I error, here termed the ‘overall significance level’ (viz: 2α = 0.05).

Clearly, to control the overall significance level at a low value, a more stringent nominal significance level (i.e., a smaller probability) is required at each stage.

Pocock’s test is the simplest group sequential plan.

It uses a constant nominal significance level, 2α′, to analyse the data a small number of times over the course of the study, where 2α′ is smaller than 2α.

Pocock’s Test: an example of an ‘RST’ plan:

As pointed out above, the idea is to control the overall Type I error while using a constant nominal significance level at each analysis. So, what value of 2α′ should be chosen?

The answer depends on K, the number of times we analyse and test the data.

Patient entry is divided into K groups, each with m patients on each treatment. The data are then analysed after each group of 2m responses.

Assume treatment allocation within each group is random, and assume two treatments A, B, with responses

$$X_{Ai} \sim N(\mu_A, \sigma^2), \qquad X_{Bi} \sim N(\mu_B, \sigma^2), \qquad i = 1, 2, \ldots$$

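Use the standardized statistic after each group of observations: so at the kth test, define

$$Z_k = \frac{\bar{X}_{A\cdot} - \bar{X}_{B\cdot}}{\sqrt{2\sigma^2/(mk)}} = \frac{1}{\sqrt{2mk\sigma^2}}\left(\sum_{i=1}^{mk} X_{Ai} - \sum_{i=1}^{mk} X_{Bi}\right), \qquad k = 1, \ldots, K,$$

where the means are taken over the mk responses accumulated on each treatment.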

Formally, the stopping rule is as follows:

After group k = 1, . . . , K − 1:
if |Zk| ≥ Cp, stop and reject H0;
otherwise continue to group k + 1.

After group K:
if |ZK| ≥ Cp, stop and reject H0;
otherwise stop and retain H0,

where Cp = Cp(K, α) is the critical value, a constant; it is calculated to give an overall Type I error of 2α.

That is,

PH0(Reject H0 at analysis k = 1, or k = 2, or, . . . , or k = K) = 2α

(e.g., 2α = 0.05).

For this probability, we use the joint distribution of the sequence Z1, . . . , ZK [see Ch. 19, Jennison & Turnbull]. These statistics are not independent, in that the distribution of each Zk depends on Zk−1; we need to use numerical integration (usually Gaussian quadrature) to evaluate the distributions.

Note that K = 1 is the non-sequential (fixed sample) case.

For Pocock’s test, we usually take K ≤ 5.

For example, if K = 5 and 2α = 0.05, then we can show that Cp = 2.413, so that the nominal significance level applied at each analysis is (see Figure 5)

$$2\alpha' = 2\{1 - \Phi(2.413)\} = 0.0158.$$

If K = 1, 2α = 0.05, Cp =?

Figure 5: Pocock’s Test
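The notes obtain Cp by numerical integration. As an independent sanity check, and not the method described above, a Monte Carlo sketch in Python can estimate the overall Type I error of the stopping rule. It uses the fact that under H0 each group of 2m responses adds an independent N(0, 1) increment $W_j$, so that $Z_k = (W_1 + \cdots + W_k)/\sqrt{k}$. The simulation size and seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist


def overall_type_i_error(c, K, n_sim=20_000, seed=2007):
    """Monte Carlo estimate of P_H0(|Z_k| >= c at some analysis k = 1, ..., K).

    Under H0, group j contributes an independent N(0, 1) increment W_j, and
    Z_k = (W_1 + ... + W_k) / sqrt(k), which reproduces the correlation
    between successive Z_k.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        cumulative = 0.0
        for k in range(1, K + 1):
            cumulative += rng.gauss(0.0, 1.0)
            if abs(cumulative) / math.sqrt(k) >= c:
                rejections += 1
                break                       # the trial stops at the first crossing
    return rejections / n_sim


nominal = 2 * (1 - NormalDist().cdf(2.413))        # nominal level at each analysis
print(round(nominal, 4))                            # 0.0158
print(overall_type_i_error(2.413, K=5))             # close to 0.05
print(overall_type_i_error(1.96, K=5))              # well above 0.05: no adjustment
```

The last line shows why the adjustment is needed. Testing five times at the unadjusted 1.96 rejects a true H0 far more often than 5% of the time.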

The power requirement,

$$P_{H_A}(\text{reject } H_0) = 1 - \beta,$$

determines the group size.

The general scenario:

Suppose we are interested in the parameter θ (e.g., µA − µB), and the null hypothesis H0 : θ = 0, and want high power against the alternatives

HA : θ = θ1 or θ = −θ1.

Assume the Type I and II errors are as before. [For two normal means: θ = δ = µA − µB, estimated by $\hat{\delta} = D = \bar{X}_{A\cdot} - \bar{X}_{B\cdot}$.]

Suppose that:

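(1) At any stage, we can estimate θ by $\hat{\theta}$ (the maximum likelihood estimator), and its variance by $\text{Var}(\hat{\theta})$.

(2) We inspect the accumulating data at intervals (up to K times) such that at the kth test

$$\frac{1}{\text{Var}(\hat{\theta})} = kI,$$

i.e., the information is proportional to k, where I is called the ‘Fisher information’ contributed by each group, so that the information increases by I between successive tests.

Recall, when comparing two normal means, that

$$\text{Var}(\hat{\theta}) = \text{Var}(\hat{\delta}) = \frac{2\sigma^2}{km}, \qquad \text{so } I = \frac{m}{2\sigma^2},$$

and the mean of Zk under HA is

$$\delta_1\sqrt{\frac{mk}{2\sigma^2}} = \delta_1\sqrt{kI}$$

at the kth test.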

Example: A trial comparing the effects of Vitamin D supplements and a control treatment in pregnant women. (From Armitage and Berry, third edition, using Table 15.6; in what follows, we use δ1 for µ1 in A&B, and the table gives the required value of $\mu_1\sqrt{m}/\sigma$.)

The response is the infant’s calcium concentration (in mg/100ml), 6 days after birth.

The standard deviation (σ) of the response was 1.2, and the investigators wished to detect a change in [Ca] of 0.3 (= δ1).

It was assumed that 2α = 0.05 and 1 − β = 0.95, and the investigators intended to inspect the data K = 3 times.

The question: how many women should be included in each group?

Let there be m women on each of the two treatments A,B in each group.


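After the kth stage (k = 1, 2, 3),

$$\text{Var}(\hat{\delta}) = \text{Var}(D) = \frac{2\sigma^2}{mk}, \qquad D = \bar{X}_{A\cdot} - \bar{X}_{B\cdot},$$

so

$$\frac{\mu_1\sqrt{m}}{\sigma} = \delta_1\sqrt{\frac{m}{2\sigma^2}}, \quad \text{i.e.,} \quad 2.22 = 0.3\sqrt{\frac{m}{2(1.44)}},$$

giving

$$m = \frac{2(1.44)(2.22)^2}{0.3^2} = 157.7 \to 158,$$

$$158 \times 2 \times 3 = 316 \times 3 = 948.$$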

Thus, the total number of patients would be 948 unless the trial stopped after one of the two interim analyses.

Each test, including the final analysis, would be conducted at the nominal 2.2% significance level. [See Table 2.1 from Jennison & Turnbull.]
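The arithmetic of this example can be checked in a few lines of Python. The drift value 2.22 is the tabulated figure used above. The constant $C_p(3, 0.05) = 2.289$ is taken as an assumption from Jennison & Turnbull's Table 2.1, and is used only to recover the 2.2% nominal level.

```python
import math
from statistics import NormalDist

sigma, delta1, K = 1.2, 0.3, 3
drift = 2.22   # tabulated value of delta1 * sqrt(m / (2 sigma^2)) used in the example

# Solve drift = delta1 * sqrt(m / (2 sigma^2)) for m, then round up
m = math.ceil(2 * sigma**2 * drift**2 / delta1**2)
max_total = m * 2 * K           # m women on each of 2 treatments, in each of K groups
print(m, max_total)             # 158 948

# Nominal level per analysis, assuming Pocock's C_p(3, 0.05) = 2.289
nominal = 2 * (1 - NormalDist().cdf(2.289))
print(round(nominal, 3))        # 0.022
```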

Remarks:

(1) The maximum total number of subjects that may be required is greater than that for a fixed sample test. But there is a benefit in the group sequential plan if the true difference is large, since the trial is then likely to stop early.

(2) There are many forms of stopping rule. The most popular is the O’Brien-Fleming scheme, in which it is more difficult to reject H0 early on. At the end of the trial, CB(K, α) is closer to the non-sequential value (see handout).

(3) There is a need to be flexible, and the group sequential schemes provide this.

(4) There are other methods for ‘spending’ Type I error [see Whitehead (2002) and EoB (1998 or 2005)].

We can also reduce the required sample size using alternative designs, and we consider some important trial types in the following sections.

2.5 Crossover trials

2.5.1 Introduction

Motivation: a crossover design can produce the same standard error for the target quantity of interest with fewer patients.

But this is at the price of requiring additional assumptions.

Scenario: each patient receives each treatment in sequence, and the order in which the treatments are given is randomized.

For example, with two treatments A, B:

Group 1: A then B
Group 2: B then A

Patients are randomized to Group 1 or to Group 2. This is known as the 2 × 2 crossover trial (2 treatments and 2 periods).

We can see that this design leads to ‘within-patient’ comparisons as well as to ‘between-patient’ comparisons. The repeated measures feature of the design means that we need fewer patients.

In the standard parallel-group design, treatment comparisons are based on ‘between-patient’ information. In a crossover design, the important comparisons (i.e., between treatments) are made on a ‘within-patient’ basis; the main aim of the crossover trial is therefore to remove from the treatment comparisons any component that is related to differences between the individuals.

Key references for crossover trials:

• Armitage et al (2002)

• Altman (1991)

• *Jones & Kenward (2003)

• *Senn (2002)

• EoB article [see Course Information handout for details].

Illustrative example: A heroin trial as a 4 × 4 crossover design.

[Figure: treatment sequences over months 0, 3, 6, 9, 12. Group 1: H, H+M, M, choice; Group 2: M, H, choice, H+M; ...]

The aim of this trial design was to improve individual injecting drug-user compliance (everybody in the trial would get some heroin, and no placebo). But there are many other problems in considering such a design. For instance, changing treatment regimes for drug users every three months would be disruptive, and it can be argued that it would be unethical to disturb the treatment of drug users who are stabilised on treatment, etc. Moreover, a high number of drop-outs is likely in such a trial, and this would lead to selection bias.

Crossover designs are especially useful for evaluating treatments for stable chronic conditions where the short-term effects of the treatments are of interest;

e.g., trials of anti-hypertensive drugs; treatments for asthma in chronic sufferers.

Crossover designs are widely used in early phase (I, II) drug trials, and in bioequivalence trials. But they are not appropriate for the assessment of long-term conditions, such as the long-term treatment of HIV/AIDS patients, or for assessing survival following a diagnosis of cancer.

Figure 6: Scheme for 2× 2 crossover trial

Layout for 2× 2 crossover design: See Figure 6 for the scheme.

Notation:


Observation $y_{ijk}$, where

• i indexes the group: i = 1 (G1, sequence AB), i = 2 (G2, sequence BA);

• j indexes the patient within group i: j = 1, . . . , ni;

• k indexes the period: k = 1, 2.

n1 = number of patients in Group 1; n2 = number of patients in Group 2; n1 is not necessarily equal to n2.

Induction stage:

• acceptance of eligible patients into study

• random allocation to Group 1 or Group 2.

Run-in period: desirable, but not always possible

• idea is to allow the effects of previous medication to dissipate, and the patient’s disease state to stabilize

• baseline measurement zij1 (pretest measurement) taken here.

Period 1 (P1):

• patients receive treatment and response yij1 observed

• often yij1 is the average, or maximum, response taken near end of period.

Washout Period:

• again desirable, but not always feasible

• idea: come off the P1 treatment, so that the patient’s condition returns to its pre-treatment level

• zij2 observed, useful to compare with baseline zij1

• aim is to avoid carry-over effect of P1 treatment into P2.

Period 2 (P2):

• patients receive other treatment, and response yij2 observed

• same comments as for P1.


There are major problems/effects which are important and affect the model formulation and interpretation. These are:

(1) Period effect: this is a systematic effect going from period 1 to period 2; e.g., the patient’s condition may deteriorate over time.

(2) Carry-over effects: especially of the drug in P1 carried over to P2. These may also be psychological, or other, effects.

(3) (Direct) Treatment × period interaction: e.g., responses on A differ in P1 and P2, but responses on B do not (this can be due to carry-over); e.g., a period effect may be explained biologically by a common carry-over effect.

In fact, we cannot distinguish between differential carry-over and other types of interaction in the 2 × 2 design.

(4) Withdrawals/dropouts: can affect balance and results, especially in higher-order designs → can lead to selection bias.

We incorporate (1)-(3) formally into the model.

2.5.2 Model formulation

We assume

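$$Y_{ijk} = \underbrace{\eta_{ijk}}_{\text{fixed}} + \underbrace{S_{ij} + e_{ijk}}_{\text{random}},$$

where

• $\eta_{ijk}$ is the systematic component (see below);

• $S_{ij}$ is the subject effect, i.i.d. $N(0, \sigma_S^2)$;

• $e_{ijk}$ is the measurement error, i.i.d. $N(0, \sigma_e^2)$;

and the $S_{ij}$ and $e_{ijk}$ are independent.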

To model ηijk:

Grand mean µ

Treatment effect τ (B − A = τ):

treatment A : 0

treatment B : τ

Period effect π:

P1 : 0

P2 : π

Treatment × period interaction, γ, implied by above:

B in P2 : τ × π = γ

rest : 0

(Differential) Carry-over effect ρ:

B then A : 0

A then B : ρ

We thus obtain an over-parameterized model for ηijk.

Expected values:

G1,P1 η1j1 = µ+ 0 + 0 + 0

G1,P2 η1j2 = µ+ τ + π + (γ + ρ)

G2,P1 η2j1 = µ+ τ + 0 + 0

G2,P2 η2j2 = µ+ 0 + π + 0

Observe:

• only get carry-over in P2

• γ, ρ appear once each, and together

i.e., they are said to be intrinsically aliased.

So, for example, the linear model for the response from the jth patient in Group 1 (AB), Period 1, is:

y1j1 = µ+ S1j + e1j1



For the jth patient in Group 2 (BA), Period 2, the model is

y2j2 = µ+ π + S2j + e2j2

Note that in each case,

$$\text{Var}(Y_{ijk}) = \sigma_S^2 + \sigma_e^2,$$

where $\sigma_S^2$ and $\sigma_e^2$ are called components of variance.

But we also have a covariance term (since responses within a patient are correlated):

$$\text{Cov}(Y_{ij1}, Y_{ij2}) = \sigma_S^2;$$

hence

$$\rho_S = \frac{\sigma_S^2}{\sigma_S^2 + \sigma_e^2},$$

which is the correlation between responses within a patient.

This model/design is an example of a split-plot design, in which the patients are the main plots, and the ‘time points’ where repeated observations are taken in P1, P2 are the sub-plots. Here, the main plots comprise a large component of the error.

Sample size calculations:

Let n be the number of patients on each treatment in the usual trial design.

Let N be the total number of patients in a 2× 2 crossover trial.

For the same power, we can show that

N = n(1− ρS).

Clearly, when the within-patient correlation is high (i.e., ρS is large), the advantage of the crossover design is greatest.
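One way to see where N = n(1 − ρS) comes from is to equate variances. With n patients per arm, the parallel-group difference has variance $2(\sigma_S^2 + \sigma_e^2)/n$. In the 2 × 2 crossover with N/2 patients per sequence and no carry-over, $\hat{\tau} = (\bar{D}_{1\cdot} - \bar{D}_{2\cdot})/2$ has variance $\frac{1}{4} \cdot 2\sigma_e^2 \cdot (2/N + 2/N) = 2\sigma_e^2/N$. The Python sketch below checks this numerically. The helper names and the values n = 100, σS = 2, σe = 1 are illustrative assumptions.

```python
import math


def var_tau_parallel(n, sigma_s, sigma_e):
    """Variance of the treatment difference in a parallel-group trial, n patients per arm."""
    return 2 * (sigma_s**2 + sigma_e**2) / n


def var_tau_crossover(N, sigma_e):
    """Variance of tau_hat = (Dbar_1. - Dbar_2.)/2 in a 2 x 2 crossover with N/2 patients
    per sequence and no carry-over: (1/4) * 2 sigma_e^2 * (2/N + 2/N) = 2 sigma_e^2 / N."""
    return 2 * sigma_e**2 / N


n, sigma_s, sigma_e = 100, 2.0, 1.0
rho_s = sigma_s**2 / (sigma_s**2 + sigma_e**2)
N = n * (1 - rho_s)                                   # about 20 patients in total
print(var_tau_parallel(n, sigma_s, sigma_e), var_tau_crossover(N, sigma_e))
```

Because the subject effect cancels in the within-patient differences, only σe² enters the crossover variance. That is why the saving grows with ρS.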

What if ρS = 0?

2.5.3 Analysis

The analysis of the 2 × 2 design is based on the sums (i.e., totals) and the differences of the observations for a subject in P1 and P2.

For continuous y: we use t-tests or Mann-Whitney tests. For categorical y: we use χ² tests.

The randomization validates the comparisons between and within groups.

We will perform the analysis using t-tests and confidence intervals.

We work with

Dij = Yij2 − Yij1 (P2-P1)

Tij = Yij1 + Yij2 (P1+P2)

for i = 1, 2.

Differences: (within patients)

We find for

Group 1 (AB): $D_{11}, D_{12}, \ldots, D_{1n_1}$ i.i.d. $\sim N(\tau + \pi + \gamma + \rho,\; 2\sigma_e^2)$.

To get the variances, observe that

$$D_{ij} = Y_{ij2} - Y_{ij1} = (\eta_{ij2} + S_{ij} + e_{ij2}) - (\eta_{ij1} + S_{ij} + e_{ij1}) = (\eta_{ij2} - \eta_{ij1}) + (e_{ij2} - e_{ij1}).$$

So,

$$\text{Var}(D_{ij}) = \text{Var}(e_{ij2}) + \text{Var}(e_{ij1}) = 2\sigma_e^2.$$

Similarly,

Group 2 (BA): $D_{21}, D_{22}, \ldots, D_{2n_2}$ i.i.d. $\sim N(\pi - \tau,\; 2\sigma_e^2)$.

Observe that the two groups are independent.

Totals (for the analysis between patients):

Group 1 (AB): $T_{11}, T_{12}, \ldots, T_{1n_1}$ i.i.d. $\sim N(2\mu + \tau + \pi + \gamma + \rho,\; 2\sigma_e^2 + 4\sigma_S^2)$.

Again, to get the variances, observe that

$$T_{ij} = Y_{ij1} + Y_{ij2} = \underbrace{(\eta_{ij1} + \eta_{ij2})}_{\text{constant}} + e_{ij1} + e_{ij2} + 2S_{ij}.$$

So,

$$\text{Var}(T_{ij}) = 2\sigma_e^2 + 4\sigma_S^2.$$

Similarly,

Group 2 (BA): $T_{21}, T_{22}, \ldots, T_{2n_2}$ i.i.d. $\sim N(2\mu + \tau + \pi,\; 2\sigma_e^2 + 4\sigma_S^2)$.

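Consider the group means of the totals:

Group 1: $T_{11}, T_{12}, \ldots, T_{1n_1}$, with mean $\bar{T}_{1\cdot}$;

Group 2: $T_{21}, T_{22}, \ldots, T_{2n_2}$, with mean $\bar{T}_{2\cdot}$.

Thus, subtracting the means (Group 1 − Group 2) gives $E(\bar{T}_{1\cdot} - \bar{T}_{2\cdot}) = \gamma + \rho$.

This suggests basing a test for no interaction/carry-over on $\bar{T}_{1\cdot} - \bar{T}_{2\cdot}$, i.e.,

$$H_0: \gamma + \rho = 0.$$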

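The t-statistic is

$$t = \frac{\bar{T}_{1\cdot} - \bar{T}_{2\cdot}}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \quad \text{on } n_1 + n_2 - 2 \text{ d.f.},$$

where $s_p$ is the pooled estimate of the standard deviation, with

$$s_p^2 = \frac{1}{n_1 + n_2 - 2}\left\{\sum_{j=1}^{n_1}(T_{1j} - \bar{T}_{1\cdot})^2 + \sum_{j=1}^{n_2}(T_{2j} - \bar{T}_{2\cdot})^2\right\}.$$

This estimates $2\sigma_e^2 + 4\sigma_S^2 = \text{Var}(T_{ij})$.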

If we retain the hypothesis of no interaction/equal carry-over, we can then use the differences to test for treatment and period effects.

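(1) For the period effect, apply the two-sample t-test to

$D_{11}, D_{12}, \ldots, D_{1n_1}$ and $-D_{21}, -D_{22}, \ldots, -D_{2n_2}$,

because, if there is no period effect, i.e., π = 0 and γ + ρ = 0, then

$$E(\bar{D}_{1\cdot}) = -E(\bar{D}_{2\cdot}).$$

The null hypothesis is H0 : 2π = 0, assuming γ + ρ = 0.

Thus

$$t = \frac{\bar{D}_{1\cdot} + \bar{D}_{2\cdot}}{s_D\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \quad \text{on } n_1 + n_2 - 2 \text{ d.f.}$$

gives a test of H0 : 2π = 0, where $s_D^2$ is the pooled within-group estimate of Var(Dij):

$$s_D^2 = \frac{1}{n_1 + n_2 - 2}\left\{\sum_{j=1}^{n_1}(D_{1j} - \bar{D}_{1\cdot})^2 + \sum_{j=1}^{n_2}(D_{2j} - \bar{D}_{2\cdot})^2\right\}.$$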

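(2) For the treatment effect, use the two-sample t-test

$$t = \frac{\bar{D}_{1\cdot} - \bar{D}_{2\cdot}}{s_D\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \quad \text{on } n_1 + n_2 - 2 \text{ d.f.}$$

If γ + ρ = 0, this gives a test of H0 : 2τ = 0, because

$$E(\bar{D}_{1\cdot} - \bar{D}_{2\cdot}) = E(\bar{D}_{1\cdot}) - E(\bar{D}_{2\cdot}) = (\pi + \tau + \gamma + \rho) - (\pi - \tau) = 2\tau \quad \text{if } \gamma + \rho = 0.$$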
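The three t-tests can be sketched in Python as follows. This is an illustration, not a reference implementation: the function names are hypothetical, and the simulated data come from the model above with no carry-over or interaction, using invented values τ = 1.5, π = 0.5, σS = 2 and σe = 1. P-values are not computed here; if SciPy is installed, `2 * scipy.stats.t.sf(abs(t), df)` gives the two-sided P-value.

```python
import math
import random
import statistics


def pooled_t(g1, g2):
    """Pooled two-sample t for mean(g1) - mean(g2); returns (t, d.f.)."""
    n1, n2 = len(g1), len(g2)
    s2 = ((n1 - 1) * statistics.variance(g1)
          + (n2 - 1) * statistics.variance(g2)) / (n1 + n2 - 2)
    t = (statistics.fmean(g1) - statistics.fmean(g2)) / math.sqrt(s2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def analyse_2x2(group1, group2):
    """group1 (AB), group2 (BA): lists of (y_period1, y_period2), one pair per patient."""
    t1 = [y1 + y2 for y1, y2 in group1]   # totals T_1j
    t2 = [y1 + y2 for y1, y2 in group2]   # totals T_2j
    d1 = [y2 - y1 for y1, y2 in group1]   # differences D_1j = P2 - P1
    d2 = [y2 - y1 for y1, y2 in group2]   # differences D_2j = P2 - P1
    return {
        "carry_over": pooled_t(t1, t2),              # H0: gamma + rho = 0
        "period": pooled_t(d1, [-d for d in d2]),    # H0: 2 pi = 0 (numerator Dbar_1 + Dbar_2)
        "treatment": pooled_t(d1, d2),               # H0: 2 tau = 0
    }


def simulate_2x2(n_per_group, mu, tau, pi, sigma_s, sigma_e, seed=0):
    """Hypothetical data from the model with no carry-over or interaction."""
    rng = random.Random(seed)

    def patient(eta1, eta2):
        s = rng.gauss(0.0, sigma_s)
        return eta1 + s + rng.gauss(0.0, sigma_e), eta2 + s + rng.gauss(0.0, sigma_e)

    g1 = [patient(mu, mu + tau + pi) for _ in range(n_per_group)]   # AB: A in P1, B in P2
    g2 = [patient(mu + tau, mu + pi) for _ in range(n_per_group)]   # BA: B in P1, A in P2
    return g1, g2


results = analyse_2x2(*simulate_2x2(200, mu=10.0, tau=1.5, pi=0.5, sigma_s=2.0, sigma_e=1.0))
for name, (t, df) in results.items():
    print(f"{name}: t = {t:.2f} on {df} d.f.")
```

With these settings, the treatment and period statistics should be large and the carry-over statistic unremarkable, which matches the sequential strategy below.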

A sequential approach to the analysis is thus:

Step 1: Test for interaction/differential carry-over (H0).

Step 2a: Insufficient evidence to reject H0

→ test for period effect → test for treatment effect.

Step 2b: If we reject H0, the interpretation is more difficult; we usually need external information on the possible causes:
(1) true (differential) carry-over effects of treatment;
(2) psychological carry-over effects;
(3) a true interaction of period × treatment;
(4) the two groups differ significantly, which can be checked from additional information, such as age and sex.

If a true direct-treatment-by-period interaction exists, we can:

• transform the data, e.g., log y, √y;

• or we may decide to discard the P2 data, and analyse the P1 data only.

Why? Because the period effects and treatment effects are marginal to the interaction.

Is such an approach justified? Yes, because of the randomization.

But it is considerably less powerful, because

• the original sample size calculation was based on the crossover design, and

• σS² now enters the treatment comparison.

There is a problem with the sequential approach, in that the t-test for no interaction lacks power too. This is because it is based on the Tij, with

$$\text{Var}(T_{ij}) = 2\sigma_e^2 + 4\sigma_S^2$$

⇒ the t-test is conservative.

Using the baseline measurements Zij1 can help. For example, for each subject, calculate

$$T'_{ij} = T_{ij} - 2Z_{ij1},$$

then test $\bar{T}'_{1\cdot} - \bar{T}'_{2\cdot}$.

Similarly, for the P1-data-only procedure, calculate

$$Y'_{ij1} = Y_{ij1} - Z_{ij1},$$

and compare $Y'_{1j1}$ (Group 1, treatment A) with $Y'_{2j1}$ (Group 2, treatment B).

Useful plots (using terminology as in Kenward and Jones):

Do the plots before the analysis.

(1) Subject-profiles plot: This is the simplest plot. For each group, plot the change in the subject’s responses from P1 to P2, i.e., plot yij1 against P1 and yij2 against P2, and join the responses by a line. See Figure 7.

Figure 7: Subject-profile plot for one subject.

For example, Group 1 (AB); see Figure 8.

The idea is to compare treatments within groups.

Look for: trends, outliers, or anything unusual.

(2) Groups × periods plot:

Plot the four group × period means, and join the treatments by lines. See Figure 9.

Figure 8: Subject-profiles plot for Group 1.

Notation for the group × period means:

Group 1 (AB): $\bar{y}_{1\cdot1}$ (1A) in P1, $\bar{y}_{1\cdot2}$ (1B) in P2;

Group 2 (BA): $\bar{y}_{2\cdot1}$ (2B) in P1, $\bar{y}_{2\cdot2}$ (2A) in P2.

Examples. In all these examples (Figures 9 to 13), there is a period effect, where the means in P2 are higher.

Figure 10:

• Parallel lines indicate that the treatment difference is the same in both periods (i.e., no treatment × period interaction).

• Period effect? Yes.

• See also plot (a) on handout.

• t-test for interaction is a test of parallelism.

Figure 11:

• Carry-over effect of B, which pushes up A in P2(?)



Figure 9: Groups × Period Plot

• Or there is a true interaction, where the treatment difference is larger if the response is high.

Figure 12:

• Suggests A has a large effect carried over to P2.

• Or the response may have reached some natural limit ⇒ the difference is smaller in P2.

Figure 13:

• Indicates a treatment × period interaction, as the treatment order is reversed.

• Could be a large carry-over effect of A,

• or the result of a true interaction effect.

(3) Subject differences vs. totals plot:

• The most helpful plot.

• Gives an overall view of the data and effects.

• Can see the variation both within and between groups.



Figure 10: Example 1 Figure 11: Example 2

Figure 12: Example 3 Figure 13: Example 4



For each subject, plot the difference $d_{ij}$ against the total $t_{ij}$.

Use a different plotting symbol for each group, and draw the convex hull for each group.

Horizontal separation of the groups suggests interaction/carry-over; see Figure 14.

Vertical separation of the groups suggests a treatment difference; see Figure 15.

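Confidence intervals: an estimate of the treatment difference (i.e., the effect size) $\tau$ is

$$\hat{\tau} = \frac{\bar{D}_{1.} - \bar{D}_{2.}}{2},$$

with standard error $\tfrac{1}{2}\,\mathrm{s.e.}(\bar{D}_{1.} - \bar{D}_{2.})$; the corresponding $t$ statistic has $n_1 + n_2 - 2$ d.f.

Thus, a $100(1-2\alpha)\%$ confidence interval is

$$\hat{\tau} \pm t_{n_1+n_2-2}(\alpha)\,\tfrac{1}{2}\,\mathrm{s.e.}(\bar{D}_{1.} - \bar{D}_{2.}).$$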
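As a rough check on the arithmetic, here is a minimal Python sketch of this interval. It assumes that `d1` and `d2` hold the within-subject differences (period 1 minus period 2) for Groups 1 (AB) and 2 (BA), that $\mathrm{s.e.}(\bar{D}_{1.} - \bar{D}_{2.})$ is the usual pooled two-sample standard error, and that the caller supplies the $t$ critical value; the function name and the toy data are illustrative only.

```python
from math import sqrt
from statistics import mean, variance


def crossover_tau_ci(d1, d2, t_crit):
    """Estimate tau = (mean(d1) - mean(d2)) / 2 with a t-based interval.

    d1, d2: within-subject differences (period 1 minus period 2) for
            Group 1 (AB) and Group 2 (BA).
    t_crit: the t_{n1+n2-2}(alpha) critical value, supplied by the caller.
    """
    n1, n2 = len(d1), len(d2)
    # Pooled within-group variance of the differences on n1 + n2 - 2 d.f.
    s2 = ((n1 - 1) * variance(d1) + (n2 - 1) * variance(d2)) / (n1 + n2 - 2)
    se_diff = sqrt(s2 * (1 / n1 + 1 / n2))  # s.e.(D1. - D2.)
    tau_hat = (mean(d1) - mean(d2)) / 2
    se_tau = se_diff / 2
    half_width = t_crit * se_tau
    return tau_hat, se_tau, (tau_hat - half_width, tau_hat + half_width)


# Toy differences (not trial data); 2.776 is the upper 2.5% point of t_4.
tau, se, ci = crossover_tau_ci([2, 4, 6], [0, 2, 4], t_crit=2.776)
print(tau, round(se, 4), ci)
```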

Example from Altman: the Nicardipine trial

Using our formulation, a 95% confidence interval for τ is

(−12.8,−0.16).

Note that the null hypothesis of no interaction is retained, but an estimated 95% C.I. for the interaction term is

(−16.5, 9).

So, although the interaction is not significant, it is not negligible on the scale of the treatment effect standard error.

Concluding remarks:

• The 2 × 2 crossover design is uncontroversial if differential carry-over, interaction and group effects can be assumed negligible.

• Some of the problems are removed by using higher-order designs (which we analyse using linear models).

• One can incorporate baseline information and covariates.



Figure 14: Subject Differences v.s. Totals Plot (horizontal separation)

Figure 15: Subject Differences v.s. Totals Plot (vertical separation)



2.6 Equivalence trials

Clinical trials are often designed to evaluate whether an experimental therapy or drug (E) is sufficiently similar to an accepted or standard therapy or drug (S) to justify its use. The experimental treatment is expected to be equal in effect to the standard, or at least to differ from it only within acceptable limits, but not to be superior to it. Such a study is often called an equivalence trial. The benefits of the experimental treatment may include fewer side effects, convenience of use, or lower cost.

Let the parameter $\delta$ denote the difference in outcome measures between two treatments, for example, $\delta = \mu_S - \mu_E$, or $\delta = \pi_S - \pi_E$. Here, positive values of $\delta$ indicate superiority of the standard treatment, and $\delta$ takes the value 0 when the treatments are equally effective.

If we wish to demonstrate that the effects of the two treatments do not differ much in either direction, we want to establish that $\delta$ lies between two tolerance limits:

$$\delta_L < \delta < \delta_U,$$

for which we will 'declare equivalence'. In equivalence testing, this is the alternative hypothesis, $H_A$.

The corresponding null hypothesis is two-sided because it includes values on both sides of the alternative values:

$$H_0: \delta \le \delta_L \ \text{or}\ \delta \ge \delta_U.$$

In an equivalence trial, the question of interest is often one-sided. For example, can we show that the experimental treatment is not worse than the standard treatment by as much as $\delta_U$, say? The hypotheses are now

$$H_0: \delta \ge \delta_U \quad \text{versus} \quad H_A: \delta < \delta_U.$$

We make a Type I error ($\alpha$) if we falsely conclude equivalence, i.e., we reject $H_0$ when $H_0$ is true, which means that we obtain an upper confidence limit less than $\delta_U$ when the true value is at least $\delta_U$.

A Type II error ($\beta$) is a failure to conclude equivalence when the treatments are similar, i.e., we obtain an upper confidence limit that is greater than $\delta_U$ when the true value is less than $\delta_U$. It is desirable to keep both $\alpha$ and $\beta$ small, especially $\alpha$, not least because the 'cost' to the drug companies of falsely declaring equivalence would be high.

The sample size formula for comparing two true means (i.e., average equivalence) using the one-sided hypotheses described above is

$$n = \frac{2\sigma^2 (z_\alpha + z_\beta)^2}{(\delta_U - \delta)^2},$$



which is the same as the conventional hypothesis-testing formula with $\delta_1$ replaced by $\delta_U - \delta$ (recall that $\delta_1$ is the minimum difference to be detected in the standard 'superiority' trial). If $\delta = 0$, the formulas are the same, and we obtain the same sample size for an equivalence trial as for the standard null hypothesis of no difference with $\delta_U$ as the minimum difference to be detected.
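A small Python sketch of the means formula, treating $n$ as the number of patients per group and $z_\alpha$, $z_\beta$ as upper-tail standard normal points (computed here with the standard library's `statistics.NormalDist`). The function name and the numbers in the example are illustrative assumptions, not values from the notes.

```python
from math import ceil
from statistics import NormalDist


def equivalence_n_means(sigma, delta_u, delta, alpha, beta):
    """Patients per group for a one-sided equivalence test of two means."""
    z = NormalDist()                  # standard normal
    z_alpha = z.inv_cdf(1 - alpha)    # upper-alpha point
    z_beta = z.inv_cdf(1 - beta)      # upper-beta point
    n = 2 * sigma**2 * (z_alpha + z_beta) ** 2 / (delta_u - delta) ** 2
    return ceil(n)                    # round up to whole patients


# Illustrative: sigma = 1, delta_U = 0.5, true delta = 0, alpha = 0.05, 80% power.
n_means = equivalence_n_means(1.0, 0.5, 0.0, 0.05, 0.2)
print(n_means)
```

Setting `delta = 0` reproduces the usual one-sided superiority calculation with $\delta_U$ as the difference to detect, as noted above.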

As for standard one-sided hypothesis testing, there are various methods for comparing two proportions. With equal numbers of patients in the two groups, the number in each group is given by

$$n = \{\pi_S(1-\pi_S) + \pi_E(1-\pi_E)\}\left(\frac{z_\alpha + z_\beta}{\delta_U - (\pi_S - \pi_E)}\right)^2.$$

Often it is appropriate to take $\pi_S = \pi_E = \pi$, and the formula simplifies to

$$n = 2\pi(1-\pi)\left(\frac{z_\alpha + z_\beta}{\delta_U}\right)^2.$$
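The proportions formula can be sketched the same way. This Python version is illustrative only (hypothetical function name and inputs), and again uses `statistics.NormalDist` for the upper-tail normal points.

```python
from math import ceil
from statistics import NormalDist


def equivalence_n_props(pi_s, pi_e, delta_u, alpha, beta):
    """Patients per group for one-sided equivalence of two proportions."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha) + z.inv_cdf(1 - beta)
    var_sum = pi_s * (1 - pi_s) + pi_e * (1 - pi_e)
    # delta_U - (pi_S - pi_E): the margin left at the assumed true rates
    n = var_sum * (z_sum / (delta_u - (pi_s - pi_e))) ** 2
    return ceil(n)


# Simplified case pi_S = pi_E = 0.8, delta_U = 0.10, alpha = 0.05, beta = 0.20.
n_props = equivalence_n_props(0.8, 0.8, 0.10, 0.05, 0.20)
print(n_props)
```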



3 Epidemiology and observational studies

3.1 Introduction

Epidemiology incorporates all aspects of the study of disease in human populations, other than designed experiments. Recall that we introduced some of these ideas in Chapter 1.

Key references:

• Armitage et al. (2002)

• Jewell (2003)

• Clayton & Hills (1993)

• Encyclopedia of Biostatistics (EoB) (1998, 2005) ← advanced.

Examples:

• nutritional/dietary studies, including studies of the impact of genetically modified foods

• occupational health: early studies of the relationship between exposure to asbestos and mesothelioma (James Hardie Industries saga)

• long-term health effects from exposure to environmental electromagnetic radiation

• the relationship between childhood leukaemia and mothers' exposure to radiation

• exposure to cadmium soil contamination and illness (recent incident at West Lakes, SA)

• surveillance of bioterrorism and other public health threats

• studies of risk of vCJD from dietary exposure to BSE agents

• 'small area' prediction of Ross River fever; meningitis

• monitoring the spread of HIV, TB, malaria; forecasting the future spread of infection.



We know it is often not practical or ethical to conduct experiments on people. It is therefore necessary to conduct observational studies and analyse observational data.

In general, observational studies produce weaker conclusions than well-designed experiments. In particular, an observational study allows us to infer that an association is present, but not that the relationship is causal.

In principle, the distinction between experiments and observational studies is clear-cut and important. However, in practice, the distinction can become blurred, and strong conclusions can be drawn from well-designed epidemiological studies.

Types of observational studies:

We distinguish

(i) A cohort study - this is a prospective longitudinal study. Observations are made on individuals at entry to the study, they are then followed forward in time, and explanatory and response variables are recorded for each individual;

e.g., Multicenter AIDS Cohort Study (MACS), www.statepi.jhsph.edu/macs

(ii) A case-control study - this is a retrospective (longitudinal) study.

The response is recorded at entry to the study, and an attempt is made to look backwards in time for possible explanatory features (i.e., exposure to risk).

(iii) A cross-sectional study - each individual is observed at just one point in time.

In (i) to (iii) above, the investigator may have substantial control over which individuals are included, and over the measuring process used.

(iv) A secondary analysis - the investigator only has control over the inclusion or exclusion of individuals for analysis;

e.g., National AIDS Registry data; S.A. Cancer Registry.

Study types (i) to (iv) are

• in decreasing order of effectiveness, the (prospective) cohort study being closest to an experiment;

• also in decreasing order of complexity and cost.



For further background reading, see 'Theory of the Design of Experiments' by Cox & Reid (2000).

The two principal types of observational studies are cohort studies and case-control studies.

3.2 Cohort Studies

Consider a disease D, and a potential (or suspected) risk factor for disease, A. We want to determine whether A influences the occurrence of D.

In a prospective study, we take a sample of subjects who are exposed to A, and a 'comparable' sample who are not exposed. We then track both groups over time, and record a response(s) on each subject; for example, whether or not the subject develops D, or the 'survival time' to the development of D.

The important difference from clinical trials is that we do not use random allocation to assign the subjects to the two groups.

Some difficulties:

1) The choice of the 'comparable' non-exposed group can be problematic. For example, in studying the effects of living at West Lakes, we would not choose the non-exposed sample from the general population, because there are almost certainly potential confounders, such as socio-economic status (SES), or exposure to other toxic pollutants, such as living in an industrial area.

2) Low-incidence diseases need very large samples (often unrealistically large).

3) Long incubation/latent periods: if the time from exposure to the onset of disease D is long, it will not be practical to conduct a prospective study. Some issues are that surveillance definitions change over time, and people move states/countries, change jobs, etc.

4) Cohort studies are usually complex, more expensive, and take longer.

Major advantages of cohort studies include:

• many medical conditions can be studied simultaneously; and



• direct, current information is obtained over time. It is therefore easier to calculate incidence rates, absolute risk, etc.

N.B.: Cohort studies can be retrospective; these involve cohort identification after a conceptual follow-up period. For example, a historical prospective study (see handout).

3.3 Case-control Studies

Case-control studies are the most widely used, and are usually retrospective.

Here we choose a sample of subjects who are known to have the disease D, and a 'comparable' group known not to have the disease.

Disease group members = cases

Non-diseased = controls

We then look back retrospectively to compare their exposure to the risk factor A. There may be multiple exposures, or complex patterns of exposure.

For example, select mesothelioma patients from hospitals, and select non-mesothelioma controls, then compare historical exposure to asbestos for the two groups.

In practice, we compare the histories and lifetimes of both groups.

The choice of controls is a critical issue:

The broad intention (i.e., the ideal) is that the controls should, on average, be similar to the cases in all respects except for the disease D under study and the associated risk factors. In other words, the controls *could* have the disease, but do not. For example, in the asbestos case-study, it would not be appropriate to sample the controls from the general population.

Problems with choice of controls:

• Selection bias, e.g., the controls are younger than the cases. One important reason for obtaining wrong answers from case-control studies is incorrect sampling of the controls (or cases) from the study base.

• The cases are a higher sampling fraction of all cases than the controls are of all non-cases.

• Non-response bias.

• Recall bias.



• Other biases, for example, bias in responders.

We usually match the controls for age and sex, so that the age and sex distributions are similar; other factors often vary with age and sex. Note that matching is usually advocated on the grounds of efficiency, but the main benefit is to reduce confounding and avoid selection bias (this is known as 'de-confounding').

Cases often arise from hospitals → choose controls from the same hospital population with different illnesses. The idea here is that these patients will share characteristics such as SES, environmental conditions, etc.

Some problems with hospital controls are that

• the catchment populations for specialist hospitals may not coincide;

• patients sick with other diseases may not represent the population of people free of the disease D. For example, factors associated with an increased risk of these different diseases may appear to be protective against the disease of interest in the cases, because they are over-represented in the controls.

An important question: How many controls? → usually 1, 2 or 3 for each case.

Case-control studies are very useful when

• the disease D is rare;

• there is a long incubation period;

• for handling large datasets.

Exposure ascertainment (assessment) can be a problem though.

Also, it is important to draw attention to the fact that the best sampling scheme can be invalidated by poor patient compliance.

3.4 Other designs

[See Clayton and Hills.]

In a case-cohort or case-base study, the controls are selected as a random sample of the cohort at the beginning of the study. If the disease is rare and there is little loss to follow-up, then the analysis (see later) may be carried out as usual after first removing from the control sample any individuals who later become cases.

Incidence density sampling in a nested case-control study (matching on time) is useful when assessing exposure is expensive or difficult. Here, we select as controls individuals who are disease-free at the time of diagnosis of the cases. (A nested case-control study is a case-control study nested in a cohort.)

3.5 Binary responses and case-control studies

A binary response is a Bernoulli response variable, Y .

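We have met binary responses in clinical trials:

e.g., $$Y = \begin{cases} 1 & \text{patient recovers} \\ 0 & \text{otherwise} \end{cases}$$

and in epidemiological studies:

e.g., $$Y = \begin{cases} 1 & \text{patient has disease D} \\ 0 & \text{otherwise.} \end{cases}$$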

Consider a large population of patients, and let

π be the true proportion with the disease D;

π can be interpreted as the probability a randomly chosen patient has D;

π is also called the risk of disease or the disease rate.

Suppose now that we wish to compare the disease rates $\pi_1, \pi_2$ for two different subpopulations defined by exposure ($\pi_1$) or non-exposure ($\pi_2$) to a certain risk factor R.

Note that $\pi_1$ is the probability of D amongst those at risk (exposed), and $\pi_2$ is the probability of D amongst those not exposed.

The following quantities are often used to model disease association:

1) risk difference: $\Delta = \pi_1 - \pi_2$

2) relative risk (or risk ratio): $\phi = \dfrac{\pi_1}{\pi_2}$

3) odds ratio:

$$\psi = \frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)}$$

Note that

$$\Delta < 0 \iff \phi < 1 \iff \psi < 1, \qquad \Delta = 0 \iff \phi = 1 \iff \psi = 1, \qquad \Delta > 0 \iff \phi > 1 \iff \psi > 1.$$

In general, ψ is less simple to interpret than φ or ∆.

However, if $\pi_i \approx 0$ for $i = 1, 2$, then

$$\frac{\pi_i}{1-\pi_i} \approx \pi_i \implies \frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)} \approx \frac{\pi_1}{\pi_2} = \phi,$$

i.e., the odds ratio is approximately equal to the relative risk if the disease is rare in both populations. This is called the 'rare disease assumption'; note that it can be relaxed by using particular designs.
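The three measures, and the rare-disease approximation $\psi \approx \phi$, can be checked numerically with a few lines of Python; the function name and the risks used are hypothetical.

```python
def association_measures(pi1, pi2):
    """Risk difference, relative risk and odds ratio for risks pi1, pi2."""
    delta = pi1 - pi2                                 # risk difference
    phi = pi1 / pi2                                   # relative risk
    psi = (pi1 / (1 - pi1)) / (pi2 / (1 - pi2))       # odds ratio
    return delta, phi, psi


rare = association_measures(0.002, 0.001)   # rare disease: psi close to phi
common = association_measures(0.5, 0.25)    # common disease: psi and phi diverge
print(rare)
print(common)
```

With risks of 0.002 and 0.001 the odds ratio is within about 0.1% of the relative risk, whereas with risks of 0.5 and 0.25 it is 3 against a relative risk of 2.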

ψ has the following useful property:

Consider a population of $N_1$ individuals exposed to the risk factor R, and $N_2$ non-exposed individuals, where $N = N_1 + N_2$.

Consider also the table of population frequencies that we can construct:

| Exposed (R) | Disease (D): yes | Disease (D): no | Total |
|---|---|---|---|
| yes | $N_1\pi_1$ | $N_1(1-\pi_1)$ | $N_1$ |
| no | $N_2\pi_2$ | $N_2(1-\pi_2)$ | $N_2$ |
| Total | $N_1\pi_1 + N_2\pi_2$ | $N_1(1-\pi_1) + N_2(1-\pi_2)$ | $N_1 + N_2 = N$ |

1) A prospective probability model:

Consider a prospective (cohort) study. In this case, the probability that an individual chosen randomly from the exposed subpopulation has the disease is $\pi_1$ (either by definition, or using the probability $N_1\pi_1/N_1$).



Similarly, the probability of disease for a randomly chosen individual from the non-exposed population is $\pi_2$ (either by definition or via $N_2\pi_2/N_2$).

The odds ratio of these two probabilities is

$$\psi = \frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)}.$$

2) A retrospective probability model:

Consider a case-control study. Here we choose a sample of cases from the (sub)population of diseased individuals.

If we randomly choose an individual with the disease, the probability that they were exposed is

$$\rho_1 = \frac{N_1\pi_1}{N_1\pi_1 + N_2\pi_2} = \frac{\text{number of diseased exposed}}{\text{total number of D}},$$

and the odds of exposure amongst the (diseased) cases is

$$\frac{\rho_1}{1-\rho_1} = \frac{N_1\pi_1/(N_1\pi_1 + N_2\pi_2)}{N_2\pi_2/(N_1\pi_1 + N_2\pi_2)} = \frac{N_1\pi_1}{N_2\pi_2}.$$

Similarly, if $\rho_2$ is the probability that a person chosen randomly from the non-diseased (or control) population has been exposed to the risk factor, then

$$\rho_2 = \frac{N_1(1-\pi_1)}{N_1(1-\pi_1) + N_2(1-\pi_2)} \quad\text{and}\quad \frac{\rho_2}{1-\rho_2} = \frac{N_1(1-\pi_1)}{N_2(1-\pi_2)}.$$

This is the odds of exposure amongst the non-diseased (or controls).



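Hence the odds ratio is

$$\begin{aligned}
\frac{\rho_1/(1-\rho_1)}{\rho_2/(1-\rho_2)} &= \left(\frac{N_1\pi_1}{N_2\pi_2}\right)\Big/\left(\frac{N_1(1-\pi_1)}{N_2(1-\pi_2)}\right) = \frac{N_1\pi_1 N_2(1-\pi_2)}{N_2\pi_2 N_1(1-\pi_1)} \\
&= \frac{\pi_1(1-\pi_2)}{\pi_2(1-\pi_1)} = \frac{\pi_1}{1-\pi_1}\Big/\frac{\pi_2}{1-\pi_2} = \psi.
\end{aligned}$$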

So although we clearly cannot estimate $\pi_1, \pi_2$ from the case-control data (since we choose the numbers with and without D), we can estimate $\psi$, the ratio of the odds of disease amongst the exposed and non-exposed. This tells us that the odds ratio we estimate is the same whether the study design is prospective or retrospective.
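The identity can be confirmed numerically. Starting from an arbitrary, made-up population table, this Python sketch computes $\psi$ from the prospective probabilities $\pi_1, \pi_2$ and again from the exposure probabilities $\rho_1, \rho_2$ among diseased and non-diseased individuals; the function name is hypothetical.

```python
def prospective_and_retrospective_or(N1, N2, pi1, pi2):
    """Odds ratio computed prospectively and retrospectively."""
    # Prospective: odds of disease in the exposed vs the non-exposed
    psi = (pi1 / (1 - pi1)) / (pi2 / (1 - pi2))
    # Retrospective: probability of exposure among diseased and non-diseased
    rho1 = N1 * pi1 / (N1 * pi1 + N2 * pi2)
    rho2 = N1 * (1 - pi1) / (N1 * (1 - pi1) + N2 * (1 - pi2))
    psi_retro = (rho1 / (1 - rho1)) / (rho2 / (1 - rho2))
    return psi, psi_retro


# Illustrative population: 3000 exposed, 7000 non-exposed.
psi, psi_retro = prospective_and_retrospective_or(3000, 7000, 0.10, 0.04)
print(psi, psi_retro)
```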

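Note (here $\bar{R}$ denotes non-exposure and $\bar{D}$ absence of disease):

1) In prospective sampling/design,

$$\pi_1 = P(D \mid R), \qquad \pi_2 = P(D \mid \bar{R}).$$

2) In retrospective sampling/design,

$$\rho_1 = P(R \mid D), \qquad \rho_2 = P(R \mid \bar{D}).$$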

3.6 Estimation and inference for measures of association

To begin, consider the retrospective (case-control) design.

Consider data comprising $m_1$ diseased subjects, of which $X_1$ have been exposed to R, and $m_2$ disease-free subjects, of which $X_2$ have been exposed to R.



The exact distributions of $X_1, X_2$ are hypergeometric (i.e., sampling without replacement), and independent. In most applications where the sample is small compared to the underlying populations, it is adequate to treat them as independent binomials (i.e., sampling with replacement).

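That is,

$$X_1 \sim B(m_1, \rho_1), \qquad X_2 \sim B(m_2, \rho_2).$$

Under this assumption, we estimate $\rho_i$ by

$$\hat\rho_i = \frac{X_i}{m_i}, \qquad i = 1, 2.$$

For large $m_i$,

$$\hat\rho_i \;\dot\sim\; N\!\left(\rho_i, \frac{\rho_i(1-\rho_i)}{m_i}\right).$$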

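We estimate

$$\psi = \frac{\rho_1}{1-\rho_1}\Big/\frac{\rho_2}{1-\rho_2} \quad\text{by}\quad \hat\psi = \frac{\hat\rho_1}{1-\hat\rho_1}\Big/\frac{\hat\rho_2}{1-\hat\rho_2}.$$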

For large samples, it can be shown (using the theory of maximum likelihood) that

$$\log\hat\psi \;\dot\sim\; N\!\left(\log\psi,\ \frac{1}{m_1\rho_1} + \frac{1}{m_1(1-\rho_1)} + \frac{1}{m_2\rho_2} + \frac{1}{m_2(1-\rho_2)}\right).$$

The sampling distribution of $\log\hat\psi$ is more symmetric than that of $\hat\psi$, and is thus better approximated by a normal distribution in large samples. Since the mean of $\log\hat\psi$ is close to the true $\log\psi$ when $m_1$ and $m_2$ are large, it follows that

$$\log\hat\psi - \log\psi$$

has an approximately normal sampling distribution with zero expectation and variance $V$, say.

In practice, we estimate $\log\psi$ via

$$\log\hat\psi = \log\left(\frac{X_1}{m_1 - X_1}\Big/\frac{X_2}{m_2 - X_2}\right)$$


and the standard error of $\log\hat\psi$ by

$$\mathrm{s.e.} = \sqrt{\frac{1}{X_1} + \frac{1}{m_1 - X_1} + \frac{1}{X_2} + \frac{1}{m_2 - X_2}}.$$

We will see where these estimates come from shortly.

Important remarks:

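1) Several authors use the notation with 'observed frequencies'

| | D (cases) | $\bar{D}$ (controls) | |
|---|---|---|---|
| R (exposed) | a | b | $n_1$ |
| $\bar{R}$ (non-exposed) | c | d | $n_2$ |
| | $m_1$ | $m_2$ | |

In this notation, we find

$$\log\hat\psi = \log\frac{ad}{bc} \quad\text{and}\quad \mathrm{s.e.} = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}.$$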

2) To obtain a confidence interval for $\psi$: in practice, the best way to obtain a confidence interval for $\psi$ is to obtain one for $\log\psi$ and back-transform the end-points (i.e., exponentiate) to get a c.i. for $\psi$.

The limits $\psi_L, \psi_U$ obtained in this way are known as logit limits; they tend to be too narrow, especially if any of the cell frequencies are small.
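A minimal Python sketch of the logit-limits recipe for a table in the a, b, c, d notation above. The function name is hypothetical and the default multiplier 1.96 assumes a 95% interval.

```python
from math import exp, log, sqrt


def log_odds_ratio_ci(a, b, c, d, z=1.96):
    """Logit limits for psi from a 2x2 table of counts.

    a, b: exposed cases, exposed controls; c, d: non-exposed cases, controls.
    """
    log_psi = log(a * d / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo, hi = log_psi - z * se, log_psi + z * se
    # Back-transform the end-points to the psi scale
    return exp(log_psi), se, (exp(lo), exp(hi))


psi_hat, se_log, (psi_lo, psi_hi) = log_odds_ratio_ci(20, 10, 10, 20)
print(psi_hat, se_log, psi_lo, psi_hi)
```

Because the interval is symmetric on the log scale, the product of the two limits equals $\hat\psi^2$.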

3.6.1 Finding the approximate variance in a cohort study

To understand the variance in a cohort study, write

$$\log\hat\psi = \log\left(\frac{\hat\pi_1}{1-\hat\pi_1}\right) - \log\left(\frac{\hat\pi_2}{1-\hat\pi_2}\right),$$

where $\hat\pi_1$ is the observed proportion of D's amongst the exposed, and $\hat\pi_2$ is the observed proportion of D's amongst the non-exposed.

We know that the variance of $\hat\pi_1$ is binomial, i.e., $\pi_1(1-\pi_1)/n_1$, where $n_1$ is the number of exposed subjects in the sample, and similarly for $\hat\pi_2$.



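The key to estimating the variance $V$ is then to get a simple approximation of $\log\{\hat\pi_1/(1-\hat\pi_1)\}$ in terms of $\hat\pi_1$.

Consider $\log\{\hat\pi_1/(1-\hat\pi_1)\}$, and expand it in a Taylor series about $\pi_1$, which gives

$$\log\left(\frac{\hat\pi_1}{1-\hat\pi_1}\right) \approx \log\left(\frac{\pi_1}{1-\pi_1}\right) + (\hat\pi_1 - \pi_1)\,\frac{1}{\pi_1(1-\pi_1)}.$$

Since the first term on the r.h.s. of this expression is constant, this immediately shows that

$$\mathrm{Var}\left(\log\frac{\hat\pi_1}{1-\hat\pi_1}\right) \approx \mathrm{Var}(\hat\pi_1)\,\frac{1}{\{\pi_1(1-\pi_1)\}^2} = \frac{1}{n_1\pi_1(1-\pi_1)}.$$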

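We estimate this variance by plugging in $\hat\pi_1$ for $\pi_1$, as usual; since $\hat\pi_1 = a/n_1$ and $1 - \hat\pi_1 = b/n_1$, this gives

$$\widehat{\mathrm{Var}}\left(\log\frac{\hat\pi_1}{1-\hat\pi_1}\right) \approx \frac{1}{n_1\hat\pi_1(1-\hat\pi_1)} = \frac{1}{a} + \frac{1}{b}.$$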
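The first-order approximation can be checked by simulation. This Python sketch is a Monte Carlo check under assumed values ($n_1 = 500$, $\pi_1 = 0.4$, 4000 replicates, fixed seed): it compares the simulated variance of the sample log odds with $1/\{n_1\pi_1(1-\pi_1)\}$. The function name and settings are illustrative.

```python
import random
from math import log
from statistics import variance


def simulated_logit_variance(n, pi, reps=4000, seed=1):
    """Monte Carlo variance of log(pi_hat / (1 - pi_hat)), pi_hat = X / n."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    logits = []
    for _ in range(reps):
        x = sum(rng.random() < pi for _ in range(n))  # one binomial(n, pi) draw
        logits.append(log(x / (n - x)))
    return variance(logits)


n1, pi1 = 500, 0.4
simulated = simulated_logit_variance(n1, pi1)
approx = 1 / (n1 * pi1 * (1 - pi1))  # first-order (Taylor) approximation
print(round(simulated, 5), round(approx, 5))
```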

Exactly the same calculation leads to

$$\widehat{\mathrm{Var}}\left(\log\frac{\hat\pi_2}{1-\hat\pi_2}\right) \approx \frac{1}{c} + \frac{1}{d}.$$

Since the exposed and non-exposed are independent samples, we then get an estimate of $V$ by adding these two formulae, so that an effective estimate of the sampling variance of $\log\hat\psi$ is

$$\widehat{\mathrm{Var}}(\log\hat\psi) = \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d},$$

applicable to both cohort and case-control designs.

To summarize: the approximate sampling distribution of

$$\frac{\log\hat\psi - \log\psi}{\sqrt{\widehat{\mathrm{Var}}(\log\hat\psi)}}$$

is $N(0, 1)$. Thus two-sided $100(1-\alpha)\%$ confidence limits for $\log\psi$ are given by

$$\log\hat\psi \pm z_\alpha\sqrt{\widehat{\mathrm{Var}}(\log\hat\psi)},$$

where $z_\alpha$ is the $(1-\alpha/2)$th percentile of the standard normal distribution.

To obtain the relevant confidence limits for $\psi$, we simply anti-log, i.e., exponentiate, the limits for $\log\psi$.

Using the δ-method to find an approximate variance for $\hat\psi$:

The delta-method: for a random variable $X$, the δ-method gives expressions for the approximate mean and variance of $g(X)$ (see Tutorial).

Consider a random variable

$$X \sim (\mu, \sigma^2).$$

Then, to first order,

$$E\{g(X)\} \approx g(\mu), \qquad \mathrm{Var}\{g(X)\} \approx \sigma^2\{g'(\mu)\}^2.$$

Here, take $X$ to be $\hat\psi$, and take $g(X)$ to be $\log\hat\psi$.

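Then, using the δ-method,

$$\mathrm{Var}(\log\hat\psi) \approx \mathrm{Var}(\hat\psi)\left(\frac{1}{\psi}\right)^2 = \sigma^2_\psi\left(\frac{1}{\psi}\right)^2$$

$$\therefore\ \sigma^2_\psi \approx \psi^2\,\mathrm{Var}(\log\hat\psi)$$

$$\therefore\ \hat\sigma^2_\psi = \left(\frac{ad}{bc}\right)^2\left(\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}\right).$$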

Note that this is $\widehat{\mathrm{Var}}(\hat\psi) \simeq \hat\psi^2\,\widehat{\mathrm{Var}}(\log\hat\psi)$.
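The δ-method result is a one-liner in code; this hypothetical Python helper simply multiplies $\hat\psi^2$ by the estimated variance of $\log\hat\psi$.

```python
def var_psi_delta(a, b, c, d):
    """Delta-method variance of psi_hat = ad/(bc): psi_hat^2 * Var(log psi_hat)."""
    psi_hat = a * d / (b * c)
    var_log = 1 / a + 1 / b + 1 / c + 1 / d
    return psi_hat**2 * var_log


print(var_psi_delta(20, 10, 10, 20))  # psi_hat = 4 and Var(log psi_hat) = 0.3 here
```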

Example: See handout from Clayton & Hills p.157.

This is a case-control study. The controls are a cross-sectional survey of the whole population (this allows us to use pre-existing data for the control group). It will be the case that some members of the control group also have the disease, but for a rare disease the effect is negligible.

What is the appropriate probability model?

The odds of vaccination amongst the leprosy cases is

$$\frac{101/260}{159/260} = \frac{101}{159} = 0.6352.$$

The odds of vaccination amongst the healthy controls is

$$\frac{46{,}028}{34{,}594} = 1.3305,$$

so the odds ratio is

$$\hat\psi = \frac{101/159}{46{,}028/34{,}594} = \frac{101 \times 34{,}594}{159 \times 46{,}028} = \frac{0.6352}{1.3305} = 0.4774.$$

Interpretation: vaccination roughly halves the risk ($\hat\psi \approx 0.48$). This is the extent of protection against leprosy afforded by vaccination.

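Now, $\log\hat\psi = -0.7394$ and

$$\widehat{\mathrm{Var}}(\log\hat\psi) = \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d} = 0.016241, \qquad \mathrm{s.e.}(\log\hat\psi) \simeq \sqrt{0.016241} = 0.1274.$$

∴ an approximate 95% c.i. for $\log\psi$ is

$$\log\hat\psi \pm 1.96\,\mathrm{s.e.}(\log\hat\psi),$$

i.e., $-0.7394 \pm 1.96 \times 0.1274 = -0.7394 \pm 0.2497$.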



Thus

$$(\log\psi_L, \log\psi_U) = (-0.9891, -0.4897) \implies (e^{\log\psi_L}, e^{\log\psi_U}) = (0.3719, 0.6128).$$

We conclude, with 95% confidence, that the vaccinated individuals are between 37% and 61% as likely as the non-vaccinated to have leprosy.

Clearly the confidence interval does not contain 1, so we can also conclude that $\psi$ is significantly different from 1.

Since leprosy is a rare disease, we make the same conclusions for φ.
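The leprosy calculation can be reproduced in a few lines of Python, using the counts quoted above (cases: 101 vaccinated, 159 not; controls: 46,028 vaccinated, 34,594 not) arranged in the a, b, c, d layout of the earlier table. This is a sketch for checking the arithmetic, not part of the original handout.

```python
from math import exp, log, sqrt

# Leprosy example counts as quoted in the notes
a, c = 101, 159          # cases: vaccinated, not vaccinated
b, d = 46_028, 34_594    # controls: vaccinated, not vaccinated

psi_hat = (a * d) / (b * c)
log_psi = log(psi_hat)
se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
lo, hi = exp(log_psi - 1.96 * se), exp(log_psi + 1.96 * se)  # back-transformed limits
print(round(psi_hat, 4), round(log_psi, 4), round(se, 4), round(lo, 4), round(hi, 4))
```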


3.7 Attributable risk

Definition: Attributable risk (AR) is the proportion of cases in the total population attributable to the risk factor, R.

AR is usually calculated when it is justified to infer causation from an observed association.

We are assuming exposure is harmful ($\pi_1 > \pi_2$).

Consider, as before, a disease D and risk factor R, and the following population (= study base) frequencies:

| | D | $\bar{D}$ |
|---|---|---|
| R | $N_1\pi_1$ | $N_1(1-\pi_1)$ |
| $\bar{R}$ | $N_2\pi_2$ | $N_2(1-\pi_2)$ |

The attributable risk is defined to be

$$AR = \frac{N_1(\pi_1 - \pi_2)}{N_1\pi_1 + N_2\pi_2}.$$

To see how the definition arises, observe that the total number of cases is

$$N_1\pi_1 + N_2\pi_2. \qquad (*)$$



Suppose now that the risk of D in the exposed (sub)population was set to $\pi_2$; then the total number of cases would be

$$N_1\pi_2 + N_2\pi_2 = (N_1 + N_2)\pi_2.$$

Then the surplus number of cases attributable to R is

$$N_1\pi_1 + N_2\pi_2 - (N_1\pi_2 + N_2\pi_2) = N_1(\pi_1 - \pi_2).$$

Expressing this surplus as a proportion of the total number of cases (∗) gives

$$AR = \frac{N_1(\pi_1 - \pi_2)}{N_1\pi_1 + N_2\pi_2}.$$

[NB. If $\pi_1 < \pi_2$, then the risk factor R is protective, and we use alternative methods.]

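The AR can also be expressed as

$$AR = \frac{N_1(\pi_1 - \pi_2)/\pi_2}{(N_1\pi_1 + N_2\pi_2)/\pi_2} = \frac{N_1\left(\frac{\pi_1}{\pi_2} - 1\right)}{N_1\frac{\pi_1}{\pi_2} + N_2} = \frac{N_1(\phi - 1)}{N_1(\phi - 1) + N_1 + N_2},$$

where $\phi = \pi_1/\pi_2$ is the relative risk.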

Dividing the numerator and denominator by $N_1 + N_2$ gives

$$AR = \frac{\frac{N_1}{N_1 + N_2}(\phi - 1)}{1 + \frac{N_1}{N_1 + N_2}(\phi - 1)}.$$

This shows that AR ≈ 0 when the numerator is small, and that AR is large only if the numerator is large.

This means that to obtain a big AR, we must have both

i) $N_1/(N_1 + N_2)$ 'not too small' (approximately 1:1000), and

ii) $\phi \gg 1$.



One final form of the AR that we use to define an estimate from case-control data is

$$AR = \frac{N_1(\pi_1 - \pi_2)}{N_1\pi_1 + N_2\pi_2} = \frac{N_1\pi_1}{N_1\pi_1 + N_2\pi_2}\left(1 - \frac{\pi_2}{\pi_1}\right) = \theta_1\left(1 - \frac{1}{\phi}\right),$$

where $\theta_1$ is the proportion of exposed cases in the diseased population.

This tells us that we can estimate the AR from the relative risk and the proportion of the diseased population exposed to R.
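The equivalence of these forms can be checked numerically. In this Python sketch the population sizes and risks are made up and the function name is hypothetical; all three expressions should return the same value.

```python
def attributable_risk_forms(N1, N2, pi1, pi2):
    """AR from its definition, from theta1 * (1 - 1/phi), and via phi alone."""
    ar_def = N1 * (pi1 - pi2) / (N1 * pi1 + N2 * pi2)
    phi = pi1 / pi2                              # relative risk
    theta1 = N1 * pi1 / (N1 * pi1 + N2 * pi2)   # exposed share of all cases
    ar_theta = theta1 * (1 - 1 / phi)
    p = N1 / (N1 + N2)                           # exposed share of the population
    ar_phi = p * (phi - 1) / (1 + p * (phi - 1))
    return ar_def, ar_theta, ar_phi


forms = attributable_risk_forms(2000, 8000, 0.03, 0.01)
print(forms)
```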

3.7.1 Estimation of AR

Consider a table of frequencies obtained from a case-control study:

| | D | $\bar{D}$ |
|---|---|---|
| R | a | b |
| $\bar{R}$ | c | d |
| | a + c | b + d |

Observe that

1) $\theta_1$ is estimable from case-control data, and the obvious estimate is $\hat\theta_1 = a/(a+c)$.

2) $\phi$ is not directly estimable, but for rare diseases

$$\phi \simeq \psi.$$

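So we take $\hat\psi = ad/bc$ and substitute it for the relative risk $\phi$ to obtain

$$\widehat{AR} = \hat\theta_1\left(1 - \frac{1}{\hat\psi}\right) = \frac{a}{a+c}\left(1 - \frac{bc}{ad}\right) = \frac{ad - bc}{d(a+c)}.$$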



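Finally, we derive an approximate standard error for $\widehat{AR}$.

Observe first that

$$1 - \widehat{AR} = \frac{d(a+c) - (ad - bc)}{d(a+c)} = \frac{c(b+d)}{d(a+c)} = \frac{1 - \hat\theta_1}{1 - \hat\theta_2},$$

where $\hat\theta_2 = b/(b+d)$ is the proportion of exposed controls.

Hence,

$$\log(1 - \widehat{AR}) = \log(1 - \hat\theta_1) - \log(1 - \hat\theta_2).$$

Next, observe that $\hat\theta_1$ and $\hat\theta_2$ are simply observed proportions from two independent binomial samples. Hence,

$$\mathrm{Var}(\hat\theta_1) = \frac{\theta_1(1-\theta_1)}{m_1}, \qquad \mathrm{Var}(\hat\theta_2) = \frac{\theta_2(1-\theta_2)}{m_2},$$

where $m_1$ = number of cases and $m_2$ = number of controls.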

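So by the δ-method, we find

$$\mathrm{Var}\{\log(1 - \widehat{AR})\} \simeq \frac{\mathrm{Var}(\hat\theta_1)}{(1-\theta_1)^2} + \frac{\mathrm{Var}(\hat\theta_2)}{(1-\theta_2)^2} = \frac{\theta_1}{m_1(1-\theta_1)} + \frac{\theta_2}{m_2(1-\theta_2)}.$$

We estimate this by

$$\widehat{\mathrm{Var}}\{\log(1 - \widehat{AR})\} = \frac{a}{c(a+c)} + \frac{b}{d(b+d)}.$$

Furthermore, the large-sample distribution of $\log(1 - \widehat{AR})$ is approximately normal.

We obtain an approximate confidence interval for AR by back-transforming the confidence limits for $\log(1 - AR)$: if these limits are $(L, U)$, the limits for AR are $(1 - e^{U}, 1 - e^{L})$.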
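A sketch of the full estimation recipe in Python, for hypothetical counts a, b, c, d laid out as in the table above; the function name and the default 1.96 (a 95% interval) are assumptions.

```python
from math import exp, log, sqrt


def ar_case_control(a, b, c, d, z=1.96):
    """Estimate AR = (ad - bc) / (d(a + c)) with a CI built on log(1 - AR)."""
    ar_hat = (a * d - b * c) / (d * (a + c))
    var_log = a / (c * (a + c)) + b / (d * (b + d))  # Var{log(1 - AR)} estimate
    centre = log(1 - ar_hat)
    lo_log = centre - z * sqrt(var_log)
    hi_log = centre + z * sqrt(var_log)
    # 1 - AR = exp(.) decreases as AR increases, so the upper log limit
    # gives the lower AR limit and vice versa
    return ar_hat, var_log, (1 - exp(hi_log), 1 - exp(lo_log))


ar_hat, var_log, (ar_lo, ar_hi) = ar_case_control(40, 20, 60, 80)
print(ar_hat, round(var_log, 6), round(ar_lo, 4), round(ar_hi, 4))
```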



4 Inference for the 2x2 table

4.1 Introduction

Consider the 2x2 table:

| | D | $\bar{D}$ | |
|---|---|---|---|
| R | a | b | $n_1 = a + b$ |
| $\bar{R}$ | c | d | $n_2 = c + d$ |
| | $m_1 = a + c$ | $m_2 = b + d$ | $n_1 + n_2$ |

Depending on the sampling scheme we will model

the prospective case:

a ∼ B(n1, π1)

c ∼ B(n2, π2) independently

or

the retrospective case:

a ∼ B(m1, ρ1)

b ∼ B(m2, ρ2) independently.

In both cases we want inference about

$$\psi = \frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)} = \frac{\rho_1/(1-\rho_1)}{\rho_2/(1-\rho_2)}.$$

4.2 Wald tests

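An obvious way to proceed is to take

$$\hat\psi = \frac{ad}{bc}.$$

We can then prove (using the δ-method) that, in the prospective case,

$$\mathrm{Var}(\log\hat\psi) \approx \frac{1}{n_1\pi_1} + \frac{1}{n_1(1-\pi_1)} + \frac{1}{n_2\pi_2} + \frac{1}{n_2(1-\pi_2)} \implies \widehat{\mathrm{Var}}(\log\hat\psi) = \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}.$$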

Suppose we wish to test

$$H_0: \psi = 1 \iff H_0: \log\psi = 0.$$

In this case we use a Wald test, i.e., take

$$Z_0 = \frac{\log\hat\psi}{\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}}.$$

Under $H_0$, the asymptotic distribution of $Z_0$ is $N(0, 1)$.

So to obtain a test with significance level $2\alpha$, we reject when

$$|Z_0| \ge Z(\alpha).$$

An unsatisfactory aspect of Wald tests is that they are not invariant under transformations of the parameters.

For example, observe that

$$\psi = \frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)} = 1 \iff \pi_1 - \pi_2 = 0.$$

Hence we could instead use the standard Wald test for two binomial proportions. In the prospective case, we take $\hat\pi_1 = a/n_1$ and $\hat\pi_2 = c/n_2$, and since $\hat\pi_1, \hat\pi_2$ are independent, we find

$$\mathrm{Var}(\hat\pi_1 - \hat\pi_2) = \frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}.$$

We can therefore define an alternative Wald test statistic by

$$Z_1 = \frac{\hat\pi_1 - \hat\pi_2}{\sqrt{\frac{\hat\pi_1(1-\hat\pi_1)}{n_1} + \frac{\hat\pi_2(1-\hat\pi_2)}{n_2}}}.$$



For the retrospective model, the Wald test statistic is of the form

$$Z_1 = \frac{\hat\rho_1 - \hat\rho_2}{\sqrt{\frac{\hat\rho_1(1-\hat\rho_1)}{m_1} + \frac{\hat\rho_2(1-\hat\rho_2)}{m_2}}}.$$

Observe that $Z_0, Z_1$ are different test statistics for the same $H_0$. There is no theoretical reason to prefer one over the other.

For large samples at least, the different Wald statistics are often numerically similar.

Example: In a retrospective study of the association between Hodgkin's disease and tonsillectomy, the following data were recorded:

| Tonsillectomy (R) | Hodgkin's (D): yes | Hodgkin's (D): no | Total |
|---|---|---|---|
| yes | 90 | 165 | 255 |
| no | 84 | 307 | 391 |
| Total | 174 | 472 | 646 |

In this case we find

$$Z_0 = \frac{\log\left(\frac{90 \times 307}{84 \times 165}\right)}{\sqrt{\frac{1}{90} + \frac{1}{84} + \frac{1}{165} + \frac{1}{307}}} \implies Z_0 = 3.8367.$$

Clearly we reject $H_0$ and conclude that there is a significant association (i.e., $\psi \ne 1$) between Hodgkin's disease and tonsillectomy.

Here, $\hat\psi = 1.9935$, and tonsillectomy is associated with almost twice the risk of Hodgkin's disease.

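To get $Z_1$, observe that $\hat\rho_1 = 90/174$ and $\hat\rho_2 = 165/472$; then

$$Z_1 = \frac{\hat\rho_1 - \hat\rho_2}{\sqrt{\frac{\hat\rho_1(1-\hat\rho_1)}{m_1} + \frac{\hat\rho_2(1-\hat\rho_2)}{m_2}}} = 3.8296,$$

and we reach the same conclusion.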
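Both Wald statistics for the Hodgkin's disease data can be recomputed with a short Python sketch (variable names are illustrative).

```python
from math import log, sqrt

a, b, c, d = 90, 165, 84, 307   # Hodgkin's disease vs tonsillectomy table
m1, m2 = a + c, b + d           # numbers of cases and controls

# Z0: Wald test on the log odds ratio
z0 = log(a * d / (b * c)) / sqrt(1 / a + 1 / b + 1 / c + 1 / d)

# Z1: Wald test comparing exposure proportions in cases and controls
rho1, rho2 = a / m1, b / m2
z1 = (rho1 - rho2) / sqrt(rho1 * (1 - rho1) / m1 + rho2 * (1 - rho2) / m2)
print(round(z0, 4), round(z1, 4))
```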



4.3 Likelihood Ratio test

An alternative method of inference is based on the likelihood:

Consider a statistical problem in which the distribution of data $X$ depends on an unknown parameter $\theta$.

That is, the probability distribution is $p(x; \theta)$. The log likelihood is then

$$\ell(\theta; x) = \log\{p(x; \theta)\}.$$

Given a likelihood function $\ell(\theta; x)$, we can

i) Estimate $\theta$ by maximising the likelihood, i.e., $\hat\theta = \arg\max_\theta \ell(\theta; x)$. Usually, $\hat\theta$ is the solution to $\partial\ell/\partial\theta = 0$.

ii) Under regularity conditions, find a large-sample variance for $\hat\theta$. That is, we use $\mathrm{Var}(\hat\theta) = 1/I(\theta)$, where

$$I(\theta) = E\left[-\frac{\partial^2}{\partial\theta^2}\ell(\theta; x)\right].$$

iii) Obtain tests of $H_0: \theta = \theta_0$. The Likelihood Ratio (LR) test statistic is

$$G^2 = 2[\ell(\hat\theta; x) - \ell(\theta_0; x)].$$

For large samples (with $\theta$ scalar), the null hypothesis distribution of $G^2$ is approximately $\chi^2_1$.

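Recall our situation:

| | D | $\bar{D}$ | |
|---|---|---|---|
| R | $a \equiv X_1$ | b | $n_1$ |
| $\bar{R}$ | $c \equiv X_2$ | d | $n_2$ |
| | $m_1$ | $m_2$ | |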



We assume $X_i \sim B(n_i, \pi_i)$ independently, and we want to make inferences for

$$\psi = \frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)}.$$

We stated above that the likelihood approach to hypothesis testing for a scalar parameter $\theta$ led to the LRT statistic

$$G^2 = 2[\ell(\hat\theta) - \ell(\theta_0)].$$

For large samples, $G^2$ has approximately the $\chi^2_1$ distribution if $H_0: \theta = \theta_0$ holds.

However, in the case of the 2x2 table, we cannot apply this directly because there is also a nuisance parameter. In particular, it is natural to parameterize $(\pi_1, \pi_2)$ by

$$\psi = \frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)}, \qquad \omega_2 = \frac{\pi_2}{1-\pi_2}.$$

In this case we treat $\omega_2$ as a nuisance parameter. There are various approaches to eliminating the nuisance parameter $\omega_2$.

Before we consider these various approaches, we will consider the likelihood and the aforementioned parameterisation:

Consider the prospective model with $a + b$ = number exposed and $c + d$ = number unexposed; these are considered fixed.

The model is a 'product binomial model', where

$$L \propto \pi_1^a(1-\pi_1)^b\pi_2^c(1-\pi_2)^d = \left(\frac{\pi_1}{1-\pi_1}\right)^a(1-\pi_1)^{a+b}\left(\frac{\pi_2}{1-\pi_2}\right)^c(1-\pi_2)^{c+d}.$$

Let $w_1 = \pi_1/(1-\pi_1)$ and $w_2 = \pi_2/(1-\pi_2)$.

Now, interest is in $\psi = w_1/w_2$. So we reparametrize in terms of $w_2$ and $\psi$, i.e., write

$$w_1 = \psi w_2.$$



We treat $w_2$ as a 'nuisance parameter'.

Then

$$L = (\psi w_2)^a\left(\frac{1}{1+\psi w_2}\right)^{a+b} w_2^c\left(\frac{1}{1+w_2}\right)^{c+d}$$

and

$$\ell = \log L = a\log(\psi w_2) - (a+b)\log(1+\psi w_2) + c\log w_2 - (c+d)\log(1+w_2).$$

Now consider the approaches to eliminating the nuisance parameter ω2.

4.3.1 Profile Likelihood

The idea is to eliminate $\omega_2$ by maximizing over it, i.e., to replace $\omega_2$ with its most likely value. In particular, consider the likelihood function $\ell(\psi, \omega_2; x)$.

We can define

$$\hat\omega_2(\psi) = \arg\max_{\omega_2}\ell(\psi, \omega_2; x) \quad \text{for each } \psi.$$

The profile likelihood is then

$$\ell_p(\psi; x) = \ell(\psi, \hat\omega_2(\psi); x).$$

Provided the number of nuisance parameters is fixed, the profile likelihood behaves asymptotically like an ordinary likelihood function.

This leads to the LRT statistic

$$G^2 = 2[\ell_p(\hat\psi) - \ell_p(\psi_0)],$$

where $\hat\psi = \arg\max_\psi \ell_p(\psi; x)$.

Under $H_0: \psi = \psi_0$, $G^2$ has approximately the $\chi^2_1$ distribution for large samples.

For the comparison of two binomial probabilities π1, π2, we test H0 : ψ = 1.
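The profile-likelihood recipe can also be carried out numerically, without any closed form. This Python sketch maximises the log likelihood above over the nuisance parameter $w_2$ for fixed $\psi$, using a ternary search on $t = \log w_2$ (valid here because $\ell$ is concave in $t$). The function names, search bounds and iteration count are illustrative choices, and the counts are the Hodgkin's disease data from the Wald-test example.

```python
from math import exp, log


def loglik(psi, w2, a, b, c, d):
    """Product-binomial log likelihood in the (psi, w2) parameterisation."""
    return (a * log(psi * w2) - (a + b) * log(1 + psi * w2)
            + c * log(w2) - (c + d) * log(1 + w2))


def profile_loglik(psi, a, b, c, d, lo=-30.0, hi=30.0, iters=200):
    """l_p(psi): maximise over w2 = exp(t) by ternary search on t."""
    def f(t):
        return loglik(psi, exp(t), a, b, c, d)

    for _ in range(iters):
        t1 = lo + (hi - lo) / 3
        t2 = hi - (hi - lo) / 3
        if f(t1) < f(t2):
            lo = t1  # the maximiser lies in [t1, hi]
        else:
            hi = t2  # the maximiser lies in [lo, t2]
    return f((lo + hi) / 2)


def profile_lrt(a, b, c, d, psi0=1.0):
    """G^2 = 2[l_p(psi_hat) - l_p(psi0)], with psi_hat = ad/(bc)."""
    psi_hat = a * d / (b * c)  # unrestricted MLE of psi
    return 2 * (profile_loglik(psi_hat, a, b, c, d)
                - profile_loglik(psi0, a, b, c, d))


g2_profile = profile_lrt(90, 165, 84, 307)
print(round(g2_profile, 3))
```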

It can be shown that

$$G^2 = 2\left(n_1P_1\log\frac{P_1}{P} + n_1(1-P_1)\log\frac{1-P_1}{1-P} + n_2P_2\log\frac{P_2}{P} + n_2(1-P_2)\log\frac{1-P_2}{1-P}\right),$$

where $P_i = X_i/n_i$ and $P = (X_1 + X_2)/(n_1 + n_2)$.

To see how this formula arises, we use the fact that the likelihood function is invari-ant under transformation of the parameters.

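According to our definition,

$$G^2 = 2\left[\max_{(\psi,\omega_2)} \ell(\psi,\omega_2) - \max_{\omega_2} \ell(1,\omega_2)\right].$$

However, it is much simpler to observe that the log-likelihood can equivalently be parametrized by $\pi_1, \pi_2$. Writing this log-likelihood as $\ell^*(\pi_1,\pi_2;x)$, where

$$\ell(\psi,\omega_2;x) = \ell^*(\pi_1,\pi_2;x), \qquad \psi = \frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)}, \qquad \omega_2 = \frac{\pi_2}{1-\pi_2},$$

and noting that $\psi = 1$ if and only if $\pi_1 = \pi_2$, we have

$$G^2 = 2\left(\max_{(\pi_1,\pi_2)} \ell^*(\pi_1,\pi_2;x) - \max_{\pi} \ell^*(\pi,\pi;x)\right).$$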

Now

$$\ell^*(\pi_1,\pi_2;x) = x_1\log\pi_1 + (n_1-x_1)\log(1-\pi_1) + x_2\log\pi_2 + (n_2-x_2)\log(1-\pi_2).$$

To maximize, observe that $\ell^*$ is the sum of a function of $\pi_1$ alone and a function of $\pi_2$ alone, so we can treat each part separately.

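Taking

$$g(\pi_1) = x_1\log\pi_1 + (n_1-x_1)\log(1-\pi_1),$$

$$g'(\pi_1) = \frac{x_1}{\pi_1} - \frac{n_1-x_1}{1-\pi_1} = \frac{x_1(1-\pi_1) - \pi_1(n_1-x_1)}{\pi_1(1-\pi_1)} = \frac{x_1 - n_1\pi_1}{\pi_1(1-\pi_1)} = 0 \iff \pi_1 = \frac{x_1}{n_1}.$$

Hence $\hat\pi_1 = P_1 = x_1/n_1$. Similarly, $\hat\pi_2 = P_2 = x_2/n_2$.

Finally, we find by the same method

$$\arg\max_{\pi} \ell^*(\pi,\pi;x) = \frac{x_1 + x_2}{n_1 + n_2}.$$

Substituting these maximizers into $\ell^*$ and collecting terms gives the formula for $G^2$ stated above.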
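The same $G^2$ can be computed exactly along the route just derived: evaluate $\ell^*$ at the unrestricted maximizers $(P_1, P_2)$ and at the restricted maximizer $(P, P)$. This is a sketch with hypothetical function names and a hypothetical table; it also checks that nudging $\pi_1$ away from $x_1/n_1$ lowers $\ell^*$.

```python
import math

def loglik_star(x1, n1, x2, n2, pi1, pi2):
    # l*(pi1, pi2; x), omitting the binomial coefficients (they cancel in G^2)
    return (x1 * math.log(pi1) + (n1 - x1) * math.log(1 - pi1)
            + x2 * math.log(pi2) + (n2 - x2) * math.log(1 - pi2))

def g2_via_pi(x1, n1, x2, n2):
    # Plug in the maximizers derived above: (P1, P2) unrestricted, P under pi1 = pi2
    P1, P2 = x1 / n1, x2 / n2
    P = (x1 + x2) / (n1 + n2)
    return 2 * (loglik_star(x1, n1, x2, n2, P1, P2)
                - loglik_star(x1, n1, x2, n2, P, P))

# Hypothetical table; perturbing pi1 away from x1/n1 lowers l*, as the derivation predicts
x1, n1, x2, n2 = 12, 20, 5, 20
best = loglik_star(x1, n1, x2, n2, x1 / n1, x2 / n2)
print(best > loglik_star(x1, n1, x2, n2, x1 / n1 + 0.01, x2 / n2))  # True
print(g2_via_pi(x1, n1, x2, n2))
```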

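Example: Hodgkin's disease versus tonsillectomy.

$G^2 = 14.191$ with 1 df (for comparison with the previous Wald test), so $\sqrt{G^2} = 3.767$. The LRT is invariant under reparametrization, so this value is always the same whichever parametrization is used.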

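In what follows, consider the following $2\times 2$ table:

$$\begin{array}{c|cc|c} & D & \bar D & \\ \hline R & a \equiv x_1 & b & n_1 \\ \bar R & c \equiv x_2 & d & n_2 \\ \hline & m_1 & m_2 & N \end{array}$$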

We now consider a second approach to eliminating the nuisance parameter.


4.3.2 Conditional Inference

General description:

Consider a problem in which $p(x;\psi,\omega_2)$ is given, $\psi$ is the parameter of interest and $\omega_2$ is a nuisance parameter.

Another way to eliminate $\omega_2$ is to find a sufficient statistic for $\omega_2$ and to condition on it.

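That is, we would like to find $s(x)$ such that

$$p(x;\psi,\omega_2) = q(x;\psi)\,h(s(x);\psi,\omega_2),$$

where the factor $h$ depends on the data only through $s(x)$ (it may also depend on $\psi$, as it does in the $2\times 2$ case below). Then we could calculate the conditional likelihood

$$p(x \mid s(x);\psi,\omega_2) = \frac{p(x;\psi,\omega_2)}{\sum_{x':\,s(x')=s(x)} p(x';\psi,\omega_2)}.$$

This is $P(A \mid B) = P(A\cap B)/P(B)$ with $A = \{X = x\}$ and $B = \{s(X) = s(x)\}$; since $A \subseteq B$, $P(A \cap B) = P(A)$. Hence

$$p(x \mid s(x);\psi,\omega_2) = \frac{q(x;\psi)\,h(s(x);\psi,\omega_2)}{\sum_{x':\,s(x')=s(x)} q(x';\psi)\,h(s(x');\psi,\omega_2)} = \frac{q(x;\psi)}{\sum_{x':\,s(x')=s(x)} q(x';\psi)},$$

because $h(s(x');\psi,\omega_2) = h(s(x);\psi,\omega_2)$ for every $x'$ in the sum, so it cancels.

Then we can use the conditional likelihood $p(x \mid s(x);\psi)$, which is free of $\omega_2$, to make inference about $\psi$.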

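For the problem at hand, observe that

$$p(x_1,x_2) = \binom{n_1}{x_1}\pi_1^{x_1}(1-\pi_1)^{n_1-x_1}\binom{n_2}{x_2}\pi_2^{x_2}(1-\pi_2)^{n_2-x_2} = \binom{n_1}{x_1}\binom{n_2}{x_2}\left(\frac{\pi_1}{1-\pi_1}\right)^{x_1}(1-\pi_1)^{n_1}\left(\frac{\pi_2}{1-\pi_2}\right)^{x_2}(1-\pi_2)^{n_2}.$$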

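Now let

$$\psi = \frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)}, \qquad \omega_2 = \frac{\pi_2}{1-\pi_2},$$

and observe that

$$\pi_2 = \frac{\omega_2}{1+\omega_2} \implies 1-\pi_2 = \frac{1}{1+\omega_2}, \qquad \pi_1 = \frac{\psi\omega_2}{1+\psi\omega_2} \implies 1-\pi_1 = \frac{1}{1+\psi\omega_2}.$$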

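Hence we have

$$\begin{aligned} p(x_1,x_2;\psi,\omega_2) &= \binom{n_1}{x_1}\binom{n_2}{x_2}(\psi\omega_2)^{x_1}\omega_2^{x_2}\left(\frac{1}{1+\omega_2}\right)^{n_2}\left(\frac{1}{1+\psi\omega_2}\right)^{n_1} \\ &= \binom{n_1}{x_1}\binom{n_2}{x_2}\psi^{x_1}\omega_2^{x_1+x_2}\left(\frac{1}{1+\omega_2}\right)^{n_2}\left(\frac{1}{1+\psi\omega_2}\right)^{n_1} \\ &= \binom{n_1}{x_1}\binom{n_2}{x_2}\psi^{x_1}\omega_2^{s(x_1,x_2)}\left(\frac{1}{1+\omega_2}\right)^{n_2}\left(\frac{1}{1+\psi\omega_2}\right)^{n_1}, \end{aligned}$$

where $s(x_1,x_2) = x_1 + x_2$. So, for fixed $\psi$, $s(x_1,x_2)$ is sufficient for $\omega_2$.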

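Hence

$$\begin{aligned} p(x_1 \mid x_1+x_2 = m_1;\psi,\omega_2) &= \frac{\binom{n_1}{x_1}\binom{n_2}{m_1-x_1}\psi^{x_1}\left(\omega_2^{m_1}\dfrac{1}{(1+\omega_2)^{n_2}}\dfrac{1}{(1+\psi\omega_2)^{n_1}}\right)}{\sum_{x_1'}\binom{n_1}{x_1'}\binom{n_2}{m_1-x_1'}\psi^{x_1'}\left(\omega_2^{m_1}\dfrac{1}{(1+\omega_2)^{n_2}}\dfrac{1}{(1+\psi\omega_2)^{n_1}}\right)} \\ &= \frac{\binom{n_1}{x_1}\binom{n_2}{m_1-x_1}\psi^{x_1}}{\sum_{x_1'}\binom{n_1}{x_1'}\binom{n_2}{m_1-x_1'}\psi^{x_1'}}. \end{aligned}$$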

We can now make inference for $\psi$ based on the conditional distribution of $X_1 \mid X_1 + X_2 = m_1$.

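For convenience, call

$$q^*(x_1;\psi) = \frac{\binom{n_1}{x_1}\binom{n_2}{m_1-x_1}\psi^{x_1}}{\sum_{x_1'}\binom{n_1}{x_1'}\binom{n_2}{m_1-x_1'}\psi^{x_1'}} \qquad \text{(non-central hypergeometric)},$$

the sum being over all $x_1'$ satisfying

$$0 \le x_1' \le n_1, \qquad 0 \le m_1 - x_1' \le n_2.$$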
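The definition of $q^*$ and its support translate directly into code. A minimal sketch follows; the names `support` and `q_star` and the margins in the example are hypothetical. The two range conditions above combine to $\max(0, m_1 - n_2) \le x_1' \le \min(n_1, m_1)$.

```python
from math import comb

def support(n1, n2, m1):
    # Legitimate values of x1: 0 <= x1 <= n1 and 0 <= m1 - x1 <= n2
    return range(max(0, m1 - n2), min(n1, m1) + 1)

def q_star(x1, n1, n2, m1, psi):
    # Non-central hypergeometric probability P[X1 = x1 | X1 + X2 = m1]
    def weight(u):
        return comb(n1, u) * comb(n2, m1 - u) * psi ** u
    return weight(x1) / sum(weight(u) for u in support(n1, n2, m1))

# Hypothetical margins: n1 = 10, n2 = 6, m1 = 9
print([round(q_star(u, 10, 6, 9, 2.0), 4) for u in support(10, 6, 9)])
```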

The conditional maximum likelihood estimate of $\psi$ can be found as

$$\hat\psi_{CML} = \arg\max_{\psi} q^*(x_1;\psi).$$
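$\hat\psi_{CML}$ has no closed form in general, so a numerical search is needed. The sketch below, under stated assumptions, maximizes $\log q^*$ over $\eta = \log\psi$ by golden-section search (in $\eta$ the log conditional likelihood is concave, being linear minus a log-sum-exp); it assumes $x_1$ lies strictly inside its support, since otherwise the estimate is $0$ or infinite. The function names, search bounds and example table are hypothetical.

```python
import math
from math import comb

def log_q_star(x1, n1, n2, m1, eta):
    # log q*(x1; psi) with eta = log(psi); the normalizer uses log-sum-exp for stability
    us = range(max(0, m1 - n2), min(n1, m1) + 1)
    terms = [math.log(comb(n1, u) * comb(n2, m1 - u)) + u * eta for u in us]
    top = max(terms)
    log_norm = top + math.log(sum(math.exp(t - top) for t in terms))
    return math.log(comb(n1, x1) * comb(n2, m1 - x1)) + x1 * eta - log_norm

def psi_cml(x1, n1, n2, m1, lo=-15.0, hi=15.0, iters=120):
    # Golden-section search for the maximizer of log q* over eta = log(psi);
    # x1 must lie strictly inside its support, otherwise the estimate is 0 or infinite
    g = (math.sqrt(5) - 1) / 2
    f = lambda eta: log_q_star(x1, n1, n2, m1, eta)
    e1, e2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = f(e1), f(e2)
    for _ in range(iters):
        if f1 < f2:            # maximum lies in [e1, hi]
            lo, e1, f1 = e1, e2, f2
            e2 = lo + g * (hi - lo)
            f2 = f(e2)
        else:                  # maximum lies in [lo, e2]
            hi, e2, f2 = e2, e1, f1
            e1 = hi - g * (hi - lo)
            f1 = f(e1)
    return math.exp((lo + hi) / 2)

# Hypothetical table: x1 = 12, n1 = 20, n2 = 20, m1 = 17
print(psi_cml(12, 20, 20, 17))
```

Setting the derivative of $\log q^*$ in $\eta$ to zero shows that, at $\hat\psi_{CML}$, the conditional mean of $X_1$ equals the observed $x_1$; this gives a convenient check on the numerical answer.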


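If we want to test $H_0: \psi = 1$, we need only consider $q^*(x_1;1) \overset{\text{def}}{=} q_0^*(x_1)$.

Substituting $\psi = 1$, and using $\sum_{u}\binom{n_1}{u}\binom{n_2}{m_1-u} = \binom{n_1+n_2}{m_1}$ (Vandermonde's identity), we find

$$q_0^*(x_1) = \frac{\binom{n_1}{x_1}\binom{n_2}{m_1-x_1}}{\binom{n_1+n_2}{m_1}} = \frac{\binom{n_1}{x_1}\binom{n_2}{m_1-x_1}}{\binom{N}{m_1}} = \frac{\binom{m_1}{x_1}\binom{m_2}{n_1-x_1}}{\binom{N}{n_1}}.$$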
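The three expressions for $q_0^*(x_1)$ can be checked with exact rational arithmetic. This sketch uses Python's `fractions.Fraction` so the comparisons are exact; the function name and the margins ($n_1 = 10$, $n_2 = 6$, $m_1 = 9$) are hypothetical.

```python
from fractions import Fraction
from math import comb

def q0_three_ways(x1, n1, n2, m1):
    # q0*(x1) computed exactly: (i) normalizing by the sum over the support,
    # (ii) dividing by C(N, m1), (iii) the column form C(m1,x1) C(m2,n1-x1) / C(N,n1)
    N, m2 = n1 + n2, n1 + n2 - m1
    us = range(max(0, m1 - n2), min(n1, m1) + 1)
    by_sum = Fraction(comb(n1, x1) * comb(n2, m1 - x1),
                      sum(comb(n1, u) * comb(n2, m1 - u) for u in us))
    by_rows = Fraction(comb(n1, x1) * comb(n2, m1 - x1), comb(N, m1))
    by_cols = Fraction(comb(m1, x1) * comb(m2, n1 - x1), comb(N, n1))
    return by_sum, by_rows, by_cols

print(q0_three_ways(8, 10, 6, 9))  # all three equal 27/1144
```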

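To summarize, we have shown that if $X_1, X_2$ are independent with $X_i \sim B(n_i, \pi_i)$, then

$$q^*(x_1) \overset{\text{def}}{=} P[X_1 = x_1 \mid X_1 + X_2 = m_1] = \frac{\binom{n_1}{x_1}\binom{n_2}{m_1-x_1}\psi^{x_1}}{\sum_u \binom{n_1}{u}\binom{n_2}{m_1-u}\psi^{u}}.$$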

Tests of $H_0: \psi = 1$ are based on the null hypothesis distribution

$$q_0^*(x_1) = q^*(x_1)\big|_{\psi=1} = \frac{\binom{n_1}{x_1}\binom{n_2}{m_1-x_1}}{\binom{N}{m_1}} = \frac{\binom{m_1}{x_1}\binom{m_2}{n_1-x_1}}{\binom{N}{n_1}}.$$

To test H0, we can use


1) An approximate Z-test.

From $q_0^*$ we have $E[X_1] = n_1P$, where

$$P = \frac{m_1}{N} = \frac{x_1+x_2}{n_1+n_2},$$

and

$$\mathrm{Var}(X_1) = \frac{N-n_1}{N-1}\,n_1P(1-P).$$

So we can use the approximate Z-statistic

$$Z = \frac{x_1 - n_1P}{\sqrt{\dfrac{N-n_1}{N-1}\,n_1P(1-P)}}.$$

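Observe that, since $N - n_1 = n_2$,

$$Z = \frac{[(n_1+n_2)x_1 - n_1(x_1+x_2)]/(n_1+n_2)}{\sqrt{\dfrac{1}{N-1}\,n_1n_2P(1-P)}} = \frac{n_2x_1 - n_1x_2}{\sqrt{\dfrac{N(n_1+n_2)}{N-1}\,n_1n_2P(1-P)}} = \frac{(n_2x_1 - n_1x_2)/(n_1n_2)}{\sqrt{\dfrac{N}{N-1}\,\dfrac{n_1+n_2}{n_1n_2}\,P(1-P)}} = \frac{P_1 - P_2}{\sqrt{\dfrac{N}{N-1}\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)P(1-P)}},$$

where $P_i = x_i/n_i$.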
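The chain of equalities can be confirmed numerically: the first and last forms of $Z$ should agree for any table. A minimal sketch with hypothetical function names and tables:

```python
import math

def z_conditional(x1, n1, x2, n2):
    # Z = (x1 - n1 P) / sqrt((N - n1)/(N - 1) n1 P (1 - P)), from the null hypergeometric
    N = n1 + n2
    P = (x1 + x2) / N
    return (x1 - n1 * P) / math.sqrt((N - n1) / (N - 1) * n1 * P * (1 - P))

def z_difference_form(x1, n1, x2, n2):
    # Equivalent form (P1 - P2) / sqrt(N/(N - 1) (1/n1 + 1/n2) P (1 - P))
    N = n1 + n2
    P1, P2, P = x1 / n1, x2 / n2, (x1 + x2) / N
    return (P1 - P2) / math.sqrt(N / (N - 1) * (1 / n1 + 1 / n2) * P * (1 - P))

for table in [(12, 20, 5, 20), (3, 15, 9, 12)]:   # hypothetical tables
    print(z_conditional(*table), z_difference_form(*table))
```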

Note: The Z-statistic above differs from the score test statistic for testing $H_0: \pi_1 = \pi_2$ only through the extra factor $N/(N-1)$.

The approximate significance level is obtained from the $N(0,1)$ distribution.

Provided that the cell frequencies all exceed 5, the normal approximation is adequate.
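To make the note concrete, the sketch below computes the score statistic (pooled $P$, no $N/(N-1)$ factor) alongside the conditional $Z$, and a two-sided normal P-value using the identity $2[1-\Phi(|z|)] = \operatorname{erfc}(|z|/\sqrt 2)$ with Python's `math.erfc`. Function names and the table are hypothetical; the ratio of the two statistics is $\sqrt{(N-1)/N}$, which is close to 1 unless $N$ is small.

```python
import math

def z_score(x1, n1, x2, n2):
    # Score statistic for H0: pi1 = pi2 (pooled P, no N/(N-1) factor)
    P1, P2, P = x1 / n1, x2 / n2, (x1 + x2) / (n1 + n2)
    return (P1 - P2) / math.sqrt((1 / n1 + 1 / n2) * P * (1 - P))

def z_conditional(x1, n1, x2, n2):
    # Z from the hypergeometric mean and variance of X1 given the margins
    N = n1 + n2
    P = (x1 + x2) / N
    return (x1 - n1 * P) / math.sqrt((N - n1) / (N - 1) * n1 * P * (1 - P))

def two_sided_p(z):
    # Two-sided P-value from N(0, 1): 2[1 - Phi(|z|)] = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

x1, n1, x2, n2 = 12, 20, 5, 20   # hypothetical table
z = z_conditional(x1, n1, x2, n2)
print(z, z_score(x1, n1, x2, n2), two_sided_p(z))
```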

2) If the cell frequencies are small, an exact P-value can be obtained from $q_0^*(x_1)$. This is called Fisher's exact test.

In this case we define

$$P\text{-value} = \sum_{u:\,q_0^*(u) \le q_0^*(x_1)} q_0^*(u).$$


This corresponds to the following:

Figure 16: Hypergeometric distribution

In practice, we can evaluate the exact P-value as follows:

1) Find all legitimate values for $x_1$. The easiest way to do this for small samples is to list all possible $2\times 2$ tables of non-negative integers having the prescribed margins.

2) Calculate $q_0^*(x_1)$ for each legitimate value.

3) Calculate the P-value shown above. (Start from the two most extreme tables and work inwards, adding the probability of every table whose probability does not exceed that of the observed table.)
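Steps 1) to 3) can be sketched in a few lines. This minimal version assumes the table is given as $x_1, n_1, x_2, n_2$ (so $m_1 = x_1 + x_2$); it works with the integer numerators $\binom{n_1}{u}\binom{n_2}{m_1-u}$ so that the comparison $q_0^*(u) \le q_0^*(x_1)$ is exact, avoiding floating-point ties. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def fisher_exact_p(x1, n1, x2, n2):
    # Two-sided Fisher exact P-value: rows (x1, n1 - x1) and (x2, n2 - x2)
    m1, N = x1 + x2, n1 + n2
    # 1) legitimate values of x1 given the margins
    support = range(max(0, m1 - n2), min(n1, m1) + 1)
    # 2) numerators of q0*(u); the common denominator C(N, m1) cancels in comparisons
    weight = {u: comb(n1, u) * comb(n2, m1 - u) for u in support}
    observed = weight[x1]
    # 3) sum the probabilities of all tables no more probable than the observed one
    tail = sum(w for w in weight.values() if w <= observed)
    return Fraction(tail, comb(N, m1))

print(fisher_exact_p(8, 10, 1, 6), float(fisher_exact_p(8, 10, 1, 6)))
```

For the hypothetical table with rows $(8, 2)$ and $(1, 5)$, the tables at least as extreme as the observed one are $x_1 = 3, 8, 9$, giving $P = 400/11440 = 5/143 \approx 0.035$.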


5 Tests based on the likelihood

There are three tests connected to the log-likelihood, all of which are asymptotically equivalent:

(i) Likelihood Ratio test;

(ii) Wald test;

(iii) Score test.

5.1 Wald test statistic

Denoted $W_e$; we consider $H_0: \theta = \theta_0$ vs. $H_A: \theta \ne \theta_0$.

$W_e$ makes direct use of the m.l.e. $\hat\theta$, and is based on the distance between $\hat\theta$ and $\theta_0$, the null hypothesis value.

It is defined to be

$$W_e(x) = (\hat\theta - \theta_0)\sqrt{I(\hat\theta)},$$

where $I(\hat\theta)$ is the expected information evaluated at $\hat\theta$.

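Recall that

$$E\left[-\frac{\partial^2}{\partial\theta^2}\ell(\theta;Y)\right] = I(\theta),$$

and that $I(\theta)$ is the variance of the score $U = \partial\ell/\partial\theta$, where $E(\partial\ell/\partial\theta) = 0$ (assuming sufficient regularity, e.g. that the range of $x$ does not depend on $\theta$).

We also know that, asymptotically, $\mathrm{Var}(\hat\theta) \approx 1/I(\theta)$, and we estimate $\mathrm{Var}(\hat\theta)$ by $1/I(\hat\theta)$. (For vector $\theta$, $I(\theta)$ is a matrix, and $\mathrm{Var}(\hat\theta) \approx I(\theta)^{-1}$.)

So

$$W_e(x) = \frac{\hat\theta - \theta_0}{1/\sqrt{I(\hat\theta)}} = (\hat\theta - \theta_0)\sqrt{I(\hat\theta)}.$$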
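As an illustration, take a single binomial observation $X \sim B(n,\theta)$; this model is an assumption chosen for the example, not one used in the notes at this point. Then $\hat\theta = x/n$ and $I(\theta) = n/[\theta(1-\theta)]$, so $W_e$ can be computed directly. The function name and data are hypothetical.

```python
import math

def wald_binomial(x, n, theta0):
    # W_e = (theta_hat - theta0) * sqrt(I(theta_hat)) for X ~ B(n, theta),
    # with I(theta) = n / (theta (1 - theta)); assumes 0 < x < n
    theta_hat = x / n
    info = n / (theta_hat * (1 - theta_hat))
    return (theta_hat - theta0) * math.sqrt(info)

w = wald_binomial(30, 50, 0.5)   # hypothetical data: 30 successes in 50 trials
print(w, w ** 2)                 # compare w with N(0, 1), or w**2 with chi^2_1
```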


Under $H_0: \theta = \theta_0$, $W_e(x)$ is approximately $N(0,1)$ in large samples or, equivalently, $W_e(x)^2$ is approximately $\chi^2_1$.

Graphically: refer to Figure 17. Note that $W_e$ is based on a quadratic approximation to $\ell$ that is most accurate in the region of the m.l.e.

Figure 17: Wald Test.

5.2 Likelihood ratio test statistic

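The general form of the LR test statistic for testing

$$H_0: \theta \in \Theta_0 \quad \text{versus} \quad H_a: \theta \in \Theta_0^c$$

(composite hypotheses), where $\Theta_0 \subseteq \Theta$ and $\Theta_0^c$ is the complement of $\Theta_0$ in $\Theta$, is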


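$$\lambda(x) = \frac{\sup_{\theta\in\Theta_0} L(\theta \mid x)}{\sup_{\theta\in\Theta} L(\theta \mid x)} \qquad \text{('ratio of likelihoods')}.$$

$\lambda$ takes values in $[0,1]$, and $H_0$ is rejected if $\lambda$ is 'too small'.

The level $\alpha$ rejection region is $\{x : \lambda(x) \le C_\alpha\}$, where $C_\alpha$ is chosen so that

$$\sup_{\theta\in\Theta_0} P_\theta\left(\lambda(X) \le C_\alpha\right) \le \alpha.$$

In general, we need the distribution of $\lambda(X)$ under $H_0$ to find the critical region, but this is often hard or impossible to obtain. Happily, Wilks' theorem comes to the rescue. The theorem is a consequence of the asymptotic normality of the m.l.e.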

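Wilks' Theorem

If $H_0$ is true, then (under regularity conditions) $-2\log\lambda(X)$ is asymptotically $\chi^2_{r-s}$ for every $\theta \in \Theta_0$, where $r = \dim\Theta$ is the number of independent parameters in $\Theta$, the unrestricted parameter space, and $s = \dim\Theta_0$ is the number in $\Theta_0$, the null hypothesis parameter space.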

Note that $W(x) = -2\log\lambda(x)$ is an equivalent test statistic, because a monotonic transformation of a test statistic, together with the corresponding transformation of the critical value, does not change the partition of the sample space into acceptance and rejection regions; hence it does not change the test procedure.

Consider scalar $\theta$: $W(x)$ is twice the vertical distance between $\ell$ at $\hat\theta$ and $\ell$ at $\theta_0$; refer to Figure 18.

Write the general LR test statistic as

$$W(x) = 2\left[\ell(\hat\theta;x) - \ell(\theta_0;x)\right].$$

For scalar $\theta$, $W(x)$ has an approximate $\chi^2_1$ distribution when $H_0: \theta = \theta_0$ is true.
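For the same illustrative binomial model $X \sim B(n,\theta)$ (an assumed example, not from the notes), $W(x) = 2[x\log(\hat\theta/\theta_0) + (n-x)\log\{(1-\hat\theta)/(1-\theta_0)\}]$. Since $\chi^2_1$ is the square of a standard normal, its upper tail is $P(\chi^2_1 > w) = \operatorname{erfc}(\sqrt{w/2})$, which avoids needing a chi-squared library. Names and data are hypothetical.

```python
import math

def lrt_binomial(x, n, theta0):
    # W = 2[l(theta_hat) - l(theta0)] for X ~ B(n, theta); assumes 0 < x < n
    theta_hat = x / n
    return 2 * (x * math.log(theta_hat / theta0)
                + (n - x) * math.log((1 - theta_hat) / (1 - theta0)))

def chi2_1_upper_tail(w):
    # P(chi^2_1 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2))
    return math.erfc(math.sqrt(w / 2))

w = lrt_binomial(30, 50, 0.5)    # hypothetical data: 30 successes in 50 trials
print(w, chi2_1_upper_tail(w))
```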


Figure 18: The Likelihood Ratio Test.

In the case of a $2\times 2$ table (e.g., case-control data), we cannot apply this directly because there is also a nuisance parameter $\omega_2$, as we discussed previously. So we base inference about $\psi$ on the profile likelihood $\ell_p$. Provided the number of nuisance parameters is fixed, $\ell_p$ behaves asymptotically like an ordinary likelihood function.

5.3 Score test statistic

The score test is based on the gradient and curvature of $\ell$ at $\theta_0$ (the null hypothesis value of the parameter). Refer to Figure 19.

The score test statistic is

$$W_u(x) = \frac{\ell'(\theta_0)^2}{I(\theta_0)},$$

where $I(\theta_0)$ is the Fisher information evaluated at the null hypothesis value, which equals the variance of the score $U$ under $H_0$. For scalar $\theta$, $W_u$ has an approximate $\chi^2_1$ distribution when $H_0$ is true.

The score test is based on a quadratic approximation to $\ell$ that is most accurate in the region of the null hypothesis value (and it is locally most powerful).
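Continuing the illustrative binomial model $X \sim B(n,\theta)$ (an assumption for the example), $\ell'(\theta_0) = x/\theta_0 - (n-x)/(1-\theta_0)$ and $I(\theta_0) = n/[\theta_0(1-\theta_0)]$, so $W_u = (x - n\theta_0)^2/[n\theta_0(1-\theta_0)]$. Note that only $\theta_0$ is needed, not the m.l.e. The function name and data are hypothetical.

```python
def score_binomial(x, n, theta0):
    # W_u = l'(theta0)^2 / I(theta0) for X ~ B(n, theta)
    score = x / theta0 - (n - x) / (1 - theta0)      # l'(theta0)
    info = n / (theta0 * (1 - theta0))               # I(theta0)
    return score ** 2 / info

print(score_binomial(30, 50, 0.5))   # hypothetical data; prints 2.0
```

For $x = 30$, $n = 50$ and $\theta_0 = 0.5$ this gives $W_u = 2.0$; working the other two statistics by hand for the same data gives $W_e^2 \approx 2.083$ and $W \approx 2.014$, close but not identical, as expected for asymptotically equivalent tests in a finite sample.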


Figure 19: The Score Test.

