Some Aspects of Bayesian and Frequentist Asymptotics
Jiahui Li
A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy
Graduate Department of Statistics
University of Toronto
© Copyright by Jiahui Li 1998
The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.
The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Some Aspects of Bayesian and Frequentist Asymptotics
Jiahui Li
Ph.D. 1998
Department of Statistics
University of Toronto
Abstract
In this thesis we consider various aspects of asymptotic theory in Bayesian inference and frequentist inference. We give a detailed review of recent developments on matching priors, investigate their relationships with each other and their invariance properties, and discuss how to obtain appropriate matching priors. We investigate matching priors in the product of normal means problem (Berger & Bernardo, 1989) and suggest a class of priors. We give an introduction to the shrinkage argument and provide examples to demonstrate how to derive matching priors by using the shrinkage argument. In the scalar parameter case, we apply the shrinkage argument to derive the frequentist conditional distribution to order O(n^{-2}) of the maximum likelihood estimate given an ancillary statistic. Using this result, we verify the p* formula and a Lugannani & Rice type formula, obtain a version of the renormalizing constant in the p* formula and an expression of the p* formula to order O(n^{-2}) for general models, and consider constructing confidence intervals to O(n^{-2}). We further use the shrinkage argument in the nuisance parameter case to obtain the expansion of the frequentist conditional distribution function to order O(n^{-3/2}) of the signed root of the conditional likelihood ratio statistic of Cox & Reid (1987) in location-scale models. We also discuss this approach for other models.
Acknowledgements
I would like to express my sincere appreciation to my supervisor, Professor Nancy Reid, for introducing me to this research area, for her invaluable insight and advice, and for her encouragement and assistance through every step of my Ph.D. program.
I would like to thank Professor David Andrews and Professor Mike Evans for their helpful comments leading to a better version of this thesis.
I would also like to thank Professor Mike Evans, Professor Keith Knight, and all other faculty members, staff and graduate students, for their help while I was studying in the Department of Statistics.
I am indebted to the University of Toronto and my supervisor for the generous financial support of my studies and research in the Department of Statistics, University of Toronto.
Finally, I would like to thank my family and friends for their constant help and encouragement in my pursuit of the Ph.D. degree at the University of Toronto.
Contents
1 Introduction 1
1.1 Asymptotics in Bayesian Inference . . . . . . . . . . 1
1.2 Frequentist Asymptotics . . . . . . . . . . 3
1.3 Summary . . . . . . . . . . 6
2 Matching Priors and Their Invariance Properties 9
2.1 Introduction . . . . . . . . . . 9
2.2 Matching Priors via the Posterior Quantiles . . . . . . . . . . 11
2.3 Matching Priors via the Distribution Function . . . . . . . . . . 15
2.4 Matching Priors via the Posterior Regions . . . . . . . . . . 19
2.4.1 Equal Tail Areas Consideration . . . . . . . . . . 19
2.4.2 Frequentist Statistics Consideration . . . . . . . . . . 30
2.4.3 HPD Consideration . . . . . . . . . . 24
2.4.4 Other Issues . . . . . . . . . . 26
2.5 Invariance Properties of Matching Priors . . . . . . . . . . 26
2.6 Discussion . . . . . . . . . . 31
2.7 Case Studies . . . . . . . . . . 34
3 Matching Priors in the Product of Normal Means Problem 38
3.1 Introduction and Notation
3.2 First Order and Second Order Matching Priors
3.3 Comparison of T, and a.
3.4 A Class of Priors
4 The Shrinkage Argument and Matching Priors
4.1 The Shrinkage Argument
4.2 Frequentist Distribution of the SRLR Statistic by Using the Shrinkage Argument
4.2.1 Introduction and Notation
4.2.2 The Frequentist Distribution of the SRLR Statistic
4.3 Matching Priors via the Posterior Quantiles
4.3.1 Calculating the Posterior Quantiles of θ
4.3.2 Frequentist Coverage Probabilities of Posterior Intervals
4.3.3 Matching via the Posterior Quantiles
4.4 Matching Priors via the Distribution Function
5 Tail Probability of MLE to O(n^{-2}) by Using the Shrinkage Argument
5.1 Preliminaries
5.2 Posterior Probability to O(n^{-2})
5.3 Some Detailed Calculations
5.4 Tail Probability to O(n^{-2})
Appendix 5.1
Appendix 5.2
6 The p* Formula and Tail Probability of MLE to O(n^{-2}) in the Frequentist Setup 85
6.1 Introduction . . . . . . . . . . 85
6.2 Direct Integration of the p* Formula . . . . . . . . . . 85
6.3 p* Formula to O(n^{-2}) . . . . . . . . . . 91
6.4 Confidence Intervals to O(n^{-2}) . . . . . . . . . . 97
7 Conditional Distribution of the SRCLR Statistic by Using the Shrinkage Argument 103
7.1 Introduction . . . . . . . . . . 103
7.2 Marginal Posterior Density . . . . . . . . . . 105
7.3 Approximation to the Marginal Posterior Density . . . . . . . . . . 108
7.4 Marginal Posterior Distribution . . . . . . . . . . 109
7.5 Frequentist Conditional Distribution of the SRCLR Statistic . . . . . . . . . . 113
List of Tables
3.1 Shifted Coverage Probabilities to O(n^{-3/2}) of 0.05 Posterior Quantiles. 44
3.2 Shifted Coverage Probabilities to O(n^{-3/2}) of the Posterior Quantiles Using the Prior π_{0.5}. 48
Chapter 1
Introduction
Traditionally there are two ways of doing statistical inference: one is the Bayesian approach, the other is the frequentist approach. In this thesis, we consider the contacts between these two approaches in their asymptotic aspects. In the following, we give a brief review of some background and the current developments in Bayesian and frequentist asymptotic methods (see also Reid, 1995), then we outline the work of this thesis in the last section.
1.1 Asymptotics in Bayesian Inference
One important issue in Bayesian inference is the choice of the prior density. In practice most analyses are performed with noninformative priors. The philosophy related to this issue is the long-standing debate between subjective inference and objective inference. Kass & Wasserman (1996) gave a full discussion of this issue. The earliest studies of noninformative priors are due to Laplace (1820), who used the uniform prior, and Jeffreys (1946), who used the prior π(θ) ∝ |I_1(θ)|^{1/2} for inference, where θ is the parameter and I_1(θ) is the per observation Fisher information
matrix. There are now a variety of methods that have been proposed for constructing noninformative priors. For a detailed review and appraisal, refer to Kass & Wasserman (1996). We shall concentrate here on matching priors and reference priors.
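As a concrete illustration (our own, not part of the original text), Jeffreys' rule π(θ) ∝ I_1(θ)^{1/2} can be computed directly from the Fisher information. The hedged sketch below, in Python, uses the Bernoulli model, where I(θ) = 1/{θ(1-θ)} and Jeffreys' prior is proportional to the Beta(1/2, 1/2) density; the function names are ours and not from any particular library.

```python
import math

def fisher_info_bernoulli(theta):
    # Per observation Fisher information of the Bernoulli(theta) model:
    # I(theta) = E[(d/dtheta log f(X; theta))^2] = 1 / (theta * (1 - theta)).
    return 1.0 / (theta * (1.0 - theta))

def jeffreys_unnormalized(theta, info=fisher_info_bernoulli):
    # Jeffreys' rule: pi(theta) proportional to I(theta)^{1/2}.
    return math.sqrt(info(theta))

def beta_half_half(theta):
    # Normalized Beta(1/2, 1/2) density; for the Bernoulli model Jeffreys'
    # prior is exactly this density up to the constant factor pi.
    return 1.0 / (math.pi * math.sqrt(theta * (1.0 - theta)))

# The ratio is constant in theta (equal to pi), confirming proportionality.
ratio = jeffreys_unnormalized(0.3) / beta_half_half(0.3)
```

The constancy of the ratio across θ is what "proportional to" means in Jeffreys' rule; the normalizing constant is irrelevant for posterior inference.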
Matching priors are defined by requiring that some posterior quantities have the corresponding frequentist coverage probabilities to some asymptotic order. This method of deriving noninformative priors dates back to Welch & Peers (1963) and Peers (1965), who derived matching priors by ensuring that the posterior intervals have the frequentist coverage probabilities to first order or second order. Tibshirani (1989) considered matching via the posterior quantiles as in Peers (1965) and obtained an explicit form of the first order matching priors. Mukerjee & Dey (1993) extended the matching via the posterior quantiles (Peers, 1965; Tibshirani, 1989) to the next order. Several matching criteria have been proposed; the most important one is matching via the posterior quantiles (Welch & Peers, 1963; Peers, 1965; Tibshirani, 1989; Mukerjee & Dey, 1993; etc.), followed by matching via the distribution function (Ghosh & Mukerjee, 1993b; Mukerjee & Ghosh, 1996; etc.), and matching via the posterior regions (Peers, 1968; Severini, 1991; Ghosh & Mukerjee, 1991, 1992b, 1995b; etc.). We shall give a detailed review and investigation of the current matching priors, their relationships and invariance properties in Chapter 2. We also discuss matching priors in the product of normal means problem of Berger & Bernardo (1989) in Chapter 3, and discuss deriving matching priors in Chapter 4.
Reference priors, initiated by Bernardo (1979) and refined later by Berger & Bernardo (1989, 1991, 1992a, 1992b) (see also Berger & Yang, 1992), are defined by maximizing the missing information, a notion defined by Bernardo (1979). When there are no nuisance parameters, the reference prior turns out to be Jeffreys'
prior. In the nuisance parameter case, one needs to use a stepwise procedure, and the reference priors are usually different from Jeffreys' prior. Reference priors are quite different from matching priors in their constructions. In general there is no direct comparison between these two kinds of priors, either in reference properties or in matching properties. But in some specific situations, there may be some similar properties. One interesting phenomenon is the form of reference priors and Tibshirani's first order matching priors. For details see Berger (1992) and Kass & Wasserman (1996). We shall also discuss this phenomenon in Chapter 2 and discuss the matching properties of two reference priors of Berger & Bernardo (1989) in Chapter 3.
Another issue in Bayesian inference is the approximation of the posterior density and the posterior distribution. When there are nuisance parameters present or when the dimension of the parameter becomes large, the calculations are quite involved, and so simpler approximations are required. In the nuisance parameter case, Tierney & Kadane (1986) obtained the approximation of the marginal posterior density to order O(n^{-3/2}), and the approximation of the posterior moments to order O(n^{-2}). DiCiccio, Field & Fraser (1990) obtained the approximation of the marginal posterior distribution to order O(n^{-3/2}). We shall discuss the approximation of the posterior density and posterior distribution in Chapter 5 (scalar parameter case) and in Chapter 7 (nuisance parameter case).
1.2 Frequentist Asymptotics
Asymptotic theory in frequentist inference is mostly related to approximations of the conditional density and the tail probability of the maximum likelihood
estimate, sometimes called p*-based approximations.
The p* formula of Barndorff-Nielsen (1980, 1983) approximates the conditional density of the maximum likelihood estimate given an ancillary statistic to order O(n^{-3/2}) with a renormalizing constant. In transformation models the p* formula is exactly equal to the conditional density and the renormalizing constant is free of the parameter. In exponential models with the canonical parameter, the p* formula equals the renormalized saddlepoint approximation (Reid, 1988). A quite general derivation of the p* formula is given by Skovgaard (1990). We shall discuss the p*
formula in detail in Chapter 6.
To use p* to compute the tail probability we need to integrate, either numerically or analytically, over values of the maximum likelihood estimate more extreme than the observed value. An additional integration is needed to compute the renormalizing constant. In the one parameter case, approximations of the tail probability to order O(n^{-3/2}) are available. For details please see Lugannani & Rice (1980) for exponential models, DiCiccio, Field & Fraser (1990) for location models, Barndorff-Nielsen (1991) and Fraser & Reid (1993) for general models. We shall, in Chapter 5 and Chapter 6, discuss the approximation of the tail probability of the maximum likelihood estimate.
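As a hedged illustration of the Lugannani & Rice (1980) tail formula mentioned above, the sketch below applies it to the mean of n unit exponentials, a case where the exact tail probability is an Erlang survival sum. The form used, P ≈ 1 - Φ(r) + φ(r)(1/q - 1/r) with r the signed likelihood root and q a Wald-type quantity, is the standard saddlepoint version from the literature, not the thesis's own derivation.

```python
import math

def exact_tail(n, x):
    # P(mean of n Exp(1) variables >= x): the sum exceeds s = n*x, and the
    # Erlang(n, 1) survival probability is a finite Poisson sum.
    s = n * x
    return math.exp(-s) * sum(s**k / math.factorial(k) for k in range(n))

def lugannani_rice_tail(n, x):
    # Saddlepoint tail approximation for the mean of n i.i.d. Exp(1)
    # variables, with cumulant generating function K(t) = -log(1 - t).
    t_hat = 1.0 - 1.0 / x                    # saddlepoint: K'(t_hat) = x
    K = -math.log(1.0 - t_hat)
    K2 = 1.0 / (1.0 - t_hat) ** 2            # K''(t_hat)
    r = math.copysign(math.sqrt(2.0 * n * (t_hat * x - K)), t_hat)
    q = t_hat * math.sqrt(n * K2)
    Phi_r = 0.5 * (1.0 + math.erf(r / math.sqrt(2.0)))
    phi_r = math.exp(-0.5 * r * r) / math.sqrt(2.0 * math.pi)
    return (1.0 - Phi_r) + phi_r * (1.0 / q - 1.0 / r)

approx = lugannani_rice_tail(10, 1.5)
exact = exact_tail(10, 1.5)
```

Even at n = 10 the approximation agrees with the exact Erlang tail to several decimal places, which is typical of the relative accuracy of saddlepoint-based tail formulas.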
In the case when nuisance parameters are present, approximation of the tail probability for a scalar parameter of interest is available in exponential models with canonical parameters, in which the nuisance parameters are eliminated by conditioning (Skovgaard, 1987; Fraser & Reid, 1993), and in transformation models with transformation parameters, in which the nuisance parameters are eliminated by marginalization (DiCiccio, Field & Fraser, 1990). In Chapter 7, we shall discuss the approximation in the nuisance parameter case.
The p* formula still holds its accuracy to O(n^{-3/2}) when the ancillary statistic is replaced by a second order approximate ancillary statistic. In Fraser & Reid (1995), it is shown that third order inference needs only the observed likelihood and the tangent directions for a second order ancillary. This makes the p* formula and other tail probability approximations more applicable to a variety of models. For the construction of approximate ancillaries, see Barndorff-Nielsen (1980), Barndorff-Nielsen & Cox (1994, Chapter 7), McCullagh (1987, Chapter 8), Skovgaard (1990) and Fraser & Reid (1995).
Compared with the p* formula of Barndorff-Nielsen (1980, 1983), the tangent exponential model of Fraser (1988, 1990) and Fraser & Reid (1993, 1995) is another approach for deriving the conditional density of the maximum likelihood estimate. It is a generalization of the p* formula. Because we do not mention this again in the later chapters, we give in the following a more detailed description of the mechanism of this approach, so that we can understand more about the similarities and differences among the various approaches.
For models with parameters and variables of the same dimension, the tangent exponential model approximates the given model to third order in a first derivative neighbourhood of the data point and to second order otherwise. To calculate the tangent exponential model we need only the observed likelihood and the likelihood gradient at the data point. In the general model, the log-likelihood function is ℓ(θ; x), where θ is of dimension d and x is of dimension p. Suppose that there is a third order ancillary; then there is a conditional likelihood with the same variable and parameter dimension having third order accuracy. Now we can use a tangent exponential model to work on this conditional likelihood. Since the model is conditional with respect to a third order ancillary, the conditional likelihood gradient
becomes the full likelihood gradient tangent to the ancillary surface. Let x_0 be the observed data point and V be the directions tangent to a third order ancillary at the data point; then ℓ(θ; x_0) and the gradient ℓ_{;V}(θ; x_0) fully determine the tangent exponential model within the conditioning of the ancillary. In Fraser & Reid (1995) it is shown that for third order inference it suffices to have V tangent only to a second order ancillary. The problem left is to find the tangent directions for a second order ancillary.
A procedure has been developed in Fraser & Reid (1995) for calculating the tangent directions to a second order ancillary at the data point. First we can use the approximate location model theory (Fraser, 1964; Fraser & Reid, 1995) to determine a first order ancillary at the data point. In Fraser & Reid (1995), it is shown that there is a second order ancillary with the same tangent directions as the first order ancillary at the data point. So in applications we can work with this first order ancillary suggested by Fraser & Reid (1995).
1.3 Summary
Our work focuses on matching priors in Bayesian inference, the shrinkage argument which has been used in several papers for deriving matching priors, and the derivation of the frequentist version of the densities and distribution functions of some quantities using the shrinkage argument.
In Chapter 2 we give a detailed review of the developments of matching priors and investigate relationships among various matching priors. We sort out the partial differential equations which matching priors need to satisfy. Based on matching methods, we categorize matching priors into three classes: matching priors via the posterior quantiles, matching priors via the distribution function and matching
priors via the posterior regions. These three classes cover almost all current matching priors. Within the matching priors via the posterior regions, we further divide them into Equal Likelihoods Consideration, Frequentist Statistics Consideration, and Highest Posterior Density Consideration. We first try to give a different explanation of why the first order matching via the distribution function is equivalent to the first order matching via the posterior quantiles, but second order matching is different. We investigate the invariance properties of the matching priors currently available, and thus we extend the findings of parameterization invariance of matching priors via the posterior quantiles and via the distribution function of Mukerjee & Ghosh (1996). Based on their relations with each other and their invariance properties, we make suggestions on how to derive reasonable matching priors. Finally we give some examples to illustrate the current work.
In Chapter 3, we consider matching priors in the product of normal means problem of Berger & Bernardo (1989). We first obtain the matching properties of the flat prior. Based on the matching properties, we make comparisons among the flat prior and two reference priors derived in Berger & Bernardo (1989), and give a noninformative prior from a class of priors we suggest. In Chapter 4, we introduce the shrinkage argument (Bickel & Ghosh, 1990; Dawid, 1991; Ghosh & Mukerjee, 1991; Sweeting, 1995a, 1995b). This argument has been widely used as an effective tool to evaluate the frequentist probabilities of some posterior quantities, and thus to obtain matching priors. As demonstrations, we use the shrinkage argument to derive the frequentist distribution of the signed root of the likelihood ratio statistic, and to derive matching priors via the posterior quantiles (Welch & Peers, 1963). We also consider matching via the distribution function and obtain an interesting result: matching via the distribution function is equivalent to matching via the posterior
quantiles in the first case, but leads to matching via the posterior quantiles in the second case.
In Chapter 5, in the scalar parameter case, we use the shrinkage argument to derive the frequentist conditional distribution to order O(n^{-2}) of the maximum likelihood estimate given an ancillary statistic. This result extends the current approximations in the literature by one further order. We also obtain the posterior distribution to order O(n^{-2}), and the posterior and the frequentist conditional Bartlett corrections of the likelihood ratio statistic. Then in Chapter 6, we verify that the p* formula is the conditional density of the maximum likelihood estimate to order O(n^{-2}) (Barndorff-Nielsen, 1980, 1983) by directly integrating the p* formula and comparing with the result in Chapter 5. We look for the renormalizing constant in the p* formula and extend the p* formula of Barndorff-Nielsen (1980, 1983) to order O(n^{-2}). We also verify a Lugannani & Rice type formula and obtain the third order error terms of different kinds of approximations to the tail probability of the maximum likelihood estimate. In the final section of Chapter 6, we consider constructing confidence intervals to order O(n^{-2}).
In Chapter 7, we extend the approach used in Chapter 5, i.e. using the shrinkage argument, by including the nuisance parameter, to derive the frequentist conditional distribution to order O(n^{-3/2}) (DiCiccio, Field & Fraser, 1990) of the signed root of the conditional likelihood ratio statistic (Cox & Reid, 1987) in location-scale models, and discuss the possibilities for other models. We also obtain approximations to the marginal posterior density (Tierney & Kadane, 1986) and the marginal posterior distribution (DiCiccio, Field & Fraser, 1990) to order O(n^{-3/2}).
Chapter 2
Matching Priors and Their
Invariance Properties
In this chapter we give a detailed review of the development of matching priors and investigate relationships among various matching priors. We also explore the invariance properties of the matching priors currently available. Based on these relationships and invariance properties, we make some suggestions for the development of reasonable matching priors. Finally we give some examples of this work.
2.1 Introduction
In recent years there has been considerable development in deriving noninformative priors by ensuring the frequentist validity of some Bayesian procedures. These types of noninformative priors are called "matching priors", in contrast to other types of noninformative priors. Aside from their role in serving as the noninformative priors in Bayesian inference, they are very useful for constructing accurate confidence regions, which can sometimes be difficult in the frequentist approach, e.g.
in the product of normal means problem (Berger & Bernardo, 1989). The earliest studies on this issue date back to Welch & Peers (1963) and Peers (1965, 1968). Further studies followed by Stein (1985), Tibshirani (1989), Mukerjee & Dey (1993), and others. There have been significant developments; more and more matching methods have been suggested. Most of the current results are on second order matching, and have multiparameter and nuisance parameter versions. For discussion of matching priors, please refer to Ghosh & Mukerjee (1992a), Reid (1995), Kass & Wasserman (1996) and Ghosh & Mukerjee (1996).
Our present work focuses on the following aspects. (1) A variety of procedures have been developed, so there is a question as to which one is better, or whether there is an ordering of the choices. For this reason, we investigate different procedures for deriving matching priors and find the relations among them. (2) Invariance properties are important for priors in Bayesian inference. If the matching priors do not have the matching property under another parameterization, then they are not convincing as noninformative priors, and the attempt to make global parameter orthogonality may cause problems. In this regard, we shall explore the invariance properties of the matching priors currently available. (3) Based on the results of (1) and (2), we shall give suggestions on how to use the current procedures to derive more reasonable and better noninformative priors. (4) We shall give some examples to compare the matching priors derived via the present procedures.
We here make some assumptions. Suppose that we have i.i.d. observations X = (X_1, …, X_n)^T from a model f(x; θ) with parameter θ, where θ = (θ_1, …, θ_p)^T, belonging to an open subset of R^p, has prior density π(θ). Let ℓ(θ; x) be the log-likelihood function and θ̂ be the maximum likelihood estimate of θ. We assume that θ_1 is the parameter of interest and all the other parameters are nuisance parameters
if not stated otherwise. The regularity conditions needed for deriving the results we shall introduce in the following can be found in Bickel & Ghosh (1990).
For convenience, we let I_1 = I_1(θ) = (I_{ij})_{p×p} be the per observation Fisher information matrix, j_1 = (j_{ij}) be the per observation observed information matrix evaluated at θ̂, and j_1^{-1} = (j^{ij}).
2.2 Matching Priors via the Posterior Quantiles
The study of matching priors via the posterior quantiles dates back to Welch & Peers (1963), who worked on the scalar parameter case, i.e. p = 1 with no nuisance parameter. Let θ^{(α)}(X) be the α-quantile of the posterior distribution of θ, i.e.

Π{θ ≤ θ^{(α)}(X) | X} = α.

What Welch & Peers did was to seek priors such that

P_θ{θ ≤ θ^{(α)}(X)} = α + O(n^{-1}),

where P_θ refers to probability in the repeated sampling model. Priors satisfying the above condition are known as first order matching priors. Welch & Peers showed that the first order matching prior is Jeffreys' prior, π(θ) ∝ I_1(θ)^{1/2}.
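A minimal Monte Carlo sketch of this matching property (our own construction, not from the thesis): for exponential data with rate θ, Jeffreys' prior π(θ) ∝ 1/θ gives a Gamma(n, S) posterior, and since θS is a pivot the posterior quantiles in this scale model have exactly the nominal frequentist coverage, so the simulated coverage should sit at α up to Monte Carlo error.

```python
import math
import random

def gamma_cdf(g, n):
    # CDF of Gamma(n, 1) for integer shape n (Erlang):
    # P(G <= g) = 1 - exp(-g) * sum_{k=0}^{n-1} g^k / k!
    return 1.0 - math.exp(-g) * sum(g**k / math.factorial(k) for k in range(n))

def gamma_quantile(alpha, n, lo=0.0, hi=1000.0):
    # Invert the Erlang CDF by bisection.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, n) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def coverage(theta, n, alpha, reps=4000, seed=1):
    # With Jeffreys' prior pi(theta) proportional to 1/theta for Exp(theta)
    # data, the posterior is Gamma(n, S), so the alpha-quantile of theta is
    # gamma_quantile(alpha, n) / S.  Count how often theta falls below it.
    rng = random.Random(seed)
    q = gamma_quantile(alpha, n)
    hits = 0
    for _ in range(reps):
        S = sum(rng.expovariate(theta) for _ in range(n))
        if theta <= q / S:
            hits += 1
    return hits / reps

cov = coverage(theta=2.0, n=5, alpha=0.95)   # should be close to 0.95
```

The exactness here is special to the scale model; in general Jeffreys' prior matches only to the asymptotic order stated above.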
It is a natural step to move to the next asymptotic order. Welch & Peers (1963) also considered seeking priors for which the above frequentist coverage holds to the next order,
such priors being called second order matching priors. Welch & Peers found that under this criterion the prior π is a second order matching prior if and only if it satisfies both (d/dθ){I_1^{-1/2} π} = 0, the condition for first order matching, and, in our notation, a further equation. It turns out that there is no solution in general, but if the skewness of the score function does not depend on the parameter θ, then Jeffreys' prior π(θ) ∝ I_1(θ)^{1/2} is the second order matching prior. Second order matching via the quantiles therefore depends on the model.
Peers (1965) extended the study of Welch & Peers (1963) by including nuisance parameters. Let θ_1 be the parameter of interest and θ_1^{(α)}(X) be the α-quantile of the marginal posterior distribution of θ_1, i.e.

Π{θ_1 ≤ θ_1^{(α)}(X) | X} = α.

Peers sought priors such that

P_θ{θ_1 ≤ θ_1^{(α)}(X)} = α + O(n^{-1}).

He found that in order to have the first order property, the priors should satisfy the following partial differential equation:

Σ_{j=1}^{p} ∂/∂θ_j { I^{j1} (I^{11})^{-1/2} π(θ) } = 0.    (2.2)

It turns out that this partial differential equation has infinitely many solutions. Peers discussed possible solutions for this equation, but there is no explicit form
and each model must be investigated separately. Peers (1965) suggested using a prior which satisfies the first order matching conditions for each component of θ, i.e. taking each component of θ in turn as the parameter of interest, but typically there is no such prior.
An important step was made by Tibshirani (1989). Using the result of Stein (1985) and parameter orthogonality, he obtained an explicit form of the first order matching priors. Let the parameter of interest θ_1 be orthogonal to the nuisance parameters (θ_2, …, θ_p), i.e. I_{1i} = 0 for all θ, i = 2, …, p. From equation (2.2), the solutions are

π(θ) ∝ g(θ_2, …, θ_p) I_{11}^{1/2},    (2.3)

where g is an arbitrary smooth positive function. Priors of this kind are termed Tibshirani's first order matching priors via the posterior quantiles. Nicolaou (1993) gave a rigorous proof of the result of Tibshirani (1989).
It is interesting that Tibshirani's first order matching priors have the form of Berger-Bernardo's reference prior for a particular choice of g if the roles of the parameter of interest and the nuisance parameters are switched. Berger (1992) made a comment on this phenomenon (see also Kass & Wasserman, 1996). In regard to the choice of the arbitrary function g, Tibshirani (1989) (see also Peers, 1965; Datta, 1996) noted that one can consider each component of θ as the parameter of interest, but global parameter orthogonalization is difficult and sometimes not possible, and this consideration can also lead to no solution (see also Ghosh & Mukerjee, 1993).
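The orthogonal-parameter form above can be checked numerically. The sketch below is our own construction; it assumes the first order matching equation takes the standard form Σ_j ∂/∂θ_j {I^{j1}(I^{11})^{-1/2} π} = 0 (cf. Datta & Ghosh, 1995a) and verifies by finite differences that, for the N(μ, σ²) model with μ of interest, any prior of the form g(σ)/σ satisfies it while a μ-dependent prior does not.

```python
import math

# Model: X ~ N(mu, sigma^2).  The per observation Fisher information is
# I = diag(1/sigma^2, 2/sigma^2), so mu and sigma are orthogonal,
# with I^{11} = sigma^2 and I^{21} = 0.

def matching_lhs(prior, mu, sigma, h=1e-5):
    # Left side of the assumed first order matching equation for interest
    # parameter mu:
    #   d/dmu { I^{11} (I^{11})^{-1/2} prior }
    #     + d/dsigma { I^{21} (I^{11})^{-1/2} prior },
    # evaluated by central finite differences.
    def t_mu(m, s):
        return s * prior(m, s)     # sigma^2 * sigma^{-1} * prior
    def t_sigma(m, s):
        return 0.0                 # I^{21} = 0 in this model
    d_mu = (t_mu(mu + h, sigma) - t_mu(mu - h, sigma)) / (2.0 * h)
    d_sigma = (t_sigma(mu, sigma + h) - t_sigma(mu, sigma - h)) / (2.0 * h)
    return d_mu + d_sigma

# Tibshirani's form with an arbitrary smooth positive g: pi = g(sigma)/sigma.
tib = lambda m, s: (1.0 + s * s) / s
# A mu-dependent prior, which should fail the matching equation.
bad = lambda m, s: math.exp(m) / s

ok_residual = matching_lhs(tib, 0.7, 1.3)    # should vanish
bad_residual = matching_lhs(bad, 0.7, 1.3)   # should not
```

The prior g(σ)/σ is exactly (2.3) for this model, since I_{11}^{1/2} = 1/σ and g is free.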
Mukerjee & Dey (1993) considered matching to the next order in the case of one nuisance parameter. They obtained one more partial differential equation in addition to (2.2) of Peers (1965); i.e. second order matching priors should satisfy two partial differential equations. Their derivation is in the general non-orthogonal
parameter setup. Most of the time these two partial differential equations lead to a unique solution, but sometimes there is no solution and sometimes all the first order matching priors are second order matching priors. Mukerjee & Ghosh (1996) generalized the results of Mukerjee & Dey (1993) to the case of several nuisance parameters. They showed that a prior π is a second order matching prior via the posterior quantiles if and only if it satisfies both equation (2.2) and a further equation, (2.4).

Under parameter orthogonality, the second order matching priors are of the form (2.3), where g = g(θ_2, …, θ_p) satisfies a partial differential equation involving I_{1,1,1} = E_θ[{∂ log f(X_1; θ)/∂θ_1}^3]. If there is no nuisance parameter present, then the above equation becomes (d/dθ){I_{1,1,1} I_1^{-3/2}} = 0, showing that Jeffreys' prior π(θ) ∝ I_1(θ)^{1/2} is the second order matching prior if and only if the skewness of the score function is free of the parameter θ, which is the result of Welch & Peers (1963).
In the case of no nuisance parameter, equation (2.4) reduces to a single equation in θ. This equation is different from the second equation of Welch & Peers (1963). But if combined with the first equation, the two equations are the same as the
two equations of Welch & Peers (1963). This is because Welch & Peers (1963) evaluated the moment generating function while Mukerjee & Dey (1993) and Mukerjee & Ghosh (1996) evaluated the coverage probability directly. The discussion of this paragraph is due to the fact that, if we require the two-sided posterior intervals of equal tail areas to have frequentist coverage validity, then the only condition is that the priors satisfy (2.4), the second partial differential equation of Mukerjee & Dey (1993) and Mukerjee & Ghosh (1996). We shall discuss this issue later.
In specific situations, probability matching priors via the posterior quantiles were considered by several authors. For details please refer to Lee (1989), Datta & Ghosh (1995a), Sun & Ye (1995, 1996), Ghosh, Carlin & Srivastava (1995), Ghosh & Yang (1996) and Garvan & Ghosh (1996).
2.3 Matching Priors via the Distribution Function
Matching via the distribution function follows from the relationship of the posterior quantiles to the posterior distribution. Ghosh & Mukerjee (1993b) and Mukerjee & Ghosh (1996) considered the statistic T = (n^{-1} ĵ^{11})^{-1/2}(θ_1 - θ̂_1), the posterior standardized version of θ_1, while Datta & Ghosh (1995a) considered a standardized version of a parametric function. The matching priors based on the distribution function of the statistic T are the priors such that
for all real t, with an error term that is free of θ and independent of x. In the error term O(n^{-i/2}), i can be 2 or 3, which corresponds to first order matching or second order matching respectively.
It turns out that the first order matching priors should satisfy the same partial differential equation (2.2) as in matching the posterior quantiles of θ_1 (Ghosh & Mukerjee, 1993b; Datta & Ghosh, 1995a). The second order matching priors are different from those obtained via matching the quantiles (Mukerjee & Ghosh, 1996). In addition to equation (2.2), they should also satisfy the following two partial differential equations
and
From the performance of some examples (see §2.7, Case Studies), it seems that these conditions are more difficult to satisfy than those obtained via matching the posterior quantiles. In the scalar parameter case, we obtain in §4.4 that matching via the distribution function is equivalent to matching via the posterior quantiles using the statistic r, and can lead to matching via the posterior quantiles if using the statistic p.
One question might arise: why is first order matching via the distribution function of T the same as that via the posterior quantiles of θ_1, but second order matching different? To some extent, this phenomenon can be explained by the use of the following lemma, which we call the generalized Cornish-Fisher inversion. For details, please refer to Barndorff-Nielsen & Cox (1989, Chapter 4) and Barndorff-Nielsen & Cox (1994).
Lemma 2.1 Suppose the distribution function of Y has the following form
We assume that the inverse of F_n is well defined. Let F_n(y_α) = α, i.e. y_α is the α-quantile of F_n; then y_α has the following form
where Φ(z_α) = α, and
PROOF: According to the distribution function given above,
Now we expand the functions Φ(y_α), φ(y_α), R_1(y_α) and R_2(y_α) in y_α about z_α to get
Setting the terms of orders O(n^{-1/2}) and O(n^{-1}) equal to zero and using Φ(z_α) = α, we obtain the result as stated. □
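The first order mechanism of the lemma can be verified symbolically: writing ε for n^{-1/2}, the candidate quantile y_α = z_α − ε R_1(z_α) makes the O(ε) term of F_n(y_α) vanish. A sketch with sympy, leaving R_1 as an abstract function (this illustrates the inversion only to first order, not the thesis's full second order computation):

```python
import sympy as sp

z, e = sp.symbols('z epsilon')     # e plays the role of n**(-1/2)
R1 = sp.Function('R1')             # abstract first order correction term

Phi = lambda t: (1 + sp.erf(t / sp.sqrt(2))) / 2          # N(0,1) cdf
phi = lambda t: sp.exp(-t**2 / 2) / sp.sqrt(2 * sp.pi)    # N(0,1) pdf

# F_n(y) = Phi(y) + n^{-1/2} R1(y) phi(y) + O(n^{-1})
F = lambda t: Phi(t) + e * R1(t) * phi(t)

# Candidate quantile from the lemma: y_alpha = z_alpha - n^{-1/2} R1(z_alpha)
y_alpha = z - e * R1(z)

# The O(n^{-1/2}) term of F_n(y_alpha) must vanish, leaving Phi(z_alpha)
order1 = sp.simplify(sp.diff(F(y_alpha), e).subs(e, 0))
print(order1)  # 0
```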
Suppose that the posterior distribution of T has the following expansion
and the frequentist distribution of T has the following expansion
First order matching via the distribution function is to let the prior π(θ) satisfy the following equation with an error of order O(n^{-1/2}),
for all real t.
Now we consider matching via the quantiles of θ_1. Firstly we note that, from the definition of T, the α-quantile of θ_1 can be changed to the form of the α-quantile of T. From the above lemma, the posterior α-quantile of T can be expressed as
First order matching via the posterior quantiles of θ_1 is to let the prior π satisfy
On the other hand, the α-quantile of the frequentist distribution of T can be expressed as
So first order matching via the quantiles is equivalent to letting the prior π satisfy
for all z_α, with an error of order O(n^{-1/2}). And so first order matching via the distribution function is the same as first order matching via the quantiles. Also from the above lemma, we can see that second order matching via the distribution function is in general no longer the same as that via the quantiles.
2.4 Matching Priors via the Posterior Regions
Matching via the posterior regions is a generalization of matching via the posterior quantiles. Matching via the posterior regions can arise from the inversion of certain statistics, e.g. the likelihood ratio (LR) statistic, the profile LR statistic, the score statistic and the highest posterior density (HPD) regions.
The earliest study of matching via the posterior regions is due to Peers (1968), who discussed three kinds of posterior intervals in the scalar parameter case. The first is the two-sided Equal tail areas, the second is the Equal likelihoods and the third is the Equal posterior densities. We consider these in the subsections below, under the names: Equal tail areas consideration, Frequentist statistics consideration and HPD consideration.
2.4.1 Equal Tail Areas Consideration
Peers (1968) considered constructing two-sided Equal tail areas posterior intervals (θ_L, θ_U), where
He obtained that, for the Equal tail areas posterior intervals to have frequentist coverage probabilities to second order, the prior π should satisfy the following differential equation
This differential equation is not the same as the second equation in Welch & Peers (1963) if we just let the error term of order O(n^{-1}) be zero. But it is equivalent to (2.6), the second equation of Mukerjee & Dey (1993), who considered second order
matching via the posterior quantiles. This type of Equal tail areas interval has the merit of balancing posterior probability on both sides. If we want it to have the same balance of frequentist coverage probability, then this turns out to be the same consideration as the second order matching via the posterior quantiles of Welch & Peers (1963).
In the case of one nuisance parameter, for the two-sided posterior intervals of Equal tail areas to have correct frequentist coverage probabilities to second order, the prior should satisfy the second partial differential equation of Mukerjee & Dey (1993). If the parameters are orthogonal, the equation is
This result can be extended to the general case of several nuisance parameters. It turns out that the prior π is the matching prior under the current consideration if and only if it satisfies the partial differential equation (2.4) of Mukerjee & Ghosh (1996).
2.4.2 Frequentist Statistics Consideration
Peers (1968) discussed two-sided posterior intervals (θ_L, θ_U) of Equal likelihoods, where ℓ(θ_L; x) = ℓ(θ_U; x). He obtained that, for the Equal likelihoods posterior intervals to have frequentist coverage probabilities to second order, the prior π should satisfy the differential equation
Since one always assumes the maximum likelihood estimate is unique for the asymptotic expansion, this Equal likelihoods consideration is thus equivalent to the likelihood ratio (LR) statistic consideration, and we shall refer to the latter in general. Severini (1991) considered constructing frequentist probability intervals based on the LR statistic. He obtained that, for the intervals to have posterior coverage probabilities, the prior π should satisfy one differential equation which is equal to (2.11) of Equal likelihoods of Peers (1968).
In the multiparameter case, Ghosh & Mukerjee (1991) considered matching the posterior Bartlett correction and the frequentist Bartlett correction of the LR statistic to have an error of order O(n^{-1/2}). Using a prior satisfying the matching condition, the posterior regions based on the posterior Bartlett corrected LR statistic have frequentist validity to the second order. Following the lines of Ghosh & Mukerjee (1991), it is not difficult to see that matching the posterior Bartlett correction and the frequentist Bartlett correction to have an error of order O(n^{-1/2}) is equivalent to matching the posterior regions based on the LR statistic to have an error of order O(n^{-3/2}). The LR statistic consideration is the multiparameter version of Equal likelihoods of Peers (1968). Ghosh & Mukerjee (1991) obtained that the prior π is the matching prior via the LR statistic consideration if and only if it satisfies the following partial differential equation
In the two parameter case, under parametric orthogonality, the above equation becomes
When there is only one parameter, the above equation reduces to (2.11) of Equal likelihoods of Peers (1968).
Rao & Mukerjee (1995) considered the posterior regions based on the modified score statistic to have frequentist validity to second order. The original score statistic is S = S(θ; x) = n^{-1} ℓ'(θ)^T I^{-1} ℓ'(θ) (Rao, 1948). What they considered is the modified version S* = S*(θ; x) = n^{-1} ℓ'(θ)^T ĵ^{-1} ℓ'(θ). Rao & Mukerjee (1995) also discussed a modified Wald's statistic W* = n(θ̂ - θ)^T ĵ (θ̂ - θ). The original Wald's statistic is W = n(θ̂ - θ)^T I(θ̂)(θ̂ - θ). They obtained that, for the posterior regions based on the modified score statistic S* or the modified Wald's statistic W* to have frequentist validity to the second order, the prior should satisfy the same two partial differential equations
These two partial differential equations are stronger than (2.12) of the Bartlett corrections consideration of Ghosh & Mukerjee (1991), which can be seen as a special case of the LR statistic consideration. Actually the sum of the above two equations is exactly equal to equation (2.12) of Ghosh & Mukerjee (1991). In the one parameter case, Rao & Mukerjee (1995) mentioned that consideration of the original version of Wald's statistic leads to the same result as does consideration of the modified version, while consideration of the original version of the score statistic leads to a different result compared with that of the modified one. But all of the considerations in the one parameter case lead to equation (2.11) of Equal likelihoods of Peers (1968). For more discussion on Wald's statistic, please refer to Lee (1989).
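The first order equivalence of these statistics is easy to see numerically in a one parameter model. The sketch below uses an exponential model with mean θ (an illustrative choice, not one of the thesis's examples) and compares the LR statistic with the score and Wald forms; for large n all three are close and approximately χ²₁:

```python
import numpy as np

rng = np.random.default_rng(1)
theta0, n = 2.0, 5000
x = rng.exponential(theta0, size=n)   # exponential with mean theta0
mle = x.mean()

def loglik(t):
    """Log-likelihood of the exponential(mean t) model."""
    return -n * np.log(t) - x.sum() / t

# Likelihood ratio, score and Wald statistics for H0: theta = theta0.
lr = 2 * (loglik(mle) - loglik(theta0))
u = -n / theta0 + x.sum() / theta0**2          # score function at theta0
score = u**2 * theta0**2 / n                   # U^2 / I, total info n/theta^2
wald = n * (mle - theta0) ** 2 / mle**2        # (mle-theta0)^2 * I(mle)
print(round(lr, 3), round(score, 3), round(wald, 3))  # all three close
```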
Ghosh & Mukerjee (1992b) discussed matching the posterior and frequentist Bartlett corrections of the profile LR statistic in the two orthogonal parameters case. As we mentioned for the consideration of the LR statistic of Ghosh & Mukerjee (1991), the current consideration is equivalent to matching the posterior regions based on the profile LR statistic to have an error of order O(n^{-3/2}). Let θ_1 be the parameter of interest and θ_2 be the nuisance parameter. The profile LR statistic is
where θ̂_2(θ_1) is the maximum likelihood estimate of θ_2 holding θ_1 constant. They obtained that, under parametric orthogonality, the prior π is the matching prior based on the profile LR statistic if and only if it satisfies the following partial differential equation
At the same time, Ghosh & Mukerjee considered matching the posterior and frequentist Bartlett corrections of the conditional LR statistic (Cox & Reid, 1987)
where
is the conditional log-likelihood suggested by Cox & Reid (1987) on the grounds of conditioning, and θ̂_1 is the maximum likelihood estimate of θ_1 based on ℓ_c(θ_1). In this consideration of the conditional LR statistic, under parametric orthogonality, the prior π is the matching prior if and only if it satisfies the following partial differential equation
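As a concrete instance of the profile likelihood construction: in the N(μ, σ²) model with μ the parameter of interest, the constrained MLE is available in closed form, σ̂²(μ) = n^{-1}Σ(x_i − μ)², and the profile LR statistic compares the profile log-likelihood at μ̂ and at a hypothesized value. A small sketch (Python; illustrative data, not an example from the thesis):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(1.0, 2.0, size=50)   # illustrative data
n = len(x)

def profile_loglik(mu):
    """l(mu, sigma_hat(mu)): the nuisance sigma^2 maximized out for fixed mu."""
    s2 = np.mean((x - mu) ** 2)     # constrained MLE of sigma^2
    return -n / 2 * (np.log(2 * np.pi * s2) + 1)

mu_hat = x.mean()                   # unrestricted MLE of mu
# Profile LR statistic for the hypothesized value mu = 1:
lam = 2 * (profile_loglik(mu_hat) - profile_loglik(1.0))
print(lam >= 0)  # True: the profile log-likelihood is maximized at mu_hat
```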
2.4.3 HPD Consideration
Highest posterior density (HPD) consideration d s o dates back to Peers ( L968),
who considered the scalar parameter case. He used the oame Eqzial posterior densi-
tics, for the condition n(& 1 X) = r(Bu 1 X) for the posterior interval (OL, Ou). He
obtained that, for the Equal postenor densities intervals to have frequentist cover-
age probabilities to second order, the priors should sat isfy the following differeotial
equat ion
Since the assumptions needed for the asymptotic expânsion ensure that the poste-
rior mode is asymptoticdy unique, this Epual posterior densities consideration is
equivalent to the highest posterior density (HP D ) consideration. Severini ( 199 1)
considered the HPD intervals in one parameter case. He obtained that, for the
intervals to have the frequentist coverage probabilities, the prior R should satisfy
one differential equation which is equivdent to ('2.18) of Equal posterior densities of
Peen (1968).
Results on the HPD consideration in the multiparameter case are available. Ghosh & Mukerjee (1993a) considered the following HPD region
where k_{1-α} is defined by
Then the prior π is such that
if and only if it satisfies the following partial differential equation
In the two parameter case, under parametric orthogonality, the above equation becomes
Compared with equation (2.13) of the consideration of the LR statistic of Ghosh & Mukerjee (1991), the frequentist version of the HPD consideration, the difference is just the signs. When there is only one parameter, the above equation reduces to (2.15) of Equal posterior densities of Peers (1968).
In the case with nuisance parameters, Ghosh & Mukerjee (1995b) obtained results on the HPD regions having frequentist coverage validity to second order with q parameters of interest and p - q nuisance parameters. In the case of one parameter of interest with one nuisance parameter, under global parametric orthogonality, the prior π is the matching prior via the HPD regions if and only if it satisfies the following partial differential equation
It is interesting to compare equation (2.21) with equation (2.20) of the case of two parameters with no nuisance parameter. We can see that if the prior is the matching prior via the HPD regions for each component parameter of the two parameters, then this prior is the matching prior via the HPD regions for both parameters of interest. But the matching prior for two parameters of interest doesn't have to be the matching prior via the HPD regions for any component parameter of interest.
If we compare (2.21) of HPD with (2.10) of Equal tail areas, the only difference is that (2.10) of Equal tail areas has one additional term. If we restrict the priors to a class suggested by Tibshirani (1989), i.e. π(θ) = g(θ_2) I_{11}^{1/2}, then if K_{111} I_{11}^{-3/2} is independent of the parameter of interest, matching via the HPD regions is equivalent to matching via the Equal tail areas.
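The practical difference between the Equal tail areas and HPD constructions is easy to see numerically: for a skewed unimodal posterior the HPD interval is the shortest interval of the given posterior content, with equal posterior density at its endpoints, and it does not coincide with the equal-tail interval. A sketch with an illustrative Gamma posterior (scipy; not a model from the thesis):

```python
import numpy as np
from scipy import stats

post = stats.gamma(3)               # skewed toy posterior, purely illustrative
level = 0.95

# Equal tail areas interval: 2.5% posterior probability in each tail.
et = (post.ppf(0.025), post.ppf(0.975))

# HPD interval: among all intervals of posterior content 0.95, the one of
# minimal length (equivalently, with equal posterior density at the endpoints).
lows = np.linspace(1e-6, 1 - level - 1e-6, 2000)
lengths = post.ppf(lows + level) - post.ppf(lows)
i = lengths.argmin()
hpd = (post.ppf(lows[i]), post.ppf(lows[i] + level))

print(et[1] - et[0] > hpd[1] - hpd[0])  # True: the HPD interval is shorter
```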
2.4.4 Other Issues
In contrast to matching via the posterior regions, Severini (1993) discussed how to choose the posterior intervals for a given prior, such that the posterior intervals have frequentist coverage validity to the second order. In the multiparameter case, Ghosh & Mukerjee (1995a) discussed how to choose perturbed ellipsoidal and HPD regions for a given prior, such that the posterior regions have frequentist validity to the second order.
The matchings of Ghosh & Mukerjee (1991, 1993b), via the posterior regions as we call them here, were originally considered by using the posterior and frequentist Bartlett corrections. For discussion on Bartlett corrections, please refer to Cox & Reid (1987), Bickel & Ghosh (1990), McCullagh & Tibshirani (1990), DiCiccio & Stern (1994) and Ghosh & Mukerjee (1992b, 1994).
2.5 Invariance Properties of Matching Priors
For convenience, we assume that θ = (θ_1, …, θ_p)^T is the current version of the parameter with prior π(θ), and that λ = (λ_1, …, λ_p)^T is the original version of the parameter, where the transformation from λ to θ is one-to-one. Invariance of the prior with respect to the parameterization means that
π*(λ) = π(θ(λ)) |∂θ/∂λ|
has the same property as does π(θ) in the θ parameterization. This invariance property is desirable for priors on logical grounds. Also, if the prior is invariant, then inference is more convenient. Therefore, we should investigate this property for priors derived from any procedure. Matching priors have played a more and more important role in serving as noninformative priors in Bayesian inference. Mukerjee & Ghosh (1996) showed that matching priors via the posterior quantiles and the distribution function do have the parameterization invariance property (see also Datta & Ghosh, 1996). But for other matching priors, there seems to be no literature discussing this issue so far. In this section we shall investigate the invariance properties of the different matching priors currently available.
We note that most of the matching procedures can be expressed in terms of a scalar valued function of interest, denoted by ψ(θ; x). The defining criterion is to let the posterior region defined by ψ(θ; x) have some posterior probability level, and then match it to have the same frequentist probability level to some order. For example, matching via the posterior quantiles is to let ψ(θ; x) = θ_1; matching via the LR statistic is just to let ψ(θ; x) = ℓ(θ; x). In the following we develop an argument which is a generalization of Mukerjee & Ghosh (1996).
Theorem 2.1 Suppose that under the current parameterization, the log-likelihood function is ℓ(θ; x), where the parameter θ has prior π(θ), and ψ(θ; x) is a scalar valued function of interest depending on θ and x. Under the original parameterization, the log-likelihood function is ℓ*(λ; x) = ℓ(θ(λ); x), and ψ*(λ; x) is the scalar valued function of interest. If ψ*(λ; x) ∝ ψ(θ(λ); x)¹, i.e. the defining function of interest is parameterization invariant, then the matching prior is parameterization invariant.
¹Except for a factor involving no parameter.
PROOF: Let k(x) be a scalar valued function depending on x only. After the simple steps (i) and (ii), the result follows from (iii).
(i). Pr( ψ(θ; x) < k(x) | x ) = Pr( ψ(θ(λ); x) < k(x) | x )
(ii). P_θ( ψ(θ; X) < k(X) ) = P_λ( ψ(θ(λ); X) < k(X) )
(iii). If Pr( ψ(θ; x) < k(x) | x ) = P_θ( ψ(θ; X) < k(X) ), then we have
The result of Theorem 2.1 doesn't depend on the order of the matching, but if we consider matching with some asymptotic order, then the result holds for the same asymptotic order. In the following various situations, we apply Theorem 2.1 to see whether the matching priors are parameterization invariant.
(1). Matching via the posterior quantiles. Mukerjee & Ghosh (1996) showed that it is parameterization invariant. If we use Theorem 2.1, then simply take the function ψ(θ; x) = θ_1, and ψ*(λ; x) = θ_1(λ). If we consider a parametric function h(θ), then take ψ(θ; x) = h(θ) and ψ*(λ; x) = h(θ(λ)).
(2). Matching via the distribution function. Mukerjee & Ghosh (1996) obtained that the considerations of Ghosh & Mukerjee (1993b) and Mukerjee & Ghosh (1996) are parameterization invariant. If we use Theorem 2.1, then it is similar to case (1).
(3). Equal tail areas consideration. It is parameterization invariant. This is because the partial differential equation from the consideration of Equal tail areas is the second equation of second order matching via the posterior quantiles.
(4). LR statistic consideration. This consideration is equivalent to taking the function ψ(θ; x) = ℓ(θ; x), and ψ*(λ; x) = ℓ(θ(λ); x). So the matching prior via the posterior regions through the LR statistic is parameterization invariant.
(5). Score statistic and Wald's statistic considerations. The consideration of the modified score statistic S* = n^{-1} ℓ'(θ)^T ĵ^{-1} ℓ'(θ) is not parameterization invariant (see Example 2.3). Neither is the consideration of the Wald's statistic, because it leads to the same result as the consideration of the modified score statistic (Rao & Mukerjee, 1995). But if one uses the original score statistic S = n^{-1} ℓ'(θ)^T I^{-1} ℓ'(θ), since it is parameterization invariant, by Theorem 2.1 the matching prior obtained by consideration of the original score statistic is parameterization invariant.
(6). Profile LR statistic consideration. First we introduce a notion, interest-respecting transformation, originally from Barndorff-Nielsen & Cox (1994). Let θ_1 be the current parameter of interest and λ_1 be the original parameter of interest; then the transformation should be such that θ_1 is a function of λ_1 only, i.e. (λ_1, λ_2) being transformed to (θ_1(λ_1), θ_2(λ_1, λ_2)). Applying Theorem 2.1 to this case, ψ(θ; x) = ℓ(θ_1, θ̂_2(θ_1); x), and ψ*(λ; x) = ℓ*(λ_1, λ̂_2(λ_1); x). The establishment of
is due to the following
So the consideration of the profile LR statistic is parameterization invariant as long as the transformation doesn't change the parameter of interest essentially.
(7). Conditional LR statistic consideration. There is no direct application of Theorem 2.1 to this case. In transformation models, DiCiccio, Field & Fraser (1990) obtained that the conditional likelihood approximates the marginal likelihood of the parameter of interest to third order, so the use of the conditional likelihood in transformation models is equivalent to the use of the marginal likelihood to third order. But the marginal likelihood no longer involves the nuisance parameter, so its use leads to a case similar to that of the LR statistic consideration without nuisance parameter. In exponential models, the conditional likelihood approximates the likelihood of the parameter of interest to third order (Skovgaard, 1987; Fraser & Reid, 1993), and the likelihood of the parameter of interest no longer involves the nuisance parameter, so it is similar to the case of transformation models. And finally what we have is that, to second order matching, the conditional LR statistic consideration is parameterization invariant in transformation models and in exponential models. For general models, the answer is not clear and so further investigation is required.
(8). HPD consideration without nuisance parameter. Generally speaking, this consideration is not parameterization invariant. In the one parameter case, this can be verified directly from the differential equation. In Example 2.1 of the Case Studies section, we can see how the matching priors change after making a parameter transformation. But if the transformation is linear, i.e. |∂θ/∂λ| = constant, then the HPD consideration is parameterization invariant with respect to this linear transformation. Let the likelihood function be L(θ; x) in the current parameterization; in the original parameterization the likelihood function is L*(λ; x) = L(θ(λ); x). So ψ(θ; x) = L(θ; x) π(θ), and ψ*(λ; x) = L(θ(λ); x) π(θ(λ)) |∂θ/∂λ|. If the transformation is linear, i.e. |∂θ/∂λ| = constant, then ψ(θ(λ); x) ∝ ψ*(λ; x) (except for a factor involving no parameter), and so it is parameterization invariant with respect to this linear transformation.
(9). HPD consideration with nuisance parameter. From case (8), we know that it is not invariant under transformations of the parameter of interest (except linear ones). Now we consider the following transformation: θ_1 = λ_1, θ_2 = θ_2(λ_1, λ_2), so |∂θ/∂λ| = |∂θ_2/∂λ_2|. In the current θ setup, the defining function of interest is ψ(θ; x) and
Combined with case (8), we obtain that the consideration of the HPD is parameterization invariant with respect to linear transformations of the parameter of interest and any transformation of the nuisance parameter.
2.6 Discussion
As we have seen, a variety of methods have been proposed for deriving matching priors. Matching priors not only serve as noninformative priors for Bayesian inference, but also provide a way of constructing accurate confidence regions. Among all the methods proposed, it appears that matching via the posterior quantiles is the most important one. This is because: (1) matching via the posterior quantiles has a natural interpretation in both Bayesian inference and frequentist inference; (2) Tibshirani's first order matching priors via the posterior quantiles, although they have an arbitrary factor, provide a guide for choosing priors from other considerations (we shall discuss this below), and the second order matching via the posterior quantiles can lead to a unique prior in many cases; (3) it is parameterization invariant.
Matching via the distribution function is quite different from matching via the quantiles at second order, although they are equivalent at first order. It seems that second order matching via the distribution function depends on how the variable is standardized (see the results in §4.4). Also, in some examples, second order matching via the distribution function leads to no solution while second order matching via the posterior quantiles has a unique solution (see the Case Studies later). So it seems that, as we mentioned in §2.3, the conditions of second order matching via the distribution function are more difficult to satisfy than those via the posterior quantiles (see also §4.4). On the other hand, second order matching priors via the distribution function cannot in general be used to construct accurate confidence intervals or regions, in contrast to matching priors via the posterior quantiles and via the posterior regions.
In contrast to matching via the posterior quantiles and matching via the distribution function, matching via the posterior regions can have many or infinitely many solutions. The extreme case is that, for any given prior, we can construct posterior intervals or regions by perturbing the first two order terms, such that they have frequentist coverage validity to the second order (Severini, 1993; Ghosh & Mukerjee, 1995a). So for the purpose of serving as noninformative priors, we should impose some constraint in order to narrow down the matching priors. One natural constraint is to consider the priors within the class of Tibshirani's first order matching priors. From some examples, we notice that considerations of the profile LR statistic, the conditional LR statistic and the HPD regions, and sometimes other considerations, can lead to a unique solution as does the second order matching via the posterior quantiles. So it seems that this constraint, that the priors be within the class of Tibshirani's first order matching priors, is reasonable and can often lead to a unique solution. But sometimes, if there is no solution for second order matching
via the posterior quantiles, it seems hardly possible to find a solution for matching via the posterior regions within the class of Tibshirani's first order matching priors. If we do not restrict attention to this class, then the matching priors may not seem sensible for the problem. In this case, we should investigate further, e.g. by comparing with reference priors or Jeffreys' prior. But for the purpose of constructing accurate confidence intervals or regions, they are still the solutions of the matching equations, although the intervals or regions might have some kind of perturbation. From the above argument, we can see that in any case we should seek the matching priors via the posterior quantiles first. Another concern in choosing among the matching priors via the posterior regions is that we should examine the invariance properties of the matching priors, since some of the matching priors are not parameterization invariant.
If all the parameters are of interest, we feel that the consideration of the LR statistic is the best choice, compared with the modified score statistic consideration and the HPD consideration. The reason is simply that the other considerations are not parameterization invariant. Aside from this intrinsic property, the use of any parameterization invariant procedure has an advantage: we can calculate the necessary quantities under any parameterization, and thus we can choose the most convenient one for calculation and for finding solutions. For example, in the independent normal means problem, the calculation in the original parameterization is straightforward. Sometimes the modified score statistic can help in narrowing down the choice among the priors satisfying the consideration of the LR statistic (Rao & Mukerjee, 1995). By the way, some authors might prefer the use of HPD regions, since they are the shortest intervals of given size in the scalar parameter case, but then there is a choice as to how to select the parameterization.
2.7 Case Studies
Example 2.1 Normal Distribution Model
Suppose X follows a normal distribution with mean μ and variance σ². Let μ be the parameter of interest and σ the nuisance parameter. Under this parameterization, the parameters are orthogonal. After detailed calculation we have I_{11} = σ^{-2}, I_{22} = 2σ^{-2}, K_{111} = K_{1,11} = 0, K_{112} = 2σ^{-3}, K_{222} = 10σ^{-3}, K_{2,22} = -6σ^{-3}, K_{221} = 0. Jeffreys' prior under his general rule is σ^{-2}, but he prefers the other one, σ^{-1}, which is also the right Haar measure and the reference prior. Tibshirani's first order matching priors are π(μ, σ) ∝ g(σ)σ^{-1}. Within the class of Tibshirani's priors there is only one, π(σ) ∝ σ^{-1}, i.e. taking the function g as a constant, which is also the matching prior for the considerations of the second order matching via the quantiles, the profile LR statistic, the conditional LR statistic and the HPD regions with nuisance parameter case. If the two parameters μ and σ are both of interest, then for the LR statistic consideration, priors π(σ) ∝ σ^{-1}, σ^{-5} satisfy the matching condition; while for the HPD regions consideration, priors π(σ) ∝ σ^{-1}, σ⁵ satisfy the matching condition.
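For this example the quantile matching by π(μ, σ) ∝ σ^{-1} is in fact exact: under this prior the marginal posterior of μ is x̄ + n^{-1/2} s t_{n-1}, so the posterior quantiles are the endpoints of the classical t intervals. A Monte Carlo sketch of the exact coverage (Python with scipy; the particular numbers are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu, sigma, n, reps = 0.3, 2.0, 10, 100_000
tq = stats.t.ppf(0.95, df=n - 1)    # 0.95-quantile of t_{n-1}

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)
# Posterior 0.95-quantile of mu under pi(mu, sigma) ∝ 1/sigma:
quant = xbar + tq * s / np.sqrt(n)
cov = np.mean(mu < quant)           # frequentist coverage of the quantile
print(round(cov, 3))                # close to 0.95 (in fact exact)
```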
Now we consider another parameterization. Let θ_1 = μ be the parameter of interest and θ_2 = σ² be the nuisance parameter. Under this parameterization, the parameters are also orthogonal. The transformation of the parameters is just on the nuisance parameter. In this setup, we have I_{11} = θ_2^{-1}, I_{22} = θ_2^{-2}/2, K_{111} = K_{1,11} = 0, K_{112} = θ_2^{-2}, K_{222} = 2θ_2^{-3}, K_{2,22} = -θ_2^{-3}, K_{221} = 0. Tibshirani's first order matching priors are π(θ_1, θ_2) ∝ g(θ_2)θ_2^{-1/2}. Within the class of Tibshirani's priors, taking g(θ_2) ∝ θ_2^{-1/2} gives π(μ, θ_2) ∝ θ_2^{-1}, which is also the matching prior for the considerations of the second order matching via the quantiles, the profile LR statistic, the conditional
LR statistic and the HPD regions with nuisance parameter. If both parameters θ_1 and θ_2 are parameters of interest, then under the LR statistic consideration, the priors π(θ_1, θ_2) ∝ θ_2^{-1}, θ_2^{-3} satisfy the matching condition. Transforming back to the (μ, σ) parameterization, the above priors become π(μ, σ) ∝ σ^{-1}, σ^{-5} respectively. This indicates that matching priors by using the LR statistic is parameterization invariant in this model. Consideration of the HPD regions without nuisance parameter leads to a different result. It turns out that priors π(θ_1, θ_2) ∝ θ_2^{-1}, θ_2 satisfy the matching condition via the HPD regions. Transforming back to the original (μ, σ) parameterization gives π(μ, σ) ∝ σ^{-1}, σ³. These are different from those under the same consideration but in the (μ, σ) setup, which give priors σ^{-1} and σ⁵. This indicates that the HPD regions consideration is not invariant to the parameterization.
Now we consider a simple model X ~ N(0, σ²) with only one parameter. If we use σ as the parameter, then the LR statistic consideration has solutions σ^{-1} and σ^{-3}; the HPD consideration has solutions σ^{-1} and σ³. But if we use θ_1 = σ² as the parameter, then the LR statistic consideration has solutions θ_1^{-1} and θ_1^{-2}; transforming back to the σ parameterization, these give σ^{-1} and σ^{-3} respectively; while the HPD consideration has solutions θ_1^{-1} and θ_1², which, transforming back to the original σ parameterization, give σ^{-1} and σ⁵ respectively.
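The back-transformations in this example are all instances of the change-of-variables rule π(σ) ∝ π(θ_1(σ)) |dθ_1/dσ| with θ_1 = σ². A symbolic check of two of the pullbacks used in the example, θ_1^{-1} → σ^{-1} and θ_1² → σ⁵ (sympy):

```python
import sympy as sp

sigma = sp.symbols('sigma', positive=True)
theta1 = sigma**2
jac = sp.diff(theta1, sigma)            # |d theta1 / d sigma| = 2*sigma

# pi(theta1) ∝ theta1^{-1} pulled back to the sigma parameterization:
back1 = sp.simplify(theta1**-1 * jac)   # ∝ sigma^{-1}
# pi(theta1) ∝ theta1^{2} pulled back:
back2 = sp.simplify(theta1**2 * jac)    # ∝ sigma^{5}
print(back1, back2)  # 2/sigma 2*sigma**5
```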
Example 2.2 Ratio of Independent Normal Means
Suppose that X and Y are independently normally distributed with means α and β and variances one. The parameter of interest is the ratio of the means θ_1 = β/α. Now we make a transformation from (α, β) to (θ_1, θ_2) with θ_2 = √(α² + β²). Under this parameterization, the parameters θ_1 and θ_2 are orthogonal. After detailed calculations, we have I_{11} = θ_2²(1 + θ_1²)^{-2}, I_{22} = 1, K_{111} = 6θ_1θ_2²(1 + θ_1²)^{-3}, K_{1,11} = -2θ_1θ_2²(1 + θ_1²)^{-3}, K_{112} = -θ_2(1 + θ_1²)^{-2}, K_{222} = K_{2,22} = K_{221} = 0. Tibshirani's
first order matching priors are π(θ1, θ2) ∝ g(θ2)θ2(1 + θ1^2)^{-1}; transforming back to the (α, β) parameterization gives π(α, β) ∝ g(√(α^2 + β^2)). Within the class of Tibshirani's priors, there is only one prior, π(θ1, θ2) ∝ θ2(1 + θ1^2)^{-1}, which is also the matching prior for the considerations of second order matching via the quantiles, the LR statistic, the profile LR statistic and the conditional LR statistic. The prior π(θ1, θ2) ∝ θ2(1 + θ1^2)^{-1}, transforming back to the original parameterization, gives π(α, β) ∝ constant, i.e. the flat prior in the original parameterization, which is also Jeffreys' prior. There is no solution within the class of Tibshirani's priors for the considerations of the HPD regions in the nuisance parameter and no nuisance parameter cases, the modified score statistic (Rao & Mukerjee, 1995) and the second order distribution function (Mukerjee & Ghosh, 1996). But the considerations of the HPD regions in the nuisance parameter and no nuisance parameter cases do have solutions outside this class: π(θ1, θ2) ∝ θ2(1 + θ1^2) satisfies both considerations.
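The orthogonality and the information entries in this example can be verified numerically. The following sketch (an illustration added for this transcript, not part of the original derivation; the test point (θ1, θ2) = (0.7, 1.9) is arbitrary) builds the per-observation information matrix in (θ1, θ2) from the Jacobian of (α, β), using the fact that (α, β) carries identity information:

```python
import math

def alpha_beta(t1, t2):
    # inverse of the transformation theta1 = beta/alpha, theta2 = sqrt(alpha^2 + beta^2)
    a = t2 / math.sqrt(1.0 + t1 * t1)
    return a, t1 * a

def info_matrix(t1, t2, h=1e-5):
    # per-observation Fisher information in (theta1, theta2); since (alpha, beta)
    # has identity information, I(theta) = J^T J with J = d(alpha, beta)/d(theta)
    def column(i):
        up = [t1, t2]; dn = [t1, t2]
        up[i] += h; dn[i] -= h
        au, bu = alpha_beta(*up); ad, bd = alpha_beta(*dn)
        return ((au - ad) / (2 * h), (bu - bd) / (2 * h))
    J = [column(0), column(1)]
    return [[J[i][0] * J[j][0] + J[i][1] * J[j][1] for j in range(2)] for i in range(2)]

t1, t2 = 0.7, 1.9
I = info_matrix(t1, t2)
# expect I11 = theta2^2 (1 + theta1^2)^{-2}, I22 = 1, I12 = 0 (orthogonality)
```

The finite-difference Jacobian keeps the check independent of the hand-derived derivatives.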
Example 2.3 Non-invariance of the Consideration of the Modified Score Statistic
Suppose that X ~ N(α, 1), Y ~ N(β, 1) and they are independent. Both parameters α and β are of interest. Under the (α, β) parameterization, global parametric orthogonality holds, I11 = 1, I22 = 1, and all the other quantities involved in our calculations are zero. It is easy to obtain that, for the consideration of the modified score statistic (Rao & Mukerjee, 1995), the general solutions are π(α, β) ∝ (A1α + B1)(A2β + B2), where the Ai, Bi are constants. So the flat prior in this (α, β) setup is a solution. But if we make a transformation from (α, β) to (θ1, θ2), where θ1 = β/α, θ2 = √(α^2 + β^2), then it becomes the case of Example 2.2. The flat prior in the original (α, β) setup becomes π(θ1, θ2) ∝ θ2(1 + θ1^2)^{-1} in the current (θ1, θ2) setup. From Example 2.2 we know that this prior is not a solution for the consideration of the modified score statistic. So the consideration
of the modified score statistic is not parameterization invariant.
Example 2.4 Ratio of Independent Exponential Means
Suppose that X and Y follow exponential distributions with means μ1 and μ2 respectively. The parameter of interest is the ratio of the means, θ1 = μ2/μ1. We make a transformation from (μ1, μ2) to (θ1, θ2), where θ2 = μ1μ2. Under this parameterization, global parametric orthogonality holds: I11 = (1/2)θ1^{-2}, I22 = (1/2)θ2^{-2}, K111 = (3/2)θ1^{-3}, K_{1,11} = -(1/2)θ1^{-3}, K112 = (1/4)θ1^{-2}θ2^{-1}, K222 = (7/4)θ2^{-3}, K_{2,22} = -(3/4)θ2^{-3}, K221 = 0. Tibshirani's first order matching priors are π(θ1, θ2) ∝ g(θ2)θ1^{-1}; transforming back to the original (μ1, μ2) parameterization gives π(μ1, μ2) ∝ g(μ1μ2). Within the class of Tibshirani's priors, only π(θ1, θ2) ∝ θ1^{-1}θ2^{-1}, or π(μ1, μ2) ∝ (μ1μ2)^{-1} in the original parameterization, is also the matching prior for the considerations of second order matching via the quantiles, the LR statistic, the profile LR statistic, the conditional LR statistic, the modified score statistic and the HPD regions with nuisance parameter. The prior π(μ1, μ2) ∝ (μ1μ2)^{-1} is also Jeffreys' prior and the reference prior of Bernardo (1979).
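The third-order quantities in this example can be checked numerically: for an exponential observation with mean μ(θ), the expected log-likelihood under a true point θ0 is E_{θ0}[ℓ(θ)] = −log μ(θ) − μ(θ0)/μ(θ), and an expectation such as K111 = E[∂³ℓ/∂θ1³] is a θ-derivative of this function at θ0. A sketch (an added illustration; the point (θ1, θ2) = (1.3, 2.1) is arbitrary):

```python
import math

T10, T20 = 1.3, 2.1  # a "true" parameter point used only for the check

def means(t1, t2):
    # component means in the (theta1, theta2) parameterization:
    # mu1 = sqrt(theta2/theta1), mu2 = sqrt(theta1*theta2)
    return math.sqrt(t2 / t1), math.sqrt(t1 * t2)

def expected_loglik(t1, t2):
    # E_{theta0}[l(theta)] per (X, Y) pair, summed over the two exponential components
    mu0 = means(T10, T20)
    return sum(-math.log(m) - m0 / m for m, m0 in zip(means(t1, t2), mu0))

def third_deriv_t1(f, t1, t2, h=1e-2):
    # central-difference third derivative in theta1
    return (f(t1 + 2*h, t2) - 2*f(t1 + h, t2) + 2*f(t1 - h, t2) - f(t1 - 2*h, t2)) / (2 * h**3)

K111 = third_deriv_t1(expected_loglik, T10, T20)
# expect K111 = (3/2) theta1^{-3}
```

The same device (differentiating the expected log-likelihood) checks the other K quantities.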
Example 2.5 Exponential Regression Model of Cox & Reid (1987)
Suppose that Y1, ..., Yn are independent and Yi follows an exponential distribution with mean θ2 exp(-θ1 ai), where the ai are constants with Σ_{i=1}^n ai = 0. The parameter θ1 can be any real value; the parameter θ2 must be positive. The parameter of interest is θ1. Under this parameterization, global parametric orthogonality holds: I11 = a1^2 + ··· + an^2, I22 = nθ2^{-2}, K111 = -K_{1,11} = -(a1^3 + ··· + an^3), K112 = (a1^2 + ··· + an^2)θ2^{-1}. Tibshirani's first order matching priors are π(θ1, θ2) ∝ g(θ2). Within the class of Tibshirani's priors, π(θ1, θ2) ∝ θ2^{-1} is the only prior which is also the matching prior for the considerations of second order matching via the quantiles, the conditional LR statistic and the HPD regions with nuisance parameter case.
Chapter 3
Matching Priors in the Product of
Normal Means Problem
We consider matching priors in the product of normal means problem (Berger & Bernardo, 1989) and make comparisons among the flat prior and the two reference priors derived in Berger & Bernardo (1989). We suggest a class of priors for this problem and obtain a noninformative prior based on matching properties.
3.1 Introduction and Notation
Suppose we have a vector of i.i.d. observations Z = ((X1, Y1), ..., (Xn, Yn))^T from Z = (X, Y), where X and Y are independent and follow normal distributions with means α, β respectively and variance 1. We also assume α > 0, β > 0. The parameter of interest is the product of the means, θ1 = αβ. The classical approach encounters difficulties in this problem, as does the standard noninformative prior approach (Efron, 1986). Berger & Bernardo (1989) used the reference prior approach to consider this problem and developed two reference priors. We now consider the
choice of the noninformative priors via the comparison of the frequentist coverage probabilities of posterior probability intervals.
The three noninformative priors we consider are the flat prior πu(α, β) ∝ 1 and the two reference priors πs and πr of Berger & Bernardo (1989).
Now we make a transformation of the parameters from (α, β) to (θ1, θ2), where θ2 = α^2 − β^2. It is easy to obtain that α^2 = (√(4θ1^2 + θ2^2) + θ2)/2 and β^2 = (√(4θ1^2 + θ2^2) − θ2)/2. The Jacobian of the transformation from (α, β) to (θ1, θ2) is 2(α^2 + β^2). The per observation information matrix in the new parameterization is diag((4θ1^2 + θ2^2)^{-1/2}, (1/4)(4θ1^2 + θ2^2)^{-1/2}), so the parameter θ1 is orthogonal to θ2. In the new parameterization the flat prior becomes πu(θ1, θ2) ∝ (4θ1^2 + θ2^2)^{-1/2}, the first reference prior becomes
πs(θ1, θ2) ∝ (4θ1^2 + θ2^2)^{-1/4},
and πr transforms correspondingly.
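The orthogonality of (θ1, θ2) can be checked directly: with identity information in (α, β), the information in the new parameterization is (B^{-1})^T B^{-1}, where B = ∂(θ1, θ2)/∂(α, β). A short numerical sketch (added as an illustration; the point (α, β) = (1.2, 0.5) is arbitrary):

```python
a, b = 1.2, 0.5                      # arbitrary point with alpha > 0, beta > 0
t1, t2 = a * b, a * a - b * b        # theta1 = alpha*beta, theta2 = alpha^2 - beta^2

# B = d(theta1, theta2)/d(alpha, beta); per-observation information in (alpha, beta)
# is the identity, so I(theta) = (B^{-1})^T B^{-1}
B = [[b, a], [2 * a, -2 * b]]
det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
Binv = [[B[1][1] / det, -B[0][1] / det], [-B[1][0] / det, B[0][0] / det]]
I = [[sum(Binv[k][i] * Binv[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
# expect a diagonal matrix: I11 = (4 theta1^2 + theta2^2)^{-1/2}, I22 = I11 / 4
```

Note that 4θ1^2 + θ2^2 = (α^2 + β^2)^2, which is what makes the closed forms above so compact.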
In this chapter we shall use the same notation as in Chapter 2. After detailed calculation, we have
3.2 First Order and Second Order Matching Priors
First we mention that we can obtain the matching priors by using the partial differential equations (2.2) and (2.4), but later we have to calculate the coverage probabilities, so we need to introduce the following result.
Suppose that we have the posterior γ-quantile to order O(n^{-3/2}) under the prior π. The frequentist coverage probabilities of the one-sided posterior probability intervals can then be obtained from Theorem 1 of Mukerjee & Dey (1993) (see also Mukerjee & Ghosh, 1996), where Φ(z_γ) = γ, 0 < γ < 1. This result is true in general as long as the parameters (θ1, θ2) are orthogonal.
If we consider first order matching via the posterior quantiles, then the prior π should satisfy T1(π; θ1, θ2) = 0. Solving the equation T1(π; θ1, θ2) = 0, we obtain the first order matching priors π(θ1, θ2) ∝ g(θ2)I11^{1/2} = g(θ2)(4θ1^2 + θ2^2)^{-1/4}, where g(θ2) is any smooth positive function (see Tibshirani, 1989). Transforming back to the original (α, β) parameterization gives π(α, β) ∝ g(α^2 − β^2)√(α^2 + β^2).
When g(θ2) = 1, this gives πs; i.e. the reference prior πs is the first order matching prior via the quantiles. The other two priors πu and πr do not satisfy the first order matching condition. So on this ground πs is a better choice than πu or πr.
It is worth mentioning that if the parameter of interest is θ2 = α^2 − β^2, then the first order matching priors are π(θ1, θ2) ∝ g(θ1)(4θ1^2 + θ2^2)^{-1/4}, where g(θ1) is any smooth positive function. Transforming back to the original (α, β) parameterization gives π(α, β) ∝ g(αβ)√(α^2 + β^2). Taking g(θ1) = 1 gives the prior πs, and another choice of g(θ1) gives πr. So in this case the priors πs and πr are the first order matching priors, but the prior πu is not a first order matching prior.
We now consider second order matching based on quantiles. The necessary and sufficient conditions are both T1(π; θ1, θ2) = 0 and T2(π; θ1, θ2) = 0. From the equation T1(π; θ1, θ2) = 0 we obtain the solutions π(θ1, θ2) ∝ g(θ2)(4θ1^2 + θ2^2)^{-1/4}. But when we use π(θ1, θ2) ∝ g(θ2)(4θ1^2 + θ2^2)^{-1/4} in the second equation, it is not satisfied for any choice of the function g(θ2). So in this case there is no matching prior for second order matching via the posterior quantiles.
But if we consider the two-sided equal tail areas posterior probability intervals having the same frequentist coverage probabilities to second order, then the only condition is T2(π; θ1, θ2) = 0. The flat prior πu satisfies this condition but the two reference priors πs and πr do not. So in the sense of accuracy of the frequentist coverage probabilities of two-sided posterior intervals, the flat prior πu is better than the two reference priors.
Within the class of first order matching priors via the posterior quantiles, there is no solution for matching via the posterior intervals based on the conditional LR
statistic (Cox & Reid, 1987; Ghosh & Mukerjee, 1992) and the highest posterior density (HPD) (Ghosh & Mukerjee, 1995). In the above we obtained that the flat prior πu is the solution for matching via the two-sided equal tail areas intervals. But this prior πu is not the solution for matching via the posterior intervals based on the conditional LR statistic and HPD.
3.3 Comparison of πu and πs
From the previous discussion, πr does not have any good matching properties if θ1 is the parameter of interest, so we do not consider it further. The posterior quantiles using the prior πs have frequentist coverage accurate to order O(n^{-1}) while the posterior quantiles using πu do not. So the posterior distribution using the prior πs is better in describing the location of the parameter θ1. But on the other hand, the two-sided posterior intervals using the prior πu have frequentist coverage accurate to order O(n^{-3/2}) while those using the prior πs have frequentist coverage accurate only to order O(n^{-1}). So the choice between πu and πs is still hard to make. We shall go further to see their performance in different situations.
First let us look at the following coverage probabilities:
After detailed calculation, we have the explicit forms of T1(πu; θ1, θ2) and T2(πs; θ1, θ2). The criterion of the comparison is that the smaller the absolute values of the functions T1 and T2, the better the prior. When α and β are very close, say α = β, both quantities tend to zero as α increases, but T2(πs; θ1, θ2) tends to zero faster; if α goes to zero, T1(πu; θ1, θ2) increases more slowly than T2(πs; θ1, θ2). When α and β are unequal, for example β/α = η (suppose β > α), then for large η, T1(πu; θ1, θ2) and T2(πs; θ1, θ2) tend to zero at the same rate η^{-2} as η increases. But there is one interesting case: if we fix β and let α go to zero, then T2(πs; θ1, θ2) approaches a nonzero limit, but T1(πu; θ1, θ2) still tends to zero. So we can say that, for more accurate coverage of the posterior quantiles, the prior πs is good for medium to large values of (α, β); for small values of (α, β), πu is better than πs, although both perform poorly.
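These qualitative comparisons can be explored with a rough Monte Carlo sketch (an added illustration, ignoring the positivity constraint on (α, β) so that the flat-prior posterior is exactly normal): simulate data, draw from the flat-prior posterior of θ1 = αβ, and see how often the true θ1 falls below the posterior 5% quantile.

```python
import random

random.seed(1)
alpha, beta, n = 2.0, 2.0, 10
reps, draws, gamma = 1000, 1000, 0.05
theta1 = alpha * beta
se = 1.0 / n ** 0.5
hits = 0
for _ in range(reps):
    xbar = random.gauss(alpha, se)           # sufficient statistics
    ybar = random.gauss(beta, se)
    # flat-prior posterior: alpha ~ N(xbar, 1/n), beta ~ N(ybar, 1/n), independent
    post = sorted(random.gauss(xbar, se) * random.gauss(ybar, se) for _ in range(draws))
    hits += theta1 < post[int(gamma * draws)]  # below the posterior 5% quantile?
cov = hits / reps
# cov estimates the frequentist coverage of the one-sided interval; it should be
# in the neighborhood of 0.05, with a visible O(n^{-1/2}) deviation
```

Replacing the flat prior by a draw-weighted version of πs in the inner loop would give the corresponding check for the reference prior.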
In order to have a direct comprehension of the coverage probabilities in both cases, it is better to have some exact values. Now we define
P_γ(πu) = n^{-1/2} φ(z_γ) T1(πu; θ1, θ2), P_γ(πs) = n^{-1} z_γ φ(z_γ) T2(πs; θ1, θ2), (3.8)
which are the error terms of the coverage probabilities to order O(n^{-3/2}). We call them the shifted coverage probabilities of the posterior quantiles arising from using the priors πu and πs respectively. Using γ = 0.05 and z_{0.05} = −1.645, so that P_{0.95}(πu) = P_{0.05}(πu) and P_{0.95}(πs) = −P_{0.05}(πs), we summarize the cases of sample size n = 1, 5, 10 in Table 3.1, using different values for (α, β).
To error of order O(n^{-3/2}), the posterior quantiles using the prior πu are always too small for all (α, β), lying to the left of θ1 more often than we would desire.
Table 3.1: Shifted Coverage Probabilities to O(n^{-3/2}) of 0.05 Posterior Quantiles.
Or we can say, the posterior distribution arising from using πu is shifted to the left. The shift has order O(n^{-1/2}), but the direction is always to the left. The error is cancelled out in the two-sided posterior interval (θ1^{(0.05)}, θ1^{(0.95)}), and thus the interval has coverage accuracy of order O(n^{-3/2}).
The posterior quantiles arising from using the prior πs also have a shift, but the rate is of order O(n^{-1}), and the directions depend on (α, β). If α and β are similar, or the ratio of the larger one to the smaller one is below a certain threshold, the left-tail quantiles fall below θ1 a bit too often, while the right-tail quantiles fall below θ1 a bit too rarely. We can say the posterior distribution arising from using the prior πs is shifted outward to both sides in the O(n^{-1}) term. This kind of shift doubles the O(n^{-1}) error of the individual quantiles for the two-sided posterior interval (θ1^{(0.05)}, θ1^{(0.95)}), whose coverage probability is larger than 90% by an error of O(n^{-1}). In the other cases of (α, β), the posterior distribution is shifted toward the center, and the resulting posterior interval (θ1^{(0.05)}, θ1^{(0.95)}) has frequentist coverage smaller than 90% at a rate of order O(n^{-1}).
When θ1 is close to the boundary θ1 = 0, both priors perform poorly, but πu is better than πs. From Table 3.1 we see that, when α and β are both close to 0, coverage is very poor. In this case, if the difference between α and β is large, then we can still use πu, but πs performs poorly.
Finally we would like to mention that the results presented in Berger & Bernardo (1989) are consistent with our previous discussion and our Table 3.1.
3.4 A Class of Priors
From the comparison of πu and πs, we know that the prior πs is a good prior for this problem, but it is not good when the parameter goes to the boundary and not too good for constructing two-sided confidence intervals. We suggest a class of priors which may have some of the good properties of both priors πu and πs:
πp(α, β) ∝ p πu(α, β) + (1 − p) πs(α, β),
where p is a constant between 0 and 1. We shall investigate this class of priors further. Writing out the exact form of the prior πp, which is πp(α, β) ∝ p + (1 − p)√(α^2 + β^2), in the (θ1, θ2) parameterization we have
πp(θ1, θ2) ∝ [p + (1 − p)(4θ1^2 + θ2^2)^{1/4}] (4θ1^2 + θ2^2)^{-1/2}.
From (3.1), the frequentist coverage probabilities arising from using the prior πp determine the shifted coverage probabilities P_γ(πp), to order O(n^{-3/2}), of the posterior quantiles using the prior πp. The shifted coverage probabilities of the two-sided posterior intervals (θ1^{(γ)}, θ1^{(1−γ)}) are two times the second part of P_γ(πp), which has order O(n^{-1}). After detailed calculation, we obtain the shifted coverage probabilities of the posterior quantiles arising from using the prior πp to order O(n^{-3/2}) as a weighted combination of P_γ(πu) and P_γ(πs), where P_γ(πu) has order O(n^{-1/2}) and P_γ(πs) has order O(n^{-1}).
Now suppose we have no preference for the prior πu or πs, so we choose p = 0.5; then π0.5(α, β) ∝ 1 + √(α^2 + β^2), and we obtain the corresponding P_γ(π0.5).
From the expression we see that, for medium to large values of (α, β), the error term of the coverage probabilities P_γ(π0.5) takes only a small proportion of the first order error P_γ(πu); the rest comes from the second order error P_γ(πs). So the resulting P_γ(π0.5) is comparable to the second order error P_γ(πs) in the small sample case, and is thus a great improvement on the first order error P_γ(πu). For small values of (α, β), the prior πs performs worse than the prior πu. In the expression of the error P_γ(π0.5), we see that most of the contribution is from P_γ(πu), and just a small part from P_γ(πs). So the resulting P_γ(π0.5) is just what we expect it to be. Also, the two-sided posterior intervals (θ1^{(γ)}, θ1^{(1−γ)}) using the prior π0.5 have coverage probabilities 1 − 2γ with error just a proportion √(α^2 + β^2)/(1 + √(α^2 + β^2)) of that using the prior πs, so they are always better than those using the prior πs. Therefore we can say that the prior π0.5 indeed has some of the good properties of the priors πu and πs, and not too many of their bad properties.
Using γ = 0.05 and z_{0.05} = −1.645, as an example we provide Table 3.2, which gives a direct comprehension of the improvements of the coverage probabilities using the prior π0.5 when comparing to Table 3.1. The entries in columns 4 and 7 are the shifted coverage probabilities of the posterior quantiles using the prior π0.5. The entries in columns 3 and 6 are the half shifted coverage probabilities of the two-sided posterior intervals.
Table 3.2: Shifted Coverage Probabilities to O(n^{-3/2}) of the Posterior Quantiles via Using the Prior π0.5.
The prior π0.5 is a prior between πu and πs; one question might be raised: what about the performance of other priors? Now we consider the family of Berger & Bernardo (1989). We shall try to look for other priors in this class which have good performance.
The shifted coverage probabilities of the posterior quantiles to order O(n^{-3/2}) arising from using the prior πij are then easy to obtain.
Our objective is to choose i and j such that the absolute value of T1(πij; θ1, θ2) is as small as possible. From the expression of T1(πij; θ1, θ2), we see that the factor (α^2 − β^2)^2(αβ)^{-2} is sensitive to the values of (α, β): when there is a difference between α and β, it becomes large. For this reason it is better to remove this factor, so we take j = 0. The best choice is then i = 1/2, which gives the prior πs. If we choose i close to 1/2, the absolute value of T1(πi0; θ1, θ2) becomes smaller compared to T1(πu; θ1, θ2). Next let us look at the second error term: when i is close to 1/2, T2(πi0; θ1, θ2) is close to T2(πs; θ1, θ2). So actually this is not better than directly choosing the prior πs.
From the above discussion, we conclude that the prior π0.5 ∝ 1 + √(α^2 + β^2) is suitable for this problem based on its matching properties. In regard to other choices of p in the class of priors πp, if we have a loss function we can then choose an optimal one. Otherwise, we take p = 0.5 and we can still say that the prior π0.5 is a noninformative prior.
Chapter 4
The Shrinkage Argument and
Matching Priors
In this chapter we discuss the shrinkage argument, which has been used quite extensively in the recent literature for deriving the frequentist properties of some Bayesian procedures. We demonstrate how we can use the shrinkage argument to derive the frequentist distribution of the signed root of the likelihood ratio (SRLR) statistic, and to derive matching priors via the posterior quantiles and via the distribution function.
4.1 The Shrinkage Argument
To evaluate the frequentist coverage probability of a posterior credible set is a crucial step in deriving different kinds of matching priors. The shrinkage argument used in several papers (Bickel & Ghosh, 1990; Ghosh & Mukerjee, 1991, 1993a, 1993b, 1995a, 1995b; Mukerjee & Dey, 1993; Sweeting, 1995a, 1995b; etc.) is a very useful tool for this evaluation. It is also useful for the study of contacts between
Bayesian inference and frequentist inference. Dawid (1991) gave an eloquent exposition of this argument. We now outline the argument in general, based on unpublished notes by R. Mukerjee.
Suppose that we have an observation X with likelihood function L(θ; x). The parameter θ, which might be a vector, has prior density π(θ). Now suppose we want to calculate the frequentist expectation of a function ψ(θ, x). We demonstrate how we can get it by using the shrinkage argument. The regularity conditions needed for the validity of this argument can be found in Bickel & Ghosh (1990) (see also Sweeting, 1995a, 1995b).
(1) E^π step. We first take expectation with respect to the posterior distribution of θ given X.
(2) E_θ step. We then take expectation with respect to the frequentist distribution of X.
(3) E_π step. We now take expectation with respect to the prior π.
So overall the shrinkage argument can be summarized as the three-step composition E_π E_θ E^π. Usually what we evaluate is the left side of the resulting identity. In order to get the frequentist expectation of the function of interest ψ(θ, x), i.e. E_θ{ψ(θ, X)}, we let the prior π converge weakly to the degenerate measure at the true parameter θ.
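As a toy numerical illustration of the three steps (not from the thesis), take X ~ N(θ, 1) with ψ(θ, x) = θx and a conjugate N(θ0, ε²) prior that is nearly degenerate at θ0; the triple expectation E_π E_θ E^π{ψ} then approaches E_{θ0}{ψ(θ0, X)} = θ0²:

```python
import random

random.seed(0)
theta0, eps, N = 1.5, 0.05, 200_000  # nearly degenerate prior, Monte Carlo size

acc = 0.0
for _ in range(N):
    theta = random.gauss(theta0, eps)                 # E_pi: draw theta from the prior
    x = random.gauss(theta, 1.0)                      # E_theta: draw X given theta
    post_mean = (x * eps**2 + theta0) / (1 + eps**2)  # conjugate posterior mean of theta
    acc += x * post_mean                              # E^pi: posterior expectation of theta*x
triple = acc / N
# as eps -> 0 this triple average approaches theta0^2 = 2.25
```

Shrinking ε further tightens the agreement, which is exactly the "prior converging to a point mass" step of the argument.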
In the application of this shrinkage argument, the evaluation of the E_θ step with error of order O(n^{-3/2}) is quite straightforward, but if we want to calculate to the next order, then it becomes more complicated (see Chapter 5). As to the evaluation of the E_π step, Ghosh & Mukerjee (1991) developed a simple and effective technique. We first choose a prior π which satisfies the regularity conditions of Bickel & Ghosh (1990), and which, together with its derivatives (the number of orders depends on the order to which we want to evaluate the probability), vanishes on the boundaries of a rectangle containing the true parameter as an interior point. Thus we can evaluate the E_π step by integration by parts. Finally we let the prior π converge weakly to the degenerate measure at the true parameter. This process then leads to the desired result, the frequentist expectation of the function of interest ψ(θ, X). In §4.2, §4.3 and §4.4 we give examples to demonstrate how to use this shrinkage argument in specific situations.
It is not difficult to see that the E_θ step in the shrinkage argument, which refers to taking expectation with respect to the frequentist distribution of X, can be replaced by E_{(θ,a)}, the expectation under the frequentist conditional distribution of the MLE given an ancillary statistic a; we then obtain the frequentist conditional expectation given a of the function of interest ψ(θ, X). In Chapter 5, we use this shrinkage argument to obtain the asymptotic expansion of the frequentist conditional distribution of the MLE
given an ancillary statistic a.
4.2 Frequentist Distribution of the SRLR Statistic by Using the Shrinkage Argument
4.2.1 Introduction and Notation
Suppose that we have i.i.d. observations {Xi, i ≥ 1}. Let X = (X1, ..., Xn)^T and let the log-likelihood function be ℓ(θ; X), where θ is a scalar parameter with prior π(θ). The maximum likelihood estimate of θ based on X is θ̂. In this section we shall use the shrinkage argument to derive the frequentist distribution to order O(n^{-3/2}) of the signed root of the likelihood ratio (SRLR) statistic r, defined as
r = sgn(θ̂ − θ) {2[ℓ(θ̂; X) − ℓ(θ; X)]}^{1/2}. (4.2)
We assume the regularity conditions as in Bickel & Ghosh (1990) with m = 2. All formal expansions for the posterior as used here are valid for sample points in a set S which can be defined along the lines of Bickel & Ghosh (1990) with P_θ probability 1 − O(n^{-3/2}) uniformly over compact sets of θ. Let ĵ(θ̂) = nĵ1(θ̂) = −ℓ″(θ̂) be the observed Fisher information evaluated at θ̂, and let I(θ) = nI1(θ) = E_θ[−ℓ″(θ)] be the expected Fisher information. Sometimes we write ĵ1 and I1 for ĵ1(θ̂) and I1(θ). We also let
π̂ = π(θ̂), π̂′ = π′(θ̂), π̂″ = π″(θ̂).
In the next two sections, §4.3 and §4.4, we shall use the same notation as defined above.
4.2.2 The Frequentist Distribution of the SRLR Statistic
In order to get the frequentist distribution of r, we use the shrinkage argument as described in §4.1. First we need the posterior distribution of r.
The expansion of the posterior distribution of r in Chapter 5 is carried up to the order O(n^{-2}), so here we use the result of Chapter 5 only to the order O(n^{-3/2}); we give the complete expansion to order O(n^{-2}) in Chapter 5 to maintain the integrity of the expansion. Let I denote the indicator function, and let E^π, E_θ, E_π be as in §4.1. From (5.15), we take the posterior distribution to order O(n^{-3/2}) of r given X under the prior π. Please note that the quantities β3, β4 carry slightly different notation in Chapter 5, but the current notation is more convenient for the present purposes.
Now we need to evaluate the expectation with respect to the frequentist distribution of X. Following the steps of Ghosh & Mukerjee (1991, 1992b, etc.) and noting the regularity conditions of Bickel & Ghosh (1990), we obtain the corresponding expansion.
The last step is taking expectation with respect to the prior π. Now we suppose that the true parameter is θ. We then choose a prior π such that π and its first derivative vanish on the boundaries of an open interval containing θ as an interior point, and calculate the E_π step by integration by parts. Finally we let π converge weakly to the degenerate measure at θ (see §4.1 for the general shrinkage argument). This leads to the following result.
Theorem 4.1 The frequentist distribution of the SRLR statistic r defined in (4.2) has the following asymptotic expansion, where T1(θ) and T2(θ) are defined in (4.7) and (4.8). □
Remark 1. In general the result of Theorem 4.1 differs from that of Theorem 5.2 at order O(n^{-3/2}), because Theorem 5.2 concerns the frequentist conditional distribution. But in one-parameter exponential models they are the same to order O(n^{-3/2}), and agree with the Lugannani & Rice approximation.
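The claim that r is standard normal to first order can be checked by simulation in a model where everything is explicit; the sketch below (an added illustration using an exponential model with rate θ; sample sizes and seeds are arbitrary) compares P(r ≤ 1.645) with Φ(1.645) ≈ 0.95:

```python
import math, random

random.seed(2)
theta, n, reps = 1.0, 100, 20_000

def loglik(th, s):
    # exponential(rate th), n observations with sum s: l(th) = n log th - th * s
    return n * math.log(th) - th * s

below = 0
for _ in range(reps):
    s = sum(random.expovariate(theta) for _ in range(n))
    mle = n / s                                   # MLE of the rate
    lr = 2.0 * (loglik(mle, s) - loglik(theta, s))
    r = math.copysign(math.sqrt(max(lr, 0.0)), mle - theta)  # signed root of the LR
    below += r <= 1.645
p = below / reps
# p should be close to 0.95, up to an O(n^{-1/2}) error and Monte Carlo noise
```

Pushing n lower makes the O(n^{-1/2}) departure from normality visible, which is what the higher-order terms in the expansion quantify.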
4.3 Matching Priors via the Posterior Quantiles
In this section we demonstrate how to use the shrinkage argument to derive matching priors. We consider the situation of Welch & Peers (1963) and look for priors ensuring the frequentist coverage probabilities of the one-sided posterior intervals to some asymptotic order, i.e. the priors should have the following properties:
P^π( θ < θ^{(α)}(π, X) | X ) = α + O(n^{-i/2}), P_θ( θ < θ^{(α)}(π, X) ) = α + O(n^{-i/2}),
where i = 2 or 3.
4.3.1 Calculating the Posterior Quantiles of θ
To calculate the posterior quantiles of θ, we use a posterior standardized version ρ defined as
ρ = √n ĵ1(θ̂)^{1/2} (θ − θ̂). (4.10)
We shall use the posterior distribution of r in (4.4) to calculate the posterior distribution of ρ. From (5.7), r and ρ are related by (4.11). Now we can calculate the posterior distribution of ρ: for a given ρ0 which is free of θ, we use expansion (4.4) and equation (4.11). Expanding Φ(r0) and φ(r0) in r0 around ρ0 and rearranging the terms, we then have the posterior distribution of ρ given X:
Using Lemma 2.1 with the posterior distribution of ρ, from (4.12) we obtain the α-quantile of the posterior distribution of ρ to order O(n^{-3/2}):
ρα = zα + ···.
If we change to the α-quantile of the posterior distribution of θ, we get the corresponding expansion. Please note that ρα depends on π and X.
4.3.2 Frequentist Coverage Probabilities of Posterior Intervals
Now we use the shrinkage argument to evaluate the frequentist probabilities P_θ( ρ < ρα ). Under a prior π0 which satisfies the regularity conditions as in §4.2, from the asymptotic expansion (4.12) of the posterior distribution of ρ, we obtain P^{π0}( ρ < ρα | X ). We then expand φ(ρα) and the accompanying coefficient functions in ρα around zα.
Next we need to calculate the expectation of P^{π0}( ρ < ρα | X ) with respect to the distribution of X. As in §4.2 we can obtain the corresponding expansion. The final step is to calculate the expectation with respect to the prior π0. Here we choose the prior π0 such that π0 and its first derivative vanish on the boundaries of an open interval containing the true θ as an interior point, and we let the prior π0 converge weakly to the degenerate measure at θ. From this process we obtain the frequentist coverage probabilities of the one-sided posterior intervals.
4.3.3 Matching via the Posterior Quantiles
To consider first order matching via the posterior quantiles, i.e. ensuring the same frequentist coverage probabilities of the one-sided posterior intervals up to an error term of order O(n^{-1}), the prior π(θ) should satisfy the equation T1(π; θ) = 0. Solving the equation T1(π; θ) = 0 for π, we get the solution π(θ) ∝ I1(θ)^{1/2}.
As to second order matching via the quantiles, the prior π(θ) should satisfy T1(π; θ) = 0 and T2(π; θ) = 0. From T1(π; θ) = 0 we get π(θ) ∝ I1(θ)^{1/2}. Now we use π(θ) ∝ I1(θ)^{1/2} in the second equation T2(π; θ) = 0. We find that to ensure T2(π; θ) = 0, the log-likelihood function must satisfy some conditions. Hence second order matching via the quantiles depends on the model, not just on the prior itself.
In order to express the conditions required for second order matching, we define a function of θ, the skewness of the score function. It is not difficult to see that the Bartlett (1953) relations hold; solving these for E_θ[ℓ‴(θ)] gives the relation between β3 and the skewness ρ3.
Now if we use (4.21) in the equation T2(π; θ) = 0 with the prior π(θ) ∝ I1(θ)^{1/2} obtained from first order matching, we get the condition that the derivative of the skewness vanishes, i.e. the skewness of the score function does not depend on the parameter θ. This is the result of Welch & Peers (1963). Mukerjee & Dey (1993) also obtained this result by using the shrinkage argument. Our derivation here gives more details.
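The first order matching property of π(θ) ∝ I1(θ)^{1/2} can be seen by simulation. For an exponential model with rate θ, Jeffreys' prior is π(θ) ∝ 1/θ and the posterior is Gamma(n, Σxᵢ); since θΣxᵢ is an exact pivot here, the coverage is in fact exact, a special feature of this model (this sketch is an added illustration, not part of the original derivation):

```python
import random

random.seed(3)
theta, n, level = 0.7, 5, 0.05

# under Jeffreys' prior the posterior of theta is Gamma(n, s) with s = sum of the data,
# so the posterior `level`-quantile equals g / s, where g is the Gamma(n, 1) quantile
pivot = sorted(random.gammavariate(n, 1.0) for _ in range(100_000))
g = pivot[int(level * len(pivot))]

reps, hits = 50_000, 0
for _ in range(reps):
    s = sum(random.expovariate(theta) for _ in range(n))
    hits += theta < g / s
cov = hits / reps
# frequentist coverage of the one-sided posterior interval; should match 0.05 closely
```

For models without an exact pivot, the same experiment would show coverage γ + O(n^{-1}) rather than exact agreement.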
4.4 Matching Priors via the Distribution Function
In this section we consider matching via the distribution function in two cases. First, using the SRLR statistic r defined by (4.2), we obtain that second order matching via the distribution function is equivalent to second order matching via the posterior quantiles. We then use the statistic ρ defined by (4.10), and obtain that second order matching via the distribution function leads to second order matching via the posterior quantiles. All the notation is the same as in §4.2.
From (4.9), we have the frequentist distribution of the SRLR statistic r. In order to obtain matching priors we average the posterior distribution of r given X over the distribution of X. The posterior distribution of r is given in (4.4). From §4.2, we have the averaged posterior distribution with coefficient functions G1^π(θ) and G2^π(θ).
To consider first order matching between the averaged posterior distribution and the frequentist distribution of r, the prior π should satisfy G1^π(θ) = T1(θ). Solving the equation we get π(θ) ∝ I1(θ)^{1/2}. To consider second order matching, the prior should satisfy G1^π(θ) = T1(θ) and G2^π(θ) = T2(θ). From the first equation we get π(θ) ∝ I1(θ)^{1/2}. Now using π(θ) ∝ I1(θ)^{1/2} in the second equation, and expressing β3 in terms of the skewness ρ3 as we did in §4.3, we find that in order to satisfy the second equation we again need the skewness of the score function to be constant in θ. In other words, second order matching requires that the prior be π(θ) ∝ I1(θ)^{1/2} and that the skewness of the score function be independent of the parameter θ. It turns out that second order matching via the distribution function of r is equivalent to second order matching via the posterior quantiles of θ.
If we use ρ = √n ĵ1(θ̂)^{1/2}(θ − θ̂), the posterior standardized version, instead of r to consider the matching, we shall get a different result. Please see Ghosh & Mukerjee (1993b) and Mukerjee & Ghosh (1996) in this context.
From (4.12) of §4.3 we have the posterior distribution of ρ. Following the steps in the shrinkage argument we can obtain the averaged posterior distribution of ρ, and also the frequentist distribution of ρ; matching requires that the two be equivalent to some asymptotic order. To consider first order matching, the prior π should satisfy G3(ρ0; θ) = T3(ρ0; θ). We solve the equation to get π(θ) ∝ I1^{1/2}(θ). For second order matching, the prior π is a matching prior if and only if G3(ρ0; θ) = T3(ρ0; θ) and G4(ρ0; θ) = T4(ρ0; θ). Solving the equations we get π(θ) ∝ I1(θ)^{1/2}, and the model should also satisfy a condition which implies that the skewness of the score function does not depend on the parameter θ. So in this case second order matching via the distribution function leads to second order matching via the posterior quantiles. This is stronger than the corresponding result using the statistic r.
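The Welch & Peers condition, that the standardized skewness of the score be free of θ, is easy to check by simulation in a specific model: for an exponential model with rate θ the score per observation is u = 1/θ − X, and the standardized skewness E[u³]/I1^{3/2} equals −2 for every θ (sketch added as an illustration; the two test values of θ are arbitrary):

```python
import random

random.seed(4)

def score_skewness(theta, N=200_000):
    # standardized third moment of the score u = 1/theta - X, X ~ Exp(rate theta);
    # the per-observation information is I1 = 1/theta^2
    m3 = 0.0
    for _ in range(N):
        u = 1.0 / theta - random.expovariate(theta)
        m3 += u ** 3
    m3 /= N
    return m3 / (1.0 / theta**2) ** 1.5

s_low, s_high = score_skewness(0.5), score_skewness(3.0)
# both estimates should be near the exact value -2, independently of theta
```

Constancy of this skewness is exactly what makes Jeffreys' prior second order matching in this model.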
Chapter 5
Tail Probability of the MLE to O(n^{-2})
by Using the Shrinkage Argument
In this chapter, we use the shrinkage argument to obtain the asymptotic expansion to O(n^{-2}) of the frequentist conditional distribution of the maximum likelihood estimate θ̂ given an ancillary statistic a in the scalar parameter case. The expansion can be used to compare other approximations of the conditional tail probability of θ̂; we shall discuss this topic in the next chapter. We also obtain the posterior probability of θ to O(n^{-2}) and the Bartlett corrections for which the posterior and the frequentist conditional distributions of the likelihood ratio (LR) statistic are χ²₁ to O(n^{-2}).
Suppose that we have i.i.d. observations X = (X₁, ..., Xₙ)ᵀ from a continuous model f(x; θ) with scalar parameter θ. The log-likelihood function is ℓ(θ) = ℓ(θ; X) = log ∏ᵢ f(Xᵢ; θ), and the maximum likelihood estimate of θ is θ̂. We assume there exists a sufficient statistic (otherwise take X itself); we then make a dimension reduction through this sufficient statistic to obtain a likelihood function which depends on the parameter θ and the sufficient statistic. After the sufficiency reduction, we assume that there is a one-to-one transformation to (θ̂, a), where a is an ancillary statistic for θ. For conditional inference it is more convenient to rewrite the log-likelihood function as ℓ(θ; θ̂, a), while in some cases we shall still use ℓ(θ; X). Now ℓ(θ; θ̂, a) is a function of both θ and (θ̂, a), depending on the data via (θ̂, a).
We let the parameter θ have a prior density π(θ) which is four times continuously differentiable. Now we assume regularity conditions similar to those of Bickel & Ghosh (1990) with m = 3, but the probability P_θ is replaced by the conditional probability P_{(θ,a)} given the ancillary statistic a. All formal expansions for the posterior as used here are valid for sample points in a set S which can be defined along the lines of Bickel & Ghosh (1990), with P_{(θ,a)} probability 1 − O(n^{-2}) uniformly over compact sets of θ.
For convenience, we use the following notation (also in Chapter 6): ℓ_{i;j} denotes ∂^{i+j}ℓ(θ; θ̂, a)/∂θ^i ∂θ̂^j, and ρ_{i;j} = ρ_{i;j}(θ̂; a) and ρ̄_{i;j} = ρ̄_{i;j}(θ; a) denote the corresponding standardized quantities ℓ_{i;j}/n evaluated at θ = θ̂ and at θ respectively; the ρ_{i;j} are functions of θ̂ and a, while the ρ̄_{i;j} are functions of only θ and a. At the same time, π̂ = π(θ̂), π̂′ = π′(θ̂), π̂″ = π″(θ̂), π̂‴ = π‴(θ̂).
In the following we introduce some basic results which we shall use in the later calculations.
Writing out the likelihood equation explicitly, we have ℓ_{1;0}(θ; θ̂, a) = 0 at θ = θ̂. Replacing θ by θ̂, we then have ℓ_{1;0}(θ̂; θ̂, a) = 0, or ℓ̂_{1;0} = 0, identically in θ̂. Differentiating this repeatedly, we have the following observed balance relations (Barndorff-Nielsen & Cox, 1994, Chapter 5):
ℓ̂_{2;0} + ℓ̂_{1;1} = 0,
On the other hand, ĵ = −ℓ̂_{2;0}; differentiating ℓ̂_{1;0} = 0 repeatedly, we get
After some manipulation we have the following version of the above equations:
5.2 Posterior Probability to O(n^{-2})
It is well known that the posterior density of θ given (θ̂, a) under the prior π is
Now we present a result on the approximation of ∫ exp{ℓ(θ)}π(θ) dθ,
where
For details please refer to Laplace's method in Barndorff-Nielsen & Cox (1989). Actually this result can be obtained directly from the following expansions; we shall mention this later in Appendix 5.1 at the end of this chapter. The posterior density then has the expansion:
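The Laplace approximation invoked here can be checked numerically in a case where the integral has a closed form. The following sketch assumes the exponential-mean model with a flat prior (an illustration, not from the thesis), for which ∫ exp{ℓ(θ)} dθ = (nθ̂)^{1−n} Γ(n−1) exactly, so the O(n^{-1}) relative error of the leading Laplace term exp{ℓ(θ̂)}(2π/ĵ)^{1/2} is directly visible:

```python
import math

# Exponential-mean model: ℓ(θ) = -n log θ - n θ̂/θ, observed information
# ĵ = n/θ̂².  Compare the exact integral with its Laplace approximation.
def log_exact(n, theta_hat):
    return (1 - n) * math.log(n * theta_hat) + math.lgamma(n - 1)

def log_laplace(n, theta_hat):
    ell_hat = -n * math.log(theta_hat) - n      # ℓ(θ̂)
    j_hat = n / theta_hat ** 2                  # observed information
    return ell_hat + 0.5 * math.log(2 * math.pi / j_hat)

def rel_error(n, theta_hat=1.0):
    return abs(math.exp(log_laplace(n, theta_hat) - log_exact(n, theta_hat)) - 1)

print(rel_error(10), rel_error(40))  # O(n^{-1}): roughly 0.107 and 0.027
```

Quadrupling n cuts the relative error by roughly a factor of four, as the O(n^{-1}) rate predicts.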
Now we make a transformation from θ to r, the signed root of the likelihood ratio (SRLR) statistic defined as
The Jacobian from θ to r is easily calculated:
For convenience we let
then the posterior density can be rewritten as
where φ(·) is the standard normal p.d.f.
In order to calculate the posterior probability of θ given (θ̂, a), we need to obtain the expansion of u with respect to r; actually what we need is the expansion of u^{-1} − r^{-1} with respect to r. We begin by expanding ℓ′(θ) in θ around θ̂:
Now we introduce another variable ρ which is asymptotically equivalent to r:
Now we have the expansion of r² with respect to ρ.
Making another expansion, we then have
There are two other formulas that will be helpful later: the expansion of r with respect to ρ, and its inversion, the expansion of ρ with respect to r.
Similarly we expand u^{-1} in θ around θ̂ and transform to an expression in ρ:
Now we can easily obtain
Using equation (5.8), we then have
where
Given the above expansion, the integration calculation is straightforward:
By equations (5.5), (5.11) and (5.13), we then obtain the posterior distribution of θ (also the posterior distribution of r). Here we state the result in the following theorem.
Theorem 5.1 The posterior distribution of θ given (θ̂, a) under the prior π(θ) has the following asymptotic expansion
where
From the asymptotic expansion of the posterior distribution of r, we can obtain the following result regarding the Bayesian Bartlett correction of the likelihood ratio (LR) statistic r². For other derivations of the Bayesian Bartlett correction, please see Bickel & Ghosh (1990), Ghosh & Mukerjee (1991) and DiCiccio & Stern (1993).
Corollary 5.1 If we define the SRLR statistic r as in (5.3), then the posterior distribution of r²(1 − 2H₂(θ̂)/n) is χ²₁ to O(n^{-2}).
PROOF: First we see
where
Now we use expansion (5.15) to get the posterior probability expansion with respect to r₁, and then expand Φ(r₁) and φ(r₁) in r₁ about r₀ to get the expansion with respect to r₀.
and so
i.e. r²(1 − 2H₂(θ̂)/n) follows the χ²₁ distribution to order O(n^{-2}). □
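A numerical illustration of the Bartlett correction idea, sketched for the assumed exponential-mean model (a standard case worked out here, not a derivation from the thesis): there the LR statistic W = r² satisfies E[W] = 2n(log n − ψ(n)) exactly, with ψ the digamma function, and this expands as 1 + 1/(6n) + O(n^{-2}), so a factor of the form 1 − 2G(θ̂)/n with G = 1/12 recenters W to mean 1 + O(n^{-2}):

```python
import math

# E[W] for the exponential-mean model via the exact digamma identity;
# digamma is computed by central differences of log Γ.
def digamma(x, h=1e-5):
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

def mean_lr(n):
    return 2 * n * (math.log(n) - digamma(n))

n = 50
print(n * (mean_lr(n) - 1))  # ≈ 1/6, i.e. E[W] ≈ 1 + 1/(6n)
```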
For illustration we give results for the simple example of location models.
Example 5.1 Suppose that X₁, ..., Xₙ are i.i.d. observations from the location model with density f(x − θ), where f(·) is a known function and θ is a scalar parameter. We use the flat prior π(θ) ∝ 1 for θ. Let g(·) = log f(·); then the maximum likelihood estimate is defined by the following equation
It is not difficult to verify that a = (a₁, ..., aₙ), where aᵢ = Xᵢ − θ̂, is an ancillary statistic for θ and has dimension n − 1. Now the log-likelihood function can be expressed as
and then
5.3 Some Detailed Calculations
In this section we introduce some notation first, then present most of the detailed calculations involved in using the shrinkage argument. The final result will be given in the next section.
We let I_{(r<r₀)} denote the indicator function, and E_{(θ,a)} denote expectation with respect to the conditional distribution of θ̂ given an ancillary statistic a. E_π and E^π are the same as defined in §4.1. Using the general shrinkage argument in §4.1, for given r₀ which is free of θ and independent of θ̂:
E_π(E_{(θ,a)}{E^π[I_{(r<r₀)} | θ̂, a]}) = E_π(E_{(θ,a)}{I_{(r<r₀)}}).
From (5.15) of Section 5.2, we have
Now we are going to calculate the E_{(θ,a)} step, i.e. taking expectation with respect to the conditional distribution of θ̂ given the ancillary statistic a, assuming the parameter is θ:
E_{(θ,a)}{E^π[I_{(r<r₀)} | θ̂, a]} = E_{(θ,a)}{Φ(r₀) − (1/√n)φ(r₀)H₁(θ̂) − (1/n)φ(r₀)H₂(θ̂) − ...}.
Note that we have the following results. There are many ways to derive them; for details please refer to Barndorff-Nielsen & Cox (1994, Chapter 5). We just present them here without proof.
Expanding the functions Hᵢ(θ̂) in θ̂ about θ to the second order in θ̂ − θ, and then using the above results when taking expectations, we have
The functions Hᵢ(θ) can be obtained by replacing θ̂ with θ in the functions Hᵢ(θ̂). As to the O(n^{-1}) terms, we only need the one in E_{(θ,a)}H₁(θ̂). We give separate results here:
Now we can easily obtain the result: given r₀ which is free of θ and independent of θ̂,
where
The functions H₁(θ), H₂(θ) and H₃(θ), as we stated above, can be obtained by replacing θ̂ with θ in the functions H₁(θ̂), H₂(θ̂) and H₃(θ̂), but H₄(θ) is not obtained by replacing θ̂ with θ in the function H₄(θ̂).
The final step is taking expectation with respect to the prior π(·). Here we make further assumptions on the prior π: we assume that π and its first and second derivatives vanish on the boundaries of an open interval. Now we can calculate E_π using integration by parts. The calculations are quite straightforward, but it is easy to make mistakes, especially in reorganizing the results when the derivatives of the ρ̄_{i;j}'s come in. We shall give the full results in the next section, and present the details of the calculations in Appendix 5.2 at the end of this chapter.
5.4 Tail Probability to O(n^{-2})
In this section we demonstrate how to get the frequentist conditional distribution of θ̂ given the ancillary statistic a. After the E_π step using integration by parts, we have the following result: given r₀ which is free of θ and independent of θ̂,
where
The function G₄(θ) cannot be obtained simply from the functions G₁(θ), G₂(θ) and G₃(θ), in contrast to H₄(θ̂) of (5.16) in the posterior expansion.
In order to state the argument clearly, we now change the variable of integration in the above equation from θ to β. This change of variable does not affect the equation in general, but the conditions we assumed for r₀, that it is free of θ and independent of θ̂, now become that it is free of β and independent of θ̂. Also, all functions of θ, namely the Gᵢ(θ)'s, become functions of β, i.e. Gᵢ(β)'s. Now suppose that the true parameter is θ. We then choose the prior π such that π and its first and second derivatives vanish on the boundaries of an open interval containing θ as an interior point. After the change of notation we have
Finally we let π converge weakly to the degenerate measure at θ. This leads to the following result, the frequentist conditional distribution of r.
Theorem 5.2 The frequentist conditional distribution of the SRLR statistic r defined in (5.3) has the following asymptotic expansion
where r₀ is independent of θ̂, and G₁(θ), G₂(θ), G₃(θ), G₄(θ) are defined in (5.17)-(5.20) respectively. □
If our interest is in the maximum likelihood estimate θ̂, then we have the following corollary.
Corollary 5.2 If we let θ̂₀ be the observed value of θ̂, then the tail probability has the following asymptotic expansion
where
r₀ = sgn(θ − θ̂₀) √{2[ℓ(θ̂₀; θ̂₀, a) − ℓ(θ; θ̂₀, a)]}
and G₁(θ), G₂(θ), G₃(θ), G₄(θ) are defined in (5.17)-(5.20) respectively. □
Remark 1. The result of Corollary 5.2 extends the approximations of the tail probability of the MLE in the current literature by one more order. For a rigorous proof of Corollary 5.2 we could follow the steps in Bickel & Ghosh (1990) by examining the regularity conditions, but here we do not attempt this.
Remark 2. We have found that the functions G₁(θ) and G₂(θ) are invariant with respect to one-to-one reparameterizations. As to G₃(θ) and G₄(θ), from the invariance of the probability and the invariance of r₀ it seems that they might have the invariance property, but further investigation is required to clarify this (see McCullagh, 1987, Chapter 7, for invariant expansions).
Remark 3. The function G₁(θ) is related to the probability at the true parameter point, P_θ(θ̂ > θ | a) = 1/2 + G₁(θ)/√(2πn) + O(n^{-3/2}). The Lugannani & Rice type formula (see Chapter 6) fails to capture the probability at this point (see also Reid, 1988).
Remark 4. The function G₂(θ) is related to the renormalizing constant in the p* formula. For details, please refer to Chapter 6.
As to the LR statistic r², we have the Bartlett correction of its frequentist conditional distribution, stated in the following corollary. For derivations of the Bartlett correction of the frequentist distribution, rather than the frequentist conditional distribution considered here, please refer to Barndorff-Nielsen & Cox (1984), Barndorff-Nielsen & Blaesild (1986), Barndorff-Nielsen & Hall (1988) and Ghosh & Mukerjee.
Corollary 5.3 If we define the SRLR statistic r as in (5.3), then the frequentist conditional distribution of r²(1 − 2G₂(θ̂)/n) given a is χ²₁ to O(n^{-2}).
PROOF: Use Theorem 5.2 and proceed exactly along the lines of the proof of Corollary 5.1. □
Example 5.2 (Continued from Example 5.1)
In the following, we give the quantities needed for the expansion of the frequentist conditional distribution of θ̂ given the ancillary statistic a.
Comparing with the results of Example 5.1, by expansions (5.15) and (5.21) we obtain that in one-parameter location models the frequentist conditional distribution of r agrees with the posterior distribution of r to order O(n^{-2}) when the prior is taken to be the flat prior. □
Example 5.3 Suppose that X₁, ..., Xₙ are i.i.d. from the exponential family
where θ is the scalar canonical parameter. The likelihood function is
We can see that Σᵢ t(Xᵢ) is a sufficient statistic for θ. Since there is no ancillary statistic in this case, we make a transformation from Σᵢ t(Xᵢ) to θ̂, where θ̂ is the maximum likelihood estimate of θ defined by
Σᵢ₌₁ⁿ t(Xᵢ) = n ψ′(θ̂).
The log-likelihood function then can be expressed as
Now it is easy to obtain that
and all other ρ_{i;j}'s are equal to zero. The functions Gᵢ(θ) now have quite a simple form:
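For a concrete member of the family, the MLE equation Σ t(Xᵢ) = n ψ′(θ̂) can be solved numerically; the sketch below assumes the Poisson model in canonical form (t(x) = x, ψ(θ) = e^θ), chosen here for illustration, where the equation also has the closed form θ̂ = log x̄:

```python
import math

# Newton iteration on Σt(Xᵢ) - n ψ'(θ) = 0; note ψ''(θ) = n⁻¹ j(θ) > 0.
def solve_mle(t_sum, n, psi1, psi2, theta=0.0):
    for _ in range(60):
        theta += (t_sum - n * psi1(theta)) / (n * psi2(theta))
    return theta

xs = [3, 0, 2, 5, 1, 4]
theta_hat = solve_mle(sum(xs), len(xs), math.exp, math.exp)
print(theta_hat, math.log(sum(xs) / len(xs)))  # both log(2.5)
```

Because Σ t(Xᵢ) is one-dimensional and sufficient, θ̂ carries all the information and no ancillary is needed, exactly as the example states.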
Appendix 5.1 Derivation of Equation (5.1).
In §5.2 we mentioned that we can get the result (5.1) of the approximation of ∫ exp{ℓ(θ)}π(θ) dθ from the expansions in §5.2. Simply using (5.13) and (5.11), we then have
Appendix 5.2 Some Calculations in §5.3.
Chapter 6
The p* Formula and Tail
Probability of MLE to O(n^{-2}) in
the Frequentist Setup
In this chapter we verify the p* formula and the Lugannani & Rice type formula in the one-parameter case by using the conditional distribution of the MLE obtained in Chapter 5. We obtain a version of the renormalizing constant in the p* formula and an expression of the p* formula to order O(n^{-2}) for general models. We also consider constructing confidence intervals to order O(n^{-2}) and obtain an explicit form for the endpoints of two-sided confidence intervals.
6.1 Introduction
Suppose that we have i.i.d. observations X = (X₁, ..., Xₙ)ᵀ from a continuous model f(x; θ) with scalar parameter θ. As in Chapter 5, we assume that the log-likelihood function can be written as ℓ(θ; θ̂, a), where θ̂ is the maximum likelihood estimate of θ and a is an ancillary statistic for θ. Barndorff-Nielsen (1980, 1983) showed that the p* formula
p*(θ̂ | θ, a) = C (2π)^{-1/2} exp{ℓ(θ; θ̂, a) − ℓ(θ̂; θ̂, a)} |ĵ(θ̂)|^{1/2}
approximates the conditional density of θ̂ given a to order O(n^{-3/2}), where C is constant in θ to O(n^{-3/2}) and is equal to 1 to O(n^{-1}).
Approximations to the tail probability of θ̂ to order O(n^{-3/2}) are also available. For details please see Lugannani & Rice (1980) for exponential models, DiCiccio, Field & Fraser (1990) for location models, and Barndorff-Nielsen (1991) and Fraser & Reid (1993) for general models. In general they have the form of Lugannani & Rice,
where Φ is the standard normal c.d.f., φ is the standard normal p.d.f., and
If one wants to construct confidence intervals for θ, then the r* approximation of Barndorff-Nielsen (1986) is more convenient,
where
r* = r + (1/r) log(u/r).
In this chapter we use a different approach to investigate the p* formula and the Lugannani & Rice type formula in scalar parameter general models. We first integrate the p* formula directly and obtain a version of the renormalizing constant, then verify that it gives the conditional density of θ̂ given an ancillary statistic to order O(n^{-3/2}) by comparing the integration with the conditional distribution of θ̂ given an ancillary statistic obtained using the shrinkage argument in Chapter 5. We also verify the Lugannani & Rice type formula, obtain the third order error terms of different kinds of approximations to the conditional tail probability of θ̂, and consider extending the p* formula to order O(n^{-2}) in general models. In the last section, we consider constructing confidence intervals to order O(n^{-2}).
We assume the regularity conditions of §5.1 except for the Bayesian part. The notation used here and some basic results are mostly the same as in §5.1, but in the definitions of the statistics r and ρ we change the signs in this chapter, compared with Chapter 5, for the convenience of the calculations in the frequentist setup. We shall mention this difference when giving their definitions later.
6.2 Direct Integration of the p* Formula
Now we consider integrating directly the main part of the p* formula, i.e. for given θ̂₀, we are going to calculate
If the p* formula is a density of some variable to order O(n^{-3/2}), then we can obtain the constant C by renormalization. Now we make a transformation from θ̂ to r, the signed root of the likelihood ratio (SRLR) statistic defined as
Please note that we change the sign of r here compared with that in Chapter 5. Under regularity conditions, the transformation from θ̂ to r given a is asymptotically one-to-one. The Jacobian of the transformation is
If we let
then the integral (6.4) can be expressed in terms of the new variable,
where φ(·) is the standard normal p.d.f., and
From the above expression we can see that we have to get the expansion of u^{-1} with respect to r, or the expansion of u^{-1} − r^{-1} with respect to r. First we expand ℓ(θ̂; θ̂, a) and ℓ(θ; θ̂, a) in θ̂ about θ.
Now we introduce an intermediate variable q which is asymptotically equivalent to r. We shall obtain the expansion of u^{-1} − r^{-1} in terms of q and then invert it back to get the expansion in terms of r. Here q is defined as
After a simple calculation we have the expansion of r² with respect to q.
Making some transformations, we have the expansion of r^{-1} with respect to q, the expansion of r with respect to q, and the expansion of q with respect to r:
Similarly we expand u^{-1} in θ̂ around θ and then transform it to an expression in terms of q:
Now we can directly obtain
Using equation (6.10), the expansion of q with respect to r, we finally get the expansion of u^{-1} − r^{-1} with respect to r:
where G₁(θ), G₂(θ), G₃(θ) are defined in (5.17), (5.18), (5.19) of Chapter 5.
The integral (6.7) can then be obtained using the above expansion:
where
Now we can obtain a version of the constant in the p* formula of (6.1) by renormalizing (6.1) and using the expansion (6.12),
This constant ensures that the p* formula of (6.1) is a density to order O(n^{-3/2}). In the next section, we shall further verify that the p* formula of (6.1) is the conditional density of θ̂ given the ancillary statistic a to order O(n^{-3/2}).
6.3 p* Formula to O(n^{-2})
In the last chapter we obtained the asymptotic expansion of the frequentist conditional distribution of θ̂ given a by using the shrinkage argument. From (5.22), under the current r₀,
where
and G₁(θ), G₂(θ), G₃(θ), G₄(θ) are defined in (5.17)-(5.20) of Chapter 5 respectively.
Now we look at how much conditional probability the integration of the p* formula can capture. From (6.11) and (6.12), the integration of the p* formula with the constant C = C(θ) = 1 − G₂(θ)/n is
Note that equations (6.14) and (6.15) agree with (6.13) up to the O(n^{-1}) term. Hence we have verified the Lugannani & Rice type formula of (6.2) and the p* formula of (6.1) with a renormalizing constant C(θ) in general models.
Here we consider the following corrected p* formula,
where Ĉ(θ̂) = 1 − G₂(θ̂)/n, and G₂(θ̂) is obtained by replacing θ with θ̂ in G₂(θ). Expanding G₂(θ̂) in θ̂ about θ, we get
From here we can directly calculate the integration of the corrected p* formula:
We summarize the above results into a theorem regarding the accuracy of the p* formula and the renormalizing constant in the p* formula.
Theorem 6.1 In the p* formula, when C(θ) = 1 − G₂(θ)/n, the p* formula approximates the conditional density of θ̂ given a to order O(n^{-3/2}). But if we change θ to θ̂ in the renormalizing constant C(θ), i.e.
p*_c(θ̂ | θ, a) = Ĉ(θ̂) (2π)^{-1/2} exp{ℓ(θ; θ̂, a) − ℓ(θ̂; θ̂, a)} ĵ(θ̂)^{1/2},   (6.18)
where Ĉ(θ̂) = 1 − G₂(θ̂)/n, then p*_c approximates the conditional density of θ̂ given a to order O(n^{-2}). □
Remark 1. The renormalizing constant C(θ) is invariant with respect to one-to-one reparameterizations (see Remark 2 of Corollary 5.2).
There are different kinds of approximations to the tail probability. They all have error terms of order O(n^{-3/2}). Actually we can obtain all of these third order error terms.
If we use the p* formula with the renormalizing constant C(θ) given above and integrate it to approximate the tail probability, then from (6.13) and (6.15) the third order error term is
If we use the Lugannani & Rice type formula to approximate the tail probability, from (6.11), (6.13) and (6.14), the third order error term is
Following Barndorff-Nielsen (1986), we define the statistic r* as
We can use (6.11), the expansion of u^{-1} − r^{-1}, to get the expansion of r* with respect to r:
From the above expansion we can easily obtain the third order error term of the r* approximation,
In the following we give a simple example giving a clear view of what the results developed above (and in Chapter 5) look like in a specific situation.
Example 6.1 Exponential Distribution
Suppose that we have i.i.d. observations X = (X₁, ..., Xₙ)ᵀ from the exponential distribution with mean θ, i.e. from the density θ^{-1} e^{-x/θ}, θ > 0. The log-likelihood function is
The maximum likelihood estimate of θ is θ̂ = (1/n) Σᵢ Xᵢ. The log-likelihood function can then be written as
ℓ(θ; θ̂) = −n θ̂/θ − n log θ,
and so ℓ(θ̂; θ̂) − ℓ(θ; θ̂) = n log(θ/θ̂) + n θ̂/θ − n, and ĵ(θ̂) = n/θ̂². The p* formula has the form
The exact density of the distribution of θ̂ is
(a) When C = 1, p*(θ̂; θ) approximates p(θ̂; θ) with a relative error of order O(n^{-1}). This is equivalent to Stirling's approximation of Γ(n) by √(2π) n^{n−1/2} e^{−n}, which has a relative error of order O(n^{-1}).
(b) Renormalizing the p* formula gives the exact density of θ̂, p(θ̂; θ).
(c) Asymptotic expansion of the tail probability to O(n^{-2}):
It is easy to obtain that
and
By Corollary 5.2, the distribution of θ̂ has the following asymptotic expansion
where
(d) The p* formula to O(n^{-2}):
By Theorem 6.1, when C(θ) = 1 − G₂(θ)/n = 1 − 1/(12n), the p* formula approximates p(θ̂; θ), the exact density of θ̂, to order O(n^{-3/2}). But Ĉ(θ̂) = C(θ) = 1 − 1/(12n), so from Theorem 6.1 the p*_c formula approximates the exact density p(θ̂; θ) to order O(n^{-2}). This is equivalent to the approximation of Γ(n) by √(2π) n^{n−1/2} e^{−n} (1 + 1/(12n)), which has a relative error of order O(n^{-2}).
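Cases (a) and (d) can be checked directly, since they reduce to Stirling's approximation of Γ(n); a small numerical sketch:

```python
import math

# Stirling's approximation to Γ(n) without and with the 1/(12n) correction,
# mirroring C = 1 versus C(θ) = 1 - 1/(12n) in the p* formula for this model.
def stirling(n):
    return math.sqrt(2 * math.pi) * n ** (n - 0.5) * math.exp(-n)

n = 10
exact = math.gamma(n)
err1 = abs(stirling(n) - exact) / exact                       # O(n^{-1})
err2 = abs(stirling(n) * (1 + 1 / (12 * n)) - exact) / exact  # O(n^{-2})
print(err1, err2)  # ≈ 8.3e-3 and ≈ 3.5e-5
```

The uncorrected error is close to 1/(12n), and the corrected one close to the next Stirling term, as the orders quoted in (a) and (d) predict.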
(e) Third order error terms of approximations to the tail probability:
For the p*_c approximation, from case (d) or (6.19),
For the Lugannani & Rice type formula, from (6.20),
where
For the r* approximation, from (6.23),
where
r₀* = r₀ + (1/r₀) log(u₀/r₀),
with r₀ and u₀ as defined above.
6.4 Confidence Intervals to O(n^{-2})
To construct confidence intervals, it is convenient to use the r* statistic, as defined in (6.21),
Our objective is to construct two-sided confidence intervals for θ. If we let r* = z_α, where Φ(z_α) = α, we can solve the r* expression for θ to obtain a one-sided confidence interval with accuracy of order O(n^{-3/2}). Repeating the process for r* = z_{1−α}, we obtain the other one-sided confidence interval with accuracy of order O(n^{-3/2}). It is easy to see from equation (6.23) that the two-sided confidence intervals obtained from the above two steps then have accuracy of order O(n^{-2}).
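The inversion step can also be carried out numerically when an explicit solution is inconvenient. The sketch below assumes the exponential-mean model (an illustration) and inverts the first order relation r(θ) = ±z_α by bisection; the r* version would add the (1/r) log(u/r) term to the quantity being matched:

```python
import math

# Likelihood-root confidence limits for the assumed exponential-mean model.
def signed_root(theta, theta_hat, n):
    lr = 2.0 * n * (math.log(theta / theta_hat) + theta_hat / theta - 1.0)
    return math.copysign(math.sqrt(max(lr, 0.0)), theta_hat - theta)

def solve_r(target, theta_hat, n, lo, hi):
    # signed_root is decreasing in θ, so bisection on [lo, hi] works
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if signed_root(mid, theta_hat, n) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

theta_hat, n, z = 2.0, 20, 1.959964
lower = solve_r(z, theta_hat, n, 0.5 * theta_hat, theta_hat)
upper = solve_r(-z, theta_hat, n, theta_hat, 5.0 * theta_hat)
print(lower, upper)  # an approximate 95% interval containing θ̂
```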
A problem with the above procedure is that it is not always easy to solve for θ from the expression of r*; one might still have to make some expansions, and this extra step might affect the accuracy of the intervals. In the following construction we make further expansions from the r* expression to give an explicit form of the two-sided confidence intervals for θ; we then prove that they have accuracy of order O(n^{-2}) and that the one-sided confidence intervals still have accuracy of order O(n^{-3/2}).
Our construction is based on the asymptotic expansion of the conditional distribution of r given an ancillary statistic a obtained in Chapter 5. From (5.21), under the current r defined in (6.5),
where G₁(θ), G₂(θ), G₃(θ), G₄(θ) are defined in (5.17)-(5.20) respectively.
If we directly calculate the α-quantile of r to order O(n^{-2}), we can obtain the conditional confidence intervals, but we need to compute the first five derivatives of the log-likelihood function. Our construction here reduces the calculations to the first four derivatives, while the two-sided confidence intervals still have accuracy of order O(n^{-2}) and the left-sided and right-sided intervals have accuracy of order O(n^{-3/2}).
In order to get the conditional confidence intervals via the statistic r, we use the intermediate variable ρ, which is here defined as
Please note that we change the sign of ρ compared with that in Chapter 5. The relations between r and ρ can be seen from the following two formulas.
From (6.22), letting r* = z_α, where Φ(z_α) = α, 0 < α < 1/2, and solving for r (denoted r_α) to order O(n^{-1}), we then obtain
From (6.24),
and the α-quantile of the conditional distribution of ρ to order O(n^{-3/2}) can be obtained from equation (6.27):
ρ_α =
Now we plug in the expression for r_α and expand the functions G₁(θ), G₂(θ) in θ around θ̂. Rearranging the terms according to asymptotic order and dropping the terms of order O(n^{-3/2}), we get
Replacing z_α with z_{1−α} in the above expression we have ρ_{1−α}. The conditional confidence intervals for θ are then determined by ρ_α and ρ_{1−α}. The following theorem gives the details of this result.
Theorem 6.2
PROOF: It is equivalent to show that
and
From relation (6.26) we have the r_α which corresponds to ρ_α,
where A₁(θ̂), B₁(θ̂), C₁(θ̂) are functions of θ̂ and a. The explicit forms of A₁(θ̂), B₁(θ̂) and C₁(θ̂) can be calculated, but here we do not need to know them. Now we expand the functions of θ̂ in θ̂ around θ to get
where A₂(θ), B₂(θ), C₂(θ) are functions of θ and a. Again, we do not need to know the explicit forms of A₂(θ), B₂(θ) and C₂(θ). From (6.24) and (6.29), we have
Repeating the process as we did for ρ_α, we have
and so
Thus we establish the result that the two-sided conditional confidence intervals have accuracy of order O(n^{-2}). □
Example 6.2 (Continued from Examples 5.1, 5.2 and 5.3)
We consider the one-parameter location models and the one-parameter exponential models with θ the canonical parameter. The two-sided confidence intervals for θ can be constructed to have accuracy of order O(n^{-2}) using the current method. The exact forms are the same as in Theorem 6.2; we just give ρ_α here.
In the one-parameter location models:
Please note that ĵ₁(θ̂), ρ̂₃;₀ and ρ̂₄;₀ are functions of the ancillary a only.
In the one-parameter exponential models:
and
Chapter 7
Conditional Distribution of the
SRCLR Statistic by Using the
Shrinkage Argument
In this chapter we use the shrinkage argument to derive the frequentist conditional distribution to order O(n^{-3/2}) of the signed root of the conditional likelihood ratio (SRCLR) statistic (Cox & Reid, 1987) given an ancillary statistic in location-scale models. We also discuss this approach for other models. Along the way we obtain approximations of the marginal posterior density and distribution function.
7.1 Introduction
Suppose that we have i.i.d. observations X = (X₁, ..., Xₙ)ᵀ from the density f(x; θ), where θ = (ψ, λ) has prior π(ψ, λ). The log-likelihood function based on X is ℓ(θ; X), and the maximum likelihood estimate of θ is θ̂ = (ψ̂, λ̂). We assume that ψ is the scalar parameter of interest and λ is the nuisance parameter. For notational convenience, we assume λ is also one-dimensional. As we did in Chapter 5, we rewrite the log-likelihood function as ℓ(θ) = ℓ(θ; θ̂, a), where a is an ancillary statistic for θ; in the Bayesian setup, we still use ℓ(θ; X).
Our investigation focuses on the frequentist conditional distribution, given an ancillary statistic, of the signed root of the conditional likelihood ratio (SRCLR) statistic defined as
where the conditional log-likelihood ℓ_c(ψ) was suggested by Cox & Reid (1987) on the grounds of conditioning, and ψ̂_c is the maximum likelihood estimate based on ℓ_c(ψ). The SRCLR statistic r can be used for testing the parameter of interest as well as for other inferential purposes. In transformation models with transformation parameters, DiCiccio, Field & Fraser (1990) obtained the approximation of the distribution of r by eliminating the nuisance parameter by marginalization. In this chapter we begin from the Bayesian setup, obtain the posterior distribution of the parameter of interest, then use the shrinkage argument to derive the frequentist conditional distribution to order O(n^{-3/2}) of the SRCLR statistic r given an ancillary statistic in location-scale models. We shall also discuss the possibilities in other models.
We assume regularity conditions similar to those in §5.1, but here m = 2 and the parameter is two-dimensional. Let ĵ₁(θ̂) = (ĵ_{ij})₂ₓ₂ be the per observation information matrix evaluated at θ̂, i.e. n ĵ₁(θ̂) = ĵ(θ̂) = −ℓ″(θ̂), and also
7.2 Marginal Posterior Density
The marginal posterior density of ψ given X under the prior π(ψ, λ) is
In the following we shall use Laplace's method to get an expansion of the marginal posterior density π(ψ | X). We begin by expanding ℓ(ψ, λ) as a function of ψ and λ about ψ̂ and λ̂.
Now we standardize ψ and λ by defining
We can see that
The determinant of the Jacobian of the transformation from (ψ, λ) to (ρ₁, ρ₂) is
Based on the new variables, we have the following form of the expansion
Now expanding π(ψ, λ) as a function of ψ and λ about ψ̂ and λ̂ and simplifying the expansion, we get
Therefore we have
In order to simplify the expression, we introduce more notation:
Please note that the V's and the A's are functions of the data only, while T is a function of the data X and the prior π.
Using the above notation, after considerable algebraic manipulation, we obtain the two-dimensional integration
where
c₁*(X) =
c₂*(X) =
The marginal posterior density of ψ given X then has the following expansion,
where c₁*(X) is defined in (7.3), and
7.3 Approximation to the Marginal Posterior Den-
sity
Cornparhg to the expansion of the marginal posterior density of S> in the previ-
ous section, there is another way to do the expansion. It differs from an application
of Laplace's method to the numerator in which we expand e(+, A ) as a function of X
around &, where A+ is the maximum likelihood estirnate of X holding $J constant.
-4s a result of this approximation, we have
where A) = a2e($, X)/dA2. Ignoring terms which do not depend on 11> and
defining 1
[ c ( + ) = e(+, %) - - log 1 - ~ . \ ( f b , i d 1, (7.7) 2
being called the conditional log-likelihood suggested by Cox Sr Reid (1987) on the
ground of conditioning, we then have
This result was derived by Leonard (1982), Phillips (1983) and Tierney and Kadane (1986). With a renormalizing constant, the above approximation has a relative error of order $O(n^{-3/2})$ (Tierney and Kadane, 1986). In the following we shall verify the relative error of this approximation by expanding (7.8) and comparing to equation (7.4).
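For a concrete instance of (7.7), the following minimal sketch computes the conditional log-likelihood for the normal model with interest parameter $\mu$ and nuisance $\sigma$, and checks it against the closed form $\ell_c(\mu) = -\tfrac{1}{2}(n-1)\log\hat\sigma^2_\mu + \text{const}$ that this particular model admits (the closed form is specific to the normal case):

```python
import numpy as np

def loglik(mu, sigma, x):
    """Log-likelihood of N(mu, sigma^2) up to an additive constant."""
    n = len(x)
    return -n * np.log(sigma) - np.sum((x - mu) ** 2) / (2.0 * sigma ** 2)

def conditional_loglik(mu, x):
    """Cox & Reid (1987) conditional log-likelihood (7.7):
    ell_c(psi) = ell(psi, lam_hat_psi) - 0.5 * log|-ell_{lam,lam}|,
    here with psi = mu and nuisance lam = sigma."""
    n = len(x)
    s_mu = np.sqrt(np.mean((x - mu) ** 2))   # sigma_hat held at mu
    j_lam = 2.0 * n / s_mu ** 2              # -d^2 ell / d sigma^2 at s_mu
    return loglik(mu, s_mu, x) - 0.5 * np.log(j_lam)

rng = np.random.default_rng(0)
x = rng.normal(1.0, 2.0, size=20)
n = len(x)
mu_grid = np.linspace(0.0, 2.0, 5)
lc = np.array([conditional_loglik(m, x) for m in mu_grid])
# Closed-form reduction -((n - 1)/2) * log(sigma_hat_mu^2), up to a constant
ref = np.array([-0.5 * (n - 1) * np.log(np.mean((x - m) ** 2)) for m in mu_grid])
print(np.allclose(lc - lc[0], ref - ref[0]))   # True
```

The agreement (up to an additive constant) confirms that the $-\tfrac{1}{2}\log$ adjustment in (7.7) reduces the exponent of $\hat\sigma^2_\mu$ from $n/2$ to $(n-1)/2$ in this model.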
From $\ell_\lambda(\psi, \hat\lambda_\psi) = 0$, expanding $\ell_\lambda(\psi, \hat\lambda_\psi)$ as a function of $\psi$ and $\hat\lambda_\psi$ around $\hat\psi$ and $\hat\lambda$, and solving for $\hat\lambda_\psi - \hat\lambda$, we then obtain
Now expanding $\ell(\psi, \hat\lambda_\psi)$ as a function of $\psi$ and $\hat\lambda_\psi$ around $\hat\psi$ and $\hat\lambda$, and using equation (7.9), we get
Similarly we have
and
Hence we finally have
where $Q_1^*(\rho_1)$, $Q_2^*(\rho_1)$ are defined in (7.5), (7.6). Comparing (7.13) with equation (7.4), we see that approximation (7.8) with a renormalizing constant to the marginal posterior density of $\psi$ indeed has relative error of order $O(n^{-3/2})$.
7.4 Marginal Posterior Distribution
In this section we shall derive the marginal posterior distribution of $\psi$ based on the approximation (7.8) to the marginal posterior density of $\psi$. First we make a transformation from $\psi$ to $r$, the signed root of the conditional likelihood ratio (SRCLR) statistic defined as
$$r = \mathrm{sgn}(\tilde\psi - \psi)\bigl\{2[\ell_c(\tilde\psi) - \ell_c(\psi)]\bigr\}^{1/2}, \qquad (7.14)$$
where $\tilde\psi$ is the maximum likelihood estimate based on the conditional log-likelihood $\ell_c(\psi)$ in (7.7). Next we need the expansion of $r$ with respect to $\psi$, or in fact $\rho_1$, the standardized version of $\psi$.
To get the expansion of $\ell_c(\psi) - \ell_c(\tilde\psi)$, it is convenient to write $\ell_c(\psi) - \ell_c(\tilde\psi)$ as the difference between $\ell_c(\psi) - \ell_c(\hat\psi)$ and $\ell_c(\tilde\psi) - \ell_c(\hat\psi)$ and to expand both terms. It follows from (7.10) and (7.11) that
where $\tilde\rho_1 = \rho_1|_{\psi = \tilde\psi}$. Hence we need to evaluate $\tilde\rho_1$ further.
From $\ell_\lambda(\psi, \hat\lambda_\psi) = 0$, taking the derivative with respect to $\psi$, we have
Now taking the derivative of $\ell_c(\psi)$ with respect to $\psi$ and using the above equation, then noting $\ell_c'(\tilde\psi) = 0$, we get
Expanding the functions in the above equation and retaining the leading-order terms, we find
$$2\bigl[\hat\ell_{20}(\tilde\psi - \hat\psi) + \hat\ell_{11}(\hat\lambda_{\tilde\psi} - \hat\lambda)\bigr] + \cdots = 0, \qquad (7.17)$$
requiring that $\tilde\psi - \hat\psi$ and $\hat\lambda_{\tilde\psi} - \hat\lambda$ are $O(n^{-1})$. Solving equation (7.17) for $\tilde\psi - \hat\psi$ by using equation (7.9), we finally obtain
Equation (7.16) then has expansion
Combining (7.19) and (7.15) gives
Making another simple expansion, we then obtain the expansion of $r$ with respect to $\rho_1$, and by inversion, the expansion of $\rho_1$ with respect to $r$:
Taking the derivative on both sides of equation (7.21), we have
Now inserting (7.21) in equation (7.12), we get
Using (7.18) on expansion (7.12), we have
CEAPTER 7. CONDIT'L DIST'N OF THE SRCLR STATISTIC 112
Thus, by taking the ratio of the above two equations, we obtain
A direct calculation by application of (7.18) gives
(7.24)
Hence we have the following expansion from (7.22), (7.23) and (7.24)
where
In order to get a Lugannani & Rice type formula, we now define the statistic $u$ as
From (7.22), (7.23) and (7.24) we have the following expansion of $u^{-1}$ with respect to $r$:
We now summarize the above expansions into a theorem giving the approximation of the marginal posterior density with an explicit form of the renormalizing constant, and the approximation of the marginal posterior distribution both in the Lugannani & Rice form and in the form of a direct expansion.
Theorem 7.1 Suppose that the log-likelihood function is $\ell(\theta)$, where $\theta = (\psi, \lambda)$ has prior $\pi(\psi, \lambda)$. The conditional log-likelihood $\ell_c(\psi)$ is defined as in (7.7), and $\tilde\psi$ is the maximum likelihood estimate of $\psi$ based on $\ell_c(\psi)$. Under the regularity conditions as stated in the introduction,
• the marginal posterior density of $\psi$ has the approximation
with a relative error of order $O(n^{-3/2})$, where $H_2^*(x)$ is defined in (7.26);
• the marginal posterior distribution of $\psi$ has the following expansion
where
and $H_1^*(x)$, $H_2^*(x)$ are defined in (7.25), (7.26). □
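The Lugannani & Rice form of the expansion combines the normal approximation $\Phi(r)$ with a correction involving the statistic $u$. The generic shape of such a tail-area formula, $F \approx \Phi(r) + \phi(r)(1/r - 1/u)$, can be sketched as follows (a generic template only; the particular $u$ of this chapter depends on the expansions above and is not reproduced here):

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def lugannani_rice(r, u):
    """Generic Lugannani & Rice type distribution-function formula
    F ~ Phi(r) + phi(r) * (1/r - 1/u), valid for r != 0."""
    return Phi(r) + phi(r) * (1.0 / r - 1.0 / u)

# When u coincides with r the correction vanishes and Phi(r) is recovered.
print(abs(lugannani_rice(1.5, 1.5) - Phi(1.5)) < 1e-12)  # True
```

When $u$ and $r$ agree to first order, the correction term is itself $O(n^{-1/2})$, which is what makes the combined formula higher-order accurate.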
7.5 Frequentist Conditional Distribution of the
SRCLR Statistic
Unlike in the scalar parameter case, in the case with a nuisance parameter the
frequentist conditional distribution of the SRCLR statistic depends on the nuisance parameter in general. Also, the observed balance relations are too complicated to express. Here we consider only location-scale models.
Suppose that $X_1, \ldots, X_n$ are i.i.d. observations from the location-scale model with density
where $h(\cdot)$ is a known function. We assume that $\mu$ is the parameter of interest and $\sigma$ is the nuisance parameter, i.e. $\theta = (\psi, \lambda) = (\mu, \sigma)$. It can be shown that the configuration statistic $a = (a_1, \ldots, a_n)^T$, where $a_i$ is defined as
$$a_i = (x_i - \hat\mu)/\hat\sigma,$$
is an ancillary statistic for $\theta$ (Fraser, 1979). Now we make a transformation from $x$ to $(\hat\mu, \hat\sigma, a)$. Let $g(\cdot) = \log h(\cdot)$; then the log-likelihood function can be expressed as
In the following we shall use the shrinkage argument to derive the frequentist conditional distribution of the SRCLR statistic $r$ defined in (7.14) given the ancillary statistic $a$.
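For the normal model the MLEs have closed form, so the configuration statistic $a_i = (x_i - \hat\mu)/\hat\sigma$ can be computed and its ancillarity illustrated directly: relocating and rescaling the data leaves $a$ unchanged. A small sketch (for a general density $h$ the MLEs would have to be found numerically):

```python
import numpy as np

def configuration_statistic(x):
    """Configuration statistic a_i = (x_i - mu_hat) / sigma_hat.
    For the normal location-scale model the MLEs are the sample mean
    and the (1/n)-denominator standard deviation."""
    mu_hat = np.mean(x)
    sigma_hat = np.sqrt(np.mean((x - mu_hat) ** 2))
    return (x - mu_hat) / sigma_hat

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=10)
a1 = configuration_statistic(x)
a2 = configuration_statistic(3.0 + 2.5 * x)   # relocate and rescale the data
print(np.allclose(a1, a2))  # True: a depends on the data only through shape
```

The invariance holds because the MLEs are equivariant under location-scale transformations, which is exactly why $a$ carries no information about $(\mu, \sigma)$.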
Now we define more notation (see also §7.1 and §7.2):
Note that $p$, $p_i$, $V$, $A_{ij}$ are functions of the ancillary $a$ only in location-scale models.
In the previous section we obtained the posterior distribution of the SRCLR statistic $r$. From Theorem 7.1,
where $H_1^*(\hat\theta, a)$, $H_2^*(\hat\theta, a)$ are defined in (7.25), (7.26). Here we change the dependence on the data from $x$ to $(\hat\theta, a)$. This change does not affect the values of the functions $H_1^*(\cdot)$, $H_2^*(\cdot)$.
Now taking the expectation of the posterior distribution of $r$ with respect to the conditional distribution of $\hat\theta$ given the ancillary statistic $a$, we get
$$E_{(\hat\theta \mid a)}\bigl\{E^{\pi}[1(r \le r_0) \mid \hat\theta, a]\bigr\} = E_{(\hat\theta \mid a)}\Bigl\{\Phi(r_0) - \frac{1}{\sqrt{n}}\,\phi(r_0)H_1^*(\hat\theta, a) - \cdots\Bigr\}$$
where the functions $\bar H_1(\theta, a)$, $\bar H_2(\theta, a)$ are obtained by substituting $\hat\theta$ with $\theta$ in the functions $H_1^*(\hat\theta, a)$, $H_2^*(\hat\theta, a)$.
The final step of the shrinkage argument is to take the expectation with respect to the prior $\pi(\cdot)$. We now assume that the prior $\pi$ and its derivatives vanish on the boundary of a rectangle containing the true parameter $\theta$ as an interior point. Using integration by parts, we have
$$E_{\pi}\Bigl(E_{(\hat\theta \mid a)}\bigl\{E^{\pi}[1(r \le r_0) \mid \hat\theta, a]\bigr\}\Bigr) = \int \Bigl(\Phi(r_0) - \frac{1}{\sqrt{n}}\,\phi(r_0)\bar H_1(\theta, a) - \cdots\Bigr)\pi(\theta)\,d\theta$$
where
Finally we let the prior $\pi$ converge weakly to the degenerate measure at the true parameter $\theta$. This leads to the following result.
Theorem 7.2 If the parameter of interest is the location parameter and the log-likelihood function is defined as in (7.33), then under the regularity conditions as stated in §7.1, the frequentist conditional distribution of the SRCLR statistic $r$ given the ancillary statistic $a$ has the following asymptotic expansion
where $G_1(a)$, $G_2(a)$ are defined in (7.34), (7.35). □
The functions $G_1(a)$ and $G_2(a)$ in the expansion of the frequentist conditional distribution of $r$ can be expressed in the form of the scalar parameter case of Chapter 5, but the sample space differentiations and the observed balance relations involved make the resulting expressions too complicated to be useful. The quantity $D$ is introduced to simplify the various expressions of the sample space derivatives.
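To first order the expansion above reduces to the standard normal law for $r$. A Monte Carlo sketch for the normal location case illustrates this, assuming (as in Example 7.1 below) that $\ell_c(\mu)$ reduces to $-\tfrac{1}{2}(n-1)\log\hat\sigma^2_\mu$ up to a constant:

```python
import numpy as np

def srclr_location(mu0, x):
    """SRCLR statistic at mu = mu0 for the normal location model,
    using ell_c(mu) = -((n - 1)/2) * log(mean((x - mu)^2)) + const."""
    n = len(x)
    lc = lambda m: -0.5 * (n - 1) * np.log(np.mean((x - m) ** 2))
    mu_t = np.mean(x)   # MLE under the conditional log-likelihood
    return np.sign(mu_t - mu0) * np.sqrt(2.0 * (lc(mu_t) - lc(mu0)))

rng = np.random.default_rng(3)
n, reps = 10, 10000
r = np.array([srclr_location(0.0, rng.normal(0.0, 1.0, n)) for _ in range(reps)])
# To first order r is N(0, 1): P(r <= 1) should be near Phi(1) = 0.841
cover = float(np.mean(r <= 1.0))
print(cover)
```

Even at $n = 10$ the simulated probability lies close to the normal value, with the residual discrepancy being the higher-order terms the theorem quantifies.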
Example 7.1 Suppose that we have i.i.d. observations $X_1, \ldots, X_n$ from the normal distribution $N(\mu, \sigma^2)$. The parameter of interest is the location parameter
$\mu$. The function $g(x) = -x^2/2$. The log-likelihood function is
The maximum likelihood estimate is $(\hat\mu, \hat\sigma) = \bigl(\bar X, \sqrt{\sum_i (X_i - \bar X)^2/n}\bigr)$, where $\bar X = \sum_i X_i / n$. The configuration statistic is $a = (a_1, \ldots, a_n)^T$, where
Holding $\mu$ constant, we obtain $\hat\sigma^2_\mu = \sum_i (X_i - \mu)^2 / n$. Defining the conditional log-likelihood as
the maximum likelihood estimate of $\mu$ based on $\ell_c(\mu)$ is $\tilde\mu = \hat\mu = \bar X$. Now defining the SRCLR statistic
and
Note that $G_1(a)$ and $G_2(a)$ do not depend on the ancillary $a$, so what we get from Theorem 7.2 is the frequentist unconditional distribution of the SRCLR statistic $r$,
then $T$ follows a $t_{(n-1)}$ distribution. So the above expansion of the distribution of the SRCLR statistic $r$ is equivalent to the following approximation to the $t_{(n-1)}$ distribution function,
$$r_0 = \mathrm{sgn}(T_0)\sqrt{(n-1)\log\Bigl(1 + \frac{T_0^2}{n-1}\Bigr)}.$$
The density of the $t_{(n-1)}$ distribution is
Expanding the density in terms of $r_0$ and comparing with the approximation obtained by differentiating the approximate distribution function, we find that the above approximation to the distribution function is equivalent to the approximation of $\Gamma(\tfrac{n}{2})/\Gamma(\tfrac{n-1}{2})$ by $\bigl(\tfrac{n-1}{2}\bigr)^{1/2}\bigl(1 - \tfrac{1}{4n}\bigr)$, with a relative error of $O(n^{-3/2})$, actually $O(n^{-2})$, because there is no $O(n^{-3/2})$ term in the expansion of $\Gamma(\tfrac{n}{2})/\Gamma(\tfrac{n-1}{2})$. □
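The monotone relation between $r_0$ and the $t$ statistic $T_0$ in this example can be confirmed numerically; a minimal sketch, using the closed form $\ell_c(\mu) = -\tfrac{1}{2}(n-1)\log\hat\sigma^2_\mu$ (up to a constant) for the normal model:

```python
import numpy as np

def srclr(mu, x):
    """SRCLR statistic r: signed root of 2 * [ell_c(mu_tilde) - ell_c(mu)],
    with ell_c(mu) = -((n - 1)/2) * log(mean((x - mu)^2)) + const."""
    n = len(x)
    lc = lambda m: -0.5 * (n - 1) * np.log(np.mean((x - m) ** 2))
    mu_tilde = np.mean(x)             # MLE under ell_c
    return np.sign(mu_tilde - mu) * np.sqrt(2.0 * (lc(mu_tilde) - lc(mu)))

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=15)
n, mu0 = len(x), 0.3
# Student-t statistic T0 and the claimed transformation to r0
T0 = np.sqrt(n) * (np.mean(x) - mu0) / np.std(x, ddof=1)
r0 = np.sign(T0) * np.sqrt((n - 1) * np.log1p(T0 ** 2 / (n - 1)))
print(np.isclose(srclr(mu0, x), r0))   # True
```

Agreement to machine precision follows from the algebraic identity $\hat\sigma^2_\mu/\hat\sigma^2 = 1 + T^2/(n-1)$, so $r$ is an exactly monotone transformation of $T$ in this example.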
If the parameter of interest is the scale parameter $\sigma$, we let $\theta = (\psi, \lambda) = (\sigma, \mu)$, and the log-likelihood function is
In the following we summarize the result in this case into a theorem.
Theorem 7.3 If the parameter of interest is the scale parameter and the log-likelihood function is defined as in (7.37), then the frequentist conditional distribution of the SRCLR statistic $r$ given the ancillary statistic $a$ has the following
expansion
where
Remark 1. It is possible to derive the frequentist conditional distribution of the SRCLR statistic $r$ given an ancillary statistic for general transformation models. We do not attempt this further here, but merely indicate the possibility of this approach in comparison with other approaches, e.g. Fraser (1979), DiCiccio, Field & Fraser (1990). In applications, the formula obtained by DiCiccio, Field & Fraser (1990) is more convenient to use than the one derived here.
Remark 2. For general exponential models, the expansion of the SRCLR statistic $r$ depends on the nuisance parameter. In fact, in exponential models there is no ancillary statistic left after the sufficiency reduction, and the statistic for inference is instead conditioned on another statistic (see Skovgaard, 1987; Fraser & Reid, 1993).
Remark 3. If we want to use the shrinkage argument in general models, the expansion of the frequentist conditional distribution of $r$ depends on the nuisance parameter, but it might still serve as a tool for verifying results obtained by other approaches.
Bibliography
[1] Barndorff-Nielsen, O.E. (1980). Conditionality resolutions. Biometrika 67, 293-310.
[2] Barndorff-Nielsen, O.E. (1983). On a formula for the distribution of the maximum likelihood estimator. Biometrika 70, 343-365.
[3] Barndorff-Nielsen, O.E. (1986). Inference on full or partial parameters based on the standardized, signed log likelihood ratio. Biometrika 73, 307-322.
[4] Barndorff-Nielsen, O.E. (1991). Modified signed log likelihood ratio. Biometrika 78, 557-563.
[5] Barndorff-Nielsen, O.E. and Blaesild, P. (1986). A note on the calculation of Bartlett adjustments. J. R. Statist. Soc. B 48, 353-358.
[6] Barndorff-Nielsen, O.E. and Cox, D.R. (1984). Bartlett adjustments to the likelihood ratio statistic and the distribution of the maximum likelihood estimator. J. R. Statist. Soc. B 46, 483-495.
[7] Barndorff-Nielsen, O.E. and Cox, D.R. (1989). Asymptotic Techniques for Use in Statistics. Chapman and Hall, London.
[8] Barndorff-Nielsen, O.E. and Cox, D.R. (1994). Inference and Asymptotics. Chapman and Hall, London.
[9] Barndorff-Nielsen, O.E. and Hall, P. (1988). On the level-error after Bartlett adjustment of the likelihood ratio statistic. Biometrika 75, 374-378.
[10] Berger, J.O. and Bernardo, J.M. (1989). Estimating a product of means: Bayesian analysis with reference priors. J. Amer. Statist. Assoc. 84, 200-207.
[11] Berger, J.O. and Bernardo, J.M. (1991). Reference priors in a variance components problem. Bayesian Inference in Statistics and Econometrics (P. Goel and N.S. Iyengar, eds.). Springer-Verlag, New York.
[12] Berger, J.O. and Bernardo, J.M. (1992a). Ordered group reference priors with application to the multinomial problem. Biometrika 79, 25-37.
[13] Berger, J.O. and Bernardo, J.M. (1992b). On the development of the reference prior method. Bayesian Statistics 4: Proceedings of the Fourth Valencia International Meeting (J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith, eds.). Clarendon Press, Oxford, 35-60.
[14] Berger, J.O. and Yang, Ruo-yong (1992). Noninformative priors and Bayesian testing for the AR(1) model. Technical Report 92-45C, Department of Statistics, Purdue University.
[15] Bernardo, J.M. (1979). Reference posterior distributions for Bayesian inference (with discussion). J. R. Statist. Soc. B 41, 113-147.
[16] Bickel, P.J. and Ghosh, J.K. (1990). A decomposition for the likelihood ratio statistic and the Bartlett correction - a Bayesian argument. Ann. Statist. 18, 1070-1090.
[17] Cox, D.R. and Reid, N. (1987). Parameter orthogonality and approximate conditional inference. J. R. Statist. Soc. B 49, 1-18.
[18] Daniels, H.E. (1954). Saddlepoint approximations in statistics. Ann. Math. Statist. 25, 631-650.
[19] Datta, G.S. and Ghosh, J.K. (1995a). On priors providing frequentist validity for Bayesian inference. Biometrika 82, 37-46.
[20] Datta, G.S. and Ghosh, J.K. (1995b). Noninformative priors for maximal invariant parameter in group models. Test 4, 95-114.
[21] Datta, G.S. and Ghosh, M. (1995). Some remarks on noninformative priors. J. Amer. Statist. Assoc. 90, 1357-1363.
[22] Datta, G.S. and Ghosh, M. (1996). On the invariance of noninformative priors. Ann. Statist. 24, 141-159.
[23] Dawid, A.P. (1991). Fisherian inference in likelihood and prequential frames of reference (with discussion). J. R. Statist. Soc. B 53, 79-109.
[24] DiCiccio, T.J., Field, C.A. and Fraser, D.A.S. (1990). Approximation of marginal tail probabilities and inference for scalar parameters. Biometrika 77, 77-95.
[25] DiCiccio, T.J. and Martin, M.A. (1991). Approximations of marginal tail probabilities for a class of smooth functions with applications to Bayesian and conditional inference. Biometrika 78, 891-902.
[26] DiCiccio, T.J. and Martin, M.A. (1993). Simple modifications for signed roots of likelihood ratio statistics. J. R. Statist. Soc. B 55, 305-316.
[27] DiCiccio, T.J. and Stern, S.E. (1993). On Bartlett adjustments for approximate Bayesian inference. Biometrika 80, 731-740.
[28] Efron, B. (1986). Why isn't everyone a Bayesian? Amer. Statistician 40, 1-11.
[29] Efron, B. (1993). Bayes and likelihood calculations from confidence intervals. Biometrika 80, 3-26.
[30] Efron, B. and Hinkley, D.V. (1978). Assessing the accuracy of the maximum likelihood estimator: observed versus expected information. Biometrika 65, 457-487.
[31] Fraser, D.A.S. (1962). On the consistency of the fiducial method. J. R. Statist. Soc. B 24, 425-434.
[32] Fraser, D.A.S. (1964). Local conditional sufficiency. J. R. Statist. Soc. B 26, 52-62.
[33] Fraser, D.A.S. (1979). Inference and Linear Models. McGraw-Hill, New York.
[34] Fraser, D.A.S. (1988). Normed likelihood as saddlepoint approximation. J. Mult. Anal. 27, 181-193.
[35] Fraser, D.A.S. (1990). Tail probabilities from observed likelihoods. Biometrika 77, 65-76.
[36] Fraser, D.A.S. and Reid, N. (1988). On conditional inference for a real parameter: a differential approach on the sample space. Biometrika 75, 251-264.
[37] Fraser, D.A.S. and Reid, N. (1989). Adjustments to profile likelihood. Biometrika 76, 477-488.
[38] Fraser, D.A.S. and Reid, N. (1993). Simple asymptotic connections between densities and cumulant generating function leading to accurate approximations for distribution functions. Statist. Sinica 3, 67-82.
[39] Fraser, D.A.S. and Reid, N. (1995). Ancillaries and third order significance. Utilitas Mathematica 47, 33-53.
[40] Garvan, C.W. and Ghosh, M. (1996). Noninformative priors for dispersion models. Preprint.
[41] Ghosh, J.K. (1994). Higher Order Asymptotics. NSF-CBMS Regional Conference Series in Probability and Statistics, Vol. 4. IMS.
[42] Ghosh, J.K. and Mukerjee, R. (1991). Characterization of priors under which Bayesian and frequentist Bartlett corrections are equivalent in the multiparameter case. J. Mult. Anal. 38, 385-393.
[43] Ghosh, J.K. and Mukerjee, R. (1992a). Non-informative priors (with discussion). Bayesian Statistics 4 (J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith, eds.). Oxford University Press, 195-210.
[44] Ghosh, J.K. and Mukerjee, R. (1992b). Bayesian and frequentist Bartlett corrections for likelihood ratio and conditional likelihood ratio tests. J. R. Statist. Soc. B 54, 867-875.
[45] Ghosh, J.K. and Mukerjee, R. (1993a). Frequentist validity of highest posterior density regions in the multiparameter case. Ann. Inst. Statist. Math. 45, 293-
[46] Ghosh, J.K. and Mukerjee, R. (1993b). On priors that match posterior and frequentist distribution functions. Canadian J. Statist. 21, 89-96.
[47] Ghosh, J.K. and Mukerjee, R. (1994). Adjusted versus conditional likelihood: power properties and Bartlett-type adjustment. J. R. Statist. Soc. B 56, 185-188.
[48] Ghosh, J.K. and Mukerjee, R. (1995a). On perturbed ellipsoidal and highest posterior density regions with approximate frequentist validity. J. R. Statist. Soc. B 57, 761-769.
[49] Ghosh, J.K. and Mukerjee, R. (1995b). Frequentist validity of highest posterior density regions in the presence of nuisance parameters. Statistics & Decisions 13, 131-139.
[50] Ghosh, M. and Mukerjee, R. (1996). Recent developments on probability matching priors. Preprint.
[51] Ghosh, M., Carlin, B.P. and Srivastava, M.S. (1995). Probability matching priors for linear calibration. Test 4, 333-357.
[52] Ghosh, M. and Yang, M.-Ch. (1996). Noninformative priors for the two sample normal problem. Test 5, 145-157.
[53] Hartigan, J.A. (1964). Invariant prior distributions. Ann. Math. Statist. 35, 836-845.
[54] Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proc. R. Soc. London A 186, 453-461.
[55] Johnson, R.A. (1970). Asymptotic expansions associated with posterior distributions. Ann. Math. Statist. 41, 851-864.
[56] Kass, R.E. and Wasserman, L. (1996). Formal rules for selecting prior distributions: a review and annotated bibliography. J. Amer. Statist. Assoc. 91, 1343-1370.
[57] Laplace, P.S. (1820). Essai philosophique sur les probabilités. English translation: Philosophical Essays on Probabilities, 1951. Dover, New York.
[58] Lee, C.B. (1989). Comparison of frequentist coverage probability and Bayesian posterior coverage probability, and applications. Ph.D. thesis, Purdue University.
[59] Lugannani, R. and Rice, S.O. (1980). Saddlepoint approximation for the distribution of the sums of independent random variables. Adv. Appl. Prob. 12, 475-490.
[60] McCullagh, P. (1987). Tensor Methods in Statistics. Chapman and Hall, London.
[61] McCullagh, P. and Tibshirani, R. (1990). A simple method for the adjustment of profile likelihoods. J. R. Statist. Soc. B 52, 325-344.
[62] Mukerjee, R. and Dey, D.K. (1993). Frequentist validity of posterior quantiles in the presence of nuisance parameters: higher order asymptotics. Biometrika 80, 499-505.
[63] Mukerjee, R. and Ghosh, M. (1996). Second order probability matching priors. Preprint.
[64] Nicolau, A. (1993). Bayesian intervals with good frequentist behaviour in the presence of nuisance parameters. J. R. Statist. Soc. B 55, 377-390.
[65] Peers, H.W. (1965). On confidence points and Bayesian probability points in the case of several parameters. J. R. Statist. Soc. B 27, 9-16.
[66] Peers, H.W. (1968). Confidence properties of Bayesian interval estimates. J. R. Statist. Soc. B 30, 535-544.
[67] Pierce, D.A. and Peters, D. (1992). Practical use of higher order asymptotics for multiparameter exponential families (with discussion). J. R. Statist. Soc. B 54, 701-738.
[68] Rao, C.R. and Mukerjee, R. (1995). On posterior credible sets based on the score statistic. Statist. Sinica 5, 781-791.
[69] Reid, N. (1988). Saddlepoint methods and statistical inference (with discussion). Statist. Sci. 3, 213-238.
[70] Reid, N. (1995). Likelihood and Bayesian approximation methods (with discussion). Bayesian Statistics 5 (J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith, eds.). Oxford University Press, 1-18.
[71] Severini, T.A. (1991). On the relationship between Bayesian and non-Bayesian interval estimates. J. R. Statist. Soc. B 53, 611-615.
[72] Severini, T.A. (1993). Bayesian interval estimates which are also confidence intervals. J. R. Statist. Soc. B 55, 533-540.
[73] Skovgaard, I.M. (1987). Saddlepoint expansions for conditional distributions. J. Appl. Prob. 24, 875-887.
[74] Skovgaard, I.M. (1990). On the density of minimum contrast estimators. Ann. Statist. 18, 779-789.
[75] Stein, C. (1985). On the coverage probability of confidence sets based on a prior distribution. Sequential Methods in Statistics, Banach Center Publications 16, 485-514. PWN-Polish Scientific Publishers, Warsaw.
[76] Sun, D. and Ye, K. (1995). Reference prior Bayesian analysis for normal mean products. J. Amer. Statist. Assoc. 90, 589-597.
[77] Sun, D. and Ye, K. (1996). Frequentist validity of posterior quantiles for a two-parameter exponential family. Biometrika 83, 55-65.
[78] Sweeting, T.J. (1995a). A framework for Bayesian and likelihood approximations in statistics. Biometrika 82, 1-24.
[79] Sweeting, T.J. (1995b). A Bayesian approach to approximate conditional inference. Biometrika 82, 25-36.
[80] Tibshirani, R.J. (1989). Non-informative priors for one parameter of many. Biometrika 76, 604-608.
[81] Tierney, L.J. and Kadane, J.B. (1986). Accurate approximations for posterior moments and marginal densities. J. Amer. Statist. Assoc. 81, 82-86.
[82] Walker, A.M. (1969). On the asymptotic behaviour of the posterior distribution. J. R. Statist. Soc. B 31, 80-88.
[83] Welch, B.L. and Peers, H.W. (1963). On formulae for confidence points based on integrals of weighted likelihoods. J. R. Statist. Soc. B 25, 318-329.