Some Aspects of Bayesian and Frequentist Asymptotics
Jiahui Li
A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy
Graduate Department of Statistics
University of Toronto
© Copyright by Jiahui Li 1998
The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.
The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Some Aspects of Bayesian and Frequentist Asymptotics
Jiahui Li
Ph.D. 1998
Department of Statistics
University of Toronto
Abstract
In this thesis we consider various aspects of asymptotic theory in Bayesian inference and frequentist inference. We give a detailed review of recent developments on matching priors, investigate their relationships with each other and their invariance properties, and discuss how to obtain appropriate matching priors. We investigate matching priors in the product of normal means problem (Berger & Bernardo, 1989) and suggest a class of priors. We give an introduction to the shrinkage argument and provide examples to demonstrate how to derive matching priors by using the shrinkage argument. In the scalar parameter case, we apply the shrinkage argument to derive the frequentist conditional distribution to order O(n^{-2}) of the maximum likelihood estimate given an ancillary statistic. Using this result, we verify the p* formula and a Lugannani & Rice type formula, obtain a version of the renormalizing constant in the p* formula and an expression of the p* formula to order O(n^{-2}) for general models, and consider constructing confidence intervals to O(n^{-2}). We further use the shrinkage argument in the nuisance parameter case to obtain the expansion of the frequentist conditional distribution function to order O(n^{-3/2}) of the signed root of the conditional likelihood ratio statistic of Cox & Reid (1987) in location-scale models. We also discuss this approach for other models.
Acknowledgements
I would like to express my sincere appreciation to my supervisor, Professor Nancy Reid, for introducing me to this research area, for her invaluable insight and advice, and for her encouragement and assistance through every step of my Ph.D. program.
I would like to thank Professor David Andrews and Professor Mike Evans for their helpful comments leading to a better version of this thesis.
I would also like to thank Professor Mike Evans, Professor Keith Knight, and all other faculty members, staff and graduate students, for their help while I was studying in the Department of Statistics.
I am indebted to the University of Toronto and my supervisor for the generous financial support of my studies and research in the Department of Statistics, University of Toronto.
Finally, I would like to thank my family and friends for their constant help and encouragement in my pursuit of the Ph.D. degree at the University of Toronto.
Contents
1 Introduction 1
1.1 Asymptotics in Bayesian Inference . . . . . . . . . . 1
1.2 Frequentist Asymptotics . . . . . . . . . . 3
1.3 Summary . . . . . . . . . . 6
2 Matching Priors and Their Invariance Properties 9
2.1 Introduction . . . . . . . . . . 9
2.2 Matching Priors via the Posterior Quantiles . . . . . . . . . . 11
2.3 Matching Priors via the Distribution Function . . . . . . . . . . 15
2.4 Matching Priors via the Posterior Regions . . . . . . . . . . 19
2.4.1 Equal Tail Areas Consideration . . . . . . . . . . 19
2.4.2 Frequentist Statistics Consideration . . . . . . . . . . 30
2.4.3 HPD Consideration . . . . . . . . . . 24
2.4.4 Other Issues . . . . . . . . . . 26
2.5 Invariance Properties of Matching Priors . . . . . . . . . . 26
2.6 Discussion . . . . . . . . . . 31
2.7 Case Studies . . . . . . . . . . 34
3 Matching Priors in the Product of Normal Means Problem 38
3.1 Introduction and Notation
3.2 First Order and Second Order Matching Priors
3.3 Comparison of T, and a.
3.4 A Class of Priors
4 The Shrinkage Argument and Matching Priors
4.1 The Shrinkage Argument
4.2 Frequentist Distribution of the SRLR Statistic by Using the Shrinkage Argument
4.2.1 Introduction and Notation
4.2.2 The Frequentist Distribution of the SRLR Statistic
4.3 Matching Priors via the Posterior Quantiles
4.3.1 Calculating the Posterior Quantiles of θ
4.3.2 Frequentist Coverage Probabilities of Posterior Intervals
4.3.3 Matching via the Posterior Quantiles
4.4 Matching Priors via the Distribution Function
5 Tail Probability of MLE to O(n^{-2}) by Using the Shrinkage Argument
5.1 Preliminaries
5.2 Posterior Probability to O(n^{-2})
5.3 Some Detailed Calculations
5.4 Tail Probability to O(n^{-2})
Appendix 5.1
Appendix 5.2
6 The p* Formula and Tail Probability of MLE to O(n^{-2}) in the Frequentist Setup 85
6.1 Introduction . . . . . . . . . . 85
6.2 Direct Integration of the p* Formula . . . . . . . . . . 85
6.3 p* Formula to O(n^{-2}) . . . . . . . . . . 91
6.4 Confidence Intervals to O(n^{-2}) . . . . . . . . . . 97
7 Conditional Distribution of the SRCLR Statistic by Using the Shrinkage Argument 103
7.1 Introduction . . . . . . . . . . 103
7.2 Marginal Posterior Density . . . . . . . . . . 105
7.3 Approximation to the Marginal Posterior Density . . . . . . . . . . 108
7.4 Marginal Posterior Distribution . . . . . . . . . . 109
7.5 Frequentist Conditional Distribution of the SRCLR Statistic . . . . . . . . . . 113
List of Tables
3.1 Shifted Coverage Probabilities to O(n^{-3/2}) of 0.05 Posterior Quantiles. 44
3.2 Shifted Coverage Probabilities to O(n^{-3/2}) of the Posterior Quantiles Using the Prior π_{0.5}. 48
Chapter 1
Introduction
Traditionally there are two ways of doing statistical inference: one is the Bayesian approach, the other is the frequentist approach. In this thesis, we consider the contacts between these two approaches in their asymptotic aspects. In the following, we give a brief review of some background and the current developments in Bayesian and frequentist asymptotic methods (see also Reid, 1995), then we outline the work of this thesis in the last section.
1.1 Asymptotics in Bayesian Inference
One important issue in Bayesian inference is the choice of the prior density. In practice most analyses are performed with noninformative priors. The philosophy related to this issue is the long-standing debate between subjective inference and objective inference. Kass & Wasserman (1996) gave a full discussion of this issue. The earliest studies of noninformative priors are due to Laplace (1820), who used the uniform prior, and Jeffreys (1946), who used the prior π(θ) ∝ |I_1(θ)|^{1/2} for inference, where θ is the parameter and I_1(θ) is the per observation Fisher information
matrix. There are now a variety of methods that have been proposed for constructing noninformative priors. For a detailed review and appraisal, refer to Kass & Wasserman (1996). We shall concentrate here on matching priors and reference priors.
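As a concrete illustration (our own, not part of the original text), Jeffreys' rule π(θ) ∝ I_1(θ)^{1/2} can be computed directly from the Fisher information. The hedged sketch below, in Python, uses the Bernoulli model, where I(θ) = 1/{θ(1-θ)} and Jeffreys' prior is proportional to the Beta(1/2, 1/2) density; the function names are ours and not from any particular library.

```python
import math

def fisher_info_bernoulli(theta):
    # Per observation Fisher information of the Bernoulli(theta) model:
    # I(theta) = E[(d/dtheta log f(X; theta))^2] = 1 / (theta * (1 - theta)).
    return 1.0 / (theta * (1.0 - theta))

def jeffreys_unnormalized(theta, info=fisher_info_bernoulli):
    # Jeffreys' rule: pi(theta) proportional to I(theta)^{1/2}.
    return math.sqrt(info(theta))

def beta_half_half(theta):
    # Normalized Beta(1/2, 1/2) density; for the Bernoulli model Jeffreys'
    # prior is exactly this density up to the constant factor pi.
    return 1.0 / (math.pi * math.sqrt(theta * (1.0 - theta)))

# The ratio is constant in theta (equal to pi), confirming proportionality.
ratio = jeffreys_unnormalized(0.3) / beta_half_half(0.3)
```

The constancy of the ratio across θ is what "proportional to" means in Jeffreys' rule; the normalizing constant is irrelevant for posterior inference.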
Matching priors are defined by requiring that some posterior quantities have the corresponding frequentist coverage probabilities to some asymptotic order. This method of deriving noninformative priors dates back to Welch & Peers (1963) and Peers (1965), who derived matching priors by ensuring that the posterior intervals have the frequentist coverage probabilities to first order or second order. Tibshirani (1989) considered matching via the posterior quantiles as in Peers (1965) and obtained an explicit form of the first order matching priors. Mukerjee & Dey (1993) extended the matching via the posterior quantiles (Peers, 1965; Tibshirani, 1989) to the next order. Several matching criteria have been proposed; the most important one is matching via the posterior quantiles (Welch & Peers, 1963; Peers, 1965; Tibshirani, 1989; Mukerjee & Dey, 1993; etc.), followed by matching via the distribution function (Ghosh & Mukerjee, 1993b; Mukerjee & Ghosh, 1996; etc.), and matching via the posterior regions (Peers, 1968; Severini, 1991; Ghosh & Mukerjee, 1991, 1992b, 1995b; etc.). We shall give a detailed review and investigation of the current matching priors, their relationships and invariance properties in Chapter 2. We also discuss matching priors in the product of normal means problem of Berger & Bernardo (1989) in Chapter 3, and discuss deriving matching priors in Chapter 4.
Reference priors, initiated by Bernardo (1979) and refined later by Berger & Bernardo (1989, 1991, 1992a, 1992b) (see also Berger & Yang, 1992), are defined by maximizing the missing information, a notion defined by Bernardo (1979). When there are no nuisance parameters, the reference prior turns out to be Jeffreys'
prior. In the nuisance parameter case, one needs to use a stepwise procedure, and the reference priors are usually different from Jeffreys' prior. Reference priors are quite different from matching priors in their constructions. In general there is no direct comparison between these two kinds of priors, either in reference properties or in matching properties. But in some specific situations, there may be some similar properties. One interesting phenomenon is the form of reference priors and Tibshirani's first order matching priors. For details see Berger (1992) and Kass & Wasserman (1996). We shall also discuss this phenomenon in Chapter 2 and discuss the matching properties of two reference priors of Berger & Bernardo (1989) in Chapter 3.
Another issue in Bayesian inference is the approximation of the posterior density and the posterior distribution. When there are nuisance parameters present or when the dimension of the parameter becomes large, the calculations are quite involved, and so simpler approximations are required. In the nuisance parameter case, Tierney & Kadane (1986) obtained the approximation of the marginal posterior density to order O(n^{-3/2}), and the approximation of the posterior moments to order O(n^{-2}). DiCiccio, Field & Fraser (1990) obtained the approximation of the marginal posterior distribution to order O(n^{-3/2}). We shall discuss the approximation of the posterior density and posterior distribution in Chapter 5 (scalar parameter case) and in Chapter 7 (nuisance parameter case).
1.2 Frequentist Asymptotics
Asymptotic theory in frequentist inference is mostly related to approximations of the conditional density and the tail probability of the maximum likelihood
estimate, sometimes called p*-based approximations.
The p* formula of Barndorff-Nielsen (1980, 1983) approximates the conditional density of the maximum likelihood estimate given an ancillary statistic to order O(n^{-3/2}) with a renormalizing constant. In transformation models the p* formula is exactly equal to the conditional density and the renormalizing constant is free of the parameter. In exponential models with the canonical parameter, the p* formula equals the renormalized saddlepoint approximation (Reid, 1988). A quite general derivation of the p* formula is given by Skovgaard (1990). We shall discuss the p*
formula in detail in Chapter 6.
To use p* to compute the tail probability we need to integrate, either numerically or analytically, over values of the maximum likelihood estimate more extreme than the observed value. An additional integration is needed to compute the renormalizing constant. In the one parameter case, approximations of the tail probability to order O(n^{-3/2}) are available. For details please see Lugannani & Rice (1980) for exponential models, DiCiccio, Field & Fraser (1990) for location models, Barndorff-Nielsen (1991) and Fraser & Reid (1993) for general models. We shall, in Chapter 5 and Chapter 6, discuss the approximation of the tail probability of the maximum likelihood estimate.
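As a hedged illustration of the Lugannani & Rice (1980) tail formula mentioned above, the sketch below applies it to the mean of n unit exponentials, a case where the exact tail probability is an Erlang survival sum. The form used, P ≈ 1 - Φ(r) + φ(r)(1/q - 1/r) with r the signed likelihood root and q a Wald-type quantity, is the standard saddlepoint version from the literature, not the thesis's own derivation.

```python
import math

def exact_tail(n, x):
    # P(mean of n Exp(1) variables >= x): the sum exceeds s = n*x, and the
    # Erlang(n, 1) survival probability is a finite Poisson sum.
    s = n * x
    return math.exp(-s) * sum(s**k / math.factorial(k) for k in range(n))

def lugannani_rice_tail(n, x):
    # Saddlepoint tail approximation for the mean of n i.i.d. Exp(1)
    # variables, with cumulant generating function K(t) = -log(1 - t).
    t_hat = 1.0 - 1.0 / x                    # saddlepoint: K'(t_hat) = x
    K = -math.log(1.0 - t_hat)
    K2 = 1.0 / (1.0 - t_hat) ** 2            # K''(t_hat)
    r = math.copysign(math.sqrt(2.0 * n * (t_hat * x - K)), t_hat)
    q = t_hat * math.sqrt(n * K2)
    Phi_r = 0.5 * (1.0 + math.erf(r / math.sqrt(2.0)))
    phi_r = math.exp(-0.5 * r * r) / math.sqrt(2.0 * math.pi)
    return (1.0 - Phi_r) + phi_r * (1.0 / q - 1.0 / r)

approx = lugannani_rice_tail(10, 1.5)
exact = exact_tail(10, 1.5)
```

Even at n = 10 the approximation agrees with the exact Erlang tail to several decimal places, which is typical of the relative accuracy of saddlepoint-based tail formulas.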
In the case when nuisance parameters are present, approximation of the tail probability for a scalar parameter of interest is available in exponential models with canonical parameters, in which the nuisance parameters are eliminated by conditioning (Skovgaard, 1987; Fraser & Reid, 1993), and in transformation models with transformation parameters, in which the nuisance parameters are eliminated by marginalization (DiCiccio, Field & Fraser, 1990). In Chapter 7, we shall discuss the approximation in the nuisance parameter case.
The p* formula still holds its accuracy to O(n^{-3/2}) when the ancillary statistic is replaced by a second order approximate ancillary statistic. In Fraser & Reid (1995), it is shown that third order inference needs only the observed likelihood and the tangent directions for a second order ancillary. This makes the p* formula and other tail probability approximations more applicable to a variety of models. For the construction of approximate ancillaries, see Barndorff-Nielsen (1980), Barndorff-Nielsen & Cox (1994, Chapter 7), McCullagh (1987, Chapter 8), Skovgaard (1990) and Fraser & Reid (1995).
Compared with the p* formula of Barndorff-Nielsen (1980, 1983), the tangent exponential model of Fraser (1988, 1990) and Fraser & Reid (1993, 1995) is another approach for deriving the conditional density of the maximum likelihood estimate. It is a generalization of the p* formula. Because we do not mention this again in the later chapters, we give in the following a more detailed description of the mechanism of this approach, so that we can understand more about the similarities and differences among the various approaches.
For models with parameters and variables of the same dimension, the tangent exponential model approximates the given model to third order in a first derivative neighbourhood of the data point and to second order otherwise. To calculate the tangent exponential model we need only the observed likelihood and the likelihood gradient at the data point. In the general model, the log-likelihood function is ℓ(θ; x), where θ is of dimension d and x is of dimension p. Suppose that there is a third order ancillary; then there is a conditional likelihood with the same variable and parameter dimension having third order accuracy. Now we can use a tangent exponential model to work on this conditional likelihood. Since the model is conditional with respect to a third order ancillary, the conditional likelihood gradient
becomes the full likelihood gradient tangent to the ancillary surface. Let x_0 be the observed data point and V be the directions tangent to a third order ancillary at the data point; then ℓ(θ; x_0) and the gradient ℓ_{;V}(θ; x_0) fully determine the tangent exponential model within the conditioning of the ancillary. In Fraser & Reid (1995) it is shown that for third order inference it suffices to have V tangent only to a second order ancillary. The problem left is to find the tangent directions for a second order ancillary.
A procedure has been developed in Fraser & Reid (1995) for calculating the tangent directions to a second order ancillary at the data point. First we can use the approximate location model theory (Fraser, 1964; Fraser & Reid, 1995) to determine a first order ancillary at the data point. In Fraser & Reid (1995), it is shown that there is a second order ancillary with the same tangent directions as the first order ancillary at the data point. So in applications we can work with this first order ancillary suggested by Fraser & Reid (1995).
1.3 Summary
Our work focuses on matching priors in Bayesian inference, the shrinkage argument which has been used in several papers for deriving matching priors, and the derivation of the frequentist version of the densities and distribution functions of some quantities using the shrinkage argument.
In Chapter 2 we give a detailed review of the developments of matching priors and investigate relationships among various matching priors. We sort out the partial differential equations which matching priors need to satisfy. Based on matching methods, we categorize matching priors into three classes: matching priors via the posterior quantiles, matching priors via the distribution function and matching
priors via the posterior regions. These three classes cover almost all current matching priors. Within the matching priors via the posterior regions, we further divide them into Equal Likelihoods Consideration, Frequentist Statistics Consideration, and Highest Posterior Density Consideration. We first try to give a different explanation of why the first order matching via the distribution function is equivalent to the first order matching via the posterior quantiles, but second order matching is different. We investigate the invariance properties of the matching priors currently available, and thus we extend the findings of parameterization invariance of matching priors via the posterior quantiles and via the distribution function of Mukerjee & Ghosh (1996). Based on their relations with each other and their invariance properties, we make suggestions on how to derive reasonable matching priors. Finally we give some examples to illustrate the current work.
In Chapter 3, we consider matching priors in the product of normal means problem of Berger & Bernardo (1989). We first obtain the matching properties of the flat prior. Based on the matching properties, we make comparisons among the flat prior and two reference priors derived in Berger & Bernardo (1989), and give a noninformative prior from a class of priors we suggest. In Chapter 4, we introduce the shrinkage argument (Bickel & Ghosh, 1990; Dawid, 1991; Ghosh & Mukerjee, 1991; Sweeting, 1995a, 1995b). This argument has been widely used as an effective tool to evaluate the frequentist probabilities of some posterior quantities, and thus to obtain matching priors. As demonstrations, we use the shrinkage argument to derive the frequentist distribution of the signed root of the likelihood ratio statistic, and to derive matching priors via the posterior quantiles (Welch & Peers, 1963). We also consider matching via the distribution function and obtain an interesting result: matching via the distribution function is equivalent to matching via the posterior
quantiles in the first case, but leads to matching via the posterior quantiles in the second case.
In Chapter 5, in the scalar parameter case, we use the shrinkage argument to derive the frequentist conditional distribution to order O(n^{-2}) of the maximum likelihood estimate given an ancillary statistic. This result extends the current approximations in the literature by one further order. We also obtain the posterior distribution to order O(n^{-2}), and the posterior and the frequentist conditional Bartlett corrections of the likelihood ratio statistic. Then in Chapter 6, we verify that the p* formula is the conditional density of the maximum likelihood estimate to order O(n^{-2}) (Barndorff-Nielsen, 1980, 1983) by directly integrating the p* formula and comparing with the result in Chapter 5. We look for the renormalizing constant in the p* formula and extend the p* formula of Barndorff-Nielsen (1980, 1983) to order O(n^{-2}). We also verify a Lugannani & Rice type formula and obtain the third order error terms of different kinds of approximations to the tail probability of the maximum likelihood estimate. In the final section of Chapter 6, we consider constructing confidence intervals to order O(n^{-2}).
In Chapter 7, we extend the approach used in Chapter 5, i.e. using the shrinkage argument, by including the nuisance parameter, to derive the frequentist conditional distribution to order O(n^{-3/2}) (DiCiccio, Field & Fraser, 1990) of the signed root of the conditional likelihood ratio statistic (Cox & Reid, 1987) in location-scale models, and discuss the possibilities for other models. We also obtain approximations to the marginal posterior density (Tierney & Kadane, 1986) and the marginal posterior distribution (DiCiccio, Field & Fraser, 1990) to order O(n^{-3/2}).
Chapter 2
Matching Priors and Their
Invariance Properties
In this chapter we give a detailed review of the development of matching priors and investigate relationships among various matching priors. We also explore the invariance properties of the matching priors currently available. Based on these relationships and invariance properties, we make some suggestions for the development of reasonable matching priors. Finally we give some examples of this work.
2.1 Introduction
In recent years there has been considerable development in deriving noninformative priors by ensuring the frequentist validity of some Bayesian procedures. These types of noninformative priors are called "matching priors", in contrast to other types of noninformative priors. Aside from their role in serving as the noninformative priors in Bayesian inference, they are very useful for constructing accurate confidence regions, which can sometimes be difficult in the frequentist approach, e.g.
in the product of normal means problem (Berger & Bernardo, 1989). The earliest studies on this issue date back to Welch & Peers (1963) and Peers (1965, 1968). Further studies followed by Stein (1985), Tibshirani (1989), Mukerjee & Dey (1993), and others. There have been significant developments; more and more matching methods have been suggested. Most of the current results are on second order matching, and have multiparameter and nuisance parameter versions. For discussion of matching priors, please refer to Ghosh & Mukerjee (1992a), Reid (1995), Kass & Wasserman (1996) and Ghosh & Mukerjee (1996).
Our present work focuses on the following aspects. (1) A variety of procedures have been developed, so there is a question as to which one is better, or whether there is an ordering of the choices. For this reason, we investigate different procedures for deriving matching priors and find the relations among them. (2) Invariance properties are important for priors in Bayesian inference. If the matching priors do not have the matching property under another parameterization, then they are not convincing as noninformative priors, and the attempt to make global parameter orthogonality may cause problems. In this regard, we shall explore the invariance properties of the matching priors currently available. (3) Based on the results of (1) and (2), we shall give suggestions on how to use the current procedures to derive more reasonable and better noninformative priors. (4) We shall give some examples to compare the matching priors derived via the present procedures.
We here make some assumptions. Suppose that we have i.i.d. observations X = (X_1, …, X_n)^T from a model f(x; θ) with parameter θ, where θ = (θ_1, …, θ_p)^T, belonging to an open subset of R^p, has prior density π(θ). Let ℓ(θ; x) be the log-likelihood function and θ̂ be the maximum likelihood estimate of θ. We assume that θ_1 is the parameter of interest and all the other parameters are nuisance parameters
if not stated otherwise. The regularity conditions needed for deriving the results we shall introduce in the following can be found in Bickel & Ghosh (1990).
For convenience, we let I_1 = I_1(θ) = (I_{ij})_{p×p} be the per observation Fisher information matrix, j_1 = (j_{ij}) be the per observation observed information matrix evaluated at θ̂, and j_1^{-1} = (j^{ij}).
2.2 Matching Priors via the Posterior Quantiles
The study of matching priors via the posterior quantiles dates back to Welch & Peers (1963), who worked on the scalar parameter case, i.e. p = 1 with no nuisance parameter. Let θ^{(α)}(X) be the α-quantile of the posterior distribution of θ, i.e.

Π{θ ≤ θ^{(α)}(X) | X} = α.

What Welch & Peers did was to seek priors such that

P_θ{θ ≤ θ^{(α)}(X)} = α + O(n^{-1}),

where P_θ refers to probability in the repeated sampling model. Priors satisfying the above condition are known as first order matching priors. Welch & Peers showed that the first order matching prior is Jeffreys' prior, π(θ) ∝ I_1(θ)^{1/2}.
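A minimal Monte Carlo sketch of this matching property (our own construction, not from the thesis): for exponential data with rate θ, Jeffreys' prior π(θ) ∝ 1/θ gives a Gamma(n, S) posterior, and since θS is a pivot the posterior quantiles in this scale model have exactly the nominal frequentist coverage, so the simulated coverage should sit at α up to Monte Carlo error.

```python
import math
import random

def gamma_cdf(g, n):
    # CDF of Gamma(n, 1) for integer shape n (Erlang):
    # P(G <= g) = 1 - exp(-g) * sum_{k=0}^{n-1} g^k / k!
    return 1.0 - math.exp(-g) * sum(g**k / math.factorial(k) for k in range(n))

def gamma_quantile(alpha, n, lo=0.0, hi=1000.0):
    # Invert the Erlang CDF by bisection.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, n) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def coverage(theta, n, alpha, reps=4000, seed=1):
    # With Jeffreys' prior pi(theta) proportional to 1/theta for Exp(theta)
    # data, the posterior is Gamma(n, S), so the alpha-quantile of theta is
    # gamma_quantile(alpha, n) / S.  Count how often theta falls below it.
    rng = random.Random(seed)
    q = gamma_quantile(alpha, n)
    hits = 0
    for _ in range(reps):
        S = sum(rng.expovariate(theta) for _ in range(n))
        if theta <= q / S:
            hits += 1
    return hits / reps

cov = coverage(theta=2.0, n=5, alpha=0.95)   # should be close to 0.95
```

The exactness here is special to the scale model; in general Jeffreys' prior matches only to the asymptotic order stated above.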
It is a natural step to move to the next asymptotic order. Welch & Peers (1963) also considered seeking priors for which the above frequentist coverage holds to the next order,
such priors being called second order matching priors. Welch & Peers found that under this criterion the prior π is a second order matching prior if and only if it satisfies both (d/dθ){I_1^{-1/2} π} = 0, the condition for first order matching, and, in our notation, a further equation. It turns out that there is no solution in general, but if the skewness of the score function does not depend on the parameter θ, then Jeffreys' prior π(θ) ∝ I_1(θ)^{1/2} is the second order matching prior. Second order matching via the quantiles therefore depends on the model.
Peers (1965) extended the study of Welch & Peers (1963) by including nuisance parameters. Let θ_1 be the parameter of interest and θ_1^{(α)}(X) be the α-quantile of the marginal posterior distribution of θ_1, i.e.

Π{θ_1 ≤ θ_1^{(α)}(X) | X} = α.

Peers sought priors such that

P_θ{θ_1 ≤ θ_1^{(α)}(X)} = α + O(n^{-1}).

He found that in order to have the first order property, the priors should satisfy the following partial differential equation:

Σ_{j=1}^{p} ∂/∂θ_j { I^{j1} (I^{11})^{-1/2} π(θ) } = 0.    (2.2)

It turns out that this partial differential equation has infinitely many solutions. Peers discussed possible solutions for this equation, but there is no explicit form
and each model must be investigated separately. Peers (1965) suggested using a prior which satisfies the first order matching conditions for each component of θ, i.e. taking each component of θ in turn as the parameter of interest, but typically there is no such prior.
An important step was made by Tibshirani (1989). Using the result of Stein (1985) and parameter orthogonality, he obtained an explicit form of the first order matching priors. Let the parameter of interest θ_1 be orthogonal to the nuisance parameters (θ_2, …, θ_p), i.e. I_{1i} = 0 for all θ, i = 2, …, p. From equation (2.2), the solutions are

π(θ) ∝ g(θ_2, …, θ_p) I_{11}^{1/2},    (2.3)

where g is an arbitrary smooth positive function. Priors of this kind are termed Tibshirani's first order matching priors via the posterior quantiles. Nicolaou (1993) gave a rigorous proof of the result of Tibshirani (1989).
It is interesting that Tibshirani's first order matching priors have the form of Berger-Bernardo's reference prior for a particular choice of g if the roles of the parameter of interest and the nuisance parameters are switched. Berger (1992) made a comment on this phenomenon (see also Kass & Wasserman, 1996). In regard to the choice of the arbitrary function g, Tibshirani (1989) (see also Peers, 1965; Datta, 1996) noted that one can consider each component of θ as the parameter of interest, but global parameter orthogonalization is difficult and sometimes not possible, and this consideration can also lead to no solution (see also Ghosh & Mukerjee, 1993).
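The orthogonal-parameter form above can be checked numerically. The sketch below is our own construction; it assumes the first order matching equation takes the standard form Σ_j ∂/∂θ_j {I^{j1}(I^{11})^{-1/2} π} = 0 (cf. Datta & Ghosh, 1995a) and verifies by finite differences that, for the N(μ, σ²) model with μ of interest, any prior of the form g(σ)/σ satisfies it while a μ-dependent prior does not.

```python
import math

# Model: X ~ N(mu, sigma^2).  The per observation Fisher information is
# I = diag(1/sigma^2, 2/sigma^2), so mu and sigma are orthogonal,
# with I^{11} = sigma^2 and I^{21} = 0.

def matching_lhs(prior, mu, sigma, h=1e-5):
    # Left side of the assumed first order matching equation for interest
    # parameter mu:
    #   d/dmu { I^{11} (I^{11})^{-1/2} prior }
    #     + d/dsigma { I^{21} (I^{11})^{-1/2} prior },
    # evaluated by central finite differences.
    def t_mu(m, s):
        return s * prior(m, s)     # sigma^2 * sigma^{-1} * prior
    def t_sigma(m, s):
        return 0.0                 # I^{21} = 0 in this model
    d_mu = (t_mu(mu + h, sigma) - t_mu(mu - h, sigma)) / (2.0 * h)
    d_sigma = (t_sigma(mu, sigma + h) - t_sigma(mu, sigma - h)) / (2.0 * h)
    return d_mu + d_sigma

# Tibshirani's form with an arbitrary smooth positive g: pi = g(sigma)/sigma.
tib = lambda m, s: (1.0 + s * s) / s
# A mu-dependent prior, which should fail the matching equation.
bad = lambda m, s: math.exp(m) / s

ok_residual = matching_lhs(tib, 0.7, 1.3)    # should vanish
bad_residual = matching_lhs(bad, 0.7, 1.3)   # should not
```

The prior g(σ)/σ is exactly (2.3) for this model, since I_{11}^{1/2} = 1/σ and g is free.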
Mukerjee & Dey (1993) considered matching to the next order in the case of one nuisance parameter. They obtained one more partial differential equation in addition to (2.2) of Peers (1965); i.e. second order matching priors should satisfy two partial differential equations. Their derivation is in the general non-orthogonal
parameter setup. Most of the time these two partial differential equations lead to a unique solution, but sometimes there is no solution and sometimes all the first order matching priors are second order matching priors. Mukerjee & Ghosh (1996) generalized the results of Mukerjee & Dey (1993) to the case of several nuisance parameters. They showed that a prior π is a second order matching prior via the posterior quantiles if and only if it satisfies both equation (2.2) and a further equation, (2.4).

Under parameter orthogonality, the second order matching priors are of the form (2.3), where g = g(θ_2, …, θ_p) satisfies a partial differential equation involving I_{1,1,1} = E_θ[{∂ log f(X_1; θ)/∂θ_1}^3]. If there is no nuisance parameter present, then the above equation becomes (d/dθ){I_{1,1,1} I_1^{-3/2}} = 0, showing that Jeffreys' prior π(θ) ∝ I_1(θ)^{1/2} is the second order matching prior if and only if the skewness of the score function is free of the parameter θ, which is the result of Welch & Peers (1963).
In the case of no nuisance parameter, equation (2.4) reduces to a single equation in θ. This equation is different from the second equation of Welch & Peers (1963). But if combined with the first equation, the two equations are the same as the
two equations of Welch & Peers (1963). This is because Welch & Peers (1963) evaluated the moment generating function while Mukerjee & Dey (1993) and Mukerjee & Ghosh (1996) evaluated the coverage probability directly. The discussion of this paragraph is due to the fact that, if we require the two-sided posterior intervals of equal tail areas to have frequentist coverage validity, then the only condition is that the priors satisfy (2.4), the second partial differential equation of Mukerjee & Dey (1993) and Mukerjee & Ghosh (1996). We shall discuss this issue later.
In specific situations, probability matching priors via the posterior quantiles were considered by several authors. For details please refer to Lee (1989), Datta & Ghosh (1995a), Sun & Ye (1995, 1996), Ghosh, Carlin & Srivastava (1995), Ghosh & Yang (1996) and Garvan & Ghosh (1996).
2.3 Matching Priors via the Distribution Function
Matching via the distribution function follows from the relationship of the posterior quantiles to the posterior distribution. Ghosh & Mukerjee (1993b) and Mukerjee & Ghosh (1996) considered the statistic T = (n^{-1} ĵ^{11})^{-1/2}(θ_1 - θ̂_1), the posterior standardized version of θ_1, while Datta & Ghosh (1995a) considered a standardized version of a parametric function. The matching priors based on the distribution function of the statistic T are the priors such that
for all real t, with an error term that is free of θ and independent of x. In the error term O(n^{-i/2}), i can be 2 or 3, which corresponds to first order matching or second order matching respectively.
It turns out that the first order matching priors should satisfy the same partial differential equation (2.2) as in matching the posterior quantiles of θ_1 (Ghosh & Mukerjee, 1993b; Datta & Ghosh, 1995a). The second order matching priors are different from those obtained via matching the quantiles (Mukerjee & Ghosh, 1996). In addition to equation (2.2), they should also satisfy the following two partial differential equations
and
From the performance of some examples (see §2.7, Case Studies), it seems that these conditions are more difficult to satisfy than those obtained via matching the posterior quantiles. In the scalar parameter case, we obtain in §4.4 that matching via the distribution function is equivalent to matching via the posterior quantiles using the statistic r, and can lead to matching via the posterior quantiles if using the statistic p.
One question might arise: why is first order matching via the distribution function of T the same as that via the posterior quantiles of θ_1, but second order matching different? To some extent, this phenomenon can be explained by the use of the following lemma, which we call the generalized Cornish-Fisher inversion. For details, please refer to Barndorff-Nielsen & Cox (1989, Chapter 4) and Barndorff-Nielsen & Cox (1994).
Lemma 2.1 Suppose the distribution function of Y has the following form
We assume that the inverse of F_n is well defined. Let F_n(y_α) = α, i.e. y_α is the α-quantile of F_n; then y_α has the following form
where Φ(z_α) = α, and
PROOF: According to the distribution function given above,
Now we expand the functions Φ(y_α), φ(y_α), R_1(y_α) and R_2(y_α) in y_α about z_α to get
Setting the terms of orders O(n^{-1/2}) and O(n^{-1}) equal to zero and using Φ(z_α) = α, we obtain the result as stated. □
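The first order mechanism of the lemma can be verified symbolically: writing ε for n^{-1/2}, the candidate quantile y_α = z_α − ε R_1(z_α) makes the O(ε) term of F_n(y_α) vanish. A sketch with sympy, leaving R_1 as an abstract function (this illustrates the inversion only to first order, not the thesis's full second order computation):

```python
import sympy as sp

z, e = sp.symbols('z epsilon')     # e plays the role of n**(-1/2)
R1 = sp.Function('R1')             # abstract first order correction term

Phi = lambda t: (1 + sp.erf(t / sp.sqrt(2))) / 2          # N(0,1) cdf
phi = lambda t: sp.exp(-t**2 / 2) / sp.sqrt(2 * sp.pi)    # N(0,1) pdf

# F_n(y) = Phi(y) + n^{-1/2} R1(y) phi(y) + O(n^{-1})
F = lambda t: Phi(t) + e * R1(t) * phi(t)

# Candidate quantile from the lemma: y_alpha = z_alpha - n^{-1/2} R1(z_alpha)
y_alpha = z - e * R1(z)

# The O(n^{-1/2}) term of F_n(y_alpha) must vanish, leaving Phi(z_alpha)
order1 = sp.simplify(sp.diff(F(y_alpha), e).subs(e, 0))
print(order1)  # 0
```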
Suppose that the posterior distribution of T has the following expansion
and the frequentist distribution of T has the following expansion
First order matching via the distribution function is to let the prior π(θ) satisfy the following equation with an error of order O(n^{-1/2}),
for all real t.
Now we consider matching via the quantiles of θ_1. Firstly we note that, from the definition of T, the α-quantile of θ_1 can be changed to the form of the α-quantile of T. From the above lemma, the posterior α-quantile of T can be expressed as
First order matching via the posterior quantiles of θ_1 is to let the prior π satisfy
On the other hand, the α-quantile of the frequentist distribution of T can be expressed as
So first order matching via the quantiles is equivalent to letting the prior π satisfy
for all z_α, with an error of order O(n^{-1/2}). And so first order matching via the distribution function is the same as first order matching via the quantiles. Also from the above lemma, we can see that second order matching via the distribution function is in general no longer the same as that via the quantiles.
2.4 Matching Priors via the Posterior Regions
Matching via the posterior regions is a generalization of matching via the posterior quantiles. Matching via the posterior regions can arise from the inversion of certain statistics, e.g. the likelihood ratio (LR) statistic, the profile LR statistic, the score statistic and the highest posterior density (HPD) regions.
The earliest study of matching via the posterior regions is due to Peers (1968), who discussed three kinds of posterior intervals in the scalar parameter case. The first is the two-sided Equal tail areas, the second is the Equal likelihoods and the third is the Equal posterior densities. We consider these in the subsections below, under the names: Equal tail areas consideration, Frequentist statistics consideration and HPD consideration.
2.4.1 Equal Tail Areas Consideration
Peers (1968) considered constructing two-sided Equal tail areas posterior intervals (θ_L, θ_U), where
He obtained that, for the Equal tail areas posterior intervals to have frequentist coverage probabilities to second order, the prior π should satisfy the following differential equation
This differential equation is not the same as the second equation in Welch & Peers (1963) if we just let the error term of order O(n^{-1}) be zero. But it is equivalent to (2.6), the second equation of Mukerjee & Dey (1993), who considered second order
matching via the posterior quantiles. This type of Equal tail areas interval has the merit of balancing posterior probability on both sides. If we want it to have the same balance of frequentist coverage probability, then this turns out to be the same consideration as the second order matching via the posterior quantiles of Welch & Peers (1963).
In the case of one nuisance parameter, for the two-sided posterior intervals of Equal tail areas to have correct frequentist coverage probabilities to second order, the prior should satisfy the second partial differential equation of Mukerjee & Dey (1993). If the parameters are orthogonal, the equation is
This result can be extended to the general case of several nuisance parameters. It turns out that the prior π is the matching prior under the current consideration if and only if it satisfies the partial differential equation (2.4) of Mukerjee & Ghosh (1996).
2.4.2 Frequentist Statistics Consideration
Peers (1968) discussed two-sided posterior intervals (θ_L, θ_U) of Equal likelihoods, where ℓ(θ_L; x) = ℓ(θ_U; x). He obtained that, for the Equal likelihoods posterior intervals to have frequentist coverage probabilities to second order, the prior π should satisfy the differential equation
Since one always assumes the maximum likelihood estimate is unique for the asymptotic expansion, this Equal likelihoods consideration is thus equivalent to the likelihood ratio (LR) statistic consideration, and we shall refer to the latter in general. Severini (1991) considered constructing frequentist probability intervals based on the LR statistic. He obtained that, for the intervals to have posterior coverage probabilities, the prior π should satisfy one differential equation which is equal to (2.11) of Equal likelihoods of Peers (1968).
In the multiparameter case, Ghosh & Mukerjee (1991) considered matching the posterior Bartlett correction and the frequentist Bartlett correction of the LR statistic to have an error of order O(n^{-1/2}). Using a prior satisfying the matching condition, the posterior regions based on the posterior Bartlett corrected LR statistic have frequentist validity to the second order. Following the lines of Ghosh & Mukerjee (1991), it is not difficult to see that matching the posterior Bartlett correction and the frequentist Bartlett correction to have an error of order O(n^{-1/2}) is equivalent to matching the posterior regions based on the LR statistic to have an error of order O(n^{-3/2}). The LR statistic consideration is the multiparameter version of Equal likelihoods of Peers (1968). Ghosh & Mukerjee (1991) obtained that the prior π is the matching prior via the LR statistic consideration if and only if it satisfies the following partial differential equation
In the two parameter case, under parametric orthogonality, the above equation becomes
When there is only one parameter, the above equation reduces to (2.11) of Equal likelihoods of Peers (1968).
Rao & Mukerjee (1995) considered the posterior regions based on the modified score statistic to have frequentist validity to second order. The original score statistic is S = S(θ; x) = n^{-1} ℓ'(θ)^T I^{-1} ℓ'(θ) (Rao, 1948). What they considered is the modified version S* = S*(θ; x) = n^{-1} ℓ'(θ)^T ĵ^{-1} ℓ'(θ). Rao & Mukerjee (1995) also discussed a modified Wald's statistic W* = n(θ̂ - θ)^T ĵ (θ̂ - θ). The original Wald's statistic is W = n(θ̂ - θ)^T I(θ̂)(θ̂ - θ). They obtained that, for the posterior regions based on the modified score statistic S* or the modified Wald's statistic W* to have frequentist validity to the second order, the prior should satisfy the same two partial differential equations
These two partial differential equations are stronger than (2.12) of the Bartlett corrections consideration of Ghosh & Mukerjee (1991), which can be seen as a special case of the LR statistic consideration. Actually the sum of the above two equations is exactly equal to equation (2.12) of Ghosh & Mukerjee (1991). In the one parameter case, Rao & Mukerjee (1995) mentioned that consideration of the original version of Wald's statistic leads to the same result as does consideration of the modified version, while consideration of the original version of the score statistic leads to a different result compared with that of the modified one. But all of the considerations in the one parameter case lead to equation (2.11) of Equal likelihoods of Peers (1968). For more discussion on Wald's statistic, please refer to Lee (1989).
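The first order equivalence of these statistics is easy to see numerically in a one parameter model. The sketch below uses an exponential model with mean θ (an illustrative choice, not one of the thesis's examples) and compares the LR statistic with the score and Wald forms; for large n all three are close and approximately χ²₁:

```python
import numpy as np

rng = np.random.default_rng(1)
theta0, n = 2.0, 5000
x = rng.exponential(theta0, size=n)   # exponential with mean theta0
mle = x.mean()

def loglik(t):
    """Log-likelihood of the exponential(mean t) model."""
    return -n * np.log(t) - x.sum() / t

# Likelihood ratio, score and Wald statistics for H0: theta = theta0.
lr = 2 * (loglik(mle) - loglik(theta0))
u = -n / theta0 + x.sum() / theta0**2          # score function at theta0
score = u**2 * theta0**2 / n                   # U^2 / I, total info n/theta^2
wald = n * (mle - theta0) ** 2 / mle**2        # (mle-theta0)^2 * I(mle)
print(round(lr, 3), round(score, 3), round(wald, 3))  # all three close
```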
Ghosh & Mukerjee (1992b) discussed matching the posterior and frequentist Bartlett corrections of the profile LR statistic in the two orthogonal parameters case. As we mentioned for the consideration of the LR statistic of Ghosh & Mukerjee (1991), the current consideration is equivalent to matching the posterior regions based on the profile LR statistic to have an error of order O(n^{-3/2}). Let θ_1 be the parameter of interest and θ_2 be the nuisance parameter. The profile LR statistic is
where θ̂_2(θ_1) is the maximum likelihood estimate of θ_2 holding θ_1 constant. They obtained that, under parametric orthogonality, the prior π is the matching prior based on the profile LR statistic if and only if it satisfies the following partial differential equation
At the same time, Ghosh & Mukerjee considered matching the posterior and frequentist Bartlett corrections of the conditional LR statistic (Cox & Reid, 1987)
where
is the conditional log-likelihood suggested by Cox & Reid (1987) on the grounds of conditioning, and θ̂_1 is the maximum likelihood estimate of θ_1 based on ℓ_c(θ_1). In this consideration of the conditional LR statistic, under parametric orthogonality, the prior π is the matching prior if and only if it satisfies the following partial differential equation
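As a concrete instance of the profile likelihood construction: in the N(μ, σ²) model with μ the parameter of interest, the constrained MLE is available in closed form, σ̂²(μ) = n^{-1}Σ(x_i − μ)², and the profile LR statistic compares the profile log-likelihood at μ̂ and at a hypothesized value. A small sketch (Python; illustrative data, not an example from the thesis):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(1.0, 2.0, size=50)   # illustrative data
n = len(x)

def profile_loglik(mu):
    """l(mu, sigma_hat(mu)): the nuisance sigma^2 maximized out for fixed mu."""
    s2 = np.mean((x - mu) ** 2)     # constrained MLE of sigma^2
    return -n / 2 * (np.log(2 * np.pi * s2) + 1)

mu_hat = x.mean()                   # unrestricted MLE of mu
# Profile LR statistic for the hypothesized value mu = 1:
lam = 2 * (profile_loglik(mu_hat) - profile_loglik(1.0))
print(lam >= 0)  # True: the profile log-likelihood is maximized at mu_hat
```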
2.4.3 HPD Consideration
Highest posterior density (HPD) consideration d s o dates back to Peers ( L968),
who considered the scalar parameter case. He used the oame Eqzial posterior densi-
tics, for the condition n(& 1 X) = r(Bu 1 X) for the posterior interval (OL, Ou). He
obtained that, for the Equal postenor densities intervals to have frequentist cover-
age probabilities to second order, the priors should sat isfy the following differeotial
equat ion
Since the assumptions needed for the asymptotic expânsion ensure that the poste-
rior mode is asymptoticdy unique, this Epual posterior densities consideration is
equivalent to the highest posterior density (HP D ) consideration. Severini ( 199 1)
considered the HPD intervals in one parameter case. He obtained that, for the
intervals to have the frequentist coverage probabilities, the prior R should satisfy
one differential equation which is equivdent to ('2.18) of Equal posterior densities of
Peen (1968).
Results on the HPD consideration in the multiparameter case are available. Ghosh & Mukerjee (1993a) considered the following HPD region
where k_{1-α} is defined by
Then the prior π is such that
if and only if it satisfies the following partial differential equation
In the two parameter case, under parametric orthogonality, the above equation becomes
Compared with equation (2.13) of the consideration of the LR statistic of Ghosh & Mukerjee (1991), the frequentist version of the HPD consideration, the difference is just the signs. When there is only one parameter, the above equation reduces to (2.15) of Equal posterior densities of Peers (1968).
In the case with nuisance parameters, Ghosh & Mukerjee (1995b) obtained results on the HPD regions having frequentist coverage validity to second order with q parameters of interest and p - q nuisance parameters. In the case of one parameter of interest with one nuisance parameter, under global parametric orthogonality, the prior π is the matching prior via the HPD regions if and only if it satisfies the following partial differential equation
It is interesting to compare equation (2.21) with equation (2.20) of the case of two parameters with no nuisance parameter. We can see that if the prior is the matching prior via the HPD regions for each component parameter of the two parameters, then this prior is the matching prior via the HPD regions for both parameters of interest. But the matching prior for two parameters of interest doesn't have to be the matching prior via the HPD regions for any component parameter of interest.
If we compare (2.21) of HPD with (2.10) of Equal tail areas, the only difference is that (2.10) of Equal tail areas has one additional term. If we restrict the priors to a class suggested by Tibshirani (1989), i.e. π(θ) = g(θ_2) I_{11}^{1/2}, then if K_{111} I_{11}^{-3/2} is independent of the parameter of interest, matching via the HPD regions is equivalent to matching via the Equal tail areas.
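The practical difference between the Equal tail areas and HPD constructions is easy to see numerically: for a skewed unimodal posterior the HPD interval is the shortest interval of the given posterior content, with equal posterior density at its endpoints, and it does not coincide with the equal-tail interval. A sketch with an illustrative Gamma posterior (scipy; not a model from the thesis):

```python
import numpy as np
from scipy import stats

post = stats.gamma(3)               # skewed toy posterior, purely illustrative
level = 0.95

# Equal tail areas interval: 2.5% posterior probability in each tail.
et = (post.ppf(0.025), post.ppf(0.975))

# HPD interval: among all intervals of posterior content 0.95, the one of
# minimal length (equivalently, with equal posterior density at the endpoints).
lows = np.linspace(1e-6, 1 - level - 1e-6, 2000)
lengths = post.ppf(lows + level) - post.ppf(lows)
i = lengths.argmin()
hpd = (post.ppf(lows[i]), post.ppf(lows[i] + level))

print(et[1] - et[0] > hpd[1] - hpd[0])  # True: the HPD interval is shorter
```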
2.4.4 Other Issues
In contrast to matching via the posterior regions, Severini (1993) discussed how to choose the posterior intervals for a given prior, such that the posterior intervals have frequentist coverage validity to the second order. In the multiparameter case, Ghosh & Mukerjee (1995a) discussed how to choose perturbed ellipsoidal and HPD regions for a given prior, such that the posterior regions have frequentist validity to the second order.
The matchings of Ghosh & Mukerjee (1991, 1993b), via the posterior regions as we call them here, were originally considered by using the posterior and frequentist Bartlett corrections. For discussion on Bartlett corrections, please refer to Cox & Reid (1987), Bickel & Ghosh (1990), McCullagh & Tibshirani (1990), DiCiccio & Stern (1994) and Ghosh & Mukerjee (1992b, 1994).
2.5 Invariance Properties of Matching Priors
For convenience, we assume that θ = (θ_1, …, θ_p)^T is the current version of the parameter with prior π(θ), and that λ = (λ_1, …, λ_p)^T is the original version of the parameter, where the transformation from λ to θ is one-to-one. Invariance of the prior with respect to the parameterization means that
π*(λ) = π(θ(λ)) |∂θ/∂λ|
has the same property as does π(θ) in the θ parameterization. This invariance property is desirable for priors on logical grounds. Also, if the prior is invariant, then inference is more convenient. Therefore, we should investigate this property for priors derived from any procedure. Matching priors have played a more and more important role in serving as noninformative priors in Bayesian inference. Mukerjee & Ghosh (1996) showed that matching priors via the posterior quantiles and the distribution function do have the parameterization invariance property (see also Datta & Ghosh, 1996). But for other matching priors, there seems to be no literature discussing this issue so far. In this section we shall investigate the invariance properties of the different matching priors currently available.
We note that most of the matching procedures can be expressed in terms of a scalar valued function of interest, denoted by ψ(θ; x). The defining criterion is to let the posterior region defined by ψ(θ; x) have some posterior probability level, and then match it to have the same frequentist probability level to some order. For example, matching via the posterior quantiles is to let ψ(θ; x) = θ_1; matching via the LR statistic is just to let ψ(θ; x) = ℓ(θ; x). In the following we develop an argument which is a generalization of Mukerjee & Ghosh (1996).
Theorem 2.1 Suppose that under the current parameterization, the log-likelihood function is ℓ(θ; x), where the parameter θ has prior π(θ), and ψ(θ; x) is a scalar valued function of interest depending on θ and x. Under the original parameterization, the log-likelihood function is ℓ*(λ; x) = ℓ(θ(λ); x), and ψ*(λ; x) is the scalar valued function of interest. If ψ*(λ; x) ∝ ψ(θ(λ); x)¹, i.e. the defining function of interest is parameterization invariant, then the matching prior is parameterization invariant.
¹Except for a factor involving no parameter.
PROOF: Let k(x) be a scalar valued function depending on x only. After the simple steps (i) and (ii), the result follows from (iii).
(i). Pr( ψ(θ; x) < k(x) | x ) = Pr( ψ(θ(λ); x) < k(x) | x )
(ii). P_θ( ψ(θ; X) < k(X) ) = P_λ( ψ(θ(λ); X) < k(X) )
(iii). If Pr( ψ(θ; x) < k(x) | x ) = P_θ( ψ(θ; X) < k(X) ), then we have
The result of Theorem 2.1 doesn't depend on the order of the matching, but if we consider matching with some asymptotic order, then the result holds for the same asymptotic order. In the following various situations, we apply Theorem 2.1 to see whether the matching priors are parameterization invariant.
(1). Matching via the posterior quantiles. Mukerjee & Ghosh (1996) showed that it is parameterization invariant. If we use Theorem 2.1, then simply take the function ψ(θ; x) = θ_1, and ψ*(λ; x) = θ_1(λ). If we consider a parametric function h(θ), then take ψ(θ; x) = h(θ) and ψ*(λ; x) = h(θ(λ)).
(2). Matching via the distribution function. Mukerjee & Ghosh (1996) obtained that the considerations of Ghosh & Mukerjee (1993b) and Mukerjee & Ghosh (1996) are parameterization invariant. If we use Theorem 2.1, then it is similar to case (1).
(3). Equal tail areas consideration. It is parameterization invariant. This is because the partial differential equation from the consideration of Equal tail areas is the second equation of second order matching via the posterior quantiles.
(4). LR statistic consideration. This consideration is equivalent to taking the function ψ(θ; x) = ℓ(θ; x), and ψ*(λ; x) = ℓ(θ(λ); x). So the matching prior via the posterior regions through the LR statistic is parameterization invariant.
(5). Score statistic and Wald's statistic considerations. The consideration of the modified score statistic S* = n^{-1} ℓ'(θ)^T ĵ^{-1} ℓ'(θ) is not parameterization invariant (see Example 2.3). Neither is the consideration of the Wald's statistic, because it leads to the same result as the consideration of the modified score statistic (Rao & Mukerjee, 1995). But if one uses the original score statistic S = n^{-1} ℓ'(θ)^T I^{-1} ℓ'(θ), since it is parameterization invariant, by Theorem 2.1 the matching prior obtained by consideration of the original score statistic is parameterization invariant.
(6). Profile LR statistic consideration. First we introduce a notion, interest-respecting transformation, originally from Barndorff-Nielsen & Cox (1994). Let θ_1 be the current parameter of interest and λ_1 be the original parameter of interest; then the transformation should be such that θ_1 is a function of λ_1 only, i.e. (λ_1, λ_2) being transformed to (θ_1(λ_1), θ_2(λ_1, λ_2)). Applying Theorem 2.1 to this case, ψ(θ; x) = ℓ(θ_1, θ̂_2(θ_1); x), and ψ*(λ; x) = ℓ*(λ_1, λ̂_2(λ_1); x). The establishment of
is due to the following
So the consideration of the profile LR statistic is parameterization invariant as long as the transformation doesn't change the parameter of interest essentially.
(7). Conditional LR statistic consideration. There is no direct application of Theorem 2.1 to this case. In transformation models, DiCiccio, Field & Fraser (1990) obtained that the conditional likelihood approximates the marginal likelihood of the parameter of interest to third order, so the use of the conditional likelihood in transformation models is equivalent to the use of the marginal likelihood to third order. But the marginal likelihood no longer involves the nuisance parameter, so its use leads to a case similar to that of the LR statistic consideration without nuisance parameter. In exponential models, the conditional likelihood approximates the likelihood of the parameter of interest to third order (Skovgaard, 1987; Fraser & Reid, 1993), and the likelihood of the parameter of interest no longer involves the nuisance parameter, so it is similar to the case of transformation models. And finally what we have is that, to second order matching, the conditional LR statistic consideration is parameterization invariant in transformation models and in exponential models. For general models, the answer is not clear and so further investigation is required.
(8). HPD consideration without nuisance parameter. Generally speaking, this consideration is not parameterization invariant. In the one parameter case, this can be verified directly from the differential equation. In Example 2.1 of the Case Studies section, we can see how the matching priors change after making a parameter transformation. But if the transformation is linear, i.e. |∂θ/∂λ| = constant, then the HPD consideration is parameterization invariant with respect to this linear transformation. Let the likelihood function be L(θ; x) in the current parameterization; in the original parameterization the likelihood function is L*(λ; x) = L(θ(λ); x). So ψ(θ; x) = L(θ; x) π(θ), and ψ*(λ; x) = L(θ(λ); x) π(θ(λ)) |∂θ/∂λ|. If the transformation is linear, i.e. |∂θ/∂λ| = constant, then ψ(θ(λ); x) ∝ ψ*(λ; x) (except for a factor involving no parameter), and so it is parameterization invariant with respect to this linear transformation.
(9). HPD consideration with nuisance parameter. From case (8), we know that it is not invariant under transformations of the parameter of interest (except linear ones). Now we consider the following transformation: θ_1 = λ_1, θ_2 = θ_2(λ_1, λ_2), so |∂θ/∂λ| = |∂θ_2/∂λ_2|. In the current θ setup, the defining function of interest is ψ(θ; x) and
Combined with case (8), we obtain that the consideration of the HPD is parameterization invariant with respect to linear transformations of the parameter of interest and any transformation of the nuisance parameter.
2.6 Discussion
As we have seen, a variety of methods have been proposed for deriving matching priors. Matching priors not only serve as noninformative priors for Bayesian inference, but also provide a way of constructing accurate confidence regions. Among all the methods proposed, it appears that matching via the posterior quantiles is the most important one. This is because: (1) matching via the posterior quantiles has a natural interpretation in both Bayesian inference and frequentist inference; (2) Tibshirani's first order matching priors via the posterior quantiles, although they have an arbitrary factor, provide a guide for choosing priors from other considerations (we shall discuss this below), and the second order matching via the posterior quantiles can lead to a unique prior in many cases; (3) it is parameterization invariant.
Matching via the distribution function is quite different from matching via the quantiles at second order, although they are equivalent at first order. It seems that second order matching via the distribution function depends on how the variable is standardized (see the results in §4.4). Also, in some examples, second order matching via the distribution function leads to no solution while second order matching via the posterior quantiles has a unique solution (see the Case Studies later). So it seems that, as we mentioned in §2.3, the conditions of second order matching via the distribution function are more difficult to satisfy than those via the posterior quantiles (see also §4.4). On the other hand, second order matching priors via the distribution function cannot in general be used to construct accurate confidence intervals or regions, in contrast to matching priors via the posterior quantiles and via the posterior regions.
In contrast to matching via the posterior quantiles and matching via the distribution function, matching via the posterior regions can have many or infinitely many solutions. The extreme case is that, for any given prior, we can construct posterior intervals or regions by perturbing the first two order terms, such that they have frequentist coverage validity to the second order (Severini, 1993; Ghosh & Mukerjee, 1995a). So for the purpose of serving as noninformative priors, we should impose some constraint in order to narrow down the matching priors. One natural constraint is to consider the priors within the class of Tibshirani's first order matching priors. From some examples, we notice that considerations of the profile LR statistic, the conditional LR statistic and the HPD regions, and sometimes other considerations, can lead to a unique solution as does the second order matching via the posterior quantiles. So it seems that this constraint, that the priors be within the class of Tibshirani's first order matching priors, is reasonable and can often lead to a unique solution. But sometimes, if there is no solution for second order matching
via the posterior quantiles, it seems hardly possible to find a solution for matching via the posterior regions within the class of Tibshirani's first order matching priors. If we do not restrict attention to this class, then the matching priors may not seem sensible for the problem. In this case, we should investigate further, e.g. by comparing with reference priors or Jeffreys' prior. But for the purpose of constructing accurate confidence intervals or regions, they are still the solutions of the matching equations, although the intervals or regions might have some kind of perturbation. From the above argument, we can see that in any case we should seek the matching priors via the posterior quantiles first. Another concern in choosing among the matching priors via the posterior regions is that we should examine the invariance properties of the matching priors, since some of the matching priors are not parameterization invariant.
If all the parameters are of interest, we feel that the consideration of the LR statistic is the best choice, compared with the modified score statistic consideration and the HPD consideration. The reason is simply that the other considerations are not parameterization invariant. Aside from this intrinsic property, the use of any parameterization invariant procedure has an advantage: we can calculate the necessary quantities under any parameterization, and thus we can choose the most convenient one for calculation and for finding solutions. For example, in the independent normal means problem, the calculation in the original parameterization is straightforward. Sometimes the modified score statistic can help in narrowing down the choice among the priors satisfying the consideration of the LR statistic (Rao & Mukerjee, 1995). By the way, some authors might prefer the use of HPD regions, since they are the shortest intervals of given size in the scalar parameter case, but then there is a choice as to how to select the parameterization.
2.7 Case Studies
Example 2.1 Normal Distribution Model
Suppose X follows a normal distribution with mean μ and variance σ². Let μ be the parameter of interest and σ the nuisance parameter. Under this parameterization, the parameters are orthogonal. After detailed calculation we have I_{11} = σ^{-2}, I_{22} = 2σ^{-2}, K_{111} = K_{1,11} = 0, K_{112} = 2σ^{-3}, K_{222} = 10σ^{-3}, K_{2,22} = -6σ^{-3}, K_{221} = 0. Jeffreys' prior under his general rule is σ^{-2}, but he prefers the other one, σ^{-1}, which is also the right Haar measure and the reference prior. Tibshirani's first order matching priors are π(μ, σ) ∝ g(σ)σ^{-1}. Within the class of Tibshirani's priors there is only one, π(σ) ∝ σ^{-1}, i.e. taking the function g as a constant, which is also the matching prior for the considerations of the second order matching via the quantiles, the profile LR statistic, the conditional LR statistic and the HPD regions with nuisance parameter case. If the two parameters μ and σ are both of interest, then for the LR statistic consideration, priors π(σ) ∝ σ^{-1}, σ^{-5} satisfy the matching condition; while for the HPD regions consideration, priors π(σ) ∝ σ^{-1}, σ⁵ satisfy the matching condition.
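For this example the quantile matching by π(μ, σ) ∝ σ^{-1} is in fact exact: under this prior the marginal posterior of μ is x̄ + n^{-1/2} s t_{n-1}, so the posterior quantiles are the endpoints of the classical t intervals. A Monte Carlo sketch of the exact coverage (Python with scipy; the particular numbers are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu, sigma, n, reps = 0.3, 2.0, 10, 100_000
tq = stats.t.ppf(0.95, df=n - 1)    # 0.95-quantile of t_{n-1}

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)
# Posterior 0.95-quantile of mu under pi(mu, sigma) ∝ 1/sigma:
quant = xbar + tq * s / np.sqrt(n)
cov = np.mean(mu < quant)           # frequentist coverage of the quantile
print(round(cov, 3))                # close to 0.95 (in fact exact)
```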
Now we consider another parameterization. Let θ_1 = μ be the parameter of interest and θ_2 = σ² be the nuisance parameter. Under this parameterization, the parameters are also orthogonal. The transformation of the parameters is just on the nuisance parameter. In this setup, we have I_{11} = θ_2^{-1}, I_{22} = θ_2^{-2}/2, K_{111} = K_{1,11} = 0, K_{112} = θ_2^{-2}, K_{222} = 2θ_2^{-3}, K_{2,22} = -θ_2^{-3}, K_{221} = 0. Tibshirani's first order matching priors are π(θ_1, θ_2) ∝ g(θ_2)θ_2^{-1/2}. Within the class of Tibshirani's priors, taking g(θ_2) ∝ θ_2^{-1/2} gives π(μ, θ_2) ∝ θ_2^{-1}, which is also the matching prior for the considerations of the second order matching via the quantiles, the profile LR statistic, the conditional
LR statistic and the HPD regions with nuisance parameter. If both parameters θ_1 and θ_2 are parameters of interest, then under the LR statistic consideration, the priors π(θ_1, θ_2) ∝ θ_2^{-1}, θ_2^{-3} satisfy the matching condition. Transforming back to the (μ, σ) parameterization, the above priors become π(μ, σ) ∝ σ^{-1}, σ^{-5} respectively. This indicates that matching priors by using the LR statistic is parameterization invariant in this model. Consideration of the HPD regions without nuisance parameter leads to a different result. It turns out that priors π(θ_1, θ_2) ∝ θ_2^{-1}, θ_2 satisfy the matching condition via the HPD regions. Transforming back to the original (μ, σ) parameterization gives π(μ, σ) ∝ σ^{-1}, σ³. These are different from those under the same consideration but in the (μ, σ) setup, which give priors σ^{-1} and σ⁵. This indicates that the HPD regions consideration is not invariant to the parameterization.
Now we consider a simple model X ~ N(0, σ²) with only one parameter. If we use σ as the parameter, then the LR statistic consideration has solutions σ^{-1} and σ^{-3}; the HPD consideration has solutions σ^{-1} and σ³. But if we use θ_1 = σ² as the parameter, then the LR statistic consideration has solutions θ_1^{-1} and θ_1^{-2}; transforming back to the σ parameterization, these give σ^{-1} and σ^{-3} respectively; while the HPD consideration has solutions θ_1^{-1} and θ_1², which, transforming back to the original σ parameterization, give σ^{-1} and σ⁵ respectively.
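The back-transformations in this example are all instances of the change-of-variables rule π(σ) ∝ π(θ_1(σ)) |dθ_1/dσ| with θ_1 = σ². A symbolic check of two of the pullbacks used in the example, θ_1^{-1} → σ^{-1} and θ_1² → σ⁵ (sympy):

```python
import sympy as sp

sigma = sp.symbols('sigma', positive=True)
theta1 = sigma**2
jac = sp.diff(theta1, sigma)            # |d theta1 / d sigma| = 2*sigma

# pi(theta1) ∝ theta1^{-1} pulled back to the sigma parameterization:
back1 = sp.simplify(theta1**-1 * jac)   # ∝ sigma^{-1}
# pi(theta1) ∝ theta1^{2} pulled back:
back2 = sp.simplify(theta1**2 * jac)    # ∝ sigma^{5}
print(back1, back2)  # 2/sigma 2*sigma**5
```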
Example 2.2 Ratio of Independent Normal Means
Suppose that X and Y are independently normally distributed with means α and β and variances one. The parameter of interest is the ratio of the means θ_1 = β/α. Now we make a transformation from (α, β) to (θ_1, θ_2) with θ_2 = √(α² + β²). Under this parameterization, the parameters θ_1 and θ_2 are orthogonal. After detailed calculations, we have I_{11} = θ_2²(1 + θ_1²)^{-2}, I_{22} = 1, K_{111} = 6θ_1θ_2²(1 + θ_1²)^{-3}, K_{1,11} = -2θ_1θ_2²(1 + θ_1²)^{-3}, K_{112} = -θ_2(1 + θ_1²)^{-2}, K_{222} = K_{2,22} = K_{221} = 0. Tibshirani's
first order matching priors are π(θ1, θ2) ∝ g(θ2)θ2(1 + θ1^2)^{-1}; transforming back to the (α, β) parameterization gives π(α, β) ∝ g(√(α^2 + β^2)). Within the class of Tibshirani's priors, there is only one prior, π(θ1, θ2) ∝ θ2(1 + θ1^2)^{-1}, which is also the matching prior for the considerations of second order matching via the quantiles, the LR statistic, the profile LR statistic and the conditional LR statistic. The prior π(θ1, θ2) ∝ θ2(1 + θ1^2)^{-1}, transforming back to the original parameterization, gives π(α, β) ∝ constant, i.e. the flat prior in the original parameterization, which is also Jeffreys' prior. There is no solution within the class of Tibshirani's priors for the considerations of the HPD regions in the nuisance parameter and no nuisance parameter cases, the modified score statistic (Rao & Mukerjee, 1995) and the second order distribution function (Mukerjee & Ghosh, 1996). But the considerations of the HPD regions in the nuisance parameter and no nuisance parameter cases do have solutions outside this class: π(θ1, θ2) ∝ θ2(1 + θ1^2) satisfies both considerations.
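The orthogonality and the information entries in this example can be verified numerically. The following sketch (an illustration added for this transcript, not part of the original derivation; the test point (θ1, θ2) = (0.7, 1.9) is arbitrary) builds the per-observation information matrix in (θ1, θ2) from the Jacobian of (α, β), using the fact that (α, β) carries identity information:

```python
import math

def alpha_beta(t1, t2):
    # inverse of the transformation theta1 = beta/alpha, theta2 = sqrt(alpha^2 + beta^2)
    a = t2 / math.sqrt(1.0 + t1 * t1)
    return a, t1 * a

def info_matrix(t1, t2, h=1e-5):
    # per-observation Fisher information in (theta1, theta2); since (alpha, beta)
    # has identity information, I(theta) = J^T J with J = d(alpha, beta)/d(theta)
    def column(i):
        up = [t1, t2]; dn = [t1, t2]
        up[i] += h; dn[i] -= h
        au, bu = alpha_beta(*up); ad, bd = alpha_beta(*dn)
        return ((au - ad) / (2 * h), (bu - bd) / (2 * h))
    J = [column(0), column(1)]
    return [[J[i][0] * J[j][0] + J[i][1] * J[j][1] for j in range(2)] for i in range(2)]

t1, t2 = 0.7, 1.9
I = info_matrix(t1, t2)
# expect I11 = theta2^2 (1 + theta1^2)^{-2}, I22 = 1, I12 = 0 (orthogonality)
```

The finite-difference Jacobian keeps the check independent of the hand-derived derivatives.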
Example 2.3 Non-invariance of the Consideration of the Modified Score Statistic
Suppose that X ~ N(α, 1), Y ~ N(β, 1) and they are independent. Both parameters α and β are of interest. Under the (α, β) parameterization, global parametric orthogonality holds, I11 = 1, I22 = 1, and all the other quantities involved in our calculations are zero. It is easy to obtain that, for the consideration of the modified score statistic (Rao & Mukerjee, 1995), the general solutions are π(α, β) ∝ (A1α + B1)(A2β + B2), where the Ai, Bi are constants. So the flat prior in this (α, β) setup is a solution. But if we make a transformation from (α, β) to (θ1, θ2), where θ1 = β/α, θ2 = √(α^2 + β^2), then it becomes the case of Example 2.2. The flat prior in the original (α, β) setup becomes π(θ1, θ2) ∝ θ2(1 + θ1^2)^{-1} in the current (θ1, θ2) setup. From Example 2.2 we know that this prior is not a solution for the consideration of the modified score statistic. So the consideration
of the modified score statistic is not parameterization invariant.
Example 2.4 Ratio of Independent Exponential Means
Suppose that X and Y follow exponential distributions with means μ1 and μ2 respectively. The parameter of interest is the ratio of the means, θ1 = μ2/μ1. We make a transformation from (μ1, μ2) to (θ1, θ2), where θ2 = μ1μ2. Under this parameterization, global parametric orthogonality holds: I11 = (1/2)θ1^{-2}, I22 = (1/2)θ2^{-2}, K111 = (3/2)θ1^{-3}, K_{1,11} = -(1/2)θ1^{-3}, K112 = (1/4)θ1^{-2}θ2^{-1}, K222 = (7/4)θ2^{-3}, K_{2,22} = -(3/4)θ2^{-3}, K221 = 0. Tibshirani's first order matching priors are π(θ1, θ2) ∝ g(θ2)θ1^{-1}; transforming back to the original (μ1, μ2) parameterization gives π(μ1, μ2) ∝ g(μ1μ2). Within the class of Tibshirani's priors, only π(θ1, θ2) ∝ θ1^{-1}θ2^{-1}, or π(μ1, μ2) ∝ (μ1μ2)^{-1} in the original parameterization, is also the matching prior for the considerations of second order matching via the quantiles, the LR statistic, the profile LR statistic, the conditional LR statistic, the modified score statistic and the HPD regions with nuisance parameter. The prior π(μ1, μ2) ∝ (μ1μ2)^{-1} is also Jeffreys' prior and the reference prior of Bernardo (1979).
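The third-order quantities in this example can be checked numerically: for an exponential observation with mean μ(θ), the expected log-likelihood under a true point θ0 is E_{θ0}[ℓ(θ)] = −log μ(θ) − μ(θ0)/μ(θ), and an expectation such as K111 = E[∂³ℓ/∂θ1³] is a θ-derivative of this function at θ0. A sketch (an added illustration; the point (θ1, θ2) = (1.3, 2.1) is arbitrary):

```python
import math

T10, T20 = 1.3, 2.1  # a "true" parameter point used only for the check

def means(t1, t2):
    # component means in the (theta1, theta2) parameterization:
    # mu1 = sqrt(theta2/theta1), mu2 = sqrt(theta1*theta2)
    return math.sqrt(t2 / t1), math.sqrt(t1 * t2)

def expected_loglik(t1, t2):
    # E_{theta0}[l(theta)] per (X, Y) pair, summed over the two exponential components
    mu0 = means(T10, T20)
    return sum(-math.log(m) - m0 / m for m, m0 in zip(means(t1, t2), mu0))

def third_deriv_t1(f, t1, t2, h=1e-2):
    # central-difference third derivative in theta1
    return (f(t1 + 2*h, t2) - 2*f(t1 + h, t2) + 2*f(t1 - h, t2) - f(t1 - 2*h, t2)) / (2 * h**3)

K111 = third_deriv_t1(expected_loglik, T10, T20)
# expect K111 = (3/2) theta1^{-3}
```

The same device (differentiating the expected log-likelihood) checks the other K quantities.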
Example 2.5 Exponential Regression Model of Cox & Reid (1987)
Suppose that Y1, ..., Yn are independent and Yi follows an exponential distribution with mean θ2 exp(-θ1 ai), where the ai are constants with Σ_{i=1}^n ai = 0. The parameter θ1 can be any real value; the parameter θ2 must be positive. The parameter of interest is θ1. Under this parameterization, global parametric orthogonality holds: I11 = a1^2 + ··· + an^2, I22 = nθ2^{-2}, K111 = -K_{1,11} = -(a1^3 + ··· + an^3), K112 = (a1^2 + ··· + an^2)θ2^{-1}. Tibshirani's first order matching priors are π(θ1, θ2) ∝ g(θ2). Within the class of Tibshirani's priors, π(θ1, θ2) ∝ θ2^{-1} is the only prior which is also the matching prior for the considerations of second order matching via the quantiles, the conditional LR statistic and the HPD regions with nuisance parameter case.
Chapter 3
Matching Priors in the Product of
Normal Means Problem
We consider matching priors in the product of normal means problem (Berger & Bernardo, 1989) and make comparisons among the flat prior and the two reference priors derived in Berger & Bernardo (1989). We suggest a class of priors for this problem and obtain a noninformative prior based on matching properties.
3.1 Introduction and Notation
Suppose we have a vector of i.i.d. observations Z = ((X1, Y1), ..., (Xn, Yn))^T from Z = (X, Y), where X and Y are independent and follow normal distributions with means α, β respectively and variance 1. We also assume α > 0, β > 0. The parameter of interest is the product of the means, θ1 = αβ. The classical approach encounters difficulties in this problem, as does the standard noninformative prior approach (Efron, 1986). Berger & Bernardo (1989) used the reference prior approach to consider this problem and developed two reference priors. We now consider the
choice of the noninformative priors via the comparison of the frequentist coverage probabilities of posterior probability intervals.
The three noninformative priors we consider are the flat prior πu(α, β) ∝ 1 and the two reference priors πs and πr of Berger & Bernardo (1989).
Now we make a transformation of the parameters from (α, β) to (θ1, θ2), where θ2 = α^2 − β^2. It is easy to obtain that α^2 = (√(4θ1^2 + θ2^2) + θ2)/2 and β^2 = (√(4θ1^2 + θ2^2) − θ2)/2. The Jacobian of the transformation from (α, β) to (θ1, θ2) is 2(α^2 + β^2). The per observation information matrix in the new parameterization is diag((4θ1^2 + θ2^2)^{-1/2}, (1/4)(4θ1^2 + θ2^2)^{-1/2}), so the parameter θ1 is orthogonal to θ2. In the new parameterization the flat prior becomes πu(θ1, θ2) ∝ (4θ1^2 + θ2^2)^{-1/2}, the first reference prior becomes
πs(θ1, θ2) ∝ (4θ1^2 + θ2^2)^{-1/4},
and πr transforms correspondingly.
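The orthogonality of (θ1, θ2) can be checked directly: with identity information in (α, β), the information in the new parameterization is (B^{-1})^T B^{-1}, where B = ∂(θ1, θ2)/∂(α, β). A short numerical sketch (added as an illustration; the point (α, β) = (1.2, 0.5) is arbitrary):

```python
a, b = 1.2, 0.5                      # arbitrary point with alpha > 0, beta > 0
t1, t2 = a * b, a * a - b * b        # theta1 = alpha*beta, theta2 = alpha^2 - beta^2

# B = d(theta1, theta2)/d(alpha, beta); per-observation information in (alpha, beta)
# is the identity, so I(theta) = (B^{-1})^T B^{-1}
B = [[b, a], [2 * a, -2 * b]]
det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
Binv = [[B[1][1] / det, -B[0][1] / det], [-B[1][0] / det, B[0][0] / det]]
I = [[sum(Binv[k][i] * Binv[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
# expect a diagonal matrix: I11 = (4 theta1^2 + theta2^2)^{-1/2}, I22 = I11 / 4
```

Note that 4θ1^2 + θ2^2 = (α^2 + β^2)^2, which is what makes the closed forms above so compact.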
In this chapter we shall use the same notation as in Chapter 2. After detailed calculation, we have
3.2 First Order and Second Order Matching Priors
First we mention that we can obtain the matching priors by using the partial differential equations (2.2) and (2.4), but later we have to calculate the coverage probabilities, so we need to introduce the following result.
Suppose that we have the posterior γ-quantile to order O(n^{-3/2}) under the prior π. The frequentist coverage probabilities of the one-sided posterior probability intervals can then be obtained from Theorem 1 of Mukerjee & Dey (1993) (see also Mukerjee & Ghosh, 1996), where Φ(z_γ) = γ, 0 < γ < 1. This result is true in general as long as the parameters (θ1, θ2) are orthogonal.
If we consider first order matching via the posterior quantiles, then the prior π should satisfy T1(π; θ1, θ2) = 0. Solving the equation T1(π; θ1, θ2) = 0, we obtain the first order matching priors π(θ1, θ2) ∝ g(θ2)I11^{1/2} = g(θ2)(4θ1^2 + θ2^2)^{-1/4}, where g(θ2) is any smooth positive function (see Tibshirani, 1989). Transforming back to the original (α, β) parameterization gives π(α, β) ∝ g(α^2 − β^2)√(α^2 + β^2).
When g(θ2) = 1, this gives πs; i.e. the reference prior πs is the first order matching prior via the quantiles. The other two priors πu and πr do not satisfy the first order matching condition. So on this ground πs is a better choice than πu or πr.
It is worth mentioning that if the parameter of interest is θ2 = α^2 − β^2, then the first order matching priors are π(θ1, θ2) ∝ g(θ1)(4θ1^2 + θ2^2)^{-1/4}, where g(θ1) is any smooth positive function. Transforming back to the original (α, β) parameterization gives π(α, β) ∝ g(αβ)√(α^2 + β^2). Taking g(θ1) = 1 gives the prior πs, and another choice of g(θ1) gives πr. So in this case the priors πs and πr are the first order matching priors, but the prior πu is not a first order matching prior.
We now consider second order matching based on quantiles. The necessary and sufficient conditions are both T1(π; θ1, θ2) = 0 and T2(π; θ1, θ2) = 0. From the equation T1(π; θ1, θ2) = 0 we obtain the solutions π(θ1, θ2) ∝ g(θ2)(4θ1^2 + θ2^2)^{-1/4}. But when we use π(θ1, θ2) ∝ g(θ2)(4θ1^2 + θ2^2)^{-1/4} in the second equation, it is not satisfied for any choice of the function g(θ2). So in this case there is no matching prior for second order matching via the posterior quantiles.
But if we consider the two-sided equal tail areas posterior probability intervals having the same frequentist coverage probabilities to second order, then the only condition is T2(π; θ1, θ2) = 0. The flat prior πu satisfies this condition but the two reference priors πs and πr do not. So in the sense of accuracy of the frequentist coverage probabilities of two-sided posterior intervals, the flat prior πu is better than the two reference priors.
Within the class of first order matching priors via the posterior quantiles, there is no solution for matching via the posterior intervals based on the conditional LR
statistic (Cox & Reid, 1987; Ghosh & Mukerjee, 1992) and the highest posterior density (HPD) (Ghosh & Mukerjee, 1995). In the above we obtained that the flat prior πu is the solution for matching via the two-sided equal tail areas intervals. But this prior πu is not the solution for matching via the posterior intervals based on the conditional LR statistic and HPD.
3.3 Comparison of πu and πs
From the previous discussion, πr does not have any good matching properties if θ1 is the parameter of interest, so we do not consider it further. The posterior quantiles using the prior πs have frequentist coverage accurate to order O(n^{-1}) while the posterior quantiles using πu do not. So the posterior distribution using the prior πs is better in describing the location of the parameter θ1. But on the other hand, the two-sided posterior intervals using the prior πu have frequentist coverage accurate to order O(n^{-3/2}) while those using the prior πs have frequentist coverage accurate only to order O(n^{-1}). So the choice between πu and πs is still hard to make. We shall go further to see their performance in different situations.
First let us look at the following coverage probabilities:
After detailed calculation, we have the explicit forms of T1(πu; θ1, θ2) and T2(πs; θ1, θ2). The criterion of the comparison is that the smaller the absolute values of the functions T1 and T2, the better the prior. When α and β are very close, say α = β, both quantities tend to zero as α increases, but T2(πs; θ1, θ2) tends to zero faster; if α goes to zero, T1(πu; θ1, θ2) increases more slowly than T2(πs; θ1, θ2). When α and β are unequal, for example β/α = η (suppose β > α), then for large η, T1(πu; θ1, θ2) and T2(πs; θ1, θ2) tend to zero at the same rate η^{-2} as η increases. But there is one interesting case: if we fix β and let α go to zero, then T2(πs; θ1, θ2) approaches a nonzero limit, but T1(πu; θ1, θ2) still tends to zero. So we can say that, for more accurate coverage of the posterior quantiles, the prior πs is good for medium to large values of (α, β); for small values of (α, β), πu is better than πs, although both perform poorly.
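These qualitative comparisons can be explored with a rough Monte Carlo sketch (an added illustration, ignoring the positivity constraint on (α, β) so that the flat-prior posterior is exactly normal): simulate data, draw from the flat-prior posterior of θ1 = αβ, and see how often the true θ1 falls below the posterior 5% quantile.

```python
import random

random.seed(1)
alpha, beta, n = 2.0, 2.0, 10
reps, draws, gamma = 1000, 1000, 0.05
theta1 = alpha * beta
se = 1.0 / n ** 0.5
hits = 0
for _ in range(reps):
    xbar = random.gauss(alpha, se)           # sufficient statistics
    ybar = random.gauss(beta, se)
    # flat-prior posterior: alpha ~ N(xbar, 1/n), beta ~ N(ybar, 1/n), independent
    post = sorted(random.gauss(xbar, se) * random.gauss(ybar, se) for _ in range(draws))
    hits += theta1 < post[int(gamma * draws)]  # below the posterior 5% quantile?
cov = hits / reps
# cov estimates the frequentist coverage of the one-sided interval; it should be
# in the neighborhood of 0.05, with a visible O(n^{-1/2}) deviation
```

Replacing the flat prior by a draw-weighted version of πs in the inner loop would give the corresponding check for the reference prior.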
In order to have a direct comprehension of the coverage probabilities in both cases, it is better to have some exact values. Now we define
P_γ(πu) = n^{-1/2} φ(z_γ) T1(πu; θ1, θ2), P_γ(πs) = n^{-1} z_γ φ(z_γ) T2(πs; θ1, θ2), (3.8)
which are the error terms of the coverage probabilities to order O(n^{-3/2}). We call them the shifted coverage probabilities of the posterior quantiles arising from using the priors πu and πs respectively. Using γ = 0.05 and z_{0.05} = −1.645, so that P_{0.95}(πu) = P_{0.05}(πu) and P_{0.95}(πs) = −P_{0.05}(πs), we summarize the cases of sample size n = 1, 5, 10 in Table 3.1, using different values for (α, β).
To error of order O(n^{-3/2}), the posterior quantiles using the prior πu are always too small for all (α, β), lying to the left of θ1 more often than we would desire.
Table 3.1: Shifted Coverage Probabilities to O(n^{-3/2}) of 0.05 Posterior Quantiles.
Or we can say, the posterior distribution arising from using πu is shifted to the left. The shift has order O(n^{-1/2}), but the direction is always to the left. The error is cancelled out in the two-sided posterior interval (θ1^{(0.05)}, θ1^{(0.95)}), and thus the interval has coverage accuracy of order O(n^{-3/2}).
The posterior quantiles arising from using the prior πs also have a shift, but the rate is of order O(n^{-1}), and the directions depend on (α, β). If α and β are similar, or the ratio of the larger one to the smaller one is below a certain threshold, the left-tail quantiles fall below θ1 a bit too often, while the right-tail quantiles fall below θ1 a bit too rarely. We can say the posterior distribution arising from using the prior πs is shifted outward to both sides in the O(n^{-1}) term. This kind of shift doubles the O(n^{-1}) error of the individual quantiles for the two-sided posterior interval (θ1^{(0.05)}, θ1^{(0.95)}), whose coverage probability is larger than 90% by an error of O(n^{-1}). In the other cases of (α, β), the posterior distribution is shifted toward the center, and the resulting posterior interval (θ1^{(0.05)}, θ1^{(0.95)}) has frequentist coverage smaller than 90% at a rate of order O(n^{-1}).
When θ1 is close to the boundary θ1 = 0, both priors perform poorly, but πu is better than πs. From Table 3.1 we see that, when α and β are both close to 0, coverage is very poor. In this case, if the difference between α and β is large, then we can still use πu, but πs performs poorly.
Finally we would like to mention that the results presented in Berger & Bernardo (1989) are consistent with our previous discussion and our Table 3.1.
3.4 A Class of Priors
From the comparison of πu and πs, we know that the prior πs is a good prior for this problem, but it is not good when the parameter goes to the boundary and not too good for constructing two-sided confidence intervals. We suggest a class of priors which may have some of the good properties of both priors πu and πs:
πp(α, β) ∝ p πu(α, β) + (1 − p) πs(α, β),
where p is a constant between 0 and 1. We shall investigate this class of priors further. Writing out the exact form of the prior πp, which is πp(α, β) ∝ p + (1 − p)√(α^2 + β^2), in the (θ1, θ2) parameterization we have
πp(θ1, θ2) ∝ [p + (1 − p)(4θ1^2 + θ2^2)^{1/4}] (4θ1^2 + θ2^2)^{-1/2}.
From (3.1), the frequentist coverage probabilities arising from using the prior πp determine the shifted coverage probabilities P_γ(πp), to order O(n^{-3/2}), of the posterior quantiles using the prior πp. The shifted coverage probabilities of the two-sided posterior intervals (θ1^{(γ)}, θ1^{(1−γ)}) are two times the second part of P_γ(πp), which has order O(n^{-1}). After detailed calculation, we obtain the shifted coverage probabilities of the posterior quantiles arising from using the prior πp to order O(n^{-3/2}) as a weighted combination of P_γ(πu) and P_γ(πs), where P_γ(πu) has order O(n^{-1/2}) and P_γ(πs) has order O(n^{-1}).
Now suppose we have no preference for the prior πu or πs, so we choose p = 0.5; then π0.5(α, β) ∝ 1 + √(α^2 + β^2), and we obtain the corresponding P_γ(π0.5).
From the expression we see that, for medium to large values of (α, β), the error term of the coverage probabilities P_γ(π0.5) takes only a small proportion of the first order error P_γ(πu); the rest comes from the second order error P_γ(πs). So the resulting P_γ(π0.5) is comparable to the second order error P_γ(πs) in the small sample case, and is thus a great improvement on the first order error P_γ(πu). For small values of (α, β), the prior πs performs worse than the prior πu. In the expression of the error P_γ(π0.5), we see that most of the contribution is from P_γ(πu), and just a small part from P_γ(πs). So the resulting P_γ(π0.5) is just what we expect it to be. Also, the two-sided posterior intervals (θ1^{(γ)}, θ1^{(1−γ)}) using the prior π0.5 have coverage probabilities 1 − 2γ with error just a proportion √(α^2 + β^2)/(1 + √(α^2 + β^2)) of that using the prior πs, so they are always better than those using the prior πs. Therefore we can say that the prior π0.5 indeed has some of the good properties of the priors πu and πs, and not too many of their bad properties.
Using γ = 0.05 and z_{0.05} = −1.645, as an example we provide Table 3.2, which gives a direct comprehension of the improvements of the coverage probabilities using the prior π0.5 when comparing to Table 3.1. The entries in columns 4 and 7 are the shifted coverage probabilities of the posterior quantiles using the prior π0.5. The entries in columns 3 and 6 are the half shifted coverage probabilities of the two-sided posterior intervals.
Table 3.2: Shifted Coverage Probabilities to O(n^{-3/2}) of the Posterior Quantiles via Using the Prior π0.5.
The prior π0.5 is a prior between πu and πs; one question might be raised: what about the performance of other priors? Now we consider the family of Berger & Bernardo (1989). We shall try to look for other priors in this class which have good performance.
The shifted coverage probabilities of the posterior quantiles to order O(n^{-3/2}) arising from using the prior πij are then easy to obtain.
Our objective is to choose i and j such that the absolute value of T1(πij; θ1, θ2) is as small as possible. From the expression of T1(πij; θ1, θ2), we see that the factor (α^2 − β^2)^2(αβ)^{-2} is sensitive to the values of (α, β): when there is a difference between α and β, it becomes large. For this reason it is better to remove this factor, so we take j = 0. The best choice is then i = 1/2, which gives the prior πs. If we choose i close to 1/2, the absolute value of T1(πi0; θ1, θ2) becomes smaller compared to T1(πu; θ1, θ2). Next let us look at the second error term: when i is close to 1/2, T2(πi0; θ1, θ2) is close to T2(πs; θ1, θ2). So actually this is not better than directly choosing the prior πs.
From the above discussion, we conclude that the prior π0.5 ∝ 1 + √(α^2 + β^2) is suitable for this problem based on its matching properties. In regard to other choices of p in the class of priors πp, if we have a loss function we can then choose an optimal one. Otherwise, we take p = 0.5 and we can still say that the prior π0.5 is a noninformative prior.
Chapter 4
The Shrinkage Argument and
Matching Priors
In this chapter we discuss the shrinkage argument, which has been used quite extensively in the recent literature for deriving the frequentist properties of some Bayesian procedures. We demonstrate how we can use the shrinkage argument to derive the frequentist distribution of the signed root of the likelihood ratio (SRLR) statistic, and to derive matching priors via the posterior quantiles and via the distribution function.
4.1 The Shrinkage Argument
To evaluate the frequentist coverage probability of a posterior credible set is a crucial step in deriving different kinds of matching priors. The shrinkage argument used in several papers (Bickel & Ghosh, 1990; Ghosh & Mukerjee, 1991, 1993a, 1993b, 1995a, 1995b; Mukerjee & Dey, 1993; Sweeting, 1995a, 1995b; etc.) is a very useful tool for this evaluation. It is also useful for the study of contacts between
Bayesian inference and frequentist inference. Dawid (1991) gave an eloquent exposition of this argument. We now outline the argument in general, based on unpublished notes by R. Mukerjee.
Suppose that we have an observation X with likelihood function L(θ; x). The parameter θ, which might be a vector, has prior density π(θ). Now suppose we want to calculate the frequentist expectation of a function ψ(θ, x). We demonstrate how we can get it by using the shrinkage argument. The regularity conditions needed for the validity of this argument can be found in Bickel & Ghosh (1990) (see also Sweeting, 1995a, 1995b).
(1) E^π step. We first take expectation with respect to the posterior distribution of θ given X.
(2) E_θ step. We then take expectation with respect to the frequentist distribution of X.
(3) E_π step. We now take expectation with respect to the prior π.
So overall the shrinkage argument can be summarized as the three-step composition E_π E_θ E^π. Usually what we evaluate is the left side of the resulting identity. In order to get the frequentist expectation of the function of interest ψ(θ, x), i.e. E_θ{ψ(θ, X)}, we let the prior π converge weakly to the degenerate measure at the true parameter θ.
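As a toy numerical illustration of the three steps (not from the thesis), take X ~ N(θ, 1) with ψ(θ, x) = θx and a conjugate N(θ0, ε²) prior that is nearly degenerate at θ0; the triple expectation E_π E_θ E^π{ψ} then approaches E_{θ0}{ψ(θ0, X)} = θ0²:

```python
import random

random.seed(0)
theta0, eps, N = 1.5, 0.05, 200_000  # nearly degenerate prior, Monte Carlo size

acc = 0.0
for _ in range(N):
    theta = random.gauss(theta0, eps)                 # E_pi: draw theta from the prior
    x = random.gauss(theta, 1.0)                      # E_theta: draw X given theta
    post_mean = (x * eps**2 + theta0) / (1 + eps**2)  # conjugate posterior mean of theta
    acc += x * post_mean                              # E^pi: posterior expectation of theta*x
triple = acc / N
# as eps -> 0 this triple average approaches theta0^2 = 2.25
```

Shrinking ε further tightens the agreement, which is exactly the "prior converging to a point mass" step of the argument.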
In the application of this shrinkage argument, the evaluation of the E_θ step with error of order O(n^{-3/2}) is quite straightforward, but if we want to calculate to the next order, then it becomes more complicated (see Chapter 5). As to the evaluation of the E_π step, Ghosh & Mukerjee (1991) developed a simple and effective technique. We first choose a prior π which satisfies the regularity conditions of Bickel & Ghosh (1990), and which, together with its derivatives (the number of orders depends on the order to which we want to evaluate the probability), vanishes on the boundaries of a rectangle containing the true parameter as an interior point. Thus we can evaluate the E_π step by integration by parts. Finally we let the prior π converge weakly to the degenerate measure at the true parameter. This process then leads to the desired result, the frequentist expectation of the function of interest ψ(θ, X). In §4.2, §4.3 and §4.4 we give examples to demonstrate how to use this shrinkage argument in specific situations.
It is not difficult to see that the E_θ step in the shrinkage argument, which refers to taking expectation with respect to the frequentist distribution of X, can be replaced by E_{(θ,a)}, the expectation under the frequentist conditional distribution of the MLE given an ancillary statistic a; we then obtain the frequentist conditional expectation given a of the function of interest ψ(θ, X). In Chapter 5, we use this shrinkage argument to obtain the asymptotic expansion of the frequentist conditional distribution of the MLE
given an ancillary statistic a.
4.2 Frequentist Distribution of the SRLR Statistic by Using the Shrinkage Argument
4.2.1 Introduction and Notation
Suppose that we have i.i.d. observations {Xi, i ≥ 1}. Let X = (X1, ..., Xn)^T and let the log-likelihood function be ℓ(θ; X), where θ is a scalar parameter with prior π(θ). The maximum likelihood estimate of θ based on X is θ̂. In this section we shall use the shrinkage argument to derive the frequentist distribution to order O(n^{-3/2}) of the signed root of the likelihood ratio (SRLR) statistic r, defined as
r = sgn(θ̂ − θ) {2[ℓ(θ̂; X) − ℓ(θ; X)]}^{1/2}. (4.2)
We assume the regularity conditions as in Bickel & Ghosh (1990) with m = 2. All formal expansions for the posterior as used here are valid for sample points in a set S which can be defined along the lines of Bickel & Ghosh (1990) with P_θ probability 1 − O(n^{-3/2}) uniformly over compact sets of θ. Let ĵ(θ̂) = nĵ1(θ̂) = −ℓ″(θ̂) be the observed Fisher information evaluated at θ̂, and let I(θ) = nI1(θ) = E_θ[−ℓ″(θ)] be the expected Fisher information. Sometimes we write ĵ1 and I1 for ĵ1(θ̂) and I1(θ). We also let
π̂ = π(θ̂), π̂′ = π′(θ̂), π̂″ = π″(θ̂).
In the next two sections, §4.3 and §4.4, we shall use the same notation as defined above.
4.2.2 The Frequentist Distribution of the SRLR Statistic
In order to get the frequentist distribution of r, we use the shrinkage argument as described in §4.1. First we need the posterior distribution of r.
The expansion of the posterior distribution of r in Chapter 5 is carried up to the order O(n^{-2}), so here we use the result of Chapter 5 only to the order O(n^{-3/2}); we give the complete expansion to order O(n^{-2}) in Chapter 5 to maintain the integrity of the expansion. Let I denote the indicator function, and let E^π, E_θ, E_π be as in §4.1. From (5.15), we take the posterior distribution to order O(n^{-3/2}) of r given X under the prior π. Please note that the quantities β3, β4 carry slightly different notation in Chapter 5, but the current notation is more convenient for the present purposes.
Now we need to evaluate the expectation with respect to the frequentist distribution of X. Following the steps of Ghosh & Mukerjee (1991, 1992b, etc.) and noting the regularity conditions of Bickel & Ghosh (1990), we obtain the corresponding expansion.
The last step is taking expectation with respect to the prior π. Now we suppose that the true parameter is θ. We then choose a prior π such that π and its first derivative vanish on the boundaries of an open interval containing θ as an interior point, and calculate the E_π step by integration by parts. Finally we let π converge weakly to the degenerate measure at θ (see §4.1 for the general shrinkage argument). This leads to the following result.
Theorem 4.1 The frequentist distribution of the SRLR statistic r defined in (4.2) has the following asymptotic expansion, where T1(θ) and T2(θ) are defined in (4.7) and (4.8). □
Remark 1. In general the result of Theorem 4.1 differs from that of Theorem 5.2 at order O(n^{-3/2}), because Theorem 5.2 concerns the frequentist conditional distribution. But in one-parameter exponential models they are the same to order O(n^{-3/2}), and agree with the Lugannani & Rice approximation.
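The claim that r is standard normal to first order can be checked by simulation in a model where everything is explicit; the sketch below (an added illustration using an exponential model with rate θ; sample sizes and seeds are arbitrary) compares P(r ≤ 1.645) with Φ(1.645) ≈ 0.95:

```python
import math, random

random.seed(2)
theta, n, reps = 1.0, 100, 20_000

def loglik(th, s):
    # exponential(rate th), n observations with sum s: l(th) = n log th - th * s
    return n * math.log(th) - th * s

below = 0
for _ in range(reps):
    s = sum(random.expovariate(theta) for _ in range(n))
    mle = n / s                                   # MLE of the rate
    lr = 2.0 * (loglik(mle, s) - loglik(theta, s))
    r = math.copysign(math.sqrt(max(lr, 0.0)), mle - theta)  # signed root of the LR
    below += r <= 1.645
p = below / reps
# p should be close to 0.95, up to an O(n^{-1/2}) error and Monte Carlo noise
```

Pushing n lower makes the O(n^{-1/2}) departure from normality visible, which is what the higher-order terms in the expansion quantify.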
4.3 Matching Priors via the Posterior Quantiles
In this section we demonstrate how to use the shrinkage argument to derive matching priors. We consider the situation of Welch & Peers (1963) and look for priors ensuring the frequentist coverage probabilities of the one-sided posterior intervals to some asymptotic order, i.e. the priors should have the following properties:
P^π( θ < θ^{(α)}(π, X) | X ) = α + O(n^{-i/2}), P_θ( θ < θ^{(α)}(π, X) ) = α + O(n^{-i/2}),
where i = 2 or 3.
4.3.1 Calculating the Posterior Quantiles of θ
To calculate the posterior quantiles of θ, we use a posterior standardized version ρ defined as
ρ = √n ĵ1(θ̂)^{1/2} (θ − θ̂). (4.10)
We shall use the posterior distribution of r in (4.4) to calculate the posterior distribution of ρ. From (5.7), r and ρ are related by (4.11). Now we can calculate the posterior distribution of ρ: for a given ρ0 which is free of θ, we use expansion (4.4) and equation (4.11). Expanding Φ(r0) and φ(r0) in r0 around ρ0 and rearranging the terms, we then have the posterior distribution of ρ given X:
Using Lemma 2.1 with the posterior distribution of ρ, from (4.12) we obtain the α-quantile of the posterior distribution of ρ to order O(n^{-3/2}):
ρα = zα + ···.
If we change to the α-quantile of the posterior distribution of θ, we get the corresponding expansion. Please note that ρα depends on π and X.
4.3.2 Frequentist Coverage Probabilities of Posterior Intervals
Now we use the shrinkage argument to evaluate the frequentist probabilities P_θ( ρ < ρα ). Under a prior π0 which satisfies the regularity conditions as in §4.2, from the asymptotic expansion (4.12) of the posterior distribution of ρ, we obtain P^{π0}( ρ < ρα | X ). We then expand φ(ρα) and the accompanying coefficient functions in ρα around zα.
Next we need to calculate the expectation of P^{π0}( ρ < ρα | X ) with respect to the distribution of X. As in §4.2 we can obtain the corresponding expansion. The final step is to calculate the expectation with respect to the prior π0. Here we choose the prior π0 such that π0 and its first derivative vanish on the boundaries of an open interval containing the true θ as an interior point, and we let the prior π0 converge weakly to the degenerate measure at θ. From this process we obtain the frequentist coverage probabilities of the one-sided posterior intervals.
4.3.3 Matching via the Posterior Quantiles
To consider first order matching via the posterior quantiles, i.e. ensuring the same frequentist coverage probabilities of the one-sided posterior intervals up to an error term of order O(n^{-1}), the prior π(θ) should satisfy the equation T1(π; θ) = 0. Solving the equation T1(π; θ) = 0 for π, we get the solution π(θ) ∝ I1(θ)^{1/2}.
As to second order matching via the quantiles, the prior π(θ) should satisfy T1(π; θ) = 0 and T2(π; θ) = 0. From T1(π; θ) = 0 we get π(θ) ∝ I1(θ)^{1/2}. Now we use π(θ) ∝ I1(θ)^{1/2} in the second equation T2(π; θ) = 0. We find that to ensure T2(π; θ) = 0, the log-likelihood function must satisfy some conditions. Hence second order matching via the quantiles depends on the model, not just on the prior itself.
In order to express the conditions required for second order matching, we define a function of θ, the skewness of the score function. It is not difficult to see that the Bartlett (1953) relations hold; solving these for E_θ[ℓ‴(θ)] gives the relation between β3 and the skewness ρ3.
Now if we use (4.21) in the equation T2(π; θ) = 0 with the prior π(θ) ∝ I1(θ)^{1/2} obtained from first order matching, we get the condition that the derivative of the skewness vanishes, i.e. the skewness of the score function does not depend on the parameter θ. This is the result of Welch & Peers (1963). Mukerjee & Dey (1993) also obtained this result by using the shrinkage argument. Our derivation here gives more details.
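The first order matching property of π(θ) ∝ I1(θ)^{1/2} can be seen by simulation. For an exponential model with rate θ, Jeffreys' prior is π(θ) ∝ 1/θ and the posterior is Gamma(n, Σxᵢ); since θΣxᵢ is an exact pivot here, the coverage is in fact exact, a special feature of this model (this sketch is an added illustration, not part of the original derivation):

```python
import random

random.seed(3)
theta, n, level = 0.7, 5, 0.05

# under Jeffreys' prior the posterior of theta is Gamma(n, s) with s = sum of the data,
# so the posterior `level`-quantile equals g / s, where g is the Gamma(n, 1) quantile
pivot = sorted(random.gammavariate(n, 1.0) for _ in range(100_000))
g = pivot[int(level * len(pivot))]

reps, hits = 50_000, 0
for _ in range(reps):
    s = sum(random.expovariate(theta) for _ in range(n))
    hits += theta < g / s
cov = hits / reps
# frequentist coverage of the one-sided posterior interval; should match 0.05 closely
```

For models without an exact pivot, the same experiment would show coverage γ + O(n^{-1}) rather than exact agreement.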
4.4 Matching Priors via the Distribution Function
In this section we consider matching via the distribution function in two cases. First, using the SRLR statistic r defined by (4.2), we obtain that second order matching via the distribution function is equivalent to second order matching via the posterior quantiles. We then use the statistic ρ defined by (4.10), and obtain that second order matching via the distribution function leads to second order matching via the posterior quantiles. All the notation is the same as in §4.2.
From (4.9), we have the frequentist distribution of the SRLR statistic r. In order to obtain matching priors we average the posterior distribution of r given X over the distribution of X. The posterior distribution of r is given in (4.4). From §4.2, we have the averaged posterior distribution with coefficient functions G1^π(θ) and G2^π(θ).
To consider first order matching between the averaged posterior distribution and the frequentist distribution of r, the prior π should satisfy G1^π(θ) = T1(θ). Solving the equation we get π(θ) ∝ I1(θ)^{1/2}. To consider second order matching, the prior should satisfy G1^π(θ) = T1(θ) and G2^π(θ) = T2(θ). From the first equation we get π(θ) ∝ I1(θ)^{1/2}. Now using π(θ) ∝ I1(θ)^{1/2} in the second equation, and expressing β3 in terms of the skewness ρ3 as we did in §4.3, we find that in order to satisfy the second equation we again need the skewness of the score function to be constant in θ. In other words, second order matching requires that the prior be π(θ) ∝ I1(θ)^{1/2} and that the skewness of the score function be independent of the parameter θ. It turns out that second order matching via the distribution function of r is equivalent to second order matching via the posterior quantiles of θ.
If we use ρ = √n ĵ1(θ̂)^{1/2}(θ − θ̂), the posterior standardized version, instead of r to consider the matching, we shall get a different result. Please see Ghosh & Mukerjee (1993b) and Mukerjee & Ghosh (1996) in this context.
From (4.12) of §4.3 we have the posterior distribution of ρ. Following the steps in the shrinkage argument we can obtain the averaged posterior distribution of ρ, and also the frequentist distribution of ρ; matching requires that the two be equivalent to some asymptotic order. To consider first order matching, the prior π should satisfy G3(ρ0; θ) = T3(ρ0; θ). We solve the equation to get π(θ) ∝ I1^{1/2}(θ). For second order matching, the prior π is a matching prior if and only if G3(ρ0; θ) = T3(ρ0; θ) and G4(ρ0; θ) = T4(ρ0; θ). Solving the equations we get π(θ) ∝ I1(θ)^{1/2}, and the model should also satisfy a condition which implies that the skewness of the score function does not depend on the parameter θ. So in this case second order matching via the distribution function leads to second order matching via the posterior quantiles. This is stronger than the corresponding result using the statistic r.
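The Welch & Peers condition, that the standardized skewness of the score be free of θ, is easy to check by simulation in a specific model: for an exponential model with rate θ the score per observation is u = 1/θ − X, and the standardized skewness E[u³]/I1^{3/2} equals −2 for every θ (sketch added as an illustration; the two test values of θ are arbitrary):

```python
import random

random.seed(4)

def score_skewness(theta, N=200_000):
    # standardized third moment of the score u = 1/theta - X, X ~ Exp(rate theta);
    # the per-observation information is I1 = 1/theta^2
    m3 = 0.0
    for _ in range(N):
        u = 1.0 / theta - random.expovariate(theta)
        m3 += u ** 3
    m3 /= N
    return m3 / (1.0 / theta**2) ** 1.5

s_low, s_high = score_skewness(0.5), score_skewness(3.0)
# both estimates should be near the exact value -2, independently of theta
```

Constancy of this skewness is exactly what makes Jeffreys' prior second order matching in this model.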
Chapter 5
Tail Probability of the MLE to O(n^{-2})
by Using the Shrinkage Argument
In this chapter, we use the shrinkage argument to obtain the asymptotic expansion to O(n^{-2}) of the frequentist conditional distribution of the maximum likelihood estimate θ̂ given an ancillary statistic a in the scalar parameter case. The expansion can be used to compare other approximations of the conditional tail probability of θ̂; we shall discuss this topic in the next chapter. We also obtain the posterior probability of θ to O(n^{-2}) and the Bartlett corrections for which the posterior and the frequentist conditional distributions of the likelihood ratio (LR) statistic are χ²₁ to O(n^{-2}).
Suppose that we have i.i.d. observations X = (X₁, ..., Xₙ)ᵀ from a continuous model f(x; θ) with scalar parameter θ. The log-likelihood function is ℓ(θ) = ℓ(θ; X) = log ∏ᵢ f(Xᵢ; θ), and the maximum likelihood estimate of θ is θ̂. We assume there exists a sufficient statistic (otherwise take X itself); we then make a dimension reduction through this sufficient statistic to obtain a likelihood function which depends on the parameter θ and the sufficient statistic. After the sufficiency reduction, we assume that there is a one-to-one transformation to (θ̂, a), where a is an ancillary statistic for θ. For conditional inference it is more convenient to rewrite the log-likelihood function as ℓ(θ; θ̂, a), while in some cases we shall still use ℓ(θ; X). Now ℓ(θ; θ̂, a) is a function of both θ and (θ̂, a), depending on the data via (θ̂, a).
We let the parameter θ have a prior density π(θ) which is four times continuously differentiable. Now we assume regularity conditions similar to those of Bickel & Ghosh (1990) with m = 3, but the probability P_θ is replaced by the conditional probability P_{(θ,a)} given the ancillary statistic a. All formal expansions for the posterior as used here are valid for sample points in a set S which can be defined along the lines of Bickel & Ghosh (1990), with P_{(θ,a)} probability 1 − O(n^{-2}) uniformly over compact sets of θ.
For convenience, we use the following notation (also in Chapter 6): ℓ_{i;j} denotes ∂^{i+j}ℓ(θ; θ̂, a)/∂θ^i ∂θ̂^j, and ρ_{i;j} = ρ_{i;j}(θ̂; a) and ρ̄_{i;j} = ρ̄_{i;j}(θ; a) denote the corresponding standardized quantities ℓ_{i;j}/n evaluated at θ = θ̂ and at θ respectively; the ρ_{i;j} are functions of θ̂ and a, while the ρ̄_{i;j} are functions of only θ and a. At the same time, π̂ = π(θ̂), π̂′ = π′(θ̂), π̂″ = π″(θ̂), π̂‴ = π‴(θ̂).
In the following we introduce some basic results which we shall use in the later calculations.
Writing out the likelihood equation explicitly, we have ℓ_{1;0}(θ; θ̂, a) = 0 at θ = θ̂. Replacing θ by θ̂, we then have ℓ_{1;0}(θ̂; θ̂, a) = 0, or ℓ̂_{1;0} = 0, identically in θ̂. Differentiating this repeatedly, we have the following observed balance relations (Barndorff-Nielsen & Cox, 1994, Chapter 5):
ℓ̂_{2;0} + ℓ̂_{1;1} = 0,
On the other hand, ĵ = −ℓ̂_{2;0}; differentiating ℓ̂_{1;0} = 0 repeatedly, we get
After some manipulation we have the following version of the above equations:
5.2 Posterior Probability to O(n^{-2})
It is well known that the posterior density of θ given (θ̂, a) under the prior π is
Now we present a result on the approximation of ∫ exp{ℓ(θ)}π(θ) dθ,
where
For details please refer to Laplace's method in Barndorff-Nielsen & Cox (1989). Actually this result can be obtained directly from the following expansions; we shall mention this later in Appendix 5.1 at the end of this chapter. The posterior density then has the expansion:
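The Laplace approximation invoked here can be checked numerically in a case where the integral has a closed form. The following sketch assumes the exponential-mean model with a flat prior (an illustration, not from the thesis), for which ∫ exp{ℓ(θ)} dθ = (nθ̂)^{1−n} Γ(n−1) exactly, so the O(n^{-1}) relative error of the leading Laplace term exp{ℓ(θ̂)}(2π/ĵ)^{1/2} is directly visible:

```python
import math

# Exponential-mean model: ℓ(θ) = -n log θ - n θ̂/θ, observed information
# ĵ = n/θ̂².  Compare the exact integral with its Laplace approximation.
def log_exact(n, theta_hat):
    return (1 - n) * math.log(n * theta_hat) + math.lgamma(n - 1)

def log_laplace(n, theta_hat):
    ell_hat = -n * math.log(theta_hat) - n      # ℓ(θ̂)
    j_hat = n / theta_hat ** 2                  # observed information
    return ell_hat + 0.5 * math.log(2 * math.pi / j_hat)

def rel_error(n, theta_hat=1.0):
    return abs(math.exp(log_laplace(n, theta_hat) - log_exact(n, theta_hat)) - 1)

print(rel_error(10), rel_error(40))  # O(n^{-1}): roughly 0.107 and 0.027
```

Quadrupling n cuts the relative error by roughly a factor of four, as the O(n^{-1}) rate predicts.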
Now we make a transformation from θ to r, the signed root of the likelihood ratio (SRLR) statistic defined as
The Jacobian from θ to r is easily calculated:
For convenience we let
then the posterior density can be rewritten as
where φ(·) is the standard normal p.d.f.
In order to calculate the posterior probability of θ given (θ̂, a), we need to obtain the expansion of u with respect to r; actually what we need is the expansion of u^{-1} − r^{-1} with respect to r. We begin by expanding ℓ′(θ) in θ around θ̂:
Now we introduce another variable ρ which is asymptotically equivalent to r:
Now we have the expansion of r² with respect to ρ.
Making another expansion, we then have
There are two other formulas that will be helpful later: the expansion of r with respect to ρ, and its inversion, the expansion of ρ with respect to r.
Similarly we expand u^{-1} in θ around θ̂ and transform to an expression in ρ:
Now we can easily obtain
Using equation (5.8), we then have
where
Given the above expansion, the integration calculation is straightforward:
By equations (5.5), (5.11) and (5.13), we then obtain the posterior distribution of θ (also the posterior distribution of r). Here we state the result in the following theorem.
Theorem 5.1 The posterior distribution of θ given (θ̂, a) under the prior π(θ) has the following asymptotic expansion
where
From the asymptotic expansion of the posterior distribution of r, we can obtain the following result regarding the Bayesian Bartlett correction of the likelihood ratio (LR) statistic r². For other derivations of the Bayesian Bartlett correction, please see Bickel & Ghosh (1990), Ghosh & Mukerjee (1991) and DiCiccio & Stern (1993).
Corollary 5.1 If we define the SRLR statistic r as in (5.3), then the posterior distribution of r²(1 − 2H₂(θ̂)/n) is χ²₁ to O(n^{-2}).
PROOF: First we see
where
Now we use expansion (5.15) to get the posterior probability expansion with respect to r₁, and then expand Φ(r₁) and φ(r₁) in r₁ about r₀ to get the expansion with respect to r₀.
and so
i.e. r²(1 − 2H₂(θ̂)/n) follows the χ²₁ distribution to order O(n^{-2}). □
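A numerical illustration of the Bartlett correction idea, sketched for the assumed exponential-mean model (a standard case worked out here, not a derivation from the thesis): there the LR statistic W = r² satisfies E[W] = 2n(log n − ψ(n)) exactly, with ψ the digamma function, and this expands as 1 + 1/(6n) + O(n^{-2}), so a factor of the form 1 − 2G(θ̂)/n with G = 1/12 recenters W to mean 1 + O(n^{-2}):

```python
import math

# E[W] for the exponential-mean model via the exact digamma identity;
# digamma is computed by central differences of log Γ.
def digamma(x, h=1e-5):
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

def mean_lr(n):
    return 2 * n * (math.log(n) - digamma(n))

n = 50
print(n * (mean_lr(n) - 1))  # ≈ 1/6, i.e. E[W] ≈ 1 + 1/(6n)
```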
For illustration we give results for the simple example of location models.
Example 5.1 Suppose that X₁, ..., Xₙ are i.i.d. observations from the location model with density f(x − θ), where f(·) is a known function and θ is a scalar parameter. We use the flat prior π(θ) ∝ 1 for θ. Let g(·) = log f(·); then the maximum likelihood estimate is defined by the following equation
It is not difficult to verify that a = (a₁, ..., aₙ), where aᵢ = Xᵢ − θ̂, is an ancillary statistic for θ and has dimension n − 1. Now the log-likelihood function can be expressed as
and then
5.3 Some Detailed Calculations
In this section we introduce some notation first, then present most of the detailed calculations involved in using the shrinkage argument. The final result will be given in the next section.
We let I_{(r<r₀)} denote the indicator function, and E_{(θ,a)} denote expectation with respect to the conditional distribution of θ̂ given an ancillary statistic a. E_π and E^π are the same as defined in §4.1. Using the general shrinkage argument in §4.1, for given r₀ which is free of θ and independent of θ̂:
E_π(E_{(θ,a)}{E^π[I_{(r<r₀)} | θ̂, a]}) = E_π(E_{(θ,a)}{I_{(r<r₀)}}).
From (5.15) of Section 5.2, we have
Now we are going to calculate the E_{(θ,a)} step, i.e. taking expectation with respect to the conditional distribution of θ̂ given the ancillary statistic a, assuming the parameter is θ:
E_{(θ,a)}{E^π[I_{(r<r₀)} | θ̂, a]} = E_{(θ,a)}{Φ(r₀) − (1/√n)φ(r₀)H₁(θ̂) − (1/n)φ(r₀)H₂(θ̂) − ...}.
Note that we have the following results. There are many ways to derive them; for details please refer to Barndorff-Nielsen & Cox (1994, Chapter 5). We just present them here without proof.
Expanding the functions Hᵢ(θ̂) in θ̂ about θ to the second order in θ̂ − θ, and then using the above results when taking expectations, we have
The functions Hᵢ(θ) can be obtained by replacing θ̂ with θ in the functions Hᵢ(θ̂). As to the O(n^{-1}) terms, we only need the one in E_{(θ,a)}H₁(θ̂). We give separate results here:
Now we can easily obtain the result: given r₀ which is free of θ and independent of θ̂,
where
The functions H₁(θ), H₂(θ) and H₃(θ), as we stated above, can be obtained by replacing θ̂ with θ in the functions H₁(θ̂), H₂(θ̂) and H₃(θ̂), but H₄(θ) is not obtained by replacing θ̂ with θ in the function H₄(θ̂).
The final step is taking expectation with respect to the prior π(·). Here we make further assumptions on the prior π: we assume that π and its first and second derivatives vanish on the boundaries of an open interval. Now we can calculate E_π using integration by parts. The calculations are quite straightforward, but it is easy to make mistakes, especially in reorganizing the results when the derivatives of the ρ̄_{i;j}'s come in. We shall give the full results in the next section, and present the details of the calculations in Appendix 5.2 at the end of this chapter.
5.4 Tail Probability to O(n^{-2})
In this section we demonstrate how to get the frequentist conditional distribution of θ̂ given the ancillary statistic a. After the E_π step using integration by parts, we have the following result: given r₀ which is free of θ and independent of θ̂,
where
The function G₄(θ) cannot be obtained simply from the functions G₁(θ), G₂(θ) and G₃(θ), in contrast to H₄(θ̂) of (5.16) in the posterior expansion.
In order to state the argument clearly, we now change the variable of integration in the above equation from θ to β. This change of variable does not affect the equation in general, but the conditions we assumed for r₀, that it is free of θ and independent of θ̂, now become that it is free of β and independent of θ̂. Also, all functions of θ, namely the Gᵢ(θ)'s, become functions of β, i.e. Gᵢ(β)'s. Now suppose that the true parameter is θ. We then choose the prior π such that π and its first and second derivatives vanish on the boundaries of an open interval containing θ as an interior point. After the change of notation we have
Finally we let π converge weakly to the degenerate measure at θ. This leads to the following result, the frequentist conditional distribution of r.
Theorem 5.2 The frequentist conditional distribution of the SRLR statistic r defined in (5.3) has the following asymptotic expansion
where r₀ is independent of θ̂, and G₁(θ), G₂(θ), G₃(θ), G₄(θ) are defined in (5.17)-(5.20) respectively. □
If our interest is in the maximum likelihood estimate θ̂, then we have the following corollary.
Corollary 5.2 If we let θ̂₀ be the observed value of θ̂, then the tail probability has the following asymptotic expansion
where
r₀ = sgn(θ − θ̂₀) √{2[ℓ(θ̂₀; θ̂₀, a) − ℓ(θ; θ̂₀, a)]}
and G₁(θ), G₂(θ), G₃(θ), G₄(θ) are defined in (5.17)-(5.20) respectively. □
Remark 1. The result of Corollary 5.2 extends the approximations of the tail probability of the MLE in the current literature by one more order. For a rigorous proof of Corollary 5.2 we could follow the steps in Bickel & Ghosh (1990) by examining the regularity conditions, but here we do not attempt this.
Remark 2. We have found that the functions G₁(θ) and G₂(θ) are invariant with respect to one-to-one reparameterizations. As to G₃(θ) and G₄(θ), from the invariance of the probability and the invariance of r₀ it seems that they might have the invariance property, but further investigation is required to clarify this (see McCullagh, 1987, Chapter 7, for invariant expansions).
Remark 3. The function G₁(θ) is related to the probability at the true parameter point, P_θ(θ̂ > θ | a) = 1/2 + G₁(θ)/√(2πn) + O(n^{-3/2}). The Lugannani & Rice type formula (see Chapter 6) fails to capture the probability at this point (see also Reid, 1988).
Remark 4. The function G₂(θ) is related to the renormalizing constant in the p* formula. For details, please refer to Chapter 6.
As to the LR statistic r², we have the Bartlett correction of its frequentist conditional distribution, stated in the following corollary. For derivations of the Bartlett correction of the frequentist distribution, rather than the frequentist conditional distribution considered here, please refer to Barndorff-Nielsen & Cox (1984), Barndorff-Nielsen & Blaesild (1986), Barndorff-Nielsen & Hall (1988) and Ghosh & Mukerjee.
Corollary 5.3 If we define the SRLR statistic r as in (5.3), then the frequentist conditional distribution of r²(1 − 2G₂(θ̂)/n) given a is χ²₁ to O(n^{-2}).
PROOF: Use Theorem 5.2 and proceed exactly along the lines of the proof of Corollary 5.1. □
Example 5.2 (Continued from Example 5.1)
In the following, we give the quantities needed for the expansion of the frequentist conditional distribution of θ̂ given the ancillary statistic a.
Comparing with the results of Example 5.1, by expansions (5.15) and (5.21) we obtain that in one-parameter location models the frequentist conditional distribution of r agrees with the posterior distribution of r to order O(n^{-2}) when the prior is taken to be the flat prior. □
Example 5.3 Suppose that X₁, ..., Xₙ are i.i.d. from the exponential family
where θ is the scalar canonical parameter. The likelihood function is
We can see that Σᵢ t(Xᵢ) is a sufficient statistic for θ. Since there is no ancillary statistic in this case, we make a transformation from Σᵢ t(Xᵢ) to θ̂, where θ̂ is the maximum likelihood estimate of θ defined by
Σᵢ₌₁ⁿ t(Xᵢ) = n ψ′(θ̂).
The log-likelihood function then can be expressed as
Now it is easy to obtain that
and all other ρ_{i;j}'s are equal to zero. The functions Gᵢ(θ) now have quite a simple form:
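For a concrete member of the family, the MLE equation Σ t(Xᵢ) = n ψ′(θ̂) can be solved numerically; the sketch below assumes the Poisson model in canonical form (t(x) = x, ψ(θ) = e^θ), chosen here for illustration, where the equation also has the closed form θ̂ = log x̄:

```python
import math

# Newton iteration on Σt(Xᵢ) - n ψ'(θ) = 0; note ψ''(θ) = n⁻¹ j(θ) > 0.
def solve_mle(t_sum, n, psi1, psi2, theta=0.0):
    for _ in range(60):
        theta += (t_sum - n * psi1(theta)) / (n * psi2(theta))
    return theta

xs = [3, 0, 2, 5, 1, 4]
theta_hat = solve_mle(sum(xs), len(xs), math.exp, math.exp)
print(theta_hat, math.log(sum(xs) / len(xs)))  # both log(2.5)
```

Because Σ t(Xᵢ) is one-dimensional and sufficient, θ̂ carries all the information and no ancillary is needed, exactly as the example states.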
Appendix 5.1 Derivation of Equation (5.1).
In §5.2 we mentioned that we can get the result (5.1) of the approximation of ∫ exp{ℓ(θ)}π(θ) dθ from the expansions in §5.2. Simply using (5.13) and (5.11), we then have
Appendix 5.2 Some Calculations in §5.3.
Chapter 6
The p* Formula and Tail
Probability of MLE to O(n^{-2}) in
the Frequentist Setup
In this chapter we verify the p* formula and the Lugannani & Rice type formula in the one-parameter case by using the conditional distribution of the MLE obtained in Chapter 5. We obtain a version of the renormalizing constant in the p* formula and an expression of the p* formula to order O(n^{-2}) for general models. We also consider constructing confidence intervals to order O(n^{-2}) and obtain an explicit form for the endpoints of two-sided confidence intervals.
6.1 Introduction
Suppose that we have i.i.d. observations X = (X₁, ..., Xₙ)ᵀ from a continuous model f(x; θ) with scalar parameter θ. As in Chapter 5, we assume that the log-likelihood function can be written as ℓ(θ; θ̂, a), where θ̂ is the maximum likelihood estimate of θ and a is an ancillary statistic for θ. Barndorff-Nielsen (1980, 1983) showed that the p* formula
p*(θ̂ | θ, a) = C (2π)^{-1/2} exp{ℓ(θ; θ̂, a) − ℓ(θ̂; θ̂, a)} |ĵ(θ̂)|^{1/2}
approximates the conditional density of θ̂ given a to order O(n^{-3/2}), where C is constant in θ to O(n^{-3/2}) and is equal to 1 to O(n^{-1}).
Approximations to the tail probability of θ̂ to order O(n^{-3/2}) are also available. For details please see Lugannani & Rice (1980) for exponential models, DiCiccio, Field & Fraser (1990) for location models, and Barndorff-Nielsen (1991) and Fraser & Reid (1993) for general models. In general they have the form of Lugannani & Rice,
where Φ is the standard normal c.d.f., φ is the standard normal p.d.f., and
If one wants to construct confidence intervals for θ, then the r* approximation of Barndorff-Nielsen (1986) is more convenient,
where
r* = r + (1/r) log(u/r).
In this chapter we use a different approach to investigate the p* formula and the Lugannani & Rice type formula in scalar parameter general models. We first integrate the p* formula directly and obtain a version of the renormalizing constant, then verify that it gives the conditional density of θ̂ given an ancillary statistic to order O(n^{-3/2}) by comparing the integration with the conditional distribution of θ̂ given an ancillary statistic obtained using the shrinkage argument in Chapter 5. We also verify the Lugannani & Rice type formula, obtain the third order error terms of different kinds of approximations to the conditional tail probability of θ̂, and consider extending the p* formula to order O(n^{-2}) in general models. In the last section, we consider constructing confidence intervals to order O(n^{-2}).
We assume the regularity conditions of §5.1 except for the Bayesian part. The notation used here and some basic results are mostly the same as in §5.1, but in the definitions of the statistics r and ρ we change the signs in this chapter, compared with Chapter 5, for the convenience of the calculations in the frequentist setup. We shall mention this difference when giving their definitions later.
6.2 Direct Integration of the p* Formula
Now we consider integrating directly the main part of the p* formula, i.e. for given θ̂₀, we are going to calculate
If the p* formula is a density of some variable to order O(n^{-3/2}), then we can obtain the constant C by renormalization. Now we make a transformation from θ̂ to r, the signed root of the likelihood ratio (SRLR) statistic defined as
Please note that we change the sign of r here compared with that in Chapter 5. Under regularity conditions, the transformation from θ̂ to r given a is asymptotically one-to-one. The Jacobian of the transformation is
If we let
then the integral (6.4) can be expressed in terms of the new variable,
where φ(·) is the standard normal p.d.f., and
From the above expression we can see that we have to get the expansion of u^{-1} with respect to r, or the expansion of u^{-1} − r^{-1} with respect to r. First we expand ℓ(θ̂; θ̂, a) and ℓ(θ; θ̂, a) in θ̂ about θ.
Now we introduce an intermediate variable q which is asymptotically equivalent to r. We shall obtain the expansion of u^{-1} − r^{-1} in terms of q and then invert it back to get the expansion in terms of r. Here q is defined as
After a simple calculation we have the expansion of r² with respect to q.
Making some transformations, we have the expansion of r^{-1} with respect to q, the expansion of r with respect to q, and the expansion of q with respect to r:
Similarly we expand u^{-1} in θ̂ around θ and then transform it to an expression in terms of q:
Now we can directly obtain
Using equation (6.10), the expansion of q with respect to r, we finally get the expansion of u^{-1} − r^{-1} with respect to r:
where G₁(θ), G₂(θ), G₃(θ) are defined in (5.17), (5.18), (5.19) of Chapter 5.
The integral (6.7) can then be obtained using the above expansion:
where
Now we can obtain a version of the constant in the p* formula of (6.1) by renormalizing (6.1) and using the expansion (6.12),
This constant ensures that the p* formula of (6.1) is a density to order O(n^{-3/2}). In the next section, we shall further verify that the p* formula of (6.1) is the conditional density of θ̂ given the ancillary statistic a to order O(n^{-3/2}).
6.3 p* Formula to O(n^{-2})
In the last chapter we obtained the asymptotic expansion of the frequentist conditional distribution of θ̂ given a by using the shrinkage argument. From (5.22), under the current r₀,
where
and G₁(θ), G₂(θ), G₃(θ), G₄(θ) are defined in (5.17)-(5.20) of Chapter 5 respectively.
Now we look at how much conditional probability the integration of the p* formula can capture. From (6.11) and (6.12), the integration of the p* formula with the constant C = C(θ) = 1 − G₂(θ)/n is
Note that equations (6.14) and (6.15) agree with (6.13) up to the O(n^{-1}) term. Hence we have verified the Lugannani & Rice type formula of (6.2) and the p* formula of (6.1) with a renormalizing constant C(θ) in general models.
Here we consider the following corrected p* formula,
where Ĉ(θ̂) = 1 − G₂(θ̂)/n, and G₂(θ̂) is obtained by replacing θ with θ̂ in G₂(θ). Expanding G₂(θ̂) in θ̂ about θ, we get
From here we can directly calculate the integration of the corrected p* formula:
We summarize the above results into a theorem regarding the accuracy of the p* formula and the renormalizing constant in the p* formula.
Theorem 6.1 In the p* formula, when C(θ) = 1 − G₂(θ)/n, the p* formula approximates the conditional density of θ̂ given a to order O(n^{-3/2}). But if we change θ to θ̂ in the renormalizing constant C(θ), i.e.
p*_c(θ̂ | θ, a) = Ĉ(θ̂) (2π)^{-1/2} exp{ℓ(θ; θ̂, a) − ℓ(θ̂; θ̂, a)} ĵ(θ̂)^{1/2},   (6.18)
where Ĉ(θ̂) = 1 − G₂(θ̂)/n, then p*_c approximates the conditional density of θ̂ given a to order O(n^{-2}). □
Remark 1. The renormalizing constant C(θ) is invariant with respect to one-to-one reparameterizations (see Remark 2 of Corollary 5.2).
There are different kinds of approximations to the tail probability. They all have error terms of order O(n^{-3/2}). Actually we can obtain all of these third order error terms.
If we use the p* formula with the renormalizing constant C(θ) given above and integrate it to approximate the tail probability, then from (6.13) and (6.15) the third order error term is
If we use the Lugannani & Rice type formula to approximate the tail probability, from (6.11), (6.13) and (6.14), the third order error term is
Following Barndorff-Nielsen (1986), we define the statistic r* as
We can use (6.11), the expansion of u^{-1} − r^{-1}, to get the expansion of r* with respect to r:
From the above expansion we can easily obtain the third order error term of the r* approximation,
In the following we give a simple example giving a clear view of what the results developed above (and in Chapter 5) look like in a specific situation.
Example 6.1 Exponential Distribution
Suppose that we have i.i.d. observations X = (X₁, ..., Xₙ)ᵀ from the exponential distribution with mean θ, i.e. from the density θ^{-1} e^{-x/θ}, θ > 0. The log-likelihood function is
The maximum likelihood estimate of θ is θ̂ = (1/n) Σᵢ Xᵢ. The log-likelihood function can then be written as
ℓ(θ; θ̂) = −n θ̂/θ − n log θ,
and so ℓ(θ̂; θ̂) − ℓ(θ; θ̂) = n log(θ/θ̂) + n θ̂/θ − n, and ĵ(θ̂) = n/θ̂². The p* formula has the form
The exact density of the distribution of θ̂ is
(a) When C = 1, p*(θ̂; θ) approximates p(θ̂; θ) with a relative error of order O(n^{-1}). This is equivalent to Stirling's approximation of Γ(n) by √(2π) n^{n−1/2} e^{−n}, which has a relative error of order O(n^{-1}).
(b) Renormalizing the p* formula gives the exact density of θ̂, p(θ̂; θ).
(c) Asymptotic expansion of the tail probability to O(n^{-2}):
It is easy to obtain that
and
By Corollary 5.2, the distribution of θ̂ has the following asymptotic expansion
where
(d) The p* formula to O(n^{-2}):
By Theorem 6.1, when C(θ) = 1 − G₂(θ)/n = 1 − 1/(12n), the p* formula approximates p(θ̂; θ), the exact density of θ̂, to order O(n^{-3/2}). But Ĉ(θ̂) = C(θ) = 1 − 1/(12n), so from Theorem 6.1 the p*_c formula approximates the exact density p(θ̂; θ) to order O(n^{-2}). This is equivalent to the approximation of Γ(n) by √(2π) n^{n−1/2} e^{−n} (1 + 1/(12n)), which has a relative error of order O(n^{-2}).
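Cases (a) and (d) can be checked directly, since they reduce to Stirling's approximation of Γ(n); a small numerical sketch:

```python
import math

# Stirling's approximation to Γ(n) without and with the 1/(12n) correction,
# mirroring C = 1 versus C(θ) = 1 - 1/(12n) in the p* formula for this model.
def stirling(n):
    return math.sqrt(2 * math.pi) * n ** (n - 0.5) * math.exp(-n)

n = 10
exact = math.gamma(n)
err1 = abs(stirling(n) - exact) / exact                       # O(n^{-1})
err2 = abs(stirling(n) * (1 + 1 / (12 * n)) - exact) / exact  # O(n^{-2})
print(err1, err2)  # ≈ 8.3e-3 and ≈ 3.5e-5
```

The uncorrected error is close to 1/(12n), and the corrected one close to the next Stirling term, as the orders quoted in (a) and (d) predict.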
(e) Third order error terms of approximations to the tail probability:
For the p*_c approximation, from case (d) or (6.19),
For the Lugannani & Rice type formula, from (6.20),
where
For the r* approximation, from (6.23),
where
r₀* = r₀ + (1/r₀) log(u₀/r₀),
with r₀ and u₀ as defined above.
6.4 Confidence Intervals to O(n^{-2})
To construct confidence intervals, it is convenient to use the r* statistic, as defined in (6.21),
Our objective is to construct two-sided confidence intervals for θ. If we let r* = z_α, where Φ(z_α) = α, we can solve the r* expression for θ to obtain a one-sided confidence interval with accuracy of order O(n^{-3/2}). Repeating the process for r* = z_{1−α}, we obtain the other one-sided confidence interval with accuracy of order O(n^{-3/2}). It is easy to see from equation (6.23) that the two-sided confidence intervals obtained from the above two steps then have accuracy of order O(n^{-2}).
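The inversion step can also be carried out numerically when an explicit solution is inconvenient. The sketch below assumes the exponential-mean model (an illustration) and inverts the first order relation r(θ) = ±z_α by bisection; the r* version would add the (1/r) log(u/r) term to the quantity being matched:

```python
import math

# Likelihood-root confidence limits for the assumed exponential-mean model.
def signed_root(theta, theta_hat, n):
    lr = 2.0 * n * (math.log(theta / theta_hat) + theta_hat / theta - 1.0)
    return math.copysign(math.sqrt(max(lr, 0.0)), theta_hat - theta)

def solve_r(target, theta_hat, n, lo, hi):
    # signed_root is decreasing in θ, so bisection on [lo, hi] works
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if signed_root(mid, theta_hat, n) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

theta_hat, n, z = 2.0, 20, 1.959964
lower = solve_r(z, theta_hat, n, 0.5 * theta_hat, theta_hat)
upper = solve_r(-z, theta_hat, n, theta_hat, 5.0 * theta_hat)
print(lower, upper)  # an approximate 95% interval containing θ̂
```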
A problem with the above procedure is that it is not always easy to solve for θ from the expression of r*; one might still have to make some expansions, and this extra step might affect the accuracy of the intervals. In the following construction we make further expansions from the r* expression to give an explicit form of the two-sided confidence intervals for θ; we then prove that they have accuracy of order O(n^{-2}) and that the one-sided confidence intervals still have accuracy of order O(n^{-3/2}).
Our construction is based on the asymptotic expansion of the conditional distribution of r given an ancillary statistic a obtained in Chapter 5. From (5.21), under the current r defined in (6.5),
where G₁(θ), G₂(θ), G₃(θ), G₄(θ) are defined in (5.17)-(5.20) respectively.
If we directly calculate the α-quantile of r to order O(n^{-2}), we can obtain the conditional confidence intervals, but we need to compute the first five derivatives of the log-likelihood function. Our construction here reduces the calculations to the first four derivatives, while the two-sided confidence intervals still have accuracy of order O(n^{-2}) and the left-sided and right-sided intervals have accuracy of order O(n^{-3/2}).
In order to get the conditional confidence intervals via the statistic r, we use the intermediate variable ρ, which is here defined as
Please note that we change the sign of ρ compared with that in Chapter 5. The relations between r and ρ can be seen from the following two formulas.
From (6.22), letting r* = z_α, where Φ(z_α) = α, 0 < α < 1/2, and solving for r (denoted r_α) to order O(n^{-1}), we then obtain
From (6.24),
and the α-quantile of the conditional distribution of ρ to order O(n^{-3/2}) can be obtained from equation (6.27):
ρ_α =
Now we plug in the expression for r_α and expand the functions G₁(θ), G₂(θ) in θ around θ̂. Rearranging the terms according to asymptotic order and dropping the terms of order O(n^{-3/2}), we get
Replacing z_α with z_{1−α} in the above expression we have ρ_{1−α}. The conditional confidence intervals for θ are then determined by ρ_α and ρ_{1−α}. The following theorem gives the details of this result.
Theorem 6.2
PROOF: It is equivalent to show that
and
From relation (6.26) we have the r_α which corresponds to ρ_α,
where A₁(θ̂), B₁(θ̂), C₁(θ̂) are functions of θ̂ and a. The explicit forms of A₁(θ̂), B₁(θ̂) and C₁(θ̂) can be calculated, but here we do not need to know them. Now we expand the functions of θ̂ in θ̂ around θ to get
where A₂(θ), B₂(θ), C₂(θ) are functions of θ and a. Again, we do not need to know the explicit forms of A₂(θ), B₂(θ) and C₂(θ). From (6.24) and (6.29), we have
Repeating the process as we did for ρ_α, we have
and so
Thus we establish the result that the two-sided conditional confidence intervals have accuracy of order O(n^{-2}). □
Example 6.2 (Continued from Examples 5.1, 5.2 and 5.3)
We consider the one-parameter location models and the one-parameter exponential models with θ the canonical parameter. The two-sided confidence intervals for θ can be constructed to have accuracy of order O(n^{-2}) using the current method. The exact forms are the same as in Theorem 6.2; we just give ρ_α here.
In the one-parameter location models:
Please note that ĵ₁(θ̂), ρ̂₃;₀ and ρ̂₄;₀ are functions of the ancillary a only.
In the one-parameter exponential models:
and
Chapter 7
Conditional Distribution of the
SRCLR Statistic by Using the
Shrinkage Argument
In this chapter we use the shrinkage argument to derive the frequentist conditional distribution to order O(n^{-3/2}) of the signed root of the conditional likelihood ratio (SRCLR) statistic (Cox & Reid, 1987) given an ancillary statistic in location-scale models. We also discuss this approach for other models. Along the way we obtain approximations of the marginal posterior density and distribution function.
7.1 Introduction
Suppose that we have i.i.d. observations X = (X₁, ..., Xₙ)ᵀ from the density f(x; θ), where θ = (ψ, λ) has prior π(ψ, λ). The log-likelihood function based on X is ℓ(θ; X), and the maximum likelihood estimate of θ is θ̂ = (ψ̂, λ̂). We assume that ψ is the scalar parameter of interest and λ is the nuisance parameter. For notational convenience, we assume λ is also one-dimensional. As we did in Chapter 5, we rewrite the log-likelihood function as ℓ(θ) = ℓ(θ; θ̂, a), where a is an ancillary statistic for θ; in the Bayesian setup, we still use ℓ(θ; X).
Our investigation focuses on the frequentist conditional distribution, given an ancillary statistic, of the signed root of the conditional likelihood ratio (SRCLR) statistic defined as
where the conditional log-likelihood ℓ_c(ψ) was suggested by Cox & Reid (1987) on the grounds of conditioning, and ψ̂_c is the maximum likelihood estimate based on ℓ_c(ψ). The SRCLR statistic r can be used for testing the parameter of interest as well as for other inferential purposes. In transformation models with transformation parameters, DiCiccio, Field & Fraser (1990) obtained the approximation of the distribution of r by eliminating the nuisance parameter by marginalization. In this chapter we begin from the Bayesian setup, obtain the posterior distribution of the parameter of interest, then use the shrinkage argument to derive the frequentist conditional distribution to order O(n^{-3/2}) of the SRCLR statistic r given an ancillary statistic in location-scale models. We shall also discuss the possibilities in other models.
We assume regularity conditions similar to those in §5.1, but here m = 2 and the parameter is two-dimensional. Let ĵ₁(θ̂) = (ĵ_{ij})₂ₓ₂ be the per observation information matrix evaluated at θ̂, i.e. n ĵ₁(θ̂) = ĵ(θ̂) = −ℓ″(θ̂), and also
7.2 Marginal Posterior Density
The marginal posterior density of ψ given X under the prior π(ψ, λ) is
In the following we shall use Laplace's method to get an expansion of the marginal posterior density π(ψ | X). We begin by expanding ℓ(ψ, λ) as a function of ψ and λ about ψ̂ and λ̂.
Now we standardize ψ and λ by defining
We can see that
The determinant of the Jacobian of the transformation from (ψ, λ) to (ρ₁, ρ₂) is
Based on the new variables, we have the following form of the expansion
Now expanding π(ψ, λ) as a function of ψ and λ about ψ̂ and λ̂ and simplifying the expansion, we get
Therefore we have
In order to simplify the expression, we introduce more notation:
Please note that the V's and the A's are functions of the data only, while T is a function of the data X and the prior π.
Using the above notation, after considerable algebraic manipulation, we obtain the two-dimensional integration
where
c₁*(X) =
c₂*(X) =
The marginal posterior density of ψ given X then has the following expansion,
where c₁*(X) is defined in (7.3), and
7.3 Approximation to the Marginal Posterior Den-
sity
Cornparhg to the expansion of the marginal posterior density of S> in the previ-
ous section, there is another way to do the expansion. It differs from an application
of Laplace's method to the numerator in which we expand e(+, A ) as a function of X
around &, where A+ is the maximum likelihood estirnate of X holding $J constant.
-4s a result of this approximation, we have
where A) = a2e($, X)/dA2. Ignoring terms which do not depend on 11> and
defining 1
[ c ( + ) = e(+, %) - - log 1 - ~ . \ ( f b , i d 1, (7.7) 2
being called the conditional log-likelihood suggested by Cox Sr Reid (1987) on the
ground of conditioning, we then have
This result was derived by Leonard (1982), Phillips (1983) and Tierney and Kadane (1986). With a renormalizing constant, the above approximation has a relative error of order $O(n^{-3/2})$ (Tierney and Kadane, 1986). In the following we shall verify the relative error of this approximation by expanding (7.8) and comparing to equation (7.4).
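For a concrete instance of (7.7), the following minimal sketch computes the conditional log-likelihood for the normal model with interest parameter $\mu$ and nuisance $\sigma$, and checks it against the closed form $\ell_c(\mu) = -\tfrac{1}{2}(n-1)\log\hat\sigma^2_\mu + \text{const}$ that this particular model admits (the closed form is specific to the normal case):

```python
import numpy as np

def loglik(mu, sigma, x):
    """Log-likelihood of N(mu, sigma^2) up to an additive constant."""
    n = len(x)
    return -n * np.log(sigma) - np.sum((x - mu) ** 2) / (2.0 * sigma ** 2)

def conditional_loglik(mu, x):
    """Cox & Reid (1987) conditional log-likelihood (7.7):
    ell_c(psi) = ell(psi, lam_hat_psi) - 0.5 * log|-ell_{lam,lam}|,
    here with psi = mu and nuisance lam = sigma."""
    n = len(x)
    s_mu = np.sqrt(np.mean((x - mu) ** 2))   # sigma_hat held at mu
    j_lam = 2.0 * n / s_mu ** 2              # -d^2 ell / d sigma^2 at s_mu
    return loglik(mu, s_mu, x) - 0.5 * np.log(j_lam)

rng = np.random.default_rng(0)
x = rng.normal(1.0, 2.0, size=20)
n = len(x)
mu_grid = np.linspace(0.0, 2.0, 5)
lc = np.array([conditional_loglik(m, x) for m in mu_grid])
# Closed-form reduction -((n - 1)/2) * log(sigma_hat_mu^2), up to a constant
ref = np.array([-0.5 * (n - 1) * np.log(np.mean((x - m) ** 2)) for m in mu_grid])
print(np.allclose(lc - lc[0], ref - ref[0]))   # True
```

The agreement (up to an additive constant) confirms that the $-\tfrac{1}{2}\log$ adjustment in (7.7) reduces the exponent of $\hat\sigma^2_\mu$ from $n/2$ to $(n-1)/2$ in this model.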
From $\ell_\lambda(\psi, \hat\lambda_\psi) = 0$, expanding $\ell_\lambda(\psi, \hat\lambda_\psi)$ as a function of $\psi$ and $\hat\lambda_\psi$ around $\hat\psi$ and $\hat\lambda$, and solving for $\hat\lambda_\psi - \hat\lambda$, we then obtain
Now expanding $\ell(\psi, \hat\lambda_\psi)$ as a function of $\psi$ and $\hat\lambda_\psi$ around $\hat\psi$ and $\hat\lambda$, and using equation (7.9), we get
Similarly we have
and
Hence we finally have
where $Q_1^*(\rho_1)$, $Q_2^*(\rho_1)$ are defined in (7.5), (7.6). Comparing (7.13) with equation (7.4), we see that approximation (7.8) with a renormalizing constant to the marginal posterior density of $\psi$ indeed has relative error of order $O(n^{-3/2})$.
7.4 Marginal Posterior Distribution
In this section we shall derive the marginal posterior distribution of $\psi$ based on the approximation (7.8) to the marginal posterior density of $\psi$. First we make a transformation from $\psi$ to $r$, the signed root of the conditional likelihood ratio (SRCLR) statistic defined as
$$r = \mathrm{sgn}(\tilde\psi - \psi)\bigl\{2[\ell_c(\tilde\psi) - \ell_c(\psi)]\bigr\}^{1/2}, \qquad (7.14)$$
where $\tilde\psi$ is the maximum likelihood estimate based on the conditional log-likelihood $\ell_c(\psi)$ in (7.7). Next we need the expansion of $r$ with respect to $\psi$, or in fact $\rho_1$, the standardized version of $\psi$.
To get the expansion of $\ell_c(\psi) - \ell_c(\tilde\psi)$, it is convenient to write $\ell_c(\psi) - \ell_c(\tilde\psi)$ as the difference between $\ell_c(\psi) - \ell_c(\hat\psi)$ and $\ell_c(\tilde\psi) - \ell_c(\hat\psi)$ and to expand both terms. It follows from (7.10) and (7.11) that
where $\tilde\rho_1 = \rho_1|_{\psi = \tilde\psi}$. Hence we need to evaluate $\tilde\rho_1$ further.
From $\ell_\lambda(\psi, \hat\lambda_\psi) = 0$, taking the derivative with respect to $\psi$, we have
Now taking the derivative of $\ell_c(\psi)$ with respect to $\psi$ and using the above equation, then noting $\ell_c'(\tilde\psi) = 0$, we get
Expanding the functions in the above equation and retaining the leading-order terms, we find
$$2\bigl[\hat\ell_{20}(\tilde\psi - \hat\psi) + \hat\ell_{11}(\hat\lambda_{\tilde\psi} - \hat\lambda)\bigr] + \cdots = 0, \qquad (7.17)$$
requiring that $\tilde\psi - \hat\psi$ and $\hat\lambda_{\tilde\psi} - \hat\lambda$ are $O(n^{-1})$. Solving equation (7.17) for $\tilde\psi - \hat\psi$ by using equation (7.9), we finally obtain
Equation (7.16) then has expansion
Combining (7.19) and (7.15) gives
Making another simple expansion, we then obtain the expansion of $r$ with respect to $\rho_1$, and by inversion, the expansion of $\rho_1$ with respect to $r$:
Taking the derivative on both sides of equation (7.21), we have
Now inserting (7.21) in equation (7.12), we get
Using (7.18) on expansion (7.12), we have
CEAPTER 7. CONDIT'L DIST'N OF THE SRCLR STATISTIC 112
Thus, by taking the ratio of the above two equations, we obtain
A direct calculation by application of (7.18) gives
(7.24)
Hence we have the following expansion from (7.22), (7.23) and (7.24)
where
In order to get a Lugannani & Rice type formula, we now define the statistic $u$ as
From (7.22), (7.23) and (7.24) we have the following expansion of $u^{-1}$ with respect to $r$:
We now summarize the above expansions into a theorem giving the approximation of the marginal posterior density with an explicit form of the renormalizing constant, and the approximation of the marginal posterior distribution both in the Lugannani & Rice form and in the form of a direct expansion.
Theorem 7.1 Suppose that the log-likelihood function is $\ell(\theta)$, where $\theta = (\psi, \lambda)$ has prior $\pi(\psi, \lambda)$. The conditional log-likelihood $\ell_c(\psi)$ is defined as in (7.7), and $\tilde\psi$ is the maximum likelihood estimate of $\psi$ based on $\ell_c(\psi)$. Under the regularity conditions as stated in the introduction,
• the marginal posterior density of $\psi$ has the approximation
with a relative error of order $O(n^{-3/2})$, where $H_2^*(x)$ is defined in (7.26);
• the marginal posterior distribution of $\psi$ has the following expansion
where
and $H_1^*(x)$, $H_2^*(x)$ are defined in (7.25), (7.26). □
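The Lugannani & Rice form of the expansion combines the normal approximation $\Phi(r)$ with a correction involving the statistic $u$. The generic shape of such a tail-area formula, $F \approx \Phi(r) + \phi(r)(1/r - 1/u)$, can be sketched as follows (a generic template only; the particular $u$ of this chapter depends on the expansions above and is not reproduced here):

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def lugannani_rice(r, u):
    """Generic Lugannani & Rice type distribution-function formula
    F ~ Phi(r) + phi(r) * (1/r - 1/u), valid for r != 0."""
    return Phi(r) + phi(r) * (1.0 / r - 1.0 / u)

# When u coincides with r the correction vanishes and Phi(r) is recovered.
print(abs(lugannani_rice(1.5, 1.5) - Phi(1.5)) < 1e-12)  # True
```

When $u$ and $r$ agree to first order, the correction term is itself $O(n^{-1/2})$, which is what makes the combined formula higher-order accurate.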
7.5 Frequentist Conditional Distribution of the
SRCLR Statistic
Unlike in the scalar parameter case, in the case with a nuisance parameter the
frequentist conditional distribution of the SRCLR statistic depends on the nuisance parameter in general. Also, the observed balance relations are too complicated to express. Here we consider only location-scale models.
Suppose that $X_1, \ldots, X_n$ are i.i.d. observations from the location-scale model with density
where $h(\cdot)$ is a known function. We assume that $\mu$ is the parameter of interest and $\sigma$ is the nuisance parameter, i.e. $\theta = (\psi, \lambda) = (\mu, \sigma)$. It can be shown that the configuration statistic $a = (a_1, \ldots, a_n)^T$, where $a_i$ is defined as
$$a_i = (x_i - \hat\mu)/\hat\sigma,$$
is an ancillary statistic for $\theta$ (Fraser, 1979). Now we make a transformation from $x$ to $(\hat\mu, \hat\sigma, a)$. Let $g(\cdot) = \log h(\cdot)$; then the log-likelihood function can be expressed as
In the following we shall use the shrinkage argument to derive the frequentist conditional distribution of the SRCLR statistic $r$ defined in (7.14) given the ancillary statistic $a$.
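For the normal model the MLEs have closed form, so the configuration statistic $a_i = (x_i - \hat\mu)/\hat\sigma$ can be computed and its ancillarity illustrated directly: relocating and rescaling the data leaves $a$ unchanged. A small sketch (for a general density $h$ the MLEs would have to be found numerically):

```python
import numpy as np

def configuration_statistic(x):
    """Configuration statistic a_i = (x_i - mu_hat) / sigma_hat.
    For the normal location-scale model the MLEs are the sample mean
    and the (1/n)-denominator standard deviation."""
    mu_hat = np.mean(x)
    sigma_hat = np.sqrt(np.mean((x - mu_hat) ** 2))
    return (x - mu_hat) / sigma_hat

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=10)
a1 = configuration_statistic(x)
a2 = configuration_statistic(3.0 + 2.5 * x)   # relocate and rescale the data
print(np.allclose(a1, a2))  # True: a depends on the data only through shape
```

The invariance holds because the MLEs are equivariant under location-scale transformations, which is exactly why $a$ carries no information about $(\mu, \sigma)$.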
Now we define more notation (see also §7.1 and §7.2):
Note that $p$, $p_i$, $V$, $A_{ij}$ are functions of the ancillary $a$ only in location-scale models.
In the previous section we obtained the posterior distribution of the SRCLR statistic $r$. From Theorem 7.1,
where $H_1^*(\hat\theta, a)$, $H_2^*(\hat\theta, a)$ are defined in (7.25), (7.26). Here we change the dependence on the data from $x$ to $(\hat\theta, a)$. This change does not affect the values of the functions $H_1^*(\cdot)$, $H_2^*(\cdot)$.
Now taking the expectation of the posterior distribution of $r$ with respect to the conditional distribution of $\hat\theta$ given the ancillary statistic $a$, we get
$$E_{(\hat\theta \mid a)}\bigl\{E^{\pi}[1(r \le r_0) \mid \hat\theta, a]\bigr\} = E_{(\hat\theta \mid a)}\Bigl\{\Phi(r_0) - \frac{1}{\sqrt{n}}\,\phi(r_0)H_1^*(\hat\theta, a) - \cdots\Bigr\}$$
where the functions $\bar H_1(\theta, a)$, $\bar H_2(\theta, a)$ are obtained by substituting $\hat\theta$ with $\theta$ in the functions $H_1^*(\hat\theta, a)$, $H_2^*(\hat\theta, a)$.
The final step of the shrinkage argument is to take the expectation with respect to the prior $\pi(\cdot)$. We now assume that the prior $\pi$ and its derivatives vanish on the boundary of a rectangle containing the true parameter $\theta$ as an interior point. Using integration by parts, we have
$$E_{\pi}\Bigl(E_{(\hat\theta \mid a)}\bigl\{E^{\pi}[1(r \le r_0) \mid \hat\theta, a]\bigr\}\Bigr) = \int \Bigl(\Phi(r_0) - \frac{1}{\sqrt{n}}\,\phi(r_0)\bar H_1(\theta, a) - \cdots\Bigr)\pi(\theta)\,d\theta$$
where
Finally we let the prior $\pi$ converge weakly to the degenerate measure at the true parameter $\theta$. This leads to the following result.
Theorem 7.2 If the parameter of interest is the location parameter and the log-likelihood function is defined as in (7.33), then under the regularity conditions as stated in §7.1, the frequentist conditional distribution of the SRCLR statistic $r$ given the ancillary statistic $a$ has the following asymptotic expansion
where $G_1(a)$, $G_2(a)$ are defined in (7.34), (7.35). □
The functions $G_1(a)$ and $G_2(a)$ in the expansion of the frequentist conditional distribution of $r$ can be expressed in the form of the scalar parameter case of Chapter 5, but the sample space differentiations and the observed balance relations involved make the resulting expressions too complicated to be useful. The quantity $D$ is introduced to simplify the various expressions of the sample space derivatives.
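To first order the expansion above reduces to the standard normal law for $r$. A Monte Carlo sketch for the normal location case illustrates this, assuming (as in Example 7.1 below) that $\ell_c(\mu)$ reduces to $-\tfrac{1}{2}(n-1)\log\hat\sigma^2_\mu$ up to a constant:

```python
import numpy as np

def srclr_location(mu0, x):
    """SRCLR statistic at mu = mu0 for the normal location model,
    using ell_c(mu) = -((n - 1)/2) * log(mean((x - mu)^2)) + const."""
    n = len(x)
    lc = lambda m: -0.5 * (n - 1) * np.log(np.mean((x - m) ** 2))
    mu_t = np.mean(x)   # MLE under the conditional log-likelihood
    return np.sign(mu_t - mu0) * np.sqrt(2.0 * (lc(mu_t) - lc(mu0)))

rng = np.random.default_rng(3)
n, reps = 10, 10000
r = np.array([srclr_location(0.0, rng.normal(0.0, 1.0, n)) for _ in range(reps)])
# To first order r is N(0, 1): P(r <= 1) should be near Phi(1) = 0.841
cover = float(np.mean(r <= 1.0))
print(cover)
```

Even at $n = 10$ the simulated probability lies close to the normal value, with the residual discrepancy being the higher-order terms the theorem quantifies.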
Example 7.1 Suppose that we have i.i.d. observations $X_1, \ldots, X_n$ from the normal distribution $N(\mu, \sigma^2)$. The parameter of interest is the location parameter
$\mu$. The function $g(x) = -x^2/2$. The log-likelihood function is
The maximum likelihood estimate is $(\hat\mu, \hat\sigma) = \bigl(\bar X, \sqrt{\sum_i (X_i - \bar X)^2/n}\bigr)$, where $\bar X = \sum_i X_i / n$. The configuration statistic is $a = (a_1, \ldots, a_n)^T$, where
Holding $\mu$ constant, we obtain $\hat\sigma^2_\mu = \sum_i (X_i - \mu)^2 / n$. Defining the conditional log-likelihood as
the maximum likelihood estimate of $\mu$ based on $\ell_c(\mu)$ is $\tilde\mu = \hat\mu = \bar X$. Now defining the SRCLR statistic
and
Note that $G_1(a)$ and $G_2(a)$ do not depend on the ancillary $a$, so what we get from Theorem 7.2 is the frequentist unconditional distribution of the SRCLR statistic $r$,
then $T$ follows a $t_{(n-1)}$ distribution. So the above expansion of the distribution of the SRCLR statistic $r$ is equivalent to the following approximation to the $t_{(n-1)}$ distribution function,
$$r_0 = \mathrm{sgn}(T_0)\sqrt{(n-1)\log\Bigl(1 + \frac{T_0^2}{n-1}\Bigr)}.$$
The density of the $t_{(n-1)}$ distribution is
Expanding the density in terms of $r_0$ and comparing with the approximation obtained by differentiating the approximate distribution function, we find that the above approximation to the distribution function is equivalent to the approximation of $\Gamma(\tfrac{n}{2})/\Gamma(\tfrac{n-1}{2})$ by $\bigl(\tfrac{n-1}{2}\bigr)^{1/2}\bigl(1 - \tfrac{1}{4n}\bigr)$, with a relative error of $O(n^{-3/2})$, actually $O(n^{-2})$, because there is no $O(n^{-3/2})$ term in the expansion of $\Gamma(\tfrac{n}{2})/\Gamma(\tfrac{n-1}{2})$. □
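The monotone relation between $r_0$ and the $t$ statistic $T_0$ in this example can be confirmed numerically; a minimal sketch, using the closed form $\ell_c(\mu) = -\tfrac{1}{2}(n-1)\log\hat\sigma^2_\mu$ (up to a constant) for the normal model:

```python
import numpy as np

def srclr(mu, x):
    """SRCLR statistic r: signed root of 2 * [ell_c(mu_tilde) - ell_c(mu)],
    with ell_c(mu) = -((n - 1)/2) * log(mean((x - mu)^2)) + const."""
    n = len(x)
    lc = lambda m: -0.5 * (n - 1) * np.log(np.mean((x - m) ** 2))
    mu_tilde = np.mean(x)             # MLE under ell_c
    return np.sign(mu_tilde - mu) * np.sqrt(2.0 * (lc(mu_tilde) - lc(mu)))

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=15)
n, mu0 = len(x), 0.3
# Student-t statistic T0 and the claimed transformation to r0
T0 = np.sqrt(n) * (np.mean(x) - mu0) / np.std(x, ddof=1)
r0 = np.sign(T0) * np.sqrt((n - 1) * np.log1p(T0 ** 2 / (n - 1)))
print(np.isclose(srclr(mu0, x), r0))   # True
```

Agreement to machine precision follows from the algebraic identity $\hat\sigma^2_\mu/\hat\sigma^2 = 1 + T^2/(n-1)$, so $r$ is an exactly monotone transformation of $T$ in this example.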
If the parameter of interest is the scale parameter $\sigma$, we let $\theta = (\psi, \lambda) = (\sigma, \mu)$, and the log-likelihood function is
In the following we summarize the result in this case into a theorem.
Theorem 7.3 If the parameter of interest is the scale parameter and the log-likelihood function is defined as in (7.37), then the frequentist conditional distribution of the SRCLR statistic $r$ given the ancillary statistic $a$ has the following
expansion
where
Remark 1. It is possible to derive the frequentist conditional distribution of the SRCLR statistic $r$ given an ancillary statistic for general transformation models. We do not attempt this further here, but merely indicate the possibility of this approach in comparison with other approaches, e.g. Fraser (1979), DiCiccio, Field & Fraser (1990). In applications, the formula obtained by DiCiccio, Field & Fraser (1990) is more convenient to use than the one derived here.
Remark 2. For general exponential models, the expansion of the SRCLR statistic $r$ depends on the nuisance parameter. In fact, in exponential models there is no ancillary statistic left after the sufficiency reduction, and the statistic for inference is instead conditioned on another statistic (see Skovgaard, 1987; Fraser & Reid, 1993).
Remark 3. If we want to use the shrinkage argument in general models, the expansion of the frequentist conditional distribution of $r$ depends on the nuisance parameter, but it might still serve as a tool for verifying results obtained by other approaches.
Bibliography
[1] Barndorff-Nielsen, O.E. (1980). Conditionality resolutions. Biometrika 67, 293-310.
[2] Barndorff-Nielsen, O.E. (1983). On a formula for the distribution of the maximum likelihood estimator. Biometrika 70, 343-365.
[3] Barndorff-Nielsen, O.E. (1986). Inference on full or partial parameters based on the standardized, signed log likelihood ratio. Biometrika 73, 307-322.
[4] Barndorff-Nielsen, O.E. (1991). Modified signed log likelihood ratio. Biometrika 78, 557-563.
[5] Barndorff-Nielsen, O.E. and Blaesild, P. (1986). A note on the calculation of Bartlett adjustments. J. R. Statist. Soc. B 48, 353-358.
[6] Barndorff-Nielsen, O.E. and Cox, D.R. (1984). Bartlett adjustments to the likelihood ratio statistic and the distribution of the maximum likelihood estimator. J. R. Statist. Soc. B 46, 483-495.
[7] Barndorff-Nielsen, O.E. and Cox, D.R. (1989). Asymptotic Techniques for Use in Statistics. Chapman and Hall, London.
[8] Barndorff-Nielsen, O.E. and Cox, D.R. (1994). Inference and Asymptotics. Chapman and Hall, London.
[9] Barndorff-Nielsen, O.E. and Hall, P. (1988). On the level-error after Bartlett adjustment of the likelihood ratio statistic. Biometrika 75, 374-378.
[10] Berger, J.O. and Bernardo, J.M. (1989). Estimating a product of means: Bayesian analysis with reference priors. J. Amer. Statist. Assoc. 84, 200-207.
[11] Berger, J.O. and Bernardo, J.M. (1991). Reference priors in a variance components problem. Bayesian Inference in Statistics and Econometrics (P. Goel and N.S. Iyengar, eds.). Springer-Verlag, New York.
[12] Berger, J.O. and Bernardo, J.M. (1992a). Ordered group reference priors with application to the multinomial problem. Biometrika 79, 25-37.
[13] Berger, J.O. and Bernardo, J.M. (1992b). On the development of the reference prior method. Bayesian Statistics 4: Proceedings of the Fourth Valencia International Meeting (J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith, eds.). Clarendon Press, Oxford, 35-60.
[14] Berger, J.O. and Yang, Ruo-yong (1992). Noninformative priors and Bayesian testing for the AR(1) model. Technical Report 92-45C, Department of Statistics, Purdue University.
[15] Bernardo, J.M. (1979). Reference posterior distributions for Bayesian inference (with discussion). J. R. Statist. Soc. B 41, 113-147.
[16] Bickel, P.J. and Ghosh, J.K. (1990). A decomposition for the likelihood ratio statistic and the Bartlett correction - a Bayesian argument. Ann. Statist. 18, 1070-1090.
[17] Cox, D.R. and Reid, N. (1987). Parameter orthogonality and approximate conditional inference. J. R. Statist. Soc. B 49, 1-18.
[18] Daniels, H.E. (1954). Saddlepoint approximations in statistics. Ann. Math. Statist. 25, 631-650.
[19] Datta, G.S. and Ghosh, J.K. (1995a). On priors providing frequentist validity for Bayesian inference. Biometrika 82, 37-46.
[20] Datta, G.S. and Ghosh, J.K. (1995b). Noninformative priors for maximal invariant parameter in group models. Test 4, 95-114.
[21] Datta, G.S. and Ghosh, M. (1995). Some remarks on noninformative priors. J. Amer. Statist. Assoc. 90, 1357-1363.
[22] Datta, G.S. and Ghosh, M. (1996). On the invariance of noninformative priors. Ann. Statist. 24, 141-159.
[23] Dawid, A.P. (1991). Fisherian inference in likelihood and prequential frames of reference (with discussion). J. R. Statist. Soc. B 53, 79-109.
[24] DiCiccio, T.J., Field, C.A. and Fraser, D.A.S. (1990). Approximation of marginal tail probabilities and inference for scalar parameters. Biometrika 77, 77-95.
[25] DiCiccio, T.J. and Martin, M.A. (1991). Approximations of marginal tail probabilities for a class of smooth functions with applications to Bayesian and conditional inference. Biometrika 78, 891-902.
[26] DiCiccio, T.J. and Martin, M.A. (1993). Simple modifications for signed roots of likelihood ratio statistics. J. R. Statist. Soc. B 55, 305-316.
[27] DiCiccio, T.J. and Stern, S.E. (1993). On Bartlett adjustments for approximate Bayesian inference. Biometrika 80, 731-740.
[28] Efron, B. (1986). Why isn't everyone a Bayesian? Amer. Statistician 40, 1-11.
[29] Efron, B. (1993). Bayes and likelihood calculations from confidence intervals. Biometrika 80, 3-26.
[30] Efron, B. and Hinkley, D.V. (1978). Assessing the accuracy of the maximum likelihood estimator: observed versus expected information. Biometrika 65, 457-487.
[31] Fraser, D.A.S. (1962). On the consistency of the fiducial method. J. R. Statist. Soc. B 24, 425-434.
[32] Fraser, D.A.S. (1964). Local conditional sufficiency. J. R. Statist. Soc. B 26, 52-62.
[33] Fraser, D.A.S. (1979). Inference and Linear Models. McGraw-Hill, New York.
[34] Fraser, D.A.S. (1988). Normed likelihood as saddlepoint approximation. J. Mult. Anal. 27, 181-193.
[35] Fraser, D.A.S. (1990). Tail probabilities from observed likelihoods. Biometrika 77, 65-76.
[36] Fraser, D.A.S. and Reid, N. (1988). On conditional inference for a real parameter: a differential approach on the sample space. Biometrika 75, 251-264.
[37] Fraser, D.A.S. and Reid, N. (1989). Adjustments to profile likelihood. Biometrika 76, 477-488.
[38] Fraser, D.A.S. and Reid, N. (1993). Simple asymptotic connections between densities and cumulant generating function leading to accurate approximations for distribution functions. Statist. Sinica 3, 67-82.
[39] Fraser, D.A.S. and Reid, N. (1995). Ancillaries and third order significance. Utilitas Mathematica 47, 33-53.
[40] Garvan, C.W. and Ghosh, M. (1996). Noninformative priors for dispersion models. Preprint.
[41] Ghosh, J.K. (1994). Higher Order Asymptotics. NSF-CBMS Regional Conference Series in Probability and Statistics, Vol. 4. IMS.
[42] Ghosh, J.K. and Mukerjee, R. (1991). Characterization of priors under which Bayesian and frequentist Bartlett corrections are equivalent in the multiparameter case. J. Mult. Anal. 38, 385-393.
[43] Ghosh, J.K. and Mukerjee, R. (1992a). Non-informative priors (with discussion). Bayesian Statistics 4 (J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith, eds.). Oxford University Press, 195-210.
[44] Ghosh, J.K. and Mukerjee, R. (1992b). Bayesian and frequentist Bartlett corrections for likelihood ratio and conditional likelihood ratio tests. J. R. Statist. Soc. B 54, 867-875.
[45] Ghosh, J.K. and Mukerjee, R. (1993a). Frequentist validity of highest posterior density regions in the multiparameter case. Ann. Inst. Statist. Math. 45, 293-
[46] Ghosh, J.K. and Mukerjee, R. (1993b). On priors that match posterior and frequentist distribution functions. Canadian J. Statist. 21, 89-96.
[47] Ghosh, J.K. and Mukerjee, R. (1994). Adjusted versus conditional likelihood: power properties and Bartlett-type adjustment. J. R. Statist. Soc. B 56, 185-188.
[48] Ghosh, J.K. and Mukerjee, R. (1995a). On perturbed ellipsoidal and highest posterior density regions with approximate frequentist validity. J. R. Statist. Soc. B 57, 761-769.
[49] Ghosh, J.K. and Mukerjee, R. (1995b). Frequentist validity of highest posterior density regions in the presence of nuisance parameters. Statistics & Decisions 13, 131-139.
[50] Ghosh, M. and Mukerjee, R. (1996). Recent developments on probability matching priors. Preprint.
[51] Ghosh, M., Carlin, B.P. and Srivastava, M.S. (1995). Probability matching priors for linear calibration. Test 4, 333-357.
[52] Ghosh, M. and Yang, M.-Ch. (1996). Noninformative priors for the two sample normal problem. Test 5, 145-157.
[53] Hartigan, J.A. (1964). Invariant prior distributions. Ann. Math. Statist. 35, 836-845.
[54] Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proc. R. Soc. London A 186, 453-461.
[55] Johnson, R.A. (1970). Asymptotic expansions associated with posterior distributions. Ann. Math. Statist. 41, 851-864.
[56] Kass, R.E. and Wasserman, L. (1996). Formal rules for selecting prior distributions: a review and annotated bibliography. J. Amer. Statist. Assoc. 91, 1343-1370.
[57] Laplace, P.S. (1820). Essai philosophique sur les probabilités. English translation: Philosophical Essays on Probabilities, 1951. Dover, New York.
[58] Lee, C.B. (1989). Comparison of frequentist coverage probability and Bayesian posterior coverage probability, and applications. Ph.D. thesis, Purdue University.
[59] Lugannani, R. and Rice, S.O. (1980). Saddlepoint approximation for the distribution of the sums of independent random variables. Adv. Appl. Prob. 12, 475-490.
[60] McCullagh, P. (1987). Tensor Methods in Statistics. Chapman and Hall, London.
[61] McCullagh, P. and Tibshirani, R. (1990). A simple method for the adjustment of profile likelihoods. J. R. Statist. Soc. B 52, 325-344.
[62] Mukerjee, R. and Dey, D.K. (1993). Frequentist validity of posterior quantiles in the presence of nuisance parameters: higher order asymptotics. Biometrika 80, 499-505.
[63] Mukerjee, R. and Ghosh, M. (1996). Second order probability matching priors. Preprint.
[64] Nicolau, A. (1993). Bayesian intervals with good frequentist behaviour in the presence of nuisance parameters. J. R. Statist. Soc. B 55, 377-390.
[65] Peers, H.W. (1965). On confidence points and Bayesian probability points in the case of several parameters. J. R. Statist. Soc. B 27, 9-16.
[66] Peers, H.W. (1968). Confidence properties of Bayesian interval estimates. J. R. Statist. Soc. B 30, 535-544.
[67] Pierce, D.A. and Peters, D. (1992). Practical use of higher order asymptotics for multiparameter exponential families (with discussion). J. R. Statist. Soc. B 54, 701-738.
[68] Rao, C.R. and Mukerjee, R. (1995). On posterior credible sets based on the score statistic. Statist. Sinica 5, 781-791.
[69] Reid, N. (1988). Saddlepoint methods and statistical inference (with discussion). Statist. Sci. 3, 213-238.
[70] Reid, N. (1995). Likelihood and Bayesian approximation methods (with discussion). Bayesian Statistics 5 (J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith, eds.). Oxford University Press, 1-18.
[71] Severini, T.A. (1991). On the relationship between Bayesian and non-Bayesian interval estimates. J. R. Statist. Soc. B 53, 611-615.
[72] Severini, T.A. (1993). Bayesian interval estimates which are also confidence intervals. J. R. Statist. Soc. B 55, 533-540.
[73] Skovgaard, I.M. (1987). Saddlepoint expansions for conditional distributions. J. Appl. Prob. 24, 875-887.
[74] Skovgaard, I.M. (1990). On the density of minimum contrast estimators. Ann. Statist. 18, 779-789.
[75] Stein, C. (1985). On the coverage probability of confidence sets based on a prior distribution. Sequential Methods in Statistics, Banach Center Publications 16, 485-514. PWN-Polish Scientific Publishers, Warsaw.
[76] Sun, D. and Ye, K. (1995). Reference prior Bayesian analysis for normal mean products. J. Amer. Statist. Assoc. 90, 589-597.
[77] Sun, D. and Ye, K. (1996). Frequentist validity of posterior quantiles for a two-parameter exponential family. Biometrika 83, 55-65.
[78] Sweeting, T.J. (1995a). A framework for Bayesian and likelihood approximations in statistics. Biometrika 82, 1-24.
[79] Sweeting, T.J. (1995b). A Bayesian approach to approximate conditional inference. Biometrika 82, 25-36.
[80] Tibshirani, R.J. (1989). Non-informative priors for one parameter of many. Biometrika 76, 604-608.
[81] Tierney, L.J. and Kadane, J.B. (1986). Accurate approximations for posterior moments and marginal densities. J. Amer. Statist. Assoc. 81, 82-86.
[82] Walker, A.M. (1969). On the asymptotic behaviour of the posterior distribution. J. R. Statist. Soc. B 31, 80-88.
[83] Welch, B.L. and Peers, H.W. (1963). On formulae for confidence points based on integrals of weighted likelihoods. J. R. Statist. Soc. B 25, 318-329.