C10ed09v11.doc 4/28/2011 5:18 PM 1
Predicting the Mean using a Sampling Model and an Exchangeable Bayesian Model in a Finite Population
Ed Stanek
Introduction

We develop the predictor of a population mean based on an exchangeable Bayesian model similar to that presented by Ericson (1969, 1988), while simultaneously discussing estimation of the population mean in a finite population sampling model. An earlier detailed description of a related framework and an example is given in c10ed05v5.doc. We borrow some notation and ideas from this document. We include an auxiliary variable in the problem description, but do not discuss use of the auxiliary variable in the analysis. In a later document, we consider the auxiliary variable to be measurement error, and replace a subject's latent value (i.e. response) in the data set with a response that includes measurement error. In Bayesian models, we consider the prior distribution to be an exchangeable distribution defined by a permutation of subjects in a population. For clarity, we introduce the notation and ideas in the context of a simple example. We focus our discussion on estimating/predicting the finite population mean, but note that this investigation was motivated by the desire to better understand prediction of subjects' latent values in the context of a mixed model.

First, we define notation for the observed response and an auxiliary variable on each subject in a set of $n$ subjects, taking $n = 2$. This constitutes the data set. In order to represent response for the set as a response vector, we define a sequence of subjects, and represent response for the sequence. This ordering is artificial to the problem, and is introduced solely to represent response in a vector. As a result, different orderings of subjects could be considered, with response for the data set described by the response vector for the corresponding ordering. We introduce notation using permutation matrices that allows the response vector to be specified in terms of any of these possible orderings.
Next, we define the population, assuming that subjects in the data set are part of the population. We define a vector of response for subjects in the population, with the first part of the vector corresponding to response for the subjects in the data set. The definition specifies the subset of subjects in the population where response is observed, and we assume that the value of response observed is the same value as in the data set. Our example includes $N$ subjects in the population, where $N = 3$. Each subject in the population has response and an auxiliary variable. We define parameters corresponding to the mean and variance of response in the population, and similar parameters for the data set.

In a Bayesian analysis, there is a prior distribution associated with the population. We define a discrete prior distribution, where points in the prior correspond to the response vector for different sequences of subjects in the population. Associated with each point is a probability. When these prior probabilities are equal for all sequences, the prior distribution is an exchangeable prior distribution. We define sequences and response for each of the $N!$ points in the prior next, using a nested notation to index sequences. The notation has three levels, with the first level indexing the set of $n$ subjects occupying the first $1,\ldots,n$ positions in the prior listing, the second level indexing permutations of subjects in the set, and the third level indexing permutations of subjects in the remaining $n+1,\ldots,N$ positions. Response for a point in the prior distribution represents a response vector for the entire population of $N$ subjects for a prior listing. The data set consists of response for $n$ subjects. For a point in the prior distribution, the data set corresponds to a subset of $n$ subjects from the prior listing. We define all possible subsets of $n$ subjects from each prior listing next.
In order to systematically represent the subsets of the prior listings, we identify the subsets in an identical manner for each prior listing by specifying the positions in the prior listing that are included in the subset. For each prior listing, there are $\binom{N}{n}$ distinct subsets. The joint prior/sampling distribution will have $N!\binom{N}{n}$ points. We indicate these points by representing response for each prior listing in $\binom{N}{n}$ response vectors, where the first $n$ responses in the response vector correspond to response for a distinct subset of subjects. Recall that we have associated with each prior listing a prior probability. In defining the joint prior/sampling distribution, we need to assign probabilities to distinct subsets (samples) of a prior listing. We define such probabilities so that for each prior listing, they sum to one. As a result, we associate with each point in the joint prior/sampling distribution a probability equal to the product of the prior probability times the probability of the subset.

We next discuss the posterior distribution, which is equal to the joint prior/sampling distribution, given the data set. The data set specifies the subset of subjects where response is observed. When conditioning on the data set, the subset of subjects in the joint prior/sampling distribution must match those in the data set. Since not all points in the joint prior/sampling distribution satisfy this criterion, such points will have zero probability in the posterior distribution. We summarize points with positive probability in the posterior distribution. This posterior distribution is easy to specify since the data set is clearly linked to the prior distribution, and consequently can be used to identify the sample sets. The posterior distribution is the joint distribution of two independent permutation distributions. One permutation vector is the vector of responses for subjects in the data set. The other permutation vector is the vector of responses for subjects not included in the data set. Each has a distinct expected value and variance. We evaluate the expected value and variance of the mean of the sample portion of the posterior distribution. The mean has expected value equal to the mean in the data set (i.e. sample), with zero variance. This result matches an intuitive idea of the expected value and variance.
We contrast this development with a development of the distribution of the sample mean based on usual finite population sampling. A similar evaluation is made of the expected value and variance of the sample mean. The mean has expected value equal to the population mean, with the variance equal to the usual finite population variance.

The Data Set

We assume that a data set consists of the label, response, and auxiliary variable values for $n$ subjects, where for our example, $n = 2$. We denote the set of subjects by $h^* = \{ID_1, ID_2, \ldots, ID_n\}$, and the response and auxiliary variable for the subjects by $\left\{ (x_{ID_1}\; a_{ID_1}),\; (x_{ID_2}\; a_{ID_2}),\; \ldots,\; (x_{ID_n}\; a_{ID_n}) \right\}$, where $x_{Name}$ is the latent value of response for a subject, and $a_{Name}$ is the value of an auxiliary variable for the subject. For our example, the set of subjects is $h^+ = \{Rose, Lily\}$, and the set of response and auxiliary variables for the subjects is $\left\{ (x_{Lily}\; a_{Lily}),\; (x_{Rose}\; a_{Rose}) \right\}$. We order the subjects in a vector $\boldsymbol{\lambda} = (ID_1\; ID_2\; \cdots\; ID_n)'$. In the example, we define the order as $\boldsymbol{\lambda} = (Rose\; Lily)'$, and the
corresponding response as $\mathbf{x} = (x_{Rose}\; x_{Lily})'$. Let us define new labels for the subjects in order $\boldsymbol{\lambda}$ by $s = 1,\ldots,n$, where for the example, $\mathbf{s} = \begin{pmatrix} s = 1 \\ s = 2 \end{pmatrix}$ corresponds to $\boldsymbol{\lambda} = \begin{pmatrix} Rose \\ Lily \end{pmatrix}$. Using these labels, we represent response for subject $s$ by $x_s$, and define for the example $\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_R \\ x_L \end{pmatrix}$.
Since for set $h^*$ we could arrange subjects in different orders, we note that it is possible to define a vector of responses with a different subject order. We define a set of these possible response vectors by $\left\{ \mathbf{v}_m' \begin{pmatrix} x_R \\ x_L \end{pmatrix},\; m = 1,\ldots,M \right\}$, where $M = n!$, $\mathbf{v}_m$ is an $n \times n$ permutation matrix with elements equal to zero or one, all rows and columns summing to one, and where we define $\mathbf{v}_1' = \mathbf{I}_n$. With these definitions, response for the subjects in data set $h^*$ is given by any of the vectors in the set $\left\{ \mathbf{v}_m' \mathbf{x},\; m = 1,\ldots,M \right\}$. The vector of auxiliary variables corresponding to the set $h^*$ is defined in a similar manner as any of the vectors in the set $\left\{ \mathbf{v}_m' \mathbf{a},\; m = 1,\ldots,M \right\}$, where $\mathbf{a} = \mathbf{v}_1' \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} a_R \\ a_L \end{pmatrix}$.
The Population

We define a population as a set $f$ of $N \ge n$ subjects that include the subjects in the data set. As an example, when $N = 3$ we define the set of subjects as $f = \{Lily, Rose, Daisy\}$. We define a population listing as a sequence of subjects, and specify the order of subjects in the sequence by the vector $\begin{pmatrix} \boldsymbol{\lambda}_I \\ \boldsymbol{\lambda}_{II} \end{pmatrix}$, where $\boldsymbol{\lambda}_I = \boldsymbol{\lambda}$. We assign a new subject label associated with this ordering as $j = 1,\ldots,N$, such that $\begin{pmatrix} \boldsymbol{\lambda}_I \\ \boldsymbol{\lambda}_{II} \end{pmatrix} = \begin{pmatrix} j = 1 \\ j = 2 \\ j = 3 \end{pmatrix}$ corresponds to $\begin{pmatrix} Rose \\ Lily \\ Daisy \end{pmatrix}$. We represent response for the population listing via the vector $(y_R\; y_L\; y_D)'$, or equivalently as $\mathbf{y} = (y_1\; y_2\; y_3)'$, where $y_j$ is response for the subject labeled $j$, and partition it such that $\mathbf{y} = (\mathbf{y}_I'\; \mathbf{y}_{II}')'$, noting that $\mathbf{y}_I = \mathbf{x}$ is an $n \times 1$ vector. We define population parameters corresponding to the average response as $\mu = \frac{1}{N}\sum_{j=1}^{N} y_j$ and $\sigma^2 = \frac{1}{N-1}\sum_{j=1}^{N} (y_j - \mu)^2$. We define the average response in the data set as $\mu_I = \frac{1}{n}\sum_{j=1}^{n} y_j$ and the average response in the remainder as $\mu_{II} = \frac{1}{N-n}\sum_{j=n+1}^{N} y_j$. We also define $\mu_x = \mu_I$, $\sigma_x^2 = \frac{1}{n-1}\sum_{s=1}^{n} (x_s - \mu_x)^2$ and $\sigma_{II}^2 = \frac{1}{N-n-1}\sum_{j=n+1}^{N} (y_j - \mu_{II})^2$.
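The population parameters can be checked by enumeration. The sketch below uses assumed response values (not from the document) for the $N = 3$ example, and also verifies by direct enumeration the finite population sampling results quoted in the introduction: the sample mean of a simple random sample has expectation $\mu$ and variance $(1 - n/N)\,\sigma^2/n$.

```python
import itertools

# Illustrative population of N = 3 responses (y_Rose, y_Lily, y_Daisy);
# the numeric values are assumptions for this sketch only.
y = [2.0, 4.0, 9.0]
N, n = len(y), 2

mu = sum(y) / N                                    # population mean
sigma2 = sum((v - mu) ** 2 for v in y) / (N - 1)   # population variance

# Enumerate all simple random samples of size n without replacement.
samples = list(itertools.combinations(y, n))
sample_means = [sum(s) / n for s in samples]

# Design-based expectation and variance of the sample mean.
E_ybar = sum(sample_means) / len(samples)
V_ybar = sum((m - E_ybar) ** 2 for m in sample_means) / len(samples)

assert abs(E_ybar - mu) < 1e-12                        # E(ybar) = mu
assert abs(V_ybar - (1 - n / N) * sigma2 / n) < 1e-12  # usual fpc variance
```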
The Prior Distribution (Defined as Response over Prior Listings)

Subjects in the population may be listed in a different order, which corresponds to a different sequence of subjects. We refer to such a sequence as a prior listing. There are $p = 1,\ldots,P = N!$ prior listings. We define an alternative nested notation to index the prior listings. The highest level index represents a set of $n$ subject labels, and is indexed by $h = 1,\ldots,H = \binom{N}{n}$. Within each set $h$, we index permutations of the labels by $m = 1,\ldots,M = n!$. Finally, within each permutation $m$ of each set $h$, we index permutations of the remaining subject labels not in set $h$ by $t = 1,\ldots,T = (N-n)!$. Associated with each prior listing is a prior probability, $\pi_{hmt}$, such that $\sum_{h=1}^{H}\sum_{m=1}^{M}\sum_{t=1}^{T} \pi_{hmt} = 1$. The prior distribution is defined
as the distribution of response over these prior listings.

An Initial Attempt To Index Permutations with Nested Notation

We define sequences of subjects explicitly in terms of the population listing using permutation matrices. The prior listing $hmt$ is defined by $\mathbf{j}^{(hmt)} = \begin{pmatrix} \boldsymbol{\delta}_h' \\ \boldsymbol{\delta}_h^{c\prime} \end{pmatrix}\mathbf{j}^{(mt)}$, where $\mathbf{j}^{(mt)} = \begin{pmatrix} \mathbf{v}_m' & \mathbf{0}_{n \times (N-n)} \\ \mathbf{0}_{(N-n) \times n} & \mathbf{w}_t' \end{pmatrix}\mathbf{j}$, $\boldsymbol{\delta}_h'$ is an $n \times N$ indicator matrix that identifies the subjects in set $h$ as the subjects in the vector $\boldsymbol{\delta}_h'\mathbf{j}^{(mt)}$, $\boldsymbol{\delta}_h^{c\prime}$ is the ortho-complement of $\boldsymbol{\delta}_h'$ such that $\begin{pmatrix} \boldsymbol{\delta}_h' \\ \boldsymbol{\delta}_h^{c\prime} \end{pmatrix}$ is a permutation matrix, where $\boldsymbol{\delta}_h^{c\prime}\mathbf{j}^{(mt)}$ identifies the subjects not in set $h$, $\mathbf{v}_m$ is an $n \times n$ permutation matrix with elements equal to zero or one with all rows and columns summing to one, and $\mathbf{w}_t$ is an $(N-n) \times (N-n)$ permutation matrix with elements equal to zero or one, with all rows and columns summing to one. In order to uniquely define $\boldsymbol{\delta}_h'$ for set $h$, we require the subjects in $\boldsymbol{\delta}_h'\mathbf{j}^{(mt)}$ to be in the same order as subjects in $\mathbf{j}^{(mt)}$. We also uniquely define $\boldsymbol{\delta}_h^{c\prime}$ by requiring the order of the subjects in $\boldsymbol{\delta}_h^{c\prime}\mathbf{j}^{(mt)}$ to be the same as the order of subjects in $\mathbf{j}^{(mt)}$.

We label subjects in a prior listing by $\mathbf{j}^{(hmt)} = \left( j_1^{(hmt)}\; j_2^{(hmt)}\; \cdots\; j_N^{(hmt)} \right)'$ with elements $j_\ell^{(hmt)}$, for $\ell = 1,\ldots,N$. Response is given by $\mathbf{y}^{(hmt)} = \begin{pmatrix} \boldsymbol{\delta}_h' \\ \boldsymbol{\delta}_h^{c\prime} \end{pmatrix}\mathbf{y}^{(mt)}$, where $\mathbf{y}^{(mt)} = \begin{pmatrix} \mathbf{v}_m' & \mathbf{0}_{n \times (N-n)} \\ \mathbf{0}_{(N-n) \times n} & \mathbf{w}_t' \end{pmatrix}\mathbf{y}$ and $\mathbf{y}^{(hmt)} = \left( y_1^{(hmt)}\; y_2^{(hmt)}\; \cdots\; y_N^{(hmt)} \right)'$ with elements $y_\ell^{(hmt)}$, for $\ell = 1,\ldots,N$. We define $\boldsymbol{\delta}_1' = \begin{pmatrix} \mathbf{I}_n & \mathbf{0}_{n \times (N-n)} \end{pmatrix}$, $\boldsymbol{\delta}_1^{c\prime} = \begin{pmatrix} \mathbf{0}_{(N-n) \times n} & \mathbf{I}_{N-n} \end{pmatrix}$, $\mathbf{v}_1 = \mathbf{I}_n$ and $\mathbf{w}_1 = \mathbf{I}_{N-n}$, so that $\mathbf{j}^{(111)} = \mathbf{j}$ and $\mathbf{y}^{(111)} = \mathbf{y}$. The prior distribution of response is given by $\mathbf{Y} = \sum_{h=1}^{H}\sum_{m=1}^{M}\sum_{t=1}^{T} I_{hmt}\,\mathbf{y}^{(hmt)}$, where $I_{hmt}$ is an indicator random variable with
$E_\xi\left(I_{hmt}\right) = \pi_{hmt}$ (which we represent by $\pi_p$ when we use the simpler index $p = 1,\ldots,P$). An example of the prior listings and prior distribution of response for the example is given in Table 1. There are $N! = 3! = 6$ sequences corresponding to the prior listings in Table 1. The listings are organized by the sets of $n$ subjects identified by $h$, where these subjects are included among the first $n$ positions in the prior listing. The first row includes prior listings where the set of subjects corresponding to $h = 1$ is $\{Lily, Rose\}$. The second row includes prior listings where the set of subjects corresponding to $h = 2$ is $\{Daisy, Rose\}$; the third row includes prior listings where the set of subjects corresponding to $h = 3$ is $\{Daisy, Lily\}$. Each prior listing is identified by the indices $hmt$ (or more simply by the index $p$). The order of the subjects in the prior listing (relative to the population listing) is indicated by the vector labels for $\mathbf{j}$.

As an example, for the prior listing $hmt = 221$, the subjects are in the order $(j = 2\;\; j = 3\;\; j = 1)'$, corresponding to $(Lily\;\; Daisy\;\; Rose)'$. Subject labels specific to this prior listing are given by $j_\ell^{(221)}$, $\ell = 1,\ldots,N$. Response for subjects in a prior listing is represented by response indexed by the subject's first initial, or by subject labels specific to this prior listing as $y_\ell^{(hmt)}$, $\ell = 1,\ldots,N$. When $\pi_{hmt} = \frac{1}{HMT}$ for all $h = 1,\ldots,H$, $m = 1,\ldots,M$ and $t = 1,\ldots,T$, the prior distribution is exchangeable.

Under the assumption that the prior distribution is exchangeable, we represent the indicator random variable for the prior listing as the product of three independent indicator random variables, $I_{hmt} = I_h\, I_{I,m}\, I_{II,t}$. The indicator random variable $I_h$ has a value of one when the prior listing has the first $n$ positions occupied by the subjects in set $h$, and zero otherwise, where $P\left(I_h = 1\right) = \frac{1}{H}$ for all $h = 1,\ldots,H$. The indicator random variable $I_{I,m}$ has a value of one when the population subjects $j = 1,\ldots,n$ defined by the vector $\begin{pmatrix} \mathbf{v}_m' & \mathbf{0}_{n \times (N-n)} \end{pmatrix}\mathbf{j}$ are in order $m$, and zero otherwise, where $P\left(I_{I,m} = 1\right) = \frac{1}{M}$ for all $m = 1,\ldots,M$. The indicator random variable $I_{II,t}$ has a value of one when the population subjects $j = n+1,\ldots,N$ defined by the vector $\begin{pmatrix} \mathbf{0}_{(N-n) \times n} & \mathbf{w}_t' \end{pmatrix}\mathbf{j}$ are in order $t$, and zero otherwise, where $P\left(I_{II,t} = 1\right) = \frac{1}{T}$ for all $t = 1,\ldots,T$.
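A minimal sketch of the nested index, assuming the subject names of the example: it enumerates prior listings by $(h, m, t)$, choosing the set $h$ for the first $n$ positions and permuting within and without (this set-based enumeration reproduces all $N!$ distinct listings), and assigns the exchangeable prior probability $1/(HMT) = 1/N!$ to each.

```python
from itertools import combinations, permutations
from math import comb, factorial

population = ["Rose", "Lily", "Daisy"]   # population listing, j = 1, 2, 3
N, n = len(population), 2

H, M, T = comb(N, n), factorial(n), factorial(N - n)
assert (H, M, T) == (3, 2, 1) and H * M * T == factorial(N)

# Enumerate prior listings by the nested (h, m, t) index: choose the set h
# occupying the first n positions, permute it (m), permute the rest (t).
listings = {}
for h, subset in enumerate(combinations(population, n), start=1):
    rest = [s for s in population if s not in subset]
    for m, front in enumerate(permutations(subset), start=1):
        for t, back in enumerate(permutations(rest), start=1):
            listings[(h, m, t)] = list(front) + list(back)

# Exchangeable prior: every listing gets probability 1/(H*M*T) = 1/N!.
pi = {key: 1 / (H * M * T) for key in listings}
assert len(set(map(tuple, listings.values()))) == 6    # all N! listings
assert abs(sum(pi.values()) - 1) < 1e-12
```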
Table 1. Prior Listing and Response for Sequences of $N = 3$ Subjects for Population $f$
_____________________________________________________________________________
$p = 1$ ($h = 1$, $m = 1$, $t = 1$), $\pi_{111}$:
  $\mathbf{j}^{(111)} = \mathbf{j} = (1\; 2\; 3)'$,  $\mathbf{y}^{(111)} = \mathbf{y} = (y_R\; y_L\; y_D)'$;
  $\mathbf{v}_1' = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, $\mathbf{w}_1' = 1$, $\begin{pmatrix} \boldsymbol{\delta}_1' \\ \boldsymbol{\delta}_1^{c\prime} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$
$p = 2$ ($h = 1$, $m = 2$, $t = 1$), $\pi_{121}$:
  $\mathbf{j}^{(121)} = (2\; 1\; 3)'$,  $\mathbf{y}^{(121)} = (y_L\; y_R\; y_D)'$;
  $\mathbf{v}_2' = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, $\mathbf{w}_1' = 1$, and $\begin{pmatrix} \boldsymbol{\delta}_1' \\ \boldsymbol{\delta}_1^{c\prime} \end{pmatrix}$ as for $p = 1$
_____________________________________________________________________________
$p = 3$ ($h = 2$, $m = 1$, $t = 1$), $\pi_{211}$:
  $\mathbf{j}^{(211)} = (1\; 3\; 2)'$,  $\mathbf{y}^{(211)} = (y_R\; y_D\; y_L)'$;
  $\mathbf{v}_1'$, $\mathbf{w}_1'$ as above, $\begin{pmatrix} \boldsymbol{\delta}_2' \\ \boldsymbol{\delta}_2^{c\prime} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$
$p = 4$ ($h = 2$, $m = 2$, $t = 1$), $\pi_{221}$:
  $\mathbf{j}^{(221)} = (2\; 3\; 1)'$,  $\mathbf{y}^{(221)} = (y_L\; y_D\; y_R)'$;
  $\mathbf{v}_2'$, $\mathbf{w}_1'$, and $\begin{pmatrix} \boldsymbol{\delta}_2' \\ \boldsymbol{\delta}_2^{c\prime} \end{pmatrix}$ as above
_____________________________________________________________________________
$p = 5$ ($h = 3$, $m = 1$, $t = 1$), $\pi_{311}$:
  $\mathbf{j}^{(311)} = (2\; 3\; 1)'$,  $\mathbf{y}^{(311)} = (y_L\; y_D\; y_R)'$;
  $\mathbf{v}_1'$, $\mathbf{w}_1'$, $\begin{pmatrix} \boldsymbol{\delta}_3' \\ \boldsymbol{\delta}_3^{c\prime} \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$
$p = 6$ ($h = 3$, $m = 2$, $t = 1$), $\pi_{321}$:
  $\mathbf{j}^{(321)} = (1\; 3\; 2)'$,  $\mathbf{y}^{(321)} = (y_R\; y_D\; y_L)'$;
  $\mathbf{v}_2'$, $\mathbf{w}_1'$, and $\begin{pmatrix} \boldsymbol{\delta}_3' \\ \boldsymbol{\delta}_3^{c\prime} \end{pmatrix}$ as above
_____________________________________________________________________________
Problems with The Initial Nested Notation Idea

Inspection of Table 1 illustrates that the nested notation idea does not reproduce the $N!$ permutations. The same permutation will result when $p = 4$ and $p = 5$, and when $p = 3$ and $p = 6$. Some other terms need to be included to create a nested notation that will reproduce the permutations. One possibility is to include another term that will pre-multiply the vectors for $\mathbf{y}^{(311)}$ and $\mathbf{y}^{(321)}$. For example, pre-multiplying $\mathbf{y}^{(311)}$ by $\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$ will result in the vector $(y_D\; y_R\; y_L)'$, while pre-multiplying $\mathbf{y}^{(321)}$ by $\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$ will result in the vector $(y_D\; y_L\; y_R)'$. For this reason, it seems that we could represent the permutations as in Table 2.
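The duplication can be checked directly. The sketch below implements the initial construction, $\mathbf{y}^{(hmt)} = \begin{pmatrix} \boldsymbol{\delta}_h' \\ \boldsymbol{\delta}_h^{c\prime} \end{pmatrix}\mathrm{blockdiag}(\mathbf{v}_m', \mathbf{w}_t')\,\mathbf{y}$, for the $N = 3$, $n = 2$ example (response values replaced by the letters R, L, D) and confirms that only four of the six listings are distinct.

```python
y = ["R", "L", "D"]   # responses (y_R, y_L, y_D) of the population listing
N, n = 3, 2

# v_m: the m = 1, 2 orderings of the first n positions; w_1 is the
# identity on the remaining N - n = 1 position, so t = 1 only.
v_perms = [(0, 1), (1, 0)]
# delta_h' selects the rows of set h in listing order; h = 1, 2, 3 pick
# the position pairs {1,2}, {1,3}, {2,3}; the complement supplies the rest.
h_sets = [(0, 1), (0, 2), (1, 2)]

def listing(h, m):
    """y^(hm1): apply blockdiag(v_m', w_1') to y, then stack (delta; delta^c)."""
    front = [y[i] for i in v_perms[m]]   # v_m' acting on positions 1..n
    ymt = front + y[n:]                  # w_1' leaves the remainder fixed
    keep = h_sets[h]
    rest = [i for i in range(N) if i not in keep]
    return tuple(ymt[i] for i in keep) + tuple(ymt[i] for i in rest)

all_listings = {(h, m): listing(h, m) for h in range(3) for m in range(2)}

# Only 4 of the 6 vectors are distinct: (h=2, m=2) duplicates (h=3, m=1)
# (the p = 4 and p = 5 listings), and (h=2, m=1) duplicates (h=3, m=2)
# (p = 3 and p = 6), matching the duplication noted in the text.
assert all_listings[(1, 1)] == all_listings[(2, 0)] == ("L", "D", "R")
assert all_listings[(1, 0)] == all_listings[(2, 1)] == ("R", "D", "L")
assert len(set(all_listings.values())) == 4
```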
Table 2. Prior Listing and Response for Sequences of $N = 3$ Subjects for Population $f$
_____________________________________________________________________________
$p = 1$ ($h = 1$, $m = 1$, $t = 1$), $\pi_{111}$:
  $\mathbf{j}^{(111)} = \mathbf{j} = (1\; 2\; 3)'$,  $\mathbf{y}^{(111)} = \mathbf{y} = (y_R\; y_L\; y_D)'$;
  $\mathbf{v}_1' = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, $\mathbf{w}_1' = 1$, $\begin{pmatrix} \boldsymbol{\delta}_1' \\ \boldsymbol{\delta}_1^{c\prime} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$
$p = 2$ ($h = 1$, $m = 2$, $t = 1$), $\pi_{121}$:
  $\mathbf{j}^{(121)} = (2\; 1\; 3)'$,  $\mathbf{y}^{(121)} = (y_L\; y_R\; y_D)'$;
  $\mathbf{v}_2' = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, $\mathbf{w}_1' = 1$, and $\begin{pmatrix} \boldsymbol{\delta}_1' \\ \boldsymbol{\delta}_1^{c\prime} \end{pmatrix}$ as for $p = 1$
_____________________________________________________________________________
$p = 3$ ($h = 2$, $m = 1$, $t = 1$), $\pi_{211}$:
  $\mathbf{j}^{(211)} = (1\; 3\; 2)'$,  $\mathbf{y}^{(211)} = (y_R\; y_D\; y_L)'$;
  $\mathbf{v}_1'$, $\mathbf{w}_1'$ as above, $\begin{pmatrix} \boldsymbol{\delta}_2' \\ \boldsymbol{\delta}_2^{c\prime} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$
$p = 4$ ($h = 2$, $m = 2$, $t = 1$), $\pi_{221}$:
  $\mathbf{j}^{(221)} = (2\; 3\; 1)'$,  $\mathbf{y}^{(221)} = (y_L\; y_D\; y_R)'$;
  $\mathbf{v}_2'$, $\mathbf{w}_1'$, and $\begin{pmatrix} \boldsymbol{\delta}_2' \\ \boldsymbol{\delta}_2^{c\prime} \end{pmatrix}$ as above
_____________________________________________________________________________
$p = 5$ ($h = 3$, $m = 1$, $t = 1$), $\pi_{311}$:
  $\mathbf{j}^{(311)} = (3\; 1\; 2)'$,  $\mathbf{y}^{(311)} = \begin{pmatrix} \boldsymbol{\delta}_3' \\ \boldsymbol{\delta}_3^{c\prime} \end{pmatrix}(y_L\; y_D\; y_R)' = (y_D\; y_R\; y_L)'$;
  $\mathbf{v}_1'$, $\mathbf{w}_1'$, $\begin{pmatrix} \boldsymbol{\delta}_3' \\ \boldsymbol{\delta}_3^{c\prime} \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$
$p = 6$ ($h = 3$, $m = 2$, $t = 1$), $\pi_{321}$:
  $\mathbf{j}^{(321)} = (3\; 2\; 1)'$,  $\mathbf{y}^{(321)} = \begin{pmatrix} \boldsymbol{\delta}_3' \\ \boldsymbol{\delta}_3^{c\prime} \end{pmatrix}(y_R\; y_D\; y_L)' = (y_D\; y_L\; y_R)'$;
  $\mathbf{v}_2'$, $\mathbf{w}_1'$, and $\begin{pmatrix} \boldsymbol{\delta}_3' \\ \boldsymbol{\delta}_3^{c\prime} \end{pmatrix}$ as above
_____________________________________________________________________________
At this point, it seems we need to see if there is a way to generalize this nested representation to settings where $n > 2$ and $N > 3$. As an example, we consider $n = 2$ and $N = 4$ next. First, we illustrate the permutations that would result if we simply expressed the vectors as
Table 3. Examples when $n = 2$ and $N = 4$, where $\mathbf{I}_4$ is replaced by $\begin{pmatrix} \mathbf{v}_m' & \mathbf{0} \\ \mathbf{0} & \mathbf{w}_t' \end{pmatrix}$

Rows are indexed by $mt$; the first entry in each row gives the base vector $\mathbf{y}^{(mt)} = \begin{pmatrix} \mathbf{v}_m' & \mathbf{0} \\ \mathbf{0} & \mathbf{w}_t' \end{pmatrix}\mathbf{y}$, and the cells for $h = 1,\ldots,6$ move the subjects in set $h$ to the first $n$ positions, keeping the remaining subjects in listing order.

$mt = 11$, $\mathbf{y}^{(11)} = (y_R\; y_L\; y_D\; y_V)'$:
  $(y_R\; y_L\; y_D\; y_V)'$, $(y_R\; y_D\; y_L\; y_V)'$, $(y_R\; y_V\; y_L\; y_D)'$, $(y_L\; y_D\; y_R\; y_V)'$, $(y_L\; y_V\; y_R\; y_D)'$, $(y_D\; y_V\; y_R\; y_L)'$
$mt = 12$, $\mathbf{y}^{(12)} = (y_R\; y_L\; y_V\; y_D)'$:
  $(y_R\; y_L\; y_V\; y_D)'$, $(y_R\; y_V\; y_L\; y_D)'$, $(y_R\; y_D\; y_L\; y_V)'$, $(y_L\; y_V\; y_R\; y_D)'$, $(y_L\; y_D\; y_R\; y_V)'$, $(y_V\; y_D\; y_R\; y_L)'$
$mt = 21$, $\mathbf{y}^{(21)} = (y_L\; y_R\; y_D\; y_V)'$:
  $(y_L\; y_R\; y_D\; y_V)'$, $(y_L\; y_D\; y_R\; y_V)'$, $(y_L\; y_V\; y_R\; y_D)'$, $(y_R\; y_D\; y_L\; y_V)'$, $(y_R\; y_V\; y_L\; y_D)'$, $(y_D\; y_V\; y_L\; y_R)'$
$mt = 22$, $\mathbf{y}^{(22)} = (y_L\; y_R\; y_V\; y_D)'$:
  $(y_L\; y_R\; y_V\; y_D)'$, $(y_L\; y_V\; y_R\; y_D)'$, $(y_L\; y_D\; y_R\; y_V)'$, $(y_R\; y_V\; y_L\; y_D)'$, $(y_R\; y_D\; y_L\; y_V)'$, $(y_V\; y_D\; y_L\; y_R)'$
TO HERE 7/20/2010
The Joint Distribution of the Prior and a Subset (i.e. Sample) of the Prior

We describe next the joint distribution of the prior and the sample. When describing this joint distribution, we use the single index, $p = 1,\ldots,P = N!$, to refer to the prior listings. From each prior listing, we define a sample as a set of $n$ subjects, and index the sample by $h^*$, where $h^* \in \eta^*$, the set of all possible subsets of size $n$, and $h^* = 1,\ldots,H^* = \binom{N}{n}$. Let us define the labels for prior listing $p$ by $\mathbf{j}^{(p)} = \left( j_1^{(p)}\; j_2^{(p)}\; \cdots\; j_N^{(p)} \right)'$ and the corresponding vector of response by $\mathbf{y}^{(p)} = \left( y_1^{(p)}\; y_2^{(p)}\; \cdots\; y_N^{(p)} \right)'$. We define the samples via $n \times N$ indicator matrices, $\boldsymbol{\delta}_{h^*}^{*\prime}$, where the subjects in the sample set correspond to the subjects in the vector $\boldsymbol{\delta}_{h^*}^{*\prime}\mathbf{j}^{(p)}$ with response $\mathbf{y}_I^{(p,h^*)} = \boldsymbol{\delta}_{h^*}^{*\prime}\mathbf{y}^{(p)}$. In order to uniquely define $\boldsymbol{\delta}_{h^*}^{*\prime}$ for a sample set, we require the sample subjects in $\boldsymbol{\delta}_{h^*}^{*\prime}\mathbf{j}^{(p)}$ to be in the same order as the subjects in prior listing $p$. Also, we define $\boldsymbol{\delta}_1^{*\prime} = \begin{pmatrix} \mathbf{I}_n & \mathbf{0}_{n \times (N-n)} \end{pmatrix}$. Also, define $\boldsymbol{\delta}_{h^*}^{*c\prime}$ as the ortho-complement of $\boldsymbol{\delta}_{h^*}^{*\prime}$ such that $\begin{pmatrix} \boldsymbol{\delta}_{h^*}^{*\prime} \\ \boldsymbol{\delta}_{h^*}^{*c\prime} \end{pmatrix}$ is a permutation matrix, where $\boldsymbol{\delta}_{h^*}^{*c\prime}\mathbf{j}^{(p)}$ identifies the subjects not in set $h^*$, and where response is given by $\mathbf{y}_{II}^{(p,h^*)} = \boldsymbol{\delta}_{h^*}^{*c\prime}\mathbf{y}^{(p)}$. We uniquely define $\boldsymbol{\delta}_{h^*}^{*c\prime}$ by requiring the order of the subjects in $\boldsymbol{\delta}_{h^*}^{*c\prime}\mathbf{j}^{(p)}$ to be the same as the order of subjects in prior listing $p$.

Let $S_{h^*}^{(p)}$ represent an indicator random variable associated with sample $h^*$ from prior listing $p$, such that $P\left( S_{h^*}^{(p)} = 1 \right) = p_{h^*}^{(p)}$, where $p_{h^*}^{(p)} \ge 0$ for all $h^* = 1,\ldots,H^*$ and $\sum_{h^*=1}^{H^*} p_{h^*}^{(p)} = 1$. The probability $p_{h^*}^{(p)}$ represents the probability of selecting sample set $h^*$ from the prior listing $p$. Using the probability of the prior listing and the sample selection probabilities, we represent the sample response vector over the joint prior/sample distribution as $\mathbf{Y}_I = \sum_{p=1}^{P}\sum_{h^*=1}^{H^*} I_p\, S_{h^*}^{(p)}\, \mathbf{y}_I^{(p,h^*)}$. Defining $\mathbf{Y}_{II} = \sum_{p=1}^{P}\sum_{h^*=1}^{H^*} I_p\, S_{h^*}^{(p)}\, \mathbf{y}_{II}^{(p,h^*)}$, we represent $\begin{pmatrix} \mathbf{Y}_I \\ \mathbf{Y}_{II} \end{pmatrix} = \sum_{p=1}^{P}\sum_{h^*=1}^{H^*} I_p\, S_{h^*}^{(p)} \begin{pmatrix} \boldsymbol{\delta}_{h^*}^{*\prime} \\ \boldsymbol{\delta}_{h^*}^{*c\prime} \end{pmatrix}\mathbf{y}^{(p)}$.
There are a total of $PH^* = N!\binom{N}{n}$ points in the joint prior/sampling distribution. In the example when $N = 3$ and $n = 2$, this corresponds to 18 possible sample points illustrated in Table 2. We represent these 18 points in the joint prior/sampling distribution by a partitioned prior response vector for each point, where the first part of the partitioned vector is response for the sample, and the second part is response for the remainder in the prior listing. The $N! = 6$ rows in Table 2 correspond to prior listings, either indexed by $p = 1,\ldots,P$, or by the nested subscripts, $h = 1,\ldots,H$; $m = 1,\ldots,M$; $t = 1,\ldots,T$. We group the prior listings using the level of $h$ in the nested subscripts, separating the groups by a horizontal line. Columns in Table 2 correspond to different sample sets, $h^*$, from a prior listing, where the sample elements are contained in the first $n = 2$ rows of the response vector. The remaining rows of the response vector (which in the example corresponds to a single row since $N - n = 1$) correspond to response for the elements not in the sample set.
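A sketch of the joint prior/sampling construction for the example, using subject initials in place of responses: it pairs each of the $P = N!$ prior listings with each of the $H^* = \binom{N}{n}$ subsets of positions (sample subjects kept in listing order, remainder appended after) and confirms the count of 18 points.

```python
from itertools import combinations, permutations
from math import comb, factorial

population = ["R", "L", "D"]        # y_R, y_L, y_D for the example
N, n = 3, 2
P, H_star = factorial(N), comb(N, n)

# Each prior listing p is paired with each subset h* of n positions;
# the sample keeps listing order and the remainder follows.
points = []
for p, listing in enumerate(permutations(population), start=1):
    for h_star, pos in enumerate(combinations(range(N), n), start=1):
        sample = tuple(listing[i] for i in pos)
        remainder = tuple(listing[i] for i in range(N) if i not in pos)
        points.append((p, h_star, sample, remainder))

# P * H* = 3! * C(3,2) = 18 points in the joint prior/sampling distribution.
assert len(points) == P * H_star == 18
```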
Table 2. Joint Prior and Sample Response for Population $f$ with $N = 3$ and $n = 2$.
_____________________________________________________________________________
Each row gives a prior listing $p$ ($hmt$) with response $\mathbf{y}^{(hmt)}$, followed by the three sample points $h^* = 1, 2, 3$, each with probability $\pi_p\, p_{h^*}^{(p)}$ and partitioned vector $\left( \mathbf{y}_I^{(p,h^*)\prime} \mid \mathbf{y}_{II}^{(p,h^*)\prime} \right)'$.

$p = 1$ ($111$): $\mathbf{y}^{(111)} = (y_R\; y_L\; y_D)'$;  $\pi_1 p_1^{(1)}$: $(y_R\; y_L \mid y_D)'$,  $\pi_1 p_2^{(1)}$: $(y_R\; y_D \mid y_L)'$,  $\pi_1 p_3^{(1)}$: $(y_L\; y_D \mid y_R)'$
$p = 2$ ($121$): $\mathbf{y}^{(121)} = (y_L\; y_R\; y_D)'$;  $\pi_2 p_1^{(2)}$: $(y_L\; y_R \mid y_D)'$,  $\pi_2 p_2^{(2)}$: $(y_L\; y_D \mid y_R)'$,  $\pi_2 p_3^{(2)}$: $(y_R\; y_D \mid y_L)'$
_____________________________________________________________________________
$p = 3$ ($211$): $\mathbf{y}^{(211)} = (y_R\; y_D\; y_L)'$;  $\pi_3 p_1^{(3)}$: $(y_R\; y_D \mid y_L)'$,  $\pi_3 p_2^{(3)}$: $(y_R\; y_L \mid y_D)'$,  $\pi_3 p_3^{(3)}$: $(y_D\; y_L \mid y_R)'$
$p = 4$ ($221$): $\mathbf{y}^{(221)} = (y_L\; y_D\; y_R)'$;  $\pi_4 p_1^{(4)}$: $(y_L\; y_D \mid y_R)'$,  $\pi_4 p_2^{(4)}$: $(y_L\; y_R \mid y_D)'$,  $\pi_4 p_3^{(4)}$: $(y_D\; y_R \mid y_L)'$
_____________________________________________________________________________
$p = 5$ ($311$): $\mathbf{y}^{(311)} = (y_D\; y_R\; y_L)'$;  $\pi_5 p_1^{(5)}$: $(y_D\; y_R \mid y_L)'$,  $\pi_5 p_2^{(5)}$: $(y_D\; y_L \mid y_R)'$,  $\pi_5 p_3^{(5)}$: $(y_R\; y_L \mid y_D)'$
$p = 6$ ($321$): $\mathbf{y}^{(321)} = (y_D\; y_L\; y_R)'$;  $\pi_6 p_1^{(6)}$: $(y_D\; y_L \mid y_R)'$,  $\pi_6 p_2^{(6)}$: $(y_D\; y_R \mid y_L)'$,  $\pi_6 p_3^{(6)}$: $(y_L\; y_R \mid y_D)'$
_____________________________________________________________________________
The Posterior Distribution of the Joint Prior/Sample Distribution, Given the Data Set

The posterior distribution is the joint prior/sampling distribution, given the data set. The joint prior/sampling distribution for the example is illustrated in Table 2. In this distribution, for each point, the first part of the partitioned response vector is the sample, while the remainder is response for subjects not in the sample. When we condition on the data set consisting of $h^+ = \{Rose, Lily\}$, any point in the joint prior/sampling distribution where the sample response represents response for the subjects in $h^+$ is included. Such responses are members of the set of responses $\left\{ \mathbf{v}_m' \begin{pmatrix} x_R \\ x_L \end{pmatrix},\; m = 1,\ldots,M \right\}$. The points in the joint prior/sample distribution that have positive probability in the posterior distribution, given the data set, are given in Table 3.
Table 3. Posterior Distribution corresponding to the Joint Prior/Sample Response Given Response for the subjects in the Data Set $h^+ = \{Rose, Lily\}$
_____________________________________________________________________________
$h^* = 1$:
  $p = 1$ ($h = 1$, $m = 1$, $t = 1$), $\pi_1 p_1^{(1)}$:  $\left( y_1^{(111)}\; y_2^{(111)} \mid y_3^{(111)} \right)' = (y_R\; y_L \mid y_D)'$
  $p = 2$ ($h = 1$, $m = 2$, $t = 1$), $\pi_2 p_1^{(2)}$:  $\left( y_1^{(121)}\; y_2^{(121)} \mid y_3^{(121)} \right)' = (y_L\; y_R \mid y_D)'$
_____________________________________________________________________________
$h^* = 2$:
  $p = 3$ ($h = 2$, $m = 1$, $t = 1$), $\pi_3 p_2^{(3)}$:  $\left( y_1^{(211)}\; y_3^{(211)} \mid y_2^{(211)} \right)' = (y_R\; y_L \mid y_D)'$
  $p = 4$ ($h = 2$, $m = 2$, $t = 1$), $\pi_4 p_2^{(4)}$:  $\left( y_1^{(221)}\; y_3^{(221)} \mid y_2^{(221)} \right)' = (y_L\; y_R \mid y_D)'$
_____________________________________________________________________________
$h^* = 3$:
  $p = 5$ ($h = 3$, $m = 1$, $t = 1$), $\pi_5 p_3^{(5)}$:  $\left( y_2^{(311)}\; y_3^{(311)} \mid y_1^{(311)} \right)' = (y_R\; y_L \mid y_D)'$
  $p = 6$ ($h = 3$, $m = 2$, $t = 1$), $\pi_6 p_3^{(6)}$:  $\left( y_2^{(321)}\; y_3^{(321)} \mid y_1^{(321)} \right)' = (y_L\; y_R \mid y_D)'$
_____________________________________________________________________________
We summarize the resulting points in the posterior distribution in Table 4 using the nested notation for the probability of prior sample points.

Table 4. Posterior Distribution corresponding to the Joint Prior/Sample Response Given Response for the subjects in the Data Set $h^+ = \{Rose, Lily\}$.
_____________________________________________________________________________
Rows correspond to $m$; cells give the probability and partitioned vector for $h^* = 1, 2, 3$.

$m = 1$:  $\pi_{111} p_1^{(111)}$: $(y_R\; y_L \mid y_D)'$;   $\pi_{211} p_2^{(211)}$: $(y_R\; y_L \mid y_D)'$;   $\pi_{311} p_3^{(311)}$: $(y_R\; y_L \mid y_D)'$
$m = 2$:  $\pi_{121} p_1^{(121)}$: $(y_L\; y_R \mid y_D)'$;   $\pi_{221} p_2^{(221)}$: $(y_L\; y_R \mid y_D)'$;   $\pi_{321} p_3^{(321)}$: $(y_L\; y_R \mid y_D)'$
_____________________________________________________________________________

The summary illustrates that there are two distinct points in the posterior distribution, corresponding to the response vectors $(y_R,\ y_L,\ y_D)'$ and $(y_L,\ y_R,\ y_D)'$. These response vectors differ by the order of the subjects in the sample subset, indexed by $m = 1, \ldots, M = 2$. Notice that for each cell in Table 4, the values of $h$ and $h^*$ are identical. This is a consequence of introducing the nested notation for prior listings, and it simplifies summing posterior probabilities in rows, since we can replace $p_h^{(h^* m t)}$ by $p_h^{(hmt)}$ in the posterior distribution. Summing the posterior probabilities in each of the rows in Table 4, we define
$$p_{mt} = \sum_{h=1}^{H} \pi_h\, p_h^{(hmt)} = \pi_1\, p_1^{(1mt)} + \pi_2\, p_2^{(2mt)} + \pi_3\, p_3^{(3mt)},$$

where $m = 1, \ldots, M = 2$ and $t = 1$ (since $t = 1, \ldots, T = 1$), and define $p^*_{mt} = \dfrac{p_{mt}}{\sum_{m=1}^{M}\sum_{t=1}^{T} p_{mt}}$. Using this definition,
the posterior distribution is given in Table 5.
Table 5. Posterior Distribution corresponding to the Joint Prior/Sample Response Given Response for the Subjects in the Data Set $h^+ = \{Rose, Lily\}$.
_____________________________________________________________________________
 $m$    $t$    $p^*_{mt}$    Response
 1      1      $p^*_{11}$    $(y_R \;\; y_L \;\; y_D)'$
 2      1      $p^*_{21}$    $(y_L \;\; y_R \;\; y_D)'$
_____________________________________________________________________________
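Table 5 can be reproduced by brute-force enumeration. The sketch below is a minimal illustration in Python, with hypothetical responses for Rose, Lily, and Daisy (none of the numeric values come from the paper): it treats each of the $P = 3!$ prior listings as equally likely, conditions on the first $n$ positions containing the data set $\{Rose, Lily\}$ as a set, and tallies the surviving response vectors.

```python
from itertools import permutations
from collections import Counter
from fractions import Fraction

# Hypothetical latent responses (illustrative values only)
y = {"Rose": 1.0, "Lily": 2.0, "Daisy": 4.0}
subjects = list(y)
n = 2  # data set size; population size N = 3

# A prior listing is a permutation of the population; the sample occupies
# the first n positions of the listing.
points = Counter()
for listing in permutations(subjects):          # P = 3! = 6 equally likely listings
    sample, remainder = listing[:n], listing[n:]
    if set(sample) == {"Rose", "Lily"}:         # condition on the observed data set
        vec = tuple(y[s] for s in sample + remainder)
        points[vec] += Fraction(1, 6)

total = sum(points.values())
posterior = {vec: p / total for vec, p in points.items()}
for vec, p in sorted(posterior.items()):
    print(vec, p)
```

Two points survive, $(y_R, y_L, y_D)'$ and $(y_L, y_R, y_D)'$, each with posterior probability $1/2$, matching Table 5 under the uniform assumptions.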
The Posterior Distribution Assuming Equal Probability of Subsets of Prior Listings
In a Bayesian framework, we assign probabilities to different subsets, $h^*$, of subjects associated with a prior listing to specify the joint prior/sampling distribution. These probabilities are not traditional 'sampling' probabilities, and have no relationship with the process that may have produced the observations in the data set. The probabilities are part of the Bayesian model; assigning different probabilities corresponds to a different Bayesian model. We make the assumption that, for any given prior listing, the probability is equal for all subsets, such that $p_{h^*}^{(p)} = \frac{1}{H^*}$ for all $h^* = 1, \ldots, H^*$ and $p = 1, \ldots, P$.
Under this assumption, we develop the expected value and variance of the posterior distribution assuming an exchangeable prior. In so doing, we generalize the representation of the posterior distribution to an example where the data set includes $n$ subjects and the population consists of $N$ subjects. In this context, the prior distribution is given by

$$\mathbf{Y} = \sum_{h=1}^{H}\sum_{m=1}^{M}\sum_{t=1}^{T} I_h\, I_{I,m}\, I_{II,t} \begin{pmatrix} \mathbf{v}_m' & \mathbf{0}_{n\times(N-n)} \\ \mathbf{0}_{(N-n)\times n} & \mathbf{w}_t' \end{pmatrix} \begin{pmatrix} \boldsymbol{\delta}_h' \\ \boldsymbol{\delta}_h^{c\prime} \end{pmatrix} \mathbf{y},$$

where, for a prior listing, $\mathbf{y}^{(p)} = \begin{pmatrix} \mathbf{v}_m' & \mathbf{0}_{n\times(N-n)} \\ \mathbf{0}_{(N-n)\times n} & \mathbf{w}_t' \end{pmatrix} \begin{pmatrix} \boldsymbol{\delta}_h' \\ \boldsymbol{\delta}_h^{c\prime} \end{pmatrix} \mathbf{y}$.
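The block-diagonal construction in the prior representation can be made concrete numerically. The sketch below (illustrative values only; the particular listing and permutation are arbitrary choices) forms $\operatorname{diag}(\mathbf{v}_m', \mathbf{w}_t')\begin{pmatrix}\boldsymbol{\delta}_h' \\ \boldsymbol{\delta}_h^{c\prime}\end{pmatrix}\mathbf{y}$ for $N = 3$, $n = 2$ and confirms the result is a permutation of $\mathbf{y}$.

```python
import numpy as np

y = np.array([1.0, 2.0, 4.0])   # hypothetical population responses, N = 3
n = 2

# delta_h' stacks the rows selecting the prior listing: here the listing
# puts subjects 2 and 3 in the first n slots and subject 1 last.
delta_h = np.array([[0, 1, 0],
                    [0, 0, 1]])          # n x N indicator matrix
delta_h_c = np.array([[1, 0, 0]])        # (N-n) x N indicator matrix

v_m = np.array([[0, 1],
                [1, 0]])                 # n x n permutation of the sample block
w_t = np.array([[1]])                    # (N-n) x (N-n) permutation (T = 1! = 1)

block = np.block([[v_m.T, np.zeros((n, 1))],
                  [np.zeros((1, n)), w_t.T]])
realization = block @ np.vstack([delta_h, delta_h_c]) @ y
print(realization)    # a permutation of y
```

The listing re-arranges $\mathbf{y}$ to $(2, 4, 1)'$ and $\mathbf{v}_m'$ then swaps the sample block, giving the realization $(4, 2, 1)'$.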
We express the joint prior/sampling distribution by including the additional indicator random variable $S_{h^*}^{(p)}$ for subset $h^*$ in prior listing $p$, and re-arranging the prior listing so that the sample is contained in the first $1, \ldots, n$ positions, given by $\begin{pmatrix} \boldsymbol{\delta}_{h^*}^{*\prime} \\ \boldsymbol{\delta}_{h^*}^{*c\prime} \end{pmatrix} \mathbf{y}$. Including the sampling, we represent random variables for the joint prior/sampling distribution as

$$\mathbf{Y} = \sum_{h=1}^{H}\sum_{m=1}^{M}\sum_{t=1}^{T}\sum_{h^*=1}^{H^*} I_h\, I_{I,m}\, I_{II,t}\, S_{h^*}^{(hmt)} \begin{pmatrix} \mathbf{v}_m' & \mathbf{0}_{n\times(N-n)} \\ \mathbf{0}_{(N-n)\times n} & \mathbf{w}_t' \end{pmatrix} \begin{pmatrix} \boldsymbol{\delta}_{h^*}^{*\prime} \\ \boldsymbol{\delta}_{h^*}^{*c\prime} \end{pmatrix} \begin{pmatrix} \boldsymbol{\delta}_h' \\ \boldsymbol{\delta}_h^{c\prime} \end{pmatrix} \mathbf{y}.$$
The possible realizations of $\mathbf{Y}$ in the joint prior/sampling distribution are given by the vectors

$$\begin{pmatrix} \mathbf{v}_m' & \mathbf{0}_{n\times(N-n)} \\ \mathbf{0}_{(N-n)\times n} & \mathbf{w}_t' \end{pmatrix} \begin{pmatrix} \boldsymbol{\delta}_{h^*}^{*\prime} \\ \boldsymbol{\delta}_{h^*}^{*c\prime} \end{pmatrix} \begin{pmatrix} \boldsymbol{\delta}_h' \\ \boldsymbol{\delta}_h^{c\prime} \end{pmatrix} \mathbf{y}$$

for $h = 1, \ldots, H$, $m = 1, \ldots, M$, $t = 1, \ldots, T$, and $h^* = 1, \ldots, H^*$. The posterior distribution contains the points in the joint distribution where the sample is given by
the subjects in $h^+$. Recall that we have defined response for subjects in the population such that the subjects whose response is in $\mathbf{y}_I$ are in the set $h^+$. The vectors $\mathbf{v}_m'\, \mathbf{y}_I$, $m = 1, \ldots, M$, represent permutations of these subjects' responses. Since response corresponding to the sample in the joint prior/sampling distribution is contained in the first $1, \ldots, n$ elements of the realization of $\mathbf{Y}$, only realizations of $\mathbf{Y}$ where the sample is equal to $\mathbf{v}_m'\, \mathbf{y}_I$ for $m = 1, \ldots, M$ are in the posterior distribution. This implies that
$$\begin{pmatrix} \boldsymbol{\delta}_{h^*}^{*\prime} \\ \boldsymbol{\delta}_{h^*}^{*c\prime} \end{pmatrix} \begin{pmatrix} \boldsymbol{\delta}_h & \boldsymbol{\delta}_h^{c} \end{pmatrix} = \begin{pmatrix} \mathbf{I}_n & \mathbf{0}_{n\times(N-n)} \\ \mathbf{0}_{(N-n)\times n} & \boldsymbol{\delta}_{h^*}^{*c\prime}\, \boldsymbol{\delta}_h^{c} \end{pmatrix}.$$
Conditioning on the data set implies that $h$ and $h^*$ satisfy this property, and defines the points in the posterior distribution.
It is easy to define $\begin{pmatrix} \boldsymbol{\delta}_{h^*}^{*\prime} \\ \boldsymbol{\delta}_{h^*}^{*c\prime} \end{pmatrix}$ and $\begin{pmatrix} \boldsymbol{\delta}_h' \\ \boldsymbol{\delta}_h^{c\prime} \end{pmatrix}$ that will satisfy this condition. One definition is $\begin{pmatrix} \boldsymbol{\delta}_{h^*}^{*\prime} \\ \boldsymbol{\delta}_{h^*}^{*c\prime} \end{pmatrix} = \begin{pmatrix} \boldsymbol{\delta}_h' \\ \boldsymbol{\delta}_h^{c\prime} \end{pmatrix}$, which results in $\begin{pmatrix} \boldsymbol{\delta}_{h^*}^{*\prime} \\ \boldsymbol{\delta}_{h^*}^{*c\prime} \end{pmatrix} \begin{pmatrix} \boldsymbol{\delta}_h & \boldsymbol{\delta}_h^{c} \end{pmatrix} = \mathbf{I}_N$. Under this assumption, the posterior distribution is given by
$$\mathbf{Y}_{\,|\,h^+ = h} = \sum_{h=1}^{H}\sum_{m=1}^{M}\sum_{t=1}^{T}\sum_{h^*=1}^{H^*} I_h\, I_{I,m}\, I_{II,t}\, S_{h^*}^{(hmt)} \begin{pmatrix} \mathbf{v}_m' & \mathbf{0}_{n\times(N-n)} \\ \mathbf{0}_{(N-n)\times n} & \mathbf{w}_t' \end{pmatrix} \begin{pmatrix} \boldsymbol{\delta}_{h^*}^{*\prime} \\ \boldsymbol{\delta}_{h^*}^{*c\prime} \end{pmatrix} \begin{pmatrix} \boldsymbol{\delta}_h' \\ \boldsymbol{\delta}_h^{c\prime} \end{pmatrix} \mathbf{y} = \sum_{h=1}^{H}\sum_{m=1}^{M}\sum_{t=1}^{T} I_h\, I_{I,m}\, I_{II,t}\, S_h^{(hmt)} \begin{pmatrix} \mathbf{v}_m'\, \mathbf{y}_I \\ \mathbf{w}_t'\, \mathbf{y}_{II} \end{pmatrix},$$

since $S_{h^*}^{(hmt)} = 1$ when $h^* = h$ and zero otherwise. Let us express this as
$$\mathbf{Y}_{\,|\,h^+ = h} = \begin{pmatrix} \sum_{h=1}^{H}\sum_{m=1}^{M}\sum_{t=1}^{T} I_h\, I_{I,m}\, I_{II,t}\, S_h^{(hmt)}\, \mathbf{v}_m'\, \mathbf{y}_I \\ \sum_{h=1}^{H}\sum_{m=1}^{M}\sum_{t=1}^{T} I_h\, I_{I,m}\, I_{II,t}\, S_h^{(hmt)}\, \mathbf{w}_t'\, \mathbf{y}_{II} \end{pmatrix} = \begin{pmatrix} \sum_{m=1}^{M} I_{I,m} \left( \sum_{t=1}^{T} I_{II,t} \right) \left( \sum_{h=1}^{H} I_h\, S_h^{(hmt)} \right) \mathbf{v}_m'\, \mathbf{y}_I \\ \sum_{t=1}^{T} I_{II,t} \left( \sum_{m=1}^{M} I_{I,m} \right) \left( \sum_{h=1}^{H} I_h\, S_h^{(hmt)} \right) \mathbf{w}_t'\, \mathbf{y}_{II} \end{pmatrix}.$$
Now $\sum_{h=1}^{H} I_h\, S_h^{(hmt)} = 1$, $\sum_{t=1}^{T} I_{II,t} = 1$, and $\sum_{m=1}^{M} I_{I,m} = 1$. As a result,
$$\mathbf{Y}_{\,|\,h^+ = h} = \begin{pmatrix} \sum_{m=1}^{M} I_{I,m}\, \mathbf{v}_m'\, \mathbf{y}_I \\ \sum_{t=1}^{T} I_{II,t}\, \mathbf{w}_t'\, \mathbf{y}_{II} \end{pmatrix}.$$
The implication of this is that the posterior distribution is $\mathbf{Y}_{\,|\,h^+ = h} = \left( \mathbf{Y}_{I\,|\,h^+ = h}' \;\; \mathbf{Y}_{II\,|\,h^+ = h}' \right)'$, where $\mathbf{Y}_{I\,|\,h^+ = h}$ is an $n \times 1$ vector representing a permutation distribution of response for subjects in the data set, and $\mathbf{Y}_{II\,|\,h^+ = h}$ is an independent $(N-n) \times 1$ vector representing a permutation distribution of response for the remaining subjects in the population. Taking the expected value over these permutation distributions,
$$E_\xi\!\left( \mathbf{Y}_{\,|\,h^+ = h} \right) = \begin{pmatrix} \mu_x\, \mathbf{1}_n \\ \mu_{II}\, \mathbf{1}_{N-n} \end{pmatrix}, \qquad \text{while} \qquad \mathrm{var}_\xi\!\left( \mathbf{Y}_{\,|\,h^+ = h} \right) = \begin{pmatrix} \sigma_x^2\, \mathbf{P}_n & \mathbf{0}_{n\times(N-n)} \\ \mathbf{0}_{(N-n)\times n} & \sigma_{II}^2\, \mathbf{P}_{N-n} \end{pmatrix}.$$
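These moment results can be checked by enumerating the permutation distribution directly. The sketch below uses hypothetical responses and assumes that $\mathbf{P}_n$ denotes the centering matrix $\mathbf{I}_n - \frac{1}{n}\mathbf{J}_n$ and that $\sigma_x^2$ is defined with divisor $n - 1$; both are assumptions, chosen to be consistent with the variance expressions later in this section.

```python
import numpy as np
from itertools import permutations

# Hypothetical data-set responses (illustrative values only)
y_I = np.array([1.0, 2.0, 4.0])
n = len(y_I)

# All n! equally likely orderings of the data set
perms = np.array(list(permutations(y_I)))
mean = perms.mean(axis=0)
cov = (perms - mean).T @ (perms - mean) / len(perms)

mu_x = y_I.mean()
sigma2_x = y_I.var(ddof=1)                 # divisor n - 1 (assumed convention)
P_n = np.eye(n) - np.ones((n, n)) / n      # assumed: P_n is the centering matrix

print(np.allclose(mean, mu_x * np.ones(n)))   # expectation is mu_x 1_n
print(np.allclose(cov, sigma2_x * P_n))       # variance is sigma_x^2 P_n
```

Both checks print True for any choice of responses, since each coordinate of a random permutation is marginally a uniform draw from the data set.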
We can use these results to develop the expected value and variance of the mean response for the data set, $\bar{Y} = \frac{1}{n} \mathbf{1}_n'\, \mathbf{Y}_{I\,|\,h^+ = h}$, where $E_\xi\left( \bar{Y} \right) = \mu_x$ and $\mathrm{var}_\xi\left( \bar{Y} \right) = 0$.
A Simple Application
We consider a simple example and explicitly evaluate the posterior probability. Suppose that prior listings are assigned equal probabilities, such that $\pi_p = \frac{1}{P}$ for all $p = 1, \ldots, P$. Also, assume that for any prior listing $p$, each sample set is equally likely, such that $p_h^{(p)} = \frac{1}{H}$ for all $h = 1, \ldots, H$ and $p = 1, \ldots, P$. In this setting, the probability of each sample set that equals the data set is given by $\pi_p\, p_h^{(p)} = \frac{1}{PH}$. Since there are $P$ prior listings, $\sum_{p=1}^{P} \frac{1}{PH} = \frac{1}{H}$, so that $p^*_{hm}$ is the same for all $h = 1, \ldots, H$ and $m = 1, \ldots, M$. Thus, with these assumptions, the posterior distribution is a uniform distribution.

Sampling from the Population
Finite population sampling was defined in a general manner by Godambe (1955), who associated a probability with each possible sequence of $n$ subjects from a finite population. Subsequently, Godambe and Joshi (1965) showed that it was sufficient to define samples as distinct sets of subjects. We use sample sets to define finite population sampling based on a population listing of subjects with response $y_j$, $j = 1, \ldots, N$.
Let $h^* = \left\{ j_1, j_2, \ldots, j_n \right\} \subseteq \eta^*$ index distinct subsets of subjects (i.e. samples) from the population, where $\eta^*$ represents the set of all possible subsets of size $n$ and $h^* = 1, \ldots, H^*$. We define these samples via the $n \times N$ indicator matrices $\boldsymbol{\delta}_{h^*}^{*\prime}$, where the subjects in the sample set correspond to the subjects in the vector $\boldsymbol{\delta}_{h^*}^{*\prime}\, \mathbf{j}$ with response $\mathbf{y}^*_{h^*} = \boldsymbol{\delta}_{h^*}^{*\prime}\, \mathbf{y}$. In order to uniquely define $\boldsymbol{\delta}_{h^*}^{*\prime}$ for a sample set, we require the sample subjects in $\boldsymbol{\delta}_{h^*}^{*\prime}\, \mathbf{j}$ to be in the same order as the subjects in the population. Also, we define

$$\boldsymbol{\delta}_1^{*\prime} = \begin{pmatrix} \mathbf{I}_n & \mathbf{0}_{n\times(N-n)} \end{pmatrix}.$$
As an example, when $N = 3$ and $n = 2$, there are $H^* = \binom{3}{2} = 3$ distinct sample sets, which we define by

$$\boldsymbol{\delta}_1^{*\prime} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \qquad \boldsymbol{\delta}_2^{*\prime} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \boldsymbol{\delta}_3^{*\prime} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

When $h^* = 1$, the set of subjects is $\{1, 2\}$; the set of subjects corresponding to $h^* = 2$ is $\{1, 3\}$; and the set of subjects corresponding to $h^* = 3$ is $\{2, 3\}$.
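A small numeric check (with hypothetical responses for subjects 1, 2, and 3) confirms that each indicator matrix selects the stated subset in population order:

```python
import numpy as np

y = np.array([10.0, 20.0, 30.0])   # hypothetical responses for subjects 1, 2, 3

delta = {
    1: np.array([[1, 0, 0], [0, 1, 0]]),   # sample set {1, 2}
    2: np.array([[1, 0, 0], [0, 0, 1]]),   # sample set {1, 3}
    3: np.array([[0, 1, 0], [0, 0, 1]]),   # sample set {2, 3}
}

for h_star, d in delta.items():
    print(h_star, d @ y)   # the sample response vector y*_{h*} = delta*' y
```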
Let $S^*_{h^*}$ represent an indicator random variable associated with sample $h^*$ from the population, such that $P\left( S^*_{h^*} = 1 \right) = p_{h^*}$, where $p_{h^*} \ge 0$ for all $h^* = 1, \ldots, H^*$ and $\sum_{h^*=1}^{H^*} p_{h^*} = 1$. The probability $p_{h^*}$ represents the probability of selecting sample $h^*$ from the population. Using this probability, we represent a random vector corresponding to the sample response as $\sum_{h^*=1}^{H^*} S^*_{h^*}\, \mathbf{y}^*_{h^*}$.
The previous definitions uniquely define a response vector for each distinct sample set. The order of the subjects in the response vector was arbitrarily set to match the subject order in the population. However, any order is possible for the subjects in the set. This means that we could represent response for sample $h^*$ by any of the vectors in the set $\left\{ \mathbf{v}_m'\, \mathbf{y}^*_{h^*},\ m = 1, \ldots, M \right\}$, where $\mathbf{v}_m$ is an $n \times n$ permutation matrix with elements equal to zero or one, all rows and columns summing to one, and where we define $\mathbf{v}_1' = \mathbf{I}_n$. Which ordering of response (identified by $m$) is used in the response vector $\mathbf{y}^*_{m h^*} = \mathbf{v}_m'\, \mathbf{y}^*_{h^*}$ for sample $h^*$ determines the interpretation of individual responses, since it identifies which subject is associated with a response. The ordering does not impact interpretation of summary measures of response for the sample, such as the sample mean, total, or maximum response. Let us assume that it is not necessary to retain identifiability of individual subjects in set $h^*$ in the response vector for set $h^*$. This implies that it is not necessary to know in which order (identified by $m$) responses are listed in the response vector. We implement the assumption that we do not need to identify
subjects in the response vector for set $h^*$ by defining $\mathbf{Y}^*_{h^*} = \sum_{m=1}^{M} S_m^{(h^*)}\, \mathbf{y}^*_{m h^*}$, where the indicator random variable $S_m^{(h^*)}$ has a value of one when permutation $m$ represents the ordering of response, and zero otherwise. We represent $P\left( S_m^{(h^*)} = 1 \right) = p_m^{(h^*)}$, and the sample response vector as $\mathbf{Y}_I = \sum_{h^*=1}^{H^*} S^*_{h^*}\, \mathbf{Y}^*_{h^*}$.
Subsequently, we assign equal probability to each order, assuming that $p_m^{(h^*)} = \frac{1}{M}$ for all $m = 1, \ldots, M$ and $h^* = 1, \ldots, H^*$. We develop an expression for the expected value and variance of $\mathbf{Y}_I$ next. In so doing, we introduce notation to represent random variables corresponding to subjects in the remainder of the population. This simplifies the calculations and facilitates relationships with other sampling models.
Using earlier definitions, $\mathbf{Y}_I = \sum_{h^*=1}^{H^*} \sum_{m=1}^{M} S^*_{h^*}\, S_m^{(h^*)}\, \mathbf{v}_m'\, \boldsymbol{\delta}_{h^*}^{*\prime}\, \mathbf{y}$. We introduce a similar vector of remaining random variables as $\mathbf{Y}_{II} = \sum_{h^*=1}^{H^*} \sum_{t=1}^{T} S^*_{h^*}\, R_t^{(h^*)}\, \mathbf{w}_t'\, \boldsymbol{\delta}_{h^*}^{*c\prime}\, \mathbf{y}$, where $\begin{pmatrix} \boldsymbol{\delta}_{h^*}^{*\prime} \\ \boldsymbol{\delta}_{h^*}^{*c\prime} \end{pmatrix}$ is a permutation matrix, $\boldsymbol{\delta}_{h^*}^{*c\prime}\, \mathbf{j}$ identifies the subjects not in set $h^*$, $\mathbf{w}_t$, $t = 1, \ldots, T = (N-n)!$, is an $(N-n) \times (N-n)$ permutation matrix with elements equal to zero or one, all rows and columns summing to one, and the elements of $\boldsymbol{\delta}_{h^*}^{*c\prime}$ are defined so that the order of the subjects in $\boldsymbol{\delta}_{h^*}^{*c\prime}\, \mathbf{j}$ is the same as the order of subjects in the population. The indicator random variable $R_t^{(h^*)}$ has a value of one when permutation $t$ represents the ordering of subjects in $\mathbf{w}_t'\, \boldsymbol{\delta}_{h^*}^{*c\prime}\, \mathbf{y}$, and zero otherwise, such that $P\left( R_t^{(h^*)} = 1 \right) = p_t^{(h^*)}$. We subsequently assume that $p_t^{(h^*)} = \frac{1}{T}$ for all $t = 1, \ldots, T$ and $h^* = 1, \ldots, H^*$. With these assumptions,

$$\begin{pmatrix} \mathbf{Y}_I \\ \mathbf{Y}_{II} \end{pmatrix} = \sum_{h^*=1}^{H^*} S^*_{h^*} \begin{pmatrix} \sum_{m=1}^{M} S_m^{(h^*)}\, \mathbf{v}_m' & \mathbf{0}_{n\times(N-n)} \\ \mathbf{0}_{(N-n)\times n} & \sum_{t=1}^{T} R_t^{(h^*)}\, \mathbf{w}_t' \end{pmatrix} \begin{pmatrix} \boldsymbol{\delta}_{h^*}^{*\prime} \\ \boldsymbol{\delta}_{h^*}^{*c\prime} \end{pmatrix} \mathbf{y}.$$
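For $N = 3$ and $n = 2$ this construction can be verified exhaustively: running over all $H^* = 3$ sample sets, $M = 2$ sample orderings, and $T = 1$ remainder ordering generates exactly the $3! = 6$ permutations of the population response vector. A sketch with illustrative responses:

```python
import numpy as np
from itertools import permutations

y = np.array([1.0, 2.0, 4.0])
N, n = 3, 2

subsets = [(0, 1), (0, 2), (1, 2)]                 # H* = 3 sample sets, population order
v = [np.eye(n), np.array([[0., 1.], [1., 0.]])]    # M = n! = 2 sample permutations
w = [np.eye(N - n)]                                # T = (N - n)! = 1

realizations = set()
for subset in subsets:
    rest = [j for j in range(N) if j not in subset]
    d_star = np.eye(N)[list(subset), :]            # delta*'  : n x N indicator matrix
    d_star_c = np.eye(N)[rest, :]                  # delta*c' : (N-n) x N indicator matrix
    for v_m in v:
        for w_t in w:
            top = v_m.T @ d_star @ y               # sample block
            bottom = w_t.T @ d_star_c @ y          # remainder block
            realizations.add(tuple(np.concatenate([top, bottom])))

print(len(realizations))   # 6 = 3!; every permutation of y appears exactly once
```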
We make the additional assumption that all sample sets are equally likely, such that $p_{h^*} = \frac{1}{H^*}$ for all $h^* = 1, \ldots, H^*$. With this assumption, plus the earlier assumptions that $p_m^{(h^*)} = \frac{1}{M}$ for all $m = 1, \ldots, M$ and $h^* = 1, \ldots, H^*$, and $p_t^{(h^*)} = \frac{1}{T}$ for all $t = 1, \ldots, T$ and $h^* = 1, \ldots, H^*$, $\begin{pmatrix} \mathbf{Y}_I \\ \mathbf{Y}_{II} \end{pmatrix}$ represents response for a random permutation of subjects in the population. Let the subscript $p$ in $E_p$ denote expectation with respect to such random permutations. Standard calculations result in $E_p\left( \mathbf{Y}_I \right) = \mu\, \mathbf{1}_n$ and $\mathrm{var}_p\left( \mathbf{Y}_I \right) = \sigma^2 \left( \mathbf{I}_n - \frac{1}{N} \mathbf{J}_n \right)$. This is the finite population sampling model discussed by Stanek and Singer (2004), which was extended to two-stage cluster sampling and the finite population mixed model by Stanek and Singer (2004).

Estimating the Population Mean based on the Sample Mean from a Finite Population Sampling Model
We use the finite population sampling model to evaluate the expected value and variance of an estimator of the population mean, $\mu$, given by the sample mean, $\bar{Y} = \frac{1}{n} \mathbf{1}_n'\, \mathbf{Y}_I$. The expected value is given by $E_p\left( \bar{Y} \right) = \mu$, indicating that the sample mean is an unbiased estimator of the population mean. The variance of the sample mean is given by
$$\mathrm{var}_p\left( \bar{Y} \right) = \frac{\sigma^2}{n} \left( 1 - \frac{n}{N} \right).$$
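Both moments of the sample mean can be verified by enumerating every sample set of size $n$. The sketch below uses hypothetical responses and takes $\sigma^2$ with divisor $N - 1$, an assumption consistent with the variance expression above.

```python
import numpy as np
from itertools import combinations

y = np.array([1.0, 2.0, 4.0, 7.0])   # hypothetical population responses
N, n = len(y), 2

# Sample means of all C(N, n) equally likely sample sets
means = np.array([y[list(s)].mean() for s in combinations(range(N), n)])
mu, sigma2 = y.mean(), y.var(ddof=1)   # sigma^2 with divisor N - 1 (assumed)

E_p = means.mean()
var_p = np.mean((means - E_p) ** 2)

print(np.isclose(E_p, mu))                              # unbiasedness
print(np.isclose(var_p, (sigma2 / n) * (1 - n / N)))    # variance formula
```

Both checks print True for any population vector, confirming the finite population correction factor $1 - n/N$.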
Given the data set $h^*$, we substitute the observed response, $\mathbf{x}$, for $\mathbf{Y}_I$. Under the assumptions of the finite population sampling model, the estimate of the population mean is given by $\bar{x} = \frac{1}{n} \mathbf{1}_n'\, \mathbf{x}$. Under the model assumptions, since $E_p\left( \bar{Y} \right) = \mu$, we say that the estimator is an unbiased estimator of $\mu$, where the estimate $\bar{x}$ is a realized value of the estimator, given the data set.
References
Bickel, P.J. and Doksum, K.A. (1977). Mathematical Statistics: Basic Ideas and Selected Topics. Holden-Day, San Francisco, CA.
Ericson, W.A. (1969). Subjective Bayesian models in sampling finite populations. Journal of the Royal Statistical Society, Series B, 31:195-233.
Ericson, W.A. (1988). Bayesian inference in finite populations. In P.R. Krishnaiah and C.R. Rao, eds., Handbook of Statistics, Vol. 6. Elsevier Science Publishers, 213-246.