C10ed09v11.doc 4/28/2011 5:18 PM 1
Predicting the Mean using a Sampling Model and an Exchangeable Bayesian Model in a Finite Population
Ed Stanek
Introduction

We develop the predictor of a population mean based on an exchangeable Bayesian model similar to that presented by Ericson (1969, 1988), while simultaneously discussing estimation of the population mean in a finite population sampling model. An earlier detailed description of a related framework and an example is given in c10ed05v5.doc. We borrow some notation and ideas from this document. We include an auxiliary variable in the problem description, but do not discuss use of the auxiliary variable in the analysis. In a later document, we consider the auxiliary variable to be measurement error, and replace a subject's latent value (i.e. response) in the data set with a response that includes measurement error. In Bayesian models, we consider the prior distribution to be an exchangeable distribution defined by a permutation of subjects in a population. For clarity, we introduce the notation and ideas in the context of a simple example. We focus our discussion on estimating/predicting the finite population mean, but note that this investigation was motivated by the desire to better understand prediction of subjects' latent values in the context of a mixed model.

First, we define notation for the observed response and an auxiliary variable on each subject in a set of $n$ subjects, taking $n = 2$. This constitutes the data set. In order to represent response for the set as a response vector, we define a sequence of subjects, and represent response for the sequence. This ordering is artificial to the problem, and is introduced solely to represent response in a vector. As a result, different orderings of subjects could be considered, with response for the data set described by the response vector for the corresponding ordering. We introduce notation using permutation matrices that allows the response vector to be specified in terms of any of these possible orderings.
Next, we define the population, assuming that subjects in the data set are part of the population. We define a vector of response for subjects in the population, with the first part of the vector corresponding to response for the subjects in the data set. The definition specifies the subset of subjects in the population where response is observed, and we assume that the value of response observed is the same value as in the data set. Our example includes $N$ subjects in the population, where $N = 3$. Each subject in the population has response and an auxiliary variable. We define parameters corresponding to the mean and variance of response in the population, and similar parameters for the data set.

In a Bayesian analysis, there is a prior distribution associated with the population. We define a discrete prior distribution, where points in the prior correspond to the response vector for different sequences of subjects in the population. Associated with each point is a probability. When these prior probabilities are equal for all sequences, the prior distribution is an exchangeable prior distribution. We define sequences and response for each of the $N!$ points in the prior next, using a nested notation to index sequences. The notation has three levels, with the first level indexing the set of $n$ subjects occupying the first $1,\ldots,n$ positions in the prior listing, the second level indexing permutations of subjects in the set, and the third level indexing permutations of subjects in the remaining $n+1,\ldots,N$ positions. Response for a point in the prior distribution represents a response vector for the entire population of $N$ subjects for a prior listing. The data set consists of response for $n$ subjects. For a point in the prior distribution, the data set corresponds to a subset of $n$ subjects from the prior listing. We define all possible subsets of $n$ subjects from each prior listing next.
In order to systematically represent the subsets of the prior listings, we identify the subsets in an identical manner for each prior listing by specifying the positions in the prior listing that are included in the subset. For each prior listing, there are $\binom{N}{n}$ distinct subsets. The joint prior/sampling distribution will have $N!\binom{N}{n}$ points. We indicate these points by representing response for each prior listing in $\binom{N}{n}$ response vectors, where the first $n$ responses in the response vector correspond to response for a distinct subset of subjects. Recall that we have associated with each prior listing a prior probability. In defining the joint prior/sampling distribution, we need to assign probabilities to distinct subsets (samples) of a prior listing. We define such probabilities so that for each prior listing, they sum to one. As a result, we associate with each point in the joint prior/sampling distribution a probability equal to the product of the prior probability times the probability of the subset.

We next discuss the posterior distribution, which is equal to the joint prior/sampling distribution, given the data set. The data set specifies the subset of subjects where response is observed. When conditioning on the data set, the subset of subjects in the joint prior/sampling distribution must match those in the data set. Since not all points in the joint prior/sampling distribution satisfy this criterion, such points will have zero probability in the posterior distribution. We summarize points with positive probability in the posterior distribution. This posterior distribution is easy to specify since the data set is clearly linked to the prior distribution, and consequently can be used to identify the sample sets. The posterior distribution is the joint distribution of two independent permutation distributions. One permutation vector is the vector of responses for subjects in the data set. The other permutation vector is the vector of responses for subjects not included in the data set. Each has a distinct expected value and variance. We evaluate the expected value and variance of the mean of the sample portion of the posterior distribution. The mean has expected value equal to the mean in the data set (i.e. sample), with zero variance. This result matches an intuitive idea of the expected value and variance.
We contrast this development with a development of the distribution of the sample mean based on usual finite population sampling. A similar evaluation is made of the expected value and variance of the sample mean. The mean has expected value equal to the population mean, with the variance equal to the usual finite population variance.

The Data Set

We assume that a data set consists of the label, response, and auxiliary variable values for $n$ subjects, where for our example, $n = 2$. We denote the set of subjects by $h^* = \{ID_1, ID_2, \ldots, ID_n\}$, and the response and auxiliary variable for the subjects by $\left\{ (x_{ID_1}\; a_{ID_1}),\; (x_{ID_2}\; a_{ID_2}),\; \ldots,\; (x_{ID_n}\; a_{ID_n}) \right\}$, where $x_{Name}$ is the latent value of response for a subject, and $a_{Name}$ is the value of an auxiliary variable for the subject. For our example, the set of subjects is $h^+ = \{Rose, Lily\}$, and the set of response and auxiliary variables for the subjects is $\left\{ (x_{Lily}\; a_{Lily}),\; (x_{Rose}\; a_{Rose}) \right\}$. We order the subjects in a vector $\boldsymbol{\lambda} = (ID_1\; ID_2\; \cdots\; ID_n)'$. In the example, we define the order as $\boldsymbol{\lambda} = (Rose\; Lily)'$, and the
corresponding response as $\mathbf{x} = (x_{Rose}\; x_{Lily})'$. Let us define new labels for the subjects in order $\boldsymbol{\lambda}$ by $s = 1,\ldots,n$, where for the example, $\mathbf{s} = \begin{pmatrix} s = 1 \\ s = 2 \end{pmatrix}$ corresponds to $\boldsymbol{\lambda} = \begin{pmatrix} Rose \\ Lily \end{pmatrix}$. Using these labels, we represent response for subject $s$ by $x_s$, and define for the example $\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_R \\ x_L \end{pmatrix}$.
Since for set $h^*$ we could arrange subjects in different orders, we note that it is possible to define a vector of responses with a different subject order. We define a set of these possible response vectors by $\left\{ \mathbf{v}_m' \begin{pmatrix} x_R \\ x_L \end{pmatrix},\; m = 1,\ldots,M \right\}$, where $M = n!$, $\mathbf{v}_m$ is an $n \times n$ permutation matrix with elements equal to zero or one, all rows and columns summing to one, and where we define $\mathbf{v}_1' = \mathbf{I}_n$. With these definitions, response for the subjects in data set $h^*$ is given by any of the vectors in the set $\left\{ \mathbf{v}_m' \mathbf{x},\; m = 1,\ldots,M \right\}$. The vector of auxiliary variables corresponding to the set $h^*$ is defined in a similar manner as any of the vectors in the set $\left\{ \mathbf{v}_m' \mathbf{a},\; m = 1,\ldots,M \right\}$, where $\mathbf{a} = \mathbf{v}_1' \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} a_R \\ a_L \end{pmatrix}$.
The Population

We define a population as a set $f$ of $N \ge n$ subjects that include the subjects in the data set. As an example, when $N = 3$ we define the set of subjects as $f = \{Lily, Rose, Daisy\}$. We define a population listing as a sequence of subjects, and specify the order of subjects in the sequence by the vector $\begin{pmatrix} \boldsymbol{\lambda}_I \\ \boldsymbol{\lambda}_{II} \end{pmatrix}$, where $\boldsymbol{\lambda}_I = \boldsymbol{\lambda}$. We assign a new subject label associated with this ordering as $j = 1,\ldots,N$, such that $\begin{pmatrix} \boldsymbol{\lambda}_I \\ \boldsymbol{\lambda}_{II} \end{pmatrix} = \begin{pmatrix} j = 1 \\ j = 2 \\ j = 3 \end{pmatrix}$ corresponds to $\begin{pmatrix} Rose \\ Lily \\ Daisy \end{pmatrix}$. We represent response for the population listing via the vector $(y_R\; y_L\; y_D)'$, or equivalently as $\mathbf{y} = (y_1\; y_2\; y_3)'$, where $y_j$ is response for the subject labeled $j$, and partition it such that $\mathbf{y} = (\mathbf{y}_I'\; \mathbf{y}_{II}')'$, noting that $\mathbf{y}_I = \mathbf{x}$ is an $n \times 1$ vector. We define population parameters corresponding to the average response as $\mu = \frac{1}{N}\sum_{j=1}^{N} y_j$ and $\sigma^2 = \frac{1}{N-1}\sum_{j=1}^{N} (y_j - \mu)^2$. We define the average response in the data set as $\mu_I = \frac{1}{n}\sum_{j=1}^{n} y_j$ and the average response in the remainder as $\mu_{II} = \frac{1}{N-n}\sum_{j=n+1}^{N} y_j$. We also define $\mu_x = \mu_I$, $\sigma_x^2 = \frac{1}{n-1}\sum_{s=1}^{n} (x_s - \mu_x)^2$ and $\sigma_{II}^2 = \frac{1}{N-n-1}\sum_{j=n+1}^{N} (y_j - \mu_{II})^2$.
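The population parameters can be checked by enumeration. The sketch below uses assumed response values (not from the document) for the $N = 3$ example, and also verifies by direct enumeration the finite population sampling results quoted in the introduction: the sample mean of a simple random sample has expectation $\mu$ and variance $(1 - n/N)\,\sigma^2/n$.

```python
import itertools

# Illustrative population of N = 3 responses (y_Rose, y_Lily, y_Daisy);
# the numeric values are assumptions for this sketch only.
y = [2.0, 4.0, 9.0]
N, n = len(y), 2

mu = sum(y) / N                                    # population mean
sigma2 = sum((v - mu) ** 2 for v in y) / (N - 1)   # population variance

# Enumerate all simple random samples of size n without replacement.
samples = list(itertools.combinations(y, n))
sample_means = [sum(s) / n for s in samples]

# Design-based expectation and variance of the sample mean.
E_ybar = sum(sample_means) / len(samples)
V_ybar = sum((m - E_ybar) ** 2 for m in sample_means) / len(samples)

assert abs(E_ybar - mu) < 1e-12                        # E(ybar) = mu
assert abs(V_ybar - (1 - n / N) * sigma2 / n) < 1e-12  # usual fpc variance
```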
The Prior Distribution (Defined as Response over Prior Listings)

Subjects in the population may be listed in a different order, which corresponds to a different sequence of subjects. We refer to such a sequence as a prior listing. There are $p = 1,\ldots,P = N!$ prior listings. We define an alternative nested notation to index the prior listings. The highest level index represents a set of $n$ subject labels, and is indexed by $h = 1,\ldots,H = \binom{N}{n}$. Within each set $h$, we index permutations of the labels by $m = 1,\ldots,M = n!$. Finally, within each permutation $m$ of each set $h$, we index permutations of the remaining subject labels not in set $h$ by $t = 1,\ldots,T = (N-n)!$. Associated with each prior listing is a prior probability, $\pi_{hmt}$, such that $\sum_{h=1}^{H}\sum_{m=1}^{M}\sum_{t=1}^{T} \pi_{hmt} = 1$. The prior distribution is defined
as the distribution of response over these prior listings.

An Initial Attempt To Index Permutations with Nested Notation

We define sequences of subjects explicitly in terms of the population listing using permutation matrices. The prior listing $hmt$ is defined by $\mathbf{j}^{(hmt)} = \begin{pmatrix} \boldsymbol{\delta}_h' \\ \boldsymbol{\delta}_h^{c\prime} \end{pmatrix}\mathbf{j}^{(mt)}$, where $\mathbf{j}^{(mt)} = \begin{pmatrix} \mathbf{v}_m' & \mathbf{0}_{n \times (N-n)} \\ \mathbf{0}_{(N-n) \times n} & \mathbf{w}_t' \end{pmatrix}\mathbf{j}$, $\boldsymbol{\delta}_h'$ is an $n \times N$ indicator matrix that identifies the subjects in set $h$ as the subjects in the vector $\boldsymbol{\delta}_h'\mathbf{j}^{(mt)}$, $\boldsymbol{\delta}_h^{c\prime}$ is the ortho-complement of $\boldsymbol{\delta}_h'$ such that $\begin{pmatrix} \boldsymbol{\delta}_h' \\ \boldsymbol{\delta}_h^{c\prime} \end{pmatrix}$ is a permutation matrix, where $\boldsymbol{\delta}_h^{c\prime}\mathbf{j}^{(mt)}$ identifies the subjects not in set $h$, $\mathbf{v}_m$ is an $n \times n$ permutation matrix with elements equal to zero or one with all rows and columns summing to one, and $\mathbf{w}_t$ is an $(N-n) \times (N-n)$ permutation matrix with elements equal to zero or one, with all rows and columns summing to one. In order to uniquely define $\boldsymbol{\delta}_h'$ for set $h$, we require the subjects in $\boldsymbol{\delta}_h'\mathbf{j}^{(mt)}$ to be in the same order as subjects in $\mathbf{j}^{(mt)}$. We also uniquely define $\boldsymbol{\delta}_h^{c\prime}$ by requiring the order of the subjects in $\boldsymbol{\delta}_h^{c\prime}\mathbf{j}^{(mt)}$ to be the same as the order of subjects in $\mathbf{j}^{(mt)}$.

We label subjects in a prior listing by $\mathbf{j}^{(hmt)} = \left( j_1^{(hmt)}\; j_2^{(hmt)}\; \cdots\; j_N^{(hmt)} \right)'$ with elements $j_\ell^{(hmt)}$, for $\ell = 1,\ldots,N$. Response is given by $\mathbf{y}^{(hmt)} = \begin{pmatrix} \boldsymbol{\delta}_h' \\ \boldsymbol{\delta}_h^{c\prime} \end{pmatrix}\mathbf{y}^{(mt)}$, where $\mathbf{y}^{(mt)} = \begin{pmatrix} \mathbf{v}_m' & \mathbf{0}_{n \times (N-n)} \\ \mathbf{0}_{(N-n) \times n} & \mathbf{w}_t' \end{pmatrix}\mathbf{y}$ and $\mathbf{y}^{(hmt)} = \left( y_1^{(hmt)}\; y_2^{(hmt)}\; \cdots\; y_N^{(hmt)} \right)'$ with elements $y_\ell^{(hmt)}$, for $\ell = 1,\ldots,N$. We define $\boldsymbol{\delta}_1' = \begin{pmatrix} \mathbf{I}_n & \mathbf{0}_{n \times (N-n)} \end{pmatrix}$, $\boldsymbol{\delta}_1^{c\prime} = \begin{pmatrix} \mathbf{0}_{(N-n) \times n} & \mathbf{I}_{N-n} \end{pmatrix}$, $\mathbf{v}_1 = \mathbf{I}_n$ and $\mathbf{w}_1 = \mathbf{I}_{N-n}$, so that $\mathbf{j}^{(111)} = \mathbf{j}$ and $\mathbf{y}^{(111)} = \mathbf{y}$. The prior distribution of response is given by $\mathbf{Y} = \sum_{h=1}^{H}\sum_{m=1}^{M}\sum_{t=1}^{T} I_{hmt}\,\mathbf{y}^{(hmt)}$, where $I_{hmt}$ is an indicator random variable with
$E_\xi\left(I_{hmt}\right) = \pi_{hmt}$ (which we represent by $\pi_p$ when we use the simpler index $p = 1,\ldots,P$). An example of the prior listings and prior distribution of response for the example is given in Table 1. There are $N! = 3! = 6$ sequences corresponding to the prior listings in Table 1. The listings are organized by the sets of $n$ subjects identified by $h$, where these subjects are included among the first $n$ positions in the prior listing. The first row includes prior listings where the set of subjects corresponding to $h = 1$ is $\{Lily, Rose\}$. The second row includes prior listings where the set of subjects corresponding to $h = 2$ is $\{Daisy, Rose\}$; the third row includes prior listings where the set of subjects corresponding to $h = 3$ is $\{Daisy, Lily\}$. Each prior listing is identified by the indices $hmt$ (or more simply by the index $p$). The order of the subjects in the prior listing (relative to the population listing) is indicated by the vector labels for $\mathbf{j}$.

As an example, for the prior listing $hmt = 221$, the subjects are in the order $(j = 2\;\; j = 3\;\; j = 1)'$, corresponding to $(Lily\;\; Daisy\;\; Rose)'$. Subject labels specific to this prior listing are given by $j_\ell^{(221)}$, $\ell = 1,\ldots,N$. Response for subjects in a prior listing is represented by response indexed by the subject's first initial, or by subject labels specific to this prior listing as $y_\ell^{(hmt)}$, $\ell = 1,\ldots,N$. When $\pi_{hmt} = \frac{1}{HMT}$ for all $h = 1,\ldots,H$, $m = 1,\ldots,M$ and $t = 1,\ldots,T$, the prior distribution is exchangeable.

Under the assumption that the prior distribution is exchangeable, we represent the indicator random variable for the prior listing as the product of three independent indicator random variables, $I_{hmt} = I_h\, I_{I,m}\, I_{II,t}$. The indicator random variable $I_h$ has a value of one when the prior listing has the first $n$ positions occupied by the subjects in set $h$, and zero otherwise, where $P\left(I_h = 1\right) = \frac{1}{H}$ for all $h = 1,\ldots,H$. The indicator random variable $I_{I,m}$ has a value of one when the population subjects $j = 1,\ldots,n$ defined by the vector $\begin{pmatrix} \mathbf{v}_m' & \mathbf{0}_{n \times (N-n)} \end{pmatrix}\mathbf{j}$ are in order $m$, and zero otherwise, where $P\left(I_{I,m} = 1\right) = \frac{1}{M}$ for all $m = 1,\ldots,M$. The indicator random variable $I_{II,t}$ has a value of one when the population subjects $j = n+1,\ldots,N$ defined by the vector $\begin{pmatrix} \mathbf{0}_{(N-n) \times n} & \mathbf{w}_t' \end{pmatrix}\mathbf{j}$ are in order $t$, and zero otherwise, where $P\left(I_{II,t} = 1\right) = \frac{1}{T}$ for all $t = 1,\ldots,T$.
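A minimal sketch of the nested index, assuming the subject names of the example: it enumerates prior listings by $(h, m, t)$, choosing the set $h$ for the first $n$ positions and permuting within and without (this set-based enumeration reproduces all $N!$ distinct listings), and assigns the exchangeable prior probability $1/(HMT) = 1/N!$ to each.

```python
from itertools import combinations, permutations
from math import comb, factorial

population = ["Rose", "Lily", "Daisy"]   # population listing, j = 1, 2, 3
N, n = len(population), 2

H, M, T = comb(N, n), factorial(n), factorial(N - n)
assert (H, M, T) == (3, 2, 1) and H * M * T == factorial(N)

# Enumerate prior listings by the nested (h, m, t) index: choose the set h
# occupying the first n positions, permute it (m), permute the rest (t).
listings = {}
for h, subset in enumerate(combinations(population, n), start=1):
    rest = [s for s in population if s not in subset]
    for m, front in enumerate(permutations(subset), start=1):
        for t, back in enumerate(permutations(rest), start=1):
            listings[(h, m, t)] = list(front) + list(back)

# Exchangeable prior: every listing gets probability 1/(H*M*T) = 1/N!.
pi = {key: 1 / (H * M * T) for key in listings}
assert len(set(map(tuple, listings.values()))) == 6    # all N! listings
assert abs(sum(pi.values()) - 1) < 1e-12
```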
Table 1. Prior Listing and Response for Sequences of $N = 3$ Subjects for Population $f$
_____________________________________________________________________________
$p = 1$ ($h = 1$, $m = 1$, $t = 1$), $\pi_{111}$:
  $\mathbf{j}^{(111)} = \mathbf{j} = (1\; 2\; 3)'$,  $\mathbf{y}^{(111)} = \mathbf{y} = (y_R\; y_L\; y_D)'$;
  $\mathbf{v}_1' = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, $\mathbf{w}_1' = 1$, $\begin{pmatrix} \boldsymbol{\delta}_1' \\ \boldsymbol{\delta}_1^{c\prime} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$
$p = 2$ ($h = 1$, $m = 2$, $t = 1$), $\pi_{121}$:
  $\mathbf{j}^{(121)} = (2\; 1\; 3)'$,  $\mathbf{y}^{(121)} = (y_L\; y_R\; y_D)'$;
  $\mathbf{v}_2' = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, $\mathbf{w}_1' = 1$, and $\begin{pmatrix} \boldsymbol{\delta}_1' \\ \boldsymbol{\delta}_1^{c\prime} \end{pmatrix}$ as for $p = 1$
_____________________________________________________________________________
$p = 3$ ($h = 2$, $m = 1$, $t = 1$), $\pi_{211}$:
  $\mathbf{j}^{(211)} = (1\; 3\; 2)'$,  $\mathbf{y}^{(211)} = (y_R\; y_D\; y_L)'$;
  $\mathbf{v}_1'$, $\mathbf{w}_1'$ as above, $\begin{pmatrix} \boldsymbol{\delta}_2' \\ \boldsymbol{\delta}_2^{c\prime} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$
$p = 4$ ($h = 2$, $m = 2$, $t = 1$), $\pi_{221}$:
  $\mathbf{j}^{(221)} = (2\; 3\; 1)'$,  $\mathbf{y}^{(221)} = (y_L\; y_D\; y_R)'$;
  $\mathbf{v}_2'$, $\mathbf{w}_1'$, and $\begin{pmatrix} \boldsymbol{\delta}_2' \\ \boldsymbol{\delta}_2^{c\prime} \end{pmatrix}$ as above
_____________________________________________________________________________
$p = 5$ ($h = 3$, $m = 1$, $t = 1$), $\pi_{311}$:
  $\mathbf{j}^{(311)} = (2\; 3\; 1)'$,  $\mathbf{y}^{(311)} = (y_L\; y_D\; y_R)'$;
  $\mathbf{v}_1'$, $\mathbf{w}_1'$, $\begin{pmatrix} \boldsymbol{\delta}_3' \\ \boldsymbol{\delta}_3^{c\prime} \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$
$p = 6$ ($h = 3$, $m = 2$, $t = 1$), $\pi_{321}$:
  $\mathbf{j}^{(321)} = (1\; 3\; 2)'$,  $\mathbf{y}^{(321)} = (y_R\; y_D\; y_L)'$;
  $\mathbf{v}_2'$, $\mathbf{w}_1'$, and $\begin{pmatrix} \boldsymbol{\delta}_3' \\ \boldsymbol{\delta}_3^{c\prime} \end{pmatrix}$ as above
_____________________________________________________________________________
Problems with The Initial Nested Notation Idea

Inspection of Table 1 illustrates that the nested notation idea does not reproduce the $N!$ permutations. The same permutation will result when $p = 4$ and $p = 5$, and when $p = 3$ and $p = 6$. Some other terms need to be included to create a nested notation that will reproduce the permutations. One possibility is to include another term that will pre-multiply the vectors for $\mathbf{y}^{(311)}$ and $\mathbf{y}^{(321)}$. For example, pre-multiplying $\mathbf{y}^{(311)}$ by $\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$ will result in the vector $(y_D\; y_R\; y_L)'$, while pre-multiplying $\mathbf{y}^{(321)}$ by $\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$ will result in the vector $(y_D\; y_L\; y_R)'$. For this reason, it seems that we could represent the permutations as in Table 2.
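The duplication can be checked directly. The sketch below implements the initial construction, $\mathbf{y}^{(hmt)} = \begin{pmatrix} \boldsymbol{\delta}_h' \\ \boldsymbol{\delta}_h^{c\prime} \end{pmatrix}\mathrm{blockdiag}(\mathbf{v}_m', \mathbf{w}_t')\,\mathbf{y}$, for the $N = 3$, $n = 2$ example (response values replaced by the letters R, L, D) and confirms that only four of the six listings are distinct.

```python
y = ["R", "L", "D"]   # responses (y_R, y_L, y_D) of the population listing
N, n = 3, 2

# v_m: the m = 1, 2 orderings of the first n positions; w_1 is the
# identity on the remaining N - n = 1 position, so t = 1 only.
v_perms = [(0, 1), (1, 0)]
# delta_h' selects the rows of set h in listing order; h = 1, 2, 3 pick
# the position pairs {1,2}, {1,3}, {2,3}; the complement supplies the rest.
h_sets = [(0, 1), (0, 2), (1, 2)]

def listing(h, m):
    """y^(hm1): apply blockdiag(v_m', w_1') to y, then stack (delta; delta^c)."""
    front = [y[i] for i in v_perms[m]]   # v_m' acting on positions 1..n
    ymt = front + y[n:]                  # w_1' leaves the remainder fixed
    keep = h_sets[h]
    rest = [i for i in range(N) if i not in keep]
    return tuple(ymt[i] for i in keep) + tuple(ymt[i] for i in rest)

all_listings = {(h, m): listing(h, m) for h in range(3) for m in range(2)}

# Only 4 of the 6 vectors are distinct: (h=2, m=2) duplicates (h=3, m=1)
# (the p = 4 and p = 5 listings), and (h=2, m=1) duplicates (h=3, m=2)
# (p = 3 and p = 6), matching the duplication noted in the text.
assert all_listings[(1, 1)] == all_listings[(2, 0)] == ("L", "D", "R")
assert all_listings[(1, 0)] == all_listings[(2, 1)] == ("R", "D", "L")
assert len(set(all_listings.values())) == 4
```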
Table 2. Prior Listing and Response for Sequences of $N = 3$ Subjects for Population $f$
_____________________________________________________________________________
$p = 1$ ($h = 1$, $m = 1$, $t = 1$), $\pi_{111}$:
  $\mathbf{j}^{(111)} = \mathbf{j} = (1\; 2\; 3)'$,  $\mathbf{y}^{(111)} = \mathbf{y} = (y_R\; y_L\; y_D)'$;
  $\mathbf{v}_1' = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, $\mathbf{w}_1' = 1$, $\begin{pmatrix} \boldsymbol{\delta}_1' \\ \boldsymbol{\delta}_1^{c\prime} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$
$p = 2$ ($h = 1$, $m = 2$, $t = 1$), $\pi_{121}$:
  $\mathbf{j}^{(121)} = (2\; 1\; 3)'$,  $\mathbf{y}^{(121)} = (y_L\; y_R\; y_D)'$;
  $\mathbf{v}_2' = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, $\mathbf{w}_1' = 1$, and $\begin{pmatrix} \boldsymbol{\delta}_1' \\ \boldsymbol{\delta}_1^{c\prime} \end{pmatrix}$ as for $p = 1$
_____________________________________________________________________________
$p = 3$ ($h = 2$, $m = 1$, $t = 1$), $\pi_{211}$:
  $\mathbf{j}^{(211)} = (1\; 3\; 2)'$,  $\mathbf{y}^{(211)} = (y_R\; y_D\; y_L)'$;
  $\mathbf{v}_1'$, $\mathbf{w}_1'$ as above, $\begin{pmatrix} \boldsymbol{\delta}_2' \\ \boldsymbol{\delta}_2^{c\prime} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$
$p = 4$ ($h = 2$, $m = 2$, $t = 1$), $\pi_{221}$:
  $\mathbf{j}^{(221)} = (2\; 3\; 1)'$,  $\mathbf{y}^{(221)} = (y_L\; y_D\; y_R)'$;
  $\mathbf{v}_2'$, $\mathbf{w}_1'$, and $\begin{pmatrix} \boldsymbol{\delta}_2' \\ \boldsymbol{\delta}_2^{c\prime} \end{pmatrix}$ as above
_____________________________________________________________________________
$p = 5$ ($h = 3$, $m = 1$, $t = 1$), $\pi_{311}$:
  $\mathbf{j}^{(311)} = (3\; 1\; 2)'$,  $\mathbf{y}^{(311)} = \begin{pmatrix} \boldsymbol{\delta}_3' \\ \boldsymbol{\delta}_3^{c\prime} \end{pmatrix}(y_L\; y_D\; y_R)' = (y_D\; y_R\; y_L)'$;
  $\mathbf{v}_1'$, $\mathbf{w}_1'$, $\begin{pmatrix} \boldsymbol{\delta}_3' \\ \boldsymbol{\delta}_3^{c\prime} \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$
$p = 6$ ($h = 3$, $m = 2$, $t = 1$), $\pi_{321}$:
  $\mathbf{j}^{(321)} = (3\; 2\; 1)'$,  $\mathbf{y}^{(321)} = \begin{pmatrix} \boldsymbol{\delta}_3' \\ \boldsymbol{\delta}_3^{c\prime} \end{pmatrix}(y_R\; y_D\; y_L)' = (y_D\; y_L\; y_R)'$;
  $\mathbf{v}_2'$, $\mathbf{w}_1'$, and $\begin{pmatrix} \boldsymbol{\delta}_3' \\ \boldsymbol{\delta}_3^{c\prime} \end{pmatrix}$ as above
_____________________________________________________________________________
At this point, it seems we need to see if there is a way to generalize this nested representation to settings where $n > 2$ and $N > 3$. As an example, we consider $n = 2$ and $N = 4$ next. First, we illustrate the permutations that would result if we simply expressed the vectors as
Table 3. Examples when $n = 2$ and $N = 4$, where $\mathbf{I}_4$ is replaced by $\begin{pmatrix} \mathbf{v}_m' & \mathbf{0} \\ \mathbf{0} & \mathbf{w}_t' \end{pmatrix}$

Rows are indexed by $mt$; the first entry in each row gives the base vector $\mathbf{y}^{(mt)} = \begin{pmatrix} \mathbf{v}_m' & \mathbf{0} \\ \mathbf{0} & \mathbf{w}_t' \end{pmatrix}\mathbf{y}$, and the cells for $h = 1,\ldots,6$ move the subjects in set $h$ to the first $n$ positions, keeping the remaining subjects in listing order.

$mt = 11$, $\mathbf{y}^{(11)} = (y_R\; y_L\; y_D\; y_V)'$:
  $(y_R\; y_L\; y_D\; y_V)'$, $(y_R\; y_D\; y_L\; y_V)'$, $(y_R\; y_V\; y_L\; y_D)'$, $(y_L\; y_D\; y_R\; y_V)'$, $(y_L\; y_V\; y_R\; y_D)'$, $(y_D\; y_V\; y_R\; y_L)'$
$mt = 12$, $\mathbf{y}^{(12)} = (y_R\; y_L\; y_V\; y_D)'$:
  $(y_R\; y_L\; y_V\; y_D)'$, $(y_R\; y_V\; y_L\; y_D)'$, $(y_R\; y_D\; y_L\; y_V)'$, $(y_L\; y_V\; y_R\; y_D)'$, $(y_L\; y_D\; y_R\; y_V)'$, $(y_V\; y_D\; y_R\; y_L)'$
$mt = 21$, $\mathbf{y}^{(21)} = (y_L\; y_R\; y_D\; y_V)'$:
  $(y_L\; y_R\; y_D\; y_V)'$, $(y_L\; y_D\; y_R\; y_V)'$, $(y_L\; y_V\; y_R\; y_D)'$, $(y_R\; y_D\; y_L\; y_V)'$, $(y_R\; y_V\; y_L\; y_D)'$, $(y_D\; y_V\; y_L\; y_R)'$
$mt = 22$, $\mathbf{y}^{(22)} = (y_L\; y_R\; y_V\; y_D)'$:
  $(y_L\; y_R\; y_V\; y_D)'$, $(y_L\; y_V\; y_R\; y_D)'$, $(y_L\; y_D\; y_R\; y_V)'$, $(y_R\; y_V\; y_L\; y_D)'$, $(y_R\; y_D\; y_L\; y_V)'$, $(y_V\; y_D\; y_L\; y_R)'$
TO HERE 7/20/2010
The Joint Distribution of the Prior and a Subset (i.e. Sample) of the Prior

We describe next the joint distribution of the prior and the sample. When describing this joint distribution, we use the single index, $p = 1,\ldots,P = N!$, to refer to the prior listings. From each prior listing, we define a sample as a set of $n$ subjects, and index the sample by $h^*$, where $h^* \in \eta^*$, the set of all possible subsets of size $n$, and $h^* = 1,\ldots,H^* = \binom{N}{n}$. Let us define the labels for prior listing $p$ by $\mathbf{j}^{(p)} = \left( j_1^{(p)}\; j_2^{(p)}\; \cdots\; j_N^{(p)} \right)'$ and the corresponding vector of response by $\mathbf{y}^{(p)} = \left( y_1^{(p)}\; y_2^{(p)}\; \cdots\; y_N^{(p)} \right)'$. We define the samples via $n \times N$ indicator matrices, $\boldsymbol{\delta}_{h^*}^{*\prime}$, where the subjects in the sample set correspond to the subjects in the vector $\boldsymbol{\delta}_{h^*}^{*\prime}\mathbf{j}^{(p)}$ with response $\mathbf{y}_I^{(p,h^*)} = \boldsymbol{\delta}_{h^*}^{*\prime}\mathbf{y}^{(p)}$. In order to uniquely define $\boldsymbol{\delta}_{h^*}^{*\prime}$ for a sample set, we require the sample subjects in $\boldsymbol{\delta}_{h^*}^{*\prime}\mathbf{j}^{(p)}$ to be in the same order as the subjects in prior listing $p$. Also, we define $\boldsymbol{\delta}_1^{*\prime} = \begin{pmatrix} \mathbf{I}_n & \mathbf{0}_{n \times (N-n)} \end{pmatrix}$. Also, define $\boldsymbol{\delta}_{h^*}^{*c\prime}$ as the ortho-complement of $\boldsymbol{\delta}_{h^*}^{*\prime}$ such that $\begin{pmatrix} \boldsymbol{\delta}_{h^*}^{*\prime} \\ \boldsymbol{\delta}_{h^*}^{*c\prime} \end{pmatrix}$ is a permutation matrix, where $\boldsymbol{\delta}_{h^*}^{*c\prime}\mathbf{j}^{(p)}$ identifies the subjects not in set $h^*$, and where response is given by $\mathbf{y}_{II}^{(p,h^*)} = \boldsymbol{\delta}_{h^*}^{*c\prime}\mathbf{y}^{(p)}$. We uniquely define $\boldsymbol{\delta}_{h^*}^{*c\prime}$ by requiring the order of the subjects in $\boldsymbol{\delta}_{h^*}^{*c\prime}\mathbf{j}^{(p)}$ to be the same as the order of subjects in prior listing $p$.

Let $S_{h^*}^{(p)}$ represent an indicator random variable associated with sample $h^*$ from prior listing $p$, such that $P\left( S_{h^*}^{(p)} = 1 \right) = p_{h^*}^{(p)}$, where $p_{h^*}^{(p)} \ge 0$ for all $h^* = 1,\ldots,H^*$ and $\sum_{h^*=1}^{H^*} p_{h^*}^{(p)} = 1$. The probability $p_{h^*}^{(p)}$ represents the probability of selecting sample set $h^*$ from the prior listing $p$. Using the probability of the prior listing and the sample selection probabilities, we represent the sample response vector over the joint prior/sample distribution as $\mathbf{Y}_I = \sum_{p=1}^{P}\sum_{h^*=1}^{H^*} I_p\, S_{h^*}^{(p)}\, \mathbf{y}_I^{(p,h^*)}$. Defining $\mathbf{Y}_{II} = \sum_{p=1}^{P}\sum_{h^*=1}^{H^*} I_p\, S_{h^*}^{(p)}\, \mathbf{y}_{II}^{(p,h^*)}$, we represent $\begin{pmatrix} \mathbf{Y}_I \\ \mathbf{Y}_{II} \end{pmatrix} = \sum_{p=1}^{P}\sum_{h^*=1}^{H^*} I_p\, S_{h^*}^{(p)} \begin{pmatrix} \boldsymbol{\delta}_{h^*}^{*\prime} \\ \boldsymbol{\delta}_{h^*}^{*c\prime} \end{pmatrix}\mathbf{y}^{(p)}$.
There are a total of $PH^* = N!\binom{N}{n}$ points in the joint prior/sampling distribution. In the example when $N = 3$ and $n = 2$, this corresponds to 18 possible sample points illustrated in Table 2. We represent these 18 points in the joint prior/sampling distribution by a partitioned prior response vector for each point, where the first part of the partitioned vector is response for the sample, and the second part is response for the remainder in the prior listing. The $N! = 6$ rows in Table 2 correspond to prior listings, either indexed by $p = 1,\ldots,P$, or by the nested subscripts, $h = 1,\ldots,H$; $m = 1,\ldots,M$; $t = 1,\ldots,T$. We group the prior listings using the level of $h$ in the nested subscripts, separating the groups by a horizontal line. Columns in Table 2 correspond to different sample sets, $h^*$, from a prior listing, where the sample elements are contained in the first $n = 2$ rows of the response vector. The remaining rows of the response vector (which in the example corresponds to a single row since $N - n = 1$) correspond to response for the elements not in the sample set.
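A sketch of the joint prior/sampling construction for the example, using subject initials in place of responses: it pairs each of the $P = N!$ prior listings with each of the $H^* = \binom{N}{n}$ subsets of positions (sample subjects kept in listing order, remainder appended after) and confirms the count of 18 points.

```python
from itertools import combinations, permutations
from math import comb, factorial

population = ["R", "L", "D"]        # y_R, y_L, y_D for the example
N, n = 3, 2
P, H_star = factorial(N), comb(N, n)

# Each prior listing p is paired with each subset h* of n positions;
# the sample keeps listing order and the remainder follows.
points = []
for p, listing in enumerate(permutations(population), start=1):
    for h_star, pos in enumerate(combinations(range(N), n), start=1):
        sample = tuple(listing[i] for i in pos)
        remainder = tuple(listing[i] for i in range(N) if i not in pos)
        points.append((p, h_star, sample, remainder))

# P * H* = 3! * C(3,2) = 18 points in the joint prior/sampling distribution.
assert len(points) == P * H_star == 18
```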
Table 2. Joint Prior and Sample Response for Population $f$ with $N = 3$ and $n = 2$.
_____________________________________________________________________________
Each row gives a prior listing $p$ ($hmt$) with response $\mathbf{y}^{(hmt)}$, followed by the three sample points $h^* = 1, 2, 3$, each with probability $\pi_p\, p_{h^*}^{(p)}$ and partitioned vector $\left( \mathbf{y}_I^{(p,h^*)\prime} \mid \mathbf{y}_{II}^{(p,h^*)\prime} \right)'$.

$p = 1$ ($111$): $\mathbf{y}^{(111)} = (y_R\; y_L\; y_D)'$;  $\pi_1 p_1^{(1)}$: $(y_R\; y_L \mid y_D)'$,  $\pi_1 p_2^{(1)}$: $(y_R\; y_D \mid y_L)'$,  $\pi_1 p_3^{(1)}$: $(y_L\; y_D \mid y_R)'$
$p = 2$ ($121$): $\mathbf{y}^{(121)} = (y_L\; y_R\; y_D)'$;  $\pi_2 p_1^{(2)}$: $(y_L\; y_R \mid y_D)'$,  $\pi_2 p_2^{(2)}$: $(y_L\; y_D \mid y_R)'$,  $\pi_2 p_3^{(2)}$: $(y_R\; y_D \mid y_L)'$
_____________________________________________________________________________
$p = 3$ ($211$): $\mathbf{y}^{(211)} = (y_R\; y_D\; y_L)'$;  $\pi_3 p_1^{(3)}$: $(y_R\; y_D \mid y_L)'$,  $\pi_3 p_2^{(3)}$: $(y_R\; y_L \mid y_D)'$,  $\pi_3 p_3^{(3)}$: $(y_D\; y_L \mid y_R)'$
$p = 4$ ($221$): $\mathbf{y}^{(221)} = (y_L\; y_D\; y_R)'$;  $\pi_4 p_1^{(4)}$: $(y_L\; y_D \mid y_R)'$,  $\pi_4 p_2^{(4)}$: $(y_L\; y_R \mid y_D)'$,  $\pi_4 p_3^{(4)}$: $(y_D\; y_R \mid y_L)'$
_____________________________________________________________________________
$p = 5$ ($311$): $\mathbf{y}^{(311)} = (y_D\; y_R\; y_L)'$;  $\pi_5 p_1^{(5)}$: $(y_D\; y_R \mid y_L)'$,  $\pi_5 p_2^{(5)}$: $(y_D\; y_L \mid y_R)'$,  $\pi_5 p_3^{(5)}$: $(y_R\; y_L \mid y_D)'$
$p = 6$ ($321$): $\mathbf{y}^{(321)} = (y_D\; y_L\; y_R)'$;  $\pi_6 p_1^{(6)}$: $(y_D\; y_L \mid y_R)'$,  $\pi_6 p_2^{(6)}$: $(y_D\; y_R \mid y_L)'$,  $\pi_6 p_3^{(6)}$: $(y_L\; y_R \mid y_D)'$
_____________________________________________________________________________
The Posterior Distribution of the Joint Prior/Sample Distribution, Given the Data Set

The posterior distribution is the joint prior/sampling distribution, given the data set. The joint prior/sampling distribution for the example is illustrated in Table 2. In this distribution, for each point, the first part of the partitioned response vector is the sample, while the remainder is response for subjects not in the sample. When we condition on the data set consisting of $h^+ = \{Rose, Lily\}$, any point in the joint prior/sampling distribution where the sample response represents response for the subjects in $h^+$ is included. Such responses are members of the set of responses $\left\{ \mathbf{v}_m' \begin{pmatrix} x_R \\ x_L \end{pmatrix},\; m = 1,\ldots,M \right\}$. The points in the joint prior/sample distribution that have positive probability in the posterior distribution, given the data set, are given in Table 3.
Table 3. Posterior Distribution corresponding to the Joint Prior/Sample Response Given Response for the subjects in the Data Set $h^+ = \{Rose, Lily\}$
_____________________________________________________________________________
$h^* = 1$:
  $p = 1$ ($h = 1$, $m = 1$, $t = 1$), $\pi_1 p_1^{(1)}$:  $\left( y_1^{(111)}\; y_2^{(111)} \mid y_3^{(111)} \right)' = (y_R\; y_L \mid y_D)'$
  $p = 2$ ($h = 1$, $m = 2$, $t = 1$), $\pi_2 p_1^{(2)}$:  $\left( y_1^{(121)}\; y_2^{(121)} \mid y_3^{(121)} \right)' = (y_L\; y_R \mid y_D)'$
_____________________________________________________________________________
$h^* = 2$:
  $p = 3$ ($h = 2$, $m = 1$, $t = 1$), $\pi_3 p_2^{(3)}$:  $\left( y_1^{(211)}\; y_3^{(211)} \mid y_2^{(211)} \right)' = (y_R\; y_L \mid y_D)'$
  $p = 4$ ($h = 2$, $m = 2$, $t = 1$), $\pi_4 p_2^{(4)}$:  $\left( y_1^{(221)}\; y_3^{(221)} \mid y_2^{(221)} \right)' = (y_L\; y_R \mid y_D)'$
_____________________________________________________________________________
$h^* = 3$:
  $p = 5$ ($h = 3$, $m = 1$, $t = 1$), $\pi_5 p_3^{(5)}$:  $\left( y_2^{(311)}\; y_3^{(311)} \mid y_1^{(311)} \right)' = (y_R\; y_L \mid y_D)'$
  $p = 6$ ($h = 3$, $m = 2$, $t = 1$), $\pi_6 p_3^{(6)}$:  $\left( y_2^{(321)}\; y_3^{(321)} \mid y_1^{(321)} \right)' = (y_L\; y_R \mid y_D)'$
_____________________________________________________________________________
We summarize the resulting points in the posterior distribution in Table 4 using the nested notation for the probability of prior sample points.

Table 4. Posterior Distribution corresponding to the Joint Prior/Sample Response Given Response for the subjects in the Data Set $h^+ = \{Rose, Lily\}$.
_____________________________________________________________________________
Rows correspond to $m$; cells give the probability and partitioned vector for $h^* = 1, 2, 3$.

$m = 1$:  $\pi_{111} p_1^{(111)}$: $(y_R\; y_L \mid y_D)'$;   $\pi_{211} p_2^{(211)}$: $(y_R\; y_L \mid y_D)'$;   $\pi_{311} p_3^{(311)}$: $(y_R\; y_L \mid y_D)'$
$m = 2$:  $\pi_{121} p_1^{(121)}$: $(y_L\; y_R \mid y_D)'$;   $\pi_{221} p_2^{(221)}$: $(y_L\; y_R \mid y_D)'$;   $\pi_{321} p_3^{(321)}$: $(y_L\; y_R \mid y_D)'$
_____________________________________________________________________________

The summary illustrates that there are two distinct points in the posterior distribution, corresponding to the response vectors $(y_R,\ y_L,\ y_D)'$ and $(y_L,\ y_R,\ y_D)'$. These response vectors differ by the order of the subjects in the sample subset, indexed by $m = 1, \ldots, M = 2$. Notice that for each cell in Table 4, the values of $h$ and $h^*$ are identical. This is a consequence of introducing the nested notation for prior listings, and it simplifies summing posterior probabilities in rows, since we can replace $p_h^{(h^* m t)}$ by $p_h^{(hmt)}$ in the posterior distribution. Summing the posterior probabilities in each of the rows in Table 4, we define
$$p_{mt} = \sum_{h=1}^{H} \pi_h\, p_h^{(hmt)} = \pi_1\, p_1^{(1mt)} + \pi_2\, p_2^{(2mt)} + \pi_3\, p_3^{(3mt)},$$

where $m = 1, \ldots, M = 2$ and $t = 1$ (since $t = 1, \ldots, T = 1$), and define $p^*_{mt} = \dfrac{p_{mt}}{\sum_{m=1}^{M}\sum_{t=1}^{T} p_{mt}}$. Using this definition,
the posterior distribution is given in Table 5.
Table 5. Posterior Distribution corresponding to the Joint Prior/Sample Response Given Response for the Subjects in the Data Set $h^+ = \{Rose, Lily\}$.
_____________________________________________________________________________
 $m$    $t$    $p^*_{mt}$    Response
 1      1      $p^*_{11}$    $(y_R \;\; y_L \;\; y_D)'$
 2      1      $p^*_{21}$    $(y_L \;\; y_R \;\; y_D)'$
_____________________________________________________________________________
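Table 5 can be reproduced by brute-force enumeration. The sketch below is a minimal illustration in Python, with hypothetical responses for Rose, Lily, and Daisy (none of the numeric values come from the paper): it treats each of the $P = 3!$ prior listings as equally likely, conditions on the first $n$ positions containing the data set $\{Rose, Lily\}$ as a set, and tallies the surviving response vectors.

```python
from itertools import permutations
from collections import Counter
from fractions import Fraction

# Hypothetical latent responses (illustrative values only)
y = {"Rose": 1.0, "Lily": 2.0, "Daisy": 4.0}
subjects = list(y)
n = 2  # data set size; population size N = 3

# A prior listing is a permutation of the population; the sample occupies
# the first n positions of the listing.
points = Counter()
for listing in permutations(subjects):          # P = 3! = 6 equally likely listings
    sample, remainder = listing[:n], listing[n:]
    if set(sample) == {"Rose", "Lily"}:         # condition on the observed data set
        vec = tuple(y[s] for s in sample + remainder)
        points[vec] += Fraction(1, 6)

total = sum(points.values())
posterior = {vec: p / total for vec, p in points.items()}
for vec, p in sorted(posterior.items()):
    print(vec, p)
```

Two points survive, $(y_R, y_L, y_D)'$ and $(y_L, y_R, y_D)'$, each with posterior probability $1/2$, matching Table 5 under the uniform assumptions.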
The Posterior Distribution Assuming Equal Probability of Subsets of Prior Listings
In a Bayesian framework, we assign probabilities to different subsets, $h^*$, of subjects associated with a prior listing to specify the joint prior/sampling distribution. These probabilities are not traditional 'sampling' probabilities, and have no relationship with the process that may have produced the observations in the data set. The probabilities are part of the Bayesian model; assigning different probabilities corresponds to a different Bayesian model. We make the assumption that, for any given prior listing, the probability is equal for all subsets, such that $p_{h^*}^{(p)} = \frac{1}{H^*}$ for all $h^* = 1, \ldots, H^*$ and $p = 1, \ldots, P$.
Under this assumption, we develop the expected value and variance of the posterior distribution assuming an exchangeable prior. In so doing, we generalize the representation of the posterior distribution to an example where the data set includes $n$ subjects and the population consists of $N$ subjects. In this context, the prior distribution is given by

$$\mathbf{Y} = \sum_{h=1}^{H}\sum_{m=1}^{M}\sum_{t=1}^{T} I_h\, I_{I,m}\, I_{II,t} \begin{pmatrix} \mathbf{v}_m' & \mathbf{0}_{n\times(N-n)} \\ \mathbf{0}_{(N-n)\times n} & \mathbf{w}_t' \end{pmatrix} \begin{pmatrix} \boldsymbol{\delta}_h' \\ \boldsymbol{\delta}_h^{c\prime} \end{pmatrix} \mathbf{y},$$

where, for a prior listing, $\mathbf{y}^{(p)} = \begin{pmatrix} \mathbf{v}_m' & \mathbf{0}_{n\times(N-n)} \\ \mathbf{0}_{(N-n)\times n} & \mathbf{w}_t' \end{pmatrix} \begin{pmatrix} \boldsymbol{\delta}_h' \\ \boldsymbol{\delta}_h^{c\prime} \end{pmatrix} \mathbf{y}$.
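The block-diagonal construction in the prior representation can be made concrete numerically. The sketch below (illustrative values only; the particular listing and permutation are arbitrary choices) forms $\operatorname{diag}(\mathbf{v}_m', \mathbf{w}_t')\begin{pmatrix}\boldsymbol{\delta}_h' \\ \boldsymbol{\delta}_h^{c\prime}\end{pmatrix}\mathbf{y}$ for $N = 3$, $n = 2$ and confirms the result is a permutation of $\mathbf{y}$.

```python
import numpy as np

y = np.array([1.0, 2.0, 4.0])   # hypothetical population responses, N = 3
n = 2

# delta_h' stacks the rows selecting the prior listing: here the listing
# puts subjects 2 and 3 in the first n slots and subject 1 last.
delta_h = np.array([[0, 1, 0],
                    [0, 0, 1]])          # n x N indicator matrix
delta_h_c = np.array([[1, 0, 0]])        # (N-n) x N indicator matrix

v_m = np.array([[0, 1],
                [1, 0]])                 # n x n permutation of the sample block
w_t = np.array([[1]])                    # (N-n) x (N-n) permutation (T = 1! = 1)

block = np.block([[v_m.T, np.zeros((n, 1))],
                  [np.zeros((1, n)), w_t.T]])
realization = block @ np.vstack([delta_h, delta_h_c]) @ y
print(realization)    # a permutation of y
```

The listing re-arranges $\mathbf{y}$ to $(2, 4, 1)'$ and $\mathbf{v}_m'$ then swaps the sample block, giving the realization $(4, 2, 1)'$.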
We express the joint prior/sampling distribution by including the additional indicator random variable $S_{h^*}^{(p)}$ for subset $h^*$ in prior listing $p$, and re-arranging the prior listing so that the sample is contained in the first $1, \ldots, n$ positions, given by $\begin{pmatrix} \boldsymbol{\delta}_{h^*}^{*\prime} \\ \boldsymbol{\delta}_{h^*}^{*c\prime} \end{pmatrix} \mathbf{y}$. Including the sampling, we represent random variables for the joint prior/sampling distribution as

$$\mathbf{Y} = \sum_{h=1}^{H}\sum_{m=1}^{M}\sum_{t=1}^{T}\sum_{h^*=1}^{H^*} I_h\, I_{I,m}\, I_{II,t}\, S_{h^*}^{(hmt)} \begin{pmatrix} \mathbf{v}_m' & \mathbf{0}_{n\times(N-n)} \\ \mathbf{0}_{(N-n)\times n} & \mathbf{w}_t' \end{pmatrix} \begin{pmatrix} \boldsymbol{\delta}_{h^*}^{*\prime} \\ \boldsymbol{\delta}_{h^*}^{*c\prime} \end{pmatrix} \begin{pmatrix} \boldsymbol{\delta}_h' \\ \boldsymbol{\delta}_h^{c\prime} \end{pmatrix} \mathbf{y}.$$
The possible realizations of $\mathbf{Y}$ in the joint prior/sampling distribution are given by the vectors

$$\begin{pmatrix} \mathbf{v}_m' & \mathbf{0}_{n\times(N-n)} \\ \mathbf{0}_{(N-n)\times n} & \mathbf{w}_t' \end{pmatrix} \begin{pmatrix} \boldsymbol{\delta}_{h^*}^{*\prime} \\ \boldsymbol{\delta}_{h^*}^{*c\prime} \end{pmatrix} \begin{pmatrix} \boldsymbol{\delta}_h' \\ \boldsymbol{\delta}_h^{c\prime} \end{pmatrix} \mathbf{y}$$

for $h = 1, \ldots, H$, $m = 1, \ldots, M$, $t = 1, \ldots, T$, and $h^* = 1, \ldots, H^*$. The posterior distribution contains the points in the joint distribution where the sample is given by
the subjects in $h^+$. Recall that we have defined response for subjects in the population such that the subjects whose response is in $\mathbf{y}_I$ are in the set $h^+$. The vectors $\mathbf{v}_m'\, \mathbf{y}_I$, $m = 1, \ldots, M$, represent permutations of these subjects' responses. Since response corresponding to the sample in the joint prior/sampling distribution is contained in the first $1, \ldots, n$ elements of the realization of $\mathbf{Y}$, only realizations of $\mathbf{Y}$ where the sample is equal to $\mathbf{v}_m'\, \mathbf{y}_I$ for $m = 1, \ldots, M$ are in the posterior distribution. This implies that
$$\begin{pmatrix} \boldsymbol{\delta}_{h^*}^{*\prime} \\ \boldsymbol{\delta}_{h^*}^{*c\prime} \end{pmatrix} \begin{pmatrix} \boldsymbol{\delta}_h & \boldsymbol{\delta}_h^{c} \end{pmatrix} = \begin{pmatrix} \mathbf{I}_n & \mathbf{0}_{n\times(N-n)} \\ \mathbf{0}_{(N-n)\times n} & \boldsymbol{\delta}_{h^*}^{*c\prime}\, \boldsymbol{\delta}_h^{c} \end{pmatrix}.$$
Conditioning on the data set implies that $h$ and $h^*$ satisfy this property, and defines the points in the posterior distribution.
It is easy to define $\begin{pmatrix} \boldsymbol{\delta}_{h^*}^{*\prime} \\ \boldsymbol{\delta}_{h^*}^{*c\prime} \end{pmatrix}$ and $\begin{pmatrix} \boldsymbol{\delta}_h' \\ \boldsymbol{\delta}_h^{c\prime} \end{pmatrix}$ that will satisfy this condition. One definition is $\begin{pmatrix} \boldsymbol{\delta}_{h^*}^{*\prime} \\ \boldsymbol{\delta}_{h^*}^{*c\prime} \end{pmatrix} = \begin{pmatrix} \boldsymbol{\delta}_h' \\ \boldsymbol{\delta}_h^{c\prime} \end{pmatrix}$, which results in $\begin{pmatrix} \boldsymbol{\delta}_{h^*}^{*\prime} \\ \boldsymbol{\delta}_{h^*}^{*c\prime} \end{pmatrix} \begin{pmatrix} \boldsymbol{\delta}_h & \boldsymbol{\delta}_h^{c} \end{pmatrix} = \mathbf{I}_N$. Under this assumption, the posterior distribution is given by
$$\mathbf{Y}_{\,|\,h^+ = h} = \sum_{h=1}^{H}\sum_{m=1}^{M}\sum_{t=1}^{T}\sum_{h^*=1}^{H^*} I_h\, I_{I,m}\, I_{II,t}\, S_{h^*}^{(hmt)} \begin{pmatrix} \mathbf{v}_m' & \mathbf{0}_{n\times(N-n)} \\ \mathbf{0}_{(N-n)\times n} & \mathbf{w}_t' \end{pmatrix} \begin{pmatrix} \boldsymbol{\delta}_{h^*}^{*\prime} \\ \boldsymbol{\delta}_{h^*}^{*c\prime} \end{pmatrix} \begin{pmatrix} \boldsymbol{\delta}_h' \\ \boldsymbol{\delta}_h^{c\prime} \end{pmatrix} \mathbf{y} = \sum_{h=1}^{H}\sum_{m=1}^{M}\sum_{t=1}^{T} I_h\, I_{I,m}\, I_{II,t}\, S_h^{(hmt)} \begin{pmatrix} \mathbf{v}_m'\, \mathbf{y}_I \\ \mathbf{w}_t'\, \mathbf{y}_{II} \end{pmatrix},$$

since $S_{h^*}^{(hmt)} = 1$ when $h^* = h$ and zero otherwise. Let us express this as
$$\mathbf{Y}_{\,|\,h^+ = h} = \begin{pmatrix} \sum_{h=1}^{H}\sum_{m=1}^{M}\sum_{t=1}^{T} I_h\, I_{I,m}\, I_{II,t}\, S_h^{(hmt)}\, \mathbf{v}_m'\, \mathbf{y}_I \\ \sum_{h=1}^{H}\sum_{m=1}^{M}\sum_{t=1}^{T} I_h\, I_{I,m}\, I_{II,t}\, S_h^{(hmt)}\, \mathbf{w}_t'\, \mathbf{y}_{II} \end{pmatrix} = \begin{pmatrix} \sum_{m=1}^{M} I_{I,m} \left( \sum_{t=1}^{T} I_{II,t} \right) \left( \sum_{h=1}^{H} I_h\, S_h^{(hmt)} \right) \mathbf{v}_m'\, \mathbf{y}_I \\ \sum_{t=1}^{T} I_{II,t} \left( \sum_{m=1}^{M} I_{I,m} \right) \left( \sum_{h=1}^{H} I_h\, S_h^{(hmt)} \right) \mathbf{w}_t'\, \mathbf{y}_{II} \end{pmatrix}.$$
Now $\sum_{h=1}^{H} I_h\, S_h^{(hmt)} = 1$, $\sum_{t=1}^{T} I_{II,t} = 1$, and $\sum_{m=1}^{M} I_{I,m} = 1$. As a result,
$$\mathbf{Y}_{\,|\,h^+ = h} = \begin{pmatrix} \sum_{m=1}^{M} I_{I,m}\, \mathbf{v}_m'\, \mathbf{y}_I \\ \sum_{t=1}^{T} I_{II,t}\, \mathbf{w}_t'\, \mathbf{y}_{II} \end{pmatrix}.$$
The implication of this is that the posterior distribution is $\mathbf{Y}_{\,|\,h^+ = h} = \left( \mathbf{Y}_{I\,|\,h^+ = h}' \;\; \mathbf{Y}_{II\,|\,h^+ = h}' \right)'$, where $\mathbf{Y}_{I\,|\,h^+ = h}$ is an $n \times 1$ vector representing a permutation distribution of response for subjects in the data set, and $\mathbf{Y}_{II\,|\,h^+ = h}$ is an independent $(N-n) \times 1$ vector representing a permutation distribution of response for the remaining subjects in the population. Taking the expected value over these permutation distributions,
$$E_\xi\!\left( \mathbf{Y}_{\,|\,h^+ = h} \right) = \begin{pmatrix} \mu_x\, \mathbf{1}_n \\ \mu_{II}\, \mathbf{1}_{N-n} \end{pmatrix}, \qquad \text{while} \qquad \mathrm{var}_\xi\!\left( \mathbf{Y}_{\,|\,h^+ = h} \right) = \begin{pmatrix} \sigma_x^2\, \mathbf{P}_n & \mathbf{0}_{n\times(N-n)} \\ \mathbf{0}_{(N-n)\times n} & \sigma_{II}^2\, \mathbf{P}_{N-n} \end{pmatrix}.$$
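These moment results can be checked by enumerating the permutation distribution directly. The sketch below uses hypothetical responses and assumes that $\mathbf{P}_n$ denotes the centering matrix $\mathbf{I}_n - \frac{1}{n}\mathbf{J}_n$ and that $\sigma_x^2$ is defined with divisor $n - 1$; both are assumptions, chosen to be consistent with the variance expressions later in this section.

```python
import numpy as np
from itertools import permutations

# Hypothetical data-set responses (illustrative values only)
y_I = np.array([1.0, 2.0, 4.0])
n = len(y_I)

# All n! equally likely orderings of the data set
perms = np.array(list(permutations(y_I)))
mean = perms.mean(axis=0)
cov = (perms - mean).T @ (perms - mean) / len(perms)

mu_x = y_I.mean()
sigma2_x = y_I.var(ddof=1)                 # divisor n - 1 (assumed convention)
P_n = np.eye(n) - np.ones((n, n)) / n      # assumed: P_n is the centering matrix

print(np.allclose(mean, mu_x * np.ones(n)))   # expectation is mu_x 1_n
print(np.allclose(cov, sigma2_x * P_n))       # variance is sigma_x^2 P_n
```

Both checks print True for any choice of responses, since each coordinate of a random permutation is marginally a uniform draw from the data set.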
We can use these results to develop the expected value and variance of the mean response for the data set, $\bar{Y} = \frac{1}{n} \mathbf{1}_n'\, \mathbf{Y}_{I\,|\,h^+ = h}$, where $E_\xi\left( \bar{Y} \right) = \mu_x$ and $\mathrm{var}_\xi\left( \bar{Y} \right) = 0$.
A Simple Application
We consider a simple example and explicitly evaluate the posterior probability. Suppose that prior listings are assigned equal probabilities, such that $\pi_p = \frac{1}{P}$ for all $p = 1, \ldots, P$. Also, assume that for any prior listing $p$, each sample set is equally likely, such that $p_h^{(p)} = \frac{1}{H}$ for all $h = 1, \ldots, H$ and $p = 1, \ldots, P$. In this setting, the probability of each sample set that equals the data set is given by $\pi_p\, p_h^{(p)} = \frac{1}{PH}$. Since there are $P$ prior listings, $\sum_{p=1}^{P} \frac{1}{PH} = \frac{1}{H}$, so that $p^*_{hm}$ is the same for all $h = 1, \ldots, H$ and $m = 1, \ldots, M$. Thus, with these assumptions, the posterior distribution is a uniform distribution.

Sampling from the Population
Finite population sampling was defined in a general manner by Godambe (1955), who associated a probability with each possible sequence of $n$ subjects from a finite population. Subsequently, Godambe and Joshi (1965) showed that it was sufficient to define samples as distinct sets of subjects. We use sample sets to define finite population sampling based on a population listing of subjects with response $y_j$, $j = 1, \ldots, N$.
Let $h^* = \left\{ j_1, j_2, \ldots, j_n \right\} \subseteq \eta^*$ index distinct subsets of subjects (i.e. samples) from the population, where $\eta^*$ represents the set of all possible subsets of size $n$ and $h^* = 1, \ldots, H^*$. We define these samples via the $n \times N$ indicator matrices $\boldsymbol{\delta}_{h^*}^{*\prime}$, where the subjects in the sample set correspond to the subjects in the vector $\boldsymbol{\delta}_{h^*}^{*\prime}\, \mathbf{j}$ with response $\mathbf{y}^*_{h^*} = \boldsymbol{\delta}_{h^*}^{*\prime}\, \mathbf{y}$. In order to uniquely define $\boldsymbol{\delta}_{h^*}^{*\prime}$ for a sample set, we require the sample subjects in $\boldsymbol{\delta}_{h^*}^{*\prime}\, \mathbf{j}$ to be in the same order as the subjects in the population. Also, we define

$$\boldsymbol{\delta}_1^{*\prime} = \begin{pmatrix} \mathbf{I}_n & \mathbf{0}_{n\times(N-n)} \end{pmatrix}.$$
As an example, when $N = 3$ and $n = 2$, there are $H^* = \binom{3}{2} = 3$ distinct sample sets, which we define by

$$\boldsymbol{\delta}_1^{*\prime} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \qquad \boldsymbol{\delta}_2^{*\prime} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \boldsymbol{\delta}_3^{*\prime} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

When $h^* = 1$, the set of subjects is $\{1, 2\}$; the set of subjects corresponding to $h^* = 2$ is $\{1, 3\}$; and the set of subjects corresponding to $h^* = 3$ is $\{2, 3\}$.
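A small numeric check (with hypothetical responses for subjects 1, 2, and 3) confirms that each indicator matrix selects the stated subset in population order:

```python
import numpy as np

y = np.array([10.0, 20.0, 30.0])   # hypothetical responses for subjects 1, 2, 3

delta = {
    1: np.array([[1, 0, 0], [0, 1, 0]]),   # sample set {1, 2}
    2: np.array([[1, 0, 0], [0, 0, 1]]),   # sample set {1, 3}
    3: np.array([[0, 1, 0], [0, 0, 1]]),   # sample set {2, 3}
}

for h_star, d in delta.items():
    print(h_star, d @ y)   # the sample response vector y*_{h*} = delta*' y
```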
Let $S^*_{h^*}$ represent an indicator random variable associated with sample $h^*$ from the population, such that $P\left( S^*_{h^*} = 1 \right) = p_{h^*}$, where $p_{h^*} \ge 0$ for all $h^* = 1, \ldots, H^*$ and $\sum_{h^*=1}^{H^*} p_{h^*} = 1$. The probability $p_{h^*}$ represents the probability of selecting sample $h^*$ from the population. Using this probability, we represent a random vector corresponding to the sample response as $\sum_{h^*=1}^{H^*} S^*_{h^*}\, \mathbf{y}^*_{h^*}$.
The previous definitions uniquely define a response vector for each distinct sample set. The order of the subjects in the response vector was arbitrarily set to match the subject order in the population. However, any order is possible for the subjects in the set. This means that we could represent response for sample $h^*$ by any of the vectors in the set $\left\{ \mathbf{v}_m'\, \mathbf{y}^*_{h^*},\ m = 1, \ldots, M \right\}$, where $\mathbf{v}_m$ is an $n \times n$ permutation matrix with elements equal to zero or one, all rows and columns summing to one, and where we define $\mathbf{v}_1' = \mathbf{I}_n$. Which ordering of response (identified by $m$) is used in the response vector $\mathbf{y}^*_{m h^*} = \mathbf{v}_m'\, \mathbf{y}^*_{h^*}$ for sample $h^*$ determines the interpretation of individual responses, since it identifies which subject is associated with a response. The ordering does not impact interpretation of summary measures of response for the sample, such as the sample mean, total, or maximum response. Let us assume that it is not necessary to retain identifiability of individual subjects in set $h^*$ in the response vector for set $h^*$. This implies that it is not necessary to know in which order (identified by $m$) responses are listed in the response vector. We implement the assumption that we do not need to identify
subjects in the response vector for set $h^*$ by defining $\mathbf{Y}^*_{h^*} = \sum_{m=1}^{M} S_m^{(h^*)}\, \mathbf{y}^*_{m h^*}$, where the indicator random variable $S_m^{(h^*)}$ has a value of one when permutation $m$ represents the ordering of response, and zero otherwise. We represent $P\left( S_m^{(h^*)} = 1 \right) = p_m^{(h^*)}$, and the sample response vector as $\mathbf{Y}_I = \sum_{h^*=1}^{H^*} S^*_{h^*}\, \mathbf{Y}^*_{h^*}$.
Subsequently, we assign equal probability to each order, assuming that $p_m^{(h^*)} = \frac{1}{M}$ for all $m = 1, \ldots, M$ and $h^* = 1, \ldots, H^*$. We develop an expression for the expected value and variance of $\mathbf{Y}_I$ next. In so doing, we introduce notation to represent random variables corresponding to subjects in the remainder of the population. This simplifies the calculations and facilitates relationships with other sampling models.
Using earlier definitions, $\mathbf{Y}_I = \sum_{h^*=1}^{H^*} \sum_{m=1}^{M} S^*_{h^*}\, S_m^{(h^*)}\, \mathbf{v}_m'\, \boldsymbol{\delta}_{h^*}^{*\prime}\, \mathbf{y}$. We introduce a similar vector of remaining random variables as $\mathbf{Y}_{II} = \sum_{h^*=1}^{H^*} \sum_{t=1}^{T} S^*_{h^*}\, R_t^{(h^*)}\, \mathbf{w}_t'\, \boldsymbol{\delta}_{h^*}^{*c\prime}\, \mathbf{y}$, where $\begin{pmatrix} \boldsymbol{\delta}_{h^*}^{*\prime} \\ \boldsymbol{\delta}_{h^*}^{*c\prime} \end{pmatrix}$ is a permutation matrix, $\boldsymbol{\delta}_{h^*}^{*c\prime}\, \mathbf{j}$ identifies the subjects not in set $h^*$, $\mathbf{w}_t$, $t = 1, \ldots, T = (N-n)!$, is an $(N-n) \times (N-n)$ permutation matrix with elements equal to zero or one, all rows and columns summing to one, and the elements of $\boldsymbol{\delta}_{h^*}^{*c\prime}$ are defined so that the order of the subjects in $\boldsymbol{\delta}_{h^*}^{*c\prime}\, \mathbf{j}$ is the same as the order of subjects in the population. The indicator random variable $R_t^{(h^*)}$ has a value of one when permutation $t$ represents the ordering of subjects in $\mathbf{w}_t'\, \boldsymbol{\delta}_{h^*}^{*c\prime}\, \mathbf{y}$, and zero otherwise, such that $P\left( R_t^{(h^*)} = 1 \right) = p_t^{(h^*)}$. We subsequently assume that $p_t^{(h^*)} = \frac{1}{T}$ for all $t = 1, \ldots, T$ and $h^* = 1, \ldots, H^*$. With these assumptions,

$$\begin{pmatrix} \mathbf{Y}_I \\ \mathbf{Y}_{II} \end{pmatrix} = \sum_{h^*=1}^{H^*} S^*_{h^*} \begin{pmatrix} \sum_{m=1}^{M} S_m^{(h^*)}\, \mathbf{v}_m' & \mathbf{0}_{n\times(N-n)} \\ \mathbf{0}_{(N-n)\times n} & \sum_{t=1}^{T} R_t^{(h^*)}\, \mathbf{w}_t' \end{pmatrix} \begin{pmatrix} \boldsymbol{\delta}_{h^*}^{*\prime} \\ \boldsymbol{\delta}_{h^*}^{*c\prime} \end{pmatrix} \mathbf{y}.$$
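For $N = 3$ and $n = 2$ this construction can be verified exhaustively: running over all $H^* = 3$ sample sets, $M = 2$ sample orderings, and $T = 1$ remainder ordering generates exactly the $3! = 6$ permutations of the population response vector. A sketch with illustrative responses:

```python
import numpy as np
from itertools import permutations

y = np.array([1.0, 2.0, 4.0])
N, n = 3, 2

subsets = [(0, 1), (0, 2), (1, 2)]                 # H* = 3 sample sets, population order
v = [np.eye(n), np.array([[0., 1.], [1., 0.]])]    # M = n! = 2 sample permutations
w = [np.eye(N - n)]                                # T = (N - n)! = 1

realizations = set()
for subset in subsets:
    rest = [j for j in range(N) if j not in subset]
    d_star = np.eye(N)[list(subset), :]            # delta*'  : n x N indicator matrix
    d_star_c = np.eye(N)[rest, :]                  # delta*c' : (N-n) x N indicator matrix
    for v_m in v:
        for w_t in w:
            top = v_m.T @ d_star @ y               # sample block
            bottom = w_t.T @ d_star_c @ y          # remainder block
            realizations.add(tuple(np.concatenate([top, bottom])))

print(len(realizations))   # 6 = 3!; every permutation of y appears exactly once
```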
We make the additional assumption that all sample sets are equally likely, such that $p_{h^*} = \frac{1}{H^*}$ for all $h^* = 1, \ldots, H^*$. With this assumption, plus the earlier assumptions that $p_m^{(h^*)} = \frac{1}{M}$ for all $m = 1, \ldots, M$ and $h^* = 1, \ldots, H^*$, and $p_t^{(h^*)} = \frac{1}{T}$ for all $t = 1, \ldots, T$ and $h^* = 1, \ldots, H^*$, $\begin{pmatrix} \mathbf{Y}_I \\ \mathbf{Y}_{II} \end{pmatrix}$ represents response for a random permutation of subjects in the population. Let the subscript $p$ in $E_p$ denote expectation with respect to such random permutations. Standard calculations result in $E_p\left( \mathbf{Y}_I \right) = \mu\, \mathbf{1}_n$ and $\mathrm{var}_p\left( \mathbf{Y}_I \right) = \sigma^2 \left( \mathbf{I}_n - \frac{1}{N} \mathbf{J}_n \right)$. This is the finite population sampling model discussed by Stanek and Singer (2004), which was extended to two-stage cluster sampling and the finite population mixed model by Stanek and Singer (2004).

Estimating the Population Mean based on the Sample Mean from a Finite Population Sampling Model
We use the finite population sampling model to evaluate the expected value and variance of an estimator of the population mean, $\mu$, given by the sample mean, $\bar{Y} = \frac{1}{n} \mathbf{1}_n'\, \mathbf{Y}_I$. The expected value is given by $E_p\left( \bar{Y} \right) = \mu$, indicating that the sample mean is an unbiased estimator of the population mean. The variance of the sample mean is given by
$$\mathrm{var}_p\left( \bar{Y} \right) = \frac{\sigma^2}{n} \left( 1 - \frac{n}{N} \right).$$
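Both moments of the sample mean can be verified by enumerating every sample set of size $n$. The sketch below uses hypothetical responses and takes $\sigma^2$ with divisor $N - 1$, an assumption consistent with the variance expression above.

```python
import numpy as np
from itertools import combinations

y = np.array([1.0, 2.0, 4.0, 7.0])   # hypothetical population responses
N, n = len(y), 2

# Sample means of all C(N, n) equally likely sample sets
means = np.array([y[list(s)].mean() for s in combinations(range(N), n)])
mu, sigma2 = y.mean(), y.var(ddof=1)   # sigma^2 with divisor N - 1 (assumed)

E_p = means.mean()
var_p = np.mean((means - E_p) ** 2)

print(np.isclose(E_p, mu))                              # unbiasedness
print(np.isclose(var_p, (sigma2 / n) * (1 - n / N)))    # variance formula
```

Both checks print True for any population vector, confirming the finite population correction factor $1 - n/N$.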
Given the data set $h^*$, we substitute the observed response, $\mathbf{x}$, for $\mathbf{Y}_I$. Under the assumptions of the finite population sampling model, the estimate of the population mean is given by $\bar{x} = \frac{1}{n} \mathbf{1}_n'\, \mathbf{x}$. Under the model assumptions, since $E_p\left( \bar{Y} \right) = \mu$, we say that the estimator is an unbiased estimator of $\mu$, where the estimate $\bar{x}$ is a realized value of the estimator, given the data set.
References
Bickel, P.J. and Doksum, K.A. (1977). Mathematical Statistics: Basic Ideas and Selected Topics. Holden-Day, San Francisco, CA.
Ericson, W.A. (1969). Subjective Bayesian models in sampling finite populations. Journal of the Royal Statistical Society, Series B, 31:195-233.
Ericson, W.A. (1988). Bayesian inference in finite populations. In P.R. Krishnaiah and C.R. Rao, eds., Handbook of Statistics, Vol. 6. Elsevier Science Publishers, 213-246.