ELSEVIER Statistics & Probability Letters 31 (1997) 163-168

A characterisation of Polya tree distributions

Stephen Walker a,*, Pietro Muliere b

a Department of Mathematics, Imperial College, 180 Queen's Gate, London SW7 2BZ, UK
b Dipartimento di Economia Politica e Metodi Quantitativi, Università di Pavia, S. Felice, 5, 27100 Pavia, Italy

Received May 1995; revised November 1995

Abstract

A Polya tree is characterised by a special class of predictive probabilities.

Keywords: Polya trees; Predictive distribution

AMS subject classification: Primary 60K25; secondary 68M20; 90B22

1. Introduction

Polya tree distributions for random probability measures were recently studied by Mauldin et al. (1992) and Lavine (1992,1994) though an original description was given in Ferguson (1974).

Trees of Polya urns are used to generate sequences of exchangeable random variables defined on some space $\Omega$. By a theorem of de Finetti, each such sequence is a mixture of independent and identically distributed variables and the mixing measure can be viewed as a prior on the space of probability measures on $\Omega$. This measure has been characterised by Mauldin et al. (1992) using the notion of strategy.

In this paper we give a characterisation of the Polya tree prior using only a special class of predictive probabilities. We also highlight some particular Polya trees of interest.

2. Preliminaries

Let $\Omega$ be a separable measurable space and $\Pi = (B_0, B_1, B_{00}, B_{01}, \ldots)$ a binary tree partition of $\Omega$.

Definition 1 (Lavine, 1992). A random probability measure $F$ on $\Omega$ is said to have a Polya tree distribution, or a Polya tree prior, with parameter $(\Pi, \mathcal{A})$, written $F \sim PT(\Pi, \mathcal{A})$, if there exist non-negative numbers $\mathcal{A} = (\alpha_0, \alpha_1, \alpha_{00}, \ldots)$ and random variables $\mathcal{Y} = (Y_0, Y_{00}, Y_{10}, \ldots)$ such that the following hold:

(i) all the random variables in $\mathcal{Y}$ are independent;

* Corresponding author.

0167-7152/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved
PII S0167-7152(96)00028-4

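(ii) for every $\varepsilon$, $Y_{\varepsilon 0} \sim \mathrm{beta}(\alpha_{\varepsilon 0}, \alpha_{\varepsilon 1})$;

(iii) for every $m = 1, 2, \ldots$ and every $\varepsilon = \varepsilon_1 \cdots \varepsilon_m$,

$$F(B_{\varepsilon_1 \cdots \varepsilon_m}) = \Biggl(\prod_{t=1;\,\varepsilon_t=0}^{m} Y_{\varepsilon_1 \cdots \varepsilon_{t-1} 0}\Biggr) \Biggl(\prod_{t=1;\,\varepsilon_t=1}^{m} \bigl(1 - Y_{\varepsilon_1 \cdots \varepsilon_{t-1} 0}\bigr)\Biggr),$$

where the first terms, i.e. for $t = 1$, are interpreted as $Y_0$ and $1 - Y_0$ and each $\varepsilon_t \in \{0, 1\}$.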
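Definition 1 can be illustrated with a short Python sketch that draws the masses of all sets at a given level from independent beta variables. The function name `sample_polya_tree_masses`, the encoding of the indices as binary strings, and the parameter choice of $m^2$ at level $m$ are assumptions made for this example only and are not taken from the paper.

```python
import itertools
import random


def sample_polya_tree_masses(alpha, depth, rng):
    """Draw F(B_eps) for every binary string eps of length `depth`.

    `alpha` maps each binary string of length 1..depth to a positive number.
    """
    # Independent branching variables: Y_{eps0} ~ beta(alpha_{eps0}, alpha_{eps1}).
    y = {}
    for level in range(depth):
        for bits in itertools.product("01", repeat=level):
            eps = "".join(bits)
            y[eps + "0"] = rng.betavariate(alpha[eps + "0"], alpha[eps + "1"])

    # Item (iii): multiply Y or (1 - Y) along the path from the root to B_eps.
    masses = {}
    for bits in itertools.product("01", repeat=depth):
        eps = "".join(bits)
        mass = 1.0
        for t in range(depth):
            branch = y[eps[:t] + "0"]
            mass *= branch if eps[t] == "0" else 1.0 - branch
        masses[eps] = mass
    return masses


# Hypothetical parameters: alpha_eps = m^2 for a set at level m.
alpha = {
    "".join(bits): float(m * m)
    for m in range(1, 4)
    for bits in itertools.product("01", repeat=m)
}
masses = sample_polya_tree_masses(alpha, 3, random.Random(0))
```

Because each split divides the mass of the parent set between its two children, the eight level-3 masses returned here sum to one up to rounding error.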

If $X \sim \mathrm{betas}(a, b, c)$ then the density function for $X$, defined on $(0, c)$, is given, up to a constant of proportionality, by

$$p_X(x \mid a, b, c) \propto x^{a-1} (c - x)^{b-1} I_{(0,c)}(x), \qquad a, b > 0, \; 0 < c \le 1,$$

where $I$ represents the indicator function. An alternative definition of a Polya tree distribution is based upon such random variables.

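Definition 2. A random probability measure $F$ on $\Omega$ is said to have a Polya tree distribution, or a Polya tree prior, with parameter $(\Pi, \mathcal{A})$, written $F \sim PT(\Pi, \mathcal{A})$, if there exist non-negative numbers $\mathcal{A} = (\alpha_0, \alpha_1, \alpha_{00}, \ldots)$ and random variables $\mathcal{X} = (X_0, X_1, X_{00}, \ldots)$ such that the following hold:

(i) for every $\varepsilon$, $X_{\varepsilon 0} \mid X_\varepsilon \sim \mathrm{betas}(\alpha_{\varepsilon 0}, \alpha_{\varepsilon 1}, X_\varepsilon)$ with $X_0 \sim \mathrm{betas}(\alpha_0, \alpha_1, 1)$;

(ii) $X_{\varepsilon 1} = X_\varepsilon - X_{\varepsilon 0}$;

(iii) for every $\varepsilon$, $F(B_\varepsilon) = X_\varepsilon$.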

Thus we have $X_0 = Y_0$, $X_1 = 1 - Y_0$, $X_{00} = Y_0 Y_{00}$, $X_{01} = Y_0 (1 - Y_{00})$ and so on according to (iii) of Definition 1. It is straightforward to show that these two definitions are equivalent. It is also helpful at this point to introduce $Y_1 = 1 - Y_0$, $Y_{01} = 1 - Y_{00}$ and so on.
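The construction in Definition 2 can be sketched in the same way. The sketch below relies on the fact that a betas(a, b, c) variable is c times a beta(a, b) variable, which follows from the density given above. The helper names `sample_betas` and `sample_masses_def2`, the binary-string indexing and the parameter values are assumptions made for illustration.

```python
import itertools
import random


def sample_betas(a, b, c, rng):
    """Draw X ~ betas(a, b, c): X / c has a beta(a, b) distribution."""
    return c * rng.betavariate(a, b)


def sample_masses_def2(alpha, depth, rng):
    """Draw X_eps = F(B_eps) for all eps up to `depth` following Definition 2."""
    x = {"": 1.0}  # the whole space has mass one
    for level in range(depth):
        for eps in [e for e in x if len(e) == level]:
            # (i): X_{eps0} | X_eps ~ betas(alpha_{eps0}, alpha_{eps1}, X_eps)
            x[eps + "0"] = sample_betas(alpha[eps + "0"], alpha[eps + "1"], x[eps], rng)
            # (ii): X_{eps1} = X_eps - X_{eps0}
            x[eps + "1"] = x[eps] - x[eps + "0"]
    return x


# Hypothetical parameters: alpha_eps = m^2 for a set at level m.
alpha = {
    "".join(bits): float(m * m)
    for m in range(1, 4)
    for bits in itertools.product("01", repeat=m)
}
x = sample_masses_def2(alpha, 3, random.Random(1))
```

Each draw satisfies the additivity in (ii) by construction, so the masses at any fixed level sum to one.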

3. Result

Throughout this section let $\Omega$ be a separable measurable space and $\Pi = (B_0, B_1, \ldots)$ a fixed binary tree partition of $\Omega$. Let $F$ be a random probability distribution and $\{\theta_n\}_{n \ge 1}$ an independent and identically distributed sequence of random variables defined on $\Omega$ and with distribution $F$. This implies that

$$P(\theta_{n+1} \in B \mid \theta_1, \ldots, \theta_n) = E(F(B) \mid \theta_1, \ldots, \theta_n),$$

for all $n$ and any set $B$. Let the distribution of $F$ be denoted by $Q$ and the support of $Q$ be denoted by $S$. Let $\pi_m$ denote the partition at the $m$th level, which contains $r = 2^m$ sets: $B_{0 \cdots 0} = B_{m1}$ up to $B_{1 \cdots 1} = B_{mr}$, and let $mt$ represent the $t$th set at level $m$. Before the main result we need the following lemmas.

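Lemma 1. For any $m \ge 1$ and positive integers $k_1, \ldots, k_r$ (on removing the subscript $m$ from $B_{mj}$),

$$\begin{aligned}
E\bigl(F(B_1)^{k_1} \cdots F(B_r)^{k_r}\bigr) = \bigl\{ & F_0(B_1)\, F_1(B_1 \mid 1 \in B_1) \cdots F_{k_1 - 1}(B_1 \mid k_1 - 1 \in B_1) \\
& \times F_{k_1}(B_2 \mid k_1 \in B_1) \cdots F_{k_1 + k_2 - 1}(B_2 \mid k_1 \in B_1,\, k_2 - 1 \in B_2) \\
& \times \cdots \\
& \times F_{k_1 + \cdots + k_{r-1}}(B_r \mid k_1 \in B_1, \ldots, k_{r-1} \in B_{r-1}) \cdots F_{k_1 + \cdots + k_r - 1}(B_r \mid k_1 \in B_1, \ldots, k_r - 1 \in B_r) \bigr\},
\end{aligned}$$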

where

$$F_n(B_j \mid n_1 \in B_1, \ldots, n_j \in B_j)$$

is the posterior mean of $F(B_j)$, given iid observations $\theta_1, \ldots, \theta_n$ from $F$, and $n_i$ denotes the number of observations in $B_i$ ($n = n_1 + \cdots + n_j$).

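Proof. Now

$$E\bigl(F(B_1)^{k_1} \cdots F(B_r)^{k_r}\bigr) = \int_S F(B_1)^{k_1} \cdots F(B_r)^{k_r}\, Q(dF). \qquad (1)$$

Using the results in Lo (1991), (1) can be written as

$$\int_S \int_{B_1} \Biggl(\prod_{j=2}^{r} F(B_j)^{k_j}\Biggr) F(B_1)^{k_1 - 1}\, F(dx_1)\, Q(dF) = \int_{B_1} \int_S \Biggl(\prod_{j=2}^{r} F(B_j)^{k_j}\Biggr) F(B_1)^{k_1 - 1}\, Q(dF \mid \theta_1 \in B_1)\, F_0(dx_1).$$

Here $F_0$ is the prior expectation of $F$. Continuing in this fashion, (1) can be written as stated in the Lemma. □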

Lemma 2. Let $F \sim Q$ and $G \sim PT(\Pi, \mathcal{A})$. If, for all $m \ge 1$ and non-negative integers $k_1, \ldots, k_r$,

$$E\bigl(F(B_1)^{k_1} \cdots F(B_r)^{k_r}\bigr) = E\bigl(G(B_1)^{k_1} \cdots G(B_r)^{k_r}\bigr),$$

then

$$F \sim PT(\Pi, \mathcal{A}).$$

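Proof. From Lemma 1 of Lo (1991) it follows that, for all $m \ge 1$,

$$\mathcal{L}\bigl(F(B_1), \ldots, F(B_r)\bigr) = \mathcal{L}\bigl(G(B_1), \ldots, G(B_r)\bigr),$$

which implies that, for all $\varepsilon$,

$$\mathcal{L}\bigl(F(B_{\varepsilon 0}),\, F(B_{\varepsilon 0}) + F(B_{\varepsilon 1})\bigr) = \mathcal{L}\bigl(G(B_{\varepsilon 0}),\, G(B_{\varepsilon 0}) + G(B_{\varepsilon 1})\bigr),$$

and hence

$$\mathcal{L}\bigl(F(B_{\varepsilon 0}),\, F(B_\varepsilon)\bigr) = \mathcal{L}\bigl(G(B_{\varepsilon 0}),\, G(B_\varepsilon)\bigr).$$

From Definition 2 it then follows that $F \sim PT(\Pi, \mathcal{A})$, completing the proof. □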

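Lemma 3. $F \sim PT(\Pi, \mathcal{A})$ if, and only if, there exist non-negative numbers $\mathcal{A} = (\alpha_0, \alpha_1, \ldots)$ such that, for all $n = 0, 1, 2, \ldots$, $m \ge 1$ and $\varepsilon = \varepsilon_1 \cdots \varepsilon_m$,

$$P(\theta_{n+1} \in B_\varepsilon \mid \theta_1, \ldots, \theta_n) = \frac{\alpha_{\varepsilon_1} + n_{\varepsilon_1}}{\alpha_0 + \alpha_1 + n} \cdot \frac{\alpha_{\varepsilon_1 \varepsilon_2} + n_{\varepsilon_1 \varepsilon_2}}{\alpha_{\varepsilon_1 0} + \alpha_{\varepsilon_1 1} + n_{\varepsilon_1}} \cdots \frac{\alpha_{\varepsilon_1 \cdots \varepsilon_m} + n_{\varepsilon_1 \cdots \varepsilon_m}}{\alpha_{\varepsilon_1 \cdots \varepsilon_{m-1} 0} + \alpha_{\varepsilon_1 \cdots \varepsilon_{m-1} 1} + n_{\varepsilon_1 \cdots \varepsilon_{m-1}}}, \qquad (2)$$

where $n_\varepsilon$ is the number of observations from $\theta_1, \ldots, \theta_n$ in $B_\varepsilon$.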
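The predictive probability (2) is a product of one ratio per level and is simple to evaluate. The following Python sketch does so with exact rational arithmetic. The function name `predictive_probability`, the encoding of each observation by the binary label of the finest set containing it, and the unit parameters in the example are assumptions introduced here for illustration.

```python
from fractions import Fraction


def predictive_probability(eps, alpha, observations):
    """Evaluate (2): P(theta_{n+1} in B_eps | theta_1, ..., theta_n).

    `observations` lists, for each theta_i, the binary label (at least as
    long as `eps`) of the set containing it; `alpha` maps labels to integers.
    """
    prob = Fraction(1)
    for t in range(1, len(eps) + 1):
        prefix, parent = eps[:t], eps[: t - 1]
        # Counts in B_prefix and in its parent set (the root set holds all n).
        n_prefix = sum(obs.startswith(prefix) for obs in observations)
        n_parent = sum(obs.startswith(parent) for obs in observations)
        prob *= Fraction(
            alpha[prefix] + n_prefix,
            alpha[parent + "0"] + alpha[parent + "1"] + n_parent,
        )
    return prob


# Hypothetical example: all parameters equal to 1, three observations.
alpha = {"0": 1, "1": 1, "00": 1, "01": 1, "10": 1, "11": 1}
observations = ["00", "01", "00"]
level_two = {e: predictive_probability(e, alpha, observations) for e in ("00", "01", "10", "11")}
```

With these inputs the probability of the set labelled 00 is (4/5)(3/5) = 12/25, and the four level-2 probabilities sum to one.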

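Proof. Firstly, if $F \sim PT(\Pi, \mathcal{A})$, then $F \mid \theta_1, \ldots, \theta_n \sim PT(\Pi, \mathcal{A} \mid \theta_1, \ldots, \theta_n)$, where $\mathcal{A} \mid \theta_1, \ldots, \theta_n$ is given by $\alpha_\varepsilon \mid \theta_1, \ldots, \theta_n = \alpha_\varepsilon + n_\varepsilon$ and $n_\varepsilon$ is the number of observations from $\theta_1, \ldots, \theta_n$ in $B_\varepsilon$. Then clearly, from Definition 1, $E(F(B_\varepsilon) \mid \theta_1, \ldots, \theta_n)$ is given by (2).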
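The conjugate update used in this first step can be sketched as follows: each parameter is increased by the count of observations in its set, and the mean of the mass of a set under a Polya tree is the product of beta means along its path. The function names `posterior_alpha` and `mean_mass` and the example values are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def posterior_alpha(alpha, observations):
    """Return the updated parameters alpha_eps + n_eps for every label eps."""
    return {
        eps: a + sum(obs.startswith(eps) for obs in observations)
        for eps, a in alpha.items()
    }


def mean_mass(eps, alpha):
    """E F(B_eps) under PT(Pi, alpha): product of beta means along the path."""
    prob = Fraction(1)
    for t in range(1, len(eps) + 1):
        parent = eps[: t - 1]
        prob *= Fraction(alpha[eps[:t]], alpha[parent + "0"] + alpha[parent + "1"])
    return prob


# Hypothetical example: unit prior parameters and three observations.
prior = {"0": 1, "1": 1, "00": 1, "01": 1, "10": 1, "11": 1}
post = posterior_alpha(prior, ["00", "01", "00"])
posterior_mean_00 = mean_mass("00", post)
```

Here the posterior mean of the mass of the set labelled 00 is (4/5)(3/5) = 12/25, which is the value that the product form (2) gives for the same data.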

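Now consider $G \sim PT(\Pi, \mathcal{A})$ and, for all $\varepsilon$, let $X_\varepsilon = G(B_\varepsilon)$. For each $t = 1, \ldots, m$ and each $l = 1, \ldots, 2^t$, let $\mathcal{C}_{lt}$ be all the integers contained in $\{(l - 1) 2^{m-t} + 1, \ldots, l\, 2^{m-t}\}$. Clearly each $lt$ defines a unique $\varepsilon = \varepsilon_1 \cdots \varepsilon_t$ for which $B_\varepsilon = \bigcup_{j \in \mathcal{C}_{lt}} B_j$ and denote this $\varepsilon$ by $\varepsilon_{lt}$. Also let $\mathcal{C}_{lt0}$ define $\varepsilon_{lt} 0$ and so on. It is then straightforward to show, from Definition 1, that

$$E\bigl(X_1^{k_1} \cdots X_r^{k_r}\bigr) = E\Biggl(\prod_{t=1}^{m} \prod_{l=1}^{2^t} Y_{\varepsilon_{lt}}^{q_{lt}}\Biggr), \qquad (3)$$

where $q_{lt} = \sum_{j \in \mathcal{C}_{lt}} k_j$. From the independence of the $Y_{\varepsilon 0}$'s, as given by Definition 1, (3) can be rewritten as

$$\prod_{t=1}^{m} \prod_{l=1;\, l \text{ odd}}^{2^t} E\bigl\{Y_{\varepsilon_{lt}}^{q^0_{lt}} (1 - Y_{\varepsilon_{lt}})^{q^1_{lt}}\bigr\}, \qquad (4)$$

where, if $l$ is odd, then $q^0_{lt} = q_{lt}$ and $q^1_{lt} = q_{l+1,t}$, and $Y_{\varepsilon 0} \sim \mathrm{beta}(\alpha_{\varepsilon 0}, \alpha_{\varepsilon 1})$. The aim now is to show that $E(F(B_1)^{k_1} \cdots F(B_r)^{k_r})$ is given by (4), which implies, from Lemma 2, that $F \sim PT(\Pi, \mathcal{A})$. This will be achieved by using Lemma 1 and (2). Select a particular $\varepsilon$ which defines a member of $\Pi$ below or on level $m$. This defines a particular pair $(\alpha_{\varepsilon 0}, \alpha_{\varepsilon 1})$ and also a particular $\mathcal{C}_{lt} = \mathcal{C}_\varepsilon$. Then an observation in $B_\varepsilon$ could also be observed in $B_j$ for each $j \in \mathcal{C}_\varepsilon$. From Lemma 1 the only terms in (1) to include either $\alpha_{\varepsilon 0}$ or $\alpha_{\varepsilon 1}$, or both, are given by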

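$$\prod_{j \in \mathcal{C}_\varepsilon} \prod_{p=1}^{k_j} F_{k_1 + \cdots + k_{j-1} + p - 1}(B_j \mid k_1 \in B_1, \ldots, k_{j-1} \in B_{j-1},\, p - 1 \in B_j).$$

This can be written, using (2), as

$$\prod_{j \in \mathcal{C}_{\varepsilon 0}} \prod_{p=1}^{k_j} \frac{\alpha_{\varepsilon 0} + \sum_{j > s \in \mathcal{C}_{\varepsilon 0}} k_s + p - 1}{\alpha_{\varepsilon 0} + \alpha_{\varepsilon 1} + \sum_{j > s \in \mathcal{C}_\varepsilon} k_s + p - 1} \times \prod_{j \in \mathcal{C}_{\varepsilon 1}} \prod_{p=1}^{k_j} \frac{\alpha_{\varepsilon 1} + \sum_{j > s \in \mathcal{C}_{\varepsilon 1}} k_s + p - 1}{\alpha_{\varepsilon 0} + \alpha_{\varepsilon 1} + \sum_{j > s \in \mathcal{C}_\varepsilon} k_s + p - 1}. \qquad (5)$$

Likewise, the only term in (4) containing either $\alpha_{\varepsilon 0}$ or $\alpha_{\varepsilon 1}$, or both, is given by

$$\frac{\Gamma(\alpha_{\varepsilon 0} + \alpha_{\varepsilon 1})\, \Gamma(\alpha_{\varepsilon 0} + q^0_{lt0})\, \Gamma(\alpha_{\varepsilon 1} + q^1_{lt0})}{\Gamma(\alpha_{\varepsilon 0} + \alpha_{\varepsilon 1} + q^0_{lt0} + q^1_{lt0})\, \Gamma(\alpha_{\varepsilon 0})\, \Gamma(\alpha_{\varepsilon 1})}, \qquad (6)$$

where $\Gamma(\cdot)$ denotes the gamma function, $q^0_{lt0} = \sum_{j \in \mathcal{C}_{lt0}} k_j$ and $q^1_{lt0} = \sum_{j \in \mathcal{C}_{lt1}} k_j$. That (5) and (6) are equal follows trivially.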
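The equality of (5) and (6) rests on the fact that each sequential product is a rising factorial, which is a ratio of gamma functions. A small numerical check in Python, under the assumption of integer parameters so that exact rational arithmetic can be used, is sketched below; the function names `sequential_product` and `gamma_ratio` and the example values are hypothetical.

```python
from fractions import Fraction
from math import gamma, isclose


def sequential_product(a0, a1, ks0, ks1):
    """Expression (5) for one split with parameters (a0, a1).

    `ks0` and `ks1` list the k_j for the level-m sets inside the first and
    second child respectively, in increasing order of j.
    """
    prob = Fraction(1)
    seen0 = seen1 = 0  # earlier observations in each child
    for k in ks0:
        for p in range(1, k + 1):
            prob *= Fraction(a0 + seen0 + p - 1, a0 + a1 + seen0 + seen1 + p - 1)
        seen0 += k
    for k in ks1:
        for p in range(1, k + 1):
            prob *= Fraction(a1 + seen1 + p - 1, a0 + a1 + seen0 + seen1 + p - 1)
        seen1 += k
    return prob


def gamma_ratio(a0, a1, q0, q1):
    """Expression (6): the beta moment E{Y^q0 (1 - Y)^q1} for Y ~ beta(a0, a1)."""
    return (gamma(a0 + a1) * gamma(a0 + q0) * gamma(a1 + q1)) / (
        gamma(a0 + a1 + q0 + q1) * gamma(a0) * gamma(a1)
    )


# Hypothetical example: parameters (2, 3), counts (1, 2) in one child and (2,) in the other.
lhs = sequential_product(2, 3, [1, 2], [2])
rhs = gamma_ratio(2, 3, 3, 2)
```

For these values both expressions equal 2/105: the numerators of (5) are 2, 3, 4 and 3, 4, and the denominators run through 5, 6, 7, 8, 9.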

This device can then be used to show that $E(F(B_1)^{k_1} \cdots F(B_r)^{k_r}) = E(X_1^{k_1} \cdots X_r^{k_r})$, completing the proof. □

Eq. (2) defines the sequence of predictive distributions and therefore the joint distribution of $\theta_1, \theta_2, \ldots$ is defined, is symmetric and presentable (Hewitt and Savage, 1955) and must have a unique de Finetti measure which is easily seen to be a Polya tree.

The result (Theorem 1) characterises the distribution of $F$ using a special class of predictive probabilities. It is important to note that the class does not (directly) define the joint distribution of $\theta_1, \theta_2, \ldots$ and hence the inappropriateness of the Representation theorem which could have been used to prove Lemma 3.

Let $r_{lt}$, for $l = 1, \ldots, m$, represent the number of observations in $B_{\varepsilon_1 \cdots \varepsilon_l}$ (where $B_{\varepsilon_1 \cdots \varepsilon_m} = B_{mt}$) given that there are $n_j$ observations in $B_{mj}$ (for $j = 1, \ldots, t$).

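Theorem 1. $F \sim PT(\Pi, \mathcal{A})$ if, and only if, there exist non-negative numbers $\mathcal{A} = (\alpha_0, \alpha_1, \ldots)$ such that, for all $m = 1, 2, \ldots$, $t \in \{1, \ldots, 2^m\}$ and non-negative integers $n_1, \ldots, n_t$,

$$P(\theta_{n+1} \in B_{mt} \mid n_1 \in B_{m1}, \ldots, n_t \in B_{mt}) = \frac{\alpha_{\varepsilon_1} + r_{1t}}{\alpha_0 + \alpha_1 + n} \cdot \frac{\alpha_{\varepsilon_1 \varepsilon_2} + r_{2t}}{\alpha_{\varepsilon_1 0} + \alpha_{\varepsilon_1 1} + r_{1t}} \cdots \frac{\alpha_{\varepsilon_1 \cdots \varepsilon_m} + r_{mt}}{\alpha_{\varepsilon_1 \cdots \varepsilon_{m-1} 0} + \alpha_{\varepsilon_1 \cdots \varepsilon_{m-1} 1} + r_{m-1,t}}. \qquad (7)$$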

Proof. The proof follows from Lemmas 1 and 3 by noting that the posterior means required for (5) are all of the type (7) which was established in Lemma 1. □

4. Special Polya trees

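Let $\alpha_\varepsilon = \gamma_m \alpha(B_\varepsilon)$, for $\gamma_m > 0$, whenever $\varepsilon = \varepsilon_1 \cdots \varepsilon_m$ defines a set at level $m$. Then (2) becomes

$$\frac{\gamma_1 \alpha(B_{\varepsilon_1}) + n(B_{\varepsilon_1})}{\gamma_1 \alpha(\Omega) + n} \prod_{k=1}^{m-1} \frac{\gamma_{k+1} \alpha(B_{\varepsilon_1 \cdots \varepsilon_{k+1}}) + n(B_{\varepsilon_1 \cdots \varepsilon_{k+1}})}{\gamma_{k+1} \alpha(B_{\varepsilon_1 \cdots \varepsilon_k}) + n(B_{\varepsilon_1 \cdots \varepsilon_k})},$$

where $n(B_\varepsilon) = n_\varepsilon$. For such a specification it is clearly seen that the prior value of $P(\theta_1 \in B_\varepsilon)$ is given by $\alpha(B_\varepsilon)/\alpha(\Omega)$. No generality is lost now by taking $\alpha(\cdot)$ to be a probability measure, that is, $\alpha(\Omega) = 1$. Such a specification reduces the problem of defining all the separate $\alpha$'s to a more manageable level. The interpretation of the $\gamma$'s can be taken to be the relative degrees of belief in $\alpha(B_\varepsilon)$ compared with $n(B_\varepsilon)$. Furthermore the $\gamma$'s can be chosen so that the random distributions are continuous or absolutely continuous with probability 1. The Dirichlet process arises if $\gamma_m = \gamma$ for all $m$.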
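The Dirichlet process case can be checked numerically: when the weights are the same at every level, the product telescopes to the familiar Dirichlet process predictive, the ratio of the weight times the base mass of the set plus its count to the weight plus n. The Python sketch below assumes, purely for illustration, a base measure that gives mass 2^(-m) to each set at level m; the function names and the weight 3 are likewise hypothetical.

```python
from fractions import Fraction


def uniform_base_mass(eps):
    """Assumed base measure: each set at level m has mass 2^(-m)."""
    return Fraction(1, 2 ** len(eps))


def predictive(eps, gamma_m, base_mass, observations):
    """Product form of the predictive with alpha_eps = gamma_m(m) * base_mass(eps)."""
    prob = Fraction(1)
    for t in range(1, len(eps) + 1):
        prefix, parent = eps[:t], eps[: t - 1]
        n_prefix = sum(obs.startswith(prefix) for obs in observations)
        n_parent = sum(obs.startswith(parent) for obs in observations)
        g = gamma_m(t)  # weight attached to the sets at level t
        prob *= (g * base_mass(prefix) + n_prefix) / (g * base_mass(parent) + n_parent)
    return prob


observations = ["00", "01", "00"]
constant_weight = predictive("00", lambda m: 3, uniform_base_mass, observations)
dirichlet_form = (3 * uniform_base_mass("00") + 2) / (3 + len(observations))
```

With a constant weight of 3 both quantities equal 11/24. With level-dependent weights the intermediate factors no longer cancel and the predictive depends on the counts at every level of the tree.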

Another interesting result is obtained if the structure, $\Pi$, of the Polya tree is chosen in a particular way. Let, for some $ds > 0$, $B_0 = [0, ds)$, $B_1 = [ds, \infty)$, $B_{10} = [ds, 2ds)$, $B_{11} = [2ds, \infty)$ and so on. The sets $B_0, B_{10}, \ldots$ are not partitioned further. Then let $B_\varepsilon = B_{1 \cdots 10}$, with $k - 1$ ones, be written as $B_{k-1,0}$ and similarly the same representation for $B_{k-1,1}$. Let, for some probability measure $G$ on $\Omega$, $\alpha_{k-1,0} = \gamma_{k-1} G(B_{k-1,0})$ and $\alpha_{k-1,1} = \gamma_{k-1} G[k\,ds, \infty)$ where $\gamma_{k-1} = \gamma((k - 1/2)\,ds)$ for some positive function $\gamma(\cdot)$. Now define the independent random variables $\{Y_k\}_{k=1}^{\infty}$ by $Y_k \sim \mathrm{beta}(\alpha_{k-1,0}, \alpha_{k-1,1})$ and let $X_k = Y_k \prod_{l=1}^{k-1} (1 - Y_l)$. Essentially here is the beta process of Hjort (1990), with parameters $c(\cdot) = \gamma(\cdot) G[\cdot, \infty)$ and $A_0(t) = \int_0^t dG(s)/G[s, \infty)$, studied at an incremental jump time of $ds$. This is easily seen by noting that, if $F(t) = \sum_{k\,ds \le t} X_k$, then

$$F(t) = 1 - \prod_{k\,ds \le t} (1 - Y_k).$$

This is the $F$ obtained from the beta process which is defined, again at the incremental jump time of $ds$, by

$$A(0) = 0 \quad \text{and} \quad A(t) = \sum_{k\,ds \le t} Y_k.$$

In this way it can be seen that Polya trees also generalise the beta process.
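The identity linking the cumulative masses to the product of the complements of the increments can be verified directly. The Python sketch below uses fixed rational values in place of the beta-distributed increments so that the check is exact; the function name `stick_breaking_masses` and the three values are assumptions made for illustration.

```python
from fractions import Fraction


def stick_breaking_masses(ys):
    """Return X_k = Y_k * prod_{l < k} (1 - Y_l) for each k."""
    masses = []
    remaining = Fraction(1)  # running product of (1 - Y_l) over earlier l
    for y in ys:
        masses.append(y * remaining)
        remaining *= 1 - y
    return masses


# Hypothetical increments standing in for draws of Y_1, Y_2, Y_3.
ys = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]
xs = stick_breaking_masses(ys)

cumulative = sum(xs)          # F(t) as a sum of masses
survival = Fraction(1)
for y in ys:
    survival *= 1 - y         # product of (1 - Y_k)
```

Here the masses are 1/2, 1/6 and 1/12, so the cumulative value is 3/4, which equals one minus the product (1/2)(2/3)(3/4) = 1/4.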

Acknowledgements

This work was completed while the second author was visiting at the Department of Mathematics, Imperial College. The authors are grateful to the referee who enabled us to clarify the paper.

References

Ferguson, T.S. (1974), Prior distributions on spaces of probability measures, Ann. Statist. 2, 615-629.
Hewitt, E. and L.J. Savage (1955), Symmetric measures on cartesian products, Trans. Amer. Math. Soc. 80, 470-501.
Hjort, N.L. (1990), Nonparametric Bayes estimators based on beta processes in models for life history data, Ann. Statist. 18, 1259-1294.
Lavine, M. (1992), Some aspects of Polya tree distributions for statistical modelling, Ann. Statist. 20, 1222-1235.
Lavine, M. (1994), More aspects of Polya tree distributions for statistical modelling, Ann. Statist. 22, 1161-1176.
Lo, A.Y. (1991), A characterisation of the Dirichlet process, Statist. Probab. Lett. 12, 185-187.
Mauldin, R.D., W.D. Sudderth and S.C. Williams (1992), Polya trees and random distributions, Ann. Statist. 20, 1203-1221.
