
Page 1: Bayesian Nonparametric Statistics and Nonparametric Priors · Nonparametric Priors Jaeyong Lee Department of Statistics Seoul National University August 13, 2011. Bayesian StatisticsDirichlet

Bayesian Statistics Dirichlet Process NTR process Species sampling model

Bayesian Nonparametric Statistics and Nonparametric Priors

Jaeyong Lee

Department of Statistics, Seoul National University

August 13, 2011

Outline

Bayesian Statistics

Dirichlet Process

NTR Process

Species Sampling Models

Thomas Bayes

Who is Thomas Bayes?

Thomas Bayes was an English Presbyterian minister and mathematician.

He was born in 1702 (the year is not certain) and died on April 17, 1761.

In 1763, his paper "An Essay Toward Solving a Problem in the Doctrine of Chances" was published posthumously in the Philosophical Transactions of the Royal Society. In this paper, he laid out the estimation of a binomial probability using the celebrated Bayes' theorem.

For more information, see http://www-gap.dcs.st-and.ac.uk/ history/Mathematicians/Bayes.html

Dirichlet processes

Bayesian Statistics in a Nutshell

Bayesian inference involves three quantities.

The prior is the probability measure on the parameter space Θ that reflects the statistician's knowledge about θ before he/she sees the data:

θ ∼ π(θ).

The model is the sampling distribution of the data given θ:

x | θ ∼ f(x | θ).

The posterior is the conditional probability measure of θ given the data x and reflects the knowledge after he/she sees the data:

θ | x ∼ π(θ | x).

By Bayes' theorem, π(θ | x) ∝ f(x | θ) π(θ).

In most applications, the posterior is a non-standard distribution, and computational methods such as Markov chain Monte Carlo need to be employed to extract information from it.
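As a small concrete instance of prior, model and posterior (in the spirit of Bayes' binomial problem), the sketch below assumes a Beta(a, b) prior on a success probability θ and a binomial model. In this special case the posterior is available in closed form, so no MCMC is needed. The class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class BetaDist:
    a: float  # acts like a prior count of successes
    b: float  # acts like a prior count of failures


def beta_binomial_posterior(prior: BetaDist, successes: int, trials: int) -> BetaDist:
    """Posterior of theta for theta ~ Beta(a, b), x | theta ~ Binomial(trials, theta).

    Bayes' theorem: pi(theta | x) ∝ theta^x (1 - theta)^(trials - x) * theta^(a-1) (1 - theta)^(b-1),
    which is again a Beta density.
    """
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return BetaDist(prior.a + successes, prior.b + trials - successes)


def beta_mean(d: BetaDist) -> float:
    return d.a / (d.a + d.b)


post = beta_binomial_posterior(BetaDist(1.0, 1.0), successes=7, trials=10)
print(post, beta_mean(post))  # Beta(8, 4), posterior mean 2/3
```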

A Simple Nonparametric Problem

Suppose $X_1, X_2, \dots, X_n \mid F \sim F$ and

$$F \in \mathcal{M}(\mathbb{R}) = \{\text{all probability measures on } \mathbb{R}\}.$$

To tackle this nonparametric problem in a Bayesian way, we need a class of priors on $\mathcal{M}(\mathbb{R})$, that is, a class of probability measures on the space of probability measures.

The Dirichlet process, the neutral to the right (NTR) process, and species sampling models are probability measures on $\mathcal{M}(\mathbb{R})$ developed for this purpose.

Dirichlet Process on R (Ferguson 1973)

Let α be a finite nonnull measure on $(\mathbb{R}, \mathcal{B})$, where $\mathbb{R}$ is the real line and $\mathcal{B}$ is the class of Borel sets.

We say that a random probability measure P on $\mathbb{R}$ follows the Dirichlet process with parameter α if, for every partition $B_1, \dots, B_k$ of $\mathbb{R}$ by Borel sets,

$$(P(B_1), \dots, P(B_k)) \sim \mathrm{Dirichlet}(\alpha(B_1), \dots, \alpha(B_k)).$$

Notation: $P \sim DP(\alpha)$.
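The definition can be checked numerically on one fixed partition. The sketch below assumes, purely for illustration, α = M·N(0, 1) with M = 5 and cut points −1, 0, 1; it draws the Dirichlet vector directly with NumPy and compares its average with α(B_i)/α(ℝ).

```python
import math

import numpy as np


def normal_cdf(x: float) -> float:
    """Standard normal cdf; math.erf handles +/-inf, so open-ended cells work."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def dp_partition_probs(mass, cuts, size, rng):
    """Draw `size` copies of (P(B_1), ..., P(B_k)) for P ~ DP(alpha), alpha = mass * N(0, 1).

    The partition is B_1 = (-inf, c_1], B_2 = (c_1, c_2], ..., B_k = (c_{k-1}, inf).
    By the definition of the DP, the vector is Dirichlet(alpha(B_1), ..., alpha(B_k)).
    """
    edges = [-math.inf, *cuts, math.inf]
    alpha_b = np.array([mass * (normal_cdf(hi) - normal_cdf(lo))
                        for lo, hi in zip(edges[:-1], edges[1:])])
    return alpha_b, rng.dirichlet(alpha_b, size=size)


alpha_b, draws = dp_partition_probs(5.0, [-1.0, 0.0, 1.0], 20000, np.random.default_rng(0))
# Each coordinate has mean alpha(B_i) / alpha(R); compare with the Monte Carlo average.
print(alpha_b / alpha_b.sum(), draws.mean(axis=0))
```

Only the finite-dimensional marginals are simulated here; the random measure P itself is never constructed.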

Properties of Dirichlet process

Conjugacy

Suppose

$$P \sim DP(\alpha), \qquad X_1, \dots, X_n \mid P \sim P.$$

Then

$$P \mid X_1, \dots, X_n \sim DP\Big(\alpha + \sum_{i=1}^n \delta_{X_i}\Big).$$
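One immediate consequence of conjugacy is the posterior mean of P(B): since a Dirichlet coordinate has mean α'(B)/α'(ℝ) and the updated parameter α' = α + Σ δ_{X_i} has total mass α(ℝ) + n, we get E[P(B) | X_1, ..., X_n] = (α(B) + #{i : X_i ∈ B})/(α(ℝ) + n). The sketch assumes α = M·N(0, 1) and B = (lo, hi]; the function name is hypothetical.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def dp_posterior_mean(mass, data, lo, hi):
    """E[P((lo, hi]) | X_1..X_n] when P ~ DP(mass * N(0, 1)).

    The posterior is DP(alpha + sum_i delta_{X_i}) with total mass mass + n. The result is a
    weighted average of the prior guess N(0, 1)(B) and the empirical frequency of B.
    """
    prior_part = mass * (normal_cdf(hi) - normal_cdf(lo))
    data_part = sum(1 for x in data if lo < x <= hi)
    return (prior_part + data_part) / (mass + len(data))


data = [0.5, 0.5, 2.0]
print(dp_posterior_mean(1.0, data, 0.0, 1.0))
```

As n grows relative to α(ℝ), the data term dominates and the posterior mean approaches the empirical distribution.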

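Marginalization Property (Blackwell and MacQueen 1973)

Suppose

$$P \sim DP(\alpha), \qquad X_1, X_2, \dots \mid P \sim P.$$

Then, marginally, $(X_1, X_2, \dots)$ forms a Polya urn sequence:

$$X_1 \sim \alpha / \alpha(\mathcal{X}),$$

$$X_{n+1} \mid X_1, \dots, X_n \sim \frac{\alpha + \sum_{i=1}^n \delta_{X_i}}{\alpha(\mathcal{X}) + n}, \qquad n \ge 1.$$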
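The urn scheme translates directly into a sampler. The minimal sketch below assumes α = M·G_0 with G_0 = N(0, 1) as an example: each new draw is fresh from G_0 with probability M/(M + m) and otherwise copies one of the m earlier values chosen uniformly, which is exactly the mixture (α + Σ δ_{X_i})/(α(𝒳) + m).

```python
import random


def polya_urn(n, mass, base_sampler, rng):
    """Draw X_1..X_n from the Blackwell-MacQueen urn with alpha = mass * G0."""
    xs = []
    for m in range(n):
        if rng.random() < mass / (mass + m):   # always true for m = 0
            xs.append(base_sampler(rng))       # fresh value from G0
        else:
            xs.append(xs[rng.randrange(m)])    # copy an earlier X_i, prob 1/(mass + m) each
    return xs


rng = random.Random(1)
xs = polya_urn(200, mass=2.0, base_sampler=lambda r: r.gauss(0.0, 1.0), rng=rng)
print(len(set(xs)))  # number of distinct values, typically far below 200
```

The many ties in the output are a visible symptom of the discreteness of P ∼ DP(α).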

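Sethuraman's Representation

Let α be a finite nonnull measure on $\mathcal{X}$ and let

$$\theta_1, \theta_2, \dots \overset{iid}{\sim} \mathrm{Beta}(1, \alpha(\mathcal{X})), \qquad Y_1, Y_2, \dots \overset{iid}{\sim} \alpha / \alpha(\mathcal{X}),$$

with the two sequences independent of each other. Define $(p_i)$ by the stick-breaking process

$$p_1 = \theta_1, \quad p_2 = \theta_2 (1 - \theta_1), \quad \dots, \quad p_n = \theta_n \prod_{i=1}^{n-1} (1 - \theta_i), \quad \dots$$

Then

$$P = \sum_{i=1}^\infty p_i \delta_{Y_i} \sim DP(\alpha).$$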
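In practice the infinite sum is truncated. The sketch below keeps the first K sticks, an approximation that drops the leftover mass ∏_{i≤K}(1 − θ_i); it assumes a standard normal base measure for illustration.

```python
import numpy as np


def stick_breaking_dp(mass, n_atoms, base_sampler, rng):
    """Truncated Sethuraman draw: (weights, atoms) of sum_{i <= n_atoms} p_i delta_{Y_i}."""
    theta = rng.beta(1.0, mass, size=n_atoms)
    # stick remaining before break i: prod_{j < i} (1 - theta_j)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - theta)[:-1]))
    weights = theta * remaining          # p_i = theta_i * prod_{j < i} (1 - theta_j)
    atoms = base_sampler(rng, n_atoms)   # Y_i iid from alpha / alpha(X)
    return weights, atoms


rng = np.random.default_rng(0)
w, y = stick_breaking_dp(2.0, 200, lambda r, k: r.normal(0.0, 1.0, size=k), rng)
print(w.sum())  # close to 1; the gap is the truncated tail mass
```

Since E[1 − θ_i] = α(𝒳)/(α(𝒳) + 1), the expected discarded mass is (α(𝒳)/(α(𝒳) + 1))^K, so larger α(𝒳) needs more sticks.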

Gamma process representation

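Let α be a nonnull finite measure on [0, ∞). Let S(t), t ≥ 0, be the gamma process on [0, ∞) with S(t) ∼ Gamma(A(t), 1) and independent increments S(t) − S(s) ∼ Gamma(A(t) − A(s), 1), where

$$A(t) = \alpha[0, t].$$

Then

$$F(t) := \frac{S(t)}{S(\infty)}$$

is the cdf of a random probability measure distributed as DP(α).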
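On a finite grid the representation amounts to normalizing independent gamma increments, which yields a Dirichlet vector of cell probabilities. The sketch assumes, for illustration only, α = M·Exponential(1), so A(t) = M(1 − e^{−t}); the last cell (t_m, ∞) is included so that S(∞) is exact rather than truncated.

```python
import numpy as np


def dp_cdf_via_gamma(mass, grid, rng):
    """Return F(t_1), ..., F(t_m) with F(t) = S(t) / S(inf) on an increasing grid, 0 < t_1."""
    a = mass * (1.0 - np.exp(-np.asarray(grid, dtype=float)))  # A(t_i) (assumed base measure)
    shapes = np.diff(np.concatenate(([0.0], a, [mass])))       # alpha-mass of each cell
    s = np.cumsum(rng.gamma(shape=shapes, scale=1.0))          # S(t_1), ..., S(t_m), S(inf)
    return s[:-1] / s[-1]


rng = np.random.default_rng(1)
grid = np.linspace(0.1, 5.0, 50)
paths = np.array([dp_cdf_via_gamma(3.0, grid, rng) for _ in range(2000)])
# E[F(t)] = A(t) / alpha([0, inf)) = 1 - exp(-t)
print(np.abs(paths.mean(axis=0) - (1.0 - np.exp(-grid))).max())
```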

Other Properties of DP

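Let P ∼ DP(α). Then P is discrete with probability 1.

Nevertheless, the (weak) support of DP(α) is large: it consists of all probability measures whose support lies within the support of α,

$$\mathrm{supp}(DP_\alpha) = \{P : \mathrm{supp}(P) \subset \mathrm{supp}(\alpha)\},$$

so it is the whole space $\mathcal{M}(\mathbb{R})$ when $\mathrm{supp}(\alpha) = \mathbb{R}$.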

Mixtures of DP

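Three equivalent statistical models (they induce the same joint distribution of $X_1, \dots, X_n$).

Model 1

$$P \sim DP(\alpha), \qquad X_1, X_2, \dots, X_n \mid P \overset{iid}{\sim} \int h(x \mid \theta)\, dP(\theta),$$

where h(x | θ) is a probability density function with parameter θ.

Model 2

$$P \sim DP(\alpha), \qquad \theta_1, \theta_2, \dots, \theta_n \mid P \overset{iid}{\sim} P, \qquad X_i \mid \theta_i \overset{ind}{\sim} h(x \mid \theta_i), \quad 1 \le i \le n.$$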

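Model 3

$$(\theta_1, \theta_2, \dots, \theta_n) \sim \mathrm{Polya}(\alpha), \qquad X_i \mid \theta_i \overset{ind}{\sim} h(x \mid \theta_i), \quad 1 \le i \le n,$$

where Polya(α) is the marginal distribution of an observation sequence from a random probability measure following DP(α).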
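Model 3 is the easiest of the three to simulate, because P never has to be represented. The sketch below assumes (illustratively) α = M·N(0, τ²) and a normal kernel h(x | θ) = N(θ, σ²); the θ's come from the Polya urn and each x_i is drawn given its θ_i.

```python
import random


def dp_mixture_model3(n, mass, tau, sigma, rng):
    """Generate (thetas, xs) from Model 3 under the assumed normal base measure and kernel."""
    thetas = []
    for m in range(n):
        if rng.random() < mass / (mass + m):
            thetas.append(rng.gauss(0.0, tau))       # new value from alpha / alpha(X)
        else:
            thetas.append(thetas[rng.randrange(m)])  # tie with an earlier theta
    xs = [rng.gauss(t, sigma) for t in thetas]       # X_i | theta_i ~ N(theta_i, sigma^2)
    return thetas, xs


thetas, xs = dp_mixture_model3(100, mass=1.0, tau=3.0, sigma=0.5, rng=random.Random(7))
print(len(set(thetas)), "clusters among", len(xs), "observations")
```

Observations that share a θ value form a cluster, which is the clustering interpretation of mixtures of DP.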

Remarks on Mixtures of DP

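Mixtures of DP have been the centerpiece of Bayesian nonparametric modeling.

Mixtures of DP were invented to fit continuous densities: since P ∼ DP(α) is discrete with probability 1, it cannot serve directly as a prior for a continuous density, but smoothing it with a kernel can:

$$P \sim DP(\alpha), \qquad X_1, X_2, \dots, X_n \mid P \overset{iid}{\sim} \int h(x \mid \theta)\, dP(\theta),$$

where h(x | θ) is a probability density function with parameter θ.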

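Mixtures of DP provide a natural statistical model for clustering: observations sharing the same value of θ_i form a cluster.

$$(\theta_1, \theta_2, \dots, \theta_n) \sim \mathrm{Polya}(\alpha), \qquad X_i \mid \theta_i \overset{ind}{\sim} h(x \mid \theta_i), \quad 1 \le i \le n.$$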

Partition Structure

$[n] = \{1, 2, \dots, n\}$ and $x^n = (x_1, x_2, \dots, x_n)$.

Define an equivalence relation ∼ on [n] by

$$i \sim j \iff x_i = x_j.$$

The partition of [n] generated by ∼ is called the partition induced by $x^n$ and is denoted by

$$\Pi(x^n) = \{A_1, A_2, \dots, A_k\}.$$

$x_i^*$ denotes the common value of $x_j$ for $j \in A_i$.

$x^n = (x_1, x_2, \dots, x_n)$ can be represented by
- $\Pi(x^n) = \{A_1, A_2, \dots, A_k\}$ and
- $\{x_1^*, x_2^*, \dots, x_k^*\}$.
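The representation is a lossless encoding, as the short sketch below shows: it computes the blocks (1-based indices, in order of first appearance) and the unique values, then rebuilds the sequence. It assumes the values are hashable; the function names are illustrative.

```python
def induced_partition(xs):
    """Return (blocks, uniques): blocks A_1..A_k of [n] induced by i ~ j iff x_i == x_j,
    listed by first appearance, and the common value x*_i of each block."""
    block_of = {}  # value -> position of its block
    blocks, uniques = [], []
    for i, x in enumerate(xs, start=1):
        if x not in block_of:
            block_of[x] = len(blocks)
            blocks.append([])
            uniques.append(x)
        blocks[block_of[x]].append(i)
    return blocks, uniques


def reconstruct(blocks, uniques, n):
    """Invert the representation: x_i = x*_j for i in A_j."""
    xs = [None] * n
    for block, value in zip(blocks, uniques):
        for i in block:
            xs[i - 1] = value
    return xs


seq = ["a", "b", "a", "c", "b"]
blocks, uniques = induced_partition(seq)
# blocks == [[1, 3], [2, 5], [4]], uniques == ["a", "b", "c"]
```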

Decomposition of Polya(α)

$\theta = (\theta_1, \theta_2, \dots, \theta_n) \sim \mathrm{Polya}(\alpha = M G_0)$, with M > 0 and $G_0$ a diffuse probability measure, if and only if
- $\Pr(\Pi(\theta) = \{A_1, A_2, \dots, A_k\}) = \dfrac{M^k}{(M)_{n\uparrow}} \prod_{i=1}^k (n_i - 1)!$, and
- $\{\theta_1^*, \theta_2^*, \dots, \theta_k^*\} \overset{iid}{\sim} G_0$,

where $n_i = \mathrm{card}(A_i)$, $i = 1, 2, \dots, k$, and $(M)_{n\uparrow} = M(M+1)\cdots(M+n-1)$.

The partition structure of Polya(α) in mixtures of DP induces the probability model for clustering.
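A quick sanity check on the partition probability: summed over every set partition of [n], it must equal 1. The sketch enumerates all 52 set partitions of [5] recursively and assumes M = 1.5 for the check; the helper names are hypothetical.

```python
import math


def set_partitions(items):
    """Yield every set partition of a list, as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):                      # put `first` into an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part                          # or into a new singleton block


def rising_factorial(m, n):
    return math.prod(m + i for i in range(n))           # (M)_{n up} = M (M+1) ... (M+n-1)


def polya_partition_prob(block_sizes, mass):
    """M^k / (M)_{n up} * prod_i (n_i - 1)!"""
    n, k = sum(block_sizes), len(block_sizes)
    return mass**k / rising_factorial(mass, n) * math.prod(math.factorial(s - 1) for s in block_sizes)


parts = list(set_partitions(list(range(1, 6))))
total = sum(polya_partition_prob([len(b) for b in p], mass=1.5) for p in parts)
print(len(parts), total)  # 52 set partitions; total is 1 up to rounding
```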

Neutral To the Right (NTR) Processes

Construction of Random Distribution

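For a given partition $0 = t_0 < t_1 < t_2 < \dots < t_k < \infty$ of $\mathbb{R}^+$, a random distribution F can be specified through the successive conditional probabilities

$$F(t_1), \quad \frac{F(t_2) - F(t_1)}{1 - F(t_1)}, \quad \frac{F(t_3) - F(t_2)}{1 - F(t_2)}, \quad \dots,$$

where $(F(t_i) - F(t_{i-1}))/(1 - F(t_{i-1}))$ is the probability of falling in $(t_{i-1}, t_i]$ given survival past $t_{i-1}$.

[Figure: a time axis marked t_0, t_1, t_2, t_3, ..., with each of these ratios placed over the corresponding interval.]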

Neutral To the Right (NTR) Processes

(Doksum 1974) A random distribution function F on $\mathbb{R}^+$ is called an NTR process if

$$F(t_1), \quad \frac{F(t_2) - F(t_1)}{1 - F(t_1)}, \quad \dots, \quad \frac{F(t_k) - F(t_{k-1})}{1 - F(t_{k-1})}$$

are independent for all $0 < t_1 < t_2 < \dots < t_k < \infty$.
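On a single fixed grid, the definition suggests a direct construction: draw the ratios V_i independently and rebuild F from 1 − F(t_i) = (1 − F(t_{i−1}))(1 − V_i). The sketch assumes Beta-distributed ratios as an arbitrary illustrative choice; it only produces F at one set of points and does not by itself guarantee a process that is consistent across all grids (that is what the NII characterizations provide).

```python
import numpy as np


def ntr_cdf_on_grid(a, b, rng):
    """F(t_1), ..., F(t_k) from independent ratios V_i ~ Beta(a_i, b_i) (illustrative choice),
    where V_i = (F(t_i) - F(t_{i-1})) / (1 - F(t_{i-1})) and F(t_0) = 0."""
    v = rng.beta(np.asarray(a, dtype=float), np.asarray(b, dtype=float))
    return 1.0 - np.cumprod(1.0 - v)   # 1 - F(t_i) = prod_{j <= i} (1 - V_j)


rng = np.random.default_rng(3)
F = ntr_cdf_on_grid(a=[1.0] * 5, b=[4.0, 3.0, 2.0, 1.0, 0.5], rng=rng)
print(F)
```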

Independent Increment Processes

An independent increment process A(t), t ∈ $\mathbb{R}^+$, is a right-continuous stochastic process with left limits and independent increments.

A Lévy process is a stationary independent increment process.

An independent increment process can be represented as the sum of a drift term, a Brownian motion, and a pure jump process.

For a nonparametric prior, we are only concerned with positive, nondecreasing independent increment (NII) processes without a drift term (nonstationary subordinators), that is, independent increment processes with only a pure positive jump part.

First Characterization of NTR Process

F is an NTR process if and only if

$$F(t) = 1 - \exp(-Y(t)),$$

where Y is an NII process such that

$$Y(0) = 0 \ \text{a.s.}, \qquad \lim_{t \to \infty} Y(t) = \infty \ \text{a.s.}$$

Cumulative Hazard Function

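The cumulative hazard function (chf) A of a distribution F is defined as

$$A(t) = \int_0^t \frac{dF(s)}{1 - F(s-)}.$$

If F is absolutely continuous with density f,

$$A(t) = \int_0^t \frac{f(s)}{1 - F(s-)}\, ds.$$

Roughly, the chf accumulates the probability of failing at s given survival up to s:

$$A(t) = \int_0^t P(X = s \mid X \ge s)\, ds.$$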
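For a purely discrete F the integral becomes a sum of hazard jumps ΔA(s_j) = p_j/(1 − F(s_j−)), and F is recovered by the discrete product-limit relation 1 − F(t) = ∏_{s_j ≤ t}(1 − ΔA(s_j)). The sketch below uses a hypothetical three-point distribution; note that the last jump equals 1, which is how a chf with a jump of size 1 ends the distribution.

```python
import numpy as np


def discrete_chf(support, probs):
    """Hazard jumps dA(s_j) = p_j / (1 - F(s_j-)) and the running chf A(s_j)."""
    probs = np.asarray(probs, dtype=float)
    surv_before = 1.0 - np.concatenate(([0.0], np.cumsum(probs)[:-1]))  # 1 - F(s_j-)
    jumps = probs / surv_before
    return np.asarray(support, dtype=float), jumps, np.cumsum(jumps)


s, dA, A = discrete_chf([1, 2, 3], [0.2, 0.3, 0.5])
F_back = 1.0 - np.cumprod(1.0 - dA)   # product-limit: recovers F at the atoms
print(dA, F_back)
```

For distributions with a continuous part, the finite product is replaced by a product integral, so 1 − F(t) and exp(−A(t)) agree only when F is continuous.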

Second Characterization of NTR Process

F is an NTR process if and only if its chf A is an NII process such that

$$A(0) = 0, \qquad 0 \le \Delta A(t) \le 1 \ \text{for all } t \in \mathbb{R}^+,$$

and either $\Delta A(t) = 1$ for some t or $\lim_{t \to \infty} A(t) = \infty$.

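An NII process Y can be characterized by a Poisson process N on $\mathbb{R}^+ \times \mathbb{R}^+$ with intensity (or Lévy measure) ν, a σ-finite measure on $\mathbb{R}^+ \times \mathbb{R}^+$ such that

$$\int_0^t \int_0^\infty x\, \nu(ds, dx) < \infty \quad \text{for all } t \in \mathbb{R}^+.$$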

The relationship can be represented as

$$Y(t) = \int_0^t \int_0^\infty x\, N(ds, dx), \qquad N = \sum_{s:\, \Delta Y(s) > 0} \delta_{(s,\, \Delta Y(s))}.$$

In this case, we call Y the NII process with Lévy measure ν.
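The map from N to Y (and on to an NTR cdf via F = 1 − exp(−Y)) is simple once a realization of N is in hand: Y(t) adds the sizes of all atoms located at or before t. The sketch uses a hypothetical finite realization of N; a genuine NTR prior needs Y(t) → ∞, which a finite list of atoms cannot provide, so this only illustrates the mapping.

```python
import math


def y_from_atoms(atoms, t):
    """Y(t) = sum of jump sizes x over atoms (s, x) of N with s <= t (right-continuous)."""
    return sum(x for s, x in atoms if s <= t)


def ntr_cdf_from_atoms(atoms, t):
    """F(t) = 1 - exp(-Y(t)), as in the first characterization of NTR processes."""
    return 1.0 - math.exp(-y_from_atoms(atoms, t))


# Hypothetical atoms of N: jumps of size 0.5, 1.0, 0.25 at times 0.3, 1.2, 2.0
atoms = [(0.3, 0.5), (1.2, 1.0), (2.0, 0.25)]
print([y_from_atoms(atoms, t) for t in (0.1, 0.3, 1.5, 5.0)])
```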

Relation between N and Y

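[Figure: two panels plotted against t, showing the points (∗) of the Poisson process N and the corresponding sample path of Y.]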

Examples.

Beta Process (Hjort 1990): $BP(A_0(t), c(t))$ is an NII process with Lévy measure

$$\nu(dx, ds) = \frac{c(s)}{x} (1 - x)^{c(s) - 1}\, dx\, dA_0(s), \qquad 0 < x < 1, \ s \ge 0.$$

Extended Beta Process (Kim and Lee 2001): $EBP(A_0(t), \alpha(t), \beta(t))$ is an NII process with Lévy measure

$$\nu(dx, ds) = \frac{1}{x}\, b(x : \alpha(s), \beta(s))\, dx\, dA_0(s),$$

where b(x : α, β) is the density of the beta distribution with parameters (α, β).

Examples.

Gamma Process (Lo 1982, Doksum 1974, Ferguson and Phadia 1979, Kalbfleisch 1978): $GP(A_0(t), c(t))$ is an NII process with Lévy measure

$$\nu(dx, ds) = \frac{c(s)}{x} \exp(-c(s) x)\, dx\, dA_0(s).$$

Extended Gamma Process: an NII process with Lévy measure

$$\nu(dx, ds) = \frac{1}{x}\, g(x : \alpha(s), \beta(s))\, dx\, dA_0(s),$$

where

$$g(x : a, b) = \frac{b^a}{\Gamma(a)} x^{a-1} e^{-bx}, \qquad x \ge 0.$$

Construction of NII process.

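Let f(x : θ) be a probability density on $\mathbb{R}^+$ with parameter θ. We can construct an NII process with Lévy measure

$$\nu(dx, ds) = \frac{1}{x} f(x : \theta(s))\, dx\, dA_0(s).$$

Since $x \cdot x^{-1} f(x : \theta(s))$ integrates to 1 in x, $\int_0^t \int_0^\infty x\, \nu(dx, ds) = A_0(t)$, so the integrability condition holds whenever $A_0$ is finite.

Extended beta and extended gamma processes are subclasses of this larger class. Other classes of densities can also be used, such as the Weibull distribution or the positive part of a $t_k$ or normal distribution.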
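One case can be simulated exactly. For an extended gamma process with constant parameters a > 1, b > 0, the identity x^{−1} g(x : a, b) = (b/(a − 1)) g(x : a − 1, b) shows that ν has finite total mass, so on [0, T] the process is compound Poisson: a Poisson(b A_0(T)/(a − 1)) number of jumps, locations distributed as dA_0/A_0(T), and sizes from g(x : a − 1, b). The sketch assumes A_0(s) = s; it is a special-case illustration, not one of the general algorithms listed next.

```python
import numpy as np


def extended_gamma_path(a, b, T, rng):
    """Exact jumps (times, sizes) of an extended gamma NII process on [0, T].

    Assumes constant a > 1, b > 0 and A_0(s) = s, so that
    nu(dx, ds) = (b / (a - 1)) g(x : a - 1, b) dx ds has total mass b * T / (a - 1).
    """
    if a <= 1:
        raise ValueError("finite jump intensity needs a > 1")
    n_jumps = rng.poisson(b * T / (a - 1))
    times = np.sort(rng.uniform(0.0, T, size=n_jumps))           # ~ dA_0 / A_0(T)
    sizes = rng.gamma(shape=a - 1, scale=1.0 / b, size=n_jumps)   # g(x : a - 1, b), rate b
    return times, sizes


def y_at(times, sizes, t):
    """Y(t) = sum of jump sizes at times <= t."""
    return sizes[times <= t].sum()


rng = np.random.default_rng(42)
draws = [y_at(*extended_gamma_path(3.0, 2.0, 5.0, rng), 5.0) for _ in range(4000)]
print(np.mean(draws))  # should be near A_0(5) = 5, since E[Y(t)] = A_0(t)
```

For a ≤ 1 (and for the beta and gamma processes) the Lévy measure has infinite mass near x = 0, infinitely many small jumps occur, and approximate algorithms are needed.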

Algorithms to generate sample paths of NII processes

Inverse Lévy Measure (ILM) Algorithm (Wolpert and Ickstadt 1998)

Algorithm of Damien, Laud and Smith (1995)

ε-Approximation Algorithm (Lee and Kim 2004)

Poisson Weighting Algorithm (Lee 2009)

Remarks on NTR process

The NTR process is a conjugate prior for right-censored data and is used in survival models.

Species Sampling Models (SSM)

Exchangeable Random Partition on [n]

$[n] = \{1, 2, \dots, n\}$, $n \in \mathbb{N} = \{1, 2, \dots\}$.

(partition of n) An unordered finite sequence $\pi_n = \{n_1, n_2, \dots, n_k\}$ is called a partition of n if

$$n_i \ge 1, \ 1 \le i \le k, \quad \text{and} \quad \sum_{i=1}^k n_i = n.$$

(composition of n) An ordered sequence $\mathbf{n} = (n_1, n_2, \dots, n_k)$ with the same properties is called a composition of n.
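The difference between the two notions is easiest to see by counting. The sketch enumerates compositions by deciding, for each of the n − 1 gaps between n unit items, whether to cut there (so there are 2^{n−1} compositions), and obtains partitions by sorting and de-duplicating. Function names are illustrative.

```python
from itertools import product


def compositions(n):
    """All ordered sequences of positive integers summing to n (2^(n-1) of them)."""
    result = []
    for cuts in product([False, True], repeat=n - 1):
        parts, size = [], 1
        for cut in cuts:               # a cut closes the current part
            if cut:
                parts.append(size)
                size = 1
            else:
                size += 1
        parts.append(size)
        result.append(tuple(parts))
    return result


def integer_partitions(n):
    """Unordered versions: sort each composition decreasingly and de-duplicate."""
    return sorted({tuple(sorted(c, reverse=True)) for c in compositions(n)}, reverse=True)


print(len(compositions(4)), integer_partitions(4))
```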

For any partition $\{A_1, A_2, \dots, A_k\}$ of [n] and any permutation σ of [n], let

$$\sigma(\{A_1, A_2, \dots, A_k\}) = \{\sigma(A_1), \sigma(A_2), \dots, \sigma(A_k)\},$$

where $\sigma(A) = \{\sigma(a) : a \in A\}$.

(exchangeable random partition) A random partition $\Pi_n$ of [n] is called exchangeable if, for any permutation σ of [n],

$$\Pi_n \overset{d}{=} \sigma(\Pi_n),$$

i.e., for any partition $\{A_1, A_2, \dots, A_k\}$ of [n],

$$P(\Pi_n = \{A_1, A_2, \dots, A_k\}) = P(\sigma(\Pi_n) = \{A_1, A_2, \dots, A_k\}).$$

Exchangeable Partition Probability Function (EPPF)

$\Pi_n$ is an exchangeable random partition of [n] if and only if, for any partition $\{A_1, A_2, \dots, A_k\}$ of [n],

$$P(\Pi_n = \{A_1, A_2, \dots, A_k\}) = p(|A_1|, |A_2|, \dots, |A_k|)$$

for some function p on $C_n$ symmetric in its arguments, where $C_n$ is the set of all compositions of n.

(EPPF) The function p is called the EPPF of $\Pi_n$.

Exchangeable Random Partition on N

(restriction of $\Pi_n$ to [m]) For 1 ≤ m ≤ n, the restriction of $\Pi_n$ to [m], denoted $\Pi_{m,n}$, is obtained from $\Pi_n$ by removing $\{m+1, m+2, \dots, n\}$. If $\Pi_n$ is exchangeable, then $\Pi_{m,n}$ is exchangeable for all 1 ≤ m ≤ n.
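Restriction is a purely combinatorial operation: delete the elements above m from each block and discard blocks that become empty. A minimal sketch, with blocks as lists of integers and a hypothetical function name:

```python
def restrict(partition, m):
    """Pi_{m,n}: remove m+1..n from every block, drop empty blocks, order blocks by least element."""
    restricted = [sorted(a for a in block if a <= m) for block in partition]
    return sorted((b for b in restricted if b), key=lambda b: b[0])


print(restrict([[1, 4], [2], [3, 5]], 3))  # {1,4},{2},{3,5} restricted to [3]
```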

(exchangeable random partition on $\mathbb{N}$) A sequence of random partitions $\Pi_\infty = (\Pi_n)_{n \ge 1}$ is called an exchangeable random partition on $\mathbb{N}$ if
- $\Pi_n$ is an exchangeable random partition of [n] for all n;
- $\Pi_m = \Pi_{m,n}$ a.s. for all $1 \le m \le n < \infty$.

EPPF on N

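$\mathcal{C} = \bigcup_{n=1}^\infty C_n$, where $C_n$ is the set of all compositions of n.

For $\mathbf{n} = (n_1, n_2, \dots, n_k)$,

$$\mathbf{n}^{j+} = (n_1, \dots, n_{j-1}, n_j + 1, n_{j+1}, \dots, n_k), \quad 1 \le j \le k, \qquad \mathbf{n}^{(k+1)+} = (n_1, n_2, \dots, n_k, 1).$$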

(EPPF) A function $p : \mathcal{C} \to [0, 1]$ is called an EPPF of $\Pi_\infty = (\Pi_n)$ if
- $p(1) = 1$;
- for all $\mathbf{n} \in \mathcal{C}$, $p(\mathbf{n}) = \sum_{j=1}^{k+1} p(\mathbf{n}^{j+})$;
- $p_n = p|_{C_n}$ is the EPPF of $\Pi_n$ for all n.
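The addition rule can be verified for a concrete EPPF. The sketch takes the partition probability of Polya(θG_0) from the decomposition of Polya(α) above, p(n_1, ..., n_k) = θ^k ∏(n_j − 1)!/(θ)_{n↑}, and checks p(1) = 1 and p(n) = Σ_{j=1}^{k+1} p(n^{j+}) on a few compositions; θ values are arbitrary.

```python
import math


def polya_eppf(sizes, theta):
    """theta^k * prod_j (n_j - 1)! / (theta)_{n up} for block sizes (n_1, ..., n_k)."""
    n, k = sum(sizes), len(sizes)
    rising = math.prod(theta + i for i in range(n))
    return theta**k * math.prod(math.factorial(s - 1) for s in sizes) / rising


def addition_rule_holds(sizes, theta):
    """Check p(n) == sum over n^{j+} (j <= k: grow block j) and n^{(k+1)+} (new block)."""
    children = [tuple(s + 1 if i == j else s for i, s in enumerate(sizes))
                for j in range(len(sizes))]
    children.append(tuple(sizes) + (1,))
    return math.isclose(polya_eppf(sizes, theta), sum(polya_eppf(c, theta) for c in children))


print(polya_eppf((1,), 2.0), addition_rule_holds((2, 1), 0.7), addition_rule_holds((3, 1, 1), 2.5))
```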

Kingman’s Representation

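$\Pi(x_1, x_2, \dots) := (\Pi(x_1, \dots, x_n))_{n=1}^\infty$ for any sequence $(x_1, x_2, \dots)$.

(decreasing arrangement of block sizes of $\Pi_\infty$) Let $\Pi_\infty = (\Pi_n)$ be an exchangeable random partition on $\mathbb{N}$. For each n ≥ 1,

$$N^{\downarrow}_{n,i} := \begin{cases} \text{the } i\text{th largest block size of } \Pi_n, & \text{if } \Pi_n \text{ has at least } i \text{ blocks}, \\ 0, & \text{if there are fewer than } i \text{ blocks}. \end{cases}$$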

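(Kingman's representation) Suppose $\Pi_\infty = (\Pi_n)$ is an exchangeable random partition of $\mathbb{N}$ and $(N^{\downarrow}_{n,i})$ is the decreasing arrangement of block sizes of $\Pi_n$ for n ≥ 1. Then there exists a sequence of random variables $P^{\downarrow}_1 \ge P^{\downarrow}_2 \ge \dots$ such that
- $N^{\downarrow}_{n,i}/n \to P^{\downarrow}_i$ a.s. as $n \to \infty$, for all i ≥ 1;
- $\Pi_\infty \mid (P^{\downarrow}_i) \overset{d}{=} \Pi(X_1, X_2, \dots)$, where $X_1, X_2, \dots \overset{iid}{\sim} F$ and F has ranked atoms $(P^{\downarrow}_i)$.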

From the proof of Kingman’s representation

$(X_i)$ is constructed as follows:
- $U_1, U_2, \dots \overset{iid}{\sim} \nu$ independent of $\Pi_\infty$, where ν is a diffuse probability measure;
- $X_i := U_{\tau(i)}$ if i belongs to the τ(i)th block to appear.

The random measure has the following form:

$$F = \sum_{i=1}^\infty P^{\downarrow}_i \delta_{U_i} + \Big(1 - \sum_j P^{\downarrow}_j\Big) \nu,$$

where $U_1, U_2, \dots \overset{iid}{\sim} \nu$ independent of $\Pi_\infty$.

$\Pi_\infty$ is an exchangeable random partition of $\mathbb{N}$ if and only if there exists a random probability measure

$$F = \sum_j P_j \delta_{U_j} + \Big(1 - \sum_j P_j\Big) \nu$$

such that $\Pi_\infty \overset{d}{=} \Pi(X_1, X_2, \dots)$, where $X_1, X_2, \dots \mid F \overset{iid}{\sim} F$ and $U_1, U_2, \dots \overset{iid}{\sim} \nu$ independent of $(P_j)$.

$(X_i)$ is called a species sampling sequence.

F is called a species sampling model.

Species Sampling

Imagine that we land on a planet where "no one has gone before." As we explore the planet, we encounter new species unknown to us.

We record the names of the species we encounter. If a species is new, we name it by picking an element from $\mathcal{X}$.

Suppose $(X_1, X_2, \dots)$ is an infinite sequence of such records.

$X_i$: the species of the ith individual sampled.

$\tilde{X}_j$: the jth distinct species to appear.

$k = k_n$: the number of distinct species that appear in $(X_1, \dots, X_n)$.

$n_j = n_{jn}$: the number of times the jth species $\tilde{X}_j$ appears in $(X_1, \dots, X_n)$.

$\mathbf{n} = (n_{1n}, n_{2n}, \dots)$ or $(n_{1n}, n_{2n}, \dots, n_{k_n n})$.

Species Sampling Sequence

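We call an exchangeable sequence $(X_1, X_2, \dots)$ a species sampling sequence if

$$X_1 \sim \nu, \qquad X_{n+1} \mid X_1, \dots, X_n \sim \sum_{j=1}^{k} p_j(\mathbf{n}_n)\, \delta_{\tilde{X}_j} + p_{k+1}(\mathbf{n}_n)\, \nu,$$

where ν is a diffuse probability measure on $\mathcal{X}$, i.e., $\nu(\{x\}) = 0$ for all $x \in \mathcal{X}$.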
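The predictive rule gives a generic sampler: at each step, choose an old species j with probability p_j(counts) or a new one with probability p_{k+1}(counts), drawing new names from the diffuse ν. The sketch plugs in the Polya urn weights from the Dirichlet process example as one admissible choice and uses Uniform(0, 1) as ν; both choices, and the function names, are illustrative assumptions.

```python
import random


def polya_ppf(counts, theta=1.0):
    """Polya urn weights: n_j / (n + theta) for old species, theta / (n + theta) for a new one."""
    n = sum(counts)
    return [c / (n + theta) for c in counts] + [theta / (n + theta)]


def species_sampling_sequence(n, ppf, draw_new, rng):
    """Sample X_1..X_n: X_{m+1} is the jth distinct species with prob p_j(counts),
    or a fresh draw from the diffuse nu with prob p_{k+1}(counts)."""
    species, counts, xs = [], [], []
    for _ in range(n):
        weights = ppf(counts) if counts else [1.0]   # X_1 ~ nu
        j = rng.choices(range(len(weights)), weights=weights)[0]
        if j == len(species):                        # new species
            species.append(draw_new(rng))
            counts.append(0)
        counts[j] += 1
        xs.append(species[j])
    return xs, counts


rng = random.Random(0)
xs, counts = species_sampling_sequence(50, lambda c: polya_ppf(c, theta=2.0), lambda r: r.random(), rng)
print(len(counts), "distinct species; counts:", counts)
```

The sampler runs for any putative PPF, but the resulting sequence is exchangeable only when the weights form a genuine PPF.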

Prediction Probability Function

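A sequence of functions $(p_j, j = 1, 2, \dots) : \mathcal{C} \to \mathbb{R}$ in the definition of the species sampling sequence is called the prediction probability function (PPF).

The PPF $(p_j)$ satisfies

$$p_j(\mathbf{n}) \ge 0, \qquad \sum_{j=1}^{k(\mathbf{n})+1} p_j(\mathbf{n}) = 1 \quad \text{for all } \mathbf{n} \in \mathbb{N}^*.$$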

For a species sampling sequence $(X_n)$, the corresponding prediction probability functions are defined as

$$p_j(\mathbf{n}) = \mathbb{P}(X_{n+1} = \tilde{X}_j \mid X_1, \dots, X_n), \quad j = 1, \dots, k_n,$$

$$p_{k_n+1}(\mathbf{n}) = \mathbb{P}(X_{n+1} \notin \{X_1, \dots, X_n\} \mid X_1, \dots, X_n).$$

Putative PPF

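A sequence of functions $(p_j, j = 1, 2, \dots) : C \to \mathbb{R}$ is called a putative PPF if it satisfies

$$p_j(\mathbf{n}) \ge 0, \qquad \sum_{j=1}^{k(\mathbf{n})+1} p_j(\mathbf{n}) = 1 \quad \text{for all } \mathbf{n} \in \mathbb{N}^*.$$

Is every putative PPF a PPF? Unfortunately, the answer is "no" (Lee et al. 2008): a putative PPF need not be the PPF of any exchangeable sequence.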
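One way to see why a putative PPF can fail to be a PPF: if $(X_n)$ is exchangeable, the events $\{X_1 = X_2 \neq X_3\}$, $\{X_1 = X_3 \neq X_2\}$ and $\{X_2 = X_3 \neq X_1\}$ must have the same probability. The sketch below multiplies the one-step weights along each pattern using exact fractions. The weights proportional to $n_j^2$ (and $\theta$ for a new species) are a hypothetical putative PPF chosen for illustration: they are nonnegative and sum to one, yet give unequal probabilities, so no exchangeable sequence can have them as its PPF. The Polya urn weights pass this particular check. The helper names are assumptions of the sketch.

```python
from fractions import Fraction
from typing import Callable, List, Sequence

# Assumed interface: ppf(counts, j) returns p_{j+1}(counts);
# j == len(counts) means "a new species".
PPF = Callable[[Sequence[int], int], Fraction]

def label_sequence_prob(labels: Sequence[int], ppf: PPF) -> Fraction:
    """Probability of a species pattern implied by the one-step rule.

    labels[i] is the species of X_{i+1}, numbered 0, 1, 2, ... in order of
    first appearance, e.g. [0, 0, 1] means X_1 = X_2 != X_3.
    """
    counts: List[int] = []
    prob = Fraction(1)
    for lab in labels:
        if counts:                 # X_1 ~ nu is new with probability 1
            prob *= ppf(counts, lab)
        if lab == len(counts):
            counts.append(1)
        else:
            counts[lab] += 1
    return prob

def polya_ppf(theta: Fraction) -> PPF:
    def ppf(counts: Sequence[int], j: int) -> Fraction:
        return (counts[j] if j < len(counts) else theta) / (sum(counts) + theta)
    return ppf

def squared_ppf(theta: Fraction) -> PPF:
    """Hypothetical putative PPF: weights proportional to n_j**2 and theta."""
    def ppf(counts: Sequence[int], j: int) -> Fraction:
        total = sum(c * c for c in counts) + theta
        return (counts[j] ** 2 if j < len(counts) else theta) / total
    return ppf


theta = Fraction(1)
for name, ppf in [("Polya", polya_ppf(theta)), ("squared", squared_ppf(theta))]:
    print(name, *[label_sequence_prob(p, ppf) for p in ([0, 0, 1], [0, 1, 0], [0, 1, 1])])
# Polya 1/6 1/6 1/6
# squared 1/10 1/6 1/6
```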

Characterizations of SSM

The distribution of a species sampling model

$$F = \sum_j P_j \delta_{U_j} + \Big(1 - \sum_j P_j\Big)\nu$$

is characterized by any one of the following:

- $\nu$ and the distribution of $(P_j)$;
- $\nu$ and the distribution of $\Pi_\infty$;
- $\nu$ and the EPPF $p$ of $\Pi_\infty$;
- $\nu$ and the PPF $(p_j)$ of $\Pi_\infty$.

The species sampling model can also be characterized through species sampling sequences.

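Example: Dirichlet Process

Suppose $P \sim DP(\theta\nu)$, where $\theta > 0$ and $\nu$ is a diffuse probability measure, and $X_1, X_2, \dots \mid P \overset{iid}{\sim} P$. Then, marginally, $X_1, X_2, \dots$ is a Polya urn sequence, which satisfies

$$X_1 \sim \nu, \qquad X_{n+1} \mid X_1, \dots, X_n \sim \sum_{j=1}^{k_n} \frac{n_j}{n + \theta}\, \delta_{\tilde{X}_j} + \frac{\theta}{n + \theta}\, \nu.$$

Thus, the Polya urn sequence is a species sampling sequence.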

The PPF of the Polya urn sequence is

$$p_j(n_1, \dots, n_k) = \frac{n_j}{n + \theta}\, I(1 \le j \le k) + \frac{\theta}{n + \theta}\, I(j = k + 1),$$

where $n = \sum_{i=1}^{k} n_i$.

(Ewens sampling formula) The EPPF of the Polya urn sequence is

$$p(n_1, \dots, n_k) = \frac{\theta^k}{(\theta)_{n\uparrow}} \prod_{i=1}^{k} (n_i - 1)!,$$

where $(\theta)_{n\uparrow} = \theta(\theta + 1)\cdots(\theta + n - 1)$.
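The Polya urn PPF and the Ewens formula can be checked against each other with exact arithmetic. Since the EPPF gives the probability of each particular partition of $\{1, \dots, n\}$, it should sum to one over all set partitions, and the PPF should be recoverable as the ratio $p_j(\mathbf{n}) = p(\mathbf{n} + \mathbf{e}_j)/p(\mathbf{n})$, where $\mathbf{e}_j$ (notation assumed here) adds one member to block $j$, or opens block $k+1$ when $j = k+1$. This is a sketch; the helper names are hypothetical.

```python
from fractions import Fraction
from math import factorial, prod
from typing import Callable, List, Sequence

def rising(x: Fraction, n: int) -> Fraction:
    """(x)_{n, up} = x (x + 1) ... (x + n - 1); equals 1 when n == 0."""
    return prod((x + i for i in range(n)), start=Fraction(1))

def ewens_eppf(counts: Sequence[int], theta: Fraction) -> Fraction:
    """Ewens sampling formula p(n_1, ..., n_k)."""
    n, k = sum(counts), len(counts)
    return theta ** k / rising(theta, n) * prod(factorial(c - 1) for c in counts)

def ppf_from_eppf(counts: Sequence[int], j: int,
                  eppf: Callable[[Sequence[int]], Fraction]) -> Fraction:
    """p_{j+1}(n) = p(n + e_j) / p(n); j == len(counts) opens a new block."""
    bumped = list(counts)
    if j == len(bumped):
        bumped.append(1)
    else:
        bumped[j] += 1
    return eppf(bumped) / eppf(counts)

def set_partition_block_sizes(n: int) -> List[List[int]]:
    """Block sizes (in order of first appearance) of every set partition of {1..n}."""
    partitions: List[List[int]] = [[]]
    for _ in range(n):
        grown: List[List[int]] = []
        for blocks in partitions:
            for j in range(len(blocks)):          # next element joins block j
                grown.append(blocks[:j] + [blocks[j] + 1] + blocks[j + 1:])
            grown.append(blocks + [1])            # next element opens a new block
        partitions = grown
    return partitions


theta = Fraction(2)
print(sum(ewens_eppf(b, theta) for b in set_partition_block_sizes(4)))  # 1
```

For counts $(3, 1, 2)$ and $\theta = 2$, the ratios reproduce $n_j/(n+\theta)$ and $\theta/(n+\theta)$ with $n = 6$.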

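(Sethuraman's Representation)

$$U_1, U_2, \dots \overset{iid}{\sim} \mathrm{Beta}(1, \theta), \qquad Y_1, Y_2, \dots \overset{iid}{\sim} \nu,$$

and the two sequences are independent of each other. Define

$$P_1 = U_1, \quad P_2 = U_2(1 - U_1), \quad \dots, \quad P_n = U_n \prod_{i=1}^{n-1} (1 - U_i), \quad \dots$$

Then,

$$P = \sum_{i=1}^{\infty} P_i\, \delta_{Y_i} \sim DP(\theta\nu).$$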
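In practice the infinite sum is often truncated. The sketch below draws the first $N$ sticks of Sethuraman's construction; the truncation level `n_atoms` and the function name are assumptions of the sketch, and the dropped mass $1 - \sum_{i \le N} P_i = \prod_{i \le N}(1 - U_i)$ is simply ignored, so this is an approximation of $DP(\theta\nu)$ rather than an exact draw.

```python
import random
from typing import Callable, List, Tuple

def dp_stick_breaking(theta: float, base: Callable[[random.Random], float],
                      n_atoms: int, rng: random.Random) -> Tuple[List[float], List[float]]:
    """Truncated Sethuraman construction of P ~ DP(theta * nu).

    Returns the first n_atoms weights P_1, ..., P_N and atoms Y_1, ..., Y_N.
    The dropped mass 1 - sum(P_i) equals prod_i (1 - U_i).
    """
    weights: List[float] = []
    atoms: List[float] = []
    remaining = 1.0                        # prod_{i < n} (1 - U_i)
    for _ in range(n_atoms):
        u = rng.betavariate(1.0, theta)    # U_n ~ Beta(1, theta)
        weights.append(u * remaining)      # P_n = U_n * prod_{i < n} (1 - U_i)
        atoms.append(base(rng))            # Y_n ~ nu, independent of the U's
        remaining *= 1.0 - u
    return weights, atoms


rng = random.Random(42)
w, y = dp_stick_breaking(2.0, lambda r: r.gauss(0.0, 1.0), 100, rng)
x = rng.choices(y, weights=w, k=5)   # approximate draws X_i | P ~ P (weights renormalised)
```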

Example: Pitman-Yor Process

For a pair of real numbers $(a, b)$ with either $0 \le a < 1$ and $b > -a$, or $a < 0$ and $b = -ma$ for some $m = 1, 2, \dots$, and a diffuse probability measure $\nu$, define

$$U_j \overset{ind}{\sim} \mathrm{Beta}(1 - a,\, b + ja), \quad j = 1, 2, \dots, \qquad X_1, X_2, \dots \overset{iid}{\sim} \nu,$$

with $(U_j) \perp (X_j)$.

Construct $P_1, P_2, \dots$ from the $U_j$'s by the stick-breaking process

$$P_1 = U_1, \qquad P_j = (1 - U_1)\cdots(1 - U_{j-1})\, U_j, \quad j = 2, 3, \dots.$$
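The same truncated stick-breaking idea applies here with $U_j \sim \mathrm{Beta}(1-a, b+ja)$. The sketch below covers only the case $0 \le a < 1$, $b > -a$ (in the other case the parameter $b + ma$ is zero, which Python's `betavariate` does not accept); the function name and truncation level are assumptions. With $a = 0$ it reduces to $\mathrm{Beta}(1, b)$ sticks, matching Sethuraman's representation.

```python
import random
from typing import Callable, List, Tuple

def py_stick_breaking(a: float, b: float, base: Callable[[random.Random], float],
                      n_atoms: int, rng: random.Random) -> Tuple[List[float], List[float]]:
    """Truncated stick-breaking sketch of P ~ PY(a, b, nu).

    Only the case 0 <= a < 1, b > -a is handled: U_j ~ Beta(1 - a, b + j a).
    """
    if not (0.0 <= a < 1.0 and b > -a):
        raise ValueError("this sketch only covers 0 <= a < 1 and b > -a")
    weights: List[float] = []
    atoms: List[float] = []
    remaining = 1.0                              # (1 - U_1) ... (1 - U_{j-1})
    for j in range(1, n_atoms + 1):
        u = rng.betavariate(1.0 - a, b + j * a)  # U_j, independent over j
        weights.append(remaining * u)            # P_j
        atoms.append(base(rng))                  # X_j ~ nu, iid
        remaining *= 1.0 - u
    return weights, atoms


w, x = py_stick_breaking(0.5, 1.0, lambda r: r.gauss(0.0, 1.0), 500, random.Random(3))
```

Because $E[U_j] = (1-a)/(1 + b + (j-1)a)$ shrinks as $j$ grows when $a > 0$, later sticks are broken off more gently, so a fixed truncation tends to leave more mass unaccounted for than in the $a = 0$ case.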

The random probability measure

$$P = \sum_{j=1}^{\infty} P_j\, \delta_{X_j}$$

is called a Pitman-Yor process, written $P \sim PY(a, b, \nu)$.

Note that $PY(0, \theta, \nu) = DP(\theta\nu)$.

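(EPPF of Pitman-Yor)

$$p_{a,b}(n_1, n_2, \dots, n_k) = \frac{(b + a)_{k-1\uparrow a} \prod_{i=1}^{k} (1 - a)_{n_i - 1\uparrow 1}}{(b + 1)_{n-1\uparrow 1}},$$

where $n = \sum_{i=1}^{k} n_i$, $(x)_{n\uparrow c} = x(x + c)(x + 2c)\cdots(x + (n-1)c)$, and $(x)_{0\uparrow c} = 1$.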
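A small exact-arithmetic sketch of this EPPF, with hypothetical helper names. Two sanity checks follow from the definitions: the three set partitions of $\{1,2,3\}$ with block sizes $(2,1)$, together with $\{1,2,3\}$ and $\{1\}\{2\}\{3\}$, exhaust all five partitions, so $p(3) + 3\,p(2,1) + p(1,1,1)$ must equal one; and with $a = 0$, $b = \theta$ the formula should reduce to the Ewens sampling formula.

```python
from fractions import Fraction
from math import prod
from typing import Sequence

def gen_rising(x: Fraction, n: int, c: Fraction) -> Fraction:
    """(x)_{n, up c} = x (x + c) ... (x + (n - 1) c); equals 1 when n == 0."""
    return prod((x + i * c for i in range(n)), start=Fraction(1))

def py_eppf(counts: Sequence[int], a: Fraction, b: Fraction) -> Fraction:
    """Pitman-Yor EPPF p_{a,b}(n_1, ..., n_k)."""
    n, k = sum(counts), len(counts)
    one = Fraction(1)
    numerator = gen_rising(b + a, k - 1, a) * prod(
        (gen_rising(one - a, c - 1, one) for c in counts), start=one)
    return numerator / gen_rising(b + 1, n - 1, one)


a, b = Fraction(1, 2), Fraction(1)
print(py_eppf([3], a, b), py_eppf([2, 1], a, b), py_eppf([1, 1, 1], a, b))  # 1/8 1/8 1/2
```

With $a = 1/2$, $b = 1$: $1/8 + 3 \cdot 1/8 + 1/2 = 1$, as required.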

(PPF of Pitman-Yor)

$$p^{a,b}_j(n_1, n_2, \dots, n_k) = \begin{cases} \dfrac{n_j - a}{n + b}, & j = 1, 2, \dots, k, \\[6pt] \dfrac{b + ka}{n + b}, & j = k + 1. \end{cases}$$
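The PPF can be sketched as a function returning all $k+1$ weights; the name `py_ppf` is hypothetical. The weights sum to one because $\sum_j (n_j - a) + b + ka = n + b$, and with $a = 0$ they reduce to the Polya urn weights $n_j/(n+\theta)$ and $\theta/(n+\theta)$ with $\theta = b$.

```python
from fractions import Fraction
from typing import List, Sequence

def py_ppf(counts: Sequence[int], a: Fraction, b: Fraction) -> List[Fraction]:
    """Weights (p_1, ..., p_{k+1}) of the Pitman-Yor PPF for counts (n_1, ..., n_k)."""
    n, k = sum(counts), len(counts)
    existing = [(c - a) / (n + b) for c in counts]   # (n_j - a) / (n + b)
    return existing + [(b + k * a) / (n + b)]        # (b + k a) / (n + b)


weights = py_ppf([3, 1, 2], Fraction(1, 2), Fraction(1))
```

Unlike the Polya urn's new-species weight $\theta/(n+\theta)$, the Pitman-Yor weight $(b + ka)/(n + b)$ grows with the number $k$ of species already observed when $a > 0$.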

Remarks on SSM

SSMs form a rich class of nonparametric priors, including the Dirichlet and Pitman-Yor processes.

SSMs provide alternative probability models for clustering.

Their applications and utility remain to be seen.