
Bayesian wavelet estimators in nonparametric regression

    Natalia Bochkina

    University of Edinburgh

University of Bristol - Lecture 1


Outline

Lecture 1. Classical and Bayesian approaches to estimation in nonparametric regression

1. Classical estimators
   - Kernel estimators
   - Orthogonal series estimators
   - Other estimators (local polynomials, spline estimators, etc.)
2. Bayesian approach
   - Prior on coefficients in an orthogonal basis
   - Gaussian process priors
   - Other prior distributions


Lecture 2. Classical minimax consistency and concentration of posterior measures

1. Decision-theoretic approach to classical consistency and concentration of posterior measures
2. Classical consistency
   - Bayes and minimax estimators
   - Speed of convergence
   - Adaptivity
   - Lower bounds
3. Concentration of posterior measures


Lecture 3. Wavelet estimators in nonparametric regression

1. Thresholding estimators (universal, SURE, block thresholding; different types of thresholding functions)
2. Different choices of prior distributions
3. Empirical Bayes estimators (posterior mean and median, Bayes factor estimator)
4. Optimal non-adaptive wavelet estimators
5. Optimal adaptive wavelet estimators


Lecture 4. Wavelet estimators: simultaneous local and global optimality

1. Separable and non-separable function estimators
2. When simultaneous local and global optimality is possible
3. A Bayesian wavelet estimator that is locally and globally optimal
4. Conclusions and open questions


Lecture 1. Classical and Bayesian approaches to estimation in nonparametric regression

1. Classical estimators
   - Kernel estimators
   - Orthogonal series estimators
   - Other estimators
2. Bayesian approach
   - Prior on coefficients in an orthogonal basis
   - Gaussian process priors
   - Other prior distributions


Lecture 1. Main references

- A. Tsybakov (2009) Introduction to Nonparametric Estimation. Springer.
- J. Ramsay and B. Silverman (2002) Functional Data Analysis. Springer.
- B. Vidakovic (1999) Statistical Modeling via Wavelets. Wiley.
- C. Rasmussen and C. Williams (2006) Gaussian Processes for Machine Learning. MIT Press.


Examples of nonparametric models and problems

Estimation of a probability density
Let $X_1, \dots, X_n \sim F$ iid, where the distribution $F$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}$. Aim: estimate the unknown density $p(x) = dF/dx$.

Nonparametric regression
Assume the pairs of random variables $(X_1, Y_1), \dots, (X_n, Y_n)$ are such that
$Y_i = f(X_i) + \epsilon_i, \quad X_i \in [0, 1],$
where $E(\epsilon_i) = 0$ for all $i$. We can write $f(x) = E(Y_i \mid X_i = x)$.
The unknown function $f : [0, 1] \to \mathbb{R}$ is called the regression function. The problem of nonparametric regression is to estimate the unknown function $f$.

We focus on the nonparametric regression problem and on large sample properties.


Examples of nonparametric models and problems (cont.)

White noise model
This is an idealised model that provides an approximation to the nonparametric regression model. Consider the stochastic differential equation
$dY(t) = f(t)\,dt + \frac{1}{\sqrt{n}}\,dW(t), \quad t \in [0, 1],$
where $W$ is a standard Wiener process on $[0, 1]$, $f$ is an unknown function on $[0, 1]$, and $n$ is an integer. It is assumed that a sample path $\{Y(t),\ 0 \le t \le 1\}$ of the process $Y$ is observed. The statistical problem is to estimate the unknown function $f$.

First introduced in the context of nonparametric estimation by Ibragimov and Hasminskii (1977, 1981). Asymptotic equivalence to nonparametric regression was formally proved by Brown and Low (1996). An extension to the multivariate case and to random design regression was obtained by Reiss (2008).


Kernel density estimators (cont.)

One of the first intuitive solutions is based on the following argument. For sufficiently small $h > 0$ we can write the approximation
$p(x) = F'(x) \approx \frac{F(x + h) - F(x - h)}{2h}.$
Replacing $F$ by the empirical distribution function $\hat F_n$, we define
$\hat p_n^R(x) = \frac{\hat F_n(x + h) - \hat F_n(x - h)}{2h},$
which is called the Rosenblatt estimator. It can be rewritten in the form
$\hat p_n^R(x) = \frac{1}{2nh} \sum_{i=1}^n I(x - h < X_i \le x + h) = \frac{1}{nh} \sum_{i=1}^n K_0\left(\frac{X_i - x}{h}\right),$
where $K_0(u) = \frac{1}{2} I(-1 < u \le 1)$.


Kernel density estimators (cont.)

A simple generalisation of the Rosenblatt estimator is given by
$\hat p_n(x) = \frac{1}{nh} \sum_{i=1}^n K\left(\frac{X_i - x}{h}\right),$
where $K : \mathbb{R} \to \mathbb{R}$ is an integrable function satisfying $\int K(u)\,du = 1$. Such a function $K$ is called a kernel, and the parameter $h$ is called a bandwidth of the estimator $\hat p_n(x)$. The function $\hat p_n(x)$ is called the kernel density estimator, or the Parzen-Rosenblatt estimator.

Further reading: B. Silverman (1986) Density Estimation for Statistics and Data Analysis. Chapman and Hall.

Tuning parameters: bandwidth $h$ and kernel $K$.
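As an illustration (not on the original slides), a minimal NumPy sketch of the Parzen-Rosenblatt estimator; the Gaussian kernel, the simulated sample and the bandwidth $h = 0.4$ are illustrative choices:

    import numpy as np

    def kde(x_grid, X, h, K=lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)):
        """Parzen-Rosenblatt estimator: (1/(n h)) sum_i K((X_i - x)/h)."""
        n = len(X)
        U = (X[None, :] - x_grid[:, None]) / h   # (X_i - x)/h for each grid point
        return K(U).sum(axis=1) / (n * h)

    rng = np.random.default_rng(0)
    X = rng.normal(size=200)                 # simulated sample from N(0, 1)
    x_grid = np.linspace(-3, 3, 61)
    p_hat = kde(x_grid, X, h=0.4)            # bandwidth chosen by eye here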


Kernel estimators for the regression function

Nonparametric regression model:
$Y_i = f(X_i) + \epsilon_i, \quad i = 1, \dots, n,$
where the $(X_i, Y_i)$ are iid pairs with $E|Y_i| < \infty$, and the marginal density $p(x)$ of the $X_i$ satisfies $p(x) > 0$. Then
$f(x) = E(Y \mid X = x) = \int y\, p(y \mid x)\, dy = \frac{\int y\, p(x, y)\, dy}{p(x)}.$
If we replace $p(x, y)$ here by its kernel estimator
$\hat p_n(x, y) = \frac{1}{n h^2} \sum_{i=1}^n K\left(\frac{X_i - x}{h}\right) K\left(\frac{Y_i - y}{h}\right)$
and use the kernel estimator $\hat p_n(x)$ instead of $p(x)$, then, if the kernel $K$ is of order 1, we obtain the Nadaraya-Watson estimator:
$\hat f_n^{NW}(x) = \frac{\sum_{i=1}^n Y_i K\left(\frac{X_i - x}{h}\right)}{\sum_{i=1}^n K\left(\frac{X_i - x}{h}\right)} \quad \text{if } \sum_{i=1}^n K\left(\frac{X_i - x}{h}\right) \ne 0,$
and $\hat f_n^{NW}(x) = 0$ otherwise.
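A direct NumPy sketch of the Nadaraya-Watson estimator (the Gaussian kernel and the simulated data are illustrative choices):

    import numpy as np

    def nadaraya_watson(x, X, Y, h,
                        K=lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)):
        """Kernel-weighted average of the Y_i; returns 0 if all weights vanish."""
        w = K((X - x) / h)
        s = w.sum()
        return (w @ Y) / s if s != 0 else 0.0

    rng = np.random.default_rng(1)
    X = rng.uniform(0, 1, 200)
    Y = np.sin(2 * np.pi * X) + 0.3 * rng.normal(size=200)
    x_grid = np.linspace(0, 1, 101)
    f_hat = np.array([nadaraya_watson(x, X, Y, h=0.05) for x in x_grid])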


The Nadaraya-Watson estimator is linear

The Nadaraya-Watson estimator can be represented as a weighted sum of the $Y_i$:
$\hat f_n^{NW}(x) = \sum_{i=1}^n Y_i W_{ni}^{NW}(x),$
where the weights are given by
$W_{ni}^{NW}(x) = \frac{K\left(\frac{X_i - x}{h}\right)}{\sum_{j=1}^n K\left(\frac{X_j - x}{h}\right)}\, I\left(\sum_{j=1}^n K\left(\frac{X_j - x}{h}\right) \ne 0\right).$

Definition 1. An estimator $\hat f_n(x)$ of $f(x)$ is called a linear nonparametric regression estimator if it can be written in the form
$\hat f_n(x) = \sum_{i=1}^n Y_i W_{ni}(x),$
where the weights $W_{ni}(x) = W_{ni}(x, X_1, \dots, X_n)$ depend only on $n$, $i$, $x$ and the values $X_1, \dots, X_n$.

Typically, $\sum_{i=1}^n W_{ni}(x) = 1$ for all $x$ (or for almost all $x$ with respect to the Lebesgue measure).


Other kernels

- $K(u) = (1 - |u|)\, I(|u| \le 1)$ - triangular kernel
- $K(u) = \frac{3}{4}(1 - u^2)\, I(|u| \le 1)$ - parabolic, or Epanechnikov, kernel
- $K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}$ - Gaussian kernel
- $K(u) = \frac{1}{2}\, e^{-|u|/\sqrt{2}} \sin(|u|/\sqrt{2} + \pi/4)$ - Silverman kernel
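The same kernels written as vectorised Python functions, e.g. to be passed to the sketches above (each integrates to one, as the definition of a kernel requires):

    import numpy as np

    def triangular(u):
        return (1 - np.abs(u)) * (np.abs(u) <= 1)

    def epanechnikov(u):
        return 0.75 * (1 - u**2) * (np.abs(u) <= 1)

    def gaussian(u):
        return np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)

    def silverman(u):
        a = np.abs(u) / np.sqrt(2)
        return 0.5 * np.exp(-a) * np.sin(a + np.pi / 4)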


Local polynomial estimators

If the kernel $K$ takes only nonnegative values, the Nadaraya-Watson estimator $\hat f_n^{NW}$ satisfies
$\hat f_n^{NW}(x) = \arg\min_{\theta \in \mathbb{R}} \sum_{i=1}^n (Y_i - \theta)^2 K\left(\frac{X_i - x}{h}\right).$
Thus $\hat f_n^{NW}$ is obtained by a local constant least squares approximation of the outputs $Y_i$.

Local polynomial least squares approximation: replace the constant by a polynomial of given degree $k$. If $f$ has $k$ derivatives, then for $z$ sufficiently close to $x$ we may write
$f(z) \approx f(x) + f'(x)(z - x) + \dots + \frac{f^{(k)}(x)}{k!}(z - x)^k = \theta^T(x)\, U\left(\frac{z - x}{h}\right),$
where
$U(u) = (1, u, u^2/2!, \dots, u^k/k!)^T,$
$\theta(x) = (f(x), f'(x)h, f''(x)h^2, \dots, f^{(k)}(x)h^k)^T.$


Local polynomial estimators

Definition 2. Let $K : \mathbb{R} \to \mathbb{R}$ be a kernel, $h > 0$ be a bandwidth, and $k > 0$ be an integer. A vector $\hat\theta_n(x) \in \mathbb{R}^{k+1}$ defined by
$\hat\theta_n(x) = \arg\min_{\theta \in \mathbb{R}^{k+1}} \sum_{i=1}^n \left[Y_i - \theta^T U\left(\frac{X_i - x}{h}\right)\right]^2 K\left(\frac{X_i - x}{h}\right)$
is called a local polynomial estimator of order $k$ of $\theta(x)$. The statistic
$\hat f_n(x) = U^T(0)\, \hat\theta_n(x)$
is called a local polynomial estimator of order $k$ of $f(x)$.
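A sketch of Definition 2 in NumPy; the Epanechnikov kernel and the local-linear default $k = 1$ are illustrative choices:

    import numpy as np
    from math import factorial

    def local_polynomial(x, X, Y, h, k=1,
                         K=lambda u: 0.75 * (1 - u**2) * (np.abs(u) <= 1)):
        """Local polynomial estimator of order k at the point x."""
        u = (X - x) / h
        # design matrix with rows U(u_i) = (1, u_i, u_i^2/2!, ..., u_i^k/k!)
        D = np.column_stack([u**j / factorial(j) for j in range(k + 1)])
        w = K(u)
        A = D.T @ (w[:, None] * D)            # weighted normal equations
        b = D.T @ (w * Y)
        theta_hat = np.linalg.lstsq(A, b, rcond=None)[0]  # robust to singular A
        return theta_hat[0]                   # f_hat(x) = U(0)^T theta_hat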


2. Projection estimators (orthogonal series estimators)

Nonparametric regression model:
$Y_i = f(x_i) + \epsilon_i, \quad i = 1, \dots, n,$
where $E\epsilon_i = 0$ and $E\epsilon_i^2 < \infty$. Assume $x_i = i/n$ and $f \in L_2[0, 1]$.
Take some orthonormal basis $\{\phi_k(x)\}_{k=0}^\infty$ of $L_2[0, 1]$. Then, for any $f \in L_2[0, 1]$,
$f(x) = \sum_{k=0}^\infty \theta_k \phi_k(x), \quad \text{with } \theta_k = \int_0^1 f(x)\phi_k(x)\,dx.$
Projection estimation of $f$ is based on a simple idea: approximate $f$ by its projection $\sum_{k=0}^N \theta_k \phi_k(x)$ on the linear span of the first $N + 1$ functions of the basis, and replace the $\theta_k$ by their estimators.


Projection estimators

If the $X_i$ are scattered over $[0, 1]$ in a sufficiently uniform way, which happens, e.g., in the case $X_i = i/n$, the coefficients $\theta_k$ are well approximated by the sums $\frac{1}{n}\sum_{i=1}^n f(X_i)\phi_k(X_i)$. Replacing in these sums the unknown quantities $f(X_i)$ by the observations $Y_i$, we obtain the following estimators of the $\theta_k$:
$\hat\theta_k = \frac{1}{n} \sum_{i=1}^n Y_i \phi_k(X_i).$

Definition 3. Let $N \ge 1$ be an integer. The statistic
$\hat f_n^N(x) = \sum_{k=0}^N \hat\theta_k \phi_k(x)$
is called a projection estimator (or an orthogonal series estimator) of the regression function $f$ at the point $x$.

The choice of the parameter $N$ corresponds to choosing the smoothness of $f$.
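A sketch of the projection estimator with the trigonometric basis described on the next slide (the simulated usage and basis ordering are illustrative):

    import numpy as np

    def fourier_basis(x, N):
        """First N+1 functions of the trigonometric basis on [0, 1]."""
        cols = [np.ones_like(x)]              # phi_1(x) = 1
        k = 1
        while len(cols) < N + 1:
            cols.append(np.sqrt(2) * np.cos(2 * np.pi * k * x))
            if len(cols) < N + 1:
                cols.append(np.sqrt(2) * np.sin(2 * np.pi * k * x))
            k += 1
        return np.column_stack(cols)          # shape (len(x), N+1)

    def projection_estimator(x_grid, X, Y, N):
        theta_hat = fourier_basis(X, N).T @ Y / len(X)  # (1/n) sum_i Y_i phi_k(X_i)
        return fourier_basis(x_grid, N) @ theta_hat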


Projection estimators (cont.)

Note that $\hat f_n^N(x)$ is a linear estimator, since we may write it in the form
$\hat f_n^N(x) = \sum_{i=1}^n Y_i W_{ni}(x) \quad \text{with} \quad W_{ni}(x) = \frac{1}{n} \sum_{k=0}^N \phi_k(x)\phi_k(X_i).$

Examples:
1. Fourier basis: $\phi_1(x) = 1$, $\phi_{2k}(x) = \sqrt{2}\cos(2\pi k x)$, $\phi_{2k+1}(x) = \sqrt{2}\sin(2\pi k x)$, $k = 1, 2, \dots$, $x \in [0, 1]$ (Tsybakov, 2009).
2. A wavelet basis (Vidakovic, 1999).
3. An orthogonal polynomial basis: $\phi_k(x) = (x - a)^k$, $k \ge 0$ (more commonly used in the context of density estimation).


Generalisation to arbitrary $X_i$

Define the vectors $\theta = (\theta_0, \dots, \theta_N)^T$ and $\phi(x) = (\phi_0(x), \dots, \phi_N(x))^T$. The least squares estimator $\hat\theta^{LS}$ of the vector $\theta$ is defined as follows:
$\hat\theta^{LS} = \arg\min_{\theta \in \mathbb{R}^{N+1}} \sum_{i=1}^n (Y_i - \theta^T \phi(X_i))^2.$
If the matrix
$B = \sum_{i=1}^n \phi(X_i)\phi^T(X_i)$
is invertible, we can write
$\hat\theta^{LS} = B^{-1} \sum_{i=1}^n Y_i \phi(X_i).$
Then the nonparametric least squares estimator of $f(x)$ is given by
$\hat f_{n,N}^{LS}(x) = \phi^T(x)\, \hat\theta^{LS}.$
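The same in code, reusing fourier_basis from the previous sketch (any orthonormal basis function with the same signature would do):

    import numpy as np

    def ls_series_estimator(x_grid, X, Y, basis, N):
        """Nonparametric least squares estimator f_LS(x) = phi(x)^T theta_LS."""
        Phi = basis(X, N)                         # rows are phi(X_i)^T
        B = Phi.T @ Phi                           # B = sum_i phi(X_i) phi(X_i)^T
        theta_ls = np.linalg.solve(B, Phi.T @ Y)  # requires B to be invertible
        return basis(x_grid, N) @ theta_ls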


Wavelet basis

The wavelet basis with periodic boundary correction on $[0, 1]$ is
$\{\phi_{Lk},\ k = 0, \dots, 2^L - 1;\ \psi_{jk},\ j = L, L+1, \dots,\ k = 0, \dots, 2^j - 1\},$
where $\phi_{jk}(x) = 2^{j/2}\phi(2^j x - k)$ and $\psi_{jk}(x) = 2^{j/2}\psi(2^j x - k)$; $\phi(x)$ is a scaling function and $\psi(x)$ is a wavelet function such that
$\int \phi(x)\,dx = 1, \quad \int \psi(x)\,dx = 0.$
Then any $f \in L_2[0, 1]$ can be decomposed in the wavelet basis:
$f(x) = \sum_{k=0}^{2^L - 1} \alpha_k \phi_{Lk}(x) + \sum_{j=L}^{\infty} \sum_{k=0}^{2^j - 1} \theta_{jk} \psi_{jk}(x),$
and $\theta = \{\alpha_k, \theta_{jk}\}$ is the set of wavelet coefficients. [Meyer, 1990]

Wavelets $(\phi, \psi)$ are said to have regularity $s$ if they have $s$ derivatives and $s$ vanishing moments ($\int x^k \psi(x)\,dx = 0$ for integer $k \le s$).


Examples of wavelet functions

[Figure: (a) the Haar mother wavelet; (b) the Daubechies wavelet (compactly supported, extremal phase, N = 2), $s = 2$; (c) the Daubechies mother wavelet, $s = 4$.]

Localisation in time and frequency domains - sparse wavelet representation of most functions.
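Figures of this kind can be reproduced with the PyWavelets package (assuming it is installed); wavefun approximates $\phi$ and $\psi$ numerically by the cascade algorithm, and 'db2'/'db4' are Daubechies wavelets of the kind shown above:

    import pywt

    for name in ["haar", "db2", "db4"]:
        wavelet = pywt.Wavelet(name)
        phi, psi, x = wavelet.wavefun(level=8)   # refinement level of approximation
        print(name, "support:", x[0], "to", x[-1])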


Daubechies wavelet transform, $s = 8$

[Figure: the scaling function $\phi(x)$ and translates and dilates of the wavelet $\psi(x)$ for the compactly supported, extremal phase Daubechies wavelet with N = 8, shown at several scales and locations.]


Discrete wavelet transform (DWT)

Applying the discretised wavelet transform to the data yields
$d_{jk} = w_{jk} + \tilde\epsilon_{jk}, \quad L \le j \le J - 1,\ k = 0, \dots, 2^j - 1,$
$c_{Lk} = u_{Lk} + \tilde\epsilon_k, \quad k = 0, \dots, 2^L - 1,$
where $d_{jk}$ and $c_{Lk}$ are the discrete wavelet and scaling coefficients of the observations $(y_i)$, and $\tilde\epsilon_{jk}$ and $\tilde\epsilon_k$ are the coefficients of the discrete wavelet transform of the noise $(\epsilon_i)$. If the $\epsilon_i \sim N(0, \sigma^2)$ are independent, then the $\tilde\epsilon_{jk} \sim N(0, \sigma^2)$ are independent.

Connection to $\theta_{jk}$:
$\theta_{jk} = \int_0^1 f(x)\psi_{jk}(x)\,dx \approx \frac{1}{n}\sum_{i=1}^n \psi_{jk}(i/n) f(i/n) = \frac{1}{\sqrt{n}}\,(W f_n)_{jk} = \frac{w_{jk}}{\sqrt{n}},$
where $W$ is an orthonormal $n \times n$ matrix and $f_n = (f(1/n), \dots, f(1))^T$.

Also, for $y_{jk} = d_{jk}/\sqrt{n}$ and $\tilde y_k = c_{Lk}/\sqrt{n}$, and for Gaussian noise,
$y_{jk} \sim N(\theta_{jk}, \sigma^2/n), \quad \tilde y_k \sim N(\alpha_k, \sigma^2/n).$
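A sketch of the DWT of noisy observations using PyWavelets; the test function, noise level and 'db8' filter (chosen to echo the $s = 8$ example above) are illustrative:

    import numpy as np
    import pywt

    n = 1024                                  # sample size, a power of two
    x = np.arange(1, n + 1) / n
    rng = np.random.default_rng(2)
    y = np.sin(4 * np.pi * x) + 0.1 * rng.normal(size=n)   # y_i = f(i/n) + eps_i

    # multilevel DWT: coeffs = [c_L, d_L, d_{L+1}, ..., d_{J-1}]
    coeffs = pywt.wavedec(y, "db8", mode="periodization")
    # with an orthogonal filter and periodic boundaries, W is orthonormal,
    # so the energy of the data is preserved
    assert np.isclose(sum((c**2).sum() for c in coeffs), (y**2).sum())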


Smoothness

Fourier series - basis of the Sobolev spaces $W_p^r \subset L_2$, $p \in [1, \infty]$, $r > 0$:
$f \in W_p^r \iff \sum_{k=1}^\infty |a_k^r \theta_k|^p < \infty,$
where $a_k = k$ for even $k$ and $a_k = k - 1$ for odd $k$.

Wavelet series - basis of the Besov spaces $B_{p,q}^r \subset L_2$, $p, q \in [1, \infty]$, $r > 0$:
$f \in B_{p,q}^r \iff \left(\sum_{k=0}^{2^L - 1} |\alpha_k|^p\right)^{1/p} + \left(\sum_{j=L}^\infty 2^{jq(r + 1/2 - 1/p)} \left(\sum_{k=0}^{2^j - 1} |\theta_{jk}|^p\right)^{q/p}\right)^{1/q} < \infty,$
provided the regularity $s$ of the wavelet transform satisfies $s > r > 0$ (Donoho and Johnstone, 1998, Theorem 2).

Embeddings: $B_{2,2}^r = W_2^r$.
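The Besov sequence norm can be evaluated directly from the coefficients; a sketch for finite $p$ and $q$, over a finite number of resolution levels:

    import numpy as np

    def besov_norm(alpha, thetas, r, p, q, L):
        """Besov sequence norm from scaling coefficients alpha and a list
        thetas = [theta_L, theta_{L+1}, ...] of wavelet coefficient arrays."""
        scaling = np.sum(np.abs(alpha)**p)**(1 / p)
        levels = [2**(j * (r + 0.5 - 1 / p)) * np.sum(np.abs(th)**p)**(1 / p)
                  for j, th in enumerate(thetas, start=L)]
        return scaling + np.sum(np.array(levels)**q)**(1 / q)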


Regularisation

Penalised least squares estimator of $f$:
$\hat f_n^{pen} = \arg\min_{f \in \mathcal{F}} \left\{\sum_{i=1}^n (Y_i - f(x_i))^2 + \lambda\,\mathrm{pen}(f)\right\},$
where $\mathrm{pen}(f)$ is a penalty function and $\lambda > 0$ is a regularisation parameter.

Example: $\mathrm{pen}(f) = \int [f''(x)]^2\,dx$ leads to a cubic spline estimator (Silverman, 1985); see Green and Silverman (1994) for more details.
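A smoothing-spline fit of this kind is available in SciPy; note that UnivariateSpline is parameterised by a residual bound s rather than by $\lambda$ directly (a sketch with illustrative data):

    import numpy as np
    from scipy.interpolate import UnivariateSpline

    rng = np.random.default_rng(3)
    x = np.sort(rng.uniform(0, 1, 200))
    y = np.exp(-x) * np.sin(8 * x) + 0.1 * rng.normal(size=200)

    # cubic smoothing spline; s controls the fidelity/smoothness trade-off
    spline = UnivariateSpline(x, y, k=3, s=len(x) * 0.01)
    f_hat = spline(np.linspace(0, 1, 101))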


Regularisation

Penalisation can be done on the coefficients of $f$ in an orthonormal basis:
$\hat\theta^{pen} = \arg\min_{\theta \in \mathbb{R}^{N+1}} \left\{\sum_{k=0}^N (y_k - \theta_k)^2 + \lambda\,\mathrm{pen}(\theta)\right\}.$

Examples:
1. $\mathrm{pen}(\theta) = \|\theta\|_2^2$: $\hat\theta_k = \frac{1}{1 + \lambda}\, y_k$ - Tikhonov regularisation, ridge regression.
2. $\mathrm{pen}(\theta) = \|\theta\|_1$: for large enough $\lambda$, $\hat\theta$ is sparse - lasso regression (Tibshirani, 1996).

The estimator $\hat f_n^{pen}$ ($\hat\theta_n^{pen}$) coincides with the MAP (maximum a posteriori) Bayesian estimator.
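Both closed forms in code; these follow coefficient by coefficient from the displayed criterion (a sketch, with the soft threshold at $\lambda/2$ because the quadratic term is not halved here):

    import numpy as np

    def ridge(y, lam):
        """pen(theta) = ||theta||_2^2: shrinkage theta_k = y_k / (1 + lambda)."""
        return y / (1 + lam)

    def lasso(y, lam):
        """pen(theta) = ||theta||_1: soft thresholding at lambda / 2."""
        return np.sign(y) * np.maximum(np.abs(y) - lam / 2, 0.0)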


Bayesian estimators

Likelihood:
$Y_i = f(X_i) + \epsilon_i.$

Common ways of specifying a prior distribution on a set of functions $\mathcal{F}$:
- on coefficients in some (orthonormal) basis, e.g. a wavelet basis;
- directly on $\mathcal{F}$, e.g. in terms of Gaussian processes.

Inference is based on the posterior distribution of $f$ given $Y$:
$p(f \mid Y) = \frac{p(Y \mid f)\, p(f)}{p(Y)}.$

A point summary of the posterior distribution gives $\hat f$ (e.g. posterior mean, median or mode); one can also obtain credibility bands for $f$.


Bayesian projection estimators

Decomposition in some orthonormal basis:
$f(x) = \sum_{k=0}^\infty \theta_k \phi_k(x).$

Likelihood under the (continuous time) white noise model: $Y_k \sim N(\theta_k, \sigma^2/n)$, independent. Under the nonparametric regression model: $y_k \sim N(\theta_k, \sigma^2/n)$, independent.

Prior on the coefficients $\theta$:
$\theta_k \sim p_k(\theta)$, $k = 0, \dots, N$, and $P(\theta_k = 0) = 1$ for $k > N$.
The prior distributions $p_k$ can be determined by an a priori smoothness assumption.

Inference is based on the posterior distribution $\theta \mid y$: $\hat\theta_k$ can be the posterior mean, median, mode, etc.; the posterior also quantifies the variability of $\theta_k$.


Example: posterior mode (MAP) estimator

Suppose we have a Gaussian likelihood, $y_k \sim N(\theta_k, \sigma^2/n)$, and prior densities $\theta_k \sim p_k(\theta)$, $k = 0, \dots, N$. The corresponding posterior density of $\theta$ is
$f(\theta \mid y) \propto \exp\left\{-\sum_{k=0}^N \left[\frac{n}{2\sigma^2}(y_k - \theta_k)^2 - \log p_k(\theta_k)\right]\right\}.$

Posterior mode (MAP) estimator:
$\hat\theta_n^{MAP} = \arg\max_{\theta \in \mathbb{R}^{N+1}} f(\theta \mid y) = \arg\min_{\theta \in \mathbb{R}^{N+1}} \left\{\sum_{k=0}^N (y_k - \theta_k)^2 + \lambda_n\,\mathrm{pen}(\theta)\right\},$
where $\lambda_n = 2\sigma^2/n$ and $\mathrm{pen}(\theta) = -\sum_{k=0}^N \log p_k(\theta_k)$.

For example, for a Gaussian prior $\theta_k \sim N(0, \tau^2)$ iid, $\mathrm{pen}(\theta) = \|\theta\|_2^2 / (2\tau^2)$, which corresponds to the ridge regression estimator; for a double exponential prior $p_k(\theta_k) = \frac{\lambda}{2} e^{-\lambda|\theta_k|}$ iid, $\mathrm{pen}(\theta) = \lambda\|\theta\|_1$, which corresponds to lasso regression.


Choice of prior distribution for Bayesian wavelet estimators

Wavelet decomposition:
$f(x) = \sum_{k=0}^{2^L - 1} \alpha_k \phi_{Lk}(x) + \sum_{j=L}^\infty \sum_{k=0}^{2^j - 1} \theta_{jk} \psi_{jk}(x).$

The wavelet representation of most functions is sparse, which motivates the following prior distribution for the wavelet coefficients:
$\theta_{jk} \sim (1 - \pi_j)\,\delta_0(\theta) + \pi_j\, h_j(\theta),$
where $h_j(\theta)$ is the prior density of the non-zero wavelet coefficients and $\pi_j = P(\theta_{jk} \ne 0)$.

Scaling coefficients: $\alpha_k \propto 1$ - a noninformative prior.
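For a normal $h_j$ with variance $\tau_j^2$, the posterior mean of a single coefficient under this spike-and-slab prior is available in closed form; a sketch (function and argument names are illustrative):

    import numpy as np
    from scipy.stats import norm

    def posterior_mean(y, sigma2, tau2, pi):
        """Posterior mean of theta when y ~ N(theta, sigma2) and
        theta ~ (1 - pi) delta_0 + pi N(0, tau2)."""
        m1 = norm.pdf(y, scale=np.sqrt(sigma2 + tau2))   # marginal under the slab
        m0 = norm.pdf(y, scale=np.sqrt(sigma2))          # marginal under the spike
        p_nonzero = pi * m1 / (pi * m1 + (1 - pi) * m0)  # P(theta != 0 | y)
        return p_nonzero * tau2 / (tau2 + sigma2) * y    # mix of 0 and shrinkage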


Prior distribution of wavelet coefficients

- $h$ normal: Clyde and George (1998), Abramovich, Sapatinas and Silverman (1998), etc.
- $h$ double exponential, $h(x) = \frac{1}{2} e^{-|x|}$: Vidakovic (1998), Clyde and George (1998), Johnstone and Silverman (2005).
- $h$ a t distribution: Bochkina and Sapatinas (2005), Johnstone and Silverman (2005).

What is the corresponding a priori regularity of $f$?


A priori regularity

Studied by Abramovich et al. (1998) for normal $h$ with $\pi_j = \min(1, c_1 2^{-\alpha j})$ and prior variance $\tau_j^2 = c_2 2^{-\beta j}$, where $\alpha, \beta \ge 0$ and $c_1, c_2 > 0$.

Generalised to arbitrary $h$, $\pi_j$ and $\tau_j$ by Bochkina (2002) [PhD thesis, University of Bristol].

Example: $\tau_j^2 = c 2^{-\beta j}$, $\pi_j = \min(1, c 2^{-\alpha j})$.

The expected number of non-zero wavelet coefficients is $EN = \sum_{j = j_0}^\infty 2^j \pi_j$. One can specify $\pi_j$ in such a way that
- $EN = \infty$: $\pi_j = \min(1, C 2^{-\alpha j})$ with $\alpha \le 1$;
- $EN < \infty$: $\pi_j = \min(1, C 2^{-\alpha j})$ with $\alpha > 1$.

In what follows, consider the case $\alpha \in (0, 1]$.


Assumptions on the distribution $H$

Suppose $\xi$ has distribution $H$.

1. $0 < \alpha < 1$.
2. Tail behaviour of $H$:
   (a) polynomial tails: $1 - H(x) + H(-x) = c_l x^{-l}[1 + o(1)]$ as $x \to +\infty$, $l > 0$; if $q < \infty$, assume that $l > q$;
   (b) exponential tails: $1 - H(x) + H(-x) = c_m e^{-(\lambda x)^m}[1 + o(1)]$ as $x \to +\infty$, $m > 0$, $\lambda > 0$, $c_m > 0$.
3. $\alpha = 1$, $1 \le p \le \infty$, $1 \le q < \infty$: assume that $E|\xi|^q < \infty$.
4. $\alpha = 1$, $1 \le p \le \infty$, $q = \infty$: assume that there exists $\epsilon > 0$ such that $E[\log(|\xi|)\, I(|\xi| > \epsilon)] < \infty$.


A priori regularity

Define
$\epsilon_H = \begin{cases} 1/l, & \text{if } H \text{ has polynomial tails and } p = \infty, \\ 0, & \text{otherwise.} \end{cases}$

Theorem 1. Suppose that $\psi$ and $\phi$ are wavelet and scaling functions of regularity $s$, where $0 < r < s$. Consider a function $f$ and its wavelet transform under the assumptions on $H$. Then, for any fixed values of the scaling coefficients $\alpha_k$, $f \in B_{p,q}^r$ almost surely if and only if
either $r + \frac{1}{2} - \frac{\beta}{2} - \frac{\alpha}{p} + \epsilon_H < 0$,
or $r + \frac{1}{2} - \frac{\beta}{2} - \frac{\alpha}{p} + \epsilon_H = 0$ and $0 \dots$


Nonparametric Bayesian estimators

Assume a fixed design (i.e. the $X_i = x_i$ are fixed):
$Y_i = f(x_i) + \epsilon_i, \quad x_i \in [0, 1],$
with $E(\epsilon_i) = 0$ for all $i$.

Prior distribution: $f \sim G$, where $G$ is a probability measure on a set of functions $\mathcal{F}$.


Nonparametric Bayesian estimators: examples

1. $G = GP(m(x), k(x, y))$ - a Gaussian process with mean function $m(x) = Ef(x)$ and covariance function $k(x, y) = \mathrm{Cov}(f(x), f(y))$, symmetric and positive definite.

2. Wavelet dictionary (Abramovich, Sapatinas and Silverman (2000), Bochkina (2002)): model $f$ as
$f(x) = f_0(x) + f_w(x) = \sum_{i=1}^M \nu_i \phi_i(x) + \sum_{\omega \in \Lambda} \theta_\omega \psi_\omega(x),$
where $\psi_\omega(x) = a^{1/2}\psi(a(x - b))$, $\omega = (a, b) \in [a_0, \infty) \times [0, 1]$, $M < \infty$, and the scales $a_i$ of the $\phi_i$ satisfy $a_i < a_0$. Take $\Lambda$ to be a Poisson process on $\mathbb{R}_+ \times [0, 1]$ with intensity $\lambda(a, b) \propto a^{-\delta}$, $\delta > 0$, and $\theta_\omega \mid \omega \sim H(\theta)$ iid.

For Gaussian $H$, Abramovich et al. (2000) give necessary and sufficient conditions for $f \in B_{p,q}^r$ with probability 1; for more general $H$, see Bochkina (2002).


3. Lévy adaptive regression kernels: model
$f(x) = \int g(x, \omega)\, L(d\omega),$
where $L(\cdot)$ is a Lévy random measure:
$L(A) = \sum_{j=1}^N \nu_j I_A(\omega_j),$
with $N \sim \mathrm{Pois}(\nu)$ and $(\nu_j, \omega_j) \sim \pi(d\nu, d\omega)$ iid (C. Tu, M. Clyde, R. Wolpert, 2007).


Nonparametric Bayesian estimators with a Gaussian process prior

Definition 4. A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.

Assume that the observation errors are also Gaussian: $Y_i \sim N(f(x_i), \sigma^2)$, or, in matrix form,
$Y \sim N_n(f, \sigma^2 I_n),$
where $Y = (Y_1, \dots, Y_n)^T$ and $f = (f(x_1), \dots, f(x_n))^T$.

Often, in regression problems, a priori $Ef(x) = m(x) = 0$.

Prior: $f \sim GP(0, k(x, y))$.


Posterior distribution

Then the posterior distribution of $f$ at an arbitrary set of points $x^* = (x_1^*, \dots, x_m^*)^T \in (0, 1)^m$, with $f^* = (f(x_1^*), \dots, f(x_m^*))^T$, is
$f^* \mid Y, x, x^* \sim N_m(\mu^*, \Sigma^*),$
where
$\mu^* = k(x^*, x)[k(x, x) + \sigma^2 I_n]^{-1} Y,$
$\Sigma^* = k(x^*, x^*) - k(x^*, x)[k(x, x) + \sigma^2 I_n]^{-1} k(x, x^*).$

If the posterior mean is used as a point estimator, we have, for any $x^* \in (0, 1)$:
$\hat f(x^*) = E(f(x^*) \mid Y, x) = \sum_{i=1}^n \alpha_i k(x_i, x^*),$
where $\alpha = [k(x, x) + \sigma^2 I_n]^{-1} Y$.

This estimator is linear, and it is a particular case of a kernel estimator. In addition, we have posterior credible bands.
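The formulae above translate directly into a short NumPy sketch; the squared exponential covariance, its length-scale and the simulated data are illustrative choices:

    import numpy as np

    def gp_posterior(x_star, x, Y, k, sigma2):
        """Posterior mean and covariance of f at x_star under f ~ GP(0, k)."""
        Kxx = k(x[:, None], x[None, :]) + sigma2 * np.eye(len(x))
        Ksx = k(x_star[:, None], x[None, :])
        mu = Ksx @ np.linalg.solve(Kxx, Y)     # k(x*,x)[k(x,x)+sigma^2 I]^{-1} Y
        Sigma = (k(x_star[:, None], x_star[None, :])
                 - Ksx @ np.linalg.solve(Kxx, Ksx.T))
        return mu, Sigma

    k = lambda a, b: np.exp(-(a - b)**2 / (2 * 0.1**2))   # squared exponential
    rng = np.random.default_rng(4)
    x = rng.uniform(0, 1, 50)
    Y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=50)
    mu, Sigma = gp_posterior(np.linspace(0, 1, 101), x, Y, k, sigma2=0.01)
    band = 1.96 * np.sqrt(np.diag(Sigma))      # pointwise credible band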


Bayesian nonparametric estimators with Gaussian process prior: smoothness

If we assume $f \sim GP(0, k(x, y))$, then $f \in \mathcal{H}_k$, the Reproducing Kernel Hilbert Space (RKHS) with kernel $k(x, y)$. Hence, the a priori regularity of a GP $f$ is the regularity of the corresponding RKHS $\mathcal{H}$.

Orthogonal basis estimators with basis $\{\phi_i(x)\}$ are also (implicitly) assumed to belong to an RKHS, with reproducing kernel $k(x, y) = \sum_{i=1}^\infty \phi_i(x)\phi_i(y)$.

Connection to splines: if $k(x, y)$ is such that $\|f\|_{\mathcal{H}}^2 = \int [f''(x)]^2\,dx$, the corresponding MAP estimator is a cubic spline. The corresponding kernel is
$k(x, y) = \frac{1}{2}\,|x - y|\,[\min(x, y)]^2 + \frac{1}{3}\,[\min(x, y)]^3.$


Regularity of Gaussian processes

1. Brownian motion: $k(x, y) = \frac{1}{2}[x + y - |x - y|] = \min(x, y)$; $W(t) \in C[0, 1]$, $\|W\|_{\mathcal{H}}^2 = W(0)^2 + \|W'\|_2^2$.
2. Fractional Brownian motion: $k(x, y) = \frac{1}{2}[x^{2\alpha} + y^{2\alpha} - |x - y|^{2\alpha}]$, $\alpha \in (0, 1)$; $\alpha$-smooth.

References:
- Q. Wu, F. Liang, S. Mukherjee and R.L. Wolpert (2007) Characterizing the function space for Bayesian kernel models. Journal of Machine Learning Research.
- A. van der Vaart and J.H. van Zanten (2008) Rates of contraction of posterior distributions based on Gaussian process priors. Annals of Statistics, 36.


Next lecture

Frequentist behaviour of nonparametric estimators:
- consistency of (point) estimators $\hat f_n$;
- concentration of posterior measures.
