
ELSEVIER Journal of Statistical Planning and

Inference 58 (1997) 263-282

journal of statistical planning and inference

Asymptotic behavior of confidence regions in the change-point problem

Michael Baron, Andrew L. Rukhin*

Department of Mathematics and Statistics, UMBC, Baltimore, MD 21228, USA

Received 1 April 1995; revised 15 November 1995

Abstract

The confidence estimation of the change-point is considered. The asymptotic behavior of the coverage probability and the expected width of a traditional confidence region is derived as the threshold constant increases. The nature of this bound is related to information numbers and first passage probabilities for random walks. In the situation when pre-change and after-change distributions belong to a one-parameter exponential family but are otherwise unknown, similar results are obtained for the confidence region based on the maximum likelihood procedure.

AMS classification: primary 62F12; secondary 62E20

Keywords: Change-point problem; Confidence estimation; Credible region; Exponential family; Information numbers; Random walk

1. Introduction

Let F and G be two different probability distributions with densities f and g, and assume that the observed data (X_1, …, X_ν, X_{ν+1}, …, X_n) consist of two independent parts, the first, X_1, …, X_ν, being a random sample from the distribution F, and the second random sample, X_{ν+1}, …, X_n, coming from the distribution G. In other terms ν is the change-point, the parameter of interest, whose estimation was initiated by Chernoff and Zacks (1966), Hinkley (1970) and Cobb (1978). The confidence estimation of the change-point was considered by Worsley (1986) and Siegmund (1988).

We study the decision theoretic aspect of this problem assuming the loss function of the form

w(ν, C) = λ|C| + 1 − I_C(ν). (1)

* Corresponding author. Research supported by NSF Grant #DMS-9124009 and NSA Grant MD904-93-H-3015.

0378-3758/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved. PII S0378-3758(96)00076-6


Here C is the confidence set with |C| elements, I denotes the indicator function and λ is a fixed positive constant. Thus the corresponding risk, λE|C| + P(ν ∉ C), is a combination of the expected cardinality of the confidence region and of the probability of non-coverage. This risk appears also in the minimization problem of the expected width of a confidence interval with a fixed confidence coefficient.

If the distributions F and G are known, the classical credible confidence region for ν has the form

R_c = {k: max_m [Σ_{j=1}^m log f(X_j) + Σ_{j=m+1}^n log g(X_j)] < Σ_{j=1}^k log f(X_j) + Σ_{j=k+1}^n log g(X_j) + c}

for some positive c. As is shown in Section 2, the risk function corresponding to this set possesses the following property: as λ → 0, in order to minimize this risk the threshold c must go to infinity. For this reason we look at the asymptotic behavior when c → ∞ for arbitrary pre-change and after-change distributions. Thus we follow the setting of Siegmund (1988), who derived asymptotic results for large c for normal distributions F and G. From the statistical point of view, small values of λ correspond to the dominating role of the coverage probability.

The paper is organized as follows. Results concerning the asymptotic behavior of the expected width, E|R_c|, and of the coverage probability, P(ν ∈ R_c), for c → ∞ are stated in Section 2. In Section 3 we study the situation when the densities f and g are not completely known, but are assumed to belong to the same one-parameter exponential family. It is shown that the confidence region based on the likelihood ratio test for any fixed (but unknown) values of the nuisance parameters exhibits the same asymptotic behavior as when these parameters are known (and this behavior is described in Section 2). Proofs of all these results are collected in Section 4.

2. Asymptotic behavior of the credible confidence region

To study the confidence estimation problem described in Section 1 we consider the following asymptotic setting, originally suggested by Cobb (1978). Let {X_j, j ∈ Z} be a doubly infinite sequence of independent random variables. Assume that for some integer ν, X_j has density f(x) if j ≤ ν and density g(x) otherwise. Here we assume that f and g are known positive densities.

To relate this problem to the situation described in Section 1, with m = [(n − 1)/2] relabel the original observations as X_{−m}, X_{−m+1}, …, X_{−1}, X_0, X_1, …, X_{[n/2]}. Then as n → ∞ the setting above arises. For practical applications the obtained confidence regions have to be intersected with the set −m ≤ ν ≤ [n/2].


Let z_j = log(f/g)(X_j), and put S_k = Σ_{j=1}^k z_j for k ≥ 1, S_k = −Σ_{j=k+1}^0 z_j for k ≤ −1, and S_0 = 0. Then the credible confidence set introduced in Section 1 has the form

R_c = {k: S_k > max_m S_m − c} (2)

for some positive constant c.
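For a finite stretch of data the set (2) is immediate to compute from the cumulative sums S_k. The sketch below is our own numerical illustration (the function name and indexing are ours, not part of the original derivation); it assumes the log-likelihood ratios z_j have already been evaluated from the known densities f and g.

```python
import numpy as np

def confidence_region(z, c):
    """Credible confidence set (2) for the change-point.

    z : sequence of log-likelihood ratios z_j = log f(X_j)/g(X_j), j = 1..n.
    c : positive threshold.
    Returns the set of indices k in {0, ..., n} with S_k > max_m S_m - c,
    where S_0 = 0 and S_k = z_1 + ... + z_k.
    """
    s = np.concatenate(([0.0], np.cumsum(z)))   # S_0, S_1, ..., S_n
    return set(np.flatnonzero(s > s.max() - c).tolist())
```

For example, z = (1, 1, −1, −1) gives S = (0, 1, 2, 1, 0), so with c = 1.5 the region consists of the indices k with S_k > 0.5, namely {1, 2, 3}. Note that the maximizing index is always covered, since S at the argmax trivially exceeds max − c for any c > 0.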

For j > ν, Ez_j = −ρ_1 < 0, and Ez_j = ρ_2 > 0 for j ≤ ν, with ρ_1, ρ_2 denoting the information numbers, which are assumed to be finite. According to the strong law of large numbers, max_k S_k is finite with probability one and R_c is well defined.

Throughout this paper, if S_1 is an arithmetic random variable with span d, i.e. its support is contained in {…, −2d, −d, 0, d, 2d, …} with d being the largest constant with this property, c will tend to +∞ only through multiples of d.

Theorem 2.1. For c → ∞,

1 − P_ν{ν ∈ R_c} ~ e^{−c} π_1 π_2 (1/ρ_1 + 1/ρ_2), (3)

where

π_1 = exp{−Σ_{k=1}^∞ k^{−1} P_G(S_k > 0)},  π_2 = exp{−Σ_{k=−∞}^{−1} |k|^{−1} P_F(S_k ≥ 0)}, (4)

and ρ_1, ρ_2 are the information numbers.

A similar result for point estimation of the change-point parameter under the zero-one loss function is given in Rukhin (1995).

The next result pertains to the asymptotics of the expected width.

Theorem 2.2. Let F and G satisfy one of the following conditions:

(i) F and G are continuous distributions.
(ii) F and G are mixed type distributions (i.e. they have both discrete and continuous components), and

Σ* e^x P{S_{τ_+} = x} < 1, (5)

where Σ* means the summation over only the discrete values x of S_{τ_+}. Then the average length of R_c for c → ∞ has the form

E|R_c| = c(1/ρ_1 + 1/ρ_2) + b + o(1), (6)

where b is a constant term specified in (26).


According to Theorems 2.1 and 2.2, the risk of the confidence set R_c for the loss function (1) has the asymptotic representation, as c → ∞,

E w(ν, R_c) = (λc + e^{−c} π_1 π_2)(1/ρ_1 + 1/ρ_2) + λ(b + o(1)).

The constant c which minimizes this risk admits the following asymptotic representation for λ → 0:

c ≈ −log λ + log(π_1 π_2).

With this choice of c,

E w(ν, R_c(λ)) = −λ log λ (1/ρ_1 + 1/ρ_2) + λ[(1 + log(π_1 π_2))(ρ_1 + ρ_2)/(ρ_1 ρ_2) + b] + o(λ)

as λ → 0.

If F = N(0, 1) and G = N(Δ, 1), then ρ_1 = ρ_2 = Δ²/2. According to Theorems 2.1 and 2.2, as c → ∞,

E|R_c| ~ 4c/Δ²,  1 − P_ν{ν ∈ R_c} ~ 4 exp{−c − κ}/Δ²,

where κ = 2 Σ_{k=1}^∞ k^{−1} Φ(−Δ√k/2). This agrees with the results of Siegmund (1988).
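The normal case is easy to probe by simulation. The following sketch is our own illustration (the values Δ = 2, c = 4, the window size m and the number of replications are arbitrary choices); it builds the walk from z_j = Δ²/2 − ΔX_j and estimates the coverage probability and the expected width of R_c on a finite window.

```python
import numpy as np

# Monte Carlo for F = N(0,1), G = N(delta,1); illustrative parameters only.
rng = np.random.default_rng(0)
delta, c, m, reps = 2.0, 4.0, 100, 2000
hits, width = 0, 0
for _ in range(reps):
    x = np.concatenate([rng.normal(0.0, 1.0, m),      # X_j ~ F for j <= nu
                        rng.normal(delta, 1.0, m)])   # X_j ~ G for j > nu
    z = delta**2 / 2 - delta * x                      # z_j = log (f/g)(X_j)
    s = np.concatenate([[0.0], np.cumsum(z)])         # S_0, ..., S_2m; nu = m
    region = np.flatnonzero(s > s.max() - c)          # the set (2)
    hits += int(m in region)
    width += len(region)
print(hits / reps, width / reps)   # coverage and mean |R_c|
```

With ρ_1 = ρ_2 = Δ²/2 = 2 here, the mean width should settle near 4c/Δ² = c plus a constant, and the coverage should be close to one.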

As another example consider the situation when F and G are Bernoulli distributions. Then, by a translation invariance property of P_ν{ν ∈ R_c},

P_ν{ν ∈ R_c} = P_0{0 ∈ R_c} = P_F{max_{k≤0} S_k < c} P_G{max_{k≥0} S_k < c} = Q²(c).

Clearly Q(c), the left-continuous version of the distribution function of max_{k≤0} S_k, satisfies the following integral equation:

Q(c) = ∫ P_F{z_1 ∈ du} Q(c − u) = E_F Q(c − z_1) for c ≥ 0. (7)

Also it vanishes for non-positive c and monotonically increases to 1 as c increases to infinity.

The solution to (7), when the probabilities of success for F and G are p, 0 < p < 1/2, and 1 − p respectively, turns out not to depend on the value of p. Indeed z_1 takes the values ζ and −ζ with probabilities p and 1 − p, where ζ = log((1 − p)/p), and max_{k≤0} S_k can take only values which are multiples of ζ. Hence the function Q(c) is constant on any interval (nζ, (n + 1)ζ] with integer n. Put c = nζ and a_n = Q(c). Then a_n ↑ 1 as n → ∞, a_n = 0 for n ≤ 0, and (7) implies the following difference equation for a_n:

a_n = p a_{n−1} + (1 − p) a_{n+1}.

The unique solution has the form a_n = 1 − exp{−nζ}, n > 0, and

Q(c) = a_{c/ζ} = 1 − e^{−c}. (8)


Therefore,

P_ν{ν ∈ R_c} = (1 − e^{−c})², (9)

which, rather surprisingly, does not depend on p. Thus in this example the set R_c has the guaranteed confidence coefficient for all values of p.
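The exact formula (9) invites a quick numerical check. The sketch below is our own illustration (p = 0.3, c = 2ζ and the window of 200 observations on each side are arbitrary choices); the walk is kept in integer units of the span ζ so that the lattice comparison is exact, and truncating the doubly infinite sequence has negligible effect because of the drifts.

```python
import numpy as np

# Monte Carlo check of (9) for Bernoulli F and G; our own illustration.
rng = np.random.default_rng(1)
p, m, reps = 0.3, 200, 5000
n_units = 2                                    # c = 2*zeta, a lattice point
hits = 0
for _ in range(reps):
    pre = rng.random(m) < p                    # X_j ~ Bernoulli(p), j <= nu
    post = rng.random(m) < 1 - p               # X_j ~ Bernoulli(1-p), j > nu
    x = np.concatenate([pre, post])
    steps = np.where(x, -1, 1)                 # z_j in units of zeta
    s = np.concatenate([[0], np.cumsum(steps)])
    hits += int(s[m] > s.max() - n_units)      # event {nu in R_c}
exact = (1 - (p / (1 - p)) ** n_units) ** 2    # formula (9) with c = 2*zeta
print(hits / reps, exact)
```

For p = 0.3 the exact coverage is (1 − (3/7)²)² ≈ 0.666, and the Monte Carlo estimate should agree to within sampling error.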

3. Confidence estimation of the change-point when F and G are unknown

Suppose now that the distributions F and G belong to an exponential family but otherwise are unknown. More precisely, assume that both f and g have the form

f(x | θ) = exp{θx − ψ(θ)} f(x | 0)

with some real θ_1 and θ_2, θ_2 ≠ θ_1 (although our results can be extended to the multivariate case). The function ψ(θ) is known to be an analytic convex function on Θ, which is a subinterval of the real line (see for example Brown, 1986). We assume that the interval [θ_1, θ_2] is contained in Int Θ.

It is also assumed that for some 0 < ω ≤ 1 and l > 0

min{ν, n − ν} ≥ l n^ω. (10)

Thus our problem of interval estimation of ν can be considered as the Bayes estimation problem with the discrete uniform prior on the set defined by (10) and the risk function λE|R| + P{ν ∉ R}, where θ_1 and θ_2 are nuisance parameters.

For any fixed 1 ≤ k ≤ n let θ̂_1 = θ̂_1(k) and θ̂_2 = θ̂_2(k) be the maximum likelihood estimators for the parameters based on X_1, …, X_k and X_{k+1}, …, X_n respectively. Then the Bayes estimator of ν with respect to the zero-one loss and the given prior is

ν̂ = argmax{Λ_k, l n^ω ≤ k ≤ n − l n^ω}, (11)

where

Λ_k = log ∏_{j=1}^k f(X_j | θ̂_1) ∏_{j=k+1}^n f(X_j | θ̂_2). (12)

Define for a positive c

T_c = {k: l n^ω ≤ k ≤ n − l n^ω, Λ_ν̂ − Λ_k < c}.

Our goal is to study the behavior of T_c in terms of the given risk function and to obtain the optimal value of c. The formulas for the asymptotic risk of T_c as n → ∞ follow from the proximity of the set T_c and of the set R_c defined by (2) for known θ_1 and θ_2.
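For a concrete instance of (11), (12) and T_c, take the N(θ, 1) family, where ψ(θ) = θ²/2, the maximum likelihood estimators are the two sample means, and Λ_k reduces, up to a term not depending on k, to k x̄_k²/2 + (n − k) x̄_{kn}²/2. The function below is our own illustrative sketch; `lo` and `hi` play the role of l n^ω and n − l n^ω.

```python
import numpy as np

def mle_region(x, c, lo, hi):
    """Confidence set T_c for a change of mean in the N(theta, 1) family.

    Computes Lambda_k = k*xbar_k^2/2 + (n-k)*xbar_kn^2/2 for lo <= k <= hi
    (the carrier term of (12) does not depend on k and is dropped) and
    returns {k: Lambda_max - Lambda_k < c}.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    ks = np.arange(lo, hi + 1)
    left = np.cumsum(x)[ks - 1]        # X_1 + ... + X_k
    right = x.sum() - left             # X_{k+1} + ... + X_n
    lam = left**2 / (2 * ks) + right**2 / (2 * (n - ks))
    return set(ks[lam > lam.max() - c].tolist())
```

For x = (−1, −1, −1, 1, 1, 1) one finds Λ_3 = 3 maximal, so ν̂ = 3, and with c = 2 the set T_c is {2, 3, 4}.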

Proposition 3.1. One has P{T_c ⊂ [ν − n^α, ν + n^α]} → 1 as n → ∞ for any 0 < α < ω, and

1 − P{T_c ⊂ [ν − n^α, ν + n^α]} = o(e^{−K n^α})

as n → ∞ for some positive constant K.


Proposition 3.2. Let ε_n = r n^{−β} for some 0 ≤ β < ω/2 − α and r > 0. Then

P{max_{[ν−n^α, ν+n^α]} |(Λ_ν − Λ_k) − (S_ν − S_k)| ≥ ε_n} = o(exp{−C n^{(ω−2α−2β)/3}})

as n → ∞ for some C > 0.

As was observed, R_c = {k: S_ν − S_k ≤ c} ⊂ [ν − n^α, ν + n^α] with probability tending to one. Therefore Propositions 3.1 and 3.2 imply the first part of the following result.

Theorem 3.3. Let ε_n = r n^{−β} for some 0 ≤ β < ω/2 and r > 0. Then

P{R_{c−ε_n} ⊂ T_c ⊂ R_{c+ε_n}} → 1 as n → ∞.

The rate of convergence is o(exp{−K n^{(ω−2β)/3}}) for some K > 0. Here c can be any bounded positive function of n.

Theorem 3.3, whose proof along with the proofs of the propositions is given in Section 4, states that the sets T_c and R_c are very close when n is large, so that

lim_{n→∞} P{ν ∈ R_{c−ε_n}} ≤ lim_{n→∞} P{ν ∈ T_c} ≤ lim_{n→∞} P{ν ∈ R_{c+ε_n}}

and

lim_{n→∞} E|R_{c−ε_n}| ≤ lim_{n→∞} E|T_c| ≤ lim_{n→∞} E|R_{c+ε_n}|.

Letting n → ∞ in both inequalities and using Theorems 2.1 and 2.2 one derives the following corollaries.

Corollary 1. As c → +∞,

lim_{n→∞} [1 − P{ν ∈ T_c}] ~ e^{−c} π_1 π_2 (1/ρ_1 + 1/ρ_2),

where π_1 and π_2 are given by (4).

Corollary 2. As c → +∞,

lim_{n→∞} E|T_c| ~ c(1/ρ_1 + 1/ρ_2).

Thus the risk function of T_c behaves asymptotically like that of R_c. Therefore the best choice of c in terms of the Bayes risk when λ → 0 is

c ≈ −log λ + log(π_1 π_2).

Corollary 3. If the distributions F and G do not have atoms, then

lim_{n→∞} P{ν = ν̂} = P{τ_+ = ∞} P{τ'_+ = ∞}, (13)

where τ_+ = inf{k > 0: S_{ν+k} ≥ S_ν} and τ'_+ = inf{k > 0: S_{ν−k} ≥ S_ν}.


To prove (13), note that because of the absence of atoms, ν = ν̂ is almost surely equivalent to Λ_ν = Λ_ν̂, or to the event ν ∈ T_0. Let ε_n = n^{−β} for some 0 ≤ β < ω/2, so that T_0 = ∩_c T_c. According to Theorem 3.3, R_0 ⊂ T_{ε_n} ⊂ R_{2ε_n} with probability tending to one. Hence as n → ∞ the limiting probability P{ν = ν̂} is bounded from below by

P{ν ∈ R_0} = P{S_max = S_ν} = P{τ_+ = ∞} P{τ'_+ = ∞}

and from above by lim_{n→∞} P{ν ∈ R_{2ε_n}} = P{ν ∈ R_0}. Thus (13) holds.

Notice that under condition (10) one can also estimate θ_1 on the basis of the data X_i, 1 ≤ i < l n^ω, and θ_2 on the basis of the data X_j, n − l n^ω < j ≤ n. It is possible to show that the truncated random walk with the parameters so estimated converges to the random walk with known θ_1 and θ_2. In this way another version of the asymptotically optimal confidence region can be obtained. However, the confidence set T_c seems to be more robust to the choice of l and ω.

4. Appendix

In our setting the quantities P_ν{ν ∈ R_c} and E_ν|R_c| do not depend on ν, so we can assume that ν = 0. Then {S_k, k ∈ Z} can be viewed as a combination of two independent random walks: S_k^{(1)} = S_{−k} and S_k^{(2)} = S_k, k ≥ 0. We attach the index j to all probabilities and expected values related to S_k^{(j)}, j = 1, 2.

Let M = max_{−∞<k<∞} S_k and M_j = max_{0≤k<∞} S_k^{(j)} for j = 1, 2, and denote by Q(u), Q_1(u) and Q_2(u) the distributions of M, M_1 and M_2 respectively. Also for j = 1, 2 we introduce the first passage moments

τ^{(j)}(x) = inf{k: S_k^{(j)} ≥ x} for x > 0;  τ^{(j)}(x) = inf{k: S_k^{(j)} ≤ x} for x < 0;

τ_+^{(j)} = inf{k: S_k^{(j)} > 0};  τ_−^{(j)} = inf{k: S_k^{(j)} ≤ 0}.

4.1. Proof of Theorem 2.1

We can think of f and g as imbedded into a one-parameter exponential family of the form

f(x | θ) = exp{θ log(f(x)/g(x)) − ψ(θ)} g(x).

Then exp{ψ(θ)} = ∫ f^θ g^{1−θ}, f(x | 1) = f(x) and f(x | 0) = g(x). Moreover, ψ'(0) = −ρ_2 and ψ'(1) = ρ_1. Since

P{ν ∈ R_c} = P{M_1 < c} P{M_2 < c} = (1 − P_1{τ(c) < ∞})(1 − P_2{τ(c) < ∞}), (14)

we can use formula (8.48) of Siegmund (1985), which shows that for c → ∞

P_1{τ(c) < ∞} ~ e^{−c} P_1{τ_+ = ∞} P_2{τ_− = ∞} / ψ'(1) = e^{−c} π_1 π_2 / ρ_1. (15)


Similar asymptotics obtain for P_2{τ(c) < ∞}, so that Theorem 2.1 follows from (14).

4.2. Proof of Theorem 2.2

In our notation

E|R_c| = ∫_{u≥0} E{|R_c| | M = u} dQ(u)
 = ∫_{u≥0} E{|R_c| | M_1 = u, M_2 ≤ u} Q_2(u) dQ_1(u)
 + ∫_{u≥0} E{|R_c| | M_1 < u, M_2 = u} Q_1(u−) dQ_2(u), (16)

where Q_1(u−) denotes the limit from the left of Q_1(u_1) as u_1 ↑ u. We consider only the first integral on the right-hand side of (16); the second integral can be treated similarly.

The set R_c may be disconnected (i.e. it can have lacunas). Denote the connected components of R_c to the left of the origin by G_0, G_1, …, starting from the origin, and the components to the right of it by H_0, H_1, …. Here we put G_0 = ∅ if 0 ∉ R_c, and H_0 = ∅ if 1 ∉ R_c. Then

|R_c| = Σ_{i=0}^{n_1} |G_i| + Σ_{i=0}^{n_2} |H_i|,

where n_1 and n_2 are the (random) numbers of connected components.

One has for any positive u

E{|R_c| | M_1 = u, M_2 ≤ u}
 = E{|G_0| + Σ_{i=1}^{n_1} |G_i| | M_1 = u ≥ M_2} + E{|H_0| + Σ_{i=1}^{n_2} |H_i| | M_2 ≤ u = M_1}. (17)

Considering separately the four terms that appear in (17), let

G_{u,c} = {k ≤ τ(u − c): S_k^{(1)} > u − c} if u ≤ c; = ∅ otherwise.

Then, under the condition M_1 = u, one has G_0 = G_{u,c} and

E{|G_0| | M_1 = u ≥ M_2} = E_1{|G_{u,c}| | M_1 = u}
 = E_1{|G_{u,c}| | max_{G_{u,c}} S_k = u} P_1{max_{G_{u,c}} S_k = M_1 | M_1 = u}
 + E_1{|G_{u,c}| | max_{G_{u,c}} S_k < u} P_1{max_{G_{u,c}} S_k < M_1 | M_1 = u}, (18)


where for u < c

E_1{|G_{u,c}| | max_{G_{u,c}} S_k = u}
 = E_1{τ(u) | τ(u) < τ(u − c), S_{τ(u)} = u} + E_1{τ(−c) | τ_+ = ∞}. (19)

Lemma 1. Under the hypothesis of Theorem 2.2,

E_1{τ(−c) | τ_+ = ∞} = ρ_1^{−1}(c + E_1 Z_+²/(2ρ_1) − 2E_1 M_1) − E_1 τ(M_1) + o(1) (20)

as c → ∞.

Proof. Consider the queueing process W_k defined as W_0 = 0; W_{k+1} = min{0, W_k − z_{k+1}} for k ≥ 0.

Observe that given τ_+ = ∞, both processes W_k and S_k^{(1)} take only non-positive values. Therefore, under this condition they coincide for all k. Hence, for τ^W(−c) = inf{k: W_k ≤ −c},

E_1{τ(−c) | τ_+ = ∞} = E_1{τ^W(−c) | τ_+ = ∞}. (21)

It is easy to see that W_k = S_k^{(1)} − max_{j≤k} S_j^{(1)}. Then W_{τ(M_1)} = 0, and the process W_k* = W_{k+τ(M_1)} − W_{τ(M_1)} is distributed as W_k conditioned on τ_+ = ∞. Then, since τ^W(−c) = τ(M_1) + τ^{W*}(−c), it follows that

E_1{τ^W(−c) | τ_+ = ∞} = E_1 τ^{W*}(−c) = E_1 τ^W(−c) − E_1 τ(M_1).
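The two elementary identities used here, the recursion for W_k and the representation W_k = S_k^{(1)} − max_{j≤k} S_j^{(1)}, are easy to verify numerically; the following self-contained check is our own, with an arbitrary sequence of increments.

```python
import numpy as np

def queueing(z):
    """The process of the proof: W_0 = 0, W_{k+1} = min(0, W_k - z_{k+1})."""
    w = [0.0]
    for zk in z:
        w.append(min(0.0, w[-1] - zk))
    return np.array(w)

z = np.array([0.5, -1.2, 0.3, 0.9, -0.4, 0.7])   # arbitrary increments
s1 = np.concatenate([[0.0], np.cumsum(-z)])      # walk S^(1) with steps -z_j
# identity W_k = S_k^(1) - max_{j <= k} S_j^(1)
assert np.allclose(queueing(z), s1 - np.maximum.accumulate(s1))
```

In particular W_k ≤ 0 always, and W_k coincides with S_k^{(1)} exactly on paths whose running maximum stays at zero, which is the event {τ_+ = ∞} used in (21).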

The following result of Lotov (1991) can be used to evaluate the asymptotics of E_1 τ^W(−c).

Theorem (Lotov, 1991). Let ξ_1, ξ_2, … be independent and identically distributed random variables with Eξ_1 > 0 and f(λ) = E e^{−λξ_1}. Let S_k = ξ_1 + ⋯ + ξ_k and let Y_n be the corresponding queueing process, Y_0 = 0, Y_{n+1} = max(0, Y_n + ξ_{n+1}). Assume that

(a) |f(λ)| < ∞ for −γ ≤ Re λ ≤ β (γ ≥ 0, β > 0).
(b) f(β) ≥ 1; if f(β) = 1, then also ∫_0^∞ y e^{βy} P{−ξ_1 ∈ dy} < ∞.
(c) r(λ) = 1 − f(λ) for −γ ≤ Re λ ≤ β has only two zeros, λ = 0 and a unique positive root λ = q.

Let η_± = inf{n ≥ 1: S_n ≷ 0} (inf ∅ = ∞), Z_± = S_{η_±}, G_1(y) = P{Z_+ < y, η_+ < ∞}, G_2(y) = P{−Z_− < y}. Also suppose that

(d) ∫_0^∞ e^{qy} dG_1^0(y) < 1 and ∫_0^∞ e^{γy} dG_2^0(y) < 1, where G_j^0 denotes the sum of the discrete and singular components of G_j (j = 1, 2). Then for the stopping time T = τ(b) = min{n ≥ 1: Y_n > b},

E T = (Eξ_1)^{−1} {b + E Z_+²/(2E Z_+) + 2r'_+(0)/r_+(0)} + (q_1 q_2/q) e^{−qb} + o(e^{−γb}) + o(e^{−βb}),


as b → ∞, where r_±(λ) = 1 − E(exp{λZ_±}; η_± < ∞), q_1 = −r_+(0)/(q r'_+(q)), q_2 = r_−(q)/(q r'_−(0)).

We let ξ_j = z_{−j} and b = c. Then, in our notation, W_k = −Y_k, τ^W(−c) = T(b), f(λ) = ∫ f^{1−λ} g^λ, Eξ_1 = ρ_1, q = 1, Z_± = S_{τ_±}, r_+(0) = 1 − P{τ_+ < ∞} = η_1, and r'_+(0) = −E{S_{τ_+}; τ_+ < ∞}. Conditions (a) and (c) hold for γ = 0 and β = 1. Also, for β = 1, ∫_0^∞ y e^{βy} P{−ξ_1 ∈ dy} ≤ E(−z_{−1}) e^{−z_{−1}} = E_G log(g/f)(X) = ρ_2 < ∞, so that (b) holds. Clearly, with γ = 0, ∫_0^∞ e^{γy} dG_2^0(y) < 1 for continuous and mixed type distributions. The inequality ∫_0^∞ e^{qy} dG_1^0(y) < 1 is trivial for continuous distributions, and it is guaranteed by (5) for mixed type F and G. Thus (d) holds.

Hence all the conditions of Lotov's Theorem are met, and therefore

E_1 τ^W(−c) = ρ_1^{−1}(c + β_1) + o(1), (22)

E_1{S_{τ_+} | τ_+ < ∞} = E T_1 = E_1 M_1/E q = η_1 E_1 M_1,

and (20) follows from (21) and (22).

Since P_1{τ(u) < τ(u − c) | M_1 = u} → 1 as c → ∞, formulas (19) and (20) yield

E_1{|G_{u,c}| | max_{G_{u,c}} S_k = u}
 = E_1{τ(u) | τ(u) < ∞, S_{τ(u)} = u} + (c + β_1)/ρ_1 − E_1 τ(M) + o(1) (23)

as c → ∞, where β_1 = E_1 Z_+²/(2ρ_1) − 2E_1{S_{τ_+} | τ_+ < ∞}/η_1.

Our next step is to show that only the first term in (18) matters.

Lemma 2. For any u from the support of Q_1,

P_1{max_{G_{u,c}} S_k < M_1 | M_1 = u} = O(e^{−c}) (24)

as c → ∞.

Next, we note that M_1 = Σ_{k=1}^q T_k, where T_1, …, T_q are consecutive ascending ladder heights of the random walk S^{(1)}. The common distribution of T_k coincides with the conditional distribution of S_{τ_+} given τ_+ < ∞, and q is an independent geometric random variable with parameter P_1{τ_+ = ∞} = η_1. Hence E T_1 = E_1 M_1/E q = η_1 E_1 M_1.


Proof. In terms of the first passage time,

P_1{max_{G_{u,c}} S_k^{(1)} < M_1 | M_1 = u} = P_1{τ(u − c) < τ(u) | M_1 = u}
 = P_1{τ(u − c) < τ(u) | τ(u) < ∞} = P_1{τ(u − c) < τ(u) < ∞} / P_1{τ(u) < ∞}.

Consider the random walk S_k* = S_{k+τ(u−c)}^{(1)} − S_{τ(u−c)}^{(1)} for k > 0. For any c the moment τ(u − c) is a Markov stopping time. Hence by the strong Markov property, the process S_k* is independent of S_{τ(u−c)}^{(1)} and has the same distribution as S_k^{(1)}. The first passage time τ_c* = inf{k ≥ 1: S_k* ≥ c}, which is a function of S_k*, is identically distributed with τ^{(1)}(c). According to (15),

P_1{τ(u − c) < τ(u) < ∞} ≤ P_1{τ_c* < ∞} ~ e^{−c} π_1 π_2 / ρ_1

as c → ∞, which implies (24).

Using the notation I{A} for the indicator of the event A, notice that max_{G_{u,c}} S_k^{(1)} and

|G_{u,c}| = Σ_{k=0}^{τ(u−c)} I{S_k^{(1)} > u − c}

are positively associated random variables (see Lehmann, 1966). Indeed, they are non-decreasing functions of S_1^{(1)}, S_2^{(1)}, …, and therefore are non-decreasing functions of the i.i.d. random variables z_1, z_2, …. This property implies that

E_1{|G_{u,c}| | max_{G_{u,c}} S_k < u} ≤ E_1{|G_{u,c}| | max_{G_{u,c}} S_k = u}.

The second term of (18) tends to 0, because it is bounded by a product of a linearly increasing and an exponentially decreasing function of c. Also, according to Lemma 2,

P_1{max_{G_{u,c}} S_k^{(1)} = M_1 | M_1 = u} = 1 + O(e^{−c}) as c → ∞.

One obtains from (18) and (23)

E{|G_0| | M_1 = u ≥ M_2}
 = E_1{τ(u) | τ(u) < ∞, S_{τ(u)} = u} + (c + β_1)/ρ_1 − E_1 τ(M) + o(1), c → ∞.

We show now that as c → ∞, E_1{Σ_{i=1}^{n_1} |G_i| | M_1 = u} has a finite limit, independent of u. It follows from (24) that all moments k defined by S_k^{(1)} = M_1 do not belong to ∪_{i≥1} G_i with probability tending to 1. Hence the limiting conditional distribution of |∪_{i≥1} G_i| given M_1 = u coincides with its limiting unconditional distribution. In fact it is determined by the limiting distribution of c + S_{τ(−c)}^{(1)}, whose existence is discussed in Siegmund (1985, Ch. VIII).


Conditioned on M_1 = u, the moment τ(M_1 − c) is a Markov stopping time. Hence the processes S_k* and S_k^{(1)} have the same conditional distribution. Also

E_1{Σ_{i=1}^{n_1} |G_i| | M_1 = u} → E_1|G_+|.

Here G_+ = {k: S_k^{(1)} > y} and y denotes a random variable independent of S_k, whose distribution is the limiting distribution of c + S_{τ(−c)}^{(1)}.

Consider now the last term of (17). As above,

E_2{Σ_{i=1}^{n_2} |H_i| | M_2 ≤ u = M_1} → E_2|H_+|,

where |H_+| is defined similarly to |G_+|. Thus only the asymptotic distribution of |H_0| depends on c. Applying Lemma 1 to S_k^{(2)} and τ(−v + u − c), one has as c → ∞

E_2{|H_0| | M_2 = v ≤ M_1 = u}
 = E_2{τ(v) | τ(v) < τ(u − c), S_{τ(v)} = v} + E_2{τ(−v + u − c) | τ_+ = ∞} − 1
 = E_2{τ(v) | τ(v) < ∞, S_{τ(v)} = v} + (c + v − u + β_2)/ρ_2 − E_2 τ(M) − 1 + o(1), (25)

where β_2 is defined similarly to β_1. Thus (17), (23) and (25) yield

E{|R_c| | M_1 = u ≥ M_2 = v} = c/ρ_1 + c/ρ_2 + w_1(u, v) + o(1),

where

w_1(u, v) = E_1{τ(u) | τ(u) < ∞, S_{τ(u)} = u} + E_2{τ(v) | τ(v) < ∞, S_{τ(v)} = v}
 + β_1/ρ_1 − (u − v − β_2)/ρ_2 − E_1 τ(M) − E_2 τ(M) + E_1|G_+| + E_2|H_+| − 1.

A similar formula holds for E{|R_c| | M_2 = u > M_1 = v}, with w_2(u, v) defined similarly to w_1(u, v).

Next, we note that

(i) ∫_{u≥0} ∫_{v≤u} dQ_2(v) dQ_1(u) + ∫_{u≥0} ∫_{v<u} dQ_1(v) dQ_2(u) = ∫_{u≥0} dQ(u) = 1,
(ii) ∫_{u≥0} E_j{τ(u) | τ(u) < ∞, S_{τ(u)} = u} dQ_j(u) = E_j τ(M) for j = 1, 2,
(iii) ∫_{u≥0} ∫_{v≤u} (u − v) dQ_2(v) dQ_1(u) = E(M_1 − M_2)^+,
(iv) M_1 + (M_2 − M_1)^+ = M_2 + (M_1 − M_2)^+ = max{M_1, M_2} = M,

where x^+ = max{0, x}. It follows from (16) that

E|R_c| = ∫_{u≥0} ∫_{v≤u} E{|R_c| | M_1 = u ≥ M_2 = v} dQ_2(v) dQ_1(u)
 + ∫_{u≥0} ∫_{v<u} E{|R_c| | M_2 = u > M_1 = v} dQ_1(v) dQ_2(u)
 = c(1/ρ_1 + 1/ρ_2) + b + o(1)


as c → ∞. Here

b = [E_1 Z_+²/(2ρ_1) − E(M + M_1)]/ρ_1 + [E_2 Z_+²/(2ρ_2) − E(M + M_2)]/ρ_2 + E_1|G_+| + E_2|H_+| − 1. (26)

By using this approach one can obtain a limiting expression for the probability of R_c being a connected set. Indeed,

P{R_c is connected} = P{R_c = G_0 ∪ H_0} = P{G_1 = ∅, H_1 = ∅}.

The distributions of |G_1| and |H_1| are completely determined by the 'overshoot' c + S_{τ(−c)}^{(j)}, j = 1, 2. If, as in the proof of Theorem 2.2, random variables y_j, j = 1, 2, have the distribution coinciding with the limiting distribution of c + S_{τ(−c)}^{(j)}, and F_j denotes the distribution of M_j = max_k S_k^{(j)}, then as c → ∞

P{G_1 = ∅} → P{τ(|y_1|) = ∞} = P{M_1 < |y_1|} = E F_1(|y_1|)

and P{H_1 = ∅} → E F_2(|y_2|). By independence,

P{R_c is connected} → E F_1(|y_1|) E F_2(|y_2|).

4.3. Proof of Proposition 3.1

Let H(x) = sup_θ {θx − ψ(θ)} be the Legendre transform of the function ψ. It follows from (12) and the definition of θ̂_1 and θ̂_2 that

Λ_k = k(θ̂_1 x̄_k − ψ(θ̂_1)) + (n − k)(θ̂_2 x̄_{kn} − ψ(θ̂_2)) = k H(x̄_k) + (n − k) H(x̄_{kn}). (27)

Here and later we use the notation

x̄_{kj} = (X_{k+1} + ⋯ + X_j)/(j − k),  x̄_k = (X_1 + ⋯ + X_k)/k.

The maximum likelihood estimators θ̂_1 and θ̂_2, which maximize the function θx − ψ(θ) for x = x̄_k and x = x̄_{kn}, are the roots of the equation ψ'(θ) = x for these values of x. Since ψ' is an increasing function of θ, we can introduce h(x), the inverse function of ψ'(θ). Then θ̂_1 = h(x̄_k), θ̂_2 = h(x̄_{kn}) and H(x) = x h(x) − ψ(h(x)), so that H'(x) = h(x). Also θ_1 = h(μ_1) and θ_2 = h(μ_2), where μ_j = E_{θ_j} X.

Lemma 3. Let ξ_1, ξ_2, …, ξ_k be i.i.d. random variables with density f(x | θ) for some θ ∈ Int Θ and μ = Eξ_1. Let A be a closed subset of Θ containing θ in its interior and

m = min{1/ψ''(t), t ∈ A} > 0.


Then for any ε > 0 such that h([μ − ε, μ + ε]) ⊂ A,

P{|ξ̄ − μ| ≥ ε} ≤ 2 e^{−mε²k/2}. (28)

Proof. Since E e^{λξ_1} = exp{ψ(θ + λ) − ψ(θ)}, the classical results from the theory of large deviations (see e.g. Bahadur, 1971) imply that

P{|ξ̄ − μ| ≥ ε} ≤ 2 exp{−k min_{t=±ε} G(μ + t)}, (29)

where G(a) = sup_λ (aλ − log E e^{λξ_1}) = H(a) − aθ + ψ(θ). Let φ(t) = G(μ + t). Then φ(0) = 0, φ'(0) = 0 and φ''(t) = h'(μ + t) = 1/ψ''(h(μ + t)) > 0. Hence for some t_1 with |t_1| ≤ ε,

φ(±ε) = ε² φ''(t_1)/2 ≥ ε² min_{|t|≤ε} φ''(t)/2 ≥ ε² m/2. □

To apply this lemma to our situation, let A be any closed subinterval of Θ which contains θ_1 and θ_2 in its interior, and let m = min{1/ψ''(t), t ∈ A} = min{h'(x), x ∈ ψ'(A)}. Also put M = max{1/ψ''(t), t ∈ A} = max{h'(x), x ∈ ψ'(A)} < ∞.

We proceed with the proof of Proposition 3.1. Consider only the case l n^ω ≤ k ≤ ν − n^α (which is trivial if l n^ω > ν − n^α). Let us show first that the order of Λ_ν − Λ_k is at least n^α as n → ∞ with probability tending to 1. Using (27) one obtains

Λ_ν − Λ_k = ν H(x̄_ν) − k H(x̄_k) + (n − ν) H(x̄_{νn}) − (n − k) H(x̄_{kn})
 = (ν − k) H(μ_1) + (n − ν) H(μ_2) − (n − k) H(μ̄) + r_k
 = {t H(μ_1) + (1 − t) H(μ_2) − H(t μ_1 + (1 − t) μ_2)}(n − k) + r_k, (30)

where t = (ν − k)/(n − k) and μ̄ = E x̄_{kn} = t μ_1 + (1 − t) μ_2.

Let g(u) = u H(μ_1) + (1 − u) H(μ_2) − H(u μ_1 + (1 − u) μ_2). Since H is convex, one has g(u) ≥ K_1 min{u, 1 − u} for some K_1 > 0 and all 0 ≤ u ≤ 1. Indeed, it suffices to show that g(u)/u and g(u)/(1 − u) have strictly positive limits as u ↓ 0 and u ↑ 1 respectively. One has

lim_{u↓0} g(u)/u = g'(0) = H(μ_1) − H(μ_2) − H'(μ_2)(μ_1 − μ_2) = H''(μ̃)(μ_1 − μ_2)²/2 > 0,

where μ̃ is some number between μ_1 and μ_2. Similarly, lim_{u↑1} g(u)/(1 − u) = H''(μ̃')(μ_2 − μ_1)²/2 > 0 for some μ̃' between μ_1 and μ_2. Hence, for n such that n^α < l n^ω,

Λ_ν − Λ_k ≥ K_1 min{t, 1 − t}(n − k) + r_k = K_1(ν − k) + r_k. (31)


Consider the remainder r_k. Since H'(x) = h(x), one derives from (30)

r_k = ν{H(x̄_ν) − H(μ_1)} − k{H(x̄_k) − H(μ_1)} + (n − ν){H(x̄_{νn}) − H(μ_2)} − (n − k){H(x̄_{kn}) − H(μ̄)}
 = ν h(μ_1)(x̄_ν − μ_1) − k h(μ_1)(x̄_k − μ_1) + (n − ν) h(μ_2)(x̄_{νn} − μ_2) − (n − k) h(μ̄)(x̄_{kn} − μ̄) + r_k^{(2)}. (32)

The error term r_k^{(2)} is estimated below.

As was observed, h(μ_j) = θ_j. Also x̄_ν = [(ν − k) x̄_{kν} + k x̄_k]/ν and x̄_{kn} = [(ν − k) x̄_{kν} + (n − ν) x̄_{νn}]/(n − k). Therefore

r_k = (ν − k) θ_1 (x̄_{kν} − μ_1) + (n − ν) θ_2 (x̄_{νn} − μ_2)
 − h(μ̄)[(ν − k)(x̄_{kν} − μ_1) + (n − ν)(x̄_{νn} − μ_2)] + r_k^{(2)}
 = (ν − k)(θ_1 − h(μ̄))(x̄_{kν} − μ_1) + (n − ν)(θ_2 − h(μ̄))(x̄_{νn} − μ_2) + r_k^{(2)}. (33)

In fact the second term of (33) also has order ν − k. Indeed, the Taylor expansion of h(μ̄) shows that

|(n − ν)(θ_2 − h(μ̄))| = |(n − ν) h'(μ̃_1)(μ_2 − μ̄)| = (ν − k) [(n − ν)/(n − k)] |h'(μ̃_1)| |μ_1 − μ_2|
 ≤ (ν − k)|μ_1 − μ_2| max_{ψ'(A)} h' = (ν − k) M |μ_1 − μ_2|

with μ̃_1 between μ̄ and μ_2. Since h(x) is an increasing function, h(μ̃_1) ∈ A. Similarly |θ_1 − h(μ̄)| ≤ M|μ_1 − μ_2|. Using Lemma 3 with ε = K_1/(6M|μ_1 − μ_2|) = K_2, one obtains for sufficiently large n

P{|r_k − r_k^{(2)}| ≥ K_1(ν − k)/3} ≤ 4 exp{−m K_2² n^α/2}. (34)

The error term of (33) can also be estimated using Lemma 3:

r_k^{(2)} = [ν h'(ξ_2)(x̄_ν − μ_1)² − k h'(ξ_3)(x̄_k − μ_1)² + (n − ν) h'(ξ_4)(x̄_{νn} − μ_2)² − (n − k) h'(ξ_5)(x̄_{kn} − μ̄)²]/2,

where ξ_2, ξ_3, ξ_4 and ξ_5 are points in the corresponding intervals. Let us estimate the probability that all of them belong to the interval ψ'(A) = h^{−1}(A).


Denote by δ_j the distance from μ_j to the boundary of ψ'(A) for j = 1, 2 and let δ = min(δ_1, δ_2). Then

P{ξ_2 ∉ ψ'(A)} ≤ P{|ξ_2 − μ_1| ≥ δ} ≤ P{|x̄_ν − μ_1| ≥ δ} ≤ 2 exp{−mδ²ν/2}. (35)

The probabilities involving ξ_3, ξ_4 and ξ_5 can be estimated similarly. So, if γ = min(ν, k, n − ν, n − k) = min(k, n − ν) ≥ l n^ω, then

|r_k^{(2)}| ≤ (M/2) |ν(x̄_ν − μ_1)² − k(x̄_k − μ_1)² + (n − ν)(x̄_{νn} − μ_2)² − (n − k)(x̄_{kn} − μ̄)²| (36)

with probability greater than 1 − 8 exp{−mδ²γ/2}. Applying Lemma 3 with K_3 = K_1/(6M), we get

P{ν(x̄_ν − μ_1)² ≥ K_3(ν − k)} = P{|x̄_ν − μ_1| ≥ √(K_3(ν − k)/ν)} ≤ 2 e^{−mK_3 n^α/2}.

The same upper bound can be obtained for the other terms of (36). Thus

P{|r_k^{(2)}| ≥ K_1(ν − k)/3} ≤ 8[exp{−mδ²γ/2} + exp{−mK_3 n^α/2}].

By combining this result with (34), one obtains for $n \to \infty$
$$
P\bigl\{|r_k| \ge 2K_1(v-k)/3\bigr\} \le 12\exp\{-K_4 n^{\alpha}\} + o(\exp\{-K_4 n^{\alpha}\}),\tag{37}
$$
where $K_4 = m\min(K_2^2, K_3)/2$. This estimate and (31) show that, outside the event in (37),
$$
\Delta_v - \Delta_k \ge K_1(v-k)/3 \ge K_1 n^{\alpha}/3.\tag{38}
$$
By the definition of $\hat v$, the event (38) implies that $\Delta_{\hat v} - \Delta_k \ge K_1 n^{\alpha}/3$, which shows that $k \notin T_c$ for any $n > (3c/K_1)^{1/\alpha}$. Thus as $n \to \infty$, for $K < K_4$,
$$
P\bigl\{[n^{\omega}, v - n^{\alpha}] \cap T_c \ne \emptyset\bigr\} \le 12\,v\exp\{-K_4 n^{\alpha}\}(1 + o(1)) = o(\exp\{-Kn^{\alpha}\}).
$$
By relabelling the data from $X_n$ to $X_1$ one obtains a similar bound for $v + n^{\alpha} \le k \le n - n^{\omega}$,
$$
P\bigl\{[v + n^{\alpha}, n - n^{\omega}] \cap T_c \ne \emptyset\bigr\} = o(e^{-K'n^{\alpha}}) \quad\text{as } n \to \infty
$$
with some positive constant $K'$. These two results prove the proposition.

4.4. Proof of Proposition 3.2

Assume that $v - n^{\alpha} < k \le v$ and consider the function $q(\theta, x) = \log f(x|\theta) = \theta x - \psi(\theta)$, for which $H(x) = q(h(x), x)$. By the definition of $S_k$,
$$
S_v - S_k = v\,q(\theta_1, \bar x_v) - k\,q(\theta_1, \bar x_k) + (n-v)\,q(\theta_2, \bar x_{vn}) - (n-k)\,q(\theta_2, \bar x_{kn}),
$$


and (30) implies that
$$
\begin{aligned}
(\Delta_v - \Delta_k) - (S_v - S_k) &= v[H(\bar x_v) - q(\theta_1, \bar x_v)] - k[H(\bar x_k) - q(\theta_1, \bar x_k)]\\
&\quad + (n-v)[H(\bar x_{vn}) - q(\theta_2, \bar x_{vn})] - (n-k)[H(\bar x_{kn}) - q(\theta_2, \bar x_{kn})]\\
&= A_1 + A_2 + A_3,
\end{aligned}\tag{39}
$$
where
$$
\begin{aligned}
A_1 &= (v-k)\bigl\{[H(\bar x_v) - q(\theta_1, \bar x_v)] - [H(\bar x_{vn}) - q(\theta_2, \bar x_{vn})]\bigr\},\\
A_2 &= k\bigl\{H(\bar x_v) - H(\bar x_k) - [q(\theta_1, \bar x_v) - q(\theta_1, \bar x_k)]\bigr\},\\
A_3 &= (n-k)\bigl\{H(\bar x_{vn}) - H(\bar x_{kn}) - [q(\theta_2, \bar x_{vn}) - q(\theta_2, \bar x_{kn})]\bigr\}.
\end{aligned}
$$

Consider these terms separately. The Taylor expansion of the function $q(\theta, x)$ in the first variable gives
$$
H(\bar x_v) - q(\theta_1, \bar x_v) = q(h(\bar x_v), \bar x_v) - q(\theta_1, \bar x_v)
= \frac{\partial q}{\partial\theta}(\theta_1, \bar x_v)\,\bigl(h(\bar x_v) - \theta_1\bigr) + \frac{1}{2}\,\frac{\partial^2 q}{\partial\theta^2}(\bar\xi_6, \bar x_v)\,\bigl(h(\bar x_v) - \theta_1\bigr)^2
$$
for some $\bar\xi_6$ between $\theta_1$ and $h(\bar x_v)$. Here $(\partial q/\partial\theta)(\theta_1, \bar x_v) = \bar x_v - \psi'(\theta_1) = \bar x_v - \mu_1$ and $(\partial^2 q/\partial\theta^2)(\bar\xi_6, \bar x_v) = -\psi''(\bar\xi_6)$. The Taylor formula for $h(x)$ shows that
$$
H(\bar x_v) - q(\theta_1, \bar x_v) = h'(\bar\xi_7)(\bar x_v - \mu_1)^2 - \psi''(\bar\xi_6)\bigl\{h'(\bar\xi_7)(\bar x_v - \mu_1)\bigr\}^2/2
= h'(\bar\xi_7)\bigl(1 - \psi''(\bar\xi_6)h'(\bar\xi_7)/2\bigr)(\bar x_v - \mu_1)^2,
$$
and a similar expression holds for $H(\bar x_{vn}) - q(\theta_2, \bar x_{vn})$.

As in the proof of Proposition 3.1, $\bar\xi_6 \in A$ and $\bar\xi_7 \in \psi'(A)$ except on an event of probability $o(\exp\{-Cn^{\omega}\})$ for some $C > 0$. Then, except on an event of the same probability order,
$$
|A_1| \le (v-k)M(1 + M/2m)\bigl\{(\bar x_v - \mu_1)^2 + (\bar x_{vn} - \mu_2)^2\bigr\} \le 2n^{\alpha}M(1 + M/2m)\max\bigl[(\bar x_v - \mu_1)^2,\,(\bar x_{vn} - \mu_2)^2\bigr].
$$
Hence by Lemma 3
$$
P\{|A_1| \ge \varepsilon_n/3\} \le P\Bigl\{\max\bigl[(\bar x_v - \mu_1)^2,\,(\bar x_{vn} - \mu_2)^2\bigr] \ge \frac{\varepsilon_n}{6n^{\alpha}M(1 + M/2m)}\Bigr\} = o(\exp\{-Cn^{\omega-\alpha-\beta}\})
$$
as $n \to \infty$ for some $C > 0$.

The second term of (39) can be estimated as follows:
$$
\begin{aligned}
A_2 &= k\bigl\{H(\bar x_v) - H(\bar x_k) - \theta_1(\bar x_v - \bar x_k)\bigr\}\\
&= k\bigl\{h(\bar x_k)(\bar x_v - \bar x_k) + h'(\bar\xi_8)(\bar x_v - \bar x_k)^2/2 - \theta_1(\bar x_v - \bar x_k)\bigr\}\\
&= k(\bar x_v - \bar x_k)\bigl\{h'(\bar\xi_8)(\bar x_v - \bar x_k)/2 + h'(\bar\xi_9)(\bar x_k - \mu_1)\bigr\}.
\end{aligned}\tag{40}
$$


Here, similarly to (35), with probability $1 - o(e^{-Cn^{\omega}})$ the intermediate values $\bar\xi_8$ and $\bar\xi_9$ belong to $\psi'(A)$, and
$$
|A_2| \le 1.5\,n^{\alpha}M\,|\bar x_v - \bar x_k|\,\max\bigl(|\bar x_v - \bar x_k|,\,|\bar x_k - \mu_1|\bigr).\tag{41}
$$
Let $0 < \sigma < \omega - \alpha - \beta$. By Lemma 3
$$
P\Bigl\{\max\bigl(|\bar x_v - \bar x_k|,\,|\bar x_k - \mu_1|\bigr) \ge \frac{\varepsilon_n}{4.5\,Mn^{\alpha+\sigma}}\Bigr\} = o(\exp\{-Cn^{\omega-2(\alpha+\beta+\sigma)}\})
$$
as $n \to \infty$ for some positive $C$. Also

$$
P\{|\bar x_{kv} - \bar x_k| \ge n^{\sigma}\} \le P\{|\bar x_{kv}| \ge n^{\sigma}/2\} + P\{|\bar x_k| \ge n^{\sigma}/2\} \le 2P_{\theta_1}\{|X_1| \ge n^{\sigma}/2\},
$$
which is $o(\exp\{-Cn^{\sigma}\})$ for some $C$. Indeed, for sufficiently small $\lambda$
$$
E_{\theta_1} e^{\lambda|X_1|} \le E_{\theta_1} e^{\lambda X_1} + E_{\theta_1} e^{-\lambda X_1} = e^{-\psi(\theta_1)}\bigl[e^{\psi(\theta_1+\lambda)} + e^{\psi(\theta_1-\lambda)}\bigr] < \infty
$$
and
$$
P_{\theta_1}\{|X_1| \ge n^{\sigma}/2\} = \int_{|x| \ge n^{\sigma}/2} f(x|\theta_1)\,d\mu(x)
\le \exp\{-\lambda n^{\sigma}/2\}\int_{|x| \ge n^{\sigma}/2} e^{\lambda|x|} f(x|\theta_1)\,d\mu(x)
\le E_{\theta_1} e^{\lambda|X_1|}\,\exp\{-\lambda n^{\sigma}/2\}.
$$
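The exponential Markov bound in the last display can be checked numerically. Below is a minimal Python sketch for the standard normal case ($\theta_1 = 0$, $\psi(\theta) = \theta^2/2$), where both the tail probability and the moment generating function of $|X_1|$ have closed forms; the helper names are ours, not the paper's.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def abs_mgf(lam: float) -> float:
    """E exp(lam*|X|) for X ~ N(0,1): equals 2*exp(lam^2/2)*Phi(lam)."""
    return 2.0 * math.exp(lam * lam / 2.0) * normal_cdf(lam)

def tail(t: float) -> float:
    """P(|X| >= t) for X ~ N(0,1)."""
    return 2.0 * (1.0 - normal_cdf(t))

lam = 0.5
for t in [1.0, 2.0, 4.0, 8.0]:
    # Markov bound: P(|X| >= t) <= E exp(lam*|X|) * exp(-lam*t)
    bound = abs_mgf(lam) * math.exp(-lam * t)
    assert tail(t) <= bound, (t, tail(t), bound)
```

Since the inequality is an instance of Markov's inequality applied to $e^{\lambda|X_1|}$, the assertions hold for every $t$ and every $\lambda > 0$.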

Hence, for any $0 < \sigma < \omega/2 - \alpha - \beta$,
$$
P\{|A_2| \ge \varepsilon_n/3\} = o\bigl(\exp\{-Cn^{\omega-2(\alpha+\beta+\sigma)}\} + \exp\{-Cn^{\sigma}\}\bigr).
$$
We choose $\sigma = (\omega - 2\alpha - 2\beta)/3$, in which case
$$
P\{|A_2| \ge \varepsilon_n/3\} = o(\exp\{-Cn^{(\omega-2\alpha-2\beta)/3}\}).
$$
The same estimate holds for $P\{|A_3| \ge \varepsilon_n/3\}$. Finally, by (39) and (40)
$$
P\bigl\{|(\Delta_v - \Delta_k) - (S_v - S_k)| \ge \varepsilon_n\bigr\} = o(\exp\{-Cn^{(\omega-2\alpha-2\beta)/3}\}).
$$
The case $v < k < v + n^{\alpha}$ can be treated similarly, and one has
$$
P\Bigl\{\max_{[v-n^{\alpha},\,v+n^{\alpha}]} |(\Delta_v - \Delta_k) - (S_v - S_k)| \ge \varepsilon_n\Bigr\}
\le \sum_{k=v-n^{\alpha}}^{v+n^{\alpha}} P\bigl\{|(\Delta_v - \Delta_k) - (S_v - S_k)| \ge \varepsilon_n\bigr\} = o(\exp\{-Cn^{(\omega-2\alpha-2\beta)/3}\}).
$$

4.5. Proof of Theorem 3.3

Let $\alpha$ be an arbitrary number in $(0, \omega/2 - \beta)$. Then for any $k \le v - n^{\alpha}$,
$$
S_v - S_k = \sum_{j=k+1}^{v} \log\frac{f(x_j|\theta_1)}{f(x_j|\theta_2)} = (v-k)\bigl\{\rho_1 + (\theta_1 - \theta_2)(\bar x_{kv} - \mu_1)\bigr\}.
$$
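This representation of $S_v - S_k$ as a linear function of the block mean $\bar x_{kv}$ can be verified numerically. A minimal Python sketch for the unit-variance normal family ($\psi(\theta) = \theta^2/2$, $\mu = \theta$, $\rho_1 = (\theta_1 - \theta_2)^2/2$); all numerical values are hypothetical:

```python
import random

random.seed(1)

# Unit-variance normal family N(theta, 1): q(theta, x) = theta*x - theta**2/2
# up to a carrier term that cancels in likelihood ratios; mu = psi'(theta) = theta,
# and rho_1 = (theta1 - theta2)**2 / 2 is the Kullback-Leibler information number.
theta1, theta2 = 0.0, 1.5
n, v = 60, 25                       # sample size and true change-point (hypothetical)
x = [random.gauss(theta1, 1) for _ in range(v)] + \
    [random.gauss(theta2, 1) for _ in range(n - v)]

def log_ratio(xi):
    """log f(x|theta1) - log f(x|theta2); the carrier terms cancel."""
    return (theta1 - theta2) * xi - (theta1**2 - theta2**2) / 2.0

rho1 = (theta1 - theta2)**2 / 2.0
mu1 = theta1

for k in range(v):                  # x[:k] is the first block X_1..X_k
    lhs = sum(log_ratio(xi) for xi in x[k:v])                      # S_v - S_k
    xbar_kv = sum(x[k:v]) / (v - k)                                # mean of X_{k+1}..X_v
    rhs = (v - k) * (rho1 + (theta1 - theta2) * (xbar_kv - mu1))
    assert abs(lhs - rhs) < 1e-9, (k, lhs, rhs)
```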


It follows from Lemma 3 that
$$
1 - P\Bigl\{|\bar x_{kv} - \mu_1| \le \frac{\rho_1}{2|\theta_1 - \theta_2|}\Bigr\} = o(e^{-Cn^{\alpha}})
$$
for some positive $C$. Hence $\max_m S_m - S_k \ge S_v - S_k \ge \rho_1 n^{\alpha}/2 > c$ with probability $1 - o(e^{-Cn^{\alpha}})$ for any $n > (2c/\rho_1)^{1/\alpha}$, which implies $k \notin R_c$. The case $k \ge v + n^{\alpha}$ is similar. Thus
$$
P\bigl\{R_c \not\subset [v - n^{\alpha}, v + n^{\alpha}]\bigr\} = P\Bigl\{\min_{k \notin [v-n^{\alpha},\,v+n^{\alpha}]}\bigl(\max_m S_m - S_k\bigr) \le c\Bigr\} = o(e^{-Cn^{\alpha}}).\tag{42}
$$

Here $c$ is any bounded function of $n$, so that it can be replaced by $c - \varepsilon_n$. By (42) and Proposition 3.1, $R_{c-\varepsilon_n}$ and $T_c$ are both subsets of $[v - n^{\alpha}, v + n^{\alpha}]$ with probability $1 - o(e^{-Cn^{\alpha}})$ for some $C > 0$. Also by Proposition 3.2
$$
P\Bigl\{\max_{v - n^{\alpha} \le k \le v + n^{\alpha}} |(\Delta_v - \Delta_k) - (S_v - S_k)| \ge \varepsilon_n/2\Bigr\} = o(\exp\{-Cn^{(\omega-2\alpha-2\beta)/3}\}).
$$
For any $k$ in $R_{c-\varepsilon_n}$, $S_{\hat v} - S_k \le c - \varepsilon_n$ and
$$
\Delta_{\hat v} - \Delta_k = \bigl\{(\Delta_v - \Delta_k) - (S_v - S_k)\bigr\} + \bigl\{(\Delta_{\hat v} - \Delta_v) - (S_{\hat v} - S_v)\bigr\} + (S_{\hat v} - S_k)
\le \varepsilon_n/2 + \varepsilon_n/2 + (c - \varepsilon_n) = c\tag{43}
$$
with probability $1 - o\bigl(\exp\{-Cn^{(\omega-2\alpha-2\beta)/3}\} + \exp\{-Cn^{\alpha}\}\bigr)$. But $\Delta_{\hat v} - \Delta_k \le c$ means that $k \in T_c$, so that $R_{c-\varepsilon_n} \subset T_c$ with probability of the same order for any $0 < \alpha < \omega/2 - \beta$. The optimal choice is $\alpha = (\omega - 2\beta)/5$, in which case
$$
P\{R_{c-\varepsilon_n} \subset T_c\} = 1 - o(\exp\{-Cn^{(\omega-2\beta)/5}\}).
$$
The other inclusion, $T_c \subset R_{c+\varepsilon_n}$, can be proved similarly by using (43).
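To make the objects compared in the theorem concrete, the two confidence regions can be computed directly on simulated data. The sketch below, for the unit-variance normal family ($\psi(\theta) = \theta^2/2$, $h(x) = x$, $H(x) = x^2/2$), builds $R_c$ from the known-parameter log-likelihood $S_k$ and $T_c$ from the maximum-likelihood statistic $\Delta_k = kH(\bar x_k) + (n-k)H(\bar x_{kn})$; all numerical values (parameters, $n$, $v$, $c$) are hypothetical.

```python
import random

random.seed(2)

theta1, theta2 = 0.0, 2.0          # pre- and after-change parameters (hypothetical)
n, v = 80, 40                      # sample size and true change-point (hypothetical)
x = [random.gauss(theta1, 1) for _ in range(v)] + \
    [random.gauss(theta2, 1) for _ in range(n - v)]

def q(theta, xi):
    """q(theta, x) = theta*x - psi(theta), psi(theta) = theta^2/2 (carrier dropped)."""
    return theta * xi - theta**2 / 2.0

def H(xbar):
    """H(x) = q(h(x), x); for the normal mean family h(x) = x, so H(x) = x^2/2."""
    return xbar**2 / 2.0

ks = range(1, n)                   # candidate change-points
S = {k: sum(q(theta1, xi) for xi in x[:k]) + sum(q(theta2, xi) for xi in x[k:])
     for k in ks}                  # log-likelihood with known parameters
D = {k: k * H(sum(x[:k]) / k) + (n - k) * H(sum(x[k:]) / (n - k))
     for k in ks}                  # Delta_k, with both means estimated

c = 3.0                            # threshold constant (hypothetical)
R = [k for k in ks if max(S.values()) - S[k] <= c]   # known-parameter region R_c
T = [k for k in ks if max(D.values()) - D[k] <= c]   # MLE-based region T_c

# The maximizers always belong to their own regions (max - value = 0 <= c).
assert max(S, key=S.get) in R and max(D, key=D.get) in T
```

With well-separated parameters both regions are typically short intervals around $v$, in line with the bounded-width behavior derived in the paper.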

References

Bahadur, R.R. (1971). Some Limit Theorems in Statistics. Regional Conference Series in Applied Mathematics. SIAM, Philadelphia, PA.

Brown, L.D. (1986). Fundamentals of Statistical Exponential Families with Applications in Statistical Decision Theory. Lecture Notes - Monograph Series, Vol. 9. Institute of Mathematical Statistics, Hayward, CA.

Chernoff, H. and S. Zacks (1964). Estimating the current mean of a normal distribution which is subject to changes in time. Ann. Math. Statist. 35, 999-1018.

Cobb, G.W. (1978). The problem of the Nile: conditional solution to a change-point problem. Biometrika 65, 243-251.

Hinkley, D.V. (1970). Inference about the change-point in a sequence of random variables. Biometrika 57, 1-17.

Lehmann, E. (1966). Some concepts of dependence. Ann. Math. Statist. 37, 1137-1153.

Lotov, V.I. (1991). On random walks in a strip. Theory Probab. Appl. 36, 165-170.


Rukhin, A.L. (1994). Asymptotic minimaxity in the change-point problem. In: D. Siegmund, E. Carlstein and H. Mueller, Eds., Proceedings of Change-Point Problems. Institute of Mathematical Statistics, Hayward, CA.

Siegmund, D. (1985). Sequential Analysis: Tests and Confidence Intervals. Springer, New York.

Siegmund, D. (1988). Confidence sets in change-point problems. Internat. Statist. Rev. 56, 31-48.

Worsley, K.J. (1986). Confidence regions and tests for a change-point in a sequence of exponential family random variables. Biometrika 73, 91-104.