ii.'mmom · a -sr. 8 7- 9 39 the partially observed stochastic minimum principle john baras...

A-EU 787 THE PRIRTZRLLY OBSERVED STOCHASTIC "NIINUN PRINCIPt.E(U 1/1ALERTS UNKIV EDMONTON DEPT OF STATISTICS AND APPIEDPROBAILITY J1 35155 ET AL. 11 NOV 9? AFOSR-TR-17-1939

IUCL SIF IE WOSS-06-9332F/O 2/3 NL

II.""'mmom

-% J.. .

% %

.~%

1 3.15 1 2 -

A....,..,.

.. J. ...

'."

II" " 11111'0

.,-..-... ,

-----------A % %.',,% .. . ., -. V

- - ~'~,- ,.- ,.- .

U' .'.¢ ".W ' ., . ""2. :." "..':'-.-,"." ." - "---."- '(.. " , ..- "".''-..".'"'.'."".",' ," "" "'-, ' ','': ,, '"'- ","-.

"" "'" n , 1

FILE, ckp,~ %

AF~A Ion ,oREPORT DOCUMENTATION PAGEMUJ IOA 189 l8 b. RESTRICTIVE MARKINGS%

2a. SECURITY CLASSIFICATION AUTHORITY 3 DISTRIBUTION/ AVAILABILITY OF REPORT AA

2b. DECLASSIFICATION i DOWNGRADING SCHEDULE A~o~dfr~b c?~Iaedistributi± u unlimi~±ted.$

4. PERFORMING ORGANIZATION REPORT NUMBER(S) 5MNITORING ORGANIZATION REPORT NUMBER(S)

AfOSR.Tlt. 87 -19 39 P

6. NAME OF PERFORMING ORGANIZATION 6bOFFICE SYMBOL 7a NAME OF MONITORING ORGANIZATION

University of Alberta AFOSR/NM

6c. ADDRESS (City, State, and ZIP Code) 7b AD WW tate, and ZIP Code)

Edmonton, Alberta, Canada T6G 2G1 Bldg 410 ~'i

Ba. NAME OF FUNDING/ SPONSORING I8b OFFICE SYMBOL 9 PROCURE'MENT INSTRUMENT IDENTIFICATION NUMBER

ORGANIZATION (If~ applicable)

AFOSR jNM AFOSR-86-0332

8c. ADDRESS (City, State",lnd ZIP Code) 10 SOURCE OF FUNDING NUMBERS4 PROGRAM IPROJECT ~TASK IWORK UNIT

Boi~ling AiFB DC ; ELEMENT NO NO NO, ACESSION NO

203,4961102F 2304 Al

11 TITLE (include Security Classification) -'

The Partially Observed Stochastic Minimum Principle

12 PERSONAL AUTHOR(S).. .

John Baras, Robert J. Elliott & Michael Kohlmann

13a. TYPE OF REPORT 13,TIME COVERED 14DTE OF REPORT (Year,&Mont, ay), AE ON

Reprin 1 F~M'' TO7 Nov. 11,1987I

16 SUPPLEMENTARY NOTATION

17 COSATI CODES '8 SUBJECT TERMS (Continue on reverse if necessary and identify by block number) 10FIELD GROUP tSUB-GROUP

19 ABSTRACT (Continue on reverse if necessary and identify by block number) '

The focus of this research is thl filtering jump processes. To investigate the

filtering of manifold-valued processes, their approximation by random walks and Markov

chains was studied. The object was to approximate a signal process by a finite-state

jump process for which a finite-.demensioflal filter is available. Four papers were

published during the past year, including "The existence of smooth densities for the

prediction, filtering and smoothing problems" and "The partially observed stochastic

minimum principle". QI

20 DISTRIBUT!ON/ AVAILABILITY OF ABSTRACT 21 ABSTRACT SECURITY CLASSIFICATION .i C0 UNCLASSIFIED/UNLIMITED 0 SAME AS :ZIT ~

22a NAME OF RESPONSIBLE INDIVIDUAL ')'! TEEUNS(nlERAeSoe)2cOF ESMO

Maj Jmes Gowlev(202) 767-5025 N

DO FORM 1473,.84 MAR 83 APR eat e)- -ay oc .jsed urtil exhausted SCHFk5PGAll oV'e ed1tor are obsoletet

NI N

A -SR. 8 7- 9 39

The Partially Observed StochasticMinimum Principle

JOHN BARAS ".."

UNIVERSITY OF MARYLAND, USA

ROBERT J. ELLIOTT V

UNIVERSITY OF ALBERTA, CANADA

MICHAEL KOHLMANNUNIVERSITAT KONSTANZ, F.R. GERMANY

Acleoessiaoa ForNTIS aRA&,

Unannounced '"

Justi ieti '..

Distribution/

Availability Codes p ',

Dist Special ,.'a

. / 12 May 1987

-W.a

a..1'.,

:"i "

, '2:.

j'.* ~* *a a

_ %., j%

5-' ~."-,,,:..,,a

ACKNOWLEDGEMENTS. The second and third authors wish to thank the Natural 6.

Sciences and Engineering Research Council of Canada for its support under grant A-7964. %'~

The second author was partially supported by the Air Force Office of Scientific Research,

United States Air Force, under grant AFOSR-86-0332

and

European Office of Aerospace Research and Development, London, England.

16 %

a

The Partially Observed StochasticMinimum Principle

JOHN BARAS

SYSTEMS RESEARCH CENTER, UNIVERSITY OF MARYLAND

COLLEGE PARK, MD 20742 USA

ROBERT J. ELLIOTTDEPARTMENT OF STATISTICS AND APPLIED PROBABILITY, UNIVERSITY OF ALBERTA

EDMONTON, AB T6G 2G1 CANADA

MICHAEL KOHLMANN

FAKULTAT FUR WIRTSCHAFTSWISSENSCHAFTEN UND STATISTIK

UNIVERSITAT KONSTANZ, POSTFACH 5560, D-7750 F.R. GERMANY

0.

1. INTRODUCTION.

Various proofs have been given of the minimum principle satisfied by an optimal de

control in a partially observed stochastic control problem. See, for example, the papers

by Bensoussan [1], Elliott [5], Haussmann [7], and the recent paper [91 by Haussmann in

which the adjoint process is identified. The simple case of a partially observed Markov

chain is discussed in the University of Maryland lecture notes [6] of the second author.

We show in this article how a minimum principle for a partially observed diffusion

can be obtained by differentiating the statement that a control u* is optimal. The results

of Bismut [2], [3] and Kunita [10],, on stochastic flows enable us to compute in an easy and

explicit way the change in the cost due to a 'strong variation' of an optimal control. The

only technical difficulty is the justification of the differentiation. As we wished to exhibit

the simplification obtained by using the ideas of stochastic flows the result is not proved

%J1under the weakest possible hypotheses. Finally, in Section 6, we show how Bensoussan's

minimum principle follows from our result if the drift coefficient is differentiable in the

control variable.

I~i'%,,

.5 ~ .' '5 i ~ v. 4 ~m ~ ~ . .K I' 7 .*d~ V . °J..j

2. DYNAMICS.

Suppose the state of the system is described by a stochastic differential equation

d et = f (t, et, u)dt + g (t, Ct) dwt, ...

tERd, o=xo, o<t<T. (2.1)

The control parameter u will take values in a compact subset U of some Euclidean space Rk.

We shall make the following assumptions:

A,: xo is given; if ZO is a random variable and PO its distribution the situation when

f IxI9Po(dz) < oo for some q > n + 1 can be treated, as in [91, by including an extra

integration with respect to P.

A 2 : f : [0,TJ x Rd x U --+ Rd is Borel measurable, continuous in u for each (t,z),

continuously differentiable in z and for some constant K

(I + Iz)- ' If(t,z,u)I + If,(t,x,u)1 < K.

A 3 : g: [0, T] x Rd -- Rd 9 R is a matrix valued function, Borel measurable, continuously

differentiable in z, and for some constant K 2

Ig(t,-)I + Igz (t,z)I K 2.

The observation process is given by

dyt = h(Ct)dt + dvt

Yt E R n , yo -= o, O< t < T. (2.2)".-,

In the above equations w = (w ,...,w") and v = (v,...,v d ) are independent Brownian •

motions. We also assume

A 4 : h : Rd --+ R' is Borel measurable, continuously differentiable in x, and for some

constant K 3

Ih(t, z)I + Ih,(t,x)I < K 3 .

3=2. .

REMARKS 2.1. These hypotheses can be weakened. For example, in A 4 , h can be

allowed linear growth in x. Because g is bounded a delicate argument then implies the

exponential Z of (2.3) is in some L' space, 1 < p < oo. (See, for example, Theorem 2.2 of-b

[8]). However, when h is bounded Z is in all the L' spaces, (see Lemma 2.3). Also, if we

require f to have linear growth in u then the set of control values U can be unbounded

as in [9]. Our objective, however, is not the greatest generality but to demonstrate the

simplicity of the techniques of stochastic flows. .-

Let P denote Wiener measure on the C([O,T],R ' ) and jA denote Wiener measure 0

on C(O,T],Rm ). Consider the space fl = C([0,T],R'") x C([O,T],R ' ) with coordinate

functions (xt, yit) and define Wiener measure P on 01 by

P(dx,dy) = P(dz)ji(dy).

DEFINITION 2.2. Write Y = {YJ for the right continuous complete filtration on

C(10, TI, R m ) generated by Y0 ° = {f s < t}. The set of admissible control functions U

will be the Y-predictable functions on 10, T] x C([0, T], Rm ) with values in U.

For u E U and x E Rd write e' (x) for the strong solution of (2.1) corresponding to

control u, and with C, (x) = x. Write

Z.'(x)=exp( h(C,,,(x))'dy7 - ] h(C,' (x)) 2 dt (2.3)

and define a new probability measure P' on f] by d = Z 0 .=d]5- T (xo). Then under Pu

( o,t (xo),yt) is a solution of (2.1) and (2.2), that is Qt (xo) remains a strong solution of

(2.1) and there is an independent Brownian motion v such that yt satisfies (2.2). A version

of Z defined for every trajectory y of the observation process is obtained by integrating by

parts the stochastic integral in (2.3).

LEMMA 2.3. Under hypothesis A 4 , for t < T, 0

E(Zout (xo)) P < on for all u E U and all p, 1< p < o.

4 w-0:

" .*. " - .'..'.." ," ." -,' '_-.--." , -:, '..'-.':.,':: ." -."-:.-: -....................................... "."."...."..........'"".....'....."".".".."

0

PROOF.

Zo (o)= .1 + Z,,,,, (:xoJLOr (xo))'dyr.

Therefore, for any p there is a constant CP such that

(-)] 5C[1+Ef(o,,r(x:o))h h(Co,, o)d,'r P

The result follows by Gronwall's inequality.

COST 2.4. We shall suppose the cost is purely terminal and given by some bounded,

differentiable function ,

C( 0,T (XO))

which has bounded derivatives. Then the expected cost if control u E U is used is

J (u) = Eu [C (COT (XO))] S..,-

In terms of P, under which Yt is always a Brownian motion, this is

[Zo" (2, (o)]I.4):"J (e) T , (x.) C (CU. (X:O).

.

5~*

°" %"-.

-S

5.'--.°5 ." -"

S.,.

is'.

• ,.'. ,_'_', u ,,:'. v .' : " '• -"€ ",2''': ,"..,:,-,, ' , ' ': ,.' ., % ."," ,"¢ .'."u'.t . .' "u' . "% :.2.2 " ,.."% ,V : 2

3. STOCHASTIC FLOWS.

For u E U write

',t(x) = X + f(r, (x), u,)dr + g(r, ,(x))dw, (3.1)

for the solution of (2.1) over the time interval [8,t] with initial condition e, (x) = x. In

the sequel we wish to discuss the behaviour of (3.1) for each trajectory y of the observation

process. We have already noted there is a version of Z defined for every y. The results of

Bismut [2] and Kunita [10] extend easily and show the map @

Rd -' Rd

is, almost surely, for each y E C([0, T],R m ) a diffeomorphism. Bismut [21 initially gives J% A

proofs when the coefficients f and g are bounded, but points out that a stopping time

argument extends the results to when, for example, the coefficients have linear growth.

Write IIC (zo)It = sup I g*(zo)1. Then, as in Lemma 2.1 of [8], for any p, 5 'A.o<_<t 0

1 < p < oo using Gronwall's and Jensen's inequalities/o-'II ~xoll < C1 + [X01p + g g(r, ,, (xo)) dwr p '"

almost surely, for some constant C.5 '%

Therefore, using Burkholder's inequality and hypothesis A3 , IIC(x0)lIT is in LP for

all p, 1 < p < .. '

Suppose u° E U is an optimal control so J(u*) - J(u) for any other u E U. Write

(') for , ('). The Jacobian (z) is the matrix solution C, of the equation for s < t,

n

dCt (2 (t, , (x), u*)Ct dt + ( g')(t, " (x))Ctdw (3.2)01=1 ,2.'

with C, = I.

Here I is the n x n identity matrix and g(') is the ith column of g. From hypotheses A2

and A3 , f, and g, are bounded. Writing IIClIT = sup ICI an application of Gronwall'sO<s<t

6 N

Jensen's and Burkholder's inequalities again implies IICI1T is in LP for all p, 1 < p < oo. .1.

Consider the related matrix valued stochasic differential equation

Dt= I - Drfz(r, C; (x), u*)'dr

ft+ fD,(g()(r, Ca,r (x))') 2 dr. (3.3)

Then it can be checked that DtCt = I for t > s, so that Dt is the inverse of the Jacobian,

that is Dt = Again, because f and g, are bounded we have that jjDIlt is in

every LP, 1 < p < oo.

For a d-dimensional semimartingale zt Bismut [21 shows one can consider the flow

C;,t (zt) and gives the semimartingale representation of this process. In fact if zt = z, +

At + i f Hidwr is the d-dimensional semimartingale, Bismut's formula states thatIW(

zt.) =z. + (f(r, 6;,,(zr), U,*)

2 n a2 (H 1 )dfac;, 8' r **'

+ (, (r, (z,), u;) a (z,)H + - i.a;' 7 (zH) H) dr+ agq , (r' zr) ' f; -t

=

a dA + W (r, (z)) + a (zr)Hi dw..

-j- ai_ J(3.4)

DEFINITION 3.1. We shall consider perturbations of the optimal control u* of the fol-

lowing kind: For S E [0,T), h > 0 such that 0 < s < s + h < T, for any other admissible

control u E U and A E Y, define a strong variation of u* by

= u(t,w) if(t,w) [s,s+h]xAutLW) = (t,W) if(t,w) E [s,s + hl x A.

Applying (3.4) as in Theorem 5.1 of [4] we have the following result.

THEOREM 3.2. For the perturbation u of the optimal control u" consider the process

zi = + cz (f(r, C;,, (zr), ur) - f(r, C;,r(zr), u;) dr. (3.5)

7 7 • ":7-S

* 4 4 4W * ~ . .. ~ - ;~-#* e~d ~4.. 4,-

0

Then the process (zt) is indistinguishable from C' (x). ;

PROOF. Note the equation defining zt involves only an integral in time; there is no

martingale term, so to apply (3.4) we have Hi 0 for all i. Therefore, from (3.4)

g(zt) =X+ f f(r, ,(z) u,)dr

it (a,(Z, ae;,(zr\) )- (

However, the solution of (3.2) is unique so

G*'t~.1 (z) CtW

REMARKS 3.3. Note that the perturbation u(t) equals ts8 (t) if t > s+h so Zt Z,+h

if t> s + h and

Z,,z) = ,t (Z.+h, -+h,t (Cs,.+h (W)

8..w

4. AUGMENTED FLOWS.

Consider the augmented flow which includes as an extra coordinate the stochastic

exponential Z.,t with a 'variable' initial condition z E R for Z,. (-). That is, consider the

(d + 1) dimensional system given by:

ft .S(X) =X + j f(r, C,, (xz), u;) dr + j. g (r, (x)) dw7

Z,t (2,Z) = z + Z.,,:(x,zh( ,(x))'dy,.

Therefore,

Z, (X, Z) =z,,) ", t (X))2

= exp h (x)) dy, - f h( ,(x))

and we see there is a version of the enlarged system defined for each trajectory y by inte-

grating by parts the stochastic integral. The augmented map (x, z) - (, (x), Z t (x, z))

" is then almost surely a diffeomorphism of Rd+ . Note that 0 = O, (_)_ = 0 and8z Oz

= . The Jacobian of this augmented map is, therefore, represented by the matrix" z

and for 1< i < d from equation (3.3)(_ _Z,_t ( _, () ,,( (x)) o ,, (x)

i = (z:, (z,) G -xi

dxJ

dZ (x z) (4.1)

.- (Here the double index k is summed from 1 to n).

We shall be interested in the solution of this differential system (4.1) only in the

situation when z = 1 so we shall write Z t (x) for Z, t (x, 1). The following result is

motivated by formally differentiating the exponential formula for Z,,t (x).A,..

A.9 .

A,.

'A +" . "'.' _- ' -" " , °% ''' r '* -' '' + '+ + + ' + . . ,,,€ + .

W,WV l XiWWV W- -VV ww p~ji n VV'Q -VnWWWVU'VVWXV UwwwwU

LEMMA 4.1.

wx) - () (x) ) ) ,

where v = (v ...., 0 ) is the Brownian motion in the observation process.

PROOF. From (4.1) we see t is the solution of the stochastic differential equa-

tion 9Z.;,t (x) (a (Z,,,x) a". (x)z:h' (x)) + Z*, (x)h( C (x)) ,()d. (4.2)

Write

L,,, (x) = zax (d) v,.Z)

where

dy, h(*,t (x))dt + dvi.

Becauset

z:,, (x) =1 + z,,, (x)h' (,,. (x))dy,.

the product rule gives

L , ,t (x) = ' ww dv,.

+ hx dv)Z,,. (x)h'(,, (x))dy,.axjt ac;,,. d+ Z;,, (x)h'(,,. (x)) . hx. a d,

f L,,, (x) h' (x))dy + ftZ:. (x)hz a ',,.r dYxax

Therefore, L,,t (x) is also a solution of (4.2) so by uniqueness

L,.t(X) - z;(x)axREMARKS 4.2. As noted at the beginning of this section we can consider the

augmented flow

(x,z)-- ( x;,,(x), Z ,t(x,z)) for x E R, zC R,

and we are only interested in the situation when z = 1, so we write Z t (x).

10

-- " i " I I I - i l i l i i ' i l l ~ i l i - - " - " " - -

LEMMA 4. ., z)=Z (x) where zt is the semnimartin gale defined in (3.6).

PROOF. Z~t (x) is the process uniquely defined by I.

Z"' (W = 1 + Z'xh ()d, 42

faS

Consider an augmented (d + 1) dimensional version of (3.6) defining a semnimartingale

-t= (zt, 1), so the additional component is always identically 1. Then applying (3.5) to

the new component of the augmented process we have

tS

z-,7 (Zr) = + 18 z:, (z,) h'(C,, (z,)) dy7

=1 + ftz,, (z.) h'(, (x)) dy7

by Theorem 3.2. However, (4.2) has a unique solution so Z,*, (zt) Z,'x)

REMARKS 4.4. Note that for t > s + h

Zj (zt) =Z.*, (z8+h)

w~jMv ~ ~ I'VN-WW.V'v wj Wr JVV~ WV VZ -J - - %

iS

a,.'-

5. THE MINIMUM PRINCIPLE. P% .

Control u will be the perturbation of the optimal control u' as in Definition 3.1. We

shall write x C 0,, (xo). Then the minimum cost is..

J(u*) = E[ZT (XO)C(G,T (X0))J

= E[Z,, (XO) z;,T (X) C(G;T ())

The cost corresponding to the perturbed control u is

J(u) = E[Ze,, (xo)Z,,T ()c(,T (x))]

= E[Z,. (XO)Z,T (Z.+h )C(;,T (Z.+h )) %7.

by Theorem 3.2 and Lemma 4.3. Now Z,,T(.) and C(C;,T(')) are differentiable with con-

tinuous and uniformly integrable derivatives. Therefore

J(u) - J(u*) -- E[Z;, (XO)(Z,T (Za+h )c(8;,t (z,+h)) - Z,T (X)C(C(,T (X)))]

--E[ r(sz,)(f (r, C;,, (z,), ,) - f(r, C,.,(x), u, )d r

where '

r(s,z,) = z,. (xo)z(,T(z,) {ct(CT(Z))a +

) h(C,, (z.)) , (z.)dv) }(x,(Z))

Note that this expression gives an explicit formula for the change in the cost resulting from

a variation in the optimal control. The only remaining problem is to justify differentiating S

the right hand side.

From Lemma 2.3, Z is in every LIP space, 1 < p < oo and from the remarks at the

beginning of Section 3, CT = q and DT = z r ic

8z IaT are in every/L space, I < p < oo.

Consequently, r is in every LP space, 1 < p < oo. -p.

12 ,

.:.'

Therefore

J(u) - J(u') E jF [(s Zr) -(s,zx)) (f(r, C;,,(z,), ur) f f(r, ~(4T), U;))dr

E E[ (r(s, x) -r(r,zx)) (f(r, C;, ()U') - f (r, z,Z), u8*)3 dr

+ E E[r(r,x) (f (r , (Zr), i r) f f(r, e:,. (Z'), U;)

-f(r, C8, WUr) + f (r, C;,,(x), )]dr

r8a+h

1+ E [r(r,x) (f (r, Co*, (XO), U r) - f (r, C*r.(xO), 14))]dr I

II 1(h) + 12 (h) + 13 (h) + 14 (h), say.

Now,

III (h)i I K1 E IFr(s,z.) r(S, X)P( + IIC'(XO)Ila,+h)dr

< K~h sup E[lI' (S,Zr) - r(S,zX)l(I + (ICU(XO)lla+h]a<r<s+h

8a+h -

<5K~h sup E[lr(~z) li- Z~~xl~l + ]. XO118<r<8+1

K3 ft

B'm urn (s,z7 ) - r(s,x)I1 0 a.s.

lrn IF(s,x) - rF(r,x)i 0 a.s.

Jim lix - Z.ii.+Ak = 0.

13

" R

Therefore,

lim IIr(s,z,) - r(s,X)lp = 0 ,.t"48 '

Jri Ir(s, X) - r(r,X) ip 0

and lira II(lz - Z1+h )l = o for some p.

Consequently, Jr h - 1 I&(h) = 0, for k = 1,2,3.

The only remaining problem concerns the differentiability of

,5+h

14 (h) E [r (r, x)(f (r, t0*,. (xo), u,,) - f (r, CO*r (XO), U;))]dr.

The integrand is almost surely in L1 ([0, T]) so im h-' 14 (h) exists for almost every s G

[0, T]. However, the set of times {s} where the limit may not exist might depend on the

control u. Consequently we must restrict the perturbations u of the optimal control u' to

perturbations from a countable dense set of controls. In fact:

1) Because the trajectories are, almost surely, continuous, Yp is countably generated

by sets {Aip}, i = 1,2,... for any rational number p E [0,T]. Consequently Y is

countably generated by the sets {A 1, }, r < t.

2) Let Gt denote the set of measurable functions from (fl, Y) to U C Rk. (If u E U

then u(t,w) E Gt.) Using the Ll-norm, as in [5], there is a countable dense subset *

Hp = {ujp } of Gp, for rational p E [0, T1. If Ht = U Hp then Hi is a countable

dense subset of Gt. If ujp E Hp then, as a function constant in time, ui, can beI

considered as an admissible control over the time interval it, T] for t > p.

3) The countable family of perturbations is obtained by considering sets Ai, E Yt,

functions ujp E Ht, where p < t, and defining as in 3.1

-{ u*(s, w) if (s,w) it,T] x aipU (S, w) ='- ,u (s, w) if (s, w) E [t,T x A,,.

Then for each i,j,p "

rr (51lim h - ' Er(r,)(f(r, Co*,, (o), u,) - f(r, c*(x0), ts))Jdr (5.1)h-0 ,' '

14tc~...

exists an-qul

E[r(s, X)(f(s, C ,a(X0), tLjP) - f (s, C'. (X0), u*))i.,]g.

for almost all s E 10, TJ.

Therefore, considering this perturbation we have

lim h-1 (J(u;,) - J(us)) = E[P(s,z)(f(S, C0*5 (XO), ,ijp) f(S, C;,. (XO), U*))I~~]Ja-o

> 0 for almost all s E [0, TI.

Consequently there is a set S C 10, T] of zero Lebesgue measure such that, if s S , the

limit in (5.1) exists for all i*,j,p, and gives

Using the monotone class theorem, and approximating an arbitrary admissible control

u E U we can deduce that if s S

E[Fs~)(fSC0*,(0),U)- A~S, Co*(XO), U*)I 0 for any A E Y8. (5.2)

Write

aGT(X)T a G ,(xo)) )P. (x) =E* [cc(60,T (XO)) a W X) 3XY

where, as before, z x~ (xo) and E' denotes expectation under P' = P*0. Then p. (z)

is the co-state variable and we have in (5.2) proved the following 'conditional' minimum

principle:

THEOREM 5. 1. If u' E U is an optimal control there is a set S C 10, TI of zero Lebesgue

measure such that if s V S

E* [p, (x)f (s, x, u') Y.] 5 P p. (x)f (s, x, u) I 1~ a.s.

That is, the optimal control ti' almost surely minimizes the conditional Hamiltonian and

the adjoint variable is p5 (x).

6. CONCLUSION.

Using the theory of stochastic flows the effect of a perturbation of an optimal control

is explicitly calculated. The only difficulty was to justify its differentiation. The adjoint

process is explicitly identified as p, (x). W'

THEOREM 6. 1. If f is differentiable in the control variable u, and if the random variable

z Co*,, (zo) has a conditional density q.(x) under the measure P*, then the inequality of

Theorem 5.1 implies

k?uj(8) - i (s)) fm) <o"

This is the result of Bensoussan's paper Il].

The method of this paper can be applied to completely observable systems by ini-

tially considering 'stochastic open loop' controls, systems with stochastic constraints and.4?

deterministic systems. The adjoint process can be explicitly identified. 'Almost minimum'

principles for 'almost optimal' controls can be obtained. Some of these will be discussed

in later work.

16%

.4,.

.%

.4.

161

.. ,, _ -,. ; ,i . .. , % % a % , ., o- % 4, 4w . . ..... t' '..% ' m'. ,,t o " ."., ,-.;-. .- I":..' , i -" ." '.; .,,i"

."

References[1] A. Bensoussan, Maximum principle and dynamic programming approaches of the

optimal control of partially observed diffusions. Stochastics, 9(1983), 169-222. p

[2] J.M. Bismut, A generalized formula of Ito and some other properties of stochasticflows. Zeits. fur Wahrs. 55(1981), 331-350.

[3] J.M. Bismut, Mecanique Aliatoire. Lecture Notes in Mathematics 866, Springer-Verlag, 1981.

[4] J.M. Bismut, Mecanique Aliatoire. In Ecole d'Et6 de Probabilites de Saint-Flour X.Lecture Notes in Mathematics 929, pp. 1-100. Springer-Verlag, 1982.

[5] R.J. Elliott, The optimal control of a stochastic system. SIAM Jour. Control andOpt. 15(1977), 756-778.

[6] R.J. Elliott, Filtering and control for point process observations. Systems Science

Center, University of Maryland. Springer Lecture Notes, to appear.

[7] U.G. Haussmann, General necessary conditions for optimal control of stochastic sys-tems. Math. Programming Studies 6(1976), 30-48.

[8] U.G. Haussmann, A Stochastic Minimum Principle for Optimal Control of Diffu-sions. Pitman Research Notes in Mathematics 151, Longman U.K., 1986.

[9] U.G. Haussmann, The maximum principle for optimal control of diffusions with par-tial information. SIAM J. Control and Optimization 25(1987), 341-361.

[10] H. Kunita, On the decomposition of solutions of stochastic differential equations.Lecture Notes in Mathematics 851(1980), 213-255.

17d.

-p

4'

4'

A

0*

-pA.

FrrF

FLMED 1p~jA.

-- p*dd'

I

(Lu I

S

(p.

/61 -p

A.

*

C. ** -.

ii.'mmom · a -sr. 8 7- 9 39 the partially observed stochastic minimum principle john baras...

Documents