AFOSR-TR-87-1939
THE PARTIALLY OBSERVED STOCHASTIC MINIMUM PRINCIPLE (U). Alberta Univ., Edmonton, Dept. of Statistics and Applied Probability. J. Baras et al. 11 Nov 87.
UNCLASSIFIED. AFOSR-86-0332.
REPORT DOCUMENTATION PAGE
3. Distribution/availability of report: Approved for public release; distribution unlimited.
5. Monitoring organization report number: AFOSR-TR-87-1939
6a. Name of performing organization: University of Alberta
6c. Address: Edmonton, Alberta, Canada T6G 2G1
7a. Name of monitoring organization: AFOSR/NM
7b. Address: Bldg 410
8a. Name of funding/sponsoring organization: AFOSR
8b. Office symbol: NM
8c. Address: Bolling AFB, DC
9. Procurement instrument identification number: AFOSR-86-0332
10. Source of funding numbers: program element 61102F, project 2304, task A1
11. Title: The Partially Observed Stochastic Minimum Principle
12. Personal author(s): John Baras, Robert J. Elliott & Michael Kohlmann
13a. Type of report: Reprint
14. Date of report: Nov. 11, 1987
19. Abstract: The focus of this research is the filtering of jump processes. To investigate the filtering of manifold-valued processes, their approximation by random walks and Markov chains was studied. The object was to approximate a signal process by a finite-state jump process for which a finite-dimensional filter is available. Four papers were published during the past year, including "The existence of smooth densities for the prediction, filtering and smoothing problems" and "The partially observed stochastic minimum principle".
21. Abstract security classification: Unclassified
22a. Name of responsible individual: Maj. James Crowley
22b. Telephone: (202) 767-5025
22c. Office symbol: NM
DD FORM 1473, 84 MAR. 83 APR edition may be used until exhausted. All other editions are obsolete.
AFOSR-TR-87-1939

The Partially Observed Stochastic Minimum Principle

JOHN BARAS
UNIVERSITY OF MARYLAND, USA
ROBERT J. ELLIOTT
UNIVERSITY OF ALBERTA, CANADA
MICHAEL KOHLMANN
UNIVERSITÄT KONSTANZ, F.R. GERMANY

12 May 1987
ACKNOWLEDGEMENTS. The second and third authors wish to thank the Natural Sciences and Engineering Research Council of Canada for its support under grant A-7964. The second author was partially supported by the Air Force Office of Scientific Research, United States Air Force, under grant AFOSR-86-0332, and by the European Office of Aerospace Research and Development, London, England.
The Partially Observed Stochastic Minimum Principle

JOHN BARAS
SYSTEMS RESEARCH CENTER, UNIVERSITY OF MARYLAND
COLLEGE PARK, MD 20742 USA
ROBERT J. ELLIOTT
DEPARTMENT OF STATISTICS AND APPLIED PROBABILITY, UNIVERSITY OF ALBERTA
EDMONTON, AB T6G 2G1 CANADA
MICHAEL KOHLMANN
FAKULTÄT FÜR WIRTSCHAFTSWISSENSCHAFTEN UND STATISTIK
UNIVERSITÄT KONSTANZ, POSTFACH 5560, D-7750 F.R. GERMANY
1. INTRODUCTION.
Various proofs have been given of the minimum principle satisfied by an optimal control in a partially observed stochastic control problem. See, for example, the papers by Bensoussan [1], Elliott [5], Haussmann [7], and the recent paper [9] by Haussmann in which the adjoint process is identified. The simple case of a partially observed Markov chain is discussed in the University of Maryland lecture notes [6] of the second author.
We show in this article how a minimum principle for a partially observed diffusion can be obtained by differentiating the statement that a control $u^*$ is optimal. The results of Bismut [2], [3] and Kunita [10] on stochastic flows enable us to compute in an easy and explicit way the change in the cost due to a 'strong variation' of an optimal control. The only technical difficulty is the justification of the differentiation. As we wished to exhibit the simplification obtained by using the ideas of stochastic flows, the result is not proved under the weakest possible hypotheses. Finally, in Section 6, we show how Bensoussan's minimum principle follows from our result if the drift coefficient is differentiable in the control variable.
2. DYNAMICS.
Suppose the state of the system is described by a stochastic differential equation
$$ d\xi_t = f(t,\xi_t,u)\,dt + g(t,\xi_t)\,dw_t, \qquad \xi_t \in R^d,\quad \xi_0 = x_0,\quad 0 \le t \le T. \tag{2.1} $$
The control parameter $u$ will take values in a compact subset $U$ of some Euclidean space $R^k$.
We shall make the following assumptions:
$A_1$: $x_0$ is given; if $x_0$ is a random variable and $P_0$ its distribution the situation when $\int |x|^q P_0(dx) < \infty$ for some $q > n+1$ can be treated, as in [9], by including an extra integration with respect to $P_0$.
$A_2$: $f : [0,T] \times R^d \times U \to R^d$ is Borel measurable, continuous in $u$ for each $(t,x)$, continuously differentiable in $x$ and for some constant $K_1$
$$ (1+|x|)^{-1}|f(t,x,u)| + |f_x(t,x,u)| \le K_1. $$
$A_3$: $g : [0,T] \times R^d \to R^d \otimes R^n$ is a matrix valued function, Borel measurable, continuously differentiable in $x$, and for some constant $K_2$
$$ |g(t,x)| + |g_x(t,x)| \le K_2. $$
The observation process is given by
$$ dy_t = h(\xi_t)\,dt + dv_t, \qquad y_t \in R^m,\quad y_0 = 0,\quad 0 \le t \le T. \tag{2.2} $$
In the above equations $w = (w^1,\dots,w^n)$ and $v = (v^1,\dots,v^m)$ are independent Brownian motions. We also assume
$A_4$: $h : R^d \to R^m$ is Borel measurable, continuously differentiable in $x$, and for some constant $K_3$
$$ |h(x)| + |h_x(x)| \le K_3. $$
REMARKS 2.1. These hypotheses can be weakened. For example, in $A_4$, $h$ can be allowed linear growth in $x$. Because $g$ is bounded a delicate argument then implies the exponential $Z$ of (2.3) is in some $L^p$ space, $1 \le p < \infty$. (See, for example, Theorem 2.2 of [8].) However, when $h$ is bounded $Z$ is in all the $L^p$ spaces (see Lemma 2.3). Also, if we require $f$ to have linear growth in $u$ then the set of control values $U$ can be unbounded as in [9]. Our objective, however, is not the greatest generality but to demonstrate the simplicity of the techniques of stochastic flows.
Let $P$ denote Wiener measure on $C([0,T],R^n)$ and $\mu$ denote Wiener measure on $C([0,T],R^m)$. Consider the space $\Omega = C([0,T],R^n) \times C([0,T],R^m)$ with coordinate functions $(x_t, y_t)$ and define Wiener measure $\bar{P}$ on $\Omega$ by
$$ \bar{P}(dx,dy) = P(dx)\,\mu(dy). $$
DEFINITION 2.2. Write $Y = \{Y_t\}$ for the right continuous complete filtration on $C([0,T],R^m)$ generated by $Y_t^0 = \sigma\{y_s : s \le t\}$. The set of admissible control functions $\underline{U}$ will be the $Y$-predictable functions on $[0,T] \times C([0,T],R^m)$ with values in $U$.
For $u \in \underline{U}$ and $x \in R^d$ write $\xi^u_{s,t}(x)$ for the strong solution of (2.1) corresponding to control $u$, and with $\xi^u_{s,s}(x) = x$. Write
$$ Z^u_{s,t}(x) = \exp\Big( \int_s^t h(\xi^u_{s,r}(x))'\,dy_r - \tfrac{1}{2}\int_s^t |h(\xi^u_{s,r}(x))|^2\,dr \Big) \tag{2.3} $$
and define a new probability measure $P^u$ on $\Omega$ by $dP^u/d\bar{P} = Z^u_{0,T}(x_0)$. Then under $P^u$, $(\xi^u_{0,t}(x_0), y_t)$ is a solution of (2.1) and (2.2), that is, $\xi^u_{0,t}(x_0)$ remains a strong solution of (2.1) and there is an independent Brownian motion $v$ such that $y_t$ satisfies (2.2). A version of $Z$ defined for every trajectory $y$ of the observation process is obtained by integrating by parts the stochastic integral in (2.3).
LEMMA 2.3. Under hypothesis $A_4$, for $t \le T$,
$$ E\big[(Z^u_{0,t}(x_0))^p\big] < \infty \quad \text{for all } u \in \underline{U} \text{ and all } p,\ 1 \le p < \infty. $$
PROOF.
$$ Z^u_{0,t}(x_0) = 1 + \int_0^t Z^u_{0,r}(x_0)\, h(\xi^u_{0,r}(x_0))'\,dy_r. $$
Therefore, for any $p$ there is a constant $C_p$ such that
$$ E\big[(Z^u_{0,t}(x_0))^p\big] \le C_p\Big[1 + E\Big(\int_0^t (Z^u_{0,r}(x_0))^2\, h(\xi^u_{0,r}(x_0))'\,h(\xi^u_{0,r}(x_0))\,dr\Big)^{p/2}\Big]. $$
The result follows by Gronwall's inequality.
COST 2.4. We shall suppose the cost is purely terminal and given by some bounded, differentiable function
$$ c(\xi^u_{0,T}(x_0)) $$
which has bounded derivatives. Then the expected cost if control $u \in \underline{U}$ is used is
$$ J(u) = E_u\big[c(\xi^u_{0,T}(x_0))\big]. $$
In terms of $\bar{P}$, under which $y_t$ is always a Brownian motion, this is
$$ J(u) = E\big[Z^u_{0,T}(x_0)\, c(\xi^u_{0,T}(x_0))\big]. \tag{2.4} $$
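The representation of the cost under the reference measure can be illustrated numerically. The following is a minimal sketch only: the scalar coefficients f(t,x,u) = -x + u, g = 0.5, h(x) = tanh(x), the cost c(x) = 1/(1+x^2) and the function name `estimate_cost` are assumptions chosen for the example and do not come from the paper. Under the reference measure the observation increments are drawn as Brownian increments independent of w, and the expected cost is estimated by weighting the terminal cost with the exponential of (2.3).

```python
import math
import random


def estimate_cost(u, x0=0.5, T=1.0, n_steps=100, n_paths=2000, seed=0):
    """Monte Carlo estimate of J(u) = E[Z c(xi_T)] under the reference measure.

    Illustrative scalar model (assumptions, not from the paper):
      f(t, x, u) = -x + u,  g = 0.5,  h(x) = tanh(x),  c(x) = 1 / (1 + x^2).
    `u` maps (t, current observation y_t) to a control value.
    Returns (estimate of J(u), sample mean of Z).
    """
    rng = random.Random(seed)
    dt = T / n_steps
    sq = math.sqrt(dt)
    total_cost = 0.0
    total_z = 0.0
    for _ in range(n_paths):
        x, y, log_z = x0, 0.0, 0.0
        for k in range(n_steps):
            t = k * dt
            dw = rng.gauss(0.0, sq)   # state noise increment
            dy = rng.gauss(0.0, sq)   # observation increment (Brownian under the reference measure)
            hx = math.tanh(x)
            log_z += hx * dy - 0.5 * hx * hx * dt   # discretized exponent of (2.3)
            x += (-x + u(t, y)) * dt + 0.5 * dw     # Euler step for (2.1)
            y += dy
        z = math.exp(log_z)
        total_z += z
        total_cost += z / (1.0 + x * x)
    return total_cost / n_paths, total_z / n_paths
```

For instance, `estimate_cost(lambda t, y: 0.0)` uses the zero control, and `estimate_cost(lambda t, y: max(-1.0, min(1.0, -y)))` uses a bounded feedback on the observation. The sample mean of Z should be close to 1, since Z is a density with respect to the reference measure.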
3. STOCHASTIC FLOWS.
For $u \in \underline{U}$ write
$$ \xi^u_{s,t}(x) = x + \int_s^t f(r,\xi^u_{s,r}(x),u_r)\,dr + \int_s^t g(r,\xi^u_{s,r}(x))\,dw_r \tag{3.1} $$
for the solution of (2.1) over the time interval $[s,t]$ with initial condition $\xi^u_{s,s}(x) = x$. In the sequel we wish to discuss the behaviour of (3.1) for each trajectory $y$ of the observation process. We have already noted there is a version of $Z$ defined for every $y$. The results of Bismut [2] and Kunita [10] extend easily and show the map
$$ \xi^u_{s,t} : R^d \to R^d $$
is, almost surely, for each $y \in C([0,T],R^m)$ a diffeomorphism. Bismut [2] initially gives proofs when the coefficients $f$ and $g$ are bounded, but points out that a stopping time argument extends the results to when, for example, the coefficients have linear growth.
Write $\|\xi^u(x_0)\|_t = \sup_{0 \le s \le t} |\xi^u_{0,s}(x_0)|$. Then, as in Lemma 2.1 of [8], for any $p$, $1 \le p < \infty$, using Gronwall's and Jensen's inequalities
$$ \|\xi^u(x_0)\|_T^p \le C\Big(1 + |x_0|^p + \Big\| \int_0^{\cdot} g(r,\xi^u_{0,r}(x_0))\,dw_r \Big\|_T^p\Big) $$
almost surely, for some constant $C$.
Therefore, using Burkholder's inequality and hypothesis $A_3$, $\|\xi^u(x_0)\|_T$ is in $L^p$ for all $p$, $1 \le p < \infty$.
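Suppose $u^* \in \underline{U}$ is an optimal control so $J(u^*) \le J(u)$ for any other $u \in \underline{U}$. Write $\xi^*_{s,t}(\cdot)$ for $\xi^{u^*}_{s,t}(\cdot)$. The Jacobian $\frac{\partial \xi^*_{s,t}}{\partial x}(x)$ is the matrix solution $C_t$ of the equation, for $s \le t$,
$$ dC_t = f_x(t,\xi^*_{s,t}(x),u^*)\,C_t\,dt + \sum_{i=1}^n g_x^{(i)}(t,\xi^*_{s,t}(x))\,C_t\,dw^i_t \tag{3.2} $$
with $C_s = I$.
Here $I$ is the $d \times d$ identity matrix and $g^{(i)}$ is the $i$th column of $g$. From hypotheses $A_2$ and $A_3$, $f_x$ and $g_x$ are bounded. Writing $\|C\|_T = \sup_{s \le t \le T} |C_t|$, an application of Gronwall's, Jensen's and Burkholder's inequalities again implies $\|C\|_T$ is in $L^p$ for all $p$, $1 \le p < \infty$.
Consider the related matrix valued stochastic differential equation
$$ D_t = I - \int_s^t D_r\, f_x(r,\xi^*_{s,r}(x),u^*_r)\,dr - \sum_{i=1}^n \int_s^t D_r\, g_x^{(i)}(r,\xi^*_{s,r}(x))\,dw^i_r + \sum_{i=1}^n \int_s^t D_r \big(g_x^{(i)}(r,\xi^*_{s,r}(x))\big)^2\,dr. \tag{3.3} $$
Then it can be checked, by applying the Ito product rule to $D_t C_t$, that $D_t C_t = I$ for $t \ge s$, so that $D_t$ is the inverse of the Jacobian, that is $D_t = \big(\frac{\partial \xi^*_{s,t}}{\partial x}(x)\big)^{-1}$. Again, because $f_x$ and $g_x$ are bounded we have that $\|D\|_T$ is in every $L^p$, $1 \le p < \infty$.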
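The relation between the Jacobian equation (3.2) and its inverse (3.3) can be checked numerically in a scalar case. The sketch below is an illustration under assumptions that are not in the paper: the coefficients f(x) = -x and g(x) = 0.3 sin(x), the Euler discretization, and the function name `flow_jacobian_inverse`. It integrates the flow, the process C of (3.2) and the process D of (3.3) with the same Brownian increments, and also estimates the derivative of the flow by a finite difference.

```python
import math
import random


def flow_jacobian_inverse(x=0.3, T=1.0, n_steps=10000, seed=1, eps=1e-6):
    """Euler scheme for a scalar flow, its Jacobian C (3.2) and inverse D (3.3).

    Illustrative coefficients (assumptions): f(x) = -x, g(x) = 0.3*sin(x),
    so f_x = -1 and g_x(x) = 0.3*cos(x) are bounded as required by A2, A3.
    Returns (C_T, D_T, finite-difference estimate of d xi_T / dx).
    """
    rng = random.Random(seed)
    dt = T / n_steps
    sq = math.sqrt(dt)
    xi, xi_eps = x, x + eps   # two starting points driven by the same noise
    C, D = 1.0, 1.0           # C_s = D_s = I
    for _ in range(n_steps):
        dw = rng.gauss(0.0, sq)
        fx = -1.0
        gx = 0.3 * math.cos(xi)
        # (3.3): dD = -D f_x dt - D g_x dw + D g_x^2 dt
        D += -D * fx * dt - D * gx * dw + D * gx * gx * dt
        # (3.2): dC = f_x C dt + g_x C dw
        C += fx * C * dt + gx * C * dw
        # Euler steps for the flow from x + eps and from x
        xi_eps += -xi_eps * dt + 0.3 * math.sin(xi_eps) * dw
        xi += -xi * dt + 0.3 * math.sin(xi) * dw
    return C, D, (xi_eps - xi) / eps
```

With a small time step the product `D * C` stays close to 1, and the finite difference agrees with `C`, which is what the identification of C as the Jacobian and of D as its inverse predicts. The agreement of `D * C` with 1 is only approximate because the Euler scheme discretizes the quadratic variation term.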
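For a $d$-dimensional semimartingale $z_t$ Bismut [2] shows one can consider the flow $\xi^*_{s,t}(z_t)$ and gives the semimartingale representation of this process. In fact if $z_t = z_s + A_t + \sum_{i=1}^n \int_s^t H_i\,dw^i_r$ is the $d$-dimensional semimartingale, Bismut's formula states that
$$ \begin{aligned} \xi^*_{s,t}(z_t) = z_s &+ \int_s^t \Big( f(r,\xi^*_{s,r}(z_r),u^*_r) + \sum_{i=1}^n g_x^{(i)}(r,\xi^*_{s,r}(z_r))\,\frac{\partial \xi^*_{s,r}}{\partial x}(z_r)\,H_i + \frac{1}{2}\sum_{i=1}^n \frac{\partial^2 \xi^*_{s,r}}{\partial x^2}(z_r)(H_i,H_i) \Big)\,dr \\ &+ \int_s^t \frac{\partial \xi^*_{s,r}}{\partial x}(z_r)\,dA_r + \sum_{i=1}^n \int_s^t \Big( g^{(i)}(r,\xi^*_{s,r}(z_r)) + \frac{\partial \xi^*_{s,r}}{\partial x}(z_r)\,H_i \Big)\,dw^i_r. \end{aligned} \tag{3.4} $$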
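DEFINITION 3.1. We shall consider perturbations of the optimal control $u^*$ of the following kind: For $s \in [0,T)$, $h > 0$ such that $0 \le s < s+h \le T$, for any other admissible control $\tilde{u} \in \underline{U}$ and $A \in Y_s$, define a strong variation of $u^*$ by
$$ u(t,\omega) = \begin{cases} u^*(t,\omega) & \text{if } (t,\omega) \notin [s,s+h] \times A, \\ \tilde{u}(t,\omega) & \text{if } (t,\omega) \in [s,s+h] \times A. \end{cases} $$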
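The construction of a strong variation is a simple switching rule, which the following sketch makes concrete. The representation of controls as Python functions of `(t, y_path)`, the indicator `in_A` and the name `strong_variation` are assumptions made for illustration; how an observation path is encoded is left open.

```python
def strong_variation(u_star, u_tilde, s, h, in_A):
    """Return the control equal to u_tilde on [s, s+h] x A and to u_star elsewhere.

    u_star, u_tilde: functions (t, y_path) -> control value.
    in_A: function y_path -> bool, the indicator of an event A in Y_s
          (it should depend only on the observations up to time s).
    """
    def u(t, y_path):
        if s <= t <= s + h and in_A(y_path):
            return u_tilde(t, y_path)
        return u_star(t, y_path)
    return u
```

For example, with `u_star = lambda t, y: 0.0` and `u_tilde = lambda t, y: 1.0`, the call `strong_variation(u_star, u_tilde, 0.3, 0.1, in_A)` returns a control that takes the value 1.0 only for times in [0.3, 0.4] and observation paths in A.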
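Applying (3.4) as in Theorem 5.1 of [4] we have the following result.
THEOREM 3.2. For the perturbation $u$ of the optimal control $u^*$ consider the process
$$ z_t = x + \int_s^t \Big(\frac{\partial \xi^*_{s,r}}{\partial x}(z_r)\Big)^{-1} \big( f(r,\xi^*_{s,r}(z_r),u_r) - f(r,\xi^*_{s,r}(z_r),u^*_r) \big)\,dr. \tag{3.5} $$
Then the process $\xi^*_{s,t}(z_t)$ is indistinguishable from $\xi^u_{s,t}(x)$.
PROOF. Note the equation defining $z_t$ involves only an integral in time; there is no martingale term, so to apply (3.4) we have $H_i = 0$ for all $i$. Therefore, from (3.4)
$$ \begin{aligned} \xi^*_{s,t}(z_t) = x &+ \int_s^t f(r,\xi^*_{s,r}(z_r),u^*_r)\,dr \\ &+ \int_s^t \frac{\partial \xi^*_{s,r}}{\partial x}(z_r)\Big(\frac{\partial \xi^*_{s,r}}{\partial x}(z_r)\Big)^{-1} \big( f(r,\xi^*_{s,r}(z_r),u_r) - f(r,\xi^*_{s,r}(z_r),u^*_r) \big)\,dr + \int_s^t g(r,\xi^*_{s,r}(z_r))\,dw_r \\ = x &+ \int_s^t f(r,\xi^*_{s,r}(z_r),u_r)\,dr + \int_s^t g(r,\xi^*_{s,r}(z_r))\,dw_r. \end{aligned} $$
That is, $\xi^*_{s,t}(z_t)$ satisfies (3.1) with control $u$. However, the solution of (3.1) is unique so
$$ \xi^*_{s,t}(z_t) = \xi^u_{s,t}(x). $$
REMARKS 3.3. Note that the perturbation $u(t)$ equals $u^*(t)$ if $t > s+h$ so $z_t = z_{s+h}$ if $t > s+h$ and
$$ \xi^u_{s,t}(x) = \xi^*_{s,t}(z_{s+h}) = \xi^*_{s+h,t}\big(\xi^u_{s,s+h}(x)\big). $$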
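Theorem 3.2 can be verified directly in a linear special case, where the flow of the optimal system is affine in its starting point. The sketch below is an illustration only, under assumptions not taken from the paper: the scalar model d xi = (-xi + u) dt + sigma dw, the choices u* = 0 and perturbing value 1 on the first `h_steps` steps after time s, the Euler discretization, and the function name `check_theorem_3_2`. In this model the Euler flow has Jacobian (1 - dt)^k after k steps, so composing the flow with z is a one-line formula.

```python
import math
import random


def check_theorem_3_2(x=1.0, h_steps=30, n_steps=100, dt=0.01, sigma=0.4, seed=2):
    """Discrete-time check of Theorem 3.2 for d xi = (-xi + u) dt + sigma dw.

    Time is measured from s (step 0 is time s). Assumed controls: u* = 0 and
    the strong variation takes the value 1 on the first h_steps steps.
    Returns (max over time of |xi*_{s,t}(z_t) - xi^u_{s,t}(x)|, final z).
    """
    rng = random.Random(seed)
    sq = math.sqrt(dt)
    xi_star = x   # flow under u* started from x at time s
    xi_u = x      # flow under the perturbed control u
    z = x         # the process z_t of (3.5)
    jac = 1.0     # Jacobian of the Euler flow of xi*, here (1 - dt)^k
    max_gap = 0.0
    for k in range(n_steps):
        u_star = 0.0
        u = 1.0 if k < h_steps else u_star
        dw = rng.gauss(0.0, sq)
        xi_star += (-xi_star + u_star) * dt + sigma * dw
        xi_u += (-xi_u + u) * dt + sigma * dw
        jac *= (1.0 - dt)
        # Euler step for (3.5): dz = (Jacobian)^{-1} (f(., u) - f(., u*)) dt
        z += (u - u_star) * dt / jac
        # For this linear flow, xi*_{s,t}(z) = xi*_{s,t}(x) + jac * (z - x).
        composed = xi_star + jac * (z - x)
        max_gap = max(max_gap, abs(composed - xi_u))
    return max_gap, z
```

The returned gap is zero up to rounding, and z stops changing once the perturbation interval has ended, in agreement with Remarks 3.3. For nonlinear coefficients the composition of the flow with z would have to be computed by restarting the flow, which this sketch does not attempt.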
4. AUGMENTED FLOWS.
Consider the augmented flow which includes as an extra coordinate the stochastic exponential $Z^*_{s,t}$ with a 'variable' initial condition $z \in R$ for $Z^*_{s,s}(\cdot)$. That is, consider the $(d+1)$ dimensional system given by:
$$ \xi^*_{s,t}(x) = x + \int_s^t f(r,\xi^*_{s,r}(x),u^*_r)\,dr + \int_s^t g(r,\xi^*_{s,r}(x))\,dw_r, $$
$$ Z^*_{s,t}(x,z) = z + \int_s^t Z^*_{s,r}(x,z)\,h(\xi^*_{s,r}(x))'\,dy_r. $$
Therefore,
$$ Z^*_{s,t}(x,z) = z\,Z^*_{s,t}(x) = z \exp\Big( \int_s^t h(\xi^*_{s,r}(x))'\,dy_r - \tfrac{1}{2}\int_s^t |h(\xi^*_{s,r}(x))|^2\,dr \Big) $$
and we see there is a version of the enlarged system defined for each trajectory $y$ by integrating by parts the stochastic integral. The augmented map $(x,z) \to (\xi^*_{s,t}(x), Z^*_{s,t}(x,z))$ is then almost surely a diffeomorphism of $R^{d+1}$. Note that $\partial f/\partial z = 0$, $\partial g/\partial z = 0$ and $\partial Z^*_{s,t}(x,z)/\partial z = Z^*_{s,t}(x)$. The Jacobian of this augmented map is, therefore, represented by the matrix
$$ \begin{pmatrix} \dfrac{\partial \xi^*_{s,t}}{\partial x}(x) & 0 \\[2mm] \dfrac{\partial Z^*_{s,t}}{\partial x}(x,z) & \dfrac{\partial Z^*_{s,t}}{\partial z}(x,z) \end{pmatrix} $$
and for $1 \le i \le d$, from equation (3.2),
$$ \frac{\partial Z^*_{s,t}(x,z)}{\partial x_i} = \int_s^t \Big( \frac{\partial Z^*_{s,r}(x,z)}{\partial x_i}\,h^k(\xi^*_{s,r}(x)) + Z^*_{s,r}(x,z)\,\frac{\partial h^k}{\partial \xi_j}(\xi^*_{s,r}(x))\,\frac{\partial \xi^{*,j}_{s,r}(x)}{\partial x_i} \Big)\,dy^k_r. \tag{4.1} $$
(Here the double index $k$ is summed from 1 to $m$, and the double index $j$ from 1 to $d$.)
We shall be interested in the solution of this differential system (4.1) only in the situation when $z = 1$ so we shall write $Z^*_{s,t}(x)$ for $Z^*_{s,t}(x,1)$. The following result is motivated by formally differentiating the exponential formula for $Z^*_{s,t}(x)$.
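LEMMA 4.1.
$$ \frac{\partial Z^*_{s,t}(x)}{\partial x} = Z^*_{s,t}(x) \Big( \int_s^t h_x(\xi^*_{s,r}(x))\,\frac{\partial \xi^*_{s,r}(x)}{\partial x}\,dv_r \Big), $$
where $v = (v^1,\dots,v^m)$ is the Brownian motion in the observation process.
PROOF. From (4.1) we see $\frac{\partial Z^*_{s,t}(x)}{\partial x}$ is the solution of the stochastic differential equation
$$ \frac{\partial Z^*_{s,t}(x)}{\partial x} = \int_s^t \Big( \frac{\partial Z^*_{s,r}(x)}{\partial x}\,h'(\xi^*_{s,r}(x)) + Z^*_{s,r}(x)\,h_x(\xi^*_{s,r}(x))\,\frac{\partial \xi^*_{s,r}(x)}{\partial x} \Big)\,dy_r. \tag{4.2} $$
Write
$$ L_{s,t}(x) = Z^*_{s,t}(x) \Big( \int_s^t h_x(\xi^*_{s,r}(x))\,\frac{\partial \xi^*_{s,r}(x)}{\partial x}\,dv_r \Big) $$
where
$$ dy_t = h(\xi^*_{s,t}(x))\,dt + dv_t. $$
Because
$$ Z^*_{s,t}(x) = 1 + \int_s^t Z^*_{s,r}(x)\,h'(\xi^*_{s,r}(x))\,dy_r $$
the product rule gives
$$ \begin{aligned} L_{s,t}(x) &= \int_s^t Z^*_{s,r}(x)\,h_x\,\frac{\partial \xi^*_{s,r}}{\partial x}\,dv_r + \int_s^t \Big( \int_s^r h_x\,\frac{\partial \xi^*_{s,\sigma}}{\partial x}\,dv_\sigma \Big) Z^*_{s,r}(x)\,h'(\xi^*_{s,r}(x))\,dy_r + \int_s^t Z^*_{s,r}(x)\,h'(\xi^*_{s,r}(x)) \cdot h_x\,\frac{\partial \xi^*_{s,r}}{\partial x}\,dr \\ &= \int_s^t L_{s,r}(x)\,h'(\xi^*_{s,r}(x))\,dy_r + \int_s^t Z^*_{s,r}(x)\,h_x\,\frac{\partial \xi^*_{s,r}}{\partial x}\,dy_r. \end{aligned} $$
Therefore, $L_{s,t}(x)$ is also a solution of (4.2) so by uniqueness
$$ L_{s,t}(x) = \frac{\partial Z^*_{s,t}(x)}{\partial x}. $$
REMARKS 4.2. As noted at the beginning of this section we can consider the augmented flow
$$ (x,z) \to \big(\xi^*_{s,t}(x), Z^*_{s,t}(x,z)\big) \quad \text{for } x \in R^d,\ z \in R, $$
and we are only interested in the situation when $z = 1$, so we write $Z^*_{s,t}(x)$.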
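The formula of Lemma 4.1 can be compared with a finite-difference derivative of the stochastic exponential along fixed noise paths. The sketch below is illustrative and rests on assumptions not found in the paper: a scalar model with f(x) = -x, constant g = 0.5 and h(x) = tanh(x), an Euler discretization, and the function names `simulate_noise` and `z_and_derivative`. With constant g the Jacobian of the Euler flow is the deterministic factor (1 - dt)^k.

```python
import math
import random


def simulate_noise(n_steps, dt, seed=3):
    """Draw fixed increments for the state noise w and the observation y."""
    rng = random.Random(seed)
    sq = math.sqrt(dt)
    dws = [rng.gauss(0.0, sq) for _ in range(n_steps)]
    dys = [rng.gauss(0.0, sq) for _ in range(n_steps)]
    return dws, dys


def z_and_derivative(x, dws, dys, dt):
    """Return (Z, dZ/dx computed from the formula of Lemma 4.1).

    Illustrative model (assumptions): f(x) = -x, g = 0.5, h(x) = tanh(x).
    """
    xi, C = x, 1.0
    log_z, stoch_int = 0.0, 0.0
    for dw, dy in zip(dws, dys):
        h = math.tanh(xi)
        hx = 1.0 - h * h                     # derivative of tanh
        dv = dy - h * dt                     # dv = dy - h(xi) dt
        log_z += h * dy - 0.5 * h * h * dt   # exponent of the stochastic exponential
        stoch_int += hx * C * dv             # integral of h_x (d xi / dx) dv
        xi += -xi * dt + 0.5 * dw            # Euler step for the state
        C *= (1.0 - dt)                      # Jacobian of the Euler flow (g constant)
    z = math.exp(log_z)
    return z, z * stoch_int
```

A central difference `(Z(x + eps) - Z(x - eps)) / (2 * eps)` computed with the same `dws` and `dys` should agree with the second returned value, because in the discretized model the formula of Lemma 4.1 is exactly the chain rule applied to the exponent.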
LEMMA 4.3. $Z^*_{s,t}(z_t) = Z^u_{s,t}(x)$ where $z_t$ is the semimartingale defined in (3.5).
PROOF. $Z^u_{s,t}(x)$ is the process uniquely defined by
$$ Z^u_{s,t}(x) = 1 + \int_s^t Z^u_{s,r}(x)\,h'(\xi^u_{s,r}(x))\,dy_r. \tag{4.3} $$
Consider an augmented $(d+1)$ dimensional version of (3.5) defining a semimartingale $\bar{z}_t = (z_t, 1)$, so the additional component is always identically 1. Then applying (3.4) to the new component of the augmented process we have
$$ \begin{aligned} Z^*_{s,t}(z_t) &= 1 + \int_s^t Z^*_{s,r}(z_r)\,h'(\xi^*_{s,r}(z_r))\,dy_r \\ &= 1 + \int_s^t Z^*_{s,r}(z_r)\,h'(\xi^u_{s,r}(x))\,dy_r \end{aligned} $$
by Theorem 3.2. However, (4.3) has a unique solution so $Z^*_{s,t}(z_t) = Z^u_{s,t}(x)$.
REMARKS 4.4. Note that for $t > s+h$
$$ Z^*_{s,t}(z_t) = Z^*_{s,t}(z_{s+h}). $$
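5. THE MINIMUM PRINCIPLE.
Control $u$ will be the perturbation of the optimal control $u^*$ as in Definition 3.1. We shall write $x = \xi^*_{0,s}(x_0)$. Then the minimum cost is
$$ J(u^*) = E\big[Z^*_{0,T}(x_0)\,c(\xi^*_{0,T}(x_0))\big] = E\big[Z^*_{0,s}(x_0)\,Z^*_{s,T}(x)\,c(\xi^*_{s,T}(x))\big]. $$
The cost corresponding to the perturbed control $u$ is
$$ J(u) = E\big[Z^u_{0,s}(x_0)\,Z^u_{s,T}(x)\,c(\xi^u_{s,T}(x))\big] = E\big[Z^*_{0,s}(x_0)\,Z^*_{s,T}(z_{s+h})\,c(\xi^*_{s,T}(z_{s+h}))\big] $$
by Theorem 3.2 and Lemma 4.3, since $u = u^*$ on $[0,s)$. Now $Z^*_{s,T}(\cdot)$ and $c(\xi^*_{s,T}(\cdot))$ are differentiable with continuous and uniformly integrable derivatives. Therefore
$$ \begin{aligned} J(u) - J(u^*) &= E\big[Z^*_{0,s}(x_0)\big( Z^*_{s,T}(z_{s+h})\,c(\xi^*_{s,T}(z_{s+h})) - Z^*_{s,T}(x)\,c(\xi^*_{s,T}(x)) \big)\big] \\ &= E\Big[ \int_s^{s+h} \Gamma(s,z_r)\big( f(r,\xi^*_{s,r}(z_r),u_r) - f(r,\xi^*_{s,r}(z_r),u^*_r) \big)\,dr \Big] \end{aligned} $$
where
$$ \Gamma(s,z_r) = Z^*_{0,s}(x_0)\,Z^*_{s,T}(z_r) \Big\{ c_\xi(\xi^*_{s,T}(z_r))\,\frac{\partial \xi^*_{s,T}(z_r)}{\partial x} + c(\xi^*_{s,T}(z_r)) \Big( \int_s^T h_x(\xi^*_{s,\sigma}(z_r))\,\frac{\partial \xi^*_{s,\sigma}(z_r)}{\partial x}\,dv_\sigma \Big) \Big\} \Big( \frac{\partial \xi^*_{s,r}(z_r)}{\partial x} \Big)^{-1}. $$
Note that this expression gives an explicit formula for the change in the cost resulting from a variation in the optimal control. The only remaining problem is to justify differentiating the right hand side.
From Lemma 2.3, $Z$ is in every $L^p$ space, $1 \le p < \infty$, and from the remarks at the beginning of Section 3, $C_T = \frac{\partial \xi^*_{s,T}}{\partial x}$ and $D_T = \big(\frac{\partial \xi^*_{s,T}}{\partial x}\big)^{-1}$ are in every $L^p$ space, $1 \le p < \infty$. Consequently, $\Gamma$ is in every $L^p$ space, $1 \le p < \infty$.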
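Therefore
$$ \begin{aligned} J(u) - J(u^*) &= E\int_s^{s+h} \big(\Gamma(s,z_r) - \Gamma(s,x)\big)\big( f(r,\xi^*_{s,r}(z_r),u_r) - f(r,\xi^*_{s,r}(z_r),u^*_r) \big)\,dr \\ &\quad + E\int_s^{s+h} \big(\Gamma(s,x) - \Gamma(r,x)\big)\big( f(r,\xi^*_{s,r}(z_r),u_r) - f(r,\xi^*_{s,r}(z_r),u^*_r) \big)\,dr \\ &\quad + E\int_s^{s+h} \Gamma(r,x)\big( f(r,\xi^*_{s,r}(z_r),u_r) - f(r,\xi^*_{s,r}(z_r),u^*_r) - f(r,\xi^*_{s,r}(x),u_r) + f(r,\xi^*_{s,r}(x),u^*_r) \big)\,dr \\ &\quad + E\int_s^{s+h} \Gamma(r,x)\big( f(r,\xi^*_{0,r}(x_0),u_r) - f(r,\xi^*_{0,r}(x_0),u^*_r) \big)\,dr \\ &= I_1(h) + I_2(h) + I_3(h) + I_4(h), \text{ say.} \end{aligned} $$
(In the last integral we have used $\xi^*_{s,r}(x) = \xi^*_{0,r}(x_0)$.) Now,
$$ \begin{aligned} |I_1(h)| &\le K_1\,E\int_s^{s+h} |\Gamma(s,z_r) - \Gamma(s,x)|\,\big(1 + \|\xi^u(x_0)\|_{s+h}\big)\,dr \\ &\le K_1 h \sup_{s \le r \le s+h} E\big[\,|\Gamma(s,z_r) - \Gamma(s,x)|\,\big(1 + \|\xi^u(x_0)\|_{s+h}\big)\big]. \end{aligned} $$
Similarly,
$$ |I_2(h)| \le K_2 h \sup_{s \le r \le s+h} E\big[\,|\Gamma(s,x) - \Gamma(r,x)|\,\big(1 + \|\xi^u(x_0)\|_{s+h}\big)\big], $$
and, because $f_x$ is bounded, $|I_3(h)|$ is bounded by $K_3 \int_s^{s+h} (\cdots)\,dr$ for some constant $K_3$, with an integrand controlled by $|\Gamma(r,x)|$ and $\|x - z\|_{s+h}$. Furthermore,
$$ \lim_{r \to s} |\Gamma(s,z_r) - \Gamma(s,x)| = 0 \quad \text{a.s.}, $$
$$ \lim_{r \to s} |\Gamma(s,x) - \Gamma(r,x)| = 0 \quad \text{a.s.}, $$
$$ \lim_{h \to 0} \|x - z\|_{s+h} = 0. $$
Therefore,
$$ \lim_{r \to s} \|\Gamma(s,z_r) - \Gamma(s,x)\|_p = 0, $$
$$ \lim_{r \to s} \|\Gamma(s,x) - \Gamma(r,x)\|_p = 0 $$
and $\lim_{h \to 0} \big\|\,\|x - z\|_{s+h}\big\|_p = 0$ for some $p$.
Consequently, $\lim_{h \to 0} h^{-1} I_k(h) = 0$, for $k = 1,2,3$.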
The only remaining problem concerns the differentiability of
$$ I_4(h) = \int_s^{s+h} E\big[\Gamma(r,x)\big( f(r,\xi^*_{0,r}(x_0),u_r) - f(r,\xi^*_{0,r}(x_0),u^*_r) \big)\big]\,dr. $$
The integrand is almost surely in $L^1([0,T])$ so $\lim_{h \to 0} h^{-1} I_4(h)$ exists for almost every $s \in [0,T]$. However, the set of times $\{s\}$ where the limit may not exist might depend on the control $u$. Consequently we must restrict the perturbations $u$ of the optimal control $u^*$ to perturbations from a countable dense set of controls. In fact:
1) Because the trajectories are, almost surely, continuous, $Y_\rho$ is countably generated by sets $\{A_{i\rho}\}$, $i = 1,2,\dots$ for any rational number $\rho \in [0,T]$. Consequently $Y_t$ is countably generated by the sets $\{A_{i\rho}\}$, $\rho \le t$.
2) Let $G_t$ denote the set of measurable functions from $(\Omega, Y_t)$ to $U \subset R^k$. (If $u \in \underline{U}$ then $u(t,\omega) \in G_t$.) Using the $L^1$-norm, as in [5], there is a countable dense subset $H_\rho = \{u_{j\rho}\}$ of $G_\rho$, for rational $\rho \in [0,T]$. If $H_t = \bigcup_{\rho \le t} H_\rho$ then $H_t$ is a countable dense subset of $G_t$. If $u_{j\rho} \in H_\rho$ then, as a function constant in time, $u_{j\rho}$ can be considered as an admissible control over the time interval $[t,T]$ for $t \ge \rho$.
3) The countable family of perturbations is obtained by considering sets $A_{i\rho} \in Y_t$, functions $u_{j\rho} \in H_t$, where $\rho \le t$, and defining as in 3.1
$$ u^*_{j\rho}(s,\omega) = \begin{cases} u^*(s,\omega) & \text{if } (s,\omega) \notin [t,T] \times A_{i\rho}, \\ u_{j\rho}(s,\omega) & \text{if } (s,\omega) \in [t,T] \times A_{i\rho}. \end{cases} $$
Then for each $i,j,\rho$
$$ \lim_{h \to 0} h^{-1} \int_s^{s+h} E\big[\Gamma(r,x)\big( f(r,\xi^*_{0,r}(x_0),u^*_{j\rho}) - f(r,\xi^*_{0,r}(x_0),u^*) \big)\big]\,dr \tag{5.1} $$
exists and equals
$$ E\big[\Gamma(s,x)\big( f(s,\xi^*_{0,s}(x_0),u_{j\rho}) - f(s,\xi^*_{0,s}(x_0),u^*) \big) I_{A_{i\rho}}\big] $$
for almost all $s \in [0,T]$.
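Therefore, considering this perturbation we have
$$ \lim_{h \to 0} h^{-1}\big( J(u^*_{j\rho}) - J(u^*) \big) = E\big[\Gamma(s,x)\big( f(s,\xi^*_{0,s}(x_0),u_{j\rho}) - f(s,\xi^*_{0,s}(x_0),u^*) \big) I_{A_{i\rho}}\big] \ge 0 $$
for almost all $s \in [0,T]$.
Consequently there is a set $S \subset [0,T]$ of zero Lebesgue measure such that, if $s \notin S$, the limit in (5.1) exists for all $i,j,\rho$, and gives
$$ E\big[\Gamma(s,x)\big( f(s,\xi^*_{0,s}(x_0),u_{j\rho}) - f(s,\xi^*_{0,s}(x_0),u^*) \big) I_{A_{i\rho}}\big] \ge 0. $$
Using the monotone class theorem, and approximating an arbitrary admissible control $u \in \underline{U}$, we can deduce that if $s \notin S$
$$ E\big[\Gamma(s,x)\big( f(s,\xi^*_{0,s}(x_0),u) - f(s,\xi^*_{0,s}(x_0),u^*) \big) I_A\big] \ge 0 \quad \text{for any } A \in Y_s. \tag{5.2} $$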
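Write
$$ p_s(x) = E^*\Big[ c_\xi(\xi^*_{0,T}(x_0))\,\frac{\partial \xi^*_{s,T}(x)}{\partial x} + c(\xi^*_{0,T}(x_0)) \Big( \int_s^T h_x(\xi^*_{s,\sigma}(x))\,\frac{\partial \xi^*_{s,\sigma}(x)}{\partial x}\,dv_\sigma \Big) \,\Big|\, Y_s \vee \{x\} \Big] $$
where, as before, $x = \xi^*_{0,s}(x_0)$ and $E^*$ denotes expectation under $P^* = P^{u^*}$. Then $p_s(x)$ is the co-state variable and we have in (5.2) proved the following 'conditional' minimum principle:
THEOREM 5.1. If $u^* \in \underline{U}$ is an optimal control there is a set $S \subset [0,T]$ of zero Lebesgue measure such that if $s \notin S$
$$ E^*\big[p_s(x) f(s,x,u^*) \mid Y_s\big] \le E^*\big[p_s(x) f(s,x,u) \mid Y_s\big] \quad \text{a.s.} $$
That is, the optimal control $u^*$ almost surely minimizes the conditional Hamiltonian and the adjoint variable is $p_s(x)$.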
6. CONCLUSION.
Using the theory of stochastic flows the effect of a perturbation of an optimal control is explicitly calculated. The only difficulty was to justify its differentiation. The adjoint process is explicitly identified as $p_s(x)$.
THEOREM 6.1. If $f$ is differentiable in the control variable $u$, and if the random variable $x = \xi^*_{0,s}(x_0)$ has a conditional density $q_s(x)$ under the measure $P^*$, then the inequality of Theorem 5.1 implies
$$ \sum_{j=1}^k \big(u_j(s) - u^*_j(s)\big) \int_{R^d} p_s(x)\,\frac{\partial f}{\partial u_j}(s,x,u^*)\,q_s(x)\,dx \ge 0. $$
This is the result of Bensoussan's paper [1].
The method of this paper can be applied to completely observable systems by initially considering 'stochastic open loop' controls, systems with stochastic constraints and deterministic systems. The adjoint process can be explicitly identified. 'Almost minimum' principles for 'almost optimal' controls can be obtained. Some of these will be discussed in later work.
References
[1] A. Bensoussan, Maximum principle and dynamic programming approaches of the optimal control of partially observed diffusions. Stochastics 9 (1983), 169-222.
[2] J.M. Bismut, A generalized formula of Ito and some other properties of stochastic flows. Zeits. für Wahrs. 55 (1981), 331-350.
[3] J.M. Bismut, Mécanique Aléatoire. Lecture Notes in Mathematics 866, Springer-Verlag, 1981.
[4] J.M. Bismut, Mécanique Aléatoire. In Ecole d'Eté de Probabilités de Saint-Flour X. Lecture Notes in Mathematics 929, pp. 1-100. Springer-Verlag, 1982.
[5] R.J. Elliott, The optimal control of a stochastic system. SIAM Jour. Control and Opt. 15 (1977), 756-778.
[6] R.J. Elliott, Filtering and control for point process observations. Systems Science Center, University of Maryland. Springer Lecture Notes, to appear.
[7] U.G. Haussmann, General necessary conditions for optimal control of stochastic systems. Math. Programming Studies 6 (1976), 30-48.
[8] U.G. Haussmann, A Stochastic Minimum Principle for Optimal Control of Diffusions. Pitman Research Notes in Mathematics 151, Longman U.K., 1986.
[9] U.G. Haussmann, The maximum principle for optimal control of diffusions with partial information. SIAM J. Control and Optimization 25 (1987), 341-361.
[10] H. Kunita, On the decomposition of solutions of stochastic differential equations. Lecture Notes in Mathematics 851 (1980), 213-255.