optimal control: applications in risk management and ...the organized mind: thinking straight in the...

Optimal Control: Applications in Risk Management andOnline Markets∗

Albert Cohen

Actuarial Sciences ProgramDepartment of Mathematics

Department of Statistics and ProbabilityMichigan State UniversityEast Lansing, MI [email protected]@stt.msu.edu

∗Joint work with Milan Bradonjic (Bell Labs) and Matt Causley (Kettering U.)

Mathematics Department Colloquium

Albert Cohen (MSU) Albion Math Department Colloquium Albion College April 2017 1 / 48

Outline

1 Stochastic Optimal Control for Online Seller under ReputationalMechanisms

Modeling Agent BehaviorStochastic Model and SolutionHamilton-Jacobi-Bellman formulationMarket share based pricing mechanismReduced modelNumerical results for the reduced modelNumerical results for the full model

2 Conclusions


Problem

Working as an individual agent, in selling an asset or collection of assetswithin a network like Amazon or eBay, we can:

Work a little harder.

Manage our reputation or brand within a strategy that takesadvantage of our extra capacity to work.

Model how our online reputation affects our bottom line, and act in away that it is ”optimal” (maximize expected profit, for example.)

Question: Can we leverage our skills in continuous-time modeling ofstochastic optimal control to design such an optimal strategy ? As a firstapproach, we don’t consider shipping considerations or other factorsavailable to a buyer, only ranking 1

1Bradonjic, M., Causley, M., and Cohen, A. Stochastic Optimal Control forOnline Seller under Reputational Mechanisms. Risks. 2015 Dec 4;3(4):553-72.


Online Marketplace

In this work, we consider sellers in an online market like Amazonwithout a bidding mechanism. The setting may include concurrentonline auctions on a site like eBay, for a product or service with only abuy-it-now option, under the assumption that buyers can comparereputation among online platforms.

In this online enviroment, the buyer and seller have a certain amountof anonymity, which has implications on the fairness of thetransaction. One of the only counter measures to ensure fairness isthe use of feedback forums, such as Amazon, eBay, and Yahoo!, atonline sites.

The feedback provided by customers provides a ranking system forthe sellers, which in turn rewards fair sellers and penalizes unfair orunreliable sellers. Since it is in the sellers best interest to maximizetheir profits, it follows that a seller will seek to use reputation as ameans to optimize their decision making.


Online Marketplace

We propose a simple behavioral model to study the relationshipbetween a seller’s wealth and reputation in the online environmentwith only the buy-it-now option. The wealth and growth of the sellerare described by continuous stochastic processes.

Note that our model is a continuous approximation of a discretesales/marketing/processing process. The discrete model of theprocess may be less tractable but still very interesting and useful.


Online Marketplace

Online sellers can be broadly categorized by two types of behavior:

those focused on advertising and customer service, which includesfollowing up with past customers; andthose who instead focus only on processing existing orders.

While the latter behavior will likely lower the seller reputation onaverage, this immediate increase in wealth may offset the long-termdamage from a lower reputation. With this in mind, it is compellingto ask whether there exists an optimal long term strategy, in which aseller attempts to maximize her profit by achieving a fixed reputation.


Online Marketplace

This work incorporates the idea that humans do not multitask well,but rather switch between tasks 2 , 3.

In other words, switching costs do not factor in for an individualagent, but only for large online retailers who have marketingdepartment reorganization costs.

Additionally, the model in this work allows all sellers to have a littleextra capacity/money for either shipping or advertising (e.g. writing aletter or similar individual campaign), which is rather small and notunlimited

2https://earlkmiller.org/archives/3Levitin, Daniel J. The organized mind: Thinking straight in the age of

information overload. Penguin, 2014.Albert Cohen (MSU) Albion Math Department Colloquium Albion College April 2017 7 / 48

Agent Description

Consider a seller in an online marketplace as a triple (W ,R, µ) representingher wealth, reputation, and excess rate, respectively. Here, the excess rateis extra effort that can be allocated to either expedite payment or increaseher reputation. In this setting, the reputation R is a positive numberreflecting customer satisfaction. As an initial approach, we consider onlythe effects of reputation on the optimal way to grow her wealth, and leavethe more general model with shipping costs for future work.


Agent Description

Reputation R is observed in O := (0,∞) and the control µ isconstrained to U := [−ε, ε].Since this process will be stochastic, we introduce a probability space(Ω,F ,P) and corresponding reference probability systemsν = ((Ω, Fs ,P) ,B) where Fs ⊂ F and B is an Fs -adaptedBrownian motion.

The evolution of wealth and reputation is hence modeled by atwo-state Markov process,

dW(µ)t = (1− µt)h(R

(µ)t ) dt , (1)

dR(µ)t = (µt − κR

(µ)t )dt + σ dBt , (2)

where κ is a proportionality constant which accounts for meanreversion.


Agent Description

Also, τO is the first exit time from O, or ∞ if R ∈ O for all s ≥ 0. We set∞ as an absorbing state for reputation, and enforce that the µ are chosenfrom the admissible class Aν ,

Aν := B1 ∩ B2 ∩ B3B1 := µ ∈ Fs | µ is Fs−p.m.B2 := µ ∈ Fs | µ. ∈ U on [0,∞)

B3 :=

µ ∈ Fs | E

[ ∫ τO

0e−ρs

∣∣∣(1− µs)h(R(µ)s )

∣∣∣ ds] <∞ .(3)

Recall that p.m. denotes progressively measurable.


Mink-Seifert Model

The major difficulty is in defining the growth rate h(R). However, anexplicit mechanism has been proposed by Mink and Seifert 4. There theauthors not only propose, but empirically justify a growth rate of

h(R) = A + C

(1− 1

ln (e + R)

), (4)

where A relates to the inherent value of the object for sale and C is aparameter to be fitted. This is accomplished by obtaining data using anauction robot and then computing a single regression, which givesC = 2.50.

4Mink, Moritz, and Stefan Seifert. ”Reputation on eBay and its impact onsales prices.” Group decision and negotiation (GDN) 2006, InternationalConference, Karlsruhe, Germany. 2006.


Mink-Seifert Model

It should be noted that in many economic papers on the effect ofreputation on bidding and final sales price, the Mink-Seifert model isthe first one we found to give an explicit relationship betweenreputation and price.

To the best of our knowledge, their model is among the first to givean explicit relationship between reputation and price.

The Mink-Seifert model also suggests a multiple regression formulawhere other factors, such as shipping costs and whether a buy-it-nowprice is offered, are considered as well, in which case C = 1.93.


Hamilton-Jacobi-Bellman formulation

With the stochastic dynamics for reputation above, a growth ratemodel for reputational effect on sales, and an infinite horizon, weexpect that switching would depend only on the current reputationalstate.

We now seek a function V ∈ C 2[0,∞) ∩ Cp[0,∞) which is acandidate solution of the optimal control problem

V (R) = supν

supµ∈Aν

E[∫ τ0

0e−ρs(1− µs)h(R

(µ)s ) ds | R0 = R

]h(R) = A + C

(1− 1

ln (e + R)

)τ0 := inf t ≥ 0 | Rt = 0 .

(5)



One approach to finding V would be to solve (5) directly. For example, bythe definition of V , it follows that

V (0) = 0

V (∞) = supν

supµ∈Aν

E[∫ ∞

0e−ρs(1− µs)h(R

(µ)s )ds | R0 =∞

]= E

[∫ ∞0

e−ρs(1 + ε)h(∞)ds

]=

1 + ε

ρh(∞) .

(6)



However, we shall instead apply the principle of optimality, and in doing soarrive at the following nonlinear Hamilton-Jacobi-Bellman (HJB) boundaryvalue problem (suppressing the explicit dependence of R on µ)

0 = max−ε≤µ≤ε

[(µ− κR)

∂V

∂R+σ2

2

∂2V

∂R2+ (1− µ)h(R)− ρV

]V (0) = 0

V (∞) =1 + ε

ρh(∞)

h(R) = A + C

(1− 1

ln (e + R)

)= A + C

ln(1 + R

e

)1 + ln

(1 + R

e

) ,(7)



The previous problem simplifies to

ρV = h(R)− κR∂V

∂R+σ2

2

∂2V

∂R2+ ε∣∣∣∂V

∂R− h(R)

∣∣∣V (0) = 0

V (∞) =1 + ε

ρ(A + C )

h(R) = A + Cln(1 + R

e

)1 + ln

(1 + R

e

) .(8)



We therefore instead solve the HJB problem, which is justified by thefollowing standard theorem in Fleming-Soner (2006) 5:

Theorem

Let V ∈ C 2(O) ∩ Cb(O) be a solution to the boundary value problem (8).Then V is an optimal solution to the optimal control problem (5).

Remark We note here that it is sufficient to require polynomial growth onV , not bounded growth. However, as h is bounded above by A + C on O,we can restrict our attention to bounded V . Furthermore, an upper bound(and in fact limit as R →∞) for V is the value obtained by a sellerearning maximum value A + C on her items on the time interval [0,∞)while always processing orders.

5Fleming, Wendell H., and Halil Mete Soner. Controlled Markov processesand viscosity solutions. Vol. 25. Springer Science and Business Media, 2006.



Proof.

By Corollary IV.5.1 in Fleming-Soner (2006), since

(i) the drift and volatility are (uniformly) Lipschitz in R and µ,

(ii) ρ > 0, and

(iii) we impose an additional boundary condition on the solution asR →∞ as our infinite horizon is now [0, τ0) and not [0, τO),

it follows directly that our bounded classical solution V = V and thecontrol

µ∗(R) ∈ argmaxµ∈[−ε,ε]

[(µ− κR)

∂V

∂R+σ2

2

∂2V

∂R2+ (1− µ)h(R)− ρV

](9)

leads to an optimal, stationary Markov control µ∗(R(µ∗)s ).


Conversion from Reputation to Market Share

We observe that under its current definition, the reputation R ∈ O.In this section, we employ the mapping

Y := f (R) =R

1 + R, (10)

which results in a normalized market share reputationY ∈ Q := (0, 1).

This is consistent with Mink and Seifert’s assumption that the valuefunction h(R) be concave, and bounded. We therefore also simplify(4) accordingly, replacing logarithms with a rational function

h(R) = A + CR

1 + R. (11)



Consequently, we have dYt = f ′(R)dRt + 12 f ′′(R)dRtdRt and so

dYt = (1− Y )2[(µ− κ Y

1− Y

)dt + σ dBt

]+

1

2(−2(1− Y )3)σ2 dt

=[µ(1− Y )2 − κY (1− Y )− σ2(1− Y )3

]dt + σ(1− Y )2 dBt

h(Y ) = A + CY

V (y) = supν

supµ∈Aν

E[∫ τ0

0e−ρs(1− µs)h(Ys) ds | Y0 = y

].

(12)Note that a function that is bounded for market share Y ∈ (0, 1) is alsocorrespondingly bounded for all reputation R ∈ (0,∞). We therefore studythe market share model below, confident that our results will hold for thefull reputation model by virtue of the inverse map of (10).



After market share rescaling, the HJB now takes a form which is in factdegenerate at both endpoints y = 0, 1:

− σ2

2(1− y)4V ′′(y) + ρV = h(y)−

[κy(1− y) + σ2(1− y)3

]V ′(y)

+ ε∣∣∣(1− y)2V ′(y)− h(y)

∣∣∣V (0) = 0

V (1) =1 + ε

ρh(1) .

(13)



In theory, a closed form solution for this boundary value problem canbe formally constructed piecewise, to the left and right of the specialpoint y∗, for which h(y∗) = (1− y∗)2V ′(y∗). We shall refer to y∗ asthe switching point below, since it is the reputational value at whichthe seller switches from advertising to processing.

In practice, we shall resort to numerical computation to approximatethis solution, and in doing so estimate the value y∗.



Theorem

If V ∗∗ ∈ C 2(Q) is a monotonically increasing solution to the HJB problem(13), then V ∗∗ = V .

Proof.

By Theorem 5.1 in Fleming-Soner (2006), the construction of a solutionV ∗∗ on an open interval Q to (13) that resides in C 2(Q) ∩ Cp(Q) is infact equal to the optimal value function V . Enforcing the boundarycondition on V ∗∗(1) enforces not only polynomial growth on ourmonotonically increasing V ∗∗, but in fact bounded growth. Note thatQ = (0, 1) is open, but y = 1 is not reached in finite time from anywherein Q and the diffusion Y is in fact absorbed for all time in state Y = 1.Hence, the exact value of V ∗∗(1) imposed.


Reduced Model

With the previous mapping, note that the market (reputation) shareY is absorbed at Y = 1, which corresponds to R →∞.

Moreover, as in the Stochastic Nerlove evolution, we have that theprobability that Yt < 0 for some positive t is greater than 0, and thedrift term p2((1− Y );µ) is a second order polynomial in Y .

Based on these observations, we propose a reduced model forStochastic Reputation Share, expressed by the following stochasticdifferential equation and associated stochastic optimal controlproblem

V (y) = supν

supµ∈Aν

E[∫ τ0

0e−ρs(1− µs)(A + CYs) ds | Y0 = y

]dYt

1− Yt= µdt + σ dBt .

(14)


Reduced Model

This leads to a corresponding nonlinear HJB boundary value problem:

− σ2

2(1− y)2V ′′(y) + ρV = h(y) + ε

∣∣∣(1− y)V ′(y)− h(y)∣∣∣

V (0) = 0

V (1) =1 + ε

ρh(1) .

(15)

One of the most attractive features of the reduced model (15) is that ithas a closed form, piecewise defined analytic solution.


Reduced Model

Theorem

There exists a solution V ∗ ∈ C 2(Q) ∩ Cb(Q) to the reduced model (15).Furthermore, V ∗ = V , and is given by the piecewise solution

V ∗(y) =

V`(y), 0 ≤ y ≤ y∗,

Vr (y), y∗ ≤ y ≤ 1,(16)

where

V` = c1(1− y)γ`− + c2(1− y)γ

`+ + α` + β`y , (17)

Vr = c3(1− y)γr+ + αr + βry , (18)

with constants defined below.


Reduced Model

Proof.

The construction of this piecewise solution is found in the Appendix. Thesolution constructed is monotonically increasing in y . Enforcing theboundary condition on V ∗(1) uniformly bounds V ∗ on Q. The proof ofequality V ∗ = V then follows from Theorem 5.1 in Fleming-Soner (2006),as in our previous theorem.

Remark: Since the reduced model has a known closed form solution, wecan measure the error made in constructing numerical solutions, which inturn acts as a benchmark for the code we use to study the full model. It isuncommon to find a closed form solution for most optimal controlproblems.


Piecewise analytic solution of reduced model

We now construct the exact solution to the reduced HJB boundary valueproblem. The principal difficulty that must be overcome is the appearanceof an absolute value, which contains the unknown variable V ′. Wetherefore consider a piecewise defined solution V (y), which remains C 2

across the switching point y∗, defined by the vanishing of the expression inthe absolute value

(1− y∗)V ′(y∗) = h(y∗). (19)

We now separately consider the regions y < y∗ and y > y∗ in the reducedmodel (8), and define the corresponding linear differential operators L± as

L±[V ] =

(−σ

2

2(1− y)2

d2

dy2± ε(1− y)

d

dy+ ρ

)V . (20)



The boundary value problem can then be decomposed into twosmaller problems for V` and Vr , where boundary conditions provideone condition for each of V` and Vr .

The remaining two conditions are provided by enforcing C 2

smoothness of the solution at the switching point (19). Hence wenow formulate two (well-posed) boundary value problems.

Once solved, enforcing continuity uniquely defines the value of theswitching point y∗:

V`(y∗) = Vr (y∗). (21)



L−[V`] = (1− ε)h(y) 0 < y < y∗, (22)

V`(0) = 0, (1− y∗)V ′`(y∗) = h(y∗), (23)

L+[Vr ] = (1 + ε)h(y) y∗ < y < 1, (24)

Vr (1) =1 + ε

ρh(1), (1− y∗)V ′r (y∗) = h(y∗). (25)


Constructing the reduced model solution

We shall construct the solutions V` and Vr separately first, although theapproach for both solutions will be the same. Following standard methods,we first decompose the full solution into a homogeneous and particularsolution, where the homogeneous part satisfies L±[u] = 0. Since theoperators L± are equi-dimensional, we seek solutions of the form

u = c(1− y)γ , (26)

and the application of the differential operator yields

L±[u] = c(1− y)γ(−σ

2

2(γ2 − γ)∓ εγ + ρ

)= 0 . (27)



We set the parenthetical term of this expression to zero in order to solvefor the admissible exponents γ

γ`± =1

2+

ε

σ2±

√(1

2+

ε

σ2

)2

+ 2ρ

σ2(from L−) (28)

γr± =1

2− ε

σ2±

√(1

2− ε

σ2

)2

+ 2ρ

σ2(from L+) . (29)

Because γr− < 0 for ρ > 0, the homogeneous solution (1− y)γr− will be

unbounded as y → 1, and so we exclude it from further considerationbelow.



Furthermore, since h(y) = A + Cy is a linear function, the particularsolution will also be linear. We therefore have a general solution of theform

V` = c1(1− y)γ`− + c2(1− y)γ

`+ + α` + β`y , (30)

Vr = c3(1− y)γr+ + αr + βry . (31)

The particular solution is fixed by enforcing that the differential equations(22) and (24) are satisfied, which yields

α` =

(1− ερ

)(A + C )−

(1− ερ+ ε

)C , β` =

(1− ερ+ ε

)C , (32)

αr =

(1 + ε

ρ

)(A + C )−

(1 + ε

ρ− ε

)C , βr =

(1 + ε

ρ− ε

)C . (33)



The homogeneous coefficients c1 and c2 are then found by applying theboundary conditions (23), and the resulting linear system(

1 1

γ`−(1− y∗)γ`− γ`+(1− y∗)γ

`+

)(c1c2

)=

(−α`

β`(1− y∗)− h(y∗)

)is solved by

c1 = −α`γ

`+(1− y∗)γ

`+ + β`(1− y∗)− h(y∗)

γ`+(1− y∗)γ`+ − γ`−(1− y∗)γ

`−

, (34)

c2 =β`(1− y∗)− h(y∗) + α`γ

`−(1− y∗)γ

`−

γ`+(1− y∗)γ`+ − γ`−(1− y∗)γ

`−

. (35)



The boundary condition (25) at y = 1 is automatically satisfied by theparticular solution, and so the final coefficient c3 is determined by thecondition at y = y∗,

c3 =βr (1− y∗)− h(y∗)

γr+(1− y∗)γr+

. (36)

Finally, having determined the general solutions V` and Vr , the solution Vis obtained by enforcing continuity. This also fixes the value y∗, whichmust now satisfy the nonlinear transcendental equation

c1(1−y∗)γ`− + c2(1−y∗)γ

`+ +α`+β`y

∗ = c3(1−y∗)γr+ +αr +βry∗ . (37)


Constructing the numerical solution

The boundary value problem (8) is discretized using a standard finitedifference scheme. Let yk = k/Ny , and Vk = V (yk), for k = 0, 1, . . . ,Ny .Then we solve a linear system of Ny + 1 equations of the form Mv = f,where

Mv ≈ ρV − σ2

2(1− y)2V ′′, and f ≈ h + ε

∣∣∣(1− y)V ′ − h∣∣∣.



Since f depends on V , the solution is implicit, and therefore must beobtained by using a fixed point iteration Mv(k+1) = f(k).

In our numerical experiments, we let Ny = 1000, and set themaximum iteration count at K = 20.

Convergence is observed in all tested cases. In Figure 1, severalnumerical solutions V (y) are shown, for fixed A = 1, C = 0.15,ε = 0.02, and various values of ρ = 0.1, 0.2, 0.5, 2.0 andσ = 0.2, 0.5, 1.0, 5.0.

It follows that both ρ and σ have a dramatic effect not only the shapeof the solution, but also, as is demonstrated in Figure 2, on theswitching point y∗. These numerical results suggest that both thediscount rate ρ and the volatility σ of a seller’s reputation can‘dramatically’ affect her behavior, particularly the critical reputationat which she switches from the processing to advertising mode.


Figure 1: Value Function (Reduced Model)

(a) ρ = 0.1 (b) ρ = 0.2 (c) ρ = 0.5 (d) ρ = 2.0

Figure: A plot of several numerical solutions V (y) for the reduced boundary valueproblem (15) are shown with A = 1, C = 0.15, and ε = 0.02.


Figure 2: Switching Point (Reduced Model)

(a) ρ = 0.1 (b) ρ = 0.2 (c) ρ = 0.5 (d) ρ = 2.0

Figure: A plot of the quantity (1− y)V ′(y), where V (y) is a numerical solutionof the the reduced boundary value problem (15). The value y∗ is defined as theintersection of these curves with h(y) (the solid line).


Figure 3: Value Function (Full Model)

(a) ρ = 0.1 (b) ρ = 0.2 (c) ρ = 0.5 (d) ρ = 2.0

Figure: A plot of several numerical solutions V (y) for the full boundary valueproblem (13) are shown with A = 1, C = 0.15, ε = 0.02, and κ = 0.


Figure 4: Switching Point (Full Model)

(a) ρ = 0.1 (b) ρ = 0.2 (c) ρ = 0.5 (d) ρ = 2.0

Figure: A plot of the quantity (1− y)2V ′, where V is a numerical solution of thethe full boundary value problem (13). The value y∗ is defined as the intersectionof these curves with h(y) (black solid line).



The same numerical discretization is employed to solve the fullNerlove-Arrow model (13), where we solve a linear system of the formMv = f, with

Mv ≈ ρV +[κy(1− y) + σ2(1− y)3

]V ′(y)− σ2

2(1− y)4V ′′(y)

f ≈ h + ε∣∣∣(1− y)2V ′ − h

∣∣∣. (38)


Figure 5: Value Function (Full Model)

(a) ρ = 0.1 (b) ρ = 0.2 (c) ρ = 0.5 (d) ρ = 2.0

Figure: The same as Figure 3, but with κ = 1.


Figure 6: Switching Point (Full Model)

(a) ρ = 0.1 (b) ρ = 0.2 (c) ρ = 0.5 (d) ρ = 2.0

Figure: The same as Figure 4, but with κ = 1.


Conclusions

In this work, we have proposed a model which describes the behaviorof sellers participating in online markets. In doing so, we were able todesign an optimal selling strategy, wherein the seller switches theirbehavior when an optimal market share reputation is reached, whichdepends on statistical modeling parameters.

These modeling parameters were introduced through thewealth-reputation mechanisms by Nerlove and Arrow 6, which havebeen generalized to stochastic settings by Raman 7, among others.

6Nerlove, Marc, and Kenneth J. Arrow. ”Optimal advertising policy underdynamic conditions.” Economica (1962): 129-142

7Raman, Kalyan. ”Boundary value problems in stochastic optimal control ofadvertising.” Automatica 42.8 (2006): 1357-1362.


Conclusions

We have additionally modified the empirical model h(R) found inMink and Seifert, relating the price per unit to the reputation. Ratherthan viewing reputation R as an unbounded quantity, we insteadintroduced a market scaled reputation Y = R/(1 + R), whereY ∈ [0, 1), which reduces the price per unit to a simple linear modelh(Y ) = A + CY .

We then optimized the stochastic model over the control of theexcess rate µ using the Hamilton-Jacobi-Bellman equation, yielding adeterministic equation relating the value per unit to the sellerreputation.

The resulting boundary value problem is then solved numerically, andwe qualitatively see that the value of sellers goods increasemonotonically with reputation, but that a unique optimal reputationy = y∗ determines when the seller should switch from advertising toprocessing.


Conclusions

The numerical scheme was validated with a reduced model, which hasa closed form piecewise analytic solution, and permits directdetermination of y∗.

The numerical results qualitatively validate our modelingassumptions, and provide a framework for studying seller behaviorbased on seller reputation.

Although the used techniques are standard, we believe that theoptimal strategy presented both analytically and numerically hasimplications on the way reputational information can be used topredict the behavior of an individual seller in an online market.

This work can naturally be generalized in a variety of ways. Forinstance, the model could be developed in the case of finite horizontime T , since a more realistic assumption is that a seller strategydepends on both time and the current reputation state. Unlike ourcurrent infinite horizon model, the resulting HJB equation will lead toa time-dependent partial differential equation.


Acknowledgements

Milan Bradonjic (Bell Labs)

Matt Causley (Kettering U.)

John Chadam (Pitt)

Nick Costanzino (AIG)

Sean Hilden (Bank of America)

Goran Peskir (U. of Manchester)

Steven Shreve (CMU)


optimal control: applications in risk management and ...the organized mind: thinking straight in the...

Documents