vadym omelchenko model of approximate dynamic programming


Vadym Omelchenko

Faculty of Mathematics and Physics, Charles University in Prague, and
Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic

Model of Approximate Dynamic Programming Applied on Day-Ahead Trading of a Renewable Producer of Energy



Figure: Set-up of a renewable producer



1) A renewable producer generates energy but does not know how much it will generate on the following day because of the uncertainty entailed by the weather.
2) We assume that the producer is penalized for insufficient delivery of energy, because this corresponds to market conditions and because some countries, e.g. Bulgaria, have introduced such a system.
3) In our setting, the state is a two-dimensional variable that consists of wind data and the electricity price.
5) Our goal is to determine a bidding strategy for the producer by using dynamic programming.



The value function is defined as follows:

$$V_t(S_t) = \max_{x_t} \left( C_t(S_t, x_t) + E\{ V_{t+1}(S_{t+1}) \mid S_t \} \right),$$

where $V_{T+1} = 0$ and $C_\cdot(\cdot, \cdot)$ is the reward function.

Note that:

1) $V_1(S_1) = \max_{x_1, x_2, \ldots, x_T} \sum_{t=1}^{T} C_t(S_t, x_t)$

2) $V_t(S_t) = \max_{x_t \ge 0} \left( C_t(S_t, x_t) + \sum_{s' \in S} P(s' \mid x, s)\, V_{t+1}(s') \right)$

where $P(s' \mid x, s)$ is the transition function, whose calculation is a challenging task in dynamic programming.



In a special case, the reward function depends not only on the current state but also on the next state or states. In this case, the reward function is written as

$$C_t(S_t, x_t, S_{t+1})$$

and the value function takes the following form:

$$V_t(S_t) = \max_{x_t \ge 0} E\left( C_t(S_t, x_t, S_{t+1}) + V_{t+1}(S_{t+1}) \mid S_t \right)$$



In our setting, we have $S_t = (y_t, p_t)$, where $y_t$ is the amount of electricity produced and $p_t$ is the market price of electricity.

If $x_t > y'_{t+1}$, then
$$C_t(S_t, x_t, S_{t+1}) = (y_{t+1} + c^-)\, p_{t+1} - u \cdot p_{t+1} (x_t - c^- - y_{t+1}).$$

If $x_t \le y'_{t+1}$, then
$$C_t(S_t, x_t, S_{t+1}) = x_t\, p_{t+1} + o \cdot p_{t+1} (y_{t+1} - x_t - c^+).$$

$c^+$ ($c^-$) is the amount of energy charged (discharged).
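The two cases of the reward function can be written as a small function. This is only a sketch: the slides do not define $y'_{t+1}$, $u$ or $o$ further, so the function simply takes them as inputs (the parameter name `y_threshold` for $y'_{t+1}$ is an assumption), and the numbers in the example are made up.

```python
def reward(x, y_next, p_next, y_threshold, c_minus, c_plus, u, o):
    """One-step reward C_t(S_t, x_t, S_{t+1}) for a bid x.

    y_threshold stands for y'_{t+1}; u and o are the coefficients that
    multiply the second term in each case (symbols as on the slide).
    """
    if x > y_threshold:
        # First case: x_t > y'_{t+1}
        return (y_next + c_minus) * p_next - u * p_next * (x - c_minus - y_next)
    # Second case: x_t <= y'_{t+1}
    return x * p_next + o * p_next * (y_next - x - c_plus)


# Illustrative (made-up) numbers for each case.
short = reward(x=10.0, y_next=6.0, p_next=50.0, y_threshold=8.0,
               c_minus=2.0, c_plus=0.0, u=0.5, o=0.5)
covered = reward(x=5.0, y_next=8.0, p_next=50.0, y_threshold=8.0,
                 c_minus=0.0, c_plus=1.0, u=0.5, o=0.5)
```

With these inputs the first call falls into the case $x_t > y'_{t+1}$ and the second into the case $x_t \le y'_{t+1}$.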



Finite and Infinite Horizon Problems.

1) Finite horizon: for some $T < \infty$ we have $V_{T+1} = 0$. Knowing the value function at the terminal time enables us to calculate the value function backward in time.
2) Infinite horizon: $T$ tends to infinity.
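For the finite-horizon case, the backward calculation can be sketched as follows. The sketch assumes finite sets of states and decisions; the function name and the toy problem at the end are hypothetical and serve only as an illustration.

```python
def backward_induction(T, states, actions, reward, trans):
    """Finite-horizon recursion starting from V_{T+1} = 0.

    reward(t, s, x) returns C_t(s, x); trans(s, x) returns a dict
    {next_state: probability}. Returns V_1 and the decision rules
    for t = 1, ..., T.
    """
    V = {s: 0.0 for s in states}            # terminal condition V_{T+1} = 0
    rules = []
    for t in range(T, 0, -1):               # t = T, T-1, ..., 1
        V_t, rule_t = {}, {}
        for s in states:
            best_x, best_val = None, float("-inf")
            for x in actions:
                val = reward(t, s, x) + sum(
                    p * V[s2] for s2, p in trans(s, x).items())
                if val > best_val:
                    best_x, best_val = x, val
            V_t[s], rule_t[s] = best_val, best_x
        V = V_t
        rules.insert(0, rule_t)              # keep rules ordered by t
    return V, rules


# Toy problem (made up): the reward equals the decision, the state never changes.
V1, rules = backward_induction(
    3, [0, 1], [0, 1],
    reward=lambda t, s, x: float(x),
    trans=lambda s, x: {s: 1.0})
```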



Modeling prices

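We model prices by means of an AR(1) model with a stable residual, i.e.

$$\mathrm{Price}_t = a \cdot \mathrm{Price}_{t-1} + \varepsilon_t, \quad t = 1, 2, 3, \ldots$$

Then we determine the estimate $\hat{a}$ of the parameter $a$ as follows:

$$\hat{a} = \arg\min_a \sum_{t=1}^{T} \left| \mathrm{Price}_t - a \cdot \mathrm{Price}_{t-1} \right|$$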



Modeling the residuals of prices

Having obtained the estimate $\hat{a}$, we can get the series of residuals as follows:

$$\varepsilon^p_t = \mathrm{Price}_t - \hat{a} \cdot \mathrm{Price}_{t-1}$$

The analysis of the residuals follows below.
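A minimal sketch of the least-absolute-deviation estimate and the residual series. It relies on the fact that $\sum_t |P_t - a P_{t-1}| = \sum_t |P_{t-1}| \cdot |P_t / P_{t-1} - a|$, so a minimizer is a weighted median of the ratios $P_t / P_{t-1}$ with weights $|P_{t-1}|$. The slides do not say how the minimization was carried out, so this method and the function names are assumptions.

```python
def lad_ar1(prices):
    """Minimize sum_t |P_t - a * P_{t-1}| over a via a weighted median."""
    pairs = sorted(
        (prices[t] / prices[t - 1], abs(prices[t - 1]))
        for t in range(1, len(prices)) if prices[t - 1] != 0)
    if not pairs:
        raise ValueError("need at least one nonzero lagged price")
    half = sum(w for _, w in pairs) / 2.0
    acc = 0.0
    for ratio, w in pairs:
        acc += w
        if acc >= half:          # first ratio that reaches half the total weight
            return ratio


def ar1_residuals(prices, a_hat):
    """Residuals eps_t = P_t - a_hat * P_{t-1}."""
    return [prices[t] - a_hat * prices[t - 1] for t in range(1, len(prices))]


# Made-up series that follows P_t = 0.5 * P_{t-1} exactly.
series = [16.0, 8.0, 4.0, 2.0, 1.0]
a_hat = lad_ar1(series)
eps = ar1_residuals(series, a_hat)
```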



Modeling the Wind Production

Wind production depends on the wind speed (denoted by "Wind"):

$$\mathrm{WindProduction} = c \cdot \mathrm{Wind}^3, \quad \text{where } c \text{ is a positive constant.}$$

The square root of the wind speed can be modeled by an AR(1) process. Since $\mathrm{Wind}^{1/2} = c^{-1/6} \cdot \mathrm{WindProduction}^{1/6}$, we modeled $\mathrm{WindProduction}^{1/6}$ by an AR(1) process.



THE DATA

We have data on Polish wind production and Polish electricity prices for the period from May 2011 to March 2013.



Figure: Visualization of the goodness-of-fit test. Residuals of the AR(1) model of prices, fitted by a stable distribution. The Kolmogorov-Smirnov and Anderson-Darling tests confirmed the hypothesis that the residuals have the stable distribution $S_{1.562}(1, 0, 0)$.



Figure: Visualization of the goodness-of-fit test. Residuals of the AR(1) model applied to $\mathrm{WindProduction}^{1/6}$, fitted by a stable distribution. The Kolmogorov-Smirnov and Anderson-Darling tests confirmed the hypothesis that the residuals have the stable distribution $S_{1.651}(1, 0, 0)$.



Assumptions on the Residuals of Autoregressive Models of Wind Production and Prices

Assumption 1. Residuals are independent. We can assume a different tail index for each series.
Assumption 2. Residuals are not independent because wind affects prices. Their joint distribution is sub-Gaussian.
Assumption 3. Residuals are not independent and we assume that the tail index is different for wind production and prices.




Assumption 1. We can analyse the residuals separately. Easy to implement.
Assumption 2. Sub-Gaussian distributions can be expressed as follows:
$$X = W^{1/2} \cdot Z, \quad \text{where } W \sim S_{\alpha/2}\left( (\cos(\pi\alpha/4))^{2/\alpha}, 1, 0 \right), \quad Z \sim N(0, Q).$$
We need to approximate the distribution function.
Assumption 3. It is complicated due to the spectral measure. It is an operator stable distribution.

In the following slides, we comment on what follows from these assumptions.
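The sub-Gaussian representation under Assumption 2 can be simulated along the following lines. This is a sketch under stated assumptions, not the method used in the slides: the positive stable variable $W$ is drawn with Kanter's representation (one uniform and one exponential variable), normalized so that its Laplace transform is $E[e^{-sW}] = \exp(-s^{\alpha/2})$, and $Q$ is taken to have unit variances and a correlation `rho`. The values 1.61 and 0.63 are used purely for illustration.

```python
import math
import random


def positive_stable(a, rng):
    """Draw W > 0 with E[exp(-s W)] = exp(-s**a), 0 < a < 1 (Kanter's formula)."""
    u = math.pi * (1.0 - rng.random())       # uniform on (0, pi]
    e = rng.expovariate(1.0)                 # Exp(1)
    while e == 0.0:                          # guard against division by zero
        e = rng.expovariate(1.0)
    return (math.sin(a * u) / math.sin(u) ** (1.0 / a)) * \
           (math.sin((1.0 - a) * u) / e) ** ((1.0 - a) / a)


def sub_gaussian_pair(alpha, rho, rng):
    """One draw of X = W**0.5 * Z with Z bivariate normal (correlation rho)."""
    w = positive_stable(alpha / 2.0, rng)
    g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    z1, z2 = g1, rho * g1 + math.sqrt(1.0 - rho * rho) * g2
    scale = math.sqrt(w)
    return scale * z1, scale * z2


rng = random.Random(0)
draws = [positive_stable(0.805, rng) for _ in range(20000)]
# Empirical Laplace transform at s = 1; should be close to exp(-1).
laplace_at_1 = sum(math.exp(-w) for w in draws) / len(draws)
pair = sub_gaussian_pair(1.61, 0.63, rng)
```

Repeated calls to `sub_gaussian_pair` give a sample from which an empirical distribution function can be built.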




Assumption 1. The tail index α of the residuals of wind production equals 1.651 and the tail index of the residuals of prices equals 1.562.
Assumption 2. The classical correlation is equal to 45%. The dependence parameter between the residuals, under the assumption that the joint distribution is sub-Gaussian, equals 63%. The tail index is 1.61. We need to approximate the distribution function.




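Assumption 3. Any univariate stable distribution can be simulated by means of exponential and uniform distributions. In our case it looks as follows. Let $W(\alpha, \mathrm{Exp}(1), U(-\pi/2, \pi/2))$ denote a draw from $S_{\alpha/2}\left( (\cos(\pi\alpha/4))^{2/\alpha}, 1, 0 \right)$ generated from an exponential and a uniform random variable.

Any state is a two-dimensional vector $S = (\mathrm{Price}, \mathrm{Wind})^T$.

$$X^{\mathrm{price}} = W(\alpha_{\mathrm{price}})^{1/2} \cdot Z, \quad X^{\mathrm{wind}} = W(\alpha_{\mathrm{wind}})^{1/2} \cdot Z$$

$$X^* = (X^{\mathrm{price}}_1, X^{\mathrm{wind}}_2)$$

In this case, we approximate the distribution function by means of the empirical distribution function because it converges uniformly to the true distribution function.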



Knowledge of the distribution function enables us to calculate the transition matrix.

For each current state $s = (p, y)$ and each following state $s' = (p', y')$ we have

$$P(s' \mid s) = P(p', y' \mid p, y) = P\left( \varepsilon^p \stackrel{D}{=} p' - a_p \cdot p,\ \varepsilon^y \stackrel{D}{=} y' - a_y \cdot y \right)$$

and

$$\forall s: \quad \sum_{s'} P(s' \mid s) = 1.$$



The following results will be demonstrated only for Assumption 1.
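Under Assumption 1 the two residual series are independent, so the joint transition probability factorizes into a price part and a wind part. A minimal sketch of how such a matrix could be built on a grid from empirical residuals is given below. The nearest-grid-point rule, the function names and the toy grid are assumptions made for illustration; the slides do not describe the discretization beyond the number of grid values.

```python
def nearest_index(grid, value):
    """Index of the grid point closest to value (first one on ties)."""
    return min(range(len(grid)), key=lambda i: abs(grid[i] - value))


def marginal_transition(grid, a, residuals):
    """Row i: empirical distribution of the grid point nearest to a*grid[i] + eps.

    Mass that falls outside the grid is assigned to the boundary points.
    """
    n = len(grid)
    P = [[0.0] * n for _ in range(n)]
    weight = 1.0 / len(residuals)
    for i, v in enumerate(grid):
        for eps in residuals:
            P[i][nearest_index(grid, a * v + eps)] += weight
    return P


def joint_transition(P_price, P_wind):
    """Independent residuals: P((p', y') | (p, y)) = P_price[p][p'] * P_wind[y][y'].

    The state (i, j) is stored at row index i * len(P_wind) + j.
    """
    n, m = len(P_price), len(P_wind)
    return [[P_price[i][k] * P_wind[j][l] for k in range(n) for l in range(m)]
            for i in range(n) for j in range(m)]


# Toy grid and residual sample (made up).
grid = [0.0, 1.0, 2.0]
P_p = marginal_transition(grid, 0.5, [-0.5, 0.0, 0.5])
P_y = marginal_transition(grid, 0.8, [-1.0, 0.0, 1.0])
P = joint_transition(P_p, P_y)
```

Each row of `P` sums to one, as required of a transition matrix.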



Approximating value functions by iterations:

Step 0. Set $v^0(s) = 0$ for all $s \in S$. Fix a tolerance parameter $\varepsilon > 0$. Set $n = 1$.
Step 1. For each $s \in S$ compute
$$v^n(s) = \max_{x \in X} \left( C(s, x) + \gamma \sum_{s' \in S} P(s' \mid x, s)\, v^{n-1}(s') \right) \qquad (1)$$
and let $x^n$ be the decision vector that solves equation (1).
Step 2. If $| v^n - v^{n-1} | < \varepsilon (1 - \gamma) / (2\gamma)$, let $x^\pi$ be the resulting policy that solves (1), let $v^\varepsilon = v^n$ and stop ($| \cdot |$ denotes the maximum norm). Otherwise set $n = n + 1$ and go to Step 1.
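The three steps can be sketched in a few lines for a finite state and decision space. The function signature and the one-state toy problem are hypothetical; only the update rule and the stopping criterion are taken from the steps above.

```python
def value_iteration(states, actions, reward, trans, gamma, eps, max_iter=100000):
    """Value iteration with stopping rule |v^n - v^{n-1}| < eps*(1-gamma)/(2*gamma).

    reward(s, x) returns C(s, x); trans(s, x) returns {next_state: probability}.
    Returns the value function, the greedy policy and the iteration count.
    """
    tol = eps * (1.0 - gamma) / (2.0 * gamma)
    v = {s: 0.0 for s in states}                      # Step 0: v^0 = 0
    for n in range(1, max_iter + 1):
        v_new, policy = {}, {}
        for s in states:                              # Step 1: update (1)
            q = {x: reward(s, x) + gamma * sum(
                     p * v[s2] for s2, p in trans(s, x).items())
                 for x in actions}
            policy[s] = max(q, key=q.get)
            v_new[s] = q[policy[s]]
        diff = max(abs(v_new[s] - v[s]) for s in states)   # maximum norm
        v = v_new
        if diff < tol:                                # Step 2: stopping rule
            return v, policy, n
    raise RuntimeError("value iteration did not converge")


# Toy problem (made up): one state, reward equals the decision, gamma = 0.5,
# so the optimal value is 1 / (1 - 0.5) = 2.
eps = 1e-3
v, policy, n_iter = value_iteration(
    ["a"], [0, 1], lambda s, x: float(x), lambda s, x: {"a": 1.0},
    gamma=0.5, eps=eps)
```

In the toy problem the returned value is within `eps / 2` of the optimal value 2.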



THEOREM 1.

If we apply the value iteration algorithm with stopping parameter $\varepsilon$ and the algorithm terminates at iteration $n$ with value function $v^{n+1}$, then

$$| v^{n+1} - v^* | \le \varepsilon / 2.$$



Formulation of the Problem

Discount Factor: We have chosen the value $\gamma = 0.08$.

Reward Function:
If $x_t > y'_{t+1}$, then
$$C_t(S_t, x_t, S_{t+1}) = (y_{t+1} + c^-)\, p_{t+1} - u \cdot p_{t+1} (x_t - c^- - y_{t+1}).$$
If $x_t \le y'_{t+1}$, then
$$C_t(S_t, x_t, S_{t+1}) = x_t\, p_{t+1} + o \cdot p_{t+1} (y_{t+1} - x_t - c^+).$$

Transition Matrix: Assumption 1.

Discretization: 25 values of wind and 25 values of prices (625 states).



Figure: Value Iteration. Difference between the 20th and the 21st iteration.



After the 21st iteration we have $| v^{21} - v^{20} | = 41.7$. By Theorem 1, we have $| v^* - v^{21} | \le 3.62609$ ($v^*$ is the optimal value).

Note, however, that $v^1, v^2, \ldots, v^{21}$ are measured in millions, so this error is negligible in relative terms.
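The quoted bound is consistent with the stopping rule of the value iteration algorithm and the discount factor $\gamma = 0.08$. Solving the stopping rule for $\varepsilon$ at the observed difference of 41.7 gives the following worked check (added here for illustration, it is not part of the original slides):

```latex
\[
\varepsilon = \frac{2\gamma}{1-\gamma}\,\bigl|v^{21}-v^{20}\bigr|
            = \frac{2 \cdot 0.08}{0.92} \cdot 41.7 \approx 7.25217,
\qquad
\bigl|v^{*}-v^{21}\bigr| \le \frac{\varepsilon}{2} \approx 3.62609 .
\]
```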



Application of Random Forests to Estimate Value Function

We apply random forests to estimate the value function after reformulating the problem in terms of post-decision variables.

We used the value function obtained by value iteration as a benchmark. We express the value function as a function of price, WindProduction, $\mathrm{price}^2$, $\mathrm{WindProduction}^2$, $\mathrm{price} \cdot \mathrm{WindProduction}$, and $\mathrm{price}^2 \cdot \mathrm{WindProduction}^2$. In the case of regression and instrumental variables, it is a linear function of these variables. The approximations yield similar results, and random forests outperform regression and instrumental variables: the relative error is 2.5 percent for instrumental variables and 2.1 percent for random forests.
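The six explanatory variables listed above can be assembled as follows. The function name is hypothetical, and the sketch covers only the feature construction; the fitting step (linear regression, instrumental variables or a random forest) is not shown because the slides do not give its details.

```python
def basis_features(price, wind_production):
    """Feature vector: price, wind, their squares, product, and product of squares."""
    return [
        price,
        wind_production,
        price ** 2,
        wind_production ** 2,
        price * wind_production,
        price ** 2 * wind_production ** 2,
    ]


# Made-up state (price = 2, wind production = 3).
features = basis_features(2.0, 3.0)
```

One such vector per grid state, paired with the benchmark value from value iteration, would form the training set for any of the three approximations.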



Figure: Random Forests versus Regression



Figure: Random Forests versus Instrumental variables.



Figure: Software used for implementing dynamic programming



FURTHER RESEARCH

1) To reduce the simplifying assumptions.
2) To combine the technique of ADP with techniques of price prediction.
3) To implement ADP for Assumption 2 and Assumption 3.
4) To handle a one-sided dependence structure only: wind can affect prices but not vice versa.
5) To use bidding strategies that follow from the improved model for trading purposes.






THANK YOU FOR YOUR ATTENTION!
