Introduction, Value Function
Calculating Transition Matrix, Approximate Dynamic Programming
Vadym Omelchenko
Faculty of Mathematics and Physics, Charles University in Prague, and Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic
Model of Approximate Dynamic Programming Applied on Day-Ahead Trading of a Renewable Producer of Energy
Figure: Set-up of a renewable producer
1) A renewable producer generates energy but does not know how much will be generated on the following day, owing to weather uncertainty.
2) We assume that the producer is penalized for insufficient delivery of energy, because this corresponds to market conditions and because some countries, e.g. Bulgaria, have introduced such a system.
3) In our setting, the state is a two-dimensional variable consisting of wind data and electricity price.
4) Our goal is to determine a bidding strategy for the producer by using dynamic programming.
The value function is defined as follows:
$$V_t(S_t) = \max_{x_t} \left( C_t(S_t, x_t) + E\{ V_{t+1}(S_{t+1}) \mid S_t \} \right),$$
where $V_{T+1} = 0$ and $C_\cdot(\cdot, \cdot)$ is the reward function.
Note that:
1) $V_1(S_1) = \max_{x_1, x_2, \ldots, x_T} \sum_{t=1}^{T} C_t(S_t, x_t)$
2) $V_t(S_t) = \max_{x_t \ge 0} \left( C_t(S_t, x_t) + \sum_{s' \in S} P(s' \mid x, s)\, V_{t+1}(s') \right)$
where $P(s' \mid x, s)$ is the transition function, whose calculation is a challenging task in dynamic programming.
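The recursion above can be sketched in code as backward induction over a finite horizon. This is a minimal illustration rather than the implementation used in the talk: it assumes a finite state set, a finite decision set, and a reward table `C` and transition array `P` that do not change with time. The names `backward_induction`, `C`, and `P` are hypothetical.

```python
import numpy as np

def backward_induction(C, P, T):
    """Finite-horizon Bellman recursion with terminal condition V_{T+1} = 0.

    C[s, x]     : reward for decision x in state s (assumed time-homogeneous)
    P[x, s, s2] : probability of moving from s to s2 under decision x
    Returns V with shape (T + 1, n_states) and the maximizing decisions.
    Row index t of V corresponds to period t + 1; the last row is the
    terminal value function, which is identically zero.
    """
    n_states = C.shape[0]
    V = np.zeros((T + 1, n_states))
    policy = np.zeros((T, n_states), dtype=int)
    for t in range(T - 1, -1, -1):
        # q[s, x] = C(s, x) + sum_{s2} P(s2 | x, s) * V_next(s2)
        q = C + np.einsum("xst,t->sx", P, V[t + 1])
        V[t] = q.max(axis=1)
        policy[t] = q.argmax(axis=1)
    return V, policy

# Toy example with two states and two decisions (illustrative numbers only).
C_toy = np.array([[1.0, 0.0],
                  [3.0, 3.0]])
P_toy = np.array([[[1.0, 0.0], [0.0, 1.0]],   # decision 0: stay in place
                  [[0.0, 1.0], [0.0, 1.0]]])  # decision 1: move to state 1
V_toy, policy_toy = backward_induction(C_toy, P_toy, T=2)
```

Because the terminal values are known, each earlier period needs only one sweep over the states, which is what makes the finite-horizon case tractable once the transition function is available.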
In a special case, the reward depends not only on the current state but also on the next state. The reward function is then written as
$$C_t(S_t, x_t, S_{t+1}),$$
and the value function takes the following form:
$$V_t(S_t) = \max_{x_t \ge 0} E\left( C_t(S_t, x_t, S_{t+1}) + V_{t+1}(S_{t+1}) \mid S_t \right).$$
In our setting, we have $S_t = (y_t, p_t)$, where $y_t$ is the amount of electricity produced and $p_t$ is the market price of electricity.
If $x_t > y'_{t+1}$, then
$$C_t(S_t, x_t, S_{t+1}) = (y_{t+1} + c^-)\, p_{t+1} - u \cdot p_{t+1} (x_t - c^- - y_{t+1}).$$
If $x_t \le y'_{t+1}$, then
$$C_t(S_t, x_t, S_{t+1}) = x_t\, p_{t+1} + o \cdot p_{t+1} (y_{t+1} - x_t - c^+).$$
Here $c^+$ ($c^-$) is the amount of energy charged (discharged).
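The two branches of the reward can be written as a small function. This is a sketch under stated assumptions: the threshold $y'_{t+1}$ is taken to be the next-day production $y_{t+1}$, and `u` and `o` are treated as given factors for the shortfall and surplus terms, since the slides do not define them further. All parameter names are hypothetical.

```python
def reward(x, y_next, p_next, u, o, c_minus=0.0, c_plus=0.0):
    """Reward C_t(S_t, x_t, S_{t+1}) for a bid x.

    x       : amount of energy bid for the next day
    y_next  : realized production y_{t+1} (also used as the threshold y'_{t+1},
              which is an assumption of this sketch)
    p_next  : realized price p_{t+1}
    u, o    : factors on the shortfall and surplus terms (not defined in the slides)
    c_minus : energy discharged from storage, c^-
    c_plus  : energy charged into storage, c^+
    """
    if x > y_next:
        # Under-delivery: sell what is available, pay for the shortfall.
        return (y_next + c_minus) * p_next - u * p_next * (x - c_minus - y_next)
    # Over-production: sell the bid, surplus is valued with factor o.
    return x * p_next + o * p_next * (y_next - x - c_plus)

# Illustrative values only.
shortfall_case = reward(10.0, 8.0, 50.0, u=0.5, o=0.5)
surplus_case = reward(5.0, 8.0, 50.0, u=0.5, o=0.5)
```

With no storage activity, the two branches agree at `x == y_next`, so the reward is continuous in the bid at that point.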
Finite and Infinite Horizon Problems.
1) Finite horizon: for some $T < \infty$ we have $V_{T+1} = 0$. Knowing the value function at the terminal time allows us to calculate the value function backward in time.
2) Infinite horizon: $T$ tends to infinity.
Modeling prices
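We model prices by means of an AR(1) model with stable residuals, i.e.
$$\text{Price}_t = a \cdot \text{Price}_{t-1} + \varepsilon_t, \quad t = 1, 2, 3, \ldots$$
Then we determine the estimate $\hat{a}$ of the parameter $a$ as follows:
$$\hat{a} = \arg\min_a \sum_{t=1}^{T} \left| \text{Price}_t - a \cdot \text{Price}_{t-1} \right|$$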
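The least-absolute-deviations estimate above has a closed form for a single coefficient: the objective equals a weighted sum of $|\text{Price}_t / \text{Price}_{t-1} - a|$ with weights $|\text{Price}_{t-1}|$, so a weighted median of the ratios minimizes it. The sketch below uses that observation; it is one possible way to compute $\hat{a}$, not necessarily the procedure used by the author, and the function names are hypothetical.

```python
import numpy as np

def lad_ar1(prices):
    """Least-absolute-deviations estimate of a in Price_t = a * Price_{t-1} + eps_t.

    Uses the identity |P_t - a P_{t-1}| = |P_{t-1}| * |P_t / P_{t-1} - a|,
    so the minimizer is a weighted median of the ratios P_t / P_{t-1}.
    """
    prices = np.asarray(prices, dtype=float)
    prev, curr = prices[:-1], prices[1:]
    mask = prev != 0.0                    # terms with P_{t-1} = 0 do not depend on a
    ratios = curr[mask] / prev[mask]
    weights = np.abs(prev[mask])
    order = np.argsort(ratios)
    ratios, weights = ratios[order], weights[order]
    cum = np.cumsum(weights)
    idx = np.searchsorted(cum, 0.5 * cum[-1])  # first index reaching half the weight
    return ratios[idx]

def ar1_residuals(prices, a_hat):
    """Residual series eps_t = Price_t - a_hat * Price_{t-1}."""
    prices = np.asarray(prices, dtype=float)
    return prices[1:] - a_hat * prices[:-1]

# Noise-free check: a geometric series with ratio 0.9.
a_demo = lad_ar1(100.0 * 0.9 ** np.arange(20))
```

An absolute-value criterion is a natural choice here because stable residuals with tail index below 2 have infinite variance, which makes least squares less reliable.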
Modeling the residuals of prices
Having obtained the estimate $\hat{a}$, we can compute the series of residuals as follows:
$$\varepsilon^p_t = \text{Price}_t - \hat{a} \cdot \text{Price}_{t-1}$$
The analysis of the residuals follows below.
Modeling the Wind Production
Wind production depends on the wind speed (denoted "Wind") as follows:
$$\text{WindProduction} = c \cdot \text{Wind}^3, \quad \text{where } c \text{ is a positive constant.}$$
The square root of wind speed can be modeled by an AR(1) process. Taking into account the dependence of wind production on wind speed, we modeled $\text{WindProduction}^{1/6}$ by an AR(1) process.
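The exponent $1/6$ follows directly from the cubic relation. The short derivation below spells this out; the AR(1) form without an intercept and the symbols $a_y$ and $\eta_t$ are assumptions made here for illustration, mirroring the price model.

```latex
\text{WindProduction}_t^{1/6}
  = \left( c \cdot \text{Wind}_t^{3} \right)^{1/6}
  = c^{1/6} \cdot \text{Wind}_t^{1/2}

% If \sqrt{\text{Wind}_t} = a_y \sqrt{\text{Wind}_{t-1}} + \eta_t, multiplying by c^{1/6} gives
\text{WindProduction}_t^{1/6}
  = a_y \cdot \text{WindProduction}_{t-1}^{1/6} + c^{1/6}\, \eta_t
```

So the transformed production series inherits the AR(1) structure with the same coefficient, and only the residual scale changes.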
THE DATA
We have data on Polish wind production and Polish electricity prices for the period from May 2011 to March 2013.
Figure: Visualization of the goodness-of-fit test. Residuals of the AR(1) model of prices, modeled by a stable distribution. The Kolmogorov-Smirnov and Anderson-Darling tests supported the hypothesis that the residuals have the stable distribution $S_{1.562}(1, 0, 0)$.
Figure: Visualization of the goodness-of-fit test. Residuals of the AR(1) model applied to $\text{WindProduction}^{1/6}$, modeled by a stable distribution. The Kolmogorov-Smirnov and Anderson-Darling tests supported the hypothesis that the residuals have the stable distribution $S_{1.651}(1, 0, 0)$.
Assumptions on the Residuals of Autoregressive Models of Wind Production and Prices
Assumption 1. The residuals are independent. We can assume different tail indices.
Assumption 2. The residuals are not independent, because wind affects prices. Their joint distribution is sub-Gaussian.
Assumption 3. The residuals are not independent, and we assume that the tail index is different for wind production and prices.
Assumptions on the Residuals of Autoregressive Models of Wind Production and Prices (continued)
Assumption 1. We can analyse the residuals separately. This is easy to implement.
Assumption 2. Sub-Gaussian distributions can be expressed as follows:
$$X = W^{1/2} \cdot Z, \quad \text{where } W \sim S_{\alpha/2}\left( (\cos(\pi\alpha/4))^{2/\alpha}, 1, 0 \right), \quad Z \sim N(0, Q).$$
We need to approximate the distribution function.
Assumption 3. This case is complicated by the spectral measure. The joint distribution is an operator stable distribution.
The following slides comment on what follows from these assumptions.
Assumptions on the Residuals of Autoregressive Models of Wind Production and Prices (continued)
Assumption 1. The tail index $\alpha$ of the residuals of wind production equals 1.651, and the tail index of the residuals of prices equals 1.562.
Assumption 2. The classical correlation equals 45%. The dependence parameter between the residuals, under the assumption that the joint distribution is sub-Gaussian, equals 63%. The tail index is 1.61. We need to approximate the distribution function.
Assumptions on the Residuals of Autoregressive Models of Wind Production and Prices (continued)
Assumption 3. Any univariate stable distribution can be simulated by means of exponential and uniform distributions. In our case, let
$$W(\alpha) = W(\alpha, \mathrm{Exp}(1), U(-\pi/2, \pi/2)) \sim S_{\alpha/2}\left( (\cos(\pi\alpha/4))^{2/\alpha}, 1, 0 \right).$$
Any state is a two-dimensional vector $S = (\text{Price}, \text{Wind})^T$, and we set
$$X_{\text{price}} = W(\alpha_{\text{price}})^{1/2} \cdot Z, \qquad X_{\text{wind}} = W(\alpha_{\text{wind}})^{1/2} \cdot Z,$$
$$X^* = (X_{\text{price},1}, X_{\text{wind},2}).$$
In this case, we approximate the distribution function by the empirical distribution function, because it converges uniformly to the true distribution function.
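A minimal simulation sketch for the mixing variable $W$ and for a sub-Gaussian vector $X = W^{1/2} Z$ is given below. It uses Kanter's representation of a positive stable variable (a special case of the Chambers-Mallows-Stuck method), which needs only one uniform and one exponential draw. The scale is the standard choice $(\cos(\pi\alpha/4))^{2/\alpha}$, for which $E\,e^{-sW} = e^{-s^{\alpha/2}}$. The function names and the example matrix `Q_demo` are hypothetical and are not taken from the talk.

```python
import numpy as np

def positive_stable(alpha, size, rng):
    """Draw W ~ S_{alpha/2}((cos(pi*alpha/4))^(2/alpha), 1, 0) for 0 < alpha < 2.

    Kanter's representation: with a = alpha / 2, V ~ U(-pi/2, pi/2), E ~ Exp(1),
    W = sin(a(V + pi/2)) / cos(V)^(1/a) * (cos(V - a(V + pi/2)) / E)^((1 - a)/a).
    """
    a = alpha / 2.0
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)
    e = rng.exponential(1.0, size)
    phi = a * (v + np.pi / 2)
    return (np.sin(phi) / np.cos(v) ** (1.0 / a)) * (np.cos(v - phi) / e) ** ((1.0 - a) / a)

def sub_gaussian(alpha, Q, size, rng):
    """Draw X = sqrt(W) * Z with Z ~ N(0, Q), one shared W per sample."""
    Q = np.asarray(Q, dtype=float)
    w = positive_stable(alpha, size, rng)
    z = rng.multivariate_normal(np.zeros(Q.shape[0]), Q, size)
    return np.sqrt(w)[:, None] * z

# Illustrative use with the joint tail index 1.61 quoted for Assumption 2;
# the dependence matrix Q_demo is a made-up example.
rng_demo = np.random.default_rng(0)
Q_demo = np.array([[1.0, 0.6], [0.6, 1.0]])
x_demo = sub_gaussian(1.61, Q_demo, 1000, rng_demo)
```

Samples produced this way can feed an empirical distribution function, which is how the slides propose to approximate the joint distribution.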
Knowledge of the distribution function enables us to calculate the transition matrix.
For each current state $s = (p, y)$ and each following state $s' = (p', y')$ we have
$$P(s' \mid s) = P(p', y' \mid p, y) = P\left( \varepsilon^p \overset{D}{=} p' - a_p \cdot p,\ \varepsilon^y \overset{D}{=} y' - a_y \cdot y \right)$$
and
$$\forall s: \quad \sum_{s'} P(s' \mid s) = 1.$$
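One way to turn this into a concrete matrix is sketched below for the independent case (Assumption 1). The discretization rule is an assumption of this sketch: each grid value stands for the bin bounded by the midpoints to its neighbours, the outer bins extend to infinity so that every row sums to one, and the residual distribution function is replaced by the empirical one. All function names are hypothetical.

```python
import numpy as np

def ecdf(sample):
    """Empirical distribution function of a residual sample."""
    xs = np.sort(np.asarray(sample, dtype=float))
    return lambda x: np.searchsorted(xs, x, side="right") / xs.size

def marginal_transition(grid, a, residuals):
    """P[i, j] = P(next value falls in bin j | current value = grid[i])
    for the AR(1) model next = a * current + eps."""
    grid = np.asarray(grid, dtype=float)
    F = ecdf(residuals)
    mid = 0.5 * (grid[:-1] + grid[1:])
    lo = np.concatenate(([-np.inf], mid))   # lower bin edges
    hi = np.concatenate((mid, [np.inf]))    # upper bin edges
    shift = a * grid[:, None]
    return F(hi[None, :] - shift) - F(lo[None, :] - shift)

def joint_transition(P_price, P_wind):
    """Under independence the joint matrix is the Kronecker product.
    State (i_price, i_wind) has index i_price * n_wind + i_wind."""
    return np.kron(P_price, P_wind)

# Illustrative data: synthetic residuals and 25-point grids.
rng_demo = np.random.default_rng(0)
grid_demo = np.linspace(-5.0, 5.0, 25)
P_p = marginal_transition(grid_demo, 0.8, rng_demo.standard_t(3, 2000))
P_y = marginal_transition(grid_demo, 0.6, rng_demo.standard_t(3, 2000))
P_joint = joint_transition(P_p, P_y)
```

With 25 grid values per dimension the joint matrix has 625 rows, matching the discretization used later in the slides.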
The following results will be demonstrated only for Assumption 1.
Approximating value functions by iterations:
Step 0. Set $v^0(s) = 0$ for all $s \in S$. Fix a tolerance parameter $\varepsilon > 0$. Set $n = 1$.
Step 1. For each $s \in S$ compute
$$v^n(s) = \max_{x \in X} \left( C(s, x) + \gamma \sum_{s' \in S} P(s' \mid x, s)\, v^{n-1}(s') \right) \tag{1}$$
and let $x^n$ be the decision vector that solves equation (1).
Step 2. If $|v^n - v^{n-1}| < \varepsilon (1 - \gamma) / (2\gamma)$, let $x^\pi$ be the resulting policy that solves (1), let $v^\varepsilon = v^n$, and stop ($|\cdot|$ denotes the maximum norm). Otherwise set $n = n + 1$ and go to Step 1.
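The three steps can be sketched as follows. This is an illustrative implementation for a finite state and decision set, with the reward given as a table `C[s, x]` and the transitions as an array `P[x, s, s']`; these array layouts, the function name, and the `max_iter` safeguard are assumptions of the sketch.

```python
import numpy as np

def value_iteration(C, P, gamma, eps, max_iter=10_000):
    """Value iteration with the stopping rule |v^n - v^{n-1}| < eps(1-gamma)/(2 gamma).

    C[s, x]     : reward for decision x in state s
    P[x, s, s2] : transition probability from s to s2 under decision x
    Returns the value vector, the maximizing decisions, and the iteration count.
    """
    v = np.zeros(C.shape[0])                        # Step 0: v^0 = 0
    threshold = eps * (1.0 - gamma) / (2.0 * gamma)
    for n in range(1, max_iter + 1):
        # Step 1: q[s, x] = C(s, x) + gamma * sum_{s2} P(s2 | x, s) v^{n-1}(s2)
        q = C + gamma * np.einsum("xst,t->sx", P, v)
        v_new = q.max(axis=1)
        policy = q.argmax(axis=1)
        diff = np.max(np.abs(v_new - v))            # maximum norm
        v = v_new
        if diff < threshold:                        # Step 2: stopping rule
            break
    return v, policy, n

# Toy example with two states and two decisions (illustrative numbers only).
C_toy = np.array([[1.0, 0.0],
                  [3.0, 3.0]])
P_toy = np.array([[[1.0, 0.0], [0.0, 1.0]],   # decision 0: stay in place
                  [[0.0, 1.0], [0.0, 1.0]]])  # decision 1: move to state 1
v_toy, policy_toy, n_toy = value_iteration(C_toy, P_toy, gamma=0.5, eps=1e-6)
```

In the toy example the fixed point can be checked by hand: state 1 earns 3 forever, so its value is $3 / (1 - 0.5) = 6$, and moving from state 0 to state 1 yields $0.5 \cdot 6 = 3$, which beats staying.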
THEOREM 1.
If we apply the value iteration algorithm with stopping parameter $\varepsilon$ and the algorithm terminates at iteration $n$ with value function $v^{n+1}$, then
$$|v^{n+1} - v^*| \le \varepsilon / 2.$$
Formulation of the Problem
Discount factor: we have chosen the value $\gamma = 0.08$.
Reward function:
if $x_t > y'_{t+1}$, then $C_t(S_t, x_t, S_{t+1}) = (y_{t+1} + c^-)\, p_{t+1} - u \cdot p_{t+1} (x_t - c^- - y_{t+1})$;
if $x_t \le y'_{t+1}$, then $C_t(S_t, x_t, S_{t+1}) = x_t\, p_{t+1} + o \cdot p_{t+1} (y_{t+1} - x_t - c^+)$.
Transition matrix: Assumption 1.
Discretization: 25 values of wind and 25 values of prices (625 states).
Figure: Value iteration. Difference between the 20th and the 21st iteration.
After the 21st iteration we have $|v^{21} - v^{20}| = 41.7$. By Theorem 1, $|v^* - v^{21}| \le 3.62609$, where $v^*$ is the optimal value.
For comparison, $v^1, v^2, \ldots, v^{21}$ are measured in millions, so this error is negligible.
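The bound 3.62609 can be reproduced from the stopping rule, assuming the discount factor $\gamma = 0.08$ from the problem formulation and treating the observed difference 41.7 as the threshold value $\varepsilon(1 - \gamma)/(2\gamma)$:

```latex
\frac{\varepsilon (1 - \gamma)}{2 \gamma} = 41.7
\;\Longrightarrow\;
\varepsilon = \frac{2 \gamma}{1 - \gamma} \cdot 41.7
            = \frac{0.16}{0.92} \cdot 41.7 \approx 7.25217,
\qquad
\frac{\varepsilon}{2} \approx 3.62609
```

The small discount factor is what makes the bound so tight: the factor $\gamma / (1 - \gamma) \approx 0.087$ shrinks the last observed difference by more than a factor of ten.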
Application of Random Forests to Estimate Value Function
We apply random forests to estimate the value function after reformulating the problem in terms of post-decision variables.
We used the value function obtained by value iteration as a benchmark. We express the value function as a function of price, WindProduction, price$^2$, WindProduction$^2$, price $\cdot$ WindProduction, and price$^2 \cdot$ WindProduction$^2$. In the case of regression and instrumental variables, it is a linear function of these variables. The approximations yield similar results, with random forests outperforming regression and instrumental variables: the relative error is 2.5 percent for instrumental variables and 2.1 percent for random forests.
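The six explanatory variables and the linear-regression variant can be sketched with NumPy alone. This is only an illustration of the feature construction: the intercept term, the definition of relative error as a ratio of Euclidean norms, and all function names are assumptions, since the slides do not specify them. The random-forest and instrumental-variable fits are not reproduced here.

```python
import numpy as np

def basis(price, wind):
    """Six basis functions: p, w, p^2, w^2, p*w, p^2 * w^2."""
    price = np.asarray(price, dtype=float)
    wind = np.asarray(wind, dtype=float)
    return np.column_stack([price, wind, price**2, wind**2,
                            price * wind, price**2 * wind**2])

def fit_linear(price, wind, v):
    """Ordinary least squares on the basis plus an intercept (an assumption)."""
    X = np.column_stack([np.ones(len(price)), basis(price, wind)])
    coef, *_ = np.linalg.lstsq(X, np.asarray(v, dtype=float), rcond=None)
    return coef

def predict(coef, price, wind):
    """Evaluate the fitted linear approximation of the value function."""
    X = np.column_stack([np.ones(len(price)), basis(price, wind)])
    return X @ coef

def relative_error(v_hat, v):
    """Relative error against a benchmark value function (definition assumed)."""
    return np.linalg.norm(v_hat - v) / np.linalg.norm(v)

# Synthetic benchmark on a 25 x 25 grid (illustrative, not the talk's data).
p_grid, w_grid = np.meshgrid(np.linspace(0.0, 1.0, 25), np.linspace(0.0, 1.0, 25))
p_flat, w_flat = p_grid.ravel(), w_grid.ravel()
v_bench = 2.0 + 3.0 * p_flat - w_flat + 0.5 * p_flat**2 * w_flat**2
coef_demo = fit_linear(p_flat, w_flat, v_bench)
err_demo = relative_error(predict(coef_demo, p_flat, w_flat), v_bench)
```

The same feature matrix could be passed to a random-forest regressor, for example scikit-learn's `RandomForestRegressor`, to mirror the comparison described above; whether that library was used in the talk is not stated.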
Figure: Random Forests versus Regression
Figure: Random Forests versus Instrumental variables.
Figure: Software used for implementing dynamic programming
FURTHER RESEARCH
1) To reduce simplifying assumptions.
2) To combine the technique of ADP with techniques of price prediction.
3) To implement ADP for Assumption 2 and Assumption 3.
4) To handle a one-sided dependence structure only: wind can affect prices but not vice versa.
5) To use bidding strategies that follow from the improved model for trading purposes.
BIBLIOGRAPHY
1) L. Breiman. Random Forests. Statistics Department, University of California, Berkeley, CA 94720. January 2001.
2) N. Lohndorf, S. Minner. Optimal Day-Ahead Trading and Storage of Renewable Energies - An Approximate Dynamic Programming Approach. Department of Business Administration, University of Vienna. December 2009.
3) W.R. Scott, W.B. Powell. Approximate Dynamic Programming for Energy Storage with New Results on Instrumental Variables and Projected Bellman Errors. Submitted to Operations Research.
4) S. Snih. Random Forests for Classification Trees and Categorical Dependent Variables: an informal Quick Start R Guide. Stanford University. February 2011.
THANK YOU FOR YOUR ATTENTION!