Continuous formulation of the irrigation network problem cannot be solved exactly by any MDP solver
Evaluation of solution quality (mean and standard deviation) and running time (in seconds):
Irrigation network is a network of irrigation channels connected by regulation devices
Transition functions represent water flows between channels given actions at regulation devices
Objective is the operation of valves to maintain optimal water levels
Reward function characterizes preferred water levels
Solving Factored MDPs with Continuous and Discrete Variables
Introduction
Approximate LP for HMDPs
Factored ε-HALP Algorithm
Experimental Results
Linear Value Function Approximation
Value function represented as a linear combination of k basis functions:
Basis functions fi(x) depend on continuous and discrete variables. Optimization is performed over weights w
$$V(\mathbf{x}) = \sum_{i=1}^{k} w_i f_i(\mathbf{x})$$
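A minimal sketch of evaluating such a linear value function, assuming a state is encoded as a tuple with one entry per variable. The three basis functions and the weights below are hypothetical and chosen only for illustration; they are not taken from the experiments.

```python
from typing import Callable, Sequence

State = Sequence[float]  # hypothetical encoding: one entry per state variable


def linear_value(weights: Sequence[float],
                 basis: Sequence[Callable[[State], float]],
                 x: State) -> float:
    """Evaluate V(x) = sum_i w_i * f_i(x)."""
    if len(weights) != len(basis):
        raise ValueError("need exactly one weight per basis function")
    return sum(w * f(x) for w, f in zip(weights, basis))


# Illustrative basis over x = (x1, x2): x1 continuous in [0, 1], x2 discrete in {0, 1}.
basis = [
    lambda x: 1.0,                              # constant term
    lambda x: x[0],                             # linear in the continuous variable
    lambda x: x[0] ** 2 if x[1] == 1 else 0.0,  # quadratic term gated by the discrete variable
]
weights = [0.5, 2.0, -1.0]

value = linear_value(weights, basis, (0.5, 1))
```

Only the weights are free parameters here; the basis functions stay fixed, which is what makes the optimization over w a linear program.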
HALP Formulation
Hybrid approximate LP (HALP) formulation:
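$$\text{minimize}_{\mathbf{w}} \quad \sum_i w_i \alpha_i$$
$$\text{subject to:} \quad \sum_i w_i F_i(\mathbf{x}, \mathbf{a}) - R(\mathbf{x}, \mathbf{a}) \ge 0 \quad \forall\, \mathbf{x} \in \mathbf{X},\ \mathbf{a} \in \mathbf{A}$$
where $\alpha_i$ is the state relevance weight and $F_i(\mathbf{x}, \mathbf{a})$ is the difference between the basis function $f_i(\mathbf{x})$ and its discounted backprojection:
$$\alpha_i = \sum_{\mathbf{x}_D} \int_{\mathbf{x}_C} \psi(\mathbf{x})\, f_i(\mathbf{x})\, d\mathbf{x}_C$$
$$F_i(\mathbf{x}, \mathbf{a}) = f_i(\mathbf{x}) - \gamma \sum_{\mathbf{x}'_D} \int_{\mathbf{x}'_C} p(\mathbf{x}' \mid \mathbf{x}, \mathbf{a})\, f_i(\mathbf{x}')\, d\mathbf{x}'_C$$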
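As a small worked instance of the HALP, consider a single constant basis function $f_0(\mathbf{x}) = 1$. This sketch assumes that the state relevance density $\psi$ integrates to one and that the discount factor satisfies $\gamma < 1$; the constant basis function is a hypothetical choice made only to keep the arithmetic short.

```latex
\begin{aligned}
\alpha_0 &= \sum_{\mathbf{x}_D} \int_{\mathbf{x}_C} \psi(\mathbf{x})\, d\mathbf{x}_C = 1, \\
F_0(\mathbf{x}, \mathbf{a}) &= 1 - \gamma \sum_{\mathbf{x}'_D} \int_{\mathbf{x}'_C} p(\mathbf{x}' \mid \mathbf{x}, \mathbf{a})\, d\mathbf{x}'_C = 1 - \gamma, \\
\text{HALP:}\quad & \min_{w_0} \; w_0 \quad \text{subject to} \quad w_0 (1 - \gamma) \ge R(\mathbf{x}, \mathbf{a}) \;\; \forall\, \mathbf{x}, \mathbf{a}, \\
\Rightarrow\quad w_0 &= \frac{\sup_{\mathbf{x}, \mathbf{a}} R(\mathbf{x}, \mathbf{a})}{1 - \gamma}.
\end{aligned}
```

With only a constant basis function the approximation collapses to the familiar upper bound on the value function; richer basis functions tighten it.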
Quality of HALP Approximation
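Proposition 1 Let $\tilde{\mathbf{w}}$ be an optimal solution of the HALP. Then, for any Lyapunov function $L(\mathbf{x})$:
$$\left\| V^* - H\tilde{\mathbf{w}} \right\|_{1,\psi} \le \frac{2\, \psi^T L}{1 - \kappa} \min_{\mathbf{w}} \left\| V^* - H\mathbf{w} \right\|_{\infty, 1/L}$$
where $H\mathbf{w}$ denotes the linear approximation $\sum_i w_i f_i(\mathbf{x})$ and $\kappa$ is the contraction factor of the Lyapunov function
Analogous to the de Farias and Van Roy 2001 result for approximate LP for discrete MDPs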
Choice of Representation
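Continuous basis functions defined as polynomials:
$$f_{i_C}(\mathbf{x}_{i_C}) = \prod_{X_j \in \mathbf{X}_{i_C}} x_j^{m_{j,i}}$$
Basis function decomposition along continuous and discrete factors, into a discrete part $f_{i_D}$ and a continuous part $f_{i_C}$
Closed-form representation of the objective function
Mixture of betas transition model for continuous factors
Decomposition of the constraints along continuous and discrete functions and closed-form representation:
$$\sum_{\mathbf{x}'_D} \int_{\mathbf{x}'_C} p(\mathbf{x}' \mid \mathbf{x}, \mathbf{a})\, f_i(\mathbf{x}')\, d\mathbf{x}'_C = \left( \sum_{\mathbf{x}'_{i_D}} p(\mathbf{x}'_{i_D} \mid \mathbf{x}, \mathbf{a})\, f_{i_D}(\mathbf{x}'_{i_D}) \right) \left( \int_{\mathbf{x}'_{i_C}} p(\mathbf{x}'_{i_C} \mid \mathbf{x}, \mathbf{a})\, f_{i_C}(\mathbf{x}'_{i_C})\, d\mathbf{x}'_{i_C} \right)$$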
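The closed-form claim can be illustrated for one continuous variable: when the next-state density is a Beta distribution and the basis function is a monomial, the backprojection integral is a Beta moment, which has a simple product formula. The sketch below assumes a single Beta component with parameters `a` and `b` already evaluated for the given parents and action; the function names and the numbers in the example are hypothetical.

```python
def beta_moment(a: float, b: float, degree: int) -> float:
    """E[X**degree] for X ~ Beta(a, b), using the product formula
    prod_{r=0}^{degree-1} (a + r) / (a + b + r)."""
    result = 1.0
    for r in range(degree):
        result *= (a + r) / (a + b + r)
    return result


def constraint_term(x: float, degree: int, a: float, b: float,
                    discount: float) -> float:
    """F(x, action) = f(x) - discount * E[f(X')] for f(x) = x**degree,
    where X' ~ Beta(a, b) is the next-state distribution."""
    return x ** degree - discount * beta_moment(a, b, degree)


# Example: f(x) = x^2, next state distributed as Beta(2, 3), discount 0.95.
second_moment = beta_moment(2.0, 3.0, 2)
term = constraint_term(0.5, 2, 2.0, 3.0, 0.95)
```

For a mixture of betas the same formula applies to each component, and the results are combined with the mixture weights.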
Hybrid Markov Decision Processes
Many real-world stochastic planning problems have continuous and discrete variables, naturally formulated as hybrid MDPs (HMDPs)
There are few methods for solving Hybrid MDPs
Hybrid MDPs are Complex to Solve
Traditional solution techniques are affected by the curse of dimensionality
Discrete-state MDPs
  State and action spaces grow exponentially with the number of variables
Continuous-state MDPs
  State and action spaces are infinitely large
  Often, no closed-form representation for the value function exists
  Naïve discretization often leads to exponential complexity
Irrigation Network Example
Experimental Results
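ε-HALP

| ε    | Mean | Std | Time  |
|------|------|-----|-------|
| 1    | 42.8 | 3.0 | 2     |
| 1/2  | 60.3 | 3.0 | 21    |
| 1/4  | 61.9 | 2.9 | 184   |
| 1/8  | 72.2 | 3.5 | 1068  |
| 1/16 | 73.8 | 3.0 | 13219 |

Alternative solutions

| Method    | Mean | Std |
|-----------|------|-----|
| Random    | 35.9 | 2.7 |
| Local     | 55.4 | 2.5 |
| Global 1  | 60.4 | 3.0 |
| Global 4  | 66.0 | 3.6 |
| Global 16 | 68.2 | 3.2 |

n-ring

| ε    | n = 6 Mean | Time | n = 9 Mean | Time | n = 12 Mean | Time | n = 15 Mean | Time | n = 18 Mean | Time |
|------|------------|------|------------|------|-------------|------|-------------|------|-------------|------|
| 1    | 28.4       | 1    | 37.5       | 1    | 46.9        | 1    | 55.6        | 2    | 64.5        | 3    |
| 1/2  | 33.5       | 3    | 43.0       | 5    | 52.6        | 9    | 62.9        | 17   | 72.1        | 28   |
| 1/4  | 35.1       | 11   | 45.2       | 21   | 54.2        | 43   | 64.2        | 63   | 74.5        | 85   |
| 1/8  | 40.1       | 46   | 51.4       | 85   | 62.2        | 118  | 73.2        | 168  | 84.9        | 193  |
| 1/16 | 40.4       | 331  | 51.8       | 519  | 63.7        | 709  | 75.5        | 963  | 86.8        | 1285 |

n-ring-of-rings

| ε    | n = 6 Mean | Time | n = 9 Mean | Time  | n = 12 Mean | Time  | n = 15 Mean | Time  | n = 18 Mean | Time  |
|------|------------|------|------------|-------|-------------|-------|-------------|-------|-------------|-------|
| 1    | 14.8       | 1    | 16.2       | 2     | 17.5        | 4     | 18.5        | 5     | 19.7        | 6     |
| 1/2  | 38.6       | 12   | 50.5       | 25    | 44.0        | 103   | 75.8        | 69    | 87.6        | 107   |
| 1/4  | 40.1       | 82   | 53.6       | 184   | 66.7        | 345   | 79.0        | 590   | 93.1        | 861   |
| 1/8  | 48.0       | 581  | 62.4       | 1250  | 76.1        | 2367  | 90.5        | 3977  | 104.5       | 6377  |
| 1/16 | 47.1       | 4736 | 62.3       | 11369 | 77.6        | 22699 | 92.4        | 35281 | 107.8       | 53600 |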
The quality of the ε-HALP solution beats alternative approx. opt. techniques on the large irrigation network example
Solution quality improves with higher grid resolution
Time complexity grows polynomially with higher grid resolution 1/ε
Time complexity grows polynomially with network topology size n
Conclusions
HALP provides an effective formulation for solving hybrid MDPs
  Including bounds on the quality of the solution
Factored hybrid MDPs allow for closed-form representation of HALP constraints
  Number of constraints remains infinite
Exploit factorization for efficient discretization, ε-HALP
  Provide bounds on the effect of discretization
  Lipschitz constant grows linearly in the number of variables
Using factored LP decomposition to solve ε-HALP
  For fixed tree-width, running time is polynomial in the number of variables and in the discretization level 1/ε
[Figure: irrigation network topologies (n-ring, n-ring-of-rings, and the large irrigation network) with inflow and outflow regulation devices. Each irrigation channel is represented by a continuous variable and each regulation device by a discrete action node.]
Optimal Policy and Value Function
Value function of an optimal policy satisfies the Bellman-Hamilton-Jacobi fixed point equation:
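$$V^*(\mathbf{x}) = \sup_{\mathbf{a}} \left[ R(\mathbf{x}, \mathbf{a}) + \gamma \sum_{\mathbf{x}'_D} \int_{\mathbf{x}'_C} p(\mathbf{x}' \mid \mathbf{x}, \mathbf{a})\, V^*(\mathbf{x}')\, d\mathbf{x}'_C \right]$$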
Value function V(x) difficult to compute and represent
Closed-form solution of the value function may not exist due to the recursive integral definition
Approximate solutions
Factored Hybrid MDPs
Multiagent factored hybrid MDP (HMDP) is a 4-tuple (X, A, P, R):
X is a vector of state variables (discrete or continuous)
A is a vector of action variables (discrete or continuous)
Continuous variables are restricted to [0,1]
P is a transition model represented by a DBN
R is a reward function, a sum of local rewards
Basis function decomposition: $f_i(\mathbf{x}) = f_{i_D}(\mathbf{x}_{i_D})\, f_{i_C}(\mathbf{x}_{i_C})$
Carlos Guestrin, Intel Research, Berkeley
Milos Hauskrecht, Department of Computer Science
Branislav Kveton, Intelligent Systems Program
[Figure: dynamic Bayesian network over state variables X1, X2, X3, action variables A1, A2, A3, next-state variables X'1, X'2, X'3, and local rewards R1, R2.]
Representing Conditional Probabilities
Use parametric representation
Discrete child with discrete parents:
  Use tabular, decision trees, noisy-or, etc.
Discrete child with continuous and discrete parents:
  Use discriminant functions, $d_j(\mathrm{Par}(X_i')) \ge 0$:
  $$P(X_i' = j \mid \mathrm{Par}(X_i')) = \frac{d_j(\mathrm{Par}(X_i'))}{\sum_u d_u(\mathrm{Par}(X_i'))}$$
Continuous child with continuous and discrete parents:
  Mixture of Beta distributions:
  $$p(X_i' \mid \mathrm{Par}(X_i')) = \sum \mathrm{Beta}\left(X_i' \mid h_i^1(\mathrm{Par}(X_i')),\ h_i^2(\mathrm{Par}(X_i'))\right)$$
  $h_i^1(\mathrm{Par}(X_i')) > 0$ and $h_i^2(\mathrm{Par}(X_i')) > 0$ define the moments
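The two parametric forms above can be sketched in a few lines. The function names are hypothetical, and the explicit mixture weights in `beta_mixture_pdf` are an assumption (weights summing to one are needed for the mixture to be a density, but the slide does not show them).

```python
import math
from typing import Sequence, Tuple


def discrete_child(discriminants: Sequence[float]) -> list:
    """P(X' = j | parents) = d_j / sum_u d_u, for discriminant values d_j >= 0
    already evaluated at the given parent configuration."""
    if any(d < 0 for d in discriminants):
        raise ValueError("discriminant values must be nonnegative")
    total = sum(discriminants)
    if total <= 0:
        raise ValueError("at least one discriminant value must be positive")
    return [d / total for d in discriminants]


def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of Beta(a, b) at x in (0, 1)."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)


def beta_mixture_pdf(x: float,
                     components: Sequence[Tuple[float, float, float]]) -> float:
    """Mixture density; each component is (weight, a, b).
    The weights are assumed to sum to one."""
    return sum(w * beta_pdf(x, a, b) for w, a, b in components)


probs = discrete_child([1.0, 3.0])
density = beta_mixture_pdf(0.5, [(0.5, 2.0, 2.0), (0.5, 1.0, 1.0)])
```

In a full model the values passed in (`discriminants`, `a`, `b`) would themselves be functions of the parents, as the $d_j$ and $h_i$ terms indicate.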
Representational & Computational Challenges
Constraints require representation of backprojections, functions of continuous and discrete variables
HALP requires solution of (linear) convex problem with infinite number of constraints
Summary of Factored ε-HALP Algorithm
Factored ε-HALP formulation
HALP formulation contains an infinite number of constraints, one for each state x and action a
Discretization of continuous state and action variables to (1 / 2ε + 1) equally spaced values
Total number of grid points per factor is exponential only in the dimension of the factor
Number of constraints is finite, although exponential in the number of variables
Efficient Solution for Factored ε-HALP
1. Discretize continuous state and action variables
2. Identify subsets of variables Xi and Ai (Xj and Aj) that the functions Fi(x, a) (Rj(x, a)) depend on
3. Compute Fi(xi, ai) (and Rj(xj, aj)) for all possible configurations of Xi and Ai (Xj and Aj)
4. Calculate state relevance weights αi
5. Use ALP algorithm for factored discrete-valued variables to find the vector of optimal weights w (Guestrin et al. ’01)
Near Feasibility Implies Near Optimality
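Solution of ε-HALP likely violates constraints in the HALP
Proposition 2 Let $\tilde{\mathbf{w}}$ be an optimal solution of the HALP and $\hat{\mathbf{w}}$ be an optimal solution of the ε-HALP, such that solution $\hat{\mathbf{w}}$ is δ-infeasible. Then:
$$\left\| V^* - H\hat{\mathbf{w}} \right\|_{1,\psi} \le \left\| V^* - H\tilde{\mathbf{w}} \right\|_{1,\psi} + \frac{2\delta}{1 - \gamma}$$
A solution $\mathbf{w}$ is δ-infeasible if:
$$\sum_i w_i F_i(\mathbf{x}, \mathbf{a}) - R(\mathbf{x}, \mathbf{a}) \ge -\delta \quad \forall\, \mathbf{x} \in \mathbf{X},\ \mathbf{a} \in \mathbf{A}$$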
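Quality of ε-HALP Approximation
Theorem 1 Let $\hat{\mathbf{w}}$ be an optimal solution of the ε-HALP satisfying the δ-infeasibility condition. Then, for any Lyapunov function $L(\mathbf{x})$:
$$\left\| V^* - H\hat{\mathbf{w}} \right\|_{1,\psi} \le \frac{2\, \psi^T L}{1 - \kappa} \min_{\mathbf{w}} \left\| V^* - H\mathbf{w} \right\|_{\infty, 1/L} + \frac{2\delta}{1 - \gamma}$$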
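Achieving δ-Infeasibility
Appropriate choice of ε-grid to achieve δ-infeasibility
Lipschitz modulus of the discretized functions
$$\left| \sum_i w_i F_i(\mathbf{x}, \mathbf{a}) - R(\mathbf{x}, \mathbf{a}) - \left( \sum_i w_i F_i(\mathbf{x}_G, \mathbf{a}_G) - R(\mathbf{x}_G, \mathbf{a}_G) \right) \right| \le \delta$$
$(\mathbf{x}_G, \mathbf{a}_G)$ is the closest ε-grid point to the state-action pair $(\mathbf{x}, \mathbf{a})$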
Discretize continuous variables using a regularly spaced ε-grid
Formulate a linear program with constraints restricted only to grid points
Solve the LP using an ALP algorithm for factored discrete MDPs
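The discretization step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the grid size is read as 1/(2ε) + 1 points per continuous variable (spacing 2ε, so every point of [0, 1] lies within ε of a grid point) and that 1/(2ε) is a positive integer. The function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def epsilon_grid(epsilon: float) -> List[float]:
    """Equally spaced points on [0, 1] with spacing 2 * epsilon
    (assumed convention: 1 / (2 * epsilon) + 1 points)."""
    steps = round(1 / (2 * epsilon))
    if steps < 1 or abs(steps * 2 * epsilon - 1) > 1e-9:
        raise ValueError("1 / (2 * epsilon) must be a positive integer")
    return [i / steps for i in range(steps + 1)]


def factor_configurations(num_continuous: int,
                          discrete_sizes: Sequence[int],
                          epsilon: float) -> List[Tuple]:
    """All grid configurations of the variables in one factor:
    continuous variables take grid values, discrete ones take 0..size-1."""
    grid = epsilon_grid(epsilon)
    axes = [grid] * num_continuous + [range(size) for size in discrete_sizes]
    return list(product(*axes))


grid = epsilon_grid(0.25)
# One factor with two continuous variables and one binary variable.
configs = factor_configurations(2, [2], 0.25)
```

Enumerating configurations per factor, rather than over the joint state-action space, is what keeps the count exponential only in the dimension of each factor.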
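Grid resolution sufficient for δ-infeasibility: $\varepsilon \le \delta / (M K_{\max})$
$M$ is the number of factors
$K_{\max}$ is the worst-case Lipschitz constant over functions $w_i F_i(\mathbf{x}, \mathbf{a})$ and $R_j(\mathbf{x}, \mathbf{a})$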
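The Lipschitz argument can be checked numerically on a toy case. The sketch below assumes that a sum of M functions, each with Lipschitz constant at most K_max, changes by at most ε · M · K_max between a point and a grid point within distance ε, so choosing ε = δ / (M · K_max) keeps the change within δ. The two one-dimensional functions, the value of δ, and the grid convention (spacing 2ε) are hypothetical choices for illustration.

```python
def epsilon_for_delta(delta: float, num_factors: int, k_max: float) -> float:
    """Largest epsilon for which epsilon * num_factors * k_max <= delta."""
    return delta / (num_factors * k_max)


def nearest_grid_point(x: float, epsilon: float) -> float:
    """Closest point to x on a grid over [0, 1] with spacing 2 * epsilon."""
    spacing = 2 * epsilon
    return min(1.0, round(x / spacing) * spacing)


# Two toy terms on [0, 1]; each has Lipschitz constant at most 2.
terms = [lambda x: 2.0 * x, lambda x: -x * x]
k_max = 2.0
delta = 0.5

eps = epsilon_for_delta(delta, len(terms), k_max)


def total(x: float) -> float:
    """Sum of all terms at x."""
    return sum(term(x) for term in terms)


# Largest change between a sampled point and its nearest grid point.
worst = max(abs(total(x) - total(nearest_grid_point(x, eps)))
            for x in (i / 1000 for i in range(1001)))
```

The bound is a worst case: in this example the observed change stays well below δ because the two terms partly cancel.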