
Open-Source Multistage Stochastic Optimization with R - Long-term Financial Management

Ronald Hochreiter

Winter School on Stochastic Programming 2017

Introduction

Giorgio Consigli will talk about a complete multi-stage stochastic optimization model in Finance, so this talk focuses on general approaches to Open-Source Multistage Stochastic Optimization with R, i.e.

- Scenario Computation: Generation, Reduction, Selection
- Multi-stage Modeling & Scenario Tree Computation

In Finance, data (i.e. scenarios and scenario trees) is much more important than the optimization model, which is often straightforward.

What is great about stochastic optimization

- Tradition in the field of OR.
- Great and lovely people.
- Great conferences.

What is bad about stochastic optimization

- Almost no open-source solutions available.
- Many (if not all) papers are completely non-reproducible.
- Most PhD students reinvent the wheel (not by choice).
- No one uses R.

R

- Software for Statistical Computing & Data Science
- http://www.r-project.org/
- Example of successful open-source software
- Maturity and stability
- R Ecosystem: R Core and R Packages
- RStudio http://www.rstudio.com/ and Shiny
- R scientists have reproducibility in mind
- Probably the only reason not to use R: Optimization

Optimization Modeling & R

Facts about OR/Optimization and R:

- Statisticians love to build optimization models matrix-wise.
- CRAN (Optimization Task View) is focussed on solvers.
- Optimization under Uncertainty is not covered.
- OR people use MatLab/Julia, Python, and/or C++.

Strategy: Simplify and do it - and implement as R packages.

Issue: It is impossible to design a one-size-fits-all product/package!

Solution: Split into simple, extensible, lightweight packages.

Optimization modeling gap

- The gap between Academia and the Real-World is all too obvious.
- This gap is one of the reasons why it is hard to sell MSP models to the industry.

Optimization modeling gap - Academia

- Everything is done for one (or two) research paper(s).
- Focus on one specific solution method or one specific solver.
- The model is created in a solver-readable matrix format.
- Data and model are mixed (messed) up.
- Need mathematical proofs to impress non-scientists.
- PhD students are forced to do the implementation.
- If a deadline is missed: you send the project to the next conference or special issue of a journal.

Optimization modeling gap - Real-World

- Data and model should be separated.
- Different groups and persons are working on it.
- Simplifications are crucial
  - to communicate and maintain the model, and to
  - implement optimization results into your business process.
- No one cares about how it is solved, i.e.
  - dirty heuristics are ok, no proofs necessary.
- If a deadline is missed: you lose money, clients, reputation, your wife & life, and so on...

Common Issue: Optimization is considered to be a side-product (“someone has to do it”).

Benefits of getting older

One of the main benefits of getting older is that you eventually end up finding a student or colleague who implements something you always fantasized about!

Laura Vana, Florian Schwendinger

Portfolio optimization modeling with ROML

Laura Vana, Florian Schwendinger, Ronald Hochreiter

Winter School on Stochastic Programming 2017

Portfolio optimization modeling

A simple and contemporary way to model and solve complex portfolio optimization problems with R.

R packages

- AML (algebraic modeling language) now available in R: ROML (R Optimization Modeling Language) package.
- Access to different solvers for various problem classes based on the ROI (R Optimization Infrastructure) package (Hornik et al., 2016).
- Portfolio optimization modeling language is built on top of the generalized AML: ROML.portfolio package.

All available open-source on https://r-forge.r-project.org/

Modern Portfolio Theory

Calculate an optimal portfolio $x$ out of $a$ assets given a vector of expected returns $M$ and a co-variance matrix $C$ subject to further constraints $X$, i.e. the well-known Markowitz approach:

$$
\begin{aligned}
\text{minimize}_{x} \quad & x \, C \, x^{T} \\
\text{subject to} \quad & x \times M \ge \mu, \\
& x \in X.
\end{aligned}
$$

Issues with this approach:

- Uncertainty is just implicitly modeled → deterministic!
- QP framework too rigid and specific (OR perspective).
- General extensions (CVaR, ...) are put on top of this base.
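As a minimal sketch of the Markowitz formulation in base R, assume that the constraint set X only contains the budget constraint sum(x) = 1 and that the return constraint is dropped. Under these assumptions the minimum variance portfolio has the closed form x = C^{-1} 1 / (1^T C^{-1} 1). The covariance matrix and the function name `min_variance_weights` are made up for illustration and do not come from the talk.

```r
# Hypothetical 3-asset covariance matrix (illustration data only)
C <- matrix(c(0.040, 0.006, 0.004,
              0.006, 0.010, 0.002,
              0.004, 0.002, 0.020), nrow = 3, byrow = TRUE)

# Closed-form minimum variance weights under the budget constraint sum(x) = 1
min_variance_weights <- function(C) {
  ones <- rep(1, nrow(C))
  w <- solve(C, ones)   # solves C w = 1 without forming the inverse explicitly
  w / sum(w)            # normalize so that the weights sum to one
}

x <- min_variance_weights(C)
portfolio_variance <- drop(t(x) %*% C %*% x)

print(round(x, 4))
print(portfolio_variance)
```

Any further constraint (lower bounds, a return target, cardinality) breaks the closed form and requires a QP solver, which is where a modeling layer becomes useful.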

Stochastic Programming - Uncertainty

Stochastic Programming naturally separates the objective and subjective part of a decision problem.

1. Optimization model specifies the event space at each decision stage to integrate objective real-world constraints and dynamics, i.e. the event handling - quantitative aspects of the solution.

2. Uncertainty model is chosen independently from optimization model to reflect subjective beliefs of the decision taker - qualitative aspects of the solution.

Stochastic Portfolio Optimization

Objective and subjective parts within portfolio optimization:

- $X$ - set of regulatory & organizational constraints.
- $S$ - asset return uncertainty model.

Flexible to integrate any risk measure (VaR, Omega, ...):

- $\ell_x$ - loss distribution for some portfolio $x$, i.e. $\ell_x = \langle x, S \rangle$.

(Bi-criteria) optimization meta-model:

$$
\begin{aligned}
\text{maximize}_{x} \quad & \text{Return}(\ell_x) \\
\text{minimize}_{x} \quad & \text{Risk}(\ell_x) \\
\text{subject to} \quad & x \in X
\end{aligned}
$$
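The separation of portfolio, scenarios and risk measure can be sketched in base R as follows. The scenario matrix is randomly generated illustration data, the portfolio is arbitrary, and the function names are hypothetical. The sketch treats the scenario-wise product of x and S as a return and its negative as the loss, which is one possible sign convention.

```r
set.seed(1)

# Hypothetical scenario matrix S: 500 equally likely scenarios (rows) x 3 assets (columns)
S <- matrix(rnorm(500 * 3, mean = 0.001, sd = 0.02), ncol = 3)
x <- c(0.5, 0.3, 0.2)   # an arbitrary portfolio with sum(x) = 1

# Scenario-wise portfolio outcome: l_x = <x, S>
l_x <- drop(S %*% x)

# Return functional: expectation over the scenarios
expected_return <- function(l) mean(l)

# Risk functional 1: mean absolute deviation around the mean
mad_risk <- function(l) mean(abs(l - mean(l)))

# Risk functional 2: empirical CVaR of the loss -l, i.e. the average of the
# worst (1 - alpha) share of the scenarios
cvar_risk <- function(l, alpha = 0.95) {
  losses <- sort(-l, decreasing = TRUE)
  k <- max(1, floor((1 - alpha) * length(losses) + 1e-9))
  mean(losses[1:k])
}

print(c(return = expected_return(l_x),
        mad    = mad_risk(l_x),
        cvar95 = cvar_risk(l_x, 0.95)))
```

Swapping the risk measure only changes which function is applied to `l_x`; the scenario set and the constraint set stay untouched.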

ROML.portfolio - Syntax

Minimum variance portfolio:

    m <- model()
    m$variable(portfolio, lb = 0)
    m$minimize(markowitz(portfolio))
    m$subject_to(budget_norm(portfolio))

ROML.portfolio - Syntax

Changing the risk-measure to 95% CVaR?

    m <- model()
    m$variable(portfolio, lb = 0)
    m$minimize(cvar(portfolio, 0.95))
    m$subject_to(budget_norm(portfolio))

Different solver required (linear instead of quadratic), but this is handled by other packages, i.e. ROML and ROI.

Keywords - Objective and Constraints

- Extensive set of complex and creative risk and return-risk measures implemented, e.g.,
  - reward
  - markowitz
  - cvar
  - mad
  - omega
  - sharpe
  - downside_var
  - downside_mad
  - minimax_young
- Different constraints: cardinality, turnover, ...
- All standard functionalities of the generalized AML.

Examples

    install.packages("ROML.portfolio", repos="http://R-Forge.R-project.org")
    install.packages("ROML", repos="http://R-Forge.R-project.org")
    library(ROML); library(ROML.portfolio)
    data(djia2013)

    m <- model()
    m$variable(portfolio, lb = -1) # portfolio choice vector
    m$maximize(reward(portfolio))
    m$subject_to(cvar(portfolio, 0.95) <= 0.02)
    m$subject_to(cvar(portfolio, 0.99) <= 0.03)
    m$subject_to(portfolio[2] + portfolio[10] + portfolio[20] <= 0.5)
    m$subject_to(turnover(portfolio) <= 0.5)
    solution <- optimize(m, solver = "glpk",
                         data = list(returns = djia2013))

Examples

    m <- model()
    m$variable(portfolio, lb = 0)
    m$maximize(omega(portfolio))
    m$subject_to(cardinality(portfolio) <= 7)
    m$subject_to(cvar(portfolio, 0.95) <= 0.02)
    m$subject_to(markowitz(portfolio) <= 0.03^2)
    solution <- optimize(m, solver = "",
                         data = list(returns = djia2013))

Thank you

Email [email protected]
http://www.hochreiter.net/ronald/
More infos http://finance-r.com/portfolio/

Modeling Multi-stage Decision Optimization Problems under Uncertainty

Ronald Hochreiter

Winter School on Stochastic Programming 2017

Project Overview

- Initiated in February 2005 during two weeks together with Teemu Pennanen in Helsinki.
- Rooted in multi-stage stochastic programming and depending on scenario trees.
- Presented at ICSP 2007, CMS 2009, OR 2011, and CMS 2012.
- Many iterations, many flaws ironed out, ...
- New in 2014: Meta-modeling - not bound to a specific language (R).

Project Aim

- Avoiding the reinvention of the wheel.
- Simplifying the learning process for PhD students entering the field.
- Building an extensive multi-stage model library.
- Getting the industry to adopt real multi-stage models.

Design strategy

Simplify and do it!

Multi-stage optimization under uncertainty

Design goal: Modeling language independent of

- optimization modeling approach:
  - Expectation-based convex multi-stage stochastic programming,
  - Worst-case optimization,
  - ...
- underlying solution technique:
  - Tree-based deterministic equivalent formulation.
  - Primal/dual linear decision rules, upper/lower bounds.
  - ...
- programming language:
  - R, Python, Julia, MatLab, ...

Solution Method - Scenario Tree

[Figure: Scenario tree-based three-layered approach. Layers: (1) Modeler View (Stage), (2) Stochastic View (Tree), (3) Data View (Node); stages: Root Stage, Recourse Stage, Terminal Stage.]

Solution Method - Scenario Tree-free

[Figure: Scenario tree-free approximation (decision rules, ...). Layers: (1) Modeler View (Stage), (2) Stochastic View (Upper/Lower Approximation); stages: Root Stage, Recourse Stage, Terminal Stage.]

A simple multi-stage model

Multi-stage stochastic programming example [Heitsch et al. 2006]:

- Compute optimal purchase over time under cost uncertainty.
- Uncertain prices are given by $V_t$.
- Decisions $x_t$: amounts to be purchased at each time period $t$.
- Objective: minimize expected costs such that a prescribed amount $a$ is achieved at $T$.
- State variable $s_t$: amount held at time $t$.

A simple multi-stage model

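Multi-stage stochastic programming example [Heitsch et al. 2006]:

$$
\begin{aligned}
\text{minimize} \quad & E\left( \sum_{t=1}^{T} V_t x_t \right) \\
\text{subject to} \quad & s_t - s_{t-1} = x_t \quad \forall t = 2, \dots, T, \\
& s_1 = 0, \quad s_T = a, \\
& x_t \ge 0, \quad s_t \ge 0.
\end{aligned}
$$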

The meta-model (based on AMPL syntax)

    deterministic a: T;
    stochastic V, x, s, objective: 0..T;
    stochastic non_anticipativity: 1..T;
    stochastic root_stage: 0;
    stochastic terminal_stage: T;

    param a; param V;
    var x >= 0, s >= 0;

    maximize objective: E(V * x);
    subject to non_anticipativity: s - s(-1) = x;
    subject to root_stage: s = 0;
    subject to terminal_stage: s = a;

Model instance in R (modopt.multistage)

    parameter(a); parameter(V);
    variable(x, lb=0); variable(s, lb=0)

    maximize("objective", "E(V * x)")
    subject_to("non_anticipativity", "s - s(-1) = x")
    subject_to("root_stage", "s = 0")
    subject_to("terminal_stage", "s = a")

    deterministic("T", a)
    stochastic("0..T", V, x, s, "objective")
    stochastic("1..T", "non_anticipativity")
    stochastic("0", "root_stage")
    stochastic("T", "terminal_stage")

Meta-modeling language - stochastic additions

Based on AMPL with additional keywords for stochasticity definitions:

- deterministic object-list: stage-set;
- stochastic object-list: stage-set;

In terms of scenario trees: stochastic parameters are defined on the underlying tree node structure, and deterministic parameters are defined on the stage structure, i.e. same value for all nodes in the respective stage.

- objective: expectation functional E(variable).
- recourse definition: variable(recourse-depth).

Stage-sets can be overridden explicitly, e.g. E(wealth, T).

Meta-modeling language - further extensions

General extensions:

- objective: quantiles Q(variable, alpha).
- objective/constraints: application-related risk measures CVaR(variable, alpha),
- probabilistic constraints P(variable <= alpha).

Worst case optimization:

- simplified description of uncertainty sets (ROME),
- new objective function operator(s).

Stage-set parsing, node-set creation

- Node-sets $N_t$ include all tree nodes of stage $t$.
- Add one stage-set for the whole horizon (0..T).
- Parse all stage-sets defined
  - directly with keywords stochastic and deterministic, and
  - within the objective functional E().
- For each stage-set - given one specific scenario tree - create appropriate node-sets containing all nodes of the respective stages.

Multi-stage Scenario Tree Format

- No common standard for representing discretized stochastic processes available, mainly because of the lack of commercial interest.
- To accommodate all different tree structures, and the implicit formulation of non-anticipativity constraints, a node-based vector/matrix data format of scenario trees is proposed:

    V(n, d)   d-dimensional value of node n
    A(n)      ancestor node of node n
    T(n)      stage of node n
    P(n)      probability to reach node n from its ancestor

Conversion example - Stage and node-sets

Example: Simple three-stage (t = 0, 1, 2) binary tree, (uni-variate) starting value: 10. Up 1 with p = 0.6 and down 1 with p = 0.4, i.e.

    n      0     1     2     3     4     5     6
    V[n]   10    11    9     12    10    10    8
    A[n]   -     0     0     1     1     2     2
    T[n]   -     1     1     2     2     2     2
    P[n]   1     0.6   0.4   0.6   0.4   0.6   0.4
    Z[n]   1     0.6   0.4   0.36  0.24  0.24  0.16

Node- and stage-sets: Using the above inventory example:

    Model     Node-Set   Stages   Nodes
    (0..T)    0          0 1 2    0 1 2 3 4 5 6
    0         1          0        0
    T         2          2        3 4 5 6
    1..T      3          1 2      1 2 3 4 5 6
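The node-based format of the binary example can be written down directly in base R. The following is a minimal sketch, not the package implementation: it stores the 0-based node ids 0..6 at R indices 1..7, uses the name `T_stage` because `T` is an alias for `TRUE` in R, and assumes stage 0 for the root node. It derives the absolute probabilities Z from the conditional probabilities P and builds node-sets from stage-sets.

```r
# Node-based tree format for the binary example (node ids 0..6 at R indices 1..7)
node    <- 0:6
V       <- c(10, 11, 9, 12, 10, 10, 8)          # node values
A       <- c(NA, 0, 0, 1, 1, 2, 2)              # ancestor node (the root has none)
T_stage <- c(0, 1, 1, 2, 2, 2, 2)               # stage of each node (root stage 0 assumed)
P       <- c(1, 0.6, 0.4, 0.6, 0.4, 0.6, 0.4)   # probability given the ancestor

# Absolute node probabilities Z: multiply the conditional probabilities along
# the path to the root. Nodes are ordered so that ancestors come first.
Z <- numeric(length(node))
for (i in seq_along(node)) {
  Z[i] <- if (is.na(A[i])) P[i] else P[i] * Z[A[i] + 1]   # +1: ids are 0-based
}

# Node-set of a stage-set: all nodes whose stage lies in the given stages
node_set <- function(stages) node[T_stage %in% stages]

print(Z)
print(node_set(0:2))   # stage-set 0..T
print(node_set(2))     # stage-set T
print(node_set(1:2))   # stage-set 1..T
```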

Conversion example - Variables and Parameters

- Add (reserved) scenario tree variables V, A, P, Z, T.
- Replace stochastic parameters and variables by a node-set definition, and
- replace deterministic parameters and variables by stage-set definitions.

    deterministic a: T;
    stochastic V, x, s: 0..T;
    param a, V;
    var x >= 0, s >= 0;

    param a[stageSet2], V[nodeSet0];
    var x[nodeSet0] >= 0, s[nodeSet0] >= 0;

Conversion example - Constraints

For each stochastic constraint, add deterministic equivalent constraints as nodes given the respective node set, i.e.

    stochastic root_stage: 0;
    subject to root_stage: s = 0;

    subject to root_stage{n in node_set1}: s[n] = 0;

Conversion example - Recourse constraints

Deterministic parameters in stochastic constraints make use of the stage mapping information T[n]:

    stochastic terminal_stage: T;
    subject to terminal_stage: s = a;

    subject to terminal_stage{n in node_set2}: s[n] = a[T[n]];

Conversion example - Recourse constraints

Recourse constraints make use of the ancestor information A[n]. Higher depths are integrated recursively.

    stochastic non_anticipativity: 1..T;
    subject to non_anticipativity: s - s(-1) = x;

    subject to non_anticipativity{n in node_set3}:
        s[n] - s[A[n]] = x[n];

Finally: No explicit tree formulation in the model anymore.

Conversion example - Objective function

Objective function replacements, i.e. adding tree information and replacing E() by sums based on the stage probabilities Z[n]:

    maximize objective_function: E(V * x, 0..T);

    maximize objective_function:
        ( sum{n in nodeSet0}: Z[n] * ( V[n] * x[n] ) );
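To make the converted constraints and objective concrete, the following base R sketch evaluates one hand-picked candidate decision on the binary example tree. It only checks feasibility and computes the probability-weighted sum; no solver is involved. The target amount `a = 1` and the candidate `x` (buy everything at stage 1) are assumptions chosen for illustration, as is the name `T_stage`.

```r
# Binary example tree in node format (node ids 0..6 at R indices 1..7)
V       <- c(10, 11, 9, 12, 10, 10, 8)
A       <- c(NA, 0, 0, 1, 1, 2, 2)
T_stage <- c(0, 1, 1, 2, 2, 2, 2)
Z       <- c(1, 0.6, 0.4, 0.36, 0.24, 0.24, 0.16)

a <- 1                          # hypothetical amount required at the terminal stage
x <- c(0, 1, 1, 0, 0, 0, 0)     # candidate decision: buy the full amount at stage 1

# Recourse constraint s[n] - s[A[n]] = x[n], with s = 0 at the root stage
s <- numeric(length(V))
for (i in seq_along(V)) {
  if (!is.na(A[i])) s[i] <- s[A[i] + 1] + x[i]   # +1: ids are 0-based
}

# Terminal stage constraint s[n] = a for all nodes of the last stage
terminal_ok <- all(s[T_stage == max(T_stage)] == a)

# E(V * x) expanded as a sum over all nodes weighted by Z[n]
objective <- sum(Z * V * x)

print(terminal_ok)
print(objective)
```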

ALM example - Stage-based notation

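$$
\begin{aligned}
\text{maximize}_{x} \quad & E(w_T) + \kappa \left( \gamma - E\left( \frac{z_T}{1-\alpha} \right) \right) \\
\text{subject to} \quad & \sum_{a \in A} x_a \le \beta, \quad \sum_{a \in A} x_a = w && (t = 0) \\
& x_a \le V_a x_a^{(-1)} + b_a - s_a \quad \forall a \in A && (t = 1, \dots, T-1) \\
& \sum_{a \in A} b_a \le \sum_{a \in A} s_a && (t = 1, \dots, T-1) \\
& w = \sum_{a \in A} x_a + f && (t = 1, \dots, T-1) \\
& w = \sum_{a \in A} V_a x_a^{(-1)} + f && (t = T) \\
& z \ge \gamma - w && (t = T)
\end{aligned}
$$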

ALM example - Meta-model (1)

    param alpha; param beta; param kappa;
    param assets; set Asset := 1 .. assets;
    param V{Asset};

    var x{Asset} >= 0, b{Asset} >= 0, s{Asset} >= 0, w >= 0;
    var f; var g; var z >= 0;

    maximize objective: E(w) + kappa * ( g - ( E(z / ( 1 - alpha )) ) );

    subject to cvar: z >= g - w;
    subject to init_budget: ( sum{a in Asset} x[a] ) <= beta;
    subject to init_wealth: ( sum{a in Asset} x[a] ) == w;

ALM example - Meta-model (1b)

    param alpha; param beta; param kappa;
    param assets; set Asset := 1 .. assets;
    param V{Asset};

    var x{Asset} >= 0, b{Asset} >= 0, s{Asset} >= 0;
    var w >= 0, f;

    maximize objective: E(w) + kappa * CVaR(w, alpha);

    subject to init_budget: ( sum{a in Asset} x[a] ) <= beta;
    subject to init_wealth: ( sum{a in Asset} x[a] ) == w;

ALM example - Meta-model (2)

    subject to trading{a in Asset}:
        x[a] <= ( V[a] * x(-1, a) ) + b[a] - s[a];

    subject to buy_sell:
        ( sum{a in Asset} b[a] ) <= ( sum{a in Asset} s[a] );

    subject to wealth: w <= ( sum{a in Asset} x[a] ) + f;
    subject to final_wealth:
        w <= ( sum{a in Asset} V[a] * x(-1, a) ) + f;

ALM example - Meta-model (3)

    deterministic f: 1..T;

    stochastic init_budget, init_wealth: 0;
    stochastic x, w: 0..T;
    stochastic b, s: 1..T-1;
    stochastic V, trading, buy_sell, wealth: 1..T;
    stochastic z, objective, cvar, final_wealth: T;

Thank you

Email [email protected]
http://www.hochreiter.net/ronald/

Some remarks on Scenario Computation

Ronald Hochreiter

Winter School on Stochastic Programming 2017

Scenario Computation

Definitions are often mixed up, so here is my take:

- Scenario Computation: Meta-level.
- Scenario Generation: Generating a scenario structure out of a (mostly continuous) specification, e.g. build a tree directly using the definition of a stochastic process.
- Scenario Tree Generation: Generating a scenario tree structure out of sampled scenario paths.
- Scenario Reduction: Reduce the cardinality without modifying the structure of the scenario set, i.e. reducing a discrete probability distribution or the number of branches & nodes of a scenario tree.
- Scenario Selection: Directly pick a certain number of scenarios out of the original set.

Scenario Generation - Numerical Example

Consider the following setup:

- Classical Markowitz Minimum Variance Portfolio Optimization
- All 30 Assets from the Dow Jones Industrial Average
- Data from the beginning of 2012 until the end of 2016
- Compute weekly returns on a daily basis: 1253 scenarios
- Asset means and Covariance matrix out of historical data

Scenario Generation - Numerical Example

[Figure: Minimum Variance Portfolio. Assets shown: AAPL, KO, DD, XOM, IBM, JNJ, MCD, MRK, NKE, PFE, PG, UNH, VZ, WMT.]

Scenario Generation - Numerical Example

[Figure: Minimum Variance Portfolio. Weights (axis range 0.00 to 0.15) of AAPL, KO, DD, XOM, IBM, JNJ, MCD, MRK, NKE, PFE, PG, UNH, VZ, WMT.]

Scenario Generation - Numerical Example

Now check the effect of scenario generation, i.e. we sample out of a multi-variate normal distribution with the given mean vector and Covariance matrix.

- Our original data set had 1253 scenarios.
- We sample from 100 to 900 with a stepsize of 100.
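The experiment can be sketched in base R under simplifying assumptions: three hypothetical assets instead of the 30 DJIA constituents, made-up mean vector and covariance matrix, and the closed-form minimum variance portfolio under the budget constraint only. The function names are illustrative. Sampling uses the Cholesky factor of the covariance matrix, so no additional package is needed.

```r
set.seed(42)

# Hypothetical "true" moments (illustration data, not the DJIA estimates)
mu <- c(0.0020, 0.0010, 0.0015)
C  <- matrix(c(0.040, 0.006, 0.004,
               0.006, 0.010, 0.002,
               0.004, 0.002, 0.020), nrow = 3, byrow = TRUE)

# Minimum variance weights under the budget constraint sum(x) = 1
min_variance_weights <- function(C) {
  w <- solve(C, rep(1, nrow(C)))
  w / sum(w)
}

# Draw n scenarios from N(mu, C); chol() returns the upper-triangular factor
# R with t(R) %*% R = C, so rows of z %*% R have covariance C
sample_scenarios <- function(n, mu, C) {
  z <- matrix(rnorm(n * length(mu)), nrow = n)
  sweep(z %*% chol(C), 2, mu, "+")
}

x_true <- min_variance_weights(C)
sizes  <- seq(100, 900, by = 100)

# Largest absolute weight difference between the sampled and the true problem
max_diff <- sapply(sizes, function(n) {
  S <- sample_scenarios(n, mu, C)
  max(abs(min_variance_weights(cov(S)) - x_true))
})

print(data.frame(sample_size = sizes, max_weight_difference = round(max_diff, 4)))
```

Comparing weights rather than objective values reflects the point that the decision is what matters when judging a scenario set.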

Scenario Generation - Numerical Example

[Figure: Objective (axis range 0.000184 to 0.000196) versus Sample Size (100 to 900), with the real objective value marked.]

Scenario Generation - Numerical Example

Question: Is it appropriate to check the objective function only? Should this be considered as the only measure to indicate the quality of a scenario computation method?

- Some people check the distance of the approximation/generation.
- However, most important: Decision

Scenario Generation - Numerical Example

[Figures: Asset Weight Difference for sample sizes n = 100, 200, ..., 900; axis range −0.10 to 0.10.]

Scenario Reduction

- Reduce the cardinality without modifying the structure.
- This is what most people do: quite some pitfalls included!
- Single-stage and two-stage case: Clustering, which comes in two main flavors:
  - Partitional Clustering (k-Means, ...)
  - Hierarchical Clustering
- Different distances possible.
- Problem: It is a greedy strategy, so you never end up at a good (probability) distance approximation.

Scenario Reduction - Open Source

Due to popular request, Scenario Reduction has been added to the open-source R package scenarios. Implemented algorithms:

- k-Means (most people use this)
- More sophisticated partitioning algorithm
- Hierarchical join

Available Distances:

- ℓ1 / Wasserstein / ...
- ℓ2 / Fortet Mourier of order 2 / ...
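A minimal sketch of k-means based scenario reduction, using only `kmeans()` from base R's stats package rather than the scenarios package, whose interface is not shown here. The scenario data are randomly generated and the function name `reduce_kmeans` is hypothetical. Each reduced scenario is a cluster center and its probability is the share of original scenarios assigned to it.

```r
set.seed(7)

# Hypothetical original scenario set: 1000 equally likely scenarios, 2 assets
S <- matrix(rnorm(1000 * 2, sd = 0.02), ncol = 2)

# Reduce to k scenarios: centers become the new scenarios, cluster shares the
# new probabilities
reduce_kmeans <- function(S, k) {
  fit <- kmeans(S, centers = k, nstart = 5, iter.max = 50)
  list(scenarios     = fit$centers,
       probabilities = as.numeric(fit$size) / nrow(S))
}

red <- reduce_kmeans(S, 50)

print(dim(red$scenarios))
print(sum(red$probabilities))
```

Because every center is the mean of its cluster, the probability-weighted mean of the reduced set equals the mean of the original set, while the variance generally shrinks.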

Scenario Reduction - Numerical example

Consider the following setup:

- All 30 Assets from the Dow Jones Industrial Average
- Data from the beginning of 2012 until the end of 2016
- Compute weekly returns on a daily basis: 1253 scenarios
- Stochastic Portfolio Optimization using MAD and CVaR(0.05)

Scenario Reduction - Numerical example

We reduce the scenario set to sizes 100 to 900 with stepsize 100 using two different algorithms:

- Partitioning (P), and
- Hierarchical (H)

with the ℓ1 (1) and ℓ2 (2) distance.

Scenario Reduction - Numerical example

[Figure: Objective - CVaR. CVaR (axis range −0.030 to −0.045) versus Reduction Size (100 to 900) for H1, H2, P1, P2.]

Scenario Reduction - Numerical example

[Figure: Objective - MAD. MAD (axis range 0.012 to 0.018) versus Reduction Size (100 to 900) for H1, H2, P1, P2.]

Scenario Generation - Numerical Example

[Figures: Asset Weight Difference for sample sizes n = 100, 200, ..., 900; axis range −0.10 to 0.10; CVaR, ℓ1, Partitional (left), Hierarchical (right).]

Scenario Generation - Numerical Example

[Figures: Asset Weight Difference for sample sizes n = 100, 200, ..., 900; axis range −0.10 to 0.10; MAD, ℓ1, Partitional (left), Hierarchical (right).]

Scenario Selection

(Open-Source) Multi-stage Scenario Generation

Ronald Hochreiter

Winter School on Stochastic Programming 2017

Issues

Overcomplication

In order to publish papers in research journals, multi-stage tree generation methods are often presented in an utterly complicated way, although the actual algorithms are often pretty simple and straightforward.

Available (open-source) code

Almost no open-source code is available and the binaries (if available) are also hard to use and somewhat restricted (e.g. to a certain OS).

The reinvention of the wheel

Due to the two previously mentioned issues - overcomplication and non-availability of code - most researchers start everything from scratch.

Issues

Heuristic tree-building

In case of multi-stage trees, the heuristic technique is more important than the chosen distance!

Believe it or not

- Take Moment Matching (MM) and Wasserstein distance (WS) minimization, and two tree building methods, i.e.
  - node-stage tree/approximation (NST/A) and scenario merging/forward (SM/F).
- The NST/A-WS and NST/A-MM trees will be much closer than the NST/A-MM and SM/F-MM trees!

Multi-stage Scenario Tree Generation

[Figure: Scenario tree-based three-layered approach. Layers: (1) Modeler View (Stage), (2) Stochastic View (Tree), (3) Data View (Node); stages: Root Stage, Recourse Stage, Terminal Stage.]

Multi-stage Scenario Tree Generation

In general, there are two strategies, which can be used to design a multi-stage scenario tree generator:

- Direct scenario tree sampling. Based on historical data or an econometric model, use sampling or node-wise approximation methods to build the scenario tree iteratively from the root node to the terminal stage.
- Scenario path simulation and optimal tree approximation. Use pre-sampled scenario paths to build an optimal approximation of this data set. Different methods are available, which try to preserve the time-dependency of the underlying process.

Scenario Tree Optimization - Distance Concept

Optimal approximations need to be based on some distance concept. The choice of this distance has different origins. It may be

- based on subjective taste, e.g. Moment Matching: Wallace et al., ...
- selected due to theoretical (stability) considerations - probability metric minimization: Pflug et al., Römisch et al., ...
- predetermined by chosen approximation method, e.g. Sampling: (R)QMC - Pennanen et al., SAA - Shapiro et al., ...

Note. Once the appropriate distance has been selected, the problem of choosing the appropriate heuristic to approximate the chosen distance is prevalent.

Multi-stage Scenario Tree Structure

Different scenario tree methods allow for different scenario tree structures. In general, there are two different types possible:

- Nodes per stage. The scenario structure is represented by a vector denoting the successors of each node in the respective stage, e.g. N(5, 3, 2) results in a scenario tree with 30 scenarios and 1 + 5 + 15 + 30 = 51 nodes.
- Nodes at stage. The scenario structure is represented by a vector containing the total number of nodes in each stage, e.g. S(10, 20, 30) results in a scenario tree with 30 scenarios and 1 + 10 + 20 + 30 = 61 nodes.

The first is common for root-to-terminal iterative node-per-node techniques, while the second is used by generation methods, which try to find the optimal per-node branching structure in addition to approximating tree values.
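The node counts of both structure types follow from simple arithmetic, which the following base R sketch reproduces for the two examples. The function names are hypothetical.

```r
# "Nodes per stage": N(n1, ..., nT) gives the number of successors of each node,
# so the node count of a stage is the cumulative product of the branching factors
nodes_from_branching <- function(branching) {
  per_stage <- cumprod(branching)
  list(nodes_per_stage = c(1, per_stage),
       scenarios       = per_stage[length(per_stage)],
       total_nodes     = 1 + sum(per_stage))
}

# "Nodes at stage": S(s1, ..., sT) gives the total number of nodes of each stage
nodes_from_stage_counts <- function(counts) {
  list(nodes_per_stage = c(1, counts),
       scenarios       = counts[length(counts)],
       total_nodes     = 1 + sum(counts))
}

n_tree <- nodes_from_branching(c(5, 3, 2))        # N(5, 3, 2)
s_tree <- nodes_from_stage_counts(c(10, 20, 30))  # S(10, 20, 30)

print(n_tree$nodes_per_stage)
print(c(n_tree$total_nodes, s_tree$total_nodes))
```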

Multi-stage Scenario Tree Format

- No common standard for representing discretized stochastic processes available, mainly because of the lack of commercial interest.
- To accommodate all different tree structures, and the implicit formulation of non-anticipativity constraints, a node-based vector/matrix data format of scenario trees is used:

    V(n, d)   d-dimensional value of node n
    A(n)      ancestor node of node n
    T(n)      stage of node n
    P(n)      probability to reach node n from its ancestor

Multi-stage Scenario Tree Generation

Node-based scenario generation:

- NST: Node (by Node)-Stage (by Stage) Tree.
- NST/B: Bootstrapping.
- NST/A: Approximation.
- NST/P: Process estimation & sampling.

Scenario-based scenario generation:

- SM: Scenario Merging - forward (SM/F) and backward (SM/B).
- SC/F: Forward-based Scenario Clustering.
- FL/B: Backward-based multi-dimensional Facility Location Tree Generation.

NST: Node (by Node)-Stage (by Stage) Tree

NST/B - Node-Stage Tree/Bootstrapping

Input: Historical return time-series S of length ℓ, stage-skip size k, node-based tree structure N(n_1, ..., n_T).

    For each stage t = 1, ..., T:
        For each node in stage t:
            Sample n(t) random values from S(t*k : l) (& all predecessor values)

- Advantage: Simple; for highly multi-variate structures the historical dependence can be kept cheaply.
- Disadvantage:
  - Trees might differ significantly in the univariate case.
  - NST/B cannot be used for a large number of stages if the time-series are not long enough.
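One possible reading of the bootstrapping loop, sketched in base R for the univariate case. The following choices are assumptions, not taken from the talk: the history is simulated, each stage spans k periods, k-period returns are taken from all overlapping windows of the history, children get equal conditional probabilities, and the result is stored in the node format V, A, T, P with 0-based node ids. The function name `nst_bootstrap` is hypothetical.

```r
set.seed(123)
returns <- rnorm(500, mean = 0.0005, sd = 0.01)   # hypothetical daily return history

nst_bootstrap <- function(returns, branching, k, root_value = 100) {
  # k-period compounded returns over all overlapping windows of the history
  n_win <- length(returns) - k + 1
  k_returns <- sapply(seq_len(n_win),
                      function(i) prod(1 + returns[i:(i + k - 1)]) - 1)

  V <- root_value; A <- NA_integer_; T_stage <- 0L; P <- 1
  parents <- 0L                                   # 0-based ids of the current stage
  for (t in seq_along(branching)) {
    children <- integer(0)
    for (p in parents) {
      r   <- sample(k_returns, branching[t], replace = TRUE)
      ids <- length(V) + seq_along(r) - 1L        # 0-based ids of the new nodes
      V       <- c(V, V[p + 1] * (1 + r))         # child value from parent value
      A       <- c(A, rep(p, length(r)))
      T_stage <- c(T_stage, rep(t, length(r)))
      P       <- c(P, rep(1 / length(r), length(r)))
      children <- c(children, ids)
    }
    parents <- children
  }
  data.frame(node = seq_along(V) - 1L, V = V, A = A, T = T_stage, P = P)
}

tree <- nst_bootstrap(returns, branching = c(3, 3), k = 20)   # N(3,3), k = 20
print(tree)
```

Running the function with different seeds shows how strongly small bootstrapped trees can differ in the univariate case.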

NST/B - Node-Stage Tree/Bootstrapping

Bootstrapping problem with small trees for IBM prices, N(3,3), k = 20:

NST/A - Node-Stage Tree/Approximation

Input: Historical return time-series S of length ℓ, stage-skip size k, node-based tree structure N(n_1, ..., n_T).

    For each stage t = 1, ..., T:
        For each node in stage t:
            Calculate an optimal single-stage approximation of
            n(t) values from S(t*k : l) (& all predecessor values)

- Advantage: For highly multi-variate structures the historical dependence might be kept - depending on the approximation.
- Disadvantage: Cannot be used for a large number of stages if the time-series are not long enough; needs an appropriate approximation method (distance and heuristic).

NST/A - Node-Stage Tree/Approximation

- Approximation: K-Means.
- N(3,3) and N(10,10), k = 30.

NST/P - Node-Stage Tree/Process

Input: Process, estimated parameters, node-based tree structure N(n_1, ..., n_T).

Example: A univariate process $\xi$ might be normally distributed and follow an additive recursion, i.e.

$$
\xi_0 = \mu_0, \quad \xi_1 \sim N(\mu_0, \sigma_0^2), \quad \xi_t = b\,\xi_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim N(\mu, \sigma^2),
$$

where

$$
\mu = \mu_0 (1 - b), \qquad \sigma^2 = \sigma_0^2 (1 - b^2),
$$

and $\xi_{t-1}$ and $\varepsilon_t$ are independent. $\xi$ is a stationary Gaussian Markov process.

- Advantage: Simple, flexible (choice of process).
- Disadvantage: Loss of time dependency, generally unfavorable convergence behavior.
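The stationarity claim for the example process can be checked in two lines. The derivation assumes that the innovation $\varepsilon_t$ is independent of $\xi_{t-1}$, that the innovation variance is $\sigma^2 = \sigma_0^2 (1 - b^2)$, and that $\xi_{t-1}$ already has mean $\mu_0$ and variance $\sigma_0^2$:

```latex
\begin{aligned}
\mathbb{E}[\xi_t] &= b\,\mathbb{E}[\xi_{t-1}] + \mu
                   = b\,\mu_0 + \mu_0 (1 - b) = \mu_0, \\
\operatorname{Var}[\xi_t] &= b^2 \operatorname{Var}[\xi_{t-1}] + \sigma^2
                   = b^2 \sigma_0^2 + \sigma_0^2 (1 - b^2) = \sigma_0^2 .
\end{aligned}
```

Mean and variance are therefore reproduced from one stage to the next, which is what makes the Gaussian recursion stationary for $|b| < 1$.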

Scenario-based scenario generation

Estimating stock returns, sampling 200 paths (length 100):

- Econometric time-series models ((V)AR(I)MA, (G)ARCH, ...)
- Specialized models (e.g. Wilkie model for actuarial use)
- Custom (company-specific) scenario generators

Scenario Merging - forward and backward

Input: Set of simulated paths, stage-based tree structure S(s_1, ..., s_T), one specific distance function.

    Calculate distance of each scenario to each other with chosen distance.
    for t = 1, ..., T (forward) or t = T, ..., 1 (backward)
        while there are more nodes than s_t
            Merge two scenarios closest to each other.

- Advantage: Rather straightforward, quite efficient and flexible (distance calculation only once at the beginning), same procedure for forward and backward generation.
- Disadvantage: Distance calculation can be computationally expensive.

Scenario Merging - forward and backward

Simple example: 4 scenarios - build S(2, 3, 4) tree.

Scenario Merging - F (black) and B (red)

Using time-series simulations, building an S(10, 20, 30) tree:

Scenario Merging - F (black) and B (red)

Using time-series simulations, building an S(10, 20, 30, 40, 50) tree:

SC/F - Forward-based Scenario Clustering

Advantage:

I Rather fast computation (approximation problems are small; only the first-stage approximation might take some time).

I Can be used for scenario trees with a large number of stages (more than 20, up to 500).

Disadvantage:

I Thinning out: at some stage, non-branching scenarios might occur.

I Implementation is quite tricky.

SC/F - Forward-based Scenario Clustering

Step One: Simulate trajectories.
Step Two: Calculate approximation of first stage.
Step Three: For subsequent stages, use clustered paths only.
Step Four: Iterative stage-wise clustering.
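The four steps can be sketched in base R as follows. This is a simplified illustration, not the package's implementation: the function `sc_forward`, the per-stage `branches` argument, the use of `kmeans` as the approximation, and the random-walk input are assumptions made for the example.

```r
# Sketch of forward scenario clustering; all names are illustrative assumptions.
sc_forward <- function(paths, branches) {
  branches <- as.integer(branches)
  n      <- nrow(paths)
  node   <- rep(1L, n)                       # all paths start in the root
  member <- matrix(NA_integer_, n, ncol(paths))
  for (t in seq_len(ncol(paths))) {
    new_node <- integer(n)
    next_id  <- 0L
    for (g in unique(node)) {
      idx <- which(node == g)                # only the paths clustered into node g
      # Thinning out: a node with too few paths does not branch any further
      k  <- if (length(idx) > branches[t]) branches[t] else 1L
      cl <- if (k > 1L) kmeans(paths[idx, t], centers = k)$cluster else rep(1L, length(idx))
      new_node[idx] <- next_id + cl
      next_id <- next_id + k
    }
    node <- new_node
    member[, t] <- node
  }
  member                                     # column t: node id of each path at stage t
}

set.seed(7)
# Step One: simulate trajectories (hypothetical random walks, 200 paths, 3 stages)
paths  <- t(apply(matrix(rnorm(200 * 3), nrow = 200), 1, cumsum))
member <- sc_forward(paths, branches = c(4, 3, 2))
apply(member, 2, function(x) length(unique(x)))   # nodes per stage
```

Only the first stage clusters all paths at once; every later clustering problem works on the subset of paths that fell into one node, which is why the computation stays fast even for many stages.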

FL/B: Multi-dimensional Facility Location

Advantage:

I The first approximation needs to be done by an optimal facility location approximation of dimension T × d.

I Very good convergence results.

Disadvantage:

I Underlying concept of facility location might be considered overkill.

I Choice of distance and approximation heuristic.
I Implementation is quite tricky.

Simulated paths and facility location view (for 2 stages)

Important decision: point in time of discrete stages, i.e. in this example:

I Simulated daily future values: January - April.
I Two Stages: February 1st and April 1st.

Input: 400 Scenarios. Requested Output: 1-12-40 scenario tree.

Facility location - Step One: Clustering

Facility location - Step Two: Projection

Facility location - Step Three: Clustering

Facility location - Step Four: Tree buildup

Facility location - Final scenario tree

Final scenario tree shown in two views: facility location view and scenario tree view.
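The two-stage procedure (400 scenarios in, 1-12-40 tree out) can be sketched in base R. Note the substitutions: `kmeans` stands in for a proper facility location heuristic, the input data are synthetic, and all variable names are assumptions made for this illustration.

```r
set.seed(3)
n_paths <- 400
# Hypothetical input: value of each simulated path at the two stage dates
stage1 <- rnorm(n_paths, mean = 100, sd = 5)
stage2 <- stage1 + rnorm(n_paths, sd = 5)
pts    <- cbind(stage1, stage2)          # facility location view: one 2-d point per path

# Step One: cluster the 2-d points into 40 leaf scenarios
leaf  <- kmeans(pts, centers = 40, iter.max = 100)
# Step Two: project the 40 centers onto the first-stage axis
proj  <- as.numeric(leaf$centers[, 1])
# Step Three: cluster the projections into 12 first-stage nodes
first <- kmeans(proj, centers = 12, iter.max = 100)
# Step Four: tree buildup, attaching each leaf to its first-stage node
tree <- data.frame(
  parent = as.integer(first$cluster),
  stage1 = as.numeric(first$centers[first$cluster, 1]),
  stage2 = as.numeric(leaf$centers[, 2]),
  prob   = leaf$size / n_paths
)
head(tree)
```

With three stages the points would live in three dimensions and the projection and clustering steps would be repeated once more, one dimension at a time.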

Facility location - Each stage adds a dimension

I Simulated daily future values: January - April.
I Three Stages: February 1st, March 1st, and April 1st.

Open-Source Implementation

Open Source R Package for multi-stage stochastic programs:

I Learning environment.
I Prototype testing environment.
I Open-source, free.

Not designed for production.

I Reducing scenario generation to the bare minimum.
I Provide clear, extensible code.

Open-Source Implementation

It’s implemented in R, but MatLab users rejoice:

I matlab.read.tree
I matlab.write.tree

Furthermore, R can be used in batch mode on the shell, i.e. easily integrated into other workflows. However:

I using a completely open-source alternative makes sense.
I working with S3 scenario (tree) classes is very convenient.

Open-Source Implementation

1. Install R - http://www.r-project.org/
2. Install RStudio IDE - http://www.rstudio.com/

On the R shell, simply type

install.packages("devtools")
library(devtools)
install_github("rhochreiter/scenarios")

and you are ready to build multi-stage scenario trees.

Easy tree generation and plotting

tree1 <- nst.p_inventory(c(5, 3, 2), 4, 0.6, 0.8)
plot(tree1)

[Plot: scenario tree; x-axis "Stage" (0 to 3), y-axis "Value" (3.0 to 5.0)]

Easy tree generation and plotting

tree2 <- nst.p_inventory(c(10, 5, 5), 4, 0.6, 0.8)
plot(tree2)

[Plot: scenario tree; x-axis "Stage" (0 to 3), y-axis "Value" (2.5 to 5.0)]

Scenario Simulation

b <- block.bootstrap(returns, 30, 20)
plot(return2price(b, as.numeric(Ad(series[1]))))

[Plots: two panels with x-axis "Index"; y-axes "Historical Values" and "Simulated Values", both ranging from 800 to 1600]
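For readers without the package at hand, the idea behind a block bootstrap can be sketched in base R. The functions below are not the package's `block.bootstrap` or `return2price`: their names, the argument order, the block length of 5, and the synthetic return history are assumptions for this illustration, as is the reading of the call above as 30 time steps and 20 paths.

```r
# Sketch of a moving-block bootstrap; names and defaults are illustrative assumptions.
block_bootstrap <- function(returns, n_steps, n_paths, block_len = 5) {
  n        <- length(returns)
  n_blocks <- ceiling(n_steps / block_len)
  replicate(n_paths, {
    # draw random block start positions, then glue the blocks together
    starts <- sample.int(n - block_len + 1, n_blocks, replace = TRUE)
    idx    <- unlist(lapply(starts, function(s) s:(s + block_len - 1)))
    returns[idx[seq_len(n_steps)]]
  })                                    # n_steps x n_paths matrix
}

# Turn simple returns into price paths that all start at `start`
returns_to_price <- function(r, start) start * rbind(1, apply(1 + r, 2, cumprod))

set.seed(11)
hist_returns <- rnorm(1000, mean = 0.0003, sd = 0.01)   # synthetic return history
sim    <- block_bootstrap(hist_returns, n_steps = 30, n_paths = 20)
prices <- returns_to_price(sim, start = 1000)
matplot(prices, type = "l", lty = 1, xlab = "Index", ylab = "Simulated Values")
```

Resampling whole blocks instead of single observations keeps short-range dependence, such as volatility clustering, inside each block.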

Thank you

Email [email protected]
Web http://www.hochreiter.net/ronald/
Web http://www.github.com/rhochreiter/scenarios/