
8/19/2019 Model Risk Help


ModelRisk Help

This is a cut-down version of the ModelRisk Help file for printing. It explains ModelRisk functionality for all features available in the Standard, Professional and Industrial editions. A compendium of the more than 100 distributions available in ModelRisk is available separately in PDF form from www.vosesoftware.com/content/ebookmr4.pdf. The full ModelRisk help file, which includes the distributions and the risk modeling theory, is installed together with the ModelRisk software. A complete version can also be found online at www.vosesoftware.com/ModelRiskHelp. The ModelRisk VBA help file is available from the Help drop-down menu within ModelRisk.




Table of Contents

ModelRisk 5 ............................................................................................................................................... 1

Upgrading your edition of ModelRisk ........................................................................................................ 3

Three ModelRisk editions ....................................................................................................................... 3

The trial version of ModelRisk ................................................................................................................ 4

How to purchase a copy of ModelRisk ................................................................................................... 4

Your First Model......................................................................................................................................... 5

Add distributions ..................................................................................................................................... 5

Define inputs .......................................................................................................................................... 7

Define outputs ........................................................................................................................................ 9

Run the model ...................................................................................................................................... 10

View the results .................................................................................................................................... 10

Sensitivity analysis ............................................................................................................................... 12

From analysis to decision ..................................................................................................................... 14

Next steps in learning to use ModelRisk and risk analysis .................................................................. 14

Distributions ............................................................................................................................................. 17

Distributions in ModelRisk .................................................................................................................... 17

Distribution functions and the U parameter .......................................................................................... 20

Select Distribution ................................................................................................................................ 22

Multivariate distributions....................................................................................................................... 25

Distribution editing functions ................................................................................................................ 37

Modeling with objects ........................................................................................................................... 40

Probability calculations in ModelRisk ................................................................................................... 43

Vose probability calculation f(x) F(x) and F-1(U) windows .................................................................. 47

Modeling expert opinion in ModelRisk ................................................................................................. 49

Expert Window ..................................................................................................................................... 51

Combined Distribution .......................................................................................................................... 57

VoseCombined ..................................................................................................................................... 59

Splicing Distributions ............................................................................................................................ 60

VoseSplice ........................................................................................................................................... 61

Risk Event Calculation ......................................................................................................................... 62

VoseRiskEvent ..................................................................................................................................... 63

Correlation and copulas ........................................................................................................................... 66

Correlation in ModelRisk ...................................................................................................................... 66




Copulas in ModelRisk .......................................................................................................................... 67

Bivariate Copula ................................................................................................................................... 70

Direction of a copula ............................................................................................................................ 73

VoseCopulaBiClayton .......................................................................................................................... 74

VoseCopulaBiFrank ............................................................................................................................. 76

VoseCopulaBiGumbel .......................................................................................................................... 78

VoseCopulaBiNormal ........................................................................................................................... 80

VoseCopulaBiT .................................................................................................................................... 82

Multivariate Copula .............................................................................................................................. 84

VoseCopulaMultiClayton ...................................................................................................................... 86

VoseCopulaMultiFrank ......................................................................................................................... 87

VoseCopulaMultiGumbel ..................................................................................................................... 88

VoseCopulaMultiNormal ...................................................................................................................... 90

VoseCopulaMultiT ................................................................................................................................ 91

VoseCopulaSimulate ............................................................................................................................ 92

VoseCopulaData .................................................................................................................................. 93

VoseCopulaDataSeries ........................................................................................................................ 94

Time Series .............................................................................................................................................. 97

Time series in ModelRisk ..................................................................................................................... 97

Univariate Time Series ....................................................................................................................... 100

VoseTimeAR1 .................................................................................................................................... 103

VoseTimeAR2 .................................................................................................................................... 104

VoseTimeMA1 .................................................................................................................................... 105

VoseTimeMA2 .................................................................................................................................... 106

VoseTimeARCH ................................................................................................................................. 107

VoseTimeARMA ................................................................................................................................. 108

VoseTimeEGARCH ............................................................................................................................ 110

VoseTimeAPARCH ............................................................................................................................ 112

VoseTimeGARCH .............................................................................................................................. 114

VoseTimeGBM ................................................................................................................................... 116

VoseTimeGBMAJ ............................................................................................................................... 117

VoseTimeGBMAJVR .......................................................................................................................... 119

VoseTimeGBMJD .............................................................................................................................. 121

VoseTimeGBMJDMR ......................................................................................................................... 122




VoseTimeGBMMR ............................................................................................................................. 123

VoseTimeGBMVR .............................................................................................................................. 124

Multivariate Time Series..................................................................................................................... 125

VoseMarkovMatrix ............................................................................................................................. 128

VoseMarkovSample ........................................................................................................................... 129

VoseTimeMultiAR1 ............................................................................................................................ 130

VoseTimeMultiAR2 ............................................................................................................................ 131

VoseTimeMultiBEKK .......................................................................................................................... 133

VoseTimeMultiGBM ........................................................................................................................... 134

VoseTimeMultiMA1 ............................................................................................................................ 136

VoseTimeMultiMA2 ............................................................................................................................ 137

VoseTimeSimulate ............................................................................................................................. 139

VoseTimeYule .................................................................................................................................... 140

VoseTimeDeath ................................................................................................................................. 141

Wilkie Models ..................................................................................................................................... 142

VoseTimeWilkie ................................................................................................................................. 145

VoseTimeDividends ........................................................................................................................... 146

VoseTimeDividendsA ......................................................................................................................... 147

VoseTimeLongTermInterestRate ....................................................................................................... 148

VoseTimeLongTermInterestRateA .................................................................................................... 149

VoseTimePriceInflation ...................................................................................................................... 150

VoseTimeSeasonalGBM .................................................................................................................... 151

VoseTimeShareYields ........................................................................................................................ 153

VoseTimeShareYieldsA ..................................................................................................................... 154

VoseTimeShortTermInterestRate ...................................................................................................... 155

VoseTimeShortTermInterestRateA .................................................................................................... 156

VoseTimeWageInflation ..................................................................................................................... 158

VoseTimeWageInflationA ................................................................................................................... 159

Subject Matter Expert (SME) Time Series Forecasts ........................................................................ 160

VoseTimeSME2Perc .......................................................................................................................... 161

VoseTimeSMEPoisson ...................................................................................................................... 163

VoseTimeSMESaturation ................................................................................................................... 166

VoseTimeSMEThreePoint .................................................................................................................. 169

VoseTimeSMEUniform ....................................................................................................................... 171





VoseTimeEmpiricalFit ........................................................................................................................ 173

Aggregate modeling ............................................................................................................................... 175

Aggregate modeling in ModelRisk ..................................................................................................... 175

Aggregate Monte Carlo ...................................................................................................................... 178

VoseAggregateMC ............................................................................................................................. 180

Aggregate FFT ................................................................................................................................... 182

VoseAggregateFFT ............................................................................................................................ 184

Aggregate Multivariate Monte Carlo .................................................................................................. 185

VoseAggregateMultiMC ..................................................................................................................... 187

Aggregate Multivariate FFT................................................................................................................ 188

VoseAggregateMultiFFT .................................................................................................................... 190

Aggregate De Pril ............................................................................................................................... 191

VoseAggregateDePril ......................................................................................................................... 194

Aggregate Discrete window ............................................................................................................... 196

VoseAggregateDiscrete ..................................................................................................................... 199

Aggregate Panjer ............................................................................................................................... 201

VoseAggregatePanjer ........................................................................................................................ 204

Stop Sum............................................................................................................................................ 206

VoseStopSum .................................................................................................................................... 208

Sum Product ...................................................................................................................................... 209

VoseSumProduct ............................................................................................................................... 211

VoseAggregateDeduct ....................................................................................................................... 212

VoseAggregateMoments .................................................................................................................... 213

VoseAggregateMultiMoments ............................................................................................................ 215

VoseAggregateProduct ...................................................................................................................... 216

VoseAggregateTranche ..................................................................................................................... 218

Optimization ........................................................................................................................................... 220

The OptQuest Optimizer .................................................................................................................... 220

Defining Targets in an Optimization Model ........................................................................................ 222

VoseOptTargetMaximize .................................................................................................................... 225

VoseOptTargetMinimize..................................................................................................................... 226

VoseOptTargetValue .......................................................................................................................... 227

Defining Decision Variables in an Optimization Model ...................................................................... 228

VoseOptDecisionBoolean .................................................................................................................. 231




VoseOptDecisionContinuous ............................................................................................................. 232

VoseOptDecisionDiscrete .................................................................................................................. 233

VoseOptDecisionList .......................................................................................................................... 234

Defining Decision Constraints in an Optimization Model ................................................................... 235

VoseOptConstraintMin ....................................................................................................................... 238

VoseOptConstraintMax ...................................................................................................................... 239

VoseOptConstraintBetween ............................................................................................................... 240

VoseOptConstraintEquals .................................................................................................................. 241

VoseOptConstraintString ................................................................................................................... 242

Defining Simulation Requirements in an Optimization Model ............................................................ 243

VoseOptRequirementMin ................................................................................................................... 245

VoseOptRequirementMax .................................................................................................................. 246

VoseOptRequirementBetween .......................................................................................................... 247

VoseOptRequirementEquals ............................................................................................................. 248

VoseOptPercentile ............................................................................................................................. 249

VoseOptCVARx ................................................................................................................................. 250

VoseOptCVARp ................................................................................................................................. 251

Optimization Settings Dialog .............................................................................................................. 252

Optimization Progress control ............................................................................................................ 254

Optimization Results Window ............................................................................................................ 255

Fitting models to data ............................................................................................................................ 259

Fitting in ModelRisk ............................................................................................................................ 259

Goodness of fit functions .................................................................................................................... 265

Distribution Fit .................................................................................................................................... 273

VoseTruncData .................................................................................................................................. 277

Bivariate Copula Fit ............................................................................................................................ 278

Multivariate Copula Fit ....................................................................................................................... 280

Empirical Copula ................................................................................................................................ 282

Univariate Time Series Fit .................................................................................................................. 284

Multivariate Time Series Fit................................................................................................................ 287

Ordinary Differential Equations (ODE) .................................................................................................. 289

Ordinary Differential Equations .......................................................................................................... 289

Ordinary Differential Equations (ODE) tool ........................................................................................ 291

VoseODE ........................................................................................................................................... 304





Other tools ............................................................................................................................................. 305

View Function ..................................................................................................................................... 305

Deduct Calculation ............................................................................................................................. 306

Data Viewer ........................................................................................................................................ 308

Extreme Values Calculation ............................................................................................................... 318

Find Vose Functions .......................................................................................................................... 320

Vose Ogive window ............................................................................................................................ 321

Simulation Settings Window............................................................................................................... 323

Output/Input Window .......................................................................................................................... 328

Simulation Progress Control .............................................................................................................. 333

ModelRisk Results Viewer layout ....................................................................................................... 335

ModelRisk’s Library ............................................................................................................................ 344

Portfolio Optimization ......................................................................................................................... 355

Data Object Window .......................................................................................................................... 357

Ruin Calculation ................................................................................................................................. 362

Depletion Calculation ......................................................................................................................... 364

Integrate Calculation .......................................................................................................................... 366

Interpolate Calculation ....................................................................................................................... 367

Correlation Matrix Calculation ............................................................................................................ 368

Bayesian Model Averaging .................................................................................................................... 369

Bayesian model averaging ................................................................................................................. 369

VoseBMA ........................................................................................................................................... 371

VoseBMAObject ................................................................................................................................. 372

VoseBMAProb .................................................................................................................................... 373

VoseBMAProb10 ................................................................................................................................ 374

VoseCopulaBMA ................................................................................................................................ 375

VoseCopulaBMAObject ..................................................................................................................... 376

VoseTimeBMA ................................................................................................................................... 377

VoseTimeBMAObject ......................................................................................................................... 378

Six Sigma ............................................................................................................................................... 379

ModelRisk's Six Sigma functions ....................................................................................................... 379

VoseSixSigmaCp ............................................................................................................................... 384

VoseSixSigmaCpk ............................................................................................................................. 385

VoseSixSigmaCpkLower .................................................................................................................... 386




VoseSixSigmaCpkUpper .................................................................................................................... 387

VoseSixSigmaCpm ............................................................................................................................ 388

VoseSixSigmaDefectPPM .................................................................................................................. 389

VoseSixSigmaDefectShiftPPM .......................................................................................................... 390

VoseSixSigmaDefectShiftPPMLower ................................................................................................ 391

VoseSixSigmaDefectShiftPPMUpper ................................................................................................ 392

VoseSixSigmaK ................................................................................................................................. 393

VoseSixSigmaLowerBound................................................................................................................ 394

VoseSixSigmaProbDefectShift........................................................................................................... 395

VoseSixSigmaProbDefectShiftLower ................................................................................................. 396

VoseSixSigmaProbDefectShiftUpper ................................................................................................. 397

VoseSixSigmaSigmaLevel ................................................................................................................. 398

VoseSixSigmaUpperBound................................................................................................................ 399

VoseSixSigmaYield ............................................................................................................................ 400

VoseSixSigmaZlower ......................................................................................................................... 401

VoseSixSigmaZmin ............................................................................................................................ 402

VoseSixSigmaZupper ........................................................................................................................ 403

Other functions....................................................................................................................................... 404

Bootstrap ............................................................................................................................................ 405

Extreme value .................................................................................................................................... 415

Simulation results ............................................................................................................................... 422

Distribution properties ........................................................................................................................ 441

Data analysis ...................................................................................................................................... 457

VoseCholesky .................................................................................................................................... 473

VoseCLTSum ..................................................................................................................................... 474

VoseCorrMatrix .................................................................................................................................. 475

VoseCorrMatrixU ................................................................................................................................ 476

VoseCorrToCov ................................................................................................................................. 477

VoseCovToCorr ................................................................................................................................. 479

VoseCurrentSample ........................................................................................................................... 481

VoseCurrentSim ................................................................................................................................. 482

VoseDataObject ................................................................................................................................. 483

VoseDeduct ........................................................................................................................................ 484

VoseDepletion .................................................................................................................................... 486





VoseDepletionFlag ............................................................................................................................. 488

VoseDepletionShortfall ....................................................................................................................... 489

VoseDepletionTime ............................................................................................................................ 490

VoseDescription ................................................................................................................................. 491

VoseDominance ................................................................................................................................. 492

VoseEigenValues ............................................................................................................................... 494

EigenVectors ...................................................................................................................................... 495

VoseExpression ................................................................................................................................. 496

VoseIdentity ....................................................................................................................................... 497

VoseInput ........................................................................................................................................... 498

VoseIntegrate ..................................................................................................................................... 499

VoseInterpolate .................................................................................................................................. 500

VosejkProduct .................................................................................................................................... 501

VosejkSum ......................................................................................................................................... 502

VosejProduct ...................................................................................................................................... 503

VosejSum ........................................................................................................................................... 504

VosejSumInf ....................................................................................................................................... 505

Kendall's tau ....................................................................................................................................... 506

VoseLibAssumption ........................................................................................................................... 507

VoseLibReference .............................................................................................................................. 508

VoseMeanExcessP ............................................................................................................................ 509

VoseMeanExcessX ............................................................................................................................ 510

VoseOutput ........................................................................................................................................ 511

VoseParameters ................................................................................................................................ 512

VosePrincipleEsscher ........................................................................................................................ 513

VosePrincipleEV ................................................................................................................................ 514

VosePrincipleRA ................................................................................................................................ 515

VosePrincipleStdev ............................................................................................................................ 516

VoseRuin ............................................................................................................................................ 517

VoseRuinFlag ..................................................................................................................................... 519

VoseRuinMaxSeverity ........................................................................................................................ 520

VoseRuinNPV .................................................................................................................................... 521

VoseRuinSeverity ............................................................................................................................... 522

VoseRuinTime .................................................................................................................................... 523




VoseRunoff......................................................................................................................................... 524

VoseSample ....................................................................................................................................... 528

VoseShuffle ........................................................................................................................................ 529

VoseSimTable .................................................................................................................................... 530

VoseSimulate ..................................................................................................................................... 532

VoseTangentPortfolio ......................................................................................................................... 533

VoseThielU ......................................................................................................................................... 535

VoseValidCorrmat .............................................................................................................................. 536

Database connectivity ............................................................................................................................ 538

ModelRisk database connectivity functions ....................................................................................... 538

Data Object Window .......................................................................................................................... 540

VoseDataObject ................................................................................................................................. 545

PK/PD module ....................................................................................................................................... 546

PK/PD module .................................................................................................................................... 546

Simulation Imported Data Files (SIDs) .................................................................................................. 552

SIDs (Simulation Imported Data Files) ............................................................................................... 552

Creating a SID .................................................................................................................................... 552

Using a SID ........................................................................................................................................ 557

Managing SIDs ................................................................................................................................... 557

VoseSID ............................................................................................................................................. 560

@RISK model converter ........................................................................................................................ 561

Crystal Ball model converter .................................................................................................................. 567

More on Conversion .............................................................................................................................. 571

ModelRisk Results Viewer ..................................................................................................................... 573

ModelRisk Results Viewer layout .......................................................................................................... 575

Graphical reports ................................................................................................................................ 575

General Controls ................................................................................................................................ 576

Statistical and data reports................................................................................................................. 579

Saving the report ................................................................................................................................ 582

Box Plots ................................................................................................................................................ 584

Graphing controls ............................................................................................................................... 584

Cumulative Plots .................................................................................................................................... 592

Histogram Plots...................................................................................................................................... 600

Pareto Plots ........................................................................................................................................... 608





Scatter plots ........................................................................................................................................... 617

Spider plots ............................................................................................................................................ 619

Time series plots .................................................................................................................................... 622

Tornado plots ......................................................................................................................................... 624

ModelTree .............................................................................................................................................. 627

Example models explaining risk analysis techniques ............................................................................ 633

Sum of a random number of random variables.................................................................................. 633

Financial risk analysis ........................................................................................................................ 635

Project risk analysis ........................................................................................................................... 657

Other problems .................................................................................................................................. 661

About this Help File ................................................................................................................................ 727

Authors ................................................................................................................................................... 727

About Vose - contacting us .................................................................................................................... 729

Updates .................................................................................................................................................. 730

FAQ - Troubleshooting .......................................................................................................................... 731




Introduction

ModelRisk 5

ModelRisk by Vose Software is a professional quality risk analysis add-in to Microsoft Excel. There are three editions available:

• Standard

• Professional

• Industrial

This help file covers all three editions. Topics that describe functions and features of the software have an icon like this in the top right corner:

This indicates which editions include the feature being described.

The ModelRisk ribbons for the three editions appear as follows:

ModelRisk Standard:

ModelRisk Professional:

ModelRisk Industrial:





Upgrading your edition of ModelRisk

Three ModelRisk editions

There are three editions of ModelRisk available. They are designed to help you match your technical and budgetary requirements:

1. ModelRisk Standard

ModelRisk Standard is designed to be as easy to use as possible. It is a professional quality product that includes all the common distributions used in risk analysis, plus a wide range of correlation capabilities, and great graphical results which you can share electronically with non-ModelRisk users by using our free ModelRisk Results Viewer. It also includes a converter if you want to convert models from other Excel risk analysis add-ins.

The Standard edition has 8 interfaces and adds 99 new functions to Excel.

2. ModelRisk Professional

The Professional edition of ModelRisk is designed for people who need to build more sophisticated models. The features are sufficient for most risk analysis problems. It adds a large array of tools and features to the Standard edition, including:

• Over 100 different distributions

• More correlation shapes

• Stochastic optimization

• Time series

• Markov chains

• Fitting distributions, copulas and time series to data

• Expert elicitation tools

• Extreme value tools

• Modeling with Objects

• Aggregate (compound) modeling

• Interactive data visualizing tool

• Probability calculations

The Professional edition has 21 interfaces and adds 1011 new functions to Excel.

3. ModelRisk Industrial

The Industrial edition of ModelRisk is designed for real power users! It adds some very sophisticated tools and features to the Professional edition, which will be of greatest appeal to the insurance, finance, engineering and scientific user. For example:





• Six sigma support

• Spliced and Deduct distributions

• Risk event modeling

• PK/PD (Pharmacokinetic/Pharmacodynamic) Tool

• Financial time series (ARCH, GARCH, APARCH, multivariate GBM, Wilkie, etc.)

• FFT, multivariate FFT, de Pril, custom logic and other aggregate modeling tools

• Bayesian model averaging for distribution, copula and time series fits

• Database connectivity for model fitting

• Eigenvalue and eigenvector determination, Cholesky decomposition

• Interpolation, numerical integration, summation and ordinary differential equation tools

• Insurance fund ruin and depletion tools, runoff triangle simulation, portfolio optimization, mean excess and premium calculation tools

• Reference library system

The Industrial edition has 33 interfaces and adds 1197 new functions to Excel.

The trial version of ModelRisk

The trial version of ModelRisk provides you with all the features of ModelRisk Industrial, our most powerful edition, for 15 days. The trial period can be extended once, by telephone only, using one of the numbers below.

Once the trial has expired, ModelRisk will no longer run, but a dialog will open providing links to instructions on our Web site on how to extend the trial or make a purchase.

How to purchase a copy of ModelRisk

Purchase online

You can request a quote or purchase copies of ModelRisk using a credit card online from here. Volume discounts are automatically calculated during the secure transaction.

By phone or email

You are welcome to contact us by phone or email using the contact details here if, for example:

• You prefer speaking to a friendly voice rather than ordering online;

• You need to pay by check or bank transfer;

• You would like to arrange some training on how to use ModelRisk together with your software purchase;

• You would like some training on how to perform a high quality risk analysis;

• You have a potential risk analysis consulting project to discuss;

• You need a network license;

• You need some advice before purchasing on which edition would best suit your needs; or

• You would like to arrange a demo of the software.




Your First Model

ModelRisk is a risk analysis add-in application for Excel by Vose Software BVBA. This topic is aimed at the risk analysis novice and introduces the very basics of building a Monte Carlo simulation model to get you started.

We’ll begin with the following spreadsheet model for the cost of building a house. The finished model can be downloaded here.

Column C contains your best guess at how much each element of the project might cost, summing to a total of $396,000 in Cell C13.

However, these are just best guesses and the actual cost could be higher or lower. For example, you might already have agreed the purchase of the land, so the price is known, but the cost of laying the foundations might be up to 10% lower, or 25% higher. We can build a couple of extra columns showing the percentage range:

Add distributions

In another column we now add ModelRisk functions that will generate random values within those ranges, with a most likely value of 100%, by clicking the Select Distribution button:

This opens up a dialog in which we can choose from a very wide range of distributions. In this case, the Subjective group of distributions is most appropriate because these are subjective estimates:





The most common choices would be a PERT or Triangle distribution because they are defined by their minimum, mode (most likely) and maximum values, which is the information that we have in this model. We’ll pick both by using CTRL-click and then OK.

ModelRisk plots these two distributions together. We can link each distribution’s parameter values to cells in Excel:

Let’s say that the Triangle distribution better reflects your opinion because it gives more probability to the right-hand side of the range. Select the Triangle (by clicking on its name, highlighted here in pink) and then click the insert icon to place the Triangle distribution into the correct model cell. There are several options available at this point:




‘Distribution’ is the most commonly used option, which will insert a function in Excel that will randomly generate values from this distribution. Cell F4 (the selected location) now displays a VoseTriangle distribution with minimum, mode and maximum values of 90% (D4), 100%, and 125% (E4) respectively.
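To get a feel for what such a cell does on each recalculation, the following is a minimal Python sketch of sampling from a Triangle(90%, 100%, 125%) distribution. It uses Python's standard `random.triangular` rather than ModelRisk itself, and the helper name `sample_cost_factor` is introduced here purely for illustration.

```python
import random

def sample_cost_factor(minimum=0.90, mode=1.00, maximum=1.25, rng=random):
    """Draw one random multiplier from a Triangle(min, mode, max) distribution."""
    # Note the argument order: random.triangular takes (low, high, mode).
    return rng.triangular(minimum, maximum, mode)

rng = random.Random(1)  # fixed seed so the run is repeatable
factors = [sample_cost_factor(rng=rng) for _ in range(10_000)]

print(min(factors), max(factors))     # every draw lies inside [0.90, 1.25]
print(sum(factors) / len(factors))    # close to (0.90 + 1.00 + 1.25) / 3 = 1.05
```

The sample mean sits above 100% because the range extends further above the most likely value than below it.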

Define inputs

We will name this cell as an input distribution to the model by clicking on the Output/Input button:

The following dialog box appears:





Here we select Cell B4 for the Name field, select Input rather than Output, and click OK. The cell formula has now changed to include a VoseInput function. This function does not alter the calculation in any way, but is useful at a later stage discussed below.

We can now copy this formula through the rest of the column:




Next, we write a new formula to calculate the total project cost with these random variations from the most likely values. In this case, we will use Excel’s SUMPRODUCT function:

Define outputs

Finally, since this is the focus of our problem, we name the cell as a ModelRisk output, using the same Outputs/Inputs dialog as before but now selecting the Output rather than Input option. The final formula in cell F13 now becomes:

=VoseOutput(D13)+SUMPRODUCT(C3:C11,F3:F11)

The model is finished. Now it is time to analyze what it can tell us.





Run the model

In order to understand how much uncertainty there is in the total cost of the project we need to run a Monte Carlo simulation, which produces a large set of probabilistically weighted ‘what-if’ scenarios by picking different random values from each of the model’s distributions and calculating the total cost each time.

To run a simulation in ModelRisk, simply select the number of samples to run in the ribbon dialog (in the screenshot above it is set to 100, which we’ll change to 50,000) and then click on:

ModelRisk will then run 50,000 Monte Carlo ‘samples’, which takes about 14 seconds.
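What the simulation does on each sample can be sketched outside Excel. The Python below is an illustration only, not ModelRisk's implementation: apart from the fixed land price and the 90% to 125% foundations range, every line item, cost and range is hypothetical, chosen so that the best guesses sum to $396,000. Each sample draws a Triangle factor per item and sums the products, which is what the SUMPRODUCT formula does.

```python
import random

# Hypothetical line items: (best guess in $, minimum factor, maximum factor).
# Only the fixed land price and the 90%-125% foundations range come from the
# text; all other figures are invented for illustration.
ITEMS = {
    "Land":        (100_000, 1.00, 1.00),
    "Foundations": (40_000, 0.90, 1.25),
    "Walls":       (80_000, 0.95, 1.15),
    "Roofing":     (90_000, 0.90, 1.20),
    "Finishing":   (86_000, 0.95, 1.10),
}

def simulate_total_cost(items, n_samples, seed=None):
    """Return one simulated total project cost per Monte Carlo sample."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_samples):
        total = 0.0
        for best_guess, low, high in items.values():
            # Triangle factor with its mode at 100% of the best guess;
            # items with no range (low == high) keep their known price.
            factor = rng.triangular(low, high, 1.0) if low < high else 1.0
            total += best_guess * factor
        totals.append(total)
    return totals

totals = simulate_total_cost(ITEMS, n_samples=50_000, seed=42)
print(f"best-guess total:     ${sum(cost for cost, _, _ in ITEMS.values()):,.0f}")
print(f"mean simulated total: ${sum(totals) / len(totals):,.0f}")
```

Because most ranges extend further above 100% than below it, the mean simulated total comes out above the best-guess total.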

View the results

When the simulation has finished, ModelRisk will open the Results Viewer window:

On the left is a list of the named outputs and inputs of the model (i.e. those cells containing a VoseOutput or VoseInput function). On the right is a graph of the output (total cost) and at the bottom a list of pages. One can add more pages by clicking the right-most tab.

The original $396,000 estimate based on adding the best guess values is quite far to the left, meaning that there is a high probability of the project costing more. We can see what that probability is by moving the sliders, and also find a more realistic budget by clicking the icon above the graph, which opens the following dialog:




Here, we have entered the original $396,000 value and asked for a budget for which there is a 90% probability the actual cost will fall below. Click OK, and the sliders move to reflect these changes:





It shows that, given the assumptions made earlier, there is only about a 6% chance of falling below the original estimate, and that there is only a 10% probability of exceeding a more conservative budget of $415,000.
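Both numbers read off the sliders are simple summaries of the simulated totals: the fraction of samples at or below a value, and the value below which a given percentage of samples fall. A minimal Python sketch follows; the function names and the toy data are hypothetical, and ModelRisk may interpolate percentiles differently.

```python
import math

def prob_below(samples, threshold):
    """Fraction of simulated values that fall at or below the threshold."""
    return sum(1 for s in samples if s <= threshold) / len(samples)

def percentile(samples, percent):
    """Smallest simulated value with at least `percent` % of samples at or below it."""
    ordered = sorted(samples)
    index = max(0, math.ceil(percent * len(ordered) / 100) - 1)
    return ordered[index]

# Ten hypothetical simulated totals, in $ thousands.
toy = [390, 395, 400, 405, 410, 415, 420, 425, 430, 435]

print(prob_below(toy, 396))   # 0.2 -> two of the ten totals are at or below 396
print(percentile(toy, 90))    # 430 -> nine of the ten totals are at or below 430
```

With a real run you would pass the full list of simulated totals instead of the toy list.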

Sensitivity analysis

The histogram plot shows that the total cost might lie anywhere between around $390,000 and $425,000. You might well be interested in knowing which of the costs is driving this uncertainty, which is the purpose of performing a sensitivity analysis. ModelRisk offers many variations on sensitivity analysis because it is a very important component of risk-based decision making. We’ll look at just two here.

The first type of sensitivity analysis is a tornado chart, which ModelRisk will generate by clicking this icon:

resulting in the following plot:




This plot shows the sensitivity of the 90th percentile of the total cost distribution to each input distribution. It shows that roofing costs drive the project cost uncertainty the most. If the roofing cost is low, the project cost’s 90th percentile is around $404,000, and if the roofing cost is high, the project cost’s 90th percentile is around $422,000, which is a wider range than for any other input variable.

The second type of sensitivity analysis is called a Spider Plot, which ModelRisk will generate by clicking this icon:

resulting in the following plot:





This plot gives more detailed information than the Tornado Plot. Here we are looking at the sensitivity of the mean total cost to each input (the mean is the ‘balance point’ of the histogram distribution; we could also look at a percentile or other statistical attributes). Again, it shows that roofing costs are dominant because they give the greatest vertical range. In this problem, we are dealing with costs so there is a linear relationship between the inputs and the output, reflected in lines that increase from bottom left to top right in the plot, but in more involved problems a spider plot can reveal more complex relationships.
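One way to compute the kind of conditional statistic a spider plot displays is to sort the samples by one input, split them into equal-count bands, and take the mean output within each band. The Python below is a sketch of that idea under stated assumptions: it is not ModelRisk's algorithm, the function name `conditional_means` is hypothetical, and the two-input model with its costs and ranges is invented for illustration.

```python
import random

def conditional_means(inputs, outputs, n_bands=5):
    """Mean output within each equal-count band of the sorted input values."""
    pairs = sorted(zip(inputs, outputs))   # order the samples by input value
    band_size = len(pairs) // n_bands
    means = []
    for b in range(n_bands):
        band = pairs[b * band_size:(b + 1) * band_size]
        means.append(sum(out for _, out in band) / len(band))
    return means

rng = random.Random(7)
n = 20_000
roofing = [rng.triangular(0.90, 1.20, 1.0) for _ in range(n)]       # hypothetical range
walls = [rng.triangular(0.95, 1.15, 1.0) for _ in range(n)]         # hypothetical range
total = [90_000 * r + 80_000 * w for r, w in zip(roofing, walls)]   # hypothetical costs

for name, factor in (("Roofing", roofing), ("Walls", walls)):
    means = conditional_means(factor, total)
    print(name, [round(m) for m in means], "vertical range:", round(means[-1] - means[0]))
```

The input whose line covers the largest vertical range (here the hypothetical roofing cost) is the one driving most of the output's uncertainty.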

From analysis to decision

The analysis clearly provides some important information for a decision maker:

1. The budget should be set closer to $420,000 to be reasonably sure of having the cash available to complete the project.

2. It is probably worth investigating whether it is possible to reduce the uncertainty on roofing costs (as well as the wall construction and laying the foundation) because these will firm up the cost estimate considerably.

Next steps in learning to use ModelRisk and risk analysis

ModelRisk has a very extensive range of risk analysis tools for you to explore. For example, in the model described in this document, perhaps the major driver behind the roofing and wall construction uncertainty is the competence of the contractor, and the same contractor is doing both parts of the project. That means that if the contractor turns out to be incompetent it will affect both parts of the project adversely. In other words, there is a correlation between these two input variables that needs to be described because it will increase the uncertainty of the total cost estimate. ModelRisk offers a range of correlation tools to build correlation relationships.
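Why a positive correlation widens the total can be seen from the standard variance formula for a sum of two random variables. As an illustration only, write $X$ for the roofing cost and $Y$ for the wall construction cost, with standard deviations $\sigma_X$ and $\sigma_Y$ and correlation coefficient $\rho$ (these symbols are introduced here and are not ModelRisk notation):

```latex
\operatorname{Var}(X + Y) = \sigma_X^2 + \sigma_Y^2 + 2\rho\,\sigma_X\sigma_Y
```

With $\rho = 0$ the last term vanishes; any $\rho > 0$ adds to the variance. For instance, if $\sigma_X = \sigma_Y = \sigma$, the standard deviation of the sum is $\sqrt{2}\,\sigma \approx 1.41\,\sigma$ when the costs are uncorrelated and $\sqrt{3}\,\sigma \approx 1.73\,\sigma$ when $\rho = 0.5$.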




You might have a lot of data you wish to use to support your risk analysis. ModelRisk offers advanced yet user-friendly tools for fitting distributions, correlations, and time series, as well as a range of features to statistically and visually explore your data.

ModelRisk also comes with a very extensive help file that you can browse and search through. There is a wide variety of example models you can work through too. Vose Software (www.vosesoftware.com) and our reseller network also provide in-house and public training courses in building risk analysis models and using them to make decisions. The courses are written and presented by professional risk analysts, so while you learn to use ModelRisk you will also benefit from the real world experience of a seasoned risk analyst.

You can also download this topic in PDF format.




ModelRisk INDUSTRIAL Edition

Distributions

Distributions in ModelRisk

ModelRisk includes a large number of distributions, from which you can sample random values, calculate the joint probability for given x data values, calculate and use the statistical moments, etc. We generally refer to these as VoseDistributions.

Each distribution has a separate topic with an explanation of its use and mathematics (e.g. Normal distribution).

Every univariate distribution in ModelRisk comes as a set of functions added to Excel: VoseDistribution, VoseDistributionProb, VoseDistributionProb10 and VoseDistributionObject. These functions are explained below.

These functions are also available for custom distributions like VoseDeduct, VoseCombined, VoseAggregatePanjer, VoseAggregateFFT, VoseRiskEvent, etc.

For a reminder of the parameters of a function, Excel's function arguments dialog can be convenient. You call this dialog by clicking the fx (Insert Function) button next to the formula bar.

For an explanation about functions for fitting distributions, see Distribution fitting functions.

VoseDistribution

The general syntax for sampling a random value from a distribution is as follows:

=VoseDistribution([parameters separated by commas], U )

where Distribution is replaced by the name of the distribution.

• [parameters separated by commas] - each distribution has its own specific parameters. For example, the PERT takes a min, mode and max parameter, in that order. You can always look this information up in that distribution's topic or through the Function Arguments window as explained above.

• U - If the (optional) U parameter is provided, the function returns the inverse cumulative for that U value, i.e. the value of the distribution at cumulative probability U. U has to be a value on [0,1]. More information about the U parameter and its use can be found in this topic: Distribution functions and the U parameter

For example, to simulate a random value from a Cauchy(1,2) distribution, use

=VoseCauchy(1,2)

ModelRisk uses the Mersenne twister to generate random numbers.

To return the 99th percentile from a Cauchy(1,2) distribution, use





= VoseCauchy(1,2,0.99)
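The relationship between the two calls can be illustrated with the inverse cumulative function of the Cauchy distribution, which has a closed form. The Python below is a sketch only: it assumes that the two parameters of Cauchy(1,2) are the location and scale in that order, and the function names are introduced here for illustration. Python's own `random` module also happens to use the Mersenne Twister generator.

```python
import math
import random

def cauchy_inverse_cdf(u, location, scale):
    """Value x such that P(X <= x) = u for a Cauchy(location, scale) distribution."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1 (the Cauchy is unbounded)")
    return location + scale * math.tan(math.pi * (u - 0.5))

def cauchy_sample(location, scale, rng=random):
    """Random draw: pass a uniform random number through the inverse CDF."""
    u = rng.random()
    while u == 0.0:   # random() can return exactly 0.0, where the quantile is unbounded
        u = rng.random()
    return cauchy_inverse_cdf(u, location, scale)

print(cauchy_inverse_cdf(0.99, 1, 2))   # 99th percentile, roughly 64.64
print(cauchy_inverse_cdf(0.50, 1, 2))   # the median equals the location, 1.0
print(cauchy_sample(1, 2))              # a different random value on every call
```

Supplying U therefore turns the sampling function into a percentile lookup, while omitting it amounts to letting the generator choose U at random.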

Multivariate distributions have an output of multiple cells, so random values from these are generated through an array function.

VoseDistributionObject

Every distribution has a corresponding Object function. When a ModelRisk function requires the distribution itself as a parameter, rather than a sampled value from it, this parameter should be provided as a Distribution Object.

Using distribution Objects has the great advantage that you can keep your distributional assumptions in one place in the model, making it easier to maintain and update (analogous to keeping your constants in one place).

The general syntax for creating a distribution object is as follows:

=VoseDistributionObject([parameters separated by commas])


• [parameters separated by commas] - each distribution has its own specific parameters. For example, the PERT takes a min, mode and max parameter - in that order. You can always look this information up on that distribution's help file topic or through the Function Arguments window as explained above.

For example, to calculate the first four statistical moments of a LogNormal(1,2) distribution, you would use the VoseMoments array function on a LogNormal distribution object:

{=VoseMoments(VoseLogNormalObject(1,2))}

It is generally good practice to place a distribution object in a separate cell to keep an overview. To place a Cauchy(1,2) Distribution Object in a spreadsheet cell you would use:

=VoseCauchyObject(1,2)

The above formula will be displayed as VoseCauchy(1,2).

For a more thorough explanation about objects see Modeling with objects.

VoseDistributionProb

These functions allow you to calculate the joint probability density/mass, joint cumulative probability and inverse cumulative of a given value or set of values.

General form:

VoseDistributionProb({x}, {parameters}, cumulative, truncation)


• {x} - a set of one or more values or cell references, on which the probability calculation is to be performed

• {parameters} - the parameters of the distribution

• Cumulative - an optional Boolean parameter. Set to FALSE (default) to return the joint probability density for continuous distributions or the joint probability mass for discrete distributions. Set to TRUE to return the joint cumulative probability.

• Truncation - optional parameter that takes the form of either VoseXBounds(min,max) or VosePBounds(min,max), to truncate at specified x-values or p-values respectively. Use VoseShift to shift the distribution along the X axis.

The probability calculation functions are explained more thoroughly in the topic Probability calculations in ModelRisk.





VoseDistributionProb10

The joint probability or probability density for a large set of values can quickly approach values too small for Excel to handle. ModelRisk therefore has a set of functions that return the log base 10 of the probability calculations described above.

General form:

VoseDistributionProb10({x}, {parameters}, cumulative, truncation)


• {x} - a set of one or more values or cell references, on which the probability calculation is to be performed

• Cumulative - an optional Boolean parameter. Set to FALSE (default) to return the joint probability density for continuous distributions or the joint probability mass for discrete distributions. Set to TRUE to return the joint cumulative probability.

• Truncation - optional parameter that takes the form of either VoseXBounds(min,max) or VosePBounds(min,max), to truncate at specified x-values or p-values respectively.

The probability calculation functions are explained more thoroughly in the topic Probability calculations in ModelRisk.
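To see why a log10 version is useful, the following Python sketch multiplies 1000 Normal(0,1) density values directly and then sums their base-10 logarithms instead. It uses only the Python standard library; the data set `xs` is hypothetical and the code illustrates the numerical issue, not ModelRisk's internals.

```python
import math
from statistics import NormalDist

dist = NormalDist(mu=0, sigma=1)
xs = [3.0] * 1000  # hypothetical observations

# Direct product of densities: underflows to zero in double precision
joint_density = math.prod(dist.pdf(x) for x in xs)

# Sum of log10 densities: the same quantity on a log scale, no underflow
log10_joint = sum(math.log10(dist.pdf(x)) for x in xs)
```

Here `joint_density` is returned as 0.0, while `log10_joint` is roughly -2353, a value that can still be compared between candidate distributions.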

Truncating and shifting distributions

You can truncate or shift each of the distributions in ModelRisk by using the VoseXBounds, VosePBounds and/or VoseShift functions as parameters. For example:

=VoseGammaObject(3,40,,VoseXBounds(,120))

produces a Gamma(3,40) distribution object constrained to lie below 120.

=VoseGammaObject(3,40,,VoseShift(30))

produces a Gamma(3,40) distribution object shifted 30 units to the right along the X axis.

You can use both the shift and bound functions at the same time, separated by commas:

=VoseGamma(3,40,,VoseXBounds(70,120),VoseShift(30))

will generate random values from a Gamma(3,40) distribution constrained between 70 and 120 and then shifted 30 units to the right.

When using these functions for truncating and/or shifting a distribution, remember to leave an "open space" for the U parameter!

Apart from the method described above, one could also build logic into the model that rejects nonsensical values. For example, using the IF function: A2: =IF(A1<0,NA(),A1) only allows values into cell A2 from cell A1 that are >=0 and produces an error in cell A2 otherwise.

If you are faced with the problem of needing to constrain the tail of a distribution, however, to avoid unwanted values, it is worth questioning whether you are using the appropriate distribution in the first place.

For modifying a distribution specifically to model deductibles and payout limits of a claim severity distribution, see the Deduct calculation window.





Distribution functions and the U parameter

Every ModelRisk function that generates random samples from a univariate distribution includes an optional U parameter to enable one to control the generation of the sampling. For example, the VoseNormal function can be written as follows:

=VoseNormal(100,10)

in which case it will return a random sample from a Normal distribution with mean of 100 and standard deviation of 10. One can also include the optional U parameter as follows:

=VoseNormal(100,10,0.95)

This function will now return the 95th percentile of the Normal distribution.

Leave space for the U parameter

ModelRisk offers a number of functions that modify a distribution. For example, the VoseXBounds(min,max) function will constrain a distribution to lie within the Min to Max range. These modifying functions are included within the distribution function after all the usual parameters. Even if you do not use the U parameter, you still need to leave space for it as follows:

=VoseNormal(100,10,,VoseXBounds(90,120))

This formula will generate a Normal(100,10) distribution bounded to lie within the range [90,120].
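One common way to sample from a bounded distribution is to rescale the uniform number so that it only covers the cumulative probabilities between the two bounds. The Python sketch below applies this to the Normal(100,10) example; it is an illustration under that assumption, and the help file does not state that ModelRisk truncates in exactly this way. The function name `truncated_normal` is hypothetical.

```python
import random
from statistics import NormalDist

def truncated_normal(mu, sigma, lo, hi, u=None):
    """Sample a Normal(mu, sigma) constrained to [lo, hi] by inversion."""
    dist = NormalDist(mu, sigma)
    p_lo, p_hi = dist.cdf(lo), dist.cdf(hi)  # cumulative probabilities at the bounds
    if u is None:
        u = random.random()
    # Map u from [0,1) onto [p_lo, p_hi) and invert the cumulative curve
    return dist.inv_cdf(p_lo + u * (p_hi - p_lo))

samples = [truncated_normal(100, 10, 90, 120) for _ in range(1000)]
```

Every value in `samples` lies within [90,120], up to floating-point rounding.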

The inversion method

The ModelRisk distribution functions randomly generate numbers from a chosen distribution using the inversion method, where every distribution has its own set of parameters (shape, scale or location parameters). This method first constructs a cumulative distribution curve for the distribution, as shown in the figure.

Then a random number is generated between zero and one (using the Mersenne Twister random number generator), and this value is used to find the variable value that corresponds to a cumulative probability equal to the random number that was generated.

In this figure, a U value of 0.7 is used with a Pareto(20, 0.95) distribution, returning the 70th percentile of the distribution, equal to 1.00895.
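The Pareto figure quoted above can be reproduced with a few lines of Python. This sketch assumes the parameterization Pareto(θ, a) with shape θ = 20 and minimum a = 0.95, so that F(x) = 1 - (a/x)^θ; that assumption is consistent with the quoted percentile, but the function name is hypothetical.

```python
import random

def pareto_inverse_cdf(theta, a, u):
    """Value x with cumulative probability u, assuming F(x) = 1 - (a/x)**theta for x >= a."""
    return a / (1.0 - u) ** (1.0 / theta)

# Fixed U value: returns a percentile
p70 = pareto_inverse_cdf(20, 0.95, 0.7)

# Random U values: returns random samples (inversion method)
draws = [pareto_inverse_cdf(20, 0.95, random.random()) for _ in range(5)]
```

With U = 0.7 the result agrees with the 1.00895 read from the figure.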









Select Distribution

Introduction

Use the Select Distribution window to insert a randomly sampled value, a distribution object or a percentile calculation from any of the univariate distributions in the spreadsheet.

The Select Distribution window lets you choose between different categories of distributions. From the list on the left, you can choose to see all available univariate distributions, or those of one of the following categories:

• Discrete Univariate distributions
• Continuous Univariate distributions
• Multivariate distributions
• Claim Size distributions
• Claim Frequency distributions
• Unbounded distributions
• Left Bounded distributions
• Both Bounded distributions
• Subjective distributions
• Waiting Time distributions

Click a distribution to select it (hold CTRL while clicking, or drag the mouse, to select more than one distribution) and then press the OK button.

Once the distribution(s) of choice are selected, you are taken to the Distribution Details window, where you can specify the distribution parameters, change the percentiles, export a sampled random value to a spreadsheet cell, etc.

Also, dynamically updating PDF, PMF and/or CDF graphs and useful summary statistics are shown.

Window elements

Toolbar

From left to right, the toolbar buttons allow you to:

• Show/hide the PDF or PMF

• Show/hide the CDF

• Show/hide the panel on the left with the parameters of the selected distribution(s)

• Show/hide the panel on the right with the statistics of the selected distribution(s)

• When multiple distributions are loaded, show the graphs of all loaded distributions combined, or only the graph of the selected distribution.

• Load distribution(s) from spreadsheet(s)

• Insert the selected distribution into spreadsheet cell(s). You can insert the distribution as an object, a sampled value from it, or a value corresponding to a U-parameter on [0,1]

• Help on this window (this brings you to the current help page)

For each loaded distribution, a parameters panel is shown on the left.

• The distribution's name. This has a pink background if the distribution is currently selected, and a white background if not.

• Buttons for: help on this distribution, replace by a new distribution, minimize/expand the parameters, delete distribution from the list.

• The distribution's parameters and their values. These are the values that occur in the distribution's PDF and CDF and typically define its scale, location, and shape. You can change these manually or link them to a spreadsheet cell.

• The distribution's boundary and shift values. For unbounded distributions, this will show -Infinity or +Infinity as default boundary values. The shift determines how many units the distribution is shifted along the X-axis (default is 0). Press the buttons to specify boundary points in X-values or percentile values.





To load an additional distribution, press the Add Distribution button.





Multivariate distributions

Multivariate distributions introduction

Multivariate distributions describe several parameters whose values are probabilistically linked in some way. In most cases, we create the probabilistic links via one of several correlation methods. However, there are a few specific multivariate distributions that have specific, very useful purposes and are therefore worth studying more.

Multivariate distributions are inserted in the spreadsheet as array functions. For example, to insert random values from a Dirichlet distribution in the spreadsheet:

• Select 3 spreadsheet cells.

• Type =VoseDirichlet({1,2,3}) in the formula bar

• Press CTRL+SHIFT+ENTER

• Now the function is inserted as an array function over the 3 selected cells (indicated by the {} around the formula), which now contain randomly sampled values from the Dirichlet distribution.

The multivariate distributions available in ModelRisk are:

• Dirichlet distribution

• Multinomial distribution

• Multivariate Hypergeometric distribution

• Multivariate Inverse Hypergeometric distribution type1

• Multivariate Inverse Hypergeometric distribution type2

• Multivariate Normal distribution

• Negative Multinomial distribution type 1

• Negative Multinomial distribution type 2





Dirichlet distribution

Format: VoseDirichlet({αi})

Multivariate distribution whose components all take values on (0,1) and sum to one.

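The Dirichlet distribution of order K ≥ 2 with parameters α_1, ..., α_K > 0 has a probability density function given by

$$f(x_1,\dots,x_{K-1};\alpha_1,\dots,\alpha_K) = \frac{1}{B(\alpha)} \prod_{i=1}^{K} x_i^{\alpha_i - 1}$$

for all x_1, ..., x_{K-1} > 0 satisfying x_1 + ... + x_{K-1} < 1, and where x_K = 1 - x_1 - ... - x_{K-1}. The density is zero outside this open (K - 1)-dimensional simplex.

The normalizing constant is the multinomial beta function, which can be expressed in terms of the gamma function:

$$B(\alpha) = \frac{\prod_{i=1}^{K} \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{K} \alpha_i\right)}$$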

Uses

The Dirichlet distribution is used in modeling probabilities, prevalences or fractions where there are multiple states to consider. It is the multinomial extension to the beta distribution for a binomial process.

Examples

Example 1:

You have the results of a survey conducted in the premises of a retail outlet. The age and sex of 500 randomly selected shoppers were recorded:

<25 years, male: 38 people

25 to < 40 years, male: 72 people

> 40 years, male: 134 people

<25 years, female: 57 people

25 to < 40 years, female: 126 people

> 40 years, female: 73 people

In a manner analogous to the beta distribution, by adding 1 to each number of observations we can estimate the fraction of all shoppers to this store that are in each category as follows:





=VoseDirichlet({38,72,134,57,126,73}+1)

or

=VoseDirichlet({39,73,135,58,127,74})

or

=VoseDirichlet(A1:A6+1)

where A1:A6 would contain the data. Note that the VoseDirichlet function is entered as an array function (in this case covering six cells), and then returns the uncertainty about the fraction of all shoppers that are in each of the six groups. See example Election.xls
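For readers who want to reproduce this kind of sample outside Excel, a Dirichlet draw can be generated by normalizing independent Gamma variates. This is a standard construction shown as a Python sketch; the help file does not say which algorithm ModelRisk itself uses, and the function name `dirichlet_sample` is hypothetical.

```python
import random

def dirichlet_sample(alphas):
    """One Dirichlet(alphas) draw via normalized Gamma(alpha_i, 1) variates."""
    gammas = [random.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

counts = [38, 72, 134, 57, 126, 73]          # survey data from Example 1
fractions = dirichlet_sample([c + 1 for c in counts])
```

Each call returns six fractions that sum to one, mirroring the six-cell array output described above.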

Example 2:

A review of 1000 companies that were S&P AAA rated last year in your sector shows their rating one year later:

AAA: 908

AA: 83

A: 7

BBB or below: 2

If we assume that the market has similar volatilities to last year, we can estimate the probability that a company rated AAA now will be in each state next year as:

=VoseDirichlet({909,84,8,3})

The Dirichlet then returns the uncertainty about these probabilities.

VoseFunctions for this distribution

VoseDirichlet generates values from this distribution

VoseDirichletProb returns the probability density or cumulative distribution function for this distribution

VoseDirichletProb10 returns the log10 of the probability density or cumulative distribution function





Multinomial distribution

Format: VoseMultinomial(n,{pi})

The Multinomial distribution is a multivariate distribution used to describe how many independent trials will fall into each of several categories, where the probability of falling into any one category is constant for all trials.

As such, it is an extension of the Binomial distribution where there are only two possible outcomes ('successes' and, by implication, 'failures').

Uses

For example, consider the action people might take on entering a shop:

Code   Action                                                    Probability
A1     Enter and leave without purchase or sample merchandise    32%
A2     Enter and leave with a purchase                           41%
A3     Enter and leave with sample merchandise                   21%
A4     Enter to return a product and leave without purchase      5%
A5     Enter to return a product and leave with a purchase       1%

If 1000 people enter a shop, how many will match each of the above actions?

The answer is

{=VoseMultinomial(1000,{0.32,0.41,0.21,0.05,0.01})}

which is an array function that generates five separate values. Those five values must, of course, always add up to the number of trials (1000 in this example).
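The shop example can be imitated in Python by assigning each of the 1000 visitors to one action at random and counting the results. This is a plain simulation sketch using the standard library, not ModelRisk's algorithm; `multinomial_sample` is a hypothetical helper name.

```python
import random
from collections import Counter

def multinomial_sample(n, probs):
    """Counts per category for n independent trials with fixed category probabilities."""
    outcomes = random.choices(range(len(probs)), weights=probs, k=n)
    tally = Counter(outcomes)
    return [tally[i] for i in range(len(probs))]

# Actions A1..A5 with the probabilities from the table above
visits = multinomial_sample(1000, [0.32, 0.41, 0.21, 0.05, 0.01])
```

The five counts in `visits` always total 1000.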


VoseMultinomial generates values from this distribution

VoseMultinomialProb returns the probability mass or cumulative distribution function for this distribution

VoseMultinomialProb10 returns the log10 of the probability mass or cumulative distribution function





Multivariate Hypergeometric distribution

Format: VoseMultiHypergeo(n,{D i})

The Multivariate Hypergeometric distribution is an extension of the Hypergeometric distribution where more than two different states of individuals in a group exist.

Example

In a group of 50 people, of whom 20 were male, a VoseHypergeo(10,20,50) would describe how many from ten randomly chosen people would be male (and by deduction how many would therefore be female). However, let's say we have a group of 10 people as follows:

German   English   French   Canadian
3        2         1        4

Now let's take a sample of 4 people at random from this group. We could have various numbers of each nationality in our sample:

German   English   French   Canadian
3        1         0        0
3        0         1        0
3        0         0        1
2        2         0        0
2        1         1        0
2        1         0        1
2        0         2        0
2        0         1        1
2        0         0        2
...      ...       ...      ...

and each combination has a certain probability. The Multivariate Hypergeometric distribution is an array distribution, in this case generating simultaneously four numbers, that returns how many individuals in the random sample came from each sub-group (e.g. German, English, French, and Canadian).

Generation





The Multivariate Hypergeometric distribution is created by extending the mathematics of the Hypergeometric distribution. For the Hypergeometric distribution with a sample of size n, observing s individuals from a sub-group of size D, and therefore (n-s) from the remaining number (M-D), results in the probability distribution for s:

$$f(s) = \frac{\binom{D}{s}\binom{M-D}{n-s}}{\binom{M}{n}}$$

where M is the group size, and D is the size of the sub-group of interest. The numerator is the number of different sampling combinations (each of which has the same probability because each individual has the same probability of being sampled) where one would have exactly s from the sub-group D (and by implication (n-s) from the sub-group (M-D)). The denominator is the total number of different combinations of individuals one could have in selecting n individuals from a group of size M. Thus the equation is just the proportion of different possible scenarios, each of which has the same probability, that would give us s from D.
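The counting argument above (combinations with exactly s from D, divided by all combinations of n from M) can be checked numerically. The Python sketch below evaluates it for the earlier example of ten people chosen from a group of 50 containing 20 males; the function name `hypergeo_pmf` is hypothetical.

```python
from math import comb

def hypergeo_pmf(s, n, D, M):
    """Probability of s individuals from a sub-group of size D in a sample of n from M."""
    return comb(D, s) * comb(M - D, n - s) / comb(M, n)

# Distribution of the number of males among 10 people drawn from 50 (20 male)
pmf = [hypergeo_pmf(s, 10, 20, 50) for s in range(0, 11)]
```

Summing `pmf` over all possible values of s gives one, as expected of a probability distribution.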

The Multivariate Hypergeometric probability equation is just an extension of this idea. The figure below shows the graphical representation of the multivariate hypergeometric process: D_1, D_2, D_3 and so on are the number of individuals of different types in a population, and x_1, x_2, x_3, ... are the number of successes (the number of individuals in our random sample (circled) belonging to each category).

and results in the probability distribution for {x}:

$$f(x_1,\dots,x_k) = \frac{\prod_{i=1}^{k}\binom{D_i}{x_i}}{\binom{M}{n}}$$

where

$$M = \sum_{i=1}^{k} D_i, \qquad n = \sum_{i=1}^{k} x_i$$

Example and generation

Let's imagine a problem where we have 100 coloured balls in a bag, of which 10 are red, 15 purple, 20 blue, 25 green and 30 yellow. Without looking into the bag, you take 30 balls out. How many balls of each colour will you take from the bag?

We cannot model this problem using the multinomial distribution, because when we take the first ball out, the proportions of the different colour balls in the bag change. The same happens when we take the second ball out and so on.

Thus, we must proceed as follows:

• Model the first colour (red for example) as x_1 = Hypergeometric(s, D_1, M), where s is the sample size = 30, D_1 is the total number of red balls in the bag = 10, and M is the population size = 100

• Model the rest as: x_i = Hypergeometric(s - SUM(x_1 : x_{i-1}), D_i, SUM(D_i : D_n)), where x_i is the number of successes of type i in the sample, x_{i-1} is the number of successes of type i-1 in the sample, D_i is the number of successes of type i in the total population, and D_n is the number of successes of the last type in the total population.
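The two steps above can be sketched in Python as a chain of conditional Hypergeometric draws. The helper `hypergeo_sample` below is a simple stand-in that draws without replacement from a list; both function names are hypothetical and the sketch is only meant to illustrate the sequential logic for the coloured-balls example.

```python
import random

def hypergeo_sample(n, D, M):
    """Number of marked items in n draws without replacement from M items, D of them marked."""
    population = [1] * D + [0] * (M - D)
    return sum(random.sample(population, n))

def multi_hypergeo_sample(n, group_sizes):
    """Sequentially draw each group's count, conditioning on the groups already handled."""
    remaining_draws = n
    remaining_pop = sum(group_sizes)
    result = []
    for D in group_sizes:
        x = hypergeo_sample(remaining_draws, D, remaining_pop)
        result.append(x)
        remaining_draws -= x   # s - SUM(x_1 : x_i)
        remaining_pop -= D     # SUM(D_{i+1} : D_n)
    return result

# 30 balls taken from 10 red, 15 purple, 20 blue, 25 green and 30 yellow
balls = multi_hypergeo_sample(30, [10, 15, 20, 25, 30])
```

The five counts always add up to the 30 balls taken, and no count exceeds the number of balls of that colour in the bag.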


VoseMultiHypergeo generates values from this distribution

VoseMultiHypergeoProb returns the probability mass or cumulative distribution function for this distribution

VoseMultiHypergeoProb10 returns the log10 of the probability mass or cumulative distribution function





Multivariate Inverse Hypergeometric distribution type 1

Format: VoseInvMultiHypergeo({s},{d})

The Inverse Multivariate Hypergeometric distribution answers the question: how many extra (wasted) random multivariate hypergeometric samples will occur before the required numbers of successes {s} are selected from each sub-population {D}.

For example, imagine that our population is split up into four sub-groups {A,B,C,D} of sizes {20,30,50,10} and that we are going to randomly sample from this population until we have {4,5,2,1} of each sub-group respectively. The number of extra samples we will have to make is modeled as:

=VoseInvMultiHypergeo({4,5,2,1},{20,30,50,10})

The total number of trials that need to be performed is:

=SUM({4,5,2,1}) + VoseInvMultiHypergeo({4,5,2,1},{20,30,50,10})

The InvMultiHypergeo2 is a multivariate distribution that responds to the same question, but breaks down the number of extra samples into their sub-groups.


VoseInvMultiHypergeo generates values from this distribution





Multivariate Inverse Hypergeometric distribution type 2

Format: VoseInvMultiHypergeo2({s},{d})

The 2nd Inverse Multivariate Hypergeometric distribution answers the question: how many extra (wasted) random multivariate hypergeometric samples will be drawn from each sub-population before the required numbers of successes {s} are selected from each sub-population {D}.

For example, imagine that our population is split up into four sub-groups {A,B,C,D} of sizes {20,30,50,10} and that we are going to randomly sample from this population until we have {4,5,2,1} of each sub-group respectively. The number of extra samples we will have to make for each sub-population A to D is modeled as the array function:

{=VoseInvMultiHypergeo2({4,5,2,1},{20,30,50,10})}

The InvMultiHypergeo2 responds to the same question as the InvMultiHypergeo distribution but breaks down the number of extra samples into their sub-groups, whereas the InvMultiHypergeo simply returns the total number of extra samples.
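The sampling process behind both inverse distributions can be simulated directly. The Python sketch below assumes that an "extra" sample is any individual drawn from a sub-group beyond the number required from that sub-group, and that sampling stops as soon as every requirement is met; this reading of the text, and the function name `inv_multi_hypergeo2`, are assumptions.

```python
import random

def inv_multi_hypergeo2(required, group_sizes):
    """Extra draws per sub-group when sampling without replacement until all requirements are met."""
    # Label every individual with the index of its sub-group and shuffle the draw order
    population = [g for g, size in enumerate(group_sizes) for _ in range(size)]
    random.shuffle(population)
    drawn = [0] * len(group_sizes)
    for g in population:
        drawn[g] += 1
        if all(d >= r for d, r in zip(drawn, required)):
            break  # the last requirement has just been satisfied
    return [d - r for d, r in zip(drawn, required)]

extra_by_group = inv_multi_hypergeo2([4, 5, 2, 1], [20, 30, 50, 10])  # type 2 style output
extra_total = sum(extra_by_group)                                     # type 1 style output
```

The per-group list corresponds to the breakdown described for type 2, and its sum to the single total described for type 1.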


VoseInvMultiHypergeo2 generates values from this distribution





Multivariate Normal distribution

Format: VoseMultiNormal({μi},{cov_matrix})

A multinormal distribution, also sometimes called a multivariate normal distribution, is a specific multivariate probability distribution, which can be thought of as a generalization to higher dimensions of the one-dimensional normal distribution.


VoseMultiNormal generates values from this distribution or calculates a percentile

VoseMultiNormalProb returns the probability density or cumulative distribution function for this distribution

VoseMultiNormalProb10 returns the log10 of the probability density or cumulative distribution function





Negative Multinomial distribution type 1

Format: VoseNegMultinomial({s},{p})

The NegMultinomial distribution is a generalization of the NegBin distribution. The NegBin(s,p) distribution estimates the total number of binomial trials that are failures before s successes are achieved, where there is a probability p of success with each trial.

For the NegMultinomial distribution, instead of having a single value for s, we now have a set of success values {s} representing different 'states' of successes (s_i) one can have, with each 'state' i having a probability p_i of success.

The NegMultinomial distribution then tells us how many failures we will have before we have achieved the total number of successes.

Example

Suppose you want to do a telephone survey about a certain product you made by calling people you pick randomly out of the phone book.

You want to make sure that at the end of the week you have called 50 people who never heard of your product, 50 people who don't have internet at home and 200 people who use internet almost daily.

If you know the probabilities of success p_i, the NegMultinomial({50,50,200},{p1,p2,p3}) will tell you how many failures you'll have before you've called all the people you wanted, and so you also know the total number of phone calls you'll have to make to reach the people you wanted.

The total number of phone calls = the total number of successes (300) + the total number of failures (NegMultinomial({50,50,200},{p1,p2,p3})).


VoseNegMultinomial generates values from this distribution.





Negative Multinomial distribution type 2

Format: VoseNegMultinomial2({s},{p})

The NegMultinomial2 is the same distribution as the NegMultinomial, but instead of giving you the global number of failures before reaching a certain number of successes, the NegMultinomial2 gives you the number of failures in each 'group' or 'state'.

So, in the example of the telephone survey (see NegMultinomial) where the total number of phone calls was equal to the total number of successes plus the total number of failures, the total number of failures would now be a sum of the number of failures in each group (3 groups in the example).


VoseNegMultinomial2 generates values from this distribution.





Distribution editing functions

VosePBounds

VosePBounds(MinP,MaxP)

Example model

This function can be inputted as an extra parameter into distributions (after the U-parameter) and the result is a truncation of the distribution based on percentile values.

This truncation value is a minimum if only one parameter is entered, and a minimum and maximum if 2 values are entered.

Examples

VoseNormal(10,2,,VosePBounds(0.3,0.6)) generates values from a Normal distribution with mean 10 and standard deviation 2 where the values lie within the 30th and the 60th percentile.

VoseLogNormal(10,2,,VosePBounds(,0.6)) generates values from a LogNormal distribution with mean 10 and standard deviation 2 where the values are all smaller than the 60th percentile.
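Percentile bounds fit naturally with inversion sampling: the uniform number is simply restricted to the interval between the two percentiles before it is inverted. The Python sketch below illustrates this for the Normal(10,2) example with bounds 0.3 and 0.6; it is an assumption about the mechanism, not a description of ModelRisk's code, and `normal_pbounds` is a hypothetical name.

```python
import random
from statistics import NormalDist

def normal_pbounds(mu, sigma, min_p, max_p, u=None):
    """Normal(mu, sigma) sample restricted to lie between two percentiles."""
    if u is None:
        u = random.random()
    p = min_p + u * (max_p - min_p)  # rescale u into [min_p, max_p)
    return NormalDist(mu, sigma).inv_cdf(p)

values = [normal_pbounds(10, 2, 0.3, 0.6) for _ in range(1000)]
low = NormalDist(10, 2).inv_cdf(0.3)    # 30th percentile
high = NormalDist(10, 2).inv_cdf(0.6)   # 60th percentile
```

All entries of `values` fall between `low` and `high`.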





VoseShift

VoseShift(Shift)

Example model

This function can be inputted as an extra parameter into distributions (after the U-parameter) and the result is a shift of the distribution along the X-axis.

For example, VoseLogNormal(10,2,,VoseShift(5)) generates values from a LogNormal distribution with mean 10 and standard deviation 2 that is shifted 5 units to the right along the X-axis.





VoseXBounds

VoseXBounds(Min,Max)

Example model

This function can be inputted as an extra parameter into the VoseDistributions (after the U-parameter) or VoseDistribution Objects, resulting in a truncation of the distribution.

Either the Min or Max parameter can be omitted. If only one parameter is entered, it is taken to be the Min value.

Examples:

VoseNormal(Mu,Sigma,,VoseXBounds(Min,Max)) generates values from a Normal distribution with mean = Mu and standard deviation = Sigma, truncated to lie between Min and Max.

VoseNormalObject(Mu,Sigma,,VoseXBounds(Min,Max)) defines a Normal distribution with mean = Mu and standard deviation = Sigma, truncated to lie between Min and Max.

where Mu, Sigma, Min, Max are any valid values.

Allowed parameter combinations are as follows:

VoseXBounds(Min, Max) – truncates the distribution to lie between Min and Max

VoseXBounds(Min,) – truncates the distribution to lie above Min

VoseXBounds(Min) – truncates the distribution to lie above Min

VoseXBounds(,Max) – truncates the distribution to lie below Max

VoseXBounds() or VoseXBounds(,) – no truncation





Modeling with objects

Example model

Whenever a ModelRisk function takes a distribution as argument rather than a sampled value from it, this argument should be provided as a distribution object function.

Using distribution Objects has the great advantage that you can keep your distributional assumptions in one place in the model, making it easier to maintain and update (analogous to keeping your constants in one place).

For each distribution, time series and copula in ModelRisk, there is a corresponding Object and a Fit Object function available for further use as an argument in other ModelRisk functions. Both types behave the same, but have their parameters specified in a different way: Object functions are directly specified by their parameters, and Fit Object functions are specified by the array of data they are fitted to. ModelRisk then determines parameter values that best fit the data.

It is important to know that the VoseDistributionObject functions are in essence only "placeholders", i.e. creating them executes no algorithm. The algorithms are only executed when another VoseFunction calls the object as argument. The VoseDistributionFitObject functions are a different case: when inserting a fitted object, the fitted parameters are immediately calculated and contained in the object. Any further actions done on this fitted object will again only execute upon calling the object (e.g. calculating a probability with VoseProb).

There is a Distribution Object function for every distribution in ModelRisk, including custom distributions (e.g. VoseCombined, VoseDeduct, VoseRiskEvent, VoseAggregateFFT). So all of these can be used further as argument in other ModelRisk functions.

It makes logical sense to distinguish between a distribution as a whole and a random value sampled from that distribution. It would be conceptually wrong and inconsistent to have the same function playing both roles.

To make your model as easy to maintain, review and update as possible, we consider it good practice to store each used Distribution, Time Series or Copula (see below) object only once and in a separate spreadsheet cell, and refer to that cell from other ModelRisk functions.

A simple example: calculating moments of a distribution

Looking at a simple example, the sense of modeling with objects - in particular the fact that they are separate functions - becomes clear immediately. To sample random values from a Normal(0,1) distribution - for example when doing MC simulation - you would insert this formula in a spreadsheet cell:

=VoseNormal(0,1)

On the other hand, if you need the statistical moments of that same distribution in your model, you would use the VoseMoments function. The corresponding spreadsheet formula is

{=VoseMoments(VoseNormalObject(0,1))}

because you want the moments of the distribution as a whole. The VoseMoments function has a 4 x 2 output, as shown below (it is an array function):





Now if we also wanted to create an array with 100 sampled values from that distribution, we could write =VoseSimulate(B2) in the entire array. To calculate the cumulative probability of a given x-value occurring under this same distribution, we would write =VoseProb(x,B2,1).
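The separation between a distribution object and the functions that act on it can be mirrored in a few lines of Python. In this sketch all names (`NormalObject`, `moments`, `simulate`, `prob`) and the row labels of the moments table are hypothetical stand-ins; it only illustrates the idea of defining the assumption once and passing it to several functions.

```python
import random
from dataclasses import dataclass
from statistics import NormalDist

@dataclass(frozen=True)
class NormalObject:
    """Placeholder for a distribution: holds parameters, runs no algorithm."""
    mu: float
    sigma: float

def moments(obj):
    # 4 x 2 table: label and value for the first four moments of a Normal
    return [("Mean", obj.mu), ("Variance", obj.sigma ** 2),
            ("Skewness", 0.0), ("Kurtosis", 3.0)]

def simulate(obj):
    return random.gauss(obj.mu, obj.sigma)

def prob(x, obj, cumulative=False):
    d = NormalDist(obj.mu, obj.sigma)
    return d.cdf(x) if cumulative else d.pdf(x)

assumption = NormalObject(0, 1)               # defined once, reused below
table = moments(assumption)
samples = [simulate(assumption) for _ in range(100)]
p = prob(1.96, assumption, cumulative=True)
```

Changing the single `assumption` line is enough to update every calculation that depends on it, which is the maintenance benefit described above.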

If we did not know the parameters of the Normal distribution, but would rather estimate them from available data, we would replace the VoseNormalObject with a VoseNormalFitObject and all the other calculations would still be performed in the same manner.

Similarly, if we wanted to do all of the aforementioned calculations on any other distribution, we would just replace cell B1 with any other (for example an aggregate FFT) distribution object.

The image below illustrates modeling with objects schematically.

An example from insurance risk analysis

See also: Accident insurance

ModelRisk has a range of special functions that convert a distribution object into another object.





For example we can use VoseDeduct to create a modified claim severity distribution:

=VoseDeductObject(CostDistributionObject, deductible, maxlimit, zeros)

The zeros parameter is either TRUE or FALSE (or omitted). If FALSE, the DeductObject has no probability mass at zero, i.e. it is the distribution of a claim size given that a claim has occurred. If TRUE, the DeductObject has probability mass at zero, i.e. it is the distribution of a claim size given that a risk event has occurred. The DeductObject is a distribution that can be used in turn as argument for other ModelRisk functions.

We start with a Lognormal(10,7) distribution and apply a deductible and limit:

A1: =VoseLognormalObject(10,7)

A2: =VoseDeductObject(A1, 5, 25, TRUE)

The object in cell A2 can then be used in recursive (such as Panjer and De Pril, both available in ModelRisk) and FFT aggregate methods, since these methods discretize the individual loss distribution and can therefore take care of a discrete-continuous mixture. Thus, we can use, for example:

=VoseAggregateFFT(VosePoissonObject(2700),A2)

to simulate the cost of Poisson(2700) random claims. Or

=VoseAggregateFFT(VosePoissonObject(2700),A2,0.99)

using the U parameter to calculate the 99th percentile of the aggregate cost of Poisson(2700) random claims.

Time series and copula objects

Similarly to distribution objects, ModelRisk has time series objects and copula objects. For time series or copulas fitted to data there are also FitObject functions. Use these to create an object for a copula or time series in one place, and then use the VoseTimeSimulate and VoseCopulaSimulate functions to generate random values.

Using these of course provides the same advantage as distribution Objects: they allow you to keep the assumptions about the Time Series or Copula model in one place in the spreadsheet. Imagine having used a Clayton copula for modeling correlation all through your model, and deciding later on that it is better to switch to a Gumbel. If you set up your model correctly, this should only require a change in one cell.





Probability calculations in ModelRisk

ModelRisk has functions for calculating the joint probability density (or probability mass) f({x}) and joint cumulative probability F({x}) for a set of values {x} against a specified distribution.

These functions offer a simple way of calculating the likelihood of observations being drawn from a specified distribution, which is useful for various statistical models from distribution fitting to hypothesis testing, as well as predicting the likelihood of observing values in the future.

The functions are particularly efficient where you have a large set of values {x}, as the required joint probability can be calculated in one single formula. However, the joint probability or probability density for a large set of values can quickly approach values too small for Excel to handle. Therefore ModelRisk has a parallel set of functions that return the log base 10 of the probability calculations.

Probability functions as described below exist for custom distributions constructed through ModelRisk as well: e.g. VoseAggregatePanjerProb, VoseCombinedProb etc.

There are three ModelRisk windows for easily performing probability calculations. These are explained here.

VoseDistributionProb

Calculates the joint probability density/mass or joint cumulative probability. The general syntax is:

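VoseDistributionProb({x}, {parameters}, cumulative, truncation)

where Distribution is replaced by the name of the distribution.

• {x} - one or more values (or a cell range) on which the probability calculation is to be performed

• {parameters} - the parameters of the distribution

• Cumulative - an optional Boolean (TRUE/FALSE) parameter. If FALSE (default) the joint probability density (for continuous distributions) or the joint probability mass (for discrete distributions) is returned. If TRUE the joint cumulative probability is returned.

• Truncation - an optional parameter, either VoseXbounds(min,max) or VosePbounds(min,max), to truncate at specified x-values or p-values respectively.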

VoseDistributionProb10

Returns the Logarithm base 10 of the probability calculations described above. The general form is

VoseDistributionProb10({x}, {parameters}, cumulative, truncation)

where Distribution is replaced by the name of the distribution. This can be convenient, since the joint probability density/mass for a large set of values can quickly approach values too small for Excel to handle.

• {x} - one or more values (or a cell range) on which the probability calculation is to be performed

• {parameters} - the parameters of the distribution

• Cumulative - an optional Boolean (TRUE/FALSE) parameter. If FALSE (default) the joint probability density (for continuous distributions) or the joint probability mass (for discrete distributions) is returned. If TRUE the joint cumulative probability is returned.

• Truncation - an optional parameter, either VoseXbounds(min,max) or VosePbounds(min,max), to truncate at specified x-values or p-values respectively.
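The reason for working in logarithms can be seen in a small numerical experiment. The Python sketch below multiplies 2000 Normal densities directly and then sums their base-10 logarithms instead. The Normal(10, 3) distribution, the set of values and the helper name `normal_pdf` are illustrative assumptions, and the sketch only mimics the idea behind the Prob10 functions rather than reproducing them.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a Normal(mu, sigma) distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical observations: 2000 evenly spaced values from 0 to 19.99.
values = [0.01 * i for i in range(2000)]

# Direct product of the densities: underflows to exactly zero.
joint = 1.0
for x in values:
    joint *= normal_pdf(x, 10.0, 3.0)

# Sum of log10 densities: stays finite and usable for comparisons.
log10_joint = sum(math.log10(normal_pdf(x, 10.0, 3.0)) for x in values)

print("direct product:", joint)
print("log10 of joint density:", log10_joint)
```

The direct product carries no information once it has underflowed, whereas the log10 value can still be compared between candidate distributions.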

Examples

VoseBetaProb(0.3, 2, 5, FALSE) and VoseBetaProb(0.3, 2, 5) return the probability density of a Beta(2, 5) distribution at x = 0.3.

VoseBetaProb(0.3, 2, 5, 1) returns the cumulative probability of a Beta(2, 5) distribution at x = 0.3.

VoseBetaProb({0.2,0.4,0.7}, 2, 5, FALSE) returns the joint probability density of a Beta(2, 5) distribution for the values x = {0.2,0.4,0.7}.

VoseBetaProb({0.2,0.4,0.7}, 2, 5, TRUE) returns the joint cumulative probability of a Beta(2, 5) distribution for the values x = {0.2,0.4,0.7}.

VoseBetaProb(A1:A6, 2, 5, TRUE) returns the joint cumulative probability of a Beta(2, 5) distribution for the values displayed in the spreadsheet range A1:A6.

VoseBinomialProb(3, 12, 0.6, FALSE) and VoseBinomialProb(3, 12, 0.6) return the probability mass of a Binomial(12, 0.6) distribution at x = 3.

VosePoissonProb({3,4,7}, 5, 1) returns the joint cumulative probability of a Poisson(5) distribution for the values x = {3,4,7}.

The image below demonstrates the principle of the function applied to a Beta distribution:

VoseBetaProb({0.3,0.6,0.8},3,4,0) calculates the joint density of the values {0.3,0.6,0.8} for a Beta(3,4) distribution. Result = 0.786579... which is the product of:

VoseBetaProb(0.3,3,4,0) = 1.8522...

VoseBetaProb(0.6,3,4,0) = 1.3824...

VoseBetaProb(0.8,3,4,0) = 0.3072...

VoseBetaProb({0.3,0.6,0.8},3,4,1) calculates the joint cumulative probability of the values {0.3,0.6,0.8} for a Beta(3,4) distribution. Result = 0.206310... which is the product of:

VoseBetaProb(0.3,3,4,1) = 0.25569...

VoseBetaProb(0.6,3,4,1) = 0.8208

VoseBetaProb(0.8,3,4,1) = 0.98304...
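The Beta(3,4) figures above can be checked independently of ModelRisk. The Python sketch below recomputes the joint density and joint cumulative probability as plain products; the helper names `beta_pdf` and `beta_cdf_int` are assumptions, and the cumulative helper relies on a Binomial identity that holds only for integer shape parameters.

```python
import math

def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1.0 - x) ** (b - 1)

def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) for integer a and b, using the identity
    F(x) = P(Binomial(a + b - 1, x) >= a)."""
    n = a + b - 1
    return sum(math.comb(n, k) * x ** k * (1.0 - x) ** (n - k)
               for k in range(a, n + 1))

xs = [0.3, 0.6, 0.8]

# Joint values are products of the individual values.
joint_density = math.prod(beta_pdf(x, 3, 4) for x in xs)
joint_cumulative = math.prod(beta_cdf_int(x, 3, 4) for x in xs)

# The log10 form is the sum of the individual log10 values.
log10_joint_density = sum(math.log10(beta_pdf(x, 3, 4)) for x in xs)

print(joint_density, joint_cumulative, log10_joint_density)
```

The individual terms `beta_pdf(0.3, 3, 4)`, `beta_pdf(0.6, 3, 4)` and `beta_pdf(0.8, 3, 4)` correspond to the three density values listed above.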





Vose probability calculation f(x) F(x) and F-1(U) windows

Introduction

The probability calculation windows are accessible through the Vose ModelRisk menu under Probability calculation.

With these windows you can calculate, respectively, the joint probability density (or probability mass) f({x}) and joint cumulative probability F({x}) for a set of values {x} against a specified distribution.

These windows offer a simple way of calculating the likelihood of observations being drawn from a specified distribution, which is useful for various statistical models from distribution fitting to hypothesis testing, as well as predicting the likelihood of observing values in the future.

There are three windows for probability calculations in ModelRisk:

• PDF/PMF f(x) window

• CDF F(x) window

• inverse CDF x=F-1(U) window

The ModelRisk functions for probability calculations are VoseDistribution (with a U-parameter), VoseDistributionProb and VoseDistributionProb10. These are explained in depth here.

Window elements

In the Distribution drop-down menu you can choose the distribution to perform the probability calculation on. In the parameters region below you can specify the distribution's parameters (or link them to a spreadsheet cell).

In the shift distribution field, you can provide the amount the distribution should be shifted along the X-axis. By default this is zero.

In the bounds region you can specify whether the distribution is truncated at certain x-values (Xbounds) or at certain percentile values (Pbounds).

In the x values fields you can insert the x values for which the (joint) probability or likelihood is to be calculated. By marking the Log 10 checkbox the logarithm (base 10) of the joint probability is calculated.





Modeling expert opinion in ModelRisk

The subjective distributions included in ModelRisk are particularly suited for modeling expert opinion. When we want to model a quantity we are uncertain about, a Subject Matter Expert (SME) is consulted to provide an estimate for it.

Rather than using a single point estimate, or minimum, most likely and maximum values, we model the uncertainty about a quantity with a distribution.

The PERT and Triangle distributions are most commonly used for expert estimation. These take absolute minimum, most likely and absolute maximum possible values as parameters.
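To make the three-parameter idea concrete, the Python sketch below draws samples from a Triangle and from a PERT distribution defined by the same minimum, most likely and maximum values. It uses the standard PERT construction (a rescaled Beta distribution with mean (min + 4 x mode + max) / 6); it is assumed, not confirmed by this text, that ModelRisk's PERT uses the same parameterization. The function names and the example values 5, 10 and 30 are illustrative.

```python
import random

def sample_triangle(rng, minimum, mode, maximum):
    """One draw from a Triangle(minimum, mode, maximum) distribution."""
    return rng.triangular(minimum, maximum, mode)

def sample_pert(rng, minimum, mode, maximum):
    """One draw from a standard PERT distribution: a Beta distribution
    rescaled to [minimum, maximum] with mean (min + 4*mode + max) / 6."""
    span = maximum - minimum
    alpha = 1.0 + 4.0 * (mode - minimum) / span
    beta = 1.0 + 4.0 * (maximum - mode) / span
    return minimum + span * rng.betavariate(alpha, beta)

rng = random.Random(1)
pert_draws = [sample_pert(rng, 5.0, 10.0, 30.0) for _ in range(20000)]
tri_draws = [sample_triangle(rng, 5.0, 10.0, 30.0) for _ in range(20000)]

pert_mean = sum(pert_draws) / len(pert_draws)
tri_mean = sum(tri_draws) / len(tri_draws)
print("PERT mean:", pert_mean, "Triangle mean:", tri_mean)
```

With a long right tail, the Triangle mean (min + mode + max) / 3 sits noticeably further from the mode than the PERT mean, which is one way the same three estimates lead to different models.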

The PERT and Triangle distributions have known drawbacks, so it is better to use the more flexible Modified PERT or GTU distributions.

See Distributions used in modeling expert opinion for a more thorough explanation of subjective distributions.

To combine the opinions of multiple experts, use the Combined Distribution window.

For an explanation about choosing, modifying and inserting a distribution with ModelRisk, please refer to the Select Distribution window.





Expert Window

A common problem in risk analysis is to elicit reliable, realistic estimates of uncertain quantities in your model. ModelRisk offers two interfaces to help achieve this: the Expert window described here for single uncertain variables; and the SME Time Series window for uncertain time series forecasts.

The Expert Window is accessed by clicking the Expert icon under the Tools group in the ModelRisk toolbar. It has two tabs:

Selector – for defining a distribution from a number of statistical properties; and

Shaper – for ‘drawing’ your own distribution

Selector tab

The Selector tab allows you to select a number of statistical parameters with which you wish to define a distribution. As you select each statistical parameter using the check box, the parameters that are no longer valid gray out. The process goes as follows:

1. Select the parameters you wish to use by checking each applicable box;

2. Enter the values for these parameters in the adjacent fields; and

3. Click the ‘Calculate list’ button to see the options that are available.

In the following image, the minimum, mode and maximum have been used, and ModelRisk has provided a list of potential candidates.

On the left a ‘Distribution list’ provides all available options. Un-ticking any distribution will remove it from the graphs and statistics tables.

In the middle are relative and cumulative plots of each distribution within the list. Clicking on any item in the ‘Distribution list’ will bring that distribution to the front of the chart.

On the right is a table of statistics to allow you to compare, for example, the means or percentiles of the available options.

In certain cases, for example here where one has defined the minimum, mode and maximum, some distributions have an extra degree of control. For example, the following image shows the Kumaraswamy4 distribution selected with a red dot that can be slid up and down within the bar range to assign more or less peakedness whilst respecting the defined parameters:





Once an appropriate distribution has been selected, the user can click the ‘Insert in Worksheet’ button and choose whether to insert a random sample function or an object function for the selected distribution.

ModelRisk will then enter the function into Excel, and add an Excel Comment to display just how this distribution has been defined.

This is useful in situations where, for example, one has defined a mean, standard deviation and skewness and found the most appealing option to be an unusual distribution like a Bradford or Fatigue.

Shaper tab





The Shaper tab allows you to draw your own distribution:

To use this tool, proceed as follows:

1. Define the minimum and maximum values to give a range for ModelRisk to plot. Here we’ll set it to 5 and 30:

2. Choose whether to edit the relative or cumulative plots. In general, it is more intuitive to edit the relative plot.

3. Double-click within the selected graph to create some extra points, and then use the mouse to move them around and create the shape you are looking for.

4. Take note of the statistics in the right pane and the graph you are not editing to make sure that they continue to make sense.

Once you are satisfied with the distribution, click the ‘Insert in Worksheet’ button and choose whether to insert a random sample function or an object function.

ModelRisk will then enter the distribution you drew as a Relative distribution if you used the relative plot, or a CumulA distribution if you edited the cumulative plot.

Note: ModelRisk interpolates between the points you place in the plot with straight lines. Thus you should place more points in areas of greatest curvature to get a more accurate reflection of your estimate, but you don’t need to add too many points because, after all, subjective estimates are not that precise by their nature. One result of using straight line interpolation is that if you edit the cumulative plot, the relative plot will become a histogram, and the relationship may not be that intuitive to a subject matter expert, possibly providing a distraction from the elicitation exercise.
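The histogram effect follows directly from the interpolation rule: the density is the slope of the cumulative curve, and a straight segment has a constant slope. The short Python sketch below illustrates this with made-up cumulative points over the range 5 to 30; the function name `density_from_cumulative` and the points themselves are assumptions for illustration.

```python
def density_from_cumulative(points):
    """points: (x, F) pairs with increasing x and non-decreasing F, joined by
    straight lines. Returns one (x_left, x_right, density) tuple per segment,
    where the density is the constant slope of that segment."""
    segments = []
    for (x0, f0), (x1, f1) in zip(points, points[1:]):
        segments.append((x0, x1, (f1 - f0) / (x1 - x0)))
    return segments

# Hypothetical points placed on the cumulative plot.
cumulative_points = [(5.0, 0.0), (10.0, 0.2), (15.0, 0.7), (30.0, 1.0)]
steps = density_from_cumulative(cumulative_points)

for x_left, x_right, density in steps:
    print(f"{x_left:5.1f} to {x_right:5.1f}: density {density:.3f}")
```

Each segment of the cumulative plot becomes one flat bar of the relative plot, so the density jumps wherever a point has been placed.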





Combined Distribution

Introduction

When modeling an uncertain variable based on expert opinion, it often occurs that different experts independently provide different estimates. So one ends up with a number of distinct distributions: one for each expert's opinion.

Combined Distribution combines these "individual" distributions into one final distribution that incorporates all the information.

Empirical distributions such as Uniform, Triangle, PERT, ModPERT, Relative and CumulA are usually used in modeling expert opinions.

The Combined Distribution interface allows modeling the estimates of as many experts as needed, and also allows assigning weights to each of the different opinions. The weight given to each expert is proportional to the level of confidence that we assign to that expert.

Combining expert opinions is explained in more detail here.

Output functions of this window: VoseCombined, VoseCombinedProb, VoseCombinedProb10, VoseCombinedObject

Window Elements

On the upper left, you can add distributions to combine.

To add a new distribution field to the list, click anywhere in the white area.

To remove a distribution from the list, click the X button below.

The chart on top of the window shows the separate distributions that represent each expert's estimate. They are scaled according to the weights assigned to the experts, so that the areas under all curves sum to 1. If there are three experts and they are assigned the weights {1,2,4}, then the area under the first expert's curve is 1/7, under the second expert's curve 2/7 and under the last expert's curve (the one with most credibility) 4/7.

The graph at the bottom shows the Combined distribution - the constructed distribution that takes into account the opinions of all experts.

Different types of output can be specified by selecting the appropriate option under the preview graph:

• Object - to insert the constructed distribution as a distribution Object in the spreadsheet.

• Simulation - (default) to generate random values from the distribution.

• f(x) and F(x) - to calculate the probability density function and the cumulative distribution function of some x value(s) (an extra parameter x values will appear on the left side of the window).

• F-1(U) - to calculate the inverse cumulative when a U-value is entered.





VoseCombined

=VoseCombined({distributions},{weights},{names},U)

Constructs a new custom distribution from a number of different distributions. Use this function to model the opinions of different experts. Weights are assigned to each of the distributions, reflecting the relative confidence we have in them.

• {distributions} - an array of distribution objects

• {weights} - the weights assigned to each distribution. These should be proportional to the confidence you have in the individual estimates.

• {names} - (optional) array with names for each distribution.

• U - optional parameter specifying the cumulative percentile of the distribution. If omitted the function generates random values.

The function combines the distributions using the linear opinion pool method described in Stone (1961).
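A linear opinion pool is a weighted average of the experts' densities, with the weights rescaled to sum to one. The Python sketch below shows that calculation and the matching way to sample (pick an expert with probability proportional to the weight, then sample that expert's distribution). It is a generic illustration rather than ModelRisk's code: the three Uniform opinions, the weights {1,2,4} and all function names are assumptions.

```python
import random

def normalise(weights):
    """Rescale weights so that they sum to one."""
    total = float(sum(weights))
    return [w / total for w in weights]

def pooled_pdf(x, pdfs, weights):
    """Linear opinion pool density: weighted average of the expert densities."""
    return sum(w * pdf(x) for w, pdf in zip(normalise(weights), pdfs))

def pooled_sample(rng, samplers, weights):
    """Pick one expert in proportion to the weights, then sample from it."""
    sampler = rng.choices(samplers, weights=weights, k=1)[0]
    return sampler(rng)

def uniform_pdf(low, high):
    """Density function of a Uniform(low, high) distribution."""
    return lambda x: 1.0 / (high - low) if low <= x <= high else 0.0

# Three hypothetical expert opinions, each a Uniform range.
ranges = [(0.0, 10.0), (5.0, 15.0), (8.0, 12.0)]
pdfs = [uniform_pdf(low, high) for low, high in ranges]
samplers = [lambda r, lo=low, hi=high: r.uniform(lo, hi) for low, high in ranges]
weights = [1, 2, 4]

rng = random.Random(3)
draws = [pooled_sample(rng, samplers, weights) for _ in range(7000)]
print("normalised weights:", normalise(weights))
print("pooled density at x = 9:", pooled_pdf(9.0, pdfs, weights))
```

With weights {1,2,4} the normalised weights are 1/7, 2/7 and 4/7, which are also the areas each expert contributes to the pooled density.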

See Vose Combined Distribution window for an explanation about the ModelRisk window corresponding to this function.

VoseFunctions for this custom distribution

VoseCombined generates values from this distribution or calculates a percentile.

VoseCombinedObject constructs a distribution object for this distribution.

VoseCombinedProb returns the probability density or cumulative distribution function for this distribution.

VoseCombinedProb10 returns the log10 of the probability density or cumulative distribution function.





Splicing Distributions

Introduction

The Splicing distributions window allows one to connect the left part of a given distribution together with the right part of another distribution, at a point (the so-called splice point) of one's choice.

This is typically used for modeling a variable that follows a certain distribution in the lower x region, but has a longer tail.

Splicing distributions is explained in more detail here.

Output functions of this window: VoseSplice, VoseSpliceProb, VoseSpliceProb10, VoseSpliceObject

Window elements

In the Splice parameters region on the left, you can specify the distributions and the splice point.

The LowDistribution field takes the distribution object that will constitute the 'left' part (i.e. the lower x-values), and the HighDistribution field takes the distribution object that will constitute the 'right' part (i.e. the higher x-values). You can provide the Splice Point as well.

For this calculation to make sense, the lower and higher distributions should have a region of overlap, and the splice point should be in this region.

The graph shows the spliced distribution you constructed together with a vertical bar, allowing you to move the splice point manually along the graph.



• Simulation - (default) to generate random values from the distribution.

• f(x) and F(x) - to calculate the probability density function and the cumulative distribution function of some x value(s) (an extra parameter x values will appear on the left side of the window).






VoseSplice

VoseSplice(Low_distribution, High_distribution, Splice_point, U)

Constructs a new distribution by joining the left part of a given distribution together with the right part of another distribution, at a point (the so-called splice point) of one's choice.

• Low_distribution - the low distribution

• High_distribution - the high distribution

• Splice_point - the splice point

• U - optional parameter specifying the cumulative percentile of the distribution. If omitted the function generates random values.

Splicing distributions is typically done to model variables that follow a long-tailed distribution (e.g. a Pareto) for high x, but are better modeled with another distribution for lower x.
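The text does not say how the two pieces are rescaled so that the spliced distribution has total probability one. The Python sketch below shows one simple possibility, offered purely as an assumption: keep the low density below the splice point and the high density above it, and divide both by the total area retained. The Exponential body, Pareto tail, splice point of 20 and all names are illustrative choices, not ModelRisk's method.

```python
import math

MEAN_LOW = 10.0            # Exponential body (hypothetical choice)
ALPHA, X_MIN = 2.0, 5.0    # Pareto tail (hypothetical choice)
SPLICE = 20.0              # splice point, inside the region of overlap

def low_cdf(x):
    return 1.0 - math.exp(-x / MEAN_LOW)

def low_inv(u):
    return -MEAN_LOW * math.log(1.0 - u)

def high_cdf(x):
    return 1.0 - (X_MIN / x) ** ALPHA if x > X_MIN else 0.0

def high_inv(u):
    return X_MIN / (1.0 - u) ** (1.0 / ALPHA)

# Area kept from each piece, and the share of probability below the splice.
NORM = low_cdf(SPLICE) + (1.0 - high_cdf(SPLICE))
P_LOW = low_cdf(SPLICE) / NORM

def splice_cdf(x):
    """Cumulative probability of the spliced distribution at x >= 0."""
    if x <= SPLICE:
        return low_cdf(x) / NORM
    return P_LOW + (high_cdf(x) - high_cdf(SPLICE)) / NORM

def splice_quantile(u):
    """Inverse of splice_cdf for 0 <= u < 1 (the role played by a U value)."""
    if u <= P_LOW:
        return low_inv(u * NORM)
    return high_inv(high_cdf(SPLICE) + (u - P_LOW) * NORM)

print("probability below the splice point:", P_LOW)
print("median:", splice_quantile(0.5), "99th percentile:", splice_quantile(0.99))
```

Under this construction the density generally has a jump at the splice point; other rescaling rules (for example, matching the two densities at the splice point) would give a different result.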


VoseSplice generates values from this distribution or calculates a percentile.

VoseSpliceObject constructs a distribution object for this distribution.

VoseSpliceProb returns the probability density or cumulative distribution function for this distribution.

VoseSpliceProb10 returns the log10 of the probability density or cumulative distribution function.





Risk Event Calculation

Introduction

A risk event represents the situation where you have a risk that only occurs with a certain probability and where the risk itself is represented by a specified distribution (the so-called 'Impact distribution').

If P is the specified probability of impact, this will also be the area under the (rescaled) impact distribution's PDF, while the remaining 1-P is the probability assigned to an outcome of zero (i.e. the risk event not occurring).


Output functions of this window:

VoseRiskEvent

Window elements

The Probability field can either be filled in manually or linked to a cell in your Excel sheet. It should have a value on [0,1].

The 'Impact' distribution Object can also be filled in manually, picked from your sheet, or selected from the 'Select Distribution' window.

The graph of the VoseRiskEvent function consists of two parts:

The part in green (by default) represents the probability of the risk not occurring and thus has a probability mass of 1-'Probability'.

The part in blue (by default) represents the impact one has when the risk actually occurs.



• Simulation - (default) to generate random values from the constructed distribution.








VoseRiskEvent

VoseRiskEvent(Probability, Impact distribution, U)

A risk event is a random event that may occur with some probability p and has an impact with some distribution in the event that it does occur (its conditional impact).

• Probability - the probability of the event occurring. Should be on [0,1]

• Impact distribution - should be a distribution object.

• U - optional parameter specifying the cumulative percentile of the distribution. If omitted the function generates random values.

The VoseRiskEvent functions combine the probability and conditional impact elements into one distribution. For example:

=VoseRiskEvent(p, VosePERTObject(Min, Mode, Max))

will simulate the impact of a risk event where p is the probability of occurrence and the PERT distribution reflects the size of the impact should it occur.

Uses and motivation

Imagine that the event has a probability p of occurrence, and its conditional impact follows a PERT distribution with known minimum, mode and maximum values. The impact of the risk can be modelled as:

=VoseBinomial(1, p) * VosePERT(Min, Mode, Max)

Or

=VoseBernoulli(p) * VosePERT(Min, Mode, Max)

The Binomial (or Bernoulli) distribution produces a 1 with probability p, and a zero otherwise, so with probability (1-p) the formulae will return a value of zero and with probability p will return a value drawn randomly from the PERT distribution.
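The two ways of building a risk event can be compared in a short Python sketch. The first function multiplies a Bernoulli draw by a PERT draw, as in the formulae above, and so uses two random variables. The second maps a single uniform value u onto the combined distribution, which is the idea of treating the risk event as one distribution. The sketch assumes a strictly positive impact and the standard PERT construction; the function names, the probability 0.3 and the PERT values are illustrative, and this is not ModelRisk's implementation.

```python
import random

def sample_pert(rng, minimum, mode, maximum):
    """One draw from a standard PERT (rescaled Beta) distribution."""
    span = maximum - minimum
    alpha = 1.0 + 4.0 * (mode - minimum) / span
    beta = 1.0 + 4.0 * (maximum - mode) / span
    return minimum + span * rng.betavariate(alpha, beta)

def bernoulli_times_pert(rng, p, minimum, mode, maximum):
    """Two random variables: an occurrence flag times a conditional impact."""
    occurs = 1 if rng.random() < p else 0
    return occurs * sample_pert(rng, minimum, mode, maximum)

def risk_event_quantile(u, p, impact_quantile):
    """One random variable: the lowest (1 - p) share of u maps to zero, the
    rest is rescaled to (0, 1] and passed to the impact's inverse CDF."""
    if u <= 1.0 - p:
        return 0.0
    return impact_quantile((u - (1.0 - p)) / p)

rng = random.Random(7)
draws = [bernoulli_times_pert(rng, 0.3, 5.0, 10.0, 30.0) for _ in range(20000)]
share_zero = sum(1 for d in draws if d == 0.0) / len(draws)
mean_impact = sum(draws) / len(draws)
print("share of zeros:", share_zero, "mean impact:", mean_impact)

# Single-variable form with a hypothetical Uniform(10, 30) impact.
print(risk_event_quantile(0.85, 0.3, lambda v: 10.0 + 20.0 * v))
```

Both forms give the same distribution of outcomes; they differ in how many random inputs a sensitivity analysis would see.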

The problem with this method of simulating the impact of a risk arises when one performs a sensitivity analysis, because the simulated impact is a combination of two random variables. A sensitivity analysis will look at sensitivity to each variable in turn (the Bernoulli and the PERT, in this case). That means that the statistical analysis of generated values will not exclude values from the PERT when the Bernoulli is generating a zero (when the PERT value has no influence on the model output), and it spreads the influence of the risk between two distributions.

The following example model provides an illustration:

The same cost model is run twice: the first uses a Bernoulli formula, the second uses a RiskEvent formula. Using the simple rank correlation Tornado plot of sensitivity, we get the following results for the two approaches to the model:





On the left (correct) pane each risk event only appears once. On the right (incorrect) pane a risk event appears once or, more usually, twice (once if the software sees only a significant level of correlation of the output to the Bernoulli or PERT, twice if both have significant correlation).

Approximate sum of a large number of risks

The VoseRiskEventObject function is also very useful in summing the impact of many risk events by using it as an argument in the VoseAggregateMC function.


VoseRiskEvent generates values from this distribution or calculates a percentile.

VoseRiskEventObject constructs a distribution object for this distribution.

VoseRiskEventProb returns the probability density or cumulative distribution function for this distribution.

VoseRiskEventProb10 returns the log10 of the probability density or cumulative distribution function.





Correlation and copulas

Correlation in ModelRisk

There are a number of tools included in ModelRisk for working with correlation.

The U-parameter is used together with copulas to construct correlation relationships between any desired univariate distributions within ModelRisk. See Copulas in ModelRisk for a more in-depth explanation.

Functions like VoseCorrMatrix and VoseKendallsTau allow the user to analyze correlation patterns within data sets.

Listed below are the functions and windows in ModelRisk related to correlation:

• Vose Correlation Matrix

• Vose Bivariate Copula

• Vose Multivariate Copula

• Vose Empirical Copula

• VoseKendallsTau

• VoseCorrMatrix

• VoseCorrMatrixU





Copulas in ModelRisk

Copulas offer a tool for correlating two or more random variables together without changing the shape of the random variables' distributions themselves.

For example, a scatter plot of a Normal(100,10) and a Weibull(2,3) distribution with no correlation looks as shown in panel 1. By using the Clayton(10) copula variables U1 and U2 (shown in panel 2) as the U parameters for the Normal(100,10) and Weibull(2,3) distributions we produce the correlation pattern shown in panel 3:

Note that the marginal distributions are unchanged meaning, for example, that the data generated for the Normal variable will still follow a Normal(100,10) distribution. ModelRisk provides the choice of correlating two or more variables using all of the most common copulas, allowing you great flexibility in building correlation structures. You can also fit copulas directly from a data set and compare the level of fit for each tested copula using information criteria.

Copulas are a very important technique for modeling correlation in finance and insurance risk problems. The rank order correlation employed by most Monte Carlo simulation tools is a useful measure of dependence because it is easy to estimate from data and maintains the marginal distributions of the correlated variables. However, it offers a fixed correlation pattern and is not a probability model, so cannot be compared with other correlation patterns with statistical analysis.

See the Copulas topic for an in-depth explanation of the theory behind copulas.

The functions for fitting copulas are explained in the Copula fitting functions topic.

Example: correlating variables with a bivariate copula

In ModelRisk, copulas are used to control the sampling of univariate distributions via the optional U-parameter. The general syntax for generating random values from a bivariate or multivariate copula is, respectively:

{=VoseCopulaBiName([parameters])}

{=VoseCopulaMultiName([parameters])}

where Name is replaced by the name of the copula.

• [parameters] - the copula's parameters, separated by commas (if there is more than one)

So for example, to generate a Normal(0,1) and a Beta(2,1) value correlated by a Clayton(3) copula, you would do the following:

• Select the A1 and B1 spreadsheet cells.

• Type =VoseCopulaBiClayton(3) in the Excel formula bar and press CTRL+SHIFT+ENTER - Excel now inserts this as an array function over the two selected cells, indicated by curly brackets. Two random samples from the copula are generated.

• Insert =VoseNormal(0,1,A1) in cell A2, and =VoseBeta(2,1,B1) in cell B2. The cell references are U parameters that refer to the copula values generated in cells A1 and B1.

• Now cells A2 and B2 contain random values correlated by the copula.
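The same two-stage procedure can be sketched in Python: first draw a correlated pair of Uniform(0,1) values from a Clayton copula, then push each value through the inverse CDF of the target distribution. The sampler below uses the textbook conditional-inversion formula for a Clayton copula with a positive parameter; it is assumed for illustration and is not taken from ModelRisk. The names `open_uniform`, `clayton_pair` and `kendall_tau` are hypothetical helpers.

```python
import math
import random
from statistics import NormalDist

def open_uniform(rng):
    """Uniform draw strictly inside (0, 1)."""
    x = rng.random()
    while x == 0.0:
        x = rng.random()
    return x

def clayton_pair(rng, alpha):
    """One (u, v) pair from a bivariate Clayton copula with alpha > 0,
    generated by conditional inversion."""
    u = open_uniform(rng)
    w = open_uniform(rng)
    v = (u ** -alpha * (w ** (-alpha / (1.0 + alpha)) - 1.0) + 1.0) ** (-1.0 / alpha)
    return u, v

def kendall_tau(pairs):
    """Kendall's tau computed from concordant and discordant pairs."""
    concordant = discordant = 0
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            s = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (concordant + discordant)

rng = random.Random(11)
standard_normal = NormalDist(0.0, 1.0)

uv = [clayton_pair(rng, 3.0) for _ in range(500)]          # cells A1, B1
# Normal(0,1) via its inverse CDF; Beta(2,1) has CDF y^2, so its inverse is sqrt.
xy = [(standard_normal.inv_cdf(u), math.sqrt(v)) for u, v in uv]  # cells A2, B2

tau_uv = kendall_tau(uv)
tau_xy = kendall_tau(xy)
mean_beta = sum(y for _, y in xy) / len(xy)
print("Kendall's tau of copula values:", tau_uv)
print("Kendall's tau of final values:", tau_xy)
```

Because both inverse CDFs are increasing, the rank-based dependence of the final values matches that of the copula values, while each margin keeps its own distribution. For a Clayton copula the theoretical Kendall's tau is alpha / (alpha + 2).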

Types of copulas available in ModelRisk

The Clayton copula produces a tight correlation at the low end of each variable;

The Frank copula produces an even, sausage-shaped correlation across the range of the variables;

The Gumbel copula produces more correlation at the two extremes of the correlated distributions but has its highest correlation in the maxima tails;

The Normal copula produces an elliptically shaped correlation;

The T (or Student) copula produces a star-shaped copula for low nu with its highest density on the main diagonal, transitioning towards a Normal copula for high values of nu (30+).

Direction of a copula

The three Archimedean copulas in ModelRisk exhibit positive correlation. In the multivariate versions of these copulas the positive correlation pattern must be maintained. However, for a bivariate copula we have the flexibility to rotate these copulas over 90, 180 or 270 degrees. Therefore bivariate copula functions have a direction parameter when appropriate.

The motivation and implementation of the copula direction parameter is explained here.





Bivariate Copula


Associated with every bivariate copula is its copula density, which is much like the probability density of a (bivariate) distribution.

The output of a bivariate Copula in ModelRisk is an array function of two spreadsheet cells. These cells will contain correlated Uniform(0,1) random variables, with a pattern of correlation defined by the copula.

Next, these correlated Uniform(0,1) variables are used as the U-parameter in the two desired distributions. The result is that the two final spreadsheet cells will contain variables sampled from your chosen distribution(s), and at the same time generating the pattern of correlation defined by the chosen copula.

You can read more about the mathematical details of copulas here.


Output functions of this window: VoseCopulaBiClayton, VoseCopulaBiFrank, VoseCopulaBiGumbel,

VoseCopulaBiNormal, VoseCopulaBiT

Window elements

You first select the type of copula to use in the Copulas section of the bivariate copula interface.

The copula's parameters can be set manually or linked to a spreadsheet cell. In the picture above, we see the Bivariate Clayton copula which takes one parameter named Alpha.

The copula direction can be set to 1, 2, 3 or 4. Direction 1 is the default orientation; directions 2, 3 and 4 rotate the copula by 90, 180 and 270 degrees respectively. This allows for more flexibility in modeling correlations.

Correlated distributions





In the Correlated distributions area, the distributions to be correlated can be selected. These can be either typed directly, chosen from the Select Distribution window, or inserted from a spreadsheet cell.

Copula graph

In the middle pane, a graph for the copula is shown. The points represent randomly generated (x,y) values generated by this copula: the X- and the Y-axis represent the correlated variables associated with the first and second selected distribution, respectively.

By default, the percentiles of these two correlated variables are shown: these are values between 0 and 1. As explained above, certain pairs will have a higher probability of being generated, as determined by their correlation (i.e. the copula used).

Optionally, the actual values of sampled random variables can be shown, with both axes rescaling appropriately. This goes one step further: the (x,y) pairs represent sampled random variables from the chosen distributions, with the percentiles now being driven by the copula. Internally, this is the U-parameter in action: it takes the random value generated by the copula.

Copulas are directly connected to classical measures of correlation, like rank order correlation. The equivalent rank order correlation coefficient of the current copula is shown on the left.





Direction of a copula

The three Archimedean copulas in ModelRisk exhibit positive correlation (pane 1 below). In the multivariate versions of these copulas the positive correlation pattern must be maintained.

However, for a bivariate copula we have the flexibility to transform the relationship by replacing one or both of the variables u, v with 1-u or 1-v. This gives us three more patterns, shown in panes 2-4 below.

The Clayton and Gumbel copulas can be set to directions 1, 2, 3 and 4, corresponding to the directions illustrated below:

As the Frank copula is symmetric under rotations of 180°, it can only take 1 (default) or 2 as direction parameter.

The T copula remains identical under any number of 90° rotations of the plane, so it does not take a direction parameter.

Changing the direction of the Normal copula corresponds to changing the sign of its covariance parameter.
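The effect of replacing u with 1-u or v with 1-v can be seen numerically. The Python sketch below generates Clayton copula pairs with a textbook conditional-inversion sampler (an assumption, not ModelRisk's code) and then flips one or both coordinates. Which flip corresponds to which ModelRisk direction number is not stated here, so the sketch deliberately does not label them; all names are hypothetical.

```python
import random

def clayton_pair(rng, alpha):
    """One (u, v) pair from a bivariate Clayton copula with alpha > 0."""
    u = 1.0 - rng.random()   # in (0, 1]
    w = 1.0 - rng.random()   # in (0, 1]
    v = (u ** -alpha * (w ** (-alpha / (1.0 + alpha)) - 1.0) + 1.0) ** (-1.0 / alpha)
    return u, v

def flip(pair, flip_u, flip_v):
    """Replace u with 1-u and/or v with 1-v."""
    u, v = pair
    return (1.0 - u if flip_u else u, 1.0 - v if flip_v else v)

def pearson(pairs):
    """Sample correlation coefficient of the two coordinates."""
    n = len(pairs)
    mean_u = sum(u for u, _ in pairs) / n
    mean_v = sum(v for _, v in pairs) / n
    cov = sum((u - mean_u) * (v - mean_v) for u, v in pairs)
    var_u = sum((u - mean_u) ** 2 for u, _ in pairs)
    var_v = sum((v - mean_v) ** 2 for _, v in pairs)
    return cov / (var_u * var_v) ** 0.5

def corner_share(pairs, lower):
    """Share of points in the lower-left or upper-right 10% corner."""
    if lower:
        hits = sum(1 for u, v in pairs if u < 0.1 and v < 0.1)
    else:
        hits = sum(1 for u, v in pairs if u > 0.9 and v > 0.9)
    return hits / len(pairs)

rng = random.Random(5)
base = [clayton_pair(rng, 3.0) for _ in range(5000)]
flipped_u = [flip(p, True, False) for p in base]      # negative dependence
flipped_both = [flip(p, True, True) for p in base]    # tight tail moves to the top

print("correlation:", pearson(base), pearson(flipped_u), pearson(flipped_both))
print("lower vs upper corner (base):", corner_share(base, True), corner_share(base, False))
```

Flipping one coordinate reverses the sign of the dependence, while flipping both keeps the dependence positive but moves the tight Clayton tail from the low end to the high end.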





VoseCopulaBiClayton

{=VoseCopulaBiClayton(Alpha, direction)}

Example model

Array function that returns random variables from a bivariate Clayton copula.

• Alpha - Correlation parameter. Can range from -35 (maximum negative correlation) through 0 (no correlation) to 36 (maximum positive correlation)

• Direction - optional parameter that sets the direction of the copula: can take values 1 (default), 2, 3 or 4.

The output is an array of two cells, with randomly generated copula values on [0,1]. Link the U-parameter of distribution functions to these cells to generate values of these distributions correlated by this copula.

The optional direction parameter changes the direction of the copula. This can take values 1, 2, 3 or 4, orienting the generated densities as illustrated below (when omitted the direction is 1):





For the multivariate version of this copula see VoseCopulaMultiClayton.


So for example, to generate a Normal(0,1) and a Beta(2,1) value correlated by a Clayton(3) copula, you would do the following:

• Select the A1 and B1 spreadsheet cells.

• Type =VoseCopulaBiClayton(3) in the Excel formula bar and press CTRL+SHIFT+ENTER - Excel now inserts this as an array function over the two selected cells, indicated by curly brackets.

• Insert =VoseNormal(0,1,A1) in cell A2, and =VoseBeta(2,1,B1) in cell B2. The cell references are U parameters that refer to the copula values generated in cells A1 and B1.

• Now cells A2 and B2 contain random values correlated by the Clayton(3) copula.


VoseFunctions for this copula

VoseCopulaBiClayton generates values from this copula.

VoseCopulaBiClaytonFit fits this copula to data.

VoseCopulaBiClaytonFitP returns the parameter(s) of this copula fitted to data.

VoseCopulaBiClaytonObject creates a copula object for this copula (use VoseCopulaSimulate to simulate from it).

VoseCopulaBiClaytonFitObject creates a copula object for this copula fitted to data (use VoseCopulaSimulate to simulate from it).





VoseCopulaBiFrank

{=VoseCopulaBiFrank(Theta, direction)}

Example model

Array function that returns random variables from a bivariate Frank copula.

• Theta - Correlation parameter.

• Direction - optional parameter that sets the direction of the copula: can take values 1 (default) or 2.

The optional direction parameter changes the direction of the copula. This can take values 1 or 2: 1 (default) means no rotation, 2 means the copula is rotated over 90°.

For the multivariate version of this copula see VoseCopulaMultiFrank.

Example: correlating variables with a bivariate Frank copula





So for example, to generate a normal(0,1) and a beta(2,1) value correlated by a Frank(10) copula, youwould do the following:


• Type =VoseCopulaBiFrank(10) in the Excel formula bar and press CTRL+SHIFT+ENTER - Excel now inserts this as an array function over the two selected cells, indicated by curly brackets.

• Insert =VoseNormal(0,1,A1) in the cell A2, and =VoseBeta(2,1,B1) in the cell B2.

• Now the A2 and B2 cells contain random values correlated by the Frank(10) copula.


VoseCopulaBiFrank generates values from this copula.

VoseCopulaBiFrankFit fits this copula to data.

VoseCopulaBiFrankFitP returns the parameter(s) of this copula fitted to data.





VoseCopulaBiGumbel

{=VoseCopulaBiGumbel(Theta, direction)}

Example model

Array function that returns random values from a bivariate Gumbel copula.

• Theta - Correlation parameter.

• direction - optional parameter that sets the direction of the copula: can take values 1 (default), 2, 3 or 4.


The optional direction parameter changes the direction of the copula. This can take values 1, 2, 3 or 4 according to the number of counterclockwise 90° rotations. Direction 1 (default) means no rotation, 2 means rotated over 90°, 3 means rotated over 180°, and 4 means rotated over 270°.
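As an illustration of what such a rotation does to a pair of copula values, the sketch below rotates a point of the unit square counterclockwise about its centre (0.5, 0.5). The mapping from direction numbers to rotations follows the description above; whether ModelRisk applies exactly this transformation internally is an assumption, and the function name is hypothetical.

```python
def rotate_copula_pair(u, v, direction=1):
    """Rotate (u, v) counterclockwise by (direction - 1) * 90 degrees."""
    if direction not in (1, 2, 3, 4):
        raise ValueError("direction must be 1, 2, 3 or 4")
    for _ in range(direction - 1):
        # One 90-degree counterclockwise turn about the centre of the unit square.
        u, v = 1.0 - v, u
    return u, v
```

A pair concentrated near (0,0) under direction 1 ends up near (1,0) under direction 2, near (1,1) under direction 3 and near (0,1) under direction 4.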

For the multivariate version of this copula see VoseCopulaMultiGumbel.





Example: correlating variables with a bivariate Gumbel copula

So, for example, to generate a normal(0,1) and a beta(2,1) value correlated by a Gumbel(10) copula, you would do the following:


• Type =VoseCopulaBiGumbel(10) in the Excel formula bar and press CTRL+SHIFT+ENTER - Excel now inserts this as an array function over the two selected cells, indicated by curly brackets.

• Insert =VoseNormal(0,1,A1) in the cell A2, and =VoseBeta(2,1,B1) in the cell B2. The cell references are U-parameters that refer to the copula values in cells A1 and B1.

• Now the A2 and B2 cells contain random values correlated by the Gumbel(10) copula.


VoseCopulaBiGumbel generates values from this distribution or calculates a percentile.

VoseCopulaBiGumbelFit fits this copula to data.

VoseCopulaBiGumbelFitP returns the parameter(s) of this copula fitted to data.





VoseCopulaBiNormal

{=VoseCopulaBiNormal(Correlation)}

Example model

Array function that returns random variables from a bivariate Normal copula.

• Correlation - linear correlation coefficient. Must be on [-1,1].

The output is an array of two cells, with randomly generated copula values between [0,1]. Link the U-parameter of distribution functions to these to generate values of these distributions correlated by this copula.

Note that negating the sign of the Correlation parameter corresponds to changing the direction of the copula.

For the multivariate version of this copula see VoseCopulaMultiNormal.

Example: correlating variables with a bivariate Normal copula

For example, to generate a normal(0,1) and a beta(2,1) value correlated by a Normal(0.5) copula, you would do the following:






• Type =VoseCopulaBiNormal(0.5) in the Excel formula bar and press CTRL+SHIFT+ENTER - Excel now inserts this as an array function over the two selected cells, indicated by curly brackets.

• Insert =VoseNormal(0,1,A1) in the cell A2, and =VoseBeta(2,1,B1) in the cell B2. The cell references are U-parameters that refer to the copula values in cells A1 and B1.
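For readers who want to see the mechanics, here is a minimal Python sketch of a bivariate Normal copula under the usual textbook construction (two correlated standard normals mapped through the normal CDF). It is an illustration only, not ModelRisk's code, and the function name is hypothetical. It also enforces the [-1,1] range on the correlation.

```python
import math
import random
from statistics import NormalDist


def normal_copula_pair(correlation, rng):
    """Return one (u1, u2) pair from a bivariate Normal copula."""
    if not -1.0 <= correlation <= 1.0:
        raise ValueError("correlation must be on [-1, 1]")
    z1 = rng.gauss(0.0, 1.0)
    # Mix z1 with an independent normal so that corr(z1, z2) = correlation.
    z2 = correlation * z1 + math.sqrt(1.0 - correlation ** 2) * rng.gauss(0.0, 1.0)
    std_normal = NormalDist()
    # The normal CDF turns each value into a percentile on [0, 1].
    return std_normal.cdf(z1), std_normal.cdf(z2)
```

The two returned percentiles would then be passed as U-parameters, exactly as A1 and B1 are in the example.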



VoseCopulaBiNormal generates values from this distribution or calculates a percentile.

VoseCopulaBiNormalFit fits this copula to data.

VoseCopulaBiNormalFitP returns the parameter(s) of this copula fitted to data.





VoseCopulaBiT

{=VoseCopulaBiT(Nu, Correlation)}

Example model

Array function that returns random values from a bivariate T copula.

• Nu - Number of degrees of freedom. Must be a positive integer.

• Correlation - linear correlation coefficient. Must be on [-1,1].

The output is an array of two cells, with randomly generated copula values between [0,1]. Link the U-parameter of distribution functions to these to generate values of these distributions correlated by this copula.

For the multivariate version of this copula see VoseCopulaMultiT.

Note that this copula is symmetric under rotations over 90° so it does not have a direction parameter.


For example, to generate a normal(0,1) and a beta(2,1) value correlated by a T(1,0.5) copula, you would do the following:


• Type =VoseCopulaBiT(1,0.5) in the Excel formula bar and press CTRL+SHIFT+ENTER - Excel now inserts this as an array function over the two selected cells, indicated by curly brackets.

• Insert =VoseNormal(0,1,A1) in the cell A2, and =VoseBeta(2,1,B1) in the cell B2.




VoseCopulaBiT generates values from this distribution or calculates a percentile.

VoseCopulaBiTFit fits this copula to data.

VoseCopulaBiTFitP returns the parameter(s) of this copula fitted to data.





Multivariate Copula

Copulas are used for correlating two or more random variables and allow for greater flexibility than older correlation methods, like rank order correlation.

Associated with every Multivariate Copula is its copula density, which is much like the probability density of a (multivariate) distribution.

The output of a multivariate Copula in ModelRisk is an array of spreadsheet cells. These cells will contain Uniform(0,1) random variables with a pattern of correlation defined by the copula.

The following multivariate copulas are available for use in spreadsheet models in ModelRisk:

• Multivariate Clayton Copula

• Multivariate Frank Copula

• Multivariate Gumbel Copula

• Multivariate Normal Copula

• Multivariate T Copula

These correlated Uniform(0,1) variables are used as the U-parameter in the desired distributions. The result is that the final spreadsheet cells will contain variables sampled from your chosen distributions and correlated (through the Uniform(0,1) variables as U-parameter) by the chosen copula.



Output functions of this window: VoseCopulaMultiClayton, VoseCopulaMultiFrank, VoseCopulaMultiGumbel, VoseCopulaMultiNormal, VoseCopulaMultiT

Window elements

Copula parameters, Correlation matrix and correlated distributions

In the Copulas section, you can choose between a number of different copulas:

• Multivariate Clayton Copula

• Multivariate Frank Copula

• Multivariate Gumbel Copula

• Multivariate Normal Copula

• Multivariate T Copula

The Correlation matrix is shown, and dimensions can be added or removed by clicking the + and - buttons. When an element of the matrix is selected, a distribution to be correlated can be chosen for it.





In the Correlated distributions area, the distributions to be correlated can be selected. These can be either typed directly, chosen from the Select Distribution window, or inserted from a spreadsheet cell.

Copula graph

In the middle pane, a scatter plot of the copula is shown. Of course, a copula of dimension greater than 2 is hard to visualize. The points represent randomly generated (x,y) values generated by two distributions in the copula: the X- and the Y-axis represent the correlated variables associated with the first and second selected distribution, respectively.

By default, the percentiles of these 2 correlated variables are shown: these are values between 0 and 1. As explained above, certain pairs will have a higher probability of being generated, as determined by their correlation (i.e. the copula used).

Optionally, the actual values of sampled random variables can be shown, with both axes rescaling appropriately. This goes one step further: the (x,y) pairs represent sampled random variables from the chosen distributions, with the percentiles now being driven by the copula. Internally, this is the U-parameter in action: it takes the random value generated by the copula.

The zero-axes are shown with grey lines.





VoseCopulaMultiClayton

{=VoseCopulaMultiClayton(alpha)}

Example model

Array function that returns random variables from a multivariate Clayton copula.

• Alpha - Correlation parameter. Can range from -35 (maximum negative correlation) over 0 (no correlation) to 36 (maximum positive correlation).

The output is a 1xn or nx1 array of randomly generated copula values between [0,1], with n being the number of variables to be correlated. Link the U-parameter of distribution functions to these to generate values of these distributions correlated by this copula.

For the bivariate version of this copula, see VoseCopulaBiClayton.


VoseCopulaMultiClayton generates values from this distribution or calculates a percentile.

VoseCopulaMultiClaytonFit fits this copula to data.

VoseCopulaMultiClaytonFitP returns the parameter(s) of this copula fitted to data.





VoseCopulaMultiFrank

{=VoseCopulaMultiFrank(theta)}

Example model

Array function that returns variables from a multivariate Frank copula.

• Theta - Correlation parameter. Can range from -35 (maximum negative correlation) over 0 (no correlation)

The output is an nx1 or 1xn array of randomly generated copula values between [0,1], with n being the number of variables to be correlated. Link the U-parameter of distribution functions to these to generate values of these distributions correlated by this copula.

For the bivariate version of this copula, see VoseCopulaBiFrank.


VoseCopulaMultiFrank generates values from this distribution or calculates a percentile.

VoseCopulaMultiFrankFit fits this copula to data.

VoseCopulaMultiFrankFitP returns the parameter(s) of this copula fitted to data.





VoseCopulaMultiGumbel

{=VoseCopulaMultiGumbel(theta)}

Example model

Array function that returns random variables from a multivariate Gumbel copula.

• Theta - Correlation parameter. Can range from -35 (maximum negative correlation) over 0 (no correlation) to 35 (maximum positive correlation).

The output is an nx1 or 1xn array of randomly generated copula values between [0,1], with n being the number of variables to be correlated. Link the U-parameter of distribution functions to these to generate values of these distributions correlated by this copula.

For the bivariate version of this copula, see VoseCopulaBiGumbel.


VoseCopulaMultiGumbel generates values from this distribution or calculates a percentile.





VoseCopulaMultiGumbelFit fits this copula to data.

VoseCopulaMultiGumbelFitP returns the parameter(s) of this copula fitted to data.





VoseCopulaMultiNormal

{=VoseCopulaMultiNormal({correlation_matrix})}

Example model

Array function that models a multivariate Normal copula.

• correlation_matrix - an nxn array that contains a valid correlation matrix.

The output is an nx1 or 1xn array of randomly generated copula values between [0,1], with n being the number of variables to be correlated. Link the U-parameter of distribution functions to these to generate values of these distributions correlated by this copula.
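To make the role of the correlation matrix concrete, the sketch below draws one row of n copula values using a Cholesky factor of the matrix. This is the common textbook construction, assumed here for illustration; it is not taken from ModelRisk, and the function names are hypothetical. A matrix that is not positive definite (and so not a valid correlation matrix for this method) is rejected.

```python
import math
import random
from statistics import NormalDist


def cholesky(matrix):
    """Return the lower-triangular factor L with L * L^T = matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                d = matrix[i][i] - s
                if d <= 0.0:
                    raise ValueError("correlation matrix is not positive definite")
                lower[i][j] = math.sqrt(d)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def multi_normal_copula(correlation_matrix, rng):
    """Return a list of n correlated values on [0, 1]."""
    lower = cholesky(correlation_matrix)
    n = len(lower)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Correlate the independent normals, then convert each to a percentile.
    correlated = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
    std_normal = NormalDist()
    return [std_normal.cdf(x) for x in correlated]
```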

For the bivariate version of this copula, see VoseCopulaBiNormal.


VoseCopulaMultiNormal generates values from this distribution or calculates a percentile.

VoseCopulaMultiNormalFit fits this copula to data.

VoseCopulaMultiNormalFitP returns the parameter(s) of this copula fitted to data.





VoseCopulaMultiT

{=VoseCopulaMultiT(nu,{correlation_matrix})}

Example model

Array function that models a multivariate T copula.

• Nu - Number of degrees of freedom. Must be a positive integer.

• correlation_matrix - an nxn array that contains a valid correlation matrix.

The output is a 1xn or nx1 array of randomly generated copula values between [0,1], with n being the number of variables to be correlated. Link the U-parameter of distribution functions to these to generate values of these distributions correlated by this copula.

For the bivariate version of this copula, see VoseCopulaBiT.


VoseCopulaMultiT generates values from this distribution or calculates a percentile.

VoseCopulaMultiTFit fits this copula to data.

VoseCopulaMultiTFitP returns the parameter(s) of this copula fitted to data.





VoseCopulaSimulate

=VoseCopulaSimulate(CopulaObject)

Array function that returns random values from a copula object.

• CopulaObject - a valid copula object

Example

Say you want to correlate a set of 4 variables with a multivariate Clayton copula fitted to an array with historical data called DataSet. You would then write

=VoseCopulaMultiClaytonFitObject(DataSet)

in cell A1. To generate random values from this copula you would then enter the array function

{=VoseCopulaSimulate(A1)}

with output over 4 cells. This returns a set of generated random values (all between 0 and 1) of the copula object. Then you would use these as a U-parameter in distribution functions as explained here.





VoseCopulaData

{=VoseCopulaData({data}, Data_in_rows, Uncertainty)}

Array function that returns random values from an empirical copula that is constructed based on spreadsheet data.

• {data} - the spreadsheet data from which to construct the copula. This should be at least a two-dimensional array.

• Data_in_rows - a Boolean parameter (TRUE/FALSE) that specifies whether the data is oriented in rows (TRUE) or not (FALSE, default).

• Uncertainty - a Boolean parameter (TRUE/FALSE) that specifies whether or not to include uncertainty in the constructed empirical copula (FALSE by default).

Note the difference between constructing an empirical copula, and fitting an existing type of copula:

When fitting a copula, we determine the parameter of the copula that makes for a best fit to the data, but retaining the copula's functional form. With the empirical copula, the functional form itself (not just the parameter) is based on the data, making it a flexible tool for capturing any correlation pattern, however unusual.
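One simple way to picture an empirical copula is to replace each observation by its scaled rank within its own column and then resample whole rows. The Python sketch below does just that. It is a deliberately basic illustration under stated assumptions (no ties in the data, no uncertainty option, plain row resampling) and should not be read as the algorithm VoseCopulaData uses; the function names are hypothetical.

```python
import random


def pseudo_observations(rows):
    """Convert data rows to ranks scaled into (0, 1), column by column."""
    n = len(rows)
    dims = len(rows[0])
    result = [[0.0] * dims for _ in range(n)]
    for j in range(dims):
        # Indices of the rows ordered by their value in column j.
        order = sorted(range(n), key=lambda i: rows[i][j])
        for rank, i in enumerate(order, start=1):
            result[i][j] = rank / (n + 1)
    return result


def sample_empirical_copula(rows, rng):
    """Draw one row of copula values by resampling the pseudo-observations."""
    return list(rng.choice(pseudo_observations(rows)))
```

Because whole rows are resampled, whatever dependence pattern the data shows is carried over into the sampled percentiles without assuming a functional form.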

Also see Copula fitting functions for an explanation on how to fit copulas to data.





VoseCopulaDataSeries

=VoseCopulaDataSeries(Data, Uncertainty)

This array function returns a set of random values from a copula created by analyzing the correlation in a data series between contiguous values.

• Data – a set of sequential observations from a time series.

• Uncertainty – an optional Boolean parameter determining whether one should incorporate the statistical uncertainty about the estimated copula relationship. Uncertainty is not included if the parameter is set to FALSE (or 0) or omitted, and is included if the parameter is set to TRUE (or 1).

Application

Imagine that you have a set of time series data for a single variable from which you wish to make a forecast. One approach would be to use one of the time series fitting functions in ModelRisk. However, each of ModelRisk’s time series fitting functions involves a number of assumptions that you may not be comfortable in accepting.

The VoseCopulaDataSeries function offers a more flexible alternative. The function analyzes a data series for any autocorrelation between sequential values in a series. For example, consider the following time series of log returns of a stock:

A scatter plot of the returns in each period against the returns in the previous period reveals some correlation:





Fitting a distribution to the log returns shows that the 3-parameter Student is a good fit:





By using the VoseCopulaDataSeries function to simulate a correlation, and a VoseStudent3 distribution to simulate the size of returns, one can produce a forecast. This model illustrates the example.

This approach has its own set of assumptions, namely: in terms of the use of the VoseCopulaDataSeries function, that the autocorrelation occurs over just a single lag period; and in terms of the use of the 3-parameter Student distribution, that the distribution of the underlying variable is constant (although this could be relaxed by changing the distribution over the range of the forecast).





Time Series

Time series in ModelRisk

A time series model is a stochastic forecast of a variable that varies randomly over time.

ModelRisk contains a number of advanced time series models. All time series can be simulated, parameter estimates can be determined from data, or projections made based on fitting to historic data.

Time series can be inserted by directly inserting (array) functions in spreadsheet cells, or through the univariate time series, multivariate time series, or time series fit windows. These ModelRisk windows each have their separate page in this help file, while the general use of the functions is explained below.

To generate random values from a time series, use a VoseTimeSeries array function. The general syntax is:

{=VoseTimeSeries([parameters], Initial Value, Log Returns)}

where Series is replaced by the name of the time series.

• [parameters] - the time series' parameters separated by commas.

• Initial Value - starting value (at time zero). The generated time series values will continue on from this value. Should only be provided if the Log Returns parameter is set to FALSE or omitted.

• Log Returns - an optional parameter. Function generates log returns if set to TRUE, or variable values if set to FALSE or omitted.

For example, to generate 10 random values from a GBM(0.02,0.1) model that start from a value of 100 you would insert

{=VoseTimeGBM(0.02,0.1,100)}

over a range of 10 spreadsheet cells. To generate Log Returns of that same time series you would write

{=VoseTimeGBM(0.02,0.1,100, TRUE)}
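The same two calls can be mimicked in a few lines of Python. This is a minimal sketch that assumes each period's log return is an independent Normal(Mu, Sigma) draw, in line with the description of Mu as the mean log return and Sigma as the standard deviation of log returns. The function name and the explicit periods argument are hypothetical (in Excel the number of periods is set by the size of the output range).

```python
import math
import random


def gbm_series(mu, sigma, initial_value, periods, rng, log_returns=False):
    """Return either the log returns or the variable values of a GBM path."""
    returns = [rng.gauss(mu, sigma) for _ in range(periods)]
    if log_returns:
        return returns
    values = []
    level = initial_value
    for r in returns:
        level *= math.exp(r)  # each value continues on from the previous one
        values.append(level)
    return values


# Ten values of a GBM(0.02, 0.1) path starting from 100.
path = gbm_series(0.02, 0.1, 100.0, 10, random.Random())
```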

As the ModelRisk Time Series functions typically take a lot of parameters, we recommend for these in particular to use the Time Series window to avoid errors.

This topic is about forecasting from time series. For an explanation about time series fitting with ModelRisk see Time series fitting functions.

About the Initial Value parameter

Where appropriate, time series functions take an Initial Value parameter. This is the starting value of the variable from which the time series forecast is to be made, $S_0$.

In a forecast model not fitted to data, Initial Value is the value of the variable at time zero, and the forecast projects a series from period one as a departure from this value at time zero. In this situation Initial Value is a required parameter.

If the Log Return option is selected there is no need to specify Initial Value, since the forecast then projects the series of log returns $r_t = \ln(S_t / S_{t-1})$. If the Log Return option is not selected (the default) the forecast needs the Initial Value ($S_0$) as it then makes the projection: $S_1 = S_0 \exp(r_1)$; $S_2 = S_1 \exp(r_2)$; etc.
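The relationship between log returns and variable values can be checked with a short round trip. The helper names below are hypothetical; the arithmetic is exactly the projection described above.

```python
import math


def to_levels(initial_value, log_returns):
    """Apply S_t = S_(t-1) * EXP(r_t) starting from S_0 = initial_value."""
    levels = []
    level = initial_value
    for r in log_returns:
        level *= math.exp(r)
        levels.append(level)
    return levels


def to_log_returns(initial_value, levels):
    """Recover r_t = LN(S_t / S_(t-1)) from S_0 and the later values."""
    returns = []
    previous = initial_value
    for level in levels:
        returns.append(math.log(level / previous))
        previous = level
    return returns
```

This also shows why Initial Value is only needed for variable values: the log returns carry no information about the starting level.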

The following list describes the time series models available in ModelRisk.

Geometric Brownian Motion (GBM) based models





GBM is usually the default starting point for a time series of a non-negative financial variable - like a stock price, exchange rate or interest rate. It assumes that the fractional changes in the variable between periods are independent random variables following a Normal distribution. ModelRisk offers the following GBM-related models:

VoseTimeGBM - Basic GBM.

VoseTimeGBMAJ - GBM with asymmetric jumps, meaning random discrete jumps can affect the variable.

VoseTimeGBMVR - GBM with reversion to a fixed value, meaning the variable is drawn back towards its long-run mean in proportion to its deviation from the mean.

VoseTimeGBMAJVR - GBM with both mean reversion and jump diffusion.

VoseTimeSeasonalGBM - GBM with seasonal variation. ModelRisk offers a second optional cycle within each period of the first cycle, useful for modelling say week/day or day/hour patterns.

Auto-Regressive Moving Average (ARMA) models

Auto-regression means that the expected fractional change of the variable is proportional (either positively or negatively) to its fractional change in the previous recent periods. Moving average means that the expected fractional change of the variable is different to the long-run mean by a factor that is proportional to its recent variation from its long-run mean.

VoseTimeAR1 - Auto-regressive model with one-period dependence.

VoseTimeAR2 - Auto-regressive model with two-period dependence.

VoseTimeMA1 - Moving-average model with one-period dependence.

VoseTimeMA2 - Moving-average model with two-period dependence.

VoseTimeARMA - Auto-regressive moving average model with one-period dependence.

ARCH-type models

ARCH stands for autoregressive conditional heteroskedasticity. The volatility of the time series is defined as a function of the previous deviations of the variable from its long run mean. ARCH-type models allow periods of higher and lower volatility.

VoseTimeARCH - Basic ARCH model with one-period dependence.

VoseTimeGARCH - Generalized ARCH model with one-period dependence, i.e. ARCH model where the volatility component is an ARMA model.

VoseTimeEGARCH - Exponential general autoregressive conditional heteroskedasticity model, allowing negative values in the linear error variance equation with one-period dependence.

VoseTimeAPARCH - Asymmetric power autoregressive conditional heteroskedasticity with one-period dependence. It is a good one to try because it nests a large number of other models: GARCH, TS-GARCH, NGARCH, GJR-GARCH, TGARCH and log-GARCH.

'Population' Models

These models describe the evolution of a population size.

VoseTimeDeath - Pure Death model: Individuals 'die' independently at the same expected rate. Useful, for example, in modelling the retention of clients, or the timing and number of life insurance claims.

VoseTimeYule - Yule linear growth model: Individuals 'reproduce' by division at the same expected rate, meaning that each individual becomes two. Useful, for example, to model growth in a customer base by word-of-mouth.
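Both population models are easy to picture as simulations. The sketch below assumes a constant rate per period: in the pure death model each individual survives a period with probability EXP(-rate), and in the Yule model the descendants of one individual after one period follow a geometric distribution with success probability EXP(-rate), the standard result for a Yule process. Function names and parameters are hypothetical and are not the VoseTimeDeath or VoseTimeYule signatures.

```python
import math
import random


def death_series(initial_size, rate, periods, rng):
    """Population sizes under a pure death process, one value per period."""
    survive = math.exp(-rate)
    size = initial_size
    sizes = []
    for _ in range(periods):
        # Each individual survives the period independently.
        size = sum(1 for _ in range(size) if rng.random() < survive)
        sizes.append(size)
    return sizes


def yule_series(initial_size, rate, periods, rng):
    """Population sizes under a Yule (pure birth by division) process."""
    p = math.exp(-rate)
    size = initial_size
    sizes = []
    for _ in range(periods):
        if p < 1.0:
            # Each individual's line grows to a Geometric(p) count on 1, 2, 3, ...
            size = sum(
                1 + int(math.log(1.0 - rng.random()) / math.log1p(-p))
                for _ in range(size)
            )
        sizes.append(size)
    return sizes
```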

Markov Chains





VoseMarkovSample - Markov chains are used to model the change in state of a population of individuals over time. For example, changes in credit ratings of a company, or the health status of life insurance policy holders.
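A single step of such a chain can be sketched as follows. The sketch assumes a population described by the number of individuals in each state and a transition matrix whose row i holds the probabilities of moving from state i to each state. The function name and argument layout are hypothetical and do not reproduce the VoseMarkovSample signature.

```python
import random


def markov_step(counts, transition, rng):
    """Move every individual one period forward and return the new counts."""
    states = range(len(counts))
    new_counts = [0] * len(counts)
    for state, count in enumerate(counts):
        for _ in range(count):
            # Pick the next state using row `state` of the transition matrix.
            new_state = rng.choices(states, weights=transition[state])[0]
            new_counts[new_state] += 1
    return new_counts
```

Calling markov_step repeatedly on its own output gives the state of the population after several periods.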

Wilkie Models

Wilkie Models receive a separate treatment (and window) in ModelRisk. See the Wilkie models topic for a detailed explanation about Wilkie models. The following Wilkie models are available:

• price inflation

• wage inflation

• share yields

• share dividends

• long term interest rate

• short term interest rate

Subject Matter Expert time series

ModelRisk has a variety of non-technical time series models designed to help produce subjective forecasts, described here.

Fitting a time series model to data

All time series models can be fitted to spreadsheet data. Fitted time series can be ranked according to different information criteria. See Fitting in ModelRisk for a more detailed explanation.

Multivariate time series

ModelRisk allows you to simulate from a number of multivariate Time Series. This allows for modeling of different quantities that vary in time together, but are connected through some relation: for example, the realizations at each point could be correlated, or one component could come about through a regression from current values driven by past values of the other Time Series, etc. The Multivariate Time Series in ModelRisk are:

• VoseTimeMultiAR1 - Multivariate Autoregressive Time Series of order 1

• VoseTimeMultiAR2 - Multivariate Autoregressive Time Series of order 2

• VoseTimeMultiGARCH - Multivariate GARCH Time Series.

• VoseTimeMultiGBM - Multivariate Geometric Brownian Motion time series

• VoseTimeMultiMA1 - Multivariate Moving Average Time Series of order 1

• VoseTimeMultiMA2 - Multivariate Moving Average time series of order 2

All functions for simulating from a multivariate time series function are array functions.





Univariate Time Series

Introduction

A time series is a stochastic forecast of a variable over time.

Often, we want to predict what these values will be for future years based on data for the past and/or theoretical considerations. This is called a time series forecast.

The output of a Time Series forecast in ModelRisk is a one-dimensional array of randomly generated values. These represent predictions made by the time series forecast, based on the mathematical basis behind it, past data and/or the time series' parameters.

For an overview of the time series available in ModelRisk, see the Time series in ModelRisk topic.

You can read more about the theory behind time series here.


Output functions of this window: VoseTimeGBM, VoseTimeGBMJD, VoseTimeGBMJDMR, VoseTimeGBMMR, VoseTimeSeasonalGBM, VoseTimeMA1, VoseTimeMA2, VoseTimeAR1, VoseTimeAR2, VoseTimeARMA, VoseTimeARCH, VoseTimeGARCH, VoseTimeEGARCH, VoseTimeAPARCH, VoseTimeDeath, VoseTimeYule

Window elements

Time series parameters





In the Time series parameters section, you can choose between a number of different Time Series models:

• APARCH (Industrial edition only)

• AR1

• AR2 (Industrial edition only)

• ARCH (Industrial edition only)

• ARMA

• Death (Industrial edition only)

• EGARCH (Industrial edition only)

• GARCH (Industrial edition only)

• GBM

• GBMAJ

• GBMVR

• GBMAJVR (Industrial edition only)

• MA1

• MA2 (Industrial edition only)

• MarkovSample

• SeasonalGBM

• Yule (Industrial edition only)

These are explained one by one in the Time series in ModelRisk topic.

Each of these takes a certain set of parameters, which can be inserted manually by typing in the appropriate field, or dynamically linked to a value in a spreadsheet cell.

Window elements

Options





In the Options area, a particularly interesting field is Number of lines: as a Time Series forecast is uncertain, many scenarios are possible. By default, only one is generated and shown in the graph pane. If you set this field to a higher number you will see a set of different scenarios generated and presented.

For easy comparison, you can also display historical data from your spreadsheet together with the modeled forecast.

Also, the output location (i.e. a one-dimensional array of spreadsheet cells) can be selected. This can be either typed directly, or selected from the active worksheet.

Time Series graph

Time Series graphs from the Time Series window

In the middle pane, a graph for the generated Time Series model is shown. The lines represent randomly generated Time Series forecasts.

By default, only one line is shown in blue. This number can be increased by changing the Number of lines field mentioned above.





VoseTimeAR1

VoseTimeAR1(Mu, Sigma, A, R0, Log Return, Initial Value)

Array function that models an autoregressive time series model of order 1.

• Mu - mean log return;

• Sigma - standard deviation of log returns;

• A - autoregressive parameter;

• R0 - log return at period 0;

• Log Return - an optional parameter. Function generates log returns if set to TRUE, or variable values if set to FALSE or omitted;

• Initial Value - starting value (at time zero). The generated time series values will continue on from this value. Should only be provided if the Log Return parameter is set to FALSE or omitted.

As the ModelRisk Time Series functions typically take a lot of parameters, we recommend for these in particular to use the Time Series window.

Equations

where:

$\varepsilon_t$ - a sample from a Normal(0,1)

$r_t$ - if $S_t$ is the value of the variable at time t, then $r_t$ is the log return defined as $r_t = \ln(S_t / S_{t-1})$

$\mu$ - log return mean

$\sigma$ - conditional log return standard deviation

A - autoregressive factor
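To illustrate how these quantities interact, the sketch below assumes the standard mean-centred AR(1) recursion, r_t = mu + A * (r_(t-1) - mu) + sigma * e_t, with e_t a Normal(0,1) sample. That recursion is an assumption made for illustration and is not quoted from ModelRisk; the function name and the periods argument are hypothetical.

```python
import random


def ar1_log_returns(mu, sigma, a, r0, periods, rng):
    """Generate log returns from an assumed mean-centred AR(1) recursion."""
    returns = []
    previous = r0  # log return at period 0
    for _ in range(periods):
        shock = sigma * rng.gauss(0.0, 1.0)
        # Pull towards the mean in proportion to the last deviation, plus noise.
        current = mu + a * (previous - mu) + shock
        returns.append(current)
        previous = current
    return returns
```

With sigma set to zero the series decays geometrically from R0 towards Mu, which makes the role of the autoregressive factor easy to see.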

VoseFunctions for this time series

VoseTimeAR1 - generates an array of random values from this time series.

VoseTimeAR1Fit - generates an array of random values from this time series fitted to data.

VoseTimeAR1FitP - returns the parameters of this time series fitted to data.





VoseTimeAR2

VoseTimeAR2(Mu, Sigma, A1, A2, R0, R_1, Log Return, Initial Value)

Array function that models an autoregressive time series model of order 2.


• Mu - mean log return;

• Sigma - the constant coefficient of the variance equation;

• A1 - first autoregressive parameter;

• A2 - second autoregressive parameter;

• R0 - log return at period 0;

• R_1 - log return at period -1;

• Log Return - an optional parameter. Function generates log returns if set to TRUE, or variable values if set to FALSE or omitted;

• Initial Value - starting value (at time zero). The generated time series values will continue on from this value. Should only be provided if the Log Return parameter is set to FALSE or omitted.


Equations

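where:

$\varepsilon_t$ - a sample from a Normal(0,1)

$r_t$ - if $S_t$ is the value of the variable at time t, then $r_t$ is the log return defined as $r_t = \ln(S_t / S_{t-1})$

$\mu$ - log return mean

$\sigma$ - conditional log return standard deviation

A1 - 1 lag autoregressive factor

A2 - 2 lags autoregressive factor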


VoseTimeAR2 - generates an array of random values from this time series.

VoseTimeAR2Fit - generates an array of random values from this time series fitted to data.

VoseTimeAR2FitP - returns the parameters of this time series fitted to data.





VoseTimeMA1

VoseTimeMA1(Mu, Sigma, B, Z0, Log Return, Initial Value)

Array function that models a Moving Average time series model of order 1.



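• Mu - mean log return;

• Sigma - standard deviation of log returns;

• B - moving average parameter;

• Z0 - movement at period 0;

• Log Return - an optional parameter. Function generates log returns if set to TRUE, or variable values if set to FALSE or omitted;

• Initial Value - starting value (at time zero). The generated time series values will continue on from this value. Should only be provided if the Log Return parameter is set to FALSE or omitted.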






Equations

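where:

$\varepsilon_t$ - a sample from a Normal(0,1)

$r_t$ - if $S_t$ is the value of the variable at time t, then $r_t$ is the log return defined as $r_t = \ln(S_t / S_{t-1})$

$\mu$ - log return mean

$\sigma$ - conditional log return standard deviation

b - moving average factor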
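As a rough illustration, the sketch below assumes the textbook MA(1) recursion r_t = mu + z_t + b * z_(t-1), where the movement z_t = sigma * e_t and e_t is a Normal(0,1) sample. Both the recursion and the interpretation of Z0 as the movement z_0 (already scaled by sigma) are assumptions made for illustration, not statements about ModelRisk's internals; the function name is hypothetical.

```python
import random


def ma1_log_returns(mu, sigma, b, z0, periods, rng):
    """Generate log returns from an assumed MA(1) recursion."""
    returns = []
    previous_move = z0  # movement at period 0
    for _ in range(periods):
        move = sigma * rng.gauss(0.0, 1.0)
        # The current return carries a fraction b of the previous movement.
        returns.append(mu + move + b * previous_move)
        previous_move = move
    return returns
```

Unlike the autoregressive case, the influence of Z0 disappears after a single period, because only the previous movement enters each return.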


VoseTimeMA1 - generates an array of random values from this time series.

VoseTimeMA1Fit - generates an array of random values from this time series fitted to data.

VoseTimeMA1FitP - returns the parameters of this time series fitted to data.





VoseTimeMA2

VoseTimeMA2(Mu, Sigma, B1, B2, Z0, Z_1, Log Return, Initial Value)

Array function that models a Moving Average time series model of order 2.



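• Mu - mean log return;

• Sigma - standard deviation of log returns;

• B1 - first moving average parameter;

• B2 - second moving average parameter;

• Z0 - movement at period 0;

• Z_1 - movement at period -1;

• Log Return - an optional parameter. Function generates log returns if set to TRUE, or variable values if set to FALSE or omitted;

• Initial Value - starting value (at time zero). The generated time series values will continue on from this value. Should only be provided if the Log Return parameter is set to FALSE or omitted.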





As the ModelRisk Time Series functions typically take a lot of parameters, we recommend for these in particular to use the Time Series window.

Equations

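where:

$\varepsilon_t$ - a sample from a Normal(0,1)

$r_t$ - if $S_t$ is the value of the variable at time t, then $r_t$ is the log return defined as $r_t = \ln(S_t / S_{t-1})$

$\mu$ - log return mean

$\sigma$ - conditional log return standard deviation

b1 - 1 lag moving average factor

b2 - 2 lags moving average factor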


VoseTimeMA2 - generates an array of random values from this time series.

VoseTimeMA2Fit - generates an array of random values from this time series fitted to data.

VoseTimeMA2FitP - returns the parameters of this time series fitted to data.





VoseTimeARCH

VoseTimeARCH(Mu, Omega, A, R0, Log Return, Initial Value)

Array function that models an autoregressive conditional heteroskedasticity time series of order 1.


• Omega - the constant coefficient of the variance equation;


• R0 - the log return at period 0;





As the ModelRisk Time Series functions typically take a lot of parameters, we recommend for these in particular to use the Time Series window.

Equations

where:



$\mu$ - log return mean

$\sigma_t$ - log return standard deviation

$\omega$ - log return variance constant

α - sigma autoregressive factor
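The way volatility feeds on past deviations can be sketched with the textbook ARCH(1) recursion, assumed here for illustration: the variance for period t is omega + alpha * (r_(t-1) - mu)^2, and r_t = mu + sigma_t * e_t with e_t a Normal(0,1) sample. This form is an assumption and is not quoted from ModelRisk; the function name and the periods argument are hypothetical.

```python
import math
import random


def arch1_log_returns(mu, omega, alpha, r0, periods, rng):
    """Generate log returns from an assumed ARCH(1) recursion."""
    returns = []
    previous = r0  # log return at period 0
    for _ in range(periods):
        # A large deviation last period raises this period's variance.
        variance = omega + alpha * (previous - mu) ** 2
        current = mu + math.sqrt(variance) * rng.gauss(0.0, 1.0)
        returns.append(current)
        previous = current
    return returns
```

Setting alpha to zero removes the dependence on the past and leaves independent Normal(mu, sqrt(omega)) log returns.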


VoseTimeARCH - generates an array of random values from this time series.

VoseTimeARCHFit - generates an array of random values from this time series fitted to data.

VoseTimeARCHFitP - returns the parameters of this time series fitted to data.





VoseTimeARMAFitP - returns the parameters of this time series fitted to data.





VoseTimeEGARCH

VoseTimeEGARCH(Mu, Omega, Theta, A, B, Sigma0, Z0, Log Return, Initial Value)

Array function that models an exponential general autoregressive conditional heteroskedasticity model, allowing negative values in the linear error variance equation with one-period dependence.



• Theta - exponential parameter;



• Sigma0 - standard deviation of log return at period 0;







Equations

where:



- log return mean


- factor for g function





- log variance constant

α - autoregressive factor



VoseTimeEGARCH - generates an array of random values from this time series.

VoseTimeEGARCHFit - generates an array of random values from this time series fitted to data.

VoseTimeEGARCHFitP - returns the parameters of this time series fitted to data.





VoseTimeAPARCH

VoseTimeAPARCH(Mu, Omega, Delta, Gamma, A, B, Sigma0, R0, E0, Log Return, Initial Value)

Array function that models an asymmetric power autoregressive conditional heteroskedasticity time series model of order (1,1).



• Delta - asymmetric power parameter;

• Gamma - asymmetric power parameter;





• E0 - the error term at time zero;

• Log Return - an optional parameter. Function generates log returns if set to TRUE, or variable values if set to FALSE or omitted;




Equations

where:



- log return mean

- leverage effect factor

- power (Taylor effect)






- conditional variance constant

α - ARCH effect factor

b - GARCH effect factor


VoseTimeAPARCH - generates an array of random values from this time series.

VoseTimeAPARCHFit - generates an array of random values from this time series fitted to data.

VoseTimeAPARCHFitP - returns the parameters of this time series fitted to data.





VoseTimeGARCH

VoseTimeGARCH(Mu, Omega, A, B, Sigma0, R0, Log Return, Initial Value)

Array function that models a generalized autoregressive conditional heteroskedasticity (GARCH) time series model of order (1,1).












Equations

where:



μ - log return mean

ω - log variance constant

α - autoregressive factor







VoseTimeGARCH - generates an array of random values from this time series.

VoseTimeGARCHFit - generates an array of random values from this time series fitted to data.

VoseTimeGARCHFitP - returns the parameters of this time series fitted to data.
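As a rough illustration of what a GARCH(1,1) series looks like, the following Python sketch implements the textbook recursion, in which the conditional variance is Omega + A * (previous shock)^2 + B * (previous variance). It is a sketch under stated assumptions, not ModelRisk's implementation: the function name simulate_garch11 is hypothetical, and the reading of Sigma0 and R0 as the standard deviation and log return at period 0 is an assumption.

```python
import math
import random


def simulate_garch11(mu, omega, a, b, sigma0, r0, n_periods, rng=None):
    """Simulate log returns from a textbook GARCH(1,1) model.

    Assumed recursion (not confirmed against ModelRisk):
        variance_t = omega + a * (r_{t-1} - mu)**2 + b * sigma_{t-1}**2
        r_t        = mu + sigma_t * z_t,   z_t ~ Normal(0, 1)
    sigma0 and r0 are taken to be the values at period 0.
    """
    rng = rng or random.Random()
    sigma_prev, r_prev = sigma0, r0
    returns = []
    for _ in range(n_periods):
        variance = omega + a * (r_prev - mu) ** 2 + b * sigma_prev ** 2
        sigma = math.sqrt(variance)
        r = mu + sigma * rng.gauss(0.0, 1.0)
        returns.append(r)
        sigma_prev, r_prev = sigma, r
    return returns
```

Periods that follow a large shock inherit a larger variance, which is what produces the volatility clustering that this family of models is used for.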





VoseTimeGBM

VoseTimeGBM(Mu, Sigma, Initial Value, {Time Stamps}, Log Return)

Array function that models a Geometric Brownian Motion (GBM) time series model. GBM is usually the default starting point for a time series of a non-negative financial variable, such as a stock price, exchange rate or interest rate. It assumes that the fractional changes in the variable between periods are independent random variables following a Normal distribution.


• Sigma - standard deviation of log returns;


on from this value. Should only be provided if the Log Return parameter is set to FALSE or omitted;

• {Time Stamps} - an array of time stamps (optional);





Equations

where:



μ - log return mean



VoseTimeGBM - generates an array of random values from this time series.

VoseTimeGBMFit - generates an array of random values from this time series fitted to data.

VoseTimeGBMFitP - returns the parameters of this time series fitted to data.
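The behaviour of a GBM series can be sketched in a few lines of Python. This is a minimal illustration, assuming unit time steps (no {Time Stamps}) and log returns drawn from Normal(Mu, Sigma) as the parameter descriptions suggest; simulate_gbm is a hypothetical name, not a ModelRisk function.

```python
import math
import random


def simulate_gbm(mu, sigma, initial_value, n_periods, log_return=False, rng=None):
    """Simulate one GBM path with unit time steps.

    Each period's log return is drawn independently from Normal(mu, sigma).
    Returns the log returns if log_return is True, otherwise the series
    values obtained by compounding them from initial_value.
    """
    rng = rng or random.Random()
    log_returns = [rng.gauss(mu, sigma) for _ in range(n_periods)]
    if log_return:
        return log_returns
    path = []
    value = initial_value
    for r in log_returns:
        value *= math.exp(r)  # compounding keeps the series strictly positive
        path.append(value)
    return path
```

Because each step multiplies the previous value by exp(log return), the simulated variable can never become negative, which is why GBM suits prices and rates.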





VoseTimeGBMAJ

VoseTimeGBMAJ(Mu, Sigma, MuJ, SigmaJ, PJump, PJumpUp, Initial Value)

Array function that models a Geometric Brownian Motion (GBM) with Asymmetric Jumps.

• Mu - mean log return of underlying Geometric Brownian Motion;

• Sigma - standard deviation of log returns of underlying Geometric Brownian Motion;

• MuJ - mean log return of a jump;

• SigmaJ - standard deviation of a jump;

• PJump - the probability of a jump in a single period;

• PJumpUp - the probability that a jump will be upwards;






Equations

where:



μ - mean log return of underlying Geometric Brownian Motion;

μ_J - mean log return of a jump;

σ - standard deviation of log return of underlying Geometric Brownian Motion;

σ_J - standard deviation of a jump;

PJump - the probability of a jump in a single period;

PJumpUp - the probability that a jump will be upwards.


VoseTimeGBMAJ - generates an array of random values from this time series.

VoseTimeGBMAJFit - generates an array of random values from this time series fitted to data.

VoseTimeGBMAJFitP - returns the parameters of this time series fitted to data.





VoseTimeGBMAJVR

VoseTimeGBMAJVR(Reversion Value, Sigma, Alpha, MuJ, SigmaJ, PJump, PJumpUp, Log Return, Initial Value)

Array function that models a Geometric Brownian Motion (GBM) time series model with Asymmetric Jumps and Reversion to a fixed value S*.

• Reversion Value - value towards which the series reverts;


• Alpha - the mean reversion factor;

• MuJ - mean log return of a jump;

• SigmaJ - standard deviation of a jump;

• PJump - the probability of a jump in a single period;

• PJumpUp - the probability that a jump will be upwards;

• Log Return - optional Boolean parameter (TRUE/FALSE) specifying whether to return the actual time series (FALSE, default) or log returns (TRUE);





Equations

where:


μ_J - mean log return of a jump

σ - standard deviation of log returns of underlying Geometric Brownian Motion

σ_J - standard deviation of a jump

α - mean reversion factor

S* - value towards which the series reverts

PJump - the probability of a jump in a single period

PJumpUp - the probability that a jump will be upwards


VoseTimeGBMAJVR - generates an array of random values from this time series.

VoseTimeGBMAJVRFit - generates an array of random values from this time series fitted to data.

VoseTimeGBMAJVRFitP - returns the parameters of this time series fitted to data.





VoseTimeGBMJD

This function has been replaced by VoseTimeGBMAJ which now includes asymmetric jumps.





VoseTimeGBMJDMR

This function has been replaced by VoseTimeGBMAJVR which now includes asymmetric jumps and reversion to a fixed value.





VoseTimeGBMMR

This function has been replaced by VoseTimeGBMVR which now includes reversion to a fixed value.





VoseTimeGBMVR

VoseTimeGBMVR(Reversion Value, Sigma, Alpha, Log Return, Initial Value)

Array function that models a Geometric Brownian Motion (GBM) with Reversion to a fixed value S*.

• Reversion Value - value towards which the series reverts;


• Alpha - the mean reversion factor;




on from this value. Should only be provided if the Log Return parameter is set to FALSE or omitted;


Equations

where:


σ - standard deviation of log returns of underlying Geometric Brownian Motion

α - mean reversion factor

S* - value towards which the series reverts


VoseTimeGBMVR - generates an array of random values from this time series.

VoseTimeGBMVRFit - generates an array of random values from this time series fitted to data.

VoseTimeGBMVRFitP - returns the parameters of this time series fitted to data.





Multivariate Time Series

One often wants to forecast randomly varying values of different quantities over time, where those quantities are somehow related to each other.

Quantities that move together in time are typically modeled using multivariate time series ("MultiTS") models. MultiTS models allow one to easily account for the relations and correlation that exist between the "marginal" components.

A typical example of a situation where one can use multivariate time series is yield curve modeling, where we model the interest rates for different times-to-maturity. At any point in time an interest rate for some time to maturity (say, 5 years) is typically related to:

• the (immediate) past,

• the interest rates for other times-to-maturity (e.g. 1 month, 1 year, 10 years...)

A good way to model this is provided by multivariate time series. These are a generalization of their univariate time series counterpart. Use the ModelRisk Multivariate Time Series window to simulate from the following multivariate time series models:

• Multivariate Autoregressive (order 1)

• Multivariate Autoregressive (order 2) (Industrial edition only)

• Multivariate Geometric Brownian Motion

• Multivariate Moving-Average (order 1)





• Multivariate Moving-Average (order 2) (Industrial edition only)

• Multivariate GARCH (Industrial edition only)

• Markov Chain models

Each of these takes a certain set of parameters, which can be entered manually by typing in the appropriate field, or dynamically linked to a value in a spreadsheet cell.

The output of the multivariate time series window will always span multiple cells; in other words, it will be an array function.

Output functions of this window: VoseTimeMultiAR1, VoseTimeMultiAR1Object, VoseTimeMultiAR2, VoseTimeMultiAR2Object, VoseTimeMultiBEKK, VoseTimeMultiBEKKObject, VoseTimeMultiGBM, VoseTimeMultiGBMObject, VoseTimeMultiMA1, VoseTimeMultiMA1Object, VoseTimeMultiMA2, VoseTimeMultiMA2Object

Window elements

Options

In the Time series parameters area, the type of multivariate time series to model can be selected. The multivariate time series' parameters can also be entered: you can enter them manually as an array or link them to the spreadsheet.

The Initial Values parameter is the array with starting values to begin your forecast from. Select the Log Return checkbox to model log returns instead of the actual series.

The other parameter fields shown depend on the specific Time Series model selected from the drop-down menu.

A preview of the simulated time series is shown on the right. In the Graphing options area, select the series you wish to display.

When appropriate for the selected Time Series model, Historical data to be taken into account can be selected. This can be either a single cell or an array.





VoseMarkovMatrix

{=VoseMarkovMatrix({matrix}, T)}

This is an n x n array function that calculates the Markov chain transition matrix for T periods.

• {matrix} - the n x n array of the transition matrix for a single period.

• T - the number of periods.

ModelRisk allows non-integer values for the number of periods T. However, if the number of periods is non-integer, the transition matrix must be positive semi-definite. ModelRisk checks this during calculation and returns an appropriate error message.

To calculate how many individuals there are in each state after a certain period, use the VoseMarkovSample function.

For a more in-depth explanation about Markov Chain models, see Markov Chain models.
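For an integer number of periods, the T-period transition matrix is simply the single-period matrix raised to the power T. The Python sketch below illustrates that calculation only; markov_matrix and mat_mul are hypothetical helper names, and the non-integer case that ModelRisk supports is deliberately not covered.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_matrix(matrix, periods):
    """Return the transition matrix for an integer number of periods.

    Uses repeated squaring, so the result equals matrix multiplied by
    itself `periods` times (the identity matrix when periods is 0).
    """
    if periods < 0 or int(periods) != periods:
        raise ValueError("this sketch only handles non-negative integer periods")
    n = len(matrix)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = [list(row) for row in matrix]
    t = int(periods)
    while t:
        if t & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        t >>= 1
    return result
```

For example, with single-period matrix [[0.9, 0.1], [0.5, 0.5]], the two-period probability of staying in the first state is 0.9 × 0.9 + 0.1 × 0.5 = 0.86.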





VoseMarkovSample

VoseMarkovSample({Start Vector}, {Transition Matrix}, Number of Periods T)

Simulates the number of individuals there will be in each state of a Markov chain process after T periods.

• {Start Vector} - a 1 x n array of the number of individuals in each state.

• {Transition Matrix} - an n x n transition probability matrix for a single period.

• Number of Periods T - the number of periods.

ModelRisk allows non-integer values for the number of periods T. However, if the number of periods is non-integer, the transition matrix must be positive semi-definite. ModelRisk checks this during calculation and returns an appropriate error message.

To calculate the transition matrix for T periods, use the VoseMarkovMatrix function.

For a more in-depth explanation about Markov Chain models, see Markov Chain models.
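The idea of simulating state counts can be illustrated by moving every individual through the chain one period at a time. The following Python sketch assumes an integer number of periods and independent individuals; markov_sample is a hypothetical name, and the sketch makes no claim about how ModelRisk performs the sampling internally.

```python
import random


def markov_sample(start_vector, transition_matrix, periods, rng=None):
    """Simulate how many individuals are in each state after `periods` steps.

    In every period, each individual in state i independently moves to
    state j with probability transition_matrix[i][j].
    """
    rng = rng or random.Random()
    n = len(start_vector)
    counts = list(start_vector)
    states = range(n)
    for _ in range(periods):
        new_counts = [0] * n
        for state, count in enumerate(counts):
            if count:
                # Draw a destination for each individual currently in `state`.
                for dest in rng.choices(states, weights=transition_matrix[state], k=count):
                    new_counts[dest] += 1
        counts = new_counts
    return counts
```

The total number of individuals is preserved from period to period, since every individual always ends up in exactly one state.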





VoseTimeMultiBEKK

This function has been replaced by VoseTimeMultiGARCH, which now incorporates mean returns.





VoseTimeMultiGBM

VoseTimeMultiGBM({Means}, {CovMatrix}, Log Return, {Initial Values}, Data_in_rows)

Array function that simulates from a Multivariate Geometric Brownian Motion time series model.

• {Means} - array of mean log returns per period for each variable.

• {CovMatrix} - covariance matrix of log returns.



• {Initial Values} - array of starting values (at time zero) for each variable.

• Data_in_rows - optional parameter that specifies if the data is in rows (TRUE) or columns (FALSE, default).

Equations

where:

k - number of variables

- k x 1 random vector, if is the value of the variable at time t, then is the log return defined as

- k x 1 vector of means

- k x 1 vector of uncorrelated random variables, which is defined as follows:

- k x k covariance matrix.


VoseTimeMultiGBM - generates an array of random values from this time series.

VoseTimeMultiGBMObject - creates an object for this time series.

VoseTimeMultiGBMFit - generates an array with random values from this time series fitted to data.

VoseTimeMultiGBMFitObject - creates an Object for this time series fitted to data.
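One common way to generate log returns with a given covariance matrix from uncorrelated random variables is a Cholesky factorisation. The Python sketch below shows that approach for a multivariate GBM; it is an assumption that this matches the construction used in the equations above, and the names cholesky and simulate_multi_gbm are hypothetical.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular L with L * L^T = cov (cov must be positive definite)."""
    n = len(cov)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(cov[i][i] - s)
            else:
                lower[i][j] = (cov[i][j] - s) / lower[j][j]
    return lower


def simulate_multi_gbm(means, cov, initial_values, n_periods, rng=None):
    """Simulate k correlated GBM series; returns one row of k values per period."""
    rng = rng or random.Random()
    k = len(means)
    lower = cholesky(cov)
    values = list(initial_values)
    paths = []
    for _ in range(n_periods):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]  # uncorrelated Normal(0,1)
        # Correlated log returns: means + L * z has covariance matrix cov.
        r = [means[i] + sum(lower[i][j] * z[j] for j in range(i + 1)) for i in range(k)]
        values = [v * math.exp(ri) for v, ri in zip(values, r)]
        paths.append(values)
    return paths
```

Each row of the result holds the k variable values for one period, which corresponds to data arranged in rows per period.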





VoseTimeMultiMA1

VoseTimeMultiMA1({Means}, {Theta}, {CovMatrix}, {E0}, Log Return, {Initial Values}, Data_in_rows)

Array function that simulates from a Multivariate Moving-Average time series model of order 1.


• {Theta} - matrix of one period moving average parameters.


• {E0} - vector white noise process at period 0






Equations

where:




- k x k moving average coefficient matrix



VoseTimeMultiMA1 - generates an array of random values from this time series.

VoseTimeMultiMA1Object - creates an object for this time series.

VoseTimeMultiMA1Fit - generates an array with random values from this time series fitted to data.

VoseTimeMultiMA1FitObject - creates an Object for this time series fitted to data.





VoseTimeMultiMA2

VoseTimeMultiMA2({Means}, {Theta1}, {Theta2}, {CovMatrix}, {E0}, {E_1}, Log Return, {Initial Values}, Data_in_rows)

Array function that simulates from a Multivariate Moving-Average time series model of order 2.


• {Theta1} - matrix of first period moving average parameters.

• {Theta2} - matrix of second period moving average parameters.


• {E0} - vector white noise process at period 0.

• {E_1} - vector white noise process at period -1.

• Log Return - an optional parameter. The function generates log returns if set to TRUE, or variable values if set to FALSE or omitted.




Equations

where:




- k x k moving average coefficient matrix, i=1,2,



VoseTimeMultiMA2 - generates an array of random values from this time series.

VoseTimeMultiMA2Object - creates an object for this time series.

VoseTimeMultiMA2Fit - generates an array with random values from this time series fitted to data.

VoseTimeMultiMA2FitObject - creates an object for this time series fitted to data.





VoseTimeSimulate

=VoseTimeSimulate(Time Object)

Array function that simulates random values from a time series object.

• Time Object - a valid Time Series object or time series fit object

You would typically use this to simulate from a time series object stored in a separate spreadsheet cell. This way, if you decide to use another model, you only need to change this in one place in the spreadsheet.

Example

Say you have an array with historic time series data called DataSet, and you want to fit a Geometric Brownian Motion model to it. You would then write

=VoseTimeGBMFitObject(DataSet)

in cell A1. The following formula would then create an array of randomly generated values of this fitted model:

{=VoseTimeSimulate(A1)}





VoseTimeYule

VoseTimeYule(N0, Beta, T, Log10)

Array function that models numbers of a population following a Yule linear growth model.

• N0 - the initial number of individuals in a population, an integer > 0.

• Beta - the instantaneous birth rate. Should be >0.

• T - the time increments over which birth occurs. Should be >0.

• Log10 - optional Boolean parameter (TRUE/FALSE) that specifies whether the log base 10 of the calculations is taken (TRUE) or not (FALSE, default).


VoseTimeYule - generates an array of random values from this time series.

VoseTimeYuleFit - generates an array of random values from this time series fitted to data.

VoseTimeYuleFitP - returns the parameters of this time series fitted to data.





VoseTimeDeath

VoseTimeDeath(N0, Lambda, T, Log10)

Array function that models numbers of a population following a pure death process.

• N0 - the initial number of individuals in a population, an integer > 0;

• Lambda - the mean rate of death per time increment. Should be >0;

• T - the time increments over which death occurs. Should be >0;

• Log10 - optional Boolean parameter (TRUE/FALSE) that specifies whether the log base 10 of the calculations is taken (TRUE) or not (FALSE, default).


VoseTimeDeath - generates an array of random values from this time series.

VoseTimeDeathFit - generates an array of random values from this time series fitted to data.

VoseTimeDeathFitP - returns the parameters of this time series fitted to data.
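A pure death process can be sketched as follows, assuming the standard formulation in which each individual dies independently at a constant rate Lambda, so that the probability of surviving one time increment T is exp(-Lambda × T). The function name simulate_death is hypothetical and the sketch is not taken from ModelRisk itself.

```python
import math
import random


def simulate_death(n0, lam, t, n_periods, rng=None):
    """Simulate population sizes of a pure death process.

    Assumes independent individuals with constant death rate `lam`, so
    the number of survivors after one increment `t` is
    Binomial(current population, exp(-lam * t)).
    """
    rng = rng or random.Random()
    p_survive = math.exp(-lam * t)
    population = n0
    path = []
    for _ in range(n_periods):
        # Binomial draw implemented as a sum of independent survival trials.
        population = sum(1 for _ in range(population) if rng.random() < p_survive)
        path.append(population)
    return path
```

The simulated population can only stay level or fall from one period to the next, which is the defining feature of a pure death process.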





Wilkie Models

Introduction

The Wilkie Model, named after A.D. Wilkie, models the behavior of various economic series over time. It is currently widely used in actuarial work.

For a detailed explanation about Wilkie models see the Wilkie models topic.

Output functions of this window: VoseTimeWilkie

There are VoseFunctions for each separate Wilkie model as well: VoseTimePriceInflation, VoseTimeLongTermInterestRate, VoseTimeShortTermInterestRate, VoseTimeDividends, VoseTimeShareYields, VoseTimeWageInflation

Window elements

Output

Each of these takes a certain set of parameters, which can be entered manually by typing in the appropriate field, or dynamically linked to a value in a spreadsheet.

The output to your spreadsheet will be a number of columns equal to the number of selected models.

Check the Show Descriptions checkbox to have the name of each Wilkie Model shown on top of its column of generated values.

Models




In the models pane (shown in the image on the right), you can select which models to generate data for. For each selected model, some summary data are shown. The Wilkie models to choose from are:

• Price Inflation

• Wage Inflation

• Share Yields

• Dividends

• Long Term Rate

• Short Term Rate

Wilkie models graphs

In the middle pane, the graphs with generated lines from (only) the selected Wilkie model(s) are shown.

To save screen space, the toolbar for Wilkie models graphs is hidden by default. While this might give the impression that less customization is allowed, the opposite is actually true: you can customize each graph shown separately!

By right-clicking anywhere in the graph area, you are presented with a context menu from which you can hide/unhide the toolbar, change the color of the item selected, add/change the graph's title, point labels and font, and show the advanced display properties (3D, border, gridlines...).

The buttons on a graph toolbar allow you to, from left to right:




• Copy the graph to the Windows clipboard (choose As a bitmap to paste the graph in any other Windows application)

• Print the graph

• Choose the type of graph used. By default, a line graph is selected, but other types like histogram can be chosen if so desired.

• Switch anti-aliasing (i.e. smoothing out "blocky"-looking lines by making them more "blurry") on/off.

• Change the colour palette used. By default, generated lines are blue or green and the background white.

• Switch between a 2D/3D graph

• Zoom in on an area: select this button and drag a rectangle on the graph to zoom in to it.

When you hold your mouse pointer on a line, it comes "in focus" and all other visible elements are greyed out, for easily pointing somebody to a certain line.





VoseTimeWilkie

An array function that returns an array with random values from each of Wilkie's time series models. These are returned as a column for each time series together with a header.

A separate VoseFunction exists for each of the Wilkie models as well:

• price inflation

• wage inflation

• share yields

• share dividends

• long term interest rate

• short term interest rate

As the ModelRisk Time Series functions typically take a lot of parameters, we recommend using the Wilkie Models window for these in particular.





VoseTimeDividends

VoseTimeDividends(QMU, QSD, QA, YSD, DMU, DSD, DB, DW, DX, DD, DY)

Array function that models Wilkie's Dividends time series model.

• QMU - Mean force of inflation.

• QSD - standard deviation of force of inflation.

• QA - autoregression coefficient.

• YSD - Standard deviation of residual.

• DMU - Mean force of real dividend growth.

• DSD - Standard deviation of residual.

• DB - autoregression coefficient.

• DW - Past inflation factor.

• DX - Current inflation factor (normally set to 1-DW).

• DD - Inflation autoregression coefficient.

• DY - Yield factor.

As the ModelRisk Time Series functions typically take a lot of parameters, we recommend using the Wilkie Models window for these in particular.





VoseTimeDividendsA

VoseTimeDividendsA({Price inflation}, QMU, QSD, QA, YSD, DMU, DSD, DB, DW, DX, DD, DY)

Array function that models Wilkie's Dividends time series model, based on an existing price inflation array.

• {Price inflation} - array with price inflation data.

• QMU - Mean force of inflation.

• QSD - standard deviation of force of inflation.

• QA - autoregression coefficient.

• YSD - Standard deviation of residual.

• DMU - Mean force of real dividend growth.

• DSD - Standard deviation of residual.

• DB - autoregression coefficient.

• DW - Past inflation factor.

• DX - Current inflation factor (normally set to 1-DW).

• DD - Inflation autoregression coefficient.

• DY - Yield factor.






VoseTimeLongTermInterestRate

VoseTimeLongTermInterestRate(QMU, QSD, QA, YSD, CMU, CSD, CA, CW, CD, CY, CAA, CAAA)

Array function that models Wilkie's Long Term Interest Rate time series model.

• QMU - Mean force of inflation.

• QSD - standard deviation of force of inflation.

• QA - autoregression coefficient.

• YSD - Standard deviation of residual.

• CMU - Mean yield in excess of inflation.

• CSD - Standard deviation of residual.

• CA - autoregression coefficient.

• CW - inflation factor.

• CD - Inflation autoregression coefficient.

• CY - Share links yield factor.

• CAA - second order correlation coefficient

• CAAA - third order correlation coefficient






VoseTimeLongTermInterestRateA

VoseTimeLongTermInterestRateA({Price inflation}, QMU, QSD, QA, YSD, CMU, CSD, CA, CW, CD, CY, CAA, CAAA)

Array function that models Wilkie's Long Term Interest Rate time series model based on an existing Price Inflation array.

• {Price inflation} - array with price inflation data.

• QMU - Mean force of inflation.

• QSD - standard deviation of force of inflation.

• QA - autoregression coefficient.








• CAA - second order correlation coefficient

• CAAA - third order correlation coefficient






VoseTimePriceInflation

VoseTimePriceInflation(QMU, QSD, QA)

Array function that models Wilkie's Price Inflation model.

• {Price inflation} - array with price inflation data.

• QMU - Mean force of inflation.

• QSD - standard deviation of force of inflation.

• QA - autoregression coefficient.






VoseTimeSeasonalGBM

VoseTimeSeasonalGBM(Mu, Sigma, {S1}, P1, {S2}, P2, Log Return, Initial Value)

Array function that models a Seasonal Geometric Brownian Motion time series model.

You can provide an array with seasonal indices (e.g. 7 values, one for each day of the week) that will be run through periodically, starting at position P1.

Optionally, you can provide a second cycle within each period of the first cycle, useful for modelling, say, week/day or day/hour patterns.

• Mu - mean log return of the underlying GBM;

• Sigma - standard deviation of the log returns of the underlying GBM;

• {S1} - array of seasonality factors for the first (outer) cycle. For example: if the outer cycle is week of year and the inner cycle is day of week, this value would be a list of 52 values representing the multiplying factor (with average of 1) to apply to each week;

• P1 - the starting index for cycle 1. For example, the week of the year (a value from 1 to 52);

• {S2} - array of seasonality factors for the second (inner) cycle. For example: if the outer cycle is week of year and the inner cycle is day of week, this value would be a list of 7 values representing the multiplying factor (with average of 1) to apply to each day of the week;

• P2 - the starting index for cycle 2. For example, the day of the week (a value from 1 to 7);




on from this value. Should only be provided if the Log Return parameter is set to FALSE or omitted.



Equations

where:



μ - log return mean

σ - standard deviation of log return

{f1} - set of outer loop multipliers

{f2} - set of inner loop multipliers


VoseTimeSeasonalGBM - generates an array of random values from this time series.

VoseTimeSeasonalGBMFit - generates an array of random values from this time series fitted to data.

VoseTimeSeasonalGBMFitP - returns the parameters of this time series fitted to data.





VoseTimeShareYields

VoseTimeShareYields(QMU, QSD, QA, YMU, YSD, YA, YW)

Array function that models Wilkie's share yields time series model.

• QMU - Mean force of inflation.

• QSD - standard deviation of force of inflation.

• QA - Autoregression coefficient.

• YMU - Mean yield net of inflation factor.

• YSD - standard deviation of residual.

• YA - autoregression coefficient

• YW - inflation factor






VoseTimeShareYieldsA

VoseTimeShareYieldsA({Price inflation}, QMU, QSD, QA, YMU, YSD, YA, YW)

Array function that models Wilkie's share yields time series model based on an existing price inflation array.

• {Price inflation} - array with price inflation data.

• QMU - Mean force of inflation.

• QSD - standard deviation of force of inflation.

• QA - Autoregression coefficient.

• YMU - Mean yield net of inflation factor.

• YSD - standard deviation of residual.

• YA - autoregression coefficient

• YW - inflation factor






VoseTimeShortTermInterestRate

VoseTimeShortTermInterestRate(QMU, QSD, QA, YSD, CMU, CSD, CA, CW, CD, CY, CAA, CAAA, BMU, BSD, BA)

Array function that models Wilkie's Short Term Interest Rate time series model.

• {Price

inflation} -


• QMU - Mean force of inflation.

• QSD - standard deviation of force of inflation.










• CAA - second order correlation coefficient.

• CAAA - third order correlation coefficient.

• BMU - log of interest rate ratio.

• BSD - standard deviation of residual.

• BA - autoregression coefficient.


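As the ModelRisk Time Series functions typically take a lot of parameters, we recommend using the Wilkie Models window for these in particular.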





VoseTimeShortTermInterestRateA

VoseTimeShortTermInterestRateA({Long Term}, QMU, QSD, QA, YSD, CMU, CSD, CA, CW, CD, CY, CAA, CAAA, BMU, BSD, BA)

Array function that models Wilkie's Short Term Interest Rate time series model based on an existing array with long term interest rate data.

• {Long Term} - array with long term interest rate data.

• QMU - Mean force of inflation.

• QSD - standard deviation of force of inflation.

• QA - autoregression coefficient.








• CAA - second order correlation coefficient.

• CAAA - third order correlation coefficient.

• BMU - log of interest rate ratio.

• BSD - standard deviation of residual.

• BA - autoregression coefficient.






VoseTimeWageInflation

VoseTimeWageInflation(QMU, QSD, QA, WMU, WSD, WA, WW1, WW2)

Array function that models Wilkie's wage inflation time series model.

• {Price

inflation} -


• QMU - Mean force of inflation.

• QSD - standard deviation of force of inflation.



• WMU - Factor related to mean force of real wages growth

• WSD - standard deviation of residual

• WA - Autoregression coefficient

• WW1 - factor for this year's inflation

• WW2 - factor for last year's inflation


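As the ModelRisk Time Series functions typically take a lot of parameters, we recommend using the Wilkie Models window for these in particular.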





VoseTimeWageInflationA

VoseTimeWageInflationA({Price inflation}, QMU, QSD, QA, WMU, WSD, WA, WW1, WW2)

Array function that models Wilkie's wage inflation time series model based on an existing price inflation array.

• {Price inflation} - array with price inflation data.

• QMU - Mean force of inflation.

• QSD - standard deviation of force of inflation.

• QA - autoregression coefficient.

• WMU - Factor related to mean force of real wages growth

• WSD - standard deviation of residual

• WA - Autoregression coefficient

• WW1 - factor for this year's inflation

• WW2 - factor for last year's inflation






Subject Matter Expert (SME) Time Series Forecasts

ModelRisk provides several tools for modeling forecasts over a number of periods based on expert estimates.

These tools have the intuitive appeal of being flexible and easy to use, and are not based on complex mathematical models.

The tools are displayed in the Subject Matter Expert Time Series window which is accessed from the Time Series drop-down menu by selecting ‘SME time series’, which opens the following interface:

The models available are listed below. Each link provides a detailed description of the model:

Poisson: for modeling events that occur randomly in time

2Perc: for modeling with estimates based on upper and lower percentiles

Three Point: for modeling with estimates based on minimum, most likely and maximum values

Uniform: for modeling with estimates based on minimum and maximum values

Saturation: for modeling ‘buy-in’ from a fixed population base





VoseTimeSME2Perc

VoseTimeSME2Perc({Percentiles1}, {Percentiles2}, P1, P2, Correlation Factor, Negative Allowed)

Time series function modeling a variable estimated for each period by a lower and upper percentile.

• {Percentiles1} is an array of values of the P1 percentile in each period of the forecast.

• {Percentiles2} is an array of values of the P2 percentile in each period of the forecast.

• P1 is the probability used together with {Percentiles1}. For example, if P1 is set to 10% the {Percentiles1} values are interpreted as the values for which, in each individual period, the variable has a 10% probability of being below. P1 must lie on [0,1].

• P2 is the probability used together with {Percentiles2}. For example, if P2 is set to 90% the {Percentiles2} values are interpreted as the values for which, in each individual period, the variable has a 90% probability of being below. P2 must lie on [0,1].

• Correlation Factor applies a positive correlation between generated values within each period of the series. CorrelationFactor must lie on [0,1]. Optional, set to zero if omitted.

• Negative Allowed is a Boolean parameter specifying whether the series may take negative values (Negative Allowed = TRUE) or not (Negative Allowed = FALSE). This allows the user to avoid a common problem when estimating with percentiles that the resultant distribution can extend beyond plausible values.





Explanation and Uses

The SME2Perc time series function provides an easy, subjective way to specify a time series with some key features:

Growth and spread over time can be controlled by changing the {Percentiles1} and {Percentiles2} array values. P1 and P2 would most commonly be set at {0.2, 0.8}, {0.1, 0.9} or {0.05, 0.95}, reflecting 1 in 5, 1 in 10 and 1 in 20 probabilities respectively, which are probabilities that people can realistically understand. Avoid values like {0.01, 0.99} or more extreme if possible, because human beings are not that great at appreciating and estimating probabilities with that level of precision.

Correlation between periods can be specified with a single parameter. The level of correlation is best selected by reviewing the example pathways that are generated in the interface each time one clicks the Generate button. Look at the range of variation from one period to the next across the entire series and adjust the CorrelationFactor until it looks reasonable. If you believe that there is correlation across the series you will likely settle on a value above 0.4, since lower levels of correlation are not immediately obvious to the eye. You will want to use correlation, for example, when the variable being forecast will tend to be high in each year if it is high in the first year: for example, a forecast of sales of a new product, when it either takes off because it is appealing to potential clients, or doesn’t; or the use of a new vaccine where people are generally convinced of its value, or not.





In probability modeling the usual approach to dealing with random variation in the expected rate of occurrence is to model the rate using a Gamma distribution. The main reason for choosing a Gamma is convenience: it turns out that a Poisson(Gamma(a,b)) follows a Pólya distribution, which has a fairly convenient mathematical form. Other reasons are that the Gamma distribution is always greater than zero (which is of course a requirement) and that it can take a variety of shapes from very right skewed to essentially normally distributed.

In the SMEPoisson function, the Pólya distribution comes into play if one selects a SpreadMultiplier greater than 1. For example, if one chooses a SpreadMultiplier of 2, the function determines the parameters of Pólya distributions that would give the defined mean values but also give twice the spread (standard deviation) that a Poisson distribution would produce. The following screen shots illustrate the principle:

{=VoseTimeSMEPoisson({5,6,7,8,9,10},1,0)}

{=VoseTimeSMEPoisson({5,6,7,8,9,10},2,0)}
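The parameter matching described above can be worked through explicitly. For a Poisson whose rate is Gamma distributed with shape a and scale b, the mean is a × b and the variance is a × b + a × b². Requiring a mean of m and a standard deviation of k × sqrt(m), where k is the spread multiplier, gives b = k² - 1 and a = m / (k² - 1). The Python sketch below implements that arithmetic; the function names are hypothetical and the shape/scale parameterisation of the Gamma is an assumption, so this is not necessarily how ModelRisk computes it.

```python
import math
import random


def gamma_rate_params(mean, spread_multiplier):
    """Gamma (shape, scale) so that Poisson(Gamma) has the given mean and
    a standard deviation of spread_multiplier * sqrt(mean)."""
    if spread_multiplier <= 1:
        raise ValueError("a Gamma mixture is only needed for multipliers above 1")
    scale = spread_multiplier ** 2 - 1
    shape = mean / scale
    return shape, scale


def mixture_sd(shape, scale):
    """Standard deviation of a Poisson whose rate is Gamma(shape, scale)."""
    return math.sqrt(shape * scale + shape * scale ** 2)


def sample_count(mean, spread_multiplier, rng=None):
    """Draw one count: a Gamma rate (if multiplier > 1), then a Poisson."""
    rng = rng or random.Random()
    if spread_multiplier > 1:
        shape, scale = gamma_rate_params(mean, spread_multiplier)
        rate = rng.gammavariate(shape, scale)
    else:
        rate = mean
    # Knuth's multiplication method; adequate for the small rates used here.
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count
```

For a mean of 5 and a spread multiplier of 2, this gives shape 5/3 and scale 3, and a standard deviation of sqrt(5 + 15) = sqrt(20), which is exactly twice the Poisson value of sqrt(5).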





Note: the Spread Multiplier value is limited to a maximum of 10 because this is an extremely high multiplier for a modification to a Poisson process, and you should probably consider one of the other SME time series functions instead.

The Gamma Correlation parameter allows one to apply a positive correlation to the Gamma distributions that are used (i.e. when the Spread Multiplier is greater than 1). The effect is most visible when the {Mean Values} are relatively large (say >100) because the Gamma distributions are then more dominant than the Poisson distributions they sit within in terms of their contribution to randomness. The following screen shots illustrate the idea where, in each plot, two possible pathways have been drawn (in black). The Gamma Correlation parameter controls how much a simulated pathway will stay at a high value if it starts off high, and vice versa:

{=VoseTimeSMEPoisson({5,6,7,8,9,10},2,0.0)} vs. {=VoseTimeSMEPoisson({5,6,7,8,9,10},2,0.9)}

{=VoseTimeSMEPoisson({50,60,70,80,90,100},2,0.0)} vs. {=VoseTimeSMEPoisson({50,60,70,80,90,100},2,0.9)}





VoseTimeSMESaturation

VoseTimeSMESaturation({Probabilities}, Initial Population, Conditional)

Time series function modeling the number of individuals in a population that ‘convert’ in each period, where each individual can convert only once.

• {Probabilities} is an array of probabilities per period that an individual from the

InitialPopulation will ‘convert’.

• InitialPopulation is the size of the population at time zero that might ‘convert’.

• Conditional is a Boolean variable. If TRUE then {Probabilities} define the probability of

‘converting’ in each period given that the individual has not yet ‘converted’. If FALSE then {Probabilities} define the unconditional probability of ‘converting’ in each period, and the sum of {Probabilities} may then not exceed 1.


The SMESaturation time series function allows one to model ‘conversion’ of a population over time, where each conversion is assumed to occur independently of all others. For example, one might be interested in modeling how many of a population of potential clients will make a purchase, or how many people will get vaccinated, etc. The principle behind the model is that a ‘conversion’ occurs just once so, in terms of a sale for example, one would only expect a client to make a single purchase or none at all.





The function operates in two modes according to the Conditional parameter. If this parameter is set to

FALSE, then the {Probabilities} define the chance that any individual will convert in each given period.

So, for example, consider the following parameter set:

{Probabilities} = {0.2, 0.15, 0.1, 0.05}

InitialPopulation = 1000

Conditional = FALSE

The number of conversions in each year will then be:

{X1:X4} = Multinomial(1000, {0.2, 0.15, 0.1, 0.05})

The sum of probabilities must not exceed 1 (here 0.2 + 0.15 + 0.1 + 0.05 = 0.5) since these are the probabilities for an individual converting in each year, and they may do so only once.
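The Conditional = FALSE mode can be sketched in Python as follows. This is a minimal illustration, not ModelRisk's implementation: each individual is assigned to at most one period, which is equivalent to a Multinomial draw with an implicit "never converts" category. The function name `sample_unconditional` is hypothetical.

```python
import random

def sample_unconditional(probabilities, initial_population, rng):
    """Conversions per period when each individual converts in period i with
    unconditional probability probabilities[i], and at most once."""
    if sum(probabilities) > 1 + 1e-12:
        raise ValueError("probabilities must not sum to more than 1")
    counts = [0] * len(probabilities)
    for _ in range(initial_population):
        u = rng.random()
        cumulative = 0.0
        for period, p in enumerate(probabilities):
            cumulative += p
            if u < cumulative:
                counts[period] += 1
                break  # an individual converts only once
        # if no period was hit, the individual never converts
    return counts

rng = random.Random(1)
conversions = sample_unconditional([0.2, 0.15, 0.1, 0.05], 1000, rng)
```

On average about half of the 1000 individuals never convert in this example, because the probabilities sum to 0.5.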

The second mode for this function is when the Conditional parameter is set to TRUE, in which case the {Probabilities} define the chance that any individual will convert in each given period given that the individual has not yet converted. The function models this as a set of nested Binomial distributions. So, for example, consider the following parameter set:

{Probabilities} = {0.4, 0.5, 0.3, 0.2}

InitialPopulation = 1000

Conditional = TRUE

The number of conversions in each year will then be:

X1 = Binomial(1000,0.4)

X2 = Binomial(1000 – X1, 0.5)

X3 = Binomial(1000 – X1 – X2, 0.3)

X4 = Binomial(1000 – X1 – X2 – X3, 0.2)

In other words, in each year the size of the population that has not yet converted up to that period is calculated and the probability that those remaining convert in that period is defined by the {Probabilities} parameter.
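The nested Binomial scheme for Conditional = TRUE can be sketched in Python as follows. This is an illustration only: the helper names are hypothetical and the Binomial sample is drawn naively as a sum of Bernoulli trials to keep the sketch dependency-free.

```python
import random

def binomial(n, p, rng):
    """Binomial(n, p) sample as a sum of n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

def sample_conditional(probabilities, initial_population, rng):
    """Conversions per period when probabilities[i] applies only to the
    individuals that have not yet converted before period i."""
    remaining = initial_population
    counts = []
    for p in probabilities:
        converted = binomial(remaining, p, rng)
        counts.append(converted)
        remaining -= converted  # only the unconverted carry over
    return counts

rng = random.Random(1)
conversions = sample_conditional([0.4, 0.5, 0.3, 0.2], 1000, rng)
```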

Converting between the two modes

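Let $p_i$ be the individual probabilities when the Conditional parameter is FALSE.

Let $q_i$ be the individual probabilities when the Conditional parameter is TRUE.

The models are equivalent when:

$$p_i = q_i \prod_{j=1}^{i-1} (1 - q_j)$$

Thus:

$$q_i = \frac{p_i}{1 - \sum_{j=1}^{i-1} p_j}$$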
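Writing p for the unconditional probabilities (Conditional = FALSE) and q for the conditional ones (Conditional = TRUE), the conversion in both directions can be sketched in Python. The function names are hypothetical, and the second function assumes the running total of earlier unconditional probabilities stays below 1.

```python
def conditional_to_unconditional(q):
    """p[i] = q[i] * product over earlier periods of (1 - q[j])."""
    p, still_unconverted = [], 1.0
    for q_i in q:
        p.append(q_i * still_unconverted)
        still_unconverted *= 1.0 - q_i
    return p

def unconditional_to_conditional(p):
    """q[i] = p[i] / (1 - sum of earlier p[j])."""
    q, still_unconverted = [], 1.0
    for p_i in p:
        q.append(p_i / still_unconverted)
        still_unconverted -= p_i
    return q

p = conditional_to_unconditional([0.4, 0.5, 0.3, 0.2])
q_back = unconditional_to_conditional(p)
```

For the conditional example {0.4, 0.5, 0.3, 0.2} this gives unconditional probabilities of approximately {0.4, 0.3, 0.09, 0.042}, and converting back recovers the original values.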





Behavior of the model

When the Conditional parameter is set to FALSE the mean value for each period is just the appropriate value from the {Probabilities} array multiplied by the InitialPopulation. Thus one will tend to see the same up and down pattern in {Probabilities} repeated in the series itself.

When the Conditional parameter is set to TRUE, the observed pattern will be quite different from the {Probabilities} array because we are modeling only the remaining population at each stage, not the entire population. Thus, for example, if Conditional = TRUE and {Probabilities} = {0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1} (i.e. the probability of converting each period given one hasn’t yet converted is independent of how much time has already passed) we get a decaying pattern of conversions because the number remaining decreases each year (in the graphed example, InitialPopulation = 1000).
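The decay can be quantified with a short worked calculation. With a constant conditional probability $q$ and an initial population $N$ (symbols introduced here for illustration), the expected number of conversions in period $t$ is

$$E[X_t] = N \, q \, (1 - q)^{t-1}$$

so for $N = 1000$ and $q = 0.1$ the expected conversions over the eight periods are approximately 100, 90, 81, 72.9, 65.6, 59.0, 53.1 and 47.8.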





VoseTimeSMEThreePoint

VoseTimeSMEThreePoint({Min Values}, {Mode Values}, {Max Values}, Distribution Type, Correlation Factor)

Time series function modeling a variable estimated for each period by minimum, most likely and maximum values.

• {MinValues} is an array of the minimum possible values for each period of the forecast.

• {ModeValues} is an array of the most likely values for each period of the forecast.

• {MaxValues} is an array of the maximum possible values for each period of the forecast.

• DistributionType is a text (either “TRIANGLE” or “PERT”) determining whether the min, mode and max for each period will specify a Triangle or PERT distribution.

• CorrelationFactor applies a positive correlation between generated values within each

period of the series. CorrelationFactor must lie on [0,1]. Optional, set to zero if omitted.


The SMEThreePoint time series function provides an easy, subjective way to specify a time series with some key features:





Growth and spread over time can be controlled by changing the {MinValues} and {MaxValues} array values. More likelihood is attributed to values close to the {ModeValues} so emphasis can be placed on those values you feel are most plausible.

Correlation between periods can be specified with a single parameter. The level of correlation is best selected by reviewing the example pathways that are generated in the interface each time one clicks the Generate button. Look at the range of variation from one period to the next across the entire series and adjust the CorrelationFactor until it looks reasonable. If you believe that there is correlation across the series you will likely settle on a value above 0.4 since lower levels of correlation are not immediately obvious to the eye. You will want to use correlation, for example, when the variable being forecast will tend to be high in each year if it is high in the first year: for example, a forecast of sales of a new product, when it either takes off because it is appealing to potential clients, or doesn’t; or the use of a new vaccine where people are generally convinced of its value, or not.

Level of the spread within the {MinValues} to {MaxValues} range can be controlled somewhat by selecting the DistributionType. For more spread select “Triangle” and for less spread select “PERT”. Selecting “Triangle” will also make the mean value for each period equal to the average of (MinValue, ModeValue, MaxValue), whilst selecting “PERT” will give a mean that is the weighted average of these values, with four times more weight on the ModeValue (so the mean will then be closer to the ModeValue). Click here for a more detailed comparison of the Triangle and PERT distributions. The following graph illustrates the effect of selecting either Triangle or PERT. Each plot is of 100 simulated pathways. The Triangle version on the left has more spread than the PERT on the right.
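The difference in mean and spread between the two choices can be checked with a short Python sketch. It uses the standard closed-form moments of the Triangle distribution and of the standard PERT distribution (a scaled Beta with the mode weighted four times); that ModelRisk's PERT is this standard form is an assumption, and the function names are hypothetical.

```python
import math

def triangle_mean_sd(a, b, c):
    """Mean and standard deviation of Triangle(min a, mode b, max c)."""
    mean = (a + b + c) / 3
    var = (a * a + b * b + c * c - a * b - a * c - b * c) / 18
    return mean, math.sqrt(var)

def pert_mean_sd(a, b, c):
    """Mean and standard deviation of the standard PERT(min a, mode b, max c)."""
    mean = (a + 4 * b + c) / 6
    var = (mean - a) * (c - mean) / 7
    return mean, math.sqrt(var)

tri_mean, tri_sd = triangle_mean_sd(10, 20, 60)
pert_mean, pert_sd = pert_mean_sd(10, 20, 60)
```

For min 10, mode 20 and max 60 the Triangle mean is 30 while the PERT mean is 25 (closer to the mode), and the PERT standard deviation is the smaller of the two.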





VoseTimeSMEUniform

VoseTimeSMEUniform({Min Values}, {Max Values}, Correlation Factor)

Time series function modeling a variable estimated for each period by minimum and maximum values.

• {Min Values} is an array of the minimum possible values for each period of the forecast.

• {Max Values} is an array of the maximum possible values for each period of the forecast.

• Correlation Factor applies a positive correlation between generated values within each

period of the series. Correlation Factor must lie on [0,1]. Optional, set to zero if omitted.


The SMEUniform time series function provides an easy, subjective way to specify a time series with some key features:

Growth and spread over time can be controlled by changing the {Min Values} and {Max Values} array

values.

Correlation between periods can be specified with a single parameter. The level of correlation is best selected by reviewing the example pathways that are generated in the interface each time one clicks the Generate button. Look at the range of variation from one period to the next across the entire series and adjust the Correlation Factor until it looks reasonable. If you believe that there is correlation across the series you will likely settle on a value above 0.4 since lower levels of correlation are not immediately obvious to the eye. You will want to use correlation, for example, when the variable being forecast will tend to be high in each year if it is high in the first year: for example, a forecast of sales of a new product, when it either takes off because it is appealing to potential clients, or doesn’t; or the use of a new vaccine where people are generally convinced of its value, or not.





VoseTimeEmpiricalFit

VoseTimeEmpiricalFit({data}, Multiply, Initial Value, Uncertainty)

Example model

This function returns random samples from a time series empirically fit to a set of data.

• {data} – is a single column (or row) array of consecutive observations from some variable;

• Multiply – is a Boolean parameter (TRUE or FALSE). If TRUE each value in the series is assumed to be related to its previous value by some multiplicative random factor. If FALSE each value in the series is assumed to be related to its previous value by some additive random factor;

• Initial Value – starting value (at time zero). The generated time series values will continue on from this value. Should only be provided if the Log Return parameter is set to FALSE or omitted;

• Uncertainty – is an optional Boolean parameter. If TRUE, the function will use non-parametric Bootstrapping to incorporate statistical uncertainty into the fitted projection. The parameter is FALSE if omitted.

The main advantage of this forecasting function is that it makes only very weak assumptions about the behavior of the variable being modeled, namely (1) that there is no ‘memory’, meaning that the variable does not behave in a way that is connected to its previous history; and (2) that the random variations from one period to the next are either a multiplicative or additive factor on the previous value. The distribution of this factor is determined by the data set, not by fitting a theoretical distribution.

Explanation of the mathematics

Assume the data array contains k values. VoseTimeEmpiricalFit operates in two modes, depending on the setting of the Multiply parameter:

Case 1: Multiply = TRUE

The function calculates ratio[i] = data[i]/data[i-1] for i = 2 to k. It then makes a forecast for the required number of periods T by using:

S[0] = InitialValue (not in forecast)

S[t] = S[t-1] * RandomSample[{ratio}] for t = 1 to T

In other words, in this mode the function is assuming that the underlying variable causing the random behavior is dictating a proportional change in the modeled variable S. This is most appropriate for things like prices (of commodities, stocks, currency – i.e. exchange rates). This mode has the added advantage that if {data} are all positive, then the function will produce a forecast that is always positive.

Case 2: Multiply = FALSE

The function calculates change[i] = data[i]-data[i-1] for i = 2 to k. It then makes a forecast for the required number of periods T by using:

S[0] = InitialValue (not in forecast)

S[t] = S[t-1] + RandomSample[{change}] for t = 1 to T





In other words, in this mode the function is assuming that the underlying variable causing the random behavior is dictating an additive change in the modeled variable S. This is most appropriate for things like changes in water levels in a lake or any reservoir/storage-type of problem, and sales volumes and other fairly linearly growing variables where the level of randomness is relatively small so that one has little risk of producing negative values. This mode has the advantage that it will continue a historic straight line well, but has the disadvantage that it can produce negative values if {change} are not all positive.
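The two modes of the empirical fit can be sketched in Python as follows. This is a minimal illustration of the resampling scheme only (the Uncertainty option is not modeled), and the function name `empirical_fit_forecast` is hypothetical.

```python
import random

def empirical_fit_forecast(data, multiply, initial_value, periods, rng):
    """Forecast by resampling historical period-to-period ratios
    (multiply=True) or differences (multiply=False)."""
    if multiply:
        factors = [data[i] / data[i - 1] for i in range(1, len(data))]
    else:
        factors = [data[i] - data[i - 1] for i in range(1, len(data))]
    path, level = [], initial_value
    for _ in range(periods):
        step = rng.choice(factors)  # random sample from the observed factors
        level = level * step if multiply else level + step
        path.append(level)
    return path

rng = random.Random(3)
# Data that doubles every period: every sampled ratio equals 2
doubling = empirical_fit_forecast([1, 2, 4, 8], True, 1, 3, rng)
# Data that grows by 1 every period: every sampled change equals 1
linear = empirical_fit_forecast([1, 2, 3, 4], False, 10, 3, rng)
# Noisy positive data in multiplicative mode stays positive
noisy = empirical_fit_forecast([100, 110, 99, 120], True, 120, 12, rng)
```

The two degenerate data sets make the behavior easy to check by hand: the multiplicative forecast continues the doubling and the additive forecast continues the straight line.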





Aggregate modeling

Aggregate modeling in ModelRisk

It is very common in risk analysis that one needs to model the sum of a number of independent identical random variables. For example:

In insurance:

• The aggregate claim distribution for a portfolio of policies over a certain period

• The total claim distribution for an individual over a period

In human health:

• The total amount of drug that a population will require over a year

• The number of patient-days required in a year at a hospital

• The total exposure to a toxin over a lifetime

In engineering:

• The amount of downtime caused by failures of a network

• The parts inventory that needs to be carried to cover six months of replacement

In food safety:

• The total number of bacteria in a volume of liquid egg due to the contamination of the eggs used

• The total number of illnesses that result from a set of outbreaks

In business:

• The total amount of merchandise purchased by the public entering a store

• The total length of time spent talking to clients at a call center

The incorrect summing of random variables is one of the key causes of errors in risk analysis models. ModelRisk incorporates the latest and most powerful techniques available to provide simple and intuitive methods for modeling aggregate distributions.

The number of random variables to be added together is called the 'frequency (distribution)'. The size of each random variable to be summed is called the 'severity (distribution)'. The methods available in ModelRisk are listed below.

Pure Monte Carlo simulation

The ModelRisk function VoseAggregateMC automates a number of methods to most efficiently generate values for the sum of a number of independent identically distributed random variables. Syntax:

=VoseAggregateMC(N,VoseDistributionObject([parameters]))

For example:

=VoseAggregateMC(VosePoisson(1000),VoseLognormalObject(15,5))





will generate a random value from the sum of n Lognormal(15,5) distributions where n is taken from a Poisson(1000) distribution (the frequency distribution) and Lognormal(15,5) is the severity distribution.

Panjer's recursive method

Panjer (1981) determined an efficient method for directly calculating an approximation of the aggregate distribution where the frequency distribution is Poisson. Sundt later extended the technique so that we can now use the Panjer method with any of the following frequency distributions: Poisson, Polya, Negative Binomial, Geometric, Logarithmic, Delaporte.

Use the VoseAggregatePanjer functions to calculate the Panjer aggregate distribution.
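For orientation, the classical Panjer recursion for a Poisson frequency can be sketched in Python. This is the textbook recursion, not ModelRisk's code: it assumes the severity has already been discretized onto the integers 0, 1, 2, ..., and the function name `panjer_poisson` is hypothetical.

```python
import math

def panjer_poisson(lam, severity_pmf, max_total):
    """Aggregate probabilities g[s] = P(S = s), s = 0..max_total, for a
    Poisson(lam) number of iid severities with severity_pmf[j] = P(X = j)."""
    f = severity_pmf
    g = [math.exp(-lam * (1.0 - f[0]))]  # probability of a zero total
    for s in range(1, max_total + 1):
        total = 0.0
        for j in range(1, min(s, len(f) - 1) + 1):
            total += j * f[j] * g[s - j]
        g.append(lam / s * total)
    return g

# Sanity check: if every severity equals 1, the total is just Poisson(2)
g = panjer_poisson(2.0, [0.0, 1.0], 40)
```

With a severity that is always 1 the aggregate is simply the Poisson frequency itself, which gives an easy check of the recursion.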

De Pril's recursive method

See also: Vose Aggregate DePril

Panjer's method is unstable with a Binomial frequency distribution, but the Binomial is needed to be able to model aggregate claims from a basket of life insurance policies. De Pril (1986) determined another recursive method for the binomial.

Use the VoseAggregateDePril functions to calculate the De Pril aggregate distribution.

Fast Fourier Transforms

Improved computing speed has resulted in recent years in the increased popularity of Fast Fourier Transform (FFT) methods to directly estimate aggregate distributions. They are more flexible than the Panjer method.

Use the VoseAggregateFFT functions to calculate the FFT aggregate distribution.

An extension to the FFT method allows one to calculate the total of a portfolio of aggregate distributions where the frequency distributions of each aggregate are correlated.

Use the VoseAggregateMultiFFT functions to calculate the total FFT aggregate distribution with multiple correlated frequency distributions.

Calculation of aggregate moments

If one knows the moments of both the frequency and severity distributions it is possible to directly determine the moments of the aggregate distribution.

The ModelRisk function VoseAggregateMoments will return the moments of the aggregate distribution for any combination of applicable univariate distributions available with ModelRisk, including when either or both distributions are truncated or shifted.

This is a powerful tool for ensuring that approximations are sufficiently accurate, which is why we have included these values for comparison in the Panjer, De Pril and FFT windows, in the exact column of the summary statistics table.

The example model Aggregate_moments demonstrates how one can use the direct calculation of

aggregate moments to check for the accuracy of a Panjer or FFT calculation.
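The first two aggregate moments follow from standard identities for a random sum of independent, identically distributed variables, sketched here in Python. The function name is hypothetical, and the example assumes Lognormal(15,5) is parameterized by its mean (15) and standard deviation (5).

```python
def aggregate_mean_variance(freq_mean, freq_var, sev_mean, sev_var):
    """Mean and variance of S = X1 + ... + XN with N independent of the Xs.

    E[S]   = E[N] * E[X]
    Var[S] = E[N] * Var[X] + Var[N] * E[X]^2
    """
    mean = freq_mean * sev_mean
    variance = freq_mean * sev_var + freq_var * sev_mean ** 2
    return mean, variance

# Poisson(1000) frequency (mean = variance = 1000), severity mean 15, sd 5
mean, variance = aggregate_mean_variance(1000, 1000, 15, 5 ** 2)
```

For this example the aggregate mean is 15000 and the variance is 250000 (a standard deviation of 500), values that a Panjer or FFT result can be compared against.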





One application of directly determining aggregate moments has been to then use Method of Moments to fit some parametric distribution. If there is essentially no probability of the aggregate distribution taking a value of zero one can fit one of the continuous parametric distributions.

For example, the Gamma distribution with a positive shift is quite popular because one can fit to the first three moments (mean, variance, skewness). The AggregateMC, Panjer, FFT, De Pril and MultiFFT windows in ModelRisk allow the user to fit a distribution based on matching aggregate moments and place the fitted distribution in a spreadsheet. We don't recommend this method for critical analysis, and suggest that you at least compare the fitted parametric distribution to a Panjer or FFT first.

StopSum

The VoseStopSum function simulates the number of random variables that are required to just exceed a given total.
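The idea of a stop-sum can be sketched in a few lines of Python. This is an illustration of the concept, not the VoseStopSum implementation: the function name `stop_sum` is hypothetical, "exceed" is read as strictly greater than, and the draws are assumed to be positive so that the loop terminates.

```python
import random

def stop_sum(draw, threshold):
    """Count how many draws are needed for the running total to exceed
    threshold. draw is a zero-argument function returning one positive value."""
    total, count = 0.0, 0
    while total <= threshold:
        total += draw()
        count += 1
    return count

rng = random.Random(5)
needed = stop_sum(lambda: rng.uniform(5, 15), 1000)  # random draw sizes
fixed = stop_sum(lambda: 10.0, 95)                   # deterministic check
```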

SumProduct

The VoseSumProduct function simulates the sum of a number of random variables, each of which is itself a product of two or more random variables.





Aggregate Monte Carlo

The sum of a random number (frequency) of randomly sized (severity) variables is in itself again a distribution, called the aggregate distribution.

Use the Aggregate Monte Carlo window to directly generate sums of random variables.

We randomly sample a positive integer N from the (discrete) frequency distribution. Next, we simulate N random values from the severity distribution and add them together. The outcome is a new random value for total or "compound" severity.

Note that the severity distribution is specified as a Distribution Object. To incorporate a modified severity distribution (e.g. to model a deductible), just use the appropriate Object.

Note that while Aggregate MC is the most straightforward way to calculate the aggregate distribution, there are algorithms for constructing the aggregate distribution directly with any desired accuracy and speed: Aggregate DePril, Aggregate Panjer, Aggregate FFT.

Especially in insurance modeling, where modeling extreme scenarios is often of crucial importance, these algorithms may be preferred, as they allow for modeling to any desired precision much faster.

Aggregate distributions are most often associated with insurance risk analysis, but have applications in virtually every type of risk analysis. Not properly understanding when to simulate aggregate distributions is one of the most common errors in modeling.

A continuous distribution (e.g. a Gamma) can be fitted to the aggregate distribution (by matching moments), and this fitted distribution can in turn be inserted in the spreadsheet (see below).


Output functions of this window: VoseAggregateMC

Window elements

In the Frequency Distribution field you can insert the distribution that governs the number of random variables to be added together. This should be a discrete non-negative Vose Distribution, or even a positive integer, not a Vose Distribution Object, since we are simply taking a random sample from this distribution, not manipulating it in any other way.





In the Severity Distribution field you can insert the distribution that governs the size of the individual summed samples. This should be a continuous Vose Distribution object.

In the Chart options region, you can enter the number of random samples and the approximate number of bars shown for the preview graph of the aggregate distribution.

Preview graphs for the frequency, severity and resulting aggregate distribution are shown.

The preview graph of the aggregate distribution below has the following special buttons in its graphics toolbar:

From left to right, these allow you to:

• Overlay one of several fitted distributions (by matching moments) to the calculated aggregate distribution.

• Insert the aggregate distribution in the spreadsheet in different ways.

• Insert the fitted overlay curve in the spreadsheet in different ways.





VoseAggregateMC

=VoseAggregateMC(N,Distribution)

This function aggregates N random values from a distribution using direct Monte Carlo simulation. It is the most straightforward way of modeling the sum of independent random values drawn from a given distribution.

• N - the number of random values to be aggregated (summed). This should be an integer.

This can be a fixed number as well as a sampled value from a discrete distribution.

• Distribution - a distribution object where the N variables to be summed are sampled

from.

In insurance modeling for example, this function could be used to model the aggregation of a random number of claims coming in with a random size. The total amount an insurance company has to pay out could then be modelled with the function VoseAggregateMC where N represents the (random) number of claims and "Distribution" represents the random size of the claims.

There exist a number of identities that provide 'shortcuts' for calculating aggregate distributions faster. These identities are used by the VoseAggregateMC function when appropriate to speed up the calculation.

Examples

Example 1

When N = 100 and the distribution is a LogNormal(2,1), the aggregation =VoseAggregateMC(100,VoseLognormalObject(2,1)) will be performed by Monte Carlo simulation, meaning that this function randomly takes 100 samples of a LogNormal(2,1) distribution and then adds them all together.

Example 2

If N = 100 and the distribution is a Gamma(3,6), then the VoseAggregateMC function knows that there is a shortcut formula for aggregating Gamma distributions: Gamma(100*3,6).

That means that in this case the function =VoseAggregateMC(100,VoseGammaObject(3,6)) immediately samples from the aggregated distribution.

Example 3

If the specified distribution is a known distribution, like in example 2, but with a truncation (for example

=VoseGamma(3,6,,VoseXBounds(1,7))), then there is no formula to sample directly from the

aggregate distribution and a Monte Carlo simulation has to be performed (like in the first example).

Example 4

If the distribution is known, but there is a shift in it, then the shortcut formula still holds, but one needs totake into account the shift.

For example, aggregating 100 VoseGamma(3,6,,VoseShift(10)) random variables by writing:

=VoseAggregateMC(100,VoseGamma(3,6,,VoseShift(10)))





means sampling from the aggregate distribution: 100*10 + VoseGamma(100*3,6).

Example 5

When N is not a number but a distribution (for example Poisson(50)) and the specified distribution is not known to have a shortcut formula (for example Pareto(3,1)) then the VoseAggregateMC function

=VoseAggregateMC(VosePoisson(50),VoseParetoObject(3,1))

randomly samples from the Poisson(50) distribution (let's say 47), then randomly samples 47 times from the Pareto distribution and finally adds them all up.

Example 6

In the case that N is a continuous distribution (for example LogNormal(20,15)) and the specified distribution is known to have a shortcut formula (for example Normal(100,10)), the function samples from the LogNormal distribution, rounds it up to an integer (let's say 22) and then knows that the aggregate distribution is: Normal(22*100,SQRT(22)*10).
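The Gamma and Normal shortcut identities used in the examples can be written out as a small Python sketch. The function names are hypothetical, and Gamma(alpha, beta) is assumed to take a shape alpha and a scale beta, which matches the convention of Python's `random.gammavariate`.

```python
import math
import random

def gamma_sum_params(n, alpha, beta, shift=0.0):
    """Sum of n iid (shift + Gamma(alpha, beta)) variables equals
    n*shift + Gamma(n*alpha, beta). Returns (offset, shape, scale)."""
    return n * shift, n * alpha, beta

def normal_sum_params(n, mu, sigma):
    """Sum of n iid Normal(mu, sigma) variables is
    Normal(n*mu, sqrt(n)*sigma). Returns (mean, standard deviation)."""
    return n * mu, math.sqrt(n) * sigma

rng = random.Random(7)
# 100 shifted Gamma(3,6) variables with a shift of 10: one draw replaces 100
offset, shape, scale = gamma_sum_params(100, 3, 6, shift=10)
one_draw = offset + rng.gammavariate(shape, scale)
# 22 Normal(100,10) variables
agg_mean, agg_sd = normal_sum_params(22, 100, 10)
```

A single draw from the combined distribution replaces the summation of individual samples, which is why the calculation stays fast even for very large N.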

Comments

Comment 1

In the cases where the aggregation has to be performed by Monte Carlo simulation (for example 100 Pareto(3,2) distributions), this function takes quite some time to complete the aggregation for very large N. Also see VoseCLTSum.

But in the cases where there is a direct formula for the aggregation (for example 1000000 Gamma(2,3)) the aggregation is instantaneous.

Comment 2

The distribution parameter can also be a fixed number.

VoseAggregateMC(VosePoisson(50),100) will return N*100 where N is randomly sampled from a Poisson(50) distribution.





Aggregate FFT

Introduction

The sum of a random number (frequency) of randomly sized (severity) variables is in itself again a distribution, called the aggregate distribution.

The Aggregate FFT window directly constructs the aggregate distribution using the Fast Fourier Transform method. There are a lot of advantages to being able to construct the aggregate distribution directly, among which are:

• We can determine tail probabilities to a high precision.

• It is much faster than Monte Carlo simulation.

• We can manipulate the aggregate distribution as with any other in Monte Carlo simulation, e.g. correlate it with other variables.

In the FFT algorithm the severity distribution is divided into a number m=2^n of discrete steps. By default n=12 is chosen. Optionally n can be increased with the 'Density level' field: increasing n by one doubles the number of discrete steps, yielding a higher accuracy at the cost of a slower calculation. This can be necessary when working with a long-tailed severity distribution.

Compare the FFT moments with the exact moments in the summary statistics table ('FFT' and 'Exact' columns) to check the calculation's accuracy with the chosen density level and increase if necessary.


The FFT method is explained in more mathematical detail here.


Output functions of this window: VoseAggregateFFT, VoseAggregateFFTProb, VoseAggregateFFTProb10, VoseAggregateFFTObject

Window elements





In the Aggregate parameters region, you can specify the Frequency distribution (a discrete distribution object) and the Severity distribution (a continuous distribution object) in the fields labeled accordingly.

You can also specify the Density level. If omitted, this will have a default value of 12.

Preview graphs of the frequency, severity and resulting aggregate distribution are shown.





function of some x value(s) (an extra parameter x values will appear on the left side of the

window).





• Insert the aggregate distribution in the spreadsheet.


Using aggregate moments to check for accuracy

Whilst the aggregate calculation techniques offered by ModelRisk are generally very accurate, it is wise for the user to ensure that the numerical result is within the level of accuracy required.

The most direct way of testing the required accuracy is to compare the moments of the constructed aggregate distribution to the exact values that can be determined through manipulation of the frequency and claim size distributions.

That is why we have included the exact aggregate moment values for comparison in the ModelRisk aggregate De Pril, Panjer and FFT windows, in the exact column of the summary statistics table.





VoseAggregateFFT

=VoseAggregateFFT(Frequency distribution, Severity distribution, Density, U)

Calculates the aggregate distribution using the Fast Fourier Transform method.

• Frequency distribution - a discrete distribution object.

• Severity distribution - a continuous distribution object.

• Density - an optional accuracy parameter.

• U - optional parameter specifying the cumulative percentile of the distribution. If omitted the function generates random values.

In the FFT algorithm the severity distribution is divided into a number m=2^n of discrete steps. By default n=12 is chosen. Optionally n can be increased with the 'Density level' field: increasing n by one doubles the number of discrete steps, yielding a higher accuracy at the cost of a slower calculation. This can be necessary when working with a long-tailed severity distribution.
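The general idea of building an aggregate distribution on m = 2^n discrete steps with a Fourier transform can be sketched in Python for a Poisson frequency. This is a textbook illustration rather than ModelRisk's algorithm: the function names are hypothetical, the severity is assumed to be already discretized onto the integers 0, 1, 2, ..., and a small recursive radix-2 transform stands in for an optimized FFT library.

```python
import cmath
import math

def fft(values, inverse=False):
    """Recursive radix-2 transform; len(values) must be a power of two.
    The inverse transform is returned without the 1/len scaling."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def compound_poisson_fft(lam, severity_pmf, n=12):
    """Aggregate probabilities on 0..2^n - 1 for a Poisson(lam) frequency.
    severity_pmf[j] = P(X = j) and must have at most 2^n entries."""
    m = 2 ** n
    padded = list(severity_pmf) + [0.0] * (m - len(severity_pmf))
    f_hat = fft(padded)
    # Compound Poisson in the transform domain: exp(lam * (f_hat - 1))
    g_hat = [cmath.exp(lam * (z - 1)) for z in f_hat]
    g = fft(g_hat, inverse=True)
    return [x.real / m for x in g]

# Sanity check: if every severity equals 1, the total is just Poisson(2)
aggregate = compound_poisson_fft(2.0, [0.0, 1.0], n=6)
```

Any aggregate probability beyond the last step wraps around to the start of the grid, which is one reason a long-tailed severity can need a larger n.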

In the Aggregate FFT window you can compare the aggregate FFT moments with the exact moments in the summary stats table ('FFT' and 'Exact' columns) to check the calculation's accuracy with the chosen density level, and increase if necessary.

See Aggregate modeling - Fast Fourier Transform (FFT) method for an in-depth explanation of this method.


VoseAggregateFFT generates values from this distribution or calculates a percentile.

VoseAggregateFFTObject constructs a distribution object for this distribution.

VoseAggregateFFTProb returns the probability density or cumulative distribution function for this distribution.

VoseAggregateFFTProb10 returns the log10 of the probability density or cumulative distribution function.





Aggregate Multivariate Monte Carlo

Introduction

The Aggregate Multivariate Monte Carlo window is a lot like the Aggregate Monte Carlo window, in that it directly calculates the sum of a random number of randomly sized variables.

However, in the Aggregate Multivariate Monte Carlo window you can enter multiple pairs of severity/frequency distributions to be added, and optionally correlate the frequency distributions.

So, for example, you could model a portfolio of two related insurance policies (e.g. one for car accidents and one for trucks) aggregating a Poisson number of Lognormal-sized variables together with a Polya number of Normal-sized variables, and you can take into account that both have a correlated frequency (e.g. a bad winter will increase the number of accidents, i.e. claim events, for both).



Output functions of this window: VoseAggregateMultiMC

Window elements

On the upper left of the window is the list of Frequency/Severity distribution pairs. These should be discrete and continuous Vose Distribution Objects, respectively.

To add a new distribution to the list, click anywhere in the white area. To remove a frequency/severity distribution pair from the list, select it (by clicking on it) and then click the x button on the right below the list.

In the correlation matrix shown, you can add correlation between the frequency distributions: double click a matrix element to add correlation (by default it is zero) between the two frequency distributions it corresponds to. Note that this correlation matrix is symmetrical, so changing one element will update the one on the other side of the diagonal accordingly.






The preview graph of the aggregate distribution below has the following special buttons in its graphics toolbar:

From left to right, these allow you to:

• Overlay one of several fitted distributions (by matching moments) to the calculated aggregate distribution.

• Insert the aggregate distribution in the spreadsheet in different ways.






VoseAggregateMultiMC

=VoseAggregateMultiMC({Frequency distributions},{Severity distributions},{correlation matrix})

Models a number of frequency-severity distribution pairs aggregated together, using pure Monte Carlo simulation. Optionally, the correlation between the frequencies can be specified.

• {Frequency distributions} - an array of discrete distribution objects. Should be a 1xn or nx1 array.

• {Severity distributions} - an array of severity distribution objects. Should be a 1xn or nx1 array.

• {correlation matrix} - optional parameter specifying the matrix with correlations between the frequencies. If omitted, no correlation between the frequencies is assumed.

Also see Vose Aggregate Multivariate Monte Carlo window for an explanation about the window for this function.





Aggregate Multivariate FFT

Introduction

The Aggregate Multivariate FFT window is a lot like the Aggregate FFT window, in that it calculates the sum of a random number of randomly sized distributions.

However, in the Aggregate Multivariate FFT window you can enter multiple pairs of severity/frequency distributions to be added.

So, for example, you could model a portfolio of two related insurance policies (e.g. one for car accidents and one for trucks) aggregating a Poisson number of Lognormal-sized variables together with a Polya number of Normal-sized variables.


The FFT method is explained in more mathematical detail here.


Output functions of this window: VoseAggregateMultiFFT, VoseAggregateMultiFFTProb, VoseAggregateMultiFFTProb10, VoseAggregateMultiFFTObject

Window elements





On the upper left of the window is the list of Frequency/Severity distribution pairs. These should be discrete and continuous Vose Distribution Objects, respectively.

To add a new distribution to the list, click anywhere in the white area. To remove a frequency/severity distribution pair from the list, select it (by clicking on it) and then click the x button on the right below the list.














Whilst the aggregate calculation techniques offered by ModelRisk are generally very accurate, it is wise for the user to ensure that the numerical result is within the level of accuracy required.

The most direct way of testing the required accuracy is to compare the moments of the constructed aggregate distribution to the exact values that can be determined through manipulation of the frequency and claim size distributions.






VoseAggregateMultiFFT

=VoseAggregateMultiFFT({frequency distributions},{severity distributions},U)

Calculates the aggregate distribution of multiple frequency-severity pairs using the Fast Fourier Transform method.

• {Frequency distributions} - array of discrete distribution objects.

• {Severity distributions} - array of continuous distribution objects. Should be the same size as the array of frequency distributions.

• U - optional parameter specifying the cumulative percentile of the distribution. If omitted, the function generates random values.

See Aggregate modeling - Fast Fourier Transform (FFT) method for an in-depth explanation of this method.

VoseAggregateMultiFFT generates values from this distribution or calculates a percentile.

VoseAggregateMultiFFTObject constructs a distribution object for this distribution.

VoseAggregateMultiFFTProb returns the probability density or cumulative distribution function for this distribution.

VoseAggregateMultiFFTProb10 returns the log10 of the probability density or cumulative distribution function.





Aggregate De Pril

Introduction

De Pril's recursive method is a method often used in insurance risk analysis modeling.

It calculates the aggregate payout distribution of a portfolio of independent life insurance policies that each have a claim probability p_j (one of J possible values). To put it in life insurance terminology, we classify policies by their different mortality rates.

The held policies are categorised according to the payout amount and the probability of claim. The possible payout amounts are discretised into M multiples of a base, i.e. base, 2*base, ..., M*base. The probability of payout is also discretised into J possible values: p_1, p_2, ..., p_J. n_jm is the number of held policies with payout m*base that are deemed to have probability p_j of being claimed within the cover period, giving a total of MxJ different types of payout events to be modelled.

The output is the aggregate payout distribution. Note that it has a certain probability attached to a zero outcome (by default this is the green vertical line on the window's preview graph).

The algorithm for calculating this aggregate payout is exact, but very computationally intensive. Optionally specifying a non-zero integer K gives a faster, but approximate, result. K governs the payout size below which payout events are ignored in the calculation: the lower K, the faster the algorithm (at the cost of a cruder approximation).

The method is explained in more mathematical detail here.
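To illustrate the portfolio layout described above (J claim probabilities, M payout multiples, and a count of policies for each combination), the following Python sketch builds the exact aggregate payout distribution by plain convolution, one policy at a time. This is deliberately not De Pril's recursion, only a brute-force reference for small portfolios; the function name and the example portfolio are hypothetical.

```python
def aggregate_payout_pmf(probabilities, n, base):
    """Exact payout distribution for independent policies, by direct convolution.

    probabilities[j] is the claim probability of class j.
    n[j][m - 1] is the number of policies in class j with payout m * base.
    Returns a dict {payout: probability}. Plain convolution, not De Pril's recursion.
    """
    max_units = sum(count * m
                    for row in n
                    for m, count in enumerate(row, start=1))
    pmf = [0.0] * (max_units + 1)  # pmf[k] = P(total payout == k * base)
    pmf[0] = 1.0
    for p, row in zip(probabilities, n):
        for m, count in enumerate(row, start=1):
            for _ in range(count):
                # Add one policy: pays m units with probability p, else nothing.
                new = [0.0] * len(pmf)
                for k, mass in enumerate(pmf):
                    if mass == 0.0:
                        continue
                    new[k] += mass * (1.0 - p)
                    new[k + m] += mass * p
                pmf = new
    return {k * base: mass for k, mass in enumerate(pmf)}

# Hypothetical portfolio: J = 2 probabilities, M = 2 payout sizes, base = 1000.
probabilities = [0.01, 0.02]
n = [[2, 1],   # p = 0.01: two policies paying 1000, one paying 2000
     [1, 0]]   # p = 0.02: one policy paying 1000
pmf = aggregate_payout_pmf(probabilities, n, 1000)
prob_zero = pmf[0]  # probability that no policy claims
mean_payout = sum(x * p for x, p in pmf.items())
```

The probability attached to a zero payout (prob_zero) is the product of the no-claim probabilities of all policies, which is the spike at zero mentioned in the text.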


Output functions of this window: VoseAggregateDePril, VoseAggregateDePrilProb, VoseAggregateDePrilProb10





Window elements

In the parameters region, you can fill in the following fields:

• {probabilities} - a 1xJ array, with J being the number of different payout probabilities among the policies.

• {n} - a JxM array of the elements n_jm, the number of policies associated with probability p_j and claim size m*base.

• base - the base number for the benefit payouts. This is typically a value like $1000 or $5000.

• K - optional integer parameter (>0) for using approximate rather than exact formulas in the calculations, for higher speed. If omitted, the exact payout distribution will be calculated.

The upper preview graph window plots n_jm against (m*base) for each of the J different probabilities of claim.

The lower pane shows a graph of the calculated aggregate distribution.

Different types of output can be specified by selecting the appropriate option under the preview graph.

Whilst the aggregate calculation techniques offered by ModelRisk are generally very accurate, it is wise for the user to ensure that the numerical result is within the level of accuracy required.

The most direct way of testing the required accuracy is to compare the moments of the constructed aggregate distribution to the exact values that can be determined through manipulation of the frequency and claim size distributions.






VoseAggregateDePril

=VoseAggregateDePril({Probabilities},{n},base,K)

Calculates the aggregate payout distribution for a set of policies using De Pril's recursive method.

• {probabilities} - a 1xJ array, with J being the number of different payout probabilities among the policies.

• {n} - a JxM array of the elements n_jm, the number of policies associated with probability p_j and claim size m*base.

• base - the base number for the benefit payouts. This is typically a value like $1000 or $5000.

• K - optional integer parameter (>0) for using approximate rather than exact formulas in the calculations, for higher speed. If omitted, the exact payout distribution will be calculated.

De Pril's method calculates the aggregate payout distribution of a portfolio of independent life insurance policies that each have a claim probability p_j (one of J possible values). To put it in life insurance terminology, we classify policies by their different mortality rates.

The held policies are categorised according to the payout amount and the probability of claim. The possible payout amounts are discretised into M multiples of a base, i.e. base, 2*base, ..., M*base. The probability of payout is also discretised into J possible values: p_1, p_2, ..., p_J. n_jm is the number of held policies with payout m*base that are deemed to have probability p_j of being claimed within the cover period, giving a total of MxJ different types of payout events to be modelled.

The output is the aggregate payout distribution. Note that it has a certain probability attached to a zero outcome (by default this is the green vertical line on the window's preview graph).

The algorithm for calculating this aggregate payout is exact, but very computationally intensive. Optionally specifying a non-zero integer K gives a faster, but approximate, result. K governs the payout size below which payout events are ignored in the calculation: the lower K, the faster the algorithm (at the cost of a cruder approximation).

The method is explained in more mathematical detail here.

VoseAggregateDePril generates values from this distribution or calculates a percentile.

VoseAggregateDePrilProb returns the probability density or cumulative distribution function for this distribution.

VoseAggregateDePrilProb10 returns the log10 of the probability density or cumulative distribution function.





Aggregate Discrete window

The sum of a random number (frequency) of randomly sized (severity) variables is in itself again a distribution, called the aggregate distribution. The Aggregate Discrete function allows you to determine the distribution of the sum of a number of independent variables that follow a Discrete distribution.

Output functions of this window: VoseAggregateDiscrete, VoseAggregateDiscreteObject, VoseAggregateDiscreteProb, VoseAggregateDiscreteProb10

Window elements

In the Frequency Distribution field you can insert the distribution that governs the number of random variables to be added together. This should be the Distribution Object function for one of the allowable distribution types.

In the Severity Distribution X field you insert a list of values that the individual severity distribution may take. This can be a list contained within { ... } or, more usually, a reference to a range in the spreadsheet.

In the Severity Distribution P field you insert a list of probabilities associated with the values that the individual severity distribution may take. This must be a list of the same length as the previous field. Again, this can be a list contained within { ... } or, more usually, a reference to a range in the spreadsheet.

Step is an optional parameter. It defines the length of the increments used in the algorithm explained below.

MaxP is an optional parameter. It defines the cumulative probability at which the algorithm will finish evaluating the severity distribution, as explained below. It should be a value very close to 1. If omitted, it takes the value 0.9999.





Charts

The top left chart displays the frequency distribution. If a single value is entered, it will show this value as a vertical line.

The top right chart shows the discrete severity distribution, i.e. the values entered in the Severity Distribution X field on the horizontal axis against the values entered in the Severity Distribution P field, which have been normalized to sum to 1.

The chart below shows the aggregate distribution. Sliders at the left and right allow you to read off cumulative probabilities.

Table

The table to the right compares the theoretical moments of the exact aggregate distribution against the approximation being produced by the VoseAggregateDiscrete function with these settings (Step and MaxP). Comparing the two columns allows the user to determine whether a sufficiently accurate approximation has been reached. If it has not, the MaxP value can be increased and the Step value decreased.

Output type

The user must enter a spreadsheet location for the VoseAggregateDiscrete function in the Output Location field. The user can select between five different types of output:

• Object - Insert a distribution object function (VoseAggregateDiscreteObject)

• Simulation - Simulate from the aggregate distribution (VoseAggregateDiscrete)

• f(x) - Calculate a probability mass (VoseAggregateDiscreteProb(....,FALSE))

• F(x) - Calculate a cumulative probability (VoseAggregateDiscreteProb(....,TRUE))

• F-1(U) - Use the U parameter to simulate from the aggregate distribution (VoseAggregateDiscrete(...,U))

Algorithm

ModelRisk uses an adaptive algorithm that mathematically constructs the aggregate distribution. The approach is based on a well-known recursive relationship, with an adaptive component that allows the algorithm to handle a very large set of possible values.

The algorithm rounds off the discrete severity distribution provided into one whose values are integer multiples of Step apart. Thus, for example, using Severity distribution X = {1,2,3,4.1,5} and Step = 1 the algorithm will round the X values to {1,2,3,4,5}. If Step = 0.1, the X values will be unchanged.

If the Severity distribution is entered as a discrete distribution with a very long right tail, the algorithm has to deal with a huge set of possible values. The MaxP parameter is used to limit how much of the tail is evaluated. Setting a MaxP value of 0.9999, for example, will ignore any tail values in the last 0.01% of the distribution.

The algorithm works with a category of distributions known as (a,b,1). Thus, the allowed frequency distributions are: Geometric, Logarithmic, Negative Binomial, Polya, and Poisson.
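The rounding-to-Step idea and the recursive construction can be sketched in Python for the simplest case, a Poisson frequency. This is a textbook Panjer-style recursion written for illustration, not ModelRisk's adaptive algorithm; the function name, the max_units cut-off and the example weights are hypothetical.

```python
import math

def compound_poisson_pmf(lam, values, probabilities, step, max_units):
    """Recursive construction for a Poisson(lam) number of discrete severities.

    Severity values are first rounded to whole multiples of `step`.
    Returns g where g[s] = P(aggregate == s * step) for s = 0..max_units.
    """
    total_weight = sum(probabilities)
    f = {}  # f[k] = P(severity == k * step) after rounding and normalizing
    for x, w in zip(values, probabilities):
        k = round(x / step)
        f[k] = f.get(k, 0.0) + w / total_weight
    f0 = f.get(0, 0.0)
    g = [math.exp(-lam * (1.0 - f0))]  # P(aggregate == 0)
    for s in range(1, max_units + 1):
        # Poisson case of the recursion: g[s] = (lam / s) * sum_k k * f[k] * g[s - k].
        acc = sum(k * fk * g[s - k] for k, fk in f.items() if 1 <= k <= s)
        g.append(lam / s * acc)
    return g

# Severity X = {1,2,3,4.1,5} with Step = 1, so 4.1 is rounded to 4.
g = compound_poisson_pmf(2.0, [1, 2, 3, 4.1, 5], [0.4, 0.3, 0.15, 0.1, 0.05], 1, 60)
approx_mean = sum(s * p for s, p in enumerate(g))
# Exact mean of the rounded model: E[N] * E[rounded severity].
exact_mean = 2.0 * (1 * 0.4 + 2 * 0.3 + 3 * 0.15 + 4 * 0.1 + 5 * 0.05)
```

Comparing approx_mean against exact_mean mirrors the moment comparison in the window's summary table; a coarser Step or a smaller cut-off widens the gap.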

Useful tips and tricks





The output of ModelRisk windows always corresponds to VoseFunctions (the functions ModelRisk adds to Excel) being entered into one or more spreadsheet cells.

You can always re-open the window for a ModelRisk function that is in a spreadsheet cell by using View Function. Select the spreadsheet cell and then select View Function from the ModelRisk menu/toolbar/ribbon.

Note that there are other functions for constructing the aggregate distribution from a continuous severity distribution directly with any desired accuracy and speed: Aggregate DePril, Aggregate Panjer, Aggregate Discrete. The VoseAggregateMC function also provides a generic simulation method for evaluating aggregates.





VoseAggregateDiscrete

See also: Aggregate Discrete window

=VoseAggregateDiscrete(Frequency distribution, {Values}, {Probabilities}, Step, MaxP, U)

Calculates the aggregate distribution for a discrete severity variable using a recursive method.

Parameters

• Frequency distribution - a distribution object function of one of the following distribution types: Geometric, Logarithmic, Negative Binomial, Polya, and Poisson.

• {Values} - a list of k possible values for the severity distribution.

• {Probabilities} - a list of length k giving weights or probabilities to each value in {Values}.

• Step - the discrete increments used to construct the aggregate distribution.

• MaxP - an optional accuracy parameter limiting the amount of the right tail of the severity distribution that will be evaluated. Set to 0.9999 by default. As a rough rule, if you are running a simulation of n samples, there is no value in setting MaxP higher than 1-1/n.

• U - optional parameter specifying the cumulative percentile of the distribution. If omitted, the function generates random values.

In the Aggregate Discrete window you can compare the AggregateDiscrete moments with the exact moments in the summary stats table ('AggregateDiscrete' and 'Exact' columns) to check the calculation's accuracy with the chosen Step and MaxP values, and adjust if necessary.






VoseAggregateDiscrete - generates values from this distribution or calculates a percentile.

VoseAggregateDiscreteObject - constructs a distribution object for this distribution.

VoseAggregateDiscreteProb - returns the probability density or cumulative distribution function for this distribution.

VoseAggregateDiscreteProb10 - returns the log10 of the probability density or cumulative distribution function.





Aggregate Panjer

Introduction

The sum of a random number (frequency) of randomly sized (severity) variables is in itself again a distribution, called the aggregate distribution. Panjer's recursive method is an efficient method for directly constructing an approximation of the aggregate distribution, where the frequency distribution is any of the following: Poisson, Polya, Negative Binomial, Geometric, Logarithmic, Delaporte.

There are a lot of advantages to being able to construct the aggregate distribution directly, among which are:

• We can determine tail probabilities to a high precision.

• It is much faster than Monte Carlo simulation.

• We can manipulate the aggregate distribution as with any other in Monte Carlo simulation, e.g. correlate it with other variables.

A continuous distribution (e.g. a Gamma) can be fitted to the aggregate distribution (by matching moments), and this fitted distribution can in turn be inserted in the spreadsheet (see below).

The MaxP parameter specifies the upper percentile value of the claim size distribution (called X from now on) at which the algorithm will stop, and the Intervals parameter specifies how many steps will be used in the discretisation of the X distribution.

In general, the larger one makes Intervals, the more accurate the model will be, but at the expense of computation time. The MaxP value should be set high enough to realistically cover the distribution of X, but if one sets it too high for a long-tailed distribution, there will be an insufficient number of increments in the main body of the distribution. In ModelRisk one can compare the exact moments of the aggregate distribution with those of the Panjer constructed distribution to ensure that the two correspond with sufficient accuracy for the analyst's needs.





You can read more about the mathematical details of Panjer's recursive algorithm here.


Output functions of this window: VoseAggregatePanjer, VoseAggregatePanjerProb, VoseAggregatePanjerProb10, VoseAggregatePanjerObject

Window elements

In the Aggregate parameters area, the Frequency and Severity distributions can be chosen: you can insert these manually, link dynamically to a Distribution Object in the spreadsheet, or select a distribution from the Select Distribution window.

In the two other fields listed you can specify the Number of Intervals and MaxP parameters for Panjer's algorithm. You can read the details about these in the topic about Panjer's recursive method.

Preview graphs for the claim frequency distribution, claim size distribution, and aggregate distribution, respectively, are shown.

Different types of output can be specified by selecting the appropriate option under the preview graph:

• Object - to insert the constructed distribution as a distribution Object in the spreadsheet.





The preview graph of the aggregate distribution below has special buttons in its graphics toolbar.










Whilst the aggregate calculation techniques offered by ModelRisk are generally very accurate, it is wise for the user to ensure that the numerical result is within the level of accuracy required.

The most direct way of testing the required accuracy is to compare the moments of the constructed aggregate distribution to the exact values that can be determined through manipulation of the frequency and claim size distributions.

That is why we have included the exact aggregate moment values for comparison in the ModelRisk aggregate De Pril, Panjer and FFT windows, in the exact column of the summary statistics table.





VoseAggregatePanjer

=VoseAggregatePanjer(Frequency distribution, Severity distribution, Intervals, MaxP, U)

Calculates the aggregate distribution for given Frequency and Severity Distribution Objects, using Panjer's recursive method.

• Frequency distribution - a discrete distribution object: can only be a Delaporte, Geometric, Logarithmic, Negative Binomial, Poisson or Polya.

• Severity distribution - a continuous distribution object.

• Intervals - an optional accuracy parameter.

• MaxP - a high percentile value used for the Severity calculation. Typically a value like 0.9999 (but smaller than 1).



When facing the problem of having to calculate the sum of a random number (represented by the claim frequency distribution) of randomly sized (represented by the claim size distribution) claims, there are a couple of ways to do this. One method is Panjer's recursive method (Panjer, 1981), which only works when the claim frequency distribution is one of the following distributions: Delaporte, Geometric, Logarithmic, Negative Binomial, Poisson or Polya.

Panjer's method is based on discretising the claim size distribution, which can seriously reduce the number of required computations.

The reason why not all distributions are allowed as a claim frequency distribution is that, in order to construct the compound distribution, the claim frequency distribution has to satisfy the relation:

$$p_n = \left(a + \frac{b}{n}\right) p_{n-1}$$

where a and b are constants, n = 1,2,3,... and p_n denotes the probability that exactly n claims occur.
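As a quick numerical check of this kind of relation, in its usual form p_n = (a + b/n) p_{n-1}, the Python sketch below verifies it for a Poisson and for a Negative Binomial frequency distribution. The function names and parameter values are hypothetical, and the Negative Binomial is taken with an integer shape parameter for simplicity.

```python
import math

def poisson_pmf(n, lam):
    """P(N = n) for a Poisson(lam) count."""
    return math.exp(-lam) * lam ** n / math.factorial(n)

def negbin_pmf(n, r, p):
    """P(N = n) failures before the r-th success, success probability p (integer r)."""
    return math.comb(n + r - 1, n) * p ** r * (1.0 - p) ** n

def satisfies_panjer(pmf, a, b, n_max=30, tol=1e-12):
    """Check p_n == (a + b / n) * p_{n-1} for n = 1..n_max."""
    return all(abs(pmf(n) - (a + b / n) * pmf(n - 1)) <= tol
               for n in range(1, n_max + 1))

lam, r, p = 3.0, 4, 0.6
# Poisson: a = 0, b = lam.
poisson_ok = satisfies_panjer(lambda n: poisson_pmf(n, lam), a=0.0, b=lam)
# Negative Binomial: a = 1 - p, b = (r - 1) * (1 - p).
negbin_ok = satisfies_panjer(lambda n: negbin_pmf(n, r, p),
                             a=1.0 - p, b=(r - 1) * (1.0 - p))
```

Both flags come out true, which is what makes a simple recursion over the aggregate probabilities possible for these frequency distributions.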






VoseAggregatePanjer generates values from this distribution or calculates a percentile.

VoseAggregatePanjerObject constructs a distribution object for this distribution.

VoseAggregatePanjerProb returns the probability density or cumulative distribution function for this distribution.

VoseAggregatePanjerProb10 returns the log10 of the probability density or cumulative distribution function.





Stop Sum

Introduction

The Stop Sum function is used to answer the following question: how many samples need to be drawn randomly from a specified distribution to meet (or just exceed) a specific total? The answer is in itself a distribution, from which we can generate random values.

To understand the use, consider the following example.

A company selling loan contracts wants to know how many contracts they will sell next year. The company employs seven sales people and each of them works 230 days a year. The time to make a loan contract can be modelled by a Gamma(3,5) distribution shifted by 10.

The number of contracts that will be sold in one year by one sales person is then:

=VoseStopSum(VoseGammaObject(3,5,,VoseShift(10)),230)

To know the total number of contracts that will be sold next year by the company, we add 7 generated values from this function together.

Window elements

The Distribution field takes a univariate Distribution Object. This can be linked to the spreadsheet or chosen directly from the Select Distribution window, by clicking the appropriate button next to the field.

The Total field takes the sum that needs to be met or just exceeded. This needs to be a positive real number.

In the Chart options region, the number of random samples to be drawn from the Stop Sum distribution can be chosen. By default this is 1000. It should be a positive integer larger than 100.





In the Output Location field, you can specify where in the spreadsheet to insert the randomly sampled values of the Stop Sum distribution. These will be inserted upon pressing the OK button.

Two graphs are shown. With the M and C buttons, you can switch between viewing cumulative or normal (probability density/mass) graphs.

With the generate button you can generate a new set of random values of the Stop Sum distribution.

Above is the graph of the univariate Distribution object chosen in the Distribution field.

Below is the graph of the resulting Stop Sum distribution.

On the right of each graph, summary statistics (like the mean, standard deviation, percentiles...) are shown.





VoseStopSum

VoseStopSum(Distribution,total)

This function generates random values from the distribution of how many samples need to be drawn randomly from a specified distribution to meet or just exceed a specific total. The parameters are:

• Distribution - a univariate distribution Object.

• Total - the sum that needs to be met or just exceeded.

Example

A company selling loan contracts wants to know how many contracts they will sell next year. The company employs seven sales people and each of them works 230 days a year. The time to make a loan contract can be modelled by a Gamma(3,5) distribution shifted by 10, i.e. the distribution object VoseGammaObject(3,5,,VoseShift(10)). The number of contracts that will be sold in one year by one sales person is then:

=VoseStopSum(VoseGammaObject(3,5,,VoseShift(10)),230)

To know the total number of contracts that will be sold next year by the company, we add 7 generated values from this function together.
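The loan contract example can be mimicked outside the spreadsheet with a few lines of Python. This is only a sketch of the stop-sum idea, not the VoseStopSum implementation; the function names are hypothetical, and Gamma(3,5) is assumed to mean shape 3 and scale 5.

```python
import random

def stop_sum(sample, total, rng):
    """Number of draws from `sample` needed for the running sum to meet or exceed `total`."""
    running, count = 0.0, 0
    while running < total:
        running += sample(rng)
        count += 1
    return count

def days_per_contract(rng):
    # Assumed reading of Gamma(3,5) shifted by 10: shape 3, scale 5, plus 10 days.
    return 10.0 + rng.gammavariate(3.0, 5.0)

rng = random.Random(1)
# One stop-sum sample per sales person (230 working days each), then add the 7 values.
per_person = [stop_sum(days_per_contract, 230.0, rng) for _ in range(7)]
company_total = sum(per_person)
```

Because every contract takes at least 10 days under this reading, no sales person can be credited with more than 23 contracts in 230 days, which gives a simple sanity bound on the simulated values.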





Sum Product

The Sum Product function calculates the sum of N terms, in which each term is the product of some sampled random values.

For example, say you want to model 20 (=N) customers that each generate a revenue of $ModPERT(0,100,200,2), with a profit margin of Beta(8,2). This would be modeled with a Sum Product calculation.


Output functions of this window: VoseSumProduct

Window elements

In the N field, you can enter the number of terms to be summed. This should be a positive integer value. This can be dynamically linked to a spreadsheet cell (as you will often want to do).

In the Distributions field, you can insert the Distribution Objects that will be sampled from in constructing the terms. These can be selected from a list, or from the spreadsheet. A new distribution is added by clicking anywhere in the white space. You can remove a distribution by clicking the X button below the field. Note that at least one distribution should be specified, but there is no upper limit on the number of distributions.

In the Chart options region, you can provide the number of samples to be drawn for the preview graphs, and the number of bars (or "bins") these should be grouped in for the histogram plot.

Two preview graphs are shown. With the M and C buttons, you can switch between viewing cumulative or normal (probability density/mass) graphs.

With the generate button you can generate a new set of random values of the Sum Product distribution.





On top is the graph of the selected distribution (marked with the blue arrow in front of it).

Below is a histogram plot with samples generated from the Sum Product distribution.





VoseSumProduct

VoseSumProduct(N,DistributionObject1,DistributionObject2,...)

Calculates the sum of N terms, in which each term is the product of some sampled random values.

• N - the number of terms to add

• DistributionObjecti - univariate distribution object from which the ith factor in each term is sampled.

For example, say you want to model 20 (=N) customers that each generate a revenue of $ModPERT(0,100,200,2), with a profit margin of Beta(8,2). This would be modeled with a Sum Product calculation as follows:

=VoseSumProduct(20,VoseModPertObject(0,100,200,2),VoseBetaObject(8,2))
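The same calculation can be sketched in Python to show what is being summed: each of the 20 terms multiplies one fresh revenue sample by one fresh margin sample. The helper names are hypothetical, and the ModPERT sampler uses an assumed parameterization (a Beta rescaled to the minimum-maximum range, with shape parameters set by the mode and the fourth argument).

```python
import random

def sum_product(n_terms, samplers, rng):
    """Sum of n_terms terms; each term is the product of one fresh draw from every sampler."""
    total = 0.0
    for _ in range(n_terms):
        term = 1.0
        for sample in samplers:
            term *= sample(rng)
        total += term
    return total

def mod_pert(minimum, mode, maximum, gamma):
    """Sampler for a modified PERT via a rescaled Beta (assumed parameterization)."""
    span = maximum - minimum
    alpha1 = 1.0 + gamma * (mode - minimum) / span
    alpha2 = 1.0 + gamma * (maximum - mode) / span
    return lambda rng: minimum + span * rng.betavariate(alpha1, alpha2)

rng = random.Random(7)
revenue = mod_pert(0.0, 100.0, 200.0, 2.0)
margin = lambda r: r.betavariate(8.0, 2.0)
profits = [sum_product(20, [revenue, margin], rng) for _ in range(5000)]
mean_profit = sum(profits) / len(profits)
```

Under these assumptions the mean revenue is 100 and the mean margin is 0.8, so mean_profit should settle near 20 x 100 x 0.8 = 1600.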

Also see the Vose SumProduct window topic for an explanation of the window for this function.

See the topic Discounted cashflow modeling for a worked-out example that uses the VoseSumProduct function to model revenue for a TV series.





VoseAggregateDeduct

=VoseAggregateDeduct(N,Cost Distribution, Deductible, MaxLimit)

Directly simulates the sum of N variables from the Cost Distribution, where the cost distribution can optionally be modified with a deductible or maximum payout limit. N can be a fixed integer or come from a discrete distribution itself.

• N - the number of variables to sum. Can be either an integer number or a value simulated from a discrete distribution (e.g. VosePoisson(50)).

• Cost Distribution - a non-negative distribution object.

• Deductible - (optional) the deductible.

• MaxLimit - (optional) the maximum payout limit.

The use of the deductible means that the insurance company does not pay out the first x of the damage described by the cost distribution. To account for the deductible, the cost distribution is truncated and shifted to the right. When a value below the deductible occurs it is simulated as a cost of zero.

The optional MaxLimit parameter allows one to restrict the claim size that an insurance company pays out to no more than MaxLimit. So when a payout higher than MaxLimit occurs it will be simulated as MaxLimit.
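One common way to read a deductible and a payout limit is sketched below in Python. It is an assumption about the payout rule, not a statement of how VoseAggregateDeduct is implemented: the insurer pays the damage above the deductible (zero if the damage is below it), capped at the maximum limit. All names and parameter values are hypothetical.

```python
import random

def payout(cost, deductible=0.0, max_limit=float("inf")):
    """Insurer's payout for one claim: nothing below the deductible, capped at max_limit."""
    return min(max(cost - deductible, 0.0), max_limit)

def aggregate_deduct(n, sample_cost, deductible, max_limit, rng):
    """One simulated total payout over n claims."""
    return sum(payout(sample_cost(rng), deductible, max_limit) for _ in range(n))

rng = random.Random(3)
# Hypothetical inputs: 50 claims, lognormal damage, deductible 500, limit 5000.
total = aggregate_deduct(50, lambda r: r.lognormvariate(7.0, 1.0), 500.0, 5000.0, rng)
```

With these inputs a damage of 300 pays nothing, a damage of 1500 pays 1000, and a damage of 9000 pays the limit of 5000.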





VoseAggregateMoments

=VoseAggregateMoments(Frequency distribution,Severity distribution)

This array function directly calculates the first four statistical moments (mean, variance, skewness, kurtosis) of the aggregate distribution that results from aggregating the Frequency distribution and the Severity distribution.

• Frequency - a discrete distribution Object. Alternatively you can use an integer number n, to calculate the moments of the sum of n variables.

• Severity - any distribution object.

The output is either a 4x1 (1x4) array, in which case the numerical values of the moments will be returned, or a 4x2 (2x4) array, in which case the numerical values will be returned with labels.

The moments of an aggregate distribution can be calculated directly from those of the frequency and severity distributions.

For example, if the frequency distribution has mean, variance and skewness of $\mu_F$, $V_F$, and $S_F$ respectively, and the severity distribution has mean, variance and skewness of $\mu_C$, $V_C$, and $S_C$ respectively, then these are the formulas for the first three moments:

Aggregate distribution moments

Mean: $\mu = \mu_F \mu_C$

Variance: $V = \mu_F V_C + V_F \mu_C^2$

Skewness: $S = \dfrac{\mu_F S_C V_C^{3/2} + 3 V_F \mu_C V_C + S_F V_F^{3/2} \mu_C^3}{V^{3/2}}$
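The standard moment relations for a random sum of independent, identically distributed severities can be written as a small Python function. This is a sketch of the textbook formulas rather than the VoseAggregateMoments code; the function and argument names are hypothetical. The example uses a Poisson frequency, for which the results can be checked by hand.

```python
import math

def aggregate_moments(mu_f, var_f, skew_f, mu_c, var_c, skew_c):
    """Mean, variance and skewness of a random sum, from frequency (f) and severity (c) moments."""
    mean = mu_f * mu_c
    variance = mu_f * var_c + var_f * mu_c ** 2
    third_central = (mu_f * skew_c * var_c ** 1.5
                     + 3.0 * var_f * mu_c * var_c
                     + skew_f * var_f ** 1.5 * mu_c ** 3)
    return mean, variance, third_central / variance ** 1.5

# Poisson(4) frequency: mean 4, variance 4, skewness 1/sqrt(4).
# Exponential severity with mean 2: variance 4, skewness 2.
lam = 4.0
mean, variance, skewness = aggregate_moments(lam, lam, 1 / math.sqrt(lam), 2.0, 4.0, 2.0)
```

For a compound Poisson sum the mean, variance and third central moment equal lam times the first, second and third raw moments of the severity (here 2, 8 and 48), giving 8, 32 and 192, which matches what the function returns.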

Using aggregate moments to demonstrate CLT

Using the VoseAggregateMoments function with a number n as the frequency argument provides a nice illustration of the Central Limit Theorem. The larger you make n, the closer the skewness and kurtosis will come to the Normal skewness and kurtosis of 0 and 3 respectively. Try inserting

{=VoseAggregateMoments(n,VoseTriangleObject(0,1,4))}

using larger and larger values of n. As you use larger n, the skewness and kurtosis (indicating the shape of the aggregate distribution) will get closer and closer to the Normal values of 0 and 3.
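The convergence can also be seen from the closed-form scaling for a sum of n independent copies of one variable: the skewness shrinks with the square root of n and the excess kurtosis shrinks with n. The Python sketch below applies this to a Triangle(0,1,4) variable, read as minimum 0, mode 1, maximum 4 (an assumption about the argument order); the function names are hypothetical.

```python
import math

def sum_shape(n, skewness, kurtosis):
    """Skewness and kurtosis of the sum of n independent copies of one variable."""
    return skewness / math.sqrt(n), 3.0 + (kurtosis - 3.0) / n

def triangle_skewness(a, c, b):
    """Skewness of a Triangle distribution with minimum a, mode c and maximum b."""
    numerator = math.sqrt(2.0) * (a + b - 2 * c) * (2 * a - b - c) * (a - 2 * b + c)
    denominator = 5.0 * (a * a + b * b + c * c - a * b - a * c - b * c) ** 1.5
    return numerator / denominator

skew_1 = triangle_skewness(0.0, 1.0, 4.0)
# Every Triangle distribution has kurtosis 2.4.
shapes = {n: sum_shape(n, skew_1, 2.4) for n in (1, 10, 100, 1000)}
```

Printing shapes shows the skewness falling towards 0 and the kurtosis rising towards 3 as n grows.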

Using aggregate moments to check for accuracy of an aggregate calculation

ModelRisk offers several aggregate functions designed to directly determine the distribution of the sum of a random number of random variables independently drawn from the same distribution. The Panjer and FFT algorithms are based on well-known and commonly implemented numerical techniques. Whilst these techniques are generally very accurate, it is wise for the user to ensure that the numerical result is within the level of accuracy required.

The most direct way of testing the required accuracy is to compare the moments of the aggregate distribution to the precise values that can be determined through manipulation of the frequency and claim size distributions.

That is why we have included these values for comparison in the Panjer, De Pril or FFT windows, in the exact column of the summary statistics table.





VoseAggregateMultiMoments

=VoseAggregateMultiMoments({Frequency distributions},{Severity distributions},{correlation matrix})

Array function that returns the first four statistical moments (mean, variance, skewness, kurtosis) of the aggregate distribution constructed by aggregating several severity/frequency distribution pairs together. Optionally, the correlation between the frequencies can be specified.

• {Frequency distributions} - an array of discrete distribution objects. Should be a 1xn or nx1 array.

• {Severity distributions} - an array of severity distribution objects. Should be a 1xn or nx1 array.

• {correlation matrix} - optional parameter specifying the matrix of correlations between the frequencies. If omitted, the frequencies are assumed to be uncorrelated.

The output is either a 4x1 (1x4) array, in which case the numerical values of the moments will be returned, or a 4x2 (2x4) array, in which case the numerical values will be returned with labels.





VoseAggregateProduct

=VoseAggregateProduct(Frequency distribution, Exposure distribution, Lossfraction distribution, U)

Directly constructs the distribution of the sum of a random number of random variables, where the variables to be summed are each the product of an Exposure variable multiplied by a LossFraction variable on [0,1]:

$$S = \sum_{i=1}^{N} E_i \, F_i$$

where N is drawn from the frequency distribution, each E_i from the exposure distribution and each F_i from the loss fraction distribution.

This function is useful when modeling credit risk, where we generally have separate distributions for the amount of exposure a debt holder has, and the fraction of that exposure that is realized as a loss.

• Frequency distribution - a claim frequency distribution object.

• Exposure distribution - a claim size distribution object.

• Lossfraction distribution - a distribution object with domain between zero and one.

• U - optional parameter specifying the cumulative percentile of the distribution. If omitted, the function generates random values.

In the routine performed by this function, the density f_L(x) of the individual loss distribution is calculated as follows:

$$f_L(x) = \int_0^1 f_F(t)\, f_E\!\left(\frac{x}{t}\right) \frac{1}{t}\, dt$$

where f_F( ) is the density function for the loss fraction distribution and f_E( ) is the density function for the exposure distribution. The aggregate distribution is then constructed directly using a Fast Fourier Transform, meaning we can do probability calculations on it and take advantage of the U parameter.

Example

Say we want to model the total loss of an insurance policy. We assume a Poisson(1000) number of claims, each of LogNormal(100,20) size, but for each claim we can recover part of the payout again, so the eventual cost of an individual claim event is only a fraction of the payout:

Lossfraction*Exposure

where Lossfraction is a Beta(13,15) variable. The total cost is then modeled by:

=VoseAggregateProduct(VosePoissonObject(1000),VoseLogNormalObject(100,20),VoseBetaObject(13,15))
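For comparison, the same example can be simulated by brute force in Python. This sketch uses plain Monte Carlo rather than the FFT construction that the function description mentions, and it assumes that LogNormal(100,20) is specified by the mean and standard deviation of the lognormal variable itself. All helper names are hypothetical.

```python
import math
import random

def sample_poisson(lam, rng):
    """Poisson(lam) draw: count unit-rate exponential arrivals before time lam."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed < lam:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

def lognormal_from_mean_sd(mean, sd):
    """Sampler for a lognormal given its own mean and standard deviation (assumed meaning)."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return lambda rng: rng.lognormvariate(mu, math.sqrt(sigma2))

def aggregate_product(lam, sample_exposure, sample_fraction, rng):
    """One simulated total loss: sum of exposure * loss fraction over a Poisson number of claims."""
    return sum(sample_exposure(rng) * sample_fraction(rng)
               for _ in range(sample_poisson(lam, rng)))

rng = random.Random(11)
exposure = lognormal_from_mean_sd(100.0, 20.0)
fraction = lambda r: r.betavariate(13.0, 15.0)
totals = [aggregate_product(1000.0, exposure, fraction, rng) for _ in range(200)]
mean_total = sum(totals) / len(totals)
# Exact mean: E[N] * E[Exposure] * E[Lossfraction] = 1000 * 100 * 13/28.
expected_mean = 1000.0 * 100.0 * 13.0 / 28.0
```

The simulated mean should sit close to expected_mean; a directly constructed distribution gives the same check without sampling noise.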


VoseAggregateProduct generates values from this distribution or calculates a percentile.

VoseAggregateProductObject constructs a distribution object for this distribution.





VoseAggregateProductProb returns the probability density or cumulative distribution function for this distribution.

VoseAggregateProductProb10 returns the log10 of the probability density or cumulative distribution function.





VoseAggregateTranche

=VoseAggregateTranche(Frequency, SeverityDistributionObject, {TrancheMinima}, {TrancheMaxima})

Insurance companies often share the exposure they take on in providing insurance cover by splitting the coverage into tranches. For example, in the following graph insurance is being provided against a possible damage that is estimated to follow a Lognormal distribution with a mean of $6000 and standard deviation of $7000. There is a deductible of $3000, which means that the insured party pays the first $3000 of any damage. There are three other tranches of cover:

1. 3000 - 8000

2. 8000 - 15000

3. >15000

VoseAggregateTranche is an array function with length equal to the number of tranches defined within the function. It returns random samples of the total amount that would be paid out in each tranche.

Let there be T tranches. Then the function applies the following logic:

Sample from the frequency distribution (let k be the sampled value, an integer);

For i = 1 to k:
    Take a random sample from the severity distribution (let S(i) be its value)
    For t = 1 to T:
        SUM(t) = SUM(t) + IF(S(i) > Min(t), MIN(S(i) - Min(t), Max(t) - Min(t)), 0)
    Next t
Next i

End

The result is an array of values SUM(1)…SUM(T) containing the payouts for each tranche. Note that "+infinity" is an allowed input for a maximum value of a tranche.
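The tranche logic can be sketched in Python as follows. This is an illustration of the pseudocode only, not ModelRisk's implementation: the function name and the two sampler callables are hypothetical, and the check uses fixed severities so the arithmetic can be followed by hand.

```python
import math


def aggregate_tranche(sample_frequency, sample_severity, minima, maxima):
    """Total payout per tranche for one sampled number of claims."""
    totals = [0.0] * len(minima)
    k = sample_frequency()
    for _ in range(k):
        severity = sample_severity()
        for t, (low, high) in enumerate(zip(minima, maxima)):
            if severity > low:
                # The tranche pays the part of the claim above its minimum,
                # capped at the width of the tranche.
                totals[t] += min(severity - low, high - low)
    return totals


# Deterministic check: two claims of 5000 and 20000 against the three tranches above
severities = iter([5000.0, 20000.0])
payouts = aggregate_tranche(
    sample_frequency=lambda: 2,
    sample_severity=lambda: next(severities),
    minima=[3000, 8000, 15000],
    maxima=[8000, 15000, math.inf],
)
print(payouts)  # [7000.0, 7000.0, 5000.0]
```

Because every tranche is evaluated against the same sampled claims, the payouts in the output array move together, which is the correlation the function is designed to retain.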

The main additional value provided by VoseAggregateTranche is that the correlation between exposures in each tranche is retained. Thus, an insurer can correctly gauge the exposure of covering more than one tranche, or fractions of several tranches.

Note: It is not required that the {Min} and {Max} arrays are non-overlapping, but the user should use caution in this situation since summing any of the overlapping parts of the output array will double count the exposure.

It is also not required that the {Min} and {Max} arrays cover the entire domain of the severity distribution. For example, an insurer may only be interested in two non-contiguous tranches it proposes to cover. However, the sum of the output array will then no longer be the aggregate exposure to all insurers.





Optimization

The OptQuest Optimizer

The Professional and Industrial editions of ModelRisk incorporate OptQuest by OptTek Systems, Inc., the world's most popular stochastic optimization engine. Clicking the OptQuest icon opens the Optimization Settings interface.

The optimization variables that can be inserted into the model are ordered into different

categories by tabs: targets, decision variables, constraints and requirements.

The Options tab provides a number of controls for how the Optimizer will run.





Clicking the arrow on the OptQuest icon shows two more controls:

The three controls follow the natural order of an Optimization exercise: set up the model;

run the optimizer; and review the results.

Clicking ‘Run optimization’ will begin an optimization run. If an optimization run has already been performed on the current model, you will be prompted whether you wish to continue with the previous run, or start afresh.

Clicking ‘Optimization Results’ will open the Optimization Results window.





Defining Targets in an Optimization Model

Targets are the variables that are to be optimized. For example, one might wish to minimize

the expected (mean) cost of a project or number of people made redundant; or maximize

the expected profit of a venture; or minimize the standard deviation (spread) of the

performance specification of a machine; or minimize the 99th percentile of potential trading

losses.

Clicking the Add button in the Targets tab opens a dialog for you to select the spreadsheet location where the decision target is to be placed. This is then entered into the Address field and can be edited later by double-clicking this field.

Address is the cell position of the decision target that is to be monitored. This cell should

already have a formula entered for the Optimizer to work with.

Name is a text field that identifies the target.

Type provides a list of options: minimize; maximize; or set to a specific value (Value). If

one chooses ‘Value’ the Value field is enabled to enter the desired target value.





Statistic is used to specify the statistic when the Value option has been selected for Type.

Options are: Mean; Median; a Percentile (in which case the Percentile value field becomes

active requiring an input on (0,1)); Min; Max; StDev (standard deviation); Variance; Range

(the difference between the minimum and maximum observed values); Skewness; Kurtosis;

CofV (coefficient of variation); CVaRp and CVaRx (Conditional value at risk calculations);

and FinValue (the final value of a simulation, used if one builds a model that simulates

across iterations in some way).

Enabled switches the target definition on and off: ticked (=TRUE in the Excel function) indicates it is active, not ticked (FALSE) indicates that it is disabled. This allows one to have several targets defined in the model and build controls that switch those targets on and off.

Entering the information as shown above and selecting OK will add this target definition to

the Optimization Settings:

In Cell C27 of the model the following ModelRisk function has now been added:

=[formula]+VoseOptTargetMaximize("Total profit","Mean",TRUE)

where [formula] is the original Excel formula that was in this cell.

Vose Optimization Target Functions

ModelRisk incorporates three functions for defining target cells to optimize:

VoseOptTargetMaximize(Name, Statistic, Enabled) for a variable to maximize

VoseOptTargetMinimize(Name, Statistic, Enabled) for a variable to minimize

VoseOptTargetValue(Name, Statistic, Value, Enabled) for a variable to set as close as

possible to a specific Value

Options for the function parameters are:





Name: a text string describing the variable to be optimized

Statistic: “Mean”; “Median”; a value between zero and one for the Percentile option, “Min”, “Max”, “StDev”, “Variance”, “Range”, “Skewness”, “Kurtosis”, and “FinValue”

Value: any real value (VoseOptTargetValue only)

Enabled: a Boolean parameter taking either TRUE (or equivalently 1) when the target

definition is to be used, or FALSE (or equivalently 0) when the target definition is not to be

used.
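To make the Statistic options concrete, the sketch below computes several of them from a list of simulated output values. It is only an illustration: the function names are hypothetical, the use of the sample standard deviation and of linear interpolation for percentiles are assumptions, and Skewness and Kurtosis are left out because their conventions vary between tools.

```python
import math
import statistics


def percentile(samples, p):
    """Empirical percentile with linear interpolation between order statistics."""
    xs = sorted(samples)
    pos = p * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def target_statistic(samples, statistic):
    """Evaluate a named statistic, or a percentile when `statistic` is a number in (0, 1)."""
    if isinstance(statistic, (int, float)):
        return percentile(samples, statistic)
    mean = statistics.fmean(samples)
    stdev = statistics.stdev(samples)  # sample standard deviation (assumed convention)
    table = {
        "Mean": lambda: mean,
        "Median": lambda: statistics.median(samples),
        "Min": lambda: min(samples),
        "Max": lambda: max(samples),
        "StDev": lambda: stdev,
        "Variance": lambda: stdev ** 2,
        "Range": lambda: max(samples) - min(samples),
        "CofV": lambda: stdev / mean,
        "FinValue": lambda: samples[-1],  # value from the last iteration
    }
    return table[statistic]()


outputs = [2, 4, 4, 4, 5, 5, 7, 9]
print(target_statistic(outputs, "Mean"), target_statistic(outputs, 0.5))  # 5.0 4.5
```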

Example model

Optimization example model





VoseOptTargetMaximize

VoseOptTargetMaximize( Name, Statistic, Enabled )

Example model

This function is used to mark calculation cells with the Maximization Targets.

• Name - Target Name

• Statistic - Statistic to be calculated for simulation optimization. Available options are: "Mean","Median","Min","Max","StDev","Variance","Range","Skewness","Kurtosis","CofV","FinValue". Can also take VoseOptPercentile, VoseOptCVARx or VoseOptCVARp functions.

• Enabled - Set to TRUE if the Target is Enabled. There should be only one enabled Target on the sheet

For more information about this function refer to the topic on Optimization Targets.





VoseOptTargetMinimize

VoseOptTargetMinimize( Name, Statistic, Enabled )

Example model

This function is used to mark calculation cells with the Minimization Targets.


• Statistic - Statistic to be calculated for simulation optimization. The available options are the same as for VoseOptTargetMaximize.

• Enabled - Set to TRUE if the Target is Enabled. There should be only one enabled Target on the sheet






VoseOptTargetValue

See also: VoseOptTargetMaximize, VoseOptTargetMinimize

VoseOptTargetValue( Name, Statistic, Value, Enabled )

Example model

This function is used to mark calculation cells with the Value Targets.


• Statistic - Statistic to be calculated for simulation optimization. Available options are: "Mean","Median","Min","Max","StDev","Variance","Range","Skewness","Kurtosis","CofV","FinValue". Can also take VoseOptPercentile, VoseOptCVARx or VoseOptCVARp functions.

• Value - Optimization Target Value

• Enabled - Set to TRUE if the Target is Enabled. There should be only one enabled Target on the sheet






Defining Decision Variables in an Optimization Model

Decision variables are the variables within a model that one can control. They are not

random variables. For example, a decision variable might be: whether to vaccinate a

population (TRUE or FALSE); the amount of budget to spend (a continuous variable between

some minimum and maximum); or how many cars to have in a car pool (a discrete variable

between some minimum and maximum).

Clicking the Add button in the Decision Variables tab opens a dialog for you to select the spreadsheet location where the decision variable is to be placed. This is then entered into the Address field and can be edited later by double-clicking this field. The Decision Variable can then be specified by editing other entries in this line of the table:

Address is the cell position of the decision variable. This cell should already have a fixed

possible value entered for the Optimizer to work with. The content of the cell should not

be a formula. The Optimizer will change this value during its optimization routine.

Name is a text field that identifies the decision variable.





Mode provides a list of options: Discrete; Continuous; Boolean; or List. If one chooses ‘List’ the dialog suppresses the Lower bound, Upper bound and Step fields, and makes the List field active.

Lower bound specifies the lower bound of the range of values that can be tested when the

Mode is either Discrete or Continuous.

Upper bound specifies the upper bound of the range of values that can be tested when the Mode is either Discrete or Continuous.

Step specifies the increments between the lower and upper bounds when the Mode is

Discrete. This will usually be 1. The difference between Upper Bound and Lower Bound must

equal an integer number of Step values.

List specifies the values that will be tried when the Mode is List. This can be entered as a

list within curly brackets {..} or a cell range within the spreadsheet (which is generally

better modeling practice).

Enabled is a Boolean parameter switching the decision variable definition on and off: TRUE indicates it is active, FALSE indicates that it is disabled.

Entering the information as described above and selecting OK will add the decision variable

definition to your model. So, for example, where a value of 1000 had previously been in the

selected cell, the following formula might now appear:

=VoseOptDecisionDiscrete("var10W",0,10000,1,TRUE)+1000

where the decision variable is discrete (hence the function name), has been given the name ‘var10W’, runs from 0 to 10000 in steps of 1 and is active (last parameter = TRUE).

Vose Optimization Decision Variable Functions

ModelRisk incorporates four functions for defining decision variables to optimize:

VoseOptDecisionContinuous(Name, LowerBound, UpperBound, Enabled) for a

continuous variable

VoseOptDecisionDiscrete(Name, LowerBound, UpperBound, Step, Enabled) for a

discrete variable

VoseOptDecisionBoolean(Name, Enabled) for a Boolean variable

VoseOptDecisionList(Name, List, Enabled) for a variable taking a value from a list of

possible candidates


Name: a text field identifying the decision variable to the user

LowerBound: any numerical value

UpperBound: any numerical value greater than LowerBound

List: any set of different numerical values

Step: any positive value with the restriction that (UpperBound – LowerBound)/Step must

be an integer value





Enabled: a Boolean parameter taking either TRUE (or equivalently 1) when the decision

variable is to be used, or FALSE (or equivalently 0) when the decision variable is not to be

used.
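The restriction on Step can be checked before a discrete decision variable is defined. The following sketch, with a hypothetical function name, lists the values a discrete variable could take and rejects a Step that does not divide the range into a whole number of increments. It illustrates the rule only and is not part of ModelRisk.

```python
import math


def discrete_candidates(lower, upper, step):
    """Return the values LowerBound, LowerBound + Step, ..., UpperBound."""
    if step <= 0:
        raise ValueError("Step must be positive")
    if upper <= lower:
        raise ValueError("UpperBound must be greater than LowerBound")
    n_steps = (upper - lower) / step
    if not math.isclose(n_steps, round(n_steps)):
        raise ValueError("(UpperBound - LowerBound)/Step must be an integer")
    return [lower + i * step for i in range(round(n_steps) + 1)]


print(discrete_candidates(0, 10, 2.5))  # [0.0, 2.5, 5.0, 7.5, 10.0]
```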

Example model






VoseOptDecisionBoolean

VoseOptDecisionBoolean( Name, Enabled )

Example model

This function is used to mark cells with the Optimization Boolean Decision Variables.

• Name - Name of the Decision Variable

• Enabled - Set to TRUE if Decision Variable is Enabled. There must be at least one enabled

Decision Variable on the sheet

For more information about this function refer to the topic on Optimization Decision Variables.





VoseOptDecisionContinuous

VoseOptDecisionContinuous( Name, LowerBound, UpperBound, Enabled )

Example model

This function is used to mark cells with the Optimization Continuous Decision Variable.


• LowerBound - Lower bound of the Decision variable

• UpperBound - Upper bound of the Decision variable

• Enabled - Set to TRUE if Decision Variable is Enabled. There must be at least one enabled Decision Variable on the sheet






VoseOptDecisionDiscrete

VoseOptDecisionDiscrete( Name, LowerBound, UpperBound, Step, Enabled )

Example model

This function is used to mark cells with the Optimization Discrete Decision Variables.


• LowerBound - Lower bound of the Decision variable

• UpperBound - Upper bound of the Decision variable

• Step - Step parameter








VoseOptDecisionList

VoseOptDecisionList( Name, {List}, Enabled )

Example model

This function is used to mark cells with the Optimization List of values Decision Variables.


• {List} - Defines the list of values








Defining Decision Constraints in an Optimization Model

Decision Constraints define rules on the acceptable values for decision variables. They are

checked at the beginning of a simulation run when a new set of decision variables is being

tested. For example, a constraint might be that the minimum stock held is three units, or

that the maximum number of people not receiving an appropriate level of service is ten, or

that 100% of a manufactured product is distributed between shipments.

Adding a normal constraint

Clicking the upper Add button in the Decision Constraints tab opens a dialog for you to

select the spreadsheet location where the decision constraint is to be placed. This is then

entered into the Address field and can be edited later by double-clicking this field.

Address is the cell position of the decision constraint that is to be monitored. This cell

should already have a formula entered for the Optimizer to work with.

Name is a text field that identifies the constraint.

Type provides a dropdown list of constraint options: Min; Max; Between or Equals.





Value1 specifies the value of the constraint. When Type is Equals, the constraint is that the

formula in the cell is precisely equal to Value1; when Type is Min the constraint is that the

formula is greater than or equal to Value1; and if Type is Max the constraint is that the

formula is less than or equal to Value1. Value1 will nearly always be a fixed value: if it is

linked to a spreadsheet cell, ensure that the cell referred to is not generating random

values.

Value2 field is only available if Type is Between, in which case the constraint is that the formula in the cell lies between Value1 and Value2. Value2 must be > Value1. ModelRisk will generate an error message if not and will switch the constraint off, showing it to be invalid in the dialog box.

Enabled is a tick box switching the constraint on and off. In the Excel formula, this appears

as TRUE or FALSE respectively.

Entering the information as shown above will add the constraint to the Optimization

Settings. In the example above, where the first variable ‘Component 2’ in the cell is

constrained to have a minimum of 0, the following formula now appears:

= [formula] + VoseOptConstraintMin("Component2",0,TRUE)

where [formula] is the equation that was in the cell before the constraint was added.
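The four constraint types can be summarized in a few lines of Python. This is a sketch of the rules described above, not ModelRisk code: the function name is hypothetical, and the small tolerance used for Equals is an assumption added because floating-point results rarely match exactly.

```python
def constraint_satisfied(cell_value, constraint_type, value1, value2=None, tol=1e-9):
    """Check a cell value against a Min, Max, Equals or Between constraint."""
    if constraint_type == "Min":
        return cell_value >= value1
    if constraint_type == "Max":
        return cell_value <= value1
    if constraint_type == "Equals":
        return abs(cell_value - value1) <= tol
    if constraint_type == "Between":
        if value2 is None or value2 <= value1:
            raise ValueError("Value2 must be greater than Value1")
        return value1 <= cell_value <= value2
    raise ValueError(f"Unknown constraint type: {constraint_type}")


print(constraint_satisfied(0, "Min", 0))           # True
print(constraint_satisfied(12, "Between", 0, 10))  # False
```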

Adding a constraint based on a formula

Clicking the lower Add button in the String Constraints section allows you to create a constraint based on a formula:

You can build up a formula based on standard Excel notation and include any decision

variables by clicking the Add variables button:





The formula can be edited or deleted later by clicking the Edit and Delete buttons

respectively.

VoseOptimization Decision Constraints functions

ModelRisk incorporates the following functions for defining decision constraints:

VoseOptConstraintMin(Name, Value, Enabled)

VoseOptConstraintMax(Name, Value, Enabled)

VoseOptConstraintEquals(Name, Value, Enabled)

VoseOptDecisionList(Name, List, Enabled) for a variable taking a value from a list of

possible candidates

VoseOptConstraintString(Name, String, Enabled) if a constraint can be formulated by

a single string involving decision variables



Value: any numerical value


Enabled: a Boolean parameter taking either TRUE (or equivalently 1) when the decision constraint is to be used, or FALSE (or equivalently 0) when the decision constraint is not to be used.

Example model






VoseOptConstraintMax

VoseOptConstraintMax( Name, Value, Enabled )

Example model

This function is used to mark calculation cells with the upper bound constraints.

• Name - Constraint Name

• Value - Upper bound of the constraint

• Enabled - Set to TRUE if Constraint is Enabled

For more information about this function refer to the topic on Optimization Decision Constraints.





VoseOptConstraintEquals

VoseOptConstraintEquals( Name, Value, Enabled )

Example model

This function is used to mark calculation cells with the equality constraints.


• Value - Value to which the constraint should be equal







VoseOptConstraintString

VoseOptConstraintString( Name, String, Enabled )

Example model

This function is used to define linear string decision constraints.


• String – The linear constraint string that can contain Decision variable names


Linear constraints describe a linear relationship among decision variables. A linear constraint is a mathematical expression where linear terms (i.e., a coefficient multiplied by a decision variable) are added or subtracted and the resulting expression is forced to be greater-than-or-equal, less-than-or-equal, or exactly equal to a right-hand side value.

The following are examples of linear constraints on the decision variables:

Var1 + Var2 + Var3 + Var4 + Var5 = 10500

0 <= Var1 + 2*Var2 - Var3 <= 5000

Var1 - 3*Var5 >= 300

Var1 >= 6 or Var2 >= 6 or Var1+Var2 = 4
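The first three examples can be checked for a candidate set of decision variable values with a short Python sketch. It represents each constraint as coefficients with a lower and an upper bound, which is an illustrative choice and not the way ModelRisk parses the constraint string. The function name and the variable values are hypothetical.

```python
import math


def linear_constraint_holds(coefficients, values, lower=-math.inf, upper=math.inf, tol=1e-9):
    """Check lower <= sum(coefficient * variable) <= upper for one linear constraint."""
    total = sum(coeff * values[name] for name, coeff in coefficients.items())
    return lower - tol <= total <= upper + tol


# Hypothetical decision variable values chosen for illustration
values = {"Var1": 3000, "Var2": 500, "Var3": 2000, "Var4": 4100, "Var5": 900}

checks = [
    # Var1 + Var2 + Var3 + Var4 + Var5 = 10500
    linear_constraint_holds(dict.fromkeys(values, 1), values, lower=10500, upper=10500),
    # 0 <= Var1 + 2*Var2 - Var3 <= 5000
    linear_constraint_holds({"Var1": 1, "Var2": 2, "Var3": -1}, values, lower=0, upper=5000),
    # Var1 - 3*Var5 >= 300
    linear_constraint_holds({"Var1": 1, "Var5": -3}, values, lower=300),
]
print(checks)  # [True, True, True]
```

A constraint joined with "or", like the last example above, would be satisfied when any one of its component checks returns True.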






Defining Simulation Requirements in an Optimization Model

Simulation Requirements define rules on the acceptable values for stochastic variables

within the model. They are checked at the end of a simulation run when a new set of

decision variables has been tested and the simulated values for the variable in question can

be analyzed. Simulation requirements are probabilistic in nature, i.e. they define a

requirement for some statistical attribute of the variable in question.

For example, a requirement might be that the average delivery time is less than two weeks, or that the standard deviation of profit is less than 21%, or the expected (mean) profit is

greater than 12%, or that there is more than a 95% probability of finishing the project on

time.

Clicking the Add button in the Simulation Requirements tab opens a dialog for you to select

the spreadsheet location where the simulation requirement is to be placed. This is then

entered into the Address field and can be edited later by double-clicking this field.

Address is the cell position of the simulation requirement (constraint) that is to be monitored. This cell should already have a formula entered for the Optimizer to work with.

Name is a text field that identifies the constraint





Type provides a dropdown list of constraint options: Min; Max; Equals or Between

Value1 specifies the value of the constraint. When Type is Equals, the constraint is that the

formula in the cell is precisely equal to Value1; when Type is Min the constraint is that the

formula is greater than or equal to Value1; and if Type is Max the constraint is that the

formula is less than or equal to Value1. Value1 will nearly always be a fixed value: if it is

linked to a spreadsheet cell, ensure that the cell referred to is not generating random

values.

Value2 field is only available if Type is Between, in which case the constraint is that the

formula in the cell lies between Value1 and Value2. Value2 must be > Value1. ModelRisk will

generate an error message if not and will switch the constraint off, showing it to be invalid

in the dialog box.

Enabled is a tick box switching the constraint on and off. In the Excel formula, this appears

as TRUE or FALSE respectively.

Entering the information as described above will add the requirement to the model. In the

example above, where the first variable ‘Component 3’ in the cell is constrained to have a minimum 10th percentile of 0, the following formula now appears:

= [formula] + VoseOptRequirementMin("Component3",VoseOptPercentile(0.1),0,TRUE)

where [formula] is the equation that was in the cell before the constraint was added.
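A requirement of this kind is evaluated on the simulated values once a run has finished. The sketch below shows one way such a check could look for a minimum on the 10th percentile. The function names and the interpolation rule for the percentile are assumptions made for illustration, not ModelRisk's own method.

```python
import math


def percentile(samples, p):
    """Empirical percentile with linear interpolation between order statistics."""
    xs = sorted(samples)
    pos = p * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def requirement_min_satisfied(samples, p, value):
    """True when the p-th percentile of the simulated values is at least `value`."""
    return percentile(samples, p) >= value


# Hypothetical simulation outputs for two candidate solutions
solution_a = list(range(-5, 96))    # 10th percentile is about 5
solution_b = list(range(-20, 81))   # 10th percentile is about -10
print(requirement_min_satisfied(solution_a, 0.1, 0))
print(requirement_min_satisfied(solution_b, 0.1, 0))
```

Under this check, the first solution would be treated as feasible and the second as infeasible.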

VoseOptimization Simulation Requirements functions

ModelRisk incorporates the following functions for defining simulation requirements:

VoseOptRequirementMin(Name, Statistic, Value, Enabled)

VoseOptRequirementMax(Name, Statistic, Value, Enabled)

VoseOptRequirementEquals(Name, Statistic, Value, Enabled)



Statistic: “mean”; “median”; a value between zero and one for the Percentile option; “min”; “max”; “stdev” (standard deviation); “variance”; “range” (the difference between the minimum and maximum observed values); “skewness”; “kurtosis”; “CofV” (coefficient of variation); and “FinValue” (the final value of a simulation, used if one builds a model that simulates across iterations in some way).

Value: any numerical value


Enabled: a Boolean parameter taking either TRUE (or equivalently 1) when the simulation requirement is to be used, or FALSE (or equivalently 0) when the simulation requirement is not to be used.

Example model






VoseOptRequirementMin

VoseOptRequirementMin( Name, Statistic, Value, Enabled )

Example model

This function is used to mark calculation cells with the lower bound requirements.

• Name - Requirement Name

• Statistic - Requirement statistic. The available options are listed in the topic on Optimization Simulation Requirements.

• Value - Lower bound of the requirement

• Enabled - Set to TRUE if requirement is Enabled

For more information about this function refer to the topic on Optimization Simulation Requirements.





VoseOptRequirementMax

VoseOptRequirementMax( Name, Statistic, Value, Enabled )

Example model

This function is used to mark calculation cells with the upper bound requirements.




• Value - Upper bound of the requirement

• Enabled - Set to TRUE if requirement is Enabled






VoseOptRequirementBetween

VoseOptRequirementBetween( Name, Statistic, MinValue, MaxValue, Enabled )

Example model

This function is used to mark calculation cells with the dual bound requirements.




• MinValue - Lower bound of the requirement

• MaxValue - Upper bound of the requirement

• Enabled - Set to TRUE if requirement is Enabled






VoseOptRequirementEquals

VoseOptRequirementEquals( Name, Statistic, Value, Enabled )

Example model

This function is used to mark calculation cells with equality requirements.




• Value - Value to which the requirement should be equal

• Enabled - Set to TRUE if requirement is Enabled






VoseOptPercentile

Example model

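This function is used as a Statistic parameter for the cumulative percentile within the ModelRisk optimization requirements functions (VoseOptRequirementMin, VoseOptRequirementMax, VoseOptRequirementBetween and VoseOptRequirementEquals) and the ModelRisk optimization target functions: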

VoseOptTargetMaximize(Name, Statistic, Enabled)

VoseOptTargetMinimize(Name, Statistic, Enabled)

VoseOptTargetValue(Name, Statistic, Value, Enabled)

The VoseOptPercentile function allows the user to specify, for example, that the target variable to be optimised has a cumulative percentile (i.e. a quantile), as defined by its probability p, that is to be maximised, minimised or set equal to Value in the above three functions respectively.

Thus, for example,

VoseOptTargetValue("Cash", VoseOptPercentile(0.9), 200, TRUE)

will make the Optimizer attempt to find a solution such that the 90th percentile of the distribution of the variable “Cash” will equal 200.





VoseOptCVARx

Example model

This function is used as a Statistic parameter for Conditional Value-at-Risk calculated at a specific value of the variable within the ModelRisk optimization requirements functions and the ModelRisk optimization target functions VoseOptTargetMaximize, VoseOptTargetMinimize and VoseOptTargetValue.

The VoseOptCVARx function allows the user to specify, for example, that the target variable (the loss distribution) to be optimised has a Conditional Value-at-Risk at some cutoff value that is to be minimised, maximised or set equal to Value in VoseOptTargetMinimize, VoseOptTargetMaximize and VoseOptTargetValue respectively.

Thus, for example,

VoseOptTargetValue("Cash", VoseOptCVARx(120), 200, TRUE)

will make the Optimizer attempt to find a solution such that the CVAR from a cutoff of 120 of the distribution of the variable “Cash” will equal 200.
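As a rough illustration of a Conditional Value-at-Risk at a cutoff value, the sketch below averages the simulated values that lie beyond the cutoff. It assumes losses are recorded as positive numbers and that the tail of interest is the upper one. ModelRisk's exact definition is not given in this topic, so treat the function, which has a hypothetical name, as a sketch of the general idea.

```python
def cvar_at_value(samples, cutoff):
    """Mean of the simulated values that exceed the cutoff (upper tail assumed)."""
    tail = [x for x in samples if x > cutoff]
    if not tail:
        raise ValueError("no simulated values lie beyond the cutoff")
    return sum(tail) / len(tail)


losses = [50, 100, 130, 150, 170]
print(cvar_at_value(losses, 120))  # 150.0
```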





VoseOptCVARp

Example model

This function is used as a Statistic parameter for Conditional Value-at-Risk calculated at a specific probability within the ModelRisk optimization requirements functions and the ModelRisk optimization target functions VoseOptTargetMaximize, VoseOptTargetMinimize and VoseOptTargetValue.

The VoseOptCVARp function allows the user to specify, for example, that the target variable (the loss distribution) to be optimised has a Conditional Value-at-Risk, as defined by its probability p, that is to be minimised, maximised or set equal to Value in VoseOptTargetMinimize, VoseOptTargetMaximize and VoseOptTargetValue respectively.

Thus, for example,

VoseOptTargetValue("Cash", VoseOptCVARp(0.9), 200, TRUE)

will make the Optimizer attempt to find a solution such that the CVAR defined at the 90th percentile of the distribution of the variable “Cash” will equal 200.
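A Conditional Value-at-Risk defined by a probability can be sketched as the average of the worst (1 - p) share of the simulated values. The sketch assumes losses are positive numbers with the upper tail of interest, and uses a simple rule for the number of tail values. Both choices, and the function name, are assumptions made for illustration and are not ModelRisk's documented definition.

```python
import math


def cvar_at_probability(samples, p):
    """Mean of the largest (1 - p) share of the simulated values (upper tail assumed)."""
    xs = sorted(samples)
    # Rounding guards against floating-point noise such as 1.9999999999999996
    tail_size = max(1, math.ceil(round(len(xs) * (1 - p), 9)))
    tail = xs[-tail_size:]
    return sum(tail) / len(tail)


losses = list(range(1, 11))  # simulated losses 1, 2, ..., 10
print(cvar_at_probability(losses, 0.8))  # 9.5
```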





Optimization Settings Dialog

ModelRisk incorporates OptQuest by OptTek Systems, Inc., the world's most popular stochastic optimization engine. Clicking the OptQuest icon opens the Optimization Settings interface. The last tab provides a number of control options. These are described below:

Optimization control

The user can either specify the number of solutions to try or specify how long the optimizer should run.





Additional control

Selecting ‘Redraw Excel worksheet’ will show simulated random values at each sample of

the model. This can be a useful option for demonstrating what is happening during a model

simulation, but should generally be turned off because it greatly decreases the run speed.

Selecting ‘Show Results during Optimization’ will tabulate the progress of the optimizer

whilst it is running. Then selecting ‘Update Chart’ will plot the progress too.

Choose ‘fixed seed’ if you want ModelRisk to use the same set of random numbers for each

tested simulation run. This option will sometimes allow the optimizer to find a solution

quicker, particularly if one is optimizing on a volatile statistic of the target variable like an

extreme percentile, skewness or kurtosis. Choose ‘random seed’ to use different random

numbers for each tested simulation run. This may take a little longer, but may produce

more robust results.
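The effect of a fixed seed can be illustrated with a toy model in Python. The model, its numbers and the function name are hypothetical; the point is only that re-running a solution with the same seed reproduces the same statistic, whereas a different seed adds sampling noise that the optimizer has to see through.

```python
import random


def mean_profit(order_quantity, seed, iterations=1000):
    """Mean simulated profit for one candidate solution of a toy stock model."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(iterations):
        demand = max(rng.gauss(100, 30), 0.0)
        sold = min(order_quantity, demand)
        total += 5.0 * sold - 3.0 * order_quantity
    return total / iterations


fixed_a = mean_profit(100, seed=42)
fixed_b = mean_profit(100, seed=42)   # same random numbers, identical result
fresh = mean_profit(100, seed=43)     # different random numbers, slightly different result
print(fixed_a == fixed_b, fixed_a == fresh)
```

With a fixed seed, any difference between two candidate solutions comes from the decision variables alone, which is why volatile statistics can converge faster.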

Optimization type

Choose ‘With simulation’ (and the number of samples) if the model is a stochastic optimization.

Choose ‘Without simulation’ if the model is a deterministic optimization (i.e. the model has

no random simulating components).

Note: the ‘Without simulation’ option is incompatible with setting any Simulation

Requirements. Any specified simulation requirements are reinterpreted as follows:

• VoseOptRequirementMin(Name, Statistic, Value, Enabled) – calculation must exceed Value

• VoseOptRequirementMax(Name, Statistic, Value, Enabled) – calculation must be less than Value

• VoseOptRequirementEquals(Name, Statistic, Value, Enabled) – calculation must equal Value

Targets are reinterpreted for deterministic optimization as follows:

• VoseOptTargetMaximize(Name, Statistic, Enabled) – maximize calculation

• VoseOptTargetMinimize(Name, Statistic, Enabled) – minimize calculation

• VoseOptTargetValue(Name, Statistic, Value, Enabled) – get calculation as close to Value as possible

Decision variable cell

Choose ‘Auto set to best solution’ if you wish ModelRisk to place the best solution directly

into your model at the end of its optimization run.

Choose ‘Leave set to original value’ if you wish ModelRisk not to place the best solution

directly into your model at the end of its optimization run. You can still select a solution and

place it into your Excel model via the Optimization Results Window which appears at the end of

the optimization run.





The solution analysis tab allows you to list all of the solutions that have been tried. Controls

below the table allow you to look at different subsets of solutions.

Selecting to show infeasible solutions will produce a table with ‘N\ A’ entries for the target

variable. Clicking on any of the column headers will reorder the list: in the example below

the solutions are ordered by increasing coefficient of variation of the Portfolio return:

Best Solution tab

The graph shows the solutions that have been tried and which met the Decision Constraints and the Simulation Requirements. The horizontal axis is the index for the solution: in this case 100 different solutions were tried. The vertical axis shows the value of the Decision Target, in this case the mean of the ‘Portfolio return’, which the model was attempting to maximize. The red marker indicates the best solution found.

The table below the graph shows the statistics for this best solution:

Objective function describes what Target was being optimized and the value achieved

Decision variables shows the values of the decision variables that achieved the best

solution

Requirements describes any simulation requirements and returns the value of the variable

(in this case the coefficient of variation of the portfolio return) for comparison

Constraints shows any decision constraints and compares the results with the required

constraint

Below this list is the ‘Apply best solution to spreadsheet’ button that allows you to place this

solution directly into your model with a single click. This will already have occurred if, in the

Optimization Settings dialog, you have selected the button ‘Auto-set to best solution’.

Solution analysis tab

Right-clicking on any row allows you to place that particular solution directly into the

spreadsheet:









Fitting models to data

Fitting in ModelRisk

ModelRisk allows one to fit a distribution, time series or a copula to spreadsheet data.

All fits are performed using Maximum Likelihood Estimation (MLE) methods. In the fitting windows (see list on the right) different fitted models can be ranked according to SIC, HQIC or AIC (Akaike) information criteria.
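For orientation, the textbook forms of the three information criteria can be computed from the maximized log-likelihood, the number of fitted parameters k and the number of data points n. The sketch below uses those standard definitions; ModelRisk may apply variants (for example small-sample corrections), so the numbers need not match its fitting windows. Lower values indicate a better trade-off between fit and complexity.

```python
import math


def information_criteria(log_likelihood, k, n):
    """Textbook AIC, SIC (Schwarz) and HQIC (Hannan-Quinn) values."""
    return {
        "AIC": 2 * k - 2 * log_likelihood,
        "SIC": k * math.log(n) - 2 * log_likelihood,
        "HQIC": 2 * k * math.log(math.log(n)) - 2 * log_likelihood,
    }


# Hypothetical fit: log-likelihood of -100 with 2 parameters and 50 data points
criteria = information_criteria(-100.0, k=2, n=50)
print(criteria["AIC"])  # 204.0
```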

About the uncertainty parameter

The uncertainty parameter is common to all ModelRisk fitting functions. It allows the inclusion of uncertainty about the fitted model parameter estimates. Unfortunately, it is common practice in risk analysis to use just the maximum likelihood estimates (MLEs) for a fitted distribution, copula or time series. However, when there are relatively few data available or when the model needs to be precise, omitting the uncertainty about the true parameter values can lead to significant underestimation of the model output uncertainty.

The Uncertainty parameter is set to FALSE by default (i.e. returns MLEs or projections based on MLEs) to coincide with common practice, but we strongly recommend setting it to TRUE. Uncertainty values are then generated for the fitted parameters using parametric bootstrapping techniques, which have the great advantage of allowing correlation structure between uncertain parameters and non-normal marginal uncertainty distributions, the latter being an important constraint of more classical methods based on asymptotic results (i.e. when the amount of data approaches infinity).
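The idea of a parametric bootstrap can be sketched for a Normal distribution, whose maximum likelihood estimates are the sample mean and the standard deviation with an n denominator. The code is a generic illustration with hypothetical function names and data, not ModelRisk's algorithm: it fits the data, simulates a data set of the same size from the fitted distribution, and refits to obtain one plausible pair of parameter values.

```python
import random
import statistics


def fit_normal_mle(data):
    """Maximum likelihood estimates of a Normal distribution's mean and standard deviation."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data, mu)  # MLE uses the n denominator
    return mu, sigma


def bootstrap_parameters(data, rng):
    """One parametric bootstrap replicate of the fitted parameters."""
    mu, sigma = fit_normal_mle(data)
    resample = [rng.gauss(mu, sigma) for _ in data]
    return fit_normal_mle(resample)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(fit_normal_mle(data))  # (5.0, 2.0)

rng = random.Random(7)
replicates = [bootstrap_parameters(data, rng) for _ in range(500)]
mean_of_means = statistics.fmean(mu for mu, _ in replicates)
```

The spread of the replicates reflects how uncertain the parameters are with only eight data points, and each replicate keeps the mean and standard deviation paired, which preserves any correlation between them.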

Distribution fitting functions

For non-parametric bootstrapping techniques for estimation of parameters, ModelRisk has the VoseNBoot functions.

For each univariate distribution in ModelRisk, a number of fitting functions are included:

VoseDistributionFit

Returns a sampled value from a distribution fitted to the data using Maximum Likelihood Estimation. The general syntax is:

=VoseDistributionFit({data}, Uncertainty, U)






The parameters are:

• {data} - array containing data to fit the distribution to.

• Uncertainty - optional boolean parameter. Set TRUE to include uncertainty about the fitted distribut

a new fitted parameter value is used on each spreadsheet recalculation through bootstrapping techniques.

• U - optional parameter specifying the cumulative percentile of the distribution. If omitted the function

For example, if DataSet is an array of data, VoseNormalFit(DataSet) will return a random value

from a Normal distribution that is the MLE fit to the DataSet. VoseNormalFit(DataSet,1) will use

bootstrapping to simulate the uncertainty about the fitted parameters.

If we want to use VoseDistributionFit to generate multiple random values from a fitted distribution with uncertainty included (i.e. Uncertainty=TRUE), there are two ways to do this:

1. In one cell, or in many cells, but not as an array function

2. In many cells as an array function

In the first case, the uncertainty and variability are mixed, because each random value is sampled from a different distribution. However, in the second case, all random values are sampled from the same distribution, and the distribution will change only with the next iteration.
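The difference between the two cases can be imitated in Python. This is a rough sketch assuming a Normal model and a parametric bootstrap for the parameter uncertainty; the function names and data values are hypothetical and do not correspond to ModelRisk functions.

```python
import math
import random


def fit_normal_mle(data):
    """Maximum likelihood estimates (mu, sigma) of a Normal."""
    mu = sum(data) / len(data)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))
    return mu, sigma


def draw_parameters(data, rng):
    """One uncertain (mu, sigma) pair from a parametric bootstrap."""
    mu_hat, sigma_hat = fit_normal_mle(data)
    simulated = [rng.gauss(mu_hat, sigma_hat) for _ in data]
    return fit_normal_mle(simulated)


def sample_cell_by_cell(data, n_values, rng):
    """Case 1: every value gets its own parameter draw (mixed)."""
    values = []
    for _ in range(n_values):
        mu, sigma = draw_parameters(data, rng)
        values.append(rng.gauss(mu, sigma))
    return values


def sample_as_array(data, n_values, rng):
    """Case 2: one parameter draw is shared by all values."""
    mu, sigma = draw_parameters(data, rng)
    return [rng.gauss(mu, sigma) for _ in range(n_values)]


rng = random.Random(42)
data = [9.1, 10.4, 11.2, 8.7, 10.9, 9.8]  # illustrative values only
mixed = sample_cell_by_cell(data, 5, rng)
shared = sample_as_array(data, 5, rng)
print(mixed)
print(shared)
```

In the second function the parameter pair plays the role of the uncertainty for one iteration, and the values drawn from it represent variability only.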

You can read more about separating uncertainty and randomness in the Separating uncertainty from randomness and variability introduction topic.

VoseDistributionFitP

Array function that returns the parameters of the VoseDistribution fitted to the data. The general syntax is:

{=VoseDistributionFitP({data}, Uncertainty)}



• Uncertainty - optional boolean parameter. Set TRUE to include uncertainty about the

fitted distribution (as explained above), and FALSE (default) to use the MLE.

The output array size should be one-dimensional, with the number of cells equal to the number of estimated parameters. The fitted parameters are returned in the same order as they are in the corresponding VoseDistribution (simulation) function.

So, for example,

{=VoseNormalFitP({1,2,2,3},0)}

should have an output of two cells. The function will return best fitting values for Mu and Sigma, in that order, because the ModelRisk syntax for the normal distribution is VoseNormal(Mu, Sigma).
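As a hand check of what a maximum likelihood fit of a Normal distribution to {1,2,2,3} gives, the textbook closed-form estimates can be computed directly. This is a sketch that assumes the plain MLE formulas (sigma uses a divisor of n); it is not taken from ModelRisk, so small differences from the spreadsheet output are possible.

```python
import math

data = [1, 2, 2, 3]
n = len(data)

# MLE of the mean is the sample average.
mu = sum(data) / n
# MLE of the standard deviation divides the squared deviations by n.
sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)

print(mu, sigma)  # 2.0 and the square root of 0.5 (about 0.7071)
```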

VoseDistributionFitObject

Constructs a distribution object of the fitted distribution. General syntax:

=VoseDistributionFitObject({data}, Uncertainty)




fitted distribution, and FALSE (default) to use the MLE.

Time series fitting functions





For each time series in ModelRisk the following fitting functions are included.

VoseTimeSeriesFit

Generates a sequence of random values of a time series model fitted to the data using Maximum

Likelihood Estimation. Syntax:

{=VoseTimeSeriesFit({data}, Uncertainty , Log Returns, Initial Value)}


• {data} - array containing data to fit the time series to.

• Uncertainty - optional boolean parameter. Set TRUE to include uncertainty about the fitted time series (as explained above), and FALSE (default) to use the MLE.

• Log Returns - optional boolean (TRUE/FALSE) parameter that specifies whether the time series ar

• Initial Value - last known historic value. The generated time series values will continue on from this

FALSE or omitted.

For example, if DataSet is an array of historical data, {=VoseTimeAR1Fit(DataSet)} will return random values from an AR1 time series that is the MLE fit to the DataSet.

{=VoseTimeAR1Fit(DataSet,1)} will use bootstrapping to simulate the uncertainty about the fitted parameters.

When the data fitted to a time series take negative values, ModelRisk recognizes that these data can only be log returns, not the actual values of the variable. In this situation, the Log Returns option is automatically selected and ModelRisk will produce a forecast of log returns, making Initial Value redundant as described above.
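The relationship between actual values, log returns and an initial value can be illustrated with a short Python sketch. The function names and the price history are hypothetical; the code only shows the arithmetic and is not how ModelRisk generates its forecasts.

```python
import math


def log_returns(values):
    """Log returns ln(x[t] / x[t-1]) of a strictly positive series."""
    if any(v <= 0 for v in values):
        raise ValueError("log returns need strictly positive values")
    return [math.log(b / a) for a, b in zip(values, values[1:])]


def rebuild_from_log_returns(initial_value, returns):
    """Turn log returns back into values, continuing from initial_value."""
    values = []
    level = initial_value
    for r in returns:
        level *= math.exp(r)
        values.append(level)
    return values


history = [100.0, 104.0, 101.0, 107.0]  # illustrative values only
returns = log_returns(history)
rebuilt = rebuild_from_log_returns(history[0], returns)
print(returns)
print(rebuilt)
```

A series of actual values such as prices stays positive, while its log returns are negative whenever the value falls, which is why negative data can only be log returns in this setting.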

VoseTimeSeriesFitP

Array function that returns the parameters of a time series model fitted to the data using Maximum

Likelihood Estimation. General syntax:

{=VoseTimeSeriesFitP({data}, Uncertainty, Log Returns)}


• {data} - array containing data to fit the time series to.





• Uncertainty - optional boolean parameter. Set TRUE to include uncertainty about

the fitted time series, and FALSE (default) to use the MLE.

• Log Returns - optional boolean (TRUE/FALSE) parameter that specifies whether

the time series are in log returns. Default is FALSE.

For example, if DataSet is an array of historical data, {=VoseTimeAR1FitP(DataSet)} will return the parameters of an AR1 time series that is the MLE fit to the DataSet.

The output array size should be one-dimensional, with the number of cells equal to the number of estimated parameters. The fitted parameters are returned in the same order as they are in the corresponding VoseTimeSeries (simulating) function.

So, for example,

{=VoseTimeGBMFitP({1,2,2,3},0)}

should have an output of two cells. The function will return best fitting values for Mu and Sigma, in that order, because the ModelRisk syntax for modeling a GBM time series is VoseTimeGBM(Mu,Sigma,[other parameters]).

Copula fitting functions

For each of the copulas available in ModelRisk, the following fitting functions are included:

VoseCopulaFit

Array function that generates values from the bivariate or multivariate copula fitted to the data using Maximum Likelihood Estimation. The syntax for fitting a bivariate and a multivariate copula, respectively, is:

{=VoseCopulaBiNameFit({data},Data_in_rows,Uncertainty )}

{=VoseCopulaMultiNameFit({data},Data_in_rows,Uncertainty )}


For the bivariate Archimedean copulas (Clayton, Gumbel, Frank) this function chooses the direction of correlation that best fits the data and simulates from the fitted copula. Alternatively, one can use the standard multivariate copula fitting functions, even for bivariate data, which will assume the standard direction of the fitted copula.





• {data} - the array of data to fit the copula to. This should be a 2-dimensional array for fitting a bivariate copula, or n-dimensional where n>2 for fitting a multivariate copula.

• Data_in_rows - optional boolean parameter that specifies whether the data is in columns

(FALSE,default) or rows (TRUE).


fitted time series, and FALSE (default) to use the MLE.

The scatter plot shown on the right illustrates a Gumbel copula fitted to bivariate data.

Note from the scatter plot that the data are negatively correlated. The bivariate Gumbel can adapt to this by rotating the fitted copula, and gives a fitted Direction parameter of 3, and a value of theta = 3.036.

In contrast, the multivariate Gumbel copula only has Direction = 1 at its disposal, so it would provide a very poor fit.
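Why a rotation helps can be seen with Kendall's tau. A Gumbel copula in its standard orientation only describes positive dependence, with tau = 1 - 1/theta, so negatively correlated data must first be flipped. The Python sketch below is an illustration under assumptions: the sample pairs are made up, theta is obtained by inverting tau rather than by maximum likelihood, and no claim is made about which rotation ModelRisk assigns to which Direction value.

```python
def kendall_tau(pairs):
    """Kendall's tau from concordant and discordant pairs (no tie correction)."""
    n = len(pairs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def flip_first_margin(pairs):
    """Rotate uniform pairs by replacing u with 1 - u."""
    return [(1.0 - u, v) for u, v in pairs]


def gumbel_theta_from_tau(tau):
    """Invert tau = 1 - 1/theta, valid for 0 <= tau < 1."""
    if not 0 <= tau < 1:
        raise ValueError("standard Gumbel needs 0 <= tau < 1")
    return 1.0 / (1.0 - tau)


# Illustrative, negatively dependent pairs on the unit square.
pairs = [(0.1, 0.8), (0.3, 0.9), (0.5, 0.4), (0.7, 0.35), (0.9, 0.05)]
tau = kendall_tau(pairs)
tau_flipped = kendall_tau(flip_first_margin(pairs))
theta = gumbel_theta_from_tau(tau_flipped)
print(tau, tau_flipped, theta)
```

Flipping one margin changes the sign of tau, so the rotated data fall inside the range a Gumbel copula can represent.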

VoseCopulaFitP

Array function that returns the parameter(s) from the bivariate or multivariate copula fitted to the data. The syntax is:

{=VoseCopulaBiNameFitP({data},Data_in_rows,Uncertainty )}

{=VoseCopulaMultiNameFitP({data},Data_in_rows,Uncertainty )}


• {data} - the array of data to fit the copula to. This should be a 2-dimensional array for fitting a bivariate copula, or n-dimensional where n>2 for fitting a multivariate copula.

• Data_in_rows - optional boolean parameter that specifies whether the data is in columns

(FALSE,default) or rows (TRUE).


fitted time series, and FALSE (default) to use the MLE.

The output array size should be one-dimensional, with the number of cells equal to the number of estimated parameters. The fitted parameters are returned in the same order as they are in the corresponding VoseCopula (simulating) function.

So, for example,

{=VoseCopulaBiTFitP(Data,0)}

should have an output of two cells. The function will return best fitting values for the Nu and Covariance parameters, in that order, because the ModelRisk syntax for modeling a bivariate T copula is VoseCopulaBiT(Nu,Covariance).

VoseCopulaData

This array function generates random values from an empirical copula constructed entirely from the

correlation pattern of given data. Syntax:

{=VoseCopulaData({data},Data_in_rows)}

• {data} - the spreadsheet data from which to construct the copula. This should be at least

a two-dimensional array.





• Data_in_rows - a boolean parameter (TRUE/FALSE) that specifies whether the data is

oriented in rows (TRUE) or not (FALSE, default)

Note the difference between constructing an empirical copula, and fitting an existing type of copula:

When fitting a copula, we determine the parameter of the copula that makes for a best fit to the data, but retaining the copula's functional form. With the empirical copula, the functional form itself (not just the parameter) is based on the data, making it a flexible tool for capturing any correlation pattern, however unusual (for example the one shown on the right).
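One simple way to build an empirical copula is to replace each variable by its scaled ranks and then resample whole rows. The Python sketch below illustrates that idea only; the function names are hypothetical, and ModelRisk's own construction (for example any interpolation between observed points) may well differ.

```python
import random


def pseudo_observations(rows):
    """Replace each column by its ranks scaled into the open interval (0, 1)."""
    n = len(rows)
    n_vars = len(rows[0])
    result = [[0.0] * n_vars for _ in range(n)]
    for j in range(n_vars):
        # Indices of the rows sorted by column j; ties keep their input order.
        order = sorted(range(n), key=lambda i: rows[i][j])
        for rank, i in enumerate(order, start=1):
            result[i][j] = rank / (n + 1)
    return result


def sample_empirical_copula(rows, size, seed=None):
    """Draw rows of uniforms that reproduce the rank pattern of the data."""
    rng = random.Random(seed)
    u = pseudo_observations(rows)
    return [list(rng.choice(u)) for _ in range(size)]


# Illustrative data with data in columns (one row per observation).
rows = [(1.0, 10.0), (2.0, 7.0), (3.0, 9.0), (4.0, 2.0)]
u = pseudo_observations(rows)
samples = sample_empirical_copula(rows, size=3, seed=7)
print(u)
print(samples)
```

Because only ranks are used, the marginal distributions drop out and what remains is the dependence pattern of the data, whatever its shape.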





Goodness of fit functions

VoseAIC

=VoseAIC(FitObject)

Returns the Akaike Information Criterion (AIC) for goodness of fit of a distribution, time series or copula model fitted to data.

The AIC is used to compare different fitted models against each other. The lower the value of the

information criterion, the better the fit.

• FitObject - a valid fitted distribution, time series or copula Fit Object

The AIC is one of the three information criteria included in ModelRisk for ranking various fitted models against each other, the other ones being Hannan-Quinn HQIC and Schwarz SIC (they are compared here). The reasoning behind information criteria is that the better model is the one that explains the data well with a minimum number of free parameters.

AIC is defined as follows:

$$AIC = \left(\frac{2n}{n-k-1}\right)k - 2\,L_{max}$$

with

• n = number of observations (e.g. data values, frequencies)

• k = number of parameters to be estimated (e.g. the Normal distribution has 2: mu and sigma)

• Lmax = the maximized value of the log-Likelihood for the estimated model (i.e. fit the

parameters by MLE and record the natural log of the Likelihood.)

Note that in the ModelRisk distribution, time series and copula fitting windows, the negatives of the information criteria are shown to rank the fitted models by. So in the list shown in the window, a higher number means a better fit.
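The calculation can be followed step by step in Python. The sketch below fits a Normal distribution by MLE, records the maximized log-likelihood and derives the three criteria from it. The formulas used are an assumption: the small-sample corrected AIC, (2n/(n-k-1))k - 2Lmax, together with the usual SIC, ln(n)k - 2Lmax, and HQIC, 2ln(ln(n))k - 2Lmax. The data values are illustrative, and the results are not guaranteed to match ModelRisk to the last digit.

```python
import math
from statistics import NormalDist


def information_criteria(log_lik, n, k):
    """Assumed forms of AIC (small-sample corrected), SIC and HQIC."""
    if n - k - 1 <= 0:
        raise ValueError("the corrected AIC needs n > k + 1")
    return {
        "AIC": (2.0 * n / (n - k - 1)) * k - 2.0 * log_lik,
        "SIC": math.log(n) * k - 2.0 * log_lik,
        "HQIC": 2.0 * math.log(math.log(n)) * k - 2.0 * log_lik,
    }


data = [9.1, 10.4, 11.2, 8.7, 10.9, 9.8, 10.1, 9.5]  # illustrative only
n = len(data)

# Fit the Normal by maximum likelihood (k = 2 parameters: mu and sigma).
mu = sum(data) / n
sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
fitted = NormalDist(mu, sigma)

# Maximized log-likelihood: sum of the log densities at the data points.
log_lik = sum(math.log(fitted.pdf(x)) for x in data)

criteria = information_criteria(log_lik, n, k=2)
# The fitting windows list the negatives, so that higher means better.
displayed = {name: -value for name, value in criteria.items()}
print(criteria)
print(displayed)
```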

Example

Say you have an array with data named DataSet. To compare the fit of a GBM with a GBMMR time series model we could then create a fit object for each, in cells A1 and A2 respectively:


=VoseTimeGBMMRFitObject(DataSet)

and write in cells B1 and B2:

=VoseAIC(A1)

=VoseAIC(A2)

Now as you update the DataSet, the fitted objects and their AIC are adjusted accordingly. This allows one

to easily review the appropriateness of models used, as more accurate or recent data becomes available.





VoseOptimalFit and related functions

See also: VoseCopulaOptimalFit and related functions, VoseTimeOptimalFit and related functions

=VoseOptimalFit({FittedDistributionObjects}, IC, U)

The VoseOptimalFit function returns a random sample from the optimal fitting distribution amongst a set of fitted distributions.

{FittedDistributionObjects} - an array of fitted distribution objects

IC - the information criterion to be used to determine which distribution fits best: 1 = AIC, 2 = SIC, 3 = HQIC.

Notes

If any of the distributions fitted are invalid, and therefore the relevant VoseDistributionFit function returns

an error message, the VoseOptimalFit and related functions will ignore this distribution.

The VoseOptimalFit and related functions will return an error message if all fitted distributions are invalid.
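The selection rule described in these notes can be sketched in Python as follows. This is a hypothetical illustration, not ModelRisk code: each fitted model is represented by a dictionary of precomputed criterion values, an invalid fit is represented by None, and the IC codes 1, 2 and 3 follow the meaning given above.

```python
IC_KEYS = {1: "aic", 2: "sic", 3: "hqic"}


def select_optimal_fit(fits, ic):
    """Return the valid fit with the lowest chosen information criterion.

    fits is a list whose entries are dictionaries with the keys
    'name', 'aic', 'sic' and 'hqic', or None for an invalid fit.
    """
    if ic not in IC_KEYS:
        raise ValueError("ic must be 1 (AIC), 2 (SIC) or 3 (HQIC)")
    key = IC_KEYS[ic]
    # Invalid fits are skipped rather than treated as an error.
    valid = [fit for fit in fits if fit is not None]
    if not valid:
        raise ValueError("all fitted models are invalid")
    return min(valid, key=lambda fit: fit[key])


# Illustrative criterion values only.
fits = [
    {"name": "ModelA", "aic": 244.3, "sic": 247.8, "hqic": 245.6},
    None,  # a fit that returned an error
    {"name": "ModelB", "aic": 245.5, "sic": 250.7, "hqic": 247.4},
]
best = select_optimal_fit(fits, ic=1)
print(best["name"])
```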

Other related functions

VoseOptimalFitObject({FittedDistributionObjects}, IC) - Returns the best fitting distribution object according to the selected information criterion.

VoseOptimalFitLLH({FittedDistributionObjects}) - Returns the Log Likelihood of the optimal fitting model.

VoseOptimalFitAIC({FittedDistributionObjects}) - Returns the Akaike information criterion of the optimal fitting model.

VoseOptimalFitHQIC({FittedDistributionObjects}) - Returns the Hannan-Quinn information criterion of the optimal fitting model.

VoseOptimalFitSIC({FittedDistributionObjects}) - Returns the Schwarz information criterion of the optimal fitting model.





VoseCopulaOptimalFit and related functions

See also: VoseOptimalFit and related functions, VoseTimeOptimalFit and related functions

{=VoseCopulaOptimalFit ({FittedCopulaObjects}, IC)}

VoseCopulaOptimalFit is an array function that returns a random sample from the optimal fitting copula

amongst a set of fitted copulas.

{FittedCopulaObjects} - an array of fitted copula objects

IC - the information criterion to be used to determine which copula fits best: 1 = AIC, 2 = SIC, 3 = HQIC.


VoseCopulaOptimalFitObject({FittedCopulaObjects}, IC) - Returns the optimal fitting copula object according to the selected information criterion.

VoseOptimalFitLLH ({FittedCopulaObjects}) - Returns the Log Likelihood of the optimal fitting copula.

VoseOptimalFitAIC ({FittedCopulaObjects}) - Returns the Akaike information criteria of the optimal

fitting copula.

VoseOptimalFitHQIC ({FittedCopulaObjects}) - Returns the Hannan-Quinn information criteria of the optimal fitting copula.

VoseOptimalFitSIC ({FittedCopulaObjects}) - Returns the Schwarz information criteria of the optimal

fitting copula.





VoseLLH

VoseLLH(FitObject)

Example model

The VoseLLH function calculates the natural log of the joint likelihood of observed data coming from the fitted model. The parameter FitObject can be any ModelRisk distribution, copula or time series object fitted to data.

For example, in the model below a gamma distribution is fitted to a data set. Cell E1 contains the formula for a fitted gamma distribution object, and cell E3 returns the log likelihood associated with this fit. Since ModelRisk fits probability models using maximum likelihood estimation techniques, the VoseLLH function is returning the maximised log likelihood value.

Cell E4 gives a check on the value returned by VoseLLH in this example. The formula calculates the joint probability density of the data set, were the data to come from the fitted distribution. Multiplying by LN(10) converts from log base 10 to log base e.

The example model provides other examples of the VoseLLH function with copula and time series fitted objects.

Fitted probability model objects can be nested within the VoseLLH function in the usual Excel notation so, for example, the formula in cell E3 for the example model above could also be:

=VoseLLH(VoseGammaFitObject(B2:B14))
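The kind of check described for cell E4 can be reproduced in Python. The sketch below evaluates a Gamma density at each data point and shows that summing base-10 logs and multiplying by ln(10) gives the same result as summing natural logs. The shape and scale values are assumed for illustration and are not fitted, and the data are made up.

```python
import math


def gamma_pdf(x, alpha, beta):
    """Density of a Gamma distribution with shape alpha and scale beta."""
    log_density = ((alpha - 1) * math.log(x) - x / beta
                   - alpha * math.log(beta) - math.lgamma(alpha))
    return math.exp(log_density)


data = [0.8, 1.9, 2.4, 3.1, 4.7, 5.2]  # illustrative values only
alpha, beta = 2.0, 1.5                 # assumed parameters, not a fit

densities = [gamma_pdf(x, alpha, beta) for x in data]

# Log-likelihood as the sum of natural logs of the densities.
llh_natural = sum(math.log(d) for d in densities)
# Same quantity via base-10 logs, converted with ln(10).
llh_via_log10 = sum(math.log10(d) for d in densities) * math.log(10)
# And as the natural log of the joint density (product of densities).
llh_from_product = math.log(math.prod(densities))

print(llh_natural, llh_via_log10, llh_from_product)
```

Summing logs is numerically safer than taking the log of the product, because the product of many small densities can underflow for large data sets.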

Uses

VoseLLH is useful for comparing fits between different models on a likelihood basis. It can be used to

select the most appropriate fitted model automatically within the spreadsheet, as this example shows.

VoseLLH is closely connected to VoseAIC, VoseSIC, and VoseHQIC, which return information criteria for a fitted probability model object. Information criteria are essentially log-likelihood values modified to penalise a fitted model according to the number of parameters within the model. Information criteria balance the goodness of fit of a model against its parsimony.





VoseSIC

=VoseSIC(FitObject)

Returns the Schwarz Information Criterion (SIC) (also known as Bayesian Information Criterion or BIC) for goodness of fit of a distribution, time series or copula model fitted to data.

The SIC is used to compare different fitted models against each other. The lower the value of the

information criterion, the better the fit.

• FitObject - a valid fitted distribution, time series or copula Fit Object

The SIC is one of the three information criteria included in ModelRisk for ranking various fitted models against each other, the other ones being Akaike's AIC and Hannan-Quinn HQIC (they are compared here). The reasoning behind information criteria is that the better model is the one that explains the data well with a minimum number of free parameters. SIC is defined as follows:

$$SIC = \ln(n)\,k - 2\,L_{max}$$

with

• n = number of observations (e.g. data values, frequencies)

• k = number of parameters to be estimated (e.g. the Normal distribution has 2: mu and

sigma)

• Lmax = the maximized value of the log-Likelihood for the estimated model (i.e. fit the

parameters by MLE and record the natural log of the Likelihood.)

Note that in the ModelRisk distribution, time series and copula fitting windows, the negatives of the information criteria are shown to rank the fitted models by. So in the list shown in the window, a higher number means a better fit.

Example

Say you have an array with data named DataSet. To compare the fit of a GBM with a GBMMR time series

model we could then create a fit object for each, in cells A1 and A2 respectively:


=VoseTimeGBMMRFitObject(DataSet)

and write in cells B1 and B2:

=VoseSIC(A1)

=VoseSIC(A2)

Now as you update the DataSet, the fitted objects and their SIC are adjusted accordingly. This allows one to easily review the appropriateness of models used, as more accurate or recent data becomes available.





VoseTimeOptimalFit and related functions

See also: VoseOptimalFit and related functions, VoseCopulaOptimalFit and related functions

{=VoseTimeOptimalFit ({FittedTimeSeriesObjects}, IC)}

VoseTimeOptimalFit is an array function that returns a random sample from the optimal fitting time series

model amongst a set of fitted time series models.

{FittedTimeSeriesObjects} - an array of fitted time series model objects

IC - the information criterion to be used to determine which time series model fits best: 1 = AIC, 2 = SIC, 3 = HQIC.


VoseTimeOptimalFitObject ({FittedTimeSeriesObjects}, IC) - Returns the optimal fitting time series

model object according to the selected information criteria.

VoseOptimalFitLLH ({FittedTimeSeriesObjects}) - Returns the Log Likelihood of the optimal fitting

time series model.

VoseOptimalFitAIC ({FittedTimeSeriesObjects}) - Returns the Akaike information criteria of the

optimal fitting time series model.

VoseOptimalFitHQIC ({FittedTimeSeriesObjects}) - Returns the Hannan-Quinn information criteria

of the optimal fitting time series model.

VoseOptimalFitSIC ({FittedTimeSeriesObjects}) - Returns the Schwarz information criteria of the optimal fitting time series model.





Distribution Fit

Introduction

In the Distribution Fit window you can fit distributions to a set of data in the spreadsheet. The distribution's parameters are estimated using maximum likelihood estimation (MLE).

The fitted distributions are ranked according to the SIC, AIC (Akaike) and HQIC information criteria. For each of these, the lower the information criterion, the better the fit. To avoid confusion, the negatives of these criteria are displayed in the list. This means that:

the higher the value shown in the list, the better the fit.

AIC and the other information criteria are superior goodness of fit statistics to other fit ranking criteria (e.g. chi-squared), because they take into account the number of parameters estimated, and penalize for overfitting: a model that has a good fit using fewer parameters is preferred over one that needs more parameters. You can read more about information criteria here.

The AIC is the least strict of the three in penalizing for more parameters, while SIC is the strictest. More information on information criteria can be found here.
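How strict each criterion is can be compared by looking only at its penalty term, i.e. the part that depends on the number of parameters k. The Python sketch below assumes the penalty forms (2n/(n-k-1))k for AIC, 2ln(ln(n))k for HQIC and ln(n)k for SIC; these are assumptions made for the illustration.

```python
import math


def penalty_terms(n, k):
    """Penalty added by each criterion for k parameters and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("the corrected AIC needs n > k + 1")
    return {
        "AIC": 2.0 * n / (n - k - 1) * k,
        "HQIC": 2.0 * math.log(math.log(n)) * k,
        "SIC": math.log(n) * k,
    }


for n in (100, 1000):
    terms = penalty_terms(n, k=2)
    print(n, {name: round(value, 3) for name, value in terms.items()})
```

For sample sizes like these the penalties order as AIC < HQIC < SIC. With very few observations the assumed AIC correction grows quickly, so the ordering is best read as applying to reasonably large data sets.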





The graph toolbar has additional buttons that show density, mass, P-P and Q-Q plots, for visual inspection of the quality of the fit. Also see Goodness of Fit Plots for a more detailed explanation.

The fitted distribution can be dynamically linked to the spreadsheet data. From the fitted distribution you can insert a random value, percentile calculation, distribution object, etc. Also, the fitted parameters themselves can be inserted in the spreadsheet.

Window elements

In the Data location field, you can specify where in the spreadsheet the data is located. Note that this can be an array of any dimension, though in most cases it will be one-dimensional.

You can take truncated data into account by checking the Enabled box. If enabled, the minimum and maximum can be provided.

In the Distribution List, you specify what distributions to fit the data to. Add and remove distributions by pressing the Add and Remove buttons, respectively. Note that, when adding distributions to fit, you can only select distributions that can be fitted to this data. For example, you cannot:

• fit a discrete distribution to continuous data; or

• fit a bounded distribution (e.g. a Beta) to a data set that has data points outside of the

distribution's boundaries.

In the Distribution List, you can rank the fitted distributions by the SIC, HQIC or AIC information criterion.

The higher the value shown in the list, the better the fit.

By marking the checkbox you can choose whether or not to include uncertainty about the fitted distribution's parameters. On the preview graph, this is represented by grey lines added to the fitted distribution's graph. To read the motivation behind this parameter click here. Selecting the ‘Overlay’ option allows you to compare the fit of two or more selected distributions together:





The uncertainty parameter is common to all ModelRisk fitting functions. It allows the inclusion of uncertainty about the fitted model parameter estimates. Unfortunately, it is common practice in risk analysis to use just the maximum likelihood estimates (MLEs) for a fitted distribution, copula or time series. However, when there are relatively few data available or when the model needs to be precise, omitting the uncertainty about the true parameter values can lead to significant underestimation of the model output uncertainty. The Uncertainty parameter is set to FALSE by default (i.e. returns MLEs or projections based on MLEs) to coincide with common practice, but we strongly recommend setting it to TRUE. Uncertainty values are then generated for the fitted parameters using parametric bootstrapping techniques, which have the great advantage of allowing a correlation structure between uncertain parameters and non-normal marginal uncertainty distributions. The restriction to normal marginal distributions is an important constraint of more classical methods based on asymptotic results (i.e. when the amount of data approaches infinity).

Below the graph you can specify the number of bins to group the data in (this only affects the image, not the fitting algorithm) and the number of lines generated to represent uncertainty in the fitted distribution.

Click the button above the preview graph to insert the fitted distribution in the spreadsheet. The following options are available:

• Random sample (linked to data) - generate random values from the fitted distribution.

• Random sample (not linked to data) - insert the fitted distribution with static values as parameters, i.e. not dynamically linked to the source data.

• Estimated parameters - insert the fitted distribution's parameters

• Quantile(U) - calculate a quantile value of the fitted distribution (through the U parameter)

• Object - construct a fitted distribution object.





Industrial version only:

Clicking the “Create report” button above the chart will produce a fit report in a new Worksheet with the fitted models in a table. The table will have the fitted distribution objects, Goodness of Fit rankings, statistics and percentiles of the fitted models. The report will also include the OptimalFit function that automatically returns the best fitted model according to the selected information criteria.

An example of such report is available in the example model.





VoseTruncData

VoseTruncData(MinX,MaxX)

This function is used as an argument for the ModelRisk distribution fitting functions, to indicate that a given data set is truncated at MinX, MaxX or both. Fitting to truncated data requires an adjustment in the likelihood function that is optimized for obtaining the MLE parameter estimates.

Truncated data occurs when there are observations that we do not see above or below some level. For

example, at a bank it may not be required to record an error below $100.

Example

Say we have 5 observations of a measurement truncated at MinX and MaxX. The observations between MinX and MaxX are a, b, c, d and e.

Likelihood function: f(a)*f(b)*f(c)*f(d)*f(e)/(F(MaxX)-F(MinX))^5

Explanation: We only observe a value if it lies between MinX and MaxX, which has probability (F(MaxX)-F(MinX)). Here f is the probability density and F the cumulative distribution function of the distribution being fitted.
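The adjusted likelihood can be written out in Python for a candidate Normal distribution. This is a minimal sketch of the formula above, working on the log scale; the function name, the data and the candidate parameters are hypothetical, and the sketch evaluates the likelihood only (it does not search for the MLE).

```python
import math
from statistics import NormalDist


def truncated_log_likelihood(data, dist, min_x=None, max_x=None):
    """Log-likelihood of data observed only between min_x and max_x.

    Equals sum(log f(x)) - n * log(F(max_x) - F(min_x)); a missing
    bound means no truncation on that side.
    """
    lower = dist.cdf(min_x) if min_x is not None else 0.0
    upper = dist.cdf(max_x) if max_x is not None else 1.0
    mass = upper - lower
    if mass <= 0:
        raise ValueError("truncation bounds leave no probability")
    for x in data:
        if (min_x is not None and x < min_x) or (max_x is not None and x > max_x):
            raise ValueError("data point outside the truncation bounds")
    return sum(math.log(dist.pdf(x)) for x in data) - len(data) * math.log(mass)


data = [120.0, 135.0, 150.0, 180.0, 210.0]  # illustrative values only
candidate = NormalDist(mu=150.0, sigma=40.0)  # assumed parameters

ll_truncated = truncated_log_likelihood(data, candidate, 100.0, 250.0)
ll_plain = truncated_log_likelihood(data, candidate)
print(ll_truncated, ll_plain)
```

Dividing by the probability of the observable range raises the likelihood of every candidate distribution, but by a different amount for each, which is why ignoring truncation can shift the fitted parameters.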

In the ModelRisk distribution fit window you can indicate if the measurement data is truncated, and provide min and max.





Bivariate Copula Fit

Introduction

With the Bivariate Copula Fit window, you can fit a bivariate copula to spreadsheet data.

The fitted copulas are ranked according to the SIC, AIC (Akaike) and HQIC information criteria. For each of these, the lower the information criterion, the better the fit. To avoid confusion, the negatives of these criteria are displayed in the list. This means that:

the higher the value shown in the list, the better the fit.

AIC and the other information criteria are superior goodness of fit statistics to other fit ranking criteria (e.g. chi-squared), because they take into account the number of parameters estimated, and penalize for overfitting: a model that has a good fit using fewer parameters is preferred over one that needs more parameters.

The AIC is the least strict of the three in penalizing for more parameters, while SIC is the strictest. More information on these can be found here.

When fitting a bivariate copula, the copula direction that best matches the data is chosen. If you do not want to vary the copula direction and just use the copula with the default direction, use multivariate copula fit instead.

Different types of output are possible, like the fitted parameters themselves, or data generated from the copula based on them.



Output functions of this window:

VoseCopulaBiClaytonFit, VoseCopulaBiFrankFit, VoseCopulaBiGumbelFit, VoseCopulaBiNormalFit, VoseCopulaBiTFit

VoseCopulaBiClaytonFitP, VoseCopulaBiFrankFitP, VoseCopulaBiGumbelFitP, VoseCopulaBiNormalFitP, VoseCopulaBiTFitP

Window elements





When opening the Bivariate Copula Fit window, you are first asked to choose which copula(s) to fit. This selection can be changed any time later on.

In the Source data region, the location of the source data in the spreadsheet, and its orientation (in rows

or columns) can be selected.

Next shown is the list of Correlations (i.e. fitted copulas), ranked by SIC, AIC or HQIC criterion. Click one of these three to rank the fitted copulas according to it. Copulas for fitting can be added to or removed from this list by pressing the Add and Remove buttons, respectively.

You can mark the check box to choose whether or not you want to include the (unavoidable) uncertainty in the fit. To read the motivation behind this parameter click here.

The uncertainty parameter is common to all ModelRisk fitting functions. It allows the inclusion of uncertainty about the fitted model parameter estimates. Unfortunately, it is common practice in risk analysis to use just the maximum likelihood estimates (MLEs) for a fitted distribution, copula or time series. However, when there are relatively few data available or when the model needs to be precise, omitting the uncertainty about the true parameter values can lead to significant underestimation of the model output uncertainty. The Uncertainty parameter is set to FALSE by default (i.e. returns MLEs or projections based on MLEs) to coincide with common practice, but we strongly recommend setting it to TRUE. Uncertainty values are then generated for the fitted parameters using parametric bootstrapping techniques, which have the great advantage of allowing a correlation structure between uncertain parameters and non-normal marginal uncertainty distributions. The restriction to normal marginal distributions is an important constraint of more classical methods based on asymptotic results (i.e. when the amount of data approaches infinity).

With the Chart mode buttons below the preview graph, you can switch between showing the source data, randomly generated data from the fitted copula, or a combination of both, as well as the number of generated points to be shown.

With the Output mode buttons below the preview graph, you can switch between exporting the parameters of the fitted copula, randomly generated values from the fitted copula, a combination of both or an object.


Clicking the “Create report” button above the chart will produce a fit report in a new Worksheet with the fitted models in a table. The table will have the fitted Copula objects and Goodness of Fit rankings. The report will also include the OptimalFit function that automatically returns the best fitted model according to the selected information criteria.

An example of such report is available in the following example model.





Multivariate Copula Fit

Introduction

With the Multivariate Copula Fit window, you can fit a multivariate copula to spreadsheet data.

The fitted copulas are ranked according to the SIC, AIC (Akaike) and HQIC information criteria. For each of these, the lower the information criterion, the better the fit. To avoid confusion, the negatives of these criteria are displayed in the list. This means that:

the higher the value shown in the list, the better the fit.

AIC and the other information criteria are superior goodness of fit statistics to other fit ranking criteria (e.g. chi-squared), because they take into account the number of parameters estimated, and penalize for overfitting: a model that has a good fit using fewer parameters is preferred over one that needs more parameters.

The AIC is the least strict of the three in penalizing for more parameters, while SIC is the strictest. More information on these can be found here.

Different types of output are possible, like the fitted parameters themselves, or data generated from the copula based on them.



Output functions of this window:

VoseCopulaMultiClaytonFit, VoseCopulaMultiFrankFit, VoseCopulaMultiGumbelFit, VoseCopulaMultiNormalFit, VoseCopulaMultiTFit

VoseCopulaMultiClaytonFitP, VoseCopulaMultiFrankFitP, VoseCopulaMultiGumbelFitP, VoseCopulaMultiNormalFitP, VoseCopulaMultiTFitP

Window elements

(When opening the Multivariate Copula Fit window, you are first asked to choose the copulas to be fitted. This selection can be changed at any time later on.)





In the Source data region, the location of the source data in the spreadsheet, and its orientation (in rows

or columns) can be selected.

Next shown is the list of Correlations (i.e. fitted copulas), ranked by SIC, AIC or HQIC criterion. Click oneof these three to rank the fitted copulas according to it. Copulas for fitting can be added or removed forthis list by pressing the add and remove buttons, respectively.

You can mark the check box to choose whether or not you want to include the (unavoidable) uncertaintyin the fit. To read the motivation behind this parameter click here.

The uncertainty parameter is common to all ModelRisk fitting functions. It allows the inclusion ofuncertainty about the fitted model parameter estimates. Unfortunately, it is common practice in riskanalysis to use just the maximum likelihood estimates (MLEs) for a fitted distribution, copula or timeseries. However, when there are relatively few data available or when the model needs to be precise,omitting the uncertainty about the true parameter values can lead to significant underestimation of themodel output uncertainty. The Uncertainty parameter is set to FALSE by default (i.e. returns MLEs orprojections based on MLEs) to coincide with common practice, but we strongly recommend setting it toTRUE. Uncertainty values are then generated for the fitted parameters using parametric bootstrapping

techniques, which has the great advantage of allowing correlation structure between uncertainparameters and non-normal marginal uncertainty distributions, the latter being an important constraint ofmore classical methods based on asymptotic results (i.e. when the amount of data approaches infinity).

In the correlation matrix shown, click any of the white fields to toggle the preview graph to display the two variables this field corresponds to. For example, in the image below the correlation between Var1 and Var2 is toggled for displaying:

With the Chart mode buttons below the preview graph, you can switch between showing the source data, randomly generated data from the fitted copula, or a combination of both, as well as the number of generated points to be shown.

With the Output mode buttons below the preview graph, you can switch between exporting the parameters of the fitted copula, randomly generated values from the fitted copula, a combination of both or an object.


Clicking the “Create report” button above the chart will produce a fit report in a new Worksheet with the fitted models in a table. The table will have the fitted Copula objects and Goodness of Fit rankings. The report will also include the OptimatFit function that automatically returns the best fitted model according to the selected information criteria.

An example of such a report is available in the following example model.





Empirical Copula

Introduction

The Empirical Copula window provides a way to directly construct a bivariate or multivariate copula based on spreadsheet data. From this constructed copula, randomly sampled values can be generated.

Note the difference between constructing an empirical copula, and fitting an existing type of copula:

When fitting a copula, we determine the parameter of the copula that makes for a best fit to the data, but retaining the copula's functional form. With the empirical copula, the functional form itself (not just the parameter) is based on the data, making it a flexible tool for capturing any correlation pattern, however unusual.
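A common way to build an empirical copula is to replace each observation by its rank within its own variable, scaled into (0, 1), and then resample whole rows. The Python sketch below shows that general idea only; it is an assumption that this resembles what the window does, and the function names and data are hypothetical. Ties are not treated specially.

```python
import random

def pseudo_observations(rows):
    """Convert each column of `rows` to ranks scaled into (0, 1)."""
    n = len(rows)
    n_vars = len(rows[0])
    result = [[0.0] * n_vars for _ in range(n)]
    for j in range(n_vars):
        # Indices of the rows, ordered by the value of variable j.
        order = sorted(range(n), key=lambda i: rows[i][j])
        for rank, i in enumerate(order, start=1):
            result[i][j] = rank / (n + 1)
    return result

def sample_empirical_copula(rows, size, seed=1):
    """Resample whole rows of pseudo-observations, keeping the joint pattern."""
    rng = random.Random(seed)
    u = pseudo_observations(rows)
    return [rng.choice(u) for _ in range(size)]

data = [(1.2, 30.0), (0.7, 12.5), (3.4, 55.1), (2.1, 41.0)]  # illustrative only
u_values = pseudo_observations(data)
draws = sample_empirical_copula(data, size=10)
print(u_values)
```

Because rows are resampled intact, whatever dependence pattern is present in the data carries over to the sampled percentiles, which can then be passed to the inverse CDF of any marginal distribution.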

From within this window, you can also specify the distributions that model the variables to be correlated by this empirical copula, as well as names for these variables, and insert these in the spreadsheet.



Output functions of this window: VoseCopulaData

Window elements

In the Source data region, the location of the source data in the spreadsheet, and its orientation (in rows or columns) can be selected.

Uncertainty in constructing the empirical copula can be included by marking the check box.

In the Variables references region, you can (optionally) select distributions to model the variables that will be correlated by the empirical copula. If you select none, randomly generated values from the constructed copula will be output. You can of course always link these to distributions in your spreadsheet model through the U-parameter later on.

To select a distribution to model one of the correlated variables, first click this variable's label in the correlation matrix shown, then fill in the desired VoseDistribution in the Distribution for current variable field.





For the preview graph, you can choose between displaying generated percentiles (i.e. copula values) or randomly sampled values of the correlated variables (if you selected a distribution for them as explained above).

You can specify the number of Fitted points in the designated field.

With the Chart mode buttons you can choose whether to display source data, the constructed empirical copula or a combination of both.





Univariate Time Series Fit

Introduction

The Time Series available in ModelRisk can be fitted to a given set of spreadsheet data.

The fitted Time Series (or explicitly the fitted parameter values) can then be used for time series forecasting in your spreadsheet model.

The fitted models are ranked according to the SIC, AIC (Akaike) and HQIC information criteria. For each of these, the lower the information criterion, the better the fit. To avoid confusion, the negatives of these criteria are displayed in the list. This means that the higher the value shown in the list, the better the fit.

AIC and the other information criteria are superior goodness of fit statistics to other fit ranking criteria (e.g. chi-squared), because they take into account the number of parameters estimated and penalize overfitting: a model that has a good fit using fewer parameters is preferred over one that needs more parameters.

The AIC is the least strict of the three in penalizing for more parameters, while SIC is the strictest. More information on these can be found here.
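The relative strictness of the three criteria can be seen from their usual textbook definitions, sketched below in Python. These formulas are an assumption: ModelRisk may apply small-sample corrections or other variants, and the log-likelihood values and parameter counts used here are made up for illustration.

```python
import math

def information_criteria(log_likelihood, n_params, n_obs):
    """Textbook AIC, SIC (Schwarz/BIC) and HQIC; lower values mean a better fit."""
    aic = 2 * n_params - 2 * log_likelihood
    sic = n_params * math.log(n_obs) - 2 * log_likelihood
    hqic = 2 * n_params * math.log(math.log(n_obs)) - 2 * log_likelihood
    return {"AIC": aic, "SIC": sic, "HQIC": hqic}

# Two hypothetical fitted models on the same 100 observations.
simple_model = information_criteria(log_likelihood=-150.0, n_params=2, n_obs=100)
complex_model = information_criteria(log_likelihood=-149.5, n_params=5, n_obs=100)

for name, ic in (("simple", simple_model), ("complex", complex_model)):
    # Negated values, so that higher means better, as in the ranking list.
    print(name, {key: round(-value, 2) for key, value in ic.items()})
```

In this made-up case the complex model has a slightly higher likelihood, but every criterion still prefers the simple model because the improvement does not pay for three extra parameters. The penalty per parameter is 2 for AIC, 2 ln(ln n) for HQIC and ln n for SIC, which gives the ordering described above for data sets of more than about 15 observations.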

Source data can be either actual observations of the variable (e.g. stock price) or the log return of that variable. If the source data are log returns, the parameter View as Log Returns should be set to TRUE (by marking the check box), and a forecast will correspondingly generate a stochastic log returns series.
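For reference, the relationship between a series of values and its log returns can be sketched in Python as follows. The prices are illustrative, and the helper names are hypothetical.

```python
import math

def log_returns(values):
    """Log return between each pair of consecutive observations."""
    return [math.log(later / earlier) for earlier, later in zip(values, values[1:])]

def rebuild_from_log_returns(initial_value, returns):
    """Reconstruct the value series from a starting value and log returns."""
    series = [initial_value]
    for r in returns:
        series.append(series[-1] * math.exp(r))
    return series

prices = [100.0, 102.0, 99.5, 101.3]  # illustrative values only
returns = log_returns(prices)
rebuilt = rebuild_from_log_returns(prices[0], returns)
print(returns)
print(rebuilt)
```

A series of n values yields n - 1 log returns, and a starting value is needed to turn log returns back into values.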

Depending on the time series model being fit, some of the function input parameters may not be present. Past Time Stamps and Future Time Stamps only appear where the mathematical model allows variations between the time intervals of the observations (and the projections). The Geometric Brownian Motion (GBM) and the GBM with Mean Reversion models, for example, can jump from one observation to any future point in time. The GBM with Jump Diffusion model, however, cannot do so because it would have to allow any number of jumps to occur within a time period, leading to an intractable model. When providing an array of Time Stamps, please make sure it is in ascending order.

The set of values in Source data may not contain the last observation you have (e.g. the series may not allow Time Stamps but you know what the variable value is now). In this case you can provide the last observed value as Initial Value. More generally, Initial Value allows you to construct a series from some particular point that is based on an analysis of past behavior. If View as Log Returns is set to TRUE, Initial Value becomes redundant and is ignored.

Both historical data and data generated from the fitted model are displayed, optionally with more than one line for the forecast.

You can read more about the theory behind time series here.



Random values of fitted time series:

VoseTimeGBMFit, VoseTimeGBMJDFit, VoseTimeGBMJDMRFit, VoseTimeGBMMRFit, VoseTimeSeasonalGBMFit, VoseTimeMA1Fit, VoseTimeMA2Fit, VoseTimeAR1Fit, VoseTimeAR2Fit, VoseTimeARMAFit, VoseTimeARCHFit, VoseTimeGARCHFit, VoseTimeEGARCHFit, VoseTimeAPARCHFit, VoseTimeDeathFit, VoseTimeYuleFit

Parameters of fitted time series:

VoseTimeGBMFitP, VoseTimeGBMJDFitP, VoseTimeGBMJDMRFitP, VoseTimeGBMMRFitP, VoseTimeSeasonalGBMFitP, VoseTimeMA1FitP, VoseTimeMA2FitP, VoseTimeAR1FitP, VoseTimeAR2FitP, VoseTimeARMAFitP, VoseTimeARCHFitP, VoseTimeGARCHFitP, VoseTimeEGARCHFitP, VoseTimeAPARCHFitP, VoseTimeDeathFitP, VoseTimeYuleFitP

Window elements

When opening Time Series Fit you are first presented with a screen to select the Time Series to fit. In the fitting window itself, this can be changed at any time (i.e. new Time Series added and removed for fitting).

In the Source data region you can select the location of the data to fit to in your spreadsheet.

In the Time Series list on the left, you can add and remove Time Series to fit by pressing the Add or Remove buttons. Click on a Time Series in the list to toggle it for the preview graph. Click on one of the three information criteria to sort the fitted models according to this criterion (lower value is better fit).





In the Time Fit Function Parameters window, extra data can be filled in. Which fields appear depends on the Time Series currently selected from the list.

• To display a preview graph of the Log Returns rather than the values themselves, mark the Data are in Log Returns check box.

• In the Initial Value field, you can specify the last data value in time you have. Random values generated from the fitted time series model will start from this value.

• To take into account the uncertainty that (unavoidably) exists about the fitted parameters, mark the Uncertainty check box. The smaller the dataset, the larger the uncertainty on the fitted parameters will be. To read the motivation behind this parameter click here.

The uncertainty parameter is common to all ModelRisk fitting functions. It allows the inclusion of uncertainty about the fitted model parameter estimates. Unfortunately, it is common practice in risk analysis to use just the maximum likelihood estimates (MLEs) for a fitted distribution, copula or time series. However, when there are relatively few data available or when the model needs to be precise, omitting the uncertainty about the true parameter values can lead to significant underestimation of the model output uncertainty. The Uncertainty parameter is set to FALSE by default (i.e. returns MLEs or projections based on MLEs) to coincide with common practice, but we strongly recommend setting it to TRUE. Uncertainty values are then generated for the fitted parameters using parametric bootstrapping techniques, which have the great advantage of allowing a correlation structure between uncertain parameters and non-normal marginal uncertainty distributions. Normality of the marginal uncertainty distributions is an important constraint of more classical methods based on asymptotic results (i.e. when the amount of data approaches infinity).

By default, the vertical scale of the preview graph is automatically rescaled according to the (historical/generated) data. To keep the vertical scale fixed, check Fix Y-scale.


Clicking the “Create report” button above the chart will produce a fit report in a new Worksheet with the fitted models in a table. The table will have the fitted time series objects and Goodness of Fit rankings. The report will also include the OptimatFit function that automatically returns the best fitted model according to the selected information criteria.






Multivariate Time Series Fit

Quantities that move together in time are typically modeled using multivariate time series ("MultiTS") models. MultiTS models allow one to easily account for the relations and correlation that exist between the "marginal" components.

A typical example of a situation where one can use multivariate time series is yield curve modeling, where we model the interest rates for different times-to-maturity. At any point in time an interest rate for some time to maturity (say, 5 years) is typically related to:

• the (immediate) past,

• the interest rates for other times-to-maturity (e.g. 1 month, 1 year, 10 years...)

Use the ModelRisk Multivariate Time Series Fit window to fit a number of multivariate time series models to historical data.


Output functions of this window: VoseTimeMultiBEKKFit, VoseTimeMultiBEKKFitP, VoseTimeMultiGBMFit, VoseTimeMultiGBMFitP, VoseTimeMultiMA1Fit, VoseTimeMultiMA1FitP, VoseTimeMultiMA2Fit, VoseTimeMultiMA2FitP





Window elements

In the Source data region you can specify an array of cells in the spreadsheet with data to fit to. Select between data in rows and data in columns to specify the correct orientation. Be careful to select the correct orientation: the wrong choice will still produce a fit, but it will be nonsensical.

In the lower left corner of the window you see a matrix with which you can specify which two series to show a preview plot for. Use the check mark View all series (marked by default) to display all component time series.

In the Viewing controls region below the preview plot you can specify the number of forecasts to display (an integer greater than 1), whether to fix the axis-scale and whether to display Log Returns or actual values.

For the output of the fitted series, you can choose to orient the simulated values in columns or rows, so it is possible to have the historical data in one orientation (e.g. rows) in the spreadsheet, and the forecasted values in another (e.g. columns).

Clicking the “Create report” button above the chart will produce a fit report in a new Worksheet with the fitted models in a table. The table will have the fitted time series objects and Goodness of Fit rankings. The report will also include the OptimatFit function that automatically returns the best fitted model according to the selected information criteria.






Ordinary Differential Equations (ODE)

Ordinary Differential Equations

An ordinary differential equation describes the rate of change of some variable as a function of other variables or constants. For example, imagine that a population p grows by 35% in each generation, then this can be represented by the following differential equation:

$$\frac{dp}{dt} = 0.35\,p \qquad (1)$$

where t is in units of a generation (e.g. 30 years). If we know the size of the population at some moment (its initial value, call it P0), we can make a prediction of its value PT at any future moment T by integration, resulting in:

$$P_T = P_0\, e^{0.35\,T}$$

This formula represents exponential growth. Most systems we wish to model, however, are more complex and there does not exist a pure mathematical formula for their integration. This is particularly the case when there are two or more variables that interact with each other. Thus, we tend to revert to using numerical methods.

The simplest numerical method is Euler’s. Equation 1 is converted to describe a small change in time ∆t:

$$\Delta p = 0.35\,p\,\Delta t$$

Starting with a value $p = P_0$ and $t = 0$, the Euler method performs a loop to calculate the value of p at each additional increment of time:

$$p_{t+\Delta t} = p_t + 0.35\,p_t\,\Delta t$$

If we want to make a projection for T = 5 (five generations) then the loop is repeated n = 500 times (n = T/∆t, here with ∆t = 0.01). The degree of accuracy of the Euler method is determined by how large n is. In the figure below, the results from ∆t values of 0.1, 0.01, and 0.001 are compared against the known true value from the exact equation $P_T = P_0\, e^{0.35\,T}$. The panel on the right zooms in on the end of the sequences and shows that the smaller the value of ∆t, the more accurate the result.
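The Euler loop described above can be sketched in a few lines of Python. The sketch assumes the growth equation dp/dt = 0.35 p implied by the 35% growth example, and a hypothetical initial population of 1000; the function name is illustrative.

```python
import math

def euler_growth(p0, rate, horizon, dt):
    """Euler integration of dp/dt = rate * p from t = 0 to t = horizon."""
    n_steps = round(horizon / dt)
    p = p0
    for _ in range(n_steps):
        p += rate * p * dt  # small change over one time increment
    return p, n_steps

P0, RATE, T = 1000.0, 0.35, 5.0  # P0 is a hypothetical starting population
exact = P0 * math.exp(RATE * T)

errors = {}
for dt in (0.1, 0.01, 0.001):
    approx, n_steps = euler_growth(P0, RATE, T, dt)
    errors[dt] = abs(approx - exact)
    print(f"dt={dt}: steps={n_steps}, estimate={approx:.2f}, error={errors[dt]:.2f}")
print(f"exact value: {exact:.2f}")
```

Each tenfold reduction in the step size costs ten times as many loop iterations and reduces the error by roughly a factor of ten, which is the trade-off between accuracy and calculation time.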





Ordinary Differential Equations (ODE) tool

Building a model

ModelRisk’s Ordinary Differential Equation (ODE) tool will numerically evaluate one or more variables over time that follow one or more ordinary differential equations. It allows the user to define any set of first order differential equations that can be described with Excel functions. One or more time stamps (specific points in time) can be specified for the evaluation of the variable(s). The interface will plot any variable against time or any two variables together. The user can also specify shocks to the system at specific points in time that change a variable or model parameter.

Example

Click here to open the model for this example. A missile is fired from a height h, at an angle θ and with an initial speed v_0. The missile’s speed reduces due to friction by an amount Friction * v_t, where Friction is a friction constant and v_t is its speed at time t. Model the speed and location of the missile during flight.

The diagram below shows this system. If you studied maths at college, you’ll probably remember that the easiest way to solve this problem is to consider the movement in the horizontal (x) and vertical (y) directions separately, since gravity only works in the vertical direction.

We can easily write down the differential equations for the horizontal and vertical speeds and locations (x,y) as follows:





where g is gravitational acceleration.
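For orientation, a system of this kind can be written out as below. This is a sketch that follows from the description of the example (a deceleration equal to Friction times speed, applied to each velocity component, with gravity acting only vertically). The symbols v_x and v_y for the horizontal and vertical speeds, and v_0 for the initial speed, are assumed names and may differ from those used in the example model.

```latex
\begin{aligned}
\frac{dv_x}{dt} &= -\,\mathrm{Friction}\cdot v_x, &\qquad \frac{dx}{dt} &= v_x,\\
\frac{dv_y}{dt} &= -\,g \;-\; \mathrm{Friction}\cdot v_y, &\qquad \frac{dy}{dt} &= v_y,
\end{aligned}
\qquad
\begin{aligned}
v_x(0) &= v_0\cos\theta, & x(0) &= 0,\\
v_y(0) &= v_0\sin\theta, & y(0) &= h.
\end{aligned}
```

All four equations are first order, which is the form the ODE tool requires.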

To use the ODE tool we need to translate each of these equations into text using standard Excel functions and notation:

We will also use the following initial values (using SI units - meters, seconds):

The ODE tool can be accessed through the ‘More Tools’ list in the ModelRisk ribbon:





This opens the ODE Tool interface:

The interface has three sections:

Left side: Entry of time horizons, equations, initial values for variables and (more advanced) any perturbations to the variable values during the time horizon.

Center: Graphs of the variables being modelled.

Right side: Statistics of the variables being displayed in the graph.





We will begin by entering the Start time (= 0), the Step (the size of the discrete time step ∆t used to approximate continuous time, = 0.01 seconds), and the ‘Time stamps’ (the times at which we would like ModelRisk to report the variable values). In this case we will just use one time stamp, at 6 seconds:

Now we enter the differential equations by one of two means:

• Clicking the ‘Add’ button and then directly typing each equation into the interface; or

• Placing the equations in Excel and then clicking the ‘Add’ button and then the ‘Ref’ button four times to select the differential equations in cells B3:B6:

Next we enter the initial values of parameters, by typing the name of the variable in the left column and either again using the ‘Add’ and ‘Ref’ buttons for the values, or typing those values directly into the interface:

If values are linked to spreadsheet cells, the variables can of course also be random samples from distributions. In this simple example, there are no discrete perturbations to the variables, so the last table is left blank.

All the data have been entered and the ‘Errors’ message box is empty, showing that all variables have been defined and the differential equations are syntactically correct. We can now save the model by clicking the ‘OK’ button and following the instructions on where ModelRisk is to place the different model components.

ModelRisk will take all the inputs and arrange them together into your spreadsheet, together with any links you created. It has also inserted a VoseODE array function into the model, shown highlighted here:





Clicking the ‘View function’ icon while one of the cells covered by the VoseODE function is active will bring back up the ODE dialog box:

Two variables have been selected from the differential dialog using tick boxes and are therefore displayed graphically. One or two variables can be displayed at the same time and against a different x-axis variable selected at the bottom of the screen. In the following plot, the vertical distance y is plotted against the horizontal distance x. This shows the trajectory of the missile:





The statistics show that the missile reaches a maximum height of 30.98 meters above ground, and lands about 46 meters from where it was fired (where it crosses the x-axis).

Saving the model

You can save the model you have created by simply saving the Excel file that you have built. However, ModelRisk also allows you to save the set of differential equations in its own library. This will allow you to pull the set of equations up again when you have a new model to build. To save the equation set to the ODE library, enter a name for the model in the Name dialog box:

Click the Description button to add some descriptive text to explain the model:

Click OK, and then the Save button. When you next open the ODE tool interface you can click the Load button and select the template from the library:





The ODE tool will then load the entire model you built, including time stamps and initial values, but without links to a spreadsheet:

Advanced feature: perturbations

ModelRisk allows the user to specify that a change can occur to one or more variables at a particular point in time. For example, the set of ODE equations could be a PK/PD (Pharmacokinetic / Pharmacodynamic) model describing the effect of a particular regime of drug application. The perturbations would then represent sudden changes in the concentration of the drug due to taking a pill, or receiving an injection.

For simplicity, we will stick with the missile model. Imagine that after 1 second of flight the non-slip coating on the missile has worn off and the Friction parameter now changes to a value of 0.95 /s. We enter this in the ODE interface as follows:





The effect on the missile path is obviously going to be rather dramatic. In terms of horizontal speed it will slow down very fast, and in vertical speed gravity will dominate. This can be seen by looking at different graphical plots in the ODE interface:

The trajectory of the missile will show a much steeper drop-off (compare with other plots above):

Using the ODE Tool

Using the ODE tool to find a target time

We can find out when and where the missile hits the ground by using Excel’s Solver:





This is determining the time stamp that gives y=0:

The model calculates that the missile will land 46.29 metres away after 4.62 seconds.
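Outside Excel, the same question (when and where does y reach zero?) can be answered by integrating until the missile crosses the ground and interpolating within the final step. The Python sketch below uses this approach in place of Solver. The launch height and angle are hypothetical values chosen for illustration, not the example model's inputs, so the printed numbers need not match the figures quoted above.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def fly(v0, angle_deg, height, friction, dt=0.001):
    """Euler-integrate the missile until it reaches the ground.

    Returns (flight_time, range_x, max_height).
    """
    vx = v0 * math.cos(math.radians(angle_deg))
    vy = v0 * math.sin(math.radians(angle_deg))
    x, y, t = 0.0, height, 0.0
    max_y = y
    while True:
        # One Euler step of the four first-order equations.
        new_x = x + vx * dt
        new_y = y + vy * dt
        new_vx = vx - friction * vx * dt
        new_vy = vy - (G + friction * vy) * dt
        if new_y < 0.0:
            # Linear interpolation inside the last step to locate y = 0.
            frac = y / (y - new_y)
            return t + frac * dt, x + frac * (new_x - x), max_y
        x, y, vx, vy, t = new_x, new_y, new_vx, new_vy, t + dt
        max_y = max(max_y, y)

# v0 = 25 and Friction = 0.1 appear in this example; the angle (60 degrees)
# and the height (10 m) are assumptions made for this sketch only.
flight_time, landing_x, peak = fly(v0=25.0, angle_deg=60.0, height=10.0, friction=0.1)
print(f"lands after {flight_time:.2f} s, {landing_x:.2f} m away; peak height {peak:.2f} m")
```

Stopping at the sign change of y avoids the repeated recalculation that a general-purpose solver needs, at the cost of being specific to this one target condition.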

Using the ODE Tool to fit to data

Click here to open the model for this example. Imagine that we did not know the Friction constant, but were able to track the position of the missile, producing the following data, and wished to estimate the value of Friction from these data:





We can set up the model with four time stamps (t={1,2,3,4}):

and click ‘OK’ to enter the function into the spreadsheet:





In this version of the model cell J24 calculates the sum of the squared errors between the observed and estimated x and y values. A dummy Friction value of 0.3 is entered into the model, and we can use Excel’s Solver to vary Friction until the smallest Total error is achieved:

The Solver returns the value of 0.100015, which is almost precisely the 0.1 value for Friction that was used to generate the data used in this example.
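The same least-squares idea can be sketched in Python. Here a golden-section search stands in for Excel's Solver, the "observed" positions are generated by the sketch itself with Friction = 0.1 rather than taken from the example workbook, and the launch speed, angle and height are assumed values.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def positions(friction, times, v0=25.0, angle_deg=60.0, height=10.0, dt=0.001):
    """Euler-integrate the missile; return (x, y) at each time in ascending `times`."""
    vx = v0 * math.cos(math.radians(angle_deg))
    vy = v0 * math.sin(math.radians(angle_deg))
    x, y, step, out = 0.0, height, 0, []
    for t_stamp in times:
        target = round(t_stamp / dt)
        while step < target:
            x, y = x + vx * dt, y + vy * dt
            vx, vy = vx - friction * vx * dt, vy - (G + friction * vy) * dt
            step += 1
        out.append((x, y))
    return out

def total_error(friction, times, observed):
    """Sum of squared errors between fitted and observed x and y values."""
    fitted = positions(friction, times)
    return sum((fx - ox) ** 2 + (fy - oy) ** 2
               for (fx, fy), (ox, oy) in zip(fitted, observed))

def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimise a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:   # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:         # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2

times = [1, 2, 3, 4]
observed = positions(0.1, times)  # synthetic data, generated with Friction = 0.1
estimate = golden_section_min(lambda k: total_error(k, times, observed), 0.0, 0.5)
print(f"estimated Friction: {estimate:.4f}")
```

With real, noisy observations the minimum error would not be zero and the estimate would differ somewhat from the true value.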

Using the ODE tool with VBA macros and Solver

Click here to open the model for this example. Imagine that we wish to estimate the variation in range that the missile will travel as a result of small variations in initial muzzle velocity and the coefficient Friction. We will replace the values in the spreadsheet with the following distributions:

Initial muzzle velocity: =VoseLognormal(25,0.7)

Friction: =VoseLognormal(0.1,0.029)

Since the model requires Solver to run, recalculating the spreadsheet several times to arrive at a solution, this model has a VBA macro that performs the following tasks:

• Recalculate the spreadsheet;

• Copy random sample values from the above distributions into the cells for the initial muzzle velocity and Friction;

• Using these copied values, run Solver to find the value of FlightTime that gives a y value of zero.

The macro is run before each simulation sample of the model by specifying it in the ModelRisk Simulation Settings dialog box:





The Range (x) is stored as an output and the sampled values for the initial muzzle velocity and Friction are stored as inputs. After 5000 samples, we get the following output result:

Simulation results for the Range show it will lie between about 35m and 60m. A sensitivity plot verifies that the larger the friction coefficient, the smaller the range (negative correlation) and the higher the muzzle velocity the greater the range (positive correlation), which gives us some intuitive confidence that the model is behaving correctly.
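The direction of these sensitivities can be checked with a small Monte Carlo sketch in Python. Several things here are assumptions: the two lognormal inputs are taken to be parameterised by their own mean and standard deviation, the launch angle and height are hypothetical, and the landing point is found by integrating to the ground rather than by a Solver macro. The simulated range values will therefore not reproduce the results quoted above.

```python
import math
import random

G = 9.81  # gravitational acceleration, m/s^2

def lognormal_from_mean_sd(rng, mean, sd):
    """Sample a lognormal specified by its own mean and standard deviation."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return rng.lognormvariate(mu, math.sqrt(sigma2))

def missile_range(v0, friction, angle_deg=60.0, height=10.0, dt=0.01):
    """Horizontal distance travelled when the missile reaches y = 0 (Euler steps)."""
    vx = v0 * math.cos(math.radians(angle_deg))
    vy = v0 * math.sin(math.radians(angle_deg))
    x, y = 0.0, height
    while True:
        new_x, new_y = x + vx * dt, y + vy * dt
        vx, vy = vx - friction * vx * dt, vy - (G + friction * vy) * dt
        if new_y < 0.0:
            return x + (new_x - x) * y / (y - new_y)  # interpolate to y = 0
        x, y = new_x, new_y

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

rng = random.Random(2024)
speeds, frictions, ranges = [], [], []
for _ in range(1000):
    v0 = lognormal_from_mean_sd(rng, 25.0, 0.7)
    k = lognormal_from_mean_sd(rng, 0.1, 0.029)
    speeds.append(v0)
    frictions.append(k)
    ranges.append(missile_range(v0, k))

print("correlation(Friction, Range):", round(pearson(frictions, ranges), 2))
print("correlation(velocity, Range):", round(pearson(speeds, ranges), 2))
```

A negative correlation for Friction and a positive one for the muzzle velocity is the pattern a sensitivity plot would be expected to show.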





VoseODE

{=VoseODE(Name, ArrayEqns, BeginTime, StepTime, TimeStamps, VariableNames, InitialValues, PerturbatedVariables, PerturbatedValues, PerturbationTimes, ShowExtremum)}

This array function returns the values of the defined variables in a set of ordinary differential equations at a specific point or points in time.

The parameters are:

Name – a text entry, giving the set of equations a name for storage in the ODE library.

ArrayEqns – a cell range where the ODE equations are entered.

BeginTime – the time at which the system starts.

StepTime – the time increment that the function uses to approximate continuous changes. The smaller the time increment is set, the more accurate the results will be, but at the expense of a longer calculation time.

TimeStamps – the time (or times) at which values are required for the defined variables.

VariableNames – an array of names (in text) defining each variable.

InitialValues – an array of initial values for the defined variables. The array should be the same length as VariableNames.

PerturbatedVariables – an optional parameter listing the names of the variables that change value abruptly during the modeled period due to some shock. The names must appear exactly as defined in VariableNames.

PerturbatedValues – an optional parameter listing the new values of PerturbatedVariables at the moment they change. The array must be of the same length as PerturbatedVariables.

PerturbationTimes – an optional parameter listing the points in time at which PerturbatedVariables change to PerturbatedValues. The array must be of the same length as PerturbatedVariables.

ShowExtremum – an optional Boolean parameter determining whether the function returns the smallest and largest values of each defined variable over the modeled period. Set to TRUE to show extrema. Default is FALSE.

The VoseODE function is most easily implemented via its own interface. We recommend that you use this interface until you are very familiar with the function.





Other tools

View Function

The View Function console opens the relevant ModelRisk window for the VoseFunction in the currently selected spreadsheet cell(s).

When multiple VoseFunctions are in the selected cell(s), you can choose the VoseFunction you want to see the ModelRisk window for.

Note that in the opened ModelRisk window, the parameters of the VoseFunction are filled in as they are in the spreadsheet. So View Function is a quick and easy way to "visualize" the VoseFunctions that make up your mathematical model.

For example, if you call View Function for a cell that contains a VoseDistribution, you can have a quick look at that distribution's shape and summary statistics.





Deduct Calculation

Introduction

The Deduct calculation window allows us to modify a base distribution to include a deductible and/or a maximum payout limit.

The use of the deductible means that the insurance company does not pay out the first x of the damage described by the base distribution.

The use of a maximum payout limit means that the insurance company pays out no more than L of the damage described by the base distribution, so there is a probability spike of (1-F(L)) at y=L.

To model this, the base distribution is truncated at the deduct value and then shifted by that same value to the left. The result is left-bounded at zero, with the probability of the distribution for values on the left of the deductible all assigned to zero. Optionally, values on the left of the deductible can be discarded altogether (see below) so no zeros are generated.

When a maximum limit value is set, the distribution will be truncated on the right of the value, and all the probability for higher values assigned to the maximum limit value.
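The effect of a deductible and a maximum limit on individual losses can be sketched in Python as follows. This is an illustration, not the VoseDeduct algorithm: in particular it assumes the limit is applied to the payout after the deductible has been subtracted, which the description above does not pin down, and the lognormal loss distribution and all names are hypothetical.

```python
import random

def apply_deduct(loss, deductible, max_limit=None):
    """Payout for one loss: subtract the deductible, floor at zero, cap at the limit."""
    payout = max(loss - deductible, 0.0)
    if max_limit is not None:
        payout = min(payout, max_limit)  # assumption: cap applies after the deductible
    return payout

def sample_payouts(draw_loss, size, deductible, max_limit=None, allow_zeros=True, seed=1):
    """Generate payouts; optionally discard losses that fall below the deductible."""
    rng = random.Random(seed)
    payouts = []
    while len(payouts) < size:
        loss = draw_loss(rng)
        if loss <= deductible and not allow_zeros:
            continue  # discarded, so the result is rescaled to losses above the deductible
        payouts.append(apply_deduct(loss, deductible, max_limit))
    return payouts

def draw_loss(rng):
    """Hypothetical base loss distribution, for illustration only."""
    return rng.lognormvariate(5.0, 1.0)

with_zeros = sample_payouts(draw_loss, 1000, deductible=100.0, max_limit=500.0)
without_zeros = sample_payouts(draw_loss, 1000, deductible=100.0, max_limit=500.0,
                               allow_zeros=False)
print("share of zero payouts:", sum(p == 0.0 for p in with_zeros) / len(with_zeros))
print("share at the limit:", sum(p == 500.0 for p in with_zeros) / len(with_zeros))
```

The two spikes, at zero and at the limit, correspond to the probability masses drawn as vertical lines on the preview graph.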

The Deduct distribution can be converted into a Distribution Object and used in the other ModelRisk interfaces, in particular those for doing aggregate modeling.


Output functions of this window: VoseDeduct

Window elements

In the Distribution field, you can insert the base distribution to be modified. This should be a continuous Vose Distribution Object.

In the Deductible field you can insert the deductible. This should be a real number.

In the Max limit field you can insert the maximum limit. This should be a real number.

On the preview graph in the middle, the probabilities assigned to zero and the max limit value are (by default) depicted by a red vertical line.

If Allow generation of zeros when below deductible is marked, every generated random value below the deductible will be returned as a zero. If not, values below zero will be discarded: only values higher than zero are generated (and the distribution is rescaled accordingly).





Data Viewer

ModelRisk incorporates a Data Viewer feature that allows you to quickly review your data sets prior to doing any analyses like distribution, copula or time series fitting. The Data Viewer is accessed by clicking on the Data Viewer icon:

which opens the following interface:

The data to be reviewed need to be held in a contiguous range. One proceeds as follows:

1. Select the range of cells in which the data are held in the Select Data range dialog

2. If you wish to see labels for each variable in the analyses (recommended) then tick the Labels box and select the location of the data labels

3. Select whether your data are for a single variable (Univariate) or several variables (Multivariate)

4. Select whether the data are simple observations (Data) or form a time series

5. The Copying options are only relevant if you wish to create an Excel report at the end of your review. Select ‘Live link’ if you wish to maintain a link to the original data (useful if you think that the values in your data set may be revised).

6. Click OK

Depending on the options you have selected the Data Viewer will present your data differently. The four modes relate to the following combinations:





• Univariate data set

• Multivariate data set

• Univariate time series

• Multivariate time series

Each of these modes is described below.

Univariate data set

Clicking OK as above having selected ‘Data’ and ‘Univariate’ will display a window with two tabs. The data view tab shows a histogram of your data:

The Univariate Data Analysis tab gives a comprehensive graphical and statistical analysis of your data:





The left (red) and right (green) markers can be dragged across the graphs and the relevant data values and cumulative fractions are shown in the right Markers section of the statistics pane. One can also type in LowerX, UpperX, LowerP or UpperP values directly in this pane.

Icons in the window allow you to change the scale of the horizontal axis, change the percentiles of the box plot, and copy/paste graphs to other applications.

The left hand pane provides reporting options if you wish to create an Excel report. Clicking the Create Report button creates a report with a large number of statistical analyses including non-parametric bootstrap assessments:





Multivariate data set

Clicking OK as above having selected ‘Data’ and ‘Multivariate’ will display a window with three tabs. The ‘Data view’ tab shows a histogram of each variable and scatter plots to visualize any correlation between the variables:





The ‘Points’ slider control allows you to vary the number of points in the scatter plots. This is useful if you have a lot of data because a scatter plot can get too blocked to show detail.

The Percentiles tick box will toggle between showing values or percentiles in the scatter plots. It is often easier to see correlation patterns for long-tailed variables if one uses percentiles.

The Logs tick box will toggle between showing values or log values in the histogram plots. It is often easier to visualize long-tailed variables if one uses logs.

Sliders can be used to split the data into groups. In the following example, a slider splits the Edge thickness variable at a value of 3.2:





The scatter plots then show those points in blue and red that correspond to this split.

Double-clicking on a histogram plot will show that variable in the Univariate Data analysis window, which is the window described above. Double-clicking a scatter plot will display the Multivariate Data analysis window:

This window provides information on each variable and the correlation structure between the two.





Selecting the Enlarge matrix option will swap the location between the bottom left correlation matrix and the scatter plot, helpful in better reviewing the correlation matrix if you have a large number of variables in the data set.

The correlation matrix displays correlation or covariance between each variable. The slider can be used to highlight correlations above a certain level. Selecting the |abs| option will allow you to highlight correlations whose absolute value is above some threshold.

Univariate time series data

Clicking OK as above having selected ‘Time Series’ and ‘Univariate’ will display a window with two tabs. The ‘Data view’ tab shows a time series of the variable. If you input a set of dates for the time series data they will be shown on the horizontal axis:

Moving the markers in the ‘Select times’ slider will allow you to ‘play’ the time series, meaning that the series will be presented as a video with the number of points shown controlled by the distance between these markers. If you are analyzing financial data it will generally be more useful to select the LogReturns option.

The second tab, ‘Univariate Time series’, allows you to analyze the data in a number of key ways, which can be switched on and off:





Autocorrelation shows a correlogram displaying how the variable is correlated with its own values at different numbers of lags. In the example above, there is no autocorrelation calculated to be significantly different from zero with (1-alpha) probability. Statistically significant autocorrelations are shown as red bars.
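As a rough illustration of what a correlogram reports, the sketch below computes the sample autocorrelation at a given lag and a significance band. The band used here, z/sqrt(n), is the common large-sample approximation for an uncorrelated series; it is an assumption, and ModelRisk's own test may differ. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag (0 <= lag < n)."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(series) - 1")
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    numerator = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return numerator / denominator


def significance_bound(n, alpha=0.05):
    """Approximate two-sided bound: |r| above this is flagged as significant."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z / math.sqrt(n)


series = [1.0, -1.0] * 4  # a strongly alternating series
bound = significance_bound(len(series))
for lag in range(1, 4):
    r = autocorrelation(series, lag)
    print(lag, round(r, 3), "significant" if abs(r) > bound else "not significant")
```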

Moving average simply smooths the data set by averaging over the defined number of periods.

Moving standard deviation calculates the standard deviation over the defined number of periods. It is

useful to see whether there are periods of higher and lower random variation.
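A minimal sketch of both rolling calculations, assuming a trailing window and the sample (n-1) standard deviation; neither choice is confirmed by the text, and the function names are hypothetical.

```python
import statistics


def moving_window(series, periods, statistic):
    """Apply a statistic to each trailing window of the given length."""
    if periods < 1 or periods > len(series):
        raise ValueError("periods must be between 1 and len(series)")
    return [
        statistic(series[end - periods:end])
        for end in range(periods, len(series) + 1)
    ]


def moving_average(series, periods):
    return moving_window(series, periods, statistics.fmean)


def moving_std(series, periods):
    # statistics.stdev needs at least two points per window.
    return moving_window(series, periods, statistics.stdev)


data = [1, 2, 3, 4, 5, 9, 1, 8]
print(moving_average(data, 3))
print(moving_std(data, 3))
```

A rising moving standard deviation towards the end of the output indicates a period of higher random variation.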

Linear regression fits a straight line to the series and reports the slope (b) and intercept (a) values on the graph and in the related fields.
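The slope and intercept can be reproduced with ordinary least squares. The sketch below assumes the horizontal axis is a period index 0, 1, 2, ... with unit spacing, which is an assumption; the function name is hypothetical.

```python
def fit_line(values):
    """Least-squares fit of values against the period index 0, 1, 2, ...

    Returns (a, b) where the fitted line is a + b * t.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two points are needed to fit a line")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    b = sxy / sxx
    a = y_mean - b * t_mean
    return a, b


a, b = fit_line([3.0, 5.0, 7.0, 9.0])
print(f"intercept a = {a}, slope b = {b}")
```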

Remove seasonality shows estimated seasonality factors and optionally removes these factors before

the other analyses of the data.

LogReturns allows the user to analyse and graph the data in terms of the log return (essentially the proportional change from one period to the next). This is very useful for the prices of stocks and financial instruments but also for any other variable for which movement is likely to be proportional to its size (like population size).
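For reference, the log return between two consecutive periods is ln(x[t+1] / x[t]). A short sketch (the function name is hypothetical):

```python
import math


def log_returns(prices):
    """Log return between each pair of consecutive, strictly positive values."""
    return [math.log(later / earlier) for earlier, later in zip(prices, prices[1:])]


prices = [100.0, 110.0, 99.0]
returns = log_returns(prices)
print(returns)
# Log returns add up over time: their sum is the log return of the whole period.
print(sum(returns), math.log(prices[-1] / prices[0]))
```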

Multivariate time series data

Clicking OK as above having selected ‘Time Series’ and ‘Multivariate’ will display a window with three tabs. The ‘Data view’ tab shows a time series of each variable and scatter plots to visualize any correlation. If you input a set of dates for the time series data, they will be shown on the horizontal axis:





Moving the markers in the ‘Select times’ slider will allow you to ‘play’ the time series, meaning that the series will be presented as a video with the number of points shown controlled by the distance between these markers. This is particularly useful to get a feel for whether a correlation structure is fixed or varies over time. If you are analyzing financial data it will generally be more useful to select the LogReturns option.

The second tab is the Univariate Time Series window, which is the same as described above.

The third tab is the Multivariate Time Series window, which plots any two variables together. Variables can be shown as a scatter plot:





or time line:

A new pair of variables can be selected by either double-clicking a scatter plot in the Data View tab, or clicking a cell in the correlation matrix.





Extreme Values Calculation

Introduction

The Extreme Values Calculation calculates the distribution of the smallest and/or largest sample(s) from a number N of samples drawn from a given distribution.

For distributions of the exponential family (like the Gamma), it is well known that the extreme values asymptotically follow an Extreme Value Min or Extreme Value Max distribution.

For a general distribution and large N, though, this can get extremely complicated to model. This is exactly what the Extreme Values calculation in ModelRisk does, in a fast and accurate way.

This is useful for modeling scenarios where special attention has to be given to extreme cases, as is often the case in insurance modeling.

Note how you can insert quite a large value for N (e.g. N=100000). Risk analysts who have modeled this

type of situation "manually" before will appreciate the speed with which this calculation is done.
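To see why such a calculation need not simulate all N samples, recall that for independent samples with cumulative distribution F, the largest has distribution F(x)^N and the smallest has distribution 1 - (1 - F(x))^N. The sketch below uses these textbook identities; it is not ModelRisk's algorithm, and the function names are hypothetical.

```python
import random


def cdf_largest(cdf, n):
    """CDF of the largest of n independent samples: F(x) ** n."""
    return lambda x: cdf(x) ** n


def cdf_smallest(cdf, n):
    """CDF of the smallest of n independent samples: 1 - (1 - F(x)) ** n."""
    return lambda x: 1.0 - (1.0 - cdf(x)) ** n


def sample_largest(inverse_cdf, n, rng):
    """Draw the largest of n samples from one uniform draw, using U ** (1/n)."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return inverse_cdf(u ** (1.0 / n))


# Example with a Uniform(0, 10) distribution and N = 100000.
def uniform_cdf(x):
    return min(max(x / 10.0, 0.0), 1.0)


def uniform_inverse_cdf(u):
    return 10.0 * u


rng = random.Random(1)
print(cdf_largest(uniform_cdf, 100000)(9.9999))
print(sample_largest(uniform_inverse_cdf, 100000, rng))
```

One uniform draw replaces 100000 individual draws, which is the kind of saving the manual approach lacks.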



VoseLargest, VoseLargestSet, VoseSmallest, VoseSmallestSet, VoseKthLargest, VoseKthSmallest, VoseExtremeRange

Window elements

In the Extreme values parameters section, three (or sometimes four) fields are provided.

In the Extreme values selection field, you can choose the type of extreme values to calculate the distribution for. Note that these make use of VoseFunctions that can be used separately for modeling as well. The options are:

• VoseSmallest - calculates the distribution of the smallest of N samples drawn from the selected distribution.

• VoseLargest - calculates the distribution of the largest of N samples drawn from the selected distribution.





• VoseKthSmallest - calculates the distribution of the Kth smallest of N samples drawn from the selected distribution. The parameter K can be provided in the field that appears when this option is selected.

• VoseKthLargest - calculates the distribution of the Kth largest of N samples drawn from the selected distribution. The parameter K can be provided in the field that appears when this option is selected.

• VoseSmallestSet - calculates the distributions of the smallest, 2nd smallest, ... up to the Kth smallest of N samples drawn from the selected distribution. The parameter K can be provided in the number of values field that appears when this option is selected.

• VoseLargestSet - calculates the distributions of the largest, 2nd largest, ... up to the Kth largest of N samples drawn from the selected distribution. The parameter K can be provided in the number of values field that appears when this option is selected.

• VoseExtremeRange (default) - calculates the distributions of both the smallest and the largest (red) of N samples drawn from the selected distribution. Note that both distributions are correlated: the smaller N, the more correlated they will be.

In the Distribution field, you can provide the Distribution (as an Object) from which to take the N samples. This can be filled in manually, chosen from a list or selected from the spreadsheet.

In the N field, you can provide the number of samples for which the extreme value distribution should be calculated.

When appropriate, a K or number of values field appears, the purpose of which is explained above.
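The Kth smallest and Kth largest options can be illustrated with a standard order-statistic result: the Kth smallest of N independent Uniform(0,1) samples follows a Beta(K, N-K+1) distribution, so one Beta draw passed through the inverse CDF gives one sample of the Kth smallest value. This sketch shows that result only; it does not claim to be how the Vose functions are implemented, and the function names are hypothetical.

```python
import random


def sample_kth_smallest(inverse_cdf, k, n, rng):
    """One draw of the k-th smallest of n samples (1 <= k <= n)."""
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    u = rng.betavariate(k, n - k + 1)
    return inverse_cdf(u)


def sample_kth_largest(inverse_cdf, k, n, rng):
    """The k-th largest of n samples is the (n - k + 1)-th smallest."""
    return sample_kth_smallest(inverse_cdf, n - k + 1, n, rng)


def identity(u):
    """Inverse CDF of Uniform(0, 1); swap in any other inverse CDF."""
    return u


rng = random.Random(1)
draws = [sample_kth_smallest(identity, 2, 9, rng) for _ in range(20000)]
# For Uniform(0, 1) the mean of the k-th smallest is k / (n + 1), here 0.2.
print(sum(draws) / len(draws))
```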





Find Vose Functions

Introduction

Use FindVoseFunction to look up and select all the spreadsheet cells containing a certain VoseFunction, and then apply any desired formatting to these cells.

Spreadsheet models can easily get very complex, with many cells containing distributions rather than deterministic values/formulas, other cells containing inputs, other cells containing outputs of interest, etc.

It is good practice to distinguish between all of these visually: apply a different formatting to cells depending on what they contain.

The Find VoseFunctions window can be a help in keeping your Risk Analysis models clear and easy to

maintain.

Window elements

In the Find field you can indicate what VoseFunction to search for. If left blank (default), all VoseFunctions corresponding to the selected check boxes will be searched.

Mark the appropriate Area to indicate whether you want to look in the active sheet only (default) or in the entire workbook.

By marking the Categories checkboxes you can indicate to look for entire categories of VoseFunctions. For example, if you mark Distributions then cells containing VoseNormal, VoseBeta, etc. will be included.

Press the Find Next button to select the next spreadsheet cell that matches the selected criteria.

Press the Apply button to apply the selected formatting (fill color, and font formatting) to all spreadsheet

cells matching the selected criteria.

With the Undo button you can undo the applied formatting.





Vose Ogive window

The Vose Ogive window provides an aid for inserting an Ogive distribution in the model. While this can be done with the Insert Distribution window as well, this can be impractical because of the many parameters this distribution takes. After all, it is entirely defined by an array of data points.


Output functions of this window: VoseOgive, VoseOgiveObject, VoseOgiveProb, VoseOgiveProb10

Window Elements

In the Ogive parameters area you can enter a minimum and a maximum for the distribution, as well as an array of values. Repeated values can (and typically do) occur in this array, and will be assigned a greater probability in the Ogive distribution.

Mark the Uncertainty checkbox to use bootstrapping techniques to include uncertainty in the constructed distribution. This would be the Ogive equivalent of the Uncertainty parameter that is found in all ModelRisk distribution functions.

If uncertainty is included, you can mark the number of chart lines to display on the preview graph.
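The sketch below shows one way an Ogive can be built from a minimum, a maximum and a data array, and how bootstrapping produces several alternative curves. It assumes the convention that the i-th sorted data point (of n) has cumulative probability i/(n+1), with straight lines in between; that convention, the resampling scheme and the function names are assumptions for illustration and are not taken from the ModelRisk documentation.

```python
import bisect
import random


def ogive_cdf(minimum, maximum, data):
    """Piecewise-linear CDF through (min, 0), the sorted data, and (max, 1)."""
    if any(not minimum <= x <= maximum for x in data):
        raise ValueError("all data must lie between minimum and maximum")
    xs = [minimum] + sorted(data) + [maximum]
    n = len(data)
    ps = [i / (n + 1) for i in range(n + 2)]

    def cdf(x):
        if x <= minimum:
            return 0.0
        if x >= maximum:
            return 1.0
        j = bisect.bisect_right(xs, x) - 1  # last point at or below x
        x0, x1, p0, p1 = xs[j], xs[j + 1], ps[j], ps[j + 1]
        return p0 + (p1 - p0) * (x - x0) / (x1 - x0)

    return cdf


def bootstrap_ogives(minimum, maximum, data, lines, rng):
    """Resample the data with replacement to get alternative Ogive curves."""
    return [
        ogive_cdf(minimum, maximum, rng.choices(data, k=len(data)))
        for _ in range(lines)
    ]


data = [2, 4, 4, 6, 8]
best_guess = ogive_cdf(0, 10, data)
curves = bootstrap_ogives(0, 10, data, lines=5, rng=random.Random(1))
print(best_guess(5.0), [round(c(5.0), 3) for c in curves])
```

A repeated value (4 in this example) produces a vertical step in the curve, which is how it receives a greater probability.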





Simulation Settings Window

The simulation settings window allows you to control how ModelRisk will run a simulation. It is accessed

by clicking on the ModelRisk ribbon. The window has two tabs: Model Settings, which

controls the next individual simulation run; and Application Settings, which defines the default settings for a number of ModelRisk parameters.

Model Settings tab

Simulation Setup group

Samples determines how many samples will be run for your model;

Simulations determines how many simulations will be run, each with the number of samples defined by the Samples dialog. The usual value to use is 1 but, in conjunction with the VoseSimTable and/or the VoseCurrentSim functions, you can build models that will run several different scenarios together using this feature;

Simulation names optionally allows you to give a name to each simulation for easier identification. Either

enter a list of names, e.g.

or refer to a spreadsheet range which will copy the list in for you. If used, the name will appear in the

Simulation # drop-down control in the Simulation Results window.

Options group





Refresh every … samples will update the Excel screen with the current simulated values every … samples. The option is switched off by default because it will slow down the simulation considerably, particularly if a small value is used (e.g. less than 1% of the total samples to be run). If you wish to see numbers change on the screen (which can certainly capture people’s attention, particularly if one has embedded graphs that change too), consider leaving this switched off for longer simulation runs and using the Update Screen control (the blue button at the bottom of the Simulation Progress control), which allows you to toggle screen updating on and off:

Stop on output error is a useful feature to debug your model. ModelRisk will stop simulating and show the scenario that produced the error.

Show results window at end of simulation will automatically pop up the Simulation Results window once ModelRisk has completed a simulation. This should normally be switched on, but you may want to deselect this option if, for example, you are only interested in the mean simulation result displayed by using VoseSimMean.

Seed group

Seed values are used to control how ModelRisk generates random samples for its stochastic variables. They are particularly useful if you wish to reproduce a set of simulation results.

Seed generating: Random will randomly select a different seed value. You can use this option, for example, to run multiple simulations of the same model with different seeds to see if there is any appreciable effect on the simulation results by using different random numbers;

Seed generating: Manual will use the seeds specified in the Seed Value(s) list.

Seed Value(s) allows you to enter specific seed values. If you enter a list of values, the first value will be used in the first simulation run, and so on. Each seed value should be an integer between 0 and 4294967295 (2^32-1).

Multiple simulation seeds has three options:

All use same seed: all simulations will use the first seed value specified in the Seed Value(s) field. If you use the same number of random variables in each simulation version of your model, this ensures that any difference between the simulations is not due to random sampling.

Use different seeds – First must be defined: the first simulation uses the first seed value specified in the Seed Value(s) field, and the remaining simulations will use different values. This helps evaluate whether the model results are materially affected by the random values being generated.

Use different seeds – All must be defined: each simulation will use the corresponding seed value from the list specified in the Seed Value(s) field. This can be useful if you wish to check the effect of using different seed values on your model, but is not commonly used.
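The three options can be pictured as three ways of assigning one seed to each simulation. The sketch below is a simplified model of that behaviour and not ModelRisk code: the mode names, the way the "different" seeds are derived and the toy model are all assumptions.

```python
import random

MAX_SEED = 2**32 - 1  # upper limit for a seed value, as stated above


def validate_seed(seed):
    if isinstance(seed, bool) or not isinstance(seed, int) or not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be an integer between 0 and {MAX_SEED}: {seed!r}")
    return seed


def seeds_for_simulations(mode, seed_values, simulations):
    """Return one seed per simulation for the modes 'same', 'first' and 'all'."""
    seeds = [validate_seed(s) for s in seed_values]
    if not seeds:
        raise ValueError("at least one seed value is required")
    if mode == "same":  # All use same seed
        return [seeds[0]] * simulations
    if mode == "first":  # Use different seeds, first must be defined
        helper = random.Random(seeds[0])  # assumed way of deriving the rest
        result = [seeds[0]]
        while len(result) < simulations:
            candidate = helper.randint(0, MAX_SEED)
            if candidate not in result:
                result.append(candidate)
        return result
    if mode == "all":  # Use different seeds, all must be defined
        if len(seeds) < simulations:
            raise ValueError("one seed value is needed per simulation")
        return seeds[:simulations]
    raise ValueError(f"unknown mode: {mode!r}")


def run_simulation(seed, samples):
    """Toy model: the mean of `samples` standard Normal draws."""
    rng = random.Random(seed)
    return sum(rng.gauss(0.0, 1.0) for _ in range(samples)) / samples


for seed in seeds_for_simulations("first", [42], 3):
    print(seed, run_simulation(seed, 1000))
```

Running the same seed twice gives identical results, which is what makes a set of simulation results reproducible.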

Go To Sample:

Switching ON the Go To Sample feature will allow the user to load any simulation sample into the spreadsheet after the simulation has been performed. Switching this feature on might increase the simulation start time, so if this feature is not required, the user can switch it off to speed up the simulation. If the feature is turned on, the Go To Sample functionality becomes available in the Data list view of the Results Viewer window.

Warning: The Go To Sample feature will not work properly if the spreadsheet model has any non-ModelRisk volatile functions (like Excel's RAND() function, for example).

Save and Load group

You can save simulation settings, and load previously saved settings. The simulation settings file takes a .vmro extension.

Macro Settings Tab

The Macro settings tab controls VBA macros that are run before and after simulation samples and simulation runs. The tab allows you to specify four places at which the VBA macros can be run:

1. Before each simulation

2. Before each sample

3. After each sample

4. After each simulation

The VBA macros can be chosen from the list of available macros saved in the current workbook:
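The order in which the four macro slots would fire can be sketched as two nested loops. This is a conceptual illustration written in Python rather than VBA; the order is inferred from the slot names, and the function and hook names are hypothetical.

```python
def run_with_hooks(simulations, samples, model, hooks):
    """Run `model` once per sample, calling the four hooks around it."""
    for simulation in range(1, simulations + 1):
        hooks["before_simulation"](simulation)
        for sample in range(1, samples + 1):
            hooks["before_sample"](simulation, sample)
            model(simulation, sample)
            hooks["after_sample"](simulation, sample)
        hooks["after_simulation"](simulation)


log = []
hooks = {
    "before_simulation": lambda sim: log.append("before simulation"),
    "before_sample": lambda sim, smp: log.append("before sample"),
    "after_sample": lambda sim, smp: log.append("after sample"),
    "after_simulation": lambda sim: log.append("after simulation"),
}
run_with_hooks(1, 2, lambda sim, smp: log.append("sample"), hooks)
print(log)
```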





Application Settings tab

Application settings control some basic ModelRisk default actions.

General group

Show User Account window determines whether the Vose User Account manager will appear on start-up.

Save Results determines whether, when one saves a spreadsheet model for which a simulation has

been performed, ModelRisk will prompt the user whether to save a simulation results file.

Simulation Settings group





Samples is the default number of samples, which appears in the ModelRisk toolbar or ribbon:

The default number of samples is easily over-ridden by typing a different number in this window, or in the

Model Settings window.

Stop on output error determines the default setting, which can be over-ridden for an individual model run in the Model Settings window.

Show results window at end of simulation determines whether the Simulation Results window will automatically pop up once ModelRisk has completed a simulation. The setting can be over-ridden for an individual model run in the Model Settings window.

The Global timeout sets the maximum number of seconds for any ModelRisk function to calculate its value. It is useful when one uses calculation-intensive functions and accidentally enters parameters that result in a very long calculation loop.

Simulation Results group

Histogram Bars determines how many bars there will be in a histogram plot of simulation results. The Auto option will select the number of bars to reflect the level of detail implied by the number of samples

taken in a simulation run. The number of bars can still be changed for individual histogram plots in the

Simulation Results window.

Histogram View determines whether the default plotting of histograms should be bars or lines. This can still be changed for individual histogram plots in the Simulation Results window.

Histogram sliders determines the default location for sliders in terms of cumulative probabilities. This can still be changed for individual histogram plots in the Simulation Results window.

Box-plot percentiles determines the default percentiles plotted around the 50th percentile for box-plots.

This can still be changed for individual histogram plots in the Simulation Results window.

Trend-plot percentiles determines the default percentiles plotted around the mean for trend plots. This can still be changed for individual histogram plots in the Simulation Results window.

Library group

Allows the user to set the location of the ModelRisk Library on a local computer.





Output/Input Window

The Output/Input interface can be accessed from the Excel 2007 ribbon by clicking:

Or in Excel 2003 by clicking this icon on the ModelRisk toolbar:

The interface looks as follows:

Target cell(s) Location

Defines the cell that is to be labeled. By default this lists the location of the active Excel cell

or cells prior to clicking the Output/Input icon. This can be changed by clicking on the

‘Select from Excel’ icon.

Name(s)

Insert a name for a single target cell, or a set of names for an array of target cells.

Typically, names are defined by making reference to labels within Excel using the ‘Select

from Excel’ icon.

Range Name

If a range of cells has been defined in the Target cell(s) Location field, then it is useful to be

able to assign a name to the entire series (e.g. yearly sales, combined with Names like

{2010,2011,…})

Position in Range





Normally not used. One can manually define an array by marking each cell separately,

assigning the same Range Name and, in this field, stating the position of the cell (e.g.

1,2,3,…) relative to others in the same range.

Type

The user selects whether the cell should be an output or an input. The cell, or range of cells,

will be displayed in either the Outputs or Inputs section of the ModelRisk Results Reader.

Examples

which adds the Output labeling function:

VoseOutput("Total")+

to Cell E25.

which adds the Input labeling function:

VoseInput(A8,,"Sales",1)+







…

to Cells E8:E23.

etc

which adds the Input labeling function:

VoseInput("Period 1",,"Costs",1)+

VoseInput("Period 2",,"Costs",2)+

…

to Cells J6, L6, J8 and L8.

Together, these give the following Outputs and Inputs listing in the Results Viewer:





And the two series can be plotted in Trend plots, as follows:





Simulation Progress Control

Clicking on the ‘Start’ button, or ‘Start Simulation’ in the Start drop down

menu, will begin a simulation run.

The simulation progress control appears when one is running a simulation.

The control provides information on:

• How many simulations have been completed;

• How many samples have been completed for the current simulation;

• The number of samples that are being generated per second. This is useful if one is

comparing the performance of different ways one might build a model, or the relative

performance of different computers;

• The time that has elapsed since the simulation run started;

• The estimated time still needed to complete the simulation run; and

• The estimated time at which the simulation run will finish.
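The figures in this list are linked by simple arithmetic, sketched below. The constant-rate assumption and the function name are illustrative; the text does not describe how ModelRisk forms its estimates.

```python
from datetime import datetime, timedelta


def progress_estimates(samples_done, samples_total, elapsed_seconds, now):
    """Return (samples per second, seconds remaining, estimated finish time).

    Assumes the remaining samples run at the average rate observed so far.
    """
    if samples_done <= 0 or elapsed_seconds <= 0:
        raise ValueError("estimates need at least one completed sample")
    rate = samples_done / elapsed_seconds
    remaining = (samples_total - samples_done) / rate
    return rate, remaining, now + timedelta(seconds=remaining)


start = datetime(2019, 8, 19, 12, 0, 0)
rate, remaining, finish = progress_estimates(
    samples_done=2500,
    samples_total=10000,
    elapsed_seconds=5.0,
    now=start + timedelta(seconds=5),
)
print(rate, remaining, finish)
```

When several simulations are run, the total is the number of simulations multiplied by the number of samples in each.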

The three buttons at the bottom allow you to control the running of the simulation:

pauses the simulation run. When clicked it toggles to which will then continue the

simulation run





stops the simulation run. The results of the simulation completed up to that moment

are then displayed in the Simulation Results window.

refreshes Excel with each sample, so you can see values being generated in real time.

When clicked it toggles to which will then suppress the Excel refreshing.

You can also have ModelRisk refresh the Simulation Results window with each sample from

the beginning of a simulation by selecting the Start Demo option under the Start menu:





ModelRisk Results Viewer layout

The ModelRisk Results Viewer will open simulation results files produced

by a ModelRisk user.

On the left is a list of the Outputs and Inputs of the simulated model which have been

defined by the VoseOutput and VoseInput functions. On the right is the selected graph.

Graphical reports

The ModelRisk Results Viewer opens the file with the graphs and statistical reports from a

simulation run. There may be several graphs and reports, one in each tab shown at the

bottom of the screen. The graph type can be changed by clicking any of the graphical icons:

These will display, in order:





Histogram plots;

Cumulative ascending plots;

Cumulative descending plots;

Box plots;

Pareto plots;

Time series plots;

Spider plots;

Scatter plots; and

Tornado plots

Click the link for each plot type to view a detailed description of its use and meaning.

General Controls

The set of general controls is available on the Home Ribbon:

The View section of the Home Ribbon has the following control(s):

• Full screen: toggles the full screen mode for the active chart

• Outputs, Inputs, Statistics checkboxes: these are used to show and hide

corresponding panes with the lists of Outputs, Inputs and with Statistics for the

variables on an active chart.

• Windows: expands a drop-down menu with the options to duplicate a current

chart, rename a chart tab, and activate a chart tab.

The Variables section of the Home Ribbon has the following controls:

• Filter





Simulation results can be filtered so that one can look specifically at sets of generated

scenarios, as follows:

(1) Select the input or output of interest

(2) Click the filter icon . This opens a dialog box:

(3) Select how you wish to filter the simulation data. In this example, the results are filtered to show generated scenarios in which the selected output’s value is less than or equal to

zero. Click OK.

(4) The results shown are now filtered as required. The figure below shows the modified

histogram for the output, and also a small filter icon against the Output listing to show that

a filter is active:

Hovering over the filtered output with the mouse shows the filter that has been applied as a

tool tip pop-up:

To edit or delete the existing filter, click the Filter button while the filtered output is selected.

The following window will appear:

• Sort: Allows sorting the list of Inputs and Outputs.

Sort icon:
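Conceptually, the Filter control keeps only those generated scenarios (samples) for which the chosen variable meets the condition, and every other variable is then viewed for those scenarios alone. A small sketch with made-up data (the variable names and values are hypothetical):

```python
def filter_scenarios(scenarios, variable, condition):
    """Keep the scenarios in which `variable` satisfies `condition`."""
    return [row for row in scenarios if condition(row[variable])]


# Each row is one simulated sample of the whole model.
scenarios = [
    {"Profit": -5.0, "Sales": 10.0},
    {"Profit": 3.0, "Sales": 20.0},
    {"Profit": 0.0, "Sales": 15.0},
]

# Same condition as in the example above: output less than or equal to zero.
losing = filter_scenarios(scenarios, "Profit", lambda value: value <= 0)
print([row["Sales"] for row in losing])
```

Looking at the inputs of the filtered scenarios shows what tends to accompany the outcome of interest.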

The Chart section of the Home Ribbon has the following control(s):





• Copy

• Print

• Zoom

Their functionality is explained below:

Editing, copying, zooming and printing graphs

Each ModelRisk result graph can be edited by right-mouse clicking over graph components

like titles. The user can zoom in on a section of the graph by clicking and then

selecting a region to display.

Graphs can also be copied as a Bitmap or Metafile by using the following menu:

Graphs can be printed by clicking

The Report section of the Home Ribbon has the following control(s):

• Report

Clicking on the Report icon:

opens the following dialog box:





Selecting ‘Report selected Charts’ will create a report in Excel that is a replica of the pages

the user has created in the ModelRisk Simulation Results window. Ticking the ‘Charts’ box

will place the charts you have created in Excel. Ticking the ‘Values’ box will place into the spreadsheet all the data used to create the reports, which can be used for further analysis if

required.

Selecting ‘Report all variables’ will generate the ticked reports for all inputs and outputs.

One should be careful using this second option if there are a lot of inputs and outputs

because it will generate a very large file.

The Advanced tools section of the Home Ribbon has the following control(s):

• Go To Sample: If the simulation is performed with the Go To Sample feature

turned on, the Go To Sample functionality becomes active.

Go To Sample icon:

Statistical and data reports

ModelRisk offers three kinds of statistics and data reports:

Table of all generated input and output values

Clicking on the Data List icon:





opens a list of all generated values, sorted by the order in which they were generated:

Clicking on a column selects the data. Right-clicking then allows one to copy these data and then paste them into another document (Word, Excel, etc.) for further analysis. CTRL-Click allows

you to select several non-contiguous columns of data. SHIFT-Click allows you to select a set

of contiguous columns.

Clicking the header allows sorting the data according to the selected column. The arrows pointing down and up indicate descending and ascending sort order respectively:

If the simulation is performed with the Go To Sample feature turned on, the Go To Sample

functionality becomes available in the pop-up menu if you right-click on a specific value:





The Go To Sample feature allows you to load the selected sample into the spreadsheet model and reproduce the exact simulation sample in full, i.e. all model cells will show exactly the same values as during the simulation at the selected sample. This is useful when, for example, one wants to see exactly how the largest (or smallest) value of the output was produced and what the values of the other intermediate calculation cells were.

Table of statistics

Clicking on the Statistics icon:

opens a list of statistics for the selected inputs and outputs:

Clicking the Options button allows you to increase the number of percentiles reported.
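The kind of summary shown in such a table can be reproduced from the raw generated values. The sketch below computes a mean, a standard deviation and a chosen set of percentiles; the linear-interpolation percentile rule and the function names are assumptions, since the text does not say which rule ModelRisk uses.

```python
import statistics


def percentile(ordered, p):
    """Percentile p (0-100) of pre-sorted data, by linear interpolation."""
    n = len(ordered)
    position = p * (n - 1) / 100
    lower = int(position)
    upper = min(lower + 1, n - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def summarize(values, percentiles=(5, 50, 95)):
    """Return the mean, sample standard deviation and requested percentiles."""
    ordered = sorted(values)
    return {
        "mean": statistics.fmean(ordered),
        "stdev": statistics.stdev(ordered),
        "percentiles": {p: percentile(ordered, p) for p in percentiles},
    }


print(summarize(range(1, 102)))
print(summarize(range(1, 102), percentiles=(1, 5, 25, 50, 75, 95, 99)))
```

Passing a longer list of percentiles corresponds to increasing the number of percentiles reported.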

Pages (tabs)





Right-clicking any tab name allows you to rename the tab, or to make a copy. Making a

copy is useful if, for example, you wish to show two slightly different versions of the same

plot e.g. the same tornado plot but with one variable removed, or based on a different

output statistic.

If there are two or more pages present, right-clicking a page’s tab will also allow you to delete the page.

Saving the report

Once you are satisfied with your report you can save it as a file independently of your model by clicking the save button and selecting a destination folder and file name:





The simulation results are stored with a .vmrs (Vose ModelRisk Simulation) extension.

The simulation results file can then be reloaded later, without opening the simulated model, by clicking the open button and browsing for its location.





ModelRisk’s Library

The ModelRisk Library provides the ability to organise your risk analysis work in four ways:

• Projects – organise models and simulation results by project

• Assumptions – store assumptions that you commonly use in your models

• References – organize references (files, web addresses) that are used in your models

• SID – organize SIDs (Simulation Imported Data Files)

Projects





One will frequently have several risk analysis models within an individual project. Each

model will usually go through several different versions as the project develops, and for

each model one might run several different analyses. The ModelRisk Library Projects tab is

designed to help make organization of all these files much easier, and has the added benefit

of providing a back-up facility in which you can quickly collate a project’s files into one

folder and export.

Creating a project

One first creates a project by clicking which opens the following dialog:





The project name appears in the left column of the Projects tab. The project description

appears as a tool tip when one clicks on the project name:

Adding models to the project

In order for a model to be added to a project it must be loaded into Excel first. This is

because ModelRisk will add some information to the Excel file in order to identify it with a project.





Once a model has been loaded into Excel, select the project to which it is to be added and

click [Note: Models can be opened in Excel without having to close the Library

window.]. This will open the following dialog:

One can enter any text for the model and description fields. On the right side one must

select from among the models that are currently loaded into Excel. Click OK and the model is now added to the project:

Adding simulation results to the project





Once a model has been added to the project it can be loaded at any time into Excel by

double-clicking the icon in the ModelRisk Library’s Projects tab.

One can then run a simulation on the model and save the simulation results by adding them

to the Library by selecting the model and clicking .

The results are then displayed in the Library:





The saved simulation results can be retrieved at any time by double-clicking the Library entry; this does not require reloading the model.

Adding different versions of a model to the project

Models usually go through an iterative process of development, where content is added or

changed. It is good practice to save versions of the model as it progresses. It allows one to

review the effect on results and to go back to previous versions where an error has

occurred. New versions of a model are also often created to explore different scenarios.

ModelRisk allows one to save different versions of a model during its evolution. When you

use a model that is already registered as part of the Library and make changes to it, on

quitting the model ModelRisk will ask whether you wish to overwrite the registration in the

Library or save it as a new version of the registered model.

Assumptions





Your company may have a number of assumptions that it wishes to be used across all

models. For example, there might be an official company forecast of oil or steel prices,

exchange rates or the cost of some product you manufacture. These official assumptions

may be fixed values (deterministic) or include uncertainty (stochastic). The Assumptions tab

within ModelRisk’s Library allows you to directly use these official assumptions within your

model with a click of an icon. You can import any updates to the assumptions and future

runs of your model will then be automatically updated.

Creating an assumption

One first creates an assumption by clicking which opens the following dialog:





The assumption name and description can be any text, but it is useful to bear in mind that

the assumption list can be ordered alphabetically by either of these fields.

The assumption value can be a fixed value, for example:

2.87

1E8

27%

or a ModelRisk stochastic Object, for example:

=VoseNormalObject(2,3)

=VoseTimeGBMObject(0.02,0.1,100,)

=VoseCopulaMultiClaytonObject(10)

or simply some text, if required.

One can copy an html address into the ‘Assumption source’ field, in which case it will store

the source as a hyperlink as shown for the second to fourth references in the screenshot

above, or one can browse for a file location.

Clicking OK then adds the assumption to the assumption list. Individual assumptions can be

deleted by selecting the appropriate row and clicking .

Assumptions can be edited by right-mouse clicking an entry and selecting ‘Edit this

assumption’.

Adding an assumption to your model

Once an assumption has been incorporated into the library, it can be inserted into your

spreadsheet model, as follows:

• Click on a spreadsheet cell

• Click the Library icon

• Select the Assumption tab and then the required assumption

• Click the Insert button (or the icon if one wishes to put the reference in a different

cell to the one currently selected, or use the right-mouse click menu)





The formula in this cell now becomes:

=VoseLibAssumption("A0777432")

where ‘A0777432’ is replaced with the unique identification code for the assumption you

have selected. The VoseLibAssumption function returns whatever value it has been

assigned. Its purpose is to provide a direct link to the selected assumption. When browsing

this cell with ModelRisk’s View Function tool, it will display as follows:

Clicking the VoseLibAssumption hyperlink will automatically open the Assumptions tab of

the Library and select this assumption.

Clicking the hyperlink entry in the Source column will directly open the web page in your

default browser. If the source is a file, clicking will open the file in the appropriate

application.

References

You may have a number of references that you’ve used in quantifying the variables within

your model, that refer to some theory you are using, or that are pertinent to your model in any number of ways. The References tab within ModelRisk’s Library allows you to store

references to documents or html addresses. Using the VoseLibReference function a

reference can then be attached to a cell of your model, allowing one to quickly review the

reasons behind certain values or assumptions.

Creating a reference

One first creates a reference by clicking which opens the following dialog:

The reference name and description can be any text, but it is useful to bear in mind that the

reference list can be ordered alphabetically by either of these fields.

One can copy an html address into the ‘Reference source’ field, in which case it will store the source as a hyperlink as shown for the first two references in the screen shot above, or one can browse for a file location.

ModelRisk then adds the reference to the reference list. Individual references can be deleted by selecting the appropriate row and clicking the delete icon.

Adding a reference to your model

Once a reference has been incorporated into the library, it can be inserted into your

spreadsheet model. For example, imagine in some cell we have the formula:

=VosePoisson(127.4)

For someone not familiar with a Poisson distribution, we might want to point them to a particular reference. Reference #1 in the above list is the Wikipedia entry for the Poisson distribution. A reference entry can be inserted into Excel as follows:

• Click on the spreadsheet cell with the Poisson formula;

• Click the Library icon

• Select the References tab and then the required reference

• Click the Insert button (or the icon if one wishes to put the reference in a different

cell to the one currently selected, or use the right-mouse click menu)

The formula in this cell now becomes:

=VosePoisson(127.4)+VoseLibReference("94D047C3")





The VoseLibReference function has no effect on the cell’s calculation: it returns a value of zero in the spreadsheet. Its purpose is to provide a direct link to the selected reference.

When browsing this cell with ModelRisk’s View Function tool, it will display as follows:

Clicking the VoseLibReference hyperlink will automatically open the References tab of the

Library and select this reference.

Clicking the hyperlink entry in the Source column will directly open the web page in your

default browser. If the source is a file, clicking will open the file in the appropriate

application.





Portfolio Optimization

The Portfolio Optimization window uses the Capital Asset Pricing Model (CAPM) to find the composition of a portfolio of assets that has optimal return rate for minimal variance (i.e. sensitivity for market risk).

In the view of this model, two types of risk are at play for assets:

• The non-systematic risk attached to an individual asset. This can be reduced (to the point where it is negligible) by diversifying the portfolio, so this risk is also known as diversifiable risk.

• The systematic risk, caused by the uncertainty of the market. This can be thought of as the risk that is still there when adding the asset to a portfolio that is already well diversified. This type of risk is called the non-diversifiable or market risk.

Sensitivity for the second type of risk (which is the most important, as the first can be diversified away), called the variance of the portfolio, is represented by the beta coefficient in finance. An optimal portfolio is one that has the lowest variance - lowest beta coefficient - for a given return. In a variance-return plot, these optimal portfolio combinations make up the efficient frontier.

As total budget to invest is often a constraint when composing a portfolio, the quantities of each asset that comprise it are expressed in weights (proportions of the total budget). The budget constraint is accounted for in the fact that the weights sum to one.

One other component can be incorporated. Rather than investing the entire budget in assets, one might keep part of the budget in cash, earning an (albeit low) interest at the risk-free return rate. The variance-return relationship of this is linear, and represented as the Security Market Line (SML).

Both components are optimally accounted for in the Tangent Portfolio: where the SML and efficient frontier meet.

So, in summary: the Portfolio Optimization window finds the optimal set of asset weights for a given portfolio, taking into account market risk, correlation between the assets, the "risk-free" interest rate of the assets, and of course the returns and deviations of each individual asset. For calculating this optimum, the Tangent Portfolio, the CAPM model is used.
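As an illustration of what a tangent portfolio calculation involves, the following Python sketch uses the standard textbook mean-variance result that the tangent weights are proportional to the inverse covariance matrix applied to the excess returns, rescaled to sum to one. It is an assumption that this matches what VoseTangentPortfolio does internally; the function names `solve_linear` and `tangent_portfolio` are hypothetical, and the sketch allows negative weights (short positions).

```python
def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def tangent_portfolio(returns, deviations, correlation, risk_free):
    """Textbook tangent weights (an assumption, not ModelRisk's documented method)."""
    n = len(returns)
    # Covariance matrix built from the standard deviations and the correlation matrix.
    cov = [[correlation[i][j] * deviations[i] * deviations[j] for j in range(n)]
           for i in range(n)]
    excess = [r - risk_free for r in returns]
    raw = solve_linear(cov, excess)
    total = sum(raw)
    return [w / total for w in raw]  # budget constraint: weights sum to one


# Two uncorrelated assets with a 2% risk-free rate (illustrative numbers only).
weights = tangent_portfolio([0.08, 0.12], [0.10, 0.20],
                            [[1.0, 0.0], [0.0, 1.0]], 0.02)
```

With uncorrelated assets each weight is proportional to the asset's excess return divided by its variance, which makes the result easy to check by hand.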


Output functions of this window: VoseTangentPortfolio





Window elements

In the output range field, you can specify where the calculated asset weights are inserted in the spreadsheet (upon pressing the OK button). This should be a 1xN array, where N is the number of assets.

In the Number of Assets field, you can specify the number of assets to be included in the portfolio (5 by default). This should be a positive integer.

In the Interest Rate field, you can specify the risk-free interest rate. This should be a real number greater than zero.

In the "asset matrix" shown, you can specify the Labels, Returns and Deviations attached to each individual asset.

The correlation matrix of the portfolio is shown. This matrix's elements can be obtained from the spreadsheet (should be an NxN array), or specified within the Portfolio Optimization window.

On the right, the individual assets, the efficient frontier and security market line are shown in a variance-return plot. By default the assets are colored in green, the efficient frontier in blue and the security market line in red.





Data Object Window

Introduction

Many datasets are simply not practical to import into Excel (especially if the datasets are very large and frequently updated) and, even if they are imported, we might not want to repeat (for example) a regression analysis on them each time a model opens. To address this issue, a unique ModelRisk Data Object functionality has been developed that has the ability to:

• Import data from common databases in an easy-to-use manner;

• Construct custom SQL queries with filtering and sorting;

• Create links to spreadsheets with data that are not loaded in the memory;

• Easily update queries when needed;

• Create Data Object functions that refer to these datasets and place them in a spreadsheet;

• Link ModelRisk functions to Data Objects.

The Data Objects collate the data from relevant databases to be called by the calculation routines and the graphical user interfaces. This eliminates the need for the user to open and query numerous databases.

VoseDataObject functionality can be used in several types of ModelRisk functions:

• Probability calculation functions, e.g. =VoseNormalProb(VoseDataObject(<link to data source>),Mean, Stdev, Cumulative)

• Distribution fitting functions, e.g. =VoseBetaFit(VoseDataObject(<link to data source>))

• Copula fitting functions, e.g. =VoseCopulaMultiClaytonFit(VoseDataObject(<link to data source>))

• Time series fitting functions, e.g. =VoseTimeGBMFit(VoseDataObject(<link to data source>))

Window elements

The ModelRisk Data Object interface allows easy linking to data located in Excel worksheets or in databases that support ODBC driver connection.





Vose Data Object main window view

In the “Define Data Source” field you can define the source of the data. The two buttons on the right of this field allow the creation of a new data source, which can be either a link to a worksheet range (left button), or a link to a database (right button).

If connection to a database needs authorization, check the “Authorization needed” field and fill in the details for the “Login” and “Password” fields.

Checking the connection with the data source can be done by clicking the “Verify database connection” button. If the check succeeds, you will get a confirmation message.

Linking to data in the databases can be done by typing the SQL queries directly into the Query string field or using the Query constructor (click the “Wizard” button).

Query constructor window view.





The Query constructor window has three tabs:

“Select data source”

This tab is for constructing the main query line. The “Database table fields” list shows all database tables and fields that the user can connect to. Move all required fields into the “Selected fields” list. The “Query string” field below will show the main query line for the selected data.

“Define filter options”

This tab is for filtering the selection made in the first tab.

Filtering consists of two levels of filters: Joining condition and Filter condition.

In the Joining condition you can specify the logic for combining the filters by selecting the necessary value from the list:

The Filter condition is set by the left argument (“Condition Left argument” field), comparison sign (“Condition” field), and the right argument (“Condition Right argument” field). Arguments can be single values as well as database table fields.

To select a database table field as a condition argument, the user should click the following button:

The comparison sign should be picked from the list (“Condition” field):





When the filter is created it should be added to the filters list by clicking the “Add filter to list” button. To delete it from the filters list, select it and click the “Delete filter” button.

The query string, including the filters, is shown in the “Query string” field.

“Define sorting options”

This tab allows adding sorting options to the selected entries. The left pane (“Database tables fields”) lists all fields that are available for sorting. To sort the data, select the fields that need sorting and move them to the “Sorted fields order” list, choosing the sorting direction in the control above.

The final query string will be shown in the “Query string” field.

When the query is constructed, press the “OK” button and you will get back to the main “Vose Data Object” window.

If desired, you can click the “Run query” button. This will run the constructed query, provided that the query has been constructed correctly. The “Query results” window will then display the query results in tabular form.

Query results window view





The tabulated data can be exported into Microsoft Excel or Microsoft Word by checking the required Export type and clicking the “Export” button.

Attention: avoid exporting large data sets to Word, as it can take a long time.

By closing the results window, you will get back to the main “Vose Data Object” window. After clicking the “OK” button, a VoseDataObject() function with the parameters (reference to database/range on the worksheet, selection query etc.) will be placed in the range that was specified in the “Output location” field.





Ruin Calculation

Introduction

The Ruin Calculation models scenarios for the cash flow that comes with an insurance policy: the available funds are decreased by payment events of random size that occur randomly in time, and increased by selling policies of fixed size.

A time horizon is set, and Ruin Calculation models whether or not we have a Ruin (i.e. funds dropping below zero within the time horizon).

A dividend threshold can be set. When the budget exceeds this threshold, a dividend is paid out, and the available funds remain at the same threshold level.

The discount rate at which the value of the funds decreases over time can be set. This discount rate is taken into account when calculating the Net Present Value (NPV) of the policy in this scenario (i.e. the total dividend we have).


Output functions of this window: VoseRuin, VoseRuinFlag, VoseRuinSeverity, VoseRuinTime

Window elements

In the Source data region, you can set the following quantities, as described above:

• Claim Interval - a discrete distribution object.

• Claim size - a continuous distribution object

• Initial reserve - the funds available at point zero in time. This should be a real number

greater than zero.

• PolicyPrice - the income generated by selling an individual policy. This should be a real

number greater than zero.

• Horizon - the time horizon against which to compare whether a ruin event occurs or not.

• DividendTreshold - The level of funds above which they are used for dividend.

• DiscountRate - The rate (in fraction-of-total per time unit) at which the value of the funds/dividends decreases over time. This should be a real number greater than or equal to zero (it is typically a small number).





The graph displays the time (horizontal axis), fund level (left vertical axis), dividends paid (right vertical axis), dividend threshold (horizontal red line), fund level at every point in time (black curve with dots for every funds-altering event) and time horizon (green vertical line).





Depletion Calculation

Introduction

With the Depletion Calculation window you can model the depletion of resources by a given time horizon.

This is modeled with costs of variable amount (Claim size distribution) occurring randomly in time (Claim interval distribution).

If the resources are depleted before the set time horizon, the shortfall (i.e. additional resources needed to pay the last incoming payment event) is calculated as well.

The output of this calculation consists of three parts:

• The time until depletion (if it occurs).

• A TRUE/FALSE flag, answering the question "does depletion occur before the set time horizon".

• The shortfall (if it occurs).



Output functions of this window: VoseDepletion, VoseDepletionFlag, VoseDepletionShortfall, VoseDepletionTime

Window elements

In the Source data region, four parameters can be specified.

In the ClaimInterval field, the Distribution Object to model the time interval between payment events can be specified. Typically (and by default), this is an exponential distribution.

In the ClaimSize field, the Distribution Object to model the size of payments can be specified. By default this is a lognormal distribution.

Three outputs are shown, as they are under this generated scenario:

• DepletionTime - the time until the resources are depleted, if they are depleted before the time horizon. If the resources are not depleted before the time horizon, -1 is returned.

• DepletionFlag - a boolean variable (TRUE/FALSE) that returns whether or not the resources are depleted before the time horizon.

• DepletionShortfall - the shortfall (i.e. the extra amount of resources that would be needed to complete the final payment)

These outputs will be inserted as a 3x2 array in the spreadsheet, upon pressing OK.
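The scenario logic described above can be sketched in a few lines of Python. This is a minimal illustration, not ModelRisk's implementation: the function name `simulate_depletion` and its arguments are hypothetical, depletion is assumed to occur when a payment cannot be met in full, and a shortfall of zero is assumed when no depletion occurs.

```python
import random


def simulate_depletion(next_interval, next_claim, resources, horizon):
    """Run one scenario and return (depletion_time, depletion_flag, shortfall).

    next_interval and next_claim are zero-argument callables that return the
    time to the next payment event (must be positive) and its size.
    """
    time = 0.0
    remaining = resources
    while True:
        time += next_interval()
        if time > horizon:
            # Horizon reached without depletion: -1 as described in the text,
            # and a shortfall of 0.0 (an assumption of this sketch).
            return -1, False, 0.0
        remaining -= next_claim()
        if remaining < 0:
            # The last payment could not be met in full; the deficit is the shortfall.
            return time, True, -remaining


# Example scenario with exponential intervals and lognormal claim sizes,
# mirroring the default distribution types; the parameter values are arbitrary.
rng = random.Random(42)
scenario = simulate_depletion(lambda: rng.expovariate(1.0),
                              lambda: rng.lognormvariate(2.0, 0.5),
                              resources=100.0, horizon=10.0)
```

Passing the distributions as callables keeps the sketch independent of any particular sampling library, and makes it easy to substitute fixed values when checking the logic by hand.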





Integrate Calculation

Introduction

The Integrate Calculation allows one to numerically integrate a real, continuous, univariate function between user-specified min and max boundaries.

The numerical integration performed is based on the Gauss-Kronrod quadrature formula.

In the integrand, the variable to be integrated over is represented by a #. Excel's mathematical functions (e.g. SIN()) can be used in the formula, including VoseFunctions, so a valid integrand would be for example:

VoseNormalProb(#,10,1,0)*4*VoseLognormalProb(#,4,5)

Note that the integrand formula is not preceded by a '=' sign, as it is an argument of the VoseIntegrate

function that is the output of this window.

Window elements

In the Expression field, you can specify the functional form of the integrand. Use a # symbol for the integrated variable.

In the Min field, the lower integration boundary can be set.

In the Max field the upper integration boundary can be set.

The Steps parameter is an optional integer used to determine how many sub-intervals are made within each interval approximation as the function iterates to optimized precision.

Note that this is not the same as the number of iterations done to achieve the numerical result - a steps value as low as 5 will still give extremely accurate results.
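To give a feel for Gauss-Kronrod quadrature, here is a minimal Python sketch of the widely used 7-point Gauss / 15-point Kronrod pair with simple interval bisection. It is an illustration only: the rule order, the tolerance handling and the names `gk15` and `integrate` are assumptions, and the sketch does not model the Steps parameter. A Python callable plays the role of the # expression.

```python
import math

# Positive abscissae (and 0) of the 15-point Kronrod rule on [-1, 1], with weights.
XGK = [0.991455371120812639, 0.949107912342758525, 0.864864423359769073,
       0.741531185599394440, 0.586087235467691130, 0.405845151377397167,
       0.207784955007898468, 0.0]
WGK = [0.022935322010529225, 0.063092092629978553, 0.104790010322250184,
       0.140653259715525919, 0.169004726639267903, 0.190350578064785410,
       0.204432940075298892, 0.209482141084727828]
# Weights of the embedded 7-point Gauss rule, for nodes XGK[1], XGK[3], XGK[5], XGK[7].
WG = [0.129484966168869693, 0.279705391489276668,
      0.381830050505118945, 0.417959183673469388]


def gk15(f, a, b):
    """Return (Kronrod estimate, |Kronrod - Gauss|) for the integral of f on [a, b]."""
    center = 0.5 * (a + b)
    half = 0.5 * (b - a)
    f_center = f(center)
    kronrod = WGK[7] * f_center
    gauss = WG[3] * f_center
    for i in range(7):
        dx = half * XGK[i]
        pair = f(center - dx) + f(center + dx)
        kronrod += WGK[i] * pair
        if i % 2 == 1:  # odd-indexed Kronrod nodes are also Gauss nodes
            gauss += WG[i // 2] * pair
    return kronrod * half, abs(kronrod - gauss) * half


def integrate(f, a, b, tol=1e-10, depth=40):
    """Bisect the interval until the error estimate falls below tol."""
    estimate, error = gk15(f, a, b)
    if error <= tol or depth == 0:
        return estimate
    mid = 0.5 * (a + b)
    return (integrate(f, a, mid, tol / 2, depth - 1)
            + integrate(f, mid, b, tol / 2, depth - 1))


def normal_pdf(x, mean, stdev):
    z = (x - mean) / stdev
    return math.exp(-0.5 * z * z) / (stdev * math.sqrt(2.0 * math.pi))


# Density of a Normal(10, 1) integrated over 5 to 15 should be very close to 1.
area = integrate(lambda x: normal_pdf(x, 10.0, 1.0), 5.0, 15.0)
```

The difference between the Kronrod and Gauss estimates serves as a conservative error indicator, which is why the same function evaluations can drive both the result and the stopping rule.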





Interpolate Calculation

Introduction

The Interpolate calculation uses linear interpolation to return a dependent variable value given data and an independent variable value.

The function searches for the nearest values in independent above and below Value, finds the corresponding values in dependent and interpolates between them.

Output functions of this window: VoseInterpolate

Window elements

Interpolate parameters

In the Value field you can specify for which independent value you want to calculate the interpolated

dependent value.

In the Independent field you can provide the array of independent values.

In the Dependent field you can provide the array of dependent values. Note that both these arrays need

to be of the same size.
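The search-and-interpolate step can be sketched in Python as follows. The function name `interpolate` is hypothetical, and raising an error when Value lies outside the range of the independent data is an assumption of this sketch rather than documented VoseInterpolate behaviour.

```python
def interpolate(value, independent, dependent):
    """Linear interpolation of dependent at value, from paired data arrays."""
    if len(independent) != len(dependent):
        raise ValueError("independent and dependent must be the same size")
    pairs = sorted(zip(independent, dependent))
    below = [p for p in pairs if p[0] <= value]   # points at or below value
    above = [p for p in pairs if p[0] >= value]   # points at or above value
    if not below or not above:
        raise ValueError("value lies outside the range of the independent data")
    x0, y0 = below[-1]   # nearest point below
    x1, y1 = above[0]    # nearest point above
    if x1 == x0:         # value coincides with a data point
        return y0
    return y0 + (y1 - y0) * (value - x0) / (x1 - x0)


result = interpolate(2.5, [1, 2, 3, 4], [10, 20, 30, 40])
```

Sorting the pairs first means the input arrays do not need to be ordered, only matched element by element.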

Result, errors and output location

In the result field, the calculated interpolated value is displayed.

This value can be inserted in the spreadsheet cells provided in the Output location field, by pressing the OK button.





Correlation Matrix Calculation

Introduction

The Correlation Matrix Calculation window calculates and visualizes the rank order correlation matrix of a data set.

The correlation matrix contains Spearman's rank order coefficient (also known as rho) for each pair of datasets.

It is symmetric because correlation between A and B is the same as correlation between B and A. Its elements lie in the [-1,1] interval.

As the correlation of a variable with itself is 1, the diagonal elements of the matrix are all equal to 1.

The output of the function is an nxn array where n is the number of variables in the data set.
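A rank order correlation matrix of this kind can be sketched in Python by ranking each variable and then computing the ordinary (Pearson) correlation of the ranks. The function names are hypothetical, and giving tied values the average of their ranks is a common convention assumed here; how VoseCorrMatrix treats ties is not stated in this text.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for a constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rank_correlation_matrix(columns):
    """n x n matrix of Spearman's rho for a list of n equally long data columns."""
    ranked = [average_ranks(column) for column in columns]
    n = len(ranked)
    return [[1.0 if i == j else pearson(ranked[i], ranked[j]) for j in range(n)]
            for i in range(n)]


matrix = rank_correlation_matrix([[1, 2, 3, 4, 5],
                                  [1, 4, 9, 16, 25],
                                  [5, 4, 3, 2, 1]])
```

In this example the second column is a monotonic increasing transformation of the first, so their rank correlation is 1 even though the relationship is not linear; the third column is in reverse order, giving -1.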


Output functions of this window: VoseCorrMatrix

Window Elements

In the location field you can indicate the range of spreadsheet cells that contain the data. You can specify whether these are orientated in columns (selected by default, as this is usually the case) or rows. The number of columns (respectively rows) is the n mentioned above.

The correlation matrix of the data is shown. Its elements are the pairwise correlation of each of the variables within the dataset.

Optionally, in the Labels field you can specify where the labels of the dataset are in the spreadsheet. If no labels are selected, the datasets will be named Var1, Var2, etc.

In the Output location field you can specify where the correlation matrix should be placed in the spreadsheet. It will be inserted there upon pressing the OK button.

The number of data values for each variable (source points) is displayed below the graphs.

Selecting any white square in the correlation matrix will generate a scatter plot of the data for the corresponding two variables. You can choose whether to display the actual data values or the percentiles. In the first case, the horizontal and vertical axes are adjusted to the range of the data. When showing percentiles, the range of the axes is [0,1].





Bayesian Model Averaging

Bayesian model averaging

Bayesian Model Averaging (BMA) is a technique for amalgamating several plausible probability models fit to the same data set using Bayes' Theorem.

For example, imagine one has the following observed random values:

{2.434, 2.814, 2.662, 1.419, 1.314, 3.954, 4.238, 2.521, 1.774, 1.237, 0.975}

Let’s say that we want to try fitting Lognormal and Normal distributions to these data,

producing the following results in ModelRisk:

The fitted models are:

VoseNormal(2.303818,1.038533)

VoseLognormal(2.310153,1.130050)

The joint likelihoods (the product of the probability density of the fitted distribution evaluated for each data point) of observing these data from the fitted distributions are:

Normal: 1.09842E-07 Lognormal: 2.57535E-07

Thus the data are 2.34 times more likely to be observed from the Lognormal distribution

than the Normal. Distribution fitting software usually forces the user to choose one

distribution to use. However, one may not wish to eliminate the Normal distribution as a

possible model. Using BMA we can weight the Lognormal as being 2.34 times more plausible

than the Normal but still allow the Normal distribution as a plausible fit. This resultant BMA

distribution is shown below, together with the fitted Normal and Lognormal distributions.
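The arithmetic behind these weights can be reproduced with a short Python sketch. It recomputes the Normal joint likelihood from the data and the fitted parameters quoted above, takes the Lognormal likelihood as quoted (the Lognormal parameterization is not restated here), and normalizes the two likelihoods into weights. The function names are hypothetical and the equal prior weights correspond to the uninformed case.

```python
import math

data = [2.434, 2.814, 2.662, 1.419, 1.314, 3.954,
        4.238, 2.521, 1.774, 1.237, 0.975]


def normal_pdf(x, mean, stdev):
    z = (x - mean) / stdev
    return math.exp(-0.5 * z * z) / (stdev * math.sqrt(2.0 * math.pi))


def joint_likelihood(values, pdf):
    """Product of the density evaluated at each data point."""
    return math.prod(pdf(x) for x in values)


def bma_weights(likelihoods, priors=None):
    """Normalize likelihood x prior into weights; equal priors if none are given."""
    if priors is None:
        priors = [1.0] * len(likelihoods)
    products = [like * prior for like, prior in zip(likelihoods, priors)]
    total = sum(products)
    return [p / total for p in products]


normal_like = joint_likelihood(data, lambda x: normal_pdf(x, 2.303818, 1.038533))
lognormal_like = 2.57535e-07          # value quoted in the text
ratio = lognormal_like / normal_like  # roughly 2.34, as stated above
weights = bma_weights([normal_like, lognormal_like])  # about 0.30 and 0.70
```

The weights are simply the likelihoods rescaled to sum to one, so the Lognormal receives about 70% of the weight and the Normal about 30%.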





Priors

The above analysis assumes that we have an uninformed prior (meaning that before

evaluating these data we had no preference for either distribution type). One could also

incorporate a prior preference using prior weights. For example, a literature review might

reveal seven occasions when a Normal distribution was used for this type of variable, and

four occasions when a Lognormal variable was used. Thus, we might consider applying a

weighting of {7,4} or, equivalently, {7/11, 4/11}. In fact a better, though less intuitive,

prior in this case would be {(7+1)/(11+2), (4+1)/(11+2)}, using the mean of a Beta

distribution applied in this fashion.
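The prior weights in this example, and their effect on the final weighting, can be worked through in Python. The sketch below is illustrative only: `smoothed_prior` is a hypothetical helper that adds one to each count, which reproduces the {(7+1)/(11+2), (4+1)/(11+2)} prior for two models (extending it to more than two models is an assumption), and the likelihoods are the values quoted earlier.

```python
from fractions import Fraction


def smoothed_prior(counts):
    """Prior weights (count + 1) / (total + number of models)."""
    total = sum(counts)
    k = len(counts)
    return [Fraction(c + 1, total + k) for c in counts]


def posterior_weights(likelihoods, priors):
    """Normalize likelihood x prior so that the weights sum to one."""
    products = [like * float(prior) for like, prior in zip(likelihoods, priors)]
    total = sum(products)
    return [p / total for p in products]


priors = smoothed_prior([7, 4])        # Normal used 7 times, Lognormal 4 times
likelihoods = [1.09842e-07, 2.57535e-07]  # Normal, Lognormal (quoted in the text)
weights = posterior_weights(likelihoods, priors)
```

With this prior the Normal distribution's weight rises from about 0.30 to about 0.41, showing how a prior preference can partly offset a lower likelihood.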

ModelRisk BMA functions

ModelRisk includes a range of BMA functions that will construct and simulate from BMA

variables and includes the option to incorporate prior weightings. This model provides some

examples of their use. They are:

VoseBMA – simulates from a univariate BMA distribution

VoseBMAObject - defines a univariate BMA distribution

VoseBMAProb – calculates the (joint) relative or cumulative probability of value(s) for a

univariate BMA distribution

VoseBMAProb10 – calculates log base 10 of the (joint) relative or cumulative probability of

value(s) for a univariate BMA distribution

VoseCopulaBMA – simulates from a BMA copula

VoseCopulaBMAObject - defines a BMA copula

VoseTimeBMA – simulates from a BMA time series

VoseTimeBMAObject - defines a BMA time series





VoseBMA

=VoseBMA({DistributionFitObjects}, {Priors},U)

Example model

This function returns random samples from a BMA fitted distribution.

• {DistributionFitObjects} – is an array of k distribution objects fitted to the same data set.

• {Priors} – is an optional array of length k of subjective prior weights. If omitted, the weights are

assumed equal.

• U – is the standard optional U parameter used with random sampling functions for univariate distributions.

Note: all fitted distributions must apply to the same data set.
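Conceptually, sampling from a BMA distribution can be pictured as sampling from a mixture: pick one of the fitted models with probability equal to its weight, then draw a value from that model. The Python sketch below illustrates that idea under this assumption; it is not a description of how VoseBMA is implemented, and the name `sample_bma` and the example weights and log-scale parameters are hypothetical.

```python
import random


def sample_bma(samplers, weights, rng=random):
    """Draw one value: choose a model according to its weight, then sample from it."""
    chosen = rng.choices(samplers, weights=weights, k=1)[0]
    return chosen()


rng = random.Random(2024)
samplers = [
    lambda: rng.gauss(2.303818, 1.038533),   # fitted Normal from the example
    lambda: rng.lognormvariate(0.73, 0.46),  # illustrative log-scale parameters only
]
# Weights of roughly 0.30 and 0.70, as in the uninformed-prior example.
draws = [sample_bma(samplers, [0.30, 0.70], rng) for _ in range(1000)]
```

Because each draw comes from exactly one of the candidate models, the resulting sample reflects both the variability within each model and the uncertainty about which model is appropriate.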





VoseBMAObject

=VoseBMAObject({DistributionFitObjects}, {Priors},U)

Example model

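This function defines a univariate BMA fitted distribution.

• {DistributionFitObjects} – is an array of k distribution objects fitted to the same data set.

• {Priors} – is an optional array of length k of subjective prior weights. If omitted, the weights are assumed equal.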






VoseBMAProb

=VoseBMAProb({x}, {DistributionFitObjects}, {Priors}, cumulative)

Example model

This function calculates the joint probability density (or probability mass) or the joint cumulative probability for a set of values {x} against a BMA fitted distribution.

• {x} – array containing one or more values.

• {DistributionFitObjects} – is an array of k distribution objects fitted to the same data set.

• {Priors} – is an optional array of length k of subjective prior weights. If omitted, the weights are assumed equal.

• cumulative - optional boolean parameter (TRUE/FALSE) specifying if the cumulative (TRUE) probability of the {x} should be returned or not (FALSE, default).






VoseBMAProb10

=VoseBMAProb10({x}, {DistributionFitObjects}, {Priors}, cumulative)

Example model

This function calculates log base 10 of the joint probability density (or probability mass) or of the joint cumulative probability for a set of values {x} against a BMA fitted distribution.

• {x} – array containing one or more values.

• {DistributionFitObjects} – is an array of k distribution objects fitted to the same data set.

• {Priors} – is an optional array of length k of subjective prior weights. If omitted, the weights are assumed equal.

• cumulative - optional boolean parameter (TRUE/FALSE) specifying if the cumulative (TRUE) probability of the {x} should be returned or not (FALSE, default).






VoseCopulaBMA

=VoseCopulaBMA({CopulaFitObjects}, {Priors})

Example model

This array function returns random samples from a BMA fitted copula.

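• {CopulaFitObjects} – is an array of k copula objects fitted to the same data set.

• {Priors} – is an optional array of length k of subjective prior weights. If omitted, the weights are assumed equal.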

Note: all fitted copulas must be of the same dimension and apply to the same data set.





VoseCopulaBMAObject

=VoseCopulaBMAObject({CopulaFitObjects}, {Priors})

Example model

This function defines a BMA fitted copula.

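• {CopulaFitObjects} – is an array of k copula objects fitted to the same data set.

• {Priors} – is an optional array of length k of subjective prior weights. If omitted, the weights are assumed equal.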

Note: all fitted copulas must be of the same dimension and apply to the same data set.





VoseTimeBMA

VoseTimeBMA({TimeSeriesFitObjects}, {Priors})

Example model

This array function returns random samples from a BMA fitted time series.

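• {TimeSeriesFitObjects} – is an array of k time series objects fitted to the same data set.

• {Priors} – is an optional array of length k of subjective prior weights. If omitted, the weights are assumed equal.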

Note: all fitted time series must be of the same dimension and apply to the same data set.





VoseTimeBMAObject

VoseTimeBMAObject({TimeSeriesFitObjects}, {Priors})

Example model

This function defines a BMA fitted time series.

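• {TimeSeriesFitObjects} – is an array of k time series objects fitted to the same data set.

• {Priors} – is an optional array of length k of subjective prior weights. If omitted, the weights are assumed equal.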

Note: all fitted time series must be of the same dimension and apply to the same data set.





Six Sigma

ModelRisk's Six Sigma functions

The Industrial version of ModelRisk incorporates a set of functions that will return standard

Six Sigma performance measurements for the random variable in a spreadsheet cell. The

results are provided only after a simulation run has been completed. If no simulation has

been completed yet, the functions return the message “No simulation results”.

Form of the Six Sigma functions

Each function takes the form:

VoseSixSigmaFunction(OutputCell, Parameter1, Parameter2, …, SimulationID)

where:

OutputCell is a cell reference (like ‘A1’ or a cell name) for which the Six Sigma metric

is to be calculated;

Parameter1, Parameter2, … are parameters specific to the metric; and

SimulationID is an optional parameter used when running multiple simulations.

All of the Six Sigma functions use a sub-set of the following parameters:

• Lower Limit

• Upper Limit

• Target

• Long Term Shift

• Number of Standard Deviations

The functions make frequent use of the following:

m - the mean of the values generated in OutputCell;

s - the standard deviation of the values generated in OutputCell;

Φ(∙) - the standard normal cumulative distribution function; and

Φ⁻¹(∙) - the standard normal inverse cumulative distribution function.

Function list

The functions, in alphabetical order, are:

VoseSixSigmaCp(OutputCell, LowerLimit, UpperLimit, SimulationID)





This function calculates the ‘Process Capability’ Cp defined as:

VoseSixSigmaCpk(OutputCell, LowerLimit, UpperLimit, SimulationID)

This function calculates the ‘Process Capability Index’ Cpk defined as:

VoseSixSigmaCpkLower(OutputCell, LowerLimit, SimulationID)

This function calculates the ‘One-Sided Capability Index’ based on the lower

specification limit and is defined as:

VoseSixSigmaCpkUpper(OutputCell, UpperLimit, SimulationID)

This function calculates the ‘One-Sided Capability Index’ based on the upper specification limit and is defined as:


VoseSixSigmaCpm(OutputCell, LowerLimit, UpperLimit, Target, SimulationID)

This function calculates the ‘Taguchi Capability Index’ defined as:

VoseSixSigmaDefectPPM(OutputCell, LowerLimit, UpperLimit, SimulationID)

This function calculates the ‘Defective Parts Per Million’ defined as:

VoseSixSigmaDefectShiftPPM(OutputCell, LowerLimit, UpperLimit, LongTermShift,

SimulationID)

This function calculates the ‘Defective Parts Per Million’ with a shift and is defined as:

VoseSixSigmaDefectShiftPPMLower(OutputCell, LowerLimit, LongTermShift,

SimulationID)

This function calculates the ‘Defective Parts Per Million’ below the lower specification

limit with a shift and is defined as:





VoseSixSigmaDefectShiftPPMUpper(OutputCell, UpperLimit, LongTermShift,

SimulationID)

This function calculates the ‘Defective Parts Per Million’ above the upper specification limit with a shift and is defined as:

VoseSixSigmaK(OutputCell, LowerLimit, UpperLimit, SimulationID)

This function calculates the Six Sigma ‘Measure of Process Center’ defined as:

VoseSixSigmaLowerBound(OutputCell, NumberOfStandardDeviations, SimulationID)

This function calculates the ‘Lower Bound’ as a specific number of standard deviations

below the mean and is defined as:

VoseSixSigmaProbDefectShift(OutputCell, LowerLimit, UpperLimit, LongTermShift,

SimulationID)

This function calculates the ‘Probability of Defect’ outside LowerLimit and UpperLimit

with a shift and is defined as:

VoseSixSigmaProbDefectShiftLower(OutputCell, LowerLimit, LongTermShift,

SimulationID)

This function calculates the ‘Probability of Defect’ below the LowerLimit with a shift

and is defined as:

VoseSixSigmaProbDefectShiftUpper(OutputCell, UpperLimit, LongTermShift,

SimulationID)

This function calculates the ‘Probability of Defect’ above the UpperLimit with a shift

and is defined as:





VoseSixSigmaSigmaLevel(OutputCell, LowerLimit, UpperLimit, LongTermShift,

SimulationID)

This function calculates the ‘Process Sigma Level’ with a shift and is defined as:

VoseSixSigmaUpperBound(OutputCell, NumberOfStandardDeviations, SimulationID)

This function calculates the ‘Upper Bound’ as a specific number of standard deviations

above the mean and is defined as:

VoseSixSigmaYield(OutputCell, LowerLimit, UpperLimit, LongTermShift, SimulationID)

This function calculates the Six Sigma ‘Yield’ with a shift, i.e. the fraction of the

process that is free of defects, and is defined as:

VoseSixSigmaZlower(OutputCell, LowerLimit, SimulationID)

This function calculates the number of standard deviations of the process that

LowerLimit is below the mean of the process and is defined as:

VoseSixSigmaZmin(OutputCell, LowerLimit, UpperLimit, SimulationID)

This function calculates the minimum of Zlower and Zupper and is defined as:

VoseSixSigmaZupper(OutputCell, UpperLimit, SimulationID)


This function calculates the number of standard deviations of the process that UpperLimit is above the mean of the process and is defined as:
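For orientation, the Python sketch below computes a few of these metrics from a list of simulated values using their common textbook forms, written in terms of m and s as defined above. These textbook forms are an assumption: ModelRisk's own definitions may differ, in particular in how LongTermShift enters the shifted metrics, which are therefore left out. The function name `six_sigma_metrics` and the use of the sample standard deviation are also assumptions.

```python
from statistics import NormalDist, fmean, stdev


def six_sigma_metrics(samples, lower_limit, upper_limit):
    """Textbook (unshifted) Six Sigma metrics; an illustration, not ModelRisk's code."""
    m = fmean(samples)   # mean of the generated values
    s = stdev(samples)   # sample standard deviation (an assumption of this sketch)
    z_lower = (m - lower_limit) / s   # standard deviations from mean down to LowerLimit
    z_upper = (upper_limit - m) / s   # standard deviations from mean up to UpperLimit
    phi = NormalDist().cdf            # standard normal cumulative distribution function
    prob_defect = phi(-z_lower) + (1.0 - phi(z_upper))
    return {
        "Cp": (upper_limit - lower_limit) / (6.0 * s),
        "Cpk": min(z_lower, z_upper) / 3.0,
        "CpkLower": z_lower / 3.0,
        "CpkUpper": z_upper / 3.0,
        "Zlower": z_lower,
        "Zupper": z_upper,
        "Zmin": min(z_lower, z_upper),
        "DefectPPM": 1e6 * prob_defect,
    }


metrics = six_sigma_metrics([9.0, 10.0, 11.0], lower_limit=7.0, upper_limit=13.0)
```

In this small example the mean is 10 and the standard deviation is 1, so both limits sit three standard deviations from the mean, giving Cp = Cpk = 1 and roughly 2,700 defective parts per million under the normality assumption.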

Assumptions

The Six Sigma functions are based on the assumption that samples generated by the

OutputCell are approximately normally distributed. You can check this visually by running a

simulation with OutputCell named as a simulation output and viewing the result in

histogram form. Alternatively, you can check this numerically within the model by writing

two formulae in the spreadsheet:

=VoseSimSkewness(OutputCell, SimulationID)





=VoseSimKurtosis(OutputCell, SimulationID)

where it is only necessary to specify SimulationID if it is also being used in the

VoseSixSigma function. These functions will return values close to 0 and 3 respectively at

the end of a simulation run if samples generated by the OutputCell are approximately

normal and provided one has run a sufficiently large number of samples (1000 or so should

be enough).
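The same check can be illustrated outside the spreadsheet. The Python sketch below computes skewness and (non-excess) kurtosis from a list of values using the simple moment-based formulas; whether VoseSimSkewness and VoseSimKurtosis apply any small-sample corrections is not stated here, so the exact estimator is an assumption, and the function name is hypothetical.

```python
import random


def skewness_and_kurtosis(samples):
    """Moment-based skewness and kurtosis (a normal distribution gives about 0 and 3)."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


rng = random.Random(12345)
normal_samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]
skew, kurt = skewness_and_kurtosis(normal_samples)  # expect values near 0 and 3
```

Values far from 0 and 3 suggest that the normality assumption behind the Six Sigma functions is questionable for that output.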

The most important vulnerability of the normality assumption is that it implies that

distances about the mean of the distribution measured in standard deviations have a

consistent probabilistic interpretation. For example, people with some limited knowledge of

statistics can often quote that a range +/- two standard deviations about the mean contains

95% of the distribution but forget that this rule of thumb only applies for the normal

distribution. Tchebysheff’s Rule expresses the more general, and far weaker, reality.
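The gap between the two statements is easy to quantify. The Python sketch below compares the probability that a normal variable lies within k standard deviations of its mean with the lower bound 1 - 1/k² that Tchebysheff's Rule guarantees for any distribution with a finite variance. The function names are hypothetical.

```python
import math


def normal_coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))


def tchebysheff_bound(k):
    """Minimum probability within k standard deviations for any distribution (k > 1)."""
    return 1.0 - 1.0 / k ** 2


for k in (2, 3):
    print(k, round(normal_coverage(k), 4), round(tchebysheff_bound(k), 4))
```

For k = 2 the normal distribution gives about 95.4%, while the general guarantee is only 75%, which is why a capability metric expressed in standard deviations can be misleading for a non-normal output.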

An additional assumption that is made in using the LongTermShift parameter is that the

process mean will drift by this number of standard deviations over time, but that the

standard deviation itself will remain unchanged.





VoseSixSigmaCp

VoseSixSigmaCp(OutputCell, LowerLimit, UpperLimit, SimulationID)

This function calculates the ‘Process Capability’ Cp defined as:





VoseSixSigmaCpk

VoseSixSigmaCpk(OutputCell, LowerLimit, UpperLimit, SimulationID)

This function calculates the ‘Process Capability Index’ Cpk defined as:





VoseSixSigmaCpkLower

VoseSixSigmaCpkLower(OutputCell, LowerLimit, SimulationID)

This function calculates the ‘One-Sided Capability Index’ based on the lower specification limit and is defined as:






VoseSixSigmaCpkUpper

VoseSixSigmaCpkUpper(OutputCell, UpperLimit, SimulationID)

This function calculates the ‘One-Sided Capability Index’ based on the upper specification limit and is defined as:






VoseSixSigmaCpm

VoseSixSigmaCpm(OutputCell, LowerLimit, UpperLimit, Target, SimulationID)

This function calculates the ‘Taguchi Capability Index’ defined as:





VoseSixSigmaDefectPPM

VoseSixSigmaDefectPPM(OutputCell, LowerLimit, UpperLimit, SimulationID)

This function calculates the ‘Defective Parts Per Million’ defined as:
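The equation is not reproduced in this printed version. Under the normality assumption discussed earlier in this topic, a common way to express defective parts per million is one million times the normal probability of falling outside the two limits. The sketch below implements that textbook calculation only; it is an assumption that ModelRisk computes the value the same way, and the function names are hypothetical.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def defect_ppm(mean: float, stdev: float, lower: float, upper: float) -> float:
    """Defects per million for a Normal(mean, stdev) process with limits [lower, upper]."""
    p_below = normal_cdf((lower - mean) / stdev)
    p_above = 1.0 - normal_cdf((upper - mean) / stdev)
    return 1e6 * (p_below + p_above)

print(defect_ppm(0.0, 1.0, -3.0, 3.0))
```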




390

VoseSixSigmaDefectShiftPPM

VoseSixSigmaDefectShiftPPM(OutputCell, LowerLimit, UpperLimit, LongTermShift,

SimulationID)

This function calculates the ‘Defective Parts Per Million’ with a shift and is defined as:




391

VoseSixSigmaDefectShiftPPMLower

VoseSixSigmaDefectShiftPPMLower(OutputCell, LowerLimit, LongTermShift,

SimulationID)

This function calculates the ‘Defective Parts Per Million’ below the lower specification





392

VoseSixSigmaDefectShiftPPMUpper

VoseSixSigmaDefectShiftPPMUpper(OutputCell, UpperLimit, LongTermShift,

SimulationID)

This function calculates the ‘Defective Parts Per Million’ above the upper specification





393

VoseSixSigmaK

VoseSixSigmaK(OutputCell, LowerLimit, UpperLimit, SimulationID)

This function calculates the Six Sigma ‘Measure of Process Center’ defined as:




394

VoseSixSigmaLowerBound

VoseSixSigmaLowerBound(OutputCell, NumberOfStandardDeviations, SimulationID)

This function calculates the ‘Lower Bound’ as a specific number of standard deviations

below the mean and is defined as:




395

VoseSixSigmaProbDefectShift

VoseSixSigmaProbDefectShift(OutputCell, LowerLimit, UpperLimit, LongTermShift,

SimulationID)

This function calculates the ‘Probability of Defect’ outside LowerLimit and UpperLimit

with a shift and is defined as:




396

VoseSixSigmaProbDefectShiftLower

VoseSixSigmaProbDefectShiftLower(OutputCell, LowerLimit, LongTermShift,

SimulationID)

This function calculates the ‘Probability of Defect’ below the LowerLimit with a shift

and is defined as:




398

VoseSixSigmaSigmaLevel

VoseSixSigmaSigmaLevel(OutputCell, LowerLimit, UpperLimit, LongTermShift,

SimulationID)

This function calculates the ‘Process Sigma Level’ with a shift and is defined as:




399

VoseSixSigmaUpperBound

VoseSixSigmaUpperBound(OutputCell, NumberOfStandardDeviations, SimulationID)

This function calculates the ‘Upper Bound’ as a specific number of standard deviations

above the mean and is defined as:




400

VoseSixSigmaYield

VoseSixSigmaYield(OutputCell, LowerLimit, UpperLimit, LongTermShift, SimulationID)

This function calculates the Six Sigma ‘Yield’ with a shift, i.e. the fraction of the

process that is free of defects, and is defined as:




401

VoseSixSigmaZlower

VoseSixSigmaZlower(OutputCell, LowerLimit, SimulationID)


LowerLimit is below the mean of the process and is defined as:




402

VoseSixSigmaZmin

VoseSixSigmaZmin(OutputCell, LowerLimit, UpperLimit, SimulationID)

This function calculates the minimum of Zlower and Zupper and is defined as:




403

VoseSixSigmaZupper

VoseSixSigmaZupper(OutputCell, UpperLimit, SimulationID)


UpperLimit is above the mean of the process and is defined as:




404

Other functions




405

Bootstrap

VoseNBootMean

VoseNBootMean({data})

Example model

This function generates values for the non-parametric Bootstrap distribution of uncertainty for the mean of a population given that one has observed random values from the population.

Data is an array of random values drawn from the population distribution. It must include at least two different values.

See VoseNBoot for an explanation of all the VoseNBoot~ functions.
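The general idea of the non-parametric bootstrap for the mean can be sketched as follows: resample the data with replacement, keeping the same sample size, and take the mean of the resample. This is a generic illustration in Python, not ModelRisk's implementation, and the function name is hypothetical.

```python
import random
import statistics

def nboot_mean(data, rng):
    """One draw from the non-parametric bootstrap distribution of the mean."""
    resample = [rng.choice(data) for _ in data]  # same size, with replacement
    return statistics.fmean(resample)

rng = random.Random(1)
data = [1.0, 2.0, 3.0, 4.0, 5.0]
draws = [nboot_mean(data, rng) for _ in range(2000)]
print(statistics.fmean(draws), statistics.stdev(draws))
```

Each recalculation of a bootstrap cell corresponds to one call of such a function; the spread of the draws describes the uncertainty about the population mean.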




406

VoseNBootStdev

VoseNBootStDev({data})

Example model

This function generates values for the non-parametric Bootstrap distribution of uncertainty for the standard deviation of a population given that one has observed random values from the population.






407

VoseNBootSkew

VoseNBootSkew({data})

Example model

This function generates values for the non-parametric Bootstrap distribution of uncertainty for the skewness of a population given that one has observed random values from the population.






408

VoseNBootVariance

VoseNBootVariance({data})

Example model

This function generates values for the non-parametric Bootstrap distribution of uncertainty for the variance of a population given that one has observed random values from the population.






409

VoseNBootKurtosis

VoseNBootKurtosis({data})

Example model

This function generates values for the non-parametric Bootstrap distribution of uncertainty for the kurtosis of a population given that one has observed random values from the population.






410

VoseNBootPercentile

VoseNBootPercentile({data},percentile)

Example model

This function generates values for the non-parametric Bootstrap distribution of uncertainty for a specified percentile value of a population given that one has observed random values from the population.

Data is an array of random values drawn from the population distribution. It must include at least two different values.





411

VoseNBootCofV

VoseNBootCofV({data})

Example model

This function generates values for the non-parametric Bootstrap distribution of uncertainty for the coefficient of variation of a population given that one has observed random values from the population.






412

VoseNBootMoments

VoseNBootMoments({data})

Example model

Array function that generates values for the non-parametric Bootstrap distribution of uncertainty for the mean, variance, skewness and kurtosis of a population given that one has observed random values from the population.

• Data - an array of random values drawn from the population distribution. It must include

at least two different values.

This function should be used in cases where one is interested in more than one statistic at the same time.

See VoseNBoot for an explanation of all the VoseNBoot functions.




413

VoseNBootPaired

VoseNBootPaired({data},direction)

Example model

As in the second stage of the non-parametric Bootstrap method, this function generates random samples from a multidimensional data array. The output is an array with rows (or columns, if direction is set to TRUE) randomly sampled with replacement from the data array.

• {data} - an array with paired data

• direction - optional boolean parameter (TRUE/FALSE) to specify the direction of the data array. When set to FALSE or omitted, the function returns rows of samples from the data set. When set to TRUE, columns of samples are generated.





414

VoseNBootSeries

VoseNBootSeries({datarange},datainrow)

Example model

This function takes a random sub-series (a segment) of consecutive values of a data array. The length of the sub-series is determined by the output array size.

You can also use this function on paired data, i.e. on a higher-dimensional data array. When sampling from a higher-dimensional data array, the function's output range should have the same number of columns (or rows if datainrow is set to TRUE) as the {datarange} array.

• Datarange - the array of data to take a random segment from.

• datainrow - a boolean parameter (TRUE/FALSE) to determine if the data range to sample from is oriented in rows (TRUE) or in columns (FALSE, default).

• U - optional parameter on [0,1] to drive the selection of the sub-series. If omitted, a random sub-series is selected.

If you want to take a number of random samples of paired values of a data array but the order of the samples does not matter, use the VoseNBootPaired function.
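The difference between sampling paired rows with replacement and taking a contiguous segment can be sketched in Python. Both functions below are hypothetical illustrations of the two ideas, not ModelRisk code.

```python
import random

def nboot_paired(rows, n_out, rng):
    """Sample n_out rows with replacement; pairing within each row is preserved, order is not."""
    return [rows[rng.randrange(len(rows))] for _ in range(n_out)]

def nboot_series(rows, length, rng):
    """Take one random segment of `length` consecutive rows; order is preserved."""
    start = rng.randrange(len(rows) - length + 1)
    return rows[start:start + length]

rng = random.Random(42)
data = [(1, 10.0), (2, 11.5), (3, 9.8), (4, 12.1), (5, 10.7), (6, 11.0)]
print(nboot_paired(data, 4, rng))
print(nboot_series(data, 3, rng))
```

The segment approach keeps any serial dependence between consecutive observations, which row-by-row resampling destroys.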





415

Extreme value

VoseExtremeRange

VoseExtremeRange(Distribution, N)

Array function that simulates the lowest and highest values of a set of N independent observations drawn from a distribution.

• Distribution - a valid distribution object

• N - the number of observations drawn from the distribution

Note that this function is not the same as writing VoseSmallest and VoseLargest in separate cells: the values generated by VoseExtremeRange come from the same set of N observations, whereas with VoseSmallest and VoseLargest the generated values would each come from their own separate set of observations. The difference becomes clear if you consider low N values: for the extreme case of N=1, VoseExtremeRange will always return two identical values.
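The point about a shared set of observations can be illustrated with a brute-force sketch in Python. The names are hypothetical, and ModelRisk may well use a more efficient method than drawing all N values.

```python
import random

def extreme_range(sampler, n, rng):
    """Lowest and highest of the same n draws."""
    draws = [sampler(rng) for _ in range(n)]
    return min(draws), max(draws)

def normal_sampler(rng):
    """Hypothetical stand-in for a distribution object: Normal(5, 2)."""
    return rng.gauss(5.0, 2.0)

rng = random.Random(3)
print(extreme_range(normal_sampler, 1, rng))    # two identical values
print(extreme_range(normal_sampler, 100, rng))  # low <= high, from one set of draws
```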




416

VoseKthLargest

VoseKthLargest(distribution,n,k)

Example model

This function returns the kth largest value from n sampled values from a certain distribution.

• distribution - A distribution object

• n - the number of samples to be taken from the distribution

• k - the rank of the sample. 1=largest, 2= second largest, etc.

Just like distributions in ModelRisk, this function has an optional U-parameter.

Example

You believe that the distribution for the claim size of a certain claim is a Pareto(3,1) and you would like to know what, out of 1000 claims, will be the second largest claim you will get. This can be modelled with:

=VoseKthLargest(VosePareto(3,1),1000,2)

Comment

If one wants to know, for example, the largest, the second largest and the third largest value out of n sampled values from a distribution, then one should use the VoseLargestSet function: values estimated separately with the VoseKthLargest function would be independent of each other rather than coming from the same set of n samples.
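A brute-force sketch of the two ideas in Python: the kth largest of n draws, and the k largest values taken from one common set of draws. The function names are hypothetical, and the mapping of Pareto(3,1) to a shape of 3 and a minimum of 1 is an assumption about the parameter order.

```python
import random

def pareto_sampler(rng):
    """Pareto draw with shape 3 and minimum 1 (assumed meaning of Pareto(3,1))."""
    return rng.paretovariate(3.0)

def kth_largest(sampler, n, k, rng):
    """kth largest of n draws (k=1 is the largest)."""
    draws = sorted((sampler(rng) for _ in range(n)), reverse=True)
    return draws[k - 1]

def largest_set(sampler, n, k, rng):
    """The k largest of the same n draws, in descending order."""
    draws = sorted((sampler(rng) for _ in range(n)), reverse=True)
    return draws[:k]

rng = random.Random(7)
print(kth_largest(pareto_sampler, 1000, 2, rng))
print(largest_set(pareto_sampler, 1000, 3, rng))
```

Calling kth_largest three times with k = 1, 2, 3 uses three different sets of draws, so the results need not even be in descending order; largest_set avoids this.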




417

VoseKthSmallest

VoseKthSmallest(distribution,n,k)

Example model

This function returns the kth smallest value from n sampled values from a certain distribution.



• k - the rank of the sample. 1=smallest, 2= second smallest, etc.

Like the distributions in VoseDistribution, this function has an optional U-parameter.

Example

An insurance company wants to set a minimum amount that it will pay back to the client for a certain type of claim. Claims below that minimum will not be paid back by the insurance company. The company knows that these claims are LogNormal(10,4) distributed and decides that the minimum will be chosen as the fifth smallest claim from 1000 sampled values:

=VoseKthSmallest(VoseLogNormal(10,4),1000,5)

Comment

If one wants to know, for example, the smallest, the second smallest and the third smallest value out of n sampled values from a distribution, then one should use the VoseSmallestSet function: values estimated separately with the VoseKthSmallest function would be independent of each other rather than coming from the same set of n samples.




418

VoseLargest

VoseLargest(Distribution,n,U)

Example model

This function returns the largest value from n generated values from a certain distribution. It is the same function as the VoseKthLargest function where k = 1.

This function, which can be seen as a distribution, has an optional U-parameter.


• n - the number of samples taken from the distribution, of which the largest value will be returned (if the U parameter is omitted)

• U - optional parameter specifying the cumulative percentile of the "largest value" distribution. If omitted the function just returns the largest value.

See also the Vose Extreme Values window topic for an explanation of the window for this function.




419

VoseLargestSet

VoseLargestSet(Distribution,n)

Example model

Array function that returns the k largest values out of n sampled values from the chosen distribution. The output should be a 1 x k or k x 1 array, where k must be smaller than or equal to n.



For example, if one selects 10 cells and types the formula =VoseLargestSet("VoseNormal(5,2)",100), the 10 largest values are returned from 100 sampled values from a Normal distribution with mean 5 and standard deviation 2.





420

VoseSmallest

VoseSmallest(Distribution,n,U)

Example model

This function returns the smallest value from n generated values from a certain distribution. It is the same

function as the VoseKthSmallest function where k = 1.

This function, which can be seen as a distribution, has an optional U-parameter.


• n - the number of samples taken from the distribution

• U - optional parameter specifying the cumulative percentile of the "smallest value"

distribution. If omitted the function just returns the smallest value.





421

VoseSmallestSet

VoseSmallestSet(Distribution,n)

Example model

This function returns an array (of size k, smaller than or equal to n) with the k smallest values out of the n sampled values from the chosen distribution.





422

Simulation results

VoseSimCofV

VoseSimCofV(Cell Reference, Simulation number)

Example model

This function returns the coefficient of variation of all simulated values generated in the defined cell.

• Cell Reference - the cell for which the generated values are to be analyzed.

• Simulation number – an index (1,2,3,…) referring to the simulation number. An optional

parameter that is only required when one runs multiple simulations.




423

VoseSimCorrelation

VoseSimCorrelation(Cell Reference1, Cell Reference2, Type,

Simulation number)

Example model

This function returns the correlation coefficient between two cells during simulation.

• Cell Reference 1 - Should be a valid reference to a spreadsheet cell.

• Cell Reference 2 - Should be a valid reference to a spreadsheet cell.

• Type - Type = 0 gives Spearman, Type = 1 gives Pearson.

• Simulation number - Optional simulation number parameter. Must be an integer>=1.

Equals 1 if omitted.
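The two correlation types can be illustrated with a short Python sketch: Pearson measures linear association between the values, and Spearman is the Pearson correlation of their ranks. The helper names are hypothetical, and the rank function below assumes there are no tied values.

```python
import math

def pearson(x, y):
    """Pearson (linear) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [v * v for v in x]  # monotonic but non-linear relationship
print(pearson(x, y), spearman(x, y))
```

For a monotonic but non-linear relationship the rank correlation is 1 while the Pearson coefficient is below 1.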




424

VoseSimCorrelationMatrix

VoseSimCorrelationMatrix({Cell References}, Type, Simulation

number)

Example model

This function returns the correlation matrix between a range of cells during simulation.

• {Cell References} - Should be a valid reference to a vector of cells.

• Type - Type = 0 gives Spearman, Type = 1 gives Pearson.

• Simulation number - Optional simulation number parameter. Must be an integer>=1. Equals 1 if omitted.




425

VoseSimCVARp

VoseSimCVARp(Cell Reference, pValue, Simulation number)

Example model

This function returns the Conditional Value-at-Risk during simulation based on a p-value.

• Cell Reference - Should be a valid reference to a spreadsheet cell of the distribution of

loss.

• pValue - The right tail probability of exceedance for which CVAR is calculated.






426

VoseSimCVARx

VoseSimCVARx(Cell Reference, xValue, Simulation number)

Example model

This function returns the Conditional Value-at-Risk during simulation based on an x-value.

• Cell Reference - Should be a valid reference to a spreadsheet cell of the distribution of loss.

• xValue - A threshold loss value above which CVAR is calculated.

• Simulation number - Optional simulation number parameter. Must be an integer>=1. Equals 1 if omitted.
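One common definition of Conditional Value-at-Risk is the mean loss in the right tail, where the tail is specified either by a threshold loss x or by an exceedance probability p. The Python sketch below follows that definition. It is an assumption that ModelRisk uses the same estimator (for example, how it treats values exactly at the threshold is not stated), and the function names are hypothetical.

```python
def cvar_x(losses, x):
    """Mean of the simulated losses strictly above the threshold x."""
    tail = [v for v in losses if v > x]
    if not tail:
        raise ValueError("no simulated loss exceeds the threshold")
    return sum(tail) / len(tail)

def cvar_p(losses, p):
    """Mean of the worst fraction p of the simulated losses."""
    k = max(1, round(p * len(losses)))  # number of values in the tail
    tail = sorted(losses, reverse=True)[:k]
    return sum(tail) / k

losses = list(range(1, 101))  # stand-in for 100 simulated loss values
print(cvar_x(losses, 90), cvar_p(losses, 0.1))
```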




427

VoseSimKurtosis

VoseSimKurtosis(Cell Reference, Simulation number)

Example model

This function returns the kurtosis of all simulated values generated in the defined cell.


• Simulation number – an index (1,2,3,…) referring to the simulation number. An optional parameter that

is only required when one runs multiple simulations.




428

VoseSimMax

VoseSimMax(Cell Reference, Simulation number)

Example model

This function returns the maximum of the cell during simulation.

• Cell Reference - Should be a valid reference to a spreadsheet cell.






429

VoseSimMean

VoseSimMean(Cell Reference, Simulation number)

Example model

This function returns the mean of all simulated values generated in the defined cell.







430

VoseSimMeanDeviation

VoseSimMeanDeviation(Cell Reference, Simulation number)

Example model

This function returns the mean deviation of the cell during simulation.







431

VoseSimMin

VoseSimMin(Cell Reference, Simulation number)

Example model

This function returns the minimum of the cell during simulation.







432

VoseSimMoments

{=VoseSimMoments(Cell Reference, Simulation number)}

Example model

This 4x1 array function returns, in order, the mean, variance, skewness and kurtosis of all simulated values generated in the defined cell.







433

VoseSimMSE

VoseSimMSE(Cell Reference, Simulation number)

Example model

This function returns the standard error about the mean of all simulated values generated in the defined cell.




The mean standard error (MSE) is a measure of the uncertainty about the ‘true’ mean. In the case of simulation results, the uncertainty about the true mean value (i.e. the mean value that would be achieved with an essentially infinite number of samples) of the simulated variable is generally well described by a Normal(SampleMean, MSE). The caveat ‘generally’ is included here because this formula is based on the Central Limit Theorem and so only applies precisely when the underlying distribution of the variable in question is Normal. However, in simulation one usually takes a large number of samples, so the result generally holds unless the underlying distribution is extremely skewed.
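The standard error of the mean is conventionally estimated as the sample standard deviation divided by the square root of the number of samples. A minimal Python sketch of that textbook formula follows; the function name is hypothetical, and it is an assumption that ModelRisk uses the sample (n-1) standard deviation.

```python
import math
import statistics

def standard_error_of_mean(samples):
    """Sample standard deviation divided by sqrt(n)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))

samples = [1.0, 2.0, 3.0, 4.0, 5.0]
mean = statistics.fmean(samples)
se = standard_error_of_mean(samples)
# Approximate 95% interval for the true mean under the Normal approximation
print(mean - 1.96 * se, mean + 1.96 * se)
```

Because the standard error shrinks with the square root of n, quadrupling the number of samples halves the uncertainty about the mean.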




434

VoseSimPercentile

VoseSimPercentile(Cell Reference, Percentile, Simulation number)

Example model

This function returns the value that corresponds to the required ‘Percentile’ from all

simulated values generated in the defined cell.


• Percentile – a value between 0 and 1



For example, VoseSimPercentile(A1,0.9) will return the 90th percentile of the values generated in cell A1.




435

VoseSimProbability

VoseSimProbability(Cell Reference, Value, Simulation number)

This function returns the fraction of generated values in CellReference that fell below Value. In other words, the function returns the estimated probability that the variable being simulated in CellReference will fall below Value.


• Value – any real value for which we wish to know the estimated cumulative probability

• Simulation number – an index (1,2,3,…) referring to the simulation number. An optional parameter that is only required when one runs multiple simulations.
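The estimated cumulative probability described above is simply the fraction of simulated values below the given value. A minimal Python sketch, with a hypothetical function name and using a strict "below" comparison as an assumption:

```python
def sim_probability(samples, value):
    """Fraction of simulated values that fell strictly below `value`."""
    return sum(1 for v in samples if v < value) / len(samples)

samples = [1.0, 2.0, 3.0, 4.0]
print(sim_probability(samples, 2.5))
```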




436

VoseSimSemiStdev

VoseSimSemiStdev(Cell Reference,TargetValue,Below, Simulation

number)

Example model

This function returns the semi-standard deviation of the cell during simulation.


• Target Value - The threshold delineating the scenarios that represent a risk.

• Below - Specifies whether the scenario of interest is below the threshold (TRUE) or

above it (FALSE).






437

VoseSimSemiVariance

VoseSimSemiVariance(Cell Reference,TargetValue,Below,

Simulation number)

Example model

This function returns the semi-variance of the cell during simulation.


• Target Value - The threshold delineating the scenarios that represent a risk.

• Below - Specifies whether the scenario of interest is below the threshold (TRUE) or

above it (FALSE).






438

VoseSimSkewness

VoseSimSkewness(Cell Reference, Simulation number)

Example model

This function returns the skewness of all simulated values generated in the defined cell.







439

VoseSimStDev

VoseSimStdev(Cell Reference, Simulation number)

Example model

This function returns the standard deviation of all simulated values generated in the defined cell.







440

VoseSimVariance

VoseSimVariance(Cell Reference, Simulation number)

Example model

This function returns the variance of all simulated values generated in the defined cell.







442

VoseKurtosis

VoseKurtosis(VoseDistributionObject)

Example model

Returns the kurtosis of a Distribution Object.

The kurtosis statistic is calculated from the following formulae:

Discrete variable:

\[ K = \frac{\sum_i (x_i - \mu)^4 \, p_i}{\sigma^4} \]

Continuous variable:

\[ K = \frac{\int_{-\infty}^{\infty} (x - \mu)^4 f(x)\,dx}{\sigma^4} \]

This is often called the standardised kurtosis, since it is divided by σ^4 to give a unitless statistic. The kurtosis statistic refers to the peakedness of the distribution (see right panel above): the higher the kurtosis, the more peaked the distribution. A Normal distribution has a kurtosis of 3, so kurtosis values for a distribution are often compared to 3. For example, if a distribution has a kurtosis below 3 it is flatter than a Normal distribution.




443

VoseMax

VoseMax(Distribution Object)

Returns the maximum of a distribution object.

For example, VoseMax(VoseBetaObject(3,4)) will return 1, because a Beta distribution is bounded above at 1. For a distribution with no upper bound, such as VoseNormalObject(10,1), the maximum is not finite.




444

VoseMean

VoseMean(Distribution Object)

Example model

Returns the mean of a distribution object. If an exact formula exists, this will be used.

For example, VoseMean(VoseNormalObject(10,1)) will return the value 10 because the mean of a normal distribution is equal to its first parameter.

Note that for each available distribution we have included a Distribution equations topic with the formulas for the mean, variance, skewness and kurtosis (if they exist): for example Normal equations.

The mean of a distribution, also known as the expected value, is given by:

\[ \mu = \sum_i x_i \, p_i \] for discrete variables

\[ \mu = \int_{-\infty}^{\infty} x f(x)\,dx \] for continuous variables

The mean is known as the first moment about zero. It can be considered to be the 'centre of gravity' of the distribution. If one were to cut out the probability density function drawn on a piece of card, the mean is the value at which the distribution would balance.




445

VoseMin

VoseMin(Distribution Object)

Returns the minimum of a distribution object.

For example, VoseMin(VoseNormalObject(10,1)) will return “-Infinity” because the minimum of a normal distribution is not defined, and VoseMin(VoseBetaObject(3,4)) will return 0.




446

VoseMoments

VoseMoments(Distribution)

Example model

This array function returns the mean, variance, skewness and kurtosis of the specified distribution.

• Distribution - a distribution Object

The output is a 4x1 (or 1x4) array, in which case the numerical values of the moments will be returned, or a 4x2 (or 2x4) array, in which case the numerical values will be returned with labels, as shown in the image below.

The calculation of the moments is done using the closed formula (if it exists), or through numerical integration if not. Note that for each univariate distribution in ModelRisk a topic with that distribution's equations is included (e.g. Beta equations).

The mean of a distribution, also known as the expected value, is given by:

\[ \mu = \sum_i x_i \, p_i \] for discrete variables

\[ \mu = \int_{-\infty}^{\infty} x f(x)\,dx \] for continuous variables




448

VoseProb10

VoseProb10({x},Distribution,Cumulative)

Example model

Calculates the logarithm base 10 of the joint probability density (or probability mass) f({x}) and joint cumulative probability F({x}) for a set of values {x} against a specified distribution.

• {x} - array containing one or more values.

• Distribution - a distribution object.

• Cumulative - optional boolean parameter (TRUE/FALSE) specifying if the cumulative

(TRUE) probability of the {x} should be returned or not (FALSE, default).

For each distribution in ModelRisk separate probability calculation functions are implemented as well, as described here.




449

VoseRawMoment1

VoseRawMoment1(Distribution)

Returns the first raw moment of a distribution.

• Distribution - a valid continuous or discrete distribution object.

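The first raw moment, also known as the first moment about zero, is defined as

\[ \mu'_1 = \int_{-\infty}^{\infty} x f(x)\,dx \]

for continuous distributions, and

\[ \mu'_1 = \sum_i x_i \, p_i \]

for discrete distributions. So it is simply the mean of the distribution. Use VoseRawMoments to return the first four raw moments of a distribution all together.

The raw moments (or 'moments about zero') are defined for k = 1, 2, 3, ... as

\[ \mu'_k = \int_{-\infty}^{\infty} x^k f(x)\,dx \],

and the central moments (or 'moments about the mean') for k = 2, 3, ... are defined as:

\[ \mu_k = \int_{-\infty}^{\infty} (x - \mu)^k f(x)\,dx \]

with analogous definitions for discrete variables. The lower central moments are directly related to the variance, skewness and kurtosis. The second, third and fourth central moments can be expressed in terms of the raw moments as follows:

\[ \mu_2 = \mu'_2 - \mu^2 \]
\[ \mu_3 = \mu'_3 - 3\mu\,\mu'_2 + 2\mu^3 \]
\[ \mu_4 = \mu'_4 - 4\mu\,\mu'_3 + 6\mu^2\mu'_2 - 3\mu^4 \]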
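The standard identities that express the second, third and fourth central moments in terms of the raw moments can be checked numerically. The Python sketch below does so for a small discrete distribution chosen purely for illustration; the function names are hypothetical.

```python
def raw_moment(values, probs, k):
    """kth moment about zero of a discrete distribution."""
    return sum(p * x ** k for x, p in zip(values, probs))

def central_moment(values, probs, k):
    """kth moment about the mean, computed directly."""
    mu = raw_moment(values, probs, 1)
    return sum(p * (x - mu) ** k for x, p in zip(values, probs))

def central_from_raw(m1, m2, m3, m4):
    """Second, third and fourth central moments from the first four raw moments."""
    c2 = m2 - m1 ** 2
    c3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    c4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return c2, c3, c4

values = [0, 1, 2, 5]
probs = [0.1, 0.4, 0.3, 0.2]
raw = [raw_moment(values, probs, k) for k in (1, 2, 3, 4)]
print(central_from_raw(*raw))
print(tuple(central_moment(values, probs, k) for k in (2, 3, 4)))
```

Both lines print the same three numbers up to rounding error.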




450

VoseRawMoment2


Returns the second raw moment of a distribution.


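The second raw moment, also known as the second moment about zero, is defined as

\[ \mu'_2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx \]

for continuous distributions, and

\[ \mu'_2 = \sum_i x_i^2 \, p_i \]

for discrete distributions. Use VoseRawMoments to return the first four raw moments of a distribution all together.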






451

VoseRawMoment3


Returns the third raw moment of a distribution.


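The third raw moment, also known as the third moment about zero, is defined as

\[ \mu'_3 = \int_{-\infty}^{\infty} x^3 f(x)\,dx \]

for continuous distributions, and

\[ \mu'_3 = \sum_i x_i^3 \, p_i \]

for discrete distributions.

The central moments (or 'moments about the mean') are defined as

\[ \mu_k = \int_{-\infty}^{\infty} (x - \mu)^k f(x)\,dx \]

with analogous definitions for discrete variables. The lower central moments are directly related to the variance, skewness and kurtosis. The second, third and fourth central moments can be expressed in terms of the raw moments as follows:

\[ \mu_2 = \mu'_2 - \mu^2 \]
\[ \mu_3 = \mu'_3 - 3\mu\,\mu'_2 + 2\mu^3 \]
\[ \mu_4 = \mu'_4 - 4\mu\,\mu'_3 + 6\mu^2\mu'_2 - 3\mu^4 \]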




452

VoseRawMoment4


Returns the fourth raw moment of a distribution.


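The fourth raw moment, also known as the fourth moment about zero, is defined as

\[ \mu'_4 = \int_{-\infty}^{\infty} x^4 f(x)\,dx \]

for continuous distributions, and

\[ \mu'_4 = \sum_i x_i^4 \, p_i \]

for discrete distributions.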






453

VoseRawMoments

VoseRawMoments(Distribution)

Array function that returns the first four raw moments of a distribution.


The output array should be either 4x1 or 1x4, in which case the values of the raw moments will be returned, or 4x2 or 2x4, in which case labels will be included.

The raw moments (or 'moments about zero') of a distribution are defined as

\[ \mu'_k = \int_{-\infty}^{\infty} x^k f(x)\,dx \],

for continuous distributions with PDF f(x) and

\[ \mu'_k = \sum_i x_i^k \, p_i \]

for discrete distributions with PMF p_i.

The central moments (or 'moments about the mean') are defined as:

\[ \mu_k = \int_{-\infty}^{\infty} (x - \mu)^k f(x)\,dx \]





454

VoseSkewness

VoseSkewness(Distribution Object)

Example model

Returns the skewness of a distribution object. If an exact formula exists, this will be used.

For example, VoseSkewness(VoseNormalObject(10,2)) will return the value 0 because the

skewness of a normal distribution is zero (regardless of its parameters).

Note that for each available distribution we have included a Distribution equations topic with the formulas

for the mean, variance, skewness and kurtosis (if they exist): see for example Normal equations.

The skewness statistic is calculated from the following formulae:

Discrete variable:

\[ S = \frac{\sum_i (x_i - \mu)^3 \, p_i}{\sigma^3} \]

Continuous variable:

\[ S = \frac{\int_{-\infty}^{\infty} (x - \mu)^3 f(x)\,dx}{\sigma^3} \]

This is often called the standardised skewness, since it is divided by σ^3 to give a unitless statistic. The skewness statistic refers to the lopsidedness of the distribution.

If a distribution has a negative skewness (sometimes described as left skewed) it has a longer tail to the left than to the right. A positively skewed distribution (right skewed) has a longer tail to the right, and zero skewed distributions are usually symmetric.




455

VoseStDev

VoseStDev(Distribution Object)

Example model

Returns the standard deviation of a distribution object. The standard deviation is the positive square root

of the variance, i.e. σ = √V .

If a closed formula for the standard deviation exists, this is used.

For example, VoseStDev(VoseNormalObject(10,2)) will return the value 2 because the standard deviation of a normal distribution is equal to its second parameter.

Note that for each available distribution we have included a Distribution equations topic with the formulas for the mean, variance, skewness and kurtosis (if they exist): see for example Normal equations.




456

VoseVariance

VoseVariance(Distribution Object)

Example model

Returns the variance of a distribution. If a closed formula for the variance exists, this is used.

For example, VoseVariance(VoseNormalObject(10,2)) will return the value 4 because the

variance of a normal distribution is equal to its second parameter squared.

Note that for each available distribution we have included a Distribution equations topic with the formulas

for the mean, variance, skewness and kurtosis (if they exist): see for example Normal equations.

The variance is a measure of how much the probability distribution is spread from the mean:

\[ V = E\left[(x - \mu)^2\right] \]

where E[ ] denotes the expected value (mean) of whatever is in the brackets, so:

\[ V = \sum_i (x_i - \mu)^2 \, p_i \] for discrete variables

\[ V = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx \] for continuous variables

The variance sums up the squared distance from the mean of all possible values of x, weighted by the probability of x occurring. The variance is known as the second moment about the mean. It has units that are the square of the units of x. So, if x is cows in a random field, V has units of cows². This limits the intuitive value of the variance. To calculate the standard deviation, use the VoseStDev function.




457

Data analysis

VoseBinomialP

=VoseBinomialP(s,n,ProcessExists, U )

Example model

This function generates values for the classical statistics uncertainty distribution for a binomial probability p estimated from data as described on this page.

• s - the number of successes.

• n - the number of trials.

• ProcessExists - an optional boolean parameter (TRUE/FALSE) that controls the function's behavior for s=0 or s=n. TRUE specifies one knows the estimated probability is neither 0 nor 1. FALSE (default) means one does not know.



Note that the ProcessExists parameter only has influence in the cases s=0 or s=n.

Example

A certain company sends out transactional documents (like bank statements, insurance policy documents, etc.) to clients. It does a survey of 1000 documents, tracking if they were properly completed and delivered to the right address on time. It found all 1000 documents were successfully processed. Given this amount of data it would like to know the probability that a document might fail to be properly processed.

The uncertainty distribution for the probability then can be found with the formula (Case 1):

=VoseBinomialP(0,1000,0)

This is the company's uncertainty about the probability without knowing if there is in fact a possibility of the system failing (ProcessExists = 0).

Now suppose the company receives a complaint letter from a client saying that an error occurred with his/her transactional document. This now means that we know for sure that the process of printing and sending out a document incorrectly exists (ProcessExists = 1). The uncertainty distribution for the probability can now be expressed by the formula (Case 2):

=VoseBinomialP(0,1000,1)

We could instead have defined a 'success' as a correctly processed document, in which case the probability of success (the probability that a document will be correctly processed) is modelled as:

Case 1: VoseBinomialP(1000,1000,0)

Case 2: VoseBinomialP(1000,1000,1)

The Bayesian equivalent to estimating a probability is described here. A comparison of the classical and Bayesian methods can be found here.




459




460

VoseFrequencyCumulA

VoseFrequencyCumulA({dataset},{bins},label,relative)

Example model

Two-column array function that returns an ascending cumulative frequency or relative frequency analysis of a set of data. Note that the output array columns should be one cell longer than the array of bin values, because the function includes the frequency of data with values higher than the largest bin value.

• {Dataset} - an array of values to be analyzed.

• {Bins} - an ascending array of values defining the minimum value of each bin

• Label - a TRUE/FALSE switch that, if TRUE, considers the first value in the dataset to be a label.

• Relative - a TRUE/FALSE switch that, if TRUE, makes the function return the relative frequency. If omitted, the function assumes label is FALSE.




461

VoseFrequencyCumulD

VoseFrequencyCumulD({dataset},{bins},label,relative)

Example model

Two-column array function that returns a descending cumulative frequency or relative frequency analysis of a set of data. Note that the output array columns should be one cell longer than the array of bin values, because the function includes the frequency of data with values higher than the largest bin value.

• {Dataset} - an array of values to be analyzed.

• {Bins} - an ascending array of values defining the minimum value of each bin

• Label - a TRUE/FALSE switch that, if TRUE, considers the first value in the dataset to be a label.

• Relative - a TRUE/FALSE switch that, if TRUE, makes the function return the relative frequency. If omitted, the function assumes label is FALSE.




462

VoseOgive1

VoseOgive1()

Example model

This array function generates a set of "best guess" values for the cumulative probabilities that correspond to a set of data ranked in increasing order. If there are k data values, then the VoseOgive1() function should span k cells. It will return the values:

1/(k+1)

2/(k+1)

...

k/(k+1)

The data array (let us call them the {x_i}) and the VoseOgive1 array can then be used to produce an empirical estimate of the cumulative distribution of the parent distribution from which the data come.

See Fitting a continuous non-parametric first-order distribution to data for an explanation of the theory behind this function.
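The "best guess" cumulative probabilities i/(k+1) listed above are straightforward to reproduce. The Python sketch below pairs them with ranked data to form the points of an empirical cumulative curve; the function name and the data values are hypothetical.

```python
def ogive1(k):
    """Best-guess cumulative probabilities i/(k+1) for k ranked data values."""
    return [i / (k + 1) for i in range(1, k + 1)]

data = sorted([4.2, 1.7, 3.3])             # data must be ranked in increasing order
points = list(zip(data, ogive1(len(data))))
print(points)
```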

Including second-order uncertainty

The values generated by VoseOgive1() provide us with a "best guess" for the non-parametric distribution.

However, the smaller the dataset, the greater the uncertainty about the constructed non-parametric

distribution is. We can use the Bayesian technique explained here to take this uncertainty into account.

To generate an array of the F(x i) values use the VoseOgive2() function. As opposed to VoseOgive1,VoseOgive2 will generate a new array of values on each recalculation.

Constructing the Ogive directly

You can directly construct an Ogive distribution based on min, max and {x_i} parameters using the ModelRisk VoseOgive (first-order) and VoseOgiveU (second-order) functions.





VoseOgive2

VoseOgive2()

Example model

This array function generates a set of values for the cumulative probability that correspond to a set of data ranked in increasing order. If there are k data values, then the VoseOgive2() function should span k cells. Use this function for fitting a discrete non-parametric second-order distribution to data.

It does the same as VoseOgive1, except that the (second-order) uncertainty of the constructed distribution is taken into account: the smaller the dataset, the greater the uncertainty about the constructed non-parametric distribution.

VoseOgive2 uses the Bayesian technique explained here to take this uncertainty into account.

As opposed to VoseOgive1, VoseOgive2 will generate a new array of values on each recalculation.





VosePoissonLambda

VosePoissonLambda(alpha,t,ProcessExists, U )

Example model

This function generates values for the classical statistics uncertainty distribution for a Poisson intensity (lambda) estimated from data, using the technique explained here.

• Alpha - the number of Poisson observations made in time t. This must be a non-negative integer.

• t - the time over which Alpha observations were made.

• ProcessExists - optional boolean parameter (TRUE/FALSE). Use TRUE (or omit) when it is known that observations are possible (lambda > 0). FALSE applies when alpha = 0, to allow for the possibility that no observations can occur (lambda = 0).

• U - (optional) the cumulative confidence associated with the estimate of lambda. If omitted the function generates random values.

This function has, like all the distributions in ModelRisk, an optional U-parameter.

Example 1

An insurance company X is about to insure a big chemical company that has been sued by clients in the past and has lost a few of those cases. In the last three years (t = 3) this has not happened, which means that there were no observations (alpha = 0), but we know that it could happen (ProcessExists = 1). A way to model the uncertainty for the Poisson intensity of losing a court case is to use the formula:

=VosePoissonLambda(0,3,1)

If the chemical company had been sued 3 times in the last seven years, for example, we would estimate lambda as:


or


The last parameter ProcessExists becomes redundant in this case since the data (alpha > 0) demonstrate that the risk does indeed exist.

Example 2

Now, imagine a similar case where an insurance company Y insures a pharmaceutical company. The insurance company knows about the possible court cases insurance company X had to face in the past but they don't know if the same thing could happen with the pharmaceutical company they are insuring (this means that ProcessExists = 0 in this case).

Suppose that they haven't seen it happen in the 7 years that this pharmaceutical company has existed (alpha = 0, t = 7).

Now to model the uncertainty for the Poisson intensity lambda of losing a court case we can use the formula:

=VosePoissonLambda(0,7,0)








VoseRank

VoseRank({data}, descending)

Example model-1, Example model-2

This array function returns an array with the ranks of the values from the input array. By default, the ranks in the list are sorted in ascending order, meaning that rank 1 is given to the lowest value.

When duplicates occur in the data, all of them are assigned the average of their ranks (which is more correct than Excel's RANK function that assigns the highest rank).

• {data} - the data of which the ranks are to be determined.

• Descending - optional boolean parameter. Set FALSE (default) for ascending ranks, i.e. 1 = lowest data value. Set TRUE for descending ranks, i.e. 1 = highest data value.

When the output array is smaller in size than the {data} array, say n cells, only the ranks of the first n values of the input array are returned.

When the output array is larger in size than the {data} array it will return the ranks and fill up the remaining part of the output with #N/A errors.
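The tie-handling rule described above (duplicates receive the average of their ranks) can be sketched in Python as follows. This is an illustration under stated assumptions, not ModelRisk's implementation; the function name average_ranks is hypothetical.

```python
def average_ranks(data, descending=False):
    """Rank the values, giving tied values the average of their ranks."""
    # Indices of the data in sorted order (rank 1 first)
    order = sorted(range(len(data)), key=lambda i: data[i], reverse=descending)
    ranks = [0.0] * len(data)
    pos = 0
    while pos < len(order):
        # Extend 'end' over the run of values tied with the one at 'pos'
        end = pos
        while end + 1 < len(order) and data[order[end + 1]] == data[order[pos]]:
            end += 1
        avg = ((pos + 1) + (end + 1)) / 2  # average of the 1-based ranks in the run
        for i in order[pos:end + 1]:
            ranks[i] = avg
        pos = end + 1
    return ranks

ascending = average_ranks([10, 20, 20, 30])
descending = average_ranks([10, 20, 20, 30], descending=True)
```

The two tied values occupy positions 2 and 3, so both receive rank 2.5 in either direction.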





VoseRollingStats

VoseRollingStats({data},statistic)

Example model

VoseRollingStats is an array function that uses efficient algorithms to calculate a rolling statistic of a column of data. It is sometimes helpful to see how sample statistics evolve with the accumulation of data, e.g. from experiments or a simulation run, to get a feel for whether the statistic has stabilized.

• {data} - the data to calculate the rolling statistic on.

• Statistic - This can be the name of the statistic as text in double quotation marks: "Mean"; "Variance"; "Stdev"; "Skewness"; "Kurtosis". The function is not case-sensitive. Alternatively the Statistic parameter can be a percentile value: for example, using the value 0.8 would return the rolling 80th percentile of the data set.

Whilst this is easy to do with Excel functions (e.g. using AVERAGE), for a large data set it can be very slow since each calculation repeats the same analysis.

Excel's VAR, STDEV, SKEW and KURT are slightly less accurate. Excel's PERCENTILE gives peculiar results, for example reporting a 95th percentile with just one data point and linearly interpolating between values.

For example, the model shown on the right calculates the rolling mean for the data set in B2:B10.

C2: = 2/1

C3: = (2+3)/2

C4: = (2+3+4)/3

etc.
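The rolling mean above can be computed in a single pass by carrying a running total, which avoids repeating the whole calculation for each cell. The following Python sketch illustrates the idea; the function name rolling_mean is hypothetical and this is not a description of ModelRisk's internal algorithm.

```python
def rolling_mean(data):
    """Return the mean of the first n values for n = 1..len(data)."""
    out = []
    total = 0.0
    for n, x in enumerate(data, start=1):
        total += x          # update the running sum instead of re-summing
        out.append(total / n)
    return out

# First three values of the example data set: 2, 3, 4
means = rolling_mean([2, 3, 4])
```

Here means holds 2/1, (2+3)/2 and (2+3+4)/3, matching cells C2 to C4.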

Uses

The VoseRollingStats function is most useful when plotted in an x-y scatter plot, where the sequential order of the data point (1 = 1st observation, 2 = 2nd observation, etc.) is plotted on the horizontal axis against the rolling statistic of interest on the vertical axis. For example, the following Excel plot shows various rolling statistics up to the 4989th observation:





The mean has stabilized quickly, the kurtosis is still exhibiting a little volatility, etc. The VoseRollingStats function is helpful in reviewing simulation output values that have been exported to a spreadsheet, or to look at whether a data set is sufficient to provide stable moment and percentile estimates.

Note that the function returns #N/A in initial places in the array where the statistic cannot be calculated. For example, at least four values are required to calculate the kurtosis of a data set.

VoseRollingStats does not subtract 3 in its kurtosis calculation, so normally distributed data would give a kurtosis of 3, rather than zero - the latter convention being used by Excel's KURT function.





VoseSortA

VoseSortA(Data,Line,DataInRows)

Example model

Array function that returns the input data in ascending order - it can sort both text and values. If the input array has multiple columns, you can specify the column against which to sort.

• Data - the data to be sorted.

• Line - if Data contains multiple columns/rows, the column (or row) against which to sort. Should be an integer value.

• DataInRows - optional boolean parameter (TRUE/FALSE) specifying whether the data is in columns (FALSE, default) or rows (TRUE).

This can be of use if you need to work with a sorted list of generated random values. Excel can sort but not dynamically, so sorting a list of randomly generated values is undone on spreadsheet recalculation.





VoseSortD

VoseSortD(Data,Line,DataInRows)

Example model

Array function that returns the input data in descending order - it can sort both text and values. If the input array has multiple columns, you can specify the column against which to sort.

• Data - the data to be sorted.

• Line - if Data contains multiple columns/rows, the column (or row) against which to sort. Should be an integer value.

• DataInRows - optional boolean parameter (TRUE/FALSE) specifying whether the data is in columns (FALSE, default) or rows (TRUE).

This can be of use if you need to work with a sorted list of generated random values. Excel can sort but not dynamically, so sorting a list of randomly generated values is undone on spreadsheet recalculation.





VoseSpearman

VoseSpearman({known_ys},{known_xs})

Example model

Estimates Spearman's rank correlation coefficient of a data set using the formula shown below.

• {known_ys} - a list of observations for the first variable

• {known_xs} - a list of observations for the second variable

Spearman's rank correlation coefficient (a.k.a. Spearman's rho) is a non-parametric measure of the degree of correspondence between two variables. Like Kendall's tau, Spearman's rank correlation is carried out on the ranks of the data, i.e. what position (rank) the data point takes in an ordered list from the minimum to maximum values, rather than the actual data values themselves.

The sample estimator of Spearman's rho is defined by:





VoseSpearmanU

VoseSpearmanU({known_ys},{known_xs})

Example model

This function simulates the uncertainty about the Spearman rank correlation coefficient between two variables using the non-parametric bootstrap.

• {known_ys} - a list of observations for the first variable

• {known_xs} - a list of observations for the second variable

Spearman's rank correlation coefficient (a.k.a. Spearman's rho) is a non-parametric measure of the degree of correspondence between two variables. Like Kendall's tau, Spearman's rank correlation is carried out on the ranks of the data, i.e. what position (rank) the data point takes in an ordered list from the minimum to maximum values, rather than the actual data values themselves.






VoseCholesky

=VoseCholesky({matrix})

Example model

This function performs a Cholesky decomposition of the input matrix. The input matrix, which should be symmetric and positive-definite, is decomposed into a lower triangular matrix L and its transpose L^T. The lower triangular matrix is called the Cholesky triangle of the original, positive-definite matrix.

The Cholesky decomposition is mainly used to solve linear equations Ax = b numerically. This is done by first computing the Cholesky decomposition A = L L^T, then solving Ly = b for y and then solving L^T x = y for x. Another important use of the Cholesky decomposition (the use focussed on in ModelRisk) is in Monte Carlo simulations to simulate systems with multiple correlated variables. In this case (and thus in ModelRisk too) the output of the decomposition is only the lower triangular matrix L, with zeros in the upper triangle, because all the correlation information from the decomposed matrix is in the lower triangular matrix L.
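To make the decomposition concrete, here is a minimal Python sketch of the standard row-by-row Cholesky algorithm. It returns the lower triangular matrix L with zeros above the diagonal. The function name cholesky_lower and the example matrix are hypothetical; this is not ModelRisk's code.

```python
import math

def cholesky_lower(a):
    """Return lower triangular L such that L times its transpose equals a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Sum of products of the already-computed entries in rows i and j
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive-definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

# Hypothetical symmetric, positive-definite input
A = [[4.0, 2.0], [2.0, 3.0]]
L = cholesky_lower(A)
```

Multiplying L by its transpose recovers A, which is a quick way to check any decomposition.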

For example, if one wants to do a Cholesky decomposition of the matrix:

one has to type: =VoseCholesky(A) where A refers to the input matrix above. The output generated (this function is an array function and the output matrix must be of the same size as the input matrix) is then:





VoseCorrMatrix

=VoseCorrMatrix({DataRange},DataInRows)

Example model

This function is an array function that calculates the rank order correlation matrix of a data set. The output of the function is an n x n array where n is the number of variables in the data set. For example, to calculate the correlation matrix of 2 columns of data, the output of VoseCorrMatrix should be a 2x2 array.

• {DataRange} - the array of cells that contain the data.

• DataInRows - an optional parameter that should be set to 1 if the variables in DataRange are arranged in rows, one row per variable.

The elements of the correlation matrix contain Spearman's rank correlation coefficient (a.k.a. Spearman's rho) between each pair of samples in the {DataRange} array. Spearman's rho is a non-parametric measure of the degree of correspondence between two variables. Spearman's rank correlation is carried out on the ranks of the data, i.e. what position (rank) the data point takes in an ordered list from the minimum to maximum values, rather than the actual data values themselves.


As each sample has a correlation of 1.0 with itself, the top left to bottom right diagonal elements are all 1.0. Furthermore, because the formula for the rank order correlation coefficient is symmetric, the matrix elements are also symmetric about this diagonal line.





VoseCorrMatrixU

=VoseCorrMatrixU({DataRange},DataInRows)

Example model

This function is an array function that simulates the uncertainty of the rank order correlation matrix of a data set: when we have a rather small data set there will be some uncertainty that the calculated correlation coefficients in the matrix are truly representative of the underlying reality. The ModelRisk function VoseCorrMatrixU simulates values from the joint uncertainty distribution of the matrix.

• {DataRange} - the array of cells that contain the data.

• DataInRows - an optional boolean parameter that should be set to 1 if the variables in DataRange are arranged in rows, one row per variable.

The output of the function VoseCorrMatrixU is an n x n array where n is the number of variables in the data set. A new correlation matrix will be calculated on each spreadsheet recalculation.

To return just a static estimate of the correlation matrix use the function VoseCorrMatrix.





VoseCorrToCov

VoseCorrToCov(CorrelationMatrix,StdevVector)

Example model

The covariance between two random variables X and Y is defined as:

$$\mathrm{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)]$$

where E[ ] means the expected value, and µX, µY refer to the respective means of X and Y.

The size of Cov(X,Y) depends on the degree to which the variables deviate from their respective means. Pearson’s correlation coefficient ρXY normalises the covariance to be independent of this variation, as follows:

$$\rho_{XY} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}$$

where σX and σY are the standard deviations of X and Y.

VoseCorrToCov is an array function that combines a correlation matrix for a set of variables with a vector of standard deviation values for each variable to produce a covariance matrix. For example:





The cell range C13:G17 contains the covariance matrix. The top left to bottom right diagonal elements equal the variance (stdev^2) of each variable because ρXX = 1. The elements in opposite positions from the diagonal are the same, meaning that {X,Y} has the same covariance as {Y,X}.
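The conversion follows directly from the definition of Pearson's correlation coefficient: each covariance is the correlation multiplied by the two standard deviations. The Python sketch below illustrates this with a hypothetical two-variable example; the function name corr_to_cov is not part of ModelRisk.

```python
def corr_to_cov(corr, stdevs):
    """Covariance matrix with entries corr[i][j] * stdevs[i] * stdevs[j]."""
    n = len(stdevs)
    return [[corr[i][j] * stdevs[i] * stdevs[j] for j in range(n)]
            for i in range(n)]

# Hypothetical inputs: correlation 0.5 between variables with stdev 2 and 3
corr = [[1.0, 0.5],
        [0.5, 1.0]]
cov = corr_to_cov(corr, [2.0, 3.0])
```

The diagonal of cov holds the variances 4 and 9, and the off-diagonal entries are both 0.5 * 2 * 3 = 3.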





VoseCovToCorr

VoseCovToCorr(CovarianceMatrix)

Example model

The covariance between two random variables X and Y is defined as:

$$\mathrm{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)]$$

where E[ ] means the expected value, and µX, µY refer to the respective means of X and Y.

The size of Cov(X,Y) depends on the degree to which the variables deviate from their respective means. Pearson’s correlation coefficient ρXY normalises the covariance to be independent of this variation, as follows:

$$\rho_{XY} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}$$

where σX and σY are the standard deviations of X and Y.

A covariance matrix is a square matrix of dimension n giving the covariance between each i,j pair of variables (i = 1 to n, j = 1 to n). The following table gives an example of a covariance matrix for the variables A to E:

The diagonal in red gives the covariance where X = Y, which is

$$\mathrm{Cov}(X,X) = E[(X-\mu_X)^2]$$

which is the definition of variance. Thus a covariance matrix gives us both the Pearson correlation coefficient for each pair, and the variance for each variable.

VoseCovToCorr is an array function that extracts the correlation information from a covariance matrix. For example:





The cell range C11:G15 contains the correlation matrix. The top left to bottom right diagonal elements equal 1, meaning that a variable is 100% correlated with itself. The elements in opposite positions from the line of 1's are the same, meaning that variable X is correlated to Y to the same extent that Y is correlated to X.
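The extraction divides each covariance by the two standard deviations, which are the square roots of the diagonal elements. The following Python sketch shows the calculation on a hypothetical 2x2 covariance matrix; the function name cov_to_corr is not part of ModelRisk.

```python
import math

def cov_to_corr(cov):
    """Correlation matrix with entries cov[i][j] / (sd_i * sd_j)."""
    n = len(cov)
    # Standard deviations are the square roots of the diagonal (the variances)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]

# Hypothetical covariance matrix: variances 4 and 9, covariance 3
corr_matrix = cov_to_corr([[4.0, 3.0],
                           [3.0, 9.0]])
```

The result has 1 on the diagonal and 3 / (2 * 3) = 0.5 off the diagonal.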





VoseCurrentSample

VoseCurrentSample()

This function reports the current sample (sometimes known as trial or iteration) number during a simulation. It requires no input parameters.

For example, in the simulation setting window, one might select 5 simulations with 1000 samples for each simulation:

Running the model will result in the VoseCurrentSample() function returning {1,2,3,…999,1000} during the first simulation, then repeating this cycle for the other four simulations.





VoseCurrentSim

VoseCurrentSim()

This function reports the current simulation number during a simulation run. It requires no input parameters.

For example, in the simulation setting window, one might select 5 simulations with 1000 samples for each simulation:

Running the model will result in the VoseCurrentSim() function returning the value ‘1’ during the first simulation, then {2,3,4,5} for the remaining simulations.





VoseDataObject

VoseDataObject(DataSource, Volatile)

Example model

This function defines a ModelRisk Data Object linked to a data source.

• DataSource – can be either an SQL query to a database, or a link to a spreadsheet range.

• Volatile – An optional Boolean parameter (FALSE by default) defining whether the data should be retrieved from the data source with each spreadsheet recalculation. Setting to TRUE is generally not required and will slow down a simulation run considerably.

For more information about using the VoseDataObject function see the topic about the Data Object window.





VoseDeduct

VoseDeduct(Base Distribution, Deductible, Maxlimit, Zeros, U )

Modifies a Base Distribution to model a claim size after accounting for any Deductible and maximum payout (MaxLimit).

The use of the deductible means that the insurance company does not pay out the first x of the damage described by the base distribution. This introduces the problem of what to do with the lost probability F(x). It is added as a spike at 0 if Zeros = TRUE; otherwise the density is raised to compensate, simulating the claim distribution conditional on the damage being greater than x.

The optional MaxLimit parameter allows one to restrict the claim size that an insurance company pays out to no more than L of the damage described by the Base Distribution. So there is a probability spike of (1-F(L)) at y=L.

• Base Distribution - the base distribution to be modified, describing the value of the damage that was incurred. Must be a distribution object.

• Deductible - the amount subtracted from the distribution before payout. Values below this are either not included, or included as zeros (see below).

• MaxLimit - (optional) the maximum amount that would be paid. Payout values from the base distribution larger than this are returned as this value.

• Zeros - optional boolean parameter. Set to TRUE to model all insurance cases (values below the deductible will be returned as zero) and FALSE (default) to simulate only values higher than the deductible.


• U - optional parameter specifying the cumulative percentile of the distribution. If omitted (default) the function generates random values.

To construct a distribution object of the VoseDeduct distribution, use the function VoseDeductObject.

This object can then again be used as a severity distribution in aggregate calculations.

For an explanation about the ModelRisk window for this function see the Vose Deduct window topic.

Another way to model complex insurance policies is to use the VoseExpression function.


VoseDeduct generates values from this distribution or calculates a percentile.





VoseDeductObject constructs a distribution object for this distribution.

VoseDeductProb returns the probability density or cumulative distribution function for this distribution.

VoseDeductProb10 returns the log10 of the probability density or cumulative distribution function.





VoseDepletion

VoseDepletion(ClaimInterval,ClaimSize,Resource,Horizon)

Array function used to determine whether or not, starting from a certain amount of money (Resource), you default (exhaust the resource) before a certain time horizon (Horizon) when claims with a certain size (ClaimSize) come in at a certain rate (ClaimInterval). The parameters ClaimInterval and ClaimSize can be either distribution objects or fixed numbers. If they are distribution objects the function will select a random sample for each individual claim and/or interval between each claim.

The generated output is a 3 by 2 array with the DepletionTime on the first row, the DepletionFlag on the second row and the DepletionShortfall on the last row.

Example

Suppose an insurance fund has $1 000 000 in cash to cover the cost of a life insurance policy it has now terminated. The company wants to know if the $1 000 000 (Resource) will be enough to cover the claims coming in at a certain rate (ClaimInterval) for the next 2 years (Horizon). The VoseDepletion function can simulate if and when the resource will run out within that time Horizon. Suppose the rate at which the claims come in follows a Poisson process with a mean of 20 days between each claim, giving a ClaimInterval = VoseExpon(20) days. Suppose the claim size follows a LogNormal(25 000, 4 000) distribution. The outputs of interest for the insurance company can now be modelled with the function:

VoseDepletion(VoseExponObject(20),VoseLogNormalObject(25000,4000),1000000,2*365)

This function covers a 3 x 2 array. It will typically generate the following types of outputs:

which means that in this iteration the fund was not exhausted within two years; or

which means that in this iteration the fund was exhausted at day 727 and the fund was short by $6581 at that moment.





Although the terminology used for this function is insurance related, the conceptual model has many more applications. For example, suppose you are a health authority with a stockpile of 80 000 vaccine shots. Infections occur in random outbreaks with a mean time between outbreaks of 120 days. Each outbreak requires Gamma(3,5 500) shots (ignoring the discrete nature of the actual number) and you wish to know whether you have enough stock to last the next 3 years:

VoseDepletion(VoseExponObject(120),VoseGammaObject(3, 5000),80000,3*365)

Inserting this array formula into cell range B2:C4 we get something like this:

Running a simulation and taking the mean value for Cell C3 will give the probability of running out of vaccine within the timeframe (in this case, about 87.3%).

The easiest way of 'constructing' the VoseDepletion function is to open the Depletion Calculation window.
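The depletion logic described in this topic can be sketched in Python as a simple event loop. This is an illustration, not ModelRisk's algorithm. The function names are hypothetical, and several details are assumptions: default is taken to mean the resource falling below zero, the shortfall is reported as 0 when no default occurs, and the lognormal is taken to be specified by its mean and standard deviation. The -1 returned for the time when no default occurs follows the VoseDepletionTime description.

```python
import math
import random

def depletion(next_interval, next_claim, resource, horizon):
    """Return (time, flag, shortfall) for one simulated path.

    next_interval and next_claim are zero-argument callables returning the
    time to the next claim and the size of that claim.
    """
    t = 0.0
    remaining = resource
    while True:
        t += next_interval()
        if t > horizon:
            return (-1, 0, 0.0)       # assumption: no default within the horizon
        remaining -= next_claim()
        if remaining < 0:             # assumption: default means going below zero
            return (t, 1, -remaining)

def lognormal_mean_sd(rng, mean, sd):
    """Sample a lognormal specified by its mean and standard deviation."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return rng.lognormvariate(mu, math.sqrt(sigma2))

# One path of the insurance fund example: mean 20 days between claims
rng = random.Random(42)
one_path = depletion(lambda: rng.expovariate(1 / 20),
                     lambda: lognormal_mean_sd(rng, 25000, 4000),
                     1_000_000, 2 * 365)
```

Repeating the call many times and averaging the flag estimates the probability of default within the horizon, in the same way as taking the mean of the DepletionFlag cell over a simulation.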





VoseDepletionFlag

VoseDepletionFlag(ClaimInterval,ClaimSize,Resource,Horizon)

VoseDepletionFlag is a function that only returns that part of the VoseDepletion function that tells you if you defaulted before a certain time horizon (the function then returns 1) or not (returns 0).





VoseDepletionShortfall

VoseDepletionShortfall(ClaimInterval,ClaimSize,Resource,Horizon)

VoseDepletionShortfall is a function that only returns that part of the VoseDepletion function that gives the size of the shortfall at the time of default.





VoseDepletionTime

VoseDepletionTime (ClaimInterval, ClaimSize, Resource, Horizon)

VoseDepletionTime is a function that only returns that part of the VoseDepletion function that gives the time at which you default if you do, and returns -1 if you don't.





VoseDescription

VoseDescription(cell range)

Returns the description of the VoseFunction(s) in the cell range referred to. This makes it easy to add short explanations to the different risk analysis functions used in a model.

• Cell range - this should be a reference to a single Excel cell.

When the reference points to a cell with a formula that contains multiple VoseFunctions, the description of the first VoseFunction in the formula will be shown. When the cell range refers to an array of cells, the description of the VoseFunction in the first cell of the array will be shown.





VoseDominance

VoseDominance({data},{TitleArray},DataInRows)

This function determines a matrix of first and second order stochastic dominance between variables.

• {data} - array of spreadsheet data to be analyzed.

• {TitleArray} - optional array of labels associated with each variable.

• DataInRows - optional boolean (TRUE/FALSE) parameter. If set to FALSE (default) the variables are assumed to be arranged in columns. If TRUE, the variables are assumed to be arranged in rows.

For example, let's say we have 1000 datapoints for each of 2 options and want to determine the superiority of one over the other. The VoseDominance function looks at the cumulative distributions that can be constructed from these datapoints and then compares them.

If the cumulative distributions constructed from the datapoints look like this:

then the function returns that B is first order stochastic dominant over A.

If the cumulative graphs look like this:





then the function returns that B is second order stochastic dominant over A if the area X is bigger than the area Y.

If this is not the case, the function returns "Inconclusive".
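The first order comparison of cumulative distributions can be sketched in Python as follows. This is a hedged illustration only: it uses the textbook definition (B dominates A when B's empirical cumulative curve is never above A's and is strictly below it somewhere), assumes larger values are preferred, and does not cover the second order area test. The function names are hypothetical and the rules ModelRisk applies may differ in detail.

```python
import bisect

def ecdf(sample):
    """Return a function giving the empirical cumulative probability at x."""
    s = sorted(sample)
    n = len(s)
    return lambda x: bisect.bisect_right(s, x) / n

def first_order_dominates(b, a):
    """True if sample b first order stochastically dominates sample a."""
    fa, fb = ecdf(a), ecdf(b)
    # The empirical curves only change at observed values, so check those
    grid = sorted(set(a) | set(b))
    diffs = [fa(x) - fb(x) for x in grid]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)

# Hypothetical data: option B is option A shifted up by one unit
option_a = [1, 2, 3]
option_b = [2, 3, 4]
b_dominates = first_order_dominates(option_b, option_a)
```

When neither sample dominates the other in this sense, the curves cross and a first order comparison alone is inconclusive.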





VoseEigenValues

VoseEigenValues({matrix})

Array function that returns the eigenvalues of a given matrix (presented as an array in the spreadsheet). When {matrix} is an n x n array, the output range should be a 1 x n or n x 1 array.

• {matrix} - an n x n matrix

A matrix [M] is said to be associated with a set of eigenvectors [V] and eigenvalues λ, if:

[M][V] = λ [V]

If we look at it in one dimension, a vector is an eigenvector of a matrix if multiplying it by the matrix results in a constant (the eigenvalue) times the original vector.
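The defining relationship can be checked numerically. The Python sketch below uses power iteration, a simple textbook method that finds only the largest eigenvalue and its eigenvector; it is shown to illustrate the definition and is not the algorithm ModelRisk uses. The function name and the example matrix are hypothetical.

```python
import math

def power_iteration(m, iters=200):
    """Approximate the dominant eigenvalue and a unit eigenvector of m."""
    n = len(m)
    v = [1.0] * n
    for _ in range(iters):
        # Multiply by the matrix, then rescale to unit length
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(v[i] * mv[i] for i in range(n))  # Rayleigh quotient for unit v
    return lam, v

# Hypothetical symmetric matrix with eigenvalues 3 and 1
M = [[2.0, 1.0], [1.0, 2.0]]
lam, vec = power_iteration(M)
```

Multiplying M by vec returns lam times vec, which is exactly the condition stated above.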





VoseEigenVectors

VoseEigenVectors({matrix})

Array function that calculates the eigenvectors of a matrix. Eigenvectors and eigenvalues are concepts used frequently in matrix algebra.

• {matrix} - an n x n matrix

A matrix [M] is said to be associated with a set of eigenvectors [V] and eigenvalues λ, if:

[M][V] = λ [V]

If we look at it in one dimension, a vector is an eigenvector of a matrix if multiplying it by the matrix results in a constant (the eigenvalue) times the original vector.





VoseExpression

VoseExpression(FormatString,Distribution1,Distribution2,...)

Allows you to create complex frequency and severity distributions for use as arguments in the VoseAggregateMC function.

• FormatString - a string expression (between "") with #1,#2,... where #n refers to the nth DistributionN argument.

• DistributionN - a distribution object

Insurance policies are becoming ever more flexible in their terms, and more complex to model as a result.

For example, we might have a policy with a deductible of 5, and a limit of 20 beyond which the insurer pays only half the damages. Using a cost distribution of Lognormal(31,23) and an accident frequency distribution of Delaporte(3,5,40) we can model this as follows:

A1: =VoseLognormalObject(31,23)

A2: =VoseExpression("IF(#1>20,(#1-25)/2,IF(#1<5,0,#1))",A1)

A3 (output): =VoseAggregateMC(VoseDelaporte(3,5,40),A2)

The VoseExpression function allows one a great deal of flexibility. The '#1' refers to the distribution linked to Cell A1. Each time the VoseExpression function is called it will generate a new value from the Lognormal distribution and perform the calculation replacing '#1' with the generated value. The Delaporte function will generate a value (call it n) from this distribution, and the AggregateMC function will then call the VoseExpression function n times, adding as it goes along and returning the sum into the spreadsheet.

The VoseExpression allows several random variables to take part in the calculation. For example:

=VoseExpression("#1*#2",VoseBernoulliObject(0.3),VoseLognormalObject(20,7))

will model a cost that follows a Lognormal(20,7) distribution with 30% probability and zero with 70% probability;

=VoseExpression("#1*(#2+#3)",VoseBernoulliObject(0.3),VoseLognormalObject(20,7),VoseParetoObject(4,7))

will model a cost that follows a (Lognormal(20,7) + Pareto(4,7)) distribution with 30% probability and zero with 70% probability.
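The first of these expressions, "#1*#2" with a Bernoulli(0.3) and a Lognormal(20,7), can be mimicked in Python to show what each call produces and how the values are summed. This is a sketch under assumptions: the lognormal is taken to be specified by its mean and standard deviation, the function names are hypothetical, and the claim count passed to aggregate is a fixed number rather than a sampled frequency.

```python
import math
import random

def sample_cost(rng, p=0.3, mean=20.0, sd=7.0):
    """One evaluation of '#1*#2': Bernoulli(p) times a lognormal cost."""
    occurs = 1 if rng.random() < p else 0
    # Convert mean and standard deviation to the log-scale parameters
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return occurs * rng.lognormvariate(mu, math.sqrt(sigma2))

def aggregate(rng, n):
    """Sum n independent evaluations, as an aggregate calculation would."""
    return sum(sample_cost(rng) for _ in range(n))

rng = random.Random(1)
costs = [sample_cost(rng) for _ in range(10000)]
share_zero = sum(1 for c in costs if c == 0) / len(costs)
total = aggregate(rng, 5)
```

About 70% of the sampled costs are zero, as the text describes, and the remainder are lognormal values.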





VoseIdentity

VoseIdentity()

Array function returning an identity matrix. The output array has to be square and can be of any size.





VoseInput

VoseInput(Name, Units, RangeName, PositionInRange)

This function marks a cell as a model input for the purposes of collecting and analyzing the values the cell generates during a simulation.

• Name – is an optional parameter specifying a name to identify the variable during analysis of the simulation results. If omitted the cell is identified by its cell address. It is a text entry (either text enclosed within “..”, or the address of a cell containing text).

• Units – is an optional text entry specifying the units of the variable (e.g. “kg”, “miles”, “people”). It is used in reporting the graphs and statistics of simulation results.

• RangeName – is a text entry used when several cells are collected together to produce a forecast over time. All cells with the same input RangeName are collated within the Results Viewer and displayed as a series as well as individual variables.

• PositionInRange – is an index variable (1,2,3,…) defining the order of the input amongst the other input variables with which it shares the same RangeName.





VoseIntegrate

VoseIntegrate(expression,min,max, steps, NonVolatile)

Numerically integrates a real, continuous, univariate function between user-specified min and max boundaries.

The numerical integration performed is based on the Gauss-Kronrod quadrature formula.

• Expression - the integrand, surrounded by double quotation marks (""). The variable to be integrated over is represented by a #.

• min - the lower bound.

• max - the upper bound.

• Steps - optional accuracy parameter: the number of steps to divide each sub-interval by on each iteration of the algorithm.

• NonVolatile - optional boolean parameter (TRUE/FALSE) to set recalculation mode. Set to FALSE (default) to evaluate the integral on each spreadsheet recalculation. Set to TRUE to only evaluate at the moment of inserting the VoseIntegrate function.

To see how the NonVolatile parameter works, refer to a spreadsheet containing a randomly generated value in the integrand expression. In NonVolatile mode the returned value will remain the same when the spreadsheet is recalculated.

In the integrand, cell references and Excel's mathematical functions (e.g. SIN()) can be used in the formula, including VoseFunctions added by ModelRisk, so a valid integrand would be for example:

=VoseIntegrate("VoseNormalProb(#,10,1,0)*4*VoseLognormalProb(#,4,5)",9,12)
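For readers who want to check an integral outside the spreadsheet, the following Python sketch integrates a function numerically between two bounds. Note that it uses the composite Simpson rule as a simple stand-in; it is not the Gauss-Kronrod quadrature that VoseIntegrate is based on, and the function name simpson is hypothetical.

```python
import math

def simpson(f, lower, upper, n=200):
    """Integrate f from lower to upper with the composite Simpson rule."""
    if n % 2:
        n += 1                      # Simpson's rule needs an even number of steps
    h = (upper - lower) / n
    total = f(lower) + f(upper)
    for i in range(1, n):
        # Interior points alternate between weights 4 and 2
        total += (4 if i % 2 else 2) * f(lower + i * h)
    return total * h / 3

# Integral of x^2 from 0 to 3 (exactly 9) and of sin(x) from 0 to pi (exactly 2)
area_square = simpson(lambda x: x ** 2, 0.0, 3.0)
area_sine = simpson(math.sin, 0.0, math.pi)
```

The callable f plays the role of the integrand string, with its argument standing in for the # placeholder.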





VoseInterpolate

VoseInterpolate(Value,{independent},{dependent})

Uses linear interpolation to return a dependent variable value given data and an independent variable value.

• Value - the independent variable value for which we wish to find the corresponding dependent variable value.

• {independent} - an array of observations for an independent variable.

• {dependent} - an array of observations for a dependent variable.

The {independent} and {dependent} arrays must be of the same length.

The function searches for the nearest values in {independent} above and below Value, finds the corresponding values in {dependent} and interpolates between them.
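The search-and-interpolate step can be sketched in Python as below. The function name is hypothetical, and raising an error when Value lies outside the data range is an assumption made for this sketch; the behaviour of VoseInterpolate in that case is not described here.

```python
def interpolate(value, independent, dependent):
    """Linearly interpolate the dependent value at the given independent value."""
    if len(independent) != len(dependent):
        raise ValueError("arrays must be of the same length")
    pairs = sorted(zip(independent, dependent))
    # Find the neighbouring observations below and above 'value'
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        if x0 <= value <= x1:
            if x1 == x0:
                return y0
            return y0 + (y1 - y0) * (value - x0) / (x1 - x0)
    # Assumption: no extrapolation outside the observed range
    raise ValueError("value lies outside the range of the independent data")

y_mid = interpolate(2.5, [1, 2, 3], [10, 20, 30])
```

Here 2.5 lies halfway between the observations at 2 and 3, so the result is halfway between 20 and 30.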





VosejkProduct

VosejkProduct(Expression,jStart,jFinish,kStart,kFinish,jIncrement,kIncrement,NonVolatile)

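This function calculates the following expression:

$$\prod_{j=\mathrm{jStart}}^{\mathrm{jFinish}} \; \prod_{k=\mathrm{kStart}}^{\mathrm{kFinish}} f(j,k)$$

for a certain function f(j,k).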

• Expression - the equation to be multiplied in string form, i.e. within " ", with the labels j and k representing the variables to be multiplied over

• Jstart - the minimum value of j at which the product is to begin

• Jfinish - the maximum value of j at which the product is to end

• Kstart - the minimum value of k at which the product is to begin

• Kfinish - the maximum value of k at which the product is to end

• jIncrement - optional parameter (default = 1) that allows one to specify that the product is over non-integer increments of j.

• kIncrement - optional parameter (default = 1) that allows one to specify that the product is over non-integer increments of k.

• NonVolatile - an optional recalculation mode parameter. If set to TRUE the function is evaluated only at the moment it is entered. If set to FALSE (default), the function is re-evaluated on each spreadsheet recalculation.





VosejkSum

VosejkSum(Expression,jStart,jFinish,kStart,kFinish,jIncrement,kIncrement,NonVolatile)


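This function calculates the following expression:

$$\sum_{j=\mathrm{jStart}}^{\mathrm{jFinish}} \; \sum_{k=\mathrm{kStart}}^{\mathrm{kFinish}} f(j,k)$$

for a certain function f(j,k).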

• Expression - the equation to be summed in string form, i.e. within " ", with the label #

representing the integral variable

• jStart - the minimum value of j at which the summation is to begin

• jFinish - the maximum value of j at which the summation is to end

• kStart - the minimum value of k at which the summation is to begin

• kFinish - the maximum value of k at which the summation is to end

• jIncrement - optional parameter (default = 1) that allows one to specify that the


• kIncrement - optional parameter (default = 1) that allows one to specify that the

summation is over non-integer increments of k.
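The double sum and double product over j and k can be illustrated with a short Python sketch. It is not ModelRisk's implementation: the function names are hypothetical, the expression is passed as a Python callable rather than a string, and it is assumed that the grid of values includes the finish value whenever the increment reaches it exactly.

```python
import math

def grid(start, finish, increment=1.0):
    """Values start, start+increment, ... up to and including finish (if reached)."""
    steps = int(math.floor((finish - start) / increment + 1e-9))
    # Multiplying by the step index avoids accumulating floating-point drift
    return [start + i * increment for i in range(steps + 1)]

def jk_sum(f, j_start, j_finish, k_start, k_finish, j_inc=1.0, k_inc=1.0):
    """Sum of f(j, k) over the j and k grids."""
    return sum(f(j, k)
               for j in grid(j_start, j_finish, j_inc)
               for k in grid(k_start, k_finish, k_inc))

def jk_product(f, j_start, j_finish, k_start, k_finish, j_inc=1.0, k_inc=1.0):
    """Product of f(j, k) over the j and k grids."""
    return math.prod(f(j, k)
                     for j in grid(j_start, j_finish, j_inc)
                     for k in grid(k_start, k_finish, k_inc))
```

For instance, `jk_sum(lambda j, k: j * k, 1, 3, 1, 2)` adds j*k over j = 1, 2, 3 and k = 1, 2.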







VosejProduct

VosejProduct(Expression,jStart,jFinish,jIncrement,NonVolatile)


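This function calculates the following expression:

$$\prod_{j=jStart}^{jFinish} f(j)$$

for a certain function f(j).

• Expression - the equation to be multiplied in string form, i.e. within " ", with the label j representing the variable to be multiplied over

• jStart - the minimum value of j at which the product is to begin

• jFinish - the maximum value of j at which the product is to end

• jIncrement - optional parameter (default = 1) that allows one to specify that the product is over non-integer increments of j.

• NonVolatile - an optional recalculation mode parameter. If set to TRUE the function is evaluated only at the moment it is entered. If set to FALSE (default), the function is re-evaluated on each spreadsheet recalculation.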





VosejSum

VosejSum(Expression,jStart,jFinish,jIncrement, NonVolatile)



• Expression - the formula to be evaluated. This may be any valid Excel calculation (including ModelRisk functions). The variable j to sum over must be written as "j".

• jStart - the first value of j for which the expression is to be evaluated.

• jFinish - the last value of j for which the expression is to be evaluated.

• jIncrement - optional parameter (default = 1) that allows one to specify that the summation is over non-integer increments of j.

• NonVolatile - an optional recalculation mode parameter. If set to TRUE the function is evaluated only at the moment it is entered. If set to FALSE (default), the function is re-evaluated on each spreadsheet recalculation.

Example

You are expecting BetaBinomial(100,7,32) insurance claims. Each claim follows a Lognormal(25,7) $000 distribution. What is the probability that at least one claim exceeds $50k?

Let j be the number of claims. The probability of having exactly j claims is given by:

VoseBetaBinomialProb(j,100,7,32,0)

The probability that all of these j claims are below $50k is:

VoseLognormalProb(50,25,7,1)^j

Thus the probability that at least one claim exceeds $50k is given by:

=1-VosejSum("VoseBetaBinomialProb(j,100,7,32,0)*VoseLognormalProb(50,25,7,1)^j",0,100)
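The same calculation can be cross-checked outside Excel with a minimal Python sketch. Two assumptions are made and should be verified against the distribution documentation: that BetaBinomial(n, alpha, beta) takes its parameters in that order, and that Lognormal(25,7) is parameterized by its mean and standard deviation. All function names are hypothetical.

```python
import math

def beta_binomial_pmf(j, n, a, b):
    """P(N = j) for a BetaBinomial(n, a, b) count, computed via log-gamma."""
    log_choose = math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
    log_num = math.lgamma(j + a) + math.lgamma(n - j + b) - math.lgamma(n + a + b)
    log_den = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp(log_choose + log_num - log_den)

def lognormal_cdf(x, mean, stdev):
    """Cumulative probability of a lognormal given its mean and stdev (assumed)."""
    sigma2 = math.log(1 + (stdev / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2
    z = (math.log(x) - mu) / math.sqrt(sigma2)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_at_least_one_exceeds(threshold, n=100, a=7, b=32, mean=25, stdev=7):
    """1 - sum over j of P(N = j) * P(claim < threshold)^j."""
    p_below = lognormal_cdf(threshold, mean, stdev)
    total = sum(beta_binomial_pmf(j, n, a, b) * p_below ** j for j in range(n + 1))
    return 1 - total
```

Calling `prob_at_least_one_exceeds(50)` mirrors the VosejSum formula above term by term.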





VosejSumInf

VosejSumInf(Expression,jStart,Accuracy,jIncrement,NoTimeout)



• Expression - the equation to be summed in string form, i.e. within " ", with the label j representing the variable to be summed over.

• jStart - the minimum value of j at which the summation is to begin.

• Accuracy - the level of accuracy required in the calculation. Typically a very small value, e.g. 0.00000000001

• jIncrement - optional parameter (default = 1) that allows one to specify that the summation is over non-integer increments of j.

• NoTimeout - optional parameter with default = FALSE. If set to TRUE the function will continue to calculate for as long as it takes to arrive at the required accuracy. By default the function returns a time-out error if it takes too long to reach a result.
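The interplay of Accuracy and NoTimeout can be illustrated with the Python sketch below. The stopping rule (stop once a term is smaller in magnitude than the accuracy) and the term limit standing in for a time-out are assumptions made for illustration; the criterion ModelRisk actually applies is not described here. Note that this simple rule can stop too early for a series whose terms are small before they grow.

```python
def sum_to_infinity(f, j_start, accuracy=1e-12, j_increment=1,
                    no_timeout=False, max_terms=1_000_000):
    """Add f(j) for j = j_start, j_start + j_increment, ... until terms are negligible."""
    total = 0.0
    terms = 0
    j = j_start
    while True:
        term = f(j)
        total += term
        terms += 1
        if abs(term) < accuracy:          # assumed convergence criterion
            return total
        if not no_timeout and terms >= max_terms:
            raise TimeoutError("required accuracy not reached within the term limit")
        j = j_start + terms * j_increment  # avoids floating-point drift in j
```

For example, `sum_to_infinity(lambda j: 0.5 ** j, 0)` approximates the geometric series 1 + 1/2 + 1/4 + ..., whose limit is 2.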





Kendall's tau

VoseKendallsTau({var1},{var2})

Returns the Kendall tau rank correlation coefficient (a.k.a. Kendall's tau) for two sets of observed variables.

• {Var1} - array with observations of one variable.

• {Var2} - array with observations of another variable.

This is used to measure the degree of correspondence between two variables, for example paired observations. If the correspondence between the two variables is perfect, the coefficient has value 1, and if the disagreement between the two rankings is perfect, the coefficient has value -1. For all other arrangements, the value lies between -1 and 1, with 0 indicating no rank association between the variables.

Kendall's tau, like Spearman's rho, is carried out on the ranks of the data. That is, for each variable separately the values are put in order and numbered.

An estimate of Kendall's tau for a sample of n observations is given by:

$$\tau = \frac{C - D}{\tfrac{1}{2}\,n(n-1)}$$

where C is the number of concordant pairs and D the number of discordant pairs. This can also be written as:

$$\tau = \frac{2}{n(n-1)} \sum_{i<j} \operatorname{sgn}\left[(x_i - x_j)(y_i - y_j)\right]$$

where (x_i, y_i) are the paired observations.
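A minimal Python sketch of the concordant/discordant pair count follows. The function name is hypothetical, and tied pairs are simply counted as neither concordant nor discordant, which is an assumption; how VoseKendallsTau treats ties is not stated here.

```python
def kendalls_tau(var1, var2):
    """Kendall's tau from counts of concordant (C) and discordant (D) pairs."""
    n = len(var1)
    if n != len(var2) or n < 2:
        raise ValueError("need two arrays of equal length with at least 2 observations")
    concordant = discordant = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            # Same sign of differences -> concordant; opposite sign -> discordant
            s = (var1[i] - var1[j]) * (var2[i] - var2[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Two identically ordered series give 1, and a series compared with its reverse gives -1.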





VoseLibAssumption

VoseLibAssumption(“UniqueAssumptionCode”)

This function returns the value of an assumption stored within the ModelRisk Library.

• UniqueAssumptionCode – is a text string containing the unique assumption code for a particular assumption within the ModelRisk Library

The function returns whatever value has been assigned to the assumption. This could be some specific value, like 2.87, some text like “Tournament 32”, or a ModelRisk Object.





VoseLibReference

VoseLibReference(“UniqueReferenceCode”)

This function is a marker to point to a reference within the ModelRisk Library.

• UniqueReferenceCode – is a text string containing the unique reference code for a particular reference within the ModelRisk Library

The function evaluates as zero within the spreadsheet cell. Thus, the function is typically appended to the formula in a cell as follows:

=[formula] + VoseLibReference(“94D047C3”)

where 94D047C3 is the unique reference code for the particular reference.





VoseMeanExcessP

VoseMeanExcessP(Distribution,Pthreshold,Xmax)

This function calculates the mean excess e(T) for a claim distribution given a cumulative probability Pthreshold associated with the deductible D and some maximum claim size Xmax.

• Distribution - a distribution object representing a claim size

• Pthreshold - the cumulative probability of the threshold D (i.e. F(D))

• Xmax - an optional parameter determining the maximum claim size if the limitation exists.

Mean Excess is defined as the mean of the ClaimDistribution conditional on it being above some value D. It is given by the following equation:

Comments

The VoseMeanExcessX function does the same calculation, but takes D directly as an input parameter.





VoseMeanExcessX

VoseMeanExcessX(ClaimDistribution,deductible D,Xmax)

This function calculates the mean excess e(T) for a claim distribution given a deductible D and some maximum claim size Xmax.

• ClaimDistribution - a distribution object representing claim size.

• Deductible D - the deductible (threshold) D, entered directly as a claim size value rather than as a cumulative probability.

• Xmax - optional. If omitted, Xmax is set to be the 99.9999th percentile or the maximum value (if it exists) of the ClaimDistribution.

Mean Excess is defined as the mean of the ClaimDistribution conditional on it being above some value D. It is given by the following equation:





VoseOutput

VoseOutput(Name, Units, RangeName, PositionInRange)

This function marks a cell as a model output for the purposes of collecting and analyzing the values the cell generates during a simulation.

• Name – is an optional parameter specifying a name to identify the variable during analysis of the simulation results. If omitted the cell is identified by its cell address. It is a text entry (either text enclosed within “..”, or the address of a cell containing text).

• Units – is an optional text entry specifying the units of the variable (e.g. “kg”, “miles”, “people”). It is used in reporting the graphs and statistics of simulation results.

• RangeName – is a text entry used when several cells are collected together to produce a forecast over time. All cells with the same output RangeName are collated within the Results Viewer and displayed as a series as well as individual variables.

• PositionInRange – is an index variable (1,2,3,…) defining the order of the output amongst the other output variables with which it shares the same RangeName.





VoseParameters

VoseParameters(Excel cell reference)

Returns the description of the parameters of ModelRisk functions in the specified cell.

• Excel cell reference - reference to an Excel cell.

When the cell that is referred to contains more than one ModelRisk function, the function will search for the first ModelRisk function within the cell. Thus, for example, if cell A1 contains:

=VoseNormal(100,1)+VoseGamma(2,3)

then VoseParameters(A1) will return a description of the VoseNormal parameters. Also, if the reference is an array of multiple cells, VoseParameters will only look at the first cell of the array.





VosePrincipleEsscher

VosePrincipleEsscher(frequency distribution, severity distribution, h)

This function calculates the insurance premium for given frequency and severity distributions using the Esscher principle.

• Frequency distribution - a frequency distribution object.

• Severity distribution - a severity distribution object.

• h - see the formula below.

For an insurance policy the premium charged must be greater than the expected payout E[X]. Otherwise, according to the law of large numbers, in the long run the insurer will be ruined. The question is then: how much more should the premium be over the expected value?

The Esscher method calculates the ratio of the expected values of X·exp(hX) to exp(hX):

$$\text{Premium} = \frac{E\left[X e^{hX}\right]}{E\left[e^{hX}\right]}, \quad h > 0$$

The principle gets its name from the Esscher transform, which converts a density function from f(x) to a*f(x)*Exp[b*x], where a and b are constants.





VosePrincipleEV

VosePrincipleEV(frequency distribution, severity distribution, theta)

This function calculates the insurance premium for given frequency and severity distributions using the Expected Value principle.

• Frequency distribution - a frequency distribution object.

• Severity distribution - a severity distribution object.

• theta - see the formula below.

For an insurance policy the premium charged must be greater than the expected payout E[X]. Otherwise, according to the law of large numbers, in the long run the insurer will be ruined. The question is then: how much more should the premium be over the expected value?

The Expected Value principle calculates the premium in excess of E[X] as some fraction θ of E[X]:

$$\text{Premium} = (1 + \theta)\,E[X], \quad \theta > 0$$

Ignoring administration costs, θ represents the return the insurer is getting over the expected capital required E[X] to cover the risk.





VosePrincipleRA

VosePrincipleRA(frequency distribution object, severity distribution object, rho)

This function calculates the insurance premium for given frequency and severity distributions using the Risk Adjusted principle.

• Frequency distribution object - a frequency distribution object.

• Severity distribution object - a severity distribution object.

• rho - see the formula below.

The Risk Adjusted principle is a special case of the Proportional Hazards Premium Principle based on coherent risk measures (see, e.g., Wang (1996)). The survival function (1-F(x)) of the aggregate distribution, which lies on [0,1], is transformed into another variable that also lies on [0,1]:

$$\text{Premium} = \int_{0}^{\infty} \left[1 - F(x)\right]^{1/\rho}\,dx, \quad \rho > 1$$

where F(x) is the cumulative distribution function of the aggregate distribution.





VosePrincipleStdev

VosePrincipleStdev(frequency distribution, severity distribution, alpha)

This function calculates the insurance premium for given frequency and severity distributions using the Standard Deviation principle.

• Frequency distribution - a frequency distribution object.

• Severity distribution - a severity distribution object.

• alpha - see the formula below.

The Standard Deviation principle calculates the premium in excess of E[X] as some multiple α of the standard deviation of X:

$$\text{Premium} = E[X] + \alpha\,\sigma[X], \quad \alpha > 0$$

The problem with this principle is that, at an individual level, there is no consistency in the level of risk the insurer is taking for the expected profit, since σ has no consistent probabilistic interpretation.
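The Expected Value, Standard Deviation and Esscher principles can be compared numerically with the Python sketch below. It is an illustration under stated assumptions, not ModelRisk's method: instead of frequency and severity distribution objects it takes `losses`, a hypothetical list of simulated aggregate claim totals X, and estimates each expectation by a sample average. All function names are hypothetical.

```python
import math
from statistics import fmean, pstdev

def premium_expected_value(losses, theta):
    """Premium = (1 + theta) * E[X], with theta > 0."""
    return (1 + theta) * fmean(losses)

def premium_stdev(losses, alpha):
    """Premium = E[X] + alpha * stdev[X], with alpha > 0."""
    return fmean(losses) + alpha * pstdev(losses)

def premium_esscher(losses, h):
    """Premium = E[X exp(hX)] / E[exp(hX)], with h > 0."""
    weights = [math.exp(h * x) for x in losses]
    # The common 1/n factors of the two sample means cancel in the ratio
    return sum(w * x for w, x in zip(weights, losses)) / sum(weights)
```

Because the Esscher weights exp(hX) grow with X, larger losses count more heavily, so the Esscher premium exceeds the plain mean whenever the losses are not all equal.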





VoseRuin

VoseRuin(ClaimInterval,ClaimSize,InitialReserve,PolicyPrice,PolicySalesRate,Horizon,DividendThreshold,DiscountRate)

The Ruin Calculation models scenarios for the cash flow that comes with an insurance policy: the available funds are decreased by payment events of random size that occur randomly in time, and increased by selling policies of fixed size.

A time horizon is set, and Ruin Calculation models whether or not we have a Ruin (i.e. funds dropping below zero within the time horizon).

An optional dividend threshold can be set. When the fund exceeds this threshold, a dividend is paid out to reduce the fund back to the threshold value.

A discount rate can be applied to the dividend cashflow to calculate the Net Present Value (NPV) of the fund in this scenario (i.e. the total value of the dividend stream within the time horizon).

• ClaimInterval - the distribution of time between each impact. Must be a distribution object.

• ClaimSize - the distribution of the size of each impact. Must be a non-negative distribution object.

• InitialReserve - the funds available at time zero. This should be a value greater than zero.

• PolicyPrice - the income generated by selling an individual policy. This should be a value greater than zero.

• PolicySalesRate - the time between each income.

• Horizon - the time horizon over which the funds are modeled.

• DividendThreshold - (optional) the level of funds above which they are used for dividend.

• DiscountRate - (optional) the discount rate (per time unit) to be applied to the dividend cashflows. This must be a value greater than or equal to zero.

As this function takes many parameters, we recommend using the Ruin window for performing this calculation to avoid errors.

The generated output is a 5 by 2 array containing the following values for the currently generated scenario:

RuinTime - the time at which the resources are depleted, if this happens before the time horizon. Otherwise this function returns -1.

RuinSeverity - the severity of depletion the first time the funds fall below zero (i.e. how deep the funds drop below zero). Returns 0 if the funds never drop below zero.

RuinMaxSeverity - the deepest below zero the funds drop before the time horizon. Returns 0 if the funds never drop below zero.

RuinFlag - returns 1 if the resources are depleted at some point before the time horizon and 0 if not.

RuinNPV - the Net Present Value (NPV) of the fund in the currently generated scenario (i.e. the total value of the dividend stream within the time horizon taking into account the discount rate). If no dividend threshold is set this returns zero.

The VoseRuinTime, VoseRuinSeverity, VoseRuinMaxSeverity, VoseRuinFlag and VoseRuinNPV functions return each of these outputs separately.
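To make the scenario mechanics concrete, here is a deliberately simplified Python sketch of one ruin scenario. It is not the ModelRisk algorithm: dividends and discounting are left out, policy income is assumed to arrive at fixed intervals, income is assumed to be booked before a claim that falls at the same instant, and the simulation is assumed to continue after the first ruin so that the maximum severity can be tracked. The function name and the use of plain callables in place of distribution objects are hypothetical.

```python
def simulate_ruin(claim_interval, claim_size, initial_reserve,
                  policy_price, sales_interval, horizon):
    """One scenario: returns ruin time, first severity, max severity and flag."""
    funds = initial_reserve
    next_claim = claim_interval()      # time of the first claim
    next_sale = sales_interval         # time of the first policy income
    ruin_time, first_severity, max_severity = -1, 0.0, 0.0
    while min(next_claim, next_sale) <= horizon:
        if next_sale <= next_claim:
            funds += policy_price      # income event (booked first on ties)
            next_sale += sales_interval
        else:
            funds -= claim_size()      # payment event of random size
            if funds < 0:
                if ruin_time < 0:      # first time the funds fall below zero
                    ruin_time, first_severity = next_claim, -funds
                max_severity = max(max_severity, -funds)
            next_claim += claim_interval()
    return {
        "ruin_time": ruin_time,
        "ruin_severity": first_severity,
        "ruin_max_severity": max_severity,
        "ruin_flag": 1 if ruin_time >= 0 else 0,
    }
```

In practice `claim_interval` and `claim_size` would draw random values, for example `lambda: random.expovariate(0.5)`; repeating the scenario many times and averaging `ruin_flag` estimates the probability of ruin within the horizon.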





VoseRuinFlag



VoseRuinFlag returns only that part of the VoseRuin output that tells you whether the resources are depleted (returns 1) or not (returns 0) before a certain time horizon (Horizon).

The parameters are the same as for VoseRuin.

As this function takes many parameters, we recommend using the Ruin window for performing this calculation to avoid errors.





VoseRuinMaxSeverity



Returns only that part of the VoseRuin output that gives the deepest level below zero to which the funds drop before the time horizon. Returns 0 if the funds never drop below zero.

The parameters are the same as for VoseRuin.

As this function takes many parameters, we recommend using the Ruin window for performing this calculation to avoid errors.





VoseRuinNPV



Returns only that part of the VoseRuin output that gives the Net Present Value (NPV) of the fund in the currently generated scenario (i.e. the total value of the dividend stream within the time horizon taking into account the discount rate). If no dividend threshold is set this returns zero.

The parameters are the same as for VoseRuin. In particular:

• ClaimSize - the distribution of the size of each impact. Must be a non-negative distribution object.

As this function takes many parameters, we recommend using the Ruin window for performing this calculation to avoid errors.





VoseRuinTime



VoseRuinTime returns only that part of the VoseRuin output that gives the time at which the resources are depleted, if this happens before the time horizon (Horizon). Otherwise this function returns -1.





VoseRunoff

=VoseRunoff(N, TimeObject,{TimeStamps},ClaimsizeObject)

Loss reserving is very important for property and casualty insurance companies. For insurance policies that cover all damages or injuries that occurred during the insured period, the claims may be made or fully settled considerably after the insurance term. Future pay-outs have to be estimated for incurred but not reported (IBNR) claims to ensure that sufficient reserves are set aside that will cover the aggregate claim cost with a certain probability. The usual classification for IBNR is occurrence year versus reporting year, and expected costs are determined for each combination. However, this does not give a sense of the distribution of costs over time nor their interdependence.

The VoseRunoff array function allows the stochastic modeling of costs over any desired period. Use this function to model a number N of payment events appearing at random points in time, where each event can take a random size. VoseRunoff then models the total amount of payment appearing in each year, month, etc., depending on the timestamps. The function parameters are as follows (using year as the nominal measure of time):

• N (TotalClaims) - the total number of claims predicted to occur from a policy in a certain occurrence year

• TimeObject - a distribution object describing the time from occurrence year until the year of payout

• TimeStamps - an array of increasing points in time at which the payouts will occur

• ClaimSizeObject - a distribution object describing the possible size of a claim

You can also use this function to just count the number of events (instead of their total size) happening at the timestamps by using a ClaimSizeObject that always returns 1 (e.g. VoseBernoulliObject(1)).

Example 1

You have 1000 agreed sales. Each sale will generate Lognormal(1.7,3.4) $k, but will take Weibull(2.6,4.3) weeks to complete. What does the income stream look like? The answer is shown in this example model:





Example 2

A total of Poisson(121) claims are expected to occur from events occurring last year but not yet reported.

The time until payout of a random claim follows a Lognormal(0.7,1.4) years distribution.

The size of a random claim in $1000 follows a Pareto(5,2) distribution.

We wish to model the payout per quarter for the next five years.

Note that the VoseRunoff function extends one cell beyond the TimeStamp array to show the total cost of all claims that will occur beyond the last defined time point.

How VoseRunoff works

Imagine that the Poisson(121) generates a value of 115. The VoseRunoff function then generates 115 values sampled from the time until payout distribution (Lognormal(0.7,1.4) in this example) and 115 values from the claim size distribution (Pareto(5,2) in this example). It sorts each payout time into the bins defined by the TimeStamp array and sums the corresponding claim sizes for each bin to give the output values.
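The sample, bin and sum procedure can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, the two distributions are passed as plain sampling callables, and it is assumed that a payout at time t is assigned to the first timestamp that is greater than or equal to t, with a final extra bin for payouts beyond the last timestamp.

```python
from bisect import bisect_left

def runoff(n_claims, sample_time, sample_size, timestamps):
    """Total payout per timestamp bin, plus one overflow bin at the end."""
    totals = [0.0] * (len(timestamps) + 1)
    for _ in range(n_claims):
        payout_time = sample_time()    # time until payout of this claim
        size = sample_size()           # size of this claim
        # Index of the first timestamp >= payout_time; equals len(timestamps)
        # when the payout falls beyond the last defined time point
        totals[bisect_left(timestamps, payout_time)] += size
    return totals
```

The returned list has one more element than `timestamps`, matching the extra output cell described above for claims paid beyond the last time point.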

Payout time and claim size independence

Note that the VoseRunoff algorithm assumes that the claim size and time until payout are independent. This may not be true. For example, where a very large claim occurs there may be some considerable dispute between the insured and insurer which could protract the time until payout. In this situation, it would be better to separate claims out into different strata where each stratum is a range of claim sizes with an appropriate Payout time distribution, and then sum the resultant payout streams.

The VoseRunoff function uses Distribution Objects for the claim size and time until payment. Historic data will usually be available for these variables, in which case one can use fitted distribution objects to directly refer to the data.





VoseSample

VoseSample({data})

This array function takes a random sample with replacement from a set of data. This is useful when bootstrapping, for example. The size of the output array determines the size of the sample taken. For directly calculating a bootstrap estimate of a population parameter, use the VoseNBoot functions.

• {data} - an array with data to sample from

The output array must be no larger than the data array.
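Sampling with replacement, and its use in a simple bootstrap of the mean, can be sketched in Python as below. The function names are hypothetical, and the size check simply mirrors the restriction stated above.

```python
import random

def sample_with_replacement(data, size, rng=random):
    """Draw `size` values from `data`; each value may be drawn more than once."""
    if size > len(data):
        raise ValueError("sample size must not exceed the size of the data")
    return rng.choices(data, k=size)

def bootstrap_means(data, n_resamples, rng=random):
    """Mean of each of `n_resamples` resamples that are the same size as the data."""
    means = []
    for _ in range(n_resamples):
        resample = sample_with_replacement(data, len(data), rng)
        means.append(sum(resample) / len(resample))
    return means
```

Passing `rng=random.Random(1)` fixes the random sequence, which makes a run reproducible.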





VoseShuffle

VoseShuffle({data})

Array function that returns the input array randomly shuffled in order. The output range and input array should be of equal size.

• {data} - the input data array. Can contain numbers as well as strings.





VoseSimTable

VoseSimTable({List of values}, ValueWhenNotSimulating)

This function returns a value from the list of input values in a sequence corresponding to the current simulation number. It is used to rerun the same model several times, replacing a model variable with a different value for each simulation run.

• List of values – a list of values entered within curly brackets {…}, a cell range (e.g. A1:A5), or an Excel range name.

• ValueWhenNotSimulating – is an optional parameter specifying the value that SimTable uses when a model is not simulating. This extra control makes it easier to build a model and test how it behaves.

Note: VoseSimTable returns the first listed value when a simulation is not running unless ValueWhenNotSimulating has been provided.

How to use this function

1. Select one or more variables for which you wish to change the value between simulation runs.

2. Write down the list of values you wish to use.

3. Use the VoseSimTable function in the cell of your model that represents the variable in question, and link it to the set of values you wish to use. For example, =VoseSimTable(A1:A5) if the values you want to use are in cells A1:A5. Alternatively, write the list directly within the function, for example: =VoseSimTable({1,2,3,4,5}).

4. In the Settings | Model Settings dialog of ModelRisk change the number of simulations to match the number of values you wish to use. For example, =VoseSimTable({1,2,3,4,5}) has five values, and changing the number of simulations to 5 will make ModelRisk rerun the model cycling through these five values in order. Note that you will get the most direct comparison of results between simulations if each simulation uses the same sequence of random numbers, which is achieved by setting the Seed generating option to Manual, and the Multiple Simulations seeds option to All Use Same Seed.

5. If the number of simulations is greater than the number of values listed in VoseSimTable, ModelRisk will repeatedly cycle through the values. So, for example, if a cell contains the formula =VoseSimTable({1,2,3,4,5}) and ModelRisk is set to run 12 simulation runs, the value in that cell will change with each simulation run following the sequence 1,2,3,4,5,1,2,3,4,5,1,2.
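The cycling behaviour described in the steps above amounts to indexing the list by the simulation number modulo the list length. A minimal Python sketch, with a hypothetical function name and a 1-based simulation number assumed:

```python
def sim_table(values, simulation_number=None, value_when_not_simulating=None):
    """Value used for a given simulation run (1-based); None means not simulating."""
    if simulation_number is None:
        # Not simulating: first listed value unless an override was provided
        if value_when_not_simulating is None:
            return values[0]
        return value_when_not_simulating
    # Cycle through the list when there are more runs than values
    return values[(simulation_number - 1) % len(values)]
```

Evaluating `[sim_table([1, 2, 3, 4, 5], s) for s in range(1, 13)]` reproduces the 12-run sequence quoted in step 5.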





VoseSimulate

VoseSimulate(VoseDistributionObject)

Returns a randomly generated value from a distribution.

• VoseDistributionObject - a distribution object.

In many situations it is convenient to construct a distribution object in a separate spreadsheet cell, and use VoseSimulate referring to that cell, for example if you want to use both sampled values and the statistical moments from a given distribution, or if the distribution is constructed by fitting to data.

Example: parametric bootstrapping

Suppose we want to model the uncertainty about the population mean using parametric bootstrapping. Suppose we have reason to believe the data (stored in an array named Data) comes from a LogNormal distribution.

1. Construct the fitted distribution by writing =VoseLogNormalFitObject(Data) in a spreadsheet cell. Name this cell FittedDistribution.

2. Write =VoseSimulate(FittedDistribution) in a number of cells, generating a random sample from the fitted LogNormal distribution.

3. Calculate the desired statistic of this sample (e.g. using the AVERAGE function from Excel). This gives us the uncertainty distribution of the population statistic.
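The three steps can be mirrored in a short Python sketch. It rests on assumptions: the lognormal is fitted by taking the mean and standard deviation of the log-data (a maximum likelihood fit), which may differ from the fitting method ModelRisk uses, and each bootstrap sample is assumed to have the same size as the original data. Function names are hypothetical.

```python
import math
import random
from statistics import fmean, pstdev

def fit_lognormal(data):
    """Step 1: fit by the mean and standard deviation of log(data) (assumed MLE)."""
    logs = [math.log(x) for x in data]
    return fmean(logs), pstdev(logs)

def bootstrap_mean_distribution(data, n_replicates=1000, seed=0):
    """Steps 2 and 3: simulate samples from the fit and record each sample mean."""
    rng = random.Random(seed)
    mu, sigma = fit_lognormal(data)
    n = len(data)
    means = []
    for _ in range(n_replicates):
        sample = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        means.append(fmean(sample))
    return means
```

The spread of the returned means is the bootstrap estimate of the uncertainty about the population mean.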





VoseTangentPortfolio

VoseTangentPortfolio(Expected Return,Deviation,Correlation Matrix,Interest Rate,Labels)

Array function that uses the Capital Asset Pricing Model (CAPM) to find the tangent portfolio for a set of assets: the composition of the portfolio that has optimal return rate for minimal variance (i.e. sensitivity for market risk). This portfolio composition is returned as an array of asset weights (that sum to one).

• Expected return - array with the expected return of each asset

• Deviation - array with the standard deviation of each of the assets

• Correlation matrix - array with the matrix of correlation coefficients between the assets

• Interest rate - the risk-free interest rate

• Labels - (optional) array that contains the names of the assets

In the view of the CAPM model, two types of risk are at play for assets:

• The non-systematic risk attached to an individual asset. This can be reduced (to the point where it is negligible) by diversifying the portfolio, so this risk is also known as diversifiable risk.

• The systematic risk, caused by the uncertainty of the market. This can be thought of as the risk that is still there when adding the asset to a portfolio that is already well diversified. This type of risk is called the non-diversifiable or market risk.

Sensitivity for the second type of risk (which is the most important, as the first can be diversified away), called the variance of the portfolio, is represented by the beta coefficient in finance. An optimal portfolio is one that has the lowest variance - lowest beta coefficient - for a given return. In a variance-return plot, these optimal portfolio combinations make up the efficient frontier.

As total budget to invest is often a constraint when composing a portfolio, the quantities of each asset that comprise it are expressed in weights (proportions of the total budget). The budget constraint is accounted for in the fact that the weights sum to one.

One other component can be incorporated. Rather than investing the entire budget in assets, one might keep part of the budget in cash, earning an (albeit lower) interest at the risk-free return rate. The variance-return relationship of this is linear, and represented as the Security Market Line (SML).





Both components are optimally accounted for in the Tangent Portfolio: where the SML and efficient frontier meet.
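For illustration, the textbook mean-variance tangency portfolio can be computed as in the Python sketch below: the weights are proportional to the inverse covariance matrix applied to the excess returns, then scaled to sum to one. This is an assumption about the calculation, not a description of ModelRisk's algorithm; in particular the sketch allows negative weights (short positions). The function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                       # back substitution
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def tangent_portfolio(expected_return, deviation, correlation, interest_rate):
    """Asset weights (summing to one) of the mean-variance tangency portfolio."""
    n = len(expected_return)
    # Covariance = correlation * stdev_i * stdev_j
    cov = [[correlation[i][j] * deviation[i] * deviation[j] for j in range(n)]
           for i in range(n)]
    excess = [r - interest_rate for r in expected_return]
    raw = solve(cov, excess)
    total = sum(raw)
    return [w / total for w in raw]
```

Two uncorrelated assets with identical expected return and standard deviation receive equal weights, as one would expect from symmetry.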





VoseThielU

VoseThielU({Series1},{Series2})

Returns Theil's inequality coefficient that compares observed with estimated time series values.

• {Series1} - A series of observed time series values.

• {Series2} - A series of estimated time series values.

Theil's inequality coefficient, also known as Theil's U, provides a measure of how well a time series of estimated values compares to a corresponding time series of observed values. The statistic measures the degree to which one time series ({Xi}, i = 1,2,3, ...n) differs from another ({Yi}, i = 1, 2, 3, ...n). Theil's U is calculated as:

Theil's inequality coefficient is useful for comparing different forecast methods: for example, whether a fancy forecast is in fact any better than a naïve forecast repeating the last observed value.

The closer the value of U is to zero, the better the forecast method. A value of 1 means the forecast is no better than a naïve guess.

Note that the formula is symmetric, so switching Series1 and Series2 gives the same result.
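A minimal Python sketch follows, assuming the common symmetric form of the coefficient, U = sqrt(Σ(Xi-Yi)²) / (sqrt(ΣXi²) + sqrt(ΣYi²)), which is consistent with the symmetry noted above and is bounded between 0 and 1. Whether VoseThielU uses exactly this form is an assumption, and the Python function name is hypothetical.

```python
import math

def theil_u(observed, estimated):
    """Symmetric Theil inequality coefficient between two equal-length series."""
    if len(observed) != len(estimated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    # Root of summed squared differences (the common 1/n factors cancel)
    numerator = math.sqrt(sum((x - y) ** 2 for x, y in zip(observed, estimated)))
    denominator = (math.sqrt(sum(x * x for x in observed))
                   + math.sqrt(sum(y * y for y in estimated)))
    return numerator / denominator
```

Identical series give 0, while a series compared with its negative gives 1, the worst possible value under this form.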





VoseValidCorrmat

VoseValidCorrmat({matrix})

This array function returns the input matrix if it is a valid correlation matrix. If this matrix is not a valid correlation matrix, then a valid correlation matrix is returned that is 'closest' to the input matrix.

• {matrix} - a symmetric matrix.

A correlation matrix enables the analyst to correlate several probability distributions together. The rank order correlation coefficients are input into the cross-referenced positions in the matrix. Each distribution must clearly have a correlation of 1.0 with itself, so the top left to bottom right diagonal elements are all 1.0. Furthermore, because the formula for the rank order correlation coefficient is symmetric, as explained above, the matrix elements are also symmetric about this diagonal line. The table below is an example of (the upper half of) a correlation matrix.

There are some restrictions on the correlation coefficients that may be used within the matrix. For example, if A and B are highly positively correlated and B and C are also highly positively correlated, A and C cannot be highly negatively correlated. For the mathematically minded, the restriction is that there can be no negative eigenvalues for the matrix.
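The eigenvalue restriction can be checked in practice with a Cholesky factorization, sketched below in Python. This tests the slightly stricter condition that all eigenvalues are positive (positive definiteness), and it only checks validity: it does not attempt the 'closest valid matrix' correction, whose method is not described here. The function name is hypothetical.

```python
import math

def is_positive_definite(matrix, tol=1e-12):
    """True if the symmetric matrix admits a Cholesky factorization."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                d = matrix[i][i] - s
                if d <= tol:           # non-positive pivot: factorization fails
                    return False
                lower[i][j] = math.sqrt(d)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return True
```

The A, B, C example in the text corresponds to a matrix with correlations of 0.9 for A-B and B-C but -0.9 for A-C, which fails this check.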

Examples

If the following matrix is entered as the input matrix:

then the VoseValidCorrmat function returns the same matrix because this matrix is a valid correlation matrix.

If an invalid correlation matrix is entered, like:

then the function returns the following valid correlation matrix:





Database connectivity

ModelRisk database connectivity functions

See also: VoseDataObject, Data Object

ModelRisk offers a number of database connectivity functions that take a VoseDataObject as a parameter and return different properties of the referenced data.

Depending on the number of data columns that the VoseDataObject is referencing, the functions below will return a single value (if number of data columns = 1) or multiple values (one value for each column of data).

VoseDataMean(DataObject)

Returns the mean value of each data column

VoseDataStdev(DataObject)

Returns the standard deviation of each data column

VoseDataVariance(DataObject)

Returns the variance of each data column

VoseDataSkewness(DataObject)

Returns the skewness of each data column

VoseDataKurtosis(DataObject)

Returns the kurtosis of each data column

VoseDataCofV(DataObject)

Returns the coefficient of variation of each data column

VoseDataMin(DataObject)

Returns the minimum value of each data column

VoseDataMax(DataObject)

Returns the maximum value of each data column

VoseDataMoments(DataObject)

Returns the mean, variance, skewness and kurtosis of each data column

VoseDataRawMoments(DataObject)

Returns 4 raw moments of each data column

VoseDataPercentile(DataObject,Percentile)

Returns the percentile value of each data column

VoseDataProbability({x},DataObject,Cumulative)

Returns the probability mass (if Cumulative = FALSE) or cumulative probability (if Cumulative = TRUE) for an array of x-values for each data column

VoseDataRow(DataObject, Row)

Returns data array from the given row





VoseDataRowCount(DataObject)

Returns the number of rows in the referenced data

VoseDataRowLabel(DataObject)

Returns the array of labels for the referenced data

VoseDataSample(DataObject)

Returns a randomly sampled value from each data column





Vose Data Object main window view

In the “Define Data Source” field you can define the source of the data. The two buttons on the right of this field allow the creation of a new data source, which can be either a link to a worksheet range (left button), or a link to a database (right button).

If the connection to a database needs authorization, check the “Authorization needed” field and fill in the details for the “Login” and “Password” fields.

Checking the connection with the data source can be done by clicking the “Verify database connection” button. If the check is successful, you will get a confirmation message.

Linking to data in the databases can be done by typing the SQL queries directly into the Query string field or using the Query constructor (click the “Wizard” button).

Query constructor window view.





The Query constructor window has three tabs:

“Select data source”

This tab is for constructing the main query line. The “Database table fields” list shows all database tables and fields that the user can connect to. Just move all required fields into the “Selected fields” list. The “Query string” field below will show the main query line for the selected data.

“Define filter options”

This tab is for filtering the selection made in the first tab.

Filtering consists of two levels of filters: Joining condition and Filter condition.

In the Joining condition you can specify the logic for combining the filters by selecting the necessary value from the list:

The Filter condition is set by the left argument (“Condition Left argument” field), comparison sign (“Condition” field), and the right argument (“Condition Right argument” field). Arguments can be single values as well as database table fields.

To select a database table field as a condition argument, the user should click the following button:

The comparison sign should be picked from the list (“Condition” field):





When the filter is created, add it to the filters list by clicking the “Add filter to list” button. To delete it from the filters list, select it and click the “Delete filter” button.

The query string, including the filters, is shown in the “Query string” field.

“Define sorting options”

This tab allows adding sorting options to the selected entries. The left pane (“Database tables fields”) lists all fields that are available for sorting. To sort the data, select the fields that need sorting and move them to the “Sorted fields order” list, choosing the sorting direction in the control above.

The final query string will be shown in the “Query string” field.

When the query is constructed, press the “OK” button to return to the main “Vose Data Object” window.

If desired, you can click the “Run query” button. This will run the constructed query, provided that the query has been constructed correctly. The “Query results” window will then display the query results in tabular form.

Query results window view





The tabulated data can be exported into Microsoft Excel or Microsoft Word by checking the required Export type and clicking the “Export” button.

Attention: avoid exporting large data sets to Word, as it can take a long time.

Closing the results window returns you to the main “Vose Data Object” window. After clicking the “OK” button, a VoseDataObject() function with the parameters (reference to database/range on the worksheet, selection query, etc.) will be placed in the range that was specified in the “Output location” field.





VoseDataObject

VoseDataObject(DataSource, Volatile)

Example model

This function defines a ModelRisk Data Object linked to a data source.

• DataSource – can be either an SQL query to a database, or a link to a spreadsheet range.

• Volatile – An optional Boolean parameter (FALSE by default) defining whether the data should be retrieved from the data source with each spreadsheet recalculation. Setting it to TRUE is generally not required and will slow down a simulation run considerably.

For more information about using the VoseDataObject function see the topic about the Data Object window.





PK/PD module

PK/PD module

See also: VoseODE

Overview

ModelRisk’s PK/PD (Pharmacokinetic/Pharmacodynamic) tool allows the user to define a PK/PD model and return simulation results in Excel. It generates projections with the fourth-order Runge-Kutta method (also known as RK4), a more sophisticated method based on the same principles as the Euler method, because RK4 is much more accurate and computationally efficient. Input parameters to the model can be linked to ModelRisk’s probability distribution functions within Excel, allowing the user to see the uncertainty in the resultant model outputs.
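To illustrate why RK4 is preferred over the Euler method, the following minimal Python sketch compares the two on a one-compartment elimination equation of the form dx/dt=par*x. It is not ModelRisk's implementation; the rate constant K, the starting amount of 100, the 4-hour horizon and the 8 steps are hypothetical values chosen for illustration.

```python
import math

def euler_step(f, t, x, h):
    # First-order update: follow the slope at the start of the step.
    return x + h * f(t, x)

def rk4_step(f, t, x, h):
    # Classic fourth-order Runge-Kutta: weighted average of four slopes.
    k1 = f(t, x)
    k2 = f(t + h / 2, x + h * k1 / 2)
    k3 = f(t + h / 2, x + h * k2 / 2)
    k4 = f(t + h, x + h * k3)
    return x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

def integrate(step, f, x0, t_end, n_steps):
    # Advance from t = 0 to t_end in n_steps equal steps.
    h = t_end / n_steps
    t, x = 0.0, x0
    for _ in range(n_steps):
        x = step(f, t, x, h)
        t += h
    return x

K = 0.5  # hypothetical elimination rate, per hour

def elimination(t, x):
    # dx/dt = -K * x
    return -K * x

# Exact solution of dx/dt = -K*x with x(0) = 100, evaluated at t = 4 hours.
exact = 100.0 * math.exp(-K * 4.0)
euler_error = abs(integrate(euler_step, elimination, 100.0, 4.0, 8) - exact)
rk4_error = abs(integrate(rk4_step, elimination, 100.0, 4.0, 8) - exact)
print(f"Euler error: {euler_error:.4f}, RK4 error: {rk4_error:.6f}")
```

With the same number of steps, the RK4 error is several orders of magnitude smaller than the Euler error, which is the accuracy advantage referred to above.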

PK/PD menu

Three PK/PD-related buttons are available in a special sub-menu on the ModelRisk Ribbon:

1. Library, which opens a library of PK/PD models, offering the options to select/use a particular model

2. Create model, which opens the interface that allows creating custom PK/PD models

3. Convert NONMEM, which allows importing a NONMEM output file into Excel/ModelRisk.

Library

The Library is organized into 3 categories: PK, PD and PK/PD models. For each category there are 2 sub-categories: Standard and User-defined models.





The Standard models come with the ModelRisk installation file, and the User-defined models are created by users and stored on the user’s PC. All library models are saved into a "ModelRisk Library" folder under the user’s "My Documents". For each Standard model in the Library, there is a Name, a Description, and a schematic picture of the model. For each user-defined model, the interface allows the user to define a Name and a Description. A schematic picture can be added to a user-defined model manually, by copying the picture of the model to the same folder in the ModelRisk Library as the model file, and making sure that the picture file name matches the model file name. For example, if the model file name is "1-CMT CL.pt", then the name of the picture file should be "1-CMT CL.png". The picture format should always be PNG and the picture dimensions should be 230x152 pixels.

To use a Library model, select the model and click the "Select" button, which will load the model and proceed to the next interface window, the Dosing Regime window.

To delete a User-defined Library model, select the model and click the Delete button.

If a user wishes to send user-defined models to another user, s/he needs to go to "My Documents\ModelRisk Library\PKPD", find the user-defined model files (e.g. "1-CMT CL.pt"), and send them to the other user, who needs to place them in the ModelRisk Library on his or her computer.

Create model

The Create model interface enables users to create user-defined PK/PD models and save them in the Library.





The user has the option to add an existing Library model (or several) into the window and modify it, or simply start from scratch. Each user-defined model needs a Name and a Description, followed by a 4-step process:

1. Differential equations. This is where the differential equations need to be entered. Each equation needs to take the following form:

dx/dt=par*x

Each differential equation needs to have a "/dt" part, i.e. differentiation is always over t (time). "x" and "par" may be any letter or set of letters and numbers, but cannot start with a number.

2. Initial condition. This sets the initial condition for the differentiated variables, i.e. the values that the variables will have at time zero.

3. Model outputs. Model Outputs are entered in the form of expressions, e.g. y=x*V

4. Model parameters. This last section prompts for the default parameter values. It is important to note that all time-dependent parameters have units = Hours (or 1/Hours). E.g. if K01 is a rate parameter and is equal to 2, then the model will interpret it as "2 per hour".

When all steps are complete, one can either save the model to the Library (by clicking the "Save" button), or click "Next" to proceed to the next interface without saving the model.

Dosing regime

The Dosing regime window controls doses into model compartments:





The window above has a picture of the model on the top left, a table for inserting the dosing logic at the bottom, and a chart visualizing the selected dosing regimen on the top right.

Doses can be added by clicking the "…" button and selecting the required configuration for each entry. Switching a dosing entry from "Single" to "Multiple" unlocks the "Dose Interval" and "Dosing Duration" controls, which allow specifying the interval and duration of the dosing. Changing the "Dose Input Rate" from "Instant" to "Zero-order" unlocks the "Input duration" control, which allows specifying the duration of the dosing injection. Doses may be introduced into any model compartment.
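As a rough illustration of what a "Multiple" entry and a "Zero-order" input describe, here is a small Python sketch. The function names are hypothetical, and the conventions used (a dose at the start time and at every interval up to and including the end of the dosing duration; a constant input rate over the input duration) are assumptions made for this example, not a statement of how ModelRisk schedules doses internally.

```python
def dose_times(first_dose, interval, duration):
    """Dose times in hours for a repeated regimen.

    Assumption: one dose at first_dose, then one every `interval` hours
    while the elapsed time does not exceed `duration`.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    times = []
    n = 0
    while n * interval <= duration:
        times.append(first_dose + n * interval)
        n += 1
    return times

def zero_order_rate(amount, input_duration):
    """Constant input rate (amount per hour) for a zero-order dose."""
    if input_duration <= 0:
        raise ValueError("input_duration must be positive")
    return amount / input_duration

# Hypothetical regimen: 100 units every 12 hours for 48 hours,
# each dose infused at a constant rate over 2 hours.
print(dose_times(0, 12, 48))
print(zero_order_rate(100, 2))
```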

PK/PD model

This is the main PK/PD module window that provides a visual representation of the PK/PD model.





The left side of the window has four input tables:

1. Differential equations. This is a read-only table that shows the differential equations used in the model and provides the option to view them on the chart.

2. Initial condition. This section allows setting the initial condition for the differentiated variables.

3. Model parameters. This section is used for initializing the model parameters. Note that the time-dependent model parameters are in the units of "hours".

4. Model outputs. This section lists the model outputs and allows setting of a threshold for each model output. If a threshold is set, the model will return the "time above threshold" to Excel. The "time above threshold" is the total time, between 0 and the largest value in the "Output Time Points" list, during which the selected output exceeds the threshold.

The upper part of the window has input controls for the Output Time Points, i.e. the time points at which the output values are required. The reference lines showing the Output Time Points on the chart can be switched on and off via the control below the chart.

If any of the model parameters are linked to volatile cells in Excel (e.g. probability distributions), pressing the F9 key refreshes the interface and the chart with the new values.

One or two variables can be displayed at the same time, plotted against the x-axis variable selected at the top right of the screen.

Placing the model into Excel is done by clicking the "To Excel" button, which prompts the user to select the upper-left corner of the area where the model needs to be pasted. Since the model inputs and outputs occupy a large spreadsheet area, it is better to paste the model into a blank worksheet. This output also includes the variables "Cmax" and "Tmax", which return the maximum value and the time of the maximum value for each compartment and Model output between 0 and the largest value in the "Output Time Points" list.
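The three summary quantities described above ("Cmax", "Tmax" and "time above threshold") can be sketched in Python as follows. This is an illustration on a sampled curve, assuming straight-line interpolation between time points to locate threshold crossings; the function names and the example curve are hypothetical, and ModelRisk's own calculation may differ in detail.

```python
def cmax_tmax(times, values):
    # Largest value on the grid and the time at which it occurs.
    i = max(range(len(values)), key=values.__getitem__)
    return values[i], times[i]

def time_above(times, values, threshold):
    # Total time during which the curve exceeds the threshold, with
    # crossing points located by linear interpolation between grid points.
    points = list(zip(times, values))
    total = 0.0
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        above0, above1 = y0 > threshold, y1 > threshold
        if above0 and above1:
            total += t1 - t0
        elif above0 != above1:
            t_cross = t0 + (threshold - y0) * (t1 - t0) / (y1 - y0)
            total += (t_cross - t0) if above0 else (t1 - t_cross)
    return total

# Hypothetical concentration curve sampled hourly.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [0.0, 10.0, 6.0, 2.0, 0.0]
print(cmax_tmax(times, values))
print(time_above(times, values, 5.0))
```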

Using View Function to return to a window

Some model inputs can be edited directly in the Excel sheet (e.g., change model output times), or you can re-open the window for a ModelRisk function that is in a spreadsheet cell by using View Function. Select the spreadsheet cell and then select View Function from the ModelRisk menu/toolbar/ribbon.





NONMEM converter

The NONMEM converter reads native NONMEM output files and pastes the extracted model parameter estimates and covariance matrix (if available) into Excel, preserving the correlation structure between the parameters. Appropriate multivariate normal distribution arrays (VoseMultinormal) are implemented to allow sampling from random-effect parameters and from the covariance matrix to address parameter uncertainty.

Clicking the "Convert NONMEM" icon from the PK/PD menu under "More Tools" opens a file browse dialog prompting you to select the NONMEM output file. Once the file is selected, the converter produces an Excel worksheet with extracted model parameters that can be used as inputs in ModelRisk’s PK/PD module.

The output structure includes the NONMEM "PROBLEM NO." and any character description of the Problem (if provided). If the covariance matrix is included in the file, three tables are produced. If the covariance matrix is not included, only the first table is produced.

Output tables:

1. Final Parameter estimates (THETAs, OMEGAs, SIGMAs), with multivariate normal distributions to sample from OMEGA and SIGMA matrices.

2. Parameter estimates including parameter uncertainty ("Unc"). The structure is similar to the above, but the parameter estimates are also randomly sampled from a multivariate normal distribution.

3. The covariance matrix, used to provide parameter estimates ("Unc") that include parameter uncertainty.

The values of "Sample", "EstimateUnc", "MeanUnc", and "SampleUnc" can be refreshed by pressing F9, or randomly sampled a number of times according to the value set in the "Samples" window.
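The idea of drawing correlated parameter values from a multivariate normal distribution defined by a mean vector and a covariance matrix can be sketched in Python as below. This is a generic Cholesky-based sampler written for illustration, not the VoseMultinormal implementation; the parameter estimates and the covariance matrix are hypothetical numbers, not values taken from a NONMEM file.

```python
import math
import random

def cholesky(cov):
    # Lower-triangular factor L with L * L^T = cov
    # (cov must be symmetric and positive definite).
    n = len(cov)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(cov[i][i] - s)
            else:
                lower[i][j] = (cov[i][j] - s) / lower[j][j]
    return lower

def sample_multinormal(mean, cov, rng):
    # mean + L * z, where z is a vector of independent standard normals.
    lower = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(lower[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]

# Hypothetical estimates for two parameters and their covariance matrix.
estimates = [5.0, 50.0]
covariance = [[0.25, 0.5],
              [0.5, 4.0]]
rng = random.Random(2024)
print(sample_multinormal(estimates, covariance, rng))
```

Each call returns one joint sample, so the correlation between the parameters is preserved from draw to draw.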





Simulation Imported Data Files (SIDs)

SIDs (Simulation Imported Data Files)

Purpose of a SID

Creating a SID

Using a SID

Managing SIDs

A SID is a database of simulation values for one or more variables that can be used to generate a pre-defined sequence of random samples within your ModelRisk model.

Purpose of a SID

The primary uses of SIDs are:

1. Consistency between models sharing the same variables

The same SID can be used in several different models, where the variables within the SID are directly used in these models. By using SIDs we can be certain that the models are consistent. More importantly, we can be sure that a specific sample of a simulation of one model will exactly correspond with the same sample of other models using the same SID. This means that we can combine the models or, for example, directly compare two different options (built in two different models) sample by sample.

2. Distributing or updating corporate stochastic forecasts of important variables

Corporate planning and treasury departments, etc. can create SIDs of their up-to-date official stochastic forecasts. Sharing those SIDs with colleagues makes it quick and simple to incorporate stochastic forecasts using the most up-to-date information.

3. Importing data from applications like OpenBUGS and other simulation programs

Many simulation tools have the ability to export their simulation data directly to Excel, or into a format that Excel can then read. In ModelRisk, you can create a SID from these data and thereby incorporate the simulated variable results into your model, preserving any correlation pattern that is present in the simulated data.

Creating a SID

You can create a SID from two different data sources:

1. From the Results Viewer application.





Go to the Insert tab in the Results Viewer and click on the Data list button:

Select the variables to save as a SID on the left by ticking the checkboxes next to the required Inputs or Outputs:

Go to the "List options" tab and click the “Create SID” icon in the Results Viewer ribbon.





This will open a dialog box. Enter a SID name (this must be unique, i.e. not a name that has already been used for another SID), add a description if you wish, and click OK.

ModelRisk will confirm the SID has been created:

2. From an Excel Spreadsheet

Select the Library icon in the ModelRisk ribbon:

Click on the SID tab:





Click on the ‘Create SID from spreadsheet’ icon:

Enter the descriptive information in the dialog box:

Click OK. A window appears describing the format in which the spreadsheet data needs to appear:

Click OK. Then select the range of cells containing the data and header labels:





Click OK. ModelRisk confirms the creation of the SID, which now appears in the SID tab of the ModelRisk Library:





Alternatively, you can select the data in the spreadsheet, right-click and select ‘Create SID’ from the menu and follow the same procedure.

Using a SID

To enter a SID into your model, select the sheet in which you wish to place the SID. Then go to the SID tab in the ModelRisk library, select the SID you need and click the ‘Insert to spreadsheet’ icon:

Specify where the SID should be placed. Note that the dialog box informs you of the size of the array needed. You can enter the SID as either a horizontal or vertical array:

Click OK. A VoseSID array function is then added to your spreadsheet.

Managing SIDs

Sharing a SID

It can be very useful to share SID files with other modellers in your organisation. This ensures that you are building models on the same quantitative uncertain assumptions.

Exporting a SID

To share a SID you have created with colleagues, select the SID in the ModelRisk Library and click the ‘Export selected SID’ icon:

You will then be prompted to browse to a location in which you want that SID to be copied. Select a location on your intranet that other modellers have access to. The file name follows the format SIDname.sid, where SIDname is the name originally provided by the user.

Importing a SID





Another modeller can now import the SID that has been exported by opening the ModelRisk Library on their computer and clicking the ‘Import new SID’ icon:

The SID will now appear in their ModelRisk Library, showing that it is imported. In this screen capture the ‘Terramore results’ SID has been imported, so an import path is shown:

The SID has been imported into the ModelRisk Library. This allows the modeller to use the SID without having to have access to the original file location at the time the SID is being used.

Updating an imported SID

An exported SID can be updated by simply overwriting the old SID, though care should be taken to ensure that the same variables appear in the new version.

A modeller using an imported SID can update it by selecting the relevant SID and clicking the ‘Update linked SID’ icon:

The SID files will take the format: SIDName.sid

SIDs can be exported and imported. A user can select a SID and click “Export” – this will prompt for

Deleting a SID





Select the SID and click

Changing name and/or description of a SID

Select the SID and click

Drop down menu

The same SID management operations are available from the SID menu in the ModelRisk Library:





VoseSID

=VoseSID(NameSID, Sampling)

An array function for entering SIDs (Simulation Imported Data Files) into your model.

The function VoseSID takes two parameters:

• NameSID - the first parameter (required) is a text string giving the name of the SID;

• Sampling - the second parameter (optional) allows you to control how the values are drawn from the SID:

• 0, FALSE or omitted - the function will sample values during a simulation row by row in the order in which they were provided. This is the most common option and allows you to run simulations of different models using the same SID and know that each sample in the models will have matching assumptions. Note: if, for example, your SID contains 5000 rows but you run a simulation of 10000 samples, the function will loop twice through the SID data;

• 1 or TRUE - the function will sample values during a simulation row by row in a random order.
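The two sampling modes can be pictured with a short Python sketch. This is only an illustration of the behaviour described above, not ModelRisk code: the function names are hypothetical, and reading "random order" as a shuffled sequence of row numbers is an assumption.

```python
import random

def sequential_row(sample_number, n_rows):
    # 0-based SID row used for a 1-based sample number when rows are read
    # in their stored order; wraps around once the SID is exhausted.
    return (sample_number - 1) % n_rows

def random_row_order(n_rows, rng):
    # One possible reading of the random-order mode: every row is visited
    # once, in shuffled order.
    return rng.sample(range(n_rows), n_rows)

# A SID with 5000 rows used in a simulation of 10000 samples:
print(sequential_row(1, 5000))      # first sample reads the first row
print(sequential_row(5001, 5000))   # sample 5001 starts the second pass
print(random_row_order(5, random.Random(0)))
```

Because the sequential mode depends only on the sample number, two models reading the same SID see the same row for the same sample, which is what makes sample-by-sample comparison possible.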




ModelRisk CONVERTER

@RISK model converter

@RISK is a Monte Carlo Excel add-in from Palisade Corporation.

ModelRisk includes a tool to allow you to translate models with @RISK functions into the same model using ModelRisk functions. The tool can be accessed through the main ribbon for ModelRisk Standard:

And via the More Tools dropdown list in ModelRisk Professional or Industrial versions:





This opens the following dialog box:

If more than one Excel workbook is open, the converter will operate on the active workbook only. We recommend that the Rename file option is selected to ensure that you do not overwrite your model. By default this will create a new workbook stored in the same directory as the active workbook with ‘Converted’ appended to the name, but you can enter a different name in the New name field.

We also recommend that you select the Show conversion report when finished option as this will show you whether a complete conversion has been accomplished.

Note: it is not necessary for @RISK to be running or be installed in order to perform the conversion.

Now click the Convert button. ModelRisk will search your model for all @RISK functions and replace them with the equivalent ModelRisk functions where possible. It will also automatically save the new converted model with the specified file name. At the end of the conversion, the following window will open provided that the Show conversion report when finished option has been selected:




This lists all of the cells in which there were @RISK functions and shows the original and converted formulae together with a comment on whether ModelRisk was able to find a suitable formula.

If you have also selected the Output the conversion report to Excel option, the same table will appear in a spreadsheet as text (i.e. without the “=” sign for formulae) so that the model does not include extra unrelated equations:

Incomplete conversion issues

The @RISK converter does not currently convert any @RISK VBA functions.

Not all @RISK spreadsheet functions have an exact equivalent in ModelRisk. For example, the RiskCompound function in @RISK is equivalent in purpose to VoseAggregateMC or VoseAggregateDeduct in ModelRisk, depending on the options selected in RiskCompound, but ModelRisk uses distribution objects to define the variables to be summed, whilst @RISK does not. The conversion report identifies where there is no direct conversion possible.

@RISK has some functions that calculate properties of distributions. For example:

=RiskTheoMean(RiskNormal(100,10))

will return the mean of the Normal(100,10) distribution.

The equivalent in ModelRisk would be:

=VoseMean(VoseNormalObject(100,10))

Both formulae will return the same value of 100. Note that ModelRisk uses a distribution object function to define the normal distribution and distinguish it from functions that take random samples from the distribution. The converter is not able to convert such formulae because, for example, an @RISK model might have:

A1: =RiskNormal(100,10)

A2: =RiskTheoMean(A1)

A3: =RiskOutput()+A1^2

The formula in A1 serves two purposes: to define a Normal distribution, and to sample from it. A functionally equivalent model in ModelRisk would be:

A1: =VoseNormalObject(100,10)

A2: =VoseMean(A1)


A3: =VoseOutput()+VoseSimulate(A1)^2

RiskMakeInput, RiskCollect

@RISK has two functions that will collect generated values for sensitivity analysis:

1. RiskCollect()

This is embedded within a distribution sampling function, for example:

=RiskNormal(100,10,RiskCollect())

ModelRisk converts this function to VoseNormal(100,10) and ignores the RiskCollect() part because it is incompatible with how ModelRisk specifies input variables for sensitivity analysis. If you wish to make the cell a ModelRisk input for sensitivity analysis, add a VoseInput function as follows:

=VoseInput("Name")+VoseNormal(100,10)

If you had a formula with two or more RiskCollect functions, we recommend you separate out the formula. So for example, change:

A3: =RiskNormal(100,10,RiskCollect())+RiskGamma(2,3,RiskCollect())

To:

A1: =VoseInput()+VoseNormal(100,10)

A2: =VoseInput()+VoseGamma(2,3)

A3: =VoseInput()+A1+A2

This has the benefit of making it clear what exactly the output sensitivity is to each distribution, which may not be apparent in the @RISK formulation above.

2. RiskMakeInput

This is wrapped around a formula, for example:

=RiskMakeInput(3+RiskNormal(100,10)+RiskGamma(2,3))

ModelRisk removes the RiskMakeInput function and replaces it with a VoseInput function, so the above formula would appear as:

=VoseInput()+3+VoseNormal(100,10)+VoseGamma(2,3)

RiskTheo statistical functions

@RISK has a number of statistical functions reporting probabilities, etc. for @RISK distribution sampling functions. ModelRisk does not allow this, because it uses distribution objects to query properties. For example, if cell A1 contains the formula ‘=RiskGamma(2,3)’, in @RISK you can write:

=RiskTheoMin(A1) which returns 0, the minimum value that the Gamma distribution may take.

To do the same in ModelRisk we define the Gamma distribution as an object:

A1: =VoseGammaObject(2,3)

Then =VoseMin(A1) returns the same value of 0.

Note that some properties of a distribution are infinite (in which case ModelRisk returns “+Infinity”) or undefined (in which case ModelRisk statistical functions return “Undefined”). Tested versions of @RISK return #VALUE!.




The following table lists @RISK’s RiskTheo functions in alphabetical order and their ModelRisk equivalents, assuming that cell A1 contains an @RISK distribution sampling function or the equivalent ModelRisk distribution object function:

@RISK function               ModelRisk equivalent
RiskTheoKurtosis(A1)         VoseKurtosis(A1)
RiskTheoMax(A1)              VoseMax(A1)
RiskTheoMean(A1)             VoseMean(A1)
RiskTheoMin(A1)              VoseMin(A1)
RiskTheoMode(A1)             No equivalent, since this is often undefined
RiskTheoPercentile(A1,P)     VoseSimulate(A1,P)
RiskTheoPercentileD(A1,Q)    VoseSimulate(A1,1-Q)
RiskTheoPtoX(A1,P)           VoseSimulate(A1,P)
RiskTheoQtoX(A1,Q)           VoseSimulate(A1,1-Q)
RiskTheoRange(A1)            No equivalent
RiskTheoSkewness(A1)         VoseSkewness(A1)
RiskTheoStdDev(A1)           VoseStdev(A1)
RiskTheoTarget(A1,x)         VoseProb(x,A1,1)
RiskTheoTargetD(A1,x)        1-VoseProb(x,A1,1)
RiskTheoVariance(A1)         VoseVariance(A1)
RiskTheoXtoP(A1,x)           VoseProb(x,A1,1)
RiskTheoXtoQ(A1,x)           1-VoseProb(x,A1,1)

Note that there is considerable redundancy among the @RISK functions, which may cause some confusion:

RiskTheoPercentile(A1,P) = RiskTheoPtoX(A1,P)

RiskTheoPercentileD(A1,Q) = RiskTheoQtoX(A1,Q)

RiskTheoTarget(A1,x) = RiskTheoXtoP(A1,x)

RiskTheoTargetD(A1,x) = RiskTheoXtoQ(A1,x)

Difference in modelling correlation

@RISK uses rank order correlation with a method developed by Iman and Conover some 30 years ago (Iman and Conover, 1980; Iman and Conover, 1982). Iman and Conover’s technique gives very similar results to using the multivariate Normal copula in ModelRisk. @RISK uses RiskCorrmat, RiskIndepC, and RiskDepC functions to produce correlations between variables. In contrast, ModelRisk simulates from copulas and connects the copula values directly to the appropriate distributions using the optional U parameter. If there is any correlation in your model, the converter will create a separate sheet called ModelRiskCorrelation in which it will place the ModelRisk copula functions, and it will connect the copulas to the distributions in your model. Note that ModelRisk offers many types of correlation structures (i.e. copulas, which are the more modern approach to modelling correlation), and can estimate correlation structures from data, so you may wish to take the opportunity to update your model with a more appropriate correlation structure.
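The copula approach described above (simulate correlated uniform values, then pass each one to a distribution as its U value) can be sketched in plain Python. This is an illustration of the general technique for a two-variable Normal copula, not ModelRisk code; the function names, the Normal(100,10) and exponential marginals, and the correlation of 0.8 are hypothetical choices.

```python
import math
import random
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def normal_copula_pair(rho, rng):
    # Two correlated standard normals, mapped to correlated uniforms on (0, 1).
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    return STANDARD_NORMAL.cdf(z1), STANDARD_NORMAL.cdf(z2)

def exponential_inverse_cdf(u, mean):
    # Inverse CDF of an exponential distribution with the given mean.
    return -mean * math.log(1.0 - u)

def correlated_inputs(rho, rng):
    # Each uniform value plays the role of the U parameter of one distribution.
    u1, u2 = normal_copula_pair(rho, rng)
    cost = NormalDist(100.0, 10.0).inv_cdf(u1)
    delay = exponential_inverse_cdf(u2, 5.0)
    return cost, delay

rng = random.Random(7)
for _ in range(3):
    print(correlated_inputs(0.8, rng))
```

Separating the dependence (the copula) from the marginal distributions is what allows the same correlation structure to be attached to any pair of distributions.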

The converter does not currently convert the multiple incidence feature of RiskCorrmat.

Discrepancies between calculated values

@RISK and ModelRisk use different numerical methods for estimating properties of a distribution such as its moments (mean, variance, etc). ModelRisk uses known equations for calculating moments where they exist (i.e. where there is a known formula using the distribution parameters), and it appears that @RISK does the same. However, where distributions are truncated there do not generally exist any exact equations for the moments, and @RISK and ModelRisk results may differ significantly. ModelRisk uses advanced adaptive numerical integration for continuous distributions, returning calculations with very high accuracy, and summation techniques for discrete distributions. In this spreadsheet you can test a few @RISK and ModelRisk estimates of a truncated distribution’s mean where a formula also exists. ModelRisk does not use these special case formulae, but does use the same numerical methods for all distributions, so this should provide you with a neutral test of the accuracy of the approach of each product.
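The kind of check described above can be sketched in Python for a truncated Normal distribution, one of the special cases where a formula for the mean exists. The sketch compares that formula with a numerical integral. It uses a fixed-step composite Simpson rule as a simple stand-in, not the adaptive integration that ModelRisk uses, and the Normal(100,10) distribution truncated to [90, 120] is a hypothetical example.

```python
from statistics import NormalDist

def truncated_normal_mean_formula(mu, sigma, low, high):
    # Closed-form mean of a Normal(mu, sigma) truncated to [low, high].
    std = NormalDist()
    a, b = (low - mu) / sigma, (high - mu) / sigma
    return mu + sigma * (std.pdf(a) - std.pdf(b)) / (std.cdf(b) - std.cdf(a))

def simpson(f, low, high, n=1000):
    # Composite Simpson rule with n (even) equal sub-intervals.
    h = (high - low) / n
    total = f(low) + f(high)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(low + i * h)
    return total * h / 3

def truncated_normal_mean_numeric(mu, sigma, low, high):
    # Mean = integral of x * pdf(x) over [low, high], divided by the
    # probability mass inside the truncation limits.
    dist = NormalDist(mu, sigma)
    mass = simpson(dist.pdf, low, high)
    return simpson(lambda x: x * dist.pdf(x), low, high) / mass

print(truncated_normal_mean_formula(100.0, 10.0, 90.0, 120.0))
print(truncated_normal_mean_numeric(100.0, 10.0, 90.0, 120.0))
```

The two printed values should agree to many decimal places; a large gap between a product's estimate and the formula would point to a weakness in its numerical method.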

Different approach to error generation

The parameters of many distributions are restricted to lie within certain ranges. For example, a Normal distribution is defined by its mean and standard deviation, and the standard deviation cannot be negative.

In versions we have tested, @RISK and ModelRisk take a different approach:

=RiskNormal(100,-10) returns #VALUE!

=VoseNormal(100,-10) returns Error: sigma must be >=0

If you deliberately use the #VALUE! as part of your model logic, you may get different results with ModelRisk. For example, Excel’s ISERROR or ISERR functions will return FALSE for the ModelRisk error message, but TRUE for @RISK’s #VALUE!

Problems and suggestions

If you come across any problems in converting your models that are not described above, or have a suggestion to improve the converter, please send an email to [email protected].




Crystal Ball model converter

Crystal Ball is a Monte Carlo Excel add-in from Oracle Corporation.

ModelRisk includes a tool to allow you to translate models with Crystal Ball entries into the same model using ModelRisk functions. The tool can be accessed through the main ribbon for ModelRisk Standard:

And via the More Tools dropdown list in ModelRisk Professional or Industrial versions:





This opens the following dialog box:

If more than one Excel workbook is open, the converter will operate on all open workbooks, so we strongly recommend that you only open one Crystal Ball model at a time. We also recommend that the Rename file option is selected to ensure that you do not overwrite your model. By default this will create a new workbook stored in the same directory as the active workbook with ‘Converted’ appended to the name, but you can enter a different name in the New name field.

We also recommend that you select the Show conversion report when finished option as this will show you whether a complete conversion has been accomplished.

Note: it is necessary for Crystal Ball to be running to perform the conversion.

Now click the Convert button. ModelRisk will search your model for all Crystal Ball entries and replace them with the equivalent ModelRisk functions where possible. It will also automatically save the new converted model with the specified file name. At the end of the conversion, the following window will open provided that the Show conversion report when finished option has been selected:

This lists all of the cells in which there were Crystal Ball entries and shows the ModelRisk formulae together with a comment on whether ModelRisk was able to find a suitable replacement.

If you have also selected the Output the conversion report to Excel option, the same table will appear in a spreadsheet as text (i.e. without the “=” sign for formulae) so that the model does not include extra unrelated equations:

Incomplete conversion issues

The Crystal Ball converter does not currently convert any Crystal Ball VBA functions. It also does not convert ‘categorical’ decision variables.

Not all Crystal Ball entries have an exact equivalent in ModelRisk. For example, certain alternative parameterisations of distributions are not supported within ModelRisk because they are not always solvable. The Crystal Ball Custom distribution is also not supported because it has many different parameter interpretations.

Difference in modelling correlation

Crystal Ball uses rank order correlation with a method developed by Iman and Conover some 30 years ago (Iman and Conover, 1980; Iman and Conover, 1982). Iman and Conover’s technique gives very similar results to using the multivariate Normal copula in ModelRisk. Crystal Ball’s Define Correlation dialog allows the user to produce correlations between the assumption variable in question and one or more others. However, if Variable A is correlated to Variable B, and Variable B to Variable C, there is an implied range of correlation between A and C. The implied correlation matrix is calculated behind the scenes and not reported. In contrast, ModelRisk simulates from copulas and connects the copula values directly to the appropriate distributions using the optional U parameter. It requires that a complete correlation matrix be defined for connected variables. If there is any correlation in your model, the converter will create a separate sheet called ModelRiskCorrelation in which it will place the ModelRisk copula functions, and it will connect the copulas to the distributions in your model. It will also correct the correlation matrix if the Crystal Ball entries are not consistent. Note that ModelRisk offers many types of correlation structures (i.e. copulas, which are the more modern approach to modelling correlation), and can estimate correlation structures from data, so you may wish to take the opportunity to update your model with a more appropriate correlation structure.
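The "implied range" mentioned above follows from the requirement that a correlation matrix be positive semi-definite. For three variables this gives a simple bound on the A-C correlation once the A-B and B-C correlations are fixed, sketched below in Python. The function names and the example values are hypothetical, and the sketch does not describe how either product checks or corrects a matrix.

```python
import math

def implied_correlation_range(rho_ab, rho_bc):
    # Values of rho_ac for which the 3x3 correlation matrix of A, B and C
    # remains positive semi-definite.
    centre = rho_ab * rho_bc
    half_width = math.sqrt((1.0 - rho_ab ** 2) * (1.0 - rho_bc ** 2))
    return centre - half_width, centre + half_width

def is_consistent(rho_ab, rho_bc, rho_ac, tol=1e-12):
    # True if the three pairwise correlations can coexist.
    low, high = implied_correlation_range(rho_ab, rho_bc)
    return low - tol <= rho_ac <= high + tol

# Hypothetical inputs: A-B correlated at 0.8 and B-C at 0.6.
print(implied_correlation_range(0.8, 0.6))
print(is_consistent(0.8, 0.6, -0.5))
```

In this example the A-C correlation must lie between roughly 0 and 0.96, so an entry of -0.5 would make the matrix inconsistent.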

Converting decision variables

Crystal Ball decision variables are converted to ModelRisk VoseSimTable functions. ‘Categorical’ decision variables are not converted.

Converting forecast variables

Crystal Ball forecast variables are converted to cells marked with ModelRisk VoseOutput functions.

Problems and suggestions

If you come across any problems in converting your models that are not described above, or have a suggestion to improve the converter, please send an email to [email protected].




More on Conversion

ModelRisk includes conversion tools for the two most common competing risk analysis Excel add-ins: @RISK from Palisade Corporation; and Crystal Ball from Oracle. There are some differences in how @RISK and Crystal Ball behave in comparison with ModelRisk. Although the converters will handle most models, we recommend that you try running the original models and converted models and compare the results to be sure the conversion has been performed correctly.

Services

Vose Software offers a fee-paying conversion service for models built in any Excel Monte Carlo add-in you use. We can also convert some Monte Carlo models built in other modeling environments. The service provides a complete conversion of your model, modifying its structure (upon request) where it can be made more efficient or clearer, and testing and documenting the results against the original application. We will explain the reasons for any numerical differences should they occur between the results of the original and converted models.

Our consultants can also provide an auditing service at the same time, checking logic, appropriate use of distributions, correlation, etc. and suggesting improvements to your model that may be possible with the extra capabilities that ModelRisk offers.

Fees are based on a US$90 (in 2011) hourly rate. Estimates can be provided on request on submission of a model. Please note that you may need to send a confidentiality agreement from your company in advance of providing a model and, in any event, you should take steps to remove any commercially confidential information from the model.

For further information, please contact us as follows:

Email: [email protected]

Tel: +32 498 504 544

Post: Iepenstraat 98, Gent 9000, Belgium




ModelRisk RESULTS VIEWER

ModelRisk Results Viewer

A common problem risk analysts face is how to distribute the Monte Carlo results of their models, particularly if the person receiving the results does not have a copy of your modeling software.

The ModelRisk Results Viewer solves this problem. It is a FREE stand-alone application (it does not install as an Excel add-in) that will read the simulation results files created by ModelRisk. To share your model results, take the following steps:

1. Ask the recipient to download a copy of the ModelRisk Results Viewer from www.vosesoftware.com/resultsviewer.php.

2. Once your simulation run is complete, produce the graphs and tables you want in as many tabs as you need in the ModelRisk Simulation Results window.

3. Click the Save button and give a name for the results file. A file will be created with ModelRisk’s .vmrs extension.

4. Share this file with the recipient. The file will show the ModelRisk icon in the selected folder on their PC if the ModelRisk Results Viewer has been loaded:




5. Double-clicking the file will automatically open the ModelRisk Results Viewer. All the graphs that you created, including formatting, are preserved.

6. The recipient can interrogate the plots, add/remove variables, change formatting, add/remove markers, create new plots – essentially everything that you can do in ModelRisk, and then save all these changes in a new file if desired. What the recipient cannot do, of course, is change the actual underlying simulation data.

The ModelRisk Results Viewer contains a built-in help file:





ModelRisk Results Viewer layout

The ModelRisk Results Viewer will open simulation results files produced by a ModelRisk user.

On the left is a list of the Outputs and Inputs of the simulated model which have been defined by the VoseOutput and VoseInput functions. On the right is the selected graph.

Graphical reports

The ModelRisk Results Viewer opens the file with the graphs and statistical reports from a simulation run. There may be several graphs and reports, one in each tab shown at the bottom of the screen. The graph type can be changed by clicking any of the graphical icons:

These will display, in order:





Histogram plots;

Cumulative ascending plots;

Cumulative descending plots;

Box plots;

Pareto plots;

Time series plots;

Spider plots;

Scatter plots; and

Tornado plots

Click the link for each plot type to view a detailed description of its use and meaning.

General Controls

The set of general controls is available on the Home Ribbon:

The View section of the Home Ribbon has the following control(s):

• Full screen: toggles the full screen mode for the active chart

• Outputs, Inputs, Statistics checkboxes: these are used to show and hide the corresponding panes with the lists of Outputs and Inputs and with the Statistics for the variables on the active chart.

• Windows: expands a drop-down menu with the options to duplicate the current chart, rename a chart tab, and activate a chart tab.

The Variables section of the Home Ribbon has the following controls:

• Filter





Simulation results can be filtered so that one can look specifically at sets of generated

scenarios, as follows:

(1) Select the input or output of interest

(2) Click the filter icon . This opens a dialog box:

(3) Select how you wish to filter the simulation data. In this example, the results are filtered to show generated scenarios in which the selected output’s value is less than or equal to zero. Click OK.

(4) The results shown are now filtered as required. The figure below shows the modified

histogram for the output, and also a small filter icon against the Output listing to show that

a filter is active:

Hovering over the filtered output with the mouse shows the filter that has been applied as a

tool tip pop-up:

To edit or delete the existing filter, click the Filter button while the filtered output is selected.

The following window will appear:

• Sort: Allows sorting the list of Inputs and Outputs.

Sort icon:

The Chart section of the Home Ribbon has the following control(s):





• Copy

• Print

• Zoom

Their functionality is explained below:

Editing, copying, zooming and printing graphs

Each ModelRisk result graph can be edited by right-mouse clicking over graph components

like titles. The user can zoom in on a section of the graph by clicking and then

selecting a region to display.

Graphs can also be copied as a Bitmap or Metafile by using the following menu:

Graphs can be printed by clicking

The Report section of the Home Ribbon has the following control(s):

• Report

Clicking on the Report icon:

opens the following dialog box:





Selecting ‘Report selected Charts’ will create a report in Excel that is a replica of the pages the user has created in the ModelRisk Simulation Results window. Ticking the ‘Charts’ box will place the charts you have created in Excel. Ticking the ‘Values’ box will place into the spreadsheet all the data used to create the reports, which can be used for further analysis if required.

Selecting ‘Report all variables’ will generate the ticked reports for all inputs and outputs.

One should be careful using this second option if there are a lot of inputs and outputs

because it will generate a very large file.

The Advanced tools section of the Home Ribbon has the following control(s):

• Go To Sample: If the simulation is performed with the Go To Sample feature

turned on, the Go To Sample functionality becomes active.

Go To Sample icon:

Statistical and data reports

ModelRisk offers three kinds of statistics and data reports:

Table of all generated input and output values

Clicking on the Data List icon:





opens a list of all generated values, sorted by the order in which they were generated:

Clicking on a column selects the data. Right-click then allows one to copy these data and then paste into another document (Word, Excel, etc) for further analysis. CTRL-Click allows you to select several non-contiguous columns of data. SHIFT-Click allows you to select a set of contiguous columns.

Clicking the header sorts the data according to the selected column. The arrows pointing down and up indicate Descending and Ascending sorting respectively:

If the simulation is performed with the Go To Sample feature turned on, the Go To Sample

functionality becomes available in the pop-up menu if you right-click on a specific value:





The Go To Sample feature loads the selected sample into the spreadsheet model and reproduces the exact simulation sample in full, i.e. all model cells will show exactly the same values as they did during the simulation at the selected sample. This is useful when, for example, one wants to see exactly how the largest (or smallest) value of the output was produced and what the values of the intermediary calculation cells were.

Table of statistics

Clicking on the Statistics icon:

opens a list of statistics for the selected inputs and outputs:

Clicking the Options button allows you to increase the number of percentiles reported.

Pages (tabs)





Right-clicking any tab name allows you to rename the tab, or to make a copy. Making a

copy is useful if, for example, you wish to show two slightly different versions of the same

plot e.g. the same tornado plot but with one variable removed, or based on a different

output statistic.

If there are two or more pages present, right-clicking a page’s tab will also allow you to delete the page.

Saving the report

Once you are satisfied with your report you can save it as a file independently of your model by clicking the save button and selecting a destination folder and file name:





The simulation results are stored with a .vmrs (Vose ModelRisk Simulation) extension.

The simulation results file can then be reloaded later, without opening the simulated model, by clicking the open button and browsing for its location.





Box Plots

A box (or “box and whiskers”) plot provides another visual representation of the simulation results from a

model variable.

Box plots of simulation data can be produced in ModelRisk by selecting the variable(s) in the Simulation Results window and clicking:

The ModelRisk box plot shows five percentiles.

Graphing controls





allows you to copy the graph as a bitmap image or metafile

to print the graph

to zoom in on part of the graph

to change axis options

to plot together the same variable for multiple simulation runs and turn the legend on/off

opens a comprehensive dialog to edit the graph





Cumulative Plots

The cumulative frequency plot has two forms: ascending and descending:

Cumulative plots of simulation data can be produced in ModelRisk by selecting the variable(s) in the Simulation Results window and clicking:

or

The ascending cumulative frequency plot is the most commonly used of the two and shows the probability of being less than or equal to the horizontal-axis value. The descending cumulative frequency plot, on the other hand, shows the probability of being greater than or equal to the horizontal-axis value.

The cumulative frequency plot is very useful for reading off the probability of exceeding any value; for example, the probability of going over budget, failing to meet a deadline or of achieving a positive NPV (net present value).
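The two cumulative forms can be sketched in a few lines of Python. This is an illustration only, using made-up NPV values and hypothetical function names; it simply counts the fraction of simulated values on each side of a threshold, which is what is read off the ascending and descending curves.

```python
def prob_at_most(samples, x):
    """Ascending cumulative value: fraction of samples <= x."""
    return sum(1 for v in samples if v <= x) / len(samples)

def prob_at_least(samples, x):
    """Descending cumulative value: fraction of samples >= x."""
    return sum(1 for v in samples if v >= x) / len(samples)

# Hypothetical simulated NPV values (illustrative data, not from the manual).
npv = [-120.0, -35.0, 0.0, 15.0, 40.0, 80.0, 95.0, 130.0, 210.0, 300.0]

print(prob_at_most(npv, 0.0))   # 0.3
print(prob_at_least(npv, 0.0))  # 0.8
```

The two results add up to more than 1 here because one sample is exactly zero and is counted by both definitions; for a continuous output the two curves are complements of each other.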





Graphing controls


to print the graph







to switch sliders on/off and define the position of sliders







Histogram Plots

The histogram, or relative frequency, plot is the most commonly used in risk analysis. A histogram plot of simulation data can be produced in ModelRisk by selecting the variable in the Simulation Results window and clicking:

The plot is produced by grouping the data generated for a model’s output into a number of bars or classes. The number of values in any class is its frequency. The frequency divided by the total number of values gives an approximate probability that the output variable will lie in that class’ range. We can easily recognise common distributions like a triangular, normal, uniform, etc, and we can see whether a variable is skewed. The figure below shows a typical plot:

Graphing controls






to print the graph



to change between line and bar plots








The most common mistake in interpreting a histogram is to read off the y-scale value as the probability of the x-value occurring. In fact, the probability of any x-value, given the output is continuous (and most are), is infinitely small. If the model’s output is discrete, the histogram will show the probability of each allowable x-value, providing the class width is less than or equal to the distance between each allowable x-value.
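The grouping of simulated values into classes described in this topic can be sketched as follows. This is a minimal Python illustration with hypothetical names and equal-width classes; it is not the binning rule ModelRisk uses, which is not described here.

```python
import random

def histogram_probabilities(samples, n_classes):
    """Group samples into equal-width classes and return a list of
    (class_low, class_high, relative_frequency) tuples."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        raise ValueError("all samples are identical; no class width can be defined")
    width = (hi - lo) / n_classes
    counts = [0] * n_classes
    for v in samples:
        # The maximum value falls on the upper edge, so clamp it into the last class.
        idx = min(int((v - lo) / width), n_classes - 1)
        counts[idx] += 1
    total = len(samples)
    return [(lo + i * width, lo + (i + 1) * width, c / total)
            for i, c in enumerate(counts)]

rng = random.Random(7)
output = [rng.gauss(100.0, 15.0) for _ in range(5000)]
for low, high, prob in histogram_probabilities(output, 10):
    print(f"{low:8.2f} to {high:8.2f}: {prob:.3f}")
```

Each printed value is the approximate probability of the output falling within that class range, not the probability of any single x-value, which is the distinction made in the paragraph above.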





Pareto Plots

The Pareto plot combines a histogram plot and an ascending cumulative plot. A Pareto plot of simulation data can be produced in ModelRisk by selecting the variable in the Simulation Results window and clicking:

The figure below shows a typical plot:

The left-hand vertical axis relates to the histogram plot, the right-hand vertical axis refers to the cumulative plot.

More than one output can be shown together in the same plot, in which case the histogram and cumulative components of the same output are color coordinated.




It will generally be easier to read such a graph if the slider bars are switched off by clicking the "Show sliders" button:





Graphing controls


to print the graph



to change between line and bar plots












Note: The most common mistake in interpreting a histogram is to read off the y-scale value as the probability of the x-value occurring. In fact, the probability of any x-value, given the output is continuous (and most are), is infinitely small. If the model’s output is discrete, the histogram will show the probability of each allowable x-value, providing the class width is less than or equal to the distance between each allowable x-value.





Scatter plots

Plotting the values of an input and an output variable that were generated in the same model sample will give perhaps the best understanding of the effect of the input on the output value.

Scatter plots can be produced in ModelRisk by selecting any two variables in the Simulation Results window and clicking:

ModelRisk offers a number of controls for scatter plots:

Changes whether actual values or cumulative percentiles are plotted for the variables. Usually correlation is better appreciated in a percentile plot (the default)

Controls the number of points to be plotted





Switches the horizontal and vertical axes

Switches sliders on and off. Sliders split up the graph area to allow analysis of regions of the plot, as shown below





Spider plots

Spider plots describe how sensitive the value of an output variable is to the input variables of the model. ModelRisk offers a method of producing spider plots that is both much faster and more technically correct than that of competing software products.

Spider plots can be produced in ModelRisk by selecting an output variable in the Simulation Results window and clicking:

Producing a spider plot requires making the following choices:

1. Select the output of interest.

2. Select the statistic of interest by clicking on one of the options in the menu for the conditional mean, conditional standard deviation, conditional coefficient of variation or conditional percentile respectively.

3. If Percentile has been selected, use the "Spider options" panel to define the required percentile. There you can also select the number of tranches to be used. The number of tranches defines the number of points that will be plotted for each input variable. In the graph above this is 10, for example.

Interpretation

In the plot above, an analysis has been performed of the sensitivity of the mean of the ‘Total Revenue’ output. 10 tranches have been used. This means that an analysis has been performed by splitting up simulation data from input distributions into ten groups in terms of their cumulative probability: 0%-10%, 10%-20%, 20%-30%, …, 90%-100%.

The simulation data are filtered for each of these groups to find the corresponding output values that occurred when the input variable being analyzed lies within each percentile band listed above. The statistic of interest (the mean in the example above) is then calculated for the filtered data. Repeating this analysis across each tranche for each selected input variable produces the spider plot.

In the plot above, the horizontal dashed line shows the mean of the unfiltered output values as a reference (in this case about 1880). The vertical range that an input line covers reflects the degree of sensitivity that the output statistic has to this input. So, for example, when Task 5 lies in its 0%-10% range, the Total Revenue mean is approximately 1180, and when Task 5 lies in its 90%-100% range, the Total Revenue mean is approximately 2840 – a range of 1660. Reviewing the graph, one can easily see that the output mean is least sensitive to Task 1.

We could also have selected the conditional percentile by clicking and selecting the 90th percentile from the "Spider options" panel:

The spider plot would then have shown the 90th percentile values for the output after conditioning on each input lying within each tranche. So, for example, with 10 tranches each line would describe how the 90th percentile of the output would look if the input corresponding to the line in the graph were to lie in the 0-10%, 10-20%, …, 90-100% sections of its distribution. That would tell us how sensitive the output right tail is to the various inputs: the flatter the line, the less sensitive it is.
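The tranche calculation described under Interpretation can be sketched in Python as follows. This is an illustration of the idea only, with hypothetical function names and toy data; it computes the conditional mean of the output within each tranche of one input, which would give the points of one line in a spider plot.

```python
def conditional_means(inputs, outputs, n_tranches):
    """Mean of the output within each equal-sized tranche of the ranked input.

    inputs[i] and outputs[i] must come from the same simulation sample.
    """
    pairs = sorted(zip(inputs, outputs), key=lambda pair: pair[0])
    n = len(pairs)
    means = []
    for t in range(n_tranches):
        start = t * n // n_tranches
        end = (t + 1) * n // n_tranches
        group = [y for _, y in pairs[start:end]]
        means.append(sum(group) / len(group))
    return means

# Toy data: the output is exactly twice the input, so the line rises steadily.
x = list(range(100))
y = [2.0 * v for v in x]
line = conditional_means(x, y, 10)
print(line[0], line[-1])          # 9.0 189.0
print(max(line) - min(line))      # 180.0, the vertical range of the line
```

A flat list of conditional means would indicate that the output statistic is insensitive to that input. Replacing the mean with a percentile of each group would give the conditional percentile version.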

Why use spider plots?





Time series plots

A spreadsheet model will often have one or more arrays of cells that are modeling the random or

uncertain nature of a variable over time, for example:

• Exchange rates

• Sales market

• Volume of imports

• Price of commodities

• Growth of a bacterial population

• Oil price

ModelRisk provides the ability to produce time series plots of input or output arrays within your model. This involves naming a group of cells as a collective input or output – click here for more details on how.

Time series plots can be produced in ModelRisk by selecting a time series variable in the Simulation Results window and clicking:

In the graph above, the variable ‘Sales Volume’ is plotted from 2010 to 2015. The red central line shows the mean for each period. The light blue region shows the 25%-75% range, whilst the dark blue region shows the 1%-99% range. The plotted percentiles can be changed in the "Percentiles" panel.
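The mean line and percentile bands of such a plot can be computed per period from the simulated paths. The sketch below is a Python illustration with hypothetical names; the linear-interpolation percentile rule is an assumption and may differ from the rule ModelRisk applies.

```python
def percentile(sorted_values, p):
    """Percentile by linear interpolation between order statistics (p in 0..100)."""
    pos = (len(sorted_values) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac

def time_series_bands(paths, percentiles=(1, 25, 75, 99)):
    """paths[s][t] is the value of the variable in sample s at period t.
    Returns one dict per period holding the mean and the requested percentiles."""
    bands = []
    for t in range(len(paths[0])):
        column = sorted(path[t] for path in paths)
        row = {"mean": sum(column) / len(column)}
        for p in percentiles:
            row[p] = percentile(column, p)
        bands.append(row)
    return bands

# Toy data: 101 samples of a two-period series.
paths = [[i, 2 * i] for i in range(101)]
bands = time_series_bands(paths)
print(bands[0]["mean"], bands[0][25], bands[1][75])  # 50.0 25.0 150.0
```

Plotting `row[25]` to `row[75]` and `row[1]` to `row[99]` for each period gives the inner and outer regions described above.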




Time series plots allow the user to quickly appreciate the behavior of the trend and spread in the variable over time, as well as more subtle notions of cyclical or periodic behavior.

Note: Competing Monte Carlo software usually offers the option of plotting bounds around the mean in terms of standard deviations. This option is not available in ModelRisk because a spread of, say, 1 standard deviation around the mean will encompass a varying percentage of the distribution depending on its form. That means that there is no consistent probability interpretation attached to mean +/- x standard deviations, and such graphs are often misinterpreted. If you feel that your variables are roughly normally distributed, the following percentile ranges will give standard deviation spreads:

Standard deviations    Low percentile    High percentile
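For a Normal distribution, the percentiles that correspond to a given number of standard deviations either side of the mean follow directly from the standard Normal cumulative distribution function. The short Python sketch below computes them for 1, 2 and 3 standard deviations; the choice of these three rows is an assumption made for illustration.

```python
import math

def normal_cdf(z):
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for k in (1, 2, 3):
    low = 100.0 * normal_cdf(-k)
    high = 100.0 * normal_cdf(k)
    print(f"+/-{k} standard deviations: {low:.2f}% to {high:.2f}%")

# +/-1 standard deviations: 15.87% to 84.13%
# +/-2 standard deviations: 2.28% to 97.72%
# +/-3 standard deviations: 0.13% to 99.87%
```

These percentile pairs only match standard deviation spreads when the variable really is close to Normal, which is the point made in the note above.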





Tornado plots

Tornado plots describe how sensitive the value of an output variable is to the input variables of the model.

Tornado plots can be produced in ModelRisk by selecting an output variable in the Simulation Results window and clicking:

Producing a tornado plot requires making the following selections:

1. Select the output of interest from the list on the left of the Results window

2. Select the statistic of interest by clicking on one of the following options in the mode menu:

Rank correlation – is the most common option. It calculates the rank correlation between the set of values generated for the output and each input in turn. It is a crude form of sensitivity analysis, popular because of its simplicity, but mostly useful for identifying key variables that should be analyzed in more detail. The scale runs from -1 (completely negatively linearly correlated) through 0 (no linear correlation), to 1 (completely positively linearly correlated). This means that it is difficult to evaluate the effect of correlation on the output in terms that are most familiar to the decision maker (e.g. dollars).

Proportional contribution to spread – is another common option. It is derived from the rank order correlation and attempts to assess what fraction of the total uncertainty is due to each input variable. It works well when the input variables are uncorrelated and reasonably linearly related to the output (e.g. costs, sales volumes and prices for a financial analysis, or task durations for a project schedule analysis) but can break down when these assumptions are strongly violated.

Contribution to variance – is similar to the Proportional contribution to spread option, except that it rescales the analysis to give an approximation of the amount of the output’s spread that is contributed by each input. The analysis is therefore subject to the same assumptions as the first two options. It is commonly termed ‘Contribution to variance’ because it is based on the fact that the variance of the sum of variables can be determined from the variance of each independent variable and their correlation structure. However, once each variance contribution has been estimated it is converted to standard deviation (i.e. one takes the square root) to provide values that are meaningful to the decision maker.

Conditional mean – determines the mean of the output for the lowest and highest tranches of the input variable, and uses these to define the ends of the tornado bar. This is one of the most useful plots to produce because it allows the user to see how the output mean is sensitive to each input variable, and it uses values that are meaningful to the decision-maker.

Conditional standard deviation – determines the standard deviation of the output for the lowest and highest tranches of the input variable, and uses these to define the ends of the tornado bar. This plot is only useful in special circumstances where one is studying an output variable’s spread only.

Conditional coefficient of variation – determines the coefficient of variation of the output for the lowest and highest tranches of the input variable. This plot is only useful in special circumstances where one is studying an output variable’s coefficient of variation (standard deviation divided by mean) only.

Conditional cumulative percentile – determines the specified percentile of the output for the lowest and highest tranches of the input variable, and uses these to define the ends of the tornado bar. This is also one of the most useful plots to produce because it allows the user to see how the output tails are sensitive to each input variable, and it uses values that are meaningful to the decision-maker.
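As an illustration of the first option, the rank correlation between an input and the output can be computed by ranking both sets of generated values and correlating the ranks. The Python sketch below uses hypothetical function names and toy data, and is not ModelRisk's implementation.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Ordinary (product-moment) correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def rank_correlation(x, y):
    """Rank order correlation: the ordinary correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

input_values = [1.0, 2.0, 3.0, 4.0, 5.0]
output_values = [1.0, 8.0, 27.0, 64.0, 125.0]
print(rank_correlation(input_values, output_values))
```

Because only the ordering of the values matters, the coefficient says nothing about the size of the effect in the output's own units, which is the limitation noted for this option.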

3. Other options

If the Conditional Cumulative Percentile statistic has been selected, you can also specify the percentile to be analyzed (a value >0 and <100) in the "Tornado options" panel:

If the statistical analysis requires tranches, select the number of tranches to be used in the "Tornado options" panel. The number of tranches defines the number of points that will be tested for each input variable, similar to the spider plot. The graph then plots from the lowest to the highest values to create each tornado bar.

Tranches are used to organise the simulation data into equal groups for a specific input variable. For example, if 20 tranches are specified, ModelRisk divides the simulation data into 20 equal groups that correspond to the 0-5%, 5-10%, …, 95-100% ranked data for an individual input. It then determines the output statistic (like the mean) for each of these sub-sets of the simulation data, and plots the minimum and maximum of the output statistic across all these sub-sets in a tornado chart. This shows how much the output statistic can vary depending on what the value of the input variable might be. As a general rule, the more samples of the model you run in a simulation the greater the number of tranches you can use and the more precise the tornado chart will become.
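The tranche procedure in the paragraph above can be sketched in Python as follows, using the conditional mean as the output statistic. The function and variable names are hypothetical and the data are made up; the sketch only illustrates how the ends of each bar come from the minimum and maximum of the statistic across tranches.

```python
def conditional_mean_range(inputs, outputs, n_tranches):
    """Minimum and maximum of the output mean across tranches of one input."""
    order = sorted(range(len(inputs)), key=lambda i: inputs[i])
    n = len(order)
    means = []
    for t in range(n_tranches):
        idx = order[t * n // n_tranches:(t + 1) * n // n_tranches]
        means.append(sum(outputs[i] for i in idx) / len(idx))
    return min(means), max(means)

def tornado_bars(inputs_by_name, outputs, n_tranches):
    """One (name, low, high) bar per input, widest bar first."""
    bars = []
    for name, values in inputs_by_name.items():
        low, high = conditional_mean_range(values, outputs, n_tranches)
        bars.append((name, low, high))
    bars.sort(key=lambda bar: bar[2] - bar[1], reverse=True)
    return bars

# Toy data: the output depends only on "driver"; "noise" is a shuffled
# sequence that is unrelated to the output.
driver = list(range(100))
noise = [(37 * i) % 100 for i in range(100)]
output = [2.0 * v for v in driver]
for name, low, high in tornado_bars({"noise": noise, "driver": driver}, output, 20):
    print(name, low, high)
```

With 100 samples and 20 tranches each group holds only 5 values, so the bar for the unrelated input is not zero-width; running more samples per tranche reduces this sampling noise, in line with the general rule stated above.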

The Number of Bars option controls the number of input variables that are plotted. ‘Auto’ will include all variables that have a statistically significant relationship to the output. Replacing this with, for example, a value of 5 will show only the 5 most significant input variables, or fewer if there are fewer than five that are significant. Additional variables can be added to the chart, or removed, by manually checking the boxes in the Inputs list:

Note that if you run multiple simulations ModelRisk will by default produce charts relating inputs and output from the same simulation run. It is possible, however, to manually select inputs from other simulation runs if such an analysis is needed, though this would be highly unusual.




ModelTree

ModelTree

ModelTree by Vose Software is a professional quality decision tree add-in to Microsoft Excel, and an ideal complement to ModelRisk when faced with complex sets of decisions. Details are available here.

What is a decision tree?

A decision tree is a decision support tool. It illustrates decisions and their possible consequences, including chance event outcomes, using a tree-like graph. Decision trees are commonly used to help identify the strategy that is most likely to achieve a desired goal, or offers the greatest benefit.

Why is a decision tree useful?

1. It is a visual representation of a decision problem, helping communicate the issue and think about different decision options and outcomes.

2. It shows all the factors that are considered relevant to the decision problem.

3. By comparing the different sub-trees spreading to the right of decision or chance nodes, it shows how the nature of the problem’s structure evolves with each decision or random event.

What does a decision tree look like?

A decision tree is read from left to right in either temporal or logical order as the problem dictates. It consists of 3 types of nodes connected by branches (the black lines):

Decision nodes. The branches coming out from the right side of a decision node represent the different available choices. The choices are mutually exclusive (you can’t pick more than one at the same time). Decision trees always start with a decision node on the far left.

Chance nodes. The branches coming out from the right side of a chance node represent the different possible outcomes that might occur due to randomness or uncertainty. The outcomes are mutually exclusive and exhaustive (one of them must occur).

End nodes. These indicate the end of each possible pathway.

What data are needed to create a decision tree?

The user builds a decision tree structure of decision and chance nodes and their branches and then enters the following information in the appropriate Excel cells:

For a decision node

• The number of different possible decision choices. In the example above, there are three;

• A description of each decision. In the example above, these are Premium, Standard, and No advertising;

• The revenue (positive) or cost (negative) associated with each of the different decisions. In the example above, they are -$100, -$60 and $0.

Branches can be added to a decision node by clicking the ‘Add’ icon in the ModelTree ribbon. Selecting the end node of a branch and clicking the ‘Remove’ icon will remove the relevant decision branch:

For a probability node




• The number of different possible random outcomes. In the example above, there are three;

• A description of each outcome. In the example above, these are High sales, Medium sales, and Low sales;

• The probability of each possible pathway (ModelTree checks that these probabilities sum to 1). In the example above, they are 20%, 50% and 30%;

• The revenue (positive) or cost (negative) associated with each of the different outcomes. In the example above, they are $7, $3 and $1.

Branches can be added to a chance node by clicking the ‘Add’ icon in the ModelTree ribbon. Selecting the end node of a branch and clicking the ‘Remove’ icon will remove the relevant chance branch:

How does ModelTree select the best decision option?

ModelTree works backwards from the right side of the tree. It calculates the expected value at each chance node. The expected value (or mean) is the sum of all possible outcomes multiplied by their probabilities. So, for example, for this chance node:




the expected value is:

At each decision node, the expected value is calculated to incorporate the cost or revenue associated with implementing that decision. For example:

Here the expected value of the chance node is:

The cost of the Standard advertising is $60 and the expected value of this option is therefore:

These figures are shown below the decision option label.

At each decision node, ModelTree selects the decision option that maximizes the expected monetary value. The decision node is then evaluated as having the expected monetary value of the selected option, i.e. it assumes that this option is the one chosen.

In this way ModelTree progressively works backwards through the tree to finally evaluate the optimal option to take for the initial decision that started the tree.
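The backward evaluation described above can be sketched as a short recursive function. The following Python example is an illustration under stated assumptions: the dictionary layout, the function name and the small two-option tree are all hypothetical, and the numbers are not those of the example tree in the figures.

```python
def rollback(node):
    """Return the expected monetary value (EMV) of a node, working right to left."""
    if node["kind"] == "end":
        return 0.0
    # Value of each branch: its own revenue or cost plus the EMV of what follows.
    values = [b["payoff"] + rollback(b["child"]) for b in node["branches"]]
    if node["kind"] == "chance":
        probs = [b["prob"] for b in node["branches"]]
        if abs(sum(probs) - 1.0) > 1e-9:
            raise ValueError("chance node probabilities must sum to 1")
        return sum(p * v for p, v in zip(probs, values))
    # Decision node: flag the branch with the highest EMV and adopt its value.
    best = max(values)
    for branch, value in zip(node["branches"], values):
        branch["optimal"] = (value == best)
    return best

END = {"kind": "end"}
sales = {"kind": "chance", "branches": [
    {"label": "High sales", "prob": 0.2, "payoff": 7.0, "child": END},
    {"label": "Medium sales", "prob": 0.5, "payoff": 3.0, "child": END},
    {"label": "Low sales", "prob": 0.3, "payoff": 1.0, "child": END},
]}
tree = {"kind": "decision", "branches": [
    {"label": "Advertise", "payoff": -1.0, "child": sales},
    {"label": "Do nothing", "payoff": 0.0, "child": END},
]}

emv = rollback(tree)
print(round(emv, 2))
for branch in tree["branches"]:
    print(branch["label"], branch["optimal"])
```

Here the chance node is worth 0.2 × 7 + 0.5 × 3 + 0.3 × 1 = 3.2, so the hypothetical "Advertise" option is worth 3.2 - 1 = 2.2 and is flagged as optimal over "Do nothing" at 0.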

Finally, ModelTree adds labels to decision nodes to indicate whether the option is optimal (TRUE) or not (FALSE), presents the expected monetary value calculated backwards to each chance and decision node, and gives the expected monetary value and probability of occurrence at each end node:



This example tree shows that the optimal decision is No advertising (it has the TRUE label) with an expected monetary value of $48 compared to Premium advertising (-$16) and Standard advertising ($6).




Example models

Example models explaining risk analysis techniques

Sum of a random number of random variables

In most situations, we know precisely the number of random variables we have to add together. However, a problem frequently arises where the number of random variables being summed up is itself a random variable. Some examples are:

• The total purchases by the number of customers N that might enter a shop next week where we know the probability distribution of the purchase amount from a random customer.

• The amount of lake water that might be drunk by campsite visitors N this summer where we know the probability distribution of the amount of lake water drunk by a random camper, and the resultant number of giardia cysts that might be consumed, where we know the concentration of giardia cysts in the lake water.

• The cost of insurance claims to an insurer where it knows the expected number of claims it will receive in a period, and knows the probability distribution of the size of a random claim.

ModelRisk has many functions especially for handling the distribution of the sum of random variables. See Aggregate modeling in ModelRisk.

An in-depth explanation about summing random variables ('aggregate modeling'), including many more example models and advanced techniques can be found in the Aggregate distributions section.

Example 1

A company insures aeroplanes. They crash at a rate of 0.23 crashes per month. Each crash costs $Lognormal(120,52) million.

Question : What is the distribution of the value of the liability if we discount it at the risk free rate of 5%?

This requires that we know the time at which each accident occurred, using Exponential distributions. The solution is shown in the example model plane_crashes2.
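The approach can be sketched in Python as a rough illustration; it is not the plane_crashes2 model. Several details are assumptions because they are not stated here: a 12-month horizon, 5% treated as an annual rate with annual compounding, and Lognormal(120,52) read as a lognormal with mean 120 and standard deviation 52.

```python
import math
import random

def simulate_liability(rng, rate_per_month=0.23, horizon_months=12.0,
                       annual_rate=0.05, mean_cost=120.0, sd_cost=52.0):
    """One simulated present value ($ million) of crash costs over the horizon."""
    # Convert the mean and standard deviation of the cost to the parameters
    # of the underlying Normal distribution of log(cost).
    sigma2 = math.log(1.0 + (sd_cost / mean_cost) ** 2)
    mu = math.log(mean_cost) - sigma2 / 2.0
    sigma = math.sqrt(sigma2)

    time_months = 0.0
    present_value = 0.0
    while True:
        # Exponential waiting time until the next crash.
        time_months += rng.expovariate(rate_per_month)
        if time_months > horizon_months:
            return present_value
        cost = rng.lognormvariate(mu, sigma)
        present_value += cost / (1.0 + annual_rate) ** (time_months / 12.0)

rng = random.Random(2024)
results = [simulate_liability(rng) for _ in range(20000)]
print(sum(results) / len(results))   # mean discounted liability
print(sorted(results)[int(0.95 * len(results))])  # approximate 95th percentile
```

Sampling the waiting times rather than only the number of crashes is what makes discounting possible, since each cost needs its own time of occurrence.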

Example 2

For extremely large numbers of random variables, we can use the CLT identity. For example, suppose we think that there will be Poisson(270000) potential customers passing by the front of a store in a year, and that there is a 3% probability that any one of them will enter the store. Assuming each passer-by makes their decision to enter independently of any other passer-by, the number of people entering the store in a year will be Poisson(270000*3%). If there is a 10% probability that a customer in the store makes a purchase, and again we assume that they make the decision to buy independently of others, the number of purchasers will be Poisson(270000*3%*10%) = Poisson(810). Let's also suppose that we have empirical data on past purchase sizes that can be summarized in the following histogram plot:





A plot of the Poisson(810) distribution shows that the number of purchasers will in all probability be above about 720.

Since the distribution of purchase size by customer is not too skewed, and the number we are adding together is large, we can use the Central Limit Theorem. The mean and standard deviation of the histogram plot are $12.71 and $7.27 respectively, so a model of the total sales receipts for the year can be built as shown in the example model Sales_at_the_store.

The CLT limit distribution of a sum of random variables is implemented in ModelRisk with the VoseCLTSum function.
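As a rough illustration of the approximation (not of how VoseCLTSum is implemented), the sum of n independent purchases with mean μ and standard deviation σ is treated as Normal(nμ, σ√n), with n itself drawn from Poisson(810). The helper names below are hypothetical, and the Poisson sampler simply counts exponential arrivals in a unit interval because the Python standard library has no Poisson generator.

```python
import math
import random


def poisson(rng, lam):
    """Count arrivals in one unit of time for a process with rate lam."""
    count = 0
    t = rng.expovariate(lam)
    while t <= 1.0:
        count += 1
        t += rng.expovariate(lam)
    return count


def total_sales(rng, lam=810.0, mu=12.71, sigma=7.27):
    """One scenario of yearly receipts using the CLT normal approximation."""
    n = poisson(rng, lam)                           # number of purchasers
    return rng.gauss(n * mu, math.sqrt(n) * sigma)  # approximate sum of n purchases


rng = random.Random(2)
receipts = [total_sales(rng) for _ in range(2000)]
mean_receipts = sum(receipts) / len(receipts)
print(round(mean_receipts))
```

The approximation replaces several hundred individual purchase draws per scenario with a single Normal draw, which is reasonable here only because n is large and the purchase-size distribution is not strongly skewed.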



Example models


Financial risk analysis

Basel II - Credit risk

Background

To protect depositors and the financial system overall, the 1988 Capital Accord ('Basel I') placed restrictions on the exposure a bank could have in relation to its capital (see Capital_required.xls for a simple illustration of how to calculate capital requirements for a non-financial firm). In other words, it restricted how much a bank could lend in total, with the goal of decreasing the probability that, in an extreme downturn of the economy, depositors would lose their money and (since banks often lend to other banks) the banking system would collapse (i.e. systemic risk).

Basel II comprises three mutually reinforcing pillars:

• Pillar 1: The Minimum Capital Requirements (the part we will focus on);

• Pillar 2: The Supervisory review - about the dialogue between banks and their supervisors;

• Pillar 3: About the disclosure requirements.

Pillar 1 says that the Capital Ratio, defined as below, should be no less than 8%:

Capital Ratio = Capital a bank has available / Risk-weighted assets ≥ 8%

Because the 1988 Capital Accord took a relatively unsophisticated view of the risk-weighted assets, the Basel Committee developed a more sophisticated, risk-sensitive framework called Basel II. In Basel II the risk-weighted assets will explicitly include three types of risk:

1. Credit Risk (new treatment under Basel II)

2. Market Risk (in 1996, an amendment was made to the treatment of market risk)

3. Operational Risk (newly introduced in Basel II)

In this section, we will focus on Credit Risk. Basel II gives banks the freedom to choose from three distinct options for the calculation of credit risk and three others for operational risk. For credit risk, they are:

1. The Standardised Approach;

2. The Foundation Internal Ratings Based (IRB) approach;

3. The Advanced IRB approach.

The Standardised Approach makes use of external credit assessments to determine the weightings and to calculate the total risk-weighted assets. In this section, we will focus on the Internal Ratings Based approaches (the second and third methods) to credit risk, since they include an internal risk assessment of the company. The primary inputs to the risk-weighted asset calculations are:

1. Probability of Default (PD) - measures the likelihood that the borrower will default over a given time horizon;

2. Loss Given Default (LGD) - measures the proportion of the exposure that will be lost if a default occurs;

3. Exposure At Default (EAD) - measures the amount of the facility that is likely to be drawn if a default occurs.





The EAD depends on the insurance and hedging activities of the bank (they will be left out of this example; see Integrated_Risk_Management.xls for an example of this). Banks will have to categorize their risk assets into risk classes, and for each class estimate the probability of default (PD) and the expected loss given default (LGD).

Relevance

In this example, which is based on the BIS working paper of Altman et al. (2002), we look at a very important assumption about credit risk, i.e. the relationship between the PD and the LGD. In other words, if macro-economic factors increase the PD (e.g. during a recession), does the LGD stay the same, go up or go down? It is often thought that if the PD goes up, the LGD will go up too. Most credit models currently used assume no relationship between the two variables. In this example, we will examine the effect of this assumption on the outputs of credit risk models, such as expected (average) losses and the 99% VaR.

Situation

You are working for a bank that has a portfolio of 250 loans (see graph below), ranging from $1,000 to $15,000 and belonging to seven different rating grades with long-term (historic) probability of default (PD) levels ranging from 0.5% to 5%.

The short-term PD is, however, influenced by a macro-economic factor, x1, that is common to all loans (with weight w1 equal to 50%) and an idiosyncratic (random) factor, x2, unique to every loan (with weight w2 equal to 50%), such that:

$$PD_{short} = PD_{long} \times (w_1 x_1 + w_2 x_2)$$

The two weights w1 and w2 always have to add up to 100%, i.e. w2 = 1 - w1. Both factors x1 and x2 are modelled as Exponential(1) distributions [the same as Gamma(1, 1) distributions, see section Gamma distribution], which have a mean of one. An Exponential distribution was assumed since it is highly skewed to the right, representing the situation that default probabilities (PDs) are low most of the time but sometimes, in rare or extreme situations, can increase dramatically.

Three scenarios

Scenario 1. Assume that the LGD is deterministic; 30% for all borrowers

Scenario 2. Assume the LGD is stochastic but uncorrelated with the probability of default PD. Use a Beta(9, 21), which results in a mean LGD of 30% (see section on Beta Distribution)

Scenario 3. Assume there is a perfect rank order correlation (see Rank Order Correlation) between the macro-economic background factor, x1, and the LGD.

Question




What are the losses and their distribution parameters under the three different scenarios?

Results

The solution to this example is provided in the following spreadsheet - Basel_II.xls

The resulting distributions of the losses of the portfolio are shown in the graph and table below. Although there is no real difference between scenarios 1 and 2, the expected losses and the unexpected losses (VaR) under scenario 3 are considerably higher.

Table 1. Main results under the three scenarios (LGD modelled according to each scenario)

                   Scenario 1    Scenario 2    Scenario 3    % error (1)
Expected losses         13603         13617         16566        21.8%
95% VaR                 31376         31706         51573        64.4%
99% VaR                 43756         44998         86858        98.5%
99.5% VaR               49053         50085        101242       106.4%
99.9% VaR               63762         64044        148833       133.4%

(1) Computed as (scenario 3 - scenario 1) / scenario 1, which is the calculation that reproduces the percentages shown.
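The three scenarios can be sketched in Python as follows. This is not the Basel_II.xls model: the loan sizes and grade PDs are made up within the ranges given above, each exposure is assumed to be fully drawn at default, scenario 2 draws a separate LGD for each defaulted loan, and scenario 3 uses one LGD per simulated year whose rank matches the rank of x1. The numbers it prints will therefore differ from Table 1.

```python
import random


def percentile(values, q):
    """Empirical q-quantile (0 < q < 1) of a list of numbers."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


def simulate_losses(n_iter=2000, seed=7):
    rng = random.Random(seed)
    # Hypothetical portfolio: 250 loans of $1,000-$15,000 in seven PD grades.
    grade_pds = [0.005, 0.01, 0.015, 0.02, 0.03, 0.04, 0.05]
    loans = [(rng.uniform(1000, 15000), rng.choice(grade_pds)) for _ in range(250)]
    w1, w2 = 0.5, 0.5

    # Macro factor, one draw per simulated year.
    x1 = [rng.expovariate(1.0) for _ in range(n_iter)]
    # Scenario 3: give the k-th smallest x1 the k-th smallest Beta(9, 21) draw,
    # which produces a perfect rank order correlation between x1 and LGD.
    beta_sorted = sorted(rng.betavariate(9, 21) for _ in range(n_iter))
    lgd_ranked = [0.0] * n_iter
    for rank, k in enumerate(sorted(range(n_iter), key=lambda i: x1[i])):
        lgd_ranked[k] = beta_sorted[rank]

    losses = {1: [], 2: [], 3: []}
    for k in range(n_iter):
        s1 = s2 = s3 = 0.0
        for exposure, pd_long in loans:
            x2 = rng.expovariate(1.0)                     # loan-specific factor
            pd_short = min(1.0, pd_long * (w1 * x1[k] + w2 * x2))
            if rng.random() < pd_short:                   # the loan defaults
                s1 += 0.30 * exposure                     # fixed LGD
                s2 += rng.betavariate(9, 21) * exposure   # independent LGD
                s3 += lgd_ranked[k] * exposure            # LGD tied to x1
        losses[1].append(s1)
        losses[2].append(s2)
        losses[3].append(s3)
    return losses


losses = simulate_losses()
for scenario in (1, 2, 3):
    mean = sum(losses[scenario]) / len(losses[scenario])
    print(scenario, round(mean), round(percentile(losses[scenario], 0.99)))
```

Because the same defaults are reused across the three scenarios, any difference between the outputs comes only from how the LGD is modelled.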

Conclusion

This relatively simple exercise illustrates that the relationship between the PD and the LGD is very important when estimating credit risk. If in reality PD and LGD are both driven by some common (e.g. macro-economic) forces and are therefore correlated, not only the expected but also the unexpected losses (VaR) in most portfolio credit risk models will have been seriously underestimated if the correlation is ignored.





Determining the NPV of a capital investment

This is a typical discounted cash flow problem. To illustrate the importance of including uncertainty in an NPV calculation, we will discuss a simple model. Due to its simplicity, it cannot fully reflect reality, but it provides a set of useful techniques one could apply when modelling real data.

Let's consider the following simple problem: You are evaluating a new company making fuel cells for hospital power plants. Currently there are no competitors. The figure below shows the NPV calculation for the project's 10-year life. This calculation is static, and no uncertainty is included; all input values are considered as most likely.




The NPV, discounted at 10%, is negative: -$37,134.

Let's see how different uncertainties can affect our NPV result. The list of uncertainties is shown below:

(a, b, c) notation means a distribution with min = a, most likely = b, and max = c.

1. Product development costs have been estimated by F Gibbons to be (70000, 80000, 120000) spread over 2004 to 2006 in the ratio 5:2:1. However, P Gumbel estimates the product development costs to be (70000, 100000, 140000) in the same ratio over the same period. Include these uncertainties in the model. Capital expenses and overheads are assumed to be well defined and are not subject to change.

2. Tax rate is fixed at 46% unless the Conservatives get in at the next election in 2007 (20% chance), when the rate would drop to (32%, 35%, 46%). Include this extra uncertainty in the model.

3. Market volume is expected to grow each year by (10%, 20%, 40%), beginning in 2006 at (2500, 3000, 5000), up to a maximum of 20,000 units. The cost per unit in 2006 is estimated at (22.75, 23.25, 24.5) and the sales price per unit is estimated at (45, 58, 65). Both the cost and sales price per unit are subject to inflation from 2006 at a rate starting at (3%, 4%, 6%) and varying yearly in a similar fashion to historic rates.

4. You expect one competitor to emerge as soon as the market volume reaches 3,500 units in the previous year. A second would appear at 8,500 units. Your competitors' shares of the market would grow linearly until you all have equal market share after three years.

The solution to the model is provided in the following spreadsheet: NPV of a capital investment (click on the tab called "Solution")

There are three points in the model that need special attention:

1. Cell C39 uses a VoseDuniform function to model the experts' opinions and return values from "Gibbons" and "Gumbel" with equal probabilities. Since we assumed both experts have equal weights, we assign 50% to each of them. If one of them were more experienced or trusted, we would have assigned different weights to their opinions and used a VoseDiscrete function instead.

2. A common mistake here is to multiply their opinions by the weights and then take the sum: y = Gibbons*50% + Gumbel*50%. This is wrong because the outcome will always take a value somewhere in the middle of the two opinions, and that value is then used in further calculations. The result is a smaller spread in the final outcome and an underestimation of the risk arising from that particular risk factor. The correct way to model this variable is therefore: y = VoseDuniform(Gibbons, Gumbel), as is explained in detail in the section about Incorporating Differences in Expert Opinion. Note that we could also use the VoseCombined function for directly constructing the distribution of the combined expert opinions.

3. A VoseCumulA function was constructed in the table (cells N9:P32), which was then used in the main table to model the inflation for the last 6 years.
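The difference between averaging the two expert opinions and randomly selecting one of them (point 2 above) can be checked with a short Python sketch. It assumes, purely for illustration, that each (min, most likely, max) estimate is a Triangle distribution; the model itself may use a different shape.

```python
import random
import statistics

rng = random.Random(42)
N = 20000


def gibbons():
    # random.triangular takes (low, high, mode)
    return rng.triangular(70000, 120000, 80000)


def gumbel():
    return rng.triangular(70000, 140000, 100000)


# Common mistake: weighted average of the two opinions in every iteration.
averaged = [0.5 * gibbons() + 0.5 * gumbel() for _ in range(N)]

# Discrete choice: each iteration uses one expert, picked with equal probability.
selected = [gibbons() if rng.random() < 0.5 else gumbel() for _ in range(N)]

sd_averaged = statistics.pstdev(averaged)
sd_selected = statistics.pstdev(selected)
print(round(statistics.mean(averaged)), round(sd_averaged))
print(round(statistics.mean(selected)), round(sd_selected))
```

Both approaches give about the same mean, but the averaged version has a visibly smaller standard deviation, which is the underestimation of risk described above.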

If we run a simulation and graph the output cell, we will get the following distribution of NPV:

As we can see from the chart above, there is only a 17% probability that the project will have a negative NPV.

NPV calculations performed in a risk analysis spreadsheet model are usually presented as a distribution of NPVs because the cashflows selected in the NPV calculations are their distributions rather than their expected values. Theoretically, however, this is incorrect. Since an NPV is the net present value, it can have no uncertainty. The NPV is the amount of money that the company values the project at today. The problem is that we have double counted the risk of the project by first discounting at the risk-adjusted discount rate r and then showing the NPV as a distribution (i.e. it is uncertain).

Two theoretically correct methods for calculating an NPV in risk analysis are discussed below, along with a more practical, but strictly speaking incorrect, alternative:

• Theoretical approach 1: Discount the cashflow distributions at the risk free rate r_f.

This produces a distribution of NPVs at r_f and ensures that the risk is not double-counted. However, such a distribution is not at all easy to interpret, since decision-makers will almost certainly never have dealt with risk free rate NPVs and therefore have nothing to compare the model output against.

• Theoretical approach 2: Discount the expected value of the project at the risk-adjusted discount rate.

This approach results in a single figure for the NPV of the project. A risk analysis is run to determine the expected value and spread of the cashflows in each period. The discount rate is usually determined by comparing the riskiness associated with the project's cashflows against the riskiness of other projects in the company's portfolio. The company can then assign a discount rate above or below its usual discount rate depending on whether the project being analyzed exhibits more or less risk than the average. Some companies determine a range of discount rates (three or so) to be used against projects of different riskiness.

The major problems of this method are that it assumes the cashflow distributions are symmetric and that no correlation exists between cashflows. We have seen that distributions of costs and returns very often exhibit some form of asymmetry. In addition, in a typical investment project, there is almost always some form of correlation between cashflow periods: for example, sales in one period will be affected by previous sales, a capital injection in one period often means that it doesn't occur in the next one (e.g. expansion of a factory), or the model may include a time series forecast of prices, production rates or sales volume that are autocorrelated. If there is a strong positive correlation between cashflows, this method will overestimate the NPV. Conversely, a strong negative correlation between cashflows will result in the NPV being underestimated. The correlation between cashflows may take any number of, sometimes complex, forms.

• The practical approach: The above two theoretical approaches are difficult to apply or interpret and beg an alternative. In practice, it is easier to apply the risk-adjusted discount rate r to the cashflow distributions to produce a distribution of NPVs. This method incorporates correlation between distributions automatically and enables the decision-maker to compare directly with past NPV analyses.

As already explained above, the problem with this technique is that it double counts the risk: first in the discount rate and then by representing the NPV as a distribution. However, if one is aware of this shortfall, the result is very useful in determining the probability of achieving the required discount rate (i.e. the probability of a positive NPV). The actual NPV to quote in a report would be the expected value of the NPV distribution, which in our case equals $65,776.
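The practical approach can be sketched in Python as below. The cashflows are entirely hypothetical (an up-front outlay of $500,000 followed by ten independent Triangle-distributed yearly cashflows) and are not those of the fuel cell model; the point is only to show how the expected NPV and the probability of a positive NPV are read from the simulated distribution.

```python
import random


def npv(cashflows, rate):
    """Discount cashflows[t] received at the end of year t (t = 0 is today)."""
    return sum(c / (1.0 + rate) ** t for t, c in enumerate(cashflows))


def simulate_npvs(n_iter=10000, rate=0.10, seed=3):
    """Distribution of NPVs at the risk-adjusted rate (hypothetical project)."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_iter):
        # Illustrative inputs only: initial outlay, then 10 uncertain cashflows.
        cashflows = [-500000.0] + [rng.triangular(50000, 150000, 90000)
                                   for _ in range(10)]
        results.append(npv(cashflows, rate))
    return results


npvs = simulate_npvs()
expected_npv = sum(npvs) / len(npvs)            # the figure to quote in a report
prob_positive = sum(1 for v in npvs if v > 0) / len(npvs)
print(round(expected_npv), round(prob_positive, 3))
```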





Growth in a market over time

This section includes two different ways of modelling the development of the market. The first example provides a simplified NPV calculation of the sales of widgets, where most of the uncertainty arises from the market trends. The second example produces a sales projection for a product that is in the market with a finite horizon.

Example 1

The finance director of the UK company you work for has asked you to determine an NPV for 10 years of cashflows from opening a new store in Times Square, New York (assuming no residual value, e.g. a lease end). The discount rate (your WACC - the weighted average cost of capital) is assumed to be 8.5%, and as a UK company you have to convert the dollar profits back to sterling. You can make profits both from selling your own brand and from selling other, proprietary brands.

Sales Volume

Management expect that eventually you are going to sell annually between 650,000 and 1,090,000 widgets, but most likely 800,000 widgets. This can be modelled with a PERT(650000, 800000, 1090000). The initial total number of widgets you sell is assumed to be a percentage of this, depending on the money spent on the product launch (this is a decision variable, see below). If the management decides to spend the 'normal amount' for the product launch, they believe that the first year's sales will be PERT(35%, 40%, 50%) of the eventual annual sales. The sales after the first year are expected to grow roughly according to the following equation:

where i is the year from project start, δ_i is the fraction remaining that is achieved in year i and λ = PERT(0.8, 1.2, 1.9).

All units (own brand and proprietary brands) will be sold at a US$19.22 retail price, which increases with the rate of inflation.

Own brand/proprietary mix

Initially, our own brand products are expected to have a (25%, 28%, 35%) share of all sales, but this share is expected to rise to (45%, 48%, 55%) by year 5 and this rise is assumed to be roughly linear. The margins of the proprietary sales are 35.4% of the sales price, while the margin for own brand is 47.3%.

Cost of product

The cost per unit for proprietary product is fixed at US$14.01, irrespective of volume. However, the cost per unit for own brand product is a function of volume, and an expert has estimated the following relationship:

Own brand cost price (GBP)

Sales     min      most likely     max
200       7.51     7.71            8.20
500       6.35     6.57            6.94
800       5.40     5.59            5.90
1100      4.59     4.74            5.02




Capex

The initial shop fit and launch are expected to cost US$(43.2, 43.3, 43.45) million.

Fixed costs

The fixed costs of this project are estimated to be US$2.15 million per annum.

Inflation

You have asked three experts for their opinion on the inflation rate for the next ten years. The three experts believe that the inflation rate in the UK will increase roughly linearly, but have varying opinions on the degree of increase per year:

Expert A: PERT(0.7%, 0.9%, 1.0%)

Expert B: PERT(0.1%, 0.4%, 0.6%)

Expert C: PERT(-0.2%, 0.4%, 1.2%)

The inflation in the UK this year is 3.3%.

Inflation rate in the US is roughly Normal(0.5%, 0.03%) lower annually than in the UK.

You will increase store prices by inflation.

Exchange rate

The US$:GBP exchange rate is currently 0.62. Assuming that purchasing power parity (PPP) holds, the US$:GBP exchange rate can be estimated with the following equation:

$$\mathrm{Xrate}_{t} = \mathrm{Xrate}_{t-1} \times \frac{\mathrm{Inflation\ rate\ UK}_{t-1}}{\mathrm{Inflation\ rate\ US}_{t-1}}$$

where Xrate is the US$/GBP exchange rate. In addition, the exchange rate is expected to change by Normal(0, 3)% of its own value each year.
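A possible reading of the inflation and exchange rate inputs is sketched below; it is not taken from the Market growth model. The assumptions are: the three experts are given equal weight and one is picked at random per scenario, the PERT values are the yearly increase in UK inflation in percentage points, the US inflation gap and the Normal(0, 3)% shock are redrawn every year, and the inflation ratio in the PPP equation is applied to inflation factors (1 + rate) rather than to the raw rates.

```python
import random


def pert(rng, a, b, c):
    """Sample a standard PERT(min, most likely, max) via a scaled Beta."""
    alpha = 1.0 + 4.0 * (b - a) / (c - a)
    beta = 1.0 + 4.0 * (c - b) / (c - a)
    return a + (c - a) * rng.betavariate(alpha, beta)


def simulate_xrate(rng, years=10, xrate0=0.62, uk_inflation0=0.033):
    """One scenario of the exchange rate path over the project life."""
    experts = [(0.007, 0.009, 0.010),     # Expert A
               (0.001, 0.004, 0.006),     # Expert B
               (-0.002, 0.004, 0.012)]    # Expert C
    slope = pert(rng, *rng.choice(experts))   # assumed equal expert weights
    path = [xrate0]
    for t in range(1, years + 1):
        uk = uk_inflation0 + slope * (t - 1)          # linear rise in UK inflation
        us = uk - rng.gauss(0.005, 0.0003)            # US lower by Normal(0.5%, 0.03%)
        ppp_factor = (1.0 + uk) / (1.0 + us)          # assumed (1 + rate) form
        shock = 1.0 + rng.gauss(0.0, 0.03)            # Normal(0, 3)% of itself
        path.append(path[-1] * ppp_factor * shock)
    return path


rng = random.Random(5)
print([round(x, 3) for x in simulate_xrate(rng)])
```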

Decision option

You could spend an extra US$6 million on the launch (Superlaunch), in which case the starting sales volume is estimated to be (62%, 65%, 69%) of the estimated ceiling, and then growing with the same equation as above up to this ceiling. Evaluate the two options (planned launch or superlaunch), plotting the NPV distributions together on the same graph. Which option, if either, should the company take?

What are the expected NPVs, and the probabilities of each achieving a positive NPV?

Discussion

The example model Market growth model provides a solution to the model.

The figure below shows the outcome distribution for two scenarios:




Integrated Risk Management

Increasingly, firms are finding that the simultaneous use of tools and techniques from insurance and finance can greatly enhance the value of their risk management efforts. A number of books have been written about the subject of integrated risk management, including Doherty's 'Integrated Risk Management' (2000, McGraw-Hill).

Example 1

You are CRO (Chief Risk Officer) of a non-financial firm that is exposed to two types of risk:

1. Price and volume risk (market risk) - depending on average temperature during the year.

2. Risk for accidents (operational risk)

Currently, your company neither hedges the market risk nor has insurance against the operational risk. You are asked to evaluate the following options:

1. Do nothing;

2. Hedge against all the price risk;

3. Take insurance against all accidents;

4. Combine both.

As the goal of this example is to illustrate some of the methods used in integrated risk management and their value added, the examples are kept fairly simple. However, even though a real world example would be more complicated and likely involve more factors, the same techniques, methods and tools would apply.

Input

             Weather        Market risk                               Fire risk
             Average        Average    Average       Average      Average     Average loss
             temperature    sales      price/unit    cost/unit    per year    per event
Warm year    25             125000     $50           $20          12.0        $200,000
Cool year    15             125000     $33           $20          2.4         $100,000

Sales Volume

The expected sales volume is 125,000 with a standard deviation of 10%, which we can model as Lognormal(125000, 12500).

Market risk - Price and Cost per unit of product

Both sales price and costs per unit of product depend on the average temperature of the year, which is assumed to be minimum 15, most likely 20 and maximum 25 Celsius. The related sales price and costs per unit are shown in the table above. A linear relationship is assumed between the average temperature during the year and the sales price and costs per unit of product (in reality, there would be uncertainty about this relationship which, for simplicity, we ignore here).

Fire risk





Recent independent research has revealed that the expected number of fires occurring per year is an increasing function of the average temperature for that year. In addition, the losses per fire (event) increase with the average temperature during the year. Again, a linear relationship is assumed between the average temperature during the year and the number and size of the fires.

Decision option

You are asked to evaluate whether the company should hedge against the price risk, obtain insurance against the fire risk or do both. Your bank has quoted a price of $25,000 to hedge against the price risk. In addition, you can assume that the yearly cost of insurance is equal to the expected losses per year, and that the coverage is 90% of your losses.

Discussion

Example model Integrated Risk Management provides a solution to the problem.

There are several issues in this model that require special attention:

• The expected number of fires per year (lambda) is a rate. The actual number of fires in any one year can therefore be modelled with a Poisson distribution, with lambda equal to the expected number of fires given a certain average temperature over the year.

• From the input data we can see that the two risks are in fact correlated. In other words, with high temperatures we have bigger margins on our products, but we also have on average more fires, which are also on average larger.

• In this example, we assume that the risk premium is equal to the expected losses (i.e. the insurance company makes an expected 10% profit, as they only pay out 90% of the losses). To do this, a simulation is run on Cell C31, which calculates the total cost of fires; the mean value is then placed in Cell F16 and the model is run again.
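The points above can be sketched in Python for the 'do nothing' option. This is not the Integrated Risk Management spreadsheet: the temperature is assumed to follow a Triangle(15, 20, 25) distribution, and the size of each fire is assumed to be Exponential around its temperature-dependent mean, since the example only gives averages. All function names are hypothetical.

```python
import math
import random


def interpolate(temp, cool_value, warm_value, cool_temp=15.0, warm_temp=25.0):
    """Linear interpolation between the cool-year and warm-year table values."""
    share = (temp - cool_temp) / (warm_temp - cool_temp)
    return cool_value + share * (warm_value - cool_value)


def poisson(rng, lam):
    """Poisson sample by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def lognormal(rng, mean, sd):
    """Lognormal sample specified by arithmetic mean and standard deviation."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return rng.lognormvariate(math.log(mean) - 0.5 * sigma2, math.sqrt(sigma2))


def one_year(rng):
    """Return (gross margin, total fire losses) with no hedge and no insurance."""
    temp = rng.triangular(15.0, 25.0, 20.0)            # assumed Triangle shape
    volume = lognormal(rng, 125000.0, 12500.0)
    price = interpolate(temp, 33.0, 50.0)
    cost = interpolate(temp, 20.0, 20.0)
    fires = poisson(rng, interpolate(temp, 2.4, 12.0)) # lambda depends on temp
    mean_loss = interpolate(temp, 100000.0, 200000.0)
    losses = sum(rng.expovariate(1.0 / mean_loss) for _ in range(fires))
    return volume * (price - cost), losses


rng = random.Random(11)
years = [one_year(rng) for _ in range(5000)]
mean_net_profit = sum(margin - loss for margin, loss in years) / len(years)
print(round(mean_net_profit))
```

Because the same temperature drives both the margin and the fire losses, the two risks move together in every simulated year, which is the correlation described in the second point.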

The figure below shows the outcome distribution for all four scenarios (do nothing, get insurance, hedge or do both).

This figure shows that hedging against the price risk by itself actually increases the uncertainty (width of the distribution) of next year's net profits! Insurance on its own reduces the uncertainty of next year's net profits only in the far left tail of the distribution. If the company takes neither insurance nor a hedge, the negative (indirect) correlation between the market and operational risk already levels out a lot of the risk, as shown by the distribution when neither insurance nor hedging is purchased. The combination of insurance and hedging provides us with a very narrow distribution of profits, but will cost us quite some money (expected profits of this scenario are about $1,350,000 lower).




From the figure above it seems that insurance against only the larger losses would provide the company with about the same uncertainty distribution but at much lower cost. Therefore, in addition to the above four options, a strategy was simulated in which the company obtained fire insurance with a $1,000,000 deductible. The results of that scenario are shown in the figure below.

How do you think the results would look if the company obtained an insurance policy with a $1,500,000 deductible?

In conclusion, this very simplified example shows that it is important to consider all risks related to a company in an integrated way. Ignoring the relationships between risks can result in making wrong decisions, as shown above with the hedging strategy. Finally, there are additional ways to determine the optimal insurance and hedging scenario, but these go beyond the scope of this illustrative example.

Example 2

Required capital

Why does a corporation actually need capital? The answer is all to do with risk! The required capital for a firm is the sum of three components:

1. Operational capital

2. Risk capital

3. Signalling capital

First, there is a certain amount of capital that the firm will need in every future scenario; this is the 'operational capital'.

The second type of capital is to cover the financial consequences of risk due to all the corporate activities. This capital is the 'risk capital' and its size depends on the risk tolerance of the firm. Risk capital can be defined as the capital needed to keep the firm's probability of ruin below some defined level (e.g. 1%). The sum of the operational and risk capital is called 'economic capital'. The third and final form of capital is called 'signalling capital', and its purpose is to satisfy outsiders such as investors, suppliers, regulators, rating agents and analysts of the adequacy of the firm's capital. In other words, it assures outsiders that the firm is indeed as strong as the managers know it to be.

Example

In this example, we will determine the required capital of a firm by simulation. The example firm has two main risks: exchange risk and the risk of liability suits. By simulating the capital requirements for many scenarios, we can estimate the distribution of the capital requirements and subsequently the economic capital.





Secondly, we will determine what the transfer of the two main risks (by insurance and hedging) means for the economic capital of the firm. We will show that insurance and hedging can in fact be seen as a form of 'off-balance-sheet capital'.

Firm

Our firm of interest is SlakerBrewery, an American beer brewery that exclusively brews beer for the UK market. It has a contract for the next year of 1 million cases at 10 pounds per case. Its capital to produce the beer is expected to be a minimum of 10% of sales, most likely 12% and a maximum of 15%.

Risks

In addition, SlakerBrewery is exposed to two main types of risk: exchange risk and liability risk. The current exchange rate is 1.6 pounds per dollar, and has a volatility of 10%. On average, the company expects one lawsuit every two years for an amount of minimum $1,000, most likely $10,000 and maximum $10 million.

Solution

The graph below shows, for 10,000 scenarios, the amount of capital the firm needs to stay in business. It shows that the minimum capital required is $1,237,000; this is equal to the operational capital. In 1% of the scenarios, the required capital is more than $5,726,000, which means that this is the economic capital of the firm. Of the economic capital, $1,237,000 is operational capital and $5,726,000 - $1,237,000 = $4,489,000 is risk capital. Finally, considering the long tail to the right of the 99th percentile, SlakerBrewery decides to keep another $1,000,000 of signalling capital. In total, the amount of required capital (economic capital plus signalling capital) is therefore $6,726,000.

After doing this analysis, the management of SlakerBrewery asks you to do another analysis in which they would take out an insurance policy against the liability claims and hedge the exchange risk. The premium of the insurance policy was set equal to the expected losses + 10% (the insurer's profit). The resulting distribution of the firm's required capital is shown below. The minimum capital required is now $1,790,000; this is equal to the operational capital. In 1% of the scenarios, the required capital is more than $2,466,000, which means that this is the economic capital of the firm. Of the economic capital, $1,790,000 is operational capital and $2,466,000 - $1,790,000 = $676,000 is risk capital. Finally, considering the 'smaller tail' to the right of the 99th percentile, SlakerBrewery decides to keep only $100,000 of signalling capital. In total, the amount of required capital is therefore $2,566,000.
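The decomposition used in both analyses can be written as a small Python helper. It is a sketch of the arithmetic only, with a hypothetical function name: operational capital is taken as the smallest simulated requirement, economic capital as the level exceeded in at most 1% of scenarios, and signalling capital is a management choice passed in as an input.

```python
import math


def capital_breakdown(required_capital, ruin_probability=0.01, signalling=0.0):
    """Split simulated capital requirements into the components described above."""
    ordered = sorted(required_capital)
    n = len(ordered)
    # Number of scenarios allowed to exceed the economic capital level.
    n_exceed = math.floor(ruin_probability * n + 1e-9)
    operational = ordered[0]
    economic = ordered[max(0, n - 1 - n_exceed)]
    return {
        "operational": operational,
        "risk": economic - operational,
        "economic": economic,
        "required": economic + signalling,
    }


# Toy input: 100 scenarios needing 1, 2, ..., 100 units of capital.
print(capital_breakdown(range(1, 101), signalling=5))
```

With the toy input, only one scenario (100) exceeds the economic capital of 99, so the probability of ruin is held at 1%.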




This simple example illustrates that insurance and hedging basically provide 'off-balance-sheet financing' to the firm. Simulation of the firm provided a useful way to determine the capital requirements.

Techniques are available to find the optimal financing strategy for a firm, taking into account both paid-up capital (capital that appears on the balance sheet; retain the risk) and off-balance-sheet capital (such as insurance; transfer the risk). As these techniques fall beyond the scope of the current example, we refer to e.g. Culp (2002) for overviews of these techniques. For all of these techniques, risk analysis can provide insight and subsequently support corporate decision making!

Spreadsheet Capital required illustrates the problem.

© Vose Software™ 2007. Reference Number: M-M0235-A





Modelling a retirement plan

How comfortable are you about your retirement? How much money will you actually need to save for retirement and how do you know what you will receive once you retire? When it comes to retirement, proper planning is important. Unfortunately, there are many uncertainties associated with planning for the (long-term) future, including uncertainty about one's future earnings, the returns on the retirement fund and even legal or political changes. In this topic, we will go through a simplified example to show how someone can estimate the distribution of money after a certain number of years of saving for retirement. Including the uncertainties about your retirement money in a model like the one below can help you plan better for the good days to come!

Example

You are a 32-year-old citizen of Country X and would like to start planning your retirement. The retirement age in X is 60, but there is a 75% chance that it will be changed to 65 years. You contribute 5% of your salary to the retirement fund each year. Your annual salary this year is €20,000, and you expect it to rise by Lognormal(3%, 1%) per year in real terms (i.e. over inflation). You estimate that the return on the pension fund will be minimum 3%, most likely 4% and maximum 7% (assuming a PERT distribution).

How much is your retirement fund worth upon retirement?

The file Finally retired provides the example of this problem. As you can see in the graph below, your total worth at your retirement age has a wide distribution, with a 90% confidence interval between €75,000 and €116,000. The left and right peaks represent respectively the situation in which the retirement age stays at 60 years and the situation in which it is increased to 65 years.
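A rough Python sketch of this calculation follows. It is not the Finally retired model and makes several assumptions the example leaves open: contributions are paid at the end of each year, the fund return and the salary growth are redrawn independently every year, and Lognormal(3%, 1%) is read as mean 3% and standard deviation 1%.

```python
import math
import random


def pert(rng, a, b, c):
    """Sample a standard PERT(min, most likely, max) via a scaled Beta."""
    alpha = 1.0 + 4.0 * (b - a) / (c - a)
    beta = 1.0 + 4.0 * (c - b) / (c - a)
    return a + (c - a) * rng.betavariate(alpha, beta)


def lognormal(rng, mean, sd):
    """Lognormal sample specified by arithmetic mean and standard deviation."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return rng.lognormvariate(math.log(mean) - 0.5 * sigma2, math.sqrt(sigma2))


def fund_at_retirement(rng, age=32, salary=20000.0, contribution_rate=0.05):
    """One scenario of the fund value at retirement, in real terms."""
    retirement_age = 65 if rng.random() < 0.75 else 60   # 75% chance of change
    fund = 0.0
    for _ in range(retirement_age - age):
        fund *= 1.0 + pert(rng, 0.03, 0.04, 0.07)        # this year's return
        fund += contribution_rate * salary               # end-of-year contribution
        salary *= 1.0 + lognormal(rng, 0.03, 0.01)       # real salary growth
    return fund


rng = random.Random(8)
funds = sorted(fund_at_retirement(rng) for _ in range(10000))
mean_fund = sum(funds) / len(funds)
print(round(mean_fund), round(funds[500]), round(funds[9500]))
```

The two possible retirement ages produce two clusters of outcomes, which corresponds to the two peaks described above.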

Of course, many other useful and interesting uncertainties can be added to the model and plenty of additional questions can be asked, but we leave that up to you!




NPV theory

Net Present Value

An NPV calculation attempts to determine the present value of a series of cashflows from a project that stretches out into the future. This present value is a measure of how much the company is gaining in today's money by undertaking the project: in other words, how much more the company itself will be worth by accepting the project.

An NPV calculation discounts future cashflows at a specified discount rate r that takes account of:

1. The time value of money (e.g. if inflation is running at 4%, £1.04 in a year's time is only worth £1.00 today)

2. The interest that could have been earned over inflation by investing instead in a guaranteed investment

3. The extra return that is required over (1) and (2) to compensate for the degree of risk that is being accepted in this project.

Parts (1) and (2) are combined to produce the risk free interest rate, r_f. This is typically determined as the interest paid by guaranteed fixed payment investments like government bonds with a term roughly equivalent to the duration of the project.

The extra interest r* over r_f needed for part (3) is determined by looking at the uncertainty of the project. In risk analysis models, this uncertainty is represented by the spread of the distributions of cashflow for each period. The sum of r* and r_f is called the risk-adjusted discount rate r.

The most commonly used calculation for the NPV of a cashflow series over n periods is as follows:

$$NPV = \sum_{i=1}^{n} \frac{C_i}{(1+r)^i}$$

where C_i are the expected (i.e. average) values of the cashflows in each period and r is the risk-adjusted discount rate.
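As an illustration of this calculation, here is a short Python sketch. It assumes that cashflow i is received at the end of period i, and the cashflows and rates are hypothetical numbers chosen only to show the mechanics.

```python
def npv(expected_cashflows, r):
    """NPV of cashflows C_1..C_n, discounting period i by (1 + r)**i."""
    return sum(c / (1 + r) ** i for i, c in enumerate(expected_cashflows, start=1))

# Hypothetical inputs: risk free rate plus a risk premium gives the
# risk-adjusted discount rate r.
risk_free_rate, risk_premium = 0.04, 0.06
r = risk_free_rate + risk_premium
cashflows = [-1000.0, 300.0, 400.0, 500.0, 600.0]
print(round(npv(cashflows, r), 2))
```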

In our experience, NPV calculations performed in a risk analysis spreadsheet model are usually presented as a distribution of NPVs because the cashflow values selected in the NPV calculations are their distributions rather than their expected values. Theoretically, this is incorrect. Since an NPV is the net present value, it can have no uncertainty. It is the amount of money that the company values the project at today. The problem is that we have double counted our risk by first discounting at the risk-adjusted discount rate r and then showing the NPV as a distribution (i.e. it is uncertain).

Two theoretically correct methods for calculating an NPV in risk analysis are discussed below, along with a more practical, but strictly speaking incorrect, alternative:

• Theoretical approach 1: Discount the cashflow distributions at the risk free rate

This produces a distribution of NPVs at r_f and ensures that the risk is not double-counted. However, such a distribution is not at all easy to interpret since decision-makers will almost certainly never have dealt with risk free rate NPVs and therefore have nothing to compare the model output against.





• Theoretical approach 2: Discount the expected value of each cashflow at the risk-adjusted discount rate.

This is the application of the above formula. It results in a single figure for the NPV of the project.

A risk analysis is run to determine the expected value and spread of the cashflows in each period. The discount rate is usually determined by comparing the riskiness associated with the project's cashflows against the riskiness of other projects in the company's portfolio. The company can then assign a discount rate above or below its usual discount rate depending on whether the project being analyzed exhibits more or less risk than the average. Some companies determine a range of discount rates (three or so) to be used against projects of different riskiness.

The major problems of this method are that it assumes the cashflow distributions are symmetric and that no correlation exists between cashflows. We have seen that distributions of costs and returns very often exhibit some form of asymmetry. In a typical investment project, there is also almost always some form of correlation between cashflow periods: for example, sales in one period will be affected by previous sales, a capital injection in one period often means that it doesn't occur in the next one (e.g. expansion of a factory) or the model may include a time series forecast of prices, production rates or sales volume that are autocorrelated. If there is a strong positive correlation between cashflows, this method will overestimate the NPV. Conversely, a strong negative correlation between cashflows will result in the NPV being underestimated. The correlation between cashflows may take any number of, sometimes complex, forms. We are not aware of any financial theory that provides a practical method for adjusting the NPV to take account of these correlations.

The practical approach:

The above two theoretical approaches are difficult to apply or interpret and beg an alternative. In practice, it is easier to apply the risk-adjusted discount rate r to the cashflow distributions to produce a distribution of NPVs. This method incorporates correlation between distributions automatically and enables the decision-maker to compare directly with past NPV analyses.

As we have already explained, the problem associated with this technique is that it will double count the risk: firstly in the discount rate and then by representing the NPV as a distribution. However, if one is aware of this shortfall, the result is very useful in determining the probability of achieving the required discount rate (i.e. the probability of a positive NPV). The actual NPV to quote in a report would be the expected value of the NPV distribution.
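The practical approach can be sketched in a few lines of Python. The cashflow distributions below (an outlay followed by four Normal revenues with a 25% coefficient of variation) and the 10% rate are hypothetical placeholders, and the cashflows are sampled independently purely to keep the sketch short.

```python
import random

def npv(cashflows, r):
    """Discount the cashflow of period i by (1 + r)**i and sum."""
    return sum(c / (1 + r) ** i for i, c in enumerate(cashflows, start=1))

def simulate_cashflows():
    # Hypothetical project: a fixed outlay, then four uncertain revenues.
    return [-1000.0] + [random.gauss(mean, 0.25 * mean) for mean in (300, 400, 500, 600)]

random.seed(7)
r = 0.10  # risk-adjusted discount rate (assumed)
npvs = [npv(simulate_cashflows(), r) for _ in range(10_000)]

expected_npv = sum(npvs) / len(npvs)                   # the NPV to quote in a report
prob_positive = sum(v > 0 for v in npvs) / len(npvs)   # chance of achieving the rate
print(round(expected_npv, 1), round(prob_positive, 3))
```

In a real model the cashflows would come from the same linked simulation, so any correlation between periods is carried into the NPV distribution automatically.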

Internal Rate of Return

The IRR of a project is the discount rate applied to its future cashflows such that it produces a zero NPV. In other words, it is the discount rate that exactly balances the value of all costs and revenues of the project. If the cashflows are uncertain, the IRR will also be uncertain and therefore have a distribution associated with it.

A distribution of the possible IRRs is useful to determine the probability of achieving any specific discount rate and this can be compared with the probability other projects offer of achieving the target discount rate. It is not recommended that the distribution and associated statistics of possible IRRs be used for comparing projects because of the properties of IRRs discussed below.

Problems in using IRR in risk analyses

Unlike the NPV calculation, there is no exact formula for calculating the IRR of a cashflow series. Instead, a first guess is usually required, from which the computer will make progressively more accurate estimates until it finds a value that produces an NPV as near to zero as required.

If the cumulative cashflow position of the project passes through zero more than once, there is more than one valid solution to the IRR equation. This is not normally a problem with deterministic models because the cumulative cashflow position can easily be monitored and the smaller of the two IRR solutions selected. However, a risk analysis model is dynamic, making it difficult to appreciate its exact behaviour. Thus, the cumulative cashflow position may pass through zero and back in some of the risk analysis iterations and not be spotted. This can produce quite inaccurate distributions of possible IRRs. In order to avoid this problem, it may be worth including a couple of lines in your model that calculate the cumulative cashflow position and the number of times it passes through zero. If this is selected as a model output, you will be able to determine whether this is a statistically significant problem and alter the first guess to compensate for it.
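The check described above, counting how often the cumulative cashflow position passes through zero, could look like the following Python sketch. The function name and the example cashflows are illustrative only.

```python
from itertools import accumulate

def cumulative_sign_changes(cashflows):
    """Count how often the cumulative cashflow position changes sign."""
    signs = [1 if total > 0 else -1 for total in accumulate(cashflows) if total != 0]
    return sum(a != b for a, b in zip(signs, signs[1:]))

# A conventional project: the cumulative position crosses zero once.
print(cumulative_sign_changes([-100, 60, 60, 60]))
# A project whose cumulative position crosses zero several times,
# so more than one IRR solution may exist.
print(cumulative_sign_changes([-100, 230, -140, -50, 90]))
```

Recording this count as a simulation output shows what fraction of iterations are at risk of producing an ambiguous IRR.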

IRRs cannot be calculated for only positive or only negative cashflows. IRRs are therefore not useful for comparing between two purely negative or positive cashflow options, e.g. between hiring or buying a piece of equipment.

It is difficult to compare distributions of IRR between two options unless the difference is very large. Stochastic dominance tests will certainly be of little direct use. This is because a percentage increase in an IRR at low returns (e.g. from 3% to 4%) is of much greater real value than a percentage increase at high returns (e.g. from 30% to 31%). Consider the following illustration: I am offered payments of £20 a year for 10 years (i.e. £200 total) in return for a single payment now. I am asked to pay £200, obviously a bad investment giving an IRR of 0%. I negotiate to drop the price and thereby produce a positive IRR. The figure above illustrates the relationship between the reduction in price I achieve and the resulting IRR. The reduction in price I achieve is directly equivalent to the increase in the present value of the investment, so the graph relates real value to IRR. As the saving I make approaches £200, the IRR approaches infinity. Clearly there is no straight line relationship between IRR and true value. It is therefore very difficult to compare the value of two projects in terms of the IRR distributions they offer. One project may offer a long right-hand tail that can easily increase the expected IRR, but in real value terms this could easily be outweighed by a comparatively small diminishing of the left-hand tail of the other option.
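The £20-a-year illustration can be reproduced numerically. The sketch below finds the IRR by bisection, which is one simple iterative scheme and not necessarily the one a spreadsheet uses; it assumes payments arrive at the end of each year.

```python
def deal_npv(price, rate, payment=20.0, years=10):
    """NPV of paying `price` now for `payment` at the end of each year."""
    return -price + sum(payment / (1 + rate) ** i for i in range(1, years + 1))

def irr(price, low=-0.9, high=1000.0, tol=1e-10):
    """Bisection on the rate: NPV falls as the rate rises, so keep the
    half-interval in which the NPV changes sign."""
    for _ in range(200):
        mid = (low + high) / 2
        if deal_npv(price, mid) > 0:
            low = mid
        else:
            high = mid
        if high - low < tol:
            break
    return (low + high) / 2

for saving in (0, 50, 100, 150, 190):
    print(f"saving {saving:>3}: IRR = {irr(200 - saving):.1%}")
```

Equal steps in the saving produce ever larger steps in the IRR, which is the non-linearity the text describes.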





Real options

Standard Net Present Value (NPV) analysis, in which future cash flows are discounted to their present value, implicitly assumes that firms hold real assets passively. In other words, standard NPV analysis of a firm or project does not reflect the value of management and does not work for projects that have embedded options during their lifetime, hereafter called real options. The reason for this is that the risk of the embedded real option changes continuously and therefore there is no fixed opportunity cost of capital at which to discount. In this section, we'll look at examples of real options in capital budgets, and how the ideas behind valuing financial options (such as puts and calls) can be applied to real financial business evaluations. While the underlying for a financial option is a security such as a share of common stock, the underlying for a real option is a tangible asset, for example a project or a business unit.

Examples of real world options

• Option to make follow-on investments if the project succeeds

e.g. buy neighbouring land for possible factory expansion

• The option to abandon a project

e.g. buy equipment easy to sell-on or decommission

• The option to wait before investing

e.g. buy mineral rights to land where not economic to extract

• The option to vary the type of production or mix

e.g. purchase machine that can be programmed to make a variety of products

These real options allow managers to act in response to circumstances and new, additional information, the value of which is not captured in a traditional NPV analysis.

How do we value real options?

In their famous paper about option pricing, Cox et al (1979) presented a simple discrete-time model for valuing options. They concluded that the price of a financial option should always be equal to the expectation, in a risk-neutral world, of the discounted value of the payoff it will receive. However, it is important to note that this does not imply that the equilibrium expected rate of return on the call is the risk-free interest rate. Their conclusion comes from a risk-neutral, no-arbitrage argument that gives results equivalent to the famous Black-Scholes equation.

We can, however, use this conceptual model of a risk-neutral world to construct a model to value a real option as follows:

• We make a separate, parallel model to our standard NPV model:

• Use the same projections except with inflation at the risk-free rate

• Simulate the extra cashflows arising just from exercising the option

• Discount these cashflows at the risk-free rate

• Calculate the expected value of the resultant distribution




This expected value of the resultant distribution is equal to the value of the real option (Cox and Rubinstein, 1985). The real option value is then added to the expected value of the standard NPV to get the total project value.
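The steps listed above can be sketched in Python as follows. Every input here is a hypothetical placeholder (starting price, volatility, trigger level, units sold, 10-year horizon) and none is taken from the Real option model; the sketch only shows the mechanics of simulating the extra cashflows, discounting them at the risk-free rate and taking the expected value.

```python
import random

def option_value(n_iter=10_000, years=10, rf=0.04, start_price=50.0,
                 trigger=63.0, volatility=0.15, units=100.0):
    """Expected risk-free-discounted revenue from the option to enter a market."""
    total = 0.0
    for _ in range(n_iter):
        price, entered, present_value = start_price, False, 0.0
        for year in range(1, years + 1):
            # Assumption: the price projection drifts at the risk-free rate.
            price *= 1 + random.gauss(rf, volatility)
            entered = entered or price > trigger
            if entered:
                # Extra cashflow that arises only because the option was exercised.
                present_value += units * price / (1 + rf) ** year
        total += present_value
    return total / n_iter

random.seed(3)
print(round(option_value(), 2))
```

The value returned would then be added to the expected value of the standard NPV to give the total project value.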

Model Real option provides an example.

In this example, we start with the same situation as in the model 'NPV of a capital investment'. However, in addition to this static NPV model, we added the option of production for large factories in California. We believe that fuel cells may take off in three years for these large factories. If we go ahead with our investment now, and if the price ever exceeds $63, we will enter this market for no extra capital or operating costs.

The question is now to calculate the total value of the project: the NPV value of our investment plus the revenue from the option (discounted at the risk free rate, r_f, see above) to enter this new market in California.

The model shows the solution of this problem. It appears that the real options value of this expansion option is considerable and ignoring this value would certainly underestimate the true value of this project.

Further reading

• Black F and Scholes M (1973). 'The Pricing of Options and Corporate Liabilities', J Political Economy, 81 (May-June) 637-654.

• Brealey R A and Myers S C (2000). Principles of Corporate Finance, McGraw-Hill.

• Cox J, Ross S and Rubinstein M (1979). 'Option Pricing: a Simplified Approach', J Financial Economics, 7 229-263.

• Hull J (1997). Options, Futures and Other Derivatives, Prentice Hall.

• Merton R C (1973). 'Theory of Rational Option Pricing', Bell J of Economics and Management Science, 4 141-183.

• Wilmott P (1998). The Theory and Practice of Financial Engineering, John Wiley and Sons.





Variation of sales over time

In most discounted cashflow models of capital investment projects we will have a number of time series that we wish to project over the life of the project. As risk analysts, we want to include any uncertainty about those forecasts, of course. We would also like to include any interactions between these forecast variables: for example, that if the exchange rate with the currency a client purchases in goes up, the client can afford more of your product.

In this guide, we have developed a number of time series forecasts to give you some ideas of how to produce a risked forecast model of sales volumes. The models cover a range of situations you might find yourself in:

Selling into a finite demand for a product

There is a maximum possible number of sales that you could make over the entire life of the project.

Model Sales projection for a finite market gives a way to model what proportion of those sales you might eventually make, and combines it with an estimate of how likely you are to convert a remaining potential buyer into a sale in each year. This type of model produces an eventual decline of sales as the market becomes exhausted.

Selling a new product that may take off spectacularly, or fail, or something in-between

Offering a new product on the market carries the unpredictability of consumer reactions. Model New product sales offers four different approaches to model a sales growth curve whose rate of acceleration is given by a probability distribution, as an elegant way of reflecting consumer reaction.

Selling a new product where a competitor may emerge, taking some of the market share

If your new product does very well, chances are that one or more competitors will produce a similar, or even slightly better, version of the same product. The trigger for the introduction of a competitor will be whether they can develop a competing product (maybe you have a crucial patent that will have to expire, maybe they just need to tool up a factory), whether they see it making them a profit or perhaps see a strategic advantage in keeping up with you. Model NPV of a capital investment takes its trigger from the total sales that are made. If the market gets to a certain size, a competitor emerges and begins to eat into your market. Then, if the market gets bigger still, another competitor enters. The model has a neat trick for allowing the market to be shared out.

Selling a product whose demand is a function of economic and other factors

There are often interactions in the real world between economic factors like exchange and interest rates, and sales volumes, plus perhaps politically-driven variables like sales tax rates, or market variables like raw material prices which in turn affect sales price which then affects sales volume. These factors may influence more than one variable in the model, which means that we need to explicitly describe their inter-relations to capture the correlation effects they produce between our model variables. For example, the exchange rate to the US$ might affect our sales in the US, as well as being a component of the cost of some raw materials we buy. Model Market growth model offers some techniques for modelling these types of inter-relationships.




Project risk analysis

Duration of a project consisting of several inter-related tasks of uncertain duration

This is a typical project risk analysis problem. Let's imagine the following example: A construction company is about to sign a contract for building a hospital in the middle of the city. The government of the city wishes to know the estimate of the duration of the project for some planning purposes. In order to calculate the duration, the project manager has divided the project into several stages, and assigned the most likely values to the duration of each stage:

Task                         Duration (most likely), weeks
Design                       30
Planning                     6
Dig holes                    5
Archaeological excavation    3
Foundations                  12
Walls                        22
Roof                         7
Services and finishings      15
Commissioning                17
Job over                     117

Each stage may start only after the previous one is finished and there are no parallel tasks. Thus, summing up the durations of all 9 stages, we get the total duration of the project equal to 117 weeks.

Now the project manager wishes to know the uncertainty around this parameter, as the input estimates are uncertain as well. He assigns the minimum, the most likely and the maximum possible values to the duration of each stage:

Task                         Minimum (80%)   Most likely   Maximum (150%)
Design                       24              30            45
Planning                     4.8             6             9
Dig holes                    4               5             7.5
Archaeological excavation    2.4             3             4.5
Foundations                  9.6             12            18
Walls                        17.6            22            33
Roof                         5.6             7             10.5
Services and finishings      12              15            22.5
Commissioning                13.6            17            25.5
Job over                     93.6            117           175.5

We have simplified things here for illustration purposes by making the minimum and maximum values 80% and 150% of the most likely value respectively. In a real problem, the minimum and maximum would be estimated individually. In the most favorable scenario the total duration of the project will not be less than 93.6 weeks, and in the worst case scenario the duration will not exceed 175.5 weeks.

There are two widely used distributions that are applied in project risk analysis to model experts' opinions. These are the Triangle distribution and the PERT distribution. Both of them take three parameters: minimum, most likely and maximum. Thus, the duration of each stage can be modelled by either of these distributions.

The figure below shows the two ways of modelling the "Design" stage:

In this example we will use the PERT distribution for all stages of the project as it seems to provide a more realistic interpretation of these parameter values. Replacing the most likely values in the project manager's original calculations with distributions, we get the structure as illustrated in this spreadsheet model - hospital.

The outcome distribution for the total duration is shown below:




As we can see from this graph, the project manager's original estimate of 117 weeks is far from being the expected value. There is only a 16.14% probability that the total duration will be less than or equal to 117 weeks.

This is a very common problem for project managers who try to estimate the duration of a project. By setting all their estimates to the most likely values, they neglect the fact that the probability of exceeding the most likely value for a particular stage duration (or cost, in fact) is usually higher than that of finishing earlier, i.e. the distribution has a longer tail to the right. This is because in a project there is an absolute minimum time that any task will take, but life finds a way of introducing any number of obstacles to make that task take a very long time.

In this case we have set the minimum at 80% of the most likely value and the maximum at 150% of the most likely value, so we have a right-skewed distribution for each of those tasks. As these distributions get added up, the difference between the most likely value and the mean of the sum becomes larger and larger.





Generally, when you add a lot of probability distributions together, you get something that looks like a Normal distribution (see Central Limit Theorem), with the 50th percentile of the Normal distribution pretty much equivalent to the sum of the mean values of the individual task distributions.

A rough rule of thumb would be to take (minimum + 4 × most likely + maximum) / 6 for each of those tasks. That gives the mean of each of those PERT distributions, and their sum therefore gives a rough approximation to the 50th percentile of the total project duration. If you were to use a Triangle distribution, the mean would be (minimum + most likely + maximum) / 3.
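The rule of thumb can be checked against the hospital example with a short Python sketch. It uses the task durations from the tables above, the 80% and 150% bounds, and samples the PERT distribution through its Beta form (an assumption about how the PERT is defined); it is not the hospital spreadsheet itself.

```python
import random

most_likely = {
    "Design": 30, "Planning": 6, "Dig holes": 5,
    "Archaeological excavation": 3, "Foundations": 12, "Walls": 22,
    "Roof": 7, "Services and finishings": 15, "Commissioning": 17,
}

def pert_mean(lo, ml, hi):
    return (lo + 4 * ml + hi) / 6

def triangle_mean(lo, ml, hi):
    return (lo + ml + hi) / 3

def sample_pert(lo, ml, hi):
    """PERT sample through its Beta representation."""
    alpha = 1 + 4 * (ml - lo) / (hi - lo)
    beta = 1 + 4 * (hi - ml) / (hi - lo)
    return lo + (hi - lo) * random.betavariate(alpha, beta)

bounds = [(0.8 * m, m, 1.5 * m) for m in most_likely.values()]
pert_total = sum(pert_mean(*task) for task in bounds)          # about 122.85 weeks
triangle_total = sum(triangle_mean(*task) for task in bounds)  # about 128.7 weeks

random.seed(42)
totals = [sum(sample_pert(*task) for task in bounds) for _ in range(20_000)]
prob_on_time = sum(t <= 117 for t in totals) / len(totals)

print(round(pert_total, 2), round(triangle_total, 2), round(prob_on_time, 3))
```

Both rule-of-thumb totals sit well above the 117 weeks obtained from the most likely values, and the simulated chance of finishing within 117 weeks is small.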

Another example model of this problem is provided here: silo. In this model we have tasks that are going in parallel. The discrepancy between our best guess estimate and the actual mean of the distribution becomes greater and greater because, when there are parallel tasks, we are looking for the maximum duration of parallel path activities. This model is an extension to the previous one and it really shows at a very basic level why risk analysis for projects is very helpful.




Other problems

A continuous variable with a long tail distribution

See also: Splicing Distributions window

What do we mean by a long-tailed distribution? One distribution is said to have a longer tail than another if its probability density (or mass) function is (asymptotically) larger than the other distribution's for very large values of the variable, i.e. for two distributions A and B, A has a longer right tail than B if:

$$\lim_{x \to \infty} \frac{f_A(x)}{f_B(x)} = \infty$$

Many socioeconomic and other natural random variables take long-tailed distributions. Examples are city population sizes, occurrences of natural resources (e.g. size of reserves in a certain geological region), stock price fluctuations, size of companies, income.

The most commonly fitted distribution to the extreme of such data has been the Pareto. There is no decent theory to explain why the Pareto distribution tends to fit the tails of long-tailed variables, but most people accept that it works and use it anyway.

The Pareto is usually a poor fit for the main body of the variable, though. Thus, when modelling long-tailed distributions one usually does so using a splice of one distribution (like the Lognormal, or Gamma, for example), with a Pareto distribution to model the tail.

In ModelRisk you can use the Splicing Distributions window to splice two distributions together.





A discrete variable with a long tail distribution

We are sometimes in a position where we wish to model a discrete variable that has a long-tailed distribution. This section describes a number of distributions one might use.

Infinite-tailed discrete distributions

The eight discrete distributions offered by ModelRisk that have a tail to infinity are the Negative Binomial (of which the Geometric is a special case), the BetaNegBin (and the BetaGeometric), the Delaporte, the Logarithmic, the Polya and the Poisson. The variance and the mean of a Poisson distribution are both equal to λ. However, a NegBin(s,p) distribution has a mean µ and variance V as follows:

$$\mu = \frac{s(1-p)}{p}, \qquad V = \frac{s(1-p)}{p^2}$$

Thus, while a Poisson distribution has a ratio of variance to mean of one, the NegBin distribution has a ratio V/µ = 1/p, which is always greater than one. Since a Negative Binomial distribution can be constructed as a Gamma mixture of Poisson distributions, it follows that a Negative Binomial distribution will always have a greater spread, and therefore a longer right tail, than a Poisson distribution with the same mean. So, the NegBin distribution is a natural contender for modelling a discrete variable with a long right tail.
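The comparison can be made concrete with a small Python sketch. It assumes NegBin(s,p) counts the number of failures before the s-th success, with integer s, and uses hypothetical parameter values s = 2, p = 0.2.

```python
import math

def negbin_pmf(x, s, p):
    """P(X = x): x failures before the s-th success (s a positive integer)."""
    return math.comb(x + s - 1, x) * p ** s * (1 - p) ** x

def poisson_pmf(x, lam):
    return math.exp(-lam) * lam ** x / math.factorial(x)

s, p = 2, 0.2
mean = s * (1 - p) / p
variance = s * (1 - p) / p ** 2
print(round(variance / mean, 6), 1 / p)  # the ratio V/mean equals 1/p

# Right-tail probability beyond three times the mean, for a NegBin and
# for a Poisson distribution with the same mean.
threshold = int(3 * mean)
negbin_tail = 1 - sum(negbin_pmf(x, s, p) for x in range(threshold + 1))
poisson_tail = 1 - sum(poisson_pmf(x, mean) for x in range(threshold + 1))
print(f"NegBin: {negbin_tail:.4f}   Poisson: {poisson_tail:.2e}")
```

With the same mean, the NegBin puts far more probability in the extreme right tail than the Poisson does.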

A discretised Pareto distribution

Any continuous distribution can be made discrete by simply rounding generated values from the continuous distribution to whole numbers. For example, the formula =ROUND(VosePareto(2,3),0) will generate values from a Pareto(2,3) distribution and round them off to whole numbers. The Pareto distribution has longer tails than the Negative Binomial distribution, and is the longest-tailed continuous distribution, so this is a quick and easy-to-use method of getting long-tailed discrete distributions.
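An equivalent of the rounding formula can be sketched in Python. The sketch assumes that the first VosePareto parameter is the shape and the second is the minimum value, and it mimics Excel's ROUND for positive numbers (halves round up) rather than using Python's built-in round, which rounds halves to the nearest even number.

```python
import math
import random

def excel_round(x):
    """Round a positive number to the nearest integer, halves upwards."""
    return math.floor(x + 0.5)

def discretised_pareto(shape, minimum):
    """Rounded Pareto sample; random.paretovariate has a minimum of 1."""
    return excel_round(minimum * random.paretovariate(shape))

random.seed(2)
sample = [discretised_pareto(2, 3) for _ in range(10_000)]
print(min(sample), max(sample))
```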

A variable with a long left tail

It is a simple matter to use the above distributions to model a variable that extends with a long tail towards negative values rather than a long right tail. The technique is to subtract a long right-tailed distribution from some constant. For example, the variable =1000-VoseNegBin(2,0.03) has the shape given in the figure below. Care needs to be taken to ensure that such constructed distributions remain within the plausible bounds of the variable. For example, the variable =1000-VoseNegBin(2,0.03) can potentially extend into negative values, although as the plot below reveals, this is probably not sufficiently likely to matter.
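How unlikely a negative value is can be computed directly. The sketch below assumes NegBin(s,p) counts failures before the s-th success, so the variable is negative only when more than 1000 failures occur, which happens exactly when fewer than s successes occur in the first 1000 + s trials.

```python
import math

def negbin_tail(k, s, p):
    """P(X > k): fewer than s successes occur in the first k + s trials."""
    n, q = k + s, 1 - p
    return sum(math.comb(n, j) * p ** j * q ** (n - j) for j in range(s))

prob_negative = negbin_tail(1000, 2, 0.03)
print(f"P(1000 - NegBin(2, 0.03) < 0) = {prob_negative:.2e}")
```

The same function can be reused to test other constants or parameters before relying on this construction.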




Instantaneous failure rate

The principle of the instantaneous failure rate function

Reliability theory is much concerned with the probability distribution of the time a component or machine will operate before failing. The instantaneous failure rate, often called the hazard function, of a component or device at time t is defined as:

$$z(t) = \frac{f(t)}{1 - F(t)}$$

Equation 1

where f(t) and F(t) are the probability density function and cumulative distribution function respectively for the amount of time the component or machine will work before failing. In other words, z(t) is the rate of failure of the component at time t given that it has survived up to time t, which it does with probability 1-F(t).

It can be shown that the expression in Equation 1 for z(t) results in an equation for f(t):

$$f(t) = z(t) \exp\left(-\int_0^t z(x)\,dx\right)$$

Equation 2

Some common results

The Exponential distribution

In a Poisson process, the instantaneous failure rate z(t) is constant, i.e. z(t) = λ, then

$$f(t) = \lambda e^{-\lambda t}$$

Using β = 1/λ we have the equation of the Exponential distribution, i.e. the exponential distribution describes the distribution of survival time of a component given that it has a constant failure rate. The alternative parameter β is called the mean time between failures (MTBF).

The Weibull distribution

If z(t) is not assumed to be constant, but rather increases or decreases smoothly with time, we canconsider using the equation:

Equation 3

The equation looks unnecessarily complicated: it is in fact just z(t) = a t^b, but the form used above helps in producing a neater equation in the next step. The graph below helps to visualize how this function behaves. If α = 1, the equation for z(t) reduces to z(t) = λ, which is the formula that produces the exponential distribution. If α < 1, z(t) decreases with time, which typifies the running-in period for a component. If 1 < α < 2, z(t) increases with time, first rapidly and then more slowly. If α = 2, z(t) increases linearly, and if α > 2, z(t) increases at an ever-increasing rate, which typifies the period at the end of a component's useful life.




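Putting Equation 3 for z(t) into Equation 2 and then Equation 1, and re-expressing the result in terms of the scale parameter β, results in the following expression:

$$F(t) = 1 - \exp\left(-\left(\frac{t}{\beta}\right)^{\alpha}\right)$$

which is the distribution function for the Weibull(α, β) distribution.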

A limitation of the Weibull's equation for z(t) is that z(0) is either zero or infinite, which is unrealistic (ignoring the constant z(t) exception). Also note that a component with a Weibull lifetime when first put into service will never have the same, or any other, Weibull-distributed lifetime afterwards because after any amount of service time it has travelled along the z(t) curve, which is now neither zero nor infinity.
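The behaviour of z(t) for different values of α can be verified numerically from the definition z(t) = f(t) / (1 - F(t)). The Python sketch below assumes the usual Weibull(α, β) parameterisation with shape α and scale β, i.e. F(t) = 1 - exp(-(t/β)^α); the function names are illustrative.

```python
import math

def weibull_cdf(t, alpha, beta):
    return 1 - math.exp(-((t / beta) ** alpha))

def weibull_pdf(t, alpha, beta):
    return (alpha / beta) * (t / beta) ** (alpha - 1) * math.exp(-((t / beta) ** alpha))

def hazard(t, alpha, beta):
    """Instantaneous failure rate z(t) = f(t) / (1 - F(t))."""
    return weibull_pdf(t, alpha, beta) / (1 - weibull_cdf(t, alpha, beta))

beta = 10.0
for alpha in (0.5, 1.0, 1.5, 2.0, 3.0):
    rates = [round(hazard(t, alpha, beta), 4) for t in (2.0, 4.0, 8.0)]
    print(f"alpha = {alpha}: z(2), z(4), z(8) = {rates}")
```

With α = 1 the rate is constant, with α = 2 it doubles when t doubles, and with α < 1 it falls over time.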

More lifetime distributions

ModelRisk includes the following Lifetime distributions based on different, very flexible functional forms for z(t):

Distribution name   z(t)                    Restrictions
Lifetime2           z(t) = a + bt           a ≥ 0, b ≥ 0, MAX(a,b) > 0
Lifetime3           z(t) = a + bt + ct^2    a > 0, c > 0, a - b^2/(4c) > 0
LifetimeExp         z(t) = exp[a + bt]      b > 0

The Lifetime2 distribution has a linearly increasing instantaneous failure rate that may begin at a non-zero value:





The Lifetime3 distribution has a quadratic instantaneous failure rate that can begin at a zero or a positive value, can increase constantly or at an increasing rate, and which can also produce a bathtub curve (b<0):

The LifetimeExp distribution has an exponential form for the instantaneous failure rate, which is always >0 and may increase or decrease with time:




Each of these three distributions can be used at the beginning of a component's service life and at some later time T (where the lifetime left is now (t-T)) in a consistent way, as follows:

Distribution name   Initial z(t)            z(t) after time T
Lifetime2           z(t) = a + bt           z(t) = [a + bT] + b(t-T)
Lifetime3           z(t) = a + bt + ct^2    z(t) = [a + bT + cT^2] + [b + 2cT](t-T) + c(t-T)^2
LifetimeExp         z(t) = exp[a + bt]      z(t) = exp[[a + bT] + b(t-T)]

The ability to retain the same functional form for z(t) means that we can apply and reapply these same distribution types throughout the lifetime of a component without contradicting any previous assumptions.

Instantaneous failure rates for other distributions

Provided a distribution is continuous, has a minimum of zero and smooth and calculable density and distribution functions, we can use it for a lifetime distribution and investigate its instantaneous failure rate function. The following distributions comply with these requirements and are often used as lifetime distributions:

Lognormal – also justified if one believes that a lifetime is the product of a large number of random factors

Gamma – if one believes that a lifetime is the sum of a number of exponential events

Fatigue – (with α=0 to have a minimum of zero) the fatigue life distribution is based on a conceptual model of a crack growing to breaking point

Burr – because with its four parameters it has a lot of flexibility of shape

Inverse Gaussian – when a Lognormal has too heavy a right tail

LogGamma – (with λ=0 to have a minimum of zero) if one believes that a lifetime is the product of a number of LogExponential events

Pareto2 – when you want a lifetime distribution with the longest possible right tail

The following z(t) plots illustrate some of the variety of forms that can be obtained with these families of distributions.




Distance to the nearest neighbour when individuals are randomly distributed over an area or space

We want to model the distance to the nearest neighbour when many entities (fires, in this example) are randomly spread over an area. In this problem, it is known that the average density of fires in the region is λ per km², but no information is available about the distribution of the distance between these fires.

To solve the problem, we have to make a couple of assumptions. First, let's assume that the fires are randomly distributed over the area. That would mean that the fires are not concentrated around any specific points. [If this does not hold, we can look at sub-models for different areas.] Then, we can say that for an area A, the actual number of fires is VosePoisson(λ*A).

The second assumption is that each fire is essentially concentrated at a single point, i.e. its diameter is insignificant compared to the distances between fires. For situations where this is not the case, the model could be extended by associating a radius with each random fire.

A simplified version of our solution is provided in model DistanceToNearestFire.

The model creates a large square area (large meaning that there will be a large number of fires expected to exist within such an area, say 150 or so), and randomly places fires within that area. Assuming that the random fire is located in the centre of the area, the distance to its nearest neighbour is calculated using the formula:

$$\text{Distance} = \sqrt{(x - x_0)^2 + (y - y_0)^2}$$

where {x,y} is the centre of the area (position of a fire), and {x0,y0} is the position of the randomly selected fire. The formula can be extended to space rather than area very easily:

$$\text{Distance} = \sqrt{(x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2}$$

A paper by Cliff and Ord notes several results under the same assumptions we made. We convert them here into ModelRisk formulae, where λ is the average concentration of fires per km²:

Distance to nearest neighbour (km) = VoseRayleigh(SQRT(1/(2*PI()*λ)))

Distance to nearest neighbour (km) = SQRT(VoseExpon(1/(PI()*λ)))

Distance to rth nearest neighbour (km) = SQRT(VoseGamma(r,1/(PI()*λ)))

The model includes the first two results and shows that they match exactly.
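The Rayleigh result can be checked with a short simulation. The sketch below is an illustration in Python, not the DistanceToNearestFire spreadsheet: the density of 1.5 fires per km², the 10 km square and all function names are assumptions chosen for the example. It scatters a Poisson number of fires over the square, measures the distance from the centre to the nearest one, and compares the simulated mean with the mean of the Rayleigh distribution, b*SQRT(PI()/2).

```python
import math
import random

def nearest_neighbour_distance(density, side, rng):
    """Scatter a Poisson(density * side^2) number of fires over a square and
    return the distance from the centre to the nearest one (None if no fires)."""
    expected = density * side * side
    # Poisson count generated by counting unit-rate exponential arrivals
    n, t = 0, rng.expovariate(1.0)
    while t < expected:
        n += 1
        t += rng.expovariate(1.0)
    cx = cy = side / 2.0
    best = None
    for _ in range(n):
        d = math.hypot(rng.uniform(0, side) - cx, rng.uniform(0, side) - cy)
        if best is None or d < best:
            best = d
    return best

def simulate_mean_distance(density=1.5, side=10.0, trials=4000, seed=1):
    """Average nearest-neighbour distance over many simulated areas."""
    rng = random.Random(seed)
    ds = [nearest_neighbour_distance(density, side, rng) for _ in range(trials)]
    ds = [d for d in ds if d is not None]
    return sum(ds) / len(ds)

def rayleigh_mean(density):
    """Mean of Rayleigh(b) with b = sqrt(1 / (2 * pi * density))."""
    b = math.sqrt(1.0 / (2.0 * math.pi * density))
    return b * math.sqrt(math.pi / 2.0)
```

The square must be large relative to the typical distance so that edge effects are negligible, which is the reason the model asks for 150 or so expected fires.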






Lifetime of a device of several components

Let's consider the following example: A piece of electronic equipment is composed of six components A to F. They have the following mean time between failures:

Component   MTBF (hours)
A           332
B           459
C           412
D           188
E           299
F           1234

The components are in serial and parallel configuration as shown below:

What is the probability that the machine will fail within 250 hours?

We first assume that the components will fail with a constant probability per unit time, i.e. that their times to failure will be exponentially distributed, which is a reasonable assumption implied by the MTBF figure. The problem belongs to reliability engineering. Components in series make the machine fail if any of the components in series fail. For parallel components, all components in parallel must fail before the machine fails. Thus, according to the figure above, the machine will fail if A fails, or B, C and D all fail, or E and F both fail. The figure below shows the spreadsheet modelling the time to failure.




Running a simulation with 10,000 iterations on Cell D16 gives an output distribution in which 63.5% of the trials were less than 250 hours.

The spreadsheet of this model is reached here: Lifetime of a device.
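The failure logic described above can be sketched outside the spreadsheet. The Python below is an illustration with assumed function names, not the spreadsheet model itself. The machine lifetime is MIN(A, MAX(B, C, D), MAX(E, F)); because the times to failure are exponential, the probability of failure by a given time can also be written in closed form, which gives a check on the simulated figure (about 0.637 at 250 hours).

```python
import math
import random

# Mean time between failures (hours), from the table in the text
MTBF = {"A": 332, "B": 459, "C": 412, "D": 188, "E": 299, "F": 1234}

def prob_fail_by(t):
    """Exact probability that the machine has failed by time t (hours)."""
    p = {k: 1.0 - math.exp(-t / m) for k, m in MTBF.items()}
    survive_a = 1.0 - p["A"]
    survive_bcd = 1.0 - p["B"] * p["C"] * p["D"]  # parallel block fails only if all fail
    survive_ef = 1.0 - p["E"] * p["F"]
    return 1.0 - survive_a * survive_bcd * survive_ef

def simulate_lifetime(rng):
    """One random machine lifetime: series of A, (B|C|D) and (E|F)."""
    t = {k: rng.expovariate(1.0 / m) for k, m in MTBF.items()}
    return min(t["A"], max(t["B"], t["C"], t["D"]), max(t["E"], t["F"]))

def simulated_prob(t=250.0, iterations=10000, seed=1):
    """Fraction of simulated lifetimes shorter than t."""
    rng = random.Random(seed)
    return sum(simulate_lifetime(rng) < t for _ in range(iterations)) / iterations
```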





Modelling a risk event

The figure below illustrates a model to estimate the impact of a set of risks that may impinge on a project.

In this model the total cost of a project is being estimated. Seven uncertain elements have been modelled:

• The base project cost;

• The potential impact of five identified risks: Health and Safety Executive intervention; a strike; bad weather; sub-contractor insolvency; and a change in the ruling political party;

• The rate of inflation.

The base project cost is modelled by a simple Triangle distribution in Cell C10. The inflation rate is modelled in Cell C23 with a PERT distribution. The selection of a Triangle or PERT to express uncertainty given a three-point estimate (minimum, most_likely, maximum) is discussed elsewhere.

The point of this model is really to illustrate a way of modelling inter-related risk events. The H&S, bad weather and political change risks have 10%, 30% and 2% probabilities of occurring. The risk of a strike, however, has a 15% chance of occurring unless the H&S risk occurs, in which case the probability is considered to increase to 30%. The insolvency probability is 5%, but goes up to 75% if the H&S and the strike risks both occur. We can use conditional logic with Excel's IF function, depending on whether or not the F column (see below) contains a zero, to alter the probability of these two risks accordingly.

Column E models the impact of the risk: a range of 80% to 150% of the most likely risk impact is modelled using a Triangle distribution object (80% and 150% are used for convenience of illustration: we recommend that you review each risk separately). Column F uses the VoseRiskEvent function, which returns a random value from the impact distribution if the risk occurs, and a zero otherwise.

The effect of this model is to recognise that the H&S risk has a much more significant impact than one might suppose when reviewing it in isolation. It is extremely common for risks to be inter-connected: for example, a certain risk occurring might draw resources to manage it that are no longer available to prevent another risk. The occurrence of a risk might also affect the size of the impact of another risk. We haven't shown it here, but it is simply modelled by using the same IF logic on the Most Likely (ML) value column.
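The conditional occurrence logic can be sketched in a few lines. The Python below is an illustration only: it reproduces the IF logic on the probabilities quoted above, leaves out the impact distributions (whose most likely values are not given here), and uses assumed function names. The exact calculation shows that the unconditional probability of a strike is 16.5% rather than 15%, and that of insolvency is 7.1% rather than 5%.

```python
import random

P_HS, P_WEATHER, P_POLITICAL = 0.10, 0.30, 0.02

def simulate_flags(rng):
    """One iteration of the occurrence logic; returns True/False per risk."""
    hs = rng.random() < P_HS
    weather = rng.random() < P_WEATHER
    political = rng.random() < P_POLITICAL
    # Strike probability doubles if the H&S risk has occurred
    strike = rng.random() < (0.30 if hs else 0.15)
    # Insolvency becomes likely only if both H&S and strike have occurred
    insolvency = rng.random() < (0.75 if (hs and strike) else 0.05)
    return hs, strike, weather, insolvency, political

def exact_marginals():
    """Unconditional probabilities of strike and insolvency."""
    p_strike = P_HS * 0.30 + (1 - P_HS) * 0.15
    p_hs_and_strike = P_HS * 0.30
    p_insolvency = p_hs_and_strike * 0.75 + (1 - p_hs_and_strike) * 0.05
    return p_strike, p_insolvency

def simulated_marginals(iterations=100000, seed=1):
    """Simulated frequencies of strike and insolvency."""
    rng = random.Random(seed)
    strikes = insolvencies = 0
    for _ in range(iterations):
        _, strike, _, insolvency, _ = simulate_flags(rng)
        strikes += strike
        insolvencies += insolvency
    return strikes / iterations, insolvencies / iterations
```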




The spreadsheet of this model, which also includes the Triangle and PERT distributions, is provided here: risk portfolio





Modelling an extreme value for a variable

Imagine that we are building a bridge between two islands. The bridge must stand up to extreme weather events, like very high or powerful waves, and very high sustained winds or gusts. For example, it might be specified that the bridge must have a 90% probability of withstanding the highest sustained (>10 minutes, for example) wind that might occur in the next one hundred years. Of course, we could be very unlucky: the highest wind of the century could occur tomorrow, and then with 10% probability it blows the bridge down! However, we can't build infinitely strong bridges, and costs make us reach a specification compromise like the one above.

Since the wind speed at any moment is a continuous random variable, it follows that the greatest wind speed over the next century is also a continuous random variable. There are many such situations in which we wish to model not the entire range that a variable might take, but an extreme, either the minimum or maximum. For example: earthquake power impinging on a building, which must be designed to sustain the largest earthquakes with minimum damage within the bounds of the finances available to build it; maximum wave height for designing offshore platforms, breakwaters and dikes; pollution emissions for a factory, to ensure that, at its maximum, it will fall below the legal limit; determining the strength of a chain, since it is equal to the strength of its weakest link; and modelling the extremes of meteorological events, since these cause the greatest impact. People have put a lot of effort into determining the distributions of these extremes for various situations, but it is often not easy. You can imagine that if, for example, we have only ten years of wind data, we will have to make some assumptions to estimate what the greatest wind speed of the century might be.

It is not just engineers that are interested in extreme values of a parameter (like minimum strength, maximum impinging force) because they are the values that determine whether a system will potentially fail. Insurance companies, for example, are also interested in the size of a claim from extreme events, like hurricanes and terrorist attacks.

The theory behind determining the extreme value distributions is as follows:

Let X be a random variable with cumulative distribution function F(x).

Let Xmax = MAX(X1, X2, ..., Xn) and Xmin = MIN(X1, X2, ..., Xn)

Then, for independent X1, X2, ..., Xn, the cumulative distribution functions of Xmax and Xmin are:

$$F_{\max}(x) = [F(x)]^n$$

and

$$F_{\min}(x) = 1 - [1 - F(x)]^n$$

Substituting the cumulative distribution function of each parent distribution and then letting n approach infinity gives the equation of that distribution's respective extreme value distribution.

The ExtValueMax distribution offered by ModelRisk is also frequently known as the Gumbel distribution, or the Extreme Value distribution. Actually, it is one of only three possible extreme value distributions. The other two distributions are a version of the Weibull distribution (the variable -X is Weibull distributed) and the Frechet distribution, though the Frechet is not popularly used. They have the following cumulative distribution functions:

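Distributions for largest extreme

Distribution                                         CDF
Type I (GumbelMax(a,b) = VoseExtValueMax(a,b))       $F(x) = \exp\left(-\exp\left(-\frac{x-a}{b}\right)\right)$, $-\infty < x < \infty$, $b > 0$
Type II (FrechetMax(a,b,c))                          $F(x) = \exp\left(-\left(\frac{x-a}{b}\right)^{-c}\right)$, $x \ge a$, $b, c > 0$
Type III (Weibull-typeMax(a,b,c))                    $F(x) = \exp\left(-\left(\frac{a-x}{b}\right)^{c}\right)$, $x \le a$, $b, c > 0$

Distributions for smallest extreme

Distribution                                         CDF
Type I (GumbelMin(a,b))                              $F(x) = 1 - \exp\left(-\exp\left(\frac{x-a}{b}\right)\right)$, $-\infty < x < \infty$, $b > 0$
Type II (FrechetMin(a,b,c))                          $F(x) = 1 - \exp\left(-\left(\frac{a-x}{b}\right)^{-c}\right)$, $x \le a$, $b, c > 0$
Type III (Weibull-typeMin(a,b,c))                    $F(x) = 1 - \exp\left(-\left(\frac{x-a}{b}\right)^{c}\right)$, $x \ge a$, $b, c > 0$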

The theory of extreme values says that the largest or smallest value from a set of values drawn from the same parent distribution tends to an asymptotic distribution that only depends on the tail of the parent distribution. The Gumbel distribution is the extreme value distribution for all parent distributions of the Exponential family, e.g. Exponential, Gamma, Normal, Lognormal, Logistic and itself. The Frechet distribution is the extreme value distribution for parent distributions of the form of Pareto, Student-t, Cauchy, log-Gamma and itself. The Weibull distribution is the extreme value distribution for Beta, Uniform and Weibull distributed variables, but the convergence can be very slow.

As discussed above, the three standard extreme value distributions are the Gumbel, the Frechet (not directly available with ModelRisk, but Model Frechet.xls generates the distribution), and the Weibull.

The problem with all these extreme value distributions is that:

a. they only work for certain types of parent distributions,

b. they are only asymptotically correct, meaning that one needs to be considering the extreme of a potentially very large set of observations before the extreme distribution is a good model, and

c. the parameter values for these extreme distributions are also difficult to estimate, or even to calculate when one knows the parent distribution very well.

At times, a more practical approach to determining the extreme value distribution is to first estimate the underlying parent distribution, and then simulate a set of observations from that distribution and determine at each iteration what the maximum (or minimum) of that set of observations is. The ModelRisk functions VoseLargest and VoseSmallest do this directly.

Thus, by running many iterations one arrives at a well-defined extreme distribution. A lot of iterations (probably several thousand) are needed to determine the extreme distribution well because simulation statistics like a maximum or minimum take a long time to stabilise.
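The simulation approach can be sketched as follows. This is an illustration in Python with assumed names and an assumed parent distribution (Exponential with mean 1), not the VoseLargest function itself. For that parent, the largest of n values is asymptotically Gumbel with location LN(n) and scale 1, so the simulated mean of the maximum can be compared with the Gumbel mean a + 0.5772*b.

```python
import math
import random

def largest_of(n, draw, rng):
    """Largest of n independent values produced by draw(rng)."""
    return max(draw(rng) for _ in range(n))

def mean_largest(n=200, iterations=3000, seed=1):
    """Simulated mean of the largest of n Exponential(mean 1) values."""
    rng = random.Random(seed)
    draw = lambda r: r.expovariate(1.0)  # assumed parent distribution
    return sum(largest_of(n, draw, rng) for _ in range(iterations)) / iterations

def gumbel_mean(a, b):
    """Mean of a Gumbel (largest extreme) distribution: a + Euler's constant * b."""
    return a + 0.5772156649 * b
```

Replacing `draw` with any other parent distribution gives the extreme distribution for that parent without needing an asymptotic result.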

The parameters of the Extreme Value distribution are usually determined by data fitting except in certain circumstances where the parent distribution is known and the relationship between its parameter values and the parameter values of the appropriate extreme value distribution is also known. Gumbel (1958) provides an old but still excellent treatise on extreme value theory.

Contagious extreme value distributions

Sometimes we are interested in the largest (or smallest) of a random number of random variables. For example, the largest flood that might occur in a period, where the number of floods is random, and also the size of each flood is random. Other examples are earthquakes, explosions, stock price jumps, and accidents. Sometimes, neat mathematical solutions are available for modelling the extremes of such systems. For example, if the number of gas explosions in a period can be described by VosePoisson(λ) and the intensity of an explosion is described by a shifted Exponential distribution (e.g. = c + VoseExpon(b)), then the maximum explosion intensity is given by an Extreme Value distribution: = VoseExtValueMax(c + b*LN(λ), b). Example model Contagious_extreme_value_distribution.xls demonstrates the result by simulation.
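A similar check can be written in a few lines of Python. The sketch below is an illustration with assumed parameter values and function names, not the Contagious_extreme_value_distribution.xls model. It draws a Poisson number of shifted Exponential intensities, takes the largest, and compares the simulated probability that the maximum is below x with the Gumbel CDF with location c + b*LN(λ) and scale b. Periods with no explosions are counted as being below x; this is a modelling choice made here so that the comparison holds for x ≥ c.

```python
import math
import random

def poisson(lam, rng):
    """Poisson(lam) count, generated by counting unit-rate exponential arrivals."""
    n, t = 0, rng.expovariate(1.0)
    while t < lam:
        n += 1
        t += rng.expovariate(1.0)
    return n

def max_explosion(lam, c, b, rng):
    """Largest of a Poisson(lam) number of c + Exponential(mean b) intensities."""
    n = poisson(lam, rng)
    if n == 0:
        return None  # no explosions in the period
    return max(c + rng.expovariate(1.0 / b) for _ in range(n))

def gumbel_max_cdf(x, a, b):
    """CDF of the Gumbel (largest extreme) distribution."""
    return math.exp(-math.exp(-(x - a) / b))

def empirical_cdf(x, lam, c, b, iterations=20000, seed=1):
    """Simulated probability that the maximum intensity is at most x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(iterations):
        m = max_explosion(lam, c, b, rng)
        if m is None or m <= x:
            hits += 1
    return hits / iterations
```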

Similarly, if the number of explosions in a period can be described by VosePoisson(λ) and the size of an explosion is described by a Pareto(θ,a) distribution, then the maximum explosion intensity is given by a Frechet(0, a*λ^(1/θ), θ) distribution. Care needs to be taken here in that one is assuming that the frequency of events and the event intensities are independent. For example, it is well-recognised that earthquake intensities are related to the number of earthquakes: the more earthquakes, the more gently released the tectonic plate energy, and thus the lower the earthquake intensities. Similar arguments can be made about floods. Kottegoda and Rosso (1998) provide plenty of excellent worked examples.





Multivariate trials

We sometimes need to recognise the inter-relationship between probabilities of values for two or more distributions. In other words, these distributions are not independent of each other.

Some ModelRisk features and other modelling methods allow us to crudely model correlations between several distributions. However, there are certain situations where specific multivariate distributions are needed.

The following three common multivariate distributions are described here:

• Multinomial

• Dirichlet

• Multivariate Hypergeometric

Multinomial distribution

For a set of n trials, each of which could take one of k different outcomes (like different colours of balls in a huge urn) with probabilities p1..pk, the distribution of the outcomes is known as multinomial, which is just an extension of the binomial distribution. The only difference is the number of possible outcomes: only two for the binomial and multiple for the multinomial. The Multinomial distribution has the following probability mass function:

$$P(s_1, \ldots, s_k) = \frac{n!}{s_1! \, s_2! \cdots s_k!} \, p_1^{s_1} p_2^{s_2} \cdots p_k^{s_k}, \qquad \sum_{j=1}^{k} s_j = n$$

The factor $\frac{n!}{s_1! \, s_2! \cdots s_k!}$ is sometimes known as the multinomial coefficient.

Let's consider the following problem: The cars in a city are divided into 9 different categories. We know the proportion of the city's cars that are in each category. If we were to monitor 1000 cars that enter a particular motorway, how many cars of each category would we see?

This is clearly a problem of multinomial trials since every car that enters the motorway can be any one of 9 types.

To sample from a multinomial distribution we need to proceed as follows:

We know p1, p2,...,pk (proportions of each type) and n (our sample size - 1000).

First we simulate s1 from Binomial(n, p1).

For each remaining category, we simulate s2, s3, ..., sk in order with sj = Binomial(n - SUM(s1...sj-1), pj/SUM(pj...pk)).

Note that the marginal distribution for sj (i.e. the distribution of generated values for sj when looked at by itself) is simply a Binomial(n, pj).
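The sequential procedure can be sketched in Python as below. This is an illustration, not the spreadsheet model: the function names are assumptions, and apart from the 5% for the first category the proportions in `PROPORTIONS` are hypothetical values chosen only to sum to 1.

```python
import random

# Hypothetical proportions for the 9 car categories (only the 5% is from the text)
PROPORTIONS = [0.05, 0.10, 0.15, 0.20, 0.05, 0.10, 0.15, 0.10, 0.10]

def binomial(n, p, rng):
    """Binomial(n, p) sample as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))

def multinomial(n, probs, rng):
    """Sample category counts by chaining conditional binomials."""
    counts = []
    remaining_n = n             # trials not yet allocated to a category
    remaining_p = sum(probs)    # total probability of categories not yet sampled
    for p in probs[:-1]:
        s = binomial(remaining_n, min(1.0, p / remaining_p), rng)
        counts.append(s)
        remaining_n -= s
        remaining_p -= p
    counts.append(remaining_n)  # last category: conditional probability is 1
    return counts
```

For example, `multinomial(1000, PROPORTIONS, random.Random(1))` returns nine counts that always add up to 1000.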

So, our model looks like this:

Our first category is simulated in cell C11, which is just a binomial distribution: VoseBinomial(1000, 5%)

As the second category now needs to take into account the result from the first type, the formula in cell D11 becomes as shown above: the number of trials is decreased by the number of successes from the first category, and the probability of success becomes the probability of category two divided by the sum of the probabilities of the remaining 8 categories. This logic is consistent throughout the "Successes" row (cells D11:K11), and the row "Outputs" shows a nice way of naming the output cells.

Dirichlet distribution

The conjugate to the multinomial distribution is the Dirichlet distribution, much like the beta distribution is the conjugate to the binomial distribution. The Dirichlet distribution is used for modelling the uncertainty around probabilities of successes in multinomial trials.

The Dirichlet distribution has the following probability density function:

$$f(p_1, \ldots, p_k) = \frac{\Gamma\left(\sum_{j=1}^{k} \alpha_j\right)}{\prod_{j=1}^{k} \Gamma(\alpha_j)} \prod_{j=1}^{k} p_j^{\alpha_j - 1}, \qquad p_j \ge 0, \quad \sum_{j=1}^{k} p_j = 1$$

For example, if you've observed s1, s2, ..., sk of different types of outcomes from n trials, the Dirichlet distribution provides the confidence distribution about the correct values for the probability that a random trial will produce each type of outcome by setting α1 = s1+1, α2 = s2+1, ..., αk = sk+1. Obviously these probabilities have to sum to 1, so their uncertainty distributions are inter-related.

Let's take the same problem that we used in the previous example: All cars in a city are divided into 9 different types. But now we have monitored 1000 cars that were entering a particular motorway, and counted the number of cars of each type. What is the uncertainty distribution for the proportions of each type in the total population of cars?

Putting the above logic into a spreadsheet model looks like this:




The second part of the equation in cells D10 to J10 follows similar logic to C10, and is then multiplied by (1 - sum of the previous cells in the same row). The last cell, K10, calculates the implied probability for the last category as 1-SUM(C10:J10).

The Dirichlet distribution is not as intuitive as the Multinomial distribution, but it is a very handy tool when modelling multinomial trials.
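The row-by-row construction described for cells C10 to K10 can be sketched in Python. This is an illustration under stated assumptions: the observed counts in `COUNTS` are hypothetical, the function name is invented for the example, and the first cell is taken to be a Beta(α1, α2+...+αk) draw, with each later cell a Beta draw scaled by the probability still unallocated.

```python
import random

# Hypothetical observed counts for the 9 car types (sum = 1000)
COUNTS = [50, 100, 150, 200, 50, 100, 150, 100, 100]

def dirichlet_from_counts(counts, rng):
    """One Dirichlet(s1+1, ..., sk+1) sample built from chained Beta draws."""
    alphas = [s + 1 for s in counts]
    probs = []
    remaining = 1.0  # 1 - sum of the probabilities generated so far
    for j in range(len(alphas) - 1):
        share = rng.betavariate(alphas[j], sum(alphas[j + 1:]))
        probs.append(share * remaining)
        remaining -= probs[-1]
    probs.append(remaining)  # last category takes what is left
    return probs
```

Each call returns one possible set of true proportions; repeating the call many times maps out the joint uncertainty about them.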

Multivariate hypergeometric

Sometimes we need to model sampling from a population without replacement with multiple outcomes and when the population is small, so the process cannot be approximated by a multinomial where the probabilities of success remain constant. In this case we use the multivariate hypergeometric distribution, which is similar to the hypergeometric distribution, the difference being the number of possible outcomes from a trial (two in the hypergeometric and many in the multivariate hypergeometric).

The figure below shows the graphical representation of the multivariate hypergeometric process. D1, D2, D3 and so on are the numbers of individuals of different types in a population, and x1, x2, x3, ... are the numbers of successes (the number of individuals in our random sample (circled) belonging to each category).

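The Multivariate hypergeometric distribution has the following probability mass function:

$$P(x_1, \ldots, x_n) = \frac{\prod_{i=1}^{n} \binom{D_i}{x_i}}{\binom{M}{s}}, \quad \text{where } M = \sum_{i=1}^{n} D_i \text{ and } s = \sum_{i=1}^{n} x_i$$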

Let's imagine a problem where we have 100 coloured balls in a bag, of which 10 are red, 15 purple, 20 blue, 25 green and 30 yellow. Without looking into the bag, you take 30 balls out. How many balls of each colour will you take from the bag?

We cannot model this problem using the multinomial distribution, because when we take the first ball out, the proportions of the different colour balls in the bag change. The same happens when we take the second ball out and so on.

Thus, we must proceed as follows:

• Model the first colour (red, for example) as x1 = Hypergeometric(s, D1, M), where s is the sample size = 30, D1 is the total number of red balls in the bag = 10, and M is the population size = 100

• Model the rest as: xi = Hypergeometric(s - SUM(x1:xi-1), Di, SUM(Di:Dn)), where xi is the number of successes of type i in the sample, xi-1 is the number of successes of type i-1 in the sample, Di is the number of successes of type i in the total population, and Dn is the number of successes of the last type in the total population.
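The two steps above can be sketched in Python for the coloured-ball problem. This is an illustration with assumed function names. The `hypergeometric` helper simply draws one item at a time without replacement; it stands in for a Hypergeometric(s, D, M) function.

```python
import random

BAG = {"red": 10, "purple": 15, "blue": 20, "green": 25, "yellow": 30}

def hypergeometric(s, d, m, rng):
    """Number of successes in s draws without replacement from a
    population of m items containing d successes."""
    x = 0
    for _ in range(s):
        if rng.random() < d / m:
            x += 1
            d -= 1
        m -= 1
    return x

def multivariate_hypergeometric(sample_size, population, rng):
    """Chain conditional hypergeometrics over the categories in order."""
    counts = {}
    remaining_sample = sample_size
    remaining_pop = sum(population.values())  # SUM(Di:Dn) for the current i
    for colour, d in population.items():
        x = hypergeometric(remaining_sample, d, remaining_pop, rng)
        counts[colour] = x
        remaining_sample -= x
        remaining_pop -= d
    return counts
```

For example, `multivariate_hypergeometric(30, BAG, random.Random(1))` returns a count per colour that adds up to 30 and never exceeds the number of balls of that colour in the bag.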




Percent operating time of a machine with breakdowns and repairs

This problem normally requires a complex logical structure since we are trying to model many processes simultaneously.

Let's consider the following problem:

A power plant needs 2 water pumps operating at max capacity to cool its turbines with river water. Since the pumps may break down, the power plant has installed two additional pumps. These four pumps operate at 50% capacity and if one or two pumps break down, the power plant can still operate.

The calculating complexities arise because if any pumps fail, the remaining pumps have to work harder, thus increasing the remaining pumps' probability of failure. In other words, if we have all 4 pumps running together, and we only need 2 for the station to operate, then the pumps are working at half capacity, but as soon as one pump fails the remaining pumps are working at 2/3 capacity, and so they have a higher failure rate or, equivalently, a lower mean time between failures.

The four pumps are of varying age and therefore of varying reliability. The following table summarises the data:

Probability of failure (fail/day)

        Pumps working
Pump    4        3        2
A       0.002    0.007    0.025
B       0.004    0.013    0.079
C       0.007    0.034    0.142
D       0.002    0.007    0.025

Each repair of the pump takes Lognormal(20,15) days.

The questions are: a) How long will it take before a shutdown occurs? b) How many shutdowns will the station have in a year? c) What is the probability of one or more shutdowns per year?

The solution to this problem is illustrated in the following spreadsheet: Power station pumps

There are several inter-linked tables in this model. First we need to convert the table above (probability of failure) into a table of mean time between failures using the following formula:

MTBF = -1/LN(1-P), where P is the probability of a failure.

So, we get the new table:

MTBF (days)

        Pumps failed
Pump    0              1              2
A       499.4998332    142.3565575    39.49789021
B       249.499666     76.42198649    12.15137069
C       142.3565575    28.90888214    6.529495909
D       499.4998332    142.3565575    39.49789021

This table shows that the MTBF decreases dramatically with each new pump failure.
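The conversion can be reproduced directly from the formula MTBF = -1/LN(1-P). The Python below is a small sketch with assumed names that rebuilds the MTBF table from the daily failure probabilities given above.

```python
import math

# Daily failure probabilities with 4, 3 and 2 pumps working (from the table above)
FAILURE_PROB = {
    "A": (0.002, 0.007, 0.025),
    "B": (0.004, 0.013, 0.079),
    "C": (0.007, 0.034, 0.142),
    "D": (0.002, 0.007, 0.025),
}

def mtbf(p_fail_per_day):
    """Mean time between failures (days) for a constant daily failure probability."""
    return -1.0 / math.log(1.0 - p_fail_per_day)

def mtbf_table():
    """MTBF per pump with 0, 1 and 2 pumps already failed."""
    return {pump: tuple(mtbf(p) for p in probs) for pump, probs in FAILURE_PROB.items()}
```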

The leftmost column in the spreadsheet model (column B) shows the time t of the next event occurring: the event could either be the failure of a pump, or the completion of the repair of a pump.

Columns C to F show the status of the pumps when an event occurs, i.e. for every point in time t in column B, this table will show which pumps are broken.

The next table (columns G to J) uses the MTBF table above to return the mean times between failures for each pump depending on how many pumps are broken at point in time t.

The table "Time to repair completion new t" (columns K to N) checks if the pump is broken and, if it is, returns the Lognormal(20, 15) distribution, otherwise returning the value of 1000000. This large value is just a dummy showing that the pump is not broken. We use it instead of zero because we need to find the next event to occur, i.e. the minimum time until the next event. As the value of 1000000 will never be a minimum, the algorithm will always return the required value.

The next table (columns O to R) also checks for both the broken pump and for the shutdown of the station and puts the value of 1000000 if either of these is true, otherwise returning the time until the next failure for the pump.

The last three columns of this spreadsheet calculate the shutdowns of the power station. Column S returns the value of 1 if at any point in time t there are 3 failures. Column T returns 1 if the corresponding cell in column S is one and the time in column B is less than 366 (less than 1 year). Column U returns the point in time of the first power plant shutdown.

The outcomes of the model are located in cells E10 to E12. Cell E12 shows the one positive value from column U, providing the answer for question (a). Cell E10 returns a summation of column T, thus answering question (b). Cell E11 generates a discrete 1:0 distribution, the mean of which is the answer to question (c).




Predicting results of a random survey, and uncertainty about results

We often hear on the news from a recent poll of a population how people are expected to vote on some issue or at an election. If the issue is a simple "yes" or "no", and the people are randomly and representatively sampled from the population, then the poll is a binomial process. In this case, our uncertainty about the fraction of voters p who will ultimately vote "yes" is described by a Beta uncertainty distribution:

p = VoseBeta(s+1,n-s+1)

where n is the number of people surveyed and s is the number among them who stated they would vote "yes". Built into this analysis is the assumption that people won't change their minds between the time the poll was conducted and the date of the vote - which is always a tricky assumption!

A more interesting case is when there are more than two possible outcomes, for example, an election where there are three or more significant competing parties. This is a multinomial process, and we would therefore employ the Dirichlet distribution to represent our uncertainty about the fraction of the population who would vote for each party.

For example, imagine that we have surveyed 1027 people, asking them for which party they are intending to vote. The results are as follows:

Voting choice   Number with this preference
SDP             259
SMP             312
PSM             132
EDP             261
Abstaining      63
Total           1027

Using the Dirichlet distribution and assuming that people don't change their mind between the poll and election time, we can answer questions like:

• How confident are we that SMP will win (get more votes than any other party)?

• If the SDP join forces with the EDP, and the SMP join forces with the PSM, how confident are we that SDP/EDP will get more votes than SMP/PSM?

The example model Election demonstrates how to construct the Dirichlet distribution to calculate the probabilities and their associated confidences.

The Dirichlet distribution is implemented in ModelRisk as VoseDirichlet.
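The two questions can be explored with a short simulation. The Python below is an illustrative sketch, not the Election model or VoseDirichlet. It samples a Dirichlet(s+1) distribution by normalising independent Gamma draws, a standard alternative construction chosen here for brevity, and the function names are assumptions.

```python
import random

# Survey counts from the table above
COUNTS = {"SDP": 259, "SMP": 312, "PSM": 132, "EDP": 261, "Abstaining": 63}
PARTIES = ["SDP", "SMP", "PSM", "EDP"]

def draw_shares(rng):
    """One Dirichlet(s+1) sample of the true population shares."""
    g = {k: rng.gammavariate(s + 1, 1.0) for k, s in COUNTS.items()}
    total = sum(g.values())
    return {k: v / total for k, v in g.items()}

def confidences(iterations=20000, seed=1):
    """Return (confidence SMP wins, confidence SDP/EDP beats SMP/PSM)."""
    rng = random.Random(seed)
    smp_wins = coalition_wins = 0
    for _ in range(iterations):
        p = draw_shares(rng)
        if all(p["SMP"] > p[k] for k in PARTIES if k != "SMP"):
            smp_wins += 1
        if p["SDP"] + p["EDP"] > p["SMP"] + p["PSM"]:
            coalition_wins += 1
    return smp_wins / iterations, coalition_wins / iterations
```

Each returned value is the fraction of sampled scenarios in which the statement holds, which is read as the confidence that it is true given the poll.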





Rare event risks

A rare event risk can be defined as an event that has a very low probability of occurring during the lifetime of a project or investment or a specified period. Examples of such rare events could be: the rupture of a pipe in a nuclear reactor, a flood caused by a tsunami, a financial market crash in a country or a part of the world, or an epidemic of an exotic disease. It is frequently reasonably easy to estimate the approximate impact of a rare event by considering possible impact scenarios, but estimating the probability of that rare event is usually extremely difficult because there are no data available.

The probability of the rare event occurring is often estimated by considering the small probability that a random variable Y exceeds some large threshold. This random variable may be a function of several other random variables {X}, and possible values for Y are generated by first generating values for each {X}. The importance sampling technique can considerably ease the processing burden without compromising the model accuracy.

The following example illustrates a simple method of modelling a rare event in estimating the consequences of a flood caused by high-wave storms:

Problem: A vast area of land in country A is below the average sea level. Dikes are built along the sea coast in order to protect the country from floods. Extreme wave height and still water level are two very important factors in causing floods along the sea coast. The scientists concerned with the safety of the area found that a catastrophe can occur if the wave height and still water level (both measured in meters) satisfy the following relation:

Catastrophe factor = 0.4 * wave-height [m.] + sea level [m.] > 6.2.

Taking into account the country's geography, the area of flooded land can be calculated using the following formula:

LogNormal(1.38 * Catastrophe factor^2, 3.52 * Catastrophe factor^(1/4)) [100 sq. km.].

Wave heights and still water levels during high tide have been reliably measured without any interruption at several stations along the sea coast. The analysis of the data gave estimates for the parameters of the input distributions:

Sea level during a storm: Normal(0.2, 0.3) [m.]

Distribution fitted to wave-height during a storm: LogNormal(0.5, 0.6), Shift(+0.5) [m.]

Given that storms occur at a Poisson expected rate of 25 per year, we need to calculate the probability that within 5 years there will be a flood that covers more than 7,000 sq. km.

Solution: This problem could be solved by Monte Carlo simulation, which might however take a very great number of iterations to stabilize because of the low probability of the event being modelled. A flood that covers more than 7,000 sq. km. is more likely to occur as a result of an extreme wave height than as a result of an extreme sea level. Since these two input variables are uncorrelated, it is easiest to calculate the probability that the wave height exceeds some threshold (say 4 m.), truncate the variable to values beyond the threshold, simulate this rare event and then adjust the outcome with the calculated probability. Spreadsheet Floods.xls gives an example. A detailed explanation of the model is provided here.

Let's first calculate the probability that the wave height during a storm will exceed 4 meters:

From the LogNormal probability density function we have:

$$f(x) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)$$

where µ and σ are the mean and standard deviation of the natural log of the variable.

The parameters µ and σ for a LogNormal distribution with Mean = M and Standard deviation = S can be calculated using the relationships below:

EXP(µ + σ^2/2) = M

EXP(2µ + σ^2) * [EXP(σ^2) - 1] = S^2

Solving for µ and σ, we get

µ = LN(M) - σ^2/2

where σ^2 = LN(1 + S^2/M^2)

Thus, with x taken as the threshold less the shift, 1 - LOGNORMDIST(x, µ, σ) gives us the required truncation probability.

With the truncation performed, the rare event of a flood (Catastrophe factor > 6.2) occurs more frequently during the simulation and the problem turns into a simpler one: modelling a risk event. The probability that a storm with a wave height of more than 4 meters can produce the risk event of interest can be calculated by setting the flag (cell C27) to 1 if the area affected by a flood (Cell C26) is more than 7,000, and to 0 otherwise. The mean of cell C27 is the required probability, which can be multiplied by the truncation probability calculated above to give the probability that a storm can produce a flood which covers more than 7,000 sq. km.

Knowing the average number of storms that can occur in 5 years (t) and the probability that a storm can result in a flood (λ) gives us the Poisson intensity parameter λt. The probability that a flood can occur in 5 years equals:

P[at least 1 flood] = 1 - P[zero floods]

and P[zero floods] is just EXP(-λt) from the Poisson distribution.
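The analytic parts of this calculation can be sketched in Python. This is an illustration with assumed function names, not the Floods.xls model: it converts the LogNormal mean and standard deviation to µ and σ, computes the probability that the shifted wave height exceeds the threshold, and combines a per-storm flood probability with the storm rate. The per-storm flood probability itself has to come from the simulation and is passed in as an argument.

```python
import math

def lognormal_mu_sigma(mean, sd):
    """Convert the LogNormal mean and standard deviation to (mu, sigma) of LN(X)."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)

def prob_wave_exceeds(threshold, mean=0.5, sd=0.6, shift=0.5):
    """P(shift + LogNormal(mean, sd) > threshold), i.e. the truncation probability."""
    mu, sigma = lognormal_mu_sigma(mean, sd)
    z = (math.log(threshold - shift) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of the standard Normal

def prob_at_least_one_flood(p_flood_per_storm, storms_per_year=25.0, years=5.0):
    """1 - P[zero floods] for a Poisson number of flood-producing storms."""
    return 1.0 - math.exp(-storms_per_year * years * p_flood_per_storm)
```

In use, `p_flood_per_storm` would be `prob_wave_exceeds(4.0)` multiplied by the simulated mean of the flag cell.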





Stress and strength

Stress can refer to any effect impinging on the component or system that could cause it to fail, for example: pressure, temperature, applied voltage, torque.

Strength is the limit at which the component can withstand the applied stress. It has the same units as the stress variable, of course. The figure below shows how both of these can be random variables. The stress applied to a component or system can be a random variable dependent on weather and other operating conditions, the mode of use, etc. The strength of the component will vary somewhat from one component to another due to age, amount of use, manufacturing variability, etc. Thus, for any randomly selected component, its strength is also a random variable.

Here we pose the question: What is the probability that the applied stress is greater than the strength of the component? Scenarios of interest occur in the shaded overlap area in the figure above. In formal mathematics this requires doing an algebraic integration, which may not be possible depending on the distributions of stress and strength. However, with simulation we can determine this very easily. The example model Stress and strength shows an example.
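A minimal simulation of this comparison is sketched below in Python. It is an illustration, not the Stress and strength model. The Normal distributions used in the usage note are assumptions, picked because the Normal case has a closed-form answer against which the simulation can be checked; any other pair of distributions could be passed in.

```python
import math
import random

def simulate_failure_prob(stress, strength, iterations=50000, seed=1):
    """Fraction of iterations in which the sampled stress exceeds the sampled strength."""
    rng = random.Random(seed)
    failures = sum(stress(rng) > strength(rng) for _ in range(iterations))
    return failures / iterations

def normal_failure_prob(mu_stress, sd_stress, mu_strength, sd_strength):
    """Exact P(stress > strength) when both are independent Normals."""
    z = (mu_stress - mu_strength) / math.hypot(sd_stress, sd_strength)
    return 0.5 * math.erfc(-z / math.sqrt(2.0))  # standard Normal CDF at z
```

For example, `simulate_failure_prob(lambda r: r.gauss(60, 10), lambda r: r.gauss(80, 10))` estimates the failure probability for a hypothetical stress of Normal(60, 10) and strength of Normal(80, 10).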





Sampling from a liquid containing suspended particles

If the sample is small (say less than 10% of the volume of the total liquid), and the particles are randomly distributed in the liquid, we can use the Poisson process to model the particles in our sample. If the sample is large, we will need to use the binomial process. The two approaches are discussed below:

Poisson modelling

If λ is the concentration of particles in the liquid, and t is the volume of liquid in the sample, then Poisson mathematics gives us the following results:

• Probability no particles in the sample: = EXP(-λ*t), or =POISSON(0,λ*t,0)

• Probability at least one particle in the sample: = 1-EXP(-λ*t), or =1-POISSON(0,λ*t,0)

• Simulation of number of particles in the sample: =VosePoisson(λ*t)

So, for example:

100 bacteria are randomly distributed in a vat of 1000 liters of wine. If a sample of two liters of wine is taken from the vat, what is the probability that there will be at least one bacterium? What is the distribution of the number of bacteria in that sample?

Answer:

λ = 100/1000 = 0.1 bacteria per litre

t = 2 liters

Probability at least one bacterium in sample = 1-EXP(-0.1*2) = 18.1269...%

Number of bacteria in sample = VosePoisson(0.1*2)

The problem with this approach is that the Poisson process potentially allows an infinite number of particles to exist. Once our sample is large compared to the volume of liquid, we could start generating numbers of bacteria greater than are actually in the liquid. For example, if the sample was 800 liters, the above approach would model the number of bacteria as: =VosePoisson(80). The plot below shows that this distribution extends beyond the total number of bacteria (=100).





It might look as though the problem only becomes important when the sample gets close to the total volume, but the comparison of the binomial and Poisson methods below shows that there are significant differences at much smaller samples.

Binomial modelling

We can think of each bacterium as a trial, and that being in the liquid sample is a success. If the bacteria are randomly distributed in the liquid body, then each of the n trials has a probability v/V of being in the sample, where v is the sample volume and V is the volume of the whole liquid body. We now see that this is a binomial process:

n trials = 100 bacteria

Probability of success p = v/V = 2/1000 = 0.2%

Then:

• Probability of no particles in the sample: =(1-p)^n, or =BINOMDIST(0,n,p,0)

• Probability of at least one particle in the sample: =1-(1-p)^n, or =1-BINOMDIST(0,n,p,0)

• Simulation of number of particles in the sample: =VoseBinomial(n,p)

and the answers to the questions above are:

Probability at least one bacterium in sample = 1-(1-0.2%)^100 = 18.1433...%

Number of bacteria in sample = VoseBinomial(100,0.2%)

A comparison of a Poisson(0.2) and a Binomial(100,0.2%) shows that the Poisson (skinny light red columns) is a very good approximation because the sample is so small.

But if the sample had been just, say, 30% of the volume, the two modelling approaches (Binomial(100,30%) and Poisson(30)) would already have started to give different answers.
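The divergence between the two approaches can be quantified directly from the probability mass functions. The sketch below is illustrative Python using only the standard library; the helper names are assumptions of this example and the 0.2%, 30% and 80% sample fractions are the ones discussed in the text.

```python
import math

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, mean):
    # Work in logs so that large k does not overflow the factorial.
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

N_BACTERIA = 100

def compare(fraction):
    """Return (largest pmf gap, P(Poisson count exceeds the bacteria present))."""
    mean = N_BACTERIA * fraction
    gap = max(abs(binomial_pmf(k, N_BACTERIA, fraction) - poisson_pmf(k, mean))
              for k in range(N_BACTERIA + 1))
    impossible = max(0.0, 1 - sum(poisson_pmf(k, mean) for k in range(N_BACTERIA + 1)))
    return gap, impossible

for fraction in (0.002, 0.30, 0.80):
    gap, impossible = compare(fraction)
    print(f"sample = {fraction:.1%} of vat: largest pmf gap {gap:.4f}, "
          f"P(Poisson > 100) = {impossible:.2e}")
```

The gap is negligible for the two-liter sample, already visible at 30%, and at 80% the Poisson model assigns a non-trivial probability to counts that cannot occur.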




Stock control example

You are a tractor retailer that sells various models of farm tractors. The Model 12 is your best seller, costs you $80,000 to buy, and you retail it at $99,000. You have orders of, on average, 2.7 a month, irrespective of the time of year (we could extend this to take care of seasonal variations if important, using a Seasonal Poisson model). If you can't meet the order, you know the potential client will buy from a competitor. Your current policy is to keep no more than 7 in stock. You take an inventory at the end of every month. If you have fewer than 3 in (stock + already ordered, but not received) you immediately order more to regain a stock of 7, but it takes 2 months to receive delivery. If you have more than 3 in stock, you don't order any more. This month you have 5 in stock, and $200,000 in cash. The tax rate is 30%.

Monthly running costs (wages, rent of premises, etc.) are $28,000. There is a cost of debt of 2% per month if your cash position goes below zero. The owners receive dividends of $8,000 per month provided there is money in the account.

What does your future cash position profile look like? In particular, what is the maximum debt facility you should maintain? Does this policy maximise your long-term profit? What, if any, changes to your inventory policy would improve your finances?

This spreadsheet models the cashflows over time.

A graph of the cash flow position for a single iteration of the model looks like this:

Figure 1: Snapshot of cash position over time

where the red dots are months in which tax is paid. The business projection is that your company's cash position will be very volatile, and frankly not very profitable.

We need to test models for their logic. An easy and intuitive way to check the model is by stressing the model parameters. Let's change some of the parameters and review the effect they have on this graph. This gives us a good visual check of the model's behaviour.

Set SalePrice = PurchasePrice = $99,000:




Figure A1: Snapshot of cash position over time when sales price = purchase price

The prediction is a steady downward trend, which makes sense. Now, let's instead change the sales rate to a very low value (0.001 tractors/month):

You get essentially the same graph, but without the volatility, because you are not selling any tractors. Now let's instead set the monthly running costs to $0. We get:





Figure A2: Snapshot of cash position over time when there are no running costs

A dramatic improvement in profitability, as you might expect. Now let's set the maximum stock to 100 tractors:

Figure A3: Snapshot of cash position over time when maximum stock = 100

At a periodic rate of roughly (100-2)/2.7 = 36 months you are making an order of some 98 tractors, selling them all, and reordering, producing the saw-tooth cash position. The constant negative cash position shows the burden of carrying so much stock and paying interest on the debt.

You can continue in this fashion for each parameter value, or combinations, until you are satisfied that the model is behaving well. It also provides a good method to convince others that the model works, particularly if they are not so familiar with modelling.

The first question we need to answer is what the cashflow position looks like over time. Single snapshots like Figure 1 don't give us a very good feel because they are just one scenario. After running a simulation, we can produce the following summary chart:




Figure 2: Cash position forecast summary chart

The chart shows that, the way the business is set up, it is expected to do no better than break even (red line) and it may need to borrow up to $50,000 to stay afloat.

Optimising the inventory policy

Let's see what we can do to improve the forecast cash position by changing the inventory policy. We have two parameters we can play with: MinimumStock and MaximumStock. The first clue is to look at a comparison of stock and sales:

Figure 3: Stock and sales comparison snapshot for 1st year

You can see in Figure 3 that sales often equal stock, which means we are probably losing sales by not having stock available, and often stock dips down to zero. Let's look at what happens if we increase the MaximumStock from 7 to 10:





Figure 4: Stock and sales comparison snapshot for 1st year, when MaximumStock = 10

Now sales are not limited for a while when an order has been placed, but the stock level is allowed to go too low before the next order. So let's leave MaximumStock at 10, and increase MinimumStock to 6:

Figure 5: Stock and sales comparison snapshot for 1st year, when MaximumStock = 10 and MinimumStock = 6

Sales now seem much more free of stock restrictions. In order to optimise the inventory policy we need to specify the measure we wish to maximise, and then vary the MaximumStock and MinimumStock parameters until we find that maximum. We could use the mean of a cell that calculates the average profit per year, but let's be a little more conservative and use the 30th percentile of that cell, i.e. the average profit level that we are 70% confident of exceeding.
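The effect of the two policy parameters on lost sales can be sketched outside the spreadsheet. The Python below is a simplified illustration, not the example model: it tracks stock only (no cash, tax or dividends), uses the fraction of orders lost as a proxy measure rather than the 30th percentile of profit, and assumes that a delivery arrives at the month end two months after ordering and that each order tops the inventory position up to MaximumStock. The function names are hypothetical.

```python
import math
import random

def poisson_sample(mean, rng):
    """Knuth's multiplication method; adequate for small means such as 2.7."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def lost_sales_fraction(min_stock, max_stock, months=20_000, demand_rate=2.7,
                        lead_time=2, start_stock=5, seed=1):
    """Fraction of tractor orders lost to stock-outs under one inventory policy."""
    rng = random.Random(seed)
    stock = start_stock
    pipeline = [0] * lead_time          # orders in transit, oldest first
    demand_total = lost_total = 0
    for _ in range(months):
        demand = poisson_sample(demand_rate, rng)
        sold = min(demand, stock)
        stock -= sold
        demand_total += demand
        lost_total += demand - sold
        stock += pipeline.pop(0)        # month end: receive the order placed lead_time months ago
        position = stock + sum(pipeline)
        order = max_stock - position if position < min_stock else 0
        pipeline.append(order)
    return lost_total / demand_total

for policy in [(3, 7), (3, 10), (6, 10)]:
    print(policy, f"lost sales: {lost_sales_fraction(*policy):.1%}")
```

A full optimisation would replace the lost-sales proxy with the profit percentile described above and scan a grid of {MinimumStock, MaximumStock} pairs.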




Figure 6: 30th percentile of mean yearly profit over 10 years for various combinations of MinimumStock and MaximumStock.

From this plot the best options appear to be {MinimumStock, MaximumStock} = {13,14} or {14,15}. Of course, the greater the MaximumStock and MinimumStock, the more capital will be tied up in stock, and the greater the risk of a negative cash position. We could do some further analysis to see what those levels were and balance the risk of having a high negative cash position against an improved long-term profit level. We could also perform a sensitivity analysis on, for example, the level of demand to see how robust the ranking of the various options is.





Number of events in a specific period

In general terms, this is a renewal process problem. In a renewal process, the times (or distances, etc.) between successive events are independent and identically distributed, but they can take any distribution. In a Poisson process, the times between successive events are described by independent identical Exponential distributions. The Poisson process is thus a particular case of a renewal process. The mathematics of the distributions of number of events in a period (equivalent to the Poisson distribution for the Poisson process) and the time to wait to observe x events (equivalent to the Gamma distribution in the Poisson process) can be quite complicated, depending on the distribution of time between events. However, Monte Carlo simulation lets us bypass the mathematics to arrive at both of these distributions, as we will see in the following examples.

More generally phrased the question comes down to: how many random variables do we need to add

from a distribution to reach a fixed total?

The distribution that is the answer to this is calculated directly with the VoseStopSum function.

Example

It is known that a certain type of light bulb has a lifetime, in hours, that follows a Weibull(1.3,4020) distribution.

First question: If I have one light bulb working at all times, replacing each failed light bulb immediately with another, how many light bulbs will have failed in 10,000 hours?

The example model One light bulb provides the solution to this question. Note that it takes account of

the possibility of 0 failures.

Next question: If I have 10 light bulbs going at all times, how many will fail in 1000 hours assuming that I

immediately replace my failed bulbs?

The spreadsheet Ten light bulbs shows a model to provide the solution to this question. It follows

exactly the same logic as the model above.

The figure below compares the results for this question and for the previous one. Note that they are significantly different. Had the time between events been Exponentially distributed, the results would have been exactly the same:

Last question: If I had one light bulb going constantly, and I had ten light bulbs to use, how long would it take before the last light bulb failed? The answer is simply the sum of 10 independent Weibull(1.3,4020) distributions.
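The three light bulb questions can be sketched with a direct renewal simulation. This is illustrative Python rather than the VoseStopSum-based example models; it assumes Weibull(1.3,4020) means shape 1.3 and scale 4020 hours, and that every bulb in service is new at time zero.

```python
import random

SHAPE, SCALE = 1.3, 4020.0      # assumed reading of Weibull(1.3,4020), lifetime in hours

def failures_within(horizon, rng):
    """Count consecutive bulb lifetimes that end within `horizon` hours."""
    elapsed, failures = 0.0, 0
    while True:
        # random.weibullvariate takes (scale, shape), in that order.
        elapsed += rng.weibullvariate(SCALE, SHAPE)
        if elapsed > horizon:
            return failures
        failures += 1

rng = random.Random(1)
iterations = 20_000

# One socket for 10,000 hours versus ten sockets for 1,000 hours each.
one_socket = [failures_within(10_000, rng) for _ in range(iterations)]
ten_sockets = [sum(failures_within(1_000, rng) for _ in range(10))
               for _ in range(iterations)]
# Time to use up ten bulbs one after another.
time_for_ten = [sum(rng.weibullvariate(SCALE, SHAPE) for _ in range(10))
                for _ in range(iterations)]

print("mean failures, one socket :", sum(one_socket) / iterations)
print("mean failures, ten sockets:", sum(ten_sockets) / iterations)
print("mean hours to use ten bulbs:", sum(time_for_ten) / iterations)
```

Because the Weibull shape exceeds 1, new bulbs are less likely to fail early, so ten fresh bulbs running for 1,000 hours produce fewer failures than one socket running for 10,000 hours.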




The number of failures until a certain number of successes have been achieved

This is a typical sort of risk analysis problem where we need to achieve a certain number of successes (one or more) and each attempt (trial) may or may not become a success according to some random process. Recognizing the type of process is the starting point:

Binomial process

The simplest type of example is for a binomial process, where each trial has the same probability of success. Then there is an elegant solution embodied in one distribution. If we require s successes and the probability that any individual trial will succeed is p, then the distribution of the number of trials we will need is given by:

Trials needed = s + VoseNegBin(s,p)

Note that the NegBin(s,p) distribution is modelling the number of failures. When added to s, we get the

total trials needed. When we only need one success, the above formula simplifies to:

Trials needed = 1 + VoseGeometric(p)

Because the Geometric(p) distribution is just the NegBin(1,p) distribution.

Example

Let's imagine that we have some machine making a component. We have an order with a very narrow performance tolerance such that only 1 in 4 components this machine makes would pass the quality control. We'll further imagine that the machine has already been set up to produce the maximum chance of the manufactured components complying (so the probability of compliance will not improve). We need to fill an order for 250 components. Each component costs us $12.50 to manufacture. What price/unit should we quote to give us a 75% chance of making some profit?

The distribution of the number of components we may have to make is given by:

Manufactured components = 250 + VoseNegBin(250,1/4)

The cost is therefore:

Cost of fulfilling order = 12.50 * (250 + VoseNegBin(250,1/4))

And the cost per unit is:

Cost per unit = 12.50 * (250 + VoseNegBin(250,1/4))/250

Simulating this formula gives the following distribution:





Conclusion: we should quote a per unit price of $51.80 because there is a 75% chance that the actual outturn cost to us will be less than that figure, and we will therefore make at least some profit. Here's a question for you: If the client changed their mind and said they now want just 100 units, should we recalculate the price?
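The quoted price can be checked with a small simulation. The sketch below is illustrative Python in which `negbin_failures` is a hypothetical stand-in for VoseNegBin, built as a sum of geometric failure counts; the seed and iteration count are arbitrary.

```python
import math
import random

def negbin_failures(successes, p, rng):
    """Failures before the required number of successes: a sum of geometric counts."""
    log_q = math.log(1 - p)
    # floor(ln(U)/ln(1-p)) with U in (0,1] is a geometric count of failures.
    return sum(int(math.log(1.0 - rng.random()) / log_q) for _ in range(successes))

def cost_per_unit(order_size, p_pass, unit_cost, rng):
    made = order_size + negbin_failures(order_size, p_pass, rng)
    return unit_cost * made / order_size

rng = random.Random(1)
iterations = 5_000
costs = sorted(cost_per_unit(250, 0.25, 12.50, rng) for _ in range(iterations))

mean_cost = sum(costs) / iterations
price_75 = costs[int(0.75 * iterations)]    # 75th percentile of the cost per unit
print(f"mean cost per unit: {mean_cost:.2f}")
print(f"75th percentile   : {price_75:.2f}")
```

Calling `cost_per_unit(100, 0.25, 12.50, rng)` repeatedly gives a way to explore the closing question: the mean stays the same but the spread of the per-unit cost changes with the order size.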

Hypergeometric process

A hypergeometric process is one where we are taking random samples from some population of size M of individuals that fall into two (or more) categories. Sticking for the moment to just two categories (e.g. Labour voters, not Labour voters, or Male, Female, etc.) we define a random sample from M to be a 'success' if we pick an individual from some sub-population of size D. The probability of success changes from one trial to the next as we take consecutive samples from the population. The Negative Binomial distribution therefore won't be appropriate unless the size of the sample we might take is small relative to the size of the population (a rough rule of thumb is that the possible sample size should be less than about 1/10 of the population).

The distribution corresponding to the Negative Binomial distribution, but for the hypergeometric process, is called the Inverse Hypergeometric distribution.

The Inverse Hypergeometric distribution has the probability mass function:
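As a reference sketch, a standard form of this mass function is given below. It assumes the parameterisation used in this section (s successes required, D successes in a population of M) and writes x for the number of failures drawn before the s-th success; the symbol x is introduced here for illustration.

```latex
P(X = x) \;=\; \frac{\dbinom{D}{s-1}\,\dbinom{M-D}{x}}{\dbinom{M}{s+x-1}}
\cdot \frac{D-s+1}{M-s-x+1},
\qquad x = 0, 1, \ldots, M-D
```

The first factor is the hypergeometric probability of seeing s-1 successes and x failures in the first s+x-1 draws, and the second is the probability that the next draw is a success.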

So, once again there is an elegant solution embodied in one distribution.

Trials needed (n) = s + InvHypergeo(s,D,M)

Note that, like the Negative Binomial distribution, the InvHypergeo(s,D,M) distribution models the number of failures. When added to s, we get the total trials needed.

Other processes

For other processes, there may be elegant solutions to the number of trials needed to achieve a certain number of successes, but it is much more likely that simulation models will need to be built from scratch to determine the distribution. We give three examples here for you to get an idea of the type of techniques that will help you produce such models.




Example 1

You are a government body doing research into the effects of marriage and smoking on people's health. You are doing a random telephone survey and you require 50 people from each of the four possible categories. From previous studies you know that 32% of people agree to participate in this type of survey when called. How many calls will you need to make, given that previous studies show the population to be split into the four categories as follows:

Population distribution    Smoker    Non-smoker
Married                    7%        26%
Not married                28%       39%

Model Healtheffect determines how many calls you'll have to make. It uses the Multinomial distribution.
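For comparison with the Multinomial-based model, the survey can also be simulated one call at a time. The Python below is a rough sketch under stated assumptions: willingness to participate is independent of category, and respondents beyond the 50 needed in a category are still counted as calls. All names are hypothetical.

```python
import random

PARTICIPATION = 0.32
SHARES = {      # population split taken from the table above
    ("married", "smoker"): 0.07,
    ("married", "non-smoker"): 0.26,
    ("not married", "smoker"): 0.28,
    ("not married", "non-smoker"): 0.39,
}
REQUIRED = 50

def calls_needed(rng):
    """Number of calls until every category has REQUIRED participants."""
    categories = list(SHARES)
    weights = list(SHARES.values())
    counts = dict.fromkeys(categories, 0)
    calls = 0
    while min(counts.values()) < REQUIRED:
        calls += 1
        if rng.random() < PARTICIPATION:
            counts[rng.choices(categories, weights)[0]] += 1
    return calls

rng = random.Random(1)
results = sorted(calls_needed(rng) for _ in range(300))
mean_calls = sum(results) / len(results)
print(f"mean calls: {mean_calls:.0f}, 90th percentile: {results[int(0.9 * len(results))]}")
```

The answer is driven almost entirely by the rarest category (married smokers), since each call yields one of them with probability of only 0.32 × 0.07.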

Example 2

You need a replacement PC. The IT manager says there are 22 PCs stored in the basement, but 3 have bad hard disks only, two have bad motherboards only, and one has both a bad hard disk and motherboard. Of course, nobody can remember which ones. For reasons he alone understands, you can only take out one PC at a time, coming to him to ask for the key, and then returning it afterwards. Presuming you can dismantle PCs and rebuild them, how many trips will you have to make to the basement to get a working PC?

Model Computers in the basement shows the solution to this example.
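One possible way to sketch this problem in code is shown below. It is an illustrative Python simulation, not the example model, and it assumes that you keep every PC you have fetched, that disks and motherboards are interchangeable between PCs, and that each trip picks a PC at random from those remaining.

```python
import random

# (disk_ok, board_ok) for each of the 22 stored PCs
PCS = ([(False, True)] * 3      # bad hard disk only
       + [(True, False)] * 2    # bad motherboard only
       + [(False, False)]       # both bad
       + [(True, True)] * 16)   # fully working

def trips_needed(rng):
    """Trips until the fetched PCs contain one good disk and one good motherboard."""
    order = PCS[:]
    rng.shuffle(order)
    have_disk = have_board = False
    for trip, (disk_ok, board_ok) in enumerate(order, start=1):
        have_disk = have_disk or disk_ok
        have_board = have_board or board_ok
        if have_disk and have_board:
            return trip

rng = random.Random(1)
iterations = 20_000
trips = [trips_needed(rng) for _ in range(iterations)]
for n in range(1, max(trips) + 1):
    print(f"P({n} trips) ~ {trips.count(n) / iterations:.4f}")
```

Under these assumptions the probability of needing only one trip is 16/22, the chance of picking a fully working PC first.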

Example 3

This is an extension to this topic. Here we will not only count the failures, but also sum the random variables.

A manufacturer is trying to extrude a single length of copper wire of 5 kilometres, but the extrusion process has a certain failure rate of 0.07 failures per kilometre. If a failure occurs before he has produced his 5 km of wire, then he has to start again. We wish to determine the distribution of the total amount of wire that will be produced in kilometres in order to get 5 kilometres of perfect wire and the distribution of the number of times the production will need to be restarted.

Model COPPER shows the solution to this example.
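A minimal Python sketch of the same logic follows. It assumes failures occur as a Poisson process along the wire, so the length extruded before a failure is Exponential with rate 0.07 per kilometre; this is an assumption of the sketch, not a statement about how model COPPER is built.

```python
import random

FAILURE_RATE = 0.07     # failures per kilometre
TARGET_KM = 5.0

def produce_wire(rng):
    """Return (total km extruded, number of restarts) to obtain one perfect length."""
    total, restarts = 0.0, 0
    while True:
        run = rng.expovariate(FAILURE_RATE)     # km extruded before the next failure
        if run >= TARGET_KM:
            return total + TARGET_KM, restarts
        total += run                            # scrapped wire still counts as produced
        restarts += 1

rng = random.Random(1)
iterations = 50_000
results = [produce_wire(rng) for _ in range(iterations)]
mean_total = sum(total for total, _ in results) / iterations
mean_restarts = sum(restarts for _, restarts in results) / iterations
print(f"mean total wire: {mean_total:.2f} km, mean restarts: {mean_restarts:.3f}")
```

Each attempt succeeds with probability exp(-0.07 × 5), so the number of restarts is a geometric count of failed attempts, which gives a check on the simulated mean.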






The number of successes in a certain number of trials

In risk analysis, we often attempt to predict the results of a set of random trials, where the trials can result in either a 'success' (the outcome we are most interested in) or a 'failure'. For example:

• How many airplane flights will result in crashes;

• How many people eating hamburgers will get E.coli infections;

• How many sales pitches will result in a sale;

• How many people entering a shop will make a purchase;

• How many women receiving fertility treatment will fall pregnant;

• How many cars will need a replacement engine within the guarantee period; etc.

Independent trials

If each of these n trials is independent (meaning the result of each trial is not influenced by the result of any previous trial), and if all trials have the same probability of success, the outcome of these trials conforms to a binomial process. Moreover, we can model the number of successes s using a Binomial distribution:

s = VoseBinomial(n,p)

A binomial process is a random counting system where there are n independent identical trials, each one of which has the same probability of success p, which produces s successes from those n trials (where 0 ≤ s ≤ n and n > 0, obviously). There are thus three parameters {n, p, s} that between them completely describe a binomial process.

To model the number of successes in a certain number of trials we will use:

=VoseBinomial(n,p)

where n is the number of trials and p is the probability of the trial becoming a success.

The simplest example of a binomial process is the toss of a coin. If I toss a fair coin (a coin with 50% probability of returning either head or tail) 10 times, what is the distribution of the number of heads I will get? The answer to this question can be modelled using just one formula:

s = VoseBinomial(10,0.5)

which will produce the following outcome distribution:




As expected, the most likely value and the mean number of successes equal 5.

Problem: What is the probability that I will have no heads at all?

As we see from the graph above, this probability is very low, so it is better not determined by simulation, as we would need very many iterations to get an accurate answer. However, the probability can be calculated using the VoseBinomialProb function:

P(s=0) = VoseBinomialProb(0,10,0.5,0) = 0.000976563

Problem: What is the distribution of the maximum number of heads I can get in a row by tossing a fair

coin 10 times?

The solution to this problem is provided in the example model - Coins.
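Both coin problems can be sketched in a few lines. The Python below is illustrative only and is not the Coins example model: it computes the probability of no heads directly and simulates the longest run of heads in 10 tosses, with hypothetical helper names.

```python
import random

def longest_head_run(tosses):
    """Length of the longest unbroken run of heads (True values)."""
    longest = current = 0
    for is_head in tosses:
        current = current + 1 if is_head else 0
        longest = max(longest, current)
    return longest

p_no_heads = 0.5 ** 10      # the value returned by VoseBinomialProb(0,10,0.5,0)

rng = random.Random(1)
iterations = 50_000
runs = [longest_head_run([rng.random() < 0.5 for _ in range(10)])
        for _ in range(iterations)]
distribution = {k: runs.count(k) / iterations for k in range(11)}

print(f"P(no heads) = {p_no_heads:.9f}")
for k, prob in distribution.items():
    print(f"longest run = {k:2d}: {prob:.4f}")
```

The simulated frequency of a longest run of zero is the same event as tossing no heads, which illustrates why rare probabilities are better calculated than simulated.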

Hypergeometric process

The hypergeometric process occurs when one is sampling randomly without replacement from some population, and where one is counting the number in that sample that have some particular characteristic. In this situation we have four parameters: M, the population size; D, the number of successes in the population; n, the sample size; and s, the number of successes in the sample. We do not have a probability of success parameter (p) here because the proportion of successes in the remaining population changes with every sample we take.

Let's imagine a pack of playing cards (52 cards, without jokers), of which 13 are hearts. If we consider a card of hearts to be a success, then we have: M = 52 and D = 13. If we are to pick a single card from the pack at random, the probability of picking a heart is equal to 13/52 = 1/4.

What is the probability of picking two hearts in a row? Well, unlike in the binomial process, we cannot just multiply the probability of 1/4 by itself, since the probability of the second card being a heart depends on the suit of the first card:

1. if the first card was a heart (a success), then there are only 12 hearts remaining in the pack and the probability of picking another heart reduces to 12/51

2. if the first card was not a heart (a failure), then all 13 hearts are still in the pack and the probability of picking a heart increases to 13/51

Excel's HYPGEOMDIST(s,n,D,M) function calculates the probability of picking 2 hearts from two trials:

P(2 hearts) = HYPGEOMDIST(2,2,13,52) = 0.058823529

In general, direct calculation of a probability is to be preferred over simulation because it is faster and more accurate. However, probability problems quickly become far too complex for us to calculate and we then resort to simulation.





Let's now consider the following problem: I have three full packs of cards, and I draw 10 cards from each pack. What is the probability that I draw at least 10 hearts in total? This problem is more complicated as there are many combinations of samples from each pack that would give the required result, so we resort to simulation.

The solution to this problem is provided in the example model cards.
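Both card calculations can be sketched in Python. The helper `hypergeo_pmf` below uses the same argument order as Excel's HYPGEOMDIST(s,n,D,M); the three-pack simulation is an illustration and not the cards example model.

```python
import math
import random

def hypergeo_pmf(s, n, D, M):
    """P(s successes in a sample of n drawn without replacement; D successes among M)."""
    return math.comb(D, s) * math.comb(M - D, n - s) / math.comb(M, n)

p_two_hearts = hypergeo_pmf(2, 2, 13, 52)   # equals (13/52) * (12/51)

PACK = [True] * 13 + [False] * 39           # True marks a heart

def hearts_from_three_packs(rng, cards_per_pack=10):
    # rng.sample draws without replacement, one independent pack at a time.
    return sum(sum(rng.sample(PACK, cards_per_pack)) for _ in range(3))

rng = random.Random(1)
iterations = 50_000
hits = sum(hearts_from_three_packs(rng) >= 10 for _ in range(iterations))
p_at_least_ten = hits / iterations

print(f"P(2 hearts in 2 cards) = {p_two_hearts:.9f}")
print(f"P(at least 10 hearts from three packs) ~ {p_at_least_ten:.3f}")
```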

Clustered trials

Sometimes, the trials are not independent of each other but grouped together in fixed, or variable, sized groups, or clusters. For example, the number of airline passengers that might die in a year from a plane crash is strongly grouped because they are very likely to suffer the same fate if they are in an aircraft together that crashes. Other examples include:

• How many people in a village get divorced (they are paired) in a year;

• How many infected blood samples are mis-diagnosed by a laboratory (if, for example, a lab tests samples in batches and makes a mistake with a batch);

• How many manufactured items in a consignment fail to meet the required tolerance (if, for example, a machine is not set up correctly for a production run).

In such situations, one can often model the number of affected groups using a Binomial distribution. Then, if the number of individuals in the group is constant (say takes a value k), the number of successes is =k*VoseBinomial(n,p). So, for example, if we had 290 married couples in the village, and believed a married couple had a 3% probability of divorcing in a year, we would estimate seeing 2*VoseBinomial(290,3%) divorcees next year.

Alternatively, if the number in a group is variable, we need to create a model that sums a variable number of random variables. For example, imagine that there is a 5% chance of incorrectly setting up a machine to produce widgets, resulting in a bad batch of out-of-tolerance widgets, that we set up 10 machines on a day's production run, and that each machine will produce Poisson(250) widgets. What fraction of the production will be out of tolerance?

Model widgets provides the answer.
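The widget question can be sketched as a sum over a variable number of random variables. The Python below is an illustration rather than the widgets model; it assumes that a badly set-up machine spoils its entire output for the day, and `poisson_sample` is a hypothetical stand-in for VosePoisson.

```python
import random

def poisson_sample(mean, rng):
    """Draw one Poisson(mean) value by counting unit-rate arrivals before time `mean`."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

def bad_fraction(rng, machines=10, p_bad_setup=0.05, mean_output=250):
    """Fraction of one day's widgets that come from incorrectly set-up machines."""
    total = bad = 0
    for _ in range(machines):
        output = poisson_sample(mean_output, rng)
        total += output
        if rng.random() < p_bad_setup:
            bad += output               # the whole batch is out of tolerance
    return bad / total if total else 0.0

rng = random.Random(1)
iterations = 1_000
fractions = [bad_fraction(rng) for _ in range(iterations)]
mean_fraction = sum(fractions) / iterations
p_clean_day = fractions.count(0.0) / iterations
print(f"mean out-of-tolerance fraction: {mean_fraction:.3f}")
print(f"P(no bad batch in the day)    : {p_clean_day:.3f}")
```

The mean fraction is close to 5%, but the distribution is lumpy: most days have no bad widgets at all, while a bad day loses roughly a tenth of production per faulty machine.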

Probability randomly varies for the set of trials as a whole

If the probability of a trial becoming a success is a random variable itself, the resultant distribution of the number of successes is wider than a Binomial distribution. For example, the probability that the sheep in a flock will survive the winter depends on whether the winter is particularly harsh. All of the sheep will endure the same conditions, so if the probability is higher for one it is higher for all, so you could argue that the probability is variable, but the same for each sheep. A convenient way of modelling a probability that varies is to use the Beta distribution. In ModelRisk, we could then model the number of successes as:

s = VoseBinomial(n,VoseBeta(α,β))

where α and β are two parameters used to create the required shape for the Beta distribution. In fact, this distribution for s is known in probability theory as the Beta-Binomial distribution. The Beta-Binomial distribution always has a greater spread than its most closely matching Binomial distribution.

But how about scenarios where the probability of a trial's success is a random variable but each trial's probability is independent of the others? You might think that we would need to model the probability for each trial separately as a random variable, for example as shown in the following spreadsheet: indepprob.

But, in fact, we just need to use the following formula:

s = VoseBinomial(n, α /(α + β ))




where α /(α + β ) is the mean of a Beta(α , β ) distribution. The above model runs the calculation both ways

for you to compare, but can you work out why this is true?
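One way to see the answer: when each trial draws its own independent probability, that trial is still a single 0-1 outcome whose chance of success is the mean of the Beta distribution, so the trials are independent with a common success probability. The Python sketch below compares the three constructions; the values α = 2, β = 3 and n = 50 are illustrative assumptions, not taken from the indepprob model.

```python
import random
import statistics

ALPHA, BETA, N = 2.0, 3.0, 50       # illustrative values only

def shared_probability(rng):
    """One Beta draw shared by all trials: a Beta-Binomial count."""
    p = rng.betavariate(ALPHA, BETA)
    return sum(rng.random() < p for _ in range(N))

def independent_probabilities(rng):
    """A fresh Beta draw for every trial."""
    return sum(rng.random() < rng.betavariate(ALPHA, BETA) for _ in range(N))

def fixed_mean_probability(rng):
    """Plain Binomial(n, α/(α+β))."""
    p = ALPHA / (ALPHA + BETA)
    return sum(rng.random() < p for _ in range(N))

rng = random.Random(1)
iterations = 5_000
for model in (shared_probability, independent_probabilities, fixed_mean_probability):
    sample = [model(rng) for _ in range(iterations)]
    print(f"{model.__name__:26s} mean {statistics.mean(sample):6.2f} "
          f"variance {statistics.variance(sample):7.2f}")
```

All three have the same mean, but only the shared-probability version shows the extra spread of the Beta-Binomial; the other two have matching variances.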





Probability of the event

Generally the easiest and most illustrative way of calculating the probability of an event is by modelling the whole process and setting a 0-1 flag that shows whether the event has occurred or not. The mean of this 0-1 discrete distribution will give the required answer. The following model illustrates this technique:

Two people agreed to meet under a clock between 1pm and 2pm. Each agrees to wait 20 minutes for the other. What is the probability that they meet?

The spreadsheet with the solution to the problem is here: Waiting under the clock

This model has an embedded graph that provides a visual illustration of the times at which person A and person B arrive at and leave the railway station. Cells C16 and C17 calculate the arrival time for the two persons using the Uniform distribution. VoseUniform(min,max) will pick any value with equal probability within the [min,max] range.

Cells D16 and D17 calculate the departure time of the two persons by simply adding the value of 20 (cell D8) to the corresponding cell in column C.

The output is located in cell E19, which is just a flag, returning a value of 1 if the times of the persons overlap, and a value of 0 if they don't.

It is worth noting that a visual illustration of the model like the embedded graph helps you check that the model is really working, because whenever the two lines on the graph overlap, the output cell returns a 1 and when they don't, the output cell returns zero.
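The flag technique translates directly into code. The following Python sketch mirrors the spreadsheet logic described above (uniform arrival times in minutes after 1pm, each person leaving 20 minutes after arriving); the function name and iteration count are choices made for this illustration.

```python
import random

def they_meet(rng, window=60.0, wait=20.0):
    """Return 1 if the two waiting intervals overlap, otherwise 0."""
    arrive_a = rng.uniform(0.0, window)
    arrive_b = rng.uniform(0.0, window)
    leave_a, leave_b = arrive_a + wait, arrive_b + wait
    return 1 if arrive_a <= leave_b and arrive_b <= leave_a else 0

rng = random.Random(1)
iterations = 200_000
estimate = sum(they_meet(rng) for _ in range(iterations)) / iterations

# Geometric check: they miss each other when the arrivals differ by more than 20 minutes.
exact = 1 - (40 / 60) ** 2
print(f"simulated: {estimate:.4f}, exact: {exact:.4f}")
```

The mean of the flag converges on 5/9, the area of the band |A - B| ≤ 20 inside the 60 by 60 square of possible arrival times.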




The state of individuals sampled from a large or infinite population

Scenario 1: There are two possible states for each individual

If samples are taken from a very large or infinite population, then the probability that a sample will be in a particular state is simply the prevalence of that characteristic in the population. In that case, the sampling is a binomial process, and the number of individuals s in a sample of size n that will have some particular characteristic, where the prevalence of that characteristic is p, is given by:

s = VoseBinomial(n,p)

Examples:

• 20% of the bulls of country X have disease Y. If 35 bulls are taken from that population, how many will be infected with Y?
Answer: =VoseBinomial(35,20%)

• A manufacturer produces AC adapters for laptops. If there is a 2% chance that an adapter is faulty, how many faulty adapters will there be in a consignment of 100?
Answer: =VoseBinomial(100,2%)

• Advertising brochures posted to households produce a 0.3% response rate. If 100,000 brochures are sent out, how many responses will there be?
Answer: =VoseBinomial(100000,0.3%), which is very closely approximated by =VosePoisson(300)

Scenario 2: There are several possible states for each individual

Sometimes we are interested in knowing which of several mutually exclusive and exhaustive states individuals are taking in a random sample from a population. In this case, the sampling is a multinomial process, and the number of individuals in the sample that take each possible state is given by a multinomial distribution Multinomial(n, {p}). An explanation of the reasoning behind the model's construction is given in the section on multivariate trials.





The state of individuals sampled from a small population

Consider some process where individuals are being randomly sampled from a population, not placed back into the population before the next sample, and any individual from that population has equal probability of being selected. For the moment, we'll just assume that these individuals could be one of two types (e.g. male or female, infected or not infected, defective or not, Conservative or Liberal, pregnant or not, etc.).

Binomial approximation

If the population is very large relative to the sample size, the probability that each individual being sampled is of one particular category is essentially fixed. For example, if we took a sample of size 10 from a set of 1000 bolts, of which 125 were defective, the probability that the first bolt sampled is defective is 125/1000 = 0.125. The probability that the second bolt is defective is 124/999 = 0.124124... if the first bolt was defective, and 125/999 = 0.125125... otherwise. The probability that the tenth bolt is defective will be between 116/991 = 0.117053... (but that scenario has less than a 1 in 100 million chance of occurring) and 125/991 = 0.126135..., with the most likely scenario being 124/991 = 0.125126... In other words, the probability is not deviating very significantly from its initial value of 0.125 for such small samples. Thus, it is a reasonable approximation to assume that the probability is constant, which makes the sampling process follow a binomial process, and the number of defective bolts in the sample can be estimated using a binomial distribution as:

Defective bolts in sample = VoseBinomial(10, 125/1000) = VoseBinomial(10, 0.125)

A general rule of thumb (be careful, though, it depends on the level of accuracy you need) is that if the sample is less than 10% of the population, you can use the binomial approximation.

Hypergeometric model

The much more interesting situation we want to get to here is where the sample is of the same order of magnitude as the population. In this situation, it is not accurate to use the binomial approximation. In fact, this is a hypergeometric process and the distribution of defective items is a hypergeometric distribution. So, for example, if we were sampling 25 bolts from a set of 100, where 33 are defective, the distribution would be:

Defective bolts in sample = VoseHypergeo(25,33,100)




The binomial approximation would have been =VoseBinomial(25, 0.33). The figure below shows that the Binomial distribution is not sufficiently close to the Hypergeometric, but was very close for the large population example above.
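The quality of the binomial approximation in the two bolt examples can be compared numerically. The Python below is an illustrative sketch; `hypergeo_pmf` takes its arguments in the order (s, n, D, M) and the helper names are assumptions of this example.

```python
import math

def hypergeo_pmf(s, n, D, M):
    return math.comb(D, s) * math.comb(M - D, n - s) / math.comb(M, n)

def binomial_pmf(s, n, p):
    return math.comb(n, s) * p**s * (1 - p) ** (n - s)

def compare(n, D, M):
    """Return (hypergeometric variance, binomial variance, largest pmf gap)."""
    p = D / M
    hyper = [hypergeo_pmf(s, n, D, M) for s in range(n + 1)]
    binom = [binomial_pmf(s, n, p) for s in range(n + 1)]
    mean = n * p
    var_h = sum((s - mean) ** 2 * q for s, q in enumerate(hyper))
    var_b = sum((s - mean) ** 2 * q for s, q in enumerate(binom))
    gap = max(abs(h - b) for h, b in zip(hyper, binom))
    return var_h, var_b, gap

for label, args in (("10 bolts from 1000", (10, 125, 1000)),
                    ("25 bolts from 100", (25, 33, 100))):
    var_h, var_b, gap = compare(*args)
    print(f"{label}: hypergeometric variance {var_h:.4f}, "
          f"binomial variance {var_b:.4f}, largest pmf gap {gap:.4f}")
```

Sampling without replacement shrinks the variance by the factor (M-n)/(M-1), which is about 0.99 for the large population but only about 0.76 when a quarter of the population is sampled.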

A couple more examples of the hypergeometric process:

• 10 out of 45 people in the list are males. If I randomly pick 15 names from that list, how many males would I get?
Answer: =VoseHypergeo(15,10,45)

• A manufacturer produces tyres for cars. He accidentally mixed 3 defective tyres among the lot of 100. How many defective tyres would be shipped to the customer from this lot if the total number of tyres shipped is 30?
Answer: =VoseHypergeo(30,3,100)

Modelling each sample, or sub-groups of samples, separately

The hypergeometric distribution provides a probability distribution of the total number in the sample that have the characteristic of interest, but does not give us the history of how each individual sample, or groups of samples, turned out. There may be situations where we need to know that.

If we are looking at consecutive samples, we can just nest Hypergeometric distributions. Problems 1 and 2 provide some examples.

If we are interested in the outcome of each consecutive trial, each trial is just a Binomial distribution with n = 1, and p = (Number remaining 'defective')/(Number remaining in population).

Problem 1

Imagine that we produce specialist power units. We deliver these units to the client in batches of ten. The client has a quality control procedure for each consignment, as follows:

Three units are tested. If two or more of these samples are defective, the consignment is rejected. If one is defective, another three are tested, and if any of this second set are defective the consignment is also rejected. We want to construct a model that looks at the risk of rejection of a consignment for different numbers of defective power units. The model Power Units offers a solution.
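Because the procedure nests two hypergeometric samples, the rejection risk can also be calculated exactly. The Python below is a sketch of that calculation, not the Power Units model; it assumes the consignment is accepted when the first three units contain no defectives, and the function names are hypothetical.

```python
import math

def hypergeo_pmf(s, n, D, M):
    """Hypergeometric probability, returning 0 for impossible combinations."""
    if s < 0 or s > D or n - s < 0 or n - s > M - D:
        return 0.0
    return math.comb(D, s) * math.comb(M - D, n - s) / math.comb(M, n)

def rejection_probability(defective, batch=10, test_size=3):
    """P(consignment rejected) given the number of defective units in the batch."""
    # First test: reject outright on two or more defectives.
    p_two_or_more = sum(hypergeo_pmf(k, test_size, defective, batch)
                        for k in range(2, test_size + 1))
    p_exactly_one = hypergeo_pmf(1, test_size, defective, batch)
    # Second test: drawn from the remaining units, one defective already removed.
    p_second_clean = hypergeo_pmf(0, test_size, defective - 1, batch - test_size)
    return p_two_or_more + p_exactly_one * (1 - p_second_clean)

for d in range(11):
    print(f"{d:2d} defective units: P(reject) = {rejection_probability(d):.4f}")
```

Note that a batch with a single defective unit is never rejected under this procedure, since the second test can only find a defective if at least two were present.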

More than two different outcomes

So far we have dealt with scenarios where each individual can only take one of two states. However, in many problems, an individual may take several states, for example: Labour, Liberal, Conservative, or Green; not infected, sub-clinically infected, or clinically infected; Caucasian, Asian, African, or Aboriginal; Dell, Compaq, IBM, or Toshiba.





Sampling from a small population now becomes a multivariate hypergeometric process, for which the link provides generating models.




Time until an event occurs, or the lifetime of a device

The probability mathematics of the lifetime of machines and devices is the domain of reliability theory.

Reliability theory, at least the elements we consider here, concerns itself with the probability distribution of the time a component or machine will operate before failing. In the simplest case, a device is composed of one simple component, and fails when that component fails. This section looks at the distributions to use to model the lifetime of a single component. We also offer another section that demonstrates how to use the distributions to build the model of the lifetime of a device made of many components. The same distributions are also very useful for modelling the time until some specific event occurs.

First, a little mathematics...

The instantaneous failure rate z(x) of a component is defined as:

$$z(x) = \frac{f(x)}{1 - F(x)}$$

where f(x) and F(x) are the probability density function and cumulative distribution function for x in the usual way. In other words, z(x) is the rate of failure f(x) of the component at time x given that it has survived up to time x with probability 1-F(x).

It can be shown that this expression for z(x) results in an equation for f(x) (the probability density function for the lifetime of the component):

$$f(x) = z(x) \exp\left(-\int_0^x z(u)\,du\right) \qquad (1)$$
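For readers who want the intermediate step, the link between the failure rate and Equation (1) can be sketched as follows. The derivation assumes the standard definition of the failure rate as f(x)/(1-F(x)) and a lifetime that starts at x = 0, so that F(0) = 0.

```latex
\begin{align*}
z(x) &= \frac{f(x)}{1 - F(x)} = -\frac{d}{dx}\ln\bigl(1 - F(x)\bigr) \\
\Rightarrow\quad 1 - F(x) &= \exp\left(-\int_0^x z(u)\,du\right) \\
\Rightarrow\quad f(x) &= z(x)\,\bigl(1 - F(x)\bigr) = z(x)\exp\left(-\int_0^x z(u)\,du\right)
\end{align*}
```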

Two interesting results can be obtained from this equation:

The Exponential distribution

If the instantaneous failure rate z(x) is constant, i.e. z(x) = 1/β, then putting z(x) into Equation (1) gives:

$$f(x) = \frac{1}{\beta} \exp\left(-\frac{x}{\beta}\right)$$

which is the probability density function of the Exponential distribution, i.e. the Exponential distribution describes the survival time of a component given that it has a constant failure rate. β is often called the mean time between failures (MTBF) in reliability theory parlance.

A constant instantaneous failure rate means that the component 'has no memory', i.e. that it will have no greater or lesser probability of failing at any particular moment no matter how long it has already been running for. In other words, the Exponential distribution would not be appropriate to model a component with either a burn-in period in which it has a high probability of failure, or a component that has a natural limited life, so its probability of failure at any moment increases with time.

β, the mean time between failures, is a scaling factor, meaning that changing its value will change the spread of the Exponential distribution but not its shape. That should make sense because, for example, we might choose to measure time in terms of days, weeks, or years but whatever units we use should not change the distribution's shape, although it will obviously change the scale. One way to confirm that is to look at the cumulative distribution function of the Exponential distribution:

$$F(x) = 1 - \exp\left(-\frac{x}{\beta}\right)$$




From this equation, you can see that multiplying the size of β by 365 (say) would have the same effect as reducing the size of x by a factor of 365 (e.g. changing the units of x from days to years), but the functional relationship remains the same.

The Weibull distribution

If z(x) is not assumed to be constant, but rather increases or decreases smoothly with time, we can consider using the equation:

$$z(x) = \frac{\alpha\, x^{\alpha-1}}{\beta^{\alpha}} \qquad (2)$$

The equation looks unnecessarily complicated: it is in fact just z(x) = a·x^b where a (>0) and b (>-1) are constants, but the form used above helps in producing a neater equation in the next step. If α = 1, the equation for z(x) reduces to z(x) = 1/β, which is the formula that produces the Exponential distribution. If α is less than 1, z(x) decreases with time, which typifies the running-in period for a component. If α is greater than 1, z(x) increases with time, which typifies the period of the end of a component's useful life.

Putting the above Equation (2) for z(x) into the f(x) Equation (1) results in the following expression:

$$f(x) = \frac{\alpha\, x^{\alpha-1}}{\beta^{\alpha}} \exp\left(-\left(\frac{x}{\beta}\right)^{\alpha}\right)$$

which is the equation for the Weibull(α,β) distribution. Thus the Weibull distribution is typically used to model the lifetime of a component where its instantaneous failure rate is a function of time. Note that it can only model a time until failure where z(x) is either an increasing function of time or a decreasing function of time, but not both, as shown in the figure below.

The cumulative distribution function for the Weibull is:

$$F(x) = 1 - \exp\left(-\left(\frac{x}{\beta}\right)^{\alpha}\right)$$



It is quite similar to F(x) for the Exponential distribution, and we can see that β is again just a scaling factor. However, the α exponent has a very different influence than β. To demonstrate this, let's set β to 1 for convenience (since it is just a scaling factor). If α = 1, we have:

$$F(x) = 1 - \exp(-x) \qquad (3)$$

If α = 2 we have:

$$F(x) = 1 - \exp(-x^2) \qquad (4)$$

If we put values of x = 1, 2, 3 into Equation (4), it would be equivalent to putting values of x = 1, 4, 9 into Equation (3). In other words, an α parameter greater than 1 exaggerates the life of a component: it's as if the component has been working for a lot longer than it really has (compared to an Exponentially distributed time to failure). Similarly, an α value between 0 and 1 is 'shrinking' time.

It looks from the plots of Weibull distributions above that by making α small we reduce the lifetime of the component. However, as α reduces from 1 towards 0, the right tail gets extremely long and the mean time to failure actually gets much larger.
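The behaviour described in this section can be checked numerically. The following Python sketch assumes the usual Weibull(α, β) parametrisation with α as the shape and β as the scale, for which the mean is β·Γ(1 + 1/α); the function names are chosen for illustration only.

```python
from math import exp, gamma

def weibull_hazard(x, alpha, beta):
    """Instantaneous failure rate z(x) for x > 0."""
    return (alpha / beta ** alpha) * x ** (alpha - 1)

def weibull_cdf(x, alpha, beta):
    """Cumulative probability of failure by time x."""
    return 1.0 - exp(-((x / beta) ** alpha))

def weibull_mean(alpha, beta):
    """Mean time to failure."""
    return beta * gamma(1.0 + 1.0 / alpha)

# alpha = 1 gives a constant failure rate of 1/beta (the Exponential case).
print([weibull_hazard(x, 1.0, 5.0) for x in (1.0, 2.0, 3.0)])

# With beta = 1, x = 1, 2, 3 at alpha = 2 match x = 1, 4, 9 at alpha = 1.
for x in (1.0, 2.0, 3.0):
    print(x, weibull_cdf(x, 2.0, 1.0), weibull_cdf(x * x, 1.0, 1.0))

# The mean time to failure grows rapidly as alpha falls from 1 towards 0.
for alpha in (1.0, 0.5, 0.25):
    print("alpha =", alpha, "mean =", weibull_mean(alpha, 1.0))
```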

The Lognormal distribution

The Lognormal distribution is also frequently used to model lifetimes of components. It doesn't share the same instantaneous failure rate logical derivation as the Exponential and Weibull. We know that the product of a large number of random variables can be lognormally distributed. Thus, one can think of the Lognormal distribution as representing that the life of a component is a function of a large number of random factors, each of which multiply together to determine the component's actual lifetime.




Times of arrivals and wait times in a queuing system (example of using Visual Basic macros with ModelRisk)

Let's consider the following problem: a post office has one counter that it recognises is insufficient for its customer volume. It is considering putting in another counter and wishes to model the effect on the maximum number in a queue at any one time, which is considered to be a measure of the quality of its service. The post office is open from 9am to 5pm each working day. Past data show that when the doors open at 9am there will be the following number of people waiting to come in:

People    Probability
0         0.6
1         0.2
2         0.1
3         0.05
4         0.035
5         0.015

People arrive throughout the day at an average rate of 1 every 12 minutes. The amount of time it takes to serve each person is Lognormal(29,23) minutes. What is the maximum queue size in a day?

This problem requires that one simulates a day, monitors the maximum queue size during the day, and then repeats the simulation. One thus builds up a distribution of the maximum number in a queue.

This is an advanced technique and, although this problem is very simple, one can see how it can be greatly extended. For example, one could change the rate of arrival of the customers to be a function of the time of day; one could add more counters, and one could monitor other statistical parameters aside from the maximum queue size, like the maximum amount of time any one person waits or the amount of free time the people working behind the counter have.
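The logic of such a simulation can be sketched in a few lines of Python. This is not the Visual Basic macro referred to in the heading, and it rests on several assumptions that the problem statement leaves open: arrivals follow a Poisson process (exponential gaps with a mean of 12 minutes), Lognormal(29,23) is read as a mean of 29 and a standard deviation of 23 minutes, customers are served first come first served, no new customers arrive after 5pm, and 'queue' means people waiting but not yet being served.

```python
import math
import random

# Probability of 0..5 people waiting when the doors open (from the table above).
OPENING_QUEUE = {0: 0.6, 1: 0.2, 2: 0.1, 3: 0.05, 4: 0.035, 5: 0.015}

def service_time(rng, mean=29.0, sd=23.0):
    """Lognormal service time, assuming mean and sd describe the time itself
    (not its logarithm)."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return rng.lognormvariate(mu, math.sqrt(sigma2))

def max_queue_in_day(rng, counters=1, open_minutes=480.0, mean_gap=12.0):
    """Simulate one day and return the largest number of people waiting."""
    # Arrival times: the opening crowd at t = 0, then exponential gaps.
    at_open = rng.choices(list(OPENING_QUEUE), weights=list(OPENING_QUEUE.values()))[0]
    arrivals = [0.0] * at_open
    t = rng.expovariate(1.0 / mean_gap)
    while t < open_minutes:
        arrivals.append(t)
        t += rng.expovariate(1.0 / mean_gap)

    free_at = [0.0] * counters  # time at which each counter next becomes free
    starts = []                 # service start time of each customer so far
    longest = 0
    for arrive in arrivals:
        k = free_at.index(min(free_at))   # earliest available counter
        start = max(arrive, free_at[k])
        free_at[k] = start + service_time(rng)
        starts.append(start)
        # Queue just after this arrival: customers so far who have not started.
        longest = max(longest, sum(1 for s in starts if s > arrive))
    return longest

def simulate(days, counters, seed=1):
    rng = random.Random(seed)
    return [max_queue_in_day(rng, counters) for _ in range(days)]

one = simulate(500, counters=1)
two = simulate(500, counters=2)
print("mean of daily maximum queue, 1 counter: ", sum(one) / len(one))
print("mean of daily maximum queue, 2 counters:", sum(two) / len(two))
```

The lists `one` and `two` are the simulated distributions of the daily maximum queue. Extending the sketch to a time-varying arrival rate or to other statistics (for example the longest wait) only requires changes inside `max_queue_in_day`.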



Uncertainty about a population size (Bayesian inference worked example)

A game warden on a tropical island would like to know how many tigers she has on her island. It is a big island with dense jungle and she has a limited budget, so she can't search every inch of the island methodically. Besides, she wants to disturb the tigers and the other fauna as little as possible. She arranges for a capture-release-recapture survey to be carried out as follows:

Hidden traps are laid at random points on the island. The traps are furnished with transmitters that signal a catch and each captured tiger is retrieved immediately. When 20 tigers have been caught, the traps are removed. Each of these 20 tigers is carefully sedated and marked with an ear tag, then all are released together back to the positions from which they were originally caught. Some short time later, hidden traps are laid again, but at different points on the island, until 30 tigers have been caught and the number of tagged tigers is recorded. Captured tigers are held in captivity until the 30th tiger has been caught.

The game warden tries the experiment and 7 of the 30 tigers captured in the second set of traps are tagged. How many tigers are there on the island?

The warden has gone to some lengths to specify the experiment precisely. This is so that we will be able to assume with some reasonable accuracy that the experiment is taking a hypergeometric sample from the tiger population. A hypergeometric sample assumes that an individual with the characteristic of interest (in this case, being tagged) has the same probability of being sampled as any individual that does not have that characteristic (i.e. the untagged tigers). The reader may enjoy thinking through what assumptions are being made in this analysis and where the experimental design has attempted to minimise any deviation from a true hypergeometric sampling.

We will use the usual notation for a hypergeometric process:

n - the sample size = 30,

D - the number of individuals in the population of interest (tagged tigers) = 20,

M - the population (the number of tigers in the jungle). In the Bayesian inference terminology, this is given the symbol θ as it is the parameter we are attempting to estimate, and

s - the number of individuals in the sample that have the characteristic of interest = 7.

We could get a best guess for M by noting that the most likely scenario would be for us to see tagged tigers in the sample in the same proportion as they occur in the population. In other words:

$$\frac{s}{n} = \frac{D}{M}, \quad \text{i.e.} \quad M = \frac{nD}{s} = \frac{30 \times 20}{7} \approx 85.7$$

but this does not take account of the uncertainty that occurs due to the random sampling involved in the experiment. We will perform a Bayesian inference calculation to determine the uncertainty distribution for M. Let us imagine that before the experiment was started the warden and her staff believed that the number of tigers was equally likely to be any one value as any other. In other words, they knew absolutely nothing about the number of tigers in the jungle and their prior distribution is thus a discrete uniform distribution over all non-negative integers.

The likelihood function is given by the probability mass function of the hypergeometric distribution, i.e.:

$$l(X \mid \theta) = \frac{\binom{D}{s}\binom{\theta - D}{n - s}}{\binom{\theta}{n}} = \frac{\binom{20}{7}\binom{\theta - 20}{23}}{\binom{\theta}{30}}$$




The likelihood function is zero for values of θ below 43 since the experiment tells us that there must be at least 43 tigers: 20 that were tagged plus the (30-7) that were caught in the recapture part of the experiment and were not tagged.

The probability mass function applies to a discrete distribution and equals the probability that exactly s events will occur. Excel provides a convenient function HYPGEOMDIST(s, n, D, M) which will calculate the hypergeometric distribution mass function automatically. We know that the total confidence must add up to one, which is done in column F to produce the normalized posterior distribution. The shape of this posterior distribution is shown below by plotting column B against column F from the spreadsheet.

The graph peaks at a value of 85 as we would expect, but it appears cut off at the right tail, which shows that we should also look at values of θ larger than 150. The analysis is repeated for values of θ up to 300 and this more complete posterior distribution is plotted below:
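The same calculation can be sketched outside the spreadsheet. The Python below stands in for the HYPGEOMDIST column and the normalising column; the variable names are illustrative, and the uniform prior is truncated at θ = 300 as in the text.

```python
from math import comb

def likelihood(theta, s=7, n=30, D=20):
    """Hypergeometric probability of s tagged tigers in a sample of n,
    given D tagged tigers in a population of theta."""
    if theta < D + (n - s):
        return 0.0  # fewer tigers than have actually been seen
    return comb(D, s) * comb(theta - D, n - s) / comb(theta, n)

thetas = list(range(0, 301))        # tested values of theta
prior = [1.0] * len(thetas)         # discrete uniform prior (unnormalised)
weights = [p * likelihood(t) for t, p in zip(thetas, prior)]
total = sum(weights)
posterior = [w / total for w in weights]   # normalised to sum to one

mode = thetas[posterior.index(max(posterior))]
print("posterior mode:", mode)
print("P(theta <= 150):", sum(p for t, p in zip(thetas, posterior) if t <= 150))
```

The second printed value indicates how much of the posterior would have been lost had the tested range stopped at 150.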



This second plot represents a good model of the state of the warden's knowledge about the number of tigers on that island. Don't forget that this is a distribution of belief and is not a true probability distribution since there is an exact number of tigers on that island.

In this example, we had to adjust our range of tested values of θ in light of the posterior distribution. It is quite common to review the set of tested values of θ, either expanding the prior's range or modelling some part of the prior's range in more detail when the posterior distribution is concentrated around a small range. It is entirely appropriate to expand the range of the prior as long as we would have been happy to have extended our prior to the new range before seeing the data. However, it would not be appropriate if we had a much more informed prior belief that gave an absolute range for the uncertain parameter that we are now considering stepping outside of. This would not be right because we would be revising our prior belief in light of the data: putting the cart before the horse, if you like. However, if the likelihood function is concentrated very much at one end of the range of the prior, it may well be worth reviewing whether the prior distribution or the likelihood function are appropriate, since the analysis could be suggesting that the true value of the parameter lies outside the preconceived range of the prior.

Continuing with our tigers on an island, let us imagine that the warden is unsatisfied with the level of uncertainty that remains about the number of tigers which, from 50 to 250, is rather large. She decides to wait a short while and then capture another 30 tigers. The experiment is completed and this time t tagged tigers are captured. Assuming that a tagged tiger still has the same probability of being captured as an untagged tiger, what is her uncertainty distribution now for the number of tigers on the island?

This is simply a replication of the first problem, except that we no longer use a discrete uniform distribution as her prior. Instead, the distribution plotted above represents the state of her knowledge prior to doing this second experiment and the likelihood function is now given by the Excel function HYPGEOMDIST(t, 30, 20, θ). The six panels below show what the warden's posterior distribution would have been if the second experiment had trapped t = 1, 3, 5, 7, 10 and 15 tagged tigers instead. These posteriors (in black) are plotted together with the prior (in blue) and the likelihood functions (in red), all normalized to sum to 1 for ease of comparison.

You might initially imagine that performing another experiment would make you more confident about the actual number of tigers on the island, but the graphs show that this is not necessarily so. The posterior distributions for the two panels below are now more spread than the prior because the data contradict the prior (the prior and likelihood peak at very different values of θ).

In the case of 5 tigers, the data disagree moderately with the prior but the extra information in the data compensates for this, leaving us with about the same level of uncertainty but with a posterior distribution that is to the right of the prior.

The right panel below (example with 7 tigers) represents the scenario where the second experiment has the same results as the first. You'll see that the prior and likelihood overlay on each other because the prior of the first experiment was uniform and therefore the posterior shape was only influenced by the likelihood function. Since both experiments produced the same result, our confidence is improved and remains centred around the best guess of 85:




In the last panels below, the likelihood functions disagree with the priors, yet the posterior distributions have a narrower uncertainty. This is because the likelihood function is placing emphasis on the left tail of the possible range of values for θ, which is bounded at θ = 43:

In summary, these six panels show that the amount of information contained in data is dependent on two things: (1) the manner in which the data were collected (i.e. the level of randomness inherent in the collection), which is described by the likelihood function, and (2) the state of our knowledge prior to observing the data and the degree to which it compares with the likelihood function. If the data tell us what we are already fairly sure of, there is little information contained in the data for us (though the data would contain much more information for those more ignorant of the parameter). On the other hand, if the data contradict what we already know, our uncertainty may either reduce or increase depending on the circumstances. Thus, you could consider that the amount of information in a data set can be measured by the degree to which our opinion changes. Alternatively, taking a more decision-focused view, there is only information in data if it changes what we would choose to do in managing the risk issue.
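The six scenarios can be reproduced numerically to compare the spread of each posterior with that of the prior. The sketch below is an illustration in Python, not the spreadsheet behind the panels; it assumes the first posterior (uniform prior, 7 tagged out of 30, θ tested from 43 to 300) as the prior for the second experiment, and uses the standard deviation as a simple measure of spread.

```python
from math import comb, sqrt

def likelihood(theta, tagged_seen, n=30, D=20):
    """Hypergeometric probability of seeing tagged_seen tagged tigers."""
    if theta < D + (n - tagged_seen):
        return 0.0
    return comb(D, tagged_seen) * comb(theta - D, n - tagged_seen) / comb(theta, n)

def normalise(weights):
    total = sum(weights)
    return [w / total for w in weights]

def mean_sd(values, probs):
    m = sum(v * p for v, p in zip(values, probs))
    var = sum((v - m) ** 2 * p for v, p in zip(values, probs))
    return m, sqrt(var)

thetas = list(range(43, 301))
# Posterior of the first experiment becomes the prior for the second.
prior = normalise([likelihood(th, 7) for th in thetas])
prior_mean, prior_sd = mean_sd(thetas, prior)
print("prior: mean %.1f, sd %.1f" % (prior_mean, prior_sd))

results = {}
for t_tagged in (1, 3, 5, 7, 10, 15):
    post = normalise([p * likelihood(th, t_tagged) for th, p in zip(thetas, prior)])
    results[t_tagged] = (post, mean_sd(thetas, post))
    print("t = %2d: mean %.1f, sd %.1f" % ((t_tagged,) + results[t_tagged][1]))
```

Comparing each printed standard deviation with that of the prior shows directly which outcomes of the second experiment would narrow the warden's uncertainty and which would widen it.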



Uncertainty about a population statistic

Statistics is obviously a very large field, and much of it is beyond the scope of this particular training program. However, there are several commonly known results which are readily adapted to risk analysis modelling.

Classical statistics tends to confine itself to producing confidence intervals when estimating the value of some parameter. For example, assuming that the correct statistical analysis has been performed, a classical statistician might produce a comment something like this:

'The population mean is estimated to be 5.2 with a 95% CI of [4.8, 5.6]'

This means that the true population mean is unknown (but presumed a fixed value) and that we are 95% confident that the true value lies somewhere between 4.8 and 5.6. It is not the same as saying there is a 95% probability, or a 95% chance that the mean will be between these limits: it IS where it is, with 100% probability. Neither can we say that 95% of the time it will be between these limits: again, it IS where it is.

Classical statisticians are often reluctant to move beyond quoting confidence intervals and describe entire uncertainty distributions. In most cases, statistics will give us any confidence intervals we require, which logically means that we have all the points necessary to define a distribution. The reluctance may be that, in providing a distribution, one may be seen to be giving the impression that the parameter is a random variable (rather than just an unknown fixed value). However, in risk analysis we are only able to compound all the uncertainties if we define the entire distribution. This section provides a reformulation of the most common statistics results into distributions we can use in Monte Carlo simulation.

There are a number of traditional statistical techniques available for quantifying parameters under certain assumptions. These techniques are often considered to be exact techniques, but this is only true if the assumptions made in the statistical model are correct. Traditional statistical models have usually assumed either a binomial or normal (Gaussian) model. The Normal distribution certainly very closely approximates a large number of distributions under certain conditions, usually when the mean is much larger than the standard deviation (the Normal approximation to a number of distributions is discussed here) and so these classical statistics techniques have found very wide application. However, one needs to be cautious in using them when the assumption of normality is not very accurate, and it is often difficult to appreciate the degree of inaccuracy one is adding by such approximation.

Estimating the mean of a Normal distribution

Standard deviation unknown

For a given set of n data values randomly sampled from an assumed Normal distribution, with unknown mean µ and unknown standard deviation σ, the distribution of uncertainty of the true mean is calculated from a Student-t distribution:

$$\mu = t(n-1)\,\frac{\hat{\sigma}}{\sqrt{n}} + \bar{x} \qquad (1)$$

where:

t(n-1) is a Student-t distribution with (n-1) degrees of freedom, $\bar{x}$ is the mean of the sample values and $\hat{\sigma}$ is the unbiased single point estimate of the true standard deviation, calculated in Excel with its STDEV() function.

Note that this result is often known as the t-test.




So, if we had a set of data in a column, and we named that array 'Data', we could create a distribution of our uncertainty about the population mean with Excel/ModelRisk as follows:

=VoseStudent(COUNT(Data)-1)*STDEV(Data)/SQRT(COUNT(Data))+AVERAGE(Data)
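For readers working outside Excel, the same construction can be sketched with Python's standard library. The data values and function names below are hypothetical; the Student-t sample is built from a standard Normal and a chi-squared variable, which is an implementation choice of this sketch and not necessarily how VoseStudent works internally.

```python
import math
import random
import statistics

def student_t(rng, df):
    """One sample from a Student-t distribution with df degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)   # chi-squared(df) as a Gamma
    return z / math.sqrt(chi2 / df)

def mean_uncertainty(data, n_samples, rng):
    """Samples of the uncertainty distribution for the population mean."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)               # n-1 divisor, like Excel's STDEV
    return [student_t(rng, n - 1) * s / math.sqrt(n) + xbar
            for _ in range(n_samples)]

data = [4.9, 5.1, 5.3, 5.0, 5.6, 5.2, 4.8, 5.4]   # hypothetical measurements
samples = sorted(mean_uncertainty(data, 20000, random.Random(7)))
print("sample mean:", statistics.mean(data))
print("5th and 95th percentiles of the uncertainty about the mean:",
      samples[1000], samples[19000])
```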

The Student-t distribution is unimodal and symmetric about zero. The formula therefore centres the uncertainty distribution of the value of the true mean µ around the sample mean $\bar{x}$, which is the 'best guess'. It also has a spread that increases with the sample standard deviation $\hat{\sigma}$ and decreases with the square root of the sample size n. The reduction of uncertainty as a square root of the number of data points is a very common theme in statistics.

Standard deviation known

Occasionally, it is possible that the mean is unknown but the standard deviation is known, for example when using some specific piece of equipment to take measurements (see measurement theory below). For a given set of n data values randomly sampled from an assumed Normal distribution, with unknown mean µ and known standard deviation σ, the distribution of uncertainty of the true mean is calculated from a Normal distribution:

$$\mu = \text{Normal}\left(\bar{x}, \frac{\sigma}{\sqrt{n}}\right) \qquad (2)$$

which can be rewritten as:

$$\mu = N(0,1)\,\frac{\sigma}{\sqrt{n}} + \bar{x} \qquad (3)$$

where:

N(0,1) is a unit Normal distribution (a Normal distribution with mean = 0, standard deviation = 1), $\bar{x}$ is the mean of the sample values and σ is the true population standard deviation. This result is often known as the z-test.

Equation 3 looks very like Equation 1. In fact the t-distribution approaches the N(0,1) as n gets bigger, as shown in the figure below. The t-distribution has more spread than the Normal because it takes into account the additional uncertainty that comes from not knowing the population standard deviation. However, as n approaches 20-30, the difference is negligible, which brings about the rule of thumb that says one can use a z-test when you have 30 or so data points (and, of course, you believe the underlying population distribution is Normal).



So, if we had a set of data in a column, and we named that array 'Data', we could create a distribution of our uncertainty about the population mean with Excel/ModelRisk as follows:

=VoseNormal(AVERAGE(Data), σ/SQRT(COUNT(Data)))

Measurement theory

In measurement theory, one is frequently trying to obtain an estimate of a non-varying measurable quantity, for example the length of an object. Repeated measurements are taken of that length and either Equation 1 or Equation 3 is used, depending on whether one knows the standard deviation of error for the measurement technique, to describe the uncertainty about the true length. In this context, the quantity $\sigma/\sqrt{n}$ is referred to as the standard error of the mean.

Estimating the standard deviation of a Normal distribution

Mean and standard deviation both unknown

For a given set of n data values randomly sampled from a Normal distribution, with unknown mean µ and unknown standard deviation σ, the distribution of uncertainty of the true standard deviation is calculated from the formula:

$$\sigma = \hat{\sigma}\sqrt{\frac{n-1}{\chi^2(n-1)}} \qquad (4)$$

where $\chi^2(n-1)$ is a chi-squared distribution with n-1 degrees of freedom and has a mean of (n-1), so Equation 4 centres around $\hat{\sigma}$. $\hat{\sigma}$ is again the unbiased single point estimate of the true standard deviation, calculated in Excel with its STDEV() function.

Written in Excel/ModelRisk, the following formula generates values from the resultant uncertainty distribution:

=STDEV(Data)*SQRT((COUNT(Data)-1)/VoseChisq(COUNT(Data)-1))
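The spreadsheet formula translates directly into other environments. The Python sketch below uses only the standard library, with the chi-squared variable sampled as a Gamma distribution; the data values and function name are hypothetical.

```python
import math
import random
import statistics

def sd_uncertainty(data, n_samples, rng):
    """Samples of the uncertainty distribution for the population
    standard deviation when the mean is also unknown."""
    n = len(data)
    s = statistics.stdev(data)                      # like Excel's STDEV
    samples = []
    for _ in range(n_samples):
        chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)  # chi-squared(n-1)
        samples.append(s * math.sqrt((n - 1) / chi2))
    return samples

data = [4.9, 5.1, 5.3, 5.0, 5.6, 5.2, 4.8, 5.4]     # hypothetical measurements
sd_samples = sorted(sd_uncertainty(data, 20000, random.Random(11)))
print("sample standard deviation:", statistics.stdev(data))
print("5th, 50th and 95th percentiles of the uncertainty:",
      sd_samples[1000], sd_samples[10000], sd_samples[19000])
```

With so few data points the resulting distribution is noticeably right-skewed, which is a useful reminder of how little a small sample says about spread.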

Mean known and standard deviation unknown

In the rarer case where the mean of the Normal distribution is known, the uncertainty about the standard deviation is given by Equation 5 and the following spreadsheet model:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \mu)^2}{\chi^2(n)}} \qquad (5)$$

A1: {=SUM((Data-mu)^2)}

an array formula entered into a cell by typing =SUM((Data-mu)^2) and then pressing CTRL-SHIFT-ENTER

A2: =SQRT(A1/VoseChisq(COUNT(Data)))

Equations 4 and 5 are both versions of what is often called the Chi-squared test.



Uncertainty about a probability, fraction or prevalence

In risk analysis, we are frequently faced with having to estimate a probability, a fraction or a prevalence. We usually have some data that would help us produce this estimate, that come from surveys, experiments, or even computer simulations. If we can be sure that the data are collected according to a binomial process, we can use the Beta distribution to describe our uncertainty about the prevalence, fraction or probability by applying the formula:

p = Beta(s+1, n-s+1)

where n is the number of trials or samples, and s is the number of 'successes'.

The Beta distribution has a domain of [0,1] so is an immediate contender to model uncertainty or randomness about a probability, fraction or prevalence. However, there are more technical reasons for using the Beta distribution here; namely that it is the conjugate to the Binomial distribution and the above formula is the result of a Bayesian inference calculation with an uninformed prior. Translation for the layperson: the Beta distribution is the direct result of a statistical analysis where we assume that the data come from a binomial process, and where we knew nothing about the parameter p being estimated, prior to collecting these data.
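As a brief illustration of the formula, the Python sketch below samples the uncertainty about p for a hypothetical survey with s = 3 successes in n = 20 trials; the numbers and the function name are assumptions made for the example.

```python
import random
import statistics

def probability_uncertainty(s, n, n_samples, rng):
    """Samples of p = Beta(s+1, n-s+1) after s successes in n trials."""
    return [rng.betavariate(s + 1, n - s + 1) for _ in range(n_samples)]

s, n = 3, 20                                   # hypothetical survey result
p_samples = sorted(probability_uncertainty(s, n, 20000, random.Random(5)))
print("observed fraction s/n:", s / n)
print("mean of the Beta distribution, (s+1)/(n+2):", (s + 1) / (n + 2))
print("5th and 95th percentiles of the uncertainty about p:",
      p_samples[1000], p_samples[19000])
```

Note that the mean of the distribution, (s+1)/(n+2), is not quite the observed fraction s/n; the two converge as n grows.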




Uncertainty about the rate at which things occur in time or space

Lightning strikes, car accidents, machine failures, political crises, disease outbreaks: these are all random events in time that can be thought of as being independent of each other. Daisies on a lawn, bacteria in a liquid, mould in a silo, diamonds in a rock: these can all be thought of as random events in either two (surface) or three (volume) dimensional space.

The most common approach to modelling a distribution of how many of these events α might occur in a given amount of time or space t is to assume that the counts are from a Poisson process, in which case the counts will take a Poisson distribution:

Counts α = Poisson(λ*t)

where λ is the mean (expected) number of events that would occur per unit t. Care needs to be taken with the units of λ and t to ensure that they match. The product λ*t is the expected number of events over the period t and is sometimes called the Poisson intensity.

By applying Bayesian inference with Poisson probabilities we arrive at a neat solution to the uncertainty we have about λ, when we have observed α events in a time t:

λ = Gamma(α, 1/t)

Example

You have observed 12 sporadic (i.e. each occurring independently of the others) cases of disease X in your country in the last 4 years. How many will there be next year, if the underlying risk remains constant? What is the probability that there will be greater than 6 cases next year?

Assuming that the Poisson process applies, we first need to estimate λ :

λ = VoseGamma(12, 1/t) = VoseGamma(12, 0.25)

The graph above shows that with the amount of information we have about λ, we believe it is very likely to lie between 1 and 6 expected cases/year.

How many cases will there be next year? The answer is =VosePoisson(λ*1). If we wish to model a first order distribution, we write: =VosePoisson(VoseGamma(12, 0.25)) = VoseNegBin(12, 0.8). If we wish to model a second order distribution (one that separates uncertainty and randomness), the answer comes from taking random samples from the Gamma distribution and, for each of these samples, calculating the complete Poisson distribution. The answer is therefore a set of possible probability distributions. The figure below shows the two options. Either is acceptable, depending on management needs, but what is not acceptable is to write =VosePoisson(12/4): in other words, to ignore the uncertainty we have about λ. This third, incorrect option is also shown below.

What is the probability that there will be greater than 6 cases next year? The Excel function POISSON will calculate the probability of there being less than or equal to six cases next year, so we can use that to determine the probability we are actually interested in:

=1-POISSON(6,VoseGamma(12,0.25),1)

where the uncertainty about the intensity is provided by the embedded Gamma distribution. Running a simulation for this cell gives the following output:

This plot shows that with the level of historical information, we believe with 80% confidence that there is between a 0.4% and 12.7% probability of having more than six cases next year.
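The same calculation can be sketched in Python with the standard library only. The helper functions below are written for this illustration (they are not ModelRisk or Excel functions), and the Negative Binomial is taken to count failures before the 12th success, which is the convention under which the Gamma-Poisson mixture equals NegBin(12, 0.8).

```python
import math
import random

def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson distribution with mean lam."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total

def negbin_pmf(k, r, p):
    """P(k failures before the r-th success), success probability p."""
    return math.comb(k + r - 1, k) * p ** r * (1.0 - p) ** k

rng = random.Random(1)
alpha, t = 12, 4.0
# Second order view: uncertainty about lambda, then P(more than 6 cases) for each.
lam_samples = [rng.gammavariate(alpha, 1.0 / t) for _ in range(20000)]
p_more_than_6 = sorted(1.0 - poisson_cdf(6, lam) for lam in lam_samples)
print("10th and 90th percentiles of P(more than 6 cases):",
      p_more_than_6[2000], p_more_than_6[18000])

# First order view: a single probability from the Negative Binomial.
exact = 1.0 - sum(negbin_pmf(k, 12, 0.8) for k in range(7))
mean_probability = sum(p_more_than_6) / len(p_more_than_6)
print("NegBin(12, 0.8) P(more than 6 cases):", exact)
print("mean of the simulated probabilities: ", mean_probability)
```

The mean of the second order probabilities should agree closely with the single first order value, which illustrates how the two views relate: the first order distribution averages over the uncertainty that the second order distribution keeps visible.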



About

About this Help File

To reference this help file

Please quote: Help File for ModelRisk Version 5, © Vose Software (2007).

Referencing system and glossary

Each Help File topic page has a reference number. This is shown at the bottom of the page, like this:

© Vose Software™ 2007. Reference Number: M0239

The reference number Mxxxx makes it easy to refer to a particular topic and locate it. For example, this page has reference number M0022. By selecting the Search facility in this help file, typing this code into the keyword field and clicking List Topics the page will be the only one listed:

Authors

Principal authors

David Vose, Timour Koupeev, Michael Van Hauwermeiren, Wouter Smet, Stijn van den Bossche.

Terms and conditions




Vose Software cannot accept any responsibility for any possible errors or omissions that may be present in this Help File. Should you encounter an error, have a suggestion or would just like to give us your opinion about a topic, please contact [email protected].

Copyright

This Help File is the property of Vose Software BVBA, Belgium.



About Vose - contacting us

Vose Software BVBA is a software development company that was created by internationally recognized specialists in risk analysis. ModelRisk is thus "designed by risk analysts for risk analysts".

We provide off-the-shelf and custom risk analysis applications, as well as training on how to use our products. Vose Software trainers are professional risk analysts: they understand the reality of analyzing risk and the strengths and limitations of our software products.

We also provide training and consulting in a wide range of fields and types of problems independently of our software products, and will use any software tool that is most appropriate to the problem and the client’s requirements.

Our offices - location and contact details

Please see www.vosesoftware.com/contactus.php




Updates

ModelRisk and the content of this help file are updated regularly. As a registered ModelRisk user with a maintenance plan, you will receive automatic notification of updates. A log of update changes is available at www.vosesoftware.com/mrupdatelog.php



FAQ - Troubleshooting

What follows is a list of known issues and problems with ModelRisk and their solutions.

For up to date help and answers to your questions, please refer to the ModelRisk support section on our website.

Should that not work, please fill in the feedback form on our web site and we will follow up with you as soon as possible.

ModelRisk is not loaded when I start Excel

This can be caused by Excel disabling the ModelRisk add-in because it was not shut down properly after the previous session. Try following these steps:

Excel 2007

1. Click the Office button and then Excel Options.

2. Go to the Add-Ins section.

3. From the Manage drop-down menu, select Disabled items and then click the Go button.

4. From the list, select Addin:ModelRisk and then click the Enable button.

Now when you restart Excel/ModelRisk, the problem should be solved.

Excel 2003 and earlier

1. Click the Help menu and then choose About Microsoft Excel.

2. Click the Disabled Items button.

3. Select ModelRisk from the list and press the Enable button.

Now when you restart Excel/ModelRisk, the problem should be solved.

Example models open within this help file window and not in Excel

(instructions for Windows XP)

1. Open Windows Explorer by right-clicking the Windows start button and selecting Explore from the menu.

2. Select Folder Options... from the Tools menu.

3. The Folder Options dialog opens. Select the File types tab.

4. From the list of Registered file types, locate and select XLS - Microsoft Excel Worksheet:





5. Click the Advanced button.

6. In the Edit file type dialog that appears, make sure Browse in same window is NOT marked:

7. Click OK to close this window. Now when you click a link to an example model in this help file, it should open in Excel.

Modeling




Why do I get a #VALUE! in a cell with a ModelRisk function?

ModelRisk functions return error messages when the parameters are incorrect. For example, VoseNormal(mu,sigma) generates random samples from a Normal distribution with mean mu and standard deviation sigma. However, if one inputs a negative value for sigma, the function returns the error message:

“Error: sigma must be >= 0”

If the formula in a cell includes this function in a calculation, Excel is unable to evaluate the formula and displays #VALUE!. For example, you will get #VALUE! in a cell containing the formula:

=10 + VoseNormal(100, -10)

You can easily see whether this is the reason for the error display using Excel’s Evaluate Formula feature, in the Formula ribbon or toolbar:

Clicking the Evaluate button displays the problem:

Alternatively, using ModelRisk’s View Function tool you will see the error message displayed.
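The mechanism can be illustrated outside Excel. The Python sketch below is only an analogy and is not ModelRisk code: the hypothetical function vose_normal_like returns a text message when sigma is negative, and adding a number to that text fails in much the same way that Excel displays #VALUE!.

```python
import random

def vose_normal_like(mu, sigma):
    """Hypothetical stand-in for VoseNormal: a random sample, or an error text."""
    if sigma < 0:
        return "Error: sigma must be >= 0"
    return random.gauss(mu, sigma)

def evaluate(formula):
    """Mimic a spreadsheet cell: arithmetic on a text value yields #VALUE!."""
    try:
        return formula()
    except TypeError:
        return "#VALUE!"

# The function on its own shows the readable error text.
direct = vose_normal_like(100, -10)

# Wrapped in a calculation, the text cannot be added to 10.
in_formula = evaluate(lambda: 10 + vose_normal_like(100, -10))

print(direct)
print(in_formula)
```

As in Excel, the readable message is only visible when the function is inspected on its own, which is what the Evaluate Formula and View Function tools do.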





Glossary

A

A-D test: see Anderson-Darling test

Acceptable risk: Risk level judged to be compatible with the required amount of protection

Accuracy: Accuracy is the degree to which a statistical estimate based on a large number of observations will match the "true" value. If the measurement system has a bias, it may be precise (arrive at a stable, repeatable estimate) but this estimate will not reflect the true value.

Anderson-Darling test: The A-D test is similar to the Kolmogorov-Smirnov test in determining whether a data set could have come from a particular distribution. The K-S statistic is the greatest vertical difference between the empirical and fitted distributions' cumulative probability curves over all values of the variable. This tends to focus fitting at the middle of the distribution. The A-D statistic measures the area between these two curves, weighted across the variable's range for how probable such a difference could have occurred by chance if the fitted distribution were correct. The A-D test is superior to the K-S test, but they usually give very similar answers, and the A-D test needs modification for each distribution type being fitted.
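The two statistics can be computed by hand as in the Python sketch below. It uses the standard textbook formulas rather than anything taken from ModelRisk, and the data values and the fitted Normal(0, 1) are made up for illustration.

```python
import math
from statistics import NormalDist

def ks_statistic(data, cdf):
    """Greatest vertical gap between the empirical and fitted CDFs."""
    xs = sorted(data)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n)
               for i, x in enumerate(xs))

def ad_statistic(data, cdf):
    """Anderson-Darling A-squared: tail-weighted distance between the CDFs."""
    xs = sorted(data)
    n = len(xs)
    total = sum((2 * i + 1) * (math.log(cdf(xs[i]))
                               + math.log(1 - cdf(xs[n - 1 - i])))
                for i in range(n))
    return -n - total / n

fitted = NormalDist(mu=0, sigma=1)            # assumed fitted distribution
data = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]       # illustrative sample

ks = ks_statistic(data, fitted.cdf)
ad = ad_statistic(data, fitted.cdf)
print(ks, ad)
```

Turning either statistic into a significance level needs critical values, which for the A-D test depend on the distribution type being fitted.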

Array formula: An Excel formula that has multiple cells (i.e. an array) as output. It is entered by selecting the desired output range, typing the formula and then pressing CTRL+SHIFT+ENTER. Excel automatically inserts curly brackets to indicate it is an array formula, like this: {=ArrayFormula(parameter)}

Autocorrelation: A relationship in time series data in which elements in a sample set are correlated, positively or negatively, to previous elements in the sample set. If a time series has autocorrelation, then the past behaviour may be able to predict the future.

B

Beta Distribution: Beta distribution is a flexible, bounded PDF described by two shape parameters. It is commonly used when a range of the random variable is known.

Bias: Bias is a term which refers to how far the average statistic lies from the parameter it is estimating, that is, the error which arises when estimating a quantity. Errors from chance will cancel each other out in the long run; those from bias will not.

Boxplot: Boxplot is a graphical representation showing the center and spread of a distribution, along with

a display of outliers.

C

Central Limit Theorem: The Central Limit Theorem says that for a relatively large sample size, the random variable x bar (the mean of the samples) is approximately normally distributed, regardless of the population’s distribution.
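A short simulation makes this concrete. The Python sketch below is an illustration under assumed settings (an Exponential population with mean 1, samples of size 50, 2000 repetitions, a fixed seed): although the population is strongly skewed, the sample means cluster around 1 with a spread close to 1/sqrt(50), as a Normal approximation would predict.

```python
import math
import random
from statistics import mean, stdev

rng = random.Random(42)          # fixed seed so the run is repeatable
sample_size = 50
repetitions = 2000

# Each entry is the mean of one sample drawn from a skewed Exponential(1) population.
sample_means = [mean(rng.expovariate(1.0) for _ in range(sample_size))
                for _ in range(repetitions)]

grand_mean = mean(sample_means)
spread = stdev(sample_means)
predicted_spread = 1 / math.sqrt(sample_size)   # population sd / sqrt(n)

# Share of sample means within 1.96 predicted standard errors of the true mean.
within = sum(abs(m - 1.0) <= 1.96 * predicted_spread for m in sample_means) / repetitions
print(grand_mean, spread, predicted_spread, within)
```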

Chi-Squared Goodness of Fit Test: The Chi-Squared Goodness of Fit Test is a test for comparing a theoretical distribution, such as a Normal, Poisson etc., with the observed data from a sample.

Coefficient of Variation: Coefficient of Variation is an estimate of relative standard deviation. It equals the standard deviation divided by the mean. Results can be represented in percentages for comparison purposes.

Conditional Probability: The probability of an event occurring conditioned on some other event already

having occurred.

Confidence Interval: A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.

Confidence Limits: Confidence limits are the lower and upper values of a confidence interval





Continuous Probability Distribution: Continuous Probability Distribution is a probability distribution that describes a set of uninterrupted values over a range. In contrast to the Discrete distribution, the Continuous distribution assumes there are an infinite number of possible values.

Continuous Random Variable: A continuous random variable is one which can take any value within its

range.

Correlation: Correlation is an investigation of the measure of statistical association among random variables based on samples. Widely used measures include the linear correlation coefficient (also called the product-moment correlation coefficient or Pearson correlation coefficient), and such non-parametric measures as Spearman rank-order correlation coefficient, and Kendall's tau. When the data are nonlinear, non-parametric correlation is generally considered to be more robust than linear correlation.

Correlation Coefficient: A correlation coefficient is a number between -1 and 1 which measures the degree to which two variables are linearly related. If there is a perfect linear relationship with positive slope between the two variables, we have a correlation coefficient of 1; if there is positive correlation, whenever one variable has a high (low) value, so does the other. If there is a perfect linear relationship with negative slope between the two variables, we have a correlation coefficient of -1; if there is negative correlation, whenever one variable has a high (low) value, the other has a low (high) value. A correlation coefficient of 0 means that there is no linear relationship between the variables.
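The difference between the linear (Pearson) and rank-order (Spearman) coefficients can be seen in a small Python sketch. The data are made up, and the rank function assumes there are no tied values to keep the example short.

```python
import math

def pearson(xs, ys):
    """Linear (product-moment) correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n of the values; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(xs, ys):
    """Rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

x = [1, 2, 3, 4, 5]
y_linear = [2 * v + 1 for v in x]        # perfect straight line
y_curved = [math.exp(v) for v in x]      # increasing, but not linear

print(pearson(x, y_linear), pearson(x, y_curved), spearman(x, y_curved))
```

For the curved data the Pearson coefficient falls below 1 even though y always rises with x, while the Spearman coefficient stays at 1 because only the ordering matters.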

Covariance: A measure of the degree to which the values of two variables move in tandem. A positive covariance indicates that the two variables move together, while a negative covariance means that they vary inversely.

Cumulative Distribution Function: All random variables (discrete and continuous) have a cumulative

distribution function. It is a function giving the probability that the random variable X is less than

or equal to x, for every value x.

Cumulative Frequency Distribution: Cumulative Frequency Distribution is a chart that shows the

number or proportion (or percentage) of values less than or equal to a given amount.

D

Degrees of freedom: The number of elements in the calculation of a statistic that are free to vary, effectively equal to the number of observations in a data set minus the number of statistical parameters being estimated from that data.

Derivative: A financial instrument whose value is determined or derived from the values of an underlying, or primitive, instrument. Derivatives can be traded on organized exchanges or privately negotiated over-the-counter. Swaps, forwards, futures, and options are all examples of derivatives.

Deterministic Model: A Deterministic Model, as opposed to a stochastic model, is one which contains no random elements.

Discrete Probability Distribution: Discrete Probability Distribution is a probability distribution that describes distinct values, usually integers, with no intermediate values. In contrast, the continuous distribution assumes there are an infinite number of possible values.

Discrete Random Variable: A discrete random variable is one which may take only a number of distinct

values such as 0, 1, 2, 3, 4, ... or 0, 1/3, 2/3, ....

Dispersion: The variation between observations. There are several measures of dispersion, the most common being the standard deviation. In manufacturing or measurement, high precision is associated with low dispersion.

Distribution: Distribution is the pattern of variation of a random variable.

Domain: The domain of a distribution is the set of possible values a variable with that distribution may

take.




E

Enterprise Risk Management: The process by which an organization takes a holistic approach to managing its risks, measuring all types of risks and exposures to these risks throughout the entire organization.

ERM: See Enterprise Risk Management

Estimate: An estimate is an indication of the value of an unknown quantity based on observed data.

Estimation: Estimation is the process by which sample data are used to indicate the value of an unknown parameter. Results of estimation can be expressed as a single value, known as a point estimate, or a range of values, known as a confidence interval.

Estimator: An estimator is any quantity calculated from the sample data which is used to give information about an unknown quantity in the population. For example, the sample mean is an estimator of the population mean.

Expected value: see Mean

Exponentially-Weighted Moving Average: A method of calculating the expected volatility of a time

series using historical data which gives a higher weight to the more recent past. A simplified

version of GARCH.
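A minimal Python sketch of the idea follows. The recursion (new variance = lambda x old variance + (1 - lambda) x latest squared return), the decay factor of 0.94 and the choice to seed the recursion with the first squared return are common conventions assumed here for illustration; they are not taken from ModelRisk.

```python
import math

def ewma_volatility(returns, lam=0.94):
    """Exponentially weighted volatility estimate from a list of returns."""
    variance = returns[0] ** 2                  # seed value (an assumption)
    for r in returns[1:]:
        # Older information decays by lam; the newest squared return gets 1 - lam.
        variance = lam * variance + (1 - lam) * r ** 2
    return math.sqrt(variance)

calm = [0.01] * 100                             # made-up returns of 1% each
recent_shock = ewma_volatility(calm + [0.05])   # 5% move at the end
old_shock = ewma_volatility([0.05] + calm)      # same 5% move, 100 periods ago

print(recent_shock, old_shock)
```

The same 5% move raises the estimate far more when it is recent than when it happened 100 periods ago, which is the "higher weight to the more recent past" described above.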

Extreme Value Theory: A field of statistical research which emphasizes modelling the extreme values of

a variable

F

Financial Risk Management: The process by which an organization takes a holistic approach to managing its risks, measuring all types of risks and exposures to these risks throughout the entire organization. See Enterprise Risk Management.

Flowchart: Flowcharts provide a visual way to represent connections between several processes or stages. The flow diagrams are intended to help the user recognize the order of inter-related processes and relations between them. The coloured boxes represent a process or an event, while the arrows represent a flow, either the normal path or an alternative possible path.

Frequency: Frequency is the number of times a value recurs in a group interval.

Frequency Distribution: Frequency Distribution is a chart that graphically summarizes a list of values by

subdividing them into groups and displaying their frequency counts.

G

Goodness-of-Fit: Goodness-of-Fit is a set of mathematical tests performed to find the best fit between a standard probability distribution and a data set.

Goodness-of-Fit Test: Goodness-of-Fit Test is a formal way to verify that the chosen distribution is consistent with the sample data.

Group Interval: Group Interval is a subrange of a distribution that allows similar values to be grouped

together and given a frequency count.

H

Hazard: Any agent that could produce adverse consequences

Hazard identification: The process of identifying any hazards

Heteroscedasticity: The degree to which the volatility of a variable changes over time. This is a deviation from the normal distribution which may need to be corrected for in calculations.

Histogram: A histogram is a plot that divides the range of values of a variable into intervals and displays only the count of the observations that fall into each interval.





Homoscedastic: Homoscedastic or homoskedastic is an adjective describing a statistical model in which

the errors are drawn from the same distribution for all values of the independent variables.

I

iid: 'Independent and identically distributed'. A common assumption in statistical testing that elements in a sample set all come from the same probability distribution and are independent of other elements in the sample set.

Independent Events: Two events are independent if the occurrence of one of the events gives us no information about whether or not the other event will occur; that is, the events have no influence on each other. In probability theory we say that two events, A and B, are independent if the probability that they both occur is equal to the product of the probabilities of the two individual events.

Interquartile Range: Interquartile Range is the difference between the third quartile (75th percentile) and

the first quartile (25th percentile).

K

Kolmogorov-Smirnov Test: For a single sample of data, the Kolmogorov-Smirnov test is used to test whether or not the sample of data is consistent with a specified distribution function. When there are two samples of data, it is used to test whether or not these two samples may reasonably be assumed to come from the same distribution. The Kolmogorov-Smirnov test does not require the assumption that the population is normally distributed.

Kurtosis: A measure of the peakedness of a distribution.

L

Least Squares: The method of least squares is a criterion for fitting a specified model to observed data. For example, it is the most commonly used method of defining a straight line through a set of points on a scatterplot. Least squares fitting implies that it is desired that the mean of the estimate be as close as possible to the true value.

Leptokurtosis: A property of a probability distribution that has more extreme values than would be

expected in a normal distribution. These distributions are often called "fat tailed". See kurtosis.

Lognormal Distribution: Lognormal Distribution is the distribution of a variable whose logarithm is

normally distributed.

M

Marginal distribution: A marginal distribution is the distribution of the parameter looked at in isolation, i.e. with the uncertainty of all other parameters integrated out.

Mean: One of several measures of the location of a distribution. For a data set, the mean is the arithmetic average of all values. For a probability distribution, the mean is the sum of all possible values weighted by their probability. It is also equivalent to the balance point of the distribution.

Mean Reversion: The tendency of certain financial variables, such as short-term interest rates, to revert over time to a long-term mean.

Measurement Error: Measurement Error is error introduced through imperfections in measurement

techniques or equipment.

Median: One of several measures of the location of a distribution. For a data set, the median is the value halfway through the ordered data set, below and above which there lies an equal number of data values. For a probability distribution, the median is the value one has a 50% probability of being below (and therefore a 50% probability of being above). The median is the 0.5 quantile.




Mode: One of several measures of the location of a distribution. For a data set, the mode is the most frequently occurring value in a set of discrete data. For a probability distribution, the mode is the value with the highest probability (or probability density) of occurring. There can be more than one mode if two or more values are equally common or probable.

Monte Carlo Simulation: Monte Carlo Simulation is a computer-based method of analysis developed in the 1940s that uses statistical sampling techniques to obtain a probabilistic approximation to the solution of a mathematical equation or model. It is a method of calculating the probability of an event by randomly selecting values from sets of data, repeating the process many times, and deriving the probability from the distributions of the aggregated data.
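As a small illustration of the method, the Python sketch below estimates the probability that two dice total 10 or more, a case where the exact answer (6/36) is known and the estimate can be checked. The function names, the number of trials and the seed are arbitrary choices for this example.

```python
import random

def estimate_probability(event, draw, trials, seed=0):
    """Fraction of randomly generated scenarios in which the event occurs."""
    rng = random.Random(seed)        # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(trials) if event(draw(rng)))
    return hits / trials

def two_dice(rng):
    """One scenario: the total shown by two fair dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)

estimate = estimate_probability(lambda total: total >= 10, two_dice, trials=100_000)
exact = 6 / 36                       # six of the 36 equally likely outcomes

print(estimate, exact)
```

The estimate converges on the exact value as the number of trials grows; in a real risk model the exact value is usually unknown, which is why simulation is used.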

Multiple Regression: Multiple linear regression aims to find a linear relationship between a response variable and several possible predictor variables.

N

Non-parametric Approach: A Non-parametric Approach is one that does not depend for its validity upon the data being drawn from a specific distribution, such as the normal or lognormal. A distribution-free technique.

Nonlinear Regression: Nonlinear regression aims to describe the relationship between a response variable and one or more explanatory variables in a non-linear fashion.

Nonparametric Tests: Nonparametric tests are often used in place of their parametric counterparts when certain assumptions about the underlying population are questionable. For example, when comparing two independent samples, the Wilcoxon Mann-Whitney test does not assume that the difference between the samples is normally distributed whereas its parametric counterpart, the two sample t-test, does. Nonparametric tests may be, and often are, more powerful in detecting population differences when certain assumptions are not satisfied. All tests involving ranked data, i.e. data that can be put in order, are nonparametric.

Normal Distribution: Normal Distribution is a probability distribution for a set of variable data represented by a bell shaped curve symmetrical about the mean.

O

Operational Risk: The risk of loss due to technical or human mistakes in the operations of a firm

occurring for any of a variety of reasons, including: fraud, payment errors, and physical loss

P

Parameter: A parameter is a value, usually unknown (and which therefore has to be estimated), used to represent a certain real-world characteristic.

Parameterise: Parameterising a distribution means selecting the form of the parameters to describe a variable. For example, a Uniform distribution requires two parameters, the usual notation being its Min and Max. However, one could choose two other statistical parameters like (Min, Range) or (Mean, Variance).

Parametric Approach: Parametric Approach is a method of probabilistic analysis in which defined analytic probability distributions are used to represent the random variables, and mathematical techniques (e.g., calculus) are used to get the resultant distribution for a function of these random variables.

Percentile: Percentiles are values that divide a sample of data into one hundred groups containing (as far as possible) equal numbers of observations. For example, 30% of the data values lie below the 30th percentile.

Platykurtosis: A property of a probability distribution that has fewer extreme values than would be expected in a normal distribution (i.e. kurtosis is less than 3).

Population: A population is any entire collection of people, animals, plants or things from which we may collect data. It is the entire group we are interested in, which we wish to describe or draw conclusions about. In order to make any generalisations about a population, a sample that is meant to be representative of the population is often studied. For each population there are many possible samples. A sample statistic gives information about a corresponding population parameter. For example, the sample mean for a set of data would give information about the overall population mean. It is important that the investigator carefully and completely defines the population before collecting the sample, including a description of the members to be included.

Precision: Precision refers to how well a given measurement or result can be reproduced. Values can be very precisely determined and still be very inaccurate. Conversely, a number of imprecise analyses may average to a very accurate value.

Probabilistic Approach: Probabilistic Approach is an approach which uses a group of possible values

for each variable to estimate risk.

Probabilistic Model: Probabilistic Model is a system whose output is a distribution of possible values.

Probability: A probability provides a quantitative description of the likely occurrence of a particular event. Probability is conventionally expressed on a scale from 0 to 1; a rare event has a probability close to 0, a very common event has a probability close to 1. The probability of an event has been defined as its long-run relative frequency. It has also been thought of as a personal degree of belief that a particular event will occur (subjective probability).

Probability Density Function: The probability density function can be integrated to obtain the probability that a continuous random variable takes a value in a given interval.

Probability Distribution: The probability distribution of a random variable is a list of probabilities or probability densities associated with each of its possible values, together with those values.

Probability mass function: The probability mass function relates each possible value of a discrete variable to its probability of occurrence.

Q

Qualitative risk assessment: An assessment where the conclusions on the likelihood of the outcome or the magnitude of the consequences are expressed in qualitative terms such as high, medium, low or negligible.

Quantile: Quantiles are a set of 'cut points' that divide a sample of data (or a probability distribution) into

groups containing (as far as possible) equal numbers of observations (equal probability).

Quantile-Quantile (Q-Q) Plot: A Quantile-Quantile (Q-Q) Plot portrays the quantiles (percentiles divided by 100) of the sample data against the quantiles of another data set or theoretical distribution (e.g., normal distribution). By comparing the data to a theoretical distribution with a straight line, departures from the distribution are more easily perceived.

Quantitative risk assessment: An assessment where the outputs of the risk assessment are expressed

numerically, as probabilities or distributions of probabilities.

R

Random Error: Random Error is error caused by making inferences from a limited database.

Random Number Generator: Random Number Generator is a method implemented in a computer

program that is capable of producing a series of independent, random numbers.

Random Sampling: Random sampling is a sampling technique where we select a group of subjects (a sample) for study from a larger group (a population). Each individual is chosen entirely by chance and each member of the population has a known, but possibly non-equal, chance of being included in the sample. By using random sampling, the likelihood of bias is reduced.

Random Variable: A Random Variable is a quantity which can take on any number of values but whose exact value cannot be known before a direct observation is made. For example, the outcome of the toss of a pair of dice is a random variable, as is the height or weight of a person selected at random from the New York City phone book.




Range: The range of a sample (or a data set) is a measure of the spread or the dispersion of the observations. It is the difference between the largest and the smallest observed value of some quantitative characteristic and is very easy to calculate. A great deal of information is ignored when computing the range since only the largest and the smallest data values are considered; the remaining data are ignored. The range value of a data set is greatly influenced by the presence of just one unusually large or small value in the sample (outlier).

Regression Analysis: Regression Analysis (Simple) is the derivation of an equation which can be used to estimate the unknown value of one variable on the basis of the known value of the other variable.

Regression Equation: A regression equation allows us to express the relationship between two (or more) variables algebraically. It indicates the nature of the relationship between two (or more) variables. In particular, it indicates the extent to which you can predict some variables by knowing others, or the extent to which some are associated with others. A linear regression equation is usually written Y = a + bX + e, where Y is the dependent variable, a is the intercept, b is the slope or regression coefficient, X is the independent variable (or covariate) and e is the error term. The equation will specify the average magnitude of the expected change in Y given a change in X.
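The intercept a and slope b of the equation Y = a + bX + e can be estimated by least squares, as in the Python sketch below. The data points are made up for illustration and fit_line is a name chosen for this example only.

```python
def fit_line(xs, ys):
    """Least squares estimates of the intercept a and slope b in Y = a + bX + e."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx                  # the fitted line passes through the means
    return a, b

xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]       # roughly Y = 1 + 2X plus noise

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]   # estimates of the error term e

print(a, b)
```

Here b is the average change in Y for a one-unit change in X, and the residuals are the part of each observation that the line does not explain.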

Regression Line: A regression line is a line drawn through the points on a scatterplot to summarise the

relationship between the variables being studied.

Relative Frequency: Relative frequency is another term for proportion; it is the value calculated by dividing the number of times an event occurs by the total number of times an experiment is carried out. The probability of an event can be thought of as its long-run relative frequency when the experiment is carried out many times. If an experiment is repeated n times, and event E occurs r times, then the relative frequency of the event E is defined to be rf_n(E) = r/n.

Risk communication: The interactive exchange of information on risk among risk assessors, risk

managers and other interested parties

Risk Factor: A measure whose change conditions the probability distribution of the value of the variable

of interest.

Risk management: The process of identifying, selecting and implementing measures that can be applied

to reduce the level of risk

S

Sample: A sample is a group of units selected from a larger group (the population).

Sample Mean: The sample mean is an estimator available for estimating the population mean. It is a measure of location, commonly called the average, often symbolised as x bar. Its value depends equally on all of the data, which may include outliers.

Sample Variance: Sample variance is a measure of the spread of or dispersion within a set of sample data. The sample variance is the sum of the squared deviations from their average divided by one less than the number of observations in the data set.

Sampling: One of two sampling schemes is generally employed: simple random sampling or Latin Hypercube sampling. Latin hypercube sampling may be viewed as a stratified sampling scheme designed to ensure that the upper or lower ends of the distributions used in the analysis are well represented. Latin hypercube sampling is considered to be more efficient than simple random sampling, that is, it requires fewer simulations to produce the same level of precision. Latin hypercube sampling is generally recommended over simple random sampling when the model is complex or when time and resource constraints are an issue.
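The stratification idea can be sketched for a single input variable as follows. This Python example is a generic illustration, not a description of ModelRisk's sampler: the probability scale is cut into n equal strata, one uniform value is drawn inside each stratum, and the values are shuffled before being mapped through the inverse CDF of an assumed Normal(0, 1) input.

```python
import random
from statistics import NormalDist

def simple_random_uniforms(n, rng):
    """n independent uniform draws; some strata may be missed by chance."""
    return [rng.random() for _ in range(n)]

def latin_hypercube_uniforms(n, rng):
    """One uniform draw inside each of n equal-probability strata, shuffled."""
    points = [(i + rng.random()) / n for i in range(n)]
    rng.shuffle(points)
    return points

rng = random.Random(1)               # fixed seed so the run is repeatable
n = 10
lhs = latin_hypercube_uniforms(n, rng)

# Every stratum 0..n-1 is hit exactly once, so both tails are represented.
strata_hit = sorted(int(u * n) for u in lhs)

# Map the stratified probabilities to values of the input distribution.
samples = [NormalDist(0, 1).inv_cdf(u) for u in lhs]

print(strata_hit)
```

With several input variables, each one is stratified and shuffled separately so that the pairing of strata across variables is random.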

Sampling Distribution: The sampling distribution describes probabilities associated with a statistic when a random sample is drawn from a population. The sampling distribution is the probability distribution or probability density function of the statistic. Derivation of the sampling distribution is the first step in calculating a confidence interval or carrying out a hypothesis test for a parameter.

Sensitivity Analysis: Sensitivity Analysis is an analysis that attempts to provide a ranking of the model's input parameters with respect to their contribution to model output variability or uncertainty. In a broader sense, sensitivity can refer to how conclusions may change if models, data, or assessment assumptions are changed. The difficulty of a sensitivity analysis increases when the underlying model is nonlinear or nonmonotonic, or when the input parameters range over several orders of magnitude.

Simple Linear Regression: Simple linear regression aims to find a linear relationship between a

response variable and a possible predictor variable by the method of least squares.

Simple Random Sampling: Simple Random Sampling is a sampling procedure by which each possible

member of the population is equally likely to be the one selected.

Simulation: A technique used to obtain possible future values of a variable by randomly generating scenarios with a frequency proportional to the probability one believes they have of occurring.

Skewness: Skewness is defined as asymmetry in the distribution. Values on one side of the distribution

tend to be further from the 'middle' than values on the other side.

Standard Deviation: Standard deviation is a statistical measure of the spread or dispersion of a distribution or data set. It is calculated by taking the square root of the variance and is symbolised by s.d., or s for a sample data set (estimating the population distribution), and by sigma for a population or probability distribution.

Standard Error: Standard error is the standard deviation of the values of a given function of the data (parameter), over all possible samples of the same size.

Standard Error of the Mean: Standard Error of the Mean is the standard deviation of the distribution of possible sample means. This statistic gives one indication of how precise the simulation is.
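The sample variance, standard deviation and standard error of the mean defined in this glossary are related as shown in the Python sketch below. The data are made up, and the standard error is estimated with the usual formula s / sqrt(n), which assumes independent observations.

```python
import math
from statistics import mean, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]      # made-up observations
n = len(data)
x_bar = mean(data)

# Sum of squared deviations from the average, divided by one less than n.
sample_variance = sum((x - x_bar) ** 2 for x in data) / (n - 1)
sample_sd = math.sqrt(sample_variance)

# Estimated standard deviation of the sample mean itself.
standard_error = sample_sd / math.sqrt(n)

print(sample_variance, sample_sd, standard_error)
```

Because the standard error shrinks with the square root of n, quadrupling the number of simulation samples roughly halves it.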

Statistic: A statistic is a quantity that is calculated from a sample of data. It is used to give information about unknown values in the corresponding population.

Statistical Inference: Statistical Inference makes use of information from a sample to draw conclusions (inferences) about the population from which the sample was taken.

Sufficient statistic: A statistic Tn = tn(xi) is called sufficient for estimating a parameter theta if the conditional distribution of the data xi given Tn = some constant c does not depend on theta for all values of c, i.e. once the value of Tn is known, the xi contain no extra information about theta.

T

Time Series: A time series is a sequence of observations which are ordered in time (or space). If observations are made on some phenomenon throughout time, it is most sensible to display the data in the order in which they arose, particularly since successive observations will probably be dependent. Time series are best displayed in a scatter plot. The series value X is plotted on the vertical axis and time t on the horizontal axis. There are two kinds of time series data: Continuous, where we have an observation at every instant of time, e.g. lie detectors, electrocardiograms, denoted by observation X at time t, X(t); and Discrete, where we have an observation at (usually regularly) spaced intervals, denoted Xt.

Triangular Distribution: Triangular Distribution is a distribution with a triangular shape. It is characterized by its minimum, maximum and mode (most likely) values. It is often used to represent a truncated log-normal or normal distribution if there is little information available on the parameter being modeled.

U

Uncertainty: The lack of precise knowledge of the input values, or a lack of knowledge of the system being modelled.

V

Value-at-Risk: Value at risk is defined as the amount which, over a predefined period of time, losses won't exceed with a specified confidence.
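Given a set of simulated or historical loss scenarios for the chosen time period, this definition amounts to reading off a high percentile of the losses. The Python sketch below uses one simple empirical convention (the smallest scenario loss that at least the stated share of scenarios do not exceed); other percentile conventions exist, and the loss figures are made up.

```python
import math

def value_at_risk(losses, confidence):
    """Smallest scenario loss that at least `confidence` of scenarios do not exceed."""
    ordered = sorted(losses)
    # round() guards against floating-point noise in confidence * number of scenarios.
    rank = math.ceil(round(confidence * len(ordered), 9))
    return ordered[max(rank, 1) - 1]

scenario_losses = list(range(1, 101))     # made-up losses of 1, 2, ..., 100
var_95 = value_at_risk(scenario_losses, 0.95)

# Check the definition: the share of scenarios with losses no greater than the VaR.
share_within = sum(loss <= var_95 for loss in scenario_losses) / len(scenario_losses)

print(var_95, share_within)
```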

VAR: See Value-at-Risk




Variability: Variability refers to observed differences attributable to true heterogeneity or diversity in a population or exposure parameter which cannot be reduced by additional data collection. Sources of variability are the result of natural random processes and stem from environmental, lifestyle, and genetic differences among humans. Examples include human physiological variation (e.g., natural variation in bodyweight, height, breathing rates, drinking water intake rates), weather variability, variation in soil types and differences in contaminant concentrations in the environment. Variability is usually not reducible by further measurement or study (but can be better characterized).

Variance: The (population) variance of a random variable is a statistical measure of how widely spread the values of the random variable are likely to be; the larger the variance, the more scattered the observations on average. Stating the variance gives an impression of how closely concentrated round the expected value (the mean) the distribution is; it is a measure of the 'spread' of a distribution about its average value.

Volatility: The measure of the magnitude of uncertainty of a financial price. The volatility is equal to one standard deviation of the potential (or historic) percentage changes, usually expressed on an annual basis. Volatility is a key input into option prices and risk measurements.