
ERASMUS UNIVERSITY ROTTERDAM
MASTER THESIS

Micro-level Stochastic Loss Reserving
A Fully Bayesian Approach

Author: M.J.F. PIETERS (480015)

Supervisors: Dr. M.G. POTTERS

Dr. A.A. NAGHI

A thesis submitted in partial fulfillment of the requirements
for the degree of Master of Science

at the

Erasmus School of Economics

August 23, 2019


ERASMUS UNIVERSITY ROTTERDAM

Abstract
Erasmus School of Economics

Micro-level Stochastic Loss Reserving

by M.J.F. PIETERS (480015)

In order to meet future liabilities, insurers need to estimate losses that need to be paid out in the future. Over the last decades insurers have mostly relied on high-level, macroscopic models to estimate such reserves. These models do not use information at the case level (micro-level data). Instead, they use aggregations of the claim data. This was useful in the past, as micro-level data was generally not available and computing power was limited. However, recent advances in storage capacity and computing power make it possible to employ better models.

This thesis develops a fully Bayesian framework for micro-level loss reserving. To this end, we build on the model of Antonio and Plat (2014), extend and improve it, and implement it in a fully Bayesian context. The improvements range from finding a rationale for better distributions to the introduction of an adjustment for censoring into the model.

The benefits of a Bayesian approach are numerous. First, it allows for a natural way to deal with parameter uncertainty, which makes the samples of the reserve include all uncertainty at every step of the model. Second, a micro-level model produces better estimates even when only small amounts of historical data are available. This is beneficial for new insurers on the market, as they can accurately estimate their reserves without the need for a lot of historical data. Finally, it allows for the incorporation of expert opinions into the model by means of prior distributions.

A case study using the Bayesian micro-level loss reserving model is performed on data provided by ARAG SE, a legal insurer in the Netherlands. The case study results in good fits of the model on the provided data. Also, introducing an adjustment for censoring results in an improvement of the model. Finally, the output of the model consists of samples of the reserves.

The model is implemented using PyMC3, a state-of-the-art package for Bayesian statistical modelling in Python. This allows our model to be easily implemented in practice.


Contents

Abstract

1 Introduction
  1.1 Motivation
  1.2 Claim Process
  1.3 Macro-level Loss Reserving
  1.4 Micro-level Loss Reserving
  1.5 Previous Work
    1.5.1 A Bayesian Approach
  1.6 Research Goal
  1.7 Overview

2 Theory
  2.1 Introduction
  2.2 Bayesian Modeling
    2.2.1 Bayesian Inference
    2.2.2 Sampling
    2.2.3 Model Validation
      Posterior Predictive Checks
      Watanabe-Akaike Information Criterion
  2.3 Waiting Times in Bayesian Models
    2.3.1 Covariates
      Proportional Hazard
      Accelerated Failure Time
    2.3.2 Censoring
  2.4 Bayesian Micro-level Loss Reserving
    2.4.1 Reporting Delay
    2.4.2 Occurrence Times
      Inhomogeneous Poisson Process
      Intensity Adjustment for IBNR cases
    2.4.3 Settlement Delay
      Censoring
    2.4.4 Payments
      Payment Delay
      Payment Type
      Payment Amount
    2.4.5 Reserves
    2.4.6 RBNS Reserve
    2.4.7 IBNR Reserve

3 Case Study: ARAG Legal Insurance
  3.1 Introduction
  3.2 Data
  3.3 Application of Micro-level Loss Reserving
    3.3.1 Reporting Delay
    3.3.2 Occurrence
    3.3.3 Settlement Delay
    3.3.4 Payments
      Payment Delay
      Payment Type
      Payment Amount
    3.3.5 Reserves

4 Conclusions
  4.1 Future Work

Bibliography

A Python Code
  A.1 RBNS Reserve
  A.2 IBNR Reserve
  A.3 Reserving Model

B Run-off Triangle
  B.1 IBNR and RBNS reserve in a run-off triangle

C Algorithms for sampling reserves

D Posterior Traces

E Heat-maps of the occurrence process


List of Symbols

tocc      Occurrence Time
trep      Reporting Time
tset      Settlement Time
tpay      Payment Time
tacc      Accounting Time
tc        Censoring Time

∆trep     Reporting Delay
∆tset     Settlement Delay
∆tpay     Payment Delay

atype     Payment Type
aamount   Payment Amount
Hpay      History of payments

N         Number of observed data points in a dataset
Np        Number of payments in the dataset
Nc        Number of cases in the dataset
Ncov      Number of covariates in the model
Ns        Total number of different subject codes in the dataset
Nm        Total number of months in the dataset
d         Number of parameters in a model

λ         Intensity parameter of the Poisson distribution
mi        The last day of month i
Nocc(i)   Number of occurrences in month i
wi        Number of active policies in month i

θ         Parameters of a model
y         Observed data
p(x)      Probability of x
ζ         Covariates

f(t)      Probability density function
F(t)      Cumulative distribution function
S(t)      Survival function
h(t)      Hazard function
L         Likelihood
I(expr.)  Evaluates to 1 when expr. is true, 0 otherwise


Chapter 1

Introduction

1.1 Motivation

The concept of insurance dates back to at least the 3rd millennium B.C., a time during which Chinese merchants distributed their wares over many ships so as to limit their losses in case one of their boats would capsize (Vaughan, 1996). Over the millennia that followed, simple constructs like the above evolved into the (complex) insurance products known today.

A contemporary insurer has many clients. Each client has one or more insurance contracts with the insurer, called policies. A policy states that the insurer underwrites the risks covered in the policy, given that the client pays the stipulated premium. In other words, the insurer guarantees payment of the client's incurred damages or losses. The benefit of these contracts at this scale is immediate: the clients pay an affordable premium to cover low-probability but costly events, whereas the insurer utilizes statistics to make a profit over appropriately chosen premiums.

One challenge an insurer faces is predicting its premium income and future liabilities. Not only is this valuable to the insurer itself, but regulations stipulated through e.g. the Solvency II framework demand such knowledge. In particular, an insurer needs to have sufficient reserves to pay out in times of extreme economic conditions. In this thesis we will look at the prediction of future liabilities, which is required to compute the required reserves of the insurer.

The majority of the reserves consists of as-yet unknown, but estimated, losses that need to be paid to clients in the future. The events leading to these losses typically occur at arbitrary moments in time due to their intrinsically random nature. Depending on the type of insurance, an insurer pays the client once or at a number of random times after the reporting date1 of the event. Thus, a single risk event can instantiate a sequence of random moments in time: the time when an event is reported to the insurer, the moments and amounts at which the insurer pays or receives compensation involving the client or external parties, and the time at which the case is closed. Although the events and subsequent payment times and amounts are random, they do tend to follow specific distributions, allowing an insurer to build stable predictions based on statistics. We will elaborate on this topic in Chapter 2.

1 This date will be formally defined in Section 1.2.

Over the last decades insurers have mostly relied on high-level, macroscopic models to estimate the reserves. These models are widespread and often aggregate claim data to predict the future liabilities. For example, methods such as the chain-ladder (CL) method aggregate claim payments for a particular year and use simple factors to project the aggregations into the future (Friedland, 2010). These models have become the standard in the business, mainly due to the lack of computing power and detailed claim data in the last thirty years. In recent years these two limitations no longer apply, raising the question of whether better models should be employed. Secondly, due to tightening legislation, an insurer needs to measure the future cash flows and the uncertainty therein in more detail (Antonio and Plat, 2014). Micro-level models do not use aggregations but instead use the whole dataset, which makes them more suitable for stating uncertainty in the estimates. In particular, this thesis focuses on micro-level loss reserving models.

Micro-level loss reserving models break down the claim process into several distinct components at the individual claim level (we will explain these components in detail in the next section). This allows for accurate estimation of losses and increased interpretability. All random aspects within the claim process are retained and explicitly modeled at a micro level, with the purpose of not disregarding any useful piece of information. An additional benefit of these micro-level models is that they allow external information to be absorbed. This information could, for example, consist of details about the policy holder or the development of the claim so far. This is particularly interesting with the current state of technology, as insurers often have large datasets available about the development of the claims, the insured client or entity, details about the policy, and much more. Incorporation of such information into the reserving model allows the insurer as well as the regulator to get more insight into the insurance risks. It also enables inferring high-risk indicators for certain groups based on the reserve estimates of the micro-level model.

The remainder of this chapter is organized as follows. The next section will explain the claim process, as a micro-level model starts with a firm understanding of this process. Then, in Section 1.3 the more classical, macro-level approaches to loss reserving based on aggregations are specified. Following the macro-level section, Section 1.4 will explain the micro-level approach, together with previous work done on this subject and the Bayesian approach taken previously. Finally, we will specify our research goal in Section 1.6.

1.2 Claim Process

In the previous section it was mentioned that micro-level loss reserving models make use of the different events that occur during the claim process. To be able to understand the next sections, it is important to understand the claim process. This section will explain how a typical claim is processed. The claim process is intentionally kept as simple as possible, such that it is applicable to a large range of insurance fields. Insurers usually keep a record of the events described in this section at the case level.

Figure 1.1 depicts the events during a typical claim process. A claim is preceded by an event, called the occurrence, at random time tocc, in which a client of the insurer is involved. This occurrence event is unknown to the insurer for ∆trep time, until the client reports the event at time trep = tocc + ∆trep. In the time interval [tocc, trep) the occurrence event has happened, but has not been reported yet. During this time, a claim is referred to as Incurred But Not Reported (IBNR), which is the basis for the IBNR reserve. The number of reported events within a given time interval typically follows a Poisson distribution with some rate parameter λ (Norberg, 1993; Arjas, 1989), whereas the time between two subsequent occurrences typically follows an exponential distribution. Assuming the client's policy covers the occurrence at time tocc, the claim is handled during the random time interval ∆tset = tset − trep, where ∆tset represents the duration of a claim and tset is the time at which the claim is settled and closed. This duration typically follows a distribution for waiting times from survival analysis (see Antonio and Plat, 2014).

FIGURE 1.1: Development of a general claim process. tocc is the time at which an event occurs that leads to a claim. The case is not reported for an amount of time ∆trep, until trep. Then the payment process starts (see Figure 1.2). During the time from trep until tset, payments are made on the case. This process ends at tset, when the claim is closed.

During the handling of a claim, the claim is Reported But Not Settled (RBNS). These claims form the basis for the RBNS reserve (see the next section). During ∆tset, payments are made. The development of a claim during ∆tset is depicted in Figure 1.2. The payments are random in both time and monetary amount.

FIGURE 1.2: The payments for a claim within ∆tset (see Figure 1.1). ∆tpay corresponds to the waiting time until the next payment happens. aamount is the monetary amount of a payment. This figure only depicts positive payments for simplicity.

First, payments are made at random times t1pay, . . . , tNppay, where Np is a random but finite integer representing the total number of payments. The respective waiting times between pairs of subsequent payments (payment delays) ∆t1pay, . . . , ∆tNppay again follow some distribution. Cook and Lawless (2007) define a process of inter-event times such as the above as a Renewal Process: a process with gap times ∆t, where the gap times are independent and identically distributed (I.I.D.)2. We apply this Renewal Process to the times between the payments, {∆tjpay} for j = 1, . . . , Np.

2 We will see in later sections that it is possible to add dependence in a Renewal Process by adding covariates (see Section 2.4.4, under Payment Delay).

The random monetary amounts of the payments can be positive (type I: payments to the policy holder, service charges, lawyer costs, etc.) or negative (type II: receivables from other insurers, etc.). Whether a payment is of type I or II may be determined using a Bernoulli distribution, whose parameter can be a function of the elapsed time since trep or of other covariates (effectively turning it into a logistic regression, see Section 2.4.4 under Payment Type). This distribution thus determines the sign of the monetary amount. Secondly, the payment amounts are denoted by a1amount, . . . , aNpamount and are usually modelled using a Lognormal distribution, since the payment amounts by definition may not be negative.

From the above it is clear that, even though many aspects within a claim process occur at random, the collection of claim processes does follow some well-defined statistical distributions. This will become more apparent in the theoretical Chapter 2 and when we use real-life data in Chapter 3. These distributions may be used to do computations on many parts of the claim process, but also to predict reserves. These reserves are computed from the combination of the fitted distributions of each process underlying the claim process.
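To make the process above concrete, below is a minimal simulation sketch of a single claim. It is not taken from the thesis: the distributional choices follow the text (exponential delays, Bernoulli payment type, Lognormal amounts), but all parameter values and the 90/10 type split are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def simulate_claim(t_occ):
    """Simulate one claim following Figures 1.1 and 1.2 (illustrative parameters, in days)."""
    t_rep = t_occ + rng.exponential(scale=20.0)   # reporting delay, Delta t_rep
    t_set = t_rep + rng.exponential(scale=180.0)  # settlement delay, Delta t_set
    payments, t = [], t_rep
    while True:
        t += rng.exponential(scale=45.0)          # payment delay (renewal-process gap time)
        if t >= t_set:                            # settlement closes the claim
            break
        sign = 1.0 if rng.random() < 0.9 else -1.0  # type I (positive) vs type II (negative)
        amount = sign * rng.lognormal(mean=6.0, sigma=1.0)
        payments.append((t, amount))
    return {"t_occ": t_occ, "t_rep": t_rep, "t_set": t_set, "payments": payments}

claim = simulate_claim(t_occ=0.0)
print(claim["t_rep"], claim["t_set"], len(claim["payments"]))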

Before we proceed to the goal of this thesis, we first examine the classical reserving approaches in the next section.

1.3 Macro-level Loss Reserving

Insurers often use the terms 'development year' and 'occurrence year' in their estimates for the reserve. The first lies in the future, the latter in the past. This allows insurers to set up two types of reserves for a particular occurrence year. First, they consider the claims for which the events (e.g. a car crash) have occurred, but which have not been reported to the insurer yet. These claims are often called Incurred But Not Reported (IBNR). Claims whose development is currently after occurrence time tocc (see Section 1.2 for the definition) but before trep can be attributed to this IBNR reserve. Second, insurers consider claims that have not been settled yet and that are currently being handled by the insurer. These claims will likely bring additional costs for the insurer in the future, but the occurrence time, reporting time and possibly some payments have already been observed3. These claims are currently at a point in time after trep, but before the settlement time tset. They are often referred to as Reported But Not Settled (RBNS).

3 Suppose a reserve is estimated at accounting date tacc; then payments are observed if and only if tacc > tpay.

Classical reserving approaches frequently adopt a run-off triangle as shown in Table B.1 of Appendix B.1. The horizontal axis of the triangle denotes the occurrence year, the vertical axis denotes the development year. Within this triangle, payments for a combination of occurrence and development year are summed (aggregated). The upper-left side of the triangle contains values that are known to the insurer, while the lower-right side contains the projected amounts. These triangles are constructed separately for the IBNR and the RBNS reserve, since each requires its own triangle. Various methods have been developed to make a projection within the triangle: the chain-ladder (CL) method, the Bornhuetter-Ferguson (BF) method, and others (Friedland, 2010). Some of these methods try to quantify the uncertainty of the estimated reserves; the CL method assumes that the relative change of the losses for different occurrence years remains constant between all development years. This allows for a point-wise estimation of the future liabilities. However, as stated in the previous section, summarizing using aggregated claims disregards information at the individual claim and policy level. This is particularly unfortunate since insurers often have more detailed information about policies and claim developments. All this information remains unused in macro-level models, so that a lot of information is ignored.
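As an aside, the following minimal sketch (not from the thesis; the triangle values are invented) illustrates the chain-ladder projection mentioned above: development factors are estimated from the known upper-left part of a cumulative run-off triangle and then used to fill in the unknown lower-right part.

import numpy as np

# Cumulative run-off triangle (rows: occurrence years, columns: development years).
# NaN marks the unknown lower-right part; values are invented for illustration.
triangle = np.array([
    [100.0, 160.0, 180.0],
    [110.0, 170.0, np.nan],
    [120.0, np.nan, np.nan],
])

n_dev = triangle.shape[1]
for j in range(n_dev - 1):
    known = ~np.isnan(triangle[:, j + 1])
    # Volume-weighted chain-ladder development factor from column j to j+1.
    factor = triangle[known, j + 1].sum() / triangle[known, j].sum()
    missing = np.isnan(triangle[:, j + 1])
    triangle[missing, j + 1] = triangle[missing, j] * factor

print(triangle)                                  # completed (projected) triangle
latest_diagonal = np.array([180.0, 170.0, 120.0])  # last observed value per occurrence year
print(triangle[:, -1] - latest_diagonal)           # estimated outstanding amount per year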

The CL method is used most often for models relying on run-off triangles and is known to have several shortcomings. These have subsequently given rise to a vast amount of literature trying to mitigate some of these shortcomings. Antonio and Plat (2014) give a full overview of the existing literature on these issues. The issues range from the existence of chain-ladder biases (Taylor, McGuire, and Greenfield, 2003) and the inability to deal with negative or zero elements in the triangle (Kunkler, 2004), to criticism of the Poisson distribution, which forms the basis of the chain-ladder model. Finally, the use of aggregated data in the chain-ladder method has been criticised by Taylor and Campbell (2002) and England and Verrall (2002). They argue that a lot of data remains unused that could otherwise lead to more precise predictions of reserves.

1.4 Micro-level Loss Reserving

The classical reserving approaches do not model the claim process in terms of distributions for each delay and payment. This causes the problems as stated in the previous section.

A micro-level loss reserving model enables the insurer to model each aspect of the claim process (see Figures 1.1 and 1.2) separately. This allows for the highest granularity in evaluating the accuracy of the estimates. As stated at the beginning of this chapter, micro-level loss reserving models also allow external covariate information to be added in every step of the process. These covariates can consist of details about the policy holder as well as past events within a claim. For example, covariates may include the age or the neighbourhood of the policy holder, but also information about the past claim behaviour of a client, the policy type, the premium height, and so on. All these characteristics of a micro-level model cause the reserves to be more precise. Furthermore, it gives the insurer better insight into the structure of the claims.

1.5 Previous Work

Micro-level modelling is a relatively new topic in the field of loss reserving. Initial works started with the Position Dependent Marked Poisson Process (PDMPP), as proposed in Arjas (1989) and Norberg (1993). A PDMPP is a Poisson process whose intensity λ is a function of some parameter (e.g. time), and whose associated point (or 'mark') is again randomly distributed (see Karr (1991)). This mark lies in a multi-dimensional space with dimensions equal to the number of attributes that are added to the PDMPP. In the papers of Arjas and Norberg, for instance, a point consists of the occurrence time (tocc in Figure 1.1) and the associated 'mark' is the reporting delay (∆trep) together with the development of the claim as depicted in Figure 1.2. Antonio and Plat (2014) extended their model by laying out a parametric framework. They defined a likelihood function for all parts of the claim process, which they maximized numerically. This gives a maximum likelihood estimate (MLE) for the parameters of the model; they used data from a Dutch insurer to obtain these point estimates. One drawback of their approach is the relative difficulty of adding extra covariates (Wüthrich, 2018). A machine learning approach can also be applied in a 'micro-level' context, as shown by Wüthrich, who used a regression-tree approach. This method is more flexible than the PDMPP because adding extra features to the model is easier. However, the model in Wüthrich only considers the number of payments and not the claim amounts paid.

The fact that micro-level models often outperform classical approaches has been shown by Jin and Frees (2013) and Huang et al. (2015). Jin and Frees demonstrate that micro-level models generally outperform macro-level models by generating estimates with higher precision and smaller errors. However, as they argue: "Papers that provide detailed and complete implementation of the micro-level models on empirical data are currently lacking in literature". The same authors also mention the lack of research articles on the topic of micro-level reserving. Huang et al. (2015) demonstrate both analytically and numerically the advantage of micro-level models over models based on aggregations. They state: "The research shows a significant increase in the accuracy of loss reserving by using individual data compared with aggregate data".

The micro-level approach to reserving thus seems superior to the classical approaches. An overview of the work of Antonio and Plat (2014) is given next, as this thesis builds on their paper. Their model consists of three main components, each of which is modeled using separate distributions, as is common in micro-level models. The separate components are laid out below:

1. Reporting Delay (∆trep in Figure 1.1): The time from an event occurrence until notification by the client at the insurance company. This delay causes claims to be incurred but not reported and forms the basis for the IBNR reserve. The reporting delay can be modelled using standard distributions for waiting times, as stated in Section 1.2. However, Antonio and Plat use a mixture of a Weibull distribution and nine fixed components for reporting within the first nine days. Unfortunately, the reason for doing this remains unclear in their paper and the choice for this distribution seems rather ad hoc. The distribution for the reporting delay is denoted as frep(∆t). Maximum Likelihood Estimation (MLE) is used to obtain the estimated parameters.

2. Occurrence process: Given the distribution of the reporting delay, a piece-wise constant Poisson process is estimated (a Position-Dependent process, where the position is time in this particular case). Imagine that the insurer has data on the occurrences until accounting date tacc, at which time it wants to estimate the reserve. The Poisson process has a separate intensity (λ) for each month, i.e., λ = λ(t). A point (mark) in this process corresponds to an occurrence time tocc, making the overall model a PDMPP. The estimates for the intensities will rapidly decrease when the occurrence date gets closer to the accounting date (tacc), since the fraction of the cases that are of the type IBNR gets higher. The Poisson process is therefore adjusted in such a way that the reporting delay ∆trep is taken into account: the closer we come to the accounting time tacc, the more likely it is for a case not to have been reported yet. Adjusting for this reporting delay yields an adjusted Poisson process, as the process is corrected for the IBNR claims using the distribution for the reporting delay (see Section 2.4.2). To estimate the intensity parameter for a single month, Antonio and Plat (2014) use maximum likelihood estimation (MLE).

3. Development process: The development process is depicted in Figure 1.2. It is the part after a case becomes known to the insurer (trep) and before settlement (tset) (Figure 1.1). The claim development in Antonio and Plat (2014) consists of two parts. The first part consists of the event type. They distinguish three separate event types:


(a) Settlement of the case without payment. This event ends the development of the claim. However, it is important to note that other events can happen before settlement (see items b and c).

(b) A payment together with settlement at the same time. This ends the development of the claim, but draws an additional payment from the payment distribution.

(c) A payment without settlement. Draws an item from the payment distribution while the development of the claim stays open.

The second part of the development process consists of modelling the payment amount, which is only applicable for event types b and c. For this they use a Lognormal distribution.

1.5.1 A Bayesian Approach

Standard methods such as Maximum Likelihood Estimation (MLE) have been applied to micro-level loss reserving (see e.g. Antonio and Plat, 2014). One drawback of these methods is that parameter uncertainty and model uncertainty are only partly accounted for. This leaves ambiguity for the insurance company as well as for the regulators as to how accurate the predictions are in the first place.

This thesis takes a Bayesian approach to micro-level loss reserving. Bayesian methods allow for full incorporation of all uncertainty in all the different parts of the model. On top of that, the insurer is able to introduce its own views into a Bayesian model by specifying prior distributions via expert judgement. As stated by Arjas (1989): "Choosing a reasonable prior ... could be viewed as a good opportunity for an actuary to use, in a quantitative fashion, his experience and best hunches." Also, a prior may be effective when sample sizes are small or the data is biased or of questionable quality. Bayesian inference yields a full distribution (posterior) for each individual inferred parameter. Furthermore, posterior predictive sampling from a Bayesian model will also incorporate the uncertainty in the parameters (Greenberg, 2009). Common statistics often requested by regulators, like Value at Risk (VaR) and Expected Shortfall (ES), can all be generated by sampling from the posterior predictive distribution.

Haastrup and Arjas (1996) use the framework as laid out by Norberg (1993) to implement the first Bayesian micro-level stochastic model. They use a non-parametric approach for each part of their model. However, as they state in their conclusion: "We were a little less enthusiastic about the non-parametric modelling; the computations turned out very time consuming, and sometimes additional structure is needed . . . In the future, we might want to model some components nonparametrically and some parametrically." Also, they did not introduce any covariates into various parts of the model (for example the reporting delay) in order to reduce computation time. We will use parametric modelling and introduce covariates where it seems appropriate.

The use of a Bayesian method was already suggested by Arjas (1989), as he claimed: "Since deciding on claim reserves is a management decision, rather than a problem in science in which some physical constant needs to be determined, Bayesian arguments should not be a great deterrent to a practitioner." However, the work of Haastrup and Arjas dates from 1996, a time in which Bayesian samplers were largely inefficient and computing power was very limited (Andrieu et al., 2003). Their code ran on a 1989 DEC workstation and Gibbs sampling was the preferred method of sampling.


Recent developments in open-source computer packages such as PyMC3 (Salvatier, Wiecki, and Fonnesbeck, 2016) for Python, STAN (Carpenter et al., 2017) written in C++, and others, make large-scale Bayesian analysis feasible and applicable. All these packages include advanced and efficient samplers and allow for multi-core sampling, thereby exploiting all cores of the CPU.

The Bayesian approach seems to be superior to the frequentist approach, in the same fashion as the micro-level reserving approach seems to be superior to the classical reserving approaches. In this thesis we combine these two into a Bayesian micro-level loss reserving framework. In the next section the research goal of this thesis is formulated.

1.6 Research Goal

Antonio and Plat (2014) have developed a micro-level model that is flexible and takes a radically new approach to loss reserving. Their approach mitigates the disadvantages of the classical reserving approaches based on aggregated data. However, their approach has a couple of drawbacks. First, model and parameter uncertainty are not accounted for in a statistically sound manner. Secondly, the choices for certain distributions seem sub-optimal and the effect of censoring is not taken into account. Also, the Bayesian method of Haastrup and Arjas (1996) seems out-of-date today, and their recommendation to use a combination of non-parametric and parametric modelling (see Section 1.5.1) can be adopted.

For these reasons, this thesis develops a fully Bayesian framework for micro-level loss reserving based on the framework as proposed by Antonio and Plat (2014). This thesis adds the following distinctive contributions to the current literature:

I. A micro-level loss reserving model

This thesis takes a micro-level reserving approach, as opposed to the classical reserving approaches that use aggregated data. We start from the model as provided by Antonio and Plat (2014).

II. A fully Bayesian approach

In contrast to the approach of Antonio and Plat (2014), our model takes a fully Bayesian approach to loss reserving in which all uncertainty is taken into account: from inference of the parameters to prediction of the IBNR and RBNS reserves. This improves the accuracy of the uncertainty estimates of the reserve. Our Bayesian framework can furthermore include the expert judgement of the actuary (which is not possible in the previous works) and allows for the incorporation of covariates at every level, e.g. policy characteristics, claim process characteristics, etc. Furthermore, dependence between different parts of the model can be introduced when needed. This leaves the framework open to modification in order to suit the insurer's needs. Finally, we will follow Haastrup and Arjas (1996) and use both parametric and non-parametric components where they seem apt.

III. Improvements on Antonio and Plat

We will introduce various improvements on the model of Antonio and Plat. First, we will add covariates to the model where apt (see Section 2.3.1). Secondly, we will compare various distributions for every component in the model (see Chapter 3) and find a rationale for why certain distributions might be more appropriate to model parts of the claim process. The case study shows that the fit on the data and the predictive accuracy improve when we use these distributions in our model. Thirdly, we add an adjustment for censoring. Censoring occurs when certain events remain unobserved in a data set. Without adjusting the model for censoring, the predictions may be biased. More on censoring follows in Section 2.4.3. Finally, we model the payment delay as a renewal process (see Section 2.4.4).

IV. A Case Study

The case study of Chapter 3 applies the model to real-life data as provided by ARAG SE. Implementation of the framework is done in Python (in contrast to Antonio and Plat (2014), which was done in SAS). The Python package PyMC3 (Salvatier, Wiecki, and Fonnesbeck, 2016) is used for Bayesian inference. PyMC3 uses a C back-end for its computations (Theano Development Team et al., 2016) and uses the latest samplers for Bayesian sampling (see Section 2.2.2).

1.7 Overview

This thesis is structured as follows. Chapter 2 starts by laying out the theory behind Bayesian inference, sampling and model evaluation criteria in Section 2.2. After this, it discusses the basic theory of survival analysis and how to add covariates to survival models in Section 2.3.

In Section 2.4 the Bayesian micro-level model is laid out in parts. This section will explain the model in detail. We will model every aspect of Figures 1.1 and 1.2 using distributional assumptions in a parametric way. The occurrence process will be modelled as a Poisson process with a piece-wise constant intensity parameter for each month.

Finally, in Chapter 3 a case study is performed on data provided by ARAG Legal Insurance. Within this case study, samples from the posterior are generated first. Then predictions from the posterior predictive are generated. This is done for all the different building blocks of the model. Different distributions are compared for each component of the model. The case study results in good fits of the model on the data provided. Finally, the predictions are combined, which enables the generation of samples from the IBNR as well as the RBNS reserve distributions.


Chapter 2

Theory

2.1 Introduction

In Section 1.2 we introduced the general claim process, which is independent of the type of model used in loss reserving. In this and the following chapters we take a Bayesian approach to model the claim process. We will refer to this model as the Bayesian micro-level loss reserving model. The main benefit of this method is the ability to include uncertainty in each aspect of the model, as will become clear in later sections. In order to clearly explain the Bayesian micro-level loss reserving model we need to separately introduce several different subjects.

To this end, we first give a brief introduction to Bayesian statistics in Section 2.2. Section 2.2.1 is devoted to the topic of Bayesian inference, in which the difference between frequentist and Bayesian parameter estimation is discussed. Next, the topic of sampling distributions is discussed in Section 2.2.2. Here only a high-level overview is given of some of the common sampling algorithms that are used today. Section 2.2.3 introduces several measures of quality for Bayesian models, where particular attention is given to model over-fitting and model comparison.

These basics of Bayesian modelling are followed by a non-exhaustive introduction to Survival Analysis in Section 2.3. This section reveals the connection between the modelling of several components in the claim process and the actual science of survival of mechanical systems, diseases, etc.

The theory of Bayesian statistics and Survival Analysis is then combined in Section 2.4 to formulate all the components of the general claim process, as shown in Figures 1.1 and 1.2, in a fully Bayesian setting. Lastly, it is shown in Section 2.4.5 how the Bayesian models for these components are combined to compute reserves.

2.2 Bayesian Modeling

We recall that this thesis takes a Bayesian approach to micro-level loss reserving, in contrast to a frequentist approach as adopted by e.g. Antonio and Plat (2014). Although we will not go into the philosophical aspects of both approaches, we do shed some light on their differences.

For a Bayesian statistician, probability is a degree of belief in an event occurring; a belief that may change over time when new measurements become available. A frequentist, on the other hand, considers probability to be the frequency with which a random event occurs in the limit of infinite draws.

For example, consider a fair six-sided die. A frequentist will indicate that the probability of throwing five eyes, in the limit of sufficiently many throws, is 1/6. He assigns probabilities by throwing the die an infinite number of times and averaging the outcomes.


However, this method suffers from the problem that it is only valid when an experiment can be executed an infinite number of times. A Bayesian statistician has a subjective view on probability. He starts with an a-priori belief that the die should be fair (or not), and adjusts his belief when the data proves him wrong.

Another example is the following. Suppose we consider the probability that it will rain tomorrow. A Bayesian statistician starts with a prior belief, and adjusts this belief when clouds start to appear. A Bayesian statistician can use his prior beliefs to say something about the probability; a frequentist, however, cannot. Indeed, a frequentist considers probability to be the frequency of occurrence over infinite samples. However, the day "tomorrow" only happens once, so how is probability defined in this case?

There are many philosophical intricacies that may be found in the literature (for example in Greenberg, 2009), but perhaps the above examples succeeded in shedding some light onto their differences.

We emphasize that a Bayesian approach, in contrast to a frequentist one, allows an expert to add his opinion to the model in a statistically correct manner via a so-called prior. Furthermore, it offers a natural way to deal with parameter, model and prediction uncertainty (Greenberg, 2009).

The next sections describe how a Bayesian statistician obtains his probabilities and how sampling methods are used to infer these probabilities. Thereafter, Bayesian methods are applied to the topic of loss reserving.

2.2.1 Bayesian Inference

Imagine a model with to-be-estimated parameters θ ∈ R^d, where d is the number of parameters of the model, and consider observed data y = {y1, . . . , yN}, where N is the number of observed data points in the dataset. Next, consider any extra covariate information ζ = {ζ1, . . . , ζN}. A frequentist considers a model with a likelihood function defined by the conditional probability p(y|θ, ζ) (read: the probability of y given θ and ζ). The likelihood function encapsulates an analytical model that approximates the process generating the observations. To find the parameter estimates θ in a frequentist setting, the likelihood function is maximised with respect to the parameters, so that

θMLE = argmaxθ p(y|θ, ζ).  (2.1)

This so-called Maximum Likelihood Estimate is computed using a numerical optimisation algorithm.

The Bayesian approach, on the other hand, views the parameters θ as random variables. Without any observed data y and ζ, the parameters are believed to be distributed according to a so-called prior belief, denoted p(θ). As the data y and ζ become available through observations, the prior belief is updated, resulting in a so-called posterior belief. The posterior is a probability density function of the parameters θ and is computed using Bayes' rule:

p(θ|y, ζ) = p(θ, y|ζ) / p(y|ζ) = p(θ) p(y|θ, ζ) / p(y|ζ) ∝ p(θ) p(y|θ, ζ),  (2.2)

where p(y|θ, ζ) is the likelihood function as introduced in Eq. (2.1), p(θ) the prior distribution (or prior belief), and p(θ|y, ζ) the posterior distribution. The Maximum A Posteriori estimate of θ equals the mode of the posterior distribution and reads:

θMAP = argmaxθ p(θ|y, ζ) = argmaxθ p(θ) p(y|θ, ζ).  (2.3)


Effectively, the MAP estimate is a point estimate of the uncertain parameters θ. Notice that when p(θ) is uniform (i.e. a constant function), Eq. (2.3) is equivalent to Eq. (2.1).

Bayesian inference focuses on the posterior distribution, which is defined as the distribution of the random variable θ, conditional on observing the data y and covariates ζ. Under most conditions, the maximum likelihood estimate θMLE of the frequentist approach is equal to the maximum a posteriori (MAP) estimate in the limit of infinite observations (in this case the prior no longer has an influence on the posterior). However, whereas a Maximum Likelihood Estimator from the frequentist approach delivers only a single best estimate, the Bayesian approach returns an entire probability distribution, the posterior of θ. Consequently, one can use the posterior to compute the uncertainty in variables that are a function of it. The last step in Eq. (2.2) shows that the term p(y|ζ) is independent of θ, i.e., the posterior p(θ|y, ζ) is proportional to the prior times the likelihood.

In later sections it will become clear that it is important to be able to generate a new sample yn+1 from the distribution p(yn+1|θ, ζ). A naive way would be to sample from p(yn+1|θMAP, ζ∗), where the MAP estimate is defined by Eq. (2.3) and ζ∗ is the newly measured covariate. However, this does not account for the uncertainty in the MAP estimate. A better way to compute p(yn+1|y, ζ∗) is to integrate (marginalize) the distribution of yn+1 given θ over the posterior distribution p(θ|y, ζ) (cf. Eq. (2.2)):

p(yn+1|y, ζ) = ∫ p(yn+1|θ, y, ζ) p(θ|y, ζ) dθ,  (2.4)

where the integration is performed over the appropriate domain of θ. The left-hand side of this equation is known as the Posterior Predictive Density (PPD). We will use this equation in a later section to quantify the quality of Bayesian models.

It is important to realize that Eqs. (2.2) and (2.4) are not guaranteed to be analytically tractable. Only in particular situations does the multiplication of the likelihood and the prior distribution in Eq. (2.2) result in a closed-form expression for the posterior: this requires the posterior to have the same algebraic form as the prior, i.e. to lie in the same family of distributions as the prior. In this case, the prior and posterior are called conjugate distributions.

The benefit of having a conjugate prior is that standard methods can be used to (i) easily take samples from the posterior or (ii) infer descriptive statistics about the distribution (mean, mode, variance). In practice, however, one does not frequently end up in this favourable situation. Consequently, to use Bayesian inference in any setting, one requires numerical sampling methods to be able to compute the posterior. The next section describes such sampling methods.
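As a brief illustration (not part of the original text; the Poisson count with a Gamma prior merely mirrors the monthly occurrence intensities λ used later in Section 2.4.2), conjugacy works as follows. For observed counts y1, . . . , yN with yn|λ ~ Poisson(λ) and prior λ ~ Gamma(α, β), Eq. (2.2) gives

p(λ|y) ∝ λ^{α−1} e^{−βλ} ∏_{n=1}^{N} λ^{yn} e^{−λ} ∝ λ^{α + ∑_{n} yn − 1} e^{−(β+N)λ},

which is again a Gamma density, so that λ|y ~ Gamma(α + ∑_{n=1}^{N} yn, β + N). Prior and posterior lie in the same family, and the posterior can be written down without any numerical sampling.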

2.2.2 Sampling

To make Bayesian inference possible in general, one should be able to sample from any posterior (that is proportional to the product of likelihood and prior, see Eq. (2.2)), which is a non-trivial task (Greenberg, 2009). Furthermore, sampling analytically intractable multivariate distributions is difficult in general. Markov Chain Monte Carlo (MCMC) algorithms such as Metropolis-Hastings, Gibbs, and rejection/importance sampling became the leading methods for this task in the 1990s. Most of these were based on random-walk simulations (Andrieu et al., 2003).

These early algorithms were made obsolete with the introduction of the next-generation Hamiltonian Monte Carlo (HMC) algorithm, which uses gradient information of the distributions in order to sample more efficiently. This algorithm converges more quickly for high-dimensional distributions and is nowadays the preferred sampling method in many applications (Hoffman and Gelman, 2014). The computational cost of the HMC method per drawn sample from a distribution of dimension Ndim is around O(Ndim^(5/4)), while the cost for random-walk methods often grows as O(Ndim^2) (Creutz, 1988). As stated by Hoffman and Gelman, HMC algorithms require an a-priori specification of parameters such as the step size and the number of steps. A wrong choice can lead to significant performance degradation. For this reason, the implementation of standard HMC algorithms into computer packages was often not feasible. The No-U-Turn Sampler (NUTS) by Hoffman and Gelman eliminated this problem. Therefore, the NUTS sampler is now the state-of-the-art standard and is implemented in packages such as PyMC3 for Python (Salvatier, Wiecki, and Fonnesbeck, 2016), STAN in R (Carpenter et al., 2017), and others.
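As a minimal sketch of how NUTS-based sampling looks in practice (not from the thesis; the model, the synthetic data and all settings are illustrative assumptions, targeting PyMC3 3.x), consider fitting an Exponential waiting-time model:

import numpy as np
import pymc3 as pm

# Illustrative data: exponential waiting times (e.g. reporting delays in days).
rng = np.random.default_rng(42)
delays = rng.exponential(scale=30.0, size=500)

with pm.Model() as delay_model:
    rate = pm.Gamma("rate", alpha=1.0, beta=1.0)        # weakly informative prior
    pm.Exponential("delay", lam=rate, observed=delays)  # likelihood of the observed delays
    # PyMC3 selects NUTS automatically for continuous parameters.
    trace = pm.sample(2000, tune=1000, chains=4, cores=4, random_seed=42)

print(pm.summary(trace))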

2.2.3 Model Validation

An important subject in modelling is model validation: the branch of modelling devoted to analysing and quantifying the quality of a model. It reveals possible bias and variance of the model due to, e.g., over-simplification of the process generating the observations. The next sections describe two Bayesian methods of model validation.

The next section, about Posterior Predictive Checks, describes how samplers can be applied to Eq. (2.4) to obtain new data points {yn+1, yn+2, . . .} given past observations y, and how these can be compared to the observed data y in order to get an idea of the quality of the model.

The section thereafter, on the Watanabe-Akaike Information Criterion (WAIC), describes how to evaluate the predictive accuracy of a Bayesian model and how to compare the outcomes of different Bayesian models in a Bayesian fashion (as opposed to a frequentist fashion). Both methods will be used in Chapter 3.

Posterior Predictive Checks

Predictive checks in a frequentist setting were first mentioned by Box (1980). He compared observed data with samples drawn from a model built on previously observed data. Gelman, Meng, and Stern (1996) started from frequentist evaluation approaches (i.e. goodness-of-fit tests) and reformulated this approach in the Bayesian framework. They did this by drawing samples from the PPD (cf. Eq. (2.4)) and comparing these with the distribution of the observed data y. The comparison between the samples from the model fitted on historical data and the historical data itself can be done numerically (using test statistics) or graphically (by plotting histograms of the data and the samples, employing Q-Q plots, etc.). We will use the graphical method and refer to it as the Posterior Predictive Check (PPC). We will apply it in Chapter 3 to evaluate our fitted models.
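Continuing the hypothetical delay_model, trace and delays from the sampling sketch above (again an illustrative sketch, not the thesis code), a graphical posterior predictive check could look as follows:

import matplotlib.pyplot as plt
import pymc3 as pm

with delay_model:
    # Draw replicated datasets from the posterior predictive density, Eq. (2.4).
    ppc = pm.sample_posterior_predictive(trace, random_seed=42)

# Overlay the observed delays with the replicated delays drawn from the model.
plt.hist(delays, bins=50, density=True, alpha=0.5, label="observed")
plt.hist(ppc["delay"].ravel(), bins=50, density=True, alpha=0.5, label="posterior predictive")
plt.legend()
plt.show()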

Watanabe-Akaike Information Criterion

While PPCs check whether a model corresponds to reality, we still need some way to compare models to one another. When we introduce more parameters into a model, the fit on the training data will always improve. However, the out-of-sample performance (on samples on which the model has not been trained) will decline, resulting in an over-fitted model. Out-of-sample tests such as cross-validation can be performed on new data to determine the accuracy of the predictions1. However, when there is no new data available or when data is scarce, frequentists use model evaluation criteria such as Akaike's Information Criterion (AIC) to evaluate the prediction capabilities of a model relative to other models.

1 Examples of cross-validation are K-fold cross-validation, stratified cross-validation, etc. (James et al., 2014)

The AIC value is computed as

AIC = 2d − 2 log L,  (2.5)

where L is the maximized likelihood of the model based on d parameters. The lower the AIC value, the better the model. Observe from Eq. (2.5) that the AIC trades off the number of parameters against the likelihood. Increasing the number of parameters results in a higher value of the likelihood, which decreases the AIC, but it also increases the first term on the right-hand side. The term 2d thus serves as a penalty for adding more parameters and avoids over-fitting.
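For illustration (the numbers are invented): a model with d = 3 parameters and maximised log-likelihood log L = −100 has AIC = 2·3 − 2·(−100) = 206, whereas a richer model with d = 10 and log L = −95 has AIC = 2·10 − 2·(−95) = 210, so the smaller model would be preferred despite its lower likelihood.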

Bayesian methods, however, do not use a maximum likelihood approach. How can a criterion like Eq. (2.5) be formulated in this setting? Various methods for a Bayesian context have been developed, see e.g. Gelman, Hwang, and Vehtari (2014). For example, the Deviance Information Criterion (DIC) was a popular criterion for Bayesian models in the past. It was incorporated into the BUGS project (Spiegelhalter et al., 1996), which provided the tools for Bayesian inference in the previous decade (Vehtari, Gelman, and Gabry, 2017). However, the DIC is still based on a point estimate of the posterior distribution and is therefore not a fully Bayesian criterion (Vehtari, Gelman, and Gabry, 2017). With the introduction of new computer packages in the last decade, the Watanabe-Akaike Information Criterion (WAIC) gained popularity. The WAIC uses the full posterior distribution instead of a point estimate. It consists of two components: the first component is a replacement for the L term in the AIC (see Eq. (2.5)), whereas the second term is the adjustment for over-fitting (the term 2d in Eq. (2.5)).

The WAIC is computed as follows. First, we recall the equation for the Posterior Predictive Density (PPD) as introduced in Section 2.2.1:

p(yn+1|y, ζ) = ∫ p(yn+1|θ, y, ζ) p(θ|y, ζ) dθ.  (2.6)

We can use this distribution to generate new samples that are the result of a model that has been trained on the historical data y, as discussed previously. However, the same equation can also be used to compute the log point-wise predictive density (lppd) for each of the observed data points yn ∈ y. The lppd is a measure to evaluate the fit of the model to the observed data points (Gelman, Hwang, and Vehtari, 2014). It is analogous to the L term in Eq. (2.5). The lppd for N observed data points is given by

lppd = log ∏_{n=1}^{N} p(yn|y, ζ) = ∑_{n=1}^{N} log ∫ p(yn|θ, ζ) p(θ|y, ζ) dθ.  (2.7)

In practice, one needs to approximate the integral in the last term of the above equation. Packages such as PyMC3 perform this approximation by drawing S posterior samples from p(\theta \mid y, \zeta), denoted \{\theta_s\}_{s=1}^{S}, and then calculating the likelihood p(y_n \mid \theta_s, \zeta) for each sample.

¹ Examples of cross-validation are K-fold cross-validation, stratified cross-validation, etc. (James et al., 2014).


Suppose we have N observed data points y ∈ R^N and N observed covariates \zeta ∈ R^N; then the estimated lppd is defined as (Gelman, Hwang, and Vehtari, 2014)

\widehat{lppd} = \sum_{n=1}^{N} \log \left( \frac{1}{S} \sum_{s=1}^{S} p(y_n \mid \theta_s, \zeta_n) \right). (2.8)

As stated by Gelman, Hwang, and Vehtari (2014), the lppd will overestimate the predictive accuracy of the model, as it is analogous to the L term in Eq. (2.5). The lppd is therefore corrected with the effective number of parameters (see Gelman, Hwang, and Vehtari (2014) for an extensive explanation), resulting in the WAIC criterion

WAIC = \widehat{lppd} − p_{WAIC}, (2.9)

where the correction term p_{WAIC} is defined as the sum over data points of the posterior variance of the log predictive density,

p_{WAIC} = \sum_{n=1}^{N} \mathrm{Var}_{\theta \mid y} \left[ \log p(y_n \mid \theta, \zeta) \right]. (2.10)

The WAIC together with Posterior Predictive Checks will be used in Chapter 3 to evaluate the models for the different parts of the claim process. The WAIC functionality from the PyMC3 package (Salvatier, Wiecki, and Fonnesbeck, 2016) is used to this end.
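To make Eqs. (2.8)-(2.10) concrete, the sketch below computes the WAIC from a matrix of pointwise log-likelihoods evaluated at posterior samples. This is a minimal illustration in plain NumPy/SciPy, not the thesis implementation (which relies on the WAIC functionality of PyMC3); the matrix log_lik and the toy Normal model are assumptions for the example.

# A minimal sketch (not the thesis implementation) of the WAIC of Eqs. (2.8)-(2.10).
# `log_lik` is assumed to be an S x N matrix of pointwise log-likelihoods
# log p(y_n | theta_s, zeta_n) evaluated at S posterior samples.
import numpy as np
from scipy.special import logsumexp

def waic(log_lik):
    S, N = log_lik.shape
    # Eq. (2.8): lppd-hat, the log of the posterior-averaged likelihood per observation
    lppd = np.sum(logsumexp(log_lik, axis=0) - np.log(S))
    # Eq. (2.10): p_WAIC, the posterior variance of the pointwise log-likelihood
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return lppd - p_waic  # Eq. (2.9)

# Toy example: a Normal(mu, 1) model with 4000 posterior draws and 100 observations
rng = np.random.default_rng(0)
y = rng.normal(size=100)
mu_draws = rng.normal(0.0, 0.1, size=4000)
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu_draws[:, None]) ** 2
print(waic(log_lik))

Note that software packages sometimes report the WAIC on the deviance scale (multiplied by −2), which appears to be the convention used in Figure 3.5; on that scale, lower values are better.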

2.3 Waiting Times in Bayesian Models

In the claim process as depicted in Figures 1.1 and 1.2 there are several time delays present: a waiting time from event occurrence until reporting (∆t_rep), a waiting time between reporting and settlement (∆t_set), and one or more payment delays (∆t_pay). These delays all play the role of the data (y) in the equations of Section 2.2. To be able to perform Bayesian micro-level loss reserving on claims following the process above, it is necessary to develop a model for each of these time delays.

The above-mentioned time delays depend on many factors, but all are subject to human behaviour. Consequently, it makes sense to utilise existing theory on delay modelling. One field of study particularly attractive for this purpose is Survival Analysis: the study of modelling the duration until some event happens, most often used in the analysis of failure of mechanical systems, incubation times of diseases, or models for mortality predictions.

We introduce some notation. Let f(t) be the Probability Density Function (PDF) of waiting times (∆t in Figures 1.1 and 1.2) to a particular type of event (reporting, settlement or payment within a claim) at time t. Here, time is relative in the sense that it resets to zero at the end of each delay in the claim process, so that t is effectively ∆t. In each component of the claim process as defined in Section 1.2, 0 ≤ t ≤ ∆t_z, where z ∈ {rep, set, pay} represents a delay type. For example, f(t) could be the distribution of the waiting time between an occurrence (t_occ) and the time of reporting (t_rep). Furthermore, let

F(t) = \int_{-\infty}^{t} f(u) \, du (2.11)


be the cumulative distribution function of f(t). Then the probability of the waiting time T being greater than t is defined as (Kleinbaum and Klein, 2005):

S(t) = \Pr(T > t) = 1 − F(t). (2.12)

For convenience, we also introduce the so-called hazard function h(t). It is defined as the instantaneous rate at which an event occurs, conditional on the event not occurring up to time t:

h(t) = \frac{f(t)}{S(t)}. (2.13)

Parametric assumptions are often made in order to model S(t) and h(t). The distributions used in this context only have positive support, as the delay ∆t until an event happens is equal to or larger than zero. Common continuous distributions used in Survival Analysis are (amongst others): the Exponential, Weibull, Gamma, and Lognormal distributions. Figure 2.1 shows the PDF and the hazard function of the Weibull distribution for several pairs of its parameters. Note that the PDF in Figure 2.1A is either monotonically decreasing (k < 1) or has a ‘hump‘ at some point t (k > 1). The hazard function in Figure 2.1B, in contrast, is monotonically decreasing or increasing.
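A minimal sketch of how the quantities in Eqs. (2.11)-(2.13) can be evaluated for a Weibull waiting-time distribution is given below. The parameter values mirror those plotted in Figure 2.1; the use of scipy.stats is an assumption for illustration only.

# A minimal sketch (assumed) of the survival quantities of Eqs. (2.11)-(2.13)
# for a Weibull waiting time with S(t) = exp(-(lambda*t)^k), as in Figure 2.1.
import numpy as np
from scipy.stats import weibull_min

lam = 1.0
t = np.linspace(0.01, 3.0, 300)
for k in (0.25, 0.75, 1.5, 2.5):
    dist = weibull_min(c=k, scale=1.0 / lam)  # scale = 1/lambda in this parameterisation
    f = dist.pdf(t)       # density f(t)
    S = dist.sf(t)        # survival function S(t) = 1 - F(t), Eq. (2.12)
    h = f / S             # hazard function h(t) = f(t)/S(t), Eq. (2.13)
    print(k, h[0], h[-1])  # decreasing hazard for k < 1, increasing for k > 1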

[Figure 2.1 contains two panels plotted for λ = 1 and k ∈ {0.25, 0.75, 1.5, 2.5}: (A) the Weibull PDF and (B) the Weibull hazard rate.]

FIGURE 2.1: The probability density function and hazard rates of the Weibull distribution. The PDF is either monotonically decreasing or has a ‘hump‘. Note that when k < 1 the hazard function is decreasing, while k > 1 makes the hazard function increasing. Thus, the probability of an event at time t conditional on no event up to time t decreases or increases with time.

2.3.1 Covariates

Up to this point we have assumed that the delay distributions in the claim process are independent of distinctive features of claims and policy holders. In reality, however, this simplification does not hold. Indeed, a feature such as claim type (car claim, legal claim) or policy holder age can have a substantial impact on the shape of the distribution of the reporting delay, the occurrence probability, etc. These distinctive features are also called covariates, as they "co-vary" with the delays.

The advantage of using covariates is twofold. First, the model accuracy might improve when certain covariates indeed affect the delays: letting the delay distributions depend on covariates allows the model to capture systematic differences between groups of claims.


Secondly, we can use the parameters of the estimated model to show whether the differences in delays can be explained by certain covariates. This makes the insurer more aware of which factors influence the claim process. The insurer may use this knowledge for, e.g., pricing purposes and gaining a competitive advantage.

As stated before, we want to be able to add covariates to the delay models for the reporting delay ∆t_rep, the settlement delay ∆t_set, and the payment delays ∆t_pay. Time-delay models often allow the incorporation of covariates, usually in two ways (Bradburn et al., 2003).

The first way to include covariates is to shift the hazard function h(t) as defined in Eq. (2.13). This results in so-called Proportional Hazard models. The second way is to scale the survival function S(t) (see Eq. (2.12)) along the time dimension. Such models are called Accelerated Failure Time models. Which of these two models is more appropriate for introducing covariates depends on the context, the assumptions of the researcher, and the expected behaviour of the models. We cover both variations in more detail in the following sections.

Proportional Hazard

A Proportional Hazard (PH) model h(t) is defined by (Rodríguez, 2010)

h(t) = h_0(t) \exp(X\beta), (2.14)

where h_0(t) is the baseline hazard rate (c.f. Eq. (2.13)), \beta ∈ R^{N_{cov}} is a vector of parameters (one per covariate), and X ∈ R^{N \times N_{cov}} contains the N_{cov} measured covariates, one row for each observed data point y_n in y = \{y_n\}_{n=1}^{N}. The Proportional Hazard model uniformly scales the baseline hazard function h_0(t) by means of an exponential function containing the product of the measured covariates X and the parameters \beta.

Let us consider a concrete example. The Weibull distribution with scale λ ∈ R_{>0} and shape parameter k ∈ R_{>0} has a baseline hazard rate of

h_0^{weibull}(t) = \frac{f(t)}{S(t)} = \frac{f(t)}{1 − F(t)} = \frac{k\lambda^k t^{k−1} \exp(−(\lambda t)^k)}{\exp(−(\lambda t)^k)} = k\lambda^k t^{k−1}. (2.15)

Substituting Eq. (2.15) into Eq. (2.14) yields

h^{weibull}(t) = k\lambda^k t^{k−1} \exp(X\beta), (2.16)

which is the Proportional Hazard model of the Weibull distribution. Note that substituting \lambda^* = \lambda \exp(X\beta / k) into the above equation reveals that h^{weibull}(t) is still a Weibull hazard rate. Note furthermore that such a closed-form solution exists for only a few distributions; for those, the modeller can scale the hazard function through the parameters of the model in a multiplicative fashion.

Accelerated Failure Time

An Accelerated Failure Time (AFT) model scales the survival function S(t) (c.f. Eq. (2.12)) along the time axis. The addition of covariates effectively makes time pass more quickly or slowly according to the values of certain covariates. Suppose that a vector of random times-to-event is denoted by ∆t ∈ R^N.


An AFT model then imposes

∆t = \exp(X\beta + \varepsilon), (2.17)

where \varepsilon ∈ R^N and each element \varepsilon_i is Independent and Identically Distributed (I.I.D.) according to some distribution f. For example, when \varepsilon_i ∼ N(0, σ^2), the model is also called a log-linear model. An AFT model for a Lognormal distribution can therefore be fit using a simple Ordinary Least Squares (OLS) regression. This property will be used in Chapter 3 to introduce covariates in a Bayesian Lognormal regression context.

We consider again the example of the Weibull distribution. To derive the AFT model for a Weibull distribution, we take the survival function S(t) in Eq. (2.12) as our baseline, i.e., S_0(t). An AFT model then scales S_0(t) using the covariates, so that

S(t) = S_0(t e^{X\beta}). (2.18)

Using Eq. (2.13) and substituting Eq. (2.18) yields:

S(t) = S_0(t e^{X\beta}) \;\Leftrightarrow\; 1 − F(t) = 1 − F_0(t e^{X\beta}), (2.19)

f(t) = \frac{d}{dt} F_0(t e^{X\beta}) = f_0(t e^{X\beta}) e^{X\beta}. (2.20)

Now the hazard rate in terms of the baseline rate reads

h(t) = \frac{f(t)}{S(t)} = \frac{f_0(t e^{X\beta}) e^{X\beta}}{S_0(t e^{X\beta})} = h_0(t e^{X\beta}) e^{X\beta}. (2.21)

We apply this AFT model to the Weibull distribution again, with

h_0^{weibull}(t) = k\lambda^k t^{k−1}. (2.22)

Applying Eq. (2.21) to this baseline hazard rate gives

h(t) = k\lambda^k (t e^{X\beta})^{k−1} e^{X\beta} = k\lambda^k t^{k−1} e^{X\beta(k−1)} e^{X\beta} = k\lambda^k t^{k−1} e^{kX\beta}. (2.23)

We can now see that the hazard rate is again that of a Weibull distribution, with \lambda^* = \lambda \exp(X\beta). The Weibull distribution is special in the sense that it allows both the PH and the AFT formulation by adjusting the intensity parameter λ using the covariates. We will use this convenient property in the application of some delay models in Chapter 3.
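The sketch below illustrates this property: under the expressions above, both the PH model of Eq. (2.16) and the AFT model of Eq. (2.23) amount to rescaling the Weibull rate parameter. The covariate matrix, coefficients, and parameter values are illustrative assumptions only.

# A minimal sketch (assumed) of how covariates enter a Weibull waiting-time model
# under the PH and AFT formulations: both reduce to rescaling the rate parameter lambda.
import numpy as np

def weibull_hazard(t, lam, k):
    # baseline hazard of Eq. (2.15): h(t) = k * lam**k * t**(k-1)
    return k * lam**k * t**(k - 1)

X = np.array([[1.0, 0.0], [0.0, 1.0]])   # two claims, two (one-hot) covariates
beta = np.array([0.3, -0.2])
lam, k, t = 1.0, 1.5, 2.0

lam_ph  = lam * np.exp(X @ beta / k)     # PH:  h(t) = h0(t) * exp(X beta), Eq. (2.16)
lam_aft = lam * np.exp(X @ beta)         # AFT: h(t) = h0(t) * exp(k X beta), Eq. (2.23)

# sanity check: the PH-rescaled hazard equals the baseline hazard times exp(X beta)
print(weibull_hazard(t, lam_ph, k), weibull_hazard(t, lam, k) * np.exp(X @ beta))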

2.3.2 Censoring

A characteristic property of time delay analysis is the phenomenon of incomplete information about the time delays in, e.g., a dataset or study. This is called censoring. There are several censoring possibilities, the two most common being Type I and Type II. In this section we shall only focus on the former, since it is the only type present in the general claim process.

Type I censoring arises when certain events are not observed before the observation period ends. An example arises immediately when we look at the claim process. Suppose we are at accounting time t_acc; then some cases might not be closed yet (i.e. t_set > t_acc). However, we do know that these cases will have a settlement delay (∆t_set) of at least t_acc − t_rep.


This is an example of the phenomenon called right-censoring, which is characteristic of Type I censoring (Leung, Elashoff, and Afifi, 1997).

Type I censoring can occur in two ways: the delays either have a common censoring time (t_{1,c} = · · · = t_{N,c}) or a censoring time for each subject individually (t_{1,c} ≠ · · · ≠ t_{N,c}), where N denotes the total number of claims. The likelihood function for a single observation i can be written as (Leung, Elashoff, and Afifi, 1997):

L_i = \Pr(T = t_i)^{1−I_i} \Pr(T > t_{i,c})^{I_i} = f_i(t_i)^{1−I_i} \left[ 1 − F(t_{i,c}) \right]^{I_i},
\log L_i = (1 − I_i) \log[f_i(t_i)] + I_i \log[S(t_{i,c})], (2.24)

where I_i is the indicator of whether observation i is censored: I_i = 1 if claim i is censored and I_i = 0 if claim i is not censored, and where we used Eq. (2.12) in the last step. Type I censoring with a common censoring time is relevant for micro-level loss reserving. For example, some cases may not be closed when the reserve has to be made (the date at which the reserve is made is then the censoring time). Disregarding these cases will lead to a biased estimate of the settlement delay (∆t_set in Figure 1.1), as the length of the settlement delay will be underestimated.
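A minimal sketch of the censored log-likelihood of Eq. (2.24) is given below, assuming Lognormally distributed delays; the parameter values and data are illustrative only and do not come from the thesis.

# A minimal sketch (assumed) of the Type I right-censored log-likelihood of Eq. (2.24):
# observed delays contribute log f(t_i), censored delays contribute log S(t_{i,c}).
import numpy as np
from scipy.stats import lognorm

def censored_loglik(t, censored, mu, sigma):
    # t: observed delay if not censored, elapsed time t_{i,c} if censored
    dist = lognorm(s=sigma, scale=np.exp(mu))
    ll = np.where(censored, dist.logsf(t), dist.logpdf(t))
    return ll.sum()

t = np.array([3.0, 10.0, 250.0, 400.0])       # delays in days (illustrative)
censored = np.array([False, False, True, True])
print(censored_loglik(t, censored, mu=2.0, sigma=1.0))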

2.4 Bayesian Micro-level Loss Reserving

We recall that Bayesian micro-level loss reserving uses Bayesian techniques in order to do loss reserving on an individual claim level. In order to do reserving on a claim level, the insurer needs the data for the components of the general claim process as shown in Figures 1.1 and 1.2. To improve the model, additional data such as policy and external claim information could be used. These covariates could be incorporated as in Section 2.3.1. Every claim is connected to one policy, but one policy can have multiple claims. The policy can contain information about the policy holder, the insurance product, and more. A claim can also have extra covariate information. All the covariate information for claim i will be denoted by ζ_i in the remainder of this thesis.

2.4.1 Reporting Delay

We recall that the reporting delay (∆t_rep) is defined as the time between t_occ and t_rep; see Figure 1.1. It forms the basis for the IBNR reserve. The reporting delay is only available in the dataset for cases which have been reported to the insurer. This waiting time from t_occ to t_rep can be modelled using distributions from Survival Analysis, as discussed in Section 2.3. The PDF of the reporting delay is denoted by f_rep(∆t_rep).

Antonio and Plat (2014) use a mixture of a Weibull distribution (see Figure 2.1) with nine fixed components for the first nine days (as stated in Section 1.5). This choice seems artificial, as it suggests that the Weibull distribution on its own does not fit the first days of data well. The introduction of these fixed components seems to improve the fit, but the number of to-be-estimated parameters also increases. One could question whether the underlying assumptions for using the Weibull distribution are accurate.

Human behaviour is of great interest when studying the reporting delay, as the reporting delay is the result of various human decisions with concomitant time intervals. Studies of human response times seem to mainly use the Lognormal distribution. The Lognormal distribution was used by Linden (2006) in a Bayesian model for human response times, which showed a very good fit to the data.


Linden followed (amongst others) Schnipke and Scrams (1999) and Thissen (1983) in choosing the Lognormal distribution for modelling response times. Schnipke and Scrams (1999) compared the fit of the Lognormal distribution with the Normal, Gamma and Weibull distributions for response times in a computer-administered test. They found that the Lognormal distribution provided the best fit to the data.

We will now look at some characteristics of the Lognormal distribution and compare these with the Weibull distribution of Figure 2.1. When we compare the PDF of the Lognormal distribution with the Weibull distribution (of Figure 2.1A), they both seem to take roughly the same shape. However, as shown in Figure 2.2B, the Lognormal hazard rate can take various forms. For values of σ > 1 it increases for small values of t, reaches a maximum after some t, and decreases afterwards. This allows for more flexibility in the shape of the hazard function, in contrast to the Weibull hazard function of Figure 2.1B, which is only monotonically increasing or decreasing. Intuitively, it makes sense for the reporting delay to have a low hazard rate for small values of t, as clients need some time to report the case to the insurer, do not notice the case until a certain moment, or are not able to report the case for some reason. It also makes sense to have a low hazard rate for large values of t, since clients are not likely to report cases which occurred a long time ago (they might forget the case, or may not have been covered when the case occurred, etc.). For these reasons, the Lognormal distribution seems a priori more appropriate than the Weibull distribution. The fit on insurance data of the Lognormal distribution alongside other distributions is evaluated (by means of the PPC and WAIC, see Section 2.2.3) in Chapter 3.

[Figure 2.2 contains two panels plotted for µ = 0.5 and σ ∈ {0.25, 0.75, 1.5, 2.5}: (A) the Lognormal PDF and (B) the Lognormal hazard function.]

FIGURE 2.2: The probability density function f(x) and hazard function h(x) for the Lognormal distribution. Note that the probability density function and the hazard function both start at zero when x is zero. The hazard function can take a variety of shapes, in contrast to the Weibull hazard rate in Figure 2.1B.

The likelihood (p(y|θ, ζ) in Eq. (2.2)) for observing the reporting delays (y = ∆t_rep ∈ R^{N_c}) of claims 1, . . . , N_c (N_c is the total number of claims), with model parameters θ_rep ∈ R^{d_rep}, where d_rep is the number of parameters of the reporting model, and conditional on the covariate information ζ_rep, is given by

p_{rep}(∆t_{rep} \mid θ_{rep}, ζ_{rep}) = \prod_{i=1}^{N_c} f_{rep,i}(∆t_{rep,i} \mid θ_{rep}, ζ_{rep,i}), (2.25)


where f_rep is the probability density function of the reporting delay and ∆t_{rep,i} is the reporting delay of claim i. Applying Eq. (2.2), i.e. combining the likelihood with the prior p(θ_rep) for the parameters of the model, yields

p_{rep}(θ_{rep} \mid ∆t_{rep}, ζ_{rep}) ∝ p(θ_{rep}) \, p(∆t_{rep} \mid θ_{rep}, ζ_{rep}), (2.26)

where p(θ_{rep}) = \prod_{i=1}^{d_{rep}} p(θ_{rep,i}).

Equation (2.26) is the posterior density of the parameters of the reporting model (θ_rep). Samplers (see Section 2.2.2) will be used to obtain samples from this distribution.
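A minimal PyMC3 sketch of such a reporting-delay model is given below. It mirrors the Lognormal regression with subject-code covariates used in Chapter 3, but the variable names, the synthetic data, and the HalfNormal prior used to keep the scale parameters positive are assumptions for illustration, not the thesis implementation.

# A minimal PyMC3 sketch (assumed) of the Lognormal reporting-delay model of Eq. (2.26)
# with a one-hot subject-code covariate matrix Xs.
import numpy as np
import pymc3 as pm

Ns = 3                                                 # number of subject codes (toy example)
rng = np.random.default_rng(1)
subject = rng.integers(0, Ns, size=500)
Xs = np.eye(Ns)[subject]                               # one-hot encoding
delay = rng.lognormal(mean=2.0, sigma=1.0, size=500)   # synthetic reporting delays (days)

with pm.Model() as reporting_model:
    beta_mu = pm.Normal("beta_mu", mu=0.0, sigma=200.0, shape=Ns)    # near-flat priors
    beta_sd = pm.HalfNormal("beta_sd", sigma=200.0, shape=Ns)        # positivity (assumption)
    pm.Lognormal("delta_t_rep",
                 mu=pm.math.dot(Xs, beta_mu),
                 sigma=pm.math.dot(Xs, beta_sd),
                 observed=delay)
    trace = pm.sample(1000)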

2.4.2 Occurrence Times

The second part of our model is the occurrence process of the events that start a claim. A Poisson process with intensity parameter λ is used by Antonio and Plat (2014) to model the occurrence process (events at t_occ in Figure 1.1), following Arjas (1989) and Norberg (1993). The intensity parameter λ is modelled separately for every month (as discussed in Section 1.5). We follow the approach of Antonio and Plat in the sense that we also treat the occurrence process as an inhomogeneous Poisson process. However, our method of adjusting for the unobserved occurrences differs somewhat, as this thesis uses Bayesian methods. We will now elaborate on the inhomogeneous Poisson process, after which we show how the intensity parameter is adjusted for the censored claims, using the reporting delay, to obtain the λ^{IBNR} parameter.²

Inhomogeneous Poisson Process

A homogeneous Poisson point process is obtained by simulating from the Poisson distribution with constant λ. The likelihood of observing N_occ occurrences before time t is given by

p(N_{occ} \mid λ) = \frac{(λt)^{N_{occ}}}{N_{occ}!} \exp(−λt), (2.27)

where λ is equal to the expected number of occurrences per unit of time and is constant during the time of analysis.

When λ becomes dependent on time, such that λ = λ(t), the likelihood of observing N_occ occurrences within the time interval a < t ≤ b becomes (Cook and Lawless, 2007)

p(N_{occ} \mid λ(t)) = \frac{1}{N_{occ}!} \left( \int_a^b λ(t) \, dt \right)^{N_{occ}} \exp\left( − \int_a^b λ(t) \, dt \right), (2.28)

and the process is referred to as an inhomogeneous Poisson process. Splitting this expression per month, for a total of N_m months, where m_k denotes the day number at the end of month k, m_{k−1} denotes the day number at the end of month k − 1, and N^k_{occ} denotes the number of occurrences in month k,

² The intensity parameter of an inhomogeneous Poisson process for the unobserved Incurred But Not Reported claims.


yields the likelihood

p(N^1_{occ}, . . . , N^{N_m}_{occ} \mid λ(t)) = \prod_{k=1}^{N_m} \frac{1}{N^k_{occ}!} \left( \int_{m_{k−1}}^{m_k} λ(t) \, dt \right)^{N^k_{occ}} \exp\left( − \int_{m_{k−1}}^{m_k} λ(t) \, dt \right).

Taking a piecewise-constant λ for every month k, so that λ(t) is determined by the finite set of rates {λ_1, . . . , λ_{N_m}}, the integrals simplify and the likelihood becomes

p(N^1_{occ}, . . . , N^{N_m}_{occ} \mid λ_1, . . . , λ_{N_m}) = \prod_{k=1}^{N_m} \frac{(λ_k)^{N^k_{occ}}}{N^k_{occ}!} \exp(−λ_k). (2.29)

This likelihood is used to estimate the intensity of the occurrence process in a Bayesian sense. However, we have not yet corrected the intensity for the not-reported cases. Therefore, we have to adjust the intensity parameter using the distribution of the reporting delay, which is done next.
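A minimal PyMC3 sketch of the piecewise-constant occurrence model of Eq. (2.29) is shown below, fitting one intensity per month to monthly claim counts. The synthetic counts and the Gamma prior are assumptions for illustration.

# A minimal PyMC3 sketch (assumed) of the monthly occurrence model of Eq. (2.29):
# one Poisson intensity per month, fitted to observed monthly claim counts.
import numpy as np
import pymc3 as pm

Nm = 24                                        # number of months (toy example)
rng = np.random.default_rng(2)
monthly_counts = rng.poisson(lam=40, size=Nm)  # synthetic N^k_occ

with pm.Model() as occurrence_model:
    lam = pm.Gamma("lam", alpha=1.0, beta=0.01, shape=Nm)  # weak prior per month (assumption)
    pm.Poisson("N_occ", mu=lam, observed=monthly_counts)
    trace = pm.sample(1000)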

Intensity Adjustment for IBNR cases

Suppose that the reserve is estimated at accounting time t_acc. When the occurrence dates get closer to t_acc, many cases have not been reported to the insurer yet. To this end, the rate of occurrence (λ) of the Poisson process should be adjusted with the distribution of the reporting delay to account for the cases which are incurred but not reported (IBNR). The probability that an event occurring at time t_occ has not yet been reported at time t_acc is

\Pr(∆t_{rep} > t_{acc} − t_{occ}) = 1 − F_{rep}(t_{acc} − t_{occ}) = 1 − \int_0^{t_{acc} − t_{occ}} f_{rep}(t) \, dt, (2.30)

where F_rep is the CDF as defined in Eq. (2.11) for the reporting delay ∆t_rep, and f_rep(t) is the PDF of the reporting delay, with 0 ≤ t ≤ ∆t_rep. Similarly, the probability that such a claim has been reported by time t_acc is equal to

\Pr(∆t_{rep} ≤ t_{acc} − t_{occ}) = 1 − \Pr(∆t_{rep} > t_{acc} − t_{occ}) = F_{rep}(t_{acc} − t_{occ}). (2.31)

The monthly rate of occurrence (λ) in Eq. (2.29) has to be adjusted using the distribution of the reporting delay. This yields the intensity of the occurrences that have not been reported yet. We correct this intensity as

λ^{IBNR}_k = ρ^{IBNR}_k \, λ_k, (2.32)

where we define ρ^{IBNR}_k as the average probability of not having observed a claim occurring in month m_k, divided by the average probability of having observed such a claim:

ρ^{IBNR}_k = \frac{λ^{IBNR}_k}{λ_k} = \frac{ \frac{1}{m_k − m_{k−1}} \int_{m_{k−1}}^{m_k} \Pr(∆t_{rep} > t_{acc} − t) \, dt }{ \frac{1}{m_k − m_{k−1}} \int_{m_{k−1}}^{m_k} \Pr(∆t_{rep} ≤ t_{acc} − t) \, dt } = \frac{ \int_{m_{k−1}}^{m_k} \left[ 1 − F_{rep}(t_{acc} − t) \right] dt }{ \int_{m_{k−1}}^{m_k} F_{rep}(t_{acc} − t) \, dt }. (2.33)


The CDF F_rep(t) in Eq. (2.33) will be computed using samples from the Posterior Predictive Density (see Section 2.2). Also, the parameter λ_k is again inferred using Bayesian methods. This makes every part of the occurrence process fully Bayesian.
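A minimal sketch of the adjustment of Eqs. (2.32)-(2.33) is given below, approximating the monthly integrals by averages over the days of the month and using a Lognormal reporting-delay CDF. The day numbers and parameter values are illustrative assumptions.

# A minimal sketch (assumed) of the IBNR intensity adjustment of Eqs. (2.32)-(2.33).
import numpy as np
from scipy.stats import lognorm

def rho_ibnr(month_start, month_end, t_acc, mu, sigma):
    days = np.arange(month_start, month_end)                    # day numbers within month k
    F = lognorm(s=sigma, scale=np.exp(mu)).cdf(t_acc - days)    # Pr(reported by t_acc)
    return (1.0 - F).mean() / F.mean()                          # Eq. (2.33)

t_acc = 6939                      # accounting day number (illustrative)
lam_k = 40.0                      # posterior draw of the monthly intensity (illustrative)
rho = rho_ibnr(month_start=6900, month_end=6930, t_acc=t_acc, mu=3.5, sigma=1.2)
lam_ibnr_k = rho * lam_k          # Eq. (2.32)
print(rho, lam_ibnr_k)

In the fully Bayesian version described above, this calculation would be repeated for every posterior sample of the reporting-delay parameters and of λ_k, so that the uncertainty of both propagates into λ^{IBNR}_k.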

2.4.3 Settlement Delay

The settlement delay y_set = ∆t_set ∈ R^{N_c}, where N_c is the number of claims in the dataset, is defined in Figure 1.1 as the time from t_rep until t_set. It is the driver of the RBNS reserve, as it concerns claims that have been reported to the insurer but are not settled yet; the insurer can therefore expect more costs on such claims in the future. Again, this delay is expected to follow a distribution from Survival Analysis, as it is a waiting time to a one-time event. One could use the same arguments as in Section 2.4.1 to choose the Lognormal distribution as the preferred distribution, since the settlement delay is again a function of various human behaviours with corresponding time delays. The distributions from Survival Analysis are compared in Chapter 3. The settlement delay is only available for cases that are closed. This creates a censoring bias (see Section 2.3.2), as cases which are open for a long time might be excluded from the dataset. The next section explains how the likelihood is adjusted to take this bias into account.

Censoring

Censoring, as discussed in Section 2.3.2, applies in particular to the settlement delay. Suppose a claim is currently in its development phase at time t_c > t_rep, but the true settlement time (t_set) has not been observed yet. This makes the settlement censored, with ∆t_set > t_c − t_rep. One could opt to disregard the claims that have not been settled, but this leads to a biased estimate, as we would underestimate the settlement delay. Given N_c claims, there are N_c settlement delays ∆t_set. However, some of them are censored at time t_c, so that we only know that ∆t_set > t_c − t_rep. The likelihood function can then be written as

L(∆t_{set}) = \prod_{i=1}^{N_c} \left[ \Pr(T = ∆t_{i,set})^{I(t_{i,set} < t_c)} \, \Pr(T > t_c − t_{i,rep})^{I(t_{i,set} > t_c)} \right] (2.34)

= \prod_{i=1}^{N_c} \left\{ f(∆t_{i,set})^{I(t_{i,set} < t_c)} \left[ 1 − F(t_c − t_{i,rep}) \right]^{I(t_{i,set} > t_c)} \right\}, (2.35)

\log L(∆t_{set}) = \sum_{i=1}^{N_c} \left[ I(t_{i,set} < t_c) \log f(∆t_{i,set}) + I(t_{i,set} > t_c) \log S(t_c − t_{i,rep}) \right]. (2.36)

We can view this as the product of the densities of the settled claims (f(∆t_{i,set})) and the probabilities that the settlement delay of an open claim is larger than the currently observed time from reporting until the censoring date (1 − F(t_c − t_{i,rep})). The indicator function I is unity when the expression within the brackets is true, and zero otherwise. The likelihood (adjusted for censoring) in the equation above is used to obtain the posterior distribution of the parameters using the formula for Bayesian inference (Eq. (2.2)):

p_{set}(θ_{set} \mid ∆t_{set}, ζ_{set}) ∝ p(θ_{set} \mid ζ_{set}) \, p(∆t_{set} \mid θ_{set}, ζ_{set}) (2.37)

∝ p(θ_{set} \mid ζ_{set}) \prod_{i=1}^{N_c} \left\{ f(∆t_{i,set})^{I(t_{i,set} < t_c)} \left[ 1 − F(t_c − t_{i,rep}) \right]^{I(t_{i,set} > t_c)} \right\}, (2.38)


where in the last step we substituted Eq. (2.34) for the term p(∆t_set | θ_set, ζ_set), θ_set ∈ R^{d_set}, where d_set is the number of parameters of the settlement model, and ζ_set ∈ R^{N_c} is the covariate information for the settlement model.
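The sketch below shows one way the censoring-adjusted posterior of Eq. (2.38) can be set up in PyMC3, using a Weibull settlement-delay model whose survival term for open claims is added through pm.Potential. The priors, the parameterisation, and the synthetic data are illustrative assumptions and not the thesis implementation.

# A minimal PyMC3 sketch (assumed) of the censoring-adjusted settlement-delay model:
# settled claims enter through the Weibull density, open claims through the survival
# function S(t) = exp(-(t/beta)**alpha) added as a potential term.
import numpy as np
import pymc3 as pm

rng = np.random.default_rng(3)
settled_delays = rng.weibull(1.3, size=400) * 200.0   # observed Delta_t_set (days)
open_elapsed = rng.uniform(50, 600, size=100)         # t_c - t_rep for open claims

with pm.Model() as settlement_model:
    alpha = pm.HalfNormal("alpha", sigma=5.0)          # shape (weak prior, assumption)
    beta = pm.HalfNormal("beta", sigma=500.0)          # scale (weak prior, assumption)
    # density term for settled claims: log f(Delta_t_set)
    pm.Weibull("dt_set", alpha=alpha, beta=beta, observed=settled_delays)
    # survival term for censored claims: log S(t_c - t_rep) = -((t/beta)**alpha)
    pm.Potential("censored", -((open_elapsed / beta) ** alpha))
    trace = pm.sample(1000)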

2.4.4 Payments

The payments within a case are made according to Figure 1.2. Each payment consists of a time t_pay, a type a_type, and an amount a_amount. The inter-event time between payments (∆t_pay) is modelled instead of the time from t_rep to a payment, which makes the gap time between payments a renewal process; see Cook and Lawless (2007) (p. 39-43). Cook and Lawless assume the gap times to be Independent and Identically Distributed (I.I.D.). This allows for using standard methods from Survival Analysis.

Furthermore, a payment can be negative when money is received, which is denoted by the payment type a_type. The next paragraphs describe each part of the payment process separately. For consistency, the total number of claims in the dataset is denoted by N_c, while the total number of payments in the dataset is denoted by N_p.

Payment Delay

The delays in Figure 1.2 are the payment delays ∆t^j_pay for payment j, i.e. the time from the previous payment (t^{j−1}_pay) to the next payment t^j_pay. As stated above, we opt for modelling the delays between payments, instead of modelling the time from t_rep to each payment, so that the gap times between payments form a renewal process with I.I.D. gap times (Cook and Lawless, 2007, pg. 39-43). We can therefore also use the standard methods from Survival Analysis for adding covariates (see Section 2.3.1), which are otherwise only applicable to waiting times until a one-time event.

Assuming independent and identically distributed (I.I.D.) gap times within a renewal process might be a strong assumption. Therefore, we introduce covariates (ζ_paydelay) about the previous gap times to make the gap times dependent on each other. Survival models namely allow us to introduce various covariates into the equation using the familiar Proportional Hazard and Accelerated Failure Time models (Cook and Lawless, 2007). This makes it possible to make the hazard functions of the gap times dependent on past information (e.g. the amount of time passed since the reporting date and since the last payment).

A Poisson process can be seen as a special case of the renewal process in which the gap times are I.I.D. and exponentially distributed (Cook and Lawless, 2007). Modelling the gap times as a renewal process therefore lets us model the payment occurrence process with approximately the same properties as an inhomogeneous Poisson process, in which the intensity parameter λ depends on the position on the time-line. Covariates for the payment delay model (time since the previous payment and extra information about the claim) are denoted by ζ_paydelay, model parameters by θ_paydelay. Filling in the posterior for the payment delay using Eq. (2.2) gives

p(θ_{paydelay} \mid ∆t_{pay}, ζ_{paydelay}) ∝ p(∆t_{pay} \mid θ_{paydelay}, ζ_{paydelay}) \, p(θ_{paydelay}), (2.39)

where θ_paydelay ∈ R^{d_paydelay} (d_paydelay is the number of parameters in the payment delay model) and {y, ζ} = {∆t_pay, ζ_paydelay}, with ∆t_pay ∈ R^{N_p}, where N_p is the number of payments in the dataset and ζ_paydelay is a vector with covariate information.

With regard to the distribution of ∆t_pay we expect the following two characteristics:


I. The hazard rate is expected to be low for small values of t, as the probability of two payments occurring in rapid succession is low.

II. The hazard rate is expected to be higher for increasing values of ∆t_pay. This corresponds to the expected behaviour that cases on which no payment has been made for a long time are more likely to have a payment in the future.

Statements I and II argue for the use of a distribution with a monotonically increasing hazard rate. Distributions within the field of Survival Analysis that have this property are the Weibull distribution (see Figure 2.1B) and the Gamma distribution. The Weibull distribution is often used to model failure times of mechanical devices. Figure 2.1 shows the Weibull PDF and hazard rate for different parameter values. For k < 1 the mean time between failures decreases, while this time increases for k > 1³. The Gamma distribution is often used to describe the amount of time needed for a given number of events to occur, when the inter-event times are modelled using an Exponential distribution. One could argue that the payment delay is a combination of exponential waiting times, which makes the Gamma distribution a worthy contender. Figure 2.3 shows both the PDF and the hazard rate of the Gamma distribution.

[Figure 2.3 contains two panels plotted for scale 1 and shape ∈ {0.25, 0.75, 1.5, 2.5}: (A) the Gamma PDF and (B) the Gamma hazard function.]

FIGURE 2.3: The probability density function f(t) and hazard function h(t) for the Gamma distribution for different shape parameters. Note that the hazard function asymptotically reaches the value of 1, whereas the Weibull hazard function from Figure 2.1B reaches 0 or ∞ in the limit.

Payment Type

With the payment delay defined, we now define the sign of the payment. Payments in a claim process can be either positive (type I) or negative (type II). Negative payments are interpreted as receivables (the insurer receives money from a party). Whether a payment is of type I or II is modelled by logistic regression.

The independent variables of the logistic regression are ζ_paytype, which can include covariates about the claim and the time from t_rep to the previous payment, t^{previous}_pay − t_rep. The intercept of the regression is α ∈ R. The coefficients of the regression are β ∈ R^{N_cov} (N_cov is the total number of covariates) for the independent variables ζ_paytype.

³ For k = 1 the mean time between failures is constant and the Weibull distribution reduces to the Exponential distribution.


Whether a payment i is of type I or II is indicated by a zero-or-one indicator variable a^i_type. The probability of observing the vector a_type (the likelihood) is given by a product of Bernoulli distributions as

p(a_{type} \mid α, β, ζ_{paytype}) = \prod_{i=1}^{N_p} (π_i)^{a^i_{type}} (1 − π_i)^{1 − a^i_{type}}, (2.40)

where y_paytype = a_type, with a^i_type ∈ a_type, θ_paytype = (α, β), and ζ_paytype = (t^{previous}_pay − t_rep, . . .). We use a logistic regression, so that {π_i}_{i=1}^{N_p} follows the logistic function

π_i = \left[ 1 + \exp\left( −(α + ζ_{paytype,i} \, β) \right) \right]^{−1}. (2.41)

Applying Eq. (2.2), the posterior density is given by

p_{paytype}(α, β \mid a_{type}, ζ_{paytype}) ∝ p(α, β) \prod_{i=1}^{N_p} (π_i)^{a^i_{type}} (1 − π_i)^{1 − a^i_{type}}, (2.42)

where a^i_type ∈ a_type. We will apply Eq. (2.42) in Chapter 3, sampling from the posterior to infer the parameters of the distribution.
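A minimal PyMC3 sketch of the payment-type model of Eqs. (2.40)-(2.42) is given below; the covariates and data are synthetic and the prior scales are assumptions for illustration.

# A minimal PyMC3 sketch (assumed) of the Bayesian logistic regression for the payment type.
import numpy as np
import pymc3 as pm

rng = np.random.default_rng(4)
zeta = rng.normal(size=(300, 2))         # illustrative covariates (e.g. elapsed time, subject)
a_type = rng.integers(0, 2, size=300)    # 1 = type II (receivable), 0 = type I (payment)

with pm.Model() as paytype_model:
    alpha = pm.Normal("alpha", mu=0.0, sigma=10.0)
    beta = pm.Normal("beta", mu=0.0, sigma=10.0, shape=2)
    pi = pm.math.sigmoid(alpha + pm.math.dot(zeta, beta))   # Eq. (2.41)
    pm.Bernoulli("a_type", p=pi, observed=a_type)           # Eq. (2.40)
    trace = pm.sample(1000)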

Payment Amount

Each payment in Figure 1.2 comes with a monetary amount a_amount. Bahnemann (2015) gives an overview of the distributions often used by actuaries to model the claim size, referred to as the loss distribution. He investigates distributions with positive support only, as he assumes that claims cannot be negative. In our case, introducing the payment type in the previous section allows us to use these distributions with positive support, as the payment type indicates whether a payment is positive or negative. Bahnemann uses the Gamma distribution, as well as the Lognormal and Pareto distributions. The Lognormal distribution has been shown to be a good fit for the payment amount, as Antonio and Plat (2014) report after comparing the Weibull, Lognormal and Gamma distributions. As before, we introduce covariates into the payment distribution as ζ_amount. As one of these covariates we add the time passed from reporting until the current payment, t^{current}_pay − t_rep. The Gamma model allows the use of a Generalised Linear Model (GLM) to add covariates (Frees, Meyers, and Derrig, 2016). The Lognormal distribution cannot be used in a GLM context, as the distribution is not of the exponential family (Frees, Meyers, and Derrig, 2016). Antonio and Plat (2014) overcome this difficulty by using categorical variables as covariates, by which they effectively estimate separate models for each combination of covariates. Another option is to use a Lognormal regression, which takes the logarithm of the response variable (the payment amount) and uses an ordinary least squares (OLS) regression model with covariates. We will use this option in our case study in Chapter 3. The posterior for the payment amount model follows from Eq. (2.2) as

p_{amount}(θ_{amount} \mid a_{amount}, ζ_{amount}) ∝ p(a_{amount} \mid θ_{amount}, ζ_{amount}) \, p(θ_{amount}), (2.43)

where y = a_amount ∈ R^{N_p} and ζ_amount = (t^{current}_pay − t_rep, . . .).
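A minimal sketch of the Lognormal regression option mentioned above is given below: the payment amounts are log-transformed and regressed on covariates with OLS. The covariate (elapsed time since reporting) and the synthetic data are assumptions for illustration.

# A minimal sketch (assumed) of a Lognormal regression for the payment amount:
# regress the log of the amounts on covariates with ordinary least squares.
import numpy as np

rng = np.random.default_rng(5)
n = 500
elapsed = rng.uniform(0, 1000, size=n)                   # t_pay^current - t_rep (days), illustrative
X = np.column_stack([np.ones(n), elapsed])               # intercept + covariate
amounts = np.exp(5.0 + 0.001 * elapsed + rng.normal(0, 1, n))  # synthetic positive payments

coef, *_ = np.linalg.lstsq(X, np.log(amounts), rcond=None)
residuals = np.log(amounts) - X @ coef
sigma = residuals.std(ddof=X.shape[1])                   # Lognormal sigma estimate
print(coef, sigma)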


2.4.5 Reserves

We have now defined all the separate models needed. Chapter 3 infers these models on data provided by an insurer. The procedure for sampling reserves at the accounting date t_acc is given now. Using the posterior distributions obtained in the previous sections and applying Eq. (2.4), we can obtain samples from the Posterior Predictive Density of our models. Simulating from the PPD is done using PyMC3, a package for Bayesian inference in Python (Salvatier, Wiecki, and Fonnesbeck, 2016).

2.4.6 RBNS Reserve

Algorithm 1 of Appendix C shows the procedure for sampling the RBNS reserve in pseudo-code. The input for the model consists of the claims of a particular occurrence year y_occ which are yet to be closed. Lines 3-6 sample the settlement delay, conditional on the time passed since t_rep and other covariates. Then, lines 7-18 of the algorithm draw the paths of the development process as in Figure 1.2 for each open claim. This procedure halts when all claims are closed, which happens when (I) a claim is past its sampled settlement time or (II) a claim's development extends beyond the development year y_devel, as we only need payments within a certain development year.

2.4.7 IBNR Reserve

Algorithm 2 of Appendix C shows how samples of the IBNR reserve are obtained. The procedure for drawing the development paths is equal to lines 7-30 of Algorithm 1. However, the procedure for the IBNR reserve adds three more steps. First, samples are taken from the occurrence process, which is a Poisson process with a reporting-delay adjusted intensity parameter λ^{IBNR}_i for month i (see Section 2.4.2). These samples are used as the beginnings of claim developments. Then, for these IBNR claims a reporting delay is sampled (see lines 6-8). The rest of the procedure is equal to the algorithm for the RBNS reserve.


Chapter 3

Case Study: ARAG Legal Insurance

3.1 Introduction

In Chapter 2 the theory behind Bayesian inference and its application to a general claim process was explained. In this chapter we consider a case study in which we apply the theory from Chapter 2 to historical data from the Dutch office of the legal insurer ARAG SE.

ARAG (an acronym for Allgemeine Rechtsschutz-Versicherungs-AG) SE is a legal insurer founded in 1935. It is the second-largest legal insurer in the world and serves customers in the United States, the United Kingdom, and thirteen European countries including Germany and The Netherlands. The founding principle - giving all citizens (not only the wealthy ones) the ability to assert their legal rights - is still strongly adhered to within the company.

We focus in this thesis on data from the Dutch office of ARAG. ARAG's lawyers handle legal cases with different subjects. These cases can be grouped into different subject categories, which have certain common characteristics. For example, the most common cases (such as Traffic cases) take less time to handle than more complicated cases. Also, the payments, the reporting delay, or the time between payments may depend on the subject of a case. All these factors influence the loss reserves in the end and are interesting to account for in a model. This also makes the model 'micro-level', as the reserve will be based on case details.

This chapter is organised as follows. In Section 3.2 we discuss the data set that is used in our Bayesian micro-level loss model. Next, we tailor the Bayesian micro-level loss reserving methodology explained in Chapter 2 to the legal insurance branch. The results of the Bayesian inference on the data of ARAG NL, as obtained by using the state-of-the-art library PyMC3 for Python, are shown and discussed in Section 3.3. We motivate from a fundamental point of view why the choice of some distributions differs from those selected in the literature. Furthermore, we show that the Bayesian methodology is a natural choice for selecting an appropriate distribution while accounting for the associated uncertainty.

3.2 Data

The data that is used in the next sections is a subset of all claim data of ARAG Netherlands (ARAG NL). The dataset contains the detailed development of 52033 claims with reporting dates from January 1, 2000 up to and including December 31, 2018. Furthermore, each case contains a subject code, reporting date, settlement date, payment dates and payment amounts. The payments within a case can be split into external and internal costs. External costs are defined as the amount of money ARAG pays to (or receives from) an external party on a case. In contrast, internal costs on a case are the costs that ARAG makes internally (lawyer costs, facilities, etc.).


The payments in our dataset only include the external costs, e.g. court fees, surrender costs, or the cost of hiring external professionals. Note that this is just a subset of the data that ARAG offers. The data contains both open and closed cases, meaning that censoring has to be taken into account during inference.

The dataset contains the following information on a case level:

I. Claim occurrence date (t_occ): the date at which the event happens that is the reason for the start of the case.

II. Claim reporting date (trep): the date at which the claim is reported to ARAG.

III. Claim settlement date (tset): the settlement date for the case.

IV. Payment dates (tpay): dates at which the external costs occur on a case.

V. Payment amounts (without taxes) (aamount): the amount of the external costs.

VI. Subject code (as covariate information ζ): the subject-code for the case.

The variables t_occ, t_rep, t_set, t_pay and a_amount correspond to the definitions given in Section 1.2. Note that all this information is required to perform a fully Bayesian analysis on the claim process as defined in Section 2.4. The data is pre-processed so that it yields the corresponding delay and payment information, cf. Section 1.2. We see that the dataset includes the subject code for each claim. We will show that adding this covariate to the Bayesian models often results in a better fit than without it. The legend in Figure 3.1 shows the color coding for the subjects that will be used throughout the rest of this chapter. Subjects that contain fewer than a thousand cases are added to the Other/Remaining category.

FIGURE 3.1: The legend for the subjects of the cases. The color coding will be used during the rest of the case study. Subjects for which fewer than one thousand cases are available are put in the Other/Remaining category.

The number of cases in the data as a function of subject code is shown in Figure 3.2A. Notice that the Other/Remaining category contains a relatively large number of cases. This tells us that a relatively large number of cases belong to subject codes with fewer than one thousand observations. Figure 3.2B shows the number of occurrences per month. Notice that the number of reported occurrences declines fast after 2016. This is due to the fact that the occurrences (t_occ) in this date range have not all been reported to ARAG yet (IBNR cases). Furthermore, we see large peaks in occurrences at the beginning of each year. This is often due to the fact that the exact occurrence date of a claim is hard to determine, or because a contract starts at the beginning of the year.


The former is a limitation of the dataset, whereas the latter describes a phenomenon in reality. We will investigate this effect further in Section 3.3.2.

(A) The number of claims as a function of subject code. The color codes correspond to the legend in Figure 3.1. We see that the Traffic and Injury subjects contain the most cases.

(B) The number of reported occurrences per month. We see a couple of large peaks at the start of each year.

FIGURE 3.2: Case numbers by (A) subject code and by (B) occurrence date.

3.3 Application of Micro-level Loss Reserving

With the data explored, we are now going to fit the distributions of the components described in Section 2.4 of the Theory chapter on the data provided by ARAG NL. Bayesian inference is used to obtain posterior samples of the model parameters, and the performance of the models is assessed by means of the WAIC and Posterior Predictive Checks (see Section 2.2.3).

This section is structured as follows. First, we look into the reporting delay (∆t_rep). To this end, we compare various distributions (Lognormal, Weibull, Gamma, Exponentiated Weibull and Exponential) to see which model is the most accurate (using the methods from Section 2.2.3). Second, we look into the occurrence process to obtain the intensity parameters of the Poisson distribution for the IBNR cases. The reporting delay is then used to adjust the intensity of the occurrence process (see Section 2.4.2). Third, we look into the settlement delay (∆t_set) and compare the Lognormal, Gamma and Weibull distributions. We also see how censoring plays an important role within this settlement process. Finally, we examine the development process (see Figure 1.2). For the payment delay (∆t_pay), the Lognormal, Gamma and Weibull distributions are considered, as they are often used in the literature (Bahnemann, 2015). The payment type is inferred using a Bayesian logistic regression. For the payment amount (a_amount), the Gamma and Lognormal distributions are compared, and the fit is compared with and without covariates in the model.


3.3.1 Reporting Delay

The reporting delay is considered and modelled separately for different subject codes. Figure 3.3 shows a violin plot of the reporting delay. The colors of the violins correspond to the colors and subject codes of the legend in Figure 3.1. The horizontal axis shows the various subjects and the vertical axis denotes the reporting delay (∆t_rep) in days. The white dot in the middle of a violin denotes the mean of the data, and the upper and lower edges of the black boxes denote the upper and lower quartile of the data respectively. The bodies (widths) of the violins are estimated with Kernel Density Estimation (KDE), using the Seaborn package in Python. Most noteworthy are the subjects Traffic and Injury, which seem to have a much smaller reporting delay on average. Also, referring to Figure 3.2A, these are the two largest subject codes in the dataset. The fact that these reporting delays are smaller on average may be partially explained by the fact that the occurrence of these events (a car crash or injury) often has consequences that are immediately noticeable, whereas the negative consequences of a contractual agreement (t_occ is the date at which the contract is signed) only become apparent when a dispute arises. Furthermore, Traffic and Injury have a clear occurrence date (clients will most likely remember the exact date of the traffic incident or the injury; it might also be part of a police report), while for other subject codes the exact date may be unknown or difficult to determine. Also, when an exact occurrence date is not known, the occurrence date is often set to the first day of the year (this will become apparent in Section 3.3.2).

FIGURE 3.3: Violin plot of the reporting delay for different subject codes. The vertical axis denotes the reporting delay, the horizontal axis the subject, and the width of the violin depicts the Kernel Density Estimate (KDE) of the reporting delay. Noteworthy are the Traffic and Injury subjects, where the reporting delay is on average a lot smaller compared to other subjects. These subjects also have the largest claim count (see Figure 3.2A).

In Section 2.4.1 we showed the shape of the hazard rate for both the Weibull and the Lognormal distribution. Next, we examine the empirical hazard function h(t) of our data, to see whether its shape matches one of the distributions of Section 2.4.1. We do this by again aggregating our data, such that we do not discriminate by subject code.


The empirical hazard rate follows as

h(t) = \frac{f(t)}{S(t)} = \left( \frac{1}{N_c} \sum_{i=1}^{N_c} I(∆t^i_{rep} = t) \right) \left( \frac{1}{N_c} \sum_{i=1}^{N_c} I(∆t^i_{rep} > t) \right)^{−1}, (3.1)

where I(a = b) is an indicator function, which is 1 when a = b and 0 when a ≠ b, and ∆t_rep ∈ R^{N_c}_{+}, where N_c is the number of cases in the dataset.
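A minimal sketch of how Eq. (3.1) can be evaluated on integer-valued reporting delays (in days) is shown below; the synthetic delays are an assumption for illustration.

# A minimal sketch (assumed) of the empirical hazard rate of Eq. (3.1).
import numpy as np

def empirical_hazard(delays, max_t):
    delays = np.asarray(delays, dtype=int)
    t = np.arange(max_t + 1)
    events = np.array([(delays == ti).sum() for ti in t])    # numerator: I(delay == t)
    at_risk = np.array([(delays > ti).sum() for ti in t])    # denominator: I(delay > t)
    with np.errstate(divide="ignore", invalid="ignore"):
        return t, np.where(at_risk > 0, events / at_risk, np.nan)

rng = np.random.default_rng(6)
delays = np.round(rng.lognormal(3.0, 1.0, size=5000)).astype(int)  # synthetic delays
t, h = empirical_hazard(delays, max_t=60)
print(h[:10])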

Figure 3.4 shows the empirical hazard rate h(t) of the reporting delay. Recall the approach of Antonio and Plat (2014): they fitted a Weibull distribution to the reporting delay, with nine fixed components for the first nine days. From our data, the Weibull distribution seems a logical choice after the first ten days, as the hazard rate seems to decrease exponentially. This decay corresponds to the hazard function in Figure 2.1B. However, in approximately the first ten days, the hazard function seems to increase rather than decrease. To this end, the Lognormal hazard rate (see Figure 2.2B) is more appropriate, as its shape better matches the empirical hazard rate.

FIGURE 3.4: Empirical hazard rate, as computed by applying the formula in Eq. (3.1). When we compare the hazard function of the Lognormal distribution in Figure 2.2B with the hazard function of the Weibull in Figure 2.1B, we see that the Lognormal can take a shape more appropriate to the data.

The reporting delay is modelled parametrically, with the subject code as a covariate, as the violin plot in Figure 3.3 shows large variation over subject codes. Thus, the covariates for the reporting delay (ζ_rep) of claim i reduce to ζ^i_rep = x^i_s ∈ R^{N_s}, where N_s denotes the number of different subjects in the dataset. Here, x^i_s ∈ R^{N_s} is a one-hot encoded vector (for example, when N_s = 3 a subject code of 2 is represented as x_s = (0, 1, 0)). The covariate matrix X_s ∈ R^{N_c × N_s} is defined as

X_s = [x^1_s, x^2_s, . . . , x^{N_c}_s]^T, (3.2)

where N_c is the number of claims in the dataset.

The next paragraph compares various distributions for the reporting delay by means of the WAIC and PPC. This will show that the Lognormal distribution describes the data most accurately. For now, we first formally define the model.


For the Lognormal reporting model, the vector y_rep = ∆t_rep ∈ R^{N_c} is represented as

y_{rep} = ∆t_{rep} \mid X_s, β^µ_s, β^σ_s ∼ Lognormal(X_s β^µ_s, X_s β^σ_s), (3.3)

where β^µ_s ∈ R^{N_s} and β^σ_s ∈ R^{N_s}, θ_rep = (β^µ_s, β^σ_s) and ζ_rep = X_s, so that

∆t_{rep} = \exp(X_s β^µ_s + ε), (3.4)

where ε ∈ R^{N_c} ∼ N(0, X_s β^σ_s). For this Lognormal model, the posterior can be written as (starting from Eq. (2.26)):

p(θ_{rep} \mid ∆t_{rep}, ζ_{rep}) ∝ p(∆t_{rep} \mid θ_{rep}, ζ_{rep}) \, p(θ_{rep}),
p(β^µ_s, β^σ_s \mid ∆t_{rep}, X_s) ∝ p(∆t_{rep} \mid β^µ_s, β^σ_s, X_s) \, p(β^µ_s, β^σ_s), (3.5)

where X_s is defined as in Eq. (3.2). Furthermore, we choose an (almost) flat prior on our parameters, such that β^µ_s ∼ N(0, 200) and β^σ_s ∼ N(0, 200) with β^σ_s > 0.

Figure 3.5 shows a comparison of the values of the Watanabe-Akaike Information Criterion for different models of the reporting delay. The lower the deviance, the better the model fits.

FIGURE 3.5: The WAIC comparison for different models of the reporting delay ∆t_rep. The open dot (IV) corresponds to the value of the WAIC. The black error bars correspond to the standard deviation of the WAIC. The dashed gray line (II) corresponds to the lowest value of the WAIC. The black dot (I) corresponds to the WAIC without the correction term p_WAIC (see Eq. (2.9) of Section 2.2.3). The gray triangle and the corresponding error bars (III) indicate the standard error of the differences between the model and the best model.

From Figure 3.5, we see that the Lognormal model has a better fit to the data than the Weibull, Gamma, Exponentiated Weibull and Exponential distributions. This corresponds with our initial expectations as expressed in Section 2.4.1. We also see that the error bars of the Lognormal distribution do not overlap with those of the other models, which provides strong evidence that the Lognormal model outperforms the others. We also see that the Weibull distribution, the distribution used by Antonio and Plat (2014), is the second best option. However, we expect the Lognormal distribution to fit better than the Weibull distribution for smaller values of t, as we do not introduce the rather arbitrary fixed components. This will become clearer in the next paragraph, when we look into the PPCs of both the Weibull and the Lognormal distribution.


We used the NUTS sampler within PyMC3 to obtain posterior samples from the model, using the method pymc3.sample(samples, model, ...) with a sample size of 1000. The resulting posterior distributions for the Lognormal model are given in Figure D.1 in Appendix D. The posterior distributions of both the β^µ_s parameters (mu in the upper left panel of Figure D.1) and the β^σ_s parameters (sd in the lower left panel of Figure D.1) are shown. The color codes correspond to the legend in Figure 3.1. We see that the Traffic, Injury and Labour cases have the lowest means. This corresponds to the violin plot in Figure 3.3, where these subject codes have a much smaller reporting delay on average. Furthermore, we see that the posterior distributions are wider when less data is available for a subject code. Indeed, both the Tenancy and the Administrative law codes have fewer than 1000 data points (Fig. 3.2A) and have a very wide standard deviation in their posterior densities.

We now analyse the quality of the model using the PPC as defined in Section 2.2.3. To this end, we take 1000 samples from our Posterior Predictive Distribution (see Eq. (2.4)) using the pymc3.sample_posterior_predictive(samples, model, ...) method in PyMC3. We plot the histogram of the data on top of the histogram of the samples from our Posterior Predictive Density. This corresponds to a Posterior Predictive Check (PPC), as described in Section 2.2.3. Figure 3.6 shows the PPC for the Lognormal distribution; Figure 3.7 shows the PPC for the Weibull distribution. The Lognormal distribution is expected to overlap more with the histogram of the data than the Weibull distribution does. Indeed, Figure 3.6 seems accurate both in the lower range of t and in the tail. Next, we look into the PPC of the Weibull distribution.

(A) PPC of the first 60 days. (B) PPC of the tail of the distribution.

FIGURE 3.6: Posterior Predictive Check for Posterior Predictive Samples from the Lognormal model versus the observed data. Both figure (A) and (B) seem to fit properly. The tail of the distribution fits particularly well.

Here, we see a misfit for t < 15, which is in line with our expectations from the previous paragraph.
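A minimal sketch of how such a PPC can be produced with the PyMC3 method mentioned above is given below; it assumes the reporting_model, trace, and delay objects from the earlier reporting-delay sketch and uses matplotlib for the overlaid histograms.

# A minimal sketch (assumed) of a Posterior Predictive Check: sample from the PPD
# of the fitted reporting-delay model and overlay its histogram on the data.
import matplotlib.pyplot as plt
import pymc3 as pm

with reporting_model:  # model, trace and delay come from the earlier reporting-delay sketch
    ppc = pm.sample_posterior_predictive(trace, samples=1000)

plt.hist(delay, bins=60, range=(0, 60), density=True, alpha=0.5, label="observed")
plt.hist(ppc["delta_t_rep"].ravel(), bins=60, range=(0, 60), density=True,
         alpha=0.5, label="posterior predictive")
plt.xlabel("reporting delay (days)")
plt.legend()
plt.show()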

3.3.2 Occurrence

Until now, we have modelled the reporting delay of the claim process (t_rep in Figure 1.1). The next component to model is the occurrence rate: the rate at which the events happen that initiate a claim (see t_occ in Figure 1.1), as fully explained in Section 2.4.2.


(A) PPC of the first 60 days. (B) PPC of the tail of the distribution.

FIGURE 3.7: Posterior Predictive Check for Posterior Predictive Samples from the Weibull model versus the observed data. We see that this fit is not as good as the PPCs for the Lognormal distribution, as depicted in Figure 3.6.

As described in Section 2.4.2, we will infer an inhomogeneous Poisson process on the occurrence times (t_occ) of the claims, and then use the distribution of the reporting delay to adjust for the unobserved occurrences using Eq. (2.32). We consider N_m = 228 months (January 1, 2000 until December 31, 2018) in our dataset, so that λ ∈ R^{N_m}_{+}. Furthermore, we estimate the λ parameter vector for each subject individually. This allows us to use the reporting-delay distribution of the matching subject code to adjust for the IBNR cases (see Section 2.4.2).

FIGURE 3.8: Heatmap of posterior samples from the λ parameter vector of the occurrence rate for cases with the Contractual (160) subject-code. Note that this heat map is not adjusted for the non-reported cases. Therefore, we observe a sharp decline of the intensity parameter after 2017. The color coding of the heat map corresponds to the number of samples. The vertical axis denotes the value of λ, the horizontal axis denotes the month.


Figure 3.8 shows the (unadjusted) posterior occurrence rate for the Contractual subject code. The vertical axis contains the Poisson intensity parameter, while the horizontal axis denotes the corresponding month. The color coding of the heat map corresponds to the number of samples from the posterior distribution of λ. We see a peak in occurrences in the first month of every year. This is often because the exact date of occurrence is hard to determine. In addition, the Contractual subject code might include more cases for which a legally-binding agreement is signed with a starting date at the beginning of the year. The part of the heat map close to 2019-01 decreases rapidly, because these cases have not yet been reported to the insurer. When we take the reporting delay into account and adjust the occurrence rate, we expect to see more occurrences in the last months. This is depicted in Figure 3.9, which again shows the occurrence heat map for the Contractual subject, but this time the Incurred But Not Reported (IBNR) cases have been included using Eq. (2.32). Note that the Contractual subject has a long reporting delay on average (see Figure 3.3). Therefore, the effect of not observing certain cases is still apparent quite a long time before the accounting moment tacc.

FIGURE 3.9: Heat map of posterior samples from the λ parameter vector of the occurrence rate for cases with the Contractual (160) subject-code. This heat map is adjusted for the non-reported cases, using the distribution for the reporting delay. The sharp decline we observed in Figure 3.8 is no longer visible. See Fig. 3.8 for a description of the axes.

Figure 3.9 shows the heat map of samples of the λ parameter for the Contractual subject code, adjusted with the distribution for the reporting delay. As expected, the sharp decline near the accounting date disappears once the adjustment is applied. Note that, in calculating the adjustment, all uncertainty has been fully incorporated: samples from the distribution of ∆trep have been used to compute samples of Frep(tacc − t) in Eq. (2.33), after which the samples of λIBNR were determined to adjust the (already uncertain) λ in Figure 3.8.
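A minimal numerical sketch of this adjustment step is given below. It assumes, as one common formulation, that the expected number of not-yet-reported occurrences in a month equals the observed intensity scaled by the odds of not yet being reported before tacc; the array names and toy numbers are hypothetical.

import numpy as np
from scipy.stats import lognorm

# Hypothetical posterior draws: lam_obs[s, m] is the reported-claim intensity for
# posterior sample s and month m; mu[s] and sigma[s] parameterise the lognormal
# reporting-delay distribution for the same posterior sample.
n_draws, n_months = 1000, 228
lam_obs = np.random.gamma(5.0, 1.0, size=(n_draws, n_months))
mu = np.random.normal(3.0, 0.1, size=n_draws)
sigma = np.abs(np.random.normal(0.8, 0.05, size=n_draws))

# Days from (the middle of) each month to the accounting date t_acc, newest month last.
days_to_acc = np.arange(n_months - 1, -1, -1) * 30.4 + 15.0

# F_rep(t_acc - t): probability that a claim occurring in month m is reported before t_acc.
F_rep = lognorm.cdf(days_to_acc[None, :], s=sigma[:, None], scale=np.exp(mu[:, None]))

lam_ibnr = lam_obs * (1.0 - F_rep) / F_rep   # samples of the IBNR intensity (cf. Figure E.1)
lam_adj = lam_obs + lam_ibnr                 # adjusted occurrence rate (cf. Figure 3.9)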

Figure E.1 in Appendix E shows samples of λIBNR only. This is effectively the difference between the non-adjusted heat map in Figure 3.8 and the adjusted heat map in Figure 3.9. We see that the adjustment factor becomes more and more uncertain but also increases when we get closer to the accounting date tacc of December 31,


2018. This behaviour is expected as the number of unreported occurrences increases when we get closer to the accounting date.

As visible from Figure 3.3, the subject code Traffic has a shorter reporting delay on average. It is therefore interesting to examine the posterior occurrence rates for this subject code. The non-adjusted heat map is depicted in Figure E.2 of Appendix E. Notice that we do not observe high peaks in the intensity parameter (λ) at the beginning of each year. This is due to the nature of the Traffic cases, which often have a very clear occurrence day (for example, the day on which an accident happens). Notice also the steep incline of the intensity parameter in 2012; it is due to a policy change within ARAG.

Finally, we plot the intensity parameter for the IBNR cases with the Traffic subject-code in Figure E.3 of Appendix E, with the same scale for the horizontal axis as Figure E.1. We observe that the number of occurrences decreases much faster for months further into the past than in Figure E.1. This makes perfect sense, as the average reporting delay is much smaller for the Traffic subject (see Figure 3.3).

3.3.3 Settlement Delay

The components occurrence and reporting in the claim process of Figure 1.1 have now been addressed. We now look into the settlement delay, i.e. the time from trep until tset (see ∆tset in Figure 1.1). Figure 3.10 shows a violin plot for the settlement delay for different subject codes, in the same fashion as Figure 3.3 depicted the reporting delay. However, we see different results compared to Figure 3.3. The Traffic cases still have a small delay. However, while Injury cases have the 2nd

smallest reporting delay on average, they do not have a very short settlement delay on average. This might again be due to the nature of the claim: after an injury, it becomes apparent rather quickly that a problem has occurred. However, it may be difficult to determine what happened exactly, or to figure out which party is liable for the injury. A case might have a longer settlement delay for these reasons.

The next paragraph will compare distributions for the settlement delay by means of the WAIC and PPC. This will show that the Lognormal distribution captures the data most accurately. For now, we first formally define the Lognormal model, specified similarly to Section 3.3.1. For this Lognormal model, the vector $y_{\mathrm{set}} = \Delta t_{\mathrm{set}} \in \mathbb{R}^{N_c}$ is represented as

\[
y_{\mathrm{set}} = \Delta t_{\mathrm{set}} \mid X_s, \beta^\mu_s, \beta^\sigma_s \sim \mathrm{Lognormal}(X_s\beta^\mu_s,\ X_s\beta^\sigma_s), \tag{3.6}
\]
where $X_s$ is as defined in Eq. (3.2), $\beta^\mu_s \in \mathbb{R}^{N_s}$ and $\beta^\sigma_s \in \mathbb{R}^{N_s}$, $\theta_{\mathrm{set}} = (\beta^\mu_s, \beta^\sigma_s)$ and $\zeta_{\mathrm{set}} = X_s$, so that
\[
\Delta t_{\mathrm{set}} = \exp(X_s\beta^\mu_s + \varepsilon_{\mathrm{set}}), \tag{3.7}
\]
where $\varepsilon_{\mathrm{set}} \in \mathbb{R}^{N_c} \sim \mathcal{N}(0, X_s\beta^\sigma_s)$.

The posterior from Eq. (2.26) can be written as:

\[
\begin{aligned}
p(\theta_{\mathrm{set}} \mid \Delta t_{\mathrm{set}}, \zeta_{\mathrm{set}}) &\propto p(\Delta t_{\mathrm{set}} \mid \theta_{\mathrm{set}}, \zeta_{\mathrm{set}})\, p(\theta_{\mathrm{set}}) \\
p(\beta^\mu_s, \beta^\sigma_s \mid \Delta t_{\mathrm{set}}, X_s) &\propto p(\Delta t_{\mathrm{set}} \mid \beta^\mu_s, \beta^\sigma_s, X_s)\, p(\beta^\mu_s, \beta^\sigma_s)
\end{aligned} \tag{3.8}
\]

Furthermore, we choose to take an (almost) flat prior on our parameters, s.t. $\beta^\mu_s \sim \mathcal{N}(0, 200)$ and $\beta^\sigma_s \sim \mathcal{N}(0, 200)$ with $\beta^\sigma_s > 0$.


FIGURE 3.10: Violin plot for the settlement delay for different subject-codes. The vertical axis denotes the settlement delay, the horizontal axis the subject, and the width of the violin depicts the Probability Density Function (PDF) of the settlement delay. Note that the Traffic subject still has a smaller settlement delay on average. However, whereas the Injury subject had a smaller-than-average reporting delay (see Figure 3.3), it does not have a very small settlement delay. This is largely due to the complexity of the cases.

Figure 3.11 shows the values for the Watanabe-Akaike Information Criterion. We see again that the Lognormal distribution provides a much better fit to the data than both the Gamma and the Weibull distribution.

FIGURE 3.11: The WAIC comparison for different uncensored distributions of the settlement delay ∆tset. For a description of the axes and content, see Figure 3.5.
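As a sketch of how such a WAIC comparison could be carried out in PyMC3, consider the simplified single-distribution models below; the toy data are hypothetical and the resulting numbers only illustrative.

import numpy as np
import pymc3 as pm

# Toy stand-in for the observed settlement delays in days (hypothetical data).
delays = np.random.lognormal(mean=5.0, sigma=1.0, size=400)

with pm.Model() as lognormal_model:
    mu = pm.Normal("mu", 0.0, 200.0)
    sd = pm.HalfNormal("sd", 200.0)
    pm.Lognormal("delay", mu=mu, sd=sd, observed=delays)
    lognormal_trace = pm.sample(1000)

with pm.Model() as weibull_model:
    alpha = pm.HalfNormal("alpha", 10.0)
    beta = pm.HalfNormal("beta", 1000.0)
    pm.Weibull("delay", alpha=alpha, beta=beta, observed=delays)
    weibull_trace = pm.sample(1000)

# The resulting WAIC estimates can then be compared across the candidate
# likelihoods, similar in spirit to Figures 3.5 and 3.11.
print(pm.waic(lognormal_trace, lognormal_model))
print(pm.waic(weibull_trace, weibull_model))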

As explained in Section 2.4.3, we need to adjust for cases for which the true settlement delay has not (yet) been observed. We do this by adjusting the likelihood according to Eq. (2.34). Figure 3.12 compares the WAIC for both the censored and the uncensored Lognormal model. We see that when we introduce the adjustment for censoring, the model dramatically improves in terms of the WAIC.
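A minimal sketch of how such a censoring adjustment could be written in PyMC3 is given below, following the standard survival-analysis construction in which right-censored cases contribute their log-survival probability via pm.Potential. The toy data are hypothetical and the subject covariates are omitted for brevity.

import numpy as np
import theano.tensor as tt
import pymc3 as pm

# Hypothetical data: settlement delays of closed cases, plus the time on risk so far
# for cases that are still open at the accounting date (right-censored), all in days.
observed = np.random.lognormal(5.0, 1.0, size=300)
censored_at = np.random.uniform(200.0, 1095.0, size=100)

with pm.Model() as censored_settlement_model:
    mu = pm.Normal("mu", 0.0, 200.0)
    sd = pm.HalfNormal("sd", 200.0)

    # Likelihood contribution of the closed (fully observed) cases.
    pm.Lognormal("delay_obs", mu=mu, sd=sd, observed=observed)

    # Right-censored cases contribute log S(t_cens) = log(1 - F(t_cens))
    # = log(0.5 * erfc((log t_cens - mu) / (sd * sqrt(2)))) for a lognormal delay.
    z = (tt.log(censored_at) - mu) / (sd * tt.sqrt(2.0))
    pm.Potential("delay_cens", tt.sum(tt.log(0.5 * tt.erfc(z))))

    trace = pm.sample(1000)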

We used the same sampling size and method as for the reporting delay. Figure D.2 in Appendix D shows the traces for the censored Lognormal settlement model


FIGURE 3.12: The WAIC comparison for both the censored and uncensored Lognormal models of the settlement delay ∆tset. For a description of the axes and content, see Figure 3.5.

(whose PPC was plotted in Figure 3.14). The random variable mu corresponds to the $\beta^\mu_s$ parameter vector, while the random variable sd corresponds to the $\beta^\sigma_s$ parameter vector. We see lower values of $\beta^\mu_s$ for both the Traffic and Contractual subject codes, which corresponds to the observation from the violin plot in Figure 3.10. We again see a lower standard deviation in the mu as well as the sigma parameter for subject codes which have a large number of cases in the dataset (see Figure 3.2A). In other words, the model is quite confident about mu and sigma in the Traffic and the Injury categories.

Now, we look into the Posterior Predictive Checks (PPCs). The dramatic improvement of the WAIC when adjusting the likelihood for censoring (as visible in Figure 3.12) is not directly visible when we look into the Posterior Predictive Checks in Figures 3.13 (Uncensored) and 3.14 (Censored). However, the PPC is a bit misleading in this context, as we cannot directly compare the observed data with the output of the model. This is due to the fact that the observed data only contains cases that are closed; the longer-running cases are discarded from our dataset. Since we only consider data with a reporting date after January 1, 2016 and our dataset stops at December 31, 2018, the maximum observed settlement delay is 1095 days. This is exactly what we see happening in Figure 3.13B. The red area in the histogram is the observed data, which stops rather abruptly after a t of 1095 days.

Figure 3.13 shows the PPC for the likelihood unadjusted for censoring, i.e. the model that does not take censoring into account. We see that the fit deviates from the data for 60 < t < 150 in Figure 3.13A. Also, we see no adjustment in the tail area (Figure 3.13B) for the unobserved samples, which makes the model less realistic.

Figure 3.14 shows the PPC for the likelihood adjusted to incorporate censoring. Adjusting makes the histogram of the posterior predictive samples continue to much larger values of t. Nevertheless, the overall fit of the Lognormal distribution seems to be appropriate. We see that the fit for 60 < t < 150 improves compared to Figure 3.13A.


(A) PPC for the first 300 days (B) PPC for the tail of the distribution

FIGURE 3.13: Posterior Predictive Check (PPC) of the settlement data together with the Posterior Predictive samples from the Uncensored Lognormal model. We see that, compared to Figure 3.14B, the samples from our model decay quicker when t > 1000.

(A) PPC for the first 300 days (B) PPC for the tail of the distribution

FIGURE 3.14: Posterior Predictive Check (PPC) of the settlement data together with the Posterior Predictive samples from the Censored Lognormal model. We observe that the data suddenly stops around 1100 days, but the density of the Posterior Predictive samples continues beyond this point.


3.3.4 Payments

Up to this point, we have modelled the components ∆trep, the occurrence process (tocc) and ∆tset of the claim process as defined in Figure 1.1. In the previous sections, the results for the delays in Figure 1.1 were laid out. This section looks into the results for the payment process (i.e. ∆tpay, atype, aamount) from Figure 1.2. It is the last remaining part of the model.

Payment Delay

First, we consider the payment delay (∆tpay) in our payment process. Again, we plot the violin plot for different subject codes in Figure 3.15. The vertical axis now denotes the payment delay in months; this differs from the previous delays, which were measured in days. Noteworthy are again the Traffic and the Contractual subject codes, which have the lowest mean (white dot in the violin plot). This is in line with our expectations, as the settlement delays for these subject codes were also the smallest on average. Notable, however, are the Real Estate and Insurance subject codes, which do not have a large payment delay on average while they both had the largest settlement delay (on average) in Figure 3.10. One explanation for this could be that a large number of payments are made on a Real Estate or Insurance case on average, explaining the shorter delay between payments. We first consider the model with as

FIGURE 3.15: Violin plot for the payment delay (∆tpay) for different subject-codes. The y-axis denotes the payment delay, the x-axis the subject, and the width of the violin depicts the Probability Density Function (PDF) of the payment delay.

covariate only the time from trep until the previous payment, which we denote by $t^{\mathrm{previous}}_{\mathrm{pay}} - t_{\mathrm{rep}}$. Applying Eq. (2.2), combined with Eq. (2.39), yields
\[
p(\theta_{\mathrm{paydelay}} \mid \Delta t_{\mathrm{pay}}, \zeta_{\mathrm{paydelay}}, t^{\mathrm{previous}}_{\mathrm{pay}} - t_{\mathrm{rep}})
\propto p(\Delta t_{\mathrm{pay}} \mid \theta_{\mathrm{paydelay}}, \zeta_{\mathrm{paydelay}}, t^{\mathrm{previous}}_{\mathrm{pay}} - t_{\mathrm{rep}})\, p(\theta_{\mathrm{paydelay}}). \tag{3.9}
\]
For the likelihood term $p(\Delta t_{\mathrm{pay}} \mid \theta_{\mathrm{paydelay}}, \zeta_{\mathrm{paydelay}}, t^{\mathrm{previous}}_{\mathrm{pay}} - t_{\mathrm{rep}})$, we use a Lognormal, Gamma and Weibull distribution respectively. We again chose rather flat priors for $\theta_{\mathrm{paydelay}}$. The comparison in terms of the WAIC is given in Figure 3.16. From this model, we see again that the Lognormal distribution is better in terms of


FIGURE 3.16: The WAIC comparison for different (Lognormal, Gamma and Weibull) models for the Payment Delay ∆tpay. For a description of the axes and content, see Figure 3.5.

the WAIC compared to the Gamma and Weibull distributions.

For this model, we again take samples with a reporting date after January 1, 2016. Therefore, we need to add censoring to account for payment delays that have not (yet) been observed fully. Furthermore, we decided to view the time from the last observed payment $t^j_{\mathrm{pay}}$ until the settlement tset as a censoring moment for the payment delay. For these two reasons, adding censoring to the model should improve the model drastically. Furthermore, when looking at Figure 3.15, we see large variability in ∆tpay for the different subject codes. We therefore also add the subject matrix $X_s \in \mathbb{R}^{N_c \times N_s}$ (see Eq. (3.2)) as a covariate. Figure 3.17 shows the WAIC comparison for the Lognormal model as in Eq. (3.9), after adding censoring (C) and introducing the subject code (S). We see that the Lognormal model with subject code and censoring (Lognormal C+S) is better in terms of the WAIC than the censored Lognormal model (Lognormal C) and the Lognormal model without subject code or censoring (Lognormal).

The C+S Lognormal model specifies ∆tpay as

\[
y_{\mathrm{paydelay}} = \Delta t_{\mathrm{pay}} \mid \zeta_{\mathrm{paydelay}}, \theta_{\mathrm{paydelay}} \sim \mathrm{Lognormal}\bigl(X_s\beta^\mu_{\mathrm{subject}} + (t^{\mathrm{previous}}_{\mathrm{pay}} - t_{\mathrm{rep}})\beta^\mu_{\mathrm{delay}},\ \sigma_{\mathrm{pay}}\bigr), \tag{3.10}
\]
where $\theta_{\mathrm{paydelay}} = (\beta^\mu_s, \beta^\mu_{\mathrm{delay}}, \sigma_{\mathrm{pay}})$, $\zeta_{\mathrm{paydelay}} = (X_s, t^{\mathrm{previous}}_{\mathrm{pay}} - t_{\mathrm{rep}})$, $\beta^\mu_s \in \mathbb{R}^{N_s}$, $\sigma_{\mathrm{pay}} \in \mathbb{R}$ and $\beta^\mu_{\mathrm{delay}} \in \mathbb{R}$, so that
\[
\Delta t_{\mathrm{pay}} = \exp\bigl(X_s\beta^\mu_{\mathrm{subject}} + (t^{\mathrm{previous}}_{\mathrm{pay}} - t_{\mathrm{rep}})\beta^\mu_{\mathrm{delay}} + \varepsilon\bigr), \tag{3.11}
\]
where $\varepsilon \in \mathbb{R}^{N_p} \sim \mathcal{N}(0, \sigma_{\mathrm{pay}})$. Figure D.3 in Appendix D shows the posterior traces for the C+S (Censoring + Subject) Lognormal model. The titles in Figure D.3 denote mu_beta_subject for $\beta^\mu_{\mathrm{subject}}$, mu_beta_covar for $\beta^\mu_{\mathrm{delay}}$, and sd_alpha for $\sigma_{\mathrm{pay}}$. The distributions with the lowest means correspond to the lowest means in the violin plot of Figure 3.15. The width of each distribution seems to be influenced by the number of observations in the dataset (the model is more unsure where less data is available). We see that the $\beta_{\mathrm{delay}}$ distribution is negative. This indicates that when the time


FIGURE 3.17: The WAIC comparison for different versions of the Lognormal distribution for the Payment Delay ∆tpay. C denotes censoring, S denotes the subject covariate. For a description of the axes and content, see Figure 3.5.

from reporting to the previous payment gets larger, the average payment delay decreases.
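A minimal sketch of the regression structure of this C+S model in PyMC3 is shown below; the toy data are hypothetical and the censoring adjustment (which could be handled with a pm.Potential term as in the settlement-delay sketch above) is omitted for brevity.

import numpy as np
import pymc3 as pm

# Hypothetical toy data: subject dummies, time from reporting to the previous
# payment (in months), and observed inter-payment delays (in months).
n_subjects, n_payments = 3, 400
X_s = np.eye(n_subjects)[np.random.randint(0, n_subjects, n_payments)]
t_since_rep = np.random.uniform(0.0, 24.0, n_payments)
pay_delay = np.random.lognormal(1.0, 0.7, n_payments)

with pm.Model() as paydelay_cs_model:
    beta_subject = pm.Normal("mu_beta_subject", 0.0, 200.0, shape=n_subjects)
    beta_delay = pm.Normal("mu_beta_covar", 0.0, 200.0)
    sigma_pay = pm.HalfNormal("sd_alpha", 200.0)

    # Linear predictor on the log scale, mirroring the structure of Eq. (3.10).
    mu = pm.math.dot(X_s, beta_subject) + t_since_rep * beta_delay
    pm.Lognormal("delta_t_pay", mu=mu, sd=sigma_pay, observed=pay_delay)

    trace = pm.sample(1000)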

Figure 3.18 shows the PPC for the Lognormal model with subject-code as covariate and adjustment for censoring. We see that the model is able to approximate the data quite accurately, especially in the tail-end of the distribution.

(A) PPC for 30 months (B) The PPC for the 10 to 50 month range

FIGURE 3.18: Posterior Predictive Checks (PPC) of the payment-delay data together with the Posterior Predictive samples of the Lognormal model with subject-code as covariate and adjustment for censoring (Lognormal C+S). The y-axis denotes the probability, the x-axis shows the payment delay in months.

Payment Type

The payment type is implemented as a Bayesian logistic regression, where zero indicates a receivable and unity a payment (as laid out in Section 2.4.4). For the covariate information about the claims (ζ), we take the subject matrix Xs as in Eq. (3.2) and


the time from reporting to the previous payment, $t^{\mathrm{previous}}_{\mathrm{pay}} - t_{\mathrm{rep}}$, so that $\zeta_{\mathrm{paytype}} = (X_s, t^{\mathrm{previous}}_{\mathrm{pay}} - t_{\mathrm{rep}})$. The corresponding parameter vector is $\beta = (\beta_s, \beta_{\mathrm{delay}})$.

Recall the posterior density as in Eq. (2.39):
\[
p_{\mathrm{paytype}}(\alpha, \beta_s, \beta_{\mathrm{delay}} \mid a_{\mathrm{type}}, X_s, t^{\mathrm{previous}}_{\mathrm{pay}} - t_{\mathrm{rep}}) \propto p(\alpha, \beta_s, \beta_{\mathrm{delay}}) \prod_{i=1}^{N_p} (\pi_i)^{a^i_{\mathrm{type}}} (1 - \pi_i)^{1 - a^i_{\mathrm{type}}}, \tag{3.12}
\]
where $a^i_{\mathrm{type}} \in a_{\mathrm{type}}$. We use a logistic regression, so that the $\{\pi_i\}_{i=1}^{N_p}$ are given by the logistic function
\[
\pi_i = \Bigl[1 + \exp\bigl(-\alpha + \beta_{\mathrm{delay}}(t^{\mathrm{previous}}_{i,\mathrm{pay}} - t_{i,\mathrm{rep}}) + \beta_s x^i_s\bigr)\Bigr]^{-1}, \tag{3.13}
\]
where $t^{\mathrm{previous}}_{\mathrm{pay}} - t_{\mathrm{rep}} \in \mathbb{R}^{N_p}$ denotes the time passed since reporting for the previous payment and $x^i_s \in \mathbb{R}^{N_{\mathrm{cov}}}$.
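A minimal PyMC3 sketch of this Bayesian logistic regression is given below, using the conventional parameterisation in which the linear predictor is passed through a sigmoid; the toy data are hypothetical and the sign conventions may differ slightly from Eq. (3.13).

import numpy as np
import pymc3 as pm

# Hypothetical toy data: 1 = payment, 0 = receivable, with subject dummies and
# the time (in months) from reporting to the previous payment as covariates.
n_subjects, n_payments = 3, 500
X_s = np.eye(n_subjects)[np.random.randint(0, n_subjects, n_payments)]
t_since_rep = np.random.uniform(0.0, 24.0, n_payments)
a_type = np.random.binomial(1, 0.8, n_payments)

with pm.Model() as paytype_model:
    alpha = pm.Normal("alpha", 0.0, 10.0)
    beta_subject = pm.Normal("beta_subject", 0.0, 10.0, shape=n_subjects)
    beta_delay = pm.Normal("beta_delay", 0.0, 10.0)

    # Linear predictor and logistic link.
    eta = alpha + pm.math.dot(X_s, beta_subject) + beta_delay * t_since_rep
    pi = pm.math.sigmoid(eta)

    pm.Bernoulli("a_type", p=pi, observed=a_type)
    trace = pm.sample(1000)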

Figure D.4 of Appendix D shows the posterior traces of the model. Within this figure, alpha = α, beta_subject = βs and beta_delay = βdelay. We see that the α parameter of the regression is positive. This indicates that over all subject codes, the average probability of a payment is larger than that of a receivable. Furthermore, we see negative values of βs for claims with subject code Traffic, Insurance and Injury. This indicates that relatively more receivables occur within those subject codes. This is understandable for Traffic and Injury cases: the liable party (the cause of the accident or injury) should be held responsible for the cost within a claim. Furthermore, we see a small but positive βdelay. This indicates that the chance of a payment (relative to a receivable) increases when more time has passed from reporting to the current payment time.

Finally, Figure 3.19 shows - per subject-code - the probability of a payment versus a receivable. The dashed line in the middle is the mean value as given by the data. We see that the mean value from the data roughly corresponds to the peaks of the distributions, which shows that our model for the payment type is accurate.

FIGURE 3.19: Samples from our model combined with a KDE plot. The color-codes correspond to the legend in Figure 3.1. The black vertical lines correspond to the sample average of being a payment versus being a receivable as given by the data. We see that the sample average corresponds to the peak of the density.


Payment Amount

With the payment delay (∆tpay) and payment type (atype) defined, we now look into the model for payment amounts. We are provided with the costs on the cases by ARAG, which contain external costs without any taxes. It should be noted, however, that in principle any payment can be used in this model. Figure 3.20 shows the WAIC comparison for two models for the payment amount. Again, the Lognormal distribution is significantly better than the Gamma distribution in terms of the WAIC.

FIGURE 3.20: The WAIC comparison for a Lognormal and Gamma model, without any covariates. For a description of the axes and content, see Figure 3.5.

Next, we add covariates to the models in the same fashion as in previous sections. Figure 3.21 shows the WAIC comparison for the models, including versions where we added covariates. We start with adding the time from trep until the current payment (tpay − trep) as covariate. This covariate is denoted by D (delay). Furthermore, we also add the subject-code S and the payment type T (indicating a receivable or payment).

We see from the WAIC comparison in Figure 3.21 that the Lognormal distribution with the subject code and the time from reporting to the current payment ($t^{\mathrm{current}}_{\mathrm{pay}} - t_{\mathrm{rep}}$) as covariates performs best in terms of the WAIC. However, the D+S+T Lognormal model is not significantly different. In the Lognormal D+S model, the model output is specified as

\[
y_{\mathrm{amount}} = a_{\mathrm{amount}} \mid X_s, t^{\mathrm{current}}_{\mathrm{pay}} - t_{\mathrm{rep}}, \beta^\mu_s, \beta^\mu_{\mathrm{delay}}, \sigma_{\mathrm{amount}} \sim \mathrm{Lognormal}\bigl(X_s\beta^\mu_s + (t_{\mathrm{pay}} - t_{\mathrm{rep}})\beta^\mu_{\mathrm{delay}},\ \sigma_{\mathrm{amount}}\bigr), \tag{3.14}
\]
where $\zeta_{\mathrm{amount}} = (X_s, t^{\mathrm{current}}_{\mathrm{pay}} - t_{\mathrm{rep}})$ with $X_s$ specified as in Eq. (3.2), $\theta_{\mathrm{amount}} = (\beta^\mu_s, \beta^\mu_{\mathrm{delay}}, \sigma_{\mathrm{amount}})$, $\beta^\mu_s \in \mathbb{R}^{N_s}$, $\beta^\mu_{\mathrm{delay}} \in \mathbb{R}$, $t_{\mathrm{pay}} - t_{\mathrm{rep}} \in \mathbb{R}^{N_c}$ and $\sigma_{\mathrm{amount}} \in \mathbb{R}_+$. The Lognormal model specifies $a_{\mathrm{amount}}$ as
\[
a_{\mathrm{amount}} = \exp\bigl(X_s\beta^\mu_s + (t_{\mathrm{pay}} - t_{\mathrm{rep}})\beta^\mu_{\mathrm{delay}} + \varepsilon\bigr), \tag{3.15}
\]
where $\varepsilon \sim \mathcal{N}(0, \sigma_{\mathrm{amount}})$.

Figure 3.22 shows the PPC for the Lognormal D+S payment amount model. We see that the Lognormal distribution fits well in the tail of the distribution. However,


FIGURE 3.21: The WAIC comparison for including different covariates in the payment amount (aamount) model. D denotes the delay from the reporting time until the current payment (tpay − trep) covariate, S denotes the subject covariate. For a description of the axes and content, see Figure 3.5.

we see that the data has some very large peaks in the first bins of the histogram in Figure 3.22A. This makes inferring samples from this distribution difficult, as no single distribution will be able to capture these large peaks in the histogram of the data. We therefore conclude that the model is not able to capture the underlying process accurately. This is not a limitation of the Bayesian inference, but most likely a data-quality issue.

(A) PPC for t in between 0 and 2000 days. (B) PPC for t in between 500 and 3000 days.

FIGURE 3.22: Posterior Predictive Check for the payment amount (aamount). We see that the model does not fit properly (especially for 0 < t < 500). It seems very difficult to model the payment amount in general, as no clear pattern seems visible and the data for the payment-amount seems unstructured.


3.3.5 Reserves

The previous section discussed how the various parts of the model describe our underlying data. This section combines the model with Algorithms 1 and 2 of Appendix C to compute the reserves. To this end, the algorithm utilises Monte Carlo simulation. It draws 1000 paths (as in Figure 1.2) of claim developments for each separate IBNR or RBNS case. An example of the simulated paths can be seen in Figure 3.23. This figure shows some paths for IBNR cases with occurrence dates in the year 2016. The vertical axis denotes the cumulative payment amounts, while the horizontal axis denotes the date. The lines represent the development of the claim (as in Figure 1.2). The settlement of the simulated cases is denoted by a black dot. Noteworthy is the case that starts around 2017-11; this case has a relatively large reporting delay (of around 1 year).


FIGURE 3.23: Some simulated paths for IBNR cases with tocc in 2016. The vertical axis denotes the cumulative payment amounts, while the horizontal axis denotes the date. The lines represent the development of the claim (as in Figure 1.2). The settlement of the simulated cases is denoted by a black dot.

Finally, we will look at two examples for the RBNS and IBNR reserve for cases with an occurrence year of 2018. Note that the amounts (on the horizontal axis) should be viewed with caution, as the payment amount was not modelled accurately due to the quality of the data (see the PPC of Figure 3.22). One should regard the results as a proof of concept.

Figure 3.24 shows a histogram together with a KDE plot of the output of Algorithm 1 on page 61. Only cases with an occurrence within the year 2018 are considered. We observe a right-skewed distribution with a relatively long tail.

Figure 3.25 shows samples of the IBNR reserve as a KDE plot together with the histogram of the (binned) samples. The samples are computed by Algorithm 2 on page 62. We again observe a right-skewed distribution with a relatively long tail.
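Given the vector of simulated reserve amounts produced by these algorithms, point estimates and risk measures follow directly from the samples. A minimal sketch, with a hypothetical array standing in for the simulation output:

import numpy as np

# Hypothetical array of simulated total reserve amounts in euro,
# one entry per Monte Carlo sample from Algorithm 1 (RBNS) or 2 (IBNR).
reserve_samples = np.random.lognormal(mean=13.0, sigma=0.3, size=1000)

best_estimate = reserve_samples.mean()
q95 = np.quantile(reserve_samples, 0.95)  # reserve covering future costs with 95% certainty
print(f"best estimate: {best_estimate:,.0f} EUR, 95% quantile: {q95:,.0f} EUR")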


FIGURE 3.24: Samples for the RBNS reserve for cases with an occurrence time (tocc) in the year 2018. The y-axis denotes the probability, the x-axis denotes the reserve amount in Euro.

FIGURE 3.25: Samples for the IBNR reserve for cases with an occurrence time (tocc) in the year 2018. The vertical axis denotes the probability, the horizontal axis denotes the reserve amount in Euro.


Chapter 4

Conclusions

In this thesis we developed a fully Bayesian framework for micro-level loss reserving. We started from the model by Antonio and Plat (2014), extended the model, improved it where possible and cast it into a Bayesian framework. The thesis has four distinctive contributions:

I. The micro-level loss reserving model

As opposed to the classical reserving approaches that often use aggregated data, micro-level models calculate reserves on an individual case level, which allows the insurer to deploy policy on an individual level as well. Furthermore, these micro-level models allow the insurer to model each process in the claim separately, which allows for more insight into the underlying processes.

II. The fully Bayesian approach

This thesis took a fully Bayesian approach. This Bayesian approach has plenty of advantages; it makes the model fully account for all parameter uncertainty, which in turn causes the reserves to incorporate all uncertainty as well. This allows one to answer questions like: "What is the Incurred But Not Reported (IBNR) or Reported But Not Settled (RBNS) reserve amount so that the realised claim costs in the future are with 95% certainty lower than the reserve estimate?" The incorporation of all uncertainty made the model harder to implement, as Bayesian models rely on computationally intensive sampling methods. However, modern advances in computing power, Hamiltonian Monte Carlo (HMC) samplers and packages such as PyMC3 make Bayesian models more accessible than ever.

Contrary to the Bayesian approach to loss reserving of Haastrup and Arjas (1996), which was fully non-parametric, we took a parametric approach in our model. This made it possible to make distributional assumptions about the underlying processes. This is in line with the suggestion in their paper for further improvements on their (non-parametric) model.

III. The improvements on Antonio and Plat (2014)

This model was built on the work of Antonio and Plat (2014). However, we made some improvements where apt.

First, we introduced an adjustment for censoring in the likelihood of the settlement and payment delay. This allows insurers with shorter datasets to get accurate estimates as well (as these shorter datasets include relatively more censored samples). Additionally, it allows the use of more recent data to calibrate the models. This is important, since the internal processes within a company or in the claim process can change, which makes the most recent data more representative for the current processes.


Secondly, we compared various distributions for the components of the claim process and came up with a rationale for the use of these distributions instead of the ones proposed by Antonio and Plat (2014).

Thirdly, we added covariates about the claim or about the previous steps in the claim process in every part of the model.

Finally, we modelled the payment delay as a renewal process (see Cook and Lawless (2007)). This allowed us to use the standard theory on survival analysis. Also, we made the delays dependent on each other by introducing the time from reporting as a covariate.

IV. The Case Study

It is mentioned by Jin and Frees (2013) that: “Papers that provide detailed and complete implementation of the micro-level models on empirical data are currently lacking in literature”. This thesis added an implementation to the literature by means of a case study.

In summary, the case study showed very good fits on almost all components of the claim process. The payment amount was an exception, but this is not a limitation of the Bayesian framework; it is probably a data quality issue. We saw that adding covariates (such as the subject code of a claim) resulted in an improvement of the model. We also saw that introducing an adjustment for censoring made the model more accurate.

The case study confirmed that the reporting delay is well described by a Lognormal distribution. We also saw some large differences in the distributions for different subject codes, which we accounted for in the model. The Posterior Predictive Checks (PPC) provided confidence for the Lognormal distribution, both in the head and in the tails of the distributions. We were able to model the reporting delay very accurately. The occurrence process was modelled (following Antonio and Plat (2014)) as an inhomogeneous Poisson process. This process was adjusted using the distribution for the reporting delay in a fully Bayesian manner, to account for the IBNR claims. For the settlement delay, we had to modify the likelihood to account for the censored claims (see Section 2.4.3). We saw that the Lognormal distribution described the settlement delay most accurately, and we saw a decrease in the values for the Watanabe-Akaike Information Criterion (WAIC) when we introduced the adjustment for censoring1. The payments were modelled using three steps. We first modelled the payment delay, which was again Lognormally distributed. We saw that introducing the subject code as covariate, and adjusting the likelihood for censoring, improved the model substantially. Finally, the payment type and payment amount were modelled. The mean of the payment type distribution corresponded to the mean of the data. The payment amount was modelled using a Lognormal distribution. However, since the data for the payment amount was of insufficient quality (it contained some strange peaks in the lower end of the distribution), no model could be found to accurately model these amounts. Finally, the components of the model were combined and samples of the RBNS and IBNR reserve were simulated.

1 Implementing censoring in PyMC3 is non-trivial. See https://github.com/pymc-devs/pymc3/issues/1833


4.1 Future Work

One of the most convincing arguments against the use of Bayesian models is the difficulty of implementing them. Antonio and Plat state: “While a formal Bayesian approach is very elegant, it generally leads to significant more complexity, which is not contributing to the accessibility and transparency of the techniques toward practising actuaries” (Antonio and Plat, 2014). Advances in computing power and packages such as PyMC3 (Salvatier, Wiecki, and Fonnesbeck, 2016) have made it more accessible than ever to use Bayesian techniques in practise. However, most of the time during the thesis process has been devoted to programming. The goal was to create a functional and broadly applicable framework for loss reserving. To this end the thesis is successful: the code works and has been structured so that it is ready for modification and extension. Furthermore, the whole process of inferring the model, generating the posterior predictive samples and sampling the reserves is relatively quick.

For future work, the model for the payment amount should be improved, as the current data for the payment amount was not suitable to produce reasonable models. Furthermore, one could investigate whether extra covariates could be added to improve the fit of the models. While in this thesis the subject code was chosen, one could opt for other covariates. This could potentially improve the model, but the choice of covariates should be made by the insurer, as it depends heavily on the kind of product the insurer offers and the information it has available.

Furthermore, to make the framework usable in a large-scale environment, one should be able to schedule jobs, re-run them when they fail and generate reports automatically. Open-source tools like Airflow by the Apache Software Foundation could aid in creating such a workflow. Using these tools efficiently will make loss reserving fully Bayesian and completely automatic, all while retaining control over the results.


Bibliography

Andrieu, Christophe et al. (2003). “An introduction to MCMC for machine learning”.In: Machine learning 50.1-2, pp. 5–43.

Antonio, Katrien and Richard Plat (2014). “Micro-level stochastic loss reserving for general insurance”. In: Scandinavian Actuarial Journal 2014.7, pp. 649–669.

Arjas, Elja (Jan. 1989). “The claims reserving problem in non-life insurance: Somestructural ideas”. In: ASTIN Bulletin 19.

Bahnemann, David (2015). Distributions for actuaries. CAS Monograph Series 2. Casualty Actuarial Society. ISBN: 978-0-9624762-8-0.

Box, George E. P. (1980). “Sampling and Bayes’ Inference in Scientific Modellingand Robustness”. In: Journal of the Royal Statistical Society. Series A (General) 143.4,pp. 383–430. ISSN: 00359238. URL: http://www.jstor.org/stable/2982063.

Bradburn, M. J. et al. (2003). “Survival Analysis Part II: Multivariate data analysis- an introduction to concepts and methods”. In: British Journal of Cancer 89.3,pp. 431–436. ISSN: 1532-1827.

Carpenter, Bob et al. (2017). “Stan: A Probabilistic Programming Language”. In: Jour-nal of Statistical Software, Articles 76.1, pp. 1–32. DOI: 10.18637/jss.v076.i01.

Cook, Richard J. and Jerald Lawless (2007). Models and frameworks for analysis of recurrent events. New York, NY: Springer New York. ISBN: 978-0-387-69810-6.

Creutz, Michael (1988). “Global Monte Carlo algorithms for many-fermion systems”.In: Physical Review D 38.4, p. 1228.

England, Peter D and Richard J Verrall (2002). “Stochastic claims reserving in generalinsurance”. In: British Actuarial Journal 8.3, pp. 443–518.

Frees, Edward W., Glenn Meyers, and Richard A. Derrig (2016). Predictive Model-ing Applications in Actuarial Science. Ed. by Christopher Daykin and Angus Mac-donald. Vol. II. International Series On Actuarial Science. Cambridge UniversityPress.

Friedland, Jacqueline (2010). Estimating Unpaid Claims Using Basic Techniques. Casu-alty Actuarial Society.

Gelman, Andrew, Jessica Hwang, and Aki Vehtari (2014). “Understanding predic-tive information criteria for Bayesian models”. In: Statistics and computing 24.6,pp. 997–1016.

Gelman, Andrew, Xiao-Li Meng, and Hal Stern (1996). “Posterior Predictive As-sessment Of Model Fitness Via Realized Discrepancies”. In: Statistica Sinica 6.4,pp. 733–760. ISSN: 10170405, 19968507. URL: http://www.jstor.org/stable/24306036.

Greenberg, Edward (Jan. 2009). Introduction to Bayesian Econometrics. Vol. 85. DOI:10.1017/CBO9780511808920.

Haastrup, Svend and Elja Arjas (1996). “Claims Reserving in Continuous Time; A Nonparametric Bayesian Approach”. In: ASTIN Bulletin 26.2, pp. 139–164. DOI: 10.2143/AST.26.2.563216.

Hoffman, Matthew D and Andrew Gelman (2014). “The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo.” In: Journal of Machine Learning Research 15.1, pp. 1593–1623.


Huang, Jinlong et al. (2015). “An individual loss reserving model with independent reporting and settlement”. In: Insurance: Mathematics and Economics 64, pp. 232–245. ISSN: 0167-6687. DOI: https://doi.org/10.1016/j.insmatheco.2015.05.010. URL: http://www.sciencedirect.com/science/article/pii/S0167668715000918.

James, Gareth et al. (2014). An Introduction to Statistical Learning: With Applications inR. Springer Publishing Company, Incorporated. ISBN: 1461471370, 9781461471370.

Jin, Xiaoli and Edward W. Frees (2013). “Comparing Micro- and Macro-Level LossReserving Models”. In:

Karr, Alan (Mar. 1991). Point Processes and Their Statistical Inference. Probability: Pureand Applied. CRC Press. ISBN: 9780824785321.

Kleinbaum, David G. and Mitchel Klein (2005). Survival Analysis: A Self-Learning Text.Springer Science and Business Media, LLC.

Kunkler, Michael (2004). “Modelling zeros in stochastic reserving models”. In: Insur-ance: Mathematics and Economics 34.1, pp. 23–35.

Leung, Kwan-Moon, Robert M. Elashoff, and Abdelmonem A. Afifi (1997). “Censoring Issues In Survival Analysis”. In: Annual Review of Public Health 18.1, pp. 83–104. DOI: 10.1146/annurev.publhealth.18.1.83.

Linden, Wim J van der (2006). “A lognormal model for response times on test items”.In: Journal of Educational and Behavioral Statistics 31.2, pp. 181–204.

Norberg, Ragnar (1993). “Prediction of outstanding liabilities in non-life insurance”.In: ASTIN Bulletin: The Journal of the IAA 23.1, pp. 95–115.

Rodríguez, Germán (2010). Parametric Survival Models. Handout. URL: https://data.princeton.edu/pop509/ParametricSurvival.pdf.

Salvatier, John, Thomas W. Wiecki, and Christopher Fonnesbeck (Jan. 2016). “Probabilistic programming in Python using PyMC3”. In:

Schnipke, Deborah L and David J. Scrams (May 1999). “Representing Response-TimeInformation in Item Banks”. In: Law School Admission Council Computerized TestingReport. LSAC Research Report Series Law School Admission Council, Princeton,NJ. P. 20.

Spiegelhalter, David et al. (1996). “BUGS 0.5: Bayesian inference using Gibbs sam-pling manual (version ii)”. In: MRC Biostatistics Unit, Institute of Public Health,Cambridge, UK, pp. 1–59.

Taylor, Greg and Mireille Campbell (Jan. 2002). “Statistical Case Estimation”. In:SSRN Electronic Journal. DOI: 10.2139/ssrn.2660061.

Taylor, Greg, Gráinne McGuire, and Alan Greenfield (Jan. 2003). “Loss Reserving: Past, Present and Future”. In: SSRN Electronic Journal. DOI: 10.2139/ssrn.2660062.

Theano Development Team, The et al. (May 2016). “Theano: A Python frameworkfor fast computation of mathematical expressions”. In:

Thissen, David (1983). “An Approach Using Item Response Theory”. In: New Hori-zons in Testing. Ed. by David J. Weiss. Timed Testing 9. San Diego: AcademicPress, pp. 179 –203. ISBN: 978-0-12-742780-5.

Vaughan, E.J. (1996). Risk Management. New York: Wiley. ISBN: 978-0471107590.

Vehtari, Aki, Andrew Gelman, and Jonah Gabry (2017). “Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC”. In: Statistics and Computing 27.5, pp. 1413–1432.

Wüthrich, Mario V (2018). “Machine learning in individual claims reserving”. In:Scandinavian Actuarial Journal 6, pp. 465–480.


Appendix A

Python Code

This appendix only shows the top-level code. The code makes use of a lot more underlying classes, methods and functions.

A.1 RBNS Reserve

from microlevellossreserving import data as data
from datetime import date
import numpy as np

from models.event_type import EventTypeModel
from models.occurrence import OccurrenceModel
from microlevellossreserving.models.payment_amount import SubjectLognormalPaymentAmountModel
from microlevellossreserving.models.payment_delay import CensoredSubjectLognormalPaymentDelayModel
from microlevellossreserving.models.reporting_delay import LognormalReportingModel
from models.reserve import ReservingModel, ReservingModelResult, ReservingModelType
from models.settlement_delay import CensoredLognormalSettlementModel

occ_year = 2018
paths_per_dossier = 2000
max_path_len = 100

from_file = True

# Load data
development = data.get_development_frame(from_file=True)
development = development[development['ReportingDate'] > date(year=2016, month=1, day=1)]

# Load occurrences
occ_count = data.get_occurrence_count(min_total_occurrences=5000, from_file=True)
occ_count = occ_count[(occ_count.index > '2000-01-01')]
occ_count = occ_count.fillna(0)

# Load Models
rep = LognormalReportingModel(development, from_file=from_file)
occ = OccurrenceModel(rep, occ_count, from_file=from_file)
set = CensoredLognormalSettlementModel(development, from_file=from_file)
pay_delay = CensoredSubjectLognormalPaymentDelayModel(development, from_file=from_file)
event = EventTypeModel(development, from_file=from_file)
pay_amnt = SubjectLognormalPaymentAmountModel(development, from_file=from_file)

m = ReservingModel(rep, occ, set, pay_delay, event, pay_amnt, ReservingModelType.RBNS, paths_per_dossier)

r = ReservingModelResult(ReservingModelType.RBNS, max_path_len, paths_per_dossier)

development_all = data.get_development_frame(from_file=True)
open_development = data.get_open_filter_duplicates(development_all)
mask_occurrence = (date(year=occ_year, month=1, day=1) < open_development['OccurrenceDate']) & (
    open_development['OccurrenceDate'] < date(year=occ_year + 1, month=1, day=1))
development_for_occurrence = open_development[mask_occurrence]

subjects = np.array(development_for_occurrence['SubjectIndex'])
settlement_delay_until_now_vector = np.array(development_for_occurrence['SettlementDelayUntilNow'])
dates_reporting = np.array(development_for_occurrence['ReportingDate']).astype('datetime64[D]')
months_already_active = np.array((development_for_occurrence['AccountingDate'] -
    development_for_occurrence['ReportingDate']) / np.timedelta64(1, 'M'))  # months after opening
r.dates_reporting = np.tile(dates_reporting, reps=(paths_per_dossier, 1))

print('======== SAMPLING ========')
m.set_subjects_dates(subjects)
r.clear_memory(size=m.width)
r.dates_close = m.sim_settlement_dates(r.dates_reporting, months_already_active)
r = m.simulate_paths(r)
r.add_sample()
r.to_file("occ_year_2018")
print('========== DONE ==========')

LISTING A.1: RBNS

A.2 IBNR Reserve

from microlevellossreserving.models.event_type import EventTypeModel
from microlevellossreserving.models.occurrence import OccurrenceModel
from microlevellossreserving.models.payment_amount import TypeSubjectLognormalPaymentAmountModel
from microlevellossreserving.models.payment_delay import WeibullPaymentDelayModel
from microlevellossreserving.models.reporting_delay import LognormalReportingModel
from tqdm import tqdm
from microlevellossreserving import data
from datetime import date

from microlevellossreserving.models.reserve import ReservingModelType, ReservingModel, ReservingModelResult
from microlevellossreserving.models.settlement_delay import CensoredLognormalSettlementModel

from_file = True

reserve_type = ReservingModelType.IBNR

occ_year = 2016
samples_reserve = 5
paths_per_dossier = 100
max_path_len = 500

# Load data
development = data.get_development_frame(from_file=True)
development = development[development['ReportingDate'] > date(year=2016, month=1, day=1)]

# Load occurrences
occ_count = data.get_occurrence_count(min_total_occurrences=5000, from_file=True)
occ_count = occ_count[(occ_count.index > '2000-01-01')]
occ_count = occ_count.fillna(0)

# Load Models
rep = LognormalReportingModel(development, from_file=from_file)
occ = OccurrenceModel(rep, occ_count, from_file=from_file)
set = CensoredLognormalSettlementModel(development, from_file=from_file)
# pay_delay = CensoredSubjectLognormalPaymentDelayModel(development, from_file=from_file)
pay_delay = WeibullPaymentDelayModel(development, from_file=from_file)
event = EventTypeModel(development, from_file=from_file)
# pay_amnt = SubjectLognormalPaymentAmountModel(development, from_file=from_file)
pay_amnt = TypeSubjectLognormalPaymentAmountModel(development, from_file=from_file)

m = ReservingModel(rep, occ, set, pay_delay, event, pay_amnt, ReservingModelType.IBNR, paths_per_dossier)
r = ReservingModelResult(ReservingModelType.IBNR, max_path_len, paths_per_dossier)
dossiers = m.occurrence.simulate_ibnr_dossiers(occ_year, samples_reserve)

print('======== SAMPLING ========')
# For every set of dossiers, simulate paths
for i in tqdm(range(dossiers['length'])):
    m.set_subjects_dates(dossiers['subjects'][i], dossiers['date'][i])
    r.clear_memory(size=m.width)
    r.dates_reporting = m.sim_reporting_dates()
    r.dates_close = m.sim_settlement_dates(r.dates_reporting, m.zeros)
    r = m.simulate_paths(r)
    # Save the result
    r.add_sample()
r.to_file("occ_year_2016")
print('========== DONE ==========')

LISTING A.2: IBNR

A.3 Reserving Model

1 from models.event_type import EventTypeModel2 from models.occurrence import OccurrenceModel3 from microlevellossreserving.models.payment_amount import PaymentAmountModel4 from microlevellossreserving.models.payment_delay import PaymentDelayModel5 from microlevellossreserving.models.reporting_delay import ReportingModel6 from models.settlement_delay import SettlementModel7 import pickle8 from pathlib import Path9 from datetime import datetime

10 from enum import Enum11 import numpy as np12 import pandas as pd13 from pyfiglet import figlet_format14 import matplotlib.pyplot as plt15 import seaborn as sns16

17

18 def update_dates(month_delay , months_ao):19 # Add delays to month delay20 for i, delay in enumerate(month_delay):21 months_ao[i] = months_ao[i] + month_delay22 return months_ao23

24

25 def to_vector(matrix: np.ndarray):26 return matrix.flatten ()27

28

29 def to_matrix(vector:np.ndarray ,n):30 # n is the number of rows31 height = n32 width = int(len(vector)/height)33 return vector.reshape ((width , height))34

35 class ReservingModelType(Enum):36 RBNS=’RBNS’37 IBNR=’IBNR’38

39 class ReservingModelResult ():40

41 def __init__(self , type:ReservingModelType ,max_path_len , paths_per_dossier):42 self._content = []43 self._max_path_len=max_path_len44 self._paths_per_dossier = paths_per_dossier45 self._width = None46 self._reservingType = type47

48 self.dates_reporting = None49 self.dates_close = None50 self.result_eventtype = None51 self.result_payments = None52 self.result_months_active = None53 self.result_dates_now = None54 self.result_is_closed = None55

56

57


58 def add_sample(self):59 # Payment is zero when closed60 self.result_payments[self.result_is_closed == True] = 061 # Replace event type with minus62 self.result_eventtype[self.result_eventtype == 0] = -163 self.result_payments = self.result_payments * self.result_eventtype64 result_payments_sum = self.result_payments.sum(axis =0)65

66 self._content.append ({67 ’dates_reporting ’:self.dates_reporting ,68 ’dates_closed ’:self.dates_close ,69 ’result_eventtype ’: self.result_eventtype ,70 ’result_payments ’: self.result_payments ,71 ’result_payments_sum ’: result_payments_sum ,72 ’result_delays ’: self.result_delays ,73 ’result_months_active ’: self.result_months_active ,74 ’result_dates_now ’: self.result_dates_now ,75 ’result_is_closed ’: self.result_is_closed76 })77

78 def clear_memory(self , size):79

80 self.result_eventtype = np.empty(shape=(self._max_path_len , self._paths_per_dossier ,size), dtype=’int8’)

81 self.result_payments = np.empty(shape=(self._max_path_len , self._paths_per_dossier , size), dtype=’int32’)

82 self.result_delays = np.empty(shape=(self._max_path_len , self._paths_per_dossier , size),dtype=’uint16 ’)

83 self.result_months_active = np.empty(shape=(self._max_path_len , self._paths_per_dossier ,size)) #WAS UINT16

84 self.result_dates_now = np.empty(shape=(self._max_path_len , self._paths_per_dossier ,size), dtype=’datetime64[D]’)

85 self.result_is_closed = np.empty(shape=(self._max_path_len , self._paths_per_dossier ,size), dtype=np.bool)

86

87 def to_file(self , str_prefix:str=""):88 file_name = "{} _result_simulation_ {}.p".format(self._reservingType.value , str(datetime.

now()).replace(’-’, ’’).replace(’ ’, ’’).replace(’:’, ’’).replace(’.’, ’’))89 print(’>>> Dumping simulation result to file: \’{}\’’.format(file_name))90 pickle.dump(self , open(Path(__file__).parent.parent / ’results ’ / "{}{}".format(

str_prefix ,file_name), "wb"))91

92 def plot_payments_sum(self):93 plt.style.use([’mike -base’, ’mike -widescreen -small’])94 payments_sum = np.sum(self._content [0][’result_payments_sum ’], axis =1)95

96 ax: plt.Axes = sns.kdeplot(payments_sum)97 ax.hist(payments_sum , bins=80, alpha =0.5, density=True)98 plt.ylabel(’Probability ’)99 plt.xlabel(’Reserve Amount (euro)’)

100 plt.title()101 plt.show()102

103

104 class ReservingModel ():105

106 def __init__(self ,rep:ReportingModel , occ:OccurrenceModel , set:SettlementModel , pay_delay:PaymentDelayModel , event_type:EventTypeModel , pay_amount:PaymentAmountModel ,type:ReservingModelType , paths_per_dossier:int):

107 print(figlet_format(’MICRO LEVEL LOSS RESERVING FOR {}’.format(type.value)))108 self._type = type109 self.reporting: ReportingModel = rep110 self.occurrence: OccurrenceModel = occ111 self.settlement: SettlementModel = set112 self.payment_delay: PaymentDelayModel = pay_delay113 self.event_type: EventTypeModel = event_type114 self.payment_amount: PaymentAmountModel = pay_amount115

116 self.dates= None117 self.subjects = None118 self.width = None119 self.zeros = None


120 self.zeros_matrix = None121 self.subject_matrix = None122

123 self.paths_per_dossier = paths_per_dossier124

125

126 def set_subjects_dates(self , subjects , dates=None):127 self.dates = dates128 self.subjects = subjects129 self.width = self.subjects.shape [0]130 self.zeros = np.zeros(shape=self.width)131 self.zeros_matrix = np.tile(np.array(self.zeros), reps=(self.paths_per_dossier , 1))132 self.subject_matrix = np.tile(np.array(self.subjects), reps=(self.paths_per_dossier , 1))

.astype(’int8’)133

134

135 def sim_reporting_dates(self):136 occurrence_dates = np.tile(np.array(self.dates), reps=(self.paths_per_dossier , 1)).

astype(’datetime64 ’)137 result_reporting_delay = self.reporting.sample_posterior_predictive(self.subjects , self.

zeros , self.paths_per_dossier).astype(np.int16)138 dates_reporting = occurrence_dates + result_reporting_delay * np.timedelta64 (1, ’D’)139 return dates_reporting140

141 def sim_settlement_dates(self , dates_reporting , conditional_delay):142 result_settlement = self.settlement.sample_posterior_predictive(self.subjects ,

conditional_delay , self.paths_per_dossier).astype(np.int16)143 dates_close = dates_reporting + result_settlement * np.timedelta64 (1, ’D’)144 return dates_close145

146 def sim_path_step(self ,months_active):147 result_eventtype = to_matrix(self.event_type.sample_posterior_predictive(subject_index=

to_vector(self.subject_matrix),payment_delay=to_vector(months_active),samples =1)[0], self.width)

148 result_delays = to_matrix(self.payment_delay.sample_posterior_predictive(months_ao=to_vector(months_active), subject_index=to_vector(self.subject_matrix), is_positive=to_vector(result_eventtype), samples =1)[0], self.width)

149 result_delays = np.clip(result_delays , 0 ,32000) # Cap months_active at 32000 months (to ensure small matrices)

150 new_months_active = months_active + result_delays151 new_months_active = np.clip(new_months_active , 0,750) # Cap months_active at 32000

months (to ensure small matrices)152 result_payments = to_matrix(self.payment_amount.sample_posterior_predictive(months_ao=

to_vector(new_months_active),subject_index=to_vector(self.subject_matrix),is_positive=to_vector(result_eventtype), samples =1)[0], self.width)

153 return new_months_active , result_eventtype , result_payments154

155 def current_date(self ,months_active , reporting_dates):156 time_deltas_days = (pd.DataFrame(months_active) * np.timedelta64 (1, ’M’)).to_numpy ().

astype(’timedelta64[D]’)157 result_dates_now = time_deltas_days + reporting_dates158 return result_dates_now159

160 def is_closed(self ,date_now , dates_close):161 # check if all is closed162 is_nat = np.isnat(date_now)163 result_is_closed = (date_now > dates_close) | is_nat164 return result_is_closed165

166 def simulate_paths(self ,r: ReservingModelResult):167 j = 0168 open_paths = True169 while open_paths:170 if j == 0:171 r.result_months_active[j, :, :], r.result_eventtype[j, :, :], r.result_payments[

j, :, :] = self.sim_path_step(self.zeros_matrix)172 else:173 r.result_months_active[j, :, :], r.result_eventtype[j, :, :], r.result_payments[

j, :, :] = self.sim_path_step(r.result_months_active[j - 1, :, :])174

175 r.result_dates_now[j, :, :] = self.current_date(r.result_months_active[j, :, :], r.dates_reporting)

Page 68: Micro-level Stochastic Loss Reserving · Micro-level loss reserving models break down the claim process into several dis-tinct components at the individual claim level (we will explain

58 Appendix A. Python Code

176 r.result_is_closed[j, :, :] = self.is_closed(r.result_dates_now[j, :, :], r.dates_close)

177

178 # check if all is closed179 closed = np.sum(r.result_is_closed[j, :, :])180 percentage_closed = np.round(( closed / np.size(r.result_dates_now[j, :, :])) * 100,

2)181 print(’\r>>> Percentage closed: {}%, {}/{} paths’.format(str(percentage_closed), int

(closed),182 np.size(r.result_dates_now[

j, :, :])), end="")183 if self._type == ReservingModelType.RBNS:184 print(’\r>>> Percentage closed: {}%, {}/{} paths’.format(str(percentage_closed),

int(closed),np.size(r.result_dates_now[j, :, :])), end="")185

186 if percentage_closed > 99.5: # stop when 99.9% of the paths are closed187

188 open_paths = False189 # else do some other iteration190 j = j + 1191 print("")192

193 #Remove unwanted data194 r.result_dates_now = r.result_dates_now [:j, :, :]195 r.result_months_active = r.result_months_active [:j, :, :]196 r.result_is_closed = r.result_is_closed [:j,:,:]197 r.result_eventtype=r.result_eventtype [:j, :, :]198 r.result_payments=r.result_payments [:j,:,:]199

200 return r

LISTING A.3: Reserving Model


Appendix B

Run-off Triangle

B.1 IBNR and RBNS reserve in a run-off triangle

IBNR Reserve

OY/DY    2015    2016    2017    2018
2015      514     489     323     196
2016    1.439   1.335   1.281     826
2017    1.889   1.806   1.281     826
2018    2.129   2.052   1.454     899

DY-to-DY factors for IBNR:   0.950   0.674   0.601

RBNS Reserve

OY/DY    2015    2016    2017    2018
2015    2.290   2.221   1.599   1.025
2016    2.450   2.381   1.746   1.140
2017    2.609   2.321   1.638   1.069
2018    2.951   2.605   1.808   1.101

DY-to-DY factors for RBNS:   0.929   0.713   0.640

FIGURE B.1: An example of a run-off triangle as used in most classical reserving approaches. OY denotes the occurrence year (the year in which an event occurs). DY denotes the development year (the year in which the payments are made for a claim with a particular occurrence year). The white areas of the table are known to the insurer at the present moment. The grey parts are projections. Projections are made using the factors below the table.
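The projection mechanism of Figure B.1 can be illustrated with a short script. This is a minimal sketch with hypothetical numbers (not the ARAG data), assuming the DY-to-DY factors are computed as ratios of column totals over the occurrence years observed in both development years, after which the grey cells are filled by multiplying the last observed value forward.

import numpy as np

# Hypothetical incremental run-off triangle: rows = occurrence years,
# columns = development years; np.nan marks the unknown (grey) cells.
triangle = np.array([
    [500.0, 480.0, 320.0, 200.0],
    [1400.0, 1330.0, 1280.0, np.nan],
    [1900.0, 1800.0, np.nan, np.nan],
    [2100.0, np.nan, np.nan, np.nan],
])

n = triangle.shape[1]
factors = []
for k in range(n - 1):
    observed = ~np.isnan(triangle[:, k]) & ~np.isnan(triangle[:, k + 1])
    # DY-to-DY factor: ratio of column totals over the rows observed in both columns
    factors.append(triangle[observed, k + 1].sum() / triangle[observed, k].sum())

projected = triangle.copy()
for k in range(n - 1):
    missing = np.isnan(projected[:, k + 1])
    projected[missing, k + 1] = projected[missing, k] * factors[k]

print(np.round(factors, 3))    # the factors shown below the tables in Figure B.1
print(np.round(projected, 0))  # white cells unchanged, grey cells projected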


Appendix C

Algorithms for sampling reserves

Algorithm 1: Algorithm for sampling the RBNS reserve
Data: RBNS claims
Result: Samples from the RBNS reserve (r_RBNS)
Input: Number of samples of the RBNS reserve (N_samples), occurrence year (y_occ), development year (y_devel), claim and policy covariates (ζ)

 1  r_RBNS ← 0;
 2  for i ← 1 to N_samples do
 3      for j ← 1 to number of open claims do
 4          k ← number of payments already in claim j;
 5          t_set[j] ← SimSettlement(t_pay[k, j] − t_rep[j], ζ);
 6      end
        // Simulate claim developments ∆t_set
 7      while open claims in y_occ do
 8          for j ← 1 to number of open claims do
 9              k ← number of payments already in claim j;
10              if t_pay[k, j] < t_set[j] or t_pay[k, j] > y_devel then
11                  t_pay[k+1, j] ← t_pay[k, j] + SimDelay(t_pay[k, j] − t_rep[j], ζ[j]);
12                  p_type[k+1, j] ← SimPaymentType(t_pay[k+1, j] − t_rep[j], ζ[j]);
13                  p_amount[k+1, j] ← SimPaymentAmount(t_pay[k+1, j] − t_rep[j], ζ[j]);
14              else
15                  close claim;
16              end
17          end
18      end
        // Aggregate paths to get samples from the reserves
19      for j ← 1 to number of claims in y_occ do
20          for m ← 1 to number of payments in claim j do
21              if t_pay[m, j] in y_devel then
22                  if p_type[m, j] is payment then
23                      r_RBNS[i] ← r_RBNS[i] + p_amount[m, j];
24                  else
25                      r_RBNS[i] ← r_RBNS[i] − p_amount[m, j];
26                  end
27              end
28          end
29      end
30  end
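A compact Python rendering of Algorithm 1 is sketched below. It is illustrative only: the simulation routines (sim_settlement, sim_delay, sim_payment_type, sim_payment_amount) are hypothetical stand-ins drawing from arbitrary distributions, whereas in the thesis they sample from the posterior predictive distributions of the fitted component models; the covariates ζ are omitted and the development window is simplified to a single horizon.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins for the posterior predictive simulators of Algorithm 1.
def sim_settlement(months_since_report):   # total settlement delay in months
    return months_since_report + rng.exponential(12.0)

def sim_delay(months_since_report):        # delay until the next payment, in months
    return rng.exponential(3.0)

def sim_payment_type(months_since_report):  # True = payment, False = recovery
    return rng.random() < 0.9

def sim_payment_amount(months_since_report):
    return rng.lognormal(mean=6.0, sigma=1.0)

def sample_rbns_reserve(open_claims, horizon_months, n_samples=1000):
    """open_claims: months already elapsed since reporting for each open RBNS claim."""
    reserve = np.zeros(n_samples)
    for i in range(n_samples):
        for t_last in open_claims:
            # simulate the settlement delay of the claim (cf. lines 3-6)
            t_set = sim_settlement(t_last)
            t = t_last
            # simulate claim developments (cf. lines 7-18); payments are
            # aggregated inline rather than in a separate pass
            while True:
                t = t + sim_delay(t)
                if t >= t_set or t >= horizon_months:
                    break  # claim is closed or falls outside the horizon
                amount = sim_payment_amount(t)
                if sim_payment_type(t):
                    reserve[i] += amount
                else:
                    reserve[i] -= amount
    return reserve

samples = sample_rbns_reserve(open_claims=[2.0, 5.0, 14.0], horizon_months=24.0)
print(samples.mean(), np.percentile(samples, [5, 95]))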


Algorithm 2: Algorithm for sampling the IBNR reserve
Result: Samples from the IBNR reserve (r_IBNR)
Input: Number of samples of the IBNR reserve (N_samples), occurrence year (y_occ), development year (y_devel), claim and policy covariates (ζ)

 1  r_IBNR ← 0;
 2  for i ← 1 to N_samples do
 3      foreach month in months of y_occ do
 4          claims[month] ← SimOccurrence(month);
 5      end
 6      for j ← 1 to number of claims do
 7          t_rep[j] ← SimReporting(ζ[j]);
 8      end
 9      for j ← 1 to number of claims do
10          k ← number of payments already in claim j;
11          t_set[j] ← SimSettlement(t_pay[k, j] − t_rep[j], ζ);
12      end
13      while open claims in y_occ do
14          for j ← 1 to number of open claims do
15              k ← number of payments already in claim j;
16              if t_pay[k, j] < t_set[j] or t_pay[k, j] > y_devel then
17                  t_pay[k+1, j] ← t_pay[k, j] + SimDelay(t_pay[k, j] − t_rep[j], ζ[j]);
18                  p_type[k+1, j] ← SimPaymentType(t_pay[k+1, j] − t_rep[j], ζ[j]);
19                  p_amount[k+1, j] ← SimPaymentAmount(t_pay[k+1, j] − t_rep[j], ζ[j]);
20              else
21                  close claim;
22              end
23          end
24      end
        // Aggregate paths to get samples from the reserves
25      for j ← 1 to number of claims in y_occ do
26          for m ← 1 to number of payments in claim j do
27              if t_pay[m, j] in y_devel then
28                  if p_type[m, j] is payment then
29                      r_IBNR[i] ← r_IBNR[i] + p_amount[m, j];
30                  else
31                      r_IBNR[i] ← r_IBNR[i] − p_amount[m, j];
32                  end
33              end
34          end
35      end
36  end
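Algorithm 2 differs from Algorithm 1 only in its first two loops: IBNR claims are not yet observed, so their monthly occurrence counts and reporting delays must be simulated before the development loop of Algorithm 1 can be reused. A minimal sketch of that front end, again with hypothetical stand-in simulators rather than the fitted posterior predictive distributions:

import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-ins for the occurrence and reporting simulators of Algorithm 2.
def sim_occurrence(month):   # number of unreported claims occurring in this month
    return rng.poisson(5.0)

def sim_reporting():         # reporting delay in months
    return rng.exponential(2.0)

def simulate_ibnr_claims(months_in_year=12):
    """Generate the pseudo-claims for one IBNR reserve sample.

    Each simulated claim starts with zero months since reporting (it has no
    payments yet); the resulting list feeds the development loop of Algorithm 1."""
    claims = []
    for month in range(months_in_year):               # simulate occurrences per month
        for _ in range(sim_occurrence(month)):
            reporting_month = month + sim_reporting()  # simulate the reporting delay
            claims.append({"reporting_month": reporting_month,
                           "months_since_report": 0.0})
    return claims

print(len(simulate_ibnr_claims()), "simulated IBNR claims for one reserve sample")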


Appendix D

Posterior Traces

FIGURE D.1: Posterior densities for the β_s^µ (as mu) and β_s^σ (as sd) parameters of the model. The two sub-figures on the left show a Kernel Density Estimation (KDE) of the posterior samples of the model parameters. The vertical axis denotes the number of samples (Frequency) and the horizontal axis denotes the value of the sample. The figures on the right show the sample number on the horizontal axis, while plotting the value of the model parameter on the vertical axis. We see that the results for the mu parameter correspond to our observations in Figure 3.3. For example, the Traffic subject seems to have the lowest (log) average mu, which corresponds to Figure 3.3.
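Panels of this kind are produced directly from the posterior samples of the PyMC3 models. The following is a minimal sketch using a toy hierarchical model on simulated data (not the thesis models): pm.traceplot draws the left KDE panels and the right per-draw trace panels described above.

import numpy as np
import pymc3 as pm

rng = np.random.default_rng(0)
subject_idx = rng.integers(0, 3, size=300)              # hypothetical subject codes
log_delay = rng.normal(2.0 + 0.3 * subject_idx, 0.5)    # hypothetical log delays

with pm.Model():
    beta_mu = pm.Normal("mu", mu=0.0, sd=10.0, shape=3)      # one mu per subject
    beta_sd = pm.HalfNormal("sd", sd=5.0, shape=3)           # one sd per subject
    pm.Normal("obs", mu=beta_mu[subject_idx], sd=beta_sd[subject_idx],
              observed=log_delay)
    trace = pm.sample(1000, tune=1000, chains=2, cores=1)

# Left column: KDE of the posterior samples; right column: sampled value per draw.
pm.traceplot(trace)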


FIGURE D.2: Posterior densities for the mu = β_s^µ and sigma = β_s^σ parameters of the censored Lognormal settlement model. See Figure D.1 for a description of the axes.


FIGURE D.3: Posterior densities for the β_subject, β_delay and σ_pay parameters of the C+S Lognormal payment delay model. See Figure D.1 for a description of the axes.


FIGURE D.4: Posterior densities for the parameters of the payment type model. The titles in the figures are mapped as: alpha = α, beta_subject = β_s and beta_delay = β_delay. See Figure D.1 for a description of the axes.


FIGURE D.5: Posterior densities for the parameters of the Lognormal+D+S model. The titles in the figures are mapped as: beta_mu_subject = β_s^µ, beta_mu_covar = β_delay^µ and alpha_sd = σ_amount. See Figure D.1 for a description of the axes.


Appendix E

Heat-maps of the occurrence process

FIGURE E.1: Heat-map of samples of the λ parameter vector of the IBNR occurrences (λ_IBNR) for the cases with the Contractual subject. This can be seen as the difference between Figures 3.8 and 3.9. See Fig. 3.8 for a description of the axes.
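Heat-maps of this kind can be drawn directly from an array of posterior draws of the monthly rate vector λ. A minimal sketch using matplotlib, with a hypothetical sample array in place of the actual posterior draws:

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical posterior draws of lambda: rows = posterior samples, columns = months.
lambda_samples = np.random.gamma(shape=5.0, scale=2.0, size=(500, 24))

plt.imshow(lambda_samples, aspect="auto", cmap="viridis")
plt.xlabel("month")
plt.ylabel("posterior sample")
plt.colorbar(label="lambda")
plt.show()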


FIGURE E.2: Heat-map of posterior samples of the λ parameter of the occurrence rate for cases with the Traffic (420) subject code. See Fig. 3.8 for a description of the axes. We observe a decline in lambda in (approximately) the last month only; this corresponds to the violin plot of the reporting delay of the Traffic subject in Figure 3.3.

FIGURE E.3: Heat-map of samples of the λ parameter (λ_IBNR) of the IBNR occurrences for cases with the Traffic (420) subject code. See Fig. 3.8 for a description of the axes.