generalized linear mixed models (glmms) for dependent …valdez/seoul2017-glmm-valdez.pdf ·...

Generalized linear mixed models (GLMMs) fordependent compound risk models

Emiliano A. Valdez, PhD, FSA

joint work with H. Jeong, J. Ahn and S. Park

University of Connecticut

Seminar Talk atYonsei University

Seoul, Korea

15 May 2017

Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 1 / 21

OutlineIntroduction

Data structureThe frequency-severity model

Exponential dispersion familyCovariates

Compound risk modelsGeneralized linear mixed models

Model specificationsModel estimates

Claim frequencyClaim severityTweedie models

Model comparisonGini index

ConclusionAppendix A - Singapore

Insurance marketJeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 2 / 21

Introduction Data structure

Data structure

“Policyholder” i is followed over time t = 1, . . . , Ti years, where Ti isat most 9 years.

Unit of analysis “it” – a registered vehicle insured i over time t (year)

For each “it”, could have several claims, k = 0, 1, . . . , nitHave available information on: number of claims nit, amount of claimyitk, exposure eit and covariates (explanatory variables) xit

covariates often include age, gender, vehicle type, driving history andso forth

We will model the pair (nit, cit) where

cit =

1

nit

nit∑k=1

yitk, nit > 0

0, nit = 0

is the observed average claim size.


The frequency-severity model

The frequency-severity model

Traditional to predict/estimate insurance claims distributions:

Cost of Claims = Frequency × Severity

The joint density of the number of claims and the average claim sizecan be decomposed as

f(N,C|x) = f(N |x)× f(C|N,x)joint = frequency × conditional severity.

This natural decomposition allows us to investigate/model eachcomponent separately and it does not preclude us from assuming Nand C are independent.

For purposes of notation, we will use the notation C = NC to be theaggregate claims.


The frequency-severity model Exponential dispersion family

Exponential dispersion family

We say Y comes from an exponential dispersion family if its density hasthe form

f(y) = exp

[yθ − ψ(θ)

φ+ c(y;φ)

].

where θ and ψ are location and scale parameters, respectively, and b(θ)and c(y;ψ) are known functions.

The following well-known relations hold for these distributions:

mean: µ = E[Y ] = ψ′(θ)

variance: V ar[Y ] = φψ′′(θ) = φV (µ)

Reproductive EDF: If Y1, . . . , YN are mutually independent belonging tothe EDF(θ, φ), then its average Y also belongs to EDF(θ, φ/N).



Examples of members of this family

Normal N(µ, σ2) with θ = µ and φ = σ2, V (µ) = 1

Gamma(α, β) with θ = −β/α and φ = 1/α, V (µ) = µ2

Inverse Gaussian(α, β) with θ = −12β

2/α2 and φ = β/α2, V (µ) = µ3

Poisson(µ) with θ = logµ and φ = 1, V (µ) = 1, V (µ) = µ

Binomial(m, p) with θ = log [p/(1− p)] and φ = 1,V (µ) = µ(1− µ/m)

Negative Binomial(r, p) with θ = log(1− p) and φ = 1,V (µ) = µ(1 + µ/r)



The link function and linear predictors

The linear predictor is a function of the covariates, or predictorvariables.

The link function connects the mean of the response to this linearpredictor in the form η = g(µ).

g(µ) is called the link function with µ = E(Y ) = linear predictor.

The transformed mean follows a linear model as

η = x′iβ = β0 + β1xi1 + · · ·+ βpxip

with p predictor variables (or covariates).

You can actually invert µ = g−1(x′iβ) = g−1(η).


The frequency-severity model Covariates

Covariates

Vehicle Type: automotive (A) or others (O).

Marital status: married (M), single (S), others (O)

Gender: male (M) or female (F).

Coverage type: Comprehensive (Comp) or Others

Age: in years, grouped into 3 categories

Young (Y), Middle Aged (M), Old (O)


Compound risk models

Compound risk models

See Garrido, et al. (2016)

In the case of independence, we have:

Mean: E(C) = E(N)E(C)

Variance: V ar(C) = E(N)V ar(C) + V ar(N)[E(C]2

In case we do not assume independence, we have

Mean: E(C) = E[NE(C|N))] 6= E(N)E(C)

Variance: V ar(C) = E[NV ar(C|N))] + V ar[NE(C|N)]


Compound risk models Generalized linear mixed models

Generalized linear mixed models

GLMMs extend GLMs by allowing for random, or subject-specific, effectsin the linear predictor which reflects the idea that there is a naturalheterogeneity across subjects. For insurance applications, our subject i isusually a policyholder observed for a period of Ti periods.

Given the vector bi with the random effects for each subject i, Yit belongsto the EDF:

f(yit|bi) = exp

[yitθit − ψ(θit)

φ+ c(yit;φ)

]with the following conditional relations:

mean: µit = E[Yit|bi] = ψ′(θit)

variance: V ar[Y |bi] = φψ′′(θ + it) = φV (µit)

Link function: g(µit) = x′itβ + z′itbi


Model specifications

Model specifications

We calibrated models with the following specifications:

For the count of claims (frequency): negative binomial for ourbaseline model

more suitable than the traditional Poisson model because it can handleoverdispersion and individual unobserved effects

For the severity of claims: Gamma (due to its reproductive property)

For the random effects for GLMMs: Normal with mean zero andunknown variances

different variances for frequency and claim severity

The Tweedie model is a special case of the GLM family and it hasmass at zero (avoids modeling frequency and severity separately).

for some case, its density has no explicit form, but the variancefunction has the form V (µ) = µp e.g. compound Poisson Gamma with1 < p < 2.


Model estimates Claim frequency

Negative binomial model for claim frequencyNB GLM NB GLMM

(Intercept) −2.66∗∗∗ −2.67∗∗∗(0.08) (0.12)

VTypeCar −0.01 0.01(0.07) (0.12)

MaritalM −0.53 −0.54(0.35) (0.54)

SexM 0.11∗∗∗ 0.11∗∗

(0.02) (0.04)Comp 1.09∗∗∗ 1.09∗∗∗

(0.03) (0.05)AgeYoung 0.25∗∗ 0.23

(0.08) (0.13)AgeMid 0.14∗∗∗ 0.14∗∗∗

(0.02) (0.03)

AIC 104835.76 104922.06BIC 104915.37 105011.62Log Likelihood -52409.88 -52452.03Deviance 65407.89Num. obs. 155063 155063Num. groups: ID 49978Variance: ID.(Intercept) 0.28Variance: Residual 2.27∗∗∗p < 0.001, ∗∗p < 0.01, ∗p < 0.05

NB GLM NB GLMM

MAE 0.2038 0.2039MSE 0.2390 0.2391


Model estimates Claim severity

Gamma distributions models for claim frequencyGamma-indep GLM Gamma-dep GLM Gamma-dep GLMM

(Intercept) 9.24∗∗∗ 8.86∗∗∗ 8.85∗∗∗

(0.56) (0.31) (0.31)VTypeCar 0.43 −0.70∗ −0.69∗

(0.53) (0.29) (0.28)MaritalM −1.34 −0.51 −0.51

(2.29) (1.25) (1.24)SexM 0.10 0.03 0.02

(0.15) (0.08) (0.08)Comp 0.41 0.48∗∗∗ 0.48∗∗∗

(0.23) (0.12) (0.12)AgeYoung −0.56 −0.01 −0.01

(0.56) (0.31) (0.30)AgeMid 0.06 −0.01 −0.01

(0.14) (0.07) (0.07)log(Freq) −0.94∗∗∗ −0.94∗∗∗

(0.04) (0.04)

AIC 292952.28 280007.81 280008.25BIC 293012.39 280075.44 280083.39Log Likelihood -146468.14 -139994.90 -139994.12Deviance 40735.09 19294.67Num. obs. 13548 13548 13548Num. groups: Year 8Variance: Year.(Intercept) 0.01Variance: Residual 13.77∗∗∗p < 0.001, ∗∗p < 0.01, ∗p < 0.05

Gamma-indep GLM Gamma-dep GLM Gamma-dep GLMM

MAPE 141 1.49 1.49MAE 21738 1576 1575MSE 22803 4937 4937


Model estimates Tweedie models

Tweedie model estimatesTweedie GLM Tweedie GLMM

(Intercept) 5.69 4.38VTypeCar 0.45 −0.18MaritalM −2.03 −1.91SexM 0.15 −0.53Comp 1.64 1.86AgeYoung −0.34 −0.04AgeMid 0.16 0.19

AIC 385205 376779BIC 385294 376869Log Likelihood -192593 -188381Deviance 385187 376761Num. obs. 155063 155063Num. groups: ID 49978Variance: ID.(Intercept) 2.33Variance: Residual 594.78

Tweedie GLM Tweedie GLMM

MAE 2131 478MSE 3020 2125


Model comparison Gini index

Using the Gini index to compare models

See Frees, et al. (2016).

In computing the Gini index, we employ the following steps:

1 Sort observed claims Yi|i = 1, · · · ,M according to risk scoreSi|i = 1, · · · ,M (in our case, estimated value from each method) inascending order. That is, calculate Ri|i = 1, · · · ,M , where each Ri isthe rank of Si between 1 and M so that R1 = arg max (Si).

2 Compute Fscore(m/M) =∑M

i=1 1(Ri≤m)/M ,

Floss(m/M) =∑M

i=1 Yi1(Ri≤m)/∑M

i=1 Yi for each m = 1, · · · ,M .

3 Plot Fscore(m/M) on x-axis, and Floss(m/M) on y-axis.



Comparing the Gini index for the two-part models



Comparing the Gini index for the Tweedie models


Conclusion

Concluding remarks

We are still in the early stages of this work.

In particular, this is an early exploration of the promise of usingGLMMs to account for dependency between frequency and severity.

This work hopes to extend the literature on this type of dependency:

H. Jeong (2016) work on ”Simple Compound Risk Model withDependent Structure”Garrido, et al. (2016)Frees and Wang (2006), Frees, et al. (2011)Czado (2012)Shi, et al. (2015)

Further work is needed to measure the financial impact of usingGLMMs versus other models:

risk classificationa posteriori prediction and applications in bonus-malus systemsmodified insurance coverage and reinsurance applications


Appendix A - Singapore

A bit about Singapore


Appendix A - Singapore

A bit about Singapore 1

Singa Pura: Lion city. Location: 136.8 km N of equator, betweenlatitudes 103 deg 38’ E and 104 deg 06’ E. [islands between Malaysiaand Indonesia]

Size: very tiny [647.5 sq km, of which 10 sq km is water] Climate:very hot and humid [23-30 deg celsius]

Population: nearly 5 mn. Age structure: 0-14 yrs: 16%, 15-64 yrs:76%, 65+ yrs 8%

Birth rate: 9.34 births/1,000. Death rate: 4.28 deaths/1,000; Lifeexpectancy: 81 yrs; male: 79 yrs; female: 83 yrs

Ethnic groups: Chinese 74%, Malay 13%, Indian 9%; Languages:Chinese, Malay , Tamil, English

1Updated: February 2010Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 20 / 21

Appendix A - Singapore Insurance market

Insurance market in Singapore

As of 2009 2: market consists of 45 general ins, 8 life ins, 7 both, 17general reinsurers, 2 life reins, 7 both; also the largest captivedomicile in Asia, with 59 registered captives.

Monetary Authority of Singapore (MAS) is the supervisory/regulatorybody; also assists to promote Singapore as an international financialcenter.

Insurance industry performance in 2009:

total premiums: 11.4 bn; total assets: 113.3 bn [20% annual growth]life insurance: annual premium = 251.6 mn; single premium = 759.5mngeneral insurance: gross premium = 1.9 bn (domestic = 0.9; offshore= 1.0)

Further information: http://www.mas.gov.sg

2Source: wikipediaJeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 21 / 21

generalized linear mixed models (glmms) for dependent …valdez/seoul2017-glmm-valdez.pdf ·...

Documents