generalized linear mixed models (glmms) for dependent …valdez/seoul2017-glmm-valdez.pdf ·...
TRANSCRIPT
Generalized linear mixed models (GLMMs) fordependent compound risk models
Emiliano A. Valdez, PhD, FSA
joint work with H. Jeong, J. Ahn and S. Park
University of Connecticut
Seminar Talk atYonsei University
Seoul, Korea
15 May 2017
Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 1 / 21
OutlineIntroduction
Data structureThe frequency-severity model
Exponential dispersion familyCovariates
Compound risk modelsGeneralized linear mixed models
Model specificationsModel estimates
Claim frequencyClaim severityTweedie models
Model comparisonGini index
ConclusionAppendix A - Singapore
Insurance marketJeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 2 / 21
Introduction Data structure
Data structure
“Policyholder” i is followed over time t = 1, . . . , Ti years, where Ti isat most 9 years.
Unit of analysis “it” – a registered vehicle insured i over time t (year)
For each “it”, could have several claims, k = 0, 1, . . . , nitHave available information on: number of claims nit, amount of claimyitk, exposure eit and covariates (explanatory variables) xit
covariates often include age, gender, vehicle type, driving history andso forth
We will model the pair (nit, cit) where
cit =
1
nit
nit∑k=1
yitk, nit > 0
0, nit = 0
is the observed average claim size.
Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 3 / 21
The frequency-severity model
The frequency-severity model
Traditional to predict/estimate insurance claims distributions:
Cost of Claims = Frequency × Severity
The joint density of the number of claims and the average claim sizecan be decomposed as
f(N,C|x) = f(N |x)× f(C|N,x)joint = frequency × conditional severity.
This natural decomposition allows us to investigate/model eachcomponent separately and it does not preclude us from assuming Nand C are independent.
For purposes of notation, we will use the notation C = NC to be theaggregate claims.
Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 4 / 21
The frequency-severity model Exponential dispersion family
Exponential dispersion family
We say Y comes from an exponential dispersion family if its density hasthe form
f(y) = exp
[yθ − ψ(θ)
φ+ c(y;φ)
].
where θ and ψ are location and scale parameters, respectively, and b(θ)and c(y;ψ) are known functions.
The following well-known relations hold for these distributions:
mean: µ = E[Y ] = ψ′(θ)
variance: V ar[Y ] = φψ′′(θ) = φV (µ)
Reproductive EDF: If Y1, . . . , YN are mutually independent belonging tothe EDF(θ, φ), then its average Y also belongs to EDF(θ, φ/N).
Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 5 / 21
The frequency-severity model Exponential dispersion family
Examples of members of this family
Normal N(µ, σ2) with θ = µ and φ = σ2, V (µ) = 1
Gamma(α, β) with θ = −β/α and φ = 1/α, V (µ) = µ2
Inverse Gaussian(α, β) with θ = −12β
2/α2 and φ = β/α2, V (µ) = µ3
Poisson(µ) with θ = logµ and φ = 1, V (µ) = 1, V (µ) = µ
Binomial(m, p) with θ = log [p/(1− p)] and φ = 1,V (µ) = µ(1− µ/m)
Negative Binomial(r, p) with θ = log(1− p) and φ = 1,V (µ) = µ(1 + µ/r)
Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 6 / 21
The frequency-severity model Exponential dispersion family
The link function and linear predictors
The linear predictor is a function of the covariates, or predictorvariables.
The link function connects the mean of the response to this linearpredictor in the form η = g(µ).
g(µ) is called the link function with µ = E(Y ) = linear predictor.
The transformed mean follows a linear model as
η = x′iβ = β0 + β1xi1 + · · ·+ βpxip
with p predictor variables (or covariates).
You can actually invert µ = g−1(x′iβ) = g−1(η).
Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 7 / 21
The frequency-severity model Covariates
Covariates
Vehicle Type: automotive (A) or others (O).
Marital status: married (M), single (S), others (O)
Gender: male (M) or female (F).
Coverage type: Comprehensive (Comp) or Others
Age: in years, grouped into 3 categories
Young (Y), Middle Aged (M), Old (O)
Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 8 / 21
Compound risk models
Compound risk models
See Garrido, et al. (2016)
In the case of independence, we have:
Mean: E(C) = E(N)E(C)
Variance: V ar(C) = E(N)V ar(C) + V ar(N)[E(C]2
In case we do not assume independence, we have
Mean: E(C) = E[NE(C|N))] 6= E(N)E(C)
Variance: V ar(C) = E[NV ar(C|N))] + V ar[NE(C|N)]
Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 9 / 21
Compound risk models Generalized linear mixed models
Generalized linear mixed models
GLMMs extend GLMs by allowing for random, or subject-specific, effectsin the linear predictor which reflects the idea that there is a naturalheterogeneity across subjects. For insurance applications, our subject i isusually a policyholder observed for a period of Ti periods.
Given the vector bi with the random effects for each subject i, Yit belongsto the EDF:
f(yit|bi) = exp
[yitθit − ψ(θit)
φ+ c(yit;φ)
]with the following conditional relations:
mean: µit = E[Yit|bi] = ψ′(θit)
variance: V ar[Y |bi] = φψ′′(θ + it) = φV (µit)
Link function: g(µit) = x′itβ + z′itbi
Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 10 / 21
Model specifications
Model specifications
We calibrated models with the following specifications:
For the count of claims (frequency): negative binomial for ourbaseline model
more suitable than the traditional Poisson model because it can handleoverdispersion and individual unobserved effects
For the severity of claims: Gamma (due to its reproductive property)
For the random effects for GLMMs: Normal with mean zero andunknown variances
different variances for frequency and claim severity
The Tweedie model is a special case of the GLM family and it hasmass at zero (avoids modeling frequency and severity separately).
for some case, its density has no explicit form, but the variancefunction has the form V (µ) = µp e.g. compound Poisson Gamma with1 < p < 2.
Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 11 / 21
Model estimates Claim frequency
Negative binomial model for claim frequencyNB GLM NB GLMM
(Intercept) −2.66∗∗∗ −2.67∗∗∗(0.08) (0.12)
VTypeCar −0.01 0.01(0.07) (0.12)
MaritalM −0.53 −0.54(0.35) (0.54)
SexM 0.11∗∗∗ 0.11∗∗
(0.02) (0.04)Comp 1.09∗∗∗ 1.09∗∗∗
(0.03) (0.05)AgeYoung 0.25∗∗ 0.23
(0.08) (0.13)AgeMid 0.14∗∗∗ 0.14∗∗∗
(0.02) (0.03)
AIC 104835.76 104922.06BIC 104915.37 105011.62Log Likelihood -52409.88 -52452.03Deviance 65407.89Num. obs. 155063 155063Num. groups: ID 49978Variance: ID.(Intercept) 0.28Variance: Residual 2.27∗∗∗p < 0.001, ∗∗p < 0.01, ∗p < 0.05
NB GLM NB GLMM
MAE 0.2038 0.2039MSE 0.2390 0.2391
Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 12 / 21
Model estimates Claim severity
Gamma distributions models for claim frequencyGamma-indep GLM Gamma-dep GLM Gamma-dep GLMM
(Intercept) 9.24∗∗∗ 8.86∗∗∗ 8.85∗∗∗
(0.56) (0.31) (0.31)VTypeCar 0.43 −0.70∗ −0.69∗
(0.53) (0.29) (0.28)MaritalM −1.34 −0.51 −0.51
(2.29) (1.25) (1.24)SexM 0.10 0.03 0.02
(0.15) (0.08) (0.08)Comp 0.41 0.48∗∗∗ 0.48∗∗∗
(0.23) (0.12) (0.12)AgeYoung −0.56 −0.01 −0.01
(0.56) (0.31) (0.30)AgeMid 0.06 −0.01 −0.01
(0.14) (0.07) (0.07)log(Freq) −0.94∗∗∗ −0.94∗∗∗
(0.04) (0.04)
AIC 292952.28 280007.81 280008.25BIC 293012.39 280075.44 280083.39Log Likelihood -146468.14 -139994.90 -139994.12Deviance 40735.09 19294.67Num. obs. 13548 13548 13548Num. groups: Year 8Variance: Year.(Intercept) 0.01Variance: Residual 13.77∗∗∗p < 0.001, ∗∗p < 0.01, ∗p < 0.05
Gamma-indep GLM Gamma-dep GLM Gamma-dep GLMM
MAPE 141 1.49 1.49MAE 21738 1576 1575MSE 22803 4937 4937
Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 13 / 21
Model estimates Tweedie models
Tweedie model estimatesTweedie GLM Tweedie GLMM
(Intercept) 5.69 4.38VTypeCar 0.45 −0.18MaritalM −2.03 −1.91SexM 0.15 −0.53Comp 1.64 1.86AgeYoung −0.34 −0.04AgeMid 0.16 0.19
AIC 385205 376779BIC 385294 376869Log Likelihood -192593 -188381Deviance 385187 376761Num. obs. 155063 155063Num. groups: ID 49978Variance: ID.(Intercept) 2.33Variance: Residual 594.78
Tweedie GLM Tweedie GLMM
MAE 2131 478MSE 3020 2125
Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 14 / 21
Model comparison Gini index
Using the Gini index to compare models
See Frees, et al. (2016).
In computing the Gini index, we employ the following steps:
1 Sort observed claims Yi|i = 1, · · · ,M according to risk scoreSi|i = 1, · · · ,M (in our case, estimated value from each method) inascending order. That is, calculate Ri|i = 1, · · · ,M , where each Ri isthe rank of Si between 1 and M so that R1 = arg max (Si).
2 Compute Fscore(m/M) =∑M
i=1 1(Ri≤m)/M ,
Floss(m/M) =∑M
i=1 Yi1(Ri≤m)/∑M
i=1 Yi for each m = 1, · · · ,M .
3 Plot Fscore(m/M) on x-axis, and Floss(m/M) on y-axis.
Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 15 / 21
Model comparison Gini index
Comparing the Gini index for the two-part models
Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 16 / 21
Model comparison Gini index
Comparing the Gini index for the Tweedie models
Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 17 / 21
Conclusion
Concluding remarks
We are still in the early stages of this work.
In particular, this is an early exploration of the promise of usingGLMMs to account for dependency between frequency and severity.
This work hopes to extend the literature on this type of dependency:
H. Jeong (2016) work on ”Simple Compound Risk Model withDependent Structure”Garrido, et al. (2016)Frees and Wang (2006), Frees, et al. (2011)Czado (2012)Shi, et al. (2015)
Further work is needed to measure the financial impact of usingGLMMs versus other models:
risk classificationa posteriori prediction and applications in bonus-malus systemsmodified insurance coverage and reinsurance applications
Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 18 / 21
Appendix A - Singapore
A bit about Singapore
Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 19 / 21
Appendix A - Singapore
A bit about Singapore 1
Singa Pura: Lion city. Location: 136.8 km N of equator, betweenlatitudes 103 deg 38’ E and 104 deg 06’ E. [islands between Malaysiaand Indonesia]
Size: very tiny [647.5 sq km, of which 10 sq km is water] Climate:very hot and humid [23-30 deg celsius]
Population: nearly 5 mn. Age structure: 0-14 yrs: 16%, 15-64 yrs:76%, 65+ yrs 8%
Birth rate: 9.34 births/1,000. Death rate: 4.28 deaths/1,000; Lifeexpectancy: 81 yrs; male: 79 yrs; female: 83 yrs
Ethnic groups: Chinese 74%, Malay 13%, Indian 9%; Languages:Chinese, Malay , Tamil, English
1Updated: February 2010Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 20 / 21
Appendix A - Singapore Insurance market
Insurance market in Singapore
As of 2009 2: market consists of 45 general ins, 8 life ins, 7 both, 17general reinsurers, 2 life reins, 7 both; also the largest captivedomicile in Asia, with 59 registered captives.
Monetary Authority of Singapore (MAS) is the supervisory/regulatorybody; also assists to promote Singapore as an international financialcenter.
Insurance industry performance in 2009:
total premiums: 11.4 bn; total assets: 113.3 bn [20% annual growth]life insurance: annual premium = 251.6 mn; single premium = 759.5mngeneral insurance: gross premium = 1.9 bn (domestic = 0.9; offshore= 1.0)
Further information: http://www.mas.gov.sg
2Source: wikipediaJeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 21 / 21