$' 1= e aglm>8 ¡ ³ × Ü î » b s u b ¹ î ± § å « b /¡ #Ý 8 s glm...
TRANSCRIPT
リスクと保険 45
AGLM
GLM
*
2019 3 15
GLM
Accurate GLM AGLM
GLM
1
1 GLM
* Guy Carpenter Japan, Inc. [email protected] [email protected] [email protected]
46 AGLM:アクチュアリー実務のためのデータサイエンスの技術を用いた GLM の拡張
GLM: Generalized Linear Model
Accurate GLM2 AGLM GLM
GLM
AGLM GLM Discretization O Ordinal Dummy VariablesRegularization
GLM
GLMAGLM
AGLM
AGLM GLM AGLMGAM: Generalized Additive Model CART: Classification And Regression Tree
GLM GLM GLM
3
2 Accurate 3 [3] [22]
リスクと保険 47
(1)
4
0
(2)
GLM GLM
GLM [1]
(3)
2.1
2.1
4
5
48 AGLM:アクチュアリー実務のためのデータサイエンスの技術を用いた GLM の拡張
GLMGLM
Tweedie
GLM
GLM
GAM
CART
GAM GAM [2]
(4)
GAM
Lowess
GCV [3] GLM GAM
リスクと保険 49
CART [4]
[5]
Years Hits
{Years 4.5} {Years 4.5} {Years 4.5} {Years 4.5 Hits 117.5} {Years 4.5 Hits 117.5}
2.2
2.2
Recursive Binary Splitting Algorithm BSA
{ }
Pruning
50 AGLM:アクチュアリー実務のためのデータサイエンスの技術を用いた GLM の拡張
Bagging Boosting [6]
Random Forest [7]
6
GLM O
AGLM
20 60GLM
7
Discretization
O O
(5)
7
リスクと保険 51
1 1 0 0 0 2 0 1 0 0 3 0 0 1 0
0 0 0 1
O
O O Ordinal Dummy Variables
1 0
(6)
3.1 10 89 10 10 1 202 O
3.1 O
35 3 0 0 1 0 0 0 0 0 45 4 0 0 0 1 0 0 0 0
O
35 3 0 0 0 1 1 1 1 1 45 4 0 0 0 0 1 1 1 1
8 split-coding [28] Thermometer Encoding [25]
52 AGLM:アクチュアリー実務のためのデータサイエンスの技術を用いた GLM の拡張
O3.1 30 40
O
A B
Regularization
O
(7)
0
Ridge [8] Lasso [9] Elastic Net [10]1
リスクと保険 53
(8)
Lasso
Elastic Net
GLM
(9)
Accurate GLM AGLM AGLM
AGLM O
GLM O
GLM
EDA: Exploratory Data Analysis
GLM 9
[11] [12]GLM
AGLM
GLM O
Lasso OFused Lasso
10
9 [24] GLM [27] [26] O [28] O
Fused Lasso 10 Fused Lasso O Fused Lasso
54 AGLM:アクチュアリー実務のためのデータサイエンスの技術を用いた GLM の拡張
AGLM
10 8910 O 3.2
3.2 O
15 1 0 1 1 1 1 1 1 1 25 2 0 0 1 1 1 1 1 1 35 3 0 0 0 1 1 1 1 1 45 4 0 0 0 0 1 1 1 1
Lasso O
10 20 30 50 60 7080
AGLM O
11
Equal Width Method
AGLM GLM
GLM
AGLM
GAM GAM
AGLM GAMGLM
AGLM AGLM
[21] 11 [23]
リスクと保険 55
AGLM
AGLM
AGLM
AGLM
AGLM
OAGLM
AGLM
AGLM
AGLM
FD Frequency Damageability Method
AGLM 4.1 Data 1
4.1 Data 0
4.1 Data 1 Data 8
56 AGLM:アクチュアリー実務のためのデータサイエンスの技術を用いた GLM の拡張
4.1
Data 0 4.4.2
Data 1 Predictive Modeling vol.2 [13]
Chap.1 sim-modeling-dataset.csv Data 2 NL Ins Pricing with GLM [14]
Case Study mccase.txt [15] Data 3 MACQUARIE University [16] car.csv
Data 4 R CASdatasets [17]
freMTPLfreq/sev Data 5 R CASdatasets freMPL1
Data 6 MACQUARIE University persinj.xls
Data 7 kaggle Automobile Dataset [18]
Data 8 kaggle Medical Cost Personal Datasets [19]
4.2 Data
6 Data 8 NA
Data 2 Duration 0 0
4.2
Data 0 50,000 8,543 Data 1 40,760 3,169 Data 2 64,548 670 Data 3 67,856 4,624 Data 4 413,169 15,390 Data 5 30,595 3,265 Data 6 NA 22,036 Data 7 NA 164 Data 8 NA 1,338
AGLM
リスクと保険 57
50%
AGLM
4.2 Data 6 Data 8
Tweedie AGLM4.2 Data 1 1
AGLM
GLM
GLM Ridge/Lasso/Elastic Net
AGLM
AGLM Ridge/Lasso/Elastic Net
GAM
CART
Tweedie Tweedie
Tweedie
4.3 Full Penal
Elastic Net Ridge LassoRidge Lasso Elastic Net
AGLM/Full AGLM AGLM AGLMO
4.3 Freq Freq Sev Sev
58 AGLM:アクチュアリー実務のためのデータサイエンスの技術を用いた GLM の拡張
4.3 Freq Sev
GLM/Full GLM/Penal/ GLM/Penal/ .5 GLM/Penal/ AGLM/Full AGLM/Penal/ AGLM/Penal/ .5 AGLM/Penal/ GAM/Full CART/Full
AGLM O
12
4.2 Data 1 Data 22
AGLM
AGLM OAGLM AGLM
4.2 Data 1
1 GLM O
2 GLM O
3 GLM O
4.4
AGLMLasso 1 O
12
others
リスクと保険 59
O
4.4 Data 1
4.3 13
4.5
4.5
gender Male/Female area Urban/Rural age 20 79 O
freq sev
: / /
4.6 Male/Urban/25
: /
/ 4.6
13 AGLM/Full AGLM
AGLM1 GLM O Lasso
0.39128 0.39684 699147833.77139 0.39037
2 GLM Lasso O0.39128 0.39063 0.39553 0.39037
3 GLM Lasso O0.39128 0.39684 0.39553 0.39037
60 AGLM:アクチュアリー実務のためのデータサイエンスの技術を用いた GLM の拡張
4.6
4.7
4.7 4.8/
/
4.8 AGLM GAMO
AGLM w/ Ridge ( )
4.9 4.94.10
4.10 AGLM GAM
AGLM w/ Lasso ( )
AGLM GAM
λ λ α β α/βBase 0.16 0.16 1 50gender Male/Female M + 0.02 + 0.02 + 0 + 0
F + 0 + 0 + 0 + 0area Urban/Rural U + 0.02 + 0.02 + 0.20 + 10
R + 0 + 0 + 0 + 0age 20-79 20-29 + 0.02 + 0.02 + 0 + 0
30-69 + 0 + 0 + 0 + 040-49 + 0 + 0 + 0.2 + 1050-59 + 0 + 0 + 0.2 + 1060-69 + 0 + 0 + 0 + 070-79 + 0.02 + 0.02 + 0 + 0
50,000 8,543
Freq Sev~Poisson(λ) ~Gamma(α, β)
0.02
リスクと保険 61
4.7
1
4.8
2
GLM
Freq
Freq Ridg
eFr
eqEn
et (α
:0.5
)Fr
eq Lass
oFr
eq Ridg
eFr
eqEn
et (α
:0.5
)Fr
eq Lass
oFr
eq GA
MFr
eqC
ART
Fem
ale
Rura
l20
-29
0.18
00.
172
0.17
30.
178
0.17
80.
181
0.17
80.
178
0.17
70.
188
30-6
90.
160
0.17
00.
171
0.17
80.
178
0.17
00.
172
0.17
20.
167
0.18
870
-79
0.18
00.
168
0.16
90.
178
0.17
80.
179
0.17
80.
178
0.17
50.
188
Urba
n20
-29
0.20
00.
194
0.19
40.
193
0.19
30.
199
0.19
70.
198
0.20
00.
188
30-6
90.
180
0.19
20.
192
0.19
30.
193
0.18
70.
191
0.19
10.
189
0.18
870
-79
0.20
00.
190
0.19
00.
193
0.19
30.
197
0.19
70.
197
0.19
80.
188
Mal
eRu
ral
20-2
90.
200
0.18
00.
181
0.18
00.
180
0.18
90.
184
0.18
40.
186
0.18
830
-69
0.18
00.
178
0.17
90.
180
0.18
00.
178
0.17
70.
177
0.17
50.
188
70-7
90.
200
0.17
60.
177
0.18
00.
180
0.18
70.
183
0.18
30.
184
0.18
8Ur
ban
20-2
90.
220
0.20
40.
203
0.19
40.
194
0.20
70.
203
0.20
30.
209
0.18
830
-69
0.20
00.
201
0.20
10.
194
0.19
40.
195
0.19
60.
196
0.19
80.
188
70-7
90.
220
0.19
90.
199
0.19
40.
194
0.20
60.
202
0.20
20.
207
0.18
8
TRUE
GLM
AG
LM
GLM
λλ
Freq
Freq Ridg
eFr
eqEn
et (α
:0.5
)Fr
eq Lass
oFr
eq Ridg
eFr
eqEn
et (α
:0.5
)Fr
eq Lass
oFr
eq GA
MFr
eqC
ART
Base
0.16
0.16
gend
erM
ale/
Fem
ale
Mal
e+
0.02
+ 0.
02+
0.01
+ 0.
01+
0.00
+ 0.
00+
0.01
+ 0.
01+
0.01
+ 0.
01▲
0.0
0Fe
mal
e+
0+
0+
0+
0+
0+
0+
0+
0+
0+
0+
0ar
eaU
rban
/Rur
alU
rban
+ 0.
02+
0.02
+
0.02
+ 0.
02+
0.01
+ 0.
01+
0.02
+ 0.
02+
0.02
+ 0.
02▲
0.0
0Ru
ral
+ 0
+ 0
+ 0
+ 0
+ 0
+ 0
+ 0
+ 0
+ 0
+ 0
+ 0
age
20-7
920
-29
+ 0.
02+
0.02
+ 0.
00+
0.00
+ 0.
00+
0.00
+ 0.
01+
0.01
+ 0.
01+
0.01
+ 0.
0030
-69
+ 0
+ 0
+ 0
+ 0
+ 0
+ 0
+ 0
+ 0
+ 0
+ 0
+ 0
70-7
9+
0.02
+ 0.
02▲
0.0
0▲
0.0
0+
0.00
+ 0.
00+
0.01
+ 0.
01+
0.01
+ 0.
01+
0.00
0.67
402
0.67
402
0.67
423
0.67
423
0.67
281
0.67
350
0.67
379
0.67
379
0.67
380
Freq
~Poi
sson
(λ)
GLM
AG
LM
62 AGLM:アクチュアリー実務のためのデータサイエンスの技術を用いた GLM の拡張
4.9
1
4.10
2
GLM
Sev
Sev
Ridg
eSe
vEn
et (α
:0.5
)Se
vLa
sso
Sev
Ridg
eSe
vEn
et (α
:0.5
)Se
vLa
sso
Sev
GA
MSe
vC
ART
Fem
ale
Rura
lO
ther
s50
5555
5555
5251
5151
6040
-59
5055
5555
5562
6161
6160
Urba
nO
ther
s60
6464
6363
5959
5960
6040
-59
6064
6463
6370
7071
7160
Mal
eRu
ral
Oth
ers
5054
5455
5552
5151
5160
40-5
950
5454
5555
6261
6160
60Ur
ban
Oth
ers
6063
6363
6359
5959
5960
40-5
960
6363
6363
7071
7170
60
TRUE
GLM
AG
LM
GLM
αβ
α/β
Sev
Sev
Ridg
eSe
vEn
et (α
:0.5
)Se
vLa
sso
Sev
Ridg
eSe
vEn
et (α
:0.5
)Se
vLa
sso
Sev
GA
MSe
vC
ART
Base
150
gend
erM
ale/
Fem
ale
Mal
e+
0+
0▲
1.0
▲ 0
.9▲
0.0
+ 0.
0▲
0.4
▲ 0
.0▲
0.0
▲ 1
.0+
0.0
Fem
ale
+ 0
+ 0
+ 0
+ 0
+ 0
+ 0
+ 0
+ 0
+ 0
+ 0
+ 0
area
Urb
an/R
ural
Urb
an+
0.2
+ 10
+ 9.
2+
8.9
+ 7.
2+
7.2
+ 7.
7+
8.4
+ 8.
5+
9.5
+ 0.
0Ru
ral
+ 0
+ 0
+ 0
+ 0
+ 0
+ 0
+ 0
+ 0
+ 0
+ 0
+ 0
age
20-7
9O
ther
s+
0+
0+
0+
0+
0+
0+
0+
0+
0+
0+
040
-59
+ 0.
2+
10▲
0.0
▲ 0
.0+
0.0
▲ 0
.0+
10.6
+ 10
.7+
10.8
+ 10
.3▲
0.0
0.90
130
0.90
130
0.90
158
0.90
157
0.89
096
0.89
097
0.89
096
0.89
212
0.90
677
0.02
Sev
~Gam
ma(α,
β)
GLM
AG
LM
リスクと保険 63
4.2
4.2 Data 6 Data 8
4.11 AGLM w/Elastic Net ( ) Lasso ( ) 4.11
Freq Freq 14
4.11
1 10NA Data 6 Data 8
AGLM GLM GLM AGLM AGLM
O4.12
4.12 AGLM
Freq Freq Freq Freq
AGLM Freq Freq AGLM
14
Data1 Data2 Data3 Data4 Data5 Data6 Data7 Data8
Freq GLM/full 7 8 8 10 NA 8.3Freq GLM/Penal/α=0 5 7 4 9 1 5.2Freq GLM/Penal/α=0.5 3 6 5 8 4 5.2Freq GLM/Penal/α=1 4 5 6 7 3 5.0Freq AGLM/full 10 10 9 5 NA 8.5Freq AGLM/Penal/α=0 8 3 1 4 2 3.6Freq AGLM/Penal/α=0.5 1 2 3 2 6 2.8Freq AGLM/Penal/α=1 2 4 2 3 5 3.2Freq GAM/full 6 1 7 1 8 4.6Freq CART/full 9 9 10 6 7 8.2
64 AGLM:アクチュアリー実務のためのデータサイエンスの技術を用いた GLM の拡張
4.13 GLM AGLM w/ Ridge ( ) 4.13 Sev
4.13
1 10
NA
AGLM O
4.14 AGLM GLM
AGLM
AGLMO AGLM GLM
AGLM
AGLMAGLM AGLM
EDA
Data1 Data2 Data3 Data4 Data5 Data6 Data7 Data8
Sev GLM/full 8 9 7 9 8 5 8 5 7.4Sev GLM/Penal/α=0 1 6 1 7 2 4 3 1 3.1Sev GLM/Penal/α=0.5 4 5 2 3 3 7 6 3 4.1Sev GLM/Penal/α=1 5 4 2 3 3 6 2 4 3.6Sev AGLM/full 10 10 10 10 10 3 9 10 9.0Sev AGLM/Penal/α=0 7 1 6 2 1 2 1 7 3.4Sev AGLM/Penal/α=0.5 3 8 2 3 3 9 5 6 4.9Sev AGLM/Penal/α=1 2 3 2 3 3 9 4 9 4.4Sev GAM/full 9 2 9 1 9 1 NA 2 4.7Sev CART/full 6 7 8 8 7 8 7 8 7.4
リスクと保険 65
4.14 AGLM
Sev Sev Sev Sev
AGLM Sev Sev AGLM
4.15 100 10 1015
4.15
Tweedie 1 100
Data 6 Data 8
AGLM w/ ENET ( ) Lasso ( ) 4.15 Freq
AGLM w/ Ridge ( ) Lasso ( ) 4.15 Sev
GLM
AGLM
Data1 Data2 Data3 Data4 Data5 Data6 Data7 Data8
Freq *Sev 19 4 26 24 26 19.8Freq *Sev 37 11 39 17 5 21.8Freq *Sev 17 3 30 29 30 21.8Freq *Sev 42 12 38 16 4 22.4Freq *Sev 3 32 44 28 9 23.2Freq *Sev 20 21 26 24 26 23.4Freq *Sev 2 30 45 33 11 24.2Freq *Sev 51 7 34 18 13 24.6Freq *Sev 28 20 26 24 26 24.8Freq *Sev 75 2 3 23 21 24.8Freq *Sev 18 19 30 29 30 25.2Freq *Sev 29 23 26 24 26 25.6Freq *Sev 21 18 30 29 30 25.6Freq *Sev 22 22 30 29 30 26.6Freq *Sev 56 27 41 15 2 28.2
15
66 AGLM:アクチュアリー実務のためのデータサイエンスの技術を用いた GLM の拡張
OAGLM
AGLM GLM
GAM CARTAGLM
AGLM GLM O
GLM
AGLM
AGLM
AGLM GLMAGLM
4.4.3 AGLM
Ridge Lasso AGLM
Ridge Lasso
EDA
AGLM
50%
k
リスクと保険 67
AGLM AGLM O
AGLM GLMGLM
O
EDA
EDAOne-way
Two-way
EDA
AGLM
GLM
ASTIN
68 AGLM:アクチュアリー実務のためのデータサイエンスの技術を用いた GLM の拡張
Tweedie AGLM R h2o [20] A1.1
A1.1 Tweedie AGLM
Aggregate Loss
Twd AGLM/Penal/ /VarPower /LinkPower
Twd AGLM/Penal/ /VarPower /LinkPower
Twd AGLM/Penal/ /VarPower /LinkPower
VarPower Tweedie
15 LinkPower 16
LinkPower LinkPower
4.4.3 Freq Sev Tweedie AGLM103 Data 1 Tweedie
A1.2 1 103 Tweedie AGLM 50 70
A1.2 Tweedie
1 103
15 16 LinkPower =
Data1
Freq *Sev 1Freq *Sev 2Freq *Sev 3Freq *Sev 4Freq *Sev 5
Twd 50Twd 51
Twd 73
Freq *Sev 103
リスクと保険 69
4.2 Data 1/ Data 2 / /
AGLM OO
AGLM
A2.1
Data 2 /
EDA
70 AGLM:アクチュアリー実務のためのデータサイエンスの技術を用いた GLM の拡張
A2.1
Twee
die
Dat
a1D
ata2
A
B A
- B
A
B C
A -
CB
- C
AG
LM/P
enal
/α=0
0.38
216
0.38
300
-0.0
0084
0.08
934
0.08
907
0.09
050
-0.0
0116
-0.0
0142
AG
LM/P
enal
/α=0
.50.
3855
30.
3853
00.
0002
30.
0898
80.
0903
10.
0904
1-0
.000
53-0
.000
10A
GLM
/Pen
al/α
=10.
3853
50.
3851
40.
0002
10.
0896
50.
0902
00.
0904
4-0
.000
79-0
.000
23A
GLM
/Pen
al/α
=01.
9865
81.
9946
2-0
.008
041.
3641
31.
4674
61.
4850
5-0
.120
92-0
.017
59A
GLM
/Pen
al/α
=0.5
2.05
511
2.05
536
-0.0
0025
2.00
815
2.00
975
2.00
975
-0.0
0160
0.00
000
AG
LM/P
enal
/α=1
2.05
485
2.05
516
-0.0
0030
1.83
596
1.76
212
1.76
385
0.07
211
-0.0
0173
44.5
9357
44.7
4592
-0.1
5235
89.7
4707
91.3
9521
101.
5036
3-1
1.75
657
-10.
1084
345
.140
9145
.249
55-0
.108
6499
.297
5699
.120
0298
.684
680.
6128
80.
4353
445
.139
0545
.248
06-0
.109
0197
.114
6596
.271
6994
.032
253.
0824
02.
2394
444
.864
5544
.917
75-0
.053
2090
.158
4492
.883
3392
.907
90-2
.749
47-0
.024
5745
.420
6545
.399
290.
0213
610
0.03
816
101.
3842
510
1.08
992
-1.0
5176
0.29
432
45.4
1866
45.3
9770
0.02
096
97.8
1352
98.0
5799
98.1
9633
-0.3
8281
-0.1
3834
44.8
4317
44.8
9627
-0.0
5310
89.8
7033
92.6
8684
92.9
4178
-3.0
7146
-0.2
5495
45.3
9720
45.3
7579
0.02
141
99.6
1975
101.
1130
310
1.11
321
-1.4
9346
-0.0
0018
45.3
9522
45.3
7420
0.02
101
97.4
2304
97.8
3014
98.2
2968
-0.8
0663
-0.3
9954
Dat
a1D
ata2
A
B A
- B
A
B C
A -
CB
- C
AG
LM/P
enal
/α=0
0.39
447
0.39
388
0.00
059
0.09
373
0.09
247
0.09
202
0.00
172
0.00
046
AG
LM/P
enal
/α=0
.50.
3906
10.
3903
60.
0002
50.
0922
90.
0920
90.
0920
00.
0002
90.
0000
9A
GLM
/Pen
al/α
=10.
3906
10.
3903
70.
0002
50.
0924
20.
0921
10.
0920
20.
0004
00.
0000
9A
GLM
/Pen
al/α
=02.
0179
22.
0141
10.
0038
12.
4396
81.
6830
61.
6778
00.
7618
80.
0052
6A
GLM
/Pen
al/α
=0.5
1.99
926
1.99
880
0.00
046
1.98
822
1.98
862
1.98
862
-0.0
0039
0.00
000
AG
LM/P
enal
/α=1
1.99
932
1.99
879
0.00
053
1.84
177
1.76
680
1.76
726
0.07
451
-0.0
0046
45.6
6566
45.4
9691
0.16
875
161.
7624
310
1.35
207
103.
1215
858
.640
85-1
.769
5145
.604
2345
.479
340.
1248
911
7.25
209
104.
1189
710
0.03
318
17.2
1892
4.08
579
45.6
0413
45.4
7878
0.12
535
115.
7546
310
0.31
819
103.
3680
812
.386
54-3
.049
8945
.258
3345
.171
210.
0871
211
4.03
695
99.5
0373
101.
3560
412
.680
91-1
.852
3145
.223
9845
.149
900.
0740
810
3.35
072
103.
6409
110
2.23
761
1.11
310
1.40
330
45.2
2381
45.1
4922
0.07
459
101.
3170
899
.467
9298
.988
702.
3283
90.
4792
245
.253
7845
.167
040.
0867
411
5.88
791
99.2
7903
101.
3092
414
.578
67-2
.030
2145
.217
7445
.143
810.
0739
310
3.66
439
103.
3298
310
2.15
077
1.51
362
1.17
906
45.2
1757
45.1
4313
0.07
445
101.
7232
799
.238
4298
.919
242.
8040
30.
3191
8
Freq
*Sev
Freq
*Sev
Freq
*Sev
Freq
*Sev
Freq
*Sev
Freq
*Sev
Freq
*Sev
Freq
*Sev
Freq
*Sev
Freq
*Sev
Freq
*Sev
Freq
*Sev
Freq
*Sev
Freq
*Sev
Freq
*Sev
Freq
*Sev
Freq
*Sev
Freq
*Sev
リスクと保険 71
[1] P. McCullagh and J. A. Nelder, "Generalized Linear Models," CRC Press, 1972.
[2] R. T. J. and T. H. J., "Generalized Additive Models," Chapman & Hall/CRC, 1990.
[3] E. W. Frees, R. A. Derrig and G. Meyers, Predictive Modeling Applications in Actuarial Science,
Volume 1. Predictive Modeling Techniques, Cambridge University Press, 2014.
[4] B. Leo, F. J. H., O. R. A. and S. C. J., "Classification and regression trees," Monterey, CA:
Wadsworth & Brooks/Cole Advanced Books & Software, 1984.
[5] G. Witten, D. Hastie, R. Tibshirani and R. James, "An Introduction to Statistical Learning: with
Applications in R," Springer, 2013.
[6] J. R. Quinlan, "Bagging, Boosting, and C4.5," AAAI/IAAI, 2006.
[7] L. Breiman, Random Forests, Springer, 2001.
[8] A. E. Hoerl and R. W. Kennard, "Ridge regression: Biased estimation for nonorthogonal
problems," Technometrics, 1970.
[9] R. Tibshirani, "Regression Shrinkage and Selection via the lasso," Journal of the Royal Statistical
Society, 1996.
[10] H. Zou and T. Hastie, "Regularization and variable selection via the Elastic Net," Journal of the
Royal Statistical Society, 2005.
[11] J. Gertheiss and G. Tutz, "Sparse modeling of categorial explanatory variables," The Annals of
Applied Statistics, 2010.
[12] W. D. Stephen, A. R. Feinstein and C. K. Wells, "Coding ordinal independent variables in multiple
regression analyses," no. American Journal of Epidemiology, 1987.
[13] E. W. Frees, R. A. Derrig and G. Meyers, Predictive Modeling Applications in Actuarial Science,
Volume 2. Case Studies in Insurance, Cambridge University Press, 2016.
[14] E. Ohlsson and B. Johansson, Non-Life Insurance Pricing with Generalized Linear Models,
Springer, 2015.
72 AGLM:アクチュアリー実務のためのデータサイエンスの技術を用いた GLM の拡張
[15] "Case studies and examples: data and SAS programs and comments," [Online]. Available:
http://staff.math.su.se/esbj/GLMbook/case.html. [Accessed 2019].
[16] "MACQUARIE University," 2012. [Online]. Available:
http://www.businessandeconomics.mq.edu.au/our_departments/Applied_Finance_and_Actuarial_
Studies/research/books/GLMsforInsuranceData/data_sets.
[17] "Package CASdatasets," 2016. [Online]. Available: http://cas.uqam.ca/.
[18] R. Srinivasan, "kaggle - Automobile Dataset," 2019. [Online]. Available:
https://www.kaggle.com/toramky/automobile-dataset/home.
[19] M. Choi, "kaggle - Medical Cost Personal Datasets," 2019. [Online]. Available:
https://www.kaggle.com/mirichoi0218/insurance.
[20] 2019. [Online]. Available: https://cran.r-project.org/web/packages/h2o/index.html.
[21] R. Tibshirani, M. Saunders, S. Rosset, J. Zhu and K. Knight, "Sparsity and Smoothness via the
Fused lasso," Journal of the Royal Statistical Society, 2005.
[22] T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining,
Inference, and Prediction, Springer, 2016.
[23] D. J., K. R. and S. M., "Supervised and unsupervised discretization of continuous features,"
Machine Learning: Proceedings of the Twelfth International Conference, 1995.
[24] J. Friedman, T. Hastie and R. Tibshirani, "Regularization Paths for Generalized Linear Models via
Coordinate Descent," J Stat Softw, 2010.
[25] S. Garavaglia and A. Sharma, "A SMART GUIDE TO DUMMY VARIABLES: FOUR
APPLICATIONS AND A MACRO," Proceedings of the Northeast SAS Users, 1998.
[26] R. Adamczak, A. Porollo and J. Meller, "Accurate prediction of solvent accessibility using neural
networks–based regression," PROTEINS, 2004.
[27] D. D. Rucker, B. B. McShane and K. J. Preacher, "A researcher's guide to regression,
discretization, and median splits of continuous variables," Journal of Consumer Psychology, 2015.
[28] J. Gertheiss, S. Hogger, C. Oberhauser and G. Tutz, "Selection of Ordinally Scaled Independent
Variables," Applied Statistics 62, 2009.
リスクと保険 73
AGLM: an extension of GLM for actuarial practice using data science techniques
Suguru Fujita*, Toyoto Tanaka†, Hirokazu Iwasawa‡
Abstract
Predictive modeling in machine learning and data science is paid much attention in recent years, and it is now one of the most important tasks for actuaries to apply it in practice. However, due to the characteristic features of insurance data and such priorities as the interpretability of models and regulatory requirements in their decision-making, most actuaries may find difficulties in using those advanced techniques. The aim of our research is not to apply existing methods per se into actuarial practice, but rather to construct methods to fulfill the actuary’s original needs. We propose, from this standpoint, Accurate GLM (AGLM), a modeling method that is developed by combining data science techniques with the generalized linear model (GLM). Keywords: Regression Analysis, Generalized Liner Model (GLM), Discretization, Dummy Variable, Regularization
* Guy Carpenter Japan Inc., Mail: [email protected] † Tokio Marine & Nichido Fire Insurance Co., Ltd., Mail: [email protected] ‡ Mail: [email protected]