machine learning applied to insurance problems › ~wueth › talks › 2017... · boosting machine...

Machine LearningApplied to Insurance Problems

Mario V. Wuthrich

RiskLab, ETH Zurich

Artificial Intelligence in Insurance and FinanceZurich University of Applied Sciences ZHAW

Winterthur, September 7, 2017

Insurance Technology (InsurTech)1 2

• Artificial Intelligence: learn and accumulate useful knowledge through data.

• Big Data Analytics based on the research of massive data.

• Cloud Computing for real-time operations.

• Block Chain Technology for a more efficient and anonymous exchange of data(decentralization of information, insured-insurer/insurer-insurer).

• Internet of Things: integration and interaction of physical devices (e.g. wearables,sensors) through computer systems to reduce and manage risk.

1innovation and application of advanced technology to the insurance industry2source: China InsurTech Development White Paper, 2017

1

Near Future InsurTech Applications

• health and accident insurance

• life and pension insurance

• non-life insurance

• re-insurance

B product development and insurance pricingB marketing and distribution of productsB administration, accounting and portfolio optimizationB claims handling, triage of potentially large claims and claims reservingB risk drivers analysis and risk management

2

• Section 1: Supervised Learning

3

Insurance Pricing: State-of-the-Art

Basic Assumption:There are structural differences which can be explained by a regression function

µ : X → R, x 7→ µ(x).

• X is the feature space, covariate space (containing all potential insurance policies);

• x ∈ X is the feature, covariate, explanatory variable of a single policy;

• µ(·) is the regression function/pricing functional describing structural differences.

Example of feature:

x = (age, gender, type of car, claims code, income, body weight, smoker, . . .)

4

Supervised Learning

Determine the (unknown) regression function (pricing functional)

µ : X → R, x 7→ µ(x)

from (noisy) observations D = {(Y1,x1), . . . , (Yn,xn)} being of type

Yi = µ (xi) + εi with3 E[εi] = 0.

This problem is not new in insurance, but X is increasingly more complex.

3Typically, we also assume independence between the εi for policies i = 1, . . . , n.

5

Classical and New Solutions to Pricing Problems

• Classical approaches use generalized linear models (GLMs) and generalized additivemodels (GAMs), implemented in commercial software like Emblem or SAS.

• New approaches use neural networks, regression trees, random forests, boosting.

⇒ These are still at an experimental stage in the insurance industry:

B Issue 1: standard software does not exist, yet (deviance statistics, big data);B Issue 2: data warehouses may still bear some issues;B Issue 3: stability/robustness over time still needs to be explored;B Issue 4: graphical illustrations are still poor;B Issue 5: communication to management and customers is still difficult;B Issue 6: uncertainty measures are completely missing.

We give some examples illustrating the state-of-the-art.

6

Sensitivity Testing with Neural Networks4

●

●

●● ●

●

●

●

●

●

●

●

1994 1996 1998 2000 2002 2004

0.05

0.06

0.07

0.08

0.09

0.10

reporting delay

accident year

●

●

●

● ●●

●

●

●

●

●

●

0.91

00.

915

0.92

00.

925

0.93

00.

935

0.94

0

●

●

mean of T (left axis)probability of T=0 (right axis)

●●

●

●●

●●

●●

●

●

●●

●

●

●

●

●

●

● ●

●

●

● ● ● ●

●●

●

●●

●●

●

●

●

●

●

●

●

●

●●

●

●

●

0.00

0.05

0.10

0.15

0.20

0.25

reporting delay

injured part

10 13 16 22 25 31 34 37 42 45 52 55 61 64 70 81

● mean of T

Neural networks are used for sensitivity testing and analysis of non-linearities.

4more than 10 mio. accident claims over 12 accounting years

7

Prediction Uncertainties in Neural Networks

Process uncertainty from Yi = µ (xi) + εi and parameter uncertainty from5

logµ(x) = β0 +

q2∑l=1

βl φ

w(2)l,0 +

q1∑k=1

w(2)l,k φ

w(1)k,0 +

d∑j=1

w(1)k,jxj

.

inj_part

age

cc

LoB

AQ

C

5deep neural network with two hidden layers and 132 parameters

8

Regression Trees6 for Risk Driver Analysis7

time lags t+ 1: Y1 Y2 Y3 Y4 Y5 Y6numbers of leaves 8 11 18 12 4 4

feature components used for split questionsclaim closed cl cl cl cl cl cl

lawyer involved law law law

claims code cc cc cc cc cc

claims diagnosis diag diag diag diag diag

reporting delay j j j

previous payments Y0 Y1 Y2 Y3 Y4 Y5previous payments Y1 Y2

Payment process (Yt)t≥0: relevant feature information and Markov condition.

6CART, Breiman, Friedman, Olshen and Stone (1984)7insurance claims cash flows of 90k claims

9

Boosting Machine for the Lee-Carter Model8

tree improved Lee−Carter fit; females

year

age

0

5

10

15

20

25

30

35

40

45

50

55

60

65

70

75

80

85

90

95

1876

1881

1886

1891

1896

1901

1906

1911

1916

1921

1926

1931

1936

1941

1946

1951

1956

1961

1966

1971

1976

1981

1986

1991

1996

2001

2006

2011

−1.5

−1.0

−0.5

0.0

0.5

1.0

1.5

Stage-wise adaptive learning applied to residuals is known as a Boosting Machine.

8Swiss female mortality data: back-testing the Lee-Carter model

10

• Section 2: Unsupervised Learning

11

Telematics Car Driving Data9

B Feature Extraction: Find patterns in (noisy) features X0 = {x1, . . . ,xn}.

9GPS location data second by second: 200 individual trips of 3 car drivers.

12

Normalized v-a Heatmaps

B Unsupervised Learning: Find patterns in (noisy) features X0 = {x1, . . . ,xn}.

Normalized v-a heatmaps of 8 car drivers in speed bucket [5km/h, 20km/h).

13

Application of K-Means Algorithm for K = 4

14

Conclusions should be here ...

... and your remarks!

B Lecture notes on SSRN preprint server (first draft):

Data Analytics for Non-Life Insurance Pricing, Manuscript ID 2870308.15

machine learning applied to insurance problems › ~wueth › talks › 2017... · boosting machine...

Documents