

Copyright © 2008 John Wiley & Sons, Ltd.

Mortality Forecasting Using Neural Networks and an Application to Cause-Specific Data for Insurance Purposes

PARAS SHAH* AND ALLON GUEZ
Department of Electrical and Computer Engineering, Drexel University, Philadelphia, Pennsylvania, USA

ABSTRACT

Mortality forecasting is important for life insurance policies, as well as in other areas. Current techniques for forecasting mortality in the USA involve the use of the Lee–Carter model, which is primarily used without regard to cause. A method for forecasting mortality is proposed which involves the use of neural networks. A comparative analysis is done between the Lee–Carter model, linear trend and the proposed method. The results confirm that neural networks perform better than the Lee–Carter and linear trend models, within 5% error. Furthermore, mortality rates and life expectancy were formulated for individuals with a specific cause based on prevalence data. The rates are broken down further into respective stages (cancer) based on the individual’s diagnosis. Therefore, this approach allows life expectancy to be calculated based on an individual’s state of health.

key words: mortality forecasting; cause-specific mortality; neural networks; Lee–Carter

INTRODUCTION AND MOTIVATION

Current trends

Mortality forecasting is important in life insurance for determining a customer’s ability to acquire a policy. This paper presents a method to replace existing methods of calculating mortality rates, one that factors in the individual’s state of health (cause-specific data). The method involves the use of neural networks trained on historical data. Any trends seen from year to year are maintained by the neural network using a learning algorithm.

Generally, mortality rates have decreased over the last several years. These reductions have occurred for two reasons. The first reason is that the reduction of mortality rates at advanced ages contributes less to life expectancy increases than that at lower ages (Wong-Fupuy and Haberman, 2004). The second reason is that mortality improvements arise during advanced ages (Wong-Fupuy and Haberman, 2004). Thus, mortality rates have been constantly underestimated on a year-to-year basis. Life expectancy at birth was seen to increase from 47.7 years in 1900 to 76.6 years in 2000 (Wilmoth, 2003). Furthermore, an 81% decrease was seen for mortality rates above 65 years of age (Wilmoth, 2003), which may have occurred as a result of advances in technology and medicine.

Journal of Forecasting, J. Forecast. 28, 535–548 (2009). Published online 2 December 2008 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/for.1111

* Correspondence to: Paras D Shah, 115 Charter Court, Trevose, PA 19053, USA. E-mail: [email protected]

State of prior methods

The Lee–Carter method was developed specifically from US mortality data, as well as data from other countries. This method is used by the Social Security Administration to predict percentage changes in life expectancy for future years (Wilmoth, 2003) and is used for both all-cause and cause-specific mortality data (Girosi and King, 2005). The Lee–Carter model represents mortality as $m_{at} = \alpha_a + \beta_a \lambda_t + \epsilon_{at}$, where $m_{at}$ denotes the log of the mortality rate at time $t$ in age group $a$. The parameters $\alpha_a$, $\beta_a$, and $\lambda_t$ are estimated, while $\epsilon_{at}$ is a random disturbance. The death rates are transformed by taking the logarithm and subtracting the empirical average over time of the age profile in each age group (Girosi and King, 2005). See Girosi and King (2005) for the model implementation and an example.

The original Lee–Carter method, without any extensions, works only when little variation exists in the log-mortality data, since the 1D approximation of the log-mortality age profiles moves along a straight line in $\mathbb{R}^A$ (Girosi and King, 2005). The model was developed as if the log-mortality data had one principal component (1D) (Girosi and King, 2005), which allows forecasts to be extrapolated from a straight-line approximation. This straight-line approximation is only good for the mortality age profiles of certain races, countries and causes (Girosi and King, 2005). The model also assumes that age groups whose mortality rates have been declining the fastest will continue to decline and receive larger penalties, which depend upon whether penalties are correlated or uncorrelated among age groups. These penalties represent weather patterns and environmental changes but not changes in health-related events (Girosi and King, 2005). Another disadvantage of the Lee–Carter model is its inability to predict accurately over a long-term range. The Lee–Carter model with a random walk with drift is preferred when working over a short time interval (Girosi and King, 2005). The forecast errors were found to depend on the ages being predicted, and the model under-predicts future gains when long-term forecasts are made (Lee and Miller, 2000).
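As a rough illustration of the model just described, the sketch below fits the age level, age slope and time index by a rank-1 approximation of the centered log rates and then extends the time index with a random walk with drift. It is a minimal sketch under stated assumptions, not the implementation of Girosi and King (2005): the function names, the power-iteration fit, the convention that the slopes sum to one, and the toy numbers are all chosen here for illustration.

```python
import math

def fit_lee_carter(log_rates, n_iter=100):
    """Fit log_rates[a][t] ~ alpha[a] + beta[a] * lam[t] with a rank-1 approximation."""
    n_ages, n_years = len(log_rates), len(log_rates[0])
    # alpha[a]: empirical average over time of each age group's log rate.
    alpha = [sum(row) / n_years for row in log_rates]
    centered = [[m - alpha[a] for m in log_rates[a]] for a in range(n_ages)]

    def time_index(beta):
        # Project every year's centered age profile onto beta.
        return [sum(centered[a][t] * beta[a] for a in range(n_ages))
                for t in range(n_years)]

    # Power iteration for the leading singular direction of the centered matrix.
    beta = [1.0 / n_ages] * n_ages
    for _ in range(n_iter):
        lam = time_index(beta)
        new_beta = [sum(centered[a][t] * lam[t] for t in range(n_years))
                    for a in range(n_ages)]
        norm = math.sqrt(sum(b * b for b in new_beta))
        if norm == 0.0:  # flat data or an unlucky start: keep the current beta
            break
        beta = [b / norm for b in new_beta]

    # Assumed identification: beta sums to one (requires a non-zero sum).
    scale = sum(beta)
    lam = [value * scale for value in time_index(beta)]
    beta = [b / scale for b in beta]
    return alpha, beta, lam

def forecast_log_rates(alpha, beta, lam, horizon):
    """Random walk with drift: the time index moves by its average past step."""
    drift = (lam[-1] - lam[0]) / (len(lam) - 1)
    lam_future = lam[-1] + horizon * drift
    return [a + b * lam_future for a, b in zip(alpha, beta)]

# Toy log rates (made-up numbers) for two age groups over five years.
true_alpha, true_beta, true_lam = [-5.0, -3.0], [0.4, 0.6], [2.0, 1.0, 0.0, -1.0, -2.0]
toy = [[a + b * l for l in true_lam] for a, b in zip(true_alpha, true_beta)]
alpha, beta, lam = fit_lee_carter(toy)
next_year = forecast_log_rates(alpha, beta, lam, horizon=1)
```

Because the toy data lie exactly on one principal component, the fit recovers the generating parameters; real log-mortality data would leave a residual, which is the limitation discussed above.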

Life tables were developed to show the mortality experience over each age group, which allow multiple divisions to occur between genders and race. Mortality tables can be constructed from past data, which are used to forecast future data. Parameters in life tables are calculated from data from the Census, which include death rate, life expectancy, survival rate, number of deaths between age groups, and number of survivors. For more information with regard to the formulas for survival and probability of dying, see Bowers et al. (1986) and Gupta and Varga (2002).

The problems associated with mortality tables are critical in several respects. The first is that mortality tables generally mask long-run trends, since mortality is assumed to vary over a short time frame. Further, the data are usually a few years old and mask any technological advancement or any changes in mortality due to living conditions or health. Generally, mortality tables are created for future years by regression analysis, which involves linear forecasting using trend analysis (Wilmoth, 2003).
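The linear trend approach mentioned above can be sketched as an ordinary least-squares line fitted to one age group's death rates and extended to a future year. This is only a minimal sketch: the function names, the clamp at zero and the sample rates are assumptions made for illustration, not values or procedures taken from the cited sources.

```python
def fit_linear_trend(years, rates):
    """Ordinary least-squares line through (year, rate) pairs."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(rates) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, rates))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def forecast_rate(years, rates, target_year):
    """Extend the fitted line to target_year; a death rate cannot be negative."""
    slope, intercept = fit_linear_trend(years, rates)
    return max(0.0, intercept + slope * target_year)

# Made-up death rates for a single age group, 1996-2000.
years = [1996, 1997, 1998, 1999, 2000]
rates = [0.010, 0.009, 0.008, 0.007, 0.006]
rate_2001 = forecast_rate(years, rates, 2001)
```

In practice the same fit would be repeated for every age group and sex, which is why a single straight line per group can hide the longer-run changes described above.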

Currently, projections of mortality and disability by cause are made, but they do not include projections for an individual who suffers from a specific cause at a specific stage. Murray and Lopez formulated a model to project mortality and disability by cause, which forms regression models of mortality rates on region, smoking, education, and cause-specific data. The model allows mortality to be projected from socioeconomic variables and their influence on future health status (Murray and Lopez, 1997).

Another method used in forecasting is the linear decline model, constrained to forecast the Oeppen–Vaupel line of life expectancy. Two assumptions are made. First, the linear trend in female life expectancy continues into the future (Andreev and Vaupel, 2006). Second, the difference in life expectancy of a country stays constant over time (Andreev and Vaupel, 2006). A general model to forecast life expectancy at birth can then be made. See Andreev and Vaupel (2006) for the steps for forecasting life expectancy.

Biosignia, Inc. is a corporation that has formulated methods to predict disease status. Their patent involves calculating an individual’s status for a particular disease by applying a multivariate disease prediction equation (Hu and Root, 2000). The contributions of each disease and its associated factors are calculated, and association data representing the disease factors are used in the equation. Note, however, that this method is used to predict whether a certain cause will occur.

Purpose

The main purpose of this paper is to help insurance companies assign contracts to individuals who have certain diseases, so that these individuals can acquire adequate insurance. Insurance companies such as State Farm and Univest base their potential customers and premiums on mortality experience over the years; this is typically done by larger insurance companies, as stated by Prudential and State Farm (Univest actuarial analyst). Smaller companies such as Univest use life expectancy and mortality rates from the Commissioners 2001 Standard Ordinary Mortality Table. Typical factors that influence the calculation of premiums include smoking status, cholesterol levels, state of residence, and gender. Hence, cause-specific mortality rates are not used in these calculations.

PROPOSED METHOD

Data collection

1. Data were collected on the death rates of individuals in the USA.
   (a) Death rates for males and females were used separately.
   (b) Data from a range of 42 years were used (1960–2002).
   (c) Death rates were also separated into intervals (5-year age groups).

2. Data were collected on death rates from various causes for people who live in the USA.
   (a) Death rates for males and females were used.
   (b) Data from a smaller range were used (1995–2001). The range was limited because insufficient prevalence data were published and recorded before 1995.
   (c) Data for certain diseases were used. Currently, prevalence rates have not been used in calculating survival rates. In this paper a relationship is found using conditional probability, as shown in the following steps.
       i. The diseases chosen were among the leading causes of death.

3. All data were taken from the Human Mortality Database (HMD) and the National Center for Health Statistics (NCHS). HMD data corresponded to death rates that do not include cause of death. NCHS data included cause of death.

4. Data regarding prevalence were also needed for diseases.
   (a) The prevalence rates were taken depending on 2(c).
   (b) All prevalence rates were taken from the Centers for Disease Control (CDC, 2003), National Cancer Institute, the American Heart Association, SEER Cancer Statistics (2006), and Thom et al. (2006).

5. For cancers, the survival rates were converted into death rates for each year at each age by using the relationship

   $$q_x = 1 - p_x \tag{1}$$

   where $p_x$ is the survival rate and $q_x$ the death rate at age $x$.
   (a) The survival rates obtained were 0- to 10-year survival rates from the time of diagnosis. Survival rates beyond the 10-year period were extrapolated using trend lines from Excel for each stage and for each cancer.
   (b) Four stages were used for cancer: localized, regional, distant, and unstaged.
   (c) The equation generated for each stage was used in calculating survival rates beyond the 10-year interval. An $R^2$ range for all cancers of 93.6–99.99% was seen. Figure 1 shows a sample plot of the actual data and function fit.
   (d) Once the function was fitted to the survival rates (0- to 10-year), the death percentages were calculated by subtracting the survival percentages from one. Then, the death percentages were normalized by dividing the death percentage for each individual stage by the sum of the death percentages of all stages.
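The steps in item 5 can be sketched as follows: fit a power function to survival rates by year since diagnosis, extend it past year 10, convert survival to death with equation (1), and normalize death percentages across stages. This is a hedged sketch, not the authors’ Excel workflow: it assumes the trend line is a power function fitted by least squares on log-log axes (so year 0 is excluded), and every number below is made up for illustration.

```python
import math

def fit_power(xs, ys):
    """Least-squares fit of y = a * x**b on log-log axes (needs x > 0 and y > 0)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x, mean_y = sum(log_x) / n, sum(log_y) / n
    slope = (sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
             / sum((u - mean_x) ** 2 for u in log_x))
    return math.exp(mean_y - slope * mean_x), slope

def death_rate(survival_rate):
    """Equation (1): q = 1 - p."""
    return 1.0 - survival_rate

def normalize_by_stage(death_by_stage):
    """Divide each stage's death percentage by the sum over all stages."""
    total = sum(death_by_stage.values())
    return {stage: value / total for stage, value in death_by_stage.items()}

# Hypothetical survival rates for years 1-10 after diagnosis (one stage).
years = list(range(1, 11))
survival = [0.9 * t ** -0.1 for t in years]
a, b = fit_power(years, survival)

# Survival beyond the observed 10 years, then the matching death rate.
q_year_12 = death_rate(a * 12 ** b)

# Hypothetical death percentages for the four stages in one year.
shares = normalize_by_stage(
    {"localized": 0.05, "regional": 0.20, "distant": 0.75, "unstaged": 0.25})
```

After normalization the stage shares sum to one, so they can be read as the split of deaths across stages for that year.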

Diseased individual death rate calculation

1. Death rates by disease were taken from the NCHS and prevalence rates were taken from the CDC. Owing to lack of data, the prevalence rates were approximated between each given age interval using linear approximation.

2. The data were classified in terms of probabilities.
   (a) $P(B_i)$ was classified as the prevalence rate.
   (b) $P(A)$ was classified as the probability of dying at age $x$ regardless of cause.
   (c) $P(A \text{ and } B_i)$: assume the individual has one leading cause.
   (d) $P(A \mid B_i)$ is an unknown quantity.
   (e) $P(B_i \text{ and } C_j)$ is the prevalence of the disease at stage $j$ of the cancer.

3. Using the formula for conditional probability for different stages in cancer yields

   $$P(A \mid B_i \text{ and } C_j) = \frac{P(A \text{ and } B_i \text{ and } C_j)}{P(B_i \text{ and } C_j)} \tag{2}$$

Figure 1. Plot of the power function fit to actual data for local stage death probabilities from diagnosis year


4. $P(A)$ was checked using the law of total probability. The formula is given by

   $$P'(A) = \sum_{i=1}^{k} P(A \mid B_i)\, P(B_i) \tag{3}$$

   This formula gives the probability of dying at age $x$ regardless of whether the individual has a disease.
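Equations (2) and (3) can be illustrated with a few lines of arithmetic. The sketch below divides a joint death probability by a prevalence to obtain the conditional death rate, then recombines the conditional rates as a check on the all-cause probability. The category names and all numbers are made up, and the "none" category is an assumption added here so that the categories cover the whole population, which the law of total probability requires.

```python
def conditional_death_prob(p_death_and_condition, p_condition):
    """Equation (2): P(A | condition) = P(A and condition) / P(condition)."""
    if p_condition <= 0.0:
        raise ValueError("prevalence must be positive")
    return p_death_and_condition / p_condition

def total_death_prob(p_death_given, prevalence):
    """Equation (3): sum over conditions of P(A | B_i) * P(B_i)."""
    return sum(p_death_given[name] * prevalence[name] for name in prevalence)

# Hypothetical prevalence P(B_i) and joint death probability P(A and B_i)
# at one age; "none" stands for individuals without a listed disease.
prevalence = {"diabetes": 0.25, "heart disease": 0.25, "none": 0.50}
joint = {"diabetes": 0.0125, "heart disease": 0.025, "none": 0.005}

conditional = {name: conditional_death_prob(joint[name], prevalence[name])
               for name in prevalence}
p_death_all_cause = total_death_prob(conditional, prevalence)
```

For a cancer stage, the same function applies with the joint probability and prevalence both taken for the disease-and-stage event.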

Neural network implementation

1. Matlab Release 14 version 7.0 was used to implement the backpropagation algorithm.

2. The inputs were defined as past or historical data until the target year. The target year is the year before the forecasted year. The forecasted year is defined as the year the user wishes to forecast up to (any year after the present year, 2002).
   (a) The data from past years were used to forecast up to the forecasted year. The target year is also forecasted in order to estimate the life expectancy for a user-inputted future year that exceeds the historical data on hand (i.e., if the forecasted year is 2006, the years 2003 to 2005 are also forecasted). Hence, the program enables the user to enter a future year (the year for which the individual wants a forecast), and life expectancy is calculated for that future year by forecasting each intervening year up to the user-indicated year.
   (b) The inputs of the data used were either $P(A)$ or $P(A \mid B_i \text{ and } C_j)$.

3. The backpropagation algorithm was implemented in Matlab using the Levenberg–Marquardt training algorithm. The type of neural network used was the perceptron and the learning algorithm was the backpropagation algorithm (BPA). The parameters of interest include the number of epochs, maximum error, and number of neurons at the input and output. For the paper, the number of epochs was set to the default. Training stopped when the error reached the maximum error, set to 1e−10, or when the epochs had concluded for the output layer.

4. Several networks were tested by using past years for which data are known and then estimating the last year for which data were known.

5. Validation was done to determine whether the network could forecast known data from historical data. Once the networks were constructed, the network with the lowest sum of errors over an entire age profile was chosen.

6. The data were split into two ranges. Ages 0–60 were trained by network 1 and ages 61 and over were trained by network 2, which allows trends at older ages to be preserved. More fluctuations were seen at ages greater than 61.
   (a) A separate division was used when dealing with cancers. The first network was trained for age 0 up to the individual’s current age at diagnosis, that is, the individual’s age when diagnosed with cancer. The second network was trained for ages after the diagnosis age.

7. Each disease type and $P(A)$ was passed through the network in order to simulate subsequent years.

8. The user enters into the Matlab program the age, sex, current year (the year in which the user is entering the information), and whether the individual has a diagnosed cause (assume only one or none). The program then prompts for the forecasted year.

9. The life expectancy of the individual is calculated from the forecasted death rates.
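The windowing described in the steps above (past years as inputs, the following year as target, then a shifted window to forecast the next year) can be sketched with a very small network. This is an illustration only and differs from the paper’s setup in ways that should be stated plainly: it is written in Python rather than Matlab, it trains with plain batch gradient descent rather than Levenberg–Marquardt, and the data, layer sizes, learning rate and function names are all assumptions.

```python
import math
import random

def init_network(n_in, n_hidden, rng):
    """Small random weights; every row ends with a bias term."""
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return w_hidden, w_out

def forward(net, x):
    """One tanh hidden layer and a single linear output neuron."""
    w_hidden, w_out = net
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in w_hidden]
    y = sum(w * v for w, v in zip(w_out, h)) + w_out[-1]
    return h, y

def train(net, inputs, targets, epochs=500, lr=0.1):
    """Batch gradient descent with backpropagated errors; returns the loss per epoch."""
    w_hidden, w_out = net
    n = len(inputs)
    losses = []
    for _ in range(epochs):
        g_hidden = [[0.0] * len(row) for row in w_hidden]
        g_out = [0.0] * len(w_out)
        loss = 0.0
        for x, t in zip(inputs, targets):
            h, y = forward(net, x)
            err = y - t
            loss += 0.5 * err * err / n
            for j, hj in enumerate(h):
                g_out[j] += err * hj / n
                delta = err * w_out[j] * (1.0 - hj * hj) / n
                for i, xi in enumerate(x):
                    g_hidden[j][i] += delta * xi
                g_hidden[j][-1] += delta
            g_out[-1] += err / n
        for j, row in enumerate(w_hidden):
            for i in range(len(row)):
                row[i] -= lr * g_hidden[j][i]
        for j in range(len(w_out)):
            w_out[j] -= lr * g_out[j]
        losses.append(loss)
    return losses

# Synthetic, pre-scaled rates: 12 age groups (rows) by 7 years, 1995-2001 (columns).
table = [[0.2 + 0.05 * a - 0.01 * t for t in range(7)] for a in range(12)]
train_x = [row[0:5] for row in table]      # inputs: 1995-1999
train_y = [row[5] for row in table]        # target year: 2000
net = init_network(5, 3, random.Random(0))
losses = train(net, train_x, train_y)
# Shift the window to 1996-2000 and apply the trained network to forecast 2001.
forecast_2001 = [forward(net, row[1:6])[1] for row in table]
```

Each age group is treated as one training sample here; the split into separate networks for younger and older ages described in step 6 would amount to training two such networks on the two subsets of rows.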


Calculating life expectancy

Life expectancy was calculated in a manner similar to that used in life tables. Two quantities were calculated from the death rates. The first is $l_{x+n}$, the number of survivors during that age interval (5-year age intervals were used because most of the statistical data are published in 5-year age intervals). The subscript $n$ refers to the length of the age interval (5 years, e.g. 0, 1–4, 5–9). The second is $L_x$, the number of survivors in the current age group surviving to future age groups.

$$l_x = l_{x-1}\,(1 - q_{x-1}), \qquad l_0 = 100{,}000 \tag{4}$$

$$L_x = \frac{l_x + l_{x+1}}{2} \tag{5}$$

Life expectancy is calculated from the $l_x$ and $L_x$ quantities. $L_x$ can be found once equation (4) has been solved for the subsequent values of $l_x$. The life expectancy is given by the following equation:

$$e_x = \frac{T_x}{l_x}, \qquad T_x = \sum_{y=x}^{N} L_y, \qquad N = \text{number of age groups} \tag{6}$$

where $T_x$ is the sum of survivors of the current age group surviving to all future age groups.
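Equations (4) to (6) translate directly into a short routine. The sketch below is a minimal illustration with made-up death rates; the function name and the optional `width` factor (the number of years per age group, left at 1 so that equation (5) is reproduced exactly as written) are assumptions added here.

```python
def life_table(q, radix=100_000.0, width=1.0):
    """Return (l, L, e) for death probabilities q[x], one per age group.

    width is a hypothetical scaling for multi-year age groups; width=1.0
    matches equation (5) as written.
    """
    # Equation (4): survivors entering each age group, starting from l_0 = radix.
    l = [radix]
    for qx in q:
        l.append(l[-1] * (1.0 - qx))
    # Equation (5): average number alive within each age group.
    L = [width * (l[x] + l[x + 1]) / 2.0 for x in range(len(q))]
    # Equation (6): e_x = T_x / l_x, with T_x the sum of L_y for y >= x.
    e = []
    for x in range(len(q)):
        T_x = sum(L[x:])
        e.append(T_x / l[x] if l[x] > 0.0 else 0.0)
    return l, L, e

# Made-up death probabilities for three age groups; everyone dies in the last one.
l, L, e = life_table([0.5, 0.5, 1.0])
```

With forecasted death rates in place of the made-up ones, the first entry of `e` corresponds to life expectancy at age 0, and later entries to remaining life expectancy at each age group.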

Adjustments made to mortality data for disease case

1. Since only individual diseases are considered, additional death rates were added to take into account other possible causes unrelated to disease, resulting from accidents, unintentional injuries or assault (homicide).

Diseases of interest

The following diseases were incorporated based on leading causes of death found from the CDC:

1. Cancer (malignant neoplasms)
   (a) Breast cancer
   (b) Colon cancer
   (c) Endometrial and uterine cancer
   (d) Lung cancer
   (e) Ovarian cancer
   (f) Prostate cancer
   (g) Skin cancer

2. Diabetes mellitus

3. Diseases of the heart
   (a) Hypertensive heart disease
   (b) Ischemic heart disease

4. Chronic lower respiratory disease
   (a) Asthma
   (b) Bronchitis
   (c) Emphysema

RESULTS

Training

Network training was done to see how forecasting performed when different parameters were varied. The proportion of data used for testing was determined by the optimal network. This was in the range of 16.67–33.33%. The total dataset was 6 years of data with 86 data points. For example, the input data are 1995–1999 and the target data are 2000–2001. The typical numbers of hidden layer neurons used were five and nine. Output layer neurons were determined by the lowest sum-of-square error when testing for the optimal network, which was done by summing the errors for each age across all ages. The network that consistently provided the lower summed error was chosen. For certain diseases, low sum-of-square errors were not seen since there were rapid fluctuations in the mortality rates. These rapid fluctuations arise from rapid changes in death rates for the disease at certain ages. Hence, another network was chosen, which had the second lowest error.

The error percentages, when split over age groupings, show that for some causes errors increase slightly for lower age intervals. This was also seen in upper–middle age groups (50–70). However, the higher errors are drastically reduced to within 0–5% error for young to middle age groups (10–45) and older age groups (70+), which was consistent among all causes. The observed errors were bounded between 0 and 10%, which was reasonable. In the table shown in the Appendix the error values were divided by 86, which gives the average summed error over all ages. The average errors are small (0–15% error) and comparisons with other methods in regard to errors can be seen in Figures 2 and 3. These figures were produced using an individual diagnosed at the age of 35 (chosen as an example) with the specific cancer listed; the year of diagnosis was 2000 and the mortality rates were forecast for year 2001. The networks have a more difficult time forecasting when rapid fluctuations in mortality arise.

Figure 2. Plot of the death rates for the historical year 2001 and the forecasted 2001 using Lee–Carter and neural networks (trained for years 1995–1999; tested year: 2001; hidden layer neurons = 7; output layer neurons = 1; target year: 2000) for regardless-of-disease for validation testing. Error proposed method: 4.45%. Error Lee–Carter: 25.8%. Error by trend analysis: 4.37%.

Figure 3. Plot of the death rates for the historical year 2001 and the forecasted 2001 using Lee–Carter and neural networks (trained for years 1995–1999; tested year: 2001; hidden layer neurons = 9; output layer neurons = 1; target year: 2000) for breast cancer for validation testing. Error proposed method: 4.26%. Error Lee–Carter: 4.96%. Error by trend analysis: 4.00%.
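A comparison of this kind can be sketched as a per-age percentage error that is summed over the age profile, averaged over the number of ages, and used to pick the candidate with the lowest summed error. The exact error formula is not spelled out in the text, so the absolute percentage error used below is an assumption, as are the function names and all of the numbers.

```python
def percent_errors(actual, forecast):
    """Absolute percentage error at each age (assumed error definition)."""
    return [abs(f - a) / a * 100.0 for a, f in zip(actual, forecast)]

def average_percent_error(actual, forecast):
    """Summed error over the age profile divided by the number of ages."""
    errors = percent_errors(actual, forecast)
    return sum(errors) / len(errors)

def pick_lowest_error(actual, candidates):
    """Name of the candidate forecast with the lowest summed error."""
    return min(candidates,
               key=lambda name: sum(percent_errors(actual, candidates[name])))

# Made-up death rates at three ages for a held-out year, with two forecasts.
actual = [0.010, 0.020, 0.040]
candidates = {
    "network": [0.011, 0.019, 0.040],
    "trend": [0.012, 0.020, 0.044],
}
network_error = average_percent_error(actual, candidates["network"])
trend_error = average_percent_error(actual, candidates["trend"])
best = pick_lowest_error(actual, candidates)
```

With 86 ages instead of three, dividing the summed error by 86 gives the kind of average figure quoted alongside the plots.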

Plots comparing the Lee–Carter and neural network implementations are shown in Figure 2 for the regardless-of-cause case. The amount of error is high using the Lee–Carter forecasting model when using historical data up to year 2000. A comparison was made between the forecast death rates and the data for 2001 in Figures 2 and 3. It can be seen that the Lee–Carter model consistently shows a greater error. The errors for the linear trend model were lower for the regardless-of-disease case. In order to see whether neural networks can accurately forecast future years, historical data from 1995 to 1999 were used and the target year was set to 2000. The network with the smallest error was chosen. This network was then used to forecast 2001: the inputs shift to 1996–2000 and the network is applied to yield 2001 death rates. The errors produced were consistent with those found in the above section when finding the optimal network. The errors associated with the selected causes can be seen in the Appendix table. It is apparent that the neural networks are able to predict close to the 2001 historical data.


Forecasting results

The results for the regardless-of-disease case are shown in Figures 7 and 8 for females and males, respectively. Forecasts were constructed for years 2007 to 2010, using the historical data for years up to 2001, forecasting years 2003–2006 and including them within the historical years for forecasting year 2007. The percent increase or decrease was tabulated from 2001 data. For certain diseases, life expectancy increased over the years, while decreases were seen in others. Life expectancy plots are shown in Figures 4–8. The inputs to the program consisted of constructing a life expectancy profile of a 35-year-old male or female diagnosed with a certain cancer. The diagnosis year is important since a different survival rate is associated with each year after diagnosis.
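The year-by-year procedure described above, in which each forecasted year is appended to the history before the next year is forecast, can be sketched as a simple loop. The one-step model below is a stand-in (per-age linear extrapolation of the last two years) used only to keep the example self-contained; in the paper this role is played by the trained network. The function names, window length and rates are assumptions.

```python
def extrapolate_one_year(window):
    """Placeholder one-step model: per-age linear extrapolation of the last two years."""
    previous, latest = window[-2], window[-1]
    return [max(0.0, 2.0 * b - a) for a, b in zip(previous, latest)]

def forecast_up_to(history, one_step, target_year, n_window=5):
    """Forecast year by year, feeding each forecast back in as history.

    history maps year -> list of death rates by age and must cover
    consecutive years.
    """
    extended = dict(history)
    for year in range(max(history) + 1, target_year + 1):
        window = [extended[y] for y in range(year - n_window, year)]
        extended[year] = one_step(window)
    return extended

# Made-up rates for two age groups over 1997-2001.
history = {1997 + k: [0.0100 - 0.0002 * k, 0.0500 - 0.0010 * k] for k in range(5)}
forecasts = forecast_up_to(history, extrapolate_one_year, target_year=2004)
```

Because every forecast becomes an input to the next step, errors can compound over the horizon, which is worth keeping in mind when reading multi-year results.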

Breast cancer figures show that the life expectancy is drastically reduced after the individual is diagnosed with breast cancer, whereas before diagnosis higher life expectancies were seen and these life expectancies are similar to the regardless-of-disease case. The prevalence for this disease is lower in younger ages and higher after the age of 40. Similar results can be seen for the three different stages for breast cancer. Distant breast cancer had lower life expectancies than regional or local. Regional had the second lowest and local had the highest life expectancy. This is shown in Figures 4–6.

Figure 4. Life expectancy profile for breast cancer for 2001 and 2007–2010 for a female diagnosed at age 35 for stage local

Figure 5. Life expectancy profile for breast cancer for 2001 and 2007–2010 for a female diagnosed at age 35 for stage regional

Figure 6. Life expectancy profile for breast cancer for 2001 and 2007–2010 for a female diagnosed at age 35 for stage distant


Figure 7. Life expectancies for regardless-of-disease for 2001 and 2007–2010 for females

Figure 8. Life expectancies for regardless-of-disease for 2001 and 2007–2010 for males


CONCLUSION

By comparing all three models it has been shown that the proposed method performs close to or better than the current methods of mortality forecasting, which are the Lee–Carter model and linear trend. It is hard to compare the results from the disease cases since previously published papers have not calculated life expectancies on the basis of cause, and empirical and theoretical flaws exist (Wilmoth and Jdanov, 2005). However, comparisons for the regardless-of-disease case are made in the following paragraph. The main reason for the inaccuracy of the Lee–Carter model is the assumption that death rates will be linear during the short term rather than the long term. Furthermore, the historical data that are used for forecasting are assumed to behave in a similar fashion. Neural networks can adapt for each sex and for each cause by using a learning algorithm utilizing past data. The Lee–Carter model has been shown to underestimate death rates for future projection since it is dependent on historical data. The Lee–Carter model consistently has higher forecast errors for each disease investigated. This was in the range of 1–15% when compared to historical year 2001. The reason for this high error at younger ages using neural networks and the Lee–Carter model is that there is a consistent fluctuation of death rates observed for different causes of death. Another reason for this high error is the small probability of dying. The probability of dying at younger ages is smaller compared to other ages.

The case for all-cause mortality shows positive trends. Life expectancy is expected to increase in the next several years, as can be seen by the results produced for males and females. The projected life expectancy for the year 2010 was estimated to be 81.06 for females and 76.04 for males at age 0. This is consistent with the Social Security Administration forecasted findings at age 0 for females and males to be 79.95 and 75.4, respectively.

Mortality forecasting via neural networks allows forecasts to be made in the future by forecasting for future years with the use of past data. Life expectancies tabulated for each cause were generally reasonable for each type of cause and stage. This was concluded on the basis of speaking with several oncologists at local hospitals, such as the Thomas Jefferson University Hospital and Warminster Hospital, for the example case of age 35. Also, for certain causes of death, the stage of the diagnosed individual does not determine their life expectancy but rather the prevalence of individuals at a certain stage and their survival rates at that specific stage. The former probability causes certain stages of each cancer actually to have lower life expectancy, most likely due to treatments available to individuals.

Limitations occurred due to the insufficient amount of data available and released to the public by the government. The stage prevalence rates that were obtained were only from the SEER 9 areas, which include San Francisco, Connecticut, Detroit, Hawaii, Iowa, New Mexico, Seattle, Utah, and Atlanta.

In future work, many more variables could be taken into account in order to predict life expectancy more accurately, such as the breakdown of the current data into different races. This might help in concentrating higher life expectancies to that race where the disease is less prevalent and lower life expectancies to the race with higher death rates. Moreover, life expectancy estimates can be more robust if life expectancy rates are tabulated until age 120 instead of 85 because of the increase in data points. Another addition that could be made is to add more historical years after 2001 in order for the networks to adapt for newer data. The addition of newer data for network training will preserve trends for future years. Another future addition that may be made is the addition of more diseases that occur in individuals. Current diseases used were classified as leading causes of death. Forecasting life expectancy can be done by using familial history of diseases and incorporating these risks in terms of an individual’s risk.

Copyright © 2008 John Wiley & Sons, Ltd. J. Forecast. 28, 535–548 (2009) DOI: 10.1002/for

APPENDIX

Table of selected number of targets and hidden layer neurons as well as output layer neurons for certain causes

| Disease classification | Stage classification | Sum-of-square error values (%) for chosen network as well as network that was not chosen (a) | Number of hidden layer neurons | Number of output layer neurons |
|---|---|---|---|---|
| Cancer | | | | |
| All-female | N/A | 342.5 | 5 | 1 |
| All-male | N/A | 595.4 | 2 | 2 |
| Breast | Local | 310.55 | 9 | 1 |
| | Regional | 235.53 | 9 | 1 |
| | Distant | 126.04 | 9 | 1 |
| | Unstaged | 135.73 | 9 | 2 |
| Diabetes | | | | |
| Female | N/A | 343.5 | 5 | 1 |
| Male | N/A | 342.5 | 5 | 1 |
| Heart disease | | | | |
| All-female | N/A | 391.2 | 5 | 1 |
| All-male | N/A | 405.8 | 9 | 2 |
| Hypertensive (female) | N/A | 613.6 | 8 | 1 |
| | | 553.8 | 10 | 3 |
| Hypertensive (male) | N/A | 6259.773 | 4 | 2 |
| Ischemic (female) | N/A | 6159.3 | 2 | 1 |
| | | 416.8 | 3 | 1 |
| Ischemic (male) | | 6194.2 | 10 | 3 |
| Respiratory conditions | | | | |
| All-female | N/A | 630.5 | 1 | 3 |
| All-male | N/A | 1160.2 | 1 | 3 |
| Asthma (female) | N/A | 577.3 | 7 | 1 |
| Asthma (male) | N/A | 715.5 | 2 | 2 |
| Bronchitis (female) | N/A | 1197.8 | 8 | 1 |
| Bronchitis (male) | N/A | 583.9 | 3 | 1 |
| Emphysema (female) | N/A | 2481.7 | 6 | 1 |
| | N/A | 2470.8 | 3 | 1 |
| Emphysema (male) | N/A | 595.6 | 3 | 2 |
| Regardless of disease | | | | |
| Female | N/A | 593.5 | 7 | 1 |
| Male | N/A | 581.3 | 7 | 1 |

(a) Italicized entries indicate selected network had a low error but was not chosen because of inconsistent errors.
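The hidden-layer and output-layer sizes in the table can be read as the shape of a single-hidden-layer feed-forward network, with the sum-of-square error measuring its fit to the targets. The following is a minimal sketch of that structure only. The tanh activation, the two scaled inputs, the random initial weights, the sample values and all function names are assumptions made for illustration; the training procedure used in this paper is not reproduced here.

```python
import math
import random

def init_network(n_inputs, n_hidden, n_outputs, seed=0):
    """Create random weights for one hidden layer and one output layer."""
    rng = random.Random(seed)

    def layer(n_in, n_out):
        # One row per neuron: n_in weights followed by a bias term.
        return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_out)]

    return {"hidden": layer(n_inputs, n_hidden),
            "output": layer(n_hidden, n_outputs)}

def forward(network, inputs):
    """Forward pass: tanh hidden units (assumed), linear output units."""
    hidden = [math.tanh(sum(w * x for w, x in zip(neuron[:-1], inputs)) + neuron[-1])
              for neuron in network["hidden"]]
    return [sum(w * h for w, h in zip(neuron[:-1], hidden)) + neuron[-1]
            for neuron in network["output"]]

def sum_of_squares_error(predictions, targets):
    """Sum of squared differences over all samples and all output neurons."""
    return sum((p - t) ** 2
               for pred, targ in zip(predictions, targets)
               for p, t in zip(pred, targ))

# Hypothetical samples: (scaled year, scaled age) -> scaled mortality rate.
samples = [([0.0, 0.35], [0.10]), ([0.5, 0.35], [0.09]), ([1.0, 0.35], [0.08])]

# Five hidden neurons and one output neuron, as in the "All-female" cancer row.
network = init_network(n_inputs=2, n_hidden=5, n_outputs=1)
predictions = [forward(network, x) for x, _ in samples]
targets = [t for _, t in samples]
sse = sum_of_squares_error(predictions, targets)
print(f"hidden neurons: {len(network['hidden'])}, "
      f"output neurons: {len(network['output'])}, SSE: {sse:.4f}")
```

Because the weights here are untrained, the printed error only demonstrates the calculation. Comparing this quantity across candidate hidden-layer sizes after training is one way a selection such as the one tabulated above could be made.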

REFERENCES

Andreev KF, Vaupel JW. 2006. Forecasts of cohort mortality after age 50. MPIDR working paper WP-2006-012. Max Planck Institute for Demographic Research, Rostock, Germany.

Bell FC, Miller ML. 2005. Life Tables for the United States Social Security Area 1900–2100. Social Security Administration: Woodlawn, MD.

Bowers NL, Gerber HU, Hickman JC, Jones DA, Nesbitt C. 1986. Actuarial Mathematics (2nd edn). Society of Actuaries: Schaumburg, IL; 45–73.

Centers for Disease Control. 2003. Summary health statistics for US adults: national health interview survey. No. 225, US Department of Health and Human Services: Atlanta, GA.

Daykin CD. 1994. Life Insurance. Practical Risk Theory for Actuaries. Chapman & Hall: London; 409–451.

Girosi F, King G. 2005. A reassessment of the Lee–Carter mortality forecasting method. Draft paper. Harvard University: Cambridge, MA. Available: http://gking.harvard.edu/projects/warpubh.shtml.

Gupta AK, Varga T. 2002. An Introduction to Actuarial Mathematics. Kluwer Academic: Dordrecht; 80–146.

Higgins T. 2003. Mathematics models of mortality. In Workshop on Mortality Modeling and Forecasting, Australian National University.

Hu G, Root M. 2000. System and method for predicting disease onset. Biosignia Inc. US Patent 6110109.

Lee R, Miller T. 2000. Assessing the performance of the Lee–Carter approach to modeling and forecasting mortality. Paper presented at the 2000 Annual Meeting of the Population Association of America. Population Association of America: Los Angeles, CA.

Melnikov A. 2004. Risk Analysis in Finance and Insurance. Chapman & Hall/CRC Press: Boca Raton, FL; 138–239.

Murray C, Lopez A. 1997. Alternative projections of mortality and disability by cause 1990–2020: global burden of disease study. Lancet 349: 1498–1504.

National Center for Health Statistics. 2006. United States Mortality Data. NCHS: Atlanta, GA.

Renshaw AE, Haberman S. 2003. Lee–Carter mortality forecasting with age-specific enhancement. Insurance: Mathematics and Economics 33: 255–272.

SEER Cancer Statistics. 2006. National Cancer Institute. Available: http://canques.seer.cancer.gov [10 September 2008].

Thom T, Haase N, Rosamond W, Howard VJ, Rumsfeld J, Manolio T, Zheng Z-J, Flegal K, O’Donnell C, Kittner
S, Lloyd-Jones D, Goff Jr. DC, Hong Y. 2006. Heart Disease and Stroke Statistics—2006 Update: A Report from the American Heart Association Statistics Committee and Stroke Statistics Subcommittee. Circulation 113: e85–e151 (http://circ.ahajournals.org/).

Wilmoth JR. 2003. Overview and discussion of the social security mortality projections. Technical Panel on Assumptions and Methods, Social Security Advisory Board, Washington, DC.

Wilmoth JR, Jdanov AD, Glei DA. 2005. Methods Protocol for the HMD Version 4. University of California: Berkeley, CA. Available: www.mortality.org.

Wong-Fupuy C, Haberman S. 2004. Projecting mortality trends: recent developments in the United Kingdom and the United States. North American Actuarial Journal 8: 56–83.

Authors’ biographies:

Paras Shah graduated from Drexel University in 2006 with a dual degree: a Master’s in Biomedical Engineering and a Bachelor’s in Electrical Engineering. His research areas were noise cancellation, fNIR imaging, and mortality forecasting. He is currently working as a Patent Examiner with the United States Patent and Trademark Office.

Allon Guez is a Professor at Drexel University in the Electrical & Computer Engineering Department. His degrees include a Ph.D. in Electrical Engineering from the University of Florida and an MBA in Finance from Drexel University. His areas of interest are in understanding and applying the principles of intelligent decision making, adaptation, optimization, and control demonstrated in biological, social and anthropomorphic systems in automation, robotics, business and other areas.

Authors’ addresses:

Paras Shah, 115 Charter Court, Trevose, PA 19053, USA.

Allon Guez, 3141 Chestnut Street, Bossone Room 101, Philadelphia, PA 19104, USA.