Hindawi Publishing Corporation
ISRN Biomathematics
Volume 2013, Article ID 954912, 19 pages
http://dx.doi.org/10.1155/2013/954912

Research Article
New Cancer Stochastic Models Involving Both Hereditary and Nonhereditary Cancer Cases: A New Approach

Wai-Yuan Tan (1) and Hong Zhou (2)

(1) Department of Mathematical Sciences, The University of Memphis, Memphis, TN 38152, USA
(2) Department of Mathematics and Statistics, Arkansas State University, State University, AR 72467, USA

Correspondence should be addressed to Wai-Yuan Tan; waitan@memphis.edu

Received 24 August 2012; Accepted 10 October 2012

Academic Editors: T. LaFramboise, K. M. Page, I. Rogozin, and J. M. Starobin

Copyright © 2013 W.-Y. Tan and H. Zhou. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

To incorporate biologically observed epidemics into multistage models of carcinogenesis, in this paper we have developed new stochastic models for human cancers. We have further incorporated genetic segregation of cancer genes into these models to derive generalized mixture models for cancer incidence. Based on these models we have developed a generalized Bayesian approach to estimate the parameters and to predict cancer incidence via Gibbs sampling procedures. We have applied these models to fit and analyze the SEER data of human eye cancers from NCI/NIH. Our results indicate that the models not only provide a logical avenue to incorporate biological information but also fit the data much better than other models. These models would not only provide more insights into human cancers but also provide useful guidance for their prevention and control and for the prediction of future cancer cases.

1. Introduction

It is universally recognized that each cancer tumor develops through stochastic proliferation and differentiation from a single stem cell which has sustained a series of irreversible genetic and/or epigenetic changes (Little [1]; Tan [2, 3]; Tan et al. [4, 5]; Weinberg [6]; Zheng [7]). That is, carcinogenesis is a stochastic multistage process with intermediate cells subject to stochastic proliferation and differentiation. Furthermore, the number of stages and the number of pathways of the carcinogenesis process are significantly influenced by the environmental factors to which the individuals are exposed (Tan et al. [4, 5]; Weinberg [6]).

Another important observation in human carcinogenesis is that most human cancers cluster within families. Further, many cancer incidence data (such as the SEER data of NCI/NIH, USA) have documented that some cancers develop during pregnancy, so that newborn babies have cancer at birth. These are referred to as pediatric cancers. Well-known examples of pediatric cancers include retinoblastoma (a pediatric eye cancer), hepatoblastoma (a pediatric liver cancer), Wilms' tumor (a pediatric kidney cancer), and medulloblastoma (a pediatric brain tumor). Epidemiological and clinical studies in oncology have also revealed that inherited cases are very common in many adult human cancers, including lung cancer, colon cancer [8], uveal melanoma (adult eye cancer, [9]), and adult liver cancer (HCC, [10]).

Given the above results from cancer biology and human cancer epidemiology, the objective of this paper is to illustrate how to develop stochastic models of carcinogenesis incorporating these biological and epidemiological observations. Based on these models and cancer incidence data, we then develop efficient statistical procedures to estimate unknown parameters in the model, to validate the model, and to predict cancer incidence.

In Section 2, we illustrate how to incorporate segregation of cancer genes in multistage stochastic models of carcinogenesis to account for inherited cancer cases. In Section 3, we develop stochastic equations for the state variables of the model described in Section 2. By using these stochastic equations we derive probability distributions of the state variables (i.e., the number of intermediate cancer cells) and the probability distribution of time to detectable cancer


tumors. In Section 4, assuming that we have some cancer incidence data such as the SEER data from NCI/NIH, we develop statistical models for these data from these multistage models of carcinogenesis. In Section 5, by combining the models in Sections 2–4, we develop a generalized Bayesian inference and Gibbs sampling procedure to estimate the unknown parameters, to validate the model, and to predict cancer incidence. As an example of application, in Section 6 we develop a multistage model of human eye cancer with inherited cancer cases, as described in Figure 2. We illustrate the model and methods by analyzing the SEER data of human eye cancer from NCI/NIH. Finally, in Section 7, we discuss the usefulness of the model and the methods developed in this paper and point out some future research directions.

Figure 1: Histopathology lesions and genetic pathway of squamous cell carcinoma of non-small cell lung cancer (NSCLC). [Diagram: normal epithelium, hyperplasia/metaplasia, dysplasia, carcinoma in situ, invasive cancer (squamous cell carcinoma, NSCLC). Annotated events: 3p LOH (RASSF1, BLU, Robo1/Dutt1, FHIT, etc.); 9p21 LOH (p16, p14); 17p13 (p53) LOH or p53 mutation (disruption of cell cycle checkpoint, resistance to apoptosis); 8p LOH; telomerase expression (prevention of telomere erosion); K-Ras mutation (activation of growth signals); VEGF expression; COX-2 expression.]

Figure 2: A multistage model of uveal melanoma (adult human eye cancer). [Diagram labels: GNAQ, GNA11, CDK4; BCL2, HDM2, or others; PTEN, BAP1, or Rb1, or DDEF1; BRCA2 or P16; 3p loss; Chr 3 loss; cell cycle progression; cell survival; cancer progression; metastasis gain; tumor.]

2. The Stochastic Multistage Model of Carcinogenesis with Inherited Cancer Cases

The $k$-stage multistage model of carcinogenesis views carcinogenesis as the end point of $k$ ($k \ge 2$) discrete, heritable, and irreversible events (mutations, genetic changes, or epigenetic changes), with intermediate cells subject to stochastic proliferation and differentiation (Little [1]; Tan [2, 3]; Tan et al. [4, 5]; Weinberg [6]). Let $N = J_0$ denote normal stem cells, $T$ the cancer tumors, and $J_i$ the $i$th-stage initiated cells arising from the $(i-1)$th-stage initiated cells ($i = 1, \ldots, k$) by some genetic and/or epigenetic changes. Then the model assumes $N \to J_1 \to J_2 \to \cdots \to J_k \to \text{Tumor}$, with the $J_i$ cells subject to stochastic proliferation (birth) and differentiation (death). Further, it assumes that each stem cell proceeds independently of other cells and that cancer tumors develop from primary $J_k$ cells by clonal expansion (stochastic birth and death), where primary $J_k$ cells are $J_k$ cells which arise directly from $J_{k-1}$ cells; see Yang and Chen [11].

For example, Figure 1 is a multistage pathway for squamous NSCLC (non-small cell lung cancer) as proposed by Osada and Takahashi [12] and Wistuba et al. [13]. Similarly, Figure 2 is the multistage model for uveal melanoma proposed by Landreville et al. [14] and Mensink et al. [15], while Figure 3 is the APC-$\beta$-catenin-Tcf pathway for human colon cancer (Tan et al. [8]; Tan and Yan [16]).

Remark 1. To develop stochastic multistage models of carcinogenesis, it is conveniently assumed in the literature (Little [1]; Tan [2]; Zheng [7]) that the $J_k$ cells grow instantaneously into cancer tumors as soon as they are generated. In this case, the number of tumors is equal to the number of $J_k$ cells, and one may identify $J_k$ cells as tumors. It follows that the number of tumors is a Markov process and that the $J_k$ cells are transient cells. In these cases, one needs only to deal with $T(t)$ and the $J_j$ cells with $j = 1, \ldots, k-1$. However, as shown by Yang and Chen [11], the number of tumors is much smaller than the total number of $J_k$ cells. Also, in many animal models and in cancer risk assessment of radiation, Klebanov et al. [17], Yakovlev and Tsodikov [18], and Fakir et al. [19] have shown that $T(t)$ is in general not Markov.

Figure 3: The APC-$\beta$-catenin-Tcf-myc pathway for human colon cancer. [Diagram panels: (a) Sporadic (about 70–75%); (b) FAP (familial adenomatous polyps) (about 1%). Labels: N; APC and second copy of APC; Ras; Smad4/DCC and second copy of Smad4/DCC; P53 and second copy of P53; dysplastic ACF; carcinomas.]

To extend the above model to include hereditary cancers, observe that mutants of cancer genes exist in the population and that both germline cells (egg and sperm) and somatic cells may carry mutant alleles of cancer genes [2, 20]. Further, without exception, every human being develops from an embryo in his/her mother's womb (the embryo stage; denote this time by 0), where stem cells of different organs divide and differentiate to develop the respective organs (see Weinberg [6], Chapter 10). If both the egg and the sperm generating the embryo carry mutant alleles of relevant cancer genes, then the individual is a $J_2$-stage person at the embryo stage; if only one of the germline cells (egg or sperm) generating the embryo carries mutant alleles of cancer genes, then at the embryo stage the individual is a $J_1$-stage person. Similarly, the individual is a normal person ($N = J_0$ person) at the embryo stage if neither the egg nor the sperm generating the embryo carries mutant alleles of cancer genes. Refer to a person in the population as a $J_i$ ($i = 0, 1, 2$) person if he/she is a $J_i$-stage person at the embryo stage. Then, with respect to the cancer development in question, people in the population can be classified into 3 types: normal people ($N = J_0$ people), $J_1$ people, and $J_2$ people. Based on this classification, for normal people in the population the stochastic model of carcinogenesis is a $k$-stage multievent model given by $J_0 \to J_1 \to \cdots \to J_k \to \text{Tumor}$; for $J_1$ people it is a $(k-1)$-stage multievent model given by $J_1 \to J_2 \to \cdots \to J_k \to \text{Tumor}$; and for $J_2$ people it is a $(k-2)$-stage multievent model given by $J_2 \to \cdots \to J_k \to \text{Tumor}$.

To account for inherited cancer cases, let $p_1$ be the proportion of $J_1$ people in the population and $p_2$ the proportion of $J_2$ people in the population. In large human populations under steady-state conditions, one may practically assume that the $p_i$ are constants independent of time (Crow and Kimura [21]). Then $p_0 = 1 - p_1 - p_2$ ($0 < p_1 + p_2 < 1$) is the proportion of normal people (i.e., $N = J_0$ people) in the population. Let $n$ be the population size and $n_i$ ($i = 0, 1, 2$) the number of $J_i$ people in the population, so that $\sum_{u=0}^{2} n_u = n$. Assume that $n$ is very large and that marriage between people in the population is random with respect to cancer genes. Then, as shown in Crow and Kimura [21] (see also Tan [22], Chapter 2), the conditional probability distribution of $(n_1, n_2)$ given $n$ is 2-dimensional multinomial with parameters $(n; p_1, p_2)$. That is,

$$(n_1, n_2) \mid n \sim \text{Multinomial}(n; p_1, p_2). \tag{1}$$
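The multinomial model in (1) can be illustrated with a short simulation. The following Python sketch draws the genotype counts $(n_0, n_1, n_2)$ for a population of size $n$ by classifying each person independently. The function name `sample_genotype_counts`, the `seed` argument, and the proportions used in the usage lines are illustrative assumptions, not values taken from the paper.

```python
import random
from collections import Counter


def sample_genotype_counts(n, p1, p2, seed=0):
    """Draw (n0, n1, n2), the numbers of J_0, J_1, and J_2 people.

    Each of the n people is independently a J_1 person with probability p1,
    a J_2 person with probability p2, and a normal (J_0) person otherwise,
    so (n1, n2) given n is multinomial with parameters (n; p1, p2).
    """
    if p1 < 0 or p2 < 0 or not (0 < p1 + p2 < 1):
        raise ValueError("need p1, p2 >= 0 and 0 < p1 + p2 < 1")
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    p0 = 1.0 - p1 - p2
    genotypes = rng.choices([0, 1, 2], weights=[p0, p1, p2], k=n)
    counts = Counter(genotypes)
    return counts[0], counts[1], counts[2]


# Hypothetical proportions, chosen only for illustration.
n0, n1, n2 = sample_genotype_counts(100_000, p1=0.02, p2=0.001)
print(n0, n1, n2)
```

Repeating the draw with different seeds gives counts that fluctuate around $n p_0$, $n p_1$, and $n p_2$, which is the behavior the mixture model for cancer incidence relies on.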

To derive the probability distribution of time to cancer under the above model, observe that during pregnancy the proliferation rates of all stem cells are quite high. Thus, with positive probability, $J_2$ people in the population may acquire additional genetic and/or epigenetic changes during pregnancy to become $J_3$-stage people at birth. Similarly, $J_1$ people may acquire genetic and/or epigenetic changes during pregnancy to become $J_2$ people at birth; albeit with very small probability, normal people at the embryo stage may acquire some genetic and/or epigenetic changes during pregnancy to become $J_1$ people at birth. Because the probability of genetic and epigenetic changes is small, one may practically assume that a $J_i$ ($i = 0, 1, 2$) person at the embryo stage would only give rise to $J_i$ stem cells and possibly $J_{i+1}$ stem cells at birth. This is equivalent to assuming that $J_i$ people at the embryo stage would not generate $J_{i+j}$ ($j > 1$) stem cells at or before birth. This model is represented schematically in Figure 4. Notice that if $k = 2$, one may practically assume that, with probability one, a $J_2$ person at the embryo stage would develop cancer at or before birth ($t_0$). If $k = 3$, then with probability $\alpha$ ($\alpha > 0$) a $J_2$ person at the embryo stage would develop cancer at or before birth.

Figure 4: Embryo genotypes and their frequencies at embryo stage and at birth. [Diagram panels: two-stage model; $k$-stage model ($k \ge 3$). Labels: embryo stage; at birth; $\alpha$; $1 - \alpha$; tumor ($k = 3$); ($k > 3$).]

3. The Stochastic Process of Carcinogenesis with Hereditary Cancer Cases and Mathematical Analysis

Because tumors develop from primary $J_k$ cells, for the above stochastic model the identifiable response variables are $T(t)$ and $\{J_u(t; i),\ i = 0, 1, 2,\ u = i, i+1, \ldots, k-1\}$, where $T(t)$ is the number of cancer tumors at time $t$ and $J_u(t; i)$ is the number of $J_u$ ($u = i, i+1, \ldots, k-1$) cells at time $t$ in people who are $J_i$ people at the embryo stage (see [3, 5, 8, 23] and Remarks 1 and 2). For people who have genotype $J_i$ ($i = 0, 1, 2$) at the embryo stage, the stochastic model of carcinogenesis is then given by the stochastic process $\{\underset{\sim}{X}_i(t), T(t),\ t > 0\}$, where $\underset{\sim}{X}_i(t) = \{J_u(t; i),\ u = i, i+1, \ldots, k-1\}'$. For these processes, in the next subsections we derive stochastic equations for the state variables ($J_u(t; i)$, $i = 0, 1, 2$, $u = i, \ldots, k-1$); we also derive the probability distributions of these state variables and the probabilities of developing cancer tumors. These are the basic approaches for modeling carcinogenesis used by the first author and his associates; see Tan [3], Tan et al. [4, 5, 8, 23], Tan and Zhou [9], Tan and Yan [16], Tan and Chen [24, 25], and Remark 3.

Remark 2. At any time (say $t$), the total number of $J_k$ cells is equal to the total number of $J_k$ cells generated from $J_{k-1}$ cells at time $t$ plus the total number of $J_k$ cells generated by cell division from other $J_k$ cells at time $t$; the former $J_k$ cells are referred to as primary $J_k$ cells, while the latter are not primary $J_k$ cells. Since each tumor develops from a single primary $J_k$ cell through a stochastic birth and death process, each primary $J_k$ cell generates at most one tumor. It follows that at any time the total number of $J_k$ cells is considerably greater than the number of cancer tumors (see also Yang and Chen [11]). Thus, for generating cancer tumors, the only identifiable state variables are the numbers of $J_j$ cells ($j = 0, 1, \ldots, k-1$) and the number of detectable cancer tumors.

Remark 3. To model stochastic multistage carcinogenesis, the standard traditional approach is to assume that the last-stage cells (i.e., the $J_k$ cells in the model $N \to J_1 \to \cdots \to J_k \to \text{Tumor}$) grow instantaneously into a cancer tumor as soon as they are generated, and then to apply standard Markov theory to $T(t)$ and to the state variables $\underset{\sim}{X}(t) = \{J_i(t),\ i = 0, 1, \ldots, k-1\}$. This approach has been described in detail in Tan [2], Little [1], and Zheng [7]; see also Luebeck and Moolgavkar [26] and Durrett et al. [27]. However, in some cases the assumption of instantaneous growth of $J_k$ cells into cancer tumors may not be realistic (Klebanov et al. [17]; Yakovlev and Tsodikov [18]; Fakir et al. [19]); in these cases $T(t)$ is not Markov, so the Markov theory method is not applicable to $T(t)$. To develop analytical results and to resolve many difficult issues, Tan and his associates [4, 5, 24] have proposed an alternative approach through stochastic equations and have followed Yang and Chen [11] in assuming that cancer tumors develop by clonal expansion from primary last-stage cells. Through the probability generating function method, Tan and Chen [24] have shown that if the Markov theory is applicable to $T(t)$, then the stochastic equation method is equivalent to the classical Markov theory method but is more powerful. Also, through the stochastic equation method, we show in the Appendix that the classical approach provides a close approximation to the discrete-time model under the assumption that the primary last-stage cells develop into a detectable tumor in one time unit. This provides a reasonable explanation of why the traditional approach (see [2, 22]) can still work well even though the Markov assumption for $T(t)$ may not hold. In this paper we therefore use the stochastic equation method and assume that cancer tumors develop from primary last-stage cells through clonal expansion.

3.1. Stochastic Equations for the State Variables. Assume now that an individual is a $J_i$ ($i = 0, 1, 2$) person at the embryo stage. Then in this individual cancer develops by a $(k-i)$-stage multievent model given by $J_i \to \cdots \to J_k \to \text{Tumor}$, and the identifiable response variables are given by $\{\underset{\sim}{X}_i(t)', T(t)\}$. To derive stochastic equations for the staging variables in $\underset{\sim}{X}_i(t)$ in this individual, observe that for each $i = 0, 1, 2$, $\underset{\sim}{X}_i(t)'$ is in general a Markov process, although $T(t)$ may not be Markov; see Remark 1, Tan [3], Tan et al. [4, 5], Tan and Zhou [9], and Tan and Yan [16]. It follows that $\underset{\sim}{X}_i(t + \Delta t)'$ derives from $\underset{\sim}{X}_i(t)'$ through stochastic birth-death processes of the $J_u$ ($u = i, i+1, \ldots, k-1$) cells and through stochastic transitions $J_u \to J_{u+1}$, $u = i, i+1, \ldots, k-1$, during $(t, t + \Delta t]$. Let $B_u(t; i)$, $D_u(t; i)$, and $M_u(t; i)$ be the number of births of $J_u$ cells, the number of deaths of $J_u$ cells, and the number of transitions $J_u \to J_{u+1}$ during $(t, t + \Delta t]$, respectively, in people who are $J_i$ people at the embryo stage. Let $M_0(t)$ denote the number of transitions $N \to J_1$ during $(t, t + \Delta t]$. Because the transition $J_u \to J_{u+1}$ does not affect the number of $J_u$ cells but only increases the number of $J_{u+1}$ cells (see Remark 4), by the conservation law we have the following stochastic equations for $J_u(t; i)$, $u = i, \ldots, k-1$, $i = 0, 1, 2$ (see Tan [3], Tan et al. [4, 5, 8], Tan and Zhou [9], and Tan and Yan [16]):

$$
\begin{aligned}
J_i(t + \Delta t; i) &= J_i(t; i) + B_i(t; i) - D_i(t; i), && i = 0, 1, 2, \\
J_u(t + \Delta t; i) &= J_u(t; i) + B_u(t; i) - D_u(t; i) + M_{u-1}(t; i), && i < u \le k-1.
\end{aligned} \tag{2}
$$

Because $B_v(t; i)$, $D_v(t; i)$, and $M_v(t; i)$, $i = 0, 1, 2$, $v = i, \ldots, k-1$, are random variables, the above equations are basically stochastic equations. To derive the probability distributions of these variables, let $b_u(t)$ and $d_u(t)$ denote the birth rate and the death rate at time $t$ of the $J_u$ ($u = 0, 1, \ldots, k-1$) cells, respectively. Let $\beta_u(t)$ ($u = 0, 1, \ldots, k-1$) be the transition rate at time $t$ from $J_u \to J_{u+1}$. Then, as shown in Tan [3], for ($i = 0, 1, 2$, $u = i, \ldots, k-1$) we have, to the order of $o(\Delta t)$,

$$
\{B_u(t; i), D_u(t; i), M_u(t; i)\} \mid J_u(t; i) \sim \text{Multinomial}\{J_u(t; i);\ b_u(t)\Delta t,\ d_u(t)\Delta t,\ \beta_u(t)\Delta t\}. \tag{3}
$$

It follows that, to the order of $o(\Delta t)$,

$$
\begin{aligned}
\{B_u(t; i), D_u(t; i)\} \mid J_u(t; i) &\sim \text{Multinomial}\{J_u(t; i);\ b_u(t)\Delta t,\ d_u(t)\Delta t\}, \\
M_u(t; i) \mid J_u(t; i) &\sim \text{Binomial}\{J_u(t; i);\ \beta_u(t)\Delta t\} \\
&\sim \text{Poisson}\{J_u(t; i)\beta_u(t)\Delta t\} + o(\beta_u(t)\Delta t), \\
&\quad \text{independently of } \{B_u(t; i), D_u(t; i)\},\quad u = 0, 1, \ldots, k-1.
\end{aligned} \tag{4}
$$
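The conservation equations (2) and the multinomial result (3) together describe one small time step of the process. The Python sketch below simulates such a step under the assumption that, within $(t, t + \Delta t]$, each $J_u$ cell independently gives rise to a birth, a death, a transition, or nothing. The function name `step_cell_counts`, its argument layout, and the rates in the usage lines are hypothetical and chosen only for illustration; indices are relative to the embryo-stage genotype, so position 0 holds the $J_i$ cells.

```python
import random

EVENTS = ("birth", "death", "transition", "none")


def step_cell_counts(counts, b, d, beta, dt, rng):
    """Advance the cell counts [J_i, ..., J_{k-1}] over one interval of length dt.

    counts, b, d, beta are lists of equal length holding, per compartment,
    the current cell number and the birth, death, and transition rates.
    Returns (new_counts, transitions), where transitions[u] is the number of
    J_u -> J_{u+1} events; the last entry counts new primary J_k cells.
    """
    new_counts = list(counts)
    transitions = []
    for u, j_u in enumerate(counts):
        probs = [b[u] * dt, d[u] * dt, beta[u] * dt]
        p_none = 1.0 - sum(probs)
        if p_none < 0.0:
            raise ValueError("dt is too large for the given rates")
        # One categorical draw per cell is a multinomial draw over the J_u cells.
        events = rng.choices(EVENTS, weights=probs + [p_none], k=j_u)
        moved = events.count("transition")
        # A transition leaves the J_u count unchanged and adds a J_{u+1} cell.
        new_counts[u] += events.count("birth") - events.count("death")
        if u + 1 < len(counts):
            new_counts[u + 1] += moved
        transitions.append(moved)
    return new_counts, transitions


# Hypothetical rates for a two-compartment chain.
rng = random.Random(42)
state, moved = step_cell_counts([1000, 0], b=[0.2, 0.3], d=[0.1, 0.1],
                                beta=[0.01, 0.005], dt=0.1, rng=rng)
print(state, moved)
```

Iterating this step with a small `dt` approximates the continuous-time process; drawing one event per cell is slow for large populations but keeps the correspondence with (3) explicit.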

From these distribution results, by subtracting from each random transition variable its conditional mean, we obtain the following stochastic equations for the state variables $\underset{\sim}{X}_i(t)$ ($i = 0, 1, \ldots, \min(2, k-1)$):

$$
\begin{aligned}
dJ_i(t; i) &= J_i(t + \Delta t; i) - J_i(t; i) = B_i(t; i) - D_i(t; i) \\
&= J_i(t; i)\gamma_i(t)\Delta t + e_i(t; i)\Delta t, \qquad i = 0, 1, 2, \\
dJ_u(t; i) &= J_u(t + \Delta t; i) - J_u(t; i) = M_{u-1}(t; i) + B_u(t; i) - D_u(t; i) \\
&= \{J_{u-1}(t; i)\beta_{u-1}(t) + J_u(t; i)\gamma_u(t)\}\Delta t + e_u(t; i)\Delta t, \\
&\qquad i = 0, 1, \ldots, \min(2, k-1),\quad i < u \le k-1,
\end{aligned} \tag{5}
$$

where $\gamma_u(t) = b_u(t) - d_u(t)$ for $u = 0, 1, \ldots, k-1$, and where

$$
e_i(t; i)\Delta t = [B_i(t; i) - J_i(t; i)b_i(t)\Delta t] - [D_i(t; i) - J_i(t; i)d_i(t)\Delta t]
$$

for $i = 0, 1, 2$, and

$$
e_u(t; i)\Delta t = [B_u(t; i) - J_u(t; i)b_u(t)\Delta t] - [D_u(t; i) - J_u(t; i)d_u(t)\Delta t] + [M_{u-1}(t; i) - J_{u-1}(t; i)\beta_{u-1}(t)\Delta t]
$$

for $i = 0, 1, \ldots, \min(2, k-1)$, $i < u \le k-1$.

From the above equations, by dividing both sides by $\Delta t$ and letting $\Delta t \to 0$, we obtain

$$
\begin{aligned}
\frac{d}{dt}J_i(t; i) &= J_i(t; i)\gamma_i(t) + e_i(t; i), && i = 0, 1, \ldots, \min(2, k-1), \\
\frac{d}{dt}J_u(t; i) &= J_u(t; i)\gamma_u(t) + J_{u-1}(t; i)\beta_{u-1}(t) + e_u(t; i), && i = 0, 1, \ldots, \min(2, k-1),\ i < u \le k-1.
\end{aligned} \tag{6}
$$

In the above equations, using the distribution results in (4), it can easily be shown that the random noises $e_u(t; i)$, $u = i, \ldots, k-1$, $i = 0, 1, 2$, have expected value zero and are uncorrelated with the staging variables and $T(t)$. The initial conditions at birth ($t_0$) for the above stochastic differential equations are $\{J_u(t_0; i) > 0,\ u = i, i+1;\ J_u(t_0; i) = 0,\ u > i+1\}$.

Given the initial conditions ($J_u(t_0; i) > 0$, $u = i, i+1$) and ($J_u(t_0; i) = 0$, $u > i+1$) at birth ($t_0$), the solution of the equations in (6) is given respectively by

$$
\begin{aligned}
J_i(t; i) &= J_i(t_0; i)\, e^{\int_{t_0}^{t} \gamma_i(x)\,dx} + \eta_i(t; i), \qquad i = 0, 1, \ldots, \min(2, k-1), \\
J_u(t; i) &= J_u(t_0; i)\, e^{\int_{t_0}^{t} \gamma_u(x)\,dx} + \int_{t_0}^{t} J_{u-1}(x; i)\beta_{u-1}(x)\, e^{\int_{x}^{t} \gamma_u(y)\,dy}\,dx + \eta_u(t; i) \\
&= \cdots = J_u(t_0; i)\, e^{\int_{t_0}^{t} \gamma_u(x)\,dx} + \sum_{v=1}^{u-i} J_{u-v}(t_0; i)\,\phi_u^{(v)}(t; i) + \sum_{v=1}^{u+1-i} \eta_u^{(v)}(t; i), \\
&\qquad u = i+1, \ldots, k-1,
\end{aligned} \tag{7}
$$

where $i = 0$ if $k = 2$; $i = 0, 1$ if $k = 3$; and $i = 0, 1, 2$ if $k > 3$; and where

$$
\begin{aligned}
\phi_u^{(1)}(t; i) &= \int_{t_0}^{t} e^{\int_{x}^{t} \gamma_u(y)\,dy + \int_{t_0}^{x} \gamma_{u-1}(y)\,dy}\, \beta_{u-1}(x)\,dx, && u = i, \ldots, k-1, \\
\phi_u^{(v)}(t; i) &= \int_{t_0}^{t} e^{\int_{x}^{t} \gamma_u(y)\,dy}\, \beta_{u-1}(x)\, \phi_{u-1}^{(v-1)}(x; i)\,dx, && v = 2, \ldots, u-i, \\
\eta_u(t; i) = \eta_u^{(1)}(t; i) &= \int_{t_0}^{t} e^{\int_{x}^{t} \gamma_u(y)\,dy}\, e_u(x; i)\,dx, && u = i, \ldots, k-1, \\
\eta_u^{(v)}(t; i) &= \int_{t_0}^{t} e^{\int_{x}^{t} \gamma_u(y)\,dy}\, \beta_{u-1}(x)\, \eta_{u-1}^{(v-1)}(x; i)\,dx, && v = 2, \ldots, u-i.
\end{aligned} \tag{8}
$$
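Because the noise terms have expected value zero, taking expectations in (6) leaves a linear system of ordinary differential equations for the mean cell numbers, $dm_u/dt = \gamma_u(t)m_u + \beta_{u-1}(t)m_{u-1}$. The Python sketch below integrates this system with a plain forward-Euler scheme, which is a simple numerical substitute for evaluating the nested integrals in (7) and (8) and not the method used in the paper. The function name `mean_cell_counts`, the step count, and the rates in the usage lines are assumptions made for illustration; index 0 stands for the $J_i$ compartment.

```python
import math


def mean_cell_counts(m0, gamma, beta, t0, t1, steps=10_000):
    """Forward-Euler integration of dm_u/dt = gamma_u(t) m_u + beta_{u-1}(t) m_{u-1}.

    m0    : initial means [E J_i(t0), ..., E J_{k-1}(t0)]
    gamma : list of callables, gamma[u](t) = b_u(t) - d_u(t)
    beta  : list of callables, beta[u](t) = transition rate J_u -> J_{u+1}
    Returns the list of means at time t1.
    """
    m = list(m0)
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        derivs = []
        for u in range(len(m)):
            # Inflow from the preceding compartment; none for the first one.
            inflow = beta[u - 1](t) * m[u - 1] if u > 0 else 0.0
            derivs.append(gamma[u](t) * m[u] + inflow)
        m = [value + dt * slope for value, slope in zip(m, derivs)]
        t += dt
    return m


# Hypothetical, slowly varying rates for a two-compartment chain.
gamma = [lambda t: 0.05, lambda t: 0.02 + 0.001 * t]
beta = [lambda t: 1e-3, lambda t: 0.0]
print(mean_cell_counts([100.0, 0.0], gamma, beta, t0=0.0, t1=10.0))
```

With constant rates the first compartment should follow $m_i(t_0)e^{\gamma_i (t - t_0)}$, which gives a quick sanity check on the step size.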

If the model is time homogeneous so that 120573119906(119905) =

120573119906 119887119906(119905) = 119887

119906 119889

119906(119905) = 119889

119906 120574

119906(119905) = 119887

119906minus 119889

119906= 120574

119906 119906 = 0 1

119896minus1 and if 120574119894= 120574119906if 119894 = 119906 the above solutions under the initial

conditions (119869119894+119906(1199050 119894) gt 0 119906 = 0 1 119869

119894+119906(1199050 119894) = 0 119906 gt 1)

then reduce respectively to

$$
\begin{aligned}
J_i(t; i) ={}& J_i(t_0; i)\,e^{\gamma_i(t-t_0)} + \eta_i^{(1)}(t; i), \quad i = 0, 1, \ldots, \mathrm{Min}(2, k-1),\\
J_u(t; i) ={}& J_u(t_0; i)\,e^{\gamma_u(t-t_0)} + \beta_{u-1}\int_{t_0}^{t} J_{u-1}(x; i)\,e^{\gamma_u(t-x)}\,dx + \eta_u^{(1)}(t; i)\\
={}& \cdots = J_u(t_0; i)\,e^{\gamma_u(t-t_0)} + \sum_{r=i}^{u-1} J_r(t_0; i)\left(\prod_{v=r}^{u-1}\beta_v\right)\sum_{l=r}^{u} A_{ru}(l)\,e^{\gamma_l(t-t_0)} + \sum_{r=i}^{u}\eta_u^{(u+1-r)}(t; i)\\
={}& \sum_{r=i}^{i+1} J_r(t_0; i)\left(\prod_{v=r}^{u-1}\beta_v\right)\sum_{l=r}^{u} A_{ru}(l)\,e^{\gamma_l(t-t_0)} + \sum_{r=i}^{u}\eta_u^{(u+1-r)}(t; i), \quad i < u \le k-1,
\end{aligned}
\tag{9}
$$

where, for $i \le v \le u$,

$$
A_{iu}(v) =
\begin{cases}
1, & \text{if } i = u,\\[4pt]
\displaystyle\prod_{l=i,\ l \neq v}^{u} (\gamma_v - \gamma_l)^{-1}, & \text{if } i < u.
\end{cases}
\tag{10}
$$

Obviously, $E[\eta_u^{(r)}(t; i)] = 0$ for all $(i = 0, 1, 2;\ u = i, \ldots, k-1;\ r = 1, \ldots, u-i+1)$. It follows that for $(i = 0, 1, \ldots, \mathrm{Min}(k-1, 2))$, the expected values $E[J_{k-1}(t; i)]$ of $J_{k-1}(t; i)$ for homogeneous models with $\gamma_i \neq \gamma_u$ if $i \neq u$ are given by

$$
\begin{aligned}
E[J_{k-1}(t; i)] ={}& E[J_{i+1}(t_0; i)]\left(\prod_{u=i+1}^{k-2}\beta_u\right)\sum_{v=i+1}^{k-1} A_{(i+1)(k-1)}(v)\,e^{\gamma_v(t-t_0)}\\
&+ E[J_i(t_0; i)]\left(\prod_{u=i}^{k-2}\beta_u\right)\sum_{v=i}^{k-1} A_{i(k-1)}(v)\,e^{\gamma_v(t-t_0)},\\
& i = 0, 1, \ldots, \mathrm{Min}(k-1, 2),\ k \ge 2,
\end{aligned}
\tag{11}
$$

where, as a convention, $\sum_{j=i+1}^{i} c_j = 0$ and $\prod_{j=i+1}^{i} d_j = 1$ for all $(c_j, d_j)$.
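The closed form in (11) can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names `a_coef` and `expected_last_stage` and all parameter values are hypothetical, and the coefficients are taken as $A_{iu}(v) = \prod_{l \neq v}(\gamma_v - \gamma_l)^{-1}$, the sign convention under which the closed form satisfies the recursion $\frac{d}{dt}E[J_u] = \gamma_u E[J_u] + \beta_{u-1}E[J_{u-1}]$ implied by (9).

```python
import math

def a_coef(gammas, i, u, v):
    """A_{iu}(v): product over l = i..u, l != v, of 1/(gamma_v - gamma_l).

    Returns 1 when i == u (empty product). Assumes distinct gammas.
    """
    prod = 1.0
    for l in range(i, u + 1):
        if l != v:
            prod /= (gammas[v] - gammas[l])
    return prod

def expected_last_stage(t, t0, i, k, j_i0, j_i1_0, betas, gammas):
    """E[J_{k-1}(t; i)] following (11) for a time homogeneous model.

    j_i0   : E[J_i(t0; i)]
    j_i1_0 : E[J_{i+1}(t0; i)]
    betas  : [beta_0, ..., beta_{k-2}] (mutation rates)
    gammas : [gamma_0, ..., gamma_{k-1}] (net growth rates b_u - d_u)
    """
    total = 0.0
    for r, start in ((i, j_i0), (i + 1, j_i1_0)):
        if r > k - 1 or start == 0.0:
            continue
        # product of beta_u for u = r..k-2; math.prod of an empty range is 1
        rate = math.prod(betas[u] for u in range(r, k - 1))
        total += start * rate * sum(
            a_coef(gammas, r, k - 1, v) * math.exp(gammas[v] * (t - t0))
            for v in range(r, k)
        )
    return total

# Illustrative values only (not estimates from the paper):
# a 2-stage model in normal (J_0) people with gamma_0 = 0.
value = expected_last_stage(10.0, 0.0, 0, 2, 1000.0, 0.0, [1e-3], [0.0, 0.1])
```

For the two-stage case with $\gamma_0 = 0$ this reduces to $E[J_0(t_0)]\,\beta_0\,(e^{\gamma_1(t-t_0)} - 1)/\gamma_1$, which is a convenient sanity check before using larger $k$.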

Remark 4. Because genetic changes and epigenetic changes occur during cell division, to the order of $o(\Delta t)$ the probability is $\beta_u(t)\Delta t$ that one $J_u$ cell at time $t$ would give rise to one $J_u$ cell and one $J_{u+1}$ cell at time $t + \Delta t$ by genetic changes or epigenetic changes. It follows that the transition $J_u \to J_{u+1}$ would not affect the population size of $J_u$ cells but would only increase the size of the $J_{u+1}$ population.


3.2. Transition Probabilities and Probability Distributions of Staging Variables. Let $f(x; n, p)$ denote the probability density function of a binomial random variable $X \sim \mathrm{Binomial}(n, p)$, $h(x; \lambda)$ the probability density function of a Poisson random variable $X \sim \mathrm{Poisson}(\lambda)$, and $g(x, y; n, p_1, p_2)$ the probability density function of a bivariate multinomial random vector $(X, Y) \sim \mathrm{Multinomial}(n; p_1, p_2)$. Using the stochastic equations of the staging variables given by (2) and using the probability distributions of the transition variables $\{B_r(t; i), D_r(t; i), M_r(t; i)\}$ in (4), as in Tan et al. [4, 5], we obtain the following transition probabilities of $\{J_r(t + \Delta t; i),\ r = i, \ldots, k-1\}$ given $\{J_r(t; i),\ r = i, \ldots, k-1\}$ for $(i = 0, 1, \ldots, \mathrm{Min}(k-1, 2))$:

$$
\begin{aligned}
& P\{J_r(t + \Delta t; i) = v_r,\ r = i, i+1, \ldots, k-1 \mid J_r(t; i) = u_r,\ r = i, i+1, \ldots, k-1\}\\
&\quad = P\{J_i(t + \Delta t; i) = v_i \mid J_i(t; i) = u_i\}\\
&\qquad \times \prod_{j=i+1}^{k-1} P\{J_j(t + \Delta t; i) = v_j \mid J_j(t; i) = u_j,\ J_{j-1}(t; i) = u_{j-1}\},
\end{aligned}
\tag{12}
$$

where

$$
\begin{aligned}
P\{J_i(t + \Delta t; i) = v_i \mid J_i(t; i) = u_i\} ={}& \sum_{r=0}^{u_i} f(r; u_i, b_i(t)\Delta t)\\
&\times f\!\left(u_i - v_i + r;\ u_i - r,\ \frac{d_i(t)\Delta t}{1 - b_i(t)\Delta t}\right), \quad i = 0, 1, \ldots, \mathrm{Min}(k-1, 2),\\
P\{J_j(t + \Delta t; i) = v_j \mid J_j(t; i) = u_j,\ J_{j-1}(t; i) = u_{j-1}\} ={}& \sum_{r_1=0}^{u_j}\sum_{r_2=0}^{u_j - r_1} g(r_1, r_2; u_j, b_j(t)\Delta t, d_j(t)\Delta t)\\
&\times h(v_j - u_j - r_1 + r_2;\ u_{j-1}\beta_{j-1}(t)\Delta t), \quad j > i.
\end{aligned}
\tag{13}
$$
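The transition probabilities in (12) and (13) describe one time step in which each stage gains cells by binomial births and Poisson mutation arrivals from the preceding stage and loses cells by binomial deaths. A minimal simulation sketch of one such step follows. It is an illustration under stated assumptions, not the authors' code: the helper names (`binomial`, `poisson`, `step`) and all rate values are hypothetical, rates are held constant over the step, and the bivariate multinomial for (births, deaths) is drawn as a binomial for births followed by a conditional binomial for deaths, which is an equivalent factorization.

```python
import math
import random

def binomial(rng, n, p):
    """Binomial(n, p) draw as a sum of Bernoulli trials (adequate for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)

def poisson(rng, lam):
    """Poisson(lam) draw by Knuth's multiplication method (adequate for small lam)."""
    limit, count, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

def step(state, b, d, beta, dt, rng):
    """Advance (J_i, ..., J_{k-1}) by one interval dt.

    state : current cell counts, one entry per stage
    b, d  : birth and death rates per stage (same length as state)
    beta  : mutation rates from each stage to the next (length len(state) - 1)
    Requires b[idx] * dt < 1.
    """
    new_state = []
    for idx, n in enumerate(state):
        births = binomial(rng, n, b[idx] * dt)
        # deaths among cells that did not divide, with the rescaled probability
        deaths = binomial(rng, n - births, d[idx] * dt / (1.0 - b[idx] * dt))
        # mutation arrivals from the previous stage leave that stage's size unchanged
        arrivals = poisson(rng, state[idx - 1] * beta[idx - 1] * dt) if idx > 0 else 0
        new_state.append(n + births - deaths + arrivals)
    return new_state

rng = random.Random(2013)          # fixed seed so the run is reproducible
cells = [50, 0, 0]                 # illustrative starting counts only
for _ in range(20):
    cells = step(cells, [0.02, 0.03, 0.05], [0.02, 0.02, 0.02], [0.01, 0.01], 0.5, rng)
```

Note that mutation arrivals are computed from the previous stage's count at the start of the step, matching the conditioning on $J_{j-1}(t; i)$ in (13).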

Define the unobservable transition variables $\underset{\sim}{U}_i(t) = \{B_i(t; i),\ (B_j(t; i), D_j(t; i)),\ j = i+1, \ldots, k-1\}'$ $(i = 0, 1, \ldots, \mathrm{Min}(k-1, 2))$. Then we have for the joint probability density function of $\{\underset{\sim}{X}_i(t + \Delta t), \underset{\sim}{U}_i(t)\}$ given $\underset{\sim}{X}_i(t)$:

$$
P\{\underset{\sim}{X}_i(t + \Delta t), \underset{\sim}{U}_i(t) \mid \underset{\sim}{X}_i(t)\} = P\{\underset{\sim}{X}_i(t + \Delta t) \mid \underset{\sim}{U}_i(t), \underset{\sim}{X}_i(t)\} \times P\{\underset{\sim}{U}_i(t) \mid \underset{\sim}{X}_i(t)\},
\tag{14}
$$

where

$$
\begin{aligned}
P\{\underset{\sim}{X}_i(t + \Delta t) \mid \underset{\sim}{U}_i(t), \underset{\sim}{X}_i(t)\} ={}& f\!\left(J_i(t; i) - J_i(t + \Delta t; i) + B_i(t; i);\ J_i(t; i) - B_i(t; i),\ \frac{d_i(t)\Delta t}{1 - b_i(t)\Delta t}\right)\\
&\times \prod_{j=i+1}^{k-1} h\{J_j(t + \Delta t; i) - J_j(t; i) - B_j(t; i) + D_j(t; i);\ J_{j-1}(t; i)\,\beta_{j-1}(t)\Delta t\},
\end{aligned}
\tag{15}
$$

$$
\begin{aligned}
P\{\underset{\sim}{U}_i(t) \mid \underset{\sim}{X}_i(t)\} ={}& f(B_i(t; i);\ J_i(t; i),\ b_i(t)\Delta t)\\
&\times \prod_{j=i+1}^{k-1} g\{B_j(t; i), D_j(t; i);\ J_j(t; i),\ b_j(t)\Delta t,\ d_j(t)\Delta t\}.
\end{aligned}
\tag{16}
$$

Let $\underset{\sim}{e}_i(j)$ be a $(k-i) \times 1$ column vector with 1 in the $j$th position $(1 \le j \le k-i)$ and with 0 in other positions. Let $\underset{\sim}{u} = (u_i, \ldots, u_{k-1})'$ and $\underset{\sim}{v} = (v_i, \ldots, v_{k-1})'$ be $(k-i) \times 1$ column vectors of nonnegative integers (i.e., $u_j$ and $v_j$ are nonnegative integers). Then, by using the probability distribution results in (14)–(16), it can readily be shown that

$$
P\{\underset{\sim}{X}_i(t + \Delta t) = \underset{\sim}{v} \mid \underset{\sim}{X}_i(t) = \underset{\sim}{u}\} =
\begin{cases}
[u_j b_j(t) + (1 - \delta_{ji})\,u_{j-1}\beta_{j-1}(t)]\Delta t + o(\Delta t), & \text{if } \underset{\sim}{v} = \underset{\sim}{u} + \underset{\sim}{e}_i(j),\ j = i, \ldots, k-1,\\[4pt]
u_j d_j(t)\Delta t + o(\Delta t), & \text{if } \underset{\sim}{v} = \underset{\sim}{u} - \underset{\sim}{e}_i(j),\ j = i, \ldots, k-1,\\[4pt]
o(\Delta t), & \text{if } \left|\underset{\sim}{1}'_{n-k}(\underset{\sim}{u} - \underset{\sim}{v})\right| \ge 2.
\end{cases}
\tag{17}
$$

The above results imply that $\underset{\sim}{X}_i(t)$ is a $(k-i)$-dimensional birth-death process with birth rates $\{i\,b_u(t),\ u = i, \ldots, k-1\}$, death rates $\{i\,d_u(t),\ u = i, \ldots, k-1\}$, and cross-transition rates $\{\alpha_{u,u+1}(j, t) = j\beta_u(t),\ u = i, \ldots, k-1;\ \beta_{u,v}(j, t) = 0 \text{ if } v \neq u+1\}$ (see Definition 4.1 in Tan [22], Chapter 4). Using these results, it can be shown that the Kolmogorov forward equation for the probabilities $P\{J_j(t; i) = u_j,\ j = i, \ldots, k-1 \mid J_i(0) = m_i,\ J_j(0) = 0,\ j > i\} = P(u_j,\ j = i, \ldots, k-1; t)$ $(i = 0, 1, \ldots, \mathrm{Min}(k-1, 2))$ in the above model is given by

$$
\begin{aligned}
\frac{d}{dt} P(u_j,\ j = i, \ldots, k-1; t) ={}& P(u_i - 1, u_j,\ j = i+1, \ldots, k-1; t)\,(u_i - 1)\,b_i(t)\\
&+ \sum_{j=i+1}^{k-1} P(u_i, u_{i+1}, \ldots, u_{j-1}, u_j - 1, u_{j+1}, \ldots, u_{k-1}; t)\,(u_j - 1)\,b_j(t)\\
&+ \sum_{j=i}^{k-2} P(u_i, u_{i+1}, \ldots, u_j, u_{j+1} - 1, u_{j+2}, \ldots, u_{k-1}; t)\,u_j\,\beta_j(t)\\
&+ \sum_{j=i}^{k-1} P(u_i, u_{i+1}, \ldots, u_{j-1}, u_j + 1, u_{j+1}, \ldots, u_{k-1}; t)\,(u_j + 1)\,d_j(t)\\
&- P(u_i, u_{i+1}, \ldots, u_{k-1}; t) \left\{\sum_{j=i}^{k-1} u_j\,[b_j(t) + d_j(t)] + \sum_{j=i}^{k-2} u_j\,\beta_j(t)\right\},
\end{aligned}
\tag{18}
$$

for $u_j = 0, 1, \ldots, \infty$; $j = i, \ldots, k-1$.

By using the above set of differential equations, one can readily compute the probabilities $P\{J_j(t) = u_j,\ j = i, \ldots, k-1 \mid J_i(0) = m_i\} = P(u_j,\ j = i, \ldots, k-1; t)$ numerically.

3.3. The Probability Distributions of the Number of Detectable Tumors and Times to Tumors. As shown by Yang and Chen [11], malignant cancer tumors arise from primary $J_k$ cells by clonal expansion, where primary $J_k$ cells are $J_k$ cells generated directly by $J_{k-1}$ cells ($J_k$ cells derived by stochastic birth of other $J_k$ cells are not primary $J_k$ cells). That is, cancer tumors develop from primary $J_k$ cells through stochastic birth-death processes.

To derive the probability distribution for $T(t)$ in $J_i$ people in the population, let $P_T(s, t)$ denote the probability that a primary cancer cell at time $s$ develops into a detectable cancer tumor by time $t$. (An explicit formula for $P_T(s, t)$ has been given in Tan [22], Chapter 8, and in Tan and Chen [24].) Then, as shown in Tan ([3, 22], Chapter 8), the conditional probability distribution of $T(t)$ given $\{J_{k-1}(s; i),\ s \le t\}$ in $J_i$ people is Poisson with mean $\omega(t; i)$, where $\omega(t; i) = \int_{t_0}^{t} J_{k-1}(s; i)\,\beta_{k-1}(s)\,P_T(s, t)\,ds$. That is,

$$
T(t) \mid \{J_{k-1}(s; i),\ s \le t\} \sim \mathrm{Poisson}(\omega(t; i)).
\tag{19}
$$

Let $Q_i(j)$ be the probability that cancer tumors develop during $(t_{j-1}, t_j]$ in $J_i$ people in the population. For time homogeneous models with small $\beta_{k-1}$, $Q_i(j)$ is then given by

$$
Q_i(j) = E\{e^{-\omega(t_{j-1}; i)} - e^{-\omega(t_j; i)}\} = e^{-\beta_{k-1} H_i(t_{j-1})} - e^{-\beta_{k-1} H_i(t_j)} + o(\beta_{k-1}),
\tag{20}
$$

where $H_i(t) = \int_{t_0}^{t} E[J_{k-1}(x; i)]\,P_T(x, t)\,dx$.

To derive $Q_i(j)$, denote by

$$
\begin{aligned}
\theta_{i(k-1)} &= E[J_i(t_0; i-1)]\,\beta_i \prod_{u=i+1}^{k-1}\left(\frac{\beta_u}{\gamma_u}\right), \quad i = 1, \ldots, \mathrm{Min}(3, k-1),\\
\lambda_{u(k-1)} &= E[J_u(t_0; u)]\,\beta_u \prod_{v=u+1}^{k-1}\left(\frac{\beta_v}{\gamma_v}\right), \quad u = 0, 1, \ldots, \mathrm{Min}(2, k-1),
\end{aligned}
\tag{21}
$$

and define the functions

$$
\psi_{i(k-1)}(t) = \left(\prod_{u=i+1}^{k-1}\gamma_u\right)\sum_{v=i}^{k-1} A_{i(k-1)}(v)\int_{t_0}^{t} e^{\gamma_v(x-t_0)}\,P_T(x, t)\,dx, \quad i = 0, 1, \ldots, \mathrm{Min}(2, k-1).
\tag{22}
$$

Applying the results for $E[J_{k-1}(t; i)]$ given in (11) for time homogeneous models with $\gamma_i \neq \gamma_j$ if $i \neq j$, we obtain the $Q_i(j)$'s as follows.

(1) If $k = 2$, then $k - 1 = 1$. Hence $Q_2(0) = 1$, $Q_i(0) = Q_2(j) = 0$ for $(i = 0, 1;\ j > 0)$, and for $j > 0$,

$$
Q_0(j) = e^{-\theta_{11}\psi_{11}(t_{j-1}) - \lambda_{01}\psi_{01}(t_{j-1})} - e^{-\theta_{11}\psi_{11}(t_j) - \lambda_{01}\psi_{01}(t_j)} + o(\beta_1),
\tag{23}
$$

$$
Q_1(j) = (1 - \alpha_1)\left\{e^{-\lambda_{11}\psi_{11}(t_{j-1})} - e^{-\lambda_{11}\psi_{11}(t_j)}\right\} + o(\beta_1).
\tag{24}
$$


(2) If $k \ge 3$, then we have $Q_i(0) = 0$ for $i = 0, 1$ and $Q_2(0) = \delta_{2k}\,\alpha$, and for $j > 0$,

$$
Q_0(j) = e^{-\theta_{1(k-1)}\psi_{1(k-1)}(t_{j-1}) - \lambda_{0(k-1)}\psi_{0(k-1)}(t_{j-1})} - e^{-\theta_{1(k-1)}\psi_{1(k-1)}(t_j) - \lambda_{0(k-1)}\psi_{0(k-1)}(t_j)} + o(\beta_{k-1}),
\tag{25}
$$

$$
Q_1(j) = e^{-\theta_{2(k-1)}\psi_{2(k-1)}(t_{j-1}) - \lambda_{1(k-1)}\psi_{1(k-1)}(t_{j-1})} - e^{-\theta_{2(k-1)}\psi_{2(k-1)}(t_j) - \lambda_{1(k-1)}\psi_{1(k-1)}(t_j)} + o(\beta_{k-1}),
\tag{26}
$$

$$
\begin{aligned}
Q_2(j) ={}& \delta_{2k}(1 - \alpha)\left\{e^{-\lambda_{22}\psi_{22}(t_{j-1})} - e^{-\lambda_{22}\psi_{22}(t_j)}\right\}\\
&+ (1 - \delta_{2k})\left\{e^{-\theta_{3(k-1)}\psi_{3(k-1)}(t_{j-1}) - \lambda_{2(k-1)}\psi_{2(k-1)}(t_{j-1})} - e^{-\theta_{3(k-1)}\psi_{3(k-1)}(t_j) - \lambda_{2(k-1)}\psi_{2(k-1)}(t_j)}\right\}\\
&+ o(\beta_{k-1}),
\end{aligned}
\tag{27}
$$

where $\delta_{2k} = 1$ if $k = 2$ and $\delta_{2k} = 0$ if $k \neq 2$.

Notice that if $\gamma_0 = 0$, then $\psi_{0(k-1)}(t)$ reduces to

$$
\begin{aligned}
\psi_{0(k-1)}(t) &= \left(\prod_{u=1}^{k-1}\gamma_u\right)\sum_{v=0}^{k-1} A_{0(k-1)}(v)\int_{t_0}^{t} e^{\gamma_v(x-t_0)}\,P_T(x, t)\,dx\\
&= \left(\prod_{u=1}^{k-1}\gamma_u\right)\sum_{v=1}^{k-1} \frac{1}{\gamma_v}\,A_{1(k-1)}(v)\int_{t_0}^{t}\left[e^{\gamma_v(x-t_0)} - 1\right]P_T(x, t)\,dx.
\end{aligned}
\tag{28}
$$

Notice also that if $P_T(s, t) \approx 1$ for $t > s$ and if $\gamma_0 = 0$, then the above $\psi_{ij}(t)$'s reduce, respectively, to

$$
\begin{aligned}
\psi_{ii}(t) &= \frac{1}{\gamma_i}\left\{e^{\gamma_i(t-t_0)} - 1\right\}, \quad i = 1, \ldots, k-1,\\
\psi_{0(k-1)}(t) &= \left(\prod_{u=1}^{k-1}\gamma_u\right)\sum_{v=1}^{k-1} A_{1(k-1)}(v)\,\frac{1}{\gamma_v^2}\left\{e^{\gamma_v(t-t_0)} - 1 - (t - t_0)\gamma_v\right\},\\
\psi_{i(k-1)}(t) &= \left(\prod_{u=i+1}^{k-1}\gamma_u\right)\sum_{v=i}^{k-1} A_{i(k-1)}(v)\,\frac{1}{\gamma_v}\left\{e^{\gamma_v(t-t_0)} - 1\right\}, \quad i = 1, \ldots, k-1.
\end{aligned}
\tag{29}
$$
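For the two-stage model the formulas in (23) and (29) are short enough to evaluate directly. The sketch below is a hedged illustration only: it assumes $P_T(s, t) \approx 1$ and $\gamma_0 = 0$, drops the $o(\beta_1)$ remainder, and uses hypothetical function names (`psi_11`, `psi_01`, `q0_two_stage`) and made-up parameter values rather than estimates from the paper.

```python
import math

def psi_11(t, t0, gamma1):
    """psi_11(t) from (29): (exp(gamma_1 (t - t0)) - 1) / gamma_1."""
    return (math.exp(gamma1 * (t - t0)) - 1.0) / gamma1

def psi_01(t, t0, gamma1):
    """psi_01(t) from (29) with k = 2, where A_11(1) = 1:
    (exp(gamma_1 (t - t0)) - 1 - gamma_1 (t - t0)) / gamma_1.
    """
    tau = t - t0
    return (math.exp(gamma1 * tau) - 1.0 - gamma1 * tau) / gamma1

def q0_two_stage(t_prev, t_curr, t0, theta11, lam01, gamma1):
    """Q_0(j) from (23) for the interval (t_prev, t_curr], remainder term omitted."""
    def no_tumor(t):
        # probability that no tumor has developed by time t
        return math.exp(-theta11 * psi_11(t, t0, gamma1)
                        - lam01 * psi_01(t, t0, gamma1))
    return no_tumor(t_prev) - no_tumor(t_curr)

# Illustrative parameter values only.
ages = [0.0, 1.0, 2.0, 5.0, 10.0]
probs = [q0_two_stage(ages[j - 1], ages[j], 0.0, 1e-4, 2e-5, 0.1)
         for j in range(1, len(ages))]
```

Because each $Q_0(j)$ is a difference of the same function at consecutive time points, the interval probabilities telescope: their sum over $j$ equals one minus the probability of no tumor by the last time point.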

4. Probability Distribution of Observed Cancer Incidence Incorporating Hereditary Cancer Cases

For estimating unknown parameters and validating the model, one would need real data generated from the model. For studies of carcinogenesis, such data are usually given by cancer incidence. For example, in the SEER data of NCI/NIH of USA, the data are given by $\{(y_0, n_0), (y_j, n_j),\ j = 1, \ldots, t_N\}$, where $y_0$ is the number of cancer cases at birth and $n_0$ the total number of births, and where for $j \ge 1$, $y_j$ is the number of cancer cases developed during the $j$th age group of a one-year period (or 5-year periods) and $n_j$ is the number of noncancer people who are at risk for cancer and from whom $y_j$ have developed cancer during the $j$th age group. Given in Table 1 are the SEER data of uveal melanoma (adult eye cancer) during the period 1973–2007. In Table 1, notice that there are some cancer cases at birth, implying some inherited cancer cases. In this section, we will develop a statistical model for these types of data sets from the stochastic multistage model with hereditary cancers as given in Section 2. As in previous sections, let $n_{ij}$ be the number of individuals who have genotype $J_i$ $(i = 0, 1, 2)$ at the embryo stage among the $n_j$ people at risk for the cancer in question. Then, as shown above, $(n_{1j}, n_{2j}) \mid n_j \sim \mathrm{Multinomial}\{n_j; p_1, p_2\}$. It follows that $n_{ij} \mid n_j \sim \mathrm{Binomial}\{n_j, p_i\}$, $i = 0, 1, 2$. In what follows, we let $Y_j$ denote the random variable for $y_j$ unless otherwise stated.

4.1. The Probability Distribution of $Y_0$. As shown in Figure 4, $J_i$ $(i = 0, 1, 2)$ people would only generate $J_i$ stage cells and $J_{i+1}$ stage cells at birth. Thus, for cancers to develop at or before birth, the number of stages for the stochastic model of carcinogenesis must be 3 or less. It follows that if $y_0 > 0$, the appropriate model of carcinogenesis must be either a 2-stage model or a 3-stage model. Since $n_{20} \mid n_0 \sim \mathrm{Binomial}(n_0, p_2)$ and $n_{10} \mid (n_0, n_{20}) \sim \mathrm{Binomial}(n_0, p_1) + o(p_2)$, the probability distribution of $Y_0$ is therefore

$$
Y_0 \sim \mathrm{Poisson}(\chi_k), \quad k = 2, 3,
\tag{30}
$$

where

$$
\chi_k =
\begin{cases}
n_0(p_2 + p_1\alpha), & \text{if } k = 2,\\
n_0 p_2 \alpha, & \text{if } k = 3.
\end{cases}
\tag{31}
$$

The expected number of $Y_0$ given $n_0$ is $E(Y_0 \mid n_0) = n_0(p_2 + p_1\alpha) = \chi_2$ if $k = 2$ and $E(Y_0 \mid n_0) = n_0 p_2 \alpha = \chi_3$ if $k = 3$. Hence, for the 2-stage model (i.e., $k = 2$) or the 3-stage model (i.e., $k = 3$), the maximum likelihood estimate of $\chi_k$ is $\hat{\chi}_k = y_0$ and the deviance $D_0(k)$ from the conditional probability distribution of $Y_0$ given $n_0$ is

$$
D_0(k) = -2\left\{\log h(y_0; \chi_k) - \log h(y_0; \hat{\chi}_k)\right\} = 2\left\{\chi_k - y_0 - y_0 \log\frac{\chi_k}{y_0}\right\}, \quad k = 2, 3.
\tag{32}
$$
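The deviance in (32) is the standard Poisson deviance, and it is easy to confirm that the closed form agrees with the log-likelihood difference. The sketch below is illustrative only: the function names are hypothetical, and the value 36 used for $\chi_k$ is simply the Model-F predicted count at age 0 in Table 1, taken here as a stand-in for a fitted $\chi_k$.

```python
import math

def poisson_deviance(y, mu):
    """2 * (mu - y - y * log(mu / y)), with the y = 0 limit equal to 2 * mu."""
    if y == 0:
        return 2.0 * mu
    return 2.0 * (mu - y - y * math.log(mu / y))

def log_poisson_pmf(y, mu):
    """log h(y; mu) for a Poisson variable with mean mu > 0."""
    return -mu + y * math.log(mu) - math.lgamma(y + 1)

y0 = 34          # observed cases at age 0 in Table 1
chi_k = 36.0     # assumed fitted mean (Model-F predicted count at age 0)

d0_closed_form = poisson_deviance(y0, chi_k)
d0_from_loglik = -2.0 * (log_poisson_pmf(y0, chi_k) - log_poisson_pmf(y0, float(y0)))
```

Both expressions give the same value, and the deviance is zero exactly when the fitted mean equals the observed count.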


Table 1: The SEER incidence data (1973–2007) of uveal melanoma from NCI/NIH (over all races and genders).

Age group | Number of people at risk | Observed incidence | Model-F predicted | Model-1 predicted | Two-stage predicted
0 | 12495777 | 34 | 36 | 36 | 38
1 | 12221582 | 20 | 11 | 11 | 16
2 | 12120990 | 14 | 7 | 8 | 17
3 | 12112995 | 9 | 5 | 6 | 18
4 | 12146174 | 4 | 4 | 5 | 20
5 | 12161336 | 3 | 3 | 4 | 22
6 | 12111854 | 2 | 2 | 4 | 25
7 | 12160452 | 2 | 2 | 4 | 28
8 | 11942586 | 2 | 2 | 4 | 30
9 | 12381299 | 1 | 2 | 5 | 34
10 | 12512703 | 2 | 3 | 6 | 38
11 | 12410338 | 5 | 3 | 6 | 41
12 | 12449244 | 2 | 3 | 6 | 44
13 | 12527781 | 5 | 4 | 7 | 48
14 | 12602883 | 3 | 5 | 7 | 51
15 | 12719598 | 5 | 5 | 8 | 55
16 | 12766107 | 7 | 6 | 9 | 59
17 | 12831400 | 9 | 7 | 9 | 62
18 | 12382047 | 8 | 7 | 9 | 63
19 | 12581638 | 10 | 8 | 10 | 68
20 | 12636509 | 7 | 9 | 11 | 71
21 | 12682601 | 6 | 10 | 10 | 75
22 | 12840510 | 12 | 11 | 11 | 79
23 | 13075528 | 17 | 13 | 13 | 84
24 | 13358635 | 16 | 15 | 14 | 89
25 | 13473849 | 12 | 16 | 16 | 94
26 | 13426340 | 17 | 18 | 18 | 97
27 | 13525264 | 28 | 20 | 20 | 101
28 | 13149674 | 17 | 22 | 21 | 102
29 | 13812811 | 23 | 25 | 25 | 110
30 | 13886874 | 24 | 28 | 27 | 114
31 | 13488332 | 37 | 30 | 29 | 115
32 | 13460286 | 32 | 32 | 32 | 118
33 | 13256067 | 38 | 35 | 35 | 119
34 | 13428827 | 37 | 39 | 38 | 124
35 | 13220037 | 40 | 41 | 41 | 126
36 | 12870265 | 30 | 44 | 44 | 126
37 | 12689592 | 43 | 47 | 47 | 127
38 | 12157014 | 42 | 49 | 49 | 125
39 | 12494081 | 46 | 55 | 54 | 131
40 | 12272125 | 49 | 58 | 58 | 132
41 | 11826573 | 56 | 61 | 60 | 130
42 | 11663153 | 54 | 65 | 64 | 131
43 | 11407082 | 53 | 68 | 68 | 131
44 | 11296848 | 70 | 73 | 72 | 133
45 | 11016369 | 57 | 76 | 76 | 132
46 | 10651593 | 71 | 79 | 79 | 130
47 | 10475708 | 87 | 84 | 83 | 131
48 | 9994684 | 82 | 86 | 85 | 127
49 | 10138908 | 78 | 93 | 93 | 131
50 | 9836359 | 87 | 97 | 96 | 130
51 | 9475641 | 95 | 100 | 99 | 127
52 | 9250985 | 113 | 104 | 104 | 126
53 | 9027382 | 106 | 108 | 108 | 125
54 | 8883737 | 117 | 113 | 113 | 125
55 | 8547883 | 129 | 116 | 116 | 123
56 | 8279648 | 107 | 119 | 119 | 121
57 | 8062368 | 119 | 123 | 123 | 119
58 | 7654610 | 132 | 124 | 124 | 115
59 | 7563706 | 118 | 130 | 129 | 115
60 | 7232719 | 131 | 131 | 130 | 112
61 | 6927332 | 116 | 132 | 132 | 109
62 | 6708273 | 133 | 134 | 134 | 107
63 | 6543931 | 143 | 138 | 137 | 106
64 | 6404652 | 130 | 141 | 141 | 105
65 | 6168486 | 145 | 142 | 142 | 102
66 | 5913479 | 138 | 142 | 142 | 99
67 | 5746766 | 151 | 144 | 144 | 98
68 | 5480517 | 147 | 142 | 142 | 94
69 | 5363912 | 149 | 144 | 144 | 94
70 | 5110728 | 136 | 142 | 142 | 90
71 | 4925076 | 144 | 141 | 141 | 88
72 | 4696825 | 140 | 138 | 138 | 85
73 | 4512136 | 146 | 135 | 135 | 82
74 | 4345300 | 126 | 133 | 133 | 80
75 | 4148801 | 122 | 128 | 129 | 78
76 | 3900900 | 124 | 122 | 122 | 74
77 | 3681587 | 114 | 116 | 116 | 70
78 | 3481918 | 93 | 110 | 110 | 67
79 | 3243631 | 102 | 102 | 102 | 63
80 | 2961234 | 79 | 92 | 93 | 58
81 | 2724984 | 93 | 83 | 84 | 54
82 | 2495219 | 82 | 75 | 75 | 50
83 | 2271595 | 92 | 66 | 67 | 46
84 | 2041351 | 64 | 57 | 58 | 42
85 | 10466605 | 304 | 282 | 285 | 216

Note 1: for age 0 and 1–10 years old, the cancer incidences are derived by subtracting the incidence of retinoblastoma from the original SEER data (see Tan and Zhou [9]).
Note 2: the observed uveal melanoma incidence rates per $10^6$ individuals are derived from the SEER eye cancer incidence by subtracting retinoblastoma incidence as given in Tan and Zhou [9].


4.2. The Probability Distribution of $Y_j$ $(j \ge 1)$. To derive the probability distribution of $Y_j$ $(j \ge 1)$ in the $j$th age group, let $Y_{ij}$ $(i = 0, 1, 2)$ be the number of cancer cases generated by people who have genotype $J_i$ at the embryo stage among these $Y_j$ cancer cases. Then $Y_j = \sum_{i=0}^{2} Y_{ij}$, and $Y_{0j}$ is the number of cancer cases generated by the $n_{0j} = n_j - n_{1j} - n_{2j}$ normal people in the population. The conditional probability distribution of $Y_{ij}$ given $n_{ij}$ is

$$
Y_{ij} \mid n_{ij} \sim \mathrm{Poisson}\{n_{ij} Q_i(j)\}, \quad i = 0, 1, 2.
\tag{33}
$$

Notice that if $k = 2$ (a 2-stage model), then all $J_2$ individuals would develop tumor at or before birth. Thus, if $k = 2$, then $Y_{2j} = 0$ for all $j > 0$, so that if $j > 0$, cancer cases develop only from normal people ($N = J_0$ people) and $J_1$ people. On the other hand, if $k > 2$, then with positive probability $Y_{ij} > 0$ for all $(i = 0, 1, 2;\ j = 0, 1, \ldots, n_T)$, where $n_T$ is the last time point in the data. Let $\delta_{2k} = 1$ if $k = 2$ and $\delta_{2k} = 0$ if $k \neq 2$. Then $Y_j \mid (n_{ij},\ i = 0, 1, 2) \sim \mathrm{Poisson}(Q_T(j))$, where $Q_T(j) = \sum_{i=0}^{1} n_{ij} Q_i(j) + (1 - \delta_{2k})\,n_{2j} Q_2(j)$. Since $(n_{0j}, n_{1j}) \mid n_j \sim \mathrm{Multinomial}(n_j; p_0, p_1)$, we have for the conditional probability density function $P(y_j \mid n_j)$ of $Y_j$ given $n_j$:

$$
P(y_j \mid n_j) = \sum_{n_{0j}=0}^{n_j}\ \sum_{n_{1j}=0}^{n_j - n_{0j}} g(n_{0j}, n_{1j}; n_j, p_0, p_1)\,h\{y_j; Q_T(j)\},
\tag{34}
$$

where $g(n_{0j}, n_{1j}; n_j, p_0, p_1)$ is the probability density function of $(n_{0j}, n_{1j}) \mid n_j \sim \mathrm{Multinomial}(n_j; p_0, p_1)$ and $h\{y_j; Q_T(j)\}$ the probability density function of $Y_j \mid (n_{ij},\ i = 0, 1, 2) \sim \mathrm{Poisson}\{Q_T(j)\}$.

The probability density function $P(y_j \mid n_j)$ given by (34) is a mixture of Poisson probability density functions with mixing probability density function given by the multinomial probability distribution of $\{n_{0j}, n_{1j}\}$ given $n_j$. This mixing probability density function represents individuals with different genotypes at the embryo stage in the population.
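The mixture in (34) can be evaluated by brute force when $n_j$ is small, which helps in understanding its structure. The following is a toy sketch under stated assumptions: the function names are hypothetical, the values of $n_j$, $p_i$, and $Q_i(j)$ are invented and far from realistic (the $n_j$ in Table 1 are of order $10^7$, where this double sum would be impractical), and it is not the estimation procedure used in the paper.

```python
import math

def poisson_pmf(y, mean):
    """h(y; mean); a mean of 0 puts all mass at y = 0."""
    if mean <= 0.0:
        return 1.0 if y == 0 else 0.0
    return math.exp(-mean + y * math.log(mean) - math.lgamma(y + 1))

def trinomial_pmf(n0, n1, n, p0, p1):
    """g(n0, n1; n, p0, p1) with the third class as the remainder."""
    n2 = n - n0 - n1
    p2 = 1.0 - p0 - p1
    coef = math.comb(n, n0) * math.comb(n - n0, n1)
    return coef * p0 ** n0 * p1 ** n1 * p2 ** n2

def mixture_pmf(y, n, p0, p1, q, k):
    """P(y_j | n_j) as in (34): a multinomial mixture of Poisson densities.

    q = [Q_0(j), Q_1(j), Q_2(j)]; the J_2 term is dropped when k == 2.
    """
    total = 0.0
    for n0 in range(n + 1):
        for n1 in range(n - n0 + 1):
            n2 = n - n0 - n1
            mean = n0 * q[0] + n1 * q[1] + (0.0 if k == 2 else n2 * q[2])
            total += trinomial_pmf(n0, n1, n, p0, p1) * poisson_pmf(y, mean)
    return total

# Toy inputs chosen only so the sums are small enough to inspect.
n_j, p0, p1, q = 6, 0.7, 0.2, [0.05, 0.3, 0.6]
pmf = [mixture_pmf(y, n_j, p0, p1, q, 3) for y in range(61)]
mixture_mean = sum(y * p for y, p in enumerate(pmf))
```

The mean of the mixture equals $n_j \sum_i p_i Q_i(j)$, which gives a quick check on the implementation.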

Let $\Theta$ be the set of all unknown parameters (i.e., the parameters $(p_1, p_2, \alpha)$ and the birth rates, the death rates, and the mutation rates of $J_j$ cells). Based on data $(y_j,\ j = 0, 1, \ldots, n_T)$, the likelihood function of $\Theta$ is

$$
L\{\Theta \mid y_j,\ j = 0, 1, \ldots, n_T\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} P(y_j \mid n_j).
\tag{35}
$$

Notice that because the mutation rates are very small, one may practically assume $\beta_i(t) = \beta_i$ for $i = 0, 1, \ldots, k-1$. Also, because the stage-limiting genes are basically tumor suppressor genes which act recessively (see Tan [3], Weinberg [6], and Tan et al. [5, 8, 23]), one may practically assume $\{b_i(t) = b_i,\ d_i(t) = d_i,\ \gamma_i(t) = b_i - d_i = \gamma_i,\ i = 0, 1, \ldots, k-1\}$ (see Tan et al. [4, 5, 8]).

4.3. The Joint Probability Distribution of Augmented Variables and Cancer Incidence. To apply the mixture distribution of $Y_j$ in (34) to make inference about the unknown parameters, one needs to expand the model to include the unobservable augmented variables $\{n_{0j}, n_{1j}, y_{0j}, y_{1j},\ j = 1, \ldots, t_N\}$ and derive the joint probability distribution of these variables. For these purposes, observe that for $(j = 1, \ldots, t_N)$ and for $k > 2$, the conditional probability distribution $P\{y_{ij},\ i = 0, 1 \mid y_j,\ n_{ij},\ i = 0, 1,\ n_j\}$ of $(Y_{ij},\ i = 0, 1)$ given $(Y_j = y_j,\ n_{ij},\ i = 0, 1,\ n_j)$ is

$$
(y_{ij},\ i = 0, 1) \mid (y_j,\ n_{ij},\ i = 0, 1,\ n_j) \sim \mathrm{Multinomial}\left\{y_j;\ \frac{n_{ij} Q_i(j)}{Q_T(j)},\ i = 0, 1\right\}.
\tag{36}
$$

Since the conditional probability distribution of $Y_j$ given $(n_{ij},\ i = 0, 1, 2)$ for $k > 2$ is Poisson with mean $Q_T(j)$, we have for the joint conditional probability density function $P\{y_{ij},\ i = 0, 1,\ y_j \mid n_{ij},\ i = 0, 1,\ n_j\}$ of $(Y_{ij},\ i = 0, 1,\ Y_j)$ given $(n_{ij},\ i = 0, 1,\ n_j)$:

$$
\begin{aligned}
P\{y_{ij},\ i = 0, 1,\ y_j \mid n_{ij},\ i = 0, 1,\ n_j\} ={}& P\{y_{ij},\ i = 0, 1 \mid y_j,\ n_{ij},\ i = 0, 1, 2\} \times P\{y_j \mid n_{ij},\ i = 0, 1, 2\}\\
={}& \frac{1}{y_j!}\,e^{-Q_T(j)}\,\{Q_T(j)\}^{y_j} \times g\left\{y_{0j}, y_{1j};\ y_j,\ \frac{n_{ij} Q_i(j)}{Q_T(j)},\ i = 0, 1\right\}\\
={}& \prod_{i=0}^{2} h\{y_{ij};\ n_{ij} Q_i(j)\} \quad (k > 2),
\end{aligned}
\tag{37}
$$

where $y_{2j} = y_j - \sum_{u=0}^{1} y_{uj}$ and $n_{2j} = n_j - \sum_{u=0}^{1} n_{uj}$.
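The multinomial in (36) is the kind of conditional distribution one would draw from when imputing the unobserved genotype-specific counts in a data-augmentation or Gibbs-type scheme. A minimal sketch of one such draw follows. It is an assumption-laden illustration, not the authors' sampler: the function name `allocate_cases` and the numerical inputs are hypothetical, and the multinomial is drawn as a chain of conditional binomials using Bernoulli trials, which is only practical for modest $y_j$.

```python
import random

def allocate_cases(y_j, n_ij, q_ij, rng):
    """Split y_j observed cases among genotypes with weights n_ij * Q_i(j).

    Returns [y_0j, y_1j, ..., y_last], where the last entry is the remainder,
    so the counts always add up to y_j.
    """
    weights = [n * q for n, q in zip(n_ij, q_ij)]
    counts = []
    remaining_cases, remaining_weight = y_j, sum(weights)
    for w in weights[:-1]:
        # conditional binomial probability for this class given what is left
        p = w / remaining_weight if remaining_weight > 0 else 0.0
        draw = sum(1 for _ in range(remaining_cases) if rng.random() < p)
        counts.append(draw)
        remaining_cases -= draw
        remaining_weight -= w
    counts.append(remaining_cases)
    return counts

rng = random.Random(954912)                   # fixed seed for reproducibility
n_by_genotype = [9_990_000, 9_000, 1_000]     # invented genotype counts n_0j, n_1j, n_2j
q_by_genotype = [1e-6, 5e-4, 2e-3]            # invented Q_0(j), Q_1(j), Q_2(j)
y_split = allocate_cases(70, n_by_genotype, q_by_genotype, rng)
```

By construction the imputed counts sum to the observed $y_j$, consistent with $y_{2j} = y_j - y_{0j} - y_{1j}$.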

If $k = 2$, then $Y_{2j} = 0$, so that $Y_j = \sum_{i=0}^{1} Y_{ij}$. Thus we have for $k = 2$:

$$
Y_j \mid (n_{ij},\ i = 0, 1,\ n_j) \sim \mathrm{Poisson}\left\{\sum_{i=0}^{1} n_{ij} Q_i(j)\right\},
\tag{38}
$$

$$
Y_{ij} \mid (Y_j = y_j,\ n_{ij},\ i = 0, 1,\ n_j) \sim \mathrm{Binomial}\left\{y_j;\ \frac{n_{ij} Q_i(j)}{\sum_{u=0}^{1} n_{uj} Q_u(j)}\right\}, \quad i = 0, 1.
\tag{39}
$$

It follows that if $k = 2$, then $\sum_{i=0}^{1} Y_{ij} = Y_j$ and the joint probability density function of $(Y_{0j}, Y_j)$ given $(n_{uj},\ u = 0, 1, 2)$ is

$$
\begin{aligned}
P\{y_{0j}, y_j \mid n_{uj},\ u = 0, 1,\ n_j\} ={}& P\{y_j \mid n_{ij},\ i = 0, 1,\ n_j\} \times P\{y_{0j} \mid y_j,\ n_{uj},\ u = 0, 1,\ n_j\}\\
={}& \prod_{i=0}^{1} h\{y_{ij};\ n_{ij} Q_i(j)\} \quad \text{if } k = 2,
\end{aligned}
\tag{40}
$$

where $y_j = \sum_{i=0}^{1} y_{ij}$ and $n_j = \sum_{i=0}^{2} n_{ij}$.


Put $\mathbf{Y} = (y_{ij},\ i = 0, 1,\ j = 1, \ldots, t_N)$, $\mathbf{N} = (n_{ij},\ i = 0, 1,\ j = 1, \ldots, t_N)$, $\underset{\sim}{n} = (n_j,\ j = 0, 1, \ldots, t_N)$, and $\underset{\sim}{y} = (y_j,\ j = 0, 1, \ldots, t_N)$. From (37) and (40), we have for the conditional joint probability density function of $(\mathbf{Y}, \underset{\sim}{y})$ given $(\mathbf{N}, \underset{\sim}{n})$:

$$
P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \underset{\sim}{n}, \Theta\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} \left\{h[y_{2j};\ n_{2j} Q_2(j)]\right\}^{1 - \delta_{2k}} \times \prod_{i=0}^{1} h\{y_{ij};\ n_{ij} Q_i(j)\}.
\tag{41}
$$

It follows that the joint conditional probability density function of $\{\mathbf{N}, \mathbf{Y}, \underset{\sim}{y}\}$ given $(\underset{\sim}{n}, \Theta)$ is

$$
P\{\mathbf{N}, \mathbf{Y}, \underset{\sim}{y} \mid \underset{\sim}{n}, \Theta\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} g(n_{0j}, n_{1j}; n_j, p_0, p_1) \times \left\{h[y_{2j};\ n_{2j} Q_2(j)]\right\}^{1 - \delta_{2k}} \times \prod_{i=0}^{1} h\{y_{ij};\ n_{ij} Q_i(j)\}.
\tag{42}
$$

Notice that the above probability density function is a product of multinomial probability density functions and Poisson probability density functions. For this joint probability density function, the deviance $\mathrm{Dev} = -2\{\log P[\mathbf{Y}, \underset{\sim}{y}, \mathbf{N} \mid \underset{\sim}{n}, \Theta] - \log P[\mathbf{Y}, \underset{\sim}{y}, \mathbf{N} \mid \underset{\sim}{n}, \hat{\Theta}]\}$ is

$$
\mathrm{Dev} = D_0(k) + \mathrm{Dev}(p_1, p_2) + \sum_{j=1}^{t_N} D_j,
\tag{43}
$$

where

$$D_0(k) = 2\left\{\chi(k) - y_0 - y_0 \log\frac{\chi(k)}{y_0}\right\}, \tag{44}$$

$$\mathrm{Dev}(p_1, p_2) = 2 \sum_{j=1}^{t_N} \left\{ n_{0j} \log\frac{\hat{p}_0}{1 - p_1 - p_2} + \sum_{i=1}^{2} n_{ij} \log\frac{\hat{p}_i}{p_i} \right\}, \tag{45}$$

$$D_j = 2 \sum_{i=0}^{1} \left\{ n_{ij} Q_i(j) - y_{ij} - y_{ij} \log\frac{n_{ij} Q_i(j)}{y_{ij}} \right\} + 2 (1 - \delta_{2k}) \times \left\{ n_{2j} Q_2(j) - y_{2j} - y_{2j} \log\frac{n_{2j} Q_2(j)}{y_{2j}} \right\}, \tag{46}$$

where $\hat{p}_i = \left(\sum_{j=0}^{t_N} n_{ij}\right) / \left(\sum_{j=0}^{t_N} n_j\right)$ $(i = 0, 1, 2)$.
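The Poisson pieces of the deviance, (44) and (46), all have the form $2\{\mu - y - y\log(\mu/y)\}$, with $\mu$ the fitted mean and $y$ the count. The following is a minimal Python sketch of that single term, not the authors' code. The function names are hypothetical, and the convention that $y \log(\mu/y) = 0$ when $y = 0$ is an assumption added here for numerical safety.

```python
import math


def poisson_deviance_term(y, mu):
    """Return 2 * (mu - y - y * log(mu / y)) for one Poisson count.

    Assumption of this sketch: the y * log(mu / y) part is taken as 0 when y == 0.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    if y == 0:
        return 2.0 * mu
    return 2.0 * (mu - y - y * math.log(mu / y))


def total_poisson_deviance(ys, mus):
    """Sum the per-count deviance terms over paired counts and fitted means."""
    return sum(poisson_deviance_term(y, mu) for y, mu in zip(ys, mus))


# Illustrative numbers only (not SEER data).
print(total_poisson_deviance([3, 0, 7], [2.5, 0.4, 6.0]))
```

Each term is zero when the fitted mean equals the count and positive otherwise, which is why minimizing the deviance and maximizing the Poisson likelihood lead to the same fit.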

The joint probability density function $P\{\mathbf{Y}, \tilde{y}, \mathbf{N} \mid \tilde{n}, \Theta\}$ of $(\mathbf{Y}, \tilde{y}, \mathbf{N})$ given by (42) will be used as the kernel for the Bayesian method to estimate the unknown parameters and to predict the state variables.

4.4. Fitting of the Model to Cancer Incidence Data. To fit the model to real data, as in Tan [3–5], we let $\Delta t \sim 1$ correspond to a fixed time interval such as 6 months in human cancer studies. (Tan et al. [4] assumed 3 months as one time unit, while Luebeck and Moolgavkar [26] assumed one year as one time unit.) Then, because the proliferation rate of the last-stage cells is quite large, one may practically assume $P_T(s, t) = 1$ for $t - s \ge 1$. Hence, noting that $\beta_{k-1}(t) = \beta_{k-1}$ is usually very small (see [3–5]), $Q_i(j)$ is approximated by

$$Q_i(j) \approx E\left\{ e^{-\beta_{k-1} G_i(t_{j-1})} - e^{-\beta_{k-1} G_i(t_j)} \right\} = e^{-\beta_{k-1} E[G_i(t_{j-1})]} - e^{-\beta_{k-1} E[G_i(t_j)]} + o(\beta_{k-1}), \tag{47}$$

where $G_i(t) = \sum_{s=t_0}^{t-1} J_{k-1}(s; i)$.

Under discrete time approximation, the $E[J_{k-1}(t; i)]$'s have been derived in the appendix. Using these results of expected numbers and using the result $\sum_{i=0}^{t-1} a^i = (a^t - 1)/(a - 1)$ for $a \ne 1$, we obtain

$$\beta_{k-1} E[G_i(t)] = \sum_{r=i}^{i+1} E[J_r(t_0; i)] \left( \prod_{u=r}^{k-1} \beta_u \right) \times \sum_{v=r}^{k-1} A_{r(k-1)}(v) \sum_{s=t_0}^{t-1} (1 + \gamma_v)^{s-t_0} = \theta_{(i+1)(k-1)} \phi_{(i+1)(k-1)}(t) + \lambda_{i(k-1)} \phi_{i(k-1)}(t), \tag{48}$$

where $(\theta_{i(k-1)},\ i = 1, 2, 3)$ and $(\lambda_{i(k-1)},\ i = 0, 1, 2)$ are defined in Section 3.3 and where the $\phi_{i(k-1)}(t)$ $(i = 0, 1, 2, 3)$'s are given by

$$\phi_{i(k-1)}(t) = \left( \prod_{u=i+1}^{k-1} \gamma_u \right) \sum_{v=i}^{k-1} A_{i(k-1)}(v) \times \frac{1}{\gamma_v} \left\{ (1 + \gamma_v)^{t-t_0} - 1 \right\}, \quad i = 0, 1, 2, 3. \tag{49}$$

Notice that if $\gamma_0 = 0$, then $\phi_{0(k-1)}(t)$ reduces to

$$\phi_{0(k-1)}(t) = \left( \prod_{u=1}^{k-1} \gamma_u \right) \sum_{v=1}^{k-1} A_{1(k-1)}(v) \times \frac{1}{\gamma_v^2} \left\{ (1 + \gamma_v)^{t-t_0} - 1 - (t - t_0) \gamma_v \right\}. \tag{50}$$
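The factor $\{(1 + \gamma_v)^{t-t_0} - 1\}/\gamma_v$ in (49) is the closed form of the geometric sum $\sum_{s=t_0}^{t-1} (1 + \gamma_v)^{s-t_0}$ that appears in (48). The Python sketch below checks that identity numerically and shows the limit $t - t_0$ as $\gamma_v \to 0$. The function names and test values are hypothetical, and using `math.expm1` and `math.log1p` for stability near zero is a choice of this sketch rather than anything stated in the paper.

```python
import math


def geometric_sum_direct(gamma, n):
    """Sum (1 + gamma)**s for s = 0, ..., n - 1 term by term."""
    return sum((1.0 + gamma) ** s for s in range(n))


def geometric_sum_closed(gamma, n):
    """Closed form ((1 + gamma)**n - 1) / gamma, with the gamma -> 0 limit n."""
    if gamma == 0.0:
        return float(n)
    # expm1/log1p avoid cancellation when gamma is tiny.
    return math.expm1(n * math.log1p(gamma)) / gamma


# Illustrative growth rate and number of time steps.
print(geometric_sum_direct(0.03, 20), geometric_sum_closed(0.03, 20))
```

The separate branch for a zero growth rate mirrors the way (50) treats $\gamma_0 = 0$ as its own case.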

Applying these results for time homogeneous models with $\gamma_i \ne \gamma_j$ if $i \ne j$, the $Q_i(j)$'s under discrete approximation are given as follows.


(1) If $k = 2$, then $k - 1 = 1$. Hence $Q_2(0) = 1$, $Q_i(0) = Q_2(j) = 0$ for $(i = 0, 1;\ j > 0)$, and for $j > 0$,

$$Q_0(j) \approx e^{-\theta_{11} \phi_{11}(t_{j-1}) - \lambda_{01} \phi_{01}(t_{j-1})} - e^{-\theta_{11} \phi_{11}(t_j) - \lambda_{01} \phi_{01}(t_j)} + o(\beta_1),$$

$$Q_1(j) \approx (1 - \alpha_1) \left\{ e^{-\lambda_{11} \phi_{11}(t_{j-1})} - e^{-\lambda_{11} \phi_{11}(t_j)} \right\} + o(\beta_1). \tag{51}$$

(2) If $k \ge 3$, then we have $Q_2(0) = \delta_{2(k-1)} \alpha$, $Q_i(0) = 0$, $i = 0, 1$, and for $j > 0$,

$$Q_0(j) \approx e^{-\theta_{1(k-1)} \phi_{1(k-1)}(t_{j-1}) - \lambda_{0(k-1)} \phi_{0(k-1)}(t_{j-1})} - e^{-\theta_{1(k-1)} \phi_{1(k-1)}(t_j) - \lambda_{0(k-1)} \phi_{0(k-1)}(t_j)} + o(\beta_{k-1}),$$

$$Q_1(j) \approx e^{-\theta_{2(k-1)} \phi_{2(k-1)}(t_{j-1}) - \lambda_{1(k-1)} \phi_{1(k-1)}(t_{j-1})} - e^{-\theta_{2(k-1)} \phi_{2(k-1)}(t_j) - \lambda_{1(k-1)} \phi_{1(k-1)}(t_j)} + o(\beta_{k-1}),$$

$$Q_2(j) \approx \delta_{2(k-1)} (1 - \alpha) \left\{ e^{-\lambda_{22} \phi_{22}(t_{j-1})} - e^{-\lambda_{22} \phi_{22}(t_j)} \right\} + (1 - \delta_{2(k-1)}) \left\{ e^{-\theta_{3(k-1)} \phi_{3(k-1)}(t_{j-1}) - \lambda_{2(k-1)} \phi_{2(k-1)}(t_{j-1})} - e^{-\theta_{3(k-1)} \phi_{3(k-1)}(t_j) - \lambda_{2(k-1)} \phi_{2(k-1)}(t_j)} \right\} + o(\beta_{k-1}). \tag{52}$$
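The interval probabilities in (51) and (52) share one pattern: each $Q_i(j)$ is a difference of two exponentials evaluated at consecutive time points $t_{j-1}$ and $t_j$. A minimal Python sketch of that pattern follows. It assumes the cumulative exponents (for example $\theta\phi(t_j) + \lambda\phi(t_j)$) have already been computed and are nondecreasing in $j$. The function name and the numbers are hypothetical, and the $o(\beta_{k-1})$ remainder is ignored.

```python
import math


def interval_probabilities(cum_exponents):
    """Return exp(-m[j-1]) - exp(-m[j]) for consecutive cumulative exponents m."""
    probs = []
    for prev, curr in zip(cum_exponents, cum_exponents[1:]):
        probs.append(math.exp(-prev) - math.exp(-curr))
    return probs


# Illustrative cumulative exponents at t_0, t_1, t_2, t_3.
m = [0.0, 0.1, 0.3, 0.7]
q = interval_probabilities(m)
print(q, sum(q))
```

Because the terms telescope, the interval probabilities sum to $e^{-m(t_0)} - e^{-m(t_J)}$, the probability of developing a tumor at some point in $(t_0, t_J]$ under this approximation.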

Notice that if one replaces $[1 + \gamma_v]^{t-t_0}$ by $[1 + \gamma_v]^{t-t_0} = e^{(t-t_0) \log(1+\gamma_v)} \approx e^{(t-t_0) \gamma_v}$, the above $Q_i(j)$'s from the discrete time model are exactly the same as those from the continuous model, as given in equations (23)–(27), under the assumption that $P_T(s, t) = 1$ for $t - s > 0$. Notice that the assumption $P_T(s, t) = 1$ for $t - s > 0$ is equivalent to assuming that the last-stage cancer cells grow instantaneously into cancer tumors as soon as they are generated; see Remark 1.

5. The Fitting of the Model to Cancer Incidence and the Generalized Bayesian Inference Procedure

Given the model in Sections 2 and 3 and cancer incidence, one may use results in Section 4 to fit the model. By using this model and the distribution results in Section 4, one can readily estimate the unknown genetic parameters, predict cancer incidence, and check the validity of the model by using the generalized Bayesian inference and Gibbs sampling procedures; for more detail, see Tan [3, 22] and Tan et al. [4, 5].

The generalized Bayesian inference is based on the posterior distribution $P\{\Theta \mid \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$ of $\Theta$ given $\{\mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$. This posterior distribution is derived by combining the prior distribution $P\{\Theta\}$ of $\Theta$ with the joint probability distribution $P\{\mathbf{N}, \mathbf{Y}, \tilde{y} \mid \tilde{n}, \Theta\}$ given $\{\tilde{n}, \Theta\}$ in (42). It follows that this inference procedure combines information from three sources: (1) previous information and experiences about the parameters in terms of the prior distribution $P\{\Theta\}$ of the parameters, (2) biological information of inherited cancer cases via genetic segregation of cancer genes in the population ($P\{\mathbf{N} \mid \tilde{n}, p_i, i = 1, 2\}$, see Section 2), and (3) information from the expanded data ($\mathbf{Y}$) and the observed data ($\tilde{y}$) via the statistical model from the system ($P\{\mathbf{Y}, \tilde{y} \mid \mathbf{N}, \Theta\}$ given by (37) and (40)). Because of the additional information from the genetic segregation of the cancer genes, this inference procedure provides an efficient way to extract information on the effects of genotypes of individuals at the embryo stage.

5.1. The Prior Distribution of the Parameters. For the prior distributions of $\Theta$, because biological information has suggested some lower bounds and upper bounds for the mutation rates and for the proliferation rates, we assume

$$P(\Theta) \propto c \quad (c > 0), \tag{53}$$

where $c$ is a positive constant if these parameters satisfy some biologically specified constraints, and $P(\Theta)$ is equal to zero otherwise. These biological constraints are as follows.

(i) $0 < p_1 < 10^{-2}$, $0 < p_2 < 10^{-6}$, and $-0.01 < \gamma_i < 1$ $(i = 1, 2)$.

(ii) For $\beta_j$ $(j = 0, 1, \ldots, k - 1)$, we let $1 < \omega_0 = N(t_0) \beta_0 < 1000$ $(N \to I_1)$ and $10^{-8} < \beta_i < 10^{-3}$, $i = 1, \ldots, k - 1$.

(iii) For the $\lambda_{j(k-1)}$ $(j = 0, 1, 2)$, we let $0 < \lambda_i < 10$, $i = 0, 1$, and $0 < \lambda_2 < 10^3$.

(iv) For the $\theta_i$ $(i = 1, 2, 3)$, we let $0 < \theta_1 < 10^{-2}$ and $0 < \theta_2 < 10^{-1}$.

We will refer to the above prior as a partially informative prior, which may be considered as an extension of the traditional noninformative prior given in Box and Tiao [28].
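A flat prior restricted to a box, as in (53), contributes a constant to the log posterior inside the constraints and minus infinity outside. The Python sketch below illustrates this with bounds transcribed from one reading of constraints (i), (iii), and (iv). The dictionary keys and function name are hypothetical, the lower bound $-0.01$ for the $\gamma_i$ is a reconstruction, and the constraints on $\beta_j$ and $\omega_0$ in (ii) are omitted for brevity.

```python
import math

# Box constraints read from items (i), (iii), (iv); key names are hypothetical.
BOUNDS = {
    "p1": (0.0, 1e-2),
    "p2": (0.0, 1e-6),
    "gamma1": (-0.01, 1.0),
    "gamma2": (-0.01, 1.0),
    "lambda0": (0.0, 10.0),
    "lambda1": (0.0, 10.0),
    "lambda2": (0.0, 1e3),
    "theta1": (0.0, 1e-2),
    "theta2": (0.0, 1e-1),
}


def log_flat_prior(params, bounds=BOUNDS):
    """Return 0.0 (log of a constant) inside the box and -inf outside it."""
    for name, (low, high) in bounds.items():
        value = params[name]
        if not (low < value < high):
            return -math.inf
    return 0.0


# Illustrative parameter values chosen to sit inside every bound.
example = {"p1": 1e-3, "p2": 5e-7, "gamma1": 1e-5, "gamma2": 0.03,
           "lambda0": 0.3, "lambda1": 0.5, "lambda2": 600.0,
           "theta1": 3e-4, "theta2": 0.05}
print(log_flat_prior(example))
```

With such a prior, maximizing the posterior inside the box is the same as maximizing the likelihood subject to the constraints.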

5.2. The Posterior Distribution of the Parameters Given $\{\mathbf{Y}, \mathbf{N}, \tilde{y}, \tilde{n}\}$. Denote by $\Theta_1 = \Theta - \{p_1, p_2, \alpha\}$. From the posterior distribution $P\{\Theta \mid \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$, we obtain

$$P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\} \propto (\chi(k))^{y_0} e^{-\chi(k)}\, p_1^{\sum_{j=1}^{t_N} n_{1j}}\, p_2^{\sum_{j=1}^{t_N} n_{2j}} \times (1 - p_1 - p_2)^{\sum_{j=1}^{t_N} n_{0j}}, \quad 0 < p_1, p_2, \alpha < 1,$$

$$P\{\Theta_1 \mid (p_1, p_2, \alpha), \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\} \propto \prod_{j=1}^{t_N} \prod_{i=0}^{2} e^{-Q_i(j)} [Q_i(j)]^{y_{ij}}, \quad \Theta_1 \in \Omega, \tag{54}$$


where $\Omega$ is the parameter space of $\Theta_1$ provided by the biological constraints in Section 5.1.

For computational convenience, we notice that the log of $P\{p_1, p_2, \alpha \mid \Theta_1, \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$ is proportional to the negative of $D_0(k) + \mathrm{Dev}(p_1, p_2)$ given by (44)-(45); similarly, the log of $P\{\Theta_1 \mid (p_1, p_2, \alpha), \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$ is proportional to the negative of $\sum_{j=1}^{k} D_j$ given by (46).
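To see how the multinomial factor $p_1^{S_1} p_2^{S_2} (1 - p_1 - p_2)^{S_0}$ in the first posterior behaves, where $S_i = \sum_j n_{ij}$, note that on its own it is maximized at the sample proportions $S_i / (S_0 + S_1 + S_2)$. The Python sketch below covers only this factor: it ignores the $\chi(k)$ factor and the box constraints of Section 5.1, so it is not the full conditional mode. Function names and counts are hypothetical.

```python
import math


def multinomial_log_kernel(p1, p2, s0, s1, s2):
    """Log of p1**s1 * p2**s2 * (1 - p1 - p2)**s0 for 0 < p1, p2, p1 + p2 < 1."""
    return s1 * math.log(p1) + s2 * math.log(p2) + s0 * math.log(1.0 - p1 - p2)


def multinomial_mode(s0, s1, s2):
    """Unconstrained maximizer (p1, p2) of the multinomial kernel alone."""
    total = s0 + s1 + s2
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return s1 / total, s2 / total


# Illustrative genotype counts (not estimates from SEER data).
p1_hat, p2_hat = multinomial_mode(900, 90, 10)
print(p1_hat, p2_hat, multinomial_log_kernel(p1_hat, p2_hat, 900, 90, 10))
```

In practice the result would still need to be checked against the bounds on $p_1$ and $p_2$, and the $\chi(k)$ term may shift the mode if it depends on these parameters.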

5.3. The Multilevel Gibbs Sampling Procedure for Estimating Unknown Parameters. Given the posterior probability distributions, we will use the following multilevel Gibbs sampling procedure to derive estimates of the parameters. We notice that, numerically, the Gibbs sampling procedure given below is equivalent to the EM-algorithm from the sampling theory viewpoint, with Steps 1 and 2 as the E-Step and with Steps 3 and 4 as the M-Step, respectively [29]. These multilevel Gibbs sampling procedures are given by the following.

Step 1 (Generating $\mathbf{N}$ Given $(\mathbf{Y}, \tilde{y}, \tilde{n}, \Theta)$ (The Data-Augmentation Step 1)). Given $\Theta$ and given $\tilde{n}$, use the multinomial distribution of $\{n_{1j}, n_{2j}\}$ given $n_j$ in Section 3 to generate a large sample of $\mathbf{N}$. Then combine this large sample with $P\{\mathbf{Y}, \tilde{y} \mid \mathbf{N}, \tilde{n}, \Theta\}$ in (37) and (40) to select $\mathbf{N}$ through the weighted bootstrap method due to Smith and Gelfand [30]. This selected $\mathbf{N}$ is then a sample from $P\{\mathbf{N} \mid \mathbf{Y}, \tilde{y}, \tilde{n}, \Theta\}$ even though the latter is unknown. (For proof, see Tan [22], Chapter 3.) Call the generated sample $\hat{\mathbf{N}}$.

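Step 2 (Generating $\mathbf{Y}$ Given $(\mathbf{N}, \tilde{y}, \tilde{n}, \Theta)$ (The Data-Augmentation Step 2)). Given $\tilde{y}$, $\tilde{n}$, $\Theta$ and given $\mathbf{N} = \hat{\mathbf{N}}$ generated from Step 1, generate $\mathbf{Y}$ from the probability distribution $P\{\mathbf{Y} \mid \hat{\mathbf{N}}, \tilde{y}, \tilde{n}, \Theta\}$ given by (36) and (38). Call the generated sample $\hat{\mathbf{Y}}$.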

Step 3 (Estimation of $\{p_i, i = 1, 2, \alpha\}$ Given $\{\Theta_1, \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$). Given $\{\tilde{y}, \tilde{n}, \Theta_1\}$ and given $(\mathbf{N}, \mathbf{Y}) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}})$ from Steps 1 and 2, derive the posterior mode of $\{p_i, i = 1, 2, \alpha\}$ by maximizing the conditional posterior distribution $P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \hat{\mathbf{N}}, \hat{\mathbf{Y}}, \tilde{y}, \tilde{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $D_0(k) + \mathrm{Dev}(p_1, p_2)$ given by (44)-(45) in Section 4.3 under the constraints given in Section 5.1. Denote this generated mode by $\{\hat{p}_i, i = 1, 2, \hat{\alpha}\}$.

Step 4 (Estimation of $\Theta_1$ Given $\{p_i, i = 1, 2, \alpha, \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$). Given $(\tilde{y}, \tilde{n})$ and given $(\mathbf{N}, \mathbf{Y}, p_i, i = 1, 2, \alpha) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}}, \hat{p}_i, i = 1, 2, \hat{\alpha})$ from Steps 1–3, derive the posterior mode of $\Theta_1$ by maximizing the conditional posterior distribution $P\{\Theta_1 \mid \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\mathbf{N}}, \hat{\mathbf{Y}}, \tilde{y}, \tilde{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $\sum_{j=1}^{k} D_j$ in (46) under the constraints. Denote the generated mode as $\hat{\Theta}_1$.

Step 5 (Recycling Step). With $(\mathbf{N}, \mathbf{Y}, p_i, i = 1, 2, \alpha, \Theta_1) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}}, \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1)$ given above, go back to Step 1 and continue until convergence.
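Step 1 relies on the weighted bootstrap: draw many candidates from a distribution that is easy to sample, weight each candidate by the likelihood of the data given that candidate, and resample in proportion to the weights. The Python sketch below illustrates the idea on a toy one-dimensional problem, not on the model of this paper. All names and numbers are hypothetical: candidates are binomial counts, and the weight is a Poisson likelihood for a single observed count.

```python
import math
import random


def weighted_bootstrap(candidates, weights, size, rng):
    """Resample `size` candidates with probability proportional to `weights`."""
    if sum(weights) <= 0:
        raise ValueError("at least one weight must be positive")
    return rng.choices(candidates, weights=weights, k=size)


def draw_binomial(n, p, rng):
    """Draw one Binomial(n, p) count using n Bernoulli trials (toy sizes only)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def poisson_pmf(y, mean):
    """Poisson probability of count y with the given mean."""
    if mean == 0:
        return 1.0 if y == 0 else 0.0
    return math.exp(-mean + y * math.log(mean) - math.lgamma(y + 1))


rng = random.Random(0)
# Candidate numbers of carriers among 50 individuals with carrier frequency 0.3.
candidates = [draw_binomial(50, 0.3, rng) for _ in range(2000)]
# Weight each candidate by the likelihood of observing 4 cases at rate 0.2 per carrier.
weights = [poisson_pmf(4, 0.2 * n1) for n1 in candidates]
selected = weighted_bootstrap(candidates, weights, 500, rng)
print(sum(selected) / len(selected))
```

The resampled values approximate draws from the conditional distribution of the latent counts given the data, without that distribution having to be written in closed form.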

The proof of convergence of the above steps can be derived by using the procedure given in Tan ([22], Chapter 3). At convergence, the $\hat{\Theta} = \{\hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1\}$ are the generated values from the posterior distribution of $\Theta$ given $\{\tilde{y}, \tilde{n}\}$ independently of $(\mathbf{N}, \mathbf{Y})$ (for proof, see Tan [22], Chapter 3). By repeating the above procedures, one generates a random sample of $\Theta$ from the posterior distribution of $\Theta$ given $\{\tilde{y}, \tilde{n}\}$; one then uses the sample mean as the estimate of $\Theta$ and uses the sample variances and covariances as estimates of the variances and covariances of these estimates.
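The final summarization step can be sketched in a few lines of Python: given a list of posterior draws of the parameter vector, compute the sample mean and the sample covariance matrix. The function name and the toy draws are hypothetical, and the unbiased divisor (number of draws minus one) is an assumption of this sketch.

```python
def sample_mean_and_cov(draws):
    """Return (mean vector, covariance matrix) of equally weighted draws."""
    m = len(draws)
    if m < 2:
        raise ValueError("need at least two draws")
    d = len(draws[0])
    mean = [sum(row[i] for row in draws) / m for i in range(d)]
    cov = [
        [
            sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in draws) / (m - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]
    return mean, cov


# Toy draws of a two-parameter vector (illustrative values only).
draws = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
mean, cov = sample_mean_and_cov(draws)
print(mean, cov)
```

The square roots of the diagonal entries of the covariance matrix give standard deviations of the kind reported alongside point estimates.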

6. A New Multistage Stochastic Model for Adult Eye Cancer (Uveal Melanoma): An Example

The human eye cancers consist of pediatric eye cancers and adult eye cancers. The most common pediatric eye cancer is retinoblastoma, which develops from the retinal pigment epithelium cells underlying the retina that do not form melanoma. The most common adult eye cancers are the uveal melanomas involving the iris, the ciliary body, and the choroid (collectively referred to as the uvea). These cancers develop from melanocytes (pigment cells), which reside within the uvea, giving color to the eye. In Tan and Zhou [9], we have developed a modified two-stage model for retinoblastoma. Based on results from molecular biology (see Landreville et al. [14], Mensink et al. [15], and Loercher and Harbour [31]), Landreville et al. [14] have proposed a three-stage model for uveal melanoma, as given in Figure 2. As an example of applications of this paper, in this section we apply this model of uveal melanoma to the NCI/NIH eye cancer data from the SEER project. We notice that the same methods can be applied to model other human cancers as well, but this will be our future research.

Given in Table 1 are the numbers of people at risk and the eye cancer cases in the age groups, together with the predicted cases from the models. These data give cancer incidence at birth and incidence for 85 age groups ($k = 85$), with each group spanning a 1-year period except the last age group ($\ge 85$ years old). For human eye cancer, because the incidence at birth and for age groups from 1 to 10 years old is basically generated by the pediatric eye cancer retinoblastoma (see [9]), to account for inherited cancer cases of uveal melanoma, the incidence for age 0 (birth) and for age periods from 1 to 10 years old in Table 1 for uveal melanoma is derived by subtracting the incidence of retinoblastoma from the SEER data (see Tan and Zhou [9]).

To fit the data, we let one time unit be 6 months after birth and let $t_0 = 1$. To compare different models and to assess different assumptions, we will consider the following 2-3-stage mixture models: (1) the complete 3-stage mixture model (Model-F), in which no assumptions are made on the parameters; (2) the Type-1 3-stage mixture model, in which we assume that $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth (Model-1). For comparison purposes, we also fit a 2-stage model as defined in Tan and Zhou [9]. We will apply the methods in Section 5 to fit these models to the SEER data given in Table 1.


Table 2: The log-likelihood, AIC, and BIC of the fitted models.

Models          Log-likelihood    AIC         BIC
Three-stage
  Model-F       -1312.02          2648.04     2677.35
  Model-1       -1295.76          2609.53     2631.51
Two-stage       -4392.421         8796.843    8811.569

Figure 5: Curve fitting of SEER data by Model-F, Model-1, and the two-stage model. (Plot of incidences against age in years; series: Observed, Model-F, Model-1, Two-stage.)

Given in Table 2 are the natural logs of the likelihood functions, the AIC (Akaike Information Criterion), and the BIC (Bayesian Information Criterion) for these models. Given in Table 3 are the estimates of parameters in the 3-stage models. Given in Figure 5 are the plots of predicted cancer cases from the 3-stage mixture models (Model-F and Model-1) and the 2-stage model. For comparison purposes, in Table 1 we also provide the numbers of predicted cancer cases from the 3-stage mixture models and the 2-stage model, together with the observed cancer cases over time from SEER. From these results, we have made the following observations.

(a) As shown by the results in Table 1 and Figure 5, both Model-F and Model-1 fitted the SEER data well, although Model-1 fitted the data slightly better judging from the values of AIC and BIC. The Chi-square test statistics $\chi^2 = \sum_{i=0}^{85} (y_i - \hat{y}_i)^2 / \hat{y}_i$ for Model-F and Model-1 are 88.43 and 94.48, respectively, giving a $P$-value of 0.12 (df $= 86 - 12 = 74$) for Model-F and a $P$-value of 0.11 (df $= 86 - 7 = 79$) for Model-1. On the other hand, the 2-stage model fitted the data very poorly; the Chi-square statistic for the 2-stage model is 2747.69, giving a $P$-value less than $10^{-3}$. The AIC and BIC values of Model-1 (AIC = 2609.53, BIC = 2631.51) are slightly smaller than those of Model-F; however, the AIC and BIC values (8796.84, 8811.57) of the two-stage model are considerably greater than those of the 3-stage models. These results suggest that uveal melanoma may best be described by a 3-stage model with an inherited component and that one may practically assume $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth.

(b) From Table 3, it is observed that the estimate of $\gamma_1$ is close to zero (the estimate is of order $10^{-5}$), indicating that the phenotype of $J_1$ is almost identical to that of $N = J_0$. This further confirms that the staging-limiting genes are basically tumor suppressor genes and that there is no haploinsufficiency for these tumor suppressor genes. On the other hand, the estimate of $\gamma_2$ is of order $10^{-2}$, which is about $10^3$ times greater than that of cells with genotype $J_1$.

(c) From Table 3, the estimates $\hat{\lambda}_j$ $(j = 0, 1, 2)$ of $\lambda_j$ are of order $10^{-1}$, $10^{-1}$, and $10^2 \sim 10^3$, respectively. Because $\lambda_i = E[J_i(t_0; i)]\, \beta_i \prod_{u=i+1}^{2} (\beta_u / \gamma_u)$, $i = 0, 1, 2$, assuming some values of $E[J_i(t_0; i)]$, $i = 0, 1, 2$, from biological observations, one can have some rough ideas about the magnitude of $\beta_j$ $(j = 0, 1, 2)$. For example, if we follow Potten et al. [32] to assume $(E[N(t_0)] = E[J_i(t_0; i)] \sim 10^8,\ i = 0, 1, 2)$, then $\beta_j \approx 10^{-6} \sim 10^{-5}$.

(d) From Table 3, the estimates of $p_1$ and $p_2$ from the SEER data are of orders $10^{-4} \sim 10^{-3}$ and $10^{-7} \sim 10^{-6}$, respectively. This indicates that in the US population, the frequency of the staging-limiting cancer gene for uveal melanoma is approximately $10^{-3}$. Table 3 also showed that the estimate of $\alpha$ was 0.8411, indicating that most individuals with genotype $J_2$ would develop cancer at birth. This may help to explain why there are observed cancer incidences at birth for uveal melanoma in the SEER data even though the estimate of the frequency $p_2$ is of order $10^{-7} \sim 10^{-6}$.
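The model comparison in observation (a) uses the standard definitions $\mathrm{AIC} = -2 \log L + 2p$ and $\mathrm{BIC} = -2 \log L + p \log n$. The Python sketch below applies them to the Model-F log-likelihood from Table 2. The parameter count $p = 12$ is taken from the degrees-of-freedom statement for Model-F, while the sample size $n = 85$ used in the BIC is an assumption of this sketch that happens to reproduce the tabulated value. The function names are hypothetical.

```python
import math


def aic(loglik, num_params):
    """Akaike Information Criterion: -2 log L + 2 p."""
    return -2.0 * loglik + 2.0 * num_params


def bic(loglik, num_params, num_obs):
    """Bayesian Information Criterion: -2 log L + p log n."""
    return -2.0 * loglik + num_params * math.log(num_obs)


# Model-F log-likelihood from Table 2; p = 12 from the df statement; n = 85 assumed.
print(aic(-1312.02, 12), bic(-1312.02, 12, 85))
```

Both criteria penalize extra parameters, so a model with fewer parameters and a comparable log-likelihood scores lower, which is the basis for preferring Model-1 to Model-F.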

7. Discussion and Conclusions

To account for inherited cancer cases, in this paper we have developed some general multistage models involving hereditary cancer cases. For human cancer incidence, these models are basically generalized mixture models. In these mixture models, the mixing probability is a multinomial distribution to account for genetic segregation of the staging-limiting tumor suppressor genes. This mixture model allows us to estimate, for the first time, the frequency of the staging-limiting tumor suppressor gene in human populations. As an example of applications, in this paper we have developed a general 3-stage stochastic multistage model of carcinogenesis for adult human eye cancer. To account for inherited cancer cases in the stochastic model of human eye cancer, we have also developed a generalized mixture model for uveal melanoma in human beings.

To use the proposed models to fit the cancer incidence data, in this paper we have developed a generalized Bayesian inference procedure to estimate the unknown parameters and to predict cancer cases. This inference procedure is advantageous over the classical sampling theory inference (i.e., the maximum likelihood method) because the procedure combines


Table 3: Estimates of parameters for the 3-stage stochastic models.

Model-F
Parameter     Estimate     StD          95% CL-Lower   95% CL-Upper
$\lambda_0$   2.88E-01     1.21E-01     5.03E-02       5.25E-01
$\lambda_1$   4.81E-01     4.65E-02     3.89E-01       5.72E-01
$\lambda_2$   6.55E+02     1.12E+02     4.34E+02       8.75E+02
$\gamma_1$    2.71E-05     8.86E-06     5.34E-06       4.01E-05
$\gamma_2$    3.37E-02     1.92E-04     3.33E-02       3.41E-02
$p_1$         9.95E-04     1.75E-06     9.91E-04       9.98E-04
$p_2$         9.68E-07     6.48E-09     9.55E-07       9.81E-07
$\alpha$      8.41E-01     4.19E-02     7.59E-01       9.23E-01
$\theta_1$    3.29E-04     3.54E-05     2.60E-04       3.98E-04
$\theta_2$    3.41E-01     1.07E-02     3.20E-01       3.62E-01

Model-1
Parameter     Estimate     StD          95% CL-Lower   95% CL-Upper
$\lambda_0$   6.55E-02     2.54E-02     1.57E-02       1.15E-01
$\lambda_1$   3.72E-01     4.63E-02     2.82E-01       4.63E-01
$\lambda_2$   1.98E+02     3.38E+01     1.32E+02       2.65E+02
$\gamma_1$    NA           NA           NA             NA
$\gamma_2$    3.49E-02     3.58E-03     2.79E-02       4.19E-02
$p_1$         9.98E-04     6.27E-05     8.75E-04       1.12E-03
$p_2$         1.00E-06     4.53E-08     9.11E-07       1.09E-06
$\alpha$      8.07E-01     7.06E-02     6.68E-01       9.45E-01
$\theta_1$    NA           NA           NA             NA
$\theta_2$    NA           NA           NA             NA

Note: NA: assumed nonexistence.


information from three sources: (1) previous information and experiences about the parameters in terms of the prior distribution $P\{\Theta\}$ of the parameters, (2) biological information of inherited cancer cases via the genetic segregation of staging-limiting tumor suppressor genes in the population, and (3) information from the expanded data ($\mathbf{Y}$) and the observed data ($\tilde{y}$) via the statistical model from the system ($P\{\mathbf{Y}, \tilde{y} \mid \mathbf{N}, \Theta\}$ given by (37) and (40)).

To illustrate the usefulness and applications of our models and methods, we have applied them to the eye cancer SEER data of NCI/NIH. Our analysis showed that the proposed 3-stage model with inherited cancer cases fitted the data well (see Table 2 and Figure 5); on the other hand, the classical 2-stage model could not fit the data at all (see Table 2 and Figure 5). These results confirm results from molecular biology that the human eye cancer is described by a 3-stage model with an inherited cancer component. Notice, however, that our 3-stage multistage model is more general than the classical 3-stage model as described in Little [1], Tan [2], and Zheng [7] in that we postulate that cancer tumors develop from primary $I_3$ cells by clonal expansion (see Yang and Chen [11]). (Note that the stochastic multistage models in the literature assume that cancer tumors develop from last-stage cells immediately as soon as they are generated, ignoring cancer progression completely; see Remark 1.) As a matter of fact, we had assumed $\Delta t = 1$ for a period of three months and found that the 3-stage models then did not fit the SEER data, indicating that $P_T(s, t) < 1$ over a period of three months.

Applying our models and methods to the SEER data of human eye cancer, we have derived for the first time some useful pieces of information. Specifically, we mention the following. (1) For the first time, we have estimated the frequency of the staging-limiting tumor suppressor gene in the US population ($p_1 \sim 9.948 \times 10^{-3}$). (2) With the estimate of $\alpha$ as $\hat{\alpha} = 0.8411$, the predicted number of uveal melanoma cases at birth is $\hat{y}_0 = n_{01} \hat{p}_2 = 36$ by the 3-stage models with inherited cancer component (Model-F and Model-1). (The observed number of eye cancer cases at birth is 34.) (3) The estimate of the proliferation rate ($\gamma_1$) of $J_1$ cells using Model-F is $\hat{\gamma}_1 = 2.271 \times 10^{-5} \sim 0$. (The estimate is $8.603 \times 10^{-5}$ using Model-1.) This confirms that the stage-1 limiting gene is a tumor suppressor gene and that, unlike the p53 gene in chromosome 17p (see [33]), there is little or no haploid insufficiency for this gene in cells with genotype $J_1$.

Using the models and methods of this paper, one can readily predict future cancer cases for human eye cancer. Thus, by comparing results from different populations, our models and methods can be used to assess cancer prevention and control procedures. This will be our future research topic; we will not go any further here.

Appendix

The Expected Numbers of State Variables under Discrete Time Approximation

Under discrete time, the stochastic differential equations of state variables reduce to the following stochastic difference equations of state variables, respectively:

$$J_i(t + 1; i) = J_i(t; i) [1 + \gamma_i(t)] + e_i(t + 1; i), \quad i = 0, 1, \ldots, \mathrm{Min}(k - 1, 2),$$

$$J_j(t + 1; u) = J_j(t; u) [1 + \gamma_j(t)] + J_{j-1}(t; u) \beta_{j-1}(t) + e_j(t + 1; u), \quad 0 \le u < j \le k - 1, \tag{A1}$$

where $e_i(t + 1; i) = [B_i(t; i) - J_i(t; i) b_i(t)] - [D_i(t; i) - J_i(t; i) d_i(t)]$ and $e_j(t + 1; i) = [B_j(t; i) - J_j(t; i) b_j(t)] - [D_j(t; i) - J_j(t; i) d_j(t)] + [M_{j-1}(t; i) - J_{j-1}(t; i) \beta_{j-1}(t)]$ for $j > i$.

The initial conditions at birth ($t_0$) for the above stochastic difference equations are $J_j(t_0; i) > 0$ if $(j = i, i + 1)$ and $J_j(t_0; i) = 0$ if $j > i + 1$. The solution of the above difference equations under these initial conditions is given, respectively, by

$$J_i(t; i) = J_i(t_0; i) \prod_{s=t_0}^{t-1} (1 + \gamma_i(s)) + \sum_{s=t_0+1}^{t} e_i(s; i) \prod_{u=s}^{t-1} (1 + \gamma_i(u)), \quad i = 0, 1, \ldots, \mathrm{Min}(k - 1, 2),$$

$$J_j(t; i) = J_j(t_0; i) \prod_{s=t_0}^{t-1} (1 + \gamma_j(s)) + \sum_{s=t_0}^{t-1} J_{j-1}(s; i) \beta_{j-1}(s) \prod_{u=s+1}^{t-1} (1 + \gamma_j(u)) + \sum_{s=t_0+1}^{t} e_j(s; i) \prod_{u=s}^{t-1} (1 + \gamma_j(u)), \quad 0 \le i < j \le k - 1. \tag{A2}$$

If the model is time homogeneous so that $\beta_j(t) = \beta_j$, $\gamma_j(t) = \gamma_j$ $(j = i, \ldots, k - 1)$, and if $\gamma_i \ne \gamma_j$ if $i \ne j$, then the above solutions under the initial conditions $(J_j(t_0; i) = 0,\ j > i + 1)$ reduce to

$$J_i(t; i) = J_i(t_0; i) (1 + \gamma_i)^{t-t_0} + \eta_i^{(0)}(t; i), \quad i = 0, 1, \ldots, \mathrm{Min}(k - 1, 2),$$

$$J_j(t; i) = J_j(t_0; i) (1 + \gamma_j)^{t-t_0} + \sum_{u=i}^{j-1} J_u(t_0; i) \left( \prod_{v=u}^{j-1} \beta_v \right) \sum_{r=u}^{j} A_{uj}(r) (1 + \gamma_r)^{t-t_0} + \eta_j^{(0)}(t; i) + \sum_{u=1}^{j-i} \left( \prod_{r=j-u}^{j-1} \beta_r \right) \eta_j^{(u)}(t; i)$$

$$= \sum_{u=i}^{i+1} J_u(t_0; i) \left( \prod_{v=u}^{j-1} \beta_v \right) \sum_{r=u}^{j} A_{uj}(r) (1 + \gamma_r)^{t-t_0} + \eta_j^{(0)}(t; i) + \sum_{u=1}^{j-i} \left( \prod_{r=j-u}^{j-1} \beta_r \right) \eta_j^{(u)}(t; i), \quad 0 \le i < j \le k - 1, \tag{A3}$$

where $\eta_j^{(0)}(t; i) = \sum_{s=t_0+1}^{t} e_j(s; i) (1 + \gamma_i)^{t-s}$ and $\eta_j^{(u)}(t; i) = \sum_{s=t_0+1}^{t} e_{j-u}(s; i) \sum_{v=u}^{j} A_{uj}(v) (1 + \gamma_v)^{t-s}$ $(u = 1, \ldots, j - i)$.

Thus, if the model is time homogeneous and if $\gamma_i \ne \gamma_j$ if $i \ne j$, the $E[J_{k-1}(t; i)]$'s in discrete time models under the initial conditions $(J_j(t_0; i) = 0,\ j > i + 1)$ are given, respectively, by

$$E[J_{k-1}(t; i)] = \sum_{u=i}^{i+1} E[J_u(t_0; i)] \left( \prod_{v=u}^{k-2} \beta_v \right) \times \sum_{r=u}^{k-1} A_{u(k-1)}(r) (1 + \gamma_r)^{t-t_0}, \quad i = 0, 1, \ldots, \mathrm{Min}(k - 1, 2). \tag{A4}$$
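Taking expectations in (A1), and assuming the noise terms $e_j$ have mean zero (an assumption of this sketch), gives the deterministic recursion $m_j(t+1) = m_j(t)(1 + \gamma_j) + m_{j-1}(t) \beta_{j-1}$ for the expected counts. The Python sketch below iterates that recursion for a time homogeneous chain and compares it with the closed form for two adjacent stages, which follows from the identity $\sum_{s=0}^{n-1} a^s b^{n-1-s} = (a^n - b^n)/(a - b)$. Function names and numerical values are hypothetical.

```python
import math


def iterate_expected_counts(initial, gammas, betas, steps):
    """Iterate m_j(t+1) = m_j(t) * (1 + gamma_j) + m_{j-1}(t) * beta_{j-1}."""
    counts = list(initial)
    for _ in range(steps):
        updated = []
        for j, m in enumerate(counts):
            inflow = counts[j - 1] * betas[j - 1] if j > 0 else 0.0
            updated.append(m * (1.0 + gammas[j]) + inflow)
        counts = updated
    return counts


def two_stage_closed_form(m0, m1, gamma0, gamma1, beta0, steps):
    """Closed form for two adjacent stages; requires gamma0 != gamma1."""
    a, b = 1.0 + gamma0, 1.0 + gamma1
    stage0 = m0 * a ** steps
    stage1 = m1 * b ** steps + m0 * beta0 * (a ** steps - b ** steps) / (a - b)
    return [stage0, stage1]


# Illustrative values: 1e8 first-stage cells, mutation rate 1e-6, 40 time steps.
direct = iterate_expected_counts([1e8, 0.0], [0.0, 0.03], [1e-6], 40)
closed = two_stage_closed_form(1e8, 0.0, 0.0, 0.03, 1e-6, 40)
print(direct, closed)
```

The requirement that the two growth rates differ mirrors the condition $\gamma_i \ne \gamma_j$ for $i \ne j$ under which (A3) and (A4) are stated.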

References

[1] M. P. Little, "Cancer models, ionization, and genomic instability: a review," in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 5, World Scientific, River Edge, NJ, USA, 2008.

[2] W. Y. Tan, Stochastic Models of Carcinogenesis, Marcel Dekker, New York, NY, USA, 1991.

[3] W. Y. Tan, "Stochastic multi-stage models of carcinogenesis as hidden Markov models: a new approach," International Journal of Systems and Synthetic Biology, vol. 1, pp. 313–337, 2010.

[4] W. Y. Tan, L. J. Zhang, and C. W. Chen, "Stochastic modeling of carcinogenesis: state space models and estimation of parameters," Discrete and Continuous Dynamical Systems B, vol. 4, no. 1, pp. 297–322, 2004.

[5] W. Y. Tan, C. W. Chen, and L. J. Zhang, "Cancer biology, cancer models and stochastic mathematical analysis of carcinogenesis," in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 3, World Scientific, River Edge, NJ, USA, 2008.

[6] R. A. Weinberg, The Biology of Human Cancer, Garland Sciences, Taylor and Francis, New York, NY, USA, 2007.

[7] Q. Zheng, "Stochastic multistage cancer models: a fresh look at an old approach," in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 2, World Scientific, River Edge, NJ, USA, 2008.

[8] W. Y. Tan, L. J. Zhang, W. Chen, and J. M. Zhu, "A stochastic model of human colon cancer involving multiple pathways," in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 11, World Scientific, River Edge, NJ, USA, 2008.

[9] W. Y. Tan and H. Zhou, "A new stochastic model of retinoblastoma involving both hereditary and non-hereditary cancer cases," Journal of Carcinogenesis and Mutagenesis, vol. 2, no. 2, article 117, 2011.

[10] X. W. Yan, Stochastic and State Space Models of Carcinogenesis Under Complex Situation [Ph.D. thesis], Department of Mathematical Sciences, University of Memphis, Memphis, Tenn, USA.

[11] G. L. Yang and C. W. Chen, "A stochastic two-stage carcinogenesis model: a new approach to computing the probability of observing tumor in animal bioassays," Mathematical Biosciences, vol. 104, no. 2, pp. 247–258, 1991.

[12] H. Osada and T. Takahashi, "Genetic alterations of multiple tumor suppressors and oncogenes in the carcinogenesis and progression of lung cancer," Oncogene, vol. 21, no. 48, pp. 7421–7434, 2002.

[13] I. I. Wistuba, L. Mao, and A. F. Gazdar, "Smoking molecular damage in bronchial epithelium," Oncogene, vol. 21, no. 48, pp. 7298–7306, 2002.

[14] S. Landreville, O. A. Agapova, and J. W. Harbour, "Emerging insights into the molecular pathogenesis of uveal melanoma," Future Oncology, vol. 4, no. 5, pp. 629–636, 2008.

[15] H. W. Mensink, D. Paridaens, and A. De Klein, "Genetics of uveal melanoma," Expert Review of Ophthalmology, vol. 4, no. 6, pp. 607–616, 2009.

[16] W. Y. Tan and X. W. Yan, "A new stochastic and state space model of human colon cancer incorporating multiple pathways," Biology Direct, vol. 5, article 26, 2010.

[17] L. B. Klebanov, S. T. Rachev, and A. Y. Yakovlev, "A stochastic model of radiation carcinogenesis: latent time distributions and their properties," Mathematical Biosciences, vol. 113, no. 1, pp. 51–75, 1993.

[18] A. Y. Yakovlev and A. D. Tsodikov, Stochastic Models of Tumor Latency and Their Biostatistical Applications, World Scientific, River Edge, NJ, USA, 1996.

[19] H. Fakir, W. Y. Tan, L. Hlatky, P. Hahnfeldt, and R. K. Sachs, "Stochastic population dynamic effects for lung cancer progression," Radiation Research, vol. 172, no. 3, pp. 383–393, 2009.

[20] A. G. Knudson, "Mutation and cancer: statistical study of retinoblastoma," Proceedings of the National Academy of Sciences of the United States of America, vol. 68, no. 4, pp. 820–823, 1971.

[21] J. F. Crow and M. Kimura, An Introduction to Population Genetics Theory, Harper and Row, New York, NY, USA, 1970.

[22] W. Y. Tan, Stochastic Models With Applications to Genetics, Cancers, AIDS and Other Biomedical Systems, World Scientific, River Edge, NJ, USA, 2002.

[23] W. Y. Tan, C. W. Chen, and L. J. Zhang, "Cancer risk assessment by state space models," in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 13, World Scientific, River Edge, NJ, USA, 2008.

[24] W. Y. Tan and C. W. Chen, "Stochastic modeling of carcinogenesis: some new insights," Mathematical and Computer Modelling, vol. 28, no. 11, pp. 49–71, 1998.

[25] W. Y. Tan and C. W. Chen, "Cancer stochastic models," in Encyclopedia of Statistical Sciences, Revised edition, John Wiley and Sons, New York, NY, USA, 2005.

[26] E. G. Luebeck and S. H. Moolgavkar, "Multistage carcinogenesis and the incidence of colorectal cancer," Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 23, pp. 15095–15100, 2002.

[27] R. Durrett, J. Mayberry, S. Moseley, and D. Schmidt, "Probability models for cancer development and progression," Google Search, Google, 2012.

[28] G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis, Addison-Wesley, Reading, Mass, USA, 1973.

[29] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm (with discussion)," Journal of the Royal Statistical Society B, vol. 39, pp. 1–38, 1977.

[30] A. F. M. Smith and A. E. Gelfand, "Bayesian statistics without tears: a sampling-resampling perspective," American Statistician, vol. 46, pp. 84–88, 1992.

[31] A. E. Loercher and J. W. Harbour, "Molecular genetics of uveal melanoma," Current Eye Research, vol. 27, no. 2, pp. 69–74, 2003.

[32] C. S. Potten, C. Booth, and D. Hargreaves, "The small intestine as a model for evaluating adult tissue stem cell drug targets," Cell Proliferation, vol. 36, no. 3, pp. 115–129, 2003.

[33] C. J. Lynch and J. Milner, "Loss of one p53 allele results in four-fold reduction of p53 mRNA and protein: a basis for p53 haplo-insufficiency," Oncogene, vol. 25, no. 24, pp. 3463–3470, 2006.


[Figure 1 diagram. Stage labels, in order of appearance: normal epithelium, hyperplasia/metaplasia, dysplasia, carcinoma in situ, invasive cancer, squamous cell carcinoma (NSCLC). Event labels: 3p LOH; 8p LOH; 9p21 LOH (p16, p14); 17p13 (p53) LOH (or p53 mutation); (RASSF1, BLU, Robo1/Dutt1, FHIT, etc.); telomerase expression (prevention of telomere erosion); (disruption of cell cycle checkpoint, resistance to apoptosis); K-Ras mutation (activation of growth signals); VEGF expression; COX-2 expression.]

Figure 1: Histopathology lesions and genetic pathway of squamous cell carcinoma of non-small cell lung cancer (NSCLC).

[Figure 2 diagram. Labels: PTEN, BAP1 or Rb1 or DDEF1; GNAQ, GNA11, CDK4; BCL2, HDM2, or others; BRCA2 or P16; 3p loss; Chr 3 loss; cell cycle progression; cell survival; cancer progression; metastasis gain; tumor.]

Figure 2: A multistage model of uveal melanoma (adult human eye cancer).

tumors. In Section 4, assuming that we have some cancer incidence data such as the SEER data from NCI/NIH, we proceed to develop statistical models for these data from these multistage models of carcinogenesis. In Section 5, by combining the models in Sections 2–4, we proceed to develop a generalized Bayesian inference and Gibbs sampling procedures to estimate the unknown parameters, to validate the model, and to predict cancer incidence. As an example of application, in Section 6 we proceed to develop a multistage model of human eye cancer with inherited cancer cases, as described in Figure 2. We will illustrate the model and methods by analyzing the SEER data of human eye cancer from NCI/NIH. Finally, in Section 7, we will discuss the usefulness of the model and the methods developed in this paper and point out some future research directions.

2. The Stochastic Multistage Model of Carcinogenesis with Inherited Cancer Cases

The $k$-stage multistage model of carcinogenesis views carcinogenesis as the end point of $k$ ($k \ge 2$) discrete, heritable, and irreversible events (mutations, genetic changes, or epigenetic changes) with intermediate cells subjected to stochastic proliferation and differentiation (Little [1]; Tan [2, 3]; Tan et al. [4, 5]; Weinberg [6]). Let $N = J_0$ denote normal stem cells, $T$ the cancer tumors, and $J_i$ the $i$th stage initiated cells arising from the $(i-1)$th stage initiated cells ($i = 1, \ldots, k$) by some genetic and/or epigenetic changes. Then the model assumes $N \to J_1 \to J_2 \to \cdots \to J_k \to \text{Tumor}$, with the $J_i$ cells subject to stochastic proliferation (birth) and differentiation (death). Further, it assumes that each stem cell proceeds independently of other cells and that cancer tumors develop from primary $J_k$ cells by clonal expansion (stochastic birth and death), where primary $J_k$ cells are $J_k$ cells which arise directly from $J_{k-1}$ cells; see Yang and Chen [11].

For example, Figure 1 is a multistage pathway for the squamous NSCLC (non-small cell lung cancer) as proposed by Osada and Takahashi [12] and Wistuba et al. [13]. Similarly, Figure 2 is the multistage model for uveal melanoma proposed by Landreville et al. [14] and Mensink et al. [15], while Figure 3 is the APC-$\beta$-catenin-Tcf pathway for human colon cancer (Tan et al. [8]; Tan and Yan [16]).

Remark 1. To develop stochastic multistage models of carcinogenesis, in the literature (Little [1]; Tan [2]; Zheng [7]) it is conveniently assumed that the $J_k$ cells grow instantaneously into cancer tumors as soon as they are generated. In this case, the number of tumors is equal to the number of $J_k$ cells and one may identify $J_k$ cells as tumors. It follows that the number of tumors is a Markov process and that the $J_k$ cells are transient cells. In these cases, one needs only to deal with $T(t)$ and $J_j$ cells with $j = 1, \ldots, k-1$. However, as shown by Yang and Chen [11], the number of tumors is much smaller than the total number of $J_k$ cells. Also, in many animal models and in cancer risk assessment of radiation, Klebanov et al. [17], Yakovlev and Tsodikov [18], and Fakir et al. [19] have shown that $T(t)$ is in general not Markov.

[Figure 3 diagram. Two panels: (a) Sporadic (about 70–75%) and (b) FAP (familial adenomatous polyps) (about 1%). Labels: N; APC; second copy APC; Ras; Smad4/DCC; second copy Smad4/DCC; P53; dysplastic ACF; carcinomas.]

Figure 3: The APC-$\beta$-catenin-Tcf-myc pathway for human colon cancer.

To extend the above model to include hereditary cancers, observe that mutants of cancer genes exist in the population and that both germline cells (egg and sperm) and somatic cells may carry mutant alleles of cancer genes [2, 20]. Further, without exception, every human being develops from the embryo in his/her mother's womb (embryo stage; denote time by 0), where stem cells of different organs divide and differentiate to develop the respective organs (see Weinberg [6], Chapter 10). If both the egg and the sperm generating the embryo carry mutant alleles of relevant cancer genes, then the individual is a $J_2$-stage person at the embryo stage; if only one of the germline cells (egg or sperm) generating the embryo carries mutant alleles of cancer genes, then at the embryo stage the individual is a $J_1$-stage person. Similarly, the individual is a normal person ($N = J_0$ person) at the embryo stage if neither the egg nor the sperm generating the embryo carries mutant alleles of cancer genes. Refer to a person in the population as a $J_i$ ($i = 0, 1, 2$) person if he/she is a $J_i$-stage person at the embryo stage. Then, with respect to the cancer development in question, people in the population can be classified into 3 types: normal people ($N = J_0$ people), $J_1$ people, and $J_2$ people. Based on this classification, for normal people in the population the stochastic model of carcinogenesis is a $k$-stage multievent model given by $J_0 \to J_1 \to \cdots \to J_k \to \text{Tumor}$; for $J_1$ people in the population the stochastic model of carcinogenesis is a $(k-1)$-stage multievent model given by $J_1 \to J_2 \to \cdots \to J_k \to \text{Tumor}$; and for $J_2$ people in the population the stochastic model of carcinogenesis is a $(k-2)$-stage multievent model given by $J_2 \to \cdots \to J_k \to \text{Tumor}$.

To account for inherited cancer cases, let $p_1$ be the proportion of $J_1$ people in the population and $p_2$ the proportion of $J_2$ people in the population. In large human populations under steady-state conditions, one may practically assume that $p_i$ is a constant independent of time (Crow and Kimura [21]). Then $p_0 = 1 - p_1 - p_2$ ($0 < p_1 + p_2 < 1$) is the proportion of normal people (i.e., $N = J_0$ people) in the population. Let $n$ be the population size and $n_i$ ($i = 0, 1, 2$) the number of $J_i$ people in the population, so that $\sum_{u=0}^{2} n_u = n$. Assume that $n$ is very large and that marriage between people in the population is random with respect to cancer genes; then, as shown in Crow and Kimura [21] (see also Tan [22], Chapter 2), the conditional probability distribution of $(n_1, n_2)$ given $n$ is 2-dimensional multinomial with parameters $(n; p_1, p_2)$. That is,

$$
(n_1, n_2) \mid n \sim \text{Multinomial}(n; p_1, p_2). \tag{1}
$$
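The multinomial law for the genotype counts can be illustrated with a small simulation. The following is a minimal sketch using only the Python standard library; the function name and the values of $n$, $p_1$, and $p_2$ are illustrative assumptions, not estimates from this paper.

```python
import random


def sample_genotype_counts(n, p1, p2, rng):
    """Draw (n0, n1, n2): each of n people is independently a J1 person with
    probability p1, a J2 person with probability p2, and normal otherwise."""
    if not (0.0 < p1 + p2 < 1.0):
        raise ValueError("need 0 < p1 + p2 < 1")
    n1 = n2 = 0
    for _ in range(n):
        x = rng.random()
        if x < p1:
            n1 += 1
        elif x < p1 + p2:
            n2 += 1
    return n - n1 - n2, n1, n2


# Hypothetical values, chosen only for illustration.
rng = random.Random(2013)  # fixed seed so the run is reproducible
n0, n1, n2 = sample_genotype_counts(100_000, 0.01, 0.0005, rng)
print(n0, n1, n2)
```

With these hypothetical proportions, the counts $n_1$ and $n_2$ fluctuate around $n p_1 = 1000$ and $n p_2 = 50$, while $n_0 = n - n_1 - n_2$ by construction.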

To derive the probability distribution of time to cancer under the above model, observe that during pregnancy the proliferation rates of all stem cells are quite high. Thus, with positive probability, $J_2$ people in the population may acquire additional genetic and/or epigenetic changes during pregnancy to become $J_3$-stage people at birth. Similarly, $J_1$ people may acquire genetic and/or epigenetic changes during pregnancy to become $J_2$ people at birth, albeit the probability is very small; normal people at the embryo stage may acquire some genetic and/or epigenetic changes during pregnancy to become $J_1$ people at birth. Because the probability of genetic and epigenetic changes is small, one may practically assume that a $J_i$ ($i = 0, 1, 2$) person at the embryo stage would only give rise to $J_i$ stem cells and possibly $J_{i+1}$ stem cells at birth. This is equivalent to assuming that $J_i$ people at the embryo stage would not generate $J_{i+j}$ ($j > 1$) stem cells at or before birth. This model is represented schematically in Figure 4. Notice that if $k = 2$, one may practically assume that with probability one a $J_2$ person at the embryo stage would develop cancer at or before birth ($t_0$). If $k = 3$, then with probability $\alpha$ ($\alpha > 0$) a $J_2$ person at the embryo stage would develop cancer at or before birth.

[Figure 4 diagram. Two panels, "Two-stage model" and "$k$-stage model ($k \ge 3$)", each showing the embryo state and the state at birth. Other recoverable labels: Tumor; $\alpha$; $1 - \alpha$; ($k > 3$); tumor ($k = 3$).]

Figure 4: Embryo genotypes and their frequencies at embryo stage and at birth.

3. The Stochastic Process of Carcinogenesis with Hereditary Cancer Cases and Mathematical Analysis

Because tumors are developed from primary $J_k$ cells, for the above stochastic model the identifiable response variables are $T(t)$ and $\{J_u(t; i),\ i = 0, 1, 2,\ u = i, i+1, \ldots, k-1\}$, where $T(t)$ is the number of cancer tumors at time $t$ and $J_u(t; i)$ is the number of $J_u$ ($u = i, i+1, \ldots, k-1$) cells at time $t$ in people who are $J_i$ people at the embryo stage (see [3, 5, 8, 23]; Remarks 1 and 2). For people who have genotype $J_i$ ($i = 0, 1, 2$) at the embryo stage, the stochastic model of carcinogenesis is then given by the stochastic process $\{\underset{\sim}{X}_i(t), T(t),\ t > 0\}$, where $\underset{\sim}{X}_i(t) = \{J_u(t; i),\ u = i, i+1, \ldots, k-1\}'$. For these processes, in the next subsections we will derive stochastic equations for the state variables ($J_u(t; i)$, $i = 0, 1, 2$, $u = i, \ldots, k-1$); we will also derive the probability distributions of these state variables and the probabilities of developing cancer tumors. These are the basic approaches for modeling carcinogenesis used by the first author and his associates; see Tan [3], Tan et al. [4, 5, 8, 23], Tan and Zhou [9], Tan and Yan [16], and Tan and Chen [24, 25], and Remark 3.

Remark 2. At any time (say $t$), the total number of $J_k$ cells is equal to the total number of $J_k$ cells generated from $J_{k-1}$ cells at time $t$ plus the total number of $J_k$ cells generated by cell division from other $J_k$ cells at time $t$; the former $J_k$ cells are referred to as primary $J_k$ cells, while the latter are not primary $J_k$ cells. Since each tumor is developed from a single primary $J_k$ cell through a stochastic birth and death process, each primary $J_k$ cell will generate at most one tumor. It follows that at any time the total number of $J_k$ cells is considerably greater than the number of cancer tumors (see also Yang and Chen [11]). Thus, for generating cancer tumors, the only identifiable state variables are the number of $J_j$ cells ($j = 0, 1, \ldots, k-1$) and the number of detectable cancer tumors.

Remark 3. To model stochastic multistage models of carcinogenesis, the standard traditional approach is to assume that the last stage cells (i.e., the $J_k$ cells in the model $N \to J_1 \to \cdots \to J_k \to \text{Tumor}$) grow instantaneously into a cancer tumor as soon as they are generated, and then to apply the standard Markov theory to $T(t)$ and to the state variables $\underset{\sim}{X}(t) = \{J_i(t),\ i = 0, 1, \ldots, k-1\}$. This approach has been described in detail in Tan [2], Little [1], and Zheng [7]; see also Luebeck and Moolgavkar [26] and Durrett et al. [27]. However, in some cases the assumption of instantaneous growth of $J_k$ cells into cancer tumors may not be realistic (Klebanov et al. [17]; Yakovlev and Tsodikov [18]; Fakir et al. [19]); in these cases, $T(t)$ is not Markov, so that the Markov theory method is not applicable to $T(t)$. To develop analytical results and to resolve many difficult issues, Tan and his associates [4, 5, 24] have proposed an alternative approach through stochastic equations and have followed Yang and Chen [11] in assuming that cancer tumors develop by clonal expansion from primary last stage cells. Through the probability generating function method, Tan and Chen [24] have shown that if the Markov theory is applicable to $T(t)$, then the stochastic equation method is equivalent to the classical Markov theory method but is more powerful. Also, through the stochastic equation method, we have shown in the Appendix that the classical approach provides a close approximation to the discrete time model under the assumption that the primary last stage cells develop into a detectable tumor in one time unit. This provides a reasonable explanation of why the traditional approach (see [2, 22]) can still work well even though the Markov assumption for $T(t)$ may not hold. In this paper, we will thus basically use the stochastic equation method and assume that cancer tumors develop from primary last stage cells through clonal expansion.

3.1. Stochastic Equations for the State Variables. Assume now that an individual is a $J_i$ ($i = 0, 1, 2$) person at the embryo stage. Then in this individual cancer is developed by a $(k-i)$-stage multievent model given by $J_i \to \cdots \to J_k \to \text{Tumor}$, and the identifiable response variables are given by $\{\underset{\sim}{X}_i(t)', T(t)\}$. To derive stochastic equations for the staging variables in $\underset{\sim}{X}_i(t)$ in this individual, observe that for each $i = 0, 1, 2$, $\underset{\sim}{X}_i(t)'$ is in general a Markov process although $T(t)$ may not be Markov; see Remark 1, Tan [3], Tan et al. [4, 5], Tan and Zhou [9], and Tan and Yan [16]. It follows that $\underset{\sim}{X}_i(t + \Delta t)'$ derives from $\underset{\sim}{X}_i(t)'$ through stochastic birth-death processes of $J_u$ ($u = i, i+1, \ldots, k-1$) cells and through stochastic transitions $J_u \to J_{u+1}$, $u = i, i+1, \ldots, k-1$, during $(t, t + \Delta t]$. Let $\{B_u(t; i), D_u(t; i), M_u(t; i)\}$ be the number of births, the number of deaths of $J_u$ cells, and the number of transitions $J_u \to J_{u+1}$ during $(t, t + \Delta t]$, respectively, in people who are $J_i$ people at the embryo stage. Let $M_0(t)$ denote the number of transitions $N \to J_1$ during $(t, t + \Delta t]$. Because the transition $J_u \to J_{u+1}$ would not affect the number of $J_u$ cells but only increase the number of $J_{u+1}$ cells (see Remark 4), by the conservation law we have the following stochastic equations for $J_u(t; i)$, $u = i, \ldots, k-1$, $i = 0, 1, 2$ (see Tan [3], Tan et al. [4, 5, 8], Tan and Zhou [9], and Tan and Yan [16]):

$$
\begin{aligned}
J_i(t + \Delta t; i) &= J_i(t; i) + B_i(t; i) - D_i(t; i), \qquad i = 0, 1, 2, \\
J_u(t + \Delta t; i) &= J_u(t; i) + B_u(t; i) - D_u(t; i) + M_{u-1}(t; i), \qquad i < u \le k-1.
\end{aligned} \tag{2}
$$

Because $B_v(t; i)$, $D_v(t; i)$, $M_v(t; i)$, $i = 0, 1, 2$, $v = i, \ldots, k-1$, are random variables, the above equations are basically stochastic equations. To derive probability distributions of these variables, let $b_u(t)$ and $d_u(t)$ denote the birth rate and the death rate at time $t$ of the $J_u$ ($u = 0, 1, \ldots, k-1$) cells, respectively. Let $\beta_u(t)$ ($u = 0, 1, \ldots, k-1$) be the transition rate at time $t$ from $J_u \to J_{u+1}$. Then, as shown in Tan [3], for ($i = 0, 1, 2$, $u = i, \ldots, k-1$) we have, to the order of $o(\Delta t)$,

$$
\{B_u(t; i), D_u(t; i), M_u(t; i)\} \mid J_u(t; i) \sim \text{Multinomial}\{J_u(t; i);\ b_u(t)\Delta t,\ d_u(t)\Delta t,\ \beta_u(t)\Delta t\}. \tag{3}
$$

It follows that, to the order of $o(\Delta t)$,

$$
\begin{aligned}
\{B_u(t; i), D_u(t; i)\} \mid J_u(t; i) &\sim \text{Multinomial}\{J_u(t; i);\ b_u(t)\Delta t,\ d_u(t)\Delta t\}, \\
M_u(t; i) \mid J_u(t; i) &\sim \text{Binomial}\{J_u(t; i);\ \beta_u(t)\Delta t\} \\
&\sim \text{Poisson}\{J_u(t; i)\,\beta_u(t)\Delta t\} + o(\beta_u(t)\Delta t), \\
&\quad \text{independently of } \{B_u(t; i), D_u(t; i)\}, \qquad u = 0, 1, \ldots, k-1.
\end{aligned} \tag{4}
$$
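The updating scheme defined by (2) and (3) can be sketched as a one-step simulation. This is a minimal illustration using only the Python standard library, assuming constant rates over the interval; the function names `step_stage` and `step_chain` and all numerical values are hypothetical and are not taken from the paper.

```python
import random


def step_stage(j_u, b, d, beta, dt, rng):
    """Draw (births, deaths, transitions) among j_u cells over one interval dt.

    Each cell independently divides, dies, or yields one J_{u+1} cell with
    probabilities b*dt, d*dt, beta*dt (and does nothing otherwise), which is
    the multinomial law (3).
    """
    pb, pd, pm = b * dt, d * dt, beta * dt
    if pb + pd + pm > 1.0:
        raise ValueError("dt is too large for the given rates")
    births = deaths = transitions = 0
    for _ in range(j_u):
        x = rng.random()
        if x < pb:
            births += 1
        elif x < pb + pd:
            deaths += 1
        elif x < pb + pd + pm:
            transitions += 1
    return births, deaths, transitions


def step_chain(counts, b, d, beta, dt, rng):
    """Advance [J_i, ..., J_{k-1}] by one interval using equations (2)."""
    new_counts = []
    inflow = 0  # M_{u-1}: transitions arriving from the previous stage
    for j_u, b_u, d_u, beta_u in zip(counts, b, d, beta):
        births, deaths, transitions = step_stage(j_u, b_u, d_u, beta_u, dt, rng)
        new_counts.append(j_u + births - deaths + inflow)
        inflow = transitions  # the parent cell itself stays in stage u
    # After the loop, inflow is the number of transitions out of the last stage.
    return new_counts, inflow


# Hypothetical three-stage example: counts of J_0, J_1, J_2 cells.
rng = random.Random(7)
counts, out = step_chain([5000, 200, 0],
                         b=[0.02, 0.03, 0.05],
                         d=[0.02, 0.01, 0.02],
                         beta=[0.001, 0.002, 0.004],
                         dt=1.0, rng=rng)
print(counts, out)
```

Note that a transition does not remove the parent cell from its stage, so the first stage changes only through births and deaths, as in the first line of (2).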

From these distribution results, by subtracting from the random transition variables their conditional means, respectively, we obtain the following stochastic equations for the state variables $\underset{\sim}{X}_i(t)$ ($i = 0, 1, \ldots, \operatorname{Min}(2, k-1)$):

$$
\begin{aligned}
dJ_i(t; i) &= J_i(t + \Delta t; i) - J_i(t; i) = B_i(t; i) - D_i(t; i) \\
&= J_i(t; i)\,\gamma_i(t)\,\Delta t + e_i(t; i)\,\Delta t, \qquad i = 0, 1, 2, \\
dJ_u(t; i) &= J_u(t + \Delta t; i) - J_u(t; i) = M_{u-1}(t; i) + B_u(t; i) - D_u(t; i) \\
&= \{J_{u-1}(t; i)\,\beta_{u-1}(t) + J_u(t; i)\,\gamma_u(t)\}\,\Delta t + e_u(t; i)\,\Delta t, \\
&\qquad i = 0, 1, \ldots, \operatorname{Min}(2, k-1), \quad i < u \le k-1,
\end{aligned} \tag{5}
$$

where $\gamma_u(t) = b_u(t) - d_u(t)$ for $u = 0, 1, \ldots, k-1$, and where $e_i(t; i)\Delta t = [B_i(t; i) - J_i(t; i) b_i(t)\Delta t] - [D_i(t; i) - J_i(t; i) d_i(t)\Delta t]$ for $i = 0, 1, 2$, and $e_u(t; i)\Delta t = [B_u(t; i) - J_u(t; i) b_u(t)\Delta t] - [D_u(t; i) - J_u(t; i) d_u(t)\Delta t] + [M_{u-1}(t; i) - J_{u-1}(t; i)\beta_{u-1}(t)\Delta t]$ for $i = 0, 1, \ldots, \operatorname{Min}(2, k-1)$, $i < u \le k-1$.

From the above equations, by dividing both sides by $\Delta t$ and letting $\Delta t \to 0$, we obtain

$$
\begin{aligned}
\frac{d}{dt} J_i(t; i) &= J_i(t; i)\,\gamma_i(t) + e_i(t; i), \qquad i = 0, 1, \ldots, \operatorname{Min}(2, k-1), \\
\frac{d}{dt} J_u(t; i) &= J_u(t; i)\,\gamma_u(t) + J_{u-1}(t; i)\,\beta_{u-1}(t) + e_u(t; i), \\
&\qquad \text{for } i = 0, 1, \ldots, \operatorname{Min}(2, k-1), \quad i < u \le k-1.
\end{aligned} \tag{6}
$$

In the above equations, using the distribution results in (4), it can easily be shown that the random noises $e_u(t; i)$, $u = i, \ldots, k-1$, $i = 0, 1, 2$, have expected value zero and are uncorrelated with the staging variables and $T(t)$. The initial conditions at birth ($t_0$) for the above stochastic differential equations are $\{J_u(t_0; i) > 0,\ u = i, i+1;\ J_u(t_0; i) = 0,\ u > i+1\}$.

Given the initial conditions ($J_u(t_0; i) > 0$, $u = i, i+1$) and ($J_u(t_0; i) = 0$, $u > i+1$) at birth ($t_0$), the solution of the equations in (6) is given respectively by

$$
\begin{aligned}
J_i(t; i) &= J_i(t_0; i)\, e^{\int_{t_0}^{t} \gamma_i(x)\,dx} + \eta_i(t; i), \qquad i = 0, 1, \ldots, \operatorname{Min}(2, k-1), \\
J_u(t; i) &= J_u(t_0; i)\, e^{\int_{t_0}^{t} \gamma_u(x)\,dx} + \int_{t_0}^{t} J_{u-1}(x; i)\,\beta_{u-1}(x)\, e^{\int_{x}^{t} \gamma_u(y)\,dy}\,dx + \eta_u(t; i) \\
&= \cdots = J_u(t_0; i)\, e^{\int_{t_0}^{t} \gamma_u(x)\,dx} + \sum_{v=1}^{u-i} J_{u-v}(t_0; i)\,\phi_u^{(v)}(t; i) + \sum_{v=1}^{u+1-i} \eta_u^{(v)}(t; i), \\
&\qquad u = i+1, \ldots, k-1,
\end{aligned} \tag{7}
$$

where $i = 0$ if $k = 2$; $i = 0, 1$ if $k = 3$; and $i = 0, 1, 2$ if $k > 3$,

where

$$
\begin{aligned}
\phi_u^{(1)}(t; i) &= \int_{t_0}^{t} e^{\int_{x}^{t} \gamma_u(y)\,dy + \int_{t_0}^{x} \gamma_{u-1}(y)\,dy}\,\beta_{u-1}(x)\,dx, \qquad u = i, \ldots, k-1, \\
\phi_u^{(v)}(t; i) &= \int_{t_0}^{t} e^{\int_{x}^{t} \gamma_u(y)\,dy}\,\beta_{u-1}(x)\,\phi_{u-1}^{(v-1)}(x; i)\,dx, \qquad v = 2, \ldots, u-i, \\
\eta_u(t; i) &= \eta_u^{(1)}(t; i) = \int_{t_0}^{t} e^{\int_{x}^{t} \gamma_u(y)\,dy}\, e_u(x; i)\,dx, \qquad u = i, \ldots, k-1, \\
\eta_u^{(v)}(t; i) &= \int_{t_0}^{t} e^{\int_{x}^{t} \gamma_u(y)\,dy}\,\beta_{u-1}(x)\,\eta_{u-1}^{(v-1)}(x; i)\,dx, \qquad v = 2, \ldots, u-i.
\end{aligned} \tag{8}
$$

If the model is time homogeneous, so that $\beta_u(t) = \beta_u$, $b_u(t) = b_u$, $d_u(t) = d_u$, $\gamma_u(t) = b_u - d_u = \gamma_u$, $u = 0, 1, \ldots, k-1$, and if $\gamma_i \ne \gamma_u$ for $i \ne u$, the above solutions under the initial conditions ($J_{i+u}(t_0; i) > 0$, $u = 0, 1$; $J_{i+u}(t_0; i) = 0$, $u > 1$) then reduce respectively to

$$
\begin{aligned}
J_i(t; i) &= J_i(t_0; i)\, e^{\gamma_i (t - t_0)} + \eta_i^{(1)}(t; i), \qquad i = 0, 1, \ldots, \operatorname{Min}(2, k-1), \\
J_u(t; i) &= J_u(t_0; i)\, e^{\gamma_u (t - t_0)} + \beta_{u-1} \int_{t_0}^{t} J_{u-1}(x; i)\, e^{\gamma_u (t - x)}\,dx + \eta_u^{(1)}(t; i) \\
&= \cdots = J_u(t_0; i)\, e^{\gamma_u (t - t_0)} + \sum_{r=i}^{u-1} J_r(t_0; i) \left( \prod_{v=r}^{u-1} \beta_v \right) \sum_{l=r}^{u} A_{ru}(l)\, e^{\gamma_l (t - t_0)} + \sum_{r=i}^{u} \eta_u^{(u+1-r)}(t; i) \\
&= \sum_{r=i}^{i+1} J_r(t_0; i) \left( \prod_{v=r}^{u-1} \beta_v \right) \sum_{l=r}^{u} A_{ru}(l)\, e^{\gamma_l (t - t_0)} + \sum_{r=i}^{u} \eta_u^{(u+1-r)}(t; i), \qquad i < u \le k-1,
\end{aligned} \tag{9}
$$

where, for $i \le v \le u$,

$$
A_{iu}(v) =
\begin{cases}
1 & \text{if } i = u, \\
\displaystyle\prod_{l=i,\ l \ne v}^{u} (\gamma_l - \gamma_v)^{-1} & \text{if } i < u.
\end{cases} \tag{10}
$$

Obviously, $E[\eta_u^{(r)}(t; i)] = 0$ for all ($i = 0, 1, 2$, $u = i, \ldots, k-1$, $r = 1, \ldots, u-i+1$). It follows that for ($i = 0, 1, \ldots, \operatorname{Min}(k-1, 2)$), the expected values $E[J_{k-1}(t; i)]$ of $J_{k-1}(t; i)$ for homogeneous models with $\gamma_i \ne \gamma_u$ if $i \ne u$ are given by

$$
\begin{aligned}
E[J_{k-1}(t; i)] &= E[J_{i+1}(t_0; i)] \left( \prod_{u=i+1}^{k-2} \beta_u \right) \sum_{v=i+1}^{k-1} A_{(i+1)(k-1)}(v)\, e^{\gamma_v (t - t_0)} \\
&\quad + E[J_i(t_0; i)] \left( \prod_{u=i}^{k-2} \beta_u \right) \sum_{v=i}^{k-1} A_{i(k-1)}(v)\, e^{\gamma_v (t - t_0)}, \\
&\qquad i = 0, 1, \ldots, \operatorname{Min}(k-1, 2), \quad k \ge 2,
\end{aligned} \tag{11}
$$

where, as a convention, ($\sum_{j=i+1}^{i} c_j = 0$, $\prod_{j=i+1}^{i} d_j = 1$) for all ($c_j$, $d_j$).
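The closed-form mean can be checked numerically against the mean equations obtained by taking expectations in (6) with constant rates, $dm_u/dt = \gamma_u m_u + \beta_{u-1} m_{u-1}$. The following is a minimal sketch with hypothetical rates and initial means, none of which come from the paper. One caveat, stated as an assumption of the sketch: for the closed form to satisfy these mean equations, the coefficient of $e^{\gamma_v (t - t_0)}$ has to be $\prod_{l \ne v} (\gamma_v - \gamma_l)^{-1}$, so the code takes the differences in that orientation.

```python
import math


def coeff(gammas, r, u, v):
    """Coefficient of exp(gamma_v * s) for the chain from stage r to stage u.

    Assumption of this sketch: differences are taken as (gamma_v - gamma_l),
    the orientation that satisfies the mean equations.
    """
    if r == u:
        return 1.0
    prod = 1.0
    for l in range(r, u + 1):
        if l != v:
            prod /= (gammas[v] - gammas[l])
    return prod


def mean_closed_form(m_init, gammas, betas, i, s):
    """E[J_{k-1}] at elapsed time s = t - t0 for a J_i person, in the form of (11).

    m_init maps a stage index to its mean count at birth; only stages i and
    i+1 are used. gammas[u], betas[u] are constant rates for u = 0..k-1.
    """
    last = len(gammas) - 1  # index k-1
    total = 0.0
    for r in (i, i + 1):
        if r > last or m_init.get(r, 0.0) == 0.0:
            continue
        beta_prod = math.prod(betas[r:last])  # beta_r ... beta_{k-2}; 1 if empty
        series = sum(coeff(gammas, r, last, v) * math.exp(gammas[v] * s)
                     for v in range(r, last + 1))
        total += m_init[r] * beta_prod * series
    return total


def mean_by_rk4(m_init, gammas, betas, i, s, steps=2000):
    """Integrate dm_u/ds = gamma_u*m_u + beta_{u-1}*m_{u-1} with classical RK4."""
    k = len(gammas)
    m = [m_init.get(u, 0.0) if u >= i else 0.0 for u in range(k)]

    def rhs(x):
        out = [0.0] * k
        for u in range(i, k):
            out[u] = gammas[u] * x[u]
            if u > i:
                out[u] += betas[u - 1] * x[u - 1]
        return out

    h = s / steps
    for _ in range(steps):
        k1 = rhs(m)
        k2 = rhs([a + 0.5 * h * b for a, b in zip(m, k1)])
        k3 = rhs([a + 0.5 * h * b for a, b in zip(m, k2)])
        k4 = rhs([a + h * b for a, b in zip(m, k3)])
        m = [a + h / 6.0 * (p + 2.0 * q + 2.0 * w + z)
             for a, p, q, w, z in zip(m, k1, k2, k3, k4)]
    return m[-1]


# Hypothetical 4-stage example (k = 4) for a J_1 person, 30 time units after birth.
gammas = [0.0, 0.05, 0.08, 0.12]   # distinct net growth rates
betas = [1e-3, 2e-3, 5e-3, 1e-2]   # transition rates
m_init = {1: 1000.0, 2: 5.0}       # mean J_1 and J_2 counts at birth
closed = mean_closed_form(m_init, gammas, betas, i=1, s=30.0)
numeric = mean_by_rk4(m_init, gammas, betas, i=1, s=30.0)
print(closed, numeric)
```

The two numbers should agree up to the integration error of the Runge-Kutta scheme; the distinct-rates condition matters here because the coefficients divide by differences of the $\gamma$ values.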

Remark 4. Because genetic changes and epigenetic changes occur during cell division, to the order of $o(\Delta t)$ the probability is $\beta_u(t)\Delta t$ that one $J_u$ cell at time $t$ would give rise to 1 $J_u$ cell and 1 $J_{u+1}$ cell at time $t + \Delta t$ by genetic changes or epigenetic changes. It follows that the transition $J_u \to J_{u+1}$ would not affect the population size of $J_u$ cells but only increase the size of the $J_{u+1}$ population.


3.2. Transition Probabilities and Probability Distributions of Staging Variables. Let $f(x; n, p)$ denote the probability density function of a binomial random variable $X \sim \text{Binomial}(n, p)$, $h(x; \lambda)$ the probability density function of a Poisson random variable $X \sim \text{Poisson}(\lambda)$, and $g(x, y; n, p_1, p_2)$ the probability density function of a bivariate multinomial random vector $(X, Y) \sim \text{Multinomial}(n; p_1, p_2)$. Using the stochastic equations of the staging variables given by (2) and the probability distributions of the transition variables $\{B_r(t; i), D_r(t; i), M_r(t; i)\}$ in (4), as in Tan et al. [4, 5], we obtain the following transition probabilities of $\{J_r(t + \Delta t; i),\ r = i, \ldots, k - 1\}$ given $\{J_r(t; i),\ r = i, \ldots, k - 1\}$ for ($i = 0, 1, \ldots, \operatorname{Min}(k - 1, 2)$):

$$\begin{aligned} & P\{J_r(t + \Delta t; i) = v_r,\ r = i, i + 1, \ldots, k - 1 \mid J_r(t; i) = u_r,\ r = i, i + 1, \ldots, k - 1\} \\ &\quad = P\{J_i(t + \Delta t; i) = v_i \mid J_i(t; i) = u_i\} \times \prod_{j=i+1}^{k-1} P\{J_j(t + \Delta t; i) = v_j \mid J_j(t; i) = u_j,\ J_{j-1}(t; i) = u_{j-1}\}, \end{aligned} \tag{12}$$

where

$$\begin{aligned} P\{J_i(t + \Delta t; i) = v_i \mid J_i(t; i) = u_i\} &= \sum_{r=0}^{u_i} f(r; u_i, b_i(t)\Delta t) \times f\!\left(u_i - v_i + r;\ u_i - r,\ \frac{d_i(t)\Delta t}{1 - b_i(t)\Delta t}\right), \quad i = 0, 1, \ldots, \operatorname{Min}(k - 1, 2), \\ P\{J_j(t + \Delta t; i) = v_j \mid J_j(t; i) = u_j,\ J_{j-1}(t; i) = u_{j-1}\} &= \sum_{r_1=0}^{u_j} \sum_{r_2=0}^{u_j - r_1} g(r_1, r_2; u_j, b_j(t)\Delta t, d_j(t)\Delta t) \times h(v_j - u_j - r_1 + r_2;\ u_{j-1}\beta_{j-1}\Delta t), \quad j > i. \end{aligned} \tag{13}$$
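The first line of (13) is a convolution of two binomial distributions: $r$ of the $u_i$ cells divide, and some of the remaining $u_i - r$ cells die. A minimal Python sketch of that formula follows; the function names and the arguments `b_dt` and `d_dt` (standing for $b_i(t)\Delta t$ and $d_i(t)\Delta t$) are assumptions made for illustration.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability f(x; n, p); zero outside 0..n."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)

def first_stage_transition(v, u, b_dt, d_dt):
    """P{J_i(t+dt) = v | J_i(t) = u} as in the first line of (13).

    r cells divide (net gain r); u - v + r of the u - r non-dividing cells die.
    """
    p_death = d_dt / (1.0 - b_dt)
    return sum(
        binom_pmf(r, u, b_dt) * binom_pmf(u - v + r, u - r, p_death)
        for r in range(u + 1)
    )
```

Summing the result over $v = 0, \ldots, 2u$ should give 1, and the mean should equal $u(1 + b_i\Delta t - d_i\Delta t)$, which are convenient checks before using the expression inside a sampler.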

Define the unobservable transition variables $\underset{\sim}{U}_i(t) = \{B_i(t; i),\ (B_j(t; i), D_j(t; i)),\ j = i + 1, \ldots, k - 1\}'$ ($i = 0, 1, \ldots, \operatorname{Min}(k - 1, 2)$). Then we have for the joint probability density function of $\{\underset{\sim}{X}_i(t + \Delta t), \underset{\sim}{U}_i(t)\}$ given $\underset{\sim}{X}_i(t)$:

$$P\{\underset{\sim}{X}_i(t + \Delta t), \underset{\sim}{U}_i(t) \mid \underset{\sim}{X}_i(t)\} = P\{\underset{\sim}{X}_i(t + \Delta t) \mid \underset{\sim}{U}_i(t), \underset{\sim}{X}_i(t)\} \times P\{\underset{\sim}{U}_i(t) \mid \underset{\sim}{X}_i(t)\}, \tag{14}$$

where

$$\begin{aligned} P\{\underset{\sim}{X}_i(t + \Delta t) \mid \underset{\sim}{U}_i(t), \underset{\sim}{X}_i(t)\} ={}& f\!\left(J_i(t; i) - J_i(t + \Delta t; i) + B_i(t; i);\ J_i(t; i) - B_i(t; i),\ \frac{d_i(t)\Delta t}{1 - b_i(t)\Delta t}\right) \\ &\times \prod_{j=i+1}^{k-1} h\{J_j(t + \Delta t; i) - J_j(t; i) - B_j(t; i) + D_j(t; i);\ J_{j-1}(t; i)\beta_{j-1}(t)\Delta t\}, \end{aligned} \tag{15}$$

$$P\{\underset{\sim}{U}_i(t) \mid \underset{\sim}{X}_i(t)\} = f(B_i(t; i);\ J_i(t; i),\ b_i(t)\Delta t) \times \prod_{j=i+1}^{k-1} g\{B_j(t; i), D_j(t; i);\ J_j(t; i),\ b_j(t)\Delta t,\ d_j(t)\Delta t\}. \tag{16}$$

Let $\underset{\sim}{e}_i(j)$ be a $(k - i) \times 1$ column vector with 1 in the $j$th position ($1 \le j \le k - i$) and with 0 in other positions. Let $\underset{\sim}{u} = (u_i, \ldots, u_{k-1})'$ and $\underset{\sim}{v} = (v_i, \ldots, v_{k-1})'$ be $(k - i) \times 1$ column vectors of nonnegative integers (i.e., $u_j$ and $v_j$ are nonnegative integers). Then, by using the probability distribution results in (14)–(16), it can readily be shown that

$$P\{\underset{\sim}{X}_i(t + \Delta t) = \underset{\sim}{v} \mid \underset{\sim}{X}_i(t) = \underset{\sim}{u}\} = \begin{cases} [u_j b_j(t) + (1 - \delta_{ji})\, u_{j-1}\beta_{j-1}(t)]\Delta t + o(\Delta t), & \text{if } \underset{\sim}{v} = \underset{\sim}{u} + \underset{\sim}{e}_i(j),\ j = i, \ldots, k - 1, \\ u_j d_j(t)\Delta t + o(\Delta t), & \text{if } \underset{\sim}{v} = \underset{\sim}{u} - \underset{\sim}{e}_i(j),\ j = i, \ldots, k - 1, \\ o(\Delta t), & \text{if } \left|\underset{\sim}{1}'_{n-k}(\underset{\sim}{u} - \underset{\sim}{v})\right| \ge 2. \end{cases} \tag{17}$$

The above results imply that $\underset{\sim}{X}_i(t)$ is a $(k - i)$-dimensional birth-death process with birth rates $\{i b_u(t),\ u = i, \ldots, k - 1\}$, death rates $\{i d_u(t),\ u = i, \ldots, k - 1\}$, and cross-transition rates $\{\alpha_{u,u+1}(j, t) = j\beta_u(t),\ u = i, \ldots, k - 1,\ \beta_{u,v}(j, t) = 0 \text{ if } v \ne u + 1\}$. (See Definition 4.1 in Tan ([22], Chapter 4).) Using these results, it can be shown that the Kolmogorov forward equation for the probabilities $P\{J_j(t; i) = u_j,\ j = i, \ldots, k - 1 \mid J_i(0) = m_i,\ J_j(0) = 0,\ j > i\} = P(u_j,\ j = i, \ldots, k - 1; t)$ ($i = 0, 1, \ldots, \operatorname{Min}(k - 1, 2)$) in the above model is given by

$$\begin{aligned} \frac{d}{dt} P(u_j,\ j = i, \ldots, k - 1; t) ={}& P(u_i - 1, u_j,\ j = i + 1, \ldots, k - 1; t)\,(u_i - 1)\, b_i(t) \\ &+ \sum_{j=i+1}^{k-1} P(u_i, u_{i+1}, \ldots, u_{j-1}, u_j - 1, u_{j+1}, \ldots, u_{k-1}; t)\,(u_j - 1)\, b_j(t) \\ &+ \sum_{j=i}^{k-2} P(u_i, u_{i+1}, \ldots, u_j, u_{j+1} - 1, u_{j+2}, \ldots, u_{k-1}; t)\, u_j \beta_j(t) \\ &+ \sum_{j=i}^{k-1} P(u_i, u_{i+1}, \ldots, u_{j-1}, u_j + 1, u_{j+1}, \ldots, u_{k-1}; t)\,(u_j + 1)\, d_j(t) \\ &- P(u_i, u_{i+1}, \ldots, u_{k-1}; t) \left\{ \sum_{j=i}^{k-1} u_j [b_j(t) + d_j(t)] + \sum_{j=i}^{k-2} u_j \beta_j(t) \right\}, \end{aligned} \tag{18}$$

for $u_j = 0, 1, \ldots, \infty$; $j = i, \ldots, k - 1$.

By using the above set of differential equations, one can readily compute the probabilities $P\{J_j(t; i) = u_j,\ j = i, \ldots, k - 1 \mid J_i(0) = m_i\} = P(u_j,\ j = i, \ldots, k - 1; t)$ numerically.
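As an illustration of such a numerical computation, the sketch below integrates the forward equation in its simplest case, $k - i = 1$, where (18) reduces to a one-dimensional birth-death equation with no mutation term. It is a hedged example only: constant rates `b` and `d`, the truncation level `n_max`, the explicit Euler step `dt`, and the function name are all assumptions introduced here, not choices made in the paper.

```python
import math

def forward_equation_1d(m, b, d, t_end, n_max=80, dt=1e-3):
    """Euler integration of dP(u)/dt = (u-1) b P(u-1) + (u+1) d P(u+1) - u (b+d) P(u).

    Starts from P(m) = 1; the state space is truncated at n_max by
    disallowing births out of state n_max, so total probability is conserved.
    """
    p = [0.0] * (n_max + 1)
    p[m] = 1.0
    for _ in range(int(round(t_end / dt))):
        new = p[:]
        for u in range(n_max + 1):
            birth_out = u * b if u < n_max else 0.0
            inflow = 0.0
            if u >= 1:
                inflow += (u - 1) * b * p[u - 1]
            if u < n_max:
                inflow += (u + 1) * d * p[u + 1]
            new[u] = p[u] + dt * (inflow - (birth_out + u * d) * p[u])
        p = new
    return p
```

With constant rates, the mean of the computed distribution should track $m\, e^{(b - d)t}$, which offers a simple check on the step size and truncation level. For higher-dimensional cases a stiff ODE solver would likely be preferable to the explicit Euler step used here.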

3.3. The Probability Distributions of the Number of Detectable Tumors and Times to Tumors. As shown by Yang and Chen [11], malignant cancer tumors arise from primary $J_k$ cells by clonal expansion, where primary $J_k$ cells are $J_k$ cells generated directly by $J_{k-1}$ cells ($J_k$ cells derived by stochastic birth of other $J_k$ cells are not primary $J_k$ cells). That is, cancer tumors develop from primary $J_k$ cells through stochastic birth-death processes.

To derive the probability distribution for $T(t)$ in $J_i$ people in the population, let $P_T(s, t)$ denote the probability that a primary cancer cell at time $s$ develops into a detectable cancer tumor by time $t$. (An explicit formula for $P_T(s, t)$ has been given in Tan [22], Chapter 8, and in Tan and Chen [24].) Then, as shown in Tan ([3, 22], Chapter 8), the conditional probability distribution of $T(t)$ given $\{J_{k-1}(s; i),\ s \le t\}$ in $J_i$ people is Poisson with mean $\omega(t; i)$, where $\omega(t; i) = \int_{t_0}^{t} J_{k-1}(s; i)\,\beta_{k-1}(s)\, P_T(s, t)\, ds$. That is,

$$T(t) \mid \{J_{k-1}(s; i),\ s \le t\} \sim \text{Poisson}(\omega(t; i)). \tag{19}$$

Let $Q_i(j)$ be the probability that cancer tumors develop during $(t_{j-1}, t_j]$ in $J_i$ people in the population. For time homogeneous models with small $\beta_{k-1}$, $Q_i(j)$ is then given by

$$Q_i(j) = E\left\{e^{-\omega(t_{j-1}; i)} - e^{-\omega(t_j; i)}\right\} = e^{-\beta_{k-1} H_i(t_{j-1})} - e^{-\beta_{k-1} H_i(t_j)} + o(\beta_{k-1}), \tag{20}$$

where $H_i(t) = \int_{t_0}^{t} E[J_{k-1}(x; i)]\, P_T(x, t)\, dx$.

To derive $Q_i(j)$, denote by

$$\begin{aligned} \theta_{i(k-1)} &= E[J_i(t_0; i - 1)]\,\beta_i \prod_{u=i+1}^{k-1} \left(\frac{\beta_u}{\gamma_u}\right), \quad i = 1, \ldots, \operatorname{Min}(3, k - 1), \\ \lambda_{u(k-1)} &= E[J_u(t_0; u)]\,\beta_u \prod_{v=u+1}^{k-1} \left(\frac{\beta_v}{\gamma_v}\right), \quad u = 0, 1, \ldots, \operatorname{Min}(2, k - 1), \end{aligned} \tag{21}$$

and define the functions

$$\psi_{i(k-1)}(t) = \left(\prod_{u=i+1}^{k-1} \gamma_u\right) \sum_{v=i}^{k-1} A_{i(k-1)}(v) \int_{t_0}^{t} e^{\gamma_v (x - t_0)} P_T(x, t)\, dx, \quad i = 0, 1, \ldots, \operatorname{Min}(2, k - 1). \tag{22}$$

Applying the results for $E[J_{k-1}(t; i)]$ given in (11) for time homogeneous models with $\gamma_i \ne \gamma_j$ if $i \ne j$, we obtain the $Q_i(j)$'s as follows.

(1) If $k = 2$, then $k - 1 = 1$. Hence $Q_2(0) = 1$, $Q_i(0) = Q_2(j) = 0$ for ($i = 0, 1$; $j > 0$), and for $j > 0$,

$$Q_0(j) = e^{-\theta_{11}\psi_{11}(t_{j-1}) - \lambda_{01}\psi_{01}(t_{j-1})} - e^{-\theta_{11}\psi_{11}(t_j) - \lambda_{01}\psi_{01}(t_j)} + o(\beta_1), \tag{23}$$

$$Q_1(j) = (1 - \alpha_1)\left\{e^{-\lambda_{11}\psi_{11}(t_{j-1})} - e^{-\lambda_{11}\psi_{11}(t_j)}\right\} + o(\beta_1). \tag{24}$$


(2) If $k \ge 3$, then we have $Q_i(0) = 0$ for $i = 0, 1$ and $Q_2(0) = \delta_{2(k-1)}\alpha$, and for $j > 0$,

$$Q_0(j) = e^{-\theta_{1(k-1)}\psi_{1(k-1)}(t_{j-1}) - \lambda_{0(k-1)}\psi_{0(k-1)}(t_{j-1})} - e^{-\theta_{1(k-1)}\psi_{1(k-1)}(t_j) - \lambda_{0(k-1)}\psi_{0(k-1)}(t_j)} + o(\beta_{k-1}), \tag{25}$$

$$Q_1(j) = e^{-\theta_{2(k-1)}\psi_{2(k-1)}(t_{j-1}) - \lambda_{1(k-1)}\psi_{1(k-1)}(t_{j-1})} - e^{-\theta_{2(k-1)}\psi_{2(k-1)}(t_j) - \lambda_{1(k-1)}\psi_{1(k-1)}(t_j)} + o(\beta_{k-1}), \tag{26}$$

$$\begin{aligned} Q_2(j) ={}& \delta_{2(k-1)}(1 - \alpha)\left\{e^{-\lambda_{22}\psi_{22}(t_{j-1})} - e^{-\lambda_{22}\psi_{22}(t_j)}\right\} \\ &+ (1 - \delta_{2(k-1)})\left\{e^{-\theta_{3(k-1)}\psi_{3(k-1)}(t_{j-1}) - \lambda_{2(k-1)}\psi_{2(k-1)}(t_{j-1})} - e^{-\theta_{3(k-1)}\psi_{3(k-1)}(t_j) - \lambda_{2(k-1)}\psi_{2(k-1)}(t_j)}\right\} + o(\beta_{k-1}), \end{aligned} \tag{27}$$

where $\delta_{2(k-1)} = 1$ if $k - 1 = 2$ and $\delta_{2(k-1)} = 0$ if $k - 1 \ne 2$.

Notice that if $\gamma_0 = 0$, then $\psi_{0(k-1)}(t)$ reduces to

$$\begin{aligned} \psi_{0(k-1)}(t) &= \left(\prod_{u=1}^{k-1} \gamma_u\right) \sum_{v=0}^{k-1} A_{0(k-1)}(v) \int_{t_0}^{t} e^{\gamma_v (x - t_0)} P_T(x, t)\, dx \\ &= \left(\prod_{u=1}^{k-1} \gamma_u\right) \sum_{v=1}^{k-1} A_{1(k-1)}(v) \int_{t_0}^{t} \left[e^{\gamma_v (x - t_0)} - 1\right] P_T(x, t)\, dx. \end{aligned} \tag{28}$$

Notice also that if $P_T(s, t) \approx 1$ for $t > s$ and if $\gamma_0 = 0$, then the above $\psi_{ij}(t)$'s reduce, respectively, to

$$\begin{aligned} \psi_{ii}(t) &= \frac{1}{\gamma_i}\left\{e^{\gamma_i (t - t_0)} - 1\right\}, \quad i = 1, \ldots, k - 1, \\ \psi_{0(k-1)}(t) &= \left(\prod_{u=1}^{k-1} \gamma_u\right) \sum_{v=1}^{k-1} A_{1(k-1)}(v) \times \frac{1}{\gamma_v^2}\left\{e^{\gamma_v (t - t_0)} - 1 - (t - t_0)\gamma_v\right\}, \\ \psi_{i(k-1)}(t) &= \left(\prod_{u=i+1}^{k-1} \gamma_u\right) \sum_{v=i}^{k-1} A_{i(k-1)}(v) \times \frac{1}{\gamma_v}\left\{e^{\gamma_v (t - t_0)} - 1\right\}, \quad i = 1, \ldots, k - 1. \end{aligned} \tag{29}$$
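To make the use of these closed forms concrete, the sketch below evaluates the leading term of $Q_1(j)$ in (24) for a 2-stage model, using $\psi_{11}(t)$ from (29) and ignoring the $o(\beta_1)$ remainder. The function names, the argument names `lam11`, `gamma1`, and `alpha1`, and the list `times` of interval endpoints $t_0 < t_1 < \cdots$ are assumptions made for this example.

```python
import math

def psi_ii(t, t0, gamma):
    """psi_ii(t) of (29), valid when P_T(s, t) is approximately 1 and gamma != 0."""
    return (math.exp(gamma * (t - t0)) - 1.0) / gamma

def q1_two_stage(times, t0, lam11, gamma1, alpha1):
    """Leading term of Q_1(j) in (24) for j = 1, ..., len(times) - 1.

    times[j] plays the role of t_j; the o(beta_1) remainder is dropped.
    """
    surv = [math.exp(-lam11 * psi_ii(t, t0, gamma1)) for t in times]
    return [(1.0 - alpha1) * (surv[j - 1] - surv[j]) for j in range(1, len(times))]
```

Because consecutive terms telescope, the values returned for $j = 1, \ldots, n$ sum to $(1 - \alpha_1)\{1 - e^{-\lambda_{11}\psi_{11}(t_n)}\}$ when `times[0]` equals $t_0$.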

4. Probability Distribution of Observed Cancer Incidence Incorporating Hereditary Cancer Cases

To estimate unknown parameters and to validate the model, one needs real data generated from the model. For studies of carcinogenesis, such data are usually given by cancer incidence. For example, in the SEER data of NCI/NIH of the USA, the data are given by $\{(y_0, n_0), (y_j, n_j),\ j = 1, \ldots, t_N\}$, where $y_0$ is the number of cancer cases at birth and $n_0$ the total number of births, and where, for $j \ge 1$, $y_j$ is the number of cancer cases developed during the $j$th age group of a one-year period (or 5-year periods) and $n_j$ is the number of noncancer people who are at risk for cancer and from whom $y_j$ have developed cancer during the $j$th age group. Given in Table 1 are the SEER data of uveal melanoma (adult eye cancer) during the period 1973–2007. In Table 1, notice that there are some cancer cases at birth, implying some inherited cancer cases. In this section, we will develop a statistical model for these types of data sets from the stochastic multistage model with hereditary cancers as given in Section 2. As in previous sections, let $n_{ij}$ be the number of individuals who have genotype $J_i$ ($i = 0, 1, 2$) at the embryo stage among the $n_j$ people at risk for the cancer in question. Then, as shown above, $(n_{1j}, n_{2j}) \mid n_j \sim \text{Multinomial}\{n_j; p_1, p_2\}$. It follows that $n_{ij} \mid n_j \sim \text{Binomial}\{n_j, p_i\}$, $i = 0, 1, 2$. In what follows, we let $Y_j$ denote the random variable for $y_j$ unless otherwise stated.

4.1. The Probability Distribution of $Y_0$. As shown in Figure 4, $J_i$ ($i = 0, 1, 2$) people would only generate $J_i$ stage cells and $J_{i+1}$ stage cells at birth. Thus, for cancers to develop at or before birth, the number of stages for the stochastic model of carcinogenesis must be 3 or less. It follows that if $y_0 > 0$, the appropriate model of carcinogenesis must be either a 2-stage model or a 3-stage model. Since $n_{20} \mid n_0 \sim \text{Binomial}(n_0, p_2)$ and $n_{10} \mid (n_0, n_{20}) \sim \text{Binomial}(n_0, p_1) + o(p_2)$, the probability distribution of $Y_0$ is therefore

$$Y_0 \sim \text{Poisson}(\chi_k), \quad k = 2, 3, \tag{30}$$

where

$$\chi_k = \begin{cases} n_0 (p_2 + p_1\alpha), & \text{if } k = 2, \\ n_0 p_2 \alpha, & \text{if } k = 3. \end{cases} \tag{31}$$

The expected number of $Y_0$ given $n_0$ is $E(Y_0 \mid n_0) = n_0(p_2 + p_1\alpha) = \chi_2$ if $k = 2$ and $E(Y_0 \mid n_0) = n_0 p_2 \alpha = \chi_3$ if $k = 3$. Hence, for the 2-stage model (i.e., $k = 2$) or the 3-stage model (i.e., $k = 3$), the maximum likelihood estimate of $\chi_k$ is $\hat{\chi}_k = y_0$, and the deviance $D_0(k)$ from the conditional probability distribution of $Y_0$ given $n_0$ is

$$D_0(k) = -2\left\{\log h(y_0; \chi_k) - \log h(y_0; \hat{\chi}_k)\right\} = 2\left\{\chi_k - y_0 - y_0 \log\frac{\chi_k}{y_0}\right\}, \quad k = 2, 3. \tag{32}$$
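The closed form in (32) can be checked against the definition of the deviance as twice the log-likelihood gap between the fitted value and the saturated estimate. The Python sketch below does this numerically; the function names are illustrative, $y_0 > 0$ is assumed, and the value 36 used in the usage note as $\chi_k$ is only a placeholder taken from the Model-F column of Table 1, not an estimate derived here.

```python
import math

def poisson_logpmf(y, mean):
    """log h(y; mean) for a Poisson distribution with mean > 0."""
    return -mean + y * math.log(mean) - math.lgamma(y + 1)

def deviance_at_birth(y0, chi_k):
    """D_0(k) = 2 {chi_k - y0 - y0 log(chi_k / y0)}, assuming y0 > 0."""
    return 2.0 * (chi_k - y0 - y0 * math.log(chi_k / y0))
```

For instance, `deviance_at_birth(34, 36.0)` agrees with `-2 * (poisson_logpmf(34, 36.0) - poisson_logpmf(34, 34.0))`, and the deviance vanishes when `chi_k` equals `y0`.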


Table 1: The SEER incidence data (1973–2007) of uveal melanoma from NCI/NIH (over all races and genders).

Age     People      Observed   Model-F    Model-1    Two-stage
group   at risk     incidence  predicted  predicted  predicted
0       12495777    34         36         36         38
1       12221582    20         11         11         16
2       12120990    14         7          8          17
3       12112995    9          5          6          18
4       12146174    4          4          5          20
5       12161336    3          3          4          22
6       12111854    2          2          4          25
7       12160452    2          2          4          28
8       11942586    2          2          4          30
9       12381299    1          2          5          34
10      12512703    2          3          6          38
11      12410338    5          3          6          41
12      12449244    2          3          6          44
13      12527781    5          4          7          48
14      12602883    3          5          7          51
15      12719598    5          5          8          55
16      12766107    7          6          9          59
17      12831400    9          7          9          62
18      12382047    8          7          9          63
19      12581638    10         8          10         68
20      12636509    7          9          11         71
21      12682601    6          10         10         75
22      12840510    12         11         11         79
23      13075528    17         13         13         84
24      13358635    16         15         14         89
25      13473849    12         16         16         94
26      13426340    17         18         18         97
27      13525264    28         20         20         101
28      13149674    17         22         21         102
29      13812811    23         25         25         110
30      13886874    24         28         27         114
31      13488332    37         30         29         115
32      13460286    32         32         32         118
33      13256067    38         35         35         119
34      13428827    37         39         38         124
35      13220037    40         41         41         126
36      12870265    30         44         44         126
37      12689592    43         47         47         127
38      12157014    42         49         49         125
39      12494081    46         55         54         131
40      12272125    49         58         58         132
41      11826573    56         61         60         130
42      11663153    54         65         64         131
43      11407082    53         68         68         131
44      11296848    70         73         72         133
45      11016369    57         76         76         132
46      10651593    71         79         79         130
47      10475708    87         84         83         131
48      9994684     82         86         85         127
49      10138908    78         93         93         131
50      9836359     87         97         96         130
51      9475641     95         100        99         127
52      9250985     113        104        104        126
53      9027382     106        108        108        125
54      8883737     117        113        113        125
55      8547883     129        116        116        123
56      8279648     107        119        119        121
57      8062368     119        123        123        119
58      7654610     132        124        124        115
59      7563706     118        130        129        115
60      7232719     131        131        130        112
61      6927332     116        132        132        109
62      6708273     133        134        134        107
63      6543931     143        138        137        106
64      6404652     130        141        141        105
65      6168486     145        142        142        102
66      5913479     138        142        142        99
67      5746766     151        144        144        98
68      5480517     147        142        142        94
69      5363912     149        144        144        94
70      5110728     136        142        142        90
71      4925076     144        141        141        88
72      4696825     140        138        138        85
73      4512136     146        135        135        82
74      4345300     126        133        133        80
75      4148801     122        128        129        78
76      3900900     124        122        122        74
77      3681587     114        116        116        70
78      3481918     93         110        110        67
79      3243631     102        102        102        63
80      2961234     79         92         93         58
81      2724984     93         83         84         54
82      2495219     82         75         75         50
83      2271595     92         66         67         46
84      2041351     64         57         58         42
85      10466605    304        282        285        216

Note 1: for age 0 and 1–10 years old, the cancer incidences are derived by subtracting the incidence of retinoblastoma from the original SEER data (see Tan and Zhou [9]).
Note 2: the observed uveal melanoma incidence rates per $10^6$ individuals are derived from the SEER eye cancer incidence by subtracting retinoblastoma incidence as given in Tan and Zhou [9].
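Since Note 2 refers to incidence rates per $10^6$ individuals, the short Python sketch below shows how such a rate follows from the counts in Table 1. Only four rows of the table are copied in, purely as an illustration; the variable and function names are assumptions of this example.

```python
# (age group, people at risk, observed incidence), copied from Table 1
rows = [
    (0, 12495777, 34),
    (1, 12221582, 20),
    (40, 12272125, 49),
    (70, 5110728, 136),
]

def rate_per_million(cases, at_risk):
    """Observed incidence rate per 10^6 individuals at risk."""
    return 1e6 * cases / at_risk

rates = {age: rate_per_million(cases, at_risk) for age, at_risk, cases in rows}
```

In these rows the rate at birth exceeds the rate at age 1, and the rate at age 70 is several times the rate at age 40, matching the pattern of the observed counts in the table.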


4.2. The Probability Distribution of $Y_j$ ($j \ge 1$). To derive the probability distribution of $Y_j$ ($j \ge 1$) in the $j$th age group, let $Y_{ij}$ ($i = 0, 1, 2$) be the number of cancer cases generated by people who have genotype $J_i$ at the embryo stage among these $Y_j$ cancer cases. Then $Y_j = \sum_{i=0}^{2} Y_{ij}$, and $Y_{0j}$ is the number of cancer cases generated by the $n_{0j} = n_j - n_{1j} - n_{2j}$ normal people in the population. The conditional probability distribution of $Y_{ij}$ given $n_{ij}$ is

$$Y_{ij} \mid n_{ij} \sim \text{Poisson}\{n_{ij} Q_i(j)\}, \quad i = 0, 1, 2. \tag{33}$$

Notice that if $k = 2$ (a 2-stage model), then all $J_2$ individuals would develop tumors at or before birth. Thus, if $k = 2$, then $Y_{2j} = 0$ for all $j > 0$, so that if $j > 0$, cancer cases develop only from normal people ($N = J_0$ people) and $J_1$ people. On the other hand, if $k > 2$, then with positive probability $Y_{ij} > 0$ for all ($i = 0, 1, 2$; $j = 0, 1, \ldots, n_T$), where $n_T$ is the last time point in the data. Let $\delta_{2k} = 1$ if $k = 2$ and $\delta_{2k} = 0$ if $k \ne 2$. Then $Y_j \mid (n_{ij},\ i = 0, 1, 2) \sim \text{Poisson}(Q_T(j))$, where $Q_T(j) = \sum_{i=0}^{1} n_{ij} Q_i(j) + (1 - \delta_{2k})\, n_{2j} Q_2(j)$. Since $(n_{0j}, n_{1j}) \mid n_j \sim \text{Multinomial}(n_j; p_0, p_1)$, we have for the conditional probability density function $P(y_j \mid n_j)$ of $Y_j$ given $n_j$:

$$P(y_j \mid n_j) = \sum_{n_{0j}=0}^{n_j} \sum_{n_{1j}=0}^{n_j - n_{0j}} g(n_{0j}, n_{1j}; n_j, p_0, p_1)\, h\{y_j; Q_T(j)\}, \tag{34}$$

where $g(n_{0j}, n_{1j}; n_j, p_0, p_1)$ is the probability density function of $(n_{0j}, n_{1j}) \mid n_j \sim \text{Multinomial}(n_j; p_0, p_1)$ and $h\{y_j; Q_T(j)\}$ the probability density function of $Y_j \mid (n_{ij},\ i = 0, 1, 2) \sim \text{Poisson}\{Q_T(j)\}$.

The probability density function $P(y_j \mid n_j)$ given by (34) is a mixture of Poisson probability density functions with mixing probability density function given by the multinomial probability distribution of $\{n_{0j}, n_{1j}\}$ given $n_j$. This mixing probability density function represents individuals with different genotypes at the embryo stage in the population.
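For small $n_j$ the mixture in (34) can be evaluated by brute force, which helps in seeing how the multinomial weights and Poisson components combine. The following Python sketch is only an illustration under assumptions: the function names and the tuple `q = (Q_0(j), Q_1(j), Q_2(j))` are introduced here, and the double loop is not practical for SEER-sized $n_j$ (in the millions), where sampling the latent counts is the more realistic route.

```python
from math import comb, exp, lgamma, log

def poisson_pmf(y, mean):
    """Poisson probability h(y; mean), including the degenerate case mean = 0."""
    if mean == 0.0:
        return 1.0 if y == 0 else 0.0
    return exp(-mean + y * log(mean) - lgamma(y + 1))

def mixture_pmf(y, n, p0, p1, q, k):
    """P(y_j | n_j) of (34) by direct summation over (n_0j, n_1j).

    q = (Q_0(j), Q_1(j), Q_2(j)); for k = 2 the J_2 term is dropped from Q_T(j).
    """
    p2 = 1.0 - p0 - p1
    total = 0.0
    for n0 in range(n + 1):
        for n1 in range(n - n0 + 1):
            n2 = n - n0 - n1
            weight = comb(n, n0) * comb(n - n0, n1) * p0**n0 * p1**n1 * p2**n2
            mean = n0 * q[0] + n1 * q[1] + (n2 * q[2] if k != 2 else 0.0)
            total += weight * poisson_pmf(y, mean)
    return total
```

Summing `mixture_pmf` over $y$ should return 1 up to truncation error, and for $k > 2$ the mean equals $n_j \sum_{i=0}^{2} p_i Q_i(j)$.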

Let $\Theta$ be the set of all unknown parameters (i.e., the parameters $(p_1, p_2, \alpha)$ and the birth rates, the death rates, and the mutation rates of $J_j$ cells). Based on data ($y_j$, $j = 0, 1, \ldots, n_T$), the likelihood function of $\Theta$ is

$$L\{\Theta \mid y_j,\ j = 0, 1, \ldots, n_T\} = h(y_0; \chi_k) \prod_{j=1}^{t_N} P(y_j \mid n_j). \tag{35}$$

Notice that because the mutation rates are very small, one may practically assume $\beta_i(t) = \beta_i$ for $i = 0, 1, \ldots, k - 1$. Also, because the stage-limiting genes are basically tumor suppressor genes which act recessively (see Tan [3], Weinberg [6], and Tan et al. [5, 8, 23]), one may practically assume $\{b_i(t) = b_i,\ d_i(t) = d_i,\ \gamma_i(t) = b_i - d_i = \gamma_i,\ i = 0, 1, \ldots, k - 1\}$ (see Tan et al. [4, 5, 8]).

4.3. The Joint Probability Distribution of Augmented Variables and Cancer Incidence. To apply the mixture distribution of $Y_j$ in (34) to make inference about the unknown parameters, one needs to expand the model to include the unobservable augmented variables $\{n_{0j}, n_{1j}, y_{0j}, y_{1j},\ j = 1, \ldots, t_N\}$ and derive the joint probability distribution of these variables. For these purposes, observe that for ($j = 1, \ldots, t_N$) and for $k > 2$, the conditional probability distribution $P\{y_{ij},\ i = 0, 1 \mid y_j, n_{ij},\ i = 0, 1, n_j\}$ of ($Y_{ij}$, $i = 0, 1$) given ($Y_j = y_j$, $n_{ij}$, $i = 0, 1$, $n_j$) is

$$(y_{ij},\ i = 0, 1) \mid (y_j, n_{ij},\ i = 0, 1, n_j) \sim \text{Multinomial}\left(y_j;\ \frac{n_{ij} Q_i(j)}{Q_T(j)},\ i = 0, 1\right). \tag{36}$$

Since the conditional probability distribution of $Y_j$ given ($n_{ij}$, $i = 0, 1, 2$) for $k > 2$ is Poisson with mean $Q_T(j)$, we have for the joint conditional probability density function $P\{y_{ij},\ i = 0, 1, y_j \mid n_{ij},\ i = 0, 1, n_j\}$ of ($Y_{ij}$, $i = 0, 1$, $Y_j$) given ($n_{ij}$, $i = 0, 1$, $n_j$):

$$\begin{aligned} P\{y_{ij},\ i = 0, 1, y_j \mid n_{ij},\ i = 0, 1, n_j\} &= P\{y_{ij},\ i = 0, 1 \mid y_j, n_{ij},\ i = 0, 1, 2\} \times P\{y_j \mid n_{ij},\ i = 0, 1, 2\} \\ &= \frac{1}{y_j!}\, e^{-Q_T(j)} \{Q_T(j)\}^{y_j} \times g\left\{y_{0j}, y_{1j};\ y_j,\ \frac{n_{ij} Q_i(j)}{Q_T(j)},\ i = 0, 1\right\} \\ &= \prod_{i=0}^{2} h\{y_{ij}; n_{ij} Q_i(j)\} \quad (k > 2), \end{aligned} \tag{37}$$

where $y_{2j} = y_j - \sum_{u=0}^{1} y_{uj}$ and $n_{2j} = n_j - \sum_{u=0}^{1} n_{uj}$.
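The last equality in (37) is the standard fact that a Poisson total split multinomially yields independent Poisson counts. A small numerical check in Python is sketched below; the function names and the example means (standing in for $n_{ij} Q_i(j)$) are illustrative assumptions.

```python
from math import exp, factorial

def poisson_pmf(y, mean):
    """Poisson probability h(y; mean) for small integer y."""
    return exp(-mean) * mean**y / factorial(y)

def multinomial_pmf(counts, probs):
    """Multinomial probability of `counts` with cell probabilities `probs`."""
    coef = factorial(sum(counts))
    value = 1.0
    for c, p in zip(counts, probs):
        coef //= factorial(c)  # stays an exact integer at every step
        value *= p**c
    return coef * value

def joint_via_total(counts, means):
    """Poisson(total) times multinomial split, the middle expression of (37)."""
    total = sum(means)
    return poisson_pmf(sum(counts), total) * multinomial_pmf(
        counts, [m / total for m in means]
    )

def joint_via_product(counts, means):
    """Product of independent Poisson probabilities, the final expression of (37)."""
    result = 1.0
    for c, m in zip(counts, means):
        result *= poisson_pmf(c, m)
    return result
```

For any nonnegative counts and positive means, the two functions agree up to rounding, which is what allows the augmented variables $y_{ij}$ to be treated as conditionally independent Poisson counts.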

If $k = 2$, then $Y_{2j} = 0$, so that $Y_j = \sum_{i=0}^{1} Y_{ij}$. Thus we have for $k = 2$:

$$Y_j \mid (n_{ij},\ i = 0, 1, n_j) \sim \text{Poisson}\left\{\sum_{i=0}^{1} n_{ij} Q_i(j)\right\}, \tag{38}$$

$$Y_{ij} \mid (Y_j = y_j, n_{ij},\ i = 0, 1, n_j) \sim \text{Binomial}\left(y_j,\ \frac{n_{ij} Q_i(j)}{\sum_{u=0}^{1} n_{uj} Q_u(j)}\right), \quad i = 0, 1. \tag{39}$$

It follows that if $k = 2$, then $\sum_{i=0}^{1} Y_{ij} = Y_j$ and the joint probability density function of $(Y_{0j}, Y_j)$ given ($n_{uj}$, $u = 0, 1, 2$) is

$$\begin{aligned} P\{y_{0j}, y_j \mid n_{uj},\ u = 0, 1, n_j\} &= P\{y_j \mid n_{ij},\ i = 0, 1, n_j\} \times P\{y_{0j} \mid y_j, n_{uj},\ u = 0, 1, n_j\} \\ &= \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\}, \quad \text{if } k = 2, \end{aligned} \tag{40}$$

where $y_j = \sum_{i=0}^{1} y_{ij}$ and $n_j = \sum_{i=0}^{2} n_{ij}$.


Put $\mathbf{Y} = (y_{ij},\ i = 0, 1,\ j = 1, \ldots, t_N)$, $\mathbf{N} = (n_{ij},\ i = 0, 1,\ j = 1, \ldots, t_N)$, $\underset{\sim}{n} = (n_j,\ j = 0, 1, \ldots, t_N)$, and $\underset{\sim}{y} = (y_j,\ j = 0, 1, \ldots, t_N)$. From (37) and (40), we have for the conditional joint probability density function of $(\mathbf{Y}, \underset{\sim}{y})$ given $(\mathbf{N}, \underset{\sim}{n})$:

$$P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \underset{\sim}{n}, \Theta\} = h(y_0; \chi_k) \prod_{j=1}^{t_N} \left\{h[y_{2j}; n_{2j} Q_2(j)]\right\}^{1 - \delta_{2k}} \times \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\}. \tag{41}$$

It follows that the joint conditional probability density function of $\{\mathbf{N}, \mathbf{Y}, \underset{\sim}{y}\}$ given $(\underset{\sim}{n}, \Theta)$ is

$$P\{\mathbf{N}, \mathbf{Y}, \underset{\sim}{y} \mid \underset{\sim}{n}, \Theta\} = h(y_0; \chi_k) \prod_{j=1}^{t_N} g(n_{0j}, n_{1j}; n_j, p_0, p_1) \times \left\{h[y_{2j}; n_{2j} Q_2(j)]\right\}^{1 - \delta_{2k}} \times \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\}. \tag{42}$$

Notice that the above probability density function is a product of multinomial probability density functions and Poisson probability density functions. For this joint probability density function, the deviance $\text{Dev} = -2\{\log P[\mathbf{Y}, \underset{\sim}{y}, \mathbf{N} \mid \underset{\sim}{n}, \Theta] - \log P[\mathbf{Y}, \underset{\sim}{y}, \mathbf{N} \mid \underset{\sim}{n}, \hat{\Theta}]\}$ is

$$\text{Dev} = D_0(k) + \text{Dev}(p_1, p_2) + \sum_{j=1}^{t_N} D_j, \tag{43}$$

where

$$D_0(k) = 2\left\{\chi_k - y_0 - y_0 \log\frac{\chi_k}{y_0}\right\}, \tag{44}$$

$$\text{Dev}(p_1, p_2) = 2 \sum_{j=1}^{t_N} \left\{n_{0j} \log\frac{\hat{p}_0}{1 - p_1 - p_2} + \sum_{i=1}^{2} n_{ij} \log\frac{\hat{p}_i}{p_i}\right\}, \tag{45}$$

$$D_j = 2 \sum_{i=0}^{1} \left\{n_{ij} Q_i(j) - y_{ij} - y_{ij} \log\frac{n_{ij} Q_i(j)}{y_{ij}}\right\} + 2(1 - \delta_{2k}) \left\{n_{2j} Q_2(j) - y_{2j} - y_{2j} \log\frac{n_{2j} Q_2(j)}{y_{2j}}\right\}, \tag{46}$$

where $\hat{p}_i = \left(\sum_{j=0}^{t_N} n_{ij}\right) \big/ \left(\sum_{j=0}^{t_N} n_j\right)$ ($i = 0, 1, 2$).

The joint probability density function $P\{\mathbf{Y}, \underset{\sim}{y}, \mathbf{N} \mid \underset{\sim}{n}, \Theta\}$ of $(\mathbf{Y}, \underset{\sim}{y}, \mathbf{N})$ given by (42) will be used as the kernel for the Bayesian method to estimate the unknown parameters and to predict the state variables.

4.4. Fitting of the Model to Cancer Incidence Data. To fit the model to real data, as in Tan [3–5], we let $\Delta t \sim 1$ correspond to a fixed time interval such as 6 months in human cancer studies. (Tan et al. [4] assumed 3 months as one time unit, while Luebeck and Moolgavkar [26] assumed one year as one time unit.) Then, because the proliferation rate of the last stage cells is quite large, one may practically assume $P_T(s, t) = 1$ for $t - s \ge 1$. Hence, noting that $\beta_{k-1}(t) = \beta_{k-1}$ is usually very small (see [3–5]), $Q_i(j)$ is approximated by

$$Q_i(j) \approx E\left\{e^{-\beta_{k-1} G_i(t_{j-1})} - e^{-\beta_{k-1} G_i(t_j)}\right\} = e^{-\beta_{k-1} E[G_i(t_{j-1})]} - e^{-\beta_{k-1} E[G_i(t_j)]} + o(\beta_{k-1}), \tag{47}$$

where $G_i(t) = \sum_{s=t_0}^{t-1} J_{k-1}(s; i)$.

Under the discrete time approximation, the $E[J_{k-1}(t; i)]$'s have been derived in the appendix. Using these results for the expected numbers and the result $\sum_{i=0}^{t-1} a^i = (a^t - 1)/(a - 1)$ for $a \ne 1$, we obtain

$$\begin{aligned} \beta_{k-1} E[G_i(t)] &= \sum_{r=i}^{i+1} E[J_r(t_0; i)] \left(\prod_{u=r}^{k-1} \beta_u\right) \sum_{v=r}^{k-1} A_{r(k-1)}(v) \sum_{s=t_0}^{t-1} (1 + \gamma_v)^{s - t_0} \\ &= \theta_{(i+1)(k-1)}\,\phi_{(i+1)(k-1)}(t) + \lambda_{i(k-1)}\,\phi_{i(k-1)}(t), \end{aligned} \tag{48}$$

where ($\theta_{i(k-1)}$, $i = 1, 2, 3$) and ($\lambda_{i(k-1)}$, $i = 0, 1, 2$) are defined in Section 3.3 and where the $\phi_{i(k-1)}(t)$'s ($i = 0, 1, 2, 3$) are given by

$$\phi_{i(k-1)}(t) = \left(\prod_{u=i+1}^{k-1} \gamma_u\right) \sum_{v=i}^{k-1} A_{i(k-1)}(v) \times \frac{1}{\gamma_v}\left\{(1 + \gamma_v)^{t - t_0} - 1\right\}, \quad i = 0, 1, 2, 3. \tag{49}$$

Notice that if $\gamma_0 = 0$, then $\phi_{0(k-1)}(t)$ reduces to

$$\phi_{0(k-1)}(t) = \left( \prod_{u=1}^{k-1} \gamma_u \right) \sum_{v=1}^{k-1} A_{1(k-1)}(v) \frac{1}{\gamma_v^2} \left\{ (1 + \gamma_v)^{t - t_0} - 1 - (t - t_0) \gamma_v \right\}. \tag{50}$$

Applying these results for time homogeneous models with $\gamma_i \neq \gamma_j$ if $i \neq j$, the $Q_i(j)$'s under discrete approximation are given as follows.


(1) If $k = 2$, then $k - 1 = 1$. Hence $Q_2(0) = 1$, $Q_i(0) = Q_2(j) = 0$ for $(i = 0, 1;\ j > 0)$, and for $j > 0$,

$$\begin{aligned}
Q_0(j) &\approx \left\{ e^{-\theta_{11}\phi_{11}(t_{j-1}) - \lambda_{01}\phi_{01}(t_{j-1})} - e^{-\theta_{11}\phi_{11}(t_j) - \lambda_{01}\phi_{01}(t_j)} \right\} + o(\beta_1), \\
Q_1(j) &\approx (1 - \alpha_1) \left\{ e^{-\lambda_{11}\phi_{11}(t_{j-1})} - e^{-\lambda_{11}\phi_{11}(t_j)} \right\} + o(\beta_1).
\end{aligned} \tag{51}$$

(2) If $k \ge 3$, then we have $Q_2(0) = \delta_{2(k-1)} \alpha$, $Q_i(0) = 0$, $i = 0, 1$, and for $j > 0$,

$$\begin{aligned}
Q_0(j) &\approx \left\{ e^{-\theta_{1(k-1)}\phi_{1(k-1)}(t_{j-1}) - \lambda_{0(k-1)}\phi_{0(k-1)}(t_{j-1})} - e^{-\theta_{1(k-1)}\phi_{1(k-1)}(t_j) - \lambda_{0(k-1)}\phi_{0(k-1)}(t_j)} \right\} + o(\beta_{k-1}), \\
Q_1(j) &\approx \left\{ e^{-\theta_{2(k-1)}\phi_{2(k-1)}(t_{j-1}) - \lambda_{1(k-1)}\phi_{1(k-1)}(t_{j-1})} - e^{-\theta_{2(k-1)}\phi_{2(k-1)}(t_j) - \lambda_{1(k-1)}\phi_{1(k-1)}(t_j)} \right\} + o(\beta_{k-1}), \\
Q_2(j) &\approx \delta_{2(k-1)} (1 - \alpha) \left\{ e^{-\lambda_{22}\phi_{22}(t_{j-1})} - e^{-\lambda_{22}\phi_{22}(t_j)} \right\} \\
&\quad + (1 - \delta_{2(k-1)}) \left\{ e^{-\theta_{3(k-1)}\phi_{3(k-1)}(t_{j-1}) - \lambda_{2(k-1)}\phi_{2(k-1)}(t_{j-1})} - e^{-\theta_{3(k-1)}\phi_{3(k-1)}(t_j) - \lambda_{2(k-1)}\phi_{2(k-1)}(t_j)} \right\} + o(\beta_{k-1}).
\end{aligned} \tag{52}$$

Notice that if one replaces $[1 + \gamma_v]^{t - t_0}$ by $[1 + \gamma_v]^{t - t_0} = e^{(t - t_0)\log(1 + \gamma_v)} \approx e^{(t - t_0)\gamma_v}$, the above $Q_i(j)$'s from the discrete time model are exactly the same as those from the continuous model, respectively, as given in equations (23)–(27), under the assumption that $P_T(s, t) = 1$ for $t - s > 0$. Notice that the assumption $P_T(s, t) = 1$ for $t - s > 0$ is equivalent to assuming that the last stage cancer cells grow instantaneously into cancer tumors as soon as they are generated; see Remark 1.
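The size of the error made by replacing $(1 + \gamma)^{t - t_0}$ with $e^{(t - t_0)\gamma}$, and the "difference of two exponentials" form of the interval probabilities, can be illustrated with a small Python sketch. It is a simplification made for illustration only: it assumes a single term $\lambda\,\phi(t)$ in the exponent with $\phi(t) = \{(1 + \gamma)^{t - t_0} - 1\}/\gamma$, so the coefficients $A_{i(k-1)}(v)$ and the $\theta$ terms of the full formulas are not modelled, and all function names and numerical values are hypothetical.

```python
import math


def phi_discrete(gamma, n):
    """Single-rate building block ((1 + gamma)**n - 1) / gamma, with n = t - t0."""
    return ((1.0 + gamma) ** n - 1.0) / gamma


def phi_continuous(gamma, n):
    """Continuous-time counterpart (exp(gamma * n) - 1) / gamma."""
    return math.expm1(gamma * n) / gamma


def interval_prob(lam, gamma, n_prev, n_curr, phi=phi_discrete):
    """exp(-lam * phi(t_{j-1})) - exp(-lam * phi(t_j)) for one age interval."""
    return math.exp(-lam * phi(gamma, n_prev)) - math.exp(-lam * phi(gamma, n_curr))


if __name__ == "__main__":
    gamma, lam = 0.001, 0.01  # illustrative values, not estimates from the paper
    d, c = phi_discrete(gamma, 100), phi_continuous(gamma, 100)
    print("relative difference:", abs(d - c) / c)
    # The interval probabilities telescope to 1 - exp(-lam * phi(t_J)).
    total = sum(interval_prob(lam, gamma, j - 1, j) for j in range(1, 101))
    print(total, 1.0 - math.exp(-lam * d))
```

For small $\gamma$ the two versions of $\phi$ differ only in the higher-order terms of $\log(1 + \gamma)$, which is why the discrete and continuous models give nearly the same interval probabilities.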

5. The Fitting of the Model to Cancer Incidence and the Generalized Bayesian Inference Procedure

Given the model in Sections 2 and 3 and cancer incidence data, one may use the results in Section 4 to fit the model. By using this model and the distribution results in Section 4, one can readily estimate the unknown genetic parameters, predict cancer incidence, and check the validity of the model by using the generalized Bayesian inference and Gibbs sampling procedures; for more detail, see Tan [3, 22] and Tan et al. [4, 5].

The generalized Bayesian inference is based on the posterior distribution $P\{\Theta \mid \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$ of $\Theta$ given $\{\mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$. This posterior distribution is derived by combining the prior distribution $P\{\Theta\}$ of $\Theta$ with the joint probability distribution $P\{\mathbf{N}, \mathbf{Y}, \underset{\sim}{y} \mid \underset{\sim}{n}, \Theta\}$ given $\{\underset{\sim}{n}, \Theta\}$ in (42). It follows that this inference procedure combines information from three sources: (1) previous information and experiences about the parameters in terms of the prior distribution $P\{\Theta\}$ of the parameters, (2) biological information of inherited cancer cases via genetic segregation of cancer genes in the population ($P\{\mathbf{N} \mid \underset{\sim}{n}, p_i, i = 1, 2\}$, see Section 2), and (3) information from the expanded data ($\mathbf{Y}$) and the observed data ($\underset{\sim}{y}$) via the statistical model from the system ($P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \Theta\}$ given by (37) and (40)). Because of the additional information from the genetic segregation of the cancer genes, this inference procedure provides an efficient way to extract information on the effects of genotypes of individuals at the embryo stage.

5.1. The Prior Distribution of the Parameters. For the prior distributions of $\Theta$, because biological information has suggested some lower bounds and upper bounds for the mutation rates and for the proliferation rates, we assume

$$P\{\Theta\} \propto c \quad (c > 0), \tag{53}$$

where $c$ is a positive constant if these parameters satisfy some biologically specified constraints, and equal to zero otherwise. These biological constraints are as follows.

(i) $0 < p_1 < 10^{-2}$, $0 < p_2 < 10^{-6}$, and $-0.01 < \gamma_i < 1$ $(i = 1, 2)$.

(ii) For $\beta_j$ $(j = 0, 1, \ldots, k - 1)$, we let $1 < \omega_0 = N(t_0)\beta_0 < 1000$ $(N \to I_1)$ and $10^{-8} < \beta_i < 10^{-3}$, $i = 1, \ldots, k - 1$.

(iii) For the $\lambda_{j(k-1)}$ $(j = 0, 1, 2)$, we let $0 < \lambda_i < 10$, $i = 0, 1$, and $0 < \lambda_2 < 10^{3}$.

(iv) For the $\theta_i$ $(i = 1, 2, 3)$, we let $0 < \theta_1 < 10^{-2}$ and $0 < \theta_2 < 10^{-1}$.

We will refer to the above prior as a partially informative prior, which may be considered as an extension of the traditional noninformative prior given in Box and Tiao [28].
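The partially informative prior is flat inside a box and zero outside it, so on the log scale it contributes a constant inside the constraints and $-\infty$ outside. A minimal Python sketch of this idea follows; the dictionary keys are illustrative names chosen here, the bounds are transcribed from constraints (i)–(iv) as read above, and the additive constant $\log c$ is dropped.

```python
import math

# Open-interval bounds read from constraints (i)-(iv); key names are hypothetical.
BOUNDS = {
    "p1": (0.0, 1e-2),
    "p2": (0.0, 1e-6),
    "gamma1": (-0.01, 1.0),
    "gamma2": (-0.01, 1.0),
    "omega0": (1.0, 1000.0),
    "beta1": (1e-8, 1e-3),
    "beta2": (1e-8, 1e-3),
    "lambda0": (0.0, 10.0),
    "lambda1": (0.0, 10.0),
    "lambda2": (0.0, 1e3),
    "theta1": (0.0, 1e-2),
    "theta2": (0.0, 1e-1),
}


def log_prior(params):
    """Return 0.0 (log c up to a constant) inside the box, -inf outside it."""
    for name, value in params.items():
        low, high = BOUNDS[name]
        if not (low < value < high):
            return -math.inf
    return 0.0


if __name__ == "__main__":
    print(log_prior({"p1": 1e-3, "p2": 5e-7, "gamma2": 0.03}))  # inside the box
    print(log_prior({"p1": 0.5}))                               # outside the box
```

With such a prior, maximizing the posterior inside the box is the same as maximizing the likelihood subject to the constraints.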

5.2. The Posterior Distribution of the Parameters Given $\{\mathbf{Y}, \mathbf{N}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Denote by $\Theta_1 = \Theta - \{p_1, p_2, \alpha\}$. From the posterior distribution $P\{\Theta \mid \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$, we obtain

$$\begin{aligned}
P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\} &\propto (\chi(k))^{y_0} e^{-\chi(k)}\, p_1^{\sum_{j=1}^{t_N} n_{1j}}\, p_2^{\sum_{j=1}^{t_N} n_{2j}}\, (1 - p_1 - p_2)^{\sum_{j=1}^{t_N} n_{0j}}, \quad 0 < p_1, p_2, \alpha < 1, \\
P\{\Theta_1 \mid (p_1, p_2, \alpha), \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\} &\propto \prod_{j=1}^{t_N} \prod_{i=0}^{2} e^{-Q_i(j)} \{Q_i(j)\}^{y_{ij}}, \quad \Theta_1 \in \Omega,
\end{aligned} \tag{54}$$

where $\Omega$ is the parameter space of $\Theta_1$ provided by the biological constraints in Section 5.1.

For computational convenience, we notice that the log of $P\{p_1, p_2, \alpha \mid \Theta_1, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$ is proportional to the negative of $D_0(k) + \mathrm{Dev}(p_1, p_2)$ given by (44)-(45); similarly, the log of $P\{\Theta_1 \mid (p_1, p_2, \alpha), \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$ is proportional to the negative of $\sum_{j=1}^{k} D_j$ given by (46).
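As an illustration of how the conditional posterior of $\Theta_1$ in (54) can be evaluated on the log scale, the following Python sketch sums $-Q_i(j) + y_{ij} \log Q_i(j)$ over genotypes $i$ and age groups $j$. It takes the kernel exactly as printed in (54); the nested-list layout of `Q` and `y` and the function name are assumptions made here for illustration, and the deviances $D_j$ of (46) are not reproduced.

```python
import math


def log_kernel_theta1(Q, y):
    """Log of prod_j prod_i exp(-Q[j][i]) * Q[j][i]**y[j][i] (kernel of (54)).

    Q and y are lists of rows, one row per age group j, one entry per genotype i.
    """
    total = 0.0
    for q_row, y_row in zip(Q, y):
        for q, count in zip(q_row, y_row):
            if q <= 0.0:
                if count > 0:
                    return -math.inf  # positive count with zero probability
                continue              # 0**0 = 1 and exp(0) = 1 contribute nothing
            total += -q + count * math.log(q)
    return total


if __name__ == "__main__":
    # Toy numbers only; they are not taken from the SEER data.
    print(log_kernel_theta1([[0.5, 0.25]], [[2, 0]]))
```

Maximizing this quantity over $\Theta_1 \in \Omega$ (through the dependence of the $Q_i(j)$ on $\Theta_1$) gives the posterior mode used in the estimation steps.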

5.3. The Multilevel Gibbs Sampling Procedure for Estimating Unknown Parameters. Given the posterior probability distributions, we will use the following multilevel Gibbs sampling procedure to derive estimates of the parameters. We notice that numerically the Gibbs sampling procedure given below is equivalent to the EM-algorithm from the sampling theory viewpoint, with Steps 1 and 2 as the E-Step and with Steps 3 and 4 as the M-Step, respectively [29]. These multilevel Gibbs sampling procedures are given by the following.

Step 1 (Generating $\mathbf{N}$ Given ($\mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta$) (The Data-Augmentation Step 1)). Given $\Theta$ and given $\underset{\sim}{n}$, use the multinomial distribution of $\{n_{1j}, n_{2j}\}$ given $n_j$ in Section 3 to generate a large sample of $\mathbf{N}$. Then combine this large sample with $P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \underset{\sim}{n}, \Theta\}$ in (37) and (40) to select $\mathbf{N}$ through the weighted bootstrap method due to Smith and Gelfand [30]. This selected $\mathbf{N}$ is then a sample from $P\{\mathbf{N} \mid \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta\}$ even though the latter is unknown. (For proof, see Tan [22], Chapter 3.) Call the generated sample $\hat{\mathbf{N}}$.

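Step 2 (Generating $\mathbf{Y}$ Given ($\mathbf{N}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta$) (The Data-Augmentation Step 2)). Given $\{\underset{\sim}{y}, \underset{\sim}{n}, \Theta\}$ and given $\mathbf{N} = \hat{\mathbf{N}}$ generated from Step 1, generate $\mathbf{Y}$ from the probability distribution $P\{\mathbf{Y} \mid \mathbf{N}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta\}$ given by (36) and (38). Call the generated sample $\hat{\mathbf{Y}}$.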

Step 3 (Estimation of $\{p_i, i = 1, 2, \alpha\}$ Given $\{\Theta_1, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$). Given $\{\underset{\sim}{y}, \underset{\sim}{n}, \Theta_1\}$ and given $(\mathbf{N}, \mathbf{Y}) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}})$ from Steps 1 and 2, derive the posterior mode of $\{p_i, i = 1, 2, \alpha\}$ by maximizing the conditional posterior distribution $P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \hat{\mathbf{N}}, \hat{\mathbf{Y}}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $D_0(k) + \mathrm{Dev}(p_1, p_2)$ given by (44)-(45) in Section 4.3 under the constraints given in Section 5.1. Denote this generated mode by $\{\hat{p}_i, i = 1, 2, \hat{\alpha}\}$.

Step 4 (Estimation of $\Theta_1$ Given $\{p_i, i = 1, 2, \alpha, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$). Given $(\underset{\sim}{y}, \underset{\sim}{n})$ and given $(\mathbf{N}, \mathbf{Y}, p_i, i = 1, 2, \alpha) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}}, \hat{p}_i, i = 1, 2, \hat{\alpha})$ from Steps 1–3, derive the posterior mode of $\Theta_1$ by maximizing the conditional posterior distribution $P\{\Theta_1 \mid \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\mathbf{N}}, \hat{\mathbf{Y}}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $\sum_{j=1}^{k} D_j$ in (46) under the constraints. Denote the generated mode as $\hat{\Theta}_1$.

Step 5 (Recycling Step). With $(\mathbf{N}, \mathbf{Y}, p_i, i = 1, 2, \alpha, \Theta_1) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}}, \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1)$ given above, go back to Step 1 and continue until convergence.
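The weighted bootstrap selection in Step 1 can be sketched as follows: draw candidate genotype counts from the multinomial distribution, weight each candidate by the likelihood of the data given that candidate, and resample one candidate with probability proportional to its weight. This is a toy Python illustration only; the function names, the stand-in weight function, and the numbers are hypothetical, and the actual weights in the paper come from $P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \underset{\sim}{n}, \Theta\}$ in (37) and (40), which are not reproduced here.

```python
import math
import random


def draw_genotype_counts(n, p1, p2, rng):
    """One multinomial draw (n0, n1, n2) of n people over three genotypes."""
    counts = [0, 0, 0]
    for g in rng.choices([0, 1, 2], weights=[1.0 - p1 - p2, p1, p2], k=n):
        counts[g] += 1
    return tuple(counts)


def weighted_bootstrap(candidates, weight_fn, rng):
    """Resample one candidate with probability proportional to weight_fn(candidate)."""
    weights = [weight_fn(c) for c in candidates]
    if sum(weights) <= 0.0:
        raise ValueError("all candidate weights are zero")
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy population of 50 with exaggerated genotype frequencies.
    pool = [draw_genotype_counts(50, 0.2, 0.1, rng) for _ in range(200)]
    # Stand-in weight: favours candidates with about 5 people of genotype 2.
    chosen = weighted_bootstrap(pool, lambda c: math.exp(-abs(c[2] - 5)), rng)
    print(chosen)
```

The resampled candidate plays the role of $\hat{\mathbf{N}}$; the larger the candidate pool, the closer its distribution is to the target conditional distribution.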

The proof of convergence of the above steps can be derived by using the procedure given in Tan ([22], Chapter 3). At convergence, the $\hat{\Theta} = \{\hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1\}$ are the generated values from the posterior distribution of $\Theta$ given $\{\underset{\sim}{y}, \underset{\sim}{n}\}$, independently of $(\mathbf{N}, \mathbf{Y})$ (for proof, see Tan [22], Chapter 3). Repeating the above procedures, one then generates a random sample of $\Theta$ from the posterior distribution of $\Theta$ given $\{\underset{\sim}{y}, \underset{\sim}{n}\}$; one then uses the sample means as the estimates of $\Theta$ and uses the sample variances and covariances as estimates of the variances and covariances of these estimates.
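The overall control flow of Steps 1–5, followed by the summary of repeated runs by sample means and covariances, can be sketched in Python as below. The four callables stand for the model-specific operations of Steps 1–4 and are placeholders supplied by the user; the stopping rule (largest absolute change below a tolerance) is an assumption made for this sketch, since the text only says "until convergence".

```python
def multilevel_gibbs(p_alpha, theta1, draw_N, draw_Y,
                     update_p_alpha, update_theta1, n_iter=200, tol=1e-10):
    """Cycle through Steps 1-4 until the parameter values stop changing (Step 5)."""
    for _ in range(n_iter):
        N = draw_N(p_alpha, theta1)                      # Step 1
        Y = draw_Y(N, p_alpha, theta1)                   # Step 2
        new_p_alpha = update_p_alpha(N, Y, theta1, p_alpha)    # Step 3
        new_theta1 = update_theta1(N, Y, new_p_alpha, theta1)  # Step 4
        old = list(p_alpha) + list(theta1)
        new = list(new_p_alpha) + list(new_theta1)
        change = max(abs(a - b) for a, b in zip(new, old))
        p_alpha, theta1 = new_p_alpha, new_theta1
        if change < tol:                                 # Step 5
            break
    return p_alpha, theta1


def posterior_summary(draws):
    """Sample means and sample covariance matrix of a list of parameter vectors."""
    m, d = len(draws), len(draws[0])
    means = [sum(row[k] for row in draws) / m for k in range(d)]
    cov = [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in draws) / (m - 1)
            for b in range(d)] for a in range(d)]
    return means, cov


if __name__ == "__main__":
    # Deterministic toy updates that contract towards 0.2 and 1.0.
    result = multilevel_gibbs(
        [1.0], [0.0],
        draw_N=lambda p, th: None,
        draw_Y=lambda N, p, th: None,
        update_p_alpha=lambda N, Y, th, p: [0.5 * (x + 0.2) for x in p],
        update_theta1=lambda N, Y, p, th: [0.5 * (x + 1.0) for x in th],
    )
    print(result)
    print(posterior_summary([[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]))
```

In an actual application each call of `multilevel_gibbs` would yield one draw of $\Theta$, and `posterior_summary` would be applied to the collection of such draws.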

6. A New Multistage Stochastic Model for Adult Eye Cancer (Uveal Melanoma): An Example

The human eye cancers consist of pediatric eye cancers and adult eye cancers. The most common pediatric eye cancer is the retinoblastoma, which develops from the retinal pigment epithelium cells underlying the retina that do not form melanoma. The most common adult eye cancers are the uveal melanomas involving the iris, the ciliary body, and the choroid (collectively referred to as the uvea). These cancers develop from melanocytes (pigment cells) which reside within the uvea, giving color to the eye. In Tan and Zhou [9], we have developed a modified two-stage model for retinoblastoma. Based on results from molecular biology (see Landreville et al. [14], Mensink et al. [15], and Loercher and Harbour [31]), Landreville et al. [14] have proposed a three-stage model for uveal melanoma as given in Figure 2. As an example of applications of this paper, in this section we will apply this model of uveal melanoma to the NCI/NIH eye cancer data from the SEER project. We notice that the same methods can be applied to model other human cancers as well, but this will be our future research.

Given in Table 1 are the numbers of people at risk and the eye cancer cases in the age groups, together with the predicted cases from the models. These data give cancer incidence at birth and incidence for 85 age groups ($k = 85$), with each group spanning a 1-year period except the last age group ($\ge 85$ years old). For human eye cancer, because the incidence at birth and for age groups from 1 to 10 years old is basically generated by the pediatric eye cancer retinoblastoma (see [9]), to account for inherited cancer cases of uveal melanomas, the incidence for age 0 (birth) and for age periods from 1 to 10 years old in Table 1 for uveal melanoma is derived by subtracting the incidence of retinoblastoma from the SEER data (see Tan and Zhou [9]).

To fit the data, we let one time unit be 6 months after birth and let $t_0 = 1$. To compare different models and to assess different assumptions, we will consider the following 2-3-stage mixture models: (1) the complete 3-stage mixture model (Model-F), in which no assumptions are made on the parameters; (2) the Type-1 3-stage mixture model, in which we assume that $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth (Model-1). For comparison purposes, we also fit a 2-stage model as defined in Tan and Zhou [9]. We will apply the methods in Section 5 to fit these models to the SEER data given in Table 1.


Table 2: The log-likelihood, AIC, and BIC of the fitted models.

Models                  Log-likelihood    AIC         BIC
Three-stage, Model-F    -1312.02          2648.04     2677.35
Three-stage, Model-1    -1295.76          2609.53     2631.51
Two-stage               -4392.421         8796.843    8811.569
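The relation between the log-likelihood and the two information criteria in Table 2 follows the standard definitions $\mathrm{AIC} = -2\log L + 2p$ and $\mathrm{BIC} = -2\log L + p \log n$, where $p$ is the number of free parameters and $n$ the number of observations. The short Python sketch below checks the Model-F row, reading its log-likelihood as $-1312.02$; the parameter count $p = 12$ is taken from the degrees of freedom quoted for Model-F in observation (a) and is otherwise an assumption, as is the use of $n = 86$ age groups for the BIC.

```python
import math


def aic(log_lik, n_params):
    """Akaike Information Criterion: -2 log L + 2 p."""
    return -2.0 * log_lik + 2.0 * n_params


def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion: -2 log L + p log n."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


if __name__ == "__main__":
    print(aic(-1312.02, 12))      # compare with the AIC reported for Model-F
    print(bic(-1312.02, 12, 86))  # n = 86 is an assumed number of observations
```

Smaller values of either criterion indicate a better trade-off between fit and model size, which is the sense in which the three-stage models are preferred over the two-stage model.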

Figure 5: Curve fitting of SEER data by the Model-F, Model-1, and the two-stage model. (Horizontal axis: age in years, 0 to 84; vertical axis: incidences, 0 to 350; series: Observed, Model-F, Model-1, Two-stage.)

Given in Table 2 are the natural logs of the likelihood functions, the AIC (Akaike Information Criterion), and the BIC (Bayesian Information Criterion) for these models. Given in Table 3 are the estimates of parameters in the 3-stage models. Given in Figure 5 are the plots of predicted cancer cases from the 3-stage mixture models (Model-F and Model-1) and the 2-stage model. For comparison purposes, in Table 1 we also provide numbers of predicted cancer cases from the 3-stage mixture models and the 2-stage model together with the observed cancer cases over time from SEER. From these results, we have made the following observations.

(a) As shown by results in Table 1 and Figure 5, it appeared that both Model-F and Model-1 fitted the SEER data well, although Model-1 fitted the data slightly better judging from the values of AIC and BIC. The Chi-square test statistics $\chi^2 = \sum_{i=0}^{85} ((y_i - \hat{y}_i)^2 / \hat{y}_i)$ for Model-F and Model-1 are given by 88.43 and 94.48, respectively, giving a $P$-value of 0.12 (df = 86 − 12 = 74) for Model-F and a $P$-value of 0.11 (df = 86 − 7 = 79) for Model-1. On the other hand, the 2-stage model fitted the data very poorly; the Chi-square statistic value for the 2-stage model is 2747.69, giving a $P$-value less than $10^{-3}$. The AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) values of Model-1 are given by (AIC = 2609.53, BIC = 2631.51), which are slightly smaller than those of Model-F, respectively; however, the AIC and BIC values (8796.84, 8811.57) of the two-stage model are considerably greater than those of the 3-stage models, respectively. These results suggest that uveal melanoma may best be described by a 3-stage model with inherited component and that one may practically assume $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth.

(b) From Table 3, it is observed that the estimate of $\gamma_1$ is close to zero (the estimate is of order $10^{-5}$), indicating that the phenotype of $J_1$ is almost identical to that of $N = J_0$, further confirming that the staging-limiting genes are basically tumor suppressor genes and that there is no haploinsufficiency for these tumor suppressor genes. On the other hand, the estimate of $\gamma_2$ is of order $10^{-2}$, which is about $10^{3}$ times greater than that of cells with genotype $J_1$.

(c) From Table 3, the estimates $\hat{\lambda}_j$ $(j = 0, 1, 2)$ of $\lambda_j$ are of order $10^{-1}$, $10^{-1}$, $10^{2} \sim 10^{3}$, respectively. Because $\lambda_i = E[J_i(t_0; i)]\,\beta_i \prod_{u=i+1}^{2} (\beta_u / \gamma_u)$, $i = 0, 1, 2$, assuming some values of $E[J_i(t_0; i)]$, $i = 0, 1, 2$, from some biological observations, one can have some rough ideas about the magnitude of $\beta_j$ $(j = 0, 1, 2)$. For example, if we follow Potten et al. [32] to assume ($E[N(t_0)] = E[J_i(t_0; i)] \sim 10^{8}$, $i = 0, 1, 2$), then $\beta_j \approx 10^{-6} \sim 10^{-5}$.

(d) From Table 3, the estimates of $p_1$ and $p_2$ from the SEER data are of orders $10^{-4} \sim 10^{-3}$ and $10^{-7} \sim 10^{-6}$, respectively. This indicates that in the US population the frequency of the staging-limiting cancer gene for uveal melanoma is approximately $10^{-3}$. Table 3 also showed that the estimate of $\alpha$ was 0.8411, indicating that most individuals with genotype $J_2$ would develop cancer at birth. This may help to explain why there are observed cancer incidences at birth for uveal melanoma in the SEER data even though the estimate of the frequency $p_2$ is of order $10^{-7} \sim 10^{-6}$.
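The goodness-of-fit check in observation (a) can be sketched in Python as follows. The first function computes the Pearson statistic $\sum_i (y_i - \hat{y}_i)^2 / \hat{y}_i$; the second converts a statistic to an approximate upper-tail $P$-value using the Wilson–Hilferty normal approximation to the chi-square distribution, which is a technique swapped in here so that only the standard library is needed and is not stated to be the method used in the paper. The reported statistics are read as 88.43 (df = 74) and 94.48 (df = 79).

```python
import math


def pearson_chi_square(observed, predicted):
    """Sum over groups of (observed - predicted)**2 / predicted."""
    return sum((o - p) ** 2 / p for o, p in zip(observed, predicted))


def chi_square_upper_tail(x, df):
    """Approximate P(chi-square_df > x) by the Wilson-Hilferty approximation."""
    c = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # standard normal upper tail at z


if __name__ == "__main__":
    print(pearson_chi_square([10, 20], [12, 18]))  # toy data, not SEER counts
    print(chi_square_upper_tail(88.43, 74))
    print(chi_square_upper_tail(94.48, 79))
```

Under this approximation both three-stage models give $P$-values above the usual 0.05 threshold, in line with the values quoted in observation (a).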

7. Discussion and Conclusions

To account for inherited cancer cases, in this paper we have developed some general multistage models involving hereditary cancer cases. For human cancer incidence, these models are basically generalized mixture models. In these mixture models, the mixing probability is a multinomial distribution to account for genetic segregation of the staging-limiting tumor suppressor genes. This mixture model allows us to estimate for the first time the frequency of the staging-limiting tumor suppressor gene in human populations. As an example of applications, in this paper we have developed a general 3-stage stochastic multistage model of carcinogenesis for adult human eye cancer. To account for inherited cancer cases in the stochastic model of human eye cancer, we have also developed a generalized mixture model for uveal melanoma in human beings.

Table 3: Estimates of parameters for the 3-stage stochastic models.

Parameters      $\lambda_0$  $\lambda_1$  $\lambda_2$  $\gamma_1$  $\gamma_2$  $p_1$      $p_2$      $\alpha$   $\theta_1$  $\theta_2$
Model-F
  Estimates     2.88E-01   4.81E-01   6.55E+02   2.71E-05   3.37E-02   9.95E-04   9.68E-07   8.41E-01   3.29E-04   3.41E-01
  StD           1.21E-01   4.65E-02   1.12E+02   8.86E-06   1.92E-04   1.75E-06   6.48E-09   4.19E-02   3.54E-05   1.07E-02
  95% CL-Lower  5.03E-02   3.89E-01   4.34E+02   5.34E-06   3.33E-02   9.91E-04   9.55E-07   7.59E-01   2.60E-04   3.20E-01
  95% CL-Upper  5.25E-01   5.72E-01   8.75E+02   4.01E-05   3.41E-02   9.98E-04   9.81E-07   9.23E-01   3.98E-04   3.62E-01
Model-1
  Estimates     6.55E-02   3.72E-01   1.98E+02   NA         3.49E-02   9.98E-04   1.00E-06   8.07E-01   NA         NA
  StD           2.54E-02   4.63E-02   3.38E+01   NA         3.58E-03   6.27E-05   4.53E-08   7.06E-02   NA         NA
  95% CL-Lower  1.57E-02   2.82E-01   1.32E+02   NA         2.79E-02   8.75E-04   9.11E-07   6.68E-01   NA         NA
  95% CL-Upper  1.15E-01   4.63E-01   2.65E+02   NA         4.19E-02   1.12E-03   1.09E-06   9.45E-01   NA         NA

Note: NA: assumed nonexistence.

For using the proposed models to fit the cancer incidence data, in this paper we have developed a generalized Bayesian inference procedure to estimate the unknown parameters and to predict cancer cases. This inference procedure is advantageous over the classical sampling theory inference (i.e., maximum likelihood method) because the procedure combines information from three sources: (1) previous information and experiences about the parameters in terms of the prior distribution $P\{\Theta\}$ of the parameters, (2) biological information of inherited cancer cases via the genetic segregation of staging-limiting tumor suppressor genes in the population, and (3) information from the expanded data ($\mathbf{Y}$) and the observed data ($\underset{\sim}{y}$) via the statistical model from the system ($P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \Theta\}$ given by (37) and (40)).

To illustrate the usefulness and applications of our models

and methods, we have applied our models and methods to the eye cancer SEER data of NCI/NIH. Our analysis clearly showed that the proposed 3-stage model with inherited cancer cases fitted the data nicely (see Table 2 and Figure 5); on the other hand, the classical 2-stage model cannot fit the data at all (see Table 2 and Figure 5). These results clearly have confirmed results from molecular biology that the human eye cancer is derived by a 3-stage model with inherited cancer component. Notice, however, that our 3-stage multistage model is more general than the classical 3-stage model as described in Little [1], Tan [2], and Zheng [7] in that we postulate that cancer tumors develop from primary $I_3$ cells by clonal expansion (see Yang and Chen [11]). (Note that the stochastic multistage models in the literature assume that cancer tumors develop from last stage cells immediately as soon as they are generated, ignoring completely cancer progression; see Remark 1.) As a matter of fact, we had assumed $\Delta t = 1$ for a period of three months and found that the 3-stage models then did not fit the SEER data, clearly indicating that $P_T(s, t) < 1$ over a period of three months.

Applying our models and methods to the SEER data

of human eye cancer, we have derived for the first time some useful pieces of information. Specifically, we mention the following. (1) For the first time, we have estimated the frequency of the staging-limiting tumor suppressor gene in the US population ($p_1 \sim 0.9948 \times 10^{-3}$). (2) With the estimate of $\alpha$ as $\hat{\alpha} = 0.8411$, the predicted number of uveal melanoma at birth is $\hat{y}_0 = n_{01} p_2 = 36$ by 3-stage models with inherited cancer component (Model-F and Model-1). (The observed number of eye cancer at birth is 34.) (3) The estimate of the proliferation rate ($\gamma_1$) of $J_1$ cells using Model-F is $\hat{\gamma}_1 = 2.271 \times 10^{-5} \sim 0$. (The estimate is $8.603 \times 10^{-5}$ using Model-1.) This confirms that the stage-1 limiting gene is a tumor suppressor gene and that, unlike the p53 gene in chromosome 17p (see [33]), there is little or no haploinsufficiency for this gene in cells with genotype $J_1$.

Using the models and methods of this paper, one can easily predict future cancer cases for human eye cancer. Thus, by comparing results from different populations, our models and methods can be used to assess cancer prevention and control procedures. This will be our future research topic; we will not go any further here.

Appendix

The Expected Numbers of State Variables under Discrete Time Approximation

Under discrete time, the stochastic differential equations of state variables reduce to the following stochastic difference equations of state variables, respectively:

$$\begin{aligned}
J_i(t + 1; i) &= J_i(t; i)[1 + \gamma_i(t)] + e_i(t + 1; i), \quad i = 0, 1, \ldots, \mathrm{Min}(k - 1, 2), \\
J_j(t + 1; u) &= J_j(t; u)[1 + \gamma_j(t)] + J_{j-1}(t; u)\,\beta_{j-1}(t) + e_j(t + 1; u), \quad 0 \le u < j \le k - 1,
\end{aligned} \tag{A1}$$

where $e_i(t + 1; i) = [B_i(t; i) - J_i(t; i) b_i(t)] - [D_i(t; i) - J_i(t; i) d_i(t)]$ and $e_j(t + 1; i) = [B_j(t; i) - J_j(t; i) b_j(t)] - [D_j(t; i) - J_j(t; i) d_j(t)] + [M_{j-1}(t; i) - J_{j-1}(t; i) \beta_{j-1}(t)]$ for $j > i$.

The initial conditions at birth ($t_0$) for the above stochastic difference equations are $J_j(t_0; i) > 0$ if $(j = i, i + 1)$ and $J_j(t_0; i) = 0$ if $j > i + 1$. The solution of the above difference equations under these initial conditions is given, respectively, by

$$\begin{aligned}
J_i(t; i) &= J_i(t_0; i) \prod_{s=t_0}^{t-1} (1 + \gamma_i(s)) + \sum_{s=t_0+1}^{t} e_i(s; i) \prod_{u=s}^{t-1} (1 + \gamma_i(u)), \quad i = 0, 1, \ldots, \mathrm{Min}(k - 1, 2), \\
J_j(t; i) &= J_j(t_0; i) \prod_{s=t_0}^{t-1} (1 + \gamma_j(s)) + \sum_{s=t_0}^{t-1} J_{j-1}(s; i)\,\beta_{j-1}(s) \prod_{u=s+1}^{t-1} (1 + \gamma_j(u)) \\
&\quad + \sum_{s=t_0+1}^{t} e_j(s; i) \prod_{u=s}^{t-1} (1 + \gamma_j(u)), \quad 0 \le i < j \le k - 1.
\end{aligned} \tag{A2}$$

If the model is time homogeneous so that $\beta_j(t) = \beta_j$, $\gamma_j(t) = \gamma_j$ $(j = i, \ldots, k - 1)$ and if $\gamma_i \neq \gamma_j$ if $i \neq j$, then the above solutions under the initial conditions ($J_j(t_0; i) = 0$, $j > i + 1$) reduce to

$$\begin{aligned}
J_i(t; i) &= J_i(t_0; i)(1 + \gamma_i)^{t - t_0} + \eta_i^{(0)}(t; i), \quad i = 0, 1, \ldots, \mathrm{Min}(k - 1, 2), \\
J_j(t; i) &= J_j(t_0; i)(1 + \gamma_j)^{t - t_0} + \sum_{u=i}^{j-1} J_u(t_0; i) \left\{ \prod_{v=u}^{j-1} \beta_v \right\} \sum_{r=u}^{j} A_{uj}(r)(1 + \gamma_r)^{t - t_0} \\
&\quad + \eta_j^{(0)}(t; i) + \sum_{u=1}^{j-i} \left\{ \prod_{r=j-u}^{j-1} \beta_r \right\} \eta_j^{(u)}(t; i) \\
&= \sum_{u=i}^{i+1} J_u(t_0; i) \left\{ \prod_{v=u}^{j-1} \beta_v \right\} \sum_{r=u}^{j} A_{uj}(r)(1 + \gamma_r)^{t - t_0} \\
&\quad + \eta_j^{(0)}(t; i) + \sum_{u=1}^{j-i} \left\{ \prod_{r=j-u}^{j-1} \beta_r \right\} \eta_j^{(u)}(t; i), \quad 0 \le i < j \le k - 1,
\end{aligned} \tag{A3}$$

where $\eta_j^{(0)}(t; i) = \sum_{s=t_0+1}^{t} e_j(s; i)(1 + \gamma_j)^{t - s}$ and $\eta_j^{(u)}(t; i) = \sum_{s=t_0+1}^{t} e_{j-u}(s; i) \sum_{v=u}^{j} A_{uj}(v)(1 + \gamma_v)^{t - s}$ $(u = 1, \ldots, j - i)$.

Thus, if the model is time homogeneous and if $\gamma_i \neq \gamma_j$ if $i \neq j$, the $E[J_{k-1}(t; i)]$'s in discrete time models under the initial conditions ($J_j(t_0; i) = 0$, $j > i + 1$) are given, respectively, by

$$E[J_{k-1}(t; i)] = \sum_{u=i}^{i+1} E[J_u(t_0; i)] \left( \prod_{v=u}^{k-2} \beta_v \right) \sum_{r=u}^{k-1} A_{u(k-1)}(r)(1 + \gamma_r)^{t - t_0}, \quad i = 0, 1, \ldots, \mathrm{Min}(k - 1, 2). \tag{A4}$$
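Because the random terms $e_j$ have zero expectation, the expected numbers satisfy the deterministic version of the difference equations (A1), and the closed form can be checked against direct iteration. The Python sketch below does this for the simplest case of two compartments ($J_0 \to J_1$) with constant rates and $\gamma_0 \neq \gamma_1$. The closed form used is the elementary solution of this two-compartment recursion, written out explicitly here rather than through the coefficients $A_{uj}(r)$; the function names and numerical values are illustrative assumptions.

```python
def expected_counts_recursion(j0, j1, beta0, gamma0, gamma1, steps):
    """Iterate E[J0] and E[J1] with the noise terms of (A1) set to zero."""
    for _ in range(steps):
        # Both right-hand sides use the values from the previous time step.
        j0, j1 = j0 * (1.0 + gamma0), j1 * (1.0 + gamma1) + j0 * beta0
    return j0, j1


def expected_counts_closed_form(j0, j1, beta0, gamma0, gamma1, steps):
    """Closed-form solution of the same recursion, valid for gamma0 != gamma1."""
    a0 = (1.0 + gamma0) ** steps
    a1 = (1.0 + gamma1) ** steps
    return j0 * a0, j1 * a1 + j0 * beta0 * (a1 - a0) / (gamma1 - gamma0)


if __name__ == "__main__":
    # Illustrative values: 1e8 first-stage cells, 170 half-year steps.
    args = (1e8, 0.0, 1e-6, 1e-5, 0.03, 170)
    print(expected_counts_recursion(*args))
    print(expected_counts_closed_form(*args))
```

The two results should agree up to rounding error, which illustrates how sums of powers $(1 + \gamma_r)^{t - t_0}$ arise in (A4).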

References

[1] M. P. Little, “Cancer models, ionization, and genomic instability: a review,” in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 5, World Scientific, River Edge, NJ, USA, 2008.

[2] W. Y. Tan, Stochastic Models of Carcinogenesis, Marcel Dekker, New York, NY, USA, 1991.

[3] W. Y. Tan, “Stochastic multi-stage models of carcinogenesis as hidden Markov models: a new approach,” International Journal of Systems and Synthetic Biology, vol. 1, pp. 313–337, 2010.

[4] W. Y. Tan, L. J. Zhang, and C. W. Chen, “Stochastic modeling of carcinogenesis: state space models and estimation of parameters,” Discrete and Continuous Dynamical Systems B, vol. 4, no. 1, pp. 297–322, 2004.

[5] W. Y. Tan, C. W. Chen, and L. J. Zhang, “Cancer biology, cancer models and stochastic mathematical analysis of carcinogenesis,” in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 3, World Scientific, River Edge, NJ, USA, 2008.

[6] R. A. Weinberg, The Biology of Human Cancer, Garland Sciences, Taylor and Francis, New York, NY, USA, 2007.

[7] Q. Zheng, “Stochastic multistage cancer models: a fresh look at an old approach,” in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 2, World Scientific, River Edge, NJ, USA, 2008.

[8] W. Y. Tan, L. J. Zhang, W. Chen, and J. M. Zhu, “A stochastic model of human colon cancer involving multiple pathways,” in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 11, World Scientific, River Edge, NJ, USA, 2008.

[9] W. Y. Tan and H. Zhou, “A new stochastic model of retinoblastoma involving both hereditary and non-hereditary cancer cases,” Journal of Carcinogenesis and Mutagenesis, vol. 2, no. 2, article 117, 2011.

[10] X. W. Yan, Stochastic and State Space Models of Carcinogenesis Under Complex Situation [Ph.D. thesis], Department of Mathematical Sciences, University of Memphis, Memphis, Tenn, USA.

[11] G. L. Yang and C. W. Chen, “A stochastic two-stage carcinogenesis model: a new approach to computing the probability of observing tumor in animal bioassays,” Mathematical Biosciences, vol. 104, no. 2, pp. 247–258, 1991.

[12] H. Osada and T. Takahashi, “Genetic alterations of multiple tumor suppressors and oncogenes in the carcinogenesis and progression of lung cancer,” Oncogene, vol. 21, no. 48, pp. 7421–7434, 2002.

[13] I. I. Wistuba, L. Mao, and A. F. Gazdar, “Smoking molecular damage in bronchial epithelium,” Oncogene, vol. 21, no. 48, pp. 7298–7306, 2002.

[14] S. Landreville, O. A. Agapova, and J. W. Harbour, “Emerging insights into the molecular pathogenesis of uveal melanoma,” Future Oncology, vol. 4, no. 5, pp. 629–636, 2008.

[15] H. W. Mensink, D. Paridaens, and A. De Klein, “Genetics of uveal melanoma,” Expert Review of Ophthalmology, vol. 4, no. 6, pp. 607–616, 2009.

[16] W. Y. Tan and X. W. Yan, “A new stochastic and state space model of human colon cancer incorporating multiple pathways,” Biology Direct, vol. 5, article 26, 2010.

[17] L. B. Klebanov, S. T. Rachev, and A. Y. Yakovlev, “A stochastic model of radiation carcinogenesis: latent time distributions and their properties,” Mathematical Biosciences, vol. 113, no. 1, pp. 51–75, 1993.

[18] A. Y. Yakovlev and A. D. Tsodikov, Stochastic Models of Tumor Latency and Their Biostatistical Applications, World Scientific, River Edge, NJ, USA, 1996.

[19] H. Fakir, W. Y. Tan, L. Hlatky, P. Hahnfeldt, and R. K. Sachs, “Stochastic population dynamic effects for lung cancer progression,” Radiation Research, vol. 172, no. 3, pp. 383–393, 2009.

[20] A. G. Knudson, “Mutation and cancer: statistical study of retinoblastoma,” Proceedings of the National Academy of Sciences of the United States of America, vol. 68, no. 4, pp. 820–823, 1971.

[21] J. F. Crow and M. Kimura, An Introduction to Population Genetics Theory, Harper and Row, New York, NY, USA, 1970.

[22] W. Y. Tan, Stochastic Models With Applications to Genetics, Cancers, AIDS and Other Biomedical Systems, World Scientific, River Edge, NJ, USA, 2002.

[23] W. Y. Tan, C. W. Chen, and L. J. Zhang, “Cancer risk assessment by state space models,” in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 13, World Scientific, River Edge, NJ, USA, 2008.

[24] W. Y. Tan and C. W. Chen, “Stochastic modeling of carcinogenesis: some new insights,” Mathematical and Computer Modelling, vol. 28, no. 11, pp. 49–71, 1998.

[25] W. Y. Tan and C. W. Chen, “Cancer stochastic models,” in Encyclopedia of Statistical Sciences, Revised edition, John Wiley and Sons, New York, NY, USA, 2005.

[26] E. G. Luebeck and S. H. Moolgavkar, “Multistage carcinogenesis and the incidence of colorectal cancer,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 23, pp. 15095–15100, 2002.

[27] R. Durrett, J. Mayberry, S. Moseley, and D. Schmidt, “Probability models for cancer development and progression,” Google Search, Google, 2012.

[28] G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis, Addison-Wesley, Reading, Mass, USA, 1973.


[29] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm (with discussion)," Journal of the Royal Statistical Society B, vol. 39, pp. 1–38, 1977.

[30] A. F. M. Smith and A. E. Gelfand, "Bayesian statistics without tears: a sampling-resampling perspective," American Statistician, vol. 46, pp. 84–88, 1992.

[31] A. E. Loercher and J. W. Harbour, "Molecular genetics of uveal melanoma," Current Eye Research, vol. 27, no. 2, pp. 69–74, 2003.

[32] C. S. Potten, C. Booth, and D. Hargreaves, "The small intestine as a model for evaluating adult tissue stem cell drug targets," Cell Proliferation, vol. 36, no. 3, pp. 115–129, 2003.

[33] C. J. Lynch and J. Milner, "Loss of one p53 allele results in four-fold reduction of p53 mRNA and protein: a basis for p53 haplo-insufficiency," Oncogene, vol. 25, no. 24, pp. 3463–3470, 2006.


Figure 3: The APC-$\beta$-catenin-Tcf-myc pathway for human colon cancer. Panels: (a) Sporadic (about 70–75%); (b) FAP (familial adenomatous polyps) (about 1%). Diagram labels: N, APC, second copy APC, dysplastic ACF, Ras, Smad4/DCC, second copy Smad4/DCC, P53, carcinomas.

transient cells. In these cases, one needs only to deal with $T(t)$ and $J_j$ cells with $j = 1, \ldots, k-1$. However, as shown by Yang and Chen [11], the number of tumors is much smaller than the total number of $J_k$ cells. Also, in many animal models and in cancer risk assessment of radiation, Klebanov et al. [17], Yakovlev and Tsodikov [18], and Fakir et al. [19] have shown that $T(t)$ is in general not Markov.

To extend the above model to include hereditary cancers, observe that mutants of cancer genes exist in the population and that both germline cells (egg and sperm) and somatic cells may carry mutant alleles of cancer genes [2, 20]. Further, without exception, every human being develops from the embryo in his/her mother's womb (embryo stage; denote time by 0), where stem cells of different organs divide and differentiate to develop different organs, respectively (see Weinberg [6], Chapter 10). If both the egg and the sperm generating the embryo carry mutant alleles of relevant cancer genes, then the individual is a $J_2$-stage person at the embryo stage; if only one of the germline cells (egg or sperm) generating the embryo carries mutant alleles of cancer genes, then at the embryo stage the individual is a $J_1$-stage person. Similarly, the individual is a normal person ($N = J_0$ person) at the embryo stage if neither the egg nor the sperm generating the embryo carries mutant alleles of cancer genes. Refer to a person in the population as a $J_i$ ($i = 0, 1, 2$) person if he/she is a $J_i$-stage person at the embryo stage. Then, with respect to the cancer development in question, people in the population can be classified into 3 types: normal people ($N = J_0$ people), $J_1$ people, and $J_2$ people. Based on this classification, for normal people in the population the stochastic model of carcinogenesis is a $k$-stage multievent model given by $J_0 \to J_1 \to \cdots \to J_k \to \text{Tumor}$; for $J_1$ people in the population the stochastic model of carcinogenesis is a $(k-1)$-stage multievent model given by $J_1 \to J_2 \to \cdots \to J_k \to \text{Tumor}$; and for $J_2$ people in the population the stochastic model of carcinogenesis is a $(k-2)$-stage multievent model given by $J_2 \to \cdots \to J_k \to \text{Tumor}$.

To account for inherited cancer cases, let $p_1$ be the proportion of $J_1$ people in the population and $p_2$ the proportion of $J_2$ people in the population. In general, for large human populations under steady-state conditions, one may practically assume that $p_i$ is a constant independent of time (Crow and Kimura [21]). Then $p_0 = 1 - p_1 - p_2$ ($0 < p_1 + p_2 < 1$) is the proportion of normal people (i.e., $N = J_0$ people) in the population. Let $n$ be the population size and $n_i$ ($i = 0, 1, 2$) the number of $J_i$ people in the population, so that $\sum_{u=0}^{2} n_u = n$. Assume that $n$ is very large and that marriage between people in the population is random with respect to cancer genes; then, as shown in Crow and Kimura [21] (see also Tan [22], Chapter 2), the conditional probability distribution of $(n_1, n_2)$ given $n$ is 2-dimensional multinomial with parameters $(n; p_1, p_2)$. That is,

$$
(n_1, n_2) \mid n \sim \text{Multinomial}(n; p_1, p_2). \tag{1}
$$
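As a rough illustration of (1), the genotype counts can be simulated by assigning each of the $n$ people independently to one of the three types. The sketch below uses only the Python standard library; the function name `sample_genotype_counts` and the values of `p1` and `p2` are hypothetical and chosen purely for illustration, not taken from the paper.

```python
import random

def sample_genotype_counts(n, p1, p2, seed=0):
    """Draw (n0, n1, n2) such that (n1, n2) | n ~ Multinomial(n; p1, p2)."""
    if p1 < 0 or p2 < 0 or not (0.0 < p1 + p2 < 1.0):
        raise ValueError("need p1, p2 >= 0 and 0 < p1 + p2 < 1")
    rng = random.Random(seed)
    p0 = 1.0 - p1 - p2  # proportion of normal (J_0) people
    # Each person is independently a J_0, J_1 or J_2 person.
    draws = rng.choices((0, 1, 2), weights=(p0, p1, p2), k=n)
    n1 = draws.count(1)
    n2 = draws.count(2)
    return n - n1 - n2, n1, n2

# Hypothetical proportions, for illustration only.
n0, n1, n2 = sample_genotype_counts(100_000, p1=0.01, p2=0.001)
```

For large `n` the sampled fractions `n1 / n` and `n2 / n` should be close to `p1` and `p2`, which is the sense in which the proportions are treated as constants in the text.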

To derive the probability distribution of time to cancer under the above model, observe that during pregnancy the proliferation rates of all stem cells are quite high. Thus, with positive probability, $J_2$ people in the population may acquire additional genetic and/or epigenetic changes during pregnancy to become $J_3$-stage people at birth. Similarly, $J_1$ people may acquire genetic and/or epigenetic changes during pregnancy to become $J_2$ people at birth; although the probability is very small, normal people at the embryo stage may acquire some genetic and/or epigenetic changes during pregnancy to become $J_1$ people at birth. Because the probability of genetic and epigenetic changes is small, one may practically assume that a $J_i$ ($i = 0, 1, 2$) person at the embryo stage would only give rise to $J_i$ stem cells and possibly $J_{i+1}$ stem cells at birth. This is equivalent to assuming that $J_i$ people at the embryo stage would not generate $J_{i+j}$ ($j > 1$) stem cells at or before birth. This model is represented schematically in Figure 4. Notice that if $k = 2$, one may practically assume that, with probability one, a $J_2$ person at the embryo stage would develop cancer at or before birth ($t_0$). If $k = 3$, then with probability $\alpha$ ($\alpha > 0$) a $J_2$ person at the embryo stage would develop cancer at or before birth.

Figure 4: Embryo genotypes and their frequencies at embryo stage and at birth. Panels: two-stage model; $k$-stage model ($k \ge 3$).

3. The Stochastic Process of Carcinogenesis with Hereditary Cancer Cases and Mathematical Analysis

Because tumors are developed from primary $J_k$ cells, for the above stochastic model the identifiable response variables are $T(t)$ and $\{J_u(t; i),\ i = 0, 1, 2,\ u = i, i+1, \ldots, k-1\}$, where $T(t)$ is the number of cancer tumors at time $t$ and $J_u(t; i)$ is the number of $J_u$ ($u = i, i+1, \ldots, k-1$) cells at time $t$ in people who are $J_i$ people at the embryo stage (see [3, 5, 8, 23], Remarks 1 and 2). For people who have genotype $J_i$ ($i = 0, 1, 2$) at the embryo stage, the stochastic model of carcinogenesis is then given by the stochastic process $\{\underset{\sim}{X}_i(t), T(t),\ t > 0\}$, where $\underset{\sim}{X}_i(t) = \{J_u(t; i),\ u = i, i+1, \ldots, k-1\}'$. For these processes, in the next subsections we will derive stochastic equations for the state variables ($J_u(t; i)$, $i = 0, 1, 2$, $u = i, \ldots, k-1$); we will also derive the probability distributions of these state variables and the probabilities of developing cancer tumors. These are the basic approaches for modeling carcinogenesis used by the first author and his associates; see Tan [3], Tan et al. [4, 5, 8, 23], Tan and Zhou [9], Tan and Yan [16], Tan and Chen [24, 25], and Remark 3.

Remark 2. At any time (say $t$), the total number of $J_k$ cells is equal to the total number of $J_k$ cells generated from $J_{k-1}$ cells at time $t$ plus the total number of $J_k$ cells generated by cell division from other $J_k$ cells at time $t$; the former $J_k$ cells are referred to as primary $J_k$ cells, while the latter are not primary $J_k$ cells. Since each tumor is developed from a single primary $J_k$ cell through a stochastic birth and death process, each primary $J_k$ cell will generate at most one tumor. It follows that at any time the total number of $J_k$ cells is considerably greater than the number of cancer tumors (see also Yang and Chen [11]). Thus, for generating cancer tumors, the only identifiable state variables are the number of $J_j$ cells ($j = 0, 1, \ldots, k-1$) and the number of detectable cancer tumors.

Remark 3. To model stochastic multistage models of carcinogenesis, the standard traditional approach is to assume that the last stage cells (i.e., the $J_k$ cells in the model $N \to J_1 \to \cdots \to J_k \to \text{Tumor}$) grow instantaneously into a cancer tumor as soon as they are generated and then to apply the standard Markov theory to $T(t)$ and to the state variables $\underset{\sim}{X}(t) = \{J_i(t),\ i = 0, 1, \ldots, k-1\}$. This approach has been described in detail in Tan [2], Little [1], and Zheng [7]; see also Luebeck and Moolgavkar [26] and Durrett et al. [27]. However, in some cases the assumption of instantaneous growth of $J_k$ cells into cancer tumors may not be realistic (Klebanov et al. [17], Yakovlev and Tsodikov [18], and Fakir et al. [19]); in these cases $T(t)$ is not Markov, so that the Markov theory method is not applicable to $T(t)$. To develop analytical results and to resolve many difficult issues, Tan and his associates [4, 5, 24] have proposed an alternative approach through stochastic equations and have followed Yang and Chen [11] in assuming that cancer tumors develop by clonal expansion from primary last stage cells. Through the probability generating function method, Tan and Chen [24] have shown that if the Markov theory is applicable to $T(t)$, then the stochastic equation method is equivalent to the classical Markov theory method but is more powerful. Also, through the stochastic equation method, we have shown in the Appendix that the classical approach provides a close approximation to the discrete-time model under the assumption that the primary last stage cells develop into a detectable tumor in one time unit. This provides a reasonable explanation of why the traditional approach (see [2, 22]) can still work well even though the Markov assumption for $T(t)$ may not hold. In this paper, we will thus basically use the stochastic equation method and assume that cancer tumors develop from primary last stage cells through clonal expansion.

3.1. Stochastic Equations for the State Variables. Assume now that an individual is a $J_i$ ($i = 0, 1, 2$) person at the embryo stage. Then in this individual cancer is developed by a $(k-i)$-stage multievent model given by $J_i \to \cdots \to J_k \to \text{Tumor}$, and the identifiable response variables are given by $\{\underset{\sim}{X}_i(t)', T(t)\}$. To derive stochastic equations for the staging variables in $\underset{\sim}{X}_i(t)$ in this individual, observe that for each $i = 0, 1, 2$, $\underset{\sim}{X}_i(t)'$ is in general a Markov process, although $T(t)$ may not be Markov; see Remark 1, Tan [3], Tan et al. [4, 5], Tan and Zhou [9], and Tan and Yan [16]. It follows that $\underset{\sim}{X}_i(t + \Delta t)'$ derives from $\underset{\sim}{X}_i(t)'$ through stochastic birth-death processes of $J_u$ ($u = i, i+1, \ldots, k-1$) cells and through stochastic transitions $J_u \to J_{u+1}$, $u = i, i+1, \ldots, k-1$, during $(t, t + \Delta t]$. Let $\{B_u(t; i), D_u(t; i), M_u(t; i)\}$ be the number of births of $J_u$ cells, the number of deaths of $J_u$ cells, and the number of transitions $J_u \to J_{u+1}$ during $(t, t + \Delta t]$, respectively, in people who are $J_i$ people at the embryo stage. Let $M_0(t)$ denote the number of transitions $N \to J_1$ during $(t, t + \Delta t]$. Because the transition $J_u \to J_{u+1}$ would not affect the number of $J_u$ cells but only increase the number of $J_{u+1}$ cells (see Remark 4), by the conservation law we have the following stochastic equations for $J_u(t; i)$, $u = i, \ldots, k-1$, $i = 0, 1, 2$ (see Tan [3], Tan et al. [4, 5, 8], Tan and Zhou [9], and Tan and Yan [16]):

$$
\begin{aligned}
J_i(t + \Delta t; i) &= J_i(t; i) + B_i(t; i) - D_i(t; i), \quad i = 0, 1, 2, \\
J_u(t + \Delta t; i) &= J_u(t; i) + B_u(t; i) - D_u(t; i) + M_{u-1}(t; i), \quad i < u \le k-1.
\end{aligned}
\tag{2}
$$

Because $\{B_v(t; i), D_v(t; i), M_v(t; i),\ i = 0, 1, 2,\ v = i, \ldots, k-1\}$ are random variables, the above equations are basically stochastic equations. To derive probability distributions of these variables, let $b_u(t)$ and $d_u(t)$ denote the birth rate and the death rate at time $t$ of the $J_u$ ($u = 0, 1, \ldots, k-1$) cells, respectively. Let $\beta_u(t)$ ($u = 0, 1, \ldots, k-1$) be the transition rate at time $t$ from $J_u \to J_{u+1}$. Then, as shown in Tan [3], for ($i = 0, 1, 2$, $u = i, \ldots, k-1$), we have, to the order of $o(\Delta t)$,

$$
\{B_u(t; i), D_u(t; i), M_u(t; i)\} \mid J_u(t; i) \sim \text{Multinomial}\{J_u(t; i);\ b_u(t)\Delta t,\ d_u(t)\Delta t,\ \beta_u(t)\Delta t\}. \tag{3}
$$

It follows that, to the order of $o(\Delta t)$,

$$
\begin{aligned}
\{B_u(t; i), D_u(t; i)\} \mid J_u(t; i) &\sim \text{Multinomial}\{J_u(t; i);\ b_u(t)\Delta t,\ d_u(t)\Delta t\}, \\
M_u(t; i) \mid J_u(t; i) &\sim \text{Binomial}\{J_u(t; i);\ \beta_u(t)\Delta t\} \\
&\sim \text{Poisson}\{J_u(t; i)\,\beta_u(t)\Delta t\} + o(\beta_u(t)\Delta t),
\end{aligned}
\tag{4}
$$

independently of $\{B_u(t; i), D_u(t; i)\}$, $u = 0, 1, \ldots, k-1$.
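One way to read (2) and (3) together is as a recipe for advancing the cell counts over a single small interval $(t, t + \Delta t]$. The following is a minimal sketch under stated assumptions: rates are held constant over the step, each cell experiences at most one event, and the function name `one_step` and all numerical values are hypothetical, introduced here only for illustration.

```python
import random

def one_step(J, b, d, beta, dt, rng):
    """Advance the counts J = [J_i, ..., J_{k-1}] over (t, t + dt].

    Each cell independently divides (B), dies (D), produces one
    next-stage cell (M), or does nothing (N), with probabilities
    b*dt, d*dt, beta*dt and the remainder, as in the multinomial law (3).
    """
    B, D, M = [], [], []
    for cells, bu, du, betau in zip(J, b, d, beta):
        p_none = 1.0 - (bu + du + betau) * dt
        if p_none < 0.0:
            raise ValueError("dt is too large for the given rates")
        events = rng.choices(
            "BDMN", weights=(bu * dt, du * dt, betau * dt, p_none), k=cells
        )
        B.append(events.count("B"))
        D.append(events.count("D"))
        M.append(events.count("M"))
    # Equation (2): the first stage has no inflow; later stages gain M_{u-1}.
    new_J = [J[0] + B[0] - D[0]]
    for u in range(1, len(J)):
        new_J.append(J[u] + B[u] - D[u] + M[u - 1])
    # M[-1] counts transitions out of the last tracked stage,
    # i.e. newly generated primary J_k cells.
    return new_J, B, D, M

rng = random.Random(1)
J = [1000, 20, 0]            # hypothetical counts of J_i, J_{i+1}, J_{i+2}
b = [0.10, 0.12, 0.15]       # hypothetical birth rates
d = [0.09, 0.10, 0.10]       # hypothetical death rates
beta = [1e-2, 1e-2, 1e-2]    # hypothetical transition rates
new_J, B, D, M = one_step(J, b, d, beta, dt=0.1, rng=rng)
```

Note that a transition event leaves the originating $J_u$ count unchanged and only adds to $J_{u+1}$, which is the conservation argument used to write (2).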

From these distribution results, by subtracting from the random transition variables their conditional means, respectively, we obtain the following stochastic equations for the state variables $\underset{\sim}{X}_i(t)$ ($i = 0, 1, \ldots, \min(2, k-1)$):

$$
\begin{aligned}
dJ_i(t; i) &= J_i(t + \Delta t; i) - J_i(t; i) = B_i(t; i) - D_i(t; i) \\
&= J_i(t; i)\,\gamma_i(t)\,\Delta t + e_i(t; i)\,\Delta t, \quad i = 0, 1, 2, \\
dJ_u(t; i) &= J_u(t + \Delta t; i) - J_u(t; i) = M_{u-1}(t; i) + B_u(t; i) - D_u(t; i) \\
&= \{J_{u-1}(t; i)\,\beta_{u-1}(t) + J_u(t; i)\,\gamma_u(t)\}\,\Delta t + e_u(t; i)\,\Delta t, \\
&\qquad i = 0, 1, \ldots, \min(2, k-1),\ i < u \le k-1,
\end{aligned}
\tag{5}
$$

where $\gamma_u(t) = b_u(t) - d_u(t)$ for $u = 0, 1, \ldots, k-1$, and where

$$
e_i(t; i)\,\Delta t = [B_i(t; i) - J_i(t; i)\,b_i(t)\,\Delta t] - [D_i(t; i) - J_i(t; i)\,d_i(t)\,\Delta t]
$$

for $i = 0, 1, 2$, and

$$
e_u(t; i)\,\Delta t = [B_u(t; i) - J_u(t; i)\,b_u(t)\,\Delta t] - [D_u(t; i) - J_u(t; i)\,d_u(t)\,\Delta t] + [M_{u-1}(t; i) - J_{u-1}(t; i)\,\beta_{u-1}(t)\,\Delta t]
$$

for $i = 0, 1, \ldots, \min(2, k-1)$, $i < u \le k-1$.

From the above equations, by dividing both sides by $\Delta t$ and letting $\Delta t \to 0$, we obtain

$$
\begin{aligned}
\frac{d}{dt} J_i(t; i) &= J_i(t; i)\,\gamma_i(t) + e_i(t; i), \quad i = 0, 1, \ldots, \min(2, k-1), \\
\frac{d}{dt} J_u(t; i) &= J_u(t; i)\,\gamma_u(t) + J_{u-1}(t; i)\,\beta_{u-1}(t) + e_u(t; i), \\
&\qquad \text{for } i = 0, 1, \ldots, \min(2, k-1),\ i < u \le k-1.
\end{aligned}
\tag{6}
$$

In the above equations, using the distribution results in (4), it can easily be shown that the random noises $\{e_u(t; i),\ u = i, \ldots, k-1,\ i = 0, 1, 2\}$ have expected value zero and are uncorrelated with the staging variables and $T(t)$. The initial conditions at birth ($t_0$) for the above stochastic differential equations are $\{J_u(t_0; i) > 0,\ u = i, i+1;\ J_u(t_0; i) = 0,\ u > i+1\}$.

Given the initial conditions ($J_u(t_0; i) > 0$, $u = i, i+1$) and ($J_u(t_0; i) = 0$, $u > i+1$) at birth ($t_0$), the solution of the equations in (6) is given, respectively, by

$$
\begin{aligned}
J_i(t; i) &= J_i(t_0; i)\, e^{\int_{t_0}^{t} \gamma_i(x)\,dx} + \eta_i(t; i), \quad i = 0, 1, \ldots, \min(2, k-1), \\
J_u(t; i) &= J_u(t_0; i)\, e^{\int_{t_0}^{t} \gamma_u(x)\,dx} + \int_{t_0}^{t} J_{u-1}(x; i)\,\beta_{u-1}(x)\, e^{\int_{x}^{t} \gamma_u(y)\,dy}\,dx + \eta_u(t; i) \\
&= \cdots = J_u(t_0; i)\, e^{\int_{t_0}^{t} \gamma_u(x)\,dx} + \sum_{v=1}^{u-i} J_{u-v}(t_0; i)\,\phi_u^{(v)}(t; i) + \sum_{v=1}^{u+1-i} \eta_u^{(v)}(t; i), \\
&\qquad u = i+1, \ldots, k-1, \\
&\qquad \text{where } i = 0 \text{ if } k = 2;\quad i = 0, 1 \text{ if } k = 3;\quad i = 0, 1, 2 \text{ if } k > 3,
\end{aligned}
\tag{7}
$$

where

$$
\begin{aligned}
\phi_u^{(1)}(t; i) &= \int_{t_0}^{t} e^{\int_{x}^{t} \gamma_u(y)\,dy + \int_{t_0}^{x} \gamma_{u-1}(y)\,dy}\,\beta_{u-1}(x)\,dx, \quad u = i, \ldots, k-1, \\
\phi_u^{(v)}(t; i) &= \int_{t_0}^{t} e^{\int_{x}^{t} \gamma_u(y)\,dy}\,\beta_{u-1}(x)\,\phi_{u-1}^{(v-1)}(x; i)\,dx, \quad v = 2, \ldots, u-i, \\
\eta_u(t; i) = \eta_u^{(1)}(t; i) &= \int_{t_0}^{t} e^{\int_{x}^{t} \gamma_u(y)\,dy}\, e_u(x; i)\,dx, \quad u = i, \ldots, k-1, \\
\eta_u^{(v)}(t; i) &= \int_{t_0}^{t} e^{\int_{x}^{t} \gamma_u(y)\,dy}\,\beta_{u-1}(x)\,\eta_{u-1}^{(v-1)}(x; i)\,dx, \quad v = 2, \ldots, u-i.
\end{aligned}
\tag{8}
$$

If the model is time homogeneous, so that $\{\beta_u(t) = \beta_u,\ b_u(t) = b_u,\ d_u(t) = d_u,\ \gamma_u(t) = b_u - d_u = \gamma_u,\ u = 0, 1, \ldots, k-1\}$, and if $\gamma_i \ne \gamma_u$ whenever $i \ne u$, the above solutions under the initial conditions $\{J_{i+u}(t_0; i) > 0,\ u = 0, 1;\ J_{i+u}(t_0; i) = 0,\ u > 1\}$ then reduce, respectively, to

$$
\begin{aligned}
J_i(t; i) &= J_i(t_0; i)\, e^{\gamma_i (t - t_0)} + \eta_i^{(1)}(t; i), \quad i = 0, 1, \ldots, \min(2, k-1), \\
J_u(t; i) &= J_u(t_0; i)\, e^{\gamma_u (t - t_0)} + \beta_{u-1} \int_{t_0}^{t} J_{u-1}(x; i)\, e^{\gamma_u (t - x)}\,dx + \eta_u^{(1)}(t; i) \\
&= \cdots = J_u(t_0; i)\, e^{\gamma_u (t - t_0)} + \sum_{r=i}^{u-1} J_r(t_0; i) \left( \prod_{v=r}^{u-1} \beta_v \right) \sum_{l=r}^{u} A_{ru}(l)\, e^{\gamma_l (t - t_0)} + \sum_{r=i}^{u} \eta_u^{(u+1-r)}(t; i) \\
&= \sum_{r=i}^{i+1} J_r(t_0; i) \left( \prod_{v=r}^{u-1} \beta_v \right) \sum_{l=r}^{u} A_{ru}(l)\, e^{\gamma_l (t - t_0)} + \sum_{r=i}^{u} \eta_u^{(u+1-r)}(t; i), \quad i < u \le k-1,
\end{aligned}
\tag{9}
$$

where, for $i \le v \le u$,

$$
A_{iu}(v) =
\begin{cases}
1, & \text{if } i = u, \\
\displaystyle \prod_{l=i,\ l \ne v}^{u} (\gamma_v - \gamma_l)^{-1}, & \text{if } i < u.
\end{cases}
\tag{10}
$$

Obviously, $E[\eta_u^{(r)}(t; i)] = 0$ for all ($i = 0, 1, 2$, $u = i, \ldots, k-1$, $r = 1, \ldots, u-i+1$). It follows that for ($i = 0, 1, \ldots, \min(k-1, 2)$), the expected values $E[J_{k-1}(t; i)]$ of $J_{k-1}(t; i)$ for homogeneous models with $\gamma_i \ne \gamma_u$ whenever $i \ne u$ are given by

$$
\begin{aligned}
E[J_{k-1}(t; i)] &= E[J_{i+1}(t_0; i)] \left( \prod_{u=i+1}^{k-2} \beta_u \right) \sum_{v=i+1}^{k-1} A_{(i+1)(k-1)}(v)\, e^{\gamma_v (t - t_0)} \\
&\quad + E[J_i(t_0; i)] \left( \prod_{u=i}^{k-2} \beta_u \right) \sum_{v=i}^{k-1} A_{i(k-1)}(v)\, e^{\gamma_v (t - t_0)}, \\
&\qquad i = 0, 1, \ldots, \min(k-1, 2),\ k \ge 2,
\end{aligned}
\tag{11}
$$

where, as a convention, $\sum_{j=i+1}^{i} c_j = 0$ and $\prod_{j=i+1}^{i} d_j = 1$ for all $(c_j, d_j)$.
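The closed form for the expected number of last-stage cells can be checked against a direct numerical integration of the mean equations $dm_u/dt = \gamma_u m_u + \beta_{u-1} m_{u-1}$ obtained from (6) by dropping the zero-mean noise. The sketch below is illustrative only: stages are relabeled $0, 1, \ldots$ starting from $J_i$, all rates and initial counts are hypothetical, and the partial-fraction coefficient is written as $\prod_{m \ne l} (\gamma_l - \gamma_m)^{-1}$ with the rate of the exponential term first, which is the sign convention under which the two computations agree.

```python
import math

def coeff(gamma, r, u, l):
    """Coefficient of exp(gamma[l] * tau) for the chain of stages r..u."""
    out = 1.0
    for m in range(r, u + 1):
        if m != l:
            out /= gamma[l] - gamma[m]   # requires distinct net growth rates
    return out

def mean_last_stage(J0, gamma, beta, tau):
    """Closed-form mean of the last stage after elapsed time tau."""
    u = len(gamma) - 1
    total = 0.0
    for r, j0 in enumerate(J0):
        chain = math.prod(beta[r:u])     # product of beta_r ... beta_{u-1}
        total += j0 * chain * sum(
            coeff(gamma, r, u, l) * math.exp(gamma[l] * tau)
            for l in range(r, u + 1)
        )
    return total

def mean_by_rk4(J0, gamma, beta, tau, steps=2000):
    """Integrate the mean equations with a classical Runge-Kutta scheme."""
    def rhs(m):
        return [
            gamma[j] * m[j] + (beta[j - 1] * m[j - 1] if j > 0 else 0.0)
            for j in range(len(m))
        ]
    m = [float(x) for x in J0]
    h = tau / steps
    for _ in range(steps):
        k1 = rhs(m)
        k2 = rhs([a + 0.5 * h * s for a, s in zip(m, k1)])
        k3 = rhs([a + 0.5 * h * s for a, s in zip(m, k2)])
        k4 = rhs([a + h * s for a, s in zip(m, k3)])
        m = [
            a + h * (p + 2 * q + 2 * w + s) / 6.0
            for a, p, q, w, s in zip(m, k1, k2, k3, k4)
        ]
    return m[-1]

# Hypothetical three-stage example: J_i, J_{i+1}, J_{i+2} = J_{k-1}.
gamma = [0.05, 0.02, -0.01]   # net growth rates b_u - d_u
beta = [1e-3, 2e-3]           # transition rates between successive stages
J0 = [1000.0, 5.0, 0.0]       # counts at birth; zero beyond stage i+1
closed = mean_last_stage(J0, gamma, beta, tau=10.0)
numeric = mean_by_rk4(J0, gamma, beta, tau=10.0)
```

Agreement of `closed` and `numeric` to many digits is a useful sanity check on the indices and signs in the closed form before it is used inside a likelihood.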

Remark 4. Because genetic changes and epigenetic changes occur during cell division, to the order of $o(\Delta t)$, the probability is $\beta_u(t)\Delta t$ that one $J_u$ cell at time $t$ would give rise to 1 $J_u$ cell and 1 $J_{u+1}$ cell at time $t + \Delta t$ by genetic changes or epigenetic changes. It follows that the transition $J_u \to J_{u+1}$ would not affect the population size of $J_u$ cells but only increase the size of the $J_{u+1}$ population.


3.2. Transition Probabilities and Probability Distributions of Staging Variables. Let $f(x; n, p)$ denote the probability density function of a binomial random variable $X \sim \text{Binomial}(n, p)$, $h(x; \lambda)$ the probability density function of a Poisson random variable $X \sim \text{Poisson}(\lambda)$, and $g(x, y; n, p_1, p_2)$ the probability density function of a bivariate multinomial random vector $(X, Y) \sim \text{Multinomial}(n; p_1, p_2)$. Using the stochastic equations of the staging variables given by (2) and using the probability distributions of the transition variables $\{B_r(t; i), D_r(t; i), M_r(t; i)\}$ in (4), as in Tan et al. [4, 5], we obtain the following transition probabilities of $\{J_r(t + \Delta t; i),\ r = i, \ldots, k-1\}$ given $\{J_r(t; i),\ r = i, \ldots, k-1\}$ for ($i = 0, 1, \ldots, \min(k-1, 2)$):

$$
\begin{aligned}
&P\{J_r(t + \Delta t; i) = v_r,\ r = i, i+1, \ldots, k-1 \mid J_r(t; i) = u_r,\ r = i, i+1, \ldots, k-1\} \\
&\quad = P\{J_i(t + \Delta t; i) = v_i \mid J_i(t; i) = u_i\} \\
&\qquad \times \prod_{j=i+1}^{k-1} P\{J_j(t + \Delta t; i) = v_j \mid J_j(t; i) = u_j,\ J_{j-1}(t; i) = u_{j-1}\},
\end{aligned}
\tag{12}
$$

where

$$
\begin{aligned}
P\{J_i(t + \Delta t; i) = v_i \mid J_i(t; i) = u_i\}
&= \sum_{r=0}^{u_i} f(r;\ u_i,\ b_i(t)\Delta t) \times f\!\left(u_i - v_i + r;\ u_i - r,\ \frac{d_i(t)\Delta t}{1 - b_i(t)\Delta t}\right), \\
&\qquad i = 0, 1, \ldots, \min(k-1, 2), \\
P\{J_j(t + \Delta t; i) = v_j \mid J_j(t; i) = u_j,\ J_{j-1}(t; i) = u_{j-1}\}
&= \sum_{r_1=0}^{u_j} \sum_{r_2=0}^{u_j - r_1} g(r_1, r_2;\ u_j,\ b_j(t)\Delta t,\ d_j(t)\Delta t) \\
&\qquad \times h(v_j - u_j - r_1 + r_2;\ u_{j-1}\,\beta_{j-1}(t)\,\Delta t), \quad j > i.
\end{aligned}
\tag{13}
$$
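The first transition probability in (13) is a finite sum of products of binomial probabilities and can be evaluated directly. The sketch below does this for one cell type with constant rates over the step; the helper names `binom_pmf` and `trans_prob_first_stage` and the numerical values are hypothetical. Summing over all reachable values of $v_i$ should give one, and the mean of the next count should equal $u_i\{1 + (b_i - d_i)\Delta t\}$, which provides two simple checks.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial(n, p) probability of x, zero outside 0..n."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)

def trans_prob_first_stage(v, u, b, d, dt):
    """P{J_i(t + dt) = v | J_i(t) = u}: sum over the number of births r."""
    p_birth = b * dt
    # Given r births, deaths occur among the remaining u - r cells.
    p_death = d * dt / (1.0 - b * dt)
    return sum(
        binom_pmf(r, u, p_birth) * binom_pmf(u - v + r, u - r, p_death)
        for r in range(u + 1)
    )

# Hypothetical values: 20 cells, birth rate 0.3, death rate 0.2, dt = 0.1.
u, b, d, dt = 20, 0.3, 0.2, 0.1
probs = [trans_prob_first_stage(v, u, b, d, dt) for v in range(2 * u + 1)]
total = sum(probs)
mean_next = sum(v * p for v, p in enumerate(probs))
```

The count can at most double in one step (every cell divides) and can fall to zero, so `range(2 * u + 1)` covers every value of $v_i$ with positive probability.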

Define the unobservable transition variables $\underset{\sim}{U}_i(t) = \{B_i(t; i),\ (B_j(t; i), D_j(t; i)),\ j = i+1, \ldots, k-1\}'$ ($i = 0, 1, \ldots, \min(k-1, 2)$). Then we have, for the joint probability density function of $\{\underset{\sim}{X}_i(t + \Delta t), \underset{\sim}{U}_i(t)\}$ given $\underset{\sim}{X}_i(t)$,

$$
P\{\underset{\sim}{X}_i(t + \Delta t), \underset{\sim}{U}_i(t) \mid \underset{\sim}{X}_i(t)\} = P\{\underset{\sim}{X}_i(t + \Delta t) \mid \underset{\sim}{U}_i(t), \underset{\sim}{X}_i(t)\} \times P\{\underset{\sim}{U}_i(t) \mid \underset{\sim}{X}_i(t)\}, \tag{14}
$$

where

$$
\begin{aligned}
P\{\underset{\sim}{X}_i(t + \Delta t) \mid \underset{\sim}{U}_i(t), \underset{\sim}{X}_i(t)\}
&= f\!\left(J_i(t; i) - J_i(t + \Delta t; i) + B_i(t; i);\ J_i(t; i) - B_i(t; i),\ \frac{d_i(t)\Delta t}{1 - b_i(t)\Delta t}\right) \\
&\quad \times \prod_{j=i+1}^{k-1} h\{J_j(t + \Delta t; i) - J_j(t; i) - B_j(t; i) + D_j(t; i);\ J_{j-1}(t; i)\,\beta_{j-1}(t)\,\Delta t\},
\end{aligned}
\tag{15}
$$

$$
\begin{aligned}
P\{\underset{\sim}{U}_i(t) \mid \underset{\sim}{X}_i(t)\}
&= f(B_i(t; i);\ J_i(t; i),\ b_i(t)\Delta t) \\
&\quad \times \prod_{j=i+1}^{k-1} g\{B_j(t; i), D_j(t; i);\ J_j(t; i),\ b_j(t)\Delta t,\ d_j(t)\Delta t\}.
\end{aligned}
\tag{16}
$$

Let $\underset{\sim}{e}_i(j)$ be a $(k-i) \times 1$ column vector with 1 in the $j$th position ($1 \le j \le k-i$) and with 0 in other positions. Let $\underset{\sim}{u} = (u_i, \ldots, u_{k-1})'$ and $\underset{\sim}{v} = (v_i, \ldots, v_{k-1})'$ be $(k-i) \times 1$ column vectors of nonnegative integers (i.e., $u_j$ and $v_j$ are nonnegative integers). Then, by using the probability distribution results in (14)–(16), it can readily be shown that

$$
P\{\underset{\sim}{X}_i(t + \Delta t) = \underset{\sim}{v} \mid \underset{\sim}{X}_i(t) = \underset{\sim}{u}\} =
\begin{cases}
[u_j\, b_j(t) + (1 - \delta_{ji})\, u_{j-1}\, \beta_{j-1}(t)]\,\Delta t + o(\Delta t), & \text{if } \underset{\sim}{v} = \underset{\sim}{u} + \underset{\sim}{e}_i(j),\ j = i, \ldots, k-1, \\
u_j\, d_j(t)\,\Delta t + o(\Delta t), & \text{if } \underset{\sim}{v} = \underset{\sim}{u} - \underset{\sim}{e}_i(j),\ j = i, \ldots, k-1, \\
o(\Delta t), & \text{if } \left| \underset{\sim}{1}_{k-i}' (\underset{\sim}{u} - \underset{\sim}{v}) \right| \ge 2.
\end{cases}
\tag{17}
$$

The above results imply that $\underset{\sim}{X}_i(t)$ is a $(k-i)$-dimensional birth-death process with birth rates $\{i\,b_u(t),\ u = i, \ldots, k-1\}$, death rates $\{i\,d_u(t),\ u = i, \ldots, k-1\}$, and cross-transition rates $\{\alpha_{u,u+1}(j, t) = j\,\beta_u(t),\ u = i, \ldots, k-1;\ \beta_{u,v}(j, t) = 0 \text{ if } v \ne u+1\}$ (see Definition 4.1 in Tan [22], Chapter 4). Using these results, it can be shown that the Kolmogorov forward equation for the probabilities $P\{J_j(t; i) = u_j,\ j = i, \ldots, k-1 \mid J_i(0) = m_i,\ J_j(0) = 0,\ j > i\} = P(u_j,\ j = i, \ldots, k-1;\ t)$ ($i = 0, 1, \ldots, \min(k-1, 2)$) in the above model is given by

$$
\begin{aligned}
\frac{d}{dt} P(u_j,\ j = i, \ldots, k-1;\ t)
&= P(u_i - 1,\ u_j,\ j = i+1, \ldots, k-1;\ t)\,(u_i - 1)\, b_i(t) \\
&\quad + \sum_{j=i+1}^{k-1} P(u_i, u_{i+1}, \ldots, u_{j-1}, u_j - 1, u_{j+1}, \ldots, u_{k-1};\ t)\,(u_j - 1)\, b_j(t) \\
&\quad + \sum_{j=i}^{k-2} P(u_i, u_{i+1}, \ldots, u_j, u_{j+1} - 1, u_{j+2}, \ldots, u_{k-1};\ t)\, u_j\, \beta_j(t) \\
&\quad + \sum_{j=i}^{k-1} P(u_i, u_{i+1}, \ldots, u_{j-1}, u_j + 1, u_{j+1}, \ldots, u_{k-1};\ t)\,(u_j + 1)\, d_j(t) \\
&\quad - P(u_i, u_{i+1}, \ldots, u_{k-1};\ t) \times \left\{ \sum_{j=i}^{k-1} u_j\,[b_j(t) + d_j(t)] + \sum_{j=i}^{k-2} u_j\, \beta_j(t) \right\},
\end{aligned}
\tag{18}
$$

for $u_j = 0, 1, \ldots, \infty$, $j = i, \ldots, k-1$.

By using the above set of differential equations, one can readily compute the probabilities $P\{J_j(t) = u_j,\ j = i, \ldots, k-1 \mid I_i(0) = m_i\} = P(u_j,\ j = i, \ldots, k-1;\ t)$ numerically.
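As an illustration of such a numerical computation, the following minimal sketch integrates the forward equation in the simplest special case $k - i = 1$ with constant rates, where (18) reduces to $\frac{d}{dt}P(u;t) = P(u-1;t)(u-1)b + P(u+1;t)(u+1)d - P(u;t)\,u(b+d)$. The truncation level `u_max`, the step size `dt`, the classical fourth-order Runge-Kutta scheme, and all parameter values are assumptions made for this example only; they are not taken from the paper.

```python
def forward_rhs(p, b, d):
    """Right-hand side of dP(u; t)/dt for one cell type with constant rates b, d."""
    n = len(p)
    rhs = [0.0] * n
    for u in range(n):
        gain_birth = p[u - 1] * (u - 1) * b if u >= 1 else 0.0
        gain_death = p[u + 1] * (u + 1) * d if u + 1 < n else 0.0
        loss = p[u] * u * (b + d)
        rhs[u] = gain_birth + gain_death - loss
    return rhs


def solve_forward(m, b, d, t_end, u_max=100, dt=0.01):
    """Integrate the truncated system with classical RK4, starting from u = m."""
    p = [0.0] * (u_max + 1)
    p[m] = 1.0
    for _ in range(int(round(t_end / dt))):
        k1 = forward_rhs(p, b, d)
        k2 = forward_rhs([x + 0.5 * dt * k for x, k in zip(p, k1)], b, d)
        k3 = forward_rhs([x + 0.5 * dt * k for x, k in zip(p, k2)], b, d)
        k4 = forward_rhs([x + dt * k for x, k in zip(p, k3)], b, d)
        p = [x + dt * (a1 + 2 * a2 + 2 * a3 + a4) / 6.0
             for x, a1, a2, a3, a4 in zip(p, k1, k2, k3, k4)]
    return p


# Hypothetical values: m = 2 initial cells, birth rate 0.3, death rate 0.1.
probs = solve_forward(m=2, b=0.3, d=0.1, t_end=2.0)
mean = sum(u * q for u, q in enumerate(probs))
```

A quick sanity check is that `probs` sums to about 1 and that `mean` is close to $m\,e^{(b-d)t}$, the expected size of a linear birth-death process. For the multidimensional case the same idea applies, but the state space grows quickly with $k - i$.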

3.3. The Probability Distributions of the Number of Detectable Tumors and Times to Tumors. As shown by Yang and Chen [11], malignant cancer tumors arise from primary $J_k$ cells by clonal expansion, where primary $J_k$ cells are $J_k$ cells generated directly by $J_{k-1}$ cells ($J_k$ cells derived by stochastic birth of other $J_k$ cells are not primary $J_k$ cells). That is, cancer tumors develop from primary $J_k$ cells through stochastic birth-death processes.

To derive the probability distribution for $T(t)$ in $J_i$ people in the population, let $P_T(s,t)$ denote the probability that a primary cancer cell at time $s$ develops into a detectable cancer tumor by time $t$. (An explicit formula for $P_T(s,t)$ has been given in Tan [22], Chapter 8, and in Tan and Chen [24].) Then, as shown in Tan ([3, 22], Chapter 8), the conditional probability distribution of $T(t)$ given $\{J_{k-1}(s,i),\ s \le t\}$ in $J_i$ people is Poisson with mean $\omega(t,i)$, where $\omega(t,i) = \int_{t_0}^{t} J_{k-1}(s,i)\,\beta_{k-1}(s)\,P_T(s,t)\,ds$. That is,

$$
T(t) \mid \{J_{k-1}(s,i),\ s \le t\} \sim \text{Poisson}(\omega(t,i)). \tag{19}
$$

Let $Q_i(j)$ be the probability that cancer tumors develop during $(t_{j-1}, t_j]$ in $J_i$ people in the population. For time homogeneous models with small $\beta_{k-1}$, $Q_i(j)$ is then given by

$$
Q_i(j) = E\{e^{-\omega(t_{j-1},\,i)} - e^{-\omega(t_j,\,i)}\}
= e^{-\beta_{k-1}H_i(t_{j-1})} - e^{-\beta_{k-1}H_i(t_j)} + o(\beta_{k-1}), \tag{20}
$$

where $H_i(t) = \int_{t_0}^{t} E[J_{k-1}(x,i)]\,P_T(x,t)\,dx$.

To derive $Q_i(j)$, denote by

$$
\begin{aligned}
\theta_{i(k-1)} &= E[J_i(t_0, i-1)]\,\beta_i \prod_{u=i+1}^{k-1} \left(\frac{\beta_u}{\gamma_u}\right), \quad i = 1, \ldots, \operatorname{Min}(3, k-1), \\
\lambda_{u(k-1)} &= E[J_u(t_0, u)]\,\beta_u \prod_{v=u+1}^{k-1} \left(\frac{\beta_v}{\gamma_v}\right), \quad u = 0, 1, \ldots, \operatorname{Min}(2, k-1),
\end{aligned}
\tag{21}
$$

and define the functions

$$
\psi_{i(k-1)}(t) = \left(\prod_{u=i+1}^{k-1} \gamma_u\right) \sum_{v=i}^{k-1} A_{i(k-1)}(v) \int_{t_0}^{t} e^{\gamma_v(x-t_0)}\,P_T(x,t)\,dx, \quad i = 0, 1, \ldots, \operatorname{Min}(2, k-1). \tag{22}
$$

Applying results of $E[J_{k-1}(t,i)]$ given in (11) for time homogeneous models with $\gamma_i \ne \gamma_j$ if $i \ne j$, we obtain the $Q_i(j)$'s as follows.

(1) If $k = 2$, then $k - 1 = 1$. Hence $Q_2(0) = 1$, $Q_i(0) = Q_2(j) = 0$ for $(i = 0, 1,\ j > 0)$, and for $j > 0$,

$$
Q_0(j) = e^{-\theta_{11}\psi_{11}(t_{j-1}) - \lambda_{01}\psi_{01}(t_{j-1})} - e^{-\theta_{11}\psi_{11}(t_j) - \lambda_{01}\psi_{01}(t_j)} + o(\beta_1), \tag{23}
$$

$$
Q_1(j) = (1 - \alpha_1)\{e^{-\lambda_{11}\psi_{11}(t_{j-1})} - e^{-\lambda_{11}\psi_{11}(t_j)}\} + o(\beta_1). \tag{24}
$$


(2) If $k \ge 3$, then we have $Q_i(0) = 0$ for $i = 0, 1$ and $Q_2(0) = \delta_{2k}\alpha$, and for $j > 0$,

$$
Q_0(j) = e^{-\theta_{1(k-1)}\psi_{1(k-1)}(t_{j-1}) - \lambda_{0(k-1)}\psi_{0(k-1)}(t_{j-1})} - e^{-\theta_{1(k-1)}\psi_{1(k-1)}(t_j) - \lambda_{0(k-1)}\psi_{0(k-1)}(t_j)} + o(\beta_{k-1}), \tag{25}
$$

$$
Q_1(j) = e^{-\theta_{2(k-1)}\psi_{2(k-1)}(t_{j-1}) - \lambda_{1(k-1)}\psi_{1(k-1)}(t_{j-1})} - e^{-\theta_{2(k-1)}\psi_{2(k-1)}(t_j) - \lambda_{1(k-1)}\psi_{1(k-1)}(t_j)} + o(\beta_{k-1}), \tag{26}
$$

$$
\begin{aligned}
Q_2(j) ={}& \delta_{2k}(1-\alpha)\{e^{-\lambda_{22}\psi_{22}(t_{j-1})} - e^{-\lambda_{22}\psi_{22}(t_j)}\} \\
&+ (1 - \delta_{2k})\{e^{-\theta_{3(k-1)}\psi_{3(k-1)}(t_{j-1}) - \lambda_{2(k-1)}\psi_{2(k-1)}(t_{j-1})} \\
&\qquad - e^{-\theta_{3(k-1)}\psi_{3(k-1)}(t_j) - \lambda_{2(k-1)}\psi_{2(k-1)}(t_j)}\} + o(\beta_{k-1}),
\end{aligned}
\tag{27}
$$

where $\delta_{2k} = 1$ if $k = 2$ and $= 0$ if $k \ne 2$.

Notice that if $\gamma_0 = 0$, then $\psi_{0(k-1)}(t)$ reduces to

$$
\begin{aligned}
\psi_{0(k-1)}(t) &= \left(\prod_{u=1}^{k-1} \gamma_u\right) \sum_{v=0}^{k-1} A_{0(k-1)}(v) \int_{t_0}^{t} e^{\gamma_v(x-t_0)}\,P_T(x,t)\,dx \\
&= \left(\prod_{u=1}^{k-1} \gamma_u\right) \sum_{v=1}^{k-1} A_{1(k-1)}(v)\,\frac{1}{\gamma_v} \int_{t_0}^{t} [e^{\gamma_v(x-t_0)} - 1]\,P_T(x,t)\,dx.
\end{aligned}
\tag{28}
$$

The factor $1/\gamma_v$ in the second line is what makes (28) agree with the $1/\gamma_v^2$ factor of $\psi_{0(k-1)}(t)$ in (29) once the integral is evaluated with $P_T(x,t) \approx 1$.

Notice also that if $P_T(s,t) \approx 1$ for $t > s$ and if $\gamma_0 = 0$, then the above $\psi_{ij}(t)$'s reduce, respectively, to

$$
\begin{aligned}
\psi_{ii}(t) &= \frac{1}{\gamma_i}\{e^{\gamma_i(t-t_0)} - 1\}, \quad i = 1, \ldots, k-1, \\
\psi_{0(k-1)}(t) &= \left(\prod_{u=1}^{k-1} \gamma_u\right) \sum_{v=1}^{k-1} A_{1(k-1)}(v)\,\frac{1}{\gamma_v^2}\{e^{\gamma_v(t-t_0)} - 1 - (t - t_0)\gamma_v\}, \\
\psi_{i(k-1)}(t) &= \left(\prod_{u=i+1}^{k-1} \gamma_u\right) \sum_{v=i}^{k-1} A_{i(k-1)}(v)\,\frac{1}{\gamma_v}\{e^{\gamma_v(t-t_0)} - 1\}, \quad i = 1, \ldots, k-1.
\end{aligned}
\tag{29}
$$
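To make the closed forms concrete, the sketch below evaluates $\psi_{11}(t)$ from (29) and the interval probabilities $Q_1(j)$ of the two-stage case (24), dropping the $o(\beta_1)$ remainder. The age grid and the values of $\lambda_{11}$, $\gamma_1$, and $\alpha_1$ are hypothetical and chosen only for illustration; they are not estimates from the paper.

```python
import math


def psi_ii(t, gamma, t0=0.0):
    """psi_ii(t) from (29); assumes P_T(s, t) is about 1 and gamma != 0."""
    return (math.exp(gamma * (t - t0)) - 1.0) / gamma


def q1_two_stage(j, times, lam11, gamma1, alpha1):
    """Q_1(j) from (24) without the o(beta_1) term; times[j] plays the role of t_j."""
    t0 = times[0]
    early = math.exp(-lam11 * psi_ii(times[j - 1], gamma1, t0))
    late = math.exp(-lam11 * psi_ii(times[j], gamma1, t0))
    return (1.0 - alpha1) * (early - late)


times = list(range(0, 11))                            # t_0, ..., t_10 (hypothetical grid)
params = dict(lam11=1e-4, gamma1=0.05, alpha1=0.01)   # hypothetical values
q = [q1_two_stage(j, times, **params) for j in range(1, len(times))]
total = sum(q)
```

Because the terms telescope and $\psi_{11}(t_0) = 0$, `total` should equal $(1-\alpha_1)\{1 - e^{-\lambda_{11}\psi_{11}(t_{10})}\}$, which gives a simple consistency check on any implementation.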

4. Probability Distribution of Observed Cancer Incidence Incorporating Hereditary Cancer Cases

For estimating unknown parameters and to validate the model, one would need real data generated from the model. For studies of carcinogenesis, such data are usually given by cancer incidence. For example, in the SEER data of NCI/NIH of USA, the data are given by $\{(y_0, n_0), (y_j, n_j),\ j = 1, \ldots, t_N\}$, where $y_0$ is the number of cancer cases at birth and $n_0$ the total number of births, and where, for $j \ge 1$, $y_j$ is the number of cancer cases developed during the $j$th age group of a one-year period (or 5-year periods) and $n_j$ is the number of noncancer people who are at risk for cancer and from whom $y_j$ have developed cancer during the $j$th age group. Given in Table 1 are the SEER data of uveal melanoma (adult eye cancer) during the period 1973–2007. In Table 1, notice that there are some cancer cases at birth, implying some inherited cancer cases. In this section, we will develop a statistical model for these types of data sets from the stochastic multistage model with hereditary cancers as given in Section 2. As in previous sections, let $n_{ij}$ be the number of individuals who have genotype $J_i$ $(i = 0, 1, 2)$ at the embryo stage among the $n_j$ people at risk for the cancer in question. Then, as shown above, $(n_{1j}, n_{2j}) \mid n_j \sim \text{Multinomial}\{n_j;\ p_1, p_2\}$. It follows that $n_{ij} \mid n_j \sim \text{Binomial}\{n_j, p_i\}$, $i = 0, 1, 2$. In what follows, we let $Y_j$ denote the random variable for $y_j$ unless otherwise stated.

4.1. The Probability Distribution of $Y_0$. As shown in Figure 4, $J_i$ $(i = 0, 1, 2)$ people would only generate $J_i$ stage cells and $J_{i+1}$ stage cells at birth. Thus, for cancers to develop at or before birth, the number of stages for the stochastic model of carcinogenesis must be 3 or less. It follows that if $y_0 > 0$, the appropriate model of carcinogenesis must be either a 2-stage model or a 3-stage model. Since $n_{20} \mid n_0 \sim \text{Binomial}(n_0, p_2)$ and $n_{10} \mid (n_0, n_{20}) \sim \text{Binomial}(n_0, p_1) + o(p_2)$, the probability distribution of $Y_0$ is therefore

$$
Y_0 \sim \text{Poisson}(\chi_k), \quad k = 2, 3, \tag{30}
$$

where

$$
\chi_k =
\begin{cases}
n_0(p_2 + p_1\alpha), & \text{if } k = 2, \\
n_0 p_2 \alpha, & \text{if } k = 3.
\end{cases}
\tag{31}
$$

The expected number of $Y_0$ given $n_0$ is $E(Y_0 \mid n_0) = n_0(p_2 + p_1\alpha) = \chi_2$ if $k = 2$ and $E(Y_0 \mid n_0) = n_0 p_2 \alpha = \chi_3$ if $k = 3$. Hence, for the 2-stage model (i.e., $k = 2$) or the 3-stage model (i.e., $k = 3$), the maximum likelihood estimate of $\chi_k$ is $\hat{\chi}_k = y_0$, and the deviance $D_0(k)$ from the conditional probability distribution of $Y_0$ given $n_0$ is

$$
D_0(k) = -2\{\log h(y_0;\ \chi_k) - \log h(y_0;\ \hat{\chi}_k)\}
= 2\left\{\chi_k - y_0 - y_0 \log \frac{\chi_k}{y_0}\right\}, \quad k = 2, 3, \tag{32}
$$

where $h(y;\ \chi)$ denotes the Poisson probability density function with mean $\chi$. The factor 2 on the right-hand side follows from the definition and matches (44).
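The deviance $-2\{\log h(y_0;\ \chi_k) - \log h(y_0;\ y_0)\}$ is easy to evaluate directly. The sketch below does so for the age-0 row of Table 1 ($n_0 = 12495777$, $y_0 = 34$); the values of $p_1$, $p_2$, and $\alpha$ are hypothetical placeholders, not estimates reported in the paper, and the function names are introduced here only for illustration.

```python
import math


def poisson_logpmf(y, mean):
    """Log of the Poisson density h(y; mean) for mean > 0."""
    return y * math.log(mean) - mean - math.lgamma(y + 1)


def chi_k(n0, p1, p2, alpha, k):
    """Poisson mean of Y_0 from (31) for the 2-stage or 3-stage model."""
    if k == 2:
        return n0 * (p2 + p1 * alpha)
    if k == 3:
        return n0 * p2 * alpha
    raise ValueError("cancer cases at birth require k = 2 or k = 3")


def deviance_y0(y0, chi):
    """-2 {log h(y0; chi) - log h(y0; y0)}, using 0 * log(0) = 0 when y0 = 0."""
    if y0 == 0:
        return 2.0 * chi
    return 2.0 * (chi - y0 - y0 * math.log(chi / y0))


n0, y0 = 12495777, 34                                   # age-0 row of Table 1
chi = chi_k(n0, p1=1e-3, p2=1e-7, alpha=2e-3, k=2)      # hypothetical p1, p2, alpha
d0 = deviance_y0(y0, chi)
```

The closed form and the difference of log densities agree, and the deviance is zero exactly when $\chi_k = y_0$.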


Table 1: The SEER incidence data (1973–2007) of uveal melanoma from NCI/NIH (over all races and genders).

Age     Number of people   Observed    Model-F     Model-1     Two-stage
groups  at risk            incidence   predicted   predicted   predicted
0       12495777           34          36          36          38
1       12221582           20          11          11          16
2       12120990           14          7           8           17
3       12112995           9           5           6           18
4       12146174           4           4           5           20
5       12161336           3           3           4           22
6       12111854           2           2           4           25
7       12160452           2           2           4           28
8       11942586           2           2           4           30
9       12381299           1           2           5           34
10      12512703           2           3           6           38
11      12410338           5           3           6           41
12      12449244           2           3           6           44
13      12527781           5           4           7           48
14      12602883           3           5           7           51
15      12719598           5           5           8           55
16      12766107           7           6           9           59
17      12831400           9           7           9           62
18      12382047           8           7           9           63
19      12581638           10          8           10          68
20      12636509           7           9           11          71
21      12682601           6           10          10          75
22      12840510           12          11          11          79
23      13075528           17          13          13          84
24      13358635           16          15          14          89
25      13473849           12          16          16          94
26      13426340           17          18          18          97
27      13525264           28          20          20          101
28      13149674           17          22          21          102
29      13812811           23          25          25          110
30      13886874           24          28          27          114
31      13488332           37          30          29          115
32      13460286           32          32          32          118
33      13256067           38          35          35          119
34      13428827           37          39          38          124
35      13220037           40          41          41          126
36      12870265           30          44          44          126
37      12689592           43          47          47          127
38      12157014           42          49          49          125
39      12494081           46          55          54          131
40      12272125           49          58          58          132
41      11826573           56          61          60          130
42      11663153           54          65          64          131
43      11407082           53          68          68          131
44      11296848           70          73          72          133
45      11016369           57          76          76          132
46      10651593           71          79          79          130
47      10475708           87          84          83          131
48      9994684            82          86          85          127
49      10138908           78          93          93          131
50      9836359            87          97          96          130
51      9475641            95          100         99          127
52      9250985            113         104         104         126
53      9027382            106         108         108         125
54      8883737            117         113         113         125
55      8547883            129         116         116         123
56      8279648            107         119         119         121
57      8062368            119         123         123         119
58      7654610            132         124         124         115
59      7563706            118         130         129         115
60      7232719            131         131         130         112
61      6927332            116         132         132         109
62      6708273            133         134         134         107
63      6543931            143         138         137         106
64      6404652            130         141         141         105
65      6168486            145         142         142         102
66      5913479            138         142         142         99
67      5746766            151         144         144         98
68      5480517            147         142         142         94
69      5363912            149         144         144         94
70      5110728            136         142         142         90
71      4925076            144         141         141         88
72      4696825            140         138         138         85
73      4512136            146         135         135         82
74      4345300            126         133         133         80
75      4148801            122         128         129         78
76      3900900            124         122         122         74
77      3681587            114         116         116         70
78      3481918            93          110         110         67
79      3243631            102         102         102         63
80      2961234            79          92          93          58
81      2724984            93          83          84          54
82      2495219            82          75          75          50
83      2271595            92          66          67          46
84      2041351            64          57          58          42
85      10466605           304         282         285         216

Note 1: for age 0 and 1–10 years old, the cancer incidences are derived by subtracting the incidence of retinoblastoma from the original SEER data (see Tan and Zhou [9]).
Note 2: the observed uveal melanoma incidence rates per $10^6$ individuals are derived from the SEER eye cancer incidence by subtracting retinoblastoma incidence as given in Tan and Zhou [9].


4.2. The Probability Distribution of $Y_j$ $(j \ge 1)$. To derive the probability distribution of $Y_j$ $(j \ge 1)$ in the $j$th age group, let $Y_{ij}$ $(i = 0, 1, 2)$ be the number of cancer cases generated by people who have genotype $J_i$ at the embryo stage among these $Y_j$ cancer cases. Then $Y_j = \sum_{i=0}^{2} Y_{ij}$, and $Y_{0j}$ is the number of cancer cases generated by the $n_{0j} = n_j - n_{1j} - n_{2j}$ normal people in the population. The conditional probability distribution of $Y_{ij}$ given $n_{ij}$ is

$$
Y_{ij} \mid n_{ij} \sim \text{Poisson}\{n_{ij}Q_i(j)\}, \quad i = 0, 1, 2. \tag{33}
$$

Notice that if $k = 2$ (a 2-stage model), then all $J_2$ individuals would develop tumors at or before birth. Thus, if $k = 2$, then $Y_{2j} = 0$ for all $j > 0$, so that if $j > 0$, cancer cases develop only from normal people ($N = J_0$ people) and $J_1$ people. On the other hand, if $k > 2$, then with positive probability, $Y_{ij} > 0$ for all $(i = 0, 1, 2,\ j = 0, 1, \ldots, n_T)$, where $n_T$ is the last time point in the data. Let $\delta_{2k} = 1$ if $k = 2$ and $\delta_{2k} = 0$ if $k \ne 2$. Then $Y_j \mid (n_{ij},\ i = 0, 1, 2) \sim \text{Poisson}(Q_T(j))$, where $Q_T(j) = \sum_{i=0}^{1} n_{ij}Q_i(j) + (1 - \delta_{2k})\,n_{2j}Q_2(j)$. Since $(n_{0j}, n_{1j}) \mid n_j \sim \text{Multinomial}(n_j;\ p_0, p_1)$, we have for the conditional probability density function $P(y_j \mid n_j)$ of $Y_j$ given $n_j$

$$
P(y_j \mid n_j) = \sum_{n_{0j}=0}^{n_j} \sum_{n_{1j}=0}^{n_j - n_{0j}} g(n_{0j}, n_{1j};\ n_j, p_0, p_1)\,h\{y_j;\ Q_T(j)\}, \tag{34}
$$

where $g(n_{0j}, n_{1j};\ n_j, p_0, p_1)$ is the probability density function of $(n_{0j}, n_{1j}) \mid n_j \sim \text{Multinomial}(n_j;\ p_0, p_1)$ and $h\{y_j;\ Q_T(j)\}$ the probability density function of $Y_j \mid (n_{ij},\ i = 0, 1, 2) \sim \text{Poisson}\{Q_T(j)\}$.
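The mixture in (34) can be evaluated by brute force when $n_j$ is small. The sketch below does this for a toy population; with the SEER-scale $n_j$ of Table 1 the double sum is not practical, which is one motivation for the data-augmentation approach developed later. All numerical values, and the function names, are hypothetical choices for this example.

```python
import math


def poisson_pmf(y, mean):
    """Poisson density h(y; mean), with the convention that mean = 0 puts all mass on 0."""
    if mean == 0.0:
        return 1.0 if y == 0 else 0.0
    return math.exp(y * math.log(mean) - mean - math.lgamma(y + 1))


def multinomial_pmf(n0, n1, n, p0, p1):
    """Density g(n0, n1; n, p0, p1) of the genotype counts, with p2 = 1 - p0 - p1."""
    n2 = n - n0 - n1
    p2 = 1.0 - p0 - p1
    coef = math.comb(n, n0) * math.comb(n - n0, n1)
    return coef * p0 ** n0 * p1 ** n1 * p2 ** n2


def mixture_pmf(y, n, p0, p1, q, k):
    """P(y_j | n_j) from (34); q = (Q_0(j), Q_1(j), Q_2(j))."""
    delta_2k = 1 if k == 2 else 0
    total = 0.0
    for n0 in range(n + 1):
        for n1 in range(n - n0 + 1):
            n2 = n - n0 - n1
            q_t = n0 * q[0] + n1 * q[1] + (1 - delta_2k) * n2 * q[2]
            total += multinomial_pmf(n0, n1, n, p0, p1) * poisson_pmf(y, q_t)
    return total


# Toy values (hypothetical): 12 people, genotype probabilities and Q_i(j).
n, p0, p1, q = 12, 0.7, 0.2, (0.05, 0.3, 0.8)
pmf = [mixture_pmf(y, n, p0, p1, q, k=3) for y in range(40)]
mean = sum(y * w for y, w in enumerate(pmf))
```

For these toy values the probabilities sum to about 1 and the mean is $n\,(p_0Q_0 + p_1Q_1 + p_2Q_2) = 2.1$, as expected for a Poisson mixture.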

The probability density function $P(y_j \mid n_j)$ given by (34) is a mixture of Poisson probability density functions with mixing probability density function given by the multinomial probability distribution of $\{n_{0j}, n_{1j}\}$ given $n_j$. This mixing probability density function represents individuals with different genotypes at the embryo stage in the population.

Let $\Theta$ be the set of all unknown parameters (i.e., the parameters $(p_1, p_2, \alpha)$ and the birth rates, the death rates, and the mutation rates of $J_j$ cells). Based on data $(y_j,\ j = 0, 1, \ldots, n_T)$, the likelihood function of $\Theta$ is

$$
L\{\Theta \mid y_j,\ j = 0, 1, \ldots, n_T\} = h(y_0;\ \chi(k)) \prod_{j=1}^{t_N} P(y_j \mid n_j). \tag{35}
$$

Notice that because the mutation rates are very small, one may practically assume $\beta_i(t) = \beta_i$ for $i = 0, 1, \ldots, k-1$. Also, because the stage-limiting genes are basically tumor suppressor genes which act recessively (see Tan [3], Weinberg [6], and Tan et al. [5, 8, 23]), one may practically assume $\{b_i(t) = b_i,\ d_i(t) = d_i,\ \gamma_i(t) = b_i - d_i = \gamma_i,\ i = 0, 1, \ldots, k-1\}$ (see Tan et al. [4, 5, 8]).

4.3. The Joint Probability Distribution of Augmented Variables and Cancer Incidence. For applying the mixture distribution of $Y_j$ in (34) to make inference about the unknown parameters, one needs to expand the model to include the unobservable augmented variables $\{n_{0j}, n_{1j}, y_{0j}, y_{1j},\ j = 1, \ldots, t_N\}$ and derive the joint probability distribution of these variables. For these purposes, observe that for $(j = 1, \ldots, t_N)$ and for $k > 2$, the conditional probability distribution $P\{y_{ij},\ i = 0, 1 \mid y_j,\ n_{ij},\ i = 0, 1,\ n_j\}$ of $(Y_{ij},\ i = 0, 1)$ given $(Y_j = y_j,\ n_{ij},\ i = 0, 1,\ n_j)$ is

$$
(y_{ij},\ i = 0, 1) \mid (y_j,\ n_{ij},\ i = 0, 1,\ n_j) \sim \text{Multinomial}\left\{y_j;\ \frac{n_{ij}Q_i(j)}{Q_T(j)},\ i = 0, 1\right\}. \tag{36}
$$

Since the conditional probability distribution of $Y_j$ given $(n_{ij},\ i = 0, 1, 2)$ for $k > 2$ is Poisson with mean $Q_T(j)$, we have for the joint conditional probability density function $P\{y_{ij},\ i = 0, 1,\ y_j \mid n_{ij},\ i = 0, 1,\ n_j\}$ of $(Y_{ij},\ i = 0, 1,\ Y_j)$ given $(n_{ij},\ i = 0, 1,\ n_j)$

$$
\begin{aligned}
P\{y_{ij},\ i = 0, 1,\ y_j \mid n_{ij},\ i = 0, 1,\ n_j\}
&= P\{y_{ij},\ i = 0, 1 \mid y_j,\ n_{ij},\ i = 0, 1, 2\} \times P\{y_j \mid n_{ij},\ i = 0, 1, 2\} \\
&= \frac{1}{y_j!}\,e^{-Q_T(j)}\{Q_T(j)\}^{y_j} \times g\left\{y_{0j}, y_{1j};\ y_j,\ \frac{n_{ij}Q_i(j)}{Q_T(j)},\ i = 0, 1\right\} \\
&= \prod_{i=0}^{2} h\{y_{ij};\ n_{ij}Q_i(j)\} \quad (k > 2),
\end{aligned}
\tag{37}
$$

where $y_{2j} = y_j - \sum_{u=0}^{1} y_{uj}$ and $n_{2j} = n_j - \sum_{u=0}^{1} n_{uj}$.
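The last equality in (37) is the standard fact that a Poisson total split multinomially gives independent Poisson components. A short numerical check, with hypothetical values standing in for the means $n_{ij}Q_i(j)$ and for the counts $y_{ij}$, can be sketched as follows.

```python
import math


def poisson_pmf(y, mean):
    """Poisson density h(y; mean) for mean > 0."""
    return math.exp(y * math.log(mean) - mean - math.lgamma(y + 1))


def multinomial_pmf(counts, probs):
    """Multinomial density for the given category counts and probabilities."""
    log_p = math.lgamma(sum(counts) + 1)
    for c, p in zip(counts, probs):
        log_p += c * math.log(p) - math.lgamma(c + 1)
    return math.exp(log_p)


means = [0.4, 1.5, 2.2]     # hypothetical n_ij * Q_i(j), i = 0, 1, 2
counts = [1, 0, 3]          # hypothetical y_0j, y_1j, y_2j
q_t = sum(means)            # Q_T(j) for k > 2

# Left side: Poisson total times multinomial split; right side: product of Poissons.
lhs = poisson_pmf(sum(counts), q_t) * multinomial_pmf(counts, [m / q_t for m in means])
rhs = math.prod(poisson_pmf(c, m) for c, m in zip(counts, means))
```

The two quantities `lhs` and `rhs` agree up to floating-point rounding, which is the identity used in the third line of (37).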

If $k = 2$, then $Y_{2j} = 0$ so that $Y_j = \sum_{i=0}^{1} Y_{ij}$. Thus, we have for $k = 2$

$$
Y_j \mid (n_{ij},\ i = 0, 1,\ n_j) \sim \text{Poisson}\left\{\sum_{i=0}^{1} n_{ij}Q_i(j)\right\}, \tag{38}
$$

$$
Y_{ij} \mid (Y_j = y_j,\ n_{ij},\ i = 0, 1,\ n_j) \sim \text{Binomial}\left\{y_j;\ \frac{n_{ij}Q_i(j)}{\sum_{u=0}^{1} n_{uj}Q_u(j)}\right\}, \quad i = 0, 1. \tag{39}
$$

It follows that if $k = 2$, then $\sum_{i=0}^{1} Y_{ij} = Y_j$ and the joint probability density function of $(Y_{0j}, Y_j)$ given $(n_{uj},\ u = 0, 1, 2)$ is

$$
\begin{aligned}
P\{y_{0j}, y_j \mid n_{uj},\ u = 0, 1,\ n_j\}
&= P\{y_j \mid n_{ij},\ i = 0, 1,\ n_j\} \times P\{y_{0j} \mid y_j,\ n_{uj},\ u = 0, 1,\ n_j\} \\
&= \prod_{i=0}^{1} h\{y_{ij};\ n_{ij}Q_i(j)\}, \quad \text{if } k = 2,
\end{aligned}
\tag{40}
$$

where $y_j = \sum_{i=0}^{1} y_{ij}$ and $n_j = \sum_{i=0}^{2} n_{ij}$.


Put $\mathbf{Y} = (y_{ij},\ i = 0, 1,\ j = 1, \ldots, t_N)$, $\mathbf{N} = (n_{ij},\ i = 0, 1,\ j = 1, \ldots, t_N)$, $\tilde{n} = (n_j,\ j = 0, 1, \ldots, t_N)$, and $\tilde{y} = (y_j,\ j = 0, 1, \ldots, t_N)$. From (37) and (40), we have for the conditional joint probability density function of $(\mathbf{Y}, \tilde{y})$ given $(\mathbf{N}, \tilde{n})$

$$
P\{\mathbf{Y}, \tilde{y} \mid \mathbf{N}, \tilde{n}, \Theta\} = h(y_0;\ \chi(k)) \prod_{j=1}^{t_N} \{h[y_{2j};\ n_{2j}Q_2(j)]\}^{1-\delta_{2k}} \times \prod_{i=0}^{1} h\{y_{ij};\ n_{ij}Q_i(j)\}. \tag{41}
$$

It follows that the joint conditional probability density function of $\{\mathbf{N}, \mathbf{Y}, \tilde{y}\}$ given $(\tilde{n}, \Theta)$ is

$$
P\{\mathbf{N}, \mathbf{Y}, \tilde{y} \mid \tilde{n}, \Theta\} = h(y_0;\ \chi(k)) \prod_{j=1}^{t_N} g(n_{0j}, n_{1j};\ n_j, p_0, p_1) \times \{h[y_{2j};\ n_{2j}Q_2(j)]\}^{1-\delta_{2k}} \times \prod_{i=0}^{1} h\{y_{ij};\ n_{ij}Q_i(j)\}. \tag{42}
$$

Notice that the above probability density function is a product of multinomial probability density functions and Poisson probability density functions. For this joint probability density function, the deviance $\text{Dev} = -2\{\log P[\mathbf{Y}, \tilde{y}, \mathbf{N} \mid \tilde{n}, \Theta] - \log P[\mathbf{Y}, \tilde{y}, \mathbf{N} \mid \tilde{n}, \hat{\Theta}]\}$ is

$$
\text{Dev} = D_0(k) + \text{Dev}(p_1, p_2) + \sum_{j=1}^{t_N} D_j, \tag{43}
$$

where

$$
D_0(k) = 2\left\{\chi(k) - y_0 - y_0 \log \frac{\chi(k)}{y_0}\right\}, \tag{44}
$$

$$
\text{Dev}(p_1, p_2) = 2 \sum_{j=1}^{t_N} \left\{ n_{0j} \log \frac{\hat{p}_0}{(1 - p_1 - p_2)} + \sum_{i=1}^{2} n_{ij} \log \frac{\hat{p}_i}{p_i} \right\}, \tag{45}
$$

$$
\begin{aligned}
D_j ={}& 2 \sum_{i=0}^{1} \left\{ n_{ij}Q_i(j) - y_{ij} - y_{ij} \log \frac{n_{ij}Q_i(j)}{y_{ij}} \right\} \\
&+ 2(1 - \delta_{2k}) \times \left\{ n_{2j}Q_2(j) - y_{2j} - y_{2j} \log \frac{n_{2j}Q_2(j)}{y_{2j}} \right\},
\end{aligned}
\tag{46}
$$

where $\hat{p}_i = \left(\sum_{j=0}^{t_N} n_{ij}\right) / \left(\sum_{j=0}^{t_N} n_j\right)$ $(i = 0, 1, 2)$.

The joint probability density function $P\{\mathbf{Y}, \tilde{y}, \mathbf{N} \mid \tilde{n}, \Theta\}$ of $(\mathbf{Y}, \tilde{y}, \mathbf{N})$ given by (42) will be used as the kernel for the Bayesian method to estimate the unknown parameters and to predict the state variables.

4.4. Fitting of the Model to Cancer Incidence Data. To fit the model to real data, as in Tan [3–5], we let $\Delta t \sim 1$ correspond to a fixed time interval such as 6 months in human cancer studies. (Tan et al. [4] assumed 3 months as one time unit, while Luebeck and Moolgavkar [26] assumed one year as one time unit.) Then, because the proliferation rate of the last stage cells is quite large, one may practically assume $P_T(s,t) = 1$ for $t - s \ge 1$. Hence, noting that $\beta_{k-1}(t) = \beta_{k-1}$ is usually very small (see [3–5]), $Q_i(j)$ is approximated by

$$
Q_i(j) \approx E\{e^{-\beta_{k-1}G_i(t_{j-1})} - e^{-\beta_{k-1}G_i(t_j)}\}
= e^{-\beta_{k-1}E[G_i(t_{j-1})]} - e^{-\beta_{k-1}E[G_i(t_j)]} + o(\beta_{k-1}), \tag{47}
$$

where $G_i(t) = \sum_{s=t_0}^{t-1} J_{k-1}(s,i)$.

Under the discrete time approximation, the $E[I_{k-1}(t,i)]$'s have been derived in the appendix. Using these results of expected numbers and using the result $\sum_{i=0}^{t-1} a^i = (a^t - 1)/(a - 1)$ for $a \ne 1$, we obtain

$$
\begin{aligned}
\beta_{k-1}E[G_i(t)] &= \sum_{r=i}^{i+1} E[J_r(t_0, i)] \left(\prod_{u=r}^{k-1} \beta_u\right) \times \sum_{v=r}^{k-1} A_{r(k-1)}(v) \sum_{s=t_0}^{t-1} (1 + \gamma_v)^{s-t_0} \\
&= \theta_{(i+1)(k-1)}\,\phi_{(i+1)(k-1)}(t) + \lambda_{i(k-1)}\,\phi_{i(k-1)}(t),
\end{aligned}
\tag{48}
$$

where $(\theta_{i(k-1)},\ i = 1, 2, 3)$ and $(\lambda_{i(k-1)},\ i = 0, 1, 2)$ are defined in Section 3.3 and where the $\phi_{i(k-1)}(t)$ $(i = 0, 1, 2, 3)$'s are given by

$$
\phi_{i(k-1)}(t) = \left(\prod_{u=i+1}^{k-1} \gamma_u\right) \sum_{v=i}^{k-1} A_{i(k-1)}(v) \times \frac{1}{\gamma_v}\{(1 + \gamma_v)^{t-t_0} - 1\}, \quad i = 0, 1, 2, 3. \tag{49}
$$

Notice that if $\gamma_0 = 0$, then $\phi_{0(k-1)}(t)$ reduces to

$$
\phi_{0(k-1)}(t) = \left(\prod_{u=1}^{k-1} \gamma_u\right) \sum_{v=1}^{k-1} A_{1(k-1)}(v) \times \frac{1}{\gamma_v^2}\{(1 + \gamma_v)^{t-t_0} - 1 - (t - t_0)\gamma_v\}. \tag{50}
$$

Applying these results for time homogeneous models with $\gamma_i \ne \gamma_j$ if $i \ne j$, the $Q_i(j)$'s under the discrete approximation are given as follows.


(1) If $k = 2$, then $k - 1 = 1$. Hence $Q_2(0) = 1$, $Q_i(0) = Q_2(j) = 0$ for $(i = 0, 1,\ j > 0)$, and for $j > 0$,

$$
\begin{aligned}
Q_0(j) &\approx e^{-\theta_{11}\phi_{11}(t_{j-1}) - \lambda_{01}\phi_{01}(t_{j-1})} - e^{-\theta_{11}\phi_{11}(t_j) - \lambda_{01}\phi_{01}(t_j)} + o(\beta_1), \\
Q_1(j) &\approx (1 - \alpha_1)\{e^{-\lambda_{11}\phi_{11}(t_{j-1})} - e^{-\lambda_{11}\phi_{11}(t_j)}\} + o(\beta_1).
\end{aligned}
\tag{51}
$$

(2) If $k \ge 3$, then we have $Q_2(0) = \delta_{2(k-1)}\alpha$, $Q_i(0) = 0$, $i = 0, 1$, and for $j > 0$,

$$
\begin{aligned}
Q_0(j) \approx{}& e^{-\theta_{1(k-1)}\phi_{1(k-1)}(t_{j-1}) - \lambda_{0(k-1)}\phi_{0(k-1)}(t_{j-1})} - e^{-\theta_{1(k-1)}\phi_{1(k-1)}(t_j) - \lambda_{0(k-1)}\phi_{0(k-1)}(t_j)} + o(\beta_{k-1}), \\
Q_1(j) \approx{}& e^{-\theta_{2(k-1)}\phi_{2(k-1)}(t_{j-1}) - \lambda_{1(k-1)}\phi_{1(k-1)}(t_{j-1})} - e^{-\theta_{2(k-1)}\phi_{2(k-1)}(t_j) - \lambda_{1(k-1)}\phi_{1(k-1)}(t_j)} + o(\beta_{k-1}), \\
Q_2(j) \approx{}& \delta_{2(k-1)}(1 - \alpha)\{e^{-\lambda_{22}\phi_{22}(t_{j-1})} - e^{-\lambda_{22}\phi_{22}(t_j)}\} \\
&+ (1 - \delta_{2(k-1)})\{e^{-\theta_{3(k-1)}\phi_{3(k-1)}(t_{j-1}) - \lambda_{2(k-1)}\phi_{2(k-1)}(t_{j-1})} \\
&\qquad - e^{-\theta_{3(k-1)}\phi_{3(k-1)}(t_j) - \lambda_{2(k-1)}\phi_{2(k-1)}(t_j)}\} + o(\beta_{k-1}).
\end{aligned}
\tag{52}
$$

Notice that if one replaces $[1 + \gamma_v]^{t-t_0}$ by $[1 + \gamma_v]^{t-t_0} = e^{(t-t_0)\log(1+\gamma_v)} \approx e^{(t-t_0)\gamma_v}$, the above $Q_i(j)$'s from the discrete time model are exactly the same as those from the continuous model, respectively, as given in equations (23)–(27) under the assumption that $P_T(s,t) = 1$ for $t - s > 0$. Notice that the assumption $P_T(s,t) = 1$ for $t - s > 0$ is equivalent to assuming that the last stage cancer cells grow instantaneously into cancer tumors as soon as they are generated; see Remark 1.
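The size of the discretization effect can be inspected numerically. The sketch below compares the single-term discrete function $\frac{1}{\gamma}\{(1+\gamma)^{t-t_0} - 1\}$ (the form taken by (49) when the sum has one term, by analogy with $\psi_{ii}$ in (29)) with its continuous counterpart $\frac{1}{\gamma}\{e^{\gamma(t-t_0)} - 1\}$, and the resulting interval probabilities. The values of $\gamma$ and of the multiplying rate are hypothetical, and `interval_prob` is a helper name introduced only for this example.

```python
import math


def phi_ii(t, gamma, t0=0):
    """Discrete-time growth term (1/gamma) * ((1 + gamma)^(t - t0) - 1)."""
    return ((1.0 + gamma) ** (t - t0) - 1.0) / gamma


def psi_ii(t, gamma, t0=0):
    """Continuous-time growth term (1/gamma) * (exp(gamma (t - t0)) - 1)."""
    return (math.exp(gamma * (t - t0)) - 1.0) / gamma


def interval_prob(growth, rate, t_prev, t_next, gamma):
    """exp(-rate * growth(t_prev)) - exp(-rate * growth(t_next)), as in (24) and (51)."""
    return math.exp(-rate * growth(t_prev, gamma)) - math.exp(-rate * growth(t_next, gamma))


gamma, rate = 0.01, 1e-4                      # hypothetical net growth and rate
rows = [(t, phi_ii(t, gamma), psi_ii(t, gamma)) for t in (1, 10, 50)]
q_discrete = interval_prob(phi_ii, rate, 49, 50, gamma)
q_continuous = interval_prob(psi_ii, rate, 49, 50, gamma)
```

For small $\gamma$ the two versions differ by well under one percent over this range, with the discrete term always slightly smaller because $\log(1+\gamma) < \gamma$ for $\gamma > 0$.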

5. The Fitting of the Model to Cancer Incidence and the Generalized Bayesian Inference Procedure

Given the model in Sections 2 and 3 and cancer incidence, one may use results in Section 4 to fit the model. By using this model and the distribution results in Section 4, one can readily estimate the unknown genetic parameters, predict cancer incidence, and check the validity of the model by using the generalized Bayesian inference and Gibbs sampling procedures; for more detail, see Tan [3, 22] and Tan et al. [4, 5].

The generalized Bayesian inference is based on the posterior distribution $P\{\Theta \mid \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$ of $\Theta$ given $\{\mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$. This posterior distribution is derived by combining the prior distribution $P\{\Theta\}$ of $\Theta$ with the joint probability distribution $P\{\mathbf{N}, \mathbf{Y}, \tilde{y} \mid \tilde{n}, \Theta\}$ given $\{\tilde{n}, \Theta\}$ given by (42). It follows that this inference procedure would combine information from three sources: (1) previous information and experiences about the parameters in terms of the prior distribution $P\{\Theta\}$ of the parameters, (2) biological information of inherited cancer cases via genetic segregation of cancer genes in the population ($P\{\mathbf{N} \mid \tilde{n},\ p_i,\ i = 1, 2\}$, see Section 2), and (3) information from the expanded data ($\mathbf{Y}$) and the observed data ($\tilde{y}$) via the statistical model from the system ($P\{\mathbf{Y}, \tilde{y} \mid \mathbf{N}, \Theta\}$ given by (37) and (40)). Because of additional information from the genetic segregation of the cancer genes, this inference procedure provides an efficient procedure to extract information on the effects of genotypes of individuals at the embryo stage.

5.1. The Prior Distribution of the Parameters. For the prior distributions of $\Theta$, because biological information has suggested some lower bounds and upper bounds for the mutation rates and for the proliferation rates, we assume

$$
P(\Theta) \propto c \quad (c > 0), \tag{53}
$$

where $c$ is a positive constant if these parameters satisfy some biologically specified constraints, and the prior is equal to zero otherwise. These biological constraints are as follows.

(i) $0 < p_1 < 10^{-2}$, $0 < p_2 < 10^{-6}$, and $-0.01 < \gamma_i < 1$ $(i = 1, 2)$.
(ii) For $\beta_j$ $(j = 0, 1, \ldots, k-1)$, we let $1 < \omega_0 = N(t_0)\beta_0 < 1000$ $(N \to I_1)$ and $10^{-8} < \beta_i < 10^{-3}$, $i = 1, \ldots, k-1$.
(iii) For the $\lambda_{j(k-1)}$ $(j = 0, 1, 2)$, we let $0 < \lambda_i < 10$, $i = 0, 1$, and $0 < \lambda_2 < 10^3$.
(iv) For the $\theta_i$ $(i = 1, 2, 3)$, we let $0 < \theta_1 < 10^{-2}$ and $0 < \theta_2 < 10^{-1}$.

We will refer to the above prior as a partially informative prior, which may be considered as an extension of the traditional noninformative prior given in Box and Tiao [28].
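In a sampler, a prior of this kind amounts to an indicator of the constraint region. The following sketch encodes constraints (i)–(iv) as a support check and returns a log prior of 0 inside the region and $-\infty$ outside. The argument names and the container layout are assumptions made for this example; note that the constraints as stated give no bound for $\theta_3$, so none is checked here.

```python
import math


def in_prior_support(p1, p2, gammas, omega0, betas, lambdas, thetas):
    """True when the parameters satisfy constraints (i)-(iv).

    gammas  : (gamma_1, gamma_2)
    omega0  : N(t_0) * beta_0
    betas   : (beta_1, ..., beta_{k-1})
    lambdas : (lambda_0, lambda_1, lambda_2)
    thetas  : (theta_1, theta_2)
    """
    checks = [
        0.0 < p1 < 1e-2,                                   # (i)
        0.0 < p2 < 1e-6,                                   # (i)
        all(-0.01 < g < 1.0 for g in gammas),              # (i)
        1.0 < omega0 < 1000.0,                             # (ii)
        all(1e-8 < b < 1e-3 for b in betas),               # (ii)
        0.0 < lambdas[0] < 10.0 and 0.0 < lambdas[1] < 10.0,  # (iii)
        0.0 < lambdas[2] < 1e3,                            # (iii)
        0.0 < thetas[0] < 1e-2,                            # (iv)
        0.0 < thetas[1] < 1e-1,                            # (iv)
    ]
    return all(checks)


def log_prior(**params):
    """Log of the flat prior (53) up to an additive constant."""
    return 0.0 if in_prior_support(**params) else -math.inf


# Hypothetical parameter values inside the constraint region.
inside = dict(p1=1e-3, p2=1e-7, gammas=[0.02, 0.05], omega0=50.0,
              betas=[1e-6, 1e-5], lambdas=[0.5, 2.0, 100.0], thetas=[1e-3, 1e-2])
outside = dict(inside, p2=1e-5)   # violates 0 < p2 < 1e-6
```

Returning $-\infty$ outside the region lets any log-posterior that adds this term reject such proposals automatically.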

5.2. The Posterior Distribution of the Parameters Given $\{\mathbf{Y}, \mathbf{N}, \tilde{y}, \tilde{n}\}$. Denote by $\Theta_1 = \Theta - \{p_1, p_2, \alpha\}$. From the posterior distribution $P\{\Theta \mid \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$, we obtain

$$
\begin{aligned}
P\{p_i,\ i = 1, 2,\ \alpha \mid \Theta_1, \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}
&\propto (\chi(k))^{y_0}\,e^{-\chi(k)}\,p_1^{\sum_{j=1}^{t_N} n_{1j}}\,p_2^{\sum_{j=1}^{t_N} n_{2j}} \\
&\quad \times (1 - p_1 - p_2)^{\sum_{j=1}^{t_N} n_{0j}}, \quad 0 < p_1, p_2, \alpha < 1, \\
P\{\Theta_1 \mid (p_1, p_2, \alpha), \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}
&\propto \prod_{j=1}^{t_N} \prod_{i=0}^{2} e^{-Q_i(j)}\{Q_i(j)\}^{y_{ij}}, \quad \Theta_1 \in \Omega,
\end{aligned}
\tag{54}
$$

14 ISRN Biomathematics

where Ω is the parameter space of Θ1provided by the

biological constraints in Section 51For computational convenience we notice that the log of

1198751199011 119901

2 120572 | Θ

1NY

˜119910

˜119899 is proportional to the negative of

1198630(119896) + Dev(119901

1 119901

2) given by (44)-(45) similarly the log of

119875Θ1| (119901

1 119901

2 120572)NY

˜119910

˜119899 is proportional to the negative

of sum119896

119895=1119863119895given by (46)
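To make the structure of the two conditional posteriors concrete, the sketch below evaluates the logarithm of each kernel up to an additive constant: a Poisson term for the cases at birth combined with a multinomial term for the genotype counts, and a product of Poisson terms for the age-specific cases. The function names and argument layout are assumptions made for illustration; in particular, $\chi(k)$ and the $Q_i(j)$ are passed in as numbers because their dependence on the parameters is defined elsewhere in the paper.

```python
import math


def log_kernel_mixing(p1, p2, chi, y0, n0, n1, n2):
    """Log kernel of the (p1, p2, alpha) conditional, up to a constant.

    chi is the value of chi(k); n0, n1, n2 are sequences of the counts
    n_{0j}, n_{1j}, n_{2j} over the age groups j.
    """
    if not (0.0 < p1 < 1.0 and 0.0 < p2 < 1.0 and p1 + p2 < 1.0 and chi > 0.0):
        return -math.inf
    return (y0 * math.log(chi) - chi
            + sum(n1) * math.log(p1)
            + sum(n2) * math.log(p2)
            + sum(n0) * math.log(1.0 - p1 - p2))


def log_kernel_poisson(Q, y):
    """Log kernel of the Theta_1 conditional: sum of y_ij*log Q_i(j) - Q_i(j).

    Q[j][i] and y[j][i] hold Q_i(j) > 0 and the count y_ij for i = 0, 1, 2.
    """
    total = 0.0
    for q_row, y_row in zip(Q, y):
        for q, count in zip(q_row, y_row):
            total += count * math.log(q) - q
    return total
```

Working on the log scale avoids underflow when the exponents (sums of counts over all age groups) are large.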

5.3. The Multilevel Gibbs Sampling Procedure for Estimating Unknown Parameters. Given the posterior probability distributions, we will use the following multilevel Gibbs sampling procedure to derive estimates of the parameters. We notice that numerically the Gibbs sampling procedure given below is equivalent to the EM-algorithm from the sampling theory viewpoint, with Steps 1 and 2 as the E-step and with Steps 3 and 4 as the M-step, respectively [29]. These multilevel Gibbs sampling procedures are given by the following.

Step 1 (Generating $\boldsymbol{N}$ Given $(\boldsymbol{Y}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta)$ (The Data-Augmentation Step 1)). Given $\Theta$ and given $\underset{\sim}{n}$, use the multinomial distribution of $\{n_{1j}, n_{2j}\}$ given $n_j$ in Section 3 to generate a large sample of $\boldsymbol{N}$. Then, by combining this large sample with $P\{\boldsymbol{Y}, \underset{\sim}{y} \mid \boldsymbol{N}, \underset{\sim}{n}, \Theta\}$ in (37) and (40), select $\boldsymbol{N}$ through the weighted bootstrap method due to Smith and Gelfand [30]. This selected $\boldsymbol{N}$ is then a sample from $P\{\boldsymbol{N} \mid \boldsymbol{Y}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta\}$ even though the latter is unknown (for proof, see Tan [22], Chapter 3). Call the generated sample $\boldsymbol{N}^{(*)}$.

Step 2 (Generating $\boldsymbol{Y}$ Given $(\boldsymbol{N}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta)$ (The Data-Augmentation Step 2)). Given $\underset{\sim}{y}$, $\underset{\sim}{n}$, $\Theta$ and given $\boldsymbol{N} = \boldsymbol{N}^{(*)}$ generated from Step 1, generate $\boldsymbol{Y}$ from the probability distribution $P\{\boldsymbol{Y} \mid \boldsymbol{N}^{(*)}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta\}$ given by (36) and (38). Call the generated sample $\boldsymbol{Y}^{(*)}$.

Step 3 (Estimation of $\{p_i, i = 1, 2, \alpha\}$ Given $\{\Theta_1, \boldsymbol{N}, \boldsymbol{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$). Given $\underset{\sim}{y}$, $\underset{\sim}{n}$, $\Theta_1$ and given $(\boldsymbol{N}, \boldsymbol{Y}) = (\boldsymbol{N}^{(*)}, \boldsymbol{Y}^{(*)})$ from Steps 1 and 2, derive the posterior mode of $\{p_i, i = 1, 2, \alpha\}$ by maximizing the conditional posterior distribution $P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \boldsymbol{N}^{(*)}, \boldsymbol{Y}^{(*)}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $D_0(k) + \mathrm{Dev}(p_1, p_2)$ given by (44)-(45) in Section 4.3 under the constraints given in Section 5.1. Denote this generated mode by $\{p_i^{(*)}, i = 1, 2, \alpha^{(*)}\}$.

Step 4 (Estimation of $\Theta_1$ Given $\{p_i, i = 1, 2, \alpha, \boldsymbol{N}, \boldsymbol{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$). Given $(\underset{\sim}{y}, \underset{\sim}{n})$ and given $(\boldsymbol{N}, \boldsymbol{Y}, p_i, i = 1, 2, \alpha) = (\boldsymbol{N}^{(*)}, \boldsymbol{Y}^{(*)}, p_i^{(*)}, i = 1, 2, \alpha^{(*)})$ from Steps 1-3, derive the posterior mode of $\Theta_1$ by maximizing the conditional posterior distribution $P\{\Theta_1 \mid p_i^{(*)}, i = 1, 2, \alpha^{(*)}, \boldsymbol{N}^{(*)}, \boldsymbol{Y}^{(*)}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $\sum_{j=1}^{k} D_j$ in (46) under the constraints. Denote the generated mode as $\Theta_1^{(*)}$.

Step 5 (Recycling Step). With $(\boldsymbol{N}, \boldsymbol{Y}, p_i, i = 1, 2, \alpha, \Theta_1) = (\boldsymbol{N}^{(*)}, \boldsymbol{Y}^{(*)}, p_i^{(*)}, i = 1, 2, \alpha^{(*)}, \Theta_1^{(*)})$ given above, go back to Step 1 and continue until convergence.

The proof of convergence of the above steps can be derived by using the procedure given in Tan ([22], Chapter 3). At convergence, the $\Theta^{(*)} = \{p_i^{(*)}, i = 1, 2, \alpha^{(*)}, \Theta_1^{(*)}\}$ are the generated values from the posterior distribution of $\Theta$ given $\{\underset{\sim}{y}, \underset{\sim}{n}\}$ independently of $(\boldsymbol{N}, \boldsymbol{Y})$ (for proof, see Tan [22], Chapter 3). Repeating the above procedure, one generates a random sample of $\Theta$ from the posterior distribution of $\Theta$ given $\{\underset{\sim}{y}, \underset{\sim}{n}\}$; one then uses the sample mean as the estimate of $\Theta$ and uses the sample variances and covariances as estimates of the variances and covariances of these estimates.
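The data-augmentation idea in Step 1 (draw many candidate genotype counts from the multinomial distribution, then resample them with weights proportional to the likelihood of the data) can be sketched as follows. This is only an illustration of the weighted bootstrap of Smith and Gelfand [30] under stated assumptions: the function names, the population size, the genotype frequencies, and especially the weight function `toy_weight` are hypothetical placeholders. In the procedure above, the weight would be $P\{\boldsymbol{Y}, \underset{\sim}{y} \mid \boldsymbol{N}, \underset{\sim}{n}, \Theta\}$ from (37) and (40), which is not reproduced here.

```python
import random


def draw_multinomial(n, probs, rng):
    """Draw category counts for n individuals with cell probabilities probs."""
    counts = [0] * len(probs)
    for category in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[category] += 1
    return counts


def weighted_bootstrap(candidates, weight_fn, size, rng):
    """Resample candidates with probability proportional to weight_fn(candidate)."""
    weights = [weight_fn(c) for c in candidates]
    if sum(weights) <= 0.0:
        raise ValueError("all candidate weights are zero")
    return rng.choices(candidates, weights=weights, k=size)


rng = random.Random(0)

# Hypothetical inputs: 1000 people in one age group and genotype
# probabilities (1 - p1 - p2, p1, p2) for the (J0, J1, J2) genotypes.
p1, p2 = 1e-3, 1e-6
candidates = [draw_multinomial(1000, [1.0 - p1 - p2, p1, p2], rng)
              for _ in range(200)]


def toy_weight(counts):
    """Placeholder weight; NOT the likelihood used in the paper."""
    return 1.0 + counts[1]


# One resampled candidate plays the role of the selected sample in Step 1.
selected = weighted_bootstrap(candidates, toy_weight, size=1, rng=rng)[0]
```

The resampling step is what turns draws from the easy-to-sample multinomial distribution into an approximate draw from the conditional distribution of the genotype counts given the data.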

6. A New Multistage Stochastic Model for Adult Eye Cancer (Uveal Melanoma): An Example

The human eye cancers consist of pediatric eye cancers and adult eye cancers. The most common pediatric eye cancer is retinoblastoma, which develops from the retinal pigment epithelium cells underlying the retina that do not form melanoma. The most common adult eye cancers are the uveal melanomas, involving the iris, the ciliary body, and the choroid (collectively referred to as the uvea). These cancers develop from melanocytes (pigment cells), which reside within the uvea and give color to the eye. In Tan and Zhou [9], we have developed a modified two-stage model for retinoblastoma. Based on results from molecular biology (see Landreville et al. [14], Mensink et al. [15], and Loercher and Harbour [31]), Landreville et al. [14] have proposed a three-stage model for uveal melanoma as given in Figure 2. As an example of applications of this paper, in this section we will apply this model of uveal melanoma to the NCI/NIH eye cancer data from the SEER project. We notice that the same methods can be applied to model other human cancers as well, but this will be our future research.

Given in Table 1 are the numbers of people at risk and the eye cancer cases in the age groups, together with the predicted cases from the models. These data give cancer incidence at birth and incidence for 85 age groups ($k = 85$), with each group spanning a 1-year period except the last age group ($\geq 85$ years old). For human eye cancer, because the incidence at birth and for age groups from 1 to 10 years old is basically generated by the pediatric eye cancer retinoblastoma (see [9]), to account for inherited cancer cases of uveal melanomas, the incidence for age 0 (birth) and for age periods from 1 to 10 years old in Table 1 for uveal melanoma is derived by subtracting the incidence of retinoblastoma from the SEER data (see Tan and Zhou [9]).

To fit the data, we let one time unit be 6 months after birth and let $t_0 = 1$. To compare different models and to assess different assumptions, we will consider the following two 3-stage mixture models: (1) the complete 3-stage mixture model (Model-F), in which no assumptions are made on the parameters; (2) the Type-1 3-stage mixture model, in which we assume that $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth (Model-1). For comparison purposes, we also fit a 2-stage model as defined in Tan and Zhou [9]. We will apply the methods in Section 5 to fit these models to the SEER data given in Table 1.

Table 2: The log-likelihood, AIC, and BIC of the fitted models.

Models                  Log-likelihood   AIC        BIC
Three-stage, Model-F    -1312.02         2648.04    2677.35
Three-stage, Model-1    -1295.76         2609.53    2631.51
Two-stage               -4392.421        8796.843   8811.569
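For readers who wish to check the information criteria, the standard definitions are $\mathrm{AIC} = 2m - 2\log L$ and $\mathrm{BIC} = m \log n - 2\log L$, where $m$ is the number of free parameters and $n$ is the number of observations. The sketch below applies them to Model-F, reading its log-likelihood in Table 2 as -1312.02. The values $m = 12$ (the parameter count implied by the degrees of freedom quoted for Model-F) and $n = 85$ are assumptions made here for illustration; they are not stated next to the table.

```python
import math


def aic(log_lik, n_params):
    """Akaike Information Criterion: 2m - 2 log L."""
    return 2.0 * n_params - 2.0 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion: m log n - 2 log L."""
    return n_params * math.log(n_obs) - 2.0 * log_lik


# Model-F, with m = 12 and n = 85 taken as assumptions (see the lead-in).
aic_model_f = aic(-1312.02, 12)
bic_model_f = bic(-1312.02, 12, 85)
print(round(aic_model_f, 2), round(bic_model_f, 2))
```

Under these assumptions the two values agree with the Model-F row to two decimals, which supports the reading of the table entries used here.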

Figure 5: Curve fitting of SEER data by Model-F, Model-1, and the two-stage model. (Plot of incidences against age in years; curves: Observed, Model-F, Model-1, Two-stage.)

Given in Table 2 are the natural logs of the likelihood functions, the AIC (Akaike Information Criterion), and the BIC (Bayesian Information Criterion) for these models. Given in Table 3 are the estimates of parameters in the 3-stage models. Given in Figure 5 are the plots of predicted cancer cases from the 3-stage mixture models (Model-F and Model-1) and the 2-stage model. For comparison purposes, in Table 1 we also provide numbers of predicted cancer cases from the 3-stage mixture models and the 2-stage model together with the observed cancer cases over time from SEER. From these results, we have made the following observations.

(a) As shown by the results in Table 1 and Figure 5, both Model-F and Model-1 fitted the SEER data well, although Model-1 fitted the data slightly better judging from the values of AIC and BIC. The Chi-square test statistics $\chi^2 = \sum_{i=0}^{85} ((y_i - \hat{y}_i)^2 / \hat{y}_i)$ for Model-F and Model-1 are given by 88.43 and 94.48, respectively, giving a $P$-value of 0.12 (df = 86 - 12 = 74) for Model-F and a $P$-value of 0.11 (df = 86 - 7 = 79) for Model-1. On the other hand, the 2-stage model fitted the data very poorly; the Chi-square statistic for the 2-stage model is 2747.69, giving a $P$-value less than $10^{-3}$. The AIC and BIC values of Model-1 are given by (AIC = 2609.53, BIC = 2631.51), which are slightly smaller than those of Model-F, respectively; however, the AIC and BIC values (8796.84, 8811.57) of the two-stage model are considerably greater than those of the 3-stage models, respectively. These results suggest that uveal melanoma may best be described by a 3-stage model with an inherited component and that one may practically assume $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth.

(b) From Table 3, it is observed that the estimate of $\gamma_1$ is close to zero (the estimate is of order $10^{-5}$), indicating that the phenotype of $J_1$ is almost identical to that of $N = J_0$, further confirming that the staging-limiting genes are basically tumor suppressor genes and that there is no haploinsufficiency for these tumor suppressor genes. On the other hand, the estimate of $\gamma_2$ is of order $10^{-2}$, which is about $10^{3}$ times greater than that of cells with genotype $J_1$.

(c) From Table 3, the estimates $\hat{\lambda}_j$ ($j = 0, 1, 2$) of $\lambda_j$ are of order $10^{-1}$, $10^{-1}$, and $10^{2} \sim 10^{3}$, respectively. Because $\lambda_i = E[J_i(t_0; i)]\,\beta_i \prod_{u=i+1}^{2} (\beta_u / \gamma_u)$, $i = 0, 1, 2$, assuming some values of $E[J_i(t_0; i)]$, $i = 0, 1, 2$, from some biological observations, one can have some rough ideas about the magnitude of $\beta_j$ ($j = 0, 1, 2$). For example, if we follow Potten et al. [32] to assume ($E[N(t_0)] = E[J_i(t_0; i)] \sim 10^{8}$, $i = 0, 1, 2$), then $\beta_j \approx 10^{-6} \sim 10^{-5}$.

(d) From Table 3, the estimates of $p_1$ and $p_2$ from the SEER data are of orders $10^{-4} \sim 10^{-3}$ and $10^{-7} \sim 10^{-6}$, respectively. This indicates that in the US population the frequency of the staging-limiting cancer gene for uveal melanoma is approximately $10^{-3}$. Table 3 also showed that the estimate of $\alpha$ was 0.8411, indicating that most individuals with genotype $J_2$ would develop cancer at birth. This may help to explain why there are observed cancer incidences at birth for uveal melanoma in the SEER data even though the estimate of the frequency $p_2$ is of order $10^{-7} \sim 10^{-6}$.
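The goodness-of-fit check in observation (a) can be sketched in a few lines. The code below computes the Pearson statistic from observed and predicted counts and converts a statistic into an approximate upper-tail probability. Two caveats apply. First, the tail probability uses the Wilson-Hilferty normal approximation to the chi-square distribution, which is a technique swapped in here to keep the example dependency-free; it is not claimed to be the method the authors used. Second, the statistics 88.43 and 94.48 are read from observation (a) with the decimal point placed after two digits.

```python
import math


def pearson_chi2(observed, predicted):
    """Pearson statistic: sum of (y_i - yhat_i)^2 / yhat_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, predicted))


def chi2_sf_wilson_hilferty(x, df):
    """Approximate P(chi-square_df > x) by the Wilson-Hilferty cube-root
    normal approximation (reasonable for moderate to large df)."""
    scale = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - scale)) / math.sqrt(scale)
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


p_model_f = chi2_sf_wilson_hilferty(88.43, 74)
p_model_1 = chi2_sf_wilson_hilferty(94.48, 79)
print(round(p_model_f, 2), round(p_model_1, 2))
```

Under these assumptions the approximation gives tail probabilities close to the $P$-values of 0.12 and 0.11 quoted in observation (a); an exact chi-square routine from a statistics library would be preferable in practice.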

7. Discussion and Conclusions

To account for inherited cancer cases, in this paper we have developed some general multistage models involving hereditary cancer cases. For human cancer incidence, these models are basically generalized mixture models. In these mixture models, the mixing probability is a multinomial distribution to account for genetic segregation of the staging-limiting tumor suppressor genes. This mixture model allows us to estimate for the first time the frequency of the staging-limiting tumor suppressor gene in human populations. As an example of applications, in this paper we have developed a general 3-stage stochastic multistage model of carcinogenesis for adult human eye cancer. To account for inherited cancer cases in the stochastic model of human eye cancer, we have also developed a generalized mixture model for uveal melanoma in human beings.

For using the proposed models to fit the cancer incidence data, in this paper we have developed a generalized Bayesian inference procedure to estimate the unknown parameters and to predict cancer cases. This inference procedure is advantageous over the classical sampling theory inference (i.e., the maximum likelihood method) because the procedure combines

Table 3: Estimates of parameters for the 3-stage stochastic models.

Model-F
Parameter     Estimate    StD         95% CL lower   95% CL upper
$\lambda_0$   2.88E-01    1.21E-01    5.03E-02       5.25E-01
$\lambda_1$   4.81E-01    4.65E-02    3.89E-01       5.72E-01
$\lambda_2$   6.55E+02    1.12E+02    4.34E+02       8.75E+02
$\gamma_1$    2.71E-05    8.86E-06    5.34E-06       4.01E-05
$\gamma_2$    3.37E-02    1.92E-04    3.33E-02       3.41E-02
$p_1$         9.95E-04    1.75E-06    9.91E-04       9.98E-04
$p_2$         9.68E-07    6.48E-09    9.55E-07       9.81E-07
$\alpha$      8.41E-01    4.19E-02    7.59E-01       9.23E-01
$\theta_1$    3.29E-04    3.54E-05    2.60E-04       3.98E-04
$\theta_2$    3.41E-01    1.07E-02    3.20E-01       3.62E-01

Model-1
Parameter     Estimate    StD         95% CL lower   95% CL upper
$\lambda_0$   6.55E-02    2.54E-02    1.57E-02       1.15E-01
$\lambda_1$   3.72E-01    4.63E-02    2.82E-01       4.63E-01
$\lambda_2$   1.98E+02    3.38E+01    1.32E+02       2.65E+02
$\gamma_1$    NA          NA          NA             NA
$\gamma_2$    3.49E-02    3.58E-03    2.79E-02       4.19E-02
$p_1$         9.98E-04    6.27E-05    8.75E-04       1.12E-03
$p_2$         1.00E-06    4.53E-08    9.11E-07       1.09E-06
$\alpha$      8.07E-01    7.06E-02    6.68E-01       9.45E-01
$\theta_1$    NA          NA          NA             NA
$\theta_2$    NA          NA          NA             NA

Note: NA: assumed nonexistence.

information from three sources: (1) previous information and experiences about the parameters in terms of the prior distribution $P\{\Theta\}$ of the parameters; (2) biological information of inherited cancer cases via the genetic segregation of staging-limiting tumor suppressor genes in the population; and (3) information from the expanded data ($\boldsymbol{Y}$) and the observed data ($\underset{\sim}{y}$) via the statistical model from the system ($P\{\boldsymbol{Y}, \underset{\sim}{y} \mid \boldsymbol{N}, \Theta\}$ given by (37) and (40)).

To illustrate the usefulness and applications of our models and methods, we have applied them to the eye cancer SEER data of NCI/NIH. Our analysis clearly showed that the proposed 3-stage model with inherited cancer cases fitted the data nicely (see Table 2 and Figure 5); on the other hand, the classical 2-stage model cannot fit the data at all (see Table 2 and Figure 5). These results clearly have confirmed results from molecular biology that the human eye cancer is derived by a 3-stage model with an inherited cancer component. Notice, however, that our 3-stage multistage model is more general than the classical 3-stage model as described in Little [1], Tan [2], and Zheng [7] in that we postulate that cancer tumors develop from primary $I_3$ cells by clonal expansion (see Yang and Chen [11]). (Note that the stochastic multistage models in the literature assume that cancer tumors develop from last stage cells immediately as soon as they are generated, ignoring completely cancer progression; see Remark 1.) As a matter of fact, we had assumed $\Delta t = 1$ for a period of three months and found that the 3-stage models then did not fit the SEER data, clearly indicating that $P_t(s, t) < 1$ over a period of three months.

Applying our models and methods to the SEER data of human eye cancer, we have derived for the first time some useful pieces of information. Specifically, we mention the following. (1) For the first time, we have estimated the frequency of the staging-limiting tumor suppressor gene in the US population ($p_1 \sim 0.9948 \times 10^{-3}$). (2) With the estimate of $\alpha$ as $\hat{\alpha} = 0.8411$, the predicted number of uveal melanoma cases at birth is $y_0 = n_{01} p_2 = 36$ by the 3-stage models with inherited cancer component (Model-F and Model-1). (The observed number of eye cancer cases at birth is 34.) (3) The estimate of the proliferation rate ($\gamma_1$) of $J_1$ cells using Model-F is $\gamma_1 = 2.271 \times 10^{-5} \sim 0$. (The estimate is $8.603 \times 10^{-5}$ using Model-1.) This confirms that the stage-1 limiting gene is a tumor suppressor gene and that, unlike the p53 gene in chromosome 17p (see [33]), there is little or no haploinsufficiency for this gene in cells with genotype $J_1$.

Using the models and methods of this paper, one can easily predict future cancer cases for human eye cancer. Thus, by comparing results from different populations, our models and methods can be used to assess cancer prevention and control procedures. This will be our future research topic; we will not go any further here.

Appendix

The Expected Numbers of State Variables under Discrete Time Approximation

Under discrete time, the stochastic differential equations of state variables reduce to the following stochastic difference equations of state variables, respectively:

$$
\begin{aligned}
J_i(t+1; i) &= J_i(t; i)[1 + \gamma_i(t)] + e_i(t+1; i), \qquad i = 0, 1, \ldots, \mathrm{Min}(k-1, 2), \\
J_j(t+1; u) &= J_j(t; u)[1 + \gamma_j(t)] + J_{j-1}(t; u)\beta_{j-1}(t) + e_j(t+1; u), \qquad 0 \le u < j \le k-1,
\end{aligned}
\tag{A.1}
$$

where $e_i(t+1; i) = [B_i(t; i) - J_i(t; i) b_i(t)] - [D_i(t; i) - J_i(t; i) d_i(t)]$ and $e_j(t+1; i) = [B_j(t; i) - J_j(t; i) b_j(t)] - [D_j(t; i) - J_j(t; i) d_j(t)] + [M_{j-1}(t; i) - J_{j-1}(t; i)\beta_{j-1}(t)]$ for $j > i$.

The initial conditions at birth ($t_0$) for the above stochastic difference equations are $J_j(t_0; i) > 0$ if $j = i, i+1$ and $J_j(t_0; i) = 0$ if $j > i+1$. The solution of the above difference equations under these initial conditions is given, respectively, by

$$
\begin{aligned}
J_i(t; i) &= J_i(t_0; i) \prod_{s=t_0}^{t-1} (1 + \gamma_i(s)) + \sum_{s=t_0+1}^{t} e_i(s; i) \prod_{u=s}^{t-1} (1 + \gamma_i(u)), \qquad i = 0, 1, \ldots, \mathrm{Min}(k-1, 2), \\
J_j(t; i) &= J_j(t_0; i) \prod_{s=t_0}^{t-1} (1 + \gamma_j(s)) + \sum_{s=t_0}^{t-1} J_{j-1}(s; i)\beta_{j-1}(s) \prod_{u=s+1}^{t-1} (1 + \gamma_j(u)) \\
&\quad + \sum_{s=t_0+1}^{t} e_j(s; i) \prod_{u=s}^{t-1} (1 + \gamma_j(u)), \qquad 0 \le i < j \le k-1.
\end{aligned}
\tag{A.2}
$$

If the model is time homogeneous so that $\beta_j(t) = \beta_j$, $\gamma_j(t) = \gamma_j$ ($j = i, \ldots, k-1$), and if $\gamma_i \ne \gamma_j$ for $i \ne j$, then the above solutions under the initial conditions ($J_j(t_0; i) = 0$, $j > i+1$) reduce to

$$
\begin{aligned}
J_i(t; i) &= J_i(t_0; i)(1 + \gamma_i)^{t-t_0} + \eta_i^{(0)}(t; i), \qquad i = 0, 1, \ldots, \mathrm{Min}(k-1, 2), \\
J_j(t; i) &= J_j(t_0; i)(1 + \gamma_j)^{t-t_0} + \sum_{u=i}^{j-1} J_u(t_0; i) \Bigl\{\prod_{v=u}^{j-1} \beta_v\Bigr\} \sum_{r=u}^{j} A_{uj}(r)(1 + \gamma_r)^{t-t_0} \\
&\quad + \eta_j^{(0)}(t; i) + \sum_{u=1}^{j-i} \Bigl\{\prod_{r=j-u}^{j-1} \beta_r\Bigr\} \eta_j^{(u)}(t; i) \\
&= \sum_{u=i}^{i+1} J_u(t_0; i) \Bigl\{\prod_{v=u}^{j-1} \beta_v\Bigr\} \sum_{r=u}^{j} A_{uj}(r)(1 + \gamma_r)^{t-t_0} \\
&\quad + \eta_j^{(0)}(t; i) + \sum_{u=1}^{j-i} \Bigl\{\prod_{r=j-u}^{j-1} \beta_r\Bigr\} \eta_j^{(u)}(t; i), \qquad 0 \le i < j \le k-1,
\end{aligned}
\tag{A.3}
$$

where $\eta_j^{(0)}(t; i) = \sum_{s=t_0+1}^{t} e_j(s; i)(1 + \gamma_j)^{t-s}$ and $\eta_j^{(u)}(t; i) = \sum_{s=t_0+1}^{t} e_{j-u}(s; i) \sum_{v=u}^{j} A_{uj}(v)(1 + \gamma_v)^{t-s}$ ($u = 1, \ldots, j-i$).

Thus, if the model is time homogeneous and if $\gamma_i \ne \gamma_j$ for $i \ne j$, the $E[J_{k-1}(t; i)]$'s in discrete time models under the initial conditions ($J_j(t_0; i) = 0$, $j > i+1$) are given, respectively, by

$$
E[J_{k-1}(t; i)] = \sum_{u=i}^{i+1} E[J_u(t_0; i)] \Bigl(\prod_{v=u}^{k-2} \beta_v\Bigr) \times \sum_{r=u}^{k-1} A_{u(k-1)}(r)(1 + \gamma_r)^{t-t_0}, \qquad i = 0, 1, \ldots, \mathrm{Min}(k-1, 2).
\tag{A.4}
$$
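The expectations above can also be obtained numerically by iterating the difference equations with the noise terms dropped, on the assumption (implicit in the expectation formula, where no $e$ terms remain) that the $e_j$ have mean zero. The sketch below does this for constant rates and checks the result against the closed form for the simplest nontrivial case of two consecutive stages with no cells initially in the second stage, where the geometric sum gives $J_0 \beta \,[(1+\gamma_a)^{T} - (1+\gamma_b)^{T}] / (\gamma_a - \gamma_b)$ with $T = t - t_0$. The function name and all numerical values are hypothetical.

```python
def expected_cells(j0, gammas, betas, steps):
    """Iterate the expected-value recursion with constant rates.

    j0[m]     : expected number of cells in stage m at the starting time
    gammas[m] : net proliferation rate of stage m per time unit
    betas[m]  : transition rate from stage m to stage m + 1
    The first stage in the list has no inflow from an earlier stage.
    """
    current = list(j0)
    for _ in range(steps):
        nxt = [current[0] * (1.0 + gammas[0])]
        for m in range(1, len(current)):
            nxt.append(current[m] * (1.0 + gammas[m])
                       + current[m - 1] * betas[m - 1])
        current = nxt
    return current


# Hypothetical rates, chosen only to exercise the recursion.
J0, GAMMA_A, GAMMA_B, BETA, T = 1e8, 0.01, 0.03, 1e-6, 20
stage_a, stage_b = expected_cells([J0, 0.0], [GAMMA_A, GAMMA_B], [BETA], T)

# Closed forms for comparison (require GAMMA_A != GAMMA_B).
closed_a = J0 * (1.0 + GAMMA_A) ** T
closed_b = (J0 * BETA
            * ((1.0 + GAMMA_A) ** T - (1.0 + GAMMA_B) ** T)
            / (GAMMA_A - GAMMA_B))
```

The division by the difference of the two rates in the closed form shows why the time-homogeneous solution requires the proliferation rates of different stages to be distinct.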

References

[1] M. P. Little, "Cancer models, ionization, and genomic instability: a review," in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 5, World Scientific, River Edge, NJ, USA, 2008.

[2] W. Y. Tan, Stochastic Models of Carcinogenesis, Marcel Dekker, New York, NY, USA, 1991.

[3] W. Y. Tan, "Stochastic multi-stage models of carcinogenesis as hidden Markov models: a new approach," International Journal of Systems and Synthetic Biology, vol. 1, pp. 313-337, 2010.

[4] W. Y. Tan, L. J. Zhang, and C. W. Chen, "Stochastic modeling of carcinogenesis: state space models and estimation of parameters," Discrete and Continuous Dynamical Systems B, vol. 4, no. 1, pp. 297-322, 2004.

[5] W. Y. Tan, C. W. Chen, and L. J. Zhang, "Cancer biology, cancer models and stochastic mathematical analysis of carcinogenesis," in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 3, World Scientific, River Edge, NJ, USA, 2008.

[6] R. A. Weinberg, The Biology of Human Cancer, Garland Sciences, Taylor and Francis, New York, NY, USA, 2007.

[7] Q. Zheng, "Stochastic multistage cancer models: a fresh look at an old approach," in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 2, World Scientific, River Edge, NJ, USA, 2008.

[8] W. Y. Tan, L. J. Zhang, W. Chen, and J. M. Zhu, "A stochastic model of human colon cancer involving multiple pathways," in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 11, World Scientific, River Edge, NJ, USA, 2008.

[9] W. Y. Tan and H. Zhou, "A new stochastic model of retinoblastoma involving both hereditary and non-hereditary cancer cases," Journal of Carcinogenesis and Mutagenesis, vol. 2, no. 2, article 117, 2011.

[10] X. W. Yan, Stochastic and State Space Models of Carcinogenesis Under Complex Situation [Ph.D. thesis], Department of Mathematical Sciences, University of Memphis, Memphis, Tenn, USA.

[11] G. L. Yang and C. W. Chen, "A stochastic two-stage carcinogenesis model: a new approach to computing the probability of observing tumor in animal bioassays," Mathematical Biosciences, vol. 104, no. 2, pp. 247-258, 1991.

[12] H. Osada and T. Takahashi, "Genetic alterations of multiple tumor suppressors and oncogenes in the carcinogenesis and progression of lung cancer," Oncogene, vol. 21, no. 48, pp. 7421-7434, 2002.

[13] I. I. Wistuba, L. Mao, and A. F. Gazdar, "Smoking molecular damage in bronchial epithelium," Oncogene, vol. 21, no. 48, pp. 7298-7306, 2002.

[14] S. Landreville, O. A. Agapova, and J. W. Harbour, "Emerging insights into the molecular pathogenesis of uveal melanoma," Future Oncology, vol. 4, no. 5, pp. 629-636, 2008.

[15] H. W. Mensink, D. Paridaens, and A. De Klein, "Genetics of uveal melanoma," Expert Review of Ophthalmology, vol. 4, no. 6, pp. 607-616, 2009.

[16] W. Y. Tan and X. W. Yan, "A new stochastic and state space model of human colon cancer incorporating multiple pathways," Biology Direct, vol. 5, article 26, 2010.

[17] L. B. Klebanov, S. T. Rachev, and A. Y. Yakovlev, "A stochastic model of radiation carcinogenesis: latent time distributions and their properties," Mathematical Biosciences, vol. 113, no. 1, pp. 51-75, 1993.

[18] A. Y. Yakovlev and A. D. Tsodikov, Stochastic Models of Tumor Latency and Their Biostatistical Applications, World Scientific, River Edge, NJ, USA, 1996.

[19] H. Fakir, W. Y. Tan, L. Hlatky, P. Hahnfeldt, and R. K. Sachs, "Stochastic population dynamic effects for lung cancer progression," Radiation Research, vol. 172, no. 3, pp. 383-393, 2009.

[20] A. G. Knudson, "Mutation and cancer: statistical study of retinoblastoma," Proceedings of the National Academy of Sciences of the United States of America, vol. 68, no. 4, pp. 820-823, 1971.

[21] J. F. Crow and M. Kimura, An Introduction to Population Genetics Theory, Harper and Row, New York, NY, USA, 1970.

[22] W. Y. Tan, Stochastic Models With Applications to Genetics, Cancers, AIDS and Other Biomedical Systems, World Scientific, River Edge, NJ, USA, 2002.

[23] W. Y. Tan, C. W. Chen, and L. J. Zhang, "Cancer risk assessment by state space models," in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 13, World Scientific, River Edge, NJ, USA, 2008.

[24] W. Y. Tan and C. W. Chen, "Stochastic modeling of carcinogenesis: some new insights," Mathematical and Computer Modelling, vol. 28, no. 11, pp. 49-71, 1998.

[25] W. Y. Tan and C. W. Chen, "Cancer stochastic models," in Encyclopedia of Statistical Sciences, Revised edition, John Wiley and Sons, New York, NY, USA, 2005.

[26] E. G. Luebeck and S. H. Moolgavkar, "Multistage carcinogenesis and the incidence of colorectal cancer," Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 23, pp. 15095-15100, 2002.

[27] R. Durrett, J. Mayberry, S. Moseley, and D. Schmidt, "Probability models for cancer development and progression," Google Search, Google, 2012.

[28] G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis, Addison-Wesley, Reading, Mass, USA, 1973.

[29] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm (with discussion)," Journal of the Royal Statistical Society B, vol. 39, pp. 1-38, 1977.

[30] A. F. M. Smith and A. E. Gelfand, "Bayesian statistics without tears: a sampling-resampling perspective," American Statistician, vol. 46, pp. 84-88, 1992.

[31] A. E. Loercher and J. W. Harbour, "Molecular genetics of uveal melanoma," Current Eye Research, vol. 27, no. 2, pp. 69-74, 2003.

[32] C. S. Potten, C. Booth, and D. Hargreaves, "The small intestine as a model for evaluating adult tissue stem cell drug targets," Cell Proliferation, vol. 36, no. 3, pp. 115-129, 2003.

[33] C. J. Lynch and J. Milner, "Loss of one p53 allele results in four-fold reduction of p53 mRNA and protein: a basis for p53 haplo-insufficiency," Oncogene, vol. 25, no. 24, pp. 3463-3470, 2006.



Figure 4: Embryo genotypes and their frequencies at embryo stage and at birth. (Diagram with two panels, the two-stage model and the $k$-stage model ($k \ge 3$); each panel shows the embryo state, the state at birth, and the tumor outcome, with branch probabilities $\alpha$ and $1 - \alpha$.)

with probability $\alpha$ ($\alpha > 0$) a $J_2$ person at the embryo stage would develop cancer at or before birth.

3. The Stochastic Process of Carcinogenesis with Hereditary Cancer Cases and Mathematical Analysis

Because tumors are developed from primary $J_k$ cells, for the above stochastic model the identifiable response variables are $T(t)$ and $\{J_u(t; i), i = 0, 1, 2, u = i, i+1, \ldots, k-1\}$, where $T(t)$ is the number of cancer tumors at time $t$ and $J_u(t; i)$ is the number of $J_u$ ($u = i, i+1, \ldots, k-1$) cells at time $t$ in people who are $J_i$ people at the embryo stage (see [3, 5, 8, 23]; Remarks 1 and 2). For people who have genotype $J_i$ ($i = 0, 1, 2$) at the embryo stage, the stochastic model of carcinogenesis is then given by the stochastic process $\{\underset{\sim}{X}_i(t), T(t), t > 0\}$, where $\underset{\sim}{X}_i(t) = \{J_u(t; i), u = i, i+1, \ldots, k-1\}'$. For these processes, in the next subsections we will derive stochastic equations for the state variables ($J_u(t; i)$, $i = 0, 1, 2$, $u = i, \ldots, k-1$); we will also derive the probability distributions of these state variables and the probabilities of developing cancer tumors. These are the basic approaches for modeling carcinogenesis used by the first author and his associates; see Tan [3], Tan et al. [4, 5, 8, 23], Tan and Zhou [9], Tan and Yan [16], Tan and Chen [24, 25], and Remark 3.

Remark 2. At any time (say $t$), the total number of $J_k$ cells is equal to the total number of $J_k$ cells generated from $J_{k-1}$ cells at time $t$ plus the total number of $J_k$ cells generated by cell division from other $J_k$ cells at time $t$; the former $J_k$ cells are referred to as primary $J_k$ cells, while the latter are not primary $J_k$ cells. Since each tumor is developed from a single primary $J_k$ cell through a stochastic birth and death process, each primary $J_k$ cell will generate at most one tumor. It follows that at any time the total number of $J_k$ cells is considerably greater than the number of cancer tumors (see also Yang and Chen [11]). Thus, for generating cancer tumors, the only identifiable state variables are the numbers of $J_j$ cells ($j = 0, 1, \ldots, k-1$) and the number of detectable cancer tumors.

Remark 3. To model stochastic multistage models of carcinogenesis, the standard traditional approach is to assume that the last stage cells (i.e., the $J_k$ cells in the model $N \to J_1 \to \cdots \to J_k \to \mathit{Tumor}$) grow instantaneously into a cancer tumor as soon as they are generated and then to apply the standard Markov theory to $T(t)$ and to the state variables $\underset{\sim}{X}(t) = \{J_i(t), i = 0, 1, \ldots, k-1\}$. This approach has been described in detail in Tan [2], Little [1], and Zheng [7]; see also Luebeck and Moolgavkar [26] and Durrett et al. [27]. However, in some cases the assumption of instantaneous growth of $J_k$ cells into cancer tumors may not be realistic (Klebanov et al. [17], Yakovlev and Tsodikov [18], and Fakir et al. [19]); in these cases, $T(t)$ is not Markov, so that the Markov theory method is not applicable to $T(t)$. To develop analytical results and to resolve many difficult issues, Tan and his associates [4, 5, 24] have proposed an alternative approach through stochastic equations and have followed Yang and Chen [11] to assume that cancer tumors develop by clonal expansion from primary last stage cells. Through the probability generating function method, Tan and Chen [24] have shown that if the Markov theory is applicable to $T(t)$, then the stochastic equation method is equivalent to the classical Markov theory method but is more powerful. Also, through the stochastic equation method, we have shown in the Appendix that the classical approach provides a close approximation to the discrete time model under the assumption that the primary last stage cells develop into a detectable tumor in one time unit. This provides a reasonable explanation why the traditional approach (see [2, 22]) can still work well even though the Markov assumption for $T(t)$ may not hold. In this paper, we will thus basically use the stochastic equation method and assume that cancer tumors develop from primary last stage cells through clonal expansion.

3.1. Stochastic Equations for the State Variables. Assume now that an individual is a $J_i$ ($i = 0, 1, 2$) person at the embryo stage. Then in this individual cancer is developed by a $(k-i)$-stage multievent model given by $J_i \to \cdots \to J_k \to \text{Tumor}$, and the identifiable response variables are given by $\{\tilde{X}_i(t)', T(t)\}$. To derive stochastic equations for the staging variables in $\tilde{X}_i(t)$ in this individual, observe that for each $i = 0, 1, 2$, $\tilde{X}_i(t)'$ is in general a Markov process, although $T(t)$ may not be Markov; see Remark 1, Tan [3], Tan et al. [4, 5], Tan and Zhou [9], and Tan and Yan [16]. It follows that $\tilde{X}_i(t + \Delta t)'$ derives from $\tilde{X}_i(t)'$ through stochastic birth-death processes of $J_u$ ($u = i, i+1, \ldots, k-1$) cells and through stochastic transitions $J_u \to J_{u+1}$, $u = i, i+1, \ldots, k-1$, during $(t, t + \Delta t]$. Let $\{B_u(t; i), D_u(t; i), M_u(t; i)\}$ be the number of births of $J_u$ cells, the number of deaths of $J_u$ cells, and the number of transitions from $J_u \to J_{u+1}$ cells during $(t, t + \Delta t]$, respectively, in people who are $J_i$ people at the embryo stage. Let $M_0(t)$ denote the number of transitions from $N \to J_1$ during $(t, t + \Delta t]$. Because the transition of $J_u \to J_{u+1}$ would not affect the number of $J_u$ cells but only increase the number of $J_{u+1}$ cells (see Remark 4), by the conservation law we have the following stochastic equations for $J_u(t; i)$, $u = i, \ldots, k-1$, $i = 0, 1, 2$ (see Tan [3], Tan et al. [4, 5, 8], Tan and Zhou [9], and Tan and Yan [16]):

$$
\begin{aligned}
J_i(t + \Delta t; i) &= J_i(t; i) + B_i(t; i) - D_i(t; i), \quad i = 0, 1, 2, \\
J_u(t + \Delta t; i) &= J_u(t; i) + B_u(t; i) - D_u(t; i) + M_{u-1}(t; i), \quad i < u \le k-1.
\end{aligned}
\tag{2}
$$

Because $B_v(t; i)$, $D_v(t; i)$, $M_v(t; i)$, $i = 0, 1, 2$, $v = i, \ldots, k-1$, are random variables, the above equations are basically stochastic equations. To derive probability distributions of these variables, let $b_u(t)$ and $d_u(t)$ denote the birth rate and the death rate at time $t$ of the $J_u$ ($u = 0, 1, \ldots, k-1$) cells, respectively. Let $\beta_u(t)$ ($u = 0, 1, \ldots, k-1$) be the transition rate at time $t$ from $J_u \to J_{u+1}$. Then, as shown in Tan [3], for ($i = 0, 1, 2$, $u = i, \ldots, k-1$), we have to the order of $o(\Delta t)$:

$$
\{B_u(t; i), D_u(t; i), M_u(t; i)\} \mid J_u(t; i) \sim \text{Multinomial}\{J_u(t; i);\ b_u(t)\Delta t,\ d_u(t)\Delta t,\ \beta_u(t)\Delta t\}.
\tag{3}
$$

It follows that, to the order of $o(\Delta t)$,

$$
\begin{aligned}
\{B_u(t; i), D_u(t; i)\} \mid J_u(t; i) &\sim \text{Multinomial}\{J_u(t; i);\ b_u(t)\Delta t,\ d_u(t)\Delta t\}, \\
M_u(t; i) \mid J_u(t; i) &\sim \text{Binomial}\{J_u(t; i);\ \beta_u(t)\Delta t\} \\
&\sim \text{Poisson}\{J_u(t; i)\beta_u(t)\Delta t\} + o(\beta_u(t)\Delta t),
\end{aligned}
\tag{4}
$$

independently of $\{B_u(t; i), D_u(t; i)\}$, $u = 0, 1, \ldots, k-1$.
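The updating rule in (2) together with the multinomial model in (3) can be illustrated by a small simulation. The following is only a sketch under stated assumptions: the function name `step`, the rate values, and the cell-by-cell sampling scheme are illustrative choices, not part of the original model description, and each rate times `dt` is assumed small enough that the three probabilities sum to less than one.

```python
import random

def step(J, b, d, beta, dt, rng):
    """Advance the stage counts J[0..m-1] over one interval of length dt.

    Each cell in stage u independently yields a birth (prob. b[u]*dt),
    a death (prob. d[u]*dt), a transition to stage u+1 (prob. beta[u]*dt),
    or nothing, as in the multinomial model (3). Counts are then updated
    by the conservation equations (2).
    """
    m = len(J)
    B, D, M = [0] * m, [0] * m, [0] * m
    for u in range(m):
        pb, pd, pm = b[u] * dt, d[u] * dt, beta[u] * dt
        for _ in range(J[u]):
            x = rng.random()
            if x < pb:
                B[u] += 1
            elif x < pb + pd:
                D[u] += 1
            elif x < pb + pd + pm:
                M[u] += 1
    new_J = []
    for u in range(m):
        # A transition leaves the source stage unchanged and adds one
        # cell to the next stage (see Remark 4).
        inflow = M[u - 1] if u > 0 else 0
        new_J.append(J[u] + B[u] - D[u] + inflow)
    return new_J, B, D, M

# Hypothetical usage: three tracked stages, illustrative rates.
rng = random.Random(0)
J = [1000, 0, 0]
for _ in range(100):
    J, B, D, M = step(J, [0.02] * 3, [0.01] * 3, [0.001] * 3, 0.1, rng)
```

Here the last entry of `M` counts transitions out of the last tracked stage, which in the model correspond to newly generated primary $J_k$ cells rather than to any tracked stage.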

From these distribution results, by subtracting from the random transition variables their conditional means, respectively, we obtain the following stochastic equations for the state variables $\tilde{X}_i(t)$ ($i = 0, 1, \ldots, \operatorname{Min}(2, k-1)$):

$$
\begin{aligned}
dJ_i(t; i) &= J_i(t + \Delta t; i) - J_i(t; i) = B_i(t; i) - D_i(t; i) \\
&= J_i(t; i)\gamma_i(t)\Delta t + e_i(t; i)\Delta t, \quad i = 0, 1, 2, \\
dJ_u(t; i) &= J_u(t + \Delta t; i) - J_u(t; i) = M_{u-1}(t; i) + B_u(t; i) - D_u(t; i) \\
&= \{J_{u-1}(t; i)\beta_{u-1}(t) + J_u(t; i)\gamma_u(t)\}\Delta t + e_u(t; i)\Delta t, \\
&\qquad i = 0, 1, \ldots, \operatorname{Min}(2, k-1),\ i < u \le k-1,
\end{aligned}
\tag{5}
$$

where $\gamma_u(t) = b_u(t) - d_u(t)$ for $u = 0, 1, \ldots, k-1$, and where $e_i(t; i)\Delta t = [B_i(t; i) - J_i(t; i)b_i(t)\Delta t] - [D_i(t; i) - J_i(t; i)d_i(t)\Delta t]$ for $i = 0, 1, 2$; $e_u(t; i)\Delta t = [B_u(t; i) - J_u(t; i)b_u(t)\Delta t] - [D_u(t; i) - J_u(t; i)d_u(t)\Delta t] + [M_{u-1}(t; i) - J_{u-1}(t; i)\beta_{u-1}(t)\Delta t]$ for $i = 0, 1, \ldots, \operatorname{Min}(2, k-1)$, $i < u \le k-1$.

From the above equations, by dividing both sides by $\Delta t$ and letting $\Delta t \to 0$, we obtain

$$
\begin{aligned}
\frac{d}{dt}J_i(t; i) &= J_i(t; i)\gamma_i(t) + e_i(t; i), \quad i = 0, 1, \ldots, \operatorname{Min}(2, k-1), \\
\frac{d}{dt}J_u(t; i) &= J_u(t; i)\gamma_u(t) + J_{u-1}(t; i)\beta_{u-1}(t) + e_u(t; i), \\
&\qquad \text{for } i = 0, 1, \ldots, \operatorname{Min}(2, k-1),\ i < u \le k-1.
\end{aligned}
\tag{6}
$$

In the above equations, using the distribution results in (4), it can easily be shown that the random noises $e_u(t; i)$, $u = i, \ldots, k-1$, $i = 0, 1, 2$, have expected value zero and are uncorrelated with the staging variables and $T(t)$. The initial conditions at birth ($t_0$) for the above stochastic differential equations are $\{J_u(t_0; i) > 0, u = i, i+1;\ J_u(t_0; i) = 0, u > i+1\}$. Given the initial conditions ($J_u(t_0; i) > 0$, $u = i, i+1$) and ($J_u(t_0; i) = 0$, $u > i+1$) at birth ($t_0$), the solution of the equations in (6) is given, respectively, by

$$
\begin{aligned}
J_i(t; i) &= J_i(t_0; i)\, e^{\int_{t_0}^{t}\gamma_i(x)\,dx} + \eta_i(t; i), \quad i = 0, 1, \ldots, \operatorname{Min}(2, k-1), \\
J_u(t; i) &= J_u(t_0; i)\, e^{\int_{t_0}^{t}\gamma_u(x)\,dx} + \int_{t_0}^{t} J_{u-1}(x; i)\,\beta_{u-1}(x)\, e^{\int_{x}^{t}\gamma_u(y)\,dy}\,dx + \eta_u(t; i) \\
&= \cdots = J_u(t_0; i)\, e^{\int_{t_0}^{t}\gamma_u(x)\,dx} + \sum_{v=1}^{u-i} J_{u-v}(t_0; i)\,\phi_u^{(v)}(t; i) + \sum_{v=1}^{u+1-i} \eta_u^{(v)}(t; i), \\
&\qquad u = i+1, \ldots, k-1,
\end{aligned}
\tag{7}
$$

where $i = 0$ if $k = 2$; $i = 0, 1$ if $k = 3$; and $i = 0, 1, 2$ if $k > 3$,

and where

$$
\begin{aligned}
\phi_u^{(1)}(t; i) &= \int_{t_0}^{t} e^{\int_{x}^{t}\gamma_u(y)\,dy + \int_{t_0}^{x}\gamma_{u-1}(y)\,dy}\,\beta_{u-1}(x)\,dx, \quad u = i, \ldots, k-1, \\
\phi_u^{(v)}(t; i) &= \int_{t_0}^{t} e^{\int_{x}^{t}\gamma_u(y)\,dy}\,\beta_{u-1}(x)\,\phi_{u-1}^{(v-1)}(x; i)\,dx, \quad v = 2, \ldots, u-i, \\
\eta_u(t; i) &= \eta_u^{(1)}(t; i) = \int_{t_0}^{t} e^{\int_{x}^{t}\gamma_u(y)\,dy}\, e_u(x; i)\,dx, \quad u = i, \ldots, k-1, \\
\eta_u^{(v)}(t; i) &= \int_{t_0}^{t} e^{\int_{x}^{t}\gamma_u(y)\,dy}\,\beta_{u-1}(x)\,\eta_{u-1}^{(v-1)}(x; i)\,dx, \quad v = 2, \ldots, u-i.
\end{aligned}
\tag{8}
$$

If the model is time homogeneous so that $\{\beta_u(t) = \beta_u,\ b_u(t) = b_u,\ d_u(t) = d_u,\ \gamma_u(t) = b_u - d_u = \gamma_u,\ u = 0, 1, \ldots, k-1\}$, and if $\gamma_i \neq \gamma_u$ whenever $i \neq u$, the above solutions under the initial conditions ($J_{i+u}(t_0; i) > 0$, $u = 0, 1$; $J_{i+u}(t_0; i) = 0$, $u > 1$) then reduce, respectively, to

$$
\begin{aligned}
J_i(t; i) &= J_i(t_0; i)\, e^{\gamma_i(t - t_0)} + \eta_i^{(1)}(t; i), \quad i = 0, 1, \ldots, \operatorname{Min}(2, k-1), \\
J_u(t; i) &= J_u(t_0; i)\, e^{\gamma_u(t - t_0)} + \beta_{u-1}\int_{t_0}^{t} J_{u-1}(x; i)\, e^{\gamma_u(t - x)}\,dx + \eta_u^{(1)}(t; i) \\
&= \cdots = J_u(t_0; i)\, e^{\gamma_u(t - t_0)} + \sum_{r=i}^{u-1} J_r(t_0; i)\left(\prod_{v=r}^{u-1}\beta_v\right)\sum_{l=r}^{u} A_{ru}(l)\, e^{\gamma_l(t - t_0)} + \sum_{r=i}^{u}\eta_u^{(u+1-r)}(t; i) \\
&= \sum_{r=i}^{i+1} J_r(t_0; i)\left(\prod_{v=r}^{u-1}\beta_v\right)\sum_{l=r}^{u} A_{ru}(l)\, e^{\gamma_l(t - t_0)} + \sum_{r=i}^{u}\eta_u^{(u+1-r)}(t; i), \quad i < u \le k-1,
\end{aligned}
\tag{9}
$$

where, for $i \le v \le u$,

$$
A_{iu}(v) =
\begin{cases}
1, & \text{if } i = u, \\
\displaystyle\prod_{l=i,\ l \neq v}^{u} (\gamma_l - \gamma_v)^{-1}, & \text{if } i < u.
\end{cases}
\tag{10}
$$

Obviously, $E[\eta_u^{(r)}(t; i)] = 0$ for all ($i = 0, 1, 2$, $u = i, \ldots, k-1$, $r = 1, \ldots, u-i+1$). It follows that for ($i = 0, 1, \ldots, \operatorname{Min}(k-1, 2)$), the expected values $E[J_{k-1}(t; i)]$ of $J_{k-1}(t; i)$ for homogeneous models with $\gamma_i \neq \gamma_u$ if $i \neq u$ are given by

$$
\begin{aligned}
E[J_{k-1}(t; i)] &= E[J_{i+1}(t_0; i)]\left(\prod_{u=i+1}^{k-2}\beta_u\right)\sum_{v=i+1}^{k-1} A_{(i+1)(k-1)}(v)\, e^{\gamma_v(t - t_0)} \\
&\quad + E[J_i(t_0; i)]\left(\prod_{u=i}^{k-2}\beta_u\right)\sum_{v=i}^{k-1} A_{i(k-1)}(v)\, e^{\gamma_v(t - t_0)}, \\
&\qquad i = 0, 1, \ldots, \operatorname{Min}(k-1, 2),\ k \ge 2,
\end{aligned}
\tag{11}
$$

where, as a convention, $\sum_{j=i+1}^{i} c_j = 0$ and $\prod_{j=i+1}^{i} d_j = 1$ for all $(c_j, d_j)$.
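Because the noise terms in (6) have expected value zero, the expected stage counts satisfy a linear system of ordinary differential equations, which gives a numerical cross-check for closed-form means. The sketch below integrates that system with a classical Runge-Kutta scheme for a time-homogeneous chain; the function name `mean_stage_counts`, the step count, and all parameter values are assumptions made for illustration only.

```python
import math

def mean_stage_counts(J0, gamma, beta, t, n_steps=2000):
    """Integrate dm_0/dt = gamma_0*m_0 and, for u > 0,
    dm_u/dt = gamma_u*m_u + beta_{u-1}*m_{u-1}
    from time 0 to t with the classical fourth-order Runge-Kutta method.
    J0 holds the initial means; beta[u] is the rate from stage u to u+1.
    """
    def rhs(m):
        out = []
        for u in range(len(m)):
            inflow = beta[u - 1] * m[u - 1] if u > 0 else 0.0
            out.append(gamma[u] * m[u] + inflow)
        return out

    m = [float(x) for x in J0]
    h = t / n_steps
    for _ in range(n_steps):
        k1 = rhs(m)
        k2 = rhs([x + 0.5 * h * k for x, k in zip(m, k1)])
        k3 = rhs([x + 0.5 * h * k for x, k in zip(m, k2)])
        k4 = rhs([x + h * k for x, k in zip(m, k3)])
        m = [x + h * (a + 2 * b + 2 * c + d) / 6.0
             for x, a, b, c, d in zip(m, k1, k2, k3, k4)]
    return m

# Hypothetical two-stage example with distinct growth rates.
means = mean_stage_counts([100.0, 5.0], [0.01, 0.05], [0.002], 10.0)
```

For two stages the result can be compared with the directly integrated solution $m_1(t) = m_1(0)e^{\gamma_1 t} + \beta_0 m_0(0)\,(e^{\gamma_0 t} - e^{\gamma_1 t})/(\gamma_0 - \gamma_1)$, which is a convenient sanity check before moving to longer chains.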

Remark 4. Because genetic changes and epigenetic changes occur during cell division, to the order of $o(\Delta t)$, the probability is $\beta_u(t)\Delta t$ that one $J_u$ cell at time $t$ would give rise to one $J_u$ cell and one $J_{u+1}$ cell at time $t + \Delta t$ by genetic changes or epigenetic changes. It follows that the transition of $J_u \to J_{u+1}$ would not affect the population size of $J_u$ cells but only increase the size of the $J_{u+1}$ population.


3.2. Transition Probabilities and Probability Distributions of Staging Variables. Let $f(x; n, p)$ denote the probability density function of a binomial random variable $X \sim \text{Binomial}(n, p)$, $h(x; \lambda)$ the probability density function of a Poisson random variable $X \sim \text{Poisson}(\lambda)$, and $g(x, y; n, p_1, p_2)$ the probability density function of a bivariate multinomial random vector $(X, Y) \sim \text{Multinomial}(n; p_1, p_2)$. Using the stochastic equations of the staging variables given by (2) and using the probability distributions of the transition variables $\{B_r(t; i), D_r(t; i), M_r(t; i)\}$ in (4), as in Tan et al. [4, 5], we obtain the following transition probabilities of $\{J_r(t + \Delta t; i), r = i, \ldots, k-1\}$ given $\{J_r(t; i), r = i, \ldots, k-1\}$ for ($i = 0, 1, \ldots, \operatorname{Min}(k-1, 2)$):

$$
\begin{aligned}
&P\{J_r(t + \Delta t; i) = v_r,\ r = i, i+1, \ldots, k-1 \mid J_r(t; i) = u_r,\ r = i, i+1, \ldots, k-1\} \\
&\quad = P\{J_i(t + \Delta t; i) = v_i \mid J_i(t; i) = u_i\} \\
&\qquad \times \prod_{j=i+1}^{k-1} P\{J_j(t + \Delta t; i) = v_j \mid J_j(t; i) = u_j,\ J_{j-1}(t; i) = u_{j-1}\},
\end{aligned}
\tag{12}
$$

where

$$
\begin{aligned}
&P\{J_i(t + \Delta t; i) = v_i \mid J_i(t; i) = u_i\} \\
&\quad = \sum_{r=0}^{u_i} f(r;\ u_i,\ b_i(t)\Delta t) \times f\!\left(u_i - v_i + r;\ u_i - r,\ \frac{d_i(t)\Delta t}{1 - b_i(t)\Delta t}\right), \quad i = 0, 1, \ldots, \operatorname{Min}(k-1, 2), \\
&P\{J_j(t + \Delta t; i) = v_j \mid J_j(t; i) = u_j,\ J_{j-1}(t; i) = u_{j-1}\} \\
&\quad = \sum_{r_1=0}^{u_j}\sum_{r_2=0}^{u_j - r_1} g(r_1, r_2;\ u_j,\ b_j(t)\Delta t,\ d_j(t)\Delta t) \times h(v_j - u_j - r_1 + r_2;\ u_{j-1}\beta_{j-1}\Delta t), \quad j > i.
\end{aligned}
\tag{13}
$$
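The first transition probability in (13) is a finite sum over the number of births $r$, with the number of deaths fixed at $u_i - v_i + r$. The following sketch evaluates that sum directly; the helper names `binom_pmf` and `first_stage_transition` and the numerical values are illustrative assumptions, and the products $b_i(t)\Delta t$ and $d_i(t)\Delta t$ are passed in as single numbers.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial(n, p) probability of x, i.e. f(x; n, p); zero outside 0..n."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p ** x * (1.0 - p) ** (n - x)

def first_stage_transition(v, u, b_dt, d_dt):
    """P{J_i(t + dt) = v | J_i(t) = u} as in the first line of (13).

    r is the number of births among u cells; the deaths, u - v + r,
    are binomial among the remaining u - r cells with the conditional
    probability d_dt / (1 - b_dt).
    """
    q = d_dt / (1.0 - b_dt)
    return sum(binom_pmf(r, u, b_dt) * binom_pmf(u - v + r, u - r, q)
               for r in range(u + 1))

# Hypothetical values: 10 cells, b*dt = 0.02, d*dt = 0.01.
probs = [first_stage_transition(v, 10, 0.02, 0.01) for v in range(21)]
```

Since $v$ can range from $0$ to $2u_i$, summing `probs` over that range should return a total close to one, and the implied mean should be close to $u_i(1 + b_i\Delta t - d_i\Delta t)$.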

Define the unobservable transition variables $\tilde{U}_i(t) = \{B_i(t; i),\ (B_j(t; i), D_j(t; i)),\ j = i+1, \ldots, k-1\}'$ ($i = 0, 1, \ldots, \operatorname{Min}(k-1, 2)$). Then we have for the joint probability density function of $\{\tilde{X}_i(t + \Delta t), \tilde{U}_i(t)\}$ given $\tilde{X}_i(t)$:

$$
P\{\tilde{X}_i(t + \Delta t), \tilde{U}_i(t) \mid \tilde{X}_i(t)\} = P\{\tilde{X}_i(t + \Delta t) \mid \tilde{U}_i(t), \tilde{X}_i(t)\} \times P\{\tilde{U}_i(t) \mid \tilde{X}_i(t)\},
\tag{14}
$$

where

$$
\begin{aligned}
P\{\tilde{X}_i(t + \Delta t) \mid \tilde{U}_i(t), \tilde{X}_i(t)\}
&= f\!\left(J_i(t; i) - J_i(t + \Delta t; i) + B_i(t; i);\ J_i(t; i) - B_i(t; i),\ \frac{d_i(t)\Delta t}{1 - b_i(t)\Delta t}\right) \\
&\quad \times \prod_{j=i+1}^{k-1} h\{J_j(t + \Delta t; i) - J_j(t; i) - B_j(t; i) + D_j(t; i);\ J_{j-1}(t; i)\beta_{j-1}(t)\Delta t\},
\end{aligned}
\tag{15}
$$

$$
\begin{aligned}
P\{\tilde{U}_i(t) \mid \tilde{X}_i(t)\}
&= f(B_i(t; i);\ J_i(t; i),\ b_i(t)\Delta t) \\
&\quad \times \prod_{j=i+1}^{k-1} g\{B_j(t; i), D_j(t; i);\ J_j(t; i),\ b_j(t)\Delta t,\ d_j(t)\Delta t\}.
\end{aligned}
\tag{16}
$$

Let $\tilde{e}_i(j)$ be a $(k-i) \times 1$ column vector with 1 in the $j$th position ($1 \le j \le k-i$) and with 0 in other positions. Let $\tilde{u} = (u_i, \ldots, u_{k-1})'$ and $\tilde{v} = (v_i, \ldots, v_{k-1})'$ be $(k-i) \times 1$ column vectors of nonnegative integers (i.e., $u_j$ and $v_j$ are nonnegative integers). Then, by using the probability distribution results in (14)–(16), it can readily be shown that

$$
P\{\tilde{X}_i(t + \Delta t) = \tilde{v} \mid \tilde{X}_i(t) = \tilde{u}\} =
\begin{cases}
[u_j b_j(t) + (1 - \delta_{ji})u_{j-1}\beta_{j-1}(t)]\Delta t + o(\Delta t), & \text{if } \tilde{v} = \tilde{u} + \tilde{e}_i(j),\ j = i, \ldots, k-1, \\
u_j d_j(t)\Delta t + o(\Delta t), & \text{if } \tilde{v} = \tilde{u} - \tilde{e}_i(j),\ j = i, \ldots, k-1, \\
o(\Delta t), & \text{if } \left|\tilde{1}'_{n-k}(\tilde{u} - \tilde{v})\right| \ge 2.
\end{cases}
\tag{17}
$$

The above results imply that $\tilde{X}_i(t)$ is a $(k-i)$-dimensional birth-death process with birth rates $\{i b_u(t), u = i, \ldots, k-1\}$, death rates $\{i d_u(t), u = i, \ldots, k-1\}$, and cross-transition rates $\{\alpha_{u,u+1}(j, t) = j\beta_u(t), u = i, \ldots, k-1;\ \beta_{u,v}(j, t) = 0 \text{ if } v \neq u+1\}$ (see Definition 4.1 in Tan ([22], Chapter 4)). Using these results, it can be shown that the Kolmogorov forward equation for the probabilities $P\{J_j(t; i) = u_j,\ j = i, \ldots, k-1 \mid J_i(0) = m_i,\ J_j(0) = 0,\ j > i\} = P(u_j,\ j = i, \ldots, k-1;\ t)$ ($i = 0, 1, \ldots, \operatorname{Min}(k-1, 2)$) in the above model is given by

$$
\begin{aligned}
\frac{d}{dt}P(u_j,\ j = i, \ldots, k-1;\ t)
&= P(u_i - 1,\ u_j,\ j = i+1, \ldots, k-1;\ t)\,(u_i - 1)\,b_i(t) \\
&\quad + \sum_{j=i+1}^{k-1} P(u_i, u_{i+1}, \ldots, u_{j-1}, u_j - 1, u_{j+1}, \ldots, u_{k-1};\ t)\,(u_j - 1)\,b_j(t) \\
&\quad + \sum_{j=i}^{k-2} P(u_i, u_{i+1}, \ldots, u_j, u_{j+1} - 1, u_{j+2}, \ldots, u_{k-1};\ t)\,u_j\,\beta_j(t) \\
&\quad + \sum_{j=i}^{k-1} P(u_i, u_{i+1}, \ldots, u_{j-1}, u_j + 1, u_{j+1}, \ldots, u_{k-1};\ t)\,(u_j + 1)\,d_j(t) \\
&\quad - P(u_i, u_{i+1}, \ldots, u_{k-1};\ t) \times \left\{\sum_{j=i}^{k-1} u_j[b_j(t) + d_j(t)] + \sum_{j=i}^{k-2} u_j\beta_j(t)\right\},
\end{aligned}
\tag{18}
$$

for $u_j = 0, 1, \ldots, \infty$, $j = i, \ldots, k-1$.

By using the above set of differential equations, one can readily compute the probabilities $P\{J_j(t) = u_j,\ j = i, \ldots, k-1 \mid J_i(0) = m_i\} = P(u_j,\ j = i, \ldots, k-1;\ t)$ numerically.
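As an illustration of such a numerical computation, the sketch below treats only the one-dimensional special case of (18) (a single stage with constant rates, so no cross-transition terms) and integrates it by forward Euler steps on a truncated state space. The function name `forward_birth_death`, the truncation level `n_max`, the step size, and the rates are all assumptions chosen for the example; births are switched off at `n_max` so that total probability is conserved.

```python
import math

def forward_birth_death(m, b, d, t, n_max=80, dt=1e-3):
    """Forward Euler integration of
    dP(n)/dt = P(n-1)(n-1)b + P(n+1)(n+1)d - P(n) n (b + d)
    on states 0..n_max, starting from exactly m cells at time 0.
    """
    p = [0.0] * (n_max + 1)
    p[m] = 1.0
    for _ in range(int(round(t / dt))):
        new = p[:]
        for n in range(n_max + 1):
            # No births out of the top state: keeps the total mass at one.
            flow_up = p[n] * (n * b if n < n_max else 0.0) * dt
            flow_down = p[n] * n * d * dt
            new[n] -= flow_up + flow_down
            if n < n_max:
                new[n + 1] += flow_up
            if n > 0:
                new[n - 1] += flow_down
        p = new
    return p

# Hypothetical usage: 5 initial cells, b = 0.3, d = 0.2, up to t = 2.
p = forward_birth_death(m=5, b=0.3, d=0.2, t=2.0)
mean = sum(n * q for n, q in enumerate(p))
```

The computed mean can be compared with $m\,e^{(b-d)t}$, the expected value implied by the first equation of (9); the multistage case would need a multi-index state array and the cross-transition terms of (18).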

3.3. The Probability Distributions of the Number of Detectable Tumors and Times to Tumors. As shown by Yang and Chen [11], malignant cancer tumors arise from primary $J_k$ cells by clonal expansion, where primary $J_k$ cells are $J_k$ cells generated directly by $J_{k-1}$ cells ($J_k$ cells derived by stochastic birth of other $J_k$ cells are not primary $J_k$ cells). That is, cancer tumors develop from primary $J_k$ cells through stochastic birth-death processes.

To derive the probability distribution for $T(t)$ in $J_i$ people in the population, let $P_T(s, t)$ denote the probability that a primary cancer cell at time $s$ develops into a detectable cancer tumor by time $t$. (An explicit formula for $P_T(s, t)$ has been given in Tan [22], Chapter 8, and in Tan and Chen [24].) Then, as shown in Tan ([3, 22], Chapter 8), the conditional probability distribution of $T(t)$ given $\{J_{k-1}(s; i), s \le t\}$ in $J_i$ people is Poisson with mean $\omega(t; i)$, where $\omega(t; i) = \int_{t_0}^{t} J_{k-1}(s; i)\,\beta_{k-1}(s)\,P_T(s, t)\,ds$. That is,

$$
T(t) \mid \{J_{k-1}(s; i), s \le t\} \sim \text{Poisson}(\omega(t; i)).
\tag{19}
$$

Let $Q_i(j)$ be the probability that cancer tumors develop during $(t_{j-1}, t_j]$ in $J_i$ people in the population. For time homogeneous models with small $\beta_{k-1}$, $Q_i(j)$ is then given by

$$
Q_i(j) = E\{e^{-\omega(t_{j-1}; i)} - e^{-\omega(t_j; i)}\} = e^{-\beta_{k-1}H_i(t_{j-1})} - e^{-\beta_{k-1}H_i(t_j)} + o(\beta_{k-1}),
\tag{20}
$$

where $H_i(t) = \int_{t_0}^{t} E[J_{k-1}(x; i)]\,P_T(x, t)\,dx$.
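The leading term of (20) is straightforward to evaluate once $H_i(t)$ is available. The sketch below makes two simplifying assumptions that are not part of the general model: $P_T(x, t) \approx 1$, and an expected last-stage count of the form $E[J_{k-1}(x; i)] = j_0 e^{\gamma(x - t_0)}$, so that $H_i(t)$ has a closed form. The names `H` and `interval_probs` and all numbers are hypothetical.

```python
import math

def H(t, t0, j0, gamma):
    """Integral over [t0, t] of j0 * exp(gamma * (x - t0)), assuming P_T = 1."""
    return j0 * math.expm1(gamma * (t - t0)) / gamma

def interval_probs(times, t0, j0, gamma, beta_last):
    """Leading term of (20): exp(-beta*H(t_{j-1})) - exp(-beta*H(t_j))
    for each consecutive pair in the increasing list `times`."""
    surv = [math.exp(-beta_last * H(t, t0, j0, gamma)) for t in times]
    return [surv[j - 1] - surv[j] for j in range(1, len(times))]

# Hypothetical age-group boundaries (years) and rates.
q = interval_probs([0.0, 1.0, 2.0, 5.0, 10.0],
                   t0=0.0, j0=1000.0, gamma=0.05, beta_last=1e-5)
```

Because the terms telescope, the interval probabilities sum to one minus the probability of remaining tumor free up to the last boundary, which is a quick consistency check on any implementation.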

To derive $Q_i(j)$, denote by

$$
\begin{aligned}
\theta_{i(k-1)} &= E[J_i(t_0; i-1)]\,\beta_i \prod_{u=i+1}^{k-1}\left(\frac{\beta_u}{\gamma_u}\right), \quad i = 1, \ldots, \operatorname{Min}(3, k-1), \\
\lambda_{u(k-1)} &= E[J_u(t_0; u)]\,\beta_u \prod_{v=u+1}^{k-1}\left(\frac{\beta_v}{\gamma_v}\right), \quad u = 0, 1, \ldots, \operatorname{Min}(2, k-1),
\end{aligned}
\tag{21}
$$

and define the functions

$$
\psi_{i(k-1)}(t) = \left(\prod_{u=i+1}^{k-1}\gamma_u\right)\sum_{v=i}^{k-1} A_{i(k-1)}(v) \times \int_{t_0}^{t} e^{\gamma_v(x - t_0)}\,P_T(x, t)\,dx, \quad i = 0, 1, \ldots, \operatorname{Min}(2, k-1).
\tag{22}
$$

Applying the results for $E[J_{k-1}(t; i)]$ given in (11) for time homogeneous models with $\gamma_i \neq \gamma_j$ if $i \neq j$, we obtain the $Q_i(j)$'s as follows.

(1) If $k = 2$, then $k - 1 = 1$. Hence $Q_2(0) = 1$, $Q_i(0) = Q_2(j) = 0$ for ($i = 0, 1$, $j > 0$), and for $j > 0$,

$$
Q_0(j) = e^{-\theta_{11}\psi_{11}(t_{j-1}) - \lambda_{01}\psi_{01}(t_{j-1})} - e^{-\theta_{11}\psi_{11}(t_j) - \lambda_{01}\psi_{01}(t_j)} + o(\beta_1),
\tag{23}
$$

$$
Q_1(j) = (1 - \alpha_1)\{e^{-\lambda_{11}\psi_{11}(t_{j-1})} - e^{-\lambda_{11}\psi_{11}(t_j)}\} + o(\beta_1).
\tag{24}
$$


(2) If $k \ge 3$, then we have $Q_i(0) = 0$ for $i = 0, 1$ and $Q_2(0) = \delta_{2k}\alpha$, and for $j > 0$,

$$
Q_0(j) = e^{-\theta_{1(k-1)}\psi_{1(k-1)}(t_{j-1}) - \lambda_{0(k-1)}\psi_{0(k-1)}(t_{j-1})} - e^{-\theta_{1(k-1)}\psi_{1(k-1)}(t_j) - \lambda_{0(k-1)}\psi_{0(k-1)}(t_j)} + o(\beta_{k-1}),
\tag{25}
$$

$$
Q_1(j) = e^{-\theta_{2(k-1)}\psi_{2(k-1)}(t_{j-1}) - \lambda_{1(k-1)}\psi_{1(k-1)}(t_{j-1})} - e^{-\theta_{2(k-1)}\psi_{2(k-1)}(t_j) - \lambda_{1(k-1)}\psi_{1(k-1)}(t_j)} + o(\beta_{k-1}),
\tag{26}
$$

$$
\begin{aligned}
Q_2(j) &= \delta_{2k}(1 - \alpha)\{e^{-\lambda_{22}\psi_{22}(t_{j-1})} - e^{-\lambda_{22}\psi_{22}(t_j)}\} \\
&\quad + (1 - \delta_{2k})\{e^{-\theta_{3(k-1)}\psi_{3(k-1)}(t_{j-1}) - \lambda_{2(k-1)}\psi_{2(k-1)}(t_{j-1})} - e^{-\theta_{3(k-1)}\psi_{3(k-1)}(t_j) - \lambda_{2(k-1)}\psi_{2(k-1)}(t_j)}\} \\
&\quad + o(\beta_{k-1}),
\end{aligned}
\tag{27}
$$

where $\delta_{2k} = 1$ if $k = 2$ and $\delta_{2k} = 0$ if $k \neq 2$.

Notice that if $\gamma_0 = 0$, then $\psi_{0(k-1)}(t)$ reduces to

$$
\begin{aligned}
\psi_{0(k-1)}(t) &= \left(\prod_{u=1}^{k-1}\gamma_u\right)\sum_{v=0}^{k-1} A_{0(k-1)}(v) \times \int_{t_0}^{t} e^{\gamma_v(x - t_0)}\,P_T(x, t)\,dx \\
&= \left(\prod_{u=1}^{k-1}\gamma_u\right)\sum_{v=1}^{k-1} A_{1(k-1)}(v) \times \int_{t_0}^{t} [e^{\gamma_v(x - t_0)} - 1]\,P_T(x, t)\,dx.
\end{aligned}
\tag{28}
$$

Notice also that if $P_T(s, t) \approx 1$ for $t > s$ and if $\gamma_0 = 0$, then the above $\psi_{ij}(t)$'s reduce, respectively, to

$$
\begin{aligned}
\psi_{ii}(t) &= \frac{1}{\gamma_i}\{e^{\gamma_i(t - t_0)} - 1\}, \quad i = 1, \ldots, k-1, \\
\psi_{0(k-1)}(t) &= \left(\prod_{u=1}^{k-1}\gamma_u\right)\sum_{v=1}^{k-1} A_{1(k-1)}(v) \times \frac{1}{\gamma_v^2}\{e^{\gamma_v(t - t_0)} - 1 - (t - t_0)\gamma_v\}, \\
\psi_{i(k-1)}(t) &= \left(\prod_{u=i+1}^{k-1}\gamma_u\right)\sum_{v=i}^{k-1} A_{i(k-1)}(v) \times \frac{1}{\gamma_v}\{e^{\gamma_v(t - t_0)} - 1\}, \quad i = 1, \ldots, k-1.
\end{aligned}
\tag{29}
$$

4. Probability Distribution of Observed Cancer Incidence Incorporating Hereditary Cancer Cases

To estimate unknown parameters and to validate the model, one needs real data generated from the model. For studies of carcinogenesis, such data are usually given by cancer incidence. For example, in the SEER data of NCI/NIH of USA, the data are given by $\{(y_0, n_0), (y_j, n_j), j = 1, \ldots, t_N\}$, where $y_0$ is the number of cancer cases at birth and $n_0$ the total number of births, and where for $j \ge 1$, $y_j$ is the number of cancer cases developed during the $j$th age group of a one-year period (or 5-year periods) and $n_j$ is the number of noncancer people who are at risk for cancer and from whom $y_j$ have developed cancer during the $j$th age group. Given in Table 1 are the SEER data of uveal melanoma (adult eye cancer) during the period 1973–2007. In Table 1, notice that there are some cancer cases at birth, implying some inherited cancer cases. In this section, we will develop a statistical model for these types of data sets from the stochastic multistage model with hereditary cancers as given in Section 2. As in previous sections, let $n_{ij}$ be the number of individuals who have genotype $J_i$ ($i = 0, 1, 2$) at the embryo stage among the $n_j$ people at risk for the cancer in question. Then, as shown above, $(n_{1j}, n_{2j}) \mid n_j \sim \text{Multinomial}\{n_j; p_1, p_2\}$. It follows that $n_{ij} \mid n_j \sim \text{Binomial}\{n_j, p_i\}$, $i = 0, 1, 2$. In what follows, we let $Y_j$ denote the random variable for $y_j$ unless otherwise stated.

4.1. The Probability Distribution of $Y_0$. As shown in Figure 4, $J_i$ ($i = 0, 1, 2$) people would only generate $J_i$ stage cells and $J_{i+1}$ stage cells at birth. Thus, for cancers to develop at or before birth, the number of stages for the stochastic model of carcinogenesis must be 3 or less. It follows that if $y_0 > 0$, the appropriate model of carcinogenesis must be either a 2-stage model or a 3-stage model. Since $n_{20} \mid n_0 \sim \text{Binomial}(n_0, p_2)$ and $n_{10} \mid (n_0, n_{20}) \sim \text{Binomial}(n_0, p_1) + o(p_2)$, the probability distribution of $Y_0$ is therefore

$$
Y_0 \sim \text{Poisson}(\chi_k), \quad k = 2, 3,
\tag{30}
$$

where

$$
\chi_k =
\begin{cases}
n_0(p_2 + p_1\alpha), & \text{if } k = 2, \\
n_0 p_2 \alpha, & \text{if } k = 3.
\end{cases}
\tag{31}
$$

The expected number of $Y_0$ given $n_0$ is $E(Y_0 \mid n_0) = n_0(p_2 + p_1\alpha) = \chi_2$ if $k = 2$ and $E(Y_0 \mid n_0) = n_0 p_2 \alpha = \chi_3$ if $k = 3$. Hence, for the 2-stage model (i.e., $k = 2$) or the 3-stage model (i.e., $k = 3$), the maximum likelihood estimate of $\chi_k$ is $\hat{\chi}_k = y_0$ and the deviance $D_0(k)$ from the conditional probability distribution of $Y_0$ given $n_0$ is

$$
D_0(k) = -2\{\log h(y_0; \chi_k) - \log h(y_0; \hat{\chi}_k)\} = 2\left\{\chi_k - y_0 - y_0\log\frac{\chi_k}{y_0}\right\}, \quad k = 2, 3.
\tag{32}
$$
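The deviance in (32) is the usual Poisson likelihood-ratio statistic and is easy to verify numerically. The sketch below computes it in two ways, from the closed form and from the log probabilities directly; the function names `log_poisson_pmf` and `deviance_at_birth` are hypothetical, the closed form is taken as twice the log-likelihood ratio, and $y_0 > 0$ is assumed so that the logarithm is defined.

```python
import math

def log_poisson_pmf(y, lam):
    """log h(y; lam) for a Poisson(lam) variable, with lam > 0."""
    return -lam + y * math.log(lam) - math.lgamma(y + 1)

def deviance_at_birth(y0, chi):
    """Closed-form Poisson deviance 2*(chi - y0 - y0*log(chi/y0)), y0 > 0."""
    return 2.0 * (chi - y0 - y0 * math.log(chi / y0))

# Illustrative values: 34 observed cases at birth against a fitted mean of 36.
y0, chi = 34, 36.0
closed_form = deviance_at_birth(y0, chi)
from_loglik = -2.0 * (log_poisson_pmf(y0, chi) - log_poisson_pmf(y0, float(y0)))
```

The two quantities agree up to rounding, and the deviance is zero exactly when the fitted mean equals the observed count.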


Table 1: The SEER incidence data (1973–2007) of uveal melanoma from NCI/NIH (over all races and genders).

Age     Number of people  Observed   Model-F    Model-1    Two-stage
groups  at risk           incidence  predicted  predicted  predicted
0       12495777          34         36         36         38
1       12221582          20         11         11         16
2       12120990          14         7          8          17
3       12112995          9          5          6          18
4       12146174          4          4          5          20
5       12161336          3          3          4          22
6       12111854          2          2          4          25
7       12160452          2          2          4          28
8       11942586          2          2          4          30
9       12381299          1          2          5          34
10      12512703          2          3          6          38
11      12410338          5          3          6          41
12      12449244          2          3          6          44
13      12527781          5          4          7          48
14      12602883          3          5          7          51
15      12719598          5          5          8          55
16      12766107          7          6          9          59
17      12831400          9          7          9          62
18      12382047          8          7          9          63
19      12581638          10         8          10         68
20      12636509          7          9          11         71
21      12682601          6          10         10         75
22      12840510          12         11         11         79
23      13075528          17         13         13         84
24      13358635          16         15         14         89
25      13473849          12         16         16         94
26      13426340          17         18         18         97
27      13525264          28         20         20         101
28      13149674          17         22         21         102
29      13812811          23         25         25         110
30      13886874          24         28         27         114
31      13488332          37         30         29         115
32      13460286          32         32         32         118
33      13256067          38         35         35         119
34      13428827          37         39         38         124
35      13220037          40         41         41         126
36      12870265          30         44         44         126
37      12689592          43         47         47         127
38      12157014          42         49         49         125
39      12494081          46         55         54         131
40      12272125          49         58         58         132
41      11826573          56         61         60         130
42      11663153          54         65         64         131
43      11407082          53         68         68         131
44      11296848          70         73         72         133
45      11016369          57         76         76         132
46      10651593          71         79         79         130
47      10475708          87         84         83         131
48      9994684           82         86         85         127
49      10138908          78         93         93         131
50      9836359           87         97         96         130
51      9475641           95         100        99         127
52      9250985           113        104        104        126
53      9027382           106        108        108        125
54      8883737           117        113        113        125
55      8547883           129        116        116        123
56      8279648           107        119        119        121
57      8062368           119        123        123        119
58      7654610           132        124        124        115
59      7563706           118        130        129        115
60      7232719           131        131        130        112
61      6927332           116        132        132        109
62      6708273           133        134        134        107
63      6543931           143        138        137        106
64      6404652           130        141        141        105
65      6168486           145        142        142        102
66      5913479           138        142        142        99
67      5746766           151        144        144        98
68      5480517           147        142        142        94
69      5363912           149        144        144        94
70      5110728           136        142        142        90
71      4925076           144        141        141        88
72      4696825           140        138        138        85
73      4512136           146        135        135        82
74      4345300           126        133        133        80
75      4148801           122        128        129        78
76      3900900           124        122        122        74
77      3681587           114        116        116        70
78      3481918           93         110        110        67
79      3243631           102        102        102        63
80      2961234           79         92         93         58
81      2724984           93         83         84         54
82      2495219           82         75         75         50
83      2271595           92         66         67         46
84      2041351           64         57         58         42
85      10466605          304        282        285        216

Note 1: for age 0 and 1–10 years old, the cancer incidences are derived by subtracting the incidence of retinoblastoma from the original SEER data (see Tan and Zhou [9]).
Note 2: the observed uveal melanoma incidence rates per $10^6$ individuals are derived from the SEER eye cancer incidence by subtracting retinoblastoma incidence as given in Tan and Zhou [9].


4.2. The Probability Distribution of $Y_j$ ($j \ge 1$). To derive the probability distribution of $Y_j$ ($j \ge 1$) in the $j$th age group, let $Y_{ij}$ ($i = 0, 1, 2$) be the number of cancer cases generated by people who have genotype $J_i$ at the embryo stage among these $Y_j$ cancer cases. Then $Y_j = \sum_{i=0}^{2} Y_{ij}$, and $Y_{0j}$ is the number of cancer cases generated by the $n_{0j} = n_j - n_{1j} - n_{2j}$ normal people in the population. The conditional probability distribution of $Y_{ij}$ given $n_{ij}$ is

$$Y_{ij} \mid n_{ij} \sim \text{Poisson}\{n_{ij} Q_i(j)\}, \quad i = 0, 1, 2. \tag{33}$$

Notice that if $k = 2$ (a 2-stage model), then all $J_2$ individuals would develop tumor at or before birth. Thus, if $k = 2$, then $Y_{2j} = 0$ for all $j > 0$, so that if $j > 0$, cancer cases develop only from normal people ($N = J_0$ people) and $J_1$ people. On the other hand, if $k > 2$, then with positive probability $Y_{ij} > 0$ for all ($i = 0, 1, 2$; $j = 0, 1, \ldots, n_T$), where $n_T$ is the last time point in the data. Let $\delta_{2k} = 1$ if $k = 2$ and $\delta_{2k} = 0$ if $k \neq 2$. Then $Y_j \mid (n_{ij}, i = 0, 1, 2) \sim \text{Poisson}\{Q_T(j)\}$, where $Q_T(j) = \sum_{i=0}^{1} n_{ij} Q_i(j) + (1 - \delta_{2k}) n_{2j} Q_2(j)$. Since $(n_{0j}, n_{1j}) \mid n_j \sim \text{Multinomial}(n_j; p_0, p_1)$, we have for the conditional probability density function $P(y_j \mid n_j)$ of $Y_j$ given $n_j$:

$$P(y_j \mid n_j) = \sum_{n_{0j}=0}^{n_j} \sum_{n_{1j}=0}^{n_j - n_{0j}} g(n_{0j}, n_{1j}; n_j, p_0, p_1)\, h\{y_j; Q_T(j)\}, \tag{34}$$

where $g(n_{0j}, n_{1j}; n_j, p_0, p_1)$ is the probability density function of $(n_{0j}, n_{1j}) \mid n_j \sim \text{Multinomial}(n_j; p_0, p_1)$ and $h\{y_j; Q_T(j)\}$ is the probability density function of $Y_j \mid (n_{ij}, i = 0, 1, 2) \sim \text{Poisson}\{Q_T(j)\}$.

The probability density function $P(y_j \mid n_j)$ given by (34) is a mixture of Poisson probability density functions with mixing probability density function given by the multinomial probability distribution of $\{n_{0j}, n_{1j}\}$ given $n_j$. This mixing probability density function represents individuals with different genotypes at the embryo stage in the population.
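The mixture structure can be made concrete with a brute-force evaluation of the double sum for a very small population. This is only a sketch under stated assumptions: the function names, the tiny group size, and the values of the genotype frequencies and of the tuple `q` (standing in for the $Q_i(j)$) are hypothetical, and real SEER group sizes are far too large for direct enumeration.

```python
from math import comb, exp, lgamma, log


def poisson_pmf(y, mean):
    """Poisson density h(y; mean); a zero mean puts all mass on y == 0."""
    if mean == 0.0:
        return 1.0 if y == 0 else 0.0
    return exp(y * log(mean) - mean - lgamma(y + 1))


def trinomial_pmf(n0, n1, n, p0, p1):
    """Multinomial density g(n0, n1; n, p0, p1) with n2 = n - n0 - n1, p2 = 1 - p0 - p1."""
    n2 = n - n0 - n1
    p2 = 1.0 - p0 - p1
    return comb(n, n0) * comb(n - n0, n1) * p0 ** n0 * p1 ** n1 * p2 ** n2


def mixture_pmf(y, n, p0, p1, q, two_stage=False):
    """Sum over (n0, n1) of g(n0, n1; n, p0, p1) * h(y; Q_T).

    q = (Q_0, Q_1, Q_2) are hypothetical per-person probabilities for one age
    group; with two_stage=True the J_2 term is dropped (delta_{2k} = 1).
    """
    total = 0.0
    for n0 in range(n + 1):
        for n1 in range(n - n0 + 1):
            n2 = n - n0 - n1
            mean = n0 * q[0] + n1 * q[1]
            if not two_stage:
                mean += n2 * q[2]
            total += trinomial_pmf(n0, n1, n, p0, p1) * poisson_pmf(y, mean)
    return total


# Hypothetical toy values: 6 people, genotype frequencies (0.7, 0.2, 0.1).
example = mixture_pmf(1, 6, 0.7, 0.2, (0.01, 0.2, 0.9))
```

When all three genotypes share the same probability, the mixture collapses to a single Poisson distribution, which gives a quick sanity check on the enumeration.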

Let $\Theta$ be the set of all unknown parameters (i.e., the parameters $(p_1, p_2, \alpha)$ and the birth rates, the death rates, and the mutation rates of $J_j$ cells). Based on data $(y_j, j = 0, 1, \ldots, n_T)$, the likelihood function of $\Theta$ is

$$L\{\Theta \mid y_j, j = 0, 1, \ldots, n_T\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} P(y_j \mid n_j). \tag{35}$$

Notice that because the mutation rates are very small, one may practically assume $\beta_i(t) = \beta_i$ for $i = 0, 1, \ldots, k - 1$. Also, because the stage-limiting genes are basically tumor suppressor genes which act recessively (see Tan [3], Weinberg [6], and Tan et al. [5, 8, 23]), one may practically assume $\{b_i(t) = b_i, d_i(t) = d_i, \gamma_i(t) = b_i - d_i = \gamma_i, i = 0, 1, \ldots, k - 1\}$ (see Tan et al. [4, 5, 8]).

4.3. The Joint Probability Distribution of Augmented Variables and Cancer Incidence. For applying the mixture distribution of $Y_j$ in (34) to make inference about the unknown parameters, one needs to expand the model to include the unobservable augmented variables $\{n_{0j}, n_{1j}, y_{0j}, y_{1j}, j = 1, \ldots, t_N\}$ and derive the joint probability distribution of these variables. For these purposes, observe that for ($j = 1, \ldots, t_N$) and for $k > 2$, the conditional probability distribution $P\{y_{ij}, i = 0, 1 \mid y_j, n_{ij}, i = 0, 1, n_j\}$ of $(Y_{ij}, i = 0, 1)$ given $(Y_j = y_j, n_{ij}, i = 0, 1, n_j)$ is

$$(y_{ij}, i = 0, 1) \mid (y_j, n_{ij}, i = 0, 1, n_j) \sim \text{Multinomial}\left\{y_j; \frac{n_{ij} Q_i(j)}{Q_T(j)}, i = 0, 1\right\}. \tag{36}$$

Since the conditional probability distribution of $Y_j$ given $(n_{ij}, i = 0, 1, 2)$ for $k > 2$ is Poisson with mean $Q_T(j)$, we have for the joint conditional probability density function $P\{y_{ij}, i = 0, 1, y_j \mid n_{ij}, i = 0, 1, n_j\}$ of $(Y_{ij}, i = 0, 1, Y_j)$ given $(n_{ij}, i = 0, 1, n_j)$:

$$\begin{aligned}
P\{y_{ij}, i = 0, 1, y_j \mid n_{ij}, i = 0, 1, n_j\}
&= P\{y_{ij}, i = 0, 1 \mid y_j, n_{ij}, i = 0, 1, 2\} \times P\{y_j \mid n_{ij}, i = 0, 1, 2\} \\
&= \frac{1}{y_j!} e^{-Q_T(j)} \{Q_T(j)\}^{y_j} \times g\left\{y_{0j}, y_{1j}; y_j, \frac{n_{ij} Q_i(j)}{Q_T(j)}, i = 0, 1\right\} \\
&= \prod_{i=0}^{2} h\{y_{ij}; n_{ij} Q_i(j)\} \quad (k > 2),
\end{aligned} \tag{37}$$

where $y_{2j} = y_j - \sum_{u=0}^{1} y_{uj}$ and $n_{2j} = n_j - \sum_{u=0}^{1} n_{uj}$.

If $k = 2$, then $Y_{2j} = 0$ so that $Y_j = \sum_{i=0}^{1} Y_{ij}$. Thus we have for $k = 2$:

$$Y_j \mid (n_{ij}, i = 0, 1, n_j) \sim \text{Poisson}\left\{\sum_{i=0}^{1} n_{ij} Q_i(j)\right\}, \tag{38}$$

$$Y_{ij} \mid (Y_j = y_j, n_{ij}, i = 0, 1, n_j) \sim \text{Binomial}\left\{y_j; \frac{n_{ij} Q_i(j)}{\sum_{u=0}^{1} n_{uj} Q_u(j)}\right\}, \quad i = 0, 1. \tag{39}$$

It follows that if $k = 2$, then $\sum_{i=0}^{1} Y_{ij} = Y_j$ and the joint probability density function of $(Y_{0j}, Y_j)$ given $(n_{uj}, u = 0, 1, 2)$ is

$$\begin{aligned}
P\{y_{0j}, y_j \mid n_{uj}, u = 0, 1, n_j\}
&= P\{y_j \mid n_{ij}, i = 0, 1, n_j\} \times P\{y_{0j} \mid y_j, n_{uj}, u = 0, 1, n_j\} \\
&= \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\} \quad \text{if } k = 2,
\end{aligned} \tag{40}$$

where $y_j = \sum_{i=0}^{1} y_{ij}$ and $n_j = \sum_{i=0}^{2} n_{ij}$.
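The factorization behind (37) and (40), a Poisson total split multinomially being the same as a product of independent Poisson counts, can be checked numerically. The sketch below is illustrative only: the function names and the three means (standing in for the $n_{ij} Q_i(j)$) are hypothetical.

```python
from math import exp, factorial, lgamma, log


def poisson_pmf(y, mean):
    """Poisson density h(y; mean) for mean > 0."""
    return exp(y * log(mean) - mean - lgamma(y + 1))


def multinomial_pmf(counts, probs):
    """Multinomial density of `counts` with cell probabilities `probs`."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # exact integer division at every step
    value = float(coef)
    for c, p in zip(counts, probs):
        value *= p ** c
    return value


def joint_via_total(counts, means):
    """Poisson density of the total times the multinomial split given the total."""
    total_mean = sum(means)
    probs = [m / total_mean for m in means]
    return poisson_pmf(sum(counts), total_mean) * multinomial_pmf(counts, probs)


def joint_via_product(counts, means):
    """Product of independent Poisson densities, one per genotype."""
    value = 1.0
    for c, m in zip(counts, means):
        value *= poisson_pmf(c, m)
    return value


# Hypothetical means for the three genotypes in one age group.
means = (1.2, 0.4, 2.5)
counts = (2, 1, 3)
left = joint_via_total(counts, means)
right = joint_via_product(counts, means)
```

Both routes should give the same number up to rounding, which is what allows the augmented counts to be treated as independent Poisson variables in the joint density.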


Put $\mathbf{Y} = \{y_{ij}, i = 0, 1, j = 1, \ldots, t_N\}$, $\mathbf{N} = \{n_{ij}, i = 0, 1, j = 1, \ldots, t_N\}$, $\tilde{n} = \{n_j, j = 0, 1, \ldots, t_N\}$, and $\tilde{y} = \{y_j, j = 0, 1, \ldots, t_N\}$. From (37) and (40), we have for the conditional joint probability density function of $(\mathbf{Y}, \tilde{y})$ given $(\mathbf{N}, \tilde{n})$:

$$P\{\mathbf{Y}, \tilde{y} \mid \mathbf{N}, \tilde{n}, \Theta\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} \left\{ \left( h[y_{2j}; n_{2j} Q_2(j)] \right)^{1 - \delta_{2k}} \times \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\} \right\}. \tag{41}$$

It follows that the joint conditional probability density function of $\{\mathbf{N}, \mathbf{Y}, \tilde{y}\}$ given $(\tilde{n}, \Theta)$ is

$$P\{\mathbf{N}, \mathbf{Y}, \tilde{y} \mid \tilde{n}, \Theta\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} \left\{ g(n_{0j}, n_{1j}; n_j, p_0, p_1) \times \left( h[y_{2j}; n_{2j} Q_2(j)] \right)^{1 - \delta_{2k}} \times \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\} \right\}. \tag{42}$$

Notice that the above probability density function is a product of multinomial probability density functions and Poisson probability density functions. For this joint probability density function, the deviance $\text{Dev} = -2\{\log P[\mathbf{Y}, \tilde{y}, \mathbf{N} \mid \tilde{n}, \Theta] - \log P[\mathbf{Y}, \tilde{y}, \mathbf{N} \mid \tilde{n}, \hat{\Theta}]\}$ is

$$\text{Dev} = D_0(k) + \text{Dev}(p_1, p_2) + \sum_{j=1}^{t_N} D_j, \tag{43}$$

where

$$D_0(k) = 2\left\{\chi(k) - y_0 - y_0 \log\frac{\chi(k)}{y_0}\right\}, \tag{44}$$

$$\text{Dev}(p_1, p_2) = 2 \sum_{j=1}^{t_N} \left\{ n_{0j} \log\frac{\hat{p}_0}{1 - p_1 - p_2} + \sum_{i=1}^{2} n_{ij} \log\frac{\hat{p}_i}{p_i} \right\}, \tag{45}$$

$$D_j = 2 \sum_{i=0}^{1} \left\{ n_{ij} Q_i(j) - y_{ij} - y_{ij} \log\frac{n_{ij} Q_i(j)}{y_{ij}} \right\} + 2(1 - \delta_{2k}) \left\{ n_{2j} Q_2(j) - y_{2j} - y_{2j} \log\frac{n_{2j} Q_2(j)}{y_{2j}} \right\}, \tag{46}$$

where $\hat{p}_i = \left(\sum_{j=0}^{t_N} n_{ij}\right) / \left(\sum_{j=0}^{t_N} n_j\right)$ ($i = 0, 1, 2$).
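The multinomial part of the deviance can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function name and the toy genotype counts are hypothetical, and the same age groups are used both for the estimated frequencies and for the sum (the paper's index ranges differ slightly between the two).

```python
from math import log


def segregation_deviance(n_counts, p1, p2):
    """Multinomial deviance 2 * sum_i (sum_j n_ij) * log(p_hat_i / p_i).

    n_counts is a list of (n0j, n1j, n2j) tuples, one per age group;
    p_hat_i is the pooled observed frequency of genotype i.
    """
    totals = [sum(row[i] for row in n_counts) for i in range(3)]
    grand_total = sum(totals)
    p_hat = [t / grand_total for t in totals]
    p = [1.0 - p1 - p2, p1, p2]
    deviance = 0.0
    for i in range(3):
        if totals[i] > 0:  # empty cells contribute nothing
            deviance += 2.0 * totals[i] * log(p_hat[i] / p[i])
    return deviance


# Hypothetical genotype counts for two age groups (pooled frequencies 0.9, 0.09, 0.01).
toy_counts = [(90, 9, 1), (180, 18, 2)]
at_mle = segregation_deviance(toy_counts, 0.09, 0.01)
elsewhere = segregation_deviance(toy_counts, 0.05, 0.02)
```

The deviance is zero at the pooled frequencies and positive elsewhere, so minimizing it over $(p_1, p_2)$ within the prior bounds recovers the constrained posterior mode of the multinomial factor.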

The joint probability density function $P\{\mathbf{Y}, \tilde{y}, \mathbf{N} \mid \tilde{n}, \Theta\}$ of $(\mathbf{Y}, \tilde{y}, \mathbf{N})$ given by (42) will be used as the kernel for the Bayesian method to estimate the unknown parameters and to predict the state variables.

4.4. Fitting of the Model to Cancer Incidence Data. To fit the model to real data, as in Tan [3–5], we let $\Delta t \sim 1$ correspond to a fixed time interval such as 6 months in human cancer studies (Tan et al. [4] assumed 3 months as one time unit, while Luebeck and Moolgavkar [26] assumed one year as one time unit). Then, because the proliferation rate of the last stage cells is quite large, one may practically assume $P_T(s, t) = 1$ for $t - s \ge 1$. Hence, noting that $\beta_{k-1}(t) = \beta_{k-1}$ is usually very small (see [3–5]), the $Q_i(j)$ is approximated by

$$Q_i(j) \approx E\left\{ e^{-\beta_{k-1} G_i(t_{j-1})} - e^{-\beta_{k-1} G_i(t_j)} \right\} = e^{-\beta_{k-1} E[G_i(t_{j-1})]} - e^{-\beta_{k-1} E[G_i(t_j)]} + o(\beta_{k-1}), \tag{47}$$

where $G_i(t) = \sum_{s=t_0}^{t-1} J_{k-1}(s; i)$.

Under discrete time approximation, the $E[I_{k-1}(t; i)]$'s have been derived in the appendix. Using these results of expected numbers and using the result $\sum_{i=0}^{t-1} a^i = (a^t - 1)/(a - 1)$ for $a \neq 1$, we obtain

$$\beta_{k-1} E[G_i(t)] = \sum_{r=i}^{i+1} E[J_r(t_0; i)] \left( \prod_{u=r}^{k-1} \beta_u \right) \sum_{v=r}^{k-1} A_{r(k-1)}(v) \sum_{s=t_0}^{t-1} (1 + \gamma_v)^{s - t_0} = \theta_{(i+1)(k-1)} \phi_{(i+1)(k-1)}(t) + \lambda_{i(k-1)} \phi_{i(k-1)}(t), \tag{48}$$

where $(\theta_{i(k-1)}, i = 1, 2, 3)$ and $(\lambda_{i(k-1)}, i = 0, 1, 2)$ are defined in Section 3.3 and where the $\phi_{i(k-1)}(t)$ ($i = 0, 1, 2, 3$)'s are given by

$$\phi_{i(k-1)}(t) = \left( \prod_{u=i+1}^{k-1} \gamma_u \right) \sum_{v=i}^{k-1} A_{i(k-1)}(v) \frac{1}{\gamma_v} \left\{ (1 + \gamma_v)^{t - t_0} - 1 \right\}, \quad i = 0, 1, 2, 3. \tag{49}$$

Notice that if $\gamma_0 = 0$, then $\phi_{0(k-1)}(t)$ reduces to

$$\phi_{0(k-1)}(t) = \left( \prod_{u=1}^{k-1} \gamma_u \right) \sum_{v=1}^{k-1} A_{1(k-1)}(v) \frac{1}{\gamma_v^2} \left\{ (1 + \gamma_v)^{t - t_0} - 1 - (t - t_0) \gamma_v \right\}. \tag{50}$$
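The bracketed growth factors in (49) and (50) come from geometric sums, which can be checked directly. The sketch below is a hedged illustration: the function names and the values of the net proliferation rate and number of time steps are hypothetical, and the coefficients $A_{i(k-1)}(v)$ (defined elsewhere in the paper) are not modelled.

```python
def growth_term(gamma, steps):
    """sum_{s=0}^{steps-1} (1 + gamma)^s = ((1 + gamma)^steps - 1) / gamma.

    For gamma == 0 the sum is simply `steps`.
    """
    if gamma == 0.0:
        return float(steps)
    return ((1.0 + gamma) ** steps - 1.0) / gamma


def growth_term_direct(gamma, steps):
    """Term-by-term evaluation of the same geometric sum."""
    return sum((1.0 + gamma) ** s for s in range(steps))


def second_order_term(gamma, steps):
    """((1 + gamma)^steps - 1 - steps * gamma) / gamma^2, the factor in (50).

    By the binomial expansion it tends to steps * (steps - 1) / 2 as gamma -> 0.
    """
    return ((1.0 + gamma) ** steps - 1.0 - steps * gamma) / gamma ** 2


# Hypothetical values: net rate 0.03 per time unit over 20 time units.
closed = growth_term(0.03, 20)
direct = growth_term_direct(0.03, 20)
```

For small net rates the closed forms are close to their limits (`steps` and `steps * (steps - 1) / 2`), which is why a near-zero proliferation rate makes the corresponding stage behave almost linearly in time.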

Applying these results for time homogeneous models with $\gamma_i \neq \gamma_j$ if $i \neq j$, the $Q_i(j)$'s under discrete approximation are given as follows.


(1) If $k = 2$, then $k - 1 = 1$. Hence $Q_2(0) = 1$, $Q_i(0) = Q_2(j) = 0$ for ($i = 0, 1$; $j > 0$), and for $j > 0$:

$$\begin{aligned}
Q_0(j) &\approx e^{-\theta_{11} \phi_{11}(t_{j-1}) - \lambda_{01} \phi_{01}(t_{j-1})} - e^{-\theta_{11} \phi_{11}(t_j) - \lambda_{01} \phi_{01}(t_j)} + o(\beta_1), \\
Q_1(j) &\approx (1 - \alpha_1) \left\{ e^{-\lambda_{11} \phi_{11}(t_{j-1})} - e^{-\lambda_{11} \phi_{11}(t_j)} \right\} + o(\beta_1).
\end{aligned} \tag{51}$$

(2) If $k \ge 3$, then we have $Q_2(0) = \delta_{2(k-1)} \alpha$, $Q_i(0) = 0$, $i = 0, 1$, and for $j > 0$:

$$\begin{aligned}
Q_0(j) &\approx e^{-\theta_{1(k-1)} \phi_{1(k-1)}(t_{j-1}) - \lambda_{0(k-1)} \phi_{0(k-1)}(t_{j-1})} - e^{-\theta_{1(k-1)} \phi_{1(k-1)}(t_j) - \lambda_{0(k-1)} \phi_{0(k-1)}(t_j)} + o(\beta_{k-1}), \\
Q_1(j) &\approx e^{-\theta_{2(k-1)} \phi_{2(k-1)}(t_{j-1}) - \lambda_{1(k-1)} \phi_{1(k-1)}(t_{j-1})} - e^{-\theta_{2(k-1)} \phi_{2(k-1)}(t_j) - \lambda_{1(k-1)} \phi_{1(k-1)}(t_j)} + o(\beta_{k-1}), \\
Q_2(j) &\approx \delta_{2(k-1)} (1 - \alpha) \left\{ e^{-\lambda_{22} \phi_{22}(t_{j-1})} - e^{-\lambda_{22} \phi_{22}(t_j)} \right\} \\
&\quad + (1 - \delta_{2(k-1)}) \left\{ e^{-\theta_{3(k-1)} \phi_{3(k-1)}(t_{j-1}) - \lambda_{2(k-1)} \phi_{2(k-1)}(t_{j-1})} - e^{-\theta_{3(k-1)} \phi_{3(k-1)}(t_j) - \lambda_{2(k-1)} \phi_{2(k-1)}(t_j)} \right\} + o(\beta_{k-1}).
\end{aligned} \tag{52}$$

Notice that if one replaces $[1 + \gamma_v]^{t - t_0}$ by $[1 + \gamma_v]^{t - t_0} = e^{(t - t_0) \log(1 + \gamma_v)} \approx e^{(t - t_0) \gamma_v}$, the above $Q_i(j)$'s from the discrete time model are exactly the same as those from the continuous model, respectively, as given in equations (23)–(27) under the assumption that $P_T(s, t) = 1$ for $t - s > 0$. Notice that the assumption $P_T(s, t) = 1$ for $t - s > 0$ is equivalent to assuming that the last stage cancer cells grow instantaneously into cancer tumors as soon as they are generated; see Remark 1.
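Each $Q_i(j)$ above has the form of a difference of two exponentials, that is, the probability of developing a tumor in the interval between two consecutive time points. A minimal sketch of this structure, assuming a hypothetical single-term cumulative exponent (the rate and growth values below are illustrative, not estimates from the paper), is:

```python
from math import exp


def interval_probabilities(cumulative_exponent, time_points):
    """Q(j) = exp(-H(t_{j-1})) - exp(-H(t_j)) for consecutive time points."""
    survival = [exp(-cumulative_exponent(t)) for t in time_points]
    return [survival[j - 1] - survival[j] for j in range(1, len(survival))]


def example_exponent(t, t0=1, rate=1e-4, gamma=0.03):
    """Hypothetical H(t) = rate * ((1 + gamma)^(t - t0) - 1) / gamma, with H(t0) = 0."""
    return rate * ((1.0 + gamma) ** (t - t0) - 1.0) / gamma


# Time points t_0 = 1, ..., 11 in the chosen time unit.
time_points = list(range(1, 12))
q = interval_probabilities(example_exponent, time_points)
```

Because the differences telescope, the interval probabilities sum to one minus the probability of remaining tumor free up to the last time point.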

5. The Fitting of the Model to Cancer Incidence and the Generalized Bayesian Inference Procedure

Given the model in Sections 2 and 3 and cancer incidence, one may use results in Section 4 to fit the model. By using this model and the distribution results in Section 4, one can readily estimate the unknown genetic parameters, predict cancer incidence, and check the validity of the model by using the generalized Bayesian inference and Gibbs sampling procedures; for more detail, see Tan [3, 22] and Tan et al. [4, 5].

The generalized Bayesian inference is based on the posterior distribution $P\{\Theta \mid \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$ of $\Theta$ given $\{\mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$. This posterior distribution is derived by combining the prior distribution $P\{\Theta\}$ of $\Theta$ with the joint probability distribution $P\{\mathbf{N}, \mathbf{Y}, \tilde{y} \mid \tilde{n}, \Theta\}$ given $\{\tilde{n}, \Theta\}$ in (42). It follows that this inference procedure would combine information from three sources: (1) previous information and experiences about the parameters in terms of the prior distribution $P\{\Theta\}$ of the parameters, (2) biological information of inherited cancer cases via genetic segregation of cancer genes in the population ($P\{\mathbf{N} \mid \tilde{n}, p_i, i = 1, 2\}$; see Section 2), and (3) information from the expanded data ($\mathbf{Y}$) and the observed data ($\tilde{y}$) via the statistical model from the system ($P\{\mathbf{Y}, \tilde{y} \mid \mathbf{N}, \Theta\}$ given by (37) and (40)). Because of additional information from the genetic segregation of the cancer genes, this inference procedure provides an efficient way to extract information on the effects of genotypes of individuals at the embryo stage.

5.1. The Prior Distribution of the Parameters. For the prior distributions of $\Theta$, because biological information has suggested some lower bounds and upper bounds for the mutation rates and for the proliferation rates, we assume

$$P(\Theta) \propto c \quad (c > 0), \tag{53}$$

where $c$ is a positive constant if these parameters satisfy some biologically specified constraints, and $P(\Theta)$ is equal to zero otherwise. These biological constraints are as follows.

(i) $0 < p_1 < 10^{-2}$, $0 < p_2 < 10^{-6}$, and $-0.01 < \gamma_i < 1$ ($i = 1, 2$).
(ii) For $\beta_j$ ($j = 0, 1, \ldots, k - 1$), we let $1 < \omega_0 = N(t_0) \beta_0 < 1000$ ($N \to I_1$) and $10^{-8} < \beta_i < 10^{-3}$, $i = 1, \ldots, k - 1$.
(iii) For the $\lambda_{j(k-1)}$ ($j = 0, 1, 2$), we let $0 < \lambda_i < 10$, $i = 0, 1$, and $0 < \lambda_2 < 10^3$.
(iv) For the $\theta_i$ ($i = 1, 2, 3$), we let $0 < \theta_1 < 10^{-2}$ and $0 < \theta_2 < 10^{-1}$.

We will refer to the above prior as a partially informative prior, which may be considered as an extension of the traditional noninformative prior given in Box and Tiao [28].
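A prior that is constant inside a box of biological constraints and zero outside can be coded as an indicator on the log scale. The following is a sketch under assumptions: the dictionary keys are hypothetical names, only some of the listed bounds are included, and the lower bound of $-0.01$ for the proliferation rates reflects a reading of the constraint in (i).

```python
from math import inf

# Hypothetical parameter names mapped to (lower, upper) bounds from constraints (i), (iii), (iv).
PRIOR_BOUNDS = {
    "p1": (0.0, 1e-2),
    "p2": (0.0, 1e-6),
    "gamma1": (-0.01, 1.0),
    "gamma2": (-0.01, 1.0),
    "lambda0": (0.0, 10.0),
    "lambda1": (0.0, 10.0),
    "lambda2": (0.0, 1e3),
    "theta1": (0.0, 1e-2),
    "theta2": (0.0, 1e-1),
}


def log_prior(params):
    """Return 0.0 (log of a constant, up to normalization) inside the box, -inf outside."""
    for name, value in params.items():
        low, high = PRIOR_BOUNDS[name]
        if not (low < value < high):
            return -inf
    return 0.0


# Illustrative values only; they are not estimates from the paper.
inside = log_prior({"p1": 5e-4, "p2": 5e-7, "gamma1": 0.0, "lambda2": 500.0})
outside = log_prior({"p1": 0.5})
```

With such a prior, maximizing the posterior is the same as maximizing the likelihood kernel subject to the box constraints, which is how the later estimation steps use it.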

5.2. The Posterior Distribution of the Parameters Given $\{\mathbf{Y}, \mathbf{N}, \tilde{y}, \tilde{n}\}$. Denote by $\Theta_1 = \Theta - \{p_1, p_2, \alpha\}$. From the posterior distribution $P\{\Theta \mid \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$, we obtain

$$\begin{aligned}
P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}
&\propto \{\chi(k)\}^{y_0} e^{-\chi(k)}\, p_1^{\sum_{j=1}^{t_N} n_{1j}}\, p_2^{\sum_{j=1}^{t_N} n_{2j}} (1 - p_1 - p_2)^{\sum_{j=1}^{t_N} n_{0j}}, \quad 0 < p_1, p_2, \alpha < 1, \\
P\{\Theta_1 \mid (p_1, p_2, \alpha), \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}
&\propto \prod_{j=1}^{t_N} \prod_{i=0}^{2} e^{-Q_i(j)} \{Q_i(j)\}^{y_{ij}}, \quad \Theta_1 \in \Omega,
\end{aligned} \tag{54}$$

where $\Omega$ is the parameter space of $\Theta_1$ provided by the biological constraints in Section 5.1.

For computational convenience, we notice that the log of $P\{p_1, p_2, \alpha \mid \Theta_1, \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$ is proportional to the negative of $D_0(k) + \text{Dev}(p_1, p_2)$ given by (44)-(45); similarly, the log of $P\{\Theta_1 \mid (p_1, p_2, \alpha), \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$ is proportional to the negative of $\sum_{j=1}^{k} D_j$ given by (46).
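To make the link between the first conditional posterior and the deviance explicit, its logarithm can be written out term by term. The display below is only a worked restatement of that kernel, with $C$ denoting a constant that does not involve $(p_1, p_2, \alpha)$ and with $\chi(3) = n_0 p_2 \alpha$ taken from the 3-stage expression for the expected incidence at birth.

```latex
\log P\{p_1, p_2, \alpha \mid \Theta_1, \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}
  = C + y_0 \log \chi(k) - \chi(k)
    + \Big(\sum_{j=1}^{t_N} n_{1j}\Big) \log p_1
    + \Big(\sum_{j=1}^{t_N} n_{2j}\Big) \log p_2
    + \Big(\sum_{j=1}^{t_N} n_{0j}\Big) \log (1 - p_1 - p_2),
\qquad \chi(3) = n_0\, p_2\, \alpha .
```

The first two terms are the Poisson log-kernel whose deviance form is $D_0(k)$, and the remaining three are the multinomial log-kernel whose deviance form is $\text{Dev}(p_1, p_2)$, so maximizing this expression and minimizing the sum of the two deviances select the same point.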

5.3. The Multilevel Gibbs Sampling Procedure for Estimating Unknown Parameters. Given the posterior probability distributions, we will use the following multilevel Gibbs sampling procedure to derive estimates of the parameters. We notice that numerically the Gibbs sampling procedure given below is equivalent to the EM-algorithm from the sampling theory viewpoint, with Steps 1 and 2 as the $E$-Step and with Steps 3 and 4 as the $M$-Step, respectively [29]. These multilevel Gibbs sampling procedures are given by the following.

Step 1 (Generating $\mathbf{N}$ Given $(\mathbf{Y}, \tilde{y}, \tilde{n}, \Theta)$ (The Data-Augmentation Step 1)). Given $\Theta$ and given $\tilde{n}$, use the multinomial distribution of $\{n_{1j}, n_{2j}\}$ given $n_j$ in Section 3 to generate a large sample of $\mathbf{N}$. Then, by combining this large sample with $P\{\mathbf{Y}, \tilde{y} \mid \mathbf{N}, \tilde{n}, \Theta\}$ in (37) and (40), select $\mathbf{N}$ through the weighted bootstrap method due to Smith and Gelfand [30]. This selected $\mathbf{N}$ is then a sample from $P\{\mathbf{N} \mid \mathbf{Y}, \tilde{y}, \tilde{n}, \Theta\}$ even though the latter is unknown (for proof, see Tan [22], Chapter 3). Call the generated sample $\hat{\mathbf{N}}$.

Step 2 (Generating $\mathbf{Y}$ Given $(\mathbf{N}, \tilde{y}, \tilde{n}, \Theta)$ (The Data-Augmentation Step 2)). Given $\{\tilde{y}, \tilde{n}, \Theta\}$ and given $\mathbf{N} = \hat{\mathbf{N}}$ generated from Step 1, generate $\mathbf{Y}$ from the probability distribution $P\{\mathbf{Y} \mid \hat{\mathbf{N}}, \tilde{y}, \tilde{n}, \Theta\}$ given by (36) and (38). Call the generated sample $\hat{\mathbf{Y}}$.

Step 3 (Estimation of $\{p_i, i = 1, 2, \alpha\}$ Given $\{\Theta_1, \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$). Given $\{\tilde{y}, \tilde{n}, \Theta_1\}$ and given $(\mathbf{N}, \mathbf{Y}) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}})$ from Steps 1 and 2, derive the posterior mode of $\{p_i, i = 1, 2, \alpha\}$ by maximizing the conditional posterior distribution $P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \hat{\mathbf{N}}, \hat{\mathbf{Y}}, \tilde{y}, \tilde{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $D_0(k) + \text{Dev}(p_1, p_2)$ given by (44)-(45) in Section 4.3 under the constraints given in Section 5.1. Denote this generated mode by $\{\hat{p}_i, i = 1, 2, \hat{\alpha}\}$.

Step 4 (Estimation of $\Theta_1$ Given $\{p_i, i = 1, 2, \alpha, \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$). Given $(\tilde{y}, \tilde{n})$ and given $(\mathbf{N}, \mathbf{Y}, p_i, i = 1, 2, \alpha) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}}, \hat{p}_i, i = 1, 2, \hat{\alpha})$ from Steps 1–3, derive the posterior mode of $\Theta_1$ by maximizing the conditional posterior distribution $P\{\Theta_1 \mid \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\mathbf{N}}, \hat{\mathbf{Y}}, \tilde{y}, \tilde{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $\sum_{j=1}^{k} D_j$ in (46) under the constraints. Denote the generated mode as $\hat{\Theta}_1$.

Step 5 (Recycling Step). With $(\mathbf{N}, \mathbf{Y}, p_i, i = 1, 2, \alpha, \Theta_1) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}}, \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1)$ given above, go back to Step 1 and continue until convergence.

The proof of convergence of the above steps can be derived by using the procedure given in Tan ([22], Chapter 3). At convergence, the $\hat{\Theta} = \{\hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1\}$ are the generated values from the posterior distribution of $\Theta$ given $\{\tilde{y}, \tilde{n}\}$ independently of $(\mathbf{N}, \mathbf{Y})$ (for proof, see Tan [22], Chapter 3). Repeating the above procedures, one then generates a random sample of $\Theta$ from the posterior distribution of $\Theta$ given $\{\tilde{y}, \tilde{n}\}$; one then uses the sample mean as the estimate of $\Theta$ and uses the sample variances and covariances as estimates of the variances and covariances of these estimates.
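The weighted bootstrap used in Step 1 can be sketched generically: draw candidates from a proposal distribution, weight each by the likelihood of the data given that candidate, and resample in proportion to the weights. The code below is a minimal sketch, not the authors' implementation; the function name, the toy candidates, and the weights are hypothetical.

```python
import random


def weighted_bootstrap(candidates, weights, size, rng):
    """Resample `size` items from `candidates` with probability proportional to `weights`.

    In Step 1 the candidates would be draws of N from the multinomial proposal and
    the weights would be the values of P{Y, y | N, n, Theta} at those draws.
    """
    total = float(sum(weights))
    if total <= 0.0:
        raise ValueError("at least one weight must be positive")
    probabilities = [w / total for w in weights]
    return rng.choices(candidates, weights=probabilities, k=size)


# Hypothetical toy example: three candidate draws with unnormalized weights.
rng = random.Random(0)
draws = weighted_bootstrap(["draw_a", "draw_b", "draw_c"], [0.2, 1.5, 0.3], 5, rng)
```

Normalizing the weights first keeps the resampling step independent of the unknown normalizing constant of the target distribution, which is the property the method relies on.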

6. A New Multistage Stochastic Model for Adult Eye Cancer (Uveal Melanoma): An Example

The human eye cancers consist of pediatric eye cancers and adult eye cancers. The most common pediatric eye cancer is retinoblastoma, which develops from the retinal pigment epithelium cells underlying the retina that do not form melanoma. The most common adult eye cancers are the uveal melanomas involving the iris, the ciliary body, and the choroid (collectively referred to as the uvea). These cancers develop from melanocytes (pigment cells), which reside within the uvea, giving color to the eye. In Tan and Zhou [9], we have developed a modified two-stage model for retinoblastoma. Based on results from molecular biology (see Landreville et al. [14], Mensink et al. [15], and Loercher and Harbour [31]), Landreville et al. [14] have proposed a three-stage model for uveal melanoma as given in Figure 2. As an example of applications of this paper, in this section we will apply this model of uveal melanoma to the NCI/NIH eye cancer data from the SEER project. We notice that the same methods can be applied to model other human cancers as well, but this will be our future research.

Given in Table 1 are the numbers of people at risk and the eye cancer cases in the age groups together with the predicted cases from the models. These data give cancer incidence at birth and incidence for 85 age groups ($k = 85$), with each group spanning a 1-year period except the last age group ($\ge 85$ years old). For human eye cancer, because the incidence at birth and for age groups from 1 to 10 years old is basically generated by the pediatric eye cancer retinoblastoma (see [9]), to account for inherited cancer cases of uveal melanomas, the incidence for age 0 (birth) and for age periods from 1 to 10 years old in Table 1 for uveal melanoma is derived by subtracting the incidence of retinoblastoma from the SEER data (see Tan and Zhou [9]).

To fit the data, we let one time unit be 6 months after birth and let $t_0 = 1$. To compare different models and to assess different assumptions, we will consider the following 2- and 3-stage mixture models: (1) the complete 3-stage mixture model (Model-F), in which no assumptions are made on the parameters; (2) the Type-1 3-stage mixture model (Model-1), in which we assume that $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth. For comparison purposes, we also fit a 2-stage model as defined in Tan and Zhou [9]. We will apply the methods in Section 5 to fit these models to the SEER data given in Table 1.


Table 2: The log-likelihood, AIC, and BIC of the fitted models.

Models                   Log-likelihood   AIC        BIC
Three-stage, Model-F     −1312.02         2648.04    2677.35
Three-stage, Model-1     −1295.76         2609.53    2631.51
Two-stage                −4392.421        8796.843   8811.569
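The information criteria in Table 2 follow from the log-likelihood by the standard definitions, AIC $= 2m - 2\log L$ and BIC $= m \log n - 2\log L$ for $m$ free parameters and $n$ observations. The sketch below reproduces the Model-F row under stated assumptions: the log-likelihood is read as −1312.02, the parameter count of 12 is taken from the degrees of freedom quoted for Model-F, and the sample size of 85 is an assumption that happens to match the tabulated BIC.

```python
from math import log


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2m - 2 log L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: m log(n) - 2 log L."""
    return n_params * log(n_obs) - 2.0 * log_likelihood


# Model-F row of Table 2 (assumed: 12 parameters, 85 observations).
model_f_aic = aic(-1312.02, 12)
model_f_bic = bic(-1312.02, 12, 85)
```

Smaller values indicate a better trade-off between fit and the number of parameters, which is the sense in which the models are compared.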

Figure 5: Curve fitting of SEER data by Model-F, Model-1, and the two-stage model (incidences plotted against age in years, from 0 to 84, for the observed data and the three fitted models).

Given in Table 2 are the natural logs of the likelihood functions, the AIC (Akaike Information Criterion), and the BIC (Bayesian Information Criterion) for these models. Given in Table 3 are the estimates of parameters in the 3-stage models. Given in Figure 5 are the plots of predicted cancer cases from the 3-stage mixture models (Model-F and Model-1) and the 2-stage model. For comparison purposes, in Table 1 we also provide numbers of predicted cancer cases from the 3-stage mixture models and the 2-stage model together with the observed cancer cases over time from SEER. From these results, we have made the following observations.

(a) As shown by results in Table 1 and Figure 5, both Model-F and Model-1 fitted the SEER data well, although Model-1 fitted the data slightly better according to the values of AIC and BIC. The Chi-square test statistics $\chi^2 = \sum_{i=0}^{85} (y_i - \hat{y}_i)^2 / \hat{y}_i$ for Model-F and Model-1 are 88.43 and 94.48, respectively, giving a $P$-value of 0.12 (df $= 86 - 12 = 74$) for Model-F and a $P$-value of 0.11 (df $= 86 - 7 = 79$) for Model-1. On the other hand, the 2-stage model fitted the data very poorly; the Chi-square statistic for the 2-stage model is 2747.69, giving a $P$-value less than $10^{-3}$. The AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) values of Model-1 are (AIC = 2609.53, BIC = 2631.51), which are slightly smaller than those of Model-F, respectively; however, the AIC and BIC values (8796.84, 8811.57) of the two-stage model are considerably greater than those of the 3-stage models, respectively. These results suggest that uveal melanoma may best be described by a 3-stage model with inherited component and that one may practically assume $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth.

(b) From Table 3, it is observed that the estimate of $\gamma_1$ is close to zero (the estimate is of order $10^{-5}$), indicating that the phenotype of $J_1$ is almost identical to that of $N = J_0$, further confirming that the stage-limiting genes are basically tumor suppressor genes and that there is no haploinsufficiency for these tumor suppressor genes. On the other hand, the estimate of $\gamma_2$ is of order $10^{-2}$, which is about $10^3$ times greater than that of cells with genotype $J_1$.

(c) From Table 3, the estimates $\hat{\lambda}_j$ ($j = 0, 1, 2$) of $\lambda_j$ are of order $10^{-1}$, $10^{-1}$, and $10^2 \sim 10^3$, respectively. Because $\lambda_i = E[J_i(t_0; i)] \beta_i \prod_{u=i+1}^{2} (\beta_u / \gamma_u)$, $i = 0, 1, 2$, by assuming some values of $E[J_i(t_0; i)]$, $i = 0, 1, 2$, from some biological observations, one can have some rough ideas about the magnitude of $\beta_j$ ($j = 0, 1, 2$). For example, if we follow Potten et al. [32] to assume $\{E[N(t_0)] = E[J_i(t_0; i)] \sim 10^8, i = 0, 1, 2\}$, then $\beta_j \approx 10^{-6} \sim 10^{-5}$.

(d) From Table 3, the estimates of $p_1$ and $p_2$ from the SEER data are of orders $10^{-4} \sim 10^{-3}$ and $10^{-7} \sim 10^{-6}$, respectively. This indicates that in the US population, the frequency of the stage-limiting cancer gene for uveal melanoma is approximately $10^{-3}$. Table 3 also showed that the estimate of $\alpha$ was 0.8411, indicating that most individuals with genotype $J_2$ would develop cancer at birth. This may help to explain why there are observed cancer incidences at birth for uveal melanoma in the SEER data even though the estimate of the frequency $p_2$ is of order $10^{-7} \sim 10^{-6}$.
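The Chi-square statistic quoted in observation (a) is the usual Pearson sum of squared differences between observed and predicted counts, scaled by the predicted counts. As a hedged illustration, the sketch below evaluates it on the first five rows of Table 1 for Model-F only; the function name is hypothetical, and a five-row partial sum is not comparable to the full-table statistic reported above.

```python
def pearson_chi_square(observed, predicted):
    """Sum of (observed - predicted)^2 / predicted over all groups."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum((o - p) ** 2 / p for o, p in zip(observed, predicted))


# Ages 0 to 4 from Table 1: observed incidence and Model-F predicted incidence.
observed_first_rows = [34, 20, 14, 9, 4]
model_f_first_rows = [36, 11, 7, 5, 4]
partial_statistic = pearson_chi_square(observed_first_rows, model_f_first_rows)
```

Summing the same quantity over all 86 rows, and comparing it with a Chi-square distribution whose degrees of freedom equal the number of rows minus the number of fitted parameters, gives the goodness-of-fit test used in the comparison.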

7. Discussion and Conclusions

To account for inherited cancer cases, in this paper we have developed some general multistage models involving hereditary cancer cases. For human cancer incidence, these models are basically generalized mixture models. In these mixture models, the mixing probability is a multinomial distribution to account for genetic segregation of the stage-limiting tumor suppressor genes. This mixture model allows us to estimate, for the first time, the frequency of the stage-limiting tumor suppressor gene in human populations. As an example of applications, in this paper we have developed a general 3-stage stochastic multistage model of carcinogenesis for adult human eye cancer. To account for inherited cancer cases in the stochastic model of human eye cancer, we have also developed a generalized mixture model for uveal melanoma in human beings.

For using the proposed models to fit the cancer incidencedata in this paper we have developed a generalized Bayesianinference procedure to estimate the unknownparameters andto predict cancer cases This inference procedure is advanta-geous over the classical sampling theory inference (ie max-imum likelihood method) because the procedure combines

16 ISRN Biomathematics

Table 3: Estimates of parameters for the 3-stage stochastic models.

| Model | Statistic | $\lambda_0$ | $\lambda_1$ | $\lambda_2$ | $\gamma_1$ | $\gamma_2$ | $p_1$ | $p_2$ | $\alpha$ | $\theta_1$ | $\theta_2$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Model-F | Estimates | 2.88E-01 | 4.81E-01 | 6.55E+02 | 2.71E-05 | 3.37E-02 | 9.95E-04 | 9.68E-07 | 8.41E-01 | 3.29E-04 | 3.41E-01 |
| Model-F | StD | 1.21E-01 | 4.65E-02 | 1.12E+02 | 8.86E-06 | 1.92E-04 | 1.75E-06 | 6.48E-09 | 4.19E-02 | 3.54E-05 | 1.07E-02 |
| Model-F | 95% CL, lower | 5.03E-02 | 3.89E-01 | 4.34E+02 | 5.34E-06 | 3.33E-02 | 9.91E-04 | 9.55E-07 | 7.59E-01 | 2.60E-04 | 3.20E-01 |
| Model-F | 95% CL, upper | 5.25E-01 | 5.72E-01 | 8.75E+02 | 4.01E-05 | 3.41E-02 | 9.98E-04 | 9.81E-07 | 9.23E-01 | 3.98E-04 | 3.62E-01 |
| Model-1 | Estimates | 6.55E-02 | 3.72E-01 | 1.98E+02 | NA | 3.49E-02 | 9.98E-04 | 1.00E-06 | 8.07E-01 | NA | NA |
| Model-1 | StD | 2.54E-02 | 4.63E-02 | 3.38E+01 | NA | 3.58E-03 | 6.27E-05 | 4.53E-08 | 7.06E-02 | NA | NA |
| Model-1 | 95% CL, lower | 1.57E-02 | 2.82E-01 | 1.32E+02 | NA | 2.79E-02 | 8.75E-04 | 9.11E-07 | 6.68E-01 | NA | NA |
| Model-1 | 95% CL, upper | 1.15E-01 | 4.63E-01 | 2.65E+02 | NA | 4.19E-02 | 1.12E-03 | 1.09E-06 | 9.45E-01 | NA | NA |

Note: NA = assumed nonexistence.


information from three sources: (1) previous information and experiences about the parameters in terms of the prior distribution $P\{\Theta\}$ of the parameters, (2) biological information of inherited cancer cases via the genetic segregation of staging-limiting tumor suppressor genes in the population, and (3) information from the expanded data ($\boldsymbol{Y}$) and the observed data ($\underset{\sim}{y}$) via the statistical model from the system ($P\{\boldsymbol{Y}, \underset{\sim}{y} \mid \boldsymbol{N}, \Theta\}$) given by (37) and (40).

To illustrate the usefulness and applications of our models and methods, we have applied them to the eye cancer SEER data of NCI/NIH. Our analysis clearly showed that the proposed 3-stage model with inherited cancer cases fitted the data nicely (see Table 2 and Figure 5); on the other hand, the classical 2-stage model cannot fit the data at all (see Table 2 and Figure 5). These results have confirmed results from molecular biology that the human eye cancer is derived by a 3-stage model with an inherited cancer component. Notice, however, that our 3-stage multistage model is more general than the classical 3-stage model as described in Little [1], Tan [2], and Zheng [7] in that we postulate that cancer tumors develop from primary $I_3$ cells by clonal expansion (see Yang and Chen [11]). (Note that the stochastic multistage models in the literature assume that cancer tumors develop from last stage cells immediately as soon as they are generated, ignoring completely cancer progression; see Remark 1.) As a matter of fact, we had assumed $\Delta t = 1$ for a period of three months and found that the 3-stage models then did not fit the SEER data, clearly indicating that $P_t(s, t) < 1$ over a period of three months.

Applying our models and methods to the SEER data of human eye cancer, we have derived for the first time some useful pieces of information. Specifically, we mention the following. (1) For the first time, we have estimated the frequency of the staging-limiting tumor suppressor gene in the US population ($p_1 \sim 0.9948 \times 10^{-3}$). (2) With the estimate of $\alpha$ as 0.8411, the predicted number of uveal melanoma at birth is $y_0 = n_{01} p_2 = 36$ by 3-stage models with inherited cancer component (Model-F and Model-1). (The observed number of eye cancer at birth is 34.) (3) The estimate of the proliferation rate ($\gamma_1$) of $J_1$ cells using Model-F is $\gamma_1 = 2.271 \times 10^{-5} \sim 0$. (The estimate is $8.603 \times 10^{-5}$ using Model-1.) This confirms that the stage-1 limiting gene is a tumor suppressor gene and, unlike the p53 gene in chromosome 17p (see [33]), there is little or no haploid insufficiency for this gene in cells with genotype $J_1$.

Using the models and methods of this paper, one can easily predict future cancer cases for human eye cancer. Thus, by comparing results from different populations, our models and methods can be used to assess cancer prevention and control procedures. These will be our future research topics; we will not go any further here.

Appendix

The Expected Numbers of State Variables under Discrete Time Approximation

Under discrete time, the stochastic differential equations of state variables reduce to the following stochastic difference equations of state variables, respectively:

$$\begin{aligned}
J_i(t+1; i) &= J_i(t; i)\,[1 + \gamma_i(t)] + e_i(t+1; i), \quad i = 0, 1, \operatorname{Min}(k-1, 2), \\
J_j(t+1; u) &= J_j(t; u)\,[1 + \gamma_j(t)] + J_{j-1}(t; u)\,\beta_{j-1}(t) + e_j(t+1; u), \quad 0 \le u < j \le k-1,
\end{aligned} \tag{A1}$$

where $e_i(t+1; i) = [B_i(t; i) - J_i(t; i) b_i(t)] - [D_i(t; i) - J_i(t; i) d_i(t)]$ and $e_j(t+1; i) = [B_j(t; i) - J_j(t; i) b_j(t)] - [D_j(t; i) - J_j(t; i) d_j(t)] + [M_{j-1}(t; i) - J_{j-1}(t; i) \beta_{j-1}(t)]$ for $j > i$.

The initial conditions at birth ($t_0$) for the above stochastic difference equations are $J_j(t_0; i) > 0$ if $j = i, i+1$ and $J_j(t_0; i) = 0$ if $j > i+1$. The solution of the above difference equations under these initial conditions is given, respectively, by

$$\begin{aligned}
J_i(t; i) &= J_i(t_0; i) \prod_{s=t_0}^{t-1} (1 + \gamma_i(s)) + \sum_{s=t_0+1}^{t} e_i(s; i) \prod_{u=s}^{t-1} (1 + \gamma_i(u)), \quad i = 0, 1, \operatorname{Min}(k-1, 2), \\
J_j(t; i) &= J_j(t_0; i) \prod_{s=t_0}^{t-1} (1 + \gamma_j(s)) + \sum_{s=t_0}^{t-1} J_{j-1}(s; i)\, \beta_{j-1}(s) \prod_{u=s+1}^{t-1} (1 + \gamma_j(u)) \\
&\quad + \sum_{s=t_0+1}^{t} e_j(s; i) \prod_{u=s}^{t-1} (1 + \gamma_j(u)), \quad 0 \le i < j \le k-1.
\end{aligned} \tag{A2}$$
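The sum-product form in (A2) can be checked numerically against the recursion in (A1). The following is a minimal sketch, assuming the noise terms are replaced by their expectation (zero) and using two stages with purely hypothetical rate functions; the names `iterate_means` and `closed_form_stage` are illustrative and do not come from the paper.

```python
from math import prod

def iterate_means(J0, gamma, beta, t0, t):
    """Iterate the noise-free recursion of (A1) from t0 up to t.

    J0    : initial expected counts, one entry per stage
    gamma : list of functions s -> net growth rate of each stage
    beta  : list of functions s -> transition rate from stage j to j+1
    """
    J = list(J0)
    for s in range(t0, t):
        # The comprehension reads the old J throughout, then rebinds it.
        J = [J[j] * (1 + gamma[j](s))
             + (J[j - 1] * beta[j - 1](s) if j > 0 else 0.0)
             for j in range(len(J))]
    return J

def closed_form_stage(Jj0, Jprev_path, gamma_j, beta_prev, t0, t):
    """Noise-free version of the second line of (A2) for one stage j.

    Jprev_path[s - t0] must hold the expected count of stage j-1 at time s.
    """
    growth = prod(1 + gamma_j(s) for s in range(t0, t))
    inflow = sum(Jprev_path[s - t0] * beta_prev(s)
                 * prod(1 + gamma_j(u) for u in range(s + 1, t))
                 for s in range(t0, t))
    return Jj0 * growth + inflow

# Hypothetical rates, chosen only for illustration.
gamma = [lambda s: 0.01, lambda s: 0.02 + 0.001 * s]
beta = [lambda s: 1e-3]
J0 = [1000.0, 10.0]
t0, t = 0, 12

path = [iterate_means(J0, gamma, beta, t0, s)[0] for s in range(t0, t)]
closed = closed_form_stage(J0[1], path, gamma[1], beta[0], t0, t)
direct = iterate_means(J0, gamma, beta, t0, t)[1]
print(closed, direct)
```

The two printed values should agree up to floating-point rounding, since (A2) is just the recursion unrolled over time.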

If the model is time homogeneous so that $\beta_j(t) = \beta_j$, $\gamma_j(t) = \gamma_j$ ($j = i, \ldots, k-1$), and if $\gamma_i \ne \gamma_j$ if $i \ne j$, then the above solutions under the initial conditions ($J_j(t_0; i) = 0$, $j > i+1$) reduce to

$$\begin{aligned}
J_i(t; i) &= J_i(t_0; i)\,(1 + \gamma_i)^{t - t_0} + \eta_i^{(0)}(t; i), \quad i = 0, 1, \operatorname{Min}(k-1, 2), \\
J_j(t; i) &= J_j(t_0; i)\,(1 + \gamma_j)^{t - t_0} + \sum_{u=i}^{j-1} J_u(t_0; i) \Big( \prod_{v=u}^{j-1} \beta_v \Big) \sum_{r=u}^{j} A_{uj}(r)\,(1 + \gamma_r)^{t - t_0} \\
&\quad + \eta_j^{(0)}(t; i) + \sum_{u=1}^{j-i} \Big( \prod_{r=j-u}^{j-1} \beta_r \Big) \eta_j^{(u)}(t; i) \\
&= \sum_{u=i}^{i+1} J_u(t_0; i) \Big( \prod_{v=u}^{j-1} \beta_v \Big) \sum_{r=u}^{j} A_{uj}(r)\,(1 + \gamma_r)^{t - t_0} \\
&\quad + \eta_j^{(0)}(t; i) + \sum_{u=1}^{j-i} \Big( \prod_{r=j-u}^{j-1} \beta_r \Big) \eta_j^{(u)}(t; i), \quad 0 \le i < j \le k-1,
\end{aligned} \tag{A3}$$

where $\eta_j^{(0)}(t; i) = \sum_{s=t_0+1}^{t} e_j(s; i)\,(1 + \gamma_i)^{t-s}$ and $\eta_j^{(u)}(t; i) = \sum_{s=t_0+1}^{t} e_{j-u}(s; i) \sum_{v=u}^{j} A_{uj}(v)\,(1 + \gamma_v)^{t-s}$ ($u = 1, \ldots, j-i$).

Thus, if the model is time homogeneous and if $\gamma_i \ne \gamma_j$ if $i \ne j$, the $E[J_{k-1}(t; i)]$'s in discrete time models under the initial conditions ($J_j(t_0; i) = 0$, $j > i+1$) are given, respectively, by

$$E[J_{k-1}(t; i)] = \sum_{u=i}^{i+1} E[J_u(t_0; i)] \Big( \prod_{v=u}^{k-2} \beta_v \Big) \times \sum_{r=u}^{k-1} A_{u(k-1)}(r)\,(1 + \gamma_r)^{t - t_0}, \quad i = 0, 1, \operatorname{Min}(k-1, 2). \tag{A4}$$

References

[1] M. P. Little, "Cancer models, ionization, and genomic instability: a review," in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 5, World Scientific, River Edge, NJ, USA, 2008.

[2] W. Y. Tan, Stochastic Models of Carcinogenesis, Marcel Dekker, New York, NY, USA, 1991.

[3] W. Y. Tan, "Stochastic multi-stage models of carcinogenesis as hidden Markov models: a new approach," International Journal of Systems and Synthetic Biology, vol. 1, pp. 313–337, 2010.

[4] W. Y. Tan, L. J. Zhang, and C. W. Chen, "Stochastic modeling of carcinogenesis: state space models and estimation of parameters," Discrete and Continuous Dynamical Systems B, vol. 4, no. 1, pp. 297–322, 2004.

[5] W. Y. Tan, C. W. Chen, and L. J. Zhang, "Cancer biology, cancer models and stochastic mathematical analysis of carcinogenesis," in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 3, World Scientific, River Edge, NJ, USA, 2008.

[6] R. A. Weinberg, The Biology of Human Cancer, Garland Science, Taylor and Francis, New York, NY, USA, 2007.

[7] Q. Zheng, "Stochastic multistage cancer models: a fresh look at an old approach," in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 2, World Scientific, River Edge, NJ, USA, 2008.

[8] W. Y. Tan, L. J. Zhang, W. Chen, and J. M. Zhu, "A stochastic model of human colon cancer involving multiple pathways," in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 11, World Scientific, River Edge, NJ, USA, 2008.

[9] W. Y. Tan and H. Zhou, "A new stochastic model of retinoblastoma involving both hereditary and non-hereditary cancer cases," Journal of Carcinogenesis and Mutagenesis, vol. 2, no. 2, article 117, 2011.

[10] X. W. Yan, Stochastic and State Space Models of Carcinogenesis Under Complex Situation [Ph.D. thesis], Department of Mathematical Sciences, University of Memphis, Memphis, Tenn, USA.

[11] G. L. Yang and C. W. Chen, "A stochastic two-stage carcinogenesis model: a new approach to computing the probability of observing tumor in animal bioassays," Mathematical Biosciences, vol. 104, no. 2, pp. 247–258, 1991.

[12] H. Osada and T. Takahashi, "Genetic alterations of multiple tumor suppressors and oncogenes in the carcinogenesis and progression of lung cancer," Oncogene, vol. 21, no. 48, pp. 7421–7434, 2002.

[13] I. I. Wistuba, L. Mao, and A. F. Gazdar, "Smoking molecular damage in bronchial epithelium," Oncogene, vol. 21, no. 48, pp. 7298–7306, 2002.

[14] S. Landreville, O. A. Agapova, and J. W. Harbour, "Emerging insights into the molecular pathogenesis of uveal melanoma," Future Oncology, vol. 4, no. 5, pp. 629–636, 2008.

[15] H. W. Mensink, D. Paridaens, and A. De Klein, "Genetics of uveal melanoma," Expert Review of Ophthalmology, vol. 4, no. 6, pp. 607–616, 2009.

[16] W. Y. Tan and X. W. Yan, "A new stochastic and state space model of human colon cancer incorporating multiple pathways," Biology Direct, vol. 5, article 26, 2010.

[17] L. B. Klebanov, S. T. Rachev, and A. Y. Yakovlev, "A stochastic model of radiation carcinogenesis: latent time distributions and their properties," Mathematical Biosciences, vol. 113, no. 1, pp. 51–75, 1993.

[18] A. Y. Yakovlev and A. D. Tsodikov, Stochastic Models of Tumor Latency and Their Biostatistical Applications, World Scientific, River Edge, NJ, USA, 1996.

[19] H. Fakir, W. Y. Tan, L. Hlatky, P. Hahnfeldt, and R. K. Sachs, "Stochastic population dynamic effects for lung cancer progression," Radiation Research, vol. 172, no. 3, pp. 383–393, 2009.

[20] A. G. Knudson, "Mutation and cancer: statistical study of retinoblastoma," Proceedings of the National Academy of Sciences of the United States of America, vol. 68, no. 4, pp. 820–823, 1971.

[21] J. F. Crow and M. Kimura, An Introduction to Population Genetics Theory, Harper and Row, New York, NY, USA, 1970.

[22] W. Y. Tan, Stochastic Models With Applications to Genetics, Cancers, AIDS and Other Biomedical Systems, World Scientific, River Edge, NJ, USA, 2002.

[23] W. Y. Tan, C. W. Chen, and L. J. Zhang, "Cancer risk assessment by state space models," in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 13, World Scientific, River Edge, NJ, USA, 2008.

[24] W. Y. Tan and C. W. Chen, "Stochastic modeling of carcinogenesis: some new insights," Mathematical and Computer Modelling, vol. 28, no. 11, pp. 49–71, 1998.

[25] W. Y. Tan and C. W. Chen, "Cancer stochastic models," in Encyclopedia of Statistical Sciences, Revised edition, John Wiley and Sons, New York, NY, USA, 2005.

[26] E. G. Luebeck and S. H. Moolgavkar, "Multistage carcinogenesis and the incidence of colorectal cancer," Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 23, pp. 15095–15100, 2002.

[27] R. Durrett, J. Mayberry, S. Moseley, and D. Schmidt, "Probability models for cancer development and progression," Google Search, Google, 2012.

[28] G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis, Addison-Wesley, Reading, Mass, USA, 1973.


[29] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm (with discussion)," Journal of the Royal Statistical Society B, vol. 39, pp. 1–38, 1977.

[30] A. F. M. Smith and A. E. Gelfand, "Bayesian statistics without tears: a sampling-resampling perspective," American Statistician, vol. 46, pp. 84–88, 1992.

[31] A. E. Loercher and J. W. Harbour, "Molecular genetics of uveal melanoma," Current Eye Research, vol. 27, no. 2, pp. 69–74, 2003.

[32] C. S. Potten, C. Booth, and D. Hargreaves, "The small intestine as a model for evaluating adult tissue stem cell drug targets," Cell Proliferation, vol. 36, no. 3, pp. 115–129, 2003.

[33] C. J. Lynch and J. Milner, "Loss of one p53 allele results in four-fold reduction of p53 mRNA and protein: a basis for p53 haplo-insufficiency," Oncogene, vol. 25, no. 24, pp. 3463–3470, 2006.


(Klebanov et al. [17], Yakovlev and Tsodikov [18], and Fakir et al. [19]); in these cases $T(t)$ is not Markov, so the Markov theory method is not applicable to $T(t)$. To develop analytical results and to resolve many difficult issues, Tan and his associates [4, 5, 24] have proposed an alternative approach through stochastic equations and have followed Yang and Chen [11] in assuming that cancer tumors develop by clonal expansion from primary last stage cells. Through the probability generating function method, Tan and Chen [24] have shown that if the Markov theory is applicable to $T(t)$, then the stochastic equation method is equivalent to the classical Markov theory method but is more powerful. Also, through the stochastic equation method we have shown in the Appendix that the classical approach provides a close approximation to the discrete time model under the assumption that the primary last stage cells develop into a detectable tumor in one time unit. This provides a reasonable explanation of why the traditional approach (see [2, 22]) can still work well even though the Markov assumption for $T(t)$ may not hold. In this paper we will thus basically use the stochastic equation method and assume that cancer tumors develop from primary last stage cells through clonal expansion.

3.1. Stochastic Equations for the State Variables. Assume now that an individual is a $J_i$ ($i = 0, 1, 2$) person at the embryo stage. Then in this individual, cancer is developed by a $(k-i)$-stage multievent model given by $J_i \to \cdots \to J_k \to \text{Tumor}$, and the identifiable response variables are given by $\{\underset{\sim}{X}_i(t)', T(t)\}$. To derive stochastic equations for the staging variables in $\underset{\sim}{X}_i(t)$ in this individual, observe that for each $i = 0, 1, 2$, $\underset{\sim}{X}_i(t)'$ is in general a Markov process although $T(t)$ may not be Markov; see Remark 1, Tan [3], Tan et al. [4, 5], Tan and Zhou [9], and Tan and Yan [16]. It follows that $\underset{\sim}{X}_i(t + \Delta t)'$ derives from $\underset{\sim}{X}_i(t)'$ through stochastic birth-death processes of $J_u$ ($u = i, i+1, \ldots, k-1$) cells and through stochastic transitions $J_u \to J_{u+1}$, $u = i, i+1, \ldots, k-1$, during $(t, t + \Delta t]$. Let $\{B_u(t; i), D_u(t; i), M_u(t; i)\}$ be the number of births of $J_u$ cells, the number of deaths of $J_u$ cells, and the number of transitions from $J_u \to J_{u+1}$ cells during $(t, t + \Delta t]$, respectively, in people who are $J_i$ people at the embryo stage. Let $M_0(t)$ denote the number of transitions from $N \to J_1$ during $(t, t + \Delta t]$. Because the transition $J_u \to J_{u+1}$ would not affect the number of $J_u$ cells but only increase the number of $J_{u+1}$ cells (see Remark 4), by the conservation law we have the following stochastic equations for $J_u(t; i)$, $u = i, \ldots, k-1$, $i = 0, 1, 2$ (see Tan [3], Tan et al. [4, 5, 8], Tan and Zhou [9], and Tan and Yan [16]):

$$\begin{aligned}
J_i(t + \Delta t; i) &= J_i(t; i) + B_i(t; i) - D_i(t; i), \quad i = 0, 1, 2, \\
J_u(t + \Delta t; i) &= J_u(t; i) + B_u(t; i) - D_u(t; i) + M_{u-1}(t; i), \quad i < u \le k-1.
\end{aligned} \tag{2}$$

Because $\{B_v(t; i), D_v(t; i), M_v(t; i),\ i = 0, 1, 2,\ v = i, \ldots, k-1\}$ are random variables, the above equations are basically stochastic equations. To derive probability distributions of these variables, let $b_u(t)$ and $d_u(t)$ denote the birth rate and the death rate at time $t$ of the $J_u$ ($u = 0, 1, \ldots, k-1$) cells, respectively. Let $\beta_u(t)$ ($u = 0, 1, \ldots, k-1$) be the transition rate at time $t$ from $J_u \to J_{u+1}$. Then, as shown in Tan [3], for ($i = 0, 1, 2$, $u = i, \ldots, k-1$) we have, to the order of $o(\Delta t)$,

$$\{B_u(t; i), D_u(t; i), M_u(t; i)\} \mid J_u(t; i) \sim \text{Multinomial}\{J_u(t; i);\ b_u(t) \Delta t,\ d_u(t) \Delta t,\ \beta_u(t) \Delta t\}. \tag{3}$$

It follows that, to the order of $o(\Delta t)$,

$$\begin{aligned}
\{B_u(t; i), D_u(t; i)\} \mid J_u(t; i) &\sim \text{Multinomial}\{J_u(t; i);\ b_u(t) \Delta t,\ d_u(t) \Delta t\}, \\
M_u(t; i) \mid J_u(t; i) &\sim \text{Binomial}\{J_u(t; i);\ \beta_u(t) \Delta t\} \\
&\sim \text{Poisson}\{J_u(t; i)\, \beta_u(t) \Delta t\} + o(\beta_u(t) \Delta t), \\
&\quad \text{independently of } \{B_u(t; i), D_u(t; i)\}, \quad u = 0, 1, \ldots, k-1.
\end{aligned} \tag{4}$$
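The balance equations (2) together with the multinomial draw in (3) can be turned into a small simulation. The following is a minimal sketch using only the Python standard library; the function names, the rates, and the initial counts are hypothetical choices for illustration and are not values from the paper. Transitions out of the last stage are drawn but not tracked further.

```python
import random

def step(J, b, d, beta, dt, rng):
    """Draw (B, D, M) for J cells of one type over one interval of length dt.

    Each cell independently divides (prob b*dt), dies (prob d*dt), produces a
    next-stage cell (prob beta*dt), or does nothing; the totals are therefore
    a multinomial draw of the kind written in (3).
    """
    p_birth, p_death, p_mut = b * dt, d * dt, beta * dt
    assert p_birth + p_death + p_mut <= 1.0, "dt too large for these rates"
    B = D = M = 0
    for _ in range(J):
        x = rng.random()
        if x < p_birth:
            B += 1
        elif x < p_birth + p_death:
            D += 1
        elif x < p_birth + p_death + p_mut:
            M += 1
    return B, D, M

def simulate(J0, b, d, beta, dt, n_steps, seed=0):
    """Propagate the stage counts with the balance equations (2)."""
    rng = random.Random(seed)
    J = list(J0)
    for _ in range(n_steps):
        draws = [step(J[u], b[u], d[u], beta[u], dt, rng) for u in range(len(J))]
        new_J = []
        for u, (B, D, M) in enumerate(draws):
            inflow = draws[u - 1][2] if u > 0 else 0  # M_{u-1}: arrivals from the previous stage
            new_J.append(J[u] + B - D + inflow)       # M_u itself does not reduce J_u (Remark 4)
        J = new_J
    return J

# Hypothetical rates for a three-stage chain, for illustration only.
print(simulate([1000, 0, 0], b=[0.10, 0.10, 0.10], d=[0.09, 0.09, 0.09],
               beta=[0.01, 0.01, 0.01], dt=1.0, n_steps=20, seed=1))
```

Drawing per cell keeps the sketch dependency-free; for realistic cell numbers a vectorized multinomial sampler would be the more practical choice.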

From these distribution results, by subtracting from the random transition variables their conditional means, respectively, we obtain the following stochastic equations for the state variables $\underset{\sim}{X}_i(t)$ ($i = 0, 1, \operatorname{Min}(2, k-1)$):

$$\begin{aligned}
dJ_i(t; i) &= J_i(t + \Delta t; i) - J_i(t; i) = B_i(t; i) - D_i(t; i) \\
&= J_i(t; i)\, \gamma_i(t) \Delta t + e_i(t; i) \Delta t, \quad i = 0, 1, 2, \\
dJ_u(t; i) &= J_u(t + \Delta t; i) - J_u(t; i) = M_{u-1}(t; i) + B_u(t; i) - D_u(t; i) \\
&= \{J_{u-1}(t; i)\, \beta_{u-1}(t) + J_u(t; i)\, \gamma_u(t)\} \Delta t + e_u(t; i) \Delta t, \\
&\quad i = 0, 1, \operatorname{Min}(2, k-1), \quad i < u \le k-1,
\end{aligned} \tag{5}$$

where $\gamma_u(t) = b_u(t) - d_u(t)$ for $u = 0, 1, \ldots, k-1$, and where $e_i(t; i) \Delta t = [B_i(t; i) - J_i(t; i) b_i(t) \Delta t] - [D_i(t; i) - J_i(t; i) d_i(t) \Delta t]$ for $i = 0, 1, 2$, and $e_u(t; i) \Delta t = [B_u(t; i) - J_u(t; i) b_u(t) \Delta t] - [D_u(t; i) - J_u(t; i) d_u(t) \Delta t] + [M_{u-1}(t; i) - J_{u-1}(t; i) \beta_{u-1}(t) \Delta t]$ for $i = 0, 1, \operatorname{Min}(2, k-1)$, $i < u \le k-1$.

From the above equations, by dividing both sides by $\Delta t$ and letting $\Delta t \to 0$, we obtain

$$\begin{aligned}
\frac{d}{dt} J_i(t; i) &= J_i(t; i)\, \gamma_i(t) + e_i(t; i), \quad i = 0, 1, \operatorname{Min}(2, k-1), \\
\frac{d}{dt} J_u(t; i) &= J_u(t; i)\, \gamma_u(t) + J_{u-1}(t; i)\, \beta_{u-1}(t) + e_u(t; i), \\
&\quad \text{for } i = 0, 1, \operatorname{Min}(2, k-1), \quad i < u \le k-1.
\end{aligned} \tag{6}$$

In the above equations, using the distribution results in (4), it can easily be shown that the random noises $\{e_u(t; i),\ u = i, \ldots, k-1,\ i = 0, 1, 2\}$ have expected value zero and are uncorrelated with the staging variables and $T(t)$. The initial conditions at birth ($t_0$) for the above stochastic differential equations are $\{J_u(t_0; i) > 0,\ u = i, i+1;\ J_u(t_0; i) = 0,\ u > i+1\}$.

Given the initial conditions ($J_u(t_0; i) > 0$, $u = i, i+1$) and ($J_u(t_0; i) = 0$, $u > i+1$) at birth ($t_0$), the solution of the equations in (6) is given, respectively, by

$$\begin{aligned}
J_i(t; i) &= J_i(t_0; i)\, e^{\int_{t_0}^{t} \gamma_i(x)\, dx} + \eta_i(t; i), \quad i = 0, 1, \operatorname{Min}(2, k-1), \\
J_u(t; i) &= J_u(t_0; i)\, e^{\int_{t_0}^{t} \gamma_u(x)\, dx} + \int_{t_0}^{t} J_{u-1}(x; i)\, \beta_{u-1}(x)\, e^{\int_{x}^{t} \gamma_u(y)\, dy}\, dx + \eta_u(t; i) \\
&= \cdots = J_u(t_0; i)\, e^{\int_{t_0}^{t} \gamma_u(x)\, dx} + \sum_{v=1}^{u-i} J_{u-v}(t_0; i)\, \phi_u^{(v)}(t; i) + \sum_{v=1}^{u+1-i} \eta_u^{(v)}(t; i), \\
&\quad u = i+1, \ldots, k-1, \quad \text{where } i = 0 \text{ if } k = 2; \ i = 0, 1 \text{ if } k = 3; \ i = 0, 1, 2 \text{ if } k > 3,
\end{aligned} \tag{7}$$

where

$$\begin{aligned}
\phi_u^{(1)}(t; i) &= \int_{t_0}^{t} e^{\int_{x}^{t} \gamma_u(y)\, dy + \int_{t_0}^{x} \gamma_{u-1}(y)\, dy}\, \beta_{u-1}(x)\, dx, \quad u = i, \ldots, k-1, \\
\phi_u^{(v)}(t; i) &= \int_{t_0}^{t} e^{\int_{x}^{t} \gamma_u(y)\, dy}\, \beta_{u-1}(x)\, \phi_{u-1}^{(v-1)}(x; i)\, dx, \quad v = 2, \ldots, u-i, \\
\eta_u(t; i) &= \eta_u^{(1)}(t; i) = \int_{t_0}^{t} e^{\int_{x}^{t} \gamma_u(y)\, dy}\, e_u(x; i)\, dx, \quad u = i, \ldots, k-1, \\
\eta_u^{(v)}(t; i) &= \int_{t_0}^{t} e^{\int_{x}^{t} \gamma_u(y)\, dy}\, \beta_{u-1}(x)\, \eta_{u-1}^{(v-1)}(x; i)\, dx, \quad v = 2, \ldots, u-i.
\end{aligned} \tag{8}$$

If the model is time homogeneous so that $\beta_u(t) = \beta_u$, $b_u(t) = b_u$, $d_u(t) = d_u$, $\gamma_u(t) = b_u - d_u = \gamma_u$, $u = 0, 1, \ldots, k-1$, and if $\gamma_i \ne \gamma_u$ if $i \ne u$, the above solutions under the initial conditions ($J_{i+u}(t_0; i) > 0$, $u = 0, 1$; $J_{i+u}(t_0; i) = 0$, $u > 1$) then reduce, respectively, to

$$\begin{aligned}
J_i(t; i) &= J_i(t_0; i)\, e^{\gamma_i (t - t_0)} + \eta_i^{(1)}(t; i), \quad i = 0, 1, \operatorname{Min}(2, k-1), \\
J_u(t; i) &= J_u(t_0; i)\, e^{\gamma_u (t - t_0)} + \beta_{u-1} \int_{t_0}^{t} J_{u-1}(x; i)\, e^{\gamma_u (t - x)}\, dx + \eta_u^{(1)}(t; i) \\
&= \cdots = J_u(t_0; i)\, e^{\gamma_u (t - t_0)} + \sum_{r=i}^{u-1} J_r(t_0; i) \Big( \prod_{v=r}^{u-1} \beta_v \Big) \sum_{l=r}^{u} A_{ru}(l)\, e^{\gamma_l (t - t_0)} + \sum_{r=i}^{u} \eta_u^{(u+1-r)}(t; i) \\
&= \sum_{r=i}^{i+1} J_r(t_0; i) \Big( \prod_{v=r}^{u-1} \beta_v \Big) \sum_{l=r}^{u} A_{ru}(l)\, e^{\gamma_l (t - t_0)} + \sum_{r=i}^{u} \eta_u^{(u+1-r)}(t; i), \quad i < u \le k-1,
\end{aligned} \tag{9}$$

where, for $i \le v \le u$,

$$A_{iu}(v) = \begin{cases} 1, & \text{if } i = u, \\ \displaystyle \prod_{l=i,\ l \ne v}^{u} (\gamma_l - \gamma_v)^{-1}, & \text{if } i < u. \end{cases} \tag{10}$$

Obviously, $E[\eta_u^{(r)}(t; i)] = 0$ for all ($i = 0, 1, 2$, $u = i, \ldots, k-1$, $r = 1, \ldots, u-i+1$). It follows that for ($i = 0, 1, \operatorname{Min}(k-1, 2)$), the expected values $E[J_{k-1}(t; i)]$ of $J_{k-1}(t; i)$ for homogeneous models with $\gamma_i \ne \gamma_u$ if $i \ne u$ are given by

$$\begin{aligned}
E[J_{k-1}(t; i)] &= E[J_{i+1}(t_0; i)] \Big( \prod_{u=i+1}^{k-2} \beta_u \Big) \sum_{v=i+1}^{k-1} A_{(i+1)(k-1)}(v)\, e^{\gamma_v (t - t_0)} \\
&\quad + E[J_i(t_0; i)] \Big( \prod_{u=i}^{k-2} \beta_u \Big) \sum_{v=i}^{k-1} A_{i(k-1)}(v)\, e^{\gamma_v (t - t_0)}, \\
&\quad i = 0, 1, \operatorname{Min}(k-1, 2), \quad k \ge 2,
\end{aligned} \tag{11}$$

where, as a convention, $\sum_{j=i+1}^{i} c_j = 0$ and $\prod_{j=i+1}^{i} d_j = 1$ for all ($c_j$, $d_j$).
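A closed form of this kind can be cross-checked against a direct numerical solution of the mean equations, which follow from (6) by taking expectations with $E[e_u(t; i)] = 0$: $\frac{d}{dt} E[J_u] = \gamma_u E[J_u] + \beta_{u-1} E[J_{u-1}]$. The following is a minimal sketch with hypothetical rates and initial counts (none taken from the paper), with stages re-indexed from 0 so that $i = 0$. One assumption should be stated plainly: the coefficient is computed here as $\prod_{l \ne v} (\gamma_v - \gamma_l)^{-1}$, because that is the ordering for which the closed form agrees with the direct integration in this check; it coincides with the product written in (10) whenever $u - i$ is even.

```python
from math import exp, prod

def A(gam, r, u, v):
    """Coefficient of exp(gam[v] * tau) for the path J_r -> ... -> J_u.

    Assumed form: product over l = r..u, l != v, of 1 / (gam[v] - gam[l]).
    An empty product (r == u) gives 1.
    """
    return prod(1.0 / (gam[v] - gam[l]) for l in range(r, u + 1) if l != v)

def mean_last_stage(J0, gam, beta, tau):
    """Closed-form E[J_{k-1}] after elapsed time tau for homogeneous rates."""
    last = len(gam) - 1
    total = 0.0
    for r, Jr0 in enumerate(J0):
        if Jr0 == 0:
            continue
        path = prod(beta[r:last])  # beta_r * ... * beta_{k-2}; 1 if r == last
        total += Jr0 * path * sum(A(gam, r, last, v) * exp(gam[v] * tau)
                                  for v in range(r, last + 1))
    return total

def mean_by_rk4(J0, gam, beta, tau, n=2000):
    """Integrate dm_u/dt = gam_u * m_u + beta_{u-1} * m_{u-1} with classical RK4."""
    def f(m):
        return [gam[u] * m[u] + (beta[u - 1] * m[u - 1] if u > 0 else 0.0)
                for u in range(len(m))]
    m, h = list(J0), tau / n
    for _ in range(n):
        s1 = f(m)
        s2 = f([a + 0.5 * h * b for a, b in zip(m, s1)])
        s3 = f([a + 0.5 * h * b for a, b in zip(m, s2)])
        s4 = f([a + h * b for a, b in zip(m, s3)])
        m = [a + h / 6.0 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(m, s1, s2, s3, s4)]
    return m[-1]

# Hypothetical three-stage example with cells initially in the first two stages.
gam = [0.02, 0.05, 0.01]
beta = [1e-3, 2e-3]
J0 = [1e6, 1e3, 0.0]
closed = mean_last_stage(J0, gam, beta, 10.0)
numeric = mean_by_rk4(J0, gam, beta, 10.0)
print(closed, numeric)
```

The distinct-rates condition ($\gamma_i \ne \gamma_u$ for $i \ne u$) matters in practice as well: nearly equal rates make the coefficients large and of opposite sign, so the closed form loses precision through cancellation, whereas the direct integration does not.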

Remark 4. Because genetic changes and epigenetic changes occur during cell division, to the order of $o(\Delta t)$, the probability is $\beta_u(t) \Delta t$ that one $J_u$ cell at time $t$ would give rise to 1 $J_u$ cell and 1 $J_{u+1}$ cell at time $t + \Delta t$ by genetic changes or epigenetic changes. It follows that the transition $J_u \to J_{u+1}$ would not affect the population size of $J_u$ cells but only increase the size of the $J_{u+1}$ population.


32 Transition Probabilities and Probability Distributions ofStaging Variables Let 119891(119909 119899 119901) denote the probabilitydensity function of a binomial random variable 119883 sim

Binomial(119899 119901) ℎ(119909 120582) the probability density function ofa Poisson random variable 119883 sim Poisson(120582) and 119892(119909 119910 119899

1199011 119901

2) the probability density function of a bivariate multi-

nomial random vector (119883 119884)simMultinomial(119899 1199011 119901

2) Using

the stochastic equations of the staging variables given by(2) and using the probability distributions of the transitionvariables 119861

119903(119905 119894) 119863

119903(119905 119894)119872

119903(119905 119894) in (4) as inTan et al [4 5]

we obtain the following transition probabilities of 119869119903(119905 +

Δ119905 119894) 119903 + 119894 119896 minus 1 given 119869119903(119905 119894) 119903 = 119894 119896 minus 1 for

(119894 = 0 1 Min(119896 minus 1 2))

$$
\begin{aligned}
&P\{J_r(t+\Delta t; i) = v_r,\ r = i, i+1, \ldots, k-1 \mid J_r(t; i) = u_r,\ r = i, i+1, \ldots, k-1\} \\
&\quad = P\{J_i(t+\Delta t; i) = v_i \mid J_i(t; i) = u_i\} \\
&\qquad \times \prod_{j=i+1}^{k-1} P\{J_j(t+\Delta t; i) = v_j \mid J_j(t; i) = u_j,\ J_{j-1}(t; i) = u_{j-1}\},
\end{aligned}
\tag{12}
$$

where

$$
\begin{aligned}
P\{J_i(t+\Delta t; i) = v_i \mid J_i(t; i) = u_i\}
&= \sum_{r=0}^{u_i} f(r; u_i, b_i(t)\Delta t) \\
&\quad \times f\!\left(u_i - v_i + r;\ u_i - r,\ \frac{d_i(t)\Delta t}{1 - b_i(t)\Delta t}\right), \quad i = 0, 1, \ldots, \text{Min}(k-1, 2), \\
P\{J_j(t+\Delta t; i) = v_j \mid J_j(t; i) = u_j,\ J_{j-1}(t; i) = u_{j-1}\}
&= \sum_{r_1=0}^{u_j} \sum_{r_2=0}^{u_j - r_1} g(r_1, r_2; u_j, b_j(t)\Delta t, d_j(t)\Delta t) \\
&\quad \times h(v_j - u_j - r_1 + r_2;\ u_{j-1}\beta_{j-1}\Delta t), \quad j > i.
\end{aligned}
\tag{13}
$$
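The first line of (13) can be evaluated directly as a convolution of two binomial densities. The following is a minimal sketch, not the authors' code; the function names and the numerical values of $u_i$, $b_i\Delta t$, and $d_i\Delta t$ are assumptions chosen only for illustration.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial(n, p) probability of x; zero outside 0..n."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1.0 - p)**(n - x)

def step_prob_first_stage(v, u, b_dt, d_dt):
    """P{J_i(t+dt) = v | J_i(t) = u}, first line of (13).

    r is the number of births among the u cells; the remaining u - r cells
    each die with probability d_dt / (1 - b_dt).
    """
    q = d_dt / (1.0 - b_dt)
    return sum(
        binom_pmf(r, u, b_dt) * binom_pmf(u - v + r, u - r, q)
        for r in range(u + 1)
    )

# Hypothetical values, for illustration only.
u, b_dt, d_dt = 5, 0.02, 0.01
probs = [step_prob_first_stage(v, u, b_dt, d_dt) for v in range(2 * u + 1)]
total = sum(probs)                              # should be 1
mean = sum(v * p for v, p in enumerate(probs))  # should be u * (1 + b_dt - d_dt)
```

Under these assumptions the probabilities over $v = 0, \ldots, 2u_i$ sum to one and the conditional mean is $u_i\{1 + [b_i(t) - d_i(t)]\Delta t\}$, which gives a quick consistency check on the formula.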

Define the unobservable transition variables $\tilde{U}_i(t) = \{B_i(t; i), (B_j(t; i), D_j(t; i)),\ j = i+1, \ldots, k-1\}'$ $(i = 0, 1, \text{Min}(k-1, 2))$. Then we have for the joint probability density function of $\{\tilde{X}_i(t+\Delta t), \tilde{U}_i(t)\}$ given $\tilde{X}_i(t)$:

$$
P\{\tilde{X}_i(t+\Delta t), \tilde{U}_i(t) \mid \tilde{X}_i(t)\} = P\{\tilde{X}_i(t+\Delta t) \mid \tilde{U}_i(t), \tilde{X}_i(t)\} \times P\{\tilde{U}_i(t) \mid \tilde{X}_i(t)\},
\tag{14}
$$

where

$$
\begin{aligned}
P\{\tilde{X}_i(t+\Delta t) \mid \tilde{U}_i(t), \tilde{X}_i(t)\}
&= f\!\left(J_i(t; i) - J_i(t+\Delta t; i) + B_i(t; i);\ J_i(t; i) - B_i(t; i),\ \frac{d_i(t)\Delta t}{1 - b_i(t)\Delta t}\right) \\
&\quad \times \prod_{j=i+1}^{k-1} h\{J_j(t+\Delta t; i) - J_j(t; i) - B_j(t; i) + D_j(t; i);\ J_{j-1}(t; i)\beta_{j-1}(t)\Delta t\},
\end{aligned}
\tag{15}
$$

$$
\begin{aligned}
P\{\tilde{U}_i(t) \mid \tilde{X}_i(t)\}
&= f(B_i(t; i);\ J_i(t; i),\ b_i(t; i)\Delta t) \\
&\quad \times \prod_{j=i+1}^{k-1} g\{B_j(t; i), D_j(t; i);\ J_j(t; i),\ b_j(t; i)\Delta t,\ d_j(t; i)\Delta t\}.
\end{aligned}
\tag{16}
$$

Let $\tilde{e}_i(j)$ be a $(k-i) \times 1$ column vector with 1 in the $j$th position $(1 \le j \le k-i)$ and with 0 in other positions. Let $\tilde{u} = (u_i, \ldots, u_{k-1})'$ and $\tilde{v} = (v_i, \ldots, v_{k-1})'$ be $(k-i) \times 1$ column vectors of nonnegative integers (i.e., $u_j$ and $v_j$ are nonnegative integers). Then, by using the probability distribution results in (14)-(16), it can readily be shown that

$$
P\{\tilde{X}_i(t+\Delta t) = \tilde{v} \mid \tilde{X}_i(t) = \tilde{u}\} =
\begin{cases}
[u_j b_j(t) + (1 - \delta_{ji}) u_{j-1}\beta_{j-1}(t)]\Delta t + o(\Delta t), & \text{if } \tilde{v} = \tilde{u} + \tilde{e}_i(j),\ j = i, \ldots, k-1, \\
u_j d_j(t)\Delta t + o(\Delta t), & \text{if } \tilde{v} = \tilde{u} - \tilde{e}_i(j),\ j = i, \ldots, k-1, \\
o(\Delta t), & \text{if } |\tilde{1}'_{n-k}(\tilde{u} - \tilde{v})| \ge 2.
\end{cases}
\tag{17}
$$

The above results imply that $\tilde{X}_i(t)$ is a $(k-i)$-dimensional birth-death process with birth rates $\{i b_u(t),\ u = i, \ldots, k-1\}$, death rates $\{i d_u(t),\ u = i, \ldots, k-1\}$, and cross-transition rates $\{\alpha_{u,u+1}(j, t) = j\beta_u(t),\ u = i, \ldots, k-1;\ \beta_{u,v}(j, t) = 0 \text{ if } v \ne u+1\}$. (See Definition 4.1 in Tan ([22], Chapter 4).) Using these results, it can be shown that the Kolmogorov forward equation for the probabilities $P\{J_j(t; i) = u_j,\ j = i, \ldots, k-1 \mid J_i(0) = m_i,\ J_j(0) = 0,\ j > i\} = P(u_j,\ j = i, \ldots, k-1;\ t)$ $(i = 0, 1, \text{Min}(k-1, 2))$ in the above model is given by

$$
\begin{aligned}
\frac{d}{dt} P(u_j,\ j = i, \ldots, k-1;\ t)
&= P(u_i - 1, u_j,\ j = i+1, \ldots, k-1;\ t)(u_i - 1) b_i(t) \\
&\quad + \sum_{j=i+1}^{k-1} P(u_i, u_{i+1}, \ldots, u_{j-1}, u_j - 1, u_{j+1}, \ldots, u_{k-1};\ t)(u_j - 1) b_j(t) \\
&\quad + \sum_{j=i}^{k-2} P(u_i, u_{i+1}, \ldots, u_j, u_{j+1} - 1, u_{j+2}, \ldots, u_{k-1};\ t)\, u_j \beta_j(t) \\
&\quad + \sum_{j=i}^{k-1} P(u_i, u_{i+1}, \ldots, u_{j-1}, u_j + 1, u_{j+1}, \ldots, u_{k-1};\ t)(u_j + 1) d_j(t) \\
&\quad - P(u_i, u_{i+1}, \ldots, u_{k-1};\ t) \left\{ \sum_{j=i}^{k-1} u_j [b_j(t) + d_j(t)] + \sum_{j=i}^{k-2} u_j \beta_j(t) \right\},
\end{aligned}
\tag{18}
$$

for $u_j = 0, 1, \ldots, \infty$, $j = i, \ldots, k-1$.

By using the above set of differential equations, one can readily compute the probabilities $P\{J_j(t) = u_j,\ j = i, \ldots, k-1 \mid I_i(0) = m_i\} = P(u_j,\ j = i, \ldots, k-1;\ t)$ numerically.
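As a hedged illustration of such a numerical computation, the sketch below integrates (18) in the simplest special case of a single compartment ($k - 1 = i$, so the $\beta_j$ sums are empty) with constant rates. The explicit Euler scheme, the truncation level `n_max`, the step size, and the rate values are all assumptions made for this example and are not taken from the paper.

```python
from math import exp

def forward_birth_death(m, b, d, t_end, n_max=60, dt=1e-3):
    """Euler integration of the forward equation for one compartment.

    State u = 0..n_max is the number of cells; m is the initial count.
    Truncating at n_max is an approximation that is harmless when the
    probability of reaching n_max is negligible.
    """
    p = [0.0] * (n_max + 1)
    p[m] = 1.0
    for _ in range(int(round(t_end / dt))):
        new = p[:]
        for u in range(n_max + 1):
            inflow = 0.0
            if u >= 1:
                inflow += p[u - 1] * (u - 1) * b   # birth from state u - 1
            if u < n_max:
                inflow += p[u + 1] * (u + 1) * d   # death from state u + 1
            new[u] = p[u] + dt * (inflow - p[u] * u * (b + d))
        p = new
    return p

# Hypothetical rates, for illustration only.
m, b, d, t_end = 2, 0.3, 0.1, 1.0
p = forward_birth_death(m, b, d, t_end)
total = sum(p)
mean = sum(u * pu for u, pu in enumerate(p))
expected_mean = m * exp((b - d) * t_end)   # mean of a linear birth-death process
```

The computed mean can be compared with the known value $m\,e^{(b-d)t}$ for a linear birth-death process. For the full multi-compartment system a stiff ODE solver or a simulation approach would likely be preferable to this simple scheme.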

3.3. The Probability Distributions of the Number of Detectable Tumors and Times to Tumors. As shown by Yang and Chen [11], malignant cancer tumors arise from primary $J_k$ cells by clonal expansion, where primary $J_k$ cells are $J_k$ cells generated directly by $J_{k-1}$ cells ($J_k$ cells derived by stochastic birth of other $J_k$ cells are not primary $J_k$ cells). That is, cancer tumors develop from primary $J_k$ cells through stochastic birth-death processes.

To derive the probability distribution for $T(t)$ in $J_i$ people in the population, let $P_T(s, t)$ denote the probability that a primary cancer cell at time $s$ develops into a detectable cancer tumor by time $t$. (An explicit formula for $P_T(s, t)$ has been given in Tan [22], Chapter 8, and in Tan and Chen [24].) Then, as shown in Tan ([3, 22], Chapter 8), the conditional probability distribution of $T(t)$ given $\{J_{k-1}(s; i),\ s \le t\}$ in $J_i$ people is Poisson with mean $\omega(t; i)$, where $\omega(t; i) = \int_{t_0}^{t} J_{k-1}(s; i)\beta_{k-1}(s) P_T(s, t)\, ds$. That is,

$$
T(t) \mid \{J_{k-1}(s; i),\ s \le t\} \sim \text{Poisson}(\omega(t; i)).
\tag{19}
$$

Let $Q_i(j)$ be the probability that cancer tumors develop during $(t_{j-1}, t_j]$ in $J_i$ people in the population. For time homogeneous models with small $\beta_{k-1}$, $Q_i(j)$ is then given by

$$
Q_i(j) = E\{e^{-\omega(t_{j-1}; i)} - e^{-\omega(t_j; i)}\} = e^{-\beta_{k-1} H_i(t_{j-1})} - e^{-\beta_{k-1} H_i(t_j)} + o(\beta_{k-1}),
\tag{20}
$$

where $H_i(t) = \int_{t_0}^{t} E[J_{k-1}(x; i)] P_T(x, t)\, dx$.

To derive $Q_i(j)$, denote by

$$
\begin{aligned}
\theta_{i(k-1)} &= E[J_i(t_0; i-1)]\, \beta_i \prod_{u=i+1}^{k-1} \left(\frac{\beta_u}{\gamma_u}\right), \quad i = 1, \ldots, \text{Min}(3, k-1), \\
\lambda_{u(k-1)} &= E[J_u(t_0; u)]\, \beta_u \prod_{v=u+1}^{k-1} \left(\frac{\beta_v}{\gamma_v}\right), \quad u = 0, 1, \ldots, \text{Min}(2, k-1),
\end{aligned}
\tag{21}
$$

and define the functions

$$
\psi_{i(k-1)}(t) = \left\{\prod_{u=i+1}^{k-1} \gamma_u\right\} \sum_{v=i}^{k-1} A_{i(k-1)}(v) \int_{t_0}^{t} e^{\gamma_v(x - t_0)} P_T(x, t)\, dx, \quad i = 0, 1, \ldots, \text{Min}(2, k-1).
\tag{22}
$$

Applying the results for $E[J_{k-1}(t; i)]$ given in (11) for time homogeneous models with $\gamma_i \ne \gamma_j$ if $i \ne j$, we obtain the $Q_i(j)$'s as follows.

(1) If $k = 2$, then $k - 1 = 1$. Hence $Q_2(0) = 1$, $Q_i(0) = Q_2(j) = 0$ for $(i = 0, 1;\ j > 0)$, and for $j > 0$:

$$
Q_0(j) = e^{-\theta_{11}\psi_{11}(t_{j-1}) - \lambda_{01}\psi_{01}(t_{j-1})} - e^{-\theta_{11}\psi_{11}(t_j) - \lambda_{01}\psi_{01}(t_j)} + o(\beta_1),
\tag{23}
$$

$$
Q_1(j) = (1 - \alpha_1)\{e^{-\lambda_{11}\psi_{11}(t_{j-1})} - e^{-\lambda_{11}\psi_{11}(t_j)}\} + o(\beta_1).
\tag{24}
$$

(2) If $k \ge 3$, then we have $Q_i(0) = 0$ for $i = 0, 1$ and $Q_2(0) = \delta_{2k}\alpha$, and for $j > 0$:

$$
Q_0(j) = e^{-\theta_{1(k-1)}\psi_{1(k-1)}(t_{j-1}) - \lambda_{0(k-1)}\psi_{0(k-1)}(t_{j-1})} - e^{-\theta_{1(k-1)}\psi_{1(k-1)}(t_j) - \lambda_{0(k-1)}\psi_{0(k-1)}(t_j)} + o(\beta_{k-1}),
\tag{25}
$$

$$
Q_1(j) = e^{-\theta_{2(k-1)}\psi_{2(k-1)}(t_{j-1}) - \lambda_{1(k-1)}\psi_{1(k-1)}(t_{j-1})} - e^{-\theta_{2(k-1)}\psi_{2(k-1)}(t_j) - \lambda_{1(k-1)}\psi_{1(k-1)}(t_j)} + o(\beta_{k-1}),
\tag{26}
$$

$$
\begin{aligned}
Q_2(j) &= \delta_{2k}(1 - \alpha)\{e^{-\lambda_{22}\psi_{22}(t_{j-1})} - e^{-\lambda_{22}\psi_{22}(t_j)}\} \\
&\quad + (1 - \delta_{2k})\{e^{-\theta_{3(k-1)}\psi_{3(k-1)}(t_{j-1}) - \lambda_{2(k-1)}\psi_{2(k-1)}(t_{j-1})} - e^{-\theta_{3(k-1)}\psi_{3(k-1)}(t_j) - \lambda_{2(k-1)}\psi_{2(k-1)}(t_j)}\} \\
&\quad + o(\beta_{k-1}),
\end{aligned}
\tag{27}
$$

where $\delta_{2k} = 1$ if $k = 2$ and $= 0$ if $k \ne 2$.

Notice that if $\gamma_0 = 0$, then $\psi_{0(k-1)}(t)$ reduces to

$$
\begin{aligned}
\psi_{0(k-1)}(t) &= \left(\prod_{u=1}^{k-1} \gamma_u\right) \sum_{v=0}^{k-1} A_{0(k-1)}(v) \int_{t_0}^{t} e^{\gamma_v(x - t_0)} P_T(x, t)\, dx \\
&= \left(\prod_{u=1}^{k-1} \gamma_u\right) \sum_{v=1}^{k-1} A_{1(k-1)}(v) \int_{t_0}^{t} [e^{\gamma_v(x - t_0)} - 1] P_T(x, t)\, dx.
\end{aligned}
\tag{28}
$$

Notice also that if $P_T(s, t) \approx 1$ for $t > s$ and if $\gamma_0 = 0$, then the above $\psi_{ij}(t)$'s reduce, respectively, to

$$
\begin{aligned}
\psi_{ii}(t) &= \frac{1}{\gamma_i}\{e^{\gamma_i(t - t_0)} - 1\}, \quad i = 1, \ldots, k-1, \\
\psi_{0(k-1)}(t) &= \left(\prod_{u=1}^{k-1} \gamma_u\right) \sum_{v=1}^{k-1} A_{1(k-1)}(v) \frac{1}{\gamma_v^2}\{e^{\gamma_v(t - t_0)} - 1 - (t - t_0)\gamma_v\}, \\
\psi_{i(k-1)}(t) &= \left(\prod_{u=i+1}^{k-1} \gamma_u\right) \sum_{v=i}^{k-1} A_{i(k-1)}(v) \frac{1}{\gamma_v}\{e^{\gamma_v(t - t_0)} - 1\}, \quad i = 1, \ldots, k-1.
\end{aligned}
\tag{29}
$$

4. Probability Distribution of Observed Cancer Incidence Incorporating Hereditary Cancer Cases

To estimate unknown parameters and to validate the model, one needs real data generated from the model. For studies of carcinogenesis, such data are usually given by cancer incidence. For example, in the SEER data of NCI/NIH of the USA, the data are given by $\{(y_0, n_0), (y_j, n_j),\ j = 1, \ldots, t_N\}$, where $y_0$ is the number of cancer cases at birth and $n_0$ the total number of births, and where, for $j \ge 1$, $y_j$ is the number of cancer cases developed during the $j$th age group of a one-year period (or 5-year periods) and $n_j$ is the number of noncancer people who are at risk for cancer and from whom $y_j$ have developed cancer during the $j$th age group. Given in Table 1 are the SEER data of uveal melanoma (adult eye cancer) during the period 1973-2007. In Table 1, notice that there are some cancer cases at birth, implying some inherited cancer cases. In this section, we will develop a statistical model for these types of data sets from the stochastic multistage model with hereditary cancers as given in Section 2. As in previous sections, let $n_{ij}$ be the number of individuals who have genotype $J_i$ $(i = 0, 1, 2)$ at the embryo stage among the $n_j$ people at risk for the cancer in question. Then, as shown above, $(n_{1j}, n_{2j}) \mid n_j \sim \text{Multinomial}\{n_j; p_1, p_2\}$. It follows that $n_{ij} \mid n_j \sim \text{Binomial}\{n_j, p_i\}$, $i = 0, 1, 2$. In what follows, we let $Y_j$ denote the random variable for $y_j$ unless otherwise stated.

4.1. The Probability Distribution of $Y_0$. As shown in Figure 4, $J_i$ $(i = 0, 1, 2)$ people would only generate $J_i$ stage cells and $J_{i+1}$ stage cells at birth. Thus, for cancers to develop at or before birth, the number of stages for the stochastic model of carcinogenesis must be 3 or less. It follows that if $y_0 > 0$, the appropriate model of carcinogenesis must be either a 2-stage model or a 3-stage model. Since $n_{20} \mid n_0 \sim \text{Binomial}(n_0, p_2)$ and $n_{10} \mid (n_0, n_{20}) \sim \text{Binomial}(n_0, p_1) + o(p_2)$, the probability distribution of $Y_0$ is therefore

$$
Y_0 \sim \text{Poisson}(\chi_k), \quad k = 2, 3,
\tag{30}
$$

where

$$
\chi_k =
\begin{cases}
n_0(p_2 + p_1\alpha), & \text{if } k = 2, \\
n_0 p_2 \alpha, & \text{if } k = 3.
\end{cases}
\tag{31}
$$

The expected number of $Y_0$ given $n_0$ is $E(Y_0 \mid n_0) = n_0(p_2 + p_1\alpha) = \chi_2$ if $k = 2$ and $E(Y_0 \mid n_0) = n_0 p_2 \alpha = \chi_3$ if $k = 3$. Hence, for the 2-stage model (i.e., $k = 2$) or the 3-stage model (i.e., $k = 3$), the maximum likelihood estimate of $\chi_k$ is $\hat{\chi}_k = y_0$, and the deviance $D_0(k)$ from the conditional probability distribution of $Y_0$ given $n_0$ is

$$
D_0(k) = -2\{\log h(y_0; \chi_k) - \log h(y_0; \hat{\chi}_k)\} = 2\left\{\chi_k - y_0 - y_0 \log\frac{\chi_k}{y_0}\right\}, \quad k = 2, 3.
\tag{32}
$$
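The two forms of the deviance in (32) can be checked against each other numerically. The sketch below is illustrative only: the function names are hypothetical, $y_0 = 34$ is the observed count at age 0 in Table 1, and the trial value $\chi_k = 36$ (the Model-F predicted count at age 0 in Table 1) is used here merely as an example input.

```python
from math import lgamma, log

def log_poisson_pmf(y, lam):
    """log h(y; lam) for a Poisson density with mean lam > 0."""
    return y * log(lam) - lam - lgamma(y + 1)

def deviance_at_birth(y0, chi):
    """Closed form of D_0(k) in (32); requires y0 > 0 and chi > 0."""
    return 2.0 * (chi - y0 - y0 * log(chi / y0))

y0 = 34            # observed cancer cases at birth (Table 1, age group 0)
chi_trial = 36.0   # trial value of chi_k, assumed for illustration
d0 = deviance_at_birth(y0, chi_trial)

# Same quantity from the definition: -2 {log h(y0; chi) - log h(y0; chi_hat)},
# with chi_hat = y0 the maximum likelihood estimate.
d0_from_pmf = -2.0 * (log_poisson_pmf(y0, chi_trial) - log_poisson_pmf(y0, float(y0)))
```

The deviance is zero at $\chi_k = y_0$ and positive elsewhere, so it measures how far a fitted $\chi_k$ lies from the saturated fit.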

Table 1: The SEER incidence data (1973-2007) of uveal melanoma from NCI/NIH (over all races and genders).

Age group   Number of people at risk   Observed incidence   Model-F predicted   Model-1 predicted   Two-stage predicted
0           12495777                   34                   36                  36                  38
1           12221582                   20                   11                  11                  16
2           12120990                   14                   7                   8                   17
3           12112995                   9                    5                   6                   18
4           12146174                   4                    4                   5                   20
5           12161336                   3                    3                   4                   22
6           12111854                   2                    2                   4                   25
7           12160452                   2                    2                   4                   28
8           11942586                   2                    2                   4                   30
9           12381299                   1                    2                   5                   34
10          12512703                   2                    3                   6                   38
11          12410338                   5                    3                   6                   41
12          12449244                   2                    3                   6                   44
13          12527781                   5                    4                   7                   48
14          12602883                   3                    5                   7                   51
15          12719598                   5                    5                   8                   55
16          12766107                   7                    6                   9                   59
17          12831400                   9                    7                   9                   62
18          12382047                   8                    7                   9                   63
19          12581638                   10                   8                   10                  68
20          12636509                   7                    9                   11                  71
21          12682601                   6                    10                  10                  75
22          12840510                   12                   11                  11                  79
23          13075528                   17                   13                  13                  84
24          13358635                   16                   15                  14                  89
25          13473849                   12                   16                  16                  94
26          13426340                   17                   18                  18                  97
27          13525264                   28                   20                  20                  101
28          13149674                   17                   22                  21                  102
29          13812811                   23                   25                  25                  110
30          13886874                   24                   28                  27                  114
31          13488332                   37                   30                  29                  115
32          13460286                   32                   32                  32                  118
33          13256067                   38                   35                  35                  119
34          13428827                   37                   39                  38                  124
35          13220037                   40                   41                  41                  126
36          12870265                   30                   44                  44                  126
37          12689592                   43                   47                  47                  127
38          12157014                   42                   49                  49                  125
39          12494081                   46                   55                  54                  131
40          12272125                   49                   58                  58                  132
41          11826573                   56                   61                  60                  130
42          11663153                   54                   65                  64                  131
43          11407082                   53                   68                  68                  131
44          11296848                   70                   73                  72                  133
45          11016369                   57                   76                  76                  132
46          10651593                   71                   79                  79                  130
47          10475708                   87                   84                  83                  131
48          9994684                    82                   86                  85                  127
49          10138908                   78                   93                  93                  131
50          9836359                    87                   97                  96                  130
51          9475641                    95                   100                 99                  127
52          9250985                    113                  104                 104                 126
53          9027382                    106                  108                 108                 125
54          8883737                    117                  113                 113                 125
55          8547883                    129                  116                 116                 123
56          8279648                    107                  119                 119                 121
57          8062368                    119                  123                 123                 119
58          7654610                    132                  124                 124                 115
59          7563706                    118                  130                 129                 115
60          7232719                    131                  131                 130                 112
61          6927332                    116                  132                 132                 109
62          6708273                    133                  134                 134                 107
63          6543931                    143                  138                 137                 106
64          6404652                    130                  141                 141                 105
65          6168486                    145                  142                 142                 102
66          5913479                    138                  142                 142                 99
67          5746766                    151                  144                 144                 98
68          5480517                    147                  142                 142                 94
69          5363912                    149                  144                 144                 94
70          5110728                    136                  142                 142                 90
71          4925076                    144                  141                 141                 88
72          4696825                    140                  138                 138                 85
73          4512136                    146                  135                 135                 82
74          4345300                    126                  133                 133                 80
75          4148801                    122                  128                 129                 78
76          3900900                    124                  122                 122                 74
77          3681587                    114                  116                 116                 70
78          3481918                    93                   110                 110                 67
79          3243631                    102                  102                 102                 63
80          2961234                    79                   92                  93                  58
81          2724984                    93                   83                  84                  54
82          2495219                    82                   75                  75                  50
83          2271595                    92                   66                  67                  46
84          2041351                    64                   57                  58                  42
85          10466605                   304                  282                 285                 216

Note 1: for age 0 and 1-10 years old, the cancer incidences are derived by subtracting the incidence of retinoblastoma from the original SEER data (see Tan and Zhou [9]).
Note 2: the observed uveal melanoma incidence rates per $10^6$ individuals are derived from the SEER eye cancer incidence by subtracting retinoblastoma incidence as given in Tan and Zhou [9].

4.2. The Probability Distribution of $Y_j$ $(j \ge 1)$. To derive the probability distribution of $Y_j$ $(j \ge 1)$ in the $j$th age group, let $Y_{ij}$ $(i = 0, 1, 2)$ be the number of cancer cases generated by people who have genotype $J_i$ at the embryo stage among these $Y_j$ cancer cases. Then $Y_j = \sum_{i=0}^{2} Y_{ij}$, and $Y_{0j}$ is the number of cancer cases generated by the $n_{0j} = n_j - n_{1j} - n_{2j}$ normal people in the population. The conditional probability distribution of $Y_{ij}$ given $n_{ij}$ is

$$
Y_{ij} \mid n_{ij} \sim \text{Poisson}\{n_{ij} Q_i(j)\}, \quad i = 0, 1, 2.
\tag{33}
$$

Notice that if $k = 2$ (a 2-stage model), then all $J_2$ individuals would develop tumors at or before birth. Thus, if $k = 2$, then $Y_{2j} = 0$ for all $j > 0$, so that if $j > 0$, cancer cases develop only from normal people ($N = J_0$ people) and $J_1$ people. On the other hand, if $k > 2$, then with positive probability $Y_{ij} > 0$ for all $(i = 0, 1, 2;\ j = 0, 1, \ldots, n_T)$, where $n_T$ is the last time point in the data. Let $\delta_{2k} = 1$ if $k = 2$ and $\delta_{2k} = 0$ if $k \ne 2$. Then $Y_j \mid (n_{ij},\ i = 0, 1, 2) \sim \text{Poisson}(Q_T(j))$, where $Q_T(j) = \sum_{i=0}^{1} n_{ij} Q_i(j) + (1 - \delta_{2k}) n_{2j} Q_2(j)$. Since $(n_{0j}, n_{1j}) \mid n_j \sim \text{Multinomial}(n_j; p_0, p_1)$, we have for the conditional probability density function $P(y_j \mid n_j)$ of $Y_j$ given $n_j$:

$$
P(y_j \mid n_j) = \sum_{n_{0j}=0}^{n_j} \sum_{n_{1j}=0}^{n_j - n_{0j}} g(n_{0j}, n_{1j}; n_j, p_0, p_1)\, h\{y_j; Q_T(j)\},
\tag{34}
$$

where $g(n_{0j}, n_{1j}; n_j, p_0, p_1)$ is the probability density function of $(n_{0j}, n_{1j}) \mid n_j \sim \text{Multinomial}(n_j; p_0, p_1)$ and $h\{y_j; Q_T(j)\}$ the probability density function of $Y_j \mid (n_{ij},\ i = 0, 1, 2) \sim \text{Poisson}\{Q_T(j)\}$.

The probability density function $P(y_j \mid n_j)$ given by (34) is a mixture of Poisson probability density functions with mixing probability density function given by the multinomial probability distribution of $\{n_{0j}, n_{1j}\}$ given $n_j$. This mixing probability density function represents individuals with different genotypes at the embryo stage in the population.
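For a small population the mixture in (34) can be evaluated by brute force, which makes its structure concrete. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the values of $n_j$, $p_0$, $p_1$, and $Q_i(j)$ are made up, and for the population sizes in Table 1 the double sum would be far too large to enumerate this way.

```python
from math import comb, exp, factorial

def multinomial_pmf(n0, n1, n, p0, p1):
    """g(n0, n1; n, p0, p1), with n2 = n - n0 - n1 and p2 = 1 - p0 - p1."""
    n2 = n - n0 - n1
    p2 = 1.0 - p0 - p1
    return comb(n, n0) * comb(n - n0, n1) * p0**n0 * p1**n1 * p2**n2

def poisson_pmf(y, lam):
    """h(y; lam); lam = 0 gives a point mass at y = 0."""
    return exp(-lam) * lam**y / factorial(y)

def mixture_pmf(y, n, p0, p1, q, k):
    """P(y_j | n_j) in (34), summing over all genotype counts (n0, n1)."""
    delta_2k = 1 if k == 2 else 0
    total = 0.0
    for n0 in range(n + 1):
        for n1 in range(n - n0 + 1):
            n2 = n - n0 - n1
            q_t = n0 * q[0] + n1 * q[1] + (1 - delta_2k) * n2 * q[2]
            total += multinomial_pmf(n0, n1, n, p0, p1) * poisson_pmf(y, q_t)
    return total

# Hypothetical inputs, for illustration only.
n, p0, p1, k = 6, 0.7, 0.2, 3
q = (0.05, 0.5, 0.9)   # assumed Q_0(j), Q_1(j), Q_2(j)
pmf = [mixture_pmf(y, n, p0, p1, q, k) for y in range(61)]
total_mass = sum(pmf)
mean = sum(y * p for y, p in enumerate(pmf))
expected_mean = n * (p0 * q[0] + p1 * q[1] + (1.0 - p0 - p1) * q[2])
```

Because the Poisson mean $Q_T(j)$ is linear in the genotype counts, the mixture mean equals $n_j \sum_i p_i Q_i(j)$ when $k > 2$, which the sketch uses as a sanity check.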

Let $\Theta$ be the set of all unknown parameters (i.e., the parameters $(p_1, p_2, \alpha)$ and the birth rates, the death rates, and the mutation rates of $J_j$ cells). Based on data $(y_j,\ j = 0, 1, \ldots, n_T)$, the likelihood function of $\Theta$ is

$$
L\{\Theta \mid y_j,\ j = 0, 1, \ldots, n_T\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} P(y_j \mid n_j).
\tag{35}
$$

Notice that because the mutation rates are very small, one may practically assume $\beta_i(t) = \beta_i$ for $i = 0, 1, \ldots, k-1$. Also, because the stage-limiting genes are basically tumor suppressor genes which act recessively (see Tan [3], Weinberg [6], and Tan et al. [5, 8, 23]), one may practically assume $\{b_i(t) = b_i,\ d_i(t) = d_i,\ \gamma_i(t) = b_i - d_i = \gamma_i,\ i = 0, 1, \ldots, k-1\}$ (see Tan et al. [4, 5, 8]).

4.3. The Joint Probability Distribution of Augmented Variables and Cancer Incidence. To apply the mixture distribution of $Y_j$ in (34) to make inferences about the unknown parameters, one needs to expand the model to include the unobservable augmented variables $\{n_{0j}, n_{1j}, y_{0j}, y_{1j},\ j = 1, \ldots, t_N\}$ and derive the joint probability distribution of these variables. For these purposes, observe that for $j = 1, \ldots, t_N$ and for $k > 2$, the conditional probability distribution $P\{y_{ij},\ i = 0, 1 \mid y_j, n_{ij},\ i = 0, 1, n_j\}$ of $(Y_{ij},\ i = 0, 1)$ given $(Y_j = y_j, n_{ij},\ i = 0, 1, n_j)$ is

$$
(y_{ij},\ i = 0, 1) \mid (y_j, n_{ij},\ i = 0, 1, n_j) \sim \text{Multinomial}\left\{y_j;\ \frac{n_{ij} Q_i(j)}{Q_T(j)},\ i = 0, 1\right\}.
\tag{36}
$$

Since the conditional probability distribution of $Y_j$ given $(n_{ij},\ i = 0, 1, 2)$ for $k > 2$ is Poisson with mean $Q_T(j)$, we have for the joint conditional probability density function $P\{y_{ij},\ i = 0, 1, y_j \mid n_{ij},\ i = 0, 1, n_j\}$ of $(Y_{ij},\ i = 0, 1, Y_j)$ given $(n_{ij},\ i = 0, 1, n_j)$:

$$
\begin{aligned}
P\{y_{ij},\ i = 0, 1, y_j \mid n_{ij},\ i = 0, 1, n_j\}
&= P\{y_{ij},\ i = 0, 1 \mid y_j, n_{ij},\ i = 0, 1, 2\} \times P\{y_j \mid n_{ij},\ i = 0, 1, 2\} \\
&= \frac{1}{y_j!} e^{-Q_T(j)} \{Q_T(j)\}^{y_j} \times g\left\{y_{0j}, y_{1j};\ y_j,\ \frac{n_{ij} Q_i(j)}{Q_T(j)},\ i = 0, 1\right\} \\
&= \prod_{i=0}^{2} h\{y_{ij};\ n_{ij} Q_i(j)\} \quad (k > 2),
\end{aligned}
\tag{37}
$$

where $y_{2j} = y_j - \sum_{u=0}^{1} y_{uj}$ and $n_{2j} = n_j - \sum_{u=0}^{1} n_{uj}$.
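The multinomial conditional in (36) is the kind of step a data-augmentation sampler would draw at each iteration: given the observed $y_j$ and current genotype counts, allocate the cases to genotypes. The following is a minimal sketch of that single draw, not the authors' Gibbs sampler; the function name, the seed, and all numerical inputs are assumptions for illustration.

```python
import random

def split_cases(y_j, n_by_genotype, q_by_genotype, rng):
    """Draw (y_0j, y_1j, y_2j) given y_j, with cell probabilities
    proportional to n_ij * Q_i(j), as in (36).

    Each of the y_j cases is assigned independently to a genotype, which
    yields a multinomial draw. At least one weight must be positive.
    """
    weights = [n * q for n, q in zip(n_by_genotype, q_by_genotype)]
    counts = [0] * len(weights)
    for i in rng.choices(range(len(weights)), weights=weights, k=y_j):
        counts[i] += 1
    return counts

# Hypothetical inputs, for illustration only.
rng = random.Random(2013)
counts = split_cases(70, [900, 80, 20], [0.01, 0.2, 0.6], rng)
all_normal = split_cases(10, [5, 5, 5], [1.0, 0.0, 0.0], rng)
```

In a full sampler this draw would alternate with updates of the genotype counts and of the parameters in $\Theta$; those steps are not shown here.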

If $k = 2$, then $Y_{2j} = 0$, so that $Y_j = \sum_{i=0}^{1} Y_{ij}$. Thus, we have for $k = 2$:

$$
Y_j \mid (n_{ij},\ i = 0, 1, n_j) \sim \text{Poisson}\left\{\sum_{i=0}^{1} n_{ij} Q_i(j)\right\},
\tag{38}
$$

$$
Y_{ij} \mid (Y_j = y_j, n_{ij},\ i = 0, 1, n_j) \sim \text{Binomial}\left\{y_j;\ \frac{n_{ij} Q_i(j)}{\sum_{u=0}^{1} n_{uj} Q_u(j)}\right\}, \quad i = 0, 1.
\tag{39}
$$

It follows that if $k = 2$, then $\sum_{i=0}^{1} Y_{ij} = Y_j$, and the joint probability density function of $(Y_{0j}, Y_j)$ given $(n_{uj},\ u = 0, 1, 2)$ is

$$
\begin{aligned}
P\{y_{0j}, y_j \mid n_{uj},\ u = 0, 1, n_j\}
&= P\{y_j \mid n_{ij},\ i = 0, 1, n_j\} \times P\{y_{0j} \mid y_j, n_{uj},\ u = 0, 1, n_j\} \\
&= \prod_{i=0}^{1} h\{y_{ij};\ n_{ij} Q_i(j)\} \quad \text{if } k = 2,
\end{aligned}
\tag{40}
$$

where $y_j = \sum_{i=0}^{1} y_{ij}$ and $n_j = \sum_{i=0}^{2} n_{ij}$.

Put $\mathbf{Y} = (y_{ij},\ i = 0, 1,\ j = 1, \ldots, t_N)$, $\mathbf{N} = (n_{ij},\ i = 0, 1,\ j = 1, \ldots, t_N)$, $\tilde{n} = (n_j,\ j = 0, 1, \ldots, t_N)$, and $\tilde{y} = (y_j,\ j = 0, 1, \ldots, t_N)$. From (37) and (40), we have for the conditional joint probability density function of $(\mathbf{Y}, \tilde{y})$ given $(\mathbf{N}, \tilde{n})$:

$$
P\{\mathbf{Y}, \tilde{y} \mid \mathbf{N}, \tilde{n}, \Theta\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} \{h[y_{2j};\ n_{2j} Q_2(j)]\}^{1 - \delta_{2k}} \times \prod_{i=0}^{1} h\{y_{ij};\ n_{ij} Q_i(j)\}.
\tag{41}
$$

It follows that the joint conditional probability density function of $\{\mathbf{N}, \mathbf{Y}, \tilde{y}\}$ given $(\tilde{n}, \Theta)$ is

$$
P\{\mathbf{N}, \mathbf{Y}, \tilde{y} \mid \tilde{n}, \Theta\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} g(n_{0j}, n_{1j}; n_j, p_0, p_1) \times \{h[y_{2j};\ n_{2j} Q_2(j)]\}^{1 - \delta_{2k}} \times \prod_{i=0}^{1} h\{y_{ij};\ n_{ij} Q_i(j)\}.
\tag{42}
$$

Notice that the above probability density function is a product of multinomial probability density functions and Poisson probability density functions. For this joint probability density function, the deviance $\text{Dev} = -2\{\log P[\mathbf{Y}, \tilde{y}, \mathbf{N} \mid \tilde{n}, \Theta] - \log P[\mathbf{Y}, \tilde{y}, \mathbf{N} \mid \tilde{n}, \hat{\Theta}]\}$ is

$$
\text{Dev} = D_0(k) + \text{Dev}(p_1, p_2) + \sum_{j=1}^{t_N} D_j,
\tag{43}
$$

where

$$
D_0(k) = 2\left\{\chi(k) - y_0 - y_0 \log\frac{\chi(k)}{y_0}\right\},
\tag{44}
$$

$$
\text{Dev}(p_1, p_2) = 2 \sum_{j=1}^{t_N} \left\{ n_{0j} \log\frac{\hat{p}_0}{1 - p_1 - p_2} + \sum_{i=1}^{2} n_{ij} \log\frac{\hat{p}_i}{p_i} \right\},
\tag{45}
$$

$$
D_j = 2 \sum_{i=0}^{1} \left\{ n_{ij} Q_i(j) - y_{ij} - y_{ij} \log\frac{n_{ij} Q_i(j)}{y_{ij}} \right\} + 2(1 - \delta_{2k}) \left\{ n_{2j} Q_2(j) - y_{2j} - y_{2j} \log\frac{n_{2j} Q_2(j)}{y_{2j}} \right\},
\tag{46}
$$

where $\hat{p}_i = \left(\sum_{j=0}^{t_N} n_{ij}\right) / \left(\sum_{j=0}^{t_N} n_j\right)$ $(i = 0, 1, 2)$.

The joint probability density function $P\{\mathbf{Y}, \tilde{y}, \mathbf{N} \mid \tilde{n}, \Theta\}$ of $(\mathbf{Y}, \tilde{y}, \mathbf{N})$ given by (42) will be used as the kernel for the Bayesian method to estimate the unknown parameters and to predict the state variables.

4.4. Fitting of the Model to Cancer Incidence Data. To fit the model to real data, as in Tan [3-5], we let $\Delta t \sim 1$ correspond to a fixed time interval such as 6 months in human cancer studies. (Tan et al. [4] assumed 3 months as one time unit, while Luebeck and Moolgavkar [26] assumed one year as one time unit.) Then, because the proliferation rate of the last stage cells is quite large, one may practically assume $P_T(s, t) = 1$ for $t - s \ge 1$. Hence, noting that $\beta_{k-1}(t) = \beta_{k-1}$ is usually very small (see [3-5]), $Q_i(j)$ is approximated by

$$
Q_i(j) \approx E\{e^{-\beta_{k-1} G_i(t_{j-1})} - e^{-\beta_{k-1} G_i(t_j)}\} = e^{-\beta_{k-1} E[G_i(t_{j-1})]} - e^{-\beta_{k-1} E[G_i(t_j)]} + o(\beta_{k-1}),
\tag{47}
$$

where $G_i(t) = \sum_{s=t_0}^{t-1} J_{k-1}(s; i)$.

Under the discrete time approximation, the $E[I_{k-1}(t; i)]$'s have been derived in the appendix. Using these results for the expected numbers and using the result $\sum_{i=0}^{t-1} a^i = (a^t - 1)/(a - 1)$ for $a \ne 1$, we obtain

$$
\begin{aligned}
\beta_{k-1} E[G_i(t)] &= \sum_{r=i}^{i+1} E[J_r(t_0; i)] \left(\prod_{u=r}^{k-1} \beta_u\right) \sum_{v=r}^{k-1} A_{r(k-1)}(v) \sum_{s=t_0}^{t-1} (1 + \gamma_v)^{s - t_0} \\
&= \theta_{(i+1)(k-1)} \phi_{(i+1)(k-1)}(t) + \lambda_{i(k-1)} \phi_{i(k-1)}(t),
\end{aligned}
\tag{48}
$$

where $(\theta_{i(k-1)},\ i = 1, 2, 3)$ and $(\lambda_{i(k-1)},\ i = 0, 1, 2)$ are defined in Section 3.3 and where the $\phi_{i(k-1)}(t)$ $(i = 0, 1, 2, 3)$ are given by

$$
\phi_{i(k-1)}(t) = \left(\prod_{u=i+1}^{k-1} \gamma_u\right) \sum_{v=i}^{k-1} A_{i(k-1)}(v) \frac{1}{\gamma_v}\{(1 + \gamma_v)^{t - t_0} - 1\}, \quad i = 0, 1, 2, 3.
\tag{49}
$$

Notice that if $\gamma_0 = 0$, then $\phi_{0(k-1)}(t)$ reduces to

$$
\phi_{0(k-1)}(t) = \left(\prod_{u=1}^{k-1} \gamma_u\right) \sum_{v=1}^{k-1} A_{1(k-1)}(v) \frac{1}{\gamma_v^2}\{(1 + \gamma_v)^{t - t_0} - 1 - (t - t_0)\gamma_v\}.
\tag{50}
$$

Applying these results for time homogeneous modelswith 120574

119894= 120574119895if 119894 = 119895 the 119876

119894(119895)rsquos under discrete approximation

are given as follows

ISRN Biomathematics 13

(1) If 119896 = 2 then 119896 minus 1 = 1 Hence 1198762(0) = 1 119876

119894(0) =

1198762(119895) = 0 for (119894 = 0 1 119895 gt 0) and for 119895 gt 0

1198760(119895) asymp 119890

minus1205791112060111(119905119895minus1

)minus1205820112060101(119905119895minus1

)

minus119890minus1205791112060111(119905119895)minus1205820112060101(119905119895) + 119900 (120573

1)

1198761(119895) asymp (1 minus 120572

1) 119890

minus1205821112060111(119905119895minus1

)

minus119890minus1205821112060111(119905119895) + 119900 (120573

1)

(51)

(2) If $k \geq 3$, then we have $Q_2(0) = \delta_{2(k-1)}\alpha$, $Q_i(0) = 0$, $i = 0, 1$, and for $j > 0$,

$$\begin{aligned}
Q_0(j) &\approx e^{-\theta_{1(k-1)}\phi_{1(k-1)}(t_{j-1}) - \lambda_{0(k-1)}\phi_{0(k-1)}(t_{j-1})} - e^{-\theta_{1(k-1)}\phi_{1(k-1)}(t_j) - \lambda_{0(k-1)}\phi_{0(k-1)}(t_j)} + o(\beta_{k-1}), \\
Q_1(j) &\approx e^{-\theta_{2(k-1)}\phi_{2(k-1)}(t_{j-1}) - \lambda_{1(k-1)}\phi_{1(k-1)}(t_{j-1})} - e^{-\theta_{2(k-1)}\phi_{2(k-1)}(t_j) - \lambda_{1(k-1)}\phi_{1(k-1)}(t_j)} + o(\beta_{k-1}), \\
Q_2(j) &\approx \delta_{2(k-1)}(1 - \alpha)\left\{e^{-\lambda_{22}\phi_{22}(t_{j-1})} - e^{-\lambda_{22}\phi_{22}(t_j)}\right\} \\
&\quad + \left(1 - \delta_{2(k-1)}\right)\left\{e^{-\theta_{3(k-1)}\phi_{3(k-1)}(t_{j-1}) - \lambda_{2(k-1)}\phi_{2(k-1)}(t_{j-1})} - e^{-\theta_{3(k-1)}\phi_{3(k-1)}(t_j) - \lambda_{2(k-1)}\phi_{2(k-1)}(t_j)}\right\} + o(\beta_{k-1}).
\end{aligned} \qquad (52)$$
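Each leading term in (51) and (52) is a difference of two exponentials evaluated at consecutive time points, that is, a difference of survival-type probabilities. The sketch below illustrates only that structure. The function `interval_probabilities` and the argument `cum_hazard` (standing in for a combined exponent such as $\theta\phi(t) + \lambda\phi(t)$) are hypothetical names introduced here; the $o(\beta)$ remainder and the factors $\delta_{2(k-1)}$ and $\alpha$ are deliberately left out.

```python
import math

def interval_probabilities(cum_hazard, times):
    """Return Q(j) = exp(-H(t_{j-1})) - exp(-H(t_j)) for consecutive time points.

    cum_hazard: callable H(t), assumed nondecreasing in t (illustrative stand-in
                for the exponent appearing in the Q_i(j) formulas).
    times:      increasing sequence t_0, t_1, ..., t_m.
    """
    survival = [math.exp(-cum_hazard(t)) for t in times]
    return [survival[j - 1] - survival[j] for j in range(1, len(times))]

if __name__ == "__main__":
    # Toy cumulative hazard, chosen only for illustration.
    toy_hazard = lambda t: 0.01 * t
    print(interval_probabilities(toy_hazard, range(0, 11)))
```

Because the terms telescope, the interval probabilities sum to $e^{-H(t_0)} - e^{-H(t_m)}$, which gives a quick sanity check on any implementation of the exponents.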

Notice that if one replaces $[1+\gamma_v]^{t-t_0}$ by $[1+\gamma_v]^{t-t_0} = e^{(t-t_0)\log(1+\gamma_v)} \approx e^{(t-t_0)\gamma_v}$, the above $Q_i(j)$'s from the discrete time model are exactly the same as those from the continuous model, respectively, as given in equations (23)–(27) under the assumption that $P_T(s, t) = 1$ for $t - s > 0$. Notice that the assumption $P_T(s, t) = 1$ for $t - s > 0$ is equivalent to assuming that the last stage cancer cells grow instantaneously into cancer tumors as soon as they are generated; see Remark 1.
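As a rough numerical check of the approximation $[1+\gamma_v]^{t-t_0} \approx e^{(t-t_0)\gamma_v}$, the following sketch compares the two growth factors. The function names and the sample values of $\gamma$ are illustrative assumptions, not quantities taken from the paper.

```python
import math

def growth_factors(gamma, steps):
    """Return (discrete, continuous) growth factors after `steps` time units."""
    discrete = (1.0 + gamma) ** steps      # [1 + gamma]^(t - t0)
    continuous = math.exp(steps * gamma)   # e^{(t - t0) * gamma}
    return discrete, continuous

def relative_gap(gamma, steps):
    """Relative difference between the continuous and the discrete factor."""
    discrete, continuous = growth_factors(gamma, steps)
    return abs(continuous - discrete) / continuous

if __name__ == "__main__":
    for gamma in (1e-5, 1e-2, 1e-1):
        print(gamma, relative_gap(gamma, steps=100))
```

Since $\log(1+\gamma) = \gamma - \gamma^2/2 + \cdots$, the gap grows roughly like $(t-t_0)\gamma^2/2$, so the two models agree closely only when the proliferation rates per time unit are small.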

5. The Fitting of the Model to Cancer Incidence and the Generalized Bayesian Inference Procedure

Given the model in Sections 2 and 3 and cancer incidence, one may use results in Section 4 to fit the model. By using this model and the distribution results in Section 4, one can readily estimate the unknown genetic parameters, predict cancer incidence, and check the validity of the model by using the generalized Bayesian inference and Gibbs sampling procedures; for more detail, see Tan [3, 22] and Tan et al. [4, 5].

The generalized Bayesian inference is based on the posterior distribution $P\{\Theta \mid \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$ of $\Theta$ given $\{\mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$. This posterior distribution is derived by combining the prior distribution $P\{\Theta\}$ of $\Theta$ with the joint probability distribution $P\{\mathbf{N}, \mathbf{Y}, \underset{\sim}{y} \mid \underset{\sim}{n}, \Theta\}$ given $\{\underset{\sim}{n}, \Theta\}$ in (42). It follows that this inference procedure combines information from three sources: (1) previous information and experiences about the parameters in terms of the prior distribution $P\{\Theta\}$ of the parameters, (2) biological information of inherited cancer cases via genetic segregation of cancer genes in the population ($P\{\mathbf{N} \mid \underset{\sim}{n}, p_i, i = 1, 2\}$; see Section 2), and (3) information from the expanded data ($\mathbf{Y}$) and the observed data ($\underset{\sim}{y}$) via the statistical model from the system ($P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \Theta\}$ given by (37) and (40)). Because of the additional information from the genetic segregation of the cancer genes, this inference procedure provides an efficient way to extract information on the effects of genotypes of individuals at the embryo stage.

5.1. The Prior Distribution of the Parameters. For the prior distributions of $\Theta$, because biological information has suggested some lower bounds and upper bounds for the mutation rates and for the proliferation rates, we assume

$$P(\Theta) \propto c \quad (c > 0), \qquad (53)$$

where $c$ is a positive constant if these parameters satisfy some biologically specified constraints, and $P(\Theta)$ is equal to zero otherwise. These biological constraints are as follows.

(i) $0 < p_1 < 10^{-2}$, $0 < p_2 < 10^{-6}$, and $-0.01 < \gamma_i < 1$ ($i = 1, 2$).

(ii) For $\beta_j$ ($j = 0, 1, \ldots, k-1$), we let $1 < \omega_0 = N(t_0)\beta_0 < 1000$ ($N \to I_1$) and $10^{-8} < \beta_i < 10^{-3}$, $i = 1, \ldots, k-1$.

(iii) For the $\lambda_{j(k-1)}$ ($j = 0, 1, 2$), we let $0 < \lambda_i < 10$, $i = 0, 1$, and $0 < \lambda_2 < 10^{3}$.

(iv) For the $\theta_i$ ($i = 1, 2, 3$), we let $0 < \theta_1 < 10^{-2}$ and $0 < \theta_2 < 10^{-1}$.

We will refer to the above prior as a partially informative prior, which may be considered as an extension of the traditional noninformative prior given in Box and Tiao [28].
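A prior of this kind is flat inside a box and zero outside it, so on the log scale it is either a constant or $-\infty$. The sketch below shows one possible encoding. The dictionary keys are illustrative names, the numerical bounds are our reading of constraints (i), (iii), and (iv) above (in particular the lower bound $-0.01$ for $\gamma_i$), and constraint (ii) on the $\beta_j$ is omitted for brevity.

```python
import math

# Bounds as read from constraints (i), (iii), (iv); key names are hypothetical.
BOUNDS = {
    "p1": (0.0, 1e-2),
    "p2": (0.0, 1e-6),
    "gamma1": (-0.01, 1.0),
    "gamma2": (-0.01, 1.0),
    "lambda0": (0.0, 10.0),
    "lambda1": (0.0, 10.0),
    "lambda2": (0.0, 1e3),
    "theta1": (0.0, 1e-2),
    "theta2": (0.0, 1e-1),
}

def log_prior(theta, bounds=BOUNDS):
    """Log of the flat prior: 0 inside the box (the constant c is dropped),
    -inf as soon as any parameter violates its open interval."""
    for name, (low, high) in bounds.items():
        if not (low < theta[name] < high):
            return -math.inf
    return 0.0
```

Adding `log_prior(theta)` to a log-likelihood then restricts any maximization or sampling step to the biologically admissible region without changing the shape of the posterior inside it.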

5.2. The Posterior Distribution of the Parameters Given $\{\mathbf{Y}, \mathbf{N}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Denote by $\Theta_1 = \Theta - \{p_1, p_2, \alpha\}$. From the posterior distribution $P\{\Theta \mid \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$, we obtain

$$\begin{aligned}
P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\} &\propto (\chi(k))^{y_0} e^{-\chi(k)}\, p_1^{\sum_{j=1}^{t_N} n_{1j}}\, p_2^{\sum_{j=1}^{t_N} n_{2j}} \times (1 - p_1 - p_2)^{\sum_{j=1}^{t_N} n_{0j}}, \quad 0 < p_1, p_2, \alpha < 1, \\
P\{\Theta_1 \mid (p_1, p_2, \alpha), \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\} &\propto \prod_{j=1}^{t_N} \prod_{i=0}^{2} e^{-Q_i(j)}\, Q_i(j)^{y_{ij}}, \quad \Theta_1 \in \Omega,
\end{aligned} \qquad (54)$$

where $\Omega$ is the parameter space of $\Theta_1$ provided by the biological constraints in Section 5.1.

For computational convenience, we notice that the log of $P\{p_1, p_2, \alpha \mid \Theta_1, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$ is proportional to the negative of $D_0(k) + \mathrm{Dev}(p_1, p_2)$ given by (44)-(45); similarly, the log of $P\{\Theta_1 \mid (p_1, p_2, \alpha), \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$ is proportional to the negative of $\sum_{j=1}^{k} D_j$ given by (46).

5.3. The Multilevel Gibbs Sampling Procedure for Estimating Unknown Parameters. Given the posterior probability distributions, we will use the following multilevel Gibbs sampling procedure to derive estimates of the parameters. We notice that numerically the Gibbs sampling procedure given below is equivalent to the EM-algorithm from the sampling theory viewpoint, with Steps 1 and 2 as the E-Step and with Steps 3 and 4 as the M-Step, respectively [29]. These multilevel Gibbs sampling procedures are given by the following.

Step 1 (Generating $\mathbf{N}$ Given ($\mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta$) (The Data-Augmentation Step 1)). Given $\Theta$ and given $\underset{\sim}{n}$, use the multinomial distribution of $\{n_{1j}, n_{2j}\}$ given $n_j$ in Section 3 to generate a large sample of $\mathbf{N}$. Then, by combining this large sample with $P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \underset{\sim}{n}, \Theta\}$ in (37) and (40), select $\mathbf{N}$ through the weighted bootstrap method due to Smith and Gelfand [30]. This selected $\mathbf{N}$ is then a sample from $P\{\mathbf{N} \mid \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta\}$ even though the latter is unknown (for proof, see Tan [22], Chapter 3). Call the generated sample $\hat{\mathbf{N}}$.

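Step 2 (Generating $\mathbf{Y}$ Given ($\mathbf{N}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta$) (The Data-Augmentation Step 2)). Given $\underset{\sim}{y}$, $\underset{\sim}{n}$, $\Theta$ and given $\mathbf{N} = \hat{\mathbf{N}}$ generated from Step 1, generate $\mathbf{Y}$ from the probability distribution $P\{\mathbf{Y} \mid \mathbf{N}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta\}$ given by (36) and (38). Call the generated sample $\hat{\mathbf{Y}}$.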

Step 3 (Estimation of $\{p_i, i = 1, 2, \alpha\}$ Given $\{\Theta_1, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$). Given $\underset{\sim}{y}$, $\underset{\sim}{n}$, $\Theta_1$ and given $(\mathbf{N}, \mathbf{Y}) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}})$ from Steps 1 and 2, derive the posterior mode of $\{p_i, i = 1, 2, \alpha\}$ by maximizing the conditional posterior distribution $P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \hat{\mathbf{N}}, \hat{\mathbf{Y}}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $D_0(k) + \mathrm{Dev}(p_1, p_2)$ given by (44)-(45) in Section 4.3 under the constraints given in Section 5.1. Denote this generated mode by $\{\hat{p}_i, i = 1, 2, \hat{\alpha}\}$.

Step 4 (Estimation of $\Theta_1$ Given $\{p_i, i = 1, 2, \alpha, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$). Given ($\underset{\sim}{y}, \underset{\sim}{n}$) and given $(\mathbf{N}, \mathbf{Y}, p_i, i = 1, 2, \alpha) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}}, \hat{p}_i, i = 1, 2, \hat{\alpha})$ from Steps 1–3, derive the posterior mode of $\Theta_1$ by maximizing the conditional posterior distribution $P\{\Theta_1 \mid \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\mathbf{N}}, \hat{\mathbf{Y}}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $\sum_{j=1}^{k} D_j$ in (46) under the constraints. Denote the generated mode as $\hat{\Theta}_1$.

Step 5 (Recycling Step). With $(\mathbf{N}, \mathbf{Y}, p_i, i = 1, 2, \alpha, \Theta_1) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}}, \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1)$ given above, go back to Step 1 and continue until convergence.

The proof of convergence of the above steps can be derived by using the procedure given in Tan ([22], Chapter 3). At convergence, the $\hat{\Theta} = \{\hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1\}$ are the generated values from the posterior distribution of $\Theta$ given ($\underset{\sim}{y}, \underset{\sim}{n}$) independently of $(\mathbf{N}, \mathbf{Y})$ (for proof, see Tan [22], Chapter 3). Repeating the above procedures, one then generates a random sample of $\Theta$ from the posterior distribution of $\Theta$ given ($\underset{\sim}{y}, \underset{\sim}{n}$); one then uses the sample mean as the estimate of $\Theta$ and the sample variances and covariances as estimates of the variances and covariances of these estimates.
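The weighted bootstrap used in Step 1 can be sketched generically: draw candidates from a distribution that is easy to sample, weight each candidate by the likelihood of the data given that candidate, and resample with probability proportional to the weights. The code below is a minimal illustration of that resampling idea only. The function name, the uniform proposal, and the toy weight function are assumptions made here; they are not the multinomial proposal or the density $P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \underset{\sim}{n}, \Theta\}$ of the paper.

```python
import random

def weighted_bootstrap(candidates, weight_fn, size, rng=None):
    """Resample `size` items from `candidates` with probability proportional
    to weight_fn(candidate). The weights play the role of the likelihood of
    the data given each candidate."""
    rng = rng or random.Random()
    weights = [weight_fn(c) for c in candidates]
    if sum(weights) <= 0.0:
        raise ValueError("all candidate weights are zero")
    return rng.choices(candidates, weights=weights, k=size)

if __name__ == "__main__":
    rng = random.Random(0)
    # Toy example: uniform proposal on (0, 1), binomial-type weight x^7 (1 - x)^3,
    # so the resampled values approximate a Beta(8, 4) distribution.
    proposal = [rng.random() for _ in range(20000)]
    draws = weighted_bootstrap(proposal, lambda x: x**7 * (1 - x) ** 3, 5000, rng)
    print(sum(draws) / len(draws))  # should be near the Beta(8, 4) mean, 2/3
```

The quality of the resample depends on how many candidates receive non-negligible weight, which is why Step 1 asks for a large initial sample.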

6. A New Multistage Stochastic Model for Adult Eye Cancer (Uveal Melanoma): An Example

The human eye cancers consist of pediatric eye cancers and adult eye cancers. The most common pediatric eye cancer is the retinoblastoma, which develops from the retinal pigment epithelium cells underlying the retina that do not form melanoma. The most common adult eye cancers are the uveal melanomas involving the iris, the ciliary body, and the choroid (collectively referred to as the uvea). These cancers develop from melanocytes (pigment cells) which reside within the uvea, giving color to the eye. In Tan and Zhou [9], we have developed a modified two-stage model for retinoblastoma. Based on results from molecular biology (see Landreville et al. [14], Mensink et al. [15], and Loercher and Harbour [31]), Landreville et al. [14] have proposed a three-stage model for uveal melanoma as given in Figure 2. As an example of applications of this paper, in this section we will apply this model of uveal melanoma to the NCI/NIH eye cancer data from the SEER project. We notice that the same methods can be applied to model other human cancers as well, but this will be our future research.

Given in Table 1 are the numbers of people at risk and the eye cancer cases in the age groups together with the predicted cases from the models. These data give cancer incidence at birth and incidence for 85 age groups ($k = 85$), with each group spanning a 1-year period except the last age group (≥85 years old). For human eye cancer, because the incidence at birth and for age groups from 1 to 10 years old is basically generated by the pediatric eye cancer retinoblastoma (see [9]), to account for inherited cancer cases of uveal melanomas, the incidence for age 0 (birth) and for age periods from 1 to 10 years old in Table 1 for uveal melanoma is derived by subtracting the incidence of retinoblastoma from SEER data (see Tan and Zhou [9]).

To fit the data, we let one time unit be 6 months after birth and let $t_0 = 1$. To compare different models and to assess different assumptions, we will consider the following 2-3-stage mixture models: (1) the complete 3-stage mixture model (Model-F), in which no assumptions are made on the parameters; (2) the Type-1 3-stage mixture model, in which we assume that $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth (Model-1). For comparison purposes, we also fit a 2-stage model as defined in Tan and Zhou [9]. We will apply the methods in Section 5 to fit these models to the SEER data given in Table 1.


Table 2: The log-likelihood, AIC, and BIC of the fitted models.

Models                  Log-likelihood    AIC         BIC
Three-stage, Model-F    -1312.02          2648.04     2677.35
Three-stage, Model-1    -1295.76          2609.53     2631.51
Two-stage               -4392.421         8796.843    8811.569
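For readers who want to recompute the criteria, the standard definitions are $\mathrm{AIC} = 2m - 2\log L$ and $\mathrm{BIC} = m \log n - 2\log L$, where $m$ is the number of free parameters and $n$ the number of observations. The sketch below applies them to the Model-F row, read as a log-likelihood of $-1312.02$. The parameter count of 12 is taken from the degrees of freedom quoted in the text (df = 86 − 12), and the sample size is left as an argument because this section does not state the value used for BIC.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2m - 2 log L."""
    return 2.0 * n_params - 2.0 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: m log(n) - 2 log L."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

if __name__ == "__main__":
    print(round(aic(-1312.02, 12), 2))       # Model-F AIC
    print(round(bic(-1312.02, 12, 85), 2))   # n_obs = 85 is an assumption
```

With 12 parameters the AIC formula returns 2648.04, which agrees with the Model-F entry of Table 2 when the table is read with two decimals; taking $n = 85$ likewise gives a BIC close to 2677.35.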

Figure 5: Curve fitting of SEER data by the Model-F, Model-1, and the two-stage model. [Plot of incidences against age (year), 0 to 84; series: Observed, Model-F, Model-1, Two-stage.]

Given in Table 2 are the natural logs of the likelihood functions, the AIC (Akaike Information Criterion), and the BIC (Bayesian Information Criterion) for these models. Given in Table 3 are the estimates of parameters in the 3-stage models. Given in Figure 5 are the plots of predicted cancer cases from the 3-stage mixture models (Model-F and Model-1) and the 2-stage model. For comparison purposes, in Table 1 we also provide numbers of predicted cancer cases from the 3-stage mixture models and the 2-stage model together with the observed cancer cases over time from SEER. From these results, we have made the following observations.

(a) As shown by results in Table 1 and Figure 5, it appeared that both Model-F and Model-1 fitted the SEER data well, although Model-1 fitted the data slightly better judging from the values of AIC and BIC. The Chi-square test statistics $\chi^2 = \sum_{i=0}^{85} \left((y_i - \hat{y}_i)^2 / \hat{y}_i\right)$ for Model-F and Model-1 are given by 88.43 and 94.48, respectively, giving a $P$-value of 0.12 (df = 86 − 12 = 74) for Model-F and a $P$-value of 0.11 (df = 86 − 7 = 79) for Model-1. On the other hand, the 2-stage model fitted the data very poorly; the Chi-square statistic value for the 2-stage model is 2747.69, giving a $P$-value less than $10^{-3}$. The AIC and BIC values of Model-1 are given by (AIC = 2609.53, BIC = 2631.51), which are slightly smaller than those of Model-F, respectively; however, the AIC and BIC values (8796.84, 8811.57) of the two-stage model are considerably greater than those of the 3-stage models, respectively. These results suggest that uveal melanoma may best be described by a 3-stage model with inherited component and that one may practically assume $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth.

(b) From Table 3, it is observed that the estimate of $\gamma_1$ is close to zero (the estimate is of order $10^{-5}$), indicating that the phenotype of $J_1$ is almost identical to that of $N = J_0$, further confirming that the staging-limiting genes are basically tumor suppressor genes and that there is no haploinsufficiency for these tumor suppressor genes. On the other hand, the estimate of $\gamma_2$ is of order $10^{-2}$, which is about $10^{3}$ times greater than that of cells with genotype $J_1$.

(c) From Table 3, the estimates $\hat{\lambda}_j$ ($j = 0, 1, 2$) of $\lambda_j$ are of order $10^{-1}$, $10^{-1}$, $10^{2} \sim 10^{3}$, respectively. Because $\lambda_i = E[J_i(t_0; i)]\,\beta_i \prod_{u=i+1}^{2} \left(\beta_u / \gamma_u\right)$, $i = 0, 1, 2$, assuming some values of $E[J_i(t_0; i)]$, $i = 0, 1, 2$, from some biological observations, one can have some rough ideas about the magnitude of $\beta_j$ ($j = 0, 1, 2$). For example, if we follow Potten et al. [32] to assume ($E[N(t_0)] = E[J_i(t_0; i)] \sim 10^{8}$, $i = 0, 1, 2$), then $\beta_j \approx 10^{-6} \sim 10^{-5}$.

(d) From Table 3, the estimates of $p_1$ and $p_2$ from the SEER data are of orders $10^{-4} \sim 10^{-3}$ and $10^{-7} \sim 10^{-6}$, respectively. This indicates that in the US population, the frequency of the staging-limiting cancer gene for uveal melanoma is approximately $10^{-3}$. Table 3 also showed that the estimate of $\alpha$ was 0.8411, indicating that most individuals with genotype $J_2$ would develop cancer at birth. This may help to explain why there are observed cancer incidences at birth for uveal melanoma in the SEER data even though the estimate of the frequency $p_2$ is of order $10^{-7} \sim 10^{-6}$.
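The goodness-of-fit statistic used in observation (a) is the Pearson sum of squared differences between observed and predicted counts, each scaled by the predicted count. A minimal sketch follows; the function names and the toy numbers are illustrative, and the actual observed and predicted counts are those of Table 1, which is not reproduced here.

```python
def pearson_chi_square(observed, predicted):
    """Return sum_i (y_i - yhat_i)^2 / yhat_i over matching cells."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum((y - y_hat) ** 2 / y_hat for y, y_hat in zip(observed, predicted))

def degrees_of_freedom(n_cells, n_params):
    """Degrees of freedom as used in the text: cells minus fitted parameters."""
    return n_cells - n_params

if __name__ == "__main__":
    print(pearson_chi_square([10, 20], [8.0, 25.0]))  # toy data, not from Table 1
    print(degrees_of_freedom(86, 12))
```

The $P$-value is then the upper tail probability of a Chi-square distribution with the stated degrees of freedom, which can be obtained from any statistical library.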

7. Discussion and Conclusions

To account for inherited cancer cases, in this paper we have developed some general multistage models involving hereditary cancer cases. For human cancer incidence, these models are basically generalized mixture models. In these mixture models, the mixing probability is a multinomial distribution to account for genetic segregation of the staging-limiting tumor suppressor genes. This mixture model allows us to estimate for the first time the frequency of the staging-limiting tumor suppressor gene in human populations. As an example of applications, in this paper we have developed a general 3-stage stochastic multistage model of carcinogenesis for adult human eye cancer. To account for inherited cancer cases in the stochastic model of human eye cancer, we have also developed a generalized mixture model for uveal melanoma in human beings.

For using the proposed models to fit the cancer incidence data, in this paper we have developed a generalized Bayesian inference procedure to estimate the unknown parameters and to predict cancer cases. This inference procedure is advantageous over the classical sampling theory inference (i.e., maximum likelihood method) because the procedure combines information from three sources: (1) previous information and experiences about the parameters in terms of the prior distribution $P\{\Theta\}$ of the parameters, (2) biological information of inherited cancer cases via the genetic segregation of staging-limiting tumor suppressor genes in the population, and (3) information from the expanded data ($\mathbf{Y}$) and the observed data ($\underset{\sim}{y}$) via the statistical model from the system ($P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \Theta\}$ given by (37) and (40)).

Table 3: Estimates of parameters for the 3-stage stochastic models.

Parameters       λ0         λ1         λ2         γ1         γ2         p1         p2         α          θ1         θ2
Model-F
  Estimates      2.88E-01   4.81E-01   6.55E+02   2.71E-05   3.37E-02   9.95E-04   9.68E-07   8.41E-01   3.29E-04   3.41E-01
  StD            1.21E-01   4.65E-02   1.12E+02   8.86E-06   1.92E-04   1.75E-06   6.48E-09   4.19E-02   3.54E-05   1.07E-02
  95% CL-Lower   5.03E-02   3.89E-01   4.34E+02   5.34E-06   3.33E-02   9.91E-04   9.55E-07   7.59E-01   2.60E-04   3.20E-01
  95% CL-Upper   5.25E-01   5.72E-01   8.75E+02   4.01E-05   3.41E-02   9.98E-04   9.81E-07   9.23E-01   3.98E-04   3.62E-01
Model-1
  Estimates      6.55E-02   3.72E-01   1.98E+02   NA         3.49E-02   9.98E-04   1.00E-06   8.07E-01   NA         NA
  StD            2.54E-02   4.63E-02   3.38E+01   NA         3.58E-03   6.27E-05   4.53E-08   7.06E-02   NA         NA
  95% CL-Lower   1.57E-02   2.82E-01   1.32E+02   NA         2.79E-02   8.75E-04   9.11E-07   6.68E-01   NA         NA
  95% CL-Upper   1.15E-01   4.63E-01   2.65E+02   NA         4.19E-02   1.12E-03   1.09E-06   9.45E-01   NA         NA

Note: NA: assumed nonexistence.

To illustrate the usefulness and applications of our models

and methods, we have applied our models and methods to the eye cancer SEER data of NCI/NIH. Our analysis clearly showed that the proposed 3-stage model with inherited cancer cases fitted the data nicely (see Table 2 and Figure 5); on the other hand, the classical 2-stage model cannot fit the data at all (see Table 2 and Figure 5). These results clearly have confirmed results from molecular biology that the human eye cancer is derived by a 3-stage model with inherited cancer component. Notice, however, that our 3-stage multistage model is more general than the classical 3-stage model as described in Little [1], Tan [2], and Zheng [7] in that we postulate that cancer tumors develop from primary $I_3$ cells by clonal expansion (see Yang and Chen [11]). (Note that the stochastic multistage models in the literature assume that cancer tumors develop from last stage cells immediately as soon as they are generated, ignoring completely cancer progression; see Remark 1.) As a matter of fact, we had assumed $\Delta t = 1$ for a period of three months and found that the 3-stage models then did not fit the SEER data, clearly indicating that $P_T(s, t) < 1$ over a period of three months.

Applying our models and methods to the SEER data

of human eye cancer, we have derived for the first time some useful pieces of information. Specifically, we mention the following. (1) For the first time, we have estimated the frequency of the staging-limiting tumor suppressor gene in the US population ($p_1 \sim 0.9948 \times 10^{-3}$). (2) With the estimate of $\alpha$ as $\hat{\alpha} = 0.8411$, the predicted number of uveal melanoma at birth is $\hat{y}_0 = n_{01}p_2 = 36$ by 3-stage models with inherited cancer component (Model-F and Model-1). (The observed number of eye cancer at birth is 34.) (3) The estimate of the proliferation rate ($\gamma_1$) of $J_1$ cells using Model-F is $\hat{\gamma}_1 = 2.271 \times 10^{-5} \sim 0$. (The estimate is $8.603 \times 10^{-5}$ using Model-1.) This confirms that the stage-1 limiting gene is a tumor suppressor gene and that, unlike the p53 gene in chromosome 17p (see [33]), there is little or no haploinsufficiency for this gene in cells with genotype $J_1$.

Using models and methods of this paper, one can easily predict future cancer cases for human eye cancer. Thus, by comparing results from different populations, our models and methods can be used to assess cancer prevention and control procedures. This will be our future research topic; we will not go any further here.

Appendix

The Expected Numbers of State Variables under Discrete Time Approximation

Under discrete time, the stochastic differential equations of state variables reduce to the following stochastic difference equations of state variables, respectively:

$$\begin{aligned}
J_i(t+1; i) &= J_i(t; i)\,[1 + \gamma_i(t)] + e_i(t+1; i), \quad i = 0, 1, \ldots, \mathrm{Min}(k-1, 2), \\
J_j(t+1; u) &= J_j(t; u)\,[1 + \gamma_j(t)] + J_{j-1}(t; u)\,\beta_{j-1}(t) + e_j(t+1; u), \quad 0 \leq u < j \leq k-1,
\end{aligned} \qquad (\mathrm{A1})$$

where $e_i(t+1; i) = [B_i(t; i) - J_i(t; i)\,b_i(t)] - [D_i(t; i) - J_i(t; i)\,d_i(t)]$ and $e_j(t+1; i) = [B_j(t; i) - J_j(t; i)\,b_j(t)] - [D_j(t; i) - J_j(t; i)\,d_j(t)] + [M_{j-1}(t; i) - J_{j-1}(t; i)\,\beta_{j-1}(t)]$ for $j > i$.

The initial conditions at birth ($t_0$) for the above stochastic difference equations are $J_j(t_0; i) > 0$ if ($j = i, i+1$) and $J_j(t_0; i) = 0$ if $j > i+1$. The solution of the above difference equations under these initial conditions is given, respectively, by

$$\begin{aligned}
J_i(t; i) &= J_i(t_0; i) \prod_{s=t_0}^{t-1} (1 + \gamma_i(s)) + \sum_{s=t_0+1}^{t} e_i(s; i) \prod_{u=s}^{t-1} (1 + \gamma_i(u)), \quad i = 0, 1, \ldots, \mathrm{Min}(k-1, 2), \\
J_j(t; i) &= J_j(t_0; i) \prod_{s=t_0}^{t-1} (1 + \gamma_j(s)) + \sum_{s=t_0}^{t-1} J_{j-1}(s; i)\,\beta_{j-1}(s) \prod_{u=s+1}^{t-1} (1 + \gamma_j(u)) \\
&\quad + \sum_{s=t_0+1}^{t} e_j(s; i) \prod_{u=s}^{t-1} (1 + \gamma_j(u)), \quad 0 \leq i < j \leq k-1.
\end{aligned} \qquad (\mathrm{A2})$$

If the model is time homogeneous so that $\beta_j(t) = \beta_j$, $\gamma_j(t) = \gamma_j$ ($j = i, \ldots, k-1$), and if $\gamma_i \neq \gamma_j$ if $i \neq j$, then the above solutions under the initial conditions ($J_j(t_0; i) = 0$, $j > i+1$) reduce to

$$\begin{aligned}
J_i(t; i) &= J_i(t_0; i)\,(1 + \gamma_i)^{t-t_0} + \eta_i^{(0)}(t; i), \quad i = 0, 1, \ldots, \mathrm{Min}(k-1, 2), \\
J_j(t; i) &= J_j(t_0; i)\,(1 + \gamma_j)^{t-t_0} + \sum_{u=i}^{j-1} J_u(t_0; i) \left(\prod_{v=u}^{j-1} \beta_v\right) \sum_{r=u}^{j} A_{uj}(r)\,(1 + \gamma_r)^{t-t_0} \\
&\quad + \eta_j^{(0)}(t; i) + \sum_{u=1}^{j-i} \left(\prod_{r=j-u}^{j-1} \beta_r\right) \eta_j^{(u)}(t; i) \\
&= \sum_{u=i}^{i+1} J_u(t_0; i) \left(\prod_{v=u}^{j-1} \beta_v\right) \sum_{r=u}^{j} A_{uj}(r)\,(1 + \gamma_r)^{t-t_0} \\
&\quad + \eta_j^{(0)}(t; i) + \sum_{u=1}^{j-i} \left(\prod_{r=j-u}^{j-1} \beta_r\right) \eta_j^{(u)}(t; i), \quad 0 \leq i < j \leq k-1,
\end{aligned} \qquad (\mathrm{A3})$$

where $\eta_j^{(0)}(t; i) = \sum_{s=t_0+1}^{t} e_j(s; i)\,(1 + \gamma_i)^{t-s}$ and $\eta_j^{(u)}(t; i) = \sum_{s=t_0+1}^{t} e_{j-u}(s; i) \sum_{v=u}^{j} A_{uj}(v)\,(1 + \gamma_v)^{t-s}$ ($u = 1, \ldots, j-i$).

Thus, if the model is time homogeneous and if $\gamma_i \neq \gamma_j$ if $i \neq j$, the $E[J_{k-1}(t; i)]$'s in discrete time models under the initial conditions ($J_j(t_0; i) = 0$, $j > i+1$) are given, respectively, by

$$E[J_{k-1}(t; i)] = \sum_{u=i}^{i+1} E[J_u(t_0; i)] \left(\prod_{v=u}^{k-2} \beta_v\right) \times \sum_{r=u}^{k-1} A_{u(k-1)}(r)\,(1 + \gamma_r)^{t-t_0}, \quad i = 0, 1, \ldots, \mathrm{Min}(k-1, 2). \qquad (\mathrm{A4})$$
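Taking expectations in (A1), and assuming (as the absence of noise terms in (A4) suggests) that the random terms $e_j$ have mean zero, the expected numbers satisfy a deterministic linear recursion. The sketch below iterates that recursion for a time homogeneous model and compares it with the closed form for the simplest case of two compartments, obtained here by summing a geometric series directly rather than through the coefficients $A_{uj}(r)$. The function names and the numerical values are illustrative assumptions.

```python
def expected_counts(j0, beta, gamma, steps):
    """Iterate E[J_j(t+1)] = E[J_j(t)] (1 + gamma_j) + E[J_{j-1}(t)] beta_{j-1}.

    j0:    initial expected counts, one entry per stage.
    beta:  beta[j] is the transition rate from stage j to stage j + 1.
    gamma: gamma[j] is the net proliferation rate of stage j.
    The first stage has no inflow; noise terms are taken to have mean zero.
    """
    current = list(j0)
    for _ in range(steps):
        nxt = [current[0] * (1.0 + gamma[0])]
        for j in range(1, len(current)):
            nxt.append(current[j] * (1.0 + gamma[j]) + current[j - 1] * beta[j - 1])
        current = nxt
    return current

def two_stage_closed_form(n0, beta0, gamma0, gamma1, steps):
    """Closed form for the second compartment when it starts empty and
    gamma0 != gamma1: n0 * beta0 * ((1+gamma0)^n - (1+gamma1)^n) / (gamma0 - gamma1)."""
    a, b = 1.0 + gamma0, 1.0 + gamma1
    return n0 * beta0 * (a ** steps - b ** steps) / (gamma0 - gamma1)

if __name__ == "__main__":
    by_recursion = expected_counts([1e8, 0.0], [1e-6], [0.0, 0.03], 50)[1]
    by_formula = two_stage_closed_form(1e8, 1e-6, 0.0, 0.03, 50)
    print(by_recursion, by_formula)
```

The division by $\gamma_0 - \gamma_1$ in the closed form shows why the condition $\gamma_i \neq \gamma_j$ for $i \neq j$ is needed for solutions of this type.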

References

[1] M. P. Little, “Cancer models, ionization, and genomic instability: a review,” in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 5, World Scientific, River Edge, NJ, USA, 2008.

[2] W. Y. Tan, Stochastic Models of Carcinogenesis, Marcel Dekker, New York, NY, USA, 1991.

[3] W. Y. Tan, “Stochastic multi-stage models of carcinogenesis as hidden Markov models: a new approach,” International Journal of Systems and Synthetic Biology, vol. 1, pp. 313–337, 2010.

[4] W. Y. Tan, L. J. Zhang, and C. W. Chen, “Stochastic modeling of carcinogenesis: state space models and estimation of parameters,” Discrete and Continuous Dynamical Systems B, vol. 4, no. 1, pp. 297–322, 2004.

[5] W. Y. Tan, C. W. Chen, and L. J. Zhang, “Cancer biology, cancer models and stochastic mathematical analysis of carcinogenesis,” in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 3, World Scientific, River Edge, NJ, USA, 2008.

[6] R. A. Weinberg, The Biology of Human Cancer, Garland Science, Taylor and Francis, New York, NY, USA, 2007.

[7] Q. Zheng, “Stochastic multistage cancer models: a fresh look at an old approach,” in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 2, World Scientific, River Edge, NJ, USA, 2008.

[8] W. Y. Tan, L. J. Zhang, W. Chen, and J. M. Zhu, “A stochastic model of human colon cancer involving multiple pathways,” in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 11, World Scientific, River Edge, NJ, USA, 2008.

[9] W. Y. Tan and H. Zhou, “A new stochastic model of retinoblastoma involving both hereditary and non-hereditary cancer cases,” Journal of Carcinogenesis and Mutagenesis, vol. 2, no. 2, article 117, 2011.

[10] X. W. Yan, Stochastic and State Space Models of Carcinogenesis Under Complex Situation [Ph.D. thesis], Department of Mathematical Sciences, University of Memphis, Memphis, Tenn, USA.

[11] G. L. Yang and C. W. Chen, “A stochastic two-stage carcinogenesis model: a new approach to computing the probability of observing tumor in animal bioassays,” Mathematical Biosciences, vol. 104, no. 2, pp. 247–258, 1991.

[12] H. Osada and T. Takahashi, “Genetic alterations of multiple tumor suppressors and oncogenes in the carcinogenesis and progression of lung cancer,” Oncogene, vol. 21, no. 48, pp. 7421–7434, 2002.

[13] I. I. Wistuba, L. Mao, and A. F. Gazdar, “Smoking molecular damage in bronchial epithelium,” Oncogene, vol. 21, no. 48, pp. 7298–7306, 2002.

[15] H W Mensink D Paridaens and A De Klein ldquoGenetics ofuveal melanomardquo Expert Review of Ophthalmology vol 4 no6 pp 607ndash616 2009

[16] WY Tan andXWYan ldquoAnew stochastic and state spacemodelof human colon cancer incorporating multiple pathwaysrdquoBiology Direct vol 5 article 26 2010

[17] L B Klebanov S T Rachev and A Y Yakovlev ldquoA stochasticmodel of radiation carcinogenesis latent time distributions andtheir propertiesrdquo Mathematical Biosciences vol 113 no 1 pp51ndash75 1993

[18] A Y Yakovlev and A D Tsodikov Stochastic Models of TumorLatency and Their Biostatistical Applications World ScientificRiver Edge NJ USA 1996

[19] H Fakir W Y Tan L Hlatky P Hahnfeldt and R K SachsldquoStochastic population dynamic effects for lung cancer progres-sionrdquo Radiation Research vol 172 no 3 pp 383ndash393 2009

[20] A G Knudson ldquoMutation and cancer statistical study ofretinoblastomardquoProceedings of theNational Academy of Sciencesof the United States of America vol 68 no 4 pp 820ndash823 1971

[21] J F Crow and M Kimura An Introduction to PopulationGenetics Theory Harper and Row New York NY USA 1970

[22] W Y Tan Stochastic Models With Applications to GeneticsCancers AIDS and Other Biomedical Systems World ScientificRiver Edge NJ USA 2002

[23] W Y Tan CW Chen and L J Zhang ldquoCancer risk assessmentby state space modelsrdquo in Handbook of Cancer Models andApplications W Y Tan and L Hanin Eds Chapter 13 WorldScientific River Edge NJ USA 2008

[24] W Y Tan andCWChen ldquoStochasticmodeling of carcinogene-sis some new insightsrdquoMathematical and Computer Modellingvol 28 no 11 pp 49ndash71 1998

[25] W Y Tan and C W Chen ldquoCancer stochastic modelsrdquo inEncyclopedia of Statistical Sciences Revised edition John Wileyand Sons New York NY USA 2005

[26] E G Luebeck and S HMoolgavkar ldquoMultistage carcinogenesisand the incidence of colorectal cancerrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 99 no 23 pp 15095ndash15100 2002

[27] R Durrett J Mayberry S Moseley and D Schmidt ldquoProba-bility models for cancer development and progressionrdquo GoogleSearch Google 2012

[28] G E P Box and G C Tiao Bayesian Inference in StatisticalAnalysis Addison-Wesley Reading Mass USA 1973


[29] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm (with discussion),” Journal of the Royal Statistical Society B, vol. 39, pp. 1–38, 1977.

[30] A. F. M. Smith and A. E. Gelfand, “Bayesian statistics without tears: a sampling-resampling perspective,” American Statistician, vol. 46, pp. 84–88, 1992.

[31] A. E. Loercher and J. W. Harbour, “Molecular genetics of uveal melanoma,” Current Eye Research, vol. 27, no. 2, pp. 69–74, 2003.

[32] C. S. Potten, C. Booth, and D. Hargreaves, “The small intestine as a model for evaluating adult tissue stem cell drug targets,” Cell Proliferation, vol. 36, no. 3, pp. 115–129, 2003.

[33] C. J. Lynch and J. Milner, “Loss of one p53 allele results in four-fold reduction of p53 mRNA and protein: a basis for p53 haplo-insufficiency,” Oncogene, vol. 25, no. 24, pp. 3463–3470, 2006.


uncorrelated with the staging variables and $T(t)$. The initial conditions at birth ($t_0$) for the above stochastic differential equations are $\{J_u(t_0; i) > 0,\ u = i, i+1;\ J_u(t_0; i) = 0,\ u > i+1\}$.

Given the initial conditions $(J_u(t_0; i) > 0,\ u = i, i+1)$ and $(J_u(t_0; i) = 0,\ u > i+1)$ at birth ($t_0$), the solution of the equations in (6) is given, respectively, by

$$
\begin{aligned}
J_i(t; i) &= J_i(t_0; i)\, e^{\int_{t_0}^{t} \gamma_i(x)\,dx} + \eta_i(t; i), \qquad i = 0, 1, \ldots, \operatorname{Min}(2, k-1), \\
J_u(t; i) &= J_u(t_0; i)\, e^{\int_{t_0}^{t} \gamma_u(x)\,dx}
 + \int_{t_0}^{t} J_{u-1}(x; i)\, \beta_{u-1}(x)\, e^{\int_{x}^{t} \gamma_u(y)\,dy}\,dx + \eta_u(t; i) \\
&= \cdots = J_u(t_0; i)\, e^{\int_{t_0}^{t} \gamma_u(x)\,dx}
 + \sum_{v=1}^{u-i} J_{u-v}(t_0; i)\, \phi_u^{(v)}(t; i)
 + \sum_{v=1}^{u+1-i} \eta_u^{(v)}(t; i), \qquad u = i+1, \ldots, k-1,
\end{aligned}
\tag{7}
$$

where $i = 0$ if $k = 2$; $i = 0, 1$ if $k = 3$; and $i = 0, 1, 2$ if $k > 3$,

where

$$
\begin{aligned}
\phi_u^{(1)}(t; i) &= \int_{t_0}^{t} e^{\int_{x}^{t} \gamma_u(y)\,dy + \int_{t_0}^{x} \gamma_{u-1}(y)\,dy}\, \beta_{u-1}(x)\,dx, \qquad u = i, \ldots, k-1, \\
\phi_u^{(v)}(t; i) &= \int_{t_0}^{t} e^{\int_{x}^{t} \gamma_u(y)\,dy}\, \beta_{u-1}(x)\, \phi_{u-1}^{(v-1)}(x; i)\,dx, \qquad v = 2, \ldots, u-i, \\
\eta_u(t; i) = \eta_u^{(1)}(t; i) &= \int_{t_0}^{t} e^{\int_{x}^{t} \gamma_u(y)\,dy}\, e_u(x; i)\,dx, \qquad u = i, \ldots, k-1, \\
\eta_u^{(v)}(t; i) &= \int_{t_0}^{t} e^{\int_{x}^{t} \gamma_u(y)\,dy}\, \beta_{u-1}(x)\, \eta_{u-1}^{(v-1)}(x; i)\,dx, \qquad v = 2, \ldots, u-i.
\end{aligned}
\tag{8}
$$

If the model is time homogeneous, so that $\{\beta_u(t) = \beta_u,\ b_u(t) = b_u,\ d_u(t) = d_u,\ \gamma_u(t) = b_u - d_u = \gamma_u,\ u = 0, 1, \ldots, k-1\}$, and if $\gamma_i \neq \gamma_u$ for $i \neq u$, the above solutions under the initial conditions $(J_{i+u}(t_0; i) > 0,\ u = 0, 1;\ J_{i+u}(t_0; i) = 0,\ u > 1)$ then reduce, respectively, to

$$
\begin{aligned}
J_i(t; i) &= J_i(t_0; i)\, e^{\gamma_i (t - t_0)} + \eta_i^{(1)}(t; i), \qquad i = 0, 1, \ldots, \operatorname{Min}(2, k-1), \\
J_u(t; i) &= J_u(t_0; i)\, e^{\gamma_u (t - t_0)} + \beta_{u-1} \int_{t_0}^{t} J_{u-1}(x; i)\, e^{\gamma_u (t - x)}\,dx + \eta_u^{(1)}(t; i) \\
&= \cdots = J_u(t_0; i)\, e^{\gamma_u (t - t_0)}
 + \sum_{r=i}^{u-1} J_r(t_0; i) \Bigl(\prod_{v=r}^{u-1} \beta_v\Bigr) \sum_{l=r}^{u} A_{ru}(l)\, e^{\gamma_l (t - t_0)}
 + \sum_{r=i}^{u} \eta_u^{(u+1-r)}(t; i) \\
&= \sum_{r=i}^{i+1} J_r(t_0; i) \Bigl(\prod_{v=r}^{u-1} \beta_v\Bigr) \sum_{l=r}^{u} A_{ru}(l)\, e^{\gamma_l (t - t_0)}
 + \sum_{r=i}^{u} \eta_u^{(u+1-r)}(t; i), \qquad i < u \le k-1,
\end{aligned}
\tag{9}
$$

where, for $i \le v \le u$,

$$
A_{iu}(v) =
\begin{cases}
1, & \text{if } i = u, \\[4pt]
\displaystyle\prod_{\substack{l = i \\ l \neq v}}^{u} (\gamma_l - \gamma_v)^{-1}, & \text{if } i < u.
\end{cases}
\tag{10}
$$

Obviously, $E[\eta_u^{(r)}(t; i)] = 0$ for all $(i = 0, 1, 2;\ u = i, \ldots, k-1;\ r = 1, \ldots, u-i+1)$. It follows that for $(i = 0, 1, \ldots, \operatorname{Min}(k-1, 2))$ the expected values $E[J_{k-1}(t; i)]$ of $J_{k-1}(t; i)$ for homogeneous models with $\gamma_i \neq \gamma_u$ for $i \neq u$ are given by

$$
\begin{aligned}
E[J_{k-1}(t; i)] &= E[J_{i+1}(t_0; i)] \Bigl(\prod_{u=i+1}^{k-2} \beta_u\Bigr) \sum_{v=i+1}^{k-1} A_{(i+1)(k-1)}(v)\, e^{\gamma_v (t - t_0)} \\
&\quad + E[J_i(t_0; i)] \Bigl(\prod_{u=i}^{k-2} \beta_u\Bigr) \sum_{v=i}^{k-1} A_{i(k-1)}(v)\, e^{\gamma_v (t - t_0)}, \\
&\qquad i = 0, 1, \ldots, \operatorname{Min}(k-1, 2),\ k \ge 2,
\end{aligned}
\tag{11}
$$

where, as a convention, $\sum_{j=i+1}^{i} c_j = 0$ and $\prod_{j=i+1}^{i} d_j = 1$ for all $(c_j, d_j)$.

Remark 4. Because genetic changes and epigenetic changes occur during cell division, to the order of $o(\Delta t)$ the probability is $\beta_u(t)\Delta t$ that one $J_u$ cell at time $t$ would give rise to one $J_u$ cell and one $J_{u+1}$ cell at time $t + \Delta t$ by genetic changes or epigenetic changes. It follows that the transition $J_u \rightarrow J_{u+1}$ would not affect the population size of $J_u$ cells but would only increase the size of the $J_{u+1}$ population.

3.2. Transition Probabilities and Probability Distributions of Staging Variables

Let $f(x; n, p)$ denote the probability density function of a binomial random variable $X \sim \text{Binomial}(n, p)$, $h(x; \lambda)$ the probability density function of a Poisson random variable $X \sim \text{Poisson}(\lambda)$, and $g(x, y; n, p_1, p_2)$ the probability density function of a bivariate multinomial random vector $(X, Y) \sim \text{Multinomial}(n; p_1, p_2)$. Using the stochastic equations of the staging variables given by (2) and the probability distributions of the transition variables $\{B_r(t; i), D_r(t; i), M_r(t; i)\}$ in (4), as in Tan et al. [4, 5], we obtain the following transition probabilities of $\{J_r(t + \Delta t; i),\ r = i, \ldots, k-1\}$ given $\{J_r(t; i),\ r = i, \ldots, k-1\}$ for $(i = 0, 1, \ldots, \operatorname{Min}(k-1, 2))$:

$$
\begin{aligned}
&P\{J_r(t + \Delta t; i) = v_r,\ r = i, i+1, \ldots, k-1 \mid J_r(t; i) = u_r,\ r = i, i+1, \ldots, k-1\} \\
&\quad = P\{J_i(t + \Delta t; i) = v_i \mid J_i(t; i) = u_i\} \\
&\qquad \times \prod_{j=i+1}^{k-1} P\{J_j(t + \Delta t; i) = v_j \mid J_j(t; i) = u_j,\ J_{j-1}(t; i) = u_{j-1}\},
\end{aligned}
\tag{12}
$$

where

$$
\begin{aligned}
&P\{J_i(t + \Delta t; i) = v_i \mid J_i(t; i) = u_i\} \\
&\quad = \sum_{r=0}^{u_i} f(r; u_i, b_i(t)\Delta t)\, f\Bigl(u_i - v_i + r;\ u_i - r,\ \frac{d_i(t)\Delta t}{1 - b_i(t)\Delta t}\Bigr), \qquad i = 0, 1, \ldots, \operatorname{Min}(k-1, 2), \\
&P\{J_j(t + \Delta t; i) = v_j \mid J_j(t; i) = u_j,\ J_{j-1}(t; i) = u_{j-1}\} \\
&\quad = \sum_{r_1=0}^{u_j} \sum_{r_2=0}^{u_j - r_1} g(r_1, r_2; u_j, b_j(t)\Delta t, d_j(t)\Delta t)\, h(v_j - u_j - r_1 + r_2;\ u_{j-1}\beta_{j-1}\Delta t), \qquad j > i.
\end{aligned}
\tag{13}
$$
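The transition structure behind (12) and (13) can be illustrated by simulating a single time step. The following Python sketch is only an illustration under stated assumptions: the function names (`binomial`, `poisson`, `step`), the constant rates, and the simple samplers are hypothetical choices and are not taken from the paper. Births are drawn as binomial, deaths as binomial among the cells that did not divide (the sequential form of the multinomial in (13)), and new mutants arriving from the previous stage as Poisson.

```python
import math
import random

def binomial(n, p, rng):
    """Draw Binomial(n, p) as a sum of Bernoulli trials (fine for modest n)."""
    return sum(1 for _ in range(n) if rng.random() < p)

def poisson(mean, rng):
    """Draw Poisson(mean) with Knuth's multiplication method (small means only)."""
    limit, count, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

def step(cells, b, d, beta, dt, rng):
    """Advance the stage counts (J_i, ..., J_{k-1}) over one interval of length dt.

    cells[j] is the current count of stage j; b, d are per-stage birth and
    death rates; beta[j] is the mutation rate from stage j to stage j + 1.
    All rates are assumed constant here (time homogeneous case).
    """
    new = []
    for j, u in enumerate(cells):
        births = binomial(u, b[j] * dt, rng)
        # Deaths occur only among cells that did not divide in this interval.
        deaths = binomial(u - births, d[j] * dt / (1.0 - b[j] * dt), rng)
        # Mutants from stage j-1 add to stage j without depleting stage j-1.
        mutants = poisson(cells[j - 1] * beta[j - 1] * dt, rng) if j > 0 else 0
        new.append(u + births - deaths + mutants)
    return new

rng = random.Random(0)
state = step([1000, 0], b=[0.1, 0.0], d=[0.05, 0.0], beta=[0.01], dt=0.1, rng=rng)
```

Repeated calls to `step` give one sample path of the staging variables; the samplers above are deliberately naive and would need to be replaced by faster ones for realistic cell numbers.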

Define the unobservable transition variables $\underset{\sim}{U}_i(t) = \{B_i(t; i),\ (B_j(t; i), D_j(t; i)),\ j = i+1, \ldots, k-1\}'$ $(i = 0, 1, \ldots, \operatorname{Min}(k-1, 2))$. Then we have for the joint probability density function of $\{\underset{\sim}{X}_i(t + \Delta t), \underset{\sim}{U}_i(t)\}$ given $\underset{\sim}{X}_i(t)$:

$$
P\{\underset{\sim}{X}_i(t + \Delta t), \underset{\sim}{U}_i(t) \mid \underset{\sim}{X}_i(t)\}
= P\{\underset{\sim}{X}_i(t + \Delta t) \mid \underset{\sim}{U}_i(t), \underset{\sim}{X}_i(t)\}
\times P\{\underset{\sim}{U}_i(t) \mid \underset{\sim}{X}_i(t)\},
\tag{14}
$$

where

$$
\begin{aligned}
P\{\underset{\sim}{X}_i(t + \Delta t) \mid \underset{\sim}{U}_i(t), \underset{\sim}{X}_i(t)\}
&= f\Bigl(J_i(t; i) - J_i(t + \Delta t; i) + B_i(t; i);\ J_i(t; i) - B_i(t; i),\ \frac{d_i(t)\Delta t}{1 - b_i(t)\Delta t}\Bigr) \\
&\quad \times \prod_{j=i+1}^{k-1} h\{J_j(t + \Delta t; i) - J_j(t; i) - B_j(t; i) + D_j(t; i);\ J_{j-1}(t; i)\beta_{j-1}(t)\Delta t\},
\end{aligned}
\tag{15}
$$

$$
\begin{aligned}
P\{\underset{\sim}{U}_i(t) \mid \underset{\sim}{X}_i(t)\}
&= f(B_i(t; i);\ J_i(t; i),\ b_i(t)\Delta t) \\
&\quad \times \prod_{j=i+1}^{k-1} g\{B_j(t; i), D_j(t; i);\ J_j(t; i),\ b_j(t)\Delta t,\ d_j(t)\Delta t\}.
\end{aligned}
\tag{16}
$$

Let $\underset{\sim}{e}_i(j)$ be a $(k - i) \times 1$ column vector with 1 in the $j$th position $(1 \le j \le k - i)$ and with 0 in other positions. Let $\underset{\sim}{u} = (u_i, \ldots, u_{k-1})'$ and $\underset{\sim}{v} = (v_i, \ldots, v_{k-1})'$ be $(k - i) \times 1$ column vectors of nonnegative integers (i.e., $u_j$ and $v_j$ are nonnegative integers). Then, by using the probability distribution results in (14)–(16), it can readily be shown that

$$
P\{\underset{\sim}{X}_i(t + \Delta t) = \underset{\sim}{v} \mid \underset{\sim}{X}_i(t) = \underset{\sim}{u}\}
=
\begin{cases}
[u_j b_j(t) + (1 - \delta_{ji})\, u_{j-1}\beta_{j-1}(t)]\Delta t + o(\Delta t), & \text{if } \underset{\sim}{v} = \underset{\sim}{u} + \underset{\sim}{e}_i(j),\ j = i, \ldots, k-1, \\[4pt]
u_j d_j(t)\Delta t + o(\Delta t), & \text{if } \underset{\sim}{v} = \underset{\sim}{u} - \underset{\sim}{e}_i(j),\ j = i, \ldots, k-1, \\[4pt]
o(\Delta t), & \text{if } \bigl|\underset{\sim}{1}'_{n-k}(\underset{\sim}{u} - \underset{\sim}{v})\bigr| \ge 2.
\end{cases}
\tag{17}
$$

The above results imply that $\underset{\sim}{X}_i(t)$ is a $(k - i)$-dimensional birth-death process with birth rates $\{j b_u(t),\ u = i, \ldots, k-1\}$, death rates $\{j d_u(t),\ u = i, \ldots, k-1\}$, and cross-transition rates $\{\alpha_{u,u+1}(j, t) = j\beta_u(t),\ u = i, \ldots, k-1;\ \alpha_{u,v}(j, t) = 0 \text{ if } v \neq u + 1\}$. (See Definition 4.1 in Tan ([22], Chapter 4).) Using these results, it can be shown that the Kolmogorov forward equation for the probabilities $P\{J_j(t; i) = u_j,\ j = i, \ldots, k-1 \mid J_i(0) = m_i,\ J_j(0) = 0,\ j > i\} = P(u_j,\ j = i, \ldots, k-1;\ t)$ $(i = 0, 1, \ldots, \operatorname{Min}(k-1, 2))$ in the above model is given by

$$
\begin{aligned}
\frac{d}{dt} P(u_j,\ j = i, \ldots, k-1;\ t)
&= P(u_i - 1, u_j,\ j = i+1, \ldots, k-1;\ t)\,(u_i - 1)\, b_i(t) \\
&\quad + \sum_{j=i+1}^{k-1} P(u_i, u_{i+1}, \ldots, u_{j-1}, u_j - 1, u_{j+1}, \ldots, u_{k-1};\ t)\,(u_j - 1)\, b_j(t) \\
&\quad + \sum_{j=i}^{k-2} P(u_i, u_{i+1}, \ldots, u_j, u_{j+1} - 1, u_{j+2}, \ldots, u_{k-1};\ t)\, u_j \beta_j(t) \\
&\quad + \sum_{j=i}^{k-1} P(u_i, u_{i+1}, \ldots, u_{j-1}, u_j + 1, u_{j+1}, \ldots, u_{k-1};\ t)\,(u_j + 1)\, d_j(t) \\
&\quad - P(u_i, u_{i+1}, \ldots, u_{k-1};\ t) \Bigl\{\sum_{j=i}^{k-1} u_j [b_j(t) + d_j(t)] + \sum_{j=i}^{k-2} u_j \beta_j(t)\Bigr\},
\end{aligned}
\tag{18}
$$

for $u_j = 0, 1, \ldots, \infty$, $j = i, \ldots, k-1$.

By using the above set of differential equations, one can readily compute the probabilities $P\{J_j(t) = u_j,\ j = i, \ldots, k-1 \mid J_i(0) = m_i\} = P(u_j,\ j = i, \ldots, k-1;\ t)$ numerically.

3.3. The Probability Distributions of the Number of Detectable Tumors and Times to Tumors

As shown by Yang and Chen [11], malignant cancer tumors arise from primary $J_k$ cells by clonal expansion, where primary $J_k$ cells are $J_k$ cells generated directly by $J_{k-1}$ cells ($J_k$ cells derived by stochastic birth of other $J_k$ cells are not primary $J_k$ cells). That is, cancer tumors develop from primary $J_k$ cells through stochastic birth-death processes.

To derive the probability distribution for $T(t)$ in $J_i$ people in the population, let $P_T(s, t)$ denote the probability that a primary cancer cell at time $s$ develops into a detectable cancer tumor by time $t$. (An explicit formula for $P_T(s, t)$ has been given in Tan [22], Chapter 8, and in Tan and Chen [24].) Then, as shown in Tan ([3, 22], Chapter 8), the conditional probability distribution of $T(t)$ given $\{J_{k-1}(s; i),\ s \le t\}$ in $J_i$ people is Poisson with mean $\omega(t; i)$, where $\omega(t; i) = \int_{t_0}^{t} J_{k-1}(s; i)\,\beta_{k-1}(s)\,P_T(s, t)\,ds$. That is,

$$
T(t) \mid \{J_{k-1}(s; i),\ s \le t\} \sim \text{Poisson}(\omega(t; i)).
\tag{19}
$$

Let $Q_i(j)$ be the probability that cancer tumors develop during $(t_{j-1}, t_j]$ in $J_i$ people in the population. For time homogeneous models with small $\beta_{k-1}$, $Q_i(j)$ is then given by

$$
\begin{aligned}
Q_i(j) &= E\{e^{-\omega(t_{j-1}; i)} - e^{-\omega(t_j; i)}\} \\
&= e^{-\beta_{k-1} H_i(t_{j-1})} - e^{-\beta_{k-1} H_i(t_j)} + o(\beta_{k-1}),
\end{aligned}
\tag{20}
$$

where $H_i(t) = \int_{t_0}^{t} E[J_{k-1}(x; i)]\,P_T(x, t)\,dx$.

To derive $Q_i(j)$, denote by

$$
\begin{aligned}
\theta_{i(k-1)} &= E[J_i(t_0; i-1)]\,\beta_i \prod_{u=i+1}^{k-1} \Bigl(\frac{\beta_u}{\gamma_u}\Bigr), \qquad i = 1, \ldots, \operatorname{Min}(3, k-1), \\
\lambda_{u(k-1)} &= E[J_u(t_0; u)]\,\beta_u \prod_{v=u+1}^{k-1} \Bigl(\frac{\beta_v}{\gamma_v}\Bigr), \qquad u = 0, 1, \ldots, \operatorname{Min}(2, k-1),
\end{aligned}
\tag{21}
$$

and define the functions

$$
\psi_{i(k-1)}(t) = \Bigl(\prod_{u=i+1}^{k-1} \gamma_u\Bigr) \sum_{v=i}^{k-1} A_{i(k-1)}(v) \int_{t_0}^{t} e^{\gamma_v (x - t_0)}\,P_T(x, t)\,dx, \qquad i = 0, 1, \ldots, \operatorname{Min}(2, k-1).
\tag{22}
$$

Applying the results for $E[J_{k-1}(t; i)]$ given in (11) for time homogeneous models with $\gamma_i \neq \gamma_j$ for $i \neq j$, we obtain the $Q_i(j)$'s as follows.

(1) If $k = 2$, then $k - 1 = 1$. Hence, $Q_2(0) = 1$, $Q_i(0) = Q_2(j) = 0$ for $(i = 0, 1;\ j > 0)$, and for $j > 0$,

$$
Q_0(j) = e^{-\theta_{11}\psi_{11}(t_{j-1}) - \lambda_{01}\psi_{01}(t_{j-1})} - e^{-\theta_{11}\psi_{11}(t_j) - \lambda_{01}\psi_{01}(t_j)} + o(\beta_1),
\tag{23}
$$

$$
Q_1(j) = (1 - \alpha_1)\{e^{-\lambda_{11}\psi_{11}(t_{j-1})} - e^{-\lambda_{11}\psi_{11}(t_j)}\} + o(\beta_1).
\tag{24}
$$

(2) If $k \ge 3$, then we have $Q_i(0) = 0$ for $i = 0, 1$ and $Q_2(0) = \delta_{2k}\alpha$, and for $j > 0$,

$$
Q_0(j) = e^{-\theta_{1(k-1)}\psi_{1(k-1)}(t_{j-1}) - \lambda_{0(k-1)}\psi_{0(k-1)}(t_{j-1})} - e^{-\theta_{1(k-1)}\psi_{1(k-1)}(t_j) - \lambda_{0(k-1)}\psi_{0(k-1)}(t_j)} + o(\beta_{k-1}),
\tag{25}
$$

$$
Q_1(j) = e^{-\theta_{2(k-1)}\psi_{2(k-1)}(t_{j-1}) - \lambda_{1(k-1)}\psi_{1(k-1)}(t_{j-1})} - e^{-\theta_{2(k-1)}\psi_{2(k-1)}(t_j) - \lambda_{1(k-1)}\psi_{1(k-1)}(t_j)} + o(\beta_{k-1}),
\tag{26}
$$

$$
\begin{aligned}
Q_2(j) &= \delta_{2k}(1 - \alpha)\{e^{-\lambda_{22}\psi_{22}(t_{j-1})} - e^{-\lambda_{22}\psi_{22}(t_j)}\} \\
&\quad + (1 - \delta_{2k})\{e^{-\theta_{3(k-1)}\psi_{3(k-1)}(t_{j-1}) - \lambda_{2(k-1)}\psi_{2(k-1)}(t_{j-1})} - e^{-\theta_{3(k-1)}\psi_{3(k-1)}(t_j) - \lambda_{2(k-1)}\psi_{2(k-1)}(t_j)}\} \\
&\quad + o(\beta_{k-1}),
\end{aligned}
\tag{27}
$$

where $\delta_{2k} = 1$ if $k = 2$ and $\delta_{2k} = 0$ if $k \neq 2$.

Notice that if $\gamma_0 = 0$, then $\psi_{0(k-1)}(t)$ reduces to

$$
\begin{aligned}
\psi_{0(k-1)}(t) &= \Bigl(\prod_{u=1}^{k-1} \gamma_u\Bigr) \sum_{v=0}^{k-1} A_{0(k-1)}(v) \int_{t_0}^{t} e^{\gamma_v (x - t_0)}\,P_T(x, t)\,dx \\
&= \Bigl(\prod_{u=1}^{k-1} \gamma_u\Bigr) \sum_{v=1}^{k-1} A_{1(k-1)}(v) \int_{t_0}^{t} [e^{\gamma_v (x - t_0)} - 1]\,P_T(x, t)\,dx.
\end{aligned}
\tag{28}
$$

Notice also that if $P_T(s, t) \approx 1$ for $t > s$ and if $\gamma_0 = 0$, then the above $\psi_{ij}(t)$'s reduce, respectively, to

$$
\begin{aligned}
\psi_{ii}(t) &= \frac{1}{\gamma_i}\{e^{\gamma_i (t - t_0)} - 1\}, \qquad i = 1, \ldots, k-1, \\
\psi_{0(k-1)}(t) &= \Bigl(\prod_{u=1}^{k-1} \gamma_u\Bigr) \sum_{v=1}^{k-1} A_{1(k-1)}(v)\, \frac{1}{\gamma_v^{2}}\{e^{\gamma_v (t - t_0)} - 1 - (t - t_0)\gamma_v\}, \\
\psi_{i(k-1)}(t) &= \Bigl(\prod_{u=i+1}^{k-1} \gamma_u\Bigr) \sum_{v=i}^{k-1} A_{i(k-1)}(v)\, \frac{1}{\gamma_v}\{e^{\gamma_v (t - t_0)} - 1\}, \qquad i = 1, \ldots, k-1.
\end{aligned}
\tag{29}
$$
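The simplest of these expressions can be checked numerically. The Python sketch below, offered only as an illustration, evaluates the closed form of $\psi_{ii}(t)$ under the assumption $P_T \approx 1$, compares it with a trapezoid-rule evaluation of the defining integral, and then uses it to compute the interval probabilities of (24) with the $o(\beta_1)$ term dropped. The function names and all parameter values are hypothetical.

```python
import math

def psi_ii(t, gamma, t0=0.0):
    """Closed form psi_ii(t) = (exp(gamma (t - t0)) - 1) / gamma, assuming P_T = 1."""
    return math.expm1(gamma * (t - t0)) / gamma

def psi_ii_numeric(t, gamma, t0=0.0, n=20000):
    """Trapezoid rule for the integral of exp(gamma (x - t0)) over [t0, t]."""
    h = (t - t0) / n
    total = 0.5 * (1.0 + math.exp(gamma * (t - t0)))
    for s in range(1, n):
        total += math.exp(gamma * s * h)
    return total * h

def q1_two_stage(j, ages, lam, gamma, alpha):
    """Q_1(j) for k = 2, without the o(beta_1) remainder.

    ages[0] plays the role of t_0 and (ages[j-1], ages[j]] is the jth age group;
    lam stands for lambda_11 and alpha for alpha_1.
    """
    lo = math.exp(-lam * psi_ii(ages[j - 1], gamma, ages[0]))
    hi = math.exp(-lam * psi_ii(ages[j], gamma, ages[0]))
    return (1.0 - alpha) * (lo - hi)

# Illustrative values only; they are not estimates from the paper.
ages = [0.0, 1.0, 2.0, 3.0]
q_values = [q1_two_stage(j, ages, lam=1e-3, gamma=0.05, alpha=0.1) for j in (1, 2, 3)]
```

Because the interval probabilities telescope, their sum over the first $n$ age groups equals $(1 - \alpha_1)\{1 - e^{-\lambda_{11}\psi_{11}(t_n)}\}$, which is a convenient sanity check on any implementation.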

4. Probability Distribution of Observed Cancer Incidence Incorporating Hereditary Cancer Cases

For estimating unknown parameters and to validate the model, one would need real data generated from the model. For studies of carcinogenesis, such data are usually given by cancer incidence. For example, in the SEER data of NCI/NIH of the USA, the data are given by $\{(y_0, n_0),\ (y_j, n_j),\ j = 1, \ldots, t_N\}$, where $y_0$ is the number of cancer cases at birth and $n_0$ the total number of births, and where, for $j \ge 1$, $y_j$ is the number of cancer cases developed during the $j$th age group of a one-year period (or 5-year periods) and $n_j$ is the number of noncancer people who are at risk for cancer and from whom $y_j$ have developed cancer during the $j$th age group. Given in Table 1 are the SEER data of uveal melanoma (adult eye cancer) during the period 1973–2007. In Table 1, notice that there are some cancer cases at birth, implying some inherited cancer cases. In this section, we will develop a statistical model for these types of data sets from the stochastic multistage model with hereditary cancers as given in Section 2. As in previous sections, let $n_{ij}$ be the number of individuals who have genotype $J_i$ $(i = 0, 1, 2)$ at the embryo stage among the $n_j$ people at risk for the cancer in question. Then, as shown above, $(n_{1j}, n_{2j}) \mid n_j \sim \text{Multinomial}\{n_j; p_1, p_2\}$. It follows that $n_{ij} \mid n_j \sim \text{Binomial}\{n_j, p_i\}$, $i = 0, 1, 2$. In what follows, we let $Y_j$ denote the random variable for $y_j$ unless otherwise stated.

4.1. The Probability Distribution of $Y_0$

As shown in Figure 4, $J_i$ $(i = 0, 1, 2)$ people would only generate $J_i$ stage cells and $J_{i+1}$ stage cells at birth. Thus, for cancers to develop at or before birth, the number of stages for the stochastic model of carcinogenesis must be 3 or less. It follows that if $y_0 > 0$, the appropriate model of carcinogenesis must be either a 2-stage model or a 3-stage model. Since $n_{20} \mid n_0 \sim \text{Binomial}(n_0, p_2)$ and $n_{10} \mid (n_0, n_{20}) \sim \text{Binomial}(n_0, p_1) + o(p_2)$, the probability distribution of $Y_0$ is therefore

$$
Y_0 \sim \text{Poisson}(\chi_k), \qquad k = 2, 3,
\tag{30}
$$

where

$$
\chi_k =
\begin{cases}
n_0 (p_2 + p_1 \alpha), & \text{if } k = 2, \\
n_0 p_2 \alpha, & \text{if } k = 3.
\end{cases}
\tag{31}
$$

The expected number of $Y_0$ given $n_0$ is $E(Y_0 \mid n_0) = n_0(p_2 + p_1\alpha) = \chi_2$ if $k = 2$ and $E(Y_0 \mid n_0) = n_0 p_2 \alpha = \chi_3$ if $k = 3$. Hence, for the 2-stage model (i.e., $k = 2$) or the 3-stage model (i.e., $k = 3$), the maximum likelihood estimate of $\chi_k$ is $\hat{\chi}_k = y_0$, and the deviance $D_0(k)$ from the conditional probability distribution of $Y_0$ given $n_0$ is

$$
\begin{aligned}
D_0(k) &= -2\{\log h(y_0; \chi_k) - \log h(y_0; \hat{\chi}_k)\} \\
&= 2\Bigl\{\chi_k - y_0 - y_0 \log \frac{\chi_k}{y_0}\Bigr\}, \qquad k = 2, 3.
\end{aligned}
\tag{32}
$$
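For concreteness, the deviance at birth can be computed directly from its definition as $-2$ times the difference of Poisson log-likelihoods at $\chi_k$ and at the maximum likelihood estimate $\hat{\chi}_k = y_0$. The short Python sketch below is only an illustration; the function names are hypothetical, and the numerical inputs are the age-0 observed and Model-F predicted counts from Table 1, used here simply as example values.

```python
import math

def poisson_log_pmf(y, mean):
    """log h(y; mean) for a Poisson distribution with positive mean."""
    return -mean + y * math.log(mean) - math.lgamma(y + 1)

def deviance_y0(y0, chi_k):
    """Closed form of -2 {log h(y0; chi_k) - log h(y0; y0)} for y0 > 0."""
    return 2.0 * (chi_k - y0 - y0 * math.log(chi_k / y0))

# Example values: 34 observed cases at birth, 36 predicted (Table 1, age group 0).
d_closed = deviance_y0(34, 36.0)
d_from_definition = -2.0 * (poisson_log_pmf(34, 36.0) - poisson_log_pmf(34, 34.0))
```

The two quantities `d_closed` and `d_from_definition` agree up to rounding, and the deviance vanishes when the predicted value equals the observed count.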

Table 1: The SEER incidence data (1973–2007) of uveal melanoma from NCI/NIH (over all races and genders).

Age group   Number of people at risk   Observed incidence   Model-F predicted   Model-1 predicted   Two-stage predicted
0           12495777                   34                   36                  36                  38
1           12221582                   20                   11                  11                  16
2           12120990                   14                   7                   8                   17
3           12112995                   9                    5                   6                   18
4           12146174                   4                    4                   5                   20
5           12161336                   3                    3                   4                   22
6           12111854                   2                    2                   4                   25
7           12160452                   2                    2                   4                   28
8           11942586                   2                    2                   4                   30
9           12381299                   1                    2                   5                   34
10          12512703                   2                    3                   6                   38
11          12410338                   5                    3                   6                   41
12          12449244                   2                    3                   6                   44
13          12527781                   5                    4                   7                   48
14          12602883                   3                    5                   7                   51
15          12719598                   5                    5                   8                   55
16          12766107                   7                    6                   9                   59
17          12831400                   9                    7                   9                   62
18          12382047                   8                    7                   9                   63
19          12581638                   10                   8                   10                  68
20          12636509                   7                    9                   11                  71
21          12682601                   6                    10                  10                  75
22          12840510                   12                   11                  11                  79
23          13075528                   17                   13                  13                  84
24          13358635                   16                   15                  14                  89
25          13473849                   12                   16                  16                  94
26          13426340                   17                   18                  18                  97
27          13525264                   28                   20                  20                  101
28          13149674                   17                   22                  21                  102
29          13812811                   23                   25                  25                  110
30          13886874                   24                   28                  27                  114
31          13488332                   37                   30                  29                  115
32          13460286                   32                   32                  32                  118
33          13256067                   38                   35                  35                  119
34          13428827                   37                   39                  38                  124
35          13220037                   40                   41                  41                  126
36          12870265                   30                   44                  44                  126
37          12689592                   43                   47                  47                  127
38          12157014                   42                   49                  49                  125
39          12494081                   46                   55                  54                  131
40          12272125                   49                   58                  58                  132
41          11826573                   56                   61                  60                  130
42          11663153                   54                   65                  64                  131
43          11407082                   53                   68                  68                  131
44          11296848                   70                   73                  72                  133
45          11016369                   57                   76                  76                  132
46          10651593                   71                   79                  79                  130
47          10475708                   87                   84                  83                  131
48          9994684                    82                   86                  85                  127
49          10138908                   78                   93                  93                  131
50          9836359                    87                   97                  96                  130
51          9475641                    95                   100                 99                  127
52          9250985                    113                  104                 104                 126
53          9027382                    106                  108                 108                 125
54          8883737                    117                  113                 113                 125
55          8547883                    129                  116                 116                 123
56          8279648                    107                  119                 119                 121
57          8062368                    119                  123                 123                 119
58          7654610                    132                  124                 124                 115
59          7563706                    118                  130                 129                 115
60          7232719                    131                  131                 130                 112
61          6927332                    116                  132                 132                 109
62          6708273                    133                  134                 134                 107
63          6543931                    143                  138                 137                 106
64          6404652                    130                  141                 141                 105
65          6168486                    145                  142                 142                 102
66          5913479                    138                  142                 142                 99
67          5746766                    151                  144                 144                 98
68          5480517                    147                  142                 142                 94
69          5363912                    149                  144                 144                 94
70          5110728                    136                  142                 142                 90
71          4925076                    144                  141                 141                 88
72          4696825                    140                  138                 138                 85
73          4512136                    146                  135                 135                 82
74          4345300                    126                  133                 133                 80
75          4148801                    122                  128                 129                 78
76          3900900                    124                  122                 122                 74
77          3681587                    114                  116                 116                 70
78          3481918                    93                   110                 110                 67
79          3243631                    102                  102                 102                 63
80          2961234                    79                   92                  93                  58
81          2724984                    93                   83                  84                  54
82          2495219                    82                   75                  75                  50
83          2271595                    92                   66                  67                  46
84          2041351                    64                   57                  58                  42
85          10466605                   304                  282                 285                 216

Note 1: for age 0 and 1–10 years old, the cancer incidences are derived by subtracting the incidence of retinoblastoma from the original SEER data (see Tan and Zhou [9]).
Note 2: the observed uveal melanoma incidence rates per $10^6$ individuals are derived from the SEER eye cancer incidence by subtracting the retinoblastoma incidence as given in Tan and Zhou [9].

4.2. The Probability Distribution of $Y_j$ $(j \ge 1)$

To derive the probability distribution of $Y_j$ $(j \ge 1)$ in the $j$th age group, let $Y_{ij}$ $(i = 0, 1, 2)$ be the number of cancer cases generated by people who have genotype $J_i$ at the embryo stage among these $Y_j$ cancer cases. Then $Y_j = \sum_{i=0}^{2} Y_{ij}$, and $Y_{0j}$ is the number of cancer cases generated by the $n_{0j} = n_j - n_{1j} - n_{2j}$ normal people in the population. The conditional probability distribution of $Y_{ij}$ given $n_{ij}$ is

$$
Y_{ij} \mid n_{ij} \sim \text{Poisson}\{n_{ij} Q_i(j)\}, \qquad i = 0, 1, 2.
\tag{33}
$$

Notice that if $k = 2$ (a 2-stage model), then all $J_2$ individuals would develop tumors at or before birth. Thus, if $k = 2$, then $Y_{2j} = 0$ for all $j > 0$, so that if $j > 0$, cancer cases develop only from normal people ($N = J_0$ people) and $J_1$ people. On the other hand, if $k > 2$, then with positive probability $Y_{ij} > 0$ for all $(i = 0, 1, 2;\ j = 0, 1, \ldots, n_T)$, where $n_T$ is the last time point in the data. Let $\delta_{2k} = 1$ if $k = 2$ and $\delta_{2k} = 0$ if $k \neq 2$. Then $Y_j \mid (n_{ij},\ i = 0, 1, 2) \sim \text{Poisson}(Q_T(j))$, where $Q_T(j) = \sum_{i=0}^{1} n_{ij} Q_i(j) + (1 - \delta_{2k})\, n_{2j} Q_2(j)$. Since $(n_{0j}, n_{1j}) \mid n_j \sim \text{Multinomial}(n_j; p_0, p_1)$, we have for the conditional probability density function $P(y_j \mid n_j)$ of $Y_j$ given $n_j$:

$$
P(y_j \mid n_j) = \sum_{n_{0j}=0}^{n_j} \sum_{n_{1j}=0}^{n_j - n_{0j}} g(n_{0j}, n_{1j}; n_j, p_0, p_1)\, h\{y_j; Q_T(j)\},
\tag{34}
$$

where $g(n_{0j}, n_{1j}; n_j, p_0, p_1)$ is the probability density function of $(n_{0j}, n_{1j}) \mid n_j \sim \text{Multinomial}(n_j; p_0, p_1)$ and $h\{y_j; Q_T(j)\}$ the probability density function of $Y_j \mid (n_{ij},\ i = 0, 1, 2) \sim \text{Poisson}\{Q_T(j)\}$.

The probability density function $P(y_j \mid n_j)$ given by (34) is a mixture of Poisson probability density functions with mixing probability density function given by the multinomial probability distribution of $\{n_{0j}, n_{1j}\}$ given $n_j$. This mixing probability density function represents individuals with different genotypes at the embryo stage in the population.
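To make the mixture structure concrete, the Python sketch below evaluates $P(y_j \mid n_j)$ by brute-force summation over all genotype counts $(n_{0j}, n_{1j})$. It is a toy illustration only: the function names, the tiny value of $n_j$, and the values of $(p_0, p_1)$ and $Q_i(j)$ are hypothetical, and direct enumeration of this kind would not be feasible for population sizes such as those in Table 1.

```python
import math

def poisson_pmf(y, mean):
    """Poisson probability h(y; mean), with the degenerate case mean = 0 handled."""
    if mean == 0.0:
        return 1.0 if y == 0 else 0.0
    return math.exp(-mean + y * math.log(mean) - math.lgamma(y + 1))

def multinomial_pmf(n0, n1, n, p0, p1):
    """g(n0, n1; n, p0, p1) with the third cell n2 = n - n0 - n1, p2 = 1 - p0 - p1."""
    n2 = n - n0 - n1
    p2 = 1.0 - p0 - p1
    coef = math.comb(n, n0) * math.comb(n - n0, n1)
    return coef * p0 ** n0 * p1 ** n1 * p2 ** n2

def mixture_pmf(y, n, p0, p1, q, k=3):
    """Mixture probability of y cases among n people; q = (Q_0(j), Q_1(j), Q_2(j))."""
    total = 0.0
    for n0 in range(n + 1):
        for n1 in range(n - n0 + 1):
            n2 = n - n0 - n1
            q_t = n0 * q[0] + n1 * q[1]
            if k != 2:  # the (1 - delta_2k) n_2j Q_2(j) term
                q_t += n2 * q[2]
            total += multinomial_pmf(n0, n1, n, p0, p1) * poisson_pmf(y, q_t)
    return total

# Toy inputs chosen for illustration only.
q = (0.01, 0.2, 0.5)
probs = [mixture_pmf(y, 6, 0.9, 0.08, q) for y in range(60)]
```

The cost of the double sum grows quadratically in $n_j$, which suggests why treating the genotype counts as augmented (latent) variables, as in the next subsection, is the more practical route for inference.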

Let $\Theta$ be the set of all unknown parameters (i.e., the parameters $(p_1, p_2, \alpha)$ and the birth rates, the death rates, and the mutation rates of $J_j$ cells). Based on data $(y_j,\ j = 0, 1, \ldots, n_T)$, the likelihood function of $\Theta$ is

$$
L\{\Theta \mid y_j,\ j = 0, 1, \ldots, n_T\} = h(y_0; \chi_k) \prod_{j=1}^{t_N} P(y_j \mid n_j).
\tag{35}
$$

Notice that because the mutation rates are very small, one may practically assume $\beta_i(t) = \beta_i$ for $i = 0, 1, \ldots, k-1$. Also, because the stage-limiting genes are basically tumor suppressor genes which act recessively (see Tan [3], Weinberg [6], and Tan et al. [5, 8, 23]), one may practically assume $\{b_i(t) = b_i,\ d_i(t) = d_i,\ \gamma_i(t) = b_i - d_i = \gamma_i,\ i = 0, 1, \ldots, k-1\}$ (see Tan et al. [4, 5, 8]).

4.3. The Joint Probability Distribution of Augmented Variables and Cancer Incidence

To apply the mixture distribution of $Y_j$ in (34) to make inference about the unknown parameters, one needs to expand the model to include the unobservable augmented variables $\{n_{0j}, n_{1j}, y_{0j}, y_{1j},\ j = 1, \ldots, t_N\}$ and derive the joint probability distribution of these variables. For these purposes, observe that for $(j = 1, \ldots, t_N)$ and for $k > 2$, the conditional probability distribution $P\{y_{ij},\ i = 0, 1 \mid y_j, n_{ij},\ i = 0, 1, n_j\}$ of $(Y_{ij},\ i = 0, 1)$ given $(Y_j = y_j,\ n_{ij},\ i = 0, 1,\ n_j)$ is

$$
(y_{ij},\ i = 0, 1) \mid (y_j, n_{ij},\ i = 0, 1, n_j) \sim \text{Multinomial}\Bigl(y_j;\ \frac{n_{ij} Q_i(j)}{Q_T(j)},\ i = 0, 1\Bigr).
\tag{36}
$$

Since the conditional probability distribution of $Y_j$ given $(n_{ij}, i = 0, 1, 2)$ for $k > 2$ is Poisson with mean $Q_T(j)$, we have for the joint conditional probability density function $P\{y_{ij}, i = 0, 1, y_j \mid n_{ij}, i = 0, 1, n_j\}$ of $(Y_{ij}, i = 0, 1, Y_j)$ given $(n_{ij}, i = 0, 1, n_j)$

$$
\begin{aligned}
P\{y_{ij}, i = 0, 1, y_j \mid n_{ij}, i = 0, 1, n_j\}
&= P\{y_{ij}, i = 0, 1 \mid y_j, n_{ij}, i = 0, 1, 2\} \times P\{y_j \mid n_{ij}, i = 0, 1, 2\} \\
&= \frac{1}{y_j!} e^{-Q_T(j)} [Q_T(j)]^{y_j} \times g\left\{y_{0j}, y_{1j}; y_j, \frac{n_{ij} Q_i(j)}{Q_T(j)}, i = 0, 1\right\} \\
&= \prod_{i=0}^{2} h\{y_{ij}; n_{ij} Q_i(j)\} \quad (k > 2),
\end{aligned}
\tag{37}
$$

where $y_{2j} = y_j - \sum_{u=0}^{1} y_{uj}$ and $n_{2j} = n_j - \sum_{u=0}^{1} n_{uj}$.
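The last equality in (37) rests on a standard identity: a Poisson total split multinomially, with probabilities proportional to the component means, is the same as a product of independent Poisson counts. The following sketch checks this numerically; the component means stand in for $n_{ij} Q_i(j)$ and are hypothetical values chosen only for illustration.

```python
from math import exp, factorial

def poisson_pmf(y, mean):
    """Poisson probability h(y; mean)."""
    return exp(-mean) * mean**y / factorial(y)

def multinomial_pmf(counts, probs):
    """Multinomial probability of `counts` with cell probabilities `probs`."""
    coef = factorial(sum(counts))
    value = 1.0
    for c, pr in zip(counts, probs):
        coef //= factorial(c)      # stays an exact integer at every step
        value *= pr**c
    return coef * value

means = [0.4, 1.1, 2.5]            # hypothetical n_ij * Q_i(j), i = 0, 1, 2
total = sum(means)                 # plays the role of Q_T(j)
counts = (1, 0, 3)                 # one possible (y_0j, y_1j, y_2j)

# Left side: Poisson total times multinomial split.
lhs = poisson_pmf(sum(counts), total) * multinomial_pmf(
    counts, [m / total for m in means])

# Right side: product of independent Poisson probabilities.
rhs = 1.0
for c, m in zip(counts, means):
    rhs *= poisson_pmf(c, m)
```

The two sides agree up to floating-point rounding for any choice of counts and positive means.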

If $k = 2$, then $Y_{2j} = 0$ so that $Y_j = \sum_{i=0}^{1} Y_{ij}$. Thus we have for $k = 2$

$$Y_j \mid (n_{ij}, i = 0, 1, n_j) \sim \text{Poisson}\left\{\sum_{i=0}^{1} n_{ij} Q_i(j)\right\}, \tag{38}$$

$$Y_{ij} \mid (Y_j = y_j, n_{ij}, i = 0, 1, n_j) \sim \text{Binomial}\left(y_j; \frac{n_{ij} Q_i(j)}{\sum_{u=0}^{1} n_{uj} Q_u(j)}\right), \quad i = 0, 1. \tag{39}$$

It follows that if $k = 2$, then $\sum_{i=0}^{1} Y_{ij} = Y_j$ and the joint probability density function of $(Y_{0j}, Y_j)$ given $(n_{uj}, u = 0, 1, 2)$ is

$$
\begin{aligned}
P\{y_{0j}, y_j \mid n_{uj}, u = 0, 1, n_j\}
&= P\{y_j \mid n_{ij}, i = 0, 1, n_j\} \times P\{y_{0j} \mid y_j, n_{uj}, u = 0, 1, n_j\} \\
&= \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\} \quad \text{if } k = 2,
\end{aligned}
\tag{40}
$$

where $y_j = \sum_{i=0}^{1} y_{ij}$ and $n_j = \sum_{i=0}^{2} n_{ij}$.


Put $\mathbf{Y} = (y_{ij}, i = 0, 1, j = 1, \ldots, t_N)$, $\mathbf{N} = (n_{ij}, i = 0, 1, j = 1, \ldots, t_N)$, $\underset{\sim}{n} = (n_j, j = 0, 1, \ldots, t_N)$, and $\underset{\sim}{y} = (y_j, j = 0, 1, \ldots, t_N)$. From (37) and (40), we have for the conditional joint probability density function of $(\mathbf{Y}, \underset{\sim}{y})$ given $(\mathbf{N}, \underset{\sim}{n})$

$$P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \underset{\sim}{n}, \Theta\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} \left\{h[y_{2j}; n_{2j} Q_2(j)]\right\}^{1 - \delta_{2k}} \times \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\}. \tag{41}$$

It follows that the joint conditional probability density function of $(\mathbf{N}, \mathbf{Y}, \underset{\sim}{y})$ given $(\underset{\sim}{n}, \Theta)$ is

$$P\{\mathbf{N}, \mathbf{Y}, \underset{\sim}{y} \mid \underset{\sim}{n}, \Theta\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} g(n_{0j}, n_{1j}; n_j, p_0, p_1) \times \left\{h[y_{2j}; n_{2j} Q_2(j)]\right\}^{1 - \delta_{2k}} \times \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\}. \tag{42}$$

Notice that the above probability density function is a product of multinomial probability density functions and Poisson probability density functions. For this joint probability density function, the deviance $\text{Dev} = -2\{\log P[\mathbf{Y}, \underset{\sim}{y}, \mathbf{N} \mid \underset{\sim}{n}, \Theta] - \log P[\mathbf{Y}, \underset{\sim}{y}, \mathbf{N} \mid \underset{\sim}{n}, \hat{\Theta}]\}$ is

$$\text{Dev} = D_0(k) + \text{Dev}(p_1, p_2) + \sum_{j=1}^{t_N} D_j, \tag{43}$$

where

$$D_0(k) = 2\left\{\chi(k) - y_0 - y_0 \log\frac{\chi(k)}{y_0}\right\}, \tag{44}$$

$$\text{Dev}(p_1, p_2) = 2 \sum_{j=1}^{t_N} \left\{ n_{0j} \log\frac{\hat{p}_0}{1 - p_1 - p_2} + \sum_{i=1}^{2} n_{ij} \log\frac{\hat{p}_i}{p_i} \right\}, \tag{45}$$

$$D_j = 2 \sum_{i=0}^{1} \left\{ n_{ij} Q_i(j) - y_{ij} - y_{ij} \log\frac{n_{ij} Q_i(j)}{y_{ij}} \right\} + 2 (1 - \delta_{2k}) \times \left\{ n_{2j} Q_2(j) - y_{2j} - y_{2j} \log\frac{n_{2j} Q_2(j)}{y_{2j}} \right\}, \tag{46}$$

where $\hat{p}_i = \left(\sum_{j=0}^{t_N} n_{ij}\right) / \left(\sum_{j=0}^{t_N} n_j\right)$ $(i = 0, 1, 2)$.
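Each bracketed term in (44) and (46) has the form of a Poisson deviance contribution $2\{\mu - y - y\log(\mu/y)\}$. The following sketch evaluates $D_j$ for one age group. The function names are hypothetical, and the convention that a zero count contributes $2\mu$ (the limit of the expression as $y \to 0$) is an assumption of this sketch rather than something stated in the text.

```python
from math import log

def poisson_deviance_term(y, mean):
    """2 * {mean - y - y * log(mean / y)}; uses the limit 2 * mean when y = 0.

    Assumes mean > 0 whenever y > 0 (otherwise the log is undefined).
    """
    if y == 0:
        return 2.0 * mean
    return 2.0 * (mean - y - y * log(mean / y))

def d_j(y, n, q, k):
    """D_j of (46) for one age group; y, n, q hold the entries for i = 0, 1, 2."""
    dev = sum(poisson_deviance_term(y[i], n[i] * q[i]) for i in (0, 1))
    if k != 2:  # the factor (1 - delta_{2k}) switches the third term off when k = 2
        dev += poisson_deviance_term(y[2], n[2] * q[2])
    return dev

# Hypothetical counts and probabilities for a single age group.
example_k3 = d_j((1, 0, 2), (100, 10, 1), (0.01, 0.05, 0.5), k=3)
example_k2 = d_j((1, 0, 2), (100, 10, 1), (0.01, 0.05, 0.5), k=2)
```

Each term is nonnegative and vanishes exactly when the count equals its mean, which is why minimizing the deviance and maximizing the likelihood coincide.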

The joint probability density function $P\{\mathbf{Y}, \underset{\sim}{y}, \mathbf{N} \mid \underset{\sim}{n}, \Theta\}$ of $(\mathbf{Y}, \underset{\sim}{y}, \mathbf{N})$ given by (42) will be used as the kernel for the Bayesian method to estimate the unknown parameters and to predict the state variables.

4.4. Fitting of the Model to Cancer Incidence Data. To fit the model to real data, as in Tan [3–5], we let $\Delta t \sim 1$ correspond to a fixed time interval such as 6 months in human cancer studies (Tan et al. [4] have assumed 3 months as one time unit, while Luebeck and Moolgavkar [26] have assumed one year as one time unit). Then, because the proliferation rate of the last stage cells is quite large, one may practically assume $P_T(s, t) = 1$ for $t - s \geq 1$. Hence, noting that $\beta_{k-1}(t) = \beta_{k-1}$ is usually very small (see [3–5]), $Q_i(j)$ is approximated by

$$Q_i(j) \approx E\left\{ e^{-\beta_{k-1} G_i(t_{j-1})} - e^{-\beta_{k-1} G_i(t_j)} \right\} = e^{-\beta_{k-1} E[G_i(t_{j-1})]} - e^{-\beta_{k-1} E[G_i(t_j)]} + o(\beta_{k-1}), \tag{47}$$
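The approximation (47) writes $Q_i(j)$ as the difference of two survival-type terms evaluated at consecutive time points. A minimal sketch of that structure follows; `toy_cum_hazard` is a hypothetical stand-in for $\beta_{k-1} E[G_i(t)]$ and does not reproduce the paper's formula for that quantity.

```python
from math import exp

def interval_probabilities(cum_hazard, times):
    """Q(j) ~ exp(-m(t_{j-1})) - exp(-m(t_j)) for consecutive time points."""
    survival = [exp(-cum_hazard(t)) for t in times]
    return [survival[j - 1] - survival[j] for j in range(1, len(times))]

def toy_cum_hazard(t, rate=1e-4, power=3.0):
    """Hypothetical increasing function m(t) with m(0) = 0."""
    return rate * t**power

times = list(range(0, 11))                      # t_0, t_1, ..., t_10
q = interval_probabilities(toy_cum_hazard, times)
```

Because the terms telescope, the interval probabilities sum to $1 - e^{-m(t_N)}$, the probability of developing a tumor by the last time point under this approximation.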

where $G_i(t) = \sum_{s=t_0}^{t-1} J_{k-1}(s; i)$. Under discrete time approximation, the $E[I_{k-1}(t; i)]$'s have been derived in the appendix. Using these results of expected numbers and using the result $\sum_{i=0}^{t-1} a^i = (a^t - 1)/(a - 1)$ for $a \neq 1$, we obtain

$$
\begin{aligned}
\beta_{k-1} E[G_i(t)]
&= \sum_{r=i}^{i+1} E[J_r(t_0; i)] \left(\prod_{u=r}^{k-1} \beta_u\right) \sum_{v=r}^{k-1} A_{r(k-1)}(v) \sum_{s=t_0}^{t-1} (1 + \gamma_v)^{s - t_0} \\
&= \theta_{(i+1)(k-1)} \phi_{(i+1)(k-1)}(t) + \lambda_{i(k-1)} \phi_{i(k-1)}(t),
\end{aligned}
\tag{48}
$$

where $(\theta_{i(k-1)}, i = 1, 2, 3)$ and $(\lambda_{i(k-1)}, i = 0, 1, 2)$ are defined in Section 3.3 and where the $\phi_{i(k-1)}(t)$ $(i = 0, 1, 2, 3)$'s are given by

$$\phi_{i(k-1)}(t) = \left(\prod_{u=i+1}^{k-1} \gamma_u\right) \sum_{v=i}^{k-1} A_{i(k-1)}(v) \times \frac{1}{\gamma_v} \left\{ (1 + \gamma_v)^{t - t_0} - 1 \right\}, \quad i = 0, 1, 2, 3. \tag{49}$$

Notice that if $\gamma_0 = 0$, then $\phi_{0(k-1)}(t)$ reduces to

$$\phi_{0(k-1)}(t) = \left(\prod_{u=1}^{k-1} \gamma_u\right) \sum_{v=1}^{k-1} A_{1(k-1)}(v) \times \frac{1}{\gamma_v^2} \left\{ (1 + \gamma_v)^{t - t_0} - 1 - (t - t_0) \gamma_v \right\}. \tag{50}$$

Applying these results for time homogeneous models with $\gamma_i \neq \gamma_j$ if $i \neq j$, the $Q_i(j)$'s under discrete approximation are given as follows.


(1) If $k = 2$, then $k - 1 = 1$. Hence $Q_2(0) = 1$, $Q_i(0) = Q_2(j) = 0$ for $(i = 0, 1, j > 0)$, and for $j > 0$,

$$
\begin{aligned}
Q_0(j) &\approx e^{-\theta_{11} \phi_{11}(t_{j-1}) - \lambda_{01} \phi_{01}(t_{j-1})} - e^{-\theta_{11} \phi_{11}(t_j) - \lambda_{01} \phi_{01}(t_j)} + o(\beta_1), \\
Q_1(j) &\approx (1 - \alpha_1) \left\{ e^{-\lambda_{11} \phi_{11}(t_{j-1})} - e^{-\lambda_{11} \phi_{11}(t_j)} \right\} + o(\beta_1).
\end{aligned}
\tag{51}
$$

(2) If $k \geq 3$, then we have $Q_2(0) = \delta_{2(k-1)} \alpha$, $Q_i(0) = 0$, $i = 0, 1$, and for $j > 0$,

$$
\begin{aligned}
Q_0(j) &\approx e^{-\theta_{1(k-1)} \phi_{1(k-1)}(t_{j-1}) - \lambda_{0(k-1)} \phi_{0(k-1)}(t_{j-1})} - e^{-\theta_{1(k-1)} \phi_{1(k-1)}(t_j) - \lambda_{0(k-1)} \phi_{0(k-1)}(t_j)} + o(\beta_{k-1}), \\
Q_1(j) &\approx e^{-\theta_{2(k-1)} \phi_{2(k-1)}(t_{j-1}) - \lambda_{1(k-1)} \phi_{1(k-1)}(t_{j-1})} - e^{-\theta_{2(k-1)} \phi_{2(k-1)}(t_j) - \lambda_{1(k-1)} \phi_{1(k-1)}(t_j)} + o(\beta_{k-1}), \\
Q_2(j) &\approx \delta_{2(k-1)} (1 - \alpha) \left\{ e^{-\lambda_{22} \phi_{22}(t_{j-1})} - e^{-\lambda_{22} \phi_{22}(t_j)} \right\} \\
&\quad + (1 - \delta_{2(k-1)}) \left\{ e^{-\theta_{3(k-1)} \phi_{3(k-1)}(t_{j-1}) - \lambda_{2(k-1)} \phi_{2(k-1)}(t_{j-1})} - e^{-\theta_{3(k-1)} \phi_{3(k-1)}(t_j) - \lambda_{2(k-1)} \phi_{2(k-1)}(t_j)} \right\} + o(\beta_{k-1}).
\end{aligned}
\tag{52}
$$

Notice that if one replaces $[1 + \gamma_v]^{t - t_0}$ by $[1 + \gamma_v]^{t - t_0} = e^{(t - t_0) \log(1 + \gamma_v)} \approx e^{(t - t_0) \gamma_v}$, the above $Q_i(j)$'s from the discrete time model are exactly the same as those from the continuous model, respectively, as given in equations (23)–(27) under the assumption that $P_T(s, t) = 1$ for $t - s > 0$. Notice that the assumption $P_T(s, t) = 1$ for $t - s > 0$ is equivalent to assuming that the last stage cancer cells grow instantaneously into cancer tumors as soon as they are generated; see Remark 1.
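How close the discrete and continuous forms are can be checked directly. The sketch below compares the geometric sum $\sum_{s=0}^{n-1} (1 + \gamma)^s$, its closed form, and the continuous analogue obtained by replacing $(1 + \gamma)^n$ with $e^{n\gamma}$. The values of $\gamma$ and $n$ used are illustrative assumptions only.

```python
from math import expm1

def discrete_sum(gamma, steps):
    """Direct evaluation of sum_{s=0}^{steps-1} (1 + gamma)^s."""
    return sum((1.0 + gamma) ** s for s in range(steps))

def discrete_closed_form(gamma, steps):
    """((1 + gamma)^steps - 1) / gamma, with the gamma = 0 limit `steps`."""
    if gamma == 0.0:
        return float(steps)
    return ((1.0 + gamma) ** steps - 1.0) / gamma

def continuous_analogue(gamma, steps):
    """(exp(gamma * steps) - 1) / gamma, with the gamma = 0 limit `steps`."""
    if gamma == 0.0:
        return float(steps)
    return expm1(gamma * steps) / gamma

def relative_gap(gamma, steps):
    """Relative difference between the discrete and continuous forms."""
    d = discrete_closed_form(gamma, steps)
    return abs(continuous_analogue(gamma, steps) - d) / d
```

For proliferation rates of order $10^{-2}$ per time unit the gap is a few percent over tens of steps, and it shrinks rapidly as $\gamma$ decreases, since $\log(1 + \gamma) = \gamma - \gamma^2/2 + \cdots$.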

5. The Fitting of the Model to Cancer Incidence and the Generalized Bayesian Inference Procedure

Given the model in Sections 2 and 3 and cancer incidence, one may use results in Section 4 to fit the model. By using this model and the distribution results in Section 4, one can readily estimate the unknown genetic parameters, predict cancer incidence, and check the validity of the model by using the generalized Bayesian inference and Gibbs sampling procedures; for more detail, see Tan [3, 22] and Tan et al. [4, 5].

The generalized Bayesian inference is based on the posterior distribution $P\{\Theta \mid \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$ of $\Theta$ given $(\mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n})$. This posterior distribution is derived by combining the prior distribution $P\{\Theta\}$ of $\Theta$ with the joint probability distribution $P\{\mathbf{N}, \mathbf{Y}, \underset{\sim}{y} \mid \underset{\sim}{n}, \Theta\}$ given $(\underset{\sim}{n}, \Theta)$ in (42). It follows that this inference procedure would combine information from three sources: (1) previous information and experiences about the parameters in terms of the prior distribution $P\{\Theta\}$ of the parameters, (2) biological information of inherited cancer cases via genetic segregation of cancer genes in the population ($P\{\mathbf{N} \mid \underset{\sim}{n}, p_i, i = 1, 2\}$, see Section 2), and (3) information from the expanded data $(\mathbf{Y})$ and the observed data $(\underset{\sim}{y})$ via the statistical model from the system ($P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \Theta\}$ given by (37) and (40)). Because of additional information from the genetic segregation of the cancer genes, this inference procedure provides an efficient procedure to extract information on the effects of genotypes of individuals at the embryo stage.

5.1. The Prior Distribution of the Parameters. For the prior distributions of $\Theta$, because biological information has suggested some lower bounds and upper bounds for the mutation rates and for the proliferation rates, we assume

$$P(\Theta) \propto c \quad (c > 0), \tag{53}$$

where $c$ is a positive constant if these parameters satisfy some biologically specified constraints and is equal to zero otherwise. These biological constraints are as follows.

(i) $0 < p_1 < 10^{-2}$, $0 < p_2 < 10^{-6}$, and $-0.01 < \gamma_i < 1$ $(i = 1, 2)$.

(ii) For $\beta_j$ $(j = 0, 1, \ldots, k - 1)$, we let $1 < \omega_0 = N(t_0) \beta_0 < 1000$ $(N \to I_1)$ and $10^{-8} < \beta_i < 10^{-3}$, $i = 1, \ldots, k - 1$.

(iii) For the $\lambda_{j(k-1)}$ $(j = 0, 1, 2)$, we let $0 < \lambda_i < 10$, $i = 0, 1$, and $0 < \lambda_2 < 10^3$.

(iv) For the $\theta_i$ $(i = 1, 2, 3)$, we let $0 < \theta_1 < 10^{-2}$ and $0 < \theta_2 < 10^{-1}$.

We will refer to the above prior as a partially informative prior, which may be considered as an extension of the traditional noninformative prior given in Box and Tiao [28].
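A prior that is constant on a constrained region and zero elsewhere is easy to encode as a support check. The sketch below reads the bounds in (i)–(iv) as $10^{-2}$, $10^{-6}$, $-0.01$, and so on; the dictionary layout and key names are hypothetical conveniences, not notation from the paper.

```python
def in_prior_support(theta):
    """Return True when the hypothetical parameter dict satisfies (i)-(iv)."""
    checks = [
        0.0 < theta["p1"] < 1e-2,
        0.0 < theta["p2"] < 1e-6,
        all(-0.01 < g < 1.0 for g in theta["gamma"]),       # gamma_1, gamma_2
        1.0 < theta["omega0"] < 1000.0,                      # N(t0) * beta_0
        all(1e-8 < b < 1e-3 for b in theta["beta"]),         # beta_1..beta_{k-1}
        all(0.0 < lam < 10.0 for lam in theta["lam"][:2]),   # lambda_0, lambda_1
        0.0 < theta["lam"][2] < 1e3,                         # lambda_2
        0.0 < theta["theta"][0] < 1e-2,                      # theta_1
        0.0 < theta["theta"][1] < 1e-1,                      # theta_2
    ]
    return all(checks)

def log_prior(theta):
    """Flat log-prior: a constant (0 here) inside the support, -inf outside."""
    return 0.0 if in_prior_support(theta) else float("-inf")

# Hypothetical parameter values chosen to lie inside the support.
inside = {"p1": 1e-3, "p2": 5e-7, "gamma": [1e-5, 0.03], "omega0": 10.0,
          "beta": [1e-5, 1e-6], "lam": [0.3, 0.5, 600.0], "theta": [3e-4, 0.05]}
outside = dict(inside, p1=0.5)
```

With such a prior, the posterior mode inside the support coincides with the constrained maximum of the likelihood, which is how Steps 3 and 4 of the procedure in Section 5.3 use it.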

5.2. The Posterior Distribution of the Parameters Given $\{\mathbf{Y}, \mathbf{N}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Denote by $\Theta_1 = \Theta - \{p_1, p_2, \alpha\}$. From the posterior distribution $P\{\Theta \mid \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$, we obtain

$$
\begin{aligned}
P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}
&\propto (\chi(k))^{y_0} e^{-\chi(k)} \, p_1^{\sum_{j=1}^{t_N} n_{1j}} \, p_2^{\sum_{j=1}^{t_N} n_{2j}} \\
&\quad \times (1 - p_1 - p_2)^{\sum_{j=1}^{t_N} n_{0j}}, \quad 0 < p_1, p_2, \alpha < 1, \\
P\{\Theta_1 \mid (p_1, p_2, \alpha), \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}
&\propto \prod_{j=1}^{t_N} \prod_{i=0}^{2} e^{-Q_i(j)} [Q_i(j)]^{y_{ij}}, \quad \Theta_1 \in \Omega,
\end{aligned}
\tag{54}
$$


where $\Omega$ is the parameter space of $\Theta_1$ provided by the biological constraints in Section 5.1.

For computational convenience, we notice that the log of $P\{p_1, p_2, \alpha \mid \Theta_1, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$ is proportional to the negative of $D_0(k) + \text{Dev}(p_1, p_2)$ given by (44)-(45); similarly, the log of $P\{\Theta_1 \mid (p_1, p_2, \alpha), \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$ is proportional to the negative of $\sum_{j=1}^{t_N} D_j$ given by (46).

5.3. The Multilevel Gibbs Sampling Procedure for Estimating Unknown Parameters. Given the posterior probability distributions, we will use the following multilevel Gibbs sampling procedure to derive estimates of the parameters. We notice that numerically the Gibbs sampling procedure given below is equivalent to the EM-algorithm from the sampling theory viewpoint, with Steps 1 and 2 as the E-Step and with Steps 3 and 4 as the M-Step, respectively [29]. These multilevel Gibbs sampling procedures are given by the following.

Step 1 (Generating $\mathbf{N}$ Given $(\mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta)$ (The Data-Augmentation Step 1)). Given $\Theta$ and given $\underset{\sim}{n}$, use the multinomial distribution of $\{n_{1j}, n_{2j}\}$ given $n_j$ in Section 3 to generate a large sample of $\mathbf{N}$. Then combine this large sample with $P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \underset{\sim}{n}, \Theta\}$ in (37) and (40) to select $\mathbf{N}$ through the weighted bootstrap method due to Smith and Gelfand [30]. This selected $\mathbf{N}$ is then a sample from $P\{\mathbf{N} \mid \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta\}$ even though the latter is unknown (for proof, see Tan [22], Chapter 3). Call the generated sample $\hat{\mathbf{N}}$.
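The weighted bootstrap idea used in Step 1 can be sketched generically: draw candidates from a distribution that is easy to sample, weight each candidate by the likelihood, and resample in proportion to the weights. The example below is a toy illustration of that technique on a one-dimensional problem, not the paper's sampler for $\mathbf{N}$; the function name and the toy target are assumptions.

```python
import random

def weighted_bootstrap(proposals, weight_fn, size, rng):
    """Resample `size` items from `proposals` with probability proportional to weight_fn."""
    weights = [weight_fn(x) for x in proposals]
    if sum(weights) <= 0.0:
        raise ValueError("all proposal weights are zero")
    return rng.choices(proposals, weights=weights, k=size)

# Toy problem: uniform prior draws, binomial-type likelihood x^7 (1 - x)^3.
# The resampled values then approximate a Beta(8, 4) distribution (mean 2/3).
rng = random.Random(0)
proposals = [rng.random() for _ in range(20000)]
draws = weighted_bootstrap(proposals, lambda x: x**7 * (1.0 - x)**3, 5000, rng)
posterior_mean = sum(draws) / len(draws)
```

In Step 1 the proposals would be multinomial draws of the genotype counts and the weights would come from the Poisson likelihood in (37) and (40); the resampling step itself is the same.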

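Step 2 (Generating $\mathbf{Y}$ Given $(\mathbf{N}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta)$ (The Data-Augmentation Step 2)). Given $(\underset{\sim}{y}, \underset{\sim}{n}, \Theta)$ and given $\mathbf{N} = \hat{\mathbf{N}}$ generated from Step 1, generate $\mathbf{Y}$ from the probability distribution $P\{\mathbf{Y} \mid \hat{\mathbf{N}}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta\}$ given by (36) and (38). Call the generated sample $\hat{\mathbf{Y}}$.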
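For the two-stage case, the draw in Step 2 amounts to splitting each observed count $y_j$ between the two genotype groups with a binomial probability proportional to $n_{ij} Q_i(j)$, as in (39). A minimal sketch follows, with hypothetical counts and probabilities and a hypothetical function name; it assumes at least one of the two means is positive.

```python
import random

def split_cases(y_j, n, q, rng):
    """Split y_j observed cases between genotypes 0 and 1 (the k = 2 case).

    Assumes n[0] * q[0] + n[1] * q[1] > 0.
    """
    m0, m1 = n[0] * q[0], n[1] * q[1]
    prob0 = m0 / (m0 + m1)
    # Binomial(y_j, prob0) draw built from y_j independent Bernoulli trials.
    y0 = sum(rng.random() < prob0 for _ in range(y_j))
    return y0, y_j - y0

rng = random.Random(1)
y0, y1 = split_cases(7, (1000, 50), (0.001, 0.02), rng)
```

By construction the two parts always add back to the observed count, so the augmented data remain consistent with $\underset{\sim}{y}$.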

Step 3 (Estimation of $\{p_i, i = 1, 2, \alpha\}$ Given $\{\Theta_1, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$). Given $(\underset{\sim}{y}, \underset{\sim}{n}, \Theta_1)$ and given $(\mathbf{N}, \mathbf{Y}) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}})$ from Steps 1 and 2, derive the posterior mode of $\{p_i, i = 1, 2, \alpha\}$ by maximizing the conditional posterior distribution $P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \hat{\mathbf{N}}, \hat{\mathbf{Y}}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $D_0(k) + \text{Dev}(p_1, p_2)$ given by (44)-(45) in Section 4.3 under the constraints given in Section 5.1. Denote this generated mode by $\{\hat{p}_i, i = 1, 2, \hat{\alpha}\}$.

Step 4 (Estimation of $\Theta_1$ Given $\{p_i, i = 1, 2, \alpha, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$). Given $(\underset{\sim}{y}, \underset{\sim}{n})$ and given $(\mathbf{N}, \mathbf{Y}, p_i, i = 1, 2, \alpha) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}}, \hat{p}_i, i = 1, 2, \hat{\alpha})$ from Steps 1–3, derive the posterior mode of $\Theta_1$ by maximizing the conditional posterior distribution $P\{\Theta_1 \mid \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\mathbf{N}}, \hat{\mathbf{Y}}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $\sum_{j=1}^{t_N} D_j$ in (46) under the constraints. Denote the generated mode as $\hat{\Theta}_1$.

Step 5 (Recycling Step). With $(\mathbf{N}, \mathbf{Y}, p_i, i = 1, 2, \alpha, \Theta_1) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}}, \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1)$ given above, go back to Step 1 and continue until convergence.

The proof of convergence of the above steps can be derived by using the procedure given in Tan ([22], Chapter 3). At convergence, the $\hat{\Theta} = \{\hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1\}$ are the generated values from the posterior distribution of $\Theta$ given $(\underset{\sim}{y}, \underset{\sim}{n})$ independently of $(\mathbf{N}, \mathbf{Y})$ (for proof, see Tan [22], Chapter 3). Repeating the above procedures, one then generates a random sample of $\Theta$ from the posterior distribution of $\Theta$ given $(\underset{\sim}{y}, \underset{\sim}{n})$; one then uses the sample mean as the estimate of $\Theta$ and uses the sample variances and covariances as estimates of the variances and covariances of these estimates.

6. A New Multistage Stochastic Model for Adult Eye Cancer (Uveal Melanoma): An Example

The human eye cancers consist of pediatric eye cancers and adult eye cancers. The most common pediatric eye cancer is the retinoblastoma, which develops from the retinal pigment epithelium cells underlying the retina that do not form melanoma. The most common adult eye cancers are the uveal melanomas involving the iris, the ciliary body, and the choroid (collectively referred to as the uvea). These cancers develop from melanocytes (pigment cells) which reside within the uvea, giving color to the eye. In Tan and Zhou [9], we have developed a modified two-stage model for retinoblastoma. Based on results from molecular biology (see Landreville et al. [14], Mensink et al. [15], and Loercher and Harbour [31]), Landreville et al. [14] have proposed a three-stage model for uveal melanoma as given in Figure 2. As an example of applications of this paper, in this section we will apply this model of uveal melanoma to the NCI/NIH eye cancer data from the SEER project. We notice that the same methods can be applied to model other human cancers as well, but this will be our future research.

Given in Table 1 are the numbers of people at risk and the eye cancer cases in the age groups, together with the predicted cases from the models. These data give cancer incidence at birth and incidence for 85 age groups $(k = 85)$, with each group spanning a 1-year period except the last age group ($\geq 85$ years old). For human eye cancer, because the incidence at birth and for age groups from 1 to 10 years old is basically generated by the pediatric eye cancer retinoblastoma (see [9]), to account for inherited cancer cases of uveal melanomas, the incidence for age 0 (birth) and for age periods from 1 to 10 years old in Table 1 for uveal melanoma is derived by subtracting the incidence of retinoblastoma from SEER data (see Tan and Zhou [9]).

To fit the data, we let one time unit be 6 months after birth and let $t_0 = 1$. To compare different models and to assess different assumptions, we will consider the following 2- and 3-stage mixture models: (1) the complete 3-stage mixture model (Model-F), in which no assumptions are made on the parameters, and (2) the Type-1 3-stage mixture model (Model-1), in which we assume that $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth. For comparison purposes, we also fit a 2-stage model as defined in Tan and Zhou [9]. We will apply the methods in Section 5 to fit these models to the SEER data given in Table 1.


Table 2: The log-likelihood, AIC, and BIC of the fitted models.

| Models | Log-likelihood | AIC | BIC |
| Three-stage, Model-F | -1312.02 | 2648.04 | 2677.35 |
| Three-stage, Model-1 | -1295.76 | 2609.53 | 2631.51 |
| Two-stage | -4392.421 | 8796.843 | 8811.569 |
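The entries of Table 2 can be cross-checked against the standard definitions $\text{AIC} = 2m - 2\log L$ and $\text{BIC} = m \log n - 2\log L$, where $m$ is the number of free parameters and $n$ the number of observations. The sketch below reads the Model-F log-likelihood as $-1312.02$ and uses $m = 12$, the parameter count implied by the degrees of freedom quoted in the text ($86 - 12$); the sample size entering BIC is not stated in the text and is therefore left as an argument.

```python
from math import log

def aic(log_lik, n_params):
    """Akaike Information Criterion: 2 * m - 2 * log L."""
    return 2.0 * n_params - 2.0 * log_lik

def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion: m * log(n) - 2 * log L."""
    return n_params * log(n_obs) - 2.0 * log_lik

# Model-F, reading the tabulated log-likelihood as -1312.02 with 12 parameters.
model_f_aic = aic(-1312.02, 12)
```

With these inputs the AIC reproduces the tabulated Model-F value, which supports this reading of the decimal places in the table.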

Figure 5: Curve fitting of SEER data by the Model-F, Model-1, and the two-stage model. (Horizontal axis: age in years; vertical axis: incidences; series: Observed, Model-F, Model-1, Two-stage.)

Given in Table 2 are the natural logs of the likelihood functions, the AIC (Akaike Information Criterion), and the BIC (Bayesian Information Criterion) for these models. Given in Table 3 are the estimates of parameters in the 3-stage models. Given in Figure 5 are the plots of predicted cancer cases from the 3-stage mixture models (Model-F and Model-1) and the 2-stage model. For comparison purposes, in Table 1 we also provide numbers of predicted cancer cases from the 3-stage mixture models and the 2-stage model, together with the observed cancer cases over time from SEER. From these results, we have made the following observations.

(a) As shown by results in Table 1 and Figure 5, it appeared that both Model-F and Model-1 fitted the SEER data well, although Model-1 fitted the data slightly better judging from the values of AIC and BIC. The Chi-square test statistics $\chi^2 = \sum_{i=0}^{85} ((y_i - \hat{y}_i)^2 / \hat{y}_i)$ for Model-F and Model-1 are given by 88.43 and 94.48, respectively, giving a $P$-value of 0.12 (df = 86 − 12 = 74) for Model-F and a $P$-value of 0.11 (df = 86 − 7 = 79) for Model-1. On the other hand, the 2-stage model fitted the data very poorly; the Chi-square statistic value for the 2-stage model is 2747.69, giving a $P$-value less than $10^{-3}$. The AIC and BIC values of Model-1 are given by (AIC = 2609.53, BIC = 2631.51), which are slightly smaller than those of Model-F, respectively; however, the AIC and BIC values (8796.84, 8811.57) of the two-stage model are considerably greater than those of the 3-stage models, respectively. These results suggest that uveal melanoma may best be described by a 3-stage model with inherited component and that one may practically assume $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth.

(b) From Table 3, it is observed that the estimate of $\gamma_1$ is close to zero (the estimate is of order $10^{-5}$), indicating that the phenotype of $J_1$ is almost identical to that of $N = J_0$, further confirming that the staging-limiting genes are basically tumor suppressor genes and that there is no haploinsufficiency for these tumor suppressor genes. On the other hand, the estimate of $\gamma_2$ is of order $10^{-2}$, which is about $10^3$ times greater than that of cells with genotype $J_1$.

(c) From Table 3, the estimates $\hat{\lambda}_j$ $(j = 0, 1, 2)$ of $\lambda_j$ are of order $10^{-1}$, $10^{-1}$, $10^2 \sim 10^3$, respectively. Because $\lambda_i = E[J_i(t_0; i)] \beta_i \prod_{u=i+1}^{2} (\beta_u / \gamma_u)$, $i = 0, 1, 2$, assuming some values of $E[J_i(t_0; i)]$, $i = 0, 1, 2$, from some biological observations, one can have some rough ideas about the magnitude of $\beta_j$ $(j = 0, 1, 2)$. For example, if we follow Potten et al. [32] to assume $(E[N(t_0)] = E[J_i(t_0; i)] \sim 10^8, i = 0, 1, 2)$, then $\beta_j \approx 10^{-6} \sim 10^{-5}$.

(d) From Table 3, the estimates of $p_1$ and $p_2$ from the SEER data are of orders $10^{-4} \sim 10^{-3}$ and $10^{-7} \sim 10^{-6}$, respectively. This indicates that in the US population, the frequency of the staging-limiting cancer gene for uveal melanoma is approximately $10^{-3}$. Table 3 also showed that the estimate of $\alpha$ was 0.8411, indicating that most individuals with genotype $J_2$ would develop cancer at birth. This may help to explain why there are observed cancer incidences at birth for uveal melanoma in the SEER data even though the estimate of the frequency $p_2$ is of order $10^{-7} \sim 10^{-6}$.
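The order-of-magnitude argument in observation (c) can be made concrete by solving $\lambda_i = E[J_i(t_0; i)] \beta_i \prod_{u=i+1}^{2} (\beta_u / \gamma_u)$ backwards from $i = 2$. The sketch below uses the Model-F point estimates as read from Table 3 in scientific notation and the assumed cell number $10^8$; it is a rough consistency check under those assumptions, not an estimation procedure from the paper, and the function name is hypothetical.

```python
def implied_mutation_rates(lam, gamma, e_cells):
    """Solve lambda_i = E * beta_i * prod_{u=i+1}^{2} (beta_u / gamma_u) for beta.

    lam holds (lambda_0, lambda_1, lambda_2); gamma[1] and gamma[2] hold
    gamma_1 and gamma_2 (gamma[0] is unused); e_cells is the assumed E[J_i(t0; i)].
    """
    beta = [0.0, 0.0, 0.0]
    beta[2] = lam[2] / e_cells
    beta[1] = lam[1] / (e_cells * beta[2] / gamma[2])
    beta[0] = lam[0] / (e_cells * (beta[1] / gamma[1]) * (beta[2] / gamma[2]))
    return beta

# Model-F estimates as read from Table 3, with E[J_i(t0; i)] taken as 1e8.
betas = implied_mutation_rates(
    lam=(0.288, 0.481, 655.0), gamma=(None, 2.71e-5, 3.37e-2), e_cells=1e8)
```

Under these inputs all three implied rates fall between $10^{-6}$ and $10^{-4}$, in line with the magnitude quoted in (c).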

7. Discussion and Conclusions

To account for inherited cancer cases, in this paper we have developed some general multistage models involving hereditary cancer cases. For human cancer incidence, these models are basically generalized mixture models. In these mixture models, the mixing probability is a multinomial distribution to account for genetic segregation of the staging-limiting tumor suppressor genes. This mixture model allows us to estimate for the first time the frequency of the staging-limiting tumor suppressor gene in human populations. As an example of applications, in this paper we have developed a general 3-stage stochastic multistage model of carcinogenesis for adult human eye cancer. To account for inherited cancer cases in the stochastic model of human eye cancer, we have also developed a generalized mixture model for uveal melanoma in human beings.

For using the proposed models to fit the cancer incidence data, in this paper we have developed a generalized Bayesian inference procedure to estimate the unknown parameters and to predict cancer cases. This inference procedure is advantageous over the classical sampling theory inference (i.e., maximum likelihood method) because the procedure combines information from three sources: (1) previous information and experiences about the parameters in terms of the prior distribution $P\{\Theta\}$ of the parameters, (2) biological information of inherited cancer cases via the genetic segregation of staging-limiting tumor suppressor genes in the population, and (3) information from the expanded data $(\mathbf{Y})$ and the observed data $(\underset{\sim}{y})$ via the statistical model from the system ($P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \Theta\}$ given by (37) and (40)).

Table 3: Estimates of parameters for the 3-stage stochastic models.

| Parameters | λ0 | λ1 | λ2 | γ1 | γ2 | p1 | p2 | α | θ1 | θ2 |
| Model-F: Estimates | 2.88E-01 | 4.81E-01 | 6.55E+02 | 2.71E-05 | 3.37E-02 | 9.95E-04 | 9.68E-07 | 8.41E-01 | 3.29E-04 | 3.41E-01 |
| Model-F: StD | 1.21E-01 | 4.65E-02 | 1.12E+02 | 8.86E-06 | 1.92E-04 | 1.75E-06 | 6.48E-09 | 4.19E-02 | 3.54E-05 | 1.07E-02 |
| Model-F: 95% CL-Lower | 5.03E-02 | 3.89E-01 | 4.34E+02 | 5.34E-06 | 3.33E-02 | 9.91E-04 | 9.55E-07 | 7.59E-01 | 2.60E-04 | 3.20E-01 |
| Model-F: 95% CL-Upper | 5.25E-01 | 5.72E-01 | 8.75E+02 | 4.01E-05 | 3.41E-02 | 9.98E-04 | 9.81E-07 | 9.23E-01 | 3.98E-04 | 3.62E-01 |
| Model-1: Estimates | 6.55E-02 | 3.72E-01 | 1.98E+02 | NA | 3.49E-02 | 9.98E-04 | 1.00E-06 | 8.07E-01 | NA | NA |
| Model-1: StD | 2.54E-02 | 4.63E-02 | 3.38E+01 | NA | 3.58E-03 | 6.27E-05 | 4.53E-08 | 7.06E-02 | NA | NA |
| Model-1: 95% CL-Lower | 1.57E-02 | 2.82E-01 | 1.32E+02 | NA | 2.79E-02 | 8.75E-04 | 9.11E-07 | 6.68E-01 | NA | NA |
| Model-1: 95% CL-Upper | 1.15E-01 | 4.63E-01 | 2.65E+02 | NA | 4.19E-02 | 1.12E-03 | 1.09E-06 | 9.45E-01 | NA | NA |

Note: NA: assumed nonexistence.

To illustrate the usefulness and applications of our models and methods, we have applied our models and methods to the eye cancer SEER data of NCI/NIH. Our analysis clearly showed that the proposed 3-stage model with inherited cancer cases fitted the data nicely (see Table 2 and Figure 5); on the other hand, the classical 2-stage model cannot fit the data at all (see Table 2 and Figure 5). These results clearly have confirmed results from molecular biology that the human eye cancer is derived by a 3-stage model with inherited cancer component. Notice, however, that our 3-stage multistage model is more general than the classical 3-stage model as described in Little [1], Tan [2], and Zheng [7] in that we postulate that cancer tumors develop from primary $I_3$ cells by clonal expansion (see Yang and Chen [11]). (Note that the stochastic multistage models in the literature assume that cancer tumors develop from last stage cells immediately as soon as they are generated, ignoring completely cancer progression; see Remark 1.) As a matter of fact, we had assumed $\Delta t = 1$ for a period of three months and found that the 3-stage models then did not fit the SEER data, clearly indicating that $P_T(s, t) < 1$ over a period of three months.

Applying our models and methods to the SEER data of human eye cancer, we have derived for the first time some useful pieces of information. Specifically, we mention the following. (1) For the first time, we have estimated the frequency of the staging-limiting tumor suppressor gene in the US population ($p_1 \sim 0.9948 \times 10^{-3}$). (2) With the estimate of $\alpha$ as $\hat{\alpha} = 0.8411$, the predicted number of uveal melanoma at birth is $y_0 = n_{01} p_2 = 36$ by 3-stage models with inherited cancer component (Model-F and Model-1). (The observed number of eye cancer at birth is 34.) (3) The estimate of the proliferation rate ($\gamma_1$) of $J_1$ cells using Model-F is $\hat{\gamma}_1 = 2.271 \times 10^{-5} \sim 0$. (The estimate is $8.603 \times 10^{-5}$ using Model-1.) This confirms that the stage-1 limiting gene is a tumor suppressor gene and that, unlike the p53 gene in chromosome 17p (see [33]), there is little or no haploinsufficiency for this gene in cells with genotype $J_1$.

Using models and methods of this paper, one can easily predict future cancer cases for human eye cancer. Thus, by comparing results from different populations, our models and methods can be used to assess cancer prevention and control procedures. This will be our future research topic; we will not go any further here.

Appendix

The Expected Numbers of State Variables under Discrete Time Approximation

Under discrete time, the stochastic differential equations of the state variables reduce to the following stochastic difference equations, respectively:

$$
\begin{aligned}
J_i(t+1; i) &= J_i(t; i)\,[1 + \gamma_i(t)] + e_i(t+1; i), \quad i = 0, 1, \ldots, \operatorname{Min}(k-1, 2), \\
J_j(t+1; u) &= J_j(t; u)\,[1 + \gamma_j(t)] + J_{j-1}(t; u)\,\beta_{j-1}(t) + e_j(t+1; u), \quad 0 \le u < j \le k-1,
\end{aligned}
\tag{A.1}
$$

where $e_i(t+1; i) = [B_i(t; i) - J_i(t; i)\,b_i(t)] - [D_i(t; i) - J_i(t; i)\,d_i(t)]$ and $e_j(t+1; i) = [B_j(t; i) - J_j(t; i)\,b_j(t)] - [D_j(t; i) - J_j(t; i)\,d_j(t)] + [M_{j-1}(t; i) - J_{j-1}(t; i)\,\beta_{j-1}(t)]$ for $j > i$.

The initial conditions at birth ($t_0$) for the above stochastic difference equations are $J_j(t_0; i) > 0$ if $(j = i, i+1)$ and $J_j(t_0; i) = 0$ if $j > i + 1$. The solution of the above difference equations under these initial conditions is given, respectively, by

$$
\begin{aligned}
J_i(t; i) &= J_i(t_0; i) \prod_{s=t_0}^{t-1} (1 + \gamma_i(s)) + \sum_{s=t_0+1}^{t} e_i(s; i) \prod_{u=s}^{t-1} (1 + \gamma_i(u)), \quad i = 0, 1, \ldots, \operatorname{Min}(k-1, 2), \\
J_j(t; i) &= J_j(t_0; i) \prod_{s=t_0}^{t-1} (1 + \gamma_j(s)) + \sum_{s=t_0}^{t-1} J_{j-1}(s; i)\,\beta_{j-1}(s) \prod_{u=s+1}^{t-1} (1 + \gamma_j(u)) \\
&\quad + \sum_{s=t_0+1}^{t} e_j(s; i) \prod_{u=s}^{t-1} (1 + \gamma_j(u)), \quad 0 \le i < j \le k-1.
\end{aligned}
\tag{A.2}
$$

If the model is time homogeneous so that $\beta_j(t) = \beta_j$, $\gamma_j(t) = \gamma_j$ $(j = i, \ldots, k-1)$, and if $\gamma_i \ne \gamma_j$ for $i \ne j$, then the above solutions under the initial conditions $(J_j(t_0; i) = 0,\ j > i + 1)$ reduce to

$$
\begin{aligned}
J_i(t; i) &= J_i(t_0; i)\,(1 + \gamma_i)^{t-t_0} + \eta_i^{(0)}(t; i), \quad i = 0, 1, \ldots, \operatorname{Min}(k-1, 2), \\
J_j(t; i) &= J_j(t_0; i)\,(1 + \gamma_j)^{t-t_0} + \sum_{u=i}^{j-1} J_u(t_0; i) \Big(\prod_{v=u}^{j-1} \beta_v\Big) \sum_{r=u}^{j} A_{uj}(r)\,(1 + \gamma_r)^{t-t_0} \\
&\quad + \eta_j^{(0)}(t; i) + \sum_{u=1}^{j-i} \Big(\prod_{r=j-u}^{j-1} \beta_r\Big)\, \eta_j^{(u)}(t; i) \\
&= \sum_{u=i}^{i+1} J_u(t_0; i) \Big(\prod_{v=u}^{j-1} \beta_v\Big) \sum_{r=u}^{j} A_{uj}(r)\,(1 + \gamma_r)^{t-t_0} \\
&\quad + \eta_j^{(0)}(t; i) + \sum_{u=1}^{j-i} \Big(\prod_{r=j-u}^{j-1} \beta_r\Big)\, \eta_j^{(u)}(t; i), \quad 0 \le i < j \le k-1,
\end{aligned}
\tag{A.3}
$$

where $\eta_j^{(0)}(t; i) = \sum_{s=t_0+1}^{t} e_j(s; i)\,(1 + \gamma_j)^{t-s}$ and $\eta_j^{(u)}(t; i) = \sum_{s=t_0+1}^{t} e_{j-u}(s; i) \sum_{v=u}^{j} A_{uj}(v)\,(1 + \gamma_v)^{t-s}$ $(u = 1, \ldots, j - i)$.

Thus, if the model is time homogeneous and if $\gamma_i \ne \gamma_j$ for $i \ne j$, the $E[J_{k-1}(t; i)]$'s in discrete time models under the initial conditions $(J_j(t_0; i) = 0,\ j > i + 1)$ are given, respectively, by

$$
E[J_{k-1}(t; i)] = \sum_{u=i}^{i+1} E[J_u(t_0; i)] \Big(\prod_{v=u}^{k-2} \beta_v\Big) \times \sum_{r=u}^{k-1} A_{u(k-1)}(r)\,(1 + \gamma_r)^{t-t_0}, \quad i = 0, 1, \ldots, \operatorname{Min}(k-1, 2).
\tag{A.4}
$$
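The expected numbers can also be obtained by iterating the expectation of (A.1) directly, which gives a quick numerical check on the closed form. The following is a minimal Python sketch, assuming time-homogeneous rates and that the noise terms $e_j$ have zero mean (consistent with (A.4) containing no $\eta$ terms). The function name `expected_stage_counts`, the relative stage indexing, and the numeric rates in the usage note are illustrative assumptions, not values from the paper.

```python
def expected_stage_counts(initial, gamma, beta, steps):
    """Iterate E[J_j(t+1)] = E[J_j(t)] (1 + gamma_j) + E[J_{j-1}(t)] beta_{j-1}.

    initial[j] : expected number of stage-j cells at t0 (index 0 = earliest stage)
    gamma[j]   : net proliferation rate of stage j (hypothetical values)
    beta[j]    : transition rate from stage j to stage j + 1 (hypothetical values)
    """
    counts = list(initial)
    for _ in range(steps):
        nxt = []
        for j, c in enumerate(counts):
            # Inflow from the previous stage; the earliest stage has none.
            inflow = counts[j - 1] * beta[j - 1] if j > 0 else 0.0
            nxt.append(c * (1.0 + gamma[j]) + inflow)
        counts = nxt
    return counts
```

For two stages with distinct rates, the second component should agree with the geometric-sum form implied by (A.2), for example `expected_stage_counts([100.0, 0.0], [0.01, 0.02], [0.001], 10)`.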

References

[1] M. P. Little, “Cancer models, ionization, and genomic instability: a review,” in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 5, World Scientific, River Edge, NJ, USA, 2008.

[2] W. Y. Tan, Stochastic Models of Carcinogenesis, Marcel Dekker, New York, NY, USA, 1991.

[3] W. Y. Tan, “Stochastic multi-stage models of carcinogenesis as hidden Markov models: a new approach,” International Journal of Systems and Synthetic Biology, vol. 1, pp. 313–337, 2010.

[4] W. Y. Tan, L. J. Zhang, and C. W. Chen, “Stochastic modeling of carcinogenesis: state space models and estimation of parameters,” Discrete and Continuous Dynamical Systems B, vol. 4, no. 1, pp. 297–322, 2004.

[5] W. Y. Tan, C. W. Chen, and L. J. Zhang, “Cancer biology, cancer models and stochastic mathematical analysis of carcinogenesis,” in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 3, World Scientific, River Edge, NJ, USA, 2008.

[6] R. A. Weinberg, The Biology of Human Cancer, Garland Science, Taylor and Francis, New York, NY, USA, 2007.

[7] Q. Zheng, “Stochastic multistage cancer models: a fresh look at an old approach,” in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 2, World Scientific, River Edge, NJ, USA, 2008.

[8] W. Y. Tan, L. J. Zhang, W. Chen, and J. M. Zhu, “A stochastic model of human colon cancer involving multiple pathways,” in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 11, World Scientific, River Edge, NJ, USA, 2008.

[9] W. Y. Tan and H. Zhou, “A new stochastic model of retinoblastoma involving both hereditary and non-hereditary cancer cases,” Journal of Carcinogenesis and Mutagenesis, vol. 2, no. 2, article 117, 2011.

[10] X. W. Yan, Stochastic and State Space Models of Carcinogenesis Under Complex Situation [Ph.D. thesis], Department of Mathematical Sciences, University of Memphis, Memphis, Tenn, USA.

[11] G. L. Yang and C. W. Chen, “A stochastic two-stage carcinogenesis model: a new approach to computing the probability of observing tumor in animal bioassays,” Mathematical Biosciences, vol. 104, no. 2, pp. 247–258, 1991.

[12] H. Osada and T. Takahashi, “Genetic alterations of multiple tumor suppressors and oncogenes in the carcinogenesis and progression of lung cancer,” Oncogene, vol. 21, no. 48, pp. 7421–7434, 2002.

[13] I. I. Wistuba, L. Mao, and A. F. Gazdar, “Smoking molecular damage in bronchial epithelium,” Oncogene, vol. 21, no. 48, pp. 7298–7306, 2002.

[14] S. Landreville, O. A. Agapova, and J. W. Harbour, “Emerging insights into the molecular pathogenesis of uveal melanoma,” Future Oncology, vol. 4, no. 5, pp. 629–636, 2008.

[15] H. W. Mensink, D. Paridaens, and A. De Klein, “Genetics of uveal melanoma,” Expert Review of Ophthalmology, vol. 4, no. 6, pp. 607–616, 2009.

[16] W. Y. Tan and X. W. Yan, “A new stochastic and state space model of human colon cancer incorporating multiple pathways,” Biology Direct, vol. 5, article 26, 2010.

[17] L. B. Klebanov, S. T. Rachev, and A. Y. Yakovlev, “A stochastic model of radiation carcinogenesis: latent time distributions and their properties,” Mathematical Biosciences, vol. 113, no. 1, pp. 51–75, 1993.

[18] A. Y. Yakovlev and A. D. Tsodikov, Stochastic Models of Tumor Latency and Their Biostatistical Applications, World Scientific, River Edge, NJ, USA, 1996.

[19] H. Fakir, W. Y. Tan, L. Hlatky, P. Hahnfeldt, and R. K. Sachs, “Stochastic population dynamic effects for lung cancer progression,” Radiation Research, vol. 172, no. 3, pp. 383–393, 2009.

[20] A. G. Knudson, “Mutation and cancer: statistical study of retinoblastoma,” Proceedings of the National Academy of Sciences of the United States of America, vol. 68, no. 4, pp. 820–823, 1971.

[21] J. F. Crow and M. Kimura, An Introduction to Population Genetics Theory, Harper and Row, New York, NY, USA, 1970.

[22] W. Y. Tan, Stochastic Models With Applications to Genetics, Cancers, AIDS and Other Biomedical Systems, World Scientific, River Edge, NJ, USA, 2002.

[23] W. Y. Tan, C. W. Chen, and L. J. Zhang, “Cancer risk assessment by state space models,” in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 13, World Scientific, River Edge, NJ, USA, 2008.

[24] W. Y. Tan and C. W. Chen, “Stochastic modeling of carcinogenesis: some new insights,” Mathematical and Computer Modelling, vol. 28, no. 11, pp. 49–71, 1998.

[25] W. Y. Tan and C. W. Chen, “Cancer stochastic models,” in Encyclopedia of Statistical Sciences, Revised edition, John Wiley and Sons, New York, NY, USA, 2005.

[26] E. G. Luebeck and S. H. Moolgavkar, “Multistage carcinogenesis and the incidence of colorectal cancer,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 23, pp. 15095–15100, 2002.

[27] R. Durrett, J. Mayberry, S. Moseley, and D. Schmidt, “Probability models for cancer development and progression,” Google Search, Google, 2012.

[28] G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis, Addison-Wesley, Reading, Mass, USA, 1973.


[29] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm (with discussion),” Journal of the Royal Statistical Society B, vol. 39, pp. 1–38, 1977.

[30] A. F. M. Smith and A. E. Gelfand, “Bayesian statistics without tears: a sampling-resampling perspective,” American Statistician, vol. 46, pp. 84–88, 1992.

[31] A. E. Loercher and J. W. Harbour, “Molecular genetics of uveal melanoma,” Current Eye Research, vol. 27, no. 2, pp. 69–74, 2003.

[32] C. S. Potten, C. Booth, and D. Hargreaves, “The small intestine as a model for evaluating adult tissue stem cell drug targets,” Cell Proliferation, vol. 36, no. 3, pp. 115–129, 2003.

[33] C. J. Lynch and J. Milner, “Loss of one p53 allele results in four-fold reduction of p53 mRNA and protein: a basis for p53 haplo-insufficiency,” Oncogene, vol. 25, no. 24, pp. 3463–3470, 2006.


3.2. Transition Probabilities and Probability Distributions of Staging Variables. Let $f(x; n, p)$ denote the probability density function of a binomial random variable $X \sim \text{Binomial}(n, p)$, $h(x; \lambda)$ the probability density function of a Poisson random variable $X \sim \text{Poisson}(\lambda)$, and $g(x, y; n, p_1, p_2)$ the probability density function of a bivariate multinomial random vector $(X, Y) \sim \text{Multinomial}(n; p_1, p_2)$. Using the stochastic equations of the staging variables given by (2) and the probability distributions of the transition variables $\{B_r(t; i), D_r(t; i), M_r(t; i)\}$ in (4), as in Tan et al. [4, 5], we obtain the following transition probabilities of $\{J_r(t + \Delta t; i),\ r = i, \ldots, k-1\}$ given $\{J_r(t; i),\ r = i, \ldots, k-1\}$ for $(i = 0, 1, \ldots, \operatorname{Min}(k-1, 2))$:

$$
\begin{aligned}
& P\{J_r(t + \Delta t; i) = v_r,\ r = i, i+1, \ldots, k-1 \mid J_r(t; i) = u_r,\ r = i, i+1, \ldots, k-1\} \\
&\quad = P\{J_i(t + \Delta t; i) = v_i \mid J_i(t; i) = u_i\} \times \prod_{j=i+1}^{k-1} P\{J_j(t + \Delta t; i) = v_j \mid J_j(t; i) = u_j,\ J_{j-1}(t; i) = u_{j-1}\},
\end{aligned}
\tag{12}
$$

where

$$
\begin{aligned}
P\{J_i(t + \Delta t; i) = v_i \mid J_i(t; i) = u_i\}
&= \sum_{r=0}^{u_i} f(r;\ u_i,\ b_i(t)\Delta t) \times f\Big(u_i - v_i + r;\ u_i - r,\ \frac{d_i(t)\Delta t}{1 - b_i(t)\Delta t}\Big), \\
&\qquad i = 0, 1, \ldots, \operatorname{Min}(k-1, 2), \\
P\{J_j(t + \Delta t; i) = v_j \mid J_j(t; i) = u_j,\ J_{j-1}(t; i) = u_{j-1}\}
&= \sum_{r_1=0}^{u_j} \sum_{r_2=0}^{u_j - r_1} g(r_1, r_2;\ u_j,\ b_j(t)\Delta t,\ d_j(t)\Delta t) \\
&\qquad \times h(v_j - u_j - r_1 + r_2;\ u_{j-1}\beta_{j-1}\Delta t), \quad j > i.
\end{aligned}
\tag{13}
$$
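The first formula in (13) is a convolution of two binomial distributions: $r$ of the $u_i$ cells divide, and $u_i - v_i + r$ of the remaining $u_i - r$ cells die. A minimal Python sketch of that sum is given below. The function names and the numeric rates used to exercise it are illustrative assumptions, and the code is not the authors' implementation.

```python
from math import comb


def binom_pmf(x, n, p):
    """Binomial(n, p) probability of x; zero outside 0..n."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p ** x * (1.0 - p) ** (n - x)


def stage_i_transition(v, u, b, d, dt):
    """P{J_i(t+dt) = v | J_i(t) = u} following the first line of (13).

    b, d are the birth and death rates at time t (hypothetical inputs).
    """
    p_birth = b * dt
    p_death = d * dt / (1.0 - b * dt)
    return sum(
        binom_pmf(r, u, p_birth) * binom_pmf(u - v + r, u - r, p_death)
        for r in range(u + 1)
    )
```

Because each inner binomial sums to one over its support, the probabilities over $v = 0, \ldots, 2u_i$ should add up to one, which is a convenient sanity check.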

Define the unobservable transition variables $\underset{\sim}{U}_i(t) = \{B_i(t; i),\ (B_j(t; i), D_j(t; i)),\ j = i+1, \ldots, k-1\}'$ $(i = 0, 1, \ldots, \operatorname{Min}(k-1, 2))$. Then we have for the joint probability density function of $\{\underset{\sim}{X}_i(t + \Delta t), \underset{\sim}{U}_i(t)\}$ given $\underset{\sim}{X}_i(t)$

$$
P\{\underset{\sim}{X}_i(t + \Delta t), \underset{\sim}{U}_i(t) \mid \underset{\sim}{X}_i(t)\} = P\{\underset{\sim}{X}_i(t + \Delta t) \mid \underset{\sim}{U}_i(t), \underset{\sim}{X}_i(t)\} \times P\{\underset{\sim}{U}_i(t) \mid \underset{\sim}{X}_i(t)\},
\tag{14}
$$

where

$$
\begin{aligned}
P\{\underset{\sim}{X}_i(t + \Delta t) \mid \underset{\sim}{U}_i(t), \underset{\sim}{X}_i(t)\}
&= f\Big(J_i(t; i) - J_i(t + \Delta t; i) + B_i(t; i);\ J_i(t; i) - B_i(t; i),\ \frac{d_i(t)\Delta t}{1 - b_i(t)\Delta t}\Big) \\
&\quad \times \prod_{j=i+1}^{k-1} h\{J_j(t + \Delta t; i) - J_j(t; i) - B_j(t; i) + D_j(t; i);\ J_{j-1}(t; i)\,\beta_{j-1}(t)\,\Delta t\},
\end{aligned}
\tag{15}
$$

$$
\begin{aligned}
P\{\underset{\sim}{U}_i(t) \mid \underset{\sim}{X}_i(t)\}
&= f(B_i(t; i);\ J_i(t; i),\ b_i(t; i)\Delta t) \\
&\quad \times \prod_{j=i+1}^{k-1} g\{B_j(t; i), D_j(t; i);\ J_j(t; i),\ b_j(t; i)\Delta t,\ d_j(t; i)\Delta t\}.
\end{aligned}
\tag{16}
$$

Let $\underset{\sim}{e}_i(j)$ be a $(k - i) \times 1$ column vector with 1 in the $j$th position $(1 \le j \le k - i)$ and with 0 in the other positions. Let $\underset{\sim}{u} = (u_i, \ldots, u_{k-1})'$ and $\underset{\sim}{v} = (v_i, \ldots, v_{k-1})'$ be $(k - i) \times 1$ column vectors of nonnegative integers (i.e., $u_j$ and $v_j$ are nonnegative integers). Then, by using the probability distribution results in (14)–(16), it can readily be shown that

$$
P\{\underset{\sim}{X}_i(t + \Delta t) = \underset{\sim}{v} \mid \underset{\sim}{X}_i(t) = \underset{\sim}{u}\} =
\begin{cases}
[u_j b_j(t) + (1 - \delta_{ji})\,u_{j-1}\beta_{j-1}(t)]\,\Delta t + o(\Delta t), & \text{if } \underset{\sim}{v} = \underset{\sim}{u} + \underset{\sim}{e}_i(j),\ j = i, \ldots, k-1, \\
u_j d_j(t)\,\Delta t + o(\Delta t), & \text{if } \underset{\sim}{v} = \underset{\sim}{u} - \underset{\sim}{e}_i(j),\ j = i, \ldots, k-1, \\
o(\Delta t), & \text{if } \big|\underset{\sim}{1}'_{n-k}(\underset{\sim}{u} - \underset{\sim}{v})\big| \ge 2.
\end{cases}
\tag{17}
$$

The above results imply that $\underset{\sim}{X}_i(t)$ is a $(k - i)$-dimensional birth-death process with birth rates $\{i\,b_u(t),\ u = i, \ldots, k-1\}$, death rates $\{i\,d_u(t),\ u = i, \ldots, k-1\}$, and cross-transition rates $\{\alpha_{u,u+1}(j; t) = j\beta_u(t),\ u = i, \ldots, k-1,\ \beta_{u,v}(j; t) = 0 \text{ if } v \ne u + 1\}$. (See Definition 4.1 in Tan ([22], Chapter 4).) Using these results, it can be shown that the Kolmogorov forward equation for the probabilities $P\{J_j(t; i) = u_j,\ j = i, \ldots, k-1 \mid J_i(0) = m_i,\ J_j(0) = 0,\ j > i\} = P(u_j,\ j = i, \ldots, k-1;\ t)$ $(i = 0, 1, \ldots, \operatorname{Min}(k-1, 2))$ in the above model is given by

$$
\begin{aligned}
\frac{d}{dt} P(u_j,\ j = i, \ldots, k-1;\ t)
&= P(u_i - 1,\ u_j,\ j = i+1, \ldots, k-1;\ t)\,(u_i - 1)\,b_i(t) \\
&\quad + \sum_{j=i+1}^{k-1} P(u_i, u_{i+1}, \ldots, u_{j-1}, u_j - 1, u_{j+1}, \ldots, u_{k-1};\ t)\,(u_j - 1)\,b_j(t) \\
&\quad + \sum_{j=i}^{k-2} P(u_i, u_{i+1}, \ldots, u_j, u_{j+1} - 1, u_{j+2}, \ldots, u_{k-1};\ t)\,u_j \beta_j(t) \\
&\quad + \sum_{j=i}^{k-1} P(u_i, u_{i+1}, \ldots, u_{j-1}, u_j + 1, u_{j+1}, \ldots, u_{k-1};\ t)\,(u_j + 1)\,d_j(t) \\
&\quad - P(u_i, u_{i+1}, \ldots, u_{k-1};\ t) \times \Big\{\sum_{j=i}^{k-1} u_j [b_j(t) + d_j(t)] + \sum_{j=i}^{k-2} u_j \beta_j(t)\Big\},
\end{aligned}
\tag{18}
$$

for $u_j = 0, 1, \ldots, \infty$, $j = i, \ldots, k-1$.

By using the above set of differential equations, one can readily compute the probabilities $P\{J_j(t) = u_j,\ j = i, \ldots, k-1 \mid J_i(0) = m_i\} = P(u_j,\ j = i, \ldots, k-1;\ t)$ numerically.
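To illustrate what "numerically" can mean here, the sketch below integrates the forward equation for the simplest special case: a single stage ($k - i = 1$), so that only the birth and death terms of (18) remain. It is a minimal Python illustration using an explicit Euler step on a truncated state space. The truncation level `n_max`, the step size `dt`, the constant rates, and the function name are assumptions made for the example, not choices taken from the paper.

```python
def forward_equation_1d(m, b, d, t_end, n_max=60, dt=1e-3):
    """Euler integration of dP(u)/dt for a one-type birth-death process.

    Starts from u = m with probability one. States are truncated to 0..n_max,
    and births out of n_max are blocked so that total probability is conserved.
    """
    p = [0.0] * (n_max + 1)
    p[m] = 1.0
    for _ in range(int(round(t_end / dt))):
        new = p[:]
        for u in range(n_max + 1):
            birth_out = u * b if u < n_max else 0.0
            gain = 0.0
            if u >= 1:
                gain += p[u - 1] * (u - 1) * b   # birth from state u - 1
            if u < n_max:
                gain += p[u + 1] * (u + 1) * d   # death from state u + 1
            new[u] += dt * (gain - p[u] * (birth_out + u * d))
        p = new
    return p
```

For constant rates the mean of the resulting distribution should be close to $m\,e^{(b-d)t}$, which gives a simple check on the step size and truncation level.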

3.3. The Probability Distributions of the Number of Detectable Tumors and Times to Tumors. As shown by Yang and Chen [11], malignant cancer tumors arise from primary $J_k$ cells by clonal expansion, where primary $J_k$ cells are $J_k$ cells generated directly by $J_{k-1}$ cells. ($J_k$ cells derived by stochastic birth of other $J_k$ cells are not primary $J_k$ cells.) That is, cancer tumors develop from primary $J_k$ cells through stochastic birth-death processes.

To derive the probability distribution for $T(t)$ in $J_i$ people in the population, let $P_T(s, t)$ denote the probability that a primary cancer cell at time $s$ develops into a detectable cancer tumor by time $t$. (An explicit formula for $P_T(s, t)$ has been given in Tan [22], Chapter 8, and in Tan and Chen [24].) Then, as shown in Tan ([3, 22], Chapter 8), the conditional probability distribution of $T(t)$ given $\{J_{k-1}(s; i),\ s \le t\}$ in $J_i$ people is Poisson with mean $\omega(t; i)$, where $\omega(t; i) = \int_{t_0}^{t} J_{k-1}(s; i)\,\beta_{k-1}(s)\,P_T(s, t)\,ds$. That is,

$$
T(t) \mid \{J_{k-1}(s; i),\ s \le t\} \sim \text{Poisson}(\omega(t; i)).
\tag{19}
$$

Let $Q_i(j)$ be the probability that cancer tumors develop during $(t_{j-1}, t_j]$ in $J_i$ people in the population. For time homogeneous models with small $\beta_{k-1}$, $Q_i(j)$ is then given by

$$
Q_i(j) = E\{e^{-\omega(t_{j-1}; i)} - e^{-\omega(t_j; i)}\} = e^{-\beta_{k-1} H_i(t_{j-1})} - e^{-\beta_{k-1} H_i(t_j)} + o(\beta_{k-1}),
\tag{20}
$$

where $H_i(t) = \int_{t_0}^{t} E[J_{k-1}(x; i)]\,P_T(x, t)\,dx$.

To derive $Q_i(j)$, denote by

$$
\begin{aligned}
\theta_{i(k-1)} &= E[J_i(t_0; i-1)]\,\beta_i \prod_{u=i+1}^{k-1} \Big(\frac{\beta_u}{\gamma_u}\Big), \quad i = 1, \ldots, \operatorname{Min}(3, k-1), \\
\lambda_{u(k-1)} &= E[J_u(t_0; u)]\,\beta_u \prod_{v=u+1}^{k-1} \Big(\frac{\beta_v}{\gamma_v}\Big), \quad u = 0, 1, \ldots, \operatorname{Min}(2, k-1),
\end{aligned}
\tag{21}
$$

and define the functions

$$
\psi_{i(k-1)}(t) = \Big(\prod_{u=i+1}^{k-1} \gamma_u\Big) \sum_{v=i}^{k-1} A_{i(k-1)}(v) \times \int_{t_0}^{t} e^{\gamma_v (x - t_0)}\,P_T(x, t)\,dx, \quad i = 0, 1, \ldots, \operatorname{Min}(2, k-1).
\tag{22}
$$

Applying the results for $E[J_{k-1}(t; i)]$ given in (11) for time homogeneous models with $\gamma_i \ne \gamma_j$ for $i \ne j$, we obtain the $Q_i(j)$'s as follows.

(1) If $k = 2$, then $k - 1 = 1$. Hence $Q_2(0) = 1$, $Q_i(0) = Q_2(j) = 0$ for $(i = 0, 1,\ j > 0)$, and for $j > 0$,

$$
Q_0(j) = e^{-\theta_{11}\psi_{11}(t_{j-1}) - \lambda_{01}\psi_{01}(t_{j-1})} - e^{-\theta_{11}\psi_{11}(t_j) - \lambda_{01}\psi_{01}(t_j)} + o(\beta_1),
\tag{23}
$$

$$
Q_1(j) = (1 - \alpha_1)\{e^{-\lambda_{11}\psi_{11}(t_{j-1})} - e^{-\lambda_{11}\psi_{11}(t_j)}\} + o(\beta_1).
\tag{24}
$$

(2) If $k \ge 3$, then we have $Q_i(0) = 0$ for $i = 0, 1$ and $Q_2(0) = \delta_{2k}\alpha$, and for $j > 0$,

$$
Q_0(j) = e^{-\theta_{1(k-1)}\psi_{1(k-1)}(t_{j-1}) - \lambda_{0(k-1)}\psi_{0(k-1)}(t_{j-1})} - e^{-\theta_{1(k-1)}\psi_{1(k-1)}(t_j) - \lambda_{0(k-1)}\psi_{0(k-1)}(t_j)} + o(\beta_{k-1}),
\tag{25}
$$

$$
Q_1(j) = e^{-\theta_{2(k-1)}\psi_{2(k-1)}(t_{j-1}) - \lambda_{1(k-1)}\psi_{1(k-1)}(t_{j-1})} - e^{-\theta_{2(k-1)}\psi_{2(k-1)}(t_j) - \lambda_{1(k-1)}\psi_{1(k-1)}(t_j)} + o(\beta_{k-1}),
\tag{26}
$$

$$
\begin{aligned}
Q_2(j) &= \delta_{2k}(1 - \alpha)\{e^{-\lambda_{22}\psi_{22}(t_{j-1})} - e^{-\lambda_{22}\psi_{22}(t_j)}\} \\
&\quad + (1 - \delta_{2k})\{e^{-\theta_{3(k-1)}\psi_{3(k-1)}(t_{j-1}) - \lambda_{2(k-1)}\psi_{2(k-1)}(t_{j-1})} - e^{-\theta_{3(k-1)}\psi_{3(k-1)}(t_j) - \lambda_{2(k-1)}\psi_{2(k-1)}(t_j)}\} \\
&\quad + o(\beta_{k-1}),
\end{aligned}
\tag{27}
$$

where $\delta_{2k} = 1$ if $k = 2$ and $\delta_{2k} = 0$ if $k \ne 2$.

Notice that if $\gamma_0 = 0$, then $\psi_{0(k-1)}(t)$ reduces to

$$
\begin{aligned}
\psi_{0(k-1)}(t) &= \Big(\prod_{u=1}^{k-1} \gamma_u\Big) \sum_{v=0}^{k-1} A_{0(k-1)}(v) \times \int_{t_0}^{t} e^{\gamma_v (x - t_0)}\,P_T(x, t)\,dx \\
&= \Big(\prod_{u=1}^{k-1} \gamma_u\Big) \sum_{v=1}^{k-1} A_{1(k-1)}(v) \times \int_{t_0}^{t} [e^{\gamma_v (x - t_0)} - 1]\,P_T(x, t)\,dx.
\end{aligned}
\tag{28}
$$

Notice also that if $P_T(s, t) \approx 1$ for $t > s$ and if $\gamma_0 = 0$, then the above $\psi_{ij}(t)$'s reduce, respectively, to

$$
\begin{aligned}
\psi_{ii}(t) &= \frac{1}{\gamma_i}\{e^{\gamma_i (t - t_0)} - 1\}, \quad i = 1, \ldots, k-1, \\
\psi_{0(k-1)}(t) &= \Big(\prod_{u=1}^{k-1} \gamma_u\Big) \sum_{v=1}^{k-1} A_{1(k-1)}(v) \times \frac{1}{\gamma_v^2}\{e^{\gamma_v (t - t_0)} - 1 - (t - t_0)\gamma_v\}, \\
\psi_{i(k-1)}(t) &= \Big(\prod_{u=i+1}^{k-1} \gamma_u\Big) \sum_{v=i}^{k-1} A_{i(k-1)}(v) \times \frac{1}{\gamma_v}\{e^{\gamma_v (t - t_0)} - 1\}, \quad i = 1, \ldots, k-1.
\end{aligned}
\tag{29}
$$
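As a small numerical illustration of how these pieces combine, the sketch below evaluates the closed form $\psi_{ii}(t)$ from (29) and the interval probabilities $e^{-\lambda\psi(t_{j-1})} - e^{-\lambda\psi(t_j)}$ that appear in (24), dropping the $o(\beta)$ remainder and the leading $(1 - \alpha_1)$ factor. It is a minimal Python sketch under the assumption $P_T(s, t) \approx 1$. The function names and the values of $\lambda$ and $\gamma$ used to exercise it are hypothetical.

```python
import math


def psi_ii(t, gamma, t0=0.0):
    """psi_ii(t) = (exp(gamma (t - t0)) - 1) / gamma, valid when P_T is close to 1."""
    return math.expm1(gamma * (t - t0)) / gamma


def interval_probabilities(lam, gamma, times, t0=0.0):
    """exp(-lam psi(t_{j-1})) - exp(-lam psi(t_j)) for consecutive time points."""
    surv = [math.exp(-lam * psi_ii(t, gamma, t0)) for t in times]
    return [surv[j - 1] - surv[j] for j in range(1, len(times))]
```

The interval probabilities telescope, so their sum over $j = 1, \ldots, n$ equals $1 - e^{-\lambda\psi(t_n)}$ when $t_0$ is the first time point.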

4. Probability Distribution of Observed Cancer Incidence Incorporating Hereditary Cancer Cases

For estimating unknown parameters and validating the model, one needs real data generated from the model. For studies of carcinogenesis, such data are usually given by cancer incidence. For example, in the SEER data of NCI/NIH of the USA, the data are given by $\{(y_0, n_0), (y_j, n_j),\ j = 1, \ldots, t_N\}$, where $y_0$ is the number of cancer cases at birth and $n_0$ the total number of births, and where, for $j \ge 1$, $y_j$ is the number of cancer cases developed during the $j$th age group of a one-year period (or 5-year periods) and $n_j$ is the number of noncancer people who are at risk for cancer and from whom $y_j$ have developed cancer during the $j$th age group. Given in Table 1 are the SEER data of uveal melanoma (adult eye cancer) during the period 1973–2007. In Table 1, notice that there are some cancer cases at birth, implying some inherited cancer cases. In this section, we will develop a statistical model for these types of data sets from the stochastic multistage model with hereditary cancers as given in Section 2. As in previous sections, let $n_{ij}$ be the number of individuals who have genotype $J_i$ $(i = 0, 1, 2)$ at the embryo stage among the $n_j$ people at risk for the cancer in question. Then, as shown above, $(n_{1j}, n_{2j}) \mid n_j \sim \text{Multinomial}\{n_j; p_1, p_2\}$. It follows that $n_{ij} \mid n_j \sim \text{Binomial}\{n_j; p_i\}$, $i = 0, 1, 2$. In what follows, we let $Y_j$ denote the random variable for $y_j$ unless otherwise stated.

4.1. The Probability Distribution of $Y_0$. As shown in Figure 4, $J_i$ $(i = 0, 1, 2)$ people would only generate $J_i$ stage cells and $J_{i+1}$ stage cells at birth. Thus, for cancers to develop at or before birth, the number of stages for the stochastic model of carcinogenesis must be 3 or less. It follows that if $y_0 > 0$, the appropriate model of carcinogenesis must be either a 2-stage model or a 3-stage model. Since $n_{20} \mid n_0 \sim \text{Binomial}(n_0, p_2)$ and $n_{10} \mid (n_0, n_{20}) \sim \text{Binomial}(n_0, p_1) + o(p_2)$, the probability distribution of $Y_0$ is therefore

$$
Y_0 \sim \text{Poisson}(\chi_k), \quad k = 2, 3,
\tag{30}
$$

where

$$
\chi_k =
\begin{cases}
n_0 (p_2 + p_1 \alpha), & \text{if } k = 2, \\
n_0 p_2 \alpha, & \text{if } k = 3.
\end{cases}
\tag{31}
$$

The expected number of $Y_0$ given $n_0$ is $E(Y_0 \mid n_0) = n_0 (p_2 + p_1 \alpha) = \chi_2$ if $k = 2$ and $E(Y_0 \mid n_0) = n_0 p_2 \alpha = \chi_3$ if $k = 3$. Hence, for the 2-stage model (i.e., $k = 2$) or the 3-stage model (i.e., $k = 3$), the maximum likelihood estimate of $\chi_k$ is $\hat{\chi}_k = y_0$, and the deviance $D_0(k)$ from the conditional probability distribution of $Y_0$ given $n_0$ is

$$
\begin{aligned}
D_0(k) &= -2\{\log h(y_0; \chi_k) - \log h(y_0; \hat{\chi}_k)\} \\
&= 2\Big\{\chi_k - y_0 - y_0 \log \frac{\chi_k}{y_0}\Big\}, \quad k = 2, 3.
\end{aligned}
\tag{32}
$$
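The deviance at birth can be checked numerically against its definition as $-2$ times a difference of Poisson log probabilities. The following is a minimal Python sketch. The function names are illustrative, and the example values (34 observed cases and a fitted mean of 36) are the age-0 entries reported for Table 1, used here only to exercise the formula.

```python
import math


def poisson_log_pmf(y, mean):
    """log h(y; mean) for a Poisson distribution with positive mean."""
    return -mean + y * math.log(mean) - math.lgamma(y + 1)


def birth_deviance(y0, chi):
    """-2 {log h(y0; chi) - log h(y0; y0)}, i.e. 2 {chi - y0 - y0 log(chi / y0)}.

    Assumes y0 > 0 so that the logarithm is defined.
    """
    return 2.0 * (chi - y0 - y0 * math.log(chi / y0))
```

The deviance is zero when the fitted mean equals the observed count and positive otherwise, for example `birth_deviance(34, 36.0)`.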

Table 1: The SEER incidence data (1973–2007) of uveal melanoma from NCI/NIH (over all races and genders).

| Age group | Number of people at risk | Observed incidence | Model-F predicted | Model-1 predicted | Two-stage predicted |
|---|---|---|---|---|---|
| 0 | 12495777 | 34 | 36 | 36 | 38 |
| 1 | 12221582 | 20 | 11 | 11 | 16 |
| 2 | 12120990 | 14 | 7 | 8 | 17 |
| 3 | 12112995 | 9 | 5 | 6 | 18 |
| 4 | 12146174 | 4 | 4 | 5 | 20 |
| 5 | 12161336 | 3 | 3 | 4 | 22 |
| 6 | 12111854 | 2 | 2 | 4 | 25 |
| 7 | 12160452 | 2 | 2 | 4 | 28 |
| 8 | 11942586 | 2 | 2 | 4 | 30 |
| 9 | 12381299 | 1 | 2 | 5 | 34 |
| 10 | 12512703 | 2 | 3 | 6 | 38 |
| 11 | 12410338 | 5 | 3 | 6 | 41 |
| 12 | 12449244 | 2 | 3 | 6 | 44 |
| 13 | 12527781 | 5 | 4 | 7 | 48 |
| 14 | 12602883 | 3 | 5 | 7 | 51 |
| 15 | 12719598 | 5 | 5 | 8 | 55 |
| 16 | 12766107 | 7 | 6 | 9 | 59 |
| 17 | 12831400 | 9 | 7 | 9 | 62 |
| 18 | 12382047 | 8 | 7 | 9 | 63 |
| 19 | 12581638 | 10 | 8 | 10 | 68 |
| 20 | 12636509 | 7 | 9 | 11 | 71 |
| 21 | 12682601 | 6 | 10 | 10 | 75 |
| 22 | 12840510 | 12 | 11 | 11 | 79 |
| 23 | 13075528 | 17 | 13 | 13 | 84 |
| 24 | 13358635 | 16 | 15 | 14 | 89 |
| 25 | 13473849 | 12 | 16 | 16 | 94 |
| 26 | 13426340 | 17 | 18 | 18 | 97 |
| 27 | 13525264 | 28 | 20 | 20 | 101 |
| 28 | 13149674 | 17 | 22 | 21 | 102 |
| 29 | 13812811 | 23 | 25 | 25 | 110 |
| 30 | 13886874 | 24 | 28 | 27 | 114 |
| 31 | 13488332 | 37 | 30 | 29 | 115 |
| 32 | 13460286 | 32 | 32 | 32 | 118 |
| 33 | 13256067 | 38 | 35 | 35 | 119 |
| 34 | 13428827 | 37 | 39 | 38 | 124 |
| 35 | 13220037 | 40 | 41 | 41 | 126 |
| 36 | 12870265 | 30 | 44 | 44 | 126 |
| 37 | 12689592 | 43 | 47 | 47 | 127 |
| 38 | 12157014 | 42 | 49 | 49 | 125 |
| 39 | 12494081 | 46 | 55 | 54 | 131 |
| 40 | 12272125 | 49 | 58 | 58 | 132 |
| 41 | 11826573 | 56 | 61 | 60 | 130 |
| 42 | 11663153 | 54 | 65 | 64 | 131 |
| 43 | 11407082 | 53 | 68 | 68 | 131 |
| 44 | 11296848 | 70 | 73 | 72 | 133 |
| 45 | 11016369 | 57 | 76 | 76 | 132 |
| 46 | 10651593 | 71 | 79 | 79 | 130 |
| 47 | 10475708 | 87 | 84 | 83 | 131 |
| 48 | 9994684 | 82 | 86 | 85 | 127 |
| 49 | 10138908 | 78 | 93 | 93 | 131 |
| 50 | 9836359 | 87 | 97 | 96 | 130 |
| 51 | 9475641 | 95 | 100 | 99 | 127 |
| 52 | 9250985 | 113 | 104 | 104 | 126 |
| 53 | 9027382 | 106 | 108 | 108 | 125 |
| 54 | 8883737 | 117 | 113 | 113 | 125 |
| 55 | 8547883 | 129 | 116 | 116 | 123 |
| 56 | 8279648 | 107 | 119 | 119 | 121 |
| 57 | 8062368 | 119 | 123 | 123 | 119 |
| 58 | 7654610 | 132 | 124 | 124 | 115 |
| 59 | 7563706 | 118 | 130 | 129 | 115 |
| 60 | 7232719 | 131 | 131 | 130 | 112 |
| 61 | 6927332 | 116 | 132 | 132 | 109 |
| 62 | 6708273 | 133 | 134 | 134 | 107 |
| 63 | 6543931 | 143 | 138 | 137 | 106 |
| 64 | 6404652 | 130 | 141 | 141 | 105 |
| 65 | 6168486 | 145 | 142 | 142 | 102 |
| 66 | 5913479 | 138 | 142 | 142 | 99 |
| 67 | 5746766 | 151 | 144 | 144 | 98 |
| 68 | 5480517 | 147 | 142 | 142 | 94 |
| 69 | 5363912 | 149 | 144 | 144 | 94 |
| 70 | 5110728 | 136 | 142 | 142 | 90 |
| 71 | 4925076 | 144 | 141 | 141 | 88 |
| 72 | 4696825 | 140 | 138 | 138 | 85 |
| 73 | 4512136 | 146 | 135 | 135 | 82 |
| 74 | 4345300 | 126 | 133 | 133 | 80 |
| 75 | 4148801 | 122 | 128 | 129 | 78 |
| 76 | 3900900 | 124 | 122 | 122 | 74 |
| 77 | 3681587 | 114 | 116 | 116 | 70 |
| 78 | 3481918 | 93 | 110 | 110 | 67 |
| 79 | 3243631 | 102 | 102 | 102 | 63 |
| 80 | 2961234 | 79 | 92 | 93 | 58 |
| 81 | 2724984 | 93 | 83 | 84 | 54 |
| 82 | 2495219 | 82 | 75 | 75 | 50 |
| 83 | 2271595 | 92 | 66 | 67 | 46 |
| 84 | 2041351 | 64 | 57 | 58 | 42 |
| 85 | 10466605 | 304 | 282 | 285 | 216 |

Note 1: for age 0 and ages 1–10, the cancer incidences are derived by subtracting the incidence of retinoblastoma from the original SEER data (see Tan and Zhou [9]).
Note 2: the observed uveal melanoma incidence rates per $10^6$ individuals are derived from the SEER eye cancer incidence by subtracting the retinoblastoma incidence as given in Tan and Zhou [9].


4.2. The Probability Distribution of $Y_j$ $(j \ge 1)$. To derive the probability distribution of $Y_j$ $(j \ge 1)$ in the $j$th age group, let $Y_{ij}$ $(i = 0, 1, 2)$ be the number of cancer cases generated by people who have genotype $J_i$ at the embryo stage among these $Y_j$ cancer cases. Then $Y_j = \sum_{i=0}^{2} Y_{ij}$, and $Y_{0j}$ is the number of cancer cases generated by the $n_{0j} = n_j - n_{1j} - n_{2j}$ normal people in the population. The conditional probability distribution of $Y_{ij}$ given $n_{ij}$ is

$$
Y_{ij} \mid n_{ij} \sim \text{Poisson}\{n_{ij} Q_i(j)\}, \quad i = 0, 1, 2.
\tag{33}
$$

Notice that if 119896 = 2 (a 2-stage model) then all 1198692

individuals would develop tumor at or before birth Thus if119896 = 2 then 119884

2119895= 0 for all 119895 gt 0 so that if 119895 gt 0 cancer

cases develop only from normal people (119873 = 1198690people) and

1198691people On the other hand if 119896 gt 2 then with positive

probability 119884119894119895gt 0 for all (119894 = 0 1 2 119895 = 0 1 119899

119879) where

119899119879is the last time point in the data Let 120575

2119896= 1 if 119896 = 2 and

1205752119896= 0 if 119896 = 2 Then 119884

119895| (119899

119894119895 119894 = 0 1 2) sim Poisson(119876

119879(119895)

where 119876119879(119895) = sum

1

119894=0119899119894119895119876119894(119895) + (1 minus 120575

2119896)119899

21198951198762(119895) Since

(1198990119895 119899

1119895) | 119899

119895sim Multinomial(119899

119895 119901

0 119901

1) we have for the

conditional probability density function119875(119910119895| 119899

119895) of119884

119895given

119899119895

119875 (119910119895| 119899

119895)=

119899119895

sum

1198990119895=0

119899119895minus1198990119895

sum

1198991119895=0

119892 (1198990119895 119899

1119895 119899

119895 119901

0 119901

1) ℎ 119910

119895 119876

119879(119895)

(34)

where 119892(1198990119895 119899

1119895 119899

119895 119901

0 119901

1) is the probability density function

of (1198990119895 119899

1119895) | 119899

119895sim Multinomial(119899

119895 119901

0 119901

1) and ℎ119910

119895 119876

119879(119895)

the probability density function of 119884119895| (119899

119894119895 119894 = 0 1 2) sim

Poisson119876119879(119895)

The probability density function 119875(119910119895| 119899

119895) given by (34)

is a mixture of Poisson probability density functions withmixing probability density function given by themultinomialprobability distribution of 119899

0119895 119899

1119895 given 119899

119895 This mixing

probability density function represents individuals with dif-ferent genotypes at the embryo stage in the population

Let Θ be the set of all unknown parameters (ie theparameters (119901

1 119901

2 120572) and the birth rates the death rates

and the mutation rates of 119869119895cells) Based on data (119910

119895 119895 =

0 1 119899119879) the likelihood function of Θ is

119871 Θ | 119910119895 119895 = 0 1 119899

119879 = ℎ (119910

0 120594 (119896))

119905119873

prod

119895=1

119875 (119910119895| 119899

119895)

(35)

Notice that because themutation rates are very small onemay practically assume 120573

119894(119905) = 120573

119894for 119894 = 0 1 119896 minus 1

Also because the stage-limiting genes are basically tumorsuppressor genes which act recessively (see Tan [3]Weinberg[6] and Tan et al [5 8 23]) one may practically assume119887119894(119905) = 119887

119894 119889

119894(119905) = 119889

119894 120574

119894(119905) = 119887

119894minus 119889

119894= 120574

119894 119894 = 0 1 119896 minus 1

(see Tan et al [4 5 8])
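To make the structure of the mixture (34) concrete, the following minimal Python sketch evaluates it by brute-force enumeration over the multinomial counts. The function names and all numerical values are hypothetical and chosen only for illustration; the real $n_j$ in Table 1 are in the millions, for which direct enumeration is infeasible.

```python
from math import comb, exp, lgamma, log


def poisson_pmf(y, mean):
    """h(y; mean): Poisson probability of y events with the given mean."""
    if mean == 0.0:
        return 1.0 if y == 0 else 0.0
    return exp(y * log(mean) - mean - lgamma(y + 1))


def multinomial_pmf(n0, n1, n, p0, p1):
    """g(n0, n1; n, p0, p1): the third class receives the remaining people."""
    n2 = n - n0 - n1
    p2 = 1.0 - p0 - p1
    return comb(n, n0) * comb(n - n0, n1) * p0**n0 * p1**n1 * p2**n2


def mixture_pmf(y, n, p0, p1, q, k=3):
    """P(y | n) as in (34); q = (Q_0, Q_1, Q_2) for a single age group."""
    total = 0.0
    for n0 in range(n + 1):
        for n1 in range(n - n0 + 1):
            n2 = n - n0 - n1
            mean = n0 * q[0] + n1 * q[1]
            if k != 2:  # the factor (1 - delta_{2k}) switches the J_2 term on
                mean += n2 * q[2]
            total += multinomial_pmf(n0, n1, n, p0, p1) * poisson_pmf(y, mean)
    return total


if __name__ == "__main__":
    # Toy values only: 5 people, genotype probabilities 0.7 / 0.2 / 0.1.
    print(mixture_pmf(1, 5, 0.7, 0.2, (0.1, 0.5, 1.0)))
```

Summing `mixture_pmf` over all `y` should return 1 up to rounding, which is a quick sanity check that the mixing weights and the Poisson components are combined correctly.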

4.3. The Joint Probability Distribution of Augmented Variables and Cancer Incidence. For applying the mixture distribution of $Y_j$ in (34) to make inference about the unknown parameters, one needs to expand the model to include the unobservable augmented variables $\{n_{0j}, n_{1j}, y_{0j}, y_{1j}, j = 1, \ldots, t_N\}$ and derive the joint probability distribution of these variables. For these purposes, observe that for ($j = 1, \ldots, t_N$) and for $k > 2$, the conditional probability distribution $P\{y_{ij}, i = 0, 1 \mid y_j, n_{ij}, i = 0, 1, n_j\}$ of $(Y_{ij}, i = 0, 1)$ given $(Y_j = y_j, n_{ij}, i = 0, 1, n_j)$ is

$$(y_{ij}, i = 0, 1) \mid (y_j, n_{ij}, i = 0, 1, n_j) \sim \text{Multinomial}\left\{y_j; \frac{n_{ij} Q_i(j)}{Q_T(j)}, i = 0, 1\right\}. \tag{36}$$

Since the conditional probability distribution of $Y_j$ given $(n_{ij}, i = 0, 1, 2)$ for $k > 2$ is Poisson with mean $Q_T(j)$, we have for the joint conditional probability density function $P\{y_{ij}, i = 0, 1, y_j \mid n_{ij}, i = 0, 1, n_j\}$ of $(Y_{ij}, i = 0, 1, Y_j)$ given $(n_{ij}, i = 0, 1, n_j)$

$$\begin{aligned}
P\{y_{ij}, i = 0, 1, y_j \mid n_{ij}, i = 0, 1, n_j\}
&= P\{y_{ij}, i = 0, 1 \mid y_j, n_{ij}, i = 0, 1, 2\} \times P\{y_j \mid n_{ij}, i = 0, 1, 2\} \\
&= \frac{1}{y_j!} e^{-Q_T(j)} \{Q_T(j)\}^{y_j} \times g\left\{y_{0j}, y_{1j}; y_j, \frac{n_{ij} Q_i(j)}{Q_T(j)}, i = 0, 1\right\} \\
&= \prod_{i=0}^{2} h\{y_{ij}; n_{ij} Q_i(j)\} \quad (k > 2),
\end{aligned} \tag{37}$$

where $y_{2j} = y_j - \sum_{u=0}^{1} y_{uj}$ and $n_{2j} = n_j - \sum_{u=0}^{1} n_{uj}$.

If $k = 2$, then $Y_{2j} = 0$ so that $Y_j = \sum_{i=0}^{1} Y_{ij}$. Thus we have for $k = 2$

$$Y_j \mid (n_{ij}, i = 0, 1, n_j) \sim \text{Poisson}\left\{\sum_{i=0}^{1} n_{ij} Q_i(j)\right\}, \tag{38}$$

$$Y_{ij} \mid (Y_j = y_j, n_{ij}, i = 0, 1, n_j) \sim \text{Binomial}\left\{y_j; \frac{n_{ij} Q_i(j)}{\sum_{u=0}^{1} n_{uj} Q_u(j)}\right\}, \quad i = 0, 1. \tag{39}$$

It follows that if $k = 2$, then $\sum_{i=0}^{1} Y_{ij} = Y_j$ and the joint probability density function of $(Y_{0j}, Y_j)$ given $(n_{uj}, u = 0, 1, 2)$ is

$$\begin{aligned}
P\{y_{0j}, y_j \mid n_{uj}, u = 0, 1, n_j\}
&= P\{y_j \mid n_{ij}, i = 0, 1, n_j\} \times P\{y_{0j} \mid y_j, n_{uj}, u = 0, 1, n_j\} \\
&= \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\} \quad \text{if } k = 2,
\end{aligned} \tag{40}$$

where $y_j = \sum_{i=0}^{1} y_{ij}$ and $n_j = \sum_{i=0}^{2} n_{ij}$.
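The last equality in (37) is the standard fact that a Poisson total split multinomially is a product of independent Poissons. As a hedged numerical check (not part of the original derivation), the sketch below compares both sides for arbitrary illustrative means $m_i$ standing in for $n_{ij} Q_i(j)$; the function names are hypothetical.

```python
from math import comb, exp, isclose, lgamma, log


def poisson_pmf(y, mean):
    """Poisson probability of y events; mean must be positive."""
    return exp(y * log(mean) - mean - lgamma(y + 1))


def joint_via_total(y0, y1, y2, m0, m1, m2):
    """Poisson total times multinomial split: the middle form of (37)."""
    y = y0 + y1 + y2
    total = m0 + m1 + m2
    split = (comb(y, y0) * comb(y - y0, y1)
             * (m0 / total) ** y0 * (m1 / total) ** y1 * (m2 / total) ** y2)
    return poisson_pmf(y, total) * split


def joint_via_product(y0, y1, y2, m0, m1, m2):
    """Product of independent Poissons: the final form of (37)."""
    return poisson_pmf(y0, m0) * poisson_pmf(y1, m1) * poisson_pmf(y2, m2)


if __name__ == "__main__":
    # Illustrative means only; m_i plays the role of n_ij * Q_i(j).
    left = joint_via_total(3, 1, 2, 2.5, 0.7, 1.3)
    right = joint_via_product(3, 1, 2, 2.5, 0.7, 1.3)
    print(isclose(left, right, rel_tol=1e-9))
```

The same check with the third component dropped corresponds to the two-genotype case (40).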


Put $\mathbf{Y} = \{y_{ij}, i = 0, 1, j = 1, \ldots, t_N\}$, $\mathbf{N} = \{n_{ij}, i = 0, 1, j = 1, \ldots, t_N\}$, $\underset{\sim}{n} = \{n_j, j = 0, 1, \ldots, t_N\}$, and $\underset{\sim}{y} = \{y_j, j = 0, 1, \ldots, t_N\}$. From (37) and (40), we have for the conditional joint probability density function of $(\mathbf{Y}, \underset{\sim}{y})$ given $(\mathbf{N}, \underset{\sim}{n})$

$$P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \underset{\sim}{n}, \Theta\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} \left\{h[y_{2j}; n_{2j} Q_2(j)]\right\}^{1 - \delta_{2k}} \times \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\}. \tag{41}$$

It follows that the joint conditional probability density function of $\{\mathbf{N}, \mathbf{Y}, \underset{\sim}{y}\}$ given $(\underset{\sim}{n}, \Theta)$ is

$$P\{\mathbf{N}, \mathbf{Y}, \underset{\sim}{y} \mid \underset{\sim}{n}, \Theta\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} g(n_{0j}, n_{1j}; n_j, p_0, p_1) \times \left\{h[y_{2j}; n_{2j} Q_2(j)]\right\}^{1 - \delta_{2k}} \times \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\}. \tag{42}$$

Notice that the above probability density function is a product of multinomial probability density functions and Poisson probability density functions. For this joint probability density function, the deviance $\text{Dev} = -2\{\log P[\mathbf{Y}, \underset{\sim}{y}, \mathbf{N} \mid \underset{\sim}{n}, \Theta] - \log P[\mathbf{Y}, \underset{\sim}{y}, \mathbf{N} \mid \underset{\sim}{n}, \hat{\Theta}]\}$ is

$$\text{Dev} = D_0(k) + \text{Dev}(p_1, p_2) + \sum_{j=1}^{t_N} D_j, \tag{43}$$

where

$$D_0(k) = 2\left\{\chi(k) - y_0 - y_0 \log\frac{\chi(k)}{y_0}\right\}, \tag{44}$$

$$\text{Dev}(p_1, p_2) = 2 \sum_{j=1}^{t_N} \left\{n_{0j} \log\frac{\hat{p}_0}{1 - p_1 - p_2} + \sum_{i=1}^{2} n_{ij} \log\frac{\hat{p}_i}{p_i}\right\}, \tag{45}$$

$$D_j = 2 \sum_{i=0}^{1} \left\{n_{ij} Q_i(j) - y_{ij} - y_{ij} \log\frac{n_{ij} Q_i(j)}{y_{ij}}\right\} + 2(1 - \delta_{2k}) \times \left\{n_{2j} Q_2(j) - y_{2j} - y_{2j} \log\frac{n_{2j} Q_2(j)}{y_{2j}}\right\}, \tag{46}$$

where $\hat{p}_i = \left(\sum_{j=0}^{t_N} n_{ij}\right) / \left(\sum_{j=0}^{t_N} n_j\right)$ ($i = 0, 1, 2$).

The joint probability density function $P\{\mathbf{Y}, \underset{\sim}{y}, \mathbf{N} \mid \underset{\sim}{n}, \Theta\}$ of $(\mathbf{Y}, \underset{\sim}{y}, \mathbf{N})$ given by (42) will be used as the kernel for the Bayesian method to estimate the unknown parameters and to predict the state variables.
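The per-age-group deviance $D_j$ in (46) is a sum of Poisson deviance terms. The following minimal sketch computes it for one age group; the function names and the toy inputs are hypothetical, and the convention that a term with $y_{ij} = 0$ reduces to twice its mean is an assumption made here (it is the usual limit of $y \log y$ as $y \to 0$), not something stated in the text.

```python
from math import log


def poisson_deviance_term(y, mean):
    """2 * {mean - y - y * log(mean / y)}; for y = 0 the limit 2 * mean is used."""
    if y == 0:
        return 2.0 * mean
    return 2.0 * (mean - y - y * log(mean / y))


def d_j(y, n, q, k=3):
    """D_j of (46) for one age group.

    y, n, q are length-3 sequences indexed by genotype (J_0, J_1, J_2):
    augmented case counts, numbers of people, and probabilities Q_i(j).
    """
    terms = [poisson_deviance_term(y[i], n[i] * q[i]) for i in (0, 1)]
    if k != 2:  # the factor (1 - delta_{2k}) keeps the J_2 term only when k > 2
        terms.append(poisson_deviance_term(y[2], n[2] * q[2]))
    return sum(terms)


if __name__ == "__main__":
    # Toy numbers only: fitted means equal the counts for J_0 and J_1.
    print(d_j((3, 1, 0), (100, 10, 1), (0.03, 0.1, 0.5)))
```

Each term is zero when the fitted mean equals the count and positive otherwise, so minimizing the summed $D_j$ pulls the fitted means toward the augmented counts.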

4.4. Fitting of the Model to Cancer Incidence Data. To fit the model to real data, as in Tan [3–5], we let $\Delta t \sim 1$ correspond to a fixed time interval such as 6 months in human cancer studies (Tan et al. [4] assumed 3 months as one time unit, while Luebeck and Moolgavkar [26] assumed one year as one time unit). Then, because the proliferation rate of the last stage cells is quite large, one may practically assume $P_T(s, t) = 1$ for $t - s \ge 1$. Hence, noting that $\beta_{k-1}(t) = \beta_{k-1}$ is usually very small (see [3–5]), the $Q_i(j)$ is approximated by

$$Q_i(j) \approx E\left\{e^{-\beta_{k-1} G_i(t_{j-1})} - e^{-\beta_{k-1} G_i(t_j)}\right\} = e^{-\beta_{k-1} E[G_i(t_{j-1})]} - e^{-\beta_{k-1} E[G_i(t_j)]} + o(\beta_{k-1}), \tag{47}$$

where $G_i(t) = \sum_{s=t_0}^{t-1} J_{k-1}(s; i)$.

Under discrete time approximation, the $E[I_{k-1}(t; i)]$'s have been derived in the appendix. Using these results of expected numbers and using the result $\sum_{i=0}^{t-1} a^i = (a^t - 1)/(a - 1)$ for $a \ne 1$, we obtain

$$\begin{aligned}
\beta_{k-1} E[G_i(t)] &= \sum_{r=i}^{i+1} E[J_r(t_0; i)] \left(\prod_{u=r}^{k-1} \beta_u\right) \times \sum_{v=r}^{k-1} A_{r(k-1)}(v) \sum_{s=t_0}^{t-1} (1 + \gamma_v)^{s - t_0} \\
&= \theta_{(i+1)(k-1)} \phi_{(i+1)(k-1)}(t) + \lambda_{i(k-1)} \phi_{i(k-1)}(t),
\end{aligned} \tag{48}$$

where $(\theta_{i(k-1)}, i = 1, 2, 3)$ and $(\lambda_{i(k-1)}, i = 0, 1, 2)$ are defined in Section 3.3 and where the $\phi_{i(k-1)}(t)$ ($i = 0, 1, 2, 3$)'s are given by

$$\phi_{i(k-1)}(t) = \left(\prod_{u=i+1}^{k-1} \gamma_u\right) \sum_{v=i}^{k-1} A_{i(k-1)}(v) \times \frac{1}{\gamma_v} \left\{(1 + \gamma_v)^{t - t_0} - 1\right\}, \quad i = 0, 1, 2, 3. \tag{49}$$

Notice that if $\gamma_0 = 0$, then $\phi_{0(k-1)}(t)$ reduces to

$$\phi_{0(k-1)}(t) = \left(\prod_{u=1}^{k-1} \gamma_u\right) \sum_{v=1}^{k-1} A_{1(k-1)}(v) \times \frac{1}{\gamma_v^2} \left\{(1 + \gamma_v)^{t - t_0} - 1 - (t - t_0)\gamma_v\right\}. \tag{50}$$

Applying these results for time homogeneous models with $\gamma_i \ne \gamma_j$ if $i \ne j$, the $Q_i(j)$'s under discrete approximation are given as follows.


(1) If $k = 2$, then $k - 1 = 1$. Hence $Q_2(0) = 1$, $Q_i(0) = Q_2(j) = 0$ for ($i = 0, 1$, $j > 0$), and for $j > 0$

$$\begin{aligned}
Q_0(j) &\approx e^{-\theta_{11}\phi_{11}(t_{j-1}) - \lambda_{01}\phi_{01}(t_{j-1})} - e^{-\theta_{11}\phi_{11}(t_j) - \lambda_{01}\phi_{01}(t_j)} + o(\beta_1), \\
Q_1(j) &\approx (1 - \alpha_1)\left\{e^{-\lambda_{11}\phi_{11}(t_{j-1})} - e^{-\lambda_{11}\phi_{11}(t_j)}\right\} + o(\beta_1).
\end{aligned} \tag{51}$$

(2) If $k \ge 3$, then we have $Q_2(0) = \delta_{2(k-1)}\alpha$, $Q_i(0) = 0$, $i = 0, 1$, and for $j > 0$

$$\begin{aligned}
Q_0(j) &\approx e^{-\theta_{1(k-1)}\phi_{1(k-1)}(t_{j-1}) - \lambda_{0(k-1)}\phi_{0(k-1)}(t_{j-1})} - e^{-\theta_{1(k-1)}\phi_{1(k-1)}(t_j) - \lambda_{0(k-1)}\phi_{0(k-1)}(t_j)} + o(\beta_{k-1}), \\
Q_1(j) &\approx e^{-\theta_{2(k-1)}\phi_{2(k-1)}(t_{j-1}) - \lambda_{1(k-1)}\phi_{1(k-1)}(t_{j-1})} - e^{-\theta_{2(k-1)}\phi_{2(k-1)}(t_j) - \lambda_{1(k-1)}\phi_{1(k-1)}(t_j)} + o(\beta_{k-1}), \\
Q_2(j) &\approx \delta_{2(k-1)}(1 - \alpha)\left\{e^{-\lambda_{22}\phi_{22}(t_{j-1})} - e^{-\lambda_{22}\phi_{22}(t_j)}\right\} \\
&\quad + (1 - \delta_{2(k-1)})\left\{e^{-\theta_{3(k-1)}\phi_{3(k-1)}(t_{j-1}) - \lambda_{2(k-1)}\phi_{2(k-1)}(t_{j-1})} - e^{-\theta_{3(k-1)}\phi_{3(k-1)}(t_j) - \lambda_{2(k-1)}\phi_{2(k-1)}(t_j)}\right\} + o(\beta_{k-1}).
\end{aligned} \tag{52}$$

Notice that if one replaces $[1 + \gamma_v]^{t - t_0}$ by $[1 + \gamma_v]^{t - t_0} = e^{(t - t_0)\log(1 + \gamma_v)} \approx e^{(t - t_0)\gamma_v}$, the above $Q_i(j)$'s from the discrete time model are exactly the same as from the continuous model, respectively, as given in equations (23)–(27) under the assumption that $P_T(s, t) = 1$ for $t - s > 0$. Notice that the assumption $P_T(s, t) = 1$ for $t - s > 0$ is equivalent to assuming that the last stage cancer cells grow instantaneously into cancer tumors as soon as they are generated; see Remark 1.
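Two features of the approximations above can be illustrated with a short sketch: each $Q_i(j)$ is a difference of survival-type exponentials, so the $Q_i(j)$ telescope, and the quality of the replacement $(1 + \gamma)^{t - t_0} \approx e^{(t - t_0)\gamma}$ depends on the size of $\gamma$. Everything below is a hypothetical illustration: the single-term cumulative function `m(t)`, its scale, the choice $t_j = j$, and the 170 half-year steps are assumptions, not quantities from the paper.

```python
from math import exp


def incidence_probs(cum_hazard, n_groups):
    """Q(j) ~ exp(-m(t_{j-1})) - exp(-m(t_j)) for j = 1..n_groups, with t_j = j."""
    return [exp(-cum_hazard(j - 1)) - exp(-cum_hazard(j))
            for j in range(1, n_groups + 1)]


def geometric_cum_hazard(scale, gamma):
    """Return m(t) = scale * ((1 + gamma)**t - 1) / gamma, so that m(0) = 0."""
    def m(t):
        return scale * ((1.0 + gamma) ** t - 1.0) / gamma
    return m


def growth_ratio(gamma, steps):
    """Ratio of discrete growth (1 + gamma)**steps to continuous exp(steps * gamma)."""
    return (1.0 + gamma) ** steps / exp(steps * gamma)


if __name__ == "__main__":
    m = geometric_cum_hazard(scale=1e-4, gamma=0.03)  # illustrative values
    q = incidence_probs(m, 170)
    # Telescoping: the Q(j) add up to 1 - exp(-m(t_last)).
    print(sum(q), 1.0 - exp(-m(170)))
    # Compare the discrete and continuous growth factors for two rate sizes.
    print(growth_ratio(1e-5, 170), growth_ratio(0.03, 170))
```

For a rate of order $10^{-5}$ the two growth factors agree to many digits over this horizon, while for a rate of order $10^{-2}$ the ratio drifts visibly from 1, which is worth keeping in mind when moving between the discrete and continuous forms.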

5. The Fitting of the Model to Cancer Incidence and the Generalized Bayesian Inference Procedure

Given the model in Sections 2 and 3 and cancer incidence, one may use results in Section 4 to fit the model. By using this model and the distribution results in Section 4, one can readily estimate the unknown genetic parameters, predict cancer incidence, and check the validity of the model by using the generalized Bayesian inference and Gibbs sampling procedures; for more detail, see Tan [3, 22] and Tan et al. [4, 5].

The generalized Bayesian inference is based on the posterior distribution $P\{\Theta \mid \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$ of $\Theta$ given $\{\mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$. This posterior distribution is derived by combining the prior distribution $P\{\Theta\}$ of $\Theta$ with the joint probability distribution $P\{\mathbf{N}, \mathbf{Y}, \underset{\sim}{y} \mid \underset{\sim}{n}, \Theta\}$ given $\{\underset{\sim}{n}, \Theta\}$ given by (42). It follows that this inference procedure would combine information from three sources: (1) previous information and experiences about the parameters in terms of the prior distribution $P\{\Theta\}$ of the parameters; (2) biological information of inherited cancer cases via genetic segregation of cancer genes in the population ($P\{\mathbf{N} \mid \underset{\sim}{n}, p_i, i = 1, 2\}$; see Section 2); and (3) information from the expanded data ($\mathbf{Y}$) and the observed data ($\underset{\sim}{y}$) via the statistical model from the system ($P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \Theta\}$ given by (37) and (40)). Because of additional information from the genetic segregation of the cancer genes, this inference procedure provides an efficient procedure to extract information on the effects of genotypes of individuals at the embryo stage.

5.1. The Prior Distribution of the Parameters. For the prior distributions of $\Theta$, because biological information has suggested some lower bounds and upper bounds for the mutation rates and for the proliferation rates, we assume

$$P\{\Theta\} \propto c \quad (c > 0), \tag{53}$$

where $c$ is a positive constant if these parameters satisfy some biologically specified constraints, and equal to zero otherwise. These biological constraints are as follows.

(i) $0 < p_1 < 10^{-2}$, $0 < p_2 < 10^{-6}$, and $-0.01 < \gamma_i < 1$ ($i = 1, 2$).
(ii) For $\beta_j$ ($j = 0, 1, \ldots, k - 1$), we let $1 < \omega_0 = N(t_0)\beta_0 < 1000$ ($N \to I_1$) and $10^{-8} < \beta_i < 10^{-3}$, $i = 1, \ldots, k - 1$.
(iii) For the $\lambda_{j(k-1)}$ ($j = 0, 1, 2$), we let $0 < \lambda_i < 10$, $i = 0, 1$, and $0 < \lambda_2 < 10^3$.
(iv) For the $\theta_i$ ($i = 1, 2, 3$), we let $0 < \theta_1 < 10^{-2}$ and $0 < \theta_2 < 10^{-1}$.

We will refer to the above prior as a partially informative prior, which may be considered as an extension of the traditional noninformative prior given in Box and Tiao [28].
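A flat prior restricted to a box, as in (53), amounts to a log-prior that is constant inside the constraints and minus infinity outside. The sketch below encodes constraints (i), (iii), and (iv) under hypothetical parameter names; the bounds on $\omega_0$ and $\beta_i$ in (ii) are omitted for brevity, and the example point is made up rather than taken from the fitted models.

```python
from math import inf

# Open intervals (lower, upper) transcribed from constraints (i), (iii), (iv).
# The dictionary keys are hypothetical names chosen for this illustration.
BOUNDS = {
    "p1": (0.0, 1e-2),
    "p2": (0.0, 1e-6),
    "gamma1": (-0.01, 1.0),
    "gamma2": (-0.01, 1.0),
    "lambda0": (0.0, 10.0),
    "lambda1": (0.0, 10.0),
    "lambda2": (0.0, 1e3),
    "theta1": (0.0, 1e-2),
    "theta2": (0.0, 1e-1),
}


def log_prior(params):
    """Log of the partially informative prior, up to the constant log(c)."""
    for name, (lower, upper) in BOUNDS.items():
        if not lower < params[name] < upper:
            return -inf  # zero prior mass outside the biological constraints
    return 0.0


if __name__ == "__main__":
    point = {"p1": 1e-3, "p2": 5e-7, "gamma1": 0.0, "gamma2": 0.03,
             "lambda0": 0.3, "lambda1": 0.5, "lambda2": 600.0,
             "theta1": 3e-4, "theta2": 5e-2}
    print(log_prior(point))
```

Adding this log-prior to a log-likelihood leaves the objective unchanged inside the box, which is why the posterior modes in the following sections reduce to constrained deviance minimization.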

5.2. The Posterior Distribution of the Parameters Given $\{\mathbf{Y}, \mathbf{N}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Denote by $\Theta_1 = \Theta - \{p_1, p_2, \alpha\}$. From the posterior distribution $P\{\Theta \mid \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$, we obtain

$$\begin{aligned}
P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}
&\propto \{\chi(k)\}^{y_0} e^{-\chi(k)}\, p_1^{\sum_{j=1}^{t_N} n_{1j}}\, p_2^{\sum_{j=1}^{t_N} n_{2j}} \times (1 - p_1 - p_2)^{\sum_{j=1}^{t_N} n_{0j}}, \quad 0 < p_1, p_2, \alpha < 1, \\
P\{\Theta_1 \mid (p_1, p_2, \alpha), \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}
&\propto \prod_{j=1}^{t_N} \prod_{i=0}^{2} e^{-Q_i(j)} \{Q_i(j)\}^{y_{ij}}, \quad \Theta_1 \in \Omega,
\end{aligned} \tag{54}$$


where $\Omega$ is the parameter space of $\Theta_1$ provided by the biological constraints in Section 5.1.

For computational convenience, we notice that the log of $P\{p_1, p_2, \alpha \mid \Theta_1, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$ is proportional to the negative of $D_0(k) + \text{Dev}(p_1, p_2)$ given by (44)-(45); similarly, the log of $P\{\Theta_1 \mid (p_1, p_2, \alpha), \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$ is proportional to the negative of $\sum_{j=1}^{t_N} D_j$ given by (46).
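The multinomial part of the first kernel in (54), $p_1^{a_1} p_2^{a_2} (1 - p_1 - p_2)^{a_0}$ with $a_i = \sum_j n_{ij}$, has a closed-form maximizer. The sketch below computes it and checks it against nearby points. It is a simplified illustration under stated assumptions: the $\chi(k)$ factor (defined outside this section) and the bounds of Section 5.1 are ignored, and the counts are made up.

```python
from math import log


def genotype_frequency_mode(n0_total, n1_total, n2_total):
    """Maximizer (p1, p2) of p1**n1 * p2**n2 * (1 - p1 - p2)**n0 on the simplex."""
    total = n0_total + n1_total + n2_total
    return n1_total / total, n2_total / total


def log_kernel(p1, p2, n0_total, n1_total, n2_total):
    """Log of the multinomial kernel; requires 0 < p1, p2 and p1 + p2 < 1."""
    return (n1_total * log(p1) + n2_total * log(p2)
            + n0_total * log(1.0 - p1 - p2))


if __name__ == "__main__":
    counts = (9960, 30, 10)  # hypothetical totals of J_0, J_1, J_2 people
    p1, p2 = genotype_frequency_mode(*counts)
    print(p1, p2, log_kernel(p1, p2, *counts))
```

When the unconstrained maximizer falls outside the prior bounds, or when the $\chi(k)$ factor matters, a constrained numerical optimizer would be needed instead of this closed form.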

5.3. The Multilevel Gibbs Sampling Procedure for Estimating Unknown Parameters. Given the posterior probability distributions, we will use the following multilevel Gibbs sampling procedure to derive estimates of the parameters. We notice that numerically the Gibbs sampling procedure given below is equivalent to the EM-algorithm from the sampling theory viewpoint, with Steps 1 and 2 as the E-Step and with Steps 3 and 4 as the M-Step, respectively [29]. These multilevel Gibbs sampling procedures are given by the following.

Step 1 (Generating $\mathbf{N}$ Given $(\mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta)$ (The Data-Augmentation Step 1)). Given $\Theta$ and given $\underset{\sim}{n}$, use the multinomial distribution of $\{n_{1j}, n_{2j}\}$ given $n_j$ in Section 3 to generate a large sample of $\mathbf{N}$. Then combine this large sample with $P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \underset{\sim}{n}, \Theta\}$ in (37) and (40) to select $\mathbf{N}$ through the weighted bootstrap method due to Smith and Gelfand [30]. This selected $\mathbf{N}$ is then a sample from $P\{\mathbf{N} \mid \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta\}$ even though the latter is unknown. (For proof, see Tan [22], Chapter 3.) Call the generated sample $\hat{\mathbf{N}}$.

Step 2 (Generating $\mathbf{Y}$ Given $(\mathbf{N}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta)$ (The Data-Augmentation Step 2)). Given $\{\underset{\sim}{y}, \underset{\sim}{n}, \Theta\}$ and given $\mathbf{N} = \hat{\mathbf{N}}$ generated from Step 1, generate $\mathbf{Y}$ from the probability distribution $P\{\mathbf{Y} \mid \mathbf{N}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta\}$ given by (36) and (38). Call the generated sample $\hat{\mathbf{Y}}$.

Step 3 (Estimation of $\{p_i, i = 1, 2, \alpha\}$ Given $\{\Theta_1, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$). Given $\{\underset{\sim}{y}, \underset{\sim}{n}, \Theta_1\}$ and given $(\mathbf{N}, \mathbf{Y}) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}})$ from Steps 1 and 2, derive the posterior mode of $\{p_i, i = 1, 2, \alpha\}$ by maximizing the conditional posterior distribution $P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \hat{\mathbf{N}}, \hat{\mathbf{Y}}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $D_0(k) + \text{Dev}(p_1, p_2)$ given by (44)-(45) in Section 4.3 under the constraints given in Section 5.1. Denote this generated mode by $\{\hat{p}_i, i = 1, 2, \hat{\alpha}\}$.

Step 4 (Estimation of $\Theta_1$ Given $\{p_i, i = 1, 2, \alpha, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$). Given $(\underset{\sim}{y}, \underset{\sim}{n})$ and given $(\mathbf{N}, \mathbf{Y}, p_i, i = 1, 2, \alpha) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}}, \hat{p}_i, i = 1, 2, \hat{\alpha})$ from Steps 1–3, derive the posterior mode of $\Theta_1$ by maximizing the conditional posterior distribution $P\{\Theta_1 \mid \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\mathbf{N}}, \hat{\mathbf{Y}}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $\sum_{j=1}^{t_N} D_j$ in (46) under the constraints. Denote the generated mode as $\hat{\Theta}_1$.

Step 5 (Recycling Step). With $\{\mathbf{N}, \mathbf{Y}, p_i, i = 1, 2, \alpha, \Theta_1\} = \{\hat{\mathbf{N}}, \hat{\mathbf{Y}}, \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1\}$ given above, go back to Step 1 and continue until convergence.

The proof of convergence of the above steps can be derived by using the procedure given in Tan ([22], Chapter 3). At convergence, the $\hat{\Theta} = \{\hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1\}$ are the generated values from the posterior distribution of $\Theta$ given $\{\underset{\sim}{y}, \underset{\sim}{n}\}$, independently of $(\mathbf{N}, \mathbf{Y})$ (for proof, see Tan [22], Chapter 3). Repeating the above procedures, one then generates a random sample of $\Theta$ from the posterior distribution of $\Theta$ given $\{\underset{\sim}{y}, \underset{\sim}{n}\}$; one then uses the sample mean as the estimate of $\Theta$ and the sample variances and covariances as estimates of the variances and covariances of these estimates.
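The weighted bootstrap used in Step 1 can be sketched in a few lines: draw candidates from the prior-type distribution of the unobserved counts, weight each candidate by the likelihood of the observed data, and resample in proportion to the weights. The code below is a deliberately reduced illustration, not the authors' implementation: it uses a single age group, only two genotypes, a binomial proposal in place of the multinomial, and made-up values for every number.

```python
import random
from math import exp, lgamma, log


def poisson_pmf(y, mean):
    """Poisson probability of y events; mean must be positive."""
    return exp(y * log(mean) - mean - lgamma(y + 1))


def draw_carriers(n, p, rng):
    """Proposal draw: number of J_1 carriers among n people, Binomial(n, p)."""
    return sum(rng.random() < p for _ in range(n))


def weighted_bootstrap(candidates, weight_fn, size, rng):
    """Resample candidates with probability proportional to weight_fn(candidate)."""
    weights = [weight_fn(c) for c in candidates]
    if sum(weights) <= 0.0:
        raise ValueError("all candidate weights are zero")
    return rng.choices(candidates, weights=weights, k=size)


def select_carriers(y, n, p, q0, q1, size, rng):
    """Draw `size` proposals for n_1 and reweight them by the Poisson likelihood of y."""
    candidates = [draw_carriers(n, p, rng) for _ in range(size)]

    def likelihood(n1):
        return poisson_pmf(y, (n - n1) * q0 + n1 * q1)

    return candidates, weighted_bootstrap(candidates, likelihood, size, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy setting: 50 people, 6 observed cases, carriers have a higher Q.
    proposals, selected = select_carriers(6, 50, 0.2, 0.01, 0.3, 2000, rng)
    print(sum(proposals) / len(proposals), sum(selected) / len(selected))
```

In this toy setting the observed count is higher than the proposal mean would predict, so the resampled carrier counts shift upward relative to the raw proposals, which is the behavior the data-augmentation step relies on.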

6. A New Multistage Stochastic Model for Adult Eye Cancer (Uveal Melanoma): An Example

The human eye cancers consist of pediatric eye cancers and adult eye cancers. The most common pediatric eye cancer is the retinoblastoma, which develops from the retinal pigment epithelium cells underlying the retina that do not form melanoma. The most common adult eye cancers are the uveal melanomas involving the iris, the ciliary body, and the choroid (collectively referred to as the uvea). These cancers develop from melanocytes (pigment cells) which reside within the uvea, giving color to the eye. In Tan and Zhou [9], we have developed a modified two-stage model for retinoblastoma. Based on results from molecular biology (see Landreville et al. [14], Mensink et al. [15], and Loercher and Harbour [31]), Landreville et al. [14] have proposed a three-stage model for uveal melanoma as given in Figure 2. As an example of applications of this paper, in this section we will apply this model of uveal melanoma to the NCI/NIH eye cancer data from the SEER project. We notice that the same methods can be applied to model other human cancers as well, but this will be our future research.

Given in Table 1 are the numbers of people at risk and the eye cancer cases in the age groups, together with the predicted cases from the models. These data give cancer incidence at birth and incidence for 85 age groups ($k = 85$), with each group spanning a 1-year period except the last age group ($\ge 85$ years old). For human eye cancer, because the incidence at birth and for age groups from 1 to 10 years old is basically generated by the pediatric eye cancer retinoblastoma (see [9]), to account for inherited cancer cases of uveal melanomas, the incidence for age 0 (birth) and for age periods from 1 to 10 years old in Table 1 for uveal melanoma is derived by subtracting the incidence of retinoblastoma from the SEER data (see Tan and Zhou [9]).

To fit the data, we let one time unit be 6 months after birth and let $t_0 = 1$. To compare different models and to assess different assumptions, we will consider the following two 3-stage mixture models: (1) the complete 3-stage mixture model (Model-F), in which no assumptions are made on the parameters; (2) the Type-1 3-stage mixture model (Model-1), in which we assume that $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth. For comparison purposes, we also fit a 2-stage model as defined in Tan and Zhou [9]. We will apply the methods in Section 5 to fit these models to the SEER data given in Table 1.


Table 2: The log-likelihood, AIC, and BIC of the fitted models.

Models            Log-likelihood    AIC         BIC
Three-stage
  Model-F         -1312.02          2648.04     2677.35
  Model-1         -1295.76          2609.53     2631.51
Two-stage         -4392.421         8796.843    8811.569

Figure 5: Curve fitting of SEER data by the Model-F, Model-1, and the two-stage model. (Horizontal axis: age in years; vertical axis: incidences; curves: Observed, Model-F, Model-1, Two-stage.)

Given in Table 2 are the natural logs of the likelihood functions, the AIC (Akaike Information Criterion), and the BIC (Bayesian Information Criterion) for these models. Given in Table 3 are the estimates of parameters in the 3-stage models. Given in Figure 5 are the plots of predicted cancer cases from the 3-stage mixture models (Model-F and Model-1) and the 2-stage model. For comparison purposes, in Table 1 we also provide numbers of predicted cancer cases from the 3-stage mixture models and the 2-stage model together with the observed cancer cases over time from SEER. From these results, we have made the following observations.

(a) As shown by results in Table 1 and Figure 5, it appeared that both Model-F and Model-1 fitted the SEER data well, although Model-1 fitted the data slightly better from values of AIC and BIC. The Chi-square test statistics $\chi^2 = \sum_{i=0}^{85} ((y_i - \hat{y}_i)^2 / \hat{y}_i)$ for Model-F and Model-1 are given by 88.43 and 94.48, respectively, giving a $P$-value of 0.12 (df = 86 − 12 = 74) for Model-F and a $P$-value of 0.11 (df = 86 − 7 = 79) for Model-1. On the other hand, the 2-stage model fitted the data very poorly; the Chi-square statistic value for the 2-stage model is 2747.69, giving a $P$-value less than $10^{-3}$. The AIC and BIC values of Model-1 are given by (AIC = 2609.53, BIC = 2631.51), which are slightly smaller than those of Model-F, respectively; however, the AIC and BIC values (8796.84, 8811.57) of the two-stage model are considerably greater than those of the 3-stage models, respectively. These results suggest that uveal melanoma may best be described by a 3-stage model with inherited component and that one may practically assume $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth.

(b) From Table 3, it is observed that the estimate of $\gamma_1$ is close to zero (the estimate is of order $10^{-5}$), indicating that the phenotype of $J_1$ is almost identical to that of $N = J_0$, further confirming that the staging-limiting genes are basically tumor suppressor genes and that there is no haploinsufficiency for these tumor suppressor genes. On the other hand, the estimate of $\gamma_2$ is of order $10^{-2}$, which is about $10^3$ times greater than those of cells with genotype $J_1$.

(c) From Table 3, the estimates $\hat{\lambda}_j$ ($j = 0, 1, 2$) of $\lambda_j$ are of order $10^{-1}$, $10^{-1}$, $10^2 \sim 10^3$, respectively. Because $\lambda_i = E[J_i(t_0; i)]\, \beta_i \prod_{u=i+1}^{2} (\beta_u / \gamma_u)$, $i = 0, 1, 2$, assuming some values of $E[J_i(t_0; i)]$, $i = 0, 1, 2$, from some biological observations, one can have some rough ideas about the magnitude of $\beta_j$ ($j = 0, 1, 2$). For example, if we follow Potten et al. [32] to assume ($E[N(t_0)] = E[J_i(t_0; i)] \sim 10^8$, $i = 0, 1, 2$), then $\beta_j \approx 10^{-6} \sim 10^{-5}$.

(d) From Table 3, the estimates of $p_1$ and $p_2$ from the SEER data are of orders $10^{-4} \sim 10^{-3}$ and $10^{-7} \sim 10^{-6}$, respectively. This indicates that in the US population, the frequency of the staging-limiting cancer gene for uveal melanoma is approximately around $10^{-3}$. Table 3 also showed that the estimate of $\alpha$ was 0.8411, indicating that most individuals with genotype $J_2$ would develop cancer at birth. This may help to explain why there are observed cancer incidences at birth for uveal melanoma in the SEER data even though the estimate of the frequency $p_2$ is of order $10^{-7} \sim 10^{-6}$.
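The information criteria quoted in observation (a) follow the usual definitions, $\text{AIC} = -2\log L + 2p$ and $\text{BIC} = -2\log L + p \log n$. As a hedged cross-check, the sketch below recomputes the Model-F values, reading the Model-F log-likelihood in Table 2 as −1312.02 and taking $p = 12$ parameters from the stated degrees of freedom (86 − 12). The sample size $n = 85$ used for BIC is an assumption chosen here because it reproduces the tabulated value; the text does not state which $n$ was used.

```python
from math import log


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 log L + 2 p."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: -2 log L + p log(n)."""
    return -2.0 * log_likelihood + n_params * log(n_obs)


if __name__ == "__main__":
    # Model-F: log-likelihood read from Table 2, p = 12 from df = 86 - 12.
    print(round(aic(-1312.02, 12), 2))
    # n_obs = 85 is an assumption made for this illustration.
    print(round(bic(-1312.02, 12, 85), 2))
```

Under these readings the sketch returns values that agree with the Model-F row of Table 2 to two decimals, which also supports the decimal placement used for the table entries.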

7. Discussion and Conclusions

To account for inherited cancer cases, in this paper we have developed some general multistage models involving hereditary cancer cases. For human cancer incidence, these models are basically generalized mixture models. In these mixture models, the mixing probability is a multinomial distribution to account for genetic segregation of the staging-limiting tumor suppressor genes. This mixture model allows us to estimate for the first time the frequency of the staging-limiting tumor suppressor gene in human populations. As an example of applications, in this paper we have developed a general 3-stage stochastic multistage model of carcinogenesis for adult human eye cancer. To account for inherited cancer cases in the stochastic model of human eye cancer, we have also developed a generalized mixture model for uveal melanoma in human beings.

For using the proposed models to fit the cancer incidence data, in this paper we have developed a generalized Bayesian inference procedure to estimate the unknown parameters and to predict cancer cases. This inference procedure is advantageous over the classical sampling theory inference (i.e., maximum likelihood method) because the procedure combines information from three sources: (1) previous information and experiences about the parameters in terms of the prior distribution $P\{\Theta\}$ of the parameters; (2) biological information of inherited cancer cases via the genetic segregation of staging-limiting tumor suppressor genes in the population; and (3) information from the expanded data ($\mathbf{Y}$) and the observed data ($\underset{\sim}{y}$) via the statistical model from the system ($P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \Theta\}$ given by (37) and (40)).

Table 3: Estimates of parameters for the 3-stage stochastic models.

Parameters      $\lambda_0$  $\lambda_1$  $\lambda_2$  $\gamma_1$  $\gamma_2$  $p_1$     $p_2$     $\alpha$  $\theta_1$  $\theta_2$
Model-F
  Estimates     2.88E-01     4.81E-01     6.55E+02     2.71E-05    3.37E-02    9.95E-04  9.68E-07  8.41E-01  3.29E-04    3.41E-01
  StD           1.21E-01     4.65E-02     1.12E+02     8.86E-06    1.92E-04    1.75E-06  6.48E-09  4.19E-02  3.54E-05    1.07E-02
  95% CL-Lower  5.03E-02     3.89E-01     4.34E+02     5.34E-06    3.33E-02    9.91E-04  9.55E-07  7.59E-01  2.60E-04    3.20E-01
  95% CL-Upper  5.25E-01     5.72E-01     8.75E+02     4.01E-05    3.41E-02    9.98E-04  9.81E-07  9.23E-01  3.98E-04    3.62E-01
Model-1
  Estimates     6.55E-02     3.72E-01     1.98E+02     NA          3.49E-02    9.98E-04  1.00E-06  8.07E-01  NA          NA
  StD           2.54E-02     4.63E-02     3.38E+01     NA          3.58E-03    6.27E-05  4.53E-08  7.06E-02  NA          NA
  95% CL-Lower  1.57E-02     2.82E-01     1.32E+02     NA          2.79E-02    8.75E-04  9.11E-07  6.68E-01  NA          NA
  95% CL-Upper  1.15E-01     4.63E-01     2.65E+02     NA          4.19E-02    1.12E-03  1.09E-06  9.45E-01  NA          NA

Note: NA: assumed nonexistence.

To illustrate the usefulness and applications of our models and methods, we have applied our models and methods to the eye cancer SEER data of NCI/NIH. Our analysis clearly showed that the proposed 3-stage model with inherited cancer cases fitted the data nicely (see Table 2 and Figure 5); on the other hand, the classical 2-stage model cannot fit the data at all (see Table 2 and Figure 5). These results have confirmed results from molecular biology that the human eye cancer is derived by a 3-stage model with inherited cancer component. Notice, however, that our 3-stage multistage model is more general than the classical 3-stage model as described in Little [1], Tan [2], and Zheng [7] in that we postulate that cancer tumors develop from primary $I_3$ cells by clonal expansion (see Yang and Chen [11]). (Note that the stochastic multistage models in the literature assume that cancer tumors develop from last stage cells immediately as soon as they are generated, ignoring completely cancer progression; see Remark 1.) As a matter of fact, we had assumed $\Delta t = 1$ for a period of three months and found that the 3-stage models then did not fit the SEER data, clearly indicating that $P_T(s, t) < 1$ over a period of three months.

Applying our models and methods to the SEER data of human eye cancer, we have derived for the first time some useful pieces of information. Specifically, we mention the following. (1) For the first time, we have estimated the frequency of the staging-limiting tumor suppressor gene in the US population ($\hat{p}_1 \sim 9.948 \times 10^{-3}$). (2) With the estimate of $\alpha$ as $\hat{\alpha} = 0.8411$, the predicted number of uveal melanoma at birth is $\hat{y}_0 = n_{01}\hat{p}_2 = 36$ by 3-stage models with inherited cancer component (Model-F and Model-1). (The observed number of eye cancer at birth is 34.) (3) The estimate of the proliferation rate ($\gamma_1$) of $J_1$ cells using Model-F is $\hat{\gamma}_1 = 2.271 \times 10^{-5} \sim 0$. (The estimate is $8.603 \times 10^{-5}$ using Model-1.) This confirms that the stage-1 limiting gene is a tumor suppressor gene and that, unlike the p53 gene in chromosome 17p (see [33]), there is little or no haploid insufficiency for this gene in cells with genotype $J_1$.

Using models and methods of this paper, one can easily predict future cancer cases for human eye cancer. Thus, by comparing results from different populations, our models and methods can be used to assess cancer prevention and control procedures. This will be our future research topic; we will not go any further here.

Appendix

The Expected Numbers of State Variables under Discrete Time Approximation

Under discrete time, the stochastic differential equations of state variables reduce to the following stochastic difference equations of state variables, respectively:

$$
\begin{aligned}
J_i(t+1; i) &= J_i(t; i)\,[1 + \gamma_i(t)] + e_i(t+1; i), \quad i = 0, 1, \ldots, \min(k-1, 2), \\
J_j(t+1; u) &= J_j(t; u)\,[1 + \gamma_j(t)] + J_{j-1}(t; u)\,\beta_{j-1}(t) + e_j(t+1; u), \quad 0 \le u < j \le k-1,
\end{aligned}
\tag{A1}
$$

where $e_i(t+1; i) = [B_i(t; i) - J_i(t; i)\,b_i(t)] - [D_i(t; i) - J_i(t; i)\,d_i(t)]$ and $e_j(t+1; i) = [B_j(t; i) - J_j(t; i)\,b_j(t)] - [D_j(t; i) - J_j(t; i)\,d_j(t)] + [M_{j-1}(t; i) - J_{j-1}(t; i)\,\beta_{j-1}(t)]$ for $j > i$.

The initial conditions at birth ($t_0$) for the above stochastic difference equations are $J_j(t_0; i) > 0$ if ($j = i, i+1$) and $J_j(t_0; i) = 0$ if $j > i+1$. The solution of the above difference equations under these initial conditions is given, respectively, by

$$
\begin{aligned}
J_i(t; i) &= J_i(t_0; i) \prod_{s=t_0}^{t-1} (1 + \gamma_i(s)) + \sum_{s=t_0+1}^{t} e_i(s; i) \prod_{u=s}^{t-1} (1 + \gamma_i(u)), \quad i = 0, 1, \ldots, \min(k-1, 2), \\
J_j(t; i) &= J_j(t_0; i) \prod_{s=t_0}^{t-1} (1 + \gamma_j(s)) + \sum_{s=t_0}^{t-1} J_{j-1}(s; i)\,\beta_{j-1}(s) \prod_{u=s+1}^{t-1} (1 + \gamma_j(u)) \\
&\quad + \sum_{s=t_0+1}^{t} e_j(s; i) \prod_{u=s}^{t-1} (1 + \gamma_j(u)), \quad 0 \le i < j \le k-1.
\end{aligned}
\tag{A2}
$$
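As a rough numerical check on (A2), the following Python sketch iterates the mean version of the difference equations (A1) and compares the result with the closed-form sum. It assumes time-homogeneous rates and that the noise terms $e_j$ have mean zero, so that they drop out of the expectations; the function names and all rate values are hypothetical and are not taken from the paper.

```python
import math


def iterate_means(start, gammas, betas, steps):
    """Iterate the mean version of (A1) with constant rates.

    start[j]  : expected number of stage-j cells at t0
    gammas[j] : net proliferation rate of stage-j cells per time step
    betas[j]  : mutation rate from stage j to stage j+1 per time step
    Returns the mean vectors at t0, t0+1, ..., t0+steps.
    """
    history = [list(start)]
    for _ in range(steps):
        prev = history[-1]
        nxt = []
        for j, mean_j in enumerate(prev):
            # Inflow from the preceding stage; the lowest stage has none.
            inflow = prev[j - 1] * betas[j - 1] if j > 0 else 0.0
            nxt.append(mean_j * (1.0 + gammas[j]) + inflow)
        history.append(nxt)
    return history


def closed_form_mean(history, j, gammas, betas, t):
    """Mean version of (A2) for stage j after t steps (noise terms dropped)."""
    own = history[0][j] * (1.0 + gammas[j]) ** t
    if j == 0:
        return own
    inflow = sum(
        history[s][j - 1] * betas[j - 1] * (1.0 + gammas[j]) ** (t - 1 - s)
        for s in range(t)
    )
    return own + inflow


# Hypothetical rates and initial means for three cell stages.
GAMMAS = [0.0, 0.02, 0.05]
BETAS = [1e-4, 1e-3]
STEPS = 40
HISTORY = iterate_means([1e6, 10.0, 0.0], GAMMAS, BETAS, STEPS)
AGREE = all(
    math.isclose(HISTORY[STEPS][j],
                 closed_form_mean(HISTORY, j, GAMMAS, BETAS, STEPS),
                 rel_tol=1e-9)
    for j in range(3)
)
```

Under these assumptions the iterated means and the closed form agree to rounding error, which is a useful sanity check before substituting estimated rates.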

If the model is time homogeneous so that $\beta_j(t) = \beta_j$, $\gamma_j(t) = \gamma_j$ ($j = i, \ldots, k-1$) and if $\gamma_i \ne \gamma_j$ if $i \ne j$, then the above solutions under the initial conditions ($J_j(t_0; i) = 0$, $j > i+1$) reduce to

$$
\begin{aligned}
J_i(t; i) &= J_i(t_0; i)\,(1 + \gamma_i)^{t - t_0} + \eta_i^{(0)}(t; i), \quad i = 0, 1, \ldots, \min(k-1, 2), \\
J_j(t; i) &= J_j(t_0; i)\,(1 + \gamma_j)^{t - t_0} + \sum_{u=i}^{j-1} J_u(t_0; i) \left\{ \prod_{v=u}^{j-1} \beta_v \right\} \sum_{r=u}^{j} A_{uj}(r)\,(1 + \gamma_r)^{t - t_0} \\
&\quad + \eta_j^{(0)}(t; i) + \sum_{u=1}^{j-i} \left\{ \prod_{r=j-u}^{j-1} \beta_r \right\} \eta_j^{(u)}(t; i) \\
&= \sum_{u=i}^{i+1} J_u(t_0; i) \left\{ \prod_{v=u}^{j-1} \beta_v \right\} \sum_{r=u}^{j} A_{uj}(r)\,(1 + \gamma_r)^{t - t_0} \\
&\quad + \eta_j^{(0)}(t; i) + \sum_{u=1}^{j-i} \left\{ \prod_{r=j-u}^{j-1} \beta_r \right\} \eta_j^{(u)}(t; i), \quad 0 \le i < j \le k-1,
\end{aligned}
\tag{A3}
$$

where $\eta_j^{(0)}(t; i) = \sum_{s=t_0+1}^{t} e_j(s; i)\,(1 + \gamma_j)^{t-s}$ and $\eta_j^{(u)}(t; i) = \sum_{s=t_0+1}^{t} e_{j-u}(s; i) \sum_{v=u}^{j} A_{uj}(v)\,(1 + \gamma_v)^{t-s}$ ($u = 1, \ldots, j-i$).

Thus, if the model is time homogeneous and if $\gamma_i \ne \gamma_j$ if $i \ne j$, the $E[J_{k-1}(t; i)]$'s in discrete time models under the initial conditions ($J_j(t_0; i) = 0$, $j > i+1$) are given, respectively, by

$$
E[J_{k-1}(t; i)] = \sum_{u=i}^{i+1} E[J_u(t_0; i)] \left( \prod_{v=u}^{k-2} \beta_v \right) \times \sum_{r=u}^{k-1} A_{u(k-1)}(r)\,(1 + \gamma_r)^{t - t_0}, \quad i = 0, 1, \ldots, \min(k-1, 2).
\tag{A4}
$$

References

[1] M. P. Little, “Cancer models, ionization, and genomic instability: a review,” in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 5, World Scientific, River Edge, NJ, USA, 2008.

[2] W. Y. Tan, Stochastic Models of Carcinogenesis, Marcel Dekker, New York, NY, USA, 1991.

[3] W. Y. Tan, “Stochastic multi-stage models of carcinogenesis as hidden Markov models: a new approach,” International Journal of Systems and Synthetic Biology, vol. 1, pp. 313–337, 2010.

[4] W. Y. Tan, L. J. Zhang, and C. W. Chen, “Stochastic modeling of carcinogenesis: state space models and estimation of parameters,” Discrete and Continuous Dynamical Systems B, vol. 4, no. 1, pp. 297–322, 2004.

[5] W. Y. Tan, C. W. Chen, and L. J. Zhang, “Cancer biology, cancer models and stochastic mathematical analysis of carcinogenesis,” in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 3, World Scientific, River Edge, NJ, USA, 2008.

[6] R. A. Weinberg, The Biology of Human Cancer, Garland Sciences, Taylor and Francis, New York, NY, USA, 2007.

[7] Q. Zheng, “Stochastic multistage cancer models: a fresh look at an old approach,” in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 2, World Scientific, River Edge, NJ, USA, 2008.

[8] W. Y. Tan, L. J. Zhang, W. Chen, and J. M. Zhu, “A stochastic model of human colon cancer involving multiple pathways,” in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 11, World Scientific, River Edge, NJ, USA, 2008.

[9] W. Y. Tan and H. Zhou, “A new stochastic model of retinoblastoma involving both hereditary and non-hereditary cancer cases,” Journal of Carcinogenesis and Mutagenesis, vol. 2, no. 2, article 117, 2011.

[10] X. W. Yan, Stochastic and State Space Models of Carcinogenesis Under Complex Situation [Ph.D. thesis], Department of Mathematical Sciences, University of Memphis, Memphis, Tenn, USA.

[11] G. L. Yang and C. W. Chen, “A stochastic two-stage carcinogenesis model: a new approach to computing the probability of observing tumor in animal bioassays,” Mathematical Biosciences, vol. 104, no. 2, pp. 247–258, 1991.

[12] H. Osada and T. Takahashi, “Genetic alterations of multiple tumor suppressors and oncogenes in the carcinogenesis and progression of lung cancer,” Oncogene, vol. 21, no. 48, pp. 7421–7434, 2002.

[13] I. I. Wistuba, L. Mao, and A. F. Gazdar, “Smoking molecular damage in bronchial epithelium,” Oncogene, vol. 21, no. 48, pp. 7298–7306, 2002.

[14] S. Landreville, O. A. Agapova, and J. W. Harbour, “Emerging insights into the molecular pathogenesis of uveal melanoma,” Future Oncology, vol. 4, no. 5, pp. 629–636, 2008.

[15] H. W. Mensink, D. Paridaens, and A. De Klein, “Genetics of uveal melanoma,” Expert Review of Ophthalmology, vol. 4, no. 6, pp. 607–616, 2009.

[16] W. Y. Tan and X. W. Yan, “A new stochastic and state space model of human colon cancer incorporating multiple pathways,” Biology Direct, vol. 5, article 26, 2010.

[17] L. B. Klebanov, S. T. Rachev, and A. Y. Yakovlev, “A stochastic model of radiation carcinogenesis: latent time distributions and their properties,” Mathematical Biosciences, vol. 113, no. 1, pp. 51–75, 1993.

[18] A. Y. Yakovlev and A. D. Tsodikov, Stochastic Models of Tumor Latency and Their Biostatistical Applications, World Scientific, River Edge, NJ, USA, 1996.

[19] H. Fakir, W. Y. Tan, L. Hlatky, P. Hahnfeldt, and R. K. Sachs, “Stochastic population dynamic effects for lung cancer progression,” Radiation Research, vol. 172, no. 3, pp. 383–393, 2009.

[20] A. G. Knudson, “Mutation and cancer: statistical study of retinoblastoma,” Proceedings of the National Academy of Sciences of the United States of America, vol. 68, no. 4, pp. 820–823, 1971.

[21] J. F. Crow and M. Kimura, An Introduction to Population Genetics Theory, Harper and Row, New York, NY, USA, 1970.

[22] W. Y. Tan, Stochastic Models With Applications to Genetics, Cancers, AIDS and Other Biomedical Systems, World Scientific, River Edge, NJ, USA, 2002.

[23] W. Y. Tan, C. W. Chen, and L. J. Zhang, “Cancer risk assessment by state space models,” in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 13, World Scientific, River Edge, NJ, USA, 2008.

[24] W. Y. Tan and C. W. Chen, “Stochastic modeling of carcinogenesis: some new insights,” Mathematical and Computer Modelling, vol. 28, no. 11, pp. 49–71, 1998.

[25] W. Y. Tan and C. W. Chen, “Cancer stochastic models,” in Encyclopedia of Statistical Sciences, Revised edition, John Wiley and Sons, New York, NY, USA, 2005.

[26] E. G. Luebeck and S. H. Moolgavkar, “Multistage carcinogenesis and the incidence of colorectal cancer,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 23, pp. 15095–15100, 2002.

[27] R. Durrett, J. Mayberry, S. Moseley, and D. Schmidt, “Probability models for cancer development and progression,” Google Search, Google, 2012.

[28] G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis, Addison-Wesley, Reading, Mass, USA, 1973.


[29] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm (with discussion),” Journal of the Royal Statistical Society B, vol. 39, pp. 1–38, 1977.

[30] A. F. M. Smith and A. E. Gelfand, “Bayesian statistics without tears: a sampling-resampling perspective,” American Statistician, vol. 46, pp. 84–88, 1992.

[31] A. E. Loercher and J. W. Harbour, “Molecular genetics of uveal melanoma,” Current Eye Research, vol. 27, no. 2, pp. 69–74, 2003.

[32] C. S. Potten, C. Booth, and D. Hargreaves, “The small intestine as a model for evaluating adult tissue stem cell drug targets,” Cell Proliferation, vol. 36, no. 3, pp. 115–129, 2003.

[33] C. J. Lynch and J. Milner, “Loss of one p53 allele results in four-fold reduction of p53 mRNA and protein: a basis for p53 haplo-insufficiency,” Oncogene, vol. 25, no. 24, pp. 3463–3470, 2006.


$j > i\} = P(u_j, j = i, \ldots, k-1; t)$ $(i = 0, 1, \ldots, \min(k-1, 2))$ in the above model is given by

$$
\begin{aligned}
\frac{d}{dt} P(u_j, j = i, \ldots, k-1; t)
&= P(u_i - 1, u_j, j = i+1, \ldots, k-1; t)\,(u_i - 1)\,b_i(t) \\
&\quad + \sum_{j=i+1}^{k-1} P(u_i, u_{i+1}, \ldots, u_{j-1}, u_j - 1, u_{j+1}, \ldots, u_{k-1}; t)\,(u_j - 1)\,b_j(t) \\
&\quad + \sum_{j=i}^{k-2} P(u_i, u_{i+1}, \ldots, u_j, u_{j+1} - 1, u_{j+2}, \ldots, u_{k-1}; t)\,u_j\,\beta_j(t) \\
&\quad + \sum_{j=i}^{k-1} P(u_i, u_{i+1}, \ldots, u_{j-1}, u_j + 1, u_{j+1}, \ldots, u_{k-1}; t)\,(u_j + 1)\,d_j(t) \\
&\quad - P(u_i, u_{i+1}, \ldots, u_{k-1}; t) \times \left\{ \sum_{j=i}^{k-1} u_j\,[b_j(t) + d_j(t)] + \sum_{j=i}^{k-2} u_j\,\beta_j(t) \right\},
\end{aligned}
\tag{18}
$$

for $u_j = 0, 1, \ldots, \infty$, $j = i, \ldots, k-1$.

By using the above set of differential equations, one can readily compute the probabilities $P\{J_j(t) = u_j, j = i, \ldots, k-1 \mid I_i(0) = m_i\} = P(u_j, j = i, \ldots, k-1; t)$ numerically.

3.3. The Probability Distributions of the Number of Detectable Tumors and Times to Tumors. As shown by Yang and Chen [11], malignant cancer tumors arise from primary $J_k$ cells by clonal expansion, where primary $J_k$ cells are $J_k$ cells generated directly by $J_{k-1}$ cells ($J_k$ cells derived by stochastic birth of other $J_k$ cells are not primary $J_k$ cells). That is, cancer tumors develop from primary $J_k$ cells through stochastic birth-death processes.

To derive the probability distribution for $T(t)$ in $J_i$ people in the population, let $P_T(s, t)$ denote the probability that a primary cancer cell at time $s$ develops into a detectable cancer tumor by time $t$. (An explicit formula for $P_T(s, t)$ has been given in Tan [22], Chapter 8, and in Tan and Chen [24].) Then, as shown in Tan ([3, 22], Chapter 8), the conditional probability distribution of $T(t)$ given $\{J_{k-1}(s; i), s \le t\}$ in $J_i$ people is Poisson with mean $\omega(t; i)$, where $\omega(t; i) = \int_{t_0}^{t} J_{k-1}(s; i)\,\beta_{k-1}(s)\,P_T(s, t)\,ds$. That is,

$$
T(t) \mid \{J_{k-1}(s; i), s \le t\} \sim \text{Poisson}(\omega(t; i)).
\tag{19}
$$

Let $Q_i(j)$ be the probability that cancer tumors develop during $(t_{j-1}, t_j]$ in $J_i$ people in the population. For time homogeneous models with small $\beta_{k-1}$, $Q_i(j)$ is then given by

$$
Q_i(j) = E\left\{ e^{-\omega(t_{j-1}; i)} - e^{-\omega(t_j; i)} \right\}
= e^{-\beta_{k-1} H_i(t_{j-1})} - e^{-\beta_{k-1} H_i(t_j)} + o(\beta_{k-1}),
\tag{20}
$$

where $H_i(t) = \int_{t_0}^{t} E[J_{k-1}(x; i)]\,P_T(x, t)\,dx$.

To derive $Q_i(j)$, denote by

$$
\begin{aligned}
\theta_{i(k-1)} &= E[J_i(t_0; i-1)]\,\beta_i \prod_{u=i+1}^{k-1} \left( \frac{\beta_u}{\gamma_u} \right), \quad i = 1, \ldots, \min(3, k-1), \\
\lambda_{u(k-1)} &= E[J_u(t_0; u)]\,\beta_u \prod_{v=u+1}^{k-1} \left( \frac{\beta_v}{\gamma_v} \right), \quad u = 0, 1, \ldots, \min(2, k-1),
\end{aligned}
\tag{21}
$$

and define the functions

$$
\psi_{i(k-1)}(t) = \left\{ \prod_{u=i+1}^{k-1} \gamma_u \right\} \sum_{v=i}^{k-1} A_{i(k-1)}(v) \times \int_{t_0}^{t} e^{\gamma_v (x - t_0)}\,P_T(x, t)\,dx, \quad i = 0, 1, \ldots, \min(2, k-1).
\tag{22}
$$

Applying results of $E[J_{k-1}(t; i)]$ given in (11) for time homogeneous models with $\gamma_i \ne \gamma_j$ if $i \ne j$, we obtain the $Q_i(j)$'s as follows.

(1) If $k = 2$, then $k - 1 = 1$. Hence, $Q_2(0) = 1$, $Q_i(0) = Q_2(j) = 0$ for ($i = 0, 1$, $j > 0$), and for $j > 0$,

$$
Q_0(j) = e^{-\theta_{11}\psi_{11}(t_{j-1}) - \lambda_{01}\psi_{01}(t_{j-1})} - e^{-\theta_{11}\psi_{11}(t_j) - \lambda_{01}\psi_{01}(t_j)} + o(\beta_1),
\tag{23}
$$

$$
Q_1(j) = (1 - \alpha_1) \left\{ e^{-\lambda_{11}\psi_{11}(t_{j-1})} - e^{-\lambda_{11}\psi_{11}(t_j)} \right\} + o(\beta_1).
\tag{24}
$$

(2) If $k \ge 3$, then we have $Q_i(0) = 0$ for $i = 0, 1$ and $Q_2(0) = \delta_{2k}\alpha$, and for $j > 0$,

$$
Q_0(j) = e^{-\theta_{1(k-1)}\psi_{1(k-1)}(t_{j-1}) - \lambda_{0(k-1)}\psi_{0(k-1)}(t_{j-1})} - e^{-\theta_{1(k-1)}\psi_{1(k-1)}(t_j) - \lambda_{0(k-1)}\psi_{0(k-1)}(t_j)} + o(\beta_{k-1}),
\tag{25}
$$

$$
Q_1(j) = e^{-\theta_{2(k-1)}\psi_{2(k-1)}(t_{j-1}) - \lambda_{1(k-1)}\psi_{1(k-1)}(t_{j-1})} - e^{-\theta_{2(k-1)}\psi_{2(k-1)}(t_j) - \lambda_{1(k-1)}\psi_{1(k-1)}(t_j)} + o(\beta_{k-1}),
\tag{26}
$$

$$
\begin{aligned}
Q_2(j) &= \delta_{2k}(1 - \alpha) \left\{ e^{-\lambda_{22}\psi_{22}(t_{j-1})} - e^{-\lambda_{22}\psi_{22}(t_j)} \right\} \\
&\quad + (1 - \delta_{2k}) \left\{ e^{-\theta_{3(k-1)}\psi_{3(k-1)}(t_{j-1}) - \lambda_{2(k-1)}\psi_{2(k-1)}(t_{j-1})} - e^{-\theta_{3(k-1)}\psi_{3(k-1)}(t_j) - \lambda_{2(k-1)}\psi_{2(k-1)}(t_j)} \right\} \\
&\quad + o(\beta_{k-1}),
\end{aligned}
\tag{27}
$$

where $\delta_{2k} = 1$ if $k = 2$ and $= 0$ if $k \ne 2$.

Notice that if $\gamma_0 = 0$, then $\psi_{0(k-1)}(t)$ reduces to

$$
\begin{aligned}
\psi_{0(k-1)}(t) &= \left( \prod_{u=1}^{k-1} \gamma_u \right) \sum_{v=0}^{k-1} A_{0(k-1)}(v) \times \int_{t_0}^{t} e^{\gamma_v (x - t_0)}\,P_T(x, t)\,dx \\
&= \left( \prod_{u=1}^{k-1} \gamma_u \right) \sum_{v=1}^{k-1} A_{1(k-1)}(v) \times \int_{t_0}^{t} \left[ e^{\gamma_v (x - t_0)} - 1 \right] P_T(x, t)\,dx.
\end{aligned}
\tag{28}
$$

Notice also that if $P_T(s, t) \approx 1$ for $t > s$ and if $\gamma_0 = 0$, then the above $\psi_{ij}(t)$'s reduce, respectively, to

$$
\begin{aligned}
\psi_{ii}(t) &= \frac{1}{\gamma_i} \left\{ e^{\gamma_i (t - t_0)} - 1 \right\}, \quad i = 1, \ldots, k-1, \\
\psi_{0(k-1)}(t) &= \left( \prod_{u=1}^{k-1} \gamma_u \right) \sum_{v=1}^{k-1} A_{1(k-1)}(v) \times \frac{1}{\gamma_v^2} \left\{ e^{\gamma_v (t - t_0)} - 1 - (t - t_0)\,\gamma_v \right\}, \\
\psi_{i(k-1)}(t) &= \left( \prod_{u=i+1}^{k-1} \gamma_u \right) \sum_{v=i}^{k-1} A_{i(k-1)}(v) \times \frac{1}{\gamma_v} \left\{ e^{\gamma_v (t - t_0)} - 1 \right\}, \quad i = 1, \ldots, k-1.
\end{aligned}
\tag{29}
$$
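The closed forms in (29) rest on the elementary integral $\int_{t_0}^{t} e^{\gamma (x - t_0)}\,dx = \{e^{\gamma (t - t_0)} - 1\}/\gamma$, which is what remains of the integral in (22) once $P_T \equiv 1$. A minimal Python sketch of a numerical check follows; the midpoint rule, the function names, and the values of $\gamma$ and $t$ are illustrative assumptions rather than anything prescribed by the paper.

```python
import math


def psi_closed(gamma, t, t0=0.0):
    """Closed form {exp(gamma*(t - t0)) - 1} / gamma, as in psi_ii(t) of (29)."""
    return (math.exp(gamma * (t - t0)) - 1.0) / gamma


def psi_numeric(gamma, t, t0=0.0, n=20000):
    """Midpoint-rule approximation of int_{t0}^{t} exp(gamma*(x - t0)) dx."""
    h = (t - t0) / n
    return h * sum(math.exp(gamma * (i + 0.5) * h) for i in range(n))


# Hypothetical proliferation rate (per year) and age (years).
GAMMA = 0.05
AGE = 80.0
CLOSED = psi_closed(GAMMA, AGE)
NUMERIC = psi_numeric(GAMMA, AGE)
```

When $P_T(x, t)$ is not close to 1, the same quadrature routine can be reused with the integrand multiplied by an explicit $P_T$, at the cost of losing the closed form.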

4. Probability Distribution of Observed Cancer Incidence Incorporating Hereditary Cancer Cases

For estimating unknown parameters and to validate the model, one would need real data generated from the model. For studies of carcinogenesis, such data are usually given by cancer incidence. For example, in the SEER data of NCI/NIH of USA, the data are given by $\{(y_0, n_0), (y_j, n_j), j = 1, \ldots, t_N\}$, where $y_0$ is the number of cancer cases at birth and $n_0$ the total number of births, and where, for $j \ge 1$, $y_j$ is the number of cancer cases developed during the $j$th age group of a one-year period (or 5-year periods) and $n_j$ is the number of noncancer people who are at risk for cancer and from whom $y_j$ of them have developed cancer during the $j$th age group. Given in Table 1 are the SEER data of uveal melanoma (adult eye cancer) during the period 1973–2007. In Table 1, notice that there are some cancer cases at birth, implying some inherited cancer cases. In this section, we will develop a statistical model for these types of data sets from the stochastic multistage model with hereditary cancers as given in Section 2. As in previous sections, let $n_{ij}$ be the number of individuals who have genotype $J_i$ ($i = 0, 1, 2$) at the embryo stage among the $n_j$ people at risk for the cancer in question. Then, as shown above, $(n_{1j}, n_{2j}) \mid n_j \sim \text{Multinomial}\{n_j; p_1, p_2\}$. It follows that $n_{ij} \mid n_j \sim \text{Binomial}\{n_j; p_i\}$, $i = 0, 1, 2$. In what follows, we let $Y_j$ denote the random variable for $y_j$ unless otherwise stated.

4.1. The Probability Distribution of $Y_0$. As shown in Figure 4, $J_i$ ($i = 0, 1, 2$) people would only generate $J_i$ stage cells and $J_{i+1}$ stage cells at birth. Thus, for cancers to develop at or before birth, the number of stages for the stochastic model of carcinogenesis must be 3 or less. It follows that if $y_0 > 0$, the appropriate model of carcinogenesis must be either a 2-stage model or a 3-stage model. Since $n_{20} \mid n_0 \sim \text{Binomial}(n_0, p_2)$ and $n_{10} \mid (n_0, n_{20}) \sim \text{Binomial}(n_0, p_1) + o(p_2)$, the probability distribution of $Y_0$ is therefore

$$
Y_0 \sim \text{Poisson}(\chi_k), \quad k = 2, 3,
\tag{30}
$$

where

$$
\chi_k =
\begin{cases}
n_0 (p_2 + p_1 \alpha) & \text{if } k = 2, \\
n_0 p_2 \alpha & \text{if } k = 3.
\end{cases}
\tag{31}
$$

The expected number of $Y_0$ given $n_0$ is $E(Y_0 \mid n_0) = n_0 (p_2 + p_1 \alpha) = \chi_2$ if $k = 2$ and $E(Y_0 \mid n_0) = n_0 p_2 \alpha = \chi_3$ if $k = 3$. Hence, for the 2-stage model (i.e., $k = 2$) or the 3-stage model (i.e., $k = 3$), the maximum likelihood estimate of $\chi_k$ is $\hat{\chi}_k = y_0$ and the deviance $D_0(k)$ from the conditional probability distribution of $Y_0$ given $n_0$ is

$$
D_0(k) = -2 \left\{ \log h(y_0; \chi_k) - \log h(y_0; \hat{\chi}_k) \right\}
= 2 \left\{ \chi_k - y_0 - y_0 \log \frac{\chi_k}{y_0} \right\}, \quad k = 2, 3,
\tag{32}
$$

where $h(y; \chi)$ denotes the Poisson probability density function with mean $\chi$.
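To make the deviance at birth concrete, the Python sketch below evaluates $D_0(k)$ directly from its definition $-2\{\log h(y_0; \chi_k) - \log h(y_0; \hat{\chi}_k)\}$ with $h$ the Poisson density and $\hat{\chi}_k = y_0$, and compares it with the closed form $2\{\chi_k - y_0 - y_0 \log(\chi_k / y_0)\}$ obtained by substituting that density. The inputs are the age-0 entries of Table 1 (34 observed cases; 36 predicted by the 3-stage models and 38 by the two-stage model); the function names are hypothetical.

```python
import math


def poisson_logpmf(y, mean):
    """log of the Poisson probability h(y; mean) for integer y >= 0, mean > 0."""
    return -mean + y * math.log(mean) - math.lgamma(y + 1)


def deviance_at_birth(y0, chi):
    """-2 * {log h(y0; chi) - log h(y0; chi_hat)} with chi_hat = y0 (y0 > 0)."""
    return -2.0 * (poisson_logpmf(y0, chi) - poisson_logpmf(y0, float(y0)))


def deviance_closed_form(y0, chi):
    """Closed form 2 * {chi - y0 - y0 * log(chi / y0)}."""
    return 2.0 * ((chi - y0) - y0 * math.log(chi / y0))


# Age-0 row of Table 1: observed 34; predicted 36 (3-stage) and 38 (2-stage).
D_THREE_STAGE = deviance_at_birth(34, 36.0)
D_TWO_STAGE = deviance_at_birth(34, 38.0)
```

A smaller value indicates that the fitted $\chi_k$ is closer to the observed count at birth; this component covers only the age-0 cell and would be added to the deviance contributions of the later age groups.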


Table 1: The SEER incidence data (1973–2007) of uveal melanoma from NCI/NIH (over all races and genders).

| Age group | Number of people at risk | Observed incidence | Model-F predicted | Model-1 predicted | Two-stage predicted |
|---|---|---|---|---|---|
| 0 | 12495777 | 34 | 36 | 36 | 38 |
| 1 | 12221582 | 20 | 11 | 11 | 16 |
| 2 | 12120990 | 14 | 7 | 8 | 17 |
| 3 | 12112995 | 9 | 5 | 6 | 18 |
| 4 | 12146174 | 4 | 4 | 5 | 20 |
| 5 | 12161336 | 3 | 3 | 4 | 22 |
| 6 | 12111854 | 2 | 2 | 4 | 25 |
| 7 | 12160452 | 2 | 2 | 4 | 28 |
| 8 | 11942586 | 2 | 2 | 4 | 30 |
| 9 | 12381299 | 1 | 2 | 5 | 34 |
| 10 | 12512703 | 2 | 3 | 6 | 38 |
| 11 | 12410338 | 5 | 3 | 6 | 41 |
| 12 | 12449244 | 2 | 3 | 6 | 44 |
| 13 | 12527781 | 5 | 4 | 7 | 48 |
| 14 | 12602883 | 3 | 5 | 7 | 51 |
| 15 | 12719598 | 5 | 5 | 8 | 55 |
| 16 | 12766107 | 7 | 6 | 9 | 59 |
| 17 | 12831400 | 9 | 7 | 9 | 62 |
| 18 | 12382047 | 8 | 7 | 9 | 63 |
| 19 | 12581638 | 10 | 8 | 10 | 68 |
| 20 | 12636509 | 7 | 9 | 11 | 71 |
| 21 | 12682601 | 6 | 10 | 10 | 75 |
| 22 | 12840510 | 12 | 11 | 11 | 79 |
| 23 | 13075528 | 17 | 13 | 13 | 84 |
| 24 | 13358635 | 16 | 15 | 14 | 89 |
| 25 | 13473849 | 12 | 16 | 16 | 94 |
| 26 | 13426340 | 17 | 18 | 18 | 97 |
| 27 | 13525264 | 28 | 20 | 20 | 101 |
| 28 | 13149674 | 17 | 22 | 21 | 102 |
| 29 | 13812811 | 23 | 25 | 25 | 110 |
| 30 | 13886874 | 24 | 28 | 27 | 114 |
| 31 | 13488332 | 37 | 30 | 29 | 115 |
| 32 | 13460286 | 32 | 32 | 32 | 118 |
| 33 | 13256067 | 38 | 35 | 35 | 119 |
| 34 | 13428827 | 37 | 39 | 38 | 124 |
| 35 | 13220037 | 40 | 41 | 41 | 126 |
| 36 | 12870265 | 30 | 44 | 44 | 126 |
| 37 | 12689592 | 43 | 47 | 47 | 127 |
| 38 | 12157014 | 42 | 49 | 49 | 125 |
| 39 | 12494081 | 46 | 55 | 54 | 131 |
| 40 | 12272125 | 49 | 58 | 58 | 132 |
| 41 | 11826573 | 56 | 61 | 60 | 130 |
| 42 | 11663153 | 54 | 65 | 64 | 131 |
| 43 | 11407082 | 53 | 68 | 68 | 131 |
| 44 | 11296848 | 70 | 73 | 72 | 133 |
| 45 | 11016369 | 57 | 76 | 76 | 132 |
| 46 | 10651593 | 71 | 79 | 79 | 130 |
| 47 | 10475708 | 87 | 84 | 83 | 131 |
| 48 | 9994684 | 82 | 86 | 85 | 127 |
| 49 | 10138908 | 78 | 93 | 93 | 131 |
| 50 | 9836359 | 87 | 97 | 96 | 130 |
| 51 | 9475641 | 95 | 100 | 99 | 127 |
| 52 | 9250985 | 113 | 104 | 104 | 126 |
| 53 | 9027382 | 106 | 108 | 108 | 125 |
| 54 | 8883737 | 117 | 113 | 113 | 125 |
| 55 | 8547883 | 129 | 116 | 116 | 123 |
| 56 | 8279648 | 107 | 119 | 119 | 121 |
| 57 | 8062368 | 119 | 123 | 123 | 119 |
| 58 | 7654610 | 132 | 124 | 124 | 115 |
| 59 | 7563706 | 118 | 130 | 129 | 115 |
| 60 | 7232719 | 131 | 131 | 130 | 112 |
| 61 | 6927332 | 116 | 132 | 132 | 109 |
| 62 | 6708273 | 133 | 134 | 134 | 107 |
| 63 | 6543931 | 143 | 138 | 137 | 106 |
| 64 | 6404652 | 130 | 141 | 141 | 105 |
| 65 | 6168486 | 145 | 142 | 142 | 102 |
| 66 | 5913479 | 138 | 142 | 142 | 99 |
| 67 | 5746766 | 151 | 144 | 144 | 98 |
| 68 | 5480517 | 147 | 142 | 142 | 94 |
| 69 | 5363912 | 149 | 144 | 144 | 94 |
| 70 | 5110728 | 136 | 142 | 142 | 90 |
| 71 | 4925076 | 144 | 141 | 141 | 88 |
| 72 | 4696825 | 140 | 138 | 138 | 85 |
| 73 | 4512136 | 146 | 135 | 135 | 82 |
| 74 | 4345300 | 126 | 133 | 133 | 80 |
| 75 | 4148801 | 122 | 128 | 129 | 78 |
| 76 | 3900900 | 124 | 122 | 122 | 74 |
| 77 | 3681587 | 114 | 116 | 116 | 70 |
| 78 | 3481918 | 93 | 110 | 110 | 67 |
| 79 | 3243631 | 102 | 102 | 102 | 63 |
| 80 | 2961234 | 79 | 92 | 93 | 58 |
| 81 | 2724984 | 93 | 83 | 84 | 54 |
| 82 | 2495219 | 82 | 75 | 75 | 50 |
| 83 | 2271595 | 92 | 66 | 67 | 46 |
| 84 | 2041351 | 64 | 57 | 58 | 42 |
| 85 | 10466605 | 304 | 282 | 285 | 216 |

Note 1: for age 0 and 1–10 years old, the cancer incidences are derived by subtracting the incidence of retinoblastoma from the original SEER data (see Tan and Zhou [9]).
Note 2: the observed uveal melanoma incidence rates per $10^6$ individuals are derived from the SEER eye cancer incidence by subtracting retinoblastoma incidence as given in Tan and Zhou [9].
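For readers who want to relate the counts in Table 1 to incidence rates, a minimal Python sketch is given below. It computes crude rates per $10^6$ people at risk as $10^6\,y_j / n_j$ for three rows of the table (ages 0, 40, and 70); treating this ratio as the rate is an assumption made here for illustration, and the helper name is hypothetical.

```python
# (age group, number of people at risk, observed incidence) from Table 1.
ROWS = [
    (0, 12495777, 34),
    (40, 12272125, 49),
    (70, 5110728, 136),
]


def rate_per_million(cases, at_risk):
    """Crude incidence rate per 10^6 people at risk."""
    return 1.0e6 * cases / at_risk


RATES = {age: rate_per_million(cases, at_risk) for age, at_risk, cases in ROWS}
```

The same helper applied to the predicted columns would put observed and fitted values on a common per-capita scale, which can make the age pattern easier to compare across groups of very different size.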


4.2. The Probability Distribution of $Y_j$ ($j \ge 1$). To derive the probability distribution of $Y_j$ ($j \ge 1$) in the $j$th age group, let $Y_{ij}$ ($i = 0, 1, 2$) be the number of cancer cases generated by people who have genotype $J_i$ at the embryo stage among these $Y_j$ cancer cases. Then $Y_j = \sum_{i=0}^{2} Y_{ij}$ and $Y_{0j}$ is the number of cancer cases generated by the $n_{0j} = n_j - n_{1j} - n_{2j}$ normal people in the population. The conditional probability distribution of $Y_{ij}$ given $n_{ij}$ is

$$
Y_{ij} \mid n_{ij} \sim \text{Poisson}\{n_{ij} Q_i(j)\}, \quad i = 0, 1, 2.
\tag{33}
$$

Notice that if $k = 2$ (a 2-stage model), then all $J_2$ individuals would develop tumor at or before birth. Thus, if $k = 2$, then $Y_{2j} = 0$ for all $j > 0$, so that if $j > 0$, cancer cases develop only from normal people ($N = J_0$ people) and $J_1$ people. On the other hand, if $k > 2$, then with positive probability, $Y_{ij} > 0$ for all ($i = 0, 1, 2$, $j = 0, 1, \ldots, n_T$), where $n_T$ is the last time point in the data. Let $\delta_{2k} = 1$ if $k = 2$ and $\delta_{2k} = 0$ if $k \ne 2$. Then $Y_j \mid (n_{ij}, i = 0, 1, 2) \sim \text{Poisson}(Q_T(j))$, where $Q_T(j) = \sum_{i=0}^{1} n_{ij} Q_i(j) + (1 - \delta_{2k})\,n_{2j} Q_2(j)$. Since $(n_{0j}, n_{1j}) \mid n_j \sim \text{Multinomial}(n_j; p_0, p_1)$, we have for the conditional probability density function $P(y_j \mid n_j)$ of $Y_j$ given $n_j$

$$
P(y_j \mid n_j) = \sum_{n_{0j}=0}^{n_j} \sum_{n_{1j}=0}^{n_j - n_{0j}} g(n_{0j}, n_{1j}; n_j, p_0, p_1)\,h\{y_j; Q_T(j)\},
\tag{34}
$$

where $g(n_{0j}, n_{1j}; n_j, p_0, p_1)$ is the probability density function of $(n_{0j}, n_{1j}) \mid n_j \sim \text{Multinomial}(n_j; p_0, p_1)$ and $h\{y_j; Q_T(j)\}$ the probability density function of $Y_j \mid (n_{ij}, i = 0, 1, 2) \sim \text{Poisson}\{Q_T(j)\}$.

The probability density function $P(y_j \mid n_j)$ given by (34) is a mixture of Poisson probability density functions with mixing probability density function given by the multinomial probability distribution of $\{n_{0j}, n_{1j}\}$ given $n_j$. This mixing probability density function represents individuals with different genotypes at the embryo stage in the population.
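The mixture in (34) can be evaluated by brute force when $n_j$ is small, which is a convenient way to check an implementation before moving to realistic population sizes. The Python sketch below does this under stated assumptions: a 3-stage model ($k = 3$, so $\delta_{2k} = 0$), a deliberately tiny $n_j = 8$, and hypothetical values for the genotype probabilities and the $Q_i(j)$; none of these numbers come from the paper.

```python
import math


def poisson_pmf(y, mean):
    """Poisson probability h(y; mean) for integer y >= 0."""
    if mean == 0.0:
        return 1.0 if y == 0 else 0.0
    return math.exp(-mean + y * math.log(mean) - math.lgamma(y + 1))


def multinomial_pmf(n0, n1, n, p0, p1):
    """g(n0, n1; n, p0, p1) with n2 = n - n0 - n1 and p2 = 1 - p0 - p1 > 0."""
    n2 = n - n0 - n1
    p2 = 1.0 - p0 - p1
    log_coef = (math.lgamma(n + 1) - math.lgamma(n0 + 1)
                - math.lgamma(n1 + 1) - math.lgamma(n2 + 1))
    return math.exp(log_coef + n0 * math.log(p0)
                    + n1 * math.log(p1) + n2 * math.log(p2))


def mixture_pmf(y, n, p0, p1, q, k=3):
    """P(y | n) of (34): Poisson(Q_T) mixed over the genotype counts."""
    delta = 1 if k == 2 else 0
    total = 0.0
    for n0 in range(n + 1):
        for n1 in range(n - n0 + 1):
            n2 = n - n0 - n1
            q_total = n0 * q[0] + n1 * q[1] + (1 - delta) * n2 * q[2]
            total += multinomial_pmf(n0, n1, n, p0, p1) * poisson_pmf(y, q_total)
    return total


# Hypothetical inputs chosen only to keep the double sum small.
N_J, P0, P1 = 8, 0.9, 0.08
Q_VALUES = (0.01, 0.2, 0.6)
PMF = [mixture_pmf(y, N_J, P0, P1, Q_VALUES) for y in range(60)]
MEAN = sum(y * p for y, p in enumerate(PMF))
```

For the population sizes in Table 1 the double sum is far too large to enumerate, which is one practical motivation for a data-augmentation treatment of the unobserved genotype counts.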

Let $\Theta$ be the set of all unknown parameters (i.e., the parameters ($p_1, p_2, \alpha$) and the birth rates, the death rates, and the mutation rates of $J_j$ cells). Based on data ($y_j$, $j = 0, 1, \ldots, n_T$), the likelihood function of $\Theta$ is

$$
L\{\Theta \mid y_j, j = 0, 1, \ldots, n_T\} = h(y_0; \chi_k) \prod_{j=1}^{t_N} P(y_j \mid n_j).
\tag{35}
$$

Notice that because the mutation rates are very small, one may practically assume $\beta_i(t) = \beta_i$ for $i = 0, 1, \ldots, k-1$. Also, because the stage-limiting genes are basically tumor suppressor genes which act recessively (see Tan [3], Weinberg [6], and Tan et al. [5, 8, 23]), one may practically assume $\{b_i(t) = b_i, d_i(t) = d_i, \gamma_i(t) = b_i - d_i = \gamma_i, i = 0, 1, \ldots, k-1\}$ (see Tan et al. [4, 5, 8]).

4.3. The Joint Probability Distribution of Augmented Variables and Cancer Incidence. To apply the mixture distribution of $Y_j$ in (34) to make inference about the unknown parameters, one needs to expand the model to include the unobservable augmented variables $\{n_{0j}, n_{1j}, y_{0j}, y_{1j}, j = 1, \ldots, t_N\}$ and derive the joint probability distribution of these variables. For these purposes, observe that for $j = 1, \ldots, t_N$ and for $k > 2$, the conditional probability distribution $P\{y_{ij}, i = 0, 1 \mid y_j, n_{ij}, i = 0, 1, n_j\}$ of $(Y_{ij}, i = 0, 1)$ given $(Y_j = y_j, n_{ij}, i = 0, 1, n_j)$ is

$$(y_{ij}, i = 0, 1) \mid (y_j, n_{ij}, i = 0, 1, n_j) \sim \text{Multinomial}\left\{y_j;\ \frac{n_{ij} Q_i(j)}{Q_T(j)},\ i = 0, 1\right\}. \tag{36}$$

Since the conditional probability distribution of $Y_j$ given $(n_{ij}, i = 0, 1, 2)$ for $k > 2$ is Poisson with mean $Q_T(j)$, we have for the joint conditional probability density function $P\{y_{ij}, i = 0, 1, y_j \mid n_{ij}, i = 0, 1, n_j\}$ of $(Y_{ij}, i = 0, 1, Y_j)$ given $(n_{ij}, i = 0, 1, n_j)$:

$$\begin{aligned}
P\{y_{ij}, i = 0, 1, y_j \mid n_{ij}, i = 0, 1, n_j\}
&= P\{y_{ij}, i = 0, 1 \mid y_j, n_{ij}, i = 0, 1, 2\} \times P\{y_j \mid n_{ij}, i = 0, 1, 2\} \\
&= \frac{1}{y_j!}\, e^{-Q_T(j)} \{Q_T(j)\}^{y_j} \times g\left\{y_{0j}, y_{1j};\ y_j,\ \frac{n_{ij} Q_i(j)}{Q_T(j)},\ i = 0, 1\right\} \\
&= \prod_{i=0}^{2} h\{y_{ij}; n_{ij} Q_i(j)\} \quad (k > 2),
\end{aligned} \tag{37}$$

where $y_{2j} = y_j - \sum_{u=0}^{1} y_{uj}$ and $n_{2j} = n_j - \sum_{u=0}^{1} n_{uj}$.

If $k = 2$, then $Y_{2j} = 0$ so that $Y_j = \sum_{i=0}^{1} Y_{ij}$. Thus we have for $k = 2$:

$$Y_j \mid (n_{ij}, i = 0, 1, n_j) \sim \text{Poisson}\left\{\sum_{i=0}^{1} n_{ij} Q_i(j)\right\}, \tag{38}$$

$$Y_{ij} \mid (Y_j = y_j, n_{ij}, i = 0, 1, n_j) \sim \text{Binomial}\left\{y_j;\ \frac{n_{ij} Q_i(j)}{\sum_{u=0}^{1} n_{uj} Q_u(j)}\right\}, \quad i = 0, 1. \tag{39}$$

It follows that if $k = 2$, then $\sum_{i=0}^{1} Y_{ij} = Y_j$ and the joint probability density function of $(Y_{0j}, Y_j)$ given $(n_{uj}, u = 0, 1, 2)$ is

$$\begin{aligned}
P\{y_{0j}, y_j \mid n_{uj}, u = 0, 1, n_j\}
&= P\{y_j \mid n_{ij}, i = 0, 1, n_j\} \times P\{y_{0j} \mid y_j, n_{uj}, u = 0, 1, n_j\} \\
&= \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\} \quad \text{if } k = 2,
\end{aligned} \tag{40}$$

where $y_j = \sum_{i=0}^{1} y_{ij}$ and $n_j = \sum_{i=0}^{2} n_{ij}$.


Put $Y = \{y_{ij}, i = 0, 1, j = 1, \ldots, t_N\}$, $N = \{n_{ij}, i = 0, 1, j = 1, \ldots, t_N\}$, $\tilde{n} = \{n_j, j = 0, 1, \ldots, t_N\}$, and $\tilde{y} = \{y_j, j = 0, 1, \ldots, t_N\}$. From (37) and (40), we have for the conditional joint probability density function of $(Y, \tilde{y})$ given $(N, \tilde{n})$:

$$P\{Y, \tilde{y} \mid N, \tilde{n}, \Theta\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} \{h[y_{2j}; n_{2j} Q_2(j)]\}^{1 - \delta_{2k}} \times \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\}. \tag{41}$$

It follows that the joint conditional probability density function of $\{N, Y, \tilde{y}\}$ given $(\tilde{n}, \Theta)$ is

$$P\{N, Y, \tilde{y} \mid \tilde{n}, \Theta\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} g(n_{0j}, n_{1j}; n_j, p_0, p_1) \times \{h[y_{2j}; n_{2j} Q_2(j)]\}^{1 - \delta_{2k}} \times \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\}. \tag{42}$$

Notice that the above probability density function is a product of multinomial probability density functions and Poisson probability density functions. For this joint probability density function, the deviance $\text{Dev} = -2\{\log P[Y, \tilde{y}, N \mid \tilde{n}, \Theta] - \log P[Y, \tilde{y}, N \mid \tilde{n}, \hat{\Theta}]\}$ is

$$\text{Dev} = D_0(k) + \text{Dev}(p_1, p_2) + \sum_{j=1}^{t_N} D_j, \tag{43}$$

where

$$D_0(k) = 2\left\{\chi(k) - y_0 - y_0 \log\frac{\chi(k)}{y_0}\right\}, \tag{44}$$

$$\text{Dev}(p_1, p_2) = 2 \sum_{j=1}^{t_N} \left\{ n_{0j} \log\frac{\hat{p}_0}{1 - p_1 - p_2} + \sum_{i=1}^{2} n_{ij} \log\frac{\hat{p}_i}{p_i} \right\}, \tag{45}$$

$$D_j = 2 \sum_{i=0}^{1} \left\{ n_{ij} Q_i(j) - y_{ij} - y_{ij} \log\frac{n_{ij} Q_i(j)}{y_{ij}} \right\} + 2 (1 - \delta_{2k}) \times \left\{ n_{2j} Q_2(j) - y_{2j} - y_{2j} \log\frac{n_{2j} Q_2(j)}{y_{2j}} \right\}, \tag{46}$$

where $\hat{p}_i = \left(\sum_{j=0}^{t_N} n_{ij}\right) / \left(\sum_{j=0}^{t_N} n_j\right)$ $(i = 0, 1, 2)$.
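Each bracketed term in (44) and (46) is a Poisson deviance contribution of the form $2\{\mu - y - y\log(\mu/y)\}$. A minimal sketch of how $D_j$ could be computed is given below; the function names are hypothetical, and the convention that the $y\log(\mu/y)$ part is taken as zero when $y = 0$ is an assumption (the usual limiting value), since the text does not discuss zero counts.

```python
import math


def poisson_deviance_term(y, mu):
    """Return 2 * (mu - y - y * log(mu / y)) for an observed count y and mean mu.

    Assumption: for y == 0 the y * log(mu / y) part is treated as 0.
    """
    if y == 0:
        return 2.0 * mu
    return 2.0 * (mu - y - y * math.log(mu / y))


def d_j(y_parts, n_parts, q, k):
    """D_j as in (46): genotypes 0 and 1 always, genotype 2 only when k != 2.

    y_parts, n_parts, q are length-3 sequences (y_ij, n_ij, Q_i(j)), i = 0, 1, 2.
    """
    total = sum(
        poisson_deviance_term(y_parts[i], n_parts[i] * q[i]) for i in (0, 1)
    )
    if k != 2:  # the factor (1 - delta_{2k}) switches the third term on
        total += poisson_deviance_term(y_parts[2], n_parts[2] * q[2])
    return total
```

Each term is zero when the mean equals the observed count and positive otherwise, which is why minimizing the deviance is the same as maximizing the corresponding Poisson likelihood.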

The joint probability density function $P\{Y, \tilde{y}, N \mid \tilde{n}, \Theta\}$ of $(Y, \tilde{y}, N)$ given by (42) will be used as the kernel for the Bayesian method to estimate the unknown parameters and to predict the state variables.

4.4. Fitting of the Model to Cancer Incidence Data. To fit the model to real data, as in Tan [3–5], we let $\Delta t \sim 1$ correspond to a fixed time interval such as 6 months in human cancer studies (Tan et al. [4] assumed 3 months as one time unit, while Luebeck and Moolgavkar [26] assumed one year as one time unit). Then, because the proliferation rate of the last stage cells is quite large, one may practically assume $P_T(s, t) = 1$ for $t - s \ge 1$. Hence, noting that $\beta_{k-1}(t) = \beta_{k-1}$ is usually very small (see [3–5]), $Q_i(j)$ is approximated by

$$Q_i(j) \approx E\left\{e^{-\beta_{k-1} G_i(t_{j-1})} - e^{-\beta_{k-1} G_i(t_j)}\right\} = e^{-\beta_{k-1} E[G_i(t_{j-1})]} - e^{-\beta_{k-1} E[G_i(t_j)]} + o(\beta_{k-1}), \tag{47}$$
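The leading term of (47) is a difference of two survival-type exponentials, so the interval probabilities telescope over consecutive age groups. The following sketch illustrates this under the assumption that the cumulative quantities $\beta_{k-1} E[G_i(t_j)]$ have already been computed; the function name and the numeric inputs are hypothetical.

```python
import math


def interval_probability(cum_prev, cum_curr):
    """Leading term of (47): exp(-H(t_{j-1})) - exp(-H(t_j)).

    cum_prev and cum_curr stand for beta_{k-1} * E[G_i(t)] at the start and
    end of the j-th age interval (assumed to be supplied by the caller).
    """
    return math.exp(-cum_prev) - math.exp(-cum_curr)


# Hypothetical cumulative values at t_0, t_1, t_2, t_3 (nondecreasing in t).
cumulative = [0.0, 0.1, 0.3, 0.7]
probabilities = [
    interval_probability(a, b) for a, b in zip(cumulative, cumulative[1:])
]
# Summing over intervals leaves only the first and last exponentials.
total = sum(probabilities)
```

Because the sum telescopes, `total` equals `1 - exp(-0.7)` here, the probability of an event by the last time point.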

where $G_i(t) = \sum_{s=t_0}^{t-1} J_{k-1}(s; i)$.

Under the discrete time approximation, the $E[I_{k-1}(t; i)]$'s have been derived in the appendix. Using these results on expected numbers and using the result $\sum_{i=0}^{t-1} a^i = (a^t - 1)/(a - 1)$ for $a \ne 1$, we obtain

$$\begin{aligned}
\beta_{k-1} E[G_i(t)]
&= \sum_{r=i}^{i+1} E[J_r(t_0; i)] \left(\prod_{u=r}^{k-1} \beta_u\right) \times \sum_{v=r}^{k-1} A_{r(k-1)}(v) \sum_{s=t_0}^{t-1} (1 + \gamma_v)^{s - t_0} \\
&= \theta_{(i+1)(k-1)}\, \phi_{(i+1)(k-1)}(t) + \lambda_{i(k-1)}\, \phi_{i(k-1)}(t),
\end{aligned} \tag{48}$$

where $(\theta_{i(k-1)}, i = 1, 2, 3)$ and $(\lambda_{i(k-1)}, i = 0, 1, 2)$ are defined in Section 3.3 and where the $\phi_{i(k-1)}(t)$ $(i = 0, 1, 2, 3)$'s are given by

$$\phi_{i(k-1)}(t) = \left(\prod_{u=i+1}^{k-1} \gamma_u\right) \sum_{v=i}^{k-1} A_{i(k-1)}(v) \times \frac{1}{\gamma_v} \left\{(1 + \gamma_v)^{t - t_0} - 1\right\}, \quad i = 0, 1, 2, 3. \tag{49}$$

Notice that if $\gamma_0 = 0$, then $\phi_{0(k-1)}(t)$ reduces to

$$\phi_{0(k-1)}(t) = \left(\prod_{u=1}^{k-1} \gamma_u\right) \sum_{v=1}^{k-1} A_{1(k-1)}(v) \times \frac{1}{\gamma_v^2} \left\{(1 + \gamma_v)^{t - t_0} - 1 - (t - t_0)\gamma_v\right\}. \tag{50}$$

Applying these results to time homogeneous models with $\gamma_i \ne \gamma_j$ if $i \ne j$, the $Q_i(j)$'s under the discrete approximation are given as follows.


(1) If $k = 2$, then $k - 1 = 1$. Hence $Q_2(0) = 1$, $Q_i(0) = Q_2(j) = 0$ for $(i = 0, 1;\ j > 0)$, and for $j > 0$:

$$\begin{aligned}
Q_0(j) &\approx e^{-\theta_{11}\phi_{11}(t_{j-1}) - \lambda_{01}\phi_{01}(t_{j-1})} - e^{-\theta_{11}\phi_{11}(t_j) - \lambda_{01}\phi_{01}(t_j)} + o(\beta_1), \\
Q_1(j) &\approx (1 - \alpha_1)\left\{e^{-\lambda_{11}\phi_{11}(t_{j-1})} - e^{-\lambda_{11}\phi_{11}(t_j)}\right\} + o(\beta_1).
\end{aligned} \tag{51}$$

(2) If $k \ge 3$, then we have $Q_2(0) = \delta_{2(k-1)}\alpha$, $Q_i(0) = 0$, $i = 0, 1$, and for $j > 0$:

$$\begin{aligned}
Q_0(j) &\approx e^{-\theta_{1(k-1)}\phi_{1(k-1)}(t_{j-1}) - \lambda_{0(k-1)}\phi_{0(k-1)}(t_{j-1})} - e^{-\theta_{1(k-1)}\phi_{1(k-1)}(t_j) - \lambda_{0(k-1)}\phi_{0(k-1)}(t_j)} + o(\beta_{k-1}), \\
Q_1(j) &\approx e^{-\theta_{2(k-1)}\phi_{2(k-1)}(t_{j-1}) - \lambda_{1(k-1)}\phi_{1(k-1)}(t_{j-1})} - e^{-\theta_{2(k-1)}\phi_{2(k-1)}(t_j) - \lambda_{1(k-1)}\phi_{1(k-1)}(t_j)} + o(\beta_{k-1}), \\
Q_2(j) &\approx \delta_{2(k-1)}(1 - \alpha)\left\{e^{-\lambda_{22}\phi_{22}(t_{j-1})} - e^{-\lambda_{22}\phi_{22}(t_j)}\right\} \\
&\quad + (1 - \delta_{2(k-1)})\left\{e^{-\theta_{3(k-1)}\phi_{3(k-1)}(t_{j-1}) - \lambda_{2(k-1)}\phi_{2(k-1)}(t_{j-1})} - e^{-\theta_{3(k-1)}\phi_{3(k-1)}(t_j) - \lambda_{2(k-1)}\phi_{2(k-1)}(t_j)}\right\} + o(\beta_{k-1}).
\end{aligned} \tag{52}$$

Notice that if one replaces $[1 + \gamma_v]^{t - t_0}$ by $[1 + \gamma_v]^{t - t_0} = e^{(t - t_0)\log(1 + \gamma_v)} \approx e^{(t - t_0)\gamma_v}$, the above $Q_i(j)$'s from the discrete time model are exactly the same as those from the continuous model as given in equations (23)–(27) under the assumption that $P_T(s, t) = 1$ for $t - s > 0$. Notice that the assumption $P_T(s, t) = 1$ for $t - s > 0$ is equivalent to assuming that the last stage cancer cells grow instantaneously into cancer tumors as soon as they are generated; see Remark 1.
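The quality of the replacement $(1 + \gamma_v)^{t - t_0} \approx e^{(t - t_0)\gamma_v}$ depends on the size of $\gamma_v$ and on the number of time steps. The short sketch below, with a hypothetical function name and illustrative inputs, measures the relative gap between the two growth factors.

```python
import math


def relative_gap(gamma, steps):
    """Relative difference between exp(gamma * steps) and (1 + gamma) ** steps."""
    discrete = (1.0 + gamma) ** steps
    continuous = math.exp(gamma * steps)
    return (continuous - discrete) / continuous


# Illustrative values only: a very small rate and a moderately small rate,
# each over 170 half-year steps.
gap_small = relative_gap(1e-5, 170)
gap_moderate = relative_gap(3e-2, 170)
```

Since $\gamma - \log(1 + \gamma) \approx \gamma^2/2$ for small $\gamma$, the gap grows roughly like $(t - t_0)\gamma^2/2$, so the two models agree closely only when this quantity is small.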

5. The Fitting of the Model to Cancer Incidence and the Generalized Bayesian Inference Procedure

Given the model in Sections 2 and 3 and cancer incidence data, one may use the results in Section 4 to fit the model. By using this model and the distribution results in Section 4, one can readily estimate the unknown genetic parameters, predict cancer incidence, and check the validity of the model by using the generalized Bayesian inference and Gibbs sampling procedures; for more detail, see Tan [3, 22] and Tan et al. [4, 5].

The generalized Bayesian inference is based on the posterior distribution $P\{\Theta \mid N, Y, \tilde{y}, \tilde{n}\}$ of $\Theta$ given $\{N, Y, \tilde{y}, \tilde{n}\}$. This posterior distribution is derived by combining the prior distribution $P\{\Theta\}$ of $\Theta$ with the joint probability distribution $P\{N, Y, \tilde{y} \mid \tilde{n}, \Theta\}$ given $\{\tilde{n}, \Theta\}$ in (42). It follows that this inference procedure combines information from three sources: (1) previous information and experiences about the parameters in terms of the prior distribution $P\{\Theta\}$ of the parameters, (2) biological information on inherited cancer cases via genetic segregation of cancer genes in the population ($P\{N \mid \tilde{n}, p_i, i = 1, 2\}$; see Section 2), and (3) information from the expanded data ($Y$) and the observed data ($\tilde{y}$) via the statistical model from the system ($P\{Y, \tilde{y} \mid N, \Theta\}$ given by (37) and (40)). Because of the additional information from the genetic segregation of the cancer genes, this inference procedure provides an efficient way to extract information on the effects of genotypes of individuals at the embryo stage.

5.1. The Prior Distribution of the Parameters. For the prior distribution of $\Theta$, because biological information has suggested some lower bounds and upper bounds for the mutation rates and for the proliferation rates, we assume

$$P(\Theta) \propto c \quad (c > 0), \tag{53}$$

where $c$ is a positive constant if these parameters satisfy some biologically specified constraints and equal to zero otherwise. These biological constraints are as follows.

(i) $0 < p_1 < 10^{-2}$, $0 < p_2 < 10^{-6}$, and $-0.01 < \gamma_i < 1$ $(i = 1, 2)$.
(ii) For $\beta_j$ $(j = 0, 1, \ldots, k-1)$, we let $1 < \omega_0 = N(t_0)\beta_0 < 1000$ $(N \to I_1)$ and $10^{-8} < \beta_i < 10^{-3}$, $i = 1, \ldots, k-1$.
(iii) For the $\lambda_{j(k-1)}$ $(j = 0, 1, 2)$, we let $0 < \lambda_i < 10$, $i = 0, 1$, and $0 < \lambda_2 < 10^{3}$.
(iv) For the $\theta_i$ $(i = 1, 2, 3)$, we let $0 < \theta_1 < 10^{-2}$ and $0 < \theta_2 < 10^{-1}$.

We will refer to the above prior as a partially informative prior, which may be considered as an extension of the traditional noninformative prior given in Box and Tiao [28].
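A prior that is constant inside a box of constraints and zero outside can be coded as a log-prior that returns 0 inside the box and negative infinity outside. The sketch below covers constraints (i), (iii), and (iv) only; the dictionary keys and the function name are hypothetical labels, and the bounds on $\omega_0$ and the $\beta_i$ in (ii) are omitted for brevity.

```python
import math

# Open intervals (low, high) taken from constraints (i), (iii), and (iv).
BOUNDS = {
    "p1": (0.0, 1e-2),
    "p2": (0.0, 1e-6),
    "gamma1": (-0.01, 1.0),
    "gamma2": (-0.01, 1.0),
    "lambda0": (0.0, 10.0),
    "lambda1": (0.0, 10.0),
    "lambda2": (0.0, 1e3),
    "theta1": (0.0, 1e-2),
    "theta2": (0.0, 1e-1),
}


def log_prior(params, bounds=BOUNDS):
    """Log of the flat prior: 0.0 inside every interval, -inf otherwise.

    The constant c in (53) is dropped because it does not affect the mode.
    """
    for name, (low, high) in bounds.items():
        if not (low < params[name] < high):
            return -math.inf
    return 0.0
```

Adding this value to a log-likelihood leaves the likelihood unchanged inside the admissible region and rules out any parameter value that violates a constraint.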

5.2. The Posterior Distribution of the Parameters Given $\{Y, N, \tilde{y}, \tilde{n}\}$. Denote by $\Theta_1 = \Theta - \{p_1, p_2, \alpha\}$. From the posterior distribution $P\{\Theta \mid N, Y, \tilde{y}, \tilde{n}\}$, we obtain

$$\begin{aligned}
P\{p_i, i = 1, 2, \alpha \mid \Theta_1, N, Y, \tilde{y}, \tilde{n}\}
&\propto (\chi(k))^{y_0} e^{-\chi(k)}\, p_1^{\sum_{j=1}^{t_N} n_{1j}}\, p_2^{\sum_{j=1}^{t_N} n_{2j}} \times (1 - p_1 - p_2)^{\sum_{j=1}^{t_N} n_{0j}}, \quad 0 < p_1, p_2, \alpha < 1, \\
P\{\Theta_1 \mid (p_1, p_2, \alpha), N, Y, \tilde{y}, \tilde{n}\}
&\propto \prod_{j=1}^{t_N} \prod_{i=0}^{2} e^{-Q_i(j)} \{Q_i(j)\}^{y_{ij}}, \quad \Theta_1 \in \Omega,
\end{aligned} \tag{54}$$


where $\Omega$ is the parameter space of $\Theta_1$ provided by the biological constraints in Section 5.1.

For computational convenience, we notice that the log of $P\{p_1, p_2, \alpha \mid \Theta_1, N, Y, \tilde{y}, \tilde{n}\}$ is proportional to the negative of $D_0(k) + \text{Dev}(p_1, p_2)$ given by (44)-(45); similarly, the log of $P\{\Theta_1 \mid (p_1, p_2, \alpha), N, Y, \tilde{y}, \tilde{n}\}$ is proportional to the negative of $\sum_{j=1}^{k} D_j$ given by (46).

5.3. The Multilevel Gibbs Sampling Procedure for Estimating Unknown Parameters. Given the posterior probability distributions, we will use the following multilevel Gibbs sampling procedure to derive estimates of the parameters. We notice that numerically the Gibbs sampling procedure given below is equivalent to the EM-algorithm from the sampling theory viewpoint, with Steps 1 and 2 as the E-Step and with Steps 3 and 4 as the M-Step, respectively [29]. These multilevel Gibbs sampling procedures are given by the following.

Step 1 (Generating $N$ Given $(Y, \tilde{y}, \tilde{n}, \Theta)$ (The Data-Augmentation Step 1)). Given $\Theta$ and given $\tilde{n}$, use the multinomial distribution of $\{n_{1j}, n_{2j}\}$ given $n_j$ in Section 3 to generate a large sample of $N$. Then, by combining this large sample with $P\{Y, \tilde{y} \mid N, \tilde{n}, \Theta\}$ in (37) and (40), select $N$ through the weighted bootstrap method due to Smith and Gelfand [30]. This selected $N$ is then a sample from $P\{N \mid Y, \tilde{y}, \tilde{n}, \Theta\}$ even though the latter is unknown (for proof, see Tan [22], Chapter 3). Call the generated sample $\hat{N}$.
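The weighted bootstrap in Step 1 can be sketched for a single age group as follows: candidates are drawn from the multinomial proposal, each candidate is weighted by the Poisson likelihood of the augmented counts, and one candidate is resampled in proportion to its weight. This is a minimal illustration under stated assumptions, not the authors' implementation; all function names, the pool size, and the inputs are hypothetical, and drawing the multinomial by individual categorical draws is only practical for small $n_j$.

```python
import math
import random
from collections import Counter


def draw_genotype_counts(n, probs, rng):
    """One multinomial draw of genotype counts (n0, n1, n2) among n people."""
    tally = Counter(rng.choices(range(len(probs)), weights=probs, k=n))
    return tuple(tally.get(i, 0) for i in range(len(probs)))


def poisson_log_pmf(y, mean):
    """Log Poisson probability; a zero mean only supports y == 0."""
    if mean <= 0.0:
        return 0.0 if y == 0 else -math.inf
    return y * math.log(mean) - mean - math.lgamma(y + 1)


def weighted_bootstrap_counts(n, probs, y_parts, q, rng, pool_size=2000):
    """Resample one candidate N with weight prod_i Poisson(y_i; n_i * Q_i)."""
    pool = [draw_genotype_counts(n, probs, rng) for _ in range(pool_size)]
    log_w = [
        sum(poisson_log_pmf(y, c * qi) for y, c, qi in zip(y_parts, counts, q))
        for counts in pool
    ]
    top = max(log_w)
    if top == -math.inf:
        raise ValueError("no candidate is compatible with the observed counts")
    weights = [math.exp(lw - top) for lw in log_w]  # rescaled for stability
    return rng.choices(pool, weights=weights, k=1)[0]


rng = random.Random(2013)
selected = weighted_bootstrap_counts(
    n=20, probs=(0.6, 0.3, 0.1), y_parts=(1, 2, 0), q=(0.01, 0.1, 0.5), rng=rng
)
```

Candidates that assign no individuals to a genotype with observed cases receive zero weight and are never selected, which is how the observed data constrain the augmented genotype counts.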

Step 2 (Generating $Y$ Given $(N, \tilde{y}, \tilde{n}, \Theta)$ (The Data-Augmentation Step 2)). Given $\{\tilde{y}, \tilde{n}, \Theta\}$ and given $N = \hat{N}$ generated from Step 1, generate $Y$ from the probability distribution $P\{Y \mid \hat{N}, \tilde{y}, \tilde{n}, \Theta\}$ given by (36) and (38). Call the generated sample $\hat{Y}$.
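Step 2 amounts to splitting each observed count $y_j$ among the genotypes with probabilities proportional to $n_{ij} Q_i(j)$, as in (36). A minimal sketch for one age group is shown below; the function name and the numeric inputs are hypothetical.

```python
import random
from collections import Counter


def split_cases(y_j, n_parts, q, rng):
    """Allocate y_j cases to genotypes with weights n_ij * Q_i(j).

    Returns a list of per-genotype counts that sums to y_j. Assumes at least
    one weight is positive whenever y_j > 0.
    """
    weights = [n * qi for n, qi in zip(n_parts, q)]
    if y_j == 0:
        return [0] * len(weights)
    draws = rng.choices(range(len(weights)), weights=weights, k=y_j)
    tally = Counter(draws)
    return [tally.get(i, 0) for i in range(len(weights))]


rng = random.Random(0)
parts = split_cases(12, n_parts=(900, 90, 10), q=(0.001, 0.01, 0.0), rng=rng)
```

A genotype whose weight is zero (here the third one, because its $Q$ is set to zero) never receives any cases, and the allocated counts always add up to the observed total.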

Step 3 (Estimation of $\{p_i, i = 1, 2, \alpha\}$ Given $\{\Theta_1, N, Y, \tilde{y}, \tilde{n}\}$). Given $\{\tilde{y}, \tilde{n}, \Theta_1\}$ and given $(N, Y) = (\hat{N}, \hat{Y})$ from Steps 1 and 2, derive the posterior mode of $\{p_i, i = 1, 2, \alpha\}$ by maximizing the conditional posterior distribution $P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \hat{N}, \hat{Y}, \tilde{y}, \tilde{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $D_0(k) + \text{Dev}(p_1, p_2)$ given by (44)-(45) in Section 4.3 under the constraints given in Section 5.1. Denote this generated mode by $\{\hat{p}_i, i = 1, 2, \hat{\alpha}\}$.

Step 4 (Estimation of $\Theta_1$ Given $\{p_i, i = 1, 2, \alpha, N, Y, \tilde{y}, \tilde{n}\}$). Given $(\tilde{y}, \tilde{n})$ and given $(N, Y, p_i, i = 1, 2, \alpha) = (\hat{N}, \hat{Y}, \hat{p}_i, i = 1, 2, \hat{\alpha})$ from Steps 1–3, derive the posterior mode of $\Theta_1$ by maximizing the conditional posterior distribution $P\{\Theta_1 \mid \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{N}, \hat{Y}, \tilde{y}, \tilde{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $\sum_{j=1}^{k} D_j$ in (46) under the constraints. Denote the generated mode as $\hat{\Theta}_1$.

Step 5 (Recycling Step). With $(N, Y, p_i, i = 1, 2, \alpha, \Theta_1) = (\hat{N}, \hat{Y}, \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1)$ given above, go back to Step 1 and continue until convergence.

The proof of convergence of the above steps can be derived by using the procedure given in Tan ([22], Chapter 3). At convergence, the $\hat{\Theta} = \{\hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1\}$ are the generated values from the posterior distribution of $\Theta$ given $\{\tilde{y}, \tilde{n}\}$ independently of $(N, Y)$ (for proof, see Tan [22], Chapter 3). Repeating the above procedures, one then generates a random sample of $\Theta$ from the posterior distribution of $\Theta$ given $\{\tilde{y}, \tilde{n}\}$; one then uses the sample mean as the estimate of $\Theta$ and the sample variances and covariances as estimates of the variances and covariances of these estimates.
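Once a sample of parameter vectors has been generated, the summary described above is a plain sample mean and sample covariance matrix. The sketch below shows that computation for a list of equal-length tuples; the function name and the toy draws are hypothetical.

```python
def summarize_draws(draws):
    """Return (means, covariance matrix) of a list of parameter vectors.

    Uses the unbiased (m - 1) divisor; requires at least two draws.
    """
    m = len(draws)
    d = len(draws[0])
    means = [sum(row[c] for row in draws) / m for c in range(d)]
    cov = [
        [
            sum((row[a] - means[a]) * (row[b] - means[b]) for row in draws)
            / (m - 1)
            for b in range(d)
        ]
        for a in range(d)
    ]
    return means, cov


# Toy draws of a two-component parameter vector.
means, cov = summarize_draws([(1.0, 2.0), (3.0, 6.0), (5.0, 10.0)])
```

The diagonal of `cov` gives the variance estimates and the off-diagonal entries give the covariance estimates of the parameter estimates.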

6. A New Multistage Stochastic Model for Adult Eye Cancer (Uveal Melanoma): An Example

Human eye cancers consist of pediatric eye cancers and adult eye cancers. The most common pediatric eye cancer is retinoblastoma, which develops from the retinal pigment epithelium cells underlying the retina that do not form melanoma. The most common adult eye cancers are the uveal melanomas involving the iris, the ciliary body, and the choroid (collectively referred to as the uvea). These cancers develop from melanocytes (pigment cells) which reside within the uvea, giving color to the eye. In Tan and Zhou [9], we have developed a modified two-stage model for retinoblastoma. Based on results from molecular biology (see Landreville et al. [14], Mensink et al. [15], and Loercher and Harbour [31]), Landreville et al. [14] have proposed a three-stage model for uveal melanoma as given in Figure 2. As an example of applications of this paper, in this section we will apply this model of uveal melanoma to the NCI/NIH eye cancer data from the SEER project. We notice that the same methods can be applied to model other human cancers as well, but this will be our future research.

Given in Table 1 are the numbers of people at risk and the eye cancer cases in the age groups, together with the predicted cases from the models. These data give cancer incidence at birth and incidence for 85 age groups ($k = 85$), with each group spanning a 1-year period except the last age group ($\ge 85$ years old). For human eye cancer, because the incidence at birth and for age groups from 1 to 10 years old is basically generated by the pediatric eye cancer retinoblastoma (see [9]), to account for inherited cancer cases of uveal melanomas, the incidence for age 0 (birth) and for age periods from 1 to 10 years old in Table 1 for uveal melanoma is derived by subtracting the incidence of retinoblastoma from the SEER data (see Tan and Zhou [9]).

To fit the data, we let one time unit be 6 months after birth and let $t_0 = 1$. To compare different models and to assess different assumptions, we will consider the following 2-3-stage mixture models: (1) the complete 3-stage mixture model (Model-F), in which no assumptions are made on the parameters, and (2) the Type-1 3-stage mixture model, in which we assume that $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth (Model-1). For comparison purposes, we also fit a 2-stage model as defined in Tan and Zhou [9]. We will apply the methods in Section 5 to fit these models to the SEER data given in Table 1.


Table 2: The log-likelihood, AIC, and BIC of the fitted models.

| Models | Log-likelihood | AIC | BIC |
|---|---|---|---|
| Three-stage, Model-F | −1312.02 | 2648.04 | 2677.35 |
| Three-stage, Model-1 | −1295.76 | 2609.53 | 2631.51 |
| Two-stage | −4392.421 | 8796.843 | 8811.569 |

[Figure 5 plot omitted: incidences (vertical axis, 0 to 350) against age in years (horizontal axis, 0 to 84) for the observed data, Model-F, Model-1, and the two-stage model.]

Figure 5: Curve fitting of SEER data by Model-F, Model-1, and the two-stage model.

Given in Table 2 are the natural logs of the likelihood functions, the AIC (Akaike Information Criterion), and the BIC (Bayesian Information Criterion) for these models. Given in Table 3 are the estimates of parameters in the 3-stage models. Given in Figure 5 are the plots of predicted cancer cases from the 3-stage mixture models (Model-F and Model-1) and the 2-stage model. For comparison purposes, in Table 1 we also provide numbers of predicted cancer cases from the 3-stage mixture models and the 2-stage model, together with the observed cancer cases over time from SEER. From these results, we have made the following observations.

(a) As shown by the results in Table 1 and Figure 5, both Model-F and Model-1 fitted the SEER data well, although Model-1 fitted the data slightly better judging from the values of AIC and BIC. The Chi-square test statistics $\chi^2 = \sum_{i=0}^{85} ((y_i - \hat{y}_i)^2 / \hat{y}_i)$ for Model-F and Model-1 are 88.43 and 94.48, respectively, giving a $P$-value of 0.12 (df $= 86 - 12 = 74$) for Model-F and a $P$-value of 0.11 (df $= 86 - 7 = 79$) for Model-1. On the other hand, the 2-stage model fitted the data very poorly; the Chi-square statistic for the 2-stage model is 2747.69, giving a $P$-value less than $10^{-3}$. The AIC and BIC values of Model-1 are (AIC = 2609.53, BIC = 2631.51), which are slightly smaller than those of Model-F, respectively; however, the AIC and BIC values (8796.84, 8811.57) of the two-stage model are considerably greater than those of the 3-stage models, respectively. These results suggest that uveal melanoma may best be described by a 3-stage model with an inherited component and that one may practically assume $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth.

(b) From Table 3, it is observed that the estimate of $\gamma_1$ is close to zero (the estimate is of order $10^{-5}$), indicating that the phenotype of $J_1$ is almost identical to that of $N = J_0$. This further confirms that the staging-limiting genes are basically tumor suppressor genes and that there is no haploinsufficiency for these tumor suppressor genes. On the other hand, the estimate of $\gamma_2$ is of order $10^{-2}$, which is about $10^{3}$ times greater than that of cells with genotype $J_1$.

(c) From Table 3, the estimates $\hat{\lambda}_j$ $(j = 0, 1, 2)$ of $\lambda_j$ are of order $10^{-1}$, $10^{-1}$, and $10^{2} \sim 10^{3}$, respectively. Because $\lambda_i = E[J_i(t_0; i)]\,\beta_i \prod_{u=i+1}^{2} (\beta_u / \gamma_u)$, $i = 0, 1, 2$, by assuming some values of $E[J_i(t_0; i)]$, $i = 0, 1, 2$, from biological observations, one can get a rough idea of the magnitude of $\beta_j$ $(j = 0, 1, 2)$. For example, if we follow Potten et al. [32] and assume $E[N(t_0)] = E[J_i(t_0; i)] \sim 10^{8}$, $i = 0, 1, 2$, then $\beta_j \approx 10^{-6} \sim 10^{-5}$.

(d) From Table 3, the estimates of $p_1$ and $p_2$ from the SEER data are of orders $10^{-4} \sim 10^{-3}$ and $10^{-7} \sim 10^{-6}$, respectively. This indicates that in the US population the frequency of the staging-limiting cancer gene for uveal melanoma is approximately $10^{-3}$. Table 3 also shows that the estimate of $\alpha$ is 0.8411, indicating that most individuals with genotype $J_2$ would develop cancer at birth. This may help to explain why there are observed cancer incidences at birth for uveal melanoma in the SEER data even though the estimate of the frequency $p_2$ is of order $10^{-7} \sim 10^{-6}$.
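The goodness-of-fit quantities used in observation (a) are simple to compute. The sketch below implements the Chi-square statistic $\sum (y_i - \hat{y}_i)^2 / \hat{y}_i$ and the usual definition of AIC. The function names are hypothetical. The AIC call reads the Model-F log-likelihood in Table 2 as −1312.02 and assumes the 12 parameters implied by df $= 86 - 12$; both readings are assumptions about the extracted table rather than values confirmed elsewhere.

```python
def chi_square_statistic(observed, predicted):
    """Sum of (observed - predicted)^2 / predicted over all age groups."""
    return sum((o - p) ** 2 / p for o, p in zip(observed, predicted))


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 log L + 2 (number of parameters)."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Toy observed and predicted counts, only to show the call.
toy_statistic = chi_square_statistic([10, 20], [8, 25])

# Assumed reading of Table 2 for Model-F (see the lead-in above).
model_f_aic = aic(-1312.02, 12)
```

Under these assumptions `model_f_aic` agrees with the Model-F AIC entry of Table 2 up to rounding, which supports reading the table entries with two decimal places.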

7. Discussion and Conclusions

To account for inherited cancer cases, in this paper we have developed some general multistage models involving hereditary cancer cases. For human cancer incidence, these models are basically generalized mixture models. In these mixture models, the mixing probability is a multinomial distribution that accounts for genetic segregation of the staging-limiting tumor suppressor genes. This mixture model allows us to estimate, for the first time, the frequency of the staging-limiting tumor suppressor gene in human populations. As an example of applications, in this paper we have developed a general 3-stage stochastic multistage model of carcinogenesis for adult human eye cancer. To account for inherited cancer cases in the stochastic model of human eye cancer, we have also developed a generalized mixture model for uveal melanoma in human beings.

To use the proposed models to fit the cancer incidence data, in this paper we have developed a generalized Bayesian inference procedure to estimate the unknown parameters and to predict cancer cases. This inference procedure is advantageous over the classical sampling theory inference (i.e., the maximum likelihood method) because the procedure combines


Table 3: Estimates of parameters for the 3-stage stochastic models.

| Model | Statistic | $\lambda_0$ | $\lambda_1$ | $\lambda_2$ | $\gamma_1$ | $\gamma_2$ | $p_1$ | $p_2$ | $\alpha$ | $\theta_1$ | $\theta_2$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Model-F | Estimates | 2.88E−01 | 4.81E−01 | 6.55E+02 | 2.71E−05 | 3.37E−02 | 9.95E−04 | 9.68E−07 | 8.41E−01 | 3.29E−04 | 3.41E−01 |
| Model-F | StD | 1.21E−01 | 4.65E−02 | 1.12E+02 | 8.86E−06 | 1.92E−04 | 1.75E−06 | 6.48E−09 | 4.19E−02 | 3.54E−05 | 1.07E−02 |
| Model-F | 95% CL, lower | 5.03E−02 | 3.89E−01 | 4.34E+02 | 5.34E−06 | 3.33E−02 | 9.91E−04 | 9.55E−07 | 7.59E−01 | 2.60E−04 | 3.20E−01 |
| Model-F | 95% CL, upper | 5.25E−01 | 5.72E−01 | 8.75E+02 | 4.01E−05 | 3.41E−02 | 9.98E−04 | 9.81E−07 | 9.23E−01 | 3.98E−04 | 3.62E−01 |
| Model-1 | Estimates | 6.55E−02 | 3.72E−01 | 1.98E+02 | NA | 3.49E−02 | 9.98E−04 | 1.00E−06 | 8.07E−01 | NA | NA |
| Model-1 | StD | 2.54E−02 | 4.63E−02 | 3.38E+01 | NA | 3.58E−03 | 6.27E−05 | 4.53E−08 | 7.06E−02 | NA | NA |
| Model-1 | 95% CL, lower | 1.57E−02 | 2.82E−01 | 1.32E+02 | NA | 2.79E−02 | 8.75E−04 | 9.11E−07 | 6.68E−01 | NA | NA |
| Model-1 | 95% CL, upper | 1.15E−01 | 4.63E−01 | 2.65E+02 | NA | 4.19E−02 | 1.12E−03 | 1.09E−06 | 9.45E−01 | NA | NA |

Note: NA: assumed nonexistence.


information from three sources: (1) previous information and experiences about the parameters in terms of the prior distribution $P\{\Theta\}$ of the parameters, (2) biological information on inherited cancer cases via the genetic segregation of staging-limiting tumor suppressor genes in the population, and (3) information from the expanded data ($Y$) and the observed data ($\tilde{y}$) via the statistical model from the system ($P\{Y, \tilde{y} \mid N, \Theta\}$ given by (37) and (40)).

To illustrate the usefulness and applications of our models and methods, we have applied them to the eye cancer SEER data of NCI/NIH. Our analysis showed that the proposed 3-stage model with inherited cancer cases fitted the data well (see Table 2 and Figure 5); on the other hand, the classical 2-stage model fitted the data very poorly (see Table 2 and Figure 5). These results are consistent with results from molecular biology that human eye cancer is described by a 3-stage model with an inherited cancer component. Notice, however, that our 3-stage multistage model is more general than the classical 3-stage model as described in Little [1], Tan [2], and Zheng [7] in that we postulate that cancer tumors develop from primary $I_3$ cells by clonal expansion (see Yang and Chen [11]). (Note that the stochastic multistage models in the literature assume that cancer tumors develop from last stage cells immediately as soon as they are generated, ignoring cancer progression completely; see Remark 1.) As a matter of fact, we had assumed $\Delta t = 1$ for a period of three months and found that the 3-stage models then did not fit the SEER data, indicating that $P_T(s, t) < 1$ over a period of three months.

Applying our models and methods to the SEER data of human eye cancer, we have derived for the first time some useful pieces of information. Specifically, we mention the following. (1) For the first time, we have estimated the frequency of the staging-limiting tumor suppressor gene in the US population ($p_1 \sim 0.9948 \times 10^{-3}$). (2) With the estimate of $\alpha$ as $\hat{\alpha} = 0.8411$, the predicted number of uveal melanoma cases at birth is $y_0 = n_{01} p_2 = 36$ by the 3-stage models with inherited cancer component (Model-F and Model-1). (The observed number of eye cancer cases at birth is 34.) (3) The estimate of the proliferation rate ($\gamma_1$) of $J_1$ cells using Model-F is $\hat{\gamma}_1 = 2.271 \times 10^{-5} \sim 0$. (The estimate is $8.603 \times 10^{-5}$ using Model-1.) This confirms that the stage-1 limiting gene is a tumor suppressor gene and that, unlike the p53 gene in chromosome 17p (see [33]), there is little or no haploinsufficiency for this gene in cells with genotype $J_1$.

Using the models and methods of this paper, one can predict future cancer cases for human eye cancer. Thus, by comparing results from different populations, our models and methods can be used to assess cancer prevention and control procedures. These will be our future research topics; we will not go any further here.

Appendix

The Expected Numbers of State Variables under Discrete Time Approximation

Under discrete time, the stochastic differential equations of state variables reduce to the following stochastic difference equations of state variables, respectively:

$$\begin{aligned}
J_i(t + 1; i) &= J_i(t; i)[1 + \gamma_i(t)] + e_i(t + 1; i), \quad i = 0, 1, \ldots, \text{Min}(k - 1, 2), \\
J_j(t + 1; u) &= J_j(t; u)[1 + \gamma_j(t)] + J_{j-1}(t; u)\,\beta_{j-1}(t) + e_j(t + 1; u), \quad 0 \le u < j \le k - 1,
\end{aligned} \tag{A1}$$

where $e_i(t + 1; i) = [B_i(t; i) - J_i(t; i) b_i(t)] - [D_i(t; i) - J_i(t; i) d_i(t)]$ and $e_j(t + 1; i) = [B_j(t; i) - J_j(t; i) b_j(t)] - [D_j(t; i) - J_j(t; i) d_j(t)] + [M_{j-1}(t; i) - J_{j-1}(t; i)\beta_{j-1}(t)]$ for $j > i$.

The initial conditions at birth ($t_0$) for the above stochastic difference equations are $J_j(t_0; i) > 0$ if ($j = i, i+1$) and $J_j(t_0; i) = 0$ if $j > i+1$. The solution of the above difference equations under these initial conditions is given, respectively, by

$$
\begin{aligned}
J_i(t; i) &= J_i(t_0; i) \prod_{s=t_0}^{t-1} (1 + \gamma_i(s)) + \sum_{s=t_0+1}^{t} e_i(s; i) \prod_{u=s}^{t-1} (1 + \gamma_i(u)), \quad i = 0, 1, \ldots, \mathrm{Min}(k-1, 2), \\
J_j(t; i) &= J_j(t_0; i) \prod_{s=t_0}^{t-1} (1 + \gamma_j(s)) + \sum_{s=t_0}^{t-1} J_{j-1}(s; i)\,\beta_{j-1}(s) \prod_{u=s+1}^{t-1} (1 + \gamma_j(u)) \\
&\quad + \sum_{s=t_0+1}^{t} e_j(s; i) \prod_{u=s}^{t-1} (1 + \gamma_j(u)), \quad 0 \le i < j \le k-1.
\end{aligned}
\tag{A2}
$$

If the model is time homogeneous so that $\beta_j(t) = \beta_j$, $\gamma_j(t) = \gamma_j$ ($j = i, \ldots, k-1$), and if $\gamma_i \ne \gamma_j$ if $i \ne j$, then the above solutions under the initial conditions ($J_j(t_0; i) = 0$, $j > i+1$) reduce to

$$
\begin{aligned}
J_i(t; i) &= J_i(t_0; i)\,(1 + \gamma_i)^{t - t_0} + \eta_i^{(0)}(t; i), \quad i = 0, 1, \ldots, \mathrm{Min}(k-1, 2), \\
J_j(t; i) &= J_j(t_0; i)\,(1 + \gamma_j)^{t - t_0} + \sum_{u=i}^{j-1} J_u(t_0; i) \left( \prod_{v=u}^{j-1} \beta_v \right) \sum_{r=u}^{j} A_{uj}(r)\,(1 + \gamma_r)^{t - t_0} \\
&\quad + \eta_j^{(0)}(t; i) + \sum_{u=1}^{j-i} \left( \prod_{r=j-u}^{j-1} \beta_r \right) \eta_j^{(u)}(t; i) \\
&= \sum_{u=i}^{i+1} J_u(t_0; i) \left( \prod_{v=u}^{j-1} \beta_v \right) \sum_{r=u}^{j} A_{uj}(r)\,(1 + \gamma_r)^{t - t_0} \\
&\quad + \eta_j^{(0)}(t; i) + \sum_{u=1}^{j-i} \left( \prod_{r=j-u}^{j-1} \beta_r \right) \eta_j^{(u)}(t; i), \quad 0 \le i < j \le k-1,
\end{aligned}
\tag{A3}
$$

where $\eta_j^{(0)}(t; i) = \sum_{s=t_0+1}^{t} e_j(s; i)\,(1 + \gamma_j)^{t-s}$ and $\eta_j^{(u)}(t; i) = \sum_{s=t_0+1}^{t} e_{j-u}(s; i) \sum_{v=u}^{j} A_{uj}(v)\,(1 + \gamma_v)^{t-s}$ ($u = 1, \ldots, j-i$).

Thus, if the model is time homogeneous and if $\gamma_i \ne \gamma_j$ if $i \ne j$, the $E[J_{k-1}(t; i)]$'s in discrete time models under the initial conditions ($J_j(t_0; i) = 0$, $j > i+1$) are given, respectively, by

$$
E[J_{k-1}(t; i)] = \sum_{u=i}^{i+1} E[J_u(t_0; i)] \left( \prod_{v=u}^{k-2} \beta_v \right) \times \sum_{r=u}^{k-1} A_{u(k-1)}(r)\,(1 + \gamma_r)^{t - t_0}, \quad i = 0, 1, \ldots, \mathrm{Min}(k-1, 2).
\tag{A4}
$$
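The expected numbers can also be obtained by iterating the difference equations directly rather than through the closed form. The following is a minimal sketch, assuming (this is not stated in the appendix text) that the noise terms $e_j$ have mean zero, so that taking expectations in the time-homogeneous difference equations gives $E[J_j(t+1)] = E[J_j(t)](1 + \gamma_j) + E[J_{j-1}(t)]\beta_{j-1}$. The function name and all numerical rates are hypothetical and chosen only for illustration; they are not estimates from this paper.

```python
def expected_stage_counts(k, i, initial, beta, gamma, steps):
    """Iterate expected cell counts for stages i..k-1 over `steps` time units.

    Assumes the noise terms e_j have mean zero (an assumption of this sketch),
    so that E[J_j(t+1)] = E[J_j(t)] * (1 + gamma_j) + E[J_{j-1}(t)] * beta_{j-1}.
    `initial` maps a stage index to its expected count at birth (t0).
    """
    state = [0.0] * k
    for stage, value in initial.items():
        state[stage] = float(value)
    for _ in range(steps):
        nxt = list(state)
        for j in range(i, k):
            nxt[j] = state[j] * (1.0 + gamma[j])      # net proliferation of stage j
            if j > i:
                nxt[j] += state[j - 1] * beta[j - 1]  # inflow by mutation from stage j-1
        state = nxt
    return state


# Hypothetical rates for a 3-stage model (k = 3) started from genotype J_0.
beta = [1e-6, 1e-5, 1e-4]
gamma = [0.0, 0.01, 0.05]
counts = expected_stage_counts(k=3, i=0, initial={0: 1e6, 1: 10.0},
                               beta=beta, gamma=gamma, steps=20)
```

For two adjacent stages the iteration can be checked against the geometric closed form $J_1(t_0)a^n + J_0(t_0)\beta_0 (a^n - b^n)/(a - b)$ with $a = 1 + \gamma_1$ and $b = 1 + \gamma_0$, which has the same structure as the sums of powers $(1 + \gamma_r)^{t - t_0}$ in the closed-form solution.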

References

[1] M. P. Little, "Cancer models, ionization, and genomic instability: a review," in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 5, World Scientific, River Edge, NJ, USA, 2008.

[2] W. Y. Tan, Stochastic Models of Carcinogenesis, Marcel Dekker, New York, NY, USA, 1991.

[3] W. Y. Tan, "Stochastic multi-stage models of carcinogenesis as hidden Markov models: a new approach," International Journal of Systems and Synthetic Biology, vol. 1, pp. 313–337, 2010.

[4] W. Y. Tan, L. J. Zhang, and C. W. Chen, "Stochastic modeling of carcinogenesis: state space models and estimation of parameters," Discrete and Continuous Dynamical Systems B, vol. 4, no. 1, pp. 297–322, 2004.

[5] W. Y. Tan, C. W. Chen, and L. J. Zhang, "Cancer biology, cancer models and stochastic mathematical analysis of carcinogenesis," in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 3, World Scientific, River Edge, NJ, USA, 2008.

[6] R. A. Weinberg, The Biology of Human Cancer, Garland Sciences, Taylor and Francis, New York, NY, USA, 2007.

[7] Q. Zheng, "Stochastic multistage cancer models: a fresh look at an old approach," in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 2, World Scientific, River Edge, NJ, USA, 2008.

[8] W. Y. Tan, L. J. Zhang, W. Chen, and J. M. Zhu, "A stochastic model of human colon cancer involving multiple pathways," in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 11, World Scientific, River Edge, NJ, USA, 2008.

[9] W. Y. Tan and H. Zhou, "A new stochastic model of retinoblastoma involving both hereditary and non-hereditary cancer cases," Journal of Carcinogenesis and Mutagenesis, vol. 2, no. 2, article 117, 2011.

[10] X. W. Yan, Stochastic and State Space Models of Carcinogenesis Under Complex Situation [Ph.D. thesis], Department of Mathematical Sciences, University of Memphis, Memphis, Tenn, USA.

[11] G. L. Yang and C. W. Chen, "A stochastic two-stage carcinogenesis model: a new approach to computing the probability of observing tumor in animal bioassays," Mathematical Biosciences, vol. 104, no. 2, pp. 247–258, 1991.

[12] H. Osada and T. Takahashi, "Genetic alterations of multiple tumor suppressors and oncogenes in the carcinogenesis and progression of lung cancer," Oncogene, vol. 21, no. 48, pp. 7421–7434, 2002.

[13] I. I. Wistuba, L. Mao, and A. F. Gazdar, "Smoking molecular damage in bronchial epithelium," Oncogene, vol. 21, no. 48, pp. 7298–7306, 2002.

[14] S. Landreville, O. A. Agapova, and J. W. Harbour, "Emerging insights into the molecular pathogenesis of uveal melanoma," Future Oncology, vol. 4, no. 5, pp. 629–636, 2008.

[15] H. W. Mensink, D. Paridaens, and A. De Klein, "Genetics of uveal melanoma," Expert Review of Ophthalmology, vol. 4, no. 6, pp. 607–616, 2009.

[16] W. Y. Tan and X. W. Yan, "A new stochastic and state space model of human colon cancer incorporating multiple pathways," Biology Direct, vol. 5, article 26, 2010.

[17] L. B. Klebanov, S. T. Rachev, and A. Y. Yakovlev, "A stochastic model of radiation carcinogenesis: latent time distributions and their properties," Mathematical Biosciences, vol. 113, no. 1, pp. 51–75, 1993.

[18] A. Y. Yakovlev and A. D. Tsodikov, Stochastic Models of Tumor Latency and Their Biostatistical Applications, World Scientific, River Edge, NJ, USA, 1996.

[19] H. Fakir, W. Y. Tan, L. Hlatky, P. Hahnfeldt, and R. K. Sachs, "Stochastic population dynamic effects for lung cancer progression," Radiation Research, vol. 172, no. 3, pp. 383–393, 2009.

[20] A. G. Knudson, "Mutation and cancer: statistical study of retinoblastoma," Proceedings of the National Academy of Sciences of the United States of America, vol. 68, no. 4, pp. 820–823, 1971.

[21] J. F. Crow and M. Kimura, An Introduction to Population Genetics Theory, Harper and Row, New York, NY, USA, 1970.

[22] W. Y. Tan, Stochastic Models With Applications to Genetics, Cancers, AIDS and Other Biomedical Systems, World Scientific, River Edge, NJ, USA, 2002.

[23] W. Y. Tan, C. W. Chen, and L. J. Zhang, "Cancer risk assessment by state space models," in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 13, World Scientific, River Edge, NJ, USA, 2008.

[24] W. Y. Tan and C. W. Chen, "Stochastic modeling of carcinogenesis: some new insights," Mathematical and Computer Modelling, vol. 28, no. 11, pp. 49–71, 1998.

[25] W. Y. Tan and C. W. Chen, "Cancer stochastic models," in Encyclopedia of Statistical Sciences, Revised edition, John Wiley and Sons, New York, NY, USA, 2005.

[26] E. G. Luebeck and S. H. Moolgavkar, "Multistage carcinogenesis and the incidence of colorectal cancer," Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 23, pp. 15095–15100, 2002.

[27] R. Durrett, J. Mayberry, S. Moseley, and D. Schmidt, "Probability models for cancer development and progression," Google Search, Google, 2012.

[28] G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis, Addison-Wesley, Reading, Mass, USA, 1973.

[29] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm (with discussion)," Journal of the Royal Statistical Society B, vol. 39, pp. 1–38, 1977.

[30] A. F. M. Smith and A. E. Gelfand, "Bayesian statistics without tears: a sampling-resampling perspective," American Statistician, vol. 46, pp. 84–88, 1992.

[31] A. E. Loercher and J. W. Harbour, "Molecular genetics of uveal melanoma," Current Eye Research, vol. 27, no. 2, pp. 69–74, 2003.

[32] C. S. Potten, C. Booth, and D. Hargreaves, "The small intestine as a model for evaluating adult tissue stem cell drug targets," Cell Proliferation, vol. 36, no. 3, pp. 115–129, 2003.

[33] C. J. Lynch and J. Milner, "Loss of one p53 allele results in four-fold reduction of p53 mRNA and protein: a basis for p53 haplo-insufficiency," Oncogene, vol. 25, no. 24, pp. 3463–3470, 2006.

(2) If $k \ge 3$, then we have $Q_i(0) = 0$ for $i = 0, 1$ and $Q_2(0) = \delta_{2k}\alpha$, and for $j > 0$,

$$
\begin{aligned}
Q_0(j) &= e^{-\theta_{1(k-1)}\psi_{1(k-1)}(t_{j-1}) - \lambda_{0(k-1)}\psi_{0(k-1)}(t_{j-1})} \\
&\quad - e^{-\theta_{1(k-1)}\psi_{1(k-1)}(t_j) - \lambda_{0(k-1)}\psi_{0(k-1)}(t_j)} + o(\beta_{k-1}),
\end{aligned}
\tag{25}
$$

$$
\begin{aligned}
Q_1(j) &= e^{-\theta_{2(k-1)}\psi_{2(k-1)}(t_{j-1}) - \lambda_{1(k-1)}\psi_{1(k-1)}(t_{j-1})} \\
&\quad - e^{-\theta_{2(k-1)}\psi_{2(k-1)}(t_j) - \lambda_{1(k-1)}\psi_{1(k-1)}(t_j)} + o(\beta_{k-1}),
\end{aligned}
\tag{26}
$$

$$
\begin{aligned}
Q_2(j) &= \delta_{2k}(1 - \alpha) \left\{ e^{-\lambda_{22}\psi_{22}(t_{j-1})} - e^{-\lambda_{22}\psi_{22}(t_j)} \right\} \\
&\quad + (1 - \delta_{2k}) \left\{ e^{-\theta_{3(k-1)}\psi_{3(k-1)}(t_{j-1}) - \lambda_{2(k-1)}\psi_{2(k-1)}(t_{j-1})} - e^{-\theta_{3(k-1)}\psi_{3(k-1)}(t_j) - \lambda_{2(k-1)}\psi_{2(k-1)}(t_j)} \right\} \\
&\quad + o(\beta_{k-1}),
\end{aligned}
\tag{27}
$$

where $\delta_{2k} = 1$ if $k = 2$ and $= 0$ if $k \ne 2$.

Notice that if $\gamma_0 = 0$, then $\psi_{0(k-1)}(t)$ reduces to

$$
\begin{aligned}
\psi_{0(k-1)}(t) &= \left( \prod_{u=1}^{k-1} \gamma_u \right) \sum_{v=0}^{k-1} A_{0(k-1)}(v) \times \int_{t_0}^{t} e^{\gamma_v (x - t_0)} P_T(x, t)\,dx \\
&= \left( \prod_{u=1}^{k-1} \gamma_u \right) \sum_{v=1}^{k-1} A_{1(k-1)}(v) \times \int_{t_0}^{t} \left[ e^{\gamma_v (x - t_0)} - 1 \right] P_T(x, t)\,dx.
\end{aligned}
\tag{28}
$$

Notice also that if $P_T(s, t) \approx 1$ for $t > s$ and if $\gamma_0 = 0$, then the above $\psi_{ij}(t)$'s reduce, respectively, to

$$
\begin{aligned}
\psi_{ii}(t) &= \frac{1}{\gamma_i} \left\{ e^{\gamma_i (t - t_0)} - 1 \right\}, \quad i = 1, \ldots, k-1, \\
\psi_{0(k-1)}(t) &= \left( \prod_{u=1}^{k-1} \gamma_u \right) \sum_{v=1}^{k-1} A_{1(k-1)}(v) \times \frac{1}{\gamma_v^2} \left\{ e^{\gamma_v (t - t_0)} - 1 - (t - t_0)\gamma_v \right\}, \\
\psi_{i(k-1)}(t) &= \left( \prod_{u=i+1}^{k-1} \gamma_u \right) \sum_{v=i}^{k-1} A_{i(k-1)}(v) \times \frac{1}{\gamma_v} \left\{ e^{\gamma_v (t - t_0)} - 1 \right\}, \quad i = 1, \ldots, k-1.
\end{aligned}
\tag{29}
$$

4. Probability Distribution of Observed Cancer Incidence Incorporating Hereditary Cancer Cases

To estimate the unknown parameters and to validate the model, one needs real data generated from the model. For studies of carcinogenesis, such data are usually given by cancer incidence. For example, in the SEER data of NCI/NIH of the USA, the data are given by $\{(y_0, n_0), (y_j, n_j), j = 1, \ldots, t_N\}$, where $y_0$ is the number of cancer cases at birth and $n_0$ the total number of births, and where, for $j \ge 1$, $y_j$ is the number of cancer cases developed during the $j$th age group of a one-year period (or 5-year periods) and $n_j$ is the number of noncancer people who are at risk for cancer and from whom $y_j$ have developed cancer during the $j$th age group. Given in Table 1 are the SEER data of uveal melanoma (adult eye cancer) during the period 1973–2007. In Table 1, notice that there are some cancer cases at birth, implying some inherited cancer cases. In this section, we will develop a statistical model for these types of data sets from the stochastic multistage model with hereditary cancers as given in Section 2. As in previous sections, let $n_{ij}$ be the number of individuals who have genotype $J_i$ ($i = 0, 1, 2$) at the embryo stage among the $n_j$ people at risk for the cancer in question. Then, as shown above, $(n_{1j}, n_{2j}) \mid n_j \sim \mathrm{Multinomial}\{n_j; p_1, p_2\}$. It follows that $n_{ij} \mid n_j \sim \mathrm{Binomial}\{n_j; p_i\}$, $i = 0, 1, 2$. In what follows, we let $Y_j$ denote the random variable for $y_j$ unless otherwise stated.

4.1. The Probability Distribution of $Y_0$. As shown in Figure 4, $J_i$ ($i = 0, 1, 2$) people would only generate $J_i$ stage cells and $J_{i+1}$ stage cells at birth. Thus, for cancers to develop at or before birth, the number of stages for the stochastic model of carcinogenesis must be 3 or less. It follows that if $y_0 > 0$, the appropriate model of carcinogenesis must be either a 2-stage model or a 3-stage model. Since $n_{20} \mid n_0 \sim \mathrm{Binomial}(n_0, p_2)$ and $n_{10} \mid (n_0, n_{20}) \sim \mathrm{Binomial}(n_0, p_1) + o(p_2)$, the probability distribution of $Y_0$ is therefore

$$
Y_0 \sim \mathrm{Poisson}(\chi_k), \quad k = 2, 3,
\tag{30}
$$

where

$$
\chi_k =
\begin{cases}
n_0 (p_2 + p_1 \alpha), & \text{if } k = 2, \\
n_0 p_2 \alpha, & \text{if } k = 3.
\end{cases}
\tag{31}
$$

The expected number of $Y_0$ given $n_0$ is $E(Y_0 \mid n_0) = n_0 (p_2 + p_1 \alpha) = \chi_2$ if $k = 2$ and $E(Y_0 \mid n_0) = n_0 p_2 \alpha = \chi_3$ if $k = 3$. Hence, for the 2-stage model (i.e., $k = 2$) or the 3-stage model (i.e., $k = 3$), the maximum likelihood estimate of $\chi_k$ is $\hat{\chi}_k = y_0$, and the deviance $D_0(k)$ from the conditional probability distribution of $Y_0$ given $n_0$ is

$$
\begin{aligned}
D_0(k) &= -2 \left\{ \log h(y_0; \chi_k) - \log h(y_0; \hat{\chi}_k) \right\} \\
&= 2 \left\{ \chi_k - y_0 - y_0 \log \frac{\chi_k}{y_0} \right\}, \quad k = 2, 3.
\end{aligned}
\tag{32}
$$
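As a small numerical illustration of the deviance at birth, the sketch below evaluates $2\{\chi - y_0 - y_0 \log(\chi / y_0)\}$ for the observed count at birth (34) and the count predicted by the 3-stage models (36), both taken from Table 1. The function name is an assumption of this sketch, and the handling of $y_0 = 0$ uses the usual limiting convention $y \log y \to 0$, which the text does not discuss.

```python
import math


def poisson_deviance(y, mu):
    """Deviance 2{mu - y - y*log(mu/y)} of a Poisson mean `mu` against the saturated fit."""
    if mu <= 0.0:
        raise ValueError("the Poisson mean must be positive")
    if y == 0:
        return 2.0 * mu  # limiting convention: y*log(mu/y) -> 0 as y -> 0
    return 2.0 * (mu - y - y * math.log(mu / y))


# Observed and predicted numbers of cases at birth, from Table 1.
d0 = poisson_deviance(34, 36.0)
```

The deviance is zero when the fitted mean equals the observed count and grows as the two move apart, so a small value of `d0` indicates that the predicted number at birth is close to the observed one.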

Table 1: The SEER incidence data (1973–2007) of uveal melanoma from NCI/NIH (over all races and genders). Columns: age group; number of people at risk; observed incidence; incidence predicted by Model-F; incidence predicted by Model-1; incidence predicted by the two-stage model.

Age  At risk   Observed  Model-F  Model-1  Two-stage
0    12495777  34        36       36       38
1    12221582  20        11       11       16
2    12120990  14        7        8        17
3    12112995  9         5        6        18
4    12146174  4         4        5        20
5    12161336  3         3        4        22
6    12111854  2         2        4        25
7    12160452  2         2        4        28
8    11942586  2         2        4        30
9    12381299  1         2        5        34
10   12512703  2         3        6        38
11   12410338  5         3        6        41
12   12449244  2         3        6        44
13   12527781  5         4        7        48
14   12602883  3         5        7        51
15   12719598  5         5        8        55
16   12766107  7         6        9        59
17   12831400  9         7        9        62
18   12382047  8         7        9        63
19   12581638  10        8        10       68
20   12636509  7         9        11       71
21   12682601  6         10       10       75
22   12840510  12        11       11       79
23   13075528  17        13       13       84
24   13358635  16        15       14       89
25   13473849  12        16       16       94
26   13426340  17        18       18       97
27   13525264  28        20       20       101
28   13149674  17        22       21       102
29   13812811  23        25       25       110
30   13886874  24        28       27       114
31   13488332  37        30       29       115
32   13460286  32        32       32       118
33   13256067  38        35       35       119
34   13428827  37        39       38       124
35   13220037  40        41       41       126
36   12870265  30        44       44       126
37   12689592  43        47       47       127
38   12157014  42        49       49       125
39   12494081  46        55       54       131
40   12272125  49        58       58       132
41   11826573  56        61       60       130
42   11663153  54        65       64       131
43   11407082  53        68       68       131
44   11296848  70        73       72       133
45   11016369  57        76       76       132
46   10651593  71        79       79       130
47   10475708  87        84       83       131
48   9994684   82        86       85       127
49   10138908  78        93       93       131
50   9836359   87        97       96       130
51   9475641   95        100      99       127
52   9250985   113       104      104      126
53   9027382   106       108      108      125
54   8883737   117       113      113      125
55   8547883   129       116      116      123
56   8279648   107       119      119      121
57   8062368   119       123      123      119
58   7654610   132       124      124      115
59   7563706   118       130      129      115
60   7232719   131       131      130      112
61   6927332   116       132      132      109
62   6708273   133       134      134      107
63   6543931   143       138      137      106
64   6404652   130       141      141      105
65   6168486   145       142      142      102
66   5913479   138       142      142      99
67   5746766   151       144      144      98
68   5480517   147       142      142      94
69   5363912   149       144      144      94
70   5110728   136       142      142      90
71   4925076   144       141      141      88
72   4696825   140       138      138      85
73   4512136   146       135      135      82
74   4345300   126       133      133      80
75   4148801   122       128      129      78
76   3900900   124       122      122      74
77   3681587   114       116      116      70
78   3481918   93        110      110      67
79   3243631   102       102      102      63
80   2961234   79        92       93       58
81   2724984   93        83       84       54
82   2495219   82        75       75       50
83   2271595   92        66       67       46
84   2041351   64        57       58       42
85   10466605  304       282      285      216

Note 1: for age 0 and 1–10 years old, the cancer incidences are derived by subtracting the incidence of retinoblastoma from the original SEER data (see Tan and Zhou [9]).
Note 2: the observed uveal melanoma incidence rates per $10^6$ individuals are derived from the SEER eye cancer incidence by subtracting the retinoblastoma incidence as given in Tan and Zhou [9].

4.2. The Probability Distribution of $Y_j$ ($j \ge 1$). To derive the probability distribution of $Y_j$ ($j \ge 1$) in the $j$th age group, let $Y_{ij}$ ($i = 0, 1, 2$) be the number of cancer cases generated by people who have genotype $J_i$ at the embryo stage among these $Y_j$ cancer cases. Then $Y_j = \sum_{i=0}^{2} Y_{ij}$, and $Y_{0j}$ is the number of cancer cases generated by the $n_{0j} = n_j - n_{1j} - n_{2j}$ normal people in the population. The conditional probability distribution of $Y_{ij}$ given $n_{ij}$ is

$$
Y_{ij} \mid n_{ij} \sim \mathrm{Poisson}\{n_{ij} Q_i(j)\}, \quad i = 0, 1, 2.
\tag{33}
$$

Notice that if $k = 2$ (a 2-stage model), then all $J_2$ individuals would develop tumors at or before birth. Thus, if $k = 2$, then $Y_{2j} = 0$ for all $j > 0$, so that if $j > 0$, cancer cases develop only from normal people ($N = J_0$ people) and $J_1$ people. On the other hand, if $k > 2$, then with positive probability $Y_{ij} > 0$ for all ($i = 0, 1, 2$, $j = 0, 1, \ldots, n_T$), where $n_T$ is the last time point in the data. Let $\delta_{2k} = 1$ if $k = 2$ and $\delta_{2k} = 0$ if $k \ne 2$. Then $Y_j \mid (n_{ij}, i = 0, 1, 2) \sim \mathrm{Poisson}(Q_T(j))$, where $Q_T(j) = \sum_{i=0}^{1} n_{ij} Q_i(j) + (1 - \delta_{2k})\, n_{2j} Q_2(j)$. Since $(n_{0j}, n_{1j}) \mid n_j \sim \mathrm{Multinomial}(n_j; p_0, p_1)$, we have for the conditional probability density function $P(y_j \mid n_j)$ of $Y_j$ given $n_j$

$$
P(y_j \mid n_j) = \sum_{n_{0j}=0}^{n_j} \sum_{n_{1j}=0}^{n_j - n_{0j}} g(n_{0j}, n_{1j}; n_j, p_0, p_1)\, h\{y_j; Q_T(j)\},
\tag{34}
$$

where $g(n_{0j}, n_{1j}; n_j, p_0, p_1)$ is the probability density function of $(n_{0j}, n_{1j}) \mid n_j \sim \mathrm{Multinomial}(n_j; p_0, p_1)$ and $h\{y_j; Q_T(j)\}$ the probability density function of $Y_j \mid (n_{ij}, i = 0, 1, 2) \sim \mathrm{Poisson}\{Q_T(j)\}$.

The probability density function $P(y_j \mid n_j)$ given by (34) is a mixture of Poisson probability density functions with the mixing probability density function given by the multinomial probability distribution of $\{n_{0j}, n_{1j}\}$ given $n_j$. This mixing probability density function represents individuals with different genotypes at the embryo stage in the population.

Let $\Theta$ be the set of all unknown parameters (i.e., the parameters $(p_1, p_2, \alpha)$ and the birth rates, the death rates, and the mutation rates of $J_j$ cells). Based on data $(y_j, j = 0, 1, \ldots, n_T)$, the likelihood function of $\Theta$ is

$$
L\{\Theta \mid y_j, j = 0, 1, \ldots, n_T\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} P(y_j \mid n_j).
\tag{35}
$$

Notice that because the mutation rates are very small, one may practically assume $\beta_i(t) = \beta_i$ for $i = 0, 1, \ldots, k-1$. Also, because the stage-limiting genes are basically tumor suppressor genes which act recessively (see Tan [3], Weinberg [6], and Tan et al. [5, 8, 23]), one may practically assume $\{b_i(t) = b_i, d_i(t) = d_i, \gamma_i(t) = b_i - d_i = \gamma_i, i = 0, 1, \ldots, k-1\}$ (see Tan et al. [4, 5, 8]).

4.3. The Joint Probability Distribution of Augmented Variables and Cancer Incidence. To apply the mixture distribution of $Y_j$ in (34) to make inference about the unknown parameters, one needs to expand the model to include the unobservable augmented variables $\{n_{0j}, n_{1j}, y_{0j}, y_{1j}, j = 1, \ldots, t_N\}$ and derive the joint probability distribution of these variables. For these purposes, observe that for ($j = 1, \ldots, t_N$) and for $k > 2$, the conditional probability distribution $P\{y_{ij}, i = 0, 1 \mid y_j, n_{ij}, i = 0, 1, n_j\}$ of $(Y_{ij}, i = 0, 1)$ given $(Y_j = y_j, n_{ij}, i = 0, 1, n_j)$ is

$$
(y_{ij}, i = 0, 1) \mid (y_j, n_{ij}, i = 0, 1, n_j) \sim \mathrm{Multinomial}\left( y_j; \frac{n_{ij} Q_i(j)}{Q_T(j)}, i = 0, 1 \right).
\tag{36}
$$

Since the conditional probability distribution of $Y_j$ given $(n_{ij}, i = 0, 1, 2)$ for $k > 2$ is Poisson with mean $Q_T(j)$, we have for the joint conditional probability density function $P\{y_{ij}, i = 0, 1, y_j \mid n_{ij}, i = 0, 1, n_j\}$ of $(Y_{ij}, i = 0, 1, Y_j)$ given $(n_{ij}, i = 0, 1, n_j)$

$$
\begin{aligned}
P\{y_{ij}, i = 0, 1, y_j \mid n_{ij}, i = 0, 1, n_j\}
&= P\{y_{ij}, i = 0, 1 \mid y_j, n_{ij}, i = 0, 1, 2\} \times P\{y_j \mid n_{ij}, i = 0, 1, 2\} \\
&= \frac{1}{y_j!}\, e^{-Q_T(j)}\, Q_T(j)^{y_j} \times g\left\{ y_{0j}, y_{1j}; y_j, \frac{n_{ij} Q_i(j)}{Q_T(j)}, i = 0, 1 \right\} \\
&= \prod_{i=0}^{2} h\{y_{ij}; n_{ij} Q_i(j)\} \quad (k > 2),
\end{aligned}
\tag{37}
$$

where $y_{2j} = y_j - \sum_{u=0}^{1} y_{uj}$ and $n_{2j} = n_j - \sum_{u=0}^{1} n_{uj}$.

If $k = 2$, then $Y_{2j} = 0$ so that $Y_j = \sum_{i=0}^{1} Y_{ij}$. Thus we have for $k = 2$

$$
Y_j \mid (n_{ij}, i = 0, 1, n_j) \sim \mathrm{Poisson}\left\{ \sum_{i=0}^{1} n_{ij} Q_i(j) \right\},
\tag{38}
$$

$$
Y_{ij} \mid (Y_j = y_j, n_{ij}, i = 0, 1, n_j) \sim \mathrm{Binomial}\left( y_j; \frac{n_{ij} Q_i(j)}{\sum_{u=0}^{1} n_{uj} Q_u(j)} \right), \quad i = 0, 1.
\tag{39}
$$

It follows that if $k = 2$, then $\sum_{i=0}^{1} Y_{ij} = Y_j$ and the joint probability density function of $(Y_{0j}, Y_j)$ given $(n_{uj}, u = 0, 1, 2)$ is

$$
\begin{aligned}
P\{y_{0j}, y_j \mid n_{uj}, u = 0, 1, n_j\}
&= P\{y_j \mid n_{ij}, i = 0, 1, n_j\} \times P\{y_{0j} \mid y_j, n_{uj}, u = 0, 1, n_j\} \\
&= \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\} \quad \text{if } k = 2,
\end{aligned}
\tag{40}
$$

where $y_j = \sum_{i=0}^{1} y_{ij}$ and $n_j = \sum_{i=0}^{2} n_{ij}$.
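The conditional distributions above describe how an observed count $y_j$ is split among the genotypes once the genotype counts $n_{ij}$ are given: each case is attributed to genotype $J_i$ with probability proportional to $n_{ij} Q_i(j)$, and for $k = 2$ the $J_2$ genotype receives no cases. The following sketch draws one such allocation. It is only an illustration of this single augmentation step, not the authors' full sampling procedure; the function name and all numerical inputs are hypothetical.

```python
import random


def allocate_cases(y_j, n_ij, q_ij, k=3, rng=None):
    """Split y_j cases among genotypes J_0, J_1, J_2.

    Each case is assigned to genotype i with probability proportional to
    n_ij[i] * q_ij[i]; when k == 2 the weight of J_2 is set to zero.
    """
    rng = rng or random.Random()
    weights = [n * q for n, q in zip(n_ij, q_ij)]
    if k == 2:
        weights[2] = 0.0
    counts = [0, 0, 0]
    if y_j == 0:
        return counts
    # One categorical draw per case gives a multinomial allocation overall.
    for genotype in rng.choices([0, 1, 2], weights=weights, k=y_j):
        counts[genotype] += 1
    return counts


# Hypothetical inputs (not estimates from the paper); seeded for reproducibility.
rng = random.Random(0)
draw_k3 = allocate_cases(50, n_ij=(9000, 900, 100), q_ij=(1e-4, 1e-3, 1e-2), k=3, rng=rng)
draw_k2 = allocate_cases(50, n_ij=(9000, 900, 100), q_ij=(1e-4, 1e-3, 1e-2), k=2, rng=rng)
```

In both cases the allocated counts add up to $y_j$, which mirrors the constraints $y_{2j} = y_j - y_{0j} - y_{1j}$ for $k > 2$ and $y_j = y_{0j} + y_{1j}$ for $k = 2$.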

Put $\mathbf{Y} = (y_{ij}, i = 0, 1, j = 1, \ldots, t_N)$, $\mathbf{N} = (n_{ij}, i = 0, 1, j = 1, \ldots, t_N)$, $\underset{\sim}{n} = (n_j, j = 0, 1, \ldots, t_N)$, and $\underset{\sim}{y} = (y_j, j = 0, 1, \ldots, t_N)$. From (37) and (40), we have for the conditional joint probability density function of $(\mathbf{Y}, \underset{\sim}{y})$ given $(\mathbf{N}, \underset{\sim}{n})$

$$
P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \underset{\sim}{n}, \Theta\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} \left\{ h[y_{2j}; n_{2j} Q_2(j)] \right\}^{1 - \delta_{2k}} \times \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\}.
\tag{41}
$$

It follows that the joint conditional probability density function of $\{\mathbf{N}, \mathbf{Y}, \underset{\sim}{y}\}$ given $(\underset{\sim}{n}, \Theta)$ is

$$
P\{\mathbf{N}, \mathbf{Y}, \underset{\sim}{y} \mid \underset{\sim}{n}, \Theta\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} g(n_{0j}, n_{1j}; n_j, p_0, p_1) \times \left\{ h[y_{2j}; n_{2j} Q_2(j)] \right\}^{1 - \delta_{2k}} \times \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\}.
\tag{42}
$$

Notice that the above probability density function is a product of multinomial probability density functions and Poisson probability density functions. For this joint probability density function, the deviance $\mathrm{Dev} = -2\{\log P[\mathbf{Y}, \underset{\sim}{y}, \mathbf{N} \mid \underset{\sim}{n}, \Theta] - \log P[\mathbf{Y}, \underset{\sim}{y}, \mathbf{N} \mid \underset{\sim}{n}, \hat{\Theta}]\}$ is

$$
\mathrm{Dev} = D_0(k) + \mathrm{Dev}(p_1, p_2) + \sum_{j=1}^{t_N} D_j,
\tag{43}
$$

where

$$
D_0(k) = 2 \left\{ \chi(k) - y_0 - y_0 \log \frac{\chi(k)}{y_0} \right\},
\tag{44}
$$

$$
\mathrm{Dev}(p_1, p_2) = 2 \sum_{j=1}^{t_N} \left\{ n_{0j} \log \frac{\hat{p}_0}{1 - p_1 - p_2} + \sum_{i=1}^{2} n_{ij} \log \frac{\hat{p}_i}{p_i} \right\},
\tag{45}
$$

$$
\begin{aligned}
D_j &= 2 \sum_{i=0}^{1} \left\{ n_{ij} Q_i(j) - y_{ij} - y_{ij} \log \frac{n_{ij} Q_i(j)}{y_{ij}} \right\} \\
&\quad + 2 (1 - \delta_{2k}) \times \left\{ n_{2j} Q_2(j) - y_{2j} - y_{2j} \log \frac{n_{2j} Q_2(j)}{y_{2j}} \right\},
\end{aligned}
\tag{46}
$$

where $\hat{p}_i = \left( \sum_{j=0}^{t_N} n_{ij} \right) / \left( \sum_{j=0}^{t_N} n_j \right)$ ($i = 0, 1, 2$).

The joint probability density function $P\{\mathbf{Y}, \underset{\sim}{y}, \mathbf{N} \mid \underset{\sim}{n}, \Theta\}$ of $(\mathbf{Y}, \underset{\sim}{y}, \mathbf{N})$ given by (42) will be used as the kernel for the Bayesian method to estimate the unknown parameters and to predict the state variables.

4.4. Fitting of the Model to Cancer Incidence Data. To fit the model to real data, as in Tan [3–5], we let $\Delta t \sim 1$ correspond to a fixed time interval such as 6 months in human cancer studies (Tan et al. [4] have assumed 3 months as one time unit, while Luebeck and Moolgavkar [26] have assumed one year as one time unit). Then, because the proliferation rate of the last stage cells is quite large, one may practically assume $P_T(s, t) = 1$ for $t - s \ge 1$. Hence, noting that $\beta_{k-1}(t) = \beta_{k-1}$ is usually very small (see [3–5]), the $Q_i(j)$ is approximated by

$$
\begin{aligned}
Q_i(j) &\approx E\left\{ e^{-\beta_{k-1} G_i(t_{j-1})} - e^{-\beta_{k-1} G_i(t_j)} \right\} \\
&= e^{-\beta_{k-1} E[G_i(t_{j-1})]} - e^{-\beta_{k-1} E[G_i(t_j)]} + o(\beta_{k-1}),
\end{aligned}
\tag{47}
$$

where $G_i(t) = \sum_{s=t_0}^{t-1} J_{k-1}(s; i)$.

Under discrete time approximation, the $E[J_{k-1}(t; i)]$'s have been derived in the appendix. Using these results for the expected numbers and using the result $\sum_{i=0}^{t-1} a^i = (a^t - 1)/(a - 1)$ for $a \ne 1$, we obtain

$$
\begin{aligned}
\beta_{k-1} E[G_i(t)] &= \sum_{r=i}^{i+1} E[J_r(t_0; i)] \left( \prod_{u=r}^{k-1} \beta_u \right) \times \sum_{v=r}^{k-1} A_{r(k-1)}(v) \sum_{s=t_0}^{t-1} (1 + \gamma_v)^{s - t_0} \\
&= \theta_{(i+1)(k-1)}\, \phi_{(i+1)(k-1)}(t) + \lambda_{i(k-1)}\, \phi_{i(k-1)}(t),
\end{aligned}
\tag{48}
$$

where $(\theta_{i(k-1)},\ i = 1, 2, 3)$ and $(\lambda_{i(k-1)},\ i = 0, 1, 2)$ are defined in Section 3.3 and where the $\phi_{i(k-1)}(t)$ $(i = 0, 1, 2, 3)$ are given by

$$
\phi_{i(k-1)}(t) = \left( \prod_{u=i+1}^{k-1} \gamma_u \right) \sum_{v=i}^{k-1} A_{i(k-1)}(v)
\times \frac{1}{\gamma_v} \left\{ (1 + \gamma_v)^{t - t_0} - 1 \right\}, \quad i = 0, 1, 2, 3.
\tag{49}
$$
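The factor $\{(1+\gamma_v)^{t-t_0} - 1\}/\gamma_v$ in (49) is the closed form of the inner sum over $s$ in (48). The following minimal Python sketch checks this identity numerically; the function names and the numerical values are illustrative assumptions, not part of the original analysis.

```python
def geometric_sum(gamma, n):
    """Term-by-term sum of (1 + gamma)**m for m = 0, ..., n - 1."""
    return sum((1.0 + gamma) ** m for m in range(n))


def closed_form_sum(gamma, n):
    """Closed form ((1 + gamma)**n - 1) / gamma of the same sum.

    For gamma == 0 every term equals 1, so the sum is n.
    """
    if gamma == 0.0:
        return float(n)
    return ((1.0 + gamma) ** n - 1.0) / gamma


if __name__ == "__main__":
    # Hypothetical inputs: gamma of order 1e-2 and n = t - t0 = 20 time units.
    print(geometric_sum(0.03, 20), closed_form_sum(0.03, 20))
```

Here `n` plays the role of $t - t_0$, so the sum over $s = t_0, \ldots, t-1$ has exactly `n` terms.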

Notice that if $\gamma_0 = 0$, then $\phi_{0(k-1)}(t)$ reduces to

$$
\phi_{0(k-1)}(t) = \left( \prod_{u=1}^{k-1} \gamma_u \right) \sum_{v=1}^{k-1} A_{1(k-1)}(v)
\times \frac{1}{\gamma_v^2} \left\{ (1 + \gamma_v)^{t - t_0} - 1 - (t - t_0) \gamma_v \right\}.
\tag{50}
$$

Applying these results for time homogeneous models with $\gamma_i \neq \gamma_j$ if $i \neq j$, the $Q_i(j)$'s under discrete approximation are given as follows.


(1) If $k = 2$, then $k - 1 = 1$. Hence $Q_2(0) = 1$, $Q_i(0) = Q_2(j) = 0$ for $(i = 0, 1;\ j > 0)$, and for $j > 0$,

$$
\begin{aligned}
Q_0(j) &\approx \left\{ e^{-\theta_{11} \phi_{11}(t_{j-1}) - \lambda_{01} \phi_{01}(t_{j-1})}
 - e^{-\theta_{11} \phi_{11}(t_j) - \lambda_{01} \phi_{01}(t_j)} \right\} + o(\beta_1), \\
Q_1(j) &\approx (1 - \alpha_1) \left\{ e^{-\lambda_{11} \phi_{11}(t_{j-1})}
 - e^{-\lambda_{11} \phi_{11}(t_j)} \right\} + o(\beta_1).
\end{aligned}
\tag{51}
$$

(2) If $k \ge 3$, then we have $Q_2(0) = \delta_{2(k-1)} \alpha$, $Q_i(0) = 0$, $i = 0, 1$, and for $j > 0$,

$$
\begin{aligned}
Q_0(j) &\approx \left\{ e^{-\theta_{1(k-1)} \phi_{1(k-1)}(t_{j-1}) - \lambda_{0(k-1)} \phi_{0(k-1)}(t_{j-1})}
 - e^{-\theta_{1(k-1)} \phi_{1(k-1)}(t_j) - \lambda_{0(k-1)} \phi_{0(k-1)}(t_j)} \right\} + o(\beta_{k-1}), \\
Q_1(j) &\approx \left\{ e^{-\theta_{2(k-1)} \phi_{2(k-1)}(t_{j-1}) - \lambda_{1(k-1)} \phi_{1(k-1)}(t_{j-1})}
 - e^{-\theta_{2(k-1)} \phi_{2(k-1)}(t_j) - \lambda_{1(k-1)} \phi_{1(k-1)}(t_j)} \right\} + o(\beta_{k-1}), \\
Q_2(j) &\approx \delta_{2(k-1)} (1 - \alpha) \left\{ e^{-\lambda_{22} \phi_{22}(t_{j-1})} - e^{-\lambda_{22} \phi_{22}(t_j)} \right\} \\
 &\quad + (1 - \delta_{2(k-1)}) \left\{ e^{-\theta_{3(k-1)} \phi_{3(k-1)}(t_{j-1}) - \lambda_{2(k-1)} \phi_{2(k-1)}(t_{j-1})}
 - e^{-\theta_{3(k-1)} \phi_{3(k-1)}(t_j) - \lambda_{2(k-1)} \phi_{2(k-1)}(t_j)} \right\} + o(\beta_{k-1}).
\end{aligned}
\tag{52}
$$
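Each leading term in the approximations above has the same shape: a difference of two exponentials of a cumulative exponent evaluated at consecutive time points. A minimal Python sketch of that common structure follows; the name `cum_exponent` and the numbers used are hypothetical, and the sketch ignores the $o(\beta_{k-1})$ remainder and the genotype-specific factors such as $(1 - \alpha)$.

```python
import math


def interval_probabilities(cum_exponent):
    """Return exp(-H(t_{j-1})) - exp(-H(t_j)) for j = 1, ..., J.

    cum_exponent[j] holds the exponent H(t_j), for example
    theta * phi(t_j) + lambda * phi'(t_j), and is assumed nondecreasing in j.
    """
    return [
        math.exp(-cum_exponent[j - 1]) - math.exp(-cum_exponent[j])
        for j in range(1, len(cum_exponent))
    ]


if __name__ == "__main__":
    # Hypothetical cumulative exponents at t_0, t_1, t_2, t_3.
    H = [0.0, 0.001, 0.003, 0.006]
    print(interval_probabilities(H))
```

Because the differences telescope, the interval probabilities sum to $e^{-H(t_0)} - e^{-H(t_J)}$, which gives a quick sanity check on any implementation.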

Notice that if one replaces $[1 + \gamma_v]^{t - t_0}$ by $[1 + \gamma_v]^{t - t_0} = e^{(t - t_0) \log(1 + \gamma_v)} \approx e^{(t - t_0) \gamma_v}$, the above $Q_i(j)$'s from the discrete time model are exactly the same as those from the continuous model, as given in equations (23)–(27), under the assumption that $P_T(s, t) = 1$ for $t - s > 0$. Notice that the assumption $P_T(s, t) = 1$ for $t - s > 0$ is equivalent to assuming that the last-stage cancer cells grow instantaneously into cancer tumors as soon as they are generated; see Remark 1.
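The quality of the approximation $\log(1 + \gamma_v) \approx \gamma_v$ depends on the size of $\gamma_v$ and on the number of time units $t - t_0$. The sketch below, with illustrative values chosen only for their orders of magnitude, measures the relative gap between the discrete and continuous growth factors; the function name is an assumption introduced here.

```python
import math


def relative_gap(gamma, n):
    """Relative difference exp(n * gamma) / (1 + gamma)**n - 1.

    Computed through logarithms: n * (gamma - log(1 + gamma)).
    Since log(1 + gamma) <= gamma, the gap is never negative.
    """
    return math.expm1(n * (gamma - math.log1p(gamma)))


if __name__ == "__main__":
    # Illustrative: 170 half-year units, gamma of order 1e-5 and 1e-2.
    for gamma in (1e-5, 1e-2):
        print(gamma, relative_gap(gamma, 170))
```

To leading order the gap is $(t - t_0)\gamma_v^2/2$, so it is negligible for very small $\gamma_v$ and grows with both $\gamma_v$ and the time horizon.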

5. The Fitting of the Model to Cancer Incidence and the Generalized Bayesian Inference Procedure

Given the model in Sections 2 and 3 and cancer incidence, one may use results in Section 4 to fit the model. By using this model and the distribution results in Section 4, one can readily estimate the unknown genetic parameters, predict cancer incidence, and check the validity of the model by using the generalized Bayesian inference and Gibbs sampling procedures; for more detail see Tan [3, 22] and Tan et al. [4, 5].

The generalized Bayesian inference is based on the posterior distribution $P\{\Theta \mid \boldsymbol{N}, \boldsymbol{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$ of $\Theta$ given $\{\boldsymbol{N}, \boldsymbol{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$. This posterior distribution is derived by combining the prior distribution $P\{\Theta\}$ of $\Theta$ with the joint probability distribution $P\{\boldsymbol{N}, \boldsymbol{Y}, \underset{\sim}{y} \mid \underset{\sim}{n}, \Theta\}$ given $\{\underset{\sim}{n}, \Theta\}$ in (42). It follows that this inference procedure combines information from three sources: (1) previous information and experiences about the parameters in terms of the prior distribution $P\{\Theta\}$ of the parameters, (2) biological information of inherited cancer cases via genetic segregation of cancer genes in the population ($P\{\boldsymbol{N} \mid \underset{\sim}{n}, p_i, i = 1, 2\}$, see Section 2), and (3) information from the expanded data ($\boldsymbol{Y}$) and the observed data ($\underset{\sim}{y}$) via the statistical model from the system ($P\{\boldsymbol{Y}, \underset{\sim}{y} \mid \boldsymbol{N}, \Theta\}$) given by (37) and (40). Because of the additional information from the genetic segregation of the cancer genes, this inference procedure provides an efficient way to extract information on the effects of genotypes of individuals at the embryo stage.

5.1. The Prior Distribution of the Parameters. For the prior distribution of $\Theta$, because biological information has suggested some lower bounds and upper bounds for the mutation rates and for the proliferation rates, we assume

$$
P\{\Theta\} \propto c \quad (c > 0),
\tag{53}
$$

where $c$ is a positive constant if these parameters satisfy some biologically specified constraints, and the prior is equal to zero otherwise. These biological constraints are as follows.

(i) $0 < p_1 < 10^{-2}$, $0 < p_2 < 10^{-6}$, and $-0.01 < \gamma_i < 1$ $(i = 1, 2)$.
(ii) For $\beta_j$ $(j = 0, 1, \ldots, k - 1)$, we let $1 < \omega_0 = N(t_0) \beta_0 < 1000$ $(N \to I_1)$ and $10^{-8} < \beta_i < 10^{-3}$, $i = 1, \ldots, k - 1$.
(iii) For the $\lambda_{j(k-1)}$ $(j = 0, 1, 2)$, we let $0 < \lambda_i < 10$, $i = 0, 1$, and $0 < \lambda_2 < 10^{3}$.
(iv) For the $\theta_i$ $(i = 1, 2, 3)$, we let $0 < \theta_1 < 10^{-2}$ and $0 < \theta_2 < 10^{-1}$.

We will refer to the above prior as a partially informative prior, which may be considered as an extension of the traditional noninformative prior given in Box and Tiao [28].
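A prior of this kind is flat inside a box of constraints and zero outside it. As a rough sketch, assuming the parameters are passed in a dictionary with the hypothetical key names below, the log-prior can be coded as an indicator; only a subset of the constraints (i), (iii), and (iv) is included, and the bounds on the $\beta_j$ in (ii) are omitted for brevity.

```python
import math

# Open intervals (lower, upper); the key names are illustrative assumptions.
BOUNDS = {
    "p1": (0.0, 1e-2),
    "p2": (0.0, 1e-6),
    "gamma1": (-0.01, 1.0),
    "gamma2": (-0.01, 1.0),
    "lambda0": (0.0, 10.0),
    "lambda1": (0.0, 10.0),
    "lambda2": (0.0, 1e3),
    "theta1": (0.0, 1e-2),
    "theta2": (0.0, 1e-1),
}


def log_prior(params):
    """Log of a flat prior: 0.0 inside every bound (up to a constant), -inf otherwise."""
    for name, (lower, upper) in BOUNDS.items():
        if not (lower < params[name] < upper):
            return -math.inf
    return 0.0
```

Returning `-inf` outside the box means that any maximization or sampling step working with log-densities automatically respects the constraints.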

5.2. The Posterior Distribution of the Parameters Given $\{\boldsymbol{Y}, \boldsymbol{N}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Denote by $\Theta_1 = \Theta - \{p_1, p_2, \alpha\}$. From the posterior distribution $P\{\Theta \mid \boldsymbol{N}, \boldsymbol{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$, we obtain

$$
\begin{aligned}
P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \boldsymbol{N}, \boldsymbol{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}
 &\propto (\chi(k))^{y_0} e^{-\chi(k)}\, p_1^{\sum_{j=1}^{t_N} n_{1j}}\, p_2^{\sum_{j=1}^{t_N} n_{2j}}
 \times (1 - p_1 - p_2)^{\sum_{j=1}^{t_N} n_{0j}}, \quad 0 < p_1, p_2, \alpha < 1, \\
P\{\Theta_1 \mid (p_1, p_2, \alpha), \boldsymbol{N}, \boldsymbol{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}
 &\propto \prod_{j=1}^{t_N} \prod_{i=0}^{2} e^{-Q_i(j)} [Q_i(j)]^{y_{ij}}, \quad \Theta_1 \in \Omega,
\end{aligned}
\tag{54}
$$

where $\Omega$ is the parameter space of $\Theta_1$ provided by the biological constraints in Section 5.1.

For computational convenience, we notice that the log of $P\{p_1, p_2, \alpha \mid \Theta_1, \boldsymbol{N}, \boldsymbol{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$ is proportional to the negative of $D_0(k) + \mathrm{Dev}(p_1, p_2)$ given by (44)-(45); similarly, the log of $P\{\Theta_1 \mid (p_1, p_2, \alpha), \boldsymbol{N}, \boldsymbol{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$ is proportional to the negative of $\sum_{j=1}^{k} D_j$ given by (46).

5.3. The Multilevel Gibbs Sampling Procedure for Estimating Unknown Parameters. Given the posterior probability distributions, we will use the following multilevel Gibbs sampling procedure to derive estimates of the parameters. We notice that numerically the Gibbs sampling procedure given below is equivalent to the EM-algorithm from the sampling theory viewpoint, with Steps 1 and 2 as the E-Step and with Steps 3 and 4 as the M-Step, respectively [29]. These multilevel Gibbs sampling procedures are given by the following.

Step 1 (Generating $\boldsymbol{N}$ Given $(\boldsymbol{Y}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta)$ (The Data-Augmentation Step 1)). Given $\Theta$ and given $\underset{\sim}{n}$, use the multinomial distribution of $\{n_{1j}, n_{2j}\}$ given $n_j$ in Section 3 to generate a large sample of $\boldsymbol{N}$. Then, by combining this large sample with $P\{\boldsymbol{Y}, \underset{\sim}{y} \mid \boldsymbol{N}, \underset{\sim}{n}, \Theta\}$ in (37) and (40), select $\boldsymbol{N}$ through the weighted bootstrap method due to Smith and Gelfand [30]. This selected $\boldsymbol{N}$ is then a sample from $P\{\boldsymbol{N} \mid \boldsymbol{Y}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta\}$ even though the latter is unknown (for proof, see Tan [22], Chapter 3). Call the generated sample $\hat{\boldsymbol{N}}$.
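The weighted bootstrap in Step 1 can be sketched generically: draw candidates from a distribution that is easy to sample, weight each candidate by the likelihood factor, and resample in proportion to the weights. The code below is a minimal illustration using only the Python standard library; `weight_fn` stands in for the factor from (37) and (40), and all names are hypothetical.

```python
import random


def weighted_bootstrap(candidates, weight_fn, rng, size=1):
    """Resample `size` items from `candidates` with probability proportional to weight_fn.

    candidates: list of draws from the easy-to-sample distribution.
    weight_fn:  nonnegative weight for a candidate (an unnormalized likelihood).
    rng:        a random.Random instance, passed in for reproducibility.
    """
    weights = [weight_fn(c) for c in candidates]
    if sum(weights) <= 0.0:
        raise ValueError("all candidate weights are zero")
    return rng.choices(candidates, weights=weights, k=size)


if __name__ == "__main__":
    rng = random.Random(2013)
    # Toy example: candidates 0..9, weights favouring values near 3.
    pool = [rng.randrange(10) for _ in range(1000)]
    print(weighted_bootstrap(pool, lambda c: 1.0 / (1.0 + (c - 3) ** 2), rng, size=5))
```

The larger the candidate pool, the closer the resampled values are to draws from the target conditional distribution, which is why Step 1 asks for a large sample.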

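Step 2 (Generating $\boldsymbol{Y}$ Given $(\boldsymbol{N}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta)$ (The Data-Augmentation Step 2)). Given $\{\underset{\sim}{y}, \underset{\sim}{n}, \Theta\}$ and given $\boldsymbol{N} = \hat{\boldsymbol{N}}$ generated from Step 1, generate $\boldsymbol{Y}$ from the probability distribution $P\{\boldsymbol{Y} \mid \boldsymbol{N}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta\}$ given by (36) and (38). Call the generated sample $\hat{\boldsymbol{Y}}$.

Step 3 (Estimation of $\{p_i, i = 1, 2, \alpha\}$ Given $\{\Theta_1, \boldsymbol{N}, \boldsymbol{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$). Given $\{\underset{\sim}{y}, \underset{\sim}{n}, \Theta_1\}$ and given $(\boldsymbol{N}, \boldsymbol{Y}) = (\hat{\boldsymbol{N}}, \hat{\boldsymbol{Y}})$ from Steps 1 and 2, derive the posterior mode of $\{p_i, i = 1, 2, \alpha\}$ by maximizing the conditional posterior distribution $P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \hat{\boldsymbol{N}}, \hat{\boldsymbol{Y}}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $D_0(k) + \mathrm{Dev}(p_1, p_2)$ given by (44)-(45) in Section 4.3 under the constraints given in Section 5.1. Denote this generated mode by $\{\hat{p}_i, i = 1, 2, \hat{\alpha}\}$.

Step 4 (Estimation of $\Theta_1$ Given $\{p_i, i = 1, 2, \alpha, \boldsymbol{N}, \boldsymbol{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$). Given $(\underset{\sim}{y}, \underset{\sim}{n})$ and given $(\boldsymbol{N}, \boldsymbol{Y}, p_i, i = 1, 2, \alpha) = (\hat{\boldsymbol{N}}, \hat{\boldsymbol{Y}}, \hat{p}_i, i = 1, 2, \hat{\alpha})$ from Steps 1–3, derive the posterior mode of $\Theta_1$ by maximizing the conditional posterior distribution $P\{\Theta_1 \mid \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\boldsymbol{N}}, \hat{\boldsymbol{Y}}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $\sum_{j=1}^{k} D_j$ in (46) under the constraints. Denote the generated mode as $\hat{\Theta}_1$.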

Step 5 (Recycling Step). With $(\boldsymbol{N}, \boldsymbol{Y}, p_i, i = 1, 2, \alpha, \Theta_1) = (\hat{\boldsymbol{N}}, \hat{\boldsymbol{Y}}, \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1)$ given above, go back to Step 1 and continue until convergence.

The proof of convergence of the above steps can be derived by using the procedure given in Tan ([22], Chapter 3). At convergence, the $\hat{\Theta} = \{\hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1\}$ are the generated values from the posterior distribution of $\Theta$ given $\{\underset{\sim}{y}, \underset{\sim}{n}\}$, independently of $(\boldsymbol{N}, \boldsymbol{Y})$ (for proof, see Tan [22], Chapter 3). Repeating the above procedures, one then generates a random sample of $\Theta$ from the posterior distribution of $\Theta$ given $\{\underset{\sim}{y}, \underset{\sim}{n}\}$; one then uses the sample mean as the estimate of $\Theta$ and uses the sample variances and covariances as estimates of the variances and covariances of these estimates.
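The final summary step can be sketched as follows: given a list of generated parameter vectors, compute the sample mean vector and the sample covariance matrix. This is a plain-Python illustration with hypothetical names; it assumes at least two draws and uses the usual $n - 1$ divisor, which the text does not specify.

```python
def sample_mean_and_cov(draws):
    """Return (mean vector, covariance matrix) of a list of equal-length draws.

    draws: list of parameter vectors, one per converged run of the sampler.
    The covariance uses the unbiased n - 1 divisor (an assumption here).
    """
    n = len(draws)
    dim = len(draws[0])
    mean = [sum(row[a] for row in draws) / n for a in range(dim)]
    cov = [
        [
            sum((row[a] - mean[a]) * (row[b] - mean[b]) for row in draws) / (n - 1)
            for b in range(dim)
        ]
        for a in range(dim)
    ]
    return mean, cov


if __name__ == "__main__":
    # Two toy draws of a two-dimensional parameter vector.
    print(sample_mean_and_cov([[1.0, 2.0], [3.0, 6.0]]))
```

The diagonal of the covariance matrix gives the sample variances, whose square roots correspond to standard deviations of the kind reported alongside parameter estimates.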

6. A New Multistage Stochastic Model for Adult Eye Cancer (Uveal Melanoma): An Example

The human eye cancers consist of pediatric eye cancers and adult eye cancers. The most common pediatric eye cancer is the retinoblastoma, which develops from the retinal pigment epithelium cells underlying the retina that do not form melanoma. The most common adult eye cancers are the uveal melanomas involving the iris, the ciliary body, and the choroid (collectively referred to as the uvea). These cancers develop from melanocytes (pigment cells), which reside within the uvea, giving color to the eye. In Tan and Zhou [9], we have developed a modified two-stage model for retinoblastoma. Based on results from molecular biology (see Landreville et al. [14], Mensink et al. [15], and Loercher and Harbour [31]), Landreville et al. [14] have proposed a three-stage model for uveal melanoma, as given in Figure 2. As an example of applications of this paper, in this section we will apply this model of uveal melanoma to the NCI/NIH eye cancer data from the SEER project. We notice that the same methods can be applied to model other human cancers as well, but this will be our future research.

Given in Table 1 are the numbers of people at risk and the eye cancer cases in the age groups, together with the predicted cases from the models. These data give cancer incidence at birth and incidence for 85 age groups ($k = 85$), with each group spanning a 1-year period except the last age group ($\ge 85$ years old). For human eye cancer, because the incidence at birth and for age groups from 1 to 10 years old is basically generated by the pediatric eye cancer retinoblastoma (see [9]), to account for inherited cancer cases of uveal melanomas, the incidence for age 0 (birth) and for age periods from 1 to 10 years old in Table 1 for uveal melanoma is derived by subtracting the incidence of retinoblastoma from the SEER data (see Tan and Zhou [9]).

To fit the data, we let one time unit be 6 months after birth and let $t_0 = 1$. To compare different models and to assess different assumptions, we will consider the following two 3-stage mixture models: (1) the complete 3-stage mixture model (Model-F), in which no assumptions are made on the parameters; (2) the Type-1 3-stage mixture model (Model-1), in which we assume that $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth. For comparison purposes, we also fit a 2-stage model as defined in Tan and Zhou [9]. We will apply the methods in Section 5 to fit these models to the SEER data given in Table 1.


Table 2: The log-likelihood, AIC, and BIC of the fitted models.

Models          Log-likelihood    AIC         BIC
Three-stage
  Model-F       -1312.02          2648.04     2677.35
  Model-1       -1295.76          2609.53     2631.51
Two-stage       -4392.421         8796.843    8811.569
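The information criteria in Table 2 follow the standard definitions $\mathrm{AIC} = 2m - 2\log L$ and $\mathrm{BIC} = m \log n - 2\log L$, where $m$ is the number of free parameters and $n$ the number of observations. The short sketch below uses these textbook formulas; the parameter count $m = 12$ for Model-F is taken from the degrees of freedom quoted for its chi-square test, and the value of $n$ used for the BIC is not stated in the text, so it is left as an input.

```python
import math


def aic(loglik, n_params):
    """Akaike Information Criterion: 2 * m - 2 * log-likelihood."""
    return 2.0 * n_params - 2.0 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion: m * log(n) - 2 * log-likelihood."""
    return n_params * math.log(n_obs) - 2.0 * loglik


if __name__ == "__main__":
    # Model-F: log-likelihood -1312.02 with an assumed 12 free parameters.
    print(aic(-1312.02, 12))
```

With these inputs the AIC formula returns the Model-F value listed in Table 2, which supports reading the garbled table entries as two-decimal numbers.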

Figure 5: Curve fitting of SEER data by the Model-F, Model-1, and the two-stage model. (Incidences, from 0 to 350, plotted against age in years, from 0 to 84, for the series Observed, Model-F, Model-1, and Two-stage.)

Given in Table 2 are the natural logs of the likelihood functions, the AIC (Akaike Information Criterion), and the BIC (Bayesian Information Criterion) for these models. Given in Table 3 are the estimates of parameters in the 3-stage models. Given in Figure 5 are the plots of predicted cancer cases from the 3-stage mixture models (Model-F and Model-1) and the 2-stage model. For comparison purposes, in Table 1 we also provide numbers of predicted cancer cases from the 3-stage mixture models and the 2-stage model, together with the observed cancer cases over time from SEER. From these results, we have made the following observations.

(a) As shown by results in Table 1 and Figure 5, it appeared that both Model-F and Model-1 fitted the SEER data well, although Model-1 fitted the data slightly better judging from the values of AIC and BIC. The Chi-square test statistics $\chi^2 = \sum_{i=0}^{85} ((y_i - \hat{y}_i)^2 / \hat{y}_i)$ for Model-F and Model-1 are given by 88.43 and 94.48, respectively, giving a $P$-value of 0.12 (df $= 86 - 12 = 74$) for Model-F and a $P$-value of 0.11 (df $= 86 - 7 = 79$) for Model-1. On the other hand, the 2-stage model fitted the data very poorly; the Chi-square statistic value for the 2-stage model is 2747.69, giving a $P$-value less than $10^{-3}$. The AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) values of Model-1 are given by (AIC = 2609.53, BIC = 2631.51), which are slightly smaller than those of Model-F, respectively; however, the AIC and BIC values (8796.84, 8811.57) of the two-stage model are considerably greater than those of the 3-stage models, respectively. These results suggest that uveal melanoma may best be described by a 3-stage model with inherited component and that one may practically assume $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth.

(b) From Table 3, it is observed that the estimate of $\gamma_1$ is close to zero (the estimate is of order $10^{-5}$), indicating that the phenotype of $J_1$ is almost identical to that of $N = J_0$, further confirming that the staging-limiting genes are basically tumor suppressor genes and that there is no haploinsufficiency for these tumor suppressor genes. On the other hand, the estimate of $\gamma_2$ is of order $10^{-2}$, which is about $10^{3}$ times greater than that of cells with genotype $J_1$.

(c) From Table 3, the estimates $\hat{\lambda}_j$ $(j = 0, 1, 2)$ of $\lambda_j$ are of order $10^{-1}$, $10^{-1}$, and $10^{2} \sim 10^{3}$, respectively. Because $\lambda_i = E[J_i(t_0; i)] \beta_i \prod_{u=i+1}^{2} (\beta_u / \gamma_u)$, $i = 0, 1, 2$, assuming some values of $E[J_i(t_0; i)]$, $i = 0, 1, 2$, from some biological observations, one can have some rough ideas about the magnitude of $\beta_j$ $(j = 0, 1, 2)$. For example, if we follow Potten et al. [32] to assume ($E[N(t_0)] = E[J_i(t_0; i)] \sim 10^{8}$, $i = 0, 1, 2$), then $\beta_j \approx 10^{-6} \sim 10^{-5}$.

(d) From Table 3, the estimates of $p_1$ and $p_2$ from the SEER data are of orders $10^{-4} \sim 10^{-3}$ and $10^{-7} \sim 10^{-6}$, respectively. This indicates that in the US population the frequency of the staging-limiting cancer gene for uveal melanoma is approximately $10^{-3}$. Table 3 also showed that the estimate of $\alpha$ was 0.8411, indicating that most individuals with genotype $J_2$ would develop cancer at birth. This may help to explain why there are observed cancer incidences at birth for uveal melanoma in the SEER data even though the estimate of the frequency $p_2$ is of order $10^{-7} \sim 10^{-6}$.
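The goodness-of-fit statistic used in observation (a) is the Pearson form $\sum_i (y_i - \hat{y}_i)^2 / \hat{y}_i$. A minimal sketch of its computation is given below; the function name and the toy numbers are illustrative, and no values from Table 1 are reproduced here.

```python
def pearson_chi_square(observed, predicted):
    """Sum of (observed - predicted)**2 / predicted over all age groups.

    Both inputs are sequences of equal length; predicted values must be positive.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum((o - p) ** 2 / p for o, p in zip(observed, predicted))


if __name__ == "__main__":
    # Toy counts for two age groups, not taken from the SEER data.
    print(pearson_chi_square([10, 20], [8.0, 25.0]))
```

The statistic is then compared with a chi-square distribution whose degrees of freedom equal the number of age groups minus the number of fitted parameters, as in the df values quoted in (a).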

7. Discussion and Conclusions

To account for inherited cancer cases, in this paper we have developed some general multistage models involving hereditary cancer cases. For human cancer incidence, these models are basically generalized mixture models. In these mixture models, the mixing probability is a multinomial distribution to account for genetic segregation of the staging-limiting tumor suppressor genes. This mixture model allows us to estimate, for the first time, the frequency of the staging-limiting tumor suppressor gene in human populations. As an example of applications, in this paper we have developed a general 3-stage stochastic multistage model of carcinogenesis for adult human eye cancer. To account for inherited cancer cases in the stochastic model of human eye cancer, we have also developed a generalized mixture model for uveal melanoma in human beings.

For using the proposed models to fit the cancer incidence data, in this paper we have developed a generalized Bayesian inference procedure to estimate the unknown parameters and to predict cancer cases. This inference procedure is advantageous over the classical sampling theory inference (i.e., the maximum likelihood method) because the procedure combines


Table 3: Estimates of parameters for the 3-stage stochastic models.

Parameters       λ0         λ1         λ2         γ1         γ2         p1         p2         α          θ1         θ2
Model-F
  Estimates      2.88E-01   4.81E-01   6.55E+02   2.71E-05   3.37E-02   9.95E-04   9.68E-07   8.41E-01   3.29E-04   3.41E-01
  StD            1.21E-01   4.65E-02   1.12E+02   8.86E-06   1.92E-04   1.75E-06   6.48E-09   4.19E-02   3.54E-05   1.07E-02
  95% CL-Lower   5.03E-02   3.89E-01   4.34E+02   5.34E-06   3.33E-02   9.91E-04   9.55E-07   7.59E-01   2.60E-04   3.20E-01
  95% CL-Upper   5.25E-01   5.72E-01   8.75E+02   4.01E-05   3.41E-02   9.98E-04   9.81E-07   9.23E-01   3.98E-04   3.62E-01
Model-1
  Estimates      6.55E-02   3.72E-01   1.98E+02   NA         3.49E-02   9.98E-04   1.00E-06   8.07E-01   NA         NA
  StD            2.54E-02   4.63E-02   3.38E+01   NA         3.58E-03   6.27E-05   4.53E-08   7.06E-02   NA         NA
  95% CL-Lower   1.57E-02   2.82E-01   1.32E+02   NA         2.79E-02   8.75E-04   9.11E-07   6.68E-01   NA         NA
  95% CL-Upper   1.15E-01   4.63E-01   2.65E+02   NA         4.19E-02   1.12E-03   1.09E-06   9.45E-01   NA         NA

Note: NA: assumed nonexistence.


information from three sources: (1) previous information and experiences about the parameters in terms of the prior distribution $P\{\Theta\}$ of the parameters, (2) biological information of inherited cancer cases via the genetic segregation of staging-limiting tumor suppressor genes in the population, and (3) information from the expanded data ($\boldsymbol{Y}$) and the observed data ($\underset{\sim}{y}$) via the statistical model from the system ($P\{\boldsymbol{Y}, \underset{\sim}{y} \mid \boldsymbol{N}, \Theta\}$) given by (37) and (40).

To illustrate the usefulness and applications of our models and methods, we have applied our models and methods to the eye cancer SEER data of NCI/NIH. Our analysis clearly showed that the proposed 3-stage model with inherited cancer cases fitted the data nicely (see Table 2 and Figure 5); on the other hand, the classical 2-stage model cannot fit the data at all (see Table 2 and Figure 5). These results clearly have confirmed results from molecular biology that the human eye cancer is derived by a 3-stage model with inherited cancer component. Notice, however, that our 3-stage multistage model is more general than the classical 3-stage model as described in Little [1], Tan [2], and Zheng [7] in that we postulate that cancer tumors develop from primary $I_3$ cells by clonal expansion (see Yang and Chen [11]). (Note that the stochastic multistage models in the literature assume that cancer tumors develop from last-stage cells immediately as soon as they are generated, ignoring completely cancer progression; see Remark 1.) As a matter of fact, we had assumed $\Delta t = 1$ for a period of three months and found that the 3-stage models then did not fit the SEER data, clearly indicating that $P_T(s, t) < 1$ over a period of three months.

Applying our models and methods to the SEER data of human eye cancer, we have derived for the first time some useful pieces of information. Specifically, we mention the following. (1) For the first time, we have estimated the frequency of the staging-limiting tumor suppressor gene in the US population ($p_1 \sim 9.948 \times 10^{-3}$). (2) With the estimate of $\alpha$ as $\hat{\alpha} = 0.8411$, the predicted number of uveal melanoma at birth is $y_0 = n_{01} p_2 = 36$ by 3-stage models with inherited cancer component (Model-F and Model-1). (The observed number of eye cancer at birth is 34.) (3) The estimate of the proliferation rate ($\gamma_1$) of $J_1$ cells using Model-F is $\gamma_1 = 2.271 \times 10^{-5} \sim 0$. (The estimate is $8.603 \times 10^{-5}$ using Model-1.) This confirms that the stage-1 limiting gene is a tumor suppressor gene and that, unlike the p53 gene in chromosome 17p (see [33]), there is little or no haploinsufficiency for this gene in cells with genotype $J_1$.

Using models and methods of this paper, one can easily predict future cancer cases for human eye cancer. Thus, by comparing results from different populations, our models and methods can be used to assess cancer prevention and control procedures. These will be our future research topics; we will not go any further here.

Appendix

The Expected Numbers of State Variables under Discrete Time Approximation

Under discrete time, the stochastic differential equations of state variables reduce to the following stochastic difference equations of state variables, respectively:

$$
\begin{aligned}
J_i(t + 1; i) &= J_i(t; i) [1 + \gamma_i(t)] + e_i(t + 1; i), \quad i = 0, 1, \ldots, \operatorname{Min}(k - 1, 2), \\
J_j(t + 1; u) &= J_j(t; u) [1 + \gamma_j(t)] + J_{j-1}(t; u) \beta_{j-1}(t) + e_j(t + 1; u), \quad 0 \le u < j \le k - 1,
\end{aligned}
\tag{A1}
$$

where $e_i(t + 1; i) = [B_i(t; i) - J_i(t; i) b_i(t)] - [D_i(t; i) - J_i(t; i) d_i(t)]$ and $e_j(t + 1; i) = [B_j(t; i) - J_j(t; i) b_j(t)] - [D_j(t; i) - J_j(t; i) d_j(t)] + [M_{j-1}(t; i) - J_{j-1}(t; i) \beta_{j-1}(t)]$ for $j > i$.
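As a brief worked step, and assuming (as the form of the $e$ terms suggests, although it is not restated in this appendix) that each bracketed difference has conditional mean zero given the state at time $t$, taking expectations in (A1) removes the residual terms and leaves a deterministic recursion for the expected numbers:

```latex
\begin{aligned}
E[J_i(t+1; i)] &= E[J_i(t; i)]\,[1 + \gamma_i(t)], \\
E[J_j(t+1; u)] &= E[J_j(t; u)]\,[1 + \gamma_j(t)] + E[J_{j-1}(t; u)]\,\beta_{j-1}(t),
\qquad 0 \le u < j \le k - 1 .
\end{aligned}
```

Iterating this recursion from the initial values at $t_0$ is one way to check closed-form expressions for the expected numbers numerically.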

The initial conditions at birth ($t_0$) for the above stochastic difference equations are $J_j(t_0; i) > 0$ if ($j = i, i + 1$) and $J_j(t_0; i) = 0$ if $j > i + 1$. The solution of the above difference equations under these initial conditions is given, respectively, by

$$
\begin{aligned}
J_i(t; i) &= J_i(t_0; i) \prod_{s=t_0}^{t-1} (1 + \gamma_i(s)) + \sum_{s=t_0+1}^{t} e_i(s; i) \prod_{u=s}^{t-1} (1 + \gamma_i(u)), \quad i = 0, 1, \ldots, \operatorname{Min}(k - 1, 2), \\
J_j(t; i) &= J_j(t_0; i) \prod_{s=t_0}^{t-1} (1 + \gamma_j(s)) + \sum_{s=t_0}^{t-1} J_{j-1}(s; i) \beta_{j-1}(s) \prod_{u=s+1}^{t-1} (1 + \gamma_j(u)) \\
 &\quad + \sum_{s=t_0+1}^{t} e_j(s; i) \prod_{u=s}^{t-1} (1 + \gamma_j(u)), \quad 0 \le i < j \le k - 1.
\end{aligned}
\tag{A2}
$$

If the model is time homogeneous so that $\beta_j(t) = \beta_j$, $\gamma_j(t) = \gamma_j$ $(j = i, \ldots, k - 1)$, and if $\gamma_i \neq \gamma_j$ if $i \neq j$, then the above solutions under the initial conditions ($J_j(t_0; i) = 0$, $j > i + 1$) reduce to

$$
\begin{aligned}
J_i(t; i) &= J_i(t_0; i) (1 + \gamma_i)^{t - t_0} + \eta_i^{(0)}(t; i), \quad i = 0, 1, \ldots, \operatorname{Min}(k - 1, 2), \\
J_j(t; i) &= J_j(t_0; i) (1 + \gamma_j)^{t - t_0} + \sum_{u=i}^{j-1} J_u(t_0; i) \left\{ \prod_{v=u}^{j-1} \beta_v \right\} \sum_{r=u}^{j} A_{uj}(r) (1 + \gamma_r)^{t - t_0} \\
 &\quad + \eta_j^{(0)}(t; i) + \sum_{u=1}^{j-i} \left\{ \prod_{r=j-u}^{j-1} \beta_r \right\} \eta_j^{(u)}(t; i) \\
 &= \sum_{u=i}^{i+1} J_u(t_0; i) \left\{ \prod_{v=u}^{j-1} \beta_v \right\} \sum_{r=u}^{j} A_{uj}(r) (1 + \gamma_r)^{t - t_0} \\
 &\quad + \eta_j^{(0)}(t; i) + \sum_{u=1}^{j-i} \left\{ \prod_{r=j-u}^{j-1} \beta_r \right\} \eta_j^{(u)}(t; i), \quad 0 \le i < j \le k - 1,
\end{aligned}
\tag{A3}
$$

where $\eta_j^{(0)}(t; i) = \sum_{s=t_0+1}^{t} e_j(s; i) (1 + \gamma_i)^{t - s}$ and $\eta_j^{(u)}(t; i) = \sum_{s=t_0+1}^{t} e_{j-u}(s; i) \sum_{v=u}^{j} A_{uj}(v) (1 + \gamma_v)^{t - s}$ $(u = 1, \ldots, j - i)$.

Thus, if the model is time homogeneous and if $\gamma_i \neq \gamma_j$ if $i \neq j$, the $E[J_{k-1}(t; i)]$'s in discrete time models under the initial conditions ($J_j(t_0; i) = 0$, $j > i + 1$) are given, respectively, by

$$
E[J_{k-1}(t; i)] = \sum_{u=i}^{i+1} E[J_u(t_0; i)] \left( \prod_{v=u}^{k-2} \beta_v \right)
\times \sum_{r=u}^{k-1} A_{u(k-1)}(r) (1 + \gamma_r)^{t - t_0}, \quad i = 0, 1, \ldots, \operatorname{Min}(k - 1, 2).
\tag{A4}
$$
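The structure of these expected-value formulas can be checked numerically in the simplest case of two consecutive stages with constant rates. The sketch below iterates the expected-value recursion for stages 0 and 1 and compares it with the closed form obtained from the identity $\sum_{s=0}^{n-1} a^s b^{n-1-s} = (a^n - b^n)/(a - b)$ for $a \neq b$. All names and numbers are illustrative assumptions, and the sketch does not use the paper's coefficients $A_{uj}(r)$, whose definition lies outside this appendix.

```python
import math


def iterate_expected(j0, j1, gamma0, gamma1, beta0, n_steps):
    """Iterate E[J0] and E[J1] for n_steps time units with constant rates."""
    for _ in range(n_steps):
        # The right-hand side uses the values from the previous time unit.
        j0, j1 = j0 * (1.0 + gamma0), j1 * (1.0 + gamma1) + j0 * beta0
    return j0, j1


def closed_form_expected(j0, j1, gamma0, gamma1, beta0, n_steps):
    """Closed form of the same recursion, valid when gamma0 != gamma1."""
    a, b = 1.0 + gamma0, 1.0 + gamma1
    e0 = j0 * a ** n_steps
    e1 = j1 * b ** n_steps + j0 * beta0 * (a ** n_steps - b ** n_steps) / (a - b)
    return e0, e1


if __name__ == "__main__":
    # Hypothetical inputs: 1e8 stage-0 cells, no stage-1 cells at t0.
    args = (1e8, 0.0, 0.0, 0.03, 1e-6, 20)
    print(iterate_expected(*args))
    print(closed_form_expected(*args))
```

The requirement that the two growth rates differ mirrors the condition $\gamma_i \neq \gamma_j$ for $i \neq j$ under which (A3) and (A4) are stated.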

References

[1] M. P. Little, “Cancer models, ionization, and genomic instability: a review,” in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 5, World Scientific, River Edge, NJ, USA, 2008.

[2] W. Y. Tan, Stochastic Models of Carcinogenesis, Marcel Dekker, New York, NY, USA, 1991.

[3] W. Y. Tan, “Stochastic multi-stage models of carcinogenesis as hidden Markov models: a new approach,” International Journal of Systems and Synthetic Biology, vol. 1, pp. 313–337, 2010.

[4] W. Y. Tan, L. J. Zhang, and C. W. Chen, “Stochastic modeling of carcinogenesis: state space models and estimation of parameters,” Discrete and Continuous Dynamical Systems B, vol. 4, no. 1, pp. 297–322, 2004.

[5] W. Y. Tan, C. W. Chen, and L. J. Zhang, “Cancer biology, cancer models and stochastic mathematical analysis of carcinogenesis,” in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 3, World Scientific, River Edge, NJ, USA, 2008.

[6] R. A. Weinberg, The Biology of Human Cancer, Garland Sciences, Taylor and Francis, New York, NY, USA, 2007.

[7] Q. Zheng, “Stochastic multistage cancer models: a fresh look at an old approach,” in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 2, World Scientific, River Edge, NJ, USA, 2008.

[8] W. Y. Tan, L. J. Zhang, W. Chen, and J. M. Zhu, “A stochastic model of human colon cancer involving multiple pathways,” in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 11, World Scientific, River Edge, NJ, USA, 2008.

[9] W. Y. Tan and H. Zhou, “A new stochastic model of retinoblastoma involving both hereditary and non-hereditary cancer cases,” Journal of Carcinogenesis and Mutagenesis, vol. 2, no. 2, article 117, 2011.

[10] X. W. Yan, Stochastic and State Space Models of Carcinogenesis Under Complex Situation [Ph.D. thesis], Department of Mathematical Sciences, University of Memphis, Memphis, Tenn, USA.

[11] G. L. Yang and C. W. Chen, “A stochastic two-stage carcinogenesis model: a new approach to computing the probability of observing tumor in animal bioassays,” Mathematical Biosciences, vol. 104, no. 2, pp. 247–258, 1991.

[12] H. Osada and T. Takahashi, “Genetic alterations of multiple tumor suppressors and oncogenes in the carcinogenesis and progression of lung cancer,” Oncogene, vol. 21, no. 48, pp. 7421–7434, 2002.

[13] I. I. Wistuba, L. Mao, and A. F. Gazdar, “Smoking molecular damage in bronchial epithelium,” Oncogene, vol. 21, no. 48, pp. 7298–7306, 2002.

[14] S. Landreville, O. A. Agapova, and J. W. Harbour, “Emerging insights into the molecular pathogenesis of uveal melanoma,” Future Oncology, vol. 4, no. 5, pp. 629–636, 2008.

[15] H. W. Mensink, D. Paridaens, and A. De Klein, “Genetics of uveal melanoma,” Expert Review of Ophthalmology, vol. 4, no. 6, pp. 607–616, 2009.

[16] W. Y. Tan and X. W. Yan, “A new stochastic and state space model of human colon cancer incorporating multiple pathways,” Biology Direct, vol. 5, article 26, 2010.

[17] L. B. Klebanov, S. T. Rachev, and A. Y. Yakovlev, “A stochastic model of radiation carcinogenesis: latent time distributions and their properties,” Mathematical Biosciences, vol. 113, no. 1, pp. 51–75, 1993.

[18] A. Y. Yakovlev and A. D. Tsodikov, Stochastic Models of Tumor Latency and Their Biostatistical Applications, World Scientific, River Edge, NJ, USA, 1996.

[19] H. Fakir, W. Y. Tan, L. Hlatky, P. Hahnfeldt, and R. K. Sachs, “Stochastic population dynamic effects for lung cancer progression,” Radiation Research, vol. 172, no. 3, pp. 383–393, 2009.

[20] A. G. Knudson, “Mutation and cancer: statistical study of retinoblastoma,” Proceedings of the National Academy of Sciences of the United States of America, vol. 68, no. 4, pp. 820–823, 1971.

[21] J. F. Crow and M. Kimura, An Introduction to Population Genetics Theory, Harper and Row, New York, NY, USA, 1970.

[22] W. Y. Tan, Stochastic Models With Applications to Genetics, Cancers, AIDS and Other Biomedical Systems, World Scientific, River Edge, NJ, USA, 2002.

[23] W. Y. Tan, C. W. Chen, and L. J. Zhang, “Cancer risk assessment by state space models,” in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 13, World Scientific, River Edge, NJ, USA, 2008.

[24] W. Y. Tan and C. W. Chen, “Stochastic modeling of carcinogenesis: some new insights,” Mathematical and Computer Modelling, vol. 28, no. 11, pp. 49–71, 1998.

[25] W. Y. Tan and C. W. Chen, “Cancer stochastic models,” in Encyclopedia of Statistical Sciences, Revised edition, John Wiley and Sons, New York, NY, USA, 2005.

[26] E. G. Luebeck and S. H. Moolgavkar, “Multistage carcinogenesis and the incidence of colorectal cancer,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 23, pp. 15095–15100, 2002.

[27] R. Durrett, J. Mayberry, S. Moseley, and D. Schmidt, “Probability models for cancer development and progression,” Google Search, Google, 2012.

[28] G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis, Addison-Wesley, Reading, Mass, USA, 1973.

[29] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm (with discussion),” Journal of the Royal Statistical Society B, vol. 39, pp. 1–38, 1977.

[30] A. F. M. Smith and A. E. Gelfand, “Bayesian statistics without tears: a sampling-resampling perspective,” American Statistician, vol. 46, pp. 84–88, 1992.

[31] A. E. Loercher and J. W. Harbour, “Molecular genetics of uveal melanoma,” Current Eye Research, vol. 27, no. 2, pp. 69–74, 2003.

[32] C. S. Potten, C. Booth, and D. Hargreaves, “The small intestine as a model for evaluating adult tissue stem cell drug targets,” Cell Proliferation, vol. 36, no. 3, pp. 115–129, 2003.

[33] C. J. Lynch and J. Milner, “Loss of one p53 allele results in four-fold reduction of p53 mRNA and protein: a basis for p53 haplo-insufficiency,” Oncogene, vol. 25, no. 24, pp. 3463–3470, 2006.


Table 1: The SEER incidence data (1973–2007) of uveal melanoma from NCI/NIH (over all races and genders).

| Age group | Number of people at risk | Observed incidence | Model-F predicted | Model-1 predicted | Two-stage predicted |
|---|---|---|---|---|---|
| 0 | 12495777 | 34 | 36 | 36 | 38 |
| 1 | 12221582 | 20 | 11 | 11 | 16 |
| 2 | 12120990 | 14 | 7 | 8 | 17 |
| 3 | 12112995 | 9 | 5 | 6 | 18 |
| 4 | 12146174 | 4 | 4 | 5 | 20 |
| 5 | 12161336 | 3 | 3 | 4 | 22 |
| 6 | 12111854 | 2 | 2 | 4 | 25 |
| 7 | 12160452 | 2 | 2 | 4 | 28 |
| 8 | 11942586 | 2 | 2 | 4 | 30 |
| 9 | 12381299 | 1 | 2 | 5 | 34 |
| 10 | 12512703 | 2 | 3 | 6 | 38 |
| 11 | 12410338 | 5 | 3 | 6 | 41 |
| 12 | 12449244 | 2 | 3 | 6 | 44 |
| 13 | 12527781 | 5 | 4 | 7 | 48 |
| 14 | 12602883 | 3 | 5 | 7 | 51 |
| 15 | 12719598 | 5 | 5 | 8 | 55 |
| 16 | 12766107 | 7 | 6 | 9 | 59 |
| 17 | 12831400 | 9 | 7 | 9 | 62 |
| 18 | 12382047 | 8 | 7 | 9 | 63 |
| 19 | 12581638 | 10 | 8 | 10 | 68 |
| 20 | 12636509 | 7 | 9 | 11 | 71 |
| 21 | 12682601 | 6 | 10 | 10 | 75 |
| 22 | 12840510 | 12 | 11 | 11 | 79 |
| 23 | 13075528 | 17 | 13 | 13 | 84 |
| 24 | 13358635 | 16 | 15 | 14 | 89 |
| 25 | 13473849 | 12 | 16 | 16 | 94 |
| 26 | 13426340 | 17 | 18 | 18 | 97 |
| 27 | 13525264 | 28 | 20 | 20 | 101 |
| 28 | 13149674 | 17 | 22 | 21 | 102 |
| 29 | 13812811 | 23 | 25 | 25 | 110 |
| 30 | 13886874 | 24 | 28 | 27 | 114 |
| 31 | 13488332 | 37 | 30 | 29 | 115 |
| 32 | 13460286 | 32 | 32 | 32 | 118 |
| 33 | 13256067 | 38 | 35 | 35 | 119 |
| 34 | 13428827 | 37 | 39 | 38 | 124 |
| 35 | 13220037 | 40 | 41 | 41 | 126 |
| 36 | 12870265 | 30 | 44 | 44 | 126 |
| 37 | 12689592 | 43 | 47 | 47 | 127 |
| 38 | 12157014 | 42 | 49 | 49 | 125 |
| 39 | 12494081 | 46 | 55 | 54 | 131 |
| 40 | 12272125 | 49 | 58 | 58 | 132 |
| 41 | 11826573 | 56 | 61 | 60 | 130 |
| 42 | 11663153 | 54 | 65 | 64 | 131 |
| 43 | 11407082 | 53 | 68 | 68 | 131 |
| 44 | 11296848 | 70 | 73 | 72 | 133 |
| 45 | 11016369 | 57 | 76 | 76 | 132 |
| 46 | 10651593 | 71 | 79 | 79 | 130 |
| 47 | 10475708 | 87 | 84 | 83 | 131 |
| 48 | 9994684 | 82 | 86 | 85 | 127 |
| 49 | 10138908 | 78 | 93 | 93 | 131 |
| 50 | 9836359 | 87 | 97 | 96 | 130 |
| 51 | 9475641 | 95 | 100 | 99 | 127 |
| 52 | 9250985 | 113 | 104 | 104 | 126 |
| 53 | 9027382 | 106 | 108 | 108 | 125 |
| 54 | 8883737 | 117 | 113 | 113 | 125 |
| 55 | 8547883 | 129 | 116 | 116 | 123 |
| 56 | 8279648 | 107 | 119 | 119 | 121 |
| 57 | 8062368 | 119 | 123 | 123 | 119 |
| 58 | 7654610 | 132 | 124 | 124 | 115 |
| 59 | 7563706 | 118 | 130 | 129 | 115 |
| 60 | 7232719 | 131 | 131 | 130 | 112 |
| 61 | 6927332 | 116 | 132 | 132 | 109 |
| 62 | 6708273 | 133 | 134 | 134 | 107 |
| 63 | 6543931 | 143 | 138 | 137 | 106 |
| 64 | 6404652 | 130 | 141 | 141 | 105 |
| 65 | 6168486 | 145 | 142 | 142 | 102 |
| 66 | 5913479 | 138 | 142 | 142 | 99 |
| 67 | 5746766 | 151 | 144 | 144 | 98 |
| 68 | 5480517 | 147 | 142 | 142 | 94 |
| 69 | 5363912 | 149 | 144 | 144 | 94 |
| 70 | 5110728 | 136 | 142 | 142 | 90 |
| 71 | 4925076 | 144 | 141 | 141 | 88 |
| 72 | 4696825 | 140 | 138 | 138 | 85 |
| 73 | 4512136 | 146 | 135 | 135 | 82 |
| 74 | 4345300 | 126 | 133 | 133 | 80 |
| 75 | 4148801 | 122 | 128 | 129 | 78 |
| 76 | 3900900 | 124 | 122 | 122 | 74 |
| 77 | 3681587 | 114 | 116 | 116 | 70 |
| 78 | 3481918 | 93 | 110 | 110 | 67 |
| 79 | 3243631 | 102 | 102 | 102 | 63 |
| 80 | 2961234 | 79 | 92 | 93 | 58 |
| 81 | 2724984 | 93 | 83 | 84 | 54 |
| 82 | 2495219 | 82 | 75 | 75 | 50 |
| 83 | 2271595 | 92 | 66 | 67 | 46 |
| 84 | 2041351 | 64 | 57 | 58 | 42 |
| 85 | 10466605 | 304 | 282 | 285 | 216 |

Note 1: for age 0 and 1–10 years old, the cancer incidences are derived by subtracting the incidence of retinoblastoma from the original SEER data (see Tan and Zhou [9]).
Note 2: the observed uveal melanoma incidence rates per $10^6$ individuals are derived from the SEER eye cancer incidence by subtracting retinoblastoma incidence as given in Tan and Zhou [9].


4.2. The Probability Distribution of $Y_j$ ($j \geq 1$). To derive the probability distribution of $Y_j$ ($j \geq 1$) in the $j$th age group, let $Y_{ij}$ ($i = 0, 1, 2$) be the number of cancer cases generated by people who have genotype $J_i$ at the embryo stage among these $Y_j$ cancer cases. Then $Y_j = \sum_{i=0}^{2} Y_{ij}$, and $Y_{0j}$ is the number of cancer cases generated by the $n_{0j} = n_j - n_{1j} - n_{2j}$ normal people in the population. The conditional probability distribution of $Y_{ij}$ given $n_{ij}$ is

$$Y_{ij} \mid n_{ij} \sim \text{Poisson}\{n_{ij} Q_i(j)\}, \quad i = 0, 1, 2. \tag{33}$$

Notice that if $k = 2$ (a 2-stage model), then all $J_2$ individuals would develop tumor at or before birth. Thus, if $k = 2$, then $Y_{2j} = 0$ for all $j > 0$, so that if $j > 0$, cancer cases develop only from normal people ($N = J_0$ people) and $J_1$ people. On the other hand, if $k > 2$, then with positive probability $Y_{ij} > 0$ for all ($i = 0, 1, 2$; $j = 0, 1, \ldots, n_T$), where $n_T$ is the last time point in the data. Let $\delta_{2k} = 1$ if $k = 2$ and $\delta_{2k} = 0$ if $k \neq 2$. Then $Y_j \mid (n_{ij}, i = 0, 1, 2) \sim \text{Poisson}\{Q_T(j)\}$, where $Q_T(j) = \sum_{i=0}^{1} n_{ij} Q_i(j) + (1 - \delta_{2k}) n_{2j} Q_2(j)$. Since $(n_{0j}, n_{1j}) \mid n_j \sim \text{Multinomial}(n_j; p_0, p_1)$, we have for the conditional probability density function $P(y_j \mid n_j)$ of $Y_j$ given $n_j$:

$$P(y_j \mid n_j) = \sum_{n_{0j}=0}^{n_j} \sum_{n_{1j}=0}^{n_j - n_{0j}} g(n_{0j}, n_{1j}; n_j, p_0, p_1)\, h\{y_j; Q_T(j)\}, \tag{34}$$

where $g(n_{0j}, n_{1j}; n_j, p_0, p_1)$ is the probability density function of $(n_{0j}, n_{1j}) \mid n_j \sim \text{Multinomial}(n_j; p_0, p_1)$ and $h\{y_j; Q_T(j)\}$ is the probability density function of $Y_j \mid (n_{ij}, i = 0, 1, 2) \sim \text{Poisson}\{Q_T(j)\}$.

The probability density function $P(y_j \mid n_j)$ given by (34) is a mixture of Poisson probability density functions, with the mixing probability density function given by the multinomial probability distribution of $\{n_{0j}, n_{1j}\}$ given $n_j$. This mixing probability density function represents individuals with different genotypes at the embryo stage in the population.
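For a small population the mixture in (34) can be evaluated by brute force. The following is a minimal sketch under stated assumptions: the function names and all numerical inputs ($n_j = 20$, $p_0 = 0.9$, $p_1 = 0.08$, and the three $Q_i(j)$ values) are hypothetical and chosen only to keep the double sum short; they are not estimates from the paper.

```python
from math import comb, exp, lgamma, log

def poisson_pmf(y, mean):
    """h(y; mean): Poisson probability of y events."""
    if mean == 0.0:
        return 1.0 if y == 0 else 0.0
    return exp(y * log(mean) - mean - lgamma(y + 1))

def trinomial_pmf(n0, n1, n, p0, p1):
    """g(n0, n1; n, p0, p1); the third class receives the remainder."""
    n2, p2 = n - n0 - n1, 1.0 - p0 - p1
    return comb(n, n0) * comb(n - n0, n1) * p0**n0 * p1**n1 * p2**n2

def mixture_pmf(y, n, p0, p1, q, k=3):
    """P(y_j | n_j) as in (34); q = (Q_0(j), Q_1(j), Q_2(j))."""
    delta_2k = 1 if k == 2 else 0
    total = 0.0
    for n0 in range(n + 1):
        for n1 in range(n - n0 + 1):
            n2 = n - n0 - n1
            # Q_T(j): the J_2 term drops out when k = 2.
            q_t = n0 * q[0] + n1 * q[1] + (1 - delta_2k) * n2 * q[2]
            total += trinomial_pmf(n0, n1, n, p0, p1) * poisson_pmf(y, q_t)
    return total

# Hypothetical inputs, used only to exercise the formula.
probs = [mixture_pmf(y, n=20, p0=0.9, p1=0.08, q=(0.01, 0.2, 0.6)) for y in range(60)]
mean_cases = sum(y * p for y, p in enumerate(probs))
print(sum(probs), mean_cases)
```

With realistic population sizes (about $10^7$ people per age group in Table 1) this enumeration is not practical, which is one motivation for the data-augmentation approach of Section 4.3.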

Let $\Theta$ be the set of all unknown parameters (i.e., the parameters $(p_1, p_2, \alpha)$ and the birth rates, the death rates, and the mutation rates of $J_j$ cells). Based on data $(y_j, j = 0, 1, \ldots, n_T)$, the likelihood function of $\Theta$ is

$$L\{\Theta \mid y_j, j = 0, 1, \ldots, n_T\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} P(y_j \mid n_j). \tag{35}$$

Notice that because the mutation rates are very small, one may practically assume $\beta_i(t) = \beta_i$ for $i = 0, 1, \ldots, k - 1$. Also, because the stage-limiting genes are basically tumor suppressor genes which act recessively (see Tan [3], Weinberg [6], and Tan et al. [5, 8, 23]), one may practically assume $b_i(t) = b_i$, $d_i(t) = d_i$, $\gamma_i(t) = b_i - d_i = \gamma_i$, $i = 0, 1, \ldots, k - 1$ (see Tan et al. [4, 5, 8]).

4.3. The Joint Probability Distribution of Augmented Variables and Cancer Incidence. To apply the mixture distribution of $Y_j$ in (34) to make inference about the unknown parameters, one needs to expand the model to include the unobservable augmented variables $\{n_{0j}, n_{1j}, y_{0j}, y_{1j}, j = 1, \ldots, t_N\}$ and derive the joint probability distribution of these variables. For these purposes, observe that for ($j = 1, \ldots, t_N$) and for $k > 2$, the conditional probability distribution $P\{y_{ij}, i = 0, 1 \mid y_j, n_{ij}, i = 0, 1, n_j\}$ of $(Y_{ij}, i = 0, 1)$ given $(Y_j = y_j, n_{ij}, i = 0, 1, n_j)$ is

$$(y_{ij}, i = 0, 1) \mid (y_j, n_{ij}, i = 0, 1, n_j) \sim \text{Multinomial}\left(y_j; \frac{n_{ij} Q_i(j)}{Q_T(j)}, i = 0, 1\right). \tag{36}$$

Since the conditional probability distribution of $Y_j$ given $(n_{ij}, i = 0, 1, 2)$ for $k > 2$ is Poisson with mean $Q_T(j)$, we have for the joint conditional probability density function $P\{y_{ij}, i = 0, 1, y_j \mid n_{ij}, i = 0, 1, n_j\}$ of $(Y_{ij}, i = 0, 1, Y_j)$ given $(n_{ij}, i = 0, 1, n_j)$:

$$\begin{aligned}
P\{y_{ij}, i = 0, 1, y_j \mid n_{ij}, i = 0, 1, n_j\}
&= P\{y_{ij}, i = 0, 1 \mid y_j, n_{ij}, i = 0, 1, 2\} \times P\{y_j \mid n_{ij}, i = 0, 1, 2\} \\
&= \frac{1}{y_j!} e^{-Q_T(j)} [Q_T(j)]^{y_j} \times g\left\{y_{0j}, y_{1j}; y_j, \frac{n_{ij} Q_i(j)}{Q_T(j)}, i = 0, 1\right\} \\
&= \prod_{i=0}^{2} h\{y_{ij}; n_{ij} Q_i(j)\} \quad (k > 2),
\end{aligned} \tag{37}$$

where $y_{2j} = y_j - \sum_{u=0}^{1} y_{uj}$ and $n_{2j} = n_j - \sum_{u=0}^{1} n_{uj}$.
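The last equality in (37) is the standard fact that a Poisson total split multinomially is the same as independent Poisson counts. A minimal numerical check is sketched below; the means and counts are hypothetical stand-ins for $n_{ij} Q_i(j)$ and $y_{ij}$, and the helper names are ours.

```python
from math import exp, factorial

def poisson_pmf(y, mean):
    """Poisson probability of y events with the given mean."""
    return exp(-mean) * mean**y / factorial(y)

def multinomial_pmf(counts, probs):
    """Multinomial probability of the given counts."""
    coef = factorial(sum(counts))
    value = 1.0
    for c, p in zip(counts, probs):
        coef //= factorial(c)   # stays an exact integer at every step
        value *= p**c
    return coef * value

means = [0.7, 1.9, 0.4]   # hypothetical n_ij * Q_i(j), i = 0, 1, 2
counts = [2, 3, 1]        # hypothetical y_0j, y_1j, y_2j
q_t = sum(means)

# Left side: Poisson total times multinomial split, as in the second line of (37).
lhs = poisson_pmf(sum(counts), q_t) * multinomial_pmf(counts, [m / q_t for m in means])

# Right side: product of independent Poisson terms, as in the last line of (37).
rhs = 1.0
for y, m in zip(counts, means):
    rhs *= poisson_pmf(y, m)

print(abs(lhs - rhs) < 1e-12)
```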

If $k = 2$, then $Y_{2j} = 0$, so that $Y_j = \sum_{i=0}^{1} Y_{ij}$. Thus we have for $k = 2$:

$$Y_j \mid (n_{ij}, i = 0, 1, n_j) \sim \text{Poisson}\left\{\sum_{i=0}^{1} n_{ij} Q_i(j)\right\}, \tag{38}$$

$$Y_{ij} \mid (Y_j = y_j, n_{ij}, i = 0, 1, n_j) \sim \text{Binomial}\left(y_j; \frac{n_{ij} Q_i(j)}{\sum_{u=0}^{1} n_{uj} Q_u(j)}\right), \quad i = 0, 1. \tag{39}$$

It follows that if $k = 2$, then $\sum_{i=0}^{1} Y_{ij} = Y_j$ and the joint probability density function of $(Y_{0j}, Y_j)$ given $(n_{uj}, u = 0, 1, 2)$ is

$$\begin{aligned}
P\{y_{0j}, y_j \mid n_{uj}, u = 0, 1, n_j\}
&= P\{y_j \mid n_{ij}, i = 0, 1, n_j\} \times P\{y_{0j} \mid y_j, n_{uj}, u = 0, 1, n_j\} \\
&= \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\}, \quad \text{if } k = 2,
\end{aligned} \tag{40}$$

where $y_j = \sum_{i=0}^{1} y_{ij}$ and $n_j = \sum_{i=0}^{2} n_{ij}$.


Put $\mathbf{Y} = \{y_{ij}, i = 0, 1, j = 1, \ldots, t_N\}$, $\mathbf{N} = \{n_{ij}, i = 0, 1, j = 1, \ldots, t_N\}$, $\tilde{n} = (n_j, j = 0, 1, \ldots, t_N)$, and $\tilde{y} = (y_j, j = 0, 1, \ldots, t_N)$. From (37) and (40), we have for the conditional joint probability density function of $(\mathbf{Y}, \tilde{y})$ given $(\mathbf{N}, \tilde{n})$:

$$P\{\mathbf{Y}, \tilde{y} \mid \mathbf{N}, \tilde{n}, \Theta\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} \left\{h[y_{2j}; n_{2j} Q_2(j)]\right\}^{1 - \delta_{2k}} \times \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\}. \tag{41}$$

It follows that the joint conditional probability density function of $\{\mathbf{N}, \mathbf{Y}, \tilde{y}\}$ given $(\tilde{n}, \Theta)$ is

$$P\{\mathbf{N}, \mathbf{Y}, \tilde{y} \mid \tilde{n}, \Theta\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} g(n_{0j}, n_{1j}; n_j, p_0, p_1) \times \left\{h[y_{2j}; n_{2j} Q_2(j)]\right\}^{1 - \delta_{2k}} \times \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\}. \tag{42}$$

Notice that the above probability density function is a product of multinomial probability density functions and Poisson probability density functions. For this joint probability density function, the deviance $\text{Dev} = -2\{\log P[\mathbf{Y}, \tilde{y}, \mathbf{N} \mid \tilde{n}, \Theta] - \log P[\mathbf{Y}, \tilde{y}, \mathbf{N} \mid \tilde{n}, \hat{\Theta}]\}$ is

$$\text{Dev} = D_0(k) + \text{Dev}(p_1, p_2) + \sum_{j=1}^{t_N} D_j, \tag{43}$$

where

$$D_0(k) = 2\left\{\chi(k) - y_0 - y_0 \log\frac{\chi(k)}{y_0}\right\}, \tag{44}$$

$$\text{Dev}(p_1, p_2) = 2\sum_{j=1}^{t_N} \left\{n_{0j} \log\frac{\hat{p}_0}{1 - p_1 - p_2} + \sum_{i=1}^{2} n_{ij} \log\frac{\hat{p}_i}{p_i}\right\}, \tag{45}$$

$$D_j = 2\sum_{i=0}^{1} \left\{n_{ij} Q_i(j) - y_{ij} - y_{ij} \log\frac{n_{ij} Q_i(j)}{y_{ij}}\right\} + 2(1 - \delta_{2k}) \times \left\{n_{2j} Q_2(j) - y_{2j} - y_{2j} \log\frac{n_{2j} Q_2(j)}{y_{2j}}\right\}, \tag{46}$$

where $\hat{p}_i = \left(\sum_{j=0}^{t_N} n_{ij}\right) / \left(\sum_{j=0}^{t_N} n_j\right)$ ($i = 0, 1, 2$).
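The Poisson terms in (44) and (46) all share one building block, $2\{\mu - y - y\log(\mu / y)\}$, which is nonnegative and vanishes when the mean $\mu$ equals the count $y$. The sketch below illustrates this for a single age group; the function names and the numerical inputs are hypothetical, and the $y = 0$ case uses the limiting value $2\mu$.

```python
from math import log

def poisson_deviance(y, mean):
    """2 * {mean - y - y * log(mean / y)}, using the limit 2 * mean when y = 0."""
    if y == 0:
        return 2.0 * mean
    return 2.0 * (mean - y - y * log(mean / y))

def d_j(y, n, q, k=3):
    """D_j as in (46) for one age group; y, n, q are length-3 sequences."""
    delta_2k = 1 if k == 2 else 0
    total = sum(poisson_deviance(y[i], n[i] * q[i]) for i in (0, 1))
    if not delta_2k:
        # The J_2 contribution is present only when k > 2.
        total += poisson_deviance(y[2], n[2] * q[2])
    return total

# Hypothetical counts and rates for one age group.
perfect_fit = d_j(y=[3, 1, 0], n=[1000, 50, 2], q=[0.003, 0.02, 0.0])
poor_fit = d_j(y=[3, 1, 0], n=[1000, 50, 2], q=[0.010, 0.02, 0.0])
print(perfect_fit, poor_fit)
```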

The joint probability density function $P\{\mathbf{Y}, \tilde{y}, \mathbf{N} \mid \tilde{n}, \Theta\}$ of $(\mathbf{Y}, \tilde{y}, \mathbf{N})$ given by (42) will be used as the kernel for the Bayesian method to estimate the unknown parameters and to predict the state variables.

4.4. Fitting of the Model to Cancer Incidence Data. To fit the model to real data, as in Tan [3–5], we let $\Delta t \sim 1$ correspond to a fixed time interval such as 6 months in human cancer studies (Tan et al. [4] assumed 3 months as one time unit, while Luebeck and Moolgavkar [26] assumed one year as one time unit). Then, because the proliferation rate of the last-stage cells is quite large, one may practically assume $P_T(s, t) = 1$ for $t - s \geq 1$. Hence, noting that $\beta_{k-1}(t) = \beta_{k-1}$ is usually very small (see [3–5]), $Q_i(j)$ is approximated by

$$Q_i(j) \approx E\left\{e^{-\beta_{k-1} G_i(t_{j-1})} - e^{-\beta_{k-1} G_i(t_j)}\right\} = e^{-\beta_{k-1} E[G_i(t_{j-1})]} - e^{-\beta_{k-1} E[G_i(t_j)]} + o(\beta_{k-1}), \tag{47}$$

where $G_i(t) = \sum_{s=t_0}^{t-1} J_{k-1}(s; i)$.

Under the discrete time approximation, the $E[I_{k-1}(t; i)]$'s have been derived in the appendix. Using these results for the expected numbers and using the result $\sum_{i=0}^{t-1} a^i = (a^t - 1)/(a - 1)$ for $a \neq 1$, we obtain

$$\begin{aligned}
\beta_{k-1} E[G_i(t)] &= \sum_{r=i}^{i+1} E[J_r(t_0; i)] \left(\prod_{u=r}^{k-1} \beta_u\right) \times \sum_{v=r}^{k-1} A_{r(k-1)}(v) \sum_{s=t_0}^{t-1} (1 + \gamma_v)^{s - t_0} \\
&= \theta_{(i+1)(k-1)} \phi_{(i+1)(k-1)}(t) + \lambda_{i(k-1)} \phi_{i(k-1)}(t),
\end{aligned} \tag{48}$$

where $(\theta_{i(k-1)}, i = 1, 2, 3)$ and $(\lambda_{i(k-1)}, i = 0, 1, 2)$ are defined in Section 3.3 and where the $\phi_{i(k-1)}(t)$ ($i = 0, 1, 2, 3$)'s are given by

$$\phi_{i(k-1)}(t) = \left(\prod_{u=i+1}^{k-1} \gamma_u\right) \sum_{v=i}^{k-1} A_{i(k-1)}(v) \times \frac{1}{\gamma_v} \left\{(1 + \gamma_v)^{t - t_0} - 1\right\}, \quad i = 0, 1, 2, 3. \tag{49}$$

Notice that if $\gamma_0 = 0$, then $\phi_{0(k-1)}(t)$ reduces to

$$\phi_{0(k-1)}(t) = \left(\prod_{u=1}^{k-1} \gamma_u\right) \sum_{v=1}^{k-1} A_{1(k-1)}(v) \times \frac{1}{\gamma_v^2} \left\{(1 + \gamma_v)^{t - t_0} - 1 - (t - t_0)\gamma_v\right\}. \tag{50}$$
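The step from the inner sum in (48) to the factor $\{(1 + \gamma_v)^{t - t_0} - 1\}/\gamma_v$ in (49) is the geometric sum with ratio $a = 1 + \gamma_v$. A minimal check is sketched below; the function names and the values of $\gamma$, $t$, and $t_0$ are hypothetical.

```python
def geometric_sum(gamma, t, t0=1):
    """Direct evaluation of sum_{s=t0}^{t-1} (1 + gamma)^(s - t0)."""
    return sum((1.0 + gamma) ** (s - t0) for s in range(t0, t))

def closed_form(gamma, t, t0=1):
    """{(1 + gamma)^(t - t0) - 1} / gamma, with the gamma = 0 limit t - t0."""
    if gamma == 0.0:
        return float(t - t0)
    return ((1.0 + gamma) ** (t - t0) - 1.0) / gamma

# Hypothetical proliferation rates per time unit.
for gamma in (0.05, -0.01, 0.0):
    print(gamma, geometric_sum(gamma, 41), closed_form(gamma, 41))
```

The $\gamma = 0$ branch shows why a vanishing proliferation rate needs a separate expression: the sum degenerates to a count of time steps, and one more summation of such terms produces the $(t - t_0)\gamma_v$ correction seen in (50).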

Applying these results for time homogeneous models with $\gamma_i \neq \gamma_j$ if $i \neq j$, the $Q_i(j)$'s under the discrete approximation are given as follows.


(1) If $k = 2$, then $k - 1 = 1$. Hence $Q_2(0) = 1$, $Q_i(0) = Q_2(j) = 0$ for ($i = 0, 1$; $j > 0$), and for $j > 0$:

$$\begin{aligned}
Q_0(j) &\approx e^{-\theta_{11}\phi_{11}(t_{j-1}) - \lambda_{01}\phi_{01}(t_{j-1})} - e^{-\theta_{11}\phi_{11}(t_j) - \lambda_{01}\phi_{01}(t_j)} + o(\beta_1), \\
Q_1(j) &\approx (1 - \alpha_1)\left\{e^{-\lambda_{11}\phi_{11}(t_{j-1})} - e^{-\lambda_{11}\phi_{11}(t_j)}\right\} + o(\beta_1).
\end{aligned} \tag{51}$$

(2) If $k \geq 3$, then we have $Q_2(0) = \delta_{2(k-1)}\alpha$, $Q_i(0) = 0$, $i = 0, 1$, and for $j > 0$:

$$\begin{aligned}
Q_0(j) &\approx e^{-\theta_{1(k-1)}\phi_{1(k-1)}(t_{j-1}) - \lambda_{0(k-1)}\phi_{0(k-1)}(t_{j-1})} - e^{-\theta_{1(k-1)}\phi_{1(k-1)}(t_j) - \lambda_{0(k-1)}\phi_{0(k-1)}(t_j)} + o(\beta_{k-1}), \\
Q_1(j) &\approx e^{-\theta_{2(k-1)}\phi_{2(k-1)}(t_{j-1}) - \lambda_{1(k-1)}\phi_{1(k-1)}(t_{j-1})} - e^{-\theta_{2(k-1)}\phi_{2(k-1)}(t_j) - \lambda_{1(k-1)}\phi_{1(k-1)}(t_j)} + o(\beta_{k-1}), \\
Q_2(j) &\approx \delta_{2(k-1)}(1 - \alpha)\left\{e^{-\lambda_{22}\phi_{22}(t_{j-1})} - e^{-\lambda_{22}\phi_{22}(t_j)}\right\} \\
&\quad + (1 - \delta_{2(k-1)})\left\{e^{-\theta_{3(k-1)}\phi_{3(k-1)}(t_{j-1}) - \lambda_{2(k-1)}\phi_{2(k-1)}(t_{j-1})} - e^{-\theta_{3(k-1)}\phi_{3(k-1)}(t_j) - \lambda_{2(k-1)}\phi_{2(k-1)}(t_j)}\right\} + o(\beta_{k-1}).
\end{aligned} \tag{52}$$

Notice that if one replaces $[1 + \gamma_v]^{t - t_0}$ by $[1 + \gamma_v]^{t - t_0} = e^{(t - t_0)\log(1 + \gamma_v)} \approx e^{(t - t_0)\gamma_v}$, the above $Q_i(j)$'s from the discrete time model are exactly the same as those from the continuous model as given in equations (23)–(27), under the assumption that $P_T(s, t) = 1$ for $t - s > 0$. Notice that the assumption $P_T(s, t) = 1$ for $t - s > 0$ is equivalent to assuming that the last-stage cancer cells grow instantaneously into cancer tumors as soon as they are generated; see Remark 1.
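Each $Q_i(j)$ in (51)-(52) is a difference of two exponential terms evaluated at consecutive time points, so the values over successive age groups telescope. The sketch below illustrates this shape with a single rate and a single proliferation parameter; it is a simplified one-term analogue, not the full expression, and the names `rate`, `gamma`, and the time grid are hypothetical rather than estimates from the paper.

```python
from math import exp

def phi(gamma, t, t0=1):
    """One-term analogue of (49): {(1 + gamma)^(t - t0) - 1} / gamma."""
    return ((1.0 + gamma) ** (t - t0) - 1.0) / gamma

def q_interval(rate, gamma, t_prev, t_next, t0=1):
    """exp(-rate * phi(t_prev)) - exp(-rate * phi(t_next)), the shape of (51)-(52)."""
    return exp(-rate * phi(gamma, t_prev, t0)) - exp(-rate * phi(gamma, t_next, t0))

rate, gamma = 1e-6, 0.05          # hypothetical values
times = list(range(1, 172, 2))    # hypothetical grid: 6-month units, yearly age groups
qs = [q_interval(rate, gamma, a, b) for a, b in zip(times, times[1:])]

# Telescoping: the interval probabilities add up to 1 - exp(-rate * phi(last time)).
cumulative = 1.0 - exp(-rate * phi(gamma, times[-1]))
print(sum(qs), cumulative)
```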

5. The Fitting of the Model to Cancer Incidence and the Generalized Bayesian Inference Procedure

Given the model in Sections 2 and 3 and cancer incidence, one may use the results in Section 4 to fit the model. By using this model and the distribution results in Section 4, one can readily estimate the unknown genetic parameters, predict cancer incidence, and check the validity of the model by using the generalized Bayesian inference and Gibbs sampling procedures; for more detail, see Tan [3, 22] and Tan et al. [4, 5].

The generalized Bayesian inference is based on the posterior distribution $P\{\Theta \mid \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$ of $\Theta$ given $\{\mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$. This posterior distribution is derived by combining the prior distribution $P\{\Theta\}$ of $\Theta$ with the joint probability distribution $P\{\mathbf{N}, \mathbf{Y}, \tilde{y} \mid \tilde{n}, \Theta\}$ given $\{\tilde{n}, \Theta\}$ in (42). It follows that this inference procedure combines information from three sources: (1) previous information and experiences about the parameters, in terms of the prior distribution $P\{\Theta\}$ of the parameters; (2) biological information of inherited cancer cases via genetic segregation of cancer genes in the population ($P\{\mathbf{N} \mid \tilde{n}, p_i, i = 1, 2\}$; see Section 2); and (3) information from the expanded data ($\mathbf{Y}$) and the observed data ($\tilde{y}$) via the statistical model from the system ($P\{\mathbf{Y}, \tilde{y} \mid \mathbf{N}, \Theta\}$ given by (37) and (40)). Because of the additional information from the genetic segregation of the cancer genes, this inference procedure provides an efficient way to extract information on the effects of the genotypes of individuals at the embryo stage.

5.1. The Prior Distribution of the Parameters. For the prior distributions of $\Theta$, because biological information has suggested some lower bounds and upper bounds for the mutation rates and for the proliferation rates, we assume

$$P(\Theta) \propto c \quad (c > 0), \tag{53}$$

where $c$ is a positive constant if these parameters satisfy some biologically specified constraints and is equal to zero otherwise. These biological constraints are as follows:

(i) $0 < p_1 < 10^{-2}$, $0 < p_2 < 10^{-6}$, and $-0.01 < \gamma_i < 1$ ($i = 1, 2$).
(ii) For $\beta_j$ ($j = 0, 1, \ldots, k - 1$), we let $1 < \omega_0 = N(t_0)\beta_0 < 1000$ ($N \to I_1$) and $10^{-8} < \beta_i < 10^{-3}$, $i = 1, \ldots, k - 1$.
(iii) For the $\lambda_{j(k-1)}$ ($j = 0, 1, 2$), we let $0 < \lambda_i < 10$, $i = 0, 1$, and $0 < \lambda_2 < 10^3$.
(iv) For the $\theta_i$ ($i = 1, 2, 3$), we let $0 < \theta_1 < 10^{-2}$ and $0 < \theta_2 < 10^{-1}$.

We will refer to the above prior as a partially informative prior, which may be considered as an extension of the traditional noninformative prior given in Box and Tiao [28].
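In computational terms, a prior of this kind is constant inside a box and zero outside it, so on the log scale it only acts as a feasibility check. A minimal sketch follows, covering a subset of the constraints above; the dictionary keys and the function name are hypothetical labels, and the additive constant $\log c$ is dropped.

```python
from math import inf

# Open intervals (lower, upper) for a subset of the parameters; keys are hypothetical labels.
BOUNDS = {
    "p1": (0.0, 1e-2),
    "p2": (0.0, 1e-6),
    "lambda0": (0.0, 10.0),
    "lambda1": (0.0, 10.0),
    "lambda2": (0.0, 1e3),
    "theta1": (0.0, 1e-2),
    "theta2": (0.0, 1e-1),
}

def log_prior(params):
    """Return 0.0 inside the constraint box (log c dropped) and -inf outside it."""
    for name, (lower, upper) in BOUNDS.items():
        if not lower < params[name] < upper:
            return -inf
    return 0.0

inside = {"p1": 1e-3, "p2": 1e-7, "lambda0": 0.5, "lambda1": 2.0,
          "lambda2": 40.0, "theta1": 1e-3, "theta2": 1e-2}
outside = dict(inside, p2=1e-5)   # violates 0 < p2 < 1e-6
print(log_prior(inside), log_prior(outside))
```

Because the prior is flat inside the box, maximizing the posterior under it amounts to a constrained maximization of the likelihood, which is how Steps 3 and 4 of Section 5.3 are phrased.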

5.2. The Posterior Distribution of the Parameters Given $\{\mathbf{Y}, \mathbf{N}, \tilde{y}, \tilde{n}\}$. Denote by $\Theta_1 = \Theta - \{p_1, p_2, \alpha\}$. From the posterior distribution $P\{\Theta \mid \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$, we obtain

$$\begin{aligned}
P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}
&\propto (\chi(k))^{y_0} e^{-\chi(k)}\, p_1^{\sum_{j=1}^{t_N} n_{1j}}\, p_2^{\sum_{j=1}^{t_N} n_{2j}} \times (1 - p_1 - p_2)^{\sum_{j=1}^{t_N} n_{0j}}, \quad 0 < p_1, p_2, \alpha < 1, \\
P\{\Theta_1 \mid (p_1, p_2, \alpha), \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}
&\propto \prod_{j=1}^{t_N} \prod_{i=0}^{2} e^{-Q_i(j)} [Q_i(j)]^{y_{ij}}, \quad \Theta_1 \in \Omega,
\end{aligned} \tag{54}$$


where $\Omega$ is the parameter space of $\Theta_1$ provided by the biological constraints in Section 5.1.

For computational convenience, we notice that the log of $P\{p_1, p_2, \alpha \mid \Theta_1, \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$ is proportional to the negative of $D_0(k) + \text{Dev}(p_1, p_2)$ given by (44)-(45); similarly, the log of $P\{\Theta_1 \mid (p_1, p_2, \alpha), \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$ is proportional to the negative of $\sum_{j=1}^{k} D_j$ given by (46).

5.3. The Multilevel Gibbs Sampling Procedure for Estimating Unknown Parameters. Given the posterior probability distributions, we will use the following multilevel Gibbs sampling procedure to derive estimates of the parameters. We notice that numerically the Gibbs sampling procedure given below is equivalent to the EM algorithm from the sampling theory viewpoint, with Steps 1 and 2 as the E-step and with Steps 3 and 4 as the M-step, respectively [29]. These multilevel Gibbs sampling procedures are given by the following.

Step 1 (Generating $\mathbf{N}$ Given $(\mathbf{Y}, \tilde{y}, \tilde{n}, \Theta)$ (The Data-Augmentation Step 1)). Given $\Theta$ and given $\tilde{n}$, use the multinomial distribution of $\{n_{1j}, n_{2j}\}$ given $n_j$ in Section 3 to generate a large sample of $\mathbf{N}$. Then, combine this large sample with $P\{\mathbf{Y}, \tilde{y} \mid \mathbf{N}, \tilde{n}, \Theta\}$ in (37) and (40) to select $\mathbf{N}$ through the weighted bootstrap method due to Smith and Gelfand [30]. This selected $\mathbf{N}$ is then a sample from $P\{\mathbf{N} \mid \mathbf{Y}, \tilde{y}, \tilde{n}, \Theta\}$ even though the latter is unknown (for proof, see Tan [22], Chapter 3). Call the generated sample $\hat{\mathbf{N}}$.
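The weighted bootstrap in Step 1 draws candidates from the multinomial (prior) distribution and then resamples them with weights proportional to the likelihood of the observed counts. The sketch below shows the idea for one age group under simplifying assumptions that are ours, not the paper's: only two genotype classes are used (so the multinomial reduces to a binomial), and all numerical values and function names are hypothetical.

```python
import random
from math import exp, lgamma, log

def poisson_logpmf(y, mean):
    """Log of the Poisson probability of y events (mean must be positive)."""
    return y * log(mean) - mean - lgamma(y + 1)

def weighted_bootstrap(candidates, log_weights, rng):
    """Resample one candidate with probability proportional to exp(log_weight)."""
    top = max(log_weights)                       # subtract the maximum for stability
    weights = [exp(w - top) for w in log_weights]
    return rng.choices(candidates, weights=weights, k=1)[0]

rng = random.Random(0)
n_j, p1, q0, q1, y_j = 2000, 0.01, 1e-4, 5e-2, 4   # all hypothetical

# Proposal: number of carriers n_1j ~ Binomial(n_j, p1); the rest are non-carriers.
candidates = [sum(rng.random() < p1 for _ in range(n_j)) for _ in range(200)]

# Weight: Poisson likelihood of the observed y_j given each candidate split.
log_w = [poisson_logpmf(y_j, (n_j - n1) * q0 + n1 * q1) for n1 in candidates]

n1_selected = weighted_bootstrap(candidates, log_w, rng)
print(n1_selected)
```

The selected value is approximately a draw from the conditional distribution of the carrier count given the data, with the approximation improving as the number of candidates grows.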

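Step 2 (Generating $\mathbf{Y}$ Given $(\mathbf{N}, \tilde{y}, \tilde{n}, \Theta)$ (The Data-Augmentation Step 2)). Given $\{\tilde{y}, \tilde{n}, \Theta\}$ and given $\mathbf{N} = \hat{\mathbf{N}}$ generated from Step 1, generate $\mathbf{Y}$ from the probability distribution $P\{\mathbf{Y} \mid \hat{\mathbf{N}}, \tilde{y}, \tilde{n}, \Theta\}$ given by (36) and (38). Call the generated sample $\hat{\mathbf{Y}}$.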

Step 3 (Estimation of $\{p_i, i = 1, 2, \alpha\}$ Given $\{\Theta_1, \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$). Given $\{\tilde{y}, \tilde{n}, \Theta_1\}$ and given $(\mathbf{N}, \mathbf{Y}) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}})$ from Steps 1 and 2, derive the posterior mode of $\{p_i, i = 1, 2, \alpha\}$ by maximizing the conditional posterior distribution $P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \hat{\mathbf{N}}, \hat{\mathbf{Y}}, \tilde{y}, \tilde{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $D_0(k) + \text{Dev}(p_1, p_2)$ given by (44)-(45) in Section 4.3 under the constraints given in Section 5.1. Denote this generated mode by $\{\hat{p}_i, i = 1, 2, \hat{\alpha}\}$.

Step 4 (Estimation of $\Theta_1$ Given $\{p_i, i = 1, 2, \alpha, \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$). Given $(\tilde{y}, \tilde{n})$ and given $(\mathbf{N}, \mathbf{Y}, p_i, i = 1, 2, \alpha) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}}, \hat{p}_i, i = 1, 2, \hat{\alpha})$ from Steps 1–3, derive the posterior mode of $\Theta_1$ by maximizing the conditional posterior distribution $P\{\Theta_1 \mid \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\mathbf{N}}, \hat{\mathbf{Y}}, \tilde{y}, \tilde{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $\sum_{j=1}^{k} D_j$ in (46) under the constraints. Denote the generated mode as $\hat{\Theta}_1$.

Step 5 (Recycling Step). With $\{\mathbf{N}, \mathbf{Y}, p_i, i = 1, 2, \alpha, \Theta_1\} = \{\hat{\mathbf{N}}, \hat{\mathbf{Y}}, \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1\}$ given above, go back to Step 1 and continue until convergence.

The proof of convergence of the above steps can be derived by using the procedure given in Tan ([22], Chapter 3). At convergence, the $\hat{\Theta} = \{\hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1\}$ are the generated values from the posterior distribution of $\Theta$ given $\{\tilde{y}, \tilde{n}\}$, independently of $(\mathbf{N}, \mathbf{Y})$ (for proof, see Tan [22], Chapter 3). Repeating the above procedures, one then generates a random sample of $\Theta$ from the posterior distribution of $\Theta$ given $\{\tilde{y}, \tilde{n}\}$; one then uses the sample mean as the estimate of $\Theta$ and the sample variances and covariances as estimates of the variances and covariances of these estimates.

6. A New Multistage Stochastic Model for Adult Eye Cancer (Uveal Melanoma): An Example

Human eye cancers consist of pediatric eye cancers and adult eye cancers. The most common pediatric eye cancer is retinoblastoma, which develops from the retinal pigment epithelium cells underlying the retina that do not form melanoma. The most common adult eye cancers are the uveal melanomas involving the iris, the ciliary body, and the choroid (collectively referred to as the uvea). These cancers develop from melanocytes (pigment cells) which reside within the uvea, giving color to the eye. In Tan and Zhou [9], we have developed a modified two-stage model for retinoblastoma. Based on results from molecular biology (see Landreville et al. [14], Mensink et al. [15], and Loercher and Harbour [31]), Landreville et al. [14] have proposed a three-stage model for uveal melanoma as given in Figure 2. As an example of applications of this paper, in this section we will apply this model of uveal melanoma to the NCI/NIH eye cancer data from the SEER project. We notice that the same methods can be applied to model other human cancers as well, but this will be our future research.

Given in Table 1 are the numbers of people at risk and the eye cancer cases in the age groups, together with the predicted cases from the models. These data give cancer incidence at birth and incidence for 85 age groups ($k = 85$), with each group spanning a 1-year period except the last age group ($\geq 85$ years old). For human eye cancer, because the incidence at birth and for age groups from 1 to 10 years old is basically generated by the pediatric eye cancer retinoblastoma (see [9]), to account for inherited cancer cases of uveal melanomas, the incidence for age 0 (birth) and for age periods from 1 to 10 years old in Table 1 for uveal melanoma is derived by subtracting the incidence of retinoblastoma from SEER data (see Tan and Zhou [9]).

To fit the data, we let one time unit be 6 months after birth and let $t_0 = 1$. To compare different models and to assess different assumptions, we will consider the following 2-3-stage mixture models: (1) the complete 3-stage mixture model (Model-F), in which no assumptions are made on the parameters; and (2) the Type-1 3-stage mixture model (Model-1), in which we assume that $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth. For comparison purposes, we also fit a 2-stage model as defined in Tan and Zhou [9]. We will apply the methods in Section 5 to fit these models to the SEER data given in Table 1.


Table 2: The log-likelihood, AIC, and BIC of the fitted models.

| Models | Log-likelihood | AIC | BIC |
|---|---|---|---|
| Three-stage, Model-F | −1312.02 | 2648.04 | 2677.35 |
| Three-stage, Model-1 | −1295.76 | 2609.53 | 2631.51 |
| Two-stage | −4392.421 | 8796.843 | 8811.569 |
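The fit criteria reported here follow the usual definitions, AIC $= -2\log L + 2p$ and BIC $= -2\log L + p\log n$, and the goodness-of-fit statistic is the Pearson chi-square. The sketch below is an arithmetic illustration only. It assumes that the Model-F log-likelihood reads as −1312.02 and that Model-F has 12 free parameters (the count implied by df = 86 − 12 in the discussion of the fit); the function names are ours, and the three observed and predicted counts are the age 0–2 rows of Table 1 for Model-F.

```python
from math import log

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 log L + 2 p."""
    return -2.0 * log_likelihood + 2.0 * n_params

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: -2 log L + p log n."""
    return -2.0 * log_likelihood + n_params * log(n_obs)

def chi_square(observed, predicted):
    """Pearson statistic: sum of (observed - predicted)^2 / predicted."""
    return sum((y - f) ** 2 / f for y, f in zip(observed, predicted))

model_f_aic = aic(-1312.02, 12)                      # assumed reading of Table 2
first_rows = chi_square([34, 20, 14], [36, 11, 7])   # ages 0, 1, 2 for Model-F
print(model_f_aic, first_rows)
```

Under these assumptions the computed AIC agrees with the Model-F entry in Table 2, which supports reading the table values with two decimal places.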

Figure 5: Curve fitting of SEER data by Model-F, Model-1, and the two-stage model. (Horizontal axis: age in years, 0 to 84; vertical axis: incidences, 0 to 350; curves: observed, Model-F, Model-1, and two-stage.)

Given in Table 2 are the natural logs of the likelihood functions, the AIC (Akaike Information Criterion), and the BIC (Bayesian Information Criterion) for these models. Given in Table 3 are the estimates of parameters in the 3-stage models. Given in Figure 5 are the plots of predicted cancer cases from the 3-stage mixture models (Model-F and Model-1) and the 2-stage model. For comparison purposes, Table 1 also provides the numbers of predicted cancer cases from the 3-stage mixture models and the 2-stage model, together with the observed cancer cases over time from SEER. From these results, we have made the following observations.

(a) As shown by the results in Table 1 and Figure 5, both Model-F and Model-1 fitted the SEER data well, although Model-1 fitted the data slightly better according to the AIC and BIC values. The chi-square test statistics $\chi^2 = \sum_{i=0}^{85} (y_i - \hat{y}_i)^2 / \hat{y}_i$ for Model-F and Model-1 are 88.43 and 94.48, respectively, giving a $P$-value of 0.12 (df = 86 − 12 = 74) for Model-F and a $P$-value of 0.11 (df = 86 − 7 = 79) for Model-1. On the other hand, the 2-stage model fitted the data very poorly: its chi-square statistic is 2747.69, giving a $P$-value less than $10^{-3}$. The AIC and BIC values of Model-1 (AIC = 2609.53, BIC = 2631.51) are slightly smaller than those of Model-F, whereas the AIC and BIC values of the two-stage model (8796.84 and 8811.57) are considerably greater than those of the 3-stage models. These results suggest that uveal melanoma may best be described by a 3-stage model with an inherited component and that one may practically assume $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage remain normal people and $J_1$ people, respectively, at birth.

(b) From Table 3, the estimate of $\gamma_1$ is close to zero (the estimate is of order $10^{-5}$), indicating that the phenotype of $J_1$ is almost identical to that of $N = J_0$. This further confirms that the stage-limiting genes are basically tumor suppressor genes and that there is no haploinsufficiency for these tumor suppressor genes. On the other hand, the estimate of $\gamma_2$ is of order $10^{-2}$, which is about $10^{3}$ times greater than that of cells with genotype $J_1$.

(c) From Table 3, the estimates $\hat{\lambda}_j$ ($j = 0, 1, 2$) of $\lambda_j$ are of order $10^{-1}$, $10^{-1}$, and $10^{2} \sim 10^{3}$, respectively. Because $\lambda_i = E[J_i(t_0, i)]\,\beta_i \prod_{u=i+1}^{2} (\beta_u / \gamma_u)$, $i = 0, 1, 2$, assuming some values of $E[J_i(t_0, i)]$, $i = 0, 1, 2$, from biological observations gives a rough idea of the magnitude of $\beta_j$ ($j = 0, 1, 2$). For example, if we follow Potten et al. [32] and assume $E[N(t_0)] = E[J_i(t_0, i)] \sim 10^{8}$, $i = 0, 1, 2$, then $\beta_j \approx 10^{-6} \sim 10^{-5}$.

(d) From Table 3, the estimates of $p_1$ and $p_2$ from the SEER data are of orders $10^{-4} \sim 10^{-3}$ and $10^{-7} \sim 10^{-6}$, respectively. This indicates that in the US population the frequency of the stage-limiting cancer gene for uveal melanoma is approximately $10^{-3}$. Table 3 also shows that the estimate of $\alpha$ was 0.8411, indicating that most individuals with genotype $J_2$ would develop cancer at birth. This may help to explain why there are observed cancer incidences at birth for uveal melanoma in the SEER data even though the estimate of the frequency $p_2$ is of order $10^{-7} \sim 10^{-6}$.
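The order-of-magnitude argument for the mutation rates can be sketched numerically. The snippet below back-solves $\beta_2$, $\beta_1$, $\beta_0$ from the relation $\lambda_i = E[J_i(t_0, i)]\,\beta_i \prod_{u>i} (\beta_u/\gamma_u)$, working from the last stage down. It is only an illustration under stated assumptions: the Model-F point estimates are read from Table 3 as 2.88E−01, 4.81E−01, 6.55E+02 for the $\lambda$'s and 2.71E−05, 3.37E−02 for $\gamma_1, \gamma_2$, the expected cell number $10^8$ is the value assumed in the text, and the function name is hypothetical.

```python
# Model-F point estimates as read from Table 3 (assumed decimal placement).
lam = [2.88e-01, 4.81e-01, 6.55e+02]   # lambda_0, lambda_1, lambda_2
gamma = [None, 2.71e-05, 3.37e-02]     # gamma_1, gamma_2 (gamma_0 is not needed)
expected_cells = 1e8                   # assumed E[J_i(t0, i)], as in the text


def back_solve_betas(lam, gamma, n_cells):
    """Solve lambda_i = n_cells * beta_i * prod_{u>i}(beta_u / gamma_u)
    for beta_i, starting from the last stage and moving down to i = 0."""
    beta = [0.0] * len(lam)
    ratio = 1.0  # running product of beta_u / gamma_u over u > i
    for i in range(len(lam) - 1, -1, -1):
        beta[i] = lam[i] / (n_cells * ratio)
        if i > 0:
            ratio *= beta[i] / gamma[i]
    return beta


betas = back_solve_betas(lam, gamma, expected_cells)
for i, b in enumerate(betas):
    print(f"beta_{i} ~ {b:.2e}")
```

With these inputs all three rates fall between $10^{-6}$ and $10^{-4}$, in line with the rough magnitude stated in observation (c).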

7. Discussion and Conclusions

To account for inherited cancer cases, in this paper we have developed some general multistage models involving hereditary cancer cases. For human cancer incidence, these models are basically generalized mixture models. In these mixture models, the mixing probability is a multinomial distribution that accounts for genetic segregation of the stage-limiting tumor suppressor genes. This mixture model allows us to estimate, for the first time, the frequency of the stage-limiting tumor suppressor gene in human populations. As an example of applications, in this paper we have developed a general 3-stage stochastic multistage model of carcinogenesis for adult human eye cancer. To account for inherited cancer cases in the stochastic model of human eye cancer, we have also developed a generalized mixture model for uveal melanoma in human beings.

To use the proposed models to fit the cancer incidence data, in this paper we have developed a generalized Bayesian inference procedure to estimate the unknown parameters and to predict cancer cases. This inference procedure is advantageous over classical sampling theory inference (i.e., the maximum likelihood method) because the procedure combines


Table 3: Estimates of parameters for the 3-stage stochastic models.

Model-F
Parameter     Estimate    StD         95% CL lower   95% CL upper
$\lambda_0$   2.88E−01    1.21E−01    5.03E−02       5.25E−01
$\lambda_1$   4.81E−01    4.65E−02    3.89E−01       5.72E−01
$\lambda_2$   6.55E+02    1.12E+02    4.34E+02       8.75E+02
$\gamma_1$    2.71E−05    8.86E−06    5.34E−06       4.01E−05
$\gamma_2$    3.37E−02    1.92E−04    3.33E−02       3.41E−02
$p_1$         9.95E−04    1.75E−06    9.91E−04       9.98E−04
$p_2$         9.68E−07    6.48E−09    9.55E−07       9.81E−07
$\alpha$      8.41E−01    4.19E−02    7.59E−01       9.23E−01
$\theta_1$    3.29E−04    3.54E−05    2.60E−04       3.98E−04
$\theta_2$    3.41E−01    1.07E−02    3.20E−01       3.62E−01

Model-1
Parameter     Estimate    StD         95% CL lower   95% CL upper
$\lambda_0$   6.55E−02    2.54E−02    1.57E−02       1.15E−01
$\lambda_1$   3.72E−01    4.63E−02    2.82E−01       4.63E−01
$\lambda_2$   1.98E+02    3.38E+01    1.32E+02       2.65E+02
$\gamma_1$    NA          NA          NA             NA
$\gamma_2$    3.49E−02    3.58E−03    2.79E−02       4.19E−02
$p_1$         9.98E−04    6.27E−05    8.75E−04       1.12E−03
$p_2$         1.00E−06    4.53E−08    9.11E−07       1.09E−06
$\alpha$      8.07E−01    7.06E−02    6.68E−01       9.45E−01
$\theta_1$    NA          NA          NA             NA
$\theta_2$    NA          NA          NA             NA

Note: NA indicates assumed nonexistence.


information from three sources: (1) previous information and experiences about the parameters in terms of the prior distribution $P\{\Theta\}$ of the parameters, (2) biological information of inherited cancer cases via the genetic segregation of stage-limiting tumor suppressor genes in the population, and (3) information from the expanded data ($\mathbf{Y}$) and the observed data ($\underset{\sim}{y}$) via the statistical model from the system ($P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \Theta\}$ given by (37) and (40)).

To illustrate the usefulness and applications of our models and methods, we have applied them to the eye cancer SEER data of NCI/NIH. Our analysis showed that the proposed 3-stage model with inherited cancer cases fitted the data well (see Table 2 and Figure 5); on the other hand, the classical 2-stage model cannot fit the data at all (see Table 2 and Figure 5). These results support the finding from molecular biology that human eye cancer is described by a 3-stage model with an inherited cancer component. Notice, however, that our 3-stage multistage model is more general than the classical 3-stage model as described in Little [1], Tan [2], and Zheng [7], in that we postulate that cancer tumors develop from primary $I_3$ cells by clonal expansion (see Yang and Chen [11]). (Note that the stochastic multistage models in the literature assume that cancer tumors develop from last-stage cells immediately as soon as they are generated, ignoring cancer progression completely; see Remark 1.) As a matter of fact, we had assumed $\Delta t = 1$ for a period of three months and found that the 3-stage models then did not fit the SEER data, indicating that $P_T(s, t) < 1$ over a period of three months.

Applying our models and methods to the SEER data of human eye cancer, we have derived for the first time some useful pieces of information. Specifically, we mention the following. (1) For the first time, we have estimated the frequency of the stage-limiting tumor suppressor gene in the US population ($p_1 \sim 0.9948 \times 10^{-3}$). (2) With the estimate of $\alpha$ as $\hat{\alpha} = 0.8411$, the predicted number of uveal melanoma cases at birth is $\hat{y}_0 = n_0 \hat{\alpha} \hat{p}_2 = 36$ by the 3-stage models with inherited cancer component (Model-F and Model-1). (The observed number of eye cancer cases at birth is 34.) (3) The estimate of the proliferation rate ($\gamma_1$) of $J_1$ cells using Model-F is $\hat{\gamma}_1 = 2.271 \times 10^{-5} \sim 0$. (The estimate is $8.603 \times 10^{-5}$ using Model-1.) This confirms that the stage-1 limiting gene is a tumor suppressor gene and that, unlike the p53 gene in chromosome 17p (see [33]), there is little or no haploinsufficiency for this gene in cells with genotype $J_1$.

Using the models and methods of this paper, one can predict future cancer cases for human eye cancer. Thus, by comparing results from different populations, our models and methods can be used to assess cancer prevention and control procedures. This will be the topic of our future research; we will not go any further here.

Appendix

The Expected Numbers of State Variables under Discrete Time Approximation

Under discrete time, the stochastic differential equations of state variables reduce to the following stochastic difference equations of state variables, respectively:

$$
\begin{aligned}
J_i(t+1, i) &= J_i(t, i)\,[1 + \gamma_i(t)] + e_i(t+1, i), \quad i = 0, 1, \ldots, \operatorname{Min}(k-1, 2), \\
J_j(t+1, u) &= J_j(t, u)\,[1 + \gamma_j(t)] + J_{j-1}(t, u)\,\beta_{j-1}(t) + e_j(t+1, u), \quad 0 \le u < j \le k-1,
\end{aligned}
\tag{A.1}
$$

where $e_i(t+1, i) = [B_i(t, i) - J_i(t, i)\,b_i(t)] - [D_i(t, i) - J_i(t, i)\,d_i(t)]$ and $e_j(t+1, i) = [B_j(t, i) - J_j(t, i)\,b_j(t)] - [D_j(t, i) - J_j(t, i)\,d_j(t)] + [M_{j-1}(t, i) - J_{j-1}(t, i)\,\beta_{j-1}(t)]$ for $j > i$.

The initial conditions at birth ($t_0$) for the above stochastic difference equations are $J_j(t_0, i) > 0$ if $j = i, i+1$ and $J_j(t_0, i) = 0$ if $j > i+1$. The solution of the above difference equations under these initial conditions is given, respectively, by

$$
\begin{aligned}
J_i(t, i) &= J_i(t_0, i) \prod_{s=t_0}^{t-1} (1 + \gamma_i(s)) + \sum_{s=t_0+1}^{t} e_i(s, i) \prod_{u=s}^{t-1} (1 + \gamma_i(u)), \quad i = 0, 1, \ldots, \operatorname{Min}(k-1, 2), \\
J_j(t, i) &= J_j(t_0, i) \prod_{s=t_0}^{t-1} (1 + \gamma_j(s)) + \sum_{s=t_0}^{t-1} J_{j-1}(s, i)\,\beta_{j-1}(s) \prod_{u=s+1}^{t-1} (1 + \gamma_j(u)) \\
&\quad + \sum_{s=t_0+1}^{t} e_j(s, i) \prod_{u=s}^{t-1} (1 + \gamma_j(u)), \quad 0 \le i < j \le k-1.
\end{aligned}
\tag{A.2}
$$
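The structure of these solutions can be checked numerically on the expected values, where the zero-mean noise terms $e_j$ drop out. The sketch below is an illustration only, assuming a time-homogeneous model with just two compartments ($J_0$ and $J_1$), hypothetical rate values, and hypothetical function names. It iterates the expected-value version of the difference equations and compares the result with the closed form $E[J_1(t)] = J_1(t_0)\,a_1^{n} + J_0(t_0)\,\beta_0\,(a_1^{n} - a_0^{n})/(a_1 - a_0)$, where $a_i = 1 + \gamma_i$ and $n = t - t_0$; this closed form is the standard solution of the two-compartment linear recursion when $\gamma_0 \ne \gamma_1$.

```python
import math


def iterate_expected(j0_init, j1_init, gamma0, gamma1, beta0, steps):
    """Iterate the expected-value recursion (noise terms have mean zero)."""
    j0, j1 = j0_init, j1_init
    for _ in range(steps):
        # Simultaneous update: the right-hand side uses the old j0 and j1.
        j0, j1 = j0 * (1 + gamma0), j1 * (1 + gamma1) + j0 * beta0
    return j0, j1


def closed_form(j0_init, j1_init, gamma0, gamma1, beta0, steps):
    """Closed-form solution, valid when gamma0 != gamma1."""
    a0, a1 = 1 + gamma0, 1 + gamma1
    j0 = j0_init * a0 ** steps
    j1 = (j1_init * a1 ** steps
          + j0_init * beta0 * (a1 ** steps - a0 ** steps) / (a1 - a0))
    return j0, j1


# Hypothetical values chosen only to exercise the formulas.
args = dict(j0_init=1e8, j1_init=1e5, gamma0=1e-3, gamma1=2e-2,
            beta0=1e-6, steps=40)
iterated = iterate_expected(**args)
exact = closed_form(**args)
print(iterated, exact)
```

The agreement between the two computations reflects the same product-and-sum structure that appears in the general solution, specialized to constant rates.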

If the model is time homogeneous so that $\beta_j(t) = \beta_j$ and $\gamma_j(t) = \gamma_j$ ($j = i, \ldots, k-1$), and if $\gamma_i \ne \gamma_j$ for $i \ne j$, then the above solutions under the initial conditions ($J_j(t_0, i) = 0$, $j > i+1$) reduce to

$$
\begin{aligned}
J_i(t, i) &= J_i(t_0, i)\,(1 + \gamma_i)^{t - t_0} + \eta_i^{(0)}(t, i), \quad i = 0, 1, \ldots, \operatorname{Min}(k-1, 2), \\
J_j(t, i) &= J_j(t_0, i)\,(1 + \gamma_j)^{t - t_0} + \sum_{u=i}^{j-1} J_u(t_0, i) \Big( \prod_{v=u}^{j-1} \beta_v \Big) \sum_{r=u}^{j} A_{uj}(r)\,(1 + \gamma_r)^{t - t_0} \\
&\quad + \eta_j^{(0)}(t, i) + \sum_{u=1}^{j-i} \Big( \prod_{r=j-u}^{j-1} \beta_r \Big) \eta_j^{(u)}(t, i) \\
&= \sum_{u=i}^{i+1} J_u(t_0, i) \Big( \prod_{v=u}^{j-1} \beta_v \Big) \sum_{r=u}^{j} A_{uj}(r)\,(1 + \gamma_r)^{t - t_0} \\
&\quad + \eta_j^{(0)}(t, i) + \sum_{u=1}^{j-i} \Big( \prod_{r=j-u}^{j-1} \beta_r \Big) \eta_j^{(u)}(t, i), \quad 0 \le i < j \le k-1,
\end{aligned}
\tag{A.3}
$$

where $\eta_j^{(0)}(t, i) = \sum_{s=t_0+1}^{t} e_j(s, i)\,(1 + \gamma_j)^{t-s}$ and $\eta_j^{(u)}(t, i) = \sum_{s=t_0+1}^{t} e_{j-u}(s, i) \sum_{v=u}^{j} A_{uj}(v)\,(1 + \gamma_v)^{t-s}$ ($u = 1, \ldots, j-i$).

Thus, if the model is time homogeneous and if $\gamma_i \ne \gamma_j$ for $i \ne j$, the $E[J_{k-1}(t, i)]$'s in discrete time models under the initial conditions ($J_j(t_0, i) = 0$, $j > i+1$) are given, respectively, by

$$
E[J_{k-1}(t, i)] = \sum_{u=i}^{i+1} E[J_u(t_0, i)] \Big( \prod_{v=u}^{k-2} \beta_v \Big) \sum_{r=u}^{k-1} A_{u(k-1)}(r)\,(1 + \gamma_r)^{t - t_0}, \quad i = 0, 1, \ldots, \operatorname{Min}(k-1, 2).
\tag{A.4}
$$

References

[1] M. P. Little, “Cancer models, ionization, and genomic instability: a review,” in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 5, World Scientific, River Edge, NJ, USA, 2008.

[2] W. Y. Tan, Stochastic Models of Carcinogenesis, Marcel Dekker, New York, NY, USA, 1991.

[3] W. Y. Tan, “Stochastic multi-stage models of carcinogenesis as hidden Markov models: a new approach,” International Journal of Systems and Synthetic Biology, vol. 1, pp. 313–337, 2010.

[4] W. Y. Tan, L. J. Zhang, and C. W. Chen, “Stochastic modeling of carcinogenesis: state space models and estimation of parameters,” Discrete and Continuous Dynamical Systems B, vol. 4, no. 1, pp. 297–322, 2004.

[5] W. Y. Tan, C. W. Chen, and L. J. Zhang, “Cancer biology, cancer models and stochastic mathematical analysis of carcinogenesis,” in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 3, World Scientific, River Edge, NJ, USA, 2008.

[6] R. A. Weinberg, The Biology of Human Cancer, Garland Sciences, Taylor and Francis, New York, NY, USA, 2007.

[7] Q. Zheng, “Stochastic multistage cancer models: a fresh look at an old approach,” in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 2, World Scientific, River Edge, NJ, USA, 2008.

[8] W. Y. Tan, L. J. Zhang, W. Chen, and J. M. Zhu, “A stochastic model of human colon cancer involving multiple pathways,” in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 11, World Scientific, River Edge, NJ, USA, 2008.

[9] W. Y. Tan and H. Zhou, “A new stochastic model of retinoblastoma involving both hereditary and non-hereditary cancer cases,” Journal of Carcinogenesis and Mutagenesis, vol. 2, no. 2, article 117, 2011.

[10] X. W. Yan, Stochastic and State Space Models of Carcinogenesis Under Complex Situation [Ph.D. thesis], Department of Mathematical Sciences, University of Memphis, Memphis, Tenn, USA.

[11] G. L. Yang and C. W. Chen, “A stochastic two-stage carcinogenesis model: a new approach to computing the probability of observing tumor in animal bioassays,” Mathematical Biosciences, vol. 104, no. 2, pp. 247–258, 1991.

[12] H. Osada and T. Takahashi, “Genetic alterations of multiple tumor suppressors and oncogenes in the carcinogenesis and progression of lung cancer,” Oncogene, vol. 21, no. 48, pp. 7421–7434, 2002.

[13] I. I. Wistuba, L. Mao, and A. F. Gazdar, “Smoking molecular damage in bronchial epithelium,” Oncogene, vol. 21, no. 48, pp. 7298–7306, 2002.

[14] S. Landreville, O. A. Agapova, and J. W. Harbour, “Emerging insights into the molecular pathogenesis of uveal melanoma,” Future Oncology, vol. 4, no. 5, pp. 629–636, 2008.

[15] H. W. Mensink, D. Paridaens, and A. De Klein, “Genetics of uveal melanoma,” Expert Review of Ophthalmology, vol. 4, no. 6, pp. 607–616, 2009.

[16] W. Y. Tan and X. W. Yan, “A new stochastic and state space model of human colon cancer incorporating multiple pathways,” Biology Direct, vol. 5, article 26, 2010.

[17] L. B. Klebanov, S. T. Rachev, and A. Y. Yakovlev, “A stochastic model of radiation carcinogenesis: latent time distributions and their properties,” Mathematical Biosciences, vol. 113, no. 1, pp. 51–75, 1993.

[18] A. Y. Yakovlev and A. D. Tsodikov, Stochastic Models of Tumor Latency and Their Biostatistical Applications, World Scientific, River Edge, NJ, USA, 1996.

[19] H. Fakir, W. Y. Tan, L. Hlatky, P. Hahnfeldt, and R. K. Sachs, “Stochastic population dynamic effects for lung cancer progression,” Radiation Research, vol. 172, no. 3, pp. 383–393, 2009.

[20] A. G. Knudson, “Mutation and cancer: statistical study of retinoblastoma,” Proceedings of the National Academy of Sciences of the United States of America, vol. 68, no. 4, pp. 820–823, 1971.

[21] J. F. Crow and M. Kimura, An Introduction to Population Genetics Theory, Harper and Row, New York, NY, USA, 1970.

[22] W. Y. Tan, Stochastic Models With Applications to Genetics, Cancers, AIDS and Other Biomedical Systems, World Scientific, River Edge, NJ, USA, 2002.

[23] W. Y. Tan, C. W. Chen, and L. J. Zhang, “Cancer risk assessment by state space models,” in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 13, World Scientific, River Edge, NJ, USA, 2008.

[24] W. Y. Tan and C. W. Chen, “Stochastic modeling of carcinogenesis: some new insights,” Mathematical and Computer Modelling, vol. 28, no. 11, pp. 49–71, 1998.

[25] W. Y. Tan and C. W. Chen, “Cancer stochastic models,” in Encyclopedia of Statistical Sciences, Revised edition, John Wiley and Sons, New York, NY, USA, 2005.

[26] E. G. Luebeck and S. H. Moolgavkar, “Multistage carcinogenesis and the incidence of colorectal cancer,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 23, pp. 15095–15100, 2002.

[27] R. Durrett, J. Mayberry, S. Moseley, and D. Schmidt, “Probability models for cancer development and progression,” Google Search, Google, 2012.

[28] G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis, Addison-Wesley, Reading, Mass, USA, 1973.


[29] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm (with discussion),” Journal of the Royal Statistical Society B, vol. 39, pp. 1–38, 1977.

[30] A. F. M. Smith and A. E. Gelfand, “Bayesian statistics without tears: a sampling-resampling perspective,” American Statistician, vol. 46, pp. 84–88, 1992.

[31] A. E. Loercher and J. W. Harbour, “Molecular genetics of uveal melanoma,” Current Eye Research, vol. 27, no. 2, pp. 69–74, 2003.

[32] C. S. Potten, C. Booth, and D. Hargreaves, “The small intestine as a model for evaluating adult tissue stem cell drug targets,” Cell Proliferation, vol. 36, no. 3, pp. 115–129, 2003.

[33] C. J. Lynch and J. Milner, “Loss of one p53 allele results in four-fold reduction of p53 mRNA and protein: a basis for p53 haplo-insufficiency,” Oncogene, vol. 25, no. 24, pp. 3463–3470, 2006.


4.2. The Probability Distribution of $Y_j$ ($j \ge 1$). To derive the probability distribution of $Y_j$ ($j \ge 1$) in the $j$th age group, let $Y_{ij}$ ($i = 0, 1, 2$) be the number of cancer cases generated by people who have genotype $J_i$ at the embryo stage among these $Y_j$ cancer cases. Then $Y_j = \sum_{i=0}^{2} Y_{ij}$, and $Y_{0j}$ is the number of cancer cases generated by the $n_{0j} = n_j - n_{1j} - n_{2j}$ normal people in the population. The conditional probability distribution of $Y_{ij}$ given $n_{ij}$ is

$$
Y_{ij} \mid n_{ij} \sim \text{Poisson}\{n_{ij} Q_i(j)\}, \quad i = 0, 1, 2.
\tag{33}
$$

Notice that if $k = 2$ (a 2-stage model), then all $J_2$ individuals would develop tumor at or before birth. Thus, if $k = 2$, then $Y_{2j} = 0$ for all $j > 0$, so that if $j > 0$, cancer cases develop only from normal people ($N = J_0$ people) and $J_1$ people. On the other hand, if $k > 2$, then with positive probability $Y_{ij} > 0$ for all ($i = 0, 1, 2$; $j = 0, 1, \ldots, n_T$), where $n_T$ is the last time point in the data. Let $\delta_{2k} = 1$ if $k = 2$ and $\delta_{2k} = 0$ if $k \ne 2$. Then $Y_j \mid (n_{ij}, i = 0, 1, 2) \sim \text{Poisson}\{Q_T(j)\}$, where $Q_T(j) = \sum_{i=0}^{1} n_{ij} Q_i(j) + (1 - \delta_{2k})\, n_{2j} Q_2(j)$. Since $(n_{0j}, n_{1j}) \mid n_j \sim \text{Multinomial}(n_j; p_0, p_1)$, we have for the conditional probability density function $P(y_j \mid n_j)$ of $Y_j$ given $n_j$

$$
P(y_j \mid n_j) = \sum_{n_{0j}=0}^{n_j} \sum_{n_{1j}=0}^{n_j - n_{0j}} g(n_{0j}, n_{1j}; n_j, p_0, p_1)\, h\{y_j; Q_T(j)\},
\tag{34}
$$

where $g(n_{0j}, n_{1j}; n_j, p_0, p_1)$ is the probability density function of $(n_{0j}, n_{1j}) \mid n_j \sim \text{Multinomial}(n_j; p_0, p_1)$ and $h\{y_j; Q_T(j)\}$ is the probability density function of $Y_j \mid (n_{ij}, i = 0, 1, 2) \sim \text{Poisson}\{Q_T(j)\}$.

The probability density function $P(y_j \mid n_j)$ given by (34) is a mixture of Poisson probability density functions, with the mixing probability density function given by the multinomial probability distribution of $\{n_{0j}, n_{1j}\}$ given $n_j$. This mixing probability density function represents individuals with different genotypes at the embryo stage in the population.
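To make the mixture structure concrete, the following sketch evaluates a multinomial mixture of Poisson distributions by direct enumeration for the case $k > 2$, where all three genotype groups contribute to the Poisson mean. It is a toy illustration rather than the fitting procedure of the paper: the population size, genotype probabilities, and per-person probabilities $Q_i(j)$ are hypothetical, the function names are illustrative, and enumeration is feasible only for small $n_j$.

```python
import math


def poisson_pmf(y, mean):
    """Poisson probability of y events with the given mean."""
    if mean == 0.0:
        return 1.0 if y == 0 else 0.0
    return math.exp(y * math.log(mean) - mean - math.lgamma(y + 1))


def trinomial_pmf(n0, n1, n, p0, p1):
    """Probability of (n0, n1, n - n0 - n1) under Multinomial(n; p0, p1, p2)."""
    n2 = n - n0 - n1
    p2 = 1.0 - p0 - p1
    coef = math.factorial(n) // (
        math.factorial(n0) * math.factorial(n1) * math.factorial(n2))
    return coef * p0 ** n0 * p1 ** n1 * p2 ** n2


def mixture_pmf(y, n, p0, p1, q):
    """Sum over genotype counts of multinomial weight times Poisson pmf."""
    total = 0.0
    for n0 in range(n + 1):
        for n1 in range(n - n0 + 1):
            n2 = n - n0 - n1
            mean = n0 * q[0] + n1 * q[1] + n2 * q[2]  # Q_T(j) for k > 2
            total += trinomial_pmf(n0, n1, n, p0, p1) * poisson_pmf(y, mean)
    return total


# Hypothetical toy inputs, chosen only so the enumeration is small.
n_people, p0, p1 = 6, 0.7, 0.2
q = (0.05, 0.4, 1.5)
pmf = [mixture_pmf(y, n_people, p0, p1, q) for y in range(61)]
print(sum(pmf))
```

Summing the mixture probabilities over a wide enough range of $y$ returns a value very close to 1, which is a simple sanity check that the weights and component densities are combined correctly.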

Let $\Theta$ be the set of all unknown parameters (i.e., the parameters $(p_1, p_2, \alpha)$ and the birth rates, the death rates, and the mutation rates of $J_j$ cells). Based on the data ($y_j$, $j = 0, 1, \ldots, n_T$), the likelihood function of $\Theta$ is

$$
L\{\Theta \mid y_j, j = 0, 1, \ldots, n_T\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} P(y_j \mid n_j).
\tag{35}
$$

Notice that because the mutation rates are very small, one may practically assume $\beta_i(t) = \beta_i$ for $i = 0, 1, \ldots, k-1$. Also, because the stage-limiting genes are basically tumor suppressor genes, which act recessively (see Tan [3], Weinberg [6], and Tan et al. [5, 8, 23]), one may practically assume $\{b_i(t) = b_i,\ d_i(t) = d_i,\ \gamma_i(t) = b_i - d_i = \gamma_i,\ i = 0, 1, \ldots, k-1\}$ (see Tan et al. [4, 5, 8]).

4.3. The Joint Probability Distribution of Augmented Variables and Cancer Incidence. To apply the mixture distribution of $Y_j$ in (34) to make inference about the unknown parameters, one needs to expand the model to include the unobservable augmented variables $\{n_{0j}, n_{1j}, y_{0j}, y_{1j}, j = 1, \ldots, t_N\}$ and derive the joint probability distribution of these variables. For these purposes, observe that for $j = 1, \ldots, t_N$ and for $k > 2$, the conditional probability distribution $P\{y_{ij}, i = 0, 1 \mid y_j, n_{ij}, i = 0, 1, n_j\}$ of $(Y_{ij}, i = 0, 1)$ given $(Y_j = y_j, n_{ij}, i = 0, 1, n_j)$ is

$$
(y_{ij}, i = 0, 1) \mid (y_j, n_{ij}, i = 0, 1, n_j) \sim \text{Multinomial}\Big( y_j; \frac{n_{ij} Q_i(j)}{Q_T(j)}, i = 0, 1 \Big).
\tag{36}
$$

Since the conditional probability distribution of $Y_j$ given $(n_{ij}, i = 0, 1, 2)$ for $k > 2$ is Poisson with mean $Q_T(j)$, we have for the joint conditional probability density function $P\{y_{ij}, i = 0, 1, y_j \mid n_{ij}, i = 0, 1, n_j\}$ of $(Y_{ij}, i = 0, 1, Y_j)$ given $(n_{ij}, i = 0, 1, n_j)$

$$
\begin{aligned}
P\{y_{ij}, i = 0, 1, y_j \mid n_{ij}, i = 0, 1, n_j\}
&= P\{y_{ij}, i = 0, 1 \mid y_j, n_{ij}, i = 0, 1, 2\} \times P\{y_j \mid n_{ij}, i = 0, 1, 2\} \\
&= \frac{1}{y_j!}\, e^{-Q_T(j)}\, Q_T(j)^{y_j} \times g\Big\{ y_{0j}, y_{1j}; y_j, \frac{n_{ij} Q_i(j)}{Q_T(j)}, i = 0, 1 \Big\} \\
&= \prod_{i=0}^{2} h\{y_{ij}; n_{ij} Q_i(j)\} \quad (k > 2),
\end{aligned}
\tag{37}
$$

where $y_{2j} = y_j - \sum_{u=0}^{1} y_{uj}$ and $n_{2j} = n_j - \sum_{u=0}^{1} n_{uj}$.
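The factorization into independent Poisson terms rests on a standard identity: a Poisson total split by a multinomial with cell probabilities proportional to the component means has the same joint density as independent Poisson counts. The sketch below checks this identity numerically for one set of hypothetical component means (standing in for the $n_{ij} Q_i(j)$) and one hypothetical count vector; the function names are illustrative.

```python
import math


def poisson_pmf(y, mean):
    """Poisson probability of y events with the given positive mean."""
    return math.exp(y * math.log(mean) - mean - math.lgamma(y + 1))


def multinomial_pmf(counts, probs):
    """Multinomial probability of the count vector under the cell probabilities."""
    log_p = math.lgamma(sum(counts) + 1)
    for c, p in zip(counts, probs):
        log_p += c * math.log(p) - math.lgamma(c + 1)
    return math.exp(log_p)


# Hypothetical component means (playing the role of n_ij * Q_i(j)) and counts.
means = [0.8, 2.5, 0.3]
counts = [1, 4, 0]
total_mean = sum(means)

# Left side: Poisson total times multinomial split of the total.
lhs = poisson_pmf(sum(counts), total_mean) * multinomial_pmf(
    counts, [m / total_mean for m in means])
# Right side: product of independent Poisson densities.
rhs = math.prod(poisson_pmf(c, m) for c, m in zip(counts, means))
print(lhs, rhs)
```

The two sides agree up to floating-point error, which is the property that lets the augmented-data density be written as a product over genotype groups.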

If $k = 2$, then $Y_{2j} = 0$ so that $Y_j = \sum_{i=0}^{1} Y_{ij}$. Thus we have for $k = 2$

$$
Y_j \mid (n_{ij}, i = 0, 1, n_j) \sim \text{Poisson}\Big\{ \sum_{i=0}^{1} n_{ij} Q_i(j) \Big\},
\tag{38}
$$

$$
Y_{ij} \mid (Y_j = y_j, n_{ij}, i = 0, 1, n_j) \sim \text{Binomial}\Big( y_j; \frac{n_{ij} Q_i(j)}{\sum_{u=0}^{1} n_{uj} Q_u(j)} \Big), \quad i = 0, 1.
\tag{39}
$$

It follows that if $k = 2$, then $\sum_{i=0}^{1} Y_{ij} = Y_j$ and the joint probability density function of $(Y_{0j}, Y_j)$ given $(n_{uj}, u = 0, 1, 2)$ is

$$
\begin{aligned}
P\{y_{0j}, y_j \mid n_{uj}, u = 0, 1, n_j\}
&= P\{y_j \mid n_{ij}, i = 0, 1, n_j\} \times P\{y_{0j} \mid y_j, n_{uj}, u = 0, 1, n_j\} \\
&= \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\} \quad \text{if } k = 2,
\end{aligned}
\tag{40}
$$

where $y_j = \sum_{i=0}^{1} y_{ij}$ and $n_j = \sum_{i=0}^{2} n_{ij}$.


Put $\mathbf{Y} = \{y_{ij}, i = 0, 1, j = 1, \ldots, t_N\}$, $\mathbf{N} = \{n_{ij}, i = 0, 1, j = 1, \ldots, t_N\}$, $\underset{\sim}{n} = \{n_j, j = 0, 1, \ldots, t_N\}$, and $\underset{\sim}{y} = \{y_j, j = 0, 1, \ldots, t_N\}$. From (37) and (40), we have for the conditional joint probability density function of $(\mathbf{Y}, \underset{\sim}{y})$ given $(\mathbf{N}, \underset{\sim}{n})$

$$
P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \underset{\sim}{n}, \Theta\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} \big\{ h[y_{2j}; n_{2j} Q_2(j)] \big\}^{1 - \delta_{2k}} \times \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\}.
\tag{41}
$$

It follows that the joint conditional probability density function of $\{\mathbf{N}, \mathbf{Y}, \underset{\sim}{y}\}$ given $(\underset{\sim}{n}, \Theta)$ is

$$
P\{\mathbf{N}, \mathbf{Y}, \underset{\sim}{y} \mid \underset{\sim}{n}, \Theta\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} g(n_{0j}, n_{1j}; n_j, p_0, p_1) \times \big\{ h[y_{2j}; n_{2j} Q_2(j)] \big\}^{1 - \delta_{2k}} \times \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\}.
\tag{42}
$$

Notice that the above probability density function is a product of multinomial probability density functions and Poisson probability density functions. For this joint probability density function, the deviance $\text{Dev} = -2\{\log P[\mathbf{Y}, \underset{\sim}{y}, \mathbf{N} \mid \underset{\sim}{n}, \Theta] - \log P[\mathbf{Y}, \underset{\sim}{y}, \mathbf{N} \mid \underset{\sim}{n}, \hat{\Theta}]\}$ is

$$
\text{Dev} = D_0(k) + \text{Dev}(p_1, p_2) + \sum_{j=1}^{t_N} D_j,
\tag{43}
$$

where

$$
D_0(k) = 2\Big\{ \chi(k) - y_0 - y_0 \log \frac{\chi(k)}{y_0} \Big\},
\tag{44}
$$

$$
\text{Dev}(p_1, p_2) = 2 \sum_{j=1}^{t_N} \Big\{ n_{0j} \log \frac{\hat{p}_0}{1 - p_1 - p_2} + \sum_{i=1}^{2} n_{ij} \log \frac{\hat{p}_i}{p_i} \Big\},
\tag{45}
$$

$$
D_j = 2 \sum_{i=0}^{1} \Big\{ n_{ij} Q_i(j) - y_{ij} - y_{ij} \log \frac{n_{ij} Q_i(j)}{y_{ij}} \Big\} + 2 (1 - \delta_{2k}) \Big\{ n_{2j} Q_2(j) - y_{2j} - y_{2j} \log \frac{n_{2j} Q_2(j)}{y_{2j}} \Big\},
\tag{46}
$$

where $\hat{p}_i = \big( \sum_{j=0}^{t_N} n_{ij} \big) / \big( \sum_{j=0}^{t_N} n_j \big)$ ($i = 0, 1, 2$).
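Each Poisson piece of the deviance has the same form, $2\{m - y - y \log(m/y)\}$, where $m$ is the Poisson mean (for example $n_{ij} Q_i(j)$) and $y$ is the corresponding count. A minimal sketch of this term follows, with the usual convention that $y \log(m/y)$ is taken as 0 when $y = 0$; that convention, the function names, and the numerical inputs are assumptions for illustration and are not stated in the text.

```python
import math


def poisson_deviance_term(y, mean):
    """Return 2 * {mean - y - y * log(mean / y)}, using y * log(.) = 0 at y = 0."""
    if y == 0:
        return 2.0 * mean
    return 2.0 * (mean - y - y * math.log(mean / y))


def age_group_deviance(counts, means, k):
    """Sum the terms over genotype groups; the third group enters only if k != 2."""
    n_groups = 2 if k == 2 else 3
    return sum(poisson_deviance_term(y, m)
               for y, m in zip(counts[:n_groups], means[:n_groups]))


# Hypothetical counts y_ij and means n_ij * Q_i(j) for one age group.
example = age_group_deviance(counts=[5, 2, 0], means=[4.2, 2.0, 0.1], k=3)
print(example)
```

Each term is zero when the mean equals the observed count and positive otherwise, so the deviance measures how far the fitted means are from the augmented counts.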

The joint probability density function $P\{\mathbf{Y}, \underset{\sim}{y}, \mathbf{N} \mid \underset{\sim}{n}, \Theta\}$ of $(\mathbf{Y}, \underset{\sim}{y}, \mathbf{N})$ given by (42) will be used as the kernel for the Bayesian method to estimate the unknown parameters and to predict the state variables.

4.4. Fitting of the Model to Cancer Incidence Data. To fit the model to real data, as in Tan [3–5], we let $\Delta t \sim 1$ correspond to a fixed time interval such as 6 months in human cancer studies (Tan et al. [4] assumed 3 months as one time unit, while Luebeck and Moolgavkar [26] assumed one year as one time unit). Then, because the proliferation rate of the last-stage cells is quite large, one may practically assume $P_T(s, t) = 1$ for $t - s \ge 1$. Hence, noting that $\beta_{k-1}(t) = \beta_{k-1}$ is usually very small (see [3–5]), $Q_i(j)$ is approximated by

$$
Q_i(j) \approx E\big\{ e^{-\beta_{k-1} G_i(t_{j-1})} - e^{-\beta_{k-1} G_i(t_j)} \big\} = e^{-\beta_{k-1} E[G_i(t_{j-1})]} - e^{-\beta_{k-1} E[G_i(t_j)]} + o(\beta_{k-1}),
\tag{47}
$$

where $G_i(t) = \sum_{s=t_0}^{t-1} J_{k-1}(s, i)$.

Under discrete time approximation, the $E[J_{k-1}(t, i)]$'s have been derived in the appendix. Using these expected numbers and the result $\sum_{i=0}^{t-1} a^i = (a^t - 1)/(a - 1)$ for $a \ne 1$, we obtain

$$\beta_{k-1}E[G_i(t)] = \sum_{r=i}^{i+1} E[J_r(t_0, i)]\left(\prod_{u=r}^{k-1}\beta_u\right) \times \sum_{v=r}^{k-1} A_{r(k-1)}(v) \sum_{s=t_0}^{t-1}(1 + \gamma_v)^{s - t_0} = \theta_{(i+1)(k-1)}\,\phi_{(i+1)(k-1)}(t) + \lambda_{i(k-1)}\,\phi_{i(k-1)}(t), \tag{48}$$

where $(\theta_{i(k-1)},\ i = 1, 2, 3)$ and $(\lambda_{i(k-1)},\ i = 0, 1, 2)$ are defined in Section 3.3, and where the $\phi_{i(k-1)}(t)$ $(i = 0, 1, 2, 3)$ are given by

$$\phi_{i(k-1)}(t) = \left(\prod_{u=i+1}^{k-1}\gamma_u\right)\sum_{v=i}^{k-1} A_{i(k-1)}(v) \times \frac{1}{\gamma_v}\left\{(1 + \gamma_v)^{t - t_0} - 1\right\}, \quad i = 0, 1, 2, 3. \tag{49}$$
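The factor $\{(1+\gamma_v)^{t-t_0} - 1\}/\gamma_v$ in (49) is the closed form of the geometric sum $\sum_{s=t_0}^{t-1}(1+\gamma_v)^{s-t_0}$ appearing in (48). A small sketch of that identity follows; the function name `growth_sum` is hypothetical, and the $\gamma = 0$ branch returns the limiting value, namely the number of steps.

```python
def growth_sum(gamma, n):
    """Closed form of sum_{s=0}^{n-1} (1 + gamma)**s; equals n when gamma == 0."""
    if gamma == 0:
        return float(n)
    return ((1.0 + gamma) ** n - 1.0) / gamma

def growth_sum_direct(gamma, n):
    """Term-by-term evaluation of the same sum, used only as a check."""
    return sum((1.0 + gamma) ** s for s in range(n))
```

For proliferation rates that are very close to zero the closed form loses numerical precision, so the direct sum (or a series expansion) may be preferable in that regime.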

Notice that if $\gamma_0 = 0$, then $\phi_{0(k-1)}(t)$ reduces to

$$\phi_{0(k-1)}(t) = \left(\prod_{u=1}^{k-1}\gamma_u\right)\sum_{v=1}^{k-1} A_{1(k-1)}(v) \times \frac{1}{\gamma_v^2}\left\{(1 + \gamma_v)^{t - t_0} - 1 - (t - t_0)\gamma_v\right\}. \tag{50}$$

Applying these results for time-homogeneous models with $\gamma_i \ne \gamma_j$ if $i \ne j$, the $Q_i(j)$'s under discrete approximation are given as follows.


(1) If $k = 2$, then $k - 1 = 1$. Hence $Q_2(0) = 1$, $Q_i(0) = Q_2(j) = 0$ for $(i = 0, 1;\ j > 0)$, and for $j > 0$,

$$Q_0(j) \approx e^{-\theta_{11}\phi_{11}(t_{j-1}) - \lambda_{01}\phi_{01}(t_{j-1})} - e^{-\theta_{11}\phi_{11}(t_j) - \lambda_{01}\phi_{01}(t_j)} + o(\beta_1),$$

$$Q_1(j) \approx (1 - \alpha_1)\left\{e^{-\lambda_{11}\phi_{11}(t_{j-1})} - e^{-\lambda_{11}\phi_{11}(t_j)}\right\} + o(\beta_1). \tag{51}$$

(2) If $k \ge 3$, then we have $Q_2(0) = \delta_{2(k-1)}\alpha$, $Q_i(0) = 0$, $i = 0, 1$, and for $j > 0$,

$$Q_0(j) \approx e^{-\theta_{1(k-1)}\phi_{1(k-1)}(t_{j-1}) - \lambda_{0(k-1)}\phi_{0(k-1)}(t_{j-1})} - e^{-\theta_{1(k-1)}\phi_{1(k-1)}(t_j) - \lambda_{0(k-1)}\phi_{0(k-1)}(t_j)} + o(\beta_{k-1}),$$

$$Q_1(j) \approx e^{-\theta_{2(k-1)}\phi_{2(k-1)}(t_{j-1}) - \lambda_{1(k-1)}\phi_{1(k-1)}(t_{j-1})} - e^{-\theta_{2(k-1)}\phi_{2(k-1)}(t_j) - \lambda_{1(k-1)}\phi_{1(k-1)}(t_j)} + o(\beta_{k-1}),$$

$$Q_2(j) \approx \delta_{2(k-1)}(1 - \alpha)\left\{e^{-\lambda_{22}\phi_{22}(t_{j-1})} - e^{-\lambda_{22}\phi_{22}(t_j)}\right\} + (1 - \delta_{2(k-1)})\left\{e^{-\theta_{3(k-1)}\phi_{3(k-1)}(t_{j-1}) - \lambda_{2(k-1)}\phi_{2(k-1)}(t_{j-1})} - e^{-\theta_{3(k-1)}\phi_{3(k-1)}(t_j) - \lambda_{2(k-1)}\phi_{2(k-1)}(t_j)}\right\} + o(\beta_{k-1}). \tag{52}$$
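Every $Q_i(j)$ in (51) and (52) is a difference of two exponentials of the same exponent evaluated at $t_{j-1}$ and $t_j$, that is, the probability mass that a survival-type curve loses over one interval. The sketch below illustrates only this structure. The exponent `toy_hazard` is a hypothetical stand-in with a single $\lambda\phi(t)$-like term and made-up constants; it is not the $\theta\phi + \lambda\phi$ combination of the paper, whose coefficients $A_{i(k-1)}(v)$ are defined outside this section.

```python
import math

def interval_probabilities(cum_hazard, times):
    """Return exp(-H(t_{j-1})) - exp(-H(t_j)) for consecutive entries of times."""
    surv = [math.exp(-cum_hazard(t)) for t in times]
    return [surv[j - 1] - surv[j] for j in range(1, len(times))]

def toy_hazard(t, lam=0.3, g=0.03, t0=1):
    """Hypothetical exponent lam * ((1 + g)**(t - t0) - 1); zero at t = t0."""
    return lam * ((1.0 + g) ** (t - t0) - 1.0)

# Interval probabilities on a grid of 10 unit steps starting at t0 = 1.
q = interval_probabilities(toy_hazard, list(range(1, 12)))
```

Because the terms telescope, the interval probabilities add up to one minus the survival value at the last grid point, which gives a quick sanity check on any implementation of (51) and (52).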

Notice that if one replaces $[1 + \gamma_v]^{t - t_0}$ by $[1 + \gamma_v]^{t - t_0} = e^{(t - t_0)\log(1 + \gamma_v)} \approx e^{(t - t_0)\gamma_v}$, the above $Q_i(j)$'s from the discrete time model are exactly the same as those from the continuous model, as given in equations (23)–(27), under the assumption that $P_T(s, t) = 1$ for $t - s > 0$. Notice that the assumption $P_T(s, t) = 1$ for $t - s > 0$ is equivalent to assuming that the last-stage cancer cells grow instantaneously into cancer tumors as soon as they are generated; see Remark 1.
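How close the discrete factor $(1+\gamma_v)^{t-t_0}$ is to the continuous factor $e^{(t-t_0)\gamma_v}$ depends on the size of $\gamma_v$ and on the number of time steps. The following sketch measures the relative gap; the function name is hypothetical, and the example values (rates of order $10^{-5}$ and $3 \times 10^{-2}$, and 170 half-year steps) are illustrative assumptions rather than figures taken from the fitting procedure.

```python
import math

def relative_gap(gamma, n):
    """Relative difference between exp(n*gamma) and (1 + gamma)**n."""
    discrete = (1.0 + gamma) ** n
    continuous = math.exp(n * gamma)
    return abs(continuous - discrete) / discrete

small_rate_gap = relative_gap(1e-5, 170)   # negligible
larger_rate_gap = relative_gap(0.03, 170)  # several percent
```

To first order the gap grows like $n\gamma^2/2$, so the two formulations agree closely for small rates and drift apart as the rate or the time horizon increases.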

5. The Fitting of the Model to Cancer Incidence and the Generalized Bayesian Inference Procedure

Given the model in Sections 2 and 3 and cancer incidence data, one may use the results in Section 4 to fit the model. By using this model and the distribution results in Section 4, one can readily estimate the unknown genetic parameters, predict cancer incidence, and check the validity of the model by using the generalized Bayesian inference and Gibbs sampling procedures; for more detail, see Tan [3, 22] and Tan et al. [4, 5].

The generalized Bayesian inference is based on the posterior distribution $P\{\Theta \mid \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$ of $\Theta$ given $\{\mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$. This posterior distribution is derived by combining the prior distribution $P\{\Theta\}$ of $\Theta$ with the joint probability distribution $P\{\mathbf{N}, \mathbf{Y}, \underset{\sim}{y} \mid \underset{\sim}{n}, \Theta\}$ given $\{\underset{\sim}{n}, \Theta\}$, as given by (42). It follows that this inference procedure combines information from three sources: (1) previous information and experiences about the parameters in terms of the prior distribution $P\{\Theta\}$ of the parameters, (2) biological information of inherited cancer cases via genetic segregation of cancer genes in the population ($P\{\mathbf{N} \mid \underset{\sim}{n}, p_i, i = 1, 2\}$; see Section 2), and (3) information from the expanded data ($\mathbf{Y}$) and the observed data ($\underset{\sim}{y}$) via the statistical model from the system ($P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \Theta\}$ given by (37) and (40)). Because of the additional information from the genetic segregation of the cancer genes, this inference procedure provides an efficient way to extract information on the effects of genotypes of individuals at the embryo stage.

5.1. The Prior Distribution of the Parameters. For the prior distributions of $\Theta$, because biological information has suggested some lower bounds and upper bounds for the mutation rates and for the proliferation rates, we assume

$$P(\Theta) \propto c \quad (c > 0), \tag{53}$$

where $c$ is a positive constant if these parameters satisfy some biologically specified constraints, and is equal to zero otherwise. These biological constraints are as follows.

(i) $0 < p_1 < 10^{-2}$, $0 < p_2 < 10^{-6}$, and $-0.01 < \gamma_i < 1$ $(i = 1, 2)$.

(ii) For $\beta_j$ $(j = 0, 1, \ldots, k - 1)$, we let $1 < \omega_0 = N(t_0)\beta_0 < 1000$ $(N \to I_1)$ and $10^{-8} < \beta_i < 10^{-3}$, $i = 1, \ldots, k - 1$.

(iii) For the $\lambda_{j(k-1)}$ $(j = 0, 1, 2)$, we let $0 < \lambda_i < 10$, $i = 0, 1$, and $0 < \lambda_2 < 10^{3}$.

(iv) For the $\theta_i$ $(i = 1, 2, 3)$, we let $0 < \theta_1 < 10^{-2}$ and $0 < \theta_2 < 10^{-1}$.

We will refer to the above prior as a partially informative prior, which may be considered as an extension of the traditional noninformative prior given in Box and Tiao [28].
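A prior that is constant inside a box of constraints and zero outside it can be coded as a log-prior that returns 0 inside the box and negative infinity outside. The sketch below covers only the bounds on $p_i$, $\gamma_i$, $\lambda_i$, $\theta_1$, and $\theta_2$ quoted in (i), (iii), and (iv); the dictionary keys are hypothetical names, and the bounds on $\omega_0$ and $\beta_i$ in (ii) are omitted for brevity.

```python
import math

# Open intervals (lower, upper) copied from constraints (i), (iii), (iv).
BOUNDS = {
    "p1": (0.0, 1e-2), "p2": (0.0, 1e-6),
    "gamma1": (-0.01, 1.0), "gamma2": (-0.01, 1.0),
    "lambda0": (0.0, 10.0), "lambda1": (0.0, 10.0), "lambda2": (0.0, 1e3),
    "theta1": (0.0, 1e-2), "theta2": (0.0, 1e-1),
}

def log_prior(params):
    """Return 0.0 if every parameter lies strictly inside its interval, else -inf."""
    for name, (lower, upper) in BOUNDS.items():
        if not (lower < params[name] < upper):
            return -math.inf
    return 0.0
```

Adding this value to a log-likelihood leaves the likelihood untouched inside the admissible region and rules out any parameter vector outside it, which is how the constraints enter the posterior modes computed later.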

5.2. The Posterior Distribution of the Parameters Given $\{\mathbf{Y}, \mathbf{N}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Denote by $\Theta_1 = \Theta - \{p_1, p_2, \alpha\}$. From the posterior distribution $P\{\Theta \mid \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$, we obtain

$$P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\} \propto (\chi(k))^{y_0} e^{-\chi(k)}\, p_1^{\sum_{j=1}^{t_N} n_{1j}}\, p_2^{\sum_{j=1}^{t_N} n_{2j}} \times (1 - p_1 - p_2)^{\sum_{j=1}^{t_N} n_{0j}}, \quad 0 < p_1, p_2, \alpha < 1,$$

$$P\{\Theta_1 \mid (p_1, p_2, \alpha), \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\} \propto \prod_{j=1}^{t_N}\prod_{i=0}^{2} e^{-Q_i(j)}\, Q_i(j)^{y_{ij}}, \quad \Theta_1 \in \Omega, \tag{54}$$


where $\Omega$ is the parameter space of $\Theta_1$ provided by the biological constraints in Section 5.1.

For computational convenience, we notice that the log of $P\{p_1, p_2, \alpha \mid \Theta_1, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$ is proportional to the negative of $D_0(k) + \mathrm{Dev}(p_1, p_2)$ given by (44)-(45); similarly, the log of $P\{\Theta_1 \mid (p_1, p_2, \alpha), \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$ is proportional to the negative of $\sum_{j=1}^{k} D_j$ given by (46).
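To illustrate the multinomial part of the first kernel in (54), the sketch below evaluates $\log\{p_1^{S_1} p_2^{S_2}(1 - p_1 - p_2)^{S_0}\}$, with $S_i = \sum_j n_{ij}$, and returns its unconstrained maximizer $S_i/(S_0 + S_1 + S_2)$. This is a simplification under stated assumptions: the factor involving $\chi(k)$ is ignored because its definition lies outside this section, the constraints of Section 5.1 are not enforced, and the function names and the counts in the usage line are invented for illustration.

```python
import math

def log_kernel(p1, p2, s0, s1, s2):
    """log of p1**s1 * p2**s2 * (1 - p1 - p2)**s0; -inf outside the simplex."""
    p0 = 1.0 - p1 - p2
    if min(p0, p1, p2) <= 0.0:
        return -math.inf
    return s1 * math.log(p1) + s2 * math.log(p2) + s0 * math.log(p0)

def unconstrained_mode(s0, s1, s2):
    """Maximizer of log_kernel in (p1, p2) when no bounds are imposed."""
    total = s0 + s1 + s2
    return s1 / total, s2 / total

# Toy genotype totals, invented for illustration only.
p1_hat, p2_hat = unconstrained_mode(9990, 9, 1)
```

When the unconstrained maximizer falls outside the admissible intervals, a constrained optimizer would be needed instead of this closed form.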

5.3. The Multilevel Gibbs Sampling Procedure for Estimating Unknown Parameters. Given the posterior probability distributions, we will use the following multilevel Gibbs sampling procedure to derive estimates of the parameters. We notice that, numerically, the Gibbs sampling procedure given below is equivalent to the EM-algorithm from the sampling theory viewpoint, with Steps 1 and 2 as the E-step and with Steps 3 and 4 as the M-step, respectively [29]. These multilevel Gibbs sampling procedures are given by the following.

Step 1 (Generating $\mathbf{N}$ Given $(\mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta)$ (The Data-Augmentation Step 1)). Given $\Theta$ and given $\underset{\sim}{n}$, use the multinomial distribution of $\{n_{1j}, n_{2j}\}$ given $n_j$ in Section 3 to generate a large sample of $\mathbf{N}$. Then, by combining this large sample with $P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \underset{\sim}{n}, \Theta\}$ in (37) and (40), select $\mathbf{N}$ through the weighted bootstrap method due to Smith and Gelfand [30]. This selected $\mathbf{N}$ is then a sample from $P\{\mathbf{N} \mid \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta\}$ even though the latter is unknown (for proof, see Tan [22], Chapter 3). Call the generated sample $\hat{\mathbf{N}}$.
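The weighted bootstrap used in Step 1 can be sketched as follows: candidates drawn from a proposal distribution are resampled with probabilities proportional to a weight, here a likelihood value supplied by the caller. This is a generic illustration of the resampling idea and not the authors' code; `weighted_bootstrap` and its arguments are hypothetical names, and the proposal draws and the weight function are left to the caller.

```python
import random

def weighted_bootstrap(candidates, weight, size, rng=random):
    """Resample `size` items from candidates with probability proportional to weight(c).

    `rng` may be the random module or a random.Random instance.
    """
    weights = [weight(c) for c in candidates]
    if sum(weights) <= 0:
        raise ValueError("all candidate weights are zero")
    return rng.choices(candidates, weights=weights, k=size)
```

In the setting of Step 1 the candidates would be multinomial draws of $\mathbf{N}$ and the weight would be the value of $P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \underset{\sim}{n}, \Theta\}$ at each candidate; the larger the candidate pool, the closer the resampled values are to draws from the target conditional distribution.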

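Step 2 (Generating $\mathbf{Y}$ Given $(\mathbf{N}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta)$ (The Data-Augmentation Step 2)). Given $\{\underset{\sim}{y}, \underset{\sim}{n}, \Theta\}$ and given $\mathbf{N} = \hat{\mathbf{N}}$ generated from Step 1, generate $\mathbf{Y}$ from the probability distribution $P\{\mathbf{Y} \mid \mathbf{N}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta\}$ given by (36) and (38). Call the generated sample $\hat{\mathbf{Y}}$.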

Step 3 (Estimation of $\{p_i, i = 1, 2, \alpha\}$ Given $\{\Theta_1, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$). Given $\{\underset{\sim}{y}, \underset{\sim}{n}, \Theta_1\}$ and given $(\mathbf{N}, \mathbf{Y}) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}})$ from Steps 1 and 2, derive the posterior mode of $\{p_i, i = 1, 2, \alpha\}$ by maximizing the conditional posterior distribution $P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \hat{\mathbf{N}}, \hat{\mathbf{Y}}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $D_0(k) + \mathrm{Dev}(p_1, p_2)$ given by (44)-(45) in Section 4.3 under the constraints given in Section 5.1. Denote this generated mode by $\{\hat{p}_i, i = 1, 2, \hat{\alpha}\}$.

Step 4 (Estimation of $\Theta_1$ Given $\{p_i, i = 1, 2, \alpha, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$). Given $(\underset{\sim}{y}, \underset{\sim}{n})$ and given $(\mathbf{N}, \mathbf{Y}, p_i, i = 1, 2, \alpha) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}}, \hat{p}_i, i = 1, 2, \hat{\alpha})$ from Steps 1–3, derive the posterior mode of $\Theta_1$ by maximizing the conditional posterior distribution $P\{\Theta_1 \mid \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\mathbf{N}}, \hat{\mathbf{Y}}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $\sum_{j=1}^{k} D_j$ in (46) under the constraints. Denote the generated mode as $\hat{\Theta}_1$.

Step 5 (Recycling Step). With $(\mathbf{N}, \mathbf{Y}, p_i, i = 1, 2, \alpha, \Theta_1) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}}, \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1)$ given above, go back to Step 1 and continue until convergence.

The proof of convergence of the above steps can be derived by using the procedure given in Tan ([22], Chapter 3). At convergence, the $\hat{\Theta} = \{\hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1\}$ are the generated values from the posterior distribution of $\Theta$ given $\{\underset{\sim}{y}, \underset{\sim}{n}\}$, independently of $(\mathbf{N}, \mathbf{Y})$ (for proof, see Tan [22], Chapter 3). Repeating the above procedures, one generates a random sample of $\Theta$ from the posterior distribution of $\Theta$ given $\{\underset{\sim}{y}, \underset{\sim}{n}\}$; one then uses the sample mean as the estimate of $\Theta$ and uses the sample variances and covariances as estimates of the variances and covariances of these estimates.
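Once a sample of posterior draws of a parameter is available, the point estimate and its uncertainty follow from simple summary statistics. The sketch below computes the sample mean and standard deviation, plus bounds of the form mean ± 1.96 standard deviations. The normal-approximation bounds are an assumption made here for illustration, since the text does not state how its confidence limits were computed, and `summarize` is a hypothetical name.

```python
import statistics

def summarize(draws):
    """Sample mean, sample standard deviation, and mean +/- 1.96*sd bounds."""
    mean = statistics.mean(draws)
    sd = statistics.stdev(draws)
    return {"mean": mean, "sd": sd,
            "lower": mean - 1.96 * sd, "upper": mean + 1.96 * sd}
```

For a vector-valued $\Theta$ the same computation would be applied componentwise, with sample covariances computed across components.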

6. A New Multistage Stochastic Model for Adult Eye Cancer (Uveal Melanoma): An Example

Human eye cancers consist of pediatric eye cancers and adult eye cancers. The most common pediatric eye cancer is retinoblastoma, which develops from the retinal pigment epithelium cells underlying the retina that do not form melanoma. The most common adult eye cancers are the uveal melanomas, involving the iris, the ciliary body, and the choroid (collectively referred to as the uvea). These cancers develop from melanocytes (pigment cells), which reside within the uvea, giving color to the eye. In Tan and Zhou [9], we developed a modified two-stage model for retinoblastoma. Based on results from molecular biology (see Landreville et al. [14], Mensink et al. [15], and Loercher and Harbour [31]), Landreville et al. [14] have proposed a three-stage model for uveal melanoma, as given in Figure 2. As an example of applications of this paper, in this section we apply this model of uveal melanoma to the NCI/NIH eye cancer data from the SEER project. We notice that the same methods can be applied to model other human cancers as well, but this will be our future research.

Given in Table 1 are the numbers of people at risk and the eye cancer cases in the age groups, together with the predicted cases from the models. These data give cancer incidence at birth and incidence for 85 age groups ($k = 85$), with each group spanning a 1-year period except the last age group (≥85 years old). For human eye cancer, because the incidence at birth and for age groups from 1 to 10 years old is basically generated by the pediatric eye cancer retinoblastoma (see [9]), to account for inherited cancer cases of uveal melanomas, the incidence for age 0 (birth) and for age periods from 1 to 10 years old in Table 1 for uveal melanoma is derived by subtracting the incidence of retinoblastoma from the SEER data (see Tan and Zhou [9]).

To fit the data, we let one time unit be 6 months after birth and let $t_0 = 1$. To compare different models and to assess different assumptions, we consider the following two 3-stage mixture models: (1) the complete 3-stage mixture model (Model-F), in which no assumptions are made on the parameters; (2) the Type-1 3-stage mixture model (Model-1), in which we assume that $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth. For comparison purposes, we also fit a 2-stage model as defined in Tan and Zhou [9]. We apply the methods in Section 5 to fit these models to the SEER data given in Table 1.


Table 2: The log-likelihood, AIC, and BIC of the fitted models.

Models                  Log-likelihood   AIC        BIC
Three-stage, Model-F    -1312.02         2648.04    2677.35
Three-stage, Model-1    -1295.76         2609.53    2631.51
Two-stage               -4392.421        8796.843   8811.569
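The information criteria in Table 2 follow from the log-likelihood through the usual definitions AIC $= 2m - 2\log L$ and BIC $= m\log n - 2\log L$, where $m$ is the number of free parameters and $n$ the number of observations. The sketch below recomputes the Model-F entries from a log-likelihood of -1312.02. The value $m = 12$ is taken from the degrees-of-freedom statement for Model-F in the text, and $n = 85$ is an assumption chosen because it reproduces the reported BIC to two decimals; neither is stated alongside the table.

```python
import math

def aic(log_lik, n_params):
    """Akaike Information Criterion: 2*m - 2*log L."""
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion: m*log(n) - 2*log L."""
    return n_params * math.log(n_obs) - 2 * log_lik

model_f_aic = aic(-1312.02, 12)
model_f_bic = bic(-1312.02, 12, 85)
```

Smaller values of either criterion indicate a better trade-off between fit and the number of parameters, which is the basis for the model comparison that follows.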

Figure 5: Curve fitting of SEER data by Model-F, Model-1, and the two-stage model. (Plot of incidences, 0 to 350, against age in years, 0 to 84; series: Observed, Model-F, Model-1, Two-stage.)

Given in Table 2 are the natural logs of the likelihood functions, the AIC (Akaike Information Criterion), and the BIC (Bayesian Information Criterion) for these models. Given in Table 3 are the estimates of parameters in the 3-stage models. Given in Figure 5 are the plots of predicted cancer cases from the 3-stage mixture models (Model-F and Model-1) and the 2-stage model. For comparison purposes, in Table 1 we also provide the numbers of predicted cancer cases from the 3-stage mixture models and the 2-stage model together with the observed cancer cases over time from SEER. From these results, we have made the following observations.

(a) As shown by the results in Table 1 and Figure 5, both Model-F and Model-1 fitted the SEER data well, although Model-1 fitted the data slightly better judging from the values of AIC and BIC. The Chi-square test statistics $\chi^2 = \sum_{i=0}^{85} (y_i - \hat{y}_i)^2 / \hat{y}_i$ for Model-F and Model-1 are 88.43 and 94.48, respectively, giving a $P$-value of 0.12 (df = 86 − 12 = 74) for Model-F and a $P$-value of 0.11 (df = 86 − 7 = 79) for Model-1. On the other hand, the 2-stage model fitted the data very poorly; the Chi-square statistic for the 2-stage model is 2747.69, giving a $P$-value less than $10^{-3}$. The AIC and BIC values of Model-1 (AIC = 2609.53, BIC = 2631.51) are slightly smaller than those of Model-F; however, the AIC and BIC values (8796.84, 8811.57) of the two-stage model are considerably greater than those of the 3-stage models. These results suggest that uveal melanoma may best be described by a 3-stage model with an inherited component and that one may practically assume $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth.

(b) From Table 3, it is observed that the estimate of $\gamma_1$ is close to zero (the estimate is of order $10^{-5}$), indicating that the phenotype of $J_1$ is almost identical to that of $N = J_0$, further confirming that the staging-limiting genes are basically tumor suppressor genes and that there is no haploinsufficiency for these tumor suppressor genes. On the other hand, the estimate of $\gamma_2$ is of order $10^{-2}$, which is about $10^{3}$ times greater than that of cells with genotype $J_1$.

(c) From Table 3, the estimates $\hat{\lambda}_j$ $(j = 0, 1, 2)$ of $\lambda_j$ are of order $10^{-1}$, $10^{-1}$, and $10^{2} \sim 10^{3}$, respectively. Because $\lambda_i = E[J_i(t_0, i)]\,\beta_i \prod_{u=i+1}^{2} (\beta_u / \gamma_u)$, $i = 0, 1, 2$, assuming some values of $E[J_i(t_0, i)]$, $i = 0, 1, 2$, from biological observations, one can have some rough idea about the magnitude of $\beta_j$ $(j = 0, 1, 2)$. For example, if we follow Potten et al. [32] and assume $E[N(t_0)] = E[J_i(t_0, i)] \sim 10^{8}$, $i = 0, 1, 2$, then $\beta_j \approx 10^{-6} \sim 10^{-5}$.

(d) From Table 3, the estimates of $p_1$ and $p_2$ from the SEER data are of orders $10^{-4} \sim 10^{-3}$ and $10^{-7} \sim 10^{-6}$, respectively. This indicates that in the US population, the frequency of the staging-limiting cancer gene for uveal melanoma is approximately $10^{-3}$. Table 3 also shows that the estimate of $\alpha$ was 0.8411, indicating that most individuals with genotype $J_2$ would develop cancer at birth. This may help to explain why there are observed cancer incidences at birth for uveal melanoma in the SEER data even though the estimate of the frequency $p_2$ is of order $10^{-7} \sim 10^{-6}$.
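The goodness-of-fit statistic quoted in observation (a) is the Pearson Chi-square sum of squared differences between observed and predicted counts, each divided by the predicted count. A minimal sketch is given below; the function name is hypothetical, and the counts in the usage line are invented toy numbers, not values from Table 1.

```python
def pearson_chi_square(observed, predicted):
    """Sum of (observed - predicted)**2 / predicted over all age groups."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum((o - p) ** 2 / p for o, p in zip(observed, predicted))

# Toy example with two age groups: 4/8 + 25/25 = 1.5
toy_statistic = pearson_chi_square([10, 20], [8, 25])
```

The statistic is then referred to a Chi-square distribution whose degrees of freedom are the number of age groups minus the number of fitted parameters, as in the df values reported in (a).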

7. Discussion and Conclusions

To account for inherited cancer cases, in this paper we have developed some general multistage models involving hereditary cancer cases. For human cancer incidence, these models are basically generalized mixture models. In these mixture models, the mixing probability is a multinomial distribution to account for genetic segregation of the staging-limiting tumor suppressor genes. This mixture model allows us to estimate, for the first time, the frequency of the staging-limiting tumor suppressor gene in human populations. As an example of applications, in this paper we have developed a general 3-stage stochastic multistage model of carcinogenesis for adult human eye cancer. To account for inherited cancer cases in the stochastic model of human eye cancer, we have also developed a generalized mixture model for uveal melanoma in human beings.

To use the proposed models to fit the cancer incidence data, in this paper we have developed a generalized Bayesian inference procedure to estimate the unknown parameters and to predict cancer cases. This inference procedure is advantageous over the classical sampling theory inference (i.e., the maximum likelihood method) because the procedure combines


Table 3: Estimates of parameters for the 3-stage stochastic models.

            Model-F                                          Model-1
Parameter   Estimate   StD        95% CL     95% CL         Estimate   StD        95% CL     95% CL
                                  lower      upper                                lower      upper
λ0          2.88E-01   1.21E-01   5.03E-02   5.25E-01       6.55E-02   2.54E-02   1.57E-02   1.15E-01
λ1          4.81E-01   4.65E-02   3.89E-01   5.72E-01       3.72E-01   4.63E-02   2.82E-01   4.63E-01
λ2          6.55E+02   1.12E+02   4.34E+02   8.75E+02       1.98E+02   3.38E+01   1.32E+02   2.65E+02
γ1          2.71E-05   8.86E-06   5.34E-06   4.01E-05       NA         NA         NA         NA
γ2          3.37E-02   1.92E-04   3.33E-02   3.41E-02       3.49E-02   3.58E-03   2.79E-02   4.19E-02
p1          9.95E-04   1.75E-06   9.91E-04   9.98E-04       9.98E-04   6.27E-05   8.75E-04   1.12E-03
p2          9.68E-07   6.48E-09   9.55E-07   9.81E-07       1.00E-06   4.53E-08   9.11E-07   1.09E-06
α           8.41E-01   4.19E-02   7.59E-01   9.23E-01       8.07E-01   7.06E-02   6.68E-01   9.45E-01
θ1          3.29E-04   3.54E-05   2.60E-04   3.98E-04       NA         NA         NA         NA
θ2          3.41E-01   1.07E-02   3.20E-01   3.62E-01       NA         NA         NA         NA

Note: NA: assumed nonexistence.


information from three sources: (1) previous information and experiences about the parameters in terms of the prior distribution $P\{\Theta\}$ of the parameters, (2) biological information of inherited cancer cases via the genetic segregation of staging-limiting tumor suppressor genes in the population, and (3) information from the expanded data ($\mathbf{Y}$) and the observed data ($\underset{\sim}{y}$) via the statistical model from the system ($P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \Theta\}$ given by (37) and (40)).

To illustrate the usefulness and applications of our models and methods, we have applied them to the eye cancer SEER data of NCI/NIH. Our analysis showed that the proposed 3-stage model with inherited cancer cases fitted the data well (see Table 2 and Figure 5); on the other hand, the classical 2-stage model could not fit the data at all (see Table 2 and Figure 5). These results confirm results from molecular biology that human eye cancer is described by a 3-stage model with an inherited cancer component. Notice, however, that our 3-stage model is more general than the classical 3-stage model as described in Little [1], Tan [2], and Zheng [7] in that we postulate that cancer tumors develop from primary $I_3$ cells by clonal expansion (see Yang and Chen [11]). (Note that the stochastic multistage models in the literature assume that cancer tumors develop from last-stage cells as soon as they are generated, ignoring cancer progression completely; see Remark 1.) As a matter of fact, we had assumed $\Delta t = 1$ for a period of three months and found that the 3-stage models then did not fit the SEER data, clearly indicating that $P_T(s, t) < 1$ over a period of three months.

Applying our models and methods to the SEER data of human eye cancer, we have derived for the first time some useful pieces of information. Specifically, we mention the following. (1) For the first time, we have estimated the frequency of the staging-limiting tumor suppressor gene in the US population ($p_1 \sim 0.9948 \times 10^{-3}$). (2) With the estimate of $\alpha$ as $\hat{\alpha} = 0.8411$, the predicted number of uveal melanoma cases at birth is $y_0 = n_{01}p_2 = 36$ by the 3-stage models with inherited cancer component (Model-F and Model-1). (The observed number of eye cancer cases at birth is 34.) (3) The estimate of the proliferation rate ($\gamma_1$) of $J_1$ cells using Model-F is $\hat{\gamma}_1 = 2.271 \times 10^{-5} \sim 0$. (The estimate is $8.603 \times 10^{-5}$ using Model-1.) This confirms that the stage-1 limiting gene is a tumor suppressor gene and that, unlike the p53 gene in chromosome 17p (see [33]), there is little or no haploinsufficiency for this gene in cells with genotype $J_1$.

Using the models and methods of this paper, one can readily predict future cancer cases for human eye cancer. Thus, by comparing results from different populations, our models and methods can be used to assess cancer prevention and control procedures. This will be a topic of our future research; we will not go any further here.

Appendix

The Expected Numbers of State Variables under Discrete Time Approximation

Under discrete time, the stochastic differential equations of the state variables reduce to the following stochastic difference equations of the state variables, respectively:

$$J_i(t + 1, i) = J_i(t, i)\,[1 + \gamma_i(t)] + e_i(t + 1, i), \quad i = 0, 1, \ldots, \mathrm{Min}(k - 1, 2),$$

$$J_j(t + 1, u) = J_j(t, u)\,[1 + \gamma_j(t)] + J_{j-1}(t, u)\,\beta_{j-1}(t) + e_j(t + 1, u), \quad 0 \le u < j \le k - 1, \tag{A.1}$$

where $e_i(t + 1, i) = [B_i(t, i) - J_i(t, i)b_i(t)] - [D_i(t, i) - J_i(t, i)d_i(t)]$ and $e_j(t + 1, i) = [B_j(t, i) - J_j(t, i)b_j(t)] - [D_j(t, i) - J_j(t, i)d_j(t)] + [M_{j-1}(t, i) - J_{j-1}(t, i)\beta_{j-1}(t)]$ for $j > i$.

The initial conditions at birth ($t_0$) for the above stochastic difference equations are $J_j(t_0, i) > 0$ if $j = i, i + 1$, and $J_j(t_0, i) = 0$ if $j > i + 1$. The solution of the above difference equations under these initial conditions is given, respectively, by

$$J_i(t, i) = J_i(t_0, i)\prod_{s=t_0}^{t-1}(1 + \gamma_i(s)) + \sum_{s=t_0+1}^{t} e_i(s, i)\prod_{u=s}^{t-1}(1 + \gamma_i(u)), \quad i = 0, 1, \ldots, \mathrm{Min}(k - 1, 2),$$

$$J_j(t, i) = J_j(t_0, i)\prod_{s=t_0}^{t-1}(1 + \gamma_j(s)) + \sum_{s=t_0}^{t-1} J_{j-1}(s, i)\,\beta_{j-1}(s)\prod_{u=s+1}^{t-1}(1 + \gamma_j(u)) + \sum_{s=t_0+1}^{t} e_j(s, i)\prod_{u=s}^{t-1}(1 + \gamma_j(u)), \quad 0 \le i < j \le k - 1. \tag{A.2}$$

If the model is time homogeneous so that $\beta_j(t) = \beta_j$, $\gamma_j(t) = \gamma_j$ $(j = i, \ldots, k - 1)$, and if $\gamma_i \ne \gamma_j$ if $i \ne j$, then the above solutions under the initial conditions ($J_j(t_0, i) = 0$, $j > i + 1$) reduce to

$$J_i(t, i) = J_i(t_0, i)(1 + \gamma_i)^{t - t_0} + \eta_i^{(0)}(t, i), \quad i = 0, 1, \ldots, \mathrm{Min}(k - 1, 2),$$

$$J_j(t, i) = J_j(t_0, i)(1 + \gamma_j)^{t - t_0} + \sum_{u=i}^{j-1} J_u(t_0, i)\left\{\prod_{v=u}^{j-1}\beta_v\right\}\sum_{r=u}^{j} A_{uj}(r)(1 + \gamma_r)^{t - t_0} + \eta_j^{(0)}(t, i) + \sum_{u=1}^{j-i}\left\{\prod_{r=j-u}^{j-1}\beta_r\right\}\eta_j^{(u)}(t, i)$$

$$= \sum_{u=i}^{i+1} J_u(t_0, i)\left\{\prod_{v=u}^{j-1}\beta_v\right\}\sum_{r=u}^{j} A_{uj}(r)(1 + \gamma_r)^{t - t_0} + \eta_j^{(0)}(t, i) + \sum_{u=1}^{j-i}\left\{\prod_{r=j-u}^{j-1}\beta_r\right\}\eta_j^{(u)}(t, i), \quad 0 \le i < j \le k - 1, \tag{A.3}$$

where $\eta_j^{(0)}(t, i) = \sum_{s=t_0+1}^{t} e_j(s, i)(1 + \gamma_i)^{t - s}$ and $\eta_j^{(u)}(t, i) = \sum_{s=t_0+1}^{t} e_{j-u}(s, i)\sum_{v=u}^{j} A_{uj}(v)(1 + \gamma_v)^{t - s}$ $(u = 1, \ldots, j - i)$.

Thus, if the model is time homogeneous and if $\gamma_i \ne \gamma_j$ if $i \ne j$, the $E[J_{k-1}(t, i)]$'s in discrete time models under the initial conditions ($J_j(t_0, i) = 0$, $j > i + 1$) are given, respectively, by

$$E[J_{k-1}(t, i)] = \sum_{u=i}^{i+1} E[J_u(t_0, i)]\left(\prod_{v=u}^{k-2}\beta_v\right) \times \sum_{r=u}^{k-1} A_{u(k-1)}(r)(1 + \gamma_r)^{t - t_0}, \quad i = 0, 1, \ldots, \mathrm{Min}(k - 1, 2). \tag{A.4}$$
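As a numerical check on the structure of these solutions, the sketch below iterates the expectation of the difference equations (A.1) for two adjacent compartments and compares it with a closed form. It rests on simplifying assumptions that go beyond the text: the noise terms $e_j$ are taken to have expectation zero, the second compartment starts empty, the rates are constant with $\gamma_i \ne \gamma_{i+1}$, and the closed form $E[J_{i+1}] = J_i(t_0)\,\beta_i\,\{(1+\gamma_i)^n - (1+\gamma_{i+1})^n\}/(\gamma_i - \gamma_{i+1})$ is derived here for this two-compartment case rather than copied from (A.3). All function names and numerical values are hypothetical.

```python
def expected_counts(j0, beta, gamma_i, gamma_next, steps):
    """Iterate E[J_i] and E[J_{i+1}] for `steps` time units, starting from (j0, 0)."""
    x, y = float(j0), 0.0
    for _ in range(steps):
        # Both updates use the values from the previous time step.
        x, y = x * (1.0 + gamma_i), y * (1.0 + gamma_next) + x * beta
    return x, y

def closed_form(j0, beta, gamma_i, gamma_next, steps):
    """Closed-form counterpart; requires gamma_i != gamma_next."""
    c, a = 1.0 + gamma_i, 1.0 + gamma_next
    return j0 * c ** steps, j0 * beta * (c ** steps - a ** steps) / (c - a)

iterated = expected_counts(1e8, 1e-6, 1e-5, 0.03, 20)
analytic = closed_form(1e8, 1e-6, 1e-5, 0.03, 20)
```

The two results agree up to floating-point rounding, which illustrates why the time-homogeneous solution is a weighted sum of the powers $(1+\gamma_r)^{t-t_0}$, as in (A.4).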

References

[1] M. P. Little, "Cancer models, ionization, and genomic instability: a review," in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 5, World Scientific, River Edge, NJ, USA, 2008.

[2] W. Y. Tan, Stochastic Models of Carcinogenesis, Marcel Dekker, New York, NY, USA, 1991.

[3] W. Y. Tan, "Stochastic multi-stage models of carcinogenesis as hidden Markov models: a new approach," International Journal of Systems and Synthetic Biology, vol. 1, pp. 313–337, 2010.

[4] W. Y. Tan, L. J. Zhang, and C. W. Chen, "Stochastic modeling of carcinogenesis: state space models and estimation of parameters," Discrete and Continuous Dynamical Systems B, vol. 4, no. 1, pp. 297–322, 2004.

[5] W. Y. Tan, C. W. Chen, and L. J. Zhang, "Cancer biology, cancer models and stochastic mathematical analysis of carcinogenesis," in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 3, World Scientific, River Edge, NJ, USA, 2008.

[6] R. A. Weinberg, The Biology of Human Cancer, Garland Science, Taylor and Francis, New York, NY, USA, 2007.

[7] Q. Zheng, "Stochastic multistage cancer models: a fresh look at an old approach," in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 2, World Scientific, River Edge, NJ, USA, 2008.

[8] W. Y. Tan, L. J. Zhang, W. Chen, and J. M. Zhu, "A stochastic model of human colon cancer involving multiple pathways," in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 11, World Scientific, River Edge, NJ, USA, 2008.

[9] W. Y. Tan and H. Zhou, "A new stochastic model of retinoblastoma involving both hereditary and non-hereditary cancer cases," Journal of Carcinogenesis and Mutagenesis, vol. 2, no. 2, article 117, 2011.

[10] X. W. Yan, Stochastic and State Space Models of Carcinogenesis Under Complex Situation [Ph.D. thesis], Department of Mathematical Sciences, University of Memphis, Memphis, Tenn, USA.

[11] G. L. Yang and C. W. Chen, "A stochastic two-stage carcinogenesis model: a new approach to computing the probability of observing tumor in animal bioassays," Mathematical Biosciences, vol. 104, no. 2, pp. 247–258, 1991.

[12] H. Osada and T. Takahashi, "Genetic alterations of multiple tumor suppressors and oncogenes in the carcinogenesis and progression of lung cancer," Oncogene, vol. 21, no. 48, pp. 7421–7434, 2002.

[13] I. I. Wistuba, L. Mao, and A. F. Gazdar, "Smoking molecular damage in bronchial epithelium," Oncogene, vol. 21, no. 48, pp. 7298–7306, 2002.

[14] S. Landreville, O. A. Agapova, and J. W. Harbour, "Emerging insights into the molecular pathogenesis of uveal melanoma," Future Oncology, vol. 4, no. 5, pp. 629–636, 2008.

[15] H. W. Mensink, D. Paridaens, and A. De Klein, "Genetics of uveal melanoma," Expert Review of Ophthalmology, vol. 4, no. 6, pp. 607–616, 2009.

[16] W. Y. Tan and X. W. Yan, "A new stochastic and state space model of human colon cancer incorporating multiple pathways," Biology Direct, vol. 5, article 26, 2010.

[17] L. B. Klebanov, S. T. Rachev, and A. Y. Yakovlev, "A stochastic model of radiation carcinogenesis: latent time distributions and their properties," Mathematical Biosciences, vol. 113, no. 1, pp. 51–75, 1993.

[18] A. Y. Yakovlev and A. D. Tsodikov, Stochastic Models of Tumor Latency and Their Biostatistical Applications, World Scientific, River Edge, NJ, USA, 1996.

[19] H. Fakir, W. Y. Tan, L. Hlatky, P. Hahnfeldt, and R. K. Sachs, "Stochastic population dynamic effects for lung cancer progression," Radiation Research, vol. 172, no. 3, pp. 383–393, 2009.

[20] A. G. Knudson, "Mutation and cancer: statistical study of retinoblastoma," Proceedings of the National Academy of Sciences of the United States of America, vol. 68, no. 4, pp. 820–823, 1971.

[21] J. F. Crow and M. Kimura, An Introduction to Population Genetics Theory, Harper and Row, New York, NY, USA, 1970.

[22] W. Y. Tan, Stochastic Models With Applications to Genetics, Cancers, AIDS and Other Biomedical Systems, World Scientific, River Edge, NJ, USA, 2002.

[23] W. Y. Tan, C. W. Chen, and L. J. Zhang, "Cancer risk assessment by state space models," in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 13, World Scientific, River Edge, NJ, USA, 2008.

[24] W. Y. Tan and C. W. Chen, "Stochastic modeling of carcinogenesis: some new insights," Mathematical and Computer Modelling, vol. 28, no. 11, pp. 49–71, 1998.

[25] W. Y. Tan and C. W. Chen, "Cancer stochastic models," in Encyclopedia of Statistical Sciences, Revised edition, John Wiley and Sons, New York, NY, USA, 2005.

[26] E. G. Luebeck and S. H. Moolgavkar, "Multistage carcinogenesis and the incidence of colorectal cancer," Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 23, pp. 15095–15100, 2002.

[27] R Durrett J Mayberry S Moseley and D Schmidt ldquoProba-bility models for cancer development and progressionrdquo GoogleSearch Google 2012

[28] G E P Box and G C Tiao Bayesian Inference in StatisticalAnalysis Addison-Wesley Reading Mass USA 1973


[29] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm (with discussion)," Journal of the Royal Statistical Society B, vol. 39, pp. 1–38, 1977.

[30] A. F. M. Smith and A. E. Gelfand, "Bayesian statistics without tears: a sampling-resampling perspective," American Statistician, vol. 46, pp. 84–88, 1992.

[31] A. E. Loercher and J. W. Harbour, "Molecular genetics of uveal melanoma," Current Eye Research, vol. 27, no. 2, pp. 69–74, 2003.

[32] C. S. Potten, C. Booth, and D. Hargreaves, "The small intestine as a model for evaluating adult tissue stem cell drug targets," Cell Proliferation, vol. 36, no. 3, pp. 115–129, 2003.

[33] C. J. Lynch and J. Milner, "Loss of one p53 allele results in four-fold reduction of p53 mRNA and protein: a basis for p53 haplo-insufficiency," Oncogene, vol. 25, no. 24, pp. 3463–3470, 2006.


Put $\mathbf{Y} = \{y_{ij},\ i = 0, 1,\ j = 1, \ldots, t_N\}$, $\mathbf{N} = \{n_{ij},\ i = 0, 1,\ j = 1, \ldots, t_N\}$, $\tilde{n} = \{n_j,\ j = 0, 1, \ldots, t_N\}$, and $\tilde{y} = \{y_j,\ j = 0, 1, \ldots, t_N\}$. From (37) and (40), we have for the conditional joint probability density function of $(\mathbf{Y}, \tilde{y})$ given $(\mathbf{N}, \tilde{n})$

$$P\{\mathbf{Y}, \tilde{y} \mid \mathbf{N}, \tilde{n}, \Theta\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} \left\{h[y_{2j}; n_{2j} Q_2(j)]\right\}^{1-\delta_{2k}} \times \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\}. \tag{41}$$

It follows that the joint conditional probability density function of $(\mathbf{N}, \mathbf{Y}, \tilde{y})$ given $(\tilde{n}, \Theta)$ is

$$P\{\mathbf{N}, \mathbf{Y}, \tilde{y} \mid \tilde{n}, \Theta\} = h(y_0; \chi(k)) \prod_{j=1}^{t_N} g(n_{0j}, n_{1j}; n_j, p_0, p_1) \times \left\{h[y_{2j}; n_{2j} Q_2(j)]\right\}^{1-\delta_{2k}} \times \prod_{i=0}^{1} h\{y_{ij}; n_{ij} Q_i(j)\}. \tag{42}$$

Notice that the above probability density function is a product of multinomial probability density functions and Poisson probability density functions. For this joint probability density function, the deviance $\mathrm{Dev} = -2\{\log P[\mathbf{Y}, \tilde{y}, \mathbf{N} \mid \tilde{n}, \Theta] - \log P[\mathbf{Y}, \tilde{y}, \mathbf{N} \mid \tilde{n}, \hat{\Theta}]\}$ is

$$\mathrm{Dev} = D_0(k) + \mathrm{Dev}(p_1, p_2) + \sum_{j=1}^{t_N} D_j, \tag{43}$$

where

$$D_0(k) = 2\left\{\chi(k) - y_0 - y_0 \log\frac{\chi(k)}{y_0}\right\}, \tag{44}$$

$$\mathrm{Dev}(p_1, p_2) = 2 \sum_{j=1}^{t_N} \left\{ n_{0j} \log\frac{\hat{p}_0}{1 - p_1 - p_2} + \sum_{i=1}^{2} n_{ij} \log\frac{\hat{p}_i}{p_i} \right\}, \tag{45}$$

$$D_j = 2 \sum_{i=0}^{1} \left\{ n_{ij} Q_i(j) - y_{ij} - y_{ij} \log\frac{n_{ij} Q_i(j)}{y_{ij}} \right\} + 2(1 - \delta_{2k}) \times \left\{ n_{2j} Q_2(j) - y_{2j} - y_{2j} \log\frac{n_{2j} Q_2(j)}{y_{2j}} \right\}, \tag{46}$$

where $\hat{p}_i = \left(\sum_{j=0}^{t_N} n_{ij}\right) / \left(\sum_{j=0}^{t_N} n_j\right)$ $(i = 0, 1, 2)$.

The joint probability density function $P\{\mathbf{Y}, \tilde{y}, \mathbf{N} \mid \tilde{n}, \Theta\}$ of $(\mathbf{Y}, \tilde{y}, \mathbf{N})$ given by (42) will be used as the kernel for the Bayesian method to estimate the unknown parameters and to predict the state variables.
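Each term of $D_0(k)$ and $D_j$ has the form of a Poisson deviance contribution, $2\{\mu - y - y\log(\mu/y)\}$ with mean $\mu = n_{ij} Q_i(j)$. The following is a minimal Python sketch of that arithmetic only; the function names and the numbers are hypothetical and are not taken from the paper or from the SEER data.

```python
import math

def poisson_deviance(y, mu):
    """Deviance contribution 2*{mu - y - y*log(mu/y)} of one Poisson count."""
    if mu <= 0:
        raise ValueError("the Poisson mean must be positive")
    if y == 0:
        # y*log(mu/y) tends to 0 as y tends to 0, so only the 2*mu term remains
        return 2.0 * mu
    return 2.0 * (mu - y - y * math.log(mu / y))

def d_j(y, n, q):
    """Sum the contributions over genotypes for one age group (the shape of D_j)."""
    return sum(poisson_deviance(yi, ni * qi) for yi, ni, qi in zip(y, n, q))

# Hypothetical numbers, chosen only to illustrate the arithmetic
y_obs = [3, 1]             # observed cases y_0j, y_1j
n_risk = [1.0e6, 1.0e3]    # people at risk n_0j, n_1j
q_prob = [3.0e-6, 1.0e-3]  # probabilities Q_0(j), Q_1(j)
print(d_j(y_obs, n_risk, q_prob))
```

A contribution vanishes when the observed count equals its mean and grows as the two diverge, which is why minimizing the deviance is the same as maximizing the Poisson part of the likelihood.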

4.4. Fitting of the Model to Cancer Incidence Data. To fit the model to real data, as in Tan [3–5], we let $\Delta t \sim 1$ correspond to a fixed time interval such as 6 months in human cancer studies (Tan et al. [4] assumed 3 months as one time unit, while Luebeck and Moolgavkar [26] assumed one year as one time unit). Then, because the proliferation rate of the last stage cells is quite large, one may practically assume $P_T(s, t) = 1$ for $t - s \ge 1$. Hence, noting that $\beta_{k-1}(t) = \beta_{k-1}$ is usually very small (see [3–5]), $Q_i(j)$ is approximated by

$$Q_i(j) \approx E\left\{ e^{-\beta_{k-1} G_i(t_{j-1})} - e^{-\beta_{k-1} G_i(t_j)} \right\} = e^{-\beta_{k-1} E[G_i(t_{j-1})]} - e^{-\beta_{k-1} E[G_i(t_j)]} + o(\beta_{k-1}), \tag{47}$$

where $G_i(t) = \sum_{s=t_0}^{t-1} J_{k-1}(s, i)$.

Under discrete time approximation, the $E[I_{k-1}(t, i)]$'s have been derived in the appendix. Using these results on expected numbers and using the result $\sum_{i=0}^{t-1} a^i = (a^t - 1)/(a - 1)$ for $a \neq 1$, we obtain

$$\beta_{k-1} E[G_i(t)] = \sum_{r=i}^{i+1} E[J_r(t_0, i)] \left( \prod_{u=r}^{k-1} \beta_u \right) \times \sum_{v=r}^{k-1} A_{r(k-1)}(v) \sum_{s=t_0}^{t-1} (1 + \gamma_v)^{s-t_0} = \theta_{(i+1)(k-1)} \phi_{(i+1)(k-1)}(t) + \lambda_{i(k-1)} \phi_{i(k-1)}(t), \tag{48}$$

where $\{\theta_{i(k-1)},\ i = 1, 2, 3\}$ and $\{\lambda_{i(k-1)},\ i = 0, 1, 2\}$ are defined in Section 3.3 and where the $\phi_{i(k-1)}(t)$ $(i = 0, 1, 2, 3)$'s are given by

$$\phi_{i(k-1)}(t) = \left( \prod_{u=i+1}^{k-1} \gamma_u \right) \sum_{v=i}^{k-1} A_{i(k-1)}(v) \times \frac{1}{\gamma_v} \left\{ (1 + \gamma_v)^{t-t_0} - 1 \right\}, \quad i = 0, 1, 2, 3. \tag{49}$$

Notice that if $\gamma_0 = 0$, then $\phi_{0(k-1)}(t)$ reduces to

$$\phi_{0(k-1)}(t) = \left( \prod_{u=1}^{k-1} \gamma_u \right) \sum_{v=1}^{k-1} A_{1(k-1)}(v) \times \frac{1}{\gamma_v^2} \left\{ (1 + \gamma_v)^{t-t_0} - 1 - (t - t_0) \gamma_v \right\}. \tag{50}$$
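The step from the inner sum over $s$ in (48) to the bracket $\{(1 + \gamma_v)^{t-t_0} - 1\}/\gamma_v$ in (49) is the geometric series identity. The sketch below checks it numerically in Python; the function names are illustrative only, and the $\gamma = 0$ branch returns the limiting value $t - t_0$.

```python
import math

def growth_sum(gamma, t, t0):
    """Direct sum of (1 + gamma)**(s - t0) for s = t0, ..., t - 1."""
    return sum((1.0 + gamma) ** (s - t0) for s in range(t0, t))

def growth_sum_closed(gamma, t, t0):
    """Closed form {(1 + gamma)**(t - t0) - 1} / gamma, with the gamma = 0 limit."""
    if gamma == 0.0:
        return float(t - t0)
    return ((1.0 + gamma) ** (t - t0) - 1.0) / gamma

# Hypothetical proliferation rate per time unit and 40 time steps
print(growth_sum(0.03, 41, 1), growth_sum_closed(0.03, 41, 1))
```

The two values agree up to floating-point rounding, so the closed form can replace the explicit sum whenever $\gamma_v \neq 0$.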

Applying these results to time homogeneous models with $\gamma_i \neq \gamma_j$ if $i \neq j$, the $Q_i(j)$'s under discrete approximation are given as follows.

(1) If $k = 2$, then $k - 1 = 1$. Hence, $Q_2(0) = 1$, $Q_i(0) = Q_2(j) = 0$ for $(i = 0, 1,\ j > 0)$, and for $j > 0$,

$$Q_0(j) \approx e^{-\theta_{11} \phi_{11}(t_{j-1}) - \lambda_{01} \phi_{01}(t_{j-1})} - e^{-\theta_{11} \phi_{11}(t_j) - \lambda_{01} \phi_{01}(t_j)} + o(\beta_1),$$

$$Q_1(j) \approx (1 - \alpha_1) \left\{ e^{-\lambda_{11} \phi_{11}(t_{j-1})} - e^{-\lambda_{11} \phi_{11}(t_j)} \right\} + o(\beta_1). \tag{51}$$

(2) If $k \ge 3$, then we have $Q_2(0) = \delta_{2(k-1)} \alpha$, $Q_i(0) = 0$, $i = 0, 1$, and for $j > 0$,

$$Q_0(j) \approx e^{-\theta_{1(k-1)} \phi_{1(k-1)}(t_{j-1}) - \lambda_{0(k-1)} \phi_{0(k-1)}(t_{j-1})} - e^{-\theta_{1(k-1)} \phi_{1(k-1)}(t_j) - \lambda_{0(k-1)} \phi_{0(k-1)}(t_j)} + o(\beta_{k-1}),$$

$$Q_1(j) \approx e^{-\theta_{2(k-1)} \phi_{2(k-1)}(t_{j-1}) - \lambda_{1(k-1)} \phi_{1(k-1)}(t_{j-1})} - e^{-\theta_{2(k-1)} \phi_{2(k-1)}(t_j) - \lambda_{1(k-1)} \phi_{1(k-1)}(t_j)} + o(\beta_{k-1}),$$

$$Q_2(j) \approx \delta_{2(k-1)} (1 - \alpha) \left\{ e^{-\lambda_{22} \phi_{22}(t_{j-1})} - e^{-\lambda_{22} \phi_{22}(t_j)} \right\} + (1 - \delta_{2(k-1)}) \left\{ e^{-\theta_{3(k-1)} \phi_{3(k-1)}(t_{j-1}) - \lambda_{2(k-1)} \phi_{2(k-1)}(t_{j-1})} - e^{-\theta_{3(k-1)} \phi_{3(k-1)}(t_j) - \lambda_{2(k-1)} \phi_{2(k-1)}(t_j)} \right\} + o(\beta_{k-1}). \tag{52}$$

Notice that if one replaces $[1 + \gamma_v]^{t-t_0}$ by $[1 + \gamma_v]^{t-t_0} = e^{(t-t_0) \log(1+\gamma_v)} \approx e^{(t-t_0)\gamma_v}$, the above $Q_i(j)$'s from the discrete time model are exactly the same as those from the continuous model, respectively, as given in equations (23)–(27) under the assumption that $P_T(s, t) = 1$ for $t - s > 0$. Notice that the assumption $P_T(s, t) = 1$ for $t - s > 0$ is equivalent to assuming that the last stage cancer cells grow instantaneously into cancer tumors as soon as they are generated; see Remark 1.
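How close the replacement of $(1 + \gamma_v)^{t-t_0}$ by $e^{(t-t_0)\gamma_v}$ is depends on $\gamma_v$ and on the number of time steps. The Python sketch below measures the relative gap; the function names are illustrative, and the rates and the horizon of 170 half-year steps are assumptions chosen to resemble the magnitudes discussed in this paper, not values used by the authors for this purpose.

```python
import math

def discrete_growth(gamma, steps):
    """Growth factor (1 + gamma)**steps of the discrete time model."""
    return (1.0 + gamma) ** steps

def continuous_growth(gamma, steps):
    """Growth factor exp(gamma * steps) of the continuous time model."""
    return math.exp(gamma * steps)

def relative_gap(gamma, steps):
    """Relative excess of the continuous factor over the discrete one."""
    d = discrete_growth(gamma, steps)
    return (continuous_growth(gamma, steps) - d) / d

# Assumed rates of order 1e-5 and 1e-2 per time unit, over 170 time units
for gamma in (2.7e-5, 3.4e-2):
    print(gamma, relative_gap(gamma, 170))
```

Because $\log(1 + \gamma) \le \gamma$, the continuous factor is never smaller than the discrete one, and for small $\gamma$ the gap behaves roughly like $(t - t_0)\gamma^2/2$, so it is negligible for very small rates and becomes visible for larger rates over long horizons.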

5. The Fitting of the Model to Cancer Incidence and the Generalized Bayesian Inference Procedure

Given the model in Sections 2 and 3 and cancer incidence data, one may use the results in Section 4 to fit the model. By using this model and the distribution results in Section 4, one can readily estimate the unknown genetic parameters, predict cancer incidence, and check the validity of the model by using the generalized Bayesian inference and Gibbs sampling procedures; for more detail, see Tan [3, 22] and Tan et al. [4, 5].

The generalized Bayesian inference is based on the posterior distribution $P\{\Theta \mid \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$ of $\Theta$ given $\{\mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$. This posterior distribution is derived by combining the prior distribution $P\{\Theta\}$ of $\Theta$ with the joint probability distribution $P\{\mathbf{N}, \mathbf{Y}, \tilde{y} \mid \tilde{n}, \Theta\}$ given $\{\tilde{n}, \Theta\}$ in (42). It follows that this inference procedure combines information from three sources: (1) previous information and experience about the parameters, in terms of the prior distribution $P\{\Theta\}$ of the parameters; (2) biological information on inherited cancer cases via genetic segregation of cancer genes in the population ($P\{\mathbf{N} \mid \tilde{n}, p_i, i = 1, 2\}$, see Section 2); and (3) information from the expanded data ($\mathbf{Y}$) and the observed data ($\tilde{y}$) via the statistical model from the system ($P\{\mathbf{Y}, \tilde{y} \mid \mathbf{N}, \Theta\}$ given by (37) and (40)). Because of the additional information from the genetic segregation of the cancer genes, this inference procedure provides an efficient way to extract information on the effects of the genotypes of individuals at the embryo stage.

5.1. The Prior Distribution of the Parameters. For the prior distribution of $\Theta$, because biological information has suggested some lower bounds and upper bounds for the mutation rates and for the proliferation rates, we assume

$$P(\Theta) \propto c \quad (c > 0), \tag{53}$$

where $c$ is a positive constant if these parameters satisfy some biologically specified constraints, and $P(\Theta)$ is equal to zero otherwise. These biological constraints are as follows.

(i) $0 < p_1 < 10^{-2}$, $0 < p_2 < 10^{-6}$, and $-0.01 < \gamma_i < 1$ $(i = 1, 2)$.

(ii) For $\beta_j$ $(j = 0, 1, \ldots, k - 1)$, we let $1 < \omega_0 = N(t_0)\beta_0 < 1000$ $(N \to I_1)$ and $10^{-8} < \beta_i < 10^{-3}$, $i = 1, \ldots, k - 1$.

(iii) For the $\lambda_{j(k-1)}$ $(j = 0, 1, 2)$, we let $0 < \lambda_i < 10$, $i = 0, 1$, and $0 < \lambda_2 < 10^{3}$.

(iv) For the $\theta_i$ $(i = 1, 2, 3)$, we let $0 < \theta_1 < 10^{-2}$ and $0 < \theta_2 < 10^{-1}$.

We will refer to the above prior as a partially informative prior, which may be considered as an extension of the traditional noninformative prior given in Box and Tiao [28].
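A prior that is constant inside a box and zero outside can be coded as a log-prior returning 0 inside the box and minus infinity outside. The Python sketch below does this for the bounds in (i), (iii), and (iv) as read from the constraints above; the dictionary keys and function name are illustrative, and constraint (ii) is left out because it involves $N(t_0)$, which is not a free parameter here.

```python
import math

# Box constraints (i), (iii) and (iv); key names are illustrative only
BOUNDS = {
    "p1": (0.0, 1e-2),
    "p2": (0.0, 1e-6),
    "gamma1": (-0.01, 1.0),
    "gamma2": (-0.01, 1.0),
    "lambda0": (0.0, 10.0),
    "lambda1": (0.0, 10.0),
    "lambda2": (0.0, 1e3),
    "theta1": (0.0, 1e-2),
    "theta2": (0.0, 1e-1),
}

def log_prior(params, bounds=BOUNDS):
    """Return 0.0 (log of a constant) inside the box and -inf outside it."""
    for name, (low, high) in bounds.items():
        if not (low < params[name] < high):
            return -math.inf
    return 0.0

# Hypothetical parameter values lying inside every bound
inside = {"p1": 1e-3, "p2": 5e-7, "gamma1": 1e-5, "gamma2": 3e-2,
          "lambda0": 0.3, "lambda1": 0.5, "lambda2": 600.0,
          "theta1": 3e-4, "theta2": 5e-2}
print(log_prior(inside))
```

Adding this log-prior to a log-likelihood leaves the likelihood unchanged inside the box and excludes every point outside it, which is the sense in which the posterior mode under this prior is a constrained likelihood maximum.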

5.2. The Posterior Distribution of the Parameters Given $\{\mathbf{Y}, \mathbf{N}, \tilde{y}, \tilde{n}\}$. Denote by $\Theta_1 = \Theta - \{p_1, p_2, \alpha\}$. From the posterior distribution $P\{\Theta \mid \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$, we obtain

$$P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\} \propto (\chi(k))^{y_0} e^{-\chi(k)}\, p_1^{\sum_{j=1}^{t_N} n_{1j}}\, p_2^{\sum_{j=1}^{t_N} n_{2j}} \times (1 - p_1 - p_2)^{\sum_{j=1}^{t_N} n_{0j}}, \quad 0 < p_1, p_2, \alpha < 1,$$

$$P\{\Theta_1 \mid (p_1, p_2, \alpha), \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\} \propto \prod_{j=1}^{t_N} \prod_{i=0}^{2} e^{-n_{ij} Q_i(j)} [Q_i(j)]^{y_{ij}}, \quad \Theta_1 \in \Omega, \tag{54}$$

where $\Omega$ is the parameter space of $\Theta_1$ provided by the biological constraints in Section 5.1.

For computational convenience, we notice that the log of $P\{p_1, p_2, \alpha \mid \Theta_1, \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$ is proportional to the negative of $D_0(k) + \mathrm{Dev}(p_1, p_2)$ given by (44)-(45); similarly, the log of $P\{\Theta_1 \mid (p_1, p_2, \alpha), \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$ is proportional to the negative of $\sum_{j=1}^{t_N} D_j$ given by (46).

5.3. The Multilevel Gibbs Sampling Procedure for Estimating Unknown Parameters. Given the posterior probability distributions, we will use the following multilevel Gibbs sampling procedure to derive estimates of the parameters. We notice that, numerically, the Gibbs sampling procedure given below is equivalent to the EM-algorithm from the sampling theory viewpoint, with Steps 1 and 2 as the $E$-Step and with Steps 3 and 4 as the $M$-Step, respectively [29]. These multilevel Gibbs sampling procedures are given by the following.

Step 1 (Generating $\mathbf{N}$ Given $(\mathbf{Y}, \tilde{y}, \tilde{n}, \Theta)$ (The Data-Augmentation Step 1)). Given $\Theta$ and given $\tilde{n}$, use the multinomial distribution of $\{n_{1j}, n_{2j}\}$ given $n_j$ in Section 3 to generate a large sample of $\mathbf{N}$. Then, combine this large sample with $P\{\mathbf{Y}, \tilde{y} \mid \mathbf{N}, \tilde{n}, \Theta\}$ in (37) and (40) to select $\mathbf{N}$ through the weighted bootstrap method due to Smith and Gelfand [30]. This selected $\mathbf{N}$ is then a sample from $P\{\mathbf{N} \mid \mathbf{Y}, \tilde{y}, \tilde{n}, \Theta\}$ even though the latter is unknown (for proof, see Tan [22], Chapter 3). Call the generated sample $\hat{\mathbf{N}}$.

Step 2 (Generating $\mathbf{Y}$ Given $(\mathbf{N}, \tilde{y}, \tilde{n}, \Theta)$ (The Data-Augmentation Step 2)). Given $\tilde{y}$, $\tilde{n}$, $\Theta$ and given $\mathbf{N} = \hat{\mathbf{N}}$ generated from Step 1, generate $\mathbf{Y}$ from the probability distribution $P\{\mathbf{Y} \mid \mathbf{N}, \tilde{y}, \tilde{n}, \Theta\}$ given by (36) and (38). Call the generated sample $\hat{\mathbf{Y}}$.

Step 3 (Estimation of $\{p_i, i = 1, 2, \alpha\}$ Given $\{\Theta_1, \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$). Given $\{\tilde{y}, \tilde{n}, \Theta_1\}$ and given $(\mathbf{N}, \mathbf{Y}) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}})$ from Steps 1 and 2, derive the posterior mode of $\{p_i, i = 1, 2, \alpha\}$ by maximizing the conditional posterior distribution $P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \hat{\mathbf{N}}, \hat{\mathbf{Y}}, \tilde{y}, \tilde{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $D_0(k) + \mathrm{Dev}(p_1, p_2)$ given by (44)-(45) in Section 4.3 under the constraints given in Section 5.1. Denote this generated mode by $\{\hat{p}_i, i = 1, 2, \hat{\alpha}\}$.

Step 4 (Estimation of $\Theta_1$ Given $\{p_i, i = 1, 2, \alpha, \mathbf{N}, \mathbf{Y}, \tilde{y}, \tilde{n}\}$). Given $(\tilde{y}, \tilde{n})$ and given $(\mathbf{N}, \mathbf{Y}, p_i, i = 1, 2, \alpha) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}}, \hat{p}_i, i = 1, 2, \hat{\alpha})$ from Steps 1–3, derive the posterior mode of $\Theta_1$ by maximizing the conditional posterior distribution $P\{\Theta_1 \mid \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\mathbf{N}}, \hat{\mathbf{Y}}, \tilde{y}, \tilde{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $\sum_{j=1}^{t_N} D_j$ in (46) under the constraints. Denote the generated mode as $\hat{\Theta}_1$.

Step 5 (Recycling Step). With $(\mathbf{N}, \mathbf{Y}, p_i, i = 1, 2, \alpha, \Theta_1) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}}, \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1)$ given above, go back to Step 1 and continue until convergence.
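The weighted bootstrap used in Step 1 can be sketched in a few lines of Python: draw candidates from a proposal distribution, weight each by the likelihood of the data given that candidate, and resample in proportion to the weights. The example below is a deliberately reduced toy with two genotypes and a single age group; the function names, the binomial proposal, and all numbers are hypothetical and are not the authors' implementation.

```python
import math
import random

def poisson_logpmf(y, mu):
    """Log of the Poisson probability of count y with mean mu."""
    if mu <= 0.0:
        return 0.0 if y == 0 else -math.inf
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def weighted_bootstrap(candidates, log_weights, size, rng):
    """Resample candidates with probability proportional to exp(log_weights)."""
    top = max(log_weights)
    if top == -math.inf:
        raise ValueError("every candidate has zero weight")
    weights = [math.exp(lw - top) for lw in log_weights]  # rescaled for stability
    return rng.choices(candidates, weights=weights, k=size)

def draw_n1(n_total, p1, y1, q1, n_candidates=500, seed=0):
    """Toy Step 1: one draw of n1 given the observed count y1."""
    rng = random.Random(seed)
    # Proposal: n1 ~ Binomial(n_total, p1), drawn as a sum of Bernoulli trials
    candidates = [sum(rng.random() < p1 for _ in range(n_total))
                  for _ in range(n_candidates)]
    # Weight each candidate by the Poisson likelihood of y1 with mean n1 * q1
    log_weights = [poisson_logpmf(y1, n1 * q1) for n1 in candidates]
    return weighted_bootstrap(candidates, log_weights, 1, rng)[0]

print(draw_n1(n_total=50, p1=0.3, y1=5, q1=0.2))
```

The resampled value is only approximately a draw from the conditional distribution, with the approximation improving as the number of candidates grows; that is the reason Step 1 asks for a large initial sample.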

The proof of convergence of the above steps can be derived by using the procedure given in Tan ([22], Chapter 3). At convergence, the $\hat{\Theta} = \{\hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1\}$ are the generated values from the posterior distribution of $\Theta$ given $(\tilde{y}, \tilde{n})$, independently of $(\mathbf{N}, \mathbf{Y})$ (for proof, see Tan [22], Chapter 3). Repeating the above procedures, one generates a random sample of $\Theta$ from the posterior distribution of $\Theta$ given $(\tilde{y}, \tilde{n})$; one then uses the sample mean as the estimate of $\Theta$ and the sample variances and covariances as estimates of the variances and covariances of these estimates.
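Once a sample of parameter vectors is available, the point estimates and their uncertainty follow from elementary summaries. A minimal Python sketch, assuming each draw is stored as a list of numbers (the function names and the three draws are hypothetical):

```python
def sample_mean(draws):
    """Component-wise mean of a list of equal-length parameter vectors."""
    n = len(draws)
    dim = len(draws[0])
    return [sum(d[i] for d in draws) / n for i in range(dim)]

def sample_covariance(draws):
    """Unbiased sample covariance matrix (divisor n - 1)."""
    n = len(draws)
    dim = len(draws[0])
    mean = sample_mean(draws)
    cov = [[0.0] * dim for _ in range(dim)]
    for d in draws:
        for a in range(dim):
            for b in range(dim):
                cov[a][b] += (d[a] - mean[a]) * (d[b] - mean[b])
    return [[c / (n - 1) for c in row] for row in cov]

# Three hypothetical draws of a two-parameter vector
draws = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
print(sample_mean(draws))
print(sample_covariance(draws))
```

The diagonal of the covariance matrix gives the variances whose square roots are reported as standard deviations of the estimates.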

6. A New Multistage Stochastic Model for Adult Eye Cancer (Uveal Melanoma): An Example

Human eye cancers consist of pediatric eye cancers and adult eye cancers. The most common pediatric eye cancer is retinoblastoma, which develops from the retinal pigment epithelium cells underlying the retina that do not form melanoma. The most common adult eye cancers are the uveal melanomas involving the iris, the ciliary body, and the choroid (collectively referred to as the uvea). These cancers develop from melanocytes (pigment cells) which reside within the uvea, giving color to the eye. In Tan and Zhou [9], we have developed a modified two-stage model for retinoblastoma. Based on results from molecular biology (see Landreville et al. [14], Mensink et al. [15], and Loercher and Harbour [31]), Landreville et al. [14] have proposed a three-stage model for uveal melanoma as given in Figure 2. As an example of applications of this paper, in this section we will apply this model of uveal melanoma to the NCI/NIH eye cancer data from the SEER project. We notice that the same methods can be applied to model other human cancers as well, but this will be our future research.

Given in Table 1 are the numbers of people at risk and the eye cancer cases in the age groups, together with the predicted cases from the models. These data give cancer incidence at birth and incidence for 85 age groups ($k = 85$), with each group spanning a 1-year period except the last age group ($\ge 85$ years old). For human eye cancer, because the incidence at birth and for age groups from 1 to 10 years old is basically generated by the pediatric eye cancer retinoblastoma (see [9]), to account for inherited cancer cases of uveal melanoma, the incidence for age 0 (birth) and for age periods from 1 to 10 years old in Table 1 for uveal melanoma is derived by subtracting the incidence of retinoblastoma from the SEER data (see Tan and Zhou [9]).

To fit the data, we let one time unit be 6 months after birth and let $t_0 = 1$. To compare different models and to assess different assumptions, we will consider the following two 3-stage mixture models: (1) the complete 3-stage mixture model (Model-F), in which no assumptions are made on the parameters; (2) the Type-1 3-stage mixture model (Model-1), in which we assume that $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth. For comparison purposes, we also fit a 2-stage model as defined in Tan and Zhou [9]. We will apply the methods in Section 5 to fit these models to the SEER data given in Table 1.


Table 2: The log-likelihood, AIC, and BIC of the fitted models.

| Models | Log-likelihood | AIC | BIC |
|---|---|---|---|
| Three-stage, Model-F | -1312.02 | 2648.04 | 2677.35 |
| Three-stage, Model-1 | -1295.76 | 2609.53 | 2631.51 |
| Two-stage | -4392.421 | 8796.843 | 8811.569 |

Figure 5: Curve fitting of SEER data by Model-F, Model-1, and the two-stage model (x-axis: age in years, 0 to 84; y-axis: incidences, 0 to 350; series: Observed, Model-F, Model-1, Two-stage).

Given in Table 2 are the natural logs of the likelihood functions, the AIC (Akaike Information Criterion), and the BIC (Bayesian Information Criterion) for these models. Given in Table 3 are the estimates of the parameters in the 3-stage models. Given in Figure 5 are the plots of predicted cancer cases from the 3-stage mixture models (Model-F and Model-1) and the 2-stage model. For comparison purposes, in Table 1 we also provide the numbers of predicted cancer cases from the 3-stage mixture models and the 2-stage model, together with the observed cancer cases over time from SEER. From these results, we have made the following observations.

(a) As shown by the results in Table 1 and Figure 5, both Model-F and Model-1 fitted the SEER data well, although Model-1 fitted the data slightly better judging from the values of AIC and BIC. The chi-square test statistics $\chi^2 = \sum_{i=0}^{85} ((y_i - \hat{y}_i)^2 / \hat{y}_i)$ for Model-F and Model-1 are 88.43 and 94.48, respectively, giving a $P$-value of 0.12 (df = 86 − 12 = 74) for Model-F and a $P$-value of 0.11 (df = 86 − 7 = 79) for Model-1. On the other hand, the 2-stage model fitted the data very poorly; the chi-square statistic for the 2-stage model is 2747.69, giving a $P$-value less than $10^{-3}$. The AIC and BIC values of Model-1 are (AIC = 2609.53, BIC = 2631.51), which are slightly smaller than those of Model-F; however, the AIC and BIC values (8796.84, 8811.57) of the two-stage model are considerably greater than those of the 3-stage models. These results suggest that uveal melanoma may best be described by a 3-stage model with an inherited component and that one may practically assume $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth.

(b) From Table 3, it is observed that the estimate of $\gamma_1$ is close to zero (the estimate is of order $10^{-5}$), indicating that the phenotype of $J_1$ is almost identical to that of $N = J_0$, further confirming that the staging-limiting genes are basically tumor suppressor genes and that there is no haploinsufficiency for these tumor suppressor genes. On the other hand, the estimate of $\gamma_2$ is of order $10^{-2}$, which is about $10^{3}$ times greater than that of cells with genotype $J_1$.

(c) From Table 3, the estimates $\hat{\lambda}_j$ $(j = 0, 1, 2)$ of $\lambda_j$ are of order $10^{-1}$, $10^{-1}$, and $10^{2} \sim 10^{3}$, respectively. Because $\lambda_i = E[J_i(t_0, i)]\, \beta_i \prod_{u=i+1}^{2} (\beta_u / \gamma_u)$, $i = 0, 1, 2$, assuming some values of $E[J_i(t_0, i)]$, $i = 0, 1, 2$, from some biological observations, one can have some rough ideas about the magnitude of $\beta_j$ $(j = 0, 1, 2)$. For example, if we follow Potten et al. [32] to assume $E[N(t_0)] = E[J_i(t_0, i)] \sim 10^{8}$, $i = 0, 1, 2$, then $\beta_j \approx 10^{-6} \sim 10^{-5}$.

(d) From Table 3, the estimates of $p_1$ and $p_2$ from the SEER data are of orders $10^{-4} \sim 10^{-3}$ and $10^{-7} \sim 10^{-6}$, respectively. This indicates that in the US population, the frequency of the staging-limiting cancer gene for uveal melanoma is approximately $10^{-3}$. Table 3 also shows that the estimate of $\alpha$ is 0.8411, indicating that most individuals with genotype $J_2$ would develop cancer at birth. This may help to explain why there are observed cancer incidences at birth for uveal melanoma in the SEER data even though the estimate of the frequency $p_2$ is of order $10^{-7} \sim 10^{-6}$.
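The model comparison quantities quoted in observation (a) follow from standard formulas: $\mathrm{AIC} = -2\log L + 2m$, $\mathrm{BIC} = -2\log L + m \log n$, and the Pearson statistic $\sum (y_i - \hat{y}_i)^2/\hat{y}_i$. The Python sketch below applies them; reading the Model-F log-likelihood as −1312.02, taking $m = 12$ parameters from the df statement, and taking $n = 85$ observations are assumptions made here for illustration, as are the function names and the two-point example.

```python
import math

def aic(log_lik, n_params):
    """Akaike Information Criterion: -2 log L + 2 m."""
    return -2.0 * log_lik + 2.0 * n_params

def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion: -2 log L + m log n."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

def pearson_chi_square(observed, predicted):
    """Pearson statistic: sum of (observed - predicted)**2 / predicted."""
    return sum((o - p) ** 2 / p for o, p in zip(observed, predicted))

# Assumed inputs for Model-F (see the lead-in for where they come from)
print(aic(-1312.02, 12))
print(bic(-1312.02, 12, 85))

# Hypothetical two-cell example of the chi-square arithmetic
print(pearson_chi_square([10, 20], [8.0, 25.0]))
```

Under these assumptions the two criteria come out at about 2648.04 and 2677.35, matching the Model-F row of Table 2 to two decimals, which supports the reading of the tabulated values used here.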

7. Discussion and Conclusions

To account for inherited cancer cases, in this paper we have developed some general multistage models involving hereditary cancer cases. For human cancer incidence, these models are basically generalized mixture models. In these mixture models, the mixing probability is a multinomial distribution to account for genetic segregation of the staging-limiting tumor suppressor genes. This mixture model allows us to estimate, for the first time, the frequency of the staging-limiting tumor suppressor gene in human populations. As an example of applications, in this paper we have developed a general 3-stage stochastic multistage model of carcinogenesis for adult human eye cancer. To account for inherited cancer cases in the stochastic model of human eye cancer, we have also developed a generalized mixture model for uveal melanoma in human beings.

To use the proposed models to fit the cancer incidence data, in this paper we have developed a generalized Bayesian inference procedure to estimate the unknown parameters and to predict cancer cases. This inference procedure is advantageous over the classical sampling theory inference (i.e., the maximum likelihood method) because the procedure combines information from three sources: (1) previous information and experience about the parameters, in terms of the prior distribution $P\{\Theta\}$ of the parameters; (2) biological information on inherited cancer cases via the genetic segregation of staging-limiting tumor suppressor genes in the population; and (3) information from the expanded data ($\mathbf{Y}$) and the observed data ($\tilde{y}$) via the statistical model from the system ($P\{\mathbf{Y}, \tilde{y} \mid \mathbf{N}, \Theta\}$ given by (37) and (40)).

Table 3: Estimates of parameters for the 3-stage stochastic models.

| Model | Statistic | $\lambda_0$ | $\lambda_1$ | $\lambda_2$ | $\gamma_1$ | $\gamma_2$ | $p_1$ | $p_2$ | $\alpha$ | $\theta_1$ | $\theta_2$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Model-F | Estimates | 2.88E-01 | 4.81E-01 | 6.55E+02 | 2.71E-05 | 3.37E-02 | 9.95E-04 | 9.68E-07 | 8.41E-01 | 3.29E-04 | 3.41E-01 |
| Model-F | StD | 1.21E-01 | 4.65E-02 | 1.12E+02 | 8.86E-06 | 1.92E-04 | 1.75E-06 | 6.48E-09 | 4.19E-02 | 3.54E-05 | 1.07E-02 |
| Model-F | 95% CL, Lower | 5.03E-02 | 3.89E-01 | 4.34E+02 | 5.34E-06 | 3.33E-02 | 9.91E-04 | 9.55E-07 | 7.59E-01 | 2.60E-04 | 3.20E-01 |
| Model-F | 95% CL, Upper | 5.25E-01 | 5.72E-01 | 8.75E+02 | 4.01E-05 | 3.41E-02 | 9.98E-04 | 9.81E-07 | 9.23E-01 | 3.98E-04 | 3.62E-01 |
| Model-1 | Estimates | 6.55E-02 | 3.72E-01 | 1.98E+02 | NA | 3.49E-02 | 9.98E-04 | 1.00E-06 | 8.07E-01 | NA | NA |
| Model-1 | StD | 2.54E-02 | 4.63E-02 | 3.38E+01 | NA | 3.58E-03 | 6.27E-05 | 4.53E-08 | 7.06E-02 | NA | NA |
| Model-1 | 95% CL, Lower | 1.57E-02 | 2.82E-01 | 1.32E+02 | NA | 2.79E-02 | 8.75E-04 | 9.11E-07 | 6.68E-01 | NA | NA |
| Model-1 | 95% CL, Upper | 1.15E-01 | 4.63E-01 | 2.65E+02 | NA | 4.19E-02 | 1.12E-03 | 1.09E-06 | 9.45E-01 | NA | NA |

Note: NA: assumed nonexistence.

To illustrate the usefulness and applications of our models and methods, we have applied them to the eye cancer SEER data of NCI/NIH. Our analysis clearly showed that the proposed 3-stage model with inherited cancer cases fitted the data nicely (see Table 2 and Figure 5); on the other hand, the classical 2-stage model cannot fit the data at all (see Table 2 and Figure 5). These results confirm results from molecular biology that human eye cancer is described by a 3-stage model with an inherited cancer component. Notice, however, that our 3-stage multistage model is more general than the classical 3-stage model as described in Little [1], Tan [2], and Zheng [7] in that we postulate that cancer tumors develop from primary $I_3$ cells by clonal expansion (see Yang and Chen [11]). (Note that the stochastic multistage models in the literature assume that cancer tumors develop from last stage cells immediately as soon as they are generated, ignoring cancer progression completely; see Remark 1.) As a matter of fact, we had assumed $\Delta t = 1$ for a period of three months and found that the 3-stage models then did not fit the SEER data, clearly indicating that $P_T(s, t) < 1$ over a period of three months.

Applying our models and methods to the SEER data

of human eye cancer, we have derived for the first time some useful pieces of information. Specifically, we mention the following. (1) For the first time, we have estimated the frequency of the staging-limiting tumor suppressor gene in the US population ($p_1 \sim 0.9948 \times 10^{-3}$). (2) With the estimate of $\alpha$ as $\hat{\alpha} = 0.8411$, the predicted number of uveal melanoma cases at birth is $y_0 = n_{01} p_2 = 36$ by the 3-stage models with inherited cancer component (Model-F and Model-1). (The observed number of eye cancer cases at birth is 34.) (3) The estimate of the proliferation rate ($\gamma_1$) of $J_1$ cells using Model-F is $\hat{\gamma}_1 = 2.271 \times 10^{-5} \sim 0$. (The estimate is $8.603 \times 10^{-5}$ using Model-1.) This confirms that the stage-1 limiting gene is a tumor suppressor gene and that, unlike the p53 gene in chromosome 17p (see [33]), there is little or no haploinsufficiency for this gene in cells with genotype $J_1$.

Using the models and methods of this paper, one can easily predict future cancer cases for human eye cancer. Thus, by comparing results from different populations, our models and methods can be used to assess cancer prevention and control procedures. This will be a topic of our future research; we will not go any further here.

Appendix

The Expected Numbers of State Variables under Discrete Time Approximation

Under discrete time, the stochastic differential equations of the state variables reduce to the following stochastic difference equations of the state variables, respectively:

$$
\begin{aligned}
J_i(t+1; i) &= J_i(t; i)\,[1 + \gamma_i(t)] + e_i(t+1; i), \qquad i = 0, 1, \ldots, \operatorname{Min}(k-1, 2), \\
J_j(t+1; u) &= J_j(t; u)\,[1 + \gamma_j(t)] + J_{j-1}(t; u)\,\beta_{j-1}(t) + e_j(t+1; u), \qquad 0 \le u < j \le k-1,
\end{aligned}
\tag{A1}
$$

where $e_i(t+1; i) = [B_i(t; i) - J_i(t; i)\,b_i(t)] - [D_i(t; i) - J_i(t; i)\,d_i(t)]$ and $e_j(t+1; i) = [B_j(t; i) - J_j(t; i)\,b_j(t)] - [D_j(t; i) - J_j(t; i)\,d_j(t)] + [M_{j-1}(t; i) - J_{j-1}(t; i)\,\beta_{j-1}(t)]$ for $j > i$.

The initial conditions at birth ($t_0$) for the above stochastic difference equations are $J_j(t_0; i) > 0$ if $j = i, i+1$ and $J_j(t_0; i) = 0$ if $j > i+1$. The solution of the above difference equations under these initial conditions is given, respectively, by

$$
\begin{aligned}
J_i(t; i) &= J_i(t_0; i) \prod_{s=t_0}^{t-1} (1 + \gamma_i(s)) + \sum_{s=t_0+1}^{t} e_i(s; i) \prod_{u=s}^{t-1} (1 + \gamma_i(u)), \qquad i = 0, 1, \ldots, \operatorname{Min}(k-1, 2), \\
J_j(t; i) &= J_j(t_0; i) \prod_{s=t_0}^{t-1} (1 + \gamma_j(s)) + \sum_{s=t_0}^{t-1} J_{j-1}(s; i)\,\beta_{j-1}(s) \prod_{u=s+1}^{t-1} (1 + \gamma_j(u)) \\
&\quad + \sum_{s=t_0+1}^{t} e_j(s; i) \prod_{u=s}^{t-1} (1 + \gamma_j(u)), \qquad 0 \le i < j \le k-1.
\end{aligned}
\tag{A2}
$$
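As an informal check of the solution above, the following minimal Python sketch iterates the difference equations in expectation and compares the result with the summation form of (A2) for constant rates. It assumes that the noise terms $e_j$ have expectation zero (as the form of (A4) suggests) and that time is counted from $t_0 = 0$; the function names and the numerical values in the usage note are hypothetical and are not estimates from the paper.

```python
def expected_by_recursion(j0, gamma, beta, steps):
    """Iterate E[J_j(t+1)] = E[J_j(t)](1 + gamma_j) + E[J_{j-1}(t)] beta_{j-1}.

    j0    -- expected compartment sizes at t0 (index 0 is the first stage)
    gamma -- constant proliferation rates, one per compartment
    beta  -- constant transition rates; beta[j-1] moves cells from j-1 to j
    """
    state = list(j0)
    history = [tuple(state)]
    for _ in range(steps):
        nxt = []
        for j, value in enumerate(state):
            # The first compartment receives no inflow.
            inflow = state[j - 1] * beta[j - 1] if j > 0 else 0.0
            nxt.append(value * (1.0 + gamma[j]) + inflow)
        state = nxt
        history.append(tuple(state))
    return history


def expected_by_sum(history, j, j0_j, gamma_j, beta_prev, t):
    """Summation form of (A2) in expectation for compartment j >= 1."""
    total = j0_j * (1.0 + gamma_j) ** t
    for s in range(t):
        # Cells entering at step s then grow for t - 1 - s further steps.
        total += history[s][j - 1] * beta_prev * (1.0 + gamma_j) ** (t - 1 - s)
    return total
```

For example, with `history = expected_by_recursion([100.0, 5.0, 0.0], [0.0, 0.01, 0.03], [1e-3, 1e-2], 20)`, the value `expected_by_sum(history, 2, 0.0, 0.03, 1e-2, 20)` should agree with `history[20][2]` up to floating-point rounding.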

If the model is time homogeneous so that $\beta_j(t) = \beta_j$, $\gamma_j(t) = \gamma_j$ ($j = i, \ldots, k-1$), and if $\gamma_i \ne \gamma_j$ if $i \ne j$, then the above solutions under the initial conditions ($J_j(t_0; i) = 0$, $j > i+1$) reduce to

$$
\begin{aligned}
J_i(t; i) &= J_i(t_0; i)\,(1 + \gamma_i)^{t - t_0} + \eta_i^{(0)}(t; i), \qquad i = 0, 1, \ldots, \operatorname{Min}(k-1, 2), \\
J_j(t; i) &= J_j(t_0; i)\,(1 + \gamma_j)^{t - t_0} + \sum_{u=i}^{j-1} J_u(t_0; i) \Bigl\{ \prod_{v=u}^{j-1} \beta_v \Bigr\} \sum_{r=u}^{j} A_{uj}(r)\,(1 + \gamma_r)^{t - t_0} \\
&\quad + \eta_j^{(0)}(t; i) + \sum_{u=1}^{j-i} \Bigl\{ \prod_{r=j-u}^{j-1} \beta_r \Bigr\} \eta_j^{(u)}(t; i) \\
&= \sum_{u=i}^{i+1} J_u(t_0; i) \Bigl\{ \prod_{v=u}^{j-1} \beta_v \Bigr\} \sum_{r=u}^{j} A_{uj}(r)\,(1 + \gamma_r)^{t - t_0} \\
&\quad + \eta_j^{(0)}(t; i) + \sum_{u=1}^{j-i} \Bigl\{ \prod_{r=j-u}^{j-1} \beta_r \Bigr\} \eta_j^{(u)}(t; i), \qquad 0 \le i < j \le k-1,
\end{aligned}
\tag{A3}
$$

where $\eta_j^{(0)}(t; i) = \sum_{s=t_0+1}^{t} e_j(s; i)\,(1 + \gamma_j)^{t-s}$ and $\eta_j^{(u)}(t; i) = \sum_{s=t_0+1}^{t} e_{j-u}(s; i) \sum_{v=u}^{j} A_{uj}(v)\,(1 + \gamma_v)^{t-s}$ ($u = 1, \ldots, j-i$).

Thus, if the model is time homogeneous and if $\gamma_i \ne \gamma_j$ if $i \ne j$, the $E[J_{k-1}(t; i)]$'s in discrete time models under the initial conditions ($J_j(t_0; i) = 0$, $j > i+1$) are given, respectively, by

$$
E[J_{k-1}(t; i)] = \sum_{u=i}^{i+1} E[J_u(t_0; i)] \Bigl( \prod_{v=u}^{k-2} \beta_v \Bigr) \times \sum_{r=u}^{k-1} A_{u(k-1)}(r)\,(1 + \gamma_r)^{t - t_0}, \qquad i = 0, 1, \ldots, \operatorname{Min}(k-1, 2).
\tag{A4}
$$

References

[1] M. P. Little, "Cancer models, ionization, and genomic instability: a review," in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 5, World Scientific, River Edge, NJ, USA, 2008.

[2] W. Y. Tan, Stochastic Models of Carcinogenesis, Marcel Dekker, New York, NY, USA, 1991.

[3] W. Y. Tan, "Stochastic multi-stage models of carcinogenesis as hidden Markov models: a new approach," International Journal of Systems and Synthetic Biology, vol. 1, pp. 313–337, 2010.

[4] W. Y. Tan, L. J. Zhang, and C. W. Chen, "Stochastic modeling of carcinogenesis: state space models and estimation of parameters," Discrete and Continuous Dynamical Systems B, vol. 4, no. 1, pp. 297–322, 2004.

[5] W. Y. Tan, C. W. Chen, and L. J. Zhang, "Cancer biology, cancer models and stochastic mathematical analysis of carcinogenesis," in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 3, World Scientific, River Edge, NJ, USA, 2008.

[6] R. A. Weinberg, The Biology of Human Cancer, Garland Sciences, Taylor and Francis, New York, NY, USA, 2007.

[7] Q. Zheng, "Stochastic multistage cancer models: a fresh look at an old approach," in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 2, World Scientific, River Edge, NJ, USA, 2008.

[8] W. Y. Tan, L. J. Zhang, W. Chen, and J. M. Zhu, "A stochastic model of human colon cancer involving multiple pathways," in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 11, World Scientific, River Edge, NJ, USA, 2008.

[9] W. Y. Tan and H. Zhou, "A new stochastic model of retinoblastoma involving both hereditary and non-hereditary cancer cases," Journal of Carcinogenesis and Mutagenesis, vol. 2, no. 2, article 117, 2011.

[10] X. W. Yan, Stochastic and State Space Models of Carcinogenesis Under Complex Situation [Ph.D. thesis], Department of Mathematical Sciences, University of Memphis, Memphis, Tenn, USA.

[11] G. L. Yang and C. W. Chen, "A stochastic two-stage carcinogenesis model: a new approach to computing the probability of observing tumor in animal bioassays," Mathematical Biosciences, vol. 104, no. 2, pp. 247–258, 1991.

[12] H. Osada and T. Takahashi, "Genetic alterations of multiple tumor suppressors and oncogenes in the carcinogenesis and progression of lung cancer," Oncogene, vol. 21, no. 48, pp. 7421–7434, 2002.

[13] I. I. Wistuba, L. Mao, and A. F. Gazdar, "Smoking molecular damage in bronchial epithelium," Oncogene, vol. 21, no. 48, pp. 7298–7306, 2002.

[14] S. Landreville, O. A. Agapova, and J. W. Harbour, "Emerging insights into the molecular pathogenesis of uveal melanoma," Future Oncology, vol. 4, no. 5, pp. 629–636, 2008.

[15] H. W. Mensink, D. Paridaens, and A. De Klein, "Genetics of uveal melanoma," Expert Review of Ophthalmology, vol. 4, no. 6, pp. 607–616, 2009.

[16] W. Y. Tan and X. W. Yan, "A new stochastic and state space model of human colon cancer incorporating multiple pathways," Biology Direct, vol. 5, article 26, 2010.

[17] L. B. Klebanov, S. T. Rachev, and A. Y. Yakovlev, "A stochastic model of radiation carcinogenesis: latent time distributions and their properties," Mathematical Biosciences, vol. 113, no. 1, pp. 51–75, 1993.

[18] A. Y. Yakovlev and A. D. Tsodikov, Stochastic Models of Tumor Latency and Their Biostatistical Applications, World Scientific, River Edge, NJ, USA, 1996.

[19] H. Fakir, W. Y. Tan, L. Hlatky, P. Hahnfeldt, and R. K. Sachs, "Stochastic population dynamic effects for lung cancer progression," Radiation Research, vol. 172, no. 3, pp. 383–393, 2009.

[20] A. G. Knudson, "Mutation and cancer: statistical study of retinoblastoma," Proceedings of the National Academy of Sciences of the United States of America, vol. 68, no. 4, pp. 820–823, 1971.

[21] J. F. Crow and M. Kimura, An Introduction to Population Genetics Theory, Harper and Row, New York, NY, USA, 1970.

[22] W. Y. Tan, Stochastic Models With Applications to Genetics, Cancers, AIDS and Other Biomedical Systems, World Scientific, River Edge, NJ, USA, 2002.

[23] W. Y. Tan, C. W. Chen, and L. J. Zhang, "Cancer risk assessment by state space models," in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 13, World Scientific, River Edge, NJ, USA, 2008.

[24] W. Y. Tan and C. W. Chen, "Stochastic modeling of carcinogenesis: some new insights," Mathematical and Computer Modelling, vol. 28, no. 11, pp. 49–71, 1998.

[25] W. Y. Tan and C. W. Chen, "Cancer stochastic models," in Encyclopedia of Statistical Sciences, Revised edition, John Wiley and Sons, New York, NY, USA, 2005.

[26] E. G. Luebeck and S. H. Moolgavkar, "Multistage carcinogenesis and the incidence of colorectal cancer," Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 23, pp. 15095–15100, 2002.

[27] R. Durrett, J. Mayberry, S. Moseley, and D. Schmidt, "Probability models for cancer development and progression," Google Search, Google, 2012.

[28] G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis, Addison-Wesley, Reading, Mass, USA, 1973.

[29] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm (with discussion)," Journal of the Royal Statistical Society B, vol. 39, pp. 1–38, 1977.

[30] A. F. M. Smith and A. E. Gelfand, "Bayesian statistics without tears: a sampling-resampling perspective," American Statistician, vol. 46, pp. 84–88, 1992.

[31] A. E. Loercher and J. W. Harbour, "Molecular genetics of uveal melanoma," Current Eye Research, vol. 27, no. 2, pp. 69–74, 2003.

[32] C. S. Potten, C. Booth, and D. Hargreaves, "The small intestine as a model for evaluating adult tissue stem cell drug targets," Cell Proliferation, vol. 36, no. 3, pp. 115–129, 2003.

[33] C. J. Lynch and J. Milner, "Loss of one p53 allele results in four-fold reduction of p53 mRNA and protein: a basis for p53 haplo-insufficiency," Oncogene, vol. 25, no. 24, pp. 3463–3470, 2006.


(1) If $k = 2$, then $k - 1 = 1$. Hence $Q_2(0) = 1$, $Q_i(0) = Q_2(j) = 0$ for ($i = 0, 1$; $j > 0$), and for $j > 0$,

$$
\begin{aligned}
Q_0(j) &\approx e^{-\theta_{11}\phi_{11}(t_{j-1}) - \lambda_{01}\phi_{01}(t_{j-1})} - e^{-\theta_{11}\phi_{11}(t_j) - \lambda_{01}\phi_{01}(t_j)} + o(\beta_1), \\
Q_1(j) &\approx (1 - \alpha_1) \bigl\{ e^{-\lambda_{11}\phi_{11}(t_{j-1})} - e^{-\lambda_{11}\phi_{11}(t_j)} \bigr\} + o(\beta_1).
\end{aligned}
\tag{51}
$$

(2) If $k \ge 3$, then we have $Q_2(0) = \delta_{2(k-1)}\,\alpha$, $Q_i(0) = 0$, $i = 0, 1$, and for $j > 0$,

$$
\begin{aligned}
Q_0(j) &\approx e^{-\theta_{1(k-1)}\phi_{1(k-1)}(t_{j-1}) - \lambda_{0(k-1)}\phi_{0(k-1)}(t_{j-1})} - e^{-\theta_{1(k-1)}\phi_{1(k-1)}(t_j) - \lambda_{0(k-1)}\phi_{0(k-1)}(t_j)} + o(\beta_{k-1}), \\
Q_1(j) &\approx e^{-\theta_{2(k-1)}\phi_{2(k-1)}(t_{j-1}) - \lambda_{1(k-1)}\phi_{1(k-1)}(t_{j-1})} - e^{-\theta_{2(k-1)}\phi_{2(k-1)}(t_j) - \lambda_{1(k-1)}\phi_{1(k-1)}(t_j)} + o(\beta_{k-1}), \\
Q_2(j) &\approx \delta_{2(k-1)}\,(1 - \alpha) \bigl\{ e^{-\lambda_{22}\phi_{22}(t_{j-1})} - e^{-\lambda_{22}\phi_{22}(t_j)} \bigr\} \\
&\quad + (1 - \delta_{2(k-1)}) \bigl\{ e^{-\theta_{3(k-1)}\phi_{3(k-1)}(t_{j-1}) - \lambda_{2(k-1)}\phi_{2(k-1)}(t_{j-1})} - e^{-\theta_{3(k-1)}\phi_{3(k-1)}(t_j) - \lambda_{2(k-1)}\phi_{2(k-1)}(t_j)} \bigr\} + o(\beta_{k-1}).
\end{aligned}
\tag{52}
$$

Notice that if one replaces $[1 + \gamma_v]^{t - t_0} = e^{(t - t_0)\log(1 + \gamma_v)}$ by its approximation $e^{(t - t_0)\gamma_v}$, the above $Q_i(j)$'s from the discrete time model are exactly the same as those from the continuous model, as given in equations (23)–(27), under the assumption that $P_T(s, t) = 1$ for $t - s > 0$. Notice that the assumption $P_T(s, t) = 1$ for $t - s > 0$ is equivalent to assuming that the last-stage cancer cells grow instantaneously into cancer tumors as soon as they are generated; see Remark 1.
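To get a feel for how good the approximation $(1 + \gamma)^{t - t_0} \approx e^{(t - t_0)\gamma}$ is, the following small Python sketch computes the relative error for a given rate and number of time steps. It is only an illustration: the function names are hypothetical, and the rates and step counts one feeds it are the reader's choice, not values prescribed by the paper.

```python
import math


def discrete_growth(gamma, steps):
    """Exact discrete-time growth factor (1 + gamma)^steps."""
    return (1.0 + gamma) ** steps


def continuous_growth(gamma, steps):
    """Continuous-time approximation exp(gamma * steps)."""
    return math.exp(gamma * steps)


def relative_error(gamma, steps):
    """Relative error of the exponential approximation."""
    exact = discrete_growth(gamma, steps)
    return abs(continuous_growth(gamma, steps) - exact) / exact
```

To leading order the relative error behaves like `steps * gamma**2 / 2`, so the approximation is very accurate for rates of order $10^{-5}$ and becomes noticeably coarser for rates of order $10^{-2}$ over many time steps.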

5. The Fitting of the Model to Cancer Incidence and the Generalized Bayesian Inference Procedure

Given the model in Sections 2 and 3 and cancer incidence data, one may use the results in Section 4 to fit the model. Using this model and the distribution results in Section 4, one can readily estimate the unknown genetic parameters, predict cancer incidence, and check the validity of the model by the generalized Bayesian inference and Gibbs sampling procedures; for more detail, see Tan [3, 22] and Tan et al. [4, 5].

The generalized Bayesian inference is based on the posterior distribution $P\{\Theta \mid \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$ of $\Theta$ given $\{\mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$. This posterior distribution is derived by combining the prior distribution $P\{\Theta\}$ of $\Theta$ with the joint probability distribution $P\{\mathbf{N}, \mathbf{Y}, \underset{\sim}{y} \mid \underset{\sim}{n}, \Theta\}$ given $\{\underset{\sim}{n}, \Theta\}$ in (42). It follows that this inference procedure combines information from three sources: (1) previous information and experience about the parameters, expressed through the prior distribution $P\{\Theta\}$ of the parameters; (2) biological information on inherited cancer cases, through the genetic segregation of cancer genes in the population ($P\{\mathbf{N} \mid \underset{\sim}{n}, p_i, i = 1, 2\}$; see Section 2); and (3) information from the expanded data ($\mathbf{Y}$) and the observed data ($\underset{\sim}{y}$), through the statistical model of the system, $P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \Theta\}$, given by (37) and (40). Because of the additional information from the genetic segregation of the cancer genes, this inference procedure provides an efficient way to extract information on the effects of the genotypes of individuals at the embryo stage.

5.1. The Prior Distribution of the Parameters. For the prior distribution of $\Theta$, because biological information has suggested some lower bounds and upper bounds for the mutation rates and for the proliferation rates, we assume

$$
P(\Theta) \propto c \quad (c > 0),
\tag{53}
$$

where $c$ is a positive constant if the parameters satisfy some biologically specified constraints, and the prior is equal to zero otherwise. These biological constraints are as follows.

(i) $0 < p_1 < 10^{-2}$, $0 < p_2 < 10^{-6}$, and $-0.01 < \gamma_i < 1$ ($i = 1, 2$).
(ii) For $\beta_j$ ($j = 0, 1, \ldots, k-1$), we let $1 < \omega_0 = N(t_0)\,\beta_0 < 1000$ ($N \to I_1$) and $10^{-8} < \beta_i < 10^{-3}$, $i = 1, \ldots, k-1$.
(iii) For the $\lambda_{j(k-1)}$ ($j = 0, 1, 2$), we let $0 < \lambda_i < 10$, $i = 0, 1$, and $0 < \lambda_2 < 10^{3}$.
(iv) For the $\theta_i$ ($i = 1, 2, 3$), we let $0 < \theta_1 < 10^{-2}$ and $0 < \theta_2 < 10^{-1}$.

We will refer to the above prior as a partially informative prior, which may be considered an extension of the traditional noninformative prior given in Box and Tiao [28].
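The partially informative prior above is flat inside a box and zero outside it, so its logarithm can be sketched as an indicator function. The Python sketch below transcribes the bounds in (i)-(iv) for the case $k = 3$; the dictionary keys and the function name are hypothetical choices made for this illustration, and the constant $c$ is taken as 1 because it does not affect the location of the posterior mode.

```python
import math

# Open-interval bounds transcribed from constraints (i)-(iv), assuming k = 3.
# The key names are hypothetical labels for the parameters of the model.
BOUNDS = {
    "p1": (0.0, 1e-2),
    "p2": (0.0, 1e-6),
    "gamma1": (-0.01, 1.0),
    "gamma2": (-0.01, 1.0),
    "omega0": (1.0, 1000.0),
    "beta1": (1e-8, 1e-3),
    "beta2": (1e-8, 1e-3),
    "lambda0": (0.0, 10.0),
    "lambda1": (0.0, 10.0),
    "lambda2": (0.0, 1e3),
    "theta1": (0.0, 1e-2),
    "theta2": (0.0, 1e-1),
}


def log_prior(theta):
    """Log of the flat prior: 0 inside the constraint box, -inf outside."""
    for name, (low, high) in BOUNDS.items():
        if not (low < theta[name] < high):
            return -math.inf
    return 0.0
```

Adding `log_prior(theta)` to a log-likelihood gives a log-posterior that is simply the log-likelihood inside the box, which is why maximizing the posterior under this prior reduces to constrained maximization of the likelihood.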

5.2. The Posterior Distribution of the Parameters Given $\{\mathbf{Y}, \mathbf{N}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Denote by $\Theta_1 = \Theta - \{p_1, p_2, \alpha\}$. From the posterior distribution $P\{\Theta \mid \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$, we obtain

$$
\begin{aligned}
P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}
&\propto (\chi(k))^{y_0}\, e^{-\chi(k)}\, p_1^{\sum_{j=1}^{t_N} n_{1j}}\, p_2^{\sum_{j=1}^{t_N} n_{2j}} \\
&\quad \times (1 - p_1 - p_2)^{\sum_{j=1}^{t_N} n_{0j}}, \qquad 0 < p_1, p_2, \alpha < 1, \\
P\{\Theta_1 \mid (p_1, p_2, \alpha), \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}
&\propto \prod_{j=1}^{t_N} \prod_{i=0}^{2} e^{-Q_i(j)}\, \{Q_i(j)\}^{y_{ij}}, \qquad \Theta_1 \in \Omega,
\end{aligned}
\tag{54}
$$

where $\Omega$ is the parameter space of $\Theta_1$ provided by the biological constraints in Section 5.1.

For computational convenience, we notice that the log of $P\{p_1, p_2, \alpha \mid \Theta_1, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$ is proportional to the negative of $D_0(k) + \mathrm{Dev}(p_1, p_2)$ given by (44)-(45); similarly, the log of $P\{\Theta_1 \mid (p_1, p_2, \alpha), \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$ is proportional to the negative of $\sum_{j=1}^{k} D_j$ given by (46).

5.3. The Multilevel Gibbs Sampling Procedure for Estimating Unknown Parameters. Given the posterior probability distributions, we use the following multilevel Gibbs sampling procedure to derive estimates of the parameters. We notice that, numerically, the Gibbs sampling procedure given below is equivalent to the EM algorithm from the sampling theory viewpoint, with Steps 1 and 2 as the E-step and Steps 3 and 4 as the M-step, respectively [29]. The multilevel Gibbs sampling procedure is as follows.

Step 1 (Generating $\mathbf{N}$ Given $(\mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta)$ (The Data-Augmentation Step 1)). Given $\Theta$ and given $\underset{\sim}{n}$, use the multinomial distribution of $\{n_{1j}, n_{2j}\}$ given $n_j$ in Section 3 to generate a large sample of $\mathbf{N}$. Then combine this large sample with $P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \underset{\sim}{n}, \Theta\}$ in (37) and (40) to select $\mathbf{N}$ through the weighted bootstrap method due to Smith and Gelfand [30]. This selected $\mathbf{N}$ is then a sample from $P\{\mathbf{N} \mid \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta\}$, even though the latter is unknown. (For proof, see Tan [22], Chapter 3.) Call the generated sample $\hat{\mathbf{N}}$.

Step 2 (Generating $\mathbf{Y}$ Given $(\mathbf{N}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta)$ (The Data-Augmentation Step 2)). Given $\{\underset{\sim}{y}, \underset{\sim}{n}, \Theta\}$ and given $\mathbf{N} = \hat{\mathbf{N}}$ generated from Step 1, generate $\mathbf{Y}$ from the probability distribution $P\{\mathbf{Y} \mid \hat{\mathbf{N}}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta\}$ given by (36) and (38). Call the generated sample $\hat{\mathbf{Y}}$.

Step 3 (Estimation of $\{p_i, i = 1, 2, \alpha\}$ Given $\{\Theta_1, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$). Given $\{\underset{\sim}{y}, \underset{\sim}{n}, \Theta_1\}$ and given $(\mathbf{N}, \mathbf{Y}) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}})$ from Steps 1 and 2, derive the posterior mode of $\{p_i, i = 1, 2, \alpha\}$ by maximizing the conditional posterior distribution $P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \hat{\mathbf{N}}, \hat{\mathbf{Y}}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $D_0(k) + \mathrm{Dev}(p_1, p_2)$ given by (44)-(45) in Section 4.3 under the constraints given in Section 5.1. Denote this generated mode by $\{\hat{p}_i, i = 1, 2, \hat{\alpha}\}$.

Step 4 (Estimation of $\Theta_1$ Given $\{p_i, i = 1, 2, \alpha, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$). Given $(\underset{\sim}{y}, \underset{\sim}{n})$ and given $(\mathbf{N}, \mathbf{Y}, p_i, i = 1, 2, \alpha) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}}, \hat{p}_i, i = 1, 2, \hat{\alpha})$ from Steps 1–3, derive the posterior mode of $\Theta_1$ by maximizing the conditional posterior distribution $P\{\Theta_1 \mid \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\mathbf{N}}, \hat{\mathbf{Y}}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $\sum_{j=1}^{k} D_j$ in (46) under the constraints. Denote the generated mode by $\hat{\Theta}_1$.

Step 5 (Recycling Step). With $(\mathbf{N}, \mathbf{Y}, p_i, i = 1, 2, \alpha, \Theta_1) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}}, \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1)$ given above, go back to Step 1 and continue until convergence.

The proof of convergence of the above steps can be derived by using the procedure given in Tan ([22], Chapter 3). At convergence, the $\hat{\Theta} = \{\hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1\}$ are generated values from the posterior distribution of $\Theta$ given $\{\underset{\sim}{y}, \underset{\sim}{n}\}$, independently of $(\mathbf{N}, \mathbf{Y})$ (for proof, see Tan [22], Chapter 3). Repeating the above procedure, one generates a random sample of $\Theta$ from the posterior distribution of $\Theta$ given $\{\underset{\sim}{y}, \underset{\sim}{n}\}$; one then uses the sample mean as the estimate of $\Theta$ and the sample variances and covariances as estimates of the variances and covariances of these estimates.
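The data-augmentation idea in Step 1 can be sketched in a few lines of Python: draw many candidate genotype counts from the multinomial segregation model, then resample them with probabilities proportional to a weight, in the spirit of the weighted bootstrap of Smith and Gelfand [30]. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, and `weight_fn` is a stand-in that the reader must supply for the likelihood $P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \underset{\sim}{n}, \Theta\}$ of (37) and (40), which is not reproduced here.

```python
import random


def draw_genotype_counts(n_j, p1, p2, rng):
    """One multinomial draw of (n0j, n1j, n2j) given n_j and (p1, p2)."""
    counts = [0, 0, 0]
    for _ in range(n_j):
        u = rng.random()
        if u < p1:
            counts[1] += 1
        elif u < p1 + p2:
            counts[2] += 1
        else:
            counts[0] += 1
    return tuple(counts)


def weighted_bootstrap(candidates, weight_fn, n_draws, rng):
    """Resample candidates with probability proportional to weight_fn."""
    weights = [weight_fn(c) for c in candidates]
    if sum(weights) <= 0.0:
        raise ValueError("all candidate weights are zero")
    return rng.choices(candidates, weights=weights, k=n_draws)
```

A typical use would build `candidates` with repeated calls to `draw_genotype_counts`, pass a likelihood-based `weight_fn`, and keep one resampled element as the generated sample for the current Gibbs cycle. Seeding `rng = random.Random(seed)` makes a run reproducible.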

6. A New Multistage Stochastic Model for Adult Eye Cancer (Uveal Melanoma): An Example

The human eye cancers consist of pediatric eye cancers and adult eye cancers. The most common pediatric eye cancer is retinoblastoma, which develops from the retinal pigment epithelium cells underlying the retina that do not form melanoma. The most common adult eye cancers are the uveal melanomas, involving the iris, the ciliary body, and the choroid (collectively referred to as the uvea). These cancers develop from melanocytes (pigment cells), which reside within the uvea and give color to the eye. In Tan and Zhou [9], we developed a modified two-stage model for retinoblastoma. Based on results from molecular biology (see Landreville et al. [14], Mensink et al. [15], and Loercher and Harbour [31]), Landreville et al. [14] have proposed a three-stage model for uveal melanoma, as given in Figure 2. As an example of applications of this paper, in this section we apply this model of uveal melanoma to the NCI/NIH eye cancer data from the SEER project. We notice that the same methods can be applied to model other human cancers as well, but this will be our future research.

Given in Table 1 are the numbers of people at risk and the eye cancer cases in the age groups, together with the predicted cases from the models. These data give cancer incidence at birth and incidence for 85 age groups ($k = 85$), with each group spanning a 1-year period except the last age group ($\ge 85$ years old). For human eye cancer, the incidence at birth and for age groups from 1 to 10 years old is basically generated by the pediatric eye cancer retinoblastoma (see [9]). To account for inherited cancer cases of uveal melanoma, the incidence for age 0 (birth) and for age periods from 1 to 10 years old in Table 1 for uveal melanoma is therefore derived by subtracting the incidence of retinoblastoma from the SEER data (see Tan and Zhou [9]).

To fit the data, we let one time unit be 6 months after birth and let $t_0 = 1$. To compare different models and to assess different assumptions, we consider the following two 3-stage mixture models: (1) the complete 3-stage mixture model (Model-F), in which no assumptions are made on the parameters; (2) the Type-1 3-stage mixture model (Model-1), in which we assume that $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth. For comparison purposes, we also fit a 2-stage model as defined in Tan and Zhou [9]. We apply the methods in Section 5 to fit these models to the SEER data given in Table 1.

Table 2: The log-likelihood, AIC, and BIC of the fitted models.

Models                  Log-likelihood    AIC         BIC
Three-stage, Model-F    -1312.02          2648.04     2677.35
Three-stage, Model-1    -1295.76          2609.53     2631.51
Two-stage               -4392.421         8796.843    8811.569
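The AIC and BIC columns of Table 2 follow from the log-likelihood by the standard definitions $\mathrm{AIC} = 2m - 2\log L$ and $\mathrm{BIC} = m \log n - 2\log L$, where $m$ is the number of free parameters and $n$ the number of observations. The Python sketch below encodes these two formulas. The paper does not state $m$ and $n$ alongside the table, so any values passed in are assumptions: for Model-F, $m = 12$ is the count implied by the degrees of freedom quoted in the text ($86 - 12$), and $n = 85$ is a guess based on the number of age groups.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2m - 2 log L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: m log(n) - 2 log L."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood
```

Under the assumptions above, `aic(-1312.02, 12)` and `bic(-1312.02, 12, 85)` come out close to the Model-F row of Table 2, which is a useful sanity check on the reading of the table. Smaller values of either criterion indicate a better trade-off between fit and model size.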

Figure 5: Curve fitting of SEER data by Model-F, Model-1, and the two-stage model. (Incidences plotted against age in years, from 0 to 84; series: Observed, Model-F, Model-1, Two-stage.)

Given in Table 2 are the natural logs of the likelihood functions, the AIC (Akaike Information Criterion), and the BIC (Bayesian Information Criterion) for these models. Given in Table 3 are the estimates of the parameters in the 3-stage models. Given in Figure 5 are the plots of predicted cancer cases from the 3-stage mixture models (Model-F and Model-1) and the 2-stage model. For comparison purposes, in Table 1 we also provide the numbers of predicted cancer cases from the 3-stage mixture models and the 2-stage model, together with the observed cancer cases over time from SEER. From these results, we have made the following observations.

(a) As shown by the results in Table 1 and Figure 5, both Model-F and Model-1 fitted the SEER data well, although Model-1 fitted the data slightly better judging from the values of AIC and BIC. The chi-square test statistics $\chi^2 = \sum_{i=0}^{85} ((y_i - \hat{y}_i)^2 / \hat{y}_i)$ for Model-F and Model-1 are 88.43 and 94.48, respectively, giving a $P$-value of 0.12 (df $= 86 - 12 = 74$) for Model-F and a $P$-value of 0.11 (df $= 86 - 7 = 79$) for Model-1. On the other hand, the 2-stage model fitted the data very poorly: the chi-square statistic for the 2-stage model is 2747.69, giving a $P$-value less than $10^{-3}$. The AIC and BIC values of Model-1 (AIC = 2609.53, BIC = 2631.51) are slightly smaller than those of Model-F; however, the AIC and BIC values (8796.84, 8811.57) of the two-stage model are considerably greater than those of the 3-stage models. These results suggest that uveal melanoma may best be described by a 3-stage model with an inherited component and that one may practically assume $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth.

(b) From Table 3, it is observed that the estimate of $\gamma_1$ is close to zero (the estimate is of order $10^{-5}$), indicating that the phenotype of $J_1$ is almost identical to that of $N = J_0$. This further confirms that the staging-limiting genes are basically tumor suppressor genes and that there is no haploinsufficiency for these tumor suppressor genes. On the other hand, the estimate of $\gamma_2$ is of order $10^{-2}$, which is about $10^{3}$ times greater than that of cells with genotype $J_1$.

(c) From Table 3, the estimates $\hat{\lambda}_j$ ($j = 0, 1, 2$) of $\lambda_j$ are of order $10^{-1}$, $10^{-1}$, and $10^{2} \sim 10^{3}$, respectively. Because $\lambda_i = E[J_i(t_0; i)]\,\beta_i \prod_{u=i+1}^{2} (\beta_u / \gamma_u)$, $i = 0, 1, 2$, assuming some values of $E[J_i(t_0; i)]$, $i = 0, 1, 2$, from biological observations, one can get a rough idea of the magnitude of $\beta_j$ ($j = 0, 1, 2$). For example, if we follow Potten et al. [32] and assume $E[N(t_0)] = E[J_i(t_0; i)] \sim 10^{8}$, $i = 0, 1, 2$, then $\beta_j \approx 10^{-6} \sim 10^{-5}$.

(d) From Table 3, the estimates of $p_1$ and $p_2$ from the SEER data are of orders $10^{-4} \sim 10^{-3}$ and $10^{-7} \sim 10^{-6}$, respectively. This indicates that, in the US population, the frequency of the staging-limiting cancer gene for uveal melanoma is approximately $10^{-3}$. Table 3 also shows that the estimate of $\alpha$ is 0.8411, indicating that most individuals with genotype $J_2$ would develop cancer at birth. This may help to explain why there are observed cancer incidences at birth for uveal melanoma in the SEER data even though the estimate of the frequency $p_2$ is of order $10^{-7} \sim 10^{-6}$.
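The goodness-of-fit computation in observation (a) can be sketched as follows in Python, using only the standard library. The first function is the Pearson statistic as written in (a). The second converts a statistic and its degrees of freedom into an upper-tail $P$-value using the Wilson-Hilferty normal approximation, which is a substitute chosen here to avoid external dependencies; the paper does not say how its $P$-values were computed, and the function names are hypothetical.

```python
import math


def pearson_chi_square(observed, predicted):
    """Sum of (y_i - yhat_i)^2 / yhat_i over all age groups."""
    return sum((o - p) ** 2 / p for o, p in zip(observed, predicted))


def chi_square_p_value(statistic, df):
    """Approximate upper-tail P-value (Wilson-Hilferty approximation)."""
    ratio = 2.0 / (9.0 * df)
    z = ((statistic / df) ** (1.0 / 3.0) - (1.0 - ratio)) / math.sqrt(ratio)
    # Upper tail of the standard normal distribution at z.
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

Applied to the reported statistics, `chi_square_p_value(88.43, 74)` and `chi_square_p_value(94.48, 79)` should land near the $P$-values of 0.12 and 0.11 quoted in (a). For exact values, a chi-square survival function from a statistics library would be the more usual choice.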

7. Discussion and Conclusions

To account for inherited cancer cases, in this paper we have developed some general multistage models involving hereditary cancer cases. For human cancer incidence, these models are basically generalized mixture models. In these mixture models, the mixing probability is a multinomial distribution that accounts for genetic segregation of the staging-limiting tumor suppressor genes. This mixture model allows us to estimate, for the first time, the frequency of the staging-limiting tumor suppressor gene in human populations. As an example of applications, in this paper we have developed a general 3-stage stochastic multistage model of carcinogenesis for adult human eye cancer. To account for inherited cancer cases in the stochastic model of human eye cancer, we have also developed a generalized mixture model for uveal melanoma in human beings.

To fit the cancer incidence data with the proposed models, in this paper we have developed a generalized Bayesian inference procedure to estimate the unknown parameters and to predict cancer cases. This inference procedure has an advantage over the classical sampling theory inference (i.e., the maximum likelihood method) because the procedure combines

Table 3: Estimates of parameters for the 3-stage stochastic models.

Model-F
Parameter     Estimate     StD          95% CL-Lower    95% CL-Upper
$\lambda_0$   2.88E-01     1.21E-01     5.03E-02        5.25E-01
$\lambda_1$   4.81E-01     4.65E-02     3.89E-01        5.72E-01
$\lambda_2$   6.55E+02     1.12E+02     4.34E+02        8.75E+02
$\gamma_1$    2.71E-05     8.86E-06     5.34E-06        4.01E-05
$\gamma_2$    3.37E-02     1.92E-04     3.33E-02        3.41E-02
$p_1$         9.95E-04     1.75E-06     9.91E-04        9.98E-04
$p_2$         9.68E-07     6.48E-09     9.55E-07        9.81E-07
$\alpha$      8.41E-01     4.19E-02     7.59E-01        9.23E-01
$\theta_1$    3.29E-04     3.54E-05     2.60E-04        3.98E-04
$\theta_2$    3.41E-01     1.07E-02     3.20E-01        3.62E-01

Model-1
Parameter     Estimate     StD          95% CL-Lower    95% CL-Upper
$\lambda_0$   6.55E-02     2.54E-02     1.57E-02        1.15E-01
$\lambda_1$   3.72E-01     4.63E-02     2.82E-01        4.63E-01
$\lambda_2$   1.98E+02     3.38E+01     1.32E+02        2.65E+02
$\gamma_1$    NA           NA           NA              NA
$\gamma_2$    3.49E-02     3.58E-03     2.79E-02        4.19E-02
$p_1$         9.98E-04     6.27E-05     8.75E-04        1.12E-03
$p_2$         1.00E-06     4.53E-08     9.11E-07        1.09E-06
$\alpha$      8.07E-01     7.06E-02     6.68E-01        9.45E-01
$\theta_1$    NA           NA           NA              NA
$\theta_2$    NA           NA           NA              NA

Note: NA, assumed nonexistence.

ISRN Biomathematics 17

information from three sources (1)previous information andexperiences about the parameters in terms of the prior distri-bution 119875Θ of the parameters (2) biological information ofinherited cancer cases via the genetic segregation of staging-limiting tumor suppressor genes in the population and (3)

information from the expanded data (Y) and the observeddata (

˜119910) via the statistical model from the system (119875Y

˜119910 |

N Θ) given by (37) and (40)To illustrate the usefulness and applications of ourmodels

and methods we have applied our models and methods tothe eye cancer SEER data of NCINIH Our analysis clearlyshowed that the proposed 3-stage model with inheritedcancer cases fitted the data nicely (see Table 2 and Figure 5)on the other hand the classical 2-stage model cannot fit thedata at all (see Table 2 and Figure 5)These results clearly haveconfirmed results frommolecular biology that the human eyecancer is derived by a 3-stage model with inherited cancercomponent Notice however our 3-stage multistage modelis more general than the classical 3-stage model as describedin Little [1] Tan [2] and Zheng [7] in that we postulatethat cancer tumors develop from primary 119868

3cells by clonal

expansion (see Yang and Chen [11]) (Note that the stochasticmultistagemodels in the literature assume that cancer tumorsdevelop from last stage cells immediately as soon as theyare generated ignoring completely cancer progression seeRemark 1) As a matter of fact we had assumed Δ119905 = 1 fora period of three months and found that the 3-stage modelsthen did not fit the SEER data clearly indicating that119875

119905(119904 119905) lt

1 over a period of three monthsApplying our models and methods to the SEER data

of human eye cancer we have derived for the first timesome useful pieces of information Specifically we mention(1) for the first time that we have estimated the frequencyof the staging-limiting tumor suppressor gene in the USpopulation (119901

1sim 9948 times 10

minus3) (2) With the estimate of 120572as = 08411 the predicted number of uveal melanoma atbirth is 119910

0= 119899

011199012= 36 by 3-stage models with inherited

cancer component (Model-F and Model-1) (The observednumber of eye cancer at birth is 34) (3) The estimateof the proliferation rate (120574

1) of 119869

1cells using Model-F is

1205741

= 2271 times 10minus5

sim 0 (The estimate is 8603 times 10minus5

using Model-1) This confirms that the stage-1 limitinggene is a tumor suppressor gene and unlike the p53 genein chromosome 17p (see [33]) there is little or no haploidinsufficiency for this gene in cells with genotype 119869

1

Using models and methods of this paper one can easilypredict future cancer cases for human eye cancer Thus bycomparing results from different populations our modelsand methods can be used to assess cancer prevention andcontrol procedures This will be our future research topicswe will not go any further here

Appendix

The Expected Numbers of State Variables under Discrete Time Approximation

Under discrete time, the stochastic differential equations of state variables reduce to the following stochastic difference equations of state variables, respectively:

$$
\begin{aligned}
J_i(t+1, i) &= J_i(t, i)\,[1 + \gamma_i(t)] + e_i(t+1, i), \quad i = 0, 1, \min(k-1, 2), \\
J_j(t+1, u) &= J_j(t, u)\,[1 + \gamma_j(t)] + J_{j-1}(t, u)\,\beta_{j-1}(t) + e_j(t+1, u), \quad 0 \le u < j \le k-1,
\end{aligned}
\tag{A1}
$$

where $e_i(t+1, i) = [B_i(t, i) - J_i(t, i)\,b_i(t)] - [D_i(t, i) - J_i(t, i)\,d_i(t)]$ and $e_j(t+1, i) = [B_j(t, i) - J_j(t, i)\,b_j(t)] - [D_j(t, i) - J_j(t, i)\,d_j(t)] + [M_{j-1}(t, i) - J_{j-1}(t, i)\,\beta_{j-1}(t)]$ for $j > i$.

The initial conditions at birth ($t_0$) for the above stochastic difference equations are $J_j(t_0, i) > 0$ if $j = i, i+1$, and $J_j(t_0, i) = 0$ if $j > i+1$. The solution of the above difference equations under these initial conditions is given, respectively, by

$$
\begin{aligned}
J_i(t, i) &= J_i(t_0, i) \prod_{s=t_0}^{t-1} (1 + \gamma_i(s)) + \sum_{s=t_0+1}^{t} e_i(s, i) \prod_{u=s}^{t-1} (1 + \gamma_i(u)), \quad i = 0, 1, \min(k-1, 2), \\
J_j(t, i) &= J_j(t_0, i) \prod_{s=t_0}^{t-1} (1 + \gamma_j(s)) + \sum_{s=t_0}^{t-1} J_{j-1}(s, i)\,\beta_{j-1}(s) \prod_{u=s+1}^{t-1} (1 + \gamma_j(u)) \\
&\quad + \sum_{s=t_0+1}^{t} e_j(s, i) \prod_{u=s}^{t-1} (1 + \gamma_j(u)), \quad 0 \le i < j \le k-1.
\end{aligned}
\tag{A2}
$$
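As a numerical check on (A2), the sketch below iterates the means of (A1) for the two cell types $J_i$ and $J_{i+1}$ and compares them with the closed-form sums. It is only an illustration under stated assumptions: the noise terms $e$ are taken to have zero expectation (as the form of (A4) suggests), and the function names and the rate functions in the usage lines are hypothetical, not values from the paper.

```python
def iterate_means(m_i0, m_j0, gamma_i, gamma_j, beta_i, t0, t):
    """Iterate E[J_i] and E[J_{i+1}] with (A1), noise terms set to zero."""
    m_i, m_j = m_i0, m_j0
    for s in range(t0, t):
        # Both right-hand sides use the values at time s (tuple assignment).
        m_i, m_j = (m_i * (1 + gamma_i(s)),
                    m_j * (1 + gamma_j(s)) + m_i * beta_i(s))
    return m_i, m_j


def prod(f, lo, hi):
    """Product of (1 + f(u)) for u = lo, ..., hi (an empty product is 1)."""
    out = 1.0
    for u in range(lo, hi + 1):
        out *= 1 + f(u)
    return out


def closed_form_means(m_i0, m_j0, gamma_i, gamma_j, beta_i, t0, t):
    """Evaluate the expectation of (A2) directly, with zero-mean noise."""
    def mean_i(s):
        return m_i0 * prod(gamma_i, t0, s - 1)

    m_j = m_j0 * prod(gamma_j, t0, t - 1) + sum(
        mean_i(s) * beta_i(s) * prod(gamma_j, s + 1, t - 1)
        for s in range(t0, t))
    return mean_i(t), m_j


# Hypothetical time-dependent rates, chosen only to exercise the formulas.
g_i = lambda s: 0.01 + 0.001 * s
g_j = lambda s: 0.03
b_i = lambda s: 1e-3

stepwise = iterate_means(100.0, 5.0, g_i, g_j, b_i, 1, 20)
direct = closed_form_means(100.0, 5.0, g_i, g_j, b_i, 1, 20)
```

The two results should agree up to floating-point rounding, which confirms that the index ranges of the sums and products in (A2) match the recursion (A1).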

If the model is time homogeneous so that $\beta_j(t) = \beta_j$, $\gamma_j(t) = \gamma_j$ ($j = i, \ldots, k-1$), and if $\gamma_i \neq \gamma_j$ if $i \neq j$, then the above solutions under the initial conditions ($J_j(t_0, i) = 0$, $j > i+1$) reduce to

$$
\begin{aligned}
J_i(t, i) &= J_i(t_0, i)\,(1 + \gamma_i)^{t-t_0} + \eta_i^{(0)}(t, i), \quad i = 0, 1, \min(k-1, 2), \\
J_j(t, i) &= J_j(t_0, i)\,(1 + \gamma_j)^{t-t_0} + \sum_{u=i}^{j-1} J_u(t_0, i) \Bigl\{\prod_{v=u}^{j-1} \beta_v\Bigr\} \sum_{r=u}^{j} A_{uj}(r)\,(1 + \gamma_r)^{t-t_0} \\
&\quad + \eta_j^{(0)}(t, i) + \sum_{u=1}^{j-i} \Bigl\{\prod_{r=j-u}^{j-1} \beta_r\Bigr\} \eta_j^{(u)}(t, i) \\
&= \sum_{u=i}^{i+1} J_u(t_0, i) \Bigl\{\prod_{v=u}^{j-1} \beta_v\Bigr\} \sum_{r=u}^{j} A_{uj}(r)\,(1 + \gamma_r)^{t-t_0} \\
&\quad + \eta_j^{(0)}(t, i) + \sum_{u=1}^{j-i} \Bigl\{\prod_{r=j-u}^{j-1} \beta_r\Bigr\} \eta_j^{(u)}(t, i), \quad 0 \le i < j \le k-1,
\end{aligned}
\tag{A3}
$$

where $\eta_j^{(0)}(t, i) = \sum_{s=t_0+1}^{t} e_j(s, i)\,(1 + \gamma_j)^{t-s}$ and $\eta_j^{(u)}(t, i) = \sum_{s=t_0+1}^{t} e_{j-u}(s, i) \sum_{v=u}^{j} A_{uj}(v)\,(1 + \gamma_v)^{t-s}$ ($u = 1, \ldots, j-i$).

Thus, if the model is time homogeneous and if $\gamma_i \neq \gamma_j$ if $i \neq j$, the $E[J_{k-1}(t, i)]$'s in discrete time models under the initial conditions ($J_j(t_0, i) = 0$, $j > i+1$) are given, respectively, by

$$
E[J_{k-1}(t, i)] = \sum_{u=i}^{i+1} E[J_u(t_0, i)] \Bigl(\prod_{v=u}^{k-2} \beta_v\Bigr) \times \sum_{r=u}^{k-1} A_{u(k-1)}(r)\,(1 + \gamma_r)^{t-t_0}, \quad i = 0, 1, \min(k-1, 2).
\tag{A4}
$$

References

[1] M. P. Little, “Cancer models, ionization, and genomic instability: a review,” in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 5, World Scientific, River Edge, NJ, USA, 2008.

[2] W. Y. Tan, Stochastic Models of Carcinogenesis, Marcel Dekker, New York, NY, USA, 1991.

[3] W. Y. Tan, “Stochastic multi-stage models of carcinogenesis as hidden Markov models: a new approach,” International Journal of Systems and Synthetic Biology, vol. 1, pp. 313–337, 2010.

[4] W. Y. Tan, L. J. Zhang, and C. W. Chen, “Stochastic modeling of carcinogenesis: state space models and estimation of parameters,” Discrete and Continuous Dynamical Systems B, vol. 4, no. 1, pp. 297–322, 2004.

[5] W. Y. Tan, C. W. Chen, and L. J. Zhang, “Cancer biology, cancer models and stochastic mathematical analysis of carcinogenesis,” in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 3, World Scientific, River Edge, NJ, USA, 2008.

[6] R. A. Weinberg, The Biology of Human Cancer, Garland Science, Taylor and Francis, New York, NY, USA, 2007.

[7] Q. Zheng, “Stochastic multistage cancer models: a fresh look at an old approach,” in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 2, World Scientific, River Edge, NJ, USA, 2008.

[8] W. Y. Tan, L. J. Zhang, W. Chen, and J. M. Zhu, “A stochastic model of human colon cancer involving multiple pathways,” in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 11, World Scientific, River Edge, NJ, USA, 2008.

[9] W. Y. Tan and H. Zhou, “A new stochastic model of retinoblastoma involving both hereditary and non-hereditary cancer cases,” Journal of Carcinogenesis and Mutagenesis, vol. 2, no. 2, article 117, 2011.

[10] X. W. Yan, Stochastic and State Space Models of Carcinogenesis Under Complex Situation [Ph.D. thesis], Department of Mathematical Sciences, University of Memphis, Memphis, Tenn, USA.

[11] G. L. Yang and C. W. Chen, “A stochastic two-stage carcinogenesis model: a new approach to computing the probability of observing tumor in animal bioassays,” Mathematical Biosciences, vol. 104, no. 2, pp. 247–258, 1991.

[12] H. Osada and T. Takahashi, “Genetic alterations of multiple tumor suppressors and oncogenes in the carcinogenesis and progression of lung cancer,” Oncogene, vol. 21, no. 48, pp. 7421–7434, 2002.

[13] I. I. Wistuba, L. Mao, and A. F. Gazdar, “Smoking molecular damage in bronchial epithelium,” Oncogene, vol. 21, no. 48, pp. 7298–7306, 2002.

[14] S. Landreville, O. A. Agapova, and J. W. Harbour, “Emerging insights into the molecular pathogenesis of uveal melanoma,” Future Oncology, vol. 4, no. 5, pp. 629–636, 2008.

[15] H. W. Mensink, D. Paridaens, and A. De Klein, “Genetics of uveal melanoma,” Expert Review of Ophthalmology, vol. 4, no. 6, pp. 607–616, 2009.

[16] W. Y. Tan and X. W. Yan, “A new stochastic and state space model of human colon cancer incorporating multiple pathways,” Biology Direct, vol. 5, article 26, 2010.

[17] L. B. Klebanov, S. T. Rachev, and A. Y. Yakovlev, “A stochastic model of radiation carcinogenesis: latent time distributions and their properties,” Mathematical Biosciences, vol. 113, no. 1, pp. 51–75, 1993.

[18] A. Y. Yakovlev and A. D. Tsodikov, Stochastic Models of Tumor Latency and Their Biostatistical Applications, World Scientific, River Edge, NJ, USA, 1996.

[19] H. Fakir, W. Y. Tan, L. Hlatky, P. Hahnfeldt, and R. K. Sachs, “Stochastic population dynamic effects for lung cancer progression,” Radiation Research, vol. 172, no. 3, pp. 383–393, 2009.

[20] A. G. Knudson, “Mutation and cancer: statistical study of retinoblastoma,” Proceedings of the National Academy of Sciences of the United States of America, vol. 68, no. 4, pp. 820–823, 1971.

[21] J. F. Crow and M. Kimura, An Introduction to Population Genetics Theory, Harper and Row, New York, NY, USA, 1970.

[22] W. Y. Tan, Stochastic Models With Applications to Genetics, Cancers, AIDS and Other Biomedical Systems, World Scientific, River Edge, NJ, USA, 2002.

[23] W. Y. Tan, C. W. Chen, and L. J. Zhang, “Cancer risk assessment by state space models,” in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 13, World Scientific, River Edge, NJ, USA, 2008.

[24] W. Y. Tan and C. W. Chen, “Stochastic modeling of carcinogenesis: some new insights,” Mathematical and Computer Modelling, vol. 28, no. 11, pp. 49–71, 1998.

[25] W. Y. Tan and C. W. Chen, “Cancer stochastic models,” in Encyclopedia of Statistical Sciences, Revised edition, John Wiley and Sons, New York, NY, USA, 2005.

[26] E. G. Luebeck and S. H. Moolgavkar, “Multistage carcinogenesis and the incidence of colorectal cancer,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 23, pp. 15095–15100, 2002.

[27] R. Durrett, J. Mayberry, S. Moseley, and D. Schmidt, “Probability models for cancer development and progression,” Google Search, Google, 2012.

[28] G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis, Addison-Wesley, Reading, Mass, USA, 1973.

[29] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm (with discussion),” Journal of the Royal Statistical Society B, vol. 39, pp. 1–38, 1977.

[30] A. F. M. Smith and A. E. Gelfand, “Bayesian statistics without tears: a sampling-resampling perspective,” American Statistician, vol. 46, pp. 84–88, 1992.

[31] A. E. Loercher and J. W. Harbour, “Molecular genetics of uveal melanoma,” Current Eye Research, vol. 27, no. 2, pp. 69–74, 2003.

[32] C. S. Potten, C. Booth, and D. Hargreaves, “The small intestine as a model for evaluating adult tissue stem cell drug targets,” Cell Proliferation, vol. 36, no. 3, pp. 115–129, 2003.

[33] C. J. Lynch and J. Milner, “Loss of one p53 allele results in four-fold reduction of p53 mRNA and protein: a basis for p53 haplo-insufficiency,” Oncogene, vol. 25, no. 24, pp. 3463–3470, 2006.


where $\Omega$ is the parameter space of $\Theta_1$ provided by the biological constraints in Section 5.1.

For computational convenience, we notice that the log of $P\{p_1, p_2, \alpha \mid \Theta_1, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$ is proportional to the negative of $D_0(k) + \mathrm{Dev}(p_1, p_2)$ given by (44)-(45); similarly, the log of $P\{\Theta_1 \mid (p_1, p_2, \alpha), \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$ is proportional to the negative of $\sum_{j=1}^{k} D_j$ given by (46).

5.3. The Multilevel Gibbs Sampling Procedure for Estimating Unknown Parameters. Given the posterior probability distributions, we will use the following multilevel Gibbs sampling procedure to derive estimates of the parameters. We notice that, numerically, the Gibbs sampling procedure given below is equivalent to the EM-algorithm from the sampling theory viewpoint, with Steps 1 and 2 as the $E$-Step and with Steps 3 and 4 as the $M$-Step, respectively [29]. The multilevel Gibbs sampling procedure is given by the following steps.

Step 1 (Generating $\mathbf{N}$ Given ($\mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta$) (The Data-Augmentation Step 1)). Given $\Theta$ and given $\underset{\sim}{n}$, use the multinomial distribution of $\{n_{1j}, n_{2j}\}$ given $n_j$ in Section 3 to generate a large sample of $\mathbf{N}$. Then, by combining this large sample with $P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \underset{\sim}{n}, \Theta\}$ in (37) and (40), select $\mathbf{N}$ through the weighted bootstrap method due to Smith and Gelfand [30]. This selected $\mathbf{N}$ is then a sample from $P\{\mathbf{N} \mid \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta\}$ even though the latter is unknown. (For proof, see Tan [22], Chapter 3.) Call the generated sample $\hat{\mathbf{N}}$.

Step 2 (Generating $\mathbf{Y}$ Given ($\mathbf{N}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta$) (The Data-Augmentation Step 2)). Given $\underset{\sim}{y}$, $\underset{\sim}{n}$, $\Theta$ and given $\mathbf{N} = \hat{\mathbf{N}}$ generated from Step 1, generate $\mathbf{Y}$ from the probability distribution $P\{\mathbf{Y} \mid \mathbf{N}, \underset{\sim}{y}, \underset{\sim}{n}, \Theta\}$ given by (36) and (38). Call the generated sample $\hat{\mathbf{Y}}$.

Step 3 (Estimation of $\{p_i, i = 1, 2, \alpha\}$ Given $\{\Theta_1, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$). Given $\underset{\sim}{y}$, $\underset{\sim}{n}$, $\Theta_1$ and given $(\mathbf{N}, \mathbf{Y}) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}})$ from Steps 1 and 2, derive the posterior mode of $\{p_i, i = 1, 2, \alpha\}$ by maximizing the conditional posterior distribution $P\{p_i, i = 1, 2, \alpha \mid \Theta_1, \hat{\mathbf{N}}, \hat{\mathbf{Y}}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $D_0(k) + \mathrm{Dev}(p_1, p_2)$ given by (44)-(45) in Section 4.3 under the constraints given in Section 5.1. Denote this generated mode by $\{\hat{p}_i, i = 1, 2, \hat{\alpha}\}$.

Step 4 (Estimation of $\Theta_1$ Given $\{p_i, i = 1, 2, \alpha, \mathbf{N}, \mathbf{Y}, \underset{\sim}{y}, \underset{\sim}{n}\}$). Given $(\underset{\sim}{y}, \underset{\sim}{n})$ and given $(\mathbf{N}, \mathbf{Y}, p_i, i = 1, 2, \alpha) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}}, \hat{p}_i, i = 1, 2, \hat{\alpha})$ from Steps 1-3, derive the posterior mode of $\Theta_1$ by maximizing the conditional posterior distribution $P\{\Theta_1 \mid \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\mathbf{N}}, \hat{\mathbf{Y}}, \underset{\sim}{y}, \underset{\sim}{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $\sum_{j=1}^{k} D_j$ in (46) under the constraints. Denote the generated mode as $\hat{\Theta}_1$.

Step 5 (Recycling Step). With $(\mathbf{N}, \mathbf{Y}, p_i, i = 1, 2, \alpha, \Theta_1) = (\hat{\mathbf{N}}, \hat{\mathbf{Y}}, \hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1)$ given above, go back to Step 1 and continue until convergence.
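The weighted bootstrap used in Step 1 can be sketched in a few lines. This is a minimal illustration of the sampling-resampling idea, not the authors' implementation: `draw_candidate` is a hypothetical stand-in for the multinomial draw of $\mathbf{N}$ given $\underset{\sim}{n}$, and `weight` is a hypothetical stand-in for the likelihood $P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \underset{\sim}{n}, \Theta\}$ evaluated at each candidate.

```python
import random


def weighted_bootstrap(draw_candidate, weight, n_candidates=1000, rng=None):
    """Select one draw by the weighted bootstrap (sampling then resampling).

    draw_candidate: callable(rng) -> one candidate from the proposal
                    distribution (hypothetical stand-in for the multinomial
                    draw of N given n).
    weight:         callable(candidate) -> nonnegative, unnormalized weight
                    (hypothetical stand-in for P{Y, y | N, n, Theta}).
    """
    rng = rng or random.Random()
    # Stage 1: draw a large sample from the proposal distribution.
    candidates = [draw_candidate(rng) for _ in range(n_candidates)]
    weights = [weight(c) for c in candidates]
    if sum(weights) <= 0:
        raise ValueError("all candidate weights are zero")
    # Stage 2: resample one candidate with probability proportional to weight.
    return rng.choices(candidates, weights=weights, k=1)[0]


# Toy usage: the proposal is uniform on {0, ..., 9}; the weight is nonzero
# only for even values, so only even values can be selected.
rng = random.Random(0)
picked = [
    weighted_bootstrap(lambda r: r.randrange(10),
                       lambda c: 1.0 if c % 2 == 0 else 0.0,
                       n_candidates=50, rng=rng)
    for _ in range(20)
]
```

The larger the first-stage sample, the closer the selected value is to a draw from the target distribution, which is why Step 1 asks for a large sample of $\mathbf{N}$ before resampling.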

The proof of convergence of the above steps can be derived by using the procedure given in Tan ([22], Chapter 3). At convergence, the $\hat{\Theta} = \{\hat{p}_i, i = 1, 2, \hat{\alpha}, \hat{\Theta}_1\}$ are the generated values from the posterior distribution of $\Theta$ given $\{\underset{\sim}{y}, \underset{\sim}{n}\}$ independently of $(\mathbf{N}, \mathbf{Y})$ (for proof, see Tan [22], Chapter 3). Repeating the above procedure, one then generates a random sample of $\Theta$ from the posterior distribution of $\Theta$ given $\{\underset{\sim}{y}, \underset{\sim}{n}\}$; one then uses the sample mean as the estimate of $\Theta$ and the sample variances and covariances as estimates of the variances and covariances of these estimates.

6. A New Multistage Stochastic Model for Adult Eye Cancer (Uveal Melanoma): An Example

The human eye cancers consist of pediatric eye cancers and adult eye cancers. The most common pediatric eye cancer is retinoblastoma, which develops from the retinal pigment epithelium cells underlying the retina that do not form melanoma. The most common adult eye cancers are the uveal melanomas, involving the iris, the ciliary body, and the choroid (collectively referred to as the uvea). These cancers develop from melanocytes (pigment cells), which reside within the uvea and give color to the eye. In Tan and Zhou [9], we have developed a modified two-stage model for retinoblastoma. Based on results from molecular biology (see Landreville et al. [14], Mensink et al. [15], and Loercher and Harbour [31]), Landreville et al. [14] have proposed a three-stage model for uveal melanoma, as given in Figure 2. As an example of applications of this paper, in this section we will apply this model of uveal melanoma to the NCI/NIH eye cancer data from the SEER project. We notice that the same methods can be applied to model other human cancers as well, but this will be our future research.

Given in Table 1 are the numbers of people at risk and the eye cancer cases in the age groups together with the predicted cases from the models. These data give cancer incidence at birth and incidence for 85 age groups ($k = 85$), with each group spanning a 1-year period except the last age group (≥85 years old). For human eye cancer, the incidence at birth and for age groups from 1 to 10 years old is basically generated by the pediatric eye cancer retinoblastoma (see [9]). Hence, to account for inherited cancer cases of uveal melanoma, the incidence for age 0 (birth) and for age periods from 1 to 10 years old in Table 1 for uveal melanoma is derived by subtracting the incidence of retinoblastoma from the SEER data (see Tan and Zhou [9]).

To fit the data, we let one time unit be 6 months after birth and let $t_0 = 1$. To compare different models and to assess different assumptions, we will consider the following two 3-stage mixture models: (1) the complete 3-stage mixture model (Model-F), in which no assumptions are made on the parameters; (2) the Type-1 3-stage mixture model (Model-1), in which we assume that $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth. For comparison purposes, we also fit a 2-stage model as defined in Tan and Zhou [9]. We will apply the methods in Section 5 to fit these models to the SEER data given in Table 1.

Table 2: The log-likelihood, AIC, and BIC of the fitted models.

Models                 Log-likelihood   AIC        BIC
Three-stage, Model-F   -1312.02         2648.04    2677.35
Three-stage, Model-1   -1295.76         2609.53    2631.51
Two-stage              -4392.421        8796.843   8811.569
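The AIC and BIC entries of Table 2 follow from the log-likelihoods through the standard definitions AIC = 2k - 2 log L and BIC = k log n - 2 log L. The sketch below checks the Model-F row under two assumptions that are not stated explicitly with the table: the Model-F log-likelihood is read as -1312.02 with 12 free parameters (the count used for the degrees of freedom, 86 - 12, in the chi-square test), and the sample size n is taken to be 85, the number of age groups.

```python
import math


def aic(log_lik, n_params):
    """Akaike Information Criterion: 2*k - 2*log(L)."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion: k*log(n) - 2*log(L)."""
    return n_params * math.log(n_obs) - 2 * log_lik


# Model-F: log-likelihood read from Table 2, 12 parameters (assumed from df = 86 - 12).
# n_obs = 85 is an assumption (the number of age groups, k = 85).
model_f_aic = aic(-1312.02, 12)
model_f_bic = bic(-1312.02, 12, 85)
```

Because both criteria penalize the parameter count, a model with a slightly lower log-likelihood can still be preferred if it uses fewer parameters; in Table 2 Model-1 has the higher log-likelihood as well as the smaller AIC and BIC.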

Figure 5: Curve fitting of SEER data by Model-F, Model-1, and the two-stage model. (Plot of incidences against age in years; series: Observed, Model-F, Model-1, Two-stage.)

Given in Table 2 are the natural logs of the likelihood functions, the AIC (Akaike Information Criterion), and the BIC (Bayesian Information Criterion) for these models. Given in Table 3 are the estimates of parameters in the 3-stage models. Given in Figure 5 are the plots of predicted cancer cases from the 3-stage mixture models (Model-F and Model-1) and the 2-stage model. For comparison purposes, in Table 1 we also provide the numbers of predicted cancer cases from the 3-stage mixture models and the 2-stage model together with the observed cancer cases over time from SEER. From these results, we have made the following observations.

(a) As shown by the results in Table 1 and Figure 5, both Model-F and Model-1 fitted the SEER data well, although Model-1 fitted the data slightly better judging from the values of AIC and BIC. The chi-square test statistics $\chi^2 = \sum_{i=0}^{85} (y_i - \hat{y}_i)^2 / \hat{y}_i$ for Model-F and Model-1 are 88.43 and 94.48, respectively, giving a $P$-value of 0.12 (df = 86 − 12 = 74) for Model-F and a $P$-value of 0.11 (df = 86 − 7 = 79) for Model-1. On the other hand, the 2-stage model fitted the data very poorly: the chi-square statistic for the 2-stage model is 2747.69, giving a $P$-value less than $10^{-3}$. The AIC and BIC values of Model-1 (AIC = 2609.53, BIC = 2631.51) are slightly smaller than those of Model-F; however, the AIC and BIC values of the two-stage model (8796.84, 8811.57) are considerably greater than those of the 3-stage models. These results suggest that uveal melanoma may best be described by a 3-stage model with inherited component and that one may practically assume $\gamma_1 = 0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth.

(b) From Table 3, it is observed that the estimate of $\gamma_1$ is close to zero (the estimate is of order $10^{-5}$), indicating that the phenotype of $J_1$ is almost identical to that of $N = J_0$. This further confirms that the staging-limiting genes are basically tumor suppressor genes and that there is no haploinsufficiency for these tumor suppressor genes. On the other hand, the estimate of $\gamma_2$ is of order $10^{-2}$, which is about $10^{3}$ times greater than that of cells with genotype $J_1$.

(c) From Table 3, the estimates $\hat{\lambda}_j$ ($j = 0, 1, 2$) of $\lambda_j$ are of order $10^{-1}$, $10^{-1}$, and $10^{2} \sim 10^{3}$, respectively. Because $\lambda_i = E[J_i(t_0, i)]\,\beta_i \prod_{u=i+1}^{2} (\beta_u / \gamma_u)$, $i = 0, 1, 2$, assuming some values of $E[J_i(t_0, i)]$, $i = 0, 1, 2$, from some biological observations, one can have some rough ideas about the magnitude of $\beta_j$ ($j = 0, 1, 2$). For example, if we follow Potten et al. [32] to assume ($E[N(t_0)] = E[J_i(t_0, i)] \sim 10^{8}$, $i = 0, 1, 2$), then $\beta_j \approx 10^{-6} \sim 10^{-5}$.

(d) From Table 3, the estimates of $p_1$ and $p_2$ from the SEER data are of orders $10^{-4} \sim 10^{-3}$ and $10^{-7} \sim 10^{-6}$, respectively. This indicates that, in the US population, the frequency of the staging-limiting cancer gene for uveal melanoma is approximately $10^{-3}$. Table 3 also showed that the estimate of $\alpha$ was 0.8411, indicating that most individuals with genotype $J_2$ would develop cancer at birth. This may help to explain why there are observed cancer incidences at birth for uveal melanoma in the SEER data even though the estimate of the frequency $p_2$ is of order $10^{-7} \sim 10^{-6}$.
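The order-of-magnitude claim for $\beta_j$ in observation (c) can be checked with a short calculation. The sketch below is illustrative only and rests on two assumptions: the relation in (c) is read as $\lambda_i = E[J_i(t_0, i)]\,\beta_i \prod_{u=i+1}^{2} (\beta_u/\gamma_u)$, and $E[J_i(t_0, i)]$ is set to exactly $10^{8}$ for every $i$. The $\lambda$ and $\gamma$ values are the Model-F estimates from Table 3; the function name `solve_betas` is hypothetical.

```python
# Model-F estimates from Table 3.
lam = {0: 2.88e-1, 1: 4.81e-1, 2: 6.55e2}
gamma = {1: 2.71e-5, 2: 3.37e-2}
n_cells = 1e8  # assumed value of E[J_i(t0, i)] for i = 0, 1, 2


def solve_betas(lam, gamma, n_cells):
    """Back out beta_2, beta_1, beta_0 from lambda_i, starting at the last stage."""
    beta = {}
    for i in (2, 1, 0):
        # Product of beta_u / gamma_u over u = i+1, ..., 2 (empty product is 1).
        ratio = 1.0
        for u in range(i + 1, 3):
            ratio *= beta[u] / gamma[u]
        beta[i] = lam[i] / (n_cells * ratio)
    return beta


beta = solve_betas(lam, gamma, n_cells)
```

Under these assumptions all three rates fall between $10^{-6}$ and $10^{-4}$, in line with the rough magnitude quoted in (c).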


Appendix

The Expected Numbers of State Variables under Discrete Time Approximation

Under discrete time, the stochastic differential equations of state variables reduce to the following stochastic difference equations of state variables, respectively:

\[
\begin{aligned}
J_i(t+1, i) &= J_i(t, i)\,[1 + \gamma_i(t)] + e_i(t+1, i), \quad i = 0, 1, \ldots, \mathrm{Min}(k-1, 2), \\
J_j(t+1, u) &= J_j(t, u)\,[1 + \gamma_j(t)] + J_{j-1}(t, u)\,\beta_{j-1}(t) + e_j(t+1, u), \quad 0 \le u < j \le k-1,
\end{aligned}
\tag{A1}
\]

where $e_i(t+1, i) = [B_i(t, i) - J_i(t, i)\,b_i(t)] - [D_i(t, i) - J_i(t, i)\,d_i(t)]$ and $e_j(t+1, i) = [B_j(t, i) - J_j(t, i)\,b_j(t)] - [D_j(t, i) - J_j(t, i)\,d_j(t)] + [M_{j-1}(t, i) - J_{j-1}(t, i)\,\beta_{j-1}(t)]$ for $j > i$.
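As a reading aid, and under an assumption that this appendix does not spell out: if, given the current counts, the numbers of births, deaths, and mutations ($B_j$, $D_j$, $M_{j-1}$) have conditional means $J_j b_j$, $J_j d_j$, and $J_{j-1}\beta_{j-1}$, then each bracket in the noise terms has mean zero. Taking expectations in (A1) would then leave a deterministic recursion, sketched here for $j > i$:

```latex
\[
E[e_j(t+1, i)] = 0,
\qquad
E[J_j(t+1, i)] = E[J_j(t, i)]\,[1 + \gamma_j(t)] + E[J_{j-1}(t, i)]\,\beta_{j-1}(t).
\]
```

This is consistent with the fact that the noise terms do not appear in the expected numbers given in (A4).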

The initial conditions at birth ($t_0$) for the above stochastic difference equations are $J_j(t_0, i) > 0$ if $j = i, i+1$ and $J_j(t_0, i) = 0$ if $j > i+1$. The solutions of the above difference equations under these initial conditions are given, respectively, by

\[
\begin{aligned}
J_i(t, i) &= J_i(t_0, i) \prod_{s=t_0}^{t-1} (1 + \gamma_i(s)) + \sum_{s=t_0+1}^{t} e_i(s, i) \prod_{u=s}^{t-1} (1 + \gamma_i(u)), \quad i = 0, 1, \ldots, \mathrm{Min}(k-1, 2), \\
J_j(t, i) &= J_j(t_0, i) \prod_{s=t_0}^{t-1} (1 + \gamma_j(s)) + \sum_{s=t_0}^{t-1} J_{j-1}(s, i)\,\beta_{j-1}(s) \prod_{u=s+1}^{t-1} (1 + \gamma_j(u)) \\
&\quad + \sum_{s=t_0+1}^{t} e_j(s, i) \prod_{u=s}^{t-1} (1 + \gamma_j(u)), \quad 0 \le i < j \le k-1.
\end{aligned}
\tag{A2}
\]

If the model is time homogeneous so that $\beta_j(t) = \beta_j$, $\gamma_j(t) = \gamma_j$ ($j = i, \ldots, k-1$), and if $\gamma_i \neq \gamma_j$ if $i \neq j$, then the above solutions under the initial conditions ($J_j(t_0, i) = 0$, $j > i+1$) reduce to

\[
\begin{aligned}
J_i(t, i) &= J_i(t_0, i)\,(1 + \gamma_i)^{t - t_0} + \eta_i^{(0)}(t, i), \quad i = 0, 1, \ldots, \mathrm{Min}(k-1, 2), \\
J_j(t, i) &= J_j(t_0, i)\,(1 + \gamma_j)^{t - t_0} + \sum_{u=i}^{j-1} J_u(t_0, i) \Bigl( \prod_{v=u}^{j-1} \beta_v \Bigr) \sum_{r=u}^{j} A_{uj}(r)\,(1 + \gamma_r)^{t - t_0} \\
&\quad + \eta_j^{(0)}(t, i) + \sum_{u=1}^{j-i} \Bigl( \prod_{r=j-u}^{j-1} \beta_r \Bigr) \eta_j^{(u)}(t, i) \\
&= \sum_{u=i}^{i+1} J_u(t_0, i) \Bigl( \prod_{v=u}^{j-1} \beta_v \Bigr) \sum_{r=u}^{j} A_{uj}(r)\,(1 + \gamma_r)^{t - t_0} \\
&\quad + \eta_j^{(0)}(t, i) + \sum_{u=1}^{j-i} \Bigl( \prod_{r=j-u}^{j-1} \beta_r \Bigr) \eta_j^{(u)}(t, i), \quad 0 \le i < j \le k-1,
\end{aligned}
\tag{A3}
\]

where $\eta_j^{(0)}(t, i) = \sum_{s=t_0+1}^{t} e_j(s, i)\,(1 + \gamma_j)^{t-s}$ and $\eta_j^{(u)}(t, i) = \sum_{s=t_0+1}^{t} e_{j-u}(s, i) \sum_{v=u}^{j} A_{uj}(v)\,(1 + \gamma_v)^{t-s}$ ($u = 1, \ldots, j-i$).

Thus, if the model is time homogeneous and if $\gamma_i \neq \gamma_j$ if $i \neq j$, the $E[J_{k-1}(t, i)]$'s in discrete time models under the initial conditions ($J_j(t_0, i) = 0$, $j > i+1$) are given, respectively, by

\[
E[J_{k-1}(t, i)] = \sum_{u=i}^{i+1} E[J_u(t_0, i)] \Bigl( \prod_{v=u}^{k-2} \beta_v \Bigr) \times \sum_{r=u}^{k-1} A_{u(k-1)}(r)\,(1 + \gamma_r)^{t - t_0}, \quad i = 0, 1, \ldots, \mathrm{Min}(k-1, 2).
\tag{A4}
\]
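The expected numbers above can be checked numerically. The following is a minimal sketch, assuming the noise terms $e_j$ have zero mean (as the absence of $\eta$ terms in (A4) suggests) and time-homogeneous rates. It iterates the expectation of (A1) for two adjacent compartments and compares the result with the expectation of the sum form in (A2). The function names and all numerical values are hypothetical illustrations, not quantities from the paper.

```python
import math


def expected_counts(j0, gamma, beta, steps):
    """Iterate the expectation of (A1) with constant rates.

    j0    : initial expected counts, e.g. [E J_i(t0), E J_{i+1}(t0)]
    gamma : per-compartment proliferation rates, same length as j0
    beta  : transition rates; beta[m] feeds compartment m + 1
    Returns a list of tuples, one per time step from t0 to t0 + steps.
    """
    counts = [float(x) for x in j0]
    history = [tuple(counts)]
    for _ in range(steps):
        updated = []
        for m, current in enumerate(counts):
            # Inflow by mutation from the previous compartment, if any.
            inflow = counts[m - 1] * beta[m - 1] if m > 0 else 0.0
            updated.append(current * (1.0 + gamma[m]) + inflow)
        counts = updated
        history.append(tuple(counts))
    return history


def second_compartment_sum(j0, gamma, beta, n):
    """Expectation of (A2) for the second compartment after n steps."""
    growth = j0[1] * (1.0 + gamma[1]) ** n
    inflow = sum(
        j0[0] * (1.0 + gamma[0]) ** s * beta[0] * (1.0 + gamma[1]) ** (n - 1 - s)
        for s in range(n)
    )
    return growth + inflow


# Hypothetical illustrative values, not estimates from the paper.
J0 = [1000.0, 10.0]
GAMMA = [0.01, 0.03]
BETA = [1e-3]
N_STEPS = 20

history = expected_counts(J0, GAMMA, BETA, N_STEPS)
direct = second_compartment_sum(J0, GAMMA, BETA, N_STEPS)
print(history[-1][1], direct, math.isclose(history[-1][1], direct, rel_tol=1e-9))
```

The step-by-step recursion and the summed form should agree up to floating-point rounding. The closed form with the coefficients $A_{uj}(r)$ is not reproduced here because those coefficients are defined elsewhere in the paper.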

References

[1] M. P. Little, "Cancer models, ionization, and genomic instability: a review," in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 5, World Scientific, River Edge, NJ, USA, 2008.

[2] W. Y. Tan, Stochastic Models of Carcinogenesis, Marcel Dekker, New York, NY, USA, 1991.

[3] W. Y. Tan, "Stochastic multi-stage models of carcinogenesis as hidden Markov models: a new approach," International Journal of Systems and Synthetic Biology, vol. 1, pp. 313–337, 2010.

[4] W. Y. Tan, L. J. Zhang, and C. W. Chen, "Stochastic modeling of carcinogenesis: state space models and estimation of parameters," Discrete and Continuous Dynamical Systems B, vol. 4, no. 1, pp. 297–322, 2004.

[5] W. Y. Tan, C. W. Chen, and L. J. Zhang, "Cancer biology, cancer models and stochastic mathematical analysis of carcinogenesis," in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 3, World Scientific, River Edge, NJ, USA, 2008.

[6] R. A. Weinberg, The Biology of Human Cancer, Garland Sciences, Taylor and Francis, New York, NY, USA, 2007.

[7] Q. Zheng, "Stochastic multistage cancer models: a fresh look at an old approach," in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 2, World Scientific, River Edge, NJ, USA, 2008.

[8] W. Y. Tan, L. J. Zhang, W. Chen, and J. M. Zhu, "A stochastic model of human colon cancer involving multiple pathways," in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 11, World Scientific, River Edge, NJ, USA, 2008.

[9] W. Y. Tan and H. Zhou, "A new stochastic model of retinoblastoma involving both hereditary and non-hereditary cancer cases," Journal of Carcinogenesis and Mutagenesis, vol. 2, no. 2, article 117, 2011.

[10] X. W. Yan, Stochastic and State Space Models of Carcinogenesis Under Complex Situation [Ph.D. thesis], Department of Mathematical Sciences, University of Memphis, Memphis, Tenn, USA.

[11] G. L. Yang and C. W. Chen, "A stochastic two-stage carcinogenesis model: a new approach to computing the probability of observing tumor in animal bioassays," Mathematical Biosciences, vol. 104, no. 2, pp. 247–258, 1991.

[12] H. Osada and T. Takahashi, "Genetic alterations of multiple tumor suppressors and oncogenes in the carcinogenesis and progression of lung cancer," Oncogene, vol. 21, no. 48, pp. 7421–7434, 2002.

[13] I. I. Wistuba, L. Mao, and A. F. Gazdar, "Smoking molecular damage in bronchial epithelium," Oncogene, vol. 21, no. 48, pp. 7298–7306, 2002.

[14] S. Landreville, O. A. Agapova, and J. W. Harbour, "Emerging insights into the molecular pathogenesis of uveal melanoma," Future Oncology, vol. 4, no. 5, pp. 629–636, 2008.

[15] H. W. Mensink, D. Paridaens, and A. De Klein, "Genetics of uveal melanoma," Expert Review of Ophthalmology, vol. 4, no. 6, pp. 607–616, 2009.

[16] W. Y. Tan and X. W. Yan, "A new stochastic and state space model of human colon cancer incorporating multiple pathways," Biology Direct, vol. 5, article 26, 2010.

[17] L. B. Klebanov, S. T. Rachev, and A. Y. Yakovlev, "A stochastic model of radiation carcinogenesis: latent time distributions and their properties," Mathematical Biosciences, vol. 113, no. 1, pp. 51–75, 1993.

[18] A. Y. Yakovlev and A. D. Tsodikov, Stochastic Models of Tumor Latency and Their Biostatistical Applications, World Scientific, River Edge, NJ, USA, 1996.

[19] H. Fakir, W. Y. Tan, L. Hlatky, P. Hahnfeldt, and R. K. Sachs, "Stochastic population dynamic effects for lung cancer progression," Radiation Research, vol. 172, no. 3, pp. 383–393, 2009.

[20] A. G. Knudson, "Mutation and cancer: statistical study of retinoblastoma," Proceedings of the National Academy of Sciences of the United States of America, vol. 68, no. 4, pp. 820–823, 1971.

[21] J. F. Crow and M. Kimura, An Introduction to Population Genetics Theory, Harper and Row, New York, NY, USA, 1970.

[22] W. Y. Tan, Stochastic Models With Applications to Genetics, Cancers, AIDS and Other Biomedical Systems, World Scientific, River Edge, NJ, USA, 2002.

[23] W. Y. Tan, C. W. Chen, and L. J. Zhang, "Cancer risk assessment by state space models," in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 13, World Scientific, River Edge, NJ, USA, 2008.

[24] W. Y. Tan and C. W. Chen, "Stochastic modeling of carcinogenesis: some new insights," Mathematical and Computer Modelling, vol. 28, no. 11, pp. 49–71, 1998.

[25] W. Y. Tan and C. W. Chen, "Cancer stochastic models," in Encyclopedia of Statistical Sciences, Revised edition, John Wiley and Sons, New York, NY, USA, 2005.

[26] E. G. Luebeck and S. H. Moolgavkar, "Multistage carcinogenesis and the incidence of colorectal cancer," Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 23, pp. 15095–15100, 2002.

[27] R. Durrett, J. Mayberry, S. Moseley, and D. Schmidt, "Probability models for cancer development and progression," Google Search, Google, 2012.

[28] G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis, Addison-Wesley, Reading, Mass, USA, 1973.

[29] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm (with discussion)," Journal of the Royal Statistical Society B, vol. 39, pp. 1–38, 1977.

[30] A. F. M. Smith and A. E. Gelfand, "Bayesian statistics without tears: a sampling-resampling perspective," American Statistician, vol. 46, pp. 84–88, 1992.

[31] A. E. Loercher and J. W. Harbour, "Molecular genetics of uveal melanoma," Current Eye Research, vol. 27, no. 2, pp. 69–74, 2003.

[32] C. S. Potten, C. Booth, and D. Hargreaves, "The small intestine as a model for evaluating adult tissue stem cell drug targets," Cell Proliferation, vol. 36, no. 3, pp. 115–129, 2003.

[33] C. J. Lynch and J. Milner, "Loss of one p53 allele results in four-fold reduction of p53 mRNA and protein: a basis for p53 haplo-insufficiency," Oncogene, vol. 25, no. 24, pp. 3463–3470, 2006.


Table 2: The log-likelihood, AIC, and BIC of the fitted models.

Models                 Log-likelihood   AIC        BIC
Three-stage, Model-F   −1312.02         2648.04    2677.35
Three-stage, Model-1   −1295.76         2609.53    2631.51
Two-stage              −4392.421        8796.843   8811.569

Figure 5: Curve fitting of SEER data by Model-F, Model-1, and the two-stage model. [Plot of incidences (0 to 350) against age in years (0 to 84) for the observed data, Model-F, Model-1, and the two-stage model.]

Given in Table 2 are the natural logs of the likelihood functions, the AIC (Akaike Information Criterion), and the BIC (Bayesian Information Criterion) for these models. Given in Table 3 are the estimates of the parameters in the 3-stage models. Given in Figure 5 are the plots of predicted cancer cases from the 3-stage mixture models (Model-F and Model-1) and the 2-stage model. For comparison purposes, in Table 1 we also provide the numbers of predicted cancer cases from the 3-stage mixture models and the 2-stage model, together with the observed cancer cases over time from SEER. From these results, we have made the following observations.

(a) As shown by the results in Table 1 and Figure 5, both Model-F and Model-1 fitted the SEER data well, although Model-1 fitted the data slightly better judging from the values of AIC and BIC. The Chi-square test statistics $\chi^2 = \sum_{i=0}^{85} (y_i - \hat{y}_i)^2 / \hat{y}_i$ for Model-F and Model-1 are 88.43 and 94.48, respectively, giving a P-value of 0.12 (df = 86 − 12 = 74) for Model-F and a P-value of 0.11 (df = 86 − 7 = 79) for Model-1. On the other hand, the 2-stage model fitted the data very poorly; the Chi-square statistic for the 2-stage model is 2747.69, giving a P-value less than $10^{-3}$. The AIC and BIC values of Model-1 are (AIC = 2609.53, BIC = 2631.51), which are slightly smaller than those of Model-F; however, the AIC and BIC values (8796.84, 8811.57) of the two-stage model are considerably greater than those of the 3-stage models. These results suggest that uveal melanoma may best be described by a 3-stage model with an inherited component, that one may practically assume $\gamma_1 = 0$, and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth.

(b) From Table 3, the estimate of $\gamma_1$ is close to zero (of order $10^{-5}$), indicating that the phenotype of $J_1$ is almost identical to that of $N = J_0$. This further confirms that the staging-limiting genes are basically tumor suppressor genes and that there is no haploinsufficiency for these tumor suppressor genes. On the other hand, the estimate of $\gamma_2$ is of order $10^{-2}$, which is about $10^{3}$ times greater than that of cells with genotype $J_1$.

(c) From Table 3, the estimates $\hat{\lambda}_j$ ($j = 0, 1, 2$) of $\lambda_j$ are of order $10^{-1}$, $10^{-1}$, and $10^{2} \sim 10^{3}$, respectively. Because $\lambda_i = E[J_i(t_0, i)]\,\beta_i \prod_{u=i+1}^{2} (\beta_u / \gamma_u)$, $i = 0, 1, 2$, assuming some values of $E[J_i(t_0, i)]$, $i = 0, 1, 2$, from biological observations, one can have some rough idea of the magnitude of $\beta_j$ ($j = 0, 1, 2$). For example, if we follow Potten et al. [32] and assume $E[N(t_0)] = E[J_i(t_0, i)] \sim 10^{8}$, $i = 0, 1, 2$, then $\beta_j \approx 10^{-6} \sim 10^{-5}$.

(d) From Table 3, the estimates of $p_1$ and $p_2$ from the SEER data are of orders $10^{-4} \sim 10^{-3}$ and $10^{-7} \sim 10^{-6}$, respectively. This indicates that in the US population, the frequency of the staging-limiting cancer gene for uveal melanoma is approximately $10^{-3}$. Table 3 also shows that the estimate of $\alpha$ was 0.8411, indicating that most individuals with genotype $J_2$ would develop cancer at birth. This may help to explain why there are observed cancer incidences at birth for uveal melanoma in the SEER data even though the estimate of the frequency $p_2$ is of order $10^{-7} \sim 10^{-6}$.
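The model-comparison quantities quoted in observation (a) follow from two short formulas. The sketch below is illustrative only: it assumes that the parameter count k in AIC = −2 log L + 2k equals the 12 parameters implied by df = 86 − 12 for Model-F (the text does not state which count was used), and the function names are hypothetical. With the Model-F log-likelihood of −1312.02 from Table 2, this assumption reproduces the tabulated AIC of 2648.04. The chi-square call uses toy counts, not SEER data.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 log L + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def pearson_chi_square(observed, predicted):
    """Sum of (y_i - yhat_i)^2 / yhat_i over all age groups."""
    return sum((y - yhat) ** 2 / yhat for y, yhat in zip(observed, predicted))


# Model-F log-likelihood from Table 2; k = 12 is assumed from df = 86 - 12.
aic_model_f = aic(-1312.02, 12)

# Toy counts (hypothetical, not SEER data) to show how the statistic is formed.
toy_chi2 = pearson_chi_square([10.0, 20.0], [8.0, 25.0])

print(round(aic_model_f, 2), toy_chi2)
```

A lower AIC indicates a better trade-off between fit and number of parameters, which is the sense in which the 3-stage models are preferred over the two-stage model in observation (a).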

7. Discussion and Conclusions

To account for inherited cancer cases, in this paper we have developed some general multistage models involving hereditary cancer cases. For human cancer incidence, these models are basically generalized mixture models. In these mixture models, the mixing probability is a multinomial distribution to account for genetic segregation of the staging-limiting tumor suppressor genes. This mixture model allows us to estimate, for the first time, the frequency of the staging-limiting tumor suppressor gene in human populations. As an example of applications, in this paper we have developed a general 3-stage stochastic multistage model of carcinogenesis for adult human eye cancer. To account for inherited cancer cases in the stochastic model of human eye cancer, we have also developed a generalized mixture model for uveal melanoma in human beings.

Table 3: Estimates of parameters for the 3-stage stochastic models.

Model-F
Parameter     Estimate    StD         95% CL-Lower   95% CL-Upper
$\lambda_0$   2.88E−01    1.21E−01    5.03E−02       5.25E−01
$\lambda_1$   4.81E−01    4.65E−02    3.89E−01       5.72E−01
$\lambda_2$   6.55E+02    1.12E+02    4.34E+02       8.75E+02
$\gamma_1$    2.71E−05    8.86E−06    5.34E−06       4.01E−05
$\gamma_2$    3.37E−02    1.92E−04    3.33E−02       3.41E−02
$p_1$         9.95E−04    1.75E−06    9.91E−04       9.98E−04
$p_2$         9.68E−07    6.48E−09    9.55E−07       9.81E−07
$\alpha$      8.41E−01    4.19E−02    7.59E−01       9.23E−01
$\theta_1$    3.29E−04    3.54E−05    2.60E−04       3.98E−04
$\theta_2$    3.41E−01    1.07E−02    3.20E−01       3.62E−01

Model-1
Parameter     Estimate    StD         95% CL-Lower   95% CL-Upper
$\lambda_0$   6.55E−02    2.54E−02    1.57E−02       1.15E−01
$\lambda_1$   3.72E−01    4.63E−02    2.82E−01       4.63E−01
$\lambda_2$   1.98E+02    3.38E+01    1.32E+02       2.65E+02
$\gamma_1$    NA          NA          NA             NA
$\gamma_2$    3.49E−02    3.58E−03    2.79E−02       4.19E−02
$p_1$         9.98E−04    6.27E−05    8.75E−04       1.12E−03
$p_2$         1.00E−06    4.53E−08    9.11E−07       1.09E−06
$\alpha$      8.07E−01    7.06E−02    6.68E−01       9.45E−01
$\theta_1$    NA          NA          NA             NA
$\theta_2$    NA          NA          NA             NA

Note: NA: assumed nonexistence.

To fit the cancer incidence data with the proposed models, in this paper we have developed a generalized Bayesian inference procedure to estimate the unknown parameters and to predict cancer cases. This inference procedure is advantageous over the classical sampling theory inference (i.e., the maximum likelihood method) because the procedure combines information from three sources: (1) previous information and experiences about the parameters in terms of the prior distribution $P\{\Theta\}$ of the parameters, (2) biological information of inherited cancer cases via the genetic segregation of staging-limiting tumor suppressor genes in the population, and (3)

information from the expanded data (Y) and the observeddata (

˜119910) via the statistical model from the system (119875Y

˜119910 |

N Θ) given by (37) and (40)To illustrate the usefulness and applications of ourmodels

and methods we have applied our models and methods tothe eye cancer SEER data of NCINIH Our analysis clearlyshowed that the proposed 3-stage model with inheritedcancer cases fitted the data nicely (see Table 2 and Figure 5)on the other hand the classical 2-stage model cannot fit thedata at all (see Table 2 and Figure 5)These results clearly haveconfirmed results frommolecular biology that the human eyecancer is derived by a 3-stage model with inherited cancercomponent Notice however our 3-stage multistage modelis more general than the classical 3-stage model as describedin Little [1] Tan [2] and Zheng [7] in that we postulatethat cancer tumors develop from primary 119868

3cells by clonal

expansion (see Yang and Chen [11]) (Note that the stochasticmultistagemodels in the literature assume that cancer tumorsdevelop from last stage cells immediately as soon as theyare generated ignoring completely cancer progression seeRemark 1) As a matter of fact we had assumed Δ119905 = 1 fora period of three months and found that the 3-stage modelsthen did not fit the SEER data clearly indicating that119875

119905(119904 119905) lt

1 over a period of three monthsApplying our models and methods to the SEER data

of human eye cancer we have derived for the first timesome useful pieces of information Specifically we mention(1) for the first time that we have estimated the frequencyof the staging-limiting tumor suppressor gene in the USpopulation (119901

1sim 9948 times 10

minus3) (2) With the estimate of 120572as = 08411 the predicted number of uveal melanoma atbirth is 119910

0= 119899

011199012= 36 by 3-stage models with inherited

cancer component (Model-F and Model-1) (The observednumber of eye cancer at birth is 34) (3) The estimateof the proliferation rate (120574

1) of 119869

1cells using Model-F is

1205741

= 2271 times 10minus5

sim 0 (The estimate is 8603 times 10minus5

using Model-1) This confirms that the stage-1 limitinggene is a tumor suppressor gene and unlike the p53 genein chromosome 17p (see [33]) there is little or no haploidinsufficiency for this gene in cells with genotype 119869

1

Using models and methods of this paper one can easilypredict future cancer cases for human eye cancer Thus bycomparing results from different populations our modelsand methods can be used to assess cancer prevention andcontrol procedures This will be our future research topicswe will not go any further here

Appendix

The Expected Numbers of State Variablesunder Discrete Time Approximation

Under discrete time the stochastic differential equations ofstate variables reduce to the following stochastic difference

equations of state variables respectively

119869119894 (119905 + 1 119894) = 119869

119894 (119905 119894) [1 + 120574119894 (119905)]

+ 119890119894(119905 + 1 119894) 119894 = 0 1 Min (119896 minus 1 2)

119869119895(119905 + 1 119906) = 119869

119895(119905 119906) [1 + 120574

119895(119905)] + 119869

119895minus1(119905 119906) 120573

119895minus1(119905)

+ 119890119895(119905 + 1 119906) 0 le 119906 lt 119895 le 119896 minus 1

(A1)

where 119890119894(119905 + 1 119894) = [119861

119894(119905 119894) minus 119869

119894(119905 119894)119887

119894(119905)] minus [119863

119894(119905 119894) minus

119869119894(119905 119894)119889

119894(119905)] and 119890

119895(119905+1 119894) = [119861

119895(119905 119894)minus119869

119895(119905 119894)119887

119895(119905)] minus [119863

119895(119905 119894)minus

119869119895(119905 119894)119889

119895(119905)] + [119872

119895minus1(119905 119894) minus 119869

119895minus1(119905 119894)120573

119895minus1(119905)] for 119895 gt 119894

The initial conditions at birth (1199050) for the above stochastic

difference equations are 119869119895(1199050 119894) gt 0 if (119895 = 119894 119894 + 1) and

119869119895(1199050 119894) = 0 if 119895 gt 119894 + 1 The solution of the above difference

equations under these initial conditions is given respectivelyby

119869119894(119905 119894) = 119869

119894(1199050 119894)

119905minus1

prod

119904=1199050

(1 + 120574119894(119904))

+

119905

sum

119904=1199050+1

119890119894(119904 119894)

119905minus1

prod

119906=119904

(1 + 120574119894(119906))

119894 = 0 1 Min (119896 minus 1 2)

119869119895(119905 119894) = 119869

119895(1199050 119894)

119905minus1

prod

119904=1199050

(1 + 120574119895(119904))

+

119905minus1

sum

119904=1199050

119869119895minus1 (119904 119894) 120573119895minus1 (119904)

119905minus1

prod

119906=119904+1

(1 + 120574119895 (119906))

+

119905

sum

119904=1199050+1

119890119895(119904 119894)

119905minus1

prod

119906=119904

(1 + 120574119895(119906))

0 le 119894 lt 119895 le 119896 minus 1

(A2)

If the model is time homogeneous so that 120573119895(119905) =

120573119895 120574

119895(119905) = 120574

119895(119895 = 119894 119896 minus 1 and if 120574

119894= 120574119895if 119894 = 119895 then the

above solutions under the initial conditions (119869119895(1199050 119894) = 0 119895 gt

119894 + 1) reduce to

119869119894 (119905 119894) = 119869

119894(1199050 119894) (1 + 120574

119894)119905minus1199050

+ 120578(0)

119894(119905 119894) 119894 = 0 1 Min (119896 minus 1 2)

119869119895(119905 119894) = 119869

119895(1199050 119894) (1 + 120574

119895)119905minus1199050

+

119895minus1

sum

119906=119894

119869119906(1199050 119894)

119895minus1

prod

119907=119906

120573119907

119895

sum

119903=119906

119860119906119895(119903) (1 + 120574

119903)119905minus1199050

+ 120578(0)

119895(119905 119894) +

119895minus119894

sum

119906=1

119895minus1

prod

119903=119895minus119906

120573119903

120578(119906)

119895(119905 119894)

18 ISRN Biomathematics

=

119894+1

sum

119906=119894

119869119906(1199050 119894)

119895minus1

prod

119907=119906

120573119907

119895

sum

119903=119906

119860119906119895 (119903) (1 + 120574119903)

119905minus1199050

+ 120578(0)

119895(119905 119894) +

119895minus119894

sum

119906=1

119895minus1

prod

119903=119895minus119906

120573119903

120578(119906)

119895(119905 119894)

0 le 119894 lt 119895 le 119896 minus 1

(A3)

where 120578(0)

119895(119905 119894) = sum

119905

119904=1199050+1119890119895(119904 119894)(1 + 120574

119894)119905minus119904 120578

(119906)

119895(119905 119894) =

sum119905

119904=1199050+1119890119895minus119906

(119904 119894)sum119895

119907=119906119860119906119895(119907)(1 + 120574

119907)119905minus119904 (119906 = 1 119895 minus 119894)

Thus if the model is time homogeneous and if 120574119894=

120574119895if 119894 = 119895 the 119864[119869

119896minus1(119905 119894)]rsquos in discrete time models under

the initial conditions (119869119895(1199050 119894) = 0 119895 gt 119894 + 1) are given

respectively by

119864 [119869119896minus1

(119905 119894)] =

119894+1

sum

119906=119894

119864 [119869119906(1199050 119894)] (

119896minus2

prod

119907=119906

120573119907)

times

119896minus1

sum

119903=119906

119860119906(119896minus1)

(119903) (1 + 120574119903)119905minus1199050

119894 = 0 1 Min (119896 minus 1 2)

(A4)

References

[1] M P Little ldquoCancermodels ionization and genomic instabilitya reviewrdquo inHandbook of CancerModels with ApplicationsW YTan and L Hanin Eds chapter 5 World Scientific River EdgeNJ USA 2008

[2] W Y Tan Stochastic Models of Carcinogenesis Marcel DekkerNew York NY USA 1991

[3] W Y Tan ldquoStochastic multiti-stage models of carcinogenesis ashidden Markov models a new approachrdquo International Journalof Systems and Synthetic Biology vol 1 pp 313ndash337 2010

[4] W Y Tan L J Zhang and C W Chen ldquoStochastic modelingof carcinogenesis state space models and estimation of param-etersrdquoDiscrete and Continuous Dynamical Systems B vol 4 no1 pp 297ndash322 2004

[5] W Y Tan C W Chen and L J Zhang ldquoCancer biology cancermodels and stochasticmathematical analysis of carcinogenesisrdquoin Handbook of Cancer Models and Applications W Y Tan andL Hanin Eds chapter 3World Scientific River Edge NJ USA2008

[6] R A Weinberg The Biology of Human Cancer Garland Sci-ences Taylor and Frances New York NY USA 2007

[7] Q Zheng ldquoStochastic multistage cancer models a fresh lookat an old approachrdquo in Handbook of Cancer Models andApplications W Y Tan and L Hanin Eds chapter 2 WorldScientific River Edge NJ USA 2008

[8] W Y Tan L J Zhang W Chen and J M Zhu ldquoA stochasticmodel of human colon cancer involving multiple pathwaysrdquo inHandbook of Cancer Models with Applications W Y Tan and LHanin Eds chapter 11 World Scientific River Edge NJ USA2008

[9] W Y Tan and H Zhou ldquoA new stochastic model of retinoblas-toma involving both hereditary and non- hereditary cancercasesrdquo Journal of Carcinogenesis and Mutagenesis vol 2 no 2article 117 2011

[10] X W Yan Stochastic and State Space Models of carcinogenesisUnder Complex Situation [PhD thesis] Department of Math-ematical Sciences University of Memphsis Memphis TennUSA

[11] G L Yang and C W Chen ldquoA stochastic two-stage carcino-genesis model a new approach to computing the probabilityof observing tumor in animal bioassaysrdquo Mathematical Bio-sciences vol 104 no 2 pp 247ndash258 1991



Table 3: Estimates of parameters for the 3-stage stochastic models.

| Model | Statistic | $\lambda_0$ | $\lambda_1$ | $\lambda_2$ | $\gamma_1$ | $\gamma_2$ | $p_1$ | $p_2$ | $\alpha$ | $\theta_1$ | $\theta_2$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Model-F | Estimates | 2.88E-01 | 4.81E-01 | 6.55E+02 | 2.71E-05 | 3.37E-02 | 9.95E-04 | 9.68E-07 | 8.41E-01 | 3.29E-04 | 3.41E-01 |
| Model-F | StD | 1.21E-01 | 4.65E-02 | 1.12E+02 | 8.86E-06 | 1.92E-04 | 1.75E-06 | 6.48E-09 | 4.19E-02 | 3.54E-05 | 1.07E-02 |
| Model-F | 95% CL, lower | 5.03E-02 | 3.89E-01 | 4.34E+02 | 5.34E-06 | 3.33E-02 | 9.91E-04 | 9.55E-07 | 7.59E-01 | 2.60E-04 | 3.20E-01 |
| Model-F | 95% CL, upper | 5.25E-01 | 5.72E-01 | 8.75E+02 | 4.01E-05 | 3.41E-02 | 9.98E-04 | 9.81E-07 | 9.23E-01 | 3.98E-04 | 3.62E-01 |
| Model-1 | Estimates | 6.55E-02 | 3.72E-01 | 1.98E+02 | NA | 3.49E-02 | 9.98E-04 | 1.00E-06 | 8.07E-01 | NA | NA |
| Model-1 | StD | 2.54E-02 | 4.63E-02 | 3.38E+01 | NA | 3.58E-03 | 6.27E-05 | 4.53E-08 | 7.06E-02 | NA | NA |
| Model-1 | 95% CL, lower | 1.57E-02 | 2.82E-01 | 1.32E+02 | NA | 2.79E-02 | 8.75E-04 | 9.11E-07 | 6.68E-01 | NA | NA |
| Model-1 | 95% CL, upper | 1.15E-01 | 4.63E-01 | 2.65E+02 | NA | 4.19E-02 | 1.12E-03 | 1.09E-06 | 9.45E-01 | NA | NA |

Note: NA: assumed nonexistence.
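As a quick consistency check on Table 3, the reported 95% limits for Model-F appear to agree with a normal approximation of the form estimate ± 1.96 × StD. The following Python sketch recomputes the limits for a few parameters. The quantile 1.96 and the helper names are assumptions of this illustration, not something the paper states, and the table entries are read with the decimal point after the first digit.

```python
# Model-F entries read from Table 3: (estimate, StD, reported lower, reported upper).
rows = {
    "lambda_0": (2.88e-01, 1.21e-01, 5.03e-02, 5.25e-01),
    "lambda_1": (4.81e-01, 4.65e-02, 3.89e-01, 5.72e-01),
    "lambda_2": (6.55e+02, 1.12e+02, 4.34e+02, 8.75e+02),
    "gamma_2": (3.37e-02, 1.92e-04, 3.33e-02, 3.41e-02),
    "alpha": (8.41e-01, 4.19e-02, 7.59e-01, 9.23e-01),
}

Z = 1.96  # assumed normal quantile for a two-sided 95% interval


def normal_interval(estimate, std, z=Z):
    """Return (lower, upper) = estimate -/+ z * std."""
    return estimate - z * std, estimate + z * std


def max_relative_gap(table):
    """Largest |computed - reported| limit, relative to the estimate."""
    worst = 0.0
    for est, std, lo, hi in table.values():
        calc_lo, calc_hi = normal_interval(est, std)
        gap = max(abs(calc_lo - lo), abs(calc_hi - hi)) / abs(est)
        worst = max(worst, gap)
    return worst


for name, (est, std, lo, hi) in rows.items():
    calc_lo, calc_hi = normal_interval(est, std)
    print(f"{name:9s} computed=({calc_lo:.3e}, {calc_hi:.3e}) "
          f"reported=({lo:.3e}, {hi:.3e})")
```

For the parameters listed, the recomputed limits differ from the reported ones only at the level of rounding in the three-digit table entries.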


information from three sources: (1) previous information and experiences about the parameters, in terms of the prior distribution $P\{\Theta\}$ of the parameters; (2) biological information on inherited cancer cases, via the genetic segregation of stage-limiting tumor suppressor genes in the population; and (3) information from the expanded data ($\mathbf{Y}$) and the observed data ($\underset{\sim}{y}$), via the statistical model from the system ($P\{\mathbf{Y}, \underset{\sim}{y} \mid \mathbf{N}, \Theta\}$) given by (37) and (40).

To illustrate the usefulness and applications of our models and methods, we have applied them to the eye cancer SEER data of NCI/NIH. Our analysis clearly showed that the proposed 3-stage model with inherited cancer cases fitted the data nicely (see Table 2 and Figure 5); on the other hand, the classical 2-stage model cannot fit the data at all (see Table 2 and Figure 5). These results confirm results from molecular biology that human eye cancer is described by a 3-stage model with an inherited cancer component. Notice, however, that our 3-stage model is more general than the classical 3-stage model described in Little [1], Tan [2], and Zheng [7], in that we postulate that cancer tumors develop from primary $I_3$ cells by clonal expansion (see Yang and Chen [11]). (Note that the stochastic multistage models in the literature assume that cancer tumors develop from last-stage cells immediately as soon as they are generated, ignoring cancer progression completely; see Remark 1.) As a matter of fact, we had assumed $\Delta t = 1$ for a period of three months and found that the 3-stage models then did not fit the SEER data, clearly indicating that $P_t(s, t) < 1$ over a period of three months.

Applying our models and methods to the SEER data of human eye cancer, we have derived, for the first time, some useful pieces of information. Specifically, we mention the following. (1) For the first time, we have estimated the frequency of the stage-limiting tumor suppressor gene in the US population ($p_1 \sim 9.948 \times 10^{-3}$). (2) With the estimate of $\alpha$ as $0.8411$, the predicted number of uveal melanoma cases at birth is $y_0 = n_{01}p_2 = 36$ by 3-stage models with inherited cancer component (Model-F and Model-1). (The observed number of eye cancer cases at birth is 34.) (3) The estimate of the proliferation rate ($\gamma_1$) of $J_1$ cells using Model-F is $\gamma_1 = 2.271 \times 10^{-5} \sim 0$. (The estimate is $8.603 \times 10^{-5}$ using Model-1.) This confirms that the stage-1 limiting gene is a tumor suppressor gene and that, unlike the p53 gene on chromosome 17p (see [33]), there is little or no haplo-insufficiency for this gene in cells with genotype $J_1$.

Using the models and methods of this paper, one can easily predict future cancer cases for human eye cancer. Thus, by comparing results from different populations, our models and methods can be used to assess cancer prevention and control procedures. This will be a topic of our future research; we will not go any further here.

Appendix

The Expected Numbers of State Variables under Discrete Time Approximation

Under discrete time, the stochastic differential equations of the state variables reduce, respectively, to the following stochastic difference equations:

$$
\begin{aligned}
J_i(t+1, i) &= J_i(t, i)\,[1 + \gamma_i(t)] + e_i(t+1, i), \quad i = 0, 1, \operatorname{Min}(k-1, 2), \\
J_j(t+1, u) &= J_j(t, u)\,[1 + \gamma_j(t)] + J_{j-1}(t, u)\,\beta_{j-1}(t) + e_j(t+1, u), \quad 0 \le u < j \le k-1,
\end{aligned}
\tag{A1}
$$

where $e_i(t+1, i) = [B_i(t, i) - J_i(t, i)\,b_i(t)] - [D_i(t, i) - J_i(t, i)\,d_i(t)]$ and $e_j(t+1, i) = [B_j(t, i) - J_j(t, i)\,b_j(t)] - [D_j(t, i) - J_j(t, i)\,d_j(t)] + [M_{j-1}(t, i) - J_{j-1}(t, i)\,\beta_{j-1}(t)]$ for $j > i$.
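The residual $e_i(t+1, i)$ is the deviation of the birth and death counts from their conditional means, so it should average to zero. The Python sketch below illustrates this by simulation. It assumes, as is not stated in this appendix, that each of the $J$ cells independently divides with probability $b$, dies with probability $d$, or does neither during one time step; the function names and numerical values are hypothetical.

```python
import random


def simulate_residual(n_cells, b, d, rng):
    """One draw of e = [B - J*b] - [D - J*d] with J = n_cells.

    Assumption: each cell independently divides (prob. b), dies
    (prob. d), or does neither, so (B, D) is multinomial given J.
    """
    births = deaths = 0
    for _ in range(n_cells):
        u = rng.random()
        if u < b:
            births += 1
        elif u < b + d:
            deaths += 1
    return (births - n_cells * b) - (deaths - n_cells * d)


def mean_residual(n_cells=500, b=0.10, d=0.05, n_draws=1000, seed=1):
    """Average the residual over many independent time steps."""
    rng = random.Random(seed)
    draws = [simulate_residual(n_cells, b, d, rng) for _ in range(n_draws)]
    return sum(draws) / n_draws


print(mean_residual())  # close to zero relative to J*b = 50 and J*d = 25
```

Under this assumption the residuals carry no systematic drift, which is why they drop out when expectations are taken.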

The initial conditions at birth ($t_0$) for the above stochastic difference equations are $J_j(t_0, i) > 0$ if $j = i, i+1$ and $J_j(t_0, i) = 0$ if $j > i+1$. The solutions of the above difference equations under these initial conditions are given, respectively, by

$$
\begin{aligned}
J_i(t, i) &= J_i(t_0, i)\prod_{s=t_0}^{t-1}(1 + \gamma_i(s)) + \sum_{s=t_0+1}^{t} e_i(s, i)\prod_{u=s}^{t-1}(1 + \gamma_i(u)), \quad i = 0, 1, \operatorname{Min}(k-1, 2), \\
J_j(t, i) &= J_j(t_0, i)\prod_{s=t_0}^{t-1}(1 + \gamma_j(s)) + \sum_{s=t_0}^{t-1} J_{j-1}(s, i)\,\beta_{j-1}(s)\prod_{u=s+1}^{t-1}(1 + \gamma_j(u)) \\
&\quad + \sum_{s=t_0+1}^{t} e_j(s, i)\prod_{u=s}^{t-1}(1 + \gamma_j(u)), \quad 0 \le i < j \le k-1.
\end{aligned}
\tag{A2}
$$
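The first line of (A2) can be checked numerically by iterating the first recursion in (A1) and comparing the result with the product-and-sum expression. The Python sketch below does this for arbitrary time-varying rates and noise terms; the function names, the random rates, and the Gaussian noise are illustrative assumptions and are not taken from the paper.

```python
import random


def iterate(j0, gamma, e, t0, t):
    """Apply J(s+1) = J(s) * (1 + gamma[s]) + e[s+1] for s = t0, ..., t-1."""
    value = j0
    for s in range(t0, t):
        value = value * (1.0 + gamma[s]) + e[s + 1]
    return value


def growth(gamma, start, stop):
    """Product of (1 + gamma[u]) for u = start, ..., stop-1 (1 if empty)."""
    out = 1.0
    for u in range(start, stop):
        out *= 1.0 + gamma[u]
    return out


def closed_form(j0, gamma, e, t0, t):
    """First line of (A2): initial term plus propagated residual terms."""
    total = j0 * growth(gamma, t0, t)
    for s in range(t0 + 1, t + 1):
        total += e[s] * growth(gamma, s, t)
    return total


rng = random.Random(0)
t0, t = 0, 25
gamma = [rng.uniform(-0.05, 0.10) for _ in range(t + 1)]  # hypothetical rates
e = [rng.gauss(0.0, 3.0) for _ in range(t + 1)]            # hypothetical residuals
print(iterate(100.0, gamma, e, t0, t), closed_form(100.0, gamma, e, t0, t))
```

The two printed numbers agree up to floating-point rounding, since the closed form is just the recursion unrolled.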

If the model is time homogeneous, so that $\beta_j(t) = \beta_j$ and $\gamma_j(t) = \gamma_j$ ($j = i, \ldots, k-1$), and if $\gamma_i \neq \gamma_j$ whenever $i \neq j$, then the above solutions under the initial conditions ($J_j(t_0, i) = 0$, $j > i+1$) reduce to

$$
\begin{aligned}
J_i(t, i) &= J_i(t_0, i)\,(1 + \gamma_i)^{t-t_0} + \eta_i^{(0)}(t, i), \quad i = 0, 1, \operatorname{Min}(k-1, 2), \\
J_j(t, i) &= J_j(t_0, i)\,(1 + \gamma_j)^{t-t_0} + \sum_{u=i}^{j-1} J_u(t_0, i)\Biggl(\prod_{v=u}^{j-1}\beta_v\Biggr)\sum_{r=u}^{j} A_{uj}(r)\,(1 + \gamma_r)^{t-t_0} \\
&\quad + \eta_j^{(0)}(t, i) + \sum_{u=1}^{j-i}\Biggl(\prod_{r=j-u}^{j-1}\beta_r\Biggr)\eta_j^{(u)}(t, i) \\
&= \sum_{u=i}^{i+1} J_u(t_0, i)\Biggl(\prod_{v=u}^{j-1}\beta_v\Biggr)\sum_{r=u}^{j} A_{uj}(r)\,(1 + \gamma_r)^{t-t_0} \\
&\quad + \eta_j^{(0)}(t, i) + \sum_{u=1}^{j-i}\Biggl(\prod_{r=j-u}^{j-1}\beta_r\Biggr)\eta_j^{(u)}(t, i), \quad 0 \le i < j \le k-1,
\end{aligned}
\tag{A3}
$$

where $\eta_j^{(0)}(t, i) = \sum_{s=t_0+1}^{t} e_j(s, i)\,(1 + \gamma_j)^{t-s}$ and $\eta_j^{(u)}(t, i) = \sum_{s=t_0+1}^{t} e_{j-u}(s, i)\sum_{v=u}^{j} A_{uj}(v)\,(1 + \gamma_v)^{t-s}$ ($u = 1, \ldots, j-i$).

Thus, if the model is time homogeneous and if $\gamma_i \neq \gamma_j$ whenever $i \neq j$, the $E[J_{k-1}(t, i)]$'s in discrete time models under the initial conditions ($J_j(t_0, i) = 0$, $j > i+1$) are given, respectively, by

$$
E[J_{k-1}(t, i)] = \sum_{u=i}^{i+1} E[J_u(t_0, i)]\Biggl(\prod_{v=u}^{k-2}\beta_v\Biggr) \times \sum_{r=u}^{k-1} A_{u(k-1)}(r)\,(1 + \gamma_r)^{t-t_0}, \quad i = 0, 1, \operatorname{Min}(k-1, 2).
\tag{A4}
$$
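The closed form (A4) can be compared with a direct iteration of the expected-value version of (A1), in which the zero-mean residual terms drop out. The Python sketch below does this for a 3-stage example with $i = 0$. The coefficients $A_{uj}(r)$ are not defined in this appendix; the sketch assumes the partial-fraction form $A_{uj}(r) = \prod_{v=u,\, v \neq r}^{j} (\gamma_r - \gamma_v)^{-1}$, and all rates, initial values, and function names are hypothetical.

```python
def coefficient(gamma, u, j, r):
    """Assumed form: A_{uj}(r) = prod over v = u..j, v != r, of 1/(gamma_r - gamma_v)."""
    out = 1.0
    for v in range(u, j + 1):
        if v != r:
            out /= gamma[r] - gamma[v]
    return out


def expected_by_recursion(init, gamma, beta, steps):
    """Iterate E[J_j(t+1)] = E[J_j(t)] (1 + gamma_j) + E[J_{j-1}(t)] beta_{j-1}."""
    state = list(init)
    for _ in range(steps):
        nxt = []
        for j, value in enumerate(state):
            inflow = state[j - 1] * beta[j - 1] if j > 0 else 0.0
            nxt.append(value * (1.0 + gamma[j]) + inflow)
        state = nxt
    return state


def expected_closed_form(init, gamma, beta, steps):
    """Right-hand side of (A4) for the last stage (index k - 1)."""
    last = len(init) - 1
    total = 0.0
    for u, start in enumerate(init):
        if start == 0.0:
            continue  # stages with no cells at birth contribute nothing
        rate = 1.0
        for v in range(u, last):
            rate *= beta[v]
        series = sum(
            coefficient(gamma, u, last, r) * (1.0 + gamma[r]) ** steps
            for r in range(u, last + 1)
        )
        total += start * rate * series
    return total


gamma = [0.0, 2.0e-3, 3.0e-2]   # hypothetical, pairwise distinct growth rates
beta = [1.0e-5, 1.0e-4]         # hypothetical mutation rates
init = [1.0e6, 50.0, 0.0]       # cells at birth in stages i and i + 1 only
recursive = expected_by_recursion(init, gamma, beta, 40)[-1]
closed = expected_closed_form(init, gamma, beta, 40)
print(recursive, closed)
```

The requirement that the $\gamma$'s be pairwise distinct is visible in `coefficient`, which would otherwise divide by zero.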

References

[1] M. P. Little, "Cancer models, ionization, and genomic instability: a review," in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 5, World Scientific, River Edge, NJ, USA, 2008.

[2] W. Y. Tan, Stochastic Models of Carcinogenesis, Marcel Dekker, New York, NY, USA, 1991.

[3] W. Y. Tan, "Stochastic multi-stage models of carcinogenesis as hidden Markov models: a new approach," International Journal of Systems and Synthetic Biology, vol. 1, pp. 313–337, 2010.

[4] W. Y. Tan, L. J. Zhang, and C. W. Chen, "Stochastic modeling of carcinogenesis: state space models and estimation of parameters," Discrete and Continuous Dynamical Systems B, vol. 4, no. 1, pp. 297–322, 2004.

[5] W. Y. Tan, C. W. Chen, and L. J. Zhang, "Cancer biology, cancer models and stochastic mathematical analysis of carcinogenesis," in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 3, World Scientific, River Edge, NJ, USA, 2008.

[6] R. A. Weinberg, The Biology of Human Cancer, Garland Science, Taylor and Francis, New York, NY, USA, 2007.

[7] Q. Zheng, "Stochastic multistage cancer models: a fresh look at an old approach," in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 2, World Scientific, River Edge, NJ, USA, 2008.

[8] W. Y. Tan, L. J. Zhang, W. Chen, and J. M. Zhu, "A stochastic model of human colon cancer involving multiple pathways," in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 11, World Scientific, River Edge, NJ, USA, 2008.

[9] W. Y. Tan and H. Zhou, "A new stochastic model of retinoblastoma involving both hereditary and non-hereditary cancer cases," Journal of Carcinogenesis and Mutagenesis, vol. 2, no. 2, article 117, 2011.

[10] X. W. Yan, Stochastic and State Space Models of Carcinogenesis Under Complex Situation [Ph.D. thesis], Department of Mathematical Sciences, University of Memphis, Memphis, Tenn, USA.

[11] G. L. Yang and C. W. Chen, "A stochastic two-stage carcinogenesis model: a new approach to computing the probability of observing tumor in animal bioassays," Mathematical Biosciences, vol. 104, no. 2, pp. 247–258, 1991.

[12] H. Osada and T. Takahashi, "Genetic alterations of multiple tumor suppressors and oncogenes in the carcinogenesis and progression of lung cancer," Oncogene, vol. 21, no. 48, pp. 7421–7434, 2002.

[13] I. I. Wistuba, L. Mao, and A. F. Gazdar, "Smoking molecular damage in bronchial epithelium," Oncogene, vol. 21, no. 48, pp. 7298–7306, 2002.

[14] S. Landreville, O. A. Agapova, and J. W. Harbour, "Emerging insights into the molecular pathogenesis of uveal melanoma," Future Oncology, vol. 4, no. 5, pp. 629–636, 2008.

[15] H. W. Mensink, D. Paridaens, and A. De Klein, "Genetics of uveal melanoma," Expert Review of Ophthalmology, vol. 4, no. 6, pp. 607–616, 2009.

[16] W. Y. Tan and X. W. Yan, "A new stochastic and state space model of human colon cancer incorporating multiple pathways," Biology Direct, vol. 5, article 26, 2010.

[17] L. B. Klebanov, S. T. Rachev, and A. Y. Yakovlev, "A stochastic model of radiation carcinogenesis: latent time distributions and their properties," Mathematical Biosciences, vol. 113, no. 1, pp. 51–75, 1993.

[18] A. Y. Yakovlev and A. D. Tsodikov, Stochastic Models of Tumor Latency and Their Biostatistical Applications, World Scientific, River Edge, NJ, USA, 1996.

[19] H. Fakir, W. Y. Tan, L. Hlatky, P. Hahnfeldt, and R. K. Sachs, "Stochastic population dynamic effects for lung cancer progression," Radiation Research, vol. 172, no. 3, pp. 383–393, 2009.

[20] A. G. Knudson, "Mutation and cancer: statistical study of retinoblastoma," Proceedings of the National Academy of Sciences of the United States of America, vol. 68, no. 4, pp. 820–823, 1971.

[21] J. F. Crow and M. Kimura, An Introduction to Population Genetics Theory, Harper and Row, New York, NY, USA, 1970.

[22] W. Y. Tan, Stochastic Models With Applications to Genetics, Cancers, AIDS and Other Biomedical Systems, World Scientific, River Edge, NJ, USA, 2002.

[23] W. Y. Tan, C. W. Chen, and L. J. Zhang, "Cancer risk assessment by state space models," in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 13, World Scientific, River Edge, NJ, USA, 2008.

[24] W. Y. Tan and C. W. Chen, "Stochastic modeling of carcinogenesis: some new insights," Mathematical and Computer Modelling, vol. 28, no. 11, pp. 49–71, 1998.

[25] W. Y. Tan and C. W. Chen, "Cancer stochastic models," in Encyclopedia of Statistical Sciences, Revised edition, John Wiley and Sons, New York, NY, USA, 2005.

[26] E. G. Luebeck and S. H. Moolgavkar, "Multistage carcinogenesis and the incidence of colorectal cancer," Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 23, pp. 15095–15100, 2002.

[27] R. Durrett, J. Mayberry, S. Moseley, and D. Schmidt, "Probability models for cancer development and progression," Google Search, Google, 2012.

[28] G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis, Addison-Wesley, Reading, Mass, USA, 1973.

[29] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm (with discussion)," Journal of the Royal Statistical Society B, vol. 39, pp. 1–38, 1977.

[30] A. F. M. Smith and A. E. Gelfand, "Bayesian statistics without tears: a sampling-resampling perspective," American Statistician, vol. 46, pp. 84–88, 1992.

[31] A. E. Loercher and J. W. Harbour, "Molecular genetics of uveal melanoma," Current Eye Research, vol. 27, no. 2, pp. 69–74, 2003.

[32] C. S. Potten, C. Booth, and D. Hargreaves, "The small intestine as a model for evaluating adult tissue stem cell drug targets," Cell Proliferation, vol. 36, no. 3, pp. 115–129, 2003.

[33] C. J. Lynch and J. Milner, "Loss of one p53 allele results in four-fold reduction of p53 mRNA and protein: a basis for p53 haplo-insufficiency," Oncogene, vol. 25, no. 24, pp. 3463–3470, 2006.



$$
\begin{aligned}
{} = {} & \sum_{u=i}^{i+1} J_u(t_0, i) \Biggl(\prod_{v=u}^{j-1} \beta_v\Biggr) \sum_{r=u}^{j} A_{uj}(r)\,(1+\gamma_r)^{t-t_0} \\
& + \eta^{(0)}_j(t, i) + \sum_{u=1}^{j-i} \Biggl(\prod_{r=j-u}^{j-1} \beta_r\Biggr) \eta^{(u)}_j(t, i), \qquad 0 \le i < j \le k-1,
\end{aligned}
\tag{A3}
$$

where

$$
\eta^{(0)}_j(t, i) = \sum_{s=t_0+1}^{t} e_j(s, i)\,(1+\gamma_i)^{t-s}, \qquad
\eta^{(u)}_j(t, i) = \sum_{s=t_0+1}^{t} e_{j-u}(s, i) \sum_{v=u}^{j} A_{uj}(v)\,(1+\gamma_v)^{t-s} \quad (u = 1, \ldots, j-i).
$$

Thus, if the model is time homogeneous and if $\gamma_i \ne \gamma_j$ whenever $i \ne j$, the $E[J_{k-1}(t, i)]$'s in discrete time models under the initial conditions $J_j(t_0, i) = 0$ for $j > i + 1$ are given, respectively, by

$$
E[J_{k-1}(t, i)] = \sum_{u=i}^{i+1} E[J_u(t_0, i)] \Biggl(\prod_{v=u}^{k-2} \beta_v\Biggr) \times \sum_{r=u}^{k-1} A_{u(k-1)}(r)\,(1+\gamma_r)^{t-t_0}, \qquad i = 0, 1, \operatorname{Min}(k-1, 2).
\tag{A4}
$$
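As a rough illustration of how the closed form (A4) could be evaluated numerically, the Python sketch below computes its right-hand side for one fixed i. The function name `expected_last_stage`, its argument layout, and the callable `A` are assumptions made for this example only; the coefficients A_{uj}(r) must be supplied by the reader from their definition elsewhere in the paper, since they are not defined in this part of the appendix.

```python
from math import prod


def expected_last_stage(i, e_init, beta, gamma, A, k, t, t0):
    """Evaluate the right-hand side of (A4) for a fixed i.

    i      : index of the initial stage
    e_init : pair (E[J_i(t0, i)], E[J_{i+1}(t0, i)])
    beta   : sequence with beta[v] = beta_v
    gamma  : sequence with gamma[r] = gamma_r (assumed pairwise distinct)
    A      : callable A(u, j, r) returning the coefficient A_{uj}(r);
             hypothetical interface, the caller must provide the real values
    k      : number of stages, so the last intermediate stage is k - 1
    t, t0  : current and initial (discrete) times
    """
    total = 0.0
    for offset, e0 in enumerate(e_init):
        u = i + offset                                  # u runs over i, i + 1
        rate = prod(beta[v] for v in range(u, k - 1))   # product over v = u..k-2
        growth = sum(
            A(u, k - 1, r) * (1.0 + gamma[r]) ** (t - t0)
            for r in range(u, k)                        # sum over r = u..k-1
        )
        total += e0 * rate * growth
    return total


# Usage with a purely illustrative placeholder for A (NOT the paper's coefficients).
def placeholder_A(u, j, r):
    return 1.0 if r == u else 0.0


value = expected_last_stage(
    i=0, e_init=(10.0, 2.0), beta=[0.1, 0.2], gamma=[0.0, 0.5, 1.0],
    A=placeholder_A, k=3, t=2, t0=0,
)
```

Passing `A` as a callable keeps the sketch independent of how the coefficients are actually computed; with the real A_{uj}(r) substituted, the same loop structure mirrors the double sum and product in (A4).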

References

[1] M. P. Little, "Cancer models, ionization, and genomic instability: a review," in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 5, World Scientific, River Edge, NJ, USA, 2008.

[2] W. Y. Tan, Stochastic Models of Carcinogenesis, Marcel Dekker, New York, NY, USA, 1991.

[3] W. Y. Tan, "Stochastic multi-stage models of carcinogenesis as hidden Markov models: a new approach," International Journal of Systems and Synthetic Biology, vol. 1, pp. 313–337, 2010.

[4] W. Y. Tan, L. J. Zhang, and C. W. Chen, "Stochastic modeling of carcinogenesis: state space models and estimation of parameters," Discrete and Continuous Dynamical Systems B, vol. 4, no. 1, pp. 297–322, 2004.

[5] W. Y. Tan, C. W. Chen, and L. J. Zhang, "Cancer biology, cancer models and stochastic mathematical analysis of carcinogenesis," in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 3, World Scientific, River Edge, NJ, USA, 2008.

[6] R. A. Weinberg, The Biology of Human Cancer, Garland Sciences, Taylor and Francis, New York, NY, USA, 2007.

[7] Q. Zheng, "Stochastic multistage cancer models: a fresh look at an old approach," in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 2, World Scientific, River Edge, NJ, USA, 2008.

[8] W. Y. Tan, L. J. Zhang, W. Chen, and J. M. Zhu, "A stochastic model of human colon cancer involving multiple pathways," in Handbook of Cancer Models with Applications, W. Y. Tan and L. Hanin, Eds., chapter 11, World Scientific, River Edge, NJ, USA, 2008.

[9] W. Y. Tan and H. Zhou, "A new stochastic model of retinoblastoma involving both hereditary and non-hereditary cancer cases," Journal of Carcinogenesis and Mutagenesis, vol. 2, no. 2, article 117, 2011.

[10] X. W. Yan, Stochastic and State Space Models of Carcinogenesis Under Complex Situation [Ph.D. thesis], Department of Mathematical Sciences, University of Memphis, Memphis, Tenn, USA.

[11] G. L. Yang and C. W. Chen, "A stochastic two-stage carcinogenesis model: a new approach to computing the probability of observing tumor in animal bioassays," Mathematical Biosciences, vol. 104, no. 2, pp. 247–258, 1991.

[12] H. Osada and T. Takahashi, "Genetic alterations of multiple tumor suppressors and oncogenes in the carcinogenesis and progression of lung cancer," Oncogene, vol. 21, no. 48, pp. 7421–7434, 2002.

[13] I. I. Wistuba, L. Mao, and A. F. Gazdar, "Smoking molecular damage in bronchial epithelium," Oncogene, vol. 21, no. 48, pp. 7298–7306, 2002.

[14] S. Landreville, O. A. Agapova, and J. W. Harbour, "Emerging insights into the molecular pathogenesis of uveal melanoma," Future Oncology, vol. 4, no. 5, pp. 629–636, 2008.

[15] H. W. Mensink, D. Paridaens, and A. De Klein, "Genetics of uveal melanoma," Expert Review of Ophthalmology, vol. 4, no. 6, pp. 607–616, 2009.

[16] W. Y. Tan and X. W. Yan, "A new stochastic and state space model of human colon cancer incorporating multiple pathways," Biology Direct, vol. 5, article 26, 2010.

[17] L. B. Klebanov, S. T. Rachev, and A. Y. Yakovlev, "A stochastic model of radiation carcinogenesis: latent time distributions and their properties," Mathematical Biosciences, vol. 113, no. 1, pp. 51–75, 1993.

[18] A. Y. Yakovlev and A. D. Tsodikov, Stochastic Models of Tumor Latency and Their Biostatistical Applications, World Scientific, River Edge, NJ, USA, 1996.

[19] H. Fakir, W. Y. Tan, L. Hlatky, P. Hahnfeldt, and R. K. Sachs, "Stochastic population dynamic effects for lung cancer progression," Radiation Research, vol. 172, no. 3, pp. 383–393, 2009.

[20] A. G. Knudson, "Mutation and cancer: statistical study of retinoblastoma," Proceedings of the National Academy of Sciences of the United States of America, vol. 68, no. 4, pp. 820–823, 1971.

[21] J. F. Crow and M. Kimura, An Introduction to Population Genetics Theory, Harper and Row, New York, NY, USA, 1970.

[22] W. Y. Tan, Stochastic Models With Applications to Genetics, Cancers, AIDS and Other Biomedical Systems, World Scientific, River Edge, NJ, USA, 2002.

[23] W. Y. Tan, C. W. Chen, and L. J. Zhang, "Cancer risk assessment by state space models," in Handbook of Cancer Models and Applications, W. Y. Tan and L. Hanin, Eds., chapter 13, World Scientific, River Edge, NJ, USA, 2008.

[24] W. Y. Tan and C. W. Chen, "Stochastic modeling of carcinogenesis: some new insights," Mathematical and Computer Modelling, vol. 28, no. 11, pp. 49–71, 1998.

[25] W. Y. Tan and C. W. Chen, "Cancer stochastic models," in Encyclopedia of Statistical Sciences, Revised edition, John Wiley and Sons, New York, NY, USA, 2005.

[26] E. G. Luebeck and S. H. Moolgavkar, "Multistage carcinogenesis and the incidence of colorectal cancer," Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 23, pp. 15095–15100, 2002.

[27] R. Durrett, J. Mayberry, S. Moseley, and D. Schmidt, "Probability models for cancer development and progression," Google Search, Google, 2012.

[28] G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis, Addison-Wesley, Reading, Mass, USA, 1973.

[29] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm (with discussion)," Journal of the Royal Statistical Society B, vol. 39, pp. 1–38, 1977.

[30] A. F. M. Smith and A. E. Gelfand, "Bayesian statistics without tears: a sampling-resampling perspective," American Statistician, vol. 46, pp. 84–88, 1992.

[31] A. E. Loercher and J. W. Harbour, "Molecular genetics of uveal melanoma," Current Eye Research, vol. 27, no. 2, pp. 69–74, 2003.

[32] C. S. Potten, C. Booth, and D. Hargreaves, "The small intestine as a model for evaluating adult tissue stem cell drug targets," Cell Proliferation, vol. 36, no. 3, pp. 115–129, 2003.

[33] C. J. Lynch and J. Milner, "Loss of one p53 allele results in four-fold reduction of p53 mRNA and protein: a basis for p53 haplo-insufficiency," Oncogene, vol. 25, no. 24, pp. 3463–3470, 2006.
