
Optimization of dose schedules in radiotherapy

by

Pierre Miasnikof

A thesis submitted in conformity with the requirements
for the degree of Master of Applied Science

Graduate Department of Mechanical and Industrial Engineering
University of Toronto

Copyright © 2013 by Pierre Miasnikof

Abstract

Optimization of dose schedules in radiotherapy

Pierre Miasnikof

Master of Applied Science

Graduate Department of Mechanical and Industrial Engineering

University of Toronto

2013

Purpose: Fractionation in radiotherapy is the scheduled break up of a total treatment

dose into individual doses. The goal of this thesis is to seek a mathematically optimal dose

schedule, in the context of a biological tissue dose-response model, the linear-quadratic

function.

Methods: We examined the mathematical properties of the fractionation problem in

the context of an arbitrary number of sensitive-structure constraints and determined the

properties of the optima. We also implemented a numerical search technique to solve the

problem.

Results: On the theoretical side, we confirmed and extended the results in the literature.

We showed the optima always occur at the intersection of two or more constraints or at

the equal dose per fraction point (or at any arbitrary feasible point on the boundary,

which includes the two points just mentioned). On the numerical side, we successfully

applied a simulated annealing algorithm to our problem.


Acknowledgements

I would like to begin by expressing my deep gratitude to Doctor Harald Keller, my co-

supervisor. Dr. Keller is a practicing medical physicist at the Princess Margaret Hospital

and a professor in the Radiation Oncology Department of the Faculty of Medicine at the

University of Toronto. He provided me with a research topic and his work on fractiona-

tion formed the foundation for this thesis. I must also highlight Dr. Keller’s commitment,

generous sharing of expertise on the topic, scientific guidance, continuous feedback, support, and warm encouragement throughout this research endeavor. Without him, this

thesis would not have been possible.

I thank my supervisor, Professor Dionne Aleman, who welcomed me into her lab, provided

direction, guidance, feedback and carefully reviewed the contents of this thesis. I also

thank the members of the committee, Professor Michael Carter and Professor Timothy

C. Y. Chan of the Mechanical and Industrial Engineering Department, for taking the

time to review my work and for their helpful comments.

Professor Matt Davison of the Applied Mathematics Department of the University of

Western Ontario provided helpful comments and shared his expertise on this topic, on

many occasions.

Special thanks go to Dominic Dotterrer of the Mathematics Department at the Univer-

sity of Toronto. Dominic’s advice on the geometry of the fractionation problem and

suggestions regarding the change of coordinate system were invaluable and necessary for

the completion of this thesis and derivation of theoretical results.

All errors, typos, and inaccuracies are entirely mine.


Contents

1 Introduction
 1.1 Organization
 1.2 Contribution

2 Literature review
 2.1 Dose fractionation in radiotherapy
 2.2 Global optimization search techniques

3 The fractionation problem

4 Traditional mathematical programming approach
 4.1 The Karush-Kuhn-Tucker conditions
 4.2 First-order enumeration
 4.3 Convex maximization, a counter-example

5 Properties of the fractionation problem
 5.1 Known analytic optima
 5.2 Change of coordinates
 5.3 Generalized properties of the optima

6 Simulated annealing

7 Numerical results
 7.1 Benchmark cases
 7.2 Comparison of interior and boundary search algorithms
 7.3 Parameter selection and solution quality
 7.4 Single tumor, single sensitive structure case
 7.5 Multiple sensitive structures case
 7.6 Time-varying radiosensitivity parameters

8 Discussion and conclusions
 8.1 Summary
 8.2 Suggestion for future work

A Alternate solution techniques investigated
 A.1 Steepest descent/ascent
 A.2 Non-linear projected gradient

Bibliography


List of Tables

7.1 EQDF treatment benchmark test case, $\alpha_T/\beta_T > \alpha_S/(\omega\beta_S)$

7.2 Single-dose treatment benchmark test case, $\alpha_T/\beta_T < \alpha_S/(\omega\beta_S)$

7.3 Mean percent deviation from analytic solution of objective function yielded by regular and boundary search algorithms

7.4 Two-sample t-test, null hypothesis of equal means, regardless of starting point

7.5 Solution accuracy measured with respect to known optimum

7.6 Numerical results, multiple constraint EQDF cases, second constraint dominates. BFSD denotes the value of the objective at the best feasible single-dose solution given the set of constraints and BFEQDF denotes the value of the objective at the best feasible equal-dose solution given the same set of constraints

7.7 Numerical results, multiple constraint EQDF cases, seventh constraint dominates. BFSD denotes the value of the objective at the best feasible single-dose solution given the set of constraints and BFEQDF denotes the value of the objective at the best feasible equal-dose solution given the same set of constraints

7.8 Numerical results, shifting dominant constraint, single-dose cases, second then twelfth constraint dominate respectively. BFSD denotes the value of the objective at the best feasible single-dose solution given the set of constraints and BFEQDF denotes the value of the objective at the best feasible equal-dose solution given the same set of constraints

7.9 Numerical results, multiple dominant constraints, EQDF case. BFSD denotes the value of the objective at the best feasible single-dose solution given the set of constraints and BFEQDF denotes the value of the objective at the best feasible equal-dose solution given the same set of constraints


List of Figures

3.1 Contours of two different linear-quadratic functions, circles centered in the third quadrant, in a two-fraction case (R2)

3.2 (a) Feasible set with more than one sensitive structure LQ constraint, with one dominant constraint (in blue) and (b) dominant constraints intersecting at arbitrary points (also in blue)

4.1 Simple convex maximization case, with a single decision variable

5.1 Two possible optimal solutions in the single sensitive structure constraint case, (a) the single dose treatment and (b) the equal dose per fraction treatment

5.2 Change of coordinates example in R2

6.1 Penalty scheme on interior and exterior of the feasible set

7.1 Dose per fraction versus known optimal dose, regular and boundary algorithms (single sensitive structure constraint)

7.2 Effect of initial temperature (T0) on mean percent divergence from known optimum (δ)

7.3 Effect of search radius (r) on mean percent divergence from known optimum (δ)

7.4 Effect of penalty coefficient (P) on mean percent divergence from known optimum (δ)

7.5 Effect of number of iterations (N) on mean percent divergence from known optimum (δ)

7.6 Comparison of simulated annealing dose per fraction and known optimal dose

7.7 Doses per fraction with time varying radiosensitivity parameters, start-end trend line in black


Chapter 1

Introduction

Every day, clinical practitioners involved in radiation therapy treatments must decide

how to best divide a global treatment dose into a number of fractional dose increments.

In doing so, they must balance the need to destroy tumor cells with the need to preserve

healthy sensitive structures which also receive radiation, during treatment. Currently,

there are no formal universally accepted benchmarks for doing fractionation and dose-

scheduling is left to the discretion of the practitioner. In the past, fractionation of

treatment was the subject of much debate, but in recent practice fractionation schedules

are mostly developed by trial and error and are often defined by consensus. In clinical

practice, typical radiotherapy treatments involve the total prescription dose being broken

up either into 35 to 40 fractions of equal doses or in a few (one to three, generally) fractions

with large doses (e.g., radiosurgery treatment).

During the last decade however, novel technology allowed patients to be treated much

more precisely than in the past. Patient setup can now be monitored and corrected daily

and dose distributions can be made to conform very closely to the shape of the target.

At the same time, these advances allow better sparing of normal tissues and have revived


an old debate on fractionation.

In this thesis, we aim to mathematically derive the optimal dose per fraction, which

maximizes tumor cell kill while keeping the effects of dosage to healthy structures below a predetermined level. We use the yet-to-be-published work of Keller and Davison (2008)

and the work of Mizuta et al. (2012) as a methodological basis for our formulation.

1.1 Organization

We begin by briefly reviewing the history of radiotherapy fractionation, the radiobiologi-

cal basis for fractionation, including the widely used linear-quadratic model, and attempts

at formally arriving at an optimal fractionation scheme. We also examine different global

optimization search algorithms as possible tools for solving difficult optimization prob-

lems, in general, and our specific fractionation problem, in particular. We then pose the

fractionation problem as it exists currently in the literature and attempt to generalize its

formulation, while examining the mathematical properties of the optima. In the course of

doing so, we demonstrate why traditional mathematical programming techniques do not

apply to our specific problem and why numerical search techniques, specifically simulated

annealing, are required to attain a numerical solution.

Finally, we explore various simulated annealing parameter settings and tailor a simulated

annealing algorithm that best suits our particular problem. We apply our algorithm on

a set of sample cases to assess solution quality and explore actual dosage results.


1.2 Contribution

In the most recent literature, the fractionation problem is formulated in the context of a dosage delivered to a tumor with constant sensitivity to radiation, constrained by a single healthy structure adversely affected by this dose. We aim to generalize

this problem formulation to a more realistic case where the doses delivered to the tumor

are constrained by a finite arbitrary number of healthy structures and where the tumor’s

radiosensitivity may evolve over the course of the treatment.

A numerical search algorithm was also developed to solve this more generalized formula-

tion and to find solutions in the case where the tumor’s reaction to radiation may evolve

over the course of treatment.

Finally, a clarification to the theorem presented in Keller and Davison (2008) and a

generalization of it to the case of an arbitrary number of healthy structures that impose

dosage constraints are offered.

Chapter 2

Literature review

2.1 Dose fractionation in radiotherapy

Fractionation, the scheduled break-up of a radiotherapy treatment into a set of treatment

increments, has a long history dating back to the 19th century. Thames (1992) lays out

a detailed history of fractionated radiation treatment, going back to the 1890’s. In 1896,

a physician by the name of Leopold Freund treated a patient afflicted with a hairy

nevus by administering a low daily dose of X-rays, over two weeks. At about the same

time, in Sweden, Thor Stenbeck reported using a similar technique to treat cases of

skin cancer. Unfortunately, the actual doses delivered were unknown, due to a lack of

adequate dosimetric technology at the time.

Around 1902, beginning with the work of Holzknecht and later around 1909 with the work

of Villard, the technology to accurately measure the doses delivered began to appear. This

new technology not only allowed therapists to measure dose, but provided them with the

ability to deliver higher doses in fewer sessions. Thus was born the fractionation debate.

Indeed, throughout the 20th century, a persistent debate over the relative merits of large


doses over fewer fractions versus smaller doses over longer treatments endured. The debate persisted even though each school of thought claimed its approach to be rooted in

human biology (Miles and Lee, 2008).

To this day, the debate remains, in spite of the tremendous advancements in radiotherapy

we have witnessed since this debate began. Although fractionation (as opposed to single

treatment) seems to be the dominant school of thought, no consensus exists on the

precise manner in which dosage should be administered. For example, Miles and Lee

(2008) claim hypofractionation, a shortened higher dose per fraction treatment, may be

advantageous in some prostate cases. As recently as 2010, Ferreira et al. (2010) conducted

a retrospective evaluation of different IMRT fractionation schedules of head and neck

tumors to assess their comparative advantages. Although their study seems to suggest that shorter, higher dose per fraction treatments yield better tumor control probabilities, their

results remain preliminary and are based on a very small sample size (seven patients).

In the late 20th and early 21st century, some work focuses on building a formal radiobi-

ological framework for evaluating the effects of fractionation. Fowler (1989) reviews the

link between treatment time, tissue and tumor biology, on the one hand, and the efficacy

of treatment schedules, on the other. He links fractionation schedules that account for

radiobiological factors such as dose-delivery over time and the role they play in the clin-

ical success of a treatment. He also mentions that early, but not late, adverse reactions

can be reduced or eliminated by extending treatment times.

Later, in Fowler (1992), the author updates and adds to his previous work. He also

summarizes the main facets of the radiobiological basis for fractionated treatment and

identifies the following important facts:

• Late and early complications in normal tissue are made worse by large doses per

fraction;


• 2 Gy per fraction will sterilize approximately half the cells in most tumors;

• The effect of radiation damage can be assessed using an existing model (linear-

quadratic model).

Lee et al. (1998) examine the effects of treatment duration, total treatment dose and

fractionation on post-treatment cerebral necrosis. They found the fractionation scheme

to be the most determining factor in predicting post-treatment cerebral necrosis. For

example, in a sample of over 1,000 patients suffering from nasopharyngeal carcinoma,

they found that patients who received 60 Gy in total in 2.5 Gy fractional increments had

lower rates of temporal lobe necrosis than those who received a total dose of 50.4 Gy in

4.2 Gy increments.

Mavroidis et al. (2001) develop the concept of biologically effective uniform dose, an as-

sumption that two dose distributions are equivalent if they achieve similar tumor control

probabilities. Nevertheless, despite this biological equivalence, they insist that one needs

to account for the fractionation schedule when comparing treatments. They claim that

the dose-response parameters of the tissues depend on how the treatment was delivered.

Bourhis et al. (2006) conduct a meta-analysis of accelerated and hyperfractionated ra-

diotherapy treatment of head and neck cancer. They analyze 15 trials with over 6,500

patients and conclude that in comparison to conventional radiotherapy both accelerated

treatments (shorter higher dose per fraction) and hyperfractionated treatments (multiple

treatments per day) lead to greater tumor control probabilities. They also conclude that

hyperfractionated treatment schedules provide the greatest benefit.

Ma et al. (2010) study equivalent uniform biologic effective dose (EUBED) on normal

brain tissue under different fractionation schemes (1-30 fractions) for three different treat-

ment modalities (Gamma Knife, Cyberknife and a Novalis LINAC-based system). They

find that according to the tumor’s radiosensitivity characteristics the fractionation sched-


ule is a significant factor in determining normal tissue sparing, regardless of the treatment

modality.

To model the effects of radiation on the tissues, the linear-quadratic model was developed

in the early 1980’s. Since then, it has held up well in the face of empirical validation

(Fowler, 1989) and seems to be, to this day, the most widely employed model for predict-

ing the dose-time relationship and evaluating the effect of different dose-per-fractionation

schedules (Brenner, 2008).

Many authors have made use of and modified the linear-quadratic model, to achieve

various goals (Brenner et al., 1998; Yang and Xing, 2005; Keller and Davison, 2008;

Mizuta et al., 2012), but they all share a common model core. At its core, the linear-

quadratic model describes what is referred to as the radiobiological effect E, as a second

degree polynomial function called the linear-quadratic (LQ) function of the total dose d

delivered to tissues over the course of a treatment.

$$E = \alpha d + \beta d^2$$

The coefficients α, β are referred to as the radiosensitivity parameters or radiosensitivity

coefficients of a given tissue structure. In a mechanistic interpretation, these parameters

describe different damage to the cell’s DNA. Naturally, both these coefficients are positive,

since a radiation dose must have some tissue damage effect. A negative coefficient would

imply an inverse relationship between dose and cell death (more dose, less cell damage).

The cell kill (CK) caused by a radiation treatment is the exponential of the effect E:

$$CK = \exp(E) = \exp(\alpha d + \beta d^2)$$

According to Fowler (1992) the dose effect is additive over multiple fractions, in the linear-quadratic model. Therefore, the total effect over the course of a treatment of $N$ fractions can be expressed as

$$E = \alpha \sum_{i=1}^{N} d_i + \beta \sum_{i=1}^{N} d_i^2$$

Conversely, using this model, we can also evaluate a cell’s survival fraction (SF) over the

course of a treatment as a function of the doses received in each fraction:

$$SF = \exp(-E) = \exp\left(-\alpha \sum_{i=1}^{N} d_i - \beta \sum_{i=1}^{N} d_i^2\right)$$

For the purpose of predicting the effects of radiation on tumor and normal tissues, the

linear-quadratic model has been evaluated rigorously. It was found to be accurate, robust,

based on solid theoretical foundations, and has been subjected to empirical validation (Fowler,

1989; Brenner et al., 1998; Brenner, 2008) for typical dose ranges.
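To make the bookkeeping above concrete, here is a minimal sketch (the radiosensitivity values are hypothetical placeholders, not clinical parameters, and the code is illustrative rather than part of the thesis) of evaluating the effect, cell kill, and survival fraction for a fractionated schedule:

import numpy as np

# Hypothetical radiosensitivity parameters for a single tissue (placeholders).
alpha, beta = 0.35, 0.035        # alpha in 1/Gy, beta in 1/Gy^2
doses = np.full(30, 2.0)         # e.g., 30 fractions of 2 Gy each

# Total effect over the treatment: E = alpha*sum(d_i) + beta*sum(d_i^2)
E = alpha * doses.sum() + beta * np.sum(doses ** 2)
CK = np.exp(E)                   # cell kill, as defined above
SF = np.exp(-E)                  # survival fraction
print(f"E = {E:.2f}, CK = {CK:.3e}, SF = {SF:.3e}")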

The linear-quadratic model does have its limitations, nonetheless, and is still the subject

of much debate. Although it can reliably be applied to doses in the range of 2-10 Gy per

fraction, it becomes less accurate at higher doses, in the 15-18 Gy range (Brenner, 2008),

and is found to be inaccurate for the larger doses required in radiosurgery (Kirkpatrick

et al., 2008).

While the use of mathematical optimization to formally determine a treatment plan’s

overall beam-dosage in radiotherapy has been addressed by a very large number of au-

thors, very few papers address the specific problem of determining the optimal dose for

each fraction. For example, Romeijn et al. (2006) consider the problem of designing a

fraction within a treatment plan, with known and pre-determined beam orientations, by

computing the optimal beamlet weights. Chu et al. (2005); Chan et al. (2006) focus on

optimizing a fraction under uncertainty due to motion. Unfortunately, not a lot of work

has been done to examine the optimal delivery schedule of the dose over the course of

the entire multi-fraction treatment.

Only a few publications to date attempt to solve the specific problem of determining the

optimal dose for each fraction. Levin-Plotnik and Hamilton (2004) consider a tumor’s


tissue density to find the dose schedule that maximizes tumor-control probability (TCP)

for a fixed mean-dose per fraction (i.e., a fixed total treatment dose). They conclude that

for homogeneous cell densities a homogeneous dose schedule maximizes the TCP, while

for heterogeneous tumor density they conclude an initial inhomogeneous dose will homog-

enize the tumor density and all subsequent doses can then be homogeneous. Yang and

Xing (2005) use the linear-quadratic model to develop a “tumor-biology specific” dose-

schedule that maximizes the tumor’s biologically effective dose (BED), while keeping the

healthy structure’s BED constant. Their problem formulation involves only constraints

from a single sensitive structure and fractional dose constraints. Aleman et al. (2007)

integrate the time dimension within the IMRT plan optimization by taking a physical

view of the problem and solving for the individual beamlet weights in each fraction of the

treatment. Hoffmann et al. (2008) address the fractionation effect on tumor-control and

normal-tissue complication probabilities, but do not address the problem of determining

the optimal dose per fraction.

In a different problem formulation, Keller and Davison (2008); Mizuta et al. (2012) com-

pute instead the optimal dose per fraction, for the case of a single tumor and single sen-

sitive structure constraint, within the framework of a time-independent linear-quadratic

model, without imposing any fractional dose constraints. Keller and Davison (2008)

solve the optimal fractionation problem through dynamic programming and by the for-

mulation of a theorem. Their theorem identifies two possible optimal schedule cases, the

single-dose treatment schedule, in which the entire treatment dose is delivered in a single

fraction, and the equal dose per fraction treatment schedule (EQDF), in which the treat-

ment is broken down into fractions of equal dose. The optimality of each dose-schedule

depends on the alpha-beta ratios of the tumor and sensitive structure linear-quadratic

functions. Without resorting to dynamic programming to validate their claims numeri-

cally, Mizuta et al. (2012) come to the same conclusions. This framework was recently

applied to more general sensitive tissue constraint functions, such as a normal-tissue


complication probability (NTCP) model (Keller et al., 2012).

The bulk of the work presented in this document focuses on suggesting alternative solu-

tion approaches to the optimization problem posed in Keller and Davison (2008) and in

generalizing it to the multiple-tumor and multiple-sensitive structure case.

2.2 Global optimization search techniques

Later in this thesis we will show that traditional mathematical programming techniques

do not apply to our specific fractionation problem. For this reason, we explored global

optimization techniques.

When it is hard or impossible to obtain analytical solutions to an optimization problem,

there exists a vast array of global optimization search techniques to approximate a glob-

ally optimal solution. While each of these techniques has its own specificities, they all

explore the feasible set of a problem in a systematic manner, typically without relying on

the objective function’s derivative. We briefly reviewed five common techniques, random

search, tabu search, genetic algorithms, GRASP and simulated annealing, to assess their

suitability for our problem. We also examined the extensive applications of simulated

annealing in the medical physics literature and to radiotherapy problems, in particular.

Random search is a global optimization algorithm that consists of exploring the feasible

set randomly, by following a uniform probability, and retaining the best solution (Brown-

lee, 2011). The biggest shortcoming of random search is that it is a memoryless search,

which may revisit points in the feasible set. Because of this feature of the search strategy,

in the worst-case, performance may be worse than complete enumeration, since the same

points may be visited more than once. In spite of this worst-case behavior, random search

has been found to be an acceptable solution tool for some mixed-integer problems (Nelson


and Brodie, 1990). On the other hand, Westhead et al. (1997) compared performances

of random search, tabu search, genetic algorithms, evolutionary programming (another

search technique) and simulated annealing, on a molecular-docking problem, and found

random search provided the worst performance of all techniques. Most applications of

random search typically involve unconstrained problems, but the technique has also been successfully applied to constrained problems (Solis and Wets, 1981; Niemierko, 1992).

Notably, Niemierko (1992); Niemierko et al. (1992) used random search to solve a con-

strained beam weight selection problem. They reported clinically valid, although not

provably optimal, solutions within a few minutes of runtime.

Tabu search is a “meta-heuristic” algorithm, a heuristic algorithm applied on top of an-

other heuristic algorithm. It is typically overlaid on another search technique, such as a

traditional hill-climbing algorithm, for example (Brownlee, 2011). It enhances the search

by ensuring that recently visited points are not revisited before some time has passed, by

maintaining a “tabu list”. The tabu list addresses the shortcomings of the random search

technique, just described (Brownlee, 2011). Westhead et al. (1997), in their comparisons

of search techniques on a molecular-docking problem, reported tabu search returned rea-

sonable and comparable results with most of the other search techniques they considered,

although it was found to be clearly superior to the random search technique. Kapamara

et al. (2006) found tabu search outperformed other search techniques on job-shop prob-

lems. While tabu search was developed initially for combinatorial problems, it has also

been applied to continuous ones (Chelouah and Siarry, 2000). In fact, it was success-

fully applied to various problem formulations in conformal radiotherapy, including beam

weight selection, number of beams, and beam direction (Gilio, 1997).

Genetic algorithms fall into the category of evolution-inspired algorithms that mimic

biological processes and which are broadly categorized as evolutionary algorithms. These

algorithms explore the solution space by creating a population of candidate solutions


and combining these candidates, in order to obtain a better solution, thus mimicking

genetic mechanisms (Michalewicz, 1995; Brownlee, 2011). Points in the feasible space are

encoded as binary strings. The most promising points are combined to form new points,

which is where they get their name. The general assumption is that by combining the

characteristics of promising points in the feasible set we get points that represent better

solutions. In their comparisons, Westhead et al. (1997) found genetic algorithms to

perform similarly to most of the other techniques they examined. In the past, genetic

algorithms have been used to solve (photon) beam selection problems in radiation therapy

within clinically acceptable time frames (Ezzell, 1996; Li et al., 2004). More recently, Cao

et al. (2012) also obtained good results with proton beam problems.

Greedy randomized adaptive search procedure (GRASP) is a commonly used meta-

heuristic technique in combinatorial optimization. For example, Petrovic and Leite-

Rocha (2008) applied this technique to a radiotherapy appointment scheduling problem.

GRASP is a combination of greedy search and local search techniques. It begins by

selecting a set of candidate points, and then applying a greedy search. Greedy search

is a short-sighted search strategy that moves from point to point by selecting the point

which provides the highest payoff among possible moves. The GRASP procedure then

applies local search, a series of small perturbations on the candidate points obtained by

greedy search, to explore possible improvements upon these candidate solutions (Hirsch

et al., 2007; Brownlee, 2011). Although used primarily for combinatorial problems, this

method has also been applied to continuous problems (Hirsch et al., 2007).

Simulated annealing (Bertsimas and Tsitsiklis, 1993; Russell and Norvig, 1995; Brownlee,

2011) is a well-known global optimization technique. It is a variant of “hill-climbing”

that does not systematically apply a greedy logic to determine a move from the current

point to the next point. In simulated annealing, we search the feasible set by moving

from a current point to a new point according to the following set of search rules. If


the new point in our search improves the objective function the move is automatically

accepted. In the case where the new point does not improve the objective function, it

is accepted if a random draw is lower than an evolving probability of acceptance. This

acceptance procedure allows simulated annealing to break out of local optima.
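A minimal sketch of this acceptance rule (a generic Metropolis-style criterion; the parameter handling in the implementation described later in this thesis may differ) is:

import math
import random

def accept_move(delta, temperature):
    # delta: change in the objective caused by the candidate move
    #        (positive = improvement, since we are maximizing).
    # temperature: current annealing temperature (> 0).
    if delta >= 0:
        return True                    # improving moves are always accepted
    # Worsening moves are accepted with probability exp(delta / T),
    # which shrinks as the temperature is lowered over the run.
    return random.random() < math.exp(delta / temperature)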

Most of the simulated annealing literature is illustrated with basic unconstrained com-

binatorial problems. However, many authors (Morrill et al., 1991; Romeijn and Smith,

1994; Wah and Wang, 1999; Wah and Chen, 2000; Miki et al., 2006) have successfully

applied simulated annealing to continuous and constrained problems.

Simulated annealing is very commonly used in the medical physics literature. Although

no theoretical justification is made for this choice of solution technique, good numerical

results have been reported. In the past, it has been applied to solve various radiother-

apy problems. For example Webb (1989); Mageras and Mohan (1993) used simulated

annealing to determine beam weights. Morrill et al. (1991); Aleman et al. (2008) used

simulated annealing to determine beam angles and weights and reported reasonable run

times. Bortfeld and Schlegel (1993) showed the need for search techniques to solve non-

convex problems and then applied simulated annealing to a beam orientation problem.

Cao et al. (2012) used simulated annealing and genetic algorithms as benchmarks to

compare their own solution technique for a beam angle optimization problem. Simulated

annealing techniques have also been applied to a (different from ours) formulation of the

dose fractionation problem (Yang and Xing, 2005).

Westhead et al. (1997), in their comparison of various search algorithms, found the nu-

merical performance of all the techniques to be roughly equivalent, except for random

search which they found inferior to the other techniques they considered. Rossi-Doria

et al. (2003) compared the performance of different metaheuristics on various instances of

a timetabling problem. The metaheuristic techniques they compared were evolutionary

algorithms, ant colony optimization, iterated local search, simulated annealing, and tabu


search. In the end, they concluded that it was impossible to determine which technique

would offer the best performance for all instances of their benchmark problem. Kapamara

et al. (2006) compared various search techniques on a scheduling problem. They com-

pared the performance of branch-and-bound, simulated annealing, tabu search, GRASP

and genetic algorithms and found tabu search outperformed the other techniques. In

another comparative study, Perez and Basterrechea (2007) examined the performance of

variations of genetic algorithms, simulated annealing and particle swarm optimization,

on antenna measurement problems. They found that particle swarm optimization and

simulated annealing techniques were the best suited to their problems. Cao et al. (2012)

also compared the performance of various search techniques in the solution of a beam

angle problem and found no significant differences in the performance.

Finally, in order to put the conclusions of Westhead et al. (1997); Rossi-Doria et al.

(2003); Perez and Basterrechea (2007); Cao et al. (2012) in proper context, it is impor-

tant to mention that all global optimization search algorithms have their strengths and

weaknesses. At present, no algorithm has been found to be generally superior or even su-

perior for a specific class of problems. Some algorithms perform better on some particular

problems, while other algorithms perform better on other specific problems. This char-

acteristic is known as the “no free lunch” property of search algorithms in optimization

(Wolpert and Macready, 1997; Brownlee, 2011, p. 14).

In summary, while it is virtually impossible to select the “best” global optimization search

techniques for all cases, it may be useful to remember the following characteristics:

• Random search

– Cycling possible

– May get caught in local optimum


– Bad worst-case behavior

• Tabu search

– Avoids cycling, by maintaining tabu list

– Maintenance of tabu list may impose computational overhead, especially when

decision variables are multi-dimensional

– Tabu is generally overlaid on a hill-climbing heuristic, which may get stuck in

local optima

• Genetic algorithms

– Rely on the assumption that combining good points of the feasible space will

lead to better ones

– As a result, may get caught in local optima

• GRASP

– Combines greedy search and local search heuristics

– Both are susceptible to getting stuck in local optima

• Simulated annealing

– Known to converge to global optimum asymptotically

– Able to break out of local optima, so it is well suited when the shape of the

search space is unknown or hard to predict

– Widely used in the medical physics literature, e.g., previously used for IMRT

optimization and fractionation


Because of its previous use in the medical physics literature and convergence properties,

simulated annealing was the numerical technique of choice for this thesis.

Chapter 3

The fractionation problem

We begin this chapter by introducing the base case of single tumor and single sensitive

structure problem posed by Keller and Davison (2008); Mizuta et al. (2012), which is

the most current version of the fractionation optimization problem. We then generalize

this problem to the more realistic case of a treatment constrained by multiple sensitive

structures and the case of time-dependent radiosensitivity parameters.

In the literature, the most current formulation of the fractionation problem is a problem

in which we seek the dose per fraction schedule that maximizes the effect of radiation

on a tumor, thus also maximizing a tumor’s cell kill, while keeping the effect (and cell

kill) imposed on a single sensitive structure to a predetermined level, for a treatment of

known length (known number of fractions) N (Keller and Davison, 2008; Mizuta et al.,

2012). Note here that if a fractional dose di is equal to zero, the number of fractions

where dose is actually being delivered is less than N . In that sense, N is an upper bound

for the number of fractions in which dose is delivered.

For both the tumor and sensitive structure cell kill is given by the linear-quadratic model.

For mathematical clarity and numerical stability, these authors work with the LQ func-


tion, the natural logarithm of the cells’ cell kill function, the effects function seen in

Section 2.1. For a treatment of N fractions, the problem is expressed as follows (Keller

and Davison, 2008; Mizuta et al., 2012):

$$\text{maximize } f(\vec{d}) = \sum_{f=1}^{N} \left(\alpha_T d_f + \beta_T d_f^2\right) \qquad (3.1)$$

$$\text{subject to} \qquad (3.2)$$

$$g(\vec{d}) = \sum_{f=1}^{N} \left(\alpha_S \omega d_f + \beta_S \omega^2 d_f^2\right) = K \qquad (3.3)$$

$$d_f \geq 0 \quad \forall f = 1, \dots, N \qquad (3.4)$$

The parameter ω(≥ 0) is the “sparing factor” of the sensitive structure. It indicates the

amount of radiation, $\omega d_f$, that is delivered to the sensitive structure when the adjacent tumor is administered a radiation dose $d_f$.

Before moving forward, we must mention that the linear-quadratic model with constant

positive parameters has two important mathematical properties with respect to the frac-

tional doses. Keller and Davison (2008) describe these properties in detail and exploit

them in their work. First, the order of the doses does not matter. Two dose regimens

with the same overall dosage breakdowns have the same effect, regardless of the dose

ordering in time; for example, $\alpha(d_1 + d_2) + \beta(d_1^2 + d_2^2) = \alpha(d_2 + d_1) + \beta(d_2^2 + d_1^2)$. Second, the quadratic term gives higher weight to large fractional doses, regardless of the total dose. For example, we have $\alpha(5 + 5) + \beta(5^2 + 5^2) \leq \alpha(10 + 0) + \beta(10^2 + 0^2)$, with $\alpha, \beta \geq 0$.

Although we used a simple two-fraction example to illustrate these properties, they hold

for any arbitrary number of fractions and any arbitrary permutation of the doses.

Finally, it should be noted that the set of dose vectors $\vec{d} = [d_1 \dots d_N]^T$ satisfying the linear-quadratic equation $\alpha \sum_{f=1}^{N} d_f + \beta \sum_{f=1}^{N} d_f^2 = K$ defines an $N$-dimensional hypersphere centered at the point $[-\alpha/(2\beta) \dots -\alpha/(2\beta)]^T$ and having a radius of $\sqrt{K/\beta + N\left(\alpha/(2\beta)\right)^2}$, as described in Mizuta et al. (2012).
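This representation follows from completing the square in each coordinate:

$$\alpha \sum_{f=1}^{N} d_f + \beta \sum_{f=1}^{N} d_f^2 = K \;\Longleftrightarrow\; \sum_{f=1}^{N} \left(d_f + \frac{\alpha}{2\beta}\right)^2 = \frac{K}{\beta} + N\left(\frac{\alpha}{2\beta}\right)^2 .$$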


Figure 3.1: Contours of two different linear-quadratic functions, circles centered in the

third quadrant, in a two-fraction case (R2)

In the R2 case (two fraction case, where N = 2), this hypersphere is a circle. For

illustration purposes, we show the contours of two arbitrary LQ functions in Figure 3.1,

in red and black. We see the circles are centered at a point along the diagonal in the

third quadrant.

Up to now, the model described addresses the problem of fractionated treatment of

a single tumor and a single healthy structure, with constant parameters α, β, which

represents the state of the art formulation in the most recent literature (Mizuta et al.,

2012). Here, we show a more general formulation of the problem which allows for an

arbitrary number of sensitive structure constraints and tumor α, β parameters that may

evolve over time to account for evolving radiosensitivity.

$$\text{maximize } f(\vec{d}) = \sum_{f=1}^{N} \left(\alpha_{fT} d_f + \beta_{fT} d_f^2\right) \qquad (3.5)$$

$$\text{subject to} \qquad (3.6)$$

$$g_j(\vec{d}) = \sum_{f=1}^{N} \left(\alpha_j \omega d_f + \beta_j \omega^2 d_f^2\right) \leq K_j \quad \forall j \in S \qquad (3.7)$$

$$d_f \geq 0 \quad \forall f = 1, \dots, N \qquad (3.8)$$


In this case, we are still maximizing the cell kill of tumor cells, but are now restricted by a

set S of multiple sensitive-structures, indexed by the subscript j. Note that the equality

constraint in equation 3.3 has been replaced by a set of inequalities, in statement 3.7. This

replacement ensures that we do not end up with an empty feasible set, due to inconsistent

restrictions on the dosage, which may be imposed by the multiple sensitive-structure

constraints. Indeed, some sensitive-structures may impose greater restrictions on the

dosage delivered to the tumor than others, as we show in Figure 3.2 and in Section 7.5.

Such inconsistencies would make a feasible set empty and our problem insolvable.
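To illustrate how this generalized formulation can be evaluated numerically, the following sketch (all parameter values are hypothetical, and the sparing factor is indexed by structure here for generality; none of this is taken from the thesis or from clinical data) computes the tumor objective in (3.5) and checks feasibility against the constraints (3.7)-(3.8):

import numpy as np

def tumor_effect(d, alpha_T, beta_T):
    # alpha_T and beta_T may be scalars, or length-N arrays to model
    # time-varying radiosensitivity (one value per fraction).
    return np.sum(alpha_T * d + beta_T * d ** 2)

def is_feasible(d, structures):
    # structures: list of (alpha_j, beta_j, omega_j, K_j) tuples, one per
    # sensitive structure j in S.
    return np.all(d >= 0) and all(
        np.sum(a * w * d + b * (w ** 2) * d ** 2) <= K
        for (a, b, w, K) in structures
    )

d = np.full(5, 3.0)                                  # 5 fractions of 3 Gy
structures = [(0.30, 0.030, 0.7, 40.0),              # made-up constraints
              (0.20, 0.020, 0.5, 25.0)]
print(tumor_effect(d, 0.35, 0.035), is_feasible(d, structures))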

The feasible set for the multiple sensitive structure problem with constant coefficients

is illustrated in the case of a two-fraction treatment in Figure 3.2(a) and Figure 3.2(b).

This illustration provides an example of the necessity of using a set of inequality con-

straints to describe the feasible set. As illustrated, the feasible set may be defined by

a single dominant sensitive structure constraint which makes all other sensitive struc-

ture constraints redundant (in blue), as in Figure 3.2(a), and reduces the problem to

the initial single sensitive-structure problem of Keller and Davison (2008); Mizuta et al.

(2012). The feasible set may also be defined by the intersection of an arbitrary subset

of the sensitive structure constraints (also in blue), occurring at arbitrary points, as in

Figure 3.2(b). In such cases imposing equality constraints would make the feasible set

empty and the problem insolvable.

When it comes to varying radiosensitivity parameters, we can reasonably envisage two

cases of varying tumor tissue sensitivity to radiation: the cases in which the tumor be-

comes less sensitive to radiation and the case in which the tumor becomes more sensitive

to radiation over the course of the treatment (Keller and Davison, 2008).

Figure 3.2: (a) Feasible set with more than one sensitive structure LQ constraint, with one dominant constraint (in blue) and (b) dominant constraints intersecting at arbitrary points (also in blue)

Mathematically, this variation of radiosensitivity over time would be expressed by decreasing (increasing) $\alpha_{fT}, \beta_{fT}$ parameters in the objective function, such that $\alpha_{1T} \leq \dots \leq \alpha_{NT}$ and $\beta_{1T} \leq \dots \leq \beta_{NT}$ in the case of a decrease in radiosensitivity, or $\alpha_{1T} \geq \dots \geq \alpha_{NT}$ and $\beta_{1T} \geq \dots \geq \beta_{NT}$ in the case of an increase in radiosensitivity. Decreasing (increasing) parameters means that for the same dose the effect on the tumor decreases (increases)

over time as parameter values evolve over the course of treatment. Our generalized for-

mulation now allows for time varying radiosensitivity coefficients αfT , βfT , which are

indexed for each of the f = 1, . . . , N fractions of the treatment.

In looking into the effect of time-varying radiosensitivity parameters, it is important

to recall the properties of the linear-quadratic function with constant parameters. As

described in Section 2.1, the order of the doses does not matter, and two dose schedules

with the same dosages have the same effect, regardless of how the doses are ordered in

time. However, by making the radiosensitivity parameters vary over the course of the

treatment, we break this property.
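A quick numerical check (with made-up parameter values) illustrates how time-varying parameters break this order-invariance:

import numpy as np

# Hypothetical radiosensitivity that decreases over three fractions.
alpha_f = np.array([0.40, 0.30, 0.20])
beta_f = np.array([0.040, 0.030, 0.020])

def effect(d):
    return np.sum(alpha_f * d + beta_f * d ** 2)

# Same set of doses, two different orderings: the total effect now differs.
print(effect(np.array([6.0, 2.0, 2.0])))   # large dose delivered first
print(effect(np.array([2.0, 2.0, 6.0])))   # large dose delivered last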

Chapter 4

Traditional mathematical programming approach

Before moving forward with our problem formulation, it is important to demonstrate

that the commonly used Karush-Kuhn-Tucker (KKT) conditions for solving non-linear

optimization problems are not helpful in the case of our fractionation problem.

For the reader’s benefit, we restate the KKT conditions for a general case. We then show

a simple counter-example that illustrates our claim and then use our actual problem

formulation to show that KKT conditions do not apply in our specific case. We end this

chapter by justifying to the reader why a numerical search technique is required to solve

the fractionation problem.


4.1 The Karush-Kuhn-Tucker conditions

Prior to demonstrating the inapplicability of the KKT conditions we find it useful to

restate them, in the context of a general constrained nonlinear problem, as shown here:

$$\text{minimize or maximize } f(x)$$

$$\text{subject to}$$

$$g_i(x) \leq 0 \quad \forall i = 1, \dots, I$$

$$h_j(x) = 0 \quad \forall j = 1, \dots, J$$

The KKT conditions are a set of necessary (feasibility, stationarity and complementary

slackness) and sufficient (convexity/concavity) conditions for an optimum in a nonlinear

optimization problem, like the one just stated.

The first-order KKT condition states that for a feasible point to be an optimum it must

be a stationary point. Stationarity means that at an optimum x∗, the gradient of the

Lagrangian function for the constrained problem must equal zero:

$$\nabla L(x^*, \mu, \lambda) = \nabla f(x^*) + \sum_{i=1}^{I} \mu_i \nabla g_i(x^*) + \sum_{j=1}^{J} \lambda_j \nabla h_j(x^*) = \vec{0}$$

Where the variables µi, λj are the dual variables for the inequality and equality con-

straints, respectively. Also note this formulation is flexible enough so that the elements

of the triplet x∗, µ, λ can each be scalars or vectors. Additionally, for a point to be

an optimum the dual variables µ, λ must fulfill the following conditions, known as the

“complementary slackness” conditions, at the point x∗:

$$\mu_i \, g_i(x^*) = 0 \quad \forall i = 1, \dots, I$$

If the $i$th inequality constraint is tight at the point $x^*$, then the dual variable $\mu_i$ must be nonnegative ($\mu_i \geq 0$). Otherwise, if the $i$th inequality is not tight, then $\mu_i$ must equal zero

(µi = 0). For the equality constraints, the variables λj can be any arbitrary scalar.


Finally, the sufficient (but not necessary) condition relates to the geometry of the La-

grangian (i.e., the geometry of objective function and feasible set). It states that a point

x∗ is a local maximum (minimum), if, in addition to meeting the necessary conditions

stated previously, the Lagrangian is also concave (convex) in the neighborhood of that

point x∗. Convexity (concavity) is verified by examining the Hessian matrix of the La-

grangian at the point x∗. If the Hessian is negative-definite in the neighborhood of the

point x∗, the Lagrangian is locally concave and x∗ can be classified as a local maximum.

Conversely, if the Hessian is positive-definite the Lagrangian is concave and x∗ is a local

minimum. If the Hessian is indefinite, the second-order conditions are inconclusive.

We now derive the KKT conditions based on the original formulation of Keller and

Davison (2008). We use a two fraction case for illustration purposes, but our results hold

for any number of fractions. We recall the problem of determining the doses for two

fractions that maximize tumor cell kill under one constraint is expressed in the literature

(Keller and Davison, 2008; Mizuta et al., 2012) as

$$\text{maximize } f(\vec{d}) = \alpha_T d_1 + \beta_T d_1^2 + \alpha_T d_2 + \beta_T d_2^2$$

$$\text{subject to}$$

$$g(\vec{d}) = \alpha_S \omega d_1 + \beta_S \omega^2 d_1^2 + \alpha_S \omega d_2 + \beta_S \omega^2 d_2^2 = K$$

$$h_1(\vec{d}) = -d_1 \leq 0$$

$$h_2(\vec{d}) = -d_2 \leq 0$$

Before looking into optimality conditions, we begin with the examination of a plot of the

linear-quadratic function in R2, shown in Figure 3.1, Figure 3.2(a) and Figure 3.2(b).

These figures visually confirm its convex shape.

In order to remain consistent with the bulk of the mathematical programming literature

we will work with the minimization formulation of this problem for the rest of this chapter,

i.e., minimize $f(\vec{d}) = \sum_{f=1}^{N}\left(-\alpha d_f - \beta d_f^2\right)$. In the case of our fractionation problem, the


first-order conditions for optimality at a point ~d∗ for a two-fraction example are

$$\nabla L(\vec{d}^*, \lambda, \vec{\mu}) = \nabla f(\vec{d}^*) + \lambda \nabla g(\vec{d}^*) + \mu_1 \nabla h_1(\vec{d}^*) + \mu_2 \nabla h_2(\vec{d}^*) = \vec{0} \qquad (4.1)$$

$$\begin{bmatrix} -\alpha_T - 2\beta_T d_1^* \\ -\alpha_T - 2\beta_T d_2^* \end{bmatrix} + \lambda \begin{bmatrix} \alpha_S \omega + 2\beta_S \omega^2 d_1^* \\ \alpha_S \omega + 2\beta_S \omega^2 d_2^* \end{bmatrix} + \mu_1 \begin{bmatrix} -1 \\ 0 \end{bmatrix} + \mu_2 \begin{bmatrix} 0 \\ -1 \end{bmatrix} = \vec{0} \qquad (4.2)$$

$$\begin{bmatrix} -\alpha_T - 2\beta_T d_1^* + \lambda\alpha_S\omega + 2\lambda\beta_S\omega^2 d_1^* - \mu_1 \\ -\alpha_T - 2\beta_T d_2^* + \lambda\alpha_S\omega + 2\lambda\beta_S\omega^2 d_2^* - \mu_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \qquad (4.3)$$

$$\begin{bmatrix} d_1^* \\ d_2^* \end{bmatrix} = \begin{bmatrix} \dfrac{\mu_1 - \lambda\alpha_S\omega + \alpha_T}{-2\beta_T + 2\lambda\beta_S\omega^2} \\[2ex] \dfrac{\mu_2 - \lambda\alpha_S\omega + \alpha_T}{-2\beta_T + 2\lambda\beta_S\omega^2} \end{bmatrix} \qquad (4.4)$$
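As a sanity check on the algebra above, a short symbolic computation (a sketch assuming SymPy is available; it is not part of the thesis) reproduces the ratios in (4.4):

import sympy as sp

d1, d2, lam, mu1, mu2 = sp.symbols('d1 d2 lambda mu1 mu2', real=True)
aT, bT, aS, bS, w, K = sp.symbols('alpha_T beta_T alpha_S beta_S omega K', positive=True)

# Lagrangian of the minimization form used in this chapter.
L = (-(aT*d1 + bT*d1**2 + aT*d2 + bT*d2**2)
     + lam*(aS*w*d1 + bS*w**2*d1**2 + aS*w*d2 + bS*w**2*d2**2 - K)
     + mu1*(-d1) + mu2*(-d2))

# Stationarity with respect to the doses yields the expressions in (4.4).
print(sp.solve([sp.diff(L, d1), sp.diff(L, d2)], [d1, d2], dict=True))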

Recall that

$$\alpha_T, \alpha_S, \beta_T, \beta_S, \omega > 0 \qquad (4.5)$$

$$d_1, d_2 \geq 0 \qquad (4.6)$$

$$\mu_1, \mu_2 \geq 0 \qquad (4.7)$$

$$d_i > 0 \Rightarrow \mu_i = 0 \quad \forall i = 1, 2 \qquad (4.8)$$

Inequalities 4.5 and 4.6 always hold, by design, in our model. Inequalities 4.7 and 4.8

are part of the KKT necessary complementary slackness conditions for an optimum.

The general KKT second-order condition for an optimum in our given fractionation

problem is that the Hessian matrix for the restricted Lagrangian

$$H_L(\vec{d}) = \left[\frac{\partial^2 L(\vec{d})}{\partial d_i \, \partial d_j}\right] = \begin{bmatrix} -2\beta_T + 2\lambda\beta_S\omega^2 & & 0 \\ & \ddots & \\ 0 & & -2\beta_T + 2\lambda\beta_S\omega^2 \end{bmatrix} \succ 0 \text{ or } \prec 0$$

be positive-definite for a minimum or negative-definite for a maximum, at the critical

point ~d∗ that meets the necessary conditions. However, because we are minimizing, we

are only interested in the cases where the Hessian matrix is positive-definite.

We observe that the Hessian is diagonal and has constant values along its diagonal.

We also know the diagonal elements cannot equal zero because of the ratios we derived


through the first-order condition. This condition excludes the semi-definite case (i.e.,

diagonal elements or eigenvalues equal to zero). So the second-order conditions, together

with the ratios in the first-order conditions, tell us that for the point ~d∗ to be a minimum,

we require $-2\beta_T + 2\lambda\beta_S\omega^2 > 0$ or, more simply, $-\beta_T + \lambda\beta_S\omega^2 > 0$.

Note, λ ∈ R is unrestricted. This absence of bounds means our Lagrangian’s convex-

ity depends on the values of λ (recall the model parameters βT , βS, ω ≥ 0 are known).

Depending on the value of $\lambda$, the inequality $-\beta_T + \lambda\beta_S\omega^2 > 0$ may or may not hold.

Consequently, the restricted Lagrangian may or may not be convex. Therefore, any

point satisfying the second-order condition can only be classified as a local minimum

through the application of the KKT theorem. In addition, some points not satisfying

the second-order condition but satisfying the first-order necessary condition, points lying

in neighborhoods where the restricted Lagrangian function is non-convex, may also be

optima, as will be seen in our example in Section 4.3. These facts must be kept in mind: our problem involves the minimization of a concave objective function (equivalently, the maximization of a convex one), a class of problem ill-suited to the KKT theorem, as demonstrated by the second-order condition just presented.

Mathematical evidence of the non-convexity of this problem is provided by the equality

constraint for the sensitive structure, $g(\vec{d}) = \alpha_S\omega d_1 + \beta_S\omega^2 d_1^2 + \alpha_S\omega d_2 + \beta_S\omega^2 d_2^2 = K$, which defines a non-convex set, and by the objective function (recall: $f''(\vec{d}) = -2\beta_T < 0$), which

is not convex. This evidence of non-convexity eliminates the relevance of the second-order

conditions. To prove the assertion that the feasible set, defined by equality constraints,

is not convex, we take a single-fraction example and apply the properties of convex

combinations of feasible points. Letting x1 and x2 be two arbitrary feasible points (doses

for a single-fraction treatment), the following equations hold:

$$g(x_1) = \alpha_S x_1 + \beta_S x_1^2 = K$$

$$g(x_2) = \alpha_S x_2 + \beta_S x_2^2 = K$$


Now take the convex combination of the two feasible points, $x_3 = \theta x_1 + (1-\theta)x_2$, where $0 \leq \theta \leq 1$. Because the function $g(x)$ is convex, by definition we have the following inequality: $g(x_3) = \alpha_S(\theta x_1 + (1-\theta)x_2) + \beta_S\left(\theta x_1 + (1-\theta)x_2\right)^2 \leq K$, which means the convex combination $x_3$ may not be part of the feasible set.

In order to prove that the function g(x) is convex, all we need to do is to examine

its second derivative $g''(x)$, which is nonnegative regardless of the value of $x$, since $\beta_S \geq 0 \Rightarrow g''(x) = 2\beta_S \geq 0$. Also, if $\beta_S = 0$ and therefore $g''(x) = 0$, then the function

g(x) is linear, which is also a convex function.

In summary, we must remember the concave minimization problem (convex maximiza-

tion) is in fact a non-convex problem. The optima may lie on a convex segment of the

Lagrangian function (where the second-order conditions are met) or on a non-convex

segment (where the second-order conditions are not met). The second-order conditions

are not relevant and the KKT theorem not useful, in this case. Furthermore, as may

be seen in Section 4.2, there are more unknowns than equations, which makes any iterative solution technique that relies on the KKT FOC impossible to apply. This is why a numerical search technique must be applied to find the optima.

4.2 First-order enumeration

As we saw in Section 4.1, the KKT conditions, the second-order conditions in particular,

are not applicable to our specific problem. Because we are maximizing a convex function,

the second order KKT condition will not help us identify the optima of our problem. The

first-order conditions, however, are necessary for any optimum.

One possible solution technique could be to solve the KKT first-order condition and

enumerate all solutions. Apart from the possible “computational explosion” this strategy


may yield, we also note that it is not possible to solve the KKT FOC analytically. As

we saw in Section 4.1, the doses d∗i that make the KKT FOC hold are given by

$$d_i^* = \frac{\mu_i - \lambda\alpha_S\omega + \alpha_T}{-2\beta_T + 2\beta_S\lambda\omega^2}$$

This condition, even when combined with feasibility conditions, yields an under-determined system of equations with an infinite number of solutions, as we see here:

$$d_f^* = \frac{\mu_f - \lambda\alpha_{s_j}\omega + \alpha_T}{-2\beta_T + 2\beta_{s_j}\lambda\omega^2}$$

$$\mu_f \geq 0$$

$$d_f > 0 \Rightarrow \mu_f = 0$$

$$\alpha_{s_j} \sum_{f=1}^{N} d_f + \beta_{s_j} \sum_{f=1}^{N} d_f^2 \leq K_j \quad \forall j \in S$$

$$d_f \geq 0 \quad \forall f = 1, \dots, N$$

4.3 Convex maximization, a counter-example

In this section, we use a simple example to follow-up on Section 4.1 and illustrate why

the KKT conditions may not be applicable to the maximization of a convex function.

Take the following simple maximization example, with a single decision variable:

$$\text{maximize } f(x) = x^2, \quad \text{subject to } -1 \leq x \leq 1$$

Note that both the objective function $f(x) = x^2$ (for which $f''(x) = 2 > 0$) and the (continuous)

feasible set, defined by the box constraints −1 ≤ x ≤ 1, are convex.

The optima can be easily identified by visual inspection of the graph of the function

f(x) = x2 in Figure 4.1. This simple example can be solved intuitively and it is easy to

visually verify the two maxima that occur at the boundary feasible points x = ±1.



Figure 4.1: Simple convex maximization case, with a single decision variable

We now put our example problem in the traditional form, before deriving the KKT

conditions for its optimum. This problem can be re-written as

$$\text{minimize } f(x) = -x^2 \qquad (4.9)$$

$$\text{subject to} \qquad (4.10)$$

$$g_1(x) = -x - 1 \leq 0 \qquad (4.11)$$

$$g_2(x) = x - 1 \leq 0 \qquad (4.12)$$

Before moving on, we recall the property that maximizing a convex function is equivalent

to minimizing the negative of a convex function, a concave function. Indeed, in our

present case this property means that finding a solution to the problem maximize f(x) =

$x^2$ is equivalent to finding one for minimize $-f(x) = -x^2$. We also immediately observe that $-f(x) = -x^2$ is a concave function, since its second derivative is $-2 < 0$.


The first-order and complementary slackness conditions for the problem are given by

$$\nabla L = -2x - \mu_1 + \mu_2 = 0$$

$$\mu_i \geq 0$$

$$\mu_i > 0 \Rightarrow g_i = 0$$

$$g_i < 0 \Rightarrow \mu_i = 0$$

The second-order conditions for the problem require ∇2L > 0. However, these conditions

are never met, since ∇2L = −2 6> 0.

While it is obvious the optima occur at the feasible points x = ±1, we see the KKT second-order conditions are never met and are not helpful in identifying the optima. Here, it

is important to recall that the first-order KKT and complementary slackness conditions

are necessary conditions, while the second-order conditions are only sufficient but not

necessary for an optimum. As illustrated in this simple counter-example, an optimum can

occur even if the second-order condition is not met. Indeed, at the feasible points x = ±1,

as shown in Case 1 and in Case 2, we can easily verify the first-order necessary and

complementary slackness conditions are met, yet the second-order condition is obviously

not.

Case 1: x = −1
\[
g_2 \neq 0 \Rightarrow \mu_2 = 0
\]
\[
\nabla L = -2(-1) - \mu_1 + \mu_2 = 0 \;\Leftrightarrow\; \mu_1 = 2
\]
\[
\nabla^2 L = -2
\]

Case 2: x = 1
\[
g_1 \neq 0 \Rightarrow \mu_1 = 0
\]
\[
\nabla L = -2(1) - \mu_1 + \mu_2 = 0 \;\Leftrightarrow\; \mu_2 = 2
\]
\[
\nabla^2 L = -2
\]

As mentioned previously, in both cases the first-order and complementary slackness conditions are met, yet the second-order condition is not. The second-order condition fails because both the objective and the corresponding restricted Lagrangian function we are minimizing are not convex in the neighborhood of the optima x = ±1. Indeed, both functions are concave over the entire feasible set, as confirmed by the always negative ∇²L and the second derivative of the minimization objective −x². We can therefore formally conclude that our simple constrained counter-example problem never meets the second-order conditions at its optima, which verifies the claim made in Section 4.1.
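As a quick sanity check of this counter-example, the short Python sketch below (ours, not part of the thesis; it only restates the example above) evaluates f(x) = x² on a grid over the feasible set and confirms that the maximizers sit at the boundary points x = ±1 even though the second-order condition fails there.

import numpy as np

# Maximize f(x) = x^2 over the feasible set [-1, 1].
xs = np.linspace(-1.0, 1.0, 2001)
f = xs ** 2

# The largest objective values are attained at the boundary points x = +/- 1 ...
print(xs[np.isclose(f, f.max())])           # -> [-1.  1.]

# ... even though the second-order KKT condition fails there: for
# L = -x^2 + mu1*(-x - 1) + mu2*(x - 1), the Hessian is the constant -2.
hessian_L = -2.0
print(hessian_L > 0)                        # -> False, second-order condition never met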

Chapter 5

Properties of the fractionation problem

In this chapter, we briefly revisit the analytic solutions, from the literature, for the case of

a fractionation problem with a single sensitive structure constraint. We also attempt to

formally characterize all optimum-candidate points, for more general formulations, with

an arbitrary number of sensitive structure constraints. While the most recent literature

only deals with problems constrained by a single sensitive structure, we seek more general

results that can be applied to cases where the treatment dose is constrained by more than

one sensitive structure, as is typically the case in practice.

5.1 Known analytic optima

In the literature, some authors (Keller and Davison, 2008; Mizuta et al., 2012) present

analytic closed-form solutions to the fractionation problem, in the case of a single sensitive

structure constraint. These solutions are based on the relative magnitude of the ratios


of the α and β radiosensitivity parameters of the objective and constraint functions.

Here, we review these results and add a minor modification to the theorem of Keller and

Davison (2008), for the cases of equality of ratios. We also explore a specific property of

the optima of the fractionation problem, the boundary property, which we present as a

lemma.

Both Keller and Davison (2008); Mizuta et al. (2012) show that the optimum is either a

large single dose treatment or a dose equally divided into the specified number of dose

fractions that make up the treatment (EQDF). The actual doses can then be determined

by solving the second-degree polynomial formed by the (single) sensitive-tissue constraint.

Recall the fractionation problem with a single sensitive-structure constraint presented in

Chapter 3. The goal is to maximize the tumor cell kill, while keeping the damage to the

sensitive-structure to a preset level.

According to Keller and Davison (2008); Mizuta et al. (2012), if the ratio of the sensitive-

structure’s radiosensitivity parameters (α/ωβ)S is less than the ratio of tumor’s radiosen-

sitivity parameters (α/β)T , then the optimal treatment is an equal-dose-per-fraction

schedule over the entire length of the treatment of N fractions. Otherwise, if the ratio

(α/ωβ)S is greater than the ratio (α/β)T , then the optimal treatment is a dose delivered

in a single fraction. In both cases, the doses (equal-dose-per-fraction or single-dose) are

found by solving the quadratic equation (with one unknown) given by the sensitive-

tissue constraint. Indeed, given that the treatment is either a single dose or N equal

doses, the sensitive-tissue equality constraint becomes a second-degree polynomial in one

variable.
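To make the computation concrete, the following Python sketch (ours; the parameter values are purely illustrative and the routine simply restates the rule above) compares the two ratios and then solves the single-variable quadratic given by the sensitive-tissue constraint for either the equal dose per fraction or the single dose.

import math

def optimal_schedule(aT, bT, aS, bS, w, K, N):
    # Single sensitive-structure case: compare (alpha/beta)_T with (alpha/(omega*beta))_S,
    # then solve the quadratic formed by the constraint alphaS*w*d + betaS*w^2*d^2.
    a, b = aS * w, bS * w ** 2
    if aS / (w * bS) < aT / bT:
        # EQDF: N equal doses d with N*(a*d + b*d^2) = K
        d = (-a + math.sqrt(a ** 2 + 4 * b * K / N)) / (2 * b)
        return "EQDF", [d] * N
    # otherwise a single dose d with a*d + b*d^2 = K (remaining fractions deliver 0)
    d = (-a + math.sqrt(a ** 2 + 4 * b * K)) / (2 * b)
    return "single dose", [d] + [0.0] * (N - 1)

# Illustrative values only (loosely modeled on the benchmark case of Chapter 7):
print(optimal_schedule(aT=0.35, bT=0.035, aS=0.315, bS=0.105, w=0.7, K=22.64, N=35))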

To illustrate this idea, we graphically show how the relative magnitudes of the α/β ratios affect the shapes and intersections of linear-quadratic functions, in a two-fraction (R²) case. Looking at examples in R², we see that the curvature of the objective

function and the curvature of the constraint function affect the location of the optimum. In Figure 5.1(a) and Figure 5.1(b), we show the contours of the objective function (in red) and the contour of the constraint function at its constrained value (in blue).

We note and visually confirm that when the objective function (red) has greater curvature than the constraint function (i.e., is less flat), which happens when the ratio (α/β)T is smaller than the ratio (α/ωβ)S, the constraint and the highest-valued objective contour intersect on the axes. This intersection corresponds to the single-dose treatment cases, the intercepts of the sensitive-structure constraint with each of the axes, namely the points (d1, 0) and (0, d2); it is illustrated in Figure 5.1(a). On the other hand, when the objective function (red) has lower curvature than the constraint function (i.e., is flatter), which happens when the ratio (α/β)T is greater than the ratio (α/ωβ)S, the constraint and the highest-valued objective contour intersect along the 45-degree line, as illustrated in Figure 5.1(b).

While we have clear guidelines in the case of strict inequalities, the case of equality of ratios is more ambiguous. Keller and Davison (2008) tell us that if the ratio (α/ωβ)S is equal to the ratio (α/β)T, then the optimal treatment is a single large dose. Mizuta et al. (2012), on the other hand, never specifically address the case of equal ratios. While the assertion of Keller and Davison (2008) is not untrue, it ignores the peculiarities of the specific case of equality of ratios. When the ratios (α/ωβ)S and (α/β)T are equal, there exists a constant c such that αT = cαSω and βT = cβSω². This relationship allows us to rewrite the objective function as:

\[
f = \alpha_T \sum_{f=1}^{N} d_f + \beta_T \sum_{f=1}^{N} d_f^2
  = c\,\alpha_S\omega \sum_{f=1}^{N} d_f + c\,\beta_S\omega^2 \sum_{f=1}^{N} d_f^2
\]

Figure 5.1: Two possible optimal solutions in the single sensitive structure constraint case: (a) the single dose treatment, showing contours of the objective (red) and constraint function (blue); (b) the equal dose per fraction treatment, showing contours of the objective (red), constraint function (blue) and the 45-degree line (black dots).


The optimization problem then becomes

\[
\begin{aligned}
\text{maximize} \quad & c\left(\alpha_S\omega \sum_{f=1}^{F} d_f + \beta_S\omega^2 \sum_{f=1}^{F} d_f^2\right) \\
\text{subject to} \quad & \alpha_S\omega \sum_{f=1}^{F} d_f + \beta_S\omega^2 \sum_{f=1}^{F} d_f^2 = K \\
& d_f \ge 0 \quad \forall f = 1, \dots, F
\end{aligned}
\]

This case is degenerate and probably only occurs very rarely. In this case, all feasible (positive) dose schedules that make the constraint equal K yield exactly the same objective function value. Therefore, all such feasible boundary points are optima, not just the single-dose point claimed in Keller and Davison (2008).

5.2 Change of coordinates

To extract the main mathematical properties of the linear-quadratic function more easily, we change the coordinate system used to represent the dose-per-fraction, from a standard R^N coordinate system, where each coordinate represents a fractional dose, to a more compact equivalent, described here.

Previously, we expressed the doses for each fraction in a treatment with N fractions as

coordinates of a vector in RN . We called this vector ~d = [d1, . . . , dN ], where di is the

dose delivered to the tumor under treatment, in fraction i.

Any vector in RN can be expressed as a linear combination of a vector of ones ~1N =

[1 . . . 1]T ∈ RN and a vector ~vN ∈ RN that is orthogonal to it and which we have scaled

by its `2-norm, so it has unit length. Therefore, any arbitrary vector ~x ∈ RN can be

expressed as the linear combination ~x = ρ~1N+ε~vN , with the appropriate scalar coefficients


ρ, ε ∈ R.

In this coordinate system, we can identify any point in RN , with only three parameters:

the scalars ρ, ε and the vector ~vN . For example, in R3

\[
[3\;\;4\;\;5] = \rho\,[1\;\;1\;\;1] + \varepsilon\,[v_1\;\;v_2\;\;v_3]
             = 4\,[1\;\;1\;\;1] + \varepsilon\left[\tfrac{-1}{\sqrt{2}}\;\;0\;\;\tfrac{1}{\sqrt{2}}\right]
             = [4\;\;4\;\;4] + \sqrt{2}\left[\tfrac{-1}{\sqrt{2}}\;\;0\;\;\tfrac{1}{\sqrt{2}}\right]
\]

Note that the vector ~v₃ = [v₁ v₂ v₃] = [−1/√2, 0, 1/√2] is orthogonal to the vector of ones ~1₃ = [1 1 1] and is normalized (i.e., scaled by its ℓ2-norm). Its sum of coordinates is equal to zero and its ℓ2-norm (and sum of squared coordinates) is equal to one, by design.

In Figure 5.2, we graphically illustrate this change of coordinates with a simple example in R², where we have replaced the traditional unit vectors [1 0] and [0 1] with the vector [1 1] and an orthogonal vector [1/√2, −1/√2].

Using this new coordinate system, we will express the LQ function for a treatment over

N fractions, where df denotes the dose administered in fraction f . In its original form,

the LQ function was given by

\[
f = \alpha \sum_{f=1}^{N} d_f + \beta \sum_{f=1}^{N} d_f^2
\]

Using our new coordinate system, we can re-write our dose-per-fraction vector ~d as

~d = [d1, . . . , dN ]

~d = ρ~1N + ε~vN

Figure 5.2: Change of coordinates example in R²: a vector [x y] decomposed into ρ[1 1] + ε[1/√2 −1/√2].

The LQ function is re-written as
\[
\begin{aligned}
f &= \alpha \sum_{f=1}^{N} d_f + \beta \sum_{f=1}^{N} d_f^2 && (5.1)\\
  &= \alpha \sum_{i=1}^{N} (\rho + \varepsilon v_i) + \beta \sum_{i=1}^{N} (\rho + \varepsilon v_i)^2 && (5.2)\\
  &= \alpha \left( \sum_{i=1}^{N} \rho + \varepsilon \sum_{i=1}^{N} v_i \right) + \beta \sum_{i=1}^{N} \left( \rho^2 + 2\rho\varepsilon v_i + \varepsilon^2 v_i^2 \right) && (5.3)\\
  &= \alpha \left( \sum_{i=1}^{N} \rho + \varepsilon \sum_{i=1}^{N} v_i \right) + \beta \left( \sum_{i=1}^{N} \rho^2 + 2\rho\varepsilon \sum_{i=1}^{N} v_i + \varepsilon^2 \sum_{i=1}^{N} v_i^2 \right) && (5.4)\\
  &= \alpha (N\rho + \varepsilon \times 0) + \beta (N\rho^2 + 2\rho\varepsilon \times 0 + \varepsilon^2 \times 1) && (5.5)\\
  &= \alpha (N\rho) + \beta (N\rho^2 + \varepsilon^2) && (5.6)
\end{aligned}
\]

Recall that by construction \(\sum_{i=1}^{N} v_i = 0\) and \(\sum_{i=1}^{N} v_i^2 = 1\), which explains the transition from equation (5.4) to equation (5.5).
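As a quick numerical check of this identity (a sketch of ours; the dose vector and α, β values are illustrative), we can decompose a dose vector into its (ρ, ε) components and confirm that equation (5.6) reproduces the value of the original LQ function.

import numpy as np

def lq(d, alpha, beta):
    # Linear-quadratic effect of a dose-per-fraction vector d.
    return alpha * d.sum() + beta * (d ** 2).sum()

d = np.array([3.0, 4.0, 5.0])          # illustrative dose vector
alpha, beta = 0.35, 0.035              # illustrative radiosensitivity parameters
N = d.size

rho = d.mean()                         # component along the vector of ones
eps = np.linalg.norm(d - rho)          # length of the orthogonal residual eps*v

f_new = alpha * N * rho + beta * (N * rho ** 2 + eps ** 2)   # equation (5.6)
print(np.isclose(lq(d, alpha, beta), f_new))                 # -> True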

While this coordinate system transformation is very helpful in establishing mathematical

properties, as will be seen in the next section, it is of limited value in practice. Indeed,

not all points in RN can be expressed using the same orthogonal vector ~v. If we go back


to our earlier example of a vector in R3

\[
[3\;\;4\;\;5] = \rho\,[1\;\;1\;\;1] + \varepsilon\,[v_1\;\;v_2\;\;v_3]
             = 4\,[1\;\;1\;\;1] + \varepsilon\left[\tfrac{-1}{\sqrt{2}}\;\;0\;\;\tfrac{1}{\sqrt{2}}\right]
             = [4\;\;4\;\;4] + \sqrt{2}\left[\tfrac{-1}{\sqrt{2}}\;\;0\;\;\tfrac{1}{\sqrt{2}}\right]
\]

Our choice of ~v = [−1/√2, 0, 1/√2] made this coordinate change possible for the case of the vector [3 4 5]. However, using another vector orthogonal to the vector of ones, for example ~v∗ = (2/3)[1 −1/2 −1/2], we see it is not possible to express the vector [3 4 5] as a linear combination ρ~1₃ + ε~v∗.

Unfortunately, to solve our fractionation problem in practice, we must retain the problem

in the original RF coordinate system, for a treatment of F fractions.

5.3 Generalized properties of the optima

Note that in this section, for added clarity and simplicity, we will use αs, βs to denote the coefficients ωαs, ω²βs used in Chapter 3 as the sensitive structure constraints' ra-

results are based on the assumption that these α, β parameters in the objective and con-

straint functions remain constant throughout the treatment. This assumption will only

be relaxed in Section 7.6.

Earlier in this document, we saw that the KKT theorem may not apply in cases where we attempt to maximize a convex function, and specifically that it does not apply to our fractionation problem.

However, the KKT first-order conditions (KKT FOC) are necessary for any optimum,

in all cases, regardless of the shape of the objective (Bazaraa et al., 2006). In this


section, we exploit the KKT FOC to formally characterize the optima and extract their

properties, in the case of a problem constrained by an arbitrary number of sensitive

structure constraints.

In the case of a problem constrained by a single sensitive structure, Keller and Davison

(2008); Mizuta et al. (2012) provide an analytic solution for the optima of the fraction-

ation problem. The case of an arbitrary number of sensitive structure constraints was

not addressed. In this section, we attempt to formally characterize some properties of

the optima, in the more general cases of a problem constrained by an arbitrary number

of sensitive structure constraints. We also attempt to provide a different mathematical

perspective on the results of Keller and Davison (2008); Mizuta et al. (2012) through the

use of the KKT FOC.

We now prove that the optima always lie on the most restrictive constraint(s), that is, the constraint(s) that impose the largest restriction on the dose that can

be delivered to the tumor. This property applies not just to problems constrained by a

single sensitive-structure constraint, where it is entirely consistent with the formulations

and results presented in Keller and Davison (2008); Mizuta et al. (2012), but also to more

general problems constrained by an arbitrary number of sensitive-structure constraints.

Lemma 1. The optima for a fractionation problem with a linear-quadratic objective func-

tion with positive radiosensitivity parameters, constrained by one or more linear-quadratic

constraint functions, must lie on the most restrictive sensitive-structure constraint(s). In

other words, at the optimum at least one sensitive-structure constraint must be active.

Proof. (by contradiction)

Let ~̃d = [d̃1, . . . , d̃N ] be a maximum that lies in the interior of the feasible set. That


means that for the given set of constraints gj(~d) ≤ Kj, ∀j = 1, . . . , J we have

\[
g_j(\tilde{d}) = \sum_{f=1}^{N} \left( \alpha_j \tilde{d}_f + \beta_j \tilde{d}_f^2 \right) < K_j \quad \forall j = 1, \dots, J
\]

We can replace the first dose d̃1 with a larger dose d∗1 > d̃1, such that for the most

restrictive constraint, call it gr, we get:

\[
g_r(\tilde{d}, d_1^*) = \left( \alpha_r d_1^* + \beta_r {d_1^*}^2 \right) + \sum_{i=2}^{N} \left( \alpha_r \tilde{d}_i + \beta_r \tilde{d}_i^2 \right) = K_r > g_r(\tilde{d})
\]

If we go back to the objective, given that the parameters α, β and the variables d∗1, d̃i are always positive, it is clear that the objective function strictly increases:

\[
\left( \alpha_t d_1^* + \beta_t {d_1^*}^2 \right) + \sum_{i=2}^{N} \left( \alpha_t \tilde{d}_i + \beta_t \tilde{d}_i^2 \right) > \sum_{i=1}^{N} \left( \alpha_t \tilde{d}_i + \beta_t \tilde{d}_i^2 \right)
\]

This inequality contradicts our initial claim that interior point ~̃d was an optimum. There-

fore, the optimum must be on the boundary of the most restrictive sensitive-structure

constraint.
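To illustrate the lemma numerically (a sketch of ours with assumed parameter values, not part of the thesis code), one can take any strictly interior dose vector and scale it up until the most restrictive constraint becomes tight; the LQ objective can only increase along the way.

import numpy as np

def lq(d, a, b):
    return a * d.sum() + b * (d ** 2).sum()

aT, bT = 0.35, 0.035                              # assumed tumor parameters
constraints = [(0.2205, 0.05145, 22.64),          # assumed (alpha_j, beta_j, K_j) triples,
               (0.25, 0.06, 20.0)]                # already scaled by omega and omega^2

d = np.full(35, 1.0)                              # a strictly interior schedule (1 Gy x 35)
assert all(lq(d, a, b) < K for a, b, K in constraints)

# Bisection on a scale factor s until the most restrictive constraint is tight for s*d.
lo, hi = 1.0, 10.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if all(lq(mid * d, a, b) <= K for a, b, K in constraints):
        lo = mid
    else:
        hi = mid

print(lq(d, aT, bT), "<", lq(lo * d, aT, bT))     # objective is larger on the boundary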

As shown in the single sensitive structure case in Section 5.1, optima occur at either the EQDF point, the intersection of the sensitive tissue constraint with the non-negativity constraints (which occurs at the single-dose points), or any positive point along the

boundary of the sensitive-structure constraint. The actual location of the optima de-

pends on the relative magnitude of the radiosensitivity parameters of the objective and

constraint LQ functions, as described by Keller and Davison (2008); Mizuta et al. (2012).

In the case of more than one sensitive structure, these points remain candidates for an optimum; the only additional candidates are the intersections of two or more sensitive structure constraints, as we will demonstrate later in this chapter.

In Section 4.1, we explained why the KKT theorem, and its sufficient condition specifically, does not apply to our problem. However, the first-order conditions (FOC)

are always necessary for the existence of an optimum. We now examine these conditions,

in the context of the new coordinate system described above, to formally describe some

properties of the optima.

Recall our N -fraction fractionation problem, expressed using the new coordinate system:

\[
\begin{aligned}
\underset{\rho,\,\varepsilon}{\text{minimize}} \quad & f(\rho, \varepsilon) = -\alpha_t (N\rho) - \beta_t (N\rho^2 + \varepsilon^2) \\
\text{subject to} \quad & g_j(\rho, \varepsilon) = \alpha_j (N\rho) + \beta_j (N\rho^2 + \varepsilon^2) - K_j \le 0 \quad \forall j = 1, \dots, S \\
& h_i(\rho, \varepsilon) = -\rho - \varepsilon v_i \le 0 \quad \forall i = 1, \dots, N
\end{aligned}
\]

The sensitive tissue constraints for each of the j = 1, . . . , S sensitive-structures are de-

noted by gj and the dose non-negativity constraints for each of the i = 1, . . . , N fractions

are denoted by hi. Also, we use ‘I’ to denote the set of all active constraints.

Theorem 1. The optimum for a fractionation problem with a linear-quadratic objective

function and sensitive structure constraint function of fractional doses occurs in one of

three cases:

(i) at the equal dose per fraction point given by the most restrictive sensitive structure

constraint

(ii) at the intersection of at least two active constraints

(iii) at any feasible boundary point along the most restrictive constraint, in the degenerate

case where the objective and the most restrictive sensitive structure constraint are

multiples of each other.

Proof. Before proving this claim, we note that this problem needs at least one sensitive-


tissue constraint, otherwise it is unbounded and the doses can be infinitely high. Addi-

tionally, because the problem is defined for a non-empty closed bounded feasible set, over

which the objective function is defined and continuous, an optimum exists. Finally, as

shown in Lemma 1, at least one sensitive-tissue constraint will be tight at the optimum.

We also need to recall that we only deal with feasible positive doses, as feasibility must

always be met, at any optimum.

Recall that for our fractionation problem, at an optimum, the following must hold (by

KKT FOC and feasibility conditions).

\[
\begin{aligned}
& \nabla L = \nabla f + \sum_{j \in I} \mu_j \nabla g_j + \sum_{i \in I} \mu_i \nabla h_i = \vec{0} \\
& g_j,\, h_i \le 0 \quad \forall i, j \\
& \mu_j,\, \mu_i \ge 0 \quad \forall i, j \in I \\
& \mu_j,\, \mu_i = 0 \ \text{ if } g_j,\, h_i < 0 \quad \forall i, j \notin I
\end{aligned}
\]

Here the functions gj denote the sensitive structure constraints, the functions hi the dose

non-negativity constraints and I the set of all active constraints.

Now, let the optimum occur at a point where only one sensitive-structure constraint, namely gj∗, is active and all other gj, hi < 0; in other words, all doses are strictly positive (di > 0) and no other constraint is active.

At such an optimum, the KKT FOC gives us the following equality. (Recall that all but

one µ = 0, namely µj∗ ≥ 0.)

∇L = ∇f + µj∗∇gj∗ = ~0

From the KKT FOC, we get two possible cases: µj∗ = 0 or µj∗ > 0.

Case 1: µj∗ = 0 (i.e., ∇f = 0)
\[
\nabla L = \begin{bmatrix} -\alpha_t N - 2\beta_t N\rho \\ -2\beta_t\varepsilon \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]
Recall that αt, βt, N > 0 and ρ > 0 for any non-zero treatment, so the first component, −αtN − 2βtNρ, is strictly negative and this equality can never hold. Therefore, we can conclude Case 1 will never occur.

Case 2: µj∗ > 0
\[
\nabla L = \begin{bmatrix} -\alpha_t N - 2\beta_t N\rho + \mu_{j^*}(\alpha_{j^*} N + 2\beta_{j^*} N\rho) \\ -2\beta_t\varepsilon + 2\mu_{j^*}\beta_{j^*}\varepsilon \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]
The second component means that βtε = µj∗βj∗ε or, in the case of an arbitrary ε ≠ 0, that βt = µj∗βj∗. From Case 2, we see only two possibilities:

A) If ε = 0 (i.e., the only candidate for an optimum is the EQDF point given by the j∗-th constraint):
   (a) αt + 2βtρ = µj∗αj∗ + 2µj∗βj∗ρ, for some µj∗, ρ pair
   (b) βtε = µj∗βj∗ε, which is always true (i.e., 0 = 0)

B) If ε ≠ 0 (i.e., the only candidate for an optimum is not the EQDF point):
   (a) αt + 2βtρ = µj∗αj∗ + 2µj∗βj∗ρ = µj∗(αj∗ + 2βj∗ρ)
   (b) βtε = µj∗βj∗ε ⇔ βt = µj∗βj∗

In the case where ε ≠ 0, when the EQDF point is not the sole candidate for an optimum, we see that αt + 2µj∗βj∗ρ = µj∗αj∗ + 2µj∗βj∗ρ ⇒ αt = µj∗αj∗ and of course βt = µj∗βj∗. The constraint is a multiple of the objective, so all (positive dose) points on the

boundary satisfy the FOC, yield the same objective function value and are candidates

for an optimum.

The only possible candidates for an optimum, when only one sensitive structure con-

straint is tight, are the points where ε = 0, which are equal-dose-per-fraction points, or

all arbitrary points on the boundary, in the (degenerate) case where the objective and

constraint function are multiples of each other. This proves the assertions in case (i) and

in case (iii) of the theorem statement.

For any other point to be a candidate for an optimum we need to introduce more active

constraints, which will intersect, by definition. These intersections then become the

only feasible points and are therefore the only possible candidates for an optimum. For

example, if two constraints were active, at the optimum, we would get the equality:

∇L = ∇f + µj∗∇gj∗ + µj̃∇gj̃ = ~0

In this scenario the optimum can only occur at the intersection of the active constraints

gj∗ and gj̃, by definition, since both are active. This proves the assertion in case (ii) of

the theorem statement.

In summary, it has been shown that with only one active sensitive-tissue constraint

a boundary equal-dose-per-fraction treatment or any arbitrary point on the boundary

are the only possible candidates for optima. Any other specific optimum candidate

point would require at least one more active constraint. Therefore we can claim that an

optimum, in the case where more than one constraint is active, must occur at an intersection

of these constraints, by definition.

In the work of Keller and Davison (2008) and Mizuta et al. (2012), which deal with the

special case of a single sensitive structure constraint, the optima are shown to occur only


in the equal-dose-per-fraction case or in the single-dose case. The equal-dose-per-fraction

optimum corresponds to a situation where ε = 0 and is shown here in Case 2. The single-

dose optimum corresponds to the intersection of multiple constraints, the sensitive tissue

constraint and all but one of the non-negativity constraints. Our conclusions are therefore

entirely consistent with the conclusions reached by these authors.

In the case of multiple sensitive tissue constraints, the intersection of these constraints

offers an additional set of potential candidates for optima. Future work may involve

solving the system of constraints to find all constraint intersection points and evaluate

the objective at each of these points, in order to solve this problem exactly.

Chapter 6

Simulated annealing

We have shown that the fractionation problem cannot be solved using conventional math-

ematical programming techniques and that we must resort to a global numerical search technique to obtain our solution. Unfortunately, these techniques are com-

putationally costly and typically do not guarantee optimality of the solution. In our

case, it is important for the reader to understand that we are not looking for a provable

global optimum. Our goals are a lot more modest. Currently, clinical practice conven-

tion splits up the total prescribed dose into equal doses in each fraction. Our goal is

to verify whether or not a different fractionation scheme may yield better results within

the framework imposed by the linear-quadratic model, described in Fowler (1992); Bren-

ner et al. (1998); Brenner (2008); Kirkpatrick et al. (2008) or, in the alternate case, to

validate current conventional practices. We look for improved treatment plans that do

not differ drastically from current clinical norms and from the solutions proposed in the

literature, which guides our numerical search and allows us to validate our results. We

use the known optima in the single-dose and “equal-dose-per-fraction” cases to test our

computational technique and to empirically validate the theorem presented in Keller and

Davison (2008); Mizuta et al. (2012), for the case of a single sensitive-structure constraint,


where we have the benefit of existing analytical solutions.

In Keller and Davison (2008) the authors solve the fractionation problem using dynamic pro-

gramming. We investigated three alternate solution techniques: the application of the

KKT theorem, non-linear projected gradient, and an array of global optimization search

algorithms. In the end, we settled on simulated annealing as our numerical optimization

technique of choice. A thorough comparison of heuristic techniques is beyond the scope

of this document, but we justify our final choice of simulated annealing on the basis that

it is widely accepted and used in the medical physics literature in general, for example

Webb (1989); Morrill et al. (1991); Mageras and Mohan (1993); Aleman et al. (2008); Cao

et al. (2012), and notably in a different formulation of the dose fractionation problem (Yang and Xing, 2005); on the fact that simulated annealing is known to yield a global optimum, given a long enough cooling period (Brownlee, 2011); and on the basis of the

“no free lunch theorem” (Wolpert and Macready, 1997; Brownlee, 2011).

To account for the sensitive structure constraints, we transformed our (continuous con-

strained) problem into a continuous unconstrained one, by substituting a penalized ob-

jective function for the original objective. Our penalty formulation was also designed to

accommodate an arbitrary number of sensitive-structure constraints.

We faced a number of options with this technique and tailored the standard simulated annealing algorithm before applying it to the particulars of the frac-

tionation problem. Following the recommendation in Wah and Wang (1999), we relaxed

our problem stated in Chapter 3 by converting our equality constraint, in the single

sensitive structure case, into an inequality constraint (≤), which was integrated into the

penalized objective function discussed later in this chapter. This relaxation later allowed us to

overcome possible inconsistencies in the constraints (and an empty feasible set), in the

case of multiple constraints. Also, according to Wah and Wang (1999) this relaxation

enhances the search and increases the likelihood of attaining a global optimum.


We also devised two distinct search strategies to explore the feasible set. In one case,

we explore only the boundary of the feasible set, where the optimum is known to lie (as

shown in Section 5.3) and in another case, we explore the interior of the feasible set, as

well as its boundary. Although we know the optima must lie on the boundary, comparing

these two search strategies provides further validation of our numerical technique.

The penalized functions f_p we seek to maximize through simulated annealing, in each of the two cases, are given by
\[
f_p^r(\vec{d}) = f(\vec{d}) - P \times \max\left[0,\, g_j(\vec{d}) - K\right] \quad \text{(regular simulated annealing algorithm)} \tag{6.1}
\]
\[
f_p^b(\vec{d}) = f(\vec{d}) - P \times \left[g_j(\vec{d}) - K\right]^2 \quad \text{(boundary algorithm)} \tag{6.2}
\]

where f(~d) is the objective (radiation effect) function we are maximizing, gj(~d) is the j-th healthy structure constraint function, and K is the right-hand side of the constraint gj(~d) ≤ K. In the first case (regular algorithm), infeasible solutions, i.e., solutions for which gj(~d) − K > 0, are penalized by a large penalty coefficient P ≫ 0. In the second case (boundary algorithm), all solutions not lying on the boundary are penalized by the coefficient P (which is why we take the square of all deviations from the boundary). This algorithm steers the search of the feasible set toward the boundary. Naturally, we also exclude any dose vector ~d with negative coordinates from our search, by rejecting any move to a point with negative coordinates.

In the case of more than one sensitive-structure constraint, we pick the largest devia-

tion, since we are looking to minimize deviation from the boundary formed by the most

restrictive set of constraints:

\[
f_p^r(\vec{d}) = f(\vec{d}) - P \times \max_j\left\{\max\left[0,\, g_j(\vec{d}) - K_j\right]\right\} \tag{6.3}
\]
\[
f_p^b(\vec{d}) = f(\vec{d}) - P \times \max_j\left\{\left[g_j(\vec{d}) - K_j\right]^2\right\} \tag{6.4}
\]

where f_p^r in equation (6.3) is the regular-search penalized objective function and f_p^b in equation (6.4) is the boundary-search penalized objective function. The mechanics of this penalty scheme are illustrated in Figure 6.1.

Figure 6.1: Penalty scheme on the interior (g(x) − K < 0), boundary (g(x) − K = 0) and exterior (g(x) − K > 0) of the feasible set

The steps of the simulated annealing algorithm are described in Algorithm 1, with the following five user-specified parameters:

1. S0: starting point

2. T0: starting temperature

3. Ck: cooling constant

4. N: total number of iterations (stopping criterion)

5. r: search radius

and the randomly generated (U(-1,1)) step direction vector, denoted by ~s. The new point

is the current point plus some random step direction, multiplied by a fixed radius r. This

random uniform step allows the exploration of increasing and decreasing coordinates

(fraction doses).


Algorithm 1 Function maximization via simulated annealing (Brownlee, 2011)
 1: i ← 0
 2: T ← T0
 3: x̄ ← S0
 4: x(i) ← x̄
 5: best ← x(i)
 6: while i ≤ N do
 7:     i ← i + 1
 8:     ~s ← rand(U(−1, 1))
 9:     x̄ ← x(i−1) + r × ~s
10:     if fp(x̄) ≥ fp(x(i−1)) then
11:         best ← x̄
12:     end if
13:     Pr ← exp([fp(x̄) − fp(x(i−1))] / T)
14:     if Pr ≥ rand(U(0, 1)) then
15:         x(i) ← x̄
16:     else
17:         x(i) ← x(i−1)
18:     end if
19:     T ← T × Ck
20: end while
21: return best
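For concreteness, the Python sketch below (ours; the thesis implementation was written in Matlab and all parameter values here are illustrative) implements Algorithm 1 together with the regular penalized objective of equations (6.1)/(6.3) for a single sensitive-structure constraint.

import numpy as np

rng = np.random.default_rng(0)

aT, bT = 0.35, 0.035                              # illustrative tumor LQ parameters
aS, bS, K = 0.2205, 0.05145, 22.64                # constraint, already scaled by omega, omega^2
P = 100.0                                         # penalty coefficient

def f_obj(d):                                     # tumor radiation effect to maximize
    return aT * d.sum() + bT * (d ** 2).sum()

def f_pen(d):                                     # regular penalized objective
    g = aS * d.sum() + bS * (d ** 2).sum()
    return f_obj(d) - P * max(0.0, g - K)

def simulated_annealing(n_frac=35, T0=100.0, Ck=0.9999, N=50_000, r=0.25):
    x = np.full(n_frac, 1.0)                      # starting point S0 (feasible, interior)
    best, T = x.copy(), T0
    for _ in range(N):
        cand = x + r * rng.uniform(-1.0, 1.0, n_frac)   # random uniform step
        if np.any(cand < 0):                      # reject moves with negative doses
            T *= Ck
            continue
        delta = f_pen(cand) - f_pen(x)
        if delta >= 0:                            # improving move: record and accept
            best, x = cand.copy(), cand
        elif np.exp(delta / T) >= rng.uniform():  # occasionally accept a worse move
            x = cand
        T *= Ck
    return best

doses = np.sort(simulated_annealing())[::-1]      # report largest doses first
print(doses[:5], f_obj(doses))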


In the following chapter, we will assess the quality of the solutions yielded by our simulated annealing routine variations. We will compare the performance of the boundary

and interior search algorithms and the effect of the different parameters on the final

solution.

In order to efficiently compare solutions obtained by our simulated annealing routine,

in each of our trials we sort the solution doses, in descending order. According to this

convention, larger doses are delivered first. This re-arrangement is in line with the spirit

of Keller and Davison (2008), who apply this convention to their results as well. Also, this

re-arrangement does not affect the value of the linear-quadratic functions, as discussed

in Section 2.1. Re-arranging doses in this way allows for a clearer comparison of results

yielded by the different variations of our algorithm. Indeed, two treatments may have

the same tumor/sensitive-structure cell kill/cell survival characteristics but have different

ordering of the doses, since the doses in the linear-quadratic model do not have time-

dependency. A large dose followed by a small dose has the same effect as that same small dose followed by the large dose, as described in Chapter 3.

Finally, we classify a solution as an equal dose per fraction treatment when the solution returned by the simulated annealing has approximately equal doses in each fraction, that is, when the doses all lie within a ±50% band of the maximum-to-minimum ratio. When this ratio is of the order of two or more, we classify our solution as a single dose treatment: we round up the largest dose to the nearest integer and set all other fractional doses to zero, while ensuring the treatment is still feasible.
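A minimal sketch of this classification rule (our reading of it; the 1.5 and 2.0 cut-offs are our interpretation of the ±50% band and the "order of two" criterion above):

import numpy as np

def classify_schedule(doses, eqdf_ratio=1.5, single_ratio=2.0):
    # Label a simulated annealing solution as 'EQDF', 'single dose' or 'other'.
    d = np.sort(np.asarray(doses, dtype=float))[::-1]   # largest doses first
    ratio = d.max() / max(d.min(), 1e-12)
    if ratio <= eqdf_ratio:
        return "EQDF"
    if ratio >= single_ratio:
        return "single dose"
    return "other"

print(classify_schedule([2.01, 1.98, 2.00, 1.99]))   # -> EQDF
print(classify_schedule([18.9, 0.1, 0.05, 0.0]))     # -> single dose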

Chapter 7

Numerical results

Two rounds of tests were conducted. The first round was aimed at tuning the parameters

of the simulated annealing algorithm. The second round was conducted once the most

suitable simulated annealing parameters had been found, to assess solution quality, for

hypothetical patient cases.

In all our tests, we compared the results obtained by our simulated annealing algorithm,

which we denote as z̄, against the solutions taken from the conclusions of Keller and

Davison (2008); Mizuta et al. (2012), in cases where the problem involved only a single

sensitive-structure constraint, for which these authors provide analytic solutions. Where

the problem involved more than one single sensitive-structure constraint, we compared

our numerical results against the highest feasible single dose treatment and the highest

feasible equal dose treatment, which can be found by solving for the roots of the linear-

quadratic constraints. Indeed, when it is known the solution is either an equal dose or

single treatment, the sensitive structure constraint becomes a polynomial in only one

unknown variable.


7.1 Benchmark cases

Our benchmark EQDF test case was modeled after a typical prostate-tumor case, de-

livered in 35 fractions, shown in Keller and Davison (2008). The exact details of the

problem we tried to solve are given in Table 7.1.

Table 7.1: EQDF treatment benchmark test case, αT/βT > αS/(ωβS)

Parameter                     Value
αT                            0.35
βT                            0.035
αS                            0.315
βS                            0.105
ω                             0.7
K                             22.64
Analytic dose per fraction    2 Gy

For our single-dose treatment test case, we used the same example with the coefficients

in the objective function set to αT = 0.35, βT = 0.35, and then solved for the roots of

the quadratic equation, given by the constraint. The treatment parameters are shown in

Table 7.2.

The analytic optimum for the benchmark EQDF test case was an equal-dose treatment

of 2 Gy per fraction, while it was a single-dose treatment of 18.94 Gy, for the single-dose

treatment test case benchmark.

Table 7.2: Single-dose treatment benchmark test case, αT/βT < αS/(ωβS)

Parameter                     Value
αT                            0.35
βT                            0.35
αS                            0.315
βS                            0.105
ω                             0.7
K                             22.64
Analytic single dose          18.94 Gy

7.2 Comparison of interior and boundary search algorithms

We compared the mean performance of two samples of 36 runs of each of our two search algorithms, the interior and boundary search algorithms described in equations (6.3) and (6.4), against the two known analytical optima, in the case of problems constrained by a single sensitive structure constraint. The results are shown in Table 7.3. We also show the dose per fraction solution of one instance of each search algorithm in Figure 7.1. We note that both search algorithms perform similarly well. In the case where the optimum is known to be an equal dose per fraction treatment, our results show both search techniques perform very well (< 0.1% deviation from the known optimum for both techniques). In the case where the known optimum is a single dose, the boundary search

performs slightly better (roughly 10% deviation from the known optimum vs. 15%), as seen in Figure 7.1 and Table 7.3.

Figure 7.1: Dose per fraction versus known optimal dose, regular and boundary algorithms (single sensitive structure constraint): (a) known analytic optimum is an EQDF treatment of 2 Gy per fraction; (b) known analytic optimum is a single-dose treatment of 18.94 Gy.

Table 7.3: Mean percent deviation from analytic solution of objective function yielded by regular and boundary search algorithms

Case                                              δ
Analytic solution is an EQDF treatment
  Regular                                         -0.057%
  Boundary                                        -0.088%
Analytic solution is a single-dose treatment
  Regular                                         -15.304%
  Boundary                                        -10.101%

7.3 Parameter selection and solution quality

As shown in the pseudo-code (Algorithm 1) the simulated annealing algorithm depends

on a set of exogenous parameters: the starting point, the initial temperature T0, the

penalty factor in the penalized objective P , the size r of the search radius in the move

from point to point, as well as the maximum number of search iterations N . No formal

theory guides the selection of specific parameters; in practice, they are picked on a trial-and-error basis. We examined the effect of variations in the algorithm's parameters, to

determine their impact on solution quality. We defined solution quality as the percentage

difference between the objective function evaluated at our algorithm’s best solution (z̄)

and at the analytic optimum provided by the conclusions of Keller and Davison (2008);

Mizuta et al. (2012) (z∗). We examined these effects on the benchmark cases discussed

earlier in this chapter.

To assess solution quality we modified one key parameter at a time, while keeping all

other parameter combinations constant and recorded the quantity δ = 100×(|z̄−z∗|)/z∗,

over all instances. We repeated this process for 36 different values of each parameter,

grouped the resulting differences (δ) by specific parameter levels and averaged the results


for each parameter level. For all tests, with the exception of the test for starting points, we

seeded Matlab’s random number generator, before each run of the algorithm, to remove

any possible effect from the state of the random number generation routine. The results

are reported in Figures 7.2(a) to 7.5(b).
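A small sketch of the bookkeeping behind these figures (ours; the records listed are hypothetical, not the actual experimental data): compute δ for each run and average it within each parameter level.

import numpy as np

def delta(z_bar, z_star):
    # Percent deviation of an SA solution value from the analytic optimum.
    return 100.0 * abs(z_bar - z_star) / z_star

def mean_delta_by_level(runs):
    # Group (parameter level, SA objective value, analytic optimum) records
    # and average delta within each level.
    by_level = {}
    for level, z_bar, z_star in runs:
        by_level.setdefault(level, []).append(delta(z_bar, z_star))
    return {level: float(np.mean(vals)) for level, vals in by_level.items()}

records = [(10, 29.10, 29.40), (10, 29.20, 29.40),    # hypothetical runs at T0 = 10
           (100, 29.38, 29.40), (100, 29.35, 29.40)]  # hypothetical runs at T0 = 100
print(mean_delta_by_level(records))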

Figure 7.2: Effect of initial temperature (T0) on mean percent divergence from known optimum (δ): (a) EQDF case; (b) single dose treatment case.

We set the initial temperature to T0 = 100, since it offered a drop-off in the divergence

from the known optimum, in the single dose treatment case, while it had no significant

effect in the EQDF case, as seen in Figure 7.2.

Figure 7.3: Effect of search radius (r) on mean percent divergence from known optimum (δ): (a) EQDF case; (b) single dose treatment case.

We set the search radius r = 0.25 as it offered the best results for the EQDF case and

results comparable to the best outcome in the single dose treatment case, as shown in

Figure 7.3.

We set our penalty coefficient P = 100, since it offers the best results in the single dose

treatment case, while its variation seems to have virtually no effect on solution accuracy,

in the EQDF case, as seen in Figure 7.4.

Figure 7.4: Effect of penalty coefficient (P) on mean percent divergence from known optimum (δ): (a) EQDF case; (b) single dose treatment case.

The number of iterations (N) is the stopping criterion for the simulated annealing algorithm: once the algorithm has performed N searches, it stops and returns its best solution. We chose a maximum number of iterations N = 500,000 because it offered an error rate of roughly 5% in the single-dose case and ≤ 1% in the EQDF case, while keeping runtimes reasonable (approximately one minute), as can be seen in Figure 7.5.

Finally, in order to measure the effect of starting points on the quality of the solution

obtained, solutions obtained from starting points that were randomly generated were

Figure 7.5: Effect of number of iterations (N) on mean percent divergence from known optimum (δ): (a) EQDF case; (b) single dose treatment case.

compared to “warm starts” in which the starting point is the end point of a previous

run of the algorithm. We generated a sample of 36 results with a random starting point

and a sample of 36 results with a warm start and then performed a two-sample t-test

of the null hypothesis that the two samples have equal mean objective function values, regardless of starting point. We conducted this experiment on both benchmark cases. If the starting point affects the solution, the mean objective function values obtained in each sample will be statistically different and the null hypothesis will be rejected.


Table 7.4: Two-sample t-test, null hypothesis of equal means, regardless of starting point

Statistic EQDF case Single dose case

Mean rand start 29.373 111.964

Stdev rand start 0.181 2.874

Mean warm start 29.365 118.859

Stdev warm start 0.142 2.988

t statistic 0.083 -16.848

degrees of freedom 67 68

p value 0.23 0


Unlike the other tests for parameter sensitivity, we employed statistical significance test-

ing, instead of simple numerical comparison of the effect of parameter variations on the

solution, because our data points were randomly generated and for greater sampling

variety. For a treatment with N fractions, our starting point is a variable in R^N; to adequately observe the effect of variations in starting point on solution quality, one would have to test for increases and decreases in all N dimensions, a combinatorially large task. Instead, we compared result samples obtained from random and warm starting points for an objective and constraint function combination whose known analytically derived optimum is an equal dose per fraction treatment, and for one where the known optimum is a single-dose treatment. The results of this test are reported in Table 7.4. For the mechanics of the t-test, we followed the procedure outlined in Devore (2000).
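For readers who prefer a ready-made routine over the hand computation in Devore (2000), an equivalent check can be run in Python with SciPy (a sketch of ours with made-up sample values; the thesis work used Matlab):

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
rand_start = rng.normal(loc=111.96, scale=2.87, size=36)   # hypothetical random-start sample
warm_start = rng.normal(loc=118.86, scale=2.99, size=36)   # hypothetical warm-start sample

# Two-sample t-test of equal means (Welch's version, no equal-variance assumption).
t_stat, p_value = stats.ttest_ind(rand_start, warm_start, equal_var=False)
print(t_stat, p_value)   # a small p-value rejects the null hypothesis of equal means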

We see that in the case of the single-dose treatment the null hypothesis that both means

are indistinguishable is rejected. This rejection of the null hypothesis means the warm

start appears to yield better results (higher objective function value) in the single-dose


case. Therefore all our further runs were done with warm starts. That means we ran our

algorithm once, using a random start, recorded the end point and used that end point

as a starting point for another run. The results of the second run were the ones that were

retained. Although the warm start procedure increased runtimes, they remained very

reasonable (less than 2 minutes).

We may conclude that comparing various algorithm parameter settings allowed us to

fine-tune our algorithm to obtain the best possible solutions. Our solutions also seemed

to numerically validate the conclusions presented in Keller and Davison (2008); Mizuta

et al. (2012).

7.4 Single tumor, single sensitive structure case

The first hypothetical patient cases are cases with only one sensitive structure constraint.

In these cases we have the benefit of testing our solutions against the known optima of

Keller and Davison (2008); Mizuta et al. (2012). We compare our solution against the

known analytic optima, in both benchmark test cases, described at the beginning of this

chapter.

In Figure 7.6 and Table 7.5 we see that our simulated annealing is able to virtually

replicate the benchmark case and yields a solution that is only 0.03 % worse than the

known optimum, in the case where the optimum is known to be an EQDF treatment of

2 Gy per fraction. In the case where the optimum is known to be a single dose of 18.94

Gy, our solution yields an objective function value that is roughly 10% worse than the

optimal.

Figure 7.6: Comparison of simulated annealing dose per fraction and known optimal dose: (a) EQDF case; (b) single dose case.

Table 7.5: Solution accuracy measured with respect to known optimum

Solution                    Mean value of objective function (min; max; stdev)   Solutions feasible?
EQDF case
  Analytic solution         29.40
  SA algorithm's solution   29.39                                                yes
  Percent difference        -1.24% (-0.03; -4; 2)
Single dose case
  Analytic solution         132.18
  SA algorithm's solution   118.86                                               yes
  Percent difference        -4.87% (-0.86; -15.3; 6)

7.5 Multiple sensitive structures case

We examined examples with multiple constraints (two to fourteen), to re-create a realistic

treatment situation in which more than one sensitive structure may be adversely affected

by the radiation dose. To come up with these test scenarios we began with a problem with a single constraint and added constraints one by one, recording our results after each addition of a new constraint to the problem. As each constraint was included

we recorded the parameters:

• the total number of constraints in the problem

• the best (highest) objective function value returned by the simulated annealing

algorithm (z∗)

• the best (highest) feasible objective function value for a single-dose treatment, given

the new constraint set (BFSD)


• the best (highest) feasible objective function value for an equal-dose treatment, given

the new constraint set (BFEQDF)

• the number of active constraints

• the parameters α, β, ω of the last constraint added to the problem

• the right-hand side value of last constraint added to the problem (Kj)

Unfortunately, in these general cases of multiple sensitive structure constraints, unlike

the case of a single constraint where the analytical optima were given by Keller and

Davison (2008); Mizuta et al. (2012), we do not benefit from a known benchmark for the

optimum. We compare our results to the best feasible single-dose and equal dose per

fraction treatments allowed by our set of constraints, as a “reality check”. Our initial

results suggest, for the examples we encountered, that only the most restrictive constraint

plays a role in the optimum and that the problems we examined can be simplified into

the form of a problem with only one sensitive-tissue constraint, for which results from

Keller and Davison (2008); Mizuta et al. (2012) may be applied.

While the most restrictive constraint may be difficult to identify, prior to running our

simulated annealing algorithm on the entire problem, it is easy to identify it once the

algorithm has been run. The most restrictive constraint(s) will be tight at the conclusion

of the simulated annealing run, since the optimum occurs on the boundary of the feasible

set and this boundary in our problem is defined by the most restrictive constraint(s).

Such a case is shown in Table 7.6, where we see that the problem is entirely defined by the second constraint, which makes all 13 other constraints irrelevant.

We recomputed results for this same problem but shuffled the order in which the con-

straints were introduced into the problem. We notice the exact same behavior, except

that the dominant, most restrictive, constraint is now the seventh constraint. We note

Table 7.6: Numerical results, multiple constraint EQDF cases, second constraint dominates. BFSD denotes the value of the objective at the best feasible single-dose solution given the set of constraints and BFEQDF denotes the value of the objective at the best feasible equal-dose solution given the same set of constraints.

Tot constr z∗ BFSD BFEQDF Active αj βj ωj Kj

1 29.39 19.18 29.40 1 0.32 0.11 0.70 22.64

2 22.59 12.39 22.61 1 0.37 0.18 0.71 22.64

3 22.59 12.39 22.61 1 0.32 0.03 0.65 22.59

4 22.59 12.39 22.61 1 0.38 0.10 0.65 22.64

5 22.59 12.39 22.61 1 0.37 0.17 0.70 22.64

6 22.59 12.39 22.61 1 0.28 0.11 0.71 22.64

7 22.59 12.39 22.61 1 0.36 0.14 0.74 22.70

8 22.59 12.39 22.61 1 0.29 0.05 0.70 22.72

9 22.59 12.39 22.61 1 0.32 0.04 0.75 22.68

10 22.59 12.39 22.61 1 0.39 0.10 0.72 22.61

11 22.59 12.39 22.61 1 0.23 0.04 0.71 22.61

12 22.59 12.39 22.61 1 0.39 0.18 0.65 22.64

13 22.59 12.39 22.61 1 0.34 0.16 0.74 22.58

14 22.59 12.39 22.61 1 0.30 0.05 0.70 22.71


the simulated annealing solution does not change after the introduction of this dominant

constraint. The results are reported in Table 7.7.

We also observed similar constraint dominance in the case where our simulated annealing

returned a single-dose treatment as optimal. In this specific case, we also noted how

the introduction of additional constraints may change the solution. Here the dominant

constraint for all cases with two to eleven constraints was the second constraint, but with

the introduction of the twelfth constraint, the feasible set was restricted further and the

analytical and numerical optimal solutions decreased as well. With the introduction of

the second constraint the problem solution became a single-dose treatment. The problem

became dominated by the second constraint, which made others irrelevant. The numerical

results are reported in Table 7.8.

Finally, we also examined the case where multiple constraints intersected on the bound-

ary. In this case we noted that only one of the dominant constraints was necessary to

define the problem. In the case we encountered, adding extra intersecting constraints

did not affect the solution. Results are reported in Table 7.9. This case is especially

interesting because the second, fifth and twelfth constraints all have the same equal-dose

per fraction root (but not single dose roots) and the simulated annealing results suggest

the optimum is an equal dose treatment.

7.6 Time-varying radiosensitivity parameters

As discussed in Chapter 3, it is suspected that tumor-tissue response to radiation evolves

over the course of treatment. We experimented numerically with cases in which the tumor

becomes less or more sensitive to radiation over the course of a treatment. Naturally,

unlike what we did with all our other numerical examples, we did not sort our doses, in

Table 7.7: Numerical results, multiple constraint EQDF cases, seventh constraint dominates. BFSD denotes the value of the objective at the best feasible single-dose solution given the set of constraints and BFEQDF denotes the value of the objective at the best feasible equal-dose solution given the same set of constraints.

Tot constr z∗ BFSD BFEQDF Active αj βj ωj Kj

1 29.39 19.18 29.40 1 0.32 0.11 0.70 22.64

2 23.38 13.78 23.41 1 0.36 0.14 0.74 22.70

3 23.38 13.78 23.41 1 0.32 0.03 0.65 22.59

4 23.38 13.78 23.41 1 0.38 0.10 0.65 22.64

5 23.38 13.78 23.41 1 0.37 0.17 0.70 22.64

6 23.38 13.78 23.41 1 0.28 0.11 0.71 22.64

7 22.59 12.39 22.61 1 0.37 0.18 0.71 22.64

8 22.59 12.39 22.61 1 0.29 0.05 0.70 22.72

9 22.59 12.39 22.61 1 0.32 0.04 0.75 22.68

10 22.59 12.39 22.61 1 0.39 0.10 0.72 22.61

11 22.59 12.39 22.61 1 0.23 0.04 0.71 22.61

12 22.59 12.39 22.61 1 0.39 0.18 0.65 22.64

13 22.59 12.39 22.61 1 0.34 0.16 0.74 22.58

14 22.59 12.39 22.61 1 0.30 0.05 0.70 22.71

Table 7.8: Numerical results, shifting dominant constraint, single-dose cases, second then twelfth constraint dominate respectively. BFSD denotes the value of the objective at the best feasible single-dose solution given the set of constraints and BFEQDF denotes the value of the objective at the best feasible equal-dose solution given the same set of constraints.

Tot constr z∗ BFSD BFEQDF Active αj βj ωj Kj

1 131.09 132.23 73.51 1 0.32 0.11 0.70 22.64

2 83.93 84.78 55.64 1 0.35 0.17 0.70 22.82

3 83.94 84.78 55.64 1 0.32 0.03 0.65 22.59

4 83.93 84.78 55.64 1 0.38 0.10 0.65 22.64

5 83.93 84.78 55.64 1 0.37 0.17 0.70 22.64

6 83.93 84.78 55.64 1 0.28 0.11 0.71 22.64

7 83.93 84.78 55.64 1 0.29 0.05 0.70 22.72

8 83.93 84.78 55.64 1 0.32 0.04 0.75 22.68

9 83.93 84.78 55.64 1 0.39 0.10 0.72 22.61

10 83.93 84.78 55.64 1 0.23 0.04 0.71 22.61

11 83.93 84.78 55.64 1 0.39 0.18 0.65 22.64

12 83.87 84.71 55.47 1 0.33 0.16 0.73 22.58

13 83.87 84.71 55.47 1 0.30 0.05 0.70 22.71

14 83.87 84.71 55.47 1 0.40 0.12 0.69 22.73

Table 7.9: Numerical results, multiple dominant constraints, EQDF case. BFSD denotes the value of the objective at the best feasible single-dose solution given the set of constraints and BFEQDF denotes the value of the objective at the best feasible equal-dose solution given the same set of constraints.

Tot constr z∗ BFSD BFEQDF Active αj βj ωj Kj

1 29.38 19.19 29.40 1 0.32 0.11 0.70 22.64

2 23.40 12.79 23.42 1 0.35 0.17 0.70 21.96

3 23.40 12.79 23.42 1 0.32 0.03 0.65 22.59

4 23.40 12.79 23.42 1 0.38 0.10 0.65 22.64

5 23.40 12.79 23.42 2 0.37 0.17 0.70 22.35

6 23.40 12.79 23.42 2 0.28 0.11 0.71 22.64

7 23.40 12.79 23.42 2 0.29 0.05 0.70 22.72

8 23.40 12.79 23.42 2 0.32 0.04 0.75 22.68

9 23.40 12.79 23.42 2 0.39 0.10 0.72 22.61

10 23.40 12.79 23.42 2 0.23 0.04 0.71 22.61

11 23.40 12.79 23.42 2 0.39 0.18 0.65 22.64

12 23.40 12.79 23.42 3 0.33 0.16 0.73 21.79

13 23.40 12.79 23.42 3 0.30 0.05 0.70 22.71

14 23.40 12.79 23.42 3 0.40 0.12 0.69 22.73

order to examine the effect of the time-varying parameters. In Figure 7.7 we show the

numerical results of a continuous decrease (increase) in both αT , βT parameters of 0.1%

per fraction, over a treatment of 35 fractions.

Although our numerical technique could easily accommodate more than one sensitive-

structure constraint, we restrict our examination to the single-constraint case, for which

we have known analytical optima in the case of constant parameters. By restricting our-

selves to single-constraint cases we validated our intuition and submitted our numerical

Figure 7.7: Doses per fraction with time-varying radiosensitivity parameters (increasing and decreasing parameter cases), start-to-end trend lines in black.

routine to further testing.

In spite of the inherent noise introduced by the search technique, in Figure 7.7 we see that doses per fraction decrease (increase) in the case of decreasing (increasing) parameters. This becomes clearer upon examination of the linear trend lines. These results are very intuitive: only the objective function is affected by this time-dependency, so the ordering of boundary feasible doses over time only affects the objective function's value, while leaving the constraint undisturbed. For example, let αT1 ≥ · · · ≥ αTf and βT1 ≥ · · · ≥ βTf, or vice versa αT1 ≤ · · · ≤ αTf and βT1 ≤ · · · ≤ βTf. In such a case we would naturally expect the higher (lower) doses to be placed where they exploit the time variation in the objective function, while, as we saw in Section 2.1, the constraints remain unaffected by the ordering of feasible dose schedules. These trends in doses over time also provide validation of the efficacy of our simulated annealing routine.
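As a small illustration of this argument (a sketch of ours with assumed parameter values), the time-varying objective rewards placing the larger doses in the fractions with the larger radiosensitivity parameters, while any constraint built from constant parameters is unchanged by the reordering.

import numpy as np

N = 35
alpha = 0.35 * (1 - 0.001) ** np.arange(N)   # assumed 0.1% per-fraction decrease
beta = 0.035 * (1 - 0.001) ** np.arange(N)

def effect(d):
    # Time-varying LQ objective: fraction f uses its own alpha_f, beta_f.
    return np.sum(alpha * d + beta * d ** 2)

d = np.linspace(1.5, 2.5, N)                 # some unequal schedule
descending = np.sort(d)[::-1]                # larger doses delivered first
ascending = np.sort(d)                       # larger doses delivered last

# With decreasing parameters, front-loading the larger doses yields a larger effect;
# a constraint with constant parameters takes the same value for both orderings.
print(effect(descending) > effect(ascending))   # -> True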

Chapter 8

Discussion and conclusions

8.1 Summary

In this thesis, we have introduced the problem of dose per fraction optimization for a

problem with a pre-determined fixed number of fractions. After reviewing the latest

literature on the subject, we found that only the case of a tumor and single sensitive-

structure was formally studied, until now.

We generalized the fractionation problem presented in the literature, to deal with the

cases in which the dosage delivered to the tumor is constrained by more than one sensitive-

structure and where tumor radiosensitivity parameters may vary over the course of a

treatment.

While posing this generalized problem, we also showed that any solution must lie on the

boundary of the feasible set. Because of this property of the optimum, we noted that the

multiple constraint problem often reduces to a single constraint problem, when one of

the constraints dominates the rest, as seen in Section 7.5. Also, as shown in Section 5.3,


the optimum occurs at the intersection of two or more constraints or at the equal dose

per fraction point (or at any arbitrary feasible point on the boundary, which includes the

two points just mentioned).

Finally, we implemented and tailored a simulated annealing search algorithm by examin-

ing its sensitivities to its parameters, for the case of our specific problem. We successfully

applied the end product of our tailoring to the fractionation problem, with very promising

results, as reported in Chapter 7.

8.2 Suggestions for future work

As we saw in Chapter 5, the optima must lie on the boundary of the feasible set, and this boundary may be defined by the intersection of two or more sensitive-structure constraints. Developing a way to formally describe this boundary would simplify the problem and would be helpful in deriving analytical solutions.

More modestly, an enumeration algorithm that finds the intersection points of the constraints may be useful in obtaining an exact global optimum. As seen in Chapter 5, the optima must not only lie on the boundary, but must also lie at the intersection of two or more constraints, at the EQDF point, or at an arbitrary feasible point on the boundary. Enumerating these constraint intersection points as well as the EQDF points, then comparing the objective function value at each of these points, would guarantee that we obtain a global optimum. Developing such an enumeration algorithm may be an interesting avenue for future research; a rough sketch of the idea is given below.
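As a rough sketch of this enumeration idea (our own illustration, not the thesis's algorithm), consider a toy two-fraction problem with two hypothetical BED-style constraints gj(d) = Σt (aj dt + bj dt²) − cj ≤ 0 and an LQ tumor objective to be maximized. The candidate set consists of the pairwise constraint intersections (found numerically here with SciPy) and the EQDF point on each constraint boundary; the best feasible candidate is returned. All parameter values below are made up.

```python
# Sketch of the candidate-enumeration idea for a toy two-fraction case (hypothetical
# parameters): candidates are pairwise constraint intersections plus the EQDF point
# on each constraint boundary; the best feasible candidate is kept.
import itertools
import numpy as np
from scipy.optimize import fsolve

F = 2
alpha, beta = 0.35, 0.035                # tumor LQ parameters (objective)
constraints = [(0.10, 0.05, 15.0),       # (a_j, b_j, c_j) per sensitive structure
               (0.50, 0.01, 12.0)]

def g(j, d):
    a, b, c = constraints[j]
    return np.sum(a * d + b * d ** 2) - c

def f(d):
    return np.sum(alpha * d + beta * d ** 2)

def feasible(d, tol=1e-6):
    return np.all(d >= -tol) and all(g(j, d) <= tol for j in range(len(constraints)))

candidates = []
# 1. Pairwise constraint intersections, solved numerically from an asymmetric guess.
for i, j in itertools.combinations(range(len(constraints)), 2):
    sol, _, ok, _ = fsolve(lambda d: [g(i, d), g(j, d)], x0=[1.0, 20.0], full_output=True)
    if ok == 1:
        candidates.append(np.array(sol))
# 2. EQDF point on each constraint boundary: positive root of F*(a*x + b*x^2) = c.
for a, b, c in constraints:
    x = (-a + np.sqrt(a ** 2 + 4 * b * c / F)) / (2 * b)
    candidates.append(np.full(F, x))

best = max((d for d in candidates if feasible(d)), key=f)
print("best candidate:", np.round(best, 3), "objective:", round(f(best), 3))
```

In higher dimensions the constraint intersections are manifolds rather than isolated points, so such an enumeration would need to be combined with a parametrization of the boundary, which is precisely the formal description suggested above.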

We very briefly investigated various global optimization search techniques before settling on simulated annealing, which is widely used in the medical physics literature. Further tests and comparisons of these techniques in the context of our fractionation problem may more rigorously determine which of them is most appropriate for our specific problem.

On the medical side, it may be interesting to examine how total dose constraints affect the solution of the fractionation problem. In current clinical practice, physicians typically prescribe a total dose and a treatment length. Adding a total dose constraint would allow a direct comparison of the mathematical solution with current medical conventions. To make the problem more clinically realistic, constraints on each fractional dose may also be explored; indeed, very large doses per fraction are probably not realistic in a clinical setting.
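Schematically (this is a simplified stand-in of our own, not the thesis's exact formulation; the sparing factors γj, the prescribed total dose D_total and the per-fraction cap d_max are assumptions introduced for illustration), such a clinically constrained problem could take the form:

```latex
% Schematic augmented fractionation problem (illustrative only)
\begin{aligned}
\max_{d_1,\dots,d_F}\quad & \sum_{t=1}^{F}\bigl(\alpha_T d_t + \beta_T d_t^{2}\bigr) \\
\text{s.t.}\quad & \sum_{t=1}^{F}\bigl(\alpha_j \gamma_j d_t + \beta_j (\gamma_j d_t)^{2}\bigr) \le c_j,
  \qquad j = 1,\dots,m, \\
 & \sum_{t=1}^{F} d_t = D_{\mathrm{total}}, \qquad 0 \le d_t \le d_{\max}, \quad t = 1,\dots,F.
\end{aligned}
```

Comparing the solution of such an augmented problem with the solution obtained without the total dose and per-fraction bounds would provide the direct comparison with clinical conventions mentioned above.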

Appendix A

Alternate solution techniques

investigated

A.1 Steepest descent/ascent

The steepest descent (ascent) algorithm (Luenberger and Ye, 2008, p. 233-234) uses a function's gradient to generate a sequence of points xi, . . . , xi+N that decreases (increases) the function's value, such that f(xi) ≥ . . . ≥ f(xi+N) in the case of steepest descent and f(xi) ≤ . . . ≤ f(xi+N) in the case of steepest ascent. The sequence of points is generated by the following relation:

xi = xi−1 + s∇f(xi−1)

where the gradient gives the step direction and s the step size (we write s rather than α to avoid confusion with the radiosensitivity parameter). The step size is positive in the case of a maximization problem and negative in the case of a minimization problem. The idea is to exploit the gradient, which gives the direction of steepest ascent, in order to move towards a minimum (maximum) as quickly as possible, using a negative (positive) step. If we are trying to maximize or minimize a function subject to a set of constraints, this technique is not suitable, as it may lead to infeasible points.

In the specific case of the fractionation problem, this technique has one further drawback, due to the nature of the objective function. The gradient of the objective function of the fractionation problem over F fractions is given by ∇f = [α + 2βx1 . . . α + 2βxF]T. If the initial starting point has all equal coordinates ~x = [x̃ . . . x̃]T (an EQDF case), the gradient ∇f = [α + 2βx̃ . . . α + 2βx̃]T also has all equal coordinates, so we will only ever explore other EQDF points. Similarly, if the starting point has non-uniform coordinates ~x = [x1 . . . xF]T, we will never visit a point with uniform coordinates (an EQDF point): each coordinate is updated by the same affine map xj → (1 + 2sβ)xj + sα, so unequal coordinates remain unequal. For this reason, this technique was not pursued.
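A small numerical illustration of this point (our own sketch with made-up parameters, not the thesis code): running unconstrained steepest ascent on the LQ objective from an equal-dose start, every iterate keeps identical coordinates, so the search never leaves the EQDF subspace.

```python
# Sketch (hypothetical parameters): unconstrained steepest ascent on the separable
# LQ objective f(x) = sum_t (alpha*x_t + beta*x_t^2).  From an equal-dose (EQDF)
# start the gradient has identical coordinates, so every iterate stays an EQDF point.
import numpy as np

alpha, beta = 0.35, 0.035
s = 0.1                              # positive step size: ascent

def grad(x):
    return alpha + 2 * beta * x      # elementwise gradient of the separable objective

x = np.full(5, 2.0)                  # EQDF starting point: all fractions equal
for _ in range(10):
    x = x + s * grad(x)

print(np.round(x, 3))                # all coordinates are still equal
print(np.allclose(x, x[0]))          # True
```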

A.2 Non-linear projected gradient

The projected gradient technique is a steepest ascent algorithm tailored to constrained problems. It projects the objective function's gradient onto the subspace defined by the active constraint set at the current point. It is typically employed for linearly constrained problems, for which it is best suited: in that case, the gradient is projected onto the null space of the matrix of active constraints, and this subspace lies within the feasible set.

With some modifications, however, this technique may be applied to non-linearly constrained problems. Luenberger and Ye (2008) present a variant of the algorithm tailored to nonlinearly constrained problems; its details are given in Algorithm 2. This variant projects the gradient onto the (linear) subspace tangent to the (nonlinear) surface defined by the active constraints, by pre-multiplying it by the projection matrix Pn×n = In×n − Ji(~xi)T[Ji(~xi)Ji(~xi)T]−1Ji(~xi), where Ji(~xi) is the m × n Jacobian of the active constraints evaluated at ~xi. It then attempts to backtrack into the feasible set, as seen in step 6 of Algorithm 2.


Algorithm 2 Non-linear projected gradient
1: Start with an arbitrary feasible point ~xi
2: while f(~xi+1) ≥ (1 + ε)f(~xi) and the maximum number of desired iterations has not been reached do
3: Set hi(~xi) = ~0 to be the active set (i.e., the subset of constraints that are tight at ~xi)
4: Project ∇f(~xi) onto the tangent subspace of the active constraints (i.e., ~̂x = Pi∇f(~xi))
5: Take a step of size k in direction ~̂x: ~y = ~xi + k~̂x
6: while h(~y) ≠ ~0 do (while infeasible, keep adjusting ~y until it returns to the constraint surface)
7: ~y = ~y − Ji(~xi)T[Ji(~xi)Ji(~xi)T]−1h(~y)
8: end while
9: Set ~xi+1 = ~y
10: end while

Because the step is taken along the (linear) tangent space of the (nonlinear) constraint surface, the trial point generally falls outside the feasible set, and the algorithm must then backtrack to a point within it. This backtracking imposes a large computational cost, and our experiments with this technique were not promising: even with small test cases, run times were unreasonably long, so the technique was not pursued beyond a few experiments.
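For concreteness, the sketch below (our own illustration with a single hypothetical quadratic constraint and made-up parameters, not the thesis's implementation) performs one projected-gradient step from a non-uniform point on the constraint boundary and then applies the step-7 correction of Algorithm 2 until the trial point returns to the constraint surface.

```python
# Sketch of one non-linear projected gradient step with feasibility restoration,
# on a hypothetical problem: maximize f(x) = sum(alpha*x + beta*x**2) subject to a
# single active constraint h(x) = sum(a*x + b*x**2) - c = 0.  Illustrative only.
import numpy as np

alpha, beta = 0.35, 0.035     # hypothetical tumor LQ parameters (objective)
a, b, c = 0.20, 0.05, 20.0    # hypothetical sensitive-structure constraint

def f_grad(x):
    return alpha + 2 * beta * x          # gradient of the separable LQ objective

def h(x):
    return np.array([np.sum(a * x + b * x ** 2) - c])   # active constraint, = 0 on boundary

def jac(x):
    return (a + 2 * b * x).reshape(1, -1)                # 1 x n Jacobian of the constraint

# A non-uniform starting point on the constraint surface: fix the first dose and
# solve a*d + b*d**2 = (c - (a*d1 + b*d1**2)) / (n - 1) for the remaining doses.
n, d1 = 5, 2.0
rhs = (c - (a * d1 + b * d1 ** 2)) / (n - 1)
d_rest = (-a + np.sqrt(a ** 2 + 4 * b * rhs)) / (2 * b)
x = np.array([d1] + [d_rest] * (n - 1))

J = jac(x)
P = np.eye(n) - J.T @ np.linalg.inv(J @ J.T) @ J    # projector onto the tangent subspace
y = x + 0.5 * (P @ f_grad(x))                       # projected-gradient step (steps 4-5)

# Step 7 of Algorithm 2: correct y back onto the constraint surface (J held at x).
while np.linalg.norm(h(y)) > 1e-8:
    y = y - J.T @ np.linalg.inv(J @ J.T) @ h(y)

print(np.round(y, 4), h(y))
```

Even in this tiny example, each outer step requires an inner loop of corrections involving a matrix inverse, which is the computational overhead referred to above.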

Bibliography

Aleman, D M, J Wallgren, H E Romeijn, J F Dempsey. 2007. A fluence map optimization model for restoring traditional fractionation in IMRT treatment planning. Unpublished.

Aleman, DM, A Kumar, RK Ahuja, E Romeijn, JF Dempsey. 2008. Neighborhood search

approaches to beam orientation optimization in intensity modulated radiation therapy

treatment planning. Journal of Global Optimization 42 587–607.

Bazaraa, M S, H D Sherali, C M Shetty. 2006. Nonlinear Programming . John Wiley and

Sons, Inc., 165–235.

Bertsimas, D, J Tsitsiklis. 1993. Simulated annealing. Statistical Science 8 10–15.

Bortfeld, T, W Schlegel. 1993. Optimization of beam orientations in radiation therapy:

some theoretical considerations. Physics in Medicine and Biology 38 291–304.

Bourhis, J, J Overgaard, H Audry, K K Ang, M Saunders, J Bernier, J-C Horiot, A Le Maître, T F Pajak, M G Poulsen, B O'Sullivan, W Dobrowsky, A Hliniak, K Skladowski, J H Hay, L H Pinto, C Fallai, K K Fu, R Sylvester, J-P Pignon. 2006. Hyperfractionated or accelerated radiotherapy in head and neck cancer: a meta-analysis. The Lancet 368 843–854.

Brenner, D J. 2008. The linear-quadratic model is an appropriate methodology for deter-



mining isoeffective doses at large doses per fraction. Seminars in Radiation Oncology

18 234 – 239.

Brenner, D J, L R Hlatky, P J Hahnfeldt, Y Huang, R K Sachs. 1998. The linear-quadratic

model and most other common radiobiological models result in similar predictions of

time-dose relationships. Radiation Research 150 83 – 91.

Brownlee, J. 2011. Clever Algorithms: Nature-Inspired Programming Recipes. Lulu.

Cao, W, G J Lim, A Lee, Y Li, W Liu, R Zhu, X Zhang. 2012. Uncertainty incorporated beam angle optimization for IMPT treatment planning. Medical Physics 39 5248–5256.

Chan, T, T Bortfeld, J Tsitsiklis. 2006. A robust approach to IMRT optimization. Physics in Medicine and Biology 51 2567–2583.

Chelouah, R, P Siarry. 2000. Tabu search applied to global optimization. European

Journal of Operational Research 123 256–270.

Chu, M, Y Zinchenko, S G Henderson, M Sharpe. 2005. Robust optimization for inten-

sity modulated radiation therapy treatment planning under uncertainty. Physics in

Medicine and Biology 50 5463–5477.

Devore, J L. 2000. Probability and Statistics for Engineering and the Sciences . Duxbury

Thomson Learning, 366–369.

Ezzell, G A. 1996. Genetic and geometric optimization of three-dimensional radiation

therapy treatment planning. Medical Physics 23 293–305.

Ferreira, B C, MdC Lopes, J Mateus, M Capela, P Mavroidis. 2010. Radiobiological evaluation of forward and inverse IMRT using different fractionations for head and neck tumours. Radiation Oncology.

Fowler, J F. 1989. The linear-quadratic formula and progress in fractionated radiotherapy.

The British Journal of Radiology 62 679–694.


Fowler, J F. 1992. Brief summary of radiobiological principles in fractionated radiother-

apy. Seminars in Radiation Oncology 2 16–21.

Gilio, J P. 1997. Tabu search: An automatic treatment planning algorithm for conformal

radiotherapy. Ph.D. thesis, Wayne State University.

Hirsch, M J, C N Meneses, P M Pardalos, M G C Resende. 2007. Global optimization by continuous GRASP. Optimization Letters 1 201–21.

Hoffmann, A L, D d Hertog, A Y D Siem, J H A M Kaanders, H Huizenga. 2008.

Convex reformulation of biologically-based multi-criteria intensity-modulated radiation

therapy optimization including fractionation effects. Physics in Medicine and Biology

53 6345–6362.

Kapamara, T, K Sheibani, C R Reeves, O C L Haas, D Petrovic. 2006. A comparison of heuristic search algorithms for molecular docking. Proceedings of the Eighteenth International Conference on Systems Engineering (ICSE2006).

Keller, H, M Davison. 2008. Dose painting in time: Optimal dose-per-fraction schedules

for static and time-dependent dose-effect model parameters. Preprint .

Keller, H, A Hope, G Meier, M Davison. 2012. Optimizing therapeutic ratio through

fractionation: When should high doses per fraction be considered? Submitted to

Medical Physics .

Kirkpatrick, J P, J J Meyer, L B Marks. 2008. The linear-quadratic model is inappropriate

to model high dose per fraction effects in radiosurgery. Seminars in Radiation Oncology

18 240–243.

Lee, A W M, W Foo, R Chappell, J F Fowler, W M Sze, Y F Poon, S C K Law, S H Ng,

S Y Tung, J H C Ho. 1998. Effect of time, dose, and fractionation on temporal lobe


necrosis following radiotherapy for nasopharyngeal carcinoma. International Journal

of Radiation Oncology Biology and Physics 40 35–42.

Levin-Plotnik, D, R J Hamilton. 2004. Optimization of tumor control probability for

heterogeneous tumours in fractionated radiotherapy treatment protocols. Physics in

Medicine and Biology 49 407–424.

Li, Y, J Yao, D Yao. 2004. Automatic beam angle selection in IMRT planning using genetic algorithm. Physics in Medicine and Biology 49.

Luenberger, D G, Y Ye. 2008. Linear and Nonlinear Programming . Springer.

Ma, L, A Sahgal, M Descovich, Y B Cho, C Chuang, K Huang, N J Laperriere, D C

Shrieve, D A Larson. 2010. Equivalence in dose fall-off for isocentric and nonisocen-

tric intracranial treatment modalities and its impact on dose fractionation schemes.

International Journal of Radiation Oncology Biology and Physics 76 943–48.

Mageras, GS, R Mohan. 1993. Application of fast simulated annealing to optimization

of conformal radiation treatments. Medical Physics 20 639–647.

Mavroidis, P, B K Lind, A Brahme. 2001. Biologically effective uniform dose (D̿) for specification, report and comparison of dose response relations and treatment plans. Physics in Medicine and Biology 46 2607–2630.

Michalewicz, Z. 1995. Genetic algorithms, numerical optimization, and constraints. Sixth International Conference on Genetic Algorithms 151–158.

Miki, M, S Hiwa, T Hiroyasu. 2006. Simulated annealing using an adaptive search vector.

2006 IEEE Conference on Cybernetics and Intelligent Systems .

Miles, E F, W R Lee. 2008. Hypofractionation for prostate cancer: A critical review.

Seminars in Radiation Oncology 18 41–47.


Mizuta, M, S Takao, H Date, N Kishimoto, K Sutherland, R Onimaru, H Shirato. 2012. A

mathematical study to select fractionation regimen based on physical dose distribution

and the linear-quadratic model. International Journal of Radiation Oncology Biology

Physics 84 829–33.

Morrill, SM, RG Lane, G Jacobson, II Rosen. 1991. Treatment planning optimization

using constrained simulated annealing. Physics in Medicine and Biology 36 1341–1359.

Nelson, J, D Brodie. 1990. Comparison of a random search algorithm and mixed integer

programming for solving area-based forest plans. Canadian Journal of Forest Research

20 934–942.

Niemierko, A. 1992. Random search algorithm (RONSC) for optimization of radiation therapy with both physical and biological end points and constraints. International Journal of Radiation Oncology Biology Physics 23.

Niemierko, A, M Urie, M Goitein. 1992. Optimization of 3D radiation therapy with both physical and biological end points and constraints. International Journal of Radiation Oncology Biology Physics 23.

Perez, JR, J Basterrechea. 2007. Comparison of different heuristic optimization methods

for near-field antenna measurements. IEEE Transactions on Antennas and Propagation

55 549–555.

Petrovic, S, P Leite-Rocha. 2008. Constructive and grasp approaches to radiotherapy

treatment scheduling. IAENG Special Edition of the World Congress on Engineer-

ing and Computer Science 208, WCECS ’08. Advances in Electrical and Electronics

Engineering 192–200.

Romeijn, H E, R K Ahuja, J F Dempsey, A Kumar. 2006. A new linear programming

approach to radiation therapy treatment planning problems. Operations Research 54

201–216.


Romeijn, H E, R L Smith. 1994. Simulated annealing for constrained global optimization.

Journal of Global Optimization 5 101–126.

Rossi-Doria, O, M Sampels, M Birattari, M Chiarandini, M Dorigo, LM Gambardella, J Knowles, M Manfrin, M Mastrolilli, B Paechter, L Paquete, T Stutzle. 2003. A comparison of the performance of different metaheuristics on the timetabling problem. E Burke, P De Causmaecker, eds., Practice and Theory of Automated Timetabling IV, Lecture Notes in Computer Science, vol. 2740. 329–351.

Russell, S, P Norvig. 1995. Artificial Intelligence: A Modern Approach. Prentice-Hall

Inc., 92–115.

Solis, FJ, RJ-B Wets. 1981. Minimization by random search techniques. Mathematics of

Operations Research 6 19–30.

Thames, H D. 1992. On the origin of dose fractionation regimens in radiotherapy. Sem-

inars in Radiation Oncology 2 3–9.

Wah, B W, Y X Chen. 2000. Optimal anytime constrained simulated annealing for constrained global optimization. Principles and Practice of Constraint Programming – CP 2000, 6th International Conference, Singapore, September 18–21, 2000, Proceedings 1894 425–440.

Wah, B W, T Wang. 1999. Constrained simulated annealing with applications in nonlin-

ear continuous constrained global optimization. 11th IEEE International Conference

on Tools with Artificial Intelligence, 1999. Proceedings. 381–388.

Webb, S. 1989. Optimisation of conformal radiotherapy dose distribution by simulated

annealing. Physics in Medicine and Biology 34 1349–1370.

Westhead, D R, D E Clark, C W Murray. 1997. A comparison of heuristic search algo-

rithms for molecular docking. Journal of Computer-Aided Molecular Design 11.


Wolpert, DH, WG Macready. 1997. No free lunch theorems for optimization. IEEE

Transactions on Evolutionary Computation 1 67–82.

Yang, Y, L Xing. 2005. Optimization of radiotherapy dose-time fractionation with con-

sideration of tumor specific biology. Medical Physics 32 3666–3677.