parallel scalable pde-constrained optimization: antenna...

7
Noname manuscript No. (will be inserted by the editor) Parallel Scalable PDE-Constrained Optimization: Antenna Identification in Hyperthermia Cancer Treatment Planning Olaf Schenk · Murat Manguoglu · Ahmed Sameh · Matthias Christen · Madan Sathe Received: 01 February 2009 / Accepted: 20 March 2009 Abstract We present a PDE-constrained optimization algo- rithm which is designed for parallel scalability on distributed- memory architectures with thousands of cores. The method is based on a line-search interior-point algorithm for large- scale continuous optimization, it is matrix-free in that it does not require the factorization of derivative matrices. Instead, it uses a new parallel and robust iterative linear solver on distributed-memory architectures. We will show almost lin- ear parallel scalability results for the complete optimization problem, which is a new emerging important biomedical ap- plication and is related to antenna identification in hyper- thermia cancer treatment planning. Keywords PDE-constrained optimization · large-scale parallel optimization · biomedical application · saddle-point matrices · sparse liner solver PACS PACS 02.10.Ud · PACS 02.60.Dc · PACS 02.60.Pn This work was supported by the Swiss National Science Foundation under grant 200021 - 117745/1, by an IBM Faculty Award on ”Mod- eling, Simulation and Optimization in Hyperthermia Cancer Treatment Planning” and partially supported by a gift from Intel’s Parallel and Distributed Solutions Division (PDSD), and grants from NSF (NSF- CCF-0635169), and DARPA/AFRL (FA8750-06-1-0233) Olaf Schenk, Matthias Christen, Madan Sathe Computer Science Department, University of Basel, Klingel- bergstrasse 50, CH-4056 Basel Tel.: +41-61-26714765 Fax: +41-61-26714761 E-mail: olaf.schenk@unibas. ch, [email protected], [email protected] Murat Manguoglu, Ahmed Sameh Computer Sciences, University of Purdue, 305 N. University Street, West Lafayette, IN 4790 Tel.: +1 765 4941559 Fax: +1 765 4941550 E-mail: [email protected], [email protected] Mathematics Subject Classification (2000) MSC 90C06 · MSC 90C51 · MSC 90C26 · MSC 65F10 · MSC 65F05 · MSC 68W10 1 Introduction Biomedical hyperthermia cancer treatment is a promising therapeutical option in oncology [12]. Various types of can- cer can be treated by heating the tumor to about 45 0 C us- ing non-ionizing radiation (microwaves). It makes the tumor more susceptible to an accompanying radio or chemother- apy. Heating tumors above a temperature of about 45 C re- sults in preferential killing of tumor cells and makes them more susceptible to an accompanying radio or chemo ther- apy. Modern hyperthermia applicators operating at around 100 MHz provide a larger number of antennas for which the amplitude and phase can be controlled independently which permits shifting the focus and preventing hotspots. The op- timal hyperthermia treatment planning can be formulated as a nonlinear optimization problem, in which the Pennes Bio-heat equations [13] appear as important constraints — a mathematical task known as PDE-constrained optimization. Solving the PDE constrained optimization problem presents a frontier problem in scientific computing. The size, com- plexity and infinite-dimensional nature of PDE-constrained optimization problems present significant challenges for gen- eral purpose optimization algorithms and, typically, Tikho- nov regularization, iterative solvers, preconditioning, inex- actness and parallel implementations are necessary to cope with the numerical challenges. For designing an individually optimal therapy, ampli- tudes and phases of the antennas have to be selected such that the tumor temperature is maximized up to a target ther- apeutical temperature T ther of 43 C. In order not to damage healthy tissue, certain temperature constraints have to be re-

Upload: others

Post on 19-Aug-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Parallel Scalable PDE-Constrained Optimization: Antenna ...user.ceng.metu.edu.tr/~manguoglu/PDFs/isc09.pdf · Keywords PDE-constrained optimization large-scale parallel optimization

Noname manuscript No.(will be inserted by the editor)

Parallel Scalable PDE-Constrained Optimization: Antenna Identificationin Hyperthermia Cancer Treatment Planning

Olaf Schenk · Murat Manguoglu · Ahmed Sameh · Matthias Christen · MadanSathe

Received: 01 February 2009 / Accepted: 20 March 2009

Abstract We present a PDE-constrained optimization algo-rithm which is designed for parallel scalability on distributed-memory architectures with thousands of cores. The methodis based on a line-search interior-point algorithm for large-scale continuous optimization, it is matrix-free in that it doesnot require the factorization of derivative matrices. Instead,it uses a new parallel and robust iterative linear solver ondistributed-memory architectures. We will show almost lin-ear parallel scalability results for the complete optimizationproblem, which is a new emerging important biomedical ap-plication and is related to antenna identification in hyper-thermia cancer treatment planning.

Keywords PDE-constrained optimization · large-scaleparallel optimization · biomedical application · saddle-pointmatrices · sparse liner solver

PACS PACS 02.10.Ud · PACS 02.60.Dc · PACS 02.60.Pn

This work was supported by the Swiss National Science Foundationunder grant 200021−117745/1, by an IBM Faculty Award on ”Mod-eling, Simulation and Optimization in Hyperthermia Cancer TreatmentPlanning” and partially supported by a gift from Intel’s Parallel andDistributed Solutions Division (PDSD), and grants from NSF (NSF-CCF-0635169), and DARPA/AFRL (FA8750-06-1-0233)

Olaf Schenk, Matthias Christen, Madan SatheComputer Science Department, University of Basel, Klingel-bergstrasse 50, CH-4056 BaselTel.: +41-61-26714765Fax: +41-61-26714761E-mail: olaf.schenk@unibas. ch, [email protected],[email protected]

Murat Manguoglu, Ahmed SamehComputer Sciences, University of Purdue, 305 N. University Street,West Lafayette, IN 4790Tel.: +1 765 4941559Fax: +1 765 4941550E-mail: [email protected], [email protected]

Mathematics Subject Classification (2000) MSC 90C06 ·MSC 90C51 · MSC 90C26 · MSC 65F10 · MSC 65F05 ·MSC 68W10

1 Introduction

Biomedical hyperthermia cancer treatment is a promisingtherapeutical option in oncology [12]. Various types of can-cer can be treated by heating the tumor to about 450C us-ing non-ionizing radiation (microwaves). It makes the tumormore susceptible to an accompanying radio or chemother-apy. Heating tumors above a temperature of about 45C re-sults in preferential killing of tumor cells and makes themmore susceptible to an accompanying radio or chemo ther-apy. Modern hyperthermia applicators operating at around100 MHz provide a larger number of antennas for which theamplitude and phase can be controlled independently whichpermits shifting the focus and preventing hotspots. The op-timal hyperthermia treatment planning can be formulatedas a nonlinear optimization problem, in which the PennesBio-heat equations [13] appear as important constraints — amathematical task known as PDE-constrained optimization.Solving the PDE constrained optimization problem presentsa frontier problem in scientific computing. The size, com-plexity and infinite-dimensional nature of PDE-constrainedoptimization problems present significant challenges for gen-eral purpose optimization algorithms and, typically, Tikho-nov regularization, iterative solvers, preconditioning, inex-actness and parallel implementations are necessary to copewith the numerical challenges.

For designing an individually optimal therapy, ampli-tudes and phases of the antennas have to be selected suchthat the tumor temperature is maximized up to a target ther-apeutical temperature T ther of 43C. In order not to damagehealthy tissue, certain temperature constraints have to be re-

Page 2: Parallel Scalable PDE-Constrained Optimization: Antenna ...user.ceng.metu.edu.tr/~manguoglu/PDFs/isc09.pdf · Keywords PDE-constrained optimization large-scale parallel optimization

2

Fig. 1 Overview of the hyperthermia treatment planning process.

spected. The induced temperature distribution is essentiallydescribed by the elliptic bio heat transfer equation [13]. Theaim is to predict the temperature distribution T (state vari-ables) which depends on the complex control of the anten-nas ui and Ei (control variables). This leads to the followinglarge-scale nonlinear optimal PDE control problem and thegoal is to minimize the objective function [12]:

F =∫

x∈Ωt

(T ther−T )2dΩ +∫

x 6∈Ωt ,T >T health

(T−T health)2dΩ

(1a)

subject to

−∇ · (k∇T )+ρbρω(T −T b) =ρσ

2

∣∣∣∣∣∑iuiEi

∣∣∣∣∣2

in Ω (1b)

k∂nT = qconst on ∂Ω (1c)

T |Ω/Ωt < T lim (1d)

min|ui| ≥ α max|ui| (1e)

Here, k is the thermal conductivity, ρ the density (ρb:blood density), ω the perfusion rate, σ the electrical con-ductivity, T the temperature and T b the arterial blood tem-perature and these tissue parameters depend also on the tem-perature. Ω is the part of the patient’s body that is affected,Ωt ⊂ Ω is the domain occupied by tumor tissue. The com-plex control of antenna is defined by ui and Ei and the tem-perature is T ther = 43C, T health = 42C and T lim = 44C.α depends on the HTP applicator that is used in the therapy.

In our application there is great interest in solving opti-mization problems of extremely large sizes. Since the con-

straints of the problem correspond to a discretized biomed-ical PDE, the accuracy of the optimization solution with re-spect to this infinite-dimensional problem is directly relatedto size of the largest discrete approximate problem that canbe solved. One possible alternative for large-scale optimiza-tion is to reduce the original problem into one of a smallersize through a process of nonlinear elimination [9,23]. Thisprocess involves an iteration for determining an optimal setof control variables. For each set of controls, the equalityconstraints in (1b) are solved for the remaining state vari-ables, and an auxiliary system may be solved for the sensi-tivities of the state variables with respect to the controls. Theremainder of the iteration involves only the computation ofa displacement in the controls. Algorithms of this type, how-ever, suffer from a number of various setbacks. For example,if a large number of iterations are required to find an opti-mal set of control variables, then such a procedure requiresa large number of exact solutions of the equality constraints(a PDE) and the adjoint equations (another set of PDEs).

The challenge is thus to design a constrained optimiza-tion algorithm that emulates an efficient nonlinear program-ming approach. The algorithm may utilize matrix-vector pro-ducts with the constraint Jacobian, its transpose, and theHessian of the Lagrangian together with appropriate precon-ditioners — quantities that are computable for many large-scale applications of interest — but must overcome the factthat exact factorizations of derivative matrices are impracti-cal to obtain. Iterative linear system solvers present a viablealternative to direct factorization methods, but the benefitsof these techniques are only realized if inexact step com-putations are controlled appropriately in order to guaranteeglobal convergence of the algorithm.

Figure 1 shows the parallel biomedical framework of ourapplication. In a first step, the Maxwell equation has to besolved in order to determine the electric magnetic fields.The overall EM fields will be used in the Pennes-Bio heatequation (1b) and the optimal temperature distribution canbe computed by changing the phase and amplitude of eachantenna of the hyperthermia applicator. Interior point opti-mization methods will be used for the nonlinear optimiza-tion problem. They have been proven to be a very efficientclass of methods for inequality constrained finite dimen-sional optimization. An inexact Newton pathfollowing al-gorithm is constructed on the base of an efficient inexactscheme in the primal-dual barrier interior-point optimizerIPOPT [22] and the parallel linear system solver PSPIKE.The framework of inexact Newton methods yields accuracyrequirements on the discretization error used to control theNewton iteration.

Recent advances in optimization algorithms [4–7,22] com-bined with scalable robust appropriate preconditioners willresult in PDE-constrained optimization applications that canoften scale to millions of optimization variables. We note

Page 3: Parallel Scalable PDE-Constrained Optimization: Antenna ...user.ceng.metu.edu.tr/~manguoglu/PDFs/isc09.pdf · Keywords PDE-constrained optimization large-scale parallel optimization

3

Fig. 2 Patient model and simulated energy absorption temperature distribution. The images show different temperature distributions within theiteration of our parallel PDE-constrained optimization process.

that our method has much in common with the algorithms in[2,3,10] as we follow a full-space, or all-at-once, approachfor PDE-constrained problems. The major difference, how-ever, is that we have presented in [7] conditions that guaran-tee the global convergence of the inexact optimization algo-rithm. In this paper, we will use this inexact primal-dual bar-rier interior-point optimizer IPOPT [22] and add a new scal-able and robust solver — PSPIKE. It will be used to solve alarge-scale biomedical PDE-constrained optimization appli-cation in parallel on distributed-memory architectures. Fig-ure 2 shows different temperature distributions for two dif-ferent hyperthermia patient models during the optimizationprocess. The results have been computed using the parallelalgorithms that are described in the next sections. The un-derlying framework of the new solver PSPIKE is based onthe well-known shared-memory parallel sparse direct solverPARDISO [18,19], which typically scales up to eight to six-teen cores. In order to address scalability up to thousands ofcores, the new solver PSPIKE has been very recently devel-oped and represents a significant extension to existing paral-lel methods. In this paper, we will show almost linear scala-bility results for the complete optimization application up to256 cores.

2 Primal-Dual Barrier Interior-Point Optimization

After applying a finite-difference discretization, the PDE-constrained optimization problem (1) can be transformedinto a nonlinear programming problem (NLP) given by anonconvex objective function f (x) : Rn→ R and constraintfunctions c(x) : Rn → Rm, which are both assumed to betwice continuously differentiable. The objective is to find alocal solution of the optimization problem.

minx∈Rn

f (x) (2a)

s.t. c(x) = 0 and xL ≤ x≤ xU . (2b)

Growing interest in efficient optimization methods has ledto the development of interior-point or barrier methods forlarge-scale nonlinear programming. In particular, these meth-ods provide an attractive alternative to active set strategiesin handling problems with large numbers of inequality con-straints. Over the past 15 years, there has also been a muchbetter understanding of the convergence properties of interior-point methods and efficient algorithms and robust softwarecodes e.g. [22] have been developed with desirable globaland local convergence properties. Given the original prob-lem in the form (2), a sequence of corresponding barrier

Page 4: Parallel Scalable PDE-Constrained Optimization: Antenna ...user.ceng.metu.edu.tr/~manguoglu/PDFs/isc09.pdf · Keywords PDE-constrained optimization large-scale parallel optimization

4

problems,

minx∈Rn

ϕµ(x) = f (x)−µi

n

∑k=1

ln(x(k)) (3a)

s.t. c(x) = 0, (3b)

is solved to increasingly tighter tolerances, while again thebarrier parameter µ is driven to zero. Interior-point methodcomputes the solution equivalently by first solving a smaller,symmetric, indefinite linear system[

Wk AkAT

k 0

](∆xk∆λk

)=−

(∇ϕµ(xk)+Akλk

c(xk)

), (4)

The vectors λ and z are the Lagrangian multipliers for theequality and bound constraints, and X and Z denote the di-agonal matrices with the vector elements of x and z on thediagonal. Here Ak = ∇c(xk), and Wk denotes the Hessian∇xxL (x,λ ,z) of the Lagrangian function for the originalproblem (2),

L (x,λ ,z) := f (x)+ c(x)Tλ − z (5)

The search directions are obtained from solving the linearsystem (4) where Wk is an approximation of the Hessian ofthe Lagrangian for the barrier problem (3).

Therefore, the computational efficiency of all PDE con-strained optimization applications strongly depends on theefficiency of the numerical linear algebra kernel to solve thesymmetric indefinite Karush-Kuhn-Tucker systems of opti-mality (4). In recent years, a large amount of work has beendevoted to the problem of solving large symmetric indefi-nite systems (4) in saddle-point form efficiently. One reasonfor this surge in interest is due the success of interior-pointmethods in nonlinear programming, which at their core re-quire the solution of a series of linear systems in saddle-point form. In this work, we will focus on a new method —the PSPIKE algorithm — which represents a highly scalableparallel solver on distributed memory architectures.

3 The Scalable PSPIKE Algorithm

3.1 The Basic SPIKE Algorithm

Let Ax = f be a nonsymmetric diagonally dominant sys-tem of linear equations where A is of order n and band-width 2m+1. Unlike classical banded solvers such as thosein Lapack which are based on LU-factorization of A, thespike algorithm [17,16,1,8,14] is based on the factorizationA = D× S, in which D is a block-diagonal matrix and S isthe spike matrix as shown in Figure 3 for three partitions.Note that the block diagonal matrices A1, A2, and A3 arenonsingular by virtue of the diagonal dominance of A. Forthe example in Figure 3, the basic Spike algorithm consistsof the following stages.

.

A

A1

A2

A3

....................................................................................................

............................

...........................................................................

.....................

...

......

.........

.........

........................................................................................................................................................

..................................................................................................................

...

......

A1

A2

A3

=

= D × S

×

.........................................

I1

I2

I3

........................

.........

.............................. .................................

..............................

........................

......

..............................

B1

C2

B2

C3

V1

W2 V2

W3

Fig. 3 Decomposition where A = D∗S,S = D−1A, B j,C j ∈ Rm×m

– Stage 1: Obtain the LU-factorization (without pivoting)of the diagonal blocks A j (i.e. A j = L jU j, j = 1,2,3 )

– Stage2: Forming the spike matrix S and updating theright hand side(i) solve L1U1[V1,g1] = [( 0

B1), f1]

(ii) solve L2U2[W2,V2,g2] = [(C20 ),( 0

B2), f2]

(iii) solve L3U3[W3,g3] = [(C30 ), f3]

f j, i≤ j≤ 3, are the corresponding partitions of the righthand side f .

– Stage 3: Solving the reduced system,I V (b)

1 0 0W (t)

2 I 0 V (t)2

W (b)2 0 I V (b)

2

0 0 W (t)3 I

x(b)1

x(t)2

x(b)2

x(t)3

=

g(b)

1

g(t)2

g(b)2

g(t)3

(6)

where (V (b)i ,W (b)

i ) and (V (t)i ,W (t)

i ) are the bottom andtop m×m blocks of (Vi,Wi), respectively. Similarly, g(b)

i ,g(t)

i and x(b)i , x(t)

i are the bottom and top m elements ofeach gi and xi, respectively. Once the much smaller re-duced system is solved, via a dense system solver, say,the three partitions xi, i = 1,2,3, of the solution x areobtained as follows:(i) x1 = g1−V1x(t)

2

(ii) x2 = g2−W2x(b)1 −V2x(t)

3

(iii) x3 = g3−W3x(b)2

Clearly, the above basic scheme may be made more efficientif in stage 2 one can generate only the bottom and top m×mtips of the spikes Vi and Wi, as well as the corresponding bot-tom and top tips of gi. In this case, once the reduced systemis solved, solving the system Ax = f is reduced to solvingthe independent systems:

(i) L1U1x1 = f1−B1x(t)2

(ii) L2U2x2 = f2−C2x(b)1 −B2x(t)

3

(iii) L3U3x3 = f3−C3x(b)2

If the matrix A is not diagonally dominant, we cannot guar-antee that the diagonal blocks Ai are nonsingular. However,if we obtain the LU-factorization, without pivoting, of each

Page 5: Parallel Scalable PDE-Constrained Optimization: Antenna ...user.ceng.metu.edu.tr/~manguoglu/PDFs/isc09.pdf · Keywords PDE-constrained optimization large-scale parallel optimization

5

Ai using diagonal boosting (perturbation), then

LiUi = (Ai +δAi) (7)

in which ||δAi|| = O(ε||Ai||) where ε is the unit roundoff.In this case, we will need to solve Ax = f using an outeriterative scheme with the preconditioner being the matrix Mwhich is identical to A except that each diagonal block Aiis replaced by LiUi in Equation 7. Solving systems of theform My = r is accomplished via the Spike scheme outlinedabove.

3.2 The PSPIKE Scheme

The PSPIKE scheme can be used for solving general sparsesystems as follows. First, the sparse matrix A is reorderedvia a nonsymmetric reordering scheme that assures none ofthe diagonal elements is zero, followed by a weighted re-ordering scheme which brings as many of the largest ele-ments as possible inside a narrow central band M. Using aKrylov subspace method, e.g. BiCGStab [21], for solvingAx = f , with the extracted banded preconditioner M. Themajor operations in each iteration are: (i) matrix-vector mul-tiplication, and (ii) solving systems of the form My = r. Thesystems My = r are solved using our proposed algorithm:PSPIKE as follows:

The LU-factorization of each diagonal block partitionMi (banded and sparse within the band) is obtained usingPARDISO [19,18] without pivoting but utilizing diagonal boost-ing. The most recent version of PARDISO is also capableof obtaining the top and bottom tips of the left and rightspikes Wi and Vi, as well as the corresponding tips of the up-dated right hand side subvectors. Further, having the largestelements within the band, whether induced by the reorder-ing or occur naturally, allows us to approximate (or trun-cate ) the resulting reduced system by its block diagonal,S = diag(S1, S2, ..., Sp), where p is the number of partitionsand,

S j =

[I V (b)

j

W (t)j+1 I

]. (8)

This also enhances parallelism especially when M is of alarge bandwidth. For a more detailed discussion of the de-cay rate of spikes and the truncated Spike algorithm we referthe reader to [11]. In the following section we will demon-strate the suitability of PSPIKE solver for implementation onclusters consisting of several nodes in which each node is ofmulticore architecture. Thus, while PARDISO is scalable on asingle node only, PSPIKE is scalable across multiple nodes.

3.3 The PSPIKE Algorithm for Solving Linear SystemsArising in Biomedical PDE-Constrained Optimization

The linear systems that are extracted from the nonlinear sol-ver has the following block structureD BT

H CT

B C 0

x1x2x3

=

f1f2f3

(9)

where D ∈ Rn×n is diagonal and Dii > 0 for i = 1,2, ...,n.Furthermore H ∈ Rk×k is symmetric positive definite andC ∈ Rk×n is dense with k << m. B ∈ Rn×n is nonsymmetricbanded and sparse within the band.

Premultiplying the above equation by[D−1

H−1I

]we get

I BT

I CT

B C 0

x1x2x3

=

f1

f2f3

(10)

where BT = D−1BT , CT = H−1CT , f1 = D−1 f1 and f2 =H−1 f2. Rearranging the rows and columns, we have the sys-tem, I BT 0

B 0 C0 CT I

x1x3x2

=

f1f3

f2

. (11)

Observing that[I BT

B 0

]−1

=[

0 B−1

B−T −B−T B−1

](12)

we can see that x2 can be obtained by solving the small sys-tem,

(H + JT DJ)x2 = ( f2− JT f1 + JT Db) (13)

where J and b are obtained by solving BJ = C and Bb = f3,respectively. Consequently, x1 and x3 can be computed viax1 = b− Jx2 and x3 = B−T ( f1−Dx1). The solution processrequires solving linear systems with the coefficient matricesBT and B. We use BiCGStab with a banded preconditioner tosolve these systems, in which the systems involving the pre-conditioner are solved via the PSPIKE scheme we describedabove. In the next section we will illustrate the scalabilityof the above method for solving those systems that arise inbiomedical PDE-constrained optimization problems.

4 Results

The distributed-memory test platform is a cluster with In-finiband interconnection. Each node has 16Gb of memoryand two Intel Harpertown processors where each processor

Page 6: Parallel Scalable PDE-Constrained Optimization: Antenna ...user.ceng.metu.edu.tr/~manguoglu/PDFs/isc09.pdf · Keywords PDE-constrained optimization large-scale parallel optimization

6

Table 1 Parallel scalability (total optimization time in seconds) for the PDE-constrained interior-point optimization process. N represents thenumber of discretization points, ’Nodes’ is a dual-quad core Intel Harpertown processor, and ’Threads’ is the number of threads used on eachnode. ’Direct’ indicate the optimization process using an exact step computation, whereas ’Inexact’ indicates step computation based on PSPIKE.The symbol † indicates convergence problems, and ‡ shows that the optimization problem could not be solved due a high memory consumptionof the direct solver.

N Threads Nodes=1 Nodes=4 Nodes=8 Nodes=16 Nodes=32 Nodes=64(Direct) (Inexact) (Inexact) (Inexact) (Inexact) (Inexact)

1 32’107 2’016 1’024 507 † †753 4 10’033 786 262 133 † †

8 7’830 629 198 114 † †1 ‡ 60’861 33’812 17’796 9’132 6’066

1503 4 ‡ 20’287 10’821 5’393 4’072 3’8818 ‡ 14’490 7’246 4’138 1’923 1’596

contains four cores that run at 2.8GHz. The BiCGStab it-erations for solving systems involving BT and B are termi-nated when ||rk||∞/||r0||∞ < εin where r0 and rk are the ini-tial residual and the residual at the kth iteration, respectively,and the systems involving the preconditioners are solved viathe PSPIKE scheme described above.

Figure 1 shows the parallel scalability (total optimiza-tion time in seconds) for the complete interior-point opti-mization process for an artificial Hyperthermia biomedicalmodel problem described in [20]. N represent the numberof spatial discretization points, ’Nodes’ is a dual-quad coreIntel Harpertown processors, and threads is the number ofthreads used on each node. ’Direct’ indicate the optimizationprocess using an exact step computation, and ’Inexact’ indi-cate step computation based on PSPIKE. It can be observedthat almost linear scalability has been reached. The largestPDE-constrained optimization problem contains more than6’750’000 optimization variables and it can be solved in1’597 seconds using 512 Intel Xeon cores (64 nodes witheach 8 cores).

A real 3D hyperthermia model with over 1.8 milliontemperature variables in the NLP problem in (1). We con-sider linear systems extracted from the first, tenth (middle),and twenty first (last) iterations. However, since the resultsare uniform across these three systems, we present in Fig-ure 4 the results for the linear system of the tenth New-ton iteration.1 Figure 4 depict the speed improvement for aMPI/OpenMP hybrid implementation (8 cores(threads) perMPI processes) of the PSPIKE scheme. We demonstrate thespeed improvement of the PSPIKE scheme for two stop-ping criteria, 10−5 and 10−7. The corresponding final rela-tive residuals are O(10−5) and O(10−7), respectively, whichis sufficient to ensure the convergence of the inexact interior-point optimization process. For PARDISO and MUMPS the fi-nal relative residuals are O(10−12). Our new solver PSPIKE

1 The solution of the linear system within the interior-point opti-mization process consumes more than > 99% of the overall time andalmost linear scalability can be observed up to 256 cores.

1

10

100

2561286432168421

Spe

ed Im

prov

emen

t (T

Par

diso

(1 C

ore)

/ T

)

Number of Cores

(4.5)

(9.8)

(19.3)(22.9)

(45.0)

(6.5)

(12.6)

(27.4)(32.2)

(64.3)

(1.0)

(1.7)

(3.2) (3.3) (3.4)

(6.4) (6.2)(5.4)

PSPIKE:εin=10-7

PSPIKE:εin=10-5

PARDISOMUMPS

Fig. 4 The speed improvement of PSPIKE compared to PARDISO andMUMPS up to 256 cores. The base-line is the performance of PARDISOusing one core (1,338 seconds).

shows excellent scalability compared to these two state-of-the-art parallel solvers.

5 Conclusion

We have presented a PDE-constrained interior-point algo-rithm for large-scale hyperthermia cancer treatment plan-ning. The novel aspects of the approach is that we are usinga globally convergent optimization method [7] and also us-ing a new scalable linear solver PSPIKE, which is designedto scale-up to thousands of cores. We have demonstratedthat PSPIKE is an effective extension of PARDISO regard-ing parallel scalability on distributed-memory architectures.The experiments illustrate the computational advantages ofour algorithm, and it is shown that the PDE-constrained op-timization algorithm and implementation outperforms con-ventional approaches in terms of storage and CPU time forchallenging linear systems arising in biomedical applica-tions.

Acknowledgements We thank Andreas Wachter (IBM Research) foruseful comments and suggestions, Esra Neufeld (IT’IS, ETH Zurich)

Page 7: Parallel Scalable PDE-Constrained Optimization: Antenna ...user.ceng.metu.edu.tr/~manguoglu/PDFs/isc09.pdf · Keywords PDE-constrained optimization large-scale parallel optimization

7

for providing clinical data, and David Kuck (Parallel and DistributedSolutions Division, Intel) for financial support of the SPIKE develop-ment.

References

1. Michael W. Berry and Ahmed Sameh. Multiprocessor schemesfor solving block tridiagonal linear systems. The InternationalJournal of Supercomputer Applications, 1(3):37–57, 1988.

2. G. Biros and O. Ghattas. Parallel Lagrange-Newton-Krylov-Schurmethods for PDE-constrained optimization. Part I: The Krylov-Schur solver. SIAM Journal on Scientific Computing, 27(2):687–713, 2005.

3. G. Biros and O. Ghattas. Parallel Lagrange-Newton-Krylov-Schurmethods for PDE-constrained optimization. Part II: The Lagrange-Newton solver and its application to optimal control of steady vis-cous flows. SIAM Journal on Scientific Computing, 27(2):714–739, 2005.

4. R. H. Byrd, F. E. Curtis, and J. Nocedal. An inexact SQP methodfor equality constrained optimization. SIAM Journal on Optimiza-tion, 19(1):351–369, 2008.

5. R. H. Byrd, F. E. Curtis, and J. Nocedal. An inexact Newtonmethod for nonconvex equality constrained optimization. Mathe-matical Programming, Series A, 2009. (To appear).

6. F. E. Curtis, J. Nocedal, and A. Wachter. A matrix-free algorithmfor equality constrained optimization problems with rank deficientjacobians. SIAM Journal on Optimization, 2008. (Submitted).

7. F. E. Curtis, O. Schenk, and A. Wachter. An Interior-Point Algo-rithm for Large-Scale Nonlinear Optimization with Inexact StepComputations. SIAM Journal on Scientific Computing, 2009.(Submitted).

8. Jack J. Dongarra and Ahmed H. Sameh. On some parallel bandedsystem solvers. Parallel Computing, 1(3):223–235, 1984.

9. M. Fisher, J. Nocedal, Y. Tremolet, and S. J. Wright. Data assim-ilation in weather forecasting: A case study in PDE-constrainedoptimization. Optimization and Engineering, 2008. DOI:10.1007/s11081-008-9051-5.

10. E. Haber and U. M. Ascher. Preconditioned all-at-once methodsfor large, sparse parameter estimation problems. Inverse Prob-lems, 17(6):1847–1864, 2001.

11. Carl Christian Kjelgaard Mikkelsen and Murat Manguoglu. Anal-ysis of the truncated spike algorithm. SIAM Journal on MatrixAnalysis and Applications, 30(4):1500–1519, 2008.

12. Esra Neufeld. High Resolution Hyperthermia Treatment Planning.PhD thesis, ETH Zurich, August 2008.

13. Harry H. Pennes. Analysis of Tissue and Arterial Blood Tempera-tures in the Resting Human Forearm. J Appl Physiol, 1(2):93–122,1948.

14. Eric Polizzi and Ahmed H. Sameh. A parallel hybrid banded sys-tem solver: the spike algorithm. Parallel Comput., 32(2):177–194,2006.

15. Eric Polizzi and Ahmed H. Sameh. Spike: A parallel environ-ment for solving banded linear systems. Computers & Fluids,36(1):113–120, 2007.

16. D. J. Kuck S. C. Chen and A. H. Sameh. Practical parallel bandtriangular system solvers. ACM Transactions on MathematicalSoftware, 4(3):270–277, 1978.

17. A. H. Sameh and D. J. Kuck. On stable parallel linear systemsolvers. J. ACM, 25(1):81–91, 1978.

18. O. Schenk and K. Gartner. On fast factorization pivoting methodsfor sparse symmetric indefinite systems. Electronic Transactionson Numerical Analysis, 23:158 – 179, 2006.

19. O. Schenk and K. Gartner. Solving unsymmetric sparse systemsof linear equations with PARDISO. Future Generation ComputerSystems, 20(3):475 – 487, 2004.

20. O. Schenk, A. Wachter, and M. Weiser. Inertia revealing precondi-tioning for large-scale nonconvex constrained optimization. SIAMJournal on Scientific Computing, Volume 31, Issue 2, pp. 939-960,(2008).

21. H. A. van der Vorst. Bi-cgstab: a fast and smoothly convergingvariant of bi-cg for the solution of nonsymmetric linear systems.SIAM J. Sci. Stat. Comput., 13(2):631–644, 1992.

22. A. Wachter and L. T. Biegler. On the implementation of a primal-dual interior point filter line search algorithm for large-scale non-linear programming. Mathematical Programming, 106(1):25–57,2006.

23. D. P. Young, W. P. Huffman, R. G. Melvin, C. L. Hilmes, andF. T. Johnson. Nonlinear elimination in aerodynamic analysis anddesign optimization. In L. T. Biegler, O. Ghattas, M. Heinken-schloss, and B. v. Bloemen-Waanders, editors, Large-Scale PDE-Constrained Optimization, pages 17–44, New York, NY, USA,2003. Springer.