
Modified alternating direction methods for the modified

multiple-sets split feasibility problems

Yuning Yang

School of Mathematics Science and LPMC

Nankai University

Tianjin 300071, P.R. China

Email:[email protected]

Su Zhang

Institute of Modern Management, Business School

Nankai University

Tianjin 300071, P.R. China

Email:[email protected]

Qingzhi Yang

School of Mathematics Science and LPMC

Nankai University

Tianjin 300071, P.R. China

Email:[email protected]

June 21, 2012

Abstract

In this paper, we propose two new multiple-sets split feasibility problem (MSFP) models, where the MSFP is to find a point closest to the intersection of a family of closed convex sets in one space, such that its image under a linear transformation is closest to the intersection of another family of closed convex sets in the image space. This problem arises in image restoration, signal processing and intensity-modulated radiation therapy (IMRT). The background of the first new model, called the modified multiple-sets split feasibility problem (MMSFP), comes from IMRT. Compared with the MSFP, the MMSFP has three advantages. At the practical level, it better reflects the real-world problem; at the algorithmic level, its structure is more separable and the size of each part is smaller, which enables us to apply a modified alternating direction method (ADM) whose iterations consist of parallel steps. This parallel feature fits the development of modern parallel-architecture computers. Then, to overcome the difficulty of computing projections onto the constraint sets, a special version of this method with the strategy of projection onto half-spaces is given. The second new model is to find a least l2-norm solution of the MSFP (or MMSFP). For this problem, a modified ADM with parallel features is also provided. The convergence of the three ADMs is established, and the convergence rate of the third method is shown to be O(1/t). Numerical results provided at the end show the efficiency of our methods.

Key words: multiple-sets split feasibility problem, alternating direction method, parallel computing, half-space, convergence rate.

1 Introduction

The multiple-sets split feasibility problem (MSFP), proposed by Censor et al. [2, 1] and arising in image restoration, signal processing and intensity-modulated radiation therapy (IMRT), is to
\[
\text{find } x \in \bigcap_{i=1}^{t} C_i \quad \text{such that} \quad Ax \in \bigcap_{j=1}^{r} Q_j,
\]
where $A$ is an $M \times N$ real matrix and $C_i$, $i = 1, \ldots, t$, $Q_j$, $j = 1, \ldots, r$, are closed convex sets in $\mathbb{R}^N$ and $\mathbb{R}^M$, respectively. Some projection-type methods, such as the projection gradient method [2, 1, 3, 21] and the alternating direction method (ADM) [20], have been proposed to solve this problem. The ADM has the nice feature that it can exploit the separable structure of the problem and produce parallel steps in every iteration, which fits the development of modern parallel-architecture computers. In this paper, motivated by the idea of parallel computing, we first review the intensity-modulated radiation therapy (IMRT) model, from which we propose a new MSFP, called the modified multiple-sets split feasibility problem (MMSFP). Compared with the MSFP, the MMSFP has three advantages: at the practical level, it better reflects the real-world problem; at the algorithmic level, its structure is more separable and the size of each part is smaller, so it can be solved by parallel algorithms. By fully exploiting the separable structure of the new model, we apply a modified ADM to solve it, which is a modified version of the method proposed by Zhang et al. [20]. Compared with [20], the modified ADM allows more parallelism in every iteration. Then, to overcome the difficulty of computing projections onto the constraint sets, a special version of the modified ADM with the strategy of projection onto half-spaces is given. Such an idea was used by Yang, and by Qu and Xiu, for solving the SFP model [18, 13], and by Censor et al., and by Zhao and Yang, for solving the MSFP [21, 3], where the SFP is the special case of the MSFP obtained by setting $t = r = 1$. Next, we intend to find a "good" solution of the MSFP (or MMSFP) model in the sense that its l2-norm is the least over the solution set. Indeed, if producing the vector $x$ is expensive, or the vector represents some harmful materials, then finding a least l2-norm solution of the MSFP (or MMSFP) is significant. For this problem, a modified ADM with parallel features is also provided, and we show its global convergence and its O(1/t) convergence rate. Numerical results show the efficiency of our methods.

Briefly speaking, the contribution of this paper is as follows:

• A new model of MSFP, called MMSFP, is proposed.


• An enhanced model, finding the least l2-norm solution of the MSFP (or MMSFP), is proposed.

• Three modified ADMs with parallel features for solving the above two models are provided, together with their convergence analysis; the convergence rate of the last algorithm is also established.

The rest of this paper is organized as follows. In Section 2, we examine the IMRT model, from which we propose the modified multiple-sets split feasibility problem, and recall some useful lemmas. In Section 3, a simultaneous ADM is reviewed, and then two modified ADMs are provided with convergence results. The problem of finding the least l2-norm solution of the MSFP (or MMSFP) is proposed in Section 4, and another modified ADM is given, with its convergence and O(1/t) convergence rate established. Numerical results are shown in Section 5.

2 IMRT, MMSFP and some useful results

In intensity-modulated radiation therapy (IMRT), beams of penetrating radiation are directed at the tumour lesion from external sources. A multileaf collimator is used to split each beam into many beamlets with individually controllable intensities. The problem is to find a distribution of radiation intensities (a radiation intensity map), deliverable by all beamlets, that results in a clinically acceptable dose distribution, i.e., such that the dose to each tissue is within the desired upper and lower bounds, which are prescribed on the basis of medical diagnosis, knowledge and experience. To be specific, we first assume that the radiation is delivered independently from each of the $N$ beamlets, arranged in a certain geometry and indexed by $j = 1, 2, \ldots, N$. The intensity $x_j$ of the $j$-th beamlet is the $j$-th entry of the intensity vector $x \in \mathbb{R}^N$, where $\mathbb{R}^N$ denotes the radiation intensity space. An important constraint in the radiation intensity space is nonnegativity: we can never deliver negative intensities. Hence, all deliverable intensity vectors must belong to the nonnegative orthant. There are also other delivery constraints depending on the technical equipment used to deliver the treatment.

Next, let the entire volume of the patient be divided into $M$ voxels indexed by $i = 1, \ldots, M$. Suppose that $T$ anatomical structures have been outlined, including planning target volumes (PTVs) and organs at risk (OARs). We denote the set of voxel indices in structure $t$ by $S_t$. Note that an individual voxel $i$ may belong to several sets $S_t$, i.e., different structures may overlap. Let $d_{ij} \geq 0$ denote the dose absorbed in voxel $i$ due to radiation of unit intensity from the $j$-th beamlet. These quantities are calculated in advance. Then the total dose absorbed in the $i$-th voxel is given by
\[
h_i = \sum_{j=1}^{N} d_{ij} x_j, \quad i = 1, \ldots, M,
\]
and in vector form
\[
h = Dx \in \mathbb{R}^M.
\]
Let $h_t = D_t x \in \mathbb{R}^{M_t}$ be the sub-vector of $h$ representing the doses absorbed by the $t$-th structure, whose entries $h_{t,i}$, $i = 1, \ldots, M_t$, correspond to the voxels in $S_t$, $t = 1, \ldots, T$, where $D_t$ is a sub-matrix of $D$. A typical constraint is that, in a given critical structure $t$, the dose should not exceed an upper bound $u_t$.


The corresponding constraint set is
\[
H_{\max,t} = \{ h_t \in \mathbb{R}^{M_t} \mid h_{t,i} \leq u_t,\ i = 1, \ldots, M_t \}. \tag{2.1}
\]
For the target structures, the dose should not fall below a lower bound $l_t$, and the constraint is given by
\[
H_{\min,t} = \{ h_t \in \mathbb{R}^{M_t} \mid h_{t,i} \geq l_t,\ i = 1, \ldots, M_t \}.
\]
Besides these, there exist equivalent uniform dose (EUD) constraints. First, the EUD function $E_{t,\alpha} : \mathbb{R}^{M_t} \to \mathbb{R}$ is defined as
\[
E_{t,\alpha}(h_t) = \left( \frac{\sum_{i=1}^{M_t} (h_{t,i})^{\alpha}}{M_t} \right)^{1/\alpha}. \tag{2.2}
\]
For each target structure $t$, the parameter $\alpha$ is chosen negative and the EUD constraint is described by
\[
H_{EUD,t,\alpha} = \{ h_t \in \mathbb{R}^{M_t} \mid E_{t,\alpha}(h_t) \geq E^{\min}_{t,\alpha} \}, \quad \alpha < 0, \tag{2.3}
\]
where $E^{\min}_{t,\alpha}$ is given. For the structures at risk, the parameter is chosen $\alpha \geq 1$ and the EUD constraint is given by
\[
H_{EUD,t,\alpha} = \{ h_t \in \mathbb{R}^{M_t} \mid E_{t,\alpha}(h_t) \leq E^{\max}_{t,\alpha} \}, \quad \alpha \geq 1, \tag{2.4}
\]
where $E^{\max}_{t,\alpha}$ is also given. These EUD sets have been shown to be convex [4].
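For concreteness, the EUD function (2.2) is straightforward to evaluate numerically; the following is a minimal sketch (the function name and the NumPy dependency are our own choices, not from the paper):

```python
import numpy as np

def eud(h, alpha):
    """E_{t,alpha}(h_t) = ( sum_i (h_{t,i})^alpha / M_t )^(1/alpha), cf. (2.2)."""
    h = np.asarray(h, dtype=float)
    return float(np.mean(h ** alpha)) ** (1.0 / alpha)
```

For a uniform dose vector the EUD equals that dose for any $\alpha$; a negative $\alpha$ (targets) is pulled down by cold spots, while a large positive $\alpha$ (organs at risk) is pulled up by hot spots.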

Thus, we have a modified multiple-sets split feasibility problem (MMSFP), where some constraints are defined in the radiation intensity space $\mathbb{R}^N$ and other constraints are defined in the space $\mathbb{R}^{M_t}$ of each structure, $t = 1, \ldots, T$. The unified problem can be formulated as follows:
\[
\text{find } x \in \mathcal{X} \cap \Big( \bigcap_{i} C_i \Big) \quad \text{such that} \quad h_t = D_t x \in \bigcap_{j} H_{t,j}, \quad t = 1, \ldots, T,
\]
where $\mathcal{X}$ denotes the nonnegativity constraint while the $C_i$ represent other constraints; the $H_{t,j}$ denote the box constraints of type (2.1), the EUD constraints (2.3), (2.4), and other constraints; $D_t$ is the sub-matrix of $D$ corresponding to $h_t$.
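A tiny numerical illustration of this per-structure splitting (the matrix $D$, the intensities and the index sets below are invented for illustration only):

```python
import numpy as np

# Invented example: M = 4 voxels, N = 3 beamlets.
# D[i, j] = dose absorbed in voxel i per unit intensity of beamlet j.
D = np.array([[1.0, 0.0, 0.5],
              [0.2, 1.0, 0.0],
              [0.0, 0.3, 1.0],
              [0.4, 0.4, 0.4]])
x = np.array([1.0, 2.0, 1.0])      # intensity vector (nonnegative)

h = D @ x                           # total dose per voxel, h = Dx
S = {1: [0, 1], 2: [1, 2, 3]}       # voxel index sets S_t; voxel 1 lies in both structures
D1, D2 = D[S[1]], D[S[2]]           # sub-matrices D_t of D
h1, h2 = D1 @ x, D2 @ x             # per-structure dose vectors h_t = D_t x
```

Because the structures overlap, the sub-vectors $h_t$ together are larger than $h$, which is exactly the size trade-off discussed below.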

The model proposed here differs from the MSFP model
\[
\text{find } x \in \mathcal{X} \cap \Big( \bigcap_{i=1}^{t} C_i \Big) \quad \text{such that} \quad h = Dx \in \bigcap_{j=1}^{r} Q_j
\]
proposed by Censor et al. [2, 1] in the sense that we divide the dose space $\mathbb{R}^M$ into several sub-spaces $\mathbb{R}^{M_t}$, each representing a structure $t \in \{1, \ldots, T\}$, and the constraints are given separately in each sub-space. From the practical viewpoint, our model better reflects the IMRT setting. Although our model may increase the total size of the problem (the sum of the dimensions of the $h_t$ may be larger than the dimension of $h$ due to the overlap of the structures), the size of each $h_t$ is smaller than that of $h$. Moreover, by fully exploiting the separable structure of the modified model, we can use new parallel methods to solve it, e.g., the parallel ADM [20]. This can be seen in Section 3 of this paper.

It is worth mentioning the relationship between the MSFP and the MMSFP. First, the MSFP is the special case of the MMSFP obtained by letting $T = 1$ in the MMSFP model. Next, let
\[
H := \Big( \bigcap_{j} H_{1,j} \Big) \times \Big( \bigcap_{j} H_{2,j} \Big) \times \cdots \times \Big( \bigcap_{j} H_{T,j} \Big),
\]
\[
D := \begin{pmatrix} D_1 \\ \vdots \\ D_T \end{pmatrix}
\quad \text{and} \quad
h := \begin{pmatrix} h_1 \\ \vdots \\ h_T \end{pmatrix}.
\]
Then the MMSFP is equivalent to
\[
\text{find } x \in \mathcal{X} \cap \Big( \bigcap_{i} C_i \Big) \quad \text{such that} \quad h = Dx \in H,
\]
which is a special case of the MSFP.

In what follows, we consider the constrained MMSFP model, described in the following form:
\[
\text{find } x \in \mathcal{X} \cap C \quad \text{such that} \quad y_1 := A_1 x \in \mathcal{Y}_1 \cap Q_1,\ \ldots,\ y_L := A_L x \in \mathcal{Y}_L \cap Q_L,
\]
where $C := \bigcap_{i=1}^{t} C_i \subseteq \mathbb{R}^N$, $A_i \in \mathbb{R}^{M_i \times N}$ and $Q_i := \bigcap_{j=1}^{r_i} Q_{i,j} \subseteq \mathbb{R}^{M_i}$, $i = 1, \ldots, L$, and $\mathcal{X}$, $\mathcal{Y}_i$, $C_i$, $Q_{i,j}$ are all closed convex sets. In the rest of this paper we suppose that the constrained MMSFP has at least one solution.

For the MSFP model, Censor et al. [2] proposed the following proximal function to measure the distance from a point $x$ to the sets $C_i$ and $Q_j$:
\[
f(x) := \frac{1}{2} \sum_{i=1}^{t} a_i \| x - P_{C_i}(x) \|^2 + \frac{1}{2} \sum_{j=1}^{r} b_j \| Dx - P_{Q_j}(Dx) \|^2,
\]
where $\| \cdot \|$ is the l2-norm and $a_i > 0$ ($i = 1, \ldots, t$), $b_j > 0$ ($j = 1, \ldots, r$) are coefficients satisfying $\sum_{i=1}^{t} a_i + \sum_{j=1}^{r} b_j = 1$; $P_\Omega(\cdot)$ denotes the projection mapping from $\mathbb{R}^n$ onto the closed convex set $\Omega$, given by
\[
P_\Omega(x) := \arg\min \{ \| x - y \| \mid y \in \Omega \}.
\]

It is easy to see that if the MSFP has a solution, then the optimal value of this proximal function is zero, and vice versa. Censor et al. [2] proposed to solve the optimization problem
\[
\min \{ f(x) \mid x \in \mathcal{X} \}
\]
to get a solution of the constrained MSFP, and used a projection gradient method to solve it:
\[
x^{k+1} = P_{\mathcal{X}}(x^k - \alpha \nabla f(x^k)),
\]
where $\alpha > 0$ is the step-size and $\nabla f(x)$ is the gradient of $f$ at $x$, given by
\[
\nabla f(x) = \sum_{i=1}^{t} a_i (x - P_{C_i}(x)) + \sum_{j=1}^{r} b_j D^{\top} (Dx - P_{Q_j}(Dx)).
\]
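As an illustration, this projection gradient iteration can be run on a tiny instance with $t = r = 1$ and box-shaped sets, where all projections are closed-form clips (the data below are invented; the weights are $a_1 = b_1 = 1/2$ and the step-size respects $\alpha < 2/L_f$):

```python
import numpy as np

# Invented toy instance: X = R^2_+, C_1 = [0,2]^2, Q_1 = [1.5,2], D = A = (1 1).
A = np.array([[1.0, 1.0]])
a1, b1, alpha = 0.5, 0.5, 0.5     # alpha < 2/L_f with L_f = a1 + b1*||A^T A|| = 1.5

proj_box = lambda z, lo, hi: np.clip(z, lo, hi)

def grad_f(x):
    # grad f(x) = a1 (x - P_C(x)) + b1 A^T (Ax - P_Q(Ax))
    gC = a1 * (x - proj_box(x, 0.0, 2.0))
    Ax = A @ x
    gQ = b1 * (A.T @ (Ax - proj_box(Ax, 1.5, 2.0)))
    return gC + gQ

x = np.array([3.0, -1.0])
for _ in range(200):
    x = np.maximum(x - alpha * grad_f(x), 0.0)   # x^{k+1} = P_X(x^k - alpha grad f(x^k))
```

Starting from an infeasible point, the iterates drift to a point of $\mathcal{X} \cap C_1$ whose image lies in $Q_1$, i.e., the proximal function is driven to zero.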

Similarly, the proximal function for the MMSFP model is given by
\[
f(x) = \theta(x) + \sum_{i=1}^{L} \phi_i(A_i x) + \sum_{i=1}^{L} \| A_i x - P_{\mathcal{Y}_i}(A_i x) \|^2,
\]
where
\[
\theta(x) := \frac{1}{2} \sum_{i=1}^{t} a_i \| x - P_{C_i}(x) \|^2,
\]
\[
\phi_i(A_i x) := \frac{1}{2} \sum_{j=1}^{r_i} b_{i,j} \| A_i x - P_{Q_{i,j}}(A_i x) \|^2, \quad i = 1, \ldots, L.
\]
The optimization model is
\[
\min \{ f(x) \mid x \in \mathcal{X} \}.
\]

The projection gradient method can also be applied to solve this problem. However, using the projection method to solve the MMSFP is the same as solving the MSFP, and it cannot fully exploit the separable structure of the MMSFP model. In the next section, we review a parallel ADM for solving the MSFP model proposed by Zhang et al. [20], and then apply a modified version of this method to the MMSFP model. The following lemmas and definitions will be useful in the sequel.

Lemma 2.1 ([20]) Let $\Omega$ be a closed convex set in $\mathbb{R}^n$. Then for any $x, y \in \mathbb{R}^n$ and $z \in \Omega$, the following properties hold:

1. $\langle z - P_\Omega(x), P_\Omega(x) - x \rangle \geq 0$;

2. $\langle x - y, P_\Omega(x) - P_\Omega(y) \rangle \geq \| P_\Omega(x) - P_\Omega(y) \|^2$;

3. $\| P_\Omega(x) - z \|^2 \leq \| x - z \|^2 - \| P_\Omega(x) - x \|^2$.

Lemma 2.2 ([12], Theorem 2.3) Let $\Omega$ be a closed convex set in a Hilbert space and let $P_\Omega(x)$ be the projection of $x$ onto $\Omega$. Then
\[
\langle z - y, y - x \rangle \geq 0 \ \ \forall z \in \Omega \iff y = P_\Omega(x).
\]
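These projection properties are easy to sanity-check numerically; the sketch below verifies Lemma 2.1 for the box $[-1,1]^n$, whose projection is a componentwise clip (the set and the sampling setup are our own choices):

```python
import numpy as np

P = lambda z: np.clip(z, -1.0, 1.0)     # projection onto the box [-1, 1]^n

rng = np.random.default_rng(0)
ok = True
for _ in range(200):
    x, y = rng.normal(size=3), rng.normal(size=3)
    z = P(rng.normal(size=3))           # an arbitrary point of the set
    # Property 1: <z - P(x), P(x) - x> >= 0
    ok &= np.dot(z - P(x), P(x) - x) >= -1e-12
    # Property 2: <x - y, P(x) - P(y)> >= ||P(x) - P(y)||^2
    ok &= np.dot(x - y, P(x) - P(y)) >= np.sum((P(x) - P(y)) ** 2) - 1e-12
    # Property 3: ||P(x) - z||^2 <= ||x - z||^2 - ||P(x) - x||^2
    ok &= np.sum((P(x) - z) ** 2) <= np.sum((x - z) ** 2) - np.sum((P(x) - x) ** 2) + 1e-12
```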

Definition 2.1 ([20]) Let $F$ be a mapping defined on the closed convex set $\Omega \subseteq \mathbb{R}^n$. Then

1. $F$ is called monotone on $\Omega$ if
\[
\langle F(x) - F(y), x - y \rangle \geq 0, \quad \forall x, y \in \Omega;
\]

2. $F$ is called $\nu$-inverse strongly monotone ($\nu$-ism) on $\Omega$ if there exists a constant $\nu > 0$ such that
\[
\langle F(x) - F(y), x - y \rangle \geq \nu \| F(x) - F(y) \|^2, \quad \forall x, y \in \Omega;
\]

3. $F$ is called Lipschitz continuous on $\Omega$ if there exists a constant $L > 0$ such that
\[
\| F(x) - F(y) \| \leq L \| x - y \|, \quad \forall x, y \in \Omega.
\]

3 Parallel ADMs for the MSFP and MMSFP

In this section, we first review a parallel ADM for solving the constrained MSFP proposed by Zhang et al. [20] in Section 3.1, and then apply a modified version of this method to the MMSFP in Section 3.2. To deal with the difficulty of computing projections onto the constraint sets, we use the strategy of projection onto half-spaces; this is presented in Section 3.3. The convergence result is given in Section 3.4.


3.1 A Parallel ADM for solving the constrained MSFP problem

In [20], Zhang et al. considered the constrained MSFP model
\[
\text{find } x \in \mathcal{X} \cap \Big( \bigcap_{i=1}^{t} C_i \Big) \subseteq \mathbb{R}^N \quad \text{such that} \quad y = Ax \in \mathcal{Y} \cap \Big( \bigcap_{j=1}^{r} Q_j \Big) \subseteq \mathbb{R}^M,
\]
which is equivalent to the optimization model
\[
\min \{ \theta_1(x) + \theta_2(Ax) \mid x \in \mathcal{X},\ Ax \in \mathcal{Y} \}.
\]
By introducing a slack variable $y$, the model becomes
\[
\min \{ \theta_1(x) + \theta_2(y) \mid Ax = y,\ x \in \mathcal{X},\ y \in \mathcal{Y} \}. \tag{3.5}
\]

This linearly constrained model with a separable objective has been studied intensively in the recent literature [8, 9, 7, 19, 15, 11]. The Lagrangian dual of (3.5) is
\[
\max_{\lambda} \min_{x \in \mathcal{X},\, y \in \mathcal{Y}} L(x, y, \lambda) := \theta_1(x) + \theta_2(y) - \langle \lambda, Ax - y \rangle,
\]
where $\lambda$ is the Lagrangian multiplier. By the first-order optimality conditions, $(x^*, y^*)$ is a solution of (3.5) if and only if there exists $\lambda^*$ such that $\omega^* := (x^*, y^*, \lambda^*)$ satisfies the variational inequality system
\[
\begin{cases}
\langle x - x^*, \nabla\theta_1(x^*) - A^{\top}\lambda^* \rangle \geq 0, & \forall x \in \mathcal{X}, \\
\langle y - y^*, \nabla\theta_2(y^*) + \lambda^* \rangle \geq 0, & \forall y \in \mathcal{Y}, \\
Ax^* - y^* = 0.
\end{cases} \tag{3.6}
\]

The classical ADM for this optimization problem generates the new iterate $\omega^{k+1} = (x^{k+1}, y^{k+1}, \lambda^{k+1})$ by solving the subproblems
\[
\begin{cases}
\langle x - x^{k+1}, \nabla\theta_1(x^{k+1}) - A^{\top}(\lambda^k - \beta(Ax^{k+1} - y^k)) \rangle \geq 0, & \forall x \in \mathcal{X}, \\
\langle y - y^{k+1}, \nabla\theta_2(y^{k+1}) + \lambda^k - \beta(Ax^{k+1} - y^{k+1}) \rangle \geq 0, & \forall y \in \mathcal{Y}, \\
\lambda^{k+1} = \lambda^k - \beta(Ax^{k+1} - y^{k+1}),
\end{cases} \tag{3.7}
\]
where $\beta > 0$ is the penalty parameter of the linear constraint. Unfortunately, solving (3.7) is as hard as solving (3.6). By Lemma 2.2, the first two variational inequalities of (3.7) are equivalent to
\[
x^{k+1} = P_{\mathcal{X}}\big[ x^{k+1} - \tau\big( \nabla\theta_1(x^{k+1}) - A^{\top}[\lambda^k - \beta(Ax^{k+1} - y^k)] \big) \big],
\]
\[
y^{k+1} = P_{\mathcal{Y}}\big[ y^{k+1} - \sigma\big( \nabla\theta_2(y^{k+1}) + [\lambda^k - \beta(Ax^{k+1} - y^{k+1})] \big) \big],
\]

where $\tau > 0$ and $\sigma > 0$ are two parameters. These systems are still hard to solve because of their implicit form. Now, replacing $x^{k+1}$ by $x^k$ on the right-hand side of the first equation, and similarly replacing $y^{k+1}$ by $y^k$ on the right-hand side of the second equation, one obtains two explicit assignments that can be evaluated easily. To ensure the global convergence of the method, a correction step must be added after generating $\omega^{k+1}$, and the algorithm becomes

Algorithm 3.1 (A Parallel ADM for the constrained MSFP)

Prediction step: generate the trial iterate $\bar\omega^k := (\bar x^k, \bar y^k, \bar\lambda^k)$ by
\[
\begin{cases}
\bar x^k = P_{\mathcal{X}}\big[ x^k - \tau_k\big( \nabla\theta_1(x^k) - A^{\top}[\lambda^k - \beta(Ax^k - y^k)] \big) \big], \\
\bar y^k = P_{\mathcal{Y}}\big[ y^k - \sigma_k\big( \nabla\theta_2(y^k) + [\lambda^k - \beta(A\bar x^k - y^k)] \big) \big], \\
\bar\lambda^k = \lambda^k - \beta(A\bar x^k - \bar y^k).
\end{cases}
\]

Correction step: generate the new iterate $\omega^{k+1}$ by
\[
\omega^{k+1} = \omega^k - \gamma_k\, d(\omega^k, \bar\omega^k),
\]
where $\tau_k > 0$, $\sigma_k > 0$ and $\gamma_k > 0$ are step-sizes satisfying certain conditions, and $d(\omega^k, \bar\omega^k)$ is a certain residual function of $\omega^k$ and $\bar\omega^k$.

To employ the idea of parallel computing, the $y$-related subproblem of the prediction step can be replaced by
\[
\bar y^k = P_{\mathcal{Y}}\big[ y^k - \sigma_k\big( \nabla\theta_2(y^k) + [\lambda^k - \beta(Ax^k - y^k)] \big) \big].
\]
The only difference is that the term $A\bar x^k$ is replaced by $Ax^k$; thus $\bar x^k$ and $\bar y^k$ can be computed simultaneously.

To get suitable step-sizes $\tau_k$ and $\sigma_k$, Zhang et al. [20] used an Armijo-like search rule, and then proved the global convergence of the algorithm. These are the main ideas of their paper. In the next subsection, we give a modified version of Algorithm 3.1 for the constrained MMSFP model.
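For intuition, the scheme (3.7) can be made fully explicit in a special case of our own choosing (not the setting of [20]): take $\theta_1(x) = \frac{1}{2}\|x - p\|^2$, $\theta_2(y) = \frac{1}{2}\|y - q\|^2$ and $\mathcal{X} = \mathbb{R}^N$, $\mathcal{Y} = \mathbb{R}^M$, so each ADM subproblem has a closed-form solution:

```python
import numpy as np

# Invented quadratic instance: min 0.5||x-p||^2 + 0.5||y-q||^2  s.t.  Ax = y.
A = np.array([[1.0, 1.0]])
p, q, beta = np.array([1.0, 0.0]), np.array([3.0]), 1.0

x, y, lam = np.zeros(2), np.zeros(1), np.zeros(1)
Kx = np.linalg.inv(np.eye(2) + beta * A.T @ A)    # x-subproblem system matrix
for _ in range(200):
    # x-step: (I + beta A^T A) x = p + A^T (lam + beta y)
    x = Kx @ (p + A.T @ (lam + beta * y))
    # y-step: (1 + beta) y = q - lam + beta A x
    y = (q - lam + beta * (A @ x)) / (1.0 + beta)
    # multiplier update: lam <- lam - beta (Ax - y)
    lam = lam - beta * (A @ x - y)

# Eliminating y gives the equivalent problem min 0.5||x-p||^2 + 0.5||Ax-q||^2,
# whose solution is x* = (I + A^T A)^{-1} (p + A^T q).
xstar = np.linalg.solve(np.eye(2) + A.T @ A, p + A.T @ q)
```

In the general nonsmooth setting of this paper such closed forms are unavailable, which is exactly why the prediction-correction strategy of Algorithm 3.1 is needed.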

3.2 The constrained MMSFP problem: modified parallel ADM I

In this subsection, we apply Algorithm 3.1 to the constrained MMSFP. By making good use of the separable structure of the MMSFP, we propose a modified parallel algorithm for solving it. Recall the constrained MMSFP model:
\[
\text{find } x \in \mathcal{X} \cap C \quad \text{such that} \quad y_1 := A_1 x \in \mathcal{Y}_1 \cap Q_1,\ \ldots,\ y_L := A_L x \in \mathcal{Y}_L \cap Q_L, \tag{3.8}
\]
where $C := \bigcap_{i=1}^{t} C_i \subseteq \mathbb{R}^N$, $A_i \in \mathbb{R}^{M_i \times N}$ and $Q_i := \bigcap_{j=1}^{r_i} Q_{i,j} \subseteq \mathbb{R}^{M_i}$, $i = 1, \ldots, L$. Let the proximal function be
\[
p(x, y_1, \ldots, y_L) = \theta(x) + \sum_{i=1}^{L} \phi_i(y_i),
\]
where
\[
\theta(x) := \frac{1}{2} \sum_{i=1}^{t} a_i \| x - P_{C_i}(x) \|^2, \tag{3.9}
\]
\[
\phi_i(y_i) := \frac{1}{2} \sum_{j=1}^{r_i} b_{i,j} \| y_i - P_{Q_{i,j}}(y_i) \|^2, \quad i = 1, \ldots, L. \tag{3.10}
\]
Here we no longer require the coefficients $a_i$ and $b_{i,j}$ to satisfy $\sum_{i=1}^{t} a_i + \sum_{i=1}^{L} \sum_{j=1}^{r_i} b_{i,j} = 1$; the reason will become clear later (see Remark 3.2).

Now, finding a solution to the MMSFP is equivalent to solving the optimization problem
\[
\begin{aligned}
\min \ & p(x, y_1, \ldots, y_L) \\
\text{s.t.} \ & x \in \mathcal{X}, \quad A_i x = y_i,\ \ y_i \in \mathcal{Y}_i,\ \ i = 1, \ldots, L.
\end{aligned} \tag{3.11}
\]


Lemma 3.3 ([20], Lemma 3) Let $\theta$ and $\phi_i$ be defined as above. Then $\nabla\theta$ and $\nabla\phi_i$ are inverse strongly monotone and Lipschitz continuous on $\mathcal{X}$ and $\mathcal{Y}_i$, $i = 1, \ldots, L$, where $\nabla$ is the gradient operator. More specifically,
\[
\| \nabla\theta(x_1) - \nabla\theta(x_2) \| \leq L_x \| x_1 - x_2 \|, \qquad
\langle \nabla\theta(x_1) - \nabla\theta(x_2), x_1 - x_2 \rangle \geq \frac{1}{L_x} \| \nabla\theta(x_1) - \nabla\theta(x_2) \|^2,
\]
where $L_x = \sum_{i=1}^{t} a_i$ and $x_1, x_2 \in \mathcal{X}$; and
\[
\| \nabla\phi_i(y_1) - \nabla\phi_i(y_2) \| \leq L_{y_i} \| y_1 - y_2 \|, \qquad
\langle \nabla\phi_i(y_1) - \nabla\phi_i(y_2), y_1 - y_2 \rangle \geq \frac{1}{L_{y_i}} \| \nabla\phi_i(y_1) - \nabla\phi_i(y_2) \|^2,
\]
where $L_{y_i} = \sum_{j=1}^{r_i} b_{i,j}$ and $y_1, y_2 \in \mathcal{Y}_i$, $i = 1, \ldots, L$.

For positive parameters $\tau, \beta, \sigma_1, \ldots, \sigma_L$, define
\[
M := \mathrm{diag}\big( \tau I_N,\ \sigma_1 I_{M_1},\ \ldots,\ \sigma_L I_{M_L},\ \tfrac{1}{\beta} I_{M_1},\ \ldots,\ \tfrac{1}{\beta} I_{M_L} \big),
\]
where $I$ denotes the identity matrix of the indicated size. Let $\omega := (x, y_1, \ldots, y_L, \lambda_1, \ldots, \lambda_L)$ and define
\[
F(\omega) := \begin{pmatrix}
\nabla\theta(x) - \sum_{i=1}^{L} A_i^{\top} \lambda_i \\
\nabla\phi_1(y_1) + \lambda_1 \\
\vdots \\
\nabla\phi_L(y_L) + \lambda_L \\
A_1 x - y_1 \\
\vdots \\
A_L x - y_L
\end{pmatrix},
\qquad
\xi(\omega, \bar\omega) := \begin{pmatrix}
\nabla\theta(x) - \nabla\theta(\bar x) \\
\nabla\phi_1(y_1) - \nabla\phi_1(\bar y_1) \\
\vdots \\
\nabla\phi_L(y_L) - \nabla\phi_L(\bar y_L) \\
0
\end{pmatrix},
\]
\[
d(\omega, \bar\omega) := M(\omega - \bar\omega) - \xi(\omega, \bar\omega),
\]
\[
\varphi(\omega, \bar\omega) := \langle \omega - \bar\omega, d(\omega, \bar\omega) \rangle + \sum_{i=1}^{L} \big\langle \lambda_i - \bar\lambda_i,\ A_i(x - \bar x) - (y_i - \bar y_i) \big\rangle.
\]

In this subsection, we assume that the projections onto the sets $\mathcal{X}$, $\mathcal{Y}_i$, $C_i$, $Q_{i,j}$ are all easy to compute.

Algorithm 3.2 (A Modified Parallel ADM for the constrained MMSFP, I)

Given arbitrary $\nu \in (0, 1)$, $\beta > 0$, $\mu > 1$, $\gamma \in (0, 2)$, $\tau_0 > 0$, $\sigma_0 > 0$ and $\omega^0 := (x^0, y_1^0, \ldots, y_L^0, \lambda_1^0, \ldots, \lambda_L^0)$. Let $\varepsilon > 0$ be the error tolerance for an approximate solution and set $k = 1$.

Step 1. Prediction step: generate the trial iterates $\bar x^k, \bar y_1^k, \ldots, \bar y_L^k$ simultaneously via Steps 1.1 and 1.2, respectively, and generate $\bar\lambda_1^k, \ldots, \bar\lambda_L^k$ simultaneously via Step 1.3.

Step 1.1. Find the smallest nonnegative integer $l_k$ such that $\tau_k = \mu^{l_k}\tau_{k-1}$ and
\[
\bar x^k = P_{\mathcal{X}}\Big[ x^k - \frac{1}{\tau_k}\Big( \nabla\theta(x^k) - \sum_{i=1}^{L} A_i^{\top}\big( \lambda_i^k - \beta(A_i x^k - y_i^k) \big) \Big) \Big] \tag{3.12}
\]
satisfies
\[
\langle x^k - \bar x^k, \nabla\theta(x^k) - \nabla\theta(\bar x^k) \rangle + \beta \sum_{i=1}^{L} \| A_i x^k - A_i \bar x^k \|^2 \leq \tau_k \nu \| x^k - \bar x^k \|^2. \tag{3.13}
\]

Step 1.2. For $i = 1, \ldots, L$, find the smallest nonnegative integer $m_i^k$ such that $\sigma_i^k = \mu^{m_i^k}\sigma_i^{k-1}$ and
\[
\bar y_i^k = P_{\mathcal{Y}_i}\Big[ y_i^k - \frac{1}{\sigma_i^k}\Big( \nabla\phi_i(y_i^k) + \big( \lambda_i^k - \beta(A_i x^k - y_i^k) \big) \Big) \Big]
\]
satisfies
\[
\langle y_i^k - \bar y_i^k, \nabla\phi_i(y_i^k) - \nabla\phi_i(\bar y_i^k) \rangle + \beta \| y_i^k - \bar y_i^k \|^2 \leq \sigma_i^k \nu \| y_i^k - \bar y_i^k \|^2.
\]

Step 1.3. For $i = 1, \ldots, L$, calculate $\bar\lambda_i^k$ via
\[
\bar\lambda_i^k = \lambda_i^k - \beta(A_i \bar x^k - \bar y_i^k).
\]

Step 2. Correction step: generate the new iterate $\omega^{k+1}$ via
\[
\omega^{k+1} = \omega^k - \gamma \alpha_k^* d(\omega^k, \bar\omega^k), \quad \text{where} \quad \alpha_k^* = \frac{\varphi(\omega^k, \bar\omega^k)}{\| d(\omega^k, \bar\omega^k) \|^2}.
\]

Step 3. If
\[
p(x^{k+1}, y_1^{k+1}, \ldots, y_L^{k+1}) \leq \varepsilon,
\]
then stop; otherwise set $k := k + 1$ and go to Step 1.

Remark 3.1 In the algorithm, we can first compute $\bar x^k$ and $\bar y_i^k$, $i = 1, \ldots, L$, in parallel, and then compute $\bar\lambda_i^k$, $i = 1, \ldots, L$, in parallel. As in [20], the search for suitable parameters $\tau_k$, $\sigma_i^k$ terminates in a finite number of steps, and the sequence $\{\tau_k\}$ is bounded above by a positive number $\tau_{\max} > 0$. Similarly, the sequences $\{\sigma_i^k\}$ are bounded above by positive numbers $\sigma_i^{\max} > 0$.

Remark 3.2 Recall that we dropped the assumption that the coefficients $a_i$ and $b_{i,j}$ sum to 1. Now suppose
\[
\sum_{i=1}^{t} a_i + \sum_{i=1}^{L} \sum_{j=1}^{r_i} b_{i,j} = C
\]
and let $\bar a_i = a_i / C$, $\bar b_{i,j} = b_{i,j} / C$. Then
\[
\sum_{i=1}^{t} \bar a_i + \sum_{i=1}^{L} \sum_{j=1}^{r_i} \bar b_{i,j} = 1.
\]
Let $\nabla\bar\theta(x) = \nabla\theta(x)/C = \sum_{i=1}^{t} \bar a_i (I - P_{C_i})(x)$. Then $L_x = C \bar L_x$, where $L_x$ and $\bar L_x$ denote the Lipschitz constants of $\nabla\theta$ and $\nabla\bar\theta$, respectively. Since
\[
\langle x^k - \bar x^k, \nabla\theta(x^k) - \nabla\theta(\bar x^k) \rangle + \beta \sum_{i=1}^{L} \| A_i x^k - A_i \bar x^k \|^2
\leq \Big( L_x + \beta \sum_{i=1}^{L} \| A_i^{\top} A_i \| \Big) \| x^k - \bar x^k \|^2
= \Big( C \bar L_x + \beta \sum_{i=1}^{L} \| A_i^{\top} A_i \| \Big) \| x^k - \bar x^k \|^2,
\]
the inequality (3.13) holds as long as $\tau_k \geq \big( C \bar L_x + \beta \sum_{i=1}^{L} \| A_i^{\top} A_i \| \big)/\nu$. Replacing $\nabla\theta(x^k)$ by $C \nabla\bar\theta(x^k)$ and $\tau_k$ by $\big( C \bar L_x + \beta \sum_{i=1}^{L} \| A_i^{\top} A_i \| \big)/\nu$ in (3.12), $\bar x^k$ can be obtained by
\[
\bar x^k = P_{\mathcal{X}}\Big[ x^k - \frac{\nu}{C \bar L_x + \beta \sum_{i=1}^{L} \| A_i^{\top} A_i \|}\Big( C \nabla\bar\theta(x^k) - \sum_{i=1}^{L} A_i^{\top}\big( \lambda_i^k - \beta(A_i x^k - y_i^k) \big) \Big) \Big]. \tag{3.14}
\]
We see that the step-size varies with $C$, which may affect the convergence speed. More specifically, the step-size multiplying $\nabla\bar\theta(x)$ is
\[
\frac{\nu C}{C \bar L_x + \beta \sum_{i=1}^{L} \| A_i^{\top} A_i \|}
= \frac{\nu}{\bar L_x} - \frac{\nu \beta \sum_{i=1}^{L} \| A_i^{\top} A_i \|}{\bar L_x \big( C \bar L_x + \beta \sum_{i=1}^{L} \| A_i^{\top} A_i \| \big)}.
\]
Hence a suitably large $C$ may speed up the convergence. This is unlike the projection gradient method with constant step-size used by Censor et al. [2]:
\[
x^{k+1} = P_{\mathcal{X}}[x^k - \alpha \nabla f(x^k)], \tag{3.15}
\]
where $f(x) = \frac{1}{2}\sum_{i=1}^{t} a_i \| x - P_{C_i}(x) \|^2 + \frac{1}{2}\sum_{j=1}^{r} b_j \| Ax - P_{Q_j}(Ax) \|^2$ and $\alpha \in [0, 2/L_f)$, with $L_f$ the Lipschitz constant of $\nabla f$. In that case, if we multiply $\nabla f$ by a constant $C$, then $L_f$ is also multiplied by $C$ and $\alpha$ is restricted to $[0, 2/(C L_f))$, so the effective step remains the same as in (3.15).

Algorithm 3.2 modifies Algorithm 1 of [20] in Step 1.2, which does not affect the convergence of the algorithm. Thus we state the convergence result here and omit the proof.

Theorem 3.1 The sequence $\{\omega^k\}$ generated by Algorithm 3.2 converges to a solution of the constrained MMSFP problem (3.8).
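The prediction-correction structure of Algorithm 3.2 can be sketched numerically. The code below is our own simplification, not the paper's implementation: it takes $L = 1$, invented toy data, and fixed step sizes $\tau$, $\sigma$ chosen large enough that the Armijo-type condition (3.13) and its $y$-analogue hold automatically, replacing the search; it then checks the Fejér-monotonicity property of the iterates (cf. Theorem 3.2 in Section 3.4) against a known solution.

```python
import numpy as np

# Toy constrained MMSFP with L = 1 (an invented instance):
# X = R^2_+, Y = R, C = [0,2]^2 (t = 1, a_1 = 1), Q = [1.5,2] (r_1 = 1, b_11 = 1).
A = np.array([[1.0, 1.0]])
beta, gamma, nu = 1.0, 1.5, 0.9
tau = (1.0 + beta * np.linalg.norm(A.T @ A, 2)) / nu   # >= (L_x + beta ||A^T A||)/nu
sigma = (1.0 + beta) / nu                               # >= (L_y + beta)/nu

PX = lambda z: np.maximum(z, 0.0)
PC = lambda z: np.clip(z, 0.0, 2.0)
PQ = lambda z: np.clip(z, 1.5, 2.0)
gth = lambda z: z - PC(z)       # grad of theta(x) = 0.5 ||x - P_C(x)||^2
gph = lambda z: z - PQ(z)       # grad of phi(y)   = 0.5 ||y - P_Q(y)||^2

x, y, lam = np.array([10.0, 10.0]), np.array([0.0]), np.array([5.0])
wstar = np.concatenate([[1.0, 1.0], [2.0], [0.0]])      # a solution (x*, y*, lambda*)

dists = []
for _ in range(5000):
    dists.append(np.linalg.norm(np.concatenate([x, y, lam]) - wstar))
    r = A @ x - y
    xb = PX(x - (gth(x) - A.T @ (lam - beta * r)) / tau)   # parallel prediction
    yb = y - (gph(y) + (lam - beta * r)) / sigma           # P_Y = identity here
    lb = lam - beta * (A @ xb - yb)
    dx = tau * (x - xb) - (gth(x) - gth(xb))               # d(omega, omega_bar)
    dy = sigma * (y - yb) - (gph(y) - gph(yb))
    dl = (lam - lb) / beta
    phi = dx @ (x - xb) + dy @ (y - yb) + dl @ (lam - lb) \
          + (lam - lb) @ (A @ (x - xb) - (y - yb))
    nd = dx @ dx + dy @ dy + dl @ dl
    if nd < 1e-20:
        break
    step = gamma * phi / nd                                 # gamma * alpha_k^*
    x, y, lam = x - step * dx, y - step * dy, lam - step * dl
```

On this instance the distance to the solution never increases and the final iterate is (nearly) feasible, in line with the convergence theory.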

3.3 The constrained MMSFP problem: modified parallel ADM II

In this subsection, we deal with the case in which projections onto $C_i$ and $Q_{i,j}$ cannot be computed easily, while projections onto $\mathcal{X}$ and $\mathcal{Y}_i$ are still easy. For example, computing projections onto the EUD constraint sets defined in (2.3) and (2.4) may be time-consuming. To overcome this difficulty, we compute projections onto certain half-spaces containing the original sets instead of projecting onto the sets directly. This idea was used by Yang, and by Qu and Xiu, for solving the SFP problem [18, 13], and by Censor et al., and by Zhao and Yang, for solving the MSFP problem [21, 3].

To this end, it is necessary to suppose that the convex sets $C_i$ and $Q_{i,j}$ satisfy the following assumptions:

• The sets $C_i$, $i = 1, 2, \ldots, t$, are given by
\[
C_i = \{ x \in \mathbb{R}^N \mid c_i(x) \leq 0 \},
\]
where $c_i : \mathbb{R}^N \to \mathbb{R}$, $i = 1, 2, \ldots, t$, are convex functions.

• The sets $Q_{i,j}$, $i = 1, \ldots, L$, $j = 1, 2, \ldots, r_i$, are given by
\[
Q_{i,j} = \{ y \in \mathbb{R}^{M_i} \mid q_{i,j}(y) \leq 0 \},
\]
where $q_{i,j} : \mathbb{R}^{M_i} \to \mathbb{R}$, $i = 1, \ldots, L$, $j = 1, 2, \ldots, r_i$, are convex functions.

• For any $x \in \mathbb{R}^N$, at least one subgradient $\xi \in \partial c_i(x)$, $i = 1, \ldots, t$, can be calculated, where $\partial c_i(x)$ is the subdifferential of $c_i$ at $x$:
\[
\partial c_i(x) = \{ \xi \in \mathbb{R}^N \mid c_i(z) \geq c_i(x) + \langle \xi, z - x \rangle \ \text{for all } z \in \mathbb{R}^N \}.
\]
For any $y \in \mathbb{R}^{M_i}$, $i = 1, \ldots, L$, at least one subgradient $\eta \in \partial q_{i,j}(y)$, $j = 1, \ldots, r_i$, can be calculated, where
\[
\partial q_{i,j}(y) = \{ \eta \in \mathbb{R}^{M_i} \mid q_{i,j}(u) \geq q_{i,j}(y) + \langle \eta, u - y \rangle \ \text{for all } u \in \mathbb{R}^{M_i} \}.
\]

In the $k$-th iteration of our algorithm, let
\[
C_i^k = \{ x \in \mathbb{R}^N \mid c_i(x^k) + \langle \xi^k, x - x^k \rangle \leq 0 \},
\]
where $\xi^k$ is an element of $\partial c_i(x^k)$, $i = 1, \ldots, t$, and
\[
Q_{i,j}^k = \{ y \in \mathbb{R}^{M_i} \mid q_{i,j}(y_i^k) + \langle \eta^k, y - y_i^k \rangle \leq 0 \},
\]
where $\eta^k$ is an element of $\partial q_{i,j}(y_i^k)$, $i = 1, \ldots, L$, $j = 1, \ldots, r_i$. It is not hard to verify that $C_i^k$ and $Q_{i,j}^k$ contain $C_i$ and $Q_{i,j}$, respectively, for all $k \geq 1$.
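The projection onto such a half-space has a simple closed form, which is the computational payoff of this strategy. A minimal sketch (the set $C = \{x : \|x\|^2 - 1 \leq 0\}$ and all numbers are invented for illustration):

```python
import numpy as np

def proj_halfspace(z, xk, c_xk, xi):
    """Project z onto C^k = {x : c(x^k) + <xi, x - x^k> <= 0}, assuming xi != 0."""
    viol = c_xk + xi @ (z - xk)
    if viol <= 0.0:
        return z                        # z already lies in the half-space
    return z - (viol / (xi @ xi)) * xi  # move along the normal direction xi

# Example: C = unit ball {x : c(x) <= 0} with c(x) = ||x||^2 - 1, linearized at x^k.
c = lambda x: x @ x - 1.0
xk = np.array([2.0, 0.0])
xi = 2.0 * xk                           # gradient (hence a subgradient) of c at x^k
p = proj_halfspace(np.array([3.0, 1.0]), xk, c(xk), xi)
```

The convexity of $c$ guarantees that the linearization underestimates $c$, so the half-space $C^k$ always contains $C$ itself.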

Define
\[
\theta_k(x) := \frac{1}{2} \sum_{i=1}^{t} a_i \| x - P_{C_i^k}(x) \|^2,
\qquad
\phi_i^k(y_i) := \frac{1}{2} \sum_{j=1}^{r_i} b_{i,j} \| y_i - P_{Q_{i,j}^k}(y_i) \|^2.
\]
Lemma 3.3 also holds for $\nabla\theta_k$ and $\nabla\phi_i^k$.

Algorithm 3.3 (A Modified Parallel ADM for the constrained MMSFP, II)

Given arbitrary $\nu \in (0, 1)$, $\beta > 0$, $\mu > 1$, $\gamma \in (0, 2)$, $\tau_0 > 0$, $\sigma_0 > 0$ and $\omega^0 := (x^0, y_1^0, \ldots, y_L^0, \lambda_1^0, \ldots, \lambda_L^0)$. Let $\varepsilon > 0$ be the error tolerance for an approximate solution and set $k = 1$.

Step 1. Prediction step: generate the trial iterates $\bar x^k, \bar y_1^k, \ldots, \bar y_L^k$ simultaneously via Steps 1.1 and 1.2, respectively, and generate $\bar\lambda_1^k, \ldots, \bar\lambda_L^k$ simultaneously via Step 1.3.

Step 1.1. Find the smallest nonnegative integer $l_k$ such that $\tau_k = \mu^{l_k}\tau_{k-1}$ and
\[
\bar x^k = P_{\mathcal{X}}\Big[ x^k - \frac{1}{\tau_k}\Big( \nabla\theta_k(x^k) - \sum_{i=1}^{L} A_i^{\top}\big( \lambda_i^k - \beta(A_i x^k - y_i^k) \big) \Big) \Big]
\]
satisfies
\[
\langle x^k - \bar x^k, \nabla\theta_k(x^k) - \nabla\theta_k(\bar x^k) \rangle + \beta \sum_{i=1}^{L} \| A_i x^k - A_i \bar x^k \|^2 \leq \tau_k \nu \| x^k - \bar x^k \|^2. \tag{3.16}
\]

Step 1.2. For $i = 1, \ldots, L$, find the smallest nonnegative integer $m_i^k$ such that $\sigma_i^k = \mu^{m_i^k}\sigma_i^{k-1}$ and
\[
\bar y_i^k = P_{\mathcal{Y}_i}\Big[ y_i^k - \frac{1}{\sigma_i^k}\Big( \nabla\phi_i^k(y_i^k) + \big( \lambda_i^k - \beta(A_i x^k - y_i^k) \big) \Big) \Big]
\]
satisfies
\[
\langle y_i^k - \bar y_i^k, \nabla\phi_i^k(y_i^k) - \nabla\phi_i^k(\bar y_i^k) \rangle + \beta \| y_i^k - \bar y_i^k \|^2 \leq \sigma_i^k \nu \| y_i^k - \bar y_i^k \|^2. \tag{3.17}
\]

Step 1.3. For $i = 1, \ldots, L$, calculate $\bar\lambda_i^k$ via
\[
\bar\lambda_i^k = \lambda_i^k - \beta(A_i \bar x^k - \bar y_i^k).
\]

Step 2. Correction step: generate the new iterate $\omega^{k+1}$ via
\[
\omega^{k+1} = \omega^k - \gamma \alpha_k^* d_k(\omega^k, \bar\omega^k), \quad \text{where} \quad \alpha_k^* = \frac{\varphi_k(\omega^k, \bar\omega^k)}{\| d_k(\omega^k, \bar\omega^k) \|^2}.
\]

Step 3. If
\[
\max\{ c_i(x^{k+1}),\ q_{i,j}(y_i^{k+1}) \} \leq \varepsilon,
\]
then stop; otherwise set $k := k + 1$ and go to Step 1.

Remark 3.3 In the algorithm, $F_k$, $\xi_k$, $\varphi_k$ and $d_k$ are defined like $F$, $\xi$, $\varphi$ and $d$ in Subsection 3.2, with $\nabla\theta$ and $\nabla\phi_i$ replaced by $\nabla\theta_k$ and $\nabla\phi_i^k$, respectively. This is the only difference between Algorithm 3.2 and Algorithm 3.3.

3.4 Convergence

In this subsection, we give the convergence result for Algorithm 3.3. For simplicity, we restrict $L = 1$ and omit the subscript "i" in $y_i$, $A_i$ and $Q_{i,j}$; for the general case, the proof is the same.

Since $\nabla\theta_k(x)$ and $\nabla\phi_k(y)$ have the same properties as $\nabla\theta(x)$ and $\nabla\phi(y)$ (the only difference is the convex sets onto which one projects), the following two lemmas from [20] still hold here; the first depends basically on the monotonicity of $F_k$, while the second depends on the Armijo-like search rules (3.16) and (3.17).

Lemma 3.4 ([20], Lemma 4) Suppose that $\omega^* := (x^*, y^*, \lambda^*)$ is a solution of problem (3.8), and the sequences $\{\omega^k\}$, $\{\bar\omega^k\}$ are generated by Algorithm 3.3. Then we have
\[
\langle \omega^k - \omega^*, d_k(\omega^k, \bar\omega^k) \rangle \geq \varphi_k(\omega^k, \bar\omega^k).
\]

Proof. Replacing $\nabla\theta$ and $\nabla\phi$ by $\nabla\theta_k$ and $\nabla\phi_k$, respectively, in the proof of Lemma 4 of [20] and repeating the argument, we get the result.


Lemma 3.5 ([20], Theorem 1) The following inequality holds:
\[
\varphi_k(\omega^k, \bar\omega^k) \geq \varrho \| \omega^k - \bar\omega^k \|^2, \quad \forall k \geq 1,
\]
where $\nu$, $\beta$, $\tau_0$, $\sigma_0$ are defined in Algorithm 3.3 and
\[
\varrho = \min\Big\{ (1 - \nu)\tau_0,\ (1 - \nu)\sigma_0,\ \frac{1}{2\beta} \Big\}.
\]

Using the lemmas above, one obtains

Theorem 3.2 ([20], Theorem 2) Suppose that ω* := (x*, y*, λ*) is a solution of problem (3.8). Then the sequence {ω^k} is Fejér monotone, i.e., there exists a constant C > 0 such that

‖ω^{k+1} − ω*‖² ≤ ‖ω^k − ω*‖² − C‖ω^k − ω̃^k‖².

This theorem shows that the sequence {‖ω^k − ω*‖} decreases monotonically and is bounded below, and that {ω^k} is bounded. Thus we have

lim_{k→∞} ‖ω^k − ω̃^k‖ = 0,

which implies that the sequences {ω^k} and {ω̃^k} have the same cluster points. Without loss of generality, suppose ω̄ is a cluster point of {ω^k} and ω^{k_l} → ω̄ as l → ∞. Denote

e(x, α) := x − P_X(x − αg) (3.18)

and

f(y, α) := y − P_Y(y − αh),

where g and h are given vectors associated with x and y, respectively (they are specified below).

Lemma 3.6 ([13, 21]) For any x ∈ ℜ^N and α > 0, we have

min{1, α}‖e(x, 1)‖ ≤ ‖e(x, α)‖ ≤ max{1, α}‖e(x, 1)‖.
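Lemma 3.6 can be checked numerically for a concrete projection; the Python/NumPy sketch below verifies both inequalities for several values of α, using an arbitrary box X and random test vectors (none of this data comes from the paper):

```python
import numpy as np

# Numerical check of Lemma 3.6 for the box X = {x : -1 <= x <= 1}:
# with e(x, alpha) = x - P_X(x - alpha * g), one should have
# min{1, alpha} * ||e(x,1)|| <= ||e(x,alpha)|| <= max{1, alpha} * ||e(x,1)||.
# The box, x and g are illustrative test data, not taken from the paper.

def proj_box(v, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^N (componentwise clipping)."""
    return np.clip(v, lo, hi)

def e(x, g, alpha):
    """Projection residual e(x, alpha) = x - P_X(x - alpha * g)."""
    return x - proj_box(x - alpha * g)

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
g = rng.standard_normal(8)

e1 = np.linalg.norm(e(x, g, 1.0))
for alpha in (0.2, 0.8, 1.0, 2.5, 10.0):
    ea = np.linalg.norm(e(x, g, alpha))
    assert min(1.0, alpha) * e1 <= ea + 1e-12
    assert ea <= max(1.0, alpha) * e1 + 1e-12
```

The lemma holds for projection onto any closed convex set; the box is chosen here only because its projection is a one-line clipping.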

By letting g := ∇θ_k(x^k) − A^T(λ^k − β(Ax^k − y^k)) in (3.18), we get

lim_{k→∞} ‖e(x^k, 1)‖ ≤ lim_{k→∞} ‖x^k − x̃^k‖ / min{1, 1/τ_k} ≤ lim_{k→∞} ‖x^k − x̃^k‖ / min{1, 1/τ_max} = 0. (3.19)

The same conclusion also holds for f(y^k, 1).

Lemma 3.7 ([13], Lemma 4.1) Suppose h : ℜ^n → ℜ is a convex function. Then it is subdifferentiable everywhere and its subdifferentials are uniformly bounded subsets of ℜ^n.

In what follows we still denote g^k := ∇θ_k(x^k) − A^T(λ^k − β(Ax^k − y^k)). Let ω* be a solution of (MMSFP). Then we obtain from Lemma 2.1 that for any l ≥ 1,

〈x^{k_l} − g^{k_l} − P_X(x^{k_l} − g^{k_l}), P_X(x^{k_l} − g^{k_l}) − x*〉 ≥ 0,


which means

〈e(x^{k_l}, 1) − g^{k_l}, x^{k_l} − x* − e(x^{k_l}, 1)〉 ≥ 0

and hence

〈g^{k_l}, x^{k_l} − x*〉 ≤ 〈x^{k_l} − x* + g^{k_l}, e(x^{k_l}, 1)〉 − ‖e(x^{k_l}, 1)‖². (3.20)

On the other hand, substituting g^{k_l} = ∇θ_{k_l}(x^{k_l}) − A^T(λ^{k_l} − β(Ax^{k_l} − y^{k_l})) and using the fact that ∇θ_{k_l}(x*) = 0 (because x* is contained in C_i^{k_l}), we obtain

〈g^{k_l}, x^{k_l} − x*〉 = 〈∇θ_{k_l}(x^{k_l}) − ∇θ_{k_l}(x*), x^{k_l} − x*〉 − 〈λ^{k_l} − β(Ax^{k_l} − y^{k_l}), Ax^{k_l} − Ax*〉
= Σ_{i=1}^t α_i 〈(I − P_{C_i^{k_l}})x^{k_l} − (I − P_{C_i^{k_l}})x*, x^{k_l} − x*〉 − 〈λ^{k_l} − β(Ax^{k_l} − y^{k_l}), Ax^{k_l} − Ax*〉
≥ Σ_{i=1}^t α_i ‖x^{k_l} − P_{C_i^{k_l}}(x^{k_l})‖² − 〈λ^{k_l} − β(Ax^{k_l} − y^{k_l}), Ax^{k_l} − Ax*〉, (3.21)

where the inequality comes from the inverse strong monotonicity of the operator I − P_{C_i^k}.

Letting h^{k_l} := ∇φ_{k_l}(y^{k_l}) + (λ^{k_l} − β(Ax^{k_l} − y^{k_l})) and using the same argument, we get

〈h^{k_l}, y^{k_l} − y*〉 ≤ 〈y^{k_l} − y* + h^{k_l}, f(y^{k_l}, 1)〉 − ‖f(y^{k_l}, 1)‖² (3.22)

and

〈h^{k_l}, y^{k_l} − y*〉 ≥ Σ_{j=1}^r β_j ‖y^{k_l} − P_{Q_j^{k_l}}(y^{k_l})‖² + 〈λ^{k_l} − β(Ax^{k_l} − y^{k_l}), y^{k_l} − y*〉. (3.23)

Adding (3.21) and (3.23) and using (3.20) and (3.22), we have

Σ_{i=1}^t α_i ‖x^{k_l} − P_{C_i^{k_l}}(x^{k_l})‖² + Σ_{j=1}^r β_j ‖y^{k_l} − P_{Q_j^{k_l}}(y^{k_l})‖²
≤ 〈g^{k_l}, x^{k_l} − x*〉 + 〈h^{k_l}, y^{k_l} − y*〉 + 〈λ^{k_l} − β(Ax^{k_l} − y^{k_l}), Ax^{k_l} − y^{k_l} − (Ax* − y*)〉
≤ 〈x^{k_l} − x* + g^{k_l}, e(x^{k_l}, 1)〉 + 〈y^{k_l} − y* + h^{k_l}, f(y^{k_l}, 1)〉 + 〈λ^{k_l} − β(Ax^{k_l} − y^{k_l}), Ax^{k_l} − y^{k_l}〉,

where the last inequality also uses Ax* = y*.

Since ‖ω^k − ω̃^k‖ → 0 and Ax̃^k − ỹ^k = (λ^k − λ̃^k)/β, we get Ax̃^{k_l} − ỹ^{k_l} → 0, which further implies Ax^{k_l} − y^{k_l} → 0 as l → ∞. Thus the last term on the right-hand side of the inequality tends to zero as l → ∞. By (3.19) and the boundedness of g, h, x, y, the remaining terms on the right-hand side also tend to zero as l → ∞, which yields

lim_{l→∞} ‖x^{k_l} − P_{C_i^{k_l}}(x^{k_l})‖ = 0

for i = 1, . . . , t, and

lim_{l→∞} ‖y^{k_l} − P_{Q_j^{k_l}}(y^{k_l})‖ = 0

for j = 1, . . . , r. Moreover, since P_{C_i^{k_l}}(x^{k_l}) ∈ C_i^{k_l} and P_{Q_j^{k_l}}(y^{k_l}) ∈ Q_j^{k_l}, from the definition of C_i^{k_l} and Q_j^{k_l} we obtain, for i = 1, . . . , t and j = 1, . . . , r,

c_i(x^{k_l}) + 〈ξ^{k_l}, P_{C_i^{k_l}}(x^{k_l}) − x^{k_l}〉 ≤ 0,


and

q_j(y^{k_l}) + 〈η^{k_l}, P_{Q_j^{k_l}}(y^{k_l}) − y^{k_l}〉 ≤ 0.

Passing to the limit and using Lemma 3.7, we get

c_i(x̄) ≤ 0, i = 1, . . . , t,

and

q_j(ȳ) ≤ 0, j = 1, . . . , r.

Moreover, since Ax^{k_l} − y^{k_l} → 0, we have Ax̄ = ȳ, which implies that (x̄, ȳ) is a solution of (MMSFP). Using ω̄ in place of ω* in Theorem 3.2, we obtain that the sequence {‖ω^k − ω̄‖} is nonincreasing. Furthermore, since the subsequence {ω^{k_l}} converges to ω̄, the whole sequence converges to ω̄. This completes the proof.

4 An enhanced model for the constrained MMSFP: the least l2-norm optimization problem

Recall the background of the MMSFP model: to find a distribution of radiation intensities (a radiation intensity map), deliverable by all beamlets, which satisfies certain conditions. The beamlet intensities are denoted by a vector x ∈ ℜ^N, where the j-th entry x_j represents the intensity of the j-th beamlet. But the beamlets may be harmful to the patient, or delivering them may be expensive. So, within the solution set of the MMSFP, we intend to find a "best" solution. Using the metric function ‖x‖₂, our enhanced problem becomes

min{‖x‖₂ | x is a solution of MMSFP}.

For simplicity, in what follows we only consider the MSFP case; the MMSFP case is similar. We suppose the MSFP has at least one solution.

Now the model can be formulated as

minimize (µ/2)‖x‖²
subject to x ∈ X ∩ (∩_{i=1}^t C_i) ⊆ ℜ^N, (4.24)
Ax ∈ ∩_{j=1}^r Q_j ⊆ ℜ^M,

where µ > 0. As in [14], by introducing slack variables z_i, i = 1, . . . , t and y_j, j = 1, . . . , r, we can rewrite (4.24) as

minimize (µ/2)‖x‖²
subject to x ∈ X, (4.25)
x = z_i, z_i ∈ C_i, i = 1, . . . , t,
Ax = y_j, y_j ∈ Q_j, j = 1, . . . , r.

Note that since ‖x‖² is a strongly convex function, the optimal solution of (4.25) is unique.


The Lagrangian dual of (4.25) is

max_{λ_i, γ_j} min_{x∈X, z_i∈C_i, y_j∈Q_j} L(x, z_i, y_j, λ_i, γ_j) := (µ/2)‖x‖² − Σ_{i=1}^t 〈λ_i, x − z_i〉 − Σ_{j=1}^r 〈γ_j, Ax − y_j〉,

where λ_i ∈ ℜ^N, i = 1, . . . , t and γ_j ∈ ℜ^M, j = 1, . . . , r are the Lagrangian multipliers. By the first-order optimality condition, (x*, z_i*, y_j*) is a solution of (4.25) if and only if there exists (λ_i*, γ_j*) such that ω* := (x*, z_i*, y_j*, λ_i*, γ_j*) satisfies the following system of variational inequalities:

〈x − x*, µx* − Σ_{i=1}^t λ_i* − A^T Σ_{j=1}^r γ_j*〉 ≥ 0, ∀x ∈ X,
〈z − z_i*, λ_i*〉 ≥ 0, ∀z ∈ C_i, i = 1, . . . , t,
〈y − y_j*, γ_j*〉 ≥ 0, ∀y ∈ Q_j, j = 1, . . . , r,
〈λ − λ_i*, x* − z_i*〉 ≥ 0, ∀λ ∈ ℜ^N, i = 1, . . . , t,
〈γ − γ_j*, Ax* − y_j*〉 ≥ 0, ∀γ ∈ ℜ^M, j = 1, . . . , r. (4.26)

Denote Ω := X × C_1 × · · · × C_t × Q_1 × · · · × Q_r × ℜ^{tN} × ℜ^{rM} and

F(ω) := (µx − Σ_{i=1}^t λ_i − A^T Σ_{j=1}^r γ_j; λ_i; γ_j; x − z_i; Ax − y_j), i = 1, . . . , t, j = 1, . . . , r. (4.27)

Then (4.26) can be rewritten in a concise form

〈ω − ω∗, F (ω∗)〉 ≥ 0, ∀ω ∈ Ω. (4.28)

The classical ADM for solving (4.25) generates the new iterate ω^{k+1} = (x^{k+1}, z_i^{k+1}, y_j^{k+1}, λ_i^{k+1}, γ_j^{k+1}) via solving the following subproblems:

〈x − x^{k+1}, µx^{k+1} − Σ_{i=1}^t (λ_i^k − α_i(x^{k+1} − z_i^k)) − A^T Σ_{j=1}^r (γ_j^k − β_j(Ax^{k+1} − y_j^k))〉 ≥ 0, ∀x ∈ X,
〈z − z_i^{k+1}, λ_i^k − α_i(x^{k+1} − z_i^{k+1})〉 ≥ 0, ∀z ∈ C_i, i = 1, . . . , t,
〈y − y_j^{k+1}, γ_j^k − β_j(Ax^{k+1} − y_j^{k+1})〉 ≥ 0, ∀y ∈ Q_j, j = 1, . . . , r,
λ_i^{k+1} = λ_i^k − α_i(x^{k+1} − z_i^{k+1}), i = 1, . . . , t,
γ_j^{k+1} = γ_j^k − β_j(Ax^{k+1} − y_j^{k+1}), j = 1, . . . , r. (4.29)

To avoid solving the variational inequalities, we convert them into projection operations. By Lemma 2.2, the first variational inequality of (4.29) is equivalent to

x^{k+1} = P_X[x^{k+1} − τ_x(µx^{k+1} − Σ_{i=1}^t (λ_i^k − α_i(x^{k+1} − z_i^k)) − A^T Σ_{j=1}^r (γ_j^k − β_j(Ax^{k+1} − y_j^k)))], (4.30)

where τ_x > 0 is an undetermined parameter. By introducing a new term in (4.30) and suitably choosing τ_x, we want to eliminate x^{k+1} from the right-hand side of (4.30). To this end, let the residual function between x^{k+1} and x^k be

R(x^k, x^{k+1}) := (ρI − (Σ_{j=1}^r β_j) A^T A)(x^{k+1} − x^k),


where ρ > ρ(A^T A) Σ_{j=1}^r β_j, with ρ(A^T A) denoting the spectral radius of A^T A, and we use the following iteration

x^{k+1} = P_X[x^{k+1} − τ_x(µx^{k+1} − Σ_{i=1}^t (λ_i^k − α_i(x^{k+1} − z_i^k)) − A^T Σ_{j=1}^r (γ_j^k − β_j(Ax^{k+1} − y_j^k)) + R(x^k, x^{k+1}))]
= P_X[(1 − τ_x(µ + Σ_{i=1}^t α_i + ρ))x^{k+1} + τ_x(Σ_{i=1}^t (λ_i^k + α_i z_i^k) + A^T Σ_{j=1}^r (γ_j^k − β_j(Ax^k − y_j^k)) + ρx^k)] (4.31)

instead of (4.30). By choosing the parameter

τ_x := (µ + Σ_{i=1}^t α_i + ρ)^{-1},

we can eliminate x^{k+1} from the right-hand side of (4.31), and by letting

D^k := Σ_{i=1}^t (λ_i^k + α_i z_i^k) + A^T Σ_{j=1}^r (γ_j^k − β_j(Ax^k − y_j^k)) + ρx^k,

we get

x^{k+1} = P_X(τ_x D^k),

which can be computed easily. Note that the second and third variational inequalities of (4.29) are equivalent to

z_i^{k+1} = arg min{(α_i/2)‖(x^{k+1} − z_i) − (1/α_i)λ_i^k‖² | z_i ∈ C_i}, (4.32)
y_j^{k+1} = arg min{(β_j/2)‖(Ax^{k+1} − y_j) − (1/β_j)γ_j^k‖² | y_j ∈ Q_j}, (4.33)

which have the closed-form solutions

z_i^{k+1} = P_{C_i}(x^{k+1} − (1/α_i)λ_i^k)

and

y_j^{k+1} = P_{Q_j}(Ax^{k+1} − (1/β_j)γ_j^k).

Thus we get our algorithm.

Algorithm 4.4 Compute ω^{k+1} := (x^{k+1}, z_i^{k+1}, y_j^{k+1}, λ_i^{k+1}, γ_j^{k+1}) by

x^{k+1} = P_X(τ_x D^k),
z_i^{k+1} = P_{C_i}(x^{k+1} − (1/α_i)λ_i^k), i = 1, . . . , t,
y_j^{k+1} = P_{Q_j}(Ax^{k+1} − (1/β_j)γ_j^k), j = 1, . . . , r,
λ_i^{k+1} = λ_i^k − α_i(x^{k+1} − z_i^{k+1}), i = 1, . . . , t,
γ_j^{k+1} = γ_j^k − β_j(Ax^{k+1} − y_j^{k+1}), j = 1, . . . , r. (4.34)
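To make the update order concrete, the following Python/NumPy sketch runs Algorithm 4.4 for the special case t = r = 1 with X = ℜ^N (so P_X is the identity), a box C and a Euclidean ball Q; the matrix, sets and parameter values are illustrative assumptions, not the paper's experimental settings:

```python
import numpy as np

# Sketch of Algorithm 4.4 with t = r = 1, X = R^N (P_X = identity),
# C a box and Q a ball. All data and parameters below are illustrative.

rng = np.random.default_rng(1)
N, M = 20, 15
A = rng.standard_normal((M, N)) / 10.0
lo, hi = -np.ones(N), np.ones(N)              # C = {x : lo <= x <= hi}
centre, radius = np.zeros(M), 1.0             # Q = {y : ||y - centre|| <= radius}

def proj_C(v):
    return np.clip(v, lo, hi)

def proj_Q(v):
    w = v - centre
    n = np.linalg.norm(w)
    return v if n <= radius else centre + radius * w / n

mu, alpha, beta = 0.5, 0.1, 0.05
rho = 1.01 * beta * np.linalg.norm(A.T @ A, 2)   # rho > beta * rho(A^T A)
tau_x = 1.0 / (mu + alpha + rho)

x = rng.standard_normal(N)                    # start from an arbitrary point
z, y = np.zeros(N), np.zeros(M)
lam, gam = np.zeros(N), np.zeros(M)

for _ in range(300):
    D = (lam + alpha * z) + A.T @ (gam - beta * (A @ x - y)) + rho * x
    x = tau_x * D                             # x^{k+1} = P_X(tau_x * D^k), P_X = I
    z = proj_C(x - lam / alpha)
    y = proj_Q(A @ x - gam / beta)
    lam = lam - alpha * (x - z)
    gam = gam - beta * (A @ x - y)

# At a solution, x is feasible: x lies in C and Ax lies in Q.
assert np.linalg.norm(x - proj_C(x)) < 1e-6
assert np.linalg.norm(A @ x - proj_Q(A @ x)) < 1e-6
```

Since 0 ∈ C and A0 ∈ Q here, the least-norm solution is the origin and the iterates contract toward it; on real data the sets, and hence the projections P_X, P_C, P_Q, would come from the application.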

Remark 4.4 As with the algorithms proposed in the previous sections, the main advantage of Algorithm 4.4 is its parallelizable structure. Moreover, when computing z_i and y_j, the most recent update x^{k+1} of the variable x can be used.


We should mention that in [17], Xu considered a least l2-norm problem for the SFP in an infinite-dimensional Hilbert space:

min{‖x‖₂ | x ∈ C, Ax ∈ Q}, (4.35)

and used the following iterative scheme for solving it:

x^{k+1} = P_C[(1 − α_kγ_k)x^k − γ_kA^T(I − P_Q)Ax^k],

where α_k > 0, γ_k > 0. The author proved that, under some conditions, the sequence {x^k} converges strongly to the solution of (4.35). Since this algorithm is designed for the SFP model, it is not suitable for our case.
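Xu's scheme is a single projected-gradient-type recursion and is easy to sketch. The Python/NumPy instance below uses an illustrative box C and unit ball Q that share a common feasible point, together with the diminishing step sizes α_k = k^{−δ}, γ_k = k^{−σ} suggested later in Section 5; none of the data is from the paper's experiments:

```python
import numpy as np

# Sketch of Xu's scheme x^{k+1} = P_C[(1 - a_k*g_k) x^k - g_k A^T (I - P_Q) A x^k]
# with a_k = k^{-delta}, g_k = k^{-sigma}. C is a box, Q the unit ball; both
# contain x = 0, so the minimum-norm solution is the origin. Illustrative data.

rng = np.random.default_rng(2)
N, M = 20, 15
A = rng.standard_normal((M, N)) / 20.0        # small operator norm

def proj_C(v):
    return np.clip(v, -1.0, 1.0)

def proj_Q(v):
    n = np.linalg.norm(v)
    return v if n <= 1.0 else v / n

x = rng.uniform(-1.0, 1.0, N)
delta, sigma = 0.25, 0.45
for k in range(1, 1001):
    a_k, g_k = k ** (-delta), k ** (-sigma)
    grad = A.T @ (A @ x - proj_Q(A @ x))      # A^T (I - P_Q) A x
    x = proj_C((1.0 - a_k * g_k) * x - g_k * grad)

# The iterates are driven to the least-norm solution x = 0.
assert np.linalg.norm(x - proj_C(x)) == 0.0           # x stays in C by construction
assert np.linalg.norm(A @ x - proj_Q(A @ x)) < 1e-6   # Ax (nearly) in Q
```

Note how the factor (1 − α_kγ_k) shrinks the iterate toward the origin while the gradient term reduces the infeasibility of Ax, which is exactly what produces the minimum-norm limit.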

4.1 Convergence and convergence rate

In this subsection, we prove the global convergence of Algorithm 4.4 and establish an O(1/t) convergence rate for the algorithm. The proof of global convergence is similar to that of [14], while the convergence rate comes from the framework of [10]. For simplicity, we let t = 1 and r = 1; for the general case, the proof is the same.

Let ω := (x, z, y, λ, γ). Denote the block-diagonal matrix

G := diag(ρI_N − βA^TA, αI_N, βI_M, (1/α)I_N, (1/β)I_M), (4.36)

where I_N and I_M denote the N × N and M × M identity matrices, respectively. Since G is positive definite, we can define the G-inner product of ω and ω′ as

〈ω, ω′〉_G := x^T(ρI_N − βA^TA)x′ + αz^Tz′ + βy^Ty′ + (1/α)λ^Tλ′ + (1/β)γ^Tγ′,

and the associated G-norm as

‖ω‖_G := (‖x‖²_{ρI_N−βA^TA} + α‖z‖² + β‖y‖² + (1/α)‖λ‖² + (1/β)‖γ‖²)^{1/2}.
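As a quick sanity check of this definition, the Python/NumPy sketch below (t = r = 1, with illustrative data) assembles the leading block of G and verifies that it is positive definite whenever ρ > βρ(A^TA), so that ‖·‖_G is a genuine norm:

```python
import numpy as np

# Sanity check of the G-norm for t = r = 1. A, alpha, beta are illustrative;
# rho is chosen slightly above beta * rho(A^T A) so that rho*I_N - beta*A^T A,
# and hence G, is positive definite.

rng = np.random.default_rng(3)
N, M = 6, 4
A = rng.standard_normal((M, N))
alpha, beta = 0.5, 0.2
rho = 1.1 * beta * np.linalg.norm(A.T @ A, 2)   # spectral norm = rho(A^T A)

B = rho * np.eye(N) - beta * (A.T @ A)          # leading block of G

def g_norm(x, z, y, lam, gam):
    """||omega||_G for omega = (x, z, y, lambda, gamma)."""
    val = x @ B @ x + alpha * z @ z + beta * y @ y \
          + (lam @ lam) / alpha + (gam @ gam) / beta
    return np.sqrt(val)

assert np.min(np.linalg.eigvalsh(B)) > 0        # positive definite leading block
assert g_norm(np.zeros(N), np.zeros(N), np.zeros(M),
              np.zeros(N), np.zeros(M)) == 0.0
w = [rng.standard_normal(n) for n in (N, N, M, N, M)]
assert g_norm(*w) > 0.0
```

The condition ρ > βρ(A^TA) is exactly the requirement on ρ stated for the residual term R above.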

Let ω* denote the optimal solution of the variational inequality system (4.26).

Lemma 4.8 The sequence {ω^k} generated by Algorithm 4.4 satisfies

〈ω^k − ω*, ω^k − ω^{k+1}〉_G ≥ ‖ω^k − ω^{k+1}‖²_G. (4.37)

Proof. Note that, by Lemma 2.2, (4.31) can be equivalently written as

〈x − x^{k+1}, µx^{k+1} − (λ^k − α(x^{k+1} − z^k)) − A^T(γ^k − β(Ax^{k+1} − y^k)) + R(x^k, x^{k+1})〉 ≥ 0, ∀x ∈ X. (4.38)

By the last two equalities of (4.34) and letting x = x* in (4.38), we get

〈x^{k+1} − x*, −µx^{k+1} + (λ^{k+1} − α(z^{k+1} − z^k)) + A^T(γ^{k+1} − β(y^{k+1} − y^k)) − R(x^k, x^{k+1})〉 ≥ 0. (4.39)


Setting x = x^{k+1} in the first inequality of (4.26), we obtain

〈x^{k+1} − x*, µx* − λ* − A^Tγ*〉 ≥ 0. (4.40)

Adding (4.39) and (4.40) yields

α〈x^{k+1} − x*, z^k − z^{k+1}〉 + β〈A(x^{k+1} − x*), y^k − y^{k+1}〉 + 〈x^{k+1} − x*, λ^{k+1} − λ*〉 + 〈A(x^{k+1} − x*), γ^{k+1} − γ*〉 − 〈x^{k+1} − x*, R(x^k, x^{k+1})〉 ≥ µ‖x^{k+1} − x*‖² ≥ 0. (4.41)

Using (4.26) and (4.29), we obtain

〈z∗ − zk+1, λk+1 − λ∗〉 ≥ 0 and 〈zk+1 − zk, λk − λk+1〉 ≥ 0; (4.42)

〈y∗ − yk+1, γk+1 − γ∗〉 ≥ 0 and 〈yk+1 − yk, γk − γk+1〉 ≥ 0. (4.43)

Adding (4.41)–(4.43) together, we have

α〈z^{k+1} − z*, z^k − z^{k+1}〉 + β〈y^{k+1} − y*, y^k − y^{k+1}〉 + (1/α)〈λ^k − λ^{k+1}, λ^{k+1} − λ*〉 + (1/β)〈γ^k − γ^{k+1}, γ^{k+1} − γ*〉 − 〈x^{k+1} − x*, R(x^k, x^{k+1})〉 ≥ 0,

which means

〈ω^k − ω*, ω^k − ω^{k+1}〉_G ≥ ‖ω^k − ω^{k+1}‖²_G.

This completes the proof.

Theorem 4.3 The sequence {ω^k} generated by Algorithm 4.4 converges to the unique optimal solution of (4.25).

Proof. By Lemma 4.8, we get

‖ω^{k+1} − ω*‖²_G = ‖(ω^k − ω^{k+1}) − (ω^k − ω*)‖²_G
= ‖ω^k − ω*‖²_G − 2〈ω^k − ω*, ω^k − ω^{k+1}〉_G + ‖ω^k − ω^{k+1}‖²_G
≤ ‖ω^k − ω*‖²_G − ‖ω^k − ω^{k+1}‖²_G,

which implies that {‖ω^k − ω*‖_G} is a monotonically decreasing sequence, that {ω^k} is bounded, and that

lim_{k→∞} ‖ω^{k+1} − ω^k‖_G = 0.

Suppose ω̄ := (x̄, z̄, ȳ, λ̄, γ̄) is a cluster point of {ω^k} and let {ω^{k_l}} be the subsequence converging to it. By taking limits over the subsequence in (4.31) and (4.34), we have

x̄ = P_X[x̄ − τ_x(µx̄ − (λ̄ − α(x̄ − z̄)) − A^T(γ̄ − β(Ax̄ − ȳ)))],
z̄ = P_C[z̄ − τ(λ̄ − α(x̄ − z̄))],
ȳ = P_Q[ȳ − σ(γ̄ − β(Ax̄ − ȳ))],
x̄ = z̄, Ax̄ = ȳ,


which means that ω̄ is a solution of the variational inequality system (4.26), and hence ω̄ = ω*. Since the sequence {‖ω^k − ω*‖_G} has a subsequence converging to zero, we have lim_{k→∞} ‖ω^k − ω*‖_G = 0.

Next, we show the O(1/t) convergence rate of Algorithm 4.4. As above, we still let t = 1 and r = 1. In fact, the rate can be obtained by reformulating the algorithm into the scheme discussed in [10]. In that paper, He and Yuan proved that the ADM of the form

x^{k+1} = arg min{θ_1(x) + (β/2)‖(Ax + By^k − b) − (1/β)λ^k‖² + (1/2)‖x − x^k‖²_G | x ∈ X},
y^{k+1} = arg min{θ_2(y) + (β/2)‖(Ax^{k+1} + By − b) − (1/β)λ^k‖² | y ∈ Y},
λ^{k+1} = λ^k − β(Ax^{k+1} + By^{k+1} − b),

for the problem

min{θ_1(x) + θ_2(y) | Ax + By = b, x ∈ X, y ∈ Y},

admits an O(1/t) convergence rate in an ergodic sense.

Now, observe that (4.31) (or (4.38)) is equivalent to

x^{k+1} = arg min{(µ/2)‖x‖² + (α/2)‖x − z^k − (1/α)λ^k‖² + (β/2)‖Ax − y^k − (1/β)γ^k‖² + (1/2)‖x − x^k‖²_{ρI_N−βA^TA} | x ∈ X}.

Combining this with (4.32) and (4.33), the algorithm reduces to a slightly modified version of the scheme of [10], which also has an O(1/t) convergence rate in an ergodic sense. Thus we give the result here and omit the proof.

Denote

ω̃^k := (x̃^k, z̃^k, ỹ^k, λ̃^k, γ̃^k) := (x^{k+1}, z^{k+1}, y^{k+1}, λ^k − α(x^{k+1} − z^k), γ^k − β(Ax^{k+1} − y^k)).

Theorem 4.4 Let {ω^k} be the sequence generated by Algorithm 4.4. For any integer t > 0, let ω̃_t be defined by

ω̃_t := (1/(t + 1)) Σ_{k=0}^t ω̃^k. (4.44)

Then we have ω̃_t ∈ Ω and

〈ω̃_t − ω, F(ω)〉 ≤ (1/(2(t + 1)))‖ω − ω^0‖²_G, ∀ω ∈ Ω,

where G is defined in (4.36) and ω^0 is the initial point of Algorithm 4.4.

From the analysis of [5], Theorem 2.3.5, the solution set of (4.28) can be characterized as

Ω* = ∩_{ω∈Ω} {ω̄ ∈ Ω | 〈ω − ω̄, F(ω)〉 ≥ 0}. (4.45)

For any given compact set D ⊂ Ω, let d = sup{‖ω − ω^0‖_G | ω ∈ D}. Then, after t iterations, the point ω̃_t defined in (4.44) satisfies

sup_{ω∈D} 〈ω̃_t − ω, F(ω)〉 ≤ d²/(2(t + 1)),

which by (4.45) implies that ω̃_t is an approximate solution of (4.28) with accuracy O(1/t).


5 Numerical Results

In this section, we give three examples to test our methods. All numerical computations are conducted on a personal computer with an Intel i3 330 CPU and 2GB of RAM, using Matlab R2011a.

Example 1. Consider the MMSFP

x ∈ X ∩ (∩_{i=1}^t C_i), A_l x ∈ ∩_{j=1}^{r_l} Q_{l,j}, l = 1, 2,

where

C_i = {x ∈ ℜ^N | L_i ≤ x ≤ U_i}, Q_{l,j} = {y ∈ ℜ^{M_l} | ‖y − d_{l,j}‖ ≤ r_{l,j}}

are generated by the following Matlab-like code:

Li = rand(N,1)*5; Ui = rand(N,1)*500 + 20; i = 1,...,t.
dl,j = rand(Ml,1)*10 + 60; rl,j = rand(1)*100 + 500; l = 1,2, j = 1,...,rl.
Al = rand(Ml,N)/max_row, l = 1,2,

where max_row denotes the maximum row sum of A_l, l = 1, 2.

Algorithm 3.2 is chosen to test this example. The parameters are ν = 0.95, β = 0.0005, µ = 1.8, γ = 1.2, τ = σ = 0.4. The initial point is (0, 1, 1, 1, 1), where 0 and 1 represent the all-zero and all-one vectors, respectively. The stopping criterion is

(Σ_{i=1}^t ‖x^k − P_{C_i}(x^k)‖² + Σ_{l=1}^2 Σ_{j=1}^{r_l} ‖A_l x^k − P_{Q_{l,j}}(A_l x^k)‖²)
/ (Σ_{i=1}^t ‖x^0 − P_{C_i}(x^0)‖² + Σ_{l=1}^2 Σ_{j=1}^{r_l} ‖A_l x^0 − P_{Q_{l,j}}(A_l x^0)‖²) ≤ 10^{-4}.

The results are illustrated in Table 1, where k, kin and cpu denote the number of iterations, the total number of inner iterations and the CPU time (in seconds), respectively. For every chosen N, M_l, t, r, we executed the proposed method many times, and the results in the tables are average values. From the results, we see that the method performs better when N, M_1, M_2, t, r are small.

Table 1. Algorithm 3.2 for Example 1

(N, M1, M2)              (50, 60, 70)   (90, 100, 110)   (150, 180, 200)
t = r1 = r2 = 10   k         67.7           41.4             108.9
                   kin       4.2            11.9             14.6
                   cpu       0.0838         0.5701           0.1516
t = r1 = r2 = 30   k         166.9          147              279.3
                   kin       4              11.8             12.1
                   cpu       3.4966         0.3415           1.1631
t = r1 = r2 = 50   k         239.5          231.3            426.5
                   kin       4.7            11.9             12
                   cpu       0.8068         0.854            1.838

Example 2. We consider the MSFP case, where the sets are given by

C_i = {x ∈ ℜ^N | L_i ≤ x ≤ U_i}, i = 1, . . . , t,
Q_1 = {y ∈ ℜ^M | h_1(y) ≤ 812}, Q_2 = {y ∈ ℜ^M | h_2(y) ≤ 2},


where C_i are generated by the following Matlab-like code:

Li = rand(N,1)*20+10; Ui = rand(N,1)*40 +40; i=1,...t.

h_1 and h_2 are two EUD functions given by

h_1(y) = (Σ_{i=1}^{M/2} y_i^4 / (M/2))^{1/4}, h_2(y) = (Σ_{i=M/2+1}^M y_i^6 / (M/2))^{1/6}.
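These two EUD constraints are power means of the dose values over the two halves of y; a direct Python sketch (the vectors below are test data, not from the experiments):

```python
import numpy as np

# The EUD functions h1, h2 above are power means of the dose vector over the
# two halves of y: eud(v, a) = (sum v_i^a / n)^(1/a).

def eud(v, a):
    """Generalized EUD (power mean with exponent a) of nonnegative dose values v."""
    v = np.asarray(v, dtype=float)
    return np.mean(v ** a) ** (1.0 / a)

def h1(y):
    return eud(y[: len(y) // 2], 4)

def h2(y):
    return eud(y[len(y) // 2:], 6)

# On a uniform dose the EUD equals that dose, for any exponent:
y = np.full(10, 3.0)
assert abs(h1(y) - 3.0) < 1e-12 and abs(h2(y) - 3.0) < 1e-12
# The power mean with a > 1 weights hot spots more than the plain average:
y2 = np.array([1.0, 5.0, 1.0, 5.0])
assert eud(y2, 4) > np.mean(y2)
```

The large exponents (4 and 6) make the constraints sensitive to the hottest dose values in each simulated structure, which is the intended clinical behaviour of generalized EUD constraints.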

Here y(1 : M/2) and y(M/2 + 1 : M) simulate two structures of a body, respectively. The matrix A is generated as in Example 1. We compare Algorithm 3.3 with Algorithm 3.2 of [21], proposed by Zhao and Yang (denoted by ZY's method), which is a projection gradient-type method with a line search rule. The parameters of our method are ν = 0.95, β = 0.0005, µ = 1.2, γ = 1.9, τ = σ = 0.55. For ZY's method, as suggested in [21], we take ρ = 0.8, γ = 1, µ = 0.8. The initial point of our method is (0, 0, rand), where "rand" means that the entries of the vector are randomly generated in (0, 1), while for ZY's method the initial point is 0. The stopping criterion is

max_i{‖x^k − P_{C_i}(x^k)‖², q_1(Ax^k), q_2(Ax^k)} / max_i{‖x^0 − P_{C_i}(x^0)‖², q_1(Ax^0), q_2(Ax^0)} ≤ 10^{-4}.

The results are illustrated in Table 2, where kin denotes the total number of inner iterations. From Table 2, we see that when the size of the problem is small (M, N = 50, 100 and t = 10, 30), our method is as competitive as ZY's method. As the size increases, our method performs better in terms of CPU time. This indicates that our method is promising for solving large-scale problems.

Table 2. Algorithm 3.3 and ZY's method for Example 2 (r = 2)

                  M = N:      50        100       500       800
t = 10  Alg 3.3       k       14.8      16.2      16        15.5
                      kin     2         2         2         2
                      cpu     0.0142    0.0218    0.0525    0.1149
        ZY's method   k       24.2      24.4      22.9      22.9
                      kin     0         0         0         0
                      cpu     0.0123    0.0217    0.0625    0.1555
t = 30  Alg 3.3       k       11.3      11.5      7.9       8.1
                      kin     3         2.9       3         3
                      cpu     0.0217    0.025     0.0421    0.0851
        ZY's method   k       25.2      25.5      22.4      22.4
                      kin     2         2         2         2
                      cpu     0.028     0.0346    0.086     0.189
t = 50  Alg 3.3       k       10.5      10.2      7.7       7.5
                      kin     3         3         3         3
                      cpu     0.0238    0.0279    0.0501    0.0908
        ZY's method   k       26        24.2      22.3      22.1
                      kin     2         2         2         2
                      cpu     0.037     0.0444    0.1075    0.214


Example 3. The MSFP where the constraint sets are given by

C_i = {x ∈ ℜ^N | L_i ≤ x ≤ U_i}, Q_j = {y ∈ ℜ^M | ‖y − d_j‖ ≤ r_j},

which are generated by the Matlab-like code

Li = rand(N,1)*5; Ui = rand(N,1)*50 +20; i=1,...t,
dj = rand(M,1)*10 + 60; rj= rand(1)*100 + 500; j=1,...r.

The matrix A is generated as in Example 1. Algorithm 4.4 is tested on this example. First we compare with the method proposed by Xu [17] for finding the least l2-norm solution of an SFP, i.e., the case t = r = 1. Recall Xu's iterative scheme

x^{k+1} = P_C[(1 − α_kγ_k)x^k − γ_kA^T(I − P_Q)Ax^k],

where the author suggested choosing the parameters as

α_k = k^{−δ} and γ_k = k^{−σ} with 0 < δ < σ < 1 and σ + 2δ < 1.

Here we choose δ = 0.25 and σ = 0.45. In Algorithm 4.4, we take the parameters α = 0.1, β = 0.05 and µ = 0.5. The initial point of our method is (x, z, y, λ, γ) = (0, 0, 0, rand, rand), while the initial point of Xu's method is x = 0. The stopping criterion of our method is

max{‖x^{k+1} − x^k‖, ‖z^{k+1} − z^k‖, ‖y^{k+1} − y^k‖, ‖λ^{k+1} − λ^k‖, ‖γ^{k+1} − γ^k‖}
/ max{‖x^1 − x^0‖, ‖z^1 − z^0‖, ‖y^1 − y^0‖, ‖λ^1 − λ^0‖, ‖γ^1 − γ^0‖} ≤ 10^{-4},

and that of Xu's method is

‖x^{k+1} − x^k‖ / ‖x^1 − x^0‖ ≤ 10^{-4}.

The results are illustrated in Table 3, where p denotes the feasibility of the solution, i.e.,

p = max{‖x − P_C(x)‖_∞, ‖Ax − P_Q(Ax)‖_∞},

and ‖x‖ is the l2-norm of the solution.

Table 3. Algorithm 4.4 and Xu's method for Example 3

               M = N:     10         50         100        200        500
Alg 4.4       k           12.8       13.4       17.3       20.3       65.4
              cpu         0.0046     0.0056     0.0063     0.008      0.031
              p           0.0017     0.0015     0          3.65e-04   0.0146
              ‖x‖         8.7752     23.8156    176.5826   404.4845   969.3512
Xu's method   k           2          2          561.8      529.9      677.3
              cpu         4.16e-04   7.15e-04   0.0873     0.1929     2.9067
              p           0          0          2.5652     6.0038     9.6437
              ‖x‖         8.7758     23.8158    136.9682   319.5067   812.7862

From Table 3, we see that when the size is small (M = N = 10, 50), both methods can successfully solve the problem, and Xu's method performs extremely well. As the size increases, our method can still achieve the goal with a fast convergence speed, while Xu's


method fails to find the solution, in the sense that x is infeasible (p is far from zero), and needs many more iterations to converge.

Next we test Algorithm 4.4 on Example 3 for the MSFP case with different numbers of constraint sets. The parameters here are α_i = 0.005 and β_j = 0.0005. The results are illustrated in Table 4, where we can see that when the size is not too large, the method performs well; as the size increases, it needs many more iterations to find the solution.

Table 4. Algorithm 4.4 for Example 3

             M = N:    10       50       100      200      500
t = r = 5    k         47.7     96.2     169.2    511      296.1
             cpu       0.0221   0.0563   0.1356   1.0938   2.1917
t = r = 10   k         55.8     113.5    140.1    219.6    284
             cpu       0.0337   0.0936   0.1843   0.682    2.555
t = r = 30   k         46.3     131      202.3    326.1    297.2
             cpu       0.0516   0.2047   0.546    2.0472   5.0032
t = r = 50   k         49.7     122      218.6    411.6    409.5
             cpu       0.0831   0.2852   0.9833   3.5949   8.1403

6 Conclusions

This paper presents three modified ADMs for solving two new MSFP models. The new models better reflect the real-world problem, and their nice separable structure enables us to apply ADMs with parallel features to solve them. Comparisons with existing algorithms show the efficiency of our methods. Note that our parallel methods are implemented on a personal computer without a parallel architecture, which means that the parallel steps in the methods have to be executed sequentially. If the methods were implemented on a parallel computer, they could enjoy a higher convergence speed.

7 Acknowledgement

This work was supported by the National Natural Science Foundation of China (Grant No.

10871105), Academic Scholarship for Doctoral Candidates, Ministry of Education of China

(Grant No. (190)H0511009) and Scientific Research Foundation for the Returned Overseas

Chinese Scholars, State Education Ministry.

References

[1] Y. Censor, T. Bortfeld, B. Martin and A. Trofimov, A unified approach for inversion

problems in intensity-modulated radiation therapy, Physics in Medicine and Biology, 51

(2006) 2353-2365.


[2] Y. Censor, T. Elfving, N. Kopf and T. Bortfeld, The multiple-sets split feasibility problem

and its applications for inverse problems, Inverse Problems 21 (2005) 2071-2084.

[3] Y. Censor, A. Motova and A. Segal, Perturbed projections and subgradient projections for the multiple-sets split feasibility problem, J. Math. Anal. Appl., 327 (2007) 1244-1256.

[4] B. Choi and J. O. Deasy, The generalized equivalent uniform dose function as a basis for intensity-modulated treatment planning, Phys. Med. Biol. 47 (2002) 3579.

[5] F. Facchinei and J.S. Pang, Finite-Dimensional Variational Inequalities and Complemen-

tarity Problems, Vol. I and II. Springer Series in Operations Research. Springer Verlag,

New York, (2003).

[6] E. M. Gafni and D. P. Bertsekas, Two-metric projection methods for constrained optimiza-

tion, SIAM Journal on Control and Optimization, 22 (1984) 936-964.

[7] D. Han and X. Yuan, A Note on the Alternating Direction Method of Multipliers, J. Optim.

Theory Appl. DOI: 10.1007/s10957-012-0003-z.

[8] B. He, M. Tao and X. Yuan, A splitting method for separate convex programming with link-

ing linear constraints, http://www.optimization-online.org/DB FILE/2010/06/2665.pdf.

(2010).

[9] B. He, M. Tao and X. Yuan, On the O(1/t) convergence rate of Eck-

stein and Bertsekas’s generalized alternating direction method of multipliers.

http://www.math.hkbu.edu.hk/ xmyuan/Paper/ADM-EckBes-Nov15-2011.pdf. (2011).

[10] B. He and X. Yuan, On the O(1/t) convergence rate of alternating direction method,

http://www.optimization-online.org/DB FILE/2011/09/3157.pdf. (2011)

[11] B. He and X. Yuan, Linearized Alternating Direction Method with

Gaussian Back Substitution for Separable Convex Programming,

http://www.math.hkbu.edu.hk/ xmyuan/Paper/LADM-Gaussian-Yuan-2011.pdf.

[12] D. Kinderlehrer and G. Stampacchia, An Introduction to Variational Inequalities and their Applications, Academic Press, New York, 1980.

[13] B. Qu and N. Xiu, A note on the CQ algorithm for the split feasibility problem, Inverse

Problems 21 (2005) 1655-1665.

[14] J. Sun and S. Zhang, A modified alternating direction method for convex quadratically

constrained quadratic semidefinite programs, Eur. J. Oper. Res., 207 (2010) 1210-1220.

[15] M. Tao and X. Yuan, An inexact parallel splitting augmented Lagrangian method for

monotone variational inequalities with separable structures, Comput. Optim. Appl. DOI

10.1007/s10589-011-9417-z.

[16] Ph. L. Toint, Global convergence of a class of trust-region methods for nonconvex minimization in Hilbert space, IMA Journal of Numerical Analysis, 8 (1988) 231-252.

[17] H.-K. Xu, Iterative methods for the split feasibility problem in infinite-dimensional Hilbert spaces, Inverse Problems, 26 (2010) 105018 (17pp).


[18] Q. Yang, The relaxed CQ algorithm solving the split feasibility problem, Inverse Problems

20 (2004) 1261-1266.

[19] X. Yuan, An improved proximal alternating direction method for monotone variational

inequalities with separable structure, Comput. Optim. Appl. 49 (2011) 17-29.

[20] W. Zhang, D. Han and X. Yuan, An efficient simultaneous method for the constrained

multiple-sets split feasibility problem, Comput. Optim. Appl., DOI 10.1007/s10589-011-

9429-8.

[21] J. Zhao and Q. Yang, Self-adaptive projection methods for the multiple-sets split feasibility

problem, Inverse Problems, 27 (2011) 035009.
