Modified alternating direction methods for the modified
multiple-sets split feasibility problems
Yuning Yang
School of Mathematical Sciences and LPMC
Nankai University
Tianjin 300071, P.R. China
Email: nk0310145@gmail.com
Su Zhang
Institute of Modern Management, Business School
Nankai University
Tianjin 300071, P.R. China
Email: zhangsu@nankai.edu.cn
Qingzhi Yang
School of Mathematical Sciences and LPMC
Nankai University
Tianjin 300071, P.R. China
Email: qz-yang@nankai.edu.cn
June 21, 2012
Abstract
In this paper, we propose two new multiple-sets split feasibility problem (MSFP) models, where the MSFP requires finding a point closest to the intersection of a family of closed convex sets in one space, such that its image under a linear transformation is closest to the intersection of another family of closed convex sets in the image space. This problem arises in image restoration, signal processing and intensity-modulated radiation therapy (IMRT). The background of the first new model, called the modified multiple-sets split feasibility problem (MMSFP), comes from IMRT. Compared with the MSFP, the MMSFP has three advantages. At the practical level, it better reflects the real-world problem; at the algorithmic level, its structure is more separable and the size of each part is smaller, which enables us to apply a modified alternating direction method (ADM) to solve it, producing parallel steps in each iteration. This parallel feature fits the development of modern parallel-architecture computers. Then, to overcome the difficulty of computing projections onto the constraint sets, a special version of this method with the strategy of projection onto half-spaces is given. The second new model is to find a least l2-norm solution of the MSFP (or MMSFP). For this problem, a modified ADM with parallel features is also provided. The convergence of the three ADMs is established, and the convergence rate of the third method is shown to be O(1/t). Numerical results provided at the end show the efficiency of our methods.
Key words: multiple-sets split feasibility problem, alternating direction method, parallel
computing, half-space, convergence rate.
1 Introduction
The multiple-sets split feasibility problem (MSFP), proposed by Censor et al [2, 1] and arising in image restoration, signal processing and intensity-modulated radiation therapy (IMRT), is to find a vector x such that
find x ∈ ⋂_{i=1}^{t} C_i such that Ax ∈ ⋂_{j=1}^{r} Q_j,
where A is an M × N real matrix and C_i, i = 1, . . . , t, and Q_j, j = 1, . . . , r, are closed convex sets in ℝ^N and ℝ^M, respectively. Some projection-type methods, such as the projection gradient method [2, 1, 3, 21] and the alternating direction method (ADM) [20], have been proposed to solve this problem. The ADM has a nice feature in the sense that it can use the separable structure
of the problem and produces parallel steps in every iteration of the method, which fits the
development of modern parallel-architecture computers. In this paper, motivated by the idea of
parallel computing, we first review the intensity-modulated radiation therapy (IMRT) model,
from which we propose a new MSFP, called the modified multiple-sets split feasibility problem
(MMSFP). Compared with the MSFP, the MMSFP has three advantages: at the practical level, it better reflects the real-world problem; at the algorithmic level, its structure is more separable and the size of each part is smaller, so it can be solved by parallel algorithms. By fully exploiting the separable structure of the new model, we apply a modified ADM to solve it, a modified version of the method proposed by Zhang et al [20]. Compared with [20], the modified ADM is more parallelizable in every iteration. Then, to overcome the difficulty of computing projections onto the constraint sets, a special version of the modified ADM with the strategy of projection onto half-spaces is given. Such an idea was used by Yang, and by Qu and Xiu, for solving the SFP model [18, 13], and by Censor et al, and by Zhao and Yang, for solving the MSFP [21, 3], where the SFP is the special case of the MSFP with t = r = 1. Next, we intend to find a "good" solution of the MSFP (or MMSFP) model, in the sense that its l2-norm is the least in the solution set. Indeed, if producing the vector x is expensive in the real world, or the vector represents some harmful material, then finding a least l2-norm solution of the MSFP (or MMSFP) is significant. For this problem, a modified ADM with parallel features is also provided, and we show its global convergence and its O(1/t) convergence rate. Numerical
results show the efficiency of our methods.
Briefly speaking, the contributions of this paper are as follows:
• A new model of MSFP, called MMSFP, is proposed.
• An enhanced model, finding the least l2-norm solution of the MSFP (or MMSFP), is proposed.
• Three modified ADMs with parallel features for solving the above two models are provided, together with their convergence results; the convergence rate of the last algorithm is also established.
The rest of this paper is organized as follows. In Section 2, we dive into the IMRT model, from which we propose the modified multiple-sets split feasibility problem, and recall some useful lemmas. In Section 3, a simultaneous ADM is reviewed, and then two modified ADMs are provided with convergence results. The problem of finding the least l2-norm solution of the MSFP (or MMSFP) is proposed in Section 4, and another modified ADM is given, with its convergence and O(1/t) convergence rate being provided. Numerical results are shown in Section 5.
2 IMRT, MMSFP and some useful results
In intensity-modulated radiation therapy (IMRT), beams of penetrating radiation are directed
at the tumour lesion from external sources. A multileaf collimator is used to split each beam
into many beamlets with individually controllable intensities. The problem is to find a distribution of radiation intensities (a radiation intensity map) deliverable by all beamlets that would result in a clinically acceptable dose distribution, i.e., such that the dose to each tissue is within the desired upper and lower bounds, which are prescribed on the basis of medical diagnosis, knowledge and experience. To be specific, we first assume that the radiation is delivered independently from each of the N beamlets, arranged in a certain geometry and indexed by j = 1, 2, . . . , N. The intensity x_j of the j-th beamlet is the j-th entry of the intensity vector x ∈ ℝ^N, where ℝ^N denotes the radiation intensity space. An important constraint in the radiation intensity space is nonnegativity: we can never deliver negative intensities. Hence all deliverable intensity vectors must belong to the nonnegative orthant. There are also other
delivery constraints depending on the technical equipment used to deliver the treatment.
Next, let the entire volume of the patient be divided into M voxels indexed by i = 1, . . . ,M .
Suppose that T anatomical structures have been outlined, including planning target volumes
(PTVs) and the organs at risk (OARs). We denote the set of voxel indices in structure t by S_t. Note that an individual voxel i may belong to several sets S_t, i.e., different structures may overlap. Denote by d_{ij} ≥ 0 the dose absorbed in voxel i due to radiation of unit intensity from the j-th beamlet. These quantities are calculated in advance. Then the total
dose absorbed in the i-th voxel is given by
h_i = Σ_{j=1}^{N} d_{ij} x_j, i = 1, . . . , M,

and in vector form

h = Dx ∈ ℝ^M.
Let h^t = D_t x ∈ ℝ^{M_t} be the sub-vector of h representing the doses absorbed by the t-th structure, i.e., its entries are the doses h_i with i ∈ S_t, t = 1, . . . , T, where D_t is the corresponding sub-matrix of D. A typical constraint is that, in a given critical structure t, the dose should not exceed an upper bound u_t.
The corresponding constraint set is

H_{max,t} = {h^t ∈ ℝ^{M_t} | h^t_i ≤ u_t, i = 1, . . . , M_t}. (2.1)

For the target structures, the dose should not fall below a lower bound l_t, and the constraint is given by

H_{min,t} = {h^t ∈ ℝ^{M_t} | h^t_i ≥ l_t, i = 1, . . . , M_t}. (2.2)
Besides these, there are EUD constraints. First, the EUD function E_{t,α} : ℝ^{M_t} → ℝ is defined as

E_{t,α}(h^t) = ( Σ_{i=1}^{M_t} (h^t_i)^α / M_t )^{1/α}.
For each target structure t, the parameter α is chosen negative and the EUD constraint is described by

H_{EUD,t,α} = {h^t ∈ ℝ^{M_t} | E_{t,α}(h^t) ≥ E^{min}_{t,α}}, α < 0, (2.3)

where E^{min}_{t,α} is given. For the structures at risk, the parameter is chosen α ≥ 1 and the EUD constraint is given by

H_{EUD,t,α} = {h^t ∈ ℝ^{M_t} | E_{t,α}(h^t) ≤ E^{max}_{t,α}}, α ≥ 1, (2.4)

where E^{max}_{t,α} is also given. These EUD sets have been shown to be convex [4].
Thus, we have a modified multiple-sets split feasibility problem (MMSFP), where some constraints are defined in the radiation intensity space ℝ^N and the other constraints are defined in the space ℝ^{M_t} of each structure, t = 1, . . . , T. The unified problem can be formulated as follows:

find x ∈ X ∩ (⋂_{i=1}^{t} C_i) such that h^t = D_t x ∈ ⋂_j H^t_j, t = 1, . . . , T,

where X denotes the nonnegativity constraint while the C_i represent other constraints; the H^t_j denote the box constraints (2.1), (2.2), the EUD constraints (2.3), (2.4), and other constraints; D_t is the sub-matrix of D corresponding to h^t.
The model we propose here is different from the MSFP model

find x ∈ X ∩ (⋂_{i=1}^{t} C_i) such that h = Dx ∈ ⋂_{j=1}^{r} Q_j

proposed by Censor et al [2, 1], in the sense that we divide the dose space ℝ^M into several sub-spaces ℝ^{M_t}, one for each structure t ∈ {1, . . . , T}, and give the constraints separately in each sub-space. From the practical viewpoint, our model better reflects the IMRT setting. Although our model may increase the total size of the problem (the sum of the dimensions of the h^t may be larger than the dimension of h due to the overlap of the structures), the size of each h^t is smaller than that of h. Moreover, by fully exploiting the separable structure of the modified model, we can use new parallel methods to solve it, e.g., the parallel ADM [20]. This can be seen in Section 3 of this paper.
It is worth mentioning the relationship between MSFP and MMSFP. First, we see that
MSFP is a special case of MMSFP by letting T = 1 in the MMSFP model. Next, let
H := (⋂_j H^1_j) × (⋂_j H^2_j) × · · · × (⋂_j H^T_j),

D := (D_1; . . . ; D_T) and h := (h^1; . . . ; h^T),

where D and h are formed by stacking the D_t and the h^t vertically.
Then the MMSFP is equivalent to
find x ∈ X ∩ (⋂_{i=1}^{t} C_i) such that h = Dx ∈ H,
which is a special case of MSFP.
In what follows, we consider the constrained MMSFP model, which is described in the following form:

find x ∈ X ∩ C, such that y_1 := A_1 x ∈ Y_1 ∩ Q_1, . . . , y_L := A_L x ∈ Y_L ∩ Q_L,

where C := ⋂_{i=1}^{t} C_i ⊆ ℝ^N, A_i ∈ ℝ^{M_i×N} and Q_i := ⋂_{j=1}^{r_i} Q_{i,j} ⊆ ℝ^{M_i}, i = 1, . . . , L, and X, Y_i, C_i, Q_{i,j} are all closed convex sets. Throughout the rest of this paper, we suppose that the constrained MMSFP has at least one solution.
For the MSFP model, Censor et al [2] proposed the following proximal function to measure
the distance from a point x to the sets Ci and Qj
f(x) := (1/2) Σ_{i=1}^{t} a_i ‖x − P_{C_i}(x)‖² + (1/2) Σ_{j=1}^{r} b_j ‖Dx − P_{Q_j}(Dx)‖²,

where ‖·‖ is the l2-norm, a_i > 0 (i = 1, . . . , t) and b_j > 0 (j = 1, . . . , r) are coefficients satisfying Σ_{i=1}^{t} a_i + Σ_{j=1}^{r} b_j = 1, and P_Ω(·) denotes the projection mapping from ℝ^n onto the closed convex set Ω, which is given by

P_Ω(x) := arg min{‖x − y‖ | y ∈ Ω}.
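For the constraint sets used later in Section 5, these projections have simple closed forms. The following MATLAB sketch is our own illustration (not part of [2]); it implements the projection onto a box and onto a Euclidean ball:

% Illustrative closed-form projections onto the two set types
% used in the numerical section.
proj_box  = @(x, L, U) min(max(x, L), U);                            % box {x | L <= x <= U}
proj_ball = @(y, d, r) d + (y - d)*min(1, r/max(norm(y - d), eps));  % ball {y | ||y - d|| <= r}

% Small usage example.
x = [-1; 5; 10];
disp(proj_box(x, zeros(3,1), 8*ones(3,1)))   % returns [0; 5; 8]
disp(proj_ball(x, zeros(3,1), 2))            % rescales x onto the ball of radius 2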
It is easy to see that if the MSFP has a solution, then the optimal value of this proximal function is zero, and vice versa. Censor et al [2] proposed to solve the optimization problem

min{f(x) | x ∈ X}

to get a solution of the constrained MSFP, and used a projection gradient method to solve it:

x^{k+1} = P_X(x^k − α∇f(x^k)),

where α > 0 is the step-size and ∇f(x) is the gradient of f at x, given by

∇f(x) = Σ_{i=1}^{t} a_i (x − P_{C_i}(x)) + Σ_{j=1}^{r} b_j D^⊤(Dx − P_{Q_j}(Dx)).
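As an illustration of this formula, the following MATLAB sketch (our own example, assuming box sets C_i and ball sets Q_j as in the experiments of Section 5; the data layout is hypothetical) evaluates f(x) and ∇f(x):

% Sketch: proximal function and gradient for the MSFP, assuming
% box sets C_i = {x | Lb(:,i) <= x <= Ub(:,i)} and ball sets
% Q_j = {y | ||y - dc(:,j)|| <= rc(j)}.
function [fval, grad] = msfp_prox(x, D, Lb, Ub, a, dc, rc, b)
fval = 0; grad = zeros(size(x)); Dx = D*x;
for i = 1:numel(a)
    pci  = min(max(x, Lb(:,i)), Ub(:,i));               % P_{C_i}(x)
    fval = fval + 0.5*a(i)*norm(x - pci)^2;
    grad = grad + a(i)*(x - pci);
end
for j = 1:numel(b)
    v    = Dx - dc(:,j);
    pqj  = dc(:,j) + v*min(1, rc(j)/max(norm(v), eps)); % P_{Q_j}(Dx)
    fval = fval + 0.5*b(j)*norm(Dx - pqj)^2;
    grad = grad + b(j)*(D'*(Dx - pqj));
end
end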
Similarly, the proximal function for the MMSFP model is given by
f(x) = θ(x) + Σ_{i=1}^{L} φ_i(A_i x) + Σ_{i=1}^{L} ‖A_i x − P_{Y_i}(A_i x)‖²,

where

θ(x) := (1/2) Σ_{i=1}^{t} a_i ‖x − P_{C_i}(x)‖²,

φ_i(A_i x) := (1/2) Σ_{j=1}^{r_i} b_{i,j} ‖A_i x − P_{Q_{i,j}}(A_i x)‖², i = 1, . . . , L.

The optimization model is

min{f(x) | x ∈ X}.
The projection gradient method can also be applied to solve this problem. However, using the projection gradient method to solve the MMSFP is the same as solving the MSFP, and it cannot fully exploit the separable structure of the MMSFP model. In the next section, we review a parallel ADM for solving the MSFP model proposed by Zhang et al [20], and then apply a modified version of this method to the MMSFP model. The following lemmas and definitions will be useful in the sequel.
Lemma 2.1 ([20]) Let Ω be a closed convex set in ℝ^n. Then for any x, y ∈ ℝ^n and z ∈ Ω, the following properties hold:

1. ⟨z − P_Ω(x), P_Ω(x) − x⟩ ≥ 0;
2. ⟨x − y, P_Ω(x) − P_Ω(y)⟩ ≥ ‖P_Ω(x) − P_Ω(y)‖²;
3. ‖P_Ω(x) − z‖² ≤ ‖x − z‖² − ‖P_Ω(x) − x‖².

Lemma 2.2 ([12], Theorem 2.3) Let Ω be a closed convex set in a Hilbert space and let P_Ω(x) be the projection of x onto Ω. Then

⟨z − y, y − x⟩ ≥ 0, ∀z ∈ Ω ⇔ y = P_Ω(x).
Definition 2.1 ([20]) Let F be a mapping defined on the closed convex set Ω ⊆ ℝ^n. Then

1. F is called monotone on Ω if

⟨F(x) − F(y), x − y⟩ ≥ 0, ∀x, y ∈ Ω;

2. F is called ν-inverse strongly monotone (ν-ism) on Ω if there exists a constant ν > 0 such that

⟨F(x) − F(y), x − y⟩ ≥ ν‖F(x) − F(y)‖², ∀x, y ∈ Ω;

3. F is called Lipschitz continuous on Ω if there exists a constant L > 0 such that

‖F(x) − F(y)‖ ≤ L‖x − y‖, ∀x, y ∈ Ω.
3 Parallel ADMs for the MSFP and MMSFP
In this section, we first review, in Section 3.1, a parallel ADM for solving the constrained MSFP problem proposed by Zhang et al [20], and then apply a modified version of this method to the MMSFP problem in Section 3.2. To deal with the difficulty of computing projections onto the constraint sets, we use the strategy of projection onto half-spaces; this is provided in Section 3.3. The convergence result is given in Section 3.4.
3.1 A Parallel ADM for solving the constrained MSFP problem
In [20], Zhang et al considered the constrained MSFP model
find x ∈ X ∩ (⋂_{i=1}^{t} C_i) ⊆ ℝ^N such that y = Ax ∈ Y ∩ (⋂_{j=1}^{r} Q_j) ⊆ ℝ^M,

which is equivalent to the following optimization model:

min{θ_1(x) + θ_2(Ax) | x ∈ X, Ax ∈ Y}.

By introducing a slack variable y, the model becomes

min{θ_1(x) + θ_2(y) | Ax = y, x ∈ X, y ∈ Y}. (3.5)
This linearly constrained model with separable structure in the objective function has been studied intensively in the recent literature [8, 9, 7, 19, 15, 11]. The Lagrangian dual of (3.5) is

max_λ min_{x∈X, y∈Y} L(x, y, λ) := θ_1(x) + θ_2(y) − ⟨λ, Ax − y⟩,

where λ is the Lagrangian multiplier. By the first order optimality condition, (x*, y*) is a solution of (3.5) if and only if there exists λ* such that ω* := (x*, y*, λ*) satisfies the following variational inequality system:

⟨x − x*, ∇θ_1(x*) − A^⊤λ*⟩ ≥ 0, ∀x ∈ X,
⟨y − y*, ∇θ_2(y*) + λ*⟩ ≥ 0, ∀y ∈ Y,
Ax* − y* = 0. (3.6)
The classical ADM for solving this optimization problem generates the new iterate ω^{k+1} = (x^{k+1}, y^{k+1}, λ^{k+1}) via solving the following subproblems:

⟨x − x^{k+1}, ∇θ_1(x^{k+1}) − A^⊤(λ^k − β(Ax^{k+1} − y^k))⟩ ≥ 0, ∀x ∈ X,
⟨y − y^{k+1}, ∇θ_2(y^{k+1}) + λ^k − β(Ax^{k+1} − y^{k+1})⟩ ≥ 0, ∀y ∈ Y,
λ^{k+1} = λ^k − β(Ax^{k+1} − y^{k+1}), (3.7)

where β > 0 is the penalty parameter of the linear constraint. Unfortunately, solving (3.7) is as hard as solving (3.6). By Lemma 2.2, the first two variational inequalities of (3.7) are equivalent to the following form:

x^{k+1} = P_X[x^{k+1} − τ(∇θ_1(x^{k+1}) − A^⊤[λ^k − β(Ax^{k+1} − y^k)])],
y^{k+1} = P_Y[y^{k+1} − σ(∇θ_2(y^{k+1}) + [λ^k − β(Ax^{k+1} − y^{k+1})])],
where τ > 0 and σ > 0 are two parameters. These are still hard to solve due to their implicit form. Now, replacing x^{k+1} by x^k on the right hand side of the first equation, and similarly replacing y^{k+1} by y^k on the right hand side of the second equation, one obtains two explicit updates which can be easily computed. To ensure the global convergence of the method, a correction step must be added after the prediction, and the algorithm becomes
Algorithm 3.1 (A Parallel ADM for constrained MSFP)

Prediction step: generate the trial iterate ω̄^k := (x̄^k, ȳ^k, λ̄^k) by

x̄^k = P_X[x^k − τ_k(∇θ_1(x^k) − A^⊤[λ^k − β(Ax^k − y^k)])],
ȳ^k = P_Y[y^k − σ_k(∇θ_2(y^k) + [λ^k − β(Ax̄^k − y^k)])],
λ̄^k = λ^k − β(Ax̄^k − ȳ^k).
Correction step: generate the new iterate ω^{k+1} by

ω^{k+1} = ω^k − γ_k d(ω^k, ω̄^k),

where τ_k > 0, σ_k > 0 and γ_k > 0 are step-sizes satisfying some conditions, and d(ω^k, ω̄^k) is a certain residue function of ω^k and ω̄^k.
To employ the idea of parallel computing, the y-related subproblem of the prediction step can be replaced by

ȳ^k = P_Y[y^k − σ_k(∇θ_2(y^k) + [λ^k − β(Ax^k − y^k)])].

The only difference is that the term Ax̄^k is replaced by Ax^k. Thus x̄^k and ȳ^k can be computed simultaneously.
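To make the parallel structure concrete, here is a minimal MATLAB sketch of this prediction step (our illustration; projX, projY, gradTheta1, gradTheta2 and the parameters tauk, sigmak, beta are assumed to be supplied). Both trial points depend only on (x^k, y^k, λ^k), so the two projections can run concurrently:

% Parallel prediction step: xbar and ybar use only (xk, yk, lamk),
% so the two blocks are independent and can run on separate workers.
rk   = A*xk - yk;                                          % shared residual
xbar = projX(xk - tauk  *(gradTheta1(xk) - A'*(lamk - beta*rk)));
ybar = projY(yk - sigmak*(gradTheta2(yk) +    (lamk - beta*rk)));
lbar = lamk - beta*(A*xbar - ybar);                        % then the multiplier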
To get suitable step-sizes τ_k and σ_k, Zhang et al [20] used an Armijo-like search rule, and then proved the global convergence of the algorithm. These are the main ideas of their paper. In the next subsection, we give a modified version of Algorithm 3.1 for the constrained MMSFP model.
3.2 The constrained MMSFP problem: modified parallel ADM I
In this subsection, we apply Algorithm 3.1 to the constrained MMSFP. By making good use
of the separable structure of MMSFP, we propose a modified parallel algorithm for solving
MMSFP. Recall the constrained MMSFP model:

find x ∈ X ∩ C, such that y_1 := A_1 x ∈ Y_1 ∩ Q_1, . . . , y_L := A_L x ∈ Y_L ∩ Q_L, (3.8)

where C := ⋂_{i=1}^{t} C_i ⊆ ℝ^N, A_i ∈ ℝ^{M_i×N} and Q_i := ⋂_{j=1}^{r_i} Q_{i,j} ⊆ ℝ^{M_i}, i = 1, . . . , L. Let the
proximal function be
p(x, y_1, . . . , y_L) = θ(x) + Σ_{i=1}^{L} φ_i(y_i),

where

θ(x) := (1/2) Σ_{i=1}^{t} a_i ‖x − P_{C_i}(x)‖², (3.9)

φ_i(y_i) := (1/2) Σ_{j=1}^{r_i} b_{i,j} ‖y_i − P_{Q_{i,j}}(y_i)‖², i = 1, . . . , L. (3.10)
Now we do not restrict the coefficients a_i and b_{i,j} to satisfy Σ_{i=1}^{t} a_i + Σ_{i=1}^{L} Σ_{j=1}^{r_i} b_{i,j} = 1. The reason will become clear later.
Now, finding a solution to the MMSFP is equivalent to solving the following optimization
problem:
minimize p(x, y_1, . . . , y_L)
subject to x ∈ X, (3.11)
A_i x = y_i, y_i ∈ Y_i, i = 1, . . . , L.
Lemma 3.3 ([20], Lemma 3) Let θ and φ_i be defined as above. Then ∇θ and ∇φ_i are inverse strongly monotone and Lipschitz continuous on X and Y_i, i = 1, . . . , L, where ∇ is the gradient operator. More specifically,

‖∇θ(x_1) − ∇θ(x_2)‖ ≤ L_x‖x_1 − x_2‖, where L_x = Σ_{i=1}^{t} a_i, for all x_1, x_2 ∈ X,

⟨∇θ(x_1) − ∇θ(x_2), x_1 − x_2⟩ ≥ (1/L_x)‖∇θ(x_1) − ∇θ(x_2)‖²;

‖∇φ_i(y_1) − ∇φ_i(y_2)‖ ≤ L_{y_i}‖y_1 − y_2‖, where L_{y_i} = Σ_{j=1}^{r_i} b_{i,j}, for all y_1, y_2 ∈ Y_i, i = 1, . . . , L,

⟨∇φ_i(y_1) − ∇φ_i(y_2), y_1 − y_2⟩ ≥ (1/L_{y_i})‖∇φ_i(y_1) − ∇φ_i(y_2)‖².
For positive parameters τ, β, σ_1, . . . , σ_L, define the block diagonal matrix

M := diag( τI_N, σ_1 I_{M_1}, . . . , σ_L I_{M_L}, (1/β)I_{M_1}, . . . , (1/β)I_{M_L} ),

where I denotes the identity matrix, I_N ∈ ℝ^{N×N} and I_{M_i} ∈ ℝ^{M_i×M_i}. Let ω := (x, y_1, . . . , y_L, λ_1, . . . , λ_L) and define
F(ω) := ( ∇θ(x) − Σ_{i=1}^{L} A_i^⊤ λ_i; ∇φ_1(y_1) + λ_1; . . . ; ∇φ_L(y_L) + λ_L; A_1 x − y_1; . . . ; A_L x − y_L ),

ξ(ω, ω̄) := ( ∇θ(x) − ∇θ(x̄); ∇φ_1(y_1) − ∇φ_1(ȳ_1); . . . ; ∇φ_L(y_L) − ∇φ_L(ȳ_L); 0; . . . ; 0 ),

d(ω, ω̄) := M(ω − ω̄) − ξ(ω, ω̄),

ϕ(ω, ω̄) := ⟨ω − ω̄, d(ω, ω̄)⟩ + Σ_{i=1}^{L} ⟨λ_i − λ̄_i, A_i(x − x̄) − (y_i − ȳ_i)⟩,

where the components of F and ξ are stacked as column vectors.
We assume that all projections onto the sets X , Yi, Ci, Qi,j are easy to calculate in this
subsection.
Algorithm 3.2 (A Modified Parallel ADM for constrained MMSFP I)

Given arbitrary ν ∈ (0, 1), β > 0, µ > 1, γ ∈ (0, 2), τ_0 > 0, σ_0 > 0 and ω^0 := (x^0, y_1^0, . . . , y_L^0, λ_1^0, . . . , λ_L^0). Let ε > 0 be the error tolerance for an approximate solution and set k = 1.
Step 1. Prediction step: Generate the trial iterates x̄, ȳ_1, . . . , ȳ_L simultaneously via Steps 1.1 and 1.2, respectively, and generate λ̄_1, . . . , λ̄_L simultaneously via Step 1.3.
Step 1.1 Find the smallest nonnegative integer l_k such that τ_k = µ^{l_k} τ_{k−1} and

x̄^k = P_X[ x^k − (1/τ_k)( ∇θ(x^k) − Σ_{i=1}^{L} A_i^⊤(λ_i^k − β(A_i x^k − y_i^k)) ) ] (3.12)

which satisfies

⟨x^k − x̄^k, ξ(x^k, x̄^k)⟩ + β Σ_{i=1}^{L} ‖A_i x^k − A_i x̄^k‖² ≤ τ_k ν ‖x^k − x̄^k‖²; (3.13)
Step 1.2 For i = 1, . . . , L, find the smallest nonnegative integer m_i^k such that σ_i^k = µ^{m_i^k} σ_i^{k−1} and

ȳ_i^k = P_{Y_i}[ y_i^k − (1/σ_i^k)( ∇φ_i(y_i^k) + (λ_i^k − β(A_i x^k − y_i^k)) ) ]

which satisfies

⟨y_i^k − ȳ_i^k, φ_i(y_i^k, ȳ_i^k)⟩ + β‖y_i^k − ȳ_i^k‖² ≤ σ_i^k ν ‖y_i^k − ȳ_i^k‖²,

where, analogously to ξ, φ_i(y_i^k, ȳ_i^k) denotes ∇φ_i(y_i^k) − ∇φ_i(ȳ_i^k);
Step 1.3 For i = 1, . . . , L, calculate λ̄_i^k via

λ̄_i^k = λ_i^k − β(A_i x̄^k − ȳ_i^k).
Step 2. Correction step: Generate the new iterate ω^{k+1} via

ω^{k+1} = ω^k − γ α_k^* d(ω^k, ω̄^k),

where

α_k^* = ϕ(ω^k, ω̄^k) / ‖d(ω^k, ω̄^k)‖².
Step 3. If

p(x^{k+1}, y_1^{k+1}, . . . , y_L^{k+1}) ≤ ε,

then stop; otherwise go to Step 1.
Remark 3.1 In the algorithm, we can first compute x̄, ȳ_i, i = 1, . . . , L, in parallel, and then compute λ̄_i, i = 1, . . . , L, in parallel. As in [20], the search for suitable parameters τ_k, σ_i^k terminates in a finite number of steps, and the sequence {τ_k} is bounded above by a positive number τ_max > 0. Similarly, the sequences {σ_i^k} are bounded above by positive numbers σ_i^max > 0.
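For concreteness, a minimal MATLAB sketch of this search rule (our paraphrase of Step 1.1; gradTheta, projX, the data A{i}, lam{i}, y{i} and the parameters mu > 1, nu, beta, tau_prev are assumed to be supplied):

% Armijo-like search of Step 1.1: try tau = mu^l * tau_prev for
% l = 0, 1, 2, ... until the trial point xbar satisfies (3.13).
g = gradTheta(xk);
for i = 1:L, g = g - A{i}'*(lam{i} - beta*(A{i}*xk - y{i})); end
tau = tau_prev;
while true
    xbar = projX(xk - g/tau);
    lhs  = (xk - xbar)'*(gradTheta(xk) - gradTheta(xbar));
    for i = 1:L, lhs = lhs + beta*norm(A{i}*(xk - xbar))^2; end
    if lhs <= tau*nu*norm(xk - xbar)^2, break; end   % criterion (3.13) holds
    tau = mu*tau;                                    % otherwise enlarge tau
end

By Remark 3.1, this loop terminates after finitely many trials.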
Remark 3.2 Recall that we dropped the assumption that the sum of the a_i and b_{i,j} equals 1. Now suppose

Σ_{i=1}^{t} a_i + Σ_{i=1}^{L} Σ_{j=1}^{r_i} b_{i,j} = C

and let ā_i = a_i/C, b̄_{i,j} = b_{i,j}/C. Then

Σ_{i=1}^{t} ā_i + Σ_{i=1}^{L} Σ_{j=1}^{r_i} b̄_{i,j} = 1.
Let ∇θ̄(x) := ∇θ(x)/C = Σ_{i=1}^{t} ā_i(I − P_{C_i})(x). Then L_x = C L̄_x, where L_x and L̄_x denote the Lipschitz constants of ∇θ and ∇θ̄, respectively. Since

⟨x^k − x̄^k, ξ(x^k, x̄^k)⟩ + β Σ_{i=1}^{L} ‖A_i x^k − A_i x̄^k‖² ≤ (L_x + β Σ_{i=1}^{L} ‖A_i^⊤A_i‖)‖x^k − x̄^k‖² = (C L̄_x + β Σ_{i=1}^{L} ‖A_i^⊤A_i‖)‖x^k − x̄^k‖²,

the inequality (3.13) holds as long as τ_k ≥ (C L̄_x + β Σ_{i=1}^{L} ‖A_i^⊤A_i‖)/ν. Replacing ∇θ(x^k) by C∇θ̄(x^k) and τ_k by (C L̄_x + β Σ_{i=1}^{L} ‖A_i^⊤A_i‖)/ν in (3.12), x̄^k can be obtained by

x̄^k = P_X[ x^k − ν/(C L̄_x + β Σ_{i=1}^{L} ‖A_i^⊤A_i‖) ( C∇θ̄(x^k) − Σ_{i=1}^{L} A_i^⊤(λ_i^k − β(A_i x^k − y_i^k)) ) ]. (3.14)

We can see that the step-size varies as C varies, which will potentially affect the convergence speed. More specifically, the step-size in front of ∇θ̄(x) is

νC/(C L̄_x + β Σ_{i=1}^{L} ‖A_i^⊤A_i‖) = ν/L̄_x − νβ Σ_{i=1}^{L} ‖A_i^⊤A_i‖ / ( L̄_x(C L̄_x + β Σ_{i=1}^{L} ‖A_i^⊤A_i‖) ).
Hence a suitably large number C may speed up the convergence. This is unlike the projection gradient method with constant step-size used by Censor et al [2]:

x^{k+1} = P_X[x^k − α∇f(x^k)], (3.15)

where f(x) = Σ_{i=1}^{t} a_i‖x − P_{C_i}(x)‖² + Σ_{j=1}^{r} b_j‖Ax − P_{Q_j}(Ax)‖² and α ∈ [0, 2/L_f), with L_f the Lipschitz constant of ∇f. In this case, if we multiply ∇f by a constant C, then L_f also has to be multiplied by C, and now α is restricted to [0, 2/(CL_f)), so the iteration remains the same as (3.15).
Algorithm 3.2 modifies Algorithm 1 of [20] in Step 1.2, which does not affect the convergence of the algorithm. Thus we state the convergence result here and omit the proof.

Theorem 3.1 The sequence {ω^k} generated by Algorithm 3.2 converges to a solution of the constrained MMSFP problem (3.8).
3.3 The constrained MMSFP problem: modified parallel ADM II
In this subsection, we deal with the case where projections onto C_i and Q_{i,j} cannot be computed easily, while projections onto X and Y_i are still easy. For example, computing projections onto the EUD constraint sets defined in (2.3) and (2.4) may be time-consuming. To overcome this difficulty, we use the strategy of computing projections onto some half-spaces containing the original sets, instead of directly computing projections onto the sets themselves. This idea was used by Yang, and by Qu and Xiu, for solving the SFP problem [18, 13], and by Censor et al, and by Zhao and Yang, for solving the MSFP problem [21, 3].
To this end, it is necessary to suppose that the convex sets C_i and Q_{i,j} satisfy the following assumptions:

• The sets C_i, i = 1, 2, . . . , t, are given by

C_i = {x ∈ ℝ^N | c_i(x) ≤ 0},

where c_i : ℝ^N → ℝ, i = 1, 2, . . . , t, are convex functions.

• The sets Q_{i,j}, i = 1, . . . , L, j = 1, 2, . . . , r_i, are given by

Q_{i,j} = {y ∈ ℝ^{M_i} | q_{i,j}(y) ≤ 0},

where q_{i,j} : ℝ^{M_i} → ℝ, i = 1, . . . , L, j = 1, 2, . . . , r_i, are convex functions.

• For any x ∈ ℝ^N, at least one subgradient ξ ∈ ∂c_i(x), i = 1, . . . , t, can be calculated, where ∂c_i(x) is the subdifferential of c_i at x, defined as follows:

∂c_i(x) = {ξ ∈ ℝ^N | c_i(z) ≥ c_i(x) + ⟨ξ, z − x⟩ for all z ∈ ℝ^N}.

For any y ∈ ℝ^{M_i}, i = 1, . . . , L, at least one subgradient η ∈ ∂q_{i,j}(y), j = 1, . . . , r_i, can be calculated, where ∂q_{i,j}(y) is defined as follows:

∂q_{i,j}(y) = {η ∈ ℝ^{M_i} | q_{i,j}(u) ≥ q_{i,j}(y) + ⟨η, u − y⟩ for all u ∈ ℝ^{M_i}}.
In the k-th iteration of our algorithm, let

C_i^k = {x ∈ ℝ^N | c_i(x^k) + ⟨ξ^k, x − x^k⟩ ≤ 0},

where ξ^k is an element of ∂c_i(x^k), i = 1, . . . , t, and

Q_{i,j}^k = {y ∈ ℝ^{M_i} | q_{i,j}(y^k) + ⟨η^k, y − y^k⟩ ≤ 0},

where η^k is an element of ∂q_{i,j}(y^k), i = 1, . . . , L, j = 1, . . . , r_i. It is not hard to verify that C_i^k and Q_{i,j}^k contain C_i and Q_{i,j}, respectively, for all k ≥ 1.
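The advantage of this relaxation is that projecting onto a half-space has a closed form. A minimal MATLAB sketch under our own notation (x0 plays the role of x^k, c0 = c_i(x^k), and xi is a subgradient, assumed nonzero whenever the current point violates the constraint):

% Projection onto the half-space H = {x | c0 + <xi, x - x0> <= 0},
% i.e. <xi, x> <= b with b = <xi, x0> - c0.
function p = proj_halfspace(x, x0, c0, xi)
b    = xi'*x0 - c0;
viol = xi'*x - b;                  % positive iff x lies outside H
if viol > 0
    p = x - (viol/(xi'*xi))*xi;    % step back along the normal direction
else
    p = x;                         % x is already in H
end
end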
Define

θ_k(x) := (1/2) Σ_{i=1}^{t} a_i ‖x − P_{C_i^k}(x)‖²,

φ_{ik}(y_i) := (1/2) Σ_{j=1}^{r_i} b_{i,j} ‖y_i − P_{Q_{i,j}^k}(y_i)‖².
Lemma 3.3 also holds for ∇θk and ∇φik.
Algorithm 3.3 (A Modified Parallel ADM for constrained MMSFP II)
Given arbitrary ν ∈ (0, 1), β > 0, µ > 1, γ ∈ (0, 2), τ_0 > 0, σ_0 > 0 and ω^0 := (x^0, y_1^0, . . . , y_L^0, λ_1^0, . . . , λ_L^0). Let ε > 0 be the error tolerance for an approximate solution and set k = 1.

Step 1. Prediction step: Generate the trial iterates x̄, ȳ_1, . . . , ȳ_L simultaneously via Steps 1.1 and 1.2, respectively, and generate λ̄_1, . . . , λ̄_L simultaneously via Step 1.3.
Step 1.1 Find the smallest nonnegative integer l_k such that τ_k = µ^{l_k} τ_{k−1} and

x̄^k = P_X[ x^k − (1/τ_k)( ∇θ_k(x^k) − Σ_{i=1}^{L} A_i^⊤(λ_i^k − β(A_i x^k − y_i^k)) ) ]

which satisfies

⟨x^k − x̄^k, ξ_k(x^k, x̄^k)⟩ + β Σ_{i=1}^{L} ‖A_i x^k − A_i x̄^k‖² ≤ τ_k ν ‖x^k − x̄^k‖²; (3.16)
Step 1.2 For i = 1, . . . , L, find the smallest nonnegative integer m_i^k such that σ_i^k = µ^{m_i^k} σ_i^{k−1} and

ȳ_i^k = P_{Y_i}[ y_i^k − (1/σ_i^k)( ∇φ_{ik}(y_i^k) + (λ_i^k − β(A_i x^k − y_i^k)) ) ]

which satisfies

⟨y_i^k − ȳ_i^k, φ_{ik}(y_i^k, ȳ_i^k)⟩ + β‖y_i^k − ȳ_i^k‖² ≤ σ_i^k ν ‖y_i^k − ȳ_i^k‖²; (3.17)
Step 1.3 For i = 1, . . . , L, calculate λ̄_i^k via

λ̄_i^k = λ_i^k − β(A_i x̄^k − ȳ_i^k).
Step 2. Correction step: Generate the new iterate ω^{k+1} via

ω^{k+1} = ω^k − γ α_k^* d_k(ω^k, ω̄^k),

where

α_k^* = ϕ_k(ω^k, ω̄^k) / ‖d_k(ω^k, ω̄^k)‖².
Step 3. If

max_{i,j} {c_i(x^k), q_{i,j}(y^k)} ≤ ε,

then stop; otherwise go to Step 1.
Remark 3.3 In the algorithm, F_k, ξ_k, ϕ_k and d_k are defined like F, ξ, ϕ and d in Subsection 3.2, with ∇θ and ∇φ_i replaced by ∇θ_k and ∇φ_{ik}, respectively. This is the only difference between Algorithm 3.2 and Algorithm 3.3.
3.4 Convergence
In this subsection, we give the convergence result for Algorithm 3.3. For simplicity, we restrict L = 1 and omit the suffix "i" in y_i, A_i and Q_{i,j}; for the other cases, the proof is the same.

Since the properties of ∇θ_k(x) and ∇φ_k(y) are the same as those of ∇θ(x) and ∇φ(y) (the only difference being the convex sets onto which one projects), the following two lemmas from [20] still hold here. The first depends basically on the monotonicity of F_k, while the second depends on the Armijo-like search rules (3.16) and (3.17):
Lemma 3.4 ([20], Lemma 4) Suppose that ω* := (x*, y*, λ*) is a solution of problem (3.8), and the sequences {ω^k}, {ω̄^k} are generated by Algorithm 3.3. Then we have

⟨ω^k − ω*, d_k(ω^k, ω̄^k)⟩ ≥ ϕ_k(ω^k, ω̄^k).

Proof. Replacing ∇θ and ∇φ by ∇θ_k and ∇φ_k, respectively, in the proof of Lemma 4 of [20] and repeating the proof, we get the result.
Lemma 3.5 ([20], Theorem 1) The following inequality holds:

ϕ_k(ω^k, ω̄^k) ≥ ϱ‖ω^k − ω̄^k‖², ∀k ≥ 1,

where ν, β, τ_0, σ_0 are defined in Algorithm 3.3 and

ϱ = min{(1 − ν)τ_0, (1 − ν)σ_0, 1/(2β)}.
Using the lemmas above, one obtains

Theorem 3.2 ([20], Theorem 2) Suppose that ω* := (x*, y*, λ*) is a solution of problem (3.8). Then the sequence {ω^k} is Fejér monotone, i.e., there exists a constant C > 0 such that

‖ω^{k+1} − ω*‖² ≤ ‖ω^k − ω*‖² − C‖ω^k − ω̄^k‖².
This theorem shows that the sequence {‖ω^k − ω*‖} decreases monotonically and is bounded below, and that {ω^k} is bounded. Thus we have

lim_{k→∞} ‖ω^k − ω̄^k‖ = 0,

which implies that the sequences {ω^k} and {ω̄^k} have the same cluster points. Without loss of generality, suppose ω̂ = (x̂, ŷ, λ̂) is a cluster point of {ω^k} and suppose ω^{k_l} → ω̂ as l → ∞. Denote

e(x, α) := x − P_X(x − αg) (3.18)

and

f(y, α) := y − P_Y(y − αh),

where g and h are given vectors associated with x and y, respectively.
Lemma 3.6 ([13, 21]) For any x ∈ ℝ^N and α > 0, we have

min{1, α}‖e(x, 1)‖ ≤ ‖e(x, α)‖ ≤ max{1, α}‖e(x, 1)‖.
By letting g := ∇θ_k(x^k) − A^⊤(λ^k − β(Ax^k − y^k)) in (3.18), we get

lim_{k→∞} ‖e(x^k, 1)‖ ≤ lim_{k→∞} ‖x^k − x̄^k‖ / min{1, 1/τ_k} ≤ lim_{k→∞} ‖x^k − x̄^k‖ / min{1, 1/τ_max} = 0. (3.19)

The same conclusion also holds for f(y^k, 1).
Lemma 3.7 ([13], Lemma 4.1) Suppose h : ℝ^n → ℝ is a convex function. Then it is subdifferentiable everywhere and its subdifferentials are uniformly bounded on bounded subsets of ℝ^n.
In what follows we still denote g^k := ∇θ_k(x^k) − A^⊤(λ^k − β(Ax^k − y^k)). Let ω* be a solution of the MMSFP. Then we obtain from Lemma 2.1 that for any l ≥ 1,

⟨x^{k_l} − g^{k_l} − P_X(x^{k_l} − g^{k_l}), P_X(x^{k_l} − g^{k_l}) − x*⟩ ≥ 0,

which means

⟨e(x^{k_l}, 1) − g^{k_l}, x^{k_l} − x* − e(x^{k_l}, 1)⟩ ≥ 0

and hence

⟨g^{k_l}, x^{k_l} − x*⟩ ≤ ⟨x^{k_l} − x* + g^{k_l}, e(x^{k_l}, 1)⟩ − ‖e(x^{k_l}, 1)‖². (3.20)
On the other hand, writing out g^{k_l} = ∇θ_{k_l}(x^{k_l}) − A^⊤(λ^{k_l} − β(Ax^{k_l} − y^{k_l})) and using the fact that ∇θ_{k_l}(x*) = 0 (because x* is contained in C_i^{k_l}), we obtain

⟨g^{k_l}, x^{k_l} − x*⟩ = ⟨∇θ_{k_l}(x^{k_l}) − ∇θ_{k_l}(x*), x^{k_l} − x*⟩ − ⟨λ^{k_l} − β(Ax^{k_l} − y^{k_l}), Ax^{k_l} − Ax*⟩
= Σ_{i=1}^{t} a_i ⟨(I − P_{C_i^{k_l}})(x^{k_l}) − (I − P_{C_i^{k_l}})(x*), x^{k_l} − x*⟩ − ⟨λ^{k_l} − β(Ax^{k_l} − y^{k_l}), Ax^{k_l} − Ax*⟩
≥ Σ_{i=1}^{t} a_i ‖x^{k_l} − P_{C_i^{k_l}}(x^{k_l})‖² − ⟨λ^{k_l} − β(Ax^{k_l} − y^{k_l}), Ax^{k_l} − Ax*⟩, (3.21)

where the inequality comes from the inverse strong monotonicity of the operator I − P_{C_i^{k_l}}.
Letting h^{k_l} := ∇φ_{k_l}(y^{k_l}) + (λ^{k_l} − β(Ax^{k_l} − y^{k_l})) and using the same argument, we get

⟨h^{k_l}, y^{k_l} − y*⟩ ≤ ⟨y^{k_l} − y* + h^{k_l}, f(y^{k_l}, 1)⟩ − ‖f(y^{k_l}, 1)‖² (3.22)

and

⟨h^{k_l}, y^{k_l} − y*⟩ ≥ Σ_{j=1}^{r} b_j ‖y^{k_l} − P_{Q_j^{k_l}}(y^{k_l})‖² + ⟨λ^{k_l} − β(Ax^{k_l} − y^{k_l}), y^{k_l} − y*⟩. (3.23)
Adding (3.21) and (3.23) and using (3.20) and (3.22), we have

Σ_{i=1}^{t} a_i ‖x^{k_l} − P_{C_i^{k_l}}(x^{k_l})‖² + Σ_{j=1}^{r} b_j ‖y^{k_l} − P_{Q_j^{k_l}}(y^{k_l})‖²
≤ ⟨g^{k_l}, x^{k_l} − x*⟩ + ⟨h^{k_l}, y^{k_l} − y*⟩ + ⟨λ^{k_l} − β(Ax^{k_l} − y^{k_l}), Ax^{k_l} − y^{k_l} − (Ax* − y*)⟩
≤ ⟨x^{k_l} − x* + g^{k_l}, e(x^{k_l}, 1)⟩ + ⟨y^{k_l} − y* + h^{k_l}, f(y^{k_l}, 1)⟩ + ⟨λ^{k_l} − β(Ax^{k_l} − y^{k_l}), Ax^{k_l} − y^{k_l}⟩,

where the last step uses Ax* = y*.
Since ‖ω^k − ω̄^k‖ → 0 and Ax̄^k − ȳ^k = (λ^k − λ̄^k)/β, we get Ax̄^{k_l} − ȳ^{k_l} → 0, which further implies Ax^{k_l} − y^{k_l} → 0 as l → ∞. Thus the last term on the right hand side of the inequality tends to zero as l → ∞. By (3.19) and the boundedness of g, h, x, y, the remaining terms on the right hand side also tend to zero as l → ∞, which yields

lim_{l→∞} ‖x^{k_l} − P_{C_i^{k_l}}(x^{k_l})‖ = 0

for i = 1, . . . , t, and

lim_{l→∞} ‖y^{k_l} − P_{Q_j^{k_l}}(y^{k_l})‖ = 0
for j = 1, . . . , r. Moreover, since P_{C_i^{k_l}}(x^{k_l}) ∈ C_i^{k_l} and P_{Q_j^{k_l}}(y^{k_l}) ∈ Q_j^{k_l}, from the definition of C_i^{k_l} and Q_j^{k_l} we obtain, for i = 1, . . . , t and j = 1, . . . , r,

c_i(x^{k_l}) + ⟨ξ^{k_l}, P_{C_i^{k_l}}(x^{k_l}) − x^{k_l}⟩ ≤ 0

and

q_j(y^{k_l}) + ⟨η^{k_l}, P_{Q_j^{k_l}}(y^{k_l}) − y^{k_l}⟩ ≤ 0.
Passing to the limit and using Lemma 3.7, we get

c_i(x̂) ≤ 0, i = 1, . . . , t,

and

q_j(ŷ) ≤ 0, j = 1, . . . , r.

Moreover, since Ax^{k_l} − y^{k_l} → 0, we have Ax̂ = ŷ, which implies that (x̂, ŷ) is a solution of the MMSFP. Using ω̂ in place of ω* in Theorem 3.2, we obtain a nonincreasing sequence {‖ω^k − ω̂‖}. Furthermore, since there is a subsequence {ω^{k_l}} converging to ω̂, we conclude that the whole sequence converges to ω̂. This completes the proof.
4 An enhanced model for the constrained MMSFP: the least l2-norm optimization problem
Recall the background of the MMSFP model: to find a distribution of radiation intensities (a radiation intensity map), deliverable by all beamlets, which satisfies some conditions. The beamlet intensities are denoted by a vector x ∈ ℝ^N, where the j-th entry x_j represents the intensity of the j-th beamlet. But the beamlets may be harmful to the patient, or delivering them may be expensive. So within the solution set of the MMSFP, we intend to find a "best" solution. Using the metric function ‖x‖², our enhanced problem becomes

min{‖x‖² | x is a solution of the MMSFP}.

For simplicity, in what follows we only consider the MSFP case; the MMSFP case is similar. We suppose the MSFP has at least one solution.
Now the model can be formulated as

minimize (µ/2)‖x‖²
subject to x ∈ X ∩ (⋂_{i=1}^{t} C_i) ⊆ ℝ^N, (4.24)
Ax ∈ ⋂_{j=1}^{r} Q_j ⊆ ℝ^M,

where µ > 0. As in [14], by introducing slack variables z_i, i = 1, . . . , t, and y_j, j = 1, . . . , r, we can rewrite (4.24) as

minimize (µ/2)‖x‖²
subject to x ∈ X, (4.25)
x = z_i, z_i ∈ C_i, i = 1, . . . , t,
Ax = y_j, y_j ∈ Q_j, j = 1, . . . , r.
Note that since ‖x‖2 is a strongly convex function, the optimal solution of (4.25) is unique.
The Lagrangian dual of (4.25) is

max_{λ_i, γ_j} min_{x∈X, z_i∈C_i, y_j∈Q_j} L(x, z_i, y_j, λ_i, γ_j) := (µ/2)‖x‖² − Σ_{i=1}^{t} ⟨λ_i, x − z_i⟩ − Σ_{j=1}^{r} ⟨γ_j, Ax − y_j⟩,
where λ_i ∈ ℝ^N, i = 1, . . . , t, and γ_j ∈ ℝ^M, j = 1, . . . , r, are the Lagrangian multipliers. By the first order optimality condition, (x*, z_i^*, y_j^*) is a solution of (4.25) if and only if there exists (λ_i^*, γ_j^*) such that ω* := (x*, z_i^*, y_j^*, λ_i^*, γ_j^*) satisfies the following variational inequality system:

⟨x − x*, µx* − Σ_{i=1}^{t} λ_i^* − A^⊤ Σ_{j=1}^{r} γ_j^*⟩ ≥ 0, ∀x ∈ X,
⟨z − z_i^*, λ_i^*⟩ ≥ 0, ∀z ∈ C_i, i = 1, . . . , t,
⟨y − y_j^*, γ_j^*⟩ ≥ 0, ∀y ∈ Q_j, j = 1, . . . , r,
⟨λ − λ_i^*, x* − z_i^*⟩ ≥ 0, ∀λ ∈ ℝ^N, i = 1, . . . , t,
⟨γ − γ_j^*, Ax* − y_j^*⟩ ≥ 0, ∀γ ∈ ℝ^M, j = 1, . . . , r. (4.26)
Denote Ω := X × C_1 × · · · × C_t × Q_1 × · · · × Q_r × ℝ^N × · · · × ℝ^N × ℝ^M × · · · × ℝ^M and

F(ω) := ( µx − Σ_{i=1}^{t} λ_i − A^⊤ Σ_{j=1}^{r} γ_j; λ_i; γ_j; x − z_i; Ax − y_j ), (4.27)

where the blocks λ_i, γ_j, x − z_i and Ax − y_j run over i = 1, . . . , t and j = 1, . . . , r.
Then (4.26) can be rewritten in the concise form

⟨ω − ω*, F(ω*)⟩ ≥ 0, ∀ω ∈ Ω. (4.28)
The classical ADM for solving (4.25) generates the new iterate ω^{k+1} = (x^{k+1}, z_i^{k+1}, y_j^{k+1}, λ_i^{k+1}, γ_j^{k+1}) via solving the following subproblems:

⟨x − x^{k+1}, µx^{k+1} − Σ_{i=1}^{t}(λ_i^k − α_i(x^{k+1} − z_i^k)) − A^⊤ Σ_{j=1}^{r}(γ_j^k − β_j(Ax^{k+1} − y_j^k))⟩ ≥ 0, ∀x ∈ X,
⟨z − z_i^{k+1}, λ_i^k − α_i(x^{k+1} − z_i^{k+1})⟩ ≥ 0, ∀z ∈ C_i, i = 1, . . . , t,
⟨y − y_j^{k+1}, γ_j^k − β_j(Ax^{k+1} − y_j^{k+1})⟩ ≥ 0, ∀y ∈ Q_j, j = 1, . . . , r,
λ_i^{k+1} = λ_i^k − α_i(x^{k+1} − z_i^{k+1}), i = 1, . . . , t,
γ_j^{k+1} = γ_j^k − β_j(Ax^{k+1} − y_j^{k+1}), j = 1, . . . , r. (4.29)
To avoid solving the variational inequalities, we convert them into projection operations. By Lemma 2.2, the first variational inequality of (4.29) is equivalent to

x^{k+1} = P_X[ x^{k+1} − τ_x( µx^{k+1} − Σ_{i=1}^{t}(λ_i^k − α_i(x^{k+1} − z_i^k)) − A^⊤ Σ_{j=1}^{r}(γ_j^k − β_j(Ax^{k+1} − y_j^k)) ) ], (4.30)

where τ_x > 0 is an undetermined parameter. By introducing a new term in (4.30) and suitably choosing τ_x, we want to eliminate x^{k+1} on the right hand side of (4.30). To this end, let the residual function between x^{k+1} and x^k be

R(x^k, x^{k+1}) := ( ρI − (Σ_{j=1}^{r} β_j)A^⊤A )(x^{k+1} − x^k),
where ρ > ρ(A^⊤A) Σ_{j=1}^{r} β_j, with ρ(A^⊤A) the spectral radius of A^⊤A, and we use the following iteration

x^{k+1} = P_X[ x^{k+1} − τ_x( µx^{k+1} − Σ_{i=1}^{t}(λ_i^k − α_i(x^{k+1} − z_i^k)) − A^⊤ Σ_{j=1}^{r}(γ_j^k − β_j(Ax^{k+1} − y_j^k)) + R(x^k, x^{k+1}) ) ]
= P_X[ (1 − τ_x(µ + Σ_{i=1}^{t} α_i + ρ))x^{k+1} + τ_x( Σ_{i=1}^{t}(λ_i^k + α_i z_i^k) + A^⊤ Σ_{j=1}^{r}(γ_j^k − β_j(Ax^k − y_j^k)) + ρx^k ) ] (4.31)
instead of (4.30). By choosing the parameter

τ_x := ( µ + Σ_{i=1}^{t} α_i + ρ )^{−1},

we can eliminate x^{k+1} on the right hand side of (4.31), and by letting

D_k := Σ_{i=1}^{t}(λ_i^k + α_i z_i^k) + A^⊤ Σ_{j=1}^{r}(γ_j^k − β_j(Ax^k − y_j^k)) + ρx^k,

we get

x^{k+1} = P_X(τ_x D_k),
which can be computed simply. Note that the second and the third variational inequalities are equivalent to

z_i^{k+1} = arg min{ (α_i/2)‖(x^{k+1} − z_i) − (1/α_i)λ_i^k‖² | z_i ∈ C_i }, (4.32)

y_j^{k+1} = arg min{ (β_j/2)‖(Ax^{k+1} − y_j) − (1/β_j)γ_j^k‖² | y_j ∈ Q_j }, (4.33)

which have the closed-form solutions

z_i^{k+1} = P_{C_i}( x^{k+1} − (1/α_i)λ_i^k )

and

y_j^{k+1} = P_{Q_j}( Ax^{k+1} − (1/β_j)γ_j^k ).
Thus we get our algorithm.

Algorithm 4.4 Compute ω^{k+1} := (x^{k+1}, z_i^{k+1}, y_j^{k+1}, λ_i^{k+1}, γ_j^{k+1}) by

x^{k+1} = P_X(τ_x D_k),
z_i^{k+1} = P_{C_i}(x^{k+1} − (1/α_i)λ_i^k), i = 1, . . . , t,
y_j^{k+1} = P_{Q_j}(Ax^{k+1} − (1/β_j)γ_j^k), j = 1, . . . , r,
λ_i^{k+1} = λ_i^k − α_i(x^{k+1} − z_i^{k+1}), i = 1, . . . , t,
γ_j^{k+1} = γ_j^k − β_j(Ax^{k+1} − y_j^{k+1}), j = 1, . . . , r. (4.34)
Remark 4.4 As with the algorithms proposed in Section 3, the main advantage of Algorithm 4.4 is its parallel feature. Moreover, when computing z_i and y_j, the most recent update x^{k+1} of the variable x can be used.
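For illustration, one iteration of Algorithm 4.4 with t = r = 1 in MATLAB (a sketch under our own naming; projX, projC, projQ stand for the closed-form projections, e.g. box and ball, and alpha_, beta_, mu_ denote the parameters α, β, µ):

% One iteration of Algorithm 4.4 for t = r = 1 (sketch).
rho  = 1.01*beta_*normest(A'*A);          % ensures rho > rho(A'A)*beta
taux = 1/(mu_ + alpha_ + rho);            % tau_x = (mu + alpha + rho)^(-1)
Dk   = (lam + alpha_*z) + A'*(gam - beta_*(A*x - y)) + rho*x;
x    = projX(taux*Dk);
z    = projC(x - lam/alpha_);             % z and y can be computed in parallel
y    = projQ(A*x - gam/beta_);
lam  = lam - alpha_*(x - z);
gam  = gam - beta_*(A*x - y);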
We should mention that in [17], a least l2-norm problem for the SFP in an infinite-dimensional Hilbert space was considered by Xu:

min{‖x‖² | x ∈ C, Ax ∈ Q}, (4.35)

and he used the following iterative scheme for solving the problem:

x^{k+1} = P_C[ (1 − α_k γ_k)x^k − γ_k A^⊤(I − P_Q)Ax^k ],

where α_k > 0, γ_k > 0. The author proved that under some conditions, the sequence {x^k} converges strongly to the solution of (4.35). Since this algorithm is designed for the SFP model, it is not suitable for our case.
4.1 Convergence and Convergence rate
In this subsection, we prove the global convergence of Algorithm 4.4 and show the O(1/t) convergence rate of the algorithm. The proof of global convergence is similar to that of [14], while the convergence rate follows the framework of [10]. For simplicity, we let t = 1 and r = 1; for the other cases, the proof is the same.
Let ω := (x, z, y, λ, γ). Denote the block diagonal matrix

G := diag( ρI_N − βA^⊤A, αI_N, βI_M, (1/α)I_N, (1/β)I_M ), (4.36)

where I denotes the identity matrix, I_N ∈ ℝ^{N×N} and I_M ∈ ℝ^{M×M}. Since G is positive definite, we can define the G-inner product of ω and ω′ as

⟨ω, ω′⟩_G := x^⊤(ρI_N − βA^⊤A)x′ + αz^⊤z′ + βy^⊤y′ + (1/α)λ^⊤λ′ + (1/β)γ^⊤γ′,

and the associated G-norm as

‖ω‖_G := ( ‖x‖²_{ρI_N−βA^⊤A} + α‖z‖² + β‖y‖² + (1/α)‖λ‖² + (1/β)‖γ‖² )^{1/2}.
Let ω* denote the optimal solution of the variational inequality system (4.26).
Lemma 4.8 The sequence {ω^k} generated by Algorithm 4.4 satisfies

⟨ω^k − ω*, ω^k − ω^{k+1}⟩_G ≥ ‖ω^k − ω^{k+1}‖²_G. (4.37)
Proof. Note that by Lemma 2.2, (4.31) can be equivalently written as

⟨x − x^{k+1}, µx^{k+1} − (λ^k − α(x^{k+1} − z^k)) − A^⊤(γ^k − β(Ax^{k+1} − y^k)) + R(x^k, x^{k+1})⟩ ≥ 0, ∀x ∈ X. (4.38)

By the last two equalities of (4.34) and letting x = x* in (4.38), we get

⟨x^{k+1} − x*, −µx^{k+1} + (λ^{k+1} − α(z^{k+1} − z^k)) + A^⊤(γ^{k+1} − β(y^{k+1} − y^k)) − R(x^k, x^{k+1})⟩ ≥ 0. (4.39)
Setting x = x^{k+1} in the first inequality of (4.26), we obtain

⟨x^{k+1} − x*, µx* − λ* − A^⊤γ*⟩ ≥ 0. (4.40)

Adding (4.39) and (4.40) yields

⟨α(x^{k+1} − x*), z^k − z^{k+1}⟩ + ⟨βA(x^{k+1} − x*), y^k − y^{k+1}⟩ + ⟨x^{k+1} − x*, λ^{k+1} − λ*⟩ + ⟨A(x^{k+1} − x*), γ^{k+1} − γ*⟩ − ⟨x^{k+1} − x*, R(x^k, x^{k+1})⟩ ≥ µ‖x^{k+1} − x*‖² ≥ 0. (4.41)
Using (4.26) and (4.29), we obtain

⟨z* − z^{k+1}, λ^{k+1} − λ*⟩ ≥ 0 and ⟨z^{k+1} − z^k, λ^k − λ^{k+1}⟩ ≥ 0; (4.42)

⟨y* − y^{k+1}, γ^{k+1} − γ*⟩ ≥ 0 and ⟨y^{k+1} − y^k, γ^k − γ^{k+1}⟩ ≥ 0. (4.43)
Adding (4.41)–(4.43) together, we have

α⟨z^{k+1} − z*, z^k − z^{k+1}⟩ + β⟨y^{k+1} − y*, y^k − y^{k+1}⟩ + (1/α)⟨λ^k − λ^{k+1}, λ^{k+1} − λ*⟩ + (1/β)⟨γ^k − γ^{k+1}, γ^{k+1} − γ*⟩ − ⟨x^{k+1} − x*, R(x^k, x^{k+1})⟩ ≥ 0,

which means

⟨ω^k − ω*, ω^k − ω^{k+1}⟩_G ≥ ‖ω^k − ω^{k+1}‖²_G.
This completes the proof.
Theorem 4.3 The sequence {ω^k} generated by Algorithm 4.4 converges to the unique optimal solution of (4.25).

Proof. By Lemma 4.8, we get

‖ω^{k+1} − ω*‖²_G = ‖(ω^k − ω*) − (ω^k − ω^{k+1})‖²_G
= ‖ω^k − ω*‖²_G − 2⟨ω^k − ω*, ω^k − ω^{k+1}⟩_G + ‖ω^k − ω^{k+1}‖²_G
≤ ‖ω^k − ω*‖²_G − ‖ω^k − ω^{k+1}‖²_G,

which implies that {‖ω^k − ω*‖_G} is a monotonically decreasing sequence, {ω^k} is bounded, and

lim_{k→∞} ‖ω^{k+1} − ω^k‖_G = 0.
Suppose ω̂ is a cluster point of {ω^k} and let {ω^{k_l}} be a subsequence converging to it. By taking limits over the subsequence in (4.31) and (4.34), we have

x̂ = P_X[ x̂ − τ_x( µx̂ − (λ̂ − α(x̂ − ẑ)) − A^⊤(γ̂ − β(Ax̂ − ŷ)) ) ],
ẑ = P_C[ ẑ − τ(λ̂ − α(x̂ − ẑ)) ],
ŷ = P_Q[ ŷ − σ(γ̂ − β(Ax̂ − ŷ)) ],
x̂ = ẑ, Ax̂ = ŷ,

which means that ω̂ is a solution of the variational inequality system (4.26), and hence ω̂ = ω*. Since the sequence {‖ω^k − ω*‖_G} has a subsequence converging to zero, we have lim_{k→∞} ‖ω^k − ω*‖_G = 0.
Next, we show the O(1/t) convergence rate of Algorithm 4.4. As above, we still let t = 1 and r = 1. In fact, the rate can be obtained by reformulating the algorithm into the setting discussed in [10]. In their paper, He and Yuan proved that the ADM in the following form

x^{k+1} = arg min{ θ_1(x) + (β/2)‖(Ax + By^k − b) − (1/β)λ^k‖² + (1/2)‖x − x^k‖²_G | x ∈ X },
y^{k+1} = arg min{ θ_2(y) + (β/2)‖(Ax^{k+1} + By − b) − (1/β)λ^k‖² | y ∈ Y },
λ^{k+1} = λ^k − β(Ax^{k+1} + By^{k+1} − b),

for the problem

min{ θ_1(x) + θ_2(y) | Ax + By = b, x ∈ X, y ∈ Y },

admits an O(1/t) convergence rate in an ergodic sense.
Now, observe that (4.31) (or (4.38)) is equivalent to

x^{k+1} = arg min{ (µ/2)‖x‖² + (α/2)‖x − z^k − (1/α)λ^k‖² + (β/2)‖Ax − y^k − (1/β)γ^k‖² + (1/2)‖x − x^k‖²_{ρI_N−βA^⊤A} | x ∈ X }.

Combining this with (4.32) and (4.33), the method reduces to a slightly modified version of that of [10], which also has an O(1/t) convergence rate in an ergodic sense. Thus we give the result here and omit the proof.
Denote

ω̃^k := (x̃^k, z̃^k, ỹ^k, λ̃^k, γ̃^k) := ( x^{k+1}, z^{k+1}, y^{k+1}, λ^k − α(x^{k+1} − z^k), γ^k − β(Ax^{k+1} − y^k) ).
Theorem 4.4 Let {ω^k} be the sequence generated by Algorithm 4.4 and {ω̃^k} be defined as above. For any integer t > 0, let ω̄_t be defined by

ω̄_t := (1/(t+1)) Σ_{k=0}^{t} ω̃^k. (4.44)

Then we have ω̄_t ∈ Ω and

⟨ω̄_t − ω, F(ω)⟩ ≤ (1/(2(t+1))) ‖ω − ω^0‖²_G, ∀ω ∈ Ω,

where G is defined in (4.36) and ω^0 is the initial point of Algorithm 4.4.
From the analysis of [5], Theorem 2.3.5, the solution set of (4.28) is characterized by

Ω* = ⋂_{ω∈Ω} { ω̂ ∈ Ω | ⟨ω − ω̂, F(ω)⟩ ≥ 0 }. (4.45)

For any given compact set D ⊂ Ω, let d = sup{‖ω − ω^0‖_G | ω ∈ D}. Then after t iterations, the point ω̄_t defined in (4.44) satisfies

sup_{ω∈D} ⟨ω̄_t − ω, F(ω)⟩ ≤ d²/(2(t+1)),

which by (4.45) implies that ω̄_t is an approximate solution of (4.28) with accuracy O(1/t).
5 Numerical Results
In this section, we give three examples to test our methods. All the numerical computations are conducted on a personal computer with an Intel i3 330 CPU and 2GB of RAM, using Matlab R2011a.
Example 1. Consider the MMSFP problem

find x ∈ X ∩ ⋂_{i=1}^{t} C_i such that A_l x ∈ ⋂_{j=1}^{r_l} Q_{l,j}, l = 1, 2,

where

C_i = {x ∈ ℝ^N | L_i ≤ x ≤ U_i}, Q_{l,j} = {y ∈ ℝ^{M_l} | ‖y − d_{l,j}‖ ≤ r_{l,j}}

are generated by the following matlab-like codes:

L_i = rand(N,1)*5; U_i = rand(N,1)*500 + 20; i = 1,...,t.
d_{l,j} = rand(M_l,1)*10 + 60; r_{l,j} = rand(1)*100 + 500; l = 1,2, j = 1,...,r_l.
A_l = rand(M_l,N)/max_row, l = 1,2,

where max_row denotes the maximal row sum of A_l, l = 1, 2.
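A runnable version of this data generation (a sketch; the paper gives only matlab-like fragments, so the loop structure and variable names are ours):

% Generate Example 1 data: t boxes in R^N and r_l balls in R^{M_l}.
N = 50; M = [60 70]; t = 10; r = [10 10];
Lb = rand(N, t)*5;  Ub = rand(N, t)*500 + 20;   % box bounds for C_i
for l = 1:2
    Araw   = rand(M(l), N);
    A{l}   = Araw / max(sum(Araw, 2));          % divide by the max row sum
    d{l}   = rand(M(l), r(l))*10 + 60;          % ball centers d_{l,j}
    rad{l} = rand(1, r(l))*100 + 500;           % ball radii r_{l,j}
end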
Algorithm 3.2 is chosen to test this example. The parameters are ν = 0.95, β = 0.0005, µ = 1.8, γ = 1.2, τ = σ = 0.4. The initial point is (0, 1, 1, 1, 1), where 0 and 1 represent the all-zero and all-one vectors, respectively. The stopping criterion is

( Σ_{i=1}^{t} ‖x^k − P_{C_i}(x^k)‖² + Σ_{l=1}^{2} Σ_{j=1}^{r_l} ‖A_l x^k − P_{Q_{l,j}}(A_l x^k)‖² ) / ( Σ_{i=1}^{t} ‖x^0 − P_{C_i}(x^0)‖² + Σ_{l=1}^{2} Σ_{j=1}^{r_l} ‖A_l x^0 − P_{Q_{l,j}}(A_l x^0)‖² ) ≤ 10^{−4}.
The results are illustrated in Table 1, where k, kin and cpu represent the number of iterations, the total number of inner iterations and the cpu time, respectively. The unit of the cpu time is seconds. For every chosen N, M_l, t, r, we executed the proposed method many times, and the results in the table are the average values. From the results, we see that the method performs better when N, M_1, M_2, t, r are small.
Table 1: Algorithm 3.2 for Example 1

N, M1, M2            50,60,70   90,100,110   150,180,200
t = r1 = r2 = 10
  k                  67.7       41.4         108.9
  kin                4.2        11.9         14.6
  cpu                0.0838     0.5701       0.1516
t = r1 = r2 = 30
  k                  166.9      147          279.3
  kin                4          11.8         12.1
  cpu                3.4966     0.3415       1.1631
t = r1 = r2 = 50
  k                  239.5      231.3        426.5
  kin                4.7        11.9         12
  cpu                0.8068     0.854        1.838
Example 2. We consider the MSFP case, where the sets are given by

C_i = {x ∈ ℝ^N | L_i ≤ x ≤ U_i}, i = 1, . . . , t,

Q_1 = {y ∈ ℝ^M | h_1(y) ≤ 81/2}, Q_2 = {y ∈ ℝ^M | h_2(y) ≤ 2},

where the C_i are generated by the following matlab-like codes:

L_i = rand(N,1)*20 + 10; U_i = rand(N,1)*40 + 40; i = 1,...,t.

h_1 and h_2 are two EUD functions given by

h_1(y) = ( Σ_{i=1}^{M/2} y_i^4 / (M/2) )^{1/4}, h_2(y) = ( Σ_{i=M/2+1}^{M} y_i^6 / (M/2) )^{1/6}.
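In MATLAB these two constraint functions read (a direct transcription of the formulas above; M is assumed even):

% EUD constraint functions of Example 2.
h1 = @(y, M) (sum(y(1:M/2).^4)   / (M/2))^(1/4);
h2 = @(y, M) (sum(y(M/2+1:M).^6) / (M/2))^(1/6);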
Here y(1 : M/2) and y(M/2 + 1 : M) simulate two structures of a body, respectively. The matrix A is generated as in Example 1. We compare Algorithm 3.3 with Algorithm 3.2 of [21], proposed by Zhao and Yang (denoted by ZY's method), which is a projection gradient-type method with a line search rule. The parameters of our method are taken as ν = 0.95, β = 0.0005, µ = 1.2, γ = 1.9, τ = σ = 0.55. For ZY's method, as suggested in [21], we take ρ = 0.8, γ = 1, µ = 0.8. The initial point of our method is (0, 0, rand), where "rand" means that the entries of the vector are randomly generated in (0, 1), while for ZY's method, the initial point is 0. The stopping criterion is
max_i {‖x^k − P_{C_i}(x^k)‖², q_1(Ax^k), q_2(Ax^k)} / max_i {‖x^0 − P_{C_i}(x^0)‖², q_1(Ax^0), q_2(Ax^0)} ≤ 10^{−4}.
The results are illustrated in Table 2, where kin denotes the total number of inner iterations. From Table 2, we see that when the size of the problem is small (M = N = 50, 100; t = 10, 30), our method is as competitive as ZY's method. As the size increases, our method performs better in terms of cpu time. This indicates that our method is promising for solving large scale problems.
Table 2: Algorithm 3.3 and ZY's method for Example 2 (r = 2)

M = N =                    50       100      500      800
t = 10  Alg 3.3    k       14.8     16.2     16       15.5
                   kin     2        2        2        2
                   cpu     0.0142   0.0218   0.0525   0.1149
        ZY         k       24.2     24.4     22.9     22.9
                   kin     0        0        0        0
                   cpu     0.0123   0.0217   0.0625   0.1555
t = 30  Alg 3.3    k       11.3     11.5     7.9      8.1
                   kin     3        2.9      3        3
                   cpu     0.0217   0.025    0.0421   0.0851
        ZY         k       25.2     25.5     22.4     22.4
                   kin     2        2        2        2
                   cpu     0.028    0.0346   0.086    0.189
t = 50  Alg 3.3    k       10.5     10.2     7.7      7.5
                   kin     3        3        3        3
                   cpu     0.0238   0.0279   0.0501   0.0908
        ZY         k       26       24.2     22.3     22.1
                   kin     2        2        2        2
                   cpu     0.037    0.0444   0.1075   0.214
Example 3. The MSFP problem where the constraint sets are given by

C_i = {x ∈ ℝ^N | L_i ≤ x ≤ U_i}, Q_j = {y ∈ ℝ^M | ‖y − d_j‖ ≤ r_j},

which are generated by matlab-like codes as

L_i = rand(N,1)*5; U_i = rand(N,1)*50 + 20; i = 1,...,t,
d_j = rand(M,1)*10 + 60; r_j = rand(1)*100 + 500; j = 1,...,r.

The matrix A is generated as in Example 1. Algorithm 4.4 is tested on this example. First we compare it with the method proposed by Xu [17] for finding the least l2-norm solution of an SFP problem, i.e., the case t = r = 1. Recall the iterative scheme of Xu:
x^{k+1} = P_C[ (1 − α_k γ_k)x^k − γ_k A^⊤(I − P_Q)Ax^k ],

where the author suggested letting the parameters be given by

α_k = k^{−δ} and γ_k = k^{−σ} with 0 < δ < σ < 1 and σ + 2δ < 1.

Here we choose δ = 0.25 and σ = 0.45. In Algorithm 4.4, we take the parameters α = 0.1, β = 0.05 and µ = 0.5. The initial point of our method is (x, z, y, λ, γ) = (0, 0, 0, rand, rand), while the initial point of Xu's method is x = 0. The stopping criterion of our method is

max{‖x^{k+1} − x^k‖, ‖z^{k+1} − z^k‖, ‖y^{k+1} − y^k‖, ‖λ^{k+1} − λ^k‖, ‖γ^{k+1} − γ^k‖} / max{‖x^1 − x^0‖, ‖z^1 − z^0‖, ‖y^1 − y^0‖, ‖λ^1 − λ^0‖, ‖γ^1 − γ^0‖} ≤ 10^{−4},

and that of Xu's method is

‖x^{k+1} − x^k‖ / ‖x^1 − x^0‖ ≤ 10^{−4}.
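A minimal MATLAB sketch of Xu's scheme with these parameter choices (our illustration; projC, projQ are the closed-form projections, and x0, A, tol are assumed given):

% Xu's scheme [17] for the SFP, alpha_k = k^(-0.25), gamma_k = k^(-0.45).
delta = 0.25; sigma = 0.45;
x = x0; k = 0; res = inf; res0 = [];
while res > tol
    k  = k + 1;
    ak = k^(-delta); gk = k^(-sigma);
    Ax = A*x;
    xn = projC((1 - ak*gk)*x - gk*(A'*(Ax - projQ(Ax))));  % (I - P_Q)Ax
    if k == 1, res0 = norm(xn - x); end    % denominator ||x^1 - x^0||
    res = norm(xn - x)/res0;               % relative stopping test
    x  = xn;
end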
The results are illustrated in Table 3, where p denotes the feasibility of the solution, i.e.,

p = max{‖x − P_C(x)‖_∞, ‖Ax − P_Q(Ax)‖_∞},

and ‖x‖ is the l2-norm of the solution.
Table 3: Algorithm 4.4 and Xu's method for Example 3

M = N =              10        50        100       200       500
Alg 4.4   k          12.8      13.4      17.3      20.3      65.4
          cpu        0.0046    0.0056    0.0063    0.008     0.031
          p          0.0017    0.0015    0         3.65e-04  0.0146
          ||x||      8.7752    23.8156   176.5826  404.4845  969.3512
Xu        k          2         2         561.8     529.9     677.3
          cpu        4.16e-04  7.15e-04  0.0873    0.1929    2.9067
          p          0         0         2.5652    6.0038    9.6437
          ||x||      8.7758    23.8158   136.9682  319.5067  812.7862
From Table 3, we see that when the size is small (M = N = 10, 50), both methods successfully solve the problem, and Xu's method performs extremely well. As the size increases, our method can still achieve the goal with a fast convergence speed, while Xu's method fails to find the solution, in the sense that x is infeasible (p is far from zero), and it needs many more iterations to converge.
Next we test Algorithm 4.4 on Example 3 in the MSFP case with different numbers of constraint sets. The parameters we set here are α_i = 0.005 and β_j = 0.0005. The results are illustrated in Table 4, where we can see that when the size is not too large, the method performs well; as the size increases, it needs many more iterations to find the solution.
Table 4: Algorithm 4.4 for Example 3

M = N =              10       50       100      200      500
t = r = 5    k       47.7     96.2     169.2    511      296.1
             cpu     0.0221   0.0563   0.1356   1.0938   2.1917
t = r = 10   k       55.8     113.5    140.1    219.6    284
             cpu     0.0337   0.0936   0.1843   0.682    2.555
t = r = 30   k       46.3     131      202.3    326.1    297.2
             cpu     0.0516   0.2047   0.546    2.0472   5.0032
t = r = 50   k       49.7     122      218.6    411.6    409.5
             cpu     0.0831   0.2852   0.9833   3.5949   8.1403
6 Conclusions
This paper presents three modified ADMs for solving two new MSFP models. The new models better reflect the real-world problem, and their nice separable structure enables us to apply ADMs with parallel features to solve them. Comparisons with some existing algorithms show the efficiency of our methods. Note that our parallel methods are implemented on a personal computer without a parallel architecture, which means the parallel steps in the methods have to be executed sequentially. If the methods were implemented on a parallel computer, they might enjoy a higher convergence speed.
7 Acknowledgement
This work was supported by the National Natural Science Foundation of China (Grant No.
10871105), Academic Scholarship for Doctoral Candidates, Ministry of Education of China
(Grant No. (190)H0511009) and Scientific Research Foundation for the Returned Overseas
Chinese Scholars, State Education Ministry.
References
[1] Y. Censor, T. Bortfeld, B. Martin and A. Trofimov, A unified approach for inversion
problems in intensity-modulated radiation therapy, Physics in Medicine and Biology, 51
(2006) 2353-2365.
[2] Y. Censor, T. Elfving, N. Kopf and T. Bortfeld, The multiple-sets split feasibility problem
and its applications for inverse problems, Inverse Problems 21 (2005) 2071-2084.
[3] Y. Censor, A. Motova and A. Segal, Perturbed Projections and Subgradient Projections for
the Multiple-Sets Split Feasibility Problem, J. Math. Anal. Appl., 327 (2007) 1244-1256.
[4] B. Choi and J. O. Deasy, The generalized equivalent uniform dose function as a basis for intensity-modulated treatment planning, Phys. Med. Biol., 47 (2002) 3579.
[5] F. Facchinei and J.S. Pang, Finite-Dimensional Variational Inequalities and Complemen-
tarity Problems, Vol. I and II. Springer Series in Operations Research. Springer Verlag,
New York, (2003).
[6] E. M. Gafni and D. P. Bertsekas, Two-metric projection methods for constrained optimiza-
tion, SIAM Journal on Control and Optimization, 22 (1984) 936-964.
[7] D. Han and X. Yuan, A Note on the Alternating Direction Method of Multipliers, J. Optim.
Theory Appl. DOI: 10.1007/s10957-012-0003-z.
[8] B. He, M. Tao and X. Yuan, A splitting method for separate convex programming with linking linear constraints, http://www.optimization-online.org/DB_FILE/2010/06/2665.pdf (2010).
[9] B. He, M. Tao and X. Yuan, On the O(1/t) convergence rate of Eckstein and Bertsekas's generalized alternating direction method of multipliers, http://www.math.hkbu.edu.hk/~xmyuan/Paper/ADM-EckBes-Nov15-2011.pdf (2011).
[10] B. He and X. Yuan, On the O(1/t) convergence rate of alternating direction method, http://www.optimization-online.org/DB_FILE/2011/09/3157.pdf (2011).
[11] B. He and X. Yuan, Linearized Alternating Direction Method with Gaussian Back Substitution for Separable Convex Programming, http://www.math.hkbu.edu.hk/~xmyuan/Paper/LADM-Gaussian-Yuan-2011.pdf.
[12] D. Kinderlehrer and G. Stampacchia, An Introduction to Variational Inequalities and their Applications, Academic Press, New York, 1980.
[13] B. Qu and N. Xiu, A note on the CQ algorithm for the split feasibility problem, Inverse
Problems 21 (2005) 1655-1665.
[14] J. Sun and S. Zhang, A modified alternating direction method for convex quadratically
constrained quadratic semidefinite programs, Eur. J. Oper. Res., 207 (2010) 1210-1220.
[15] M. Tao and X. Yuan, An inexact parallel splitting augmented Lagrangian method for
monotone variational inequalities with separable structures, Comput. Optim. Appl. DOI
10.1007/s10589-011-9417-z.
[16] PH. L. Toint, Global Convergence of a Class of Trust-Region Methods for Nonconvex
Minimization in Hilbert Space, IMA Journal of Numerical Analysis, 8 (1988) 231-252.
[17] H.-K. Xu, Iterative methods for the split feasibility problem in infinite-dimensional Hilbert spaces, Inverse Problems, 26 (2010) 105018 (17pp).
[18] Q. Yang, The relaxed CQ algorithm solving the split feasibility problem, Inverse Problems
20 (2004) 1261-1266.
[19] X. Yuan, An improved proximal alternating direction method for monotone variational
inequalities with separable structure, Comput. Optim. Appl. 49 (2011) 17-29.
[20] W. Zhang, D. Han and X. Yuan, An efficient simultaneous method for the constrained
multiple-sets split feasibility problem, Comput. Optim. Appl., DOI 10.1007/s10589-011-
9429-8.
[21] J. Zhao and Q. Yang, Self-adaptive projection methods for the multiple-sets split feasibility
problem, Inverse Problems, 27 (2011) 035009.