algebraic iterative methods for computed tomographypcha/hdtomo/sc/week2day1.pdf ·...

51
Algebraic Iterative Methods for Computed Tomography Per Christian Hansen DTU Compute Department of Applied Mathematics and Computer Science Technical University of Denmark Per Christian Hansen Algebraic Iterative Methods 1 / 51

Upload: others

Post on 31-Oct-2019

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Algebraic Iterative Methods for Computed Tomography

Per Christian Hansen

DTU ComputeDepartment of Applied Mathematics and Computer Science

Technical University of Denmark

Per Christian Hansen Algebraic Iterative Methods 1 / 51

Page 2: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Plan for Today

1 A bit of motivation.2 The algebraic formulation.3 Matrix notation and interpretation.4 Kaczmarz’s method (also known as ART) – fully sequential.5 Cimmino’s method and vatiants – fully simultaneous.6 More linear algebra: null space and least squares.7 The optimization viewpoint.

Points to take home today:Linear algebra provides a concise framework for formulating thealgorithms.Convergence analysis of iterative algebraic methods:

Kaczmarz’s method = ART converges for consistent problems only.Cimmino’s method always converges.

Be careful with the null space.Least squares problems always have a solution.The optimization viewpoint leads to a broad class of iterative methods.

Per Christian Hansen Algebraic Iterative Methods 2 / 51

Page 3: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

FBP: Filtered Back Projection

This is the classical method for 2D reconstructions.There are similar methods for 3D, such as FDK.Many year of use → lots of practical experience.The FBP method is very fast (it uses the Fast Fourier Transform)!The FBP method has low memory requirements.With many data, FBP gives very good results.Example with 3% noise:

Per Christian Hansen Algebraic Iterative Methods 3 / 51

Page 4: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

FBP Versus Algebraic Methods

Limited data, or nonuniform distribution of projection angles or rays→ artifacts appear in FBP reconstructions.Difficult to incorporate constraints (e.g., nonnegativity) in FBP.Algebraic methods are more flexible and adaptive.Same example with 3% noise and projection angles 15◦, 30◦, . . . , 180◦:

Algebraic Reconstruction Technique, box constraints (pixel values ∈ [0,1]).Per Christian Hansen Algebraic Iterative Methods 4 / 51

Page 5: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Another Motivating Example: Missing Data

Irregularly spaced angles & “missing” angles also cause difficulties for FBP:

Per Christian Hansen Algebraic Iterative Methods 5 / 51

Page 6: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Setting up the Algebraic Model

The damping of the ith X-ray through the object is a line integral of theattenuation coefficient χ(s) along the ray (from Lambert-Beer’s law):

bi =

∫rayi

χ(s) d`, i = 1, 2, . . . ,m.

Assume that χ(s) is a constant xj in pixel j . This leads to:

bi =∑

j∼rayi

aij xj , aij = length of rayi in pixel j ,

where the sum is over those pixels j that are intersected by rayi .

If we define aij = 0 for those pixels not intersected by rayi , then we have asimple sum

bi =n∑

j=1

aij xj , n = number of pixels.

Per Christian Hansen Algebraic Iterative Methods 6 / 51

Page 7: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

A Big and Sparse System

If we collect all m equations then we arrive at a system of linear equations

Ax = b

with a very sparse system matrix A. Example: 5× 5 pixels and 9 rays:

A really big advantage is that we only set up equations for the data that weactually have. In case of missing data, e.g., for certain projection angles orcertain rays in a projection, we just omit those from the linear system.

Per Christian Hansen Algebraic Iterative Methods 7 / 51

Page 8: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

The System Matrix is Very Sparse

Another example: 256× 256 pixels and 180 projections with 362 rays each.The system matrix A is 65, 160× 65, 536 and has ≈ 4.27 · 109 elements.There are 15, 018, 524 nonzero elements corresponding to a fill of 0.35%.

Per Christian Hansen Algebraic Iterative Methods 8 / 51

Page 9: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

The Simplest Algebraic Problem

One unknown, no noise:

Now with noise in the data – compute a weighted average:

We know from statistics that solution’s variance is inversely proportional tothe number of data. So more data is better.

Let us immediately continue with a 2× 2 image . . .

Per Christian Hansen Algebraic Iterative Methods 9 / 51

Page 10: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

A “Sudoku” Problem

Four unknowns, four rays → system of linear equations Ax = b:

Unfortunately there are infinitely many solutions, with k ∈ R:

(There is an arbitrary component in the null space of the matrix A.)Per Christian Hansen Algebraic Iterative Methods 10 / 51

Page 11: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

More Data Gives a Unique Solution

With enough rays the problem has a unique solution.Here, one more ray is enough to ensure a full-rank matrix:

The “difficulties” associated with the discretized tomography problem areclosely linked with properties of the coefficient matrix A:

The sensitivity of the solution to the data errors is characterized bythe condition number κ = ‖A‖2 · ‖A−1‖2 (not discussed today).The uniqueness of the solution is characterized by the rank of A, thenumber of linearly independent row or columns.

Per Christian Hansen Algebraic Iterative Methods 11 / 51

Page 12: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Algebraic Reconstruction Methods

In principle, all we need to do in the algebraic formulation is to solvethe large sparse linear system Ax = b:

Math: x = A−1b, MATLAB: x = A\b.

How hard can that be?Actually, this can be a formidable task if we try do use a traditionalapproach such as Gaussian elimination.Researchers in tomography have therefore focused on the use ofiterative solvers – and they have rediscovered many methodsdeveloped by mathematicians . . .In tomography they are called algebraic reconstruction methods.They are much more flexible than FBP, but at a highercomputational cost!

Per Christian Hansen Algebraic Iterative Methods 12 / 51

Page 13: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Some Algebraic Reconstruction Methods

Fully Sequential MethodsKaczmarz’s method + variants.These are row-action methods: they update the solution using one rowof A at a time.Fast convergence.

Fully Simultaneous MethodsLandweber, Cimmino, CAV, DROP, SART, SIRT, . . .These methods use all the rows of A simultaneously in one iteration(i.e., they are based on matrix multiplications).Slower convergence.

Krylov subspace methods (rarely used in tomography)CGLS, LSQR, GMRES, . . .These methods are also based on matrix multiplications.

Per Christian Hansen Algebraic Iterative Methods 13 / 51

Page 14: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Matrix Notation and Interpretation

Notation:

A =

| | |c1 c2 · · · cn

| | |

=

−−− r1 −−−...

−−− rm −−−

,

The matrix A maps the discretized absorption coefficients (the vector x) tothe data in the detector pixels (the elements of the vector b) via:

b =

b1b2...bm

= Ax = x1 c1 + x2 c2 + · · ·+ xn cn︸ ︷︷ ︸linear combination of columns

=

r1 · xr2 · x...

rm · x

.

The ith row of A maps x to detector element i via the ith ray:

bi = r i · x =n∑

j=1

aij xj , i = 1, 2, . . . ,m.

Per Christian Hansen Algebraic Iterative Methods 14 / 51

Page 15: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Example of Column Interpretation

A 32× 32 image has four nonzero pixels with intensities 1, 0.8, 0.6, 0.4.In the vector x these four pixels correspond to entries 468, 618, 206, 793.Hence the sinogram, represented as a vector b, takes the form

b = 0.6 c206 + 1.0 c468 + 0.8 c618 + 0.4 c793.

Per Christian Hansen Algebraic Iterative Methods 15 / 51

Page 16: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Forward and Back Again

Forward projection. Consider ray i :aij = length of ray i in pixel jr i = [ ai1 ai2 0 ai4 0 0 ai7 0 0 ]

bi = r i · x = ai1x1 + ai2x2 + ai4x4 + ai7x7

The forward projection b = Ax maps all imagepixels xj along the ray to the detector data bi .

�����

���

���

ray i

x1

x2

x3

x4

x5

x6

x7

x8

x9

Back projection. Another name for multipli-cation with AT :

ATb =m∑i=1

bi r i =m∑i=1

biai1biai2...

Each data bi is “distributed back” along theith ray to the pixels, with “weights” = aij :bi r i = [ biai1 biai2 0 biai4 0 0 biai7 0 0 ]T

����

���

���

�ray i

biai1

biai2

0

biai4

0

0

biai7

0

0

Per Christian Hansen Algebraic Iterative Methods 16 / 51

Page 17: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Geometric Interpretation of Ax = b

r1 · x = a11x1 + a12x2 + · · ·+ a1nxn = b1

r2 · x = a21x1 + a22x2 + · · ·+ a2nxn = b2...

rm · x = am1x1 + am2x2 + · · ·+ amnxn = bm.

Each equation r i · x = bi defines an affine hyperplane in Rn:

-

6

������������

x1

x2

ai1x1 + ai2x2 = bi

R2

-

6

����

SSSSSSS

""

"""SSSSSSS

"""

""

x1

x2

x3 R3

���

x2

ai1x1 + ai2x2 + ai3x3 = bi

Per Christian Hansen Algebraic Iterative Methods 17 / 51

Page 18: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Geometric Interpretation of the Solution

Assuming that the solution to Ax = b is unique, it is the point x ∈ Rn

where all the m affine hyperplanes intersect.

Example with m = n = 2:

-

6

������������

QQQQQQQQQQQQ

x1

x2

a11x1 + a12x2 = b1

a21x1 + a22x2 = b2

R2

Per Christian Hansen Algebraic Iterative Methods 18 / 51

Page 19: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Kaczmarz’s Method = Algebraic Reconstruction Technique

A simple iterative method based on the geometric interpretation.

In each iteration, and in a cyclic fashion, compute the new iteration vectorsuch that precisely one of the equations is satisfied.

This is achieved by projecting the current iteration vector x on one of thehyperplanes r i · x = bi for i = 1, 2, . . . ,m, 1, 2, . . . ,m, 1, 2, . . .

Originally proposed in 1937, and independently suggested under the nameART by Gordon, Bender & Herman in 1970 for tomographic reconstruction.

Per Christian Hansen Algebraic Iterative Methods 19 / 51

Page 20: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Orthogonal Projection on Affine Hyperplane

�������

�������

Hi = {x ∈ Rn | r i · x = bi}

rO

6r i

������ z

�����������

������:Pi (z)

The orthogonal projection Pi (z) of an arbitrary point z on the affinehyperplane Hi defined by r i · x = bi is given by:

Pi (z) = z +bi − r i · z‖r i‖22

r i , ‖r i‖22 = r i · r i .

In words, we scale the row vector r i by (bi − r i · z)/‖r i‖22 and add it to z .

Per Christian Hansen Algebraic Iterative Methods 20 / 51

Page 21: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Kaczmarz’s Method

We thus obtain the following algebraic formulation:

Basic Kaczmarz algorithm

x (0) = initial vectorfor k = 0, 1, 2, . . .

i = k (mod m)

x (k+1) = Pi

(x (k)

)= x (k) +

bi − r i · x (k)

‖r i‖22r i

end

Each time we have performed m iterations of this algorithm, we haveperformed one sweep over the rows of A. Other choices of sweeps:

Symmetric Kaczmarz: i = 1, 2, . . . ,m−1,m,m−1, . . . , 3, 2.Randomized Kaczmarz: select row i randomly, possibly withprobability proportional to the row norm ‖r i‖2.

Per Christian Hansen Algebraic Iterative Methods 21 / 51

Page 22: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Convergence Issues

The convergence of Kaczmarz’s method is quite obvious from the graph onslide 19 – but can we say more?

Difficulty: the ordering of the rows of A influences the convergence rate:

1.0 1.01.0 1.11.0 3.01.0 3.7

x =

2.02.14.04.7

The ordering 1–3–2–4 is preferable: almost twice as fast.

Per Christian Hansen Algebraic Iterative Methods 22 / 51

Page 23: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Convergence of Kaczmarz’s Method

One way to avoid the difficulty associated with influence of the ordering ofthe rows is to assume that we select the rows randomly.

For simplicity, assume that A is invertible and that all rows of A are scaledto unit 2-norm. Then the expected value E(·) of the error norm satisfies:

E(‖x (k) − x̄‖22

)≤(1− 1

n κ2

)k‖x (0) − x̄‖22, k = 1, 2, . . . ,

where x̄ = A−1b and κ = ‖A‖2 ‖A−1‖2. This is linear convergence.

When κ is large we have(1− 1

n κ2

)k≈ 1− k

n κ2 .

After k = n steps, corresp. to one sweep, the reduction factor is 1− 1/κ2.Note that there are often orderings for which the convergence is faster!

Per Christian Hansen Algebraic Iterative Methods 23 / 51

Page 24: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Cyclic Convergence

So far we have assumed that there is a unique solution x̄ = A−1b thatsatisfies Ax = b, i.e., all the affine hyperplanes associated with the rowsof A intersect in a single point.

What happens when this is not true? → cyclic convergence:

m = 3, n = 2

Kaczmarz’s method can be brought to converge to a unique point, and wewill discuss the modified algorithm later today.

Per Christian Hansen Algebraic Iterative Methods 24 / 51

Page 25: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

From Sequential to Simultaneous Updates

Karzmarz’s method accesses the rows sequentially. Cimmino’s methodaccesses the rows simultaneously and computes the next iteration vector asthe average of the all the projections of the previous iteration vector:

x (k+1) =1m

m∑i=1

Pi

(x (k)

)=

1m

m∑i=1

(x (k) +

bi − r i · x (k)

‖r i‖22r i)

= x (k) +1m

m∑i=1

bi − r i · x (k)

‖r i‖22r i .

�������

QQQQQQQ

QQ

QQ

��

���

H1

H2 rx (k)JJ

P1(x (k)) r

P2(x (k))r

x (k+1)���r

Per Christian Hansen Algebraic Iterative Methods 25 / 51

Page 26: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Matrix formulation of Cimmino’s Method

We can write the updating in our matrix-vector formalism as follows

x (k+1) = x (k) +1m

m∑i=1

bi − r i · x (k)

‖r i‖22r i

= x (k) +1m

( r1‖r1‖22

· · · rm‖rm‖22

) b1 − r1 · x (k)

...bm − rm · x (k)

= x (k) +1m

r1...

rm

T‖r1‖−2

2. . .

‖rm‖−22

b −

r1...

rm

x (k)

= x (k) + ATM−1(b − Ax (k)

),

where we introduced the diagonal matrix M = diag(m‖r i‖22

).

Per Christian Hansen Algebraic Iterative Methods 26 / 51

Page 27: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Cimmino’s Method

We thus obtain the following formulation:

Basic Cimmino algorithm

x (0) = initial vectorfor k = 0, 1, 2, . . .

x (k+1) = x (k) + ATM−1(b − Ax (k))

end

Note that one iteration here involves all the rows of A, while one iterationin Kaczmarz’s method involves a single row.Therefore, the computational work in one Cimmino iteration is equivalentto m iterations (a sweep over all the rows) in Kaczmarz’s basic algorithm.

The issue of finding a good row ordering is, of course, absent fromCimmino’s method.

Per Christian Hansen Algebraic Iterative Methods 27 / 51

Page 28: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Convergence Study

Assume x (0) = 0 and let I denote the n × n identity matrix; then:

x (k+1) =k∑

j=0

(I − ATM−1A)jATM−1b

=(I − (I − ATM−1A)k+1

)(ATM−1A)−1ATM−1b.

If A is invertible then

(ATM−1A)−1ATM−1b = A−1M A−TATM−1b = A−1b.

Moreover, the largest eigenvalue of the symmetric matrix I − ATM−1A isstrictly smaller than one, and therefore(

I − (I − ATM−1A)k+1)→ I for k →∞.

Hence the iterates x (k) converge to the solution x̄ = A−1b.Per Christian Hansen Algebraic Iterative Methods 28 / 51

Page 29: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Convergence of Cimmino’s Method

To simplify the result, assume that A is invertible and that the rows of Aare scaled such that ‖A‖22 = m. Then

‖x (k) − x̄‖22 ≤(1− 2

1 + κ2

)k‖x (0) − x̄‖22

where x̄ = A−1b, κ = ‖A‖2 ‖A−1‖2, and we have linear convergence.

When κ� 1 then we have the approximate upper bound

‖x (k) − x̄‖22 <∼ (1− 2/κ2)k ‖x (0) − x̄‖22,

showing that in each iteration the error is reduced by a factor 1− 2/κ2.

This is almost the same factor as in one sweep through the rows of A inKaczmarz’s method.

Per Christian Hansen Algebraic Iterative Methods 29 / 51

Page 30: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Rectangular and/or Rank Deficient Matrices

In tomography, A ∈ Rm×n is almost always a rectangular matrix: m 6= n.It is also very common that A does not have full rank.We need to set the stage for treating such matrices.

The rank r of A is the number of linearly independent rows (equal to thenumber of linearly independent columns), and r ≤ min(m, n).

The range R(A) is the linear subspace spanned by the columns of A:

R(A) ≡ {u ∈ Rm |u = α1c1 + α2c2 + · · ·+ αncn, arbitrary αj}. (1)

The null space N (A) is the linear subspace of all vectors mapped to zero:

N (A) ≡ {v ∈ Rn |Av = 0}. (2)

The dimensions of the two subspaces are r and n−r , respectively.

Per Christian Hansen Algebraic Iterative Methods 30 / 51

Page 31: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

A Small Example

Consider the 3× 3 matrix

A =

1 2 34 5 67 8 9

.

This matrix has rank r = 2 since the middle row is the average of the firstand third rows which are linearly independent.

The range R(A) and null space N (A) consist of all vectors of the forms

α1

147

+ α2

369

=

α1 + 3α24α1 + 6α27α1 + 9α2

and α3

1−21

,

respectively, for arbitrary α1, α2, and α3.

Per Christian Hansen Algebraic Iterative Methods 31 / 51

Page 32: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

A Small Example, Continued

Consider two linear systems with the matrix A from the previous example:1 2 34 5 67 8 9

x =

142050

and

1 2 34 5 67 8 9

x =

61524

.

The left system has no solution because b /∈ R(A); no matter which linearcombination of the columns of A we create, we can never form this b.

The right system has infinitely many solutions; any vector of the form

x =

111

+ α

1−21

, α arbitrary

satisfies this equation. The arbitrary component is in the null space N (A).

Per Christian Hansen Algebraic Iterative Methods 32 / 51

Page 33: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Null Space Artifacts in Tomography I

Image has 16× 16 pixels and we use 16 horizontal and 16 vertical X-rays→ very under-determined system.

Left: three different vectors in the null space N (A).

Right: the exact (“ground truth”) image x̄ and the reconstruction.One pixel at the intersection of the vertical and horizontal “strips” has alarge and incorrect value, and the values of the horizontal and vertical“strips” are slightly too low.

Per Christian Hansen Algebraic Iterative Methods 33 / 51

Page 34: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Null Space Artifacts in Tomography II

The mage has 29× 29 pixels and we use projection angles in [50◦, 130◦]→ A is 841× 841 and rank deficient; the dimension of N (A) is 24.

Both reconstructions are imperfect, and the vertical structure of the secondtest image is almost completely lost in the reconstruction.

Per Christian Hansen Algebraic Iterative Methods 34 / 51

Page 35: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Null Space Artifacts in Tomography III

Same example – the 24 images that span the null space N (A) representinformation about missing vertical structures in the reconstruction:

This illustrates that intuition and mathematics go hand-in-hand.

Per Christian Hansen Algebraic Iterative Methods 35 / 51

Page 36: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Consistent and Inconsistent Systems

A system is consistent if there exists at least one x such that Ax = b, i.e.,such that b is a linear combination of the columns c i of A.This is equivalent to the requirement b ∈ R(A).

Otherwise the system is inconsistent, b /∈ R(A), as shown below.

�������

�������

-�����

����*

������

����:

�������������>

����

����

�����

R(A)

c1

c2

c3

b

This is actually the normal situation in problems with measurement noise.Per Christian Hansen Algebraic Iterative Methods 36 / 51

Page 37: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Overview of Systems: m ≤ n

Full rank Rank deficientm < nUnderdetermined

=

r = mAlways consistent.Always infinitelymany solutions.

r < mCan be inconsistent.No solution orinfinitely manysolutions.

m = nSquare

=

r = m = nAlways consistent.Always aunique solution.

r < m = nCan be inconsistent.No solution orinfinitely manysolutions.

The system is inconsistent when b /∈ R(A).There is a unique solution only if r = m = n.

Per Christian Hansen Algebraic Iterative Methods 37 / 51

Page 38: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Overview of Systems: m > n

Full rank Rank deficientm > nOverdetermined

=

r = nCan be inconsistent.No solution ora uniquesolution.

r < nCan be inconsistent.No solution orininitely manysolutions

The system is inconsistent when b /∈ R(A).

There is a unique solution only if r = n and the system is consistent.

Per Christian Hansen Algebraic Iterative Methods 38 / 51

Page 39: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

The Least Squares Solution

We must define a unique solution for inconsistent systems!

Assume that b = A x̄ + e and e is zero-mean Gaussian noise. The bestlinear unbiased estimate of x̄ is the solution to the least squares problem:

xLS = argminx

1/2 ‖b − Ax‖22,

and xLS is unique when r = n. Geometrically, this corresponds to findingxLS such that AxLS is orthogonal to the residual vector b − AxLS.

�������

�������

�������������>

����

����

�����1

6

R(A)

b

AxLS

Per Christian Hansen Algebraic Iterative Methods 39 / 51

Page 40: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Computing the Least Squares Solution

The requirement that AxLS ⊥ (b − AxLS) leads to:(AxLS

)T (b − AxLS)

= 0 ⇔ xTLS(ATb − ATAxLS

)= 0

which means that xLS is the solution to the normal equations:

ATAx = ATb ⇒ xLS = (ATA)−1ATb.

xLS exists and is unique when ATA is invertible, which is the case whenr = n (i.e., the system is over-determined and A has full rank).

Bonus info: the matrix A† = (ATA)−1AT is called the pseudoinverse(or Moore-Penrose inverse) of A.

Per Christian Hansen Algebraic Iterative Methods 40 / 51

Page 41: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

The Minimum-Norm Least Squares Solution

If r < n we can define a unique minimum-norm least squares solution by:

x0LS = argmin

x‖x‖2 subject to ATAx = ATb.

Example. Consider again the problem1 2 34 5 67 8 9

x =

142050

with r = 2 < n = 3,

xLS is not unique, and all least squares solutions have the form

xLS =

321

+ α

1−21

, α arbitrary.

The minimum-norm least squares solution x0LS is obtained by setting α = 0.

Per Christian Hansen Algebraic Iterative Methods 41 / 51

Page 42: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Weighted Least Squares Solutions and Cimmino

Recall our definition of the diagonal matrix M = diag(m‖r i‖22

).

We also define the weighted least squares problem

minx

1/2 ‖M−1/2(Ax − b)‖22 ⇔ (ATM−1A) x = ATM−1b

and the corresponding solution xLS,M = (ATM−1A)−1ATM−1b.

Similarly we define the minimum-norm weighted least squares solution

x0LS,M = argmin

x‖x‖2 subject to ATM−1Ax = ATM−1b.

The full picture of Cimmino’s method:

r = n = m: convergence to A−1b.r = n < m and b ∈ R(A): convergence to xLS.r = n < m and b /∈ R(A): convergence to xLS,M .r < min(m, n): convergence to x0

LS,M .

Per Christian Hansen Algebraic Iterative Methods 42 / 51

Page 43: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

The Optimization Viewpoint

Karczmarz, Cimmino and similar algebraic iterative methods are usuallyconsidered as solvers for systems of linear equations.

But it is more convenient to consider them as optimization methods.

Within this framework we can easily handle common extensions:We can introduce a relaxation parameter – or step length parameter –in the algorithm which controls the “size” of the updating and, as aconsequence, the convergence of the method:

a constant ω, ora parameter ωk that changes with the iterations.

We can also, in each updating step, incorporate a projection PC on asuitably chosen convex set C that reflects prior knowledge, such as

the positive orthant Rn+ → nonnegative solutions,

the n-dimensional box [0, 1]n → solution elements in [0,1].

We can introduce other norms than the 2-norm ‖ · ‖2, which canimprove the robustness of the method.

Per Christian Hansen Algebraic Iterative Methods 43 / 51

Page 44: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Example: Robust Solutions with the 1-norm

The 1-norm is well suited for handling “outliers” in the data:

minx‖Ax − b‖1, ‖Ax − b‖1 =

m∑i=1

|r i · x − bi |.

Consider two over-determined noisy problems with the same matrix:

A =

1 1 11 2 41 3 91 4 161 5 251 6 36

, A x̄ =

6

17345786

121

, b =

6.0001

17.028533.997157.006185.9965

120.9958

, bo =

6.0001

17.285033.997157.006185.9965

120.9958

.

Least squares solutions: xLS and xoLS; 1-norm solutions: x1 and xo

1:

xLS =

1.00412.00512.9989

, xoLS =

1.08112.01512.9943

, x1 =

0.99322.00872.9986

, xo1 =

0.99322.00882.9986

.

Per Christian Hansen Algebraic Iterative Methods 44 / 51

Page 45: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

The Least Squares Problem Revisited

The objective function and its gradient

F(x) = 1/2 ‖b − Ax‖22, ∇F(x) = −AT (b − Ax).

The method of steepest descent for minx F(x), with starting vector x (0),performs the updates

x (k+1) = x (k) − ωk ∇F(x) = x (k) + ωk AT (b − Ax),

where ωk is a step-length parameter that can depend on the iteration.

Also known as Landweber’s method – corresponds to M = I in Cimmino.

Cimmino’s method corresponds to the weighted problem:

FM(x) = 1/2 ‖M−1/2(b − Ax)‖22, ∇FM(x) = −ATM−1(b − Ax).

Per Christian Hansen Algebraic Iterative Methods 45 / 51

Page 46: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Incorporating Simple Constraints

We can include constraints on the elements of the reconstructed image.

Assume that we can write the constraint as x ∈ C, where C is a convex set;this includes two very common special cases:

Non-negativity constraints. The set C = Rn+ corresponds to

xi ≥ 0, i = 1, 2, . . . , n.

Box constraints. The set C = [0, 1]n (n-dimensional box) corresponds to

0 ≤ xi ≤ 1, i = 1, 2, . . . , n.

Per Christian Hansen Algebraic Iterative Methods 46 / 51

Page 47: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

The Projected Algorithms

Both algorithms below solve minx∈C ‖M−1/2(b − Ax)‖2.

Projected gradient algorithm (ωk < 2/‖ATMA‖2)

x (0) = initial vectorfor k = 0, 1, 2, . . .

x (k+1) = PC(x (k) + ωk ATM−1(b − Ax (k))

)end

Projected incremental gradient (Kaczmarz) algorithm (ωk < 2)

x (0) = initial vectorfor k = 0, 1, 2, . . .

i = k (mod m)

x (k+1) = PC

(x (k) + ωk

bi − r i · x‖r i‖22

r i)

end

Per Christian Hansen Algebraic Iterative Methods 47 / 51

Page 48: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Iteration-Dependent Relaxation Parameter ωk

The basic Kaczmarz algorithm gives a cyclic and non-convergent behavior.

Consider the example from slide 24 with:

ωk = 0.8 (independent of k) and ωk = 1/√k, k = 0, 1, 2, . . .

The rightmost plot is a “zoom” of the middle plot.

With a fixed ωk < 1 we still have a cyclic non-convergent behavior.With the diminishing relaxation parameter ωk = 1/

√k → 0 as k →∞

the iterates converge to the weighted least squares solution xLS,M .

Per Christian Hansen Algebraic Iterative Methods 48 / 51

Page 49: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Overview of Convergence (see also slides 37–38)

What the unconstr. methods converge to, with starting vector x (0) = 0:Kac: Kaczmarz’s method with a fixed relaxation parameter.K–d: Kaczmarz’s method with a diminishing parameter.Cim: Cimmino’s method with a fixed relaxation parameter.

r < min(m, n) r = min(m, n)

b ∈ R(A) b /∈ R(A) b ∈ R(A) b /∈ R(A)

m < n x0LS = x0

LS,M

m = n x0LS = x0

LS,M A−1bKac: cyclic Kac: cyclic

m > n x0LS = x0

LS,M K–d: x0LS,M xLS = xLS,M K–d: xLS,M

Cim: x0LS,M Cim: xLS,M

NB: for over-det. systems with b /∈ R(A), x0LS,M 6= x0

LS and xLS,M 6= xLS.

Per Christian Hansen Algebraic Iterative Methods 49 / 51

Page 50: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

Last Slide: Other Projected Gradient Algorithms

Many algorithm proposed in the literature (Landweber, CAV, DROP,SART, SIRT, . . . ) are special cases of the following general formulation.

General projected gradient algorithm (λk < 2)

x (0) = initial vectorfor k = 0, 1, 2, . . .

x (k+1) = PC(x (k) + ωk D−1ATM−1(b − Ax (k))

)end

Of particular interest is the method SIRT in which:

D = diag(‖c j‖1

), ‖c j‖1 =

∑mi=1 aij ,

M = diag(‖r i‖1

), ‖r i‖1 =

∑nj=1 aij

SIRT is implemented in the ASTRA software, to be discussed tomorrow.

Per Christian Hansen Algebraic Iterative Methods 50 / 51

Page 51: Algebraic Iterative Methods for Computed Tomographypcha/HDtomo/SC/Week2Day1.pdf · AlgebraicIterativeMethodsforComputedTomography PerChristianHansen DTUCompute DepartmentofAppliedMathematicsandComputerScience

A Few References

T. Elfving, T. Nikazad, and C. Popa, A class of iterative methods:semi-convergence, stopping ryles, inconsistency, and constraining; in Y.Censor, Ming Jiang, and Ge Wang (Eds.), “Biomedical Mathematics:Promising Directions in Imaging, Therapy Planning, and InverseProblems,” Medical Physics Publishing, Madison, Wisconsin, 2010.P. C. Hansen and M. Saxild-Hansen, AIR Tools – A MATLAB packageof algebraic iterative reconstruction methods, J. Comp. Appl. Math.,236 (2012), pp. 2167–2178.G. T. Herman, Fundamentals of Computerized Tomography – ImageReconstruction from Projections, Springer, New York, 2009.M. Jiang and G. Wang, Convergence studies on iterative algorithmsfor image reconstruction, IEEE Trans. Medical Imaging, 22 (2003), pp.569–579.A. C. Kak and M. Slaney, Principles of Computerized TomographicImaging, IEEE Press, 1988.

Per Christian Hansen Algebraic Iterative Methods 51 / 51