Iterative projection methods for sparse nonlinear eigenvalue problems
Heinrich Voss, [email protected]
Hamburg University of Technology, Institute of Mathematics
TUHH Heinrich Voss Iterative projection methods Shanghai, May 27, 2012 1 / 61
Outline
1 Introduction
2 Numerical methods for dense problems
3 Iterative projection methods
4 Variational characterization of eigenvalues
5 Numerical example
Introduction
Nonlinear eigenvalue problem
The term nonlinear eigenvalue problem is not used in a unique way in the literature.
On the one hand: parameter-dependent operator equations that are nonlinear with respect to the state variable,
T(λ, u) = 0,
discussing
— positivity of solutions
— multiplicity of solutions
— dependence of solutions on the parameter; bifurcation
— (change of) stability of solutions
Introduction
In this presentation
For D ⊂ C let T(λ) ∈ C^{n×n}, λ ∈ D, be a family of matrices (more generally, a family of closed linear operators on a Banach space).
Find λ ∈ D and x ≠ 0 such that
T(λ)x = 0.
Then λ is called an eigenvalue of T(·), and x a corresponding eigenvector.
In this talk we assume the matrices to be large and sparse.
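As a quick sanity check, the definition can be exercised on the simplest family T(λ) = A − λI, where it reduces to the ordinary eigenvalue problem. The following small numpy sketch (the matrix A is an ad-hoc example, not from the talk) verifies that T(λ) is singular at an eigenvalue:

```python
import numpy as np

# For the linear family T(lam) = A - lam*I, the condition "T(lam)x = 0 for
# some x != 0" is exactly the ordinary eigenvalue problem A x = lam x,
# so T(lam) must be singular at an eigenvalue of A.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

lam = np.linalg.eigvalsh(A)[0]                 # smallest eigenvalue of A
T = A - lam * np.eye(3)                        # T(lam) = A - lam*I

sigma_min = np.linalg.svd(T, compute_uv=False)[-1]
print(sigma_min < 1e-10)                       # True: T(lam) is singular
```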
Introduction
Electronic structure of quantum dots
Semiconductor nanostructures have attracted tremendous interest in the past few years because of their special physical properties and their potential for applications in micro- and optoelectronic devices.
In such nanostructures, the free carriers are confined to a small region of space by potential barriers, and if the size of this region is less than the electron wavelength, the electronic states become quantized at discrete energy levels.
The ultimate limit of low-dimensional structures is the quantum dot, in which the carriers are confined in all three directions, thus reducing the degrees of freedom to zero.
Therefore, a quantum dot can be thought of as an artificial atom.
Introduction
Problem
Determine relevant energy states (i.e. eigenvalues) and corresponding wave functions (i.e. eigenfunctions) of a three-dimensional quantum dot embedded in a matrix.
Introduction
Problem ct.
Governing equation: Schrödinger equation

−∇ · ( ℏ²/(2m(x,E)) ∇Φ ) + V(x)Φ = EΦ,  x ∈ Ω_q ∪ Ω_m,

where ℏ is the reduced Planck constant, m(x,E) is the electron effective mass, and V(x) is the confinement potential. m and V are discontinuous across the heterojunction.

Boundary and interface conditions
Φ = 0 on the outer boundary of the matrix Ω_m
BenDaniel–Duke condition on the interface:

(1/m_m) ∂Φ/∂n |_{∂Ω_m} = (1/m_q) ∂Φ/∂n |_{∂Ω_q}
Introduction
Electron effective mass
The electron effective mass m = m_j, j ∈ {q, m}, depends on the energy level E.

Most simulations in the literature consider constant electron effective masses (e.g. m_q = 0.023 m_0 and m_m = 0.067 m_0 for an InAs quantum dot and the GaAs matrix, respectively, where m_0 is the free electron mass).

Numerical examples demonstrate that the electronic behavior is substantially different: for the pyramidal quantum dot in our examples there are only 3 confined electron states for the linear model, whereas the nonlinear model exhibits 7 confined states (i.e. energy levels which are smaller than the confinement potential). The ground state of the nonlinear model is about 7% and the first excited state about 18% smaller than for the linear approximation.
Introduction
Effective mass ct.
The dependence of m(x,E) on E can be derived from the eight-band k·p analysis and effective mass theory. Projecting the 8×8 Hamiltonian onto the conduction band results in the single Hamiltonian eigenvalue problem with

m(x,E) = m_q(E) for x ∈ Ω_q,  m(x,E) = m_m(E) for x ∈ Ω_m,

1/m_j(E) = (P_j²/ℏ²) ( 2/(E + g_j − V_j) + 1/(E + g_j − V_j + δ_j) ),  j ∈ {m, q},

where m_j is the electron effective mass, V_j the confinement potential, P_j the momentum, g_j the main energy gap, and δ_j the spin–orbit splitting in the j-th region.

Other types of effective mass (taking into account the effect of strain, e.g.) appear in the literature. They are all rational functions of E where 1/m(x,E) is monotonically decreasing with respect to E, and that's all we need.
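The monotonicity can be checked numerically; the parameter values below are illustrative assumptions (not the talk's InAs/GaAs data), chosen only to exercise the rational function 1/m_j(E):

```python
import numpy as np

# Evaluate 1/m_j(E) = (P_j^2/hbar^2) * ( 2/(E+g_j-V_j) + 1/(E+g_j-V_j+delta_j) )
# with hypothetical parameters (units with hbar = 1) and verify that it is
# monotonically decreasing in E, the property the talk relies on.
hbar = 1.0
P, g, V, delta = 0.8, 0.42, 0.0, 0.38          # illustrative values only

def inv_mass(E):
    return (P**2 / hbar**2) * (2.0 / (E + g - V) + 1.0 / (E + g - V + delta))

E = np.linspace(0.0, 1.0, 200)
vals = inv_mass(E)
print(bool(np.all(np.diff(vals) < 0)))         # True: 1/m(E) decreases in E
```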
Introduction
Variational form
Find E ∈ ℝ and Φ ∈ H^1_0(Ω), Φ ≠ 0, Ω := Ω_q ∪ Ω_m, such that

a(Φ,Ψ; E) := (ℏ²/2) ∫_{Ω_q} (1/m_q(x,E)) ∇Φ·∇Ψ dx + (ℏ²/2) ∫_{Ω_m} (1/m_m(x,E)) ∇Φ·∇Ψ dx
+ ∫_{Ω_q} V_q(x) ΦΨ dx + ∫_{Ω_m} V_m(x) ΦΨ dx
= E ∫_Ω ΦΨ dx =: E b(Φ,Ψ)  for every Ψ ∈ H^1_0(Ω)

Discretizing by FEM yields a rational matrix eigenvalue problem
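To make the resulting structure concrete, here is a minimal 1D finite-element analogue (my own toy discretization with lumped mass and ad-hoc coefficients, not the talk's 3D model). The discretized operator T(E) = (1/m_q(E)) K_q + (1/m_m(E)) K_m + A_V − E B depends rationally on E, and its smallest eigenvalue changes sign at an energy level:

```python
import numpy as np

# 1D toy problem on (0,1) with homogeneous Dirichlet conditions: assemble
# separate stiffness matrices for the "dot" (left half) and "matrix" (right
# half) regions, a lumped mass matrix, and a piecewise-constant potential.
nel = 41
h = 1.0 / nel
nodes = nel + 1
Kq = np.zeros((nodes, nodes))
Km = np.zeros((nodes, nodes))
loc = (1.0 / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])   # element stiffness
for e in range(nel):
    idx = np.ix_([e, e + 1], [e, e + 1])
    (Kq if e < nel // 2 else Km)[idx] += loc             # left half: dot
inner = slice(1, nodes - 1)                    # eliminate boundary nodes
Kq, Km = Kq[inner, inner], Km[inner, inner]
B = h * np.eye(nodes - 2)                      # lumped mass matrix
x_inner = np.arange(1, nodes - 1) * h
A_V = h * np.diag(np.where(x_inner < 0.5, 0.0, 0.35))    # potential term

inv_mq = lambda E: 2.0 / (E + 0.5)             # toy rational 1/m(E), decreasing
inv_mm = lambda E: 1.0 / (E + 0.6)
T = lambda E: inv_mq(E) * Kq + inv_mm(E) * Km + A_V - E * B

# T(E) depends rationally (not linearly) on E; its smallest eigenvalue is
# positive for small E and negative for large E, so T(E) becomes singular
# at some energy level in between.
lo = np.linalg.eigvalsh(T(0.01)).min()
hi = np.linalg.eigvalsh(T(10.0)).min()
print(lo > 0 > hi)
```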
Introduction
Classes of nonlinear eigenproblems
Quadratic eigenvalue problems (recent survey: Tisseur, Meerbergen 2001)

T(λ) = λ²A + λB + C,  A, B, C ∈ C^{n×n}

Dynamic analysis of structures
Stability analysis of flows in fluid mechanics
Signal processing
Vibration of spinning structures
Vibration of fluid–solid structures (Conca et al. 1992)
Lateral buckling analysis (Prandtl 1899)
Corner singularities of anisotropic material (Wendland et al. 1992)
Constrained least squares problems (Golub 1973)
Regularization of total least squares problems (Sima et al. 2004)
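For the quadratic case, the standard route to a dense solver is companion linearization; this numpy sketch (with random ad-hoc matrices) doubles the dimension and solves a linear pencil instead:

```python
import numpy as np

# Linearize T(lam) = lam^2*A + lam*B + C via z = [x; lam*x]:
#   [[0, I], [-C, -B]] z = lam [[I, 0], [0, A]] z.
# For brevity the right-hand matrix is inverted directly, which assumes A
# is nonsingular.
rng = np.random.default_rng(0)
n = 4
A, B, C = rng.standard_normal((3, n, n))

Z, I = np.zeros((n, n)), np.eye(n)
L = np.block([[Z, I], [-C, -B]])
M = np.block([[I, Z], [Z, A]])
lams, zs = np.linalg.eig(np.linalg.solve(M, L))

# every eigenvalue of the pencil makes T(lam) singular; check one residual
lam, x = lams[0], zs[:n, 0]
res = np.linalg.norm((lam**2 * A + lam * B + C) @ x) / np.linalg.norm(x)
print(res < 1e-6)
```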
Polynomial eigenvalue problems

T(λ) = Σ_{j=0}^{k} λ^j A_j,  A_j ∈ C^{n×n}

Optimal control problems (Mehrmann 1991)
Corner singularities of anisotropic material (Kozlov et al. 2001)
Dynamic element discretization of linear eigenproblems (V. 1987)
Least squares element methods (Rothe 1989)
System of coupled Euler–Bernoulli and Timoshenko beams (Balakrishnan et al. 2004)
Nonlinear integrated optics (Botchev et al. 2008)
Electronic states of quantum dots (Hwang et al. 2005)
Rational eigenvalue problems

T(λ) = λ²M + K + Σ_{m=1}^{NMAT} 1/(1 + b_m λ) ΔK_m

Vibration of structures with viscoelastic damping
Vibration of sandwich plates (Soni 1981)
Dynamics of plates with concentrated masses (Andreev et al. 1988)
Vibration of fluid–solid structures (Conca et al. 1989)
Exact condensation of linear eigenvalue problems (Wittrick, Williams 1971)
Electronic states of semiconductor heterostructures (Luttinger, Kohn 1954)
General nonlinear eigenproblems

T(λ) = K − λM + i Σ_{j=1}^{p} √(λ − σ_j) W_j

Exact dynamic element methods
Nonlinear eigenproblems in accelerator design (Igarashi et al. 1995)
Vibrations of poroelastic structures (Dazel et al. 2002)
Vibro-acoustic behavior of piezoelectric/poroelastic structures (Batifol et al. 2007)
Nonlinear integrated optics (Dirichlet-to-Neumann) (Botchev 2008)
Stability of acoustic pressure levels in combustion chambers (van Leeuwen 2007)
Exponential dependence on eigenvalues

d/dt s(t) = A_0 s(t) + Σ_{j=1}^{p} A_j s(t − τ_j),  s(t) = x e^{λt}

⇒ T(λ) = −λI + A_0 + Σ_{j=1}^{p} e^{−τ_j λ} A_j

Stability of time-delay systems
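For a scalar delay equation the resulting nonlinear eigenvalue problem can even be solved in closed form via the Lambert W function, which gives a handy correctness check (toy coefficients of my own choosing):

```python
import numpy as np
from scipy.special import lambertw

# For s'(t) = a*s(t) + b*s(t - tau), the ansatz s = x*exp(lam*t) gives the
# scalar nonlinear eigenvalue problem T(lam) = -lam + a + b*exp(-tau*lam) = 0.
# Its principal root is lam = a + W(b*tau*exp(-a*tau)) / tau, with W the
# principal Lambert W branch.
a, b, tau = -1.0, 0.5, 2.0
lam = a + lambertw(b * tau * np.exp(-a * tau)).real / tau

residual = -lam + a + b * np.exp(-tau * lam)
print(abs(residual) < 1e-10)   # True: lam solves T(lam) = 0
```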
Numerical methods for dense problems
Outline
1 Introduction
2 Numerical methods for dense problems
3 Iterative projection methods
4 Variational characterization of eigenvalues
5 Numerical example
Numerical methods for dense problems
Inverse iteration
Require: initial pair (λ_0, x_0) close to the wanted eigenpair
1: for k = 0, 1, 2, … until convergence do
2: solve T(λ_k) u_{k+1} = T′(λ_k) x_k for u_{k+1}
3: λ_{k+1} = λ_k − (v^H x_k)/(v^H u_{k+1})
4: normalize x_{k+1} = u_{k+1}/(v^H u_{k+1})
5: end for
Inverse iteration (being a variant of Newton’s method) converges locally andquadratically for isolated eigenpairs.
Replacing the update of λ by the Rayleigh functional, inverse iteration becomes cubically convergent for real symmetric problems (Rothe, V. 1989); in the general case cubic convergence was proved for a two-sided version of Rayleigh functional iteration (Schreiber 2008).
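Nonlinear inverse iteration (with the Newton update λ ← λ − (v^H x)/(v^H u)) can be sketched in a few lines of numpy; the symmetric quadratic problem T(λ) = λ²M + λD + K below is an ad-hoc toy example, not one from the talk, and complex arithmetic is used since its eigenvalues are complex:

```python
import numpy as np

# Toy symmetric quadratic problem T(lam) = lam^2*M + lam*D + K (diagonal for
# transparency; eigenvalues are the roots of lam^2 + d_i*lam + k_i).
M = np.eye(3)
D = np.diag([0.6, 0.5, 0.4])
K = np.diag([2.0, 5.0, 9.0])
T  = lambda lam: lam**2 * M + lam * D + K
Tp = lambda lam: 2 * lam * M + D               # derivative T'(lam)

v = np.ones(3)                                 # normalization vector
lam = 1.3j                                     # start near a wanted eigenvalue
x = np.array([1.0, 0.2, 0.1], dtype=complex)
x = x / (v @ x)                                # enforce v^H x = 1
for _ in range(20):
    u = np.linalg.solve(T(lam), Tp(lam) @ x)   # u = T(lam)^{-1} T'(lam) x
    lam = lam - (v @ x) / (v @ u)              # Newton update of lam
    x = u / (v @ u)                            # renormalize

res = np.linalg.norm(T(lam) @ x)
print(res < 1e-8)                              # converged to an eigenpair
```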
Numerical methods for dense problems
Residual inverse iteration (Neumaier 1985)
Require: initial pair (λ_0, x_0) close to the wanted eigenpair, and a shift σ
1: for k = 0, 1, 2, … until convergence do
2: solve v^H T(σ)^{−1} T(λ_{k+1}) x_k = 0 for λ_{k+1}
(or x_k^H T(λ_{k+1}) x_k = 0 if T(λ) is Hermitian and λ_{k+1} is real)
3: solve T(σ) d_k = T(λ_{k+1}) x_k for d_k
4: set u_{k+1} = x_k − d_k, and normalize x_{k+1} = u_{k+1}/(v^H u_{k+1})
5: end for
If T(λ) is twice continuously differentiable, λ̂ is a simple zero of det T(λ), and x̂ is an eigenvector normalized by v^H x̂ = 1, then residual inverse iteration converges for all σ sufficiently close to λ̂, and it holds that

‖x_{k+1} − x̂‖ / ‖x_k − x̂‖ = O(|σ − λ̂|)  and  |λ_{k+1} − λ̂| = O(‖x_k − x̂‖^q),

where q = 2 if T(λ) is Hermitian, λ̂ is real, and λ_{k+1} solves x_k^H T(λ_{k+1}) x_k = 0 in Step 2, and q = 1 otherwise.
The domains of attraction of inverse iteration and residual inverse iteration are often very small, and both methods are very sensitive with respect to inexact solves of linear systems.
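The method can be sketched in numpy for an ad-hoc overdamped symmetric quadratic problem with real eigenvalues (my own toy matrices, not from the talk); the Hermitian variant of the λ-update is used, where x^T T(λ)x = 0 is a scalar quadratic in λ:

```python
import numpy as np

# Overdamped symmetric quadratic toy problem with real eigenvalues; the shift
# sigma stays fixed, so in a large-scale setting T(sigma) could be factorized
# once and reused.
M = np.eye(3)
D = np.diag([10.0, 11.0, 12.0])
K = np.diag([2.0, 5.0, 9.0])
T = lambda lam: lam**2 * M + lam * D + K

sigma = -0.25                                  # shift near wanted eigenvalue
v = np.ones(3)
lam = sigma
x = np.array([1.0, 0.3, 0.1])
x = x / (v @ x)
for _ in range(30):
    # Hermitian/real case: update lam from x^T T(lam) x = 0 (quadratic in lam)
    coeffs = [x @ M @ x, x @ D @ x, x @ K @ x]
    roots = np.roots(coeffs)
    lam = roots[np.argmin(abs(roots - lam))]   # root closest to current lam
    d = np.linalg.solve(T(sigma), T(lam) @ x)  # T(sigma) d = T(lam) x
    u = x - d
    x = u / (v @ u)                            # normalize v^T x = 1

res = np.linalg.norm(T(lam) @ x)
print(res < 1e-8)                              # converged eigenpair near sigma
```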
Iterative projection methods
Outline
1 Introduction
2 Numerical methods for dense problems
3 Iterative projection methods
4 Variational characterization of eigenvalues
5 Numerical example
TUHH Heinrich Voss Iterative projection methods Shanghai, May 27, 2012 16 / 61
Iterative projection methods
Iterative projection methods

For linear sparse eigenproblems

T(λ) = λB − A

very efficient methods are iterative projection methods (e.g. the Lanczos, Arnoldi, and Jacobi–Davidson methods), where approximations to the wanted eigenvalues and eigenvectors are obtained from projections of the eigenproblem to subspaces of small dimension, which are expanded in the course of the algorithm.

Generalizations to nonlinear sparse eigenproblems
Nonlinear rational Krylov: Ruhe (2000, 2005), Jarlebring, V. (2005)
Arnoldi method: quadratic problems: Meerbergen (2001); general problems: V. (2003, 2004), Liao, Bai, Lee, Ko (2006), Liao (2007)
Jacobi–Davidson: polynomial problems: Sleijpen, Booten, Fokkema, van der Vorst (1996), Hwang, Lin, Wang, Wang (2004, 2005); general problems: T. Betcke, V. (2004), V. (2004, 2007), Schwetlick, Schreiber (2006), Schreiber (2008)
Iterative projection methods
Iterative projection method
Require: initial basis V with V^H V = I; set m = 1
1: while m ≤ number of wanted eigenvalues do
2: compute an eigenpair (μ, y) of the projected problem V^H T(λ) V y = 0
3: determine the Ritz vector u = Vy, ‖u‖ = 1, and the residual r = T(μ)u
4: if ‖r‖ < ε then
5: accept the approximate eigenpair λ_m = μ, x_m = u; increase m ← m + 1
6: reduce the search space V if necessary
7: choose an approximation (λ_m, u) to the next eigenpair, and compute r = T(λ_m)u
8: end if
9: expand the search space V = [V, v_new]
10: update the projected problem
11: end while
Main tasks:
expand the search space
choose an eigenpair of the projected problem (locking, purging)
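The loop above can be sketched for an ad-hoc diagonal overdamped quadratic problem: the projected quadratic problem is solved by companion linearization, and the search space is expanded with the preconditioned residual T(σ)^{-1}r, a residual-inverse-iteration-type direction. This is an illustration under those assumptions, not the talk's implementation:

```python
import numpy as np

# Toy overdamped quadratic problem T(lam) = lam^2*M + lam*D + K of size n.
n = 50
M = np.eye(n)
D = np.diag(np.linspace(10.0, 20.0, n))
K = np.diag(np.linspace(2.0, 9.0, n))
T = lambda lam: lam**2 * M + lam * D + K

sigma = -0.2                                   # shift near wanted eigenvalue
Tsig = T(sigma)                                # factorize once in practice
V = np.ones((n, 1), dtype=complex) / np.sqrt(n)
for _ in range(30):
    Vh = V.conj().T
    Mp, Dp, Kp = Vh @ M @ V, Vh @ D @ V, Vh @ K @ V
    m = V.shape[1]
    Z, I = np.zeros((m, m)), np.eye(m)
    # projected quadratic problem, solved via companion linearization
    pencil = np.linalg.solve(np.block([[I, Z], [Z, Mp]]),
                             np.block([[Z, I], [-Kp, -Dp]]))
    evals, evecs = np.linalg.eig(pencil)
    j = np.argmin(abs(evals - sigma))          # Ritz value closest to shift
    lam, y = evals[j], evecs[:m, j]
    u = V @ y
    u = u / np.linalg.norm(u)                  # Ritz vector
    r = T(lam) @ u                             # residual
    if np.linalg.norm(r) < 1e-10:
        break
    v = np.linalg.solve(Tsig, r)               # expansion direction
    v = v - V @ (V.conj().T @ v)               # orthogonalize against V
    V = np.hstack([V, (v / np.linalg.norm(v)).reshape(-1, 1)])

print(np.linalg.norm(T(lam) @ u) < 1e-8)
```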
Iterative projection methods
Expanding the subspace
Given a subspace V ⊂ C^n, expand V by a direction with high approximation potential for the next wanted eigenvector.

Let θ be an eigenvalue of the projected problem

V^H T(λ) V y = 0

and x = Vy a corresponding Ritz vector. Then inverse iteration yields the suitable candidate

v := T(θ)^{−1} T′(θ) x.

BUT: in each step one has to solve a large linear system with a varying matrix, and in a truly large problem the vector v will not be accessible; only an inexact solution ṽ := v + e of T(θ)v = T′(θ)x is available, and the next iterate will be a solution of the projection of T(λ)x = 0 onto the expanded space Ṽ := span{V, ṽ}.
Iterative projection methods
Expansion of search space ct.
We assume that x is already a good approximation to an eigenvector of T(·). Then v will be an even better approximation, and therefore the eigenvector we are looking for will be very close to the plane E := span{x, v}.

We therefore neglect the influence of the orthogonal complement of x in V on the next iterate and discuss the nearness of the planes E and Ẽ := span{x, ṽ}.

If the angle between these two planes is small, then the projection of T(λ) onto Ṽ should be similar to the one onto span{V, v}, and the approximation properties of inverse iteration should be maintained.

If this angle can become large, then it is not surprising that the convergence properties of inverse iteration are not reflected by the projection method.
Iterative projection methods
Theorem
Let φ0 = arccos(x^H v) denote the angle between the (normalized) vectors x and v, and denote the relative error of ṽ = v + e by ε := ‖e‖/‖v‖.

Then the maximal possible acute angle between the planes E and Ẽ is

β(ε) = arccos √(1 − ε²/sin²φ0)  if ε ≤ |sin φ0|,    β(ε) = π/2  if ε ≥ |sin φ0|.
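The bound β(ε) of the theorem is easy to evaluate. A small numerical sketch (φ0 chosen arbitrarily, not from the talk) confirms that the bound vanishes for ε = 0, saturates at π/2, and grows monotonically with the relative error:

```python
import numpy as np

def beta(eps, phi0):
    """Maximal acute angle between the exact and the perturbed plane."""
    s = abs(np.sin(phi0))
    return np.pi / 2 if eps >= s else np.arccos(np.sqrt(1.0 - (eps / s) ** 2))

phi0 = np.pi / 3                 # arbitrary angle between x and v
print(beta(0.0, phi0))           # 0.0: no perturbation, the planes coincide
print(beta(1.0, phi0))           # pi/2: the perturbed plane can tilt arbitrarily
```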
Iterative projection methods
“Proof”
Iterative projection methods
Expansion by inexact inverse iteration
Obviously, for every α ∈ R, α ≠ 0, the plane E is also spanned by x and x + αv.

If Ẽ(α) is the plane spanned by x and a perturbed realization x + αv + e of x + αv, then by the same arguments as in the proof of the Theorem the maximal angle between E and Ẽ(α) is

γ(α, ε) = arccos √(1 − ε²/sin²φ(α))  if ε ≤ |sin φ(α)|,    γ(α, ε) = π/2  if ε ≥ |sin φ(α)|,

where φ(α) denotes the angle between x and x + αv.

Since the mapping

φ ↦ arccos √(1 − ε²/sin²φ)

decreases monotonically, the expansion of the search space by an inexact realization of t := x + αv is most robust with respect to small perturbations if α is chosen such that x and x + αv are orthogonal.
Iterative projection methods
Expansion by inexact inverse iteration ct.
t := x + αv is orthogonal to x iff

t = x − (x^H x / x^H T(θ)^{−1}T′(θ)x) · T(θ)^{−1}T′(θ)x,    (∗)

which yields the maximal acute angle between E and Ẽ(α)

γ(α, ε) = arccos √(1 − ε²)  if ε ≤ 1,    γ(α, ε) = π/2  if ε ≥ 1.
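The orthogonality condition behind (∗) can be checked directly: choosing α = −x^H x / x^H v makes t = x + αv orthogonal to x. A sketch with illustrative random matrices standing in for T(θ) and T′(θ) (none of this data is from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
A = rng.standard_normal((n, n)) + n * np.eye(n)  # stands in for T(theta)
B = rng.standard_normal((n, n))                  # stands in for T'(theta)
x = rng.standard_normal(n); x /= np.linalg.norm(x)

v = np.linalg.solve(A, B @ x)   # inverse-iteration direction T(theta)^{-1} T'(theta) x
alpha = -(x @ x) / (x @ v)      # makes x^H (x + alpha*v) = 0
t = x + alpha * v               # this is exactly the vector in formula (*)
print(abs(x @ t))
```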
Iterative projection methods

Expansion by inexact inverse iteration

[Slides 25–27: figures illustrating the expansion by inexact inverse iteration.]
Iterative projection methods
Jacobi–Davidson method
The expansion

t = x − (x^H x / x^H T(θ)^{−1}T′(θ)x) · T(θ)^{−1}T′(θ)x    (∗)

of the current search space V is the solution of the equation

(I − T′(θ)x x^H / x^H T′(θ)x) T(θ) (I − x x^H / x^H x) t = −r,   t ⊥ x,

where r = T(θ)x denotes the residual.

This is the so-called correction equation of the Jacobi–Davidson method, which was derived in T. Betcke & Voss (2004), generalizing the approach of Sleijpen and van der Vorst (1996) for linear and polynomial eigenvalue problems.

Hence, the Jacobi–Davidson method is the most robust realization of an expansion of the search space such that the direction of inverse iteration is contained in the expanded space, in the sense that it is least sensitive to inexact solves of the linear systems T(θ)v = T′(θ)x.
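A sketch of setting up and solving the correction equation for illustrative stand-in matrices. In practice a few preconditioned GMRES or BiCGStab steps are used; here a dense minimum-norm least-squares solve keeps the sketch self-contained (the min-norm solution is automatically orthogonal to x, since x spans the null space of the projected operator):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
Tth  = rng.standard_normal((n, n)) + n * np.eye(n)   # stands in for T(theta)
dTth = rng.standard_normal((n, n))                   # stands in for T'(theta)
x = rng.standard_normal(n); x /= np.linalg.norm(x)
r = Tth @ x                                          # residual r = T(theta) x

p = dTth @ x
Pl = np.eye(n) - np.outer(p, x) / (x @ p)            # (I - T'x x^H / x^H T'x)
Pr = np.eye(n) - np.outer(x, x)                      # (I - x x^H)
A = Pl @ Tth @ Pr                                    # projected operator
t, *_ = np.linalg.lstsq(A, -Pl @ r, rcond=None)      # min-norm solution, t ⊥ x

print(abs(x @ t), np.linalg.norm(A @ t + Pl @ r))
```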
Iterative projection methods
Nonlinear Jacobi-Davidson
1: Start with orthonormal basis V; set m = 1
2: determine preconditioner M ≈ T(σ)^{−1}; σ close to first wanted eigenvalue
3: while m ≤ number of wanted eigenvalues do
4:   compute eigenpair (µ, y) of projected problem V^H T(λ)V y = 0
5:   determine Ritz vector u = Vy, ‖u‖ = 1, and residual r = T(µ)u
6:   if ‖r‖ < ε then
7:     accept approximate eigenpair λ_m = µ, x_m = u; increase m ← m + 1
8:     reduce search space V if necessary
9:     choose new preconditioner M ≈ T(µ)^{−1} if indicated
10:    choose approximation (λ_m, u) to next eigenpair, and compute r = T(λ_m)u
11:  end if
12:  solve approximately the correction equation
       (I − T′(µ)u u^H / u^H T′(µ)u) T(µ) (I − u u^H / u^H u) t = −r,   t ⊥ u
13:  t = t − V V^H t, v = t/‖t‖, reorthogonalize if necessary
14:  expand search space V = [V, v]
15:  update projected problem
16: end while
Iterative projection methods
Comments 1:
In step 1, prior information on eigenvectors can be introduced into the algorithm (e.g., approximate eigenvectors of contiguous problems in reanalysis).

If no information on eigenvectors is at hand, and if we are interested in eigenvalues close to the parameter σ ∈ D:

choose the initial vector at random, execute a few Arnoldi steps for the linear eigenproblem T(σ)u = θu or T(σ)u = θT′(σ)u, and choose the eigenvector corresponding to the smallest eigenvalue in modulus, or a small number of Schur vectors, as the initial basis of the search space.

Starting with a random vector without this preprocessing usually yields a Ritz value µ far away from σ and averts convergence.
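The suggested preprocessing can be sketched with SciPy's ARPACK interface: shift-and-invert Arnoldi for T(σ)u = θu, keeping the eigenvector of the eigenvalue smallest in modulus as the initial search space. The matrix below is an illustrative stand-in for T(σ), not from the talk:

```python
import numpy as np
from scipy.sparse.linalg import eigs

rng = np.random.default_rng(3)
n = 200
# illustrative stand-in for T(sigma): a diagonal matrix plus a small perturbation
Tsig = np.diag(np.linspace(0.1, 10.0, n)) + 0.001 * rng.standard_normal((n, n))

# shift-and-invert Arnoldi for the eigenvalue of T(sigma) smallest in modulus
vals, vecs = eigs(Tsig, k=1, sigma=0.0)
V = vecs[:, [0]] / np.linalg.norm(vecs[:, 0])   # initial basis of the search space
print(abs(vals[0]), V.shape)
```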
Iterative projection methods
Comments 4:
Since the dimension of the projected problem is small, it can be solved by any dense solver, such as
— linearization if T(λ) is polynomial
— methods based on the characteristic function det T(λ) = 0
— inverse iteration
— residual inverse iteration
— successive linear problems

Notice that symmetry properties of the original problem are inherited by the projected problem, and one can take advantage of these properties when solving the projected problem (safeguarded iteration, structure-preserving methods).
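For a polynomial T(λ), linearization turns the small projected problem into a linear (generalized) eigenproblem. A sketch for a quadratic (λ²M + λC + K)y = 0 via the first companion linearization; the 3×3 data are illustrative:

```python
import numpy as np
from scipy.linalg import eig

rng = np.random.default_rng(4)
k = 3
M = np.eye(k)
C = rng.standard_normal((k, k))
K = rng.standard_normal((k, k))

# first companion linearization: with z = [y; lam*y],
# (lam^2 M + lam C + K) y = 0  <=>  A z = lam B z
A = np.block([[np.zeros((k, k)), np.eye(k)],
              [-K,               -C]])
B = np.block([[np.eye(k),        np.zeros((k, k))],
              [np.zeros((k, k)), M]])
lam, Z = eig(A, B)

# verify one eigenpair of the quadratic problem
i = 0
y = Z[:k, i]
res = (lam[i]**2 * M + lam[i] * C + K) @ y
print(np.linalg.norm(res))
```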
Iterative projection methods
Comments 8:
As the subspaces expand in the course of the algorithm, the increasing storage and the computational cost for solving the projected eigenvalue problems may make it necessary to restart the algorithm and purge some of the basis vectors.

Since a restart destroys information on the eigenvectors, and particularly on the one the method is currently aiming at, we restart only if an eigenvector has just converged.

Reasonable search spaces after a restart are
— the space spanned by the already converged eigenvectors (or a slightly larger space)
— an invariant subspace of T(σ) or an eigenspace of T(σ)y = µT′(σ)y corresponding to eigenvalues that are small in modulus.
Iterative projection methods

Restarts

[Slide 36: plot of the accumulated CPU time (up to roughly 500 s) of the nonlinear eigensolver over about 120 iterations, without restarts.]

[Slide 37: the same experiment with LU updates and a restart; the accumulated CPU time stays below roughly 250 s.]
Iterative projection methods
Comments 12
The correction equation is solved approximately by a few steps of an iterative solver (GMRES or BiCGStab).

The operator T(µ) is restricted to map the subspace x⊥ into itself. Hence, if M ≈ T(σ) is a preconditioner of T(σ), σ ≈ µ, then a preconditioner for an iterative solver of the correction equation should be modified correspondingly to

M̃ := (I − T′(µ)x x^H / x^H T′(µ)x) M (I − x x^H / x^H x).

Taking the projectors in the preconditioner into account, i.e. using M̃ instead of M, raises the cost of the preconditioned Krylov solver only slightly (cf. Sleijpen, van der Vorst): only one additional linear solve with system matrix M is required.
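Solving with the projected preconditioner on x⊥ needs only one additional solve with M, which can be precomputed once per outer iteration. A sketch with illustrative data: Mpre stands in for M and dTx for T′(µ)x (not the talk's code):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 40
Mpre = np.diag(rng.uniform(1.0, 2.0, n))   # stands in for the preconditioner M
dTx  = rng.standard_normal(n)              # stands in for T'(mu) x
x = rng.standard_normal(n); x /= np.linalg.norm(x)

Minv = np.linalg.inv(Mpre)                 # in practice: a stored factorization of M
z = Minv @ dTx                             # the one additional solve with M

def solve_with_Mtilde(b):
    """Return t ⊥ x with (I - dTx x^H / x^H dTx) M t = (I - dTx x^H / x^H dTx) b."""
    w = Minv @ b                           # ordinary preconditioner application
    return w - z * ((x @ w) / (x @ z))     # one axpy enforces the projections

b = rng.standard_normal(n)
t = solve_with_Mtilde(b)
print(abs(x @ t))                          # t lies in x-perp
```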
Iterative projection methods
Non-iterative approximate solution
Hwang, Lin, Wang, Wang (2004) considered a non-iterative solution (which was already contained in the paper of Sleijpen et al. (1996), but was not considered in subsequent papers): solve

(I − T′(θ)u u^H / u^H T′(θ)u) T(θ) (I − u u^H / u^H u) t = −r,   t ⊥ u

approximately by computing

t = −M^{−1}r + αM^{−1}T′(θ)u  with  α := (u^H M^{−1}r) / (u^H M^{−1}T′(θ)u),

where M is a preconditioner of T(θ).

The method combines the preconditioned Arnoldi method (M^{−1}r) and simplified inverse iteration (M^{−1}T′(θ)u).
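A sketch of this one-shot approximate solve with illustrative data; the sign convention is the one for which the computed t is orthogonal to u:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 40
Minv = np.diag(1.0 / rng.uniform(1.0, 2.0, n))   # preconditioner inverse M^{-1}
dTu = rng.standard_normal(n)                     # stands in for T'(theta) u
u = rng.standard_normal(n); u /= np.linalg.norm(u)
r = rng.standard_normal(n)                       # stands in for the residual

w1 = Minv @ r                # "preconditioned Arnoldi" part M^{-1} r
w2 = Minv @ dTu              # "simplified inverse iteration" part M^{-1} T'(theta) u
alpha = (u @ w1) / (u @ w2)
t = -w1 + alpha * w2         # one-shot approximate correction, t ⊥ u
print(abs(u @ t))
```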
Iterative projection methods
Alternative expansion: Residual inverse It.
1: start with an approximation x_1 to an eigenvector of T(λ)x = 0
2: for ℓ = 1, 2, . . . until convergence do
3:   solve x_ℓ^H T(µ_{ℓ+1}) x_ℓ = 0 for µ_{ℓ+1}
4:   compute the residual r_ℓ = T(µ_{ℓ+1}) x_ℓ
5:   solve T(σ) d_ℓ = r_ℓ
6:   set x_{ℓ+1} = x_ℓ − d_ℓ, x_{ℓ+1} = x_{ℓ+1}/‖x_{ℓ+1}‖
7: end for
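A sketch of residual inverse iteration for a small symmetric quadratic family T(λ) = K − λI − λ²I (all data illustrative, not from the talk). For this T the Rayleigh functional equation x^H T(µ)x = 0 in step 3 has the closed-form solution µ = (−1 + √(1 + 4 x^H K x))/2, which keeps the sketch short:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20
S = 0.01 * rng.standard_normal((n, n))
K = np.diag(np.arange(1.0, n + 1)) + (S + S.T)   # symmetric, eigenvalues near 1..n
I = np.eye(n)
T = lambda lam: K - lam * I - lam**2 * I

sigma = 0.5                          # fixed shift near the wanted eigenvalue
Tsig_inv = np.linalg.inv(T(sigma))   # in practice: reuse an LU factorization

x = np.zeros(n); x[0] = 1.0          # starting approximation (step 1)
for _ in range(30):
    khat = x @ K @ x                 # x^H T(mu) x = khat - mu - mu^2 = 0 (step 3)
    mu = (-1.0 + np.sqrt(1.0 + 4.0 * khat)) / 2.0
    r = T(mu) @ x                    # residual (step 4)
    d = Tsig_inv @ r                 # solve T(sigma) d = r (step 5)
    x = x - d                        # update and normalize (step 6)
    x /= np.linalg.norm(x)

print(mu, np.linalg.norm(T(mu) @ x))
```

Note that the matrix T(σ) is factorized only once and kept fixed throughout the iteration.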
Iterative projection methods
Arnoldi method

This suggests: if θ is an eigenvalue of the projected problem V^H T(λ)Vy = 0 and x = Vy is a corresponding Ritz vector, then expand V by the new direction

v_RI = x − T(σ)^{−1}T(θ)x.

Since in iterative projection methods the new search direction is orthogonalized against the basis of the current search space, and since x is already contained in V, the expansion is chosen to be

v_A := T(σ)^{−1}T(θ)x.

The choice of v_A is more robust than v_RI since

|x^H v_RI|/‖v_RI‖ → 1  and  x^H v_A/‖v_A‖ → 0.

For the linear problem T(λ) = λI − A this is exactly the Cayley transformation or the shifted-and-inverted Arnoldi method. Therefore the resulting iterative projection method is called the nonlinear Arnoldi method, although no Krylov space is constructed and no Arnoldi recursion holds.
Iterative projection methods
Nonlinear Arnoldi Method
1: start with initial basis V, V^H V = I; set k = m = 1
2: determine preconditioner M ≈ T(σ)^{−1}, σ close to first wanted eigenvalue
3: while m ≤ number of wanted eigenvalues do
4:   solve V^H T(µ)V y = 0 for (µ, y) and set u = Vy, r_k = T(µ)u
5:   if ‖r_k‖/‖u‖ < ε then
6:     accept eigenpair λ_m = µ, x_m = u
7:     if m == number of wanted eigenvalues then STOP end if
8:     m = m + 1
9:     if ‖r_{k−1}‖/‖r_k‖ > tol then
10:      choose new pole σ, determine preconditioner M ≈ T(σ)^{−1}
11:    end if
12:    restart if necessary
13:    choose approximations µ and u to next eigenvalue and eigenvector
14:    determine r = T(µ)u and set k = 0
15:  end if
16:  v = Mr, k = k + 1
17:  v = v − V V^H v, v = v/‖v‖, V = [V, v], reorthogonalize if necessary
18: end while
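A bare-bones realization of this loop for the same illustrative quadratic family T(λ) = K − λI − λ²I used before (all data made up): the projected problem reduces to a symmetric linear eigenproblem for V^H K V, and the search space is expanded with the preconditioned residual Mr as in step 16.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100
S = 0.01 * rng.standard_normal((n, n))
K = np.diag(np.arange(1.0, n + 1)) + (S + S.T)
I = np.eye(n)
T = lambda lam: K - lam * I - lam**2 * I

sigma = 0.5
Minv = np.linalg.inv(T(sigma))        # preconditioner M ≈ T(sigma)^{-1} (step 2)

v = np.zeros(n); v[0] = 1.0
V = v[:, None]                        # initial basis (step 1)
for _ in range(30):
    # projected problem V^H T(lam) V y = 0 is (V^H K V) y = (lam + lam^2) y (step 4)
    kappa, Y = np.linalg.eigh(V.T @ K @ V)
    mu = (-1.0 + np.sqrt(1.0 + 4.0 * kappa[0])) / 2.0
    u = V @ Y[:, 0]
    r = T(mu) @ u
    if np.linalg.norm(r) < 1e-10:     # convergence test (step 5)
        break
    v = Minv @ r                      # expansion direction (step 16)
    v -= V @ (V.T @ v)                # orthogonalize against V (step 17)
    nv = np.linalg.norm(v)
    if nv < 1e-14:
        break
    V = np.hstack([V, (v / nv)[:, None]])

print(mu, np.linalg.norm(T(mu) @ u))
```

Only applications of the fixed preconditioner and small dense eigensolves are needed; no system with T(µ) is solved.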
Iterative projection methods
Locking of converged eigenvectors
A major problem with iterative projection methods for nonlinear eigenproblems when approximating more than one eigenvalue is to inhibit the method from converging to an eigenpair which has already been detected.

— linear problems: (incomplete) Schur decomposition
— quadratic problems: Meerbergen (2001), based on linearization and the Schur form of the linearized problem (lock 2 vectors in each step)
— cubic problems: Hwang, Lin, Liu, Wang (2005), based on linearization and knowledge of all eigenvalues of the linearized problem
— the approach of Hwang et al. can be generalized directly to general problems if all eigenvalues of the projected problems are determined
— for Hermitian problems one can often take advantage of a variational characterization of eigenvalues
— a new method for locking known eigenpairs based on invariant pairs was introduced by Kressner (2009) for Newton's method; however, it is not yet clear how to combine it with iterative projection methods.
Iterative projection methods
Locking of converged eigenvectors
A major problem with iterative projection methods for nonlinear eigenproblemswhen approximating more than one eigenvalue is to inhibit the method fromconverging to an eigenpair which was detected already previously.
linear problems: (incomplete) Schur decompositionquadratic problems: Meerbergen (2001) based on linearization andSchur form of linearized problem (lock 2 vectors in each step)cubic problem: Hwang, Lin, Liu, Wang (2005) based on linearization andknowledge of all eigenvalues of the linearized problemApproach of Hwang et al. can be generalized directly to general problemsif all eigenvalues of projected problems are determinedfor Hermitian problems one can often take advantage of a variationalcharacterization of eigenvaluesA new method for locking known eigenpairs based on invariant pairs wasintroduced by Kressner (2009) for Newton’s method. However, it is not yetclear how to combine it with iterative projection method.
TUHH Heinrich Voss Iterative projection methods Shanghai, May 27, 2012 44 / 61
Variational characterization of eigenvalues
Outline
1 Introduction
2 Numerical methods for dense problems
3 Iterative projection methods
4 Variational characterization of eigenvalues
5 Numerical example
Variational characterization of eigenvalues
Nonlinear minmax theory
Let T(λ) ∈ C^{n×n} with T(λ) = T(λ)^H for λ ∈ J, where J ⊂ R is an open interval, and assume that the entries of T depend continuously on λ.
Assume that for fixed x ∈ C^n, x ≠ 0, the real equation
f(λ, x) := x^H T(λ) x = 0
has at most one solution λ =: p(x) in J.
Then the equation f(λ, x) = 0 implicitly defines a functional p on some subset D of C^n, which we call the Rayleigh functional.
Assume further that
(λ − p(x)) f(λ, x) > 0 for every λ ≠ p(x) and every x ∈ D.
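The Rayleigh functional can be evaluated numerically by one-dimensional root finding on f(λ, x) = x^H T(λ) x. A minimal Python sketch; the function name and the linear test problem are illustrative, not from the talk:

```python
import numpy as np
from scipy.optimize import brentq

def rayleigh_functional(T, x, a, b):
    """Evaluate the Rayleigh functional p(x): the root of
    f(lam) = x^H T(lam) x in the interval J = (a, b).
    Returns None when f has no sign change on [a, b],
    i.e. x lies outside the domain D."""
    f = lambda lam: np.vdot(x, T(lam) @ x).real
    if f(a) * f(b) > 0:   # no bracketed root: p(x) undefined on J
        return None
    return brentq(f, a, b)

# Linear test problem T(lam) = lam*I - A, where p reduces to the Rayleigh quotient
A = np.diag([1.0, 2.0, 5.0])
T = lambda lam: lam * np.eye(3) - A
p = rayleigh_functional(T, np.array([1.0, 1.0, 0.0]), 0.0, 10.0)
# p equals x^H A x / x^H x = 1.5 for this x
```

Note that the sign condition (λ − p(x)) f(λ, x) > 0 means f increases through zero at p(x), which is exactly what the bracketing root finder exploits.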
Variational characterization of eigenvalues
Overdamped problems
If the Rayleigh functional p is defined on the entire space H \ {0}, then the eigenproblem T(λ)x = 0 is called overdamped.
This terminology is motivated by the finite dimensional quadratic eigenvalue problem
T(λ)x = λ²Mx + λαCx + Kx = 0
where M, C, and K are symmetric positive definite matrices.
— α = 0: all eigenvalues lie on the imaginary axis
— increasing α: the eigenvalues move into the left half plane as complex conjugate pairs
— increasing α further: the complex pairs reach the real axis and run in opposite directions
— increasing α further: all eigenvalues lie on the negative real axis
— increasing α further: all eigenvalues that moved to the left are smaller than all eigenvalues that moved to the right; the system is overdamped
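These eigenvalue trajectories can be reproduced numerically by linearizing the quadratic problem and computing the eigenvalues for growing α. A small sketch with illustrative 2 × 2 matrices (for this M, C, K the overdamping condition holds once α > 4):

```python
import numpy as np

def qep_eigenvalues(M, C, K):
    """All eigenvalues of lam^2 M + lam C + K via the companion linearization
    lam [x; lam x] = L [x; lam x]."""
    n = M.shape[0]
    Minv = np.linalg.inv(M)
    L = np.block([[np.zeros((n, n)), np.eye(n)],
                  [-Minv @ K, -Minv @ C]])
    return np.linalg.eigvals(L)

M = np.eye(2)
C = np.eye(2)
K = np.diag([1.0, 4.0])

for alpha in (0.0, 1.0, 10.0):
    lams = qep_eigenvalues(M, alpha * C, K)
    # alpha = 0: purely imaginary pairs; alpha = 1: complex conjugate pairs
    # in the left half plane; alpha = 10 (> 4, overdamped): all real, negative
```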
Variational characterization of eigenvalues
Quadratic overdamped problems
For quadratic overdamped systems the two solutions
p±(x) = ( −α〈Cx, x〉 ± sqrt( α²〈Cx, x〉² − 4〈Mx, x〉〈Kx, x〉 ) ) / ( 2〈Mx, x〉 )
of the quadratic equation
〈T(λ)x, x〉 = λ²〈Mx, x〉 + λα〈Cx, x〉 + 〈Kx, x〉 = 0 (1)
are real, and they satisfy sup_{x ≠ 0} p−(x) < inf_{x ≠ 0} p+(x).
Hence, equation (1) defines two Rayleigh functionals p− and p+ corresponding to the intervals
J− := (−∞, inf_{x ≠ 0} p+(x)) and J+ := (sup_{x ≠ 0} p−(x), ∞).
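For a quadratic problem the two functionals p± can be computed directly from the scalar quadratic (1). A sketch with illustrative data:

```python
import numpy as np

def p_plus_minus(M, C, K, alpha, x):
    """The two Rayleigh functionals p-(x), p+(x) of
    T(lam) = lam^2 M + lam alpha C + K, i.e. the real roots of (1)."""
    m = np.vdot(x, M @ x).real
    c = alpha * np.vdot(x, C @ x).real
    k = np.vdot(x, K @ x).real
    disc = c * c - 4.0 * m * k
    assert disc > 0, "no two real roots at x: system not overdamped there"
    r = np.sqrt(disc)
    return (-c - r) / (2.0 * m), (-c + r) / (2.0 * m)

# Overdamped example: at x = e1 equation (1) reads lam^2 + 10 lam + 1 = 0
pm, pp = p_plus_minus(np.eye(2), np.eye(2), np.diag([1.0, 4.0]),
                      10.0, np.array([1.0, 0.0]))
```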
Variational characterization of eigenvalues
Minmax principle
Poincaré's maxmin characterization was first generalized by Duffin (1955) to overdamped quadratic eigenproblems of finite dimension; for more general overdamped problems of finite dimension it was proved by Rogers (1964).
Infinite dimensional eigenvalue problems were studied by Turner (1967), Langer (1968), and Weinberger (1969), who proved generalizations both of the maxmin characterization of Poincaré and of the minmax characterization of Courant, Fischer, and Weyl for quadratic (and, by Turner (1968), for polynomial) overdamped problems.
The corresponding generalizations for general overdamped problems of infinite dimension were derived by Hadeler (1968). Similar results (weakening the compactness or smoothness requirements) are contained in Rogers (1968), Werner (1971), Abramov (1973), Hadeler (1975), Markus (1985), Maksudov & Gasanov (1992), and Hasanov (2002).
Variational characterization of eigenvalues
Nonoverdamped problems
For nonoverdamped eigenproblems the natural ordering, which calls the smallest eigenvalue the first one, the second smallest the second one, and so on, is not appropriate.
This is obvious if we make a linear eigenvalue problem
T(λ)x := (λI − A)x = 0
nonlinear by restricting it to an interval J which does not contain the smallest eigenvalue of A.
Then all conditions above are satisfied, p is the restriction of the Rayleigh quotient R_A to
D := {x ≠ 0 : R_A(x) ∈ J},
and inf_{x ∈ D} p(x) will in general not be an eigenvalue.
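The effect can be seen on a tiny example (matrices illustrative): restricting T(λ) = λI − A to an interval J that excludes the smallest eigenvalue of A leaves vectors in D whose Rayleigh quotient drops below every eigenvalue of A inside J:

```python
import numpy as np

A = np.diag([1.0, 2.0, 5.0])
R = lambda x: (np.vdot(x, A @ x) / np.vdot(x, x)).real   # Rayleigh quotient; p = R on D
J = (1.5, 10.0)   # interval excluding the smallest eigenvalue 1 of A

# This x mixes the eigenvectors of the eigenvalues 1 and 2: R(x) = 1.8 lies in J,
# yet below 2, the smallest eigenvalue of A inside J
x = np.array([1.0, 2.0, 0.0])
assert J[0] < R(x) < 2.0
# Pushing such mixtures toward R = 1.5 shows inf p over D equals 1.5,
# which is not an eigenvalue of A
```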
Variational characterization of eigenvalues
Enumeration of eigenvalues
If λ ∈ J is an eigenvalue of T(·), then μ = 0 is an eigenvalue of the linear problem T(λ)y = μy, and therefore there exists ℓ ∈ N such that
0 = max_{V ∈ H_ℓ} min_{v ∈ V, v ≠ 0} v^H T(λ) v / ‖v‖²
where H_ℓ denotes the set of all ℓ-dimensional subspaces of C^n.
In this case λ is called an ℓ-th eigenvalue of T(·).
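This enumeration can be checked numerically: λ is an ℓ-th eigenvalue exactly when 0 is the ℓ-th largest eigenvalue of the Hermitian matrix T(λ). A sketch with an illustrative linear test problem:

```python
import numpy as np

def eigenvalue_number(T, lam, tol=1e-8):
    """Return ell such that mu = 0 is the ell-th largest eigenvalue of the
    Hermitian matrix T(lam), i.e. the enumeration of lam as an eigenvalue
    of T(.); returns None if 0 is not an eigenvalue of T(lam)."""
    mu = np.sort(np.linalg.eigvalsh(T(lam)))[::-1]   # descending order
    idx = np.flatnonzero(np.abs(mu) < tol)
    return int(idx[0]) + 1 if idx.size else None

# Linear test problem: T(2) = diag(1, 0, -3), so 0 is its 2nd largest eigenvalue
A = np.diag([1.0, 2.0, 5.0])
T = lambda lam: lam * np.eye(3) - A
ell = eigenvalue_number(T, 2.0)   # lambda = 2 is a 2nd eigenvalue
```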
Variational characterization of eigenvalues
Minmax characterization (V., Werner 1982; V. 2009). Under the conditions given above the following hold:
(i) For every ℓ ∈ N there is at most one ℓ-th eigenvalue of T(·), which can be characterized by
λℓ = min_{V ∈ H_ℓ, V ∩ D ≠ ∅} sup_{v ∈ V ∩ D} p(v). (∗)
The set of eigenvalues of T(·) in J is at most countable.
(ii) If
λℓ := inf_{V ∈ H_ℓ, V ∩ D ≠ ∅} sup_{v ∈ V ∩ D} p(v) ∈ J
for some ℓ ∈ N, then λℓ is the ℓ-th eigenvalue of T(·) in J, and (∗) holds.
(iii) λ is an ℓ-th eigenvalue if and only if μ = 0 is the ℓ-th largest eigenvalue of the linear eigenproblem T(λ)x = μx.
(iv) The minimum in (∗) is attained for the invariant subspace of T(λℓ) corresponding to its ℓ largest eigenvalues.
Variational characterization of eigenvalues
Safeguarded iteration
The minimum in
λℓ = min_{V ∈ H_ℓ, V ∩ D ≠ ∅} sup_{v ∈ V ∩ D} p(v)
is attained by the invariant subspace of T(λℓ) corresponding to the ℓ largest eigenvalues, and the maximum by every eigenvector corresponding to the eigenvalue 0. This suggests the following method.
Safeguarded iteration
1: Start with an approximation μ1 to the ℓ-th eigenvalue of T(λ)x = 0
2: for k = 1, 2, ... until convergence do
3:   determine an eigenvector u corresponding to the ℓ-th largest eigenvalue of T(μk)
4:   evaluate μk+1 = p(u)
5: end for
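The steps above can be sketched in a few lines of Python; the linear sanity check, where p reduces to the Rayleigh quotient, is illustrative only:

```python
import numpy as np

def safeguarded_iteration(T, p, ell, mu0, maxit=50, tol=1e-12):
    """Safeguarded iteration for the ell-th eigenvalue of the Hermitian
    family T(.), given its Rayleigh functional p."""
    mu = mu0
    for _ in range(maxit):
        _, V = np.linalg.eigh(T(mu))   # columns ordered by ascending eigenvalue
        u = V[:, -ell]                 # eigenvector of the ell-th largest eigenvalue
        mu_next = p(u)
        if abs(mu_next - mu) <= tol * max(1.0, abs(mu)):
            return mu_next, u
        mu = mu_next
    return mu, u

# Linear sanity check: T(lam) = lam*I - A with p the Rayleigh quotient of A
A = np.diag([1.0, 2.0, 5.0])
T = lambda lam: lam * np.eye(3) - A
p = lambda x: (np.vdot(x, A @ x) / np.vdot(x, x)).real
mu, u = safeguarded_iteration(T, p, ell=2, mu0=1.7)   # converges to 2.0
```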
Variational characterization of eigenvalues
Convergence of safeguarded iteration
— For ℓ = 1 the safeguarded iteration converges globally to λ1.
— If λℓ is a simple eigenvalue, then the (local) convergence is quadratic.
— If T′(λ) is positive definite and u in Step 3 is replaced by an eigenvector of
T(σk)x = μT′(σk)x
corresponding to the ℓ-th largest eigenvalue, then the convergence is even cubic.
— A variant exists which is globally convergent also for higher eigenvalues.
Numerical example
Outline
1 Introduction
2 Numerical methods for dense problems
3 Iterative projection methods
4 Variational characterization of eigenvalues
5 Numerical example
Numerical example
Numerical example
pyramidal quantum dot: base length 12.4 nm, height 6.2 nm; cuboid matrix: 24.8 × 24.8 × 18.6 nm³
Parameters (Hwang, Lin, Wang, Wang 2004):
P1 = 0.8503, g1 = 0.42, δ1 = 0.48, V1 = 0.0
P2 = 0.8878, g2 = 1.52, δ2 = 0.34, V2 = 0.7
Discretization by FEM or FVM yields the rational eigenproblem
T(λ)x = λMx − (1/m1(λ)) A1x − (1/m2(λ)) A2x − Bx = 0
where T(λ) is symmetric and satisfies the conditions of the minmax characterization for λ ≥ 0.
All timings are for MATLAB 7.0.4 on an AMD Opteron Processor 248 (x86-64) with 2.2 GHz and 4 GB RAM.
Numerical example
Numerical example ct.
FVM: Hwang, Lin, Wang, Wang (2004)

dim          λ1       λ2/3     λ4       λ5       CPU time
2'475        0.41195  0.58350  0.67945  0.70478     0.68 s
22'103       0.40166  0.57668  0.68418  0.69922     8.06 s
186'543      0.39878  0.57477  0.68516  0.69767   150.92 s
1'532'255    0.39804  0.57427  0.68539  0.69727  4017.67 s
12'419'775   0.39785  0.57415                    overnight

FEM: cubic Lagrangian elements on a tetrahedral grid
dimension: 96'640 ((DofQD, Dofmat, Dofinterf) = (43'615, 43'897, 9'128))
eigenvalues at dim 96'640: λ1 = 0.39779, λ2 = 0.57411, λ3 = 0.57411, λ4 = 0.68547, λ5 = 0.69714

method    λ1      λ2      λ3      λ4      λ5      CPU time
Arnoldi   44 it.  29 it.  29 it.  24 it.  21 it.  188.8 s
JD        15 it.   9 it.   1 it.   7 it.   7 it.  204.4 s
HLWW      45 it.  49 it.   5 it.  24 it.  21 it.  226.7 s
Numerical example
Convergence history
Numerical example
Preconditioning
incomplete LU with cut-off threshold τ

τ        JD     Arnoldi  HLWW    precond.
0.1      261.4  1084.1   1212.4     3.4
0.01     132.7   117.1    155.7    71.7
0.001    118.9    61.2     96.0   246.6
0.0001   155.6    46.6     71.1   665.6

Sparse approximate inverse

τ     JD     Arnoldi  HLWW    precond.
0.4   968.6  3105.7   4073.4   314.0
0.3   819.9  2027.6   2744.7   641.2
0.2   718.1  1517.5   2157.2  1557.0
0.1   694.9  1461.5   2124.4  1560.9
Numerical example
Conclusions
— Iterative projection methods of Jacobi-Davidson and Arnoldi type are efficient methods for solving sparse nonlinear eigenvalue problems.
— If accurate preconditioners are available, the nonlinear Arnoldi method seems to be superior to the Jacobi-Davidson method.
— A crucial task when computing more than one eigenpair is to prevent the method from converging to an eigenpair that was already detected.
— If T(λ) allows for a minmax characterization of its eigenvalues, they can be determined safely one after the other by solving the projected problems with safeguarded iteration.
Numerical example
References
— H. Voss. An Arnoldi method for nonlinear symmetric eigenvalue problems. Online Proc. SIAM Conf. Applied Linear Algebra, Williamsburg, 2003.
— T. Betcke, H. Voss. A Jacobi-Davidson-type projection method for nonlinear eigenvalue problems. Future Generation Comput. Syst. 20, 363–372 (2004).
— H. Voss. An Arnoldi method for nonlinear eigenvalue problems. BIT Numerical Mathematics 44, 387–401 (2004).
— H. Voss. A new justification of the Jacobi-Davidson method for large eigenproblems. Linear Algebra Appl. 424, 448–455 (2007).
— H. Voss. A Jacobi-Davidson method for nonlinear and nonsymmetric eigenproblems. Computers & Structures 85, 1284–1292 (2007).
— M.M. Betcke, H. Voss. Restarting projection methods for rational eigenproblems arising in fluid-solid vibrations. Math. Modelling Anal. 13, 171–182 (2008).
— H. Voss. Iterative projection methods for large-scale nonlinear eigenvalue problems. pp. 187–214 in B.H.V. Topping et al. (eds.), Computational Technology Reviews, vol. 1, 2010.
— M.M. Betcke, H. Voss. Analysis and efficient solution of stationary Schrödinger equation governing electronic states of quantum dots and rings in magnetic field. Commun. Comput. Phys. 12, 1591–1617 (2012).
Thanks for your attention!