
BIT Numer Math (2013) 53:311–339. DOI 10.1007/s10543-012-0413-1

Hierarchical matrix approximation with blockwise constraints

M. Bebendorf · M. Bollhöfer · M. Bratsch

Received: 30 March 2012 / Accepted: 2 December 2012 / Published online: 22 December 2012
© Springer Science+Business Media Dordrecht 2012

Abstract A new technique is presented to preserve constraints in a hierarchical matrix approximation. This technique is used to preserve spaces from which the eigenvectors corresponding to small eigenvalues can be approximated, which guarantees spectral equivalence for approximate preconditioners.

Keywords Approximate LU decomposition · Preconditioning · Hierarchical matrices

Mathematics Subject Classification 35C20 · 65F05 · 65F50 · 65N30

1 Introduction

Hierarchical (H-) matrices (see [14, 16]) provide a setting in which fully populated large-scale matrices can be treated with logarithmic-linear complexity. The data-sparsity is achieved by restricting each block of a suitable hierarchical partition to a matrix of bounded rank k. Obviously, not all matrices can be approximated by

Communicated by Michiel Hochstenbach.

This work was supported by DFG collaborative research center SFB 611.

M. Bebendorf (✉) · M. Bratsch
Institute for Numerical Simulation, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
e-mail: [email protected]

M. Bratsch
e-mail: [email protected]

M. Bollhöfer
Institute for Computational Mathematics, Technische Universität Braunschweig, Braunschweig, Germany
e-mail: [email protected]


H-matrices. However, it can be proved (see [2]) that at least those matrices which arise from the finite element discretization of elliptic boundary value problems fall into this class; see [5] for similar results for Maxwell's equations. While other fast methods such as multigrid methods [13] for the most part have to be customized to achieve optimal performance, H-matrix approximations are able to handle, for instance, coefficients of a partial differential operator independently of their smoothness. H-matrices provide replacements of the usual matrix operations such as addition, multiplication, and inversion. However, these operations are approximate. The reason for this is that the sum of two rank-k matrices usually exceeds rank k. In order not to increase the complexity, the sum has to be truncated to rank k. Therefore, H-matrix approximations Ã of a given matrix A ∈ C^{I×J}, where I and J are two index sets, can be computed with any precision

‖A − Ã‖ ≤ ε‖A‖, ε > 0,    (1.1)

for the price of increasing the local rank k, which depends logarithmically on 1/ε in the case of second order elliptic boundary value problems.

A major application of H-matrices is preconditioning. Since (1.1) guarantees spectral equivalence only for the upper part of the spectrum, whereas the relative accuracy of the smallest eigenvalues is determined by the condition number of A, the prescribed accuracy ε has to decrease for an increasing number of degrees of freedom. Although ε enters the complexity of the H-matrix approximation only logarithmically, this effect increases the asymptotic complexity and, from a practical point of view, adds a parameter which is difficult to choose in general. To avoid it, in this article we propose to preserve approximations of the eigenvectors corresponding to small eigenvalues, i.e., in addition to (1.1), the constraints

ÃX = AX    (1.2)

are to be satisfied for a given matrix X ∈ C^{J×r} so that spectral equivalence can be guaranteed. Note that this can be done without actually computing approximate eigenvectors as long as the linear span of the columns of X is close to eigenvectors corresponding to small eigenvalues. Due to the smoothness of the latter eigenvectors in the case of second order elliptic partial differential operators, it is, for instance, sufficient to preserve piecewise constants.

A technique will be presented which preserves constraints on each block of a hierarchical matrix. Due to the hierarchical structure, the local preservation of few vectors leads to a global preservation (1.2) of significantly more vectors (what will be referred to as a cascadic basis). Since the vectors to be preserved do not depend on the operator, the new technique does not require more specific knowledge of the problem. The black-box character is even extended, because the accuracy ε can be chosen independently of the condition number.

Although we will mainly focus on the application of preconditioning, the techniques presented can also be used in other cases, e.g. in mechanics, where rigid body motions shall remain in the null space, and for Stokes equations, where a vanishing (discrete) divergence is desired. Furthermore, the principle of blockwise preservation of constraints is not restricted to H-matrices. E.g., the incomplete LU factorization (ILU) can benefit from the ideas of this article. On the one hand, these benefits are obvious from the general approach of preserving constraints on smaller blocks locally and thus preserving a larger set of constraints globally, which is accepted for block incomplete factorizations as well (e.g. frequency filtering decompositions [24, 26]). On the other hand, as we will show in Sect. 5, preserving a wisely chosen set of test vectors that approximate eigenvectors associated with small eigenvalues may substitute the role of a coarse grid correction.

A technique to satisfy conditions (1.1) and (1.2) has been presented in [15, Sect. 6.8.1]. Its basic idea is to repair deviations from (1.2) resulting from the H-matrix approximation by a global rank-r correction; for details see Sect. 2.2. This approach is designed to globally preserve a few vectors for the H-matrix approximation Ã when approximating A. The technique for preserving constraints presented in this article is based on a different approach. We will preserve the respective part of the constraint on each block of the H-matrix partition by modifying the usual truncation. The constraint is kept during the whole approximation process rather than correcting it after the approximation has been computed. In particular, this allows one to preserve constraints for high-level operations in the H-matrix arithmetic, because approximation errors occur only when truncating the sums of low-rank matrices. In addition, the blockwise preservation of constraints has the advantage that it leads to the formation of cascadic bases, which can be used to guarantee spectral equivalence. To achieve the same property with the technique from [15], the number of globally preserved vectors would spoil the overall logarithmic-linear complexity.

The article is organized as follows. In Sect. 2, the required notation, basic concepts, and properties of hierarchical matrices will be introduced. Section 2.2 reviews the correction method from [15]. In Sect. 3, two methods for the blockwise preservation of given vectors on each block of a hierarchical partition will be presented. Their idea is to carry out the truncation perpendicular to the constraints. It will be seen that the preservation of blockwise constraints leads to the preservation of global constraints even for the H-matrix arithmetic. In Sect. 4, a more sophisticated way of preserving vectors will be presented. This particular set of constraints guarantees that a significantly larger set of vectors is preserved by the whole matrix due to its hierarchical block structure. It will be seen that without changing the overall complexity of the H-matrix approximation, a cascadic basis of dimension O(k log |J|) is preserved, where k denotes the maximum rank of each block. This property is exploited in Sect. 5 for spanning a basis from which the eigenvectors corresponding to small eigenvalues can be approximated. Spectral equivalence will be proved for the corresponding H-matrix preconditioner. In the last section, numerical experiments confirm that globally preserving vectors as in [15] is not able to guarantee a bounded number of iterations. Although the additional constraints add ranks to each block of a hierarchical matrix, the new technique improves the overall amount of storage at the same preconditioning effect, because the right direction of approximation is enforced by the constraints while the accuracy of all other directions can be lowered. The article closes with a comparison with algebraic multigrid preconditioners.


2 Hierarchical matrices

In many applications it is necessary to efficiently treat fully populated matrices arising from the discretization of non-local operators, e.g. finite or boundary element discretizations of integral operators and inverses or the factors of the LU decomposition of FE discretizations of elliptic partial differential operators. For this purpose, Tyrtyshnikov [22] and Hackbusch et al. [14, 16] introduced the structure of mosaic skeleton matrices or hierarchical matrices (H-matrices) [15]. Similar approaches, which are designed only for the fast multiplication of a matrix by a vector, are the earlier developed fast summation methods: tree codes [1], fast multipole methods [10, 11], and panel clustering [17].

In contrast to many other existing fast methods which are based on multi-level structures, the efficiency of H-matrices is due to two principles, matrix partitioning and low-rank representation. For an appropriate partition P of the set of matrix indices I × J, cluster trees TI and TJ are constructed by recursively subdividing I and J, respectively. The subdivision is done such that indices which are in some sense close to each other are grouped into the same cluster. There are three strategies which are commonly used. Two of them (bounding boxes [7] and principal component analysis [2]) use grid information, whereas the method presented in [4] is based on the matrix graph of a sparse matrix.

In the following, 𝓛(TI) will denote the set of leaves of TI. The depth level(t) denotes the distance of t ∈ TI to the root I, and L(TI) will be used for the maximum level in TI increased by one. The block cluster tree TI×J is built by recursively subdividing I × J. Each block t × s is subdivided into the sons t′ × s′, where t′ and s′ are taken from the lists of sons S(t) and S(s) of t and s in TI and TJ, respectively. The recursion is done for a block b := t × s until it is small enough or satisfies a so-called admissibility condition. The latter condition guarantees that the restriction Ab of A ∈ C^{I×J} can be approximated by a matrix of low rank; see [2]. All other blocks are small enough and stored as dense matrices. The set of leaves of the block cluster tree TI×J constitutes the partition P. The constructed partition has the property that for a given cluster t ∈ TI only a constantly bounded number c^r_sp(t) := |{s ⊂ J : t × s ∈ P}| of blocks t × s appears in P. Similarly, given s ∈ TJ, the expression c^c_sp(s) := |{t ⊂ I : t × s ∈ P}| is bounded by a constant. Hence, the sparsity constant

$$c_{sp} := \max_{t\in T_I,\, s\in T_J} \bigl\{ c^r_{sp}(t),\, c^c_{sp}(s) \bigr\}$$

is bounded independently of the sizes of I and J; see [8]. The set of hierarchical matrices on the partition P with blockwise rank k is then defined as

$$\mathcal{H}(P, k) := \bigl\{ A \in \mathbb{C}^{I\times J} : \operatorname{rank} A_b \le k \text{ for all } b \in P \bigr\}.$$

Elements of H(P, k) provide data-sparse representations of fully populated matrices because the elements of this set can be stored with logarithmic-linear complexity. This can be easily seen from the boundedness of csp and the inequality

$$\sum_{t\times s\in T_{I\times J}} |t| + |s| \le c_{sp}\bigl[L(T_I)|I| + L(T_J)|J|\bigr]; \tag{2.1}$$

for further details see [2].

2.1 Approximate arithmetic

A major advantage of H-matrices over the above mentioned fast summation methods is that by exploiting the hierarchical structure of the partition P, an approximate algebra on H(P, k) can be defined which is based on divide-and-conquer versions of the usual block operations such as multiplication, inversion, and LU factorization. It is obvious that H(P, k) is not a linear space, because the rank of the sum of two rank-k matrices is in general bounded only by 2k. Therefore, the sum has to be truncated to rank k in order not to exceed a maximum rank.

A common way (see [2]) to approximate a rank-k′ matrix UV^H by a matrix of lower rank k < k′ is to compute QR decompositions U = Q_U R_U and V = Q_V R_V and afterwards separate the product

$$R_U R_V^H = W\Sigma Z^H \in \mathbb{C}^{k'\times k'}, \qquad \Sigma := \operatorname{diag}(\sigma_1, \dots, \sigma_{k'}),$$

via singular value decomposition. This yields

$$UV^H = Q_U R_U R_V^H Q_V^H = (Q_U W)\Sigma(Q_V Z)^H. \tag{2.2}$$

As an approximation of UV^H we use the rank-k matrix ÛV̂^H, where

$$\hat U := (Q_U W)\Sigma_k, \qquad \hat V := Q_V Z, \tag{2.3}$$

and Σk := diag(σ1, …, σk, 0, …, 0) ∈ R^{k′×k′}. The previous type of approximation (2.3) can be computed with linear complexity O((k′)²(|t| + |s|)). The rank-k matrix ÛV̂^H is a best approximation, because due to Mirsky [19] one has

$$\min_{M\in\mathbb{C}^{|t|\times|s|}_k} \|UV^H - M\|_2 = \|UV^H - \hat U\hat V^H\|_2 = \|\Sigma - \Sigma_k\|_2 = \sigma_{k+1},$$

where

$$\mathbb{C}^{m\times n}_k := \{A\in\mathbb{C}^{m\times n} : \operatorname{rank} A \le k\}$$

denotes the set of matrices of rank at most k. Hence, if a given accuracy ε > 0 is to be guaranteed, one can choose

$$k(\varepsilon) := \min\{k\in\mathbb{N} : \sigma_{k+1} < \varepsilon\sigma_1\}.$$
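As an illustration, the following minimal numpy sketch implements this truncation; it performs the thin QR factorizations, the SVD of the small k′ × k′ core, and the rank choice k(ε). Function and variable names are ours, not taken from any H-matrix library.

```python
import numpy as np

def truncate(U, V, eps):
    """Truncate U V^H to rank k(eps) = min{k : sigma_{k+1} < eps * sigma_1},
    following (2.2)-(2.3): thin QR of both factors, SVD of the small core."""
    QU, RU = np.linalg.qr(U)                      # U = QU RU
    QV, RV = np.linalg.qr(V)                      # V = QV RV
    W, sig, ZH = np.linalg.svd(RU @ RV.conj().T)  # core SVD, sig descending
    small = sig < eps * sig[0]
    k = int(np.argmax(small)) if small.any() else len(sig)
    Uk = (QU @ W[:, :k]) * sig[:k]                # absorb Sigma_k into the left factor
    Vk = QV @ ZH[:k, :].conj().T
    return Uk, Vk

# quick check: the 2-norm error equals sigma_{k+1} (Mirsky)
rng = np.random.default_rng(0)
U, V = rng.standard_normal((200, 30)), rng.standard_normal((150, 30))
Uk, Vk = truncate(U, V, eps=1e-2)
print(Uk.shape[1], np.linalg.norm(U @ V.T - Uk @ Vk.T, 2))
```

The cost is dominated by the two QR factorizations, which is why the truncation has linear complexity in |t| + |s|.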

2.2 Treating constraints

For many applications it is essential to exactly preserve constraints satisfied by the matrix A. For instance it could be useful to preserve some eigenvectors to ensure spectral equivalence. In other applications the constant vector or translations and rotations (i.e. rigid body motions) shall remain in the null space. A disadvantage of the approximation (2.3) is that also the constraints will be disturbed by an error of order σ_{k+1}. Hence, approximation of the blocks will generally destroy the side constraints for the whole matrix. Our aim is to find an H-matrix approximation Ã which satisfies (1.1) and (1.2).

A method to preserve given vectors was presented in [15, Sect. 6.8.1]. This approach first computes an H-matrix approximation Ã of A via the usual truncation (2.3) and afterwards applies a rank-r correction

$$\hat A := \tilde A + \delta A, \qquad \delta A := \delta Y (X^H X)^{-1} X^H,$$

and δY := Y − ÃX to preserve the constraints AX = Y. From δAX = δY it is easy to see that

ÂX = AX = Y.

Since the rank of δA is bounded by r, the approximation Â is in H(P, k + r) if Ã ∈ H(P, k). The advantage of this method is that the rank-r correction δA can be easily added to the H-matrix Ã. On the other hand, r has to be small for an efficient computation. The correction approach might lead to some disadvantages if the approximation is sought in the form of an LU decomposition, for instance. If constraints are to be preserved for the H-LU decomposition of A, the update δA cannot simply be added to the approximation. Either δA has to be stored and applied to a vector separately or the factors L and U have to be updated as in [3] if r is small.
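In a dense setting, the correction reads as follows. This is a minimal numpy sketch of ours; in the H-matrix setting δA is of course kept in factorized rank-r form and attached to the block structure rather than added entrywise.

```python
import numpy as np

def correct_globally(A_tilde, A, X):
    """Rank-r correction from [15, Sect. 6.8.1]: repair A_tilde so that the
    corrected matrix satisfies the constraint A_hat @ X = A @ X exactly."""
    Y = A @ X
    dY = Y - A_tilde @ X                                   # deviation from the constraint
    dA = dY @ np.linalg.solve(X.conj().T @ X, X.conj().T)  # dY (X^H X)^{-1} X^H, rank <= r
    return A_tilde + dA

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 100))
A_tilde = A + 1e-3 * rng.standard_normal((100, 100))  # stand-in for an H-approximation
X = np.ones((100, 1))                                  # preserve the constant vector
print(np.allclose(correct_globally(A_tilde, A, X) @ X, A @ X))  # True
```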

The previous approach is designed to globally preserve a few vectors for the H-matrix approximation Ã to A. The techniques for preserving constraints presented in this article are based on a different approach. We will preserve the respective part of the constraint on each block t × s of a partition by modifying the usual truncation (2.3). The constraint is kept during the whole approximation process rather than explicitly correcting it after the computation. In particular, this allows one to preserve constraints for high-level operations in the H-matrix arithmetic, because approximation errors occur only when truncating the sums of low-rank matrices. It will be seen that this does not only preserve the prescribed vectors X but even a whole space spanned by their restrictions. Furthermore, preserving local constraints during the computation directly influences the ongoing computation of the remaining blocks and may result in a smaller relative error. As a result of this difference, our approach will be able to guarantee spectral equivalence, whereas the global preservation has almost no effect on the condition number; see the numerical example in Sect. 6.1.

3 Methods to blockwise preserve vectors

In this section, two methods are presented that truncate a given matrix A ∈ H(P, k′) to a hierarchical matrix Ã ∈ H(P, k + r), k < k′, i.e.

Ats = UV^H ≈ Ãts = ÛV̂^H, t × s ∈ P,    (3.1)

while preserving the constraint (1.2) blockwise, i.e.

ÃtsXs = AtsXs, t × s ∈ P.    (3.2)


Afterwards, the use of these methods for the H-matrix Cholesky and LU factorization will be shown.

For simplicity, the following two methods will first concentrate on Hermitian H-matrices. Afterwards it will be shown how they can be adapted to general matrices. Any block t × s in the upper triangular part P⁺ := {t × s ∈ P : max t < min s} of the symmetric partition P also appears in the lower triangular part. Hence, (3.2) is equivalent to

ÃtsXs = AtsXs and Ãts^H Xt = Ats^H Xt, t × s ∈ P⁺.    (3.3)

Symmetric matrix approximations can therefore be created by preserving constraints to the matrices Ats and their transpose. Notice that diagonal blocks Att are stored without approximation and hence trivially satisfy the constraints.

3.1 Preservation based on the Gram-Schmidt method

We first describe how to create an approximation Ãts of a single block (3.1) under the constraint (3.3). To this end, only the components of UV^H which are perpendicular to the constraints are approximated. We define

U′ := P(Xt)U, V′ := P(Xs)V    (3.4)

resulting from the orthogonal projection

$$P(X) := I - X(X^H X)^{-1} X^H$$

applied to U and V, respectively. Notice that the product U′V′^H can be approximated with at least the same accuracy. This can be seen from

$$\min_{B'\in\mathbb{C}^{|t|\times|s|}_k} \|U'V'^H - B'\|_2 = \min_{\substack{B'\in\mathbb{C}^{|t|\times|s|}_k \\ B'=P(X_t)B'P(X_s)}} \|U'V'^H - B'\|_2 = \min_{B\in\mathbb{C}^{|t|\times|s|}_k} \|P(X_t)(UV^H - B)P(X_s)\|_2 \le \min_{B\in\mathbb{C}^{|t|\times|s|}_k} \|P(X_t)\|_2 \|P(X_s)\|_2 \|UV^H - B\|_2 \le \min_{B\in\mathbb{C}^{|t|\times|s|}_k} \|UV^H - B\|_2$$

due to the self-adjointness of P(X) and ‖P(X)‖2 ≤ 1. Let Û′V̂′^H be the approximation of U′V′^H defined by (2.3). For later purposes we remark that

Xt^H U′ = 0 = Xt^H Û′, Xs^H V′ = 0 = Xs^H V̂′,    (3.5)

because the SVD preserves the null space.


After the approximation we add two low-rank matrices with rank r and obtain the approximation Ãts := ÛV̂^H defined by

$$\hat U := [Y(t),\, Q(X_t),\, \hat U'] \in \mathbb{C}^{|t|\times(k+2r)}, \qquad \hat V := [Q(X_s),\, P(X_s)Y(s),\, \hat V'] \in \mathbb{C}^{|s|\times(k+2r)}, \tag{3.6}$$

where Y(t) := AtsXs, Y(s) := Ats^H Xt, and Q(X) := X(X^H X)^{-1}. The following lemma shows that the definition of Ãts := ÛV̂^H ∈ C^{|t|×|s|}_{k+2r} actually has the desired properties.

Lemma 3.1 Let Ãts be constructed as above. Then Ats − Ãts = U′V′^H − Û′V̂′^H and

$$\|A_{ts} - \tilde A_{ts}\|_2 \le \min_{B\in\mathbb{C}^{|t|\times|s|}_k} \|UV^H - B\|_2.$$

Furthermore, Ãts satisfies (3.3).

Proof For the first part of the assertion observe that

$$\begin{aligned} A_{ts} - \tilde A_{ts} &= UV^H - [Y(t), Q(X_t), \hat U'][Q(X_s), P(X_s)Y(s), \hat V']^H \\ &= UV^H - Y(t)Q(X_s)^H - Q(X_t)Y(s)^H + Q(X_t)Y(s)^H X_s Q(X_s)^H - \hat U'\hat V'^H \\ &= U'V'^H - \hat U'\hat V'^H. \end{aligned}$$

The vectors are preserved, because due to (3.5)

$$\tilde A_{ts}X_s = \hat U[Q(X_s), P(X_s)Y(s), \hat V']^H X_s = [Y(t), Q(X_t), \hat U'][I, 0, 0]^H = Y(t)$$

and

$$\tilde A_{ts}^H X_t = \hat V[Y(t), Q(X_t), \hat U']^H X_t = [Q(X_s), P(X_s)Y(s), \hat V'][X_t^H Y(t), I, X_t^H \hat U']^H = Q(X_s)Y(t)^H X_t + Y(s) - Q(X_s)X_s^H Y(s) = Y(s),$$

which follows from Y(t)^H Xt = Xs^H Y(s). □

The approximation (3.6) can be adapted from symmetric to general matrices. Let V′ be defined as in (3.4). In the non-symmetric case, it is sufficient to approximate the product UV′^H, because in contrast to symmetric problems there is no constraint for the transpose. Let U*V*^H denote the approximation generated from applying the usual truncation (2.3) to UV′^H. In the non-symmetric case, one defines the approximation

$$\hat U := [Y(t),\, U^*] \in \mathbb{C}^{|t|\times(k+r)}, \qquad \hat V := [Q(X_s),\, V^*] \in \mathbb{C}^{|s|\times(k+r)}. \tag{3.7}$$


As in the symmetric case, it can be easily shown that this is an approximation for Ats satisfying (3.2). The advantage of the approximations (3.6) and (3.7) is that they are easily implemented. However, they are not best approximations respecting the constraint (3.3). There could still be redundancies between the orthogonal projector and the added rank-r matrices. Another disadvantage is that this procedure has the same stability issues as the Gram-Schmidt method. A method based on Householder reflections is more desirable.
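A minimal numpy sketch of this non-symmetric variant follows; a dense SVD stands in for the QR-based truncation (2.3), and all names are ours. The constraint column Xs is preserved exactly while everything orthogonal to it is truncated.

```python
import numpy as np

def truncate_preserving(U, V, Xs, eps):
    """Non-symmetric blockwise preservation, cf. (3.4) and (3.7): only the part of
    U V^H orthogonal to the constraint columns Xs is truncated; afterwards
    Y(t) = A_ts Xs and Q(Xs) = Xs (Xs^H Xs)^{-1} are appended."""
    G = np.linalg.solve(Xs.conj().T @ Xs, Xs.conj().T)   # (Xs^H Xs)^{-1} Xs^H
    Vp = V - Xs @ (G @ V)                                # V' := P(Xs) V
    # usual truncation (2.3) of U V'^H, here via a dense SVD for brevity
    W, sig, ZH = np.linalg.svd(U @ Vp.conj().T, full_matrices=False)
    small = sig < eps * sig[0]
    k = int(np.argmax(small)) if small.any() else len(sig)
    Us, Vs = W[:, :k] * sig[:k], ZH[:k, :].conj().T
    Yt = U @ (V.conj().T @ Xs)                           # Y(t) := A_ts Xs
    return np.hstack([Yt, Us]), np.hstack([G.conj().T, Vs])   # rank <= k + r

# the constraint holds exactly while the rest is truncated:
rng = np.random.default_rng(2)
U, V = rng.standard_normal((120, 20)), rng.standard_normal((80, 20))
Xs = np.ones((80, 1))
Uh, Vh = truncate_preserving(U, V, Xs, eps=1e-2)
print(np.allclose((Uh @ Vh.conj().T) @ Xs, (U @ V.T) @ Xs))  # True
```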

3.2 Preservation based on Householder reflections

For the second method we again first focus on symmetric matrices and will look at more general problems later on. We take a single off-diagonal low-rank block (3.1) of an H-matrix and aim at approximating it with the additional conditions (3.3).

Comparing the decomposition (2.2) of the rank-k′ matrix Ats and the approximation (2.3), we have

$$A_{ts} = \hat U\hat V^H + EF^H, \tag{3.8}$$

where the rank-k̄ matrix EF^H, k̄ := k′ − k, is the remainder of the approximation. Dropping the latter term in (3.8) is responsible for the violation of the constraint; see the discussion in Sect. 2.2. By the following method only the part of EF^H is dropped which is orthogonal to the constraint.

Let Xt = H1A and Xs = H2B be QR decompositions of Xt and Xs, respectively. Herein, H1 and H2 denote products of Householder matrices, and A ∈ C^{|t|×r}, B ∈ C^{|s|×r} are upper triangular matrices. Using the Householder matrices to transform the remainder of the approximation in (3.8),

$$H_1^H E =: \begin{bmatrix} C \\ G_1 \end{bmatrix}, \quad C \in \mathbb{C}^{r\times\bar k}, \qquad H_2^H F =: \begin{bmatrix} D \\ G_2 \end{bmatrix}, \quad D \in \mathbb{C}^{r\times\bar k},$$

we obtain

$$EF^H = H_1 \begin{bmatrix} C \\ G_1 \end{bmatrix} \begin{bmatrix} D \\ G_2 \end{bmatrix}^H H_2^H = H_1 \begin{bmatrix} CD^H & CG_2^H \\ G_1D^H & G_1G_2^H \end{bmatrix} H_2^H. \tag{3.9}$$

The Householder matrices H1 and H2 have the effect that either C or D has a contribution to the conditions (3.3) when multiplying (3.8) by Xt or Xs, because only the first r rows of A and B are non-zero. Motivated by this, we neglect G1G2^H in (3.9). This does not violate the constraints (3.3), as we shall see later on. In order to further save costs in future operations, we reduce the size of the inner |t| × |s| matrix in (3.9) by QR decompositions G1D^H = Q1R̂1 and G2C^H = Q2R̂2 with Householder matrices Q1 ∈ C^{(|t|−r)×(|t|−r)}, Q2 ∈ C^{(|s|−r)×(|s|−r)} and matrices R̂1 := [R1^H, 0]^H ∈ C^{(|t|−r)×r}, R̂2 := [R2^H, 0]^H ∈ C^{(|s|−r)×r} with upper triangular R1, R2 ∈ C^{r×r}. This leads to

$$H_1 \begin{bmatrix} CD^H & CG_2^H \\ G_1D^H & 0 \end{bmatrix} H_2^H = \underbrace{H_1 \begin{bmatrix} I & \\ & Q_1 \end{bmatrix}}_{=:\tilde H_1} \begin{bmatrix} CD^H & R_2^H & 0 \\ R_1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \underbrace{\begin{bmatrix} I & \\ & Q_2 \end{bmatrix}^H H_2^H}_{=:\tilde H_2^H}.$$


Finally, there might be some redundancies in the inner matrix. To get rid of them, we employ the singular value decomposition of the inner 2r × 2r matrix. Using the SVD

$$\begin{bmatrix} CD^H & R_2^H \\ R_1 & 0 \end{bmatrix} =: WSZ^H,$$

we define the blockwise approximation

$$\tilde A_{ts} := \hat U\hat V^H + \tilde H_1 \begin{bmatrix} WSZ^H & 0 \\ 0 & 0 \end{bmatrix} \tilde H_2^H = [\hat U, \tilde W] \begin{bmatrix} I & \\ & S \end{bmatrix} [\hat V, \tilde Z]^H, \tag{3.10}$$

with W̃ := H̃1[W^H, 0]^H and Z̃ := H̃2[Z^H, 0]^H. Compared with the algorithm that uses the Gram-Schmidt method, the latter method has good stability properties.

Lemma 3.2 The approximation (3.10) preserves the conditions (3.3) and rank Ãts ≤ k + 2r. Furthermore, we have Ats − Ãts = ẼF̃^H, where

$$\tilde E = H_1 \begin{bmatrix} 0 \\ G_1 \end{bmatrix}, \qquad \tilde F = H_2 \begin{bmatrix} 0 \\ G_2 \end{bmatrix},$$

and ‖ẼF̃^H‖2 ≤ ‖EF^H‖2.

Proof It is easy to verify that by construction we have

Ats = Ãts + ẼF̃^H.

Hence, it suffices to show that Ẽ^H Xt = 0 and F̃^H Xs = 0, which follows from

$$\tilde E^H X_t = \begin{bmatrix} 0 \\ G_1 \end{bmatrix}^H H_1^H X_t = \begin{bmatrix} 0 \\ G_1 \end{bmatrix}^H H_1^H H_1 A = \begin{bmatrix} 0 \\ G_1 \end{bmatrix}^H A = 0,$$

because at most the leading r rows of A are non-zero. Similar arguments apply when proving F̃^H Xs = 0. □

Another important property is that the approximation (3.10) is a best approximation with respect to ‖ · ‖2 under the constraints of (3.3). This is easy to see, because the decomposition (3.8) results from an SVD. The term EF^H is only used to satisfy the constraints (3.3) and is truncated with another SVD to WSZ^H.

The above approximation (3.10) can be adapted to non-symmetric matrices. If only (3.2) needs to be fulfilled, then in addition to G1G2^H we can also drop CG2^H in (3.9). In this case, the rank of Ãts is bounded by k + r.
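The essential steps can be sketched in numpy as follows. This is our own illustration under simplifying assumptions: a dense SVD replaces the low-rank arithmetic, and the final recompression of the inner 2r × 2r matrix via (3.10) is omitted, so the factors are returned with width k + 2r directly.

```python
import numpy as np

def householder_truncate(U, V, Xt, Xs, k):
    """Core of the Householder-based preservation (3.8)-(3.9): split the block
    into its best rank-k part plus the remainder E F^H, transform the remainder
    by the full QR factors of Xt and Xs, and drop only the block G1 G2^H, which
    is orthogonal to both constraints."""
    r = Xt.shape[1]
    # (3.8): best rank-k part and remainder, via a dense SVD for brevity
    W, sig, ZH = np.linalg.svd(U @ V.conj().T, full_matrices=False)
    Uh, Vh = W[:, :k] * sig[:k], ZH[:k, :].conj().T
    E,  F  = W[:, k:] * sig[k:], ZH[k:, :].conj().T
    # Xt = H1 A, Xs = H2 B with only the leading r rows of A, B non-zero
    H1, _ = np.linalg.qr(Xt, mode='complete')
    H2, _ = np.linalg.qr(Xs, mode='complete')
    C, G1 = np.vsplit(H1.conj().T @ E, [r])
    D, G2 = np.vsplit(H2.conj().T @ F, [r])
    # kept part of E F^H (without G1 G2^H), written as two rank-r products
    Lt = np.hstack([H1[:, :r], H1[:, r:] @ (G1 @ D.conj().T)])
    Rt = np.hstack([F @ C.conj().T, H2[:, :r]])
    return np.hstack([Uh, Lt]), np.hstack([Vh, Rt])        # width k + 2r

rng = np.random.default_rng(3)
U, V = rng.standard_normal((90, 25)), rng.standard_normal((70, 25))
Xt, Xs = np.ones((90, 1)), np.ones((70, 1))
Uh, Vh = householder_truncate(U, V, Xt, Xs, k=8)
A, Ah = U @ V.T, Uh @ Vh.conj().T
print(np.allclose(Ah @ Xs, A @ Xs), np.allclose(Ah.conj().T @ Xt, A.conj().T @ Xt))
```

Both constraint checks print True: multiplying the dropped term H1[0; G1][0; G2]^H H2^H by Xs or Xt^H annihilates it, exactly as in the proof of Lemma 3.2.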

3.3 Hierarchical matrix algebra

In the previous section, we introduced two truncation algorithms which preserve on each block the corresponding restriction of vectors. In this section, the effect of preservation will be investigated for H-matrix operations. In short, it will be seen that the result of the approximate H-matrix addition, multiplication, inversion, and LU factorization also satisfies the constraint (1.2).

Since the H-matrix addition is just a blockwise truncated addition of two low-rank matrices, the rounded sum of two H-matrices can obviously be modified to satisfy (1.2). The next step is to define a rounded multiplication which uses this truncation. For the multiplication of two H-matrices A ∈ H(TI×J, kA) and B ∈ H(TJ×K, kB) assume that they are subdivided according to their block cluster trees TI×J and TJ×K:

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \qquad B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}.$$

Then C := AB is computed recursively via

$$C = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix},$$

which involves multiplications of smaller size and rounded additions. Assuming that the subblocks of the result C̃ are computed such that they preserve the corresponding parts of X, i.e.

C̃ijXj = CijXj, i, j = 1, 2,

with X := [X1^H, X2^H]^H, then we even have

$$\tilde C \begin{bmatrix} X_1 & 0 \\ 0 & X_2 \end{bmatrix} = C \begin{bmatrix} X_1 & 0 \\ 0 & X_2 \end{bmatrix},$$

which in particular yields C̃X = CX. If the exact result C of the multiplication has a coarser structure, it is necessary to merge the low-rank representations C̃ij = UijVij^H to a single low-rank representation C̃ = UV^H. This reduces the number of constraints merely to C̃X = CX.
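The block argument is easy to observe numerically. The following dense numpy sketch of ours uses an artificial perturbation as a stand-in for the constraint-preserving truncation of each product block: the action on Xj is kept exact, the orthogonal complement is perturbed.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 40
A = rng.standard_normal((2 * n, 2 * n))
B = rng.standard_normal((2 * n, 2 * n))
X1, X2 = np.ones((n, 1)), np.ones((n, 1))
C = A @ B
Ct = np.empty_like(C)
for i in (0, 1):
    for j in (0, 1):
        blk = C[i * n:(i + 1) * n, j * n:(j + 1) * n]
        Xj = (X1, X2)[j]
        # stand-in for a constraint-preserving truncation: keep the action on Xj
        # exactly, perturb the orthogonal complement (mimicking truncation error)
        P = np.eye(n) - Xj @ np.linalg.solve(Xj.T @ Xj, Xj.T)
        noise = 1e-3 * rng.standard_normal((n, n))
        Ct[i * n:(i + 1) * n, j * n:(j + 1) * n] = blk - blk @ P + (blk + noise) @ P
X = np.vstack([X1, X2])
print(np.allclose(Ct @ X, C @ X))  # True: blockwise preservation implies global preservation
```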

Since the H-matrix LU and Cholesky factorizations are computed via divide-and-conquer operations using the H-matrix addition and multiplication, the previous arguments can be applied to prove the next lemma.

Lemma 3.3 If the restrictions of X corresponding to the blocks of A are preserved during truncation, then for the approximate LU decomposition A ≈ LU it holds that

(LU)tsXs = AtsXs for all t × s ∈ P.

In particular, we have LUX = AX.

4 Cascadic bases

The preservation of blockwise constraints (3.2),

ÃtsXs = AtsXs, t × s ∈ P,

resulting from restricting X during truncation can be utilized to obtain global constraints (1.2),

ÃX = AX,

even for the approximate H-matrix operations. Notice that the number of vectors preserved globally coincides with the number of vectors preserved locally. In this section, we will see that with a more sophisticated way of preserving vectors on each block, a significantly larger set of vectors can be preserved globally due to the hierarchical structure of P.

The basis that will be presented uses the substructure of the H-matrix in a native way. The number of vectors that each block preserves depends on its level in the cluster tree. With the right choice of vectors which have to be preserved blockwise it will be shown that the whole matrix preserves a set of |𝓛(TJ)| vectors.

4.1 Construction of cascadic bases

Given a vector u ∈ C^J, we define for s ∈ TJ the linear hull of the restrictions of u to the leaf clusters of Ts,

$$\Lambda(s) := \operatorname{span}\{\bar u_\sigma : \sigma\in\mathcal{L}(T_s)\} \subset \mathbb{C}^s,$$

where Ts is the subtree of TJ rooted at s and ūσ ∈ C^s denotes the zero extension of the restriction uσ ∈ C^σ to C^s, i.e.

$$(\bar u_\sigma)_i := \begin{cases} u_i, & i \in \sigma, \\ 0, & i \in s\setminus\sigma. \end{cases}$$

The vector u needs to be different from zero on the restrictions to the leaves of the tree Ts, i.e. uσ ≠ 0, σ ∈ 𝓛(Ts), such that {ūσ : σ ∈ 𝓛(Ts)} is a basis of Λ(s).
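The construction can be illustrated with a small numpy sketch of ours; the nested-list encoding of the cluster tree is an assumed toy format, not the data structure of an actual H-matrix library.

```python
import numpy as np

def cascadic_basis(u, cluster):
    """Columns span Lambda(s): zero extensions of the restrictions of u to the
    leaves of the cluster tree below s. A cluster is modeled as a nested list of
    index arrays; a leaf is a plain index array (toy format)."""
    def leaves(c):
        if isinstance(c, np.ndarray):
            return [c]
        return [leaf for son in c for leaf in leaves(son)]
    cols = []
    for sigma in leaves(cluster):
        v = np.zeros(len(u))
        v[sigma] = u[sigma]       # (u_bar_sigma)_i = u_i on sigma, 0 elsewhere
        cols.append(v)
    return np.column_stack(cols)

# J = {0,...,7} split binarily twice: four leaves, hence dim Lambda(J) = 4
J = np.arange(8)
tree = [[J[:2], J[2:4]], [J[4:6], J[6:]]]
X = cascadic_basis(np.ones(8), tree)
print(X.shape, np.linalg.matrix_rank(X))   # (8, 4) 4
```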

The following lemma generalizes the observation that given an approximation

$$\begin{bmatrix} \tilde A_{11} & \tilde A_{12} \\ \tilde A_{21} & \tilde A_{22} \end{bmatrix} \approx \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$

such that each block satisfies

ÃijXj = AijXj, i, j = 1, 2,

then we already have

$$\begin{bmatrix} \tilde A_{11} & \tilde A_{12} \\ \tilde A_{21} & \tilde A_{22} \end{bmatrix} \begin{bmatrix} X_1 & 0 \\ 0 & X_2 \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} X_1 & 0 \\ 0 & X_2 \end{bmatrix}.$$

We will see that to maintain property (1.2) the partition of H-matrices requires that larger blocks have to fulfill more constraints than leaves of smaller size.

Lemma 4.1 Let t × s ∈ TI×J \ P and let Ãt′s′ preserve Λ(s′) for all s′ ∈ S(s) and t′ ∈ S(t). Then Ãts preserves Λ(s).


Proof According to the assumption, each block t′ × s′ of Ã preserves Λ(s′), i.e.

Ãt′s′ ūσ = At′s′ ūσ, σ ∈ 𝓛(Ts′).

Since (ūσ)s′ = 0 for σ ∩ s′ = ∅, this can be extended to

Ãt′s′(ūσ)s′ = At′s′(ūσ)s′, σ ∈ 𝓛(Ts),

and hence

$$\sum_{s'\in S(s)} \tilde A_{t's'}(\bar u_\sigma)_{s'} = \sum_{s'\in S(s)} A_{t's'}(\bar u_\sigma)_{s'}, \qquad \sigma\in\mathcal{L}(T_s).$$

From ūσ = Σ_{s′∈S(s)} (ūσ)s′ for all σ ∈ 𝓛(Ts), we obtain

$$\tilde A_{t's}\,\bar u_\sigma = \sum_{s'\in S(s)} \tilde A_{t's'}(\bar u_\sigma)_{s'} = \sum_{s'\in S(s)} A_{t's'}(\bar u_\sigma)_{s'} = A_{t's}\,\bar u_\sigma, \qquad \sigma\in\mathcal{L}(T_s),$$

for all t′ ∈ S(t). □

According to the previous lemma, it is sufficient to impose a few conditions on the leaves in order to preserve a large number of vectors on a global level. The next lemma shows how many vectors are preserved by the whole matrix.

Lemma 4.2 Let each block Ãts, t × s ∈ P, preserve Λ(s). Then Ã preserves Λ(J). The number of linearly independent vectors preserved by Ã is |𝓛(TJ)|.

Proof According to the assumption, each block t × s ∈ P preserves Λ(s). It follows from inductively applying Lemma 4.1, going from the leaves to the root, that all blocks t × s ∈ TI×J preserve Λ(s). In particular, Ã preserves Λ(J).

It is obvious that

$$\dim\Lambda(s) = \sum_{s'\in S(s)} \dim\Lambda(s'). \tag{4.1}$$

Since dim Λ(σ) = 1 for all σ ∈ 𝓛(TJ), the dimension of Λ(J) equals the number |𝓛(TJ)| of leaves in TJ. □

The number |𝓛(TJ)| of leaves of TJ is of the order |J|. Hence, for the number of linearly independent vectors preserved by Ã it follows that dim Λ(J) ∼ |J|. Preserving O(|J|) vectors almost completely determines the whole matrix. By (4.1) the dimension of Λ(s) is the sum of the dimensions of Λ(s′) over all s′ ∈ S(s). From this it follows that larger blocks in P have to fulfill more constraints than smaller blocks. To be precise, for any leaf t × s ∈ P the depth of the subtree Ts of TJ rooted at s determines the dimension of Λ(s). As a consequence, the logarithmic-linear complexity of the resulting H-matrices in terms of storage will be destroyed, as we shall see later on.

To avoid this, the depth q of the smallest cluster s ∈ TJ for which ūs is preserved globally needs to be reduced. Blocks t × s with depth greater than q only preserve the restricted vector us. Here, the depth level(s′) denotes the distance of s′ ∈ Ts to the root J independently of s ∈ TJ. The reduced basis for a cluster s ∈ {s ∈ TJ : ∃ t × s ∈ P} is defined in the following way:

$$\Lambda_q(s) := \begin{cases} \operatorname{span}\{\bar u_\sigma : \sigma\in\mathcal{L}_q(T_s)\}, & \mathrm{level}(s) < q, \\ \operatorname{span}\{u_s\}, & \text{otherwise}, \end{cases} \tag{4.2}$$

where 𝓛q(Ts) := {s′ ∈ Ts : level(s′) = q}.

[Fig. 1: An H-matrix and its preserved vectors for q = 2]

The next lemma states a similar estimate as Lemma 4.2 for the number of linearly independent vectors that are preserved globally by a reduced basis. The level of a block is defined by its distance from the root in the block cluster tree.

Lemma 4.3 Let each block Ãts, t × s ∈ P, preserve Λq(s). Then the number of linearly independent vectors preserved by Ã is at least |𝓛q̄(TJ)| with

q̄ := max(q, min{level(t × s) : t × s ∈ P}).

Proof If the depth of the cluster trees TI and TJ is virtually restricted to q̄, then Lemma 4.2 can be applied. □

Figure 1 depicts the vectors ūσ, σ ∈ 𝓛2(Ts), preserved in the case of the constant vector u = 1. Off-diagonal blocks are considered to be of low rank, while diagonal blocks are refined recursively. The white blocks in Fig. 1 are treated without approximation, whereas green blocks are approximated by low-rank matrices preserving the contained vectors. One can see that four linearly independent vectors are preserved for the whole matrix in Fig. 1, which is in line with Lemma 4.3 because the minimal level is q̄ = 2 and |𝓛2(TJ)| ≥ 4.

The constant q̄ of Lemma 4.3 can be large for certain applications. This means that even though only a single vector is preserved per block, the total number of globally preserved linearly independent vectors might be huge due to the structure of the partition. One application of this type is nested dissection reorderings of FE systems of elliptic operators; cf. [4]. Admissible blocks only appear in the separator, which, depending on the sparsity, is relatively small.


4.2 Storage requirements vs. number of preserved vectors

The preservation of given vectors increases the blockwise rank by the number of preserved vectors and thus the storage requirements. The number of conditions of a block t × s is given by

$$\begin{cases} |\mathcal{L}_q(T_s)|, & \mathrm{level}(s) < q, \\ 1, & \text{otherwise}. \end{cases}$$

We assume that the number of sons κ := |S(s)| does not depend on s. This is in line with the usual choice κ = 2 or κ = 3. In this case, the number of conditions for a block t × s from the ℓ-th level of TI×J is

$$c_*(\ell) := \begin{cases} \kappa^{q-\ell}, & \ell < q, \\ 1, & \text{otherwise}. \end{cases}$$

Hence, the storage required for t × s ∈ P is bounded by

Nst(Ab) ≤ k̄(|t| + |s|)

with k̄ := k + c∗. Here, k is the rank of a usual low-rank approximation as defined in (2.3). Non-admissible blocks Ats are small, i.e. min{|s|, |t|} ≤ nmin for some nmin ∈ N. These blocks are stored entrywise and therefore require

|t||s| = min{|t|, |s|} · max{|t|, |s|} ≤ nmin(|t| + |s|)

units of storage. Hence, at most max{k̄, nmin}(|t| + |s|) units of storage are needed for Ãts. With (2.1) it follows that

$$\sum_{t\times s\in P} \max\{\bar k, n_{\min}\}(|t| + |s|) \le c_{sp}\max\{\bar k, n_{\min}\}\bigl[L(T_I)|I| + L(T_J)|J|\bigr].$$

The number c∗ of conditions per block depends on the level. Hence, a more sophisticated estimate is needed. First, observe that

$$\sum_{t\times s\in T_{I\times J}} c_*(\ell)(|t| + |s|) = \sum_{t\in T_I}\ \sum_{\{s\in T_J:\, t\times s\in T_{I\times J}\}} c_*(\ell)\,|t| + \sum_{s\in T_J}\ \sum_{\{t\in T_I:\, t\times s\in T_{I\times J}\}} c_*(\ell)\,|s|.$$

For the first term we obtain

$$\begin{aligned} \sum_{t\in T_I}\ \sum_{\{s\in T_J:\, t\times s\in T_{I\times J}\}} c_*(\ell)\,|t| &= \sum_{\ell=0}^{L(T_I)-1}\ \sum_{\{t\in T_I:\,\mathrm{level}(t)=\ell\}}\ \sum_{\{s\in T_J:\, t\times s\in T_{I\times J}\}} c_*(\ell)\,|t| \\ &\le c_{sp}\sum_{\ell=0}^{L(T_I)-1} c_*(\ell) \sum_{\{t\in T_I:\,\mathrm{level}(t)=\ell\}} |t| = c_{sp}|I|\sum_{\ell=0}^{L(T_I)-1} c_*(\ell) \\ &= c_{sp}|I|\Bigl(\kappa^q\sum_{\ell=0}^{q}\kappa^{-\ell} + L(T_I) - q - 1\Bigr) = c_{sp}|I|\Bigl(\frac{\kappa^{q+1}-1}{\kappa-1} + L(T_I) - q - 1\Bigr). \end{aligned}$$

If q is chosen as q ∼ log_κ(kL(TI)), we obtain

$$\sum_{t\in T_I}\ \sum_{\{s\in T_J:\, t\times s\in T_{I\times J}\}} c_*(\ell)\,|t| \sim c_{sp}\,k\,L(T_I)\,|I|,$$

provided that L(TI) ≈ L(TJ). Hence, for this choice of q the storage required when preserving vectors is asymptotically the same as without preservation:

Nst(Ã) ∼ csp max{k, nmin}[L(TI)|I| + L(TJ)|J|].
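The level sum evaluated above is easy to check numerically; the parameters in this small sketch are arbitrary toy values of ours (κ = 2, cut-off depth q = 5, tree depth 20).

```python
# quick check of the level sum from Sect. 4.2 under assumed toy parameters
kappa, q, L = 2, 5, 20                       # sons per cluster, cut-off depth, tree depth

def c_star(l):
    return kappa ** (q - l) if l < q else 1  # conditions per block on level l

direct = sum(c_star(l) for l in range(L))
closed = (kappa ** (q + 1) - 1) // (kappa - 1) + L - q - 1
print(direct, closed)                        # both 77: the closed form matches
```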

The results of this section are summarized in the following theorem.

Theorem 4.4 Assume that each block Ãts, t × s ∈ P, preserves Λq(s) with

q ∼ log(k log |I|).

Then Ã requires O(k[|I| log |I| + |J| log |J|]) units of storage and preserves O(k log |J|) linearly independent vectors.

Proof The total number of vectors that are preserved by the whole matrix is

|𝓛q̄(TJ)| ∼ κ^q ∼ kL(TI×J) ∼ k log |J|. □

5 Preservation of vectors and preconditioning

One of the applications of the blockwise preservation of vectors is preconditioning. Usually, H-matrix approximations Ã to a given matrix A ∈ C^{n×n} are constructed such that

‖A − Ã‖2 ≤ ε‖A‖2.    (5.1)

Assume that A is Hermitian and positive definite. In this case, the previous condition (5.1) guarantees that for the i-th largest eigenvalue λi it holds that

|λi(A) − λi(Ã)| ≤ ελ1(A)

due to Weyl's theorem [25]. Hence, the relative accuracy of all large eigenvalues λ1(A) ≥ · · · ≥ λl(A) of the same order as λ1(A) is approximately ε. In contrast to that, the relative accuracy of the remaining eigenvalues λm(A), m = l + 1, . . . , n, is only ελ1/λm. Therefore, an absolute error of order ε‖A‖2 gives Ã much more the character of a smoother in the sense of multigrid methods for elliptic problems but does not necessarily result in a good preconditioner unless A is well-conditioned or other additional criteria can be used. To avoid this, we could choose a smaller ε, which increases the computational complexity of the H-matrix approximation. Another idea is to preserve the eigenvectors (or at least approximations) corresponding to small eigenvalues. In the case of matrices arising from FE systems of elliptic operators, these eigenvectors are smooth so that it can be expected that they can be approximated well by a relatively small cascadic space consisting of piecewise constant vectors.
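The effect can be reproduced in a few lines of numpy; the spectrum and the perturbation below are synthetic examples of ours, not data from the paper.

```python
import numpy as np

# an absolute perturbation of size eps*||A||_2 keeps the large eigenvalues
# accurate but ruins the relative accuracy of the small ones
rng = np.random.default_rng(4)
n, eps = 200, 1e-2
lam = np.geomspace(1.0, 1e-6, n)             # spectrum spanning six orders of magnitude
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (Q * lam) @ Q.T                          # SPD matrix with eigenvalues lam
E = rng.standard_normal((n, n)); E = (E + E.T) / 2
E *= eps * lam[0] / np.linalg.norm(E, 2)     # ||E||_2 = eps * ||A||_2
mu = np.linalg.eigvalsh(A + E)[::-1]         # perturbed eigenvalues, descending
rel = np.abs(mu - lam) / lam
print(rel[0], rel[-1])   # ~eps for the largest, up to eps*lam[0]/lam[-1] for the smallest
```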

5.1 Spectral equivalence

In multigrid methods for elliptic problems, the space associated with small eigenvalues is incorporated by the coarse grid correction. Here, in contrast to multigrid methods, we directly approximate the eigenvectors vj with respect to the small eigenvalues, or some space Q nearby, by preserving a set of suitably chosen test vectors. To illustrate what has to be required for Q, we consider A^{-1} and subtract all rank-1 matrices λj^{-1} vj vj^H, j = l + 1, . . . , n. Then the remaining matrix is just a sum of similar rank-1 matrices but corresponding to the large eigenvalues λ1^{-1}, . . . , λl^{-1}, which are of the order of ‖A‖^{-1}. Thus, formally the eigenspace with respect to the small eigenvalues can be characterized by the observation that the error ‖A^{-1} − Q(Q^H AQ)^{-1}Q^H‖ is of the order of ‖A‖^{-1}, where Q := [v_{l+1}, . . . , v_n]. But this observation is more general and does not explicitly require that Q is exactly the set of eigenvectors corresponding to λ_{l+1}, . . . , λ_n. This (approximate) eigenspace Q has to be approximated by our test vectors, say X, appropriately with respect to some norm, i.e., ‖Q − X‖ has to be sufficiently small. Our main theorem will now justify the idea of preserving (approximate) eigenvectors X corresponding to small eigenvalues λ_{l+1}, . . . , λ_n. Moreover, preserving test vectors is some kind of substitute for adding a coarse grid correction to the preconditioner.

For Hermitian positive definite matrices A it is known that for the condition number it holds that

$$\operatorname{cond}(A) = \|A\|_2\|A^{-1}\|_2 = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)}.$$

Here, λmax(A) and λmin(A) denote the eigenvalues of A of largest and smallest modulus, respectively.

Theorem 5.1 Let A, Ã ∈ C^{n×n} be Hermitian, where A is positive definite. Suppose there exist τ > 2ε > 0 such that the following four conditions are fulfilled:

1. ‖A − Ã‖2 ≤ ε‖A‖2;
2. ÃX = AX for some prescribed X ∈ C^{n×m};
3. there exists Q ∈ C^{n×m} with rank Q = m such that

   $$\|(Q - X)A_Q^{-1}(Q - X)^H\|_2 \le \frac{1}{\tau\|A\|_2}, \quad \text{where } A_Q := Q^H AQ, \text{ and}$$

4. Q itself is chosen such that

   $$\|A^{-1} - QA_Q^{-1}Q^H\|_2 \le \frac{1}{\tau\|A\|_2}.$$

Then Ã is also positive definite and the condition number of Ã^{-1/2}AÃ^{-1/2} is bounded by

$$\operatorname{cond}\bigl(\tilde A^{-1/2}A\tilde A^{-1/2}\bigr) = \frac{\lambda_{\max}(\tilde A^{-1}A)}{\lambda_{\min}(\tilde A^{-1}A)} \le \frac{\tau + 2\varepsilon}{\tau - 2\varepsilon}.$$

Proof We decompose A^{-1} as A^{-1} = B + QA_Q^{-1}Q^H, where B := A^{-1} − QA_Q^{-1}Q^H. Since B is still positive semidefinite, there exists B^{1/2}. Thus we can write A^{-1} as

$$A^{-1} = \begin{bmatrix} B^{1/2} & QA_Q^{-1/2} \end{bmatrix} \begin{bmatrix} B^{1/2} \\ A_Q^{-1/2}Q^H \end{bmatrix} =: LL^H.$$

With E := Ã − A it will be shown that

‖A^{-1/2}EA^{-1/2}‖2 ≤ 2ε/τ.    (5.2)

We remark that for a Hermitian matrix E and a Hermitian positive definite matrix A^{-1} = LL^H we have ‖A^{-1/2}EA^{-1/2}‖2 = ‖L^H EL‖2. Thus it is enough to bound

$$\left\| \begin{bmatrix} B^{1/2} \\ A_Q^{-1/2}Q^H \end{bmatrix} E \begin{bmatrix} B^{1/2} & QA_Q^{-1/2} \end{bmatrix} \right\|_2.$$

Since this is a 2 × 2 block matrix, it is sufficient to bound each block by ε/τ in order to obtain (5.2).

First considering B^{1/2}EB^{1/2}, we find that

$$\|B^{1/2}EB^{1/2}\|_2 \le \|B^{1/2}\|_2^2\|E\|_2 = \|B\|_2\|E\|_2 \le \frac{1}{\tau\|A\|_2}\,\varepsilon\|A\|_2 = \frac{\varepsilon}{\tau}.$$

Next we consider B^{1/2}EQA_Q^{-1/2}. Since ÃX = AX we have that

EQ = (Ã − A)Q = (Ã − A)(Q − X) = E(Q − X).

Hence,

$$\|B^{1/2}EQA_Q^{-1/2}\|_2 = \|B^{1/2}E(Q - X)A_Q^{-1/2}\|_2 \le \|B^{1/2}\|_2\|E\|_2\|(Q - X)A_Q^{-1/2}\|_2 = \sqrt{\|B\|_2}\,\|E\|_2\sqrt{\|(Q - X)A_Q^{-1}(Q - X)^H\|_2} \le \frac{1}{\sqrt{\tau\|A\|_2}}\,\varepsilon\|A\|_2\,\frac{1}{\sqrt{\tau\|A\|_2}} = \frac{\varepsilon}{\tau}.$$

Obviously the same bound holds for ‖A_Q^{-1/2}Q^H EB^{1/2}‖2.


Finally we have to bound ‖A_Q^{-1/2}Q^H EQA_Q^{-1/2}‖2. Exploiting again that EQ = E(Q − X), it immediately follows that Q^H EQ = (Q − X)^H E(Q − X). Therefore we obtain

$$\begin{aligned} \|A_Q^{-1/2}Q^H EQA_Q^{-1/2}\|_2 &= \|A_Q^{-1/2}(Q - X)^H E(Q - X)A_Q^{-1/2}\|_2 \\ &\le \|A_Q^{-1/2}(Q - X)^H\|_2\,\|E\|_2\,\|(Q - X)A_Q^{-1/2}\|_2 = \|A_Q^{-1/2}(Q - X)^H\|_2^2\,\|E\|_2 \\ &= \|(Q - X)A_Q^{-1}(Q - X)^H\|_2\,\|E\|_2 \le \frac{1}{\tau\|A\|_2}\,\varepsilon\|A\|_2 = \frac{\varepsilon}{\tau}. \end{aligned}$$

In total, we have

‖A^{-1/2}(Ã − A)A^{-1/2}‖2 = ‖A^{-1/2}EA^{-1/2}‖2 ≤ 2ε/τ < 1.

This implies that the extremal eigenvalues of A^{-1}Ã are bounded by 1 ± 2ε/τ, and in particular Ã must be positive definite as well. Consequently, the eigenvalues of Ã^{-1}A are bounded by τ/(τ − 2ε) and τ/(τ + 2ε), respectively. This completes the proof. □

We would like to emphasize that in Theorem 5.1, X need not have full rank, so one could rewrite X = PW for some full-rank matrix P with rank r ≪ m and only require that ÃP = AP. This may reduce the number of constraints that really have to be preserved.

As mentioned prior to Theorem 5.1, a special choice of Q is certainly the space of eigenvectors associated with the m smallest eigenvalues of A. In this case the condition ‖A^{-1} − QA_Q^{-1}Q^H‖ ≤ 1/(τ‖A‖2) simplifies to

1/λi ≤ 1/(τ‖A‖2), i = 1, . . . , l,

for the largest l = n − m eigenvalues λ1 ≥ · · · ≥ λl of A. So we can in particular set τ := λl/‖A‖2. Clearly, A_Q^{-1} simplifies to diag(λ_{l+1}^{-1}, . . . , λ_n^{-1}). According to Lemma 4.2, given a fixed test vector u, we obtain r = |𝓛(TJ)| linearly independent test vectors that could be preserved if we would preserve more test vectors the larger the block Ats is. We have also illustrated that this strategy would become expensive because of the increasing number of local constraints to be preserved. Therefore, in a realistic algorithm, according to Lemma 4.3, only r = |𝓛q̄(TJ)| constraints will be preserved.

In the case that u is the vector of all ones (which will be the case in our experiments), the columns of X refer to piecewise constant vectors extended by zero. Thus, the H-Cholesky decomposition computes a hierarchical LU decomposition according to some prescribed ε and preserves |𝓛q̄(TJ)| piecewise constant partitions of u.


Therefore, l has to be chosen large enough such that

$$\left\| (Q - X)\operatorname{diag}\Bigl(\frac{1}{\sqrt{\lambda_{l+1}}}, \dots, \frac{1}{\sqrt{\lambda_n}}\Bigr) \right\|_2^2 \le \frac{1}{\lambda_l}.$$

This means that eigenvectors with respect to smaller eigenvalues have to be preserved more accurately than those associated with larger eigenvalues. Assuming that the chosen test vector u and its partitions approximate the eigenvectors corresponding to λ_{l+1}, . . . , λ_n (which are often referred to as low frequencies) sufficiently well, we expect that at least m = n − l ∼ |𝓛q̄(TJ)| (low frequency) eigenvectors are sufficiently approximated. Using τ = λl/‖A‖2, we will obtain mesh-independent convergence whenever τ is a constant.

We finally remark that our main Theorem 5.1 could be applied to any approximation Ã of A, e.g., a modified incomplete Cholesky factorization such as the classical MILU(0) (see [12]), modified incomplete Cholesky decompositions with respect to a given threshold ε that additionally preserve u, or multilevel ILU variants of similar type; cf., e.g., [6, 21, 23]. However, preserving sufficiently many partitions of the test vector u seems to be crucial to cover the low frequencies.

5.2 Relaxed preservation

Numerical experiments indicate that the prescribed depth q of the smallest cluster s ∈ TJ in (4.2) can be chosen independently of the number of unknowns |I| and |J| to create a good preconditioner. Furthermore, in practice preserving a single vector per block is enough, which significantly improves the complexity. This seems to be in contradiction to the arguments of Theorem 5.1, because preserving a single vector per block may not necessarily lead to a sufficiently large space of vectors preserved by the whole matrix. In the rest of this section, we try to give some algebraic arguments why neglecting blockwise constraints only slightly influences the number of iterations in the preconditioned conjugate gradient method (PCG, [18]). Using an H-matrix approximation Ã based on the truncated subspace Λq(s), we preserve fewer test vectors compared with the larger space Λ(s). Each block of the H-matrix approximation Ã refers to a low-rank perturbation of A, and in the symmetric case any low-rank part that is neglected in the lower triangular part has a counterpart in the upper triangular part, i.e., the skipped low-rank corrections W1, W2 are of the form W1W2^H + W2W1^H and can obviously be written as a difference of two Hermitian semi-definite low-rank matrices:

$$W_1W_2^H + W_2W_1^H = \tfrac{1}{2}(W_1 + W_2)(W_1 + W_2)^H - \tfrac{1}{2}(W_1 - W_2)(W_1 - W_2)^H.$$
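This identity is elementary and can be verified directly:

```python
import numpy as np

# the symmetric low-rank residual splits into a difference of semidefinite parts
rng = np.random.default_rng(5)
W1, W2 = rng.standard_normal((50, 3)), rng.standard_normal((50, 3))
lhs = W1 @ W2.T + W2 @ W1.T
P, M = W1 + W2, W1 - W2
print(np.allclose(lhs, 0.5 * (P @ P.T - M @ M.T)))   # True
```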

A realistic model of A would be

A = Ã + W1W2^H + W2W1^H + E,

where Ã = LL^H ∈ C^{n×n} corresponds to the computed hierarchical factorization and E refers to a perturbation such that ‖L^{-1}EL^{-H}‖2 ≤ ε < 1. On the one hand, this model reveals the relatively small perturbation E as justified in Theorem 5.1 by using test vectors. On the other hand, it incorporates the fact that additional low-rank corrections are taken into account when using Λq(s).

To simplify the discussion, we only consider the Hermitian positive definite case and the situation where

A = Ã + WW^H + E, W ∈ C^{n×k},

knowing that the case A = Ã − WW^H + E would be slightly more complicated but leads to a similar conclusion.

Lemma 5.2 Let A, Ã ∈ C^{n×n} be as above such that ‖L^{-1}EL^{-H}‖2 ≤ ε < 1 and let σ1, σk denote the largest (resp. smallest) singular value of L^{-1}W. Using the conjugate gradient method for A with Ã as preconditioner, the iterates xℓ, ℓ = 1, 2, 3, . . . , satisfy

$$\frac{\|x - x_\ell\|_A}{\|x - x_0\|_A} \le \begin{cases} \max\Bigl\{\varepsilon,\ 2(\sigma_1^2 + \varepsilon)\Bigl(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\Bigr)^{\ell-1}\Bigr\}, & \ell = 1, \dots, k - 1, \quad (5.3) \\[6pt] 2\Bigl(\frac{\sqrt{\kappa_\varepsilon} - 1}{\sqrt{\kappa_\varepsilon} + 1}\Bigr)^{\ell-k}, & \ell \ge k, \quad (5.4) \end{cases}$$

where ‖x‖A := √(x^H Ax), κ := (1 + σ1² + ε)/(1 + σk² − ε), and κε := (1 + ε)/(1 − ε).

Proof We note that Ax = b ⇔ By = c, where B = L^{-1}AL^{-H}, y = L^H x, c = L^{-1}b, and using the conjugate gradient method for B yields transformed iterates yℓ = L^H xℓ such that ‖y − yℓ‖B = ‖x − xℓ‖A. We set W̄ = L^{-1}W, Ē = L^{-1}EL^{-H} and point out that

B = I + W̄W̄^H + Ē,

where ‖Ē‖2 ≤ ε < 1 and W̄ ∈ C^{n×k}. The residual rℓ := c − Byℓ satisfies rℓ = πℓ(B)r0, where r0 is the initial residual and πℓ(z) is a normalized polynomial of degree ℓ satisfying πℓ(0) = 1; cf. [9]. Let λ1 ≥ · · · ≥ λn > 0 be the eigenvalues of B. The conjugate gradient method is known to satisfy

$$\|y - y_\ell\|_B \le \|y - y_0\|_B \cdot \min_{\pi_\ell(0)=1}\ \max_{j=1,\dots,n} |\pi_\ell(\lambda_j)|.$$

We denote by μ1 ≥ · · · ≥ μn the eigenvalues of I + W̄W̄^H. Then the theorem of Weyl states that

|λi − μi| ≤ ‖Ē‖2 ≤ ε < 1.

We remark that I + W̄W̄^H has μ = 1 as an eigenvalue of multiplicity n − k and further k eigenvalues associated with the low-rank part, i.e., the eigenvalues are

μi = 1 + σi², i = 1, . . . , k, and μ_{k+1} = · · · = μn = 1,

where σ1 ≥ · · · ≥ σk ≥ 0 denote the singular values of W̄ in decreasing order. From Weyl's theorem, we obtain that

λ1, . . . , λk ∈ [1 + σk² − ε, 1 + σ1² + ε], λ_{k+1}, . . . , λn ∈ [1 − ε, 1 + ε].

We first discuss the case ℓ < k. Notay [20] discusses convergence bounds for CG in the presence of rounding errors, where perturbed isolated eigenvalues are located at the lower or the upper end of the spectrum. Here, the perturbed eigenvalue is μ = 1. Following these ideas, we can bound |πℓ(λj)| by

$$\min_{\pi_\ell(0)=1}\ \max_{j=1,\dots,n} |\pi_\ell(\lambda_j)| \le \min_{\pi_{\ell-1}(0)=1}\ \max_{j=1,\dots,n} |\pi_{\ell-1}(\lambda_j)|\,|1 - \lambda_j|.$$

Here, we can easily estimate |πℓ(λj)| by choosing

$$\pi_{\ell-1}(\lambda) = \frac{\zeta_{\ell-1}(\lambda)}{\zeta_{\ell-1}(0)},$$

where ζ_{ℓ−1}(λ) is the translated Chebyshev polynomial of degree ℓ − 1 with respect to the interval [1 + σk² − ε, 1 + σ1² + ε]. With arguments analogous to the standard conjugate gradient method [9], we obtain

conjugate gradients methods [9], we obtain

minπ�(0)=1

maxj=1,...,n

∣∣π�(λj )∣∣

≤ minπ�−1(0)=1

max{ε max

j>k

∣∣π�−1(λj )∣∣,(σ 2

1 + ε)

maxj≤k

∣∣π�−1(λj )∣∣}

≤ max

{ε max

x∈[1−ε,1+ε]|ζ�−1(x)||ζ�−1(0)|

︸ ︷︷ ︸≤1

,(σ 2

1 + ε)

maxx∈[1+σ 2

k −ε,1+σ 21 +ε]

|ζ�−1(x)||ζ�−1(0)|

}

≤ max

{ε,2(σ 2

1 + ε)(

√κ − 1√κ + 1

)�−1},

where κ = 1+σ 21 +ε

1+σ 2k −ε

.

Now, we consider the case ℓ ≥ k. Let r ≤ k be such that λ1 ≥ · · · ≥ λr > 1 + ε. This time, ζ_{ℓ−r}(x) is chosen as the Chebyshev polynomial of degree ℓ − r with respect to the interval [1 − ε, 1 + ε]. Using

$$\pi_\ell(\lambda) = \frac{\zeta_{\ell-r}(\lambda)}{\zeta_{\ell-r}(0)}\prod_{j=1}^{r}\frac{\lambda_j - \lambda}{\lambda_j},$$

we thus obtain

$$\min_{\pi_\ell(0)=1}\ \max_{j=1,\dots,n} |\pi_\ell(\lambda_j)| \le \max_{j>r} |\pi_\ell(\lambda_j)| \le \max_{x\in[1-\varepsilon,1+\varepsilon]}\frac{|\zeta_{\ell-r}(x)|}{|\zeta_{\ell-r}(0)|},$$


which can be bounded by

$$2\Bigl(\frac{\sqrt{\kappa_\varepsilon} - 1}{\sqrt{\kappa_\varepsilon} + 1}\Bigr)^{\ell-r}$$

in terms of the remaining condition number κε = (1 + ε)/(1 − ε). The worst case applies if k = r. □

By (5.3), the convergence primarily depends on the low-rank part during the first k − 1 steps, which only requires a moderate number of iteration steps when k is small. Starting at step k, the convergence returns to the behavior as if the low-rank part were not present, causing a delay of at most k steps, as stated by (5.4). From the construction of the hierarchical matrix approximation, κε is expected to be small, i.e., we expect not significantly more than k additional iteration steps. This justifies that it is advantageous to neglect some constraints, in particular for large blocks.
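The qualitative statement of Lemma 5.2 is easy to observe numerically. The following sketch of ours applies CG to the model matrix B = I + WW^H + E; it uses SciPy's cg, whose rtol keyword requires a recent SciPy (older versions call it tol).

```python
import numpy as np
from scipy.sparse.linalg import cg

# CG on B = I + W W^H + E: convergence as for I + E, delayed by about k steps
rng = np.random.default_rng(6)
n, k, eps = 500, 5, 1e-4
W = rng.standard_normal((n, k))
E = rng.standard_normal((n, n)); E = (E + E.T) / 2
E *= eps / np.linalg.norm(E, 2)              # ||E||_2 = eps < 1
B = np.eye(n) + W @ W.T + E
b = rng.standard_normal(n)
res = []
cg(B, b, rtol=1e-10, callback=lambda xk: res.append(np.linalg.norm(b - B @ xk)))
print(len(res))   # a handful of iterations: roughly k plus a few eps-driven steps
```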

6 Numerical results

The following numerical examples show the effect of blockwise preservation of given vectors when preconditioning discretizations of Poisson problems. In the first example, we present results obtained from a usual H-matrix preconditioner that is solely based on (1.1). In the following, this preconditioner will be referred to as Prec0 and is compared with the two preconditioners PrecGlobal (cf. [15, Sect. 6.8.1] and Sect. 2.2), preserving a given vector globally in an exact way, and PrecLocal, introduced in Sect. 4. In the second example, the efficiency of the new technique is demonstrated by comparing it with an algebraic multigrid solver.

All linear systems were solved using the preconditioned conjugate gradient method up to an accuracy of 1e−10. A minimal block size nmin = 150 is used. The tests were performed on a single core of an Intel Xeon X5482 processor at 3.2 GHz with 64 GB of core memory using the H-matrix library AHMED.¹

6.1 Local vs. global update

To compare the effect of locally preserving vectors with a global preservation, we consider the two-dimensional Poisson equation with Dirichlet data

−Δu = f in Ω,  u = 0 on ∂Ω,

on the square Ω = (0,1)². Already this simple problem reveals the different asymptotic behaviors of the methods compared. A regular grid allows one to avoid the influence of grid refinement on the number of iterations. Practically more relevant examples will be presented in Sect. 6.2.

¹ See the web site http://bebendorf.ins.uni-bonn.de/AHMED.html.


Fig. 2 Number of PCG steps for the first example on a log-scale

For comparison with our H-Cholesky preconditioners we will first present analogous results for the incomplete LU decomposition (ILU) using a prescribed drop tolerance ε = 1e−3 with and without preserving the constant test vector X := [1, . . . , 1]^T. The latter is usually referred to as modified ILU (MILU) or modified incomplete Cholesky decomposition; see [12]. For the iterative solution the preconditioned conjugate gradient method is used. These experiments were performed with MATLAB and its built-in function ichol. Although MILU outperforms the variant without preserving X, both approaches share the effect that with increasing number of unknowns the number of iteration steps increases significantly; see Fig. 2. The iteration process clearly dominates the total computation time, and as shown in [12] for the modified incomplete Cholesky decomposition without fill-in, a condition number of the order h^{-1} is observable, where h is the mesh size of the triangulation. In particular, neither the version with nor the version without preserving X is able to achieve a mesh-independent condition number. We will see that this is different for the H-matrix preconditioner presented in this article.

In addition to MILU, we compare the following three H-Cholesky preconditioners. PrecGlobal globally preserves the constant vector X := [1, . . . , 1]^T (see (1.1) and (1.2)), whereas PrecLocal locally preserves X on each block of the partition; cf. (3.1) and (3.2). The third one, Prec0, is a usual H-Cholesky decomposition; see (1.1). All three are computed from the stiffness matrix A with accuracy ε = 0.1. Figure 3 depicts the effect of these three H-Cholesky preconditioners on the conjugate gradient method. Neither Prec0 nor PrecGlobal is able to guarantee a bounded number of iterations. However, the number of PCG steps remains constant for the local preservation of constraints. Since the additional effort spent for computing PrecGlobal does not pay off compared with Prec0, we neglect this preconditioner in the following comparisons.


Fig. 3 Number of PCG steps for the first example on a log-scale

Fig. 4 Computational domains for comparison with algebraic multigrid

6.2 Speedup and comparison with algebraic multigrid

For the comparison with algebraic multigrid methods, we choose the mixed boundary value problem (6.1) on two different domains in R³ (see Fig. 4). Two domains with different Dirichlet parts were chosen in order to investigate the robustness of both preconditioning techniques.

In Fig. 4(a), the domain Ω consists of the union of two conductors. The boundary Γ1 is the upper end of the conductors and Γ2 the lower one. The remaining boundary is denoted by Γ3 := ∂Ω\(Γ1 ∪ Γ2). The conductivity σ of the two conductors is σ = 1e−7 and σ = 1e−4, respectively, while it is zero in the non-conductive part. In Fig. 4(b), Ω is a coil with the shape of a pyramid. The boundary Γ1 is the upper end of the conductor and Γ2 the lower one. The remaining boundary is Γ3 := ∂Ω\(Γ1 ∪ Γ2). The conductivity σ of the conductor is σ = 1 in the left and σ = 1000 in the right half, while it is zero in the non-conductive part.


Table 1 Comparison of the runtime for the H-Cholesky decomposition and PCG iteration with equal memory consumption for the geometry in Fig. 4(a)

size        memory   PrecLocal                Prec0                    savings
                     Chol.     PCG     steps  Chol.    PCG      steps
2.0 × 10⁵   0.4 GB     31 s     14 s    29      24 s     26 s    54       5 s (11 %)
7.6 × 10⁵   1.6 GB    160 s     57 s    29     120 s    105 s    53       8 s (4 %)
1.3 × 10⁶   3.0 GB    345 s    121 s    32     257 s    235 s    62      26 s (6 %)
4.1 × 10⁶   9.5 GB   1152 s    548 s    45     849 s   1655 s   136     804 s (47 %)

The following diffusion problem is considered for both geometries:

    −div(σ∇u) = 0    in Ω,
             u = 0    on Γ1,
         ∂u/∂ν = I    on Γ2,
         ∂u/∂ν = 0    on Γ3.                                          (6.1)

For the discretization, quadratic ansatz functions have been chosen in both examples. The H-matrix partition was constructed via the techniques from [4], which are based on nested dissection. The discretization of the domain was done using NETGEN.²
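Nested dissection recursively orders the unknowns so that the two halves of the domain come first and the separator between them last; the factorization fill-in is then confined to separator blocks. The construction from [4] is purely algebraic (it operates on the matrix graph rather than on coordinates), so the geometric toy version below, with its assumed separator half-width h and leaf size n_min, conveys only the idea:

    # Toy geometric nested-dissection ordering: split along the coordinate
    # direction of largest extent; points near the median plane form the
    # separator and are numbered last on every level.
    import numpy as np

    def nd_order(idx, pts, n_min=64, h=0.05):
        """Return a fill-reducing ordering of the index set idx."""
        if len(idx) <= n_min:
            return list(idx)
        coords = pts[idx]
        axis = int(np.argmax(coords.max(0) - coords.min(0)))
        c = np.median(coords[:, axis])
        left  = idx[coords[:, axis] < c - h]
        right = idx[coords[:, axis] > c + h]
        sep   = idx[np.abs(coords[:, axis] - c) <= h]
        return (nd_order(left, pts, n_min, h)
                + nd_order(right, pts, n_min, h)
                + list(sep))

    pts = np.random.default_rng(1).random((2000, 3))   # vertices in the unit cube
    perm = nd_order(np.arange(len(pts)), pts)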

As a first set of tests for both examples, we compare PrecLocal with Prec0. The accuracy (see (1.1)) of PrecLocal was set to 5.6e−3 in the first and to 1.0e−6 in the second example. The accuracy of Prec0 was adapted so that it requires almost the same amount (relative difference less than 1 %) of storage as PrecLocal.

To further increase performance in the first example, side constraints were dropped whenever the corresponding singular values in (3.10) fell below a relative threshold of 1e−5. The savings resulting from this technique are about 5 %. In the second example, the approximation accuracy ε is already very small, so the threshold was set to machine precision.
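This dropping rule follows a generic pattern. The fragment below does not reproduce (3.10) itself, which is defined in Sect. 3; it only illustrates discarding constraint directions whose singular values are negligible relative to the largest one (the function name and test data are made-up assumptions):

    import numpy as np

    def significant_constraints(C, rel_tol=1e-5):
        """Orthonormal basis of the numerically relevant directions of C."""
        U, s, _ = np.linalg.svd(C, full_matrices=False)
        keep = s > rel_tol * s[0]           # relative singular-value threshold
        return U[:, keep]

    C = np.random.default_rng(2).random((100, 4))
    C[:, 3] = C[:, 0] + 1e-9                # a nearly redundant constraint
    print(significant_constraints(C).shape) # -> (100, 3): one direction dropped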

From Tables 1 and 2 it can be seen that, for both geometries, the new approach results in a significant reduction in the number of PCG steps compared to the usual H-Cholesky decomposition. Hence, significant savings in terms of runtime can be achieved, because lower accuracies can be chosen. In one case it was not possible to construct Prec0, because the approximation led to an indefinite matrix; the respective entries in Table 2 are marked with a dash.

In the second set of tests, PrecLocal was compared with an algebraic multigrid solver, here BoomerAMG in its standard configuration from the HYPRE 2.0.0³ library. We were not able to determine the exact memory consumption of BoomerAMG. Nevertheless, the Linux system information indicates that it is similar to the memory used for PrecLocal.

² See the web site http://sourceforge.net/projects/netgen-mesher.
³ See the web site http://acts.nersc.gov/hypre.


Table 2 Comparison of the runtime for the H-Cholesky decomposition and PCG iteration with equal memory consumption for the geometry in Fig. 4(b)

size        memory    PrecLocal                Prec0                   savings
                      Chol.     PCG     steps  Chol.    PCG     steps
6.6 × 10⁴   0.07 GB     0.7 s    0.3 s   4       0.7 s    0.5 s   7       0.2 s (19 %)
1.2 × 10⁵   0.14 GB     1.3 s    0.6 s   4       1.3 s    1.8 s  13       1.2 s (63 %)
3.9 × 10⁵   0.60 GB     8.7 s    2.5 s   4       7.9 s    9.0 s  15       5.7 s (51 %)
5.2 × 10⁵   0.74 GB    11.2 s    3.2 s   4       9.9 s   20.0 s  26      15.5 s (108 %)
7.5 × 10⁵   1.40 GB    27.4 s    5.8 s   4      23.9 s   22.4 s  16      13.1 s (40 %)
2.6 × 10⁶   6.87 GB   364.4 s   46.6 s   7       –        –       –       –

Table 3 Comparison of the runtime for the PCG using H-Cholesky decomposition with locally preserved vectors and an algebraic multigrid method

size        start-up  PrecLocal                BoomerAMG                savings
                      Chol.     PCG     steps  start-up  PCG     steps
2.0 × 10⁵     12 s      31 s     14 s    29       8 s      62 s    4      13 s (23 %)
7.6 × 10⁵     51 s     160 s     57 s    29      40 s     207 s    3     −21 s (−8 %)
1.3 × 10⁶    101 s     345 s    121 s    32      80 s     395 s    3     −92 s (−16 %)
4.1 × 10⁶    341 s    1152 s    548 s    45     320 s    1536 s    3    −185 s (−9 %)

Table 4 Comparison of the runtime for the PCG using H-Cholesky decomposition with locally preserved vectors and an algebraic multigrid method

size        start-up   PrecLocal                BoomerAMG                  savings
                       Chol.     PCG     steps  start-up  PCG        steps
6.6 × 10⁴     1.0 s      0.7 s    0.3 s   4       0.6 s      53.8 s    36      52 s (2620 %)
1.2 × 10⁵     2.9 s      1.3 s    0.6 s   4       1.5 s      73.7 s    21      70 s (1467 %)
3.8 × 10⁵    14.4 s      8.7 s    2.5 s   4       8.5 s     751.3 s    40     734 s (2868 %)
5.2 × 10⁵    19.9 s     11.2 s    3.2 s   4      12.6 s     868.8 s    32     847 s (2470 %)
7.5 × 10⁵    33.8 s     27.4 s    5.8 s   4      23.5 s    1272.6 s    26    1229 s (1834 %)
2.6 × 10⁶   170.0 s    364.4 s   46.6 s   7     101.4 s   11208.6 s    47   10729 s (1847 %)

In Tables 3 and 4, we have added the time for the creation of the cluster tree and the block cluster tree as the start-up time of the H-Cholesky approach. For the first geometry (see Fig. 4(a)), the overall times of both techniques are comparable. The method based on BoomerAMG is slightly more efficient for most problem sizes; however, the difference is within the range of code optimization. For the second geometry (see Fig. 4(b)), a huge gain in computational time can be observed: the speedup factor is about 15 to 30 compared to a usual algebraic multigrid method. This performance increase is due to the challenging geometry and shows the robustness of H-matrix preconditioners, in particular of PrecLocal.


The numerical experiments show that the preservation of extra vectors can be beneficial in terms of runtime and memory. In contrast to the usual H-matrix approach, the accuracy of the approximation hardly has to be adapted with increasing number of unknowns in order to obtain a spectrally equivalent preconditioner.

7 Conclusion

A new technique for preserving constraints ÃX = AX during the H-matrix approximation à of a matrix A is presented. Since the preservation is done blockwise, it carries over to approximate H-matrix operations such as the LU factorization. Due to the structure of the hierarchical matrix, a particular set of blockwise constraints leads to the preservation of a significantly larger set of global constraints while maintaining the logarithmic-linear complexity. This effect is exploited for the construction of preconditioners for second order elliptic boundary value problems. The blockwise preservation of a small set of piecewise constant vectors leads to the global preservation of the linear space of piecewise constants, which lies in the vicinity of the eigenvectors corresponding to small eigenvalues, and thus guarantees spectral equivalence. Despite the additional constraints satisfied by the H-matrix approximation, its complexity is reduced compared to a usual H-matrix approximation with a similar preconditioning effect, because the right direction of approximation is enforced by the constraints and the approximation accuracy in the other directions can be lowered significantly.

References

1. Barnes, J., Hut, P.: A hierarchical O(N log N) force calculation algorithm. Nature 324, 446–449 (1986)
2. Bebendorf, M.: Hierarchical Matrices: A Means to Efficiently Solve Elliptic Boundary Value Problems. Lecture Notes in Computational Science and Engineering (LNCSE), vol. 63. Springer, Berlin (2008). ISBN 978-3-540-77146-3
3. Bebendorf, M., Chen, Y.: Efficient solution of nonlinear elliptic problems using hierarchical matrices with Broyden updates. Computing 81, 239–257 (2007)
4. Bebendorf, M., Fischer, T.: On the purely algebraic data-sparse approximation of the inverse and the triangular factors of sparse matrices. Numer. Linear Algebra Appl. 18, 105–122 (2011)
5. Bebendorf, M., Ostrowski, J.: Parallel hierarchical matrix preconditioners for the curl-curl operator. J. Comput. Math. 27(5), 624–641 (2009). Special issue on Adaptive and Multilevel Methods for Electromagnetics
6. Botta, E., Wubs, F.: Matrix Renumbering ILU: an effective algebraic multilevel ILU preconditioner for sparse matrices. SIAM J. Matrix Anal. Appl. 20(4), 1007–1026 (1999)
7. Giebermann, K.: Multilevel approximation of boundary integral operators. Computing 67, 183–207 (2001)
8. Grasedyck, L., Hackbusch, W.: Construction and arithmetics of H-matrices. Computing 70, 295–334 (2003)
9. Greenbaum, A.: Iterative methods for solving linear systems. In: Frontiers in Applied Mathematics, vol. 17. SIAM, Philadelphia (1997)
10. Greengard, L.F., Rokhlin, V.: A fast algorithm for particle simulations. J. Comput. Phys. 73(2), 325–348 (1987)
11. Greengard, L.F., Rokhlin, V.: A new version of the fast multipole method for the Laplace equation in three dimensions. In: Acta Numerica, vol. 6, pp. 229–269. Cambridge University Press, Cambridge (1997)
12. Gustafsson, I.: A class of first order factorization methods. BIT Numer. Math. 18, 142–156 (1978)


13. Hackbusch, W.: Multi-Grid Methods and Applications. Springer, Berlin (1985)
14. Hackbusch, W.: A sparse matrix arithmetic based on H-matrices. Part I: Introduction to H-matrices. Computing 62(2), 89–108 (1999)
15. Hackbusch, W.: Hierarchische Matrizen. Springer, Berlin (2009)
16. Hackbusch, W., Khoromskij, B.N.: A sparse H-matrix arithmetic. Part II: Application to multi-dimensional problems. Computing 64(1), 21–47 (2000)
17. Hackbusch, W., Nowak, Z.P.: On the fast matrix multiplication in the boundary element method by panel clustering. Numer. Math. 54(4), 463–491 (1989)
18. Hestenes, M.R., Stiefel, E.: Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 49, 409–436 (1952)
19. Mirsky, L.: Symmetric gauge functions and unitarily invariant norms. Q. J. Math. 11, 50–59 (1960)
20. Notay, Y.: On the convergence rate of the conjugate gradient methods in presence of rounding errors. Numer. Math. 65, 301–317 (1993)
21. Saad, Y.: Iterative Methods for Sparse Linear Systems. SIAM, Philadelphia (2003)
22. Tyrtyshnikov, E.E.: Mosaic-skeleton approximations. Calcolo 33(1–2), 47–57 (1996). Toeplitz matrices: structures, algorithms and applications (Cortona, 1996)
23. Van der Ploeg, A., Botta, E., Wubs, F.: Nested grids ILU-decomposition (NGILU). J. Comput. Appl. Math. 66, 515–526 (1996)
24. Wagner, C.: Tangential frequency filtering decompositions for symmetric matrices. Numer. Math. 78, 119–142 (1996)
25. Wilkinson, J.H.: The Algebraic Eigenvalue Problem. Oxford University Press, London (1988)
26. Wittum, G.: Schnelle Löser für Große Gleichungssysteme. Teubner, Leipzig (1992)