
Incomplete LU Preconditioning for Large Scale Dense
Complex Linear Systems from Electromagnetic Wave
Scattering Problems

Jeonghwa Lee*, Jun Zhang*, and Cai-Cheng Lu**

* Laboratory for High Performance Scientific Computing and Computer Simulation,
Department of Computer Science, University of Kentucky,
Lexington, KY, USA

** Department of Electrical and Computer Engineering,
University of Kentucky,
Lexington, KY, USA

April ����

Abstract

We consider preconditioned iterative solution of the linear system Ax = b, where the coefficient matrix A is a large scale dense complex valued matrix arising from discretizing the integral equation of electromagnetic scattering. For some scattering structures this matrix can be poorly conditioned. The main purpose of this study is to evaluate the efficiency of a class of incomplete LU (ILU) factorization preconditioners for solving this type of matrices. We solve the electromagnetic wave equations using the BiCG method with an ILU preconditioner in the context of a multilevel fast multipole algorithm (MLFMA). The novelty of this work is that the ILU preconditioner is constructed using the near part block diagonal submatrix generated from the MLFMA. Experimental results show that the ILU preconditioner reduces the number of BiCG iterations substantially, compared to the block diagonal preconditioner. The preconditioned iteration scheme also maintains the computational complexity of the MLFMA, and consequently reduces the total CPU time.

Key words: Krylov subspace methods, ILU preconditioning, multilevel fast multipole algorithm, electromagnetic wave equation

Mathematics Subject Classification: �F�, �R�, �F�, ��C�

* Technical Report No. ������, Department of Computer Science, University of Kentucky, Lexington, KY, USA, ����.

E-mail: leejh@engr.uky.edu. URL: http://www.csr.uky.edu/~leejh. This author's research work was supported in part by the U.S. National Science Foundation under grant CCR-�������.

E-mail: jzhang@cs.uky.edu. URL: http://www.cs.uky.edu/~jzhang. This author's research work was supported in part by the U.S. National Science Foundation under grants CCR-�������, CCR-�������, CCR-�������, and ACI-�������, in part by the U.S. Department of Energy under grant DE-FG��-��ER�����, in part by the Japanese Research Organization for Information Science and Technology, and in part by the University of Kentucky Research Committee.

E-mail: cclu@engr.uky.edu. URL: http://www.engr.uky.edu/~cclu. This author's research work was supported in part by the U.S. National Science Foundation under grant ECS-�������, and in part by the U.S. Office of Naval Research under grant N�����-��-�-����.

1 Introduction

The electromagnetic wave scattering by three dimensional arbitrarily shaped dielectric and conducting objects can be obtained by finding the solution of an integral equation. The solution to electromagnetic wave interaction with material coated objects has applications in radar cross section prediction for coated targets, printed circuits, and microstrip antenna analysis ���� ��� ��. This problem can be studied using integral equation solvers. For example, to calculate radar scattering by a conducting target coated by dielectric material, the hybrid surface and volume integral equations can be used ���. In this case, a surface integral equation is formed for the conducting surfaces and a volume integral equation is formed for the dielectrics. The integral equations are discretized into a matrix equation by the method of moments (MoM) ���. With this procedure, we obtain a linear system of the form

    Ax = b,    (1)

where the coefficient matrix A is a large scale, dense, and complex valued matrix for electrically large targets.

The matrix equation (1) can be solved by numerical matrix equation solvers. Typical matrix solvers can be categorized into two classes. The first class is the direct solution methods, of which the Gauss elimination method is representative. The second class is the iterative solution methods, of which the Krylov subspace methods are considered to be the most effective ones currently available ���. The biconjugate gradient (BiCG) method is one of the many Krylov subspace methods ����.

The direct solution methods are very expensive in both memory space and CPU time for large size problems. For the Gauss elimination method the floating point operation count is on the order of O(N^3), where N is the number of unknowns (columns of the matrix A). In contrast, the complexity of the BiCG type iterative methods is on the order of O(N^2) if convergence is achieved in a small number of iterations. The Krylov subspace methods such as BiCG require the computation of some matrix-vector product operations at each iteration, which account for the major computational cost of this class of methods. The fast multipole method (FMM) speeds up the matrix-vector product operations when it is used to solve the matrix equation iteratively. The FMM approach reduces the computational complexity of a matrix-vector product from O(N^2) to O(N^1.5) ��� �� ���. With the multilevel fast multipole algorithm (MLFMA) the computational complexity is further reduced to O(N log N) ���� � ��� ���.

It should be noted that a matrix problem involving N unknowns may be solved in C N_iter N_Ax floating point operations, where C is a constant depending on the implementation of a particular iterative method ���, N_iter is the number of iterations needed to reach the prespecified convergence criterion, and N_Ax is the number of floating point operations needed for each matrix-vector multiplication. By using the MLFMA, the complexity of the matrix-vector multiplication operations is reduced to O(N log N). For many realistic problems, N_iter depends on both the iterative solver and the target properties (shape and material). For example, a problem with an open-ended cavity needs many more iterations than one with a solid conducting box of the same size. Since N_iter is a proportional factor in the CPU count, to further reduce the total CPU time it is necessary to reduce the number of iterations of the iterative solvers. Hence preconditioning techniques, which may speed up the convergence rate of the Krylov subspace methods (accelerators), are needed in this application.

There are various preconditioning techniques in existence; most of them are developed in the context of iteratively solving large sparse matrices ���. Until recently, preconditioning techniques for solving dense matrices have been less extensively studied ��� compared to those for sparse matrices. In the electromagnetic scattering simulation field, the diagonal and block diagonal preconditioners have been considered by a few authors ���� ��� �� in the context of the MLFMA. The individual blocks of the block diagonal matrix are numerically available and are easy to invert. More recently developed strategies are related to sparsity pattern based incomplete LU factorization preconditioning or sparse approximate inverse preconditioning ��� ��. Some of them are based on the pattern of a sparsified coefficient matrix ��� ��. Others are constructed using the geometric or topological information of the underlying problems ��� ���. Most of these preconditioning techniques, such as the ILU(0), rely on a fixed sparsity pattern obtained from the sparsified coefficient matrix by dropping small magnitude entries. The ILU(0) preconditioner, which uses a static sparsity pattern and is constructed from a sparsified matrix, has been shown to be ineffective in some electromagnetic scattering problems ���.

In the MLFMA implementation, the global coefficient matrix A is not explicitly available. Thus the strategy of extracting a sparsity pattern by dropping small magnitude entries from the dense coefficient matrix, in order to form a sparse matrix as a basis for constructing a preconditioner, is not feasible. We propose to use an incomplete lower-upper (ILU) triangular factorization with a dual dropping strategy ��� to construct a preconditioner from the near part matrix of the coefficient matrix in the MLFMA implementation. The near part matrix is naturally available in the MLFMA implementation. By not using a static (prespecified) sparsity pattern, we hope to capture the most important (large magnitude) entries in the incomplete LU factorization, while not consuming a large amount of memory. This is achieved by dynamically dropping small magnitude entries during the computation and by allowing only a fixed number of nonzeros to be kept in each row of the L and U factors.

In our experimental study, we use the BiCG method as the iterative solver, coupled with three preconditioning strategies, to solve a few study cases of representative electromagnetic scattering problems. We compare mainly the preconditioning effect of the different strategies. The preconditioning strategies that we consider are (a) no preconditioner (equivalent to using the identity matrix as the preconditioner), (b) the block diagonal preconditioner, and (c) the incomplete LU triangular factorization preconditioner. We show by numerical simulations that the iterative methods with the incomplete LU (ILU) preconditioner reduce the iteration number significantly, compared to the cases with the block diagonal preconditioner as well as without a preconditioner.

This paper is organized as follows. Section 2 gives a concise introduction to the hybrid integral equation approach in electromagnetic scattering and the multilevel fast multipole algorithm. In Section 3, we outline the BiCG iterative method and discuss in detail the ILUT preconditioner. Extensive numerical experiments with a few electromagnetic scattering problems are conducted in Section 4, in which we also explain and analyze the data obtained in our experiments. Some concluding remarks are given in Section 5.

2 Discretization of Integral Equation and Fast Multipole Method

The hybrid integral equation approach combines the volume integral equation (VIE) and the surface integral equation (SIE) to model the scattering and radiation by mixed dielectric and conducting structures ���� ���. For example, when a radome is applied to an antenna, the combined system consists of both dielectrics and conductors. Hence, the hybrid surface-volume integral equation is ideal for this problem ���. The volume integral equation is applied to the material region and the surface integral equation is enforced over the conducting surface. The integral equations can be formally written as follows:

    { L_S(r, r') · J_S(r') + L_V(r, r') · J_V(r') }_tan = -E^inc_tan(r),    r on S,

    E(r) - L_S(r, r') · J_S(r') - L_V(r, r') · J_V(r') = E^inc(r),    r in V,

where E^inc stands for the excitation field produced by the incident radar wave, the subscript "tan" stands for taking the tangential component of the vector to which it applies, and L_a (a = S, V) is an integral operator that maps the source J_a to the electric field E(r). It is defined as

    L_a(r, r') · J_a(r') = iωμ ∫_a ( I + ∇∇ / k_b^2 ) · J_a(r') e^{ik|r - r'|} / (4π|r - r'|) dr'.

It should be pointed out that E is related to J_V in the above integral equations. This results in a very general model, as all the volume and surface regions are modeled properly. The advantage of this approach is that in coated object scattering problems the coating material can be inhomogeneous, and in printed circuit and microstrip antenna simulation problems the substrate can be of finite size. The simplicity of the Green's function in both the volume integral equation and the surface integral equation has an important impact on the implementation of the fast solvers. However, the additional cost here is the increase in the number of unknowns, since the volume that is occupied by the dielectric material is meshed. This results in a larger memory requirement and a longer solution time in solving the corresponding matrix equation. But this deficiency can be overcome by applying fast integral equation solvers such as the multilevel fast multipole algorithm (MLFMA) ���.

We follow the general steps of the method of moments to discretize the hybrid volume-surface integral equations. The hybrid integral equations are converted into a matrix equation that is formally written as

    [ Z_SS  Z_SV ] [ a_S ]   [ U_S ]
    [ Z_VS  Z_VV ] [ a_V ] = [ U_V ],    (2)

where a_S and a_V stand for the vectors of the expansion coefficients for the surface current and the volume current, respectively ��� ���. The coefficient matrix arising from the discretized hybrid integral equations is nonsymmetric. Once the matrix equation (2) is solved by numerical matrix equation solvers, the expansion coefficients a_S and a_V can be used to calculate the scattered field and the radar cross section. In antenna analysis problems the coefficients can be used to retrieve the antenna's input impedance and calculate the antenna's radiation pattern. In the following, we use A to denote the coefficient matrix in Equation (2), and x = (a_S, a_V)^T for simplicity.

The basis function is an elementary current that has local support. To solve the matrix equation by an iterative method, matrix-vector multiplications are needed at each iteration. Physically, a matrix-vector multiplication corresponds to one cycle of interactions between basis functions. The basic idea of the FMM is to convert the interaction of element-to-element to the interaction of group-to-group. Here a group includes the elements residing in a spatial box. The mathematical foundation of the FMM is the addition theorem for the free-space scalar Green's function ���. Using the addition theorem, the matrix-vector product Ax can be written as

    Ax = (A_D + A_N) x + V_f T V_s x,    (3)

where V_f, T, and V_s are sparse matrices. In fact, the dense matrix A can be structurally divided into three parts: A_D, A_N, and A_F = V_f T V_s. A_D is the block diagonal part of A, A_N is the block near-diagonal part of A, and A_F is the far part of A. Here the terms "near" and "far" refer to the distance between two groups of elements. Figure 1 illustrates the data structure of a partitioned dense matrix A from electromagnetic scattering. The blank squares are A_D, the biased squares are A_N, and A_F is scattered in the rest of the area.

Figure 1: An illustration of the data structure of a dense matrix A from electromagnetic scattering.
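To make the structure of Equation (3) concrete, the following sketch applies a matrix-vector product in the split form y = (A_D + A_N)x + V_f T V_s x without forming the dense matrix A. It is our own illustration in Python/SciPy, not part of the authors' Fortran code; the function name fmm_style_matvec and the random stand-in matrices are hypothetical.

```python
import numpy as np
import scipy.sparse as sp

def fmm_style_matvec(A_D, A_N, V_f, T, V_s, x):
    """Apply y = (A_D + A_N) x + V_f T V_s x without ever assembling the dense A.

    A_D, A_N      : sparse near-part blocks (block diagonal and block near-diagonal).
    V_f, T, V_s   : sparse factors standing in for the far part A_F = V_f T V_s.
    """
    near = A_D @ x + A_N @ x           # near interactions, explicitly stored
    far = V_f @ (T @ (V_s @ x))        # far interactions via the sparse factorization
    return near + far

# Tiny illustrative data: random complex sparse stand-ins for the FMM matrices.
n = 200
mats = [sp.random(n, n, density=0.02, random_state=s) * (1 + 1j) for s in range(5)]
A_D, A_N, V_f, T, V_s = (m.tocsr() for m in mats)
rng = np.random.default_rng(0)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = fmm_style_matvec(A_D, A_N, V_f, T, V_s, x)
```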

To implement the FMM, the spatial region of all the basis functions is subdivided into small cubic boxes. Then the scattered fields of the different scattering centers within a group are translated into a single center, called the group center. This is the aggregation process. Hence, the number of scattering centers is reduced. If two groups are separated far enough, the interactions between them are carried out through translation. After the interactions among the groups are done, the field received by a group is redistributed to the group members. This is the disaggregation process ����. In the FMM, the matrix elements are calculated differently depending on the distance between a testing function and a basis function. For near-neighbor matrix elements (those in A_D and A_N), the calculation remains the same as in the MoM procedure. However, the elements in A_F are not explicitly computed and stored. Hence they are not numerically available in the FMM. It can be shown that with optimum grouping, the numbers of nonzero elements in the sparse matrices in Equation (3) are all on the order of N^1.5, and hence the operation count to perform Ax is O(N^1.5).

If the above process is implemented in a multilevel fashion, then the total cost (in memory and CPU time) can be reduced further to be proportional to N log N for one matrix-vector multiplication. Due to this reduced computational complexity, the memory and CPU time savings can be orders of magnitude compared with the direct dense matrix multiplication methods of O(N^2) if N is very large ���� ��.
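As a rough, purely illustrative comparison of these growth rates (the problem size N below is arbitrary and not one of the paper's test cases, and all constant factors are ignored), one can tabulate the three operation counts:

```python
import math

# Illustrative only: asymptotic matrix-vector product costs for a hypothetical size N.
N = 1_000_000
print(f"dense matvec   O(N^2)     ~ {float(N) ** 2:.2e}")
print(f"FMM matvec     O(N^1.5)   ~ {float(N) ** 1.5:.2e}")
print(f"MLFMA matvec   O(N log N) ~ {N * math.log(N):.2e}")
```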

3 Iterative Methods and Preconditioners

The most effective iterative solvers currently available are the class of the Krylov subspace methods ����. The biconjugate gradient method (BiCG) is one of them, especially aimed at solving nonsymmetric and unstructured linear systems ����. To speed up the convergence rate of the iteration processes, Krylov subspace iterations are almost always combined with a suitable preconditioning technique. Preconditioned Krylov subspace methods have become a popular choice of iterative solution techniques for solving large sparse and dense matrices. Generally speaking, the performance of a Krylov subspace solver will improve when it is combined with a suitable ILU preconditioner. Preconditioners based on the incomplete LU factorization of the coefficient matrix are usually inherently sequential, both in construction and in application.

There have been a few preconditioning techniques recently developed for solving dense linear systems arising from discretized integral equations in electromagnetic applications and in other applications ��� ��. Among these, the diagonal and block diagonal preconditioners are easy to implement and have been shown to be effective for certain types of integral equations arising from electromagnetic scattering problems ���� ��� ���. The ILU(0) preconditioner and preconditioners based on various sparse approximate inverse techniques have recently been tested for solving similar problems ��� ���. The ILU(0) preconditioner constructed from a sparsified coefficient matrix has been shown to be ineffective for solving a certain class of electromagnetic scattering problems ���. On the other hand, some sparse approximate inverse techniques need access to the full coefficient matrix (to construct a sparsified matrix), which is not available in the FMM. Other sparse approximate inverse techniques exploit a geometric or topological concept similar to our near part matrix strategy ���. In many cases, the sparse approximate inverse techniques are far more expensive to construct than the incomplete LU type preconditioning techniques. This is the case when the inherent parallelism of the sparse approximate inverse preconditioning strategies is not exploited.

We propose to solve the dense linear system (1) by using the BiCG method with an ILU preconditioner, with a block diagonal preconditioner, and without a preconditioner. The BiCG algorithm ���� is a projection process onto the Krylov subspace

    K_m = span{ v_1, A v_1, ..., A^{m-1} v_1 }

that is orthogonal to another Krylov subspace

    L_m = span{ w_1, A^T w_1, ..., (A^T)^{m-1} w_1 },

taking v_1 = r_0 / ||r_0||_2, where r_0 is the initial residual vector. Here A^T is the complex conjugate transpose of the matrix A. The vector w_1 can be arbitrarily chosen, provided (v_1, w_1) is nonzero, but it is often chosen to be equal to v_1. If there is a dual system A^T x* = b* to solve with A^T, then w_1 is obtained by scaling the initial residual b* - A^T x*_0.

For completeness, we give the algorithm of the BiCG method from ����.

Algorithm 3.1 The BiCG Method
1.  Compute r_0 := b - A x_0. Choose r*_0 such that (r_0, r*_0) is nonzero.
2.  Set p_0 := r_0, p*_0 := r*_0.
3.  For j = 0, 1, ..., until convergence Do:
4.     alpha_j := (r_j, r*_j) / (A p_j, p*_j)
5.     x_{j+1} := x_j + alpha_j p_j
6.     r_{j+1} := r_j - alpha_j A p_j
7.     r*_{j+1} := r*_j - alpha_j A^T p*_j
8.     beta_j := (r_{j+1}, r*_{j+1}) / (r_j, r*_j)
9.     p_{j+1} := r_{j+1} + beta_j p_j
10.    p*_{j+1} := r*_{j+1} + beta_j p*_j
11. EndDo

In Algorithm 3.1, the initial guess is x_0. The stopping criterion is usually set to measure the reduction rate or the magnitude of a certain residual norm. A limit on the maximum number of iterations is also used in almost all implementations of iterative methods. Implicitly, the algorithm solves not only the original system Ax = b but also a dual linear system A^T x* = b* with A^T ����. At each iteration, BiCG needs two matrix-vector products, one with A and one with A^T.
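As a minimal illustration of how such a solver is driven in practice, the following sketch (ours, in Python/SciPy rather than the authors' Fortran; scipy.sparse.linalg.bicg implements the same BiCG iteration) solves a small, made-up complex nonsymmetric system and supplies products with both A and its conjugate transpose:

```python
import numpy as np
from scipy.sparse.linalg import bicg, LinearOperator

# Small, well-conditioned complex nonsymmetric test system (a stand-in for a MoM matrix).
n = 300
rng = np.random.default_rng(1)
A = (4 + 1j) * np.eye(n) + 0.5 / np.sqrt(n) * (
    rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# BiCG needs products with both A and its (conjugate) transpose, so we provide both.
A_op = LinearOperator((n, n), matvec=lambda x: A @ x,
                      rmatvec=lambda x: A.conj().T @ x, dtype=complex)

x, info = bicg(A_op, b, maxiter=1000)   # info == 0 means the iteration converged
print("converged:", info == 0,
      " relative residual:", np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```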

It is known that an iterative method such as BiCG may not converge (or may converge very slowly) for solving large scale ill-conditioned dense matrices. In practical applications, preconditioning techniques should be used with the BiCG method to reduce the number of iterations. We will evaluate two preconditioners, the ILU preconditioner with a dual dropping strategy (a fill-in parameter and a drop tolerance) ��� and the block diagonal preconditioner, using the trivial identity matrix preconditioner as a comparison. The block diagonal preconditioner for the multilevel FMM is implemented as in ����. Due to space limits, we will not be able to evaluate other types of preconditioning techniques. The efficiency and examples of the ILU(0) and some sparse approximate inverse preconditioners in the FMM implementation can be found in ��� ���.

For the block diagonal preconditioning, we construct a preconditioner A_D^{-1} from the block diagonal matrix A_D, and then apply the preconditioner A_D^{-1} to the linear system (1). That is,

    A_D^{-1} A x = A_D^{-1} b,
    (A_D^{-1})^T A^T x* = (A_D^{-1})^T b*.

Since A_D is a block diagonal matrix, each individual block can be inverted independently.
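A minimal sketch of this idea (ours, in Python/NumPy rather than the authors' Fortran, with hypothetical helper names and made-up block sizes) factors each diagonal block once and then applies the independent block-wise solves as the preconditioning step:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def build_block_diag_preconditioner(blocks):
    """Factor each diagonal block of A_D once; blocks is a list of small dense matrices."""
    return [lu_factor(B) for B in blocks]

def apply_block_diag_preconditioner(factors, sizes, r):
    """Apply A_D^{-1} r block by block; each block is solved independently."""
    z = np.empty_like(r)
    start = 0
    for f, m in zip(factors, sizes):
        z[start:start + m] = lu_solve(f, r[start:start + m])
        start += m
    return z

# Hypothetical example: three diagonal blocks of a complex system.
rng = np.random.default_rng(2)
sizes = [4, 6, 5]
blocks = [rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m)) + 3 * np.eye(m)
          for m in sizes]
factors = build_block_diag_preconditioner(blocks)
r = rng.standard_normal(sum(sizes)) + 1j * rng.standard_normal(sum(sizes))
z = apply_block_diag_preconditioner(factors, sizes, r)
```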

For the ILU preconditioning, we construct a preconditioner M^{-1} from the incomplete LU factorization of the near part matrix (A_D + A_N), and then apply the preconditioner M^{-1} to the linear system (1). That is,

    M^{-1} A x = M^{-1} b,
    (M^{-1})^T A^T x* = (M^{-1})^T b*.

It is well known that a simple preconditioner, such as the ILU(0), which uses the sparsity pattern of A, may not be robust for solving some large scale ill-conditioned problems. Different high accuracy preconditioners have been proposed for solving sparse matrices ����. Many of them are constructed using either an enlarged sparsity pattern or some threshold based drop tolerance to allow more fill-in in the construction phase. Some high accuracy ILU preconditioning strategies may not be able to control the amount of memory space needed a priori. Since memory cost is of vital importance in large scale electromagnetic scattering simulations, we resort to a particular incomplete LU factorization strategy which uses a dual dropping strategy (ILUT) to control both the computational cost and the memory cost.

The following algorithm for the ILUT factorization of a matrix A = (a_ij), of order N, is due to Saad and is extracted from ���� ���.

Algorithm 3.2 The ILUT Algorithm
1.  For i = 1, ..., N Do:
2.     w := a_{i,*}
3.     For k = 1, ..., i-1 and when w_k is nonzero Do:
4.        w_k := w_k / a_{kk}
5.        Apply a dropping rule to w_k
6.        If w_k is nonzero then
7.           w := w - w_k * u_{k,*}
8.        EndIf
9.     EndDo
10.    Apply a dropping rule to row w
11.    l_{i,j} := w_j for j = 1, ..., i-1
12.    u_{i,j} := w_j for j = i, ..., N
13.    w := 0
14. EndDo

In Algorithm 3.2, w is a work array and a_{i,*} denotes the ith (current) row of the matrix A. The dual dropping strategy of the ILUT is implemented using the two parameters τ and p. The entries of w that are small with respect to τ, relative to a certain norm of the current row, are dropped in line 5. In line 10, small entries are dropped again: a search algorithm is used to find the largest p entries in magnitude, these p largest entries are kept, and the others are dropped. The total storage is bounded by 2pN, where p is the given fill-in parameter and N is the number of unknowns. Here τ controls the computational cost, and p controls the memory cost. By judiciously choosing the two parameters τ and p, we may be able to construct an ILU preconditioner that is effective and does not use much memory space.
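The paper's ILUT is a Fortran routine following Saad's dual-threshold factorization. As a rough stand-in for experimentation, SciPy's SuperLU-based incomplete factorization exposes two similar knobs: a drop tolerance playing the role of τ and a fill factor limiting memory, loosely corresponding to p. The sketch below, with arbitrary parameter values and a random stand-in for the near part matrix, only illustrates how such a factorization would be formed and how its memory cost can be checked; it is not the authors' implementation.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spilu

# Sparse, diagonally dominant complex matrix standing in for the near part A_D + A_N.
n = 500
A_near = (sp.random(n, n, density=0.02, random_state=3) * (1 + 1j)
          + 10.0 * sp.identity(n, dtype=complex)).tocsc()

# drop_tol ~ tau (threshold dropping); fill_factor bounds the fill-in (memory control, ~ p).
ilu = spilu(A_near, drop_tol=1e-3, fill_factor=5)

# Sparsity ratio of the factors relative to the near part matrix (the paper's "ration").
nnz_factors = ilu.L.nnz + ilu.U.nnz
print("ration ~", nnz_factors / A_near.nnz)

# Applying the preconditioner is one forward and one backward triangular solve.
r = np.ones(n, dtype=complex)
z = ilu.solve(r)
```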

In large scale electromagnetic scattering problems, the memory space is of vital importance. Preconditioners using a lot of memory space are less useful. In the MLFMA implementation, the global matrix is not explicitly available; thus preconditioners that need access to the global matrix are not suitable. By using the ILUT, we anticipate using not much more than the amount of memory space needed to store the near part matrix (A_D + A_N). Thus the memory and floating point operation complexity of the MLFMA is maintained.

When incomplete LU factorization preconditioners are used in combination with an iterative process such as a Krylov subspace method, they tend to dictate the convergence behavior more than the iterative process itself ����. In general, high accuracy ILU type preconditioners with a large amount of fill-in are more robust, but cost more to construct, than their lower accuracy counterparts ����. Since an ILUT preconditioner may be reused several times by solving the same matrix with several different right hand sides in electromagnetic scattering simulations, it is possible to amortize the cost of constructing a high accuracy preconditioner. However, it is important to note that the memory space is usually a bottleneck in large scale electromagnetic scattering simulations.

The following algorithm shows the preconditioned BiCG method with a preconditioner M.

Algorithm 3.3 The Preconditioned BiCG Method
1.  Compute r_0 := b - A x_0. Choose r*_0 such that (r_0, r*_0) is nonzero.
2.  Compute z_0 := M^{-1} r_0 and z*_0 := (M^{-1})^T r*_0.
3.  Set p_0 := z_0, p*_0 := z*_0.
4.  For j = 0, 1, ..., until convergence Do:
5.     alpha_j := (r_j, z*_j) / (A p_j, p*_j)
6.     x_{j+1} := x_j + alpha_j p_j
7.     r_{j+1} := r_j - alpha_j A p_j
8.     r*_{j+1} := r*_j - alpha_j A^T p*_j
9.     z_{j+1} := M^{-1} r_{j+1}
10.    z*_{j+1} := (M^{-1})^T r*_{j+1}
11.    beta_j := (r_{j+1}, z*_{j+1}) / (r_j, z*_j)
12.    p_{j+1} := z_{j+1} + beta_j p_j
13.    p*_{j+1} := z*_{j+1} + beta_j p*_j
14. EndDo
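Putting the pieces together, the following sketch (again ours, in Python/SciPy, with made-up data and parameters) wires an incomplete factorization of a sparse "near part" matrix into BiCG as the preconditioner M; the solve with the conjugate transpose in rmatvec plays the role of (M^{-1})^T in Algorithm 3.3:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spilu, bicg, LinearOperator

# Stand-in system: a sparse dominant "near part" plus a small dense "far" perturbation.
n = 400
rng = np.random.default_rng(4)
A_near = (sp.random(n, n, density=0.02, random_state=4) * (1 + 1j)
          + 10.0 * sp.identity(n, dtype=complex)).tocsc()
A = A_near.toarray() + 0.005 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Incomplete factorization of the near part only; the global matrix is never needed.
ilu = spilu(A_near, drop_tol=1e-3, fill_factor=10)

# M approximates A^{-1}: matvec applies (LU)^{-1}, rmatvec its conjugate transpose solve.
M = LinearOperator((n, n), matvec=lambda r: ilu.solve(r),
                   rmatvec=lambda r: ilu.solve(r, trans='H'), dtype=complex)

x, info = bicg(A, b, M=M, maxiter=500)
print("converged:", info == 0,
      " relative residual:", np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```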

Few theoretical results are known about the convergence of BiCG ���. In practice, it is observed that the convergence behavior may be quite irregular, and the method may even break down. The breakdown situation due to the possible event that (r_j, z*_j) = 0 can be circumvented by the so-called look-ahead strategy ����. The other breakdown situation, (A p_j, p*_j) = 0, occurs when the LU decomposition fails, and can be repaired by using another decomposition. Sometimes, the breakdown or near-breakdown situations can be satisfactorily avoided by a restart at the iteration step immediately before the (near) breakdown step. Another possibility is to switch to a more robust (but possibly more expensive) method, like GMRES ��� ���.

For symmetric positive definite systems the BiCG method delivers the same results as the conjugate gradient (CG) method, but at twice the cost per iteration. For nonsymmetric matrices it has been shown that in phases of the process where there is significant reduction of the norm of the residual, the method is more or less comparable to full GMRES (in terms of the numbers of iterations) ����.

4 Numerical Experiments and Analysis

In this section, we present a number of numerical examples to demonstrate the efficiency of the ILU preconditioner with a dual dropping strategy (a fill-in parameter and a drop tolerance) ����, compared with the block diagonal preconditioner and with no preconditioner, in speeding up the BiCG iterations. All cases are tested on one processor of an HP Superdome cluster at the University of Kentucky. The processor has � GB of local memory and runs at ��� MHz. The code is written in the Fortran �� programming language and is run in single precision.

To demonstrate the performance of our preconditioned BiCG solver, we calculate the radar cross section (RCS) of different conducting geometries with and without coating. The geometries considered include plates, spheres, antennas, and antenna arrays. The mesh sizes for all the test structures are about one tenth of a wavelength. The information describing the structures and the generated matrices is given in Table 1. The second column, entitled "level", indicates the number of levels in the multilevel fast multipole algorithm. In most cases, three or four levels are used. For the largest case, P�A, we have used seven levels. In Table 1 we also give the number of nonzeros of the full matrix A, the block diagonal matrix A_D, and the near part matrix (A_D + A_N).

Numerical data from each test case are shown in Tables 2 through 10. Corresponding convergence histories and certain computed solutions for some selected cases are plotted in Figures 2 through 10. There are two stopping criteria implemented in the code. One is the error bound (residual norm), and the other is the limit on the number of iterations. The error bound is to reduce the �-norm residual by ����, and the limit on the maximum number of iterations is set to �����.

Here are the explanations of the notations shown in the data tables:

- prec : the preconditioner used with the iterative method
- NONE : no preconditioner
- BLOCK : the block diagonal preconditioner
- ILUT : the incomplete LU preconditioner with a dual dropping strategy
- τ : the threshold drop tolerance used in ILUT
- p : the fill-in parameter used in ILUT

- ratiof : the sparsity ratio of the number of nonzeros of the L and U factors with respect to the number of nonzeros of the full matrix A
- ration : the sparsity ratio of the number of nonzeros of the L and U factors with respect to the number of nonzeros of the near part matrix (A_D + A_N)
- LUcpu : the CPU time for performing the incomplete LU factorization
- itnum : the number of preconditioned BiCG iterations
- itcpu : the CPU time for the iteration phase
- sol : the iterative solution quality, compared with the exact solution from the Gauss elimination (LUD) solver, or with the solution from the converged cases when the Gauss elimination solver cannot be used due to the memory limit
- same : converged within the given stopping criteria
- diff : did not converge within the given stopping criteria

Table 1: Information about the matrices used in the experiments (λ is the wavelength in meters). For each case the table lists the number of MLFMA levels, the number of unknowns, the target size, and the number of nonzeros of the full matrix A, of the block diagonal matrix A_D, and of the near part matrix A_D + A_N. The nine test cases are: P�A (conducting plate), P�B (dielectric plate over a conducting plate), S�A (conducting sphere), S�B (conducting sphere with dielectric coating), P�A (microstrip antenna, no substrate), P�B (microstrip antenna), S�A (large conducting sphere), S�B (large sphere with dielectric coating), and P�A (antenna array).

We summarize our experimental results as follows:

1. For the class of problems we tested, the block diagonal preconditioner improves the BiCG convergence in only three (P�B, P�A, and P�B) of the nine cases. In the remaining cases, the block diagonal preconditioner actually hampers the BiCG convergence. It is our conclusion that the block diagonal preconditioner is not robust for solving this class of dense matrices arising from the combined hybrid integral formulation of the electromagnetic scattering problems.

2. For almost all problems, the ILUT preconditioner improves the BiCG convergence significantly and reduces the total CPU time substantially. Our tests show that the ILUT preconditioner is robust for solving this class of dense matrices. We also see that different choices of the ILUT parameters τ and p may affect the convergence rate of the preconditioned iterative solver.

3. It is worth pointing out that the memory cost of the ILUT preconditioner is reasonable. In most cases, the memory cost of the ILUT preconditioner is less than the cost of storing the near part matrix (A_D + A_N); see the data in the column entitled ration.

4. For most problems, the computation (construction) cost (LUcpu) of the ILUT preconditioner is low, compared to the preconditioned iterative solution cost (itcpu). The exceptions are found in the S�B and S�B cases. These two cases are related to modeling conducting spheres with dielectric coating.

5. For the converged cases, the approximate solutions computed from the preconditioned BiCG solver are comparable to those computed from the direct solver.

If a test case converges, we show the computed solution graphs from each method (the left hand side graphs of Figures 2 through 5). The solution graphs indicate the scattering pattern of the objects.

We compare the computed solution from each preconditioning strategy with the exact solution (LUD) of the linear system (1) (the right hand side graphs of Figures 2 through 5). If the number of unknowns is over ������, then we did not try to get the exact solution from the LUD decomposition, because it is too time consuming.

Figures 2 and 3 show that the solutions of these cases computed with all three preconditioning strategies are similar and approximate the exact solution. Figures 4 and 5 show that the solutions of the BiCG method without a preconditioner and with the ILUT preconditioner are similar, and both solutions approximate the exact solution; the block diagonal preconditioner does not converge within the given stopping criteria.

In the case of P�A, the number of ILUT iterations and the CPU time are relatively small compared to those of the iterative solver without a preconditioner and with the block diagonal preconditioner. The sparsity ratio ration of the ILUT preconditioner for most cases is less than 1 or very close to 1 (see Tables � through �� and Tables � and ��), except in the case of S�A (see Table �). Therefore, the ILUT preconditioner with a suitable drop tolerance parameter and a suitable fill-in parameter does not need a large amount of memory space.

Table 2: Numerical data of the P�A case (unknowns = ����).
prec, τ, p, LUcpu, ratiof, ration, itnum, itcpu, sol

Figure 2: Solution comparison (left, RCS in dBsw versus angle of observation) and convergence history (right, residual norm versus number of iterations) for solving the P�A case. ILUT parameters are τ = ���� and p = ��.

In all cases, the convergence behavior graphs (the right hand side graphs of Figures 2 through 5, and Figures 6 through 10) show that BiCG with the ILUT preconditioner converges very fast, in a small number of iterations compared to the block diagonal preconditioner and to no preconditioner. A high accuracy ILUT preconditioner with more fill-in is shown to be more robust and more efficient.

Accuracy and stability of the ILUT factorization. If the sparsity ratio ration of the ILUT preconditioner is very small, the preconditioned BiCG iteration may not converge within the given stopping criteria, or may diverge. There are two possible reasons for the ILU type preconditioners to perform poorly for solving certain types of matrices. Of course, if the ILU preconditioner is not accurate, such as when large τ values and small p values are used in the ILUT preconditioner, the preconditioning matrix M is a poor approximation to the near part matrix (A_D + A_N) (or to the coefficient matrix A). It is expected that the preconditioning effect of these poor quality ILU preconditioners is not good. Such poor quality ILU preconditioners, due to an insufficient amount of fill-in, can usually be improved by allowing more fill-in, i.e., by choosing a smaller value of τ and a larger value of p in the ILUT implementation.

Poor preconditioning quality can also come from an unstable factorization, which is usually associated with small pivots encountered in the ILU factorization ��� ���.

Table 3: Numerical data of the P�B case (unknowns = �����).
prec, τ, p, LUcpu, ratiof, ration, itnum, itcpu, sol

Figure 3: Solution comparison (left, RCS in dBsw versus angle of observation) and convergence history (right, residual norm versus number of iterations) for solving the P�B case. ILUT parameters are τ = ���� and p = ��.

Table 4: Numerical data of the S�A case (unknowns = ������).
prec, τ, p, LUcpu, ratiof, ration, itnum, itcpu, sol

Figure 4: Solution comparison (left, RCS in dBsw versus angle of observation) and convergence history (right, residual norm versus number of iterations) for solving the S�A case. ILUT parameters are τ = ���� and p = ���.

Table 5: Numerical data of the S�B case (unknowns = ������).
prec, τ, p, LUcpu, ratiof, ration, itnum, itcpu, sol

Figure 5: Solution comparison (left, RCS in dBsw versus angle of observation) and convergence history (right, residual norm versus number of iterations) for solving the S�B case. ILUT parameters are τ = ���� and p = ���.

Table 6: Numerical data of the P�A case (unknowns = �����).
prec, τ, p, LUcpu, ratiof, ration, itnum, itcpu, sol

Figure 6: Convergence history (residual norm versus number of iterations) for solving the P�A case. ILUT parameters are τ = ���� and p = ��.

Table 7: Numerical data of the P�B case (unknowns = ������).
prec, τ, p, LUcpu, ratiof, ration, itnum, itcpu, sol

Figure 7: Convergence history (residual norm versus number of iterations) for solving the P�B case. ILUT parameters are τ = ���� and p = ����.

Table 8: Numerical data of the S�A case (unknowns = �������).
prec, τ, p, LUcpu, ratiof, ration, itnum, itcpu, sol

Figure 8: Convergence history (residual norm versus number of iterations) for solving the S�A case. ILUT parameters are τ = ���� and p = ����.

Table 9: Numerical data of the S�B case (unknowns = �������).
prec, τ, p, LUcpu, ratiof, ration, itnum, itcpu, sol

Figure 9: Convergence history (residual norm versus number of iterations) for solving the S�B case. ILUT parameters are τ = ���� and p = ����.

Table 10: Numerical data of the P�A case (unknowns = ��������).
prec, τ, p, LUcpu, ratiof, ration, itnum, itcpu, sol

Figure 10: Convergence history (residual norm versus number of iterations) for solving the P�A case. ILUT parameters are τ = ���� and p = ��.

Small pivots usually produce L and U factors with large magnitude entries, which make the relatively small magnitude entries of the matrix A insignificant in finite precision computations. In sparse matrix computations, such problems can happen when performing an ILU factorization on indefinite matrices ��� ���. In constructing an ILUT preconditioner for a dense matrix, we use single precision arithmetic, due to the large amount of floating point operations and the storage requirement. If the condition number of the matrix M^{-1} = (LU)^{-1} is large, say, on the order of ���, computations in such a precision with the matrix M^{-1} are likely to be inaccurate. The preconditioned matrix M^{-1}A might be worse conditioned than the original matrix A. In this situation, the preconditioned BiCG method may take more iterations to converge, or may not converge at all. On the other hand, due to the implementation in which we stop the iteration when the (preconditioned) residual norm is reduced by ����, it is possible that the initial residual norm is huge. Convergence might then be declared even if the actual residual norm is still large. Such a convergence is called a false convergence.

To distinguish a false convergence from a true convergence, or to determine whether a preconditioner might be effective before a preconditioned iteration procedure is carried out, it is informative to estimate the condition number of the (LU)^{-1} matrix. Chow and Saad propose to estimate this number by calculating ||(LU)^{-1} e||, where e is the vector of all ones. This statistic is called condest in ���. We note that this statistic is also a lower bound for ||(LU)^{-1}||, and it indicates a relation between unstable triangular solutions and poorly conditioned L and U factors ��� ���. The computation of the quantity ||(LU)^{-1} e|| requires just one forward elimination and one back substitution with the L and U factors.
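A sketch of this check, under the same assumptions as the earlier snippets (our Python/SciPy stand-in for the Fortran ILUT, with an arbitrary drop tolerance and fill factor; the paper does not specify which norm is used, so the infinity norm below is our choice):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spilu

# Stand-in sparse complex "near part" matrix and its incomplete factorization.
n = 500
A_near = (sp.random(n, n, density=0.02, random_state=5) * (1 + 1j)
          + 10.0 * sp.identity(n, dtype=complex)).tocsc()
ilu = spilu(A_near, drop_tol=1e-3, fill_factor=5)

# condest = ||(LU)^{-1} e||: one forward/back substitution with the vector of all ones.
e = np.ones(n, dtype=complex)
condest = np.linalg.norm(ilu.solve(e), np.inf)
print("condest =", condest)   # a very large value flags an unstable or inaccurate factorization
```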

We calculate the condest numbers for the P�A and S�A cases, in order to investigate what might happen with the ILUT factorization in solving a complex valued dense matrix. We find that the drop tolerance value τ (in the range of our studies) does not affect the value of condest very much. The condest value mostly depends on the fill-in parameter p. As the fill-in parameter p decreases and the drop tolerance parameter τ increases, the value of condest increases a lot in certain situations. We also find that the values of condest are related to the solution patterns.

Table 11: Condest values of the ILUT preconditioner for solving the P�A case (unknowns = ���).
τ, p, condest, itnum, convergence, sol

Table 12: Condest values of the ILUT preconditioner for solving the S�A case (unknowns = �������).
τ, p, condest, itnum, convergence, sol

If condest is very large, the preconditioned BiCG method does not converge within the given stopping criteria, or diverges, and the computed solution is not good, as we see in Tables 11 and 12.

In Table 11, the ILUT with p = �� and τ = ���� converges in � iterations, but the computed solution is not good. This is an example of false convergence. We can determine the quality of this ILUT preconditioner without even starting the BiCG iteration process: the condest value of ��������� indicates that any single precision computation with (LU)^{-1} is likely to be inaccurate.

5 Conclusions

In the implementation of the MLFMA, the ILU preconditioner based on a dual dropping strategy (ILUT) has been shown to reduce the number of BiCG iterations dramatically, compared to the block diagonal preconditioner and to no preconditioner. Solving the large complex valued dense linear system arising from electromagnetic scattering using the BiCG accelerator with an ILU preconditioner maintains the computational complexity of the MLFMA, and consequently reduces the total CPU time.

According to the results from our numerical experiments, we can see that the ILUT preconditioner constructed from the near part matrix improves the computational efficiency in the sense of both memory and CPU time. The results show that the BiCG accelerator with the ILUT preconditioner is robust for solving three dimensional model cases from electromagnetic scattering simulations.

One advantage of our implementation is that we do not need access to the global matrix to construct our preconditioner (ideal for the MLFMA, in which only A_D and A_N are numerically available). This is in contrast to many existing preconditioning techniques, either in the incomplete LU factorization form or in the sparse approximate inverse form, in which a sparsified global matrix may be needed to construct a preconditioner ��� ��.

We also discussed the situations in which inaccurate or unstable ILUT preconditioners may be computed. We proposed to use a simple condest value to predict the quality of a constructed ILUT preconditioner before the preconditioned BiCG iteration is started. Our experimental results indicate that this strategy is practically useful in predicting false convergence and divergence of the preconditioned iterative solvers.

References

[1] G. Alléon, M. Benzi, and L. Giraud. Sparse approximate inverse preconditioning for dense linear systems arising in computational electromagnetics. Numer. Algorithms, ��:���-���, ����.

[2] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd Edition. SIAM, Philadelphia, ����.

[3] B. Carpentieri, I. S. Duff, and L. Giraud. Experiments with sparse preconditioning of dense problems from electromagnetic applications. Technical Report TR/PA/��/��, CERFACS, Toulouse, France, ����.

[4] B. Carpentieri, I. S. Duff, and M. Magolu monga Made. Sparse symmetric preconditioners for dense linear systems in electromagnetism. Technical Report TR/PA/��/��, CERFACS, Toulouse, France, ����.

[5] K. Chen. On a class of preconditioning methods for dense linear systems from boundary elements. SIAM J. Sci. Comput., ��:���-���, ����.

[6] W. C. Chew, J. M. Jin, E. Michielssen, and J. M. Song. Fast and Efficient Algorithms in Computational Electromagnetics. Artech House, Boston, ����.

[7] E. Chow and Y. Saad. Experimental study of ILU preconditioners for indefinite matrices. J. Comput. Appl. Math., ��:���-���, ����.

[8] R. Coifman, V. Rokhlin, and S. Wandzura. The fast multipole method for the wave equation: a pedestrian prescription. IEEE Antennas Propagat. Mag., ��:��-��, ����.

[9] E. Darve. The fast multipole method: numerical implementation. J. Comput. Phys., ���:���-���, ����.

[10] R. Freund and N. Nachtigal. QMR: a quasi-minimal residual method for non-Hermitian linear systems. Numer. Math., ��:���-���, ����.

[11] B. M. Kolundzija. Electromagnetic modeling of composite metallic and dielectric structures. IEEE Trans. Microwave Theory Tech., ��:����-����, ����.

[12] C. Lanczos. Solution of systems of linear equations by minimized iterations. J. Research National Bureau of Standards, ��:��-��, ����.

[13] C. C. Lu and W. C. Chew. A multilevel algorithm for solving a boundary integral equation of wave scattering. Microwave Opt. Tech. Lett., �:���-���, ����.

[14] C. C. Lu and W. C. Chew. A coupled surface-volume integral equation approach for the calculation of electromagnetic scattering from composite metallic and material targets. IEEE Trans. Antennas Propagat., ��:����-����, ����.

[15] B. N. Parlett, D. R. Taylor, and Z. A. Liu. A look-ahead Lanczos algorithm for unsymmetric matrices. Math. Comput., ��:���-���, ����.

[16] J. Rahola. Experiments on iterative methods and the fast multipole method in electromagnetic scattering calculations. Technical Report TR/PA/��/��, CERFACS, Toulouse, France, ����.

[17] S. M. Rao, D. R. Wilton, and A. W. Glisson. Electromagnetic scattering by surfaces of arbitrary shape. IEEE Trans. Antennas Propagat., AP-��:���-���, ����.

[18] V. Rokhlin. Rapid solution of integral equations of scattering theory in two dimensions. J. Comput. Phys., ��:���-���, ����.

[19] Y. Saad. ILUT: a dual threshold incomplete LU factorization. Numer. Linear Algebra Appl., �:���-���, ����.

[20] Y. Saad. Iterative Methods for Sparse Linear Systems. PWS Publishing Company, Boston, ����.

[21] K. Sertel and J. L. Volakis. Incomplete LU preconditioner for FMM implementation. Microwave Opt. Tech. Lett., ��:���-���, ����.

[22] T. K. Sarkar, S. M. Rao, and A. R. Djordjevic. Electromagnetic scattering and radiation from finite microstrip structures. IEEE Trans. Microwave Theory Tech., ��:����-����, ����.

[23] J. M. Song and W. C. Chew. Multilevel fast multipole algorithm for solving combined field integral equations of electromagnetic scattering. Microwave Opt. Tech. Lett., ��:��-��, ����.

[24] J. M. Song, C. C. Lu, and W. C. Chew. Multilevel fast multipole algorithm for electromagnetic scattering by large complex objects. IEEE Trans. Antennas Propagat., ��:����-����, ����.

[25] T. Vaupel and V. Hansen. Electrodynamic analysis of combined microstrip and coplanar/slotline structures with �-D components based on a surface/volume integral equation approach. IEEE Trans. Microwave Theory Tech., ��:����-����, ����.

[26] C. F. Wang and J. M. Jin. A fast full-wave analysis of scattering and radiation from large finite arrays of microstrip antennas. IEEE Trans. Antennas Propagat., ��:���-���, ����.

[27] J. Zhang. Sparse approximate inverse and multilevel block ILU preconditioning techniques for general sparse matrices. Appl. Numer. Math., ��:���-���, ����.

[28] J. Zhang. A multilevel dual reordering strategy for robust incomplete LU factorization of indefinite matrices. SIAM J. Matrix Anal. Appl., ��:���-���, ����.