Incomplete LU Preconditioning for Large Scale Dense
Complex Linear Systems from Electromagnetic Wave
Scattering Problems

Jeonghwa Lee, Jun Zhang, and Cai-Cheng Lu

Laboratory for High Performance Scientific Computing and Computer Simulation,
Department of Computer Science, University of Kentucky,
Lexington, KY, USA

Department of Electrical and Computer Engineering,
University of Kentucky,
Lexington, KY, USA

April …
Abstract

We consider preconditioned iterative solution of the linear system Ax = b, where the coefficient matrix A is a large scale dense complex valued matrix arising from discretizing the integral equation of electromagnetic scattering. For some scattering structures this matrix can be poorly conditioned. The main purpose of this study is to evaluate the efficiency of a class of incomplete LU (ILU) factorization preconditioners for solving this type of matrix. We solve the electromagnetic wave equations using the BiCG method with an ILU preconditioner in the context of a multilevel fast multipole algorithm (MLFMA). The novelty of this work is that the ILU preconditioner is constructed using the near part block diagonal submatrix generated from the MLFMA. Experimental results show that the ILU preconditioner reduces the number of BiCG iterations substantially, compared to the block diagonal preconditioner. The preconditioned iteration scheme also maintains the computational complexity of the MLFMA, and consequently reduces the total CPU time.

Key words: Krylov subspace methods, ILU preconditioning, multilevel fast multipole algorithm, electromagnetic wave equation

Mathematics Subject Classification: …
Technical Report No. …, Department of Computer Science, University of Kentucky, Lexington, KY, USA.

E-mail: leejh@engr.uky.edu. URL: http://www.csr.uky.edu/~leejh. This author's research work was supported in part by the U.S. National Science Foundation under grant CCR-….

E-mail: jzhang@cs.uky.edu. URL: http://www.cs.uky.edu/~jzhang. This author's research work was supported in part by the U.S. National Science Foundation under grants CCR-…, CCR-…, CCR-…, and ACI-…, in part by the U.S. Department of Energy under grant DE-FG…, in part by the Japanese Research Organization for Information Science and Technology, and in part by the University of Kentucky Research Committee.

E-mail: cclu@engr.uky.edu. URL: http://www.engr.uky.edu/~cclu. This author's research work was supported in part by the U.S. National Science Foundation under grant ECS-…, and in part by the U.S. Office of Naval Research under grant N….
1 Introduction
The electromagnetic wave scattering by three dimensional arbitrarily shaped dielectric and conducting objects can be obtained by finding the solution of an integral equation. The solution to electromagnetic wave interaction with material coated objects has applications in radar cross section prediction for coated targets, printed circuit, and microstrip antenna analysis […]. This problem can be studied using integral equation solvers. For example, to calculate radar scattering by a conducting target coated by dielectric material, the hybrid surface and volume integral equations can be used […]. In this case, a surface integral equation is formed for the conducting surfaces and a volume integral equation is formed for the dielectrics. The integral equations are discretized into a matrix equation by the method of moments (MoM) […]. With this procedure, we obtain a linear system of the form

    Ax = b,                                                (1)

where the coefficient matrix A is a large scale, dense, and complex valued matrix for electrically large targets.
The matrix equation (1) can be solved by numerical matrix equation solvers. Typical matrix solvers can be categorized into two classes. The first class is the direct solution methods, of which the Gauss elimination method is representative. The second class is the iterative solution methods, of which the Krylov subspace methods are considered to be the most effective ones currently available […]. The biconjugate gradient (BiCG) method is one of the many Krylov subspace methods […].

The direct solution methods are very expensive in both memory space and CPU time for large size problems. For the Gauss elimination method the floating point operation count is on the order of O(N^3), where N is the number of unknowns (columns of the matrix A). On the contrary, the complexity of the BiCG type iterative methods is on the order of O(N^2) if convergence is achieved in a small number of iterations. The Krylov subspace methods such as BiCG require the computation of some matrix-vector product operations at each iteration, which account for the major computational cost of this class of methods. The fast multipole method (FMM) speeds up the matrix-vector product operations when it is used to solve the matrix equation iteratively. The FMM approach reduces the computational complexity of a matrix-vector product from O(N^2) to O(N^1.5) […]. With the multilevel fast multipole algorithm (MLFMA) the computational complexity is further reduced to O(N log N) […].
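To put these growth rates side by side, a rough sketch (constants are omitted, since they depend on the implementation; this is not a cost model of any particular code):

```python
# Growth of the matrix-vector product cost for a dense product (O(N^2)), the
# single-level FMM (O(N^1.5)), and the MLFMA (O(N log N)). Constant factors are
# omitted, so only the relative growth is meaningful.
import math

def dense_ops(n):
    return float(n * n)          # direct dense matrix-vector product

def fmm_ops(n):
    return n ** 1.5              # single-level fast multipole method

def mlfma_ops(n):
    return n * math.log2(n)      # multilevel fast multipole algorithm

for n in (10_000, 100_000, 1_000_000):
    print(f"N={n:>9}: dense={dense_ops(n):.1e}, "
          f"FMM={fmm_ops(n):.1e}, MLFMA={mlfma_ops(n):.1e}")
```

Already at N = 100,000 unknowns the gap between N^2 and N log N is roughly three orders of magnitude, which is why the MLFMA matters for electrically large targets.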
It should be noted that a matrix problem involving N unknowns may be solved in C N_iter N_Ax floating point operations, where C is a constant depending on the implementation of a particular iterative method […], N_iter is the number of iterations needed to reach the prespecified convergence criterion, and N_Ax is the number of floating point operations needed for each matrix-vector multiplication. By using the MLFMA, the complexity of the matrix-vector multiplication operations is reduced to O(N log N). For many realistic problems, N_iter depends on both the iterative solver and the target properties (shape and material). For example, a problem with an open-ended cavity needs many more iterations than one with a solid conducting box of the same size. Since N_iter is a proportional factor in the CPU count, to further reduce the total CPU time, it is necessary to reduce the number of iterations of the iterative solvers. Hence preconditioning techniques, which may speed up the convergence rate of the Krylov subspace methods (accelerators), are needed in this application.
There are various preconditioning techniques in existence; most of them are developed in the context of iteratively solving large sparse matrices […]. Until recently, preconditioning techniques for solving dense matrices were less extensively studied […] compared to those for the sparse matrices. In the electromagnetic scattering simulation field, the diagonal and block diagonal preconditioners have been considered by a few authors […] in the context of the MLFMA. The individual blocks (of the block diagonal matrix) are numerically available and are easy to invert. More recently developed strategies are related to the sparsity pattern based incomplete LU factorization preconditioning or sparse approximate inverse preconditioning […]. Some of them are based on the pattern of a sparsified coefficient matrix […]. Others are constructed using the geometric or topological information of the underlying problems […]. Most of these preconditioning techniques, such as the ILU(0), rely on a fixed sparsity pattern, obtained from the sparsified coefficient matrix by dropping small magnitude entries. The ILU(0) preconditioner, which uses a static sparsity pattern and is constructed from a sparsified matrix, has been shown to be ineffective in some electromagnetic scattering problems […].
In the MLFMA implementation, the global coefficient matrix A is not explicitly available. Thus the strategy of extracting a sparsity pattern by dropping small magnitude entries from the dense coefficient matrix, to form a sparse matrix as a basis for constructing a preconditioner, is not feasible. We propose to use an incomplete lower-upper (ILU) triangular factorization with a dual dropping strategy […] to construct a preconditioner from the near part matrix of the coefficient matrix in the MLFMA implementation. The near part matrix is naturally available in the MLFMA implementation. By not using a static (prespecified) sparsity pattern, we hope to capture the most important (large magnitude) entries in the incomplete LU factorization, while not consuming a large amount of memory. This is achieved by dynamically dropping small magnitude entries during the computation and by allowing only a fixed number of nonzeros to be kept in each row of the L and U factors.
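As an illustration of such a dual dropping rule on a single working row (our sketch; the function names and the choice of row norm are assumptions, not the paper's code):

```python
# Sketch of a dual dropping strategy: entries of a working row are dropped when
# small relative to a row norm (tolerance tau), and only the p largest-magnitude
# entries are kept. Names and the mean-magnitude norm are illustrative.
import heapq

def dual_drop(row, tau, p):
    """row: dict {column: value}. Returns the pruned row as a dict."""
    norm = sum(abs(v) for v in row.values()) / max(len(row), 1)  # mean magnitude
    # first dropping rule: threshold relative to the current row
    kept = {j: v for j, v in row.items() if abs(v) >= tau * norm}
    # second dropping rule: keep only the p largest entries in magnitude
    largest = heapq.nlargest(p, kept.items(), key=lambda kv: abs(kv[1]))
    return dict(largest)

pruned = dual_drop({0: 4.0, 1: 0.001, 2: 2.0, 3: 1.0}, tau=0.01, p=2)
# the tiny entry in column 1 is thresholded out, then only the two
# largest survivors (columns 0 and 2) are kept
```

The tolerance bounds the work spent on negligible fill-in, while the cap of p entries per row bounds the memory a priori, which is the point of the dual strategy.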
In our experimental study, we use the BiCG method as the iterative solver, coupled with three preconditioning strategies, to solve a few study cases of representative electromagnetic scattering problems. We mainly compare the preconditioning effect of the different strategies. The preconditioning strategies that we consider are (a) no preconditioner (equivalent to using the identity matrix as the preconditioner), (b) the block diagonal preconditioner, and (c) the incomplete LU triangular factorization preconditioner. We show by numerical simulations that the iterative methods with the incomplete LU (ILU) preconditioner reduce the iteration number significantly, compared to the cases with the block diagonal preconditioner as well as without a preconditioner.
This paper is organized as follows. Section 2 gives a concise introduction to the hybrid integral equation approach in electromagnetic scattering and the multilevel fast multipole algorithm. In Section 3, we outline the BiCG iterative method and discuss in detail the ILUT preconditioner. Extensive numerical experiments with a few electromagnetic scattering problems are conducted in Section 4, in which we also explain and analyze the data obtained in our experiments. Some concluding remarks are given in Section 5.
2 Discretization of Integral Equation and Fast Multipole Method
The hybrid integral equation approach combines the volume integral equation (VIE) and the surface integral equation (SIE) to model the scattering and radiation by mixed dielectric and conducting structures […]. For example, when a radome is applied to an antenna, the combined system consists of both dielectrics and conductors. Hence, the hybrid surface-volume integral equation is ideal for this problem […]. The volume integral equation is applied to the material region and the surface integral equation is enforced over the conducting surface. The integral equations can be formally written as follows:

    { L_S(r, r') · J_S(r') + L_V(r, r') · J_V(r') }_tan = -E^inc_tan(r),   r ∈ S,

    E - L_S(r, r') · J_S(r') - L_V(r, r') · J_V(r') = E^inc(r),   r ∈ V,

where E^inc stands for the excitation field produced by the incident radar wave, the subscript "tan" stands for taking the tangential component of the vector it applies to, and L_α (α = S, V) is an integral operator that maps the source J_α to the electric field E(r); it is defined as

    L_α(r, r') · J_α(r') = iωμ₀ ∫_α ( I̅ + ∇∇/k₀² ) e^{ik₀|r - r'|} / (4π|r - r'|) · J_α(r') dr'.

It should be pointed out that E is related to J_V in the above integral equations. This results in a very general model, as all the volume and surface regions are modeled properly. The advantage of this approach is that in the coated object scattering problems the coating material can be inhomogeneous, and in the printed circuit and microstrip antenna simulation problems the substrate can be of finite size. The simplicity of the Green's function in both the volume integral equation and the surface integral equation has an important impact on the implementation of the fast solvers. However, the additional cost here is the increase in the number of unknowns, since the volume occupied by the dielectric material is meshed. This results in a larger memory requirement and a longer solution time in solving the corresponding matrix equation. But this deficiency can be overcome by applying fast integral equation solvers such as the multilevel fast multipole algorithm (MLFMA) […].
We follow the general steps of the method of moments to discretize the hybrid volume-surface integral equations. The hybrid integral equations are converted into a matrix equation that is formally written as

    [ Z_SS  Z_SV ] [ a_S ]   [ U_S ]
    [ Z_VS  Z_VV ] [ a_V ] = [ U_V ],                      (2)

where a_S and a_V stand for the vectors of the expansion coefficients for the surface current and the volume current, respectively […]. The coefficient matrix arising from the discretized hybrid integral equations is nonsymmetric. Once the matrix equation (2) is solved by numerical matrix equation solvers, the expansion coefficients a_S and a_V can be used to calculate the scattered field and radar cross section. In antenna analysis problems the coefficients can be used to retrieve the antenna's input impedance and calculate the antenna's radiation pattern. In the following, we use A to denote the coefficient matrix in Equation (2), and x = (a_S, a_V)^T for simplicity.
The basis function is an elementary current that has local support. To solve the matrix equation by an iterative method, matrix-vector multiplications are needed at each iteration. Physically, a matrix-vector multiplication corresponds to one cycle of interactions between basis functions. The basic idea of the FMM is to convert the interaction of element-to-element to the interaction of group-to-group. Here a group includes the elements residing in a spatial box. The mathematical foundation of the FMM is the addition theorem for the free-space scalar Green's function […]. Using the addition theorem, the matrix-vector product Ax can be written as

    Ax = (A_D + A_N) x + V_f T V_s x,                      (3)

where V_f, T, and V_s are sparse matrices. In fact, the dense matrix A can be structurally divided into three parts: A_D, A_N, and A_F = V_f T V_s. A_D is the block diagonal part of A, A_N is the block near-diagonal part of A, and A_F is the far part of A. Here the terms "near" and "far" refer to the distance between two groups of elements. Figure 1 illustrates the data structure of a partitioned dense matrix A from electromagnetic scattering. The blank squares are A_D, the biased squares are A_N, and A_F is scattered in the rest of the area.
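The near/far split can be illustrated with a toy grouping of boxes by their centers; the box size, the distance rule, and the function names here are our illustration, not the paper's code:

```python
# Illustrative partition of basis-function groups into "near" and "far" pairs by
# the distance between their box centers, mirroring the A_D / A_N / A_F split.
# The unit box size and the "touching boxes are near" rule are toy assumptions.

def classify_pairs(centers, box_size):
    near, far = [], []
    for i, ci in enumerate(centers):
        for j, cj in enumerate(centers):
            dist = max(abs(a - b) for a, b in zip(ci, cj))  # Chebyshev distance
            if i == j:
                near.append((i, j, "diagonal"))   # contributes to A_D
            elif dist <= box_size:                # touching boxes
                near.append((i, j, "near"))       # contributes to A_N
            else:
                far.append((i, j))                # handled via V_f T V_s
    return near, far

centers = [(0, 0, 0), (1, 0, 0), (3, 0, 0)]       # unit boxes on a line
near, far = classify_pairs(centers, box_size=1)
# groups 0 and 1 are adjacent (near); group 2 is separated (far)
```

Only the near pairs produce explicitly stored matrix entries; the far pairs are never assembled element by element.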
Figure 1: An illustration of the data structure of a dense matrix A from electromagnetic scattering.
To implement the FMM, the spatial region of all the basis functions is subdivided into small cubic boxes. Then the scattered fields of the different scattering centers within a group are translated into a single center called the group center. This is the aggregation process. Hence, the number of scattering centers is reduced. If two groups are separated far enough, the interactions between them are carried out through translation. After the interactions among the groups are done, the field received by a group is redistributed to the group members. This is the disaggregation process […]. In the FMM, the matrix elements are calculated differently depending on the distance between a testing function and a basis function. For near-neighbor matrix elements (those in A_D and A_N), the calculation remains the same as in the MoM procedure. However, the elements in A_F are not explicitly computed and stored. Hence they are not numerically available in the FMM. It can be shown that with optimum grouping, the numbers of nonzero elements in the sparse matrices in Equation (3) are all on the order of N^1.5, and hence the operation count to perform Ax is O(N^1.5).

If the above process is implemented in multilevel form, then the total cost (in memory and CPU time) can be reduced further to be proportional to N log N for one matrix-vector multiplication. Due to this reduced computational complexity, the memory and CPU time savings can be orders of magnitude compared with the direct dense matrix multiplication methods of O(N^2) if N is very large […].
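The structure of the split product in Equation (3) can be sketched numerically. The matrices below are random stand-ins with made-up sizes, not electromagnetic operators; the dense A is built only to check the identity, whereas in the MLFMA the far part is never formed:

```python
# Minimal sketch of the split matrix-vector product of Equation (3):
# Ax = (A_D + A_N) x + V_f T V_s x.
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 20                            # unknowns, expansion size (toy values)
A_near = np.diag(rng.standard_normal(n))  # stands in for the sparse A_D + A_N
V_s = rng.standard_normal((m, n))         # aggregation to group centers
T = rng.standard_normal((m, m))           # translation between groups
V_f = rng.standard_normal((n, m))         # disaggregation to group members

def fast_matvec(x):
    # far part applied in factored form: O(n*m) work instead of forming A_F
    return A_near @ x + V_f @ (T @ (V_s @ x))

x = rng.standard_normal(n)
A_full = A_near + V_f @ T @ V_s           # dense A, for verification only
assert np.allclose(fast_matvec(x), A_full @ x)
```

The point of the factored form is that V_s, T, and V_f are small or sparse, so the product costs far less than the O(n^2) dense multiplication it replaces.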
3 Iterative Methods and Preconditioners
The most effective iterative solvers currently available are the class of Krylov subspace methods […]. The biconjugate gradient method (BiCG) is one of them, especially aimed at solving nonsymmetric and unstructured linear systems […]. To speed up the convergence rate of the iteration processes, Krylov subspace iterations are almost always combined with a suitable preconditioning technique. Preconditioned Krylov subspace methods have become a popular choice of iterative solution techniques for solving large sparse and dense matrices. Generally speaking, the performance of a Krylov subspace solver will improve when combined with a suitable ILU preconditioner. Preconditioners based on the incomplete LU factorization of the coefficient matrix are usually inherently sequential, both in construction and in application.

There have been a few preconditioning techniques recently developed for solving dense linear systems arising from discretized integral equations in electromagnetic applications and in other applications […]. Among these, the diagonal and block diagonal preconditioners are easy to implement and have been shown to be effective for certain types of integral equations arising from electromagnetic scattering problems […]. The ILU(0) preconditioner and preconditioners based on various sparse approximate inverse techniques have recently been tested for solving similar problems […]. The ILU(0) preconditioner constructed from a sparsified coefficient matrix has been shown to be ineffective for solving a certain class of electromagnetic scattering problems […]. On the other hand, some sparse approximate inverse techniques need access to the full coefficient matrix (to construct a sparsified matrix), which is not available in the FMM. Other sparse approximate inverse techniques have exploited a (geometric or topological) concept similar to our near part matrix strategy […]. In many cases, the sparse approximate inverse techniques are far more expensive to construct than the incomplete LU type preconditioning techniques. This is the case when the inherent parallelism of the sparse approximate inverse preconditioning strategies is not exploited.
We propose to solve the dense linear system (1) by using the BiCG method with an ILU preconditioner, with a block diagonal preconditioner, and without a preconditioner. The BiCG algorithm […] is a projection process onto the Krylov subspace

    K_m = span{v_1, Av_1, ..., A^{m-1} v_1},

orthogonally to another Krylov subspace

    L_m = span{w_1, A^T w_1, ..., (A^T)^{m-1} w_1},

taking v_1 = r_0 / ||r_0||_2, where r_0 is the initial residual vector. Here A^T is the complex conjugate transpose of the matrix A. The vector w_1 can be arbitrarily chosen, provided (v_1, w_1) ≠ 0, but it is often chosen to be equal to v_1. If there is a dual system A^T x* = b* to solve with A^T, then w_1 is obtained by scaling the initial residual b* - A^T x*_0.
For completeness, we give the algorithm of the BiCG method from […].

Algorithm 3.1 The BiCG Method.
1.  Compute r_0 := b - Ax_0. Choose r*_0 such that (r_0, r*_0) ≠ 0.
2.  Set p_0 := r_0, p*_0 := r*_0.
3.  For j = 0, 1, ..., until convergence Do:
4.    α_j := (r_j, r*_j) / (Ap_j, p*_j)
5.    x_{j+1} := x_j + α_j p_j
6.    r_{j+1} := r_j - α_j A p_j
7.    r*_{j+1} := r*_j - α_j A^T p*_j
8.    β_j := (r_{j+1}, r*_{j+1}) / (r_j, r*_j)
9.    p_{j+1} := r_{j+1} + β_j p_j
10.   p*_{j+1} := r*_{j+1} + β_j p*_j
11. EndDo
In Algorithm 3.1, the initial guess is x_0. The stopping criterion is usually set to measure the reduction rate or the magnitude of a certain residual norm. A limit on the maximum number of iterations is also used in almost all implementations of iterative methods. Implicitly, the algorithm solves not only the original system Ax = b but also a dual linear system A^T x* = b* with A^T […]. At each iteration, BiCG needs two matrix-vector products, one with A and one with A^T.
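The algorithm translates almost line for line into code. The sketch below is ours (in Python rather than the paper's Fortran) and uses one common complex-arithmetic convention, with the conjugated inner product and the conjugate transpose in the dual recurrence; it is illustrative, not the authors' implementation:

```python
# Unpreconditioned BiCG in the spirit of Algorithm 3.1. No look-ahead or restart
# safeguards are included, so it can break down on hard problems.
import numpy as np

def bicg(A, b, tol=1e-10, maxiter=500):
    A = np.asarray(A, dtype=complex)
    b = np.asarray(b, dtype=complex)
    x = np.zeros_like(b)
    r = b - A @ x
    rs = r.copy()                        # shadow residual r*, chosen as r_0
    p, ps = r.copy(), rs.copy()
    rho = np.vdot(rs, r)                 # (r_j, r*_j), conjugated inner product
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rho / np.vdot(ps, Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs = rs - np.conj(alpha) * (A.conj().T @ ps)   # dual step with A^H
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        rho_new = np.vdot(rs, r)
        beta = rho_new / rho
        p = r + beta * p
        ps = rs + np.conj(beta) * ps
        rho = rho_new
    return x
```

For a small well-conditioned system, `bicg(A, b)` reproduces the direct solution to within the tolerance; the two products with A and A^H per iteration are exactly the cost the text describes.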
It is known that an iterative method such as BiCG may not converge (or converges very slowly) when solving large scale ill-conditioned dense matrices. In practical applications, preconditioning techniques should be used with the BiCG method to reduce the number of iterations. We will evaluate two preconditioners, the ILU preconditioner with a dual dropping strategy (a fill-in parameter and a drop tolerance) […] and the block diagonal preconditioner, using the trivial identity matrix preconditioner as comparison. The block diagonal preconditioner for the multilevel FMM is implemented in […]. Due to space limits, we will not be able to evaluate other types of preconditioning techniques. The efficiency and examples of the ILU(0) and some sparse approximate inverse preconditioners in the FMM implementation can be found in […].
In the block diagonal preconditioning, we construct a preconditioner A_D^{-1} from the block diagonal matrix A_D, and then apply the preconditioner A_D^{-1} to the linear system (1). That is,

    A_D^{-1} A x = A_D^{-1} b,
    (A_D^{-1})^T A^T x* = (A_D^{-1})^T b*.

Since A_D is a block diagonal matrix, each individual block can be inverted independently.
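A sketch of this block-by-block inversion and application (the block sizes and values are illustrative, and real blocks would be complex valued):

```python
# Factor each diagonal block of A_D once, then apply A_D^{-1} blockwise.
import numpy as np

def factor_block_diagonal(blocks):
    """blocks: list of small dense matrices on the diagonal of A_D."""
    return [np.linalg.inv(B) for B in blocks]   # small blocks: invert directly

def apply_block_diagonal(inv_blocks, x):
    """Computes A_D^{-1} x block by block."""
    out, start = np.empty_like(x), 0
    for Binv in inv_blocks:
        k = Binv.shape[0]
        out[start:start + k] = Binv @ x[start:start + k]
        start += k
    return out

blocks = [np.array([[2.0, 0.0], [0.0, 4.0]]), np.array([[5.0]])]
inv_blocks = factor_block_diagonal(blocks)
y = apply_block_diagonal(inv_blocks, np.array([2.0, 4.0, 10.0]))
# y == [1.0, 1.0, 2.0]
```

Because the blocks are independent, the factorization and the application both parallelize trivially, which is part of the appeal of this preconditioner despite its weaker convergence behavior.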
In the ILU preconditioning, we construct a preconditioner M^{-1} from the incomplete LU factorization of the near part matrix (A_D + A_N), and then apply the preconditioner M^{-1} to the linear system (1). That is,

    M^{-1} A x = M^{-1} b,
    (M^{-1})^T A^T x* = (M^{-1})^T b*.
It is well known that a simple preconditioner, such as the ILU(0), which uses the sparsity pattern of A, may not be robust for solving some large scale ill-conditioned problems. Different high accuracy preconditioners have been proposed for solving sparse matrices […]. Many of them are constructed using either an enlarged sparsity pattern or some threshold based drop tolerance to allow more fill-in in the construction phase. Some high accuracy ILU preconditioning strategies may not be able to control the amount of memory space needed a priori. Since memory cost is of vital importance in large scale electromagnetic scattering simulations, we resort to a particular incomplete LU factorization strategy which uses a dual dropping strategy (ILUT) to control both the computational cost and the memory cost.
The following algorithm of the ILUT factorization of a matrix A = (a_ij)_{N×N} is due to Saad and is extracted from […].

Algorithm 3.2 The ILUT Algorithm.
1.  For i = 1, ..., N Do:
2.    w := a_{i*}
3.    For k = 1, ..., i-1 and when w_k ≠ 0 Do:
4.      w_k := w_k / a_{kk}
5.      Apply a dropping rule to w_k
6.      If w_k ≠ 0 then
7.        w := w - w_k * u_{k*}
8.      EndIf
9.    EndDo
10.   Apply a dropping rule to row w
11.   l_{i,j} := w_j for j = 1, ..., i-1
12.   u_{i,j} := w_j for j = i, ..., N
13.   w := 0
14. EndDo
In Algorithm 3.2, w is a work array and a_{i*} denotes the ith (current) row of the matrix A. The dual dropping strategy of the ILUT is implemented using the two parameters τ and p. The small entries of w_k (with respect to τ and relative to a certain norm of the current row) are dropped in line 5. In line 10, small entries are dropped again; a search algorithm is used to find the largest p entries in magnitude, these p largest entries are kept, and the others are dropped. The total storage is bounded by 2pN, where p is the given fill-in parameter and N is the number of unknowns. Here τ controls the computational cost, and p controls the memory cost. By judiciously choosing the two parameters τ and p, we may be able to construct an ILU preconditioner that is effective and does not use much memory space.
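The algorithm above can be sketched compactly. This is our dense-storage illustration of ILUT(τ, p), not Saad's sparse implementation: it assumes nonzero pivots (no pivoting), stores the factors densely for brevity, and uses the 2-norm of the row as the dropping reference:

```python
# ILUT(tau, p) in the spirit of Algorithm 3.2. A real code keeps L and U in
# sparse row storage; dense numpy arrays keep the sketch short.
import numpy as np

def ilut(A, tau, p):
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L, U = np.eye(n), np.zeros_like(A)
    for i in range(n):
        w = A[i].copy()                        # current working row
        norm = np.linalg.norm(A[i])
        for k in range(i):
            if w[k] == 0.0:
                continue
            w[k] = w[k] / U[k, k]
            if abs(w[k]) < tau * norm:         # dropping rule on w_k (line 5)
                w[k] = 0.0
            else:
                w[k + 1:] -= w[k] * U[k, k + 1:]   # w := w - w_k * u_k
        small = np.abs(w) < tau * norm         # dropping rule on the row (line 10)
        small[i] = False                       # never drop the diagonal
        w[small] = 0.0
        for lo, hi in ((0, i), (i + 1, n)):    # keep p largest in L part, U part
            idx = np.nonzero(w[lo:hi])[0] + lo
            if len(idx) > p:
                w[idx[np.argsort(np.abs(w[idx]))[:-p]]] = 0.0
        L[i, :i], U[i, i:] = w[:i], w[i:]
    return L, U

L, U = ilut(np.array([[4.0, 1.0, 0.0],
                      [1.0, 4.0, 1.0],
                      [0.0, 1.0, 4.0]]), tau=0.0, p=3)
# with tau = 0 and ample fill-in, ILUT reproduces the exact LU factors
```

The per-row cap of p nonzeros in each of the L and U parts is what yields the 2pN storage bound stated above.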
In large scale electromagnetic scattering problems, the memory space is of vital importance. Preconditioners using a lot of memory space are less useful. In the MLFMA implementation, the global matrix is not explicitly available; thus preconditioners that need access to the global matrix are not suitable. By using the ILUT, we expect to use not much more than the amount of memory space used to store the near part matrix (A_D + A_N). Thus the memory and floating point operation complexity of the MLFMA is maintained.
When incomplete LU factorization preconditioners are used in combination with an iterative process such as a Krylov subspace method, they tend to dictate the convergence behavior more than the iterative process itself […]. In general, high accuracy ILU type preconditioners with a large amount of fill-in are more robust, but cost more to construct, than their lower accuracy counterparts […]. Since an ILUT preconditioner may be reused several times when solving the same matrix with several different right hand sides in electromagnetic scattering simulations, it is possible to amortize the cost of constructing a high accuracy preconditioner. However, it is important to note that the memory space is usually a bottleneck in large scale electromagnetic scattering simulations.
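For readers who want to experiment with this amortization, SciPy's `spilu` provides an ILUT-style incomplete factorization with a drop tolerance and a fill bound; the sketch below factors once and reuses the preconditioner for several right hand sides. The matrix and all parameter values are illustrative, not the paper's test cases:

```python
# Factor an ILU preconditioner once with spilu, then reuse it across several
# BiCG solves with different right hand sides.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, bicg, spilu

n = 200
A = sp.diags([-1.0, 4.0, -1.2], offsets=[-1, 0, 1], shape=(n, n)).tocsc()

ilu = spilu(A, drop_tol=1e-4, fill_factor=10.0)     # factor once ...
M = LinearOperator(
    A.shape,
    matvec=ilu.solve,                               # apply M^{-1}
    rmatvec=lambda v: ilu.solve(v, trans='T'),      # apply M^{-T} (real case)
)

for seed in range(3):                               # ... reuse for several rhs
    b = np.random.default_rng(seed).standard_normal(n)
    x, info = bicg(A, b, M=M)
    print(seed, info, np.linalg.norm(A @ x - b))
```

BiCG also needs the transposed preconditioner solve for its dual recurrence, which is why the `rmatvec` is supplied alongside `matvec`.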
The following algorithm shows the preconditioned BiCG method with a preconditioner M.

Algorithm 3.3 The Preconditioned BiCG Method.
1.  Compute r_0 := b - Ax_0. Choose r*_0 such that (r_0, r*_0) ≠ 0.
2.  Compute z_0 := M^{-1} r_0, z*_0 := (M^{-1})^T r*_0.
3.  Set p_0 := z_0, p*_0 := z*_0.
4.  For j = 0, 1, ..., until convergence Do:
5.    α_j := (r_j, z*_j) / (Ap_j, p*_j)
6.    x_{j+1} := x_j + α_j p_j
7.    r_{j+1} := r_j - α_j A p_j
8.    r*_{j+1} := r*_j - α_j A^T p*_j
9.    z_{j+1} := M^{-1} r_{j+1}
10.   z*_{j+1} := (M^{-1})^T r*_{j+1}
11.   β_j := (r_{j+1}, z*_{j+1}) / (r_j, z*_j)
12.   p_{j+1} := z_{j+1} + β_j p_j
13.   p*_{j+1} := z*_{j+1} + β_j p*_j
14. EndDo
Few theoretical results are known about the convergence of BiCG […]. In practice, it is observed that the convergence behavior may be quite irregular, and the method may even break down. The breakdown situation due to the possible event that (r_j, z*_j) ≈ 0 can be circumvented by the so-called look-ahead strategy […]. The other breakdown situation, (Ap_j, p*_j) ≈ 0, occurs when the LU decomposition fails, and can be repaired by using another decomposition. Sometimes, the breakdown or near-breakdown situation can be satisfactorily avoided by a restart at the iteration step immediately before the (near) breakdown step. Another possibility is to switch to a more robust (but possibly more expensive) method, like GMRES […].

For symmetric positive definite systems the BiCG method delivers the same results as the conjugate gradient (CG) method, but at twice the cost per iteration. For nonsymmetric matrices it has been shown that, in phases of the process where there is significant reduction of the norm of the residual, the method is more or less comparable to full GMRES (in terms of the numbers of iterations) […].
4 Numerical Experiments and Analysis
In this section, we present a number of numerical examples to demonstrate the efficiency of the ILU preconditioner with a dual dropping strategy (a fill-in parameter and a drop tolerance) […], compared with the block diagonal preconditioner and with no preconditioner, in speeding up the BiCG iterations. All cases are tested on one processor of an HP Superdome cluster at the University of Kentucky. The processor has … GB of local memory and runs at … MHz. The code is written in Fortran and is run in single precision.
To demonstrate the performance of our preconditioned BiCG solver, we calculate the radar cross section (RCS) of different conducting geometries with and without coating. The geometries considered include plates, spheres, antennas, and antenna arrays. The mesh sizes for all the test structures are about one tenth of a wavelength. The information describing the structures and the generated matrices is given in Table 1. The second column, entitled "level", indicates the number of levels in the multilevel fast multipole algorithm. In most cases, three or four levels are used. For the largest case (the antenna array), we have used seven levels. In Table 1 we also give the numbers of nonzeros of the full matrix A, the block diagonal matrix A_D, and the near part matrix (A_D + A_N).

Numerical data from each test case are shown in Tables 2-10. The corresponding convergence history and certain computed solutions for some selected cases are plotted in the accompanying figures. There are two stopping criteria implemented in the code. One is the error bound (residual norm), and the other is the limit on the number of iterations. The error bound is to reduce the 2-norm of the residual by a prescribed factor, and a limit is set on the maximum number of iterations.
Here are the explanations of the notations shown in the data tables:

- prec: the preconditioner used with the iterative method
- NONE: no preconditioner
- BLOCK: the block diagonal preconditioner
- ILUT: the incomplete LU preconditioner with a dual dropping strategy
- τ: the threshold drop tolerance used in ILUT
- p: the fill-in parameter used in ILUT
Table 1: Information about the matrices used in the experiments (λ is the wavelength in meters).

case | level | unknowns | nonzeros (A / A_D / A_D + A_N) | target size and description
P…A  |  …    |  …       | …                              | Conducting plate
P…B  |  …    |  …       | …                              | Dielectric plate over conducting plate
S…A  |  …    |  …       | …                              | Conducting sphere (partial)
S…B  |  …    |  …       | …                              | Conducting sphere with dielectric coating (partial)
P…A  |  …    |  …       | …                              | Microstrip antenna (no substrate)
P…B  |  …    |  …       | …                              | Microstrip antenna
S…A  |  …    |  …       | …                              | Large conducting sphere
S…B  |  …    |  …       | …                              | Large sphere with dielectric coating
P…A  |  7    |  …       | …                              | Antenna array
- ratio_f: the sparsity ratio of the number of nonzeros of the L and U factors with respect to the number of nonzeros of the full matrix A
- ratio_n: the sparsity ratio of the number of nonzeros of the L and U factors with respect to the number of nonzeros of the near part matrix (A_D + A_N)
- LUcpu: the CPU time for performing the incomplete LU factorization
- itnum: the number of the preconditioned BiCG iterations
- itcpu: the CPU time for the iteration phase
- sol: iterative solution quality, compared with the exact solution from the Gauss elimination (LUD) solver, or with the solution from the converged cases when the Gauss elimination solver cannot be used due to the memory limit
- same: converged within the given stopping criteria
- diff: did not converge within the given stopping criteria
We summarize our experimental results as follows:

1. For the class of problems we tested, the block diagonal preconditioner improves the BiCG convergence in only three out of the nine cases. In the remaining cases, the block diagonal preconditioner actually hampers the BiCG convergence. It is our conclusion that the block diagonal preconditioner is not robust for solving this class of dense matrices arising from the hybrid integral formulation of electromagnetic scattering problems.

2. For almost all problems, the ILUT preconditioner improves the BiCG convergence significantly and reduces the total CPU time substantially. Our tests show that the ILUT preconditioner is robust for solving this class of dense matrices. We also see that different choices of the ILUT parameters τ and p may affect the convergence rate of the preconditioned iterative solver.

3. It is worth pointing out that the memory cost of the ILUT preconditioner is reasonable. In most cases, the memory cost of the ILUT preconditioner is less than the cost of storing the near part matrix (A_D + A_N); see the data in the column entitled ratio_n.

4. For most problems, the computation (construction) cost (LUcpu) of the ILUT preconditioner is low, compared to the preconditioned iterative solution cost (itcpu). The exceptions are found in the two cases related to modeling conducting spheres with dielectric coating.

5. For the converged cases, the approximate solutions computed from the preconditioned BiCG solver are comparable to those computed from the direct solver.
If a test case converges �ne� we show the computed solution graphs from each methods �the
left hand side graphs of the Figures � � � � The solution graphs indicate the scattering pattern of
the objects�
We compare the computed solution from each preconditioning strategy with the exact solution (LUD) of the linear system (�) (the right hand side graphs of Figures � through �). If the number of unknowns is over ������, we did not attempt to obtain the exact solution from the LU decomposition, because it is too time consuming.
Figures � and � show that the solutions computed with all three preconditioning strategies are similar and approximate the exact solution. Figures � and � show that the solutions of the BiCG method without a preconditioner and with the ILUT preconditioner are similar, and both approximate the exact solution; with the block diagonal preconditioner, BiCG does not converge within the given stopping criteria.
In the P�A case, the iteration count and the CPU time with the ILUT preconditioner are small relative to those of the iterative solver without a preconditioner and with the block diagonal preconditioner. The sparsity ratio ratio_n of the ILUT preconditioner is less than � or very close to � in most cases (see Tables � through � and Tables � and �), except in the S�A case (see Table �). Therefore, the ILUT preconditioner with a suitable drop tolerance parameter and a suitable fill-in parameter does not need a large amount of memory space.
In all cases, the convergence behavior graphs (the right hand side graphs of Figures � through � and Figures � through �) show that BiCG with the ILUT preconditioner converges very fast, with
Table �: Numerical data of the P�A case (unknowns = ��).
prec  τ  p  LUcpu  ratio_f  ratio_n  itnum  itcpu  sol
NONE ����� ���� same
BLOCK ������� ����� ������ same
ILUT ���� �� ���� ������ �� ���� � ���� same
�� ���� ������� ������� �� ���� same
���� �� ���� ������� �� ��� � ���� same
�� ���� ������� ������� � ���� same
���� �� ���� ������� �� ���� �� ���� same
�� ���� ������ ������� �� ���� same
���� �� ���� ������� �� ���� �� ���� same
�� ���� ������ ������� �� ���� same
[Figure omitted: left panel, RCS (dBsw) versus angle of observation for LUD, no preconditioner, block diagonal, and ILU; right panel, residual norm versus number of iterations.]
Figure �: Solution comparison (left) and convergence history (right) for solving the P�A case. ILUT parameters are τ = ���� and p = ��.
a small number of iterations compared to BiCG with the block diagonal preconditioner and without a preconditioner. A high accuracy ILUT preconditioner with more fill-in is shown to be more robust and more efficient.
Accuracy and stability of the ILUT factorization. If the sparsity ratio ratio_n of the ILUT preconditioner is very small, the preconditioned BiCG iteration may fail to converge within the given stopping criteria, or may diverge. There are two possible reasons for ILU-type preconditioners to perform poorly on certain types of matrices. First, if the ILU preconditioner is inaccurate, e.g., when large τ values and small p values are used in the ILUT preconditioner, the preconditioning matrix M is a poor approximation to the near part matrix (AD + AN) (or to the coefficient matrix A), and the preconditioning effect is correspondingly poor. Poor quality ILU preconditioners caused by an insufficient amount of fill-in can usually be improved by allowing more fill-in, i.e., by choosing a smaller value of τ and a larger value of p in the ILUT implementation.
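The trade-off controlled by the two dropping parameters can be sketched with SciPy's spilu, whose drop_tol and fill_factor arguments play roles loosely analogous to τ and p. The matrix here is an illustrative sparse real matrix, not one of the test cases in this report.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 300
# Illustrative diagonally dominant sparse matrix.
A = sp.random(n, n, density=0.03, random_state=1).tocsc()
A = A + sp.identity(n, format="csc") * 10.0

e = np.ones(n)
fills, errors = [], []
for drop_tol, fill_factor in [(1e-1, 2), (1e-3, 10), (1e-6, 40)]:
    ilu = spla.spilu(A, drop_tol=drop_tol, fill_factor=fill_factor)
    fills.append(ilu.nnz / A.nnz)  # memory cost of L + U (cf. ratio_f)
    # Accuracy of M ~= A: apply M^{-1} A to a known vector and compare to it.
    errors.append(np.linalg.norm(ilu.solve(A @ e) - e))
    print(drop_tol, fill_factor, fills[-1], errors[-1])
```

Tightening the drop tolerance and raising the fill limit buys a more accurate factorization at the price of more stored fill-in, which is exactly the trade-off discussed above.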
Second, poor preconditioning quality can come from an unstable factorization, which is usually associated with small pivots encountered during the ILU factorization [7, 28]. Small pivots tend to produce L and U factors with large magnitude entries, which make the relatively small magnitude
Table �: Numerical data of the P�B case (unknowns = �� ��).
prec  τ  p  LUcpu  ratio_f  ratio_n  itnum  itcpu  sol
NONE ����� ������ same
BLOCK ������� ����� ������ same
ILUT ���� �� ���� ������ ������� �� ���� same
�� ���� ������� ������� �� ���� same
�� ���� ���� � ������� �� ��� � same
���� �� ���� ������� ����� � �� ���� same
�� ���� ������� ������� �� ���� same
�� ���� ���� �� ������� ��� ����� same
���� �� ���� ������� ������� �� ���� same
�� ���� ����� � ������� � ���� same
�� ���� ���� � ������� ��� ��� � same
���� �� ���� ������ ������� �� ���� same
�� ���� ������� ���� �� �� ���� same
�� ���� ������� ���� �� ��� ���� same
[Figure omitted: left panel, RCS (dBsw) versus angle of observation for LUD, no preconditioner, block diagonal, and ILU; right panel, residual norm versus number of iterations.]
Figure �: Solution comparison (left) and convergence history (right) for solving the P�B case. ILUT parameters are τ = ���� and p = ��.
Table �: Numerical data of the S�A case (unknowns = �� ���).
prec  τ  p  LUcpu  ratio_f  ratio_n  itnum  itcpu  sol
NONE �� ������ same
BLOCK ������� ����� ������� diff
ILUT ���� ��� ����� ������� ������� ��� ����� same
��� ����� ������� ������� � � �� �� same
���� ��� ����� ������� ������� ��� ����� same
��� ����� ������� ������� � � �� � � same
���� ��� ����� ������� ������� � ��� � same
��� ����� ������� ������� ��� ������ same
���� ��� ����� ������� ������� ��� ����� same
��� ����� ������� ����� � ��� ������ same
[Figure omitted: left panel, RCS (dBsw) versus angle of observation for LUD, no preconditioner, and ILU; right panel, residual norm versus number of iterations.]
Figure �: Solution comparison (left) and convergence history (right) for solving the S�A case. ILUT parameters are τ = ���� and p = ���.
Table �: Numerical data of the S�B case (unknowns = �� ��).
prec  τ  p  LUcpu  ratio_f  ratio_n  itnum  itcpu  sol
NONE ����� ������� same
BLOCK ������� ����� ������� diff
ILUT ���� ��� � ���� ������� ��� ��� �� ������ same
��� ������ ������� ������� �� ������ same
���� ��� � ��� ������� ��� � � �� ������ same
��� ������ ������ ����� � �� ���� � same
���� ��� � ���� ������� ��� ��� �� ������ same
��� ������ ������� ������� �� ������ same
���� ��� � ���� ������� ��� ��� � ������ same
��� ���� � ������� ������� �� �� ��� same
[Figure omitted: left panel, RCS (dBsw) versus angle of observation for LUD, no preconditioner, and ILU; right panel, residual norm versus number of iterations.]
Figure �: Solution comparison (left) and convergence history (right) for solving the S�B case. ILUT parameters are τ = ���� and p = ���.
Table �: Numerical data of the P�A case (unknowns = ��).
prec  τ  p  LUcpu  ratio_f  ratio_n  itnum  itcpu  sol
NONE ����� �� same
BLOCK ������� ��� �� � same
ILUT ���� �� ���� ������� ������� � ��� same
�� ���� ������� ���� �� �� ���� same
�� ���� ������� ������� � ���� same
���� �� ���� ������� ������� � ��� same
�� ���� ������� ���� �� �� ���� same
�� ���� ������� ������� � ���� same
���� �� ���� ������� ������� � ���� same
�� ���� ������� ���� �� �� ���� same
�� ��� ������� ������� � ���� same
���� �� ���� ������� ������� �� ��� same
�� ���� ������ ������� �� ���� same
�� ���� ������� ����� � � ���� same
[Figure omitted: residual norm versus number of iterations for no preconditioner, block diagonal, and ILU.]
Figure �: Convergence history for solving the P�A case. ILUT parameters are τ = ���� and p = ���.
Table �: Numerical data of the P�B case (unknowns = �� ��).
prec  τ  p  LUcpu  ratio_f  ratio_n  itnum  itcpu  sol
NONE ����� ������ diff
BLOCK ������� � ����� same
ILUT ���� ��� ���� ������� ������� �� ���� same
� ���� ������ ������� �� ���� same
�� ���� ��� ��� ����� � � ���� same
���� ��� ���� ������� ������� �� ���� same
� ��� ����� � ������� �� ���� same
�� ���� ��� � � ������� � ���� same
���� ��� ��� ������ ���� �� �� ���� same
� ��� ������� ���� �� �� ���� same
�� ���� ��� ��� ����� � � ���� same
���� ��� ���� ������� ���� �� �� ���� same
� ���� ������� ���� �� �� ���� same
�� ���� ��� �� ������� �� ��� same
[Figure omitted: residual norm versus number of iterations for no preconditioner, block diagonal, and ILU.]
Figure �: Convergence history for solving the P�B case. ILUT parameters are τ = ���� and p = ���.
Table �: Numerical data of the S�A case (unknowns = ��� ���).
prec  τ  p  LUcpu  ratio_f  ratio_n  itnum  itcpu  sol
NONE ��� ������ same
BLOCK ������� ����� ������� diff
ILUT ���� ��� ����� ������� ������� ��� ������ same
��� ����� ������� ������� � � ������ same
��� ����� ������� ������� ��� ������� same
���� ��� ����� ������� ������� ��� ������ same
��� ����� ������� ������� � � ������ same
���� ��� ���� ������� ������� ��� ������ same
��� ����� ������� ������� ��� �� ��� same
���� ��� ����� ������� ������� ��� �� ��� same
��� ����� ������� ����� � ��� ������ same
[Figure omitted: residual norm versus number of iterations for no preconditioner, block diagonal, and ILU.]
Figure �: Convergence history for solving the S�A case. ILUT parameters are τ = ���� and p = ���.
Table �: Numerical data of the S�B case (unknowns = ��� ���).
prec  τ  p  LUcpu  ratio_f  ratio_n  itnum  itcpu  sol
NONE � � ������� same
BLOCK ������� ����� �������� diff
ILUT ���� ��� ������� ������� ����� � ��� ������� same
��� ������� ������� ���� �� ��� � ����� same
��� ������� ������� ������� ����� ������ same
���� ��� ������� ������� ������� ��� ������ same
��� ������ ������� ���� �� ��� � ����� same
���� ��� ������� ������� ������� ��� ����� � same
��� ������� ������� ���� � � � � ��� � same
���� ��� ������� ������� ������� ��� ������ same
��� ������ ������� ������ ��� � � ��� same
[Figure omitted: residual norm versus number of iterations for no preconditioner, block diagonal, and ILU.]
Figure �: Convergence history for solving the S�B case. ILUT parameters are τ = ���� and p = ���.
Table ��: Numerical data of the P�A case (unknowns = ���� ���).
prec  τ  p  LUcpu  ratio_f  ratio_n  itnum  itcpu  sol
NONE ��� ������ same
BLOCK ������� ��� ������� same
ILUT ���� �� ������ ������� ������� �� �� ��� same
�� ������ ������� ������� �� ������ same
�� ������ ������� ������� �� ������ same
�� ������ ������� ��� ��� ��� ������� same
���� �� ������ ������� ����� �� ������ same
�� ���� � ������� ������� �� ������ same
�� ������ ������� ��� ��� ��� ������� same
���� �� ������ ������� ������� �� ���� � same
�� ������ ������� ������� �� ������ same
�� ������ ������� ��� ��� ��� � ����� same
���� �� ������ ������� ������� �� ������ same
�� ������ ������� ������ �� ���� � same
�� ������ ������� ��� ��� ��� ������� same
[Figure omitted: residual norm versus number of iterations for no preconditioner, block diagonal, and ILU.]
Figure ��: Convergence history for solving the P�A case. ILUT parameters are τ = ���� and p = ��.
entries of the matrix A insignificant in finite precision computations. In sparse matrix computations, such problems can occur when performing ILU factorization on indefinite matrices [7, 28].
In constructing an ILUT preconditioner for a dense matrix, we use single precision arithmetic because of the large number of floating point operations and the storage requirement. If the condition number of the matrix M^{-1} = (LU)^{-1} is large, say on the order of ���, computations in such a precision with the matrix M^{-1} are likely to be inaccurate. The preconditioned matrix M^{-1}A might then be worse conditioned than the original matrix A, and the preconditioned BiCG method may take more iterations to converge, or may not converge at all. In addition, since our implementation stops the iteration when the (preconditioned) residual norm has been reduced by a factor of ����, convergence may be declared while the actual residual norm is still large, if the initial residual norm is huge. Such a convergence is called a false convergence.
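One practical guard against false convergence is to recompute the true relative residual after the iteration reports success. The helper below is our sketch, not code from this report; the names, stand-in matrix, and thresholds are illustrative.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def bicg_with_residual_check(A, b, M=None, maxiter=1000, rel_tol=1e-4):
    """Run BiCG, then verify the true residual ||b - Ax|| / ||b||,
    rather than trusting only the residual the iteration monitors."""
    x, info = spla.bicg(A, b, M=M, maxiter=maxiter)
    true_rel_res = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
    return x, true_rel_res, (info == 0 and true_rel_res < rel_tol)

n = 150
# Illustrative diagonally dominant sparse matrix.
A = sp.random(n, n, density=0.05, random_state=2).tocsc()
A = A + sp.identity(n, format="csc") * 12.0
b = np.ones(n)
x, res, ok = bicg_with_residual_check(A, b)
print(res, ok)
```

The extra check costs one matrix-vector product, which is negligible next to the iteration itself.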
To distinguish a false convergence from a true convergence, or to determine whether a preconditioner is likely to be effective before the preconditioned iteration is carried out, it is informative to estimate the condition number of the (LU)^{-1} matrix. Chow and Saad propose estimating this number by calculating ||(LU)^{-1}e||_inf, where e is the vector of all ones; this statistic is called condest in [7]. We note that this statistic is also a lower bound for ||(LU)^{-1}||_inf, and it indicates a relation between unstable triangular solutions and poorly conditioned L and U factors [7, 28]. Computing the quantity ||(LU)^{-1}e||_inf requires just one forward solution and one back substitution with the L and U factors.
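In terms of SciPy's incomplete LU object, this estimate costs a single call to the factor's triangular solves. The `condest` helper below is our hypothetical implementation, applied to an illustrative sparse matrix rather than one of the report's dense systems.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def condest(ilu):
    """||(LU)^{-1} e||_inf for e = (1, ..., 1): one forward and one
    backward triangular solve; a lower bound for ||(LU)^{-1}||_inf."""
    e = np.ones(ilu.shape[0])
    return np.linalg.norm(ilu.solve(e), np.inf)

n = 200
# Illustrative diagonally dominant sparse matrix.
A = sp.random(n, n, density=0.04, random_state=3).tocsc()
A = A + sp.identity(n, format="csc") * 15.0
ilu = spla.spilu(A, drop_tol=1e-3, fill_factor=10)
c = condest(ilu)
print(c)
```

A moderate value suggests stable triangular solves; a huge value (say, beyond what single precision can represent accurately) warns that the preconditioner may cause exactly the false convergence discussed above.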
We calculate the condest numbers for the P�A and S�A cases in order to investigate what may happen in the ILUT factorization when solving a complex valued dense matrix. We find that the drop tolerance value τ (in the range of our studies) does not affect the value of condest very much; condest mostly depends on the fill-in parameter p. As the fill-in parameter p decreases and the drop tolerance parameter τ increases, the value of condest grows dramatically in certain situations. We also find that the values of condest are related to the solution patterns: if condest is very large, the preconditioned BiCG method does not converge within the given stopping
Table ��: Condest values of the ILUT preconditioner for solving the P�A case (unknowns = ��).
τ  p  condest  itnum  convergence  sol
���� �� ���� ��� � yes same
�� ������� �� yes same
�� � � ���� �� yes diff
�� �������E�� � yes diff
�� �������E�� � diverge �
���� �� �������� �� yes same
�� �������� �� yes same
�� � ������ �� yes diff
�� ���� ��E�� � yes diff
Table ��: Condest values of the ILUT preconditioner for solving the S�A case (unknowns = ��� ���).
τ  p  condest  itnum  convergence  sol
���� ��� �� ���� ��� yes same
��� �������� � � yes same
��� �������E��� ����� no diff
���� ��� �� �� � ��� yes same
��� �������� ��� yes same
��� �������E��� ����� no diff
criteria or diverges, and the computed solution is not good, as we see in Tables �� and ��.
In Table ��, the ILUT preconditioner with p = �� and τ = ���� converges in � iterations, but the computed solution is not good. This is an example of false convergence. We can determine the quality of this ILUT preconditioner without even starting the BiCG iteration process: the condest value of ��������� indicates that any single precision computation with (LU)^{-1} is likely to be inaccurate.
� Conclusions
In the implementation of the MLFMA, the ILU preconditioner based on a dual dropping strategy (ILUT) has been shown to reduce the number of BiCG iterations dramatically, compared both to the block diagonal preconditioner and to using no preconditioner. Solving the large complex valued dense linear systems arising from electromagnetic scattering by the BiCG method with an ILU preconditioner maintains the computational complexity of the MLFMA and consequently reduces the total CPU time.
Our numerical experiments show that the ILUT preconditioner constructed from the near part matrix improves computational efficiency in terms of both memory and CPU time. The results show that the BiCG accelerator with the ILUT preconditioner is robust for solving three dimensional model cases from electromagnetic scattering simulations.
One advantage of our implementation is that we do not need access to the global matrix to construct the preconditioner (ideal for the MLFMA, in which only AD and AN are numerically
available). This is in contrast to many existing preconditioning techniques, in either incomplete LU factorization form or sparse approximate inverse form, in which a sparsified global matrix may be needed to construct a preconditioner [�, �].
We also discussed the situations in which inaccurate or unstable ILUT preconditioners may be computed, and we proposed using a simple condest value to predict the quality of a constructed ILUT preconditioner before the preconditioned BiCG iteration is started. Our experimental results indicate that this strategy is practically useful in predicting false convergence and divergence of the preconditioned iterative solvers.
References

[1] G. Alléon, M. Benzi, and L. Giraud. Sparse approximate inverse preconditioning for dense linear systems arising in computational electromagnetics. Numer. Algorithms, �������, ����.

[2] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd edition. SIAM, Philadelphia, ����.

[3] B. Carpentieri, I. S. Duff, and L. Giraud. Experiments with sparse preconditioning of dense problems from electromagnetic applications. Technical Report TR/PA/��/��, CERFACS, Toulouse, France, ����.

[4] B. Carpentieri, I. S. Duff, and M. Magolu monga Made. Sparse symmetric preconditioners for dense linear systems in electromagnetism. Technical Report TR/PA/��/��, CERFACS, Toulouse, France, ����.

[5] K. Chen. On a class of preconditioning methods for dense linear systems from boundary elements. SIAM J. Sci. Comput., ��(�):�������, ����.

[6] W. C. Chew, J. M. Jin, E. Michielssen, and J. M. Song. Fast and Efficient Algorithms in Computational Electromagnetics. Artech House, Boston, ����.

[7] E. Chow and Y. Saad. Experimental study of ILU preconditioners for indefinite matrices. J. Comput. Appl. Math., ���������, ����.

[8] R. Coifman, V. Rokhlin, and S. Wandzura. The fast multipole method for the wave equation: a pedestrian prescription. IEEE Antennas Propagat. Mag., ��(�):�����, ����.

[9] E. Darve. The fast multipole method: numerical implementation. J. Comput. Phys., ����������, ����.

[10] R. Freund and N. Nachtigal. QMR: a quasi-minimal residual method for non-Hermitian linear systems. Numer. Math., ���������, ����.

[11] B. M. Kolundzija. Electromagnetic modeling of composite metallic and dielectric structures. IEEE Trans. Microwave Theory Tech., ��(�):���������, ����.

[12] C. Lanczos. Solution of systems of linear equations by minimized iterations. J. Research National Bureau Standards, ��������, ����.

[13] C. C. Lu and W. C. Chew. A multilevel algorithm for solving a boundary integral equation of wave scattering. Microwave Opt. Tech. Lett., �(�):������, ����.

[14] C. C. Lu and W. C. Chew. A coupled surface-volume integral equation approach for the calculation of electromagnetic scattering from composite metallic and material targets. IEEE Trans. Antennas Propagat., ��(��):��������, ����.

[15] B. N. Parlett, D. R. Taylor, and Z. A. Liu. A look-ahead Lanczos algorithm for unsymmetric matrices. Math. Comput., ����������, ����.

[16] J. Rahola. Experiments on iterative methods and the fast multipole method in electromagnetic scattering calculations. Technical Report TR/PA/��/��, CERFACS, Toulouse, France, ����.

[17] S. M. Rao, D. R. Wilton, and A. W. Glisson. Electromagnetic scattering by surfaces of arbitrary shape. IEEE Trans. Antennas Propagat., AP-��(�):��������, ����.

[18] V. Rokhlin. Rapid solution of integral equations of scattering theory in two dimensions. J. Comput. Phys., ��(�):��������, ����.

[19] Y. Saad. ILUT: a dual threshold incomplete LU factorization. Numer. Linear Algebra Appl., �(�):��������, ����.

[20] Y. Saad. Iterative Methods for Sparse Linear Systems. PWS Publishing Company, Boston, ����.

[21] K. Sertel and J. L. Volakis. Incomplete LU preconditioner for FMM implementation. Microwave Opt. Tech. Lett., ��(�):������, ����.

[22] T. K. Sarkar, S. M. Rao, and A. R. Djordjević. Electromagnetic scattering and radiation from finite microstrip structures. IEEE Trans. Microwave Theory Tech., ��(��):���������, ����.

[23] J. M. Song and W. C. Chew. Multilevel fast multipole algorithm for solving combined field integral equations of electromagnetic scattering. Microwave Opt. Tech. Lett., ��(�):������, ����.

[24] J. M. Song, C. C. Lu, and W. C. Chew. Multilevel fast multipole algorithm for electromagnetic scattering by large complex objects. IEEE Trans. Antennas Propagat., AP-��(��):���������, ����.

[25] T. Vaupel and V. Hansen. Electrodynamic analysis of combined microstrip and coplanar/slotline structures with 3-D components based on a surface/volume integral equation approach. IEEE Trans. Microwave Theory Tech., ��(�):���������, ����.

[26] C. F. Wang and J. M. Jin. A fast full-wave analysis of scattering and radiation from large finite arrays of microstrip antennas. IEEE Trans. Antennas Propagat., ��(�):���������, ����.

[27] J. Zhang. Sparse approximate inverse and multilevel block ILU preconditioning techniques for general sparse matrices. Appl. Numer. Math., ���������, ����.

[28] J. Zhang. A multilevel dual reordering strategy for robust incomplete LU factorization of indefinite matrices. SIAM J. Matrix Anal. Appl., ��(�):��������, ����.