Incomplete LU Preconditioning for Large Scale Dense
Complex Linear Systems from Electromagnetic Wave
Scattering Problems

Jeonghwa Lee, Jun Zhang, and Cai-Cheng Lu

Laboratory for High Performance Scientific Computing and Computer Simulation,
Department of Computer Science, University of Kentucky,
Lexington, KY, USA

Department of Electrical and Computer Engineering,
University of Kentucky,
Lexington, KY, USA

April …
Abstract

We consider preconditioned iterative solution of the linear system Ax = b, where the coefficient matrix A is a large scale dense complex valued matrix arising from discretizing the integral equation of electromagnetic scattering. For some scattering structures this matrix can be poorly conditioned. The main purpose of this study is to evaluate the efficiency of a class of incomplete LU (ILU) factorization preconditioners for solving this type of matrix. We solve the electromagnetic wave equations using the BiCG method with an ILU preconditioner in the context of a multilevel fast multipole algorithm (MLFMA). The novelty of this work is that the ILU preconditioner is constructed using the near part block diagonal submatrix generated from the MLFMA. Experimental results show that the ILU preconditioner reduces the number of BiCG iterations substantially, compared to the block diagonal preconditioner. The preconditioned iteration scheme also maintains the computational complexity of the MLFMA, and consequently reduces the total CPU time.

Key words: Krylov subspace methods, ILU preconditioning, multilevel fast multipole algorithm, electromagnetic wave equation

Mathematics Subject Classification: …
Technical Report No. …, Department of Computer Science, University of Kentucky, Lexington, KY, USA.

E-mail: leejh@engr.uky.edu. URL: http://www.csr.uky.edu/~leejh. This author's research work was supported in part by the U.S. National Science Foundation under grant CCR-….

E-mail: jzhang@cs.uky.edu. URL: http://www.cs.uky.edu/~jzhang. This author's research work was supported in part by the U.S. National Science Foundation under grants CCR-…, CCR-…, CCR-…, and ACI-…, in part by the U.S. Department of Energy under grant DE-FG…, in part by the Japanese Research Organization for Information Science and Technology, and in part by the University of Kentucky Research Committee.

E-mail: cclu@engr.uky.edu. URL: http://www.engr.uky.edu/~cclu. This author's research work was supported in part by the U.S. National Science Foundation under grant ECS-…, and in part by the U.S. Office of Naval Research under grant N….
1 Introduction
The electromagnetic wave scattering by three dimensional arbitrarily shaped dielectric and conducting objects can be obtained by finding the solution of an integral equation. The solution to electromagnetic wave interaction with material coated objects has applications in radar cross section prediction for coated targets, printed circuit, and microstrip antenna analysis […]. This problem can be studied using integral equation solvers. For example, to calculate radar scattering by a conducting target coated by dielectric material, the hybrid surface and volume integral equations can be used […]. In this case, a surface integral equation is formed for the conducting surfaces and a volume integral equation is formed for the dielectrics. The integral equations are discretized into a matrix equation by the method of moments (MoM) […]. With this procedure, we obtain a linear system of the form

    Ax = b,                                                (1)

where the coefficient matrix A is a large scale, dense, and complex valued matrix for electrically large targets.
The matrix equation (1) can be solved by numerical matrix equation solvers. Typical matrix solvers can be categorized into two classes. The first class is the direct solution methods, of which the Gauss elimination method is representative. The second class is the iterative solution methods, of which the Krylov subspace methods are considered to be the most effective ones currently available […]. The biconjugate gradient (BiCG) method is one of the many Krylov subspace methods […].

The direct solution methods are very expensive in both memory space and CPU time for large size problems. For the Gauss elimination method the floating point operation count is on the order of O(N^3), where N is the number of unknowns (columns of the matrix A). On the contrary, the complexity of the BiCG type iterative methods is on the order of O(N^2) if convergence is achieved in a small number of iterations. The Krylov subspace methods such as BiCG require the computation of some matrix-vector product operations at each iteration, which account for the major computational cost of this class of methods. The fast multipole method (FMM) speeds up the matrix-vector product operations when it is used to solve the matrix equation iteratively. The FMM approach reduces the computational complexity of a matrix-vector product from O(N^2) to O(N^1.5) […]. With the multilevel fast multipole algorithm (MLFMA) the computational complexity is further reduced to O(N log N) […].
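To put these growth rates side by side, a rough sketch (constants are omitted, since they depend on the implementation; this is not a cost model of any particular code):

```python
# Growth of the matrix-vector product cost for a dense product (O(N^2)), the
# single-level FMM (O(N^1.5)), and the MLFMA (O(N log N)). Constant factors are
# omitted, so only the relative growth is meaningful.
import math

def dense_ops(n):
    return float(n * n)          # direct dense matrix-vector product

def fmm_ops(n):
    return n ** 1.5              # single-level fast multipole method

def mlfma_ops(n):
    return n * math.log2(n)      # multilevel fast multipole algorithm

for n in (10_000, 100_000, 1_000_000):
    print(f"N={n:>9}: dense={dense_ops(n):.1e}, "
          f"FMM={fmm_ops(n):.1e}, MLFMA={mlfma_ops(n):.1e}")
```

Already at N = 100,000 unknowns the gap between N^2 and N log N is roughly three orders of magnitude, which is why the MLFMA matters for electrically large targets.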
It should be noted that a matrix problem involving N unknowns may be solved in C N_iter N_Ax floating point operations, where C is a constant depending on the implementation of a particular iterative method […], N_iter is the number of iterations needed to reach the prespecified convergence criterion, and N_Ax is the number of floating point operations needed for each matrix-vector multiplication. By using the MLFMA, the complexity of the matrix-vector multiplication operations is reduced to O(N log N). For many realistic problems, N_iter depends on both the iterative solver and the target properties (shape and material). For example, a problem with an open-ended cavity needs many more iterations than one with a solid conducting box of the same size. Since N_iter is a proportional factor in the CPU count, to further reduce the total CPU time, it is necessary to reduce the number of iterations of the iterative solvers. Hence preconditioning techniques, which may speed up the convergence rate of the Krylov subspace methods (accelerators), are needed in this application.
There are various preconditioning techniques in existence; most of them are developed in the context of iteratively solving large sparse matrices […]. Until recently, preconditioning techniques for solving dense matrices were less extensively studied […] compared to those for the sparse matrices. In the electromagnetic scattering simulation field, the diagonal and block diagonal preconditioners have been considered by a few authors […] in the context of the MLFMA. The individual blocks (of the block diagonal matrix) are numerically available and are easy to invert. More recently developed strategies are related to the sparsity pattern based incomplete LU factorization preconditioning or sparse approximate inverse preconditioning […]. Some of them are based on the pattern of a sparsified coefficient matrix […]. Others are constructed using the geometric or topological information of the underlying problems […]. Most of these preconditioning techniques, such as the ILU(0), rely on a fixed sparsity pattern, obtained from the sparsified coefficient matrix by dropping small magnitude entries. The ILU(0) preconditioner, which uses a static sparsity pattern and is constructed from a sparsified matrix, has been shown to be ineffective in some electromagnetic scattering problems […].
In the MLFMA implementation, the global coefficient matrix A is not explicitly available. Thus the strategy of extracting a sparsity pattern by dropping small magnitude entries from the dense coefficient matrix, to form a sparse matrix as a basis for constructing a preconditioner, is not feasible. We propose to use an incomplete lower-upper (ILU) triangular factorization with a dual dropping strategy […] to construct a preconditioner from the near part matrix of the coefficient matrix in the MLFMA implementation. The near part matrix is naturally available in the MLFMA implementation. By not using a static (prespecified) sparsity pattern, we hope to capture the most important (large magnitude) entries in the incomplete LU factorization, while not consuming a large amount of memory. This is achieved by dynamically dropping small magnitude entries during the computation and by allowing only a fixed number of nonzeros to be kept in each row of the L and U factors.
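As an illustration of such a dual dropping rule on a single working row (our sketch; the function names and the choice of row norm are assumptions, not the paper's code):

```python
# Sketch of a dual dropping strategy: entries of a working row are dropped when
# small relative to a row norm (tolerance tau), and only the p largest-magnitude
# entries are kept. Names and the mean-magnitude norm are illustrative.
import heapq

def dual_drop(row, tau, p):
    """row: dict {column: value}. Returns the pruned row as a dict."""
    norm = sum(abs(v) for v in row.values()) / max(len(row), 1)  # mean magnitude
    # first dropping rule: threshold relative to the current row
    kept = {j: v for j, v in row.items() if abs(v) >= tau * norm}
    # second dropping rule: keep only the p largest entries in magnitude
    largest = heapq.nlargest(p, kept.items(), key=lambda kv: abs(kv[1]))
    return dict(largest)

pruned = dual_drop({0: 4.0, 1: 0.001, 2: 2.0, 3: 1.0}, tau=0.01, p=2)
# the tiny entry in column 1 is thresholded out, then only the two
# largest survivors (columns 0 and 2) are kept
```

The tolerance bounds the work spent on negligible fill-in, while the cap of p entries per row bounds the memory a priori, which is the point of the dual strategy.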
In our experimental study, we use the BiCG method as the iterative solver, coupled with three preconditioning strategies, to solve a few study cases of representative electromagnetic scattering problems. We mainly compare the preconditioning effect of the different strategies. The preconditioning strategies that we consider are (a) no preconditioner (equivalent to using the identity matrix as the preconditioner), (b) the block diagonal preconditioner, and (c) the incomplete LU triangular factorization preconditioner. We show by numerical simulations that the iterative methods with the incomplete LU (ILU) preconditioner reduce the iteration number significantly, compared to the cases with the block diagonal preconditioner as well as without a preconditioner.
This paper is organized as follows. Section 2 gives a concise introduction to the hybrid integral equation approach in electromagnetic scattering and the multilevel fast multipole algorithm. In Section 3, we outline the BiCG iterative method and discuss in detail the ILUT preconditioner. Extensive numerical experiments with a few electromagnetic scattering problems are conducted in Section 4, in which we also explain and analyze the data obtained in our experiments. Some concluding remarks are given in Section 5.
2 Discretization of Integral Equation and Fast Multipole Method
The hybrid integral equation approach combines the volume integral equation (VIE) and the surface integral equation (SIE) to model the scattering and radiation by mixed dielectric and conducting structures […]. For example, when a radome is applied to an antenna, the combined system consists of both dielectrics and conductors. Hence, the hybrid surface-volume integral equation is ideal for this problem […]. The volume integral equation is applied to the material region and the surface integral equation is enforced over the conducting surface. The integral equations can be formally written as follows:

    { L_S(r, r') · J_S(r') + L_V(r, r') · J_V(r') }_tan = -E^inc_tan(r),   r ∈ S,

    E - L_S(r, r') · J_S(r') - L_V(r, r') · J_V(r') = E^inc(r),   r ∈ V,

where E^inc stands for the excitation field produced by the incident radar wave, the subscript "tan" stands for taking the tangential component of the vector it applies to, and L_α (α = S, V) is an integral operator that maps the source J_α to the electric field E(r); it is defined as

    L_α(r, r') · J_α(r') = iωμ₀ ∫_α ( I̅ + ∇∇/k₀² ) e^{ik₀|r - r'|} / (4π|r - r'|) · J_α(r') dr'.

It should be pointed out that E is related to J_V in the above integral equations. This results in a very general model, as all the volume and surface regions are modeled properly. The advantage of this approach is that in the coated object scattering problems the coating material can be inhomogeneous, and in the printed circuit and microstrip antenna simulation problems the substrate can be of finite size. The simplicity of the Green's function in both the volume integral equation and the surface integral equation has an important impact on the implementation of the fast solvers. However, the additional cost here is the increase in the number of unknowns, since the volume occupied by the dielectric material is meshed. This results in a larger memory requirement and a longer solution time in solving the corresponding matrix equation. But this deficiency can be overcome by applying fast integral equation solvers such as the multilevel fast multipole algorithm (MLFMA) […].
We follow the general steps of the method of moments to discretize the hybrid volume-surface integral equations. The hybrid integral equations are converted into a matrix equation that is formally written as

    [ Z_SS  Z_SV ] [ a_S ]   [ U_S ]
    [ Z_VS  Z_VV ] [ a_V ] = [ U_V ],                      (2)

where a_S and a_V stand for the vectors of the expansion coefficients for the surface current and the volume current, respectively […]. The coefficient matrix arising from the discretized hybrid integral equations is nonsymmetric. Once the matrix equation (2) is solved by numerical matrix equation solvers, the expansion coefficients a_S and a_V can be used to calculate the scattered field and radar cross section. In antenna analysis problems the coefficients can be used to retrieve the antenna's input impedance and calculate the antenna's radiation pattern. In the following, we use A to denote the coefficient matrix in Equation (2), and x = (a_S, a_V)^T for simplicity.
The basis function is an elementary current that has local support. To solve the matrix equation by an iterative method, matrix-vector multiplications are needed at each iteration. Physically, a matrix-vector multiplication corresponds to one cycle of interactions between basis functions. The basic idea of the FMM is to convert the interaction of element-to-element to the interaction of group-to-group. Here a group includes the elements residing in a spatial box. The mathematical foundation of the FMM is the addition theorem for the free-space scalar Green's function […]. Using the addition theorem, the matrix-vector product Ax can be written as

    Ax = (A_D + A_N) x + V_f T V_s x,                      (3)

where V_f, T, and V_s are sparse matrices. In fact, the dense matrix A can be structurally divided into three parts: A_D, A_N, and A_F = V_f T V_s. A_D is the block diagonal part of A, A_N is the block near-diagonal part of A, and A_F is the far part of A. Here the terms "near" and "far" refer to the distance between two groups of elements. Figure 1 illustrates the data structure of a partitioned dense matrix A from electromagnetic scattering. The blank squares are A_D, the biased squares are A_N, and A_F is scattered in the rest of the area.
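The near/far split can be illustrated with a toy grouping of boxes by their centers; the box size, the distance rule, and the function names here are our illustration, not the paper's code:

```python
# Illustrative partition of basis-function groups into "near" and "far" pairs by
# the distance between their box centers, mirroring the A_D / A_N / A_F split.
# The unit box size and the "touching boxes are near" rule are toy assumptions.

def classify_pairs(centers, box_size):
    near, far = [], []
    for i, ci in enumerate(centers):
        for j, cj in enumerate(centers):
            dist = max(abs(a - b) for a, b in zip(ci, cj))  # Chebyshev distance
            if i == j:
                near.append((i, j, "diagonal"))   # contributes to A_D
            elif dist <= box_size:                # touching boxes
                near.append((i, j, "near"))       # contributes to A_N
            else:
                far.append((i, j))                # handled via V_f T V_s
    return near, far

centers = [(0, 0, 0), (1, 0, 0), (3, 0, 0)]       # unit boxes on a line
near, far = classify_pairs(centers, box_size=1)
# groups 0 and 1 are adjacent (near); group 2 is separated (far)
```

Only the near pairs produce explicitly stored matrix entries; the far pairs are never assembled element by element.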
Figure 1: An illustration of the data structure of a dense matrix A from electromagnetic scattering.
To implement the FMM, the spatial region of all the basis functions is subdivided into small cubic boxes. Then the scattered fields of the different scattering centers within a group are translated into a single center called the group center. This is the aggregation process. Hence, the number of scattering centers is reduced. If two groups are separated far enough, the interactions between them are carried out through translation. After the interactions among the groups are done, the field received by a group is redistributed to the group members. This is the disaggregation process […]. In the FMM, the matrix elements are calculated differently depending on the distance between a testing function and a basis function. For near-neighbor matrix elements (those in A_D and A_N), the calculation remains the same as in the MoM procedure. However, the elements in A_F are not explicitly computed and stored. Hence they are not numerically available in the FMM. It can be shown that with optimum grouping, the numbers of nonzero elements in the sparse matrices in Equation (3) are all on the order of N^1.5, and hence the operation count to perform Ax is O(N^1.5).

If the above process is implemented in multilevel form, then the total cost (in memory and CPU time) can be reduced further to be proportional to N log N for one matrix-vector multiplication. Due to this reduced computational complexity, the memory and CPU time savings can be orders of magnitude compared with the direct dense matrix multiplication methods of O(N^2) if N is very large […].
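The structure of the split product in Equation (3) can be sketched numerically. The matrices below are random stand-ins with made-up sizes, not electromagnetic operators; the dense A is built only to check the identity, whereas in the MLFMA the far part is never formed:

```python
# Minimal sketch of the split matrix-vector product of Equation (3):
# Ax = (A_D + A_N) x + V_f T V_s x.
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 20                            # unknowns, expansion size (toy values)
A_near = np.diag(rng.standard_normal(n))  # stands in for the sparse A_D + A_N
V_s = rng.standard_normal((m, n))         # aggregation to group centers
T = rng.standard_normal((m, m))           # translation between groups
V_f = rng.standard_normal((n, m))         # disaggregation to group members

def fast_matvec(x):
    # far part applied in factored form: O(n*m) work instead of forming A_F
    return A_near @ x + V_f @ (T @ (V_s @ x))

x = rng.standard_normal(n)
A_full = A_near + V_f @ T @ V_s           # dense A, for verification only
assert np.allclose(fast_matvec(x), A_full @ x)
```

The point of the factored form is that V_s, T, and V_f are small or sparse, so the product costs far less than the O(n^2) dense multiplication it replaces.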
3 Iterative Methods and Preconditioners
The most effective iterative solvers currently available are the class of Krylov subspace methods […]. The biconjugate gradient method (BiCG) is one of them, especially aimed at solving nonsymmetric and unstructured linear systems […]. To speed up the convergence rate of the iteration processes, Krylov subspace iterations are almost always combined with a suitable preconditioning technique. Preconditioned Krylov subspace methods have become a popular choice of iterative solution techniques for solving large sparse and dense matrices. Generally speaking, the performance of a Krylov subspace solver will improve when combined with a suitable ILU preconditioner. Preconditioners based on the incomplete LU factorization of the coefficient matrix are usually inherently sequential, both in construction and in application.

There have been a few preconditioning techniques recently developed for solving dense linear systems arising from discretized integral equations in electromagnetic applications and in other applications […]. Among these, the diagonal and block diagonal preconditioners are easy to implement and have been shown to be effective for certain types of integral equations arising from electromagnetic scattering problems […]. The ILU(0) preconditioner and preconditioners based on various sparse approximate inverse techniques have recently been tested for solving similar problems […]. The ILU(0) preconditioner constructed from a sparsified coefficient matrix has been shown to be ineffective for solving a certain class of electromagnetic scattering problems […]. On the other hand, some sparse approximate inverse techniques need access to the full coefficient matrix (to construct a sparsified matrix), which is not available in the FMM. Other sparse approximate inverse techniques have exploited a (geometric or topological) concept similar to our near part matrix strategy […]. In many cases, the sparse approximate inverse techniques are far more expensive to construct than the incomplete LU type preconditioning techniques. This is the case when the inherent parallelism of the sparse approximate inverse preconditioning strategies is not exploited.
We propose to solve the dense linear system (1) by using the BiCG method with an ILU preconditioner, with a block diagonal preconditioner, and without a preconditioner. The BiCG algorithm […] is a projection process onto the Krylov subspace

    K_m = span{v_1, Av_1, ..., A^{m-1} v_1},

orthogonally to another Krylov subspace

    L_m = span{w_1, A^T w_1, ..., (A^T)^{m-1} w_1},

taking v_1 = r_0 / ||r_0||_2, where r_0 is the initial residual vector. Here A^T is the complex conjugate transpose of the matrix A. The vector w_1 can be arbitrarily chosen, provided (v_1, w_1) ≠ 0, but it is often chosen to be equal to v_1. If there is a dual system A^T x* = b* to solve with A^T, then w_1 is obtained by scaling the initial residual b* - A^T x*_0.
For completeness, we give the algorithm of the BiCG method from […].

Algorithm 3.1 The BiCG Method.
1.  Compute r_0 := b - Ax_0. Choose r*_0 such that (r_0, r*_0) ≠ 0.
2.  Set p_0 := r_0, p*_0 := r*_0.
3.  For j = 0, 1, ..., until convergence Do:
4.    α_j := (r_j, r*_j) / (Ap_j, p*_j)
5.    x_{j+1} := x_j + α_j p_j
6.    r_{j+1} := r_j - α_j A p_j
7.    r*_{j+1} := r*_j - α_j A^T p*_j
8.    β_j := (r_{j+1}, r*_{j+1}) / (r_j, r*_j)
9.    p_{j+1} := r_{j+1} + β_j p_j
10.   p*_{j+1} := r*_{j+1} + β_j p*_j
11. EndDo
In Algorithm 3.1, the initial guess is x_0. The stopping criterion is usually set to measure the reduction rate or the magnitude of a certain residual norm. A limit on the maximum number of iterations is also used in almost all implementations of iterative methods. Implicitly, the algorithm solves not only the original system Ax = b but also a dual linear system A^T x* = b* with A^T […]. At each iteration, BiCG needs two matrix-vector products, one with A and one with A^T.
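The algorithm translates almost line for line into code. The sketch below is ours (in Python rather than the paper's Fortran) and uses one common complex-arithmetic convention, with the conjugated inner product and the conjugate transpose in the dual recurrence; it is illustrative, not the authors' implementation:

```python
# Unpreconditioned BiCG in the spirit of Algorithm 3.1. No look-ahead or restart
# safeguards are included, so it can break down on hard problems.
import numpy as np

def bicg(A, b, tol=1e-10, maxiter=500):
    A = np.asarray(A, dtype=complex)
    b = np.asarray(b, dtype=complex)
    x = np.zeros_like(b)
    r = b - A @ x
    rs = r.copy()                        # shadow residual r*, chosen as r_0
    p, ps = r.copy(), rs.copy()
    rho = np.vdot(rs, r)                 # (r_j, r*_j), conjugated inner product
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rho / np.vdot(ps, Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs = rs - np.conj(alpha) * (A.conj().T @ ps)   # dual step with A^H
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        rho_new = np.vdot(rs, r)
        beta = rho_new / rho
        p = r + beta * p
        ps = rs + np.conj(beta) * ps
        rho = rho_new
    return x
```

For a small well-conditioned system, `bicg(A, b)` reproduces the direct solution to within the tolerance; the two products with A and A^H per iteration are exactly the cost the text describes.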
It is known that an iterative method such as BiCG may not converge (or converges very slowly) when solving large scale ill-conditioned dense matrices. In practical applications, preconditioning techniques should be used with the BiCG method to reduce the number of iterations. We will evaluate two preconditioners, the ILU preconditioner with a dual dropping strategy (a fill-in parameter and a drop tolerance) […] and the block diagonal preconditioner, using the trivial identity matrix preconditioner as comparison. The block diagonal preconditioner for the multilevel FMM is implemented in […]. Due to space limits, we will not be able to evaluate other types of preconditioning techniques. The efficiency and examples of the ILU(0) and some sparse approximate inverse preconditioners in the FMM implementation can be found in […].
In the block diagonal preconditioning, we construct a preconditioner A_D^{-1} from the block diagonal matrix A_D, and then apply the preconditioner A_D^{-1} to the linear system (1). That is,

    A_D^{-1} A x = A_D^{-1} b,
    (A_D^{-1})^T A^T x* = (A_D^{-1})^T b*.

Since A_D is a block diagonal matrix, each individual block can be inverted independently.
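A sketch of this block-by-block inversion and application (the block sizes and values are illustrative, and real blocks would be complex valued):

```python
# Factor each diagonal block of A_D once, then apply A_D^{-1} blockwise.
import numpy as np

def factor_block_diagonal(blocks):
    """blocks: list of small dense matrices on the diagonal of A_D."""
    return [np.linalg.inv(B) for B in blocks]   # small blocks: invert directly

def apply_block_diagonal(inv_blocks, x):
    """Computes A_D^{-1} x block by block."""
    out, start = np.empty_like(x), 0
    for Binv in inv_blocks:
        k = Binv.shape[0]
        out[start:start + k] = Binv @ x[start:start + k]
        start += k
    return out

blocks = [np.array([[2.0, 0.0], [0.0, 4.0]]), np.array([[5.0]])]
inv_blocks = factor_block_diagonal(blocks)
y = apply_block_diagonal(inv_blocks, np.array([2.0, 4.0, 10.0]))
# y == [1.0, 1.0, 2.0]
```

Because the blocks are independent, the factorization and the application both parallelize trivially, which is part of the appeal of this preconditioner despite its weaker convergence behavior.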
In the ILU preconditioning, we construct a preconditioner M^{-1} from the incomplete LU factorization of the near part matrix (A_D + A_N), and then apply the preconditioner M^{-1} to the linear system (1). That is,

    M^{-1} A x = M^{-1} b,
    (M^{-1})^T A^T x* = (M^{-1})^T b*.
It is well known that a simple preconditioner, such as the ILU(0), which uses the sparsity pattern of A, may not be robust for solving some large scale ill-conditioned problems. Different high accuracy preconditioners have been proposed for solving sparse matrices […]. Many of them are constructed using either an enlarged sparsity pattern or some threshold based drop tolerance to allow more fill-in in the construction phase. Some high accuracy ILU preconditioning strategies may not be able to control the amount of memory space needed a priori. Since memory cost is of vital importance in large scale electromagnetic scattering simulations, we resort to a particular incomplete LU factorization strategy which uses a dual dropping strategy (ILUT) to control both the computational cost and the memory cost.
The following algorithm of the ILUT factorization of a matrix A = (a_ij)_{N×N} is due to Saad and is extracted from […].

Algorithm 3.2 The ILUT Algorithm.
1.  For i = 1, ..., N Do:
2.    w := a_{i*}
3.    For k = 1, ..., i-1 and when w_k ≠ 0 Do:
4.      w_k := w_k / a_{kk}
5.      Apply a dropping rule to w_k
6.      If w_k ≠ 0 then
7.        w := w - w_k * u_{k*}
8.      EndIf
9.    EndDo
10.   Apply a dropping rule to row w
11.   l_{i,j} := w_j for j = 1, ..., i-1
12.   u_{i,j} := w_j for j = i, ..., N
13.   w := 0
14. EndDo
In Algorithm 3.2, w is a work array and a_{i*} denotes the ith (current) row of the matrix A. The dual dropping strategy of the ILUT is implemented using the two parameters τ and p. The small entries of w_k (with respect to τ and relative to a certain norm of the current row) are dropped in line 5. In line 10, small entries are dropped again; a search algorithm is used to find the largest p entries in magnitude, these p largest entries are kept, and the others are dropped. The total storage is bounded by 2pN, where p is the given fill-in parameter and N is the number of unknowns. Here τ controls the computational cost, and p controls the memory cost. By judiciously choosing the two parameters τ and p, we may be able to construct an ILU preconditioner that is effective and does not use much memory space.
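The algorithm above can be sketched compactly. This is our dense-storage illustration of ILUT(τ, p), not Saad's sparse implementation: it assumes nonzero pivots (no pivoting), stores the factors densely for brevity, and uses the 2-norm of the row as the dropping reference:

```python
# ILUT(tau, p) in the spirit of Algorithm 3.2. A real code keeps L and U in
# sparse row storage; dense numpy arrays keep the sketch short.
import numpy as np

def ilut(A, tau, p):
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L, U = np.eye(n), np.zeros_like(A)
    for i in range(n):
        w = A[i].copy()                        # current working row
        norm = np.linalg.norm(A[i])
        for k in range(i):
            if w[k] == 0.0:
                continue
            w[k] = w[k] / U[k, k]
            if abs(w[k]) < tau * norm:         # dropping rule on w_k (line 5)
                w[k] = 0.0
            else:
                w[k + 1:] -= w[k] * U[k, k + 1:]   # w := w - w_k * u_k
        small = np.abs(w) < tau * norm         # dropping rule on the row (line 10)
        small[i] = False                       # never drop the diagonal
        w[small] = 0.0
        for lo, hi in ((0, i), (i + 1, n)):    # keep p largest in L part, U part
            idx = np.nonzero(w[lo:hi])[0] + lo
            if len(idx) > p:
                w[idx[np.argsort(np.abs(w[idx]))[:-p]]] = 0.0
        L[i, :i], U[i, i:] = w[:i], w[i:]
    return L, U

L, U = ilut(np.array([[4.0, 1.0, 0.0],
                      [1.0, 4.0, 1.0],
                      [0.0, 1.0, 4.0]]), tau=0.0, p=3)
# with tau = 0 and ample fill-in, ILUT reproduces the exact LU factors
```

The per-row cap of p nonzeros in each of the L and U parts is what yields the 2pN storage bound stated above.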
In large scale electromagnetic scattering problems, the memory space is of vital importance. Preconditioners using a lot of memory space are less useful. In the MLFMA implementation, the global matrix is not explicitly available; thus preconditioners that need access to the global matrix are not suitable. By using the ILUT, we expect to use not much more than the amount of memory space used to store the near part matrix (A_D + A_N). Thus the memory and floating point operation complexity of the MLFMA is maintained.
When incomplete LU factorization preconditioners are used in combination with an iterative process such as a Krylov subspace method, they tend to dictate the convergence behavior more than the iterative process itself […]. In general, high accuracy ILU type preconditioners with a large amount of fill-in are more robust, but cost more to construct, than their lower accuracy counterparts […]. Since an ILUT preconditioner may be reused several times when solving the same matrix with several different right hand sides in electromagnetic scattering simulations, it is possible to amortize the cost of constructing a high accuracy preconditioner. However, it is important to note that the memory space is usually a bottleneck in large scale electromagnetic scattering simulations.
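For readers who want to experiment with this amortization, SciPy's `spilu` provides an ILUT-style incomplete factorization with a drop tolerance and a fill bound; the sketch below factors once and reuses the preconditioner for several right hand sides. The matrix and all parameter values are illustrative, not the paper's test cases:

```python
# Factor an ILU preconditioner once with spilu, then reuse it across several
# BiCG solves with different right hand sides.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, bicg, spilu

n = 200
A = sp.diags([-1.0, 4.0, -1.2], offsets=[-1, 0, 1], shape=(n, n)).tocsc()

ilu = spilu(A, drop_tol=1e-4, fill_factor=10.0)     # factor once ...
M = LinearOperator(
    A.shape,
    matvec=ilu.solve,                               # apply M^{-1}
    rmatvec=lambda v: ilu.solve(v, trans='T'),      # apply M^{-T} (real case)
)

for seed in range(3):                               # ... reuse for several rhs
    b = np.random.default_rng(seed).standard_normal(n)
    x, info = bicg(A, b, M=M)
    print(seed, info, np.linalg.norm(A @ x - b))
```

BiCG also needs the transposed preconditioner solve for its dual recurrence, which is why the `rmatvec` is supplied alongside `matvec`.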
The following algorithm shows the preconditioned BiCG method with a preconditioner M.

Algorithm 3.3 The Preconditioned BiCG Method.
1.  Compute r_0 := b - Ax_0. Choose r*_0 such that (r_0, r*_0) ≠ 0.
2.  Compute z_0 := M^{-1} r_0, z*_0 := (M^{-1})^T r*_0.
3.  Set p_0 := z_0, p*_0 := z*_0.
4.  For j = 0, 1, ..., until convergence Do:
5.    α_j := (r_j, z*_j) / (Ap_j, p*_j)
6.    x_{j+1} := x_j + α_j p_j
7.    r_{j+1} := r_j - α_j A p_j
8.    r*_{j+1} := r*_j - α_j A^T p*_j
9.    z_{j+1} := M^{-1} r_{j+1}
10.   z*_{j+1} := (M^{-1})^T r*_{j+1}
11.   β_j := (r_{j+1}, z*_{j+1}) / (r_j, z*_j)
12.   p_{j+1} := z_{j+1} + β_j p_j
13.   p*_{j+1} := z*_{j+1} + β_j p*_j
14. EndDo
Few theoretical results are known about the convergence of BiCG […]. In practice, it is observed that the convergence behavior may be quite irregular, and the method may even break down. The breakdown situation due to the possible event that (r_j, z*_j) ≈ 0 can be circumvented by the so-called look-ahead strategy […]. The other breakdown situation, (Ap_j, p*_j) ≈ 0, occurs when the LU decomposition fails, and can be repaired by using another decomposition. Sometimes, the breakdown or near-breakdown situation can be satisfactorily avoided by a restart at the iteration step immediately before the (near) breakdown step. Another possibility is to switch to a more robust (but possibly more expensive) method, like GMRES […].

For symmetric positive definite systems the BiCG method delivers the same results as the conjugate gradient (CG) method, but at twice the cost per iteration. For nonsymmetric matrices it has been shown that, in phases of the process where there is significant reduction of the norm of the residual, the method is more or less comparable to full GMRES (in terms of the numbers of iterations) […].
4 Numerical Experiments and Analysis
In this section, we present a number of numerical examples to demonstrate the efficiency of the ILU preconditioner with a dual dropping strategy (a fill-in parameter and a drop tolerance) […], compared with the block diagonal preconditioner and with no preconditioner, in speeding up the BiCG iterations. All cases are tested on one processor of an HP Superdome cluster at the University of Kentucky. The processor has … GB of local memory and runs at … MHz. The code is written in Fortran and is run in single precision.
To demonstrate the performance of our preconditioned BiCG solver, we calculate the radar cross section (RCS) of different conducting geometries with and without coating. The geometries considered include plates, spheres, antennas, and antenna arrays. The mesh sizes for all the test structures are about one tenth of a wavelength. The information describing the structures and the generated matrices is given in Table 1. The second column, entitled "level", indicates the number of levels in the multilevel fast multipole algorithm. In most cases, three or four levels are used. For the largest case (the antenna array), we have used seven levels. In Table 1 we also give the numbers of nonzeros of the full matrix A, the block diagonal matrix A_D, and the near part matrix (A_D + A_N).

Numerical data from each test case are shown in Tables 2-10. The corresponding convergence history and certain computed solutions for some selected cases are plotted in the accompanying figures. There are two stopping criteria implemented in the code. One is the error bound (residual norm), and the other is the limit on the number of iterations. The error bound is to reduce the 2-norm of the residual by a prescribed factor, and a limit is set on the maximum number of iterations.
Here are the explanations of the notations shown in the data tables:

- prec: the preconditioner used with the iterative method
- NONE: no preconditioner
- BLOCK: the block diagonal preconditioner
- ILUT: the incomplete LU preconditioner with a dual dropping strategy
- τ: the threshold drop tolerance used in ILUT
- p: the fill-in parameter used in ILUT
Table 1: Information about the matrices used in the experiments (λ is the wavelength in meters).

case | level | unknowns | nonzeros (A / A_D / A_D + A_N) | target size and description
P…A  |  …    |  …       | …                              | Conducting plate
P…B  |  …    |  …       | …                              | Dielectric plate over conducting plate
S…A  |  …    |  …       | …                              | Conducting sphere (partial)
S…B  |  …    |  …       | …                              | Conducting sphere with dielectric coating (partial)
P…A  |  …    |  …       | …                              | Microstrip antenna (no substrate)
P…B  |  …    |  …       | …                              | Microstrip antenna
S…A  |  …    |  …       | …                              | Large conducting sphere
S…B  |  …    |  …       | …                              | Large sphere with dielectric coating
P…A  |  7    |  …       | …                              | Antenna array
- ratio_f: the sparsity ratio of the number of nonzeros of the L and U factors with respect to the number of nonzeros of the full matrix A
- ratio_n: the sparsity ratio of the number of nonzeros of the L and U factors with respect to the number of nonzeros of the near part matrix (A_D + A_N)
- LUcpu: the CPU time for performing the incomplete LU factorization
- itnum: the number of the preconditioned BiCG iterations
- itcpu: the CPU time for the iteration phase
- sol: iterative solution quality, compared with the exact solution from the Gauss elimination (LUD) solver, or with the solution from the converged cases when the Gauss elimination solver cannot be used due to the memory limit
- same: converged within the given stopping criteria
- diff: did not converge within the given stopping criteria
We summarize our experimental results as follows:

1. For the class of problems we tested, the block diagonal preconditioner improves the BiCG convergence in only three out of the nine cases. In the remaining cases, the block diagonal preconditioner actually hampers the BiCG convergence. It is our conclusion that the block diagonal preconditioner is not robust for solving this class of dense matrices arising from the hybrid integral formulation of electromagnetic scattering problems.

2. For almost all problems, the ILUT preconditioner improves the BiCG convergence significantly and reduces the total CPU time substantially. Our tests show that the ILUT preconditioner is robust for solving this class of dense matrices. We also see that different choices of the ILUT parameters τ and p may affect the convergence rate of the preconditioned iterative solver.

3. It is worth pointing out that the memory cost of the ILUT preconditioner is reasonable. In most cases, the memory cost of the ILUT preconditioner is less than the cost of storing the near part matrix (A_D + A_N); see the data in the column entitled ratio_n.

4. For most problems, the computation (construction) cost (LUcpu) of the ILUT preconditioner is low, compared to the preconditioned iterative solution cost (itcpu). The exceptions are found in the two cases related to modeling conducting spheres with dielectric coating.

5. For the converged cases, the approximate solutions computed from the preconditioned BiCG solver are comparable to those computed from the direct solver.
If a test case converges �ne� we show the computed solution graphs from each methods �the
left hand side graphs of the Figures � � � � The solution graphs indicate the scattering pattern of
the objects�
We compare the computed solution from each preconditioning strategy with the exact solution (LUD) of the linear system (�) (the right hand side graphs of Figures � through �). If the number of unknowns is over ������, we did not attempt to obtain the exact solution from the LU decomposition, because it is too time consuming.
Figures � and � show that the solutions computed with all three preconditioning strategies are similar and approximate the exact solution. Figures � and � show that the solutions of the BiCG method without a preconditioner and with the ILUT preconditioner are similar, and both approximate the exact solution; with the block diagonal preconditioner, BiCG does not converge within the given stopping criteria.
In the P�A case, the iteration count and the CPU time with the ILUT preconditioner are small relative to those of the iterative solver without a preconditioner and with the block diagonal preconditioner. The sparsity ratio ratio_n of the ILUT preconditioner is less than � or very close to � in most cases (see Tables � through � and Tables � and �), except in the S�A case (see Table �). Therefore, the ILUT preconditioner with a suitable drop tolerance parameter and a suitable fill-in parameter does not need a large amount of memory space.
In all cases, the convergence behavior graphs (the right hand side graphs of Figures � through � and Figures � through �) show that BiCG with the ILUT preconditioner converges very fast, with
Table �: Numerical data of the P�A case (unknowns = ��).
prec  τ  p  LUcpu  ratio_f  ratio_n  itnum  itcpu  sol
NONE ����� ���� same
BLOCK ������� ����� ������ same
ILUT ���� �� ���� ������ �� ���� � ���� same
�� ���� ������� ������� �� ���� same
���� �� ���� ������� �� ��� � ���� same
�� ���� ������� ������� � ���� same
���� �� ���� ������� �� ���� �� ���� same
�� ���� ������ ������� �� ���� same
���� �� ���� ������� �� ���� �� ���� same
�� ���� ������ ������� �� ���� same
[Figure omitted: left panel, RCS (dBsw) versus angle of observation for LUD, no preconditioner, block diagonal, and ILU; right panel, residual norm versus number of iterations.]
Figure �: Solution comparison (left) and convergence history (right) for solving the P�A case. ILUT parameters are τ = ���� and p = ��.
a small number of iterations compared to BiCG with the block diagonal preconditioner and without a preconditioner. A high accuracy ILUT preconditioner with more fill-in is shown to be more robust and more efficient.
Accuracy and stability of the ILUT factorization. If the sparsity ratio ratio_n of the ILUT preconditioner is very small, the preconditioned BiCG iteration may fail to converge within the given stopping criteria, or may diverge. There are two possible reasons for ILU-type preconditioners to perform poorly on certain types of matrices. First, if the ILU preconditioner is inaccurate, e.g., when large τ values and small p values are used in the ILUT preconditioner, the preconditioning matrix M is a poor approximation to the near part matrix (AD + AN) (or to the coefficient matrix A), and the preconditioning effect is correspondingly poor. Poor quality ILU preconditioners caused by an insufficient amount of fill-in can usually be improved by allowing more fill-in, i.e., by choosing a smaller value of τ and a larger value of p in the ILUT implementation.
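The trade-off controlled by the two dropping parameters can be sketched with SciPy's spilu, whose drop_tol and fill_factor arguments play roles loosely analogous to τ and p. The matrix here is an illustrative sparse real matrix, not one of the test cases in this report.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 300
# Illustrative diagonally dominant sparse matrix.
A = sp.random(n, n, density=0.03, random_state=1).tocsc()
A = A + sp.identity(n, format="csc") * 10.0

e = np.ones(n)
fills, errors = [], []
for drop_tol, fill_factor in [(1e-1, 2), (1e-3, 10), (1e-6, 40)]:
    ilu = spla.spilu(A, drop_tol=drop_tol, fill_factor=fill_factor)
    fills.append(ilu.nnz / A.nnz)  # memory cost of L + U (cf. ratio_f)
    # Accuracy of M ~= A: apply M^{-1} A to a known vector and compare to it.
    errors.append(np.linalg.norm(ilu.solve(A @ e) - e))
    print(drop_tol, fill_factor, fills[-1], errors[-1])
```

Tightening the drop tolerance and raising the fill limit buys a more accurate factorization at the price of more stored fill-in, which is exactly the trade-off discussed above.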
Second, poor preconditioning quality can come from an unstable factorization, which is usually associated with small pivots encountered during the ILU factorization [7, 28]. Small pivots tend to produce L and U factors with large magnitude entries, which make the relatively small magnitude
Table �: Numerical data of the P�B case (unknowns = �� ��).
prec  τ  p  LUcpu  ratio_f  ratio_n  itnum  itcpu  sol
NONE ����� ������ same
BLOCK ������� ����� ������ same
ILUT ���� �� ���� ������ ������� �� ���� same
�� ���� ������� ������� �� ���� same
�� ���� ���� � ������� �� ��� � same
���� �� ���� ������� ����� � �� ���� same
�� ���� ������� ������� �� ���� same
�� ���� ���� �� ������� ��� ����� same
���� �� ���� ������� ������� �� ���� same
�� ���� ����� � ������� � ���� same
�� ���� ���� � ������� ��� ��� � same
���� �� ���� ������ ������� �� ���� same
�� ���� ������� ���� �� �� ���� same
�� ���� ������� ���� �� ��� ���� same
[Figure omitted: left panel, RCS (dBsw) versus angle of observation for LUD, no preconditioner, block diagonal, and ILU; right panel, residual norm versus number of iterations.]
Figure �: Solution comparison (left) and convergence history (right) for solving the P�B case. ILUT parameters are τ = ���� and p = ��.
Table �: Numerical data of the S�A case (unknowns = �� ���).
prec  τ  p  LUcpu  ratio_f  ratio_n  itnum  itcpu  sol
NONE �� ������ same
BLOCK ������� ����� ������� diff
ILUT ���� ��� ����� ������� ������� ��� ����� same
��� ����� ������� ������� � � �� �� same
���� ��� ����� ������� ������� ��� ����� same
��� ����� ������� ������� � � �� � � same
���� ��� ����� ������� ������� � ��� � same
��� ����� ������� ������� ��� ������ same
���� ��� ����� ������� ������� ��� ����� same
��� ����� ������� ����� � ��� ������ same
[Figure omitted: left panel, RCS (dBsw) versus angle of observation for LUD, no preconditioner, and ILU; right panel, residual norm versus number of iterations.]
Figure �: Solution comparison (left) and convergence history (right) for solving the S�A case. ILUT parameters are τ = ���� and p = ���.
Table �: Numerical data of the S�B case (unknowns = �� ��).
prec  τ  p  LUcpu  ratio_f  ratio_n  itnum  itcpu  sol
NONE ����� ������� same
BLOCK ������� ����� ������� diff
ILUT ���� ��� � ���� ������� ��� ��� �� ������ same
��� ������ ������� ������� �� ������ same
���� ��� � ��� ������� ��� � � �� ������ same
��� ������ ������ ����� � �� ���� � same
���� ��� � ���� ������� ��� ��� �� ������ same
��� ������ ������� ������� �� ������ same
���� ��� � ���� ������� ��� ��� � ������ same
��� ���� � ������� ������� �� �� ��� same
[Figure omitted: left panel, RCS (dBsw) versus angle of observation for LUD, no preconditioner, and ILU; right panel, residual norm versus number of iterations.]
Figure �: Solution comparison (left) and convergence history (right) for solving the S�B case. ILUT parameters are τ = ���� and p = ���.
Table �: Numerical data of the P�A case (unknowns = ��).
prec  τ  p  LUcpu  ratio_f  ratio_n  itnum  itcpu  sol
NONE ����� �� same
BLOCK ������� ��� �� � same
ILUT ���� �� ���� ������� ������� � ��� same
�� ���� ������� ���� �� �� ���� same
�� ���� ������� ������� � ���� same
���� �� ���� ������� ������� � ��� same
�� ���� ������� ���� �� �� ���� same
�� ���� ������� ������� � ���� same
���� �� ���� ������� ������� � ���� same
�� ���� ������� ���� �� �� ���� same
�� ��� ������� ������� � ���� same
���� �� ���� ������� ������� �� ��� same
�� ���� ������ ������� �� ���� same
�� ���� ������� ����� � � ���� same
[Figure omitted: residual norm versus number of iterations for no preconditioner, block diagonal, and ILU.]
Figure �: Convergence history for solving the P�A case. ILUT parameters are τ = ���� and p = ���.
Table �: Numerical data of the P�B case (unknowns = �� ��).
prec  τ  p  LUcpu  ratio_f  ratio_n  itnum  itcpu  sol
NONE ����� ������ diff
BLOCK ������� � ����� same
ILUT ���� ��� ���� ������� ������� �� ���� same
� ���� ������ ������� �� ���� same
�� ���� ��� ��� ����� � � ���� same
���� ��� ���� ������� ������� �� ���� same
� ��� ����� � ������� �� ���� same
�� ���� ��� � � ������� � ���� same
���� ��� ��� ������ ���� �� �� ���� same
� ��� ������� ���� �� �� ���� same
�� ���� ��� ��� ����� � � ���� same
���� ��� ���� ������� ���� �� �� ���� same
� ���� ������� ���� �� �� ���� same
�� ���� ��� �� ������� �� ��� same
[Figure omitted: residual norm versus number of iterations for no preconditioner, block diagonal, and ILU.]
Figure �: Convergence history for solving the P�B case. ILUT parameters are τ = ���� and p = ���.
Table �: Numerical data of the S�A case (unknowns = ��� ���).
prec  τ  p  LUcpu  ratio_f  ratio_n  itnum  itcpu  sol
NONE ��� ������ same
BLOCK ������� ����� ������� diff
ILUT ���� ��� ����� ������� ������� ��� ������ same
��� ����� ������� ������� � � ������ same
��� ����� ������� ������� ��� ������� same
���� ��� ����� ������� ������� ��� ������ same
��� ����� ������� ������� � � ������ same
���� ��� ���� ������� ������� ��� ������ same
��� ����� ������� ������� ��� �� ��� same
���� ��� ����� ������� ������� ��� �� ��� same
��� ����� ������� ����� � ��� ������ same
[Figure omitted: residual norm versus number of iterations for no preconditioner, block diagonal, and ILU.]
Figure �: Convergence history for solving the S�A case. ILUT parameters are τ = ���� and p = ���.
Table �: Numerical data of the S�B case (unknowns = ��� ���).
prec  τ  p  LUcpu  ratio_f  ratio_n  itnum  itcpu  sol
NONE � � ������� same
BLOCK ������� ����� �������� diff
ILUT ���� ��� ������� ������� ����� � ��� ������� same
��� ������� ������� ���� �� ��� � ����� same
��� ������� ������� ������� ����� ������ same
���� ��� ������� ������� ������� ��� ������ same
��� ������ ������� ���� �� ��� � ����� same
���� ��� ������� ������� ������� ��� ����� � same
��� ������� ������� ���� � � � � ��� � same
���� ��� ������� ������� ������� ��� ������ same
��� ������ ������� ������ ��� � � ��� same
[Figure omitted: residual norm versus number of iterations for no preconditioner, block diagonal, and ILU.]
Figure �: Convergence history for solving the S�B case. ILUT parameters are τ = ���� and p = ���.
Table ��: Numerical data of the P�A case (unknowns = ���� ���).
prec  τ  p  LUcpu  ratio_f  ratio_n  itnum  itcpu  sol
NONE ��� ������ same
BLOCK ������� ��� ������� same
ILUT ���� �� ������ ������� ������� �� �� ��� same
�� ������ ������� ������� �� ������ same
�� ������ ������� ������� �� ������ same
�� ������ ������� ��� ��� ��� ������� same
���� �� ������ ������� ����� �� ������ same
�� ���� � ������� ������� �� ������ same
�� ������ ������� ��� ��� ��� ������� same
���� �� ������ ������� ������� �� ���� � same
�� ������ ������� ������� �� ������ same
�� ������ ������� ��� ��� ��� � ����� same
���� �� ������ ������� ������� �� ������ same
�� ������ ������� ������ �� ���� � same
�� ������ ������� ��� ��� ��� ������� same
[Figure omitted: residual norm versus number of iterations for no preconditioner, block diagonal, and ILU.]
Figure ��: Convergence history for solving the P�A case. ILUT parameters are τ = ���� and p = ��.
entries of the matrix A insignificant in finite precision computations. In sparse matrix computations, such problems can occur when performing ILU factorization on indefinite matrices [7, 28].
In constructing an ILUT preconditioner for a dense matrix, we use single precision arithmetic because of the large number of floating point operations and the storage requirement. If the condition number of the matrix M^{-1} = (LU)^{-1} is large, say on the order of ���, computations in such a precision with the matrix M^{-1} are likely to be inaccurate. The preconditioned matrix M^{-1}A might then be worse conditioned than the original matrix A, and the preconditioned BiCG method may take more iterations to converge, or may not converge at all. In addition, since our implementation stops the iteration when the (preconditioned) residual norm has been reduced by a factor of ����, convergence may be declared while the actual residual norm is still large, if the initial residual norm is huge. Such a convergence is called a false convergence.
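One practical guard against false convergence is to recompute the true relative residual after the iteration reports success. The helper below is our sketch, not code from this report; the names, stand-in matrix, and thresholds are illustrative.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def bicg_with_residual_check(A, b, M=None, maxiter=1000, rel_tol=1e-4):
    """Run BiCG, then verify the true residual ||b - Ax|| / ||b||,
    rather than trusting only the residual the iteration monitors."""
    x, info = spla.bicg(A, b, M=M, maxiter=maxiter)
    true_rel_res = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
    return x, true_rel_res, (info == 0 and true_rel_res < rel_tol)

n = 150
# Illustrative diagonally dominant sparse matrix.
A = sp.random(n, n, density=0.05, random_state=2).tocsc()
A = A + sp.identity(n, format="csc") * 12.0
b = np.ones(n)
x, res, ok = bicg_with_residual_check(A, b)
print(res, ok)
```

The extra check costs one matrix-vector product, which is negligible next to the iteration itself.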
To distinguish a false convergence from a true convergence, or to determine whether a preconditioner is likely to be effective before the preconditioned iteration is carried out, it is informative to estimate the condition number of the (LU)^{-1} matrix. Chow and Saad propose estimating this number by calculating ||(LU)^{-1}e||_inf, where e is the vector of all ones; this statistic is called condest in [7]. We note that this statistic is also a lower bound for ||(LU)^{-1}||_inf, and it indicates a relation between unstable triangular solutions and poorly conditioned L and U factors [7, 28]. Computing the quantity ||(LU)^{-1}e||_inf requires just one forward solution and one back substitution with the L and U factors.
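In terms of SciPy's incomplete LU object, this estimate costs a single call to the factor's triangular solves. The `condest` helper below is our hypothetical implementation, applied to an illustrative sparse matrix rather than one of the report's dense systems.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def condest(ilu):
    """||(LU)^{-1} e||_inf for e = (1, ..., 1): one forward and one
    backward triangular solve; a lower bound for ||(LU)^{-1}||_inf."""
    e = np.ones(ilu.shape[0])
    return np.linalg.norm(ilu.solve(e), np.inf)

n = 200
# Illustrative diagonally dominant sparse matrix.
A = sp.random(n, n, density=0.04, random_state=3).tocsc()
A = A + sp.identity(n, format="csc") * 15.0
ilu = spla.spilu(A, drop_tol=1e-3, fill_factor=10)
c = condest(ilu)
print(c)
```

A moderate value suggests stable triangular solves; a huge value (say, beyond what single precision can represent accurately) warns that the preconditioner may cause exactly the false convergence discussed above.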
We calculate the condest numbers for the P�A and S�A cases in order to investigate what may happen in the ILUT factorization when solving a complex valued dense matrix. We find that the drop tolerance value τ (in the range of our studies) does not affect the value of condest very much; condest mostly depends on the fill-in parameter p. As the fill-in parameter p decreases and the drop tolerance parameter τ increases, the value of condest grows dramatically in certain situations. We also find that the values of condest are related to the solution patterns: if condest is very large, the preconditioned BiCG method does not converge within the given stopping
Table ��: Condest values of the ILUT preconditioner for solving the P�A case (unknowns = ��).
τ  p  condest  itnum  convergence  sol
���� �� ���� ��� � yes same
�� ������� �� yes same
�� � � ���� �� yes diff
�� �������E�� � yes diff
�� �������E�� � diverge �
���� �� �������� �� yes same
�� �������� �� yes same
�� � ������ �� yes diff
�� ���� ��E�� � yes diff
Table ��: Condest values of the ILUT preconditioner for solving the S�A case (unknowns = ��� ���).
τ  p  condest  itnum  convergence  sol
���� ��� �� ���� ��� yes same
��� �������� � � yes same
��� �������E��� ����� no diff
���� ��� �� �� � ��� yes same
��� �������� ��� yes same
��� �������E��� ����� no diff
criteria or diverges, and the computed solution is not good, as we see in Tables �� and ��.
In Table ��, the ILUT preconditioner with p = �� and τ = ���� converges in � iterations, but the computed solution is not good. This is an example of false convergence. We can determine the quality of this ILUT preconditioner without even starting the BiCG iteration process: the condest value of ��������� indicates that any single precision computation with (LU)^{-1} is likely to be inaccurate.
� Conclusions
In the implementation of the MLFMA, the ILU preconditioner based on a dual dropping strategy (ILUT) has been shown to reduce the number of BiCG iterations dramatically, compared both to the block diagonal preconditioner and to using no preconditioner. Solving the large complex valued dense linear systems arising from electromagnetic scattering by the BiCG method with an ILU preconditioner maintains the computational complexity of the MLFMA and consequently reduces the total CPU time.
Our numerical experiments show that the ILUT preconditioner constructed from the near part matrix improves computational efficiency in terms of both memory and CPU time. The results show that the BiCG accelerator with the ILUT preconditioner is robust for solving three dimensional model cases from electromagnetic scattering simulations.
One advantage of our implementation is that we do not need access to the global matrix to construct the preconditioner (ideal for the MLFMA, in which only AD and AN are numerically
available). This is in contrast to many existing preconditioning techniques, in either incomplete LU factorization form or sparse approximate inverse form, in which a sparsified global matrix may be needed to construct a preconditioner [�, �].
We also discussed the situations in which inaccurate or unstable ILUT preconditioners may be computed, and we proposed using a simple condest value to predict the quality of a constructed ILUT preconditioner before the preconditioned BiCG iteration is started. Our experimental results indicate that this strategy is practically useful in predicting false convergence and divergence of the preconditioned iterative solvers.
References

[1] G. Alléon, M. Benzi, and L. Giraud. Sparse approximate inverse preconditioning for dense linear systems arising in computational electromagnetics. Numer. Algorithms, �������, ����.

[2] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd edition. SIAM, Philadelphia, ����.

[3] B. Carpentieri, I. S. Duff, and L. Giraud. Experiments with sparse preconditioning of dense problems from electromagnetic applications. Technical Report TR/PA/��/��, CERFACS, Toulouse, France, ����.

[4] B. Carpentieri, I. S. Duff, and M. Magolu monga Made. Sparse symmetric preconditioners for dense linear systems in electromagnetism. Technical Report TR/PA/��/��, CERFACS, Toulouse, France, ����.

[5] K. Chen. On a class of preconditioning methods for dense linear systems from boundary elements. SIAM J. Sci. Comput., ��(�):�������, ����.

[6] W. C. Chew, J. M. Jin, E. Michielssen, and J. M. Song. Fast and Efficient Algorithms in Computational Electromagnetics. Artech House, Boston, ����.

[7] E. Chow and Y. Saad. Experimental study of ILU preconditioners for indefinite matrices. J. Comput. Appl. Math., ���������, ����.

[8] R. Coifman, V. Rokhlin, and S. Wandzura. The fast multipole method for the wave equation: a pedestrian prescription. IEEE Antennas Propagat. Mag., ��(�):�����, ����.

[9] E. Darve. The fast multipole method: numerical implementation. J. Comput. Phys., ����������, ����.

[10] R. Freund and N. Nachtigal. QMR: a quasi-minimal residual method for non-Hermitian linear systems. Numer. Math., ���������, ����.

[11] B. M. Kolundzija. Electromagnetic modeling of composite metallic and dielectric structures. IEEE Trans. Microwave Theory Tech., ��(�):���������, ����.

[12] C. Lanczos. Solution of systems of linear equations by minimized iterations. J. Research National Bureau Standards, ��������, ����.

[13] C. C. Lu and W. C. Chew. A multilevel algorithm for solving a boundary integral equation of wave scattering. Microwave Opt. Tech. Lett., �(�):������, ����.

[14] C. C. Lu and W. C. Chew. A coupled surface-volume integral equation approach for the calculation of electromagnetic scattering from composite metallic and material targets. IEEE Trans. Antennas Propagat., ��(��):��������, ����.

[15] B. N. Parlett, D. R. Taylor, and Z. A. Liu. A look-ahead Lanczos algorithm for unsymmetric matrices. Math. Comput., ����������, ����.

[16] J. Rahola. Experiments on iterative methods and the fast multipole method in electromagnetic scattering calculations. Technical Report TR/PA/��/��, CERFACS, Toulouse, France, ����.

[17] S. M. Rao, D. R. Wilton, and A. W. Glisson. Electromagnetic scattering by surfaces of arbitrary shape. IEEE Trans. Antennas Propagat., AP-��(�):��������, ����.

[18] V. Rokhlin. Rapid solution of integral equations of scattering theory in two dimensions. J. Comput. Phys., ��(�):��������, ����.

[19] Y. Saad. ILUT: a dual threshold incomplete LU factorization. Numer. Linear Algebra Appl., �(�):��������, ����.

[20] Y. Saad. Iterative Methods for Sparse Linear Systems. PWS Publishing Company, Boston, ����.

[21] K. Sertel and J. L. Volakis. Incomplete LU preconditioner for FMM implementation. Microwave Opt. Tech. Lett., ��(�):������, ����.

[22] T. K. Sarkar, S. M. Rao, and A. R. Djordjević. Electromagnetic scattering and radiation from finite microstrip structures. IEEE Trans. Microwave Theory Tech., ��(��):���������, ����.

[23] J. M. Song and W. C. Chew. Multilevel fast multipole algorithm for solving combined field integral equations of electromagnetic scattering. Microwave Opt. Tech. Lett., ��(�):������, ����.

[24] J. M. Song, C. C. Lu, and W. C. Chew. Multilevel fast multipole algorithm for electromagnetic scattering by large complex objects. IEEE Trans. Antennas Propagat., AP-��(��):���������, ����.

[25] T. Vaupel and V. Hansen. Electrodynamic analysis of combined microstrip and coplanar/slotline structures with 3-D components based on a surface/volume integral equation approach. IEEE Trans. Microwave Theory Tech., ��(�):���������, ����.

[26] C. F. Wang and J. M. Jin. A fast full-wave analysis of scattering and radiation from large finite arrays of microstrip antennas. IEEE Trans. Antennas Propagat., ��(�):���������, ����.

[27] J. Zhang. Sparse approximate inverse and multilevel block ILU preconditioning techniques for general sparse matrices. Appl. Numer. Math., ���������, ����.

[28] J. Zhang. A multilevel dual reordering strategy for robust incomplete LU factorization of indefinite matrices. SIAM J. Matrix Anal. Appl., ��(�):��������, ����.