a crash course on fast multipole method (fmm)hanliang/publications/fmm_tutorial_hanl… · fmm...
TRANSCRIPT
A Crash Course on Fast Multipole Method (FMM)
Hanliang Guo Kanso Biodynamical System Lab
AME 599, 2018 April
AME 599Apr 2018
What is FMM?
• A fast algorithm to study N body interactions (Gravitational, Coulombic interactions…)
2
AME 599Apr 2018
Top 10 algorithms in the 20th century*
3
• Metropolis Algorithm for Monte Carlo • Simplex Method for Linear Programming • Krylov Subspace Iteration Methods • The Decompositional Approach to Matrix Computations • The Fortran Optimizing Compiler • QR Algorithm for Computing Eigenvalues • Quicksort Algorithm for Sorting • Fast Fourier transform • Integer Relation Detection • Fast Multipole Method
*:chronological order J. Dongarra & F. Sullivan, Computing in Science & Engineering, (2000)
AME 599Apr 2018
What is FMM?
• N body interactions (Coulombic, Gravitational interactions…)
• Reduces complexity from O(N2) to O(N)
• An approximation with well defined error tolerance
• Not (too) difficult to implement
• Exists libraries to use directly
4
AME 599Apr 2018
Goal for this course
• Learn: • Understand the algorithm
(Why is it accurate… Why is it fast…) • Get a hand on my toy FMM code (matlab) • How to use existing libraries
• Future: • Adaptive mesh • Periodic boundary conditions
5
AME 599Apr 2018
Outline
• Complexity
• Multipole expansion
• Coarse graining
• N log(N) algorithm
• Fast multipole method (FMM)
• Application to Stokes flow
• Kernel independent FMM (KIFMM)
6
AME 599Apr 2018
Complexity
• How computational cost scales with the problem size
• Does not care the constant factor
• Extremely important for large N
• O(N2) complexity is usually considered infeasible!
• The constant factor is also very important in practice
• A naive implementation of N body interaction problem has O(N2) complexity
7
Reference Algorithm book…
AME 599Apr 2018
A working example
8
�x
o
(x) = � log(kx� x
o
k)
E
x
o
(x) =x� x
o
kx� x
o
k2
N=400
N source points with charges qi
Want to know: Potential and electrostatic forces at N target points Assume target points and source points are the same (not necessary)
Away from source point:
AME 599Apr 2018
A working example
9
�x
o
(x) = Re(� log(z � zo
))
E
x
o
(x) = r�x
o
(x)
N=400
N source points with charges qi
Want to know: Potential and electrostatic forces at N target points Assume target points and source points are the same (not necessary)
Away from source point:
AME 599Apr 2018
Multipole expansion
Suppose that m charges of strengths {qi, i=1,…,m} are located at points {zi, i=1,…,m}, with |zi|<r. Then for any z with |z|>r, the potential \phi(z) induced by the charges is given by
10
�(z) = Q log(z) +1X
k=1
akzk
Q =mX
i=1
qi ak =mX
i=1
�qizkik
Can we truncate the infinite series with a known error tolerance?
r
z
qi, zi Proof…
AME 599Apr 2018
Error bound
11
c =���z
r
��� , A =mX
i=1
|qi|, ↵ =A
1� | rz |
For any p>=1,������(z)�Q log(z)�pX
k=1
akzk
����� 1
p+ 1
↵���r
z
���p+1
✓
A
p+ 1
◆✓1
c� 1
◆✓1
c
◆p
cr
For any tolerance \epsilon, choose p s.t.
p = � logc ✏
r
z
qi, zi
AME 599Apr 2018
A naive coarse graining method
12
• Group particles in each box as clusters.
• Form a multipole expansion for each box.
• Compute interactions between distant clusters when possible.
• Add far-field and direct near-field (direct computations) together.
Complexity is still O(N2)!
The constant factor could be reduced.
AME 599Apr 2018
N log(N) algorithm
Consider clusters of particles at successive levels of spatial refinement, and to compute interactions between distant clusters by means of expansions when possible.
13
Divide and conquer!
AME 599Apr 2018
Near neighbors: two boxes are at the same refinement level and share a boundary point
Well separated: two boxes are at same refinement level and are not near neighbors
Interaction list: box i’s interaction list consists of the children of the near neighbors of i’s parent which are well separated from box i.
14
N log(N) algorithm
Near neighbors: two boxes are at the same refinement level and share a boundary point
Well separated: two boxes are at same refinement level and are not near neighbors
Interaction list: box i’s interaction list consists of the children of the near neighbors of i’s parent which are well separated from box i.
AME 599Apr 2018 15
Maximum size of interaction list is 27
N log(N) algorithm
AME 599Apr 2018
• At every level, the multipole expansion is formed for each box due to the particles it contains. The resulting expansion is then evaluated for each particle in the region covered by its interaction list.
• Halt the recursive process after roughly log(N) levels of refinement.
• At the finest level, compute interactions between nearest neighbors.
16
Operation count…
N log(N) algorithm
AME 599Apr 2018
Fast multipole method (FMM)
• Can we do better than Nlog(N)?
• The previous algorithm is of Nlog(N) complexity because all particles are “accessed” at every level of refinement.
• We can avoid this by some more analytical manipulations.
• In particular… Translation of a multipole expansion (M2M)
Conversion of a multipole expansion into a local expansion (M2L)
Translation of a local expansion (L2L)
17
AME 599Apr 2018
Translation of a multipole expansion (M2M)
Suppose that
is a multipole expansion of the potential due to a set of m charges of strengths q1,q2,…,qm, all of which are located inside the circle D of radius R with center at zo.
18
�(z) = ao
log(z � zo
) +
1X
k=1
ak
(z � zo
)
k
�(z) = ao
log(z) +1X
l=1
bl
zl
bl
= �ao
zlo
l+
lX
k=1
ak
zl�k
o
✓l � 1k � 1
◆where
Then for z outside the circle D1 of radius (R+|zo|) and center at the origin,
qi, zi
zo
R+ |zo
|R
z
O
D1
D
Proof…
AME 599Apr 2018
Error bound of M2M
For any p>=1,
19
������(z)� ao
log(z)�pX
l=1
bl
zl
�����
0
@ A
1���� |zo|+R
z
���
1
A����|z
o
|+R
z
����p+1
A =mX
i=1
|qi|
Similar to the multipole expansion…������(z)�Q log(z)�
pX
k=1
akzk
����� 1
p+ 1
↵���r
z
���p+1
✓
A
p+ 1
◆✓1
c� 1
◆✓1
c
◆p
qi, zi
zo
R+ |zo
|R
z
O
D1
D
AME 599Apr 2018
Conversion of a multipole expansion into a local expansion (M2L)
Suppose that m charges of strengths q1,q2,…,qm are located inside the circle D1 with radius R and center zo, and that |zo|=(c+1)R with c>1. Then the corresponding multipole expansion converges inside the circle D2 of radius R centered about the origin. Inside D2, the potential due to the charges is described by a power series:
20
�(z) =1X
l=0
blzl
bo
= ao
log(�zo
) +
1X
k=1
ak
zko
(�1)
k
bl
= � ao
lzlo
+1
zlo
1X
k=1
ak
zko
✓l + k � 1k � 1
◆(�1)k
zo
R
z
R
qi, zi
O
�(z) = ao
log(z � zo
) +
1X
k=1
ak
(z � zo
)
k
AME 599Apr 2018
Error bounds of M2L
21
������(z)�pX
l=0
blzl
����� <A(4e(p+ c)(c+ 1) + c2)
c(c� 1)
✓1
c
◆p+1
Proof… See Greengard & Rokhlin JCP 1987…
zo
R
z
R
qi, zi
O
(c>1)
AME 599Apr 2018
Translation of a local expansion (L2L)
For any complex zo, z, and {ak}, k=0,1,2,…,n,
22
nX
k=0
ak
(z � zo
)k =nX
l=0
nX
k=l
ak
✓kl
◆(�z
o
)k�l
!zl
zo
O
nX
k=0
ak
(z � zo
)kExpand series close to the origin.
Immediate results of Maclaurin series. No error bound needed!
AME 599Apr 2018
More generally… (in case you want to code)
23
Multipole expansion about zC:
M2M from zC to zM:
M2L from zM to zL:
L2L from zL to zT:
�(z) =pX
l=0
cl(z � zL)l +O(✏)
co
= bo
log(zL
� zM
) +
pX
k=1
(�1)
kbk
(zM
� zL
)
k
, cl
= � bo
l(zM
� zL
)
l
+
1
(zM
� zL
)
l
pX
k=1
(�1)
kbk
(zM
� zL
)
k
✓l + k � 1
k � 1
◆
�(z) = bo
log(z � zM
) +
pX
l=1
bl
(z � zM
)
l
+O(✏)
bo
= ao
, bl
= �ao
(zC
� zM
)l
l+
lX
k=1
ak
(zC
� zM
)l�k
✓l � 1k � 1
◆
�(z) = ao
log(z � zC
) +
pX
k=1
ak
(z � zC
)
k
+O(
rp
Rp
)
ao
=mX
j=1
qj
, ak
=mX
j=1
�qj
(zj
� zC
)k
k
dl =pX
k=1
ck
✓kl
◆(zT � zL)
k�l
�(z) =pX
l=0
dl(z � zT )l
Ying et al. JCP (2004)
AME 599Apr 2018
The fast part!
• If we know the multipole expansion for all children boxes, we can get the multipole expansion for the parent box by doing M2M four times. (4p2 operations)
• If we know the multipole expansion for any box, we can get the local expansion for any well separated box by doing M2L one time. (p2 operations)
• If we know the local expansion for the parent box, we can get the local expansion for all the children boxes by doing L2L four times. (4p2 operations)
24
AME 599Apr 2018
FMM algorithm
• Initialization: Choose a number of levels so that there are, on average, s particles per box at the finest level.
• Upward Pass: Begin at the finest level, create multipole expansions from the source positions and strengths. The expansions for all boxes at all higher levels are the formed by the merging procedure (M2M).
• Downward Pass: We convert the multipole expansion into a local expansion about the center of all boxes in b’s interaction list (M2L). Shift local expansions from parent boxes to children boxes (L2L). Add far field and near field interactions at the finest level.
• Note: Within all steps, you only save the coefficients of the expansions.
25
AME 599Apr 2018
FMM algorithm
26
O(N) AlgorithmUpward Pass Downward Pass
1. Upward pass computes multipoles for all cells: M-to-M translation2. Downward pass translates multipoles to local terms for all cells• Constant (189 in 3D) interactive (cousin) cells per destination cell contribute to M-to-L translation• Inheritance from the parent cell: L-to-L translation (& delegation)
3. Direct interactions for the nearest-neighbor leaf cells
local-to-local
multipole-to-local
multipole-to-local
multipole-to-multipole
See lecture notes for the MM, ML & LL formula in 2D
Within parent’s neighbor but not my neighbor
Adapted from A. Nakano’s lecture note
AME 599Apr 2018
Algorithm - initialization & upward pass
• Initialization
• Choose a level of refinement n ≈ log4N, a precision \epsilon, and set p ≈ -log2(\epsilon)
• Upward pass
1. Form multipole expansions of potential field due to particles in each box about the box center at the finest mesh level
2. Form multipole expansions about the centers of all boxes at all coarser mesh levels, each expansion representing the potential field due to all particles contained in one box.
27
AME 599Apr 2018
Algorithm - downward pass
• Downward pass
3. Form a local expansion about the center of each box at each mesh level l<=n-1. This local expansion describes the field due to all particles in the system that are not contained in the current box or its nearest neighbors. Once the local expansion is obtained for a given box, it is shifted, in the second inner loop to the centers of the box’s children, forming the initial expansion for the boxes at the next level.
4. Compute interactions at finest mesh level.
5. Evaluate local expansions at particle positions.
6. Compute potential (or force) due to nearest neighbor directly.
7. Add direct and far-field terms together.
28
AME 599Apr 2018
Algorithm - complexity
29
Greengard & Rokhlin, JCP, 1987
AME 599Apr 2018
A toy problem in matlab
• www-scf.usc.edu/~hanliang/fmm_working.m
• www-scf.usc.edu/~hanliang/fmm.m (solution)
30
AME 599Apr 2018
Outline
• Complexity
• Multipole expansion
• Coarse graining
• N log(N) algorithm
• Fast multipole method (FMM)
• Application to Stokes flow
• Kernel independent FMM (KIFMM)
31
AME 599Apr 2018
Application to Stokes flow
One fundamental solution to Stokes flow is the Stokeslet
32
ui(x) = Sij(x,y)fj(y), i, j = 1, 2, 3
Sij(x,y) =�ij
|x� y| +(xi � yi)(xj � yj)
|x� y|3
�r2u(x) +rp(x) = f(y)�(x� y)
AME 599Apr 2018
Application to Stokes flow
33
For a set of N Stokeslets, velocity induced by the other Stokeslets at the m-th Stokeslet:
umi =
NX
n=1,n 6=m
3X
j=1
Sij(xm,xn)fn
j , m = 1, · · · , N
Direct evaluation of velocities at all N Stokeslets leads to O(N2) complexity
We seek a fast approach that can make use of the FMM to reduce the complexity to O(N)
3D Laplace FMM:
�(x,y) =1
|x� y| , Ei(x,y) =xi � yi
|x� y|3 Sij(x,y) =�ij
|x� y| +(xi � yi)(xj � yj)
|x� y|3
3D Stokes kernel:
AME 599Apr 2018
Application to Stokes flow
34
For a set of N Stokeslets, velocity induced by the other Stokeslets at the m-th Stokeslet:
umi =
NX
n=1,n 6=m
3X
j=1
Sij(xm,xn)fn
j , m = 1, · · · , N
Direct evaluation of velocities at all N Stokeslets leads to O(N2) complexity
We seek a fast approach that can make use of the FMM to reduce the complexity to O(N)
3D Laplace FMM:
�(x,y) =1
|x� y| , Ei(x,y) = � @
@xi
1
|x� y|
3D Stokes kernel:
Sij(x,y) =
✓�ij � (xj � yj)
@
@xi
◆1
|x� y|
AME 599Apr 2018
Application to Stokes flow
35
For a set of N Stokeslets, velocity induced by the other Stokeslets at the m-th Stokeslet:
umi =
NX
n=1,n 6=m
3X
j=1
Sij(xm,xn)fn
j , m = 1, · · · , N
Four FMM calls!
u
mi =
NX
n=1,n 6=m
3X
j=1
✓�ij � (xm
j � x
nj )
@
@x
mi
◆f
nj
|xm � x
n|
rnm = |xm � x
n|
u
mi =
3X
j=1
2
4✓�ij � x
mj
@
@x
mi
◆ NX
n=1,n 6=m
f
nj
rnm
3
5+@
@x
mi
NX
n=1,n 6=m
x
n · fn
rnm
AME 599Apr 2018
Kernel independent FMM (KIFMM)
• The traditional FMM works well as long as the multipole expansions for the underlying kernel K are known.
• Sometimes it is difficult/impossible to expand a kernel (e.g. the regularized Stokeslet)
• What can we do?
36
�(x) =
Z
�K(x, y)q(y)ds �(xi) =
X
j
K(xi, yj)q(yj)
AME 599Apr 2018
KIFMM - Basic idea
• To use a distribution of source point with known kernel (typically Laplace kernel) outside the box to represent the far-field/near-field potential generated by the source points inside/outside the box.
37
equivalent density, equivalent surface
AME 599Apr 2018
KIFMM - Uniqueness theorem
• If u is real and its first and second partial derivatives are continuous in a region V and on its boundary S, and where \rho, f and g are prescribed functions, then u is unique. (or at least up to a constant)
38
equivalent density, equivalent surface
r2u = ⇢ in V, u = f or @u/@n = g
AME 599Apr 2018
KIFMM - Multipole expansion
• Match the potential generated by the sources and the equivalent density at the check surface outside the equivalent surface.
• Two potentials are the same on and outside the check surface.(uniqueness of the 2nd order PDE’s solution for outer Dirichlet problem)
39
Upward equivalent density, Upward equivalent surface
Upward check surface 1. Evaluate induced potential of the sources on the check surface
2. Solve the upward equivalent density�(x(c)
i ) =X
j
K(x(c)i ,y(e)
j )q(y(e)j )
�(x(c)i ) =
X
j
K⇤(x(c)i ,y(s)
j )q⇤(y(s)j )
12
R(e)= (
p2 + d)r, R(c)
= (4�p2� 2d)r
0 d 4�p2
3
, 2r is the box size
AME 599Apr 2018
KIFMM - Local expansion
40
• Match the potential generated by the sources and the equivalent density at the check surface inside the equivalent surface.
• Two potentials are the same on and inside the check surface.(uniqueness of the 2nd order PDE’s solution for inner Dirichlet problem)
Downward check surface
1. Evaluate induced potential of the sources on the check surface
2. Solve the downward equivalent density�(x(c)
i ) =X
j
K(x(c)i ,y(e)
j )q(y(e)j )
�(x(c)i ) =
X
j
K⇤(x(c)i ,y(s)
j )q⇤(y(s)j )
1
2
Downward equivalent density, Downward equivalent surface
R(e)= (4�
p2� 2d)r, R(c)
= (
p2 + d)r
0 d 4�p2
3
, 2r is the box size
AME 599Apr 2018
KIFMM - M2M, M2L, L2L
41
M2M M2L L2L
1. Evaluate induced potential of the equivalent density 1 on the check surface 2. Solve for the equivalent density 2
Ying et al. JCP (2004)
�(x(c)i ) =
X
j
K(x(c)i ,y(e1)
j )q(y(e1)j ) �(x(c)
i ) =X
j
K(x(c)i ,y(e2)
j )q(y(e2)j )
AME 599Apr 2018
KIFMM - Notes
42
• Algorithm stays exactly* the same as the traditional FMM.
• No need to expand any kernels!
• Still need to evaluate induced potential for the new kernel at the leaf boxes (bottom level)
• Equivalent densities are found by solving p algebraic equations (p is the number of discretization points on the surface, usually small)
AME 599Apr 2018
Algorithm - initialization & upward pass
• Initialization
• Choose a level of refinement n ≈ log4N, a precision \epsilon, and set p ≈ -log2(\epsilon)
• Upward pass
1. Form multipole expansions of potential field due to particles in each box about the box center at the finest mesh level
2. Form multipole expansions about the centers of all boxes at all coarser mesh levels, each expansion representing the potential field due to all particles contained in one box.
43
AME 599Apr 2018
Algorithm - downward pass
• Downward pass
3. Form a local expansion about the center of each box at each mesh level l<=n-1. This local expansion describes the field due to all particles in the system that are not contained in the current box or its nearest neighbors. Once the local expansion is obtained for a given box, it is shifted, in the second inner loop to the centers of the box’s children, forming the initial expansion for the boxes at the next level.
4. Compute interactions at finest mesh level.
5. Evaluate local expansions at particle positions.
6. Compute potential (or force) due to nearest neighbor directly.
7. Add direct and far-field terms together.
44
AME 599Apr 2018
KIFMM - A toy problem in matlab
• www-scf.usc.edu/~hanliang/kifmm_working.m
• www-scf.usc.edu/~hanliang/kifmm.m (solution)
45
AME 599Apr 2018
References
• L. Greengard & V. Rokhlin, A fast algorithm for particle simulations, Journal of Computational Physics, 1987
• R. Beatson & L. Greengard, A short course on fast multipole methods, in Wavelets, Multilevel Methods and Elliptic PDEs, Clarendon Press, Oxford, 1997
• H. Cheng et al., A fast adaptive multipole algorithm in three dimensions, Journal of Computational Physics, 1999
• L.X. Ying et al., A kernel-independent adaptive fast multipole algorithm in two and three dimensions, Journal of Computational Physics, 2004
• A-K Tornberg & L. Greengard, A fast multipole method for the three-dimensional Stokes equations, Journal of Computational Physics, 2008
• D. Malhotra & G. Biros, PVFMM: A parallel kernel independent FMM for particle and volume potentials, Communications in Computational Physics, 2015
• W. Yan & M. Shelley, Flexibly imposing periodicity in kernel independent FMM: A Multipole-To-Local operator approach, Journal of Computational Physics, 2018
46
AME 599Apr 2018
Libraries
• Greengard: https://cims.nyu.edu/cmcl/software.html
• PVFMM: https://github.com/dmalhotra/pvfmm
47