a crash course on fast multipole method (fmm)hanliang/publications/fmm_tutorial_hanl… · fmm...

A Crash Course on Fast Multipole Method (FMM)

Hanliang Guo Kanso Biodynamical System Lab

AME 599, 2018 April

AME 599Apr 2018

What is FMM?

• A fast algorithm to study N body interactions (Gravitational, Coulombic interactions…)

2

AME 599Apr 2018

Top 10 algorithms in the 20th century*

3

• Metropolis Algorithm for Monte Carlo • Simplex Method for Linear Programming • Krylov Subspace Iteration Methods • The Decompositional Approach to Matrix Computations • The Fortran Optimizing Compiler • QR Algorithm for Computing Eigenvalues • Quicksort Algorithm for Sorting • Fast Fourier transform • Integer Relation Detection • Fast Multipole Method

*:chronological order J. Dongarra & F. Sullivan, Computing in Science & Engineering, (2000)

AME 599Apr 2018

What is FMM?

• N body interactions (Coulombic, Gravitational interactions…)

• Reduces complexity from O(N2) to O(N)

• An approximation with well defined error tolerance

• Not (too) difficult to implement

• Exists libraries to use directly

4

AME 599Apr 2018

Goal for this course

• Learn: • Understand the algorithm

(Why is it accurate… Why is it fast…) • Get a hand on my toy FMM code (matlab) • How to use existing libraries

• Future: • Adaptive mesh • Periodic boundary conditions

5

AME 599Apr 2018

Outline

• Complexity

• Multipole expansion

• Coarse graining

• N log(N) algorithm

• Fast multipole method (FMM)

• Application to Stokes flow

• Kernel independent FMM (KIFMM)

6

AME 599Apr 2018

Complexity

• How computational cost scales with the problem size

• Does not care the constant factor

• Extremely important for large N

• O(N2) complexity is usually considered infeasible!

• The constant factor is also very important in practice

• A naive implementation of N body interaction problem has O(N2) complexity

7

Reference Algorithm book…

AME 599Apr 2018

A working example

8

�x

o

(x) = � log(kx� x

o

k)

E

x

o

(x) =x� x

o

kx� x

o

k2

N=400

N source points with charges qi

Want to know: Potential and electrostatic forces at N target points Assume target points and source points are the same (not necessary)

Away from source point:

AME 599Apr 2018

A working example

9

�x

o

(x) = Re(� log(z � zo

))

E

x

o

(x) = r�x

o

(x)

N=400

N source points with charges qi

Want to know: Potential and electrostatic forces at N target points Assume target points and source points are the same (not necessary)

Away from source point:

AME 599Apr 2018

Multipole expansion

Suppose that m charges of strengths {qi, i=1,…,m} are located at points {zi, i=1,…,m}, with |zi|<r. Then for any z with |z|>r, the potential \phi(z) induced by the charges is given by

10

�(z) = Q log(z) +1X

k=1

akzk

Q =mX

i=1

qi ak =mX

i=1

�qizkik

Can we truncate the infinite series with a known error tolerance?

r

z

qi, zi Proof…

AME 599Apr 2018

Error bound

11

c =��z

r

�� , A =mX

i=1

|qi|, ↵ =A

1� | rz |

For any p>=1,��(z)�Q log(z)�pX

k=1

akzk

�� 1

p+ 1

↵��r

z

��p+1

✓

A

p+ 1

◆✓1

c� 1

◆✓1

c

◆p

cr

For any tolerance \epsilon, choose p s.t.

p = � logc ✏

r

z

qi, zi

AME 599Apr 2018

A naive coarse graining method

12

• Group particles in each box as clusters.

• Form a multipole expansion for each box.

• Compute interactions between distant clusters when possible.

• Add far-field and direct near-field (direct computations) together.

Complexity is still O(N2)!

The constant factor could be reduced.

AME 599Apr 2018

N log(N) algorithm

Consider clusters of particles at successive levels of spatial refinement, and to compute interactions between distant clusters by means of expansions when possible.

13

Divide and conquer!

AME 599Apr 2018

Near neighbors: two boxes are at the same refinement level and share a boundary point

Well separated: two boxes are at same refinement level and are not near neighbors

Interaction list: box i’s interaction list consists of the children of the near neighbors of i’s parent which are well separated from box i.

14

N log(N) algorithm

Near neighbors: two boxes are at the same refinement level and share a boundary point

Well separated: two boxes are at same refinement level and are not near neighbors

Interaction list: box i’s interaction list consists of the children of the near neighbors of i’s parent which are well separated from box i.

AME 599Apr 2018 15

Maximum size of interaction list is 27

N log(N) algorithm

AME 599Apr 2018

• At every level, the multipole expansion is formed for each box due to the particles it contains. The resulting expansion is then evaluated for each particle in the region covered by its interaction list.

• Halt the recursive process after roughly log(N) levels of refinement.

• At the finest level, compute interactions between nearest neighbors.

16

Operation count…

N log(N) algorithm

AME 599Apr 2018

Fast multipole method (FMM)

• Can we do better than Nlog(N)?

• The previous algorithm is of Nlog(N) complexity because all particles are “accessed” at every level of refinement.

• We can avoid this by some more analytical manipulations.

• In particular… Translation of a multipole expansion (M2M)

Conversion of a multipole expansion into a local expansion (M2L)

Translation of a local expansion (L2L)

17

AME 599Apr 2018

Translation of a multipole expansion (M2M)

Suppose that

is a multipole expansion of the potential due to a set of m charges of strengths q1,q2,…,qm, all of which are located inside the circle D of radius R with center at zo.

18

�(z) = ao

log(z � zo

) +

1X

k=1

ak

(z � zo

)

k

�(z) = ao

log(z) +1X

l=1

bl

zl

bl

= �ao

zlo

l+

lX

k=1

ak

zl�k

o

✓l � 1k � 1

◆where

Then for z outside the circle D1 of radius (R+|zo|) and center at the origin,

qi, zi

zo

R+ |zo

|R

z

O

D1

D

Proof…

AME 599Apr 2018

Error bound of M2M

For any p>=1,

19

��(z)� ao

log(z)�pX

l=1

bl

zl

��

0

@ A

1�� |zo|+R

z

��

1

A��|z

o

|+R

z

��p+1

A =mX

i=1

|qi|

Similar to the multipole expansion…��(z)�Q log(z)�

pX

k=1

akzk

�� 1

p+ 1

↵��r

z

��p+1

✓

A

p+ 1

◆✓1

c� 1

◆✓1

c

◆p

qi, zi

zo

R+ |zo

|R

z

O

D1

D

AME 599Apr 2018

Conversion of a multipole expansion into a local expansion (M2L)

Suppose that m charges of strengths q1,q2,…,qm are located inside the circle D1 with radius R and center zo, and that |zo|=(c+1)R with c>1. Then the corresponding multipole expansion converges inside the circle D2 of radius R centered about the origin. Inside D2, the potential due to the charges is described by a power series:

20

�(z) =1X

l=0

blzl

bo

= ao

log(�zo

) +

1X

k=1

ak

zko

(�1)

k

bl

= � ao

lzlo

+1

zlo

1X

k=1

ak

zko

✓l + k � 1k � 1

◆(�1)k

zo

R

z

R

qi, zi

O

�(z) = ao

log(z � zo

) +

1X

k=1

ak

(z � zo

)

k

AME 599Apr 2018

Error bounds of M2L

21

��(z)�pX

l=0

blzl

�� <A(4e(p+ c)(c+ 1) + c2)

c(c� 1)

✓1

c

◆p+1

Proof… See Greengard & Rokhlin JCP 1987…

zo

R

z

R

qi, zi

O

(c>1)

AME 599Apr 2018

Translation of a local expansion (L2L)

For any complex zo, z, and {ak}, k=0,1,2,…,n,

22

nX

k=0

ak

(z � zo

)k =nX

l=0

nX

k=l

ak

✓kl

◆(�z

o

)k�l

!zl

zo

O

nX

k=0

ak

(z � zo

)kExpand series close to the origin.

Immediate results of Maclaurin series. No error bound needed!

AME 599Apr 2018

More generally… (in case you want to code)

23

Multipole expansion about zC:

M2M from zC to zM:

M2L from zM to zL:

L2L from zL to zT:

�(z) =pX

l=0

cl(z � zL)l +O(✏)

co

= bo

log(zL

� zM

) +

pX

k=1

(�1)

kbk

(zM

� zL

)

k

, cl

= � bo

l(zM

� zL

)

l

+

1

(zM

� zL

)

l

pX

k=1

(�1)

kbk

(zM

� zL

)

k

✓l + k � 1

k � 1

◆

�(z) = bo

log(z � zM

) +

pX

l=1

bl

(z � zM

)

l

+O(✏)

bo

= ao

, bl

= �ao

(zC

� zM

)l

l+

lX

k=1

ak

(zC

� zM

)l�k

✓l � 1k � 1

◆

�(z) = ao

log(z � zC

) +

pX

k=1

ak

(z � zC

)

k

+O(

rp

Rp

)

ao

=mX

j=1

qj

, ak

=mX

j=1

�qj

(zj

� zC

)k

k

dl =pX

k=1

ck

✓kl

◆(zT � zL)

k�l

�(z) =pX

l=0

dl(z � zT )l

Ying et al. JCP (2004)

AME 599Apr 2018

The fast part!

• If we know the multipole expansion for all children boxes, we can get the multipole expansion for the parent box by doing M2M four times. (4p2 operations)

• If we know the multipole expansion for any box, we can get the local expansion for any well separated box by doing M2L one time. (p2 operations)

• If we know the local expansion for the parent box, we can get the local expansion for all the children boxes by doing L2L four times. (4p2 operations)

24

AME 599Apr 2018

FMM algorithm

• Initialization: Choose a number of levels so that there are, on average, s particles per box at the finest level.

• Upward Pass: Begin at the finest level, create multipole expansions from the source positions and strengths. The expansions for all boxes at all higher levels are the formed by the merging procedure (M2M).

• Downward Pass: We convert the multipole expansion into a local expansion about the center of all boxes in b’s interaction list (M2L). Shift local expansions from parent boxes to children boxes (L2L). Add far field and near field interactions at the finest level.

• Note: Within all steps, you only save the coefficients of the expansions.

25

AME 599Apr 2018

FMM algorithm

26

O(N) AlgorithmUpward Pass Downward Pass

1. Upward pass computes multipoles for all cells: M-to-M translation2. Downward pass translates multipoles to local terms for all cells• Constant (189 in 3D) interactive (cousin) cells per destination cell contribute to M-to-L translation• Inheritance from the parent cell: L-to-L translation (& delegation)

3. Direct interactions for the nearest-neighbor leaf cells

local-to-local

multipole-to-local

multipole-to-local

multipole-to-multipole

See lecture notes for the MM, ML & LL formula in 2D

Within parent’s neighbor but not my neighbor

Adapted from A. Nakano’s lecture note

AME 599Apr 2018

Algorithm - initialization & upward pass

• Initialization

• Choose a level of refinement n ≈ log4N, a precision \epsilon, and set p ≈ -log2(\epsilon)

• Upward pass

1. Form multipole expansions of potential field due to particles in each box about the box center at the finest mesh level

2. Form multipole expansions about the centers of all boxes at all coarser mesh levels, each expansion representing the potential field due to all particles contained in one box.

27

AME 599Apr 2018

Algorithm - downward pass

• Downward pass

3. Form a local expansion about the center of each box at each mesh level l<=n-1. This local expansion describes the field due to all particles in the system that are not contained in the current box or its nearest neighbors. Once the local expansion is obtained for a given box, it is shifted, in the second inner loop to the centers of the box’s children, forming the initial expansion for the boxes at the next level.

4. Compute interactions at finest mesh level.

5. Evaluate local expansions at particle positions.

6. Compute potential (or force) due to nearest neighbor directly.

7. Add direct and far-field terms together.

28

AME 599Apr 2018

Algorithm - complexity

29

Greengard & Rokhlin, JCP, 1987

AME 599Apr 2018

A toy problem in matlab

• www-scf.usc.edu/~hanliang/fmm_working.m

• www-scf.usc.edu/~hanliang/fmm.m (solution)

30

http://www-scf.usc.edu/~hanliang/fmm_working.m

http://www-scf.usc.edu/~hanliang/fmm.m

AME 599Apr 2018

Outline

• Complexity

• Multipole expansion

• Coarse graining

• N log(N) algorithm

• Fast multipole method (FMM)

• Application to Stokes flow

• Kernel independent FMM (KIFMM)

31

AME 599Apr 2018

Application to Stokes flow

One fundamental solution to Stokes flow is the Stokeslet

32

ui(x) = Sij(x,y)fj(y), i, j = 1, 2, 3

Sij(x,y) =�ij

|x� y| +(xi � yi)(xj � yj)

|x� y|3

�r2u(x) +rp(x) = f(y)�(x� y)

AME 599Apr 2018


33

For a set of N Stokeslets, velocity induced by the other Stokeslets at the m-th Stokeslet:

umi =

NX

n=1,n 6=m

3X

j=1

Sij(xm,xn)fn

j , m = 1, · · · , N

Direct evaluation of velocities at all N Stokeslets leads to O(N2) complexity

We seek a fast approach that can make use of the FMM to reduce the complexity to O(N)

3D Laplace FMM:

�(x,y) =1

|x� y| , Ei(x,y) =xi � yi

|x� y|3 Sij(x,y) =�ij

|x� y| +(xi � yi)(xj � yj)

|x� y|3

3D Stokes kernel:

AME 599Apr 2018


34


umi =

NX

n=1,n 6=m

3X

j=1

Sij(xm,xn)fn

j , m = 1, · · · , N

Direct evaluation of velocities at all N Stokeslets leads to O(N2) complexity

We seek a fast approach that can make use of the FMM to reduce the complexity to O(N)

3D Laplace FMM:

�(x,y) =1

|x� y| , Ei(x,y) = � @

@xi

1

|x� y|

3D Stokes kernel:

Sij(x,y) =

✓�ij � (xj � yj)

@

@xi

◆1

|x� y|

AME 599Apr 2018


35


umi =

NX

n=1,n 6=m

3X

j=1

Sij(xm,xn)fn

j , m = 1, · · · , N

Four FMM calls!

u

mi =

NX

n=1,n 6=m

3X

j=1

✓�ij � (xm

j � x

nj )

@

@x

mi

◆f

nj

|xm � x

n|

rnm = |xm � x

n|

u

mi =

3X

j=1

2

4✓�ij � x

mj

@

@x

mi

◆ NX

n=1,n 6=m

f

nj

rnm

3

5+@

@x

mi

NX

n=1,n 6=m

x

n · fn

rnm

AME 599Apr 2018

Kernel independent FMM (KIFMM)

• The traditional FMM works well as long as the multipole expansions for the underlying kernel K are known.

• Sometimes it is difficult/impossible to expand a kernel (e.g. the regularized Stokeslet)

• What can we do?

36

�(x) =

Z

�K(x, y)q(y)ds �(xi) =

X

j

K(xi, yj)q(yj)

AME 599Apr 2018

KIFMM - Basic idea

• To use a distribution of source point with known kernel (typically Laplace kernel) outside the box to represent the far-field/near-field potential generated by the source points inside/outside the box.

37

equivalent density, equivalent surface

AME 599Apr 2018

KIFMM - Uniqueness theorem

• If u is real and its first and second partial derivatives are continuous in a region V and on its boundary S, and where \rho, f and g are prescribed functions, then u is unique. (or at least up to a constant)

38

equivalent density, equivalent surface

r2u = ⇢ in V, u = f or @u/@n = g

AME 599Apr 2018

KIFMM - Multipole expansion

• Match the potential generated by the sources and the equivalent density at the check surface outside the equivalent surface.

• Two potentials are the same on and outside the check surface.(uniqueness of the 2nd order PDE’s solution for outer Dirichlet problem)

39

Upward equivalent density, Upward equivalent surface

Upward check surface 1. Evaluate induced potential of the sources on the check surface

2. Solve the upward equivalent density�(x(c)

i ) =X

j

K(x(c)i ,y(e)

j )q(y(e)j )

�(x(c)i ) =

X

j

K⇤(x(c)i ,y(s)

j )q⇤(y(s)j )

12

R(e)= (

p2 + d)r, R(c)

= (4�p2� 2d)r

0 d 4�p2

3

, 2r is the box size

AME 599Apr 2018

KIFMM - Local expansion

40

• Match the potential generated by the sources and the equivalent density at the check surface inside the equivalent surface.

• Two potentials are the same on and inside the check surface.(uniqueness of the 2nd order PDE’s solution for inner Dirichlet problem)

Downward check surface

1. Evaluate induced potential of the sources on the check surface

2. Solve the downward equivalent density�(x(c)

i ) =X

j

K(x(c)i ,y(e)

j )q(y(e)j )

�(x(c)i ) =

X

j

K⇤(x(c)i ,y(s)

j )q⇤(y(s)j )

1

2

Downward equivalent density, Downward equivalent surface

R(e)= (4�

p2� 2d)r, R(c)

= (

p2 + d)r

0 d 4�p2

3

, 2r is the box size

AME 599Apr 2018

KIFMM - M2M, M2L, L2L

41

M2M M2L L2L

1. Evaluate induced potential of the equivalent density 1 on the check surface 2. Solve for the equivalent density 2

Ying et al. JCP (2004)

�(x(c)i ) =

X

j

K(x(c)i ,y(e1)

j )q(y(e1)j ) �(x(c)

i ) =X

j

K(x(c)i ,y(e2)

j )q(y(e2)j )

AME 599Apr 2018

KIFMM - Notes

42

• Algorithm stays exactly* the same as the traditional FMM.

• No need to expand any kernels!

• Still need to evaluate induced potential for the new kernel at the leaf boxes (bottom level)

• Equivalent densities are found by solving p algebraic equations (p is the number of discretization points on the surface, usually small)

AME 599Apr 2018

Algorithm - initialization & upward pass

• Initialization

• Choose a level of refinement n ≈ log4N, a precision \epsilon, and set p ≈ -log2(\epsilon)

• Upward pass

1. Form multipole expansions of potential field due to particles in each box about the box center at the finest mesh level

2. Form multipole expansions about the centers of all boxes at all coarser mesh levels, each expansion representing the potential field due to all particles contained in one box.

43

AME 599Apr 2018

Algorithm - downward pass

• Downward pass

3. Form a local expansion about the center of each box at each mesh level l<=n-1. This local expansion describes the field due to all particles in the system that are not contained in the current box or its nearest neighbors. Once the local expansion is obtained for a given box, it is shifted, in the second inner loop to the centers of the box’s children, forming the initial expansion for the boxes at the next level.

4. Compute interactions at finest mesh level.

5. Evaluate local expansions at particle positions.

6. Compute potential (or force) due to nearest neighbor directly.

7. Add direct and far-field terms together.

44

AME 599Apr 2018

KIFMM - A toy problem in matlab

• www-scf.usc.edu/~hanliang/kifmm_working.m

• www-scf.usc.edu/~hanliang/kifmm.m (solution)

45

http://www-scf.usc.edu/~hanliang/kifmm_working.m

http://www-scf.usc.edu/~hanliang/kifmm.m

AME 599Apr 2018

References

• L. Greengard & V. Rokhlin, A fast algorithm for particle simulations, Journal of Computational Physics, 1987

• R. Beatson & L. Greengard, A short course on fast multipole methods, in Wavelets, Multilevel Methods and Elliptic PDEs, Clarendon Press, Oxford, 1997

• H. Cheng et al., A fast adaptive multipole algorithm in three dimensions, Journal of Computational Physics, 1999

• L.X. Ying et al., A kernel-independent adaptive fast multipole algorithm in two and three dimensions, Journal of Computational Physics, 2004

• A-K Tornberg & L. Greengard, A fast multipole method for the three-dimensional Stokes equations, Journal of Computational Physics, 2008

• D. Malhotra & G. Biros, PVFMM: A parallel kernel independent FMM for particle and volume potentials, Communications in Computational Physics, 2015

• W. Yan & M. Shelley, Flexibly imposing periodicity in kernel independent FMM: A Multipole-To-Local operator approach, Journal of Computational Physics, 2018

46

AME 599Apr 2018

Libraries

• Greengard: https://cims.nyu.edu/cmcl/software.html

• PVFMM: https://github.com/dmalhotra/pvfmm

47

https://cims.nyu.edu/cmcl/software.html



https://github.com/dmalhotra/pvfmm

a crash course on fast multipole method (fmm)hanliang/publications/fmm_tutorial_hanl… · fmm...

Documents