
Simulations with (hybrid) Monte Carlo Algorithms

— Introduction for beginners —

André Sternbeck
University of Jena, Germany

Helmholtz International Summer School, August/September 2014, JINR Dubna


Literature

Introduction

Why stochastic methods for integration?

Random numbers with non-uniform probability distribution

Simulation Algorithms for lattice gauge theories

Metropolis algorithm

HOR algorithm (heatbath + overrelaxation)

Hybrid Monte Carlo preliminaries

HMC algorithm for scalar field theory

HMC for lattice QCD simulations

HMC diagnostics

Inverting the fermion matrix using the CG

HMC: Adding another flavor

HMC: Improvements


Monographs: particularly useful for the present topic

GL2010  “Quantum Chromodynamics on the Lattice” by C. Gattringer and C.B. Lang, Springer 2010.

R2010   “Lattice Gauge Theories” by H.J. Rothe, 3rd edition, World Scientific 2005.

MM1994  “Quantum Fields on a Lattice” by I. Montvay and G. Münster, Cambridge 1994.

L2010   “Computational Strategies in Lattice QCD” by M. Lüscher, [arXiv:1002.4232], Summer School on “Modern perspectives in lattice QCD”, Les Houches, August 3–28, 2009.

S2010   “Simulations with the Hybrid Monte Carlo algorithm: implementation and data analysis” by S. Schäfer, Summer School on “Modern perspectives in lattice QCD”, Les Houches, August 3–28, 2009.


Expectation values in lattice QCD

The expectation value of an observable in lattice QCD is given by

    \langle O \rangle = \frac{1}{Z} \int \mathcal{D}U \, \mathcal{D}\bar\psi \, \mathcal{D}\psi \; O[U,\psi,\bar\psi] \, e^{-S[U,\psi,\bar\psi]}    (1)

- Z denotes the “partition function”

    Z = \int \mathcal{D}U \, \mathcal{D}\bar\psi \, \mathcal{D}\psi \; e^{-S[U,\psi,\bar\psi]}    (2)

- S denotes a lattice QCD action; there are many choices (real-valued functional of the link and fermionic variables)

    S[U,\psi,\bar\psi] = \underbrace{S_g[U]}_{\text{gauge part}} + \underbrace{S_f[U,\psi,\bar\psi]}_{\text{fermionic part}}    (3)

- O denotes an arbitrary observable, which is a (simple or complicated) functional of U, \psi and \bar\psi

Expectation values in lattice QCD

The expectation value of an observable in lattice QCD is given by

    \langle O \rangle = \frac{1}{Z} \int \mathcal{D}U \, \mathcal{D}\bar\psi \, \mathcal{D}\psi \; O[U,\psi,\bar\psi] \, e^{-S[U,\psi,\bar\psi]}    (1)

- Link variables U_{x\mu} sit on the lattice links from coordinate x to x + \hat\mu
- Each U_{x\mu} \in SU(3) is a complex 3x3 matrix
  - \det U_{x\mu} = 1, U^\dagger_{x\mu} = U^{-1}_{x\mu}
  - parametrized by (N_c^2 - 1) = 8 real parameters
- Other gauge theories (not QCD):
  - SU(2): U_{x\mu} \in SU(2), parametrized by 3 real parameters
  - compact U(1): U_{x\mu} \in \mathbb{C}, parametrized by 1 real parameter (an angle)
- \mathcal{D}U = \prod_{x,\mu} dU_{x\mu} is the integration measure of the link variables U_{x\mu}

Expectation values in lattice QCD

The expectation value of an observable in lattice QCD is given by

    \langle O \rangle = \frac{1}{Z} \int \mathcal{D}U \, \mathcal{D}\bar\psi \, \mathcal{D}\psi \; O[U,\psi,\bar\psi] \, e^{-S[U,\psi,\bar\psi]}    (1)

- At each site x, \psi_x and \bar\psi_x have 3x4 components (a = 1..3, \alpha = 1..4)
- \psi_x = \psi^{a,\alpha}(x) and \bar\psi_x = \bar\psi^{a,\alpha}(x) are Grassmann-valued fields
- \mathcal{D}\psi = \prod_{x,a,\alpha} d\psi^{a,\alpha}_x is the integration measure, with d\bar\psi^{a,\alpha}_x \, d\psi^{b,\beta}_x the integration measure of a pair of Grassmann numbers
- Integration over the Grassmann variables can be done exactly:

    \det M \sum_{k_1\cdots k_n} \epsilon^{j_1 j_2 \cdots j_n}_{k_1 k_2 \cdots k_n} \, M^{-1}_{k_1 i_1} \cdots M^{-1}_{k_n i_n} = \int \mathcal{D}\bar\psi \, \mathcal{D}\psi \; \psi_{j_1}\bar\psi_{i_1} \cdots \psi_{j_n}\bar\psi_{i_n} \, \underbrace{e^{-\bar\psi M \psi}}_{e^{-S_f}}    (4)

- M denotes the lattice Dirac (fermion) matrix; the determinant makes the action non-local

Expectation values in lattice QCD

After integration over the (Grassmann) fermionic fields,

    \langle O \rangle = \frac{1}{Z} \int \mathcal{D}U \; O[U] \, e^{-S_{\rm eff}[U]}    (5)

- The effective action S_eff is a real-valued but non-local functional of the link variables only:

    S_{\rm eff} = S_g[U] - \ln\det M[U]    (6)

- Wilson gauge action:

    S_g = S_W := \beta \sum_{x,\,\mu<\nu} \left(1 - \tfrac{1}{3}\,\mathrm{Re}\,\mathrm{Tr}\, U_{x\mu} U_{x+\hat\mu,\nu} U^\dagger_{x+\hat\nu,\mu} U^\dagger_{x\nu}\right)

- Wilson Dirac matrix:

    M^{(q)}_{xy}[U] = \delta_{qq'}\Big(\delta_{xy} - \kappa_q \sum_{\pm\mu} \delta_{y,x+\hat\mu}\,(1+\gamma_\mu)\, U_{x\mu}\Big)

- The observable O[U] is now also a functional of the links only
  - Average plaquette: O[U] = \frac{1}{6V} \sum_{x,\,\mu<\nu} \mathrm{Re}\,\mathrm{Tr}\, U_{x\mu} U_{x+\hat\mu,\nu} U^\dagger_{x+\hat\nu,\mu} U^\dagger_{x\nu}
  - Hadronic 2-point function: O[U] \sim \sum_{\rm contractions} M^{-1}[U]_{n_i n_j} \, M^{-1}[U]_{n_k n_l}

Expectation values in lattice QCD

Still, these expectation values are tremendously high-dimensional integrals which have to be solved numerically (stochastically).

Example

State-of-the-art lattice QCD simulations are performed on lattice sizes

    L_s^3 \times L_t = 64^3 \times 128

Number of integration variables (N_c = 3, N_d = 4):

    64^3 \times 128 \times (N_c^2 - 1) \times N_d = 1\,073\,741\,824
    16^3 \times 32  \times (N_c^2 - 1) \times N_d = 4\,194\,304

For each of these variables a numerical integration has to be performed. Therefore, statistical methods must be employed for the numerical integration.

Monte Carlo estimates of expectation values

After integration over the (Grassmann) fermionic fields,

    \langle O \rangle = \frac{1}{Z} \int \mathcal{D}U \; O[U] \, e^{-S_{\rm eff}[U]}    (7)

The integral will be dominated by link configurations with small action values:

    P[U] = \frac{1}{Z} e^{-S[U]}    (8)

We need to employ “importance sampling”, i.e., generate U with high probability.

Markov chain: U^{(1)}, U^{(2)}, \ldots, U^{(N)}

\langle O \rangle can be estimated by the Monte Carlo average \bar O:

    \langle O \rangle \approx \bar O \equiv \frac{1}{N} \sum_{i=1}^{N} O[U^{(i)}]    (9)

[Plot: e^{-S[U]} as a function of S[U], illustrating that configurations with small action dominate.]

This lecture

This lecture will teach you how these Markov chains U^{(1)}, U^{(2)}, \ldots, U^{(N)} are generated:

- Metropolis algorithm
- Heat-bath algorithm
- Hybrid Monte Carlo (HMC)

To better understand these algorithms, it is instructive to look first at the generation of real numbers x \in \mathbb{R} for a given probability distribution p(x).

Random numbers with uniform probability distribution

- Probability for a uniformly distributed random number r to fall in [x, x+dx]:

    p(x)\,dx = \begin{cases} dx & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}    (10)

  The probability density is normalized such that

    \int_{-\infty}^{\infty} p(x)\,dx = 1    (11)

- Popular pseudo-random number generators (with large period lengths), well suited for parallel computing:
  - RANLUX: Lüscher [Comput. Phys. Commun. 79 (1994) 100], http://luscher.web.cern.ch/luscher/ranlux
  - Mersenne Twister: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html

Random numbers with non-uniform probability distribution

For many physics applications we need to sample random numbers

- which are not uniformly distributed
- in high dimensions

We can generate such numbers using uniformly distributed random numbers as input for

1. the inverse transform sampling method
   (possible only for a few cases: Gaussian, exponential, ...)
2. the accept-reject method (a Monte Carlo technique)
   (generally applicable; the efficiency is problem dependent, see importance sampling)

Random numbers with non-uniform probability distribution
Transformation method

- Based on the fact that a variable x defined as the value of the cumulative distribution function F,

    x := F(y) \equiv \int_{-\infty}^{y} f(y')\,dy' = \Pr(Y < y)    (12)

  is a uniformly distributed random number x \in [0,1] for any f.

- F(-\infty) = 0, F(\infty) = 1

- If we know the inverse F^{-1} and define the transformation

    y(x) := F^{-1}(x)    (13)

  we take the uniform deviate x into y, which is distributed as f(y).

Random numbers with non-uniform probability distribution
Geometrical interpretation

[Figure: Transformation method for generating a random deviate y from a known probability distribution p(y). The indefinite integral of p(y) must be known and invertible. A uniform deviate x is chosen between 0 and 1. Its corresponding y on the definite-integral curve is the desired deviate. Figure and caption taken from Numerical Recipes Vol. 1, Figure 7.2.1.]

Random numbers with non-uniform probability distribution
Example: exponential distribution

Illustration for exponentially distributed random numbers:

    p(y) = \begin{cases} e^{-y} & y \in [0,\infty) \\ 0 & \text{otherwise} \end{cases}    (14)

The cumulative distribution function is

    F(y) = \int_{-\infty}^{y} p(y')\,dy' = \int_{0}^{y} e^{-y'}\,dy' = 1 - e^{-y}.    (15)

Use the inverse F^{-1} to define the transformation

    y(x) := F^{-1}(x) = -\ln(1-x)    (16)

Random numbers with non-uniform probability distribution
Example: exponential distribution

In practice:

1. Generate uniformly distributed random numbers x \in [0,1)
2. Apply the transformation x \to F^{-1}(x) = -\ln(1-x)
3. Return the transformed x

Fortran function:

```fortran
subroutine ran_exp(x)

  ! Returns random number x with prob.
  ! density p(x) = exp(-x)

  real, intent(out) :: x

  call random_number(x)
  x = -log(1.-x)   ! x = F^{-1}(x)

end subroutine ran_exp
```

Random numbers with normal (Gaussian) probability distribution
Box-Muller method

Sample uniformly distributed \vec x = (x_1, x_2), i.e. x_i \in [0,1), and consider the transformation \vec y(\vec x)

    y_1(\vec x) = \sqrt{-2\ln x_1}\,\cos 2\pi x_2    (17a)
    y_2(\vec x) = \sqrt{-2\ln x_1}\,\sin 2\pi x_2    (17b)

whose inverse relation \vec x = F(\vec y) is

    x_1(\vec y) = e^{-\frac{1}{2}(y_1^2 + y_2^2)}    (18a)
    x_2(\vec y) = \frac{1}{2\pi}\arctan(y_2/y_1).    (18b)

Since the corresponding Jacobian is

    \left|\frac{\partial(x_1,x_2)}{\partial(y_1,y_2)}\right| = \begin{vmatrix} \frac{\partial x_1}{\partial y_1} & \frac{\partial x_1}{\partial y_2} \\ \frac{\partial x_2}{\partial y_1} & \frac{\partial x_2}{\partial y_2} \end{vmatrix} = -\left[\frac{1}{\sqrt{2\pi}}\,e^{-y_1^2/2}\right]\left[\frac{1}{\sqrt{2\pi}}\,e^{-y_2^2/2}\right]    (19)

the transformation \vec y(\vec x) yields two independent normally distributed numbers y_1 and y_2.

Random numbers with normal (Gaussian) probability distribution
Box-Muller method

```fortran
subroutine ran_normal(x, var)

  ! Returns random number x with probability
  ! density p(x) = 1/sqrt(2*pi*var) * exp(-x^2/(2*var))
  ! Default: var = 1.0

  real, intent(out)          :: x
  real, intent(in), optional :: var

  real, parameter :: pi = 4.*atan(1.0)
  real            :: u(2), R, phi, twovar
  complex, save   :: z
  logical, save   :: scnd = .false.

  twovar = 2.0
  if (present(var)) twovar = 2.0 * var

  if (scnd) then
     x = aimag(z)
     scnd = .false.
  else
     ! Sample two flat random numbers: u(1), u(2)
     call random_number(u)

     R   = sqrt(-twovar * log(1.-u(1)))
     phi = 2.*pi * u(2)

     z = cmplx(R*cos(phi), R*sin(phi), kind=kind(z))
     x = real(z)
     scnd = .true.
  endif

end subroutine ran_normal
```

Random numbers with non-uniform probability distribution

For many physics applications we need to sample random numbers

- which are not uniformly distributed
- in high dimensions

We can generate such numbers using uniformly distributed random numbers as input for

1. the inverse transform sampling method ✓
   (possible only for a few cases: Gaussian, exponential, ..., where we know F^{-1})
2. Monte Carlo sampling (accept-reject)
   (generally applicable; efficiency problem dependent)

Random numbers with non-uniform probability distribution
Accept-reject method

If we know the cumulative distribution function F and its inverse F^{-1}, we can sample random numbers y with the probability distribution f(y).

What if those are difficult to get? All is not lost; all we need is

- the desired probability distribution f(y)
- a powerful random number generator for uniformly distributed random numbers

Remember: the Monte Carlo method to estimate \pi

1. Sample N pairs of uniformly distributed random numbers (x, y) in the interval [0,1)
2. Count in N_{in} the number of pairs with x^2 + y^2 \leq 1
3. The ratio N_{in}/N approaches \pi/4 as N \to \infty:

    \pi = 4 \lim_{N\to\infty} \frac{N_{in}}{N}

[Scatter plot: N = 1000 points (x, y) in the unit square; \pi \approx 4 N_{in}/N = 3.104]
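As a minimal illustration of this recipe (a sketch added here, not part of the original slides), the \pi estimate takes only a few lines of Fortran:

```fortran
program estimate_pi

  ! Minimal sketch of the Monte Carlo estimate of pi:
  ! count points (x,y) in [0,1)^2 that fall inside the unit circle.

  implicit none
  integer :: i, N, Nin
  real    :: xy(2)

  N   = 1000
  Nin = 0

  do i = 1, N
     call random_number(xy)
     if ( xy(1)**2 + xy(2)**2 <= 1. ) Nin = Nin + 1
  end do

  print *, 'pi estimate:', 4.*real(Nin)/real(N)

end program estimate_pi
```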


Random numbers with non-uniform probability distribution
Accept-reject method

One proceeds similarly for a probability density p(x) for which F(x) is a complicated function or even unknown, so that we have no chance of getting F^{-1}.

If we enclose p(x) inside C times an easily generated distribution f(x) (for which we know F^{-1}), we can sample random numbers with probability density p(x) as follows:

1. Bound p(x) by a function C\,f(x) > p(x) \;\forall x
2. Sample a uniformly distributed random number u_1 \in [0,1) and calculate x = F^{-1}(u_1)
3. Sample another uniformly distributed random number u_2 \in [0,1) and test whether u_2 \cdot C f(x) lies below p(x):

    u_2 \leq \frac{p(x)}{C\,f(x)}    (20)

4. If fulfilled, we accept x; otherwise we reject it and start again.

Random numbers with non-uniform probability distribution
Accept-reject method: Illustration

[Figure: Rejection method for generating a random deviate x from a known probability distribution p(x) that is everywhere less than some other function f(x). The transformation method is first used to generate a random deviate x of the distribution f (cf. Figure 7.2.1). A second uniform deviate is used to decide whether to accept or reject that x. If it is rejected, a new deviate of f is found, and so on. The ratio of accepted to rejected points is the ratio of the area under p to the area between p and f. Figure and caption taken from Numerical Recipes Vol. 1, Figure 7.3.1.]

Arbitrary random numbers via the accept-reject method
Example

We want to sample x \in [0,2] with the probability density (see the heat-bath algorithm below)

    p(x) = c_p \sqrt{2-x} \quad\text{where}\quad c_p \equiv \frac{3}{4\sqrt{2}} \quad\text{so that}\quad \int_0^2 dx\, p(x) = 1    (21)

We enclose p(x) by the rectangle C\,f(x), where

    f(x) = \frac{1}{2} \;\forall x \in [0,2] \quad\text{and}\quad C = \frac{3}{2}

Note that

    F(a) = \int_0^a dx\, f(x) \quad\text{with}\quad F(2) = 1

(uniform probability distribution for x \in [0,2]).

[Plot: p(x) = c_p\sqrt{2-x} together with the bounding rectangle C\,f(x) over x \in [0,2].]

Arbitrary random numbers via the accept-reject method
Example

Recipe:

1. Sample a uniformly distributed random number u_1 \in [0,1) and calculate x = F^{-1}(u_1) = 2 u_1.
2. Sample another uniformly distributed u_2 \in [0,1) and test whether

    u_2 \leq \frac{p(x)}{C\,f(x)} = \frac{1}{\sqrt{2}}\sqrt{2-x} \iff 2 u_2^2 \leq 2 - x

3. If fulfilled, accept x; otherwise go back to step 1.

Arbitrary random numbers via the accept-reject method
Example code in Fortran

```fortran
program main

  implicit none
  integer :: i
  real    :: x

  do i = 1, 1000000
     call sample(x)
     print *, x
  enddo

contains

  !---------------------------------------
  subroutine sample(x)

    ! Returns x in [0,2] with probability
    ! density p(x), defined further below

    real, intent(inout) :: x
    real :: u(2)
    real, parameter :: C = 3./2.

    do
       ! Get two random numbers u(1), u(2)
       call random_number(u)

       ! x = F^{-1}(u_1)
       x = 2.*u(1)

       ! Accept-reject step
       if ( u(2) <= p(x) / (C*f(x)) ) exit
    end do

  end subroutine sample

  !---------------------------------------
  pure real function p(x)

    real, intent(in) :: x
    real, parameter :: cp = 0.75/sqrt(2.)

    if (x<0 .or. x>2) then
       p = 0
    else
       p = cp * sqrt(2-x)
    endif

  end function p

  !---------------------------------------
  pure real function f(x)

    real, intent(in) :: x

    if (x<0 .or. x>2) then
       f = 0
    else
       f = 0.5
    endif

  end function f

end program main
```

Arbitrary random numbers via the accept-reject method
Example code in Fortran

Histogram H(x) of the sample data: x = 0.73478, 0.14750, ...

- N = 10^6 random numbers u_1, u_2
- bin size \Delta H = 0.1
- H(x_i) = \underbrace{p(x_i)\,\Delta H}_{\text{probability}} \cdot N

[Histogram of the sampled x values over [0,2], together with the curve p(x) \cdot 0.1 \cdot 10^6.]

We will come back to this for the heat-bath algorithm.

Random numbers with non-uniform probability distribution

For many physics applications we need to sample random numbers

- which are not uniformly distributed
- in high dimensions

We can generate such numbers using uniformly distributed random numbers as input for

1. the inverse transform sampling method ✓
   (possible only for a few cases: Gaussian, exponential, ..., where we know F^{-1})
2. the accept-reject method (a Monte Carlo technique) ✓
   (generally applicable; efficiency is problem dependent)

We have to find an f(x) which is a good approximation to p(x) → importance sampling; otherwise there will be many rejected x.

Simulation Algorithms for lattice gauge theories

Expectation values in lattice QCD

The expectation value of an observable in lattice QCD is given by

    \langle O \rangle = \frac{1}{Z} \int \mathcal{D}U \, \mathcal{D}\bar\psi \, \mathcal{D}\psi \; O[U,\psi,\bar\psi] \, e^{-S[U,\psi,\bar\psi]}    (22)

- Z denotes the “partition function”

    Z = \int \mathcal{D}U \, \mathcal{D}\bar\psi \, \mathcal{D}\psi \; e^{-S[U,\psi,\bar\psi]}    (23)

- S denotes a lattice QCD action; there are many choices (real-valued functional of the link and fermionic variables)

    S[U,\psi,\bar\psi] = \underbrace{S_g[U]}_{\text{gauge part}} + \underbrace{S_f[U,\psi,\bar\psi]}_{\text{fermionic part}}    (24)

- O denotes an arbitrary observable, which is a (simple or complicated) functional of U, \psi and \bar\psi

Monte Carlo estimates of expectation values

After integration over the (Grassmann) fermionic fields,

    \langle O \rangle = \frac{1}{Z} \int \mathcal{D}U \; O[U] \, e^{-S_{\rm eff}[U]}    (7)

The integral will be dominated by link configurations with small action values:

    P[U] = \frac{1}{Z} e^{-S[U]}    (8)

We need to employ “importance sampling”, i.e., generate U with high probability.

Markov chain: U^{(1)}, U^{(2)}, \ldots, U^{(N)}

\langle O \rangle can be estimated by the Monte Carlo average \bar O:

    \langle O \rangle \approx \bar O \equiv \frac{1}{N} \sum_{i=1}^{N} O[U^{(i)}]    (9)

[Plot: e^{-S[U]} as a function of S[U], illustrating that configurations with small action dominate.]

This lecture

This lecture will teach you how these Markov chains U^{(1)}, U^{(2)}, \ldots, U^{(N)} are generated:

- Metropolis algorithm
- Heat-bath algorithm
- Hybrid Monte Carlo (HMC)

To better understand these algorithms, it is instructive to look first at the generation of real numbers x \in \mathbb{R} for a given probability distribution p(x). ✓

Metropolis algorithm

Markov process and detailed balance

Let us denote a configuration of link variables by C = \{\ldots, U_{x\mu}, \ldots\}.

Markov process

- A stochastic process which generates a finite set of configurations (a Markov chain) one after the other, according to some transition probability P(C \to C')
- P(C \to C') is independent of all previous states except for C
- If P(C \to C') satisfies detailed balance,

    e^{-S[C]}\, P(C \to C') = e^{-S[C']}\, P(C' \to C)    (25)

  the configurations are sampled with probability distribution \propto e^{-S[C]}

For a proof see, e.g., the book by H. Rothe, “Lattice Gauge Theories...”.

⟹ The “simulation time” average \bar O = \frac{1}{N}\sum_{i=1}^{N} O[C_i] equals the ensemble average corresponding to the given Boltzmann distribution.

The Metropolis method
Metropolis et al. (1953)

Metropolis algorithm

- Assume we have a configuration C = \{\ldots, U_{x\mu}, \ldots\}
- Propose a new configuration C' = \{\ldots, U'_{x\mu}, \ldots\} with a transition probability T(C \to C') satisfying

    T(C \to C') = T(C' \to C)  (reversibility)    (26)

- Local, extended or global changes of the U's are allowed
- We accept C' if the action is lowered, S(C') < S(C) (i.e., \Delta S < 0)
- If the action is increased, we accept C' with probability e^{-S(C')}/e^{-S(C)}
- If C' is rejected, we keep C (i.e., C' = C) and start again

In practice one generates a random number r \in [0,1] and accepts C' if

    r \leq e^{-\Delta S} \equiv \frac{e^{-S(C')}}{e^{-S(C)}}    (27)

The Metropolis method

Advantages

- Applicable to any system
- Will generate link configurations U with probability P \propto e^{-S[U]}
- Powerful if combined with other methods which provide good candidates for new configurations (see HMC below)

Disadvantages

- If the proposal of a new configuration C' is unguided/random, a large change of the action will typically result in low acceptance rates
- Small changes of C will increase the acceptance rate, but will also cause large autocorrelation times,

    C \sim C' \sim C'' \sim \cdots \sim C^{(n)}

  → one has to skip many C's in between

The Metropolis method

Satisfies detailed balance

- If \Delta S < 0:

    P(C \to C') = T(C \to C') \equiv \text{“probability for suggesting } C'\text{”}    (28)
    P(C' \to C) = T(C' \to C)\, \frac{e^{-S[C]}}{e^{-S[C']}}    (29)

- If \Delta S > 0:

    P(C \to C') = T(C \to C')\, \frac{e^{-S[C']}}{e^{-S[C]}}    (30)
    P(C' \to C) = T(C' \to C) \equiv \text{“probability for suggesting } C\text{”}    (31)

Since T(C \to C') = T(C' \to C), detailed balance is satisfied in both cases.

Pure lattice gauge theory

Wilson gauge action

    S_W[U] = \beta \sum_{\rm plaq} \left(1 - \frac{1}{N_c}\,\mathrm{Re}\,\mathrm{Tr}\, U_{\rm plaq}\right)    (32)

where

    \sum_{\rm plaq} \equiv \sum_x \sum_{1\le\mu<\nu\le 4}, \qquad U_{\rm plaq} \equiv U_{x\mu} U_{x+\hat\mu,\nu} U^\dagger_{x+\hat\nu,\mu} U^\dagger_{x\nu}

Degrees of freedom (U^\dagger_{x\mu} = U^{-1}_{x\mu}, \det U_{x\mu} = 1):

                      compact U(1)                      SU(2)                          SU(3)
    U_{x\mu}          e^{i\phi_{x\mu}} \in \mathbb{C}   \in \mathbb{C}^{2\times 2}     \in \mathbb{C}^{3\times 3}
    free parameters   1 angle                           3 real                         8 real

[Sketch: a plaquette in the \mu-\nu plane with corners x, x+\hat\mu and x+\hat\nu.]

Pure lattice gauge theory

Local update

- Consider a change of the link variable U_{x\mu} \to U'_{x\mu}
- All other links U_{y\nu} with (y,\nu) \neq (x,\mu) are kept fixed
- This changes S_W everywhere U_{x\mu} joins a plaquette:

    S_W[U] \longrightarrow S_W[U'] = -\frac{\beta}{N_c}\,\mathrm{Re}\,\mathrm{Tr}\,(U'_{x\mu} W_{x\mu}) + \text{const.}

W denotes the sum of staples attached to U_{x\mu}:

    W_{x\mu} := \sum_{\nu\neq\mu} \underbrace{U_{x+\hat\mu,\nu} U^\dagger_{x+\hat\nu,\mu} U^\dagger_{x\nu}}_{\text{“upper” staple}} + \underbrace{U^\dagger_{x+\hat\mu-\hat\nu,\nu} U^\dagger_{x-\hat\nu,\mu} U_{x-\hat\nu,\nu}}_{\text{“lower” staple}}    (33)

The Metropolis method in action
Code snippet for compact U(1)

```fortran
!....
complex :: w, u(volume,4)
!....

iacc = 0
do mu = 1,4
   do i = 1, volume

      w = staple(u, i, mu)
      c = w * u(i,mu)

      ! Propose rotation u(i,mu) --> dc * u(i,mu)
      call random_number(r)
      dphi = eps * TWOPI * (r - 0.5)   ! reversible: T(C->C') = T(C'->C)
      dc   = cmplx(cos(dphi), sin(dphi))

      ! Change of action for proposed new link
      dS = -real(c) + real(c)*real(dc) - aimag(c)*aimag(dc)   ! -Re(c) + Re(c*dc)
      exp_dS = exp(beta * dS)

      ! Accept/reject step
      call random_number(r)
      if ( exp_dS >= r ) then
         u(i,mu) = dc * u(i,mu)
         iacc = iacc + 1
      endif

   end do
end do
!....
```
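The snippet calls a function staple(u, i, mu) that is not shown on the slides. A minimal sketch of how it could look is given below; the helper neighbor(i, nu) is hypothetical (with a negative nu taken to mean the backward direction), and the staple orientation follows Eq. (33):

```fortran
! Sketch only: sum of the U(1) staples attached to link (i,mu), cf. Eq. (33).
! Assumes a site-index helper neighbor(i,nu) (hypothetical), where
! neighbor(i,-nu) denotes the backward neighbor i - nu-hat.

complex function staple(u, i, mu)

  complex, intent(in) :: u(:,:)      ! links, u(volume,4)
  integer, intent(in) :: i, mu
  integer :: nu, j, k

  staple = (0., 0.)
  do nu = 1, 4
     if (nu == mu) cycle

     ! "upper" staple: U_{x+mu,nu} U*_{x+nu,mu} U*_{x,nu}
     j = neighbor(i, mu)
     k = neighbor(i, nu)
     staple = staple + u(j,nu) * conjg(u(k,mu)) * conjg(u(i,nu))

     ! "lower" staple: U*_{x+mu-nu,nu} U*_{x-nu,mu} U_{x-nu,nu}
     k = neighbor(i, -nu)
     j = neighbor(k, mu)
     staple = staple + conjg(u(j,nu)) * conjg(u(k,mu)) * u(k,nu)
  end do

end function staple
```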

HOR algorithm (heatbath + overrelaxation)

Pure lattice gauge theory
SU(2) gauge group

Consider first the case of SU(2) gauge theory (N_c = 2).

- Parametrization of an SU(2) link variable U_{x\mu} via a real vector a_{x\mu}:

    U_{x\mu} = a_0 I + i\vec\sigma\cdot\vec a \quad\text{where}\quad a_{x\mu} = (a_0, a_1, a_2, a_3)_{x\mu} \in \mathbb{R}^4    (34)

- Pauli matrices \vec\sigma = (\sigma_1, \sigma_2, \sigma_3):

    \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}    (35)

  They satisfy \sigma_j \sigma_k = \delta_{jk} + i\epsilon_{jkl}\sigma_l.

- The sum of staples is proportional to an SU(2) element (only for SU(2)):

    W_{x\mu} = \omega_0 I + i\vec\sigma\cdot\vec\omega \quad\text{where}\quad \omega_{x\mu} = (\omega_0, \omega_1, \omega_2, \omega_3)_{x\mu} \in \mathbb{R}^4

  Exactly: \frac{W_{x\mu}}{\sqrt{\det W_{x\mu}}} \in SU(2)

Heat-bath algorithm for pure SU(2) lattice gauge theory

Idea

- Consider a local change of a single link variable U_{x\mu}, all others fixed
- Combine the proposal of a new U_{x\mu} and the accept/reject step of the Metropolis algorithm into one step

Boltzmann distribution of the link variable U_{x\mu} \in SU(2):

    dP(U_{x\mu}) = \frac{1}{Z}\, e^{\frac{\beta k}{2}\,\mathrm{Re}\,\mathrm{Tr}\, U_{x\mu} \bar W_{x\mu}}\; \underbrace{\frac{1}{\pi^2}\,\delta(a_{x\mu}^2 - 1)\, d^4 a_{x\mu}}_{dU_{x\mu}}    (36)

where \bar W_{x\mu} = W_{x\mu}/k \in SU(2) with k \equiv \sqrt{\det W}.

We see: the transition probability P(U \to U') = dP(U'_{x\mu}) is independent of U.

Heat-bath algorithm for pure SU(2) lattice gauge theory

Since the Haar measure is invariant under right multiplication, dU = d(UW), we can define

    V \equiv U_{x\mu} \bar W_{x\mu} = v_0 I + i\vec\sigma\cdot\vec v \in SU(2),    (37)

sample V with the probability distribution

    dP(V) \sim e^{\frac{\beta k}{2}\,\mathrm{Re}\,\mathrm{Tr}\, V}\, dV    (38)

and finally get the new link U_{x\mu} \to V \bar W^\dagger_{x\mu} = V W^\dagger_{x\mu}/\sqrt{\det W_{x\mu}}.

Again:

- V only depends on \rho \equiv \beta k
- the new link U \to U' is independent of the old one; it only depends on the surrounding links and the “temperature” 1/\rho (heat-bath)

Heat-bath algorithm for pure SU(2) lattice gauge theory

Since V \in SU(2), the Monte Carlo sampling is straightforward:

    e^{\frac{\beta k}{2}\,\mathrm{Re}\,\mathrm{Tr}\, V}\, dV = e^{\rho v_0}\, \frac{1}{\pi^2}\,\delta(v^2 - 1)\, d^4 v    (39a)
                                                            = \frac{1}{2\pi^2}\, \underbrace{\sqrt{1 - v_0^2}\; e^{\rho v_0}\, dv_0}_{P(v_0)}\; \underbrace{d(\cos\theta)\, d\phi}_{\text{unit sphere in 3D}}    (39b)

That is, we are looking for vectors v = (v_0, \vec v) of unit length |v| = 1, where

- v_0 \in [-1,1] is a real number with the probability distribution

    dP(v_0) \sim \sqrt{1 - v_0^2}\; e^{\rho v_0}\, dv_0    (40)

- \vec v = \sqrt{1 - v_0^2}\; \hat v, where \hat v is a uniformly distributed point on the surface of the unit sphere in 3D.

Heat-bath algorithm for pure SU(2) lattice gauge theory

To sample v_0 we introduce the variable \lambda \in [0,1) and set

    v_0 = 1 - 2\lambda^2

This gives

    dP(v_0) \sim \sqrt{1 - v_0^2}\; e^{\rho v_0}\, dv_0 \;\longrightarrow\; dP(\lambda) \sim d\lambda\; \lambda^2 \sqrt{1 - \lambda^2}\; e^{-2\rho\lambda^2}    (41)

One option to sample \lambda (simple, but there are more efficient ways):

1. Generate three uniformly distributed random numbers r_1, r_2, r_3 and set

    \lambda^2 = -\frac{1}{2\rho}\left[\ln r_1 + \cos^2(2\pi r_2)\,\ln r_3\right]    (42)

   This generates \lambda^2 distributed as dP(\lambda^2) \sim d\lambda\; \lambda^2 e^{-2\rho\lambda^2}

2. We then accept v_0 = 1 - 2\lambda^2 by choosing another random number r_4 and testing whether

    r_4^2 \leq 1 - \lambda^2    (43)

Heat-bath algorithm for pure SU(2) lattice gauge theory

Generating \hat v is simple: we sample two further (uniformly distributed) random numbers r_5 and r_6 and set

    v_1 = \sin\theta\cos\phi        with  \cos\theta = 2 r_5 - 1
    v_2 = \sin\theta\sin\phi              \sin\theta = \sqrt{1 - \cos^2\theta}
    v_3 = \cos\theta                      \phi = 2\pi r_6

To define V we set

    V = v_0 I + i \sum_{j=1}^{3} \sigma_j v_j \sqrt{1 - v_0^2}    (44)

New link: U_{x\mu} \to U'_{x\mu} = V \bar W^\dagger_{x\mu} = V W^\dagger_{x\mu}/\sqrt{\det W_{x\mu}}.
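A minimal sketch (added here, not part of the original slides) of this V-sampling, with V represented by its 4-vector (v_0, \vec v) and \rho = \beta k assumed to be precomputed from the staple sum; log(1-r) is used instead of log(r) only to avoid log(0), since random_number returns values in [0,1):

```fortran
! Sketch only: sample V of Eqs. (41)-(44) as the 4-vector v = (v0,v1,v2,v3),
! i.e. V = v(0)*I + i*sigma.vec(v), for given rho = beta*k.

subroutine sample_v(rho, v)

  real, intent(in)  :: rho
  real, intent(out) :: v(0:3)
  real, parameter   :: TWOPI = 8.*atan(1.0)
  real :: r(4), lambda2, costh, sinth, phi, vnorm

  ! Accept-reject loop for v0 = 1 - 2*lambda^2, Eqs. (42)-(43)
  do
     call random_number(r)
     lambda2 = -( log(1.-r(1)) + cos(TWOPI*r(2))**2 * log(1.-r(3)) ) / (2.*rho)
     if ( r(4)**2 <= 1. - lambda2 ) exit
  end do
  v(0) = 1. - 2.*lambda2

  ! Uniform point on the 3D unit sphere, scaled by sqrt(1 - v0^2)
  call random_number(r)
  costh = 2.*r(1) - 1.
  sinth = sqrt(1. - costh**2)
  phi   = TWOPI * r(2)
  vnorm = sqrt(1. - v(0)**2)

  v(1) = vnorm * sinth * cos(phi)
  v(2) = vnorm * sinth * sin(phi)
  v(3) = vnorm * costh

end subroutine sample_v
```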

Microcanonical overrelaxation step

Before continuing with pure SU(3) lattice gauge theory, let us also consider microcanonical update steps for SU(2) link variables.

- Change the gauge links but leave the action invariant
- Reduces autocorrelations between subsequent configurations
- Not ergodic; therefore it has to be combined with heat-bath steps

Consider \mathrm{Tr}\,\alpha\,\omega^\dagger and try to find a matrix \alpha which fulfills \mathrm{Tr}\,\alpha\,\omega^\dagger = \mathrm{Tr}\,\omega^\dagger:

    \alpha \equiv U_{x\mu} = a_0 I + i\vec\sigma\cdot\vec a \in SU(2)    (45)
    \omega \equiv W_{x\mu} = \omega_0 I + i\vec\sigma\cdot\vec\omega \propto SU(2)    (46)

Solution:

    \alpha = -1 + \frac{4\omega_0}{\mathrm{Tr}\,\omega\omega^\dagger}\,\omega    (47)

This replaces the corresponding step in the heat-bath; the rest stays the same.

Note that \mathrm{Tr}\,\omega\omega^\dagger can become very small → perform no update if \mathrm{Tr}\,\omega\omega^\dagger < 10^{-10} or so.
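In the 4-vector representation \omega = \omega_0 I + i\vec\sigma\cdot\vec\omega one has \mathrm{Tr}\,\omega\omega^\dagger = 2(\omega_0^2 + |\vec\omega|^2), so Eq. (47) can be evaluated componentwise. A minimal sketch (added here, not from the original slides); the resulting \alpha then takes the place of the heat-bath's sampled matrix, the rest of the update being unchanged:

```fortran
! Sketch only: evaluates alpha of Eq. (47) from the staple 4-vector w.
! ok = .false. signals Tr(omega omega^dagger) < 1e-10, i.e. no update.

subroutine overrelax(w, a, ok)

  real,    intent(in)  :: w(0:3)   ! omega = w(0)*I + i*sigma.vec(w)
  real,    intent(out) :: a(0:3)   ! alpha = a(0)*I + i*sigma.vec(a)
  logical, intent(out) :: ok
  real :: wsq

  wsq = w(0)**2 + w(1)**2 + w(2)**2 + w(3)**2   ! = Tr(omega omega^dagger)/2

  ok = ( 2.*wsq >= 1.e-10 )
  if (.not. ok) return

  ! alpha = -1 + (4*w0 / Tr(omega omega^dagger)) * omega, componentwise:
  a(0)   = 2.*w(0)**2/wsq - 1.
  a(1:3) = 2.*w(0)*w(1:3)/wsq

end subroutine overrelax
```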

The HOR algorithm and the Cabibbo-Marinari trick

A typical update algorithm for pure gauge theory is hybrid overrelaxation (HOR):

    1 HOR step = 1 heat-bath step + a few microcanonical update steps

Very successful for SU(2) gauge theory; more complicated if applied to SU(N) directly.

Elegant solution by Cabibbo and Marinari: do the updates in SU(2) subgroups and apply them successively.

An update starts with U^{(0)} = U and ends with U' = U^{(m)}, where after each update step U^{(k)} = A^{(k)} U^{(k-1)}, with A^{(k)} an embedded SU(2) matrix and U^{(k)} \in SU(N).

Consider SU(3):

    A^{(1)} = \begin{pmatrix} \alpha_{11} & \alpha_{12} & 0 \\ \alpha_{21} & \alpha_{22} & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
    A^{(2)} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \alpha_{11} & \alpha_{12} \\ 0 & \alpha_{21} & \alpha_{22} \end{pmatrix}, \quad
    A^{(3)} = \begin{pmatrix} \alpha_{11} & 0 & \alpha_{12} \\ 0 & 1 & 0 \\ \alpha_{21} & 0 & \alpha_{22} \end{pmatrix}

The HOR algorithm and the Cabibbo-Marinari trick

General case: loop over m = N(N-1)/2 matrices A, where i \in [1, N-1] and j \in [i+1, N]:

    A^{(k)} = \begin{pmatrix}
      1 &             &        &             &   \\
        & \alpha_{ii} & \cdots & \alpha_{ij} &   \\
        & \vdots      & \ddots & \vdots      &   \\
        & \alpha_{ji} & \cdots & \alpha_{jj} &   \\
        &             &        &             & 1
    \end{pmatrix}    (48)

(the unit matrix, except for the 2x2 SU(2) block embedded at rows/columns i and j).

For the (local) action it follows that

    S[A^{(k)} U^{(k-1)}_{x\mu}] = -\frac{\beta}{N}\,\mathrm{Re}\,\mathrm{Tr}\, A^{(k)} X^{(k-1)}_{x\mu} + C, \quad\text{where}\quad X^{(k-1)}_{x\mu} = U^{(k-1)}_{x\mu} W_{x\mu}

Note that

    \mathrm{Re}\,\mathrm{Tr}\, A X = \mathrm{Re}\Big[\sum_{k\neq i,j} X_{kk} + \alpha_{ii} X_{ii} + \alpha_{ij} X_{ji} + \alpha_{ji} X_{ij} + \alpha_{jj} X_{jj}\Big]

Hybrid Monte Carlo preliminaries

Monte-Carlo simulations with fermions

HOR algorithm (heat-bath + OR steps)

- Works well for local updates of gauge links (pure SU(N) gauge theories)
- Shorter autocorrelation lengths compared to the Metropolis algorithm
- Restricted range of application: not an option for unquenched gauge theories (det M makes the action non-local)

Hybrid Monte Carlo algorithm

- “Work horse” for unquenched QCD
- Produces a Markov chain U^{(1)}, U^{(2)}, \ldots, U^{(n)} with probability density e^{-S_{\rm eff}[U]},

    S_{\rm eff}[U] = S_g[U] - \ln\det M[U]

- Used in different variants nowadays (different improvement strategies and fermionic actions)


Monte-Carlo simulations with fermions

Main “ingredients” of the Metropolis algorithm:

1. Propose a new configuration C' = \{\ldots, U'_{x\mu}, \ldots\} with a transition probability T(C \to C') satisfying

    T(C \to C') = T(C' \to C)  (reversibility)    (49)

   (local, extended or global changes of C are allowed)

2. Accept the new configuration if e^{-\Delta S} \geq r \in [0,1) (random number)

⇒ Detailed balance ⟹ Markov chain C, C', C'', \ldots distributed according to e^{-S[C]}

Problem of the Metropolis algorithm

- Choose C' randomly: \Delta S will be very large ⟹ low acceptance rate
- Choose C' = C + \delta C, where \delta C is a random but small change of C ⟹ large autocorrelation lengths

Goal: a global update which keeps \Delta S moderate


The Hybrid Monte Carlo algorithm

1. Introduce auxiliary canonical momenta for each (dynamical) degree of freedom and set up a Hamiltonian H:
   - Scalar field theory S(\phi):  H(\pi, \phi) = \frac{1}{2}\sum_x \pi_x^2 + S(\phi)
   - Pure gauge theory S_g(U):  H(P, U) = \frac{1}{2}\sum_{x\mu} P_{x\mu}^2 + S_g(U)

2. Evolve, e.g., all \pi_x and \phi_x along a trajectory \tau in the phase space (\pi, \phi) using Hamilton's equations of motion (EoM):

    \frac{d}{d\tau}\phi_x = \frac{\partial H}{\partial \pi_x}, \qquad \frac{d}{d\tau}\pi_x = -\frac{\partial H}{\partial \phi_x}    (50)

   - The evolution must be reversible: (\pi, \phi) \leftrightarrow (\pi', \phi') (see step 1 of Metropolis)
   - Numerical evolution: \Delta H = H' - H \neq 0 (small changes are welcome)

3. After one trajectory, accept/reject the new (\pi', \phi') using step 2 of Metropolis
   - This corrects for the small \Delta H \neq 0 due to the numerical integration of the EoM

Reversibility and the Metropolis step make the HMC algorithm exact (remember the ingredients of the Metropolis algorithm needed to ensure detailed balance).


The Hybrid Monte Carlo algorithm

Discrete evolution: n integration steps of size \Delta\tau give a trajectory of length \tau = n\,\Delta\tau.

Tuning parameter \Delta\tau = \tau/n:

- if it is small, \Delta H \approx 0 (but large autocorrelations)
- if it is large, |\Delta H| \gg 0 and the acceptance rate decreases

Optimum: set \Delta\tau such that the acceptance rate is about 70-80%.

HMC algorithm for scalar field theory

HMC for a scalar field theory

As an example we consider a simple case: scalar field theory.

Lattice action (unrenormalized):

    S[\phi] = \sum_x a^4 \left[\frac{1}{2}\sum_{\mu=1}^{4} (\Delta^f_\mu \phi_x)^2 + \frac{m_0^2}{2}\phi_x^2 + \frac{g_0}{4!}\phi_x^4\right]    (51)

where \Delta^f_\mu \phi_x := \frac{1}{a}(\phi_{x+\hat\mu} - \phi_x) defines a forward derivative on the lattice.

For numerical simulations it is advantageous to express S such that it only contains dimensionless quantities. We achieve this by

    a\phi \to \sqrt{2\kappa}\,\phi, \qquad a^2 m_0^2 = \frac{1-2\lambda}{\kappa} - 8, \qquad g_0 = \frac{6\lambda}{\kappa^2}    (52)

which gives us

    S[\phi] = \sum_x \left[-2\kappa \sum_{\mu=1}^{4} \phi_x \phi_{x+\hat\mu} + \phi_x^2 + \lambda(\phi_x^2 - 1)^2 - \lambda\right]    (53)

HMC for a scalar field theory

To set up the HMC we use the last form of the action (repeated here),

    S[\phi] = \sum_x \left[-2\kappa \sum_{\mu=1}^{4} \phi_x \phi_{x+\hat\mu} + \phi_x^2 + \lambda(\phi_x^2 - 1)^2 - \lambda\right]

and define the Hamiltonian of a fictitious phase space (\pi, \phi):

    H[\pi, \phi] = \frac{1}{2}\sum_x \pi_x^2 + S[\phi].    (54)

H is a real-valued function of the \phi_x \in \mathbb{R} and their conjugate (canonical) momenta \pi_x \in \mathbb{R}.

We will also need the force

    F_x := \frac{\partial S[\phi]}{\partial \phi_x} = 2\phi_x + 4\lambda(\phi_x^2 - 1)\phi_x - 2\kappa\sum_{\mu=1}^{4}(\phi_{x+\hat\mu} + \phi_{x-\hat\mu})    (55)
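As a minimal sketch (added here, not part of the original slides), the force of Eq. (55) might be coded as follows; the neighbor tables ip(x,mu) and im(x,mu) are hypothetical helpers holding the site indices of x+\hat\mu and x-\hat\mu, and kappa, lambda are the couplings of Eq. (53):

```fortran
! Sketch only: force F_x = dS/dphi_x of Eq. (55) for the action (53).

subroutine force(phi, F, ip, im, volume, kappa, lambda)

  integer, intent(in)  :: volume
  integer, intent(in)  :: ip(volume,4), im(volume,4)   ! neighbor tables
  real,    intent(in)  :: phi(volume), kappa, lambda
  real,    intent(out) :: F(volume)
  real    :: nn
  integer :: x, mu

  do x = 1, volume
     nn = 0.
     do mu = 1, 4
        nn = nn + phi(ip(x,mu)) + phi(im(x,mu))   ! sum over the 8 neighbors
     end do
     F(x) = 2.*phi(x) + 4.*lambda*(phi(x)**2 - 1.)*phi(x) - 2.*kappa*nn
  end do

end subroutine force
```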

HMC for a scalar field theory

The HMC algorithm consists of the following three steps:

1. A trajectory is started by choosing a set of Gaussian distributed momenta \pi = \{\pi_x\}, i.e.,

    P[\pi] \propto e^{-\frac{1}{2}\sum_x \pi_x^2}

   Why? The momenta are auxiliary fields we need for steps 2 and 3. We therefore have to integrate over them,

    \int [\mathcal{D}\phi]\, \underbrace{[\mathcal{D}\pi]}_{\prod_x d\pi_x}\, \underbrace{e^{-\frac{1}{2}\sum_x \pi_x^2}}_{V\text{-dim Gaussian integral}}\, e^{-S[\phi]}

   for which we use importance sampling again. It is sufficient to refresh \pi at the start of every trajectory.
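A sketch of this momentum refresh (added here, not from the original slides), reusing the Box-Muller routine ran_normal shown earlier; with the default variance 1, each \pi_x is distributed as e^{-\pi_x^2/2}:

```fortran
! Sketch only: draw all momenta pi_x from a Gaussian of unit variance.

subroutine refresh_momenta(pi, volume)

  integer, intent(in)  :: volume
  real,    intent(out) :: pi(volume)
  integer :: x

  do x = 1, volume
     call ran_normal(pi(x))   ! variance 1 => P[pi] ~ exp(-pi_x^2/2)
  end do

end subroutine refresh_momenta
```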

HMC for a scalar field theory

The HMC algorithm consists of the following three steps:

2. The configuration (\pi, \phi) is then evolved along a discretized trajectory in phase space until it reaches a new configuration (\pi', \phi') after n_\tau steps. The evolution is governed by the EoM

    \frac{d}{d\tau}\pi_x = -\frac{\partial H}{\partial \phi_x} = -\frac{\partial S}{\partial \phi_x}, \qquad \frac{d}{d\tau}\phi_x = \frac{\partial H}{\partial \pi_x} = \pi_x    (56)

   which are
   - deterministic, i.e., (\pi', \phi') is unique
   - reversible, i.e., (\pi, \phi) \leftrightarrow (\pi', \phi')
   - area-preserving, i.e., [\mathcal{D}\pi', \mathcal{D}\phi'] = [\mathcal{D}\pi, \mathcal{D}\phi]

   Important: any (discrete) implementation of the evolution must fulfill these properties, as does for example the leapfrog integrator (see below).

HMC for a scalar field theory

The HMC algorithm consists of the following three steps:

3. After a full trajectory (\pi, \phi) \to (\pi', \phi'), that is, after n steps of integration, the new configuration (\pi', \phi') is either accepted or rejected using the acceptance probability of the Metropolis algorithm,

    P = \min\{1, e^{-\Delta H}\} \quad\text{where}\quad \Delta H = H' - H

   Again one chooses a random number r \in [0,1) and accepts (\pi', \phi') if

    r \leq P

   If r > P, the last \phi is kept as the initial \phi for the next trajectory.
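The accept-reject step in a few lines of Fortran (a sketch added here, not from the original slides):

```fortran
! Sketch only: Metropolis accept-reject for dH = H' - H of one trajectory.
! Returns .true. if the new configuration (pi', phi') is accepted.

logical function accept(dH)

  real, intent(in) :: dH
  real :: r

  if (dH <= 0.) then
     accept = .true.              ! always accept if H decreased
  else
     call random_number(r)
     accept = ( r <= exp(-dH) )   ! accept with probability exp(-dH)
  endif

end function accept
```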

HMC for a scalar field theory

Note that if the evolution (step 2) were exact, i.e., if we could solve the EoM exactly, H would be conserved and all new configurations would be accepted.

However, we can solve/integrate these equations only approximately.

A popular choice to integrate the discretized Hamiltonian equations is the leapfrog algorithm, which for any finite \Delta\tau is

- simple
- deterministic
- reversible (apart from numerical rounding errors)
- area-preserving

Since \Delta\tau is finite, numerical errors are unavoidable (\Delta H \neq 0). The half-size steps of the leapfrog (see below) introduce O(\Delta\tau^2) errors, the full steps errors of O(\Delta\tau^3). These errors are, however, corrected by step 3 (the accept-reject step).

The leapfrog algorithm for a scalar field theory

Starting point (\tau = 0): (\pi^{(0)}, \phi^{(0)})

Elementary operations:

    I_\pi(\epsilon): \; \pi^{(i)} \to \pi^{(i+1)} = \pi^{(i)} - \epsilon\, \overbrace{F[\phi^{(i)}]}^{\text{Eq. (55)}}    (57a)
    I_\phi(\epsilon): \; \phi^{(i)} \to \phi^{(i+1)} = \phi^{(i)} + \epsilon\, \pi^{(i+1)}    (57b)

End point (\tau = n\,\Delta\tau):

    (\pi^{(n)}, \phi^{(n)}) = \underbrace{\left[I_\pi\!\left(\tfrac{\Delta\tau}{2}\right) I_\phi(\Delta\tau)\, I_\pi\!\left(\tfrac{\Delta\tau}{2}\right)\right]^n}_{I(\Delta\tau,\, n)} (\pi^{(0)}, \phi^{(0)})    (58)
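Putting Eqs. (57)-(58) together, a minimal leapfrog driver might look as follows (a sketch added here, not from the original slides); it assumes a routine force(phi, F) evaluating Eq. (55), and uses the equivalent rearrangement into a first half step, (n-1) full steps and a final half step that is derived for the gauge case later in the lecture:

```fortran
! Sketch only: one leapfrog trajectory of Eq. (58); phi, pi updated in place.
! Assumes: subroutine force(phi, F) returning F = dS/dphi of Eq. (55).

subroutine leapfrog(phi, pi, n, dtau)

  real,    intent(inout) :: phi(:), pi(:)
  integer, intent(in)    :: n
  real,    intent(in)    :: dtau
  real    :: F(size(phi))
  integer :: i

  call force(phi, F)
  pi = pi - 0.5*dtau*F              ! first half step  I_pi(dtau/2)

  do i = 1, n
     phi = phi + dtau*pi            ! I_phi(dtau)
     call force(phi, F)
     if (i < n) then
        pi = pi - dtau*F            ! (n-1) full steps I_pi(dtau)
     else
        pi = pi - 0.5*dtau*F        ! final half step  I_pi(dtau/2)
     endif
  end do

end subroutine leapfrog
```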

The leapfrog algorithm for a scalar field theory

Remember: for detailed balance the molecular dynamics step needs to be

- reversible: I(\Delta\tau, n) \cdot I(-\Delta\tau, n) = 1 (invertible mapping of phase space)
- area-preserving: [\mathcal{D}\pi^{(0)}, \mathcal{D}\phi^{(0)}] = [\mathcal{D}\pi^{(n)}, \mathcal{D}\phi^{(n)}] (integration measure preserved)

We will now show that the leapfrog algorithm satisfies both.

The leapfrog algorithm for a scalar field theory
Area preservation

For a single step i \to i+1,

    (\pi^{(i+1)}, \phi^{(i+1)}) = \left[I_\pi\!\left(\tfrac{\Delta\tau}{2}\right) I_\phi(\Delta\tau)\, I_\pi\!\left(\tfrac{\Delta\tau}{2}\right)\right] (\pi^{(i)}, \phi^{(i)})    (59)

we have for the Jacobian

    \det\left[\frac{\partial(\pi^{(i+1)}, \phi^{(i+1)})}{\partial(\pi^{(i)}, \phi^{(i)})}\right]
    = \det\left[\frac{\partial(\pi^{(i+1)}, \phi^{(i+1)})}{\partial(\pi^{(i+1/2)}, \phi^{(i+1)})}\,
                \frac{\partial(\pi^{(i+1/2)}, \phi^{(i+1)})}{\partial(\pi^{(i+1/2)}, \phi^{(i)})}\,
                \frac{\partial(\pi^{(i+1/2)}, \phi^{(i)})}{\partial(\pi^{(i)}, \phi^{(i)})}\right]
    = \det\left[\begin{pmatrix} 1 & \ast \\ 0 & 1 \end{pmatrix}
                \begin{pmatrix} 1 & 0 \\ \Delta\tau & 1 \end{pmatrix}
                \begin{pmatrix} 1 & \ast \\ 0 & 1 \end{pmatrix}\right] = 1

The leapfrog algorithm for a scalar field theory
Reversibility

If for simplicity we combine the elementary operations, I_\pi and I_\phi, of one integration step,

    \phi^{(i+1)} = \phi^{(i)} + \epsilon\,\pi^{(i)} - \frac{\epsilon^2}{2} F[\phi^{(i)}]    (60a)
    \pi^{(i+1)} = \pi^{(i)} - \frac{\epsilon}{2}\left(F[\phi^{(i)}] + F[\phi^{(i+1)}]\right)    (60b)

we easily see that if we integrate backwards, i.e., if we start the integration from (\phi^{(i+1)}, \pi^{(i+1)}) with step size -\epsilon, we reach (\phi^{(i)}, \pi^{(i)}) again:

    \phi^{(i+1)} \to \phi^{(i+1)} - \epsilon\,\pi^{(i+1)} - \frac{\epsilon^2}{2} F[\phi^{(i+1)}]
                 = \phi^{(i+1)} - \epsilon\,\pi^{(i)} + \frac{\epsilon^2}{2}\left(F[\phi^{(i)}] + F[\phi^{(i+1)}]\right) - \frac{\epsilon^2}{2} F[\phi^{(i+1)}]
                 = \phi^{(i+1)} - \epsilon\,\pi^{(i)} + \frac{\epsilon^2}{2} F[\phi^{(i)}] = \phi^{(i)}    (61a)

    \pi^{(i+1)} \to \pi^{(i+1)} + \frac{\epsilon}{2}\left(F[\phi^{(i+1)}] + F[\phi^{(i)}]\right) = \pi^{(i)}    (61b)

q.e.d.

HMC for lattice QCD simulations

HMC for lattice QCD simulations

The HMC algorithm for lattice QCD proceeds similarly.

The involved expressions (lattice fields, force term) are, however, more complicated and computationally expensive, which is due to the non-abelian nature of the link variables U = \{U_{x\mu}\}.

Comparison of lattice fields:

    \phi^4:   \phi_x \in \mathbb{R}
              \pi_x \in \mathbb{R}

    LQCD:     U_{x\mu} = e^{i\omega^a_{x\mu} T^a} \in SU(3), \quad \omega^a_{x\mu} \in \mathbb{R}, \; T^a \in su(3), \; T^a = T^{a\dagger}
              P_{x\mu} = i P^a_{x\mu} T^a \in su(3), \quad P^a_{x\mu} \in \mathbb{R}

HMC for lattice QCD simulations

Comparison of force terms:

    \phi^4: \quad F_x := \frac{\partial S[\phi]}{\partial \phi_x} = 2\phi_x + 4\lambda(\phi_x^2 - 1)\phi_x - 2\kappa\sum_{\mu=1}^{4}(\phi_{x+\hat\mu} + \phi_{x-\hat\mu})    (62)

    LQCD: \quad F^a_{x\mu} := \frac{\partial S[U]}{\partial U_{x\mu}} \equiv \left.\frac{\partial S[e^{i\omega}U]}{\partial \omega^a_{x\mu}}\right|_{\omega=0} \quad\text{where}\quad \omega \equiv \{\omega^a_{x\mu} T^a\}

                 \qquad\quad = \left.\frac{\partial S_g[e^{i\omega}U]}{\partial \omega^a_{x\mu}}\right|_{\omega=0} - \left.\frac{\partial}{\partial \omega^a_{x\mu}}\, \ln\det M^2[e^{i\omega}U]\right|_{\omega=0}    (63)

(for simplicity we consider N_f = 2 degenerate fermions)

HMC for lattice QCD simulations
Force term for pure gauge theory

Let us focus first on pure SU(3) lattice gauge theory with the standard Wilson gauge action (i.e., no fermions):

    S_g = S_W := \beta \sum_{x,\,\mu<\nu} \left(1 - \frac{1}{3}\,\mathrm{Re}\,\mathrm{Tr}\, U_{x\mu} U_{x+\hat\mu,\nu} U^\dagger_{x+\hat\nu,\mu} U^\dagger_{x\nu}\right)    (64)

The force term reads

    F^a_{x\mu} := \left.\frac{\partial S_W[e^{i\omega}U]}{\partial \omega^a_{x\mu}}\right|_{\omega=0} = -\frac{\beta}{3}\,\mathrm{Re}\,\mathrm{Tr}\, i T^a U_{x\mu} W_{x\mu} = \frac{\beta}{3}\,\mathrm{Im}\,\mathrm{Tr}\, T^a U_{x\mu} W_{x\mu}    (65)

W denotes the sum of staples at x in direction \mu:

    W_{x\mu} := \sum_{\nu\neq\mu} U_{x+\hat\mu,\nu} U^\dagger_{x+\hat\nu,\mu} U^\dagger_{x\nu} + U^\dagger_{x+\hat\mu-\hat\nu,\nu} U^\dagger_{x-\hat\nu,\mu} U_{x-\hat\nu,\nu}    (66)

HMC for pure lattice gauge theory

For the Wilson gauge action the HMC reads:

1. A trajectory is started by choosing a set of Gaussian distributed momentum components P^a_{x\mu} \in \mathbb{R}. These define the set of initial momenta P = \{P_{x\mu}\} with

    P_{x\mu} = \sum_a i P^a_{x\mu} T^a \in su(3)

   conjugate to the link variables U = \{U_{x\mu}\}.

2. This start configuration (P, U) is evolved along a discretized trajectory, applying n leapfrog steps of size \Delta\tau = \tau/n (typically \tau = 1):

    (P', U') = \underbrace{\left[I_P\!\left(\tfrac{\Delta\tau}{2}\right) I_U(\Delta\tau)\, I_P\!\left(\tfrac{\Delta\tau}{2}\right)\right]^n}_{I(\Delta\tau,\, n)} (P, U)    (67)

   Hamiltonian: H[P, U] = \frac{1}{2}\sum_{x\mu a} (P^a_{x\mu})^2 + S_W[U]

HMC for pure lattice gauge theory

The elementary operations for each leapfrog step are

    I_P(\epsilon): \; P^a_{x\mu} \to P^a_{x\mu} - \epsilon\, F^a_{x\mu}[U]    (68a)
    I_U(\epsilon): \; U_{x\mu} \to e^{i\epsilon P^a_{x\mu} T^a}\, U_{x\mu}    (68b)

Remember that the force term components read F^a_{x\mu} = \frac{\beta}{3}\,\mathrm{Im}\,\mathrm{Tr}\, T^a U_{x\mu} W_{x\mu}, and that we split I_P(\epsilon) into two half steps; the momentum P after the first half step is used for I_U(\epsilon).

3. After n leapfrog steps the new configuration (P', U') is either accepted or rejected using the Metropolis step: we choose a random r \in [0,1) and accept (P', U') if r < P with

    P = \min\{1, e^{-\Delta H}\} \quad\text{where}\quad \Delta H = H' - H

Otherwise (P', U') is rejected and the old configuration is kept, (P', U') = (P, U).

HMC for pure lattice gauge theory

Useful simplification for the implementation of step 2:

    (P', U') = \left[I_P\!\left(\tfrac{\Delta\tau}{2}\right) I_U(\Delta\tau)\, I_P\!\left(\tfrac{\Delta\tau}{2}\right)\right]^n (P, U)

             = I_P\!\left(\tfrac{\Delta\tau}{2}\right) I_U(\Delta\tau)\, \underbrace{I_P\!\left(\tfrac{\Delta\tau}{2}\right) I_P\!\left(\tfrac{\Delta\tau}{2}\right)}_{I_P(\Delta\tau)}\, I_U(\Delta\tau) \cdots I_U(\Delta\tau)\, I_P\!\left(\tfrac{\Delta\tau}{2}\right) (P, U)

             = \underbrace{I_P\!\left(\tfrac{\Delta\tau}{2}\right) I_U(\Delta\tau)}_{\text{final step}}\, \underbrace{I_P(\Delta\tau) I_U(\Delta\tau) \cdots I_P(\Delta\tau) I_U(\Delta\tau)}_{(n-1)\text{ full steps}}\, \underbrace{I_P\!\left(\tfrac{\Delta\tau}{2}\right)}_{\text{first half step}} (P, U)
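A schematic sketch of this rearrangement (added here, not from the original slides); the routine names update_P and update_U are illustrative, with update_P(P, U, eps) standing for I_P(eps) (compute the force from the staples and set P^a \to P^a - \epsilon F^a) and update_U(U, P, eps) standing for I_U(eps) (multiply each link by e^{i\epsilon P^a T^a}):

```fortran
! Sketch only: one HMC trajectory for pure gauge theory, arranged as
! first half step + (n-1) full steps + final step, as derived above.

subroutine hmc_trajectory(U, P, n, dtau)

  complex, intent(inout) :: U(:,:,:,:)   ! links,   U(3,3,volume,4)
  real,    intent(inout) :: P(:,:,:)     ! momenta, P(8,volume,4)
  integer, intent(in)    :: n
  real,    intent(in)    :: dtau
  integer :: i

  call update_P(P, U, 0.5*dtau)          ! first half step  I_P(dtau/2)

  do i = 1, n
     call update_U(U, P, dtau)           ! I_U(dtau)
     if (i < n) then
        call update_P(P, U, dtau)        ! (n-1) full steps I_P(dtau)
     else
        call update_P(P, U, 0.5*dtau)    ! final half step  I_P(dtau/2)
     endif
  end do

end subroutine hmc_trajectory
```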

HMC for lattice QCD with fermions

Up to now we have neglected the contributions to S from dynamical quarks. Adding them changes the action, and in particular the force, in a nontrivial way (the most expensive part of the HMC calculation).

For step 3 the change of H is harmless, but the force F becomes non-local, and this renders the calculation of F^a_{x\mu}[U] computationally expensive for each of the many iterations of step 2:

    I_P(\epsilon): \; P^a_{x\mu} \to P^a_{x\mu} - \epsilon\, F^a_{x\mu}[U]

Calculation of the force F:

- For the plain Wilson gauge action: only staples are needed
- With fermions: an inversion of the Dirac operator is needed for each leapfrog step I_P(\epsilon)

Let us now look at this in more detail.

HMC for lattice QCD with fermions

Lattice action with dynamical fermions:

    S[U, \psi, \bar\psi] = S_g[U] + S_f[U, \psi, \bar\psi]    (69)

Simplest form: Wilson gauge action for the gauge part, S_g = S_W, and plain Wilson fermions for the fermionic part,

    S_f[U, \psi, \bar\psi] = \sum_{q=u,d,s,\ldots} \sum_x \Big\{\bar\psi^q_x \psi^q_x - \kappa_q \sum_{\pm\mu} \bar\psi^q_{x+\hat\mu}\, [1+\gamma_\mu]\, U_{x\mu} \psi^q_x\Big\} \equiv \sum_q \sum_{xy} \bar\psi_x M^{(q)}_{xy} \psi_y    (70)

where

    M^{(q)}_{xy}[U] = \delta_{qq'}\Big(\delta_{xy} - \kappa_q \sum_{\pm\mu} \delta_{y,x+\hat\mu}\, (1+\gamma_\mu)\, U_{x\mu}\Big)    (71)

is the Wilson Dirac operator; notation: \gamma_{-\mu} \equiv -\gamma_\mu and U_{x,-\mu} = U^\dagger_{x-\hat\mu,\mu}.

HMC for lattice QCD with fermions

The expectation value of an observable in lattice QCD is given by

    \langle O \rangle = \frac{1}{Z} \int \mathcal{D}U \, \mathcal{D}\bar\psi \, \mathcal{D}\psi \; O[U,\psi,\bar\psi] \, e^{-S[U,\psi,\bar\psi]}    (72)

After integration over the (Grassmann) fermionic fields,

    \langle O \rangle = \frac{1}{Z} \int \mathcal{D}U \; O[U] \, e^{-S_{\rm eff}[U]}    (73)

with

    S_{\rm eff}[U] = S_g[U] - \ln\det M    (74)

We got rid of the Grassmann numbers, but our effective action is non-local.

How do we deal with that?

Dealing with the fermionic determinant

What about treating the determinant as part of an observable?

    \langle O \rangle = \frac{\langle \det M_u \det M_d \; O[U] \rangle_U}{\langle \det M_u \det M_d \rangle_U}    (75a)

where

    \langle (\bullet) \rangle_U \equiv \frac{1}{Z} \int [\mathcal{D}U]\, e^{-S_g[U]}\, (\bullet)    (75b)

is the path integral of pure gauge theory.

This sounds like a nice idea, but the values of the determinant cover several orders of magnitude ⟹ large fluctuations. Not an option for QCD.


Dealing with the fermionic determinant

The fermionic determinant is considered as part of the probability weight factor for the Markov chain

\[ P[U] = \frac{1}{Z}\, e^{-S_g[U]}\, \det M_u \det M_d \qquad (76) \]

To this end the determinant must be (a) real and (b) positive:

(a) M is γ5-Hermitian: M_{yx} = \gamma_5 M^\dagger_{xy} \gamma_5, and so \det M = \det M^\dagger = (\det M)^* is real.

(b) \det M can be negative: consider two mass-degenerate fermion flavors,

\[ M \equiv M^{(u)} = M^{(d)} \qquad (\kappa_u = \kappa_d) \]

Then \det M^{(u)} \det M^{(d)} = \det M \det M \ge 0 is positive.

It is useful to consider \det M \det M = \det M \det M^\dagger = \det MM^\dagger, where MM^\dagger is Hermitian.

73 / 107


Dealing with the fermionic determinant

OK, fine. But still, how to calculate the determinant? M is sparse but nonetheless a huge matrix.

We use the exact relation [GL]:

\[ \frac{\pi^N}{\det A}\; e^{\sum_{kl} \chi^\dagger_k (A^{-1})_{kl} \chi_l} = \int \prod_{k=1}^N d\phi^R_k\, d\phi^I_k\; e^{-\sum_{kl} \phi^\dagger_k A_{kl} \phi_l + \sum_k (\phi^\dagger_k \chi_k + \chi^\dagger_k \phi_k)} \qquad (77) \]

I A has eigenvalues which all have positive real parts

I φ = φ^R + iφ^I ∈ C; similarly for χ

This is very useful since

1. if A = MM^\dagger we can get M^{-1}_{kl} and \mathrm{Tr}\, M^{-1} from a Gaussian integral (see stochastic noise integration, e.g., of disconnected diagrams)

2. if A = (MM^\dagger)^{-1} we can express \det MM^\dagger as a Gaussian-type integral (this is what we need for the HMC)

74 / 107



HMC for lattice QCD with two mass-degenerate fermions

Introducing a bosonic integral [k, l are multi-indices: k = (x, α, a)]

\[ \det MM^\dagger = \frac{1}{\pi^N} \int [\mathcal{D}\phi^R][\mathcal{D}\phi^I]\; \underbrace{e^{-\sum_{kl} \phi^\dagger_k [MM^\dagger]^{-1}_{kl} \phi_l}}_{=\, e^{-\eta^\dagger\eta}\ \text{where}\ \phi = M\eta} \qquad (78) \]

With this the path integral for the expectation values becomes

\[ \langle O \rangle = \frac{1}{Z} \int [\mathcal{D}U]\; O[U]\; e^{-S_{\rm eff}[U]} = \frac{1}{Z'} \int [\mathcal{D}U][\mathcal{D}\phi^R][\mathcal{D}\phi^I]\; e^{-S_g - \phi^\dagger [MM^\dagger]^{-1}\phi}\; O[U] \qquad (79a) \]

where

\[ Z' = \int [\mathcal{D}U][\mathcal{D}\phi^R][\mathcal{D}\phi^I]\; e^{-S_g - \phi^\dagger [MM^\dagger]^{-1}\phi} \qquad (79b) \]
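The underbrace in (78) is how pseudofermions are generated in practice: draw η with weight e^{−η†η} and set φ = Mη. A minimal Python sketch, with M a dense matrix for illustration only (a real code never forms M explicitly):

    import numpy as np

    def sample_pseudofermion(M, rng=np.random.default_rng()):
        # eta has density ~ exp(-eta^dagger eta): Re and Im parts with variance 1/2
        n = M.shape[0]
        eta = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2.0)
        # phi = M eta is then distributed ~ exp(-phi^dagger (M M^dagger)^{-1} phi)
        return M @ eta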

75 / 107


HMC for lattice QCD with two mass-degenerate fermions

That is, we have to generate a Markov chain of U's with the probability weight

\[ P(U) = \int [\mathcal{D}\phi^R][\mathcal{D}\phi^I]\; e^{-S_g[U] - \phi^\dagger [M(U)M^\dagger(U)]^{-1}\phi} \qquad (80) \]

Remember, we want to generate this Markov chain using the HMC algorithm, and for the step

\[ I_P(\epsilon): P^a_{x\mu} \to P^a_{x\mu} - \epsilon F^a_{x\mu}[U] \]

we need the fermionic part of the force F:

\[ F^a_{x\mu} = \underbrace{\frac{\partial S_W[e^{i\omega}U]}{\partial \omega^a_{x\mu}}\bigg|_{\omega=0}}_{\frac{\beta}{3}\,\mathrm{Im\,Tr}\, T^a U_{x\mu} W_{x\mu}} + \underbrace{\frac{\partial}{\partial \omega^a_{x\mu}} \ln \det M^2[e^{i\omega}U]\bigg|_{\omega=0}}_{??} \]

\[ = \;\dots\dots\; + \int [\mathcal{D}\phi^R][\mathcal{D}\phi^I]\; \frac{\partial}{\partial \omega^a_{x\mu}}\, e^{-\phi^\dagger [M(e^{i\omega}U)\, M^\dagger(e^{i\omega}U)]^{-1}\phi} \bigg|_{\omega=0} \]

76 / 107


HMC for lattice QCD with two mass-degenerate fermions

The contribution from the (two-flavour) fermionic part is (with \mathcal{M} \equiv MM^\dagger)

\[ \partial^a_{x\mu} \big( \phi^\dagger \mathcal{M}^{-1}\phi \big) = -\phi^\dagger \mathcal{M}^{-1} (\partial^a_{x\mu}\mathcal{M})\, \mathcal{M}^{-1}\phi \qquad (81) \]

\[ = -(\mathcal{M}^{-1}\phi)^\dagger \big[ (\partial^a_{x\mu} M)\, M^\dagger + M\, (\partial^a_{x\mu} M^\dagger) \big] (\mathcal{M}^{-1}\phi) \qquad (82) \]

where for the derivative of M we have (next slide)

\[ (\partial^a_{x\mu} M)_{yz} = -\kappa \big[ \delta_{xz}\delta_{y,z+\mu}(1+\gamma_\mu)\, iT^a U_{x\mu} - \delta_{xy}\delta_{y,z-\mu}(1-\gamma_\mu)\, iT^a U^\dagger_{x\mu} \big] \qquad (83) \]

Thus, at the beginning of each trajectory we sample Gaussian-distributed complex η and set φ = Mη, and for each leapfrog step we solve \mathcal{M}[U^{(i)}]\chi^{(i)} = \phi

I using the CG algorithm or any more advanced inversion algorithm.

I \chi^{(i)\dagger} \big[ (\partial^a_{x\mu} M)\, M^\dagger + M\, (\partial^a_{x\mu} M^\dagger) \big] \chi^{(i)} then gives the fermionic part of F^a (see the sketch below).
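A schematic Python version of this per-step computation, with dense matrices and np.linalg.solve standing in for the CG (M, dM and the function name are placeholders for illustration):

    import numpy as np

    def fermion_force_component(M, dM, phi):
        # chi = (M M^dagger)^{-1} phi; a real code obtains chi with the CG
        chi = np.linalg.solve(M @ M.conj().T, phi)
        # (82): derivative of the pseudofermion action phi^dagger (M M^dagger)^{-1} phi
        A = dM @ M.conj().T + M @ dM.conj().T
        return -np.real(chi.conj() @ (A @ chi))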

77 / 107


Derivative of the Wilson-Dirac operator

\[ \partial^a_{x\mu} f[U] \equiv \frac{\partial}{\partial \omega^a_{x\mu}}\, f[e^{i\omega}U] \bigg|_{\omega=0} \]

\[ \Rightarrow\quad \partial^a_{x\mu} U_{z\nu} = \delta_{xz}\delta_{\mu\nu}\, \frac{\partial}{\partial \omega^a_{x\mu}}\, e^{+iT^b\omega^b_{x\mu}}\, U_{x\mu} \bigg|_{\omega=0} = +\delta_{xz}\delta_{\mu\nu}\, iT^a U_{x\mu} \]

\[ \Rightarrow\quad \partial^a_{x\mu} U^\dagger_{z\nu} = \delta_{xz}\delta_{\mu\nu}\, \frac{\partial}{\partial \omega^a_{x\mu}}\, e^{-iT^b\omega^b_{x\mu}}\, U^\dagger_{x\mu} \bigg|_{\omega=0} = -\delta_{xz}\delta_{\mu\nu}\, iT^a U^\dagger_{x\mu} \]

For the derivative of the Wilson–Dirac operator we thus get

\[ (\partial^a_{x\mu} M)_{yz} = \partial^a_{x\mu} \Big( \delta_{yz} - \kappa \sum_{\pm\nu} \delta_{y,z+\nu}(1+\gamma_\nu)\, U_{z\nu} \Big) \]

\[ = \partial^a_{x\mu} \Big( \delta_{yz} - \kappa \sum_{\nu} \big[ \delta_{y,z+\nu}(1+\gamma_\nu)\, U_{z\nu} + \delta_{y,z-\nu}(1-\gamma_\nu)\, U^\dagger_{z-\nu,\nu} \big] \Big) \]

\[ = -\kappa \big[ \delta_{xz}\delta_{y,z+\mu}(1+\gamma_\mu)\, iT^a U_{x\mu} - \delta_{xy}\delta_{y,z-\mu}(1-\gamma_\mu)\, iT^a U^\dagger_{x\mu} \big] \]

78 / 107


HMC for two-flavor lattice QCD
A summary

1. A trajectory is started by choosing

(a) Gaussian-distributed momentum components P^a_{xµ} ∈ R. These define the set of initial momenta P = {P_{xµ}},

\[ P_{x\mu} = \sum_a i P^a_{x\mu} T^a \in \mathfrak{su}(3), \]

conjugate to the link variables U = {U_{xµ}}.

(b) Gaussian-distributed complex numbers η^{a,α}_x ∈ C for the pseudofermion fields φ = Mη, needed for the fermionic force.

2. (P, U) is evolved along a discretized trajectory applying n leapfrog steps of size ∆τ = τ/n (no evolution of φ is needed):

\[ (P',U') = \underbrace{I_P\big(\tfrac{\Delta\tau}{2}\big) I_U(\Delta\tau)}_{\text{final step}}\; \underbrace{I_P(\Delta\tau) I_U(\Delta\tau) \cdots I_P(\Delta\tau) I_U(\Delta\tau)}_{(n-1)\ \text{full steps}}\; \underbrace{I_P\big(\tfrac{\Delta\tau}{2}\big)}_{\text{first half step}} (P,U) \]

79 / 107


HMC for two-flavor lattice QCD
A summary

2. ... The elementary operations for each leapfrog step are

\[ I_P(\epsilon): P^a_{x\mu} \to P^a_{x\mu} - \epsilon F^a_{x\mu}[U] \qquad (84a) \]

\[ I_U(\epsilon): U_{x\mu} \to e^{i\epsilon P^a_{x\mu} T^a}\, U_{x\mu} \qquad (84b) \]

where the force F consists of (Wilson gauge + N_f = 2 Wilson fermions)

\[ \text{gauge part:} \quad \frac{\beta}{3}\,\mathrm{Im\,Tr}\, T^a U_{x\mu} W_{x\mu} \qquad (85) \]

\[ \text{fermionic part:} \quad \chi^{(i)\dagger} \big[ (\partial^a_{x\mu} M)\, M^\dagger + M\, (\partial^a_{x\mu} M^\dagger) \big] \chi^{(i)} \qquad (86) \]

with χ^{(i)} a solution of \mathcal{M}[U^{(i)}]\chi^{(i)} = \phi, obtained from an inversion using the CG or any more advanced inverter.

3. After n leapfrog steps the new configuration U′ is either accepted or rejected: for a random r ∈ [0, 1) it is accepted if

\[ r \le \min\{1, e^{-\Delta H(U,P \to U',P')}\} \]

(a compact driver combining steps 1–3 is sketched below)
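A Python sketch of one trajectory, reusing the leapfrog sketch above (every lattice-specific function here is an assumed placeholder, not the lecture's code):

    import numpy as np

    def hmc_trajectory(U, n, dtau, force, update_links, hamiltonian,
                       sample_momenta, sample_phi, rng=np.random.default_rng()):
        P = sample_momenta(U)                         # step 1(a): Gaussian momenta
        phi = sample_phi(U)                           # step 1(b): phi = M eta
        H_old = hamiltonian(P, U, phi)
        P1, U1 = leapfrog(P, U, n, dtau,              # step 2: MD evolution
                          lambda V: force(V, phi), update_links)
        dH = hamiltonian(P1, U1, phi) - H_old
        if dH <= 0 or rng.random() < np.exp(-dH):     # step 3: Metropolis
            return U1, True
        return U, False                               # rejected: keep old U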

80 / 107


HMC diagnostics

81 / 107


HMC diagnostics

Acceptance rate

I Should be around 70–85%

I If the acceptance rate is 0%, something is wrong with the MD evolution

I If the acceptance rate is 100%, the configurations change only little (large autocorrelation lengths)

I Adjust ∆τ to tune the acceptance rate

Change of Hamiltonian

I ∆H will fluctuate above and below zero. Check that on average

\[ \langle e^{-\Delta H} \rangle = 1 \]

I Check also that ∆H scales with ∆τ² (a quick check of both is sketched below)
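A minimal Python sketch for these checks (dH_history and accepted are assumed arrays collected during the run):

    import numpy as np

    def hmc_checks(dH_history, accepted):
        w = np.exp(-np.asarray(dH_history))
        acc = np.mean(accepted)
        err = w.std() / np.sqrt(len(w))   # naive error, ignores autocorrelations
        print(f"acceptance rate: {acc:.1%} (aim for ~70-85%)")
        print(f"<exp(-dH)> = {w.mean():.3f} +/- {err:.3f} (should be ~1)")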

82 / 107


HMC diagnostics

Figure: History of ⟨e^{−∆H}⟩ over trajectory time τ (acceptance rate: 80.7%)

83 / 107


HMC diagnostics

Reversibility test

I To get the right probability distribution, the MD evolution needs to be reversible

I Check reversibility by integrating forward and then backward (−ε); a sketch follows below

Plaquette

I When one starts generating new configurations, there will be a thermalisation phase

I During this phase the average plaquette ⟨P⟩ will increase or decrease

I Check that the HMC saturates at the same ⟨P⟩, whether starting from a "cold", a "hot" or another configuration
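The reversibility check in code (Python sketch, reusing the leapfrog above; distance is an assumed norm on configurations, e.g. the maximum deviation of the link matrices):

    def reversibility_violation(P, U, n, dtau, force, update_links, distance):
        P1, U1 = leapfrog(P, U, n, dtau, force, update_links)
        # flip the momenta and integrate back; exact reversibility would
        # return (-P, U) up to floating-point rounding
        P2, U2 = leapfrog(-P1, U1, n, dtau, force, update_links)
        return distance(U2, U), distance(-P2, P)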

84 / 107


HMC diagnostics

Figure: History of plaquette values ⟨P⟩ over trajectory time τ

85 / 107


HMC diagnostics

Be aware:

I Autocorrelation times are observable-dependent.

I Monitor the history of values of your observable and check if thermalisation has been reached

I If the “moving average“ has saturated, thermalisation has been reached.

I Drop the unthermalised values and start “measuring” your observable

I Get an estimate for the autocorrelation time.
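An estimate of the integrated autocorrelation time can be scripted in a few lines (Python sketch using a standard self-consistent window; for production analyses use a well-tested implementation):

    import numpy as np

    def tau_int(history, window_factor=6.0):
        x = np.asarray(history, dtype=float)
        x -= x.mean()
        n = len(x)
        norm = np.dot(x, x)
        tau = 0.5
        for t in range(1, n // 2):
            rho = np.dot(x[:n - t], x[t:]) / norm   # normalized autocorrelation
            tau += rho
            if t >= window_factor * tau:            # truncate the noisy tail
                break
        return tau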

86 / 107


Inverting the fermion matrix using the CG

87 / 107


Inverting the Fermion matrix

There are various algorithms to invert the (sparse) fermion matrix, in our case the Hermitian product \mathcal{M} = MM^\dagger for two mass-degenerate fermions.

Popular solvers are

(Bi)CG: (Bi-)Conjugate Gradient algorithm for (non-)symmetric matrices

GCR: Generalized Conjugate Residual for non-symmetric matrices

GMRES: Generalized Minimal RESidual method for non-symmetric matrices

These are so-called Krylov subspace methods.

They do not provide the full inverse A^{−1} of a matrix A, but the action of A^{−1} on some given vector \vec{b}:

\[ \vec{x} = A^{-1}\vec{b} \quad \text{by solving the linear system} \quad A\vec{x} = \vec{b} \]

This is fully sufficient, because we need \mathcal{M}^{-1}\phi and not \mathcal{M}^{-1} itself.

88 / 107


Inverting the Fermion matrix

In practice, preconditioned systems are solved. With the conjugate gradient (CG), for example, instead of solving A\vec{x} = \vec{b} one solves

\[ [A\tilde{A}^{-1}]\,\vec{y} = \vec{b} \quad \text{where} \quad \vec{y} = \tilde{A}\vec{x}. \]

The efficiency increases the more the preconditioner \tilde{A}^{-1}

I is a good approximation of A^{-1}

I can be applied to \vec{y} without significant additional cost

For an illustration of how these "inverters" work, we look at the CG algorithm, but leave any preconditioning aside.

89 / 107


System of conjugate vectors

If for two vectors \vec{u} and \vec{v} it holds that

\[ 0 = \langle \vec{u}, A\vec{v} \rangle \equiv \sum_{kl} u_k A_{kl} v_l \qquad (87) \]

we say \vec{u} and \vec{v} are conjugate to each other with respect to the (symmetric) matrix A.

Suppose we had a set \{\vec{p}_i : \langle \vec{p}_i, A\vec{p}_k \rangle = 0\ \forall\, i \ne k \in [1,n]\} of n mutually conjugate "direction" vectors \vec{p}_k; then we could write the solution of A\vec{x} = \vec{b} as

\[ \vec{x} = \sum_i^n \alpha_i \vec{p}_i \quad \text{where} \quad \alpha_i = \frac{\langle \vec{p}_i, \vec{b} \rangle}{\langle \vec{p}_i, A\vec{p}_i \rangle} \qquad (88) \]

because

\[ \langle \vec{p}_i, \vec{b} \rangle = \langle \vec{p}_i, A\vec{x} \rangle = \sum_k \alpha_k \langle \vec{p}_i, A\vec{p}_k \rangle = \alpha_i \langle \vec{p}_i, A\vec{p}_i \rangle \]

System solved!

90 / 107



The conjugate gradient algorithm

But how do we get the vectors \vec{p}_i? Also, n may be very large.

It turns out we do not need all \vec{p}_i's. A subset is sufficient if it gives a good approximation to \vec{x}, that is, if |\vec{r}| < \epsilon where

\[ \vec{r} = \vec{b} - A\vec{x}' \quad \text{is the residual of} \quad \vec{x}' = \sum_{i \ll n} \alpha_i \vec{p}_i \qquad (89) \]

Note

The full solution \vec{x} of A\vec{x} = \vec{b} minimizes the function

\[ f(\vec{x}) = \tfrac{1}{2}\vec{x}^T A\vec{x} - \vec{b}^T\vec{x}, \qquad (90) \]

because its gradient

\[ \nabla f(\vec{x}) \equiv A\vec{x} - \vec{b} = 0 \]

91 / 107


The conjugate gradient algorithm

For a symmetric matrix A, the CG algorithm tries to find all the relevant direction vectors \vec{p}_i which are conjugate to each other,

\[ \{\vec{p}_i : \langle \vec{p}_i, A\vec{p}_k \rangle = 0\ \forall\, i \ne k \in [1,n]\} \]

The solution (within the desired precision) is then

\[ \vec{x} = \sum_i^n \alpha_i \vec{p}_i \quad \text{where} \quad \alpha_i = \frac{\langle \vec{p}_i, \vec{b} \rangle}{\langle \vec{p}_i, A\vec{p}_i \rangle} \qquad (91) \]

92 / 107


The conjugate gradient algorithm

1. Initial guess: \vec{x}^{(0)} = 0

2. Initial residual: \vec{r}^{(0)} = \vec{b} - A\vec{x}^{(0)} ("minus the gradient")

3. Initial direction: \vec{p}_0 = \vec{r}^{(0)} ("in the direction of minus the gradient")

4. Next (better) solution: \vec{x}^{(1)} = \vec{x}^{(0)} + \alpha_0 \vec{p}_0 where

\[ \alpha_0 = \frac{\langle \vec{p}_0, \vec{b} \rangle}{\langle \vec{p}_0, A\vec{p}_0 \rangle} \]

All the remaining \vec{p}_{i>0} are now chosen conjugate to the gradients \vec{r}^{(i>0)} and to each other.

For this we use Gram–Schmidt orthogonalization, with \langle \vec{p}_k, A\vec{p}_i \rangle taking the role of the inner product \langle \vec{u}, \vec{v} \rangle.

93 / 107


Reminder on the Gram-Schmidt orthogonalization/orthonormalization

Goal: a sequence of orthonormal vectors (normalized and mutually orthogonal)

\[ \vec{e}_1, \vec{e}_2, \dots, \vec{e}_n \qquad |\vec{e}_i| = 1, \quad \vec{e}_i \cdot \vec{e}_k = \delta_{ik} \qquad (92) \]

Gram–Schmidt orthonormalization

Starts with a set of n vectors \vec{v}_1, \vec{v}_2, \dots, \vec{v}_n:

orthogonal: \vec{u}_1 = \vec{v}_1, \qquad orthonormal: \vec{e}_1 = \vec{u}_1/|\vec{u}_1|

\vec{u}_2 = \vec{v}_2 - P_{\vec{u}_1}(\vec{v}_2), \qquad \vec{e}_2 = \vec{u}_2/|\vec{u}_2|

\vec{u}_3 = \vec{v}_3 - P_{\vec{u}_1}(\vec{v}_3) - P_{\vec{u}_2}(\vec{v}_3), \qquad \vec{e}_3 = \vec{u}_3/|\vec{u}_3|

\vdots

\vec{u}_n = \vec{v}_n - \sum_{j<n} P_{\vec{u}_j}(\vec{v}_n), \qquad \vec{e}_n = \vec{u}_n/|\vec{u}_n|

where the projector is

\[ P_{\vec{u}}(\vec{v}) \equiv \frac{\langle \vec{u}, \vec{v} \rangle}{\langle \vec{u}, \vec{u} \rangle}\, \vec{u} \qquad (93) \]
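In code (Python sketch for the ordinary inner product; in the CG the same scheme is used with ⟨u, v⟩ replaced by ⟨u, Av⟩):

    import numpy as np

    def gram_schmidt(vectors):
        es = []
        for v in vectors:
            u = np.array(v, dtype=float)
            for e in es:
                u -= np.dot(e, v) * e          # subtract the projection P_e(v)
            es.append(u / np.linalg.norm(u))   # normalize
        return es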

94 / 107


The conjugate gradient algorithm

1. Residual of the current solution \vec{x}^{(i)}: \vec{r}^{(i)} = \vec{b} - A\vec{x}^{(i)}

2. Choose the next direction conjugate to this residual and to the other \vec{p}_{k<i}:

\[ \vec{p}_i = \vec{r}^{(i)} - \sum_{k<i} \frac{\langle \vec{p}_k, A\vec{r}^{(i)} \rangle}{\langle \vec{p}_k, A\vec{p}_k \rangle}\, \vec{p}_k \qquad \text{(Gram–Schmidt)} \]

3. Next solution \vec{x}^{(i+1)} = \vec{x}^{(i)} + \alpha_i \vec{p}_i where

\[ \alpha_i = \frac{\langle \vec{p}_i, \vec{b} \rangle}{\langle \vec{p}_i, A\vec{p}_i \rangle} = \frac{\langle \vec{p}_i, \vec{r}^{(i-1)} + A\vec{x}^{(i-1)} \rangle}{\langle \vec{p}_i, A\vec{p}_i \rangle} = \frac{\langle \vec{p}_i, \vec{r}^{(i-1)} \rangle}{\langle \vec{p}_i, A\vec{p}_i \rangle} \]

since \langle \vec{p}_i, A\vec{x}^{(i-1)} \rangle = 0.

4. The algorithm stops if |\vec{r}^{(i)}|^2 < \epsilon

Note: \vec{r}^{(i)} points in the direction of steepest descent of f(\vec{x}^{(i)}). Going only along that direction would be the gradient-descent method.

95 / 107


The conjugate gradient algorithm

r_0 := b − A x_0
p_0 := r_0            (first direction = minus the gradient)
k := 0
repeat
    α_k := (r_k^T r_k) / (p_k^T A p_k)
    x_{k+1} := x_k + α_k p_k
    r_{k+1} := r_k − α_k A p_k
    if r_{k+1} is sufficiently small then exit loop
    β_k := (r_{k+1}^T r_{k+1}) / (r_k^T r_k)
    p_{k+1} := r_{k+1} + β_k p_k
    k := k + 1
end repeat

The result is x_{k+1}
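This pseudocode is directly runnable, e.g. in Python with NumPy (a sketch for a Hermitian positive-definite A; in the HMC, the product A·p is replaced by two applications of the sparse Wilson operator, A = MM†):

    import numpy as np

    def cg(A, b, tol=1e-10, maxiter=1000):
        x = np.zeros_like(b)
        r = b - A @ x
        p = r.copy()
        rr = np.vdot(r, r).real
        for _ in range(maxiter):
            Ap = A @ p
            alpha = rr / np.vdot(p, Ap).real
            x = x + alpha * p
            r = r - alpha * Ap
            rr_new = np.vdot(r, r).real
            if np.sqrt(rr_new) < tol:       # residual sufficiently small
                break
            p = r + (rr_new / rr) * p       # beta_k = rr_new / rr
            rr = rr_new
        return x

For A = MM† this yields χ = (MM†)^{−1}φ as needed for the fermionic force.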

96 / 107


HMC: Adding another flavor

97 / 107


“Adding flavor”

The standard HMC algorithm uses the fact that M^{(u)} = M^{(d)}, so \det MM^\dagger is real and positive.

Good approximation of two-flavor QCD, because m_u ≈ m_d.

What about adding heavier quark flavors to the simulation, which do not come in (almost mass-degenerate) pairs? That is, how do we simulate the "+1" quark in current N_f = 2+1 simulations?

This is more difficult, but we are lucky: in the relevant regime \det M^{(q \ge s)} is positive and can be replaced in the MD evolution by a non-negative Hermitian operator.

Two popular approximations are:

1. Rational approximation (see, e.g., lecture by Luscher [L2010])

2. Polynomial approximation (see, e.g., book by Gattringer and Lang [GL2010])

98 / 107



Polynomial Approximation

Here we follow [GL2010]: if \det M is positive, one can approximate M^{-1} by a non-negative Hermitian operator

\[ M^{-1} \approx TT^\dagger \qquad (94) \]

and correct for this approximation in the accept/reject step.

Polynomial approximation (roots z_k = 1 - e^{2\pi i k/(2n+1)}):

\[ M^{-1} \approx P_{2n} \equiv \prod_{k=1}^{n} (M - z_{2k-1})(M - z^*_{2k-1}) \qquad (95) \]

Due to γ5-Hermiticity \det[M - z^*_k] = \det[\gamma_5 M^\dagger \gamma_5 - z^*_k] = \det[M - z_k]^*, and so

\[ P_{2n} = T_n T_n^\dagger \quad \text{where} \quad T_n = \prod_{k=1}^{n} (M - z_{2k-1}) \qquad (96) \]
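For a scalar argument one can check the quality of the approximation directly: with these roots, x P_{2n}(x) = 1 − (1−x)^{2n+1}, so P_{2n}(x) ≈ 1/x for |1 − x| < 1. A small Python sketch, assuming the root formula above:

    import numpy as np

    def P(x, n):
        k = np.arange(1, 2 * n + 1)
        z = 1.0 - np.exp(2j * np.pi * k / (2 * n + 1))   # all 2n roots z_k
        return np.prod(x - z).real

    for x in (0.2, 0.5, 1.0):
        print(x, abs(x * P(x, 8) - 1.0))   # = |1-x|^(2n+1), tiny inside (0, 2)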

99 / 107


Polynomial Approximation

P_{2n} = T_n T_n^\dagger is just an approximation, because the correct determinant is

\[ \det M = C / \det[T_n^\dagger T_n] \quad \text{with} \quad C = \det[M\, T_n^\dagger T_n] \qquad (97) \]

However, it is sufficient to use

\[ S_f = \phi^\dagger T_n^\dagger T_n \phi \qquad (98) \]

for the MD evolution (step 2 of the HMC) and then correct for C once in the Metropolis accept/reject step at the end of the trajectory. This makes the algorithm exact again.

100 / 107


HMC: Improvements

101 / 107


Improving the efficiency of the HMC

The most expensive part of the HMC is the fermionic force, in particular solving \mathcal{M}[U^{(i)}]\chi^{(i)} = \phi for each leapfrog step of a trajectory.

Typical targets for improvements of the HMC in recent years:

I Reducing the number of I_P steps, in particular those where an inversion of the fermion matrix is needed (larger ∆τ, multiple time scales)

I Reducing the CPU time for a single inversion (preconditioning, chronological inverter)

It is beyond the scope of this lecture to cover all the improvements achieved over the last years. We will just highlight a few standard tricks.

102 / 107


Improving the efficiency of the HMC
Achievements over the last years

Figure: From [L2010]: "Number of floating-point operations required for the generation of 100 statistically independent gauge-field configurations in O(a)-improved two-flavour QCD on a 64 × 32³ lattice with spacing a = 0.08 fm. The top curve (Ukawa, 2002) represents the status reported at the memorable Berlin lattice conference in 2001, the middle one was obtained a few years later, using the so-called domain-decomposed HMC algorithm (Luscher, 2005; Del Debbio et al., 2007), and the lowest curve shows the performance of a recently developed deflated version of the latter (Luscher, 2007b)."

103 / 107


Even-Odd Preconditioning

Standard trick for the Wilson–Dirac operator:

\[ M = \begin{pmatrix} M_{ee} & M_{eo} \\ M_{oe} & M_{oo} \end{pmatrix} = \underbrace{\begin{pmatrix} 1 & M_{eo} M_{oo}^{-1} \\ 0 & 1 \end{pmatrix}}_{U} \begin{pmatrix} \hat{M}_{ee} & 0 \\ 0 & M_{oo} \end{pmatrix} \underbrace{\begin{pmatrix} 1 & 0 \\ M_{oo}^{-1} M_{oe} & 1 \end{pmatrix}}_{L} \]

where

\[ \hat{M}_{ee} \equiv M_{ee} - M_{eo} M_{oo}^{-1} M_{oe} \qquad (99) \]

Determinant:

\[ \det M = \det \hat{M}_{ee}\, \det M_{oo} \qquad (100) \]

One needs to invert only \hat{M}_{ee} (i.e. \hat{M}_{ee}\hat{M}_{ee}^\dagger); the inverse of M_{oo} is simple.

Advantages:

I The condition number of \hat{M}_{ee} is typically less than half that of M

I Less memory, because φ_e is only half as long.

(A numerical check of (99)–(100) is sketched below.)
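The Schur-complement identity (100) is easy to verify numerically on a small random block matrix (Python sketch; a Gaussian random matrix is almost surely invertible):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 4
    M = rng.standard_normal((2 * n, 2 * n))
    Mee, Meo = M[:n, :n], M[:n, n:]
    Moe, Moo = M[n:, :n], M[n:, n:]
    Mhat = Mee - Meo @ np.linalg.solve(Moo, Moe)   # Schur complement (99)
    print(np.isclose(np.linalg.det(M),
                     np.linalg.det(Mhat) * np.linalg.det(Moo)))   # -> True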

104 / 107


Multiple step size integration

Force of the MD evolution: F = F_g + F_f. Empirically one finds F_g ≫ F_f, while F_f is the most expensive to compute. One can accelerate the MD evolution by introducing different step sizes

\[ \Delta\tau_g = \frac{\tau}{N_g N_f} \quad \text{and} \quad \Delta\tau_f = \frac{\tau}{N_f} \qquad (101) \]

and splitting the MD evolution. The standard single-scale integrator (n = N_g N_f)

\[ (P',U') = \underbrace{\Big[ I_P\big(\tfrac{\Delta\tau}{2}\big) I_U(\Delta\tau) I_P\big(\tfrac{\Delta\tau}{2}\big) \Big]^n}_{I(\Delta\tau,\,n)} (P,U) \qquad (102) \]

is replaced by a two-scale version

\[ (P',U') = \Big[ I^f_P\big(\tfrac{\Delta\tau_f}{2}\big)\; I_g(\Delta\tau_g)\; I^f_P\big(\tfrac{\Delta\tau_f}{2}\big) \Big]^{N_f} (P,U) \qquad (103) \]

where I^f_P(\epsilon): P \to P - \epsilon F_f[U], \quad I^g_P(\epsilon): P \to P - \epsilon F_g[U], and

\[ I_g(\Delta\tau_g) = \Big[ I^g_P\big(\tfrac{\Delta\tau_g}{2}\big) I_U(\Delta\tau_g) I^g_P\big(\tfrac{\Delta\tau_g}{2}\big) \Big]^{N_g} \qquad (104) \]

Much less computing time, because the fermionic force is not needed in I^g_P (see the sketch below).
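A Python sketch of this nested integrator, with F_f and F_g the fermion and gauge force functions and update_links standing in for I_U (all names are assumptions for illustration):

    def nested_leapfrog(P, U, tau, Nf, Ng, F_f, F_g, update_links):
        dtf = tau / Nf            # coarse step for the expensive fermion force
        dtg = dtf / Ng            # fine step for the cheap gauge force
        for _ in range(Nf):
            P = P - 0.5 * dtf * F_f(U)           # I_P^f(dtau_f / 2)
            for _ in range(Ng):                  # I_g(dtau_g), eq. (104)
                P = P - 0.5 * dtg * F_g(U)
                U = update_links(U, P, dtg)
                P = P - 0.5 * dtg * F_g(U)
            P = P - 0.5 * dtf * F_f(U)           # I_P^f(dtau_f / 2)
        return P, U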

105 / 107


Frequency splitting of determinant

Popular factorizations:

1. RHMC [Kennedy et al. '99, Clark et al. '07]

\[ \det M^2 = \prod_{i=1}^{N} \det(M^2)^{1/N} \qquad (105) \]

2. Hasenbusch trick [Hasenbusch et al. 2001, '03] (mass preconditioning)

\[ \det M^2 = \det \frac{M^2}{M^2 + \mu^2}\; \det(M^2 + \mu^2) \qquad (106) \]

3. Domain decomposition of the lattice into blocks [Luscher 2004]

\[ \det M = \det M_{\rm blocks}\, \det M_{\rm Rest} \qquad (107) \]

Represent the different parts by separate pseudofermions φ_1, φ_2, ... and do the MD evolution of the fermionic force again with a multiple-time-scale integrator.

106 / 107


Thank you for your attention!

107 / 107