
Hybrid Update Algorithms for Regular Lattice and Small-World Ising Models on Graphical Processing Units

A. Leist, K.A. Hawick and D.P. Playne

Computer Science, Institute for Information and Mathematical Sciences, Massey University, North Shore 102-904, Auckland, New Zealand

{ a.leist, k.a.hawick, d.p.playne }@massey.ac.nz

Tel: +64 9 414 0800 Fax: +64 9 441 8181

Abstract: Local and cluster Monte Carlo update algorithms offer a complex tradeoff space for optimising the performance of simulations of the Ising model. We systematically explore tradeoffs between hybrid Metropolis and Wolff cluster updates for the 3D Ising model using data-parallelism and graphical processing units. We investigate performance for both regular lattices and small-world perturbations, where the lattice becomes a generalised graph and locality can no longer be assumed. Despite the customised Compute Unified Device Architecture (CUDA) code optimisations used to implement it, we find that the Wolff cluster update loses out in computational performance to the localised Metropolis algorithm systematically as the small-world rewiring parameter is increased. This manifests itself as a phase transition in the computational performance.

Keywords: GPU; CUDA; Ising model; Wolff; Metropolis

1. Introduction

The Ising model [1] is a thoroughly studied model of a computational ferromagnet. Its popularity over the last decades comes from the fact that it is one of the simplest models of a system of interacting particles with some physically realistic features [2]. Like metal alloys that are magnetic only below a certain critical temperature $T_c$, also known as the Curie temperature, the Ising model undergoes a phase transition when the system temperature transitions from "hot" to "cold".

When the system is in a "hot" state, there is no order in the spin values of its cells. But as the system approaches the critical temperature, clumps of like-like spins begin to form, creating order in the formerly unordered system. However, it has been shown that the phase transition only occurs in dimensions greater than one [1]. The cells in the Ising model have two states, "up" and "down", but extensions to more states can be analysed using the Q-state Potts model [3]. While analytical methods to determine the critical temperature are known for the two-dimensional case [4], no such methods are known for three or more dimensions. Instead, Monte Carlo simulations are often used to approximate the solution using random sampling.

The interactions between neighbouring cells in the ferromagnetic Ising model are defined by a Hamiltonian of the form [5]:

$$H = -\sum_{\langle i,j \rangle} J_{ij} \sigma_i \sigma_j, \qquad (1)$$

where $\sigma_i = \pm 1$ for the sites $i = 1, 2, \ldots, N$. $J_{ij}$, with $|J| = 1/k_B T$, is the ferromagnetic coupling over neighbouring sites $i$ and $j$ on the network, $T$ is the temperature and $k_B$ is the Boltzmann constant. The total energy $E$ of a single configuration is obtained from the Hamiltonian. The magnetisation $M$ is measured from a single configuration as [5]:

$$M = \frac{1}{N} \left| \sum_i \sigma_i \right| \qquad (2)$$
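As an illustration of Eqs. (1) and (2), the following Python/NumPy sketch evaluates both quantities for a periodic 3D lattice. The function name `energy_and_magnetisation`, the uniform `coupling` parameter and the array layout are assumptions made for this example; they are not taken from the implementations in [15] or [17].

```python
import numpy as np

def energy_and_magnetisation(spins, coupling=1.0):
    """Return (H, M) for a periodic 3D array of +1/-1 spins.

    H follows Eq. (1) with a uniform coupling J_ij = coupling, counting each
    nearest-neighbour pair once. M follows Eq. (2). Every side length must be
    at least 3 so that no pair is counted twice through the periodic wrap.
    """
    bond_sum = 0.0
    for axis in range(spins.ndim):
        # Pair every site with its neighbour in the +axis direction.
        bond_sum += np.sum(spins * np.roll(spins, -1, axis=axis))
    energy = -coupling * bond_sum
    magnetisation = abs(spins.sum()) / spins.size
    return energy, magnetisation

# Fully ordered 4x4x4 lattice: 3 bonds per site, all aligned.
all_up = np.ones((4, 4, 4), dtype=np.int8)
H_up, M_up = energy_and_magnetisation(all_up)

# Alternating (checkerboard) lattice: every bond is anti-aligned.
x, y, z = np.indices((4, 4, 4))
alternating = np.where((x + y + z) % 2 == 0, 1, -1)
H_alt, M_alt = energy_and_magnetisation(alternating)
```

The ordered lattice gives the lowest energy and $M = 1$, while the alternating lattice gives the highest energy and $M = 0$.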

The Ising model can also be used to study various other kinds of systems with pairwise correlations between neighbouring nodes, like the propagation of opinions in a social community [6]. However, these models are often more accurately described using irregular graphs instead of the regular lattices traditionally used for the Ising model. Of particular interest are complex structures observed in many natural and social networks. Such networks often exhibit properties that classify them as small-world [7], [8], [9] or scale-free [5] graphs. We are particularly interested in small-world graph structures. Even though Ising model simulations on these networks have been performed in the past [10], [11], [6], the system sizes were rather limited due to the computational complexity involved. Very large systems need to be analysed to determine the scaling of various system properties as the rewiring probability $p$ approaches zero. Here, $p$ is used to rewire a fraction of the edges of the originally regular lattice to obtain a network structure similar to that generated by the Watts-Strogatz small-world model [8]. Of special interest is the change in the critical temperature with respect to the rewiring probability, $T_c(p)$, over several length scales of $p$.
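The rewiring step can be sketched as follows. This is only one plausible reading of the description above: the text does not specify how an edge is rewired, so the rule used here (move the second endpoint of an edge to a random cell, rejecting self-loops and duplicate edges) and the names `cubic_lattice_edges` and `rewire` are assumptions for illustration.

```python
import random

def cubic_lattice_edges(length):
    """Edges of a periodic length^3 simple cubic lattice (length >= 3)."""
    def index(x, y, z):
        return (x * length + y) * length + z
    edges = []
    for x in range(length):
        for y in range(length):
            for z in range(length):
                i = index(x, y, z)
                edges.append((i, index((x + 1) % length, y, z)))
                edges.append((i, index(x, (y + 1) % length, z)))
                edges.append((i, index(x, y, (z + 1) % length)))
    return edges

def rewire(edges, num_nodes, p, rng):
    """Rewire each edge with probability p by moving its second endpoint."""
    present = {frozenset(e) for e in edges}
    rewired = []
    for a, b in edges:
        if rng.random() < p:
            c = rng.randrange(num_nodes)
            # Reject self-loops and duplicates; the old edge is kept in that case.
            if c != a and frozenset((a, c)) not in present:
                present.discard(frozenset((a, b)))
                present.add(frozenset((a, c)))
                b = c
        rewired.append((a, b))
    return rewired

lattice = cubic_lattice_edges(4)
small_world = rewire(lattice, 4 ** 3, 0.01, random.Random(0))
```

The number of edges is preserved, so the mean degree of the rewired graph equals that of the regular lattice.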

To tackle this problem, we have parallelised two commonly used Markov chain Monte Carlo algorithms to run on the highly data-parallel architecture of modern graphics processing units (GPUs), specifically those based on the Compute Unified Device Architecture (CUDA) [12]. The first one is the Metropolis-Hastings algorithm [13], [14], which selects a random cell at every time step and proposes to flip its spin. The new configuration is always accepted if its energy is not higher than that of the current configuration, $\Delta E \le 0$. Otherwise, it is accepted with probability $\exp(-\Delta E / k_B T)$, which depends on the change in energy and the system temperature $T$. The parallel implementation of this algorithm is described in [15], where all $N$ cells of the system are updated during each simulation step.
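As a rough illustration of the acceptance rule above, the following Python/NumPy sketch performs one Metropolis sweep over all cells of a regular lattice. It uses a checkerboard decomposition, which is a common way to update non-interacting cells simultaneously; whether [15] uses the same scheme is not stated here, so this is an assumption, as are the function name `metropolis_sweep` and the dimensionless parameter `beta` $= J / k_B T$. The sketch does not apply to rewired lattices.

```python
import numpy as np

def metropolis_sweep(spins, beta, rng):
    """One checkerboard Metropolis sweep over a periodic 3D lattice, in place.

    beta is the dimensionless coupling J / (k_B T). All side lengths must be
    even so that cells of the same colour are never neighbours.
    """
    x, y, z = np.indices(spins.shape)
    parity = (x + y + z) % 2
    for colour in (0, 1):
        # Sum of the six nearest neighbours of every cell.
        neighbours = sum(np.roll(spins, shift, axis)
                         for axis in range(3) for shift in (1, -1))
        delta_e = 2.0 * beta * spins * neighbours  # Delta E / (k_B T)
        # Always accept if Delta E <= 0, otherwise with probability exp(-Delta E).
        accept = rng.random(spins.shape) < np.exp(-delta_e)
        flip = accept & (parity == colour)
        spins[flip] *= -1
    return spins

rng = np.random.default_rng(0)
hot = metropolis_sweep(np.ones((4, 4, 4), dtype=np.int8), 0.0, rng)
cold = metropolis_sweep(np.ones((4, 4, 4), dtype=np.int8), 10.0, rng)
```

At `beta = 0` every proposed flip is accepted, so the ordered lattice is inverted completely. At large `beta` a flip out of the ordered state is accepted with probability $e^{-120}$ and is effectively never observed.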

The Metropolis algorithm is shown to perform very well on the parallel architecture of the GPU, and significant speed-ups compared to a sequential CPU implementation are achieved. However, this does not improve upon an issue known as the critical slow down, which is inherent to the local update dynamics of this algorithm. The localised nature of the interactions between individual cells and the size of correlated regions near the critical temperature make it necessary to perform many system updates to obtain two system configurations that are sufficiently decorrelated to be considered independent.

Fig. 1: A $64 \times 64 \times 64$ Ising model near criticality.

Cluster updates like those performed by the Wolff algorithm [16] do not suffer from the critical slow down, as they update entire clusters of cells with like spins during every simulation step. The algorithm works by picking a random cell $i$ and marking it as the first site of a cluster. It then visits all neighbouring cells $j$ and adds them to the cluster if their spins $\sigma_i$ and $\sigma_j$ are equal and if the bond is activated, which happens with probability $p(\langle i,j \rangle) = 1 - e^{-2\beta}$, where $\beta$ is the reciprocal temperature. The algorithm continues iteratively until no new cells are added to the cluster. Then the spin values of the entire cluster are flipped and the simulation step is completed.
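The cluster growth described above can be sketched sequentially in Python. The adjacency-list representation, which covers both the regular and the rewired lattice, and the function name `wolff_step` are assumptions made for this example. It is a plain sequential sketch and not the parallel GPU implementation of [17].

```python
import math
import random

def wolff_step(spins, neighbours, beta, rng):
    """Grow and flip one Wolff cluster in place; return the cluster size.

    spins      -- list of +1/-1 values, one per cell
    neighbours -- neighbours[i] lists the cells adjacent to cell i
    beta       -- reciprocal temperature in units of the coupling
    """
    p_bond = 1.0 - math.exp(-2.0 * beta)
    seed = rng.randrange(len(spins))
    seed_spin = spins[seed]
    in_cluster = {seed}
    frontier = [seed]
    while frontier:
        i = frontier.pop()
        for j in neighbours[i]:
            # A bond to a like-spin cell outside the cluster is activated
            # with probability p_bond; each bond is tested at most once.
            if (j not in in_cluster and spins[j] == seed_spin
                    and rng.random() < p_bond):
                in_cluster.add(j)
                frontier.append(j)
    for i in in_cluster:
        spins[i] = -seed_spin
    return len(in_cluster)

# Ring of eight like spins as a minimal test graph.
ring = [[(i - 1) % 8, (i + 1) % 8] for i in range(8)]
cold_spins = [1] * 8
cold_size = wolff_step(cold_spins, ring, 50.0, random.Random(0))
hot_spins = [1] * 8
hot_size = wolff_step(hot_spins, ring, 0.0, random.Random(0))
```

At very large `beta` every bond is activated and the whole ring flips, whereas at `beta = 0` no bond is activated and the cluster consists of the seed cell only.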

While [17] describes how the Wolff algorithm for both regular lattice and small-world Ising models can be parallelised on the GPU, the performance is significantly lower than that of the Metropolis algorithm on the GPU. It is therefore interesting to investigate whether the higher performance of the Metropolis updates or the avoidance of the critical slow down with the Wolff algorithm produces decorrelated system configurations more quickly, and what effect the different graph structures have on the results. Going one step further, we also investigate whether interleaving both algorithms at various ratios offers a performance advantage.

The Ising model simulation has two distinct phases. The system is initialised randomly and must be allowed to settle around its equilibrium energy before its properties are representative of the given parameters. This first phase is discussed in section 2. Once the system is equilibrated, the actual simulation begins, where decorrelated system configurations are generated and various system properties can be obtained from these independent configurations. This is discussed in section 3. Finally, section 4 offers a discussion and conclusions.

2. System Equilibration

Starting from a random configuration, the system is quenched until it reaches its equilibrium state. We look at the performance of the Metropolis algorithm on its own, but do not consider the Wolff algorithm on its own in this phase, as it performs very poorly with the initially random system. The reason for this is that the clusters are very small as a result of the randomness, which is not a good situation for the parallel implementations of the Wolff algorithm. However, two hybrid combinations of Metropolis and Wolff updates, with M:W ratios of 1:1 and 4:1, are tested. The latter test case is used to investigate whether a relatively small number of the slower Wolff updates is enough to counter the effects of the critical slow down that reduces the efficiency of the fast Metropolis updates. These algorithmic combinations are tested on regular lattice and small-world rewired lattice structures in three dimensions. Only the results for rewiring probability $p = 10^{-4}$ are shown for the small-world structures, as they are representative of all other tested rewiring probabilities in the range $p = 10^{-2}$ to $p = 10^{-7}$ in this phase.
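One simple way to express such an M:W ratio in code is a repeating schedule of update labels. The ordering within one cycle (all Metropolis updates first, then the Wolff updates) and the name `hybrid_schedule` are assumptions for this sketch; the text only specifies the ratios.

```python
from itertools import cycle, islice

def hybrid_schedule(metropolis_steps, wolff_steps):
    """Return an endless iterator over 'M' and 'W' in the given M:W ratio."""
    pattern = ["M"] * metropolis_steps + ["W"] * wolff_steps
    return cycle(pattern)

# First ten updates of the 4:1 ratio.
first_ten = list(islice(hybrid_schedule(4, 1), 10))
```

A driver loop would then call a Metropolis sweep for every `'M'` and a Wolff cluster update for every `'W'`; the pure Metropolis case corresponds to `hybrid_schedule(1, 0)`.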

The system temperature $T$ is set to the best approximation of the critical temperature $T_c(p)$ for the given configuration. From previous studies [18], [19], [20], this is known to be $T_c(p = 0) \approx 4.5115$ for the regular lattice Ising model. However, no good estimates are currently available for the small-world lattice with rewiring probabilities as small as those considered here. It is the eventual aim of this study to determine these critical temperatures to a high precision. Therefore, we obtain initial rough estimates by computing Binder's fourth-order magnetisation cumulant [21] over a relatively large temperature range with a large step size. The result from this is $T_c(p = 10^{-4}) \approx 4.514$, which is then used to calibrate the equilibration phase for more thorough simulation runs.
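For reference, the fourth-order cumulant is commonly written as $U_4 = 1 - \langle M^4 \rangle / (3 \langle M^2 \rangle^2)$. The sketch below computes it from a sample of magnetisation values; the function name `binder_cumulant` and the input format are assumptions for this example, and the exact normalisation used in the study is not stated in the text.

```python
import numpy as np

def binder_cumulant(magnetisations):
    """Fourth-order cumulant U_4 = 1 - <M^4> / (3 <M^2>^2).

    magnetisations -- magnetisation values measured from independent
                      configurations at one temperature and system size
    """
    m = np.asarray(magnetisations, dtype=float)
    m2 = np.mean(m ** 2)
    m4 = np.mean(m ** 4)
    return 1.0 - m4 / (3.0 * m2 ** 2)

# A perfectly ordered sample, where every measurement gives the same |M|.
ordered = binder_cumulant([0.5] * 10)
```

In the ordered limit the cumulant approaches 2/3. Curves of $U_4$ against temperature for different system sizes cross near $T_c$, which is what makes the cumulant useful for locating the critical temperature.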

Essentially, the aim is to ensure that the system is equilibrated but to avoid an unnecessarily large number of system updates. Figure 2 illustrates the change in the system energy over a number of simulation steps and with respect to time for the different algorithmic combinations. The generally more important result is the time to equilibrate the system, as it directly affects the wall clock time required to execute the simulation.

The results for the regular lattice Ising simulation show that all tested algorithmic ratios equilibrate the system in approximately the same number of steps. It is important to note that only the bottom end of the energy range, where the system settles into its equilibrium state, is shown in the graph. The results of the change in energy with respect to time, on the other hand, show the measurements over the entire energy spectrum. The step functions visible for ratios 4:1 and 1:1 on this plot also reinforce the aforementioned issues of the Wolff cluster updates with the initially random system configuration. During the first few update steps, the Wolff algorithm has almost no effect on the system energy and all the work is performed by the Metropolis updates. The Metropolis algorithm on its own thus equilibrates the system in less time than the hybrid ratios.

The results for the small-world rewired lattice also show that all tested combinations of Metropolis and Wolff updates equilibrate the system in nearly the same number of simulation steps. Although the mean energy measured for the 1:0 ratio is slightly higher than that for the hybrid ratios, the difference is within the region of uncertainty illustrated by the inset to the plot. The plot of the energy with respect to time clearly emphasises the performance difference between the Metropolis and Wolff algorithms. Once again, the step functions visible in the hybrid update measurements illustrate that the Wolff algorithm is not a good choice for this phase.

To verify that the system is indeed equilibrated, the histogram of the energy distribution over a number of simulation steps is plotted as shown in Figure 3. The approximately normal distribution of the values is characteristic of random fluctuations around a mean, in this case the equilibrium energy of the system. From these results, we can be confident that the system with $p = 10^{-4}$ is equilibrated after 3000 simulation steps of any of the 1:0, 1:1 or 4:1 ratios.

Fig. 2: The plots illustrate the equilibration phase of the Ising model on a regular lattice (top) and rewired lattice with $p = 10^{-4}$ (bottom). The system size is $N = 128^3$ for the regular lattice and $N = 384^3$ for the small-world graph. The change in energy is shown with respect to the number of simulation steps (left) and with respect to time (right). Each data point is averaged over 30 simulation runs. The insets to the plots on the left illustrate the standard deviations for ratios 1:0 and 1:1 to give an idea of the extent of the uncertainty at this level of magnification. The uncertainty in the energy measurements of the timing results is negligible relative to the given energy range, whereas the time measurements themselves show a significant error for some of the data points, as illustrated by the horizontal error bars.

3. Decorrelated Measurements

Only when the system has reached its equilibrium state for a given temperature can meaningful measurements of its characteristics be taken. But another factor needs to be considered to obtain statistically relevant results, namely the correlation between successive system states. As demonstrated in Figure 4, every Metropolis or Wolff step only truly updates a fraction of the cells. This is due to the correlated regions of like-like bonds caused by the local update mechanics of the Metropolis algorithm and the fact that only a limited number of cells are part of any one Wolff cluster, especially in the region around the critical temperature. It is therefore necessary to perform enough update steps between measurements to sufficiently decorrelate the system. We use the Pearson product-moment correlation coefficient $\rho$ [22] to quantify the correlation between two populations. It is defined as:

$$\rho_{X,Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}, \qquad (3)$$

where $E$ is the expected value operator, $\mu_X$ and $\mu_Y$ are the expected values and $\sigma_X$ and $\sigma_Y$ are the standard deviations of populations $X$ and $Y$ respectively. The coefficient $\rho$ is $+1$ in the case of perfect correlation and $-1$ in the case of perfect anticorrelation. Here, either case is equally undesirable and, thus, the results presented in this paper use the absolute value of the correlation coefficient $|\rho|$. While a coefficient of zero indicates that the metric does not detect any correlation between the two system states, trying to obtain this level of decorrelation would be inefficient. Therefore, two system states are deemed to be sufficiently decorrelated to be used for independent measurements of system properties when $|\rho| \le 10^{-2}$.

Fig. 3: The histogram shows the energy distribution for simulation steps 3000 to 10000 of the 1:1 hybrid Metropolis-Wolff update ratio and rewiring probability $p = 10^{-4}$. The approximately normal distribution of the values indicates that the system is equilibrated.
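The decorrelation criterion based on Eq. (3) can be sketched as follows, under the assumption that the two populations $X$ and $Y$ are the spin values of a reference configuration and of a later configuration. The names `configuration_correlation` and `THRESHOLD` are chosen for this example only.

```python
import numpy as np

THRESHOLD = 1e-2  # configurations count as decorrelated when |rho| <= 10^-2

def configuration_correlation(reference, current):
    """Absolute Pearson coefficient |rho| between two spin configurations.

    Both inputs are arrays of +1/-1 spins with the same shape. Neither may be
    perfectly uniform, because its standard deviation would then be zero.
    """
    x = np.asarray(reference, dtype=float).ravel()
    y = np.asarray(current, dtype=float).ravel()
    covariance = np.mean((x - x.mean()) * (y - y.mean()))
    return abs(covariance / (x.std() * y.std()))

rng = np.random.default_rng(1)
a = rng.choice([-1, 1], size=(32, 32, 32))
b = rng.choice([-1, 1], size=(32, 32, 32))
same = configuration_correlation(a, a)       # perfect correlation
inverted = configuration_correlation(a, -a)  # perfect anticorrelation
independent = configuration_correlation(a, b)
```

Taking the absolute value treats a configuration and its inverse as equally correlated, as required above. For two independent random configurations of $N$ cells, $|\rho|$ is typically of order $1/\sqrt{N}$.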

Figure 4 shows how the correlation decreases with the number of system updates and, once again more relevant to the actual execution of the simulation, how it changes with respect to time. The same hybrid algorithmic ratios used during the equilibration phase are tested again. As before, no results for the Wolff algorithm on its own are shown. The reason to omit them in the already equilibrated system is that the difference in performance compared to the Metropolis algorithm is too great to even consider using only Wolff updates when running the simulations on the GPU. Instead, additional results for the small-world rewired lattice with $p = 10^{-2}$, $T_c \approx 4.592$ and system size $N = 352^3$ are given. As demonstrated in the results of the equilibration phase, the Metropolis algorithm parallelises significantly better than the Wolff cluster updates. It is therefore interesting to observe whether any of the hybrid algorithmic ratios performs better than the Metropolis algorithm on its own and whether the results are the same for all tested graph structures.

The plots for the results of the regular lattice Ising simulation illustrate that the hybrid updates, with Metropolis and Wolff steps interleaved at different ratios, indeed require fewer simulation steps to decorrelate the system configuration to a given degree than the Metropolis algorithm on its own. This is the expected result because, due to the critical slow down effect, the local Metropolis updates are less efficient near the phase transition. However, as the measurements with respect to the wall clock time show, the sheer speed advantage of the data-parallel CUDA implementation of the Metropolis algorithm over the respective implementation of the Wolff algorithm more than makes up for the reduced efficiency.

The results for the small-world Ising simulation give an interesting insight into the effects of the rewired lattice structure on the dynamics of the system. With the small rewiring probability $p = 10^{-4}$, the Wolff algorithm still reduces the number of simulation steps required to sufficiently decorrelate the system. However, the results are reversed for $p = 10^{-2}$, where the Metropolis algorithm on its own decorrelates the system in fewer steps than the Wolff algorithm. All of the tested ratios produce independent measurements in significantly fewer steps when compared to $p = 10^{-4}$ and even reach an apparent minimum correlation value of approximately $10^{-3.6}$, after which additional simulation steps no longer produce a noticeable change. The explanation for this is that the perturbations to the lattice created during the rewiring procedure help reduce the effects of the critical slow down, as they facilitate long distance interactions that are otherwise not possible. While this benefits both algorithms, the Metropolis updates clearly profit more. The results for the execution time once again show that the much higher performance of the Metropolis updates offsets any advantage offered by the Wolff algorithm for the small-world simulations.

Fig. 4: The graphs show how the correlation between system configurations of the Ising model changes with respect to the number of system updates (left) and with respect to time (right). Different ratios of the Metropolis and Wolff (M:W) update algorithms are compared. The results for simulations on regular lattices (top) and small-world rewired lattices with $p = 10^{-4}$ and $p = 10^{-2}$ (bottom) are given. The system size is $N = 128^3$ for the regular lattice, and $N = 384^3$ and $N = 352^3$ for the small-world systems with $p = 10^{-4}$ and $p = 10^{-2}$ respectively. Each data point is averaged over 3000 measurements from 30 independent simulation runs. Measurements are taken at intervals of 200 simulation steps (the reference configuration), to which the subsequent 200 system configurations are compared. The insets illustrate the standard deviations from the correlation measurements as error bars. In addition, the second inset to the plot on the lower right shows the error in the time measurements.

4. Discussion & Conclusions

To determine the scaling of the critical temperature with respect to the rewiring probability, tens of millions of system updates have to be performed for every combination of system size, rewiring probability and temperature. Many such combinations are necessary to compute a good approximation of the critical temperature from the cross over of several Binder cumulant curves. To get a single data point on a Binder cumulant curve, a large number of magnetisation values need to be computed from independent system configurations. To make matters even more difficult, the system size required to reliably support a given rewiring probability and remain in the small-world regime must be much larger than the reciprocal of $p$ [23].
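A short calculation shows why such small rewiring probabilities demand very large systems. The sketch assumes a periodic simple cubic lattice with three edges per cell and that every edge is rewired independently with probability $p$; the function name `expected_rewired_edges` is hypothetical.

```python
def expected_rewired_edges(side_length, p, edges_per_cell=3):
    """Expected number of rewired edges in a periodic side_length^3 lattice."""
    num_cells = side_length ** 3
    return edges_per_cell * num_cells * p

cells = 512 ** 3                              # number of cells N
shortcuts = expected_rewired_edges(512, 1e-7)  # roughly 40 rewired edges
small = expected_rewired_edges(128, 1e-7)      # well below one rewired edge
```

With $p = 10^{-7}$, a system of $N = 128^3$ cells would on average contain less than one rewired edge and would therefore be indistinguishable from the regular lattice in most runs, while $N = 512^3$ yields about 40 shortcuts.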

Highly parallel architectures are needed to complete the enormous amount of computation demanded by these simulations in a realistic amount of time. The GPU has proven to be an excellent choice for affordable, high performance simulations of a large number of complex models. The Metropolis algorithm discussed here can utilise the data-parallel architecture of the GPU particularly well, making it a good choice for the Ising model system updates. Although the Wolff cluster update algorithm can decorrelate system configurations in fewer steps when the rewiring probability is very small, it cannot keep up with the Metropolis updates in terms of raw performance. We also show that for larger values of $p$, the avoidance of the critical slow down, usually one of the big advantages of the Wolff cluster algorithm over the localised Metropolis updates, becomes significantly less relevant. The random shortcuts introduced to the lattice structure essentially have the same effect, as they enable long distance interactions between different parts of the system.

We have demonstrated that interleaving different update algorithms can reduce the number of simulation updates needed to decorrelate the system configuration. But which algorithm or combination thereof gives the best performance is highly dependent on the specifics of the hardware architecture and system parameters used for the simulation. For the Fermi-architecture based GPUs used during our tests, the Metropolis algorithm always offers better performance, in terms of the wall clock time to complete the simulation, than any hybrid combination with the Wolff algorithm.

Independent of the algorithmic decisions, it is worthwhile to invest some time into the optimisation of the different phases of the simulation. Any savings in the time required to generate independent system configurations add up particularly quickly with the large number of measurements typically needed for statistically reliable results.

With the parallel CUDA implementations of the small-world rewired lattice Metropolis and Wolff algorithms [15], [17] and based on the optimisations of the execution configurations for the different phases of the simulation described in this paper, it was possible to improve the estimate of the critical temperature for rewiring probabilities as small as $p = 10^{-7}$ and system sizes of up to $N = 512^3$ cells [24].

References

[1] E. Ising, “Beitrag zur Theorie des Ferromagnetismus,” Zeitschrift fuer Physik A Hadrons and Nuclei, vol. 31, no. 1, pp. 253–258, 1925.

[2] G. F. Newell and E. W. Montroll, “On The Theory Of The Ising Model Of Ferromagnetism,” Reviews of Modern Physics, vol. 25, no. 2, pp. 353–389, 1953.

[3] R. B. Potts, “Some Generalized Order-Disorder Transformations,” in Proceedings of the Cambridge Philosophical Society, vol. 48, no. 1, 1952.

[4] L. Onsager, “Crystal Statistics. I. A Two-Dimensional Model with an Order-Disorder Transition,” Physical Review, vol. 65, no. 3-4, pp. 117–149, 1944.

[5] J. J. Binney, N. J. Dowrick, A. J. Fisher, and M. E. J. Newman, The Theory of Critical Phenomena - An Introduction to the Renormalization Group. Oxford University Press, 1992.

[6] P. Svenson, “Damage spreading in small world Ising models,” Physical Review E, vol. 65, no. 3, p. 036105, 2002.

[7] S. Milgram, “The Small-World Problem,” Psychology Today, vol. 1, pp. 61–67, 1967.

[8] D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature, vol. 393, no. 6684, pp. 440–442, June 1998.

[9] M. E. J. Newman, “Models of the Small World,” Journal of Statistical Physics, vol. 101, no. 3-4, pp. 819–841, November 2000.

[10] A. Barrat and M. Weigt, “On the properties of small-world network models,” The European Physical Journal B, vol. 13, no. 3, pp. 547–560, 2000.

[11] C. P. Herrero, “Ising model in small-world networks,” Physical Review E, vol. 65, no. 6, p. 066110, 2002.

[12] NVIDIA® Corporation, “The Compute Unified Device Architecture (CUDA),” http://developer.nvidia.com/ (last accessed March 2012).

[13] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, “Equation of State Calculations by Fast Computing Machines,” Journal of Chemical Physics, vol. 21, no. 6, pp. 1087–1092, 1953.

[14] W. K. Hastings, “Monte-Carlo Sampling Methods Using Markov Chains And Their Applications,” Biometrika, vol. 57, no. 1, pp. 97–107, 1970.

[15] K. A. Hawick, A. Leist, and D. P. Playne, “Regular Lattice and Small-World Spin Model Simulations using CUDA and GPUs,” International Journal of Parallel Programming, vol. 39, no. 2, pp. 183–201, April 2011.

[16] U. Wolff, “Comparison Between Cluster Monte Carlo Algorithms in the Ising Model,” Physics Letters B, vol. 228, no. 3, pp. 379–382, 1989.

[17] K. A. Hawick, A. Leist, and D. P. Playne, “Cluster and Fast-Update Simulations of Regular and Rewired Lattice Ising Models Using CUDA and Graphical Processing Units,” Massey University, Tech. Rep. CSTN-104, 2010. [Online]. Available: http://www.massey.ac.nz/~kahawick/cstn/

[18] A. M. Ferrenberg and D. P. Landau, “Critical behavior of the three-dimensional Ising model: A high-resolution Monte Carlo study,” Physical Review B, vol. 44, no. 10, pp. 5081–5091, September 1991.

[19] G. S. Pawley, R. H. Swendsen, D. J. Wallace, and K. G. Wilson, “Monte Carlo renormalization-group calculations of critical behavior in the simple-cubic Ising model,” Physical Review B, vol. 29, no. 7, pp. 4030–4040, 1984.

[20] C. F. Baillie, R. Gupta, K. A. Hawick, and G. S. Pawley, “Monte Carlo renormalization-group study of the three-dimensional Ising model,” Physical Review B, vol. 45, no. 18, pp. 10 438–10 453, 1992.

[21] K. Binder, “Finite Size Scaling Analysis Of Ising Model Block Distribution Functions,” Zeitschrift fuer Physik B, vol. 43, no. 2, pp. 119–140, 1981.

[22] K. Pearson, “Note on Regression and Inheritance in the Case of Two Parents,” Proceedings of the Royal Society, vol. 58, pp. 240–242, 1895.

[23] M. Barthelemy and L. A. N. Amaral, “Small-World Networks: Evidence for a Crossover Picture,” Physical Review Letters, vol. 82, no. 15, pp. 3180–3183, 1999.

[24] A. Leist, “Experiences in Data-Parallel Simulation and Analysis of Complex Systems with Irregular Graph Structures,” Ph.D. dissertation, Massey University, Auckland, New Zealand, November 2011. [Online]. Available: http://hdl.handle.net/10179/2992
