

Worldline algorithm by oracle-guided variational autoregressive network

Zhifang Shi,1 Yuchuang Cao,1 Qiangqiang Gu,1 and Ji Feng1,2,3,*

1 International Center for Quantum Materials, School of Physics, Peking University, Beijing 100871, China
2 Collaborative Innovation Center of Quantum Matter, Beijing 100871, China
3 CAS Center for Excellence in Topological Quantum Computation, University of Chinese Academy of Sciences, Beijing 100190, China

(Dated: December 29, 2020)

The classical variational autoregressive network is extended to the Euclidean path integral representation of the quantum partition function. An essential challenge is adapting the sequential process of sample generation by an autoregressive network to a nonlocal constraint due to the periodic boundary condition in the path integral. An add-on oracle is devised for this purpose, which accurately identifies and stalls unviable configurations on the fly. The oracle leads to rejection-free sampling conforming to the periodic boundary condition. As a demonstration, the oracle-guided autoregressive network is applied to obtain variational solutions of quantum spin chains at finite temperatures with relatively large system sizes and numbers of time slices, which enables efficient computation of thermodynamic quantities.

The Euclidean path integral [1] representation of a quantum mechanical partition function creates a bridge between quantum and classical statistical mechanics, and has engendered continued developments in the theory of quantum many-body systems. For a lattice Hamiltonian $H$, the imaginary time $\tau$-axis $[0, \beta]$ can be discretized into slices separated by $\Delta\tau$, and the Boltzmann measure becomes a product of $e^{-\Delta\tau H}$ evaluated on the spacetime lattice, with a suitable boundary condition. This reduces the quantum statistical problem to a classical one in $(d + 1)$ dimensions, whose configuration can often be represented conveniently by worldlines. Numerical techniques, often referred to as worldline algorithms, have been developed to evaluate the partition function, based mostly on the Monte Carlo method [2–5], and are especially powerful for boson and quantum spin problems.

Recently, Wu et al. proposed a variational approach to classical statistical physics based on an autoregressive network [6]. In this approach, the distribution function of a classical system is factored into a product of conditional probabilities that can be processed naturally and optimized by an autoregressive network. This variational autoregressive network (VAN) approach to classical statistical mechanics is free of Markov chains and, consequently, highly parallelizable. Furthermore, because its sample generation carries minimal correlation, the VAN method may avoid critical slowing down. VAN has also been adapted to represent wavefunctions of a many-body system, enabling a variational approach for obtaining ground state wavefunctions [7].

In this paper, we report an investigation of the VAN approach to quantum statistical problems. For a quantum statistical problem at finite temperatures, an essential obstacle to adapting VAN to a path integral is the constraint that the worldlines must be periodic in $\tau$. This nonlocal constraint is not easily expressible in the autoregressive network, which parses and predicts configurations consecutively on a spacetime lattice. This problem is overcome by an add-on oracle, which efficiently identifies and stalls dead ends among intermediate configurations in the sequential sample generation process. The oversight by the oracle ensures efficient sampling of the spacetime configurations, which are then parsed and optimized by the VAN to reach thermal equilibrium. We illustrate our method using the XXZ quantum spin chain model. It is found that thermodynamic quantities can be computed efficiently and accurately for antiferromagnetic and XY-like regimes, establishing a new technique for exploring problems in quantum many-body systems at finite temperatures.

For concreteness, we consider a quantum spin-1/2 model on a 1-dimensional lattice with $N$ (even) sites, and the Hamiltonian involves only interactions between spins connected by nearest-neighbor bonds. The XXZ model with a Zeeman field $B$,

\[
H = J \sum_i \left( s^x_i s^x_{i+1} + s^y_i s^y_{i+1} + \lambda s^z_i s^z_{i+1} \right) - B \sum_i s^z_i , \tag{1}
\]

breaks the SU(2) symmetry of the Heisenberg model down to the U(1) group, corresponding to rotation of every spin around the $z$-axis. Here the in-plane coupling $J$ is used as the natural energy unit, and $\lambda = J_z/J$ is the anisotropy factor. Periodic boundary condition is assumed. The spin operators $s^\alpha_i$ ($\alpha = x, y, z$) satisfy the commutation relations $[s^\alpha_i, s^\beta_j] = i\delta_{ij}\epsilon^{\alpha\beta\gamma} s^\gamma_i$. The U(1) symmetry means that the $z$-component of the total spin, $S^z = \sum_i s^z_i$, is conserved. Following the checkerboard decomposition, $H$ can be divided into two separately commuting parts, involving even and odd bonds only. Thus, upon discretization of the imaginary time axis into $2M$ slices, the partition function is [3,5,8]

\[
Z = \mathrm{tr}\, e^{-\beta H} = \sum_c \prod_{\pi_c} w(\pi_c), \tag{2}
\]

where $c = \{s(m, n) \mid m = 0, \cdots, 2M-1,\ n = 0, \cdots, N-1\}$ is the spin configuration on the spacetime lattice and $w(\pi_c)$ is the plaquette weight, to be discussed shortly. As a result of the checkerboard decomposition, the spacetime is partitioned into a $2M \times N$ checkerboard, as shown in Fig. 1(a). For each configuration $c$, one can connect spin-up sites with non-intersecting and periodic paths, or loops, as illustrated in Fig. 1(a). A loop starts at $\tau = 0$ and ends at $\tau = \beta$ on the same lattice point and, owing to the spin conservation, can only travel between adjacent time slices either vertically or diagonally across a dark square. Each spacetime lattice point the loops visit takes the spin-up state, whereas an unvisited lattice point is in the spin-down state. A dark square decorated by the loop segments passing through it, along the vertical edges or the diagonals, is referred to as a plaquette. Because of the division of $H$ into even and odd bond terms and the U(1) symmetry, only the six types of plaquettes shown in Fig. 1(b) need to be evaluated [8].

FIG. 1. (a) The checkerboard representation of the quantum spin model. The green lines connecting blue dots at $\tau = 0$ and $\beta$ are loops. The red line is an example of a worldline violating the periodic boundary condition. (b) The plaquettes arising in the XXZ model. (c) A schematic of the oracle-guided VAN with $M = 1$, $N = 2$. $h$ is a hidden layer and $q$ is the last layer of neurons, which produce $q_\theta$. Each layer of the artificial network is a $2M \times N$ array of neurons. In the autoregressive architecture, succeeding layers are densely but not fully connected, which evidently mirrors the structure of the right-hand side in Eq. (3).

Rewriting the partition function as $Z = \sum_c \exp(-\beta H_{\rm cl})$, the spacetime action $S[c] = \beta H_{\rm cl}$ leads to an energy function $H_{\rm cl}$ of the Gibbs measure. In special cases, this classical system is exactly described by the six-vertex ice-type model [9,10]. Viewing $p[c] = e^{\beta(F - H_{\rm cl})}$ as a Gibbs distribution, the joint probability can be factored into a product of successive conditional probabilities [6,11,12]. We now implement this factorization on an autoregressive network, as shown in Fig. 1(c),

\[
q_\theta(c) = \prod_{m,n} q(s_{m,n} = \uparrow \mid c_{m,n}), \tag{3}
\]

where $c_{m,n} = \{s_{m',n'} \mid m'N + n' < mN + n\}$ and $q(s_{0,0} = \uparrow) = 1/2$. The output of each sigmoidal neuron, $p^k_{nm} \in [0, 1]$, is the current predicted conditional probability of $s_{m,n} = \uparrow$ given the knowledge of the spin states of all preceding sites. A configuration $c$ is generated with a total of $2M \times N$ passages of the network, in row-major order. It should be remarked that only on the first row ($\tau = 0$) are the spins generated freely by the network. For all subsequent time slices, the spin conservation on each plaquette is coded into our VAN.

Thus, the autoregressive network constitutes a surrogate model for sampling the spin configuration, which can be used to compute the free energy

\[
F[q_\theta] = \frac{1}{\beta} \sum_c \left( q_\theta S[c] + q_\theta \log q_\theta \right). \tag{4}
\]

The best approximant $q_\theta(c)$ to $p(c)$ is obtained by minimizing the free energy $F[q_\theta]$ with respect to the variation of the network parameters $\theta$ in a standard batched stochastic minimization process [12]. This procedure formally describes a variational approach to the quantum statistical problem.

Unlike conventional loop algorithms, in which the spacetime lattice configuration is subject to periodic updates in every Monte Carlo step [2–5], a key obstacle to the VAN for the quantum statistical problem outlined above lies in the fact that the periodic boundary condition is a nonlocal constraint and, consequently, difficult to implement in a sequential process of sample generation. Thus, a naive implementation inevitably leads to a large number of configurations violating the boundary condition, which we call dead ends, and consequently has low sampling efficiency and staggeringly slow learning. To overcome this problem, we introduce an add-on oracle [13] to VAN, which is designed to predict and eliminate dead ends far before they occur, which we now describe.
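The factorization of Eq. (3) and the sample estimate of Eq. (4) can be illustrated with a minimal Python sketch. It is not the network used in the paper: the conditional probabilities come from a hypothetical masked linear layer (`weights`, `bias`), the lattice is flattened to a single index in row-major order, and the plaquette spin-conservation constraint and the oracle are deliberately left out. All function names are assumptions made for illustration.

```python
import numpy as np

def sample_with_log_prob(weights, bias, rng):
    """Draw one configuration site by site and return it with log q_theta(c).

    weights: strictly lower-triangular (n, n) array, so site k only sees sites < k
             (hypothetical stand-in for the autoregressive network).
    bias:    length-n array; bias[0] = 0 gives q(s_0 = up) = 1/2.
    """
    n = len(bias)
    spins = np.zeros(n)              # 1.0 = spin up, 0.0 = spin down
    log_q = 0.0
    for k in range(n):               # row-major order over the flattened lattice
        logit = weights[k, :k] @ spins[:k] + bias[k]
        p_up = 1.0 / (1.0 + np.exp(-logit))      # sigmoidal output neuron
        spins[k] = float(rng.random() < p_up)
        log_q += np.log(p_up if spins[k] == 1.0 else 1.0 - p_up)
    return spins, log_q

def free_energy_estimate(actions, log_qs, beta):
    """Sample estimate of Eq. (4): F = <S[c] + log q_theta(c)> / beta."""
    actions = np.asarray(actions, dtype=float)
    log_qs = np.asarray(log_qs, dtype=float)
    return float(np.mean(actions + log_qs) / beta)
```

With all weights and biases set to zero every site is up with probability 1/2, so each sample has $\log q_\theta = -n \log 2$; this is a convenient sanity check before plugging in a real action $S[c]$.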

FIG. 2. (a) A single lightcone emanating from one end point is shown. The region subtended by the lightcone (cyan lines) is in the absolute past (AP) of $e_j$, and everywhere else is absolutely separated (AS). (b) and (c) show a spacetime configuration with five loops, with end points $\{e_j\}$ and points $\{o_j\}$ on a time slice at an intermediate step of VAN sampling. The dark regions are absolutely separated from all end points.

At the beginning of a configuration generation, the spin configuration of time slice $\tau = 0$ is decided stochastically by VAN, giving a set of starting points of the loops, which will be returned to at $\beta$. Call the set of end points $\{e_j = (x_j, \beta)\}$. For an end point $e_j$, the left- and right-most paths leading up to it form a lightcone, in the sense that all points bounded between this pair of paths can find a way to end at $e_j$ and are hence in the absolute past with respect to $e_j$, and all other points are absolutely separated from $e_j$. We employ a width-first greedy solver [14] to construct lightcones for every $e_j$, with which we can assess the viability of an intermediate configuration.

The oversight of VAN sampling by the oracle kicks in only for the last $m_s = \min(N/2 + 1, M/2)$ steps, instead of at the very beginning of a sample generation. This is ensured by a lemma: a set of legitimate paths can always be found connecting $k$ pairs of grid points located separately on two time slices $N/2 + 1$ apart. The proof is elementary but lengthy, so we leave it to the Supplementary Materials [8]. Every time the spins on a time slice are created by VAN, the oracle is queried for the viability of the current configuration. Shown in Fig. 2(b) is a viable intermediate configuration, where every intermediate point $o_j$ can be uniquely assigned to one of the lightcones emanating from the $e_j$'s. The situation of Fig. 2(c) is considered unviable for two reasons: $o_3$ is absolutely separated from all end points, and the lightcones of $e_3$ and $e_4$ are empty. An intermediate configuration deemed by the oracle to dead-end will be rejected right away.

The oracle not only enables highly efficient rejection-free sampling at an $O(N)$ computational cost, it also partly alleviates the mode collapse problem commonly plaguing generative adversarial networks [15,16]. As a comparison, we also perform a sampling without the oracle, where dead ends are allowed but incur a large penalty in the loss function. As shown in Fig. 3(a) for a 16-site antiferromagnetic ($J > 0$) chain ($\beta|J| = 1$), the penalty approach shows step-wise plateauing in the variational process, whereas the oracle-guided VAN shows much faster convergence toward the exact free energy. This is because in rejection-free sampling the network can be better focused on a specialized learning goal, whereas in the penalty method the early training is biased by prioritizing eliminating the dead ends by generating ferromagnetic configurations.

However, it is also observed that the oracle alone does not eliminate mode collapse during the variational process completely. Therefore, we also introduce an action rescaling,

\[
S_{\rm rescaled} = S_{\min} + (1 - r^{\rm epoch})(S - S_{\min}), \tag{5}
\]

in which $0 < r < 1$ is the strength of rescaling and $S_{\min}$ is the smallest action value within the current epoch. This rescaling initially narrows the distribution of the reward signal, which evens out the distribution of initial VAN predictions and keeps the samples from being attracted to locally plausible configurations. As shown in Fig. 3(b) for a 32-site antiferromagnetic chain, with little action rescaling ($r = 0.9$) mode collapse shows up in the oracle-guided VAN solution. But as soon as we increase $r$, mode collapse disappears. We also remark that a large $r$ value leads to slower early descent of the loss function, but a better approach to the exact free energy later in the minimization.
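The rescaling of Eq. (5) is a one-line transformation of the sampled actions. The sketch below is an illustration only; it assumes that the array passed in holds all action values of the current epoch, and the function name `rescale_actions` is hypothetical.

```python
import numpy as np

def rescale_actions(actions, r, epoch):
    """Apply Eq. (5): S_rescaled = S_min + (1 - r**epoch) * (S - S_min).

    actions: action values S[c] sampled in the current epoch (assumed layout).
    r:       rescaling strength, 0 < r < 1.
    epoch:   training epoch counter, starting from 0.
    """
    actions = np.asarray(actions, dtype=float)
    s_min = actions.min()                      # smallest action in this epoch
    return s_min + (1.0 - r ** epoch) * (actions - s_min)
```

At epoch 0 all rescaled actions collapse onto $S_{\min}$, and since $r^{\rm epoch} \to 0$ the original actions are recovered as training proceeds; a value of $r$ closer to 1 keeps the rescaling active for more epochs.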

FIG. 3. (a) The VAN solution of a 16-site Heisenberg chain using the oracle or an energy penalty for eliminating dead ends. The difference between computed and exact free energies is plotted as a function of epoch. (b) The oracle-guided VAN solution of a 32-site Heisenberg model with different $r$ values for action rescaling. $\beta J = 1$ and $M = 8$ in both (a) and (b).

Now we apply the oracle-guided fully visible sigmoidal belief network [11] to examine the thermodynamics of the quantum XXZ spin chain in a few regimes, namely the antiferromagnetic ($\lambda = 1$) and XY ($\lambda = 1/8$) limits, over a range of temperatures. Apart from the free energy as given in Eq. (4), we also compute two of the most important experimentally accessible quantities, heat capacity and magnetic susceptibility. The specific heat per spin is defined as $c = N^{-1}(\partial \bar{E}/\partial T)_N$, where $E = \partial S/\partial\beta$ and the overbar denotes thermal averaging by VAN. Therefore, the specific heat is computed as the following average,

\[
c = \frac{k_B \beta^2}{N} \left( \overline{E^2} - \bar{E}^2 - \bar{C} \right), \tag{6}
\]

where $C = \partial^2 S/\partial\beta^2$. Analytical expressions for $E$ and $C$ can be easily derived [8] and sampled using the VAN. The susceptibility is computed by sampling the total magnetic moment $m_z = N^{-1}\sum_i s^z_i$ as

\[
\chi = -\left( \frac{\partial^2 f}{\partial B^2} \right)_{T,N} = \beta \left( \overline{m_z^2} - \bar{m}_z^2 \right). \tag{7}
\]
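Given per-sample values of $E$, $C$ and $m_z$, the averages in Eqs. (6) and (7) reduce to simple sample moments. The following sketch assumes these values have already been evaluated for each sampled configuration; the function names and the default $k_B = 1$ are choices made here for illustration, not taken from the paper.

```python
import numpy as np

def specific_heat(e_samples, c_samples, beta, n_sites, k_b=1.0):
    """Eq. (6): c = k_B * beta^2 / N * (mean(E^2) - mean(E)^2 - mean(C))."""
    e = np.asarray(e_samples, dtype=float)     # E = dS/dbeta per sample
    c = np.asarray(c_samples, dtype=float)     # C = d^2S/dbeta^2 per sample
    fluctuation = np.mean(e ** 2) - np.mean(e) ** 2 - np.mean(c)
    return float(k_b * beta ** 2 / n_sites * fluctuation)

def susceptibility(mz_samples, beta):
    """Eq. (7): chi = beta * (mean(m_z^2) - mean(m_z)^2)."""
    m = np.asarray(mz_samples, dtype=float)    # m_z per sample
    return float(beta * (np.mean(m ** 2) - np.mean(m) ** 2))
```

Because VAN samples are drawn independently, these plain sample means need no autocorrelation correction, although their statistical error still shrinks only as the inverse square root of the batch size.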

The same quantities are also computed with stochastic series expansion (SSE) [17,18] and exact diagonalization [8] for calibration.

Shown in Fig. 4(a) are the free energies of antiferromagnetic and XY-like chains with 32, 64 and 96 sites ($J > 0$). The number of time slices is $M = 8$, except for $N = 32$ and $\beta \ge 2$, where $M = 16$ is used. We see that for the antiferromagnet, the VAN-computed free energies agree with the SSE results reasonably well, only showing slight overestimation at elevated temperatures. It is also seen that the exact diagonalization results for shorter chains agree very well with SSE results regardless of chain lengths, showing little finite size effect. For the XY-like chains, we find that the free energies all fall very close to that from exact diagonalization of a 14-site chain. Since in this regime the finite size effect is seen to be minimal, the computed free energies for the XY chains are also accurate. We would like to remark that even though VAN produces accurate free energies in the temperature range studied here, the performance for the antiferromagnetic Ising limit is not satisfactory, likely due to very slow unbolting from localized modes, exaggerating the occurrence of local excitations [19].

FIG. 4. Various thermodynamic quantities computed using oracle-guided VAN, in comparison with SSE and exact diagonalization (ED) results. (a) Free energies of antiferromagnetic and XY-like (marked in legends) chains, with $M = 8$ or 16 in VAN (see text). (b) Heat capacities and (c) susceptibilities of Heisenberg and XY chains.

The computed susceptibilities shown in Fig. 4(b) again display little dependence on system size, and both show maxima at finite temperatures. For the antiferromagnetic case, where SSE solutions are available, the VAN-computed susceptibilities agree well with the Monte Carlo results and exact results. For the XY-like chains, the computed susceptibilities are considerably larger than those of the antiferromagnetic chains, as expected due to the floppiness of spins in this regime. In both cases, the susceptibilities for the long chain ($N = 96$) show slightly larger deviations from the references. The computed heat capacities are shown in Fig. 4(c), which are in general agreement with the reference values and show maxima at finite temperatures, but have somewhat larger deviations than the susceptibilities shown in (b).

In summary, we have constructed a generative model based on autoregressive networks to efficiently and accurately represent the partition function of quantum spin systems, which is subsequently solved variationally. An oracle accompanying this VAN approach predicts and forestalls dead ends as soon as they occur in sample generation, and enables rejection-free sampling satisfying the requisite boundary condition. Clearly, this approach to the quantum statistical problem allows for highly parallelizable solutions of quantum many-body systems at finite temperatures. An immediate future direction is to extend this approach to problems in higher dimensions, as well as to quantum many-body problems of other types, such as Bose-Hubbard models. If extended to fermionic systems, the VAN may offer an unforeseen avenue for exploring methods for alleviating the fermion sign problem.

ACKNOWLEDGMENTS

This work is supported by the National Natural Science Foundation of China (Grant No. 11725415 and No. 11934001), the Ministry of Science and Technology of China (Grant No. 2018YFA0305601 and No. 2016YFA0301004), and by the Strategic Priority Research Program of Chinese Academy of Sciences (Grant No. XDB28000000).

* jfeng11@pku.edu.cn
[1] R. P. Feynman, Phys. Rev. 91, 1291 (1953).
[2] M. Suzuki, Prog. Theor. Phys. 56, 1454 (1976).
[3] M. Suzuki, S. Miyashita, and A. Kuroda, Prog. Theor. Phys. 58, 1377 (1977).
[4] J. E. Hirsch, R. L. Sugar, D. J. Scalapino, and R. Blankenbecler, Phys. Rev. B 26, 5033 (1982).
[5] H. G. Evertz, Adv. Phys. 52, 1 (2003).
[6] D. Wu, L. Wang, and P. Zhang, Phys. Rev. Lett. 122, 080602 (2019).
[7] O. Sharir, Y. Levine, N. Wies, G. Carleo, and A. Shashua, Phys. Rev. Lett. 124, 020503 (2020).
[8] Supplementary materials.
[9] E. H. Lieb, Phys. Rev. 162, 162 (1967).
[10] B. Sutherland, Phys. Rev. Lett. 19, 103 (1967).
[11] B. J. Frey, Graphical Models for Machine Learning and Digital Communication (MIT Press, Cambridge, 1998).
[12] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, Cambridge, 2016).
[13] In the context of computational complexity theory, an oracle is a black box attached to a Turing machine, which solves a certain decision problem.
[14] T. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms (MIT Press, Cambridge, 2009).
[15] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, Adv. Neural Inform. Process. Syst. 3, 2672 (2014).
[16] T. Che, Y. L. A. Jacob, Y. Bengio, and W. Li, in ICLR (2017).
[17] A. W. Sandvik, Phys. Rev. B 59, R14157 (1999).
[18] F. Alet, S. Wessel, and M. Troyer, Phys. Rev. E 71, 036706 (2005).
[19] Z. Shi, Y. Cao, Q. Gu, and J. Feng, in preparation.

SUPPLEMENTARY MATERIAL

S-1 Quantum spin chain and checkerboard decomposition

In this section, we summarize some of the main results concerning the quantum spin models and their worldline algorithm solutions related to the discussions in the paper. For more details, the readers are referred to the review article by Evertz [5].

The checkerboard decomposition means separating Eq. (1) into even-bond and odd-bond parts,

\[
H_{\rm even/odd} = \sum_{i = {\rm even/odd}} \left( s^x_i s^x_{i+1} + s^y_i s^y_{i+1} + \lambda s^z_i s^z_{i+1} - h s^z_i \right), \tag{S-1}
\]

with a magnetic field $h$ in the $z$-direction included. The partition function $Z = \mathrm{Tr}\, e^{-\beta H}$ can be written as

\[
Z = \lim_{M \to \infty} \mathrm{Tr} \left[ e^{-\Delta\tau (H_{\rm even} + H_{\rm odd})} \right]^M . \tag{S-2}
\]

Keeping $M$ finite and inserting the complete set of basis states that diagonalizes $s^z$, we find

\[
Z = \sum_c e^{-S[c]} = \sum_c \prod_\pi w(\pi), \tag{S-3}
\]

where $c$ is a configuration on the $2M \times N$ lattice and $w(\pi)$ is the weight function of plaquette $\pi$. As defined in the paper, a plaquette is a dark square decorated with loop(s) passing through (Fig. 1(a)). There are six types of plaquettes, as shown in Fig. 1(b), whose weights are given by

\[
\begin{aligned}
w(\pi_1) &= e^{-\Delta\tau(\lambda - 2h)/4}, \\
w(\pi_2) &= e^{-\Delta\tau(\lambda + 2h)/4}, \\
w(\pi_3) = w(\pi_4) &= e^{\Delta\tau\lambda/4} \cosh(\Delta\tau/2), \\
w(\pi_5) = w(\pi_6) &= e^{\Delta\tau\lambda/4} \sinh(\Delta\tau/2).
\end{aligned}
\]
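The six plaquette weights above translate directly into code. The sketch below is a plain transcription of these formulas, with $J$ as the energy unit; the function name and the dictionary keys `pi1` to `pi6` are labels chosen here for illustration and follow the numbering of the weights, not any published implementation.

```python
import math

def plaquette_weights(dtau, lam, h=0.0):
    """Return the six plaquette weights for time step dtau, anisotropy lam, field h."""
    w_parallel_a = math.exp(-dtau * (lam - 2.0 * h) / 4.0)    # w(pi_1)
    w_parallel_b = math.exp(-dtau * (lam + 2.0 * h) / 4.0)    # w(pi_2)
    prefactor = math.exp(dtau * lam / 4.0)
    w_straight = prefactor * math.cosh(dtau / 2.0)            # w(pi_3) = w(pi_4)
    w_diagonal = prefactor * math.sinh(dtau / 2.0)            # w(pi_5) = w(pi_6)
    return {
        "pi1": w_parallel_a, "pi2": w_parallel_b,
        "pi3": w_straight, "pi4": w_straight,
        "pi5": w_diagonal, "pi6": w_diagonal,
    }
```

A quick consistency check is that $w(\pi_3)^2 - w(\pi_5)^2 = e^{\Delta\tau\lambda/2}$ for any $\Delta\tau$, and that $w(\pi_1) = w(\pi_2)$ at zero field.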

S-2 WHEN TO TURN ON ORACLE

In this section, we provide the proof of the lemma used in the paper regarding the onset of unviable configurations when generating a spin configuration on a spacetime lattice. We will have to start with a few definitions before a statement of the lemma is presented. The proof for the lemma is comprised of two propositions.

Definition 1. A checkerboard is a pattern formed by dividing an $N \times H$ rectangle into $N \times H$ unit squares, which are then colored white and dark alternately in both directions. Periodic boundary condition is imposed in the horizontal direction only, so $N$ is even.

The direction along which there are $N$ squares is called the horizontal direction, and the other is called the vertical direction. In view of the periodic boundary condition, the checkerboard is a finite cylindrical surface, and can be stereographically mapped onto an annulus of which, WLOG, the inner circle is the bottom edge and the outer circle the upper edge.

Definition 2. Corners of the $N \times H$ squares are called grid points. An $N \times H$ checkerboard has $N \times (H + 1)$ grid points.

The location of a grid point is specified by an ordered doublet $(n, h)$, $n = 1, \ldots, N$, $h = 1, \ldots, H + 1$. For the problem we will discuss, there are $k$ points located at distinct grid points $\{x \mid x \in \mathbb{Z}, 1 \le x \le N\}$ on the bottom edge, and $k$ points on the top edge at $\{y \mid y \in \mathbb{Z}, 1 \le y \le N\}$.

Definition 3. A path connects a pair of grid points on the top and bottom edges of the checkerboard, and is a collection of connected and directed line segments. Every line segment is upward directed, starting and ending on grid points, and is either vertical or a diagonal of a dark square.

When $H$ is small, a path may not be found for a pair of edge points. The question is: what is the lower bound of $H$, for a given $N$, such that a path can be found for a pair of edge points? What if there is more than one pair of points? The answer is Lemma 1.

Lemma 1. When $H \ge N/2 + 1$, a set of paths can always be found connecting $k \le N$ arbitrary grid points on the lower edge to $k$ arbitrary grid points on the top edge.

The proof for Lemma 1 consists of two proposition/definitions.

Proposition/definition 1. On each of two concentric circles lie $k$ points. $N$ rays from the center can be drawn to partition the plane into $N$ sections. For an arbitrary $N$, we can always find a partitioning where the two arcs in each section have identical numbers of points and the angles made by abutting rays are less than or equal to $180^\circ$.

Proposition 1 can be demonstrated by a constructive procedure for achieving this kind of partitioning. So long as $k \ge 1$, there must be a point such that if we draw a ray through it and rotate the ray clockwise through an angle $\le 180^\circ$, the first point the ray encounters lies on the other circle. We then draw rays through this pair of points to create a section, and erase this pair of points. Repeating the above process $k$ times, we will have $k$ sections, each containing a pair of points located on the two circles. To obtain the final partitioning, we take unions of overlapping and neighboring sections, with the constraint that the angle subtended by a final section is no greater than $180^\circ$.

As a consequence of Proposition 1, we can partition the checkerboard into non-empty sections, and in each section there are equal numbers of points on the top and bottom edges and the horizontal separations between points are no greater than $N/2$. The partitioning resulting from the above algorithm is usually not unique and may not be the simplest; Proposition/definition 1 merely asserts its existence. In the $\sigma$th section, we number the points on the top and bottom edges separately as $i_\sigma = 1, \ldots, n_\sigma$, in increasing order from left to right.

Proposition/definition 2. In a unidirectional section $\sigma$, $x(i_\sigma) \le y(i_\sigma)$ for all $i_\sigma$. If its width $+\,1 \le H$, then the points on the top and bottom edges can be pair-wise connected by nonintersecting paths.

Proof. We first show that in a contiguous section $\sigma$, where $\{x(i_\sigma)\}$ are consecutive integers and so are $\{y(i_\sigma)\}$ (which then is unidirectional), pairwise paths can be constructed. WLOG, we suppose $\ell_\sigma = y(1_\sigma) - x(1_\sigma) \ge 0$.

In the first step, a path connecting the last ($n_\sigma$th) pair of points is constructed by the following procedure: starting at $(x(n_\sigma), 1)$, we draw a vertical line to the first grid point $(x(n_\sigma), h_\sigma)$ to whose upper right a dark square lies, draw along the diagonals to the grid point beneath $y(n_\sigma)$, and then draw a vertical line to reach the top edge. Since $H \ge \ell_\sigma + 1$, this path can always be drawn.

Subsequently, a similar procedure is performed consecutively for each pair to the left: a vertical line is drawn from $(x(n_\sigma - j), 1)$ to $(x(n_\sigma - j), h_\sigma + j)$, which then continues diagonally upward and to the right till beneath $y(n_\sigma - j)$, and then continues upward. The diagonal part of the path can be completed iff

\[
H - (h_\sigma + n_\sigma - 1) \ge \ell_\sigma .
\]

Since $\ell_\sigma + n_\sigma \le H$ and $h_\sigma = 0$ or 1, the inequality holds.

For a unidirectional section that is not contiguous, paths can be constructed from those of an auxiliary contiguous section with the same numbers of points on the top and bottom edges, at $\{\tilde{x}(j_\sigma)\}$ and $\{\tilde{y}(j_\sigma)\}$ respectively, where $\tilde{x}(1) = x(1_\sigma)$ and $\tilde{y}(n_\sigma) = y(n_\sigma)$. Now, starting at $(x(j_\sigma), 1)$, a path will travel upward till it meets the auxiliary path $j_\sigma$, then travel along with the auxiliary path $j_\sigma$ until beneath $y(j_\sigma)$, and then continue vertically to $(y(j_\sigma), H + 1)$. This is always possible since $\tilde{x}(j_\sigma) \le x(j_\sigma)$ and $\tilde{y}(j_\sigma) \ge y(j_\sigma)$.

Obviously, each non-unidirectional section can always be further divided into unidirectional sections. The proof for Lemma 1 is completed. Incidentally, since the paths constructed in our proof always travel from the starting point on the lower edge toward the end point in the horizontal direction, we come to a corollary.

Corollary 1. If $H \ge N/2 + 1$, at least one greedy solution exists.

S-3 Exact diagonalization

We exactly diagonalize the many-body Hamiltonian of short spin chains to provide a reference for the results from VAN. The many-body Hamiltonian is constructed on a basis of spin configurations $|s^z\rangle = \prod_j |s^z_j\rangle$. Rewriting Eq. (1) as

\[
H = \sum_j \left[ \lambda s^z_j s^z_{j+1} + \frac{1}{2} \left( s^+_j s^-_{j+1} + \mathrm{H.c.} \right) \right], \tag{S-4}
\]

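where $s^\pm_j = s^x_j \pm i s^y_j$ and $B = 0$.

For example, the basis of a two-spin system can be chosen as $\{|\downarrow\downarrow\rangle, |\uparrow\downarrow\rangle, |\downarrow\uparrow\rangle, |\uparrow\uparrow\rangle\}$, and the Hamiltonian matrix will be

\[
H_{N=2} = J \begin{pmatrix}
\lambda/4 & 0 & 0 & 0 \\
0 & -\lambda/4 & 1/2 & 0 \\
0 & 1/2 & -\lambda/4 & 0 \\
0 & 0 & 0 & \lambda/4
\end{pmatrix}. \tag{S-5}
\]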

We can see that in (S-5) the Hamiltonian matrix is block diagonal, which is a natural result of the conservation of total spin. In general, the $N$-spin Hamiltonian has $N+1$ blocks $H_{N,n_\uparrow}$, where $n_\uparrow = 0, 1, \cdots, N$ is the number of up-spins. Computing the eigenvalues $E^{(i)}_{N,n_\uparrow}$ of $H_{N,n_\uparrow}$ using NumPy in Python, where $i = 1, 2, \cdots, \frac{N!}{n_\uparrow!\,(N-n_\uparrow)!}$, we obtain the partition function as

$$Z = \mathrm{Tr}\left[e^{-\beta H_N}\right] = \sum_{n_\uparrow} \sum_i e^{-\beta E^{(i)}_{N,n_\uparrow}}. \tag{S-6}$$
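The block-by-block evaluation of (S-6) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names `sector_block` and `partition_function`, the encoding of basis states as integers whose bit $j$ marks an up-spin on site $j$, and the open-chain default are all introduced here. Only `numpy.linalg.eigvalsh` and `itertools.combinations` are used from existing libraries.

```python
import numpy as np
from itertools import combinations


def sector_block(n_sites, n_up, lam, J=1.0, periodic=False):
    """Hamiltonian block with exactly n_up up-spins.

    A basis state is an integer; bit j equal to 1 means site j is up.
    """
    states = [sum(1 << j for j in ups)
              for ups in combinations(range(n_sites), n_up)]
    index = {s: i for i, s in enumerate(states)}
    H = np.zeros((len(states), len(states)))
    n_bonds = n_sites if periodic else n_sites - 1
    for s, i in index.items():
        for j in range(n_bonds):
            k = (j + 1) % n_sites
            bj, bk = (s >> j) & 1, (s >> k) & 1
            # Diagonal term: lam * sz_j * sz_k is +lam/4 (aligned) or -lam/4.
            H[i, i] += lam * (0.25 if bj == bk else -0.25)
            if bj != bk:
                # Spin-flip term exchanges the two antialigned spins,
                # which keeps the number of up-spins fixed.
                t = s ^ ((1 << j) | (1 << k))
                H[index[t], i] += 0.5
    return J * H


def partition_function(n_sites, lam, beta, J=1.0, periodic=False):
    """Sum exp(-beta * E) over the eigenvalues of all N + 1 blocks."""
    Z = 0.0
    for n_up in range(n_sites + 1):
        block = sector_block(n_sites, n_up, lam, J, periodic)
        energies = np.linalg.eigvalsh(block)
        Z += np.sum(np.exp(-beta * energies))
    return Z


# Illustrative parameters: 8 spins, isotropic coupling, beta = 1.
beta = 1.0
Z = partition_function(8, lam=1.0, beta=beta)
print("free energy per spin:", -np.log(Z) / (beta * 8))
```

Diagonalizing the blocks separately means the largest matrix has dimension equal to the central binomial coefficient rather than $2^N$, which is what keeps the reference calculation affordable for short chains.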


Page 3: Worldline algorithm by oracle-guided variational

3

which will be returned to at β Call the set of end pointsei = (xj β) For an end point ei the left- and right-most paths leading up to it form a lightcone in the sensethat all points bounded between this pair of paths canfind a way to end at ei and are hence in the absolute pastwith respect to ei and all other points are absolutely sep-arated from ei We employ a width-first greedy solver14

to construct lightcones for every ej with which we canassess the viability of an intermediate configurationThe oversight of VAN sampling by the oracle kicks in

only for the last ms = min(N2 + 1M2) steps insteadof at the very beginning of a sample generation Thisis ensured by a lemma a set of legit paths can alwaysbe found connecting k pairs of grid points located sep-arately on two time slices N2 + 1 apart The proof iselementary but lengthy so we leave it to SupplementaryMaterials8 Every time the spins on a time slice are cre-ated by VAN the oracle is queried for the viability of thecurrent configuration Shown in Fig 2(b) is a viable in-termediate configuration where every intermediate pointoj can be uniquely assigned to one of the lightcones ema-nating from ej rsquos The situation of Fig 2(c) is consideredunviable for two reasons o3 is absolutely separated fromall end points and the lightcones of e3 and e4 are emptyAn intermediate configuration deemed by the oracle todead-end will be rejected right awayThe oracle not only enables highly efficient rejection-

free sampling at an O(N) computational cost it alsoalleviates partly the the mode collapse problem com-monly plaguing generative adversarial networks1516 Asa comparison we also perform a sampling without theoracle where dead ends are allowed but with a largepenalty to the loss function As shown in Fig 3(a)for a 16-site antiferromagnetic (J gt 0) antiferromagneticchain (β|J | = 1) the penalty approach shows step-wiseplateauing in the variational process whereas the oracle-guided VAN shows much faster convergence toward theexact free energy This is because in rejection-free sam-pling the network can be better focused on a special-ized learning goal whereas in the penalty method theearly training is biased by prioritizing eliminating thedead ends by generating ferromagnetic configurationsHowever it is also observed that the oracle alone does

not eliminate mode collapse during the variational pro-cess completely Therefore we also introduce a actionrescaling

Srescaled = Smin + (1minus repoch)(S minus Smin) (5)

in which 0 lt r lt 1 is the strength of rescaling Smin

is the smallest action value within the current epochThis rescaling narrows the distribution of reward sig-nal initially This evens out the distribution of initialVAN predictions and keeps the samples from being at-tracted to locally plausible configurations As shown inFig 3(b) for a 32-site Antiferromagnetic chain with lit-tle action rescaling (r = 09) mode collapse shows upin the oracle-guided VAN solution But as soon as weincrease r mode collapse disappears We also remark

that large r value leads to slower early descent of the lossfunction but better approach to the exact free energylater in the minimization

FIG 3 (a) The VAN solution of a 16-site Heisenberg chainusing oracle or energy penalty for eliminating dead ends Thedifference between computed and exact free energies is plottedas a function of epoch (b) The oracle-guided VAN solution ofa 32-site Heisenberg model with different r values for actionrescaling βJ = 1 and M = 8 in both (a) and (b)

Now we apply the oracle-guided fully visible sigmoidalbelief network11 to examine the thermodynamics of thequantum XXZ spin chain in a few regimes namely theantiferromagnetic (λ = 1) and XY (λ = 18) limits overa range of temperatures Apart from free energy as givenin Eq (4) we also compute two of the most importantexperimentally accessible quantities heat capacity andmagnetic susceptibility The specific heat per spin is de-fined as c = Nminus1(partEpartT )N where E = partSpartβ and overbar denotes thermal averaging by VAN Therefore thespecific heat is computed as the following average

c =kBβ

2

N

983059E2 minus E2 minus C

983060 (6)

where C = part2Spartβ2 Analytical expressions for E and Ccan be easily derived8 and sampled using the VAN Thesusceptibility is computed by sampling the total magneticmoments mz = Nminus1

983123szi as

χ = minus983061part2f

partB2

983062

TN

= β983059m2

z minusmz2983060 (7)

The same quantities are also computed with stochasticseries expansion (SSE)1718 and exact diagonalization8

for calibrationsShown in Fig 4(a) are the free energies of antiferro-

magnetic and XY -like chains with 32 64 and 96 sites(J gt 0) The number of time slices is M = 8 exceptfor N = 32 and β ge 2 where M = 16 is used We seethat for the antiferromagnet the VAN-computed free en-ergies agree with the SSE results reasonably well onlyshowing slight overestimation at elevated temperaturesIt is also seen that the exact diagonalization results forshorter chains agree very well with SSE results regard-less of chain lengths showing little finite size effect Forthe XY -like chains we find that the free energies all fall

4

FIG 4 Various thermodynamic quantities computed usingoracle-guided VAN in comparison with SSE and exact diago-nalization (ED) results (a) Free energies of antiferromagneticand XY -like (marked in legends) chains with M = 8 or 16in VAN (see text) (b) Heat capacities and (c) susceptibilitiesof Heisenberg and XY chains

very close to that from exact diagonalization of a 14-site chain Since in this regime the finite size effect isseen to be minimal the computed free energies for theXY chains are also accurate We would like to remarkthat even though VAN produces accurate free energiesin the temperature range studied here the performancefor the antiferromagnetic Ising limit is not satisfactorylikely due to very slow unbolting from localized modesexaggerating the occurrence of local excitations19

The computed susceptibilities shown in Fig 4(b)again display little dependences on system sizes andboth show maxima at finite temperatures For the an-tiferromagnetic case where SSE solutions are availablethe VAN-computed susceptibilities agree well with theMonte Carlo results and exact results For the XY-likechains the computed susceptibilities are considerablylarger than the antiferromagnetic chains as expected dueto the floppiness of spins in this regime In both casesthe susceptibilities for the long chain (N = 96) showslightly larger deviations from references The computedheat capacities are shown in Fig 4(c) which are in gen-eral agreement with the reference values offered and showthe maxima at finite temperatures but have somewhatlarger deviation than susceptibilities shown in (b)In summary we have constructed a generative model

base on autoregressive networks to efficiently and accu-rately represent the partition function of quantum spinsystems which is subsequently solved variationally Anoracle accompanying this VAN approach predicts andforestalls dead ends as soon as they occur in sample gen-eration and enables rejection-free sampling satisfying therequisite boundary condition Clearly this approach toquantum statistical problem allows for highly paralleliz-able solutions of quantum many-body systems at finitetemperatures An immediate future direction is to extendthis approach to problems at higher dimensions as wellsas quantum many-body problems of other types suchas Bose-Hubbard models If extending this approach toFermionic systems the VAN offer an unforeseen avenuefor exploring methods for alleviating the Fermion signproblem

ACKNOWLEDGMENTS

This work is supported by the National Natural Sci-ence Foundation of China (Grant No 11725415 andNo 11934001) the Ministry of Science and Technol-ogy of China (Grant No 2018YFA0305601 and No2016YFA0301004) and by the Strategic Priority Re-search Program of Chinese Academy of Science (GrantNo XDB28000000)

lowast jfeng11pkueducn1 R P Feynman Phys Rev 91 1291 (1953)2 M Suzuki Prog Theor Phys 56 1454 (1976)3 M Suzuki S Miyashita and A Kuroda Prog TheorPhys 58 1377 (1977)

4 J E Hirsch R L Sugar D J Scalapino and R Blanken-becler Phys Rev B 26 5033 (1982)

5 H G Evertz Adv Phys 52 1 (2003)6 D Wu L Wang and P Zhang Phys Rev Lett 122080602 (2019)

7 O Sharir Y Levine N Wies G Carleo and A ShashuaPhys Rev Lett 124 020503 (2020)

8 Supplementary materials9 E H Lieb Phys Rev 162 162 (1967)

10 B Sutherland Phys Rev Lett 19 103 (1967)11 B J Frey Graphical models for machine learning and dig-

ital communication (MIT press Cambridge 1998)12 I Goodfellow Y Bengio and A Courville Deep Learning

(MIT Press Cambridge 2016)13 In the context of computational complexity theory an or-

acle is a black box attached to a Turing machine whichsolves a certain decision problem

14 T Cormen C E Leiserson R L Rivest andC Stein Introduction to Algorithms (MIT press Cam-

5

bridge 2009)15 I Goodfellow J Pouget-Abadie M Mirza B Xu

D Warde-Farley S Ozair A Courville and Y BengioAdv Neural Inform Process Syst 3 2672 (2014)

16 T Che Y L A Jacob Y Bengio and W Li in ICLR(2017)

17 A W Sandvik Phys Rev B 59 R14157 (1999)18 F Alet S Wessel and M Troyer Phys Rev E 71 036706

(2005)19 Z Shi Y Cao Q Gu and J Feng In preparation

6

SUPPLEMENTARY MATERIAL

S-1 Quantum spin chain and checkerboarddecomposition

In this section we summarize some of the main resultsconcerning the quantum spin models and its worldline al-gorithm solutions related to the discussions in the paperFor more details the readers are referred to the reviewarticle by Evertz5

The checkerboard decomposition means separating (1)into even-bond and odd-bond parts

Hevenodd =983131

i=evenodd

sxi sxi+1 + syi s

yi+1 + λszi s

zi+1 minus hszi

(S-1)with a magnetic field in z-direction included The parti-tion function Z = Tr eminusβH can be written as

Z = limMrarrinfin

Tr983147eminus∆τ(Heven+Hodd)

983148M (S-2)

Keeping M finite and inserting the complete set of basisthat diagonalizes sz we find

Z =983131

c

S[c] =983131

c

983132

π

w(π) (S-3)

where $c$ is a configuration on the $2M \times N$ lattice and $w(\pi)$ is the weight function of plaquette $\pi$. As defined in the paper, a plaquette is a dark square decorated with the loop(s) passing through it [Fig. 1(a)]. There are six types of plaquettes, as shown in Fig. 1(b), whose weights are given by

$$w(\pi_1) = e^{-\Delta\tau(\lambda - 2h)/4},$$

$$w(\pi_2) = e^{-\Delta\tau(\lambda + 2h)/4},$$

$$w(\pi_3) = w(\pi_4) = e^{\Delta\tau\lambda/4} \cosh(\Delta\tau/2),$$

$$w(\pi_5) = w(\pi_6) = e^{\Delta\tau\lambda/4} \sinh(\Delta\tau/2).$$
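The weights above are simple enough to tabulate directly. The following is a minimal sketch, not the authors' code: the function names and the string labels "pi1" to "pi6" are hypothetical, chosen only to mirror the six plaquette types, and the product follows the form of (S-3).

```python
import math


def plaquette_weights(dtau, lam, h=0.0):
    """Return the six plaquette weights w(pi_1)..w(pi_6) for time step dtau,
    anisotropy lam and field h, keyed by hypothetical labels 'pi1'..'pi6'."""
    w1 = math.exp(-dtau * (lam - 2.0 * h) / 4.0)
    w2 = math.exp(-dtau * (lam + 2.0 * h) / 4.0)
    w34 = math.exp(dtau * lam / 4.0) * math.cosh(dtau / 2.0)
    w56 = math.exp(dtau * lam / 4.0) * math.sinh(dtau / 2.0)
    return {"pi1": w1, "pi2": w2, "pi3": w34, "pi4": w34, "pi5": w56, "pi6": w56}


def configuration_weight(plaquette_types, dtau, lam, h=0.0):
    """Product of w(pi) over the plaquettes of one configuration, as in (S-3).
    plaquette_types is a list of labels such as ['pi1', 'pi3', 'pi5']."""
    w = plaquette_weights(dtau, lam, h)
    return math.prod(w[p] for p in plaquette_types)


weights = plaquette_weights(dtau=0.1, lam=1.0)
```

A quick sanity check is that $w(\pi_3)^2 - w(\pi_5)^2 = e^{\Delta\tau\lambda/2}$, which follows from $\cosh^2 - \sinh^2 = 1$, and that $w(\pi_1) = w(\pi_2)$ when $h = 0$.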

S-2 WHEN TO TURN ON ORACLE

In this section, we provide the proof of the lemma used in the paper regarding the onset of unviable configurations when generating spin configurations on a spacetime lattice. We start with a few definitions before the lemma is stated. The proof of the lemma comprises two propositions.

Definition 1. A checkerboard is a pattern formed by dividing an $N \times H$ rectangle into $N \times H$ unit squares, which are then colored white and dark alternately in both directions. A periodic boundary condition is imposed in the horizontal direction only, so $N$ is even.

The direction along which there are $N$ squares is called the horizontal direction, and the other is called the vertical direction. In view of the periodic boundary condition, the checkerboard is a finite cylindrical surface and can be stereographically mapped onto an annulus, of which, WLOG, the inner circle is the bottom edge and the outer circle the upper edge.

Definition 2. Corners of the $N \times H$ squares are called grid points. An $N \times H$ checkerboard has $N \times (H + 1)$ grid points.

The location of a grid point is specified by an ordered doublet $(n, h)$, $n = 1, \dots, N$, $h = 1, \dots, H + 1$. For the problem we will discuss, there are $k$ points located at distinct grid points $\{x \mid x \in \mathbb{Z},\ 1 \le x \le N\}$ on the bottom edge and $k$ points on the top edge at $\{y \mid y \in \mathbb{Z},\ 1 \le y \le N\}$.

Definition 3. A path connects a pair of grid points on the top and bottom edges of the checkerboard, and is a collection of connected and directed line segments. Every line segment is upward directed, starting and ending on grid points, and is either vertical or a diagonal of a dark square.

When $H$ is small, a path may not be found for a given pair of edge points. The question is: for a given $N$, what is the lower bound of $H$ above which a path can always be found for any pair of edge points? What if there is more than one pair of points? The answer is Lemma 1.

Lemma 1. When $H \ge N/2 + 1$, a set of paths can always be found connecting $k \le N$ arbitrary grid points on the lower edge to $k$ arbitrary grid points on the top edge.

The proof of Lemma 1 consists of two proposition/definitions.

Proposition/definition 1. On each of two concentric circles lie $k$ points. $N$ rays from the center can be drawn to partition the plane into $N$ sections. For an arbitrary $N$, we can always find a partitioning where the two arcs in each section have identical numbers of points, and the angles made by abutting rays are less than or equal to 180°.

Proposition 1 can be demonstrated by a constructive procedure for achieving this kind of partitioning. So long as $k \ge 1$, there must be a point such that, if we draw a ray through it and rotate the ray clockwise through an angle $\le$ 180°, the first point the ray encounters lies on the other circle. We then draw rays through this pair of points to create a section, and erase this pair of points. Repeating the above process $k$ times, we will have $k$ sections, each containing a pair of points located on the two circles. To obtain the final partitioning, we take unions of overlapping and neighboring sections, with the constraint that the angle subtended by a final section is no greater than 180°.

As a consequence of Proposition 1, we can partition the checkerboard into non-empty sections such that, in each section, there are equal numbers of points on the top and bottom edges and the horizontal separations between points are no greater than $N/2$. The partitioning resulting from the above algorithm is usually not unique and may not be the simplest; Proposition/definition 1 merely asserts its existence. In the $\sigma$th section, we number the points on the top and bottom edges separately as $i_\sigma = 1, \dots, n_\sigma$, in increasing order from left to right.

Proposition/definition 2. In a unidirectional section $\sigma$, $x(i_\sigma) \le y(i_\sigma)$ for all $i_\sigma$. If its width $+\,1 \le H$, then the points on the top and bottom edges can be pairwise connected by nonintersecting paths.

Proof. We first show that in a contiguous section $\sigma$, where the $x(i_\sigma)$ are consecutive integers and so are the $y(i_\sigma)$ (which then is unidirectional), pairwise paths can be constructed. WLOG, we suppose $\ell_\sigma = y(1_\sigma) - x(1_\sigma) \ge 0$.

In the first step, a path connecting the last ($n_\sigma$th) pair of points is constructed by the following procedure: starting at $(x(n_\sigma), 1)$, we draw a vertical line to the first grid point $(x(n_\sigma), h_\sigma)$ to whose upper right a dark square lies, draw along the diagonals to the grid point beneath $y(n_\sigma)$, and then draw a vertical line to reach the top edge. Since $H \ge \ell_\sigma + 1$, this path can always be drawn.

Subsequently, a similar procedure is performed consecutively for each pair to the left: a vertical line is drawn from $(x(n_\sigma - j), 1)$ to $(x(n_\sigma - j), h_\sigma + j)$, which then continues diagonally upward and to the right till beneath $y(n_\sigma - j)$, and then continues upward. The diagonal part of the path can be completed iff

$$H - (h_\sigma + n_\sigma - 1) \ge \ell_\sigma.$$

Since $\ell_\sigma + n_\sigma \le H$ and $h_\sigma = 0$ or $1$, the inequality holds.

For a unidirectional section that is not contiguous, paths can be constructed from those of an auxiliary contiguous section with the same numbers of points on the top and bottom edges, at $\{\tilde{x}(j_\sigma)\}$ and $\{\tilde{y}(j_\sigma)\}$, respectively, where $\tilde{x}(1) = x(1_\sigma)$ and $\tilde{y}(n_\sigma) = y(n_\sigma)$. Now, starting at $(x(j_\sigma), 1)$, a path will travel upward till it meets the auxiliary path $\tilde{j}_\sigma$, then travel along the path $\tilde{j}_\sigma$ until beneath $y(j_\sigma)$, and then continue vertically to $(y(j_\sigma), H + 1)$. This is always possible since $\tilde{x}(j_\sigma) \le x(j_\sigma)$ and $\tilde{y}(j_\sigma) \ge y(j_\sigma)$.

Obviously, each non-unidirectional section can always be further divided into unidirectional sections. The proof of Lemma 1 is thus completed. Incidentally, since the paths constructed in our proof always travel from the starting point on the lower edge toward the end point in the horizontal direction, we arrive at a corollary.

Corollary 1. If $H \ge N/2 + 1$, at least one greedy solution exists.
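The bound in Lemma 1 can be probed numerically in its simplest case, a single worldline ($k = 1$). The sketch below is an illustration under stated assumptions, not the oracle of the paper: it assumes that the unit square with lower-left corner $(n, h)$ is dark when $n + h$ is even, it uses 0-based columns and rows instead of the 1-based indices of the text, and the names `is_dark`, `min_height` and `worst_case_height` are hypothetical. It does not test the multi-path statement of the lemma.

```python
def is_dark(n, h, N):
    """Assumed coloring: the square whose lower-left corner is (n, h) is dark
    when n + h is even. N must be even for this to be periodic in n."""
    return (n % N + h) % 2 == 0


def min_height(N, x, y):
    """Smallest H >= 1 for which a single path (Definition 3) connects column x
    on the bottom edge to column y on the top edge of an N x H checkerboard."""
    if N % 2:
        raise ValueError("N must be even")
    reachable = {x % N}  # columns reachable at the current row
    h = 0
    while True:
        nxt = set()
        for n in reachable:
            nxt.add(n)  # vertical segment
            if is_dark(n, h, N):
                nxt.add((n + 1) % N)  # diagonal of the dark square to the upper right
            if is_dark(n - 1, h, N):
                nxt.add((n - 1) % N)  # diagonal of the dark square to the upper left
        reachable = nxt
        h += 1
        if y % N in reachable:
            return h


def worst_case_height(N):
    """Largest min_height over all pairs of bottom and top columns."""
    return max(min_height(N, x, y) for x in range(N) for y in range(N))
```

Under these assumptions, `worst_case_height(N)` stays within the $N/2 + 1$ bound of Lemma 1 for the even values of $N$ tried in the check; the extra row in the lemma leaves room for several nonintersecting paths.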

S-3 Exact diagonalization

We exactly diagonalize the many-body Hamiltonian of short spin chains to provide a reference for the results from VAN. The many-body Hamiltonian is constructed on a basis of spin configurations $|s^z\rangle = \prod_j |s^z_j\rangle$. We rewrite Eq. (1) as

$$H = \sum_j \left[ \lambda s^z_j s^z_{j+1} + \frac{1}{2} \left( s^+_j s^-_{j+1} + \mathrm{H.c.} \right) \right], \qquad \text{(S-4)}$$

where $s^{\pm}_j = s^x_j \pm i s^y_j$ and $B = 0$.

For example, the basis of a two-spin system can be chosen as $\{|\downarrow\downarrow\rangle, |\uparrow\downarrow\rangle, |\downarrow\uparrow\rangle, |\uparrow\uparrow\rangle\}$, and the Hamiltonian matrix will be

$$H_{N=2} = J \begin{pmatrix} \lambda/4 & 0 & 0 & 0 \\ 0 & -\lambda/4 & 1/2 & 0 \\ 0 & 1/2 & -\lambda/4 & 0 \\ 0 & 0 & 0 & \lambda/4 \end{pmatrix}. \qquad \text{(S-5)}$$

We can see that in (S-5) the Hamiltonian matrix is block diagonal, which is a natural result of the conservation of the total spin along $z$. In general, the $N$-spin Hamiltonian has $N + 1$ blocks $H_{N, n_\uparrow}$, where $n_\uparrow = 0, 1, \cdots, N$ is the number of up-spins. Computing the eigenvalues $E^{(i)}_{N, n_\uparrow}$ of $H_{N, n_\uparrow}$ using NumPy in Python, where $i = 1, 2, \cdots, \frac{N!}{n_\uparrow! (N - n_\uparrow)!}$, we obtain the partition function as

$$Z = \mathrm{Tr}\left[ e^{-\beta H_N} \right] = \sum_{n_\uparrow} \sum_i e^{-\beta E^{(i)}_{N, n_\uparrow}}. \qquad \text{(S-6)}$$
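The block-by-block evaluation of (S-6) can be sketched as follows. This is an illustration rather than the authors' script: it assumes an open chain with $J = 1$ and $B = 0$, all function names are hypothetical, and a small cyclic Jacobi rotation routine stands in for the NumPy eigensolver mentioned above so that the example needs only the standard library.

```python
import math
from itertools import combinations


def sector_hamiltonian(n_sites, n_up, lam):
    """Dense block H_{N, n_up} of (S-4), assuming an open chain, J = 1, B = 0.
    Basis states are bitmasks; bit j set means spin j is up."""
    basis = [sum(1 << j for j in ups) for ups in combinations(range(n_sites), n_up)]
    index = {state: a for a, state in enumerate(basis)}
    H = [[0.0] * len(basis) for _ in basis]
    for a, state in enumerate(basis):
        for j in range(n_sites - 1):
            if (state >> j) & 1 == (state >> (j + 1)) & 1:
                H[a][a] += lam / 4.0          # parallel neighbors: s^z s^z = +1/4
            else:
                H[a][a] -= lam / 4.0          # antiparallel neighbors: s^z s^z = -1/4
                flipped = state ^ (0b11 << j)  # exchange the two spins
                H[index[flipped]][a] += 0.5
    return H


def jacobi_eigenvalues(A, tol=1e-24, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations
    (a stand-in for a library eigensolver)."""
    n = len(A)
    A = [row[:] for row in A]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                theta = 0.5 * math.atan2(2.0 * A[p][q], A[q][q] - A[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(A[i][i] for i in range(n))


def partition_function(n_sites, lam, beta):
    """Z of (S-6): sum of exp(-beta E) over all blocks and eigenvalues."""
    return sum(
        math.exp(-beta * E)
        for n_up in range(n_sites + 1)
        for E in jacobi_eigenvalues(sector_hamiltonian(n_sites, n_up, lam))
    )
```

For $N = 2$ the blocks reproduce (S-5) with $J = 1$, so the result can be compared with the closed form $Z = 2e^{-\beta\lambda/4} + 2e^{\beta\lambda/4}\cosh(\beta/2)$. Thermodynamic quantities such as the free energy $F = -\beta^{-1}\ln Z$ then follow from $Z$.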


FIG. 4. Various thermodynamic quantities computed using oracle-guided VAN, in comparison with SSE and exact diagonalization (ED) results. (a) Free energies of antiferromagnetic and XY-like (marked in legends) chains, with M = 8 or 16 in VAN (see text). (b) Heat capacities and (c) susceptibilities of Heisenberg and XY chains.

very close to that from exact diagonalization of a 14-site chain. Since in this regime the finite-size effect is seen to be minimal, the computed free energies for the XY chains are also accurate. We would like to remark that even though VAN produces accurate free energies in the temperature range studied here, the performance for the antiferromagnetic Ising limit is not satisfactory, likely due to very slow unbolting from localized modes, exaggerating the occurrence of local excitations (Ref. 19).

The computed susceptibilities, shown in Fig. 4(b), again display little dependence on system size, and both show maxima at finite temperatures. For the antiferromagnetic case, where SSE solutions are available, the VAN-computed susceptibilities agree well with the Monte Carlo results and exact results. For the XY-like chains, the computed susceptibilities are considerably larger than those of the antiferromagnetic chains, as expected due to the floppiness of spins in this regime. In both cases, the susceptibilities for the long chain (N = 96) show slightly larger deviations from the references. The computed heat capacities are shown in Fig. 4(c); they are in general agreement with the reference values and show maxima at finite temperatures, but have somewhat larger deviations than the susceptibilities shown in (b).

In summary, we have constructed a generative model based on autoregressive networks to efficiently and accurately represent the partition function of quantum spin systems, which is subsequently solved variationally. An oracle accompanying this VAN approach predicts and forestalls dead ends as soon as they occur in sample generation, and enables rejection-free sampling satisfying the requisite boundary condition. Clearly, this approach to quantum statistical problems allows for highly parallelizable solutions of quantum many-body systems at finite temperatures. An immediate future direction is to extend this approach to problems in higher dimensions, as well as to quantum many-body problems of other types, such as Bose-Hubbard models. If extended to fermionic systems, the VAN may offer an unforeseen avenue for exploring methods for alleviating the fermion sign problem.

ACKNOWLEDGMENTS

This work is supported by the National Natural Science Foundation of China (Grant No. 11725415 and No. 11934001), the Ministry of Science and Technology of China (Grant No. 2018YFA0305601 and No. 2016YFA0301004), and by the Strategic Priority Research Program of Chinese Academy of Sciences (Grant No. XDB28000000).

* jfeng11@pku.edu.cn
1 R. P. Feynman, Phys. Rev. 91, 1291 (1953).
2 M. Suzuki, Prog. Theor. Phys. 56, 1454 (1976).
3 M. Suzuki, S. Miyashita, and A. Kuroda, Prog. Theor. Phys. 58, 1377 (1977).
4 J. E. Hirsch, R. L. Sugar, D. J. Scalapino, and R. Blankenbecler, Phys. Rev. B 26, 5033 (1982).
5 H. G. Evertz, Adv. Phys. 52, 1 (2003).
6 D. Wu, L. Wang, and P. Zhang, Phys. Rev. Lett. 122, 080602 (2019).
7 O. Sharir, Y. Levine, N. Wies, G. Carleo, and A. Shashua, Phys. Rev. Lett. 124, 020503 (2020).
8 Supplementary materials.
9 E. H. Lieb, Phys. Rev. 162, 162 (1967).
10 B. Sutherland, Phys. Rev. Lett. 19, 103 (1967).
11 B. J. Frey, Graphical models for machine learning and digital communication (MIT Press, Cambridge, 1998).
12 I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, Cambridge, 2016).
13 In the context of computational complexity theory, an oracle is a black box attached to a Turing machine, which solves a certain decision problem.
14 T. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms (MIT Press, Cambridge, 2009).
15 I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, Adv. Neural Inform. Process. Syst. 3, 2672 (2014).
16 T. Che, Y. L. A. Jacob, Y. Bengio, and W. Li, in ICLR (2017).
17 A. W. Sandvik, Phys. Rev. B 59, R14157 (1999).
18 F. Alet, S. Wessel, and M. Troyer, Phys. Rev. E 71, 036706 (2005).
19 Z. Shi, Y. Cao, Q. Gu, and J. Feng, In preparation.



Page 6: Worldline algorithm by oracle-guided variational

6

SUPPLEMENTARY MATERIAL

S-1 Quantum spin chain and checkerboarddecomposition

In this section we summarize some of the main resultsconcerning the quantum spin models and its worldline al-gorithm solutions related to the discussions in the paperFor more details the readers are referred to the reviewarticle by Evertz5

The checkerboard decomposition means separating (1)into even-bond and odd-bond parts

Hevenodd =983131

i=evenodd

sxi sxi+1 + syi s

yi+1 + λszi s

zi+1 minus hszi

(S-1)with a magnetic field in z-direction included The parti-tion function Z = Tr eminusβH can be written as

Z = limMrarrinfin

Tr983147eminus∆τ(Heven+Hodd)

983148M (S-2)

Keeping M finite and inserting the complete set of basisthat diagonalizes sz we find

Z =983131

c

S[c] =983131

c

983132

π

w(π) (S-3)

where c is configuration on 2M timesN lattice and w(π) isthe weight function of plaquette π As defined in the pa-per a plaquette is a dark square decorated with loop(s)passing through (Fig 1(a)) There are six types of pla-quettes as shown in Fig 1(b) whose weights are givenby

w(π1) = eminus∆τ(λminus2h)4

w(π2) = eminus∆τ(λ+2h)4

w(π3) = w(π4) = e∆τλ4 cosh ∆τ2

w(π5) = w(π6) = e∆τλ4 sinh ∆τ2

S-2 WHEN TO TURN ON ORACLE

In this section we provide the proof of the lemma usedin the paper regarding the onset of unviable configura-tions when generating spin configuration on a spacetimelattice We will have to start with a few definitions be-fore a statement of the lemma is presented The prooffor the lemma is comprised of two propositions

Definition 1 A checkerboard is a pattern formed by di-viding a N timesH rectangle into N timesH unit squares whichare then colored white and dark alternately in both di-rections Periodic boundary condition is imposed in thehorizontal direction only so N is even

The direction along which there are N squares is calledhorizontal direction and the other is called vertical di-rection In view of the periodic boundary condition

the checkerboard is a finite cylindrical surface and canbe stereographically mapped onto an annulus of whichWLOG the inner circle is the bottom edge and the outercircle the upper edge

Definition 2 Corners of the N timesH squares are calledgrid points A BtimesH checkerboard has N times (H +1) gridpoints

The locations of a grid point is specified by an or-dered doublet (n h) n = 1 N h = 1 H + 1For the problem we will discuss there are k points lo-cated at distinct grid points x|x isin Z 1 le x le Non the bottom edge and k points on the top edge aty|y isin Z 1 le y le N

Definition 3 A path connects pair of grid points on thetop and bottom edges of the checker board and is as acollection of connected and directed line segments Everyline segment is upward directed starting and ending ongrid points and is either vertical or a diagonal of a darksquare

When H is small a path may be be found for a pair ofedge points The question is what is the lower bound ofH for a given N a path can be found for a pair of edgepoints What if there are more than one pair of pointsThe answer is Lemma 1

Lemma 1 When H ge N2+1 a set of paths can alwaysbe found connecting k le N arbitrary grid points on thelower edge to k arbitrary grid points on the top edge

The proof for Lemma 1 consists of two proposi-tiondefinitions

Propositiondefinition 1 On each of two concentriccircles lie k points N rays from the center can be drawnto partition the plane into N sections For an arbitraryN we can always find a partitioning where the two arcsin each section have identical numbers of points and theangle made by abutting rays are less than or equal to180

Proposition 1 can be demonstrated by a constructiveprocedure for achieving this kind of partitioning So longas k ge 1 there must a point such that if we draw a raythrough it and rotate the ray clockwise through an anglele 180 the first point the ray encounters lies on the othercircle We then draw rays through this pair of points to

7

create a section and erase this pair of points Repeatingthe above process k times we will have k sections eachcontaining a pair of points located on the two circles Toobtain the final partitioning we take unions of overlap-ping and neighboring sections with the constraint thatthe angle subtended by a final section is no greater than180As a consequence of Proposition 1 we can partition the

checkerboard into non-empty sections and in each sec-tion there are equal number of points on the top and bot-tom edges and the horizontal separations between pointsare no greater than N2 The partitioning resulted fromthe above algorithm is usually not unique and may notbe the simplest PropositionDefinition 1 merely assertsits existence In σth section we number the points onthe top and bottom edges separately as iσ = 1 nσ inincreasing order from left to right

Proposition/definition 2. In a unidirectional section $\sigma$, $x(i_\sigma) \le y(i_\sigma)$ for all $i_\sigma$. If its width $+\,1 \le H$, then the points on the top and bottom edges can be pairwise connected by nonintersecting paths.

Proof. We first show that in a contiguous section $\sigma$, where $x(i_\sigma)$ are consecutive integers and so are $y(i_\sigma)$ (which then is unidirectional), pairwise paths can be constructed. WLOG, we suppose $\ell_\sigma = y(1_\sigma) - x(1_\sigma) \ge 0$.

In the first step, a path connecting the last ($n_\sigma$th) pair of points is constructed by the following procedure: starting at $(x(n_\sigma), 1)$, we draw a vertical line to the first grid point $(x(n_\sigma), h_\sigma)$ that has a dark square to its upper right, draw along the diagonals to the grid point beneath $y(n_\sigma)$, and then draw a vertical line to reach the top edge. Since $H \ge \ell_\sigma + 1$, this path can always be drawn.

Subsequently, a similar procedure is performed consecutively for each pair to the left: a vertical line is drawn from $(x(n_\sigma - j), 1)$ to $(x(n_\sigma - j), h_\sigma + j)$, which then continues diagonally upward and to the right until beneath $y(n_\sigma - j)$, and then continues upward. The diagonal part of the path can be completed iff

$$H - (h_\sigma + n_\sigma - 1) \ge \ell_\sigma .$$

Since $\ell_\sigma + n_\sigma \le H$ and $h_\sigma = 0$ or $1$, the inequality holds.
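For completeness, the last step can be written out explicitly. It uses only the two conditions stated in the proof, namely $h_\sigma \le 1$ and $\ell_\sigma + n_\sigma \le H$:

```latex
\begin{align}
H - (h_\sigma + n_\sigma - 1)
  &\ge H - (1 + n_\sigma - 1) && \text{since } h_\sigma \le 1 \\
  &=   H - n_\sigma \\
  &\ge \ell_\sigma            && \text{since } \ell_\sigma + n_\sigma \le H .
\end{align}
```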

For a unidirectional section that is not contiguous, paths can be constructed from those of an auxiliary contiguous section with the same numbers of points on the top and bottom edges, located at $\tilde{x}(j_\sigma)$ and $\tilde{y}(j_\sigma)$, respectively, where $\tilde{x}(1_\sigma) = x(1_\sigma)$ and $\tilde{y}(n_\sigma) = y(n_\sigma)$. Now, starting at $(x(j_\sigma), 1)$, a path will travel upward until it meets auxiliary path $j_\sigma$, then travel along auxiliary path $j_\sigma$ until beneath $y(j_\sigma)$, and then continue vertically to $(y(j_\sigma), H+1)$. This is always possible since $\tilde{x}(j_\sigma) \le x(j_\sigma)$ and $\tilde{y}(j_\sigma) \ge y(j_\sigma)$.
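The auxiliary contiguous section can be made concrete with a short sketch. It assumes, following our reading of the text, that the auxiliary bottom points are consecutive integers anchored at the leftmost bottom point and the auxiliary top points are consecutive integers anchored at the rightmost top point. The function name `auxiliary_contiguous_section` and the variable names are hypothetical.

```python
def auxiliary_contiguous_section(x, y):
    """Build the auxiliary contiguous section for a unidirectional section.

    x, y: strictly increasing integer positions of the points on the bottom
    and top edges. Assumption: the auxiliary bottom points start at x[0] and
    the auxiliary top points end at y[-1], each set being consecutive.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("top and bottom edges must carry equally many points")
    x_aux = [x[0] + j for j in range(n)]             # anchored on the left
    y_aux = [y[-1] - (n - 1 - j) for j in range(n)]  # anchored on the right
    return x_aux, y_aux


if __name__ == "__main__":
    x, y = [1, 3, 4], [4, 5, 8]
    x_aux, y_aux = auxiliary_contiguous_section(x, y)
    print(x_aux, y_aux)
    # Each auxiliary start lies at or to the left of the true start, and each
    # auxiliary end lies at or to the right of the true end.
    print(all(a <= b for a, b in zip(x_aux, x)),
          all(a >= b for a, b in zip(y_aux, y)))
```

Because the true positions are strictly increasing integers, the two inequalities checked at the end hold for any input of this form, which is what allows each true path to join its auxiliary path and leave it again.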

Obviously, each non-unidirectional section can always be further divided into unidirectional sections. The proof of Lemma 1 is completed. Incidentally, since the paths constructed in our proof always travel from the starting point on the lower edge toward the end point in the horizontal direction, we come to a corollary.

Corollary 1. If $H \ge N/2 + 1$, at least one greedy solution exists.

S-3 Exact diagonalization

We exactly diagonalize the many-body Hamiltonian of short spin chains to provide a reference for the results from VAN. The many-body Hamiltonian is constructed on a basis of spin configurations $|s^z\rangle = \prod_j |s^z_j\rangle$. We rewrite Eq. (1) as

$$H = \sum_j \lambda s^z_j s^z_{j+1} + \frac{1}{2}\left(s^+_j s^-_{j+1} + \mathrm{H.c.}\right), \tag{S-4}$$

where $s^\pm_j = s^x_j \pm i s^y_j$ and $B = 0$.

For example, the basis of a two-spin system can be chosen as $\{|\downarrow\downarrow\rangle, |\uparrow\downarrow\rangle, |\downarrow\uparrow\rangle, |\uparrow\uparrow\rangle\}$, and the Hamiltonian matrix will be

$$H_{N=2} = J \begin{pmatrix} \lambda/4 & 0 & 0 & 0 \\ 0 & -\lambda/4 & 1/2 & 0 \\ 0 & 1/2 & -\lambda/4 & 0 \\ 0 & 0 & 0 & \lambda/4 \end{pmatrix}. \tag{S-5}$$
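As an illustration of how such a matrix can be assembled numerically, the sketch below builds the Hamiltonian from Kronecker products with NumPy and reproduces the two-spin matrix of (S-5). It is a sketch under stated assumptions rather than the authors' code: energies are in units of $J$ (that is, $J = 1$), the chain is open unless `periodic=True` is passed, and the names `site_operator` and `xxz_hamiltonian` are hypothetical.

```python
import numpy as np

# Spin-1/2 operators in the single-site basis (|down>, |up>)
SZ = np.array([[-0.5, 0.0], [0.0, 0.5]])
SP = np.array([[0.0, 0.0], [1.0, 0.0]])  # s^+ maps |down> to |up>
SM = SP.T                                # s^- maps |up> to |down>
I2 = np.eye(2)


def site_operator(op, j, n_spins):
    """Embed a single-site operator at site j of an n_spins chain.

    Site 0 is the fastest-varying index, so for two spins the basis order is
    |dd>, |ud>, |du>, |uu>, matching the order used for (S-5).
    """
    full = np.array([[1.0]])
    for k in range(n_spins):
        full = np.kron(op if k == j else I2, full)
    return full


def xxz_hamiltonian(n_spins, lam, periodic=False):
    """Dense Hamiltonian of (S-4) in units of J (assumption: J = 1)."""
    dim = 2 ** n_spins
    h = np.zeros((dim, dim))
    n_bonds = n_spins if periodic else n_spins - 1
    for j in range(n_bonds):
        k = (j + 1) % n_spins
        h += lam * site_operator(SZ, j, n_spins) @ site_operator(SZ, k, n_spins)
        hop = site_operator(SP, j, n_spins) @ site_operator(SM, k, n_spins)
        h += 0.5 * (hop + hop.T)  # s^+_j s^-_k + H.c.
    return h


if __name__ == "__main__":
    print(xxz_hamiltonian(2, lam=1.0))
```

A dense construction like this is only practical for short chains, since the matrix dimension grows as $2^N$.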

We can see that in (S-5) the Hamiltonian matrix is block diagonal, which is a natural result of the conservation of the $z$-component of the total spin. In general, the $N$-spin Hamiltonian has $N+1$ blocks $H_{N,n_\uparrow}$, where $n_\uparrow = 0, 1, \cdots, N$ is the number of up-spins. Computing the eigenvalues $E^{(i)}_{N,n_\uparrow}$ of $H_{N,n_\uparrow}$ using NumPy in Python, where

$$i = 1, 2, \cdots, \frac{N!}{n_\uparrow!\,(N-n_\uparrow)!},$$

we obtain the partition function as

$$Z = \mathrm{Tr}\left[e^{-\beta H_N}\right] = \sum_{n_\uparrow} \sum_i e^{-\beta E^{(i)}_{N,n_\uparrow}}. \tag{S-6}$$
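The block-wise evaluation of (S-6) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: energies are in units of $J$, the chain is open unless `periodic=True` is passed, basis states are encoded as integers whose bit $j$ is 1 when spin $j$ is up, and the names `sector_hamiltonian` and `partition_function` are hypothetical. The eigenvalues are obtained with `numpy.linalg.eigvalsh`.

```python
from itertools import combinations

import numpy as np


def sector_hamiltonian(n_spins, n_up, lam, periodic=False):
    """Block of the Hamiltonian (S-4) with a fixed number of up-spins.

    Assumptions: J = 1; bit j of a basis integer is 1 when spin j is up.
    """
    states = [sum(1 << j for j in ups)
              for ups in combinations(range(n_spins), n_up)]
    index = {s: a for a, s in enumerate(states)}
    h = np.zeros((len(states), len(states)))
    n_bonds = n_spins if periodic else n_spins - 1
    for a, s in enumerate(states):
        for j in range(n_bonds):
            k = (j + 1) % n_spins
            bit_j, bit_k = (s >> j) & 1, (s >> k) & 1
            # diagonal term: lambda * s^z_j s^z_k
            h[a, a] += lam * (0.25 if bit_j == bit_k else -0.25)
            # off-diagonal term: (s^+_j s^-_k + H.c.) / 2 flips an antiparallel pair
            if bit_j != bit_k:
                flipped = s ^ ((1 << j) | (1 << k))
                h[index[flipped], a] += 0.5
    return h


def partition_function(n_spins, lam, beta, periodic=False):
    """Sum exp(-beta * E) over all sectors n_up = 0, ..., n_spins."""
    z = 0.0
    for n_up in range(n_spins + 1):
        block = sector_hamiltonian(n_spins, n_up, lam, periodic)
        energies = np.linalg.eigvalsh(block)
        z += np.exp(-beta * energies).sum()
    return z


if __name__ == "__main__":
    print(partition_function(n_spins=2, lam=1.0, beta=1.0))
```

Working sector by sector reduces the largest matrix to be diagonalized from dimension $2^N$ to the largest binomial coefficient, which is the practical benefit of the block structure.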
