
Pseudorapidity distribution and decorrelation of anisotropic flow within CLVisc hydrodynamics

Long-Gang Pang1,2,3,4, Hannah Petersen4,5,6, Xin-Nian Wang1,2,3

1Key Laboratory of Quark & Lepton Physics (MOE) and Institute of Particle Physics, Central China Normal University, Wuhan 430079, China

2Physics Department, University of California, Berkeley, CA 94720, USA

3Nuclear Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA

4Frankfurt Institute for Advanced Studies, Ruth-Moufang-Strasse 1, 60438 Frankfurt am Main, Germany

5Institute for Theoretical Physics, Goethe University, Max-von-Laue-Strasse 1, 60438 Frankfurt am Main, Germany

6GSI Helmholtzzentrum für Schwerionenforschung, Planckstr. 1, 64291 Darmstadt, Germany

Studies of fluctuations and correlations of soft hadrons and of hard and electromagnetic probes of the dense and strongly interacting medium require event-by-event hydrodynamic simulations of high-energy heavy-ion collisions that are computing intensive. We develop a (3+1)D viscous hydrodynamic model, CLVisc, that is parallelized on Graphics Processing Units (GPU) using the Open Computing Language (OpenCL), with a 60-fold performance increase for the space-time evolution and more than 120-fold for the Cooper-Frye particlization relative to that without GPU parallelization. The model is validated by comparisons with different analytic solutions, other existing numerical solutions of hydrodynamics and experimental data on hadron spectra in high-energy heavy-ion collisions. The pseudo-rapidity dependence of the anisotropic flow $v_n(\eta)$ is then computed in CLVisc with initial conditions given by A Multi-Phase Transport (AMPT) model, with energy density fluctuations both in the transverse plane and along the longitudinal direction. Although the magnitude of $v_n(\eta)$ and the ratios between $v_2(\eta)$ and $v_3(\eta)$ are sensitive to the effective shear viscosity over entropy density ratio $\eta_v/s$, the shape of the $v_n(\eta)$ distributions in $\eta$ does not depend on the value of $\eta_v/s$. The decorrelation of $v_n$ along the pseudo-rapidity direction due to the twist and fluctuation of the event planes in the initial parton density distributions is also studied. The decorrelation observable $r_n(\eta^a, \eta^b)$ between $v_n(-\eta^a)$ and $v_n(\eta^a)$ with the auxiliary reference window $\eta^b$ is found to be insensitive to $\eta_v/s$ when there is no initial fluid velocity. For small $\eta_v/s$, the initial fluid velocity from mini-jet partons introduces a sizable splitting of $r_n(\eta^a, \eta^b)$ between the two reference rapidity windows $\eta^b \in [3, 4]$ and $\eta^b \in [4.4, 5.0]$, as has been observed in experiment. The implementation of CLVisc and guidelines on how to efficiently parallelize scientific programs on GPUs are also provided.

PACS numbers: 12.38.Mh,25.75.Ld,25.75.Gz

I. INTRODUCTION

Heavy-ion collisions at the Relativistic Heavy-Ion Collider (RHIC) and Large Hadron Collider (LHC) create strongly coupled QCD matter that exhibits multiple extreme properties. It is the hottest (temperature reaching more than 100,000 times that at the core of the Sun), most vortical (angular momentum on the order of $10^3$ to $10^5\,\hbar$ [1]) and almost perfect fluid (very low shear viscosity over entropy density ratio [2–4]) that is exposed to the strongest magnetic field ($|B| \sim 5$ to $10\, m_\pi^2$) [5] ever produced in the laboratory. This strongly coupled QCD matter is believed to share some of the properties of the quark-gluon-plasma epoch in the early universe.

Numerical simulations of the dynamical evolution of this strongly coupled QCD matter and comparisons with experimental data are vital to extract the physical properties of the strongly interacting matter. Relativistic viscous hydrodynamics is the most successful effective theory in describing the space-time evolution of QCD matter created in high-energy heavy-ion collisions [6, 7]. Hybrid approaches that comprise hydrodynamics and hadronic transport agree with experimental data on various observables such as charged multiplicity, transverse momentum spectra and transverse momentum $p_T$-differential elliptic flow of identified particles [8] (and references therein). Event-by-event simulations with energy density fluctuations [9–18] in the initial states are indispensable to describe not only the ensemble average of odd-order harmonic flows but also their probability distributions [19]. New observables such as the correlation between different event plane angles [20–23], different harmonic flows [24] and $p_T$-differential harmonic flows [25] can provide more rigorous constraints on medium properties such as the shear viscosity to entropy density ratio, but also require efficient algorithms to reach sufficient statistics in a reasonable amount of CPU time. Furthermore, (3+1)D event-by-event hydrodynamics is also necessary to understand the longitudinal structure of the collective flow. The initial state fluctuations along the longitudinal direction have been built into many models [26–33]. Observables [34–44] have been designed either to constrain the longitudinal structure in the initial state or to determine other QGP properties using the multiplicity or anisotropic flow correlations along the longitudinal direction. Taking into account the asymmetry between forward and backward going participants, non-central heavy-ion collisions produce not only strong angular momentum and a strong magnetic field but also global and local vorticity [5] and hyperon polarization [45].

arXiv:1802.04449v2 [nucl-th] 8 Mar 2018

The space-time evolution of high-energy heavy-ion collisions from event-by-event relativistic hydrodynamics also provides critical background information for thermal photon and di-lepton emission, heavy flavor transport and jet energy loss studies, in which the probes are produced in or traverse the fluctuating hot and dense medium. For studies of thermal photon and di-lepton production [46–48], the emission rates are computed with the local temperature and fluid velocity at each space-time point from event-by-event (3+1)D viscous hydrodynamics, which is quite computing intensive. In the simultaneous simulations of parton shower propagation and bulk medium evolution, the bottleneck in the numerical simulations is also the relativistic hydrodynamic evolution of the medium in each time step of the parton shower propagation, as shown in CoLBT-Hydro [49] and the forthcoming JetScape [50]. Big data analyses in relativistic heavy-ion collisions using machine learning [51–53] and deep learning techniques [54] demand a huge amount of data from event-by-event hydrodynamic simulations with up to $O(10^7)$ events across a high dimensional parameter space. These studies will all benefit from a fast numerical solver for (3+1)D relativistic hydrodynamics.

In order to reduce the running time of one single simulation, the Message Passing Interface (MPI) library has been used in MUSIC [12, 55, 56] to parallelize the (3+1)D viscous hydrodynamic program by communicating between multiple CPUs. The communication costs between CPUs on different nodes are usually heavy compared to the workload of the numerical computations. On the other hand, a Graphics Processing Unit (GPU) has a large number of processing elements (>2500) on one single computing device, which makes it quite popular for accelerating numerical computations via massive parallelization. The SHASTA algorithm was first parallelized on heterogeneous devices using OpenCL to simulate the QGP expansion by solving the (3+1)D ideal hydrodynamic equations [57]. The (3+1)D viscous hydrodynamics for simulations of heavy-ion collisions has been parallelized on GPU using both OpenCL (CLVisc [58]) and CUDA (GPU-VH [59]). In this paper and its appendix, we provide a detailed description of the parallelization of the hydrodynamic evolution, hyper-surface finding and spectra calculation in the CLVisc hydrodynamic model. OpenCL has the benefit that the same code can run on heterogeneous computing devices (CPUs, GPUs, FPGAs and Intel Phi). However, the basic concepts and optimization principles are the same for both OpenCL and CUDA. The acronym CLVisc refers both to the CCNU (Central China Normal University) and LBNL (Lawrence Berkeley National Laboratory) viscous hydrodynamic model and to the OpenCL GPU parallelization that is used.

After providing validations of CLVisc through comparisons with several analytic solutions to the viscous hydrodynamics and experimental data on bulk hadron spectra in high-energy heavy-ion collisions, we apply CLVisc to the study of the pseudo-rapidity distribution and fluctuation of anisotropic flow with event-by-event initial conditions from A Multi-Phase Transport (AMPT) model [60]. We compute the pseudo-rapidity dependence of the anisotropic flows $v_n(\eta)$ and $r_n(\eta^a, \eta^b)$, which represents the decorrelation between $v_n(-\eta^a)$ and $v_n(\eta^a)$ with the auxiliary reference window $\eta^b$. Effects of shear viscosity and initial fluid velocity on these longitudinal observables are also investigated for the first time with CLVisc.

This paper is organized as follows: in Sec. II, we rewrite the hydrodynamic equations in a specific way to simplify the numerical implementation. In Sec. III, we describe in detail how the relativistic hydrodynamic equations are solved numerically in CLVisc with GPU parallelization. In Sec. IV, we introduce the GPU parallelized smooth particle spectra calculation and the fast Monte Carlo sampler to sample four-momenta of particles from the freeze-out hyper-surface. In Sec. V, we verify our numerical code with a variety of analytical solutions and numerical results from other implementations. Comparisons with experimental data on hadron spectra and anisotropic flow are given in Sec. VI. In Secs. VII and VIII we discuss the pseudo-rapidity distribution, correlation and fluctuation of anisotropic flow. In the Appendix, we provide a detailed description of the structure and GPU parallelization of the algorithm to solve the hydrodynamic equations, present two methods to sample Jüttner, Fermi-Dirac and Bose-Einstein distributions efficiently, and assess the performance of the GPU parallelization.

II. HYDRODYNAMIC EQUATIONS

Let us start by recapitulating the exact form of the relativistic hydrodynamic equations that are solved within CLVisc. The second-order hydrodynamic equations are simply given by

$$\nabla_\mu T^{\mu\nu} = 0, \tag{1}$$

$$\nabla_\mu N^{\mu} = 0, \tag{2}$$

with the energy-momentum tensor $T^{\mu\nu} = \varepsilon u^\mu u^\nu - (p + \Pi)\Delta^{\mu\nu} + \pi^{\mu\nu}$, where $\varepsilon$ is the energy density, $p$ the pressure, $u^\mu$ the fluid four-velocity normalized as $u_\mu u^\mu = 1$ and $\Delta^{\mu\nu} = g^{\mu\nu} - u^\mu u^\nu$ the projection operator which is orthogonal to the fluid velocity, and the net charge current $N^\mu = n u^\mu + d^\mu$, where $d^\mu$ is the charge diffusion current. The shear stress tensor $\pi^{\mu\nu}$ and the bulk pressure $\Pi$ represent the deviation from ideal hydrodynamics and local equilibrium. We choose to work in the Landau frame, which yields a traceless ($\pi^{\mu}_{\ \mu} = 0$) and transverse ($u_\mu \pi^{\mu\nu} = 0$) shear stress tensor. By projecting along the fluid velocity $u_\mu$ direction, we simply get $u_\mu T^{\mu\nu} = \varepsilon u^\nu$.

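The bulk pressure $\Pi$ and the shear stress tensor $\pi^{\mu\nu}$ satisfy the equations [61],

$$\Pi = -\zeta\theta - \tau_\Pi\left[u^\lambda\nabla_\lambda\Pi + \frac{4}{3}\Pi\theta\right], \tag{3}$$

$$\pi^{\mu\nu} = \eta_v\sigma^{\mu\nu} - \tau_\pi\left[\Delta^{\mu}_{\ \alpha}\Delta^{\nu}_{\ \beta}u^\lambda\nabla_\lambda\pi^{\alpha\beta} + \frac{4}{3}\pi^{\mu\nu}\theta\right] - \lambda_1\pi^{\langle\mu}_{\ \ \lambda}\pi^{\nu\rangle\lambda} - \lambda_2\pi^{\langle\mu}_{\ \ \lambda}\Omega^{\nu\rangle\lambda} - \lambda_3\Omega^{\langle\mu}_{\ \ \lambda}\Omega^{\nu\rangle\lambda}, \tag{4}$$

with the expansion rate $\theta$, the symmetric shear tensor $\sigma^{\mu\nu}$ and the antisymmetric vorticity tensor $\Omega^{\mu\nu}$ defined as

$$\begin{aligned}
\theta &\equiv \nabla_\mu u^\mu,\\
\sigma^{\mu\nu} &\equiv 2\nabla^{\langle\mu}u^{\nu\rangle} \equiv 2\Delta^{\mu\nu\alpha\beta}\nabla_\alpha u_\beta,\\
\Omega^{\mu\nu} &\equiv \frac{1}{2}\Delta^{\mu\alpha}\Delta^{\nu\beta}\left(\nabla_\alpha u_\beta - \nabla_\beta u_\alpha\right),\\
\Delta^{\mu\nu\alpha\beta} &\equiv \frac{1}{2}\left(\Delta^{\mu\alpha}\Delta^{\nu\beta} + \Delta^{\mu\beta}\Delta^{\nu\alpha}\right) - \frac{1}{3}\Delta^{\mu\nu}\Delta^{\alpha\beta},
\end{aligned} \tag{5}$$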

where $\Delta^{\mu\nu\alpha\beta}$ is the double projection operator that makes the resulting contracted tensor symmetric, traceless and orthogonal to the fluid velocity $u^\mu$. In Eqs. (3) and (4), $\tau_\Pi$, $\tau_\pi$, $\lambda_1$, $\lambda_2$ and $\lambda_3$ are five independent second-order transport coefficients. Nonzero relaxation times $\tau_\Pi$ and $\tau_\pi$ in the second-order Israel-Stewart (IS) equations solve the causality problem of the first-order Navier-Stokes equations. In the current calculation we set $\tau_\pi = 5\eta_v/(Ts)$ [62] and $\tau_\Pi = 5\zeta/(Ts)$, where $T$ is the temperature, $s$ the entropy density, $\eta_v$ the shear viscous coefficient, and $\zeta$ the bulk viscous coefficient.

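The time-like fluid four-velocity in Cartesian coordinates $x^\mu = (t, x, y, z)$ is defined as,

$$u'^{\mu} \equiv \frac{dx^\mu}{d\sigma} \equiv u^0\,(1, v'_x, v'_y, v'_z), \tag{6}$$

where $\sigma = \sqrt{t^2 - x^2 - y^2 - z^2}$ and the spatial components of the fluid velocity are defined as $v'_i = u'^i/u^0$. We work in Milne coordinates $X^\mu = (\tau, x, y, \eta_s)$, in which $\tau = \sqrt{t^2 - z^2}$ is the proper time and $\eta_s = \frac{1}{2}\ln\frac{t+z}{t-z}$ the space-time rapidity. The fluid four-velocity in these coordinates is,

$$u^\mu \equiv \frac{dX^\mu}{d\sigma} = \frac{dX^\mu}{dx^\nu}\frac{dx^\nu}{d\sigma} = \frac{dX^\mu}{dx^\nu}u'^{\nu}
= \begin{pmatrix} u^0\cosh\eta_s - u'^{z}\sinh\eta_s \\ \vec{u}'_\perp \\ \frac{1}{\tau}\left(-u^0\sinh\eta_s + u'^{z}\cosh\eta_s\right) \end{pmatrix}
\equiv u^\tau \begin{pmatrix} 1 \\ \vec{v}_\perp \\ v_{\eta_s}/\tau \end{pmatrix}, \tag{7}$$

where $\vec{v}_\perp$ and $v_{\eta_s}$ are defined as,

$$\vec{v}_\perp = \vec{v}'_\perp \cosh(y_v)/\cosh(y_v - \eta_s), \tag{8}$$

$$v_{\eta_s} = \tanh(y_v - \eta_s), \tag{9}$$

and $y_v$ denotes the rapidity of the longitudinal fluid velocity as given by $v'_z = \tanh y_v$, $u^\tau = 1/\sqrt{1 - v_\perp^2 - v_{\eta_s}^2}$ and $u^{\eta_s} = u^\tau v_{\eta_s}/\tau$. In the Bjorken scaling scenario where the energy density is uniform along the $\eta_s$ direction, we simply get $v_{\eta_s} = 0$ and $y_v = \eta_s$, which implies $v_z = z/t$.

In full 3D expansion, $v_{\eta_s}$ denotes the relative fluid velocity at coordinate $(t, x, y, z)$, in a reference frame which is moving at the speed of $v_z = z/t$.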
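The transformation in Eqs. (7)-(9) can be sketched numerically as follows. This is a minimal illustration, not the CLVisc implementation; the function name `cartesian_to_milne` and its argument order are hypothetical, and natural units with $c = 1$ are assumed.

```python
import math

def cartesian_to_milne(vx, vy, vz, t, z):
    """Map a Cartesian fluid velocity (v'_x, v'_y, v'_z) at (t, z) to Milne quantities.

    Hypothetical helper for illustration; returns (tau, eta_s, u_tau, v_perp, v_eta).
    """
    tau = math.sqrt(t * t - z * z)                 # proper time
    eta_s = 0.5 * math.log((t + z) / (t - z))      # space-time rapidity
    y_v = math.atanh(vz)                           # longitudinal flow rapidity, v'_z = tanh(y_v)
    scale = math.cosh(y_v) / math.cosh(y_v - eta_s)
    v_perp = (vx * scale, vy * scale)              # Eq. (8)
    v_eta = math.tanh(y_v - eta_s)                 # Eq. (9)
    u_tau = 1.0 / math.sqrt(1.0 - v_perp[0] ** 2 - v_perp[1] ** 2 - v_eta ** 2)
    return tau, eta_s, u_tau, v_perp, v_eta

# Bjorken flow, v'_z = z/t, gives a vanishing v_eta in the Milne frame.
print(cartesian_to_milne(0.0, 0.0, 0.3, 2.0, 0.6)[4])
```

The value of `u_tau` obtained from the normalization agrees with the first component of Eq. (7), $u^0\cosh\eta_s - u'^z\sinh\eta_s$, which provides a simple consistency check.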

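From the invariant line element $ds^2 = g_{\mu\nu}dX^\mu dX^\nu = d\tau^2 - dx^2 - dy^2 - \tau^2 d\eta_s^2$ we get the metric tensor in Milne coordinates,

$$g_{\mu\nu} = \mathrm{diag}(1, -1, -1, -\tau^2), \tag{10}$$

$$g^{\mu\nu} = \mathrm{diag}(1, -1, -1, -1/\tau^2). \tag{11}$$

The Christoffel symbols are explicitly solved as a function of the metric tensor, $\Gamma^{i}_{kl} = \frac{1}{2}g^{im}(\partial_l g_{mk} + \partial_k g_{ml} - \partial_m g_{kl})$, and contain three nonzero components,

$$\Gamma^{\tau}_{\eta_s\eta_s} = \tau, \qquad \Gamma^{\eta_s}_{\tau\eta_s} = \Gamma^{\eta_s}_{\eta_s\tau} = 1/\tau, \tag{12}$$

which are used in the covariant derivative operation $\nabla_\mu$ for all vectors and tensors in the hydrodynamic equations and IS equations,

$$\nabla_b\lambda^a \equiv \partial_b\lambda^a + \Gamma^{a}_{bc}\lambda^c, \tag{13}$$

$$\nabla_c\lambda^{ab} \equiv \partial_c\lambda^{ab} + \Gamma^{a}_{cd}\lambda^{db} + \Gamma^{b}_{cd}\lambda^{ad}. \tag{14}$$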

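For example, there are 3 terms in $\nabla_\mu u^\nu$ which are different from their ordinary derivatives,

$$\nabla_\tau u^{\eta_s} = \partial_\tau u^{\eta_s} + \frac{1}{\tau}u^{\eta_s}, \tag{15}$$

$$\nabla_{\eta_s} u^{\tau} = \partial_{\eta_s} u^{\tau} + \tau u^{\eta_s}, \tag{16}$$

$$\nabla_{\eta_s} u^{\eta_s} = \partial_{\eta_s} u^{\eta_s} + \frac{1}{\tau}u^{\tau}. \tag{17}$$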
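The three nonzero Christoffel symbols of Eq. (12) can be checked with a short numerical sketch that differentiates the Milne metric of Eq. (10) by central differences. The helper names (`metric`, `d_metric`, `christoffel`) and the step size are illustrative choices, not part of CLVisc.

```python
def metric(X):
    """Covariant Milne metric g_{mu nu} at X = (tau, x, y, eta_s), Eq. (10)."""
    tau = X[0]
    g = [[0.0] * 4 for _ in range(4)]
    g[0][0], g[1][1], g[2][2], g[3][3] = 1.0, -1.0, -1.0, -tau * tau
    return g

def d_metric(X, m, h=1e-5):
    """Central difference of the metric with respect to coordinate m."""
    Xp, Xm = list(X), list(X)
    Xp[m] += h
    Xm[m] -= h
    gp, gm = metric(Xp), metric(Xm)
    return [[(gp[a][b] - gm[a][b]) / (2.0 * h) for b in range(4)] for a in range(4)]

def christoffel(X):
    """Gamma^i_{kl} = (1/2) g^{im} (d_l g_{mk} + d_k g_{ml} - d_m g_{kl})."""
    g = metric(X)
    g_inv = [1.0 / g[i][i] for i in range(4)]      # the metric is diagonal
    dg = [d_metric(X, m) for m in range(4)]        # dg[m][a][b] = d_m g_{ab}
    G = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
    for i in range(4):
        for k in range(4):
            for l in range(4):
                G[i][k][l] = 0.5 * g_inv[i] * (dg[l][i][k] + dg[k][i][l] - dg[i][k][l])
    return G

G = christoffel((2.0, 0.0, 0.0, 0.5))
print(G[0][3][3], G[3][0][3], G[3][3][0])   # close to tau, 1/tau, 1/tau at tau = 2
```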

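The $\partial_\tau\lambda + \lambda/\tau$ terms from covariant derivatives are combined as $\frac{1}{\tau}\partial_\tau(\tau\lambda)$, to reduce the numerical error when $\tau$ is small. The new independent variables are thus defined as $\tilde\lambda = \tau\lambda$. In this way, we define $\tilde T^{\mu\nu}$, $\tilde N^{\mu}$, $\tilde\pi^{\mu\nu}$, $\tilde u^{\mu}$, $\tilde\partial_\mu$ and $\tilde g^{\mu\nu}$ as,

$$\tilde T^{\mu\nu} = \begin{cases} \tau T^{\mu\nu} & \text{for } \mu \neq \eta_s \text{ and } \nu \neq \eta_s \\ \tau^2 T^{\mu\eta_s} & \text{for } \mu \neq \eta_s \\ \tau^3 T^{\eta_s\eta_s} & \text{otherwise} \end{cases} \tag{18}$$

$$\tilde N^{\mu} = \begin{cases} \tau N^{\mu} & \text{for } \mu \neq \eta_s \\ \tau^2 N^{\eta_s} & \text{for } \mu = \eta_s \end{cases} \tag{19}$$

$$\tilde\pi^{\mu\nu} = \begin{cases} \pi^{\mu\nu} & \text{for } \mu \neq \eta_s \text{ and } \nu \neq \eta_s \\ \tau\pi^{\mu\eta_s} & \text{for } \mu \neq \eta_s \\ \tau^2\pi^{\eta_s\eta_s} & \text{otherwise} \end{cases} \tag{20}$$

$$\tilde u^{\mu} = (u^\tau, u^x, u^y, \tau u^{\eta_s}), \tag{21}$$

$$\tilde\partial_\mu = (\partial_\tau, \partial_x, \partial_y, \partial_{\eta_s}/\tau), \tag{22}$$

$$\tilde g^{\mu\nu} = \tilde g_{\mu\nu} = \mathrm{diag}(1, -1, -1, -1). \tag{23}$$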

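One benefit of these substitutions is that all the components in the same vector or tensor have the same dimension. This technique is widely used in well-known (2+1)D or (3+1)D relativistic hydrodynamic codes for heavy-ion collisions [12, 63–66]. However, the Christoffel symbols calculated from $\tilde g_{\mu\nu}$ satisfy $\tilde\Gamma^{i}_{kl} = 0$. Neither $\Gamma^{i}_{kl}$ nor $\tilde\Gamma^{i}_{kl}$ constitute the proper new covariant derivatives that leave the hydrodynamic equations and IS equations unchanged. Those three covariant derivatives in the new system become,

$$\tilde\nabla_\tau \tilde u^{\eta_s} = \partial_\tau \tilde u^{\eta_s}, \tag{24}$$

$$\tilde\nabla_{\eta_s} \tilde u^{\tau} = \tilde\partial_{\eta_s} \tilde u^{\tau} + \frac{1}{\tau}\tilde u^{\eta_s}, \tag{25}$$

$$\tilde\nabla_{\eta_s} \tilde u^{\eta_s} = \tilde\partial_{\eta_s} \tilde u^{\eta_s} + \frac{1}{\tau}\tilde u^{\tau}. \tag{26}$$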

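From now on, Christoffel symbols will not appear in the equations to avoid possible typos. Using the new covariant derivatives $\tilde\nabla_\mu \tilde u^\nu$, the hydrodynamic equations and IS equations are expanded in the following way to simplify the explanation of the numerical implementation in the next section,

$$\partial_\tau \tilde T^{\tau\nu} + \tilde\partial_i \tilde T^{i\nu} = S_T^{\nu}, \tag{27}$$

$$\partial_\tau \tilde N^{\tau} + \tilde\partial_i \tilde N^{i} = S_N, \tag{28}$$

$$\partial_\tau (\tilde u^\tau \tilde\pi^{\mu\nu}) + \tilde\partial_i (\tilde u^i \tilde\pi^{\mu\nu}) = S_\pi^{\mu\nu}, \tag{29}$$

$$\partial_\tau (\tilde u^\tau \Pi) + \tilde\partial_i (\tilde u^i \Pi) = S_\Pi, \tag{30}$$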

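where the source terms are,

$$S_T^{\nu} = \left(\frac{1}{\tau}\tilde T^{\eta_s\eta_s},\, 0,\, 0,\, \frac{1}{\tau}\tilde T^{\tau\eta_s}\right)^{T}, \tag{31}$$

$$S_N = 0, \tag{32}$$

$$\begin{aligned}
S_\pi^{\mu\nu} = {} & -\frac{\tilde\pi^{\mu\nu} - \eta_v\tilde\sigma^{\mu\nu}}{\tau_\pi} - \frac{1}{3}\tilde\pi^{\mu\nu}\tilde\theta - \tilde g_{\alpha\beta}\left(\tilde u^{\mu}\tilde\pi^{\nu\beta} + \tilde u^{\nu}\tilde\pi^{\mu\beta}\right)D\tilde u^{\alpha} + \tilde\pi^{\mu\nu}\frac{u^\tau}{\tau} \\
& - \frac{1}{\tau_\pi}\left[\lambda_1\tilde\pi^{\langle\mu}_{\ \ \lambda}\tilde\pi^{\nu\rangle\lambda} + \lambda_2\tilde\pi^{\langle\mu}_{\ \ \lambda}\tilde\Omega^{\nu\rangle\lambda} + \lambda_3\tilde\Omega^{\langle\mu}_{\ \ \lambda}\tilde\Omega^{\nu\rangle\lambda}\right] + I^{\mu\nu},
\end{aligned} \tag{33}$$

$$S_\Pi = \frac{-\Pi - \zeta\tilde\theta}{\tau_\Pi} - \frac{1}{3}\Pi\tilde\theta, \tag{34}$$

where $\tilde\theta = \tilde\partial_\mu \tilde u^\mu + u^\tau/\tau$ is the expansion rate and $D = \tilde u^\lambda\tilde\nabla_\lambda$ the comoving derivative. The $I^{\mu\nu}$ are source terms from Christoffel symbols which are given in Ref. [66],

$$I^{\tau\tau} = 2\tilde u^{\eta_s}\tilde\pi^{\tau\eta_s}/\tau, \qquad I^{\tau x} = \tilde u^{\eta_s}\tilde\pi^{\eta_s x}/\tau, \tag{35}$$

$$I^{\tau y} = \tilde u^{\eta_s}\tilde\pi^{\eta_s y}/\tau, \qquad I^{\tau\eta_s} = \tilde u^{\eta_s}(\tilde\pi^{\tau\tau} + \tilde\pi^{\eta_s\eta_s})/\tau, \tag{36}$$

$$I^{\eta_s x} = \tilde u^{\eta_s}\tilde\pi^{\tau x}/\tau, \qquad I^{\eta_s y} = \tilde u^{\eta_s}\tilde\pi^{\tau y}/\tau, \tag{37}$$

$$I^{\eta_s\eta_s} = 2\tilde u^{\eta_s}\tilde\pi^{\tau\eta_s}/\tau, \qquad I^{xx} = I^{xy} = I^{yy} = 0. \tag{38}$$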

III. NUMERICAL IMPLEMENTATION

The task of the numerical algorithm is to obtain the time evolution of the energy density $\varepsilon$ and fluid four-velocity $u^\mu$ by solving the partial differential equations (27), (28), (29) and (30). These equations have the common form,

$$\partial_\tau Q + \partial_x F^x + \partial_y F^y + \partial_{\eta_s} F^{\eta_s} = S, \tag{39}$$

where $Q$ is the conservative variable, $F^{x,y,\eta_s}$ the flux along the $x$, $y$, $\eta_s$ directions and $S$ the source term. We use the second-order central-scheme Kurganov-Tadmor (KT) algorithm [67] for the convective part $\partial_\tau Q + \partial_i F^i = 0$ in Eq. (39),

$$\frac{d\bar Q}{d\tau} = -\frac{H^{x}_{i+1/2,j,k} - H^{x}_{i-1/2,j,k}}{dx} - \frac{H^{y}_{i,j+1/2,k} - H^{y}_{i,j-1/2,k}}{dy} - \frac{H^{\eta_s}_{i,j,k+1/2} - H^{\eta_s}_{i,j,k-1/2}}{\tau\, d\eta_s} \equiv S_{\rm KT}, \tag{40}$$

where $\bar Q$ stands for the mean value of $Q$ in one cell and $S_{\rm KT}$ stands for the source terms from the flux in the KT algorithm. The KT algorithm is a finite volume algorithm which has a very clear physical meaning: the change of conserved quantities in a finite volume equals the flux entering minus the flux leaving this volume. Taking the $x$ direction as an example, the flux leaving this volume is,

$$H^{x}_{i+1/2} = \frac{F^x(Q^r_{i+1/2}) + F^x(Q^l_{i+1/2})}{2} - c_{i+1/2}\frac{Q^r_{i+1/2} - Q^l_{i+1/2}}{2}, \tag{41, 42}$$

where

$$Q^r_{i+1/2} = \bar Q_{i+1} - (\partial_x Q)_{i+1}\frac{dx}{2}, \tag{43}$$

$$Q^l_{i+1/2} = \bar Q_{i} + (\partial_x Q)_{i}\frac{dx}{2}, \tag{44}$$

and $c_{i+1/2}$ is the maximum propagating speed of the local collective signal given in Ref. [55]. Notice that five nodes $(i-2, i-1, i, i+1, i+2)$ are needed to update the hydrodynamic cell at $i$ for the one-dimensional case. In (3+1)D hydrodynamics, another 4 nodes $(j-2, j-1, j+1, j+2)$ along the $y$ direction and 4 nodes $(k-2, k-1, k+1, k+2)$ along the $\eta_s$ direction are needed. The KT algorithm is widely used in relativistic hydrodynamic simulations of heavy-ion collisions [55, 58, 59], after being introduced to the field of high-energy physics by the McGill group [55]. Some higher order KT algorithms use more nodes in the off-diagonal direction to achieve a higher precision. However, the simplicity of the 2nd order central scheme makes it much easier to parallelize on GPU. The equations are further simplified by moving the KT source terms to the right hand side,

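$$\partial_\tau \tilde T^{\tau\mu} = S^{\mu}_{T,\rm tot}, \tag{45}$$

$$\partial_\tau \tilde N^{\tau} = S_{N,\rm tot}, \tag{46}$$

$$\partial_\tau (\tilde u^\tau \tilde\pi^{\mu\nu}) = S^{\mu\nu}_{\pi,\rm tot}, \tag{47}$$

$$\partial_\tau (\tilde u^\tau \Pi) = S_{\Pi,\rm tot}, \tag{48}$$

where $S_{*,\rm tot} = S_* + S_{\rm KT}$. The upper index $\mu$ in the vector and $\mu, \nu$ in the tensor are neglected in the following notation for simplicity.

$$u^{*n+1}\pi'^{\,n+1} = u^n\pi^n + h\,S_{\pi,\rm tot}(\varepsilon^n, u^n, u^{*n+1}, \pi^n), \tag{49}$$

$$T'^{\,n+1} = T^n + h\,S_{T,\rm tot}(\varepsilon^n, u^n, \pi^n), \tag{50}$$

$$T'^{\,n+1}_{\rm ideal} = T'^{\,n+1} - \pi'^{\,n+1} \rightarrow \varepsilon'^{\,n+1}, u'^{\,n+1}, \tag{51}$$

$$u'^{\,n+1}\pi^{n+1} = u^n\pi^n + \frac{h}{2}\left[S_{\pi,\rm tot}(\varepsilon^n, u^n, u^{*n+1}, \pi^n) + S_{\pi,\rm tot}(\varepsilon'^{\,n+1}, u'^{\,n+1}, u^n, \pi'^{\,n+1})\right], \tag{52}$$

$$T^{n+1} = T^n + \frac{h}{2}\left[S_{T,\rm tot}(\varepsilon^n, u^n, \pi^n) + S_{T,\rm tot}(\varepsilon'^{\,n+1}, u'^{\,n+1}, \pi^{n+1})\right], \tag{53}$$

$$T^{n+1}_{\rm ideal} = T^{n+1} - \pi^{n+1} \rightarrow \varepsilon^{n+1}, u^{n+1}, \tag{54}$$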

where $h$ is the time spacing. From this flow chart the difficulty in solving 2nd order viscous hydrodynamics becomes clear. In order to update $\pi^{\mu\nu}$ to time step $n+1$, one needs information on the fluid velocity $u^{n+1}$. However, $u^{n+1}$ can only be determined through $T^{\mu\nu}_{\rm ideal} = T^{\mu\nu}_{\rm visc} - \pi^{\mu\nu}$, assuming that $\pi^{\mu\nu}$ at time step $n+1$ is already known. Implicitly solving for $T^{\mu\nu}$ and $\pi^{\mu\nu}$ together with root-finding is a possible solution, but a very complex one. The two-step Runge-Kutta method is well suited to this problem: since the first step is a prediction step, it does not require an exact solution. We first predict $\pi'^{\,n+1}$ by extrapolating the fluid velocity to step $n+1$ using $u^{*n+1} = 2u^n - u^{n-1}$, and then get predicted values for $\varepsilon$ and $u^\mu$. Afterwards, we update $\pi^{n+1}$, $\Pi^{n+1}$, $N^{n+1}$ and $T^{n+1}$ using the source terms averaged over the 2 steps. For the first time step, where $u^{n-1}$ is not known, ideal hydrodynamics is employed to estimate $u^{*1}$. Notice that the bulk viscosity and net baryon density are set to 0 in the current version.
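As an illustration of how the KT flux of Eqs. (41)-(44) combines with a two-step prediction and correction update, the sketch below evolves a one-dimensional linear advection equation, $F(Q) = aQ$, on a periodic grid. It is a toy model under stated assumptions, not the CLVisc kernel: the minmod slope limiter, the periodic boundary, the constant signal speed $c = |a|$ and all function names are choices made here for illustration.

```python
def minmod(a, b):
    """Slope limiter (an assumption here): zero at extrema, else the smaller slope."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b

def kt_rhs(Q, dx, a=1.0):
    """Return -(H_{i+1/2} - H_{i-1/2})/dx for F(Q) = a*Q on a periodic grid."""
    n = len(Q)
    slope = [minmod((Q[i] - Q[i - 1]) / dx, (Q[(i + 1) % n] - Q[i]) / dx)
             for i in range(n)]
    H = [0.0] * n                                  # H[i] is the flux through face i+1/2
    for i in range(n):
        ip = (i + 1) % n
        Ql = Q[i] + 0.5 * dx * slope[i]            # Eq. (44)
        Qr = Q[ip] - 0.5 * dx * slope[ip]          # Eq. (43)
        c = abs(a)                                 # maximum local signal speed
        H[i] = 0.5 * (a * Qr + a * Ql) - 0.5 * c * (Qr - Ql)   # Eqs. (41)-(42)
    return [-(H[i] - H[i - 1]) / dx for i in range(n)]

def two_step_update(Q, dx, h, a=1.0):
    """Prediction with the old source, then correction with the averaged source."""
    s_old = kt_rhs(Q, dx, a)
    Q_pred = [q + h * s for q, s in zip(Q, s_old)]
    s_new = kt_rhs(Q_pred, dx, a)
    return [q + 0.5 * h * (s1 + s2) for q, s1, s2 in zip(Q, s_old, s_new)]

n = 50
dx = 1.0 / n
Q0 = [1.0 if 10 <= i < 20 else 0.0 for i in range(n)]
Q = Q0[:]
for _ in range(40):
    Q = two_step_update(Q, dx, 0.25 * dx)
print(sum(Q) - sum(Q0))   # the finite-volume form conserves the total up to round-off
```

Because each face flux is added to one cell and subtracted from its neighbour, the total of $Q$ is conserved, which mirrors the physical interpretation of the finite volume scheme given above.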

CLVisc has been applied with a variety of initial energy-momentum tensors for the initial stage of high-energy heavy-ion collisions. The first model is the optical Glauber model [68], which can reproduce the charged multiplicity, transverse momentum spectra and elliptic flow $v_2$ of heavy-ion collisions. The second model is Trento [53], developed by the Duke group, which parameterizes MC-Glauber [68, 69], MC-KLN [70–73], IP-Glasma [14, 17, 74] and EKRT [75–77] initial conditions. It can additionally describe higher order anisotropic flow $v_n$ due to the inclusion of entropy/energy density fluctuations in the transverse plane. Since Trento is very flexible and successful, it is used as the default for the public version of CLVisc. To verify that bulk observables are well described, the corresponding results are presented in Sec. VI. The third model is A Multi-Phase Transport (AMPT) model [60], which includes further fluctuations along the space-time rapidity and of the initial fluid velocity [64]. Due to the longitudinal fluctuations and the asymmetric distribution of forward and backward going participants in heavy-ion collisions, CLVisc with AMPT initial conditions can describe the twisting of event planes along the longitudinal direction [29, 78] and di-hadron correlations as a function of rapidity and azimuthal angle differences [79]. It is also used to describe the rich vortical structure of the QGP fluid during the expansion and the global and local polarization of hyperons [80] in non-central heavy-ion collisions. Because of the longitudinal dynamics incorporated in the AMPT initial conditions, they are used for all the results of this work shown in Secs. VII and VIII.

[Figure 1 plot: pressure (GeV/fm$^3$) versus energy density (GeV/fm$^3$); curves: EOSI, lattice-wb2014, s95p-pce, EOSQ, Pure Gauge.]

Figure 1. (color online) Pressure as a function of energy density for 5 different equations of state. They are denoted as EOSI, lattice-wb2014, s95p-pce, EOSQ and pure gauge from top to bottom.

There are 5 options for the equation of state (EoS) in CLVisc as shown in Fig. 1:

EOSI: The simplest EoS, the ideal gas EoS, where the pressure is 1/3 of the energy density.

lattice-wb2014: The recent lattice QCD calculations from the Wuppertal-Budapest group, whose trace anomaly differs from the s95p lattice results by a large margin in the temperature range 180 − 320 MeV [81].

s95p-pce: The default s95p partial chemical equilibrium EoS [82] used in this paper is given by the lattice QCD EoS at high energy density and the hadronic resonance gas (HRG) EoS at low energy density, with a smooth crossover in between obtained by interpolation.

EOSQ: Employs a first order phase transition between QGP and HRG [83].

pure gauge: Pure gauge EoS with a first order phase transition given by gluodynamics without (anti)quarks [84–86].

IV. FREEZE-OUT AND PARTICLIZATION

We use the Cooper-Frye formula [87] to calculate the momentum distribution of particle $i$ on the freeze-out hypersurface,

$$\frac{dN_i}{dY\, p_T\, dp_T\, d\phi} = \frac{g_i}{(2\pi)^3}\int p^\mu d\Sigma_\mu f_{\rm eq}(1 + \delta f), \tag{55}$$

where $d\Sigma_\mu$ is a freeze-out hyper-surface element determined by the constant freeze-out temperature $T_{\rm frz}$ or the constant freeze-out energy density $\varepsilon_f$. Particles passing through the freeze-out hyper-surface elements are assumed to obey Fermi/Bose distributions at temperature $T_{\rm frz}$ with the non-equilibrium correction $\delta f$,

$$f_{\rm eq} = \frac{1}{\exp\left[(p\cdot u - \mu_i)/T_{\rm frz}\right] \pm 1}, \tag{56}$$

$$\delta f = (1 \mp f_{\rm eq})\frac{p_\mu p_\nu \pi^{\mu\nu}}{2T_{\rm frz}^2(\varepsilon + P)}, \tag{57}$$

where $\pm$ is for fermions/bosons, respectively, and $\mu_i$ is the effective chemical potential in the partial chemical equilibrium EoS that fixes the particle ratios when the temperature is below the chemical freeze-out temperature. $\mu_i$ is set to 0 for the chemical equilibrium EoS.

Two methods are used to compute the particle spectra on the freeze-out hyper-surface. The first method (called 'smooth') is to carry out the numerical integration over the freeze-out hyper-surface and obtain smooth particle spectra in $N_Y \times N_{p_T} \times N_\phi = 41 \times 15 \times 48$ tabulated $(Y, p_T, \phi)$ bins. $p_T$ and $\phi$ are chosen to be Gaussian quadrature nodes to simplify the calculation of $p_T$- or $\phi$-integrated spectra. Hadron spectra from resonance decays are also computed via integration. In practice, there are millions of small freeze-out hyper-surface elements $d\Sigma_\mu$, which make the spectra calculation quite CPU time consuming. This module is parallelized on GPU and the implementation details are described in the Appendix.

The second method for computing final hadron spectra is Monte Carlo sampling based on Eq. (55) (dubbed 'MC sampling'). This method is similar to Monte Carlo event generators, and the sampled particles can be redirected to hadron cascade models like UrQMD [88–90], JAM [91] and SMASH [92] to simulate hadronic rescattering and resonance decays. In the present work we do not employ a hadronic afterburner, but force the sampled resonances to decay to stable particles immediately after they are produced. This setup saves CPU time, allows for an efficient calculation of correlation observables and provides a baseline calculation for future more quantitative work including hadronic rescattering. By comparing with this baseline one can distinguish the effect of hadronic scattering from that of resonance decays only.

Since the particle number is Lorentz invariant, particles and their energy-momentum are sampled in the comoving frame of the fluid, and then boosted back to the collision frame via a Lorentz transformation with the fluid velocity $u^\mu$. This is possible if the proper weights are taken into account. The total number of hadrons produced from the freeze-out hyper-surface is $N = n \times u\cdot d\Sigma$, where $u\cdot d\Sigma$ is the invariant volume and $n = \sum_i n_i$ is the thermal density of all hadrons in the co-moving frame. For systems without bulk viscosity and net charge current (net baryon, net electric charge or net strangeness), the thermal density of hadron type $i$ is fixed for a given freeze-out temperature. In this case, the thermal densities $n_i$ for all hadron species are computed a priori and tabulated for efficiency. For systems with non-zero net charge current and bulk viscosity, the thermal densities are different for hyper-surface elements that have different net charge and bulk viscosity. In that case, the thermal density $n_i$ must be computed locally for each hyper-surface element, which is rather computing intensive and also demands parallelization on GPUs. The present Monte Carlo particlization obeys global conservation laws only on average over one ensemble of sampled events. If the code is used to compute net baryon fluctuations or charge correlations, one has to consider global conservation laws in each single event [93].

The thermal density $n_i$ in the co-moving frame is computed numerically by one-dimensional integration,

$$n_i = \frac{g_s}{2\pi^2}\int_0^{100T}\frac{p^2\,dp}{\exp\left[\left(\sqrt{p^2 + m_i^2} - \mu_i\right)/T\right] \pm 1}, \tag{58}$$

where $g_s$ is the spin degeneracy, $T$ is the temperature, $p$ is the momentum magnitude, $m_i$ is the mass of hadron type $i$, $\mu_i$ is the chemical potential, and $\pm 1$ is for baryons and mesons, respectively.
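The integral in Eq. (58) can be sketched with a simple midpoint rule. The function name `thermal_density`, the number of integration steps and the use of natural units are assumptions made for this illustration; the production code may use a different quadrature.

```python
import math

def thermal_density(mass, T, mu=0.0, g=1.0, sign=+1, n_steps=20000):
    """Midpoint-rule estimate of Eq. (58); sign=+1 for fermions, -1 for bosons.

    Natural units are assumed, so the result is in units of T^3.
    """
    p_max = 100.0 * T                      # upper cutoff used in Eq. (58)
    dp = p_max / n_steps
    total = 0.0
    for k in range(n_steps):
        p = (k + 0.5) * dp                 # midpoints avoid the p = 0 endpoint
        energy = math.sqrt(p * p + mass * mass)
        total += p * p / (math.exp((energy - mu) / T) + sign) * dp
    return g / (2.0 * math.pi ** 2) * total

# Massless bosons at mu = 0 have the closed form n = zeta(3) T^3 / pi^2.
print(thermal_density(0.0, 1.0, sign=-1), 1.2020569031595942 / math.pi ** 2)
```

The massless limit offers a convenient check, since the fermion density is exactly 3/4 of the boson density in that case.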

The total number of hadrons computed from one freeze-out hyper-surface element $d\Sigma_j$ is $\lambda_j = n\, u\cdot d\Sigma_j$, where $n = \sum_i n_i$ is the sum of the thermal densities over all hadrons. $\lambda_j$ is a very small real number that gives the mean number of hadrons produced from $d\Sigma_j$ in multiple independent samplings. The hadron multiplicity in the $j$th hyper-surface element is assumed to follow a Poisson distribution,

$$P_j(k) = e^{-\lambda_j}\frac{\lambda_j^k}{k!}, \tag{59}$$

where $k$ is an integer that indicates the hadron multiplicity in one sampling. We draw $k$ from this Poisson distribution and determine the particle type for each of these $k$ hadrons through a discrete distribution whose probabilities are given by $n_i/\sum_i n_i$.
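The two sampling steps just described, a Poisson draw for the multiplicity followed by a discrete draw for the species, can be sketched as follows. Knuth's multiplication method is used for the Poisson draw as an illustrative choice; the function names and the example densities are hypothetical and do not reflect the sampler described in the Appendix.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small lambda_j of one element."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def sample_element(densities, u_dot_dsigma, rng):
    """Return the species indices of hadrons emitted from one hyper-surface element."""
    lam = sum(densities) * u_dot_dsigma            # lambda_j = n * (u . dSigma_j)
    k = sample_poisson(lam, rng)                   # multiplicity from Eq. (59)
    # Species are chosen with probabilities n_i / sum_i n_i.
    return rng.choices(range(len(densities)), weights=densities, k=k)

rng = random.Random(7)
draws = [sample_element([3.0, 1.0], 0.0125, rng) for _ in range(100000)]
print(sum(len(d) for d in draws) / len(draws))     # mean multiplicity, close to lambda_j = 0.05
```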

Once the total number of hadrons and their species are determined for one hyper-surface element, the magnitude of their momenta in the local rest frame can be sampled. Since the total number of hadrons from the hyper-surface element is Lorentz invariant, one can compute $dN$ from,

$$\begin{aligned}
dN &= \frac{g_i}{(2\pi)^3}\int\frac{d^3p^*}{p^{*0}}\int p^{*\mu}d\Sigma^*_\mu f_0(1 + \delta f) \\
&= \frac{g_i}{2\pi^2}\int\!\!\int |p^*|^2 d|p^*|\, d\Sigma^*_0 f_0 \\
&= \frac{g_i}{2\pi^2}\int u^\mu d\Sigma_\mu \int d|p^*| \times |p^*|^2 f_0,
\end{aligned} \tag{60}$$

where we have used the properties that $p^{*i}$ is integrated over $(-\infty, \infty)$ for $i = (1, 2, 3)$ and that the integration of $\delta f$ (shear viscosity only) also vanishes. It is straightforward to sample the magnitude of the momentum $|p^*|$ from $|p^*|^2 f_0(|p^*|, \mu, T, \lambda)$, where $\mu$ is the chemical potential, $T$ is the freeze-out temperature and $\lambda = \pm 1$ for the Fermi-Dirac and Bose-Einstein distributions, respectively. See Sec. IX D for details.

Once $|p^*|$ is determined, $f_0$ and $p^{*0} = \sqrt{|p^*|^2 + m^2}$ can be treated as constants when sampling the direction of the momentum in the co-moving frame. The momentum directions are determined by rejection sampling with acceptance rates $r_{\rm ideal}$ and $r_{\rm visc}$, where

$$r_{\rm ideal} = \frac{p^*\cdot d\Sigma^*}{p^{*0}\left(d\Sigma^*_0 + \sqrt{|d\mathbf{\Sigma}^*|^2}\right)} \leq 1, \tag{61}$$

with $p^* = (p^{*0}, |p^*|\sin\theta\cos\phi, |p^*|\sin\theta\sin\phi, |p^*|\cos\theta)$ the four-momentum determined by $|p^*|$, the hadron mass, the polar angle $\theta$ and the azimuthal angle $\phi$. $d\Sigma^*$ is the hyper-surface element in the co-moving frame.

For viscous hydrodynamics, there is an additional acceptance rate that depends on the direction of the momentum,

$$r_{\rm visc} = \frac{A + (1 \mp f_0)\,p^*_\mu p^*_\nu \pi^{*\mu\nu}}{A + |1 \mp f_0| \times |p^*_\mu p^*_\nu \pi^{*\mu\nu}|_{\max}}, \tag{62}$$

where $A = 2T^2(\varepsilon + P)$ is positive on the freeze-out hyper-surface. Since $p^{*0}$ and $f_0$ are constants for a given $|p^*|$, the easiest way to get $|p^*_\mu p^*_\nu \pi^{*\mu\nu}|_{\max}$ is as follows,

$$|p^*_\mu p^*_\nu \pi^{*\mu\nu}| \leq \sum_{\mu\nu}|p^*_\mu p^*_\nu \pi^{*\mu\nu}| \leq (p^{*0})^2\sum_{\mu\nu}|\pi^{*\mu\nu}|. \tag{63}$$
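The rejection step with the acceptance rate of Eq. (61) can be sketched as below for the ideal case. This is a minimal illustration under explicit assumptions: `dsigma` holds the covariant components $(d\Sigma^*_0, d\Sigma^*_1, d\Sigma^*_2, d\Sigma^*_3)$ in the co-moving frame so that $p^*\cdot d\Sigma^* = \sum_\mu p^{*\mu} d\Sigma^*_\mu$, the normalization $d\Sigma^*_0 + |d\mathbf{\Sigma}^*|$ is positive, and the function name and the retry limit are hypothetical.

```python
import math
import random

def sample_direction(p_abs, mass, dsigma, rng, max_tries=10000):
    """Sample an isotropic direction and accept it with probability r_ideal, Eq. (61)."""
    p0 = math.sqrt(p_abs ** 2 + mass ** 2)
    ds_abs = math.sqrt(dsigma[1] ** 2 + dsigma[2] ** 2 + dsigma[3] ** 2)
    norm = p0 * (dsigma[0] + ds_abs)               # upper bound of p . dSigma
    for _ in range(max_tries):
        cos_t = rng.uniform(-1.0, 1.0)             # isotropic polar angle
        phi = rng.uniform(0.0, 2.0 * math.pi)
        sin_t = math.sqrt(1.0 - cos_t ** 2)
        p = (p0,
             p_abs * sin_t * math.cos(phi),
             p_abs * sin_t * math.sin(phi),
             p_abs * cos_t)
        r_ideal = sum(pm * ds for pm, ds in zip(p, dsigma)) / norm
        if rng.random() < r_ideal:                 # negative r_ideal is always rejected
            return p
    return None

rng = random.Random(1)
print(sample_direction(0.5, 0.14, (1.0, 0.2, 0.0, 0.1), rng))
```

For viscous hydrodynamics a second test against $r_{\rm visc}$ of Eq. (62), with the bound of Eq. (63) in the denominator, would follow the same pattern.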

One problem in the smooth resonance decay is that the numerical integrations over the phase space of parent hadrons are difficult to verify. The Monte Carlo sampling and decay program, however, can be tested easily. Given the freeze-out temperature, the thermal density of each hadron species before resonance decay is easily computed from the numerical integration shown in Eq. (58). Given the density of each hadron and the tree structure in the decay table, one can compute the ratio of the $\pi^+$ density before and after resonance decay. We have verified that results from Monte Carlo sampling and decay agree with the analytical solution. It is then straightforward to check the accuracy of the GPU parallelized smooth spectra and resonance decay via integration by comparing the particle yield and transverse momentum distribution with the Monte Carlo sampling and forced decay method.

As shown in Fig. 2 and Fig. 3, the yields and the momentum distributions of charged and identified particles from the Monte Carlo sampling agree with the smooth particle spectra obtained via integration of the Cooper-Frye formula. These hydrodynamic simulations use the optical Glauber initial condition with impact parameter $b = 2.4$ fm, initial time $\tau_0 = 0.4$ fm, maximum energy density at the center of the overlap region $\varepsilon_{\max} = 55$ GeV/fm$^3$, $\eta_v/s = 0$ and the lattice QCD EoS (lattice-wb2014) based on the Wuppertal-Budapest 2014 results.

[Figure 2 plot: $dN/d\eta$ versus $\eta$ for charged hadrons, $\pi^+$, $K^+$ and $p$; curves: smooth, MC sampling.]

Figure 2. (color online) Pseudo-rapidity distributions for charged hadrons and identified particles $\pi^+$, $K^+$ and proton from smooth particle spectra (black solid line) with integral resonance decay and Monte Carlo sampling (red dashed line) with forced resonance decay. The hydrodynamic evolution is given by CLVisc with optical Glauber initial condition at impact parameter $b = 2.4$ fm, with initial time $\tau_0 = 0.4$ fm, the maximum energy density in most central collisions $\varepsilon_{\max} = 55$ GeV/fm$^3$ and lattice QCD EoS from the Wuppertal-Budapest 2014 computation.

[Figure 3 plot: $dN/(2\pi\, dY\, p_T\, dp_T)$ [GeV$^{-2}$] versus $p_T$ [GeV] for $\pi^+$, $K^+$ and $p$; curves: smooth, MC sampling.]

Figure 3. (color online) The transverse momentum distribution for identified particles $\pi^+$, $K^+$ and proton from smooth particle spectra (black solid line) with integral resonance decay and Monte Carlo sampling (red dashed line) with forced resonance decay. The hydrodynamic evolution is the same as in Fig. 2.

V. COMPARISONS WITH ANALYTICAL SOLUTIONS AND OTHER NUMERICAL SOLUTIONS

To ensure the numerical accuracy of the GPU parallelized CLVisc code, we validate it by comparing the numerical results with both analytical solutions of the hydrodynamic equations and numerical solutions from other independently developed codes.

For the first validation, analytical solutions are based on simple assumptions. The Bjorken solution, for example, assumes that the energy density distribution is uniform in $(x, y, \eta_s)$ coordinates. Under this assumption, the pressure gradients along $x$, $y$ and $\eta_s$ vanish, the fluid velocity satisfies $v_x = v_y = v_{\eta_s} = 0$, and all the nonvanishing terms that affect the time evolution in the hydrodynamic equations come from nonzero Christoffel symbols. This solution can therefore be used to check whether the Christoffel symbols are correctly implemented and to quantify numerical errors accumulated during many time steps of evolution. On the other hand, this solution cannot be used to check the accuracy of spatial derivatives.

The cross check between different codes on the otherhand works for arbitrary initial configurations. However,comparisons of numerical results from different codeswith the same initial configurations, cannot directly vali-date one model over the other or judge which implemen-tation results in smaller numerical errors. Below we willcompare results from CLVisc with the Riemann, Bjorkenand Gubser solution for 2nd order viscous hydrodynamicsand the viscous hydrodynamic code VISH2+1 developedby the Ohio State University (OSU) group.

A. Riemann solution

The Riemann solution considers fluid expansion with a step-like initial energy density distribution. It tests the performance of the numerical hydrodynamic simulations in regions with sharp gradients (e.g. the shock wave front) [94–96]. The initial condition is specified as

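$$\varepsilon(t = 0, z) = \begin{cases} \varepsilon_0, & z \le 0 \\ 0, & z > 0 \end{cases} \tag{64}$$

$$v_z(t = 0, z) = \begin{cases} 0, & z \le 0 \\ 1, & z > 0 \end{cases} \tag{65}$$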

where the initial fluid velocity at z > 0 is set to 1. In relativistic hydrodynamics, the Riemann solution describes how the QGP expands into vacuum. In the non-relativistic case, the Riemann solution is used to study dam breaking. The solution is a function of the similarity variable ζ ≡ z/t. Because of causality, nothing changes in the |ζ| > 1 region. For −1 < ζ < 1, the solution is a simple rarefaction wave which is given by [97],

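$$\frac{\varepsilon(\zeta)}{\varepsilon_0} = \begin{cases} 1, & -1 \le \zeta \le -c_s \\ \left[\dfrac{1 - c_s}{1 + c_s}\,\dfrac{1 - \zeta}{1 + \zeta}\right]^{(1 + c_s^2)/2c_s}, & -c_s \le \zeta \le 1 \end{cases} \tag{66}$$

$$v_z(\zeta) = \tanh\left[-\frac{c_s}{1 + c_s^2}\ln\left(\frac{\varepsilon}{\varepsilon_0}\right)\right]. \tag{67}$$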
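The rarefaction wave of Eqs. (66) and (67) is simple enough to evaluate directly, which is convenient when preparing reference curves for a code comparison. The following is a minimal sketch in Python, not the CLVisc implementation; the function name `riemann_solution` and its arguments are illustrative choices made here.

```python
import math


def riemann_solution(zeta, cs2=1.0 / 3.0):
    """Return (eps/eps0, v_z) at the similarity variable zeta = z/t.

    Evaluates Eqs. (66) and (67) for -1 <= zeta <= 1.
    """
    if not -1.0 <= zeta <= 1.0:
        raise ValueError("zeta = z/t must lie inside the light cone [-1, 1]")
    cs = math.sqrt(cs2)
    if zeta <= -cs:
        # The rarefaction wave has not arrived yet: undisturbed fluid at rest.
        return 1.0, 0.0
    if zeta == 1.0:
        # Vacuum edge: the energy density vanishes and v_z reaches 1.
        return 0.0, 1.0
    base = (1.0 - cs) / (1.0 + cs) * (1.0 - zeta) / (1.0 + zeta)
    eps_ratio = base ** ((1.0 + cs2) / (2.0 * cs))
    v_z = math.tanh(-cs / (1.0 + cs2) * math.log(eps_ratio))
    return eps_ratio, v_z


if __name__ == "__main__":
    t = 4.0  # fm
    for z in (-4.0, -2.0, 0.0, 2.0, 3.9):
        eps_ratio, v_z = riemann_solution(z / t)
        print(f"z = {z:5.1f} fm: eps/eps0 = {eps_ratio:.4f}, v_z = {v_z:.4f}")
```

Inserting Eq. (66) into Eq. (67) gives v_z = (ζ + c_s)/(1 + ζ c_s) inside the wave, i.e. the relativistic addition of ζ and the sound speed, which provides a quick consistency check of the two formulas.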

[Figure 4: ε/ε0 versus z [fm] at t = 0, 2, 4 and 8; legend: Riemann, CLVisc.]

Figure 4. (color online) The comparison between CLVisc and the Riemann solution for the energy density evolution as a function of time.

[Figure 5: vz versus z [fm] at t = 0, 2, 4 and 8; legend: Riemann, CLVisc.]

Figure 5. (color online) The comparison between CLVisc and the Riemann solution for the fluid velocity evolution as a function of time.

To compare to the Riemann solution, the ideal gas EoS (EOSI), for which the speed of sound squared is c_s^2 = 1/3, is used in CLVisc simulations. All the Christoffel symbols are set to 0 to return to (t, x, y, z) coordinates. The energy density is set to constant in the transverse direction. CLVisc solves the Riemann problem precisely for the energy density evolution as shown in Fig. 4. For the fluid velocity profile, there is a quick drop-off in the light cone region (z = t) which is caused by the numerical cutoff used in the simulations. In high-energy heavy-ion collisions, an energy density cut-off ε = 10^-7 GeV/fm3 is reasonably safe compared with the typical freeze-out energy density ε ∼ 0.1 GeV/fm3, at which the hydrodynamic evolution stops. The physics processes in such low energy density regions around and after freeze-out should be described by hadronic transport models instead of hydrodynamics. By setting ε = 0 when the energy density is smaller than the cutoff, an artificial shock wave is formed at the edge of the expanding fireball. The Riemann solution test verifies that this artificial cutoff does not lead to a sizable difference in the region where we apply hydrodynamics.

B. Bjorken solution

The Bjorken solution assumes a uniform distribution in the transverse direction and in spatial rapidity ηs in Milne coordinates, which gives rise to vx = vy = vηs = 0. This solution, derived in [98], is used extensively to model the longitudinal expansion dynamics in high-energy heavy-ion collisions, where a plateau in the rapidity profile is observed in final state particle spectra. It is applied in otherwise 2+1 dimensional hydrodynamic models or in analytic calculations. However, the energy density still decreases with time due to the nonzero longitudinal fluid velocity vz = z/t in (t, x, y, z) coordinates. The nonzero components of the shear stress tensor are $2\pi^{xx} = 2\pi^{yy} = -\tau^2\pi^{\eta_s\eta_s} = 4\eta_v/(3\tau)$. With all the spatial gradients vanishing under this assumption, the hydrodynamic equations are simplified to,

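$$\frac{\partial\varepsilon}{\partial\tau} + \frac{\varepsilon + P + \tau^2\pi^{\eta_s\eta_s}}{\tau} = 0 \tag{68}$$

For the ideal gas EoS where ε = 3P and T ∝ ε^{1/4}, we have the solution,

$$\frac{T}{T_0} = \left(\frac{\tau_0}{\tau}\right)^{1/3}\left[1 + \frac{2\eta_v}{3 s T_0 \tau_0}\left(1 - \left(\frac{\tau_0}{\tau}\right)^{2/3}\right)\right], \tag{69}$$

where T and T0 are the temperatures at proper time τ and τ0, respectively. Shown in Fig. 6 is the numerical solution from CLVisc (solid) compared to the above Bjorken analytic solution with the same initial temperature, initial time and shear viscosity to entropy density ratio.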
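A direct numerical check of the analytic Bjorken solution can be sketched as follows. The sketch assumes the ideal gas relations ε = 3P and s = 4ε/(3T) together with the Navier-Stokes value τ²π^{ηsηs} = −4ηv/(3τ), for which Eq. (68) reduces to dT/dτ = −T/(3τ) + (4/9)(ηv/s)/τ², and it reads the prefactor in Eq. (69) as 2ηv/(3 s T0 τ0). The Runge-Kutta integrator and all function names are illustrative and are not part of CLVisc.

```python
import math

HBARC = 0.19733  # GeV fm, converts temperatures in GeV to fm^-1


def bjorken_temperature(tau, tau0, temp0, eta_over_s):
    """Analytic T(tau) of Eq. (69); temperatures in fm^-1, times in fm."""
    x = tau0 / tau
    correction = 2.0 * eta_over_s / (3.0 * temp0 * tau0) * (1.0 - x ** (2.0 / 3.0))
    return temp0 * x ** (1.0 / 3.0) * (1.0 + correction)


def dtemp_dtau(tau, temp, eta_over_s):
    """Eq. (68) rewritten for T under the assumptions stated in the lead-in."""
    return -temp / (3.0 * tau) + 4.0 * eta_over_s / (9.0 * tau ** 2)


def integrate_temperature(tau0, temp0, eta_over_s, tau_end, n_steps=4000):
    """Fourth-order Runge-Kutta integration of dT/dtau from tau0 to tau_end."""
    h = (tau_end - tau0) / n_steps
    tau, temp = tau0, temp0
    for _ in range(n_steps):
        k1 = dtemp_dtau(tau, temp, eta_over_s)
        k2 = dtemp_dtau(tau + 0.5 * h, temp + 0.5 * h * k1, eta_over_s)
        k3 = dtemp_dtau(tau + 0.5 * h, temp + 0.5 * h * k2, eta_over_s)
        k4 = dtemp_dtau(tau + h, temp + h * k3, eta_over_s)
        temp += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        tau += h
    return temp


if __name__ == "__main__":
    tau0, temp0, eta_over_s = 0.6, 0.36 / HBARC, 0.08  # values quoted in Fig. 6
    for tau in (1.0, 4.0, 12.0):
        numeric = integrate_temperature(tau0, temp0, eta_over_s, tau)
        analytic = bjorken_temperature(tau, tau0, temp0, eta_over_s)
        print(f"tau = {tau:4.1f} fm: T/T0 numeric = {numeric / temp0:.6f}, "
              f"analytic = {analytic / temp0:.6f}")
```

For ηv/s = 0 the correction term vanishes and the ideal Bjorken scaling T ∝ τ^(-1/3) is recovered.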

C. Gubser solution for 2nd order viscous hydrodynamics

[Figure 6: T/T0 versus τ − τ0 [fm] for τ0 = 0.6 fm, T0 = 0.36 GeV and ηv/s = 0.08; legend: CLVisc, Bjorken.]

Figure 6. (color online) The comparison between CLVisc and the Bjorken solution for viscous hydrodynamics.

The Bjorken solution assumes a homogeneous distribution of energy density in (τ, x, y, ηs) coordinates at any given time τ, which leads to uµ = (1, 0, 0, 0). This solution, however, gives rise to a nonzero longitudinal fluid velocity vz = z/t when transformed back to (t, x, y, z) coordinates. The same philosophy is used in the Gubser solution for the 2nd order viscous hydrodynamics [58], where we perform a conformal/Weyl transformation to the coordinate system following Gubser [99],

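$$d\hat{s}^2 \equiv \frac{ds^2}{\tau^2} = d\rho^2 - \cosh^2\rho\,(d\theta^2 + \sin^2\theta\, d\phi^2) - d\eta_s^2, \tag{70}$$

which indicates that the Minkowski space is conformal to dS3 × R with,

$$\sinh\rho = -\frac{L^2 - \tau^2 + x_\perp^2}{2L\tau}, \qquad \tan\theta = \frac{2Lx_\perp}{L^2 + \tau^2 - x_\perp^2}, \tag{71}$$

where L can be interpreted as the radius of the dS3 space or the typical size of a relativistic heavy-ion collision. Hereafter in this section, dynamical variables in the new coordinate system $\hat{x}^\mu = (\rho, \theta, \phi, \eta_s)$ will carry a hat to avoid confusion. Assuming the energy density distribution is uniform in the $\hat{x}^\mu$ coordinates, one simply gets $\hat{u}^\mu = (1, 0, 0, 0)$. When $\hat{\eta}_v\hat{\lambda}_1^2 = 3\hat{\tau}_\pi$, we find a very simple analytical solution,

$$\hat{\varepsilon} \propto \left(\frac{1}{\cosh\rho}\right)^{\frac{8}{3} - \frac{2}{\hat{\lambda}_1}}, \qquad \hat{u}^\mu = (1, 0, 0, 0), \tag{72}$$

$$C = -2A = -2B = \frac{2}{\hat{\lambda}_1}\hat{\varepsilon}, \tag{73}$$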

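where $C \equiv \hat{\pi}^{\eta_s\eta_s}$, $A \equiv \hat{\pi}^{\theta\theta}\cosh^2\rho$ and $B \equiv \hat{\pi}^{\phi\phi}\cosh^2\rho\,\sin^2\theta$. After Weyl rescaling, we can get back to the (τ, x, y, ηs) space and obtain,

$$\varepsilon = \frac{\hat{\varepsilon}}{\tau^4}, \tag{74}$$

$$\vec{v}_\perp = \frac{-2\tau\vec{x}_\perp}{L^2 + \tau^2 + x_\perp^2}, \tag{75}$$

$$\pi_{\mu\nu} = \frac{1}{\tau^2}\frac{\partial\hat{x}^\alpha}{\partial x^\mu}\frac{\partial\hat{x}^\beta}{\partial x^\nu}\hat{\pi}_{\alpha\beta}. \tag{76}$$

[Figure 7: ε [fm^-4] versus x [fm] at τ = 1.0, 2.0, 4.0 and 6.0 fm for L = 2, ηv/s = 0.2 and λ1 = −10.0; legend: CLVisc, Gubser.]

Figure 7. (color online) The time evolution of the energy density distribution from CLVisc numerical results (solid) and the Gubser analytical solution (dashed) for 2nd order viscous hydrodynamics.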

Notice that the dimensionless transport coefficients are defined as $\hat{\eta}_v = \eta_v/\varepsilon^{3/4}$, $\hat{\tau}_\pi = \tau_\pi\varepsilon^{1/4}$, $\hat{\lambda}_1 = \lambda_1\varepsilon$. The conditional solution is nontrivial since there are three different transport coefficients and many non-vanishing πµν components. Since the energy density distribution is not uniform in the transverse plane of (τ, x, y, ηs) coordinates, the spatial gradients along x and y are nontrivial. This solution is therefore well suited to verifying the numerical capability of any 2nd order viscous hydrodynamics code.

The parameters we used for the comparison in this section are L = 2, ηv/s = 0.2 and λ1 = −10. The relaxation time τπ is calculated from the constraint equation $\hat{\eta}_v\hat{\lambda}_1^2 = 3\hat{\tau}_\pi$. Notice that we can still cover the whole parameter space for ηv/s and λ1, to investigate the stability of the code in different limits. In practice, $\hat{\lambda}_1 = \hat{\varepsilon}/\hat{\pi}^{\mu\nu} \gg 1$ is required for consistency and stability. When λ1 → ∞, the hydrodynamic equations recover the ideal fluid solution. As shown in Figs. 7 and 8, with λ1 = −10, CLVisc reproduces very accurately the energy density and transverse fluid velocity evolution given by the Gubser solution. Another interesting property of this 2nd order Gubser solution is that the fluid velocity is the same as that for ideal hydrodynamics, since it is fixed by the conformal transformation.
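The mapping of Eq. (71) and the energy density of Eqs. (72) and (74) can be evaluated in a few lines, which is useful for generating reference profiles such as those in Fig. 7. The sketch below is only an illustration under stated assumptions: the function names and the normalization `eps_hat0` are hypothetical, the branch of θ is fixed with `atan2`, and none of this is taken from CLVisc or from the gubser package.

```python
import math


def gubser_rho(tau, x_perp, L):
    """de Sitter time rho of Eq. (71)."""
    return math.asinh(-(L * L - tau * tau + x_perp * x_perp) / (2.0 * L * tau))


def gubser_theta(tau, x_perp, L):
    """Polar angle theta of Eq. (71); atan2 selects theta in [0, pi] for x_perp >= 0."""
    return math.atan2(2.0 * L * x_perp, L * L + tau * tau - x_perp * x_perp)


def gubser_energy_density(tau, x_perp, L=2.0, lambda1=-10.0, eps_hat0=1.0):
    """Energy density from Eqs. (72) and (74).

    eps_hat0 is an arbitrary normalization (an assumption of this sketch),
    since Eq. (72) fixes eps_hat only up to a constant factor.
    """
    rho = gubser_rho(tau, x_perp, L)
    exponent = 8.0 / 3.0 - 2.0 / lambda1
    eps_hat = eps_hat0 * (1.0 / math.cosh(rho)) ** exponent
    return eps_hat / tau ** 4


if __name__ == "__main__":
    for tau in (1.0, 2.0, 4.0, 6.0):
        profile = [gubser_energy_density(tau, x) for x in (0.0, 2.0, 4.0, 6.0)]
        print(f"tau = {tau:.1f} fm:", ", ".join(f"{e:.3e}" for e in profile))
```

For |λ1| → ∞ the exponent reduces to 8/3, which is the ideal-fluid limit mentioned above.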

[Figure 8: transverse fluid velocity versus x [fm] at τ = 1.0, 2.0, 4.0 and 6.0 fm for L = 2, ηv/s = 0.2 and λ1 = −10.0; legend: CLVisc, Gubser.]

Figure 8. (color online) The time evolution of the transverse fluid velocity from CLVisc numerical results (solid) and the Gubser analytical solution (dashed) for 2nd order viscous hydrodynamics.

[Figure 9: πxx [fm^-4] versus x [fm] at τ = 1.0, 2.0, 4.0 and 6.0 fm for L = 2, ηv/s = 0.2 and λ1 = −10.0; legend: CLVisc, Gubser.]

Figure 9. (color online) The time evolution of πxx from CLVisc numerical results and the Gubser analytical solution for 2nd order viscous hydrodynamics.

In principle λ1 can be either positive or negative. In heavy-ion collisions, one gets negative πηsηs in Bjorken scaling. Therefore we choose a negative λ1 to obtain positive πxx and πyy and negative πηsηs. As a result, −τ²πηsηs is roughly two times πxx and πyy, which preserves the traceless property together with a small but nonzero πττ in this solution.

As shown in Figs. 9 and 10, there are tiny deviations between the analytical solution and the CLVisc relativistic hydrodynamic simulations on the shoulders (x = ±6) of πxx and −τ²πηsηs at a late time τ = 6 fm. It is expected that the deviation could be larger at even later times due to the accumulated numerical error. At present, this tiny deviation is acceptable since the energy density drops much faster in Gubser expansion than in Bjorken expansion or in realistic time evolutions of the QGP in heavy-ion collisions.

[Figure 10: −τ²πηsηs [fm^-4] versus x [fm] at τ = 1.0, 2.0, 4.0 and 6.0 fm for L = 2, ηv/s = 0.2 and λ1 = −10.0; legend: CLVisc, Gubser.]

Figure 10. (color online) The time evolution of −τ²πηsηs from CLVisc numerical results and the Gubser analytical solution for 2nd order viscous hydrodynamics.

We have collected these analytical solutions and put them in a Python package gubser. The package is uploaded to the Python Package Index website, and can be downloaded and installed on a local machine using pip install --user gubser. More analytical solutions [100–113] from the community are welcome to be added to the package.

D. Comparison with VISH2+1

We now compare the numerical solutions from CLVisc with the VISH2+1 viscous hydrodynamic model developed by the OSU group, which is a (2+1)D viscous hydrodynamic model assuming Bjorken scaling in the longitudinal direction. The configurations and hydrodynamic results from VISH2+1 can be found on the TechQM website https://wiki.bnl.gov/TECHQM/index.php/Momentum_anisotropies. We use the same initial conditions and model parameters in the simulations for comparison. Shown in Fig. 11 are results for the pT differential elliptic flow v2, in Fig. 12 the mean transverse fluid velocity ⟨vr⟩ and in Fig. 13 the momentum eccentricity from CLVisc (symbol points) as compared to results from VISH2+1 viscous hydro (lines), for Au+Au collisions at √sNN = 200 GeV at impact parameter b = 7 fm with the optical Glauber initial condition. They agree with each other to a reasonable precision.

From this extensive comparison to available analytical solutions and other numerical solutions of relativistic hydrodynamics, we conclude that CLVisc is performing competitively well.

[Figure 11: v2 versus pT [GeV]; legend: CLVisc and VISH2+1 for ηv/s = 0.0, for ηv/s = 0.08 without δf, and for ηv/s = 0.08.]

Figure 11. (color online) Comparison between CLVisc (symbol points) and VISH2+1 (lines) results for elliptic flow of direct π+ in Au+Au collisions at √sNN = 200 GeV with the optical Glauber initial condition at impact parameter b = 7 fm and with different values of the shear viscosity to entropy density ratio. Results without the viscous correction δf to the local phase-space distributions [Eq. (57)] are also shown.

[Figure 12: ⟨vr⟩ versus time [fm]; legend: CLVisc and VISH2+1 for ηv/s = 0.08 and ηv/s = 0.0.]

Figure 12. (color online) Comparison between CLVisc (symbol points) and VISH2+1 (lines) results for the mean transverse fluid velocity ⟨vr⟩ in Au+Au collisions at √sNN = 200 GeV with the optical Glauber initial condition at impact parameter b = 7 fm and with different values of the shear viscosity to entropy density ratio.


[Figure 13: momentum eccentricity versus time [fm]; legend: CLVisc and VISH2+1 for ηv/s = 0.0 and ηv/s = 0.08.]

Figure 13. (color online) Comparison between CLVisc (symbol points) and VISH2+1 (lines) results for the momentum eccentricity in Au+Au collisions at √sNN = 200 GeV with the optical Glauber initial condition at impact parameter b = 7 fm and with different values of the shear viscosity to entropy density ratio.

VI. HADRON SPECTRA AND ANISOTROPIC FLOW

In this section, we compare CLVisc results for hadron spectra and anisotropic flow in heavy-ion collisions to experimental data at both RHIC and LHC energies. We use the Trento Monte Carlo model with the default option of the IP-Glasma approximator for fluctuating initial conditions in event-by-event hydrodynamic simulations. Since the public version of CLVisc uses Trento as the default initial state configuration, the results in this section provide a reference baseline for future users as well as for further calculations within CLVisc. The Trento Monte Carlo model assumes fluctuations in the transverse plane with a spatial-rapidity-dependent envelope in the longitudinal direction. Therefore, we switch to AMPT initial conditions, which also include longitudinal initial dynamics, for the later sections of this manuscript. The centrality range is determined by the event-by-event distributions of the total entropy. Initial conditions with the top 5% highest total entropies are chosen as 0−5% collisions and so on. The partial chemical equilibrium EoS s95p-pce [82] is used in the hydrodynamic simulations. The other model parameters for Au+Au collisions at √sNN = 200 GeV and Pb+Pb collisions at √sNN = 2.76 TeV and √sNN = 5.02 TeV are listed in Tab. I, where ηw and ση are used to parameterize the initial state longitudinal profile using the following function

$$H(\eta_s) = \exp\left[-\frac{(\eta_s - \eta_w)^2}{2\sigma_\eta^2}\,\theta(\eta_s - \eta_w)\right] \tag{77}$$

system            τ0 [fm]   norm   Tf [MeV]   ηv/s   ηw    ση
Au+Au 200 GeV     0.6       57     100-137    0.15   1.3   1.5
Pb+Pb 2760 GeV    0.6       128    100-137    0.15   2.0   1.8
Pb+Pb 5020 GeV    0.6       151    100-137    0.15   2.2   1.8

Table I. Default parameters for event-by-event hydrodynamics using Trento initial conditions. The normalization is fitted to the hadron multiplicity in the central rapidity region in the most central heavy-ion collisions.
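The longitudinal envelope of Eq. (77) with the Table I parameters can be sketched as below. One assumption is made loudly here: by default the sketch applies the formula to |ηs| so that the plateau of half-width ηw is symmetric about mid-rapidity; passing `symmetric=False` evaluates Eq. (77) literally as written. The dictionary `TABLE_I` and the function name are illustrative and are not CLVisc or Trento identifiers.

```python
import math

# (eta_w, sigma_eta) copied from Table I; the keys are illustrative labels.
TABLE_I = {
    "Au+Au 200 GeV": (1.3, 1.5),
    "Pb+Pb 2760 GeV": (2.0, 1.8),
    "Pb+Pb 5020 GeV": (2.2, 1.8),
}


def longitudinal_envelope(eta_s, eta_w, sigma_eta, symmetric=True):
    """Plateau of half-width eta_w with Gaussian fall-off of width sigma_eta.

    symmetric=True applies Eq. (77) to |eta_s| (an assumption of this sketch);
    symmetric=False evaluates Eq. (77) exactly as printed.
    """
    x = abs(eta_s) if symmetric else eta_s
    if x <= eta_w:
        return 1.0  # the step function theta(x - eta_w) vanishes on the plateau
    return math.exp(-((x - eta_w) ** 2) / (2.0 * sigma_eta ** 2))


if __name__ == "__main__":
    eta_w, sigma_eta = TABLE_I["Pb+Pb 2760 GeV"]
    for eta_s in (-6.0, -3.0, 0.0, 3.0, 6.0):
        print(f"eta_s = {eta_s:5.1f}: H = {longitudinal_envelope(eta_s, eta_w, sigma_eta):.4f}")
```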

A. Au+Au at √sNN = 200 GeV collisions

Shown in Figs. 14 and 15 are the pseudo-rapidity distributions for charged hadrons and the transverse momentum spectra for identified particles π+. We focus on pion transverse momentum spectra in this section since for pure relativistic hydrodynamic results without considering a hadronic after-burner, the transverse momentum spectra of kaons and protons are not expected to agree with experimental data.

We use a constant ηv/s in the current CLVisc simulations. It has been shown that the linear relationship between initial entropy and final charged multiplicity breaks down in viscous hydrodynamics with a temperature-dependent ηv/s [23]. In future studies using Bayesian analysis with temperature-dependent ηv/s, the centrality classes should be defined by the final state multiplicities after hydrodynamic evolution.
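The entropy-based centrality selection used in this section (the top 5% of initial conditions in total entropy form the 0−5% class, and so on) can be sketched as follows. The function name, the percentile edges and the integer rounding of class boundaries are illustrative choices of this sketch, not CLVisc code; the same routine could be applied to final state multiplicities instead of initial entropies.

```python
def centrality_classes(total_entropies, edges=(0, 5, 10, 20, 30)):
    """Group event indices into centrality classes by decreasing total entropy.

    Returns a dict mapping (low, high) percentile edges to lists of event
    indices; class boundaries are rounded down to whole events.
    """
    n_events = len(total_entropies)
    # Event indices ordered from the highest to the lowest total entropy.
    order = sorted(range(n_events), key=lambda i: total_entropies[i], reverse=True)
    classes = {}
    for low, high in zip(edges[:-1], edges[1:]):
        start = (low * n_events) // 100
        stop = (high * n_events) // 100
        classes[(low, high)] = order[start:stop]
    return classes


if __name__ == "__main__":
    # Toy entropies: event i carries total entropy i (arbitrary units).
    toy_entropies = list(range(100))
    for edges, events in centrality_classes(toy_entropies).items():
        print(edges, events[:3], "...", len(events), "events")
```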

Notice that the pseudo-rapidity distributions for charged hadrons do not change much if the freeze-out temperature Tfrz changes from 137 MeV to 100 MeV in CLVisc with the partial chemical equilibrium EoS and the same set of τ0, normalization factor and ηv/s. However, the slope of the pion transverse momentum spectra becomes slightly steeper and describes low pT experimental data better with Tfrz = 100 MeV than with Tfrz = 137 MeV. At the same time, the pT differential anisotropic flow increases by approximately 10% when Tfrz is decreased from 137 MeV to 100 MeV, which agrees with the observation in [114]. In order to get the best global fit to many different observables, a Bayesian analysis [51–53] has to be employed to explore the huge parameter space. Mini-jets and their thermalization will also play a role in the transverse momentum spectra at high pT > 2 GeV/c.

B. Pb+Pb at √sNN = 2760 GeV collisions

Shown in Fig. 16 are pseudo-rapidity distributions for charged hadrons in Pb+Pb collisions at √sNN = 2.76 TeV for 4 different centralities: 0−5%, 5−10%, 10−20% and 20−30%. The centrality dependence of the event-averaged charged multiplicity is determined by event-by-event distributions of initial total entropy. A freeze-out temperature of Tfrz = 100 MeV is used in the CLVisc simulations. Nice agreement with experimental data on the pseudo-rapidity distribution of charged particles is found over a wide range of centralities.

[Figure 14: dNch/dη versus η for centralities 0-6, 6-15, 15-25 and 25-35 in Au+Au collisions at √sNN = 200 GeV; legend: CLVisc Tfrz = 100 MeV, CLVisc Tfrz = 137 MeV, PHOBOS.]

Figure 14. (color online) Pseudo-rapidity distribution for charged hadrons in Au+Au collisions at √sNN = 200 GeV with centrality range 0−6%, 6−15%, 15−25% and 25−35%, from CLVisc with freeze-out temperature 100 MeV (solid lines) and 137 MeV (dashed lines) as compared with RHIC experimental data by the PHOBOS collaboration [115].

[Figure 15: (1/2π) d²N/(dY pT dpT) [GeV^-2] versus pT [GeV] for π+ in Au+Au collisions at √sNN = 200 GeV; centralities 0-5 (×1.0), 10-15 (×0.2), 20-30 (×0.04) and 30-40 (×0.008); legend: CLVisc Tfrz = 100 MeV, CLVisc Tfrz = 137 MeV, PHENIX.]

Figure 15. (color online) Invariant yield of π+ in Au+Au collisions at √sNN = 200 GeV with centrality range 0−5%, 10−15%, 20−30% and 30−40%, from CLVisc with freeze-out temperature 100 MeV (solid lines) and 137 MeV (dashed lines) as compared with RHIC experimental data by the PHENIX collaboration.

Shown in Fig. 17 are the transverse momentum spectra for charged pions in 6 different centrality classes, which agree well with experimental data. The hydrodynamic simulations always underestimate low pT pions as compared to the experimental data at LHC. This problem has not been solved to date, but may be partially explained by the missing finite widths of resonances [117] in the current hadronization modules.

[Figure 16: dNch/dη versus η for centralities 0-5, 5-10, 10-20 and 20-30 in Pb+Pb collisions at √sNN = 2.76 TeV; legend: CLVisc, ALICE.]

Figure 16. (color online) Pseudo-rapidity distribution for charged hadrons in Pb+Pb collisions at √sNN = 2.76 TeV with centrality range 0−5%, 5−10%, 10−20% and 20−30%, from CLVisc (solid lines) and LHC experimental data by the ALICE collaboration [115].

[Figure 17: (1/2π) d²N/(dY pT dpT) [GeV^-2] versus pT [GeV] for charged pions in Pb+Pb collisions at √sNN = 2.76 TeV; centralities 0-5, 5-10, 10-20, 20-40, 40-60 and 60-80, each scaled by a power of 5; legend: CLVisc, ALICE.]

Figure 17. (color online) pT spectra of charged pions for Pb+Pb √sNN = 2.76 TeV collisions at centrality range 0−5%, 5−10%, 10−20%, 20−40%, 40−60%, 60−80%, from CLVisc (solid lines) and LHC experimental data by the ALICE collaboration [116].

C. Higher order harmonic flow in Pb+Pb at √sNN = 2760 GeV collisions

CLVisc with Trento initial conditions and Tf = 137 MeV can reproduce experimental data on v2, v3, v4 and v5 for charged pions for all available centralities as shown in Fig. 18. For pure relativistic hydrodynamic simulations without hadronic after-burner, the vn's from CLVisc overshoot the experimental data by 5% for K+ and by a large margin for protons. It has been shown that the pT differential elliptic flow of kaons and protons is boosted to higher pT in hydro-transport hybrid models by hadronic rescattering [114]. On the other hand, the pion vn(pT) is not very sensitive to the hadronic afterburner and serves as a good measure of the QGP expansion. The inconsistency between the freeze-out temperatures that best fit the transverse momentum spectra (100 MeV) and the transverse momentum differential anisotropic flow (137 MeV) can also be resolved by matching hydrodynamic models with hadronic transport evolution in the final stage, which will contribute to the further development of anisotropic flow. The range of freeze-out temperatures could also be used as a prior for Bayesian analysis.

[Figure 18: vn (n = 2, 3, 4, 5) versus pT [GeV] for π+, K+ and protons in centrality classes 0-5%, 5-10%, 10-20% and 20-30% (12 panels); legend: CLVisc, ALICE.]

Figure 18. (color online) The centrality dependence of the anisotropic flows v2, v3, v4 and v5 from the scalar-product method in Pb+Pb collisions at √sNN = 2.76 TeV with centrality ranges 0−5%, 5−10%, 10−20% and 20−30%, from CLVisc (solid lines) and LHC experimental data (markers) by the ALICE collaboration [118].


VII. THE PSEUDO-RAPIDITY DEPENDENCE OF ANISOTROPIC FLOW

To study the pseudo-rapidity dependence of anisotropic flow v2{2} and v3{2} of charged hadrons in this section and the longitudinal fluctuation and correlation in the next section, we need realistic and fluctuating longitudinal distributions of the initial entropy density. For this purpose, the AMPT model is employed to generate event-by-event initial conditions that fluctuate both in the transverse plane and along the longitudinal direction. Notice that the vn{2} in this section are given by the 2-particle cumulant method using sampled hadrons while the vn(pT) in the previous section are given by the scalar product method using smooth particle spectra.
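As an illustration of the 2-particle cumulant method applied to sampled hadrons, the sketch below uses the standard flow-vector form: for each event with M particles, Q_n = Σ_j exp(i n φ_j) and the self-correlation-free pair average is (|Q_n|² − M)/(M(M − 1)), and events are averaged with weight M(M − 1). This is a generic reference-flow estimator written for this text; the pseudo-rapidity binning and any reference-particle selection used in the actual analysis are not reproduced, and the function names are hypothetical.

```python
import cmath
import math


def flow_vector(phis, n):
    """Un-normalized flow vector Q_n = sum_j exp(i n phi_j) of one event."""
    return sum(cmath.exp(1j * n * phi) for phi in phis)


def vn_two_particle_cumulant(events, n):
    """v_n{2} from a list of events, each given as a list of azimuthal angles."""
    numerator = 0.0  # sum over events of |Q_n|^2 - M (self-pairs removed)
    n_pairs = 0.0    # sum over events of M (M - 1)
    for phis in events:
        m = len(phis)
        if m < 2:
            continue  # no pairs can be formed in this event
        q = flow_vector(phis, n)
        numerator += abs(q) ** 2 - m
        n_pairs += m * (m - 1)
    if n_pairs == 0:
        raise ValueError("need at least one event with two or more particles")
    c_n2 = numerator / n_pairs
    if c_n2 < 0.0:
        raise ValueError("negative two-particle cumulant: v_n{2} is undefined")
    return math.sqrt(c_n2)


if __name__ == "__main__":
    # Toy event: three particles at phi = 0 and three at phi = pi/4.
    toy_event = [0.0] * 3 + [math.pi / 4.0] * 3
    print(f"v2{{2}} of the toy event: {vn_two_particle_cumulant([toy_event], 2):.4f}")
```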

As shown in Fig. 19, v2{2} and v3{2} from CLVisc with ηv/s = 0.16 in Pb+Pb collisions at √sNN = 2.76 TeV agree well with experimental data from the ALICE collaboration [119] for most of the centralities. The ratios between v2{2} and v3{2} are correctly reproduced for most central and semi-central collisions. The mean value of the ratio v2{2}/v3{2} increases as the system goes from most central to peripheral collisions. In most central collisions, both v2{2} and v3{2} from CLVisc+AMPT simulations are larger than experimental data. For very peripheral collisions (e.g. 50−60% centrality), the hydrodynamic simulations still produce reasonable v2{2} as a function of pseudo-rapidity while the v3{2}(η) is two times larger than the experimental data. For all centralities, the vn{2}(η) decreases faster at large rapidities in the experimental data than that given by relativistic hydrodynamics with AMPT initial conditions. It was conjectured that a temperature dependent ηv/s may resolve this small overshoot of vn{2} at large rapidities [120]. In earlier works the rapidity dependence was reproduced by including hadronic rescattering in 3+1 dimensional hydrodynamic calculations [69, 121]. To investigate the sensitivity of the shape along rapidity, we show a calculation with ηv/s = 0 that is scaled to match the v2(η) and see the same drop from middle to large rapidities. With the same scaling factor for v2{2} and v3{2} in ideal hydrodynamics, we see that the shape of vn{2}(η) from CLVisc is not sensitive to ηv/s at all. The ratio v2{2}/v3{2} is quite sensitive to ηv/s since shear viscosity suppresses higher order harmonics more strongly than lower order harmonics. As a result, the shape of vn{2}(η) is only sensitive to the longitudinal distribution of initial entropy density, while the ratios between different harmonic flows are good observables to constrain ηv/s.

With constant ηv/s and energy density fluctuations along the space-time rapidity in CLVisc, the vn{2}(η) overshoots the experimental data at large rapidities. It is not yet clear whether the temperature dependent ηv/s(T) can fix the disagreement as suggested in [120] or if hadronic rescattering is necessary. Furthermore, the net baryon density should become significant in the large rapidity region, especially in low beam energy collisions at RHIC. One in principle has to take into account the baryon chemical potential dependence of the EoS in the forward rapidity region [122] in order to describe the pseudo-rapidity dependence of vn{2}.

VIII. LONGITUDINAL DECORRELATION OF ANISOTROPIC FLOW

The decorrelation of anisotropic flow along the longitudinal direction has been computed in CLVisc with AMPT initial conditions and ηv/s = 0 for the hydrodynamic evolution [78]. In the current work, we focus on the effect of the shear viscosity and the initial fluid velocity on the longitudinal decorrelation observables.

The longitudinal decorrelation observable rn(ηa, ηb), which captures not only the twist of event planes but also the anisotropic flow fluctuations along the longitudinal direction, is defined as [42],

$$r_n(\eta^a, \eta^b) = \frac{\langle \vec{Q}_n(-\eta^a)\,\vec{Q}_n^*(\eta^b)\rangle}{\langle \vec{Q}_n(\eta^a)\,\vec{Q}_n^*(\eta^b)\rangle} \tag{78}$$

where ηa and −ηa are 16 pseudo-rapidity windows each with size ∆η = 0.3 uniformly distributed in the range [−2.4, 2.4] and ηb are reference pseudo-rapidity windows to remove the effect of short range non-flow correlations, with the first reference window ηb ∈ (3, 4) denoted as “ref1” and the second ηb ∈ (4.4, 5.0) denoted as “ref2”. The anisotropic flows and their orientation angles in a given pseudo-rapidity window are quantified by ~Qn,

$$\vec{Q}_n \equiv Q_n e^{in\Phi_n} = \frac{1}{N}\sum_{j=1}^{N} e^{in\phi_j} = \frac{\int e^{in\phi}\,\frac{dN}{d\eta\, dp_T\, d\phi}\, dp_T\, d\phi}{\int \frac{dN}{d\eta\, dp_T\, d\phi}\, dp_T\, d\phi}, \tag{79}$$

where φj = arctan(pyj/pxj) is the azimuthal angle of the jth particle in momentum space. The smooth particle spectra are integrated over the azimuthal angle φ ∈ [0, 2π) and the corresponding transverse momentum pT ranges. Following the CMS experimental setup [42], the pT range is [0.3, 3.0] GeV/c for particles in ηa and is [0.0, ∞) for particles in ηb. Since the Pb+Pb collisions are symmetric along the beam direction, by definition rn(ηa, ηb) should equal rn(−ηa, −ηb). Following the suggestion through private communication with the CMS collaboration, we use √(rn(ηa, ηb) rn(−ηa, −ηb)) to improve statistics. Let us note here once again that the highly efficient GPU parallelized algorithm is crucial to obtain reliable results for correlation observables within reasonable computing time.
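Given per-event flow vectors in the windows −ηa, ηa and ηb, Eq. (78) and the symmetrized combination can be sketched as follows. The sketch assumes the flow vectors have already been computed via Eq. (79) and stored as Python complex numbers, and it takes the real part of each event-averaged product, which is a choice made here because the imaginary part averages to zero for symmetric collisions. All function names are hypothetical.

```python
import math


def mean_correlation(q_first, q_second):
    """Event average of Re[Q_first * conj(Q_second)]."""
    products = [(a * b.conjugate()).real for a, b in zip(q_first, q_second)]
    return sum(products) / len(products)


def decorrelation_ratio(q_minus_a, q_plus_a, q_ref):
    """r_n(eta_a, eta_b) of Eq. (78) from per-event flow vectors."""
    return mean_correlation(q_minus_a, q_ref) / mean_correlation(q_plus_a, q_ref)


def symmetrized_ratio(q_minus_a, q_plus_a, q_ref_forward, q_ref_backward):
    """sqrt(r_n(eta_a, eta_b) * r_n(-eta_a, -eta_b)).

    q_ref_forward holds Q_n in the window eta_b and q_ref_backward holds Q_n
    in the mirrored window -eta_b.
    """
    r_forward = decorrelation_ratio(q_minus_a, q_plus_a, q_ref_forward)
    # Mirroring the event swaps the roles of the -eta_a and +eta_a windows.
    r_backward = decorrelation_ratio(q_plus_a, q_minus_a, q_ref_backward)
    return math.sqrt(r_forward * r_backward)
```

As a toy check, a pure event-plane twist Ψ(η) = Ψ0 + kη with constant flow magnitude gives rn = cos(nk(ηa + ηb))/cos(nk(ηb − ηa)), which is below 1 for small twists and tends to 1 when k → 0.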

We study the effect of the shear viscosity and the initial fluid velocity on rn(ηa, ηb) by comparing the results from CLVisc with ηv/s = 0.0 and ηv/s = 0.16, starting from AMPT initial conditions with the initial state fluid velocity switched on and off. Notice that in the comparison, parameters for ideal hydrodynamics are kept unchanged as given in the previous paper except that the freeze-out temperature is changed from 137 MeV to 100 MeV. In the viscous hydrodynamics simulation, the initial scaling factor is changed to K = 1.2 to take into account the extra entropy production due to finite shear viscosity in order to fit the charged multiplicity for 0−5% central Pb+Pb collisions at √sNN = 2.76 TeV.

[Figure 19: v2{2} and v3{2} versus η in six panels: (A) 0-5%, (B) 10-20%, (C) 20-30%, (D) 30-40%, (E) 40-50%, (F) 50-60%; legend: CLVisc ηv/s = 0.16, CLVisc ηv/s = 0 scaled by 0.83 in (A) and (B), 0.82 in (C) and 0.80 in (D)-(F), ALICE.]

Figure 19. The pseudo-rapidity dependence of elliptic flow and triangular flow, for Pb+Pb √sNN = 2.76 TeV collisions with centrality range 0-5, 10-20, 20-30, 30-40, 40-50 and 50-60, from (3+1)D viscous hydrodynamic simulations starting from AMPT initial conditions without initial fluid velocity and evolved with ηv/s = 0.16, as compared with LHC measurements from the ALICE collaboration [119].

[Figure 20: r2(ηa, ηb) in panels (1a)-(1f) and r3(ηa, ηb) in panels (2a)-(2f) versus ηa for centralities 0-5, 5-10, 10-20, 20-30, 30-40 and 40-50; legend: CMS ref1, CMS ref2, CLVisc ref1 and ref2 for ηv/s = 0.0 and ηv/s = 0.16.]

Figure 20. (color online) The decorrelation of elliptic flow (1a)-(1f) and triangular flow (2a)-(2f) along the pseudo-rapidity direction, for Pb+Pb √sNN = 2.76 TeV collisions with centrality range 0-5, 5-10, 10-20, 20-30, 30-40 and 40-50, from (3+1)D viscous hydrodynamic simulations starting from AMPT initial conditions without the initial fluid velocity (ηv/s = 0 for red lines and ηv/s = 0.16 for blue circles and stars) as compared with LHC measurements at CMS (black squares). The “ref1” denotes 3.0 < ηb < 4.0 while “ref2” denotes 4.4 < ηb < 5.0.

Shown in Fig. 20 are the decorrelation functions of elliptic flow (1a-1f) and triangular flow (2a-2f) from CLVisc with AMPT initial conditions and initial fluid velocity switched off as compared with CMS experimental data [42] at the LHC. The decorrelations of both elliptic flow and triangular flow agree with experimental data to a reasonable level. Two different values of ηv/s used in CLVisc produce very similar longitudinal decorrelations. This indicates that the decorrelation observable is not sensitive to the value of ηv/s used for the hydrodynamic evolution if there is no initial flow. For r2(ηa, ηb), the hydrodynamic results do not show a difference between the two ηb reference windows. For r3(ηa, ηb), there is a very small splitting between the two ηb reference windows. It is suggested that the non-flow short-range correlations in the denominator between particles in the window [ηa − 0.15, ηa + 0.15] and the first reference window 3 < ηb < 4 depress the value of rn(ηa, ηb). This is consistent with the negligible splitting from CLVisc with the zero-flow initial condition, since no near-side short-range correlations from jets are considered in the simulations.

[Figure 21: r2(ηa, ηb) in panels (1a)-(1f) and r3(ηa, ηb) in panels (2a)-(2f) versus ηa for centralities 0-5, 5-10, 10-20, 20-30, 30-40 and 40-50; legend: CMS ref1, CMS ref2, CLVisc ref1 and ref2 for ηv/s = 0.0 and ηv/s = 0.16.]

Figure 21. (color online) The decorrelation of elliptic flow (1a)-(1f) and triangular flow (2a)-(2f) along the pseudo-rapidity direction, for Pb+Pb √sNN = 2.76 TeV collisions with centrality range 0-5, 5-10, 10-20, 20-30, 30-40 and 40-50, from (3+1)D viscous hydrodynamic simulations starting from AMPT initial conditions with the initial fluid velocity (ηv/s = 0 for red lines and ηv/s = 0.16 for blue circles and stars) as compared with LHC measurements at CMS (black squares). The “ref1” denotes 3.0 < ηb < 4.0 while “ref2” denotes 4.4 < ηb < 5.0.

The agreement between r2(ηa, ηb) and experimental data for all centralities is as good as in our previously published results using ideal hydrodynamics with Tf = 137 MeV [78]. Moreover, the r3(ηa, ηb) with Tf = 100 MeV increases slightly as compared with Tf = 137 MeV.

With a finite ratio of shear viscosity over entropy density ηv/s = 0.16, r2 from CLVisc simulations fits the CMS data better if the second reference window ηb ∈ [4.4, 5.0) is chosen. For rn(ηa, ηb) computed with the first reference ηb window, the shear viscosity decreases the decorrelation of elliptic flow slightly for the zero-flow initial condition but strongly when the initial fluid velocity is included in the initial condition. For rn(ηa, ηb) computed with the second reference ηb window, the effect of the shear viscosity is very small. When there are longitudinal fluctuations, the non-Bjorken longitudinal expansion due to pressure gradients along the space-time rapidity is strong. In ideal hydrodynamics, this longitudinal expansion decreases elliptic flow [64]. However, in viscous hydrodynamics, the shear viscosity speeds up the expansion along the transverse direction and slows down the expansion along the longitudinal (space-time rapidity) direction. The anisotropic flow in viscous hydrodynamics with both transverse and longitudinal fluctuations is therefore affected by the interplay between the accelerated transverse expansion and the decelerated longitudinal expansion.

When the initial fluid velocity computed from T^{τµ} is included in the initial condition, the short-range “non-flow” correlations from mini-jets become stronger in ideal hydrodynamics. The short-range correlations in the denominator between particles in the window [ηa − 0.15, ηa + 0.15] and the first reference window 3 < ηb < 4 suppress the value of rn(ηa, ηb). This is clearly seen in Fig. 21, where the red-dashed line for rn(ηa, ηb = ref2) is always above the red-solid line for rn(ηa, ηb = ref1) from ideal hydrodynamic simulations. For viscous hydrodynamics with initial fluid velocity, the splitting between the two ηb reference windows is much smaller than in ideal hydrodynamics. The comparison between Fig. 20 and Fig. 21 shows that the decorrelation strength and the splitting between the two reference windows are sensitive to both the initial fluid velocity and the shear viscosity. With the shear viscosity constrained by other physical observables, the splitting between the two reference windows for 0−5% and 5−10% central collisions might be a good observable to determine the initial fluid velocity.

IX. SUMMARY

We have developed a full (3+1)D viscous relativistic hydrodynamic model, CLVisc, in which both the hydrodynamic evolution with the KT algorithm and the Cooper-Frye particlization with integration on the freeze-out surface are parallelized on GPU using OpenCL. We achieved 60 and 120 times performance increases for the space-time evolution and Cooper-Frye particlization, respectively, relative to the performance of the code on a single-core CPU. Such increased performance makes possible many event-by-event studies of high-energy heavy-ion collisions, such as the Coupled Linear Boltzmann Transport and hydrodynamics (CoLBT-hydro) model [49] for jet propagation and medium response. We have validated the CLVisc code with comparisons with several analytic solutions of the ideal and viscous hydrodynamic equations, such as the Riemann, Bjorken and Gubser solutions, as well as numerical solutions from VISH2+1. We have also compared results from CLVisc using the Trento Monte Carlo initial conditions with experimental data on hadron spectra in heavy-ion collisions at both RHIC and LHC. We carried out a novel study with CLVisc on the pseudo-rapidity dependence and decorrelation of anisotropic flows in the longitudinal direction with initial conditions given by the AMPT model. We confirmed the observation that the magnitude and the relative ratio of anisotropic flows are sensitive to the shear viscosity to entropy density ratio ηv/s. We also found that the decorrelation of anisotropic flow along the pseudo-rapidity and the splitting between different reference rapidity windows are sensitive both to the initial flow velocity and to the shear viscosity to entropy density ratio.

In the comparisons to the experimental data on the flavor dependence of the hadron spectra and anisotropic flows, CLVisc fails to describe the experimental data, like all other pure hydrodynamic models. As illustrated by previous studies [114, 123], it is imperative to include the non-equilibrium dynamics of hadronic scattering after hadronization. CLVisc with the option of Monte Carlo sampling for Cooper-Frye particlization is well suited to work together with a hadronic transport model to account for this dynamic process. This will be investigated in the near future.

ACKNOWLEDGMENTS

We thank Derek Teaney for helpful discussions on how to estimate the derivatives before each time step. This work was supported in part by the National Science Foundation of China under grant No. 11521064 (L.-G.P. and X.-N.W.), the National Science Foundation (NSF) within the framework of the JETSCAPE collaboration under grant number ACI-1550228 (L.-G.P. and X.-N.W.), the Director, Office of Energy Research, Office of High Energy and Nuclear Physics, Division of Nuclear Physics, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231 (X.-N.W.), funding of a Helmholtz Young Investigator Group VH-NG-822 from the Helmholtz Association and GSI, and the Helmholtz International Center for the Facility for Antiproton and Ion Research (HIC for FAIR) within the framework of the Landes-Offensive zur Entwicklung Wissenschaftlich-Oekonomischer Exzellenz (LOEWE) program launched by the State of Hesse (L.-G.P. and H.P.). Computational resources have been provided by the GSI green cube and the GPU workstations at Central China Normal University.

APPENDIX

A. GPU architecture and the parallelization of the KT algorithm

Parallelization and optimization of relativistic hydrodynamic programs on GPUs require expertise. In this section we provide technical details that are critical to GPU parallelization. Shown in Fig. 22 is a cartoon diagram of the GPU architecture. The smallest component of the GPU is the processing element (PE), which is comprised of a worker (the ant) that owns a very small piece of private memory (the dish). The latency for the processing element to read data from the private memory is very low. However, the private memory is usually so small that it is impossible to store a large amount of data in it for processing at the same time. If more private memory is used than provided, the processing element will store data in global memory and read from there on each access. This is not good practice, since there is a long distance between the global memory (the food source in the outside environment) and the private memory (the dish of the ant). As a result, reading data directly from global memory to private memory has a large latency. The clever ants decided to construct a granary (named shared memory in CUDA and local memory in OpenCL) to store food that is fetched from the outside environment and shared by multiple ants. Memory access from shared memory (the granary) to private memory (the dish) is more than 100 times faster than directly reading data from global memory (the outside environment). Pre-fetching frequently accessed data from global memory to shared memory usually speeds up the program by a large margin.

Although the private memory and the shared memory have lower access latency than global memory, their capacities and horizons are much smaller. The private memory (capacity = dozens of float numbers) can only be accessed by its own processing element, while the shared memory (capacity = 32KB – 64KB) can be accessed by all the processing elements in the same computing unit. As a comparison, the global memory (capacity = several GB) is large and can be accessed by all the processing elements. If some data is shared by all the processing elements, a special region of the global memory, the “constant memory”, can be used to balance the horizon and access latency. Notice that all these memories are located on the GPU, and transferring data from CPU memory to the global memory of the GPU also takes time. Good practice is to transfer data from CPU memory to the GPU global memory and perform all calculations there before transferring the results back to the CPU for output.

Figure 22. (color online) Cartoon diagram of the architecture of GPUs, showing the processing elements grouped into computing units (CU0 to CU6), the shared memory and the global memory. Memory access latency: private memory is fast, shared memory is slower, and global memory is 100 times slower than shared memory.

In the 3D KT algorithm, the data required to update the source terms Sπ, SN, ST and SΠ at lattice site (i, j, k) are 4 components in (ε, vx, vy, vηs), 10 components in πµν, and 2 components in N and Π, on 13 lattice grid points. As a result, at least 16 × 13 = 208 float numbers are necessary to update one hydrodynamic cell. Without using shared memory, there is too much redundant fetching from global memory to private memory, which slows down the calculation. In the beginning, a 3D stencil was used to fetch a 3D block of data to shared memory, and all the threads in the same work group read data from shared memory. However, numerous halo cells are needed in each direction in order to update the boundary cells in the local block. In order to update one 7 × 7 × 7 block, one needs 7 × 7 × 4 × 3 halo cells. The total shared memory used for the effective block and halo cells in this simple case is 16 × 7 × 7 × (7 + 12) × 4/1024 ≈ 58 KB, which already exceeds the maximum shared memory provided by the most advanced GPUs on the market (the typical size of shared memory is 32 KB). A trade-off is to read halo cells directly from global memory instead of storing them in shared memory, which reduces the shared memory usage to about 21 KB.

On the other hand, concurrent reading from global memory is only possible along one dimension, depending on the direction in which the data is stored contiguously. The data in one 3D array can only be stored contiguously in one direction, which makes concurrent reading impossible in the other 2 directions. For the 3D stencil, it is possible to store each block of data (7, 7, 7) contiguously in global memory, rather than in the common (x, y, z) order for the whole (nx, ny, nz) array. It is also possible to construct the halo cells for each block and store them contiguously in global memory for concurrent access. One should keep in mind that constructing halo cells for the 3D block is error-prone and requires much more global memory.

In the current version of CLVisc, the source terms are split into 3 directions. The 1D data along each direction is put in the shared memory as shown in Fig. 23. The total shared memory used for one strip is N × 16 × 4/1024 = 32 KB for N = 512 lattice points along the x direction. Each hydrodynamic cell shares 5 × 16 single-precision floating-point numbers along the x direction and only 4 halo cells at the boundary are needed.

Figure 23. One strip of data stored in the shared memory for the 5-cell stencil in the KT algorithm.
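The shared-memory budgets discussed in this appendix can be checked with a few lines of arithmetic. The following Python sketch is only an illustration and is not part of CLVisc; the helper name `shared_kb` and the variable names are hypothetical. It assumes 16 single-precision (4-byte) numbers per cell and 1 KB = 1024 bytes, as in the expressions quoted in the text.

```python
# Shared-memory budget for the stencils discussed in the text (single precision).
BYTES_PER_FLOAT = 4
FIELDS_PER_CELL = 16   # 4 (eps, vx, vy, v_eta) + 10 (pi^{mu nu}) + 2 (N, Pi)


def shared_kb(n_cells):
    """Memory in KB (1 KB = 1024 bytes) needed to hold n_cells cells."""
    return n_cells * FIELDS_PER_CELL * BYTES_PER_FLOAT / 1024


block = 7 * 7 * 7          # interior cells of one 3D block
halo = 7 * 7 * 4 * 3       # 4 halo layers per direction, 3 directions
strip = 512                # one 1D strip along x

floats_per_update = FIELDS_PER_CELL * 13   # 13-point stencil in 3D

print(floats_per_update)                   # 208
print(round(shared_kb(block + halo), 1))   # 58.2 (3D block with halo cells)
print(round(shared_kb(block), 1))          # 21.4 (3D block only)
print(shared_kb(strip))                    # 32.0 (1D strip, N = 512)
```

The 3D block with halo cells evaluates to roughly 58 KB and the block alone to roughly 21 KB, so only the block-only layout and the 1D strip fit into a 32 KB shared memory.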

B. Parallelization of the smooth particle spectra calculation

Since the integration kernel in Eq. (55) is calculated independently for different freeze-out hyper-surface elements before the summation, it is a perfect fit for GPU parallel computing. If the Cooper-Frye integration is only needed once over the whole hyper-surface, it can be done efficiently using the two-step parallel reduction algorithm from the Nvidia and AMD SDKs, as shown in Fig. 24. In reality we need to do the hyper-surface integration 308 × 41 × 15 × 48 times, and it is quite slow to load each hyper-surface element from global memory to private memory so many times. In order to reduce the global memory access, we share the hyper-surface elements in one work group for multiple (pid, Y, pT, φ) combinations. The computing time for 300 resonances is reduced from 8 hours on a single-core CPU to 3 minutes on modern GPUs like the Nvidia K20 and AMD FirePro S9150 for one typical hydrodynamic event.

Figure 24. Parallel reduction used on GPU to compute the summation of particle spectra from millions of freeze-out hyper-surface elements.

Shown in Fig. 24 is one demonstration of parallel reduction. For example, in order to sum all the numbers in one big array, one first puts the numbers in many groups; in each work group the work items iteratively add the second half of the sub-array to the first half in parallel. After several iterations, the final result will be the value in the first work item. Notice that the parallel reduction has not only been used in CLVisc to compute the summation of particle spectra from the huge number of freeze-out hyper-surface cells, but has also been used to compute the maximum energy density εmax in the fluid field at each output time step. The εmax is used to stop the time evolution of hydrodynamics when its value is smaller than the freeze-out energy density determined by the freeze-out temperature. In order to find εmax in the fluid field, one has to check Nx × Ny × Nηs fluid cells in the collision system with both transverse and longitudinal fluctuations. This can be done easily in Python if the energy density values of the whole fluid field stay in the host memory (CPU memory). However, transferring the values of a big 3D matrix from GPU to CPU at each output time step is very time consuming. CLVisc uses parallel reduction to compute the maximum energy density of the fluid field on the GPU side and transfers a scalar εmax back to the CPU side. In order to avoid the data transfer between CPU and GPU memory, the freeze-out hyper-surface finding algorithm [64] is also implemented on GPU.
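The reduction pattern described above can be emulated on the CPU to make the data flow explicit. The sketch below is a NumPy illustration under our own assumptions, not the OpenCL kernel used in CLVisc: the function names `workgroup_reduce` and `two_step_reduce` and the work-group size of 64 are hypothetical. Each pass of the loop corresponds to one parallel step in which the work items fold the second half of the buffer onto the first half.

```python
import numpy as np


def workgroup_reduce(values, op=np.add):
    """Tree reduction in one work group: fold the second half onto the first."""
    buf = np.array(values, dtype=np.float64)
    n = buf.size
    while n > 1:
        half = (n + 1) // 2            # length of the first half (handles odd n)
        # "work item" i < n - half combines element i with element i + half
        buf[: n - half] = op(buf[: n - half], buf[half:n])
        n = half
    return buf[0]


def two_step_reduce(values, group_size=64, op=np.add):
    """Step 1: reduce every work group. Step 2: reduce the per-group results."""
    values = np.asarray(values, dtype=np.float64)
    partial = [workgroup_reduce(values[i:i + group_size], op)
               for i in range(0, values.size, group_size)]
    return workgroup_reduce(partial, op)


# Toy "energy density" field; the array shape is arbitrary.
rng = np.random.default_rng(0)
eps = rng.random((16, 16, 8)).ravel()
total = two_step_reduce(eps)                     # summation, as for the spectra
eps_max = two_step_reduce(eps, op=np.maximum)    # maximum, as for eps_max
print(np.isclose(total, eps.sum()), eps_max == eps.max())
```

Passing `np.maximum` instead of `np.add` gives the maximum rather than the sum, which mirrors the use of the same reduction for εmax.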

C. Profiling for the (3+1)D viscous fluid dynamic evolution

In order to solve 3D partial differential equations, we need to update the values of Ncells = NX × NY × NZ cells at each time step. Without parallel computing, there is only one computing element that updates these cells one after another. Modern GPUs have more than Nworkers = 2500 processing elements, such that more than 2500 cells can be updated simultaneously. In practice, the performance boost cannot approach 2500 for several reasons: (1) the computing power of each computing element on a GPU is not as strong as that of a CPU; (2) reading data from the global memory of the GPU to the private memory of one computing element has a large latency. The easiest optimization on GPU is to put the data shared by a block of processing elements in shared memory to reduce the global access latency. In the 5-stencil central scheme KT algorithm, the site information on each cell is shared 5, 9 and 13 times by its neighbors in 1D, 2D and 3D, respectively.

block size       8      16     32     64     128
Ideal(s)-GPU     0.37   0.218  0.178  0.155  0.157
Visc(s)-GPU      3.12   1.65   1.17   1.01   1.17
Visc(s)-CPU      6.64   6.45   6.63   7.0    7.58

Table II. Computing time (in seconds) for one time step on various computing devices for several different block sizes.

The optimal block size, which denotes the number of processing elements assigned to process one workgroup of cells, varies between different computing devices. As shown in Table II, we run (3+1)D viscous hydrodynamics with number of cells Ncell = 385 × 385 × 115 for 1600 time steps. Shown in the table is the mean time for a one-step update on the GPU AMD S9150 (2496 processing elements) and the server CPU Intel Xeon 2650v2 (10 cores, 20 threads). The computing time for a one-step update changes with the block size. For the GPU AMD S9150, the optimal block size for this task is 64, while for the CPU Intel Xeon 2650v2 the optimal block size is 16. Running on the GPU is about 6 times faster than running the same program on the 10-core CPU. The (3+1)D ideal hydrodynamics with the same parallelization is about 6.5 times faster than the viscous version.
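As a quick cross-check of the optimal block sizes and speed-ups quoted above, the entries of Table II can be compared directly. The snippet below is an illustrative sketch; the dictionary and function names are ours, and the numbers are copied from the table.

```python
# Time per step in seconds, copied from Table II (block size -> time).
ideal_gpu = {8: 0.37, 16: 0.218, 32: 0.178, 64: 0.155, 128: 0.157}
visc_gpu = {8: 3.12, 16: 1.65, 32: 1.17, 64: 1.01, 128: 1.17}
visc_cpu = {8: 6.64, 16: 6.45, 32: 6.63, 64: 7.0, 128: 7.58}


def best(timings):
    """Return (block_size, time) of the fastest configuration."""
    size = min(timings, key=timings.get)
    return size, timings[size]


gpu_size, gpu_time = best(visc_gpu)
cpu_size, cpu_time = best(visc_cpu)
ideal_size, ideal_time = best(ideal_gpu)

print(gpu_size, cpu_size)                  # 64 16
print(round(cpu_time / gpu_time, 1))       # 6.4  (viscous: CPU time / GPU time)
print(round(gpu_time / ideal_time, 1))     # 6.5  (GPU: viscous time / ideal time)
```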

The performance can be further improved using deeper optimizations. In the 1D KT algorithm together with dimension splitting, each lattice point needs to be loaded 3 times. This is a trade-off between implementation difficulty and efficiency. However, it is already much better than independent fetching from global memory, where the data on each lattice point are reloaded 13 times.

Concurrent reading from global memory. It is shown that the 1D KT algorithm is much faster along the ηs direction than along the x and y directions for Nx = Ny = Nηs = 256 grids. The ratio of computing time along these three axes is tx : ty : tηs = 38 : 28 : 1. This is the concurrent reading problem, since the data is only stored contiguously in one direction. Transposing the matrix in each time step is suggested by [124] to increase the concurrent reading. Another way is to use the native 3D image buffer, which provides a different storing order and constant extrapolation for boundary cells. We did not choose the image buffer because it is read-only or write-only within one kernel in OpenCL versions earlier than 2.0, and it does not support double precision.

Warp divergence. Threads in the same workgroup are executed in warps of 32 or 64, with all the threads in one warp executing the same instruction at the same time. If there is if/else branching for two threads in the same warp, all the threads in the same warp will execute the instructions under both branches. This is called warp divergence. The root-finding algorithm on each lattice cell needs a different number of iterations to achieve the required precision, which brings serious warp divergence. This should be kept in mind, but currently there is no way to tackle this problem.

Bank conflict. On each computing unit there is one piece of shared memory whose size is around 32KB − 48KB. Each work group occupies one piece of shared memory, and the data in this piece of shared memory are stored in 32 banks, with each bank holding many 32-bit words. For example, if we have one array A of floats (32 bits) whose length is 500, the first bank will store A[0], A[32], . . . , A[32 ∗ n] and the second bank will store A[1], A[33], . . . , A[32 ∗ n + 1]. If multiple threads in the same warp read the same 32-bit data from one bank, the data will be read only once and broadcast to all the requesting threads; there is no bank conflict in this case. However, if n threads in the same warp read n different 32-bit data from the same bank, the operation is serialized and the program is slowed down; this is called an n-way bank conflict. Bank conflicts thus slow down the program if the data is poorly structured. For more details of GPU parallel computing, one can refer to [124–126].
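The bank assignment rule above (word i of the array lives in bank i mod 32) makes it easy to estimate the severity of a conflict for a given access pattern. The following Python sketch is a simplified model under stated assumptions: 32 banks of 32-bit words and a warp of 32 threads, with the hypothetical helper `conflict_degree` counting distinct words per bank. Real hardware may differ in detail.

```python
from collections import defaultdict

NUM_BANKS = 32   # 32-bit words are distributed round-robin over 32 banks


def conflict_degree(word_indices):
    """Return n for the worst n-way bank conflict of one warp's access.

    word_indices[t] is the index of the 32-bit word read by thread t.
    Threads reading the same word are served by one broadcast, so only
    distinct words within a bank are counted.
    """
    words_per_bank = defaultdict(set)
    for w in word_indices:
        words_per_bank[w % NUM_BANKS].add(w)
    return max(len(words) for words in words_per_bank.values())


warp = range(32)
print(conflict_degree([t for t in warp]))         # 1: stride 1, no conflict
print(conflict_degree([2 * t for t in warp]))     # 2: stride 2, 2-way conflict
print(conflict_degree([16 * t for t in warp]))    # 16: stride 16, 16-way conflict
print(conflict_degree([7 for _ in warp]))         # 1: one word, broadcast
```

In this model a stride-1 access and a broadcast are conflict-free, while strides that share a large common factor with 32 concentrate the accesses in a few banks.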

D. Momentum sampling from Fermi-Dirac and Bose-Einstein distributions

On the freeze-out hyper-surface, the baryons obey the Fermi-Dirac distribution and the mesons obey the Bose-Einstein distribution. One needs to sample the momentum magnitude from these two distribution functions. The most straightforward method is naive rejection sampling, which is not encouraged here because too many samples are rejected at large momentum, where the probability is small. We introduce Scott Pratt’s method and Adaptive Rejection Sampling (ARS), which are much faster, to tackle this problem.

Scott Pratt’s method. There is a math trick to sample momenta from the Juttner distribution function

\[ f(p) = p^2 \exp\left(-\sqrt{p^2 + m^2}/T\right). \]

The Fermi-Dirac distribution function can be approximated by the Juttner distribution since exp(m/T) ≫ 1 even for the lightest baryon (e.g. a proton with mass mp = 0.938 GeV and freeze-out temperature T ∼ 0.2 GeV gives exp(m/T) ≈ 109 ≫ 1).

The Bose-Einstein distribution can be approximated with high precision using a geometric series expansion,

\[ f(p) = \frac{p^2}{e^{E/T} - 1} = p^2 e^{-E/T}\,\frac{1}{1 - e^{-E/T}} = p^2\left(e^{-E/T} + e^{-2E/T} + e^{-3E/T} + e^{-4E/T} + \dots\right), \]

where E = √(p² + m²) is the energy of one particle in the co-moving frame of the fluid. The problem is simplified to sampling from several Juttner distribution functions with effective freeze-out temperatures T, T/2, T/3, T/4, ...

For massless particles, whose distribution function reads f(p) = p² e^{−p/T}, one uses the following math trick: for the probability distribution x^{n−1} e^{−x}, one can draw x by taking the natural log of n random numbers, x = −ln(r1 r2 ... rn), with ri uniformly distributed between zero and one. It is then easy to draw the momentum magnitude and the polar and azimuthal angles in 3 dimensions from the Juttner distribution function,

\[ p = -T \ln(r_1 r_2 r_3), \qquad \cos\theta = \frac{\ln r_1 - \ln r_2}{\ln r_1 + \ln r_2}, \qquad \phi = 2\pi\,\frac{[\ln(r_1 r_2)]^2}{[\ln(r_1 r_2 r_3)]^2}. \]

Indeed, by checking the Jacobian,

\[ dp\, d\cos\theta\, d\phi = |J|\, dr_1 dr_2 dr_3 = \frac{8\pi T}{r_1 r_2 r_3\,[\ln(r_1 r_2 r_3)]^2}\, dr_1 dr_2 dr_3 = \frac{8\pi T}{e^{-p/T}\, p^2/T^2}\, dr_1 dr_2 dr_3, \]

and

\[ dr_1 dr_2 dr_3 = \frac{1}{8\pi T^3}\, p^2 e^{-p/T}\, dp\, d\cos\theta\, d\phi. \]

For massive hadrons,

\[ p^2 e^{-(E-\mu)/T} = p^2 e^{-p/T}\, e^{(p - E + \mu)/T}. \]

One first draws p from p² e^{−p/T}, then accepts or rejects it with the weight function ω(p) = e^{(p−E)/T} = e^{(p−√(p²+m²))/T}. For heavy hadrons ω(p) ≪ 1, and too many rejections slow down the sampling. Scott Pratt introduces a numerical trick,

\begin{align}
p &= \sqrt{E^2 - m^2}, \quad dp = \frac{E}{p}\, dE \tag{80} \\
dp\, p^2 e^{-E/T} &= dE\, \frac{E}{p}\, p^2 e^{-E/T} \tag{81} \\
&= dE\, p E\, e^{-E/T} \tag{82} \\
&= dk\, \frac{p}{E}\, (k + m)^2 e^{-k/T} e^{-m/T} \tag{83} \\
&= dk\, (k + m)^2 e^{-k/T}\, \omega(p) \tag{84} \\
&= dk\, (k^2 + 2mk + m^2)\, e^{-k/T}\, \omega(p) \tag{85}
\end{align}

where k = E − m > 0 and ω(p) = (p/E) e^{−m/T} is the weight function, with p/E < 1. The constant factors e^{−m/T} and e^{µ/T} are not important and can be discarded. The distribution above is split into 3 parts and their discrete probabilities are determined by the k-integration,

\begin{align}
\int dk\, k^2 e^{-k/T} &= 2T^3 \tag{86} \\
\int dk\, 2mk\, e^{-k/T} &= 2mT^2 \tag{87} \\
\int dk\, m^2 e^{-k/T} &= m^2 T \tag{88}
\end{align}
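The sampling recipe of Eqs. (80)-(88) can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions and not the CLVisc implementation: the function names `sample_massless` and `sample_juttner_momentum` are hypothetical, only the classical Juttner distribution is sampled (no quantum-statistics corrections), and m > 0 is assumed in the massive case. The three terms k², 2mk and m² are drawn with probabilities proportional to 2T³, 2mT² and m²T, each from −T ln(r1...rn) with n = 3, 2, 1, and the candidate is accepted with probability p/E.

```python
import numpy as np


def sample_massless(temp, size, rng):
    """Draw (p, cos(theta), phi) for f(p) ~ p^2 exp(-p/T) from three uniforms."""
    l1, l2, l3 = np.log(1.0 - rng.random((3, size)))   # ln r_i with r_i in (0, 1]
    p = -temp * (l1 + l2 + l3)
    cos_theta = (l1 - l2) / (l1 + l2)
    phi = 2 * np.pi * (l1 + l2) ** 2 / (l1 + l2 + l3) ** 2
    return p, cos_theta, phi


def sample_juttner_momentum(mass, temp, size, rng):
    """Draw |p| from f(p) ~ p^2 exp(-sqrt(p^2 + m^2)/T) using k = E - m."""
    weights = np.array([2 * temp**3, 2 * mass * temp**2, mass**2 * temp])
    weights /= weights.sum()              # discrete probabilities, Eqs. (86)-(88)
    n_uniform = np.array([3, 2, 1])       # k^2, k, k^0 terms need 3, 2, 1 uniforms
    out = np.empty(0)
    while out.size < size:
        n_try = 2 * (size - out.size) + 16
        part = rng.choice(3, size=n_try, p=weights)
        r = 1.0 - rng.random((n_try, 3))  # uniform in (0, 1]
        # k = -T ln(r1 ... rn): keep only the first n uniforms of each row
        mask = np.arange(3)[None, :] < n_uniform[part][:, None]
        k = -temp * np.where(mask, np.log(r), 0.0).sum(axis=1)
        energy = k + mass
        p = np.sqrt(energy**2 - mass**2)
        accept = rng.random(n_try) < p / energy   # accept with probability p/E
        out = np.concatenate([out, p[accept]])
    return out[:size]


rng = np.random.default_rng(0)
p_proton = sample_juttner_momentum(mass=0.938, temp=0.15, size=10000, rng=rng)
p, cos_theta, phi = sample_massless(temp=0.15, size=10000, rng=rng)
```

The freeze-out temperature of 0.15 GeV in the usage lines is an arbitrary example value.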

Using this method, the sampled k is accepted with the very high probability p/E.

Adaptive Rejection Sampling (ARS) can be used to sample not only the Juttner, Fermi-Dirac and Bose-Einstein distributions, but also the Woods-Saxon distribution and any distribution function that is log-concave (h''(x) < 0 for any x, where h(x) = log f(x)). ARS is very useful in nuclear physics and high energy physics. The philosophy of ARS is to generate a piecewise exponential upper bound q(x) for f(x) and refine this bound with rejected points. Notice that q(x) ∝ exp(g(x)) is constructed from g(x), which is the piecewise linear upper bound of log f(x), whose existence requires the log-concave property. The ordered change points are z0 < z1 < z2 < ... < zn and g(x) has slope mi in (zi−1, zi). The area under each exponential segment exp(g(x)) is

\[ A_i = \int_{z_{i-1}}^{z_i} e^{g(x)}\, dx = \frac{1}{m_i}\left(e^{g(z_i)} - e^{g(z_{i-1})}\right). \]

One first samples the segment index i from discrete_distribution(Ai), then samples x ∈ (zi−1, zi) from the distribution function q(x) = exp(a + mi x). By inverting the cumulative probability for a uniformly distributed r ∈ [0, 1],

\[ Q(x) = \frac{\int_{z_{i-1}}^{x} q(y)\, dy}{\int_{z_{i-1}}^{z_i} q(y)\, dy} = \frac{q(x) - q(z_{i-1})}{q(z_i) - q(z_{i-1})} = r, \]

we get x from the exponential distribution,

\[ x = \frac{1}{m_i} \ln\left(r\, e^{m_i z_i} + (1 - r)\, e^{m_i z_{i-1}}\right). \]

With this x we can do the rejection test: x is accepted if ran() < f(x)/q(x) = exp(h(x) − g(x)). If a point is rejected, it is used to refine the upper bound, which brings the upper bound closer to f(x). In the squeezing test step, a lower bound l(x) is also needed. The squeezing test is passed if ran() < l(x)/q(x). The ARS method can be extended to arbitrary distributions by separating the distribution function into concave and convex parts with different upper bounds.
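The ARS steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the CLVisc code: the envelope g(x) is built from tangents of h(x) at a growing set of abscissae (a common construction; the text does not specify how g(x) is obtained), the derivative `dh` is supplied by the user, the squeezing test is omitted, the domain is truncated to [lo, hi], and no abscissa is assumed to sit exactly at the maximum of h (slope zero). The function name `ars_sample` and all parameter names are hypothetical.

```python
import numpy as np


def ars_sample(h, dh, lo, hi, x_init, size, rng):
    """Adaptive rejection sampling of f = exp(h) on [lo, hi], with h concave."""
    xs = sorted(x_init)
    samples = []
    while len(samples) < size:
        x = np.array(xs)
        hx, m = h(x), dh(x)                       # tangent heights and slopes m_i
        # change points z_i: intersections of neighbouring tangents
        z_in = (hx[1:] - hx[:-1] - x[1:] * m[1:] + x[:-1] * m[:-1]) / (m[:-1] - m[1:])
        z_in = np.clip(z_in, x[:-1], x[1:])       # guard against round-off
        z = np.concatenate([[lo], z_in, [hi]])
        dz = z[1:] - z[:-1]
        g_left = hx + m * (z[:-1] - x)            # g at the left edge of each piece
        # A_i = (e^{g(z_i)} - e^{g(z_{i-1})}) / m_i, up to a common factor
        area = np.exp(g_left - hx.max()) * np.expm1(m * dz) / m
        i = rng.choice(len(xs), p=area / area.sum())
        r = rng.random()
        # x = ln(r e^{m z_i} + (1 - r) e^{m z_{i-1}}) / m, in a stable form
        cand = z[i] + np.log1p(r * np.expm1(m[i] * dz[i])) / m[i]
        g_cand = hx[i] + m[i] * (cand - x[i])
        if np.log(1.0 - rng.random()) < h(cand) - g_cand:
            samples.append(cand)                  # accepted
        else:
            xs = sorted(xs + [cand])              # rejected: refine the envelope
    return np.array(samples)


# Example: Juttner distribution for pions, h(p) = 2 ln p - sqrt(p^2 + m^2)/T.
MASS, TEMP = 0.138, 0.15    # GeV, example values
h = lambda p: 2 * np.log(p) - np.sqrt(p**2 + MASS**2) / TEMP
dh = lambda p: 2 / p - p / (TEMP * np.sqrt(p**2 + MASS**2))
rng = np.random.default_rng(0)
p_pion = ars_sample(h, dh, lo=1e-3, hi=6.0, x_init=[0.1, 0.5, 2.0], size=2000, rng=rng)
```

Recomputing the whole envelope on every draw keeps the sketch short; a production implementation would update it only after a rejection.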

E. Code structure

This section describes the software aspect of the GPU parallelization and the code structure of CLVisc. Programming on GPUs usually uses two levels of language: one on the host side to read configurations, query devices, dispatch jobs to different computing devices and transfer data between host and devices, the other on the device side to do the real calculation using CUDA or OpenCL. CLVisc is comprised of several modules, two of which provide examples for both Python–OpenCL and C++–OpenCL combinations.

– The relativistic hydrodynamic module, which solves the partial differential equations and finds the freeze-out hyper-surface, uses Python for the host side and OpenCL for the device side.

– The smooth particle spectra calculation and resonance decay program use C++ for the host side and OpenCL for the device side.

– Sampling hadrons from the freeze-out hyper-surface and forcing resonance decays uses C++.

In CLVisc, the computing kernels are written in OpenCL and the host side for fluid dynamics is in Python. Employing Python as the host-side language for the main CLVisc program has several benefits. Comparing the host-side code in C++ (used in the smooth spectra calculation) with that written in Python using PyOpenCL, we found that the Python version is much more compact and easier to program. The built-in modules argparse, logging and unittest, together with PyOpenCL, make host-side programming in Python a much better experience than in C++. The kernels written in OpenCL can be used directly, without any changes, in a program whose host-side language is C++. It is also much easier to connect to the later data analysis using numpy, scipy, pandas and matplotlib. Most popular modern machine learning and deep learning libraries provide Python as a user interface, and can therefore also be easily connected to the CLVisc output.

F. Code Availability

The CLVisc code is publicly available from https://gitlab.com/snowhitiger/PyVisc. In the package, there are example codes to run event-by-event hydrodynamics with optical Glauber, Trento or AMPT initial conditions.

[1] L. Adamczyk et al. Global Λ hyperon polarization in nuclear collisions: evidence for the most vortical fluid. Nature, 548:62–65, 2017.

[2] Paul Romatschke and Ulrike Romatschke. Viscosity Information from Relativistic Nuclear Collisions: How Perfect is the Fluid Observed at RHIC? Phys. Rev. Lett., 99:172301, 2007.

[3] Huichao Song and Ulrich W. Heinz. Suppression of elliptic flow in a minimally viscous quark-gluon plasma. Phys. Lett., B658:279–283, 2008.

[4] Huichao Song, Steffen A. Bass, Ulrich Heinz, Tetsufumi Hirano, and Chun Shen. 200 A GeV Au+Au collisions serve a nearly perfect quark-gluon liquid. Phys. Rev. Lett., 106:192301, 2011. [Erratum: Phys. Rev. Lett. 109, 139904 (2012)].

[5] Dmitri E. Kharzeev, Larry D. McLerran, and Harmen J. Warringa. The Effects of topological charge change in heavy ion collisions: ’Event by event P and CP violation’. Nucl. Phys., A803:227–253, 2008.

[6] Charles Gale, Sangyong Jeon, and Bjoern Schenke. Hydrodynamic Modeling of Heavy-Ion Collisions. Int. J. Mod. Phys., A28:1340011, 2013.

[7] Etele Molnar, Hannu Holopainen, Pasi Huovinen, and Harri Niemi. Influence of temperature-dependent shear viscosity on elliptic flow at backward and forward rapidities in ultrarelativistic heavy-ion collisions. Phys. Rev., C90(4):044904, 2014.

[8] Hannah Petersen. Anisotropic flow in transport + hydrodynamics hybrid approaches. J. Phys., G41(12):124005, 2014.

[9] B. Alver et al. Importance of correlations and fluctuations on the initial source eccentricity in high-energy nucleus-nucleus collisions. Phys. Rev., C77:014906, 2008.

[10] B. Alver and G. Roland. Collision geometry fluctuations and triangular flow in heavy-ion collisions. Phys. Rev., C81:054905, 2010. [Erratum: Phys. Rev. C82, 039903 (2010)].

[11] Derek Teaney and Li Yan. Triangularity and Dipole Asymmetry in Heavy Ion Collisions. Phys. Rev., C83:064904, 2011.

[12] Bjorn Schenke, Sangyong Jeon, and Charles Gale. Elliptic and triangular flow in event-by-event (3+1)D viscous hydrodynamics. Phys. Rev. Lett., 106:042301, 2011.

[13] Zhi Qiu and Ulrich W. Heinz. Event-by-event shape and flow fluctuations of relativistic heavy-ion collision fireballs. Phys. Rev., C84:024911, 2011.

[14] Bjoern Schenke, Prithwish Tribedy, and Raju Venugopalan. Fluctuating Glasma initial conditions and flow in heavy ion collisions. Phys. Rev. Lett., 108:252301, 2012.

[15] Hannu Holopainen, Harri Niemi, and Kari J. Eskola. Event-by-event hydrodynamics and elliptic flow from fluctuating initial state. Phys. Rev., C83:034901, 2011.

[16] Guang-You Qin, Hannah Petersen, Steffen A. Bass, and Berndt Muller. Translation of collision geometry fluctuations into momentum anisotropies in relativistic heavy-ion collisions. Phys. Rev., C82:064903, 2010.

[17] Bjoern Schenke, Prithwish Tribedy, and Raju Venugopalan. Event-by-event gluon multiplicity, energy density, and eccentricities in ultrarelativistic heavy-ion collisions. Phys. Rev., C86:034908, 2012.

[18] K. Werner, Iu. Karpenko, T. Pierog, M. Bleicher, and K. Mikhailov. Event-by-Event Simulation of the Three-Dimensional Hydrodynamic Evolution from Flux Tube Initial Conditions in Ultrarelativistic Heavy Ion Collisions. Phys. Rev., C82:044904, 2010.

[19] Charles Gale, Sangyong Jeon, Björn Schenke, Prithwish Tribedy, and Raju Venugopalan. Event-by-event anisotropic flow in heavy-ion collisions from combined Yang-Mills and viscous fluid dynamics. Phys. Rev. Lett., 110(1):012302, 2013.

[20] Zhi Qiu and Ulrich Heinz. Hydrodynamic event-plane correlations in Pb+Pb collisions at √s = 2.76 ATeV. Phys. Lett., B717:261–265, 2012.

[21] D. Teaney and L. Yan. Event-plane correlations and hydrodynamic simulations of heavy ion collisions. Phys. Rev., C90(2):024902, 2014.

[22] Georges Aad et al. Measurement of event-plane correlations in √sNN = 2.76 TeV lead-lead collisions with the ATLAS detector. Phys. Rev., C90(2):024905, 2014.

[23] H. Niemi, K. J. Eskola, and R. Paatelainen. Event-by-event fluctuations in a perturbative QCD + saturation + hydrodynamics model: Determining QCD matter shear viscosity in ultrarelativistic heavy-ion collisions. Phys. Rev., C93(2):024907, 2016.

[24] Jaroslav Adam et al. Correlated event-by-event fluctuations of flow harmonics in Pb-Pb collisions at √sNN = 2.76 TeV. Phys. Rev. Lett., 117:182301, 2016.

[25] Jing Qian, Ulrich Heinz, Ronghua He, and Lei Huo. Differential flow correlations in relativistic heavy-ion collisions. Phys. Rev., C95(5):054908, 2017.

[26] Hannah Petersen, Vivek Bhattacharya, Steffen A. Bass, and Carsten Greiner. Longitudinal correlation of the triangular flow event plane in a hybrid approach with hadron and parton cascade initial conditions. Phys. Rev., C84:054908, 2011.

[27] Yun Cheng, Yu-Liang Yan, Dai-Mei Zhou, Xu Cai, Ben-Hao Sa, and Laszlo P. Csernai. Longitudinal Fluctuations in Partonic and Hadronic Initial State. Phys. Rev., C84:034911, 2011.

[28] Kai Xiao, Feng Liu, and Fuqiang Wang. Event-plane decorrelation over pseudorapidity and its effect on azimuthal anisotropy measurements in relativistic heavy-ion collisions. Phys. Rev., C87(1):011901, 2013.

[29] Long-Gang Pang, Guang-You Qin, Victor Roy, Xin-Nian Wang, and Guo-Liang Ma. Longitudinal decorrelation of anisotropic flows in heavy-ion collisions at the CERN Large Hadron Collider. Phys. Rev., C91(4):044904, 2015.

[30] A. Adil, M. Gyulassy, and T. Hirano. 3D jet tomography of the twisted color glass condensate. Phys. Rev., D73:074006, 2006.

[31] A. Adil and M. Gyulassy. 3D jet tomography of twisted strongly coupled quark gluon plasmas. Phys. Rev., C72:034907, 2005.

[32] Piotr Bozek, Wojciech Broniowski, and Joao Moreira. Torqued fireballs in relativistic heavy-ion collisions. Phys. Rev., C83:034911, 2011.

[33] Adrian Dumitru, Jamal Jalilian-Marian, Tuomas Lappi, Bjoern Schenke, and Raju Venugopalan. Renormalization group evolution of multi-gluon correlators in high energy QCD. Phys. Lett., B706:219–224, 2011.

[34] N. Borghini, P. M. Dinh, and J. Y. Ollitrault. Analysis of directed flow from three particle correlations. Nucl. Phys., A715:629–632, 2003.

[35] Adam Bzdak and Derek Teaney. Longitudinal fluctuations of the fireball density in heavy-ion collisions. Phys. Rev., C87(2):024906, 2013.

[36] The ATLAS collaboration. Measurement of two-particle pseudorapidity correlations in lead-lead collisions at √sNN = 2.76 TeV with the ATLAS detector. 2015.

[37] Akihiko Monnai and Bjoern Schenke. Pseudorapidity correlations in heavy ion collisions from viscous fluid dynamics. Phys. Lett., B752:317–321, 2016.

[38] Piotr Bozek, Wojciech Broniowski, and Adam Olszewski. Two-particle correlations in pseudorapidity in a hydrodynamic model. Phys. Rev., C92(5):054913, 2015.

[39] Peng Huo, Jiangyong Jia, and Soumya Mohapatra. Elucidating the event-by-event flow fluctuations in heavy-ion collisions via the event shape selection technique. Phys. Rev., C90(2):024910, 2014.

[40] Jiangyong Jia and Peng Huo. Forward-backward eccentricity and participant-plane angle fluctuations and their influences on longitudinal dynamics of collective flow. Phys. Rev., C90(3):034915, 2014.

[41] L. P. Csernai and H. Stöcker. Global collective flow in heavy ion reactions from the beginnings to the future. J. Phys., G41(12):124001, 2014.

[42] Vardan Khachatryan et al. Evidence for transverse momentum and pseudorapidity dependent event plane fluctuations in PbPb and pPb collisions. Phys. Rev., C92(3):034911, 2015.

[43] Piotr Bozek and Wojciech Broniowski. Longitudinal decorrelation measures of flow magnitude and event-plane angles in ultra-relativistic nuclear collisions. 2017.

[44] Morad Aaboud et al. Measurement of longitudinal flow decorrelations in Pb+Pb collisions at √sNN = 2.76 and 5.02 TeV with the ATLAS detector. Eur. Phys. J., C78(2):142, 2018.

[45] Zuo-Tang Liang and Xin-Nian Wang. Globally polarized quark-gluon plasma in non-central A+A collisions. Phys. Rev. Lett., 94:102301, 2005. [Erratum: Phys. Rev. Lett. 96, 039901 (2006)].

[46] Fu-Ming Liu and Klaus Werner. Direct photons at low transverse momentum: A QGP signal in pp collisions at LHC. Phys. Rev. Lett., 106:242301, 2011.

[47] Hao-jie Xu, Longgang Pang, and Qun Wang. Elliptic flow of thermal dileptons in event-by-event hydrodynamic simulation. Phys. Rev., C89(6):064902, 2014.

[48] Chun Shen, Ulrich W Heinz, Jean-Francois Paquet, and Charles Gale. Thermal photons as a quark-gluon plasma thermometer reexamined. Phys. Rev., C89(4):044910, 2014.

[49] Wei Chen, Shanshan Cao, Tan Luo, Long-Gang Pang, and Xin-Nian Wang. Effects of jet-induced medium excitation in γ-hadron correlation in A+A collisions. Phys. Lett., B777:86–90, 2018.

[50] S. Cao et al. Multistage Monte-Carlo simulation of jet modification in a static medium. Phys. Rev., C96(2):024909, 2017.

[51] Scott Pratt, Evan Sangaline, Paul Sorensen, and Hui Wang. Constraining the Eq. of State of Super-Hadronic Matter from Heavy-Ion Collisions. Phys. Rev. Lett., 114:202301, 2015.

[52] Jonah E. Bernhard, Peter W. Marcy, Christopher E. Coleman-Smith, Snehalata Huzurbazar, Robert L. Wolpert, and Steffen A. Bass. Quantifying properties of hot and dense QCD matter through systematic model-to-data comparison. Phys. Rev., C91(5):054910, 2015.

[53] Jonah E. Bernhard, J. Scott Moreland, Steffen A. Bass, Jia Liu, and Ulrich Heinz. Applying Bayesian parameter estimation to relativistic heavy-ion collisions: simultaneous characterization of the initial state and quark-gluon plasma medium. Phys. Rev., C94(2):024907, 2016.

[54] Long-Gang Pang, Kai Zhou, Nan Su, Hannah Petersen, Horst Stöcker, and Xin-Nian Wang. An equation-of-state-meter of QCD transition from deep learning. 2016.

[55] Bjoern Schenke, Sangyong Jeon, and Charles Gale. (3+1)D hydrodynamic simulation of relativistic heavy-ion collisions. Phys. Rev., C82:014903, 2010.

[56] Jean-François Paquet, Chun Shen, Gabriel S. Denicol, Matthew Luzum, Björn Schenke, Sangyong Jeon, and Charles Gale. Production of photons in relativistic heavy-ion collisions. Phys. Rev., C93(4):044906, 2016.

[57] Jochen Gerhard, Volker Lindenstruth, and Marcus Bleicher. Relativistic Hydrodynamics on Graphic Cards. Comput. Phys. Commun., 184:311–319, 2013.

[58] Long-Gang Pang, Yoshitaka Hatta, Xin-Nian Wang, and Bo-Wen Xiao. Analytical and numerical Gubser solutions of the second-order hydrodynamics. Phys. Rev., D91(7):074027, 2015.

[59] Dennis Bazow, Ulrich W. Heinz, and Michael Strickland. Massively parallel simulations of relativistic fluid dynamics on graphics processing units with CUDA. Comput. Phys. Commun., 225:92–113, 2018.

[60] Zi-Wei Lin, Che Ming Ko, Bao-An Li, Bin Zhang, and Subrata Pal. A Multi-phase transport model for relativistic heavy ion collisions. Phys. Rev., C72:064901, 2005.

[61] Rudolf Baier, Paul Romatschke, Dam Thanh Son, Andrei O. Starinets, and Mikhail A. Stephanov. Relativistic viscous hydrodynamics, conformal invariance, and holography. JHEP, 04:100, 2008.

[62] Huichao Song and Ulrich W. Heinz. Multiplicity scaling in ideal and viscous hydrodynamics. Phys. Rev., C78:024902, 2008.

[63] Tetsufumi Hirano. Is early thermalization achieved only near mid-rapidity at RHIC? Phys. Rev., C65:011901, 2002.

[64] Longgang Pang, Qun Wang, and Xin-Nian Wang. Effects of initial flow velocity fluctuation in event-by-event (3+1)D hydrodynamics. Phys. Rev., C86:024911, 2012.

[65] Chun Shen, Zhi Qiu, Huichao Song, Jonah Bernhard, Steffen Bass, and Ulrich Heinz. The iEBE-VISHNU code package for relativistic heavy-ion collisions. Comput. Phys. Commun., 199:61–85, 2016.

[66] Iu. Karpenko, P. Huovinen, and M. Bleicher. A 3+1 dimensional viscous hydrodynamic code for relativistic heavy ion collisions. Comput. Phys. Commun., 185:3016–3027, 2014.

[67] Alexander Kurganov and Eitan Tadmor. New high-resolution central schemes for nonlinear conservation laws and convection-diffusion equations. J. Comp. Phys., 160:241–282, 2000.

[68] Michael L. Miller, Klaus Reygers, Stephen J. Sanders, and Peter Steinberg. Glauber modeling in high energy nuclear collisions. Ann. Rev. Nucl. Part. Sci., 57:205–243, 2007.

[69] Tetsufumi Hirano, Ulrich W. Heinz, Dmitri Kharzeev, Roy Lacey, and Yasushi Nara. Hadronic dissipative effects on elliptic flow in ultrarelativistic heavy-ion collisions. Phys. Lett., B636:299–304, 2006.

[70] Dmitri Kharzeev and Marzia Nardi. Hadron production in nuclear collisions at RHIC and high density QCD. Phys. Lett., B507:121–128, 2001.

[71] Dmitri Kharzeev, Eugene Levin, and Marzia Nardi. QCD saturation and deuteron nucleus collisions. Nucl. Phys., A730:448–459, 2004. [Erratum: Nucl. Phys. A743, 329 (2004)].

[72] Tetsufumi Hirano and Yasushi Nara. Hydrodynamic afterburner for the color glass condensate and the parton energy loss. Nucl. Phys., A743:305–328, 2004.

[73] H. J. Drescher and Y. Nara. Effects of fluctuations on the initial eccentricity from the Color Glass Condensate in heavy ion collisions. Phys. Rev., C75:034905, 2007.

[74] T. Lappi and R. Venugopalan. Universality of the saturation scale and the initial eccentricity in heavy ion collisions. Phys. Rev., C74:054905, 2006.

[75] K. J. Eskola, K. Kajantie, P. V. Ruuskanen, and Kimmo Tuominen. Scaling of transverse energies and multiplicities with atomic number and energy in ultrarelativistic nuclear collisions. Nucl. Phys., B570:379–389, 2000.

[76] R. Paatelainen, K. J. Eskola, H. Holopainen, and K. Tuominen. Multiplicities and pT spectra in ultrarelativistic heavy ion collisions from a next-to-leading order improved perturbative QCD + saturation + hydrodynamics model. Phys. Rev., C87(4):044904, 2013.

[77] Kari J. Eskola, Harri Niemi, Risto Paatelainen, and Kimmo Tuominen. Latest results from the EbyE NLO EKRT model. Nucl. Phys., A967:313–316, 2017.

[78] Long-Gang Pang, Hannah Petersen, Guang-You Qin, Victor Roy, and Xin-Nian Wang. Decorrelation of anisotropic flow along the longitudinal direction. Eur. Phys. J., A52(4):97, 2016.

[79] Longgang Pang, Qun Wang, and Xin-Nian Wang. Relics of Minijets amid Anisotropic Flows in High-energy Heavy-ion Collisions. Phys. Rev., C89(6):064910, 2014.

[80] Long-Gang Pang, Hannah Petersen, Qun Wang, and Xin-Nian Wang. Vortical Fluid and Λ Spin Correlations in High-Energy Heavy-Ion Collisions. Phys. Rev. Lett., 117(19):192301, 2016.

[81] Szabolcs Borsanyi, Zoltan Fodor, Christian Hoelbling, Sandor D. Katz, Stefan Krieg, and Kalman K. Szabo. Full result for the QCD equation of state with 2+1 flavors. Phys. Lett., B730:99–104, 2014.

[82] Pasi Huovinen and Péter Petreczky. QCD Equation of State and Hadron Resonance Gas. Nucl. Phys., A837:26–53, 2010.

[83] Josef Sollfrank, Pasi Huovinen, Markku Kataja, P. V. Ruuskanen, Madappa Prakash, and Raju Venugopalan. Hydrodynamical description of 200-A/GeV/c S + Au collisions: Hadron and electromagnetic spectra. Phys. Rev., C55:392–410, 1997.

[84] G. Boyd, J. Engels, F. Karsch, E. Laermann, C. Legeland, M. Lutgemeier, and B. Petersson. Equation of state for the SU(3) gauge theory. Phys. Rev. Lett., 75:4169–4172, 1995.

[85] Sz. Borsanyi, G. Endrodi, Z. Fodor, S. D. Katz, and K. K. Szabo. Precision SU(3) lattice thermodynamics for a large temperature range. JHEP, 07:056, 2012.

[86] V. Vovchenko, Long-Gang Pang, H. Niemi, Iu. A. Karpenko, M. I. Gorenstein, L. M. Satarov, I. N. Mishustin, B. Kämpfer, and H. Stoecker. Hydrodynamic modeling of a pure-glue initial scenario in high-energy hadron and heavy-ion collisions. PoS, BORMIO2016:039, 2016.

[87] Fred Cooper and Graham Frye. Comment on the Single Particle Distribution in the Hydrodynamic and Statistical Thermodynamic Models of Multiparticle Production. Phys. Rev., D10:186, 1974.

[88] S. A. Bass et al. Microscopic models for ultrarelativistic heavy ion collisions. Prog. Part. Nucl. Phys., 41:255–369, 1998. [Prog. Part. Nucl. Phys. 41, 225 (1998)].

[89] M. Bleicher et al. Relativistic hadron hadron collisions in the ultrarelativistic quantum molecular dynamics model. J. Phys., G25:1859–1896, 1999.

[90] Hannah Petersen, Jan Steinheimer, Gerhard Burau, Marcus Bleicher, and Horst Stocker. A Fully Integrated Transport Approach to Heavy Ion Reactions with an Intermediate Hydrodynamic Stage. Phys. Rev., C78:044901, 2008.

[91] Y. Nara, N. Otuka, A. Ohnishi, K. Niita, and S. Chiba. Study of relativistic nuclear collisions at AGS energies from p + Be to Au + Au with hadronic cascade model. Phys. Rev., C61:024901, 2000.

[92] J. Weil et al. Particle production and equilibrium properties within a new hadron transport approach for heavy-ion collisions. Phys. Rev., C94(5):054905, 2016.

[93] C. Schwarz, D. Oliinychenko, L. G. Pang, S. Ryu, and H. Petersen. Different realizations of Cooper–Frye sampling with conservation laws. J. Phys., G45(1):015001, 2018.

[94] Yukinao Akamatsu, Shu-ichiro Inutsuka, Chiho Nonaka, and Makoto Takamoto. A new scheme of causal viscous hydrodynamics for relativistic heavy-ion collisions: A Riemann solver for quark-gluon plasma. J. Comput. Phys., 256:34–54, 2014.

[95] I. Bouras, E. Molnar, H. Niemi, Z. Xu, A. El, O. Fochler, C. Greiner, and D. H. Rischke. Relativistic shock waves in viscous gluon matter. Phys. Rev. Lett., 103:032301, 2009.

[96] I. Bouras, E. Molnar, H. Niemi, Z. Xu, A. El, O. Fochler, C. Greiner, and D. H. Rischke. Investigation of shock waves in the relativistic Riemann problem: A Comparison of viscous fluid dynamics to kinetic theory. Phys. Rev., C82:024910, 2010.

[97] Dirk H. Rischke, Stefan Bernard, and Joachim A. Maruhn. Relativistic hydrodynamics for heavy ion collisions. 1. General aspects and expansion into vacuum. Nucl. Phys., A595:346–382, 1995.

[98] J. D. Bjorken. Highly Relativistic Nucleus-Nucleus Collisions: The Central Rapidity Region. Phys. Rev., D27:140–151, 1983.

[99] Steven S. Gubser. Symmetry constraints on generalizations of Bjorken flow. Phys. Rev., D82:085027, 2010.

[100] Tamas S. Biro. Generating new solutions for relativistic transverse flow at the softest point. Phys. Lett., B487:133–139, 2000.

[101] T. Csorgo, L. P. Csernai, Yogiro Hama, and T. Kodama. Simple solutions of relativistic hydrodynamics for systems with ellipsoidal symmetry. Acta Phys. Hung., A21:73–84, 2004.

[102] M. I. Nagy, T. Csorgo, and M. Csanad. Detailed description of accelerating, simple solutions of relativistic perfect fluid hydrodynamics. Phys. Rev., C77:024908, 2008.

[103] Maxim S. Borshch and Valery I. Zhdanov. Exact solutions of the equations of relativistic hydrodynamics representing potential flows. SIGMA, 3:116, 2007. [SIGMA 3, 116 (2007)].

[104] Guillaume Beuf, Robi Peschanski, and Emmanuel N. Saridakis. Entropy flow of a perfect fluid in (1+1) hydrodynamics. Phys. Rev., C78:064909, 2008.

[105] Shu Lin and Jinfeng Liao. On Analytic Solutions of (1+3)D Relativistic Ideal Hydrodynamic Equations. Nucl. Phys., A837:195–209, 2010.

[106] Robi Peschanski and Emmanuel N. Saridakis. On an exact hydrodynamic solution for the elliptic flow. Phys. Rev., C80:024907, 2009.

[107] T. Csörgő and M. I. Nagy. New family of exact and rotating solutions of fireball hydrodynamics. Phys. Rev., C89(4):044901, 2014.

[108] Cheuk-Yin Wong, Abhisek Sen, Jochen Gerhard, Giorgio Torrieri, and Kenneth Read. Analytical Solutions of Landau (1+1)-Dimensional Hydrodynamics. Phys. Rev., C90(6):064907, 2014.

[109] Yoshitaka Hatta and Bo-Wen Xiao. Building up the elliptic flow: analytical insights. Phys. Lett., B736:180–185, 2014.

[110] Yoshitaka Hatta, Jorge Noronha, and Bo-Wen Xiao. A systematic study of exact solutions in second-order conformal hydrodynamics. Phys. Rev., D89(11):114011, 2014.

[111] Mate Csanad and Andras Szabo. Multipole solution of hydrodynamics and higher order harmonics. Phys. Rev., C90(5):054911, 2014.

[112] Yoshitaka Hatta, Bo-Wen Xiao, and Di-Lun Yang. Non-boost-invariant solution of relativistic hydrodynamics in 1+3 dimensions. Phys. Rev., D93(1):016012, 2016.

[113] Pu Shi and Di-Lun Yang. Analytic Solutions of Transverse Magneto-hydrodynamics under Bjorken Expansion. EPJ Web Conf., 137:13021, 2017.

[114] Huichao Song, Steffen A. Bass, and Ulrich Heinz. Viscous QCD matter in a hybrid hydrodynamic+Boltzmann approach. Phys. Rev., C83:024912, 2011.

[115] Ehab Abbas et al. Centrality dependence of the pseudorapidity density distribution for charged particles in Pb-Pb collisions at √sNN = 2.76 TeV. Phys. Lett., B726:610–622, 2013.

[116] Jaroslav Adam et al. Centrality dependence of the nuclear modification factor of charged pions, kaons, and protons in Pb-Pb collisions at √sNN = 2.76 TeV. Phys. Rev., C93(3):034913, 2016.

[117] Pasi Huovinen, Pok Man Lo, Michał Marczenko, Kenji Morita, Krzysztof Redlich, and Chihiro Sasaki. Effects of rho-meson width on pion distributions in heavy-ion collisions. Phys. Lett., B769:509–512, 2017.

[118] Jaroslav Adam et al. Higher harmonic flow coefficients of identified hadrons in Pb-Pb collisions at √sNN = 2.76 TeV. JHEP, 09:164, 2016.

[119] Jaroslav Adam et al. Pseudorapidity dependence of the anisotropic flow of charged particles in Pb-Pb collisions at √sNN = 2.76 TeV. Phys. Lett., B762:376–388, 2016.

[120] Gabriel Denicol, Akihiko Monnai, and Bjoern Schenke. Moving forward to constrain the shear viscosity of QCD matter. Phys. Rev. Lett., 116(21):212301, 2016.

[121] Chiho Nonaka and Steffen A. Bass. Space-time evolution of bulk QCD matter. Phys. Rev., C75:014902, 2007.

[122] Joseph Kapusta and Ming Li. High baryon densities achievable in the fragmentation regions at RHIC and LHC. J. Phys. Conf. Ser., 779(1):012077, 2017.

[123] Sangwook Ryu, Sangyong Jeon, Charles Gale, Bjoern Schenke, and Clint Young. MUSIC with the UrQMD Afterburner. Nucl. Phys., A904-905:389c–392c, 2013.

[124] Aaftab Munshi, Benedict R. Gaster, Timothy G. Mattson, James Fung, and Dan Ginsburg. OpenCL Programming Guide. Addison-Wesley Professional, 2011.

[125] Matthew Scarpino. OpenCL in Action: How to Accelerate Graphics and Computations. Manning Publications, November 2011.

[126] John E. Stone, David Gohara, and Guochun Shi. OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in Science & Engineering, 12(3):66–73, May 2010.
