![Page 1: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/1.jpg)
Impact of parallelism on HEP software
April 29th 2013
Ecole Polytechnique/LLR
Rene Brun
![Page 2: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/2.jpg)
Software Upgrades
All LHC experiments and groups such as CERN/SFT are looking at every possible performance improvement, or rethinking their software stack, for the post-LS2 years.
This effort is driven by the new hardware and also by the analysis of hot spots.
Work is going on in ROOT to support thread safety, parallel buffer merges and parallel Tree I/O.
In the GEANT world, several projects (e.g. G4MT) are investigating multi-core, GPU or similar solutions. In this talk I will review the progress of one of these projects.
![Page 3: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/3.jpg)
Hardware
R.Brun : Parallelism and HEP software
From a recent talk by Intel
![Page 4: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/4.jpg)
If you trust Intel
![Page 5: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/5.jpg)
If you trust Intel 2
![Page 6: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/6.jpg)
Vendors race
![Page 7: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/7.jpg)
Parallelism: many failures
inmos
cray
cm2
We failed to vectorize codes like GEANT3 in 1985-1987 on the CRAY, Cyber205, ETA10 and IBM3090 because our approach was wrong.
Some successful attempts in online systems in 1983.
We also failed on MPP systems like the Thinking Machines and Elxsi in 1991-1993 because our approach was wrong.
Are we going to take a wrong approach again?
![Page 8: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/8.jpg)
![Page 9: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/9.jpg)
Parallelism: key points
Minimize the sequential/synchronization parts (Amdahl's law): very difficult.
Run the same code (processes) on all cores to optimize memory use (sharing of code and read-only data).
Job-level is better than event-level parallelism for offline systems.
Use the good old principle of data locality to minimize cache misses.
Exploit the vector capabilities, but be careful with the new/delete/gather/scatter problem.
Reorganize your code to reduce tails.
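As a reminder of why the sequential part matters so much, Amdahl's law can be written as follows. The symbols are chosen here for illustration and do not appear on the slide: $p$ is the fraction of the work that can run in parallel and $N$ the number of cores.

$$
S(N) = \frac{1}{(1-p) + \dfrac{p}{N}}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1-p}
$$

For example, with $p = 0.95$ and $N = 12$ the speedup is $1/(0.05 + 0.95/12) \approx 7.7$, and no number of cores can bring it above $1/0.05 = 20$.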
![Page 10: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/10.jpg)
Data Structures & parallelism
event
vertices
tracks
C++ pointers specific to a process
Copying the structure implies a relocation of all pointers
I/O is a nightmare
Update of the structure from a different thread implies a lock/mutex
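One common way around the pointer relocation problem is to link objects by index into contiguous containers instead of by pointer. The sketch below only illustrates that idea; the `Event`, `Vertex` and `Track` types and the `copyIsSelfContained` helper are hypothetical and are not taken from any experiment framework.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical event model: a track refers to its vertex by index, not by pointer.
struct Vertex { double x, y, z; };
struct Track  { double px, py, pz; std::size_t vertexIndex; };

struct Event {
    std::vector<Vertex> vertices;
    std::vector<Track>  tracks;

    // Resolve the link inside this event only.
    const Vertex& vertexOf(const Track& t) const { return vertices[t.vertexIndex]; }
};

// Copies an event, then modifies the original; the copy must be unaffected.
bool copyIsSelfContained() {
    Event original;
    original.vertices.push_back(Vertex{0.0, 0.0, 1.5});
    original.tracks.push_back(Track{1.0, 0.0, 0.0, 0});

    Event copy = original;            // plain copy, no pointer relocation needed
    original.vertices[0].z = 99.0;    // does not leak into the copy
    return copy.vertexOf(copy.tracks[0]).z == 1.5;
}
```

With indices, a copy handed to another thread or written to a file stays self-consistent, which is the property the pointer-based structure lacks.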
![Page 11: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/11.jpg)
Data Structures & Locality
Sparse data structures defeat the system memory caches.
Group object elements/collections such that the storage matches the traversal processes.
For example: group the cross-sections for all processes per material instead of all materials per process.
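The cross-section example can be sketched as a material-major table, so that the values needed when a particle sits in one material are adjacent in memory. `XSecTable` and its members are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical table: cross-sections stored material-major, so that all
// processes of one material are contiguous in memory.
class XSecTable {
public:
    XSecTable(std::size_t nMaterials, std::size_t nProcesses)
        : fNProc(nProcesses), fData(nMaterials * nProcesses, 0.0) {}

    double& at(std::size_t material, std::size_t process) {
        return fData[material * fNProc + process];
    }

    // Total cross-section in one material: a single contiguous scan.
    double total(std::size_t material) const {
        const double* row = fData.data() + material * fNProc;
        double sum = 0.0;
        for (std::size_t p = 0; p < fNProc; ++p) sum += row[p];
        return sum;
    }

private:
    std::size_t fNProc;
    std::vector<double> fData;
};
```

With the opposite layout (all materials per process), the same sum would touch one value per process table and most likely one cache line per process.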
![Page 12: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/12.jpg)
Tools & Libs
hbook
zebra
paw
zbook
hydra
geant1
geant2
geant3
geant4
root
minuit
bos
geant5
![Page 13: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/13.jpg)
Detector Simulation tools
All based on the same principle: sequential particle transport
![Page 14: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/14.jpg)
The GEANT versions
[Figure: functionality of the GEANT versions G1, G2, G3 and G4 against time; time axis marked 1975, 1980, 1990, 1995, 2010]
![Page 15: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/15.jpg)
Conventional Transport
[Figure: four tracks, T1 to T4, each drawn as a sequence of points (o)]
Each particle is tracked step by step through hundreds of volumes.
When all hits for all tracks are in memory, summable digits are computed.
![Page 16: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/16.jpg)
Analogy with car traffic
![Page 17: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/17.jpg)
Starting Assumptions
The LHC experiments use G4 extensively as their main simulation engine. They have invested in validation procedures. Any new project must be coherent with their frameworks.
One of the reasons why the experiments develop their own fast MC solutions is that full simulation is too slow for several physics analyses. These fast MCs are not in the G4 framework (different control, different geometries, etc.), but they are becoming coherent with the experiments' frameworks.
Given the amount of good work on the G4 physics, it is unthinkable not to capitalize on this work.
![Page 18: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/18.jpg)
Goals
Design a new detector simulation tool derived from the Geant4 physics, but with a radically new transport engine:
• Supporting full and fast simulation (not exclusive)
• Designed to exploit parallel hardware (this talk)
![Page 19: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/19.jpg)
Definitions
| Detector | Physical volumes | Logical volumes |
|----------|------------------|-----------------|
| ALICE    | 4,354,735        | 4,764           |
| ATLAS    | 29,046,966       | 7,143           |
| CMS      | 1,166,318        | 1,537           |
| LHCb     | 18,491,756       | 709             |
A logical volume has a given shape and material
![Page 20: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/20.jpg)
Steps/lvolume in Atlas
Huge dynamic range
7100 lvolume types, 29 million instances
![Page 21: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/21.jpg)
Simple observation: HEP transport is mostly local !
• Locality not exploited by the classical transportation approach
• Existing code very inefficient (0.6-0.8 IPC)
• Cache misses due to fragmented code
50 per cent of the time spent in 50/7100 lvolumes
![Page 22: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/22.jpg)
Neighbors/lvolume in Atlas
Volumes with too many neighbors
![Page 23: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/23.jpg)
Neighbors/lvolume in CMS
Same problem with CMS
![Page 24: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/24.jpg)
LHCB geometry statistics
Better situation with neighbors because of a non-cylindrical geometry
90 per cent of steps in 50/700 volumes
![Page 25: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/25.jpg)
New Transport Scheme
[Figure: the same four tracks, T1 to T4, each drawn as a sequence of points (o)]
All particles in the same volume type are transported in parallel.
Particles entering new volumes or generated are accumulated in the volume basket.
Events for which all hits are available are digitized in parallel.
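The basket idea can be illustrated with a deliberately simplified, single-threaded toy model. Everything here is an assumption made for the illustration: the `Particle` type, the linear chain of volumes (volume v always leads to volume v+1) and the `transportAll` function are not part of the prototype described in the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy particle (hypothetical): owning event and number of steps left to live.
struct Particle { int event; int stepsLeft; };

// One basket per logical volume; toy geometry: volume v leads to volume v+1.
// Returns the number of basket passes; each pass handles one whole basket.
std::size_t transportAll(std::vector<std::vector<Particle>>& baskets) {
    std::size_t passes = 0;
    bool work = true;
    while (work) {
        work = false;
        for (std::size_t v = 0; v < baskets.size(); ++v) {
            if (baskets[v].empty()) continue;
            std::vector<Particle> in;
            in.swap(baskets[v]);             // take the whole basket of volume v
            ++passes;
            work = true;
            for (Particle& p : in) {         // same volume for every particle
                --p.stepsLeft;
                if (p.stepsLeft > 0 && v + 1 < baskets.size())
                    baskets[v + 1].push_back(p);   // enters the next volume
            }
        }
    }
    return passes;
}
```

In this toy run, three particles needing 3, 1 and 2 steps are handled in three basket passes rather than six separate particle steps, because the particles sharing a volume are processed together.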
![Page 26: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/26.jpg)
Tails again
A killer if one has to wait for the end of col(i) before processing col(i+1)
Average number of objects in memory
![Page 27: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/27.jpg)
A better solution
Pipeline of objects
Checkpoint synchronization.
Only 1 « gap » every N events
This type of solution is required anyhow for pile-up studies
![Page 28: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/28.jpg)
A better better solution
checkpoints
At each checkpoint we have to keep the unfinished objects/events.
We can now digitize with parallelism on events, clear and reuse the slots.
![Page 29: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/29.jpg)
Benchmarks/lessons from a prototype
HT mode
Excellent CPU usage
Benchmarking 10+1 threads on a 12 core Xeon
Locks and waits: some overhead due to transitions coming from exchanging baskets via concurrent queues
Event re-injection will improve the speed-up
![Page 30: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/30.jpg)
![Page 31: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/31.jpg)
![Page 32: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/32.jpg)
SFT Software Development for Experiments
Vectorizing the geometry (ex1)

    Double_t TGeoPara::Safety(Double_t *point, Bool_t in) const
    {
       // computes the closest distance from given point to this shape.
       Double_t saf[3];
       // distance from point to higher Z face
       saf[0] = fZ-TMath::Abs(point[2]); // Z

       Double_t yt = point[1]-fTyz*point[2];
       saf[1] = fY-TMath::Abs(yt);       // Y
       // cos of angle YZ
       Double_t cty = 1.0/TMath::Sqrt(1.0+fTyz*fTyz);

       Double_t xt = point[0]-fTxz*point[2]-fTxy*yt;
       saf[2] = fX-TMath::Abs(xt);       // X
       // cos of angle XZ
       Double_t ctx = 1.0/TMath::Sqrt(1.0+fTxy*fTxy+fTxz*fTxz);
       saf[2] *= ctx;
       saf[1] *= cty;
       if (in) return saf[TMath::LocMin(3,saf)];
       for (Int_t i=0; i<3; i++) saf[i]=-saf[i];
       return saf[TMath::LocMax(3,saf)];
    }

Huge performance gain expected in this type of code, where shape constants can be computed outside the loop
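A minimal sketch of what "constants outside the loop" could look like for a basket of points, assuming the coordinates are stored as separate contiguous arrays. The `Para` struct is a stand-alone stand-in for the data members used by the function above, and `SafetyInside` is a hypothetical name; neither is ROOT API. Only the `in == true` branch is covered.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Stand-alone stand-in for the shape data members used above (hypothetical type).
struct Para { double fX, fY, fZ, fTxy, fTxz, fTyz; };

// Safety for n points located inside the shape; x, y, z are separate arrays.
void SafetyInside(const Para& s, const double* x, const double* y, const double* z,
                  double* safety, std::size_t n) {
    // Shape constants: computed once per basket instead of once per particle.
    const double cty = 1.0 / std::sqrt(1.0 + s.fTyz * s.fTyz);
    const double ctx = 1.0 / std::sqrt(1.0 + s.fTxy * s.fTxy + s.fTxz * s.fTxz);

    for (std::size_t i = 0; i < n; ++i) {   // branch-free body, a candidate for SIMD
        const double yt = y[i] - s.fTyz * z[i];
        const double xt = x[i] - s.fTxz * z[i] - s.fTxy * yt;
        const double sz = s.fZ - std::fabs(z[i]);
        const double sy = (s.fY - std::fabs(yt)) * cty;
        const double sx = (s.fX - std::fabs(xt)) * ctx;
        safety[i] = std::min(sz, std::min(sy, sx));
    }
}
```

Whether a compiler actually vectorizes this loop depends on the compiler and its flags; the point of the sketch is only that the two square roots no longer sit in the per-particle path.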
![Page 33: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/33.jpg)
Vectorizing the geometry (ex2)

    G4double G4Cons::DistanceToIn( const G4ThreeVector& p,
                                   const G4ThreeVector& v ) const
    {
      G4double snxt = kInfinity ;      // snxt = default return value
      const G4double dRmax = 100*std::min(fRmax1,fRmax2);
      static const G4double halfCarTolerance=kCarTolerance*0.5;
      static const G4double halfRadTolerance=kRadTolerance*0.5;

      G4double tanRMax,secRMax,rMaxAv,rMaxOAv ;  // Data for cones
      G4double tanRMin,secRMin,rMinAv,rMinOAv ;
      G4double rout,rin ;

      G4double tolORMin,tolORMin2,tolIRMin,tolIRMin2 ; // `generous' radii squared
      G4double tolORMax2,tolIRMax,tolIRMax2 ;
      G4double tolODz,tolIDz ;

      G4double Dist,s,xi,yi,zi,ri=0.,risec,rhoi2,cosPsi ; // Intersection point vars

      G4double t1,t2,t3,b,c,d ;    // Quadratic solver variables
      G4double nt1,nt2,nt3 ;
      G4double Comp ;

      G4ThreeVector Normal;

      // Cone Precalcs

      tanRMin = (fRmin2 - fRmin1)*0.5/fDz ;
      secRMin = std::sqrt(1.0 + tanRMin*tanRMin) ;
      rMinAv  = (fRmin1 + fRmin2)*0.5 ;

      if (rMinAv > halfRadTolerance)
      {
        rMinOAv = rMinAv - halfRadTolerance ;
      }
      else
      {
        rMinOAv = 0.0 ;
      }
      tanRMax = (fRmax2 - fRmax1)*0.5/fDz ;
      secRMax = std::sqrt(1.0 + tanRMax*tanRMax) ;
      rMaxAv  = (fRmax1 + fRmax2)*0.5 ;
      rMaxOAv = rMaxAv + halfRadTolerance ;

      // Intersection with z-surfaces

      tolIDz = fDz - halfCarTolerance ;
      tolODz = fDz + halfCarTolerance ;

      …… //here starts the real algorithm

Huge performance gain expected in this type of code, where shape constants can be computed outside the loop
All these statements are independent of the particle !!!
![Page 34: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/34.jpg)
Vectorizing the Physics
• This is going to be more difficult when extracting the physics classes from G4. However, important gains are expected in the functions computing the distance to the next interaction point for each process.
• There is a diversity of interfaces and we now have sub-branches per particle type.
![Page 35: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/35.jpg)
Where are we now?
Present status
• Several investigations of possible alternatives for “extremely parallel – no lock” transport
• Not much code written, several blackboards full
• Some investigation on a simplified but fully vectorized model to prove the vectorization gain
New design in preparation
![Page 36: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/36.jpg)
Major points under discussion
• How to minimize locks and maximize local handling of particles
• How to handle hit and digit structures
• How to preserve the history of the particles
  • This point seems more difficult at the moment and it requires more design
• What is the possible speedup obtained by micro-parallelisation
• What are the bottlenecks and opportunities with parallel I/O
![Page 37: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/37.jpg)
Current design
[Diagram labels: List of logical Volumes; Logical Volume lv; List of baskets for lv; BF: basket status (one char per B); Input particle list (p array); Output particle list (p array); Hits; History; Sensitive volumes; Digits for lv and event ev; List of active events for lv; Active event list; Events; Event ev; Transport thread; Digitizer thread; Ev build thread; "Reused after each transport task"; "Flushed at the end of event"]
![Page 38: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/38.jpg)
Features
Pros
• Excellent potential locality
• Easy to introduce hits and digits
Cons
• One more copy (but it is done in parallel)
• More difficult to preserve particle history (it is non-local!) and introduce particle pruning
![Page 39: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/39.jpg)
Processing flow I
• The transport thread takes particles from the input buffer and transports them until they stop, interact or exit from the volume
  • At this point they are inserted in the output particle buffer for further processing
  • If the LV is a sensitive detector, hits are generated and stored per LV basket
  • An LV basket history record is kept (under investigation)
• Input and output particle buffers are fixed-size structures, which can however evolve (be optimised) during simulation
![Page 40: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/40.jpg)
Design under study
[Diagram labels: List of logical Volumes; Logical Volume lv; List of baskets for lv; BF: basket status (one char per B), with baskets marked "✔ full!" or "✗ empty!"; Input particle list (p array); Output particle list (p array); Hits; History; Sensitive volumes; Digits for lv and event ev; List of active events for lv; Active event list; Event ev]
![Page 41: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/41.jpg)
Note
• Containers are “slow growing” contiguous containers
  • Every time a container has to grow, it is reallocated contiguously to the new size
  • A blocking operation
• We expect container sizes to converge
  • If not, there is a design problem
![Page 42: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/42.jpg)
Processing flow II
• When an input particle buffer is exhausted
  • It is marked as such by the transport thread in the LV#BF (Logical Volume # Basket Flag)
  • Then the transport thread scans the LV#BF data structure to find the next basket to be transported
• Used buffers are scanned by the dispatcher thread, which updates a global track counter per event
• They are then declared available to be filled (a), to be reused
![Page 43: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/43.jpg)
Important!!
• a (available): basket being filled by the dispatcher
• f (full): basket ready to be dispatched
• r (ready): basket ready to be transported
• t (transporting): basket being transported
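The four status characters can be read as a small state machine. The sketch below encodes them as an enum; the cycle a → r → t → f → a is inferred from the processing-flow description in this talk and should be treated as an assumption, as should the names `BasketStatus` and `next`.

```cpp
#include <cassert>

// Status characters as listed on the slide; the transition order is inferred
// from the processing-flow description and is an assumption.
enum class BasketStatus : char {
    Available    = 'a',   // being filled by the dispatcher
    Full         = 'f',   // output ready to be dispatched
    Ready        = 'r',   // input list full, ready to be transported
    Transporting = 't'    // owned by one transport thread
};

// Returns the status that follows 'current' in the cycle a -> r -> t -> f -> a.
constexpr BasketStatus next(BasketStatus current) {
    return current == BasketStatus::Available    ? BasketStatus::Ready
         : current == BasketStatus::Ready        ? BasketStatus::Transporting
         : current == BasketStatus::Transporting ? BasketStatus::Full
                                                 : BasketStatus::Available;
}
```

Keeping the status in a single character per basket matches the "one char per B" note on the design diagrams.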
![Page 44: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/44.jpg)
Current design
Transport thread
tt0) Initial status: the transport thread is transporting a basket, marked as “transporting” (t)
tt1) The transport thread has finished working on the input array
tt2) The transport thread changes the status of lv#bn from “transporting” (t) to “to be dispatched” (f)
tt3) The transport thread gets a basket to be transported (r) from the fast selection list and marks it “transporting” (t)
Dispatch thread (asynch!!)
dt1) The dispatcher takes the first f basket and dispatches its output particles into the input particle lists of the baskets available to be filled (a)
dt2) When dispatching is finished the basket is moved from f to a
dt3) When an input list is full, the basket is moved from a to r, ready to be transported, and it is pushed into the fast selection list
[Diagram labels: Logical Volumes; List of baskets for lv; Input particle list (p array); Output particle list (p array); BF: basket status, indexed by LV and BN; Fast LV#BN queue; Dispatcher]
![Page 45: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/45.jpg)
Locks…
• The only lock is the push and pop from the fast selection queue
• The dispatcher continuously watches the done byte-vector and dispatches every new basket that is ready
  • Or it can sleep for a while and then process a number of done baskets in succession
• The transport thread marks the done basket (no lock!)
  • No one touches a (t) basket apart from the transport thread that deals with it
• The transport thread gets a new basket (lock!)
  • This is to avoid two threads getting the same basket, or the dispatcher thread updating the fast selection queue at the same time
• The dispatcher thread does not need to lock the whole bit-array while dispatching
  • The baskets in f or in a will not be touched by the transport threads
• The only doubtful situation is when there is no basket to be transported…
  • In this case the “global” threshold for transporting should be lowered by a hungry transport thread (lock, but just to update an integer!)
  • And the dispatcher will mark baskets as ready to be transported (r)
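The locking scheme above can be sketched with standard C++ primitives: a mutex-protected queue for the only locked operations (push and pop), and one status byte per basket that is written without a lock. The class names `FastSelectionQueue` and `BasketFlags` are hypothetical, and using `std::atomic<char>` for the status bytes is an assumption of this sketch, not something stated in the talk.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical fast selection queue: the only lock protects push and pop.
class FastSelectionQueue {
public:
    void push(int basket) {
        std::lock_guard<std::mutex> guard(fMutex);
        fReady.push_back(basket);
    }
    // Returns false when no basket is ready to be transported.
    bool pop(int& basket) {
        std::lock_guard<std::mutex> guard(fMutex);
        if (fReady.empty()) return false;
        basket = fReady.front();
        fReady.pop_front();
        return true;
    }
private:
    std::mutex fMutex;
    std::deque<int> fReady;
};

// One status byte per basket; each byte is read and written without a lock.
class BasketFlags {
public:
    explicit BasketFlags(std::size_t n) : fFlags(n) {
        for (auto& f : fFlags) f.store('a');   // all baskets start as available
    }
    void mark(std::size_t i, char status) {
        fFlags[i].store(status, std::memory_order_release);
    }
    char get(std::size_t i) const {
        return fFlags[i].load(std::memory_order_acquire);
    }
private:
    std::vector<std::atomic<char>> fFlags;
};
```

A transport thread would call `pop` to obtain a basket, mark it `'t'`, and mark it `'f'` when done; because only that thread touches a `'t'` basket, the flag update itself needs no mutex.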
![Page 46: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/46.jpg)
Memory
• We hope to have a self-adjusting system that will stabilise with time
• In case of an “accident” (an event much larger than any other), we need a way to “quench inflation”
• We have identified two methods
  • Event flushing: do NOT transport particles from a given set of events and move them directly to the output buffer
  • Energy flushing: transport low-energy particles and move the high-energy ones to the output buffer
• “Untransported” particles are just reinjected into the system, but they do not shower
![Page 47: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/47.jpg)
Processing flow III
• Note an important point
• The LV basket structure has input and output particle buffers and hits and history buffers
• Input and output particle buffers are
  • Multi-event
  • Volatile, they get emptied and filled during transport of a single event
• Hits and history buffers are
  • Per event
  • Permanent during the transport of a single event
• A basket of an LV can be handled by different threads successively, each one with new input and output buffers
  • …but all these threads will add to the hits and history data structure until the event is flushed
![Page 48: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/48.jpg)
Processing flow IV
• When an event is finished, the digitizer thread kicks in, scans all the hits in all the baskets of all the LVs and digitises them, inserting them in the LV event->digit structure
• When this is over, the event is built into the event structure (to be designed!) by the event builder thread
• After that, the history for this event is assembled by the same thread
  • If…
• Then the event is output
  • By an output thread or in parallel?
![Page 49: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/49.jpg)
Questions?
• How many dispatcher, digitizer and event-builder threads?
  • Difficult to say, we need some more quantitative design work
  • Measurements with G4 simulations could help
• The number of transport threads will have to adapt to the size of the simulation and of the detector
  • In ATLAS for instance 50% of the time is spent in 0.75% of the volumes
  • Threads could be distributed proportionally to the time spent in the different LVs
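One possible reading of "distributed proportionally" is sketched below: each logical volume gets the integer part of its proportional share, and leftover threads go to the largest fractional remainders. The function name `distributeThreads`, its inputs and the rounding rule are all assumptions made for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: split nThreads among logical volumes in proportion
// to the (measured) time spent in each of them.
std::vector<int> distributeThreads(const std::vector<double>& timeSpent, int nThreads) {
    std::vector<int> share(timeSpent.size(), 0);
    double total = 0.0;
    for (double t : timeSpent) total += t;
    if (total <= 0.0 || nThreads <= 0) return share;

    std::vector<double> remainder(timeSpent.size(), 0.0);
    int assigned = 0;
    for (std::size_t i = 0; i < timeSpent.size(); ++i) {
        const double exact = nThreads * timeSpent[i] / total;
        share[i] = static_cast<int>(std::floor(exact));   // integer part first
        remainder[i] = exact - share[i];
        assigned += share[i];
    }
    // Hand the leftover threads to the largest fractional remainders.
    for (; assigned < nThreads; ++assigned) {
        std::size_t best = 0;
        for (std::size_t i = 1; i < remainder.size(); ++i)
            if (remainder[i] > remainder[best]) best = i;
        ++share[best];
        remainder[best] = -1.0;   // do not pick the same volume twice in a row
    }
    return share;
}
```

In practice the volumes would probably first be grouped, since a detector has thousands of logical volumes and only a handful of threads; that grouping is outside the scope of this sketch.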
![Page 50: Impact of parallelism on HEP software April 29 th 2013 Ecole Polytechnique/LLR Rene Brun](https://reader038.vdocuments.net/reader038/viewer/2022110205/56649cc95503460f94991a83/html5/thumbnails/50.jpg)
Short term tasks
• Continue the design work: essential before any more substantial implementation
  • This is the most important task at the moment
  • We have to evaluate the potential bottlenecks before starting the implementation
• Implement the new design and evaluate it against the first
• Demonstrate speedup of some chosen geometry routines
  • Both on x86 CPUs and GPUs
• Demonstrate speedup of some chosen physics methods
  • Particularly in the EM domain