Geant4 10.0-beta: Steps towards release 10
Gabriele Cosmo, PH/SFT
Multi-threading: from prototype to production …
Capitalizing on the work started back in 2009 by X.Dong and G.Cooperman, Northeastern University
Strong contribution by SFT Simulation team members
Big effort brought to success
10.0-beta announced on June 28th, on schedule
SFT Group Meeting - 8 July 2013
Geant4 10.0-beta: Steps towards release 10 - G.Cosmo
G4MT 9.4 (2011)
G4MT 9.5 (2012)
G4 10.0-beta (now)
G4 10.0 (Dec. 2013)
G4 10 series (2014+)
• Proof of principle
• Identify objects to be shared
• First testing
• MT code integrated into G4
• API re-design
• Examples migration
• Further testing
• First optimisations
• Public release
• All functionalities ported to MT
• Further refinements
• Focus on further performance improvements
Multi-threading: 10.0-beta features - 1/2
Event-level parallelism
Each worker thread proceeds independently
• Initializes its state from a master thread
• Identifies its part of the work (events)
• Generates hits in its own hits-collection
• Uses thread-private objects and state
• Shares read-only data structures (e.g. geometry, cross-sections, …)
• Has its own read-write part in a few ‘shared/split’ objects
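The worker model listed above can be illustrated with a minimal sketch. It is not Geant4 code: the names simulate_event, worker and run are hypothetical, the static round-robin split of events is an assumption, and Python threads only show the structure (shared read-only tables, thread-private hits), not the speed-up.

```python
import threading
from types import MappingProxyType

# Read-only data shared by all workers (stand-in for geometry and cross-sections).
SHARED_TABLES = MappingProxyType({"density": 2.5})

def simulate_event(event_id, tables):
    """Hypothetical event: returns one 'hit' computed from the shared tables."""
    return (event_id, event_id * tables["density"])

def worker(worker_id, n_workers, n_events, results):
    # Thread-private state: each worker fills its own hits collection.
    hits = []
    # Each worker identifies its part of the events (assumed round-robin split).
    for event_id in range(worker_id, n_events, n_workers):
        hits.append(simulate_event(event_id, SHARED_TABLES))
    # Each worker writes only to its own slot, so no lock is needed.
    results[worker_id] = hits

def run(n_events, n_workers):
    results = [None] * n_workers
    threads = [
        threading.Thread(target=worker, args=(i, n_workers, n_events, results))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Merge the per-thread hits collections once all workers are done.
    return sorted(hit for hits in results for hit in hits)
```

Calling run(10, 3) processes ten events on three workers and returns the merged hits in event order.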
Possibility to install/run Geant4 either in pure sequential or parallel (MT) mode
• Choice at configuration/installation time
• Sequential mode currently the default
Multi-threading: 10.0-beta features - 2/2
Focus on “lock-free” code
• Metrics currently in use: linearity of speed-up (w.r.t. #threads)
• Absolute throughput optimisation will follow
Enforce use of POSIX standards to allow for integration with user-preferred parallelization frameworks (e.g. TBB, MPI, …)
See: https://twiki.cern.ch/twiki/bin/view/Geant4/Geant4MTForApplicationDevelopers
Design aimed at minimizing changes in user code
Keep API changes to a minimum
Multi-threading: 10.0-beta known limitations
Radioactive-decay and/or ion beams not yet ported
• Requires specific treatment for ion / isomers / decay of nuclei
Some cases of event non-reproducibility to be investigated
• Goal: guarantee reproducibility at -numerical- level vs. sequential runs
Visualization not yet fully functional
• Event/hits display during the event loop is possible only in some circumstances
Some objects not cleanly deleted at the termination of the thread/program
• No proper sanity checking for memory leaks applied for 10.0-beta
Some UI command combinations have not been fully ported yet
Limited testing coverage
• For physics/geometry options/phase-spaces
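The reproducibility goal listed above (identical numerical results whatever the number of threads) is commonly reached by giving every event its own seed, fixed before the work is distributed. The sketch below assumes that scheme; the slides do not say how 10.0-beta implements it, and event_seeds, simulate_event and run are hypothetical names.

```python
import random

def event_seeds(master_seed, n_events):
    """The master pre-generates one seed per event, in event order."""
    master = random.Random(master_seed)
    return [master.getrandbits(32) for _ in range(n_events)]

def simulate_event(seed):
    """Hypothetical event whose result depends only on its own seed."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(5))

def run(master_seed, n_events, n_workers):
    seeds = event_seeds(master_seed, n_events)
    results = [None] * n_events
    # Workers are emulated one after the other; each takes every
    # n_workers-th event, as a real worker thread might.
    for worker_id in range(n_workers):
        for event_id in range(worker_id, n_events, n_workers):
            results[event_id] = simulate_event(seeds[event_id])
    return results
```

Because an event never draws from a generator shared with other events, run(42, 20, 1) and run(42, 20, 4) return exactly the same list.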
Multi-threading: 10.0-beta performance - 1/3
Showing good efficiency w.r.t. perfect linearity (90%, 80% in HT)
No measured CPU degradation vs. sequential runs (*)
(*) Based on performance analysis by S.Yung Jun, FNAL on AMD Opteron™ 6128, 32 cores
[Figure: preliminary speed-up plot, CMS geometry, e-, Intel® Xeon® CPU L5520 @ 2.27 GHz, with the HT regime marked]
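The efficiency figures quoted on this slide compare the measured speed-up with perfect linearity. A minimal sketch of that metric follows, assuming speed-up is the sequential wall time divided by the parallel wall time; the function names are illustrative and no timing from the slides is used.

```python
def speedup(t_sequential, t_parallel):
    """Speed-up of a parallel run relative to the sequential run."""
    return t_sequential / t_parallel

def efficiency(t_sequential, t_parallel, n_threads):
    """Fraction of perfect linear scaling achieved (1.0 means perfectly linear)."""
    return speedup(t_sequential, t_parallel) / n_threads
```

With hypothetical timings of 1000 s sequential and 139 s on 8 threads, efficiency(1000.0, 139.0, 8) is about 0.90, i.e. 90% of perfect linearity.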
Multi-threading: 10.0-beta performance - 2/3
Hybrid mode: Host + Intel® Xeon Phi™ coprocessor (MIC)
First look at total throughput (evt/s) (*)
• Excellent results: factor of ~3 in events produced w.r.t. host only
Confirmed good scalability up to O(100) threads
Reduced use of memory (see next slide)
(*) Preliminary analysis on full-CMS benchmark by A.Dotti, SLAC
Multi-threading: 10.0-beta performance - 3/3
Hybrid mode: Host + Intel® Xeon Phi™ coprocessor
Using out-of-the-box 10.0-beta (i.e. no optimisations)
• Baseline: Full-CMS benchmark; 200 MB (geometry and physics)
• 40 MB/thread
Speedup almost linear with reasonably small increase of memory usage
(*) Preliminary analysis on full-CMS benchmark by A.Dotti, SLAC
[Figure: memory usage (MB) vs. number of threads]
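The two numbers on this slide suggest a simple linear memory model, sketched below. The assumptions are that the 200 MB baseline is paid once and that each thread adds 40 MB. The comparison with independent sequential processes, each holding its own copy of the baseline, is a hypothetical illustration and not a measurement from the slides.

```python
BASELINE_MB = 200.0    # geometry and physics, shared by all threads (from the slide)
PER_THREAD_MB = 40.0   # increment per worker thread (from the slide)

def mt_memory_mb(n_threads):
    """Assumed linear model: one shared baseline plus a per-thread part."""
    return BASELINE_MB + PER_THREAD_MB * n_threads

def independent_processes_mb(n_processes):
    """Hypothetical comparison: every process duplicates the baseline."""
    return n_processes * (BASELINE_MB + PER_THREAD_MB)
```

Under these assumptions 100 threads need mt_memory_mb(100) = 4200 MB, whereas 100 independent processes would need 24000 MB.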
Multi-threading: First physics validation results…
20 GeV proton on W-LAr
FTFP_BERT physics-list
Sequential: 5000 events
Multi-threaded: 20000 events
• 4 threads
• results for 1 thread shown
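Since the sequential and multi-threaded samples differ in size (5000 vs. 20000 events), histograms have to be normalised per event before they can be compared. The sketch below shows one way to do this with Poisson errors; it is an assumption about how such a comparison could be made, not the procedure used for the slide, and per_event and max_pull are hypothetical names.

```python
import math

def per_event(counts, n_events):
    """Normalise bin counts to 'per event' and attach Poisson errors."""
    return [(c / n_events, math.sqrt(c) / n_events) for c in counts]

def max_pull(counts_a, n_a, counts_b, n_b):
    """Largest |difference| / combined error over all bins."""
    pulls = []
    pairs = zip(per_event(counts_a, n_a), per_event(counts_b, n_b))
    for (a, err_a), (b, err_b) in pairs:
        err = math.hypot(err_a, err_b)
        if err > 0:
            pulls.append(abs(a - b) / err)
    return max(pulls, default=0.0)
```

A small max_pull (of order one) indicates that the two runs agree within statistical fluctuations; a large value points to a real difference.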
Multi-threading: Next to come …
Review and further refinements to API
• Based on feedback from users and Beta testers
• Rationalisation and better modularisation of code for the initialisation of threads
• Aiming to further simplify user-code migration
Address and solve current limitations & problems
Improve testing coverage
Further improve performance
• Identify and solve hotspots
• Use of thread-private malloc (to remove hidden locks in new/delete)
Further investigations on task-based parallelism (TBB)
• TBB works already with Geant4-MT
• Provide one or more examples based on the new API
Study heterogeneous parallelism (MPI together with multi-threading)
• Use in hybrid systems (host + one [or more] MIC card)
• Adoption of check-pointing technique (DMTCP) to improve start-up time
Electromagnetic Physics
Electromagnetic Physics: 10.0-beta features - 1/2
Reviewed e+e- pair-production model
• Corrected inaccuracy of interpolation to improve e+e- spectra
• Implemented larger table
Consolidation of multiple-scattering models
• Fixed long-standing issue with Urban93
• Fine tuning of Urban96, now the most accurate for e- transport
• Updated WentzelVI and Single-scattering models, to use different screening parameters for e+e- and heavy particles; now the most accurate model for muons and hadrons
Developed new validation tests
• Fluctuation models; tracker devices; electron ionisation
Electromagnetic Physics: 10.0-beta features - 2/2
[Figure: ATLAS-barrel type calorimeter]
Electromagnetic Physics: Next to come …
Implement full sharing of EM tables among threads
Complete analysis of possible alternative models for fluctuation of energy loss
Refinements to PAI model to be ready for use in production
Refinements to effective charge approach for ion ionisation
Combination of models for multiple and single scattering of hadrons
• Taking into account interference between Coulomb and strong amplitudes
• Summer student project
Improvements to muon-nuclear cross-sections
Hadronic Physics
Hadronic Physics: 10.0-beta features - 1/3
Fritiof string model (FTF)
• Extension to nucleus-nucleus interactions
• FTF (from ~3 GeV to ~TeV) together with QMD, INCL++ or BIC (below a few GeV) now makes it possible to simulate ion-ion collisions for the first time in Geant4
• Extended validation of hadron-nucleus interactions
• Improved excitation energies of nuclear residuals
Bertini-like intra-nuclear cascade model (BERT)
• Improved two-body angular distributions: effect on lateral shapes of hadronic showers
• Added nuclear capture of muons (which generate cascade)
• Revised treatment of cascade kinematics
Hadronic Physics: 10.0-beta features - 2/3
SAID calculations of cos(θ) distributions for p-p elastic scattering
• Histograms fitted for interpolation (binned distribution of cos(θ) vs. Ekin)
Below: hadronic showers in simplified calorimeters
• Wider lateral shapes
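To show how a binned cos(θ) distribution like the one mentioned above can be turned into sampled scattering angles, here is a minimal inverse-CDF sketch. It is a generic technique, not the fit-based interpolation used in Geant4; make_sampler is a hypothetical name, and one such sampler per Ekin bin is assumed.

```python
import bisect
import random

def make_sampler(bin_edges, bin_contents):
    """Build an inverse-CDF sampler for a binned distribution."""
    total = float(sum(bin_contents))
    cdf = []
    running = 0.0
    for content in bin_contents:
        running += content / total
        cdf.append(running)
    cdf[-1] = 1.0  # guard against rounding in the last bin

    def sample(rng):
        u = rng.random()
        # First bin whose cumulative probability exceeds u.
        i = bisect.bisect_right(cdf, u)
        low = cdf[i - 1] if i > 0 else 0.0
        # Linear interpolation inside the selected bin.
        frac = (u - low) / (cdf[i] - low)
        return bin_edges[i] + frac * (bin_edges[i + 1] - bin_edges[i])

    return sample
```

For example, make_sampler([-1.0, 0.0, 1.0], [1, 3]) returns a function that yields forward angles (cos(θ) > 0) about three times out of four.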
Hadronic Physics: 10.0-beta features - 3/3
Pre-Compound and nuclear de-excitation
• Introduced the possibility to produce isomers
Neutron High Precision (HP)
• Allow reading compressed data files
INCL++
• Improvement in the nucleus-nucleus sector
Removed deprecated CHIPS classes and modules
Major restructure of Physics Lists module and sub-modules
• Replaced residual dependencies on LEP/HEP models (parameterised, Geisha-like) in QGS-based lists with FTFP-BERT
• FTF will be replaced by QGS when this is extended to lower energies
• Removed several deprecated lists and added new INCL++-based physics-lists
Hadronic Physics: Next to come …
Refinements to diffraction and code improvements to FTF
Validation of pre-compound for all de-excitation in Bertini and enhancements to model for in-medium N-N cross-sections, physical unit parameters and coalescence
Neutron High-Precision (HP): model review and extension to generic particles; extended validation of libraries and further comparisons with MCNP
Cross-sections: extended validation and new general comprehensive test-suite
Revision of Pre-Compound and Radioactive-decay for isomers
Geometry & more…
Geometry: 10.0-beta features
Replaced UI commands for geometry overlaps check
• Now based on built-in overlaps checking for random points generated on solids’ surfaces
Using precise safety computation by default in navigation
Archived obsolete BREPs classes and module
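The idea behind the overlaps check mentioned above (generate random points on a solid's surface and test them against another solid) can be sketched with two spheres. This is an illustration only: it covers just the case of two sibling volumes, all function names are hypothetical, and it does not reproduce the Geant4 implementation.

```python
import math
import random

def random_point_on_sphere(center, radius, rng):
    """Uniform point on a sphere surface, from a normalised Gaussian triplet."""
    while True:
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 0.0:
            break
    return tuple(c + radius * v / norm for c, v in zip(center, (x, y, z)))

def inside_sphere(point, center, radius, tolerance=0.0):
    """True if the point lies strictly inside the sphere, within a tolerance."""
    return math.dist(point, center) < radius - tolerance

def overlaps(sphere_a, sphere_b, n_points=1000, seed=1):
    """True if any sampled surface point of one sphere lies inside the other."""
    rng = random.Random(seed)
    for (c1, r1), (c2, r2) in ((sphere_a, sphere_b), (sphere_b, sphere_a)):
        for _ in range(n_points):
            point = random_point_on_sphere(c1, r1, rng)
            if inside_sphere(point, c2, r2):
                return True
    return False
```

Being a sampling method, the check can miss very small overlaps; raising n_points lowers that risk at the cost of run time.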
Geometry: Next to come …
Integration of the AIDA Unified Solids library
• To be included as an optional component, for replacing the original solids
More features in 10.0-beta…
Automatically generating isotope vector with natural abundances for NIST materials
Variables shadowing …
Units & constants inclusion
Enhanced CMake build system
Redesigned examples (basic & extended)
• Several examples migrated to support multi-threading
New data sets
• G4EMLOW-6.33
• G4NDL-4.3
• G4NEUTRONXS-1.3
• G4RadioactiveDecay-3.7
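As an illustration of what an isotope vector with natural abundances (first item of this slide) represents, the sketch below stores the isotopes of carbon and derives the mean atomic mass of the natural element. The data layout and function names are hypothetical, and the masses and abundances are approximate reference values, not values read from the Geant4 NIST tables.

```python
# (mass number, atomic mass in u, natural abundance); approximate values for carbon
CARBON = [
    (12, 12.000000, 0.9893),
    (13, 13.003355, 0.0107),
]

def abundances_are_normalised(isotopes, tol=1e-6):
    """Natural abundances of one element should sum to 1."""
    return abs(sum(fraction for _, _, fraction in isotopes) - 1.0) < tol

def mean_atomic_mass(isotopes):
    """Abundance-weighted mean atomic mass of the natural element (u)."""
    return sum(mass * fraction for _, mass, fraction in isotopes)
```

For the values above, mean_atomic_mass(CARBON) gives roughly 12.011 u, the familiar atomic mass of natural carbon.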
Physics Validation
Enhanced tool for presentation of results
• Allowing for easy extension with new validation tests
Planned to complement validation on the GRID with new tests
• Tracker test
• LAr electro-magnetic calorimeter test
• Thin-target performance test
Supported platforms: Geant4 10.0-beta
Linux SLC6, gcc-4.4.7, 4.3.x, 64 bits
MacOSX 10.7, 10.8, gcc-4.2.1, 64 bits
Windows 7, Visual C++ 10.0 (Visual Studio 2010)
• Multi-threading not ported yet!
Also tested:
• Linux SLC5, gcc-4.7, gcc-4.8, icc-13
• Linux Ubuntu 12, gcc-4.6
• Windows 7, VC++ 9.0 (no MT port yet)
Summary
Release 10.0-beta introduces ‘optional’ event-level parallelism through use of independent working threads
• Excellent scalability vs. #threads up to O(100) threads, with no performance penalty vs. sequential mode
• First physics validation tests are positive
• We’re on track!
• … but still quite some work ahead of us to the final release for further improvements in testing coverage, performance and API optimisation
Lots of new features in all areas and more to come before the final release in December
• Improved physics validation testing suite
Notes: http://geant4.cern.ch/support/Beta4.10.0-1.txt
Work plan: http://geant4.cern.ch/support/planned_features.shtml
Special thanks to A.Dotti (SLAC), S.Y.Jun (FNAL), M.Gayer, V.Ivantchenko,G.Lestaris, and A.Ribon for providing most of the material presented in these slides!