many-core scalability of the online event reconstruction in the cbm experiment

11
Many-Core Scalability Many-Core Scalability of the Online Event Reconstruction of the Online Event Reconstruction in the CBM Experiment in the CBM Experiment Ivan Kisel Ivan Kisel GSI, Germany GSI, Germany (for the CBM Collaboration) (for the CBM Collaboration) CHEP-2010 CHEP-2010 Taipei, October 18-22, 2010 Taipei, October 18-22, 2010

Upload: holly-adams

Post on 30-Dec-2015

23 views

Category:

Documents


0 download

DESCRIPTION

Many-Core Scalability of the Online Event Reconstruction in the CBM Experiment. Ivan Kisel GSI, Germany (for the CBM Collaboration). CHEP-2010 Taipei, October 18-22, 2010. Tracking Challenge in CBM (FAIR/GSI, Germany). Fixed-target heavy-ion experiment 10 7 Au+Au collisions/s - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Many-Core Scalability of the Online Event Reconstruction in the CBM Experiment

Many-Core ScalabilityMany-Core Scalabilityof the Online Event of the Online Event

ReconstructionReconstructionin the CBM Experimentin the CBM Experiment

Ivan KiselIvan KiselGSI, GermanyGSI, Germany

(for the CBM Collaboration)(for the CBM Collaboration)

CHEP-2010CHEP-2010Taipei, October 18-22, 2010Taipei, October 18-22, 2010

Page 2: Many-Core Scalability of the Online Event Reconstruction in the CBM Experiment

21 October 2010, CHEP-201021 October 2010, CHEP-2010 Ivan Kisel, GSIIvan Kisel, GSI 22/11/11

Tracking Challenge in CBM (FAIR/GSI, Germany)

• Fixed-target heavy-ion experiment• 107 Au+Au collisions/s• 1000 charged particles/collision• Non-homogeneous magnetic field• Double-sided strip detectors (85% combinatorial space points)

Track reconstruction in STS/MVD and displaced vertex search are required in the first trigger level.

Reconstruction packages:• track finding Cellular Automaton (CA)• track fitting Kalman Filter (KF)• vertexing KF Particle

Page 3: Many-Core Scalability of the Online Event Reconstruction in the CBM Experiment

21 October 2010, CHEP-201021 October 2010, CHEP-2010 Ivan Kisel, GSIIvan Kisel, GSI 33/11/11

Many-Core CPU/GPU ArchitecturesMany-Core CPU/GPU Architectures

2x4 cores 512 cores

1+8 cores32 cores

Intel/AMD CPU NVIDIA GPU

Intel MICA IBM Cell

Future systems are heterogeneousFuture systems are heterogeneous

Page 4: Many-Core Scalability of the Online Event Reconstruction in the CBM Experiment

21 October 2010, CHEP-201021 October 2010, CHEP-2010 Ivan Kisel, GSIIvan Kisel, GSI 44/11/11

Kalman Filter (KF) based Track FitKalman Filter (KF) based Track Fit

Track fit: Estimation of the track parameters at one or more hits along the track – Kalman-Filter (KF)

Detector layersHits

(r, C)

r – Track parameters C – Precision

Initialising

Prediction

Correction

Precision

11

22

33

r = { x, y, z, px, py, pz }

Position, direction and momentum

State vector

Nowadays the Kalman Filter is used in almost all HEP experiments

Nowadays the Kalman Filter is used in almost all HEP experiments

Kalman Filter: 1.Start with an arbitrary initialization.2.Add one hit after another. 3.Improve the state vector. 4.Get the optimal parameters after the last hit.

KF as a recursive least squares method

KF Block-diagram 11

22 33

Page 5: Many-Core Scalability of the Online Event Reconstruction in the CBM Experiment

21 October 2010, CHEP-201021 October 2010, CHEP-2010 Ivan Kisel, GSIIvan Kisel, GSI 55/11/11

Performance of the KF Track Fit on CPU/GPU Systems

scalar double single ->2 4 8 16 32

1.00

10.00

0.10

0.01

Scalability on different CPU architectures – speed-up 100

Data Stream Parallelism(10x)

Task Level Parallelism(100x)

2xCell SPE (16 )

Woodcrest ( 2 )Clovertown ( 4 )

Dunnington ( 6 )

SIMD Cores and Threads

Tim

e/T

rack

, s

Threads

Cores

Threads

SIMD

Real-time performance on different Intel CPU platforms Real-time performance on NVIDIA GPU graphic cards

Scalabilty

CPUGPU

The Kalman Filter algorithm performs at ns levelThe Kalman Filter algorithm performs at ns level

Page 6: Many-Core Scalability of the Online Event Reconstruction in the CBM Experiment

21 October 2010, CHEP-201021 October 2010, CHEP-2010 Ivan Kisel, GSIIvan Kisel, GSI 66/11/11

Scalability of the KF Track Fit on Scalability of the KF Track Fit on 2xNehalem = 8 Cores2xNehalem = 8 Cores

Aiming to develop a scheduler, we investigate scalability and CPU load.

Given n threads each filled with 10m tracks, run them on specific n logical cores with 1 thread per 1 logical core.

Each physical core of Nehalem has 2 HW threads (read/exe), which are considered by the operating system as logical cores.

For small groups of tracks the overhead becomes significant, while large groups of tracks use CPU more efficient.

Measure tracks throughput rather than time per track.

Core 1Core 2

Page 7: Many-Core Scalability of the Online Event Reconstruction in the CBM Experiment

21 October 2010, CHEP-201021 October 2010, CHEP-2010 Ivan Kisel, GSIIvan Kisel, GSI 77/11/11

Cellular Automaton (CA) as Track FinderCellular Automaton (CA) as Track Finder

Track finding: Wich hits in detector belong to the same track? – Cellular Automaton (CA)

0. Hits

1. Segments

1 2 3 42. Counters

3. Track Candidates

4. Tracks

Cellular Automaton:• local w.r.t. data• intrinsically parallel• extremely simple• very fast

Perfect for many-core CPU/GPU !

Cellular Automaton:• local w.r.t. data• intrinsically parallel• extremely simple• very fast

Perfect for many-core CPU/GPU !

Detector layers

Hits

4. Tracks (CBM)

0. Hits (CBM)

1000 Hits

1000 Tracks

Cellular Automaton:1.Build short track segments.2.Connect according to the track model, estimate a possible position on a track.3.Tree structures appear, collect segments into track candidates.4.Select the best track candidates.

Page 8: Many-Core Scalability of the Online Event Reconstruction in the CBM Experiment

21 October 2010, CHEP-201021 October 2010, CHEP-2010 Ivan Kisel, GSIIvan Kisel, GSI 88/11/11

CBM Cellular Automaton Track Finder770

TracksTop view Front view

Efficiency Scalability

Highly efficient reconstruction of 150 central collisions per second

Highly efficient reconstruction of 150 central collisions per second

Intel X5550, 2x4 cores at 2.67 GHz

Page 9: Many-Core Scalability of the Online Event Reconstruction in the CBM Experiment

21 October 2010, CHEP-201021 October 2010, CHEP-2010 Ivan Kisel, GSIIvan Kisel, GSI 99/11/11

CA Track Finder: First 4D Version

The beam will have no bunch structure, therefore 4D measurements (x, y, z, t).

• 100 mbias events have been divided into groups, each of them has been processed as a single event.• Each hit had an integer index (equal to the event number, later will represent the time measurement t ).• Each tracklet was constructed from hits with the same t.

The same efficiency. Slight increase of the processing time with larger time slices.

Reconstruction of time slices rather then events will be needed.

Ready for more realistic 4D simulations

Ready for more realistic 4D simulations

Events/group Speed, ev/s

1 62

2 67

4 62

5 61

10 54

20 47

Page 10: Many-Core Scalability of the Online Event Reconstruction in the CBM Experiment

21 October 2010, CHEP-201021 October 2010, CHEP-2010 Ivan Kisel, GSIIvan Kisel, GSI 1010/11/11

Consolidate Efforts: International Tracking Workshop

45 participants from Austria, China, Germany, India, Italy, Norway, Russia, Switzerland, UK and USA

Follow-up Workshop:10-11 March, 2011,

CERNSverre Jarp

Follow-up Workshop:10-11 March, 2011,

CERNSverre Jarp

Page 11: Many-Core Scalability of the Online Event Reconstruction in the CBM Experiment

21 October 2010, CHEP-201021 October 2010, CHEP-2010 Ivan Kisel, GSIIvan Kisel, GSI 1111/11/11

SummarySummary

• CBM event reconstruction is at the ms levelCBM event reconstruction is at the ms level• Strong many-core scalability Strong many-core scalability • Data parallelism vs. task parallelismData parallelism vs. task parallelism• Use SIMD unit, multi-threading, many-coresUse SIMD unit, multi-threading, many-cores• Parallelization is now becoming a standard in the CBM experimentParallelization is now becoming a standard in the CBM experiment