


AIDA-2020-CONF-2017-002

AIDA-2020: Advanced European Infrastructures for Detectors at Accelerators

Conference/Workshop Paper

Tracking at LHC as a collaborative data challenge use case with RAMP

Amrouche, Sabrina (University of Geneva, Switzerland) et al

08 August 2017

The AIDA-2020 Advanced European Infrastructures for Detectors at Accelerators project has received funding from the European Union's Horizon 2020 Research and Innovation programme under Grant Agreement no. 654168.

This work is part of AIDA-2020 Work Package 3: Advanced software.

The electronic version of this AIDA-2020 Publication is available via the AIDA-2020 web site <http://aida2020.web.cern.ch> or on the CERN Document Server at the following URL:

<http://cds.cern.ch/search?p=AIDA-2020-CONF-2017-002>

Copyright © CERN for the benefit of the AIDA-2020 Consortium


EPJ Web of Conferences (volume to be set by the publisher)
DOI: to be set by the publisher
© Owned by the authors, published by EDP Sciences, 2017

Track reconstruction at LHC as a collaborative data challenge use case with RAMP

Sabrina Amrouche1, Nils Braun2, Paolo Calafiura3, Steven Farrell3, Jochen Gemmler2, Cécile Germain4, Vladimir Vava Gligorov5, Tobias Golling1, Hadrien Grasland6, Heather Gray3, Isabelle Guyon7, Mikhail Hushchyn8, Vincenzo Innocente9, Balázs Kégl6,10, Sara Neuhaus11, David Rousseau6, Andreas Salzburger9, Andrei Ustyuzhanin8,12, Jean-Roch Vlimant13, Christian Wessel14, and Yetkin Yilmaz6,10

1University of Geneva, Switzerland
2Karlsruhe Institute of Technology, Germany
3Lawrence Berkeley National Laboratory, CA, USA
4LAL and LRI, Orsay, France
5LPNHE, Paris, France
6LAL, Univ. Paris-Sud, CNRS/IN2P3, Université Paris-Saclay, Orsay, France
7LRI and Université Paris-Saclay, France
8Yandex School of Data Analysis (YSDA), Moscow, Russia
9CERN, Geneva, Switzerland
10Center for Data Science, Université Paris-Saclay, France
11Max Planck Institute for Physics, Munich, Germany
12NRU Higher School of Economics (HSE), Moscow, Russia
13California Institute of Technology, CA, USA
14University of Bonn, Germany

Abstract. Charged particle track reconstruction is a major component of data processing in high-energy physics experiments such as those at the Large Hadron Collider (LHC), and is foreseen to become increasingly challenging with higher collision rates. A simplified two-dimensional version of the track reconstruction problem is set up on a collaborative platform, RAMP, in order for developers to prototype and test new ideas. A small-scale competition was held during the Connecting The Dots / Intelligent Trackers 2017 (CTDWIT 2017) workshop. Despite the short time scale, a number of different approaches were developed and compared along a single score metric, which was kept generic enough to summarize performance in terms of both efficiency and fake rate.

1 Data challenges and RAMP

Advances in machine-learning technology are driving a new paradigm in data-processing design, in which the role of the individual will have to be redefined. In order to integrate human creativity with computational resources in an efficient way, various data-challenge platforms have been emerging, which provide a competitive and/or collaborative environment to crowd-source analysis expertise.


Rapid Analytics and Model Prototyping (RAMP) [1] is an online data-challenge platform, primarily motivated by the need for a more systematic integration of development. It aims to incorporate specific types of interactions between developers during the design process, and seeks insight into the evolution of the design. What distinguishes RAMP among the various data-challenge platforms is its focus on the progress rather than the result.

RAMP hosts many different types of data challenges in addition to the track reconstruction challenge; any user can subscribe to each of them and submit several "solutions" to the formulated "problem". The problems can be of different types, such as classification, regression or clustering, in a variety of contexts such as chemistry, biology or climate research [1]. The distinguishing feature of RAMP is that the submission is the "code" to solve the problem, which runs remotely in a fully modularized workflow. For a given problem the users can implement methods for training and predicting, with any number of intermediate steps, as long as the methods return the expected format for the prediction. A screenshot of the submission sandbox is shown in Figure 1. The submissions are written in Python and can use common libraries such as numpy [2], pandas [3] and scikit-learn [4].

Figure 1. Screenshot of the RAMP sandbox window, with a submission based on a basic DBSCAN clustering method [5]. Code submissions are made either by editing code on the page directly or by uploading the relevant files.

In order to test their code before submitting, users are provided with a limited portion of the dataset that contains the relevant input and output columns, on which they can perform a quick cross-validation. The submitted code is then run in the backend to train and test on a larger dataset, and the results are displayed in an online table. Such modularization of the workflow allows the algorithms to be trained and tested on different datasets even after the data challenge, which is the topic of Section 6.

A typical hackathon with RAMP contains two phases: a closed phase and an open phase. In the closed phase, participants cannot access each other's submissions (although they may discuss among themselves), so each individual implements new code. In the open phase, participants can see and download each other's code, and make new submissions building on each other's ideas.


Taking into account the complexity of the track reconstruction problem and the short time period, the hackathon event during the CTDWIT 2017 workshop at LAL-Orsay started directly with the open phase in order to bootstrap the participants' first submissions. This open phase also included a few test algorithms (without machine-learning techniques) as guiding baselines.

2 Use case: Track reconstruction at the LHC experiments

Track reconstruction is one of the most important tasks in a high-energy physics experiment, as it provides high-precision position and momentum information for charged particles. Such information is crucial for a diverse variety of physics studies, from Standard Model tests to new particle searches, which require robust low-level information that can be further refined for a narrower and more specific physics context. Throughout the history of high-energy physics, there have been many different types of tracking detectors with very different design principles: from bubble chambers to time-projection chambers, from proportional counters to spark chambers, and more recently silicon detectors. Although each of them provided a different data topology, they all relied on the same principle: the small energy deposits of charged particles in well-defined locations, with particle trajectories being bent in an externally applied magnetic field. Future developments, like the High-Luminosity LHC [6, 7], which introduces a factor 10 increase in track multiplicity with respect to current LHC conditions, will make track finding very difficult.

A track reconstruction challenge [8] with a rather complete LHC detector simulation is being prepared by a number of co-authors and others. For the limited time frame of this workshop, however, we propose a challenge in which the track finding problem is reduced to two dimensions and the topology of silicon detectors is emulated, such that there are few locations along the radius and very high precision along the azimuth. Such a topology helps simplify track reconstruction to layer-by-layer azimuth determination; we nevertheless hope it leaves room for further innovation as well.

3 Simulation

The challenge uses a simplified 2D detector model representative of an LHC detector with a toy model for particle generation and simulation, with sufficient complexity to make it non-trivial; the source code is available in [9]. The detector consists of a plane with nine concentric circular layers with radii between 3.9 cm and 1 m, with a distribution typical of a full silicon LHC detector. A homogeneous magnetic field of 2 Tesla perpendicular to the plane bends the charged particles, so that a particle of momentum P (in MeV, with the convention c = 1) has a circular trajectory of radius R = P/0.3 (in millimeters). Layers have negligible thickness and each layer is segmented in azimuth with fine granularity. There are 2πR/pitch pixels in every layer, where the pitch is 0.025 mm for the innermost layers 1-5 and 0.05 mm for the outermost layers 6-9. Every "pixel" corresponds to the smallest detector unit, defined by layer and azimuthal index.
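As an illustration, the geometry above can be sketched in a few lines. The layer radii below are illustrative values consistent with the stated 3.9 cm to 1 m range; the exact radii are in the simulation source [9].

```python
import math

# Nine concentric layers between 3.9 cm and 1 m (illustrative radii in mm;
# the exact values used by the simulation [9] may differ).
LAYER_RADII_MM = [39, 85, 155, 213, 271, 405, 562, 762, 1000]

# Pitch: 0.025 mm for the five innermost layers, 0.05 mm for the four outermost.
PITCHES_MM = [0.025] * 5 + [0.05] * 4

def trajectory_radius_mm(p_mev):
    """Bending radius in the 2 T field, R = P / 0.3, following the
    convention used in the text (P in MeV, R in mm)."""
    return p_mev / 0.3

def pixels_in_layer(layer_index):
    """Number of azimuthal pixels in a layer: 2*pi*R / pitch, rounded down."""
    r = LAYER_RADII_MM[layer_index]
    return int(2 * math.pi * r / PITCHES_MM[layer_index])
```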

A particle crossing a layer activates one pixel with a probability of 97%. The location of this activated pixel is called a hit in the following. Since the detector has very fine granularity in azimuth, cases where two particles pass through a single pixel are rare (less than 0.2% probability). As a very simple simulation of hadronic interactions, a particle has a 1% probability of stopping at each layer. Given all this, participants cannot assume that a particle leaves a hit in every layer.

A charged particle crossing matter accumulates many small random deviations due to interactions with atomic nuclei. This multiple scattering phenomenon is simulated by attributing to each layer a radiation length x0 of 2%, such that a particle crossing a layer is deviated by an angle drawn from a Gaussian centered at zero and of width σMS = 13.6√x0/P, in radians. Thus trajectories deviate from perfect circular arcs. The energy loss is neglected, so that particles keep the same absolute momentum, hence the same local curvature, throughout the trajectory.

For each event, a Poisson distribution is sampled to determine the number of particles, with an average of 10 particles per event (three orders of magnitude less than what is expected at the HL-LHC). The particles are sampled uniformly in azimuth and momentum, with the momentum bounded between 300 MeV and 10000 MeV. The lowest momentum is such that particles cannot loop in the detector. Each particle originates from a different vertex that is randomly sampled from a normal distribution around the origin of width 0.667 mm, as a simple simulation of particles coming from B-hadrons or τ-leptons, so that particles cannot be assumed to come exactly from the origin. No additional random background hits are simulated.
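A minimal sketch of this generation step, assuming the parameters quoted above (the function name and structure are ours, not the actual simulation code of [9]):

```python
import numpy as np

def generate_event(rng, mean_particles=10, p_range=(300.0, 10000.0),
                   vertex_sigma_mm=0.667):
    """Sample the generator-level quantities described in the text:
    Poisson particle count, uniform azimuth and momentum, and a
    Gaussian-smeared per-particle vertex."""
    n = rng.poisson(mean_particles)
    phi = rng.uniform(0.0, 2.0 * np.pi, size=n)             # uniform in azimuth
    p = rng.uniform(p_range[0], p_range[1], size=n)         # uniform in momentum (MeV)
    vertex = rng.normal(0.0, vertex_sigma_mm, size=(n, 2))  # per-particle origin (mm)
    return phi, p, vertex
```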

The data provided to the participants is one csv file with one line per hit, corresponding to 10000 events, while a separate 10000-event file is kept private for the evaluation of the submissions by the RAMP platform. Hits are grouped per event in a random order. The following features are provided:

event number : arbitrary index, unique identifier of the event

particle number : arbitrary index, unique identifier of the ground-truth particle having caused that hit. This number is hidden from the prediction method.

hit identifier : index of the hit in the event (implicit variable)

layer number : index of the layer where the hit occurs, from 1, the innermost layer, to 9, the outermost layer

phi number : index of the pixel corresponding to the hit, counting clockwise, from 0, at φ = 0, to the maximum, which depends on the layer

x : horizontal coordinate of the hit

y : vertical coordinate of the hit

It should be noted that although one can easily translate layer and phi information into x and y (and vice versa), both are provided to save participants time. The user code should return, for each event of the test dataset, a track number for each hit, the track number being an arbitrary number, unique within the event, corresponding to a cluster of hits.
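A submission therefore reduces to a per-event clustering function. As a hedged sketch, in the spirit of the DBSCAN baseline shown in Figure 1 (the function name and the eps/min_samples values are illustrative assumptions, not the RAMP interface itself):

```python
import numpy as np
from sklearn.cluster import DBSCAN

def predict_event(x, y):
    """Assign a track number to every hit of one event by density-based
    clustering of (x, y) positions [5]. The eps (mm) and min_samples
    values are illustrative, not tuned for the challenge data."""
    hits = np.column_stack([x, y])
    # fit_predict returns one integer label per hit; -1 marks noise hits
    return DBSCAN(eps=30.0, min_samples=3).fit_predict(hits)
```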

4 Scoring

The physics programs of the LHC experiments span a diverse range of topics, each with a different selection and use of data. Although tracks are used in almost all analysis applications, they are employed in very different ways: sometimes as an important part of the signal, sometimes as an indicator of background. The overall goal of track reconstruction is to achieve the best correspondence between particles and tracks, where the term "particle" refers to the true particles and "track" refers to the information deduced from the detector by the reconstruction algorithm. This correspondence is never perfect: sometimes a particle is missed by the algorithm, which is referred to as a lack of efficiency, and sometimes a track is reconstructed out of a coincidental alignment of hits without corresponding to a genuine particle, which is referred to as a fake.

Depending on the analysis goals, one may be focused on very different aspects of the reconstruction: some analyses need the minimum combinatorial fake rate, some need the highest efficiency or the best momentum resolution. Ultimately, the figure of merit is defined by the systematic uncertainty on the measured quantity which, as stated, varies for every study. It is therefore difficult to define in advance a perfect track reconstruction score that fits all.

The score used in the hackathon was based on the fraction of hits that are correctly matched to a particle in a given event, defined more precisely as follows:

1. Loop over particles to associate each particle to a set of tracks

2. Choose the track with the most hits from the particle, and associate it to the particle. The result will be many (tracks) to one (particle).

3. Loop over tracks and check which tracks are matched to the same particle.

4. In the case of multiple matching, remove all tracks (and their corresponding hits) from the list except the one with the most hits from the particle.

5. Count the number of remaining hits (which are all associated to tracks by construction), and divide it by the initial number of hits.

6. Assign the above as the score of the event, and average it over all events.
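The per-event part of these steps can be sketched as follows (a simplified re-implementation for illustration; the reference score function is the one shipped with the starting kit):

```python
import numpy as np

def event_score(particle_ids, track_ids):
    """Fraction of hits correctly matched in one event, following the
    steps listed in the text (sketch, not the starting-kit code)."""
    particle_ids = np.asarray(particle_ids)
    track_ids = np.asarray(track_ids)
    # Steps 1-2: for each particle, pick the track containing most of its hits.
    best = {}  # particle -> (matched track, shared-hit count)
    for p in np.unique(particle_ids):
        tracks, counts = np.unique(track_ids[particle_ids == p],
                                   return_counts=True)
        best[p] = (tracks[np.argmax(counts)], counts.max())
    # Steps 3-4: if several particles picked the same track, keep only the
    # pairing with the most shared hits.
    kept = {}  # track -> shared-hit count of the surviving pairing
    for p, (t, n) in best.items():
        if t not in kept or n > kept[t]:
            kept[t] = n
    # Step 5: remaining matched hits divided by the total number of hits.
    return sum(kept.values()) / len(particle_ids)
```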

The code for the score function is provided to the participants as part of the starting kit. Figure 2 illustrates the scoring procedure on a single event. Every event is scored separately, and the final result is averaged over events (all events having equal weight, regardless of the number of hits).

Figure 2. Illustration of the scoring procedure. The first plot shows the ground truth and the middle plot is the reconstructed output, after indices of tracks (represented by colors) are matched to the input. The black dots in the last plot show the hits that are considered to be correctly assigned. For this particular event, the score is 0.81.

It is important to remark that one can think of analysis tasks where the trade-off between an optimal fake rate and efficiency runs contrary to the relative contribution of these properties to the clustering score. The correspondence between the RAMP score and the actual track reconstruction efficiency and fake rate was studied for various algorithms and various samples. Here the efficiency is defined as the number of well-reconstructed tracks divided by the number of true particles, where well-reconstructed means that at least 70% of the hits belonging to the same particle are clustered successfully. Similarly, the fake rate is defined as the number of fake tracks divided by the number of true particles, where a fake track is defined as a cluster of at least 5 hits in which no more than 70% of the hits belong to the same true particle.
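These two definitions can be sketched directly from the truth and cluster labels (an illustrative implementation, not the code used for the study):

```python
import numpy as np

def efficiency_and_fake_rate(particle_ids, track_ids,
                             purity_cut=0.70, min_hits=5):
    """Efficiency and fake rate as defined in the text: a particle is
    well-reconstructed if some track contains >= 70% of its hits; a track
    of at least 5 hits is fake if no particle owns more than 70% of its
    hits. Both are normalized to the number of true particles."""
    particle_ids = np.asarray(particle_ids)
    track_ids = np.asarray(track_ids)
    particles = np.unique(particle_ids)
    n_reco = 0
    for p in particles:
        sel = track_ids[particle_ids == p]
        _, counts = np.unique(sel, return_counts=True)
        if counts.max() >= purity_cut * len(sel):
            n_reco += 1
    n_fake = 0
    for t in np.unique(track_ids):
        sel = particle_ids[track_ids == t]
        if len(sel) >= min_hits:
            _, counts = np.unique(sel, return_counts=True)
            if counts.max() <= purity_cut * len(sel):
                n_fake += 1
    return n_reco / len(particles), n_fake / len(particles)
```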


In Figure 3, several submissions of a specific algorithm are plotted, in which the participant only knows the score and tries to optimize it with every update. According to this study, an improvement in the score represents a monotonic improvement in both the efficiency and the fake rate; the score is therefore reliable when upgrading the algorithms to achieve both better efficiency and fewer fakes. The robustness of the score function, in terms of efficiency and fake-rate optimization in various samples, is further discussed in Section 6.

Figure 3. Illustration of the score as a performance indicator during the development process. Each panel corresponds to the submissions of a specific approach, incrementally being improved.

5 Example submissions

There were many interesting submissions during the hackathon, some of which, with the highest scores, are displayed in the leaderboard shown in Figure 4. By the end of the first day, the scores had already started to saturate around 97%, as shown in Figure 5. Some of the approaches were fully independent, while others combined existing baseline examples with improved features. We now illustrate some of the example submissions which displayed interesting lines of development. The source code for these submissions can be accessed by signing up on RAMP [1].

5.1 Untrained baseline examples

We started the hackathon with three baseline examples which we kept as simple as possible. These algorithms were not data-driven, therefore the training function was left blank.

Among these three, two algorithms were known to be valid only under the assumption that the circular tracks have very small curvature and can therefore be approximated by straight lines. They all assumed the vertex position to be at the exact origin. In the linear approximation algorithm, the compatibility of hits with a straight line is tested and hits are clustered accordingly. In the nearest neighbors algorithm, the hits in each next layer are matched to the closest hit in the previous layer.
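A minimal sketch of such a nearest-neighbours baseline, assuming one track seed per innermost-layer hit and greedy layer-by-layer matching in azimuth (our reconstruction of the idea, not the actual baseline code):

```python
import numpy as np

def nearest_neighbor_tracks(layer, phi):
    """Seed one track per hit on layer 1, then extend each track layer by
    layer with the hit whose azimuth is closest to the track's previous
    hit (valid only for nearly straight tracks from the origin)."""
    layer = np.asarray(layer)
    phi = np.asarray(phi)
    labels = np.full(len(phi), -1)
    last_phi = {}  # track number -> azimuth of its most recent hit
    for t, i in enumerate(np.where(layer == 1)[0]):
        labels[i] = t
        last_phi[t] = phi[i]
    for l in range(2, int(layer.max()) + 1):
        for i in np.where(layer == l)[0]:
            # attach to the track with the smallest azimuth difference (mod 2*pi)
            t = min(last_phi, key=lambda k:
                    abs((phi[i] - last_phi[k] + np.pi) % (2 * np.pi) - np.pi))
            labels[i] = t
            last_phi[t] = phi[i]
    return labels
```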

The third algorithm was a simplified version of the Hough transform, in which the individual hit positions were projected onto the track parameter space of P × φ. The actual implementation had the further simplification that the binning and searching in the P × φ space was restricted to bands around hit pairs in the first two layers; it can therefore be considered a "seeded" Hough transform approach.


Figure 4. Screenshot of the RAMP leaderboard, showing top 10 submissions as of March 9, 2017.

Figure 5. Development of algorithms during the hackathon. Although the submissions from the beta-testing period already had a high score, the participants were able to implement faster algorithms with even better scores.

Although the score of the Hough transform algorithm was one of the highest (0.97), it is slow (0.16 s/event). The algorithms described below are faster, with similar scores.

5.2 The ∆φ algorithm

The delphi_multilayer submission is an evolution of the baseline example, to which it owes its simplicity and good CPU performance. It is an O(N²) algorithm which builds a track by aligning the direction vectors of hit pairs from consecutive layers. In the (R, φ) plane this means cutting on the "second derivative" Cn = φn − 2φn−1 + φn−2, where φn is the azimuth of the hit in the n-th layer. Five simple improvements increased the algorithm efficiency from 84% to 96% with a small CPU performance degradation:

1. Allow tracks to have missing layers, including in the innermost and outermost layers;

2. Search for tracks bidirectionally: first outside-in, then inside-out;

3. Introduce a layer-dependent cut on direction-vector alignment to take into account sensor pitch and multiple scattering;

4. Follow the curvature of the track when comparing vector alignments, cutting on the difference between the current Cn and the running average <C>;

5. Allow hits to be shared among tracks, with a "sharing cost" tuned to limit fake tracks.
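The second-derivative quantity is straightforward to compute for a candidate's azimuths ordered by layer (a sketch of the alignment test only, not the full delphi_multilayer submission):

```python
import numpy as np

def second_derivative(phi_by_layer):
    """C_n = phi_n - 2*phi_{n-1} + phi_{n-2} for consecutive layers, the
    quantity the delta-phi algorithm cuts on. np.unwrap removes 2*pi
    jumps so the differences are meaningful across the phi = 0 boundary."""
    phi = np.unwrap(np.asarray(phi_by_layer, dtype=float))
    return phi[2:] - 2.0 * phi[1:-1] + phi[:-2]
```

For hits whose azimuth changes linearly from layer to layer the result vanishes, which is why cutting on |Cn| (with a layer-dependent tolerance for pitch and multiple scattering) isolates well-aligned triplets.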

5.3 Inward search tree

The winning contribution in the hackathon (submitted as hyperbelle_tree) was inspired by a typical track reconstruction algorithm used in high-energy physics, e.g. in the software currently developed for the Belle II experiment [10]. The implemented code is similar to a combinatorial Kalman filter and loosely based on the principles of Monte Carlo tree search. To keep the implementation simple, the update step of the Kalman filter, which in real-world applications defines how well the algorithm works, was not implemented.

The algorithm starts with a random hit (the seed) on the outermost layer and builds up a tree using the hits on the next layer. A weight, discussed below, is calculated for each hit. Using only the hit with the maximal weight and those with a weight close to the maximal one, a new track candidate is built, and the procedure is repeated until the innermost layer is processed.

The weight calculation is the crucial part of this algorithm. In this example implementation, the azimuthal angle difference between two successive hits is sampled on training data, where the truth information is known. From this histogrammed data, a probability for a given φ difference is extracted and used as a weight.
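The histogram-based weight can be sketched as follows (the class name, binning and range are illustrative assumptions):

```python
import numpy as np

class DeltaPhiWeight:
    """Empirical weight for extending a track candidate: the probability
    of a given azimuthal difference, estimated from training data where
    the truth is known (sketch of the idea described in the text)."""

    def __init__(self, true_dphi, bins=50, phi_range=(-0.1, 0.1)):
        counts, self.edges = np.histogram(true_dphi, bins=bins, range=phi_range)
        self.prob = counts / counts.sum()  # empirical probability per bin

    def __call__(self, dphi):
        i = np.searchsorted(self.edges, dphi, side="right") - 1
        if i < 0 or i >= len(self.prob):
            return 0.0  # outside the histogrammed range
        return float(self.prob[i])
```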

The tree search may produce multiple track candidates starting from the same seed hit, so only the candidate most consistent with a circle is stored. As there are no background hits, the tree search is continued until all hits of the event are assigned. To cope with hit inefficiencies, tracks which do not include hits in all layers are fitted together pairwise and tested for their fit quality. Good combinations are merged.

As one part of the challenge was also to write fast algorithms, a runtime optimization of the implementation was performed. Caching of the hits on each layer was implemented, and heavy calculations and loops were realized using numpy functionality.

5.4 LSTM based algorithm

The LSTM submission [11] is based on a Long Short-Term Memory neural network [12]. LSTMs can be powerful state estimators due to their effective use of memory and non-linear modeling of sequence data. For this problem, the detector layers form the sequence of the model, and the predictions are classification scores of the hits for a given track candidate.

The input for the LSTM model was prepared as follows. First, the hit data is histogrammed for each detector layer in the φ coordinate with 200 binary bins. A bin has a value of 1 if it contains a hit, otherwise 0. The data is thus a sequence of 9 vectors of length 200. The model operates on one track candidate at a time, so track "seeds" were identified using only the hits in the first detector layer. Each training sample input is prepared by using only the seed hit in the first layer plus all of the event's hits in the remaining layers. The corresponding training sample target contains only the hits of the true track.

For each training sample, the LSTM model reads the input sequence and returns a prediction sequence with one entry per detector layer. Each vector of the output sequence is fed through a fully connected neural network layer and a softmax activation function to give a set of "probability" scores over the φ bins on that detector layer. For the bins that contain hits, this score quantifies the confidence that a hit belongs to the track candidate. The training objective, or cost function, is a categorical cross-entropy.

Once trained, the model's predictions can be used to assign labels to hits. Each event contains a number of track candidates, and thus each hit has an associated score from the model predictions on each of those candidates. The assignment is made by choosing the candidate that gave the highest prediction score to the hit.
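The input preparation described above can be sketched as follows (the function name and argument conventions are ours, not the submitted code):

```python
import numpy as np

def binned_input(layer, phi, n_layers=9, n_bins=200):
    """Binary phi-occupancy input for the LSTM model: one length-200
    vector per detector layer, set to 1 where the bin contains a hit."""
    x = np.zeros((n_layers, n_bins))
    # map azimuth [0, 2*pi) onto a bin index [0, n_bins)
    b = ((np.asarray(phi) % (2 * np.pi)) / (2 * np.pi) * n_bins).astype(int)
    x[np.asarray(layer) - 1, b] = 1.0
    return x
```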

6 Tests with post-hackathon samples

As mentioned earlier, two of the baseline algorithms were only valid under the conditions of high average momentum and low detector occupancy. These algorithms are expected to fail if tested on a sample that moves away from these assumptions.

Some of the other algorithms, however, can learn the patterns and adapt to the conditions if we also train on a sample that was prepared differently. This is how RAMP is set up, and here we illustrate the advantage of using the platform for the full analysis workflow rather than submitting a fixed solution.

In addition to the sample used in the hackathon (described in Section 3), we produced two alternative samples which are more challenging in terms of the physics: lower momentum (more curved) tracks, and higher multiplicity (hence higher detector occupancy), summarized in Table 1.

Table 1. Parameters of the input samples.

sample      multiplicity   momentum range (MeV/c)
hackathon   10             [300, 10000]
high-mult   20             [300, 10000]
low-mom     10             [100, 1000]

Since the linear approximation algorithm relies on the assumption that the tracks are almost straight lines, the algorithm starts to fail on a sample of tracks with more curvature. As shown in Figure 6, the test with the high-multiplicity sample performs similarly to the hackathon sample, whereas the low-momentum sample displays a large degradation in performance.

The Hough transform takes into account the curvature of the tracks; however, it also displays (Figure 7) a slight degradation of performance in the high-multiplicity and low-momentum samples.

The performance of the ∆φ algorithm (Figure 8) degrades slightly when going from the hackathon to the high-multiplicity sample, and significantly on the low-momentum sample due to difficulties accommodating the larger curvature of the tracks.

The inward search tree displays (Figure 9) similar performance in the hackathon and high-multiplicity samples, and a slight decrease in the score in the low-momentum sample. The efficiency degrades less than the score, which can be interpreted as the algorithm being able to reconstruct the same tracks while missing some of the hits along the track.


Figure 6. Performance of the linear approximation algorithm as a function of the angular window width, which is the only parameter of the algorithm. The left figure displays the performance on the nominal hackathon sample; the center and right figures display the performance on the modified "high-multiplicity" and "low-momentum" samples, respectively. The optimal window size depends strongly on the momentum distribution.

Figure 7. Performance of the Hough transform algorithm as a function of the angular binning. The left figure displays the performance on the nominal hackathon sample; the center and right figures display the performance on the modified "high-multiplicity" and "low-momentum" samples, respectively. Although finer binning appears to increase the efficiency almost monotonically, it may cause the loss of hits on the tracks, hence the score has a slightly steeper dependence.

Figure 8. Performance of the ∆φ algorithm as a function of the angular window width, which is the only parameter of the algorithm. The left figure displays the performance on the nominal hackathon sample; the center and right figures display the performance on the modified "high-multiplicity" and "low-momentum" samples, respectively. The optimal window size depends strongly on the momentum distribution.

The LSTM algorithm displays (Figure 10) a small degradation of efficiency in the high-multiplicity sample. The low-momentum sample is outside the validity of the submitted implementation. Due to the high curvature, this sample has several tracks that do not have hits in all layers, or that have two hits in one layer; the algorithm can be improved in the future to incorporate such cases.


Figure 9. Performance of the inward search tree algorithm as a function of the exponent of the cut applied on the weights. The left figure displays the performance on the nominal hackathon sample; the center and right figures display the performance on the modified "high-multiplicity" and "low-momentum" samples, respectively.

Figure 10. Performance of the LSTM algorithm as a function of the number of epochs used in training. The left figure displays the performance on the nominal hackathon sample and the right figure displays the performance on the modified "high-multiplicity" sample.

7 Conclusions and outlook

The problems for the CTDWIT 2017 hackathon were designed in a simplified way, in 2D, in order to allow the users to experiment with novel approaches within a 36-hour hackathon, instead of being confined to conventional methods of track reconstruction. In particular, the two-dimensional setup simplified the visualisation of both the problem and the solution, and the high average momentum and low multiplicity of the input samples made it easier to find a starting point for development. With about 30 submissions in two days, the hackathon established a proof of principle for a wider-scale data challenge with a more realistic experimental setup.

The post-hackathon test of the analysis in much more challenging physics conditions illustrated that it is possible to think ahead of the available data and implement robust solutions even if some features of the data are unknown. As expected, the algorithms that relied on assumptions limited to the provided data failed; however, a number of them maintained a high score. The experimental aspect of the hackathon was the observation of the contrasts between these approaches and their relative performances.

Acknowledgement

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 654168.


References

[1] Rapid Analytics and Model Prototyping (2017), http://ramp.studio/
[2] S. van der Walt, S.C. Colbert, G. Varoquaux, The NumPy array: a structure for efficient numerical computation, Computing in Science & Engineering (2011), http://www.numpy.org/
[3] W. McKinney, pandas: powerful Python data analysis toolkit (2011), http://pandas.sourceforge.net/
[4] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg et al., Journal of Machine Learning Research 12, 2825 (2011)
[5] M. Ester, H.P. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise (AAAI Press, 1996), pp. 226-231
[6] The ATLAS Collaboration, CERN-LHCC-2015-020, "ATLAS Phase-II Upgrade Scoping Document" (2015)
[7] The CMS Collaboration, CERN-LHCC-2015-019, "CMS Phase-II Upgrade Scoping Document" (2015)
[8] The Tracking Machine Learning Challenge (2017), https://github.com/trackml
[9] TrackML RAMP simulation (2017), https://github.com/ramp-kits/HEP_tracking/tree/ramp_ctd_2017/simulation
[10] T. Abe et al. (Belle-II) (2010), arXiv:1011.0352
[11] S. Farrell, The HEP.TrkX project: deep neural networks for HL-LHC online and offline tracking, in Connecting The Dots / Intelligent Trackers (2017)
[12] S. Hochreiter, J. Schmidhuber, Neural Comput. 9, 1735 (1997)