Exploring the Performance of Spark for a Scientific Use Case
TRANSCRIPT
Saba Sehrish ([email protected]), Jim Kowalkowski, and Marc Paterno
IEEE International Workshop on High-Performance Big Data Computing (HPBDC), in conjunction with the 30th IEEE IPDPS 2016, 05/27/2016
Scientific Use Case: Neutrino Physics
• The neutrino is an elementary particle that carries no electrical charge, travels at nearly the speed of light, and passes through ordinary matter with virtually no interaction.
  – Its mass is so small that it is not detectable with our technology.
• Neutrinos are among the most abundant particles in the universe.
  – Every second, trillions of neutrinos from the sun pass through your body.
• There are three flavors of neutrino: electron, muon, and tau.
  – As a neutrino travels along, it may switch back and forth between the flavors. These flavor "oscillations" confounded physicists for decades.
5/18/16 Saba Sehrish | Evaluating the Performance of Spark for a Scientific Use Case
Neutrino Unknowns
• The NOvA experiment is constructed to answer the following important questions about neutrinos:
  – Which neutrinos are the heaviest and which are the lightest?
  – Can we observe muon neutrinos changing into electron neutrinos?
  – Do neutrinos violate matter/antimatter symmetry?
• NOvA: NuMI Off-Axis Electron Neutrino Appearance
  – NuMI: Neutrinos from the Main Injector
The NOvA Experiment
• Fermilab’s accelerator complex produces the most intense neutrino beam in the world and sends it straight through the earth to northern Minnesota, no tunnel required.
• Moving at close to the speed of light, the neutrinos make the 500-mile journey in less than three milliseconds.
• When a neutrino interacts in the NOνA detector in Minnesota, it creates distinctive particle tracks.
• Scientists study the tracks to better understand neutrinos.
[Map: neutrino beam path from Fermilab to Ash River]
The NOvA Detectors
• The NOvA experiment is composed of two liquid-scintillator (95% baby oil) detectors:
  – A 14,000-ton Far Detector on the surface at Ash River
  – A ~300-ton Near Detector (~100 m underground) at Fermilab, 1 km from the source
• The NOvA detectors are constructed from planes of PVC modules alternating between vertical and horizontal orientations.
  – They form about 1000 planes of 50-ft stacked tubes, each about 3x5 cm.
Neutrino interactions recorded by NOvA
Muon-neutrino Charged-Current Candidate
[Event display: XZ-view and YZ-view along the beam direction; colour shows charge]
Electron-neutrino Charged-Current Candidate
[Event display: XZ-view and YZ-view along the beam direction; colour shows charge]
Physics problem
• Classify types of interactions based on patterns found in the detector:
  – Is it a muon or electron neutrino?
  – Is it a charged-current or neutral-current interaction?
Library Event Matching Algorithm
• Classify a detector event by comparing its cell-energy pattern to a library of 77M simulated events' cell-energy patterns, choosing the 10K that are "most similar."
  – Compare the pattern of energy (hits) deposited in the cells of one event with the pattern in another event.
• The "most similar" metric is motivated by an electrostatic analogy: an energy comparison for two systems of point charges laid on top of each other.
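The electrostatic analogy can be sketched in plain Python (the talk's implementations used Java and C++). Here the hits of one event are treated as positive point charges and the other's as negative, so the total energy E = Eaa + Eab + Ebb vanishes for identical events and grows as they differ. The (x, y, charge) hit format and the unit offset in the denominator are illustrative assumptions, not details from the talk.

```python
import math

def pair_energy(hits_a, hits_b):
    # Sum of q_i * q_j / (r_ij + 1) over all ordered pairs of hits.
    # Hits are (x, y, charge) tuples; the +1 offset (an assumption)
    # keeps coincident cells from dividing by zero.
    e = 0.0
    for xa, ya, qa in hits_a:
        for xb, yb, qb in hits_b:
            r = math.hypot(xa - xb, ya - yb)
            e += qa * qb / (r + 1.0)
    return e

def similarity(event_a, event_b):
    # E = Eaa + Eab + Ebb: the self-energies of each event plus the
    # cross term, with event_b's charges taken as negative so that
    # identical events cancel. Lower E means a better match.
    eaa = 0.5 * pair_energy(event_a, event_a)
    ebb = 0.5 * pair_energy(event_b, event_b)
    eab = -pair_energy(event_a, event_b)
    return eaa + eab + ebb

a = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0)]
b = [(5.0, 5.0, 1.0)]
print(similarity(a, a))  # 0.0: a perfect match
print(similarity(a, b) > similarity(a, a))  # True: b is a worse match
```

The real algorithm evaluates this kind of comparison against every library event, then keeps the 10K lowest-energy (most similar) templates.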
Is Spark a good technology for this problem?
• The goal is to classify 100 detector events per second.
  – In the worst case this equates to 7.7B similarity-metric calculations per second using the 77M-event library!
  – NOvA is considering increasing the library size to 1B events to improve accuracy.
• Spark has attractive features:
  – In-memory large-scale distributed processing
  – Uses a distributed file system such as HDFS, which supports automatic data distribution across computing resources
  – The language supports the operations needed to implement the algorithm
  – Good for similar repeated analyses performed on the same large data sets
Implementation in Spark
• Used Spark's DataFrame API (Java)
• Input data: used JSON format (read once)
• Transformation: create a new data set from an existing one
  – filter: returns a new dataset formed by selecting those elements of the source on which func returns true
  – map: returns a new distributed dataset formed by passing each element of the source through a function func
• Action: return a value to the driver program after running a computation on the dataset
  – top: returns the first n elements of the RDD using either their natural order or a custom comparator
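As a rough illustration, the filter → map → top pipeline can be mimicked in plain Python (the talk's implementation used the Java DataFrame API; the row fields, the hit-count cut, and the placeholder score function here are invented for the example):

```python
import heapq

# Toy library of simulated "template" events; all fields are invented.
library = [
    {"id": 1, "nhits": 12, "energy": 3.5},
    {"id": 2, "nhits": 40, "energy": 9.1},
    {"id": 3, "nhits": 11, "energy": 3.2},
]

event = {"nhits": 10, "energy": 3.4}  # detector event to classify

def score(row):
    # Placeholder similarity score (higher is more similar); the real
    # metric is the electrostatic-analogy energy comparison.
    return -abs(event["energy"] - row["energy"])

# filter: keep templates whose hit count is close to the event's
candidates = [r for r in library if abs(r["nhits"] - event["nhits"]) <= 5]
# map: turn each surviving template into a (score, id) tuple
scored = [(score(r), r["id"]) for r in candidates]
# top: the N best-scoring matches, like scores.top(numbestmatches, cmp)
best = heapq.nlargest(2, scored)
print([i for _, i in best])  # [1, 3]
```

In Spark the filter and map are lazy transformations over the distributed dataset; only the top action forces the computation and returns results to the driver.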
Flow of operations and data in Spark
• Read the JSON files once:
  DataFrame templates = sqlContext.jsonFile("lemdata/");
• The library of events is read once and stored in DataFrames in memory.
• Sequence of operations per event classification:
  – Transformation:
    List<Tuple2<Long, Tuple2<Float, String>>> scores =
        templates.filter(...).map(...); // calculate E = Eaa + Eab + Ebb for all events in the template library
  – Action:
    scores.top(numbestmatches, new TupleComparator());
  – Output: the best matches for the event
Results from Cori (Spark 1.5)
[Figure: event processing time (seconds, axis 20–80) vs. number of nodes (16, 32, 64, 128, 256, 512, 1024); an annotation marks ~4 s at the largest node counts.]
Comments about Spark
• Adding a new column to a Spark DataFrame from a different DataFrame is not supported.
  – Our data was read into two different DataFrames.
  – Performance of the join operation is extremely slow.
• It is hard to tune a Spark system; Cori was far better than ours.
  – How many tasks per node?
  – How much memory per node?
  – How many physical disks per node?
• The interactive environment is good for rapid development.
  – pyspark, scala-shell, or sparkR
• Rapid evolution of this product:
  – More than 4 versions since we started developing
  – Introduction of the DataFrame interface, which helped improve the expression of the problem we are solving
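The tuning questions on this slide map onto standard spark-submit options. A sketch with purely illustrative values (none of these numbers, nor the jar name, come from the talk):

```shell
# Illustrative only: tasks per node track --executor-cores, memory per
# node tracks --executor-memory; local scratch disks are configured per
# cluster via spark.local.dir rather than on the command line.
spark-submit \
  --master yarn \
  --num-executors 64 \
  --executor-cores 8 \
  --executor-memory 24G \
  --conf spark.default.parallelism=1024 \
  lem-classifier.jar
```

Finding values that suit both the hardware and the workload is the tuning effort the slide describes; a well-administered system like Cori ships with much of this already set.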
Alternative implementation in MPI
• Input data:
  – Data structures similar to the Spark JSON schema (read once)
  – Binary format, much faster to read: critical for development
• Data distribution and task assignment:
  – Fixed by file size
• Computations:
  – Armadillo (dot) and the C++ STL (std::partial_sort)
• Stages:
  – MPI_Bcast to hand out an event to be classified
  – Filter the input data sets based on the number of hits in the event to be classified
  – Scoring and sorting to find the best 10K matches
  – MPI_Reduce to collect the best 10K across all the nodes
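The stages above can be sketched serially in Python, with the ranks simulated by a loop and heapq standing in for std::partial_sort and the final reduction (the shard contents and K are made up for the example; real code would use MPI_Bcast and MPI_Reduce):

```python
import heapq

# Each "rank" scores its own shard of the library, keeps its local best
# K, and a reduction merges the local lists into a global best K.
K = 3
shards = [  # library split across 3 simulated ranks: (template_id, score)
    [(1, 0.9), (2, 0.1), (3, 0.7)],
    [(4, 0.8), (5, 0.2)],
    [(6, 0.95), (7, 0.05), (8, 0.6)],
]

def local_best(shard, k):
    # Per-rank partial sort, like std::partial_sort in the C++ version.
    return heapq.nlargest(k, shard, key=lambda t: t[1])

def reduce_best(local_lists, k):
    # Merge step playing the role of the MPI reduction across ranks.
    merged = [t for lst in local_lists for t in lst]
    return heapq.nlargest(k, merged, key=lambda t: t[1])

best = reduce_best([local_best(s, K) for s in shards], K)
print(best)  # [(6, 0.95), (1, 0.9), (4, 0.8)]
```

Because each rank only ever communicates its local top K, the reduction moves 10K entries per rank rather than the full 77M-event score list.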
Flow of operations and data in the MPI implementation
• Data preparation (NOvA nodes, GPFS): the NOvA simulation produces art/ROOT files (200 metadata, 200 event); these are converted to JSON files and then, via a binary converter, to binary files (200 metadata, 200 event) covering ~74M events.
• On each node (node1 … node7), each core (core-1 … core-N) runs one rank; rank assignment is by core within node.
• Initialize:
  – Load my subset of template events into memory
  – Order by the metadata values theta1 and nhits
• Run:
  – If rank 0, get the event and broadcast it
  – Receive the broadcast event-to-match
  – Find the range of template events to match against using metadata
  – Run LEM on the range
  – Sort to keep the best 10K results
  – All-reduce to merge each rank's results and find the overall best
  – If rank 0, report the results
Results on Cori
• Classified 100 events
• 200 cores were used, by specifying 7 nodes with 32 cores
• No event took more than 0.4 seconds to classify
• The un-tuned MPI implementation using 7 nodes is 10 times faster than the Spark implementation using 512 nodes
[Figure: three histograms over the 100 classified events —
  (1) Event scoring time (seconds), roughly 0.00–0.20: the distribution of scoring times for the sample of 100 events to be classified.
  (2) Best-10K sorting and reduction time (seconds), roughly 0.0019–0.0021: the distribution of time taken to obtain the best 10K scores.
  (3) Event processing time (seconds), roughly 0.1–0.4: the distribution of the total time to classify each event.]
Comparison of the two implementations
• Orchestration
  – Data placement and task assignment are abstracted from the user in the Spark implementation, in contrast with MPI.
• Scaling
  – Good scaling was observed in the Spark implementation without any tuning. Good scaling is possible with the MPI implementation, but with a lot of work.
• Application tuning
  – It is hard to isolate slow-performing tasks and stages in Spark because of lazy evaluation, compared with the well-defined steps in MPI.
• Wrapped types
  – The use of boxed primitives carries a performance cost in the Spark implementation, compared with the use of primitives in the MPI implementation.
Comparison of the two implementations
• Vectorized linear algebra library
  – The unavailability of a high-performance linear algebra library in the Spark implementation contributed to the slow performance of repetitive numerical computations, compared with the MPI implementation, which used Armadillo.
References
1. A Fermilab Today article
   – "Nine weird facts about neutrinos" by Tia Miceli, Fermilab (http://www.fnal.gov/pub/today/archive/archive_2014/today14-11-06.html)
2. Public presentations on Fermilab NOvA
   – "The Status of NOvA" by Gavin Davies, Indiana University (http://www-nova.fnal.gov/presentations.html)
3. Fermilab NOvA webpages
   – http://www-nova.fnal.gov
   – http://www-nova.fnal.gov/NOvA_FactSheet.pdf
4. Library Event Matching event classification algorithm for electron neutrino interactions in the NOvA detector
   – http://arxiv.org/abs/1501.00968
5. Spark at NERSC
   – http://www.nersc.gov/users/data-analytics/data-analytics/spark-distributed-analytic-framework/
6. Armadillo, a C++ linear algebra library
   – http://arma.sourceforge.net