IEEE EIT talk: Large-scale Neural Modeling in MapReduce and Giraph

DESCRIPTION
Using MapReduce and Giraph to model large-scale neural networks

TRANSCRIPT
Large-scale Neural Modeling in MapReduce and Giraph
Presenter: Shuo Yang, Graduate Programs in Software, University of St. Thomas
Co-authors: Nicholas D. Spielman, Neuroscience Program, University of St. Thomas; Jadin C. Jackson, PhD, Department of Biology, University of St. Thomas; Bradley S. Rubin, PhD, Graduate Programs in Software, University of St. Thomas
Special thanks: Bhabani Misra, PhD, Graduate Programs in Software, University of St. Thomas
Why Hadoop & What is Hadoop
Why not supercomputers?
Expensive
Limited access
Limited scalability
Why Hadoop?
Runs on commodity hardware
Scalable
Full-fledged ecosystem & community
Open-source implementation of MapReduce, based on Java
MapReduce Model
[Diagram: a client submits a MapReduce job; the input in HDFS is split, each split feeds a Map task, and the framework writes the MapReduce output back to HDFS.]
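To make the model concrete, here is the canonical word-count example (an illustration, not part of the talk): map emits (word, 1) for each token of its input split, the framework sorts and shuffles by key, and reduce sums the counts.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map phase: emit (word, 1) for every token in the input line.
class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    StringTokenizer tokens = new StringTokenizer(line.toString());
    while (tokens.hasMoreTokens()) {
      word.set(tokens.nextToken());
      context.write(word, ONE);
    }
  }
}

// Reduce phase: the shuffle has grouped all counts for a word; sum them.
class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable c : counts) {
      sum += c.get();
    }
    context.write(word, new IntWritable(sum));
  }
}
```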
Neural Model (Izhikevich model)
[Diagram: each neuron sums the input currents I1, I2, …, In from its neighbors (∑I), updates its membrane potential (∆v), and sends currents to all of its neighbors through the synaptic weight matrix. Simulation results: raster plot of neuron ID (0 to 2500) against time step (0 to 1000).]
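For reference, the Izhikevich model (Izhikevich, 2003) named on the slide couples a membrane potential v and a recovery variable u, driven by the summed synaptic input current I:

```latex
\begin{aligned}
\frac{dv}{dt} &= 0.04v^{2} + 5v + 140 - u + I \\
\frac{du}{dt} &= a\,(bv - u) \\
\text{if } v \ge 30\ \text{mV:}\quad & v \leftarrow c, \qquad u \leftarrow u + d
\end{aligned}
```

Here I is the weighted sum of currents from spiking neighbors, read off the synaptic weight matrix, and the parameters a, b, c, d select the neuron's firing pattern.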
This is a graph structure
Basic MapReduce Implementation
[Diagram: starting from the initial input in HDFS, each mapper reads one neuron (N1, N2, N3) and its local structure, passes the structure through, and emits the currents I1, I2, I3 addressed to its neighbors. After the sort & shuffle, each reducer sums the currents to its neuron, updates N1, N2, or N3, and writes the result back to HDFS as the input for the next job.]
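A minimal sketch of what such a job might look like; the record format ("v u spiked target:weight ...") and the helper applyIzhikevichStep are assumptions for illustration, not the authors' code:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// The mapper passes the neuron record through (so the structure survives the
// shuffle) and, if the neuron spiked, emits one weighted current per synapse.
class NeuronMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
  @Override
  protected void map(LongWritable neuronId, Text record, Context context)
      throws IOException, InterruptedException {
    context.write(neuronId, record);  // graph structure enters the shuffle

    String[] fields = record.toString().split("\\s+");
    if (!Boolean.parseBoolean(fields[2])) {
      return;  // no spike at this time step
    }
    for (int i = 3; i < fields.length; i++) {  // "target:weight" pairs
      String[] syn = fields[i].split(":");
      context.write(new LongWritable(Long.parseLong(syn[0])),
                    new Text("I " + syn[1]));  // a current, tagged "I"
    }
  }
}

// The reducer reassembles the structure, sums the currents, and updates.
class NeuronReducer extends Reducer<LongWritable, Text, LongWritable, Text> {
  @Override
  protected void reduce(LongWritable neuronId, Iterable<Text> values,
                        Context context) throws IOException, InterruptedException {
    String structure = null;
    double inputCurrent = 0.0;
    for (Text v : values) {
      String s = v.toString();
      if (s.startsWith("I ")) {
        inputCurrent += Double.parseDouble(s.substring(2));  // sum currents
      } else {
        structure = s;  // the neuron record passed through by the mapper
      }
    }
    context.write(neuronId, new Text(applyIzhikevichStep(structure, inputCurrent)));
  }

  // Hypothetical helper: parse v and u, apply one Euler step, re-encode.
  private String applyIzhikevichStep(String record, double current) {
    return record;
  }
}
```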
Problems:
Synaptic currents are sent directly to the reducers without local aggregation.
The graph structure is shuffled in each iteration.
In-Mapper Combining (IMC, introduced by Lin & Schatz)
[Diagram: each mapper buffers the currents I1, I2, I3 it generates and locally sums (∑) those addressed to the same neuron, so only one partial sum per destination neuron crosses the network; the reducers then update N1, N2, and N3 as before.]
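A sketch of the pattern, reusing the assumed record format from the basic sketch above: partial sums accumulate in a per-mapper HashMap and are emitted once, in cleanup(). The structure pass-through is omitted from the sketch, but in this variant the structure still travels through the shuffle, as the next line notes.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// In-mapper combining: aggregate currents locally instead of emitting one
// record per synapse.
class ImcNeuronMapper
    extends Mapper<LongWritable, Text, LongWritable, DoubleWritable> {
  private final Map<Long, Double> partialSums = new HashMap<>();

  @Override
  protected void map(LongWritable neuronId, Text record, Context context) {
    String[] fields = record.toString().split("\\s+");
    if (!Boolean.parseBoolean(fields[2])) {
      return;  // no spike, nothing to aggregate
    }
    for (int i = 3; i < fields.length; i++) {  // "target:weight" pairs
      String[] syn = fields[i].split(":");
      long target = Long.parseLong(syn[0]);
      double weight = Double.parseDouble(syn[1]);
      partialSums.merge(target, weight, Double::sum);  // local aggregation
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    // One partial sum per destination crosses the network, not one per synapse.
    for (Map.Entry<Long, Double> e : partialSums.entrySet()) {
      context.write(new LongWritable(e.getKey()), new DoubleWritable(e.getValue()));
    }
  }
}
```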
The graph structure is still shuffled!
Schimmy (introduced by Lin & Schatz)
[Diagram: the mappers emit only the currents I1, I2, I3; the graph structure no longer enters the shuffle. Each reducer remotely reads its partition of the graph structure from HDFS, merge-joins it with the shuffled currents, sums the currents to each neuron, updates N1, N2, and N3, and writes the results back to HDFS.]
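A sketch of the reducer side. It assumes the structure was written by a previous job with the same partitioner and reducer count, so partition i of the structure lines up with reducer i and the two sorted streams can be merge-joined by key; "graph.structure.dir", the SequenceFile layout, and updateNeuron are illustrative assumptions.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SchimmyReducer
    extends Reducer<LongWritable, DoubleWritable, LongWritable, Text> {

  private SequenceFile.Reader graphReader;  // this reducer's graph partition
  private final LongWritable graphKey = new LongWritable();
  private final Text neuronRecord = new Text();

  @Override
  protected void setup(Context context) throws IOException {
    Configuration conf = context.getConfiguration();
    int partition = context.getTaskAttemptID().getTaskID().getId();
    Path part = new Path(conf.get("graph.structure.dir"),
                         String.format("part-r-%05d", partition));
    graphReader = new SequenceFile.Reader(FileSystem.get(conf), part, conf);
  }

  @Override
  protected void reduce(LongWritable neuronId, Iterable<DoubleWritable> currents,
                        Context context) throws IOException, InterruptedException {
    // Merge join: both streams arrive sorted by neuron id, so advance the
    // structure reader until the keys meet, emitting quiet neurons unchanged.
    while (graphReader.next(graphKey, neuronRecord)
           && graphKey.get() < neuronId.get()) {
      context.write(graphKey, neuronRecord);
    }
    double sum = 0.0;
    for (DoubleWritable i : currents) {
      sum += i.get();  // sum currents to this neuron
    }
    context.write(neuronId, updateNeuron(neuronRecord, sum));
  }

  // Hypothetical helper: parse v and u, apply one Izhikevich step, re-encode.
  private Text updateNeuron(Text record, double inputCurrent) {
    return record;
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    graphReader.close();
  }
}
```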
Problems:
Reducers must read remotely from HDFS.
The graph structure is still read and written in each iteration.
Observation: the graph structure is read-only!
Mapper-side Schimmy
[Diagram: the read-only structure stays on the map side. Each mapper joins its neurons (N1, N2, N3) with the static structure and emits only the currents I1, I2, I3; the reducers sum the currents, update N1, N2, and N3, and write only the updated neuron states back to HDFS.]
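One way such a map-side join might look; this is a speculative sketch, not the authors' implementation, and loadStructurePartition plus the "v u spiked" state format are invented for illustration:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Mapper-side Schimmy: the static synaptic structure is loaded once per
// mapper and joined on the map side with the evolving neuron states, so
// neither the structure nor the neuron record enters the shuffle; only the
// currents do.
public class MapperSideSchimmyMapper
    extends Mapper<LongWritable, Text, LongWritable, DoubleWritable> {

  // neuron id -> (target neuron id -> synaptic weight)
  private Map<Long, Map<Long, Double>> synapses;

  @Override
  protected void setup(Context context) {
    synapses = loadStructurePartition(context.getConfiguration());
  }

  @Override
  protected void map(LongWritable neuronId, Text state, Context context)
      throws IOException, InterruptedException {
    // Assumed state format from the previous job: "v u spiked".
    String[] fields = state.toString().split("\\s+");
    if (!Boolean.parseBoolean(fields[2])) {
      return;  // no spike, no output current this time step
    }
    Map<Long, Double> out = synapses.get(neuronId.get());
    if (out == null) {
      return;  // no outgoing synapses
    }
    for (Map.Entry<Long, Double> syn : out.entrySet()) {
      context.write(new LongWritable(syn.getKey()),
                    new DoubleWritable(syn.getValue()));
    }
  }

  // Hypothetical helper: read this mapper's slice of the read-only weights.
  private Map<Long, Map<Long, Double>> loadStructurePartition(Configuration conf) {
    return new HashMap<>();
  }
}
```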
Drawbacks of graph algorithms in MapReduce
Non-intuitive and hard to implement
Iterative algorithms are not expressed efficiently
Not optimized for large numbers of iterations
[Diagram: every iteration is a complete MapReduce job (input from HDFS, mapper, intermediate files, reducer, output to HDFS), so each iteration pays a job-startup penalty plus disk penalties for the intermediate and final writes.]
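The iteration cost is easiest to see in the driver. A sketch (illustrative names, not the authors' code) that launches one full Hadoop job per simulated time step:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SimulationDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path input = new Path(args[0]);
    int timeSteps = Integer.parseInt(args[1]);

    // One complete MapReduce job per time step: each pays the startup
    // penalty, and state makes a round trip through HDFS every iteration.
    for (int step = 0; step < timeSteps; step++) {
      Job job = Job.getInstance(conf, "neural-step-" + step);
      job.setJarByClass(SimulationDriver.class);
      // Mapper/reducer classes for whichever variant is run would be set here.
      Path output = new Path(input.getParent(), "step-" + (step + 1));
      FileInputFormat.addInputPath(job, input);      // read state from HDFS
      FileOutputFormat.setOutputPath(job, output);   // write state back to HDFS
      if (!job.waitForCompletion(true)) {
        System.exit(1);  // abort on failure
      }
      input = output;  // this step's output is the next step's input
    }
  }
}
```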
Giraph
[Diagram: the graph (N1, N2, N3 and their currents) is loaded from HDFS once; each superstep updates the neurons in memory and ends at a synchronous barrier; only the final results are written back to HDFS.]
Iterative graph processing system
Powers Facebook graph search
Highly scalable
Based on the BSP model
Mapper-only job on Hadoop
In-memory computation
"Think like a vertex"
More intuitive APIs
For this application, "think like a vertex" becomes "think like a NEURON": each vertex is a neuron and each incoming message a synaptic current, as the sketch below illustrates.
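A minimal sketch of a neuron computation in Giraph's vertex-centric API, assuming a custom NeuronState vertex value and regular-spiking Izhikevich parameters; this illustrates the shape of the API, not the authors' code:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.giraph.edge.Edge;
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Writable;

/** Hypothetical vertex value holding the Izhikevich state (v, u). */
class NeuronState implements Writable {
  double v = -65.0;  // membrane potential (mV)
  double u = -13.0;  // recovery variable (b * v at rest)

  public void write(DataOutput out) throws IOException {
    out.writeDouble(v);
    out.writeDouble(u);
  }

  public void readFields(DataInput in) throws IOException {
    v = in.readDouble();
    u = in.readDouble();
  }
}

public class NeuronComputation extends
    BasicComputation<LongWritable, NeuronState, FloatWritable, DoubleWritable> {

  private static final long TIME_STEPS = 1000;  // iterations: a global constant
  // Izhikevich parameters for a regular-spiking neuron.
  private static final double A = 0.02, B = 0.2, C = -65.0, D = 8.0;

  @Override
  public void compute(Vertex<LongWritable, NeuronState, FloatWritable> vertex,
                      Iterable<DoubleWritable> messages) throws IOException {
    NeuronState s = vertex.getValue();

    // Sum the synaptic currents delivered during the previous superstep.
    double current = 0.0;
    for (DoubleWritable m : messages) {
      current += m.get();
    }

    // One Euler step (dt = 1 ms) of the Izhikevich equations.
    s.v += 0.04 * s.v * s.v + 5.0 * s.v + 140.0 - s.u + current;
    s.u += A * (B * s.v - s.u);

    if (s.v >= 30.0) {
      s.v = C;  // after-spike reset
      s.u += D;
      // Send a current, weighted by the synaptic weight, to every neighbor.
      for (Edge<LongWritable, FloatWritable> e : vertex.getEdges()) {
        sendMessage(e.getTargetVertexId(), new DoubleWritable(e.getValue().get()));
      }
    }
    vertex.setValue(s);

    if (getSuperstep() >= TIME_STEPS - 1) {
      vertex.voteToHalt();  // simulation length reached
    }
  }
}
```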
[Chart: comparison of the running time of each iteration across implementations.]
[Chart: comparison of speeds for a 40 ms simulation; bar labels: 6%, 0%, -11%, -48%, -64%, -91%.]
Conclusion
Hadoop is capable of modeling large-scale neural networks.
Building on IMC and Schimmy, our Mapper-side Schimmy improves MapReduce graph algorithms where the graph structure is read-only.
Vertex-centric approaches such as Giraph showed superior performance. However:
the number of iterations must be specified as a global variable;
problem size is limited by the memory per node;
Giraph is not widely adopted by industry.
[Chart: comparison of speeds, 20 ms to 40 ms simulation.]