IEEE EIT talk: Large-scale Neural Modeling in MapReduce and Giraph

DESCRIPTION
Using MapReduce and Giraph to model large-scale neural networks

TRANSCRIPT
Large-scale Neural Modeling in MapReduce and Giraph
Presenter: Shuo Yang, Graduate Programs in Software, University of St. Thomas
Co-authors: Nicholas D. Spielman, Neuroscience Program, University of St. Thomas; Jadin C. Jackson, PhD, Department of Biology, University of St. Thomas; Bradley S. Rubin, PhD, Graduate Programs in Software, University of St. Thomas
Special thanks: Bhabani Misra, PhD, Graduate Programs in Software, University of St. Thomas
Why Hadoop & What is Hadoop
Why not supercomputers?
Expensive
Limited access
Limited scalability
Why Hadoop?
Runs on commodity hardware
Scalable
Full-fledged ecosystem & community
Open-source implementation of MapReduce, based on Java
MapReduce Model
[Diagram: a client submits a MapReduce job; the input in HDFS is split, each split feeds a Map task, and the framework writes the MapReduce output back to HDFS.]
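To make the model concrete, here is the canonical word-count example (an illustration, not part of the talk): map emits (word, 1) for each token of its input split, the framework sorts and shuffles by key, and reduce sums the counts.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map phase: emit (word, 1) for every token in the input line.
class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    StringTokenizer tokens = new StringTokenizer(line.toString());
    while (tokens.hasMoreTokens()) {
      word.set(tokens.nextToken());
      context.write(word, ONE);
    }
  }
}

// Reduce phase: the shuffle has grouped all counts for a word; sum them.
class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable c : counts) {
      sum += c.get();
    }
    context.write(word, new IntWritable(sum));
  }
}
```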
Neural Model (Izhikevich model)
[Diagram: each neuron sums the input currents I1, I2, …, In from its neighbors (∑I), updates its membrane potential (∆v), and sends currents to all of its neighbors through the synaptic weight matrix. Simulation results: raster plot of neuron ID (0 to 2500) against time step (0 to 1000).]
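For reference, the Izhikevich model (Izhikevich, 2003) named on the slide couples a membrane potential v and a recovery variable u, driven by the summed synaptic input current I:

```latex
\begin{aligned}
\frac{dv}{dt} &= 0.04v^{2} + 5v + 140 - u + I \\
\frac{du}{dt} &= a\,(bv - u) \\
\text{if } v \ge 30\ \text{mV:}\quad & v \leftarrow c, \qquad u \leftarrow u + d
\end{aligned}
```

Here I is the weighted sum of currents from spiking neighbors, read off the synaptic weight matrix, and the parameters a, b, c, d select the neuron's firing pattern.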
This is a graph structure
Basic MapReduce Implementation
[Diagram: starting from the initial input in HDFS, each mapper reads one neuron (N1, N2, N3) and its local structure, passes the structure through, and emits the currents I1, I2, I3 addressed to its neighbors. After the sort & shuffle, each reducer sums the currents to its neuron, updates N1, N2, or N3, and writes the result back to HDFS as the input for the next job.]
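A minimal sketch of what such a job might look like; the record format ("v u spiked target:weight ...") and the helper applyIzhikevichStep are assumptions for illustration, not the authors' code:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// The mapper passes the neuron record through (so the structure survives the
// shuffle) and, if the neuron spiked, emits one weighted current per synapse.
class NeuronMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
  @Override
  protected void map(LongWritable neuronId, Text record, Context context)
      throws IOException, InterruptedException {
    context.write(neuronId, record);  // graph structure enters the shuffle

    String[] fields = record.toString().split("\\s+");
    if (!Boolean.parseBoolean(fields[2])) {
      return;  // no spike at this time step
    }
    for (int i = 3; i < fields.length; i++) {  // "target:weight" pairs
      String[] syn = fields[i].split(":");
      context.write(new LongWritable(Long.parseLong(syn[0])),
                    new Text("I " + syn[1]));  // a current, tagged "I"
    }
  }
}

// The reducer reassembles the structure, sums the currents, and updates.
class NeuronReducer extends Reducer<LongWritable, Text, LongWritable, Text> {
  @Override
  protected void reduce(LongWritable neuronId, Iterable<Text> values,
                        Context context) throws IOException, InterruptedException {
    String structure = null;
    double inputCurrent = 0.0;
    for (Text v : values) {
      String s = v.toString();
      if (s.startsWith("I ")) {
        inputCurrent += Double.parseDouble(s.substring(2));  // sum currents
      } else {
        structure = s;  // the neuron record passed through by the mapper
      }
    }
    context.write(neuronId, new Text(applyIzhikevichStep(structure, inputCurrent)));
  }

  // Hypothetical helper: parse v and u, apply one Euler step, re-encode.
  private String applyIzhikevichStep(String record, double current) {
    return record;
  }
}
```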
Problems:
Synaptic currents are sent directly to the reducers without local aggregation.
The graph structure is shuffled in each iteration.
In-Mapper Combining (IMC, introduced by Lin & Schatz)
[Diagram: each mapper buffers the currents I1, I2, I3 it generates and locally sums (∑) those addressed to the same neuron, so only one partial sum per destination neuron crosses the network; the reducers then update N1, N2, and N3 as before.]
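A sketch of the pattern, reusing the assumed record format from the basic sketch above: partial sums accumulate in a per-mapper HashMap and are emitted once, in cleanup(). The structure pass-through is omitted from the sketch, but in this variant the structure still travels through the shuffle, as the next line notes.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// In-mapper combining: aggregate currents locally instead of emitting one
// record per synapse.
class ImcNeuronMapper
    extends Mapper<LongWritable, Text, LongWritable, DoubleWritable> {
  private final Map<Long, Double> partialSums = new HashMap<>();

  @Override
  protected void map(LongWritable neuronId, Text record, Context context) {
    String[] fields = record.toString().split("\\s+");
    if (!Boolean.parseBoolean(fields[2])) {
      return;  // no spike, nothing to aggregate
    }
    for (int i = 3; i < fields.length; i++) {  // "target:weight" pairs
      String[] syn = fields[i].split(":");
      long target = Long.parseLong(syn[0]);
      double weight = Double.parseDouble(syn[1]);
      partialSums.merge(target, weight, Double::sum);  // local aggregation
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    // One partial sum per destination crosses the network, not one per synapse.
    for (Map.Entry<Long, Double> e : partialSums.entrySet()) {
      context.write(new LongWritable(e.getKey()), new DoubleWritable(e.getValue()));
    }
  }
}
```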
The graph structure is still shuffled!
Schimmy (introduced by Lin & Schatz)
[Diagram: the mappers emit only the currents I1, I2, I3; the graph structure no longer enters the shuffle. Each reducer remotely reads its partition of the graph structure from HDFS, merge-joins it with the shuffled currents, sums the currents to each neuron, updates N1, N2, and N3, and writes the results back to HDFS.]
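A sketch of the reducer side. It assumes the structure was written by a previous job with the same partitioner and reducer count, so partition i of the structure lines up with reducer i and the two sorted streams can be merge-joined by key; "graph.structure.dir", the SequenceFile layout, and updateNeuron are illustrative assumptions.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SchimmyReducer
    extends Reducer<LongWritable, DoubleWritable, LongWritable, Text> {

  private SequenceFile.Reader graphReader;  // this reducer's graph partition
  private final LongWritable graphKey = new LongWritable();
  private final Text neuronRecord = new Text();

  @Override
  protected void setup(Context context) throws IOException {
    Configuration conf = context.getConfiguration();
    int partition = context.getTaskAttemptID().getTaskID().getId();
    Path part = new Path(conf.get("graph.structure.dir"),
                         String.format("part-r-%05d", partition));
    graphReader = new SequenceFile.Reader(FileSystem.get(conf), part, conf);
  }

  @Override
  protected void reduce(LongWritable neuronId, Iterable<DoubleWritable> currents,
                        Context context) throws IOException, InterruptedException {
    // Merge join: both streams arrive sorted by neuron id, so advance the
    // structure reader until the keys meet, emitting quiet neurons unchanged.
    while (graphReader.next(graphKey, neuronRecord)
           && graphKey.get() < neuronId.get()) {
      context.write(graphKey, neuronRecord);
    }
    double sum = 0.0;
    for (DoubleWritable i : currents) {
      sum += i.get();  // sum currents to this neuron
    }
    context.write(neuronId, updateNeuron(neuronRecord, sum));
  }

  // Hypothetical helper: parse v and u, apply one Izhikevich step, re-encode.
  private Text updateNeuron(Text record, double inputCurrent) {
    return record;
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    graphReader.close();
  }
}
```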
Problems:
Reducers must read remotely from HDFS.
The graph structure is still read and written in each iteration.
Observation: the graph structure is read-only!
Mapper-side Schimmy
[Diagram: the read-only structure stays on the map side. Each mapper joins its neurons (N1, N2, N3) with the static structure and emits only the currents I1, I2, I3; the reducers sum the currents, update N1, N2, and N3, and write only the updated neuron states back to HDFS.]
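One way such a map-side join might look; this is a speculative sketch, not the authors' implementation, and loadStructurePartition plus the "v u spiked" state format are invented for illustration:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Mapper-side Schimmy: the static synaptic structure is loaded once per
// mapper and joined on the map side with the evolving neuron states, so
// neither the structure nor the neuron record enters the shuffle; only the
// currents do.
public class MapperSideSchimmyMapper
    extends Mapper<LongWritable, Text, LongWritable, DoubleWritable> {

  // neuron id -> (target neuron id -> synaptic weight)
  private Map<Long, Map<Long, Double>> synapses;

  @Override
  protected void setup(Context context) {
    synapses = loadStructurePartition(context.getConfiguration());
  }

  @Override
  protected void map(LongWritable neuronId, Text state, Context context)
      throws IOException, InterruptedException {
    // Assumed state format from the previous job: "v u spiked".
    String[] fields = state.toString().split("\\s+");
    if (!Boolean.parseBoolean(fields[2])) {
      return;  // no spike, no output current this time step
    }
    Map<Long, Double> out = synapses.get(neuronId.get());
    if (out == null) {
      return;  // no outgoing synapses
    }
    for (Map.Entry<Long, Double> syn : out.entrySet()) {
      context.write(new LongWritable(syn.getKey()),
                    new DoubleWritable(syn.getValue()));
    }
  }

  // Hypothetical helper: read this mapper's slice of the read-only weights.
  private Map<Long, Map<Long, Double>> loadStructurePartition(Configuration conf) {
    return new HashMap<>();
  }
}
```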
Drawbacks of graph algorithms in MapReduce
Non-intuitive and hard to implement
Iterative algorithms are not expressed efficiently
Not optimized for large numbers of iterations
[Diagram: every iteration is a complete MapReduce job (input from HDFS, mapper, intermediate files, reducer, output to HDFS), so each iteration pays a job-startup penalty plus disk penalties for the intermediate and final writes.]
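The iteration cost is easiest to see in the driver. A sketch (illustrative names, not the authors' code) that launches one full Hadoop job per simulated time step:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SimulationDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path input = new Path(args[0]);
    int timeSteps = Integer.parseInt(args[1]);

    // One complete MapReduce job per time step: each pays the startup
    // penalty, and state makes a round trip through HDFS every iteration.
    for (int step = 0; step < timeSteps; step++) {
      Job job = Job.getInstance(conf, "neural-step-" + step);
      job.setJarByClass(SimulationDriver.class);
      // Mapper/reducer classes for whichever variant is run would be set here.
      Path output = new Path(input.getParent(), "step-" + (step + 1));
      FileInputFormat.addInputPath(job, input);      // read state from HDFS
      FileOutputFormat.setOutputPath(job, output);   // write state back to HDFS
      if (!job.waitForCompletion(true)) {
        System.exit(1);  // abort on failure
      }
      input = output;  // this step's output is the next step's input
    }
  }
}
```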
Giraph
[Diagram: the graph (N1, N2, N3 and their currents) is loaded from HDFS once; each superstep updates the neurons in memory and ends at a synchronous barrier; only the final results are written back to HDFS.]
Iterative graph processing system
Powers Facebook graph search
Highly scalable
Based on the BSP model
Mapper-only job on Hadoop
In-memory computation
"Think like a vertex"
More intuitive APIs
For this application, "think like a vertex" becomes "think like a NEURON": each vertex is a neuron and each incoming message a synaptic current, as the sketch below illustrates.
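A minimal sketch of a neuron computation in Giraph's vertex-centric API, assuming a custom NeuronState vertex value and regular-spiking Izhikevich parameters; this illustrates the shape of the API, not the authors' code:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.giraph.edge.Edge;
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Writable;

/** Hypothetical vertex value holding the Izhikevich state (v, u). */
class NeuronState implements Writable {
  double v = -65.0;  // membrane potential (mV)
  double u = -13.0;  // recovery variable (b * v at rest)

  public void write(DataOutput out) throws IOException {
    out.writeDouble(v);
    out.writeDouble(u);
  }

  public void readFields(DataInput in) throws IOException {
    v = in.readDouble();
    u = in.readDouble();
  }
}

public class NeuronComputation extends
    BasicComputation<LongWritable, NeuronState, FloatWritable, DoubleWritable> {

  private static final long TIME_STEPS = 1000;  // iterations: a global constant
  // Izhikevich parameters for a regular-spiking neuron.
  private static final double A = 0.02, B = 0.2, C = -65.0, D = 8.0;

  @Override
  public void compute(Vertex<LongWritable, NeuronState, FloatWritable> vertex,
                      Iterable<DoubleWritable> messages) throws IOException {
    NeuronState s = vertex.getValue();

    // Sum the synaptic currents delivered during the previous superstep.
    double current = 0.0;
    for (DoubleWritable m : messages) {
      current += m.get();
    }

    // One Euler step (dt = 1 ms) of the Izhikevich equations.
    s.v += 0.04 * s.v * s.v + 5.0 * s.v + 140.0 - s.u + current;
    s.u += A * (B * s.v - s.u);

    if (s.v >= 30.0) {
      s.v = C;  // after-spike reset
      s.u += D;
      // Send a current, weighted by the synaptic weight, to every neighbor.
      for (Edge<LongWritable, FloatWritable> e : vertex.getEdges()) {
        sendMessage(e.getTargetVertexId(), new DoubleWritable(e.getValue().get()));
      }
    }
    vertex.setValue(s);

    if (getSuperstep() >= TIME_STEPS - 1) {
      vertex.voteToHalt();  // simulation length reached
    }
  }
}
```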
[Chart: comparison of the running time of each iteration across implementations.]
[Chart: comparison of speeds for a 40 ms simulation; bar labels: 6%, 0%, -11%, -48%, -64%, -91%.]
Conclusion
Hadoop is capable of modeling large-scale neural networks.
Building on IMC and Schimmy, our Mapper-side Schimmy improves MapReduce graph algorithms where the graph structure is read-only.
Vertex-centric approaches such as Giraph showed superior performance. However:
the number of iterations must be specified as a global variable;
problem size is limited by the memory per node;
Giraph is not widely adopted by industry.
[Chart: comparison of speeds, 20 ms to 40 ms simulation.]