R A D I C A LR A D I C A L HRL PROPRIETARY
June 18, 2010 Work performed by HRL under DARPA contract HRL0011-09-C-001 1
Large Scale Simulations
HRL Shared Software Framework
GPU Computing Cluster
Narayan Srinivasa, Aleksey Nogin
Shared Software Infrastructure
• Infrastructure overview – three aspects:
  – Legal “limited LGPL”-like agreement
    • The General Public License (GPL) does not permit incorporating HRL code into proprietary programs. Since the HRL code is a subroutine library, you may consider it more useful to permit linking proprietary applications (which will be our partners' code) with the library. This is allowed by the LGPL.
  – Subversion server for sharing code
  – The API and the software itself
• Summary of the latest:
  – Legal agreement is “stuck” on some technicalities and it would take time to resolve
    • In the meantime, we will rely on existing subcontracts for HRL<->Sub sharing
  – The Subversion server “ExRep” is fully operational
    • Already contains the HRL Shared Infrastructure code.
  – The GPU cluster is fully operational
  – We have ported our infrastructure to GPUs (full 1ms updates!)
  – Most of the multi-GPU/multi-node code is written
    • Some refactoring of the initialization and “glue” code is still needed.
HRL Shared Source Agreement
• Terms (reminder):
  – LGPL-style, but limited to “SyNAPSE Team Members” and “SyNAPSE purposes” only
  – “Shared Source” code can be modified and redistributed to any “SyNAPSE Team Members”
  – Object code has to be accompanied by source – or the source can be placed in the Subversion repository
  – Code for separate pieces that only use the “shared source” infrastructure through its APIs does not have to become a part of “shared source”
    • You do not have to release your models to “shared source”
• Currently “stuck” on export restriction technicalities
  – Would take time to resolve
    • Unfortunately our legal turnaround is very slow
  – For now, we will rely on existing subcontracts for 2-way HRL ↔ Sub sharing
    • Disable Sub ↔ Sub sharing not covered by subcontracts:
      – Provide a Shared area with read-only access for non-HRL people
      – Separate areas for those who want to share with HRL
Subversion Repository
• The Subversion server “ExRep” is fully operational
  – Already contains the HRL Shared Infrastructure code.
• You have to agree to the ExRep Terms and Conditions to get access
  – This is not SyNAPSE-specific and is separate from the subcontracts and the Shared Source Agreement
  – The Agreement binds you as an ExRep user, not your Institution
    • E.g. you promise not to share your account credentials with others
  – Aleksey emailed all prospective users a copy of the Agreement
    • You need to send Aleksey an email stating that you agree.
• SSH public keys are used to grant access
  – Aleksey has emailed all prospective users instructions
  – You need to email Aleksey a copy of your public key
• ExRep is capable of sending email notifications for all commits
  – We are waiting on IT to allow outgoing emails to non-HRL accounts
GPU-Based High Performance Computing Cluster
HRL has purchased a high-performance computing cluster at no cost to DARPA
  – SyNAPSE project will be the primary user
  – Head node:
    • 2 of: NVIDIA Tesla C1060 GPUs, each with:
      – 933 GFLOP peak performance
      – 4GB of GDDR3 memory, at 102 GB/sec
      – PCIe 2.0 x16 interconnect (16 GB/sec)
    • 48GB RAM
    • 2 of: 4-core Nehalem 2.66 GHz CPUs (64-bit)
    • 11TB HDDs (RAID configuration – 8.5TB usable)
  – 91 compute nodes, each:
    • 2 of: NVIDIA Tesla M1060 GPUs
    • 12 GB RAM
    • 2 of: 4-core Nehalem 2.26 GHz CPUs (64-bit)
  – Hi-speed 20Gbps InfiniBand Interconnect
  – 1Gbps Ethernet switch
The cluster is now fully operational
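
As a back-of-the-envelope check on the figures above, the following sketch totals the GPU count, peak GPU throughput, and GPU memory. It assumes the compute-node M1060 boards match the 933 GFLOP peak and 4GB of memory that the slide quotes for the head-node C1060 boards; all identifiers are illustrative.

```cpp
#include <cassert>

// Counts taken from the slide; identifier names are illustrative.
constexpr int kHeadNodeGpus = 2;
constexpr int kComputeNodes = 91;
constexpr int kGpusPerComputeNode = 2;

// Assumption: M1060 and C1060 boards share these per-GPU figures.
constexpr double kPeakGflopPerGpu = 933.0;
constexpr int kMemoryGBPerGpu = 4;

// Total number of GPUs in the cluster (head node plus compute nodes).
constexpr int TotalGpus() {
    return kHeadNodeGpus + kComputeNodes * kGpusPerComputeNode;
}

// Aggregate peak throughput in TFLOP; a theoretical upper bound only.
constexpr double TotalPeakTflop() {
    return TotalGpus() * kPeakGflopPerGpu / 1000.0;
}

// Aggregate GPU memory in GB.
constexpr int TotalGpuMemoryGB() {
    return TotalGpus() * kMemoryGBPerGpu;
}
```

Under these assumptions the cluster has 184 GPUs, roughly 171.7 TFLOP of peak GPU throughput, and 736GB of GPU memory.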
GPU Cluster – InfiniBand Fabric
[Figure: 96-port fast InfiniBand fabric. Six 36-port switches, each serving 16 compute nodes (20Gbps each); links between switches are labeled 4x40Gbps.]
• Switches run at 40Gbps
• Interface cards run at 20Gbps.
• Each 2 switches connected at 160Gbps (4x40Gbps)
GPU and multi-GPU code
• We have ported our infrastructure to GPUs
  – Full 1ms updates, do not have to rely on UCI 1s batching
    • A closer match to CPU simulations and hardware
  – Do not implement axonal delays
  – Artificial “80%/20%” uniformly connected network:
    • 10^5 neurons, 10^7 synapses @ 10Hz – runs in real time
  – A 2D 2-layer random Gaussian connectivity network:
    • 0.3*10^5 neurons, 0.8*10^7 synapses @ 10Hz – 3.2x faster than real time
  – Generic experiment code runs the same on CPU/GPU based on a compilation flag in a configuration file.
• We have mostly implemented an MPI-based framework:
  – Running on multiple GPUs, multiple CPUs, or even a mix of the two
  – Initialization code needs to be rewritten to work with MPI
  – The API for specifying the experiments needs to be updated to work with the new code.
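
One way a compilation flag can switch engines while the experiment code stays unchanged is sketched below. This is only an illustration of the idea: the macro `HRLSIM_USE_CUDA`, the `CpuEngine`/`CudaEngine` types, and `RunSteps` are hypothetical names, and the "dynamics" are a placeholder, not the framework's update rule.

```cpp
#include <cassert>
#include <string>
#include <vector>

// CPU engine: advances every value by one 1 ms step (placeholder dynamics).
struct CpuEngine {
    static std::string Name() { return "cpu"; }
    static void Step(std::vector<float>& v) {
        for (float& x : v) x += 1.0f;
    }
};

#ifdef HRLSIM_USE_CUDA  // hypothetical flag set by the build configuration
// Placeholder with the same interface; a real CUDA engine would copy the
// state to the device and launch a kernel instead of calling the CPU code.
struct CudaEngine {
    static std::string Name() { return "cuda"; }
    static void Step(std::vector<float>& v) { CpuEngine::Step(v); }
};
using Engine = CudaEngine;
#else
using Engine = CpuEngine;
#endif

// Generic experiment code: written once against Engine, whichever it is.
inline float RunSteps(int steps) {
    std::vector<float> v(4, 0.0f);
    for (int t = 0; t < steps; ++t) Engine::Step(v);
    return v[0];
}
```

Because the selection happens at compile time, the chosen engine can be inlined and the unused one costs nothing at run time.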
Shared Simulation & Experimentation Infrastructure
For each experiment, a custom binary is compiled, with 4 components (each listed with its description and its code/glue):
• Network – Creating a description of the neural network to be simulated (connectivity, parameters, etc.)
  – PyNN-style C++ API
  – Translation code
• Inputs – Generating the input signals for the network, or taking the input signals from the virtual environment
  – C++ API
• Computation – Simulating the spiking neural network on a CPU, GPU, or a cluster; may have experiment-specific compilation options
  – C++ API
  – Build scripts
• Analysis – Printing experiment-specific and generic statistics during the simulation; saving synaptic weights and/or spike trains for off-line analysis.
  – C++ APIs: on-line and off-line

• Portions of the code will be experiment-specific
• Portions of the code will be provided by the shared infrastructure
Neural Networks – Levels of Flexibility
Currently we support three different levels of flexibility:
  – Per-simulation – compile-time switches and compile-time global constants defined in build scripts (including “experiment definition files”). Fastest and most efficient, least flexible.
  – Per-neuron – including defining properties of synapses as a property of pre- or post-synaptic neurons.
  – Per-synapse – memory-intensive, would like to avoid.
In general, would prefer to have the least flexibility that we can get away with.
Simulator may support features that are not (yet?) expected to be included in hardware, but we have to be careful.
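
To make the memory argument concrete, the sketch below compares the storage needed for a single `float` parameter at each level, using the 10^5-neuron, 10^7-synapse benchmark network quoted earlier. The constant `kMaxWeight` and all function names are hypothetical; the sketch also assumes a 4-byte `float`.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kNeurons = 100000;     // 10^5, from the benchmark network
constexpr std::size_t kSynapses = 10000000;  // 10^7, from the benchmark network

// Per-simulation: a compile-time constant (hypothetical value) that the
// compiler can fold into the code; it needs no per-element storage.
constexpr float kMaxWeight = 1.0f;
constexpr float ClampWeight(float w) { return w > kMaxWeight ? kMaxWeight : w; }

// Bytes needed to store one float parameter at each flexibility level.
constexpr std::size_t PerSimulationBytes() { return 0; }
constexpr std::size_t PerNeuronBytes() { return kNeurons * sizeof(float); }
constexpr std::size_t PerSynapseBytes() { return kSynapses * sizeof(float); }
```

With a 4-byte `float`, one per-neuron parameter costs about 0.4MB and one per-synapse parameter about 40MB, a factor of 100 for this network, which is why per-synapse parameters are the ones to avoid.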
Neural Model Flexibility
(Table columns: Per-simulation, Per-neuron, Per-synapse)
• Neuron model
  – Per-simulation: LIF or Izhikevich
  – Per-neuron: Izhikevich a, b, c, d
• Synapse model
  – Per-simulation:
    • Enabled or not: Inhibitory STDP, Short-term plasticity, Weighted STDP
    • Output: instant. or exp. decay.
    • Parameters: STDP (A+, A-, t+, t-); Max weight; STP, Inh STDP, etc.
  – Per-neuron:
    • Whether or not: Plastic (post & pre – plastic when both say “yes”); Inhibitory (pre)
    • Parameters: will be made per-neuron as needed
  – Per-synapse: Axonal delays (may decide not to support)
• External inputs
  – Spike trains (“dummy neurons”)
  – Current injection
• Off-line data
  – Per-simulation: What to collect
  – Per-neuron: Spike trains (new)
  – Per-synapse: Synaptic weights
• Outputs
  – Spike trains
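
For readers unfamiliar with the per-neuron a, b, c, d parameters, here is a minimal sketch of a 1ms Izhikevich update. The equations follow the published Izhikevich model; the struct layout, the function names, and the regular-spiking parameter values are illustrative choices, not the framework's code.

```cpp
#include <cassert>
#include <vector>

// Per-neuron Izhikevich parameters (a, b, c, d) and state (v, u).
struct IzhNeuron {
    float a, b, c, d;
    float v, u;
};

// Regular-spiking parameter set from the published model (illustrative choice).
inline IzhNeuron MakeRegularSpiking() {
    IzhNeuron n{0.02f, 0.2f, -65.0f, 8.0f, -65.0f, 0.0f};
    n.u = n.b * n.v;  // start the recovery variable at b*v
    return n;
}

// Advance one neuron by 1 ms with input current I; returns true on a spike.
// v is integrated in two 0.5 ms half-steps for numerical stability.
inline bool StepOneMs(IzhNeuron& n, float I) {
    for (int k = 0; k < 2; ++k) {
        n.v += 0.5f * (0.04f * n.v * n.v + 5.0f * n.v + 140.0f - n.u + I);
    }
    n.u += n.a * (n.b * n.v - n.u);
    if (n.v >= 30.0f) {  // spike: reset v to c and bump u by d
        n.v = n.c;
        n.u += n.d;
        return true;
    }
    return false;
}

// Count spikes over `ms` milliseconds of constant input current.
inline int CountSpikes(int ms, float I) {
    IzhNeuron n = MakeRegularSpiking();
    int spikes = 0;
    for (int t = 0; t < ms; ++t) {
        if (StepOneMs(n, I)) ++spikes;
    }
    return spikes;
}
```

Storing a, b, c, d per neuron costs four floats per neuron, which stays small next to any per-synapse quantity.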
API Overview
[Diagram: boxes show API/control dependencies, not data flow. Some boxes are marked “Users extend, if needed”.]
• Main
• Virtual Environment (optional)
• Experiment – Controls the computation
• User’s code for constructing a network
• BuildNetwork – Incremental construction of neural networks
• Network – Immutable portion of the network state (connectivity, parameters)
• State – Mutable portion of the network state (weights, statistics)
• Compute – Execute simulation steps (CPU and CUDA variants)
• InputGen – Call-back functions to fill in input spike trains and/or currents
• Statistics – At regular intervals, save data for future analysis and print basic stats
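
The control relationships listed above can be sketched as follows. Every type, member, and function name here is a hypothetical stand-in for a box in the diagram, not the framework's actual declaration, and the Compute step is left as a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Immutable portion of the network state (fixed after construction).
struct Network {
    std::size_t num_neurons;
};

// Mutable portion of the network state (changes every step).
struct State {
    std::vector<float> input_current;
    std::size_t steps_done = 0;
};

// Call-back that fills in the input currents for a given step.
using InputGen = std::function<void(std::size_t step, std::vector<float>& currents)>;
// Call-back invoked at regular intervals to save or print statistics.
using Statistics = std::function<void(const State&)>;

// Experiment: controls the computation by driving the other pieces.
inline State RunExperiment(const Network& net, std::size_t steps,
                           std::size_t stats_interval,
                           const InputGen& input_gen, const Statistics& stats) {
    State state;
    state.input_current.assign(net.num_neurons, 0.0f);
    for (std::size_t t = 0; t < steps; ++t) {
        input_gen(t, state.input_current);  // InputGen fills this step's inputs
        // Compute: a real engine would advance neurons and synapses here.
        ++state.steps_done;
        if (stats_interval != 0 && state.steps_done % stats_interval == 0) {
            stats(state);                   // Statistics at regular intervals
        }
    }
    return state;
}
```

Keeping `Network` constant and passing `State` separately makes it explicit which data an engine may modify during a run.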
Building networks incrementally. API Fragments (Simplified)

struct NeuronKind {
    NeuronKind SetInhibitory(bool inhibitory = true);
    NumberGen a, b, c, d; // Izhikevich parameters – constant, or probability distribution parameters
};

class BuildNetwork {
    // Add a new set of neurons to the network
    Population NewPopulation(int size, NeuronKind & neuron);
};

struct SynapseKind {
    NumberGen weight; // Initial weight
    NumberGen delay;  // Axonal delay
};

class Population {
    // Create new synapses to another population. Each call returns the number of synapses.
    int ConnectFull(NeuronPopulation& to, SynapseKind & synapse);
    int Connect1to1(NeuronPopulation& to, SynapseKind & synapse);
    int ConnectRandom(NeuronPopulation& to, float probability, SynapseKind & synapse);
    int ConnectGauss(NeuronPopulation& to, float max_probability, float expected_inputs, SynapseKind);
    int ConnectFixedPreNum(NeuronPopulation& to, float n, const SynapseKind & synapse);
};
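
The fragments describe `NumberGen` only as "constant, or probability distribution parameters". A minimal sketch of such a type is shown below, assuming a uniform range as the only distribution; the class body, the `Uniform` factory, and the call operator are all hypothetical, and only the `SynapseKind` field names come from the slide.

```cpp
#include <cassert>
#include <random>

// Hypothetical NumberGen-like type: either a constant or a uniform range.
class NumberGen {
public:
    // Implicit on purpose, so a plain number can be used as a constant.
    NumberGen(float constant) : lo_(constant), hi_(constant) {}

    // Uniformly distributed value between lo and hi (requires lo <= hi).
    static NumberGen Uniform(float lo, float hi) { return NumberGen(lo, hi); }

    // Draw one value; a constant generator never touches the RNG.
    template <class Rng>
    float operator()(Rng& rng) const {
        if (lo_ == hi_) return lo_;
        return std::uniform_real_distribution<float>(lo_, hi_)(rng);
    }

private:
    NumberGen(float lo, float hi) : lo_(lo), hi_(hi) {}
    float lo_, hi_;
};

// Field names as on the slide; the NumberGen behind them is the sketch above.
struct SynapseKind {
    NumberGen weight;  // Initial weight
    NumberGen delay;   // Axonal delay
};
```

A constant weight with a randomized delay would then read `SynapseKind{NumberGen(0.5f), NumberGen::Uniform(1.0f, 20.0f)}`.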
Building networks incrementally. Example
BuildNetwork build;
Population excitatory = build.NewPopulation(800, NeuronKind());
Population inhibitory = build.NewPopulation(200, NeuronKind().SetInhibitory());
excitatory.ConnectRandom(excitatory, 0.2); // E -> E
excitatory.ConnectRandom(inhibitory, 0.2); // E -> I
inhibitory.ConnectRandom(excitatory, 0.2); // I -> E
inhibitory.ConnectRandom(inhibitory, 0.2); // I -> I
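
To show what the four `ConnectRandom` calls produce, the sketch below runs the same 800/200 example against toy stand-in types. `ToyNetwork`, `ToyPopulation`, and the free function `NewPopulation` are hypothetical; the toy connects each (pre, post) pair independently with the given probability and allows self-connections, which may differ from the framework's behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Toy container for the network being built (hypothetical).
struct ToyNetwork {
    std::mt19937 rng{12345};
    int num_neurons = 0;
    std::vector<std::pair<int, int>> synapses;  // (pre, post) neuron ids
};

class ToyPopulation {
public:
    ToyPopulation(ToyNetwork& net, int first, int size)
        : net_(&net), first_(first), size_(size) {}

    // Connect each (pre, post) pair independently with the given probability.
    // Returns the number of synapses created.
    int ConnectRandom(const ToyPopulation& to, float probability) {
        std::bernoulli_distribution coin(probability);
        int added = 0;
        for (int i = 0; i < size_; ++i) {
            for (int j = 0; j < to.size_; ++j) {
                if (coin(net_->rng)) {
                    net_->synapses.emplace_back(first_ + i, to.first_ + j);
                    ++added;
                }
            }
        }
        return added;
    }

private:
    ToyNetwork* net_;
    int first_;  // id of the first neuron in this population
    int size_;
};

// Append a new population of `size` neurons to the network.
inline ToyPopulation NewPopulation(ToyNetwork& net, int size) {
    ToyPopulation p(net, net.num_neurons, size);
    net.num_neurons += size;
    return p;
}

// The 800 excitatory / 200 inhibitory example, run against the toy types.
inline std::size_t BuildExampleNetwork(ToyNetwork& net) {
    ToyPopulation excitatory = NewPopulation(net, 800);
    ToyPopulation inhibitory = NewPopulation(net, 200);
    excitatory.ConnectRandom(excitatory, 0.2f);  // E -> E
    excitatory.ConnectRandom(inhibitory, 0.2f);  // E -> I
    inhibitory.ConnectRandom(excitatory, 0.2f);  // I -> E
    inhibitory.ConnectRandom(inhibitory, 0.2f);  // I -> I
    return net.synapses.size();
}
```

With 1000 neurons and a connection probability of 0.2 for every pair, the expected total is about 200,000 synapses, or roughly 200 per neuron.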
Overview of the Shared Framework Code
• Major components:
  – The simulator and related glue code (meant to be immutable)
  – mk/config file – selects the main parts:
    • Which experiment to run (some experiments have variants)
    • Which computation engine to use (cpu or cuda)
    • Which communication engine to use (null or mpi)
  – Experiment-definition file (roughly one per experiment):
    • Defines the per-simulation parameters
    • Specifies which files contain the experiment code modules
  – Experiment code:
    • May be split into several files
    • Pieces of an experiment code can be reused in different experiments
  – Analyzers for off-line data analysis
    • Generic and experiment-specific
• Code in ExRep contains the complete simulator, and several sample experiments and analyzers.
ExRep Directory Structure
• ExRep repository root: svn+ssh://svn@exrep.hrl.com/exrep/
  – Note: you do not have access to the ExRep root, only to some particular subdirectories
    • Many versions of Subversion have a problem with this; make sure to use svn version 1.6.11, the latest version, which fixes some bugs related to this scenario.
  – SyNAPSE area in ExRep: …/CRAD/SyNAPSE/
    • SyNAPSE Shared Area – a subdirectory of …/SyNAPSE: …/Code/Shared
      – Mentioned by name in the Shared Source Agreement
      – Right now you’ll only get read access – and only to this subdirectory
    • Other subdirectories of the SyNAPSE directory will be created as needed
SyNAPSE Shared Framework Directory Structure
Under svn+ssh://svn@exrep.hrl.com/exrep/CRAD/SyNAPSE/Code/Shared/
  – The OMake subdirectory contains some generic build scripts for the OMake Build Tool – CUDA, MPI, etc.
  – The Sim subdirectory contains the framework itself, with subdirectories:
    • hrlsim – core framework, C++ code and headers
      – hrlsim/config.h – generated by the build process, summarizes all the per-simulation parameters (with comments) – more on next slide
    • mk – core framework, build scripts
      – mk/config – global configuration file for the build (not in ExRep, will be created on first invocation of the build tool)
      – mk/compute-consts.om – default simulation parameters
    • sample_exp – sample simulation experiments and helper/template code
      – …/mk/*.exp – experiment definition files
      – …/src/ – C++ source files for experiments:
        » set up a network, generate inputs, print extra statistics in on-line mode
      – …/analyzers/ – off-line analysis templates and samples (C++)
      – …/scripts/ – shell/Python scripts for follow-up analysis and visualization
    • Data – directory for temporary off-line data (weights, spikes, etc).
Running an Existing Experiment
• Once:
  – Download the OMake Build Tool from http://omake.metaprl.org/
    • We will probably need to release an updated version soon
  – Go to the Sim directory
  – Run “omake” – this will create a default mk/config file
• Edit the mk/config file
  – It has several configuration variables, each fully commented
    • Which experiment, which computation engine, initial RNG seed, etc
  – The file is re-created by OMake on every run
    • Only value changes for existing variables are allowed/preserved
  – The list of valid experiments is generated from experiment definition files (sample_exp/mk/*.exp)
• Run “omake” to build the custom simulator
  – Generates the ./sim or ./sim-cuda binary
  – Will generate hrlsim/config.h in the process
    • Useful summary of per-simulation parameters
  – Will also build all applicable analyzers
• Run the custom simulator “./sim N” (or “./sim-cuda N”)
  – Where “N” is the simulation duration in virtual seconds
  – “N” can be omitted when the experiment definition file gives a default duration
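
The rule for the “N” argument can be sketched as below. The slides do not show how the real binary parses its command line, so `ResolveDurationSeconds` and `StepsFor` are hypothetical helpers; a non-positive `default_seconds` stands for "the experiment definition file gives no default".

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Resolve the simulation duration in virtual seconds from "./sim N",
// falling back to the experiment's default when N is omitted.
inline long ResolveDurationSeconds(int argc, const char* const* argv,
                                   long default_seconds) {
    if (argc >= 2) {
        const std::string arg = argv[1];
        std::size_t used = 0;
        const long n = std::stol(arg, &used);  // throws on non-numeric input
        if (used != arg.size() || n <= 0) {
            throw std::invalid_argument("N must be a positive integer");
        }
        return n;
    }
    if (default_seconds <= 0) {
        throw std::invalid_argument("no duration given and no default defined");
    }
    return default_seconds;
}

// With 1 ms updates, each virtual second corresponds to 1000 simulation steps.
inline long StepsFor(long seconds) { return seconds * 1000; }
```

For example, “./sim 30” would resolve to 30 virtual seconds, that is 30,000 update steps at 1ms each.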
Defining a New Experiment
• Create a Sim/private_exp directory
  – With the subdirectories following the structure of sample_exp
• Create a new experiment definition file
  – Needs to go into Sim/private_exp/mk/
  – With a .exp extension
  – Use an existing sample file as a template
• Create the C++ code
  – Needs to go into Sim/private_exp/src/
  – The experiment definition file should list all the .cpp files you are using – from either private_exp/src or sample_exp/src
• Proceed as described in the previous slide
  – After you create your new experiment definition file and run “omake” for the first time, the list of available experiments in mk/config will include your new experiment