Implementation of Conduction Delay and Collective Communication in a Parallel Spiking Neural Network Simulator

TAHMINA AKHTER

Master of Science Thesis
Stockholm, Sweden 2011



Master's Thesis in Biomedical Engineering (30 ECTS credits)
at the Computational and Systems Biology Master Programme
Royal Institute of Technology, year 2011
Supervisor at CSC was Cristina Meli
Examiner was Anders Lansner
TRITA-CSC-E 2011:134
ISRN-KTH/CSC/E--11/134--SE
ISSN-1653-5715
Royal Institute of Technology
School of Computer Science and Communication
KTH CSC
SE-100 44 Stockholm, Sweden
URL: www.kth.se/csc


Abstract

Neural networks have an inherently parallel structure and are therefore well suited for implementation in a parallel environment. The Bayesian Confidence Propagation Neural Network (BCPNN), which has been developed over the past thirty years, is the main subject of this thesis. An important issue is the implementation of communication between the processors. The aim of this thesis is to investigate point-to-point and collective communication methods and to examine how they perform in real runs. A second goal is to introduce a time delay in the point-to-point communication. These schemes have been implemented on a Blue Gene supercomputer using the Message Passing Interface (MPI). At the end of the thesis, the two communication methods are compared and the results of the two different models are presented.


To my most loving father


Acknowledgements

I wish to thank my supervisor and examiner Professor Anders Lansner for guiding me through the fascinating world of computational neuroscience. I am very grateful for having had the opportunity to work with him.

I am indebted to my co-supervisor Cristina Meli, who has always been helpful, in particular when I needed assistance in solving the well known. A special thanks to Bernhard Kaplan, who has been a great colleague and, more importantly, one of the best friends I have had in Stockholm at KTH. Thank you all at the Department of Computational Biology for making our department such a pleasant place.

I would like to thank all my family members, especially my dear husband Husain Ahammad Talukdar. I could not have finished this work without his love and support.


Table of Contents

Introduction ................................................................................................................................................... 1

Chapter 1: Biological Background ................................................................................................................ 3

1.1 The Human Nervous System ............................................................................................................... 3

1.2 The Brain ............................................................................................................................................. 3

1.2.1 The Cerebral Cortex ..................................................................................................................... 4

1.2.2 Neocortex ..................................................................................................................................... 5

1.2.3 Cortical Columns .......................................................................................................................... 5

1.2.4 Hypercolumns & Minicolumns .................................................................................................... 5

1.3 Characteristics of Neurons and Synapses ............................................................................................ 6

Chapter 2: Network Structure and Methods .................................................................................................. 8

2.1 Artificial Neural Network.................................................................................................................... 8

2.1.1 Learning rule and training method ............................................................................................. 10

2.1.2 Network Architecture ................................................................................................................. 10

2.2 Detail Mathematical Neural Models ................................................................................................. 12

2.2.1 Hebbian Learning Rule............................................................................................................... 12

2.2.2 The Willshaw-Palm Model ......................................................................................................... 12

2.2.3 The Hopfield Network ................................................................................................................ 13

2.2.4 Attractor Neural Network ........................................................................................................... 14

2.2.5 The Modular Neural Networks ................................................................................................... 14

2.3 The BCPNN Model ........................................................................................................................... 15

2.3.1 Minicolumn, Hypercolumn and Connections ............................................................................. 15

2.3.2 BCPNN Learning Rule ............................................................................................................... 16

2.3.3 BCPNN with hypercolumn ......................................................................................................... 17

Chapter 3: Parallel Implementation of BCPNN .......................................................................................... 20

3.1 Parallelism in Cluster computers ....................................................................................................... 20

3.1.1 Parallel Computers ..................................................................................................................... 20

3.1.2 Distributed Memory and SPMD ................................................................................................. 21

3.1.3 Message Passing Interface (MPI) ............................................................................................... 21

3.1.4 Blue Gene / L Supercomputer .................................................................................................... 25

3.1.5 JUGENE Supercomputer............................................................................................................ 26

3.2 Implementation of BCPNN ............................................................................................................... 26

3.2.1 The Hypercolumn Module ......................................................................................................... 26

Chapter 4: Results ....................................................................................................................................... 27

4.1 Communications Comparison ........................................................................................................... 27

4.1.1 Elapsed Time .............................................................................................................................. 30


4.1.2 Run execution time (Cray Supercomputer) ................................................................................ 31

4.1.3 Count sent spikes per processor ................................................................................................. 32

4.1.4 Count sent spikes per second ...................................................................................................... 33

4.2 Point to Point Communications (with and without delay) ................................................................ 35

4.2.1 Elapsed time ............................................................................................................................... 38

4.2.2 Count sent spikes per processor ................................................................................................. 39

4.2.3 Count sent spike per second ....................................................................................................... 39

Chapter 5: Discussion .................................................................................................................................. 41

5.1 Run on Time ...................................................................................................................................... 41

5.2 Activity of sent spike ......................................................................................................................... 42

Chapter 6: Conclusion ................................................................................................................................. 43

References ................................................................................................................................................... 44

Appendix A ................................................................................................................................................. 48

A.1 Comparison communication ............................................................................................................. 48

A.2 Point to point communication (with and without delay) .................................................................. 49


The List of Figures

Figure 1: Parts of the brain [62]
Figure 2: The six layers of the cerebral cortex [22]
Figure 3: The human cerebral cortex [38]
Figure 4: From neuron to neocortex [11]
Figure 5: A typical nerve cell [39]
Figure 6: Artificial Neural Network
Figure 7: Schematic model of one ANN unit
Figure 8: Mathematical model of one ANN unit
Figure 9: Feedback or recurrent network
Figure 10: Hopfield neural network (one layer)
Figure 11: Architecture of modular neural networks [11]
Figure 12: Schematic architecture of hypercolumn, minicolumn and connections
Figure 13: BCPNN learning rules and phases
Figure 14: A small recurrent BCPNN with six neurons divided into three hypercolumns [13]
Figure 15: A schematic model of a unit in a BCPNN [11]
Figure 16: Distributed memory architecture
Figure 17: General MPI structure
Figure 18: MPI Send/Receive
Figure 19: Point-to-point communication
Figure 20: Collective communication (using MPI_Allgather)
Figure 21: Flowchart of point-to-point implementation
Figure 22: Flowchart of collective implementation
Figure 23: Elapsed time of the network
Figure 24: Processors elapsed time per iteration
Figure 25: Execution time of BG/L
Figure 26: Execution time of Cray
Figure 27: Spikes sent per network
Figure 28: Bytes sent per simulated second
Figure 29: Bytes sent per real second
Figure 30: Flowchart of time delay implementation
Figure 31: Elapsed time of the network
Figure 32: Processors elapsed time per iteration
Figure 33: Bytes sent per simulated second
Figure 34: Bytes sent per real second
Figure 35: Compared elapsed time (s)


Introduction

Framework

The memory of the human brain is an incredible phenomenon. Even though we are uncovering new secrets day by day, our journey towards a full understanding of the brain has not yet reached its goal. In modern science, research into the brain is gathering tremendous momentum. Today neuroscience deals with a vast number of different aspects, ranging from the cognitive sciences to neurophysiology and neurocomputing. Although the brain works as a combined unit, neuroanatomy distinguishes different functional regions, and the cerebral cortex is one of them.

The brain is part of the nervous system and controls conscious and automatic actions of all parts of the body. The biggest part of the brain is the cerebral cortex, which accounts for about 85% of the brain's weight [7]. The cerebral cortex is a thin folded structure that covers the outer surface of the cerebral hemispheres. The structure of the cortex appears homogeneous and is built from a number of different neuron types [10]. The neocortex is a six-layered structure which takes up most of the cerebral cortex and is to a large extent responsible for higher cognitive functions. It has strong internal connections which are able to adapt their strengths. This plasticity is modeled by different learning rules, which will be explained later. The structure of the neocortex is manifested as a modular organization with an enormous storage capacity and sparse activity.

One way to understand the nervous system is to use artificial neural networks (ANNs). ANNs have been developed as generalizations of mathematical models of biological nervous systems. There are some aspects that ANNs have in common with biological neural networks which make them attractive for studying models of the brain: parallel processing of information, repetitive components, redundancy and adaptivity. Inspired by the ideas of Donald Hebb (1949), network models and learning rules have been developed which focus on the learning ability of neural networks. A critique of the ANN approach, presented e.g. by Anders Lansner and Christopher Johansson (May 2004), is that ANNs are built of non-spiking units, which makes it difficult to relate the dynamics of the system to real neural networks. In other words, classical ANNs lack the ability of real neurons to spike. This shortcoming has been overcome by the development of spiking neuron models.

The work presented here deals with neural network models of human memory based on the Bayesian Confidence Propagation Neural Network (BCPNN). With BCPNN and many other learning rules, it has become possible to simulate learning mechanisms of the human nervous system.


Motivation

In 2006, Johansson et al. published "Towards cortex sized artificial neural systems" [9], and his doctoral thesis "An attractor memory model of neocortex" [11] appeared the same year. It describes a functional model of the mammalian cortex and the parallel computation employed in the brain. This thesis is partly based on that earlier work by Christopher Johansson. The aim is to find out more precisely how the different communication schemes work in a parallel environment. The parallel implementation was done on a Blue Gene supercomputer.

From the biological point of view, real neurons communicate through synapses and generate their output signals with a threshold function. Functional models of the cortex rely on modularization and on the connectivity of neurons. Only a few neurons spike at any one time, and a neuron needs a fair amount of synaptic input to fire an action potential.

In particular, we use the BCPNN, which has a columnar structure, and narrow the work down to two goals: first, we compare the communication requirements of the different schemes on a cluster computer at different levels of parallelism; secondly, we introduce a time delay, investigate the number of spikes transmitted by each processor, evaluate how the measurements scale with the time step, and analyze the dynamic performance.

In this thesis, we start by describing the cerebral cortex of mammals, then discuss various neural models, especially the BCPNN, and finish with a discussion of the implementation of this model.

Thesis Structure

The fundamental concepts of the human nervous system and its related parts are contained in chapter 1. There is a description of how memories are organized and an overview of the anatomy of the nervous system. We also present a short description of neurons and synapses.

In chapter 2, we describe the mathematical models, mainly the Hopfield neural network, the Hebbian learning rule and other network structures. A biological interpretation of this learning rule is also presented. The basics of the BCPNN model are introduced, as well as the environment in which the networks can operate.

The focus of chapter 3 is the parallel implementation of the BCPNN model. First, there is a review of the levels of parallelism and short descriptions of two supercomputers which are among today's fastest. Then attention is turned to the BCPNN model and how it can be implemented on a cluster computer.

In chapter 4, the two experiments are explained and some significant results are given. The experiments measure execution time for the given model parameters and count the activation spikes, from which a few basic observations are drawn.

Chapter 5 contains the discussion, including further developments, and chapter 6 the conclusion.

Finally, the raw data of all experimental results are listed in Appendix A.


Chapter 1: Biological Background

In this chapter, I will introduce fundamental facts about the human nervous system and describe the structure of the human brain. I will then explain the mechanisms of neurons and synapses and also review the biological background of this thesis.

1.1 The Human Nervous System

The human nervous system is an information processing system containing a network of specialized cells that act together to perform particular functions. The nervous system is divided into the Central Nervous System (CNS) and the Peripheral Nervous System (PNS). The CNS consists of the brain and the spinal cord, and the PNS consists of nerves.

The human CNS contains billions of neurons. It is responsible for receiving and interpreting signals from the peripheral nervous system and also sends signals out to it. The brain receives sensory input from the spinal cord and functions as the primary receiver, organizer and distributor of information about the body. The spinal cord transmits sensory information from the PNS to the brain and motor information from the brain to the various organs.

The PNS contains only nerve cells and connects the brain and spinal cord to the rest of the body. There are two branches of the PNS: the somatic nervous system and the autonomic nervous system. The somatic nervous system consists of nerve fibers that send sensory information to the CNS, while the autonomic nervous system controls many organs and muscles within the body.

1.2 The Brain

The brain is the major component of the nervous system and contains three parts: the cerebrum, the cerebellum and the brain stem. The cerebrum is the anterior-most part of the brain and is composed of the thalamus, hypothalamus, basal ganglia and amygdala, as well as other structures.

The cerebral cortex is the largest part of the human brain and controls higher functions such as thought, action, perception, reasoning and posture.

Figure 1: Parts of the brain [62]


1.2.1 The Cerebral Cortex

The cerebrum consists of two hemispheres, the right and the left. The two hemispheres look mostly symmetrical, yet it has been shown that each side functions slightly differently from the other. Each of these hemispheres has an outer layer called the cerebral cortex. The cortex covers the outer surface of the cerebrum and cerebellum and plays a key role in versatile functionality. It is responsible for many higher-order functions such as thinking, perceiving, and producing and understanding language. The human cerebral cortex consists of approximately neurons. The cerebral cortex is divided into four lobes: occipital, parietal, temporal and frontal. It is made up of six horizontal layers. The layers lie parallel to the surface of the cortex and result from variations in staining and in the packing density of the cells [10]. The individual layers have different roles:

- Input to the cortex is mediated via layer 4.
- Layer 4 sends most of its output up to layers 2 and 3.
- Layers 2 and 3 are usually seen as one layer that sends its output mostly to layer 5.
- Output from the cortex is mediated via layers 5 and 6.

Figure 2: The six layers of the cerebral cortex [22]

The cerebral cortex constitutes a network of nerve fiber pathways which links all regions of the human brain. The structural connection patterns and synaptic weights provide the interconnections between the areas of the cortex.


1.2.2 Neocortex

The neocortex, commonly known as the phylogenetically modern cortex, is found only in mammals. It is used in special sensory and motor processing. The evolution of the neocortex is responsible for intelligence, such as decision making, learning and intellectual behavior. In humans, 90% of the cerebral cortex is neocortex [66]. The neurons of the neocortex are also arranged in six horizontal layers distinguished by cell types and neuronal connections. Pyramidal and granule cells are two types of neurons that exist in all types of cortex. The neocortex is also divided into frontal, parietal, occipital and temporal lobes. There are about one hundred areas in the human cortex; some of them perform the following functions:

- The primary visual cortex is located in the occipital lobe.
- The primary auditory cortex lies in the temporal lobe.
- The primary sensory cortex is found on the postcentral gyrus (parietal lobe).
- The primary motor cortex is in the precentral gyrus (frontal lobe).

Figure 3 : The human cerebral cortex [38]

1.2.3 Cortical Columns

In the sensory cortical areas, cells or neurons with similar response properties tend to be vertically arrayed in the cortex, forming cylinders known as cortical columns. One cortical column contains thousands of neurons connected in the vertical and also the horizontal plane [18]. In humans, there are about two million functional columns [67].

1.2.4 Hypercolumns & Minicolumns

The term hypercolumn in the primary visual cortex, sometimes referred to as a macrocolumn or the ice cube model, originates from the experiments of Hubel and Wiesel [18][19]. The smallest structures are called minicolumns and are about 30 µm in diameter. These columns are grouped into larger structures called hypercolumns, which are about 0.4-1.0 mm wide.


Figure 4: From neuron to neocortex[11]

Minicolumns connect the cortical layers vertically and usually consist of 80-100 neurons [20]; approximately 100 minicolumns form a hypercolumn [19]. Pyramidal cells (a type of neuron that makes up about 80% of the cells in the neocortex) in layers 2/3 and 5/6 are tightly bound together by excitatory synapses [7].

In the model, each unit in the network corresponds to a cortical minicolumn. The purpose of the hypercolumn is to normalize the activity of the layer 2/3 and 5/6 pyramidal neurons in the minicolumns and to facilitate competitive learning [7].

1.3 Characteristics of Neurons and Synapses

The neuron is the functional unit of the nervous system and transmits electrical signals over long distances throughout the body. The anatomy of the neuron can be divided into three major components: the soma (cell body), the dendrites and the axon. The soma contains a nucleus, mitochondria, ribosomes and other organic structures. Axons are the transmission channels from the soma to the presynaptic terminals, and dendrites are the transmission channels from the synapses to the soma. An axon often has only a few branches and a greater length, whereas dendrites have more branches but are shorter.


Figure 5: A typical nerve cell [39]


Synapses are the elementary structures that mediate the interconnections between neurons. There are fundamentally two types of synapses: chemical and electrical. The most common kind of synapse is the chemical synapse. At a chemical synapse, one neuron releases neurotransmitter from the presynaptic axon terminal, which is bound by receptors on the postsynaptic cell. Most chemical synapses operate in only one direction. Neurotransmitters are often classified as excitatory or inhibitory on the basis of their effects on the postsynaptic membrane. Excitatory neurotransmitters cause depolarization and promote the generation of action potentials, whereas inhibitory neurotransmitters cause hyperpolarization and suppress action potential generation.

A signal starts at the incoming synaptic terminals in the dendrites. The dendrites receive a chemical transmitter substance from the synapses. This creates a change in the membrane potential of the dendritic tree, which spreads to the soma and the spike trigger zone. A spike is produced if the summed depolarization reaches a threshold. The electrical event that carries signals down the axon, away from the soma, is called an action potential or a spike [1].


Chapter 2: Network Structure and Methods

Within the field of Artificial Neural Networks (ANN), there are a huge number of different network structures and various learning algorithms. Here I focus on a few types of network structure and give their context and definitions. First I explain how a basic ANN is built, give its mathematical formulation, and discuss its specific features and how they account for biologically observed phenomena. Finally, the BCPNN model is described.

2.1 Artificial Neural Network

Commonly, artificial neural networks (ANNs) are simply referred to as neural networks. An ANN is an interconnected group of nodes, where the nodes are called neurons, processing elements or units. An ANN is a computational model inspired by the natural brain. The important components of neural networks are units and connections. In terms of an ANN, "network" refers to the interconnections between the neurons (units) in the different layers of the system.

Figure 6: Artificial Neural Network

According to Figure 6, the first layer has input neurons, connected via synapses (connections) to a second "hidden" layer of neurons, and then via further synapses (connections) to the last layer of output neurons.

The units in each layer are interconnected by connections called weights [14]. An ANN basically consists of three types of parameters:

1. A set of input connections that bring activation from other units.
2. Processing units that sum the inputs and then apply a non-linear activation function (squashing / transfer / threshold function).
3. An output line which transmits the result to other units.


Figure 7: Schematic model of one ANN unit

It resembles the brain in two respects: first, knowledge is acquired by the network through a learning process, and secondly, connection weights are used to store the knowledge as connection strengths [16].

Figure 8 : Mathematical model of one ANN unit.

The general characteristics of a single ANN unit (neuron) are described below:

1. Input (x_i): The input comes from the outside world or from the outputs of other neurons, and it can be a discrete or real value.

2. Weight (w_i): The weights are real numbers and determine the contribution of each input channel to the system. The weighted sum of the inputs is computed as:

u = w_1·x_1 + w_2·x_2 + … + w_n·x_n   {1}

3. Threshold or Bias (b): This quantity is usually added to the weighted sum to obtain the input of the transfer function. For simplicity, in most cases the bias is regarded as an additional input with weight w_0 = b and input value x_0 = -1, but in some cases the threshold (bias) is regarded as an additional input with weight w_0 = b and input value x_0 = +1.

Induced local field: v = u + b   {2}

This induced local field (v) is the input of the transfer or activation function.


4. Transfer/Activation function: The transfer function gives the input-output behavior of one unit of an artificial neural network. Several functions can be used as the transfer function: the step (threshold) function, the linear function, the Gaussian, the identity and the sigmoid function [27]. Most of them are non-linear and have a limited output range.

5. Output (y): Each unit computes its output from the weighted input values through a non-linear activation function. The equation is defined as [27]:

y = f(v)   {3}

Depending on the type of activation function, a neuron can produce different types of output values. For example, when the activation function is a step function, the output is 1 if the sum of the inputs is above a threshold, and 0 otherwise.
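As an illustration of eqs. (1)-(3), the following minimal sketch (not code from the thesis; the step activation and the names x, w and b are assumptions) computes the output of a single unit:

/* Minimal sketch of one ANN unit. */
#include <stddef.h>

double unit_output(const double *x, const double *w, size_t n, double b)
{
    double v = b;                       /* induced local field, eq. (2) */
    for (size_t i = 0; i < n; ++i)
        v += w[i] * x[i];               /* weighted sum of inputs, eq. (1) */
    return v > 0.0 ? 1.0 : 0.0;         /* step transfer function, eq. (3) */
}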

Most ANN models are not very good or accurate descriptions of biological networks, but should rather be seen as biologically inspired algorithms. The learning rule and the network architecture are two important aspects of an artificial neural model.

2.1.1 Learning rule and training method

The learning rule is an algorithm that modifies the connections between units. Learning implies that a processing unit is capable of changing its input/output behavior as a result of changes in the environment. The weights are adjusted according to a suitable learning rule and training method.

In the training phase, the correct output for each record is known, and the output nodes can be assigned '1' for correct values and '0' for the others. It is thus possible to compare the network's output with these correct values and find the error term for each node. During the learning phase, the network learns by adjusting the weights so as to predict the correct output from the input samples; this refers to the learning rule in eq. (1).

Various methods to set the strength of the connections exist. One way is to set the weights explicitly, using a priori knowledge. Another way is to train the neural network by feeding it with training patterns and letting it change the weights according to some learning rule. In principle this can adjust all the necessary weights, but it might be complicated for many networks. The more conventional learning processes in artificial neural networks can be divided into supervised, reinforcement and unsupervised learning methods. Supervised learning is a learning process based on a comparison between the network's computed output and the correct expected output, generating an error. In unsupervised learning, an output unit is trained to respond to clusters of patterns within the input. In the following sections (see 2.2.3), the Hopfield model uses an unsupervised training algorithm for updating the weights.

2.1.2 Network Architecture

The network architecture can have several topologies, which can be distinguished in particular by the layer structure and the direction of communication. Connections between units in the network can be sparse or all-to-all. Based on the connection pattern, artificial neural models are divided into feed-forward networks (single- and multi-layer networks) and feed-back (recurrent) networks.


Feed-forward network:

In a feed-forward network, the connections between units contain no directed cycles; there is no feedback between layers. The connections imply that the neurons in each layer of the network have as their only input the activation output of the neurons in the previous layer. This process continues until the signal has gone through all layers and the output is determined. Such networks can include one or more hidden layers. Perceptron learning and the delta rule are classical learning rules for feed-forward networks, which are often used in data mining.

Feed-back network (recurrent network):

Feed-back occurs in almost every part of the human brain. When the connections between units contain one or several directed cycles, the network is known as a recurrent neural network (RNN). These directed cycles are referred to as feed-back loops, since they return the output as new input to the network. Each connection between units has a modifiable real-valued weight. A recurrent network is a powerful, non-linear dynamical system whose state can change repeatedly until it reaches an equilibrium point.

Figure 9: Feedback or recurrent network

Since the function of an ANN is to process information, ANNs are used mainly in fields related to information processing. A wide variety of ANNs are used to model real neural networks and also to study behavior and control in animals and machines. But ANNs are also used for engineering purposes, such as pattern recognition, forecasting and data compression.


2.2 Detail Mathematical Neural Models

Here, we discuss several approaches for creating memory in a neural network context, give their mathematical formulation and explain their learning methods.

2.2.1 Hebbian Learning Rule

The learning paradigms discussed above result in an adjustment of the weights of the connections between units, according to some modification rule. Perhaps the most influential work in the history of connectionism is the contribution of Hebb (1949), who presented a theory of behavior based, as far as possible, on the physiology of the nervous system [23].

The most important concept that emerged from Hebb's work was his formal statement (known as Hebb's postulate) of how learning could occur. Hebb's original statement is often summarized as "cells that fire together, wire together". From the theory:

When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased. [23]

The basic idea is that the weight of the connection between two units should be increased or decreased according to their activation [23]. In Hebb's theory, if two neurons are active simultaneously, their interaction must be strengthened. Hebbian cell assemblies can be represented in a recurrent or all-to-all fashion [24]. The Hebbian rule works well as long as all the input patterns are orthogonal or uncorrelated [64]. In this thesis, the main concern is attractor network theories of cortical associative memory.

If x_i and x_j are the activations of two neurons, W is the connection weight matrix and γ is the learning rate parameter, then Hebb's rule can be written as a modification of the pattern of connectivity:

Δw_ij = γ·x_i·x_j, where Δw_ij is the change of component ij of the connection weight matrix [26].

This form of learning is called the Hebbian learning rule.
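As an illustration, a minimal sketch (not code from the thesis) of applying this rule to a fully connected network of n units:

/* Hebbian update dW[i][j] = gamma * x[i] * x[j] over all unit pairs. */
void hebbian_update(double **W, const double *x, int n, double gamma)
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            W[i][j] += gamma * x[i] * x[j];   /* strengthen co-active pairs */
}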

2.2.2 The Willshaw-Palm Model

Associative memory is a system which stores mappings from specific input representations to specific output representations. It consists of feed-forward connections. The Willshaw-Palm model combines the work of two different investigators: the associative network of the Willshaw model [30], which analyzed the input-output patterns of a two-layer feed-forward network with binary activity, and the Palm model [31], which applied a one-layer feed-forward network trained iteratively according to reinforcement learning with binary synaptic weights. The binary output pattern is defined by a non-linear threshold function [11]. In the training procedure, each pair of the mapping is presented to the network. The learning rule for generating the binary weight matrix is described below, where Q is the total number of patterns.
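A common textbook form of this clipped (binary) Hebbian rule, given here only as a sketch with x^μ denoting the input patterns and y^μ the output patterns (symbols assumed, not taken from the thesis), is:

w_ij = min( 1, Σ_{μ=1}^{Q} y_i^μ · x_j^μ )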


2.2.3 The Hopfield Network

The idea behind the Hopfield network is largely based on Donald Hebb's well-known work: assume that we have a set of neurons which are connected to each other through connection weights [28]. In the discrete Hopfield neural network (HNN), the neurons can either be active or inactive, and there exist parallel input and output channels. The Hopfield model is a single layer of processing units; its input patterns are binary, and the activation function can be a sigmoid or a hard limiter. This means that the output of a Hopfield model depends on whether the input is less than or greater than the threshold value [28]. The HNN is the most implemented associative memory network and serves as a content addressable memory (CAM) with binary threshold units.

If the network is trained with a pattern and then fed with a partial pattern that fits the learned pattern, it will stimulate the remaining neurons of the pattern to become active, completing it. If two neurons are anti-correlated (one neuron is active while the other is not), the connection weights between them are weakened or become inhibitory.

Figure 10 : Hopfield neural network (one layer)

The HNN consists of n fully interconnected neurons, and each connection has a weight w_ij denoting the connection from neuron j to neuron i. The weight matrix is symmetric (w_ij = w_ji) and the network has no self-feedback (w_ii = 0). The reason for this is that self-feedback would create a static network, which in turn means a non-functioning memory.

Assume we have P patterns, each of which is a vector whose components take the value 0 or 1. Mathematically, the weights can be formulated as below:
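A common form of this prescription, given as a sketch (ξ_i^μ denotes component i of stored pattern μ, an assumed symbol), is:

w_ij = (1/N) · Σ_{μ=1}^{P} (2ξ_i^μ − 1)(2ξ_j^μ − 1),   with w_ii = 0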


Where,

P is the number of training patterns.

μ is the index within the set of training patterns.

N is the number of units (neurons) in a pattern.

The patterns represent the activations of the neurons, and the neurons can be in the states s_i ∈ {+1, -1}. To recall an output pattern (of activations), we can pick an arbitrary neuron i in this network, and its update rule can be specified as follows.
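A standard form of this asynchronous update, given as a sketch (θ_i, the threshold of unit i, is an assumed symbol), is:

s_i ← sgn( Σ_j w_ij·s_j − θ_i )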

(assuming a hard-limiter activation function)

Each network has an energy of quadratic form. The energy function can be defined as:
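A standard form of the Hopfield (Lyapunov) energy, given as a sketch with the same assumed symbols, is:

E = −(1/2) · Σ_{i≠j} w_ij·s_i·s_j + Σ_i θ_i·s_i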

where s_i denotes the state of unit i at each iteration.

The (Lyapunov) energy function [65] is a monotonically decreasing function of time [29]. The maximum storage capacity of the Hopfield network, with errors on recall, is about 0.14N patterns [29]. The successive updating of the state of the network is a convergence process, and as a result the energy of the system is minimized [28]. Asynchronous updating from the initial state allows a minimum-energy state to be reached. In the Hopfield network model, oscillations may occur. The model also has a problem with catastrophic forgetting: if the network is loaded with too many patterns, it will not be able to recall any pattern at all.

2.2.4 Attractor Neural Network

An attractor network in its simplest form is an artificial neural network composed of units connected in a recurrent, all-to-all fashion. In this context, the dynamical system is simply the interconnected network of neurons, together with a description of how the states of the neurons and synapses evolve in time. If the connection matrix is symmetric, the dynamics is particularly simple: the computation starts in some initial input state, then follows a trajectory in state space, and may finally converge to a stable output state (attractor). An energy function can be defined over the states of the network. If the connectivity is asymmetric, more complex dynamics in the form of limit cycles and disordered activity may result [24].

2.2.5 The Modular Neural Networks

The inspiration for the modular design of neural networks is mainly biological. Modularity is a key to the efficient and intelligent working of the human brain. Vertebrate nervous systems operate on the principle of modularity: the nervous system is composed of different modules dedicated to different subtasks, working together to accomplish a complex task [32]. The modular organization of the brain is of two types: structural modularity and functional modularity. Structural modularity is evident from the sparse connections between strongly connected neuronal groups. Functional modularity is indicated by the fact that neural modules with different neural response patterns are grouped together.


Figure 11 : Architecture of modular neural networks [11]

Figure 11 shows a modular network with a total of 9 units divided into 3 equally sized hypercolumns (modules). Within each hypercolumn, the units compete for activation. The network is fully connected between hypercolumns, and there are no connections between the units within a hypercolumn.

2.3 The BCPNN Model

The Bayesian Confidence Propagation Neural Network (BCPNN) is a modular recurrent attractor network. This network is very similar to the Hopfield network and implements a form of Hebbian learning.

The main idea underlying the BCPNN learning rule is to use neurons as probability estimators. Units receive input from all other units in the network, representing the confidence of feature detection. Based on the input, the units calculate posterior probabilities of outcomes. This network can be used with both unary-coded activity (spiking activity) and real-valued activity.

Here, we first present the biological background of the BCPNN and its model. Secondly, we present the learning rule of the BCPNN and a detailed explanation of this model.

2.3.1 Minicolumn, Hypercolumn and Connections

The BCPNN has been developed in analogy with the known columnar structure of the neocortex [10][21]. The network consists of units that correspond to cortical minicolumns. The units are grouped into hypercolumn-like modules, and the summed activity within each hypercolumn module is normalized to one. The normalization is a way of controlling the total activity of the network. A minicolumn can


comprise 100 neurons. More precisely, each neuron in a minicolumn sends out axons that terminate in

different hypercolumns. The connection strengths are based on the probabilities of the units firing together

[25].

Figure 12 : Schematic architecture of hypercolumn, minicolumn and connections

2.3.2 BCPNN Learning Rule

The BCPNN learning rule derived in this section uses weights and biases. In the training phase, the weight values (w_ij) sparsely connect the groups of neurons that are active in the stored pattern. The weight values are updated and retain the information contained in the patterns that have been presented to them. During the retrieval phase, the weights of the network are assumed to be fixed, keeping the internal structure unchanged. In other words, the network interprets the input data using its internal representation or knowledge.

Figure 13: BCPNN learning rules and phases


2.3.3 BCPNN with hypercolumn

Figure 14: A small recurrent BCPNN with six neurons divided into three hypercolumns [13]

This BCPNN network consists of N units grouped into H hypercolumns. Here h is the index of a particular hypercolumn and Q_h is the set of all units belonging to hypercolumn h. The units are connected by a real-valued weight matrix, which can be seen as [68]:

for each h = 1, 2, …, H   {4}


It is noted that there are no connections within a hypercolumn.

Figure 15: A schematic model of a unit in a BCPNN [11]

The network is operated by initializing the activity and then running a process called relaxation, in which the activity is updated. When stability is reached, it stops automatically.

The relaxation process starts by computing a potential m_j for each unit j, based on its current support s_j. When the activity is real-valued, the support is computed as in eq. (5), but when the activity is unary coded the computation is simpler, as in eq. (6), and only one unit is active in each hypercolumn (this is used for the spiking implementation).
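The two support computations take the following standard forms in the BCPNN literature; they are given here only as a sketch, with assumed symbols (o_i for unit activity, β_j for the bias of unit j, w_ij for the weights, Q_h for the set of units in hypercolumn h, and i_h for the single active unit of hypercolumn h):

s_j = β_j + Σ_{h=1}^{H} log( Σ_{i∈Q_h} w_ij·o_i )   {5}

s_j = β_j + Σ_{h=1}^{H} log( w_{i_h j} )   {6}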

For both real-valued and unary-coded activity, the potential is updated as in eq. (7):

τ_m · dm_j/dt = s_j − m_j   {7}

The potential is initialized appropriately; here we initialize it with the support generated by the retrieval cue. τ_m is a kind of membrane potential time constant.

In the case of real-valued activity, the new activity o_j is computed by a softmax function (sometimes referred to as a Gaussian activation function) [69], as in eq. (8). It is a transfer function. Here, G is a parameter controlling the shape of the softmax, and the sum in the denominator of eq. (8) runs over all units of hypercolumn h:

o_j = exp(G·m_j) / Σ_{k∈Q_h} exp(G·m_k),   for each h = 1, 2, …, H   {8}

The activity in each hypercolumn can thus be seen as a probability density function, and it always sums to one, Σ_{j∈Q_h} o_j = 1, as in eq. (9).


When unary-coded activity (spiking activity) is used, only one unit is set active in each hypercolumn. If this is done deterministically and synchronously over all hypercolumns, the network will sometimes run into limit cycles. With the deterministic approach done asynchronously, the hypercolumns are updated in a constant order during relaxation. Therefore the spiking activity in the network needs to be updated randomly in order to guarantee well-behaved fixed-point dynamics. This can be done in two different ways: firstly, the spiking unit in each hypercolumn is randomly selected according to a probability density function; secondly, the hypercolumn to which the WTA (winner-take-all) function in eq. (10) is applied is randomly selected:

h = rand({1, 2, …, H})   {10}

Here, the biases and weights are computed from probability estimates p̂. They exploit the statistical properties of the activation and co-activation of units. The presynaptic units are indexed with i and the postsynaptic units with j. The biases and weights in the BCPNN can be computed as in eq. (11):

β_j = log(p̂_j)   and   w_ij = p̂_ij / (p̂_i·p̂_j)   {11}

On a computer with precision limited to δ (δ is the smallest change in the value of a variable), the weights are computed as in eq. (12).

{12}

A central part of the BCPNN algorithm is the probability estimation of the units' activity and co-activity. These can be estimated as in eq. (13):

p̂_i = (1/P)·Σ_{μ=1}^{P} ξ_i^μ   and   p̂_ij = (1/P)·Σ_{μ=1}^{P} ξ_i^μ·ξ_j^μ   {13}

Here, ξ^μ is a pattern, P is the number of patterns and μ is the index of a pattern. This kind of estimation with BCPNN is referred to as counting BCPNN and is similar to a Hopfield network. It is subject to catastrophic forgetting.
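As an illustration, a minimal sketch of the counting estimates and weights of eqs. (11) and (13) is given below (not the simulator code; the binary patterns xi in {0,1} and the small constant eps, which stands in for the precision δ of eq. (12), are assumptions):

/* Counting BCPNN: estimate p_i, p_ij from P binary patterns of N units,
 * then compute biases and weights. */
#include <math.h>

void bcpnn_train(double **w, double *beta, int **xi, int P, int N)
{
    const double eps = 1e-8;                       /* guards log(0) and /0 */
    for (int i = 0; i < N; ++i) {
        double pi = 0.0;
        for (int mu = 0; mu < P; ++mu)
            pi += xi[mu][i];
        pi /= P;                                   /* p_i, eq. (13) */
        beta[i] = log(pi > eps ? pi : eps);        /* bias, eq. (11) */
        for (int j = 0; j < N; ++j) {
            double pj = 0.0, pij = 0.0;
            for (int mu = 0; mu < P; ++mu) {
                pj  += xi[mu][j];
                pij += xi[mu][i] * xi[mu][j];      /* co-activation count */
            }
            pj /= P; pij /= P;                     /* p_j and p_ij, eq. (13) */
            w[i][j] = (pij + eps) / ((pi + eps) * (pj + eps));  /* eq. (11) */
        }
    }
}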


Chapter 3: Parallel Implementation of BCPNN

In this chapter, we introduce how the BCPNN can be implemented in a parallel environment. The internal computation of the BCPNN network requires a great amount of processor time. To be able to simulate large networks, a cluster of computers is required rather than a serial environment. This is the reason why one of the world's fastest supercomputers, Blue Gene/L, is used.

To map the BCPNN model onto the machine, I will first present the basic ideas behind the parallel techniques, discuss their specific features and explain how they relate to cluster computers.

The mapping is: 1 hypercolumn = 1 processor (containing 100 processing units).

3.1 Parallelism in Cluster computers

The mammalian brain is a very powerful and flexible computing system. In this thesis, I discuss efficient implementations of the BCPNN using parallel computation. In a cluster computer, the designer should be able to improve performance proportionally with added processors. Computer clusters are usually constructed to provide optimal communication between processors. The parallel-computation approach to cortical models not only allows larger network models to be run faster than before, but also reflects the naturalistic behavior of the neocortex.

When considering a model of the cortex, it is important that it scales well in terms of implementation. Two main properties of parallel computation are strong scaling and weak scaling. Strong scaling means that we fix the problem size, vary the number of processors and measure the speedup. Weak scaling means that we vary the problem size and the number of processors together, such that the execution time stays the same.
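As a point of reference (standard definitions, not taken from the thesis itself), if T(p) is the execution time on p processors, the two regimes can be quantified as

speedup S(p) = T(1) / T(p), with the total problem size fixed (strong scaling), and
efficiency E(p) = T(1) / T(p), with the problem size per processor fixed (weak scaling),

where ideal behavior corresponds to S(p) ≈ p and E(p) ≈ 1.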

ANNs that use non-local computations typically scale poorly, because they are costly in communication [9]. More biologically realistic models implement learning rules that require only local computations, such as the Hebbian learning rule used in attractor networks [51]. The advantage of local learning rules is that they have the potential to scale well.

3.1.1 Parallel Computers

Traditionally, programs have been written for a serial environment, but for large calculations and fast execution a parallel environment is a must. The reasons for using a parallel computer are not only to save time but also to solve larger problems. Parallel computers can be classified as SIMD (Single Instruction, Multiple Data) or MIMD (Multiple Instruction, Multiple Data), and by their memory organization as shared memory or distributed memory.

In a shared-memory machine, multiple processors operate independently while sharing the same memory space; in a distributed-memory machine, each processor operates on its own local memory and a communication network is required to exchange data between the processors' memories. The Message Passing Interface (MPI) is a portable, standardized library for parallel computers that is used for message communication. MPI is primarily suited to distributed-memory computing, but it can also be used on shared-memory architectures.


3.1.2 Distributed Memory and SPMD

In a distributed-memory system, each node typically has a CPU, its own memory and some form of interconnection network that lets the programs running on the different processors communicate with each other. Each CPU can use the full bandwidth to its local memory without interference from other CPUs. Data is shared across the communication network, and synchronization is handled through message passing.

Figure 16: Distributed memory architecture

Single Program Multiple Data (SPMD) is a parallel technique in which a single program operates on multiple data sets. Implementing SPMD requires that the functionality commonly available in a serial environment be provided in the parallel environment, so that the serial source code can be reused on the distributed-memory machine. With SPMD, tasks can be executed on general-purpose CPUs, whereas SIMD is suited to vector processors.

3.1.3 Message Passing Interface (MPI)

MPI is a message-passing library that was developed in the early 1990s [36]. It allows processes to communicate with one another by sending and receiving messages. The MPI programming model is commonly used in SPMD mode, and each MPI implementation provides its own parallel programming environment. Typically, MPI is used as the communication layer on cluster computers and supercomputers, with distributed-memory systems running a single program as the target platform. MPI offers standardization, portability, performance opportunities, functionality and availability [42].


Figure 17: General MPI structure

MPI provides a rich range of capabilities. It always works with processes, although these are commonly referred to interchangeably as processes or processors. Within a communicator, processes have unique ranks, numbered from 0 to n-1 [43]. The following concepts help in understanding and providing the context of this functionality:

I. MPI Basic Send/Receive

II. Point-to-point Communication

III. Collective Communication

I. MPI Basic Send / Receive:

Message passing is an approach that makes the exchange of data cooperative: data must be both explicitly sent and explicitly received. The message-passing model is defined by:

- a set of processes using only local memory,
- processes communicating by sending and receiving messages,
- data transfer requiring cooperative operations to be performed by each process.


Figure 18: MPI Send/ Receive

The MPI_Send command specifies a send buffer in the sender's memory from which the message data is taken. In addition, the send operation attaches an envelope (a fixed set of fields: destination, tag and communicator) that indicates the message destination and carries distinguishing information. The receive operation MPI_Recv specifies that the message to be received is selected according to the value of its envelope, and that its data is stored into the receive buffer.

Figure 18 depicts one processor that can receive spikes from different sources and also send spikes to different destinations. Note that one processor, i.e. one hypercolumn, contributes only one spiking unit at a time.
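A minimal, self-contained sketch of this blocking send/receive pattern in C++/MPI (illustrative only; the spike payload, ranks and tag are made up for the example, and the program must be started with at least two processes):

    #include <mpi.h>

    // Minimal sketch: rank 0 sends one spiking-unit index to rank 1 with blocking send/receive.
    // The envelope is (destination/source, tag, communicator); the buffer holds the message data.
    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int tag = 0;
        if (rank == 0) {
            int spikeUnit = 42;                                   // index of the active unit (illustrative)
            MPI_Send(&spikeUnit, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int spikeUnit;
            MPI_Status status;
            MPI_Recv(&spikeUnit, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
        }
        MPI_Finalize();
        return 0;
    }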

II. Point-to-point Communication:

MPI point-to-point operations involve message passing between pairs of processes: one task performs a send operation and the other performs the matching receive operation. The originating process 'sends' the message to the destination process, and the destination process 'receives' it. The message commonly includes the data itself, the length of the message, the destination address and possibly a tag. There are two types of sends and receives:

1. Blocking: the process waits until the message has been transmitted.
2. Non-blocking: processing continues even if the message has not been transmitted yet.


Figure 19: Point-to-point communication

In Figure 19, only the non-blocking variant of point-to-point communication is considered. The red circle marks the single spiking unit of a hypercolumn, which is sent to the desired destination processes. On the receiving side, a hypercolumn receives only the expected spiking units from the hypercolumns it is connected to.
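A sketch of how one such non-blocking exchange could look in C++/MPI; the connectivity lists ('sources', 'targets') and the function name are illustrative assumptions, not names taken from the simulator:

    #include <mpi.h>
    #include <cstddef>
    #include <vector>

    // Sketch of one non-blocking spike exchange between a hypercolumn and its connected peers.
    // 'sources' and 'targets' hold the ranks this hypercolumn is wired to.
    void exchangeSpikes(int mySpikeUnit,
                        const std::vector<int>& sources,
                        const std::vector<int>& targets,
                        std::vector<int>& receivedUnits)
    {
        const int tag = 1;
        std::vector<MPI_Request> requests;
        receivedUnits.assign(sources.size(), -1);

        for (std::size_t s = 0; s < sources.size(); ++s) {        // post all receives first
            MPI_Request req;
            MPI_Irecv(&receivedUnits[s], 1, MPI_INT, sources[s], tag, MPI_COMM_WORLD, &req);
            requests.push_back(req);
        }
        for (std::size_t t = 0; t < targets.size(); ++t) {        // then send our own spiking unit
            MPI_Request req;
            MPI_Isend(&mySpikeUnit, 1, MPI_INT, targets[t], tag, MPI_COMM_WORLD, &req);
            requests.push_back(req);
        }
        MPI_Waitall(static_cast<int>(requests.size()), &requests[0], MPI_STATUSES_IGNORE);
    }

Posting the receives before the sends is a common way to avoid unexpected-message buffering; whether the simulator orders its calls this way is not stated in the thesis.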

III. Collective Communication:

Collective communication is coordinated among a group of processes. The size of the data sent must exactly match the size of the data received. Collective communications exist in blocking mode only; blocking here means that a process blocks until its own role in the collective operation is complete, regardless of the completion status of the other participating processes. Collective communications do not use a tag field; they are matched according to the order in which they are executed.

Collective communication provides a more structured alternative to point-to-point communication, and MPI collective operations can use optimized algorithms that exploit knowledge of the network topology and hardware support.

Collective communications are divided into three categories according to function:

1. Synchronization

2. Data Movement

3. Global reduction operations


Figure 20: Collective communication (Using MPI_Allgather)

One-to-all personalized:

Personalized communication sends a unique message to each processor; in one-to-all personalized communication, one processor sends a unique message to every other processor. An MPI_Allgather call concatenates data from all tasks in a group: each task gathers equal-length arrays from all tasks into a single array, as if every task performed a one-to-all broadcast within the group.

Figure 20 shows how one processor sends spikes to, and receives spikes from, all other processors in the network.
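A minimal sketch of this collective exchange in C++/MPI. For simplicity each process contributes a single unit index here, whereas the simulator exchanges spike index vectors; that simplification is an assumption of the example:

    #include <mpi.h>
    #include <vector>

    // Sketch: every process contributes its active unit index and gathers those of all others.
    // With P processes the result holds P entries, one per hypercolumn.
    std::vector<int> allgatherSpikes(int mySpikeUnit)
    {
        int nProcs;
        MPI_Comm_size(MPI_COMM_WORLD, &nProcs);
        std::vector<int> allUnits(nProcs);
        MPI_Allgather(&mySpikeUnit, 1, MPI_INT,
                      &allUnits[0], 1, MPI_INT, MPI_COMM_WORLD);
        return allUnits;
    }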

3.1.4 Blue Gene / L Supercomputer

Blue Gene is a family of supercomputers, among the fastest in the world, whose design was started by IBM in December 1999 [53]. The IBM research project has produced four versions: Blue Gene/L, Blue Gene/C, Blue Gene/P and Blue Gene/Q [54]. Blue Gene/L combines a large memory space with standard compilers and a message passing interface, and its processors have a relatively modest clock frequency, presently 700 MHz [56]. BG/L is designed as a scalable system in which the maximum number of nodes assigned to a single parallel job is 2^16 = 65,536 compute nodes [53]. Each of the two processors on a node has two floating-point units for performing mathematical calculations, and the target peak performance is nearly 500 teraFLOPS [54].

The BG/L system has five different inter-node communication networks: a 3D torus, the backbone for MPI point-to-point communication; a global collective network for collective communication; a global barrier/interrupt network providing an efficient MPI barrier; 10 Gb Ethernet over optical fibre; and 1 Gb control Ethernet/JTAG for system boot, debugging and monitoring. The primary function of the BG/L system is to run MPI codes that make use of these communication networks. MPI programs use collective operations to calculate the size of simulation time steps and to validate physical conservation properties of the simulated system. Most applications use MPI's non-blocking point-to-point messaging to overlap computation and communication; BG/L's distinct computation and communication processors allow the computation processor to offload messaging overhead onto the communication processor. In virtual node mode and co-processor mode the BG/L system is configured so that the two processors on a node run simultaneously, with half of the memory space allocated to each.

In this thesis, I worked in particular with the Blue Gene/L machine provided by PDC at KTH. This BG/L system has 1024 nodes and 2048 processors with 1 TB of memory [55]. The largest simulations in this thesis were executed on this machine using all 2048 processors.

3.1.5 JUGENE Supercomputer

JUGENE, an IBM supercomputer at Jülich in Germany, is today the fastest supercomputer in Europe, offering computing power of up to 1 petaFLOPS [57]. It is thus capable of roughly one quadrillion computing operations per second and is equipped with 294,912 processor cores [57]. About 72,000 nodes are housed in 72 water-cooled racks [57].

3.2 Implementation of BCPNN

Here, we present the parallel implementation of the theoretical cortical model based on the BCPNN learning rule. The BCPNN network is implemented with sparse, random connectivity, and the connection strengths are based on the probabilities of the spiking activity. The network can be used in one of two modes: learning and retrieval. In the learning stage, the input patterns are presented to the network and new patterns are stored on top of the older ones, as in a palimpsest memory [58], by updating the weights. During the retrieval stage, the network interprets the input pattern using its internal representation, or knowledge.

The simplified BCPNN learning rule has been implemented in our simulation. The computational requirements of the cortical model depend largely on the total number of connections. The code is written in the C++ programming language with the MPI communication routines MPI_Allgather() and MPI_Isend()/MPI_Irecv().

3.2.1 The Hypercolumn Module

In our simulation, the number of units and connections in each hypercolumn is constant (100), and the performance of the BCPNN network depends on the number of hypercolumns, which was set to 128, 256, 512, 1024 and 2048. The iteration time step is also constant (0.0005 s), and the iteration over time steps is done locally in each processing unit. Furthermore, the activity in each hypercolumn always sums to one. This means that our cortical model has a sparse activity level that is constant at 1%, which underlies its near-perfect weak scaling.


Chapter 4: Results

In this chapter, the results are arranged in two sections. The first section contains the comparison between point-to-point communication and collective communication. The second section covers point-to-point communication with and without delay.

My aim in this chapter is to present these two experiments and analyse their behaviour as the number of hypercolumns is increased through 128, 256, 512, 1024 and 2048.

4.1 Communications Comparison

In this section, we compare the results of point-to-point communication and collective communication. Before that, I explain how the communication has been implemented; the flowcharts of the point-to-point and collective implementations are shown below.


Figure 21: Flowchart of Point-to-point implementation


Figure 22: Flowchart of Collective implementation


The comparison has been made through the following five investigations.

4.1.1 Elapsed Time

The experiment was intended to compare the scaling capabilities and performance of the MPI library for the communication functionality programmed in C++. The results are presented in the following order: first the elapsed time of the overall network, and then the elapsed time for each loop, where

Loop time = Elapsed time / Number of loops.

In our simulation, 100 patterns were trained and recalled.
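A small sketch of how the elapsed time and loop time could be measured with MPI's wall-clock timer (illustrative; simulateOneStep() and nLoops are placeholders, not names from the simulator, and MPI must already be initialized):

    #include <mpi.h>
    #include <cstdio>

    // Sketch: time the simulation loop and derive the per-iteration (loop) time.
    void timedRun(long nLoops)
    {
        const double t0 = MPI_Wtime();
        for (long loop = 0; loop < nLoops; ++loop) {
            // simulateOneStep();                        // one update + communication step (placeholder)
        }
        const double elapsed = MPI_Wtime() - t0;          // elapsed time of the run
        const double loopTime = elapsed / nLoops;         // loop time = elapsed time / number of loops
        std::printf("elapsed %.3f s, %.6f s per iteration\n", elapsed, loopTime);
    }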

Elapsed time of overall network

Here, I have investigated the communication performance in terms of elapsed time (in seconds) versus the number of processors, for point-to-point and for collective communication. The largest network I ran uses 2048 processors.

Figure 23: Elapsed time of the network

It is observed that for small numbers of processors collective communication takes noticeably less elapsed time than point-to-point communication (roughly 15-20 seconds less at 128 and 256 processors, see Appendix A.1). Figure 23 also shows that this time difference decreases as the number of processors increases, so the two communication methods reach roughly the same execution time for the largest network run here (2048 processors). The same simulations have been run on a Cray supercomputer by Dr. Cristina Meli using a more advanced model.

Elapsed time per iteration

We then measured the execution time of a single step of the dynamics. The plot in Figure 24 shows that the difference between the two communication schemes is on the order of 10^-4 s for small numbers of processors (<1500), and that for large numbers of processors this difference goes to zero. This experiment thus shows the same behaviour as the previous one. From a theoretical point of view it should also be the same, because


Loop time = Elapsed time / Number of loops.

To make the comparison robust, the loop was run exactly 200001 times in every simulation.

Figure 24: Processors elapsed time per iteration

4.1.2 Run execution time (Cray Supercomputer)

This simulation was done on a Cray supercomputer (a Cray XE6 system) by Cristina Meli with almost the same code. The Cray XE6 is based on two new technologies: AMD Opteron 12-core (2.1 GHz) processors and the Cray Gemini interconnect [70]. It has been designed to scale to over 1 million processor cores to meet science demands for scalability, reliability and flexibility [71].

Figure 25: Execution time of BG/L


Figure 26: Execution time of Cray

In Figures 25 and 26, we have plotted the ratio of execution times (collective / point-to-point) for the Blue Gene and the Cray supercomputer.

In Figure 26, the processor counts differ from those in my simulation; the ratio is measured for 72, 144, 288, 576, 1152, 2304, 4608, 9216 and 18432 processors. For the first six processor counts the ratio is roughly the same and rises smoothly without fluctuation; beyond 4608 processors the difference between the two communication schemes becomes significant.

Some dissimilarities between Figures 25 and 26 are expected, since my simulation (Figure 25) was run on the Blue Gene supercomputer with at most 2048 cores, whereas the other simulation (Figure 26) was run on a Cray supercomputer that can scale to many more cores. Judging from the shape of the curves, the two experiments behave more or less the same up to a certain point, so it seems plausible that my simulation would show the same behaviour if the dimension of the system were increased.

4.1.3 Count sent spikes per processor

One of the most notable aspects of this simulation is counting the spike index vectors that are communicated between connected processors. In this experiment, I analyse the number of spikes sent to each processor, together with its mean deviation.

Spike sent per processor

In Figure 27, the computational requirements for sending the activated spikes per processor are estimated in terms of memory usage and processor peak performance. The estimation is based on how many units are connected across all hypercolumns; the time required for the communication itself is not included.


Figure 27: Spike sent per network

The plot shows that with all-to-all (collective) communication the amount of data transferred increases linearly with the number of processors, whereas with point-to-point communication it stays essentially the same for all processor counts. This is because in collective communication spikes are sent to all other processors, while in point-to-point communication spikes are sent only to a limited number of processors.
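A rough back-of-the-envelope view of this difference, assuming (for illustration only) that each hypercolumn contributes a fixed payload of b bytes per time step and is connected to a fixed number k of other hypercolumns:

    % H = number of hypercolumns (= processors)
    % Collective (all-to-all): every spike goes to every processor
    %   bytes_collective ~ H * b        (grows linearly with H)
    % Point-to-point: spikes go only to the k connected processors
    %   bytes_p2p ~ k * b               (independent of H)
    \[ \text{bytes}_{\text{collective}} \approx H\,b, \qquad \text{bytes}_{\text{p2p}} \approx k\,b \]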

4.1.4 Count sent spikes per second

In this section we investigate the total amount of spike data sent, in bytes, relative to elapsed simulated time and to elapsed real time. The larger the number of hypercolumns, the more spikes need to be sent over the network, which is why the output is presented in bytes for the spiking implementation.

Elapsed simulated time

Elapsed simulated time = number of loops * time step.


Figure 28: Bytes sent per simulated second

Figure 28 shows that the number of bytes of spike data sent per simulated second increases linearly with the number of processors. Collective and point-to-point communication give the same output here, because the elapsed simulated time is calculated from a fixed number of loops with a fixed time step.

Elapsed real time

Elapsed real time = number of loops * loop time.

Depending on the model, the amount of spike data sent per real second can be quite different from that per simulated second; a real second refers to wall-clock time actually elapsed in the network. In Figure 29 the point-to-point curve is almost linear, or at least approximately so. The collective communication curve, however, is concave and runs above the point-to-point curve with a steeper incline. The loop time differs between point-to-point and collective communication, which is why this graph shows different trends in bytes of spike data sent per real second.


Figure 29: Bytes sent per real second

4.2 Point to Point Communications (with and without delay)

This experiment is based on delayed communication using the non-blocking (point-to-point) messaging routines. Communication delays can occur in any network; in the cortex, it takes time for a nerve impulse to travel from the sending to the receiving neuron. The timing of successive action potentials is highly irregular, and we can view the irregular spike intervals as a random process.

In this simulation, we compare non-blocking communication with and without delay. The obvious question is: why should we add a delay when transmitting spikes through the network? According to the theoretical concept of the cortical model, the hypercolumns in the brain are not sequentially organized: some hypercolumns are close to each other and some are far apart. Whenever a neuron sends a spike to a nearby neuron it arrives after a short time, whereas a more distant destination takes comparatively longer to reach. From this it follows that we should add a delay value that depends on the distance. The timing of message delivery is an important aspect and can be handled in basically two ways: either the algorithm has some tolerance for receiving messages that have been delayed, or some sort of synchronization delay is added before a message is sent. In the following experiments, we add a delay, determined by the destination, before sending a message. How this time delay has been implemented is described below.

The time delay is computed from the distance d between two hypercolumns and the conduction speed as follows:

\text{Time delay} = \frac{d}{\text{conduction speed}}

where the distance is calculated as

d = \sqrt{(r_1 - r_2)^2 + (c_1 - c_2)^2} \times \text{diameter of a hypercolumn}

Here d stands for the distance and (r, c) are the grid coordinates of a hypercolumn. A fixed conduction speed is used in our simulation; biologically, the conduction speed is the speed of the electrical signal along a nerve.
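A minimal sketch of this delay computation in C++ (illustrative; the coordinate layout and the 'diameter' and 'conductionSpeed' parameters are assumptions for the example, since their values are not listed here):

    #include <cmath>

    // Sketch: conduction delay between two hypercolumns placed on a 2-D grid.
    // (r, c) are grid coordinates; 'diameter' and 'conductionSpeed' are model parameters.
    double conductionDelay(int r1, int c1, int r2, int c2,
                           double diameter /* m */, double conductionSpeed /* m/s */)
    {
        const double dr = static_cast<double>(r1 - r2);
        const double dc = static_cast<double>(c1 - c2);
        const double distance = std::sqrt(dr * dr + dc * dc) * diameter;   // physical distance d
        return distance / conductionSpeed;                                 // time delay = d / speed
    }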

In this model, all computation needed in the training and retrieval phases is performed in each time step, and the remaining communication is postponed to a synchronization point. Although the distances between hypercolumns are small, in a large neural network the delay values slow down the response noticeably. It is nevertheless useful to associate a physical conduction velocity with the hypercolumns, because it accounts for the irregular delays observed between real neurons.

As explained above, two things are examined. First, I measure the elapsed time required both for the overall network and per iteration once the delay values are added. Later, I compare the experimental data on sent spikes, as before.

Before turning to the result analysis, the implementation of the time delay is shown below (Figure 30).


Figure 30: Flowchart of time delay implementation


4.2.1 Elapsed time

This algorithm was implemented in C++ using MPI communication. It is perhaps not surprising that the total execution time with time delay is longer than before. The test compares runs with and without the time delay, both for the overall network and per iteration.

Elapsed time of overall network

As mentioned before, the runs with time delay take longer than those without. In Figure 31, the time-delay curve requires about 18 seconds more elapsed time at 128 processors, and the gap grows as the number of hypercolumns increases. Our largest simulation ran on 2048 processors, where the difference is about 28 seconds (see Appendix A.2). From this experiment alone we cannot say whether the difference will keep growing for still larger networks.

Figure 31: Elapsed time of the network

Elapsed time per iteration

The simulation exhibits a linear increase of the elapsed time per iteration. The integration time step DT = 0.0005 s is small and allows for fine-grained computation. The curve in Figure 32 behaves as expected when compared with the corresponding measurements without delay. If we increase the size of the network and the number of processors, the elapsed time seems to level off below 1 ms per iteration.


Figure 32: Processors elapsed time per iteration

4.2.2 Count sent spikes per processor

Here, we count the spikes sent by the active hypercolumns and check that the counts fit the requirements on connectivity and activity levels. Later on, we also investigate how the activity scales with simulated and real time.

Spike sent per processor

With our model parameters, the result is the same with and without time delay, since the time delay does not affect the number of spikes being sent. The identical results also imply that no sent spikes are lost. The corresponding plot is therefore not included in the report, but it was a useful point to check.

4.2.3 Count sent spikes per second

Here, I repeat the experiment from section 4.1.4. The behaviour per elapsed simulated second is similar to the previous implementation, but per real second the result is different.

Elapsed Simulated Second

As noted previously, the simulated time depends only on the number of loops and the time step, both of which are fixed in our simulation. That is why, in Figure 33, the amount of activated spike data sent per simulated second is the same whether or not the communication includes the time delay.


Figure 33: Bytes sent per simulated second

Elapsed Real Second

Figure 33 shows the number of bytes sent per simulated second with and without time delay, whereas Figure 34 shows the number of bytes sent per real second. According to Figure 34, with time delay the number of bytes sent per real second is lower than without it, and the difference grows with the number of processors, since more processors accumulate more time delay.

Figure 34: Bytes sent per real second


Chapter 5: Discussion

In this thesis different aspects of cortical tasks and their abstract implementations have been discussed. Specifically, learning algorithms and the BCPNN algorithm for spiking networks have been investigated. The main result of this work is that very large-scale experiments using spiking implementations of the BCPNN rule are feasible on contemporary supercomputers, although more realistic experimental setups still need to be evaluated in terms of execution and communication time. This chapter sums up what can so far be said about experimental BCPNN in general.

5.1 Running Time

In this section, I summarize the role of the communication framework in a spiking implementation of the BCPNN model. Most important is the relationship between communication time and computation time.

The elapsed time, in seconds, is determined by the loop time in each hypercolumn. In the first experiment, execution times for two different ways of communicating were compared: point-to-point communication and collective communication, both implemented on a cluster computer using the MPI library. As shown in Figures 23 and 24, the execution time for collective communication is slightly smaller than for point-to-point. In the second experiment, with time delay, the difference in running time is considerably bigger, as shown in Figures 31 and 32.

Figure 35: Compared elapsed time (s)

Combining the two case studies in Figure 35, it is evident from the experimental results that collective communication gives a faster execution time than point-to-point communication.

Figure 26, however, shows a different outcome, since it was obtained on a high-performance Cray supercomputer with a larger network: beyond roughly 4000 processors the behaviour of collective communication changes and it takes more time to execute the network. As we know from Chapter 3, collective communication sends every message to the whole network, so it is to be expected that it takes more time for a large network.

At first glance, point-to-point communication with time delay requires more time when running the network. The question of whether it is feasible to simulate networks with artificial time delays is important, because in real neural networks communication delays depend, among other things, on the distance between two connected neurons. It can be concluded that using artificial communication delays is entirely feasible for the networks that have been studied here.

From the estimates of running time it is also concluded that the communication protocol is likely to become the bottleneck in a large-scale simulation using collective communication.

5.2 Activity of sent spikes

The BCPNN algorithm is computationally expensive compared to other algorithms (see Chapter 2). Because of the large computational requirements the memory usage is also high, and the computation times depend mostly on the speed and number of the processors used.

The architecture of the network is also a matter of concern. In Figure 27, point-to-point communication gives a fairly flat result, whereas collective communication increases dramatically. The experimental results show that the amount of information broadcast is huge and that most of it is apparently unnecessary.

It is also valuable to look at the active spike data over simulated and real time (see Chapter 4). Over simulated time both experiments show the same result, which fits the data well and indicates that our spiking connectivity rule is robust. But when the amount of sent spike data is measured over real time, collective communication sends more than point-to-point communication.


Chapter 6: Conclusion

In this master's thesis project, we have used an abstraction of the mammalian cortex to arrive at an abstract BCPNN model for spiking units, with a strong motivation to increase its computational capabilities. We expect further developments of this model, together with a thorough analysis of the implications and requirements for the communication.

Our work presents results on a brain-inspired algorithm and gives a computational account of neural communication. Instead of testing synaptic connections and learning rules only in a small network context, we have focused on large-scale networks in a parallel environment, in order to contribute to the modelling of the modular structure of the neocortex.

The BCPNN model has been developed over many years, and the work in this thesis followed up on the directions proposed in the doctoral thesis by Christopher Johansson (2006) to implement a BCPNN of P-type. My co-supervisor Cristina Meli expects the further development of a Z-type BCPNN with an intermediate level of complexity.

Every day a new and better-performing supercomputer enters the market, providing more computing power and more memory. To simulate a cortex-sized BCPNN in real time on a cluster with thousands of nodes, faster communication and shorter execution times are needed. I am confident that continuing this work will be rewarding in the years to come.


References

1. Dale Purves, George J. Augustine, David Fitzpatrick, William C. Hall, Anthony-Samuel LaMantia, James O. McNamara, S. Mark Williams (2004): Neuroscience (3rd Edition), Publisher: Sinauer Associates, Inc, Sunderland, Massachusetts, USA. ISBN 0-87893-725-0

2. Patric Hagmann, Lelia Cammoun, Xavier Gigandet , Reto Meuli, Christopher J. Honey, Van J.

Wedeen and Olaf Sporns (2008)-Mapping the structural core of human cerebral cortex. PLoS Biol

6(7): e159. doi:10.1371/journal.pbio.0060159

3. Eric R. Kandel, James H. Schwartz, Thomas M. Jessell (1995): Essentials of Neural Science and Behavior, Publisher: Appleton & Lange, ISBN 0-8385-2245-9 (pp. 184, 187)

4. Joanna Gilbert, “Biology Mad”, the Human Nervous System. Bibliographies and Web Indexes, K-12

Curriculum and Lesson Plans, Presentations, NBII.

5. Rawland Hall, “AP Psychology“, Handouts, Unit 5: Neuroscience

(http://www.rhsmpsychology.com/Handouts/Neuroscience_handouts.htm).

6. Maria Victoria,(2001)”Biology Online” Tutorial: Human Neurology; (http://www.biology-

online.org/8/1_nervous_system.htm)

7. Auke Jan Ijspeert, Toshimitsu Masuzawa and Shinji Kusumoto (2006),” Biologically inspired

approaches to advanced information technology” Publisher: SpringerLink , ISBN: 3540312536

8. Jarvis ED, Gunturkun O, Bruce L, Csillag A, Karten H, Kuenzel W, Medina L, Paxinos G, Perkel DJ,

Shimizu T, Striedter G, Wild JM, Ball GF, Dugas-Ford J, Durand SE, Hough GE, Husband S,

Kubikova L, Lee DW, Mello CV, Powers A, Siang C, Smulders TV, Wada K, White SA, Yamamoto

K, Yu J, Reiner A, Butler AB (2005): Avian brains and a new understanding of vertebrate brain

evolution. Nature Reviews Neuroscience 6, (151-159 ) doi:10.1038/nrn1606

9. Christopher Johansson and Anders Lansner (2004): "Towards Cortex Sized Attractor ANN". Knowledge-Based Intelligent Information and Engineering Systems - KES'04, Wellington, New Zealand, LNAI 3213. Publisher: Elsevier Ltd, doi:10.1016/j.neunet.2006.05.029

10. Rockel AJ, Hiorns RW, Powell TP (1980): "The basic uniformity in structure of the neocortex". Brain 103 (2): 221-244. doi: 10.1093/brain/103.2.221

11. Christopher Johansson, October 2006- “An attractor memory model of neocortex “(Doctoral thesis),

p.42. 49. 51,ISBN: 91-7178-461-6,TRITA-CSC-A-2006.14,ISSN-1653-5723,ISRN-KTH/SCS/A-

06/14.SE

12. Christopher Johansson and Anders Lansner (2006). “A Hierarchical Brain Inspired Computing

Systems”. In Proc. International Symposium on Nonlinear Theory and its Application- NOLTA‟06.

13. Christopher Johansson and Anders Lansner (2001)"A Parallel Implementation of a Bayesian Neural

Network with Hypercolumns", TRITA-NA-P0121, ISSN 1101-2250 ,ISRN KTH/NA/P-01/21S

14. Kendra Cherry, (2005),”What is a neuron?” About.com, A part of The New York Times Company


15. Neural Networks (Middle school level, The American physiological society) Developed by the

Columbus, OH local outreach team

16. Daniel Rios, ”Neuro AI” (2007-2010),Artificial neural network- A neural network tutorial.

17. http://bio1152.nicerweb.com/Locked/media/ch48/cerebral.html

18. Jonathan C. Horton and Daniel L. Adams (April, 2005). “The cortical column: a structure without a

function”. Philos Trans R Soc Lond B Biol Sci; 360(1456): 837–862.

19. Daniel Y. Ts‟o, Mark Zarella and Guy Burkitt (2009). “Whither the hypercolumn?”

doi:10.1113/jphysiol.2009.171082 June 15,The Journal of Physiology, 587, 2791-2805

20. Daniel P. Buxhoeeveden and Manual F. Casanova (2002). “The minicolumn hypothesis in

neuroscience”. Brain, 125 (5): 935-951. doi: 10.1093/brain/awf110

21. Baran Çürüklü and Anders Lansner (2002). “An Abstract Model of a Cortical Hypercolumn”. In

proceedings of the 9th International Conference on Neural Information Processing (ICONIP), pp. 80–

85, Singapore, IEEE Press.

22. http://imueos.blogspot.com/2010/10/organization-of-cerebral-cortex.html

23. Hebb, D.O., 1949: “The Organization of Behavior”. New York: John Wiley Inc

24. Anders Lansner and Erik Fransén (1998) – “Attractor Network Models of Cortical Associative

Memory”. Biocomputing and emergent computation: Proceedings of BCEC97,World Scientific press

25. Christopher Johansson, Anders Lansner and Erik Fransén (2002): Cell Assembly Dynamics in

Detailed and Abstract Attractor Models of Cortical Associative Memory. (SANS) Volume 122, Issue

1, Pages 19-36, doi:10.1078/1431-7613-00072, Published by Elsevier GmbH.

26. Simon Haykin (1999): Neural Networks - A Comprehensive Foundation (2nd Edition), Prentice-Hall Inc.

27. Schoenauer Marc, Equipe Evolution Artificielle et Apprentissage de l‟x, Université Paris Sud,

(September 2007),( http://www.lri.fr/~marc/EEAAX/Neurones/tutorial/aneuron/html/index.html)

28. Hopfield J.J., (April 1982): “Neural networks and physical systems with emergent collective

computational properties”, PNAS vol. 79, no.8, pp. 2554-2558.

29. S. Haykin. Neural Networks - A comprehensive foundation. Prentice Hall International, Inc,

2nd edition, 1999. ISBN 0 13 908385 5.

30. Willshaw, D. J., Buneman, O. P., & Longuet-Higgins, H. C. (1969). Non-holographic associative

memory. Nature, 222(5197), 960-962.

31. Palm, G. (1980). On associative memory. Biological Cybernetics, 36(1), p.19-31.

32. Farooq Azam, (May-2000) –“Biologically Inspired Modular Neural Networks” (Doctoral Thesis).

p.12-32. Publisher: Citeseer, Pages: 149


33. Christopher Johansson and Anders Lansner (2006) – “Mapping of the BCPNN onto Cluster

Computers”. ISSN 1101-2250. TRITA-NA-P0305. SANS.

34. Christopher Johansson and Anders Lansner (2006)- “Attractor Memory with Self-organizing Input”.

A.J. Ijspeert et al. (Eds.): BioADIT, LNCS 3853.p-265-280.

35. A. Sandberg, A. Lansner, K. M. Petersson, and O. Ekeberg. "A palimpsest memory based on an incremental learning rule". Neurocomputing, 32-33, pages 987-994, 2000.

36. MPI: “A Messaging Passing Interface Standard”, 1995, University of Tennessee, Knoxville,

Tennessee.

37. http://blogs.ubc.ca/psyc207/2011/01/19/neurons-how-do-they-communicate/

38. http://bio1152.nicerweb.com/Locked/media/ch48/cerebral.html

39. http://andreeasanatomy.blogspot.com/2011/04/you-need-to-step-up-on-step-to-reach_23.html

40. IBM System Blue Gene Solution: Application Development- An IBM Redbooks publication

41. Paul Burton (February 2009);An introduction to MPI programming, Organizer: ECMWF

42. Blaise Barney;” Message Passing Interface (MPI)”, , Lawrence Livermore National Laboratory UCRL-MI-133316

43. PDC center for high performance computing- MPI

(http://www.pdc.kth.se/education/historical/previous-years-summer-

schools/2009/handouts/lect2.pdf/view)

44. William Gropp, Ewing Lusk, and Anthony Skjellum (1991); “Using MPI: Portable Parallel

Programming with the Message-Passing Interface”, Published: MIT Press, ISBN 0-262-57133-1

45. M. Snir, S. W. Otto, S. Huss-Lederman, D. W. Walker, J. Dongarra (1998): MPI—The Complete Reference: Volume 1, The MPI Core. MIT Press, Cambridge, MA. ISBN 0-262-69215-5

46. Argo Beowulf Cluster: MPI Commands and Examples, Organizer: ACCC Systems Group.(modified

2010-1-28)

47. Overview of Intro to MPI class-Organizer: Dartmouth College( modified February 14, 2011)

48. William Gropp-Tutorial on MPI: The Message-Passing Interface; Mathematics and Computer

Science Division ,Argonne National Laboratory

49. Scientific Computation , General Online Tutorials- The Message Passing Interface (MPI) Workshop;

University of Minnesota, Supercomputing Institute

50. MPI ,C++ Examples (http://people.sc.fsu.edu/~jburkardt/cpp_src/mpi/mpi.html)

51. Christopher Johansson, Örjan Ekeberg, Anders Lansner (2006): "Clustering of stored memories in an attractor network with local competition". International Journal of Neural Systems, 16(6): 393-403.

52. C++ Language Tutorial


53. The Blue Gene/L Team and IBM and Lawrence Livermore National Laboratory (2002): “An

Overview of the Blue Gene/L supercomputer” 0-7695-1524-X/02 $17.00 (c) 2002 IEEE.

54. From Wikipedia, the free encyclopedia, Blue Gene.

55. http://www.pdc.kth.se/resources/computers/bluegene/hebb-description

56. Blaise Barney, Lawrence Livermore National Laboratory, tutorial , „Using the Dawn BG/P System’,

LLNL-WEB-412512

57. From Wikipedia, the free encyclopedia, JUGENE.

58. J.-P. Nadal, G. Toulouse, et al. (1986): "Networks of Formal Neurons and Memory Palimpsests". Europhysics Letters, 1(10): 535-542

59. Christopher Johansson, Martin Rehn and Anders Lansner (2005 Elsevier); “Attractor Neural Networks

with Patchy Connectivity”. Neurocomputing 69 (2006) 627-633. ISBN: 0925-2312

60. Richard B. Wells (April 2005): "Cortical Neurons and Circuits: A Tutorial Introduction". LCNTR Tech Brief, Moscow, ID: the University of Idaho.

61. Richard H. Granger and Robert A. Hearn (2007), “Model of the thalamocortical System”

Scholarpedia, 2(11):1796.

62. http://www.dermaestetica.es/avanzes/blood.php?q=the-main-parts-of-the-brain&page=3

63. Hossein Bidgoli, 2010, “The Handbook of Technology Management”, Vol-3, Hoboken, N.J.: Wiley,

c2010.p-548.

64. Bechtel, W., & Abrahamsen, A. (2002).” Connectionism and the mind”. Second edition, Oxford, UK:

Blackwell.

65. H. Siegelman and S. Fishman. “Attractor systems and analog computation”. In Second Int. Conf. on

Knowledge-Based Intelligent Systems, pages 237–242, April 1998.

66. http://en.wikipedia.org/wiki/Neocortex

67. http://en.wikipedia.org/wiki/Cortical_column

68. Christopher Johansson and Anders Lansner (2005), "A Mean Field Approximation of

BCPNN",TRITA-NA-P0506, Department of Numerical Analysis and Computer Science, Royal

Institute of Technology

69. John A. Hertz, Anders S. Krogh and Richard G. Palmer (1991), “Introduction to the theory of neural

computation", Addison-Wesley, Elsevier Science Publishers, ISBN 0-201-51560-1. (Library of

Congress: QA76.5.H475).

70. http://www.pdc.kth.se/resources/computers/lindgren

71. http://investors.cray.com/phoenix.zhtml?c=98390&p=irol-newsArticle&ID=1430648&highlight=


Appendix A

A.1 Comparison communication

Table of elapsed time (s)

Number of processors    Point-to-point (s)    Collective (s)
128                     51.78                 32.74
256                     65.39                 50.88
512                     70.03                 54.43
1024                    73.58                 60.31
2048                    77.82                 76.29

Table of elapsed time per iteration (loop time, s)

Number of processors    Point-to-point (s)    Collective (s)
128                     0.000258899           0.000163699
256                     0.000326948           0.000254399
512                     0.000350148           0.000272149
1024                    0.0003679             0.000301549
2048                    0.000389098           0.000381454

Table of spike data sent per processor (bytes)

Number of processors    Point-to-point (bytes)    Collective (bytes)
128                     526992                    670480
256                     530417                    1.35358e+06
512                     531189                    2.71556e+06
1024                    532934                    5.4196e+06
2048                    533519                    1.0919e+07


Table of bytes sent per elapsed simulated second

Number of processors    Point-to-point (bytes)    Collective (bytes)
128                     669284                    670480
256                     1.35256e+06               1.35358e+06
512                     2.71436e+06               2.71556e+06
1024                    5.45191e+06               5.4196e+06
2048                    1.09211e+07               1.0919e+07

Table of bytes sent per elapsed real second

Number of processors    Point-to-point (bytes)    Collective (bytes)
128                     1.29255e+06               2.04789e+06
256                     2.06844e+06               2.66035e+06
512                     3.87599e+06               4.98909e+06
1024                    7.4095e+06                9.03989e+06
2048                    1.40338e+07               1.43126e+07

A.2 Point to point communication (with and without delay)

Table of elapsed time (s)

Number of processors    Without time delay (s)    With time delay (s)
128                     51.78                     70.2274
256                     65.39                     85.8835
512                     70.03                     93.0576
1024                    73.58                     98.954
2048                    77.82                     106.24

Table of elapsed time per iteration (loop time, s)

Number of processors    Without time delay (s)    With time delay (s)
128                     0.000258899               0.000351135
256                     0.000326948               0.000429416
512                     0.000350148               0.000465285
1024                    0.0003679                 0.000494772
2048                    0.000389098               0.000531198


Table of spikes sent per processor

Number of processors    Without time delay    With time delay
128                     131749                131739
256                     132604                132557
512                     132796                132725
1024                    133233                133190
2048                    133379                133335

Table of bytes sent per elapsed simulated second

Number of processors    Without time delay (bytes)    With time delay (bytes)
128                     669284                        669236
256                     1.35256e+06                   1.35208e+06
512                     2.71436e+06                   2.71291e+06
1024                    5.45191e+06                   5.45015e+06
2048                    1.09211e+07                   1.09175e+07

Table of bytes sent per elapsed real second

Number of processors    Without time delay (bytes)    With time delay (bytes)
128                     1.29255e+06                   952956
256                     2.06844e+06                   1.57432e+06
512                     3.87599e+06                   2.9153e+06
1024                    7.4095e+06                    5.50775e+06
2048                    1.40338e+07                   1.02762e+07



TRITA-CSC-E 2011:134 ISRN-KTH/CSC/E--11/134-SE

ISSN-1653-5715

www.kth.se