IJCNN, July 27, 2004 [email protected] 1
Extending SpikeProp
Benjamin Schrauwen, Jan Van Campenhout
Ghent University, Belgium
Overview
● Introduction
● SpikeProp
● Improvements
● Results
● Conclusions
Introduction
● Spiking neural networks are receiving increased attention:
  ● Biologically more plausible
  ● Computationally stronger (W. Maass)
  ● Compact and fast implementation in hardware possible (analogue and digital)
  ● Have a temporal nature
● Main problem: supervised learning algorithms
SpikeProp
● Introduced by S. Bohte et al. in 2000
● An error-backpropagation learning algorithm
● Only for SNNs using “time-to-first-spike” coding

[Figure: time-to-first-spike coding; the spike time t is roughly inversely proportional to the encoded analogue value a (t ~ 1/a)]
Architecture of SpikeProp
● Originally introduced by Natschläger and Ruf
● Every connection consists of several synaptic connections
● All 16 synaptic connections have enumerated delays (1-16 ms) and different weights, originally the same filter
SRM neuron
● Modified Spike Response Model (Gerstner)

[Figure: SRM membrane potential as a function of time t]

● The neuron reset is of no interest because only one spike is needed!
Idea behind SpikeProp
Minimize the SSE between the actual and desired output spike times.
Change each weight along the negative direction of the gradient.
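The two steps above can be written out explicitly; this is the standard gradient-descent formulation (the symbols t_j^a, t_j^d for the actual and desired output spike times and the learning rate η follow the usual SpikeProp notation, they are not taken from this slide):

```latex
E = \frac{1}{2} \sum_{j \in J} \left( t_j^{a} - t_j^{d} \right)^{2},
\qquad
\Delta w_{ij}^{k} = -\eta \, \frac{\partial E}{\partial w_{ij}^{k}}
```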
Math of SpikeProp
Only the output layer is given here.
Linearise around the threshold-crossing time.
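The slide's equations did not survive extraction; as a sketch, the output-layer rule published by Bohte et al. has this shape, where y_i^k(t) is the unweighted postsynaptic potential of synaptic terminal k from presynaptic neuron i and Γ_j is the set of neurons feeding neuron j (notation assumed from the cited paper, not read off this slide):

```latex
\Delta w_{ij}^{k} = -\eta \, y_i^{k}(t_j^{a}) \, \delta_j,
\qquad
\delta_j = \frac{t_j^{d} - t_j^{a}}
{\sum_{i \in \Gamma_j} \sum_{l} w_{ij}^{l} \,
 \left.\dfrac{\partial y_i^{l}(t)}{\partial t}\right|_{t = t_j^{a}}}
```

The denominator is exactly the linearisation around the threshold-crossing time mentioned above: the slope of the membrane potential at the moment the threshold is reached.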
Problems with SpikeProp
● Overdetermined architecture
● Tendency to get stuck when a neuron stops firing
● Problems with weight initialisation
Solving some of the problems
● Instead of enumerating parameters, learn them:
  ● Delays
  ● Synaptic time constants
  ● Thresholds
● We can use a much more limited architecture
● Add a specific mechanism to keep neurons firing: decrease the threshold
Learn more parameters
● Quite similar to the weight update rule
● Gradient of the error with respect to the parameter
● Parameter-specific learning rate
Math of the improvements - delays
The delta is the same as for the weight rule; thus there is a different delta formula for the output layer than for the inner layers.
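As a hedged reconstruction of what such a delay rule looks like: since the PSP depends on t − t_i − d_{ij}^k, its derivative with respect to the delay is minus its time derivative, so with the same δ_j as in the weight rule one obtains (a sketch under the paper's conventions, not the slide's lost formula):

```latex
\Delta d_{ij}^{k} = -\eta_d \, \frac{\partial E}{\partial d_{ij}^{k}}
= \eta_d \, \delta_j \, w_{ij}^{k} \,
\left.\frac{\partial y_i^{k}(t)}{\partial t}\right|_{t = t_j^{a}}
```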
Math of the improvements – synaptic time constants
Math of the improvements - thresholds
What if training gets stuck?
● If one of the neurons in the network stops firing, the training rule stops working
● Solution: actively lower the threshold of a neuron whenever it stops firing (multiply it by 0.9)
● This is the same as scaling all the weights up
● Improves convergence
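The threshold-lowering heuristic is simple enough to sketch in code; the neuron record and its field names are hypothetical, only the multiply-by-0.9 rule comes from the slide:

```python
def boost_if_silent(neuron, decay=0.9):
    """Lower the firing threshold of a neuron that produced no spike
    on the last forward pass, so gradient-based training can resume.
    Equivalent to scaling all incoming weights up by 1/decay.
    The dict layout ('threshold', 'last_spike_time') is hypothetical."""
    if neuron["last_spike_time"] is None:  # neuron stayed silent
        neuron["threshold"] *= decay
    return neuron

# A silent neuron gets its threshold lowered; a firing one is untouched.
silent = {"threshold": 1.0, "last_spike_time": None}
boost_if_silent(silent)
```

Applied after every forward pass, this keeps every neuron's spike time defined, so the backpropagated deltas never vanish for lack of a spike.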
What about weight initialisation
● Weight initialisation is a difficult problem
● The original publication gives only a vague description of the process
● S. M. Moore contacted S. Bohte personally to clarify the subject for his master's thesis
● Weight initialisation is done by a complex procedure
● Moore concluded that ”weights should be initialized in such a way that every neuron initially fires, and that its membrane potential doesn’t surpass the threshold too much”
What about weight initialisation
● In this publication we chose a very simple initialisation procedure:
  ● Initialise all weights randomly
  ● Afterwards, adjust the weights such that the sum of all weights equals 1.5
● Convergence rates could be increased by using a more complex initialisation procedure
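One way to read the slide's procedure is: draw random weights, then rescale them all so their sum is 1.5 (the slide could also mean adjusting a single weight; the rescaling variant and the function below are assumptions):

```python
import random

def init_weights(n, total=1.5, seed=None):
    """Initialise n synaptic weights randomly, then rescale them so
    that their sum equals `total` (1.5 in the slides), which gives
    every neuron enough initial drive to fire without overshooting
    the threshold by much."""
    rng = random.Random(seed)
    w = [rng.random() for _ in range(n)]
    scale = total / sum(w)
    return [wi * scale for wi in w]
```

Because the rescaled weights are all positive and sum to a fixed value, every neuron receives a comparable amount of initial drive regardless of its fan-in.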
Problem with large delays
• During testing of the algorithm a problem arose when the trained delays got very large: delay learning stopped
• If an input spike arrives only after the neuron's own output spike, there is a problem
• Solved by constraining the delays

[Figure: output of the neuron preceding a delayed input to the neuron]
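Constraining the delays can be done with a simple clamp after each update; the 1-16 ms bounds below are borrowed from the original enumerated-delay range and are an assumption here:

```python
def clamp_delay(d, d_min=1.0, d_max=16.0):
    """Keep a learned delay (in ms) inside [d_min, d_max] so that a
    presynaptic spike cannot be pushed past the postsynaptic neuron's
    own output spike.  Bounds follow the original 1-16 ms enumeration."""
    return min(max(d, d_min), d_max)
```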
Results
● Tested on binary XOR (MSE = 1 ms)
• Bohte:
  • 3-5-1 architecture
  • 16 synaptic terminals
  • 20*16 = 320 weights
  • 250 training cycles
• Improvements:
  • 3-5-1 architecture, 2 synaptic terminals, 20*2 = 40 weights, 130 training cycles, 90% convergence
  • 3-3-1 architecture, 2 synaptic terminals, 12*2 = 24 weights, 320 training cycles, 60% convergence
Results
● Optimal learning rates (found by experiment)
● Some rates seem very high, but that is because the values we work with are times expressed in ms
● The idea that the learning rate must be approximately 0.1 is only correct when inputs and weights are normalised!
Conclusions
● Because parameters can be learned, no enumeration is necessary, so architectures are much smaller
● For XOR:
  ● 8 times fewer weights needed
  ● Learning converges faster (50% of the original)
  ● No complex initialisation functions
  ● Positive and negative weights can be mixed
  ● But convergence deteriorates with further reduction of the weights
Conclusions
● The technique has only been tested on a small problem; it should be tested on real-world applications
● But we are currently preparing a journal paper on a new backprop rule that:
  ● supports a multitude of coding hypotheses (population coding, convolution coding, ...)
  ● has better convergence
  ● has simpler weight initialisation
  ● ...