POSTER PRESENTATION Open Access

Reinforcement learning of 2-joint virtual arm reaching in motor cortex simulation
Samuel A Neymotin1*, George L Chadderdon1, Cliff C Kerr1,2, Joseph T Francis1, William W Lytton1,3

From Twenty First Annual Computational Neuroscience Meeting: CNS*2012
Decatur, GA, USA. 21-26 July 2012

Few attempts have been made to model learning of sensory-motor control using spiking neural units. We trained a 2-degree-of-freedom virtual arm to reach for a target using a spiking-neuron model of motor cortex that maps proprioceptive representations of limb position to motor commands and undergoes learning based on reinforcement mechanisms suggested by the dopaminergic reward system. A 2-layer model of layer 5 motor cortex (M1) passed motor commands to the virtual arm and received proprioceptive position information from it. The reinforcement algorithm trained synapses of M1 using reward (punishment) signals based on visual perception of decreasing (increasing) distance of the virtual hand from the target. Output M1 units were partially driven by noise, creating stochastic movements that were shaped to achieve desired outcomes.
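As a concrete illustration of this loop, the sketch below (not the authors' code) computes the planar hand position of a 2-joint arm from its joint angles and converts changes in hand-target distance into the binary reinforcement signal described above; the segment lengths and function names are illustrative assumptions.

import numpy as np

L_UPPER, L_FORE = 0.30, 0.30  # assumed lengths of upper arm and forearm (m)

def hand_position(shoulder_ang, elbow_ang):
    """Planar forward kinematics of the 2-degree-of-freedom arm (angles in radians)."""
    elbow = np.array([L_UPPER * np.cos(shoulder_ang),
                      L_UPPER * np.sin(shoulder_ang)])
    hand = elbow + np.array([L_FORE * np.cos(shoulder_ang + elbow_ang),
                             L_FORE * np.sin(shoulder_ang + elbow_ang)])
    return hand

def reinforcement(prev_dist, hand, target):
    """Return (+1, new_dist) if the hand moved closer to the target, else (-1, new_dist)."""
    dist = float(np.linalg.norm(hand - target))
    return (1 if dist < prev_dist else -1), dist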

The virtual arm consisted of a shoulder joint, upper arm, elbow joint, and forearm. The upper arm and forearm were each controlled by a pair of flexor/extensor muscles. These muscles received rotational commands from 192 output cells of the M1 model, while the M1 model received input from muscle-specific groups of sensory cells, each of which was tuned to fire over a range of muscle lengths. The M1 model had 384 excitatory and 192 inhibitory event-based integrate-and-fire neurons, with AMPA/NMDA and GABA synapses. Excitatory and inhibitory units were interconnected probabilistically. Plasticity was enabled in the feedforward connections between input and output excitatory units. Poisson noise was added to the output units to drive stochastic movements.
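The configuration sketch below summarizes this architecture. Population sizes follow the text, assuming the 384 excitatory cells split evenly into 192 proprioceptive-input and 192 motor-output units; the connection probabilities, noise rate, and dictionary layout are illustrative assumptions rather than reported parameters.

net_config = {
    'populations': {
        'E_input':  {'n': 192, 'synapses': ['AMPA', 'NMDA']},   # proprioceptive layer
        'E_output': {'n': 192, 'synapses': ['AMPA', 'NMDA']},   # motor command layer
        'I':        {'n': 192, 'synapses': ['GABA']},
    },
    'connections': [
        # (source, target, probability, plastic?)
        ('E_input',  'E_output', 0.10, True),    # feedforward projection trained by RL
        ('E_input',  'I',        0.10, False),
        ('E_output', 'I',        0.10, False),
        ('I',        'E_input',  0.20, False),
        ('I',        'E_output', 0.20, False),
        ('E_output', 'E_output', 0.00, False),   # recurrence off in the best-performing runs
    ],
    'noise': {'target': 'E_output', 'process': 'Poisson', 'rate_hz': 50.0},  # drives exploration
    'muscles': ['shoulder_flexor', 'shoulder_extensor', 'elbow_flexor', 'elbow_extensor'],
}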

The reinforcement learning (RL) algorithm used eligibility traces for synaptic credit/blame assignment and a global signal (+1 = reward, -1 = punishment) corresponding to dopaminergic bursting/dipping. Eligibility traces were spike-timing-dependent, with pre-before-post spiking required. Reward (punishment) was delivered when the distance between the hand and the target decreased (increased) [1].
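A minimal sketch of this reward-modulated plasticity rule is given below: a pre-before-post spike pairing tags a synapse with a decaying eligibility trace, and the next global reinforcement signal converts the accumulated trace into a weight change. The learning rate, pairing window, decay time constant, and weight bound are assumed values, not those used in the simulations.

import math

class RMSynapse:
    """One plastic input-to-output synapse with a reward-modulated eligibility trace."""

    def __init__(self, w, lr=0.01, tau_elig=100.0, w_max=2.0, pairing_window=50.0):
        self.w, self.lr, self.tau_elig, self.w_max = w, lr, tau_elig, w_max
        self.pairing_window = pairing_window  # ms
        self.elig = 0.0
        self.t_pre = -1e9  # time of most recent presynaptic spike (ms)

    def pre_spike(self, t):
        self.t_pre = t

    def post_spike(self, t):
        # Only pre-before-post pairings within the window tag the synapse.
        if 0.0 < t - self.t_pre <= self.pairing_window:
            self.elig += 1.0

    def step(self, dt):
        # Eligibility decays between reinforcement events.
        self.elig *= math.exp(-dt / self.tau_elig)

    def reinforce(self, signal):
        # signal = +1 for reward (dopamine burst), -1 for punishment (dip).
        self.w = min(max(self.w + self.lr * signal * self.elig, 0.0), self.w_max)

In use, pre_spike/post_spike would be called from the network's spike handlers, step() at each integration interval, and reinforce() whenever the hand-target distance is re-evaluated.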

Reinforcement learning occurred over 100 training sessions, with the arm starting at 15 different initial positions. Each sub-session consisted of 15 s of RL training from a specific starting position. After training, the network was tested on its ability to reach the arm to the target from each starting position over the course of a 15 s trial. Compared to the naive network, the trained network was able to reach the target from all starting positions. The improvement was most pronounced when the arm started at a large distance from the target. After reaching the target, the hand tended to oscillate around it. Learning was most effective when recurrent connectivity among the output units was turned off or kept at low levels; best overall performance was achieved with no recurrent connectivity and moderate maximal weights. Although learning typically increased average synaptic weights in the input-to-output M1 connections, weight reductions were also frequent. Our model predicts that optimal motor performance is sensitive to perturbations in both the strength and density of recurrent connectivity within motor cortex, and that the wiring of recurrent connectivity during development might therefore be carefully regulated.
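The training and testing protocol can be outlined as follows, assuming each of the 100 sessions cycles through all 15 starting positions; run_trial() is a hypothetical placeholder for one 15 s simulation of the network driving the arm from a given starting posture, and the grid of starting angles is assumed, since only the number of positions is stated above.

import numpy as np

N_SESSIONS = 100          # training sessions
TRIAL_SECONDS = 15.0      # duration of each training sub-session and test trial
# Assumed 5 x 3 grid of (shoulder, elbow) starting angles giving the 15 initial positions.
START_POSITIONS = [(s, e) for s in np.linspace(0.0, np.pi / 2, 5)
                   for e in np.linspace(0.0, np.pi, 3)]

def run_trial(start_angles, duration, learning):
    """Hypothetical placeholder: simulate the network driving the arm from
    start_angles for `duration` seconds, with plasticity on (training) or
    off (testing); returns the final hand-target distance (dummy value here)."""
    return 0.0

for session in range(N_SESSIONS):
    for start in START_POSITIONS:
        run_trial(start, TRIAL_SECONDS, learning=True)    # RL training sub-session

# After training, freeze the weights and test reaching from every starting position.
final_errors = [run_trial(start, TRIAL_SECONDS, learning=False)
                for start in START_POSITIONS]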

Acknowledgements
Research supported by DARPA grant N66001-10-C-2008. The authors would like to thank Larry Eberle (SUNY Downstate) for Neurosim lab computer support, and Michael Hines (Yale) and Ted Carnevale (Yale) for NEURON simulator support.

Author details
1SUNY Downstate Medical Center, 450 Clarkson Avenue, Brooklyn, NY 11203, USA. 2School of Physics, University of Sydney, NSW 2006, Australia. 3Kings County Hospital, Brooklyn, NY 11203, USA.

* Correspondence: [email protected]
1SUNY Downstate Medical Center, 450 Clarkson Avenue, Brooklyn, NY 11203, USA
Full list of author information is available at the end of the article


© 2012 Neymotin et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Published: 16 July 2012

Reference
1. Chadderdon GL, Neymotin SA, Kerr CC, Francis JT, Lytton WW: Dopamine-based reinforcement learning of virtual arm reaching task in a spiking model of motor cortex. International Conference on Cognitive and Neural Systems 16, Boston, MA.

doi:10.1186/1471-2202-13-S1-P90
Cite this article as: Neymotin et al.: Reinforcement learning of 2-joint virtual arm reaching in motor cortex simulation. BMC Neuroscience 2012, 13(Suppl 1):P90.
